Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Nested foreach help.


    I am currently working with a dataset with 250 variables and aprox. 200.000 obs.

    I need to create a variable called x if the values in the varlist act1-act75 are equal to 042* or V08*, and if the values in the varlist adm1-adm75 = "yes"
    I was able to run the first part using the following code:
    gen x=.
    foreach var of varlist act1-act75 {
    replace x =1 if strmatch(`var', "042*") | strmatch(`var', "V08*")
    }
    replace x=0 if x==.

    But when I tried both varlist, I got 0 changes (and it should be 650 obs). Here is my code:

    drop x
    gen x=.

    foreach var of varlist act1-act75 {
    foreach y of varlist adm1-adm75 {
    replace x=1 if (strmatch(`var', "042*") | strmatch(`var', "V08*")) & strmatch(`y', "Yes")
    }
    }
    replace x=0 if x==.

    Please help. Thanks in advance.

    Lesley.


  • #2
    Try running the two loops seperately, generating X1 for the first and X2 for the second variable. Then use & to combine the X1 and X2 variable into a single one. If you have problem doing this, then paste an example from your data for more specific help.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

    Comment


    • #3
      Dear Attaullah.

      Thank you for replying.
      Im really trying to run this as a nested loop, assuming it can be done using stata (similar to what we can do using sql).

      Thanks again,

      LM

      Comment


      • #4
        Please follow Attaullah's advice and show us an excerpt from your data. You can use dataex from SSC for this (see section 12 in the FAQ).

        Comment


        • #5
          Dear Friedrich,

          Of course I can do two loops, one for x1 and the x2 (as you could tell in my original post, Im able to do that easily).
          The point of the question was to learn a way to use a nested loop and not 2 independent loops and then a third command to link both variables.

          Thanks for you help.

          LM

          Comment


          • #6
            Welcome to Statalist!

            I note that in the statement of the problem you wrote
            Code:
            if the values in the varlist adm1-adm75 = "yes"
            whereas in your code you wrote
            Code:
            strmatch(`y', "Yes")
            Is the difference in capitalization the source of your problem?

            Comment


            • #7
              Dear William,

              The actual coding was done using capital Y.
              When I run the sql coding in access I get the original 650 obs, but I cant make the nested loop work. Any suggestions?

              Thanks again,

              LM

              Comment


              • #8
                It's really not clear precisely what you are asking here. Your code shows nested loops but your complaint is that they do not do what you expect. To make a guess at why not, people need to have more idea of what your data are like. Hence the repeated requests to see examples of your data. If you are unable or unwilling to provide an example, then I don't see what else can be added, as you already have the technique.

                Comment


                • #9
                  Your post at #7 suggests that your data were imported from Access to Stata. Depending on the characteristics in Access, it it possible that the data contain trailing spaces, which is not typical for Stata. Try
                  Code:
                  strmatch(trim(`y'), "Yes")

                  Comment


                  • #10
                    To the point I made in the previous post, let me add that in Stata trailing spaces are considered significant in comparisons. The advice from help strings includes

                    Capitalization matters. Leading and trailing spaces matter.
                    The following demonstrates this.
                    Code:
                    . input str8 s
                    
                                 s
                      1. "Yes"
                      2. "Yes "
                      3. end
                    
                    . gen m = strmatch(s,"Yes")
                    
                    . list, clean
                    
                              s   m  
                      1.    Yes   1  
                      2.   Yes    0
                    Last edited by William Lisowski; 11 Oct 2015, 06:19.

                    Comment


                    • #11
                      Point taken Nick.

                      First I have to say, after thinking about my problem and comparing my results with sql, I think the way I wrote the nested loop was wrong.

                      Let me start describing the data and giving an example:

                      I have 200 variables with 150.000 obs each.
                      I have 75 var related with disease (act - like medicare billing codes, string) and each of those have a second var that code "Yes" if the code in var act was present on admission or "No" otherwise ( so basically pair of var act1 with adm1, act2 with adm2...)

                      So I have a column for id, act1, adm1, act2, adm2 so on and so forth until act75, adm75:

                      Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long id str6 act1 str25 adm1 str6 act2 str25 adm2 str6 act3 str25 adm3
                        485325011 "038.0"  "Yes" "998.6"  "Yes" "584.9"  "No" 
                        485424223 "736.39" "Yes" "038.0"  "No"  "785.52" "No" 
                        485434449 "999.31" "Yes" "038.49" "Yes" "584.5"  "No" 
                        485451483 "750.3"  "Yes" "038.9"  "No"  "507.0"  "No" 
                        485507270 "747.10" "No" "584.5"  "No"  "038.0"  "No" 
                        485524608 "710.1"  "Yes" "507.0"  "No"  "038.0"  "No" 
                        485557724 "V30.01" "No" "771.81" "Yes" "770.7"  "Yes"
                        485606121 "V30.01" "Yes" "769"  "Yes" "287.31" "Yes"
                        485651366 "534.90" "Yes" "567.22" "Yes" "710.0"  "Yes"
                        end
                      My original code was:

                      Code:
                        gen x=. 
                        foreach var of varlist act1-act75 {
                        foreach y of varlist adm1-adm75 {
                        replace x=1 if (strmatch(`var', "042*") | strmatch(`var', "V08*")) & strmatch(`y', "Yes")
                        }
                        }
                        replace x=0 if x==.
                      Now what I really need to do is to create a new var x, if var "act1" = "042*" or "V08*" AND at the same time var "adm1" = "Yes", OR the same with "act2" and "adm2"...so on and so forth until var "act75" = "042*" or "V08*" AND at the same time var "adm75" = "Yes"

                      Thanks again.

                      LM

                      Comment


                      • #12
                        The data example does help, so thanks.

                        I agree; your problem does not call for nested loops. Looping over lists in parallel is quite often misread as calling for nested loops. Consider this:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long id str6 act1 str25 adm1 str6 act2 str25 adm2 str6 act3 str25 adm3
                          485325011 "038.0"  "Yes" "998.6"  "Yes" "584.9"  "No"
                          485424223 "736.39" "Yes" "038.0"  "No"  "785.52" "No"
                          485434449 "999.31" "Yes" "038.49" "Yes" "584.5"  "No"
                          485451483 "750.3"  "Yes" "038.9"  "No"  "507.0"  "No"
                          485507270 "747.10" "No" "584.5"  "No"  "038.0"  "No"
                          485524608 "710.1"  "Yes" "507.0"  "No"  "038.0"  "No"
                          485557724 "V30.01" "No" "771.81" "Yes" "770.7"  "Yes"
                          485606121 "V30.01" "Yes" "769"  "Yes" "287.31" "Yes"
                          485651366 "534.90" "Yes" "567.22" "Yes" "710.0"  "Yes"
                        end
                          
                        gen x = 0
                          
                        forval j = 1/3 {
                            replace x = 1 if inlist(substr(act`j', 1, 3), "042", "V08") & adm`j' == "Yes"
                        }
                        Using inlist() or substr() here is no matter than an arbitrary difference of taste. Initialising to 0 when you know that you want an indicator variable is, however, arguably a little simpler by any standards.

                        Comment


                        • #13
                          Dear Nick, it worked perfectly.

                          Thank for your help!

                          LM

                          Comment

                          Working...
                          X