  • Foreach loop in Stata

    I would like to create an indicator variable "Biokids" from the variables shown in the snapshot below (r2ownkidkn-r12ownkidkn), defined as follows:
    if r2ownkidkn-r12ownkidkn are missing data, i.e., ".", then return "0" for Biokids, or
    if r2ownkidkn-r12ownkidkn = 0, then return "0" for Biokids; otherwise
    Biokids = 1 (meaning there is a result greater than zero, or not missing, in the series r2ownkidkn-r12ownkidkn)
    Here is a snapshot of the variables and of what I would like the new indicator variable "Biokids" to show, given the values in r2ownkidkn-r12ownkidkn.
    hhidpn r2ownkidkn r3ownkidkn r4ownkidkn r5ownkidkn r6ownkidkn r7ownkidkn r8ownkidkn r9ownkidkn r10ownkidkn r11ownkidkn r12ownkidkn Biokids
    1234 4 0 0 . . . . . . . . 1
    5678 8 8 8 8 . . . . . . . 1
    9101 5 5 5 5 5 5 5 5 5 5 . 1
    2134 . . . . . . . . . . . 0
    5167 0 0 0 0 0 0 0 0 0 0 0 0
    This is the coding that I attempted, which did not work:
    generate byte Biokids = 1

    foreach v of varlist r2ownkidkn-r12ownkidkn {

    replace Biokids = 0 if (r2ownkidkn-r12ownkidkn) ==0

    }

    else replace Biokids = 0 if (r2ownkidkn-r12ownkidkn) ==.

    }
    Any ideas or assistance would be much appreciated.

  • #2
    Code:
    egen biokids = rowtotal(r2ownkidkn-r12ownkidkn)
    replace biokids = 1 if inrange(biokids, 1, .)

    In the future, when asking for help with code, use the -dataex- command to show example data. If you are running version 17, 16, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
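
    The two-line code above can be checked against the snapshot in #1. The following is a minimal sketch, assuming the eleven variables sit next to each other in the data set and that the counts are never negative; hhidpn and the hand-typed Biokids column are left out. It relies on the documented behavior that rowtotal() treats missing values as 0, so an all-missing row totals 0 just like an all-zero row.

    ```stata
    * Data retyped from the snapshot in #1 (hhidpn and Biokids omitted)
    clear
    input byte(r2ownkidkn r3ownkidkn r4ownkidkn r5ownkidkn r6ownkidkn r7ownkidkn r8ownkidkn r9ownkidkn r10ownkidkn r11ownkidkn r12ownkidkn)
    4 0 0 . . . . . . . .
    8 8 8 8 . . . . . . .
    5 5 5 5 5 5 5 5 5 5 .
    . . . . . . . . . . .
    0 0 0 0 0 0 0 0 0 0 0
    end

    * rowtotal() counts missing values as 0, so rows 4 and 5 both total 0
    egen biokids = rowtotal(r2ownkidkn-r12ownkidkn)

    * Any total of 1 or more becomes 1; totals of 0 are left at 0
    replace biokids = 1 if inrange(biokids, 1, .)

    list, clean
    ```

    On these five rows the result matches the Biokids column requested in #1: 1 for the first three observations and 0 for the last two.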

    • #3
      Thanks Clyde! The code works perfectly and will save me a lot of time. I also appreciate the instructions on the dataex command, and will use it as future questions arise.

      • #4
        As I mentioned above, the following suggested code worked as recommended. Thanks again:

        . egen biokids = rowtotal(r2ownkidkn-r12ownkidkn)

        . replace biokids = 1 if inrange(biokids, 1, .)
        (33,261 real changes made)

        .
        end of do-file

        . dataex r2ownkidkn r3ownkidkn r4ownkidkn r5ownkidkn r6ownkidkn r7ownkidkn r8ownkidkn r9ownkidkn r10ownkidkn r11ownkidkn r12ownkidkn biokids, count(10)

        ----------------------- copy starting from the next line -----------------------
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(r2ownkidkn r3ownkidkn r4ownkidkn r5ownkidkn r6ownkidkn r7ownkidkn r8ownkidkn r9ownkidkn r10ownkidkn r11ownkidkn r12ownkidkn) float biokids
        4 . . . . . . . . . . 1
        8 8 8 8 . . . . . . . 1
        5 5 5 5 5 5 5 5 5 5 . 1
        2 2 2 2 2 2 2 2 2 2 2 1
        . . . . . . . . . . . 0
        . . . . . . . . . . . 0
        . . . . . . . . . . . 0
        2 2 2 2 2 2 2 2 2 . . 1
        0 0 0 0 0 0 0 0 0 0 0 1
        . 1 1 1 1 1 1 1 1 1 1 1
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 10 out of 37495 observations

        However, when I applied the same logic to create another indicator variable, "stepkids", several false "1"s are returned whenever an observation contains both "0" and "." values. Note: in the first case, where we created the indicator variable "biokids", no observation contains both "0" and "." values.

        . egen stepkids = rowtotal(r2stepkidkn-r12stepkidkn)

        . replace stepkids = 1 if inrange(stepkids, 1, .)
        (32,036 real changes made)

        .
        end of do-file

        . dataex r2stepkidkn r3stepkidkn r4stepkidkn r5stepkidkn r6stepkidkn r7stepkidkn r8stepkidkn r9stepkidkn r10stepkidkn r11stepkidkn r12stepkidkn stepkids, count
        > (10)

        ----------------------- copy starting from the next line -----------------------
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(r2stepkidkn r3stepkidkn r4stepkidkn r5stepkidkn r6stepkidkn r7stepkidkn r8stepkidkn r9stepkidkn r10stepkidkn r11stepkidkn r12stepkidkn) float stepkids
        0 . . . . . . . . . . 0
        0 0 0 0 . . . . . . . 1
        0 0 0 0 0 0 0 0 0 0 . 1
        3 3 3 3 3 3 3 3 3 3 3 1
        . . . . . . . . . . . 0
        . . . . . . . . . . . 0
        . . . . . . . . . . . 0
        0 0 0 0 0 0 0 0 0 . . 1
        2 2 2 2 2 2 2 2 2 2 2 1
        . 0 0 0 0 0 0 0 0 0 0 1
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 10 out of 37495 observations

        As can be seen above, the 2nd, 3rd, 8th, and 10th observations are returning "1" for stepkids, and the desire is for any observation with "0" or "." to return "0". I'm using Stata 17.0 BE edition. Any further assistance would be appreciated.

        • #5
          As can be seen above, the 2nd, 3rd, 8th, and 10th observations are returning "1" for stepkids

          No, that's not true. The example data you give and the code you give do not produce 1's in the 2nd, 3rd, 8th, and 10th observations. You must be doing something different. Look for yourself:

          Code:
          . * Example generated by -dataex-. For more info, type help dataex
          . clear
          
          . input byte(r2stepkidkn r3stepkidkn r4stepkidkn r5stepkidkn r6stepkidkn r7stepkidkn r8stepkidkn r9stepkidkn r10stepkidkn r11stepkidkn r12
          > stepkidkn)
          
               r2step~n  r3step~n  r4step~n  r5step~n  r6step~n  r7step~n  r8step~n  r9step~n  r10ste~n  r11ste~n  r12ste~n
            1. 0 . . . . . . . . . .
            2. 0 0 0 0 . . . . . . .
            3. 0 0 0 0 0 0 0 0 0 0 .
            4. 3 3 3 3 3 3 3 3 3 3 3
            5. . . . . . . . . . . .
            6. . . . . . . . . . . .
            7. . . . . . . . . . . .
            8. 0 0 0 0 0 0 0 0 0 . .
            9. 2 2 2 2 2 2 2 2 2 2 2
           10. . 0 0 0 0 0 0 0 0 0 0
           11. end
          
          .
          . egen stepkids = rowtotal(r2stepkidkn-r12stepkidkn)
          
          . replace stepkids = 1 if inrange(stepkids, 1, .)
          (2 real changes made)
          
          .
          . list, clean
          
                 r2step~n   r3step~n   r4step~n   r5step~n   r6step~n   r7step~n   r8step~n   r9step~n   r10ste~n   r11ste~n   r12ste~n   stepkids  
            1.          0          .          .          .          .          .          .          .          .          .          .          0  
            2.          0          0          0          0          .          .          .          .          .          .          .          0  
            3.          0          0          0          0          0          0          0          0          0          0          .          0  
            4.          3          3          3          3          3          3          3          3          3          3          3          1  
            5.          .          .          .          .          .          .          .          .          .          .          .          0  
            6.          .          .          .          .          .          .          .          .          .          .          .          0  
            7.          .          .          .          .          .          .          .          .          .          .          .          0  
            8.          0          0          0          0          0          0          0          0          0          .          .          0  
            9.          2          2          2          2          2          2          2          2          2          2          2          1  
           10.          .          0          0          0          0          0          0          0          0          0          0          0
          and the desire is for any observation with "0" or "." to return "0"
          Well, that is quite different from what you said in #1. In fact, the very first row of the data tableau you presented in #1 contradicts that. If what you want is a variable that will return 0 whenever there is any zero or missing value, and 1 otherwise, that would be different:
          Code:
          egen byte anyzero = anymatch(r2stepkidkn-r12stepkidkn), values(0)
          egen byte miss_count = rowmiss(r2stepkidkn-r12stepkidkn)
          gen byte stepkids = !(anyzero | miss_count)
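
          To make the difference between the two definitions concrete, here is a minimal sketch on a made-up three-wave data set. The data values and the names kid_total, any_positive, and all_positive are invented for illustration; only the egen functions come from the code in this thread.

          ```stata
          * Hypothetical three-wave example, not taken from the real data
          clear
          input byte(r2stepkidkn r3stepkidkn r4stepkidkn)
          4 0 .
          3 3 3
          0 0 .
          . . .
          end

          * Definition from #1: 1 if any wave reports a positive count
          egen kid_total = rowtotal(r2stepkidkn-r4stepkidkn)
          gen byte any_positive = kid_total > 0

          * Alternative: 1 only if no wave is zero and no wave is missing
          egen byte anyzero = anymatch(r2stepkidkn-r4stepkidkn), values(0)
          egen byte miss_count = rowmiss(r2stepkidkn-r4stepkidkn)
          gen byte all_positive = !(anyzero | miss_count)

          list, clean
          ```

          The two indicators disagree only on the first observation, which mixes a positive count with a zero and a missing value: any_positive is 1 there, while all_positive is 0.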

          • #6
            Clyde - Thanks again. You are correct in the case where the "stepkids" variable code is executed using only r2stepkidkn-r12stepkidkn. Note that there are three variables of interest in my data set, and 11 survey waves (denoted as 2-12). I had included all three variables (rXownkidkn, rXstepkidkn, and rXothkidkn) in my subset, to generate the indicator variables ownkids, stepkids, and otherkids. When I ran your code with all three present, that's when I ran into problems. My fault for not disclosing this earlier. I suppose the correct route is to filter down to just one variable at a time using this code.

            • #7
              This brings up a general point about Stata that bears repeating. In Stata, the notation var2-var12 refers to all variables that are located between var2 and var12 in the layout of the data set at the time the command containing the notation is executed. So if your data are arranged something like r2ownkidkn r2stepkidkn r2othkidkn r3ownkidkn r3stepkidkn r3othkidkn..., which is not a bad way to arrange this kind of data for most purposes, then you have to be careful about using the - notation to refer to groups of variables, because r2ownkidkn-r12ownkidkn will, in this setting, include all of the r*kidkn variables for waves 2 through 11, plus r12ownkidkn!

              One workaround is to re-arrange the variables so the r*ownkidkn variables are a consecutive block, the r*stepkidkn variables are yet another consecutive block, and the r*othkidkn variables still a third consecutive block.
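
              A minimal sketch of that re-arrangement, assuming a made-up two-wave data set interleaved as described above (the data values and the macro name block are invented for illustration):

              ```stata
              * Hypothetical interleaved layout with two waves
              clear
              input byte(r2ownkidkn r2stepkidkn r2othkidkn r3ownkidkn r3stepkidkn r3othkidkn)
              2 0 0 2 0 0
              0 1 0 0 1 0
              end

              * Move each family of variables into its own consecutive block
              order r*ownkidkn r*stepkidkn r*othkidkn
              describe, simple

              * The - range now covers only the ownkidkn variables
              unab block : r2ownkidkn-r3ownkidkn
              display "`block'"
              ```

              After -order-, the range r2ownkidkn-r3ownkidkn expands to just the two ownkidkn variables.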

              But if you want to keep the arrangement you already have, there is another way to go. Don't use the - notation. Use the * wildcard. r*ownkidkn will capture all of the ownkidkn variables, regardless of where they are located in the data set. In fact, in my own data management, I make very little use of the - notation for a group of variables. That's precisely because I'm always concerned that I have no guarantee that the range will not include some extraneous variables. I tend to rely on the * wildcard more, because I name variables in such a way that this usually defines a meaningful group of variables that will be treated similarly in data management. The only reason I used the - notation in my responses in this thread is that I was following your lead from the code you showed in #1, and the data example you showed seemed compatible with using it.
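
              The difference between the two notations can be seen in a minimal sketch on a made-up interleaved data set. The data values and the names by_range, by_wildcard, range_total, and wild_total are invented for illustration.

              ```stata
              * Hypothetical interleaved layout with two waves
              clear
              input byte(r2ownkidkn r2stepkidkn r2othkidkn r3ownkidkn r3stepkidkn r3othkidkn)
              2 0 0 2 0 0
              0 1 0 0 1 0
              . . . 0 0 0
              end

              * The - range follows data set order and picks up the variables in between
              unab by_range : r2ownkidkn-r3ownkidkn
              display "`by_range'"

              * The * wildcard matches on names only, wherever the variables are located
              unab by_wildcard : r*ownkidkn
              display "`by_wildcard'"

              * The stray variables leak into the row total when the range is used
              egen range_total = rowtotal(r2ownkidkn-r3ownkidkn)
              egen wild_total = rowtotal(r*ownkidkn)
              list r2ownkidkn r3ownkidkn range_total wild_total, clean
              ```

              In the second observation both ownkidkn values are 0, yet range_total is 1 because r2stepkidkn was swept into the range. That is the same kind of false "1" reported in #4.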

              Well, you've discovered that yourself--I'm writing it here not so much to communicate to you but to make a point to others who might be following this thread.

              • #8
                Clyde - Very helpful. As you mention in the first paragraph of your latest response, my variables are arranged exactly as you describe. I used the wildcard notation * as follows, and it worked perfectly without having to rearrange columns. Thank you again for your valuable time and effort!

                . egen biokids = rowtotal(r*ownkidkn)

                . replace biokids = 1 if inrange(biokids, 1, .)
                (32,085 real changes made)

                . egen stepkids = rowtotal(r*stepkidkn)

                . replace stepkids = 1 if inrange(stepkids, 1, .)
                (7,587 real changes made)

                . egen otherkids = rowtotal(r*othkidkn)

                . replace otherkids = 1 if inrange(otherkids, 1, .)
                (915 real changes made)
                .
                end of do-file

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input float(biokids stepkids otherkids)
                1 0 0
                1 0 0
                1 0 0
                1 1 0
                0 0 0
                0 0 0
                0 0 0
                1 0 0
                0 1 0
                1 0 0
                end
