  • Figuring out the best foreach loop

    Hello Statalist, I am working with some annoying data where each age group has a different variable name for the same question. My end goal is a "total score" across all 15 or so variables (the example here shows just 3).

    The final step is easy: just total up the variables. It is the first step that I am struggling to do efficiently. I need to collapse each question into a single variable that incorporates all age groups. I can do this with many lines of -egen-, as you see below (in my real data it would be many more lines of code, and I'd have to repeat this a few different times).


    Code:
    // Step 1: Collapse into a single variable
    egen a1_w1 = rowmax(a1_1_wave1 b1_1_wave1 c1_1_wave1 d1_1_wave1)
    egen a2_w1 = rowmax(a1_2_wave1 b1_2_wave1 c1_2_wave1 d1_2_wave1)
    egen a3_w1 = rowmax(a1_3_wave1 b1_3_wave1 c1_3_wave1 d1_3_wave1)
    
    // Step 2: Generate total 
    egen a_total_w1 = rowtotal(a1_w1 a2_w1 a3_w1)


    But I have a mental block about whether there is a way to wrangle that into a tidy foreach loop (especially Step 1). Here is some example data.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(age a1_1_wave1 b1_1_wave1 c1_1_wave1 d1_1_wave1 a1_2_wave1 b1_2_wave1 c1_2_wave1 d1_2_wave1 a1_3_wave1 b1_3_wave1 c1_3_wave1 d1_3_wave1)
    3 1 . . . 1 . . . 1 . . .
    3 0 . . . 1 . . . 1 . . .
    3 0 . . . 0 . . . 1 . . .
    3 0 . . . 0 . . . 0 . . .
    4 . 0 . . . 0 . . . 0 . .
    4 . 1 . . . 1 . . . 1 . .
    4 . 0 . . . 0 . . . 1 . .
    4 . 0 . . . 0 . . . 0 . .
    5 . . 0 . . . 0 . . . 0 .
    5 . . 1 . . . 1 . . . 1 .
    5 . . 0 . . . 0 . . . 0 .
    5 . . 1 . . . 1 . . . 0 .
    6 . . . 0 . . . 1 . . . 0
    6 . . . 1 . . . 0 . . . 0
    6 . . . 0 . . . 0 . . . 0
    6 . . . 1 . . . 1 . . . 1
    end

  • #2
    I think you want
    Code:
    forvalues i = 1/3 {
        egen a`i'_w1 = rowmax(a1_`i'_wave1 b1_`i'_wave1 c1_`i'_wave1 d1_`i'_wave1)
    }
    egen a_total_w1 = rowtotal(a*_w1)
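
    If the list of age-group prefixes grows, one possible variation on the loop above (a sketch, not part of the original answer) builds the varlist in an inner foreach loop and would be used in place of that loop. The local macro names prefixes and vars are my own choices, and the upper limit of 3 matches the example data; with the 15 or so questions mentioned in #1 it would need adjusting.
    Code:
    * age-group prefixes taken from the example data in #1
    local prefixes a1 b1 c1 d1
    forvalues i = 1/3 {
        * start each question with an empty varlist
        local vars
        foreach p of local prefixes {
            local vars `vars' `p'_`i'_wave1
        }
        egen a`i'_w1 = rowmax(`vars')
    }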



    • #3
      Yes, this is exactly it. Thanks, Clyde Schechter.



      • #4
        Clyde Schechter, this was not in my example data, but for an observation where all values are missing, a_total_w1 would come out as 0 because of the way rowtotal handles missings. Is that correct? Is there an easy way to set a_total_w1 to missing when every a*_w1 is missing?

        I suppose I could add _`i' to the end of each variable I generate with -egen-, and then do something like:

        Code:
        replace a_total_w1 = . if (a_*_w1_1 - a_*_w1_13) == .
        Though that doesn't seem particularly elegant.
        Last edited by Taylor Walter; 26 Jul 2022, 18:18.



        • #5
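          No, just add the -missing- option to the -egen, rowtotal()- command; -rowmax()- takes no such option and already returns missing when every variable in its varlist is missing. From the -egen- help file: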

          rowtotal(varlist) [, missing]
          may not be combined with by. It creates the (row) sum of the variables in varlist, treating missing as 0. If missing is specified and all values in varlist are missing for an observation, newvar is set to missing for that observation. [Emphasis added]
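
          To see the difference, here is a minimal sketch on two made-up observations. The variables x1-x3, t_default, and t_missing are hypothetical names, not from the data in #1.
          Code:
          clear
          input byte(x1 x2 x3)
          1 0 .
          . . .
          end
          * default: missing values are treated as 0
          egen t_default = rowtotal(x1 x2 x3)
          * with -missing-: an all-missing row gives a missing total
          egen t_missing = rowtotal(x1 x2 x3), missing
          list

          The second observation should show t_default = 0 but t_missing = . (missing). In the code from #2, that means ending the -rowtotal()- line with ", missing".
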
          Last edited by Clyde Schechter; 26 Jul 2022, 18:55.
