  • Creating a loop to standardize variables by age and gender and create index

    Hi all, I have a series of questions that were asked of males and females separately. They relate to physical development and are measured on different scales, so I want to standardize each variable by age and gender before combining them into one larger index.

    Given the data structure shown below, I have attempted a very inelegant solution that requires many lines of code. Below I've laid out what would be needed for just 2 variables, with only 4 ages, and for one gender. My full data have 5 variables for each gender and a much larger age range (roughly 7 to 16), so I'm hoping someone has a loop or some other more efficient way of doing this; otherwise it will quickly become hundreds of lines of code.


    My attempted code:
    Code:
    // First standardize score 1 and then combine
    egen score_1_std_12 = std(score_1) if age == 12 & gender == 1
    egen score_1_std_13 = std(score_1) if age == 13 & gender == 1
    egen score_1_std_14 = std(score_1) if age == 14 & gender == 1
    egen score_1_std_15 = std(score_1) if age == 15 & gender == 1
    
    gen score_1_std = .
    replace score_1_std = score_1_std_12 if score_1_std_12 != .
    replace score_1_std = score_1_std_13 if score_1_std_13 != .
    replace score_1_std = score_1_std_14 if score_1_std_14 != .
    replace score_1_std = score_1_std_15 if score_1_std_15 != .
    
    // Standardize score 2 and combine
    egen score_2_std_12 = std(score_2) if age == 12 & gender == 1
    egen score_2_std_13 = std(score_2) if age == 13 & gender == 1
    egen score_2_std_14 = std(score_2) if age == 14 & gender == 1
    egen score_2_std_15 = std(score_2) if age == 15 & gender == 1
    
    gen score_2_std = .
    replace score_2_std = score_2_std_12 if score_2_std_12 != .
    replace score_2_std = score_2_std_13 if score_2_std_13 != .
    replace score_2_std = score_2_std_14 if score_2_std_14 != .
    replace score_2_std = score_2_std_15 if score_2_std_15 != .
    
    // Combine the vars into one larger index
    gen total_score_female = score_1_std + score_2_std




    Data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(age gender score_1 score_2 score_3 score_4)
    12 0 . . 3 6
    12 0 . . 4 7
    12 0 . . 3 3
    12 0 . . 2 4
    12 0 . . 3 5
    13 0 . . 2 7
    13 0 . . 3 5
    13 0 . . 4 6
    13 0 . . 5 5
    13 0 . . 3 4
    14 0 . . 2 6
    14 0 . . 3 7
    14 0 . . 1 5
    14 0 . . 1 4
    14 0 . . 3 6
    14 0 . . 2 7
    15 0 . . 3 7
    15 0 . . 4 5
    15 0 . . 5 5
    15 0 . . 3 6
    15 0 . . 2 4
    15 0 . . 2 7
    12 1 1 8 . .
    12 1 1 8 . .
    12 1 2 8 . .
    12 1 2 5 . .
    12 1 3 6 . .
    13 1 2 4 . .
    13 1 1 4 . .
    13 1 2 3 . .
    13 1 3 4 . .
    13 1 4 5 . .
    14 1 3 6 . .
    14 1 3 7 . .
    14 1 1 4 . .
    14 1 4 6 . .
    14 1 3 4 . .
    14 1 4 7 . .
    15 1 5 8 . .
    15 1 5 8 . .
    15 1 4 8 . .
    15 1 1 8 . .
    15 1 3 5 . .
    15 1 4 8 . .
    end
    Last edited by Anne Todd; 14 Mar 2023, 13:49.

  • #2
    Yes, this can actually be reduced to just a few lines:

    Code:
    forvalues i = 1/4 {
        by age gender, sort: egen score_`i'_std = std(score_`i')
    }
    
    egen total_score = rowtotal(score_*_std), missing
    I have not created separate total_score variables for the two sexes because I don't understand how that would be helpful. Each observation is coded for sex, and gets standardized scores for the observation's age and sex. So total score is just total score. If you make them separate variables, then each one will be missing for roughly half of the observations. (And if you try to use them both in a regression, given that one of them would always be missing, your regression sample will have no observations and you will get no results.) If there is some specific reason you really need separate variables, then either -separate- or -reshape- will get you to that.
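
    If -egen, std()- is rejected under the by prefix in your Stata release (check -help egen-), the same within-group standardization can be built by hand from group means and standard deviations. The following is only a sketch, assuming the example data above is loaded; the _z suffix and the name total_z are hypothetical, chosen so they do not clash with the _std variables. The last line illustrates the -separate- route mentioned above.

```stata
* Sketch: z-scores within each age-by-gender cell, built from cell means and SDs.
* Assumes age, gender, and score_1-score_4 exist as in the example data.
foreach v of varlist score_1-score_4 {
    bysort age gender: egen double `v'_mean = mean(`v')
    bysort age gender: egen double `v'_sd   = sd(`v')
    * A cell with zero SD, or with no data for this score, yields missing
    generate double `v'_z = (`v' - `v'_mean) / `v'_sd
    drop `v'_mean `v'_sd
}

* Row total of whichever standardized scores are non-missing for the observation
egen double total_z = rowtotal(score_*_z), missing

* Only if separate per-gender variables are really needed:
* creates total_z0 (gender == 0) and total_z1 (gender == 1)
separate total_z, by(gender)
```

    With gender coded 0/1 as in the example, total_z1 is filled only for observations with gender == 1 and is missing elsewhere, which is the drawback described above.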



    • #3
      Thank you Clyde, very helpful. And yes, you're correct: in the manual method I was originally attempting, I would have just combined them all into one total score in a single variable at the end. Thanks again.

