Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Conditional counting across rows

    Dear Statausers,

    Dealing with my 1° PhD chapter, I would like to include the variable oldermalesiblings and olderfemalesiblings in my regression so I am able to count how many older brothers and sisters my observations have.

    Here is my data, mm3 refers to the current age of each sibling: mm3_01 = age first sibling; mm3_02 = age second sibling...

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 id byte(age siblings) float(malesiblingsalive femalesiblingsalive siblingrank) byte(mm1_01 mm1_02 mm1_03 mm3_01 mm3_02 mm3_03)
    "        1 13  2" 16  4 1 1 1 1 1 2 13  .  7
    "        1 41  3" 18  6 2 1 1 1 2 1  . 14 11
    "        2 60  4" 19  8 4 3 4 2 1 1  . 27 24
    "        2116  3" 19  5 3 2 6 1 1 2 35 32 29
    "        3 62  2" 16  8 3 3 5 2 2 2 32 29 27
    "        3170  3" 15  4 2 1 4 1 1 1 34  . 24
    "        3242  3" 15  6 3 3 1 1 1 2 13 11  8
    "        4123  2" 40  2 2 0 3 1 1 . 50 45  .
    "        5117  5" 15  8 5 1 7 1 1 1 35 32 30
    "        5159  3" 15  7 2 1 2 2 2 2  .  . 20
    "        7 15  3" 15  8 3 1 1 1 2 1  .  .  .
    "        7 57  2" 17  5 4 1 1 1 1 2 11  8  6
    "        7 57  8" 17  0 0 0 1 . . .  .  .  .
    "        8 66  2" 15  3 1 1 1 2 1 1 13  .  6
    "        8 73  4" 19  6 4 2 6 1 2 2 34 31 28
    "        8 87  3" 15  4 3 1 1 2 1 1 12  9  6
    "        8122  3" 25  5 0 2 1 1 1 2  .  .  .
    "        8122  4" 15  5 0 2 3 2 1 1 25  .  .
    "        9192  5" 16  2 0 1 1 1 2 .  .  5  .
    "       10  6  5" 15  9 3 2 4 2 1 1  .  .  .
    "       10 16  4" 17  7 3 2 2 1 1 2 20 14 10
    "       10 76  5" 18 10 5 4 3 2 2 1  . 24 20
    "       10121  3" 16  7 4 3 4 1 1 2 30 25 20
    "       11 13  4" 15  6 4 2 2 1 1 1 17 13 11
    "       11 37  7" 17  5 0 0 1 2 1 2  .  .  .
    "       11 53  3" 17  8 2 3 6 2 2 2  . 37  .
    "       11 77  5" 18  8 5 2 7 2 2 1 35  . 32
    "       11101  3" 17  7 4 2 2 2 1 1 20 15  .
    "       11133  2" 15  6 2 3 4 1 2 1 29 24 22
    "       11141  2" 16  7 3 1 2 1 1 2 21  .  .
    "       11213  4" 15  3 1 1 2 2 2 1 18  .  5
    "       12 21  4" 15 10 5 4 5 1 1 2 27 24 22
    "       12 45  4" 22  9 4 0 5 1 2 2 51  .  .
    "       12 75  4" 18  5 1 3 4 1 2 1 30 27  .
    "       12 75  5" 15  5 1 3 5 1 2 1 30 27  .
    "       12 93  2" 16  7 4 0 1 1 2 1 14  . 10
    "       12111  5" 15  8 1 5 3 2 2 1 28 24  .
    "       12159  3" 16  8 2 3 1 2 2 1  .  . 14
    "       13 15  4" 15  8 1 4 2 1 1 2  . 19 12
    "       13 29  4" 15  6 1 1 2 1 1 2 17  .  .
    "       13 50  2" 20  6 1 4 4 2 2 2 30 27 24
    "       13 85  8" 19  8 2 3 6 2 2 2 43  .  .
    "       13106  4" 18  4 3 0 4 1 2 1 31  . 24
    "       13113  3" 19  6 2 2 1 1 2 1  .  . 16
    "       14 18  5" 17 10 5 2 7 2 2 1  .  . 32
    "       14 63  4" 19  6 2 3 3 1 2 2 26 23 16
    "       14 63  5" 16  6 2 3 4 1 2 2 26 23 19
    "       14 72  4" 15  9 1 4 5 2 1 1  .  .  .
    "       14144  4" 15  6 2 3 3 1 2 1 21 18 12
    "       14180  3" 15  5 1 4 1 1 2 2 14 11  8
    "       15  6  3" 21  4 3 1 1 1 1 2 14 11  7
    "       15 90  3" 15 11 7 1 1 1 2 1  .  .  .
    "       15104  4" 15  8 3 4 7 2 2 2 37 34 31
    "       15132  4" 16  6 2 2 5 1 1 1 33  . 27
    "       15146  3" 19  6 4 0 3 1 1 1 27 23 16
    "       15160  3" 17  7 1 5 5 2 2 2  . 30 26
    "       15188  2" 20  5 2 3 4 2 1 2 29 26 23
    "       15188  3" 17  5 2 3 5 2 1 2 29 26 23
    "       16 21  3" 17  7 2 3 3 2 2 2 22 19  .
    "       16 93  3" 20  9 5 0 2 1 1 1 28  .  .
    "       16123  3" 20 11 5 3 8 1 2 1 43 41 39
    "       16147  2" 15  6 1 5 2 2 2 2 17 13 11
    "       16159  3" 16  7 3 2 2 1 2 2 20  .  .
    "       17 87  3" 17  6 3 3 3 2 1 1 23 20 15
    "       17122  2" 17  8 4 2 1 1 1 1  . 15 13
    "       17157  5" 15 10 5 2 6 1 1 2 34 32 29
    "       18 34  2" 17  4 1 2 3 2 1 1 26 23  .
    "       18 58  3" 17  5 3 1 2 1 1 1 23  . 13
    "       18114  3" 25  8 2 1 3 2 2 2 39  .  .
    "       18122  3" 17  3 2 0 1 1 1 1  .  6  3
    "       18194  3" 17  5 5 0 1 1 1 1 14 11  8
    "       19 43  2" 18  4 3 1 1 2 1 1 16 12  7
    "       19 43  3" 16  4 3 1 2 2 1 1 18 12  7
    "       19 71  3" 23  6 2 3 2 2 2 1 25 19 16
    "       19 99  5" 17  8 5 2 4 1 1 1 26 24 21
    "       19 99  6" 16  8 5 2 5 1 1 1 26 24 21
    "       19113  4" 18  9 3 3 2 1 1 2 20 17  .
    "       19176  3" 20  9 1 3 2 1 1 2  .  . 24
    "       20 15  3" 16 10 5 2 4 1 2 2  . 26  .
    "       20 64  4" 15  8 1 4 6 1 1 1  .  . 33
    "       20 99  2" 25  8 2 1 2 1 2 2 30  .  .
    "       20134  3" 18  7 3 4 4 1 2 2 26 23 20
    "       20141  5" 15  8 3 4 4 2 1 1  . 23 21
    "       21 85  3" 15  6 1 4 2 2 1 2 18  . 11
    "       22 32  3" 16  8 5 2 5 1 1 2 25 25 20
    "       22 46  2" 41  7 2 4 2 1 2 1  . 45 36
    "       22 95  9" 16  1 1 0 1 1 . .  7  .  .
    "       22132  3" 15  5 3 1 5 1 1 1 35 33 27
    "       23 46  3" 15  6 2 1 3 1 2 2 28  .  .
    "       23 81  4" 17  6 2 1 1 2 1 1  .  .  9
    "       23123  4" 17  6 2 0 2 1 2 1 20  .  .
    "       25 13  3" 21  9 5 2 3 1 2 1 32  . 25
    "       25 13  4" 17  9 5 2 4 1 2 1 32  . 25
    "       25 19  3" 16  8 3 2 2 1 1 2  . 21  .
    "       25 79  3" 24  6 3 3 7 1 2 2 43 41 38
    "       25 85  3" 19  6 1 2 4 2 1 1  .  . 31
    "       25127  2" 16 10 4 2 7 2 1 2  . 35 33
    "       25127  3" 19  0 0 0 1 . . .  .  .  .
    "       25145  3" 25  2 2 0 1 1 1 . 22 19  .
    "       25151  4" 16  6 5 0 5 1 1 1  . 33 30
    end
    label values mm1_01 mm1_01
    label def mm1_01 1 "male", modify
    label def mm1_01 2 "female", modify
    label values mm1_02 mm1_02
    label def mm1_02 1 "male", modify
    label def mm1_02 2 "female", modify
    label values mm1_03 mm1_03
    label def mm1_03 1 "male", modify
    label def mm1_03 2 "female", modify
    label values mm3_01 mm3_01
    label values mm3_02 mm3_02
    label values mm3_03 mm3_03
    Thank you in advance, I'm really stuck with this matter.

    Daniel.

  • #2
    Try this:
    Code:
    frame put id age mm*, into(working)
    frame working {
        reshape long mm1_0 mm3_0, i(id) j(order)
        rename mm1_0 sib_sex
        rename mm3_0 sib_age
    
        by id (order), sort: egen older_male_sibs = ///
            total(sib_sex == "male":mm1_03 & sib_age > age & !missing(sib_age))
        by id (order): egen older_female_sibs = ///
            total(sib_sex == "female":mm1_03 & sib_age > age & !missing(sib_age))
        keep id older_*
        duplicates drop
    }
    
    frlink 1:1 id, frame(working)
    frget older_*, from(working)
    Now, there is a problem in the way this question is posed. You have a fair amount of missing data on sibling ages. In the code I have shown, siblings with missing ages are not counted as being older, although they might, in fact, so be. In the absence of a way to get the missing ages, there is no ideal solution to this problem. You either have to undercount, as I have done, or overcount. You might think of these variables not as the number of older male and female siblings, but as the number of male and female siblings definitely known to be older.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      Try this:
      Code:
      frame put id age mm*, into(working)
      frame working {
      reshape long mm1_0 mm3_0, i(id) j(order)
      rename mm1_0 sib_sex
      rename mm3_0 sib_age
      
      by id (order), sort: egen older_male_sibs = ///
      total(sib_sex == "male":mm1_03 & sib_age > age & !missing(sib_age))
      by id (order): egen older_female_sibs = ///
      total(sib_sex == "female":mm1_03 & sib_age > age & !missing(sib_age))
      keep id older_*
      duplicates drop
      }
      
      frlink 1:1 id, frame(working)
      frget older_*, from(working)
      Now, there is a problem in the way this question is posed. You have a fair amount of missing data on sibling ages. In the code I have shown, siblings with missing ages are not counted as being older, although they might, in fact, so be. In the absence of a way to get the missing ages, there is no ideal solution to this problem. You either have to undercount, as I have done, or overcount. You might think of these variables not as the number of older male and female siblings, but as the number of male and female siblings definitely known to be older.
      It works well! and thank you also for the warning.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        Try this:
        Code:
        frame put id age mm*, into(working)
        frame working {
        reshape long mm1_0 mm3_0, i(id) j(order)
        rename mm1_0 sib_sex
        rename mm3_0 sib_age
        
        by id (order), sort: egen older_male_sibs = ///
        total(sib_sex == "male":mm1_03 & sib_age > age & !missing(sib_age))
        by id (order): egen older_female_sibs = ///
        total(sib_sex == "female":mm1_03 & sib_age > age & !missing(sib_age))
        keep id older_*
        duplicates drop
        }
        
        frlink 1:1 id, frame(working)
        frget older_*, from(working)
        Now, there is a problem in the way this question is posed. You have a fair amount of missing data on sibling ages. In the code I have shown, siblings with missing ages are not counted as being older, although they might, in fact, so be. In the absence of a way to get the missing ages, there is no ideal solution to this problem. You either have to undercount, as I have done, or overcount. You might think of these variables not as the number of older male and female siblings, but as the number of male and female siblings definitely known to be older.
        Hello again Mr. Schechter,

        I made a quick check and everything seemed right but now I realize the command has a couple of problems. First, when we add !missing(sib_age) the outcome of older_male_sibs and older_female_sibs is 0 for every id. Second, there are some observations with more than 10 siblings, so mm variables (mm1_, mm2_, mm3_ ...) range from mm3_01 to mm3_20. By the way, when the sibling's age is missing that means that they don't have more siblings or that the sibling has passed away (as you can see also in my code). I may skipped some important info in my first question, my apologies.


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str15 id byte(age siblings) float(siblingrank siblingsalive malesiblingsalive femalesiblingsalive older_male_sibs older_female_sibs) byte(mm1_01 mm2_01 mm3_01)
        " 1 17 2" 38 4 1 2 2 0 0 0 2 0 .
        " 1 17 3" 17 5 2 5 2 3 0 0 1 1 23
        " 1 18 2" 42 2 2 1 0 1 0 0 2 1 54
        " 1 25 2" 46 4 2 4 2 2 0 0 1 1 47
        " 1 25 10" 17 11 3 8 6 2 0 0 1 0 .
        " 1 45 1" 36 0 1 0 0 0 0 0 . . .
        " 1 65 2" 31 1 1 1 1 0 0 0 1 1 27
        " 1 90 3" 23 6 7 6 5 1 0 0 1 1 41
        " 1 125 2" 17 0 1 0 0 0 0 0 . . .
        " 1 133 2" 45 2 1 2 2 0 0 0 1 1 43
        " 1 183 2" 28 7 6 5 1 4 0 0 2 0 .
        " 1 207 2" 25 5 2 5 4 1 0 0 1 1 27
        " 1 214 2" 25 8 1 3 0 3 0 0 1 0 .
        " 1 263 2" 27 3 3 3 2 1 0 0 1 1 33
        " 1 279 2" 43 8 5 4 3 1 0 0 1 1 60
        " 1 279 4" 17 6 4 6 4 2 0 0 1 1 26
        " 1 316 2" 34 4 5 4 3 1 0 0 1 1 47
        " 1 358 2" 25 8 3 8 5 3 0 0 2 1 30
        " 1 400 2" 42 4 2 2 1 1 0 0 2 1 45
        " 1 400 3" 19 5 1 5 3 2 0 0 2 1 16
        " 1 400 4" 16 5 2 5 3 2 0 0 2 1 19
        " 1 410 2" 39 3 1 3 2 1 0 0 1 1 35
        " 1 410 3" 18 7 1 7 2 5 0 0 1 1 15
        " 1 439 2" 25 5 1 2 1 1 0 0 1 1 23
        " 1 449 2" 38 5 3 3 2 1 0 0 1 1 48
        " 2 29 2" 20 8 1 8 2 6 0 0 2 1 19
        " 2 97 1" 36 9 7.5 9 7 2 0 0 2 1 48
        " 2 165 1" 38 12 1 2 1 1 0 0 2 0 .
        " 2 168 1" 22 5 2 1 0 1 0 0 2 0 .
        " 2 171 2" 30 2 2 2 1 1 0 0 1 1 34
        " 2 176 1" 25 3 4 3 1 2 0 0 1 1 31
        " 2 234 1" 36 5 4 5 3 2 0 0 1 1 42
        " 2 234 2" 15 7 1 7 5 2 0 0 1 1 13
        " 2 235 2" 21 5 3 5 3 2 0 0 1 1 23
        " 2 340 4" 15 6 2 5 5 0 0 0 1 1 16
        " 2 399 2" 29 8 1 8 4 4 0 0 1 1 27
        " 2 437 2" 20 7 1 7 3 4 0 0 2 1 18
        " 2 463 2" 16 7 2 7 3 4 0 0 1 1 18
        " 3 14 2" 19 5 5 4 2 2 0 0 2 1 43
        " 3 14 3" 20 5 1 4 2 2 0 0 1 1 16
        " 3 17 2" 26 8 5 5 2 3 0 0 1 1 50
        " 3 22 2" 30 4 5 4 2 2 0 0 2 1 40
        " 3 31 3" 35 10 4 10 4 6 0 0 1 1 42
        " 3 31 5" 16 10 10 10 4 6 0 0 1 1 42
        " 3 67 2" 27 5 1 5 2 3 0 0 1 1 25
        " 3 100 2" 27 6 3 4 2 2 0 0 2 0 .
        " 3 115 2" 28 3 1 2 2 0 0 0 1 1 25
        " 3 127 2" 40 8 3 8 2 6 0 0 2 1 46
        " 3 131 4" 18 9 7 6 3 3 0 0 2 1 44
        " 3 141 2" 33 6 1 6 3 3 0 0 1 1 30
        " 3 168 2" 32 6 4 5 5 0 0 0 1 1 40
        " 3 188 3" 20 11 3 2 0 2 0 0 1 0 .
        " 3 241 5" 24 8 4 8 5 3 0 0 2 1 34
        " 3 243 1" 27 7 5 6 3 3 0 0 1 1 48
        " 3 250 2" 19 4 2 4 3 1 0 0 2 1 22
        
        
        end
        label values mm1_01 MM1_01
        label def MM1_01 1 "male", modify
        label def MM1_01 2 "female", modify
        label values mm2_01 MM2_01
        label def MM2_01 0 "dead", modify
        label def MM2_01 1 "alive", modify
        label values mm3_01 MM3_01
        Last edited by Daniel Perez Parra; 29 Nov 2022, 07:00.

        Comment


        • #5
          I don't think what you are asking for can be done. With missing information on ages, you just can't be sure who is older and who is not. Now, I notice that in your example data, the ordering *_01, *_02, ... seems to be partially informative about age. That is, whenever the ages are not missing, they are in descending order. So, if the focal household member is age 10, and if sibling 1 is age 8 and sibling 2's age is missing, I can infer that sibling2 is at most 8 years old. Using that logic (assuming that this ordering of siblings holds in the entire data set, and isn't just a fluke in your example), we can resolve many of the missing ages into older or younger. The following code checks my assumption about the age ordering, and, if verified, will calculate older male and female siblings:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear*
          input str15 id byte(age siblings) float(malesiblingsalive femalesiblingsalive siblingrank) byte(mm1_01 mm1_02 mm1_03 mm3_01 mm3_02 mm3_03)
          "        1 13  2" 16  4 1 1 1 1 1 2 13  .  7
          "        1 41  3" 18  6 2 1 1 1 2 1  . 14 11
          "        2 60  4" 19  8 4 3 4 2 1 1  . 27 24
          "        2116  3" 19  5 3 2 6 1 1 2 35 32 29
          "        3 62  2" 16  8 3 3 5 2 2 2 32 29 27
          "        3170  3" 15  4 2 1 4 1 1 1 34  . 24
          "        3242  3" 15  6 3 3 1 1 1 2 13 11  8
          "        4123  2" 40  2 2 0 3 1 1 . 50 45  .
          "        5117  5" 15  8 5 1 7 1 1 1 35 32 30
          "        5159  3" 15  7 2 1 2 2 2 2  .  . 20
          "        7 15  3" 15  8 3 1 1 1 2 1  .  .  .
          "        7 57  2" 17  5 4 1 1 1 1 2 11  8  6
          "        7 57  8" 17  0 0 0 1 . . .  .  .  .
          "        8 66  2" 15  3 1 1 1 2 1 1 13  .  6
          "        8 73  4" 19  6 4 2 6 1 2 2 34 31 28
          "        8 87  3" 15  4 3 1 1 2 1 1 12  9  6
          "        8122  3" 25  5 0 2 1 1 1 2  .  .  .
          "        8122  4" 15  5 0 2 3 2 1 1 25  .  .
          "        9192  5" 16  2 0 1 1 1 2 .  .  5  .
          "       10  6  5" 15  9 3 2 4 2 1 1  .  .  .
          "       10 16  4" 17  7 3 2 2 1 1 2 20 14 10
          "       10 76  5" 18 10 5 4 3 2 2 1  . 24 20
          "       10121  3" 16  7 4 3 4 1 1 2 30 25 20
          "       11 13  4" 15  6 4 2 2 1 1 1 17 13 11
          "       11 37  7" 17  5 0 0 1 2 1 2  .  .  .
          "       11 53  3" 17  8 2 3 6 2 2 2  . 37  .
          "       11 77  5" 18  8 5 2 7 2 2 1 35  . 32
          "       11101  3" 17  7 4 2 2 2 1 1 20 15  .
          "       11133  2" 15  6 2 3 4 1 2 1 29 24 22
          "       11141  2" 16  7 3 1 2 1 1 2 21  .  .
          "       11213  4" 15  3 1 1 2 2 2 1 18  .  5
          "       12 21  4" 15 10 5 4 5 1 1 2 27 24 22
          "       12 45  4" 22  9 4 0 5 1 2 2 51  .  .
          "       12 75  4" 18  5 1 3 4 1 2 1 30 27  .
          "       12 75  5" 15  5 1 3 5 1 2 1 30 27  .
          "       12 93  2" 16  7 4 0 1 1 2 1 14  . 10
          "       12111  5" 15  8 1 5 3 2 2 1 28 24  .
          "       12159  3" 16  8 2 3 1 2 2 1  .  . 14
          "       13 15  4" 15  8 1 4 2 1 1 2  . 19 12
          "       13 29  4" 15  6 1 1 2 1 1 2 17  .  .
          "       13 50  2" 20  6 1 4 4 2 2 2 30 27 24
          "       13 85  8" 19  8 2 3 6 2 2 2 43  .  .
          "       13106  4" 18  4 3 0 4 1 2 1 31  . 24
          "       13113  3" 19  6 2 2 1 1 2 1  .  . 16
          "       14 18  5" 17 10 5 2 7 2 2 1  .  . 32
          "       14 63  4" 19  6 2 3 3 1 2 2 26 23 16
          "       14 63  5" 16  6 2 3 4 1 2 2 26 23 19
          "       14 72  4" 15  9 1 4 5 2 1 1  .  .  .
          "       14144  4" 15  6 2 3 3 1 2 1 21 18 12
          "       14180  3" 15  5 1 4 1 1 2 2 14 11  8
          "       15  6  3" 21  4 3 1 1 1 1 2 14 11  7
          "       15 90  3" 15 11 7 1 1 1 2 1  .  .  .
          "       15104  4" 15  8 3 4 7 2 2 2 37 34 31
          "       15132  4" 16  6 2 2 5 1 1 1 33  . 27
          "       15146  3" 19  6 4 0 3 1 1 1 27 23 16
          "       15160  3" 17  7 1 5 5 2 2 2  . 30 26
          "       15188  2" 20  5 2 3 4 2 1 2 29 26 23
          "       15188  3" 17  5 2 3 5 2 1 2 29 26 23
          "       16 21  3" 17  7 2 3 3 2 2 2 22 19  .
          "       16 93  3" 20  9 5 0 2 1 1 1 28  .  .
          "       16123  3" 20 11 5 3 8 1 2 1 43 41 39
          "       16147  2" 15  6 1 5 2 2 2 2 17 13 11
          "       16159  3" 16  7 3 2 2 1 2 2 20  .  .
          "       17 87  3" 17  6 3 3 3 2 1 1 23 20 15
          "       17122  2" 17  8 4 2 1 1 1 1  . 15 13
          "       17157  5" 15 10 5 2 6 1 1 2 34 32 29
          "       18 34  2" 17  4 1 2 3 2 1 1 26 23  .
          "       18 58  3" 17  5 3 1 2 1 1 1 23  . 13
          "       18114  3" 25  8 2 1 3 2 2 2 39  .  .
          "       18122  3" 17  3 2 0 1 1 1 1  .  6  3
          "       18194  3" 17  5 5 0 1 1 1 1 14 11  8
          "       19 43  2" 18  4 3 1 1 2 1 1 16 12  7
          "       19 43  3" 16  4 3 1 2 2 1 1 18 12  7
          "       19 71  3" 23  6 2 3 2 2 2 1 25 19 16
          "       19 99  5" 17  8 5 2 4 1 1 1 26 24 21
          "       19 99  6" 16  8 5 2 5 1 1 1 26 24 21
          "       19113  4" 18  9 3 3 2 1 1 2 20 17  .
          "       19176  3" 20  9 1 3 2 1 1 2  .  . 24
          "       20 15  3" 16 10 5 2 4 1 2 2  . 26  .
          "       20 64  4" 15  8 1 4 6 1 1 1  .  . 33
          "       20 99  2" 25  8 2 1 2 1 2 2 30  .  .
          "       20134  3" 18  7 3 4 4 1 2 2 26 23 20
          "       20141  5" 15  8 3 4 4 2 1 1  . 23 21
          "       21 85  3" 15  6 1 4 2 2 1 2 18  . 11
          "       22 32  3" 16  8 5 2 5 1 1 2 25 25 20
          "       22 46  2" 41  7 2 4 2 1 2 1  . 45 36
          "       22 95  9" 16  1 1 0 1 1 . .  7  .  .
          "       22132  3" 15  5 3 1 5 1 1 1 35 33 27
          "       23 46  3" 15  6 2 1 3 1 2 2 28  .  .
          "       23 81  4" 17  6 2 1 1 2 1 1  .  .  9
          "       23123  4" 17  6 2 0 2 1 2 1 20  .  .
          "       25 13  3" 21  9 5 2 3 1 2 1 32  . 25
          "       25 13  4" 17  9 5 2 4 1 2 1 32  . 25
          "       25 19  3" 16  8 3 2 2 1 1 2  . 21  .
          "       25 79  3" 24  6 3 3 7 1 2 2 43 41 38
          "       25 85  3" 19  6 1 2 4 2 1 1  .  . 31
          "       25127  2" 16 10 4 2 7 2 1 2  . 35 33
          "       25127  3" 19  0 0 0 1 . . .  .  .  .
          "       25145  3" 25  2 2 0 1 1 1 . 22 19  .
          "       25151  4" 16  6 5 0 5 1 1 1  . 33 30
          end
          label values mm1_01 mm1_01
          label def mm1_01 1 "male", modify
          label def mm1_01 2 "female", modify
          label values mm1_02 mm1_02
          label def mm1_02 1 "male", modify
          label def mm1_02 2 "female", modify
          label values mm1_03 mm1_03
          label def mm1_03 1 "male", modify
          label def mm1_03 2 "female", modify
          label values mm3_01 mm3_01
          label values mm3_02 mm3_02
          label values mm3_03 mm3_03
          
          frame put id age siblings mm1_* mm3_*, into(working)
          frame change working
          rename mm1_0* mm1_*
          rename mm3_0* mm3_*
          reshape long mm1_ mm3_, i(id)
          preserve
          drop if missing(mm3_)
          by id (_j): assert mm3_ <= mm3_[_n-1]
          restore
          by id (_j): egen j_cutoff = max(cond(inrange(mm3_, age, .), _j, .))
          replace j_cutoff = 0 if missing(j_cutoff)
          by id (_j): egen older_male_siblings = total(mm1_ == 1 & _j <= j_cutoff)
          by id (_j): egen older_female_siblings = total(mm1_ == 2 & _j <= j_cutoff)
          collapse (first) older*, by(id)
          
          frame change default
          frlink 1:1 id, frame(working)
          frget older*, from(working)
          Now, there is still a problem with some missing ages. Suppose the focal household member is 18 years old, sibling 1's age is missing, and sibling 2 is 14 years old. Well, sibling 1 could be any age 14 or greater. So Sibling 1 might be older than 18 years, or might be younger. There is no way to know. In this situation, the code treats sibling 1 as not being older. This is, in fact, precisely the situation in the example data for id "1 41 3". And there are other situations like this in the example data where even relying on the age ordering of the variables, we cannot tell if a particular sibling is older or not. In all these cases, this code treats the sibling as not being older. Consequently, our numbers will, in some circumstances, be undercounts. Thus the counts of older siblings should be thought of as counts of those siblings who can be verified as older in the data, but not necessarily all older siblings. I cannot think of any way to resolve this problem with the available information in the data.

          One other thing: the original example data did not contain a variable for sibling vital status. So the code above does not distinguish living from dead siblings. If you want to count only still living siblings, you will need to add -& mm2_ == 1- to the argument of the -total()- function in the two -egen- commands just before the -collapse- command.
          Last edited by Clyde Schechter; 29 Nov 2022, 08:08.

          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            I don't think what you are asking for can be done. With missing information on ages, you just can't be sure who is older and who is not. Now, I notice that in your example data, the ordering *_01, *_02, ... seems to be partially informative about age. That is, whenever the ages are not missing, they are in descending order. So, if the focal household member is age 10, and if sibling 1 is age 8 and sibling 2's age is missing, I can infer that sibling2 is at most 8 years old. Using that logic (assuming that this ordering of siblings holds in the entire data set, and isn't just a fluke in your example), we can resolve many of the missing ages into older or younger. The following code checks my assumption about the age ordering, and, if verified, will calculate older male and female siblings:


            Now, there is still a problem with some missing ages. Suppose the focal household member is 18 years old, sibling 1's age is missing, and sibling 2 is 14 years old. Well, sibling 1 could be any age 14 or greater. So Sibling 1 might be older than 18 years, or might be younger. There is no way to know. In this situation, the code treats sibling 1 as not being older. This is, in fact, precisely the situation in the example data for id "1 41 3". And there are other situations like this in the example data where even relying on the age ordering of the variables, we cannot tell if a particular sibling is older or not. In all these cases, this code treats the sibling as not being older. Consequently, our numbers will, in some circumstances, be undercounts. Thus the counts of older siblings should be thought of as counts of those siblings who can be verified as older in the data, but not necessarily all older siblings. I cannot think of any way to resolve this problem with the available information in the data.

            One other thing: the original example data did not contain a variable for sibling vital status. So the code above does not distinguish living from dead siblings. If you want to count only still living siblings, you will need to add -& mm2_ == 1- to the argument of the -total()- function in the two -egen- commands just before the -collapse- command.

            First and foremost, thank you for all the insightful comments. After some coding, I have created a variable (mm21_01-mm21_20) that shows the current age of the siblings alive and the current age the dead ones would have had if they were alive (+- 1-year margin of error). Having this, I applied your first code (#2) but it generates the following error: variable order contains all missing values

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str15 id byte(age siblings) float(siblingrank siblingsalive malesiblingsalive femalesiblingsalive) byte(mm1_01 mm2_01 mm3_01) float mm21_01
            "       1  17  2" 38  4   1  2 2 0 2 0  . 34
            "       1  17  3" 17  5   2  5 2 3 1 1 23 24
            "       1  18  2" 42  2   2  1 0 1 2 1 54 55
            "       1  25  2" 46  4   2  4 2 2 1 1 47 48
            "       1  25 10" 17 11   3  8 6 2 1 0  . 26
            "       1  45  1" 36  0   1  0 0 0 . .  .  .
            "       1  65  2" 31  1   1  1 1 0 1 1 27 28
            "       1  90  3" 23  6   7  6 5 1 1 1 41 42
            "       1 125  2" 17  0   1  0 0 0 . .  .  .
            "       1 133  2" 45  2   1  2 2 0 1 1 43 44
            "       1 183  2" 28  7   6  5 1 4 2 0  . 40
            "       1 207  2" 25  5   2  5 4 1 1 1 27 28
            "       1 214  2" 25  8   1  3 0 3 1 0  . 27
            "       1 263  2" 27  3   3  3 2 1 1 1 33 34
            "       1 279  2" 43  8   5  4 3 1 1 1 60 61
            "       1 279  4" 17  6   4  6 4 2 1 1 26 27
            "       1 316  2" 34  4   5  4 3 1 1 1 47 48
            "       1 358  2" 25  8   3  8 5 3 2 1 30 31
            "       1 400  2" 42  4   2  2 1 1 2 1 45 46
            "       1 400  3" 19  5   1  5 3 2 2 1 16 17
            "       1 400  4" 16  5   2  5 3 2 2 1 19 20
            "       1 410  2" 39  3   1  3 2 1 1 1 35 36
            "       1 410  3" 18  7   1  7 2 5 1 1 15 16
            "       1 439  2" 25  5   1  2 1 1 1 1 23 24
            "       1 449  2" 38  5   3  3 2 1 1 1 48 49
            "       2  29  2" 20  8   1  8 2 6 2 1 19 19
            "       2  97  1" 36  9 7.5  9 7 2 2 1 48 49
            "       2 165  1" 38 12   1  2 1 1 2 0  . 41
            "       2 168  1" 22  5   2  1 0 1 2 0  . 32
            "       2 171  2" 30  2   2  2 1 1 1 1 34 35
            "       2 176  1" 25  3   4  3 1 2 1 1 31 32
            "       2 234  1" 36  5   4  5 3 2 1 1 42 43
            "       2 234  2" 15  7   1  7 5 2 1 1 13 14
            "       2 235  2" 21  5   3  5 3 2 1 1 23 24
            "       2 340  4" 15  6   2  5 5 0 1 1 16 17
            "       2 399  2" 29  8   1  8 4 4 1 1 27 28
            "       2 437  2" 20  7   1  7 3 4 2 1 18 19
            "       2 463  2" 16  7   2  7 3 4 1 1 18 19
            "       3  14  2" 19  5   5  4 2 2 2 1 43 44
            "       3  14  3" 20  5   1  4 2 2 1 1 16 17
            "       3  17  2" 26  8   5  5 2 3 1 1 50 51
            "       3  22  2" 30  4   5  4 2 2 2 1 40 41
            "       3  31  3" 35 10   4 10 4 6 1 1 42 43
            "       3  31  5" 16 10  10 10 4 6 1 1 42 43
            "       3  67  2" 27  5   1  5 2 3 1 1 25 26
            "       3 100  2" 27  6   3  4 2 2 2 0  . 41
            "       3 115  2" 28  3   1  2 2 0 1 1 25 26
            "       3 127  2" 40  8   3  8 2 6 2 1 46 47
            "       3 131  4" 18  9   7  6 3 3 2 1 44 45
            "       3 141  2" 33  6   1  6 3 3 1 1 30 31
            "       3 168  2" 32  6   4  5 5 0 1 1 40 41
            
            end
            label values mm1_01 MM1_01
            label def MM1_01 1 "male", modify
            label def MM1_01 2 "female", modify
            label values mm2_01 MM2_01
            label def MM2_01 0 "dead", modify
            label def MM2_01 1 "alive", modify
            label values mm3_01 MM3_01

            Code:
            frame put id age mm*, into(working)
            frame working {
            reshape long mm1 mm21, i(id) j(order)
            rename mm1 sib_sex
            rename mm21 sib_age
            
            by id (order), sort: egen older_male_sibs = ///
            total(sib_sex == "male":mm1_03 & sib_age > age)
            by id (order): egen older_female_sibs = ///
            total(sib_sex == "female":mm1_03 & sib_age > age)
            keep id older_*
            duplicates drop
            }
            Last edited by Daniel Perez Parra; 29 Nov 2022, 11:59.

            Comment


            • #7
              Did you notice in #5 the two commands:
              Code:
              rename mm1_0* mm1_*
              rename mm3_0* mm3_*
              That was not done for cosmetic reasons. The -reshape- command does not recognize 01 as numeric 1, and so -reshape- chokes on it. That's why those commands were there, and they are needed here. In the code offered in #1, that was not necessary because I wrote -reshape long mm1_0 mm_0 ...- which is a different way of getting around the leading zero. But that, as you noted in #4, is not a good solution because there are some id's that have more than 10 siblings, so not all of these mm1* and mm3* variables have a 0 following the _ character.

              (You can maintain the original names, if you wish, by instead adding the -string- option to the -reshape- command. Then order will be created as a string variable. And given the leading zeroes, it will even sort correctly. But I dislike doing that kind of thing: things that are conceptually numeric I prefer to have as numeric variables. If for no other reason than if one of the mm1_* or mm3_* variables is mistakenly named, say, mm1_2, without the leading zero, then order will not sort correctly and all sorts of things will go wrong from that point on.)
              Last edited by Clyde Schechter; 29 Nov 2022, 12:28.

              Comment


              • #8
                I have applied the suggested changes and it perfectly works! Thank you so much Mr. Schechter.

                This was also helpful to learn the existence of some cool commands like frame .

                Code:
                frame put id age siblings mm1_* mm21_*, into(working)
                frame change working
                rename mm1_0* mm1_*
                rename mm21_0* mm21_*
                reshape long mm1_ mm21_, i(id)
                preserve
                drop if missing(mm21_)
                by id (_j): assert mm21_ <= mm21_[_n-1]
                restore
                by id (_j): egen j_cutoff = max(cond(inrange(mm21_, age, .), _j, .))
                replace j_cutoff = 0 if missing(j_cutoff)
                by id (_j): egen older_male_siblings = total(mm1_ == 1 & _j <= j_cutoff)
                by id (_j): egen older_female_siblings = total(mm1_ == 2 & _j <= j_cutoff)
                collapse (first) older*, by(id)
                
                frame change default
                frlink 1:1 id, frame(working)
                frget older*, from(working)

                Comment

                Working...
                X