  • Code help to create a dummy

    I am very sorry to post this again, but I was hoping someone could give me a suggestion. I am working with longitudinal household data and am stuck creating a dummy variable that takes the value 1 if a woman has her first child in the following year (the variable I want to create is "First baby next year"). Below is a sample of the data, where HID is the household ID, PID is the personal ID, and MotherID is the PID of the person's mother. Any help would be greatly appreciated.
    year  HID  PID   Sex  Age  MotherID  First baby next year
    2019  115  1151  F    30   .         1
    2020  115  1151  F    31   .         0
    2021  115  1151  F    32   .         0
    2020  115  1152  M    1    1151      .
    2021  115  1152  M    2    1151      .
    2021  115  1153  F    1    1151      .

  • #2
    I used rangestat from SSC.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(year hid pid) str1 sex byte age int motherid byte firstbabynextyear
    2019 115 1151 "F" 30    . 1
    2020 115 1151 "F" 31    . 0
    2021 115 1151 "F" 32    . 0
    2020 115 1152 "M"  1 1151 .
    2021 115 1152 "M"  2 1151 .
    2021 115 1153 "F"  1 1151 .
    end
    
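    * Recode missing motherid to -1, a value no pid takes, so every observation has a nonmissing key
    replace motherid = -1 if motherid == .
    
    * For each person, the earliest year in which any observation lists them as mother
    rangestat (min) year, int(motherid pid pid)
    
    * 1 in the year before the first child appears, 0 in other years; missing for non-mothers
    gen wanted = year == year_min - 1 if year_min < .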
    
    list, sepby(pid)
    
         +-------------------------------------------------------------------------+
         | year   hid    pid   sex   age   motherid   firstb~r   year_min   wanted |
         |-------------------------------------------------------------------------|
      1. | 2019   115   1151     F    30         -1          1       2020        1 |
      2. | 2020   115   1151     F    31         -1          0       2020        0 |
      3. | 2021   115   1151     F    32         -1          0       2020        0 |
         |-------------------------------------------------------------------------|
      4. | 2020   115   1152     M     1       1151          .          .        . |
      5. | 2021   115   1152     M     2       1151          .          .        . |
         |-------------------------------------------------------------------------|
      6. | 2021   115   1153     F     1       1151          .          .        . |
         +-------------------------------------------------------------------------+
    .


    • #3
      I can come close to what you want, and perhaps you can figure out how to finish the job:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear*
      input int(year hid pid) str2 sex byte age int motherid
      2019 115 1151 "F " 30    .
      2020 115 1151 "F " 31    .
      2021 115 1151 "F " 32    .
      2020 115 1152 "M "  1 1151
      2021 115 1152 "M "  2 1151
      2021 115 1153 "F "  1 1151
      end
      
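      * Copy children observed at age 1 into a separate frame
      frame put _all if !missing(motherid) & age == 1, into(new_babies)
      
      frame new_babies {
          by hid motherid (year), sort: keep if _n == 1 // KEEP ONLY FIRST BABY
          gen link_year = year - 1 // year in which the mother should be flagged
      }
      
      * Link each person-year to a first baby who appears the following year
      frlink 1:1 hid pid year, frame(new_babies hid motherid link_year)
      gen byte first_baby_next_year = !missing(new_babies)
      drop new_babies
      frame drop new_babies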
      The difference between this result and what you want is that this result has zeroes where you have missing values for the first_baby_next_year variable. What remains is to replace 0 by missing in the right circumstances, but I don't know what the right circumstances are. Clearly, -replace first_baby_next_year = . if sex == "M"- would be part of it. But I do not know how to identify which women should be reassigned from 0 to missing value.

      My first thought was that we could do that if the person herself has a non-missing motherid. But there can be three-generation households where the grandmother is still of childbearing age, one of her daughters is also of childbearing age, and both have a baby the next year.

      I then thought perhaps to impose it by age. Clearly no 1 or 2 year old will be having a baby in the near term. But it really isn't clear where to draw the line exactly. Although it is very rare, giving birth has occurred even at age 5. And then there are societal and cultural factors that would influence the choice of a more typical cutoff: giving birth at age 13 is pretty rare in economically advanced countries, but less so in some poor countries. So, on the assumption that you know where this data comes from, and you know something about that place, I will leave that judgment call to you. (Or, maybe it is more suitable to just leave it zero anyway.)
      Last edited by Clyde Schechter; 15 Feb 2023, 14:05.


      • #4
        Originally posted by Nick Cox:
        This works great except for one issue: it finds the first child next year, but the child may not be a baby under 1 year old.


        • #5
          Originally posted by Clyde Schechter:

          This worked great, thank you. I need women between 18 and 50, so it will be easy to make the adjustment.
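
          A minimal sketch of that adjustment, for readers following along. It assumes first_baby_next_year has already been built as in #3 (the values are typed in here so the block runs on its own) and that "between 18 and 50" means ages 18 through 50 inclusive; both the cutoffs and the inclusive reading are assumptions to adapt.

          ```stata
          * Sketch only: first_baby_next_year is entered by hand here, with the
          * values the frames code in #3 produces on the example data.
          clear
          input int(year hid pid) str2 sex byte age int motherid byte first_baby_next_year
          2019 115 1151 "F " 30    . 1
          2020 115 1151 "F " 31    . 0
          2021 115 1151 "F " 32    . 0
          2020 115 1152 "M "  1 1151 0
          2021 115 1152 "M "  2 1151 0
          2021 115 1153 "F "  1 1151 0
          end

          * Set the dummy to missing for men and for women outside ages 18-50.
          * strtrim() guards against trailing blanks such as "F " in this example.
          * The 18 and 50 cutoffs are taken from the post above, assumed inclusive.
          replace first_baby_next_year = . if strtrim(sex) != "F" | !inrange(age, 18, 50)

          list, sepby(pid)
          ```

          On the example data this leaves the three rows for PID 1151 as they were and sets the dummy to missing for the two children.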


          • #6
            This works great except for one issue: it finds the first child next year, but the child may not be a baby under 1 year old.
            I don't understand this. If a woman has her first child in a given year, how can the child be more than 1 year old in that year? What am I missing here?


            • #7
              I echo Clyde Schechter here.


              • #8
                I am sorry, I expressed myself badly. In the first code, the dummy "wanted" does not do exactly what I need: for example, it also captures mothers who are older (60+) and first children who are older than 1. The second code, using frames, accomplishes what I need, which is a dummy variable next to the mother indicating the presence of a first child (less than 1 year old) next year.
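
                For anyone who would rather stay with rangestat, the first code could in principle be narrowed the same way. The following is only a sketch and not something posted by the other contributors: it assumes rangestat is installed from SSC, uses age == 1 to identify a new baby as the frames code does, and takes 18 to 50 as the mother's age range. The variable baby_mother is a hypothetical helper name.

                ```stata
                * Sketch only; requires rangestat from SSC (ssc install rangestat).
                clear
                input int(year hid pid) str1 sex byte age int motherid
                2019 115 1151 "F" 30    .
                2020 115 1151 "F" 31    .
                2021 115 1151 "F" 32    .
                2020 115 1152 "M"  1 1151
                2021 115 1152 "M"  2 1151
                2021 115 1153 "F"  1 1151
                end

                * Hypothetical helper: carry the mother's PID only on rows where the
                * child is observed at age 1; use -1 (not a PID here) everywhere else
                gen long baby_mother = cond(age == 1 & !missing(motherid), motherid, -1)

                * For each person, the earliest year an age-1 child lists them as mother
                rangestat (min) year, interval(baby_mother pid pid)

                * 1 in the year before that, 0 otherwise; missing unless a woman aged 18-50
                gen byte wanted = year == year_min - 1 if sex == "F" & inrange(age, 18, 50)

                list, sepby(pid)
                ```

                As with the frames code, a child who first enters the data at an age above 1 is ignored, so whether the flagged baby is truly the first child depends on how the panel records births.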
