Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Making a new or existing variable constant within ID groups

    I have a dataset with approximately 300K observations and 50 variables with multiple rows of data per ID (seq8). Date of birth is in a string variable (pdatbir). A minority of records (0.85%) are missing either day or month of birth or both and pdatbir accordingly reads something like 19749901 or 19749999.

    Sometimes, date of birth is missing a component for only one of multiple records within a distinct ID, and sometimes it is missing across all records for the distinct ID.

    In the syntax below, I am trying to accomplish four things
    1. Convert the string variable to a %td format
    2. Replace missing birth months with an imputed birth month according to whether or not they were born before or after the census (birthcensus)
    3. Replace missing birth days with an imputed birth day that is logical (i.e. cannot be 30 in February)
    4. Assign the same imputed birthdate to multiple records with the same ID

    I am having the most trouble with (4). I keep generating different dates by ID. I have tried putting by (seq8) sort: in front of the gen commands, and the replace commands as you will see, but have had no success. Any suggestions would be much appreciated.
    Code:
     gen _birthyr=substr(pdatbir,1,4)
    destring _birthyr, replace
     
     gen _birthmonth=substr(pdatbir,5,2)
     gen _birthmonthmiss=.
     replace _birthmonthmiss=1 if _birthmonth=="99"
     gen _birthmonth_impute1=1+int((5)*runiform())
     gen _birthmonth_impute2=6+int((7)*runiform())
     destring _birthmonth, replace
    replace _birthmonth=_birthmonth_impute1 if _birthmonthmiss==1 & birthcensus==0
    replace _birthmonth=_birthmonth_impute2 if _birthmonthmiss==1 & birthcensus==1
    recast double _birthmonth
    
     gen _birthday=substr(pdatbir,7,2)
     gen _birthdaymiss=.
     replace _birthdaymiss=1 if _birthday=="99"
    gen _birthday_impute=1+int((30)*runiform())
    replace _birthday_impute=1+int((27)*runiform()) if _birthmonth==2
     destring _birthday, replace
     replace _birthday=_birthday_impute if _birthdaymiss==1
    recast double _birthday
    
     bysort seq8: gen _birthdate = mdy(_birthmonth, _birthday, _birthyr)
    format _birthdate %td
    
    by seq8 (tdccrseqnum), sort: replace _birthdate=_birthdate[_n-1] if _birthdate!=.

  • #2
    On (4) your logic seems confused: You want to replace missing values with non-missing values, not to replace non-missing values.

    Code:
      
     by seq8 (tdccrseqnum _birthdate), sort: replace _birthdate=_birthdate[_n-1] if _birthdate == .
    See also

    SJ-12-1 dm0062 . . . . . . . . . Stata tip 105: Daily dates with missing days
    . . . . . . . . . . . . . . . . . . . . . S. J. Samuels and N. J. Cox
    Q1/12 SJ 12(1):159--161 (no commands)
    presents strategies for dealing with daily dates where
    some observations only provide the month and year

    which is slightly smart about stuff like varying month length and leap years.

    Comment

    Working...
    X