I have a dataset with approximately 300K observations and 50 variables with multiple rows of data per ID (seq8). Date of birth is in a string variable (pdatbir). A minority of records (0.85%) are missing either day or month of birth or both and pdatbir accordingly reads something like 19749901 or 19749999.
Sometimes, date of birth is missing a component for only one of multiple records within a distinct ID, and sometimes it is missing across all records for the distinct ID.
In the syntax below, I am trying to accomplish four things
I am having the most trouble with (4). I keep generating different dates by ID. I have tried putting by (seq8) sort: in front of the gen commands, and the replace commands as you will see, but have had no success. Any suggestions would be much appreciated.
Sometimes, date of birth is missing a component for only one of multiple records within a distinct ID, and sometimes it is missing across all records for the distinct ID.
In the syntax below, I am trying to accomplish four things
- Convert the string variable to a %td format
- Replace missing birth months with an imputed birth month according to whether or not they were born before or after the census (birthcensus)
- Replace missing birth days with an imputed birth day that is logical (i.e. cannot be 30 in February)
- Assign the same imputed birthdate to multiple records with the same ID
I am having the most trouble with (4). I keep generating different dates by ID. I have tried putting by (seq8) sort: in front of the gen commands, and the replace commands as you will see, but have had no success. Any suggestions would be much appreciated.
Code:
gen _birthyr=substr(pdatbir,1,4) destring _birthyr, replace gen _birthmonth=substr(pdatbir,5,2) gen _birthmonthmiss=. replace _birthmonthmiss=1 if _birthmonth=="99" gen _birthmonth_impute1=1+int((5)*runiform()) gen _birthmonth_impute2=6+int((7)*runiform()) destring _birthmonth, replace replace _birthmonth=_birthmonth_impute1 if _birthmonthmiss==1 & birthcensus==0 replace _birthmonth=_birthmonth_impute2 if _birthmonthmiss==1 & birthcensus==1 recast double _birthmonth gen _birthday=substr(pdatbir,7,2) gen _birthdaymiss=. replace _birthdaymiss=1 if _birthday=="99" gen _birthday_impute=1+int((30)*runiform()) replace _birthday_impute=1+int((27)*runiform()) if _birthmonth==2 destring _birthday, replace replace _birthday=_birthday_impute if _birthdaymiss==1 recast double _birthday bysort seq8: gen _birthdate = mdy(_birthmonth, _birthday, _birthyr) format _birthdate %td by seq8 (tdccrseqnum), sort: replace _birthdate=_birthdate[_n-1] if _birthdate!=.
Comment