Need help with converting a panel dataset into time-series to calculate the probability of birth

Aryaman Chawla

Join Date: Apr 2023

Posts: 3
#1

05 Apr 2023, 04:34

Hi, I have a dataset with an ID variable for each mother; the ID is repeated once for each birth she has had.
So, for example, if she has given birth 3 times, it would look like this:

ID Kidbirthyr
1. 1972
1. 1978
1. 1980

Kidbirthyr ranges from 1970 to 2016.
There are also other columns about each child: sex, mortality, etc.
Now I want to expand this by creating an observation for every year from 1970 to 2016 for each mother, so it would look like this:

ID Years Value
1. 1970. 0
1. 1971. 0
1. 1972. 1
1. 1973. 0
1. 1974. 0
1. 1975. 0
1. 1976. 0
1. 1977. 0
1. 1978. 1
1. 1979. 0
1. .
.
.
1. 2016. 0

I want this to repeat for each mother ID, with a Value variable that is 1 if the mother gave birth that year and 0 if she did not. The other variables, such as sex, should keep their values in the observations where Value is 1 and be missing where Value is 0.

I have tried the expand command, but it creates more observations per ID than I want: for mothers with 3 births I get 3 times as many observations as I want, and for mothers with 2 births, 2 times as many.

How can I get this dataset? Can someone please help? Thank you so much!
Nick Cox

Join Date: Mar 2014

Posts: 35651
#2

05 Apr 2023, 05:26

You have to be clear on exactly what you need and indeed on what you tried. I am at a loss to understand exactly how expand creates observations absent from the data, but you don't show your code.

If someone gave birth in 1972 it is clear that they were also alive in 1970, but the converse won't always (or even usually) be true: someone who gave birth in 2016 need not have been alive in 1970. Then -- you know this, but it is relevant, as Stata doesn't know it -- even when alive a woman could be either too young or too old to have a child. Do you want 0s for those years? Naturally the limits are hard to determine without data on years of menarche or menopause.

One approach is to use tsfill after tsset. As the example shows, this won't add observations before the first or after the last observation in each panel. This approach won't work if there are multiple observations for any identifier and year, but you can use duplicates to find and remove any such.

Code:

clear
input ID Kidbirthyr
1 1972
1 1978
1 1980
2 2010
2 2012
2 2016
end
tsset ID Kidbirthyr
gen birth = 1
tsfill
replace birth = 0 if birth == .

list, sepby(ID)

     +-----------------------+
     | ID   Kidbir~r   birth |
     |-----------------------|
  1. |  1       1972       1 |
  2. |  1       1973       0 |
  3. |  1       1974       0 |
  4. |  1       1975       0 |
  5. |  1       1976       0 |
  6. |  1       1977       0 |
  7. |  1       1978       1 |
  8. |  1       1979       0 |
  9. |  1       1980       1 |
     |-----------------------|
 10. |  2       2010       1 |
 11. |  2       2011       0 |
 12. |  2       2012       1 |
 13. |  2       2013       0 |
 14. |  2       2014       0 |
 15. |  2       2015       0 |
 16. |  2       2016       1 |
     +-----------------------+
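On the caveat about multiple observations for the same identifier and year (twins, or two births in one calendar year, say): tsset will refuse such data with a "repeated time values within panel" error. The following is a minimal sketch, not part of the original example, with made-up data in which mother 1 has two records for 1972. It assumes you are content to keep just one record per mother and year; note that dropping this way discards the other child's sex and similar variables, so check what you are losing first.

```stata
* Hypothetical data: mother 1 has two births recorded in 1972
clear
input ID Kidbirthyr
1 1972
1 1972
1 1978
2 2010
end

* See how many surplus observations there are per ID-year pair
duplicates report ID Kidbirthyr

* Keep one observation per ID-year; force is required with a varlist
duplicates drop ID Kidbirthyr, force

* Now tsset no longer complains about repeated time values
tsset ID Kidbirthyr
```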

I note that fillin ID Kidbirthyr would create many dubious or incorrect extra observations.
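If, despite those reservations, the same span of years really is wanted for every mother, tsfill has a full option that fills each panel out to the earliest and latest time values found anywhere in the data. A minimal sketch with made-up data follows. It yields 1970 to 2016 for every ID only because some birth in the data falls in 1970 and some in 2016, and it puts 0s in years when a mother may have been too young, too old, or not yet born, so treat it as an illustration rather than a recommendation.

```stata
* Hypothetical data spanning 1970 to 2016 across the two mothers
clear
input ID Kidbirthyr
1 1970
1 1978
2 2010
2 2016
end

tsset ID Kidbirthyr
gen birth = 1

* full: fill every panel over the overall range of Kidbirthyr
tsfill, full

* Filled-in observations have missing birth; set those to 0
replace birth = 0 if missing(birth)

* Each mother should now have 47 observations (1970 to 2016)
bysort ID: gen nobs = _N
tabulate nobs
```

Any other variables in the dataset, such as the child's sex, are left missing in the filled-in observations, which is what was asked for in #1.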