
  • Need help with converting a panel dataset into time-series to calculate the probability of birth

    Hi, I have a dataset with an ID variable for each mother, repeated once for each birth she has had.
    So, for example, if she has given birth 3 times, it would look like this:

    ID Kidbirthyr
    1. 1972
    1. 1978
    1. 1980

    Kidbirthyr ranges from 1970 to 2016.
    There are also other columns about each child: sex, mortality rates, etc.
    Now I want to expand this by creating a row for every year from 1970 to 2016 for each mother, so it would look like this:

    ID Years Value
    1. 1970. 0
    1. 1971. 0
    1. 1972. 1
    1. 1973. 0
    1. 1974. 0
    1. 1975. 0
    1. 1976. 0
    1. 1977. 0
    1. 1978. 1
    1. 1979. 0
    1. .
    .
    .
    1. 2016. 0

    I want this to repeat for each mother ID, with a Value variable that is 1 if the mother gave birth that year and 0 if she did not. The other variables, such as sex, should keep their values in rows where Value is 1 and be missing in rows where Value is 0.

    I have tried using the expand command, but it creates more copies than I want: for mothers with 3 births I get 3 times as many observations as I want, and for mothers with 2 births I get 2 times as many.

    How can I create this dataset? Can someone please help? Thank you so much!

  • #2
    You have to be clear on exactly what you need and indeed on what you tried. I am at a loss to understand exactly how expand creates observations absent from the data, but you don't show your code.

    If someone gave birth in 1972 it is clear that they were also alive in 1970, but that won't always (or even usually) be true for a birth in 2016. Then -- you know this, but it is relevant, as Stata doesn't know it -- even when alive a woman could be either too young or too old to have a child. Do you want 0s for those years? Naturally the limits are hard to determine without data on years of menarche or menopause.

    One approach is to use tsfill after tsset. As the example shows, this won't add observations before the first or after the last observed year in each panel. This approach won't work if there are multiple observations for any identifier and year, but you can use duplicates to find and remove any such.

    Code:
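    * Example data: two mothers, three births each
    clear
    input ID Kidbirthyr
    1 1972
    1 1978
    1 1980
    2 2010
    2 2012
    2 2016
    end

    * Declare a panel: ID is the panel variable, Kidbirthyr the time variable
    tsset ID Kidbirthyr
    * Every existing observation is a birth
    gen birth = 1

    * Add observations for the gaps between each mother's first and last birth year
    tsfill

    * Years added by tsfill have no birth
    replace birth = 0 if missing(birth)

    list, sepby(ID)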
    
         +-----------------------+
         | ID   Kidbir~r   birth |
         |-----------------------|
      1. |  1       1972       1 |
      2. |  1       1973       0 |
      3. |  1       1974       0 |
      4. |  1       1975       0 |
      5. |  1       1976       0 |
      6. |  1       1977       0 |
      7. |  1       1978       1 |
      8. |  1       1979       0 |
      9. |  1       1980       1 |
         |-----------------------|
     10. |  2       2010       1 |
     11. |  2       2011       0 |
     12. |  2       2012       1 |
     13. |  2       2013       0 |
     14. |  2       2014       0 |
     15. |  2       2015       0 |
     16. |  2       2016       1 |
         +-----------------------+
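On the point about multiple observations for the same identifier and year (twins, say, or two births in one calendar year), here is a minimal sketch of one way to handle them before tsset. The data and the variable name nbirths are made up for illustration. Note that the rows that are dropped take their child-level variables (sex and so on) with them, so this only suits a mother-year analysis.

```stata
clear
input ID Kidbirthyr
1 1972
1 1972
1 1978
2 2010
end

* Count births per mother and year before reducing to one row each
* (nbirths is a hypothetical name)
bysort ID Kidbirthyr: gen nbirths = _N

* Report, then drop, surplus rows for the same mother and year
duplicates report ID Kidbirthyr
duplicates drop ID Kidbirthyr, force

* Now tsset will accept the data
tsset ID Kidbirthyr
tsfill
replace nbirths = 0 if missing(nbirths)
gen birth = nbirths > 0

list, sepby(ID)
```

Keeping the count in nbirths means the information that two births occurred in one year is not lost when the duplicate row is dropped.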

    I note that fillin ID Kidbirthyr would create many dubious or incorrect extra observations, as it adds every combination of ID and any birth year observed anywhere in the data.
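If what is really wanted is every year from 1970 to 2016 for every mother, as in #1, one possible route is to reduce the data to one row per mother first, expand that, and merge the births back in. This is only a sketch under assumptions: there is at most one observation per ID and year (otherwise merge 1:1 will fail), and the tempfile name births is arbitrary. It takes no account of the caveats above: years in which a mother was too young, too old, or not alive still get 0.

```stata
clear
input ID Kidbirthyr
1 1972
1 1978
1 1980
2 2010
2 2012
2 2016
end

* Mark the existing rows as births and set them aside
gen birth = 1
tempfile births
save `births'

* One row per mother, then 47 copies: one for each year 1970-2016
keep ID
duplicates drop
expand 47
bysort ID: gen Kidbirthyr = 1969 + _n

* Bring the birth records back; years without a match get birth = 0
merge 1:1 ID Kidbirthyr using `births', nogenerate
replace birth = 0 if missing(birth)

sort ID Kidbirthyr
list in 1/5
```

Using expand on the original data instead would multiply each mother's rows by her number of births, which may be what happened in #1. Any child-level variables in the saved file come back through the merge and are missing in the years with birth equal to 0.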


