  • Filling in missing data from previous values?

    Below is an example from a wide dataset in which children's ages are reported at each wave (e.g., the wave 5 age is k5agebg). In some instances age is set to ".m", but we know that interviews are 2 years apart. How can I set each .m on my age variables to the next wave's value minus 2? In the example, k11agebg should be 35 on the first observation and 26 on the second; k10agebg should be 33 on the first observation and 24 on the second. I've done similar things with panel data, but I am a bit stumped on solving this in the wide format.

    Code:
    . list hhidpn kidid k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg kabyearbg if hhidpn == 920672010 , sepby(hhidpn)
    
            +------------------------------------------------------------------------------------------------------------------------------------------------------------+
            |    hhidpn        kidid   k1age   k2agebg   k3agebg   k4agebg   k5agebg   k6agebg   k7agebg   k8agebg   k9agebg   k10agebg   k11agebg   k12agebg   kabyea~g |
            |------------------------------------------------------------------------------------------------------------------------------------------------------------|
    128812. | 920672010   9206720101       .         .         .         .         .         .         .         .         .         .m         .m         37       1977 |
    128813. | 920672010   9206720102       .         .         .         .         .         .         .         .         .         .m         .m         28       1986 |
            +------------------------------------------------------------------------------------------------------------------------------------------------------------+
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long hhidpn str10 kidid byte(k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg) int kabyearbg
    920672010 "9206720101" . . . . . . . . . .m .m 37 1977
    920672010 "9206720102" . . . . . . . . . .m .m 28 1986
    end

  • #2
    Like most data management and analysis in Stata, this is much more easily done with the data in long layout. And with successive waves of a survey in particular, you will almost certainly be better off with the long layout. So now's as good a time to do that as any:

    Code:
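    * One observation per child per wave; the age variable becomes kagebg
    reshape long k@agebg, i(hhidpn kidid) j(wave)
    * Latest wave first within each child, so [_n-1] is the next wave
    gsort hhidpn kidid -wave
    * Fill each .m with the next wave's age minus 2 (interviews are 2 years apart)
    by hhidpn kidid: replace kagebg = kagebg[_n-1] - 2 if kagebg == .m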
    By the way, the above code spreads the ages down only to those observations where the original value is .m; other missing values are left alone. Is that what you want? If you want to fill every missing age, all the way back to the very first wave, you can change -if kagebg == .m- to -if missing(kagebg)-.

    One more refinement to consider: in contrast to the k#agebg variables, you have another variable, k1age. Because it is named differently, the code does not treat it as the age at wave 1, but carries it along as a separate variable into every observation in long layout. If k1age is, in fact, just the age at wave 1, and not some other attribute of the child, it would make more sense to treat it as just another variable in the k#agebg series. So, in that situation, I would precede the code shown above with -rename k1age k1agebg-.
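
    Put together, that variant might look like the sketch below. It assumes k1age really is the wave 1 age, which the example data cannot confirm. The -!missing(kagebg[_n-1])- guard and the final -sort- are additions for illustration: without the guard, an .m whose next wave is also missing would be overwritten with plain system missing.

    ```stata
    * Example data copied from the -dataex- output in #1
    clear
    input long hhidpn str10 kidid byte(k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg) int kabyearbg
    920672010 "9206720101" . . . . . . . . . .m .m 37 1977
    920672010 "9206720102" . . . . . . . . . .m .m 28 1986
    end

    * ASSUMPTION: k1age is the wave 1 member of the k#agebg series
    rename k1age k1agebg

    * One observation per child per wave; the age variable becomes kagebg
    reshape long k@agebg, i(hhidpn kidid) j(wave)

    * Latest wave first, so [_n-1] is the next wave within each child
    gsort hhidpn kidid -wave

    * Fill only .m, and only when the next wave's age is known
    by hhidpn kidid: replace kagebg = kagebg[_n-1] - 2 if kagebg == .m & !missing(kagebg[_n-1])

    * Back to chronological order
    sort hhidpn kidid wave
    list hhidpn kidid wave kagebg if wave >= 9, sepby(kidid)
    ```

    Changing -kagebg == .m- to -missing(kagebg)- in the -replace- line would fill every missing wave back to wave 1 instead.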


    • #3
      Thanks for this. It is definitely easier to achieve in the long format. My data does eventually get converted to long; however, when converting, I am only interested in waves 5-12. My concern was that some people may have .m on age at wave 5, and they wouldn't be able to draw on the previous wave if I reshaped first and then tried to apply a fix. In the wide format, I am able to pull data from the waves before 5 and then reshape to long with waves 5-12. That may be convoluted, so I guess I can reshape keeping all waves and then drop the waves not of interest.
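
      For comparison, the wide-format route described here could be sketched roughly as follows. It is only an illustration using the example data from #1: it loops from wave 11 down to wave 2 so that each .m is filled from the wave after it, then drops the early waves before reshaping. Leaving k1age out of the loop and the -!missing()- guard are assumptions, not something settled in the thread.

      ```stata
      * Example data copied from the -dataex- output in #1
      clear
      input long hhidpn str10 kidid byte(k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg) int kabyearbg
      920672010 "9206720101" . . . . . . . . . .m .m 37 1977
      920672010 "9206720102" . . . . . . . . . .m .m 28 1986
      end

      * Work backwards so a freshly filled wave can feed the wave before it
      forvalues w = 11(-1)2 {
          local next = `w' + 1
          replace k`w'agebg = k`next'agebg - 2 if k`w'agebg == .m & !missing(k`next'agebg)
      }

      * Keep only waves 5-12, then go long
      drop k1age k2agebg k3agebg k4agebg
      reshape long k@agebg, i(hhidpn kidid) j(wave)
      list hhidpn kidid wave kagebg, sepby(kidid)
      ```

      The long-format equivalent would be to reshape with all waves, apply the fill, and then run -keep if inrange(wave, 5, 12)-. Since the rule fills from the next wave rather than the previous one, a wave 5 value is filled from wave 6 under either route.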
