Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Missing Values for Age Variable in Panel Data

    Hi All,
    I am using a panel data consisting of 5 waves. The corresponding year and waves are as follows:

    year wave

    2008 1
    2010 2
    2012 3
    2014 4
    2017 5


    I have missing values in the age variable across waves as shown in the sample below: Is there a way or code that can assist me to fill in the missing values for the age variable. With a single observation on the age variable, I can determine what the values should be for the other waves, but I can't do these manually with over 60000 observations. Thanks in advance.

    Code:
    * Example generated by -dataex-.    To    install:    ssc    install    dataex
    clear
    input long pid float(wave age)
    401011 1  8
    401011 2 10
    401011 3 12
    401011 4 14
    401012 1 14
    401013 1  9
    401013 2 11
    401013 3 13
    401014 1  3
    401014 2  5
    401014 3  7
    401014 4  9
    401014 5 12
    401015 1 13
    401015 2 15
    401016 1  5
    401016 2  7
    401016 3  9
    401016 4 11
    401016 5 14
    401017 1  7
    401017 2  9
    401017 3 11
    401017 4 13
    401018 1  3
    401018 2  .
    401018 3  7
    401018 4  9
    401018 5 12
    401019 1 10
    401019 2 12
    401019 3 14
    401020 1  2
    401020 2  4
    401020 3  6
    401020 4  8
    401020 5 11
    401021 1  7
    401021 2  9
    401021 3 11
    401021 4 13
    401022 1  7
    401022 2  9
    401022 3 11
    401022 4 13
    401023 1  3
    401023 2  5
    401023 3  7
    401023 4  9
    401023 5 12
    401024 1  1
    401024 2  3
    401024 3  5
    401024 4  7
    401024 5 10
    401025 1 14
    401026 1  8
    401026 2  .
    401026 3 12
    401026 4 14
    401027 1 11
    401027 2  .
    401027 3 15
    401028 1  5
    401028 2  7
    401028 3  9
    401028 4 11
    401028 5 14
    401030 1  1
    401030 2  3
    401030 3  5
    401030 4  7
    401030 5 10
    401031 1 10
    401031 2 12
    401031 3 14
    401032 1  2
    401032 2  4
    401032 3  6
    401032 4  8
    401032 5 11
    401033 1 11
    401033 2 13
    401033 3 15
    401034 1 10
    401034 2 12
    401034 3  .
    401035 1  1
    401035 2  3
    401035 3  .
    401035 4  .
    401035 5 10
    401036 1  5
    401036 2  7
    401036 3  9
    401036 4 11
    401036 5 14
    401037 1  3
    401037 2  5
    401037 3  7
    401037 4  9
    401037 5 12
    401038 1  3
    401038 2  5
    401038 3  7
    401038 4  9
    401038 5 12
    401039 1  9
    401039 2 11
    401039 3 13
    401039 4 15
    401040 1  5
    401040 2  7
    401040 3  9
    401040 4 11
    401040 5 14
    401041 1  9
    401041 2 11
    401041 3 13
    401042 1  8
    401042 2 10
    401042 3 12
    401042 4 14
    401043 1  1
    401043 2  3
    401043 3  5
    401043 4  7
    401043 5 10
    401044 1  6
    401044 2  8
    401044 3 10
    401044 4 12
    401044 5 15
    401045 1  1
    401045 2  3
    401045 3  5
    401045 4  7
    401045 5 10
    401046 1  7
    401046 2  9
    401046 3 11
    401046 4 13
    401047 1  7
    401047 2  9
    401047 3 11
    401047 4 13
    401048 1  2
    401048 2  4
    401048 3  6
    401048 4  8
    401048 5 11
    401050 1  4
    401050 2  6
    401050 3  8
    401050 4 10
    401050 5 13
    401051 1 13
    401052 1  5
    401052 2  7
    401052 3  9
    401052 4 11
    401052 5 14
    401053 1 11
    401053 2 13
    401054 1 15
    401056 1  6
    401056 2  8
    401056 3 10
    401056 4 12
    401056 5 15
    401058 1 15
    401059 1 13
    401060 1  9
    401060 2 11
    401060 3 13
    401061 1 13
    401064 1  3
    401064 2  5
    401064 3  7
    401064 4  9
    401064 5 12
    401065 1  4
    401065 2  .
    401065 3  8
    401065 4 10
    401065 5 13
    401066 1  5
    401066 2  .
    401066 3  9
    401066 4 11
    401066 5 14
    401067 1 13
    401068 1  2
    401068 2  4
    401068 3  6
    401068 4  8
    401068 5 11
    401069 1  9
    401069 2 11
    401069 3 13
    end





  • #2
    There's a better way (and someone will post one), but this might work:
    Code:
    g agefix = age
    bys pid: replace agefix = (l.age+2) if age==. & wave<=5
    bys pid: replace agefix = (l.age+3) if age==. & wave==5
    bys pid: replace agefix = (f.age-2) if age==. & wave<=5
    bys pid: replace agefix = (f.age-3) if age==. & wave==5
    If you always have age in wave 1, then
    Code:
    g adder = wave
    recode adder (1=0) (2=2) (3=4) (4=6) (5=9)
    bys pid: g agefix = age[1]+adder

    Comment


    • #3
      This is a problem for linear interpolation, remembering the principle familiar to small children that a birthday comes once a year. Naturally the interpolation is in terms of year, not wave.

      Code:
      * Example generated by -dataex-.    To    install:    ssc    install    dataex
      clear
      input long pid float(wave age)
      401011 1  8
      401011 2 10
      401011 3 12
      401011 4 14
      401012 1 14
      401013 1  9
      401013 2 11
      401013 3 13
      401014 1  3
      401014 2  5
      401014 3  7
      401014 4  9
      401014 5 12
      401015 1 13
      401015 2 15
      401016 1  5
      401016 2  7
      401016 3  9
      401016 4 11
      401016 5 14
      401017 1  7
      401017 2  9
      401017 3 11
      401017 4 13
      401018 1  3
      401018 2  .
      401018 3  7
      401018 4  9
      401018 5 12
      401019 1 10
      401019 2 12
      401019 3 14
      401020 1  2
      401020 2  4
      401020 3  6
      401020 4  8
      401020 5 11
      401021 1  7
      401021 2  9
      401021 3 11
      401021 4 13
      401022 1  7
      401022 2  9
      401022 3 11
      401022 4 13
      401023 1  3
      401023 2  5
      401023 3  7
      401023 4  9
      401023 5 12
      401024 1  1
      401024 2  3
      401024 3  5
      401024 4  7
      401024 5 10
      401025 1 14
      401026 1  8
      401026 2  .
      401026 3 12
      401026 4 14
      401027 1 11
      401027 2  .
      401027 3 15
      401028 1  5
      401028 2  7
      401028 3  9
      401028 4 11
      401028 5 14
      401030 1  1
      401030 2  3
      401030 3  5
      401030 4  7
      401030 5 10
      401031 1 10
      401031 2 12
      401031 3 14
      401032 1  2
      401032 2  4
      401032 3  6
      401032 4  8
      401032 5 11
      401033 1 11
      401033 2 13
      401033 3 15
      401034 1 10
      401034 2 12
      401034 3  .
      401035 1  1
      401035 2  3
      401035 3  .
      401035 4  .
      401035 5 10
      401036 1  5
      401036 2  7
      401036 3  9
      401036 4 11
      401036 5 14
      401037 1  3
      401037 2  5
      401037 3  7
      401037 4  9
      401037 5 12
      401038 1  3
      401038 2  5
      401038 3  7
      401038 4  9
      401038 5 12
      401039 1  9
      401039 2 11
      401039 3 13
      401039 4 15
      401040 1  5
      401040 2  7
      401040 3  9
      401040 4 11
      401040 5 14
      401041 1  9
      401041 2 11
      401041 3 13
      401042 1  8
      401042 2 10
      401042 3 12
      401042 4 14
      401043 1  1
      401043 2  3
      401043 3  5
      401043 4  7
      401043 5 10
      401044 1  6
      401044 2  8
      401044 3 10
      401044 4 12
      401044 5 15
      401045 1  1
      401045 2  3
      401045 3  5
      401045 4  7
      401045 5 10
      401046 1  7
      401046 2  9
      401046 3 11
      401046 4 13
      401047 1  7
      401047 2  9
      401047 3 11
      401047 4 13
      401048 1  2
      401048 2  4
      401048 3  6
      401048 4  8
      401048 5 11
      401050 1  4
      401050 2  6
      401050 3  8
      401050 4 10
      401050 5 13
      401051 1 13
      401052 1  5
      401052 2  7
      401052 3  9
      401052 4 11
      401052 5 14
      401053 1 11
      401053 2 13
      401054 1 15
      401056 1  6
      401056 2  8
      401056 3 10
      401056 4 12
      401056 5 15
      401058 1 15
      401059 1 13
      401060 1  9
      401060 2 11
      401060 3 13
      401061 1 13
      401064 1  3
      401064 2  5
      401064 3  7
      401064 4  9
      401064 5 12
      401065 1  4
      401065 2  .
      401065 3  8
      401065 4 10
      401065 5 13
      401066 1  5
      401066 2  .
      401066 3  9
      401066 4 11
      401066 5 14
      401067 1 13
      401068 1  2
      401068 2  4
      401068 3  6
      401068 4  8
      401068 5 11
      401069 1  9
      401069 2 11
      401069 3 13
      end
      
      gen year = real(word("2008 2010 2012 2014 2017", wave)) 
      
      bysort pid (year) : ipolate age year, gen(age2) epolate 
      
      egen problem = total(missing(age)), by(pid)
      
      list pid wave age age2 if problem, sepby(pid)
      
          +----------------------------+
           |    pid   wave   age   age2 |
           |----------------------------|
       25. | 401018      1     3      3 |
       26. | 401018      2     .      5 |
       27. | 401018      3     7      7 |
       28. | 401018      4     9      9 |
       29. | 401018      5    12     12 |
           |----------------------------|
       57. | 401026      1     8      8 |
       58. | 401026      2     .     10 |
       59. | 401026      3    12     12 |
       60. | 401026      4    14     14 |
           |----------------------------|
       61. | 401027      1    11     11 |
       62. | 401027      2     .     13 |
       63. | 401027      3    15     15 |
           |----------------------------|
       85. | 401034      1    10     10 |
       86. | 401034      2    12     12 |
       87. | 401034      3     .     14 |
           |----------------------------|
       88. | 401035      1     1      1 |
       89. | 401035      2     3      3 |
       90. | 401035      3     .      5 |
       91. | 401035      4     .      7 |
       92. | 401035      5    10     10 |
           |----------------------------|
      182. | 401065      1     4      4 |
      183. | 401065      2     .      6 |
      184. | 401065      3     8      8 |
      185. | 401065      4    10     10 |
      186. | 401065      5    13     13 |
           |----------------------------|
      187. | 401066      1     5      5 |
      188. | 401066      2     .      7 |
      189. | 401066      3     9      9 |
      190. | 401066      4    11     11 |
      191. | 401066      5    14     14 |
           +----------------------------+

      Comment


      • #4
        Code:
        * Example generated by -dataex-.    To    install:    ssc    install    dataex
        clear
        input long pid float(wave age)
        401011 1  8
        401011 2 10
        401011 3 12
        401011 4 14
        401012 1 14
        401013 1  9
        401013 2 11
        401013 3 13
        401014 1  3
        401014 2  5
        401014 3  7
        401014 4  9
        401014 5 12
        401015 1 13
        401015 2 15
        401016 1  5
        401016 2  7
        401016 3  9
        401016 4 11
        401016 5 14
        401017 1  7
        401017 2  9
        401017 3 11
        401017 4 13
        401018 1  3
        401018 2  .
        401018 3  7
        401018 4  9
        401018 5 12
        401019 1 10
        401019 2 12
        401019 3 14
        401020 1  2
        401020 2  4
        401020 3  6
        401020 4  8
        401020 5 11
        401021 1  7
        401021 2  9
        401021 3 11
        401021 4 13
        401022 1  7
        401022 2  9
        401022 3 11
        401022 4 13
        401023 1  3
        401023 2  5
        401023 3  7
        401023 4  9
        401023 5 12
        401024 1  1
        401024 2  3
        401024 3  5
        401024 4  7
        401024 5 10
        401025 1 14
        401026 1  8
        401026 2  .
        401026 3 12
        401026 4 14
        401027 1 11
        401027 2  .
        401027 3 15
        401028 1  5
        401028 2  7
        401028 3  9
        401028 4 11
        401028 5 14
        401030 1  1
        401030 2  3
        401030 3  5
        401030 4  7
        401030 5 10
        401031 1 10
        401031 2 12
        401031 3 14
        401032 1  2
        401032 2  4
        401032 3  6
        401032 4  8
        401032 5 11
        401033 1 11
        401033 2 13
        401033 3 15
        401034 1 10
        401034 2 12
        401034 3  .
        401035 1  1
        401035 2  3
        401035 3  .
        401035 4  .
        401035 5 10
        401036 1  5
        401036 2  7
        401036 3  9
        401036 4 11
        401036 5 14
        401037 1  3
        401037 2  5
        401037 3  7
        401037 4  9
        401037 5 12
        401038 1  3
        401038 2  5
        401038 3  7
        401038 4  9
        401038 5 12
        401039 1  9
        401039 2 11
        401039 3 13
        401039 4 15
        401040 1  5
        401040 2  7
        401040 3  9
        401040 4 11
        401040 5 14
        401041 1  9
        401041 2 11
        401041 3 13
        401042 1  8
        401042 2 10
        401042 3 12
        401042 4 14
        401043 1  1
        401043 2  3
        401043 3  5
        401043 4  7
        401043 5 10
        401044 1  6
        401044 2  8
        401044 3 10
        401044 4 12
        401044 5 15
        401045 1  1
        401045 2  3
        401045 3  5
        401045 4  7
        401045 5 10
        401046 1  7
        401046 2  9
        401046 3 11
        401046 4 13
        401047 1  7
        401047 2  9
        401047 3 11
        401047 4 13
        401048 1  2
        401048 2  4
        401048 3  6
        401048 4  8
        401048 5 11
        401050 1  4
        401050 2  6
        401050 3  8
        401050 4 10
        401050 5 13
        401051 1 13
        401052 1  5
        401052 2  7
        401052 3  9
        401052 4 11
        401052 5 14
        401053 1 11
        401053 2 13
        401054 1 15
        401056 1  6
        401056 2  8
        401056 3 10
        401056 4 12
        401056 5 15
        401058 1 15
        401059 1 13
        401060 1  .
        401060 2 11
        401060 3 13
        401061 1 13
        401064 1  3
        401064 2  5
        401064 3  7
        401064 4  9
        401064 5 12
        401065 1  4
        401065 2  .
        401065 3  8
        401065 4 10
        401065 5 13
        401066 1  5
        401066 2  .
        401066 3  9
        401066 4 11
        401066 5 14
        401067 1 13
        401068 1  2
        401068 2  4
        401068 3  6
        401068 4  8
        401068 5 11
        401069 1  9
        401069 2 11
        401069 3 13
        end
        tempfile panel_data
        save `panel_data'
        
        clear
        input int(year wave)
        2008 1
        2010 2
        2012 3
        2014 4
        2017 5
        end
        tempfile waves
        save `waves'
        
        use `panel_data'
        merge m:1 wave using `waves', assert(match using) nogenerate
        by pid (wave), sort: replace age = age[_n-1] + year - year[_n-1] if missing(age)
        
        capture assert !missing(age)
        if c(rc) != 0 {
            gsort pid -wave
            by pid: replace age = age[_n-1] + year - year[_n-1] if missing(age)
        }
        
        isid pid wave, sort
        Note: In the example data you provided, age is never missing in wave 1. If that is true in your entire data set, then everything from -capture assert !missing(age)- on is unnecessary (although harmless, as it will do precisely nothing). However, I put that code there in case there is some observation for wave 1 where age is missing, because the code that precedes that part will not work for wave 1. The only circumstance under which this code will fail to fill in a value of age is if it is missing in all observations for a given id. That doesn't happen in your example, and hopefully it does not happen in your full data either.

        I can't do these manually with over 60000 observations
        And you shouldn't do it manually even if only 1 observation were affected! Unless you are just playing around for fun, it is an unacceptable data practice to manually alter data sets. Your data management and analysis should have a complete, unbroken audit trail from beginning to end. You create that audit trail by doing all your analyses in do-files, which include -log- commands to show a complete record of all of Stata's output. This is important not only so that those who use your results have an assurance of data integrity, but also for you. If you have to return to this project after a period of several months, you would likely not remember all the details of what you did. Your do- and log-files are the complete and fully accurate record that you can rely on to refresh your understanding. Never make data changes manually.

        Added: Crossed with #3 and #2 which show two other ways to do this. I like #3 best.
        Last edited by Clyde Schechter; 20 Oct 2022, 15:20.

        Comment


        • #5
          I told you better solutions were forthcoming.
          I just post stuff here so guys like Nick and Clyde and teach me something cool.

          Comment


          • #6
            My thanks to all for the help! And thanks Clyde Schechter for the counsel.

            Comment

            Working...
            X