
  • Fill in missing information for only specific observations

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte survey int id byte(implicate gift_received) float gift_total byte age
    1 234 5 1 15000 .
    2 234 1 1 7500 76
    2 234 3 1 7500 76
    2 234 2 1 7500 76
    1 234 2 1 15000 .
    2 234 5 1 7500 76
    2 234 4 1 7500 76
    1 234 1 1 15000 .
    2 234 0 1 7500 76
    1 234 4 1 15000 .
    1 234 3 1 15000 .
    1 234 0 1 15000 .
    2 456 2 2 0 .
    1 456 0 1 50000 54
    2 456 4 2 0 .
    1 456 1 1 50000 54
    2 456 0 2 0 .
    2 456 3 2 0 .
    2 456 5 2 0 .
    1 456 4 1 50000 54
    1 456 3 1 50000 54
    1 456 2 1 50000 54
    1 456 5 1 50000 54
    2 456 1 2 0 .
    end
    Hello, if you look at the age variable you will notice that some values are missing. This sample data consists of 2 people, 234 and 456. Both appear in 2 waves, marked by the survey variable. For some reason, the age variable isn't complete, and I would like to fill it in. 3 years lie between wave 1 and wave 2, so I would like to account for that.

    I would need to add 3 if the missing data is in survey 2 and subtract 3 if the missing data is in survey 1. How do I do this?

    Also, it is possible (even if not shown here) that a person gave no information on age in either wave, so age would be missing in both waves. Or the age variable may already be filled in for both waves. In the first case I wouldn't be able to fill in the information, and in the second there is nothing to fill in. How can I fill in the information while handling observations that either a) have the full information or b) have none of it? And would this method be applicable in the same way to other variables, such as a binary variable like gender?
    Last edited by Oscar Weinzettl; 28 Apr 2019, 14:50.

  • #2
    There may be a simpler way, but this will do it:

    Code:
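    * Spread each person's reported age in each wave to all of their observations
    by id, sort: egen age1 = min(cond(survey == 1, age, .))
    by id: egen age2 = min(cond(survey == 2, age, .))
    * Where both waves report an age, they must differ by exactly 3 years
    assert age2 - age1 == 3 if !missing(age1, age2)
    * Fill the missing wave from the wave that was reported
    replace age2 = age1 + 3 if missing(age2)
    replace age1 = age2 - 3 if missing(age1)
    * Write the wave-appropriate age back to the original variable
    replace age = cond(survey == 1, age1, age2)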
    Evidently, if no age is reported in either wave, age will remain missing. Also, if age is reported in both waves, the code checks that they differ by 3 years as they should. (Where there are huge data gaps, there are also often data errors.)
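
    For a variable that should stay constant across waves, such as the binary gender variable asked about in the question, the same idea could be sketched roughly as follows. This is only a sketch under assumptions: it supposes a numeric variable named gender, which is hypothetical and not part of the example data, and it supposes that a person's gender should be identical in both waves.

    ```stata
    * Assumption: gender is a numeric variable (hypothetical, not in the example data)
    * Sorting by gender within id puts non-missing values before missing ones
    * Check that all reported values agree within each person
    by id (gender), sort: assert gender == gender[1] if !missing(gender)
    * Copy the reported value into the missing observations;
    * if every value is missing, gender[1] is missing and nothing changes
    by id (gender): replace gender = gender[1] if missing(gender)
    ```

    As with age, a person with no reported value in either wave stays missing, and a person with complete information is left unchanged. Note that this re-sorts the data by id and gender.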



    • #3
      Thank you Clyde very much! This is very helpful for filling in various missing information across multiple variables!
