  • Issue trying to fill in missing data

    Good day,

    I have a gender variable for panel data. This variable is sometimes missing in one or the other wave. Since gender doesn't change over time, I wanted to plug in the missing information to complete my data set. I had a similar issue with the age variable, and Clyde was able to help me in the following thread:

    https://www.statalist.org/forums/for...c-observations

    So I adapted Clyde's code for the gender variable. It worked fine on my mini data set (I don't have access to the full data): both gender and age were filled in. The code is as follows:

    Code:
    by id, sort: egen gender1 = min(cond(survey == 1, gender, .))
    by id: egen gender2 = min(cond(survey == 2, gender, .))
    assert gender2 - gender1 == 0 if !missing(gender1, gender2)
    replace gender2 = gender1 + 0 if missing(gender2)
    replace gender1 = gender2 - 0 if missing(gender1)
    replace gender = cond(survey == 1, gender1, gender2)
    drop gender1 gender2
    In my mini data set, as in the thread linked above, it works just fine. However, my professor sent me back the following log file:

    Code:
     *gender*
    . by id, sort: egen gender1 = min(cond(survey == 1, gender, .))
    (5628 missing values generated)
    
    . by id: egen gender2 = min(cond(survey == 2, gender, .))
    (6384 missing values generated)
    
    . assert gender2 - gender1 == 0 if !missing(gender1, gender2)
    588 contradictions in 18,456 observations
    assertion is false
    r(9);
    
    end of do-file
    
     su gender2 gender1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         gender2 |     19,284     1.61481    .4866527          1          2
         gender1 |     20,040    1.608982    .4879906          1          2
    Why is this assertion false? The gender in one wave should be the same as in the other (male = 1, female = 2), so subtracting the value from itself should equal 0, right? Is the code itself correct? The only explanation I can think of is that certain individuals are recorded as 1 in one wave and 2 in the other, or vice versa.
    Last edited by Oscar Weinzettl; 02 May 2019, 08:58.

  • #2
    Since gender doesn't change over time
    You should know that isn't always true, at least for gender as it may be recorded in a dataset.

    Code:
    browse id survey gender* if gender1 != gender2
    would be one next step.
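
    A rough sketch of what that check could look like in practice, assuming the variable names id, survey and gender from post #1. The toy data set is made up, and gmin, gmax, conflict and firstobs are hypothetical helper variables. The idea is to count the individuals whose recorded gender differs across waves, look at them, and fill in missing values only where the non-missing reports agree.

```stata
* Toy data, invented for illustration: id 3 reports a different gender in each wave
clear
input id survey gender
1 1 1
1 2 1
2 1 2
2 2 .
3 1 1
3 2 2
4 1 .
4 2 2
end

* Smallest and largest non-missing gender code reported by each id
by id, sort: egen gmin = min(gender)
by id: egen gmax = max(gender)

* An id is in conflict when its non-missing reports disagree
gen byte conflict = gmin != gmax

* Count conflicting individuals (one tagged observation per id), then inspect them
egen byte firstobs = tag(id)
count if conflict & firstobs
list id survey gender if conflict, sepby(id)

* Fill in missing gender only where the id's reports are consistent
replace gender = gmin if missing(gender) & !conflict & !missing(gmin)
```

    Note that assert counts observations rather than individuals, so the 588 contradictions in the log need not correspond to 588 people. How to resolve the conflicting reports is a substantive decision that this sketch deliberately leaves open.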
