
  • How to impute missing values for gender and race in panel data

    Dear all,
    I am working with panel data that consist of three waves: 3, 4, and 5. I want to impute missing values for the gender and race variables. I have confirmed that gender and race are constant across time. Race is categorical, coded 1 "African", 2 "Coloured", 3 "Asian/Indian", 4 "White". Gender is coded 1 "Male", 2 "Female". I want to fill in each person's missing gender and race from the value observed for that person in any other wave.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pid float(wave gender race)
    404541 3 1 1
    404541 4 . .
    404541 5 . .
    404544 3 2 1
    404544 4 2 1
    404544 5 2 1
    404545 3 1 1
    404545 4 1 1
    404545 5 1 1
    404546 3 2 1
    404546 4 2 1
    404546 5 2 1
    404547 3 2 1
    404547 4 2 1
    404547 5 2 1
    404549 3 1 1
    404549 4 1 1
    404549 5 1 1
    404550 3 2 1
    404550 4 2 1
    404550 5 2 1
    404551 3 2 1
    404551 4 2 1
    404551 5 2 1
    404552 3 2 1
    404552 4 2 1
    404552 5 2 1
    404555 3 1 1
    404555 4 1 1
    404555 5 1 1
    404556 3 2 1
    404556 4 2 1
    404556 5 2 1
    404560 3 2 1
    404560 4 2 1
    404560 5 2 1
    404561 3 2 1
    404561 4 2 1
    404561 5 2 1
    404562 3 2 1
    404562 4 2 1
    404562 5 2 1
    404569 3 1 1
    404569 4 1 1
    404569 5 1 1
    404571 3 2 1
    404571 4 2 1
    404571 5 2 1
    404584 3 1 3
    404584 4 1 3
    404584 5 1 3
    404585 3 1 4
    404585 4 1 4
    404585 5 1 4
    404586 3 1 4
    404586 4 1 4
    404586 5 1 4
    404594 3 1 1
    404594 4 1 1
    404594 5 1 1
    404595 3 1 1
    404595 4 1 1
    404595 5 1 1
    404596 3 2 1
    404596 4 2 1
    404596 5 2 1
    404600 3 2 1
    404600 4 2 1
    404600 5 2 1
    404601 3 1 1
    404601 4 1 1
    404601 5 1 1
    404602 3 2 1
    404602 4 2 1
    404602 5 2 1
    404607 3 2 1
    404607 4 2 1
    404607 5 2 1
    404609 3 2 1
    404609 4 2 1
    404609 5 2 1
    404611 3 2 1
    404611 4 2 1
    404611 5 2 1
    404612 3 2 1
    404612 4 2 1
    404612 5 2 1
    404613 3 1 1
    404613 4 1 1
    404613 5 1 1
    404614 3 2 1
    404614 4 2 1
    404614 5 2 1
    404616 3 2 1
    404616 4 2 1
    404616 5 2 1
    404618 3 1 2
    404618 4 1 2
    404618 5 1 2
    404619 3 2 2
    end
    label values race racev1
    label def racev1 1 "African", modify
    label def racev1 2 "Coloured", modify
    label def racev1 3 "Asian/indian", modify
    label def racev1 4 "White", modify
    Thanks.

  • #2
    Code:
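    foreach v of varlist gender race {
        * Stop with an error if any pid has conflicting non-missing values
        by pid (`v'), sort: assert `v' == `v'[1] | missing(`v')
        * Missing values sort last, so the first observation in each pid
        * holds the observed value whenever one exists
        by pid (`v'): replace `v' = `v'[1]
    }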
    Note: The first command in the loop verifies your claim that when not missing, race and gender are consistently reported across waves. I have seldom encountered survey demographic data with that level of consistency, so I take a cautious and skeptical approach. If the data really are that good, this code will do what you ask. If the data have some inconsistencies, it will halt with an error message, and that will be your cue to hunt those down and figure out what to do about them.
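
    If the assert does halt, one possible way to hunt down the offending cases is sketched below. This is an illustration only, not part of the reply above: the variable names gender_conflict and race_conflict are made up for the example, and it assumes the data are laid out as in the -dataex- extract (pid, wave, gender, race). It flags every pid whose non-missing values disagree across waves and lists them for inspection.

    Code:
    * Sketch only: the *_conflict variable names are illustrative
    foreach v of varlist gender race {
        tempvar lo hi
        * egen's min() and max() ignore missing values within each pid
        by pid, sort: egen `lo' = min(`v')
        by pid: egen `hi' = max(`v')
        * 1 if the observed values disagree; 0 if they agree or are all missing
        generate byte `v'_conflict = (`lo' != `hi')
        drop `lo' `hi'
    }
    sort pid wave
    list pid wave gender race if gender_conflict | race_conflict, sepby(pid)

    The final sort also restores pid-by-wave order, which the fill-in loop changes because it sorts on each variable within pid.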



    • #3
      Thanks very much, Clyde Schechter.
