
  • How to drop observations when variable is theoretically time invariant but not in my panel data set

    Dear Ladies and Gentlemen,
    I am using the standard fixed effects (FE) model and the Hausman-Taylor (HT) panel estimation procedure in Stata version 11. My panel data set includes race and gender. Both should be constant over time, so they should drop out of the FE model but remain in the HT one. In my data, however, they change over time for some individuals, and I am not sure how to clean the data set so that both race and gender are time invariant. Should I go ahead and drop the observations that show gender or race changes, and if so, how would I code that? Or should I keep all observations and simply use the gender with the highest frequency for each individual over time? If so, again, what would a possible Stata syntax look like?
    I would really appreciate your thoughts and suggestions.

  • #2
    To identify persons with varying race (or gender) you might try:
    Code:
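    // running count of race changes within person; it is 0 (not missing) on each
    // person's first record, so the condition changes > 0 does not pick it up
    bysort id (time): gen changes = sum((race != race[_n-1]) & _n > 1)
    list id race time if changes > 0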
    I would be very surprised if any substantial number of persons had actual (non-erroneous) changes in these variables, although of course there would be some. However, even if the proportion were 0.01, I can't imagine it affecting your results. You could check this by running your analyses with different treatments of the inconsistent race or gender variable: dropping those persons, classifying them by their first or last status, or using the modal category, as you suggest.
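
    The first of these treatments, dropping the affected persons, could be sketched as follows. This is only an illustration on a small made-up data set; the names race_varies and onerow are placeholders, and the same pattern would apply to gender.

```stata
clear
input id time race
1 1 2
1 2 2
1 3 2
2 1 1
2 2 3
2 3 1
3 1 4
3 2 4
end

// Sort by race within person: race[1] is then the smallest value and
// race[_N] the largest, so they differ only if race varies for that person.
// Missing values sort last, so a person with some missing race records
// is also flagged here.
bysort id (race): gen byte race_varies = race[1] != race[_N]

// Count affected persons (one tagged row per id) before deciding what to do
egen byte onerow = tag(id)
count if race_varies & onerow

// Drop every observation of the persons whose race varies
drop if race_varies
```

    Dropping the whole person, rather than only the records where the value changes, keeps the remaining panels internally consistent.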

    Code:
    //  first race
    bysort id (time): gen firstrace = race[1]
    // last race
    bysort id (time): gen lastrace = race[_N]
    // modal race
    bysort id: egen race_modal = mode(race), minmode
    Note that it's possible to have more than one mode, which I resolved here by using -minmode- to arbitrarily choose the smallest value. It's also possible that race[1] or race[_N] is missing for an individual, but again, the number of persons who have (changes > 0) and (race[1] == .) is likely to be negligible.
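
    To see how many persons are affected by the arbitrary tie-break, one possible check is to compare the smallest and largest mode. The sketch below uses made-up data; mode_lo, mode_hi, and mode_tie are placeholder names.

```stata
clear
input id time race
1 1 1
1 2 2
2 1 3
2 2 3
2 3 1
end

// Smallest and largest modal value within each person
bysort id: egen mode_lo = mode(race), minmode
bysort id: egen mode_hi = mode(race), maxmode

// The two differ only when a person has more than one mode
gen byte mode_tie = mode_lo != mode_hi
list id time race mode_lo mode_hi if mode_tie
```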

    If you really want to be careful, the first or last nonmissing value can be handled in various ways; see
    http://www.statalist.org/forums/foru...e-new-variable
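
    One possible way to pick the first or last nonmissing value, not necessarily the approach described at that link, is to sort missing records to the end of each person's block. The data and the names race_miss, negtime, first_nm, and last_nm below are made up for illustration.

```stata
clear
input id time race
1 1 .
1 2 2
1 3 2
2 1 1
2 2 .
2 3 3
end

// Indicator that sorts nonmissing records (0) ahead of missing ones (1)
gen byte race_miss = missing(race)

// First nonmissing: earliest record among the nonmissing ones
bysort id (race_miss time): gen first_nm = race[1]

// Last nonmissing: same idea with time reversed
gen negtime = -time
bysort id (race_miss negtime): gen last_nm = race[1]

// Restore the usual panel order
sort id time
```

    If every record for a person is missing, both new variables stay missing for that person.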



    • #3
      Thank you very much Mike for this prompt reply! I will try it out, I understand what you are writing. Excellent! Have a great day!
