  • Time-invariant characteristics conflict in panel data.

    Dear all,

    I am reading the documentation for a panel data set. It says that gender and other time-invariant characteristics sometimes conflict when individuals are matched between samples.

    Here is a full description: "Some respondents in the National Survey of College Graduates (NSCG), Survey of Doctorate Recipients (SDR), and National Survey of Recent College Graduates (NSRCG) are surveyed multiple times, providing opportunities for longitudinal data analysis. The variable PERSONID identifies individuals across survey years. (It replaces the original variable, REFID, which contains non-numeric values in some samples.) Users should note that in a small number of cases, gender and other time-invariant characteristics conflict when two individuals are matched between samples."

    Is there a way to identify those conflicts in Stata and drop them?

    Thank you for your help.

  • #2
    So if the variables of interest are x, y, z, and w, and you want to drop any person's data if those variables show inconsistent values:

    Code:
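    foreach v of varlist x y z w {
        * Within each person, sort on the variable. If the first and last values
        * differ, it is not constant, so drop all of that person's observations.
        by PERSONID (`v'), sort: drop if `v'[1] != `v'[_N]
    }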
    If you are willing to keep a PERSONID if the only inconsistency is between a missing value and a non-missing value, then you could get that with:
    Code:
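    foreach v of varlist x y z w {
        * Numeric missing values sort last, so the first value within a person is
        * non-missing if one exists. Flag non-missing values that differ from it.
        by PERSONID (`v'), sort: gen inconsistent = (`v' != `v'[1] & !missing(`v'))
        * Drop every observation of a person with at least one flagged value.
        by PERSONID (inconsistent), sort: drop if inconsistent[_N]
        drop inconsistent
    }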

    • #3
      Thank you so so much for your help, sir.

      Is there a way to count the observations with conflicts before dropping them? I do not want to drop too many observations.

      Thanks for your help again.

      • #4
        Code:
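        foreach v of varlist x y z w {
            * Flag every observation of a person whose values of the variable disagree.
            by PERSONID (`v'), sort: gen to_drop = (`v'[1] != `v'[_N])
            * r(sum) is the number of flagged observations.
            summ to_drop, meanonly
            display "Variable `v': will drop `r(sum)' observations"
            drop if to_drop
            drop to_drop
        }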
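        If you only want to look at the counts and leave the data untouched, a minimal sketch might be the following. It assumes the same placeholder variables x, y, z, and w; conflict and one_per_person are hypothetical helper names, not variables in the data.
        Code:
        foreach v of varlist x y z w {
            * Flag every observation of a person whose values of the variable disagree.
            by PERSONID (`v'), sort: gen byte conflict = (`v'[1] != `v'[_N])
            * Tag one observation per person so that each person is counted once.
            egen byte one_per_person = tag(PERSONID)
            quietly count if conflict
            local n_obs = r(N)
            quietly count if conflict & one_per_person
            display "Variable `v': `n_obs' observations from `r(N)' persons show conflicts"
            * Remove the helper variables only; no observations are dropped.
            drop conflict one_per_person
        }
        For numeric variables, a missing value alongside a non-missing value counts as a conflict here, because missing values sort last within each person.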

        • #5
          Thanks so much for your help, sir.
