  • Deleting observations with PANEL DATA

    Hi there,


    I am using Stata 12 and working on a panel dataset with two years (2002 and 2005). I have appended the two years together, but because of attrition some observations from 2002 do not appear in 2005, and new observations were added in 2005. I would like to drop all of these and keep only the observations that appear in both years. What commands should I use to do that?

    I have attached an extract of my data.

    Code:
    case_id YEAR TOTAL_INCOME
    010404207 2002 1534.416
    010404207 2005 80
    010404208 2002 0
    010404208 2005 5000
    010404290 2005 1447.602
    010404301 2002 3665.751
    010404301 2005 710
    010404302 2002 1330.23
    010404302 2005 785.9066
    010404303 2002 950.0001
    010404303 2005 7957.208
    010404305 2002 300
    010404306 2002 500
    010404306 2005 23572.12
    010404307 2002 1507.841

    Thank you very much for your attention. Please let me know if anything is unclear.

  • #2
    So, it sounds like you want to keep all and only those cases where there are exactly two years' worth of data, the first being 2002 and the last being 2005.

    Code:
    by case_id (year), sort: keep if _N == 2 & year[1] == 2002 & year[_N] == 2005
    should do the trick.
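
    For anyone who wants to try this on the extract from #1, here is a minimal sketch. It assumes case_id is stored as a string (the leading zeros suggest so) and uses the variable name YEAR exactly as it appears in the extract, since Stata variable names are case-sensitive. The storage types in the -input- line are my guesses, not something stated in the original post.

    ```stata
    * Rebuild the extract from #1 (storage types are assumptions)
    clear
    input str9 case_id int YEAR double TOTAL_INCOME
    "010404207" 2002 1534.416
    "010404207" 2005 80
    "010404208" 2002 0
    "010404208" 2005 5000
    "010404290" 2005 1447.602
    "010404301" 2002 3665.751
    "010404301" 2005 710
    "010404302" 2002 1330.23
    "010404302" 2005 785.9066
    "010404303" 2002 950.0001
    "010404303" 2005 7957.208
    "010404305" 2002 300
    "010404306" 2002 500
    "010404306" 2005 23572.12
    "010404307" 2002 1507.841
    end

    * Within each case_id, sorted by YEAR, keep the group only if it has
    * exactly two observations, the first from 2002 and the last from 2005
    by case_id (YEAR), sort: keep if _N == 2 & YEAR[1] == 2002 & YEAR[_N] == 2005

    * 010404290, 010404305 and 010404307 are dropped; 12 observations remain
    list, sepby(case_id) noobs
    ```

    Because the condition refers only to group-level quantities (_N, the first YEAR, the last YEAR), it is either true or false for every observation of a case_id, so whole cases are kept or dropped together.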



    • #3
      That is correct indeed. I think I got what I wanted, and thank you for that, Clyde. Is there any way of checking that there are exactly two observations (2002 and 2005) for each unique identifier? In other words, that every identifier appears exactly twice in the whole sample after I have dropped the unmatched ones?



      • #4
        Code:
        by case_id (year), sort: assert _N == 2 & inlist(year, 2002, 2005) & year[1] != year[_N]
        will verify for you that each case_id has exactly two observations, and that all years in the data are 2002 or 2005, and that no case_id has two observations for the same year.

        More generally, if you are not familiar with the -assert- command, I strongly recommend you read about it in the user manual. It is indispensable to getting data analysis right once you are beyond simple problems.
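
        If you would also like to look at the structure rather than only test it, the following sketch may help. It is a complement to -assert-, not a replacement. The variable n_obs is a name made up for illustration, the four data rows are taken from the extract in #1, and the name YEAR follows that extract (adjust the case if your variable is named differently).

        ```stata
        * Small balanced example taken from the extract in #1
        clear
        input str9 case_id int YEAR double TOTAL_INCOME
        "010404207" 2002 1534.416
        "010404207" 2005 80
        "010404208" 2002 0
        "010404208" 2005 5000
        end

        * Observations per case_id: the table should show only the value 2
        by case_id, sort: generate n_obs = _N
        tabulate n_obs

        * Stops with an error if any (case_id, YEAR) pair occurs more than once
        isid case_id YEAR

        * In a balanced two-wave panel both years have the same count
        tabulate YEAR

        drop n_obs
        ```

        The tabulations show where a problem lies when there is one, whereas -assert- and -isid- simply halt the do-file, which is what you want once the checks are part of a routine script.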

