  • Dropping individuals when missing observations and change in variable

    Hi,

    I am running a panel data analysis in Stata, working with survey data from Germany. The period covered is 1984-2015. Using the personal identification number (pid) and the survey year (syear), I have generated a variable that is meant to capture gaps, that is, missing observations, in the survey year:

    xtset pid syear
    sort pid syear
    egen max_gap=max(syear - syear[_n-1]), by(pid)


    I also have a marital status variable that ranges from 1 to 5 (1 = married, 2 = single, 4 = divorced, 5 = separated).

    I have already dropped the married people with missing observations in two or more years:
    drop if max_gap>2 & marstatus==1

    However, I also want to drop people who have missing observations in two or more years and whose marital status changed across that gap. For example, if a person does not report any marital status in two consecutive years and the status has changed in the third year, that person should be dropped from the sample.

    My approach was to create a variable for those whose marital status changes over time and then drop them when max_gap is 2 or more years:

    gen change = ((marstatus==1 & syear[_n]) & (marstatus !=1 & syear[_n+1]))
    drop if max_gap>2 & change

    However, 0 observations are dropped, which cannot be right.

    I would like to know what is wrong with my approach. Maybe I need to use loops?

    Regards,
    Gabriela

  • #2
    These things are hard to think through if we cannot see your data. Why don't you check the help for -dataex- and provide a sample of your data that we can work on? Then show, for that example, what the variable you want to generate should look like.

    Even without the data, it seems to me that your definition of "change" does not make sense. The expressions -syear[_n]- and -syear[_n+1]- always evaluate to true (one), because every syear is nonzero.
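
The point about -syear[_n]- can be checked on a small example. The sketch below uses made-up toy data (the values are assumptions, not the poster's survey data) and shows that adding -& syear[_n]- to a condition does not change the result, because Stata treats any nonzero number as true.

```stata
* Toy data; the values are made up for illustration only
clear
input pid syear marstatus
1 1984 1
1 1985 1
1 1986 2
end

* syear is never 0, so "& syear[_n]" adds nothing to the condition
gen byte with_year    = (marstatus == 1 & syear[_n])
gen byte without_year = (marstatus == 1)

* The two indicators agree on every observation
assert with_year == without_year
```

Note that a subscript such as [_n+1] only changes the result when the value it picks up is actually compared with something, for example -marstatus[_n+1] != 1-.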



    • #3
      Gabriela:
      welcome to this forum.
      Questions like yours have a better chance of getting a reply if you post an example/excerpt of your dataset via -dataex-.
      That said, I'm not clear about your categorical variable -marstatus-, as its levels are numbered from 1 to 5 but only four of them are described in parentheses. Is level 3 the reference category?

      PS: crossed in the cyberspace with Joro's helpful advice.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        It seems to me that your

        gen change = ((marstatus==1 & syear[_n]) & (marstatus !=1 & syear[_n+1]))

        is equivalent, after eliminating the redundant terms, to

        gen change = ((marstatus==1) & (marstatus !=1))

        which can never be true, because no observation can satisfy two mutually exclusive conditions at the same time.

        So my guess is that your variable "change" is identically 0 for every observation in your sample.
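
One way to compare a person's status with the previous interview, rather than with itself, is to use subscripts under -by-. The following is only a sketch on made-up toy data, under the assumption that a person should be dropped when marital status differs from the previous interview and more than two years lie between the two interviews. The variable names gap, bad and drop_pid are hypothetical, and the rule may need adjusting to the real data (for example, if marstatus itself contains missing values).

```stata
* Toy data; pid, syear and marstatus values are made up for illustration
clear
input pid syear marstatus
1 1984 1
1 1985 1
1 1988 4
2 1984 2
2 1985 2
2 1986 1
3 1984 1
3 1987 1
end

* Within each person, ordered by year: gap to the previous interview
* and whether marital status differs from the previous interview
bysort pid (syear): gen gap = syear - syear[_n-1]
by pid: gen byte change = (marstatus != marstatus[_n-1]) if _n > 1

* Flag a status change that coincides with a gap of more than two years;
* the first observation per person has a missing gap and is excluded
gen byte bad = (change == 1 & gap > 2 & !missing(gap))

* Spread the flag to every observation of the person, then drop the person
by pid: egen byte drop_pid = max(bad)
drop if drop_pid
```

In this toy example only person 1 is dropped: person 2 changes status without a gap, and person 3 has a gap without a change of status. Computing the gap under -bysort pid (syear):- keeps the [_n-1] subscript from reaching across the boundary between two persons.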



        • #5
          Hi,

          thank you for the answers. I tried using -dataex- to provide a sample, but without success.

          With gen change = ((marstatus==1 & syear[_n]) & (marstatus !=1 & syear[_n+1])) I am trying to capture exactly the cases where information is missing for 2 or more years between a change of status. What I mean is: if person X has status 1 (single) in year Y, we have no information for years Y+1 and Y+2, and in year Y+3 he already has the status married (!=1), I need to drop this individual from the sample.

          Regarding the missing category 3: those were widowed people, who are not of interest to me, so I dropped that category.

          I would really appreciate advice, since I cannot wrap my head around this issue.

          Thanks again and cheers!

          Gabriela
