Dropping individuals when missing observations and change in variable

Gabriela Kalibatseva

Join Date: Jan 2019

Posts: 26
#1

13 Jan 2019, 11:17

Hi,

I am running a panel data analysis in Stata on survey data from Germany covering 1984-2015. For each person (identified by pid), I have generated a variable meant to capture gaps, i.e. missing observations, in the survey year variable (syear):

xtset pid syear
sort pid syear
egen max_gap=max(syear - syear[_n-1]), by(pid)

I also have the marital status variable, which ranges from 1 to 5 (1 = married, 2 = single, 4 = divorced, 5 = separated).

I have already dropped the married people with missing observations in two or more years:
drop if max_gap>2 & marstatus==1

However, I want to drop everyone with missing observations in two or more years whose marital status changed across the gap. For example, if a person does not report any marital status in two consecutive years and in the third year the status has changed, that person is dropped from the sample.

My approach was to create a variable flagging those whose marital status changes over time, and then drop them when max_gap is 2 or more years:

gen change = ((marstatus==1 & syear[_n]) & (marstatus !=1 & syear[_n+1]))
drop if max_gap>2 & change

However, 0 observations are dropped, which cannot be right.

I would like to know what is wrong with my approach. Maybe I need to use loops?

Regards,
Gabriela
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

13 Jan 2019, 11:51

These things are hard to think through if we cannot see your data. Why don't you check the help for -dataex- and provide a sample of your data that we can work on? Then show, for this example, what the variable you want to generate should look like.
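For reference, a minimal sketch of what such a -dataex- call might look like, assuming the variable names used in post #1 (pid, syear, marstatus). The range of 30 observations is arbitrary. On older Stata installations, -dataex- may first need to be installed from SSC.

```stata
* Sketch only: variable names are taken from post #1, the range is arbitrary
* ssc install dataex     // may be needed on older Stata versions
sort pid syear
dataex pid syear marstatus in 1/30
```

The output can be pasted directly into a post, so that others can recreate the example data by running it.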

Otherwise, even from here, it seems to me that your definition of "change" does not make sense. The expressions -syear[_n]- and -syear[_n+1]- are always true, because Stata treats any nonzero value as true and every syear is bigger than 0.
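A small sketch on made-up data (the values below are hypothetical, not from the actual survey) shows this: a bare numeric expression used as a condition is true whenever it is nonzero, and a missing value, such as -syear[_n+1]- in the last observation, also counts as true.

```stata
* Toy data with hypothetical values, only to illustrate the point
clear
input pid syear marstatus
1 1990 1
1 1991 1
1 1994 2
end

* cond(x, a, b) returns a when x is nonzero (or missing), b when x is zero
gen byte year_is_true = cond(syear[_n], 1, 0)
gen byte next_is_true = cond(syear[_n+1], 1, 0)

* Both indicators equal 1 in every observation
list
```

So the two -syear- terms add nothing to the condition.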
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17740
#3

13 Jan 2019, 11:53

Gabriela:
welcome to this forum.
Questions like yours have a better chance of being answered if you post an example/excerpt of your dataset via -dataex-.
That said, I'm not clear about your categorical variable -marital_status-, as its levels are numbered from 1 to 5 but only four of them are described in parentheses. Is level 3 the reference category?

PS: crossed in the cyberspace with Joro's helpful advice.

Kind regards,
Carlo
(Stata 19.0)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

13 Jan 2019, 11:53

It seems to me that your

gen change = ((marstatus==1 & syear[_n]) & (marstatus !=1 & syear[_n+1]))

is equivalent, after eliminating the redundant terms, to

gen change = ((marstatus==1) & (marstatus !=1))

which is a pointless statement, because the two mutually exclusive conditions can never both be true for the same observation.

So my guess is that your variable "change" is identically 0 for each and every observation in your sample.
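A quick check on made-up data (hypothetical values) confirms this. It also sketches one possible alternative, not necessarily what you need: compare each observation with the same person's previous observation. The names change_old and change_new are only for illustration.

```stata
* Toy data with hypothetical values
clear
input pid syear marstatus
1 1990 1
1 1991 1
1 1992 2
2 1990 2
2 1991 2
end

* The simplified definition: the two conditions exclude each other
gen byte change_old = (marstatus == 1) & (marstatus != 1)
assert change_old == 0

* One possible alternative: under by pid, [_n-1] is the previous
* observation of the same person, never of a different person
bysort pid (syear): gen byte change_new = (_n > 1) & (marstatus != marstatus[_n-1])
list, sepby(pid)
```

Here change_new is 1 only for person 1 in 1992. The sketch assumes marstatus is never missing in the observed rows.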
Gabriela Kalibatseva

Join Date: Jan 2019

Posts: 26
#5

24 Jan 2019, 14:58

Hi,

Thank you for the answers. I tried using -dataex- to provide a sample, but without success.

With gen change = ((marstatus==1 & syear[_n]) & (marstatus !=1 & syear[_n+1])) I am trying to capture exactly the case where information is missing for 2 or more years between a change of status. What I mean is: if person X has status 1 (single) in year Y, we have no information for years Y+1 and Y+2, and in year Y+3 he already has the status married (!=1), I need to drop this individual from the sample.

Regarding the missing number 3: those were widowed people, whom I am not interested in, so I dropped that category.

I would really appreciate advice, since I cannot wrap my head around solving this issue.

Thanks again and cheers!

Gabriela
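
The rule described in #5 could perhaps be sketched as follows. This is only an illustration on made-up data. It assumes that a missing year shows up as an absent row (not as a row with missing marstatus), and that "two or more missing years" means a gap of at least 3 between consecutive survey years. The names gap, bad and drop_person are hypothetical.

```stata
* Toy data with hypothetical values
clear
input pid syear marstatus
1 1990 1
1 1991 1
1 1994 2
2 1990 1
2 1991 2
3 1990 2
3 1993 2
end

* Gap to the same person's previous survey year (missing in the first row)
bysort pid (syear): gen gap = syear - syear[_n-1]

* Flag rows that follow a gap of 3+ years AND show a changed status;
* !missing(gap) is needed because missing counts as larger than any number
by pid: gen byte bad = (gap >= 3) & !missing(gap) & (marstatus != marstatus[_n-1])

* Spread the flag to all rows of the person, then drop the whole person
by pid: egen byte drop_person = max(bad)
drop if drop_person

* Person 1 (gap and change) is dropped; persons 2 and 3 remain
assert _N == 4
```

No loop is needed here. Computing the gap under -by pid- keeps -syear[_n-1]- within the same person, so the first observation of one person is never compared with the last observation of another.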
