Household attrition in panel data

Jodar Joldoshov

Join Date: Dec 2020

Posts: 4
#1


31 May 2022, 04:19

I am constructing a panel dataset from household survey data for the years 2010-2013 (four consecutive years). As is usually the case with household survey data, there is attrition, i.e. some households drop out of the survey from year to year. I need to figure out whether these households are missing at random.

My idea is to create a dummy equal to 1 in 2011 if a household is present in 2010 but missing in 2011 (and 0 otherwise), and so on for 2012 and 2013. Then, for each of those years (2011, 2012, 2013), I want to run a logit/probit regression of this dummy on the set of covariates that I would like to control for in my study. The household identifier is "hhid" and the time variable is "year".

Does anyone have a precise idea of how this should be coded in Stata? I know it is not complicated, but I just cannot wrap my head around it.
Nick Cox

Join Date: Mar 2014

Posts: 35734
#2

31 May 2022, 04:35

Cross-posted and answered at https://stackoverflow.com/questions/...nel-data-stata

See https://www.statalist.org/forums/help#crossposting for our policy on cross-posting, which is that you should tell us about it.
Jodar Joldoshov

Join Date: Dec 2020

Posts: 4
#3

01 Jun 2022, 03:33

Originally posted by Nick Cox

Cross-posted and answered at https://stackoverflow.com/questions/...nel-data-stata

See https://www.statalist.org/forums/help#crossposting for our policy on cross-posting, which is that you should tell us about it.

Yes, I cross-posted because, to be honest, the answer on Stack Overflow was not helpful. I did not know about the cross-posting policy (it is not a crime, I hope).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#4

01 Jun 2022, 04:05

Jodar:
cross-posting is obviously not a crime.
The point of the FAQ recommendation Nick pointed you to is to save everybody's time by avoiding replies to questions that may already have been answered helpfully elsewhere.
In your case, explaining why you found the reply you got from SE unhelpful would have made it easier to reply (more) usefully here.
That said:
1) creating a balanced panel from unbalanced data may produce a dataset that has only a tenuous relationship with the original one. In addition, Stata can analyse both unbalanced and balanced panel datasets;
2) if I understand correctly what you have in mind, it looks like a sort of dummy variable adjustment (see https://us.sagepub.com/en-us/nam/missing-data/book9419 pages 9-11). Unfortunately, this approach is not recommended, as it has been shown to bias the regression coefficients.

Kind regards,
Carlo
(Stata 19.0)
Jodar Joldoshov

Join Date: Dec 2020

Posts: 4
#5

03 Jun 2022, 03:57

Thanks a lot, Carlo. I will try to figure it out. My question was simply how to code this: creating a dummy variable equal to 1 if a household is present in 2010 but absent in 2011, and 0 otherwise.
I will try to find out how to code it properly.
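
One way this could be sketched without adding any rows is to flag, in each household's year-t row, whether the household is missing from year t+1. This is only a minimal illustration on made-up data, not a tested solution for the actual survey: the variable name attrit_next and the toy values are assumptions. Note that it shifts the dummy back one year relative to the description above (the flag sits in the 2010 row rather than in 2011), which keeps that year's covariates on the same row.

```stata
* Toy data (hypothetical): hhid 1 is observed 2010-2013,
* hhid 2 drops out after 2010, hhid 3 drops out after 2011
clear
input hhid year
1 2010
1 2011
1 2012
1 2013
2 2010
3 2010
3 2011
end

* Within each household, sorted by year, compare the next row's year
* with year + 1. At a household's last row year[_n+1] is missing,
* so the comparison is true and the flag is 1.
* The final wave (2013) has no following wave, so the flag is left missing.
bysort hhid (year): gen byte attrit_next = (year[_n+1] != year + 1) if year < 2013

list, sepby(hhid)
```

A household that skips a wave and returns later would also be flagged in the year before the gap, so whether that counts as attrition is a choice to make separately.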

Kind regards,
Jodar
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#6

03 Jun 2022, 04:11

Jodar:
if a given panel-specific observation is absent for a given year, you have to -expand- the number of panel-specific observations to create the observation(s) for the missing year(s).
However, all the values for the missing year(s) will be missing (unless you -ipolate- them) and, as such, those observations will be dropped by Stata (listwise deletion) from subsequent statistical procedures.
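
A hedged sketch of the step described above, using -fillin- rather than -expand- to add the missing household-year rows (-tsfill, full- would be another option). The data are simulated, and hhsize, attrit and the 15% dropout rate are made up purely for illustration. Because covariates are missing in the added rows, the logit here uses lagged values (L.hhsize), i.e. values from the last wave in which the household was observed.

```stata
* Simulated panel (hypothetical): 500 households, 2010-2013
clear
set seed 12345
set obs 500
gen hhid = _n
gen hhsize = 1 + floor(6*runiform())    // made-up covariate
expand 4
bysort hhid: gen year = 2009 + _n

* Simulate permanent dropout after 2010
gen u = runiform()
bysort hhid (year): gen byte gone = sum(year > 2010 & u < 0.15) > 0
drop if gone
drop u gone

* Add one row for every missing hhid-year combination;
* _fillin == 1 marks the rows that were added
fillin hhid year
xtset hhid year

* attrit = 1 if absent this year but present last year,
*          0 if present in both years,
*          missing otherwise (2010, or already absent last year)
gen byte attrit = _fillin if L._fillin == 0

* Year-by-year logit on last year's covariate values
forvalues y = 2011/2013 {
    logit attrit L.hhsize if year == `y'
}
```

Restricting to households present in the previous year keeps each regression on the households that were still at risk of dropping out.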

Kind regards,
Carlo
(Stata 19.0)
