Dear Statalisters
I have 3 waves of panel data in which some people are interviewed in all 3 waves but others only in 2 adjacent waves, i.e., everyone is in wave 2, but only some are in wave 1 and only some in wave 3. This is because I’m using within-person fixed effects models for my main analyses and so don’t want big gaps of time between measures of the outcome.
I have around 20% missing data. I have reshaped the data to wide format (reshape wide) so that I can impute with mi impute chained; afterwards I will use mi reshape long to get the mi data back into panel (long) format. Each variable from the original panel now has a suffix of 1, 2 or 3 denoting the wave number.
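The wide-format workflow described above can be sketched end to end on a small hypothetical dataset (the variables `id`, `female`, `interview` and `smoke` are toy stand-ins; only the seed is reused from the code later in the post). To keep the sketch runnable it imputes only `smoke1`, with a univariate `mi impute logit` restricted by `if interview1 == 1`, checks with `mi xeq` that nothing was imputed for people not interviewed in wave 1, and then returns to long form with `mi reshape long`. It deliberately does not address the chained-equations predictor question raised in the post.

```stata
* Hypothetical toy panel: 300 people, everyone interviewed in wave 2,
* each person also in wave 1, wave 3, or both
clear
set seed 250743
set obs 300
generate id = _n
generate female = mod(id, 2)                        // time-invariant covariate
generate byte pattern = 1 + floor(3 * runiform())   // 1: waves 1-3, 2: waves 1-2, 3: waves 2-3
expand 3
bysort id: generate wave = _n
drop if (wave == 3 & pattern == 2) | (wave == 1 & pattern == 3)
generate byte interview = 1
generate byte smoke = runiform() < 0.3
replace smoke = . if runiform() < 0.2               // item nonresponse among those interviewed
drop pattern

* Long -> wide: one row per person, suffixes 1-3 for each wave
reshape wide interview smoke, i(id) j(wave)

* reshape fills waves a person never had with missing; make the indicator explicit
replace interview1 = 0 if missing(interview1)
replace interview3 = 0 if missing(interview3)

mi set mlong
mi register imputed smoke1 smoke2 smoke3
mi register regular female interview1 interview2 interview3

* Impute smoke1 only for people actually interviewed in wave 1
mi impute logit smoke1 female if interview1 == 1, add(5) rseed(250743)

* Check in every imputation that nobody outside wave 1 received a value
mi xeq 1/5: assert missing(smoke1) if interview1 == 0

* Back to long (panel) form, carrying the imputations along
mi reshape long interview smoke, i(id) j(wave)
```

After `mi reshape long`, every person has three rows, including rows for waves in which they were not interviewed (`interview == 0`); those person-waves would still need to be excluded from the fixed-effects analysis sample.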
My problem is that I do not want to impute values for people who were not interviewed in a given wave (i.e., in wave 1 or wave 3).
Using the dryrun option, I can see that every variable at every wave is retained in each imputation model, so values in one wave are predicted by values from another wave in which some people were never interviewed. For example, the first model shown in the dry-run output predicts whether someone smokes in wave 2 (smoke2), but if that person was not interviewed in wave 1, then using smoke1 as a predictor of smoke2 is invalid (and I get a lot of error messages, as shown below). But as I understand it, I need to keep in the imputation model all the variables that will ultimately be used in the final analysis.
I have specified that values for a given variable are imputed only if the person was interviewed in that wave (this applies to waves 1 and 3, not wave 2), but I don’t know how to specify that only ‘sensible’ variables are used as predictors. Is there a way to specify the imputation model like this?
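One way to control which variables enter a particular conditional model is the per-equation `omit()` option of `mi impute chained` (alongside `include()`; both have been available since Stata 12, so they should apply in 13.1). The sketch below shows the syntax with `dryrun` on hypothetical wide-format toy data, using only a continuous `expect` variable so that the listed predictors are plain variable names. Note that `omit()` removes a predictor from that equation for everyone: people who were interviewed in wave 1 also lose `expect1` as a predictor of `expect2`, so it does not switch a predictor on or off person by person. Whether that trade-off is acceptable, and whether it removes the errors seen with the full model, is not established by this sketch; for `logit`/`mlogit` variables, which the dry run lists as `i.smoke1` and so on, check the dry-run listing to confirm they were dropped.

```stata
* Hypothetical wide-format toy data (one row per person) mimicking the design
clear
set seed 250743
set obs 300
generate id = _n
generate female = mod(id, 2)
generate byte pattern = 1 + floor(3 * runiform())   // 1: all waves, 2: waves 1-2, 3: waves 2-3
generate byte interview1 = pattern != 3
generate byte interview2 = 1
generate byte interview3 = pattern != 2
forvalues w = 1/3 {
    * observed only in waves the person was interviewed, plus item nonresponse
    generate expect`w' = rnormal(50, 10) if interview`w' == 1
    replace expect`w' = . if runiform() < 0.2
}

mi set mlong
mi register imputed expect1 expect2 expect3
mi register regular female interview1 interview2 interview3

* dryrun only lists the conditional models; nothing is imputed.
* omit() drops the named variables from that one equation only.
mi impute chained                                        ///
    (regress if interview1 == 1, omit(expect3)) expect1  ///
    (regress, omit(expect1 expect3))            expect2  ///
    (regress if interview3 == 1, omit(expect1)) expect3  ///
    = female, add(5) rseed(250743) dryrun
```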
I am using Stata version 13.1
Many thanks
Paula
My code:
mi impute chained (reg if interview1==1) expect1 ///
    (reg) expect2 ///
    (reg if interview3==1) expect3 ///
    (logit) grandkids2 ///
    (logit if interview3==1) grandkids3 ///
    (logit if interview1==1) married1 ///
    (logit) married2 ///
    (logit if interview3==1) married3 ///
    (logit if interview1==1) smoke1 ///
    (logit) smoke2 ///
    (logit if interview3==1) smoke3 ///
    (mlogit if interview1==1) occup1 ///
    (mlogit) occup2 ///
    (mlogit if interview3==1) occup3 ///
    (ologit) isced1997_r = i.country_x gender yrbirth, add(5) rseed(250743) dryrun
Stata output:
Conditional models:
  smoke2: logit smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth
  married3: logit married3 i.smoke2 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview3==1
  married2: logit married2 i.smoke2 i.married3 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth
  occup2: mlogit occup2 i.smoke2 i.married3 i.married2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth
  grandkids2: logit grandkids2 i.smoke2 i.married3 i.married2 i.occup2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth
  isced1997_r: ologit isced1997_r i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth
  expect2: regress expect2 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth
  grandkids3: logit grandkids3 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview3==1
  occup3: mlogit occup3 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview3==1
  expect3: regress expect3 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview3==1
  married1: logit married1 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview1==1
  smoke1: logit smoke1 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview1==1
  occup1: mlogit occup1 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.smoke3 expect1 i.country_x gender yrbirth if interview1==1
  smoke3: logit smoke3 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 expect1 i.country_x gender yrbirth if interview3==1
  expect1: regress expect1 i.smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 i.country_x gender yrbirth if interview1==1
Stata output after running the first model above:
. logit smoke2 i.married3 i.married2 i.occup2 i.grandkids2 i.isced1997_r expect2 i.grandkids3 i.occup3 expect3 i.married1 i.smoke1 i.occup1 i.smoke3 expect1 i.country_x gender yrbirth if interview2==1
note: 0.occup2 != 1 predicts failure perfectly
      0.occup2 dropped and 1303 obs not used
note: 0.grandkids2 != 1 predicts failure perfectly
      0.grandkids2 dropped and 199 obs not used
note: 0.isced1997_r != 0 predicts failure perfectly
      0.isced1997_r dropped and 4 obs not used
note: 1.isced1997_r != 0 predicts failure perfectly
      1.isced1997_r dropped and 31 obs not used
note: 2.isced1997_r != 0 predicts failure perfectly
      2.isced1997_r dropped and 101 obs not used
note: 3.isced1997_r != 0 predicts failure perfectly
      3.isced1997_r dropped and 400 obs not used
note: 6.isced1997_r != 0 predicts failure perfectly
      6.isced1997_r dropped and 26 obs not used
note: 0.grandkids3 != 1 predicts failure perfectly
      0.grandkids3 dropped and 64 obs not used
note: 0.occup3 != 1 predicts failure perfectly
      0.occup3 dropped and 51 obs not used
outcome = smoke1 <= 0 predicts data perfectly
r(2000);