Hi everyone, I’m currently facing a specific missing data challenge in my research, and I wonder if I could ask for some advice?
I have a 7-wave unbalanced panel dataset of individuals who reported a dependent variable Y as well as explanatory variables X1, X2 and so on. X1 is a categorical variable with 5 categories. Unfortunately, due to a mistake in the Wave 4 questionnaire (and only in that wave), the question for Y was asked only if X1=2 or X1=3 (it should have been asked of everyone).
I now have a few options:
1. Refrain from analyzing the entire Wave 4, which is a pity but does not seem to cause any obvious form of bias.
2. Apply complete case analysis to only those respondents who were asked the Y question. Based on your book I would think that this is problematic, as the missingness of Y is not MCAR: it is systematically related to the variables of interest.
3. Apply multiple imputation assuming the missingness of Y is MAR, since it can be fully explained by X1, which is one of the variables in the model.
4. Binarize X1 in two different ways to create two new variables, X1A (0 if X1=1,4,5; 1 if X1=2,3) and X1B (0 if X1=1,2; 1 if X1=3,4,5). Apply multiple imputation using X1A as an auxiliary variable (since it perfectly predicts missingness of Y) and X1B as a predictor in the regression.
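To make the recoding in option 4 concrete, here is a minimal sketch in plain Python of what I have in mind. The function names are hypothetical and only for illustration; the category codes are the ones described above. It only shows the recoding and the skip pattern, not the imputation itself.

```python
# Hypothetical helper names; category codes follow the description above.

def x1a(x1: int) -> int:
    """1 if X1 is 2 or 3 (the groups that were asked Y in Wave 4), else 0."""
    return 1 if x1 in (2, 3) else 0


def x1b(x1: int) -> int:
    """1 if X1 is 3, 4 or 5, else 0 (the split intended as a regression predictor)."""
    return 1 if x1 in (3, 4, 5) else 0


def y_missing_by_design(wave: int, x1: int) -> bool:
    """True when Y was skipped because of the Wave 4 questionnaire error."""
    return wave == 4 and x1 not in (2, 3)


# Tabulate both recodes and the skip pattern for every X1 category in Wave 4.
# Each row is (X1, X1A, X1B, Y missing by design).
table = [(x1, x1a(x1), x1b(x1), y_missing_by_design(4, x1)) for x1 in range(1, 6)]
for row in table:
    print(row)
```

In this tabulation, Y is missing by design in Wave 4 exactly when X1A = 0, which is what I mean by X1A perfectly predicting the missingness.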
If you have any thoughts on these options, I would be immensely grateful.