I would greatly appreciate some quick advice on handling missing values.
A bit of context: I have a panel dataset with 19 variables and approximately 250k observations.
Less than 10% of the values in my dataset are missing. However, because Stata drops any observation with a missing value in any of the model variables, the regression uses only 20% of the observations. I have written code for multiple imputation, which seems like the best option, but our university computers simply aren’t powerful enough and the computation would take weeks. Having looked through the alternatives, substituting the mean (perhaps with a dummy variable indicating missing data) seems the best remaining solution, but it has been criticised for artificially decreasing the standard errors and therefore leading to invalid inference.
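To illustrate how a small share of missing values can remove most of the sample, here is a rough Python sketch. It assumes each of the 19 variables is missing independently at the same rate, and the 8% rate is a hypothetical figure chosen only because it is below 10%; my real data need not follow either assumption.

```python
def complete_case_fraction(missing_rate: float, n_vars: int) -> float:
    """Share of rows with no missing value in any variable.

    Assumes missingness is independent across variables and that every
    variable has the same missing rate (a simplification).
    """
    return (1.0 - missing_rate) ** n_vars


n_vars = 19        # number of variables, from the post
n_obs = 250_000    # approximate number of observations, from the post
rate = 0.08        # hypothetical per-variable missing rate (below 10%)

frac = complete_case_fraction(rate, n_vars)
usable = round(frac * n_obs)
print(f"{frac:.1%} complete cases, about {usable:,} usable rows")
```

Under these assumptions only about a fifth of the rows are complete, which is in line with the 20% that the regression ends up using.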
Are there any alternatives (or simplifications) to multiple imputation which will give unbiased estimates for the missing data without adversely affecting the variance?
Many thanks,
WB