Hi,
I have a cross-sectional household data set with over 36,000 observations. After running mdesc, I noticed that more than 50% of the values are missing for most variables, because family members did not respond to the survey questions. I need to impute the missing values, since dropping incomplete observations would cost me the large majority of my sample. The variables I am looking at are a mix of categorical and numerical variables; for example, highest educational attainment (in categories) and wages (numerical). I am not sure whether I should use multiple imputation or mean replacement to deal with the missing data. Could you also suggest any useful commands?
Thank you for your time.