Advice on multiple imputation/interpolation for missing data

Oliver Adamson

Join Date: Apr 2022

Posts: 14
#1

12 Aug 2022, 10:29

Hi,

I have compiled a cross-country panel dataset of various education indicators and Gini and Palma indices for my dissertation on inequality. After transforming it from wide to long to run my regression, there are a lot of missing values in the years that Stata auto-generated. I have run a Shapiro-Wilk test and found that none of my variables is normally distributed (partly due to the missing values?). How should I proceed with filling in the missing data? For instance, is it suitable to use multiple imputation or interpolation? I have not formally tested for MAR or MCAR, but I know that none of my variables are MNAR.

Last edited by Oliver Adamson; 12 Aug 2022, 10:31.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#2

12 Aug 2022, 10:58

Oliver:
1) from your post it is not clear why Stata created missing years if those years were observed when the dataset was in -wide- format. Answering this question will tell you whether -ipolate- or -mi- is the way to go;
2) normality is a weak requirement, and it concerns the distribution of the residuals only, not that of the variables themselves.

Kind regards,
Carlo
(Stata 19.0)
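To make the -ipolate- option concrete: linear interpolation fills a missing year from the nearest observed years before and after it within the same country, and produces a single fixed value with no allowance for imputation uncertainty (which is what -mi- adds). The sketch below is written in Python rather than Stata, purely to show the arithmetic; the function name and the (country, year, value) layout are hypothetical. Like -ipolate- without its -epolate- option, it leaves gaps at the start or end of a country's series missing.

```python
from math import isnan, nan


def interpolate_within_country(rows):
    """Linearly interpolate missing values separately for each country.

    rows: list of (country, year, value) tuples, with value = nan when missing.
    Returns {(country, year): value}. Leading and trailing gaps stay nan,
    because there is no observed year on both sides to interpolate between.
    """
    by_country = {}
    for country, year, value in rows:
        by_country.setdefault(country, []).append((year, value))

    filled = {}
    for country, series in by_country.items():
        series.sort()  # order each country's series by year
        observed = [(y, v) for y, v in series if not isnan(v)]
        for year, value in series:
            if not isnan(value):
                filled[(country, year)] = value
                continue
            before = [(y, v) for y, v in observed if y < year]
            after = [(y, v) for y, v in observed if y > year]
            if before and after:
                (y0, v0), (y1, v1) = before[-1], after[0]
                # Straight line between the nearest observed neighbours.
                filled[(country, year)] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
            else:
                filled[(country, year)] = nan  # no extrapolation
    return filled


# Hypothetical data: two countries, four years each.
rows = [
    ("A", 2000, 30.0), ("A", 2001, nan), ("A", 2002, 34.0), ("A", 2003, nan),
    ("B", 2000, nan), ("B", 2001, 40.0), ("B", 2002, nan), ("B", 2003, 46.0),
]
filled = interpolate_within_country(rows)
print(filled[("A", 2001)], filled[("B", 2002)])  # 32.0 43.0
```

Interpolation is only defensible when the gap sits between genuinely observed years of a smoothly evolving series; it says nothing about years that were never part of the original data.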
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#3

13 Aug 2022, 04:23

One of my variables was in long format with missing years; I transposed it to wide and then back to long. Not the proper way to do it, I know, but that's where the missing years came from.
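A minimal Python sketch (hypothetical helper names, not Stata code) of why this round trip creates extra years: a wide layout has one column per year, so once any country has a value for a given year, every country gets a row for that year after going back to long, with a missing value wherever that country was never observed.

```python
from math import isnan, nan


def long_to_wide(rows):
    """Pivot (country, year, value) rows into {country: {year: value}}."""
    wide = {}
    for country, year, value in rows:
        wide.setdefault(country, {})[year] = value
    return wide


def wide_to_long(wide):
    """Stack back to long form over the union of all years.

    Every year that appears for any country becomes a row for every country;
    years a country never had are filled with nan.
    """
    all_years = sorted({year for series in wide.values() for year in series})
    return [
        (country, year, series.get(year, nan))
        for country, series in sorted(wide.items())
        for year in all_years
    ]


# Hypothetical long data: 3 observed country-years.
original = [("A", 2000, 1.0), ("A", 2005, 2.0), ("B", 2001, 3.0)]
round_trip = wide_to_long(long_to_wide(original))
for row in round_trip:
    print(row)  # 6 rows, 3 of them with value nan

# Dropping the padded rows recovers the original (unbalanced) panel.
observed_only = [row for row in round_trip if not isnan(row[2])]
```

Rows created this way were never observed in the original long file, so they are padding rather than missing data in the usual sense; that is why Carlo's first question matters before choosing between -ipolate- and -mi-.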
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#4

13 Aug 2022, 04:26

Also, I have read here: https://stefvanbuuren.name/fimd/sec-nonnormal.html that multiple imputation uses a different method when dealing with non-normal data. Am I missing something?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#5

13 Aug 2022, 10:28

Oliver:
the same authoritative source also states that -mi- based on the normal distribution is, in general, robust to departures from normality.
As far as the missing years are concerned, is there any way you can estimate them outside -mi- (e.g., by ruling out clearly absurd values and/or retrieving them from the other variables)?
If that is not possible, you can go with an unbalanced panel dataset.

Kind regards,
Carlo
(Stata 19.0)
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#6

14 Aug 2022, 15:24

Carlo,

I have attempted the -mi- approach using -mi estimate- with a linear regression to see what would happen. My results are shown below. I'm not sure how well this model explains my dependent variable because I cannot find the R-squared. Do you know how I can obtain it, or is there an alternative measure of the same thing?
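For background on what -mi estimate- reports: it fits the model separately on each of the m imputed datasets and pools each coefficient with Rubin's rules, averaging the estimates and combining the average within-imputation variance W with the between-imputation variance B as T = W + (1 + 1/m)B. A Python sketch of that pooling step for a single coefficient follows; the function name and the input numbers are hypothetical, and it omits the degrees of freedom Stata uses for tests and confidence intervals.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputations with Rubin's rules.

    estimates, std_errors: per-imputation coefficient and its standard error.
    Returns (pooled estimate, pooled standard error). Needs m >= 2.
    """
    m = len(estimates)
    q_bar = mean(estimates)                         # pooled point estimate
    within = mean([se ** 2 for se in std_errors])   # W: average sampling variance
    between = variance(estimates)                   # B: variance across imputations
    total = within + (1 + 1 / m) * between          # T = W + (1 + 1/m) B
    return q_bar, sqrt(total)


# Hypothetical coefficients and standard errors from m = 3 imputed datasets.
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(est, round(se, 4))
```

These rules assume an approximately normal sampling distribution for the quantity being pooled, which is one reason a bounded statistic such as R-squared is usually pooled on a transformed scale instead.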
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#7

14 Aug 2022, 15:52

Oliver:
see: Harel, O. (2009). The estimation of R² and adjusted R² in incomplete data sets using multiple imputation. Journal of Applied Statistics, 36(10), 1109-1118.

Kind regards,
Carlo
(Stata 19.0)
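As I understand Harel (2009), R-squared is pooled by taking the square root of each imputation's R-squared, applying Fisher's z transformation, averaging on the z scale, and back-transforming; the same recipe is applied to adjusted R-squared. A Python sketch of that arithmetic, assuming the m per-imputation R-squared values have already been collected from the individual regressions (the function name and the example values are hypothetical):

```python
from math import atanh, sqrt, tanh
from statistics import mean


def pool_r_squared(r_squared_values):
    """Pool R-squared across imputations via Fisher's z, following Harel (2009).

    Each R^2 is converted to r = sqrt(R^2), transformed to z = atanh(r),
    averaged over imputations, then back-transformed with tanh and squared.
    An R^2 of exactly 1 would make atanh fail, so values must be below 1.
    """
    z_values = [atanh(sqrt(r2)) for r2 in r_squared_values]
    return tanh(mean(z_values)) ** 2


# Hypothetical R-squared values from m = 5 imputed regressions.
print(round(pool_r_squared([0.31, 0.29, 0.33, 0.30, 0.32]), 3))
```

The z scale is used because R-squared is bounded between 0 and 1, so averaging the raw values directly does not have the same justification.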