  • Advice on multiple imputation/interpolation for missing data

    Hi,

    I have compiled a cross-country panel dataset of various education indicators and the Gini and Palma indices for my dissertation on inequality. After transforming it from wide to long to run my regression, there are many missing values where Stata has auto-generated the missing years. I ran a Shapiro-Wilk test and found that all my variables are non-normally distributed (partly due to missing values?). How should I proceed with filling in the missing data? For instance, is it suitable to use multiple imputation or interpolation? I have not formally tested for MAR or MCAR, but I know that none of my variables are MNAR.
    Last edited by Oliver Adamson; 12 Aug 2022, 10:31.

  • #2
    Oliver:
    1) from your post it is not clear why Stata created missing years if they were observed when the dataset was in -wide- format. Answering this question will tell you whether -ipolate- or -mi- is the way to go;
    2) normality is a weak requirement, and it concerns the distribution of the residuals only, not of the variables themselves.
    Kind regards,
    Carlo
    (Stata 19.0)
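
    A minimal sketch of both points, assuming hypothetical variable names (country, year, gini, educ) that do not come from the thread: -ipolate- fills only the gaps between observed years within each country, and the Shapiro-Wilk test is applied to the regression residuals rather than to the raw variables.

    ```stata
    * Hypothetical names: country (string), year, gini, educ
    encode country, generate(cid)

    * Linear interpolation of gini on year within each country;
    * leading and trailing gaps stay missing unless -epolate- is added
    bysort cid (year): ipolate gini year, generate(gini_ip)

    * Check normality of the residuals, not of the variables
    regress gini educ
    predict double res, residuals
    swilk res
    ```

    Interpolation treats the filled-in values as if they were observed, so it is only defensible when the series moves smoothly between the surrounding years.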

    • #3
      Originally posted by Carlo Lazzaro:
      Oliver:
      1) from your post it is not clear why Stata created missing years if they were observed when the dataset was in -wide- format. Answering this question will tell you whether -ipolate- or -mi- is the way to go;
      2) normality is a weak requirement, and it concerns the distribution of the residuals only, not of the variables themselves.

      One of my variables was in long format with missing years; I transposed it to wide and then back to long. Not the proper way to do it, I know, but that's where the missing years came from.

      • #4
        Also, I have read here: https://stefvanbuuren.name/fimd/sec-nonnormal.html that multiple imputation uses a different method when dealing with non-normal data. Am I missing something?

        • #5
          Oliver:
          the same authoritative source also states that -mi- under the normal model is in general robust to departures from normality.
          As far as missing years are concerned, is there any way you can estimate them outside -mi- (e.g., by ruling out clearly absurd values and/or retrieving them from the other variables)?
          If that is not possible, you can go with an unbalanced panel dataset.
          Kind regards,
          Carlo
          (Stata 19.0)
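
          The two routes can be sketched as follows, again with hypothetical variable names (cid as a numeric country identifier, year, gini, educ). The imputation method, the 20 imputations, and the seed are illustrative assumptions, not recommendations made in the thread.

          ```stata
          * Route 1: keep the panel unbalanced; -xtreg- accepts gaps in the years
          xtset cid year
          xtreg gini educ, fe

          * Route 2: multiple imputation by chained equations
          mi set mlong
          mi register imputed gini educ
          * (regress) imputes from a normal linear model;
          * (pmm, knn(5)) could be swapped in if non-normality is a concern
          mi impute chained (regress) gini educ = year, add(20) rseed(12345)
          mi estimate: regress gini educ
          ```

          In practice the imputation model would usually include every variable that appears in the analysis model; the single predictor here only keeps the sketch short.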

          • #6
            Originally posted by Carlo Lazzaro:
            Oliver:
            the same authoritative source also states that -mi- under the normal model is in general robust to departures from normality.
            As far as missing years are concerned, is there any way you can estimate them outside -mi- (e.g., by ruling out clearly absurd values and/or retrieving them from the other variables)?
            If that is not possible, you can go with an unbalanced panel dataset.

            Carlo,

            I have attempted the -mi- approach using -mi estimate- with a linear regression to see what would happen. My results are shown below. I'm not sure how well this model explains my dependent variable because I cannot find an R-squared in the output. Do you know how I can obtain it, or is there an alternative measure of the same thing?


            [Image: mi regression gini.PNG]

            • #7
              Oliver:
              see: Harel, O. (2009). The estimation of R2 and adjusted R2 in incomplete data sets using multiple imputation. Journal of Applied Statistics, 36(10), 1109-1118.
              Kind regards,
              Carlo
              (Stata 19.0)
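
              A rough sketch of the pooling idea usually attributed to that paper: take the square root of R-squared from each imputed dataset, apply Fisher's z transformation, average across imputations, and back-transform. It assumes the data are already -mi set- and imputed, and that gini and educ are hypothetical variable names; check the details against the paper before relying on it.

              ```stata
              * Switch to flong style so each imputation is a full copy marked by _mi_m
              mi convert flong, clear

              * Number of imputations (m = 0 is the original data)
              summarize _mi_m, meanonly
              local M = r(max)

              * Average Fisher's z of sqrt(R-squared) over the imputed datasets
              scalar zbar = 0
              forvalues m = 1/`M' {
                  quietly regress gini educ if _mi_m == `m'
                  scalar zbar = zbar + atanh(sqrt(e(r2))) / `M'
              }

              * Back-transform to the R-squared scale
              display "Pooled R-squared: " tanh(zbar)^2
              ```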
