  • MI for MAR Panel Data

    Hello,

    I am hoping someone might be able to point me to code that would allow me to perform multiple imputation (mi) for Missing At Random (MAR) panel data.

    Right now, xtreg reduces the number of observations from ~3,400 to ~100 because the pattern of missing data varies across variables; the data are organized by country-year. The number of variables in each regression ranges from 17 to 30. Because of the economic and political nature of the data, about half of those variables are missing 10-20% of the observations that should exist.

    As far as I can tell, mi would be the best option to account for the missing data, though mi is not inherently suited for panel data. In addition to my general hope of finding relevant code, I have a few other questions.
    1. Are there any issues with or solutions to performing mi on variables in longitudinal data? Wide is typically the standard shape for mi.
    2. Is clustering/grouping an option in mi? vce(robust), which should automatically cluster, is used in the regression commands.
    3. Based on the above information, are these basic commands on the right track? Note: when I performed mi on a second variable with a significantly greater amount of missing data than the first, Stata responded to the last command, "mi estimate", with "estimation sample varies between m=1 and m=22." It did not return an error with the first variable.
    Code:
    mi set mlong
    mi register imputed var
    mi impute mvn var, add(10) rseed(1234)
    mi estimate: xtreg vars
    Any answers, advice, or feedback you are willing to provide would be appreciated!

    Thank you.

  • #2
    Originally posted by Viviana Marshall
    when I performed mi on a second variable with a significantly greater amount of missing data than the first, Stata responded to the last command, "mi estimate", with "estimation sample varies between m=1 and m=22." It did not return an error with the first variable.
    I might be misunderstanding what you are doing here. Typically, you want to perform multiple imputation once for all variables with missing values. Imputing one variable after another will lead to datasets that still have missing values and, worse, will not properly account for the correlations between the variables.
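
    A minimal sketch of imputing everything in a single call, assuming hypothetical variable names: gdp and polity are the incomplete regressors, growth (the outcome) and pop are complete, and country is a numeric panel identifier. The fixed-effects model is only a placeholder for whatever xtreg specification is actually used.

    ```stata
    * All variable names here are hypothetical placeholders
    mi set mlong
    mi xtset country year            // country must be numeric (encode it if it is a string)
    mi register imputed gdp polity
    mi register regular growth pop
    * One call imputes gdp and polity jointly, conditioning on the complete variables
    mi impute mvn gdp polity = growth pop, add(10) rseed(1234)
    mi estimate: xtreg growth gdp polity pop, fe
    ```

    Because all incomplete variables are filled in by the same call, every imputed dataset is complete on the model variables, so the estimation sample should no longer differ across imputations.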


    Originally posted by Viviana Marshall
    Are there any issues with or solutions to performing mi on variables in longitudinal data?
    With the standard mi impute commands, you lose the within-panel correlations, which are often very good predictors. If most of your variables are completely missing within a panel, then losing within-panel correlation might not be as relevant (but there are other problems, such as not having any panel-specific information for the missing values). You can get around that by creating lag and lead variables, but I would probably reshape the data into a wide layout and impute the missing values.
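
    The wide-layout route could be sketched as follows, assuming hypothetical stubs gdp and polity observed by country and year, and assuming no other variable names begin with those stubs. This is an illustration rather than a tested recipe for the data in question.

    ```stata
    * Hypothetical stubs gdp and polity; one row per country after reshaping
    reshape wide gdp polity, i(country) j(year)
    mi set wide
    mi register imputed gdp* polity*
    * Each country-year value is now its own variable, so values from other
    * years of the same country act as predictors
    mi impute mvn gdp* polity*, add(10) rseed(1234)
    * Return to the long layout for panel estimation
    mi reshape long gdp polity, i(country) j(year)
    ```

    With many variables and years, the wide dataset may contain more imputation variables than countries, which a multivariate normal model may not be able to handle, so a reduced set of variables or years might be needed.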

    Originally posted by Viviana Marshall
    Is clustering/grouping an option in mi? vce(robust), which should automatically cluster, is used in the regression commands.
    Clustering here probably means accounting for within-panel variation; for that, see my comment above. Note that the vce() option only affects the covariance matrix, which is not the most relevant part during imputation because it does not affect the predicted values (much).*



    * It does affect the predicted values in the sense that the parameters on which the predictions are based are drawn from the (posterior) parameter distribution.


    • #3
      Thank you for helping me understand mi impute better! I've taken your advice and reshaped the data to wide. However, when I run "mi impute" for all variables with missing values, Stata returns error r(2000). What steps could I take to correct this?
      Code:
      import excel "dataset", firstrow
      
      reshape wide vars, i(country) j(year)
      
      mi set wide
      
      mi register imputed vars
      
      mi impute mvn vars, add(10) rseed(1234)
      Performing EM optimization:
      no observations to obtain initial values for EM using available-cases (ac) method
      r(2000):

      error . . . . . . . . . . . . . . . . . . . . . . . . Return code 2000
      no observations
      You have requested some statistical calculation and there are no observations on which to perform it. Perhaps you specified if or in and inadvertently filtered all the data.



      • #4
        This is still true after using:

        Code:
        count if !missing(all vars)
        and
        Code:
        mi impute mvn vars, add(10) rseed(1234) force
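
        With explicit variable names, the complete-case check might look like the sketch below. The names gdp1990, gdp1991, and polity1990 are hypothetical stand-ins for the reshaped variables, and the commands are meant to be run on the wide data before mi set.

        ```stata
        * Hypothetical variable names; run before -mi set-
        egen nmiss = rowmiss(gdp1990 gdp1991 polity1990)
        count if nmiss == 0                      // rows complete on all listed variables
        misstable summarize gdp1990 gdp1991 polity1990
        misstable patterns gdp1990 gdp1991 polity1990
        ```

        A count of zero would mean that no row is complete on every listed variable.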



        • #5
          At this point, you probably need to stop using stylized syntax and show exactly what you typed and exactly what Stata did in response. Also, consider including (a small portion of) your actual data as an example using dataex.
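
          As a rough sketch of what that could look like, assuming hypothetical variable names (country, year, gdp, polity) and a hypothetical log file name:

          ```stata
          * dataex is included in recent Stata releases; on older
          * installations it can be installed with: ssc install dataex
          dataex country year gdp polity in 1/20

          * Capture the exact commands and Stata's output in a plain-text log
          log using mi_session.txt, text replace
          mi set wide
          mi register imputed gdp* polity*
          mi impute mvn gdp* polity*, add(10) rseed(1234)
          log close
          ```

          The dataex output and the log contents can then be pasted into the post as they are.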
