MI for MAR Panel Data

Viviana Marshall

Join Date: Apr 2020

Posts: 7
#1

03 Mar 2021, 19:49

Hello,

I am hoping someone might be able to point me to code that would allow me to perform multiple imputation (mi) for Missing At Random (MAR) panel data.

Right now, xtreg takes the number of observations down from ~3,400 to ~100 because missingness varies across variables in the data, which are organized by country-year. The number of variables in each regression ranges from 17 to 30. Because of the economic and political nature of the data, about half of those variables are missing 10-20% of the total number of observations that should exist.

As far as I can tell, mi would be the best option to account for the missing data, though mi is not inherently suited for panel data. In addition to my general hope of finding relevant code, I have a few other questions.
Are there any issues with or solutions to performing mi on variables in longitudinal data? Wide is typically the standard shape for mi.

Is clustering/grouping an option in mi? vce(robust), which should automatically cluster, is used in the regression commands.

Based on the above information, are these basic commands on the right track? Note: when I performed mi on a second variable with a significantly greater amount of missing data than the first, STATA returned this last "mi estimate" command with "estimation sample varies between m=1 and m=22." It did not return an error with the first variable.

Code:

mi set mlong
mi register imputed var
mi impute mvn var, add(10) rseed(1234)
mi estimate: xtreg vars

Any answers, advice, or feedback you are willing to provide would be appreciated!

Thank you.
Tags: mar, mi impute, multiple imputation, panel data
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

04 Mar 2021, 03:19

Originally posted by Viviana Marshall

when I performed mi on a second variable with a significantly greater amount of missing data than the first, STATA returned this last "mi estimate" command with "estimation sample varies between m=1 and m=22." It did not return an error with the first variable.

I might be misunderstanding what you are doing here. Typically, you want to perform multiple imputation once for all variables with missing values. Imputing one variable after another will lead to datasets that still have missing values and, worse, will not correctly acknowledge the correlations between the variables.
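To illustrate what imputing everything in one go could look like, here is a minimal sketch. It assumes Stata's nlswork example data (loaded with webuse, so it needs internet access) rather than the poster's dataset, and the choice of tenure, hours, and wks_work as the incomplete variables and ln_wage and ttl_exp as complete predictors is only for illustration.

```stata
* Sketch only: nlswork stands in for the real country-year data
webuse nlswork, clear

mi set mlong
* Register every incomplete analysis variable at once
mi register imputed tenure hours wks_work
mi register regular ln_wage ttl_exp

* One joint model for all incomplete variables, complete variables as predictors
mi impute mvn tenure hours wks_work = ln_wage ttl_exp, add(10) rseed(1234)

* Count remaining missing values in the first imputation
mi xeq 1: count if missing(tenure, hours, wks_work)

* Declare the panel structure for mi data and fit the analysis model
mi xtset idcode year
mi estimate: xtreg ln_wage tenure hours wks_work ttl_exp, fe
```

If every variable in the analysis model is either complete or imputed in the same call, each imputed dataset should have the same estimation sample, which is what mi estimate expects.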

Originally posted by Viviana Marshall

Are there any issues with or solutions to performing mi to variables in longitudinal data?

With the standard mi impute commands you are losing the within-panel correlations that are often very good predictors. If most of your variables are completely missing within a panel, then losing within-panel correlation might not be as relevant (but there are other problems, such as not having any panel-specific information for the missing values). You can get around that by creating lag and lead variables but I would probably shape the data into a wide layout and impute the missing values.
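A rough sketch of the wide-layout route, using a small simulated panel so that it runs on its own. All variable names (country, year, y, x) and the data-generating process are made up for illustration; they are not the poster's data.

```stata
* Simulate a hypothetical balanced panel: 50 countries, 4 years
clear
set seed 2021
set obs 50
gen country = _n
gen u = rnormal()                      // country effect, induces within-panel correlation
expand 4
bysort country: gen year = 2000 + _n   // years 2001-2004
gen x = rnormal()
gen y = u + 0.5*x + rnormal()
replace y = . if runiform() < 0.15     // make about 15% of y missing
drop u

* One row per country, one column per variable-year
reshape wide y x, i(country) j(year)

mi set wide
mi register imputed y2001 y2002 y2003 y2004
mi register regular x2001 x2002 x2003 x2004

* Each year's y is imputed using the other years' y and all x's
mi impute mvn y2001 y2002 y2003 y2004 = x2001 x2002 x2003 x2004, add(10) rseed(1234)

* Back to long layout for the panel regression
mi reshape long y x, i(country) j(year)
mi xtset country year
mi estimate: xtreg y x, fe vce(cluster country)
```

In the wide layout, the other years of the same variable enter the imputation model as separate columns, which is how the within-panel correlation gets used. With many years and many variables the number of columns grows quickly, so this may not scale to every dataset.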

Originally posted by Viviana Marshall

Is clustering/grouping an option in mi? vce(robust), which should automatically cluster, is used in the regression commands.

Clustering here probably means accounting for within-panel variation; for that, see my comment above. Note that the vce() option only affects the covariance matrix, which is not the most relevant during imputation because it does not affect the predicted values (much).*

* It does affect the predicted values in the sense that the parameters on which the predictions are based are drawn from the (posterior) parameter distribution.

Viviana Marshall

Join Date: Apr 2020
Posts: 7
#3

06 Mar 2021, 13:40

Thank you for helping me understand mi impute better! I've taken your advice and reshaped the data to wide. However, when I run "mi impute" for all variables with missing values, STATA returns r(2000). What steps could I take to correct this?

Code:

import excel "dataset", firstrow

reshape wide vars, i(country) j(year)

mi set wide

mi register imputed vars

mi impute mvn vars, add(10) rseed(1234)

Performing EM optimization:
no observations to obtain initial values for EM using available-cases (ac) method
r(2000):  error . . . . . . . . . . . . . . . . . . . . . . . . Return code 2000
        no observations
        You have requested some statistical calculation and there are
        no observations on which to perform it.  Perhaps you specified
        if or in and inadvertently filtered all the data.


Viviana Marshall

Join Date: Apr 2020

Posts: 7
#4

06 Mar 2021, 15:45

This is still true after using:

Code:

count if !missing(all vars)

and

Code:

mi impute mvn vars, add(10) rseed(1234) force
daniel klein

Join Date: Mar 2014

Posts: 3850
#5

07 Mar 2021, 04:11

At this point, you probably need to stop using stylized syntax and show exactly what you typed and what exactly Stata did in response. Also, consider including (a small proportion of) your actual data as an example using dataex.
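For reference, a minimal sketch of a dataex call. It uses Stata's nlswork example data as a stand-in, and the variables and the 20-observation range are arbitrary choices for illustration.

```stata
* Stand-in data; replace with the actual dataset in memory
webuse nlswork, clear

* Print the first 20 observations of a few variables as a copy-and-paste data example
dataex idcode year ln_wage tenure in 1/20
```

The output in the Results window can then be pasted into a post so that others can recreate the example data.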