Multiple Imputation

Sean O'Connor

Join Date: Jun 2014

Posts: 119
#1

19 Jun 2014, 03:20

Unfortunately, my panel has missing data for a number of countries over a number of years, e.g. Yemen has no recorded GDP data between 1986 and 1988. After using Stata's multiple imputation commands to deal with this, I checked the imputed GDP data and saw that the values are negative. Can anyone tell me why this would be the case? To be honest, I'm not very sure about the workings of multiple imputation. My commands are as follows:

*Dealing with missing data by data imputation
mi set flong
summarize
mi misstable nested
mi misstable patterns
mi misstable patterns , bypatterns
mi register imputed rgdpl ki dist reserves
mi register regular oilp oilc exports PR CL price free partiallyfree notfree
mi describe
set seed 2434
mi impute mvn rgdpl ki dist reserves = oilp oilc exports PR CL price i.free i.partiallyfree i.notfree, add(5)
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

19 Jun 2014, 04:43

Well, your imputation method (-mvn-) assumes a multivariate normal distribution, which is unbounded, so there is no reason to expect the imputed values to be restricted to positive numbers. Whether that model is appropriate is another question; (lots of) implausible imputed values might (but need not) indicate that it is not.

If you want the imputed values to be positive (or, more generally, closer to the observed values in the dataset), you could use -mi impute chained- and specify predictive mean matching (-pmm-) or another model to impute the values for GDP. Keep in mind, however, that the goal of multiple imputation is (asymptotically) valid inference, not description. That is to say, you are not interested in the specific imputed values, but in some estimated parameters.
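As a rough sketch of what that could look like with the variables from post #1 (the choice of knn(5) is only an illustrative assumption, and whether -pmm- suits every one of the four variables is something to check against the data):

```stata
* Sketch only, to be run from a do-file (/// continues a line).
* Variable names are taken from post #1; knn(5) is an assumed, illustrative value.
mi set flong
mi register imputed rgdpl ki dist reserves
mi register regular oilp oilc exports PR CL price free partiallyfree notfree

* Chained equations with predictive mean matching for each incomplete variable.
* pmm fills in values borrowed from observed cases, so imputed GDP stays
* within the range of the observed GDP values.
mi impute chained (pmm, knn(5)) rgdpl ki dist reserves ///
    = oilp oilc exports PR CL price i.free i.partiallyfree i.notfree, ///
    add(5) rseed(2434)

* Inspect the range of GDP within each of the five imputations
mi xeq 1/5: summarize rgdpl
```

Because -pmm- draws from observed values, the minimum reported by summarize should not fall below the smallest observed GDP value.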

Richard Williams has (besides lots of other very useful material) an introduction to MI using Stata on his homepage (https://www3.nd.edu/~rwilliam/stats2/l13.pdf); in Appendix B he talks about -mi impute chained-. Also see https://www3.nd.edu/~rwilliam/stats2/l14.pdf, where Richard introduces the basic principles underlying multiple imputation.

Best
Daniel
Sean O'Connor

Join Date: Jun 2014

Posts: 119
#3

19 Jun 2014, 05:16

Thanks very much for the reply, Daniel, it's very informative. I forgot to mention that the reason I chose the mvn method was that my missing values aren't monotone, e.g. I'm missing data for GDP and investment (both in the same years), but also for distillation capacity and reserves in unrelated years. From my reading, circa p.25 (http://www.stata.com/meeting/boston1..._marchenko.pdf), I believe that was the correct method.

Perhaps you could clear two other things up for me, please? After carrying out the imputation, I want to convert the variables to first differences. However, when I try to use the gen command I get error r(5) "not sorted". Would you have any idea what is causing this? I'm not sure in what way the data need to be sorted.

Also, I'm going to use the panel unit root test developed by Breitung (2000) to test for stationarity. Once again, when I run the command "xtunitroot breitung rgdpl, robust", an error appears: r(451) "repeated time values within panel". I've used mi xtset id year, so I'm not sure why this error is occurring.

Thank you again for your helpful response.

Best
Sean
Sean O'Connor

Join Date: Jun 2014

Posts: 119
#4

19 Jun 2014, 05:32

I've just realised what is causing both errors in my previous post. I have repeated time values within each panel because the imputed copies of the data are stored alongside the original observations, so Stata can't generate differences or carry out unit root tests. Is there any way of correcting for this?
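For anyone wanting to confirm this in their own data, a minimal check might look like the following. It assumes the data are mi set in the flong style, as in post #1, where the system variable _mi_m indexes the original data (0) and each imputation (1 to 5):

```stata
* Sketch only: assumes flong style and the panel identifiers id and year.
* One block of rows for the original data plus one per imputation
tabulate _mi_m

* id-year pairs are unique within an imputation
* (isid exits with an error if they are not)...
isid _mi_m id year

* ...but each pair is repeated across the stacked copies
duplicates report id year
```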
daniel klein

Join Date: Mar 2014

Posts: 3850
#5

19 Jun 2014, 06:09

Sean,

I was not suggesting that you use predictive mean matching (-pmm-) to impute just one variable (in a monotone missing pattern), but that you combine it with -mi impute chained-, which is an alternative (less theoretically justified, yet more often used) method for dealing with non-monotone missing data patterns.

If you want to use first differences, you are probably best off using the time-series operators (help tsvarlist) directly with the estimation command you intend to use for analysis and letting Stata handle the rest. Creating the variables beforehand might not be trivial in multiply imputed datasets.
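A minimal sketch of that approach is given here. The regression itself is purely hypothetical (the thread never states the analysis model); only the variable names and the mi xtset declaration come from the earlier posts:

```stata
* Sketch only: the model below is a made-up example, not the poster's analysis.
* Declare the panel structure through mi so it applies within each imputation
mi xtset id year

* D. takes first differences on the fly within each imputation;
* mi estimate then pools the coefficients and standard errors across imputations
mi estimate: regress D.rgdpl D.ki D.oilp, vce(cluster id)
```

This avoids generating differenced variables by hand in the stacked dataset, which is where the r(5) "not sorted" error arose.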

I cannot say much on the stationarity test. xtunitroot seems to store its results in r() and is not supported by Stata's mi suite. You will need to think about how such a test would be performed in multiply imputed data, i.e. which of the estimated quantities to combine and in which way. Perhaps you can find literature on this issue, or you are lucky and someone on the list can give better advice. You can start by reading http://www.stata.com/support/faqs/st...-imputed-data/.
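If you only want to see how the test behaves in a single completed dataset, something along these lines might do as an exploratory step. It does not pool anything across imputations, so it is not a valid MI test, and it assumes the completed panel is strongly balanced:

```stata
* Sketch only: exploratory look at one imputation, NOT a pooled MI result.
preserve

* Replace the data in memory with imputation 1 as an ordinary dataset
mi extract 1, clear

* Now each id-year pair occurs once, so the panel can be declared as usual
xtset id year
xtunitroot breitung rgdpl, robust

restore
```

Repeating this for each imputation would at least show whether the conclusion is sensitive to the imputed values.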

Best
Daniel