  • Multiple Imputation

    Unfortunately, my panel has missing data for a number of countries over a number of years, e.g. Yemen has no recorded GDP data between 86-88. After using Stata's multiple imputation commands to fill in these values, I checked the imputed GDP data and found that some of the values are negative. Can anyone tell me why this would be the case? To be honest, I'm not very sure about the workings of multiple imputation. My commands are as follows:

    *Dealing with missing data by data imputation
    mi set flong
    summarize
    mi misstable nested
    mi misstable patterns
    mi misstable patterns , bypatterns
    mi register imputed rgdpl ki dist reserves
    mi register regular oilp oilc exports PR CL price free partiallyfree notfree
    mi describe
    set seed 2434
    mi impute mvn rgdpl ki dist reserves = oilp oilc exports PR CL price i.free i.partiallyfree i.notfree, add(5)

  • #2
    Well, your imputation method (-mvn-) assumes a multivariate normal distribution, so there is no reason to expect the imputed values to be restricted to positive numbers. Whether that model is appropriate is another question; (lots of) implausible imputed values might (but need not) indicate that it is not.
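    To get a feel for how many implausible values there are, something like the following could be run after the imputation step. This is only a sketch: it assumes the five imputations from the original -mi impute mvn- call exist and that rgdpl is the GDP variable.

    ```stata
    * Count negative GDP values in each completed dataset (m = 1, ..., 5)
    mi xeq 1/5: count if rgdpl < 0

    * Compare the original data (m = 0) with the first imputed dataset
    mi xeq 0 1: summarize rgdpl, detail
    ```

    A handful of negative values is not necessarily a problem for the final estimates; a large share of them would be a reason to look at the imputation model again.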

    If you want the imputed values to be positive (or, more generally, closer to observed values in the dataset), you could use -mi impute chained- and specify predictive mean matching (-pmm-) or another model to impute the values for GDP. Keep in mind, however, that the goal of multiple imputation is (asymptotically) valid inference - not description, which is to say you are not interested in the specific imputed values, but in some estimated parameters.
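    As a rough sketch of what that could look like with the variables from the first post (assuming the data are already -mi set- and registered as shown there; the choice of knn(5) and of using -pmm- for all four variables is an illustration, not a recommendation):

    ```stata
    * Chained equations with predictive mean matching for every imputed variable.
    * knn(5) draws each imputed value from the 5 nearest observed donors,
    * so imputed GDP can never fall outside the range of observed GDP.
    mi impute chained (pmm, knn(5)) rgdpl ki dist reserves ///
        = oilp oilc exports PR CL price i.free i.partiallyfree i.notfree, ///
        add(5) rseed(2434)
    ```

    The /// line continuations only work in a do-file. Because -pmm- borrows observed values as donors, negative GDP cannot be imputed unless negative GDP is observed somewhere in the data.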

    Richard Williams has (besides lots of other very useful material) an introduction to MI using Stata on his homepage: https://www3.nd.edu/~rwilliam/stats2/l13.pdf. In Appendix B he talks about -mi impute chained-. Also see https://www3.nd.edu/~rwilliam/stats2/l14.pdf, where Richard introduces the basic principles underlying multiple imputation.

    Best
    Daniel


    • #3
      Thanks very much for the reply, Daniel, it's very informative. I forgot to mention that the reason I chose the mvn method is that my missing values aren't monotone, e.g. I'm missing data for GDP and investment (both in the same years), but also for distillation capacity and reserves in unrelated years. From my reading circa p.25 (http://www.stata.com/meeting/boston1..._marchenko.pdf) I believe that was the correct method.

      Perhaps you could clear two other things up for me, please? After carrying out the imputation I want to convert the variables to first differences. However, when I try to use the gen command I get error r(5) "not sorted". Would you have any idea what is causing this? I'm not sure in what way the data need to be sorted.

      Also, I'm going to use the panel unit root test developed by Breitung (2000) to test for stationarity. However, once again, when I run the command "xtunitroot breitung rgdpl, robust" the following error appears: r(451) "repeated time values within panel". I've used mi xtset id year, so I'm not sure why this error is occurring.

      Thank you again for your helpful response.

      Best
      Sean


      • #4

        I've just realised what is causing these errors. I have repeated time values because of the data imputation, so Stata can't generate differences or carry out unit root tests. Is there any way of correcting for this?
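        The duplication can be seen directly in the data. A minimal check, assuming the flong style and the id and year variables from -mi xtset id year- (in flong style Stata adds the system variable _mi_m, which is 0 for the original data and 1, ..., M for the imputations):

        ```stata
        * One full copy of the panel per value of _mi_m (0 = original, 1-5 = imputed)
        tabulate _mi_m

        * Each id-year pair therefore shows up M + 1 = 6 times in the stacked data
        duplicates report id year

        * Within a single imputation the pair is unique again
        duplicates report _mi_m id year
        ```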


        • #5
          Sean,

          I was not suggesting that you use predictive mean matching (-pmm-) to impute just one variable (in a monotone missing pattern), but that you combine it with -mi impute chained-, which is an alternative (less theoretically justified, yet more often used) method to deal with non-monotone missing data patterns.

          If you want to use first differences, you are probably best off using the time-series operators (help tsvarlist) directly with the estimation command you intend to use for analysis and letting Stata handle the rest. Creating the variables beforehand might not be trivial in multiply imputed datasets.
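          A minimal sketch of that approach, assuming the panel has been declared with -mi xtset id year-. The model itself (a fixed-effects regression of differenced GDP on differenced investment and oil price) is purely hypothetical and only there to show where the D. operator goes:

          ```stata
          * Declare the panel structure for the mi data
          mi xtset id year

          * D. takes first differences within each panel and each imputation;
          * mi estimate fits the model on every imputed dataset and pools the results
          mi estimate: xtreg D.rgdpl D.ki D.oilp, fe
          ```

          This way no differenced variables need to be created or kept in sync across the imputations.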

          I cannot say much about the stationarity test. xtunitroot seems to store its results in r() and is not supported by Stata's mi suite. You will need to think about how such a test would be performed in multiply imputed data, i.e. which of the estimated quantities to combine and in which way. Perhaps you can find literature on this issue, or you are lucky and someone on the list can give better advice. You can start by reading http://www.stata.com/support/faqs/st...-imputed-data/.
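          If the aim is just to look at the test in each completed dataset, -mi xeq- can run it one imputation at a time, which avoids the repeated time values. This is a sketch only: it assumes five imputations and that the panel settings from -mi xtset id year- are in place, and it does not pool anything.

          ```stata
          * Run the Breitung test separately on each imputed dataset (m = 1, ..., 5).
          * The five sets of results are NOT combined; how to pool them is an open question.
          mi xeq 1/5: xtunitroot breitung rgdpl, robust
          ```

          If the conclusion differs across imputations, that in itself is informative about how sensitive the test is to the imputed values.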

          Best
          Daniel
