
  • In a dataset of 2014 observations, 8 were missing and then filled in with multiple imputation. When I run the analysis now it shows 2070 observations

    Hi StataList,

    I am doing research on subjective well-being, using cross-sectional survey data for two years (not panel). The integrated dataset has some missing values for my dependent variables, Life satisfaction (8 missing values) and Happiness (28 missing values). Even though this may not sound like a big number, I am focusing on small groups from my dataset, on each year separately, and every observation matters for the size of my sample. So I decided to proceed with multiple imputation. I followed the steps described in the following book: Mehmet Mehmetoglu and Tor Georg Jakobsen (2016) 'Applied Statistics Using Stata: A Guide for the Social Sciences' and did the process only for Life Satisfaction first. I also compared my commands with some YouTube videos and it all looks good.

    The results I am getting for the regression after the imputation are based on the entire sample size, which is 2014. I assume this means that the imputation was correct and successful. However, once I save my dataset and then reopen it to run regressions, it gives me results where the number of observations is 2070, which exceeds the actual size of the sample. How is this possible? Where did I make a mistake? I guess I need to do something different when I am saving my data. Or should I always use 'mi estimate: regress...' even after the imputation was done and the data was saved? I assume there is a way to save the data with the imputed variable as a new dataset where I can then run different analyses without including 'mi estimate' every time.


    I really appreciate you taking the time to read my post and offer any suggestions.
    Best wishes,
    Mirjana

  • #2
    Or should I always use 'mi estimate: regress...' even after the imputation was done and the data was saved?
    Yes, you should always use the -mi estimate:- prefix when you want an MI analysis.

    I assume there is a way to save the data with the imputed variable as a new dataset where I can then run different analyses without including 'mi estimate' every time.
    Nope. You don't need to do the actual imputations again once you've saved the data set, but Stata doesn't do an MI analysis unless you tell it to with the -mi estimate:- prefix, even though the data set was saved with all the multiple imputation results and characteristics.
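To illustrate the point about saving and reloading, here is a minimal sketch. The variables y and x, the sample size, the mlong style, and the choice of 5 imputations are all made up for the example; the thread does not say which settings the original poster used.

```stata
* Toy data (hypothetical): 200 observations, 10 missing on the outcome y
clear
set seed 2016
set obs 200
generate x = rnormal()
generate y = 1 + x + rnormal()
replace y = . in 1/10

* Impute once and save the multiply imputed data
mi set mlong
mi register imputed y
mi impute regress y x, add(5)
tempfile imputed
save `imputed'

* Later: reload the saved file; there is no need to impute again
use `imputed', clear
mi query                     // the mi settings were saved with the data
mi estimate: regress y x     // pooled MI analysis over the 5 imputations
```

The imputation step runs once; every later MI analysis on the saved file still needs the -mi estimate:- prefix.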

    All of that said, a case can be made that when only the dependent variable(s) has (have) missing values, multiple imputation is pointless or even inappropriate, regardless of sample size issues.



    • #3
      With multiple imputation you replace each missing value with multiple guesses. So the result will be a dataset that has a different structure than "normal" datasets, and you will need to tell Stata this. That is what the mi set command does, and mi estimate tells Stata to use that information.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
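A small sketch of what that different structure can look like. It assumes the mlong style and 5 imputations on made-up variables y and x; the thread does not state which style or how many imputations were actually used. In mlong style the imputed copies of the incomplete observations are stacked below the original data, so the file has more rows than the sample, and a command without the mi prefix treats all of those rows as ordinary observations.

```stata
* Toy data (hypothetical): 100 observations, 8 missing on the outcome y
clear
set seed 42
set obs 100
generate x = rnormal()
generate y = 1 + x + rnormal()
replace y = . in 1/8

mi set mlong
mi register imputed y
mi impute regress y x, add(5)

count                       // 100 original rows + 5*8 stacked imputed rows = 140
tabulate _mi_m              // _mi_m == 0 marks the original data, 1-5 the imputations
regress y x                 // no mi prefix: uses every non-missing row, 92 + 40 = 132
mi estimate: regress y x    // MI analysis: each imputation has 100 observations
```

An inflated observation count after reloading, like the one described in the first post, is the kind of result this stacking produces when the mi prefix is left off.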



      • #4
        Dear Clyde and Maarten,

        Many thanks for your swift replies. From your responses I get that there is no mistake with the imputation, even though I get more observations. Thanks for this clarification!


        One more question if you don't mind. If I want to do anything else with my data and do not want to include -mi estimate-, then I get 'incorrect' results because of the bigger sample size. Would you suggest that I work with the original dataset from before the imputation for any other analysis I perform?

        Best wishes,
        Mirjana



        • #5
          If you want to do analyses on the data as they were before multiple imputation, there are several approaches, which may differ in their convenience depending on the specific context:

          1. Use the original data set instead of the multiply imputed data set.

          2. Use the multiply-imputed data set and run -mi extract 0-. That command will drop all the imputations and return you to the pre-imputation data.

          3. Use the multiply-imputed data set and prefix whatever command you want to run on the unimputed data with -mi xeq 0:-.
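Approaches 2 and 3 can be sketched as follows. The data, the variable names y and x, the mlong style, and the 5 imputations are invented for the example and are not taken from the original poster's setup.

```stata
* Toy data (hypothetical): 100 observations, 8 missing on the outcome y
clear
set seed 7
set obs 100
generate x = rnormal()
generate y = 1 + x + rnormal()
replace y = . in 1/8
mi set mlong
mi register imputed y
mi impute regress y x, add(5)

* Approach 3: run a single command on the unimputed data (m = 0),
* keeping the imputations in memory for later MI analyses
mi xeq 0: summarize y

* Approach 2: discard the imputations and return to the original data;
* the clear option is needed because the data in memory are unsaved
mi extract 0, clear
count                       // back to the 100 original observations
```

Approach 3 keeps everything in one file, which is convenient when MI and non-MI analyses are mixed in the same do-file. Approach 2 is closer to simply reopening the original data set.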



          • #6
            Hi Clyde,

            This is very helpful. Thanks a lot for your help and detailed answers.

            Bests,
            Mirjana
