In a dataset of 2014 observations, 8 were missing, then added with multiple imputation. When I run analysis now it shows 2070 observations

Mirjana Grkovska

Join Date: Jan 2018

Posts: 13
#1

02 Jan 2018, 08:33

Hi StataList,

I am doing research on subjective well-being, using cross-sectional survey data for two years (not panel). The integrated dataset has some missing values for my dependent variables, Life Satisfaction (8 missing values) and Happiness (28 missing values). Even though this may not sound like a big number, I am focusing on small groups within my dataset, on each year separately, and every observation matters for the size of my sample. So I decided to proceed with multiple imputation. I followed the steps described in the following book: Mehmet Mehmetoglu and Tor Georg Jakobsen (2016) 'Applied Statistics Using Stata: A Guide for the Social Sciences' and did the process only for Life Satisfaction first. I also compared my commands with some YouTube videos and it all looks good.

The results I am getting for the regression after the imputation are based on the entire sample size, which is 2014. I assume this means that the imputation was correct and successful. However, once I save my dataset and then reopen it to run regressions, it gives me results where the number of observations is 2070, which exceeds the actual size of the sample. How is this possible? Where did I make a mistake? I guess I need to do something different when I am saving my data. Or should I always use 'mi estimate: regress...' even after the imputation was done and the data was saved? I assume there is a way to save the data with the imputed variable as a new dataset where I can then run different analyses without including 'mi estimate' every time.

I really appreciate your taking the time to read my post and offer any suggestions.
Best wishes,
Mirjana
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#2

02 Jan 2018, 08:38

"Or should I always use 'mi estimate: regress...' even after the imputation was done and the data was saved?"

Yes, you should always use the -mi estimate:- prefix when you want an MI analysis.

"I assume there is a way to save the data with the imputed variable as a new dataset where I can then run different analyses without including 'mi estimate' every time."

Nope. You don't need to do the actual imputations again once you've saved the data set, but Stata doesn't do an MI analysis unless you tell it to with the -mi estimate:- prefix, even though the data set was saved with all the multiple imputation results and characteristics.
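
A minimal sketch of this point, assuming a saved, mi set data set. The file name wellbeing_mi and the variables lifesat, age and income are hypothetical placeholders, not names from the thread:

```stata
* Hypothetical file and variable names; substitute your own.
use wellbeing_mi, clear

* Confirm the data are still mi set and see how many imputations are stored.
mi describe

* Pooled MI analysis: the prefix is needed every time, even after saving.
mi estimate: regress lifesat age income

* Without the prefix, Stata treats the stored imputation rows as ordinary
* observations, so the reported N can exceed the original sample size.
regress lifesat age income
```

The second regression is shown only for contrast; its output is not a valid MI analysis.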

All of that said, a case can be made that when only the dependent variable(s) has (have) missing values, multiple imputation is pointless or even inappropriate, regardless of sample size issues.
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#3

02 Jan 2018, 08:41

With multiple imputation you replace each missing value with multiple guesses. So the result is a dataset that has a different structure than "normal" datasets, and you need to tell Stata this. That is what -mi set- does, and -mi estimate- tells Stata to use that information.
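
The different structure can be inspected directly. The sketch below assumes the data in memory are mi set in the mlong or flong style, where the system variable _mi_m exists. The final line is only one arithmetic that would fit the 2070 rows reported above (mlong style with 7 imputations of the 8 incomplete observations); the thread states neither the style nor the number of imputations, so treat that as an assumption and rely on -mi describe- for the actual values:

```stata
* Report the mi style (wide, mlong, flong, flongsep) and the number of imputations.
mi query
mi describe

* In the mlong and flong styles, _mi_m marks which rows are original (0)
* and which belong to imputation 1, 2, ..., M.
tabulate _mi_m

* Only the m = 0 rows are the original observations.
count if _mi_m == 0

* Assumed arithmetic, not taken from the thread: mlong with M = 7 and
* 8 incomplete observations would store 2014 + 7*8 rows.
display 2014 + 7*8
```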

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Mirjana Grkovska

Join Date: Jan 2018

Posts: 13
#4

02 Jan 2018, 08:51

Dear Clyde and Maarten,

Many thanks for your swift replies. From your responses I get that there is no mistake with the imputation, even though I get more observations. Thanks for this clarification!

One more question, if you don't mind. If I want to do anything else with my data and do not want to include -mi estimate:-, then I get 'incorrect' results because of the bigger sample size. Would you suggest that I work with the original dataset, from before the imputation, for any other analysis I perform?

Best wishes,
Mirjana
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#5

02 Jan 2018, 10:25

If you want to do analyses on the data as they were before multiple imputation, there are several approaches, which may differ in their convenience depending on the specific context:

1. Use the original data set instead of the multiply imputed data set.

2. Use the multiply imputed data set and run -mi extract 0-. That command will drop all the imputations and return you to the pre-imputation data.

3. Use the multiply-imputed data set and prefix whatever command you want to run on the unimputed data with -mi xeq 0:-.
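
Approaches 2 and 3 can be sketched as follows. The file name wellbeing_mi and the variables lifesat and year are hypothetical placeholders:

```stata
* Hypothetical file and variable names; substitute your own.
use wellbeing_mi, clear

* Approach 3: run individual commands on the original data (m = 0) only,
* leaving the imputations in memory for later -mi estimate:- runs.
mi xeq 0: summarize lifesat
mi xeq 0: tabulate year

* Approach 2: replace the data in memory with the original, unimputed data.
* -clear- lets Stata replace the data in memory even if they have unsaved changes.
mi extract 0, clear
describe, short
```

After -mi extract 0- the imputations are gone from memory, so it is worth keeping the multiply imputed file on disk and saving the extracted data under a different name.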
Mirjana Grkovska

Join Date: Jan 2018

Posts: 13
#6

02 Jan 2018, 11:21

Hi Clyde,

This is very helpful. Thanks a lot for your help and detailed answers.

Bests,
Mirjana
