Matching Imputed Datasets

Benedikt Hamacher

Join Date: Jan 2018

Posts: 4
#1

10 Jan 2018, 18:02

Hello everyone,

For my master's thesis I need to analyze data that have been multiply imputed.

So I have five datasets (five different .dta files) which are quite similar but differ in some cells, because those values were imputed. Additionally, I have one further dataset which consists only of many "0" and some "1" values indicating which values were imputed in the other five datasets.

Now my question is: how do I "merge" these datasets, and what specifics do I have to take care of before analyzing my data as usual (running a logistic regression etc.)?

Thanks in advance for every piece of help I get.
Tags: multiple imputation
Richard Williams

Join Date: Apr 2014

Posts: 4987
#2

10 Jan 2018, 18:05

Were the imputed data sets created by Stata? I suspect not, but if so that would make things easier.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Benedikt Hamacher

Join Date: Jan 2018

Posts: 4
#3

10 Jan 2018, 18:18

I honestly don't know that. How can I check that?
Benedikt Hamacher

Join Date: Jan 2018

Posts: 4
#4

11 Jan 2018, 05:24

So far I have been able neither to check whether those datasets were created by Stata nor to solve my initial problem.
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#5

11 Jan 2018, 06:14

If they were created by the official Stata command, there will be variable names starting with "_mi"; are there? Also, generally all imputed sets would be in one file (which it appears they are not). The user-written -ice- command also includes a counter and also generally results in just one file. You should probably read:

Code:

help mi import

to see if that gives you any help in determining what is going on

You don't say where these data came from, but I would find it very surprising if the source gave you no help or information on the imputation process.
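As a rough sketch of that check (the file name imp1.dta is a hypothetical stand-in for one of the five files, not something taken from this thread), one could open a file and ask Stata whether it is mi set and whether any "_mi" variables exist:

```stata
* Sketch only: "imp1.dta" is a hypothetical name for one of the five files.
use imp1, clear

* Reports whether the data in memory are mi set and, if so, in which style.
mi query

* Lists any variables whose names start with _mi. -capture noisily- shows
* the "not found" error without stopping a do-file when none exist.
capture noisily describe _mi*
```

If mi query reports that the data are not mi set and no such variables are listed, the files were probably not saved in one of Stata's own mi styles.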
Richard Williams

Join Date: Apr 2014

Posts: 4987
#6

11 Jan 2018, 07:30

They could have been created by the -mi set flongsep- command, which creates separate files for each imputation, in which case the data are already usable. But then again they may have been created some other way. Like Rich G., I am a little surprised that you don't have any info on how these files were created. I suppose you could just assume flongsep was used and see if it works.

Richard Williams

Join Date: Apr 2014

Posts: 4987
#7

11 Jan 2018, 13:12

What are the actual file names? If flongsep was used, you would have names like

myflongsep.dta
_1_myflongsep.dta
_2_myflongsep.dta
etc.

But when you say "Additionally, I have one further dataset which consists only of many "0" and some "1" values indicating which values were imputed in the other five datasets", that doesn't sound like what Stata would create.

From what you say, you may not have the m=0 file -- the original unimputed data -- which may make the task more difficult but hopefully not impossible.

Richard Williams

Join Date: Apr 2014

Posts: 4987
#8

11 Jan 2018, 13:15

This may help:

https://stats.idre.ucla.edu/stata/fa...-not-included/

But first make sure it isn't already in flongsep format.

Richard Williams

Join Date: Apr 2014

Posts: 4987
#9

12 Jan 2018, 05:26

One other thing: if you open a file and type mi set, it will tell you if the file is mi set and if so how. For example,

Code:

. use "C:\Dropbox\testprogs\_1_myflongsep.dta"
(Fictional heart attack data; bmi missing)

. mi set
data m=1 of flongsep myflongsep

. use "C:\Dropbox\testprogs\myflongsep.dta"
(Fictional heart attack data; bmi missing)

. mi set
data mi set flongsep myflongsep, M = 20
last mi update 11jan2018 14:55:25, approximately 16 hours ago

If you are lucky, everything is already mi set and you are ready to go. If not so lucky, you may have to do something like what was described in the UCLA handout.

Benedikt Hamacher

Join Date: Jan 2018

Posts: 4
#10

13 Jan 2018, 05:18

First of all, thank you for your help!

I found a way to merge all my files into a Stata MI dataset, and the link provided by R. Williams was key to this: with my five imputed datasets and the indicator dataset I was able to restore the original dataset (with missing values). From that point onward I generated an flong dataset myself, and now I'm done with it. Thanks again.
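For anyone with a similar setup, the steps could look roughly like the sketch below. Every name in it is an assumption made for illustration: the files imp1.dta to imp5.dta and flags.dta, the identifier id, the imputed variables income and wealth, and the outcome variable outcome. It also assumes the indicator file uses the same variable names as the data files. Run it as one do-file so the temporary file name stays defined.

```stata
* Sketch under stated assumptions; all file and variable names are hypothetical.

* 1. Rename the 0/1 indicator variables so they do not clash on merge.
use flags, clear
rename (income wealth) (f_income f_wealth)
tempfile flagfile
save `flagfile'

* 2. Rebuild the original (m = 0) data: take one imputed file and set
*    every flagged value back to missing.
use imp1, clear
merge 1:1 id using `flagfile', assert(match) nogenerate
foreach v in income wealth {
    replace `v' = . if f_`v' == 1
}
drop f_income f_wealth
save m0, replace

* 3. Stack m = 0 on top of the five imputed files. generate(m) marks the
*    source: 0 for the data in memory, 1 to 5 for the appended files.
use m0, clear
append using imp1 imp2 imp3 imp4 imp5, generate(m)

* 4. Declare the stacked data as an flong mi dataset and check the result.
mi import flong, m(m) id(id) imputed(income wealth) clear
mi describe

* 5. Fit the model on each imputation and combine the results.
mi estimate: logit outcome income wealth
```

The reconstruction in step 2 matters because mi import flong expects the unimputed data as m = 0, with missing values in the variables declared as imputed.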

Concerning the question of where my data are from: they are from the Munich Center for the Economics of Aging, and the dataset is called SAVE (http://www.mea.mpisoc.mpg.de/index.php?id=315&L=2).

Regarding multiple imputation, their website says:

"Why are there five datasets for each year? Which dataset should I use?
Missing data are imputed in SAVE using a multiple imputation technique. This is a Monte Carlo technique in which the missing values are replaced by m>1 simulated versions. Like in other surveys, such as the Survey of Consumer Finances, in SAVE m is set equal to five. In other words, the whole imputation algorithm is repeated five times, producing the five datasets that are provided to the final user.

To get meaningful results, each of the completed dataset should be analyzed by standard methods, and the results should be combined to produce estimates and confidence intervals that incorporate missing-data uncertainty. Standard errors obtained using only a single dataset are generally too low; furthermore single imputation is more prone to generate biased results. The statistical analysis of a single dataset is, however, good to get confidence with the data and to gather a first idea about magnitude and direction of the estimated effects. To this scope, it is absolutely indifferent which of the five dataset is used.

Rubin, D.B. (1996) “Multiple Imputation After 18+ Years” Journal of the American Statistical Association, 91(434), pp. 473-489 explains how to combine the results obtained from the separate analysis of the five datasets."
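For reference, the combining rules usually attributed to Rubin can be sketched as follows. The notation is an assumption made here and is not taken from the SAVE website: \hat{Q}_i is the point estimate and U_i its estimated variance from the i-th completed dataset, with m = 5 for SAVE.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
\qquad \text{(combined point estimate)}

\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i
\qquad \text{(within-imputation variance)}

B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i - \bar{Q}\bigr)^{2}
\qquad \text{(between-imputation variance)}

T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
\qquad \text{(total variance of } \bar{Q}\text{)}
```

The standard error of the combined estimate is the square root of T. In Stata, the mi estimate prefix applies these rules once the data are mi set, so they rarely need to be computed by hand.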