multiple mi imputes in one dataset

Sara Zakaryan

Join Date: Mar 2016

Posts: 30
#1


21 Mar 2016, 10:01

Dear all,

Can you help me please with an issue with multiple imputations? I have different variables that have missing values and I am trying to fill them with mi. I have got a question regarding the variables that should be chosen for regression with mi impute.
If I impute one variable with

mi impute monotone (regress) X1 = Y X2 X3 X4

(X2, X3, and X4 have no missing values and explain the dependent variable well, which is why I chose them; Y is the dependent variable)

For the next variable that I want to impute, do I need to use the same Y X2 X3 X4, or can I also include, for example, the already imputed X1? I guess the error would get bigger if I include it, right?

Can you please help
Thanks in advance
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

22 Mar 2016, 05:18

According to what we read here (http://www.stata.com/manuals13/mimii...imputemonotone), I gather that, provided there is a monotone pattern for all the to-be-imputed variables, you may include these variables in the first half of the command, or apply a sequence of commands which "contemplates" previously imputed variables.

Best regards,

Marcos
daniel klein

Join Date: Mar 2014

Posts: 3850
#3

22 Mar 2016, 06:03

I have never used mi impute monotone, but I understand that you set up one model in this case, meaning you call mi impute exactly once. From your description it sounds like you are planning to call mi impute more than once. How exactly do you expect this to work, given that each call will create m complete datasets?

Read the manual examples and explanations on standard syntax. I guess instead of what I suppose you have in mind

Code:

mi impute monotone (regress) X1 = Y X2 X3 X4 // , add(m)
mi impute monotone (regress) NEXT = X1 Y X2 X3 X4 , add(m) // <- this looks suspicious

you want

Code:

mi impute monotone (regress) X1 NEXT = X2 X3 X4 , add(m)

I would go with mi impute chained here, anyway. Let Stata figure out the pattern of missing values and the best sequence to perform the imputations.
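A minimal sketch of the single-call approach, with Y kept among the predictors as later replies in this thread point out. The simulated data, the seeds, and add(20) are illustrative assumptions and do not come from the thread; only the variable names X1, NEXT, Y, X2, X3, X4 are taken from the posts.

```stata
* Simulated data purely for illustration (not from the thread)
clear
set seed 2016
set obs 500
generate X2 = rnormal()
generate X3 = rnormal()
generate X4 = rnormal()
generate Y  = X2 + X3 + X4 + rnormal()
generate X1 = 0.5*Y + X2 + rnormal()
generate NEXT = 0.5*X1 + X3 + rnormal()

* Build a monotone pattern: X1 is missing only where NEXT is missing too
replace NEXT = . if runiform() < 0.3
replace X1   = . if missing(NEXT) & runiform() < 0.5

* Declare the mi style and the variables to be imputed
mi set wide
mi register imputed X1 NEXT

* One call imputes both variables; Y is on the right-hand side
mi impute monotone (regress) X1 NEXT = Y X2 X3 X4, add(20) rseed(12345)

* Alternative if the pattern is not monotone (run instead of the line above):
* mi impute chained (regress) X1 NEXT = Y X2 X3 X4, add(20) rseed(12345)
```

Because both variables are imputed in one call, every one of the 20 imputed datasets contains completed values of X1 and NEXT, which is what mi estimate expects later on.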

Best
Daniel
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

22 Mar 2016, 08:20

The second option is provided in the example 4 of the text I shared in #2.

Best regards,

Marcos
Richard Williams

Join Date: Apr 2014

Posts: 4992
#5

22 Mar 2016, 09:45

As a sidelight, if you are going to use regress in the imputation AND in the estimation command (you don't say) then I would seriously consider using the sem command with the mlmv option. It is a lot simpler and less convoluted than using MI. See pgs. 29-32 of http://www3.nd.edu/~rwilliam/xsoc73994/MD02.pdf.

Maybe I am missing something but in Daniel's preferred coding shouldn't Y be included as a regressor?
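For readers who have not seen it, a minimal sketch of the sem alternative with method(mlmv), next to plain regress for comparison. The data are simulated for illustration only. The variable names are lowercase on purpose, because sem by default treats names starting with a capital letter as latent variables.

```stata
* Simulated data purely for illustration (not from the thread)
clear
set seed 2016
set obs 500
generate x2 = rnormal()
generate x3 = rnormal()
generate x4 = rnormal()
generate x1 = x2 + rnormal()
generate y  = x1 + x2 + x3 + x4 + rnormal()
replace x1 = . if runiform() < 0.3

* Listwise deletion: regress drops every observation with missing x1
regress y x1 x2 x3 x4

* FIML: method(mlmv) keeps those observations, assuming joint normality
sem (y <- x1 x2 x3 x4), method(mlmv)
```

There is no separate imputation step and no seed to set, which is the simplicity being argued for here; the price is the joint normality assumption.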

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
daniel klein

Join Date: Mar 2014

Posts: 3850
#6

22 Mar 2016, 10:18

Maybe I am missing something but in Daniel's preferred coding shouldn't Y be included as a regressor?

Yes, it should definitely be included. Thanks for spotting this.

As an aside, I do not fully follow Allison in generally promoting FIML. I have some doubts about all three claimed main advantages.

Claim 1.

ML is simpler to implement (if you have the right software)

In my view this depends very much on details of the analysis. Personally, I find it kind of cumbersome to do even simple things like interactions in SEM like models (this is the purely technical view). Also, multilevel structures are arguably not as easily implemented. The same goes for non-linear models. I do see the general idea here, though, and I must admit that I have never used gsem.

Claim 2.

Unlike multiple imputation, ML has no potential incompatibility between an imputation model and an analysis model

True. But unlike MI, if you get your FIML "imputation model" wrong, the "analysis model" will be wrong too. This is obvious, as there is only one model. Splitting the process, as MI does, makes the latter more robust with regard to the chosen imputation model. Allison (2001, p. 32) himself makes this point in an earlier work. Given the views expressed in the Stata manual:

MLMV takes the assumption of joint normality seriously in most cases. If your observed variables do not follow a joint normal distribution, you will be better off using ML, QML, or ADF and simply omitting observations with missing values.

I would also be a bit more reluctant to play down the normality assumption with this method. Last, I can see situations where you might want quite a number of predictor variables in your imputation model while not cluttering your analysis model with these variables at the same time.

Claim 3.

ML produces a deterministic result rather than a different result every time.

I have yet to see why this should be an advantage or a disadvantage.

Best
Daniel

Allison, P. (2001). Missing Data. Thousand Oaks: Sage.

Last edited by daniel klein; 22 Mar 2016, 10:22.
Sara Zakaryan

Join Date: Mar 2016

Posts: 30
#7

22 Mar 2016, 13:00

Thank you all for your comments!
I thought about considering ML as well, but I am new to imputation, so I am trying to see what might fit my dataset better.
As Richard noted, I should give some more details about the analysis that I want to do.

After the imputations I will fit a logit model with mi estimate. The dataset has 100+ variables and most of them have missing values. But I noticed an interesting pattern of missingness: 10 of my variables are each missing 39840 values out of 91321, and another 20 variables are each missing 39787 values out of 91321. As the ids are people, I thought there might be a similar pattern within those groups, where the same information is missing for all.
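One way to check whether such groups really share a missingness pattern, and whether one group's pattern is nested in the other's, is Stata's misstable command. This is a minimal sketch on simulated data; the variable names g1_a, g1_b, and g2_a are hypothetical stand-ins for members of the two groups described above.

```stata
* Simulated data purely for illustration (not the poster's dataset)
clear
set seed 2016
set obs 1000
generate g1_a = rnormal()
generate g1_b = rnormal()
generate g2_a = rnormal()

* Group 1 variables are missing for the same people;
* the group 2 variable is missing for a subset of them
generate byte miss1 = runiform() < 0.4
replace g1_a = . if miss1
replace g1_b = . if miss1
replace g2_a = . if miss1 & runiform() < 0.95

* Counts of missing values per variable
misstable summarize g1_a g1_b g2_a

* Which combinations of missing and observed values occur
misstable patterns g1_a g1_b g2_a

* Whether missingness in one variable implies missingness in another
misstable nested g1_a g1_b g2_a

* Direct check of nesting: g2_a is never missing where g1_a is observed
count if missing(g2_a) & !missing(g1_a)
```

If the nesting holds across all variables to be imputed, the pattern is monotone and mi impute monotone applies; otherwise mi impute chained is the more general choice.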

That is why I thought of having separate mi commands for those groups.

Also, Daniel, I wanted to ask about the add() option in your command. I can't find the optimal number to choose for it. I read in discussions that for about 20% missingness in a dataset 3 imputations are enough, but I am not sure about it. Can you please comment on this as well? How do I choose the number if about 33% of the values in my dataset are missing?

Thanks,
Sara
Sara Zakaryan

Join Date: Mar 2016

Posts: 30
#8

22 Mar 2016, 13:17

Richard,

Thank you for the link and helpful material posted.
Paul Allison notes that

For logistic regression and Cox regression, the only commercial package that does ML for missing data is Mplus.

So the sem option deals only with linear models, and I can't use it for logistic regression.

The main aim of my work is to fit a logistic model and obtain its predictions.
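For the MI route with a logistic analysis model, a minimal sketch of the impute-then-estimate sequence. The data are simulated, and the variable names, add(20), and the seeds are illustrative assumptions rather than anything from the actual dataset. Note that the outcome Y is included in the imputation model.

```stata
* Simulated data purely for illustration (not the poster's dataset)
clear
set seed 2016
set obs 1000
generate X2 = rnormal()
generate X3 = rnormal()
generate X1 = 0.5*X2 + rnormal()
generate byte Y = runiform() < invlogit(0.5*X1 + X2 - X3)
replace X1 = . if runiform() < 0.3

* Declare the mi style and the variable to be imputed
mi set wide
mi register imputed X1

* Imputation model: linear regression of X1 on the outcome and covariates
mi impute regress X1 Y X2 X3, add(20) rseed(12345)

* Analysis model: logit fitted in each imputed dataset, results combined
mi estimate: logit Y X1 X2 X3
```

The imputation model and the analysis model are specified separately here, so extra auxiliary predictors could be added to the mi impute line without appearing in the logit.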
Richard Williams

Join Date: Apr 2014

Posts: 4992
#9

22 Mar 2016, 13:33

I agree with many of Daniel's concerns. There is a reason FIML is not used more widely. But, in those situations where it is legit to use it, I think it has a lot of advantages. In the original example, regress was being used for all the imputations and if it was also being used for the estimation FIML's assumptions would seem to be fine (or at least as legitimate as what MI did). With regards to the imputation model being wrong, you can augment with auxiliary variables; Acock discusses this in his book http://www.stata.com/bookstore/disco...ng-using-stata. MI is a pain (and as this thread suggests, sometimes confusing!) and it is nice to have answers that don't change every time you change the seed.

MPlus has somehow magically found ways to make FIML useful in even more situations (e.g. with logit models). If what it is doing is legitimate I would like to see Stata add those abilities as well.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
