Multiple imputation + ttest

Morgan Adler

Join Date: Jul 2021

Posts: 2
#1

Multiple imputation + ttest

09 Jul 2021, 16:19

Hi,
I am struggling to find the right code that allows me to do a ttest on imputed data. mi estimate does not work, nor does mi predict. Does anyone know what I can use? Thanks so much
Tags: multiple imputation
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

09 Jul 2021, 16:40

You can emulate a ttest with -regress-. If you run -regress Y i.X-, where X is your grouping variable (coded 0/1) and Y is the outcome variable you're testing for equality across groups, the coefficient of 1.X will be the mean difference in the values of Y between the groups, the standard error will be the standard error of that difference, and the t-statistic and p-value will be exactly what you would have gotten from -ttest Y, by(X)- (apart from the sign of the difference and of t, because -ttest- subtracts group 1 from group 0). So with multiply imputed data, use -mi estimate: regress Y i.X-. If you also want to use something like predict, there is the -mi predict- command.

Added: I'm assuming that you are talking about a two-group t-test, not a paired t-test. The latter would require a somewhat different approach.
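
A minimal sketch of the equivalence described above, not taken from the original post. It assumes the shipped auto dataset for the complete-data comparison and the mheart1s20 example dataset (already mi set, 20 imputations of bmi) for the imputed case; the variable choices are only illustrative.

```stata
* Complete data: -ttest- and -regress- carry out the same test
sysuse auto, clear
ttest mpg, by(foreign)
regress mpg i.foreign        // same |t|, p-value, and residual df as -ttest-

* Multiply imputed data: prefix the same regression with -mi estimate-
* (assumption: mheart1s20 stands in for your own mi set data)
webuse mheart1s20, clear
mi estimate: regress bmi i.hsgrad
```

The coefficient on 1.hsgrad in the last command plays the role of the mean difference, pooled across imputations by Rubin's rules.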
2 likes
Morgan Adler

Join Date: Jul 2021

Posts: 2
#3

09 Jul 2021, 16:45

Thank you! I'm really just trying to get the DFs (I was able to run the t-test in SPSS, but their DFs are messed up). The output I get when I estimate means by grouping variable has three different DFs: average, max, and min. Should I use average?
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

09 Jul 2021, 18:00

Oh, look, you aren't actually doing a t-test here. You're emulating it in a way that, when applied to non-imputed data, replicates the output of the t-test, including the df. But when you're working with multiply imputed data, there is no longer any such thing as "the degrees of freedom." They vary, as you have noted, and there is no real reason to single out one value. That's just part of the limitations of working with MI data.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#5

09 Jul 2021, 20:30

Whenever somebody asks me things like, why do the DF in MI equal what they do, I just say "Because." I know a fair amount about statistics but there are lots of things I am very willing to take on blind faith.

As a sidelight, for a simple linear model like this, you could also use sem with Full Information Maximum Likelihood (FIML). When it works, FIML is often better than MI and will return the same results every time. FIML is certainly a lot simpler than MI. For example,

Code:

webuse mheart2, clear
sem (bmi <- hsgrad), method(mlmv)

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
1 like
Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

09 Jul 2021, 20:57

Clyde got me wondering how you would do a paired t-test using regress. This is one way:

Code:

use https://www3.nd.edu/~rwilliam/statafiles/2sample-IV.dta, clear
ttest hscore = wscore
gen dscore = hscore - wscore
reg dscore

or else maybe

Code:

constraint 1 wscore = 1
cnsreg hscore wscore, constraint(1)

Of course, it is a lot easier just to use the ttest command, but if you want to use mi you may have to figure out alternatives to things like ttest.
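
Carrying the difference-score idea over to multiply imputed data might look like the sketch below. It is only an illustration under stated assumptions: it uses the shipped bpwide dataset rather than the file above, deletes some values artificially to have something to impute, and picks an imputation model (bp_before, sex, agegrp) purely for demonstration.

```stata
* Hypothetical setup: bpwide is complete, so create missing values for illustration
sysuse bpwide, clear
set seed 2021
replace bp_after = . if runiform() < 0.2

* Declare the mi structure and impute the incomplete measurement
mi set wide
mi register imputed bp_after
mi impute regress bp_after bp_before i.sex i.agegrp, add(20) rseed(2021)

* Build the difference as a passive variable so it exists in every imputation
mi passive: generate diff = bp_before - bp_after

* Intercept-only regression on the difference = paired t-test, pooled over imputations
mi estimate: regress diff
```

The constant in the final model is the pooled mean difference, and its t-statistic tests whether that difference is zero.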

daniel klein

Join Date: Mar 2014

Posts: 3860
#7

09 Jul 2021, 21:03

Aside from the more technical questions about DF*, I wonder whether MI (or FIML for that matter) makes a lot of sense in a scenario where we have two variables, one of which is a grouping variable. First, I can hardly imagine a situation in which the probability of the grouping variable being missing depends on the outcome (alone), while the outcome is fully observed. Conversely, I wonder whether a missing outcome can reasonably be assumed to be MAR when the only predictor is the grouping variable. Also, even if the outcome is MAR, the imputation model with one predictor does not add any information and, thus, is likely to merely add noise; I imagine the situation is similar for FIML.

Edit: * The DF for testing a coefficient (which is what the t-test boils down to) in case of MI is documented in the Manuals for mi estimate (p. 66).

Edit2: The option dftable of mi estimate might also be relevant.
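
As a rough illustration of the dftable option, assuming the mheart1s20 example dataset and an arbitrary model (neither comes from this thread), one could inspect the per-coefficient degrees of freedom like this:

```stata
* Assumption: mheart1s20 (already mi set) stands in for your own data
webuse mheart1s20, clear

* dftable shows each coefficient's df in place of the confidence interval
mi estimate, dftable: regress bmi i.hsgrad

* Average, minimum, and maximum df summarized in the output header
display "avg = " e(df_avg_mi) "  min = " e(df_min_mi) "  max = " e(df_max_mi)

* The per-coefficient df themselves
matrix list e(df_mi)
```

Each coefficient gets its own df because the df depend on how much of that coefficient's variance comes from between-imputation variability.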

Last edited by daniel klein; 09 Jul 2021, 21:16.
1 like
Richard Williams

Join Date: Apr 2014

Posts: 5008
#8

09 Jul 2021, 21:28

Daniel makes very good points. Of course, the imputation model for the grouping variable can include many other variables besides the outcome, which might be useful in this case.

1 like