
  • Multiple imputation + ttest

    Hi,
    I am struggling to find code that lets me run a t-test on multiply imputed data. -mi estimate- does not work with -ttest-, nor does -mi predict-. Does anyone know what I can use? Thanks so much!

  • #2
    You can emulate a t-test with -regress-. If you run -regress Y i.X-, where X is your grouping variable (coded 0/1) and Y is the outcome variable you're testing for equality across groups, the coefficient of 1.X will be the difference in the mean of Y between the groups (group 1 minus group 0), its standard error will be the standard error of that difference, and the df and p-value will be exactly what you would have gotten from -ttest Y, by(X)-. The t-statistic is the same except for its sign, because -ttest- reports mean(0) - mean(1). So with multiply imputed data, use -mi estimate: regress Y i.X-. If you also want to use something like predict, there is the -mi predict- command.
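A minimal check of this equivalence on complete data, using Stata's bundled auto dataset as an assumed stand-in (Y = price, X = foreign). If the claim holds, the coefficient on 1.foreign, its standard error, the residual df, and the p-value from -regress- match the -ttest- output, and the t-statistic differs only in sign.

```stata
* Stand-in data (an assumption): price is the outcome Y, foreign is the 0/1 group X
sysuse auto, clear

* Classic two-sample t-test (equal variances, the -ttest- default)
ttest price, by(foreign)

* Same test as a regression: _b[1.foreign] = mean(foreign==1) - mean(foreign==0)
regress price i.foreign
```

With multiply imputed data, only the second command changes, to -mi estimate: regress price i.foreign-.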

    Added: I'm assuming that you are talking about a two-group t-test, not a paired t-test. The latter would require a somewhat different approach.



    • #3
      Thank you! I'm really just trying to get the DFs (I was able to run the t-test in SPSS, but its DFs are messed up). The output I get when I estimate the means by grouping variable has three different DFs: average, max, and min. Should I use the average?



      • #4
        Look, you aren't actually doing a t-test here. You're emulating it in a way that, when applied to non-imputed data, replicates the output of the t-test, including the df. But when you're working with multiply imputed data, there is no longer any such thing as "the degrees of freedom." They vary, as you have noted, and there is no real reason to single out one value. That's just one of the limitations of working with MI data.
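For reference, here is a sketch of where a per-coefficient MI df comes from, assuming Stata's documented default for -mi estimate- with -regress-: Rubin's combining rules plus the Barnard and Rubin small-sample adjustment (check the [MI] manual for the exact definitions Stata uses). For one coefficient Q (here, the group difference) estimated in each of M imputed datasets with estimate \hat{Q}_m and squared standard error U_m, and with complete-data residual df \nu_c:

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2,\\
T &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B, \qquad
t = \frac{\bar{Q}}{\sqrt{T}},\\
r &= \frac{(1+1/M)\,B}{\bar{U}}, \qquad
\nu_{\text{large}} = (M-1)\Bigl(1+\frac{1}{r}\Bigr)^{2},\\
\hat{\gamma} &= \frac{(1+1/M)\,B}{T}, \qquad
\nu_{\text{obs}} = \frac{\nu_c+1}{\nu_c+3}\,\nu_c\,(1-\hat{\gamma}), \qquad
\nu = \Bigl(\frac{1}{\nu_{\text{large}}}+\frac{1}{\nu_{\text{obs}}}\Bigr)^{-1}.
\end{aligned}
```

Because B and \bar{U} differ from coefficient to coefficient, so does \nu. As far as I can tell, the min/avg/max in the output header summarize these values across all coefficients in the model (here, the group coefficient and the constant), so the df that goes with the emulated t-test would be the one for the group coefficient.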



        • #5
          Whenever somebody asks me things like why the DF in MI equal what they do, I just say, "Because." I know a fair amount about statistics, but there are lots of things I am very willing to take on blind faith.

          As a sidelight, for a simple linear model like this, you could also use -sem- with full information maximum likelihood (FIML). When it works, FIML is often better than MI and, because it involves no random draws, will return the same results every time. FIML is certainly a lot simpler than MI. For example:

          Code:
          webuse mheart2, clear
          * method(mlmv): ML with missing values (FIML), using all observed data
          sem (bmi <- hsgrad), method(mlmv)
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Clyde got me wondering how you would do a paired t-test using regress. This is one way:

            Code:
            use https://www3.nd.edu/~rwilliam/statafiles/2sample-IV.dta, clear
            ttest hscore = wscore
            * Regression version: the constant is the mean of the paired differences
            gen dscore = hscore - wscore
            reg dscore
            Or else, maybe:

            Code:
            * Fix the slope on wscore at 1, so _cons estimates mean(hscore - wscore)
            constraint 1 wscore = 1
            cnsreg hscore wscore, constraint(1)
            Of course, it is a lot easier just to use the ttest command, but if you want to use mi you may have to figure out alternatives to things like ttest.
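A hedged sketch of the difference-score approach with -mi-. It assumes, for illustration only, that some wscore values are missing (the code deletes every fifth value on purpose) and that hscore is complete; the imputation model, number of imputations, and seed are arbitrary choices, not recommendations.

```stata
use https://www3.nd.edu/~rwilliam/statafiles/2sample-IV.dta, clear

* ILLUSTRATION ONLY: create artificial missingness in wscore
recast double wscore
replace wscore = . if mod(_n, 5) == 0

* Declare the data as mi, impute wscore from hscore (assumed complete)
mi set mlong
mi register imputed wscore
mi register regular hscore
mi impute regress wscore hscore, add(20) rseed(2021)

* Difference score is recomputed within each imputed dataset
mi passive: generate dscore = hscore - wscore

* Constant = mean difference; its test is the MI analogue of the paired t-test
mi estimate: regress dscore
```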


            • #7
              Aside from the more technical questions about DF*, I wonder whether MI (or FIML for that matter) makes a lot of sense in a scenario where we have two variables, one of which is a grouping variable. First, I can hardly imagine a situation in which the probability of the grouping variable being missing depends on the outcome (alone), while the outcome is fully observed. Conversely, I wonder whether a missing outcome can reasonably be assumed to be MAR when the only predictor is the grouping variable. Also, even if the outcome is MAR, the imputation model with one predictor does not add any information and, thus, is likely to merely add noise; I imagine the situation is similar for FIML.


              Edit: * The DF for testing a coefficient (which is what the t-test boils down to) in the case of MI is documented in the manual entry for mi estimate (p. 66).

              Edit2: The dftable option of mi estimate might also be relevant.
              Last edited by daniel klein; 09 Jul 2021, 21:16.
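A sketch of how -dftable- fits with the regression approach to the two-group comparison. The data are Stata's auto dataset with artificial missingness (every 10th price is deleted on purpose), and mpg and weight are assumed auxiliary variables that enter the imputation model only, so the imputations use more than the outcome and the grouping variable. The resulting table shows a separate df for the coefficient on 1.foreign, the one that corresponds to the emulated t-test.

```stata
sysuse auto, clear

* ILLUSTRATION ONLY: create artificial missingness in the outcome
recast double price
replace price = . if mod(_n, 10) == 0

* Imputation model uses the group plus assumed auxiliary variables mpg and weight
mi set mlong
mi register imputed price
mi impute regress price foreign mpg weight, add(20) rseed(12345)

* Analysis model is the two-group comparison; dftable lists the df per coefficient
mi estimate, dftable: regress price i.foreign
```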



              • #8
                Daniel makes very good points. Of course, the imputation model for the grouping variable can include many other variables besides the outcome, which might be useful in this case.
