
  • Paired T-test for multiply imputed data

    Dear Stata sages,

    I have conducted multiple imputation using the -mi- command to fill the gaps in a large (N = 3000+) survey dataset. I am using this to run a multivariate regression model (which runs just fine), and also to provide descriptive stats on some peripheral variables. The latter are simple Likert-scale measures (measuring beliefs).

    For these descriptive variables, I would like to indicate in a table which means differ, and for this I would like to conduct paired-sample t-tests. However, -mi estimate- does not support the -ttest- command.

    How could one conduct a paired-samples t-test (comparing the means of two different variables) on multiply imputed data? (There are great suggestions on here for conducting unpaired t-tests with multiple imputation, but I found none for paired t-tests.)

    My deepest thanks for your time!
    Last edited by Chris Reinders Folmer; 15 Dec 2022, 08:28.

  • #2
    You can emulate the paired ttest using -xtreg, fe-. You will first need to reshape your data to long so that your x1 and x2 in -ttest x1 = x2- are a single variable and another variable indicates the 1 vs 2 distinction. -mi estimate- does support -xtreg-. Here's an illustration of the emulation:

    Code:
    . clear*
    
    .
    . set obs 25
    Number of observations (_N) was 0, now 25.
    
    . set seed 1234
    
    . gen long obs_no = _n
    
    . gen z = rnormal()
    
    . forvalues i = 1/2 {
      2.     gen x`i' = z + rnormal(0, 0.2)
      3. }
    
    .
    . ttest x1 = x2
    
    Paired t test
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
          x1 |      25    .0763758    .2087469    1.043734   -.3544566    .5072082
          x2 |      25    .0874277     .198443    .9922152   -.3221386     .496994
    ---------+--------------------------------------------------------------------
        diff |      25    -.011052    .0436359    .2181797   -.1011121    .0790082
    ------------------------------------------------------------------------------
         mean(diff) = mean(x1 - x2)                                   t =  -0.2533
     H0: mean(diff) = 0                              Degrees of freedom =       24
    
     Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
     Pr(T < t) = 0.4011         Pr(|T| > |t|) = 0.8022          Pr(T > t) = 0.5989
    
    .
    . reshape long x, i(obs_no) j(_j)
    (j = 1 2)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations               25   ->   50          
    Number of variables                   4   ->   4          
    j variable (2 values)                     ->   _j
    xij variables:
                                      x1 x2   ->   x
    -----------------------------------------------------------------------------
    
    . xtset obs_no _j
    
    Panel variable: obs_no (strongly balanced)
     Time variable: _j, 1 to 2
             Delta: 1 unit
    
    . xtreg x i._j, fe
    
    Fixed-effects (within) regression               Number of obs     =         50
    Group variable: obs_no                          Number of groups  =         25
    
    R-squared:                                      Obs per group:
         Within  = 0.0027                                         min =          2
         Between =      .                                         avg =        2.0
         Overall = 0.0000                                         max =          2
    
                                                    F(1,24)           =       0.06
    corr(u_i, Xb) = -0.0000                         Prob > F          =     0.8022
    
    ------------------------------------------------------------------------------
               x | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            2._j |    .011052   .0436359     0.25   0.802    -.0790082    .1011121
           _cons |   .0763758   .0308553     2.48   0.021     .0126936    .1400579
    -------------+----------------------------------------------------------------
         sigma_u |  1.0124405
         sigma_e |  .15427638
             rho |  .97730705   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(24, 24) = 86.13                     Prob > F = 0.0000
    That same thing will work in the multiple imputation context.
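    For the -mi- case, the same steps run under the -mi- prefixes. A minimal sketch, assuming the imputed variables are x1 and x2 and the data are already -mi set- (variable names are placeholders for your own):

    Code:
    gen long obs_no = _n
    mi reshape long x, i(obs_no) j(_j)
    mi xtset obs_no _j
    mi estimate: xtreg x i._j, fe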



    • #3
      A paired sample t-test can easily be converted to a single sample test by computing the difference between the two variables and then testing whether the mean is zero. For example, without mi, some ways you could do it are

      Code:
      use "C:\Users\rwilliam\Downloads\2sample-IV.dta", clear
      ttest hscore = wscore
      gen diff = hscore - wscore
      ttest diff = 0
      reg diff
      mean diff
      test diff = 0
      With mi, you have to decide whether to compute diff and then impute it (the "Just Another Variable" approach), or whether to first impute v1 and v2 and then compute the difference between the imputed vars (passive imputation). I discuss the pros and cons of each on pp. 10-11 of

      https://www3.nd.edu/~rwilliam/xsoc73994/MD02.pdf
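      As a minimal sketch of the passive route (assuming hscore and wscore are registered as imputed and the data are already -mi set-), the difference can be created with -mi passive- and then tested under -mi estimate-:

      Code:
      mi passive: gen diff = hscore - wscore
      mi estimate: regress diff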
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Another equivalent modeling option, if of interest.

        Code:
        . mixed x i._j || obs_no : , reml dfmethod(satt) nolog nolrtest
        
        Mixed-effects REML regression                   Number of obs     =         50
        Group variable: obs_no                          Number of groups  =         25
                                                        Obs per group:
                                                                      min =          2
                                                                      avg =        2.0
                                                                      max =          2
        DF method: Satterthwaite                        DF:           min =      24.00
                                                                      avg =      24.28
                                                                      max =      24.56
                                                        F(1,    24.00)    =       0.06
        Log restricted-likelihood = -35.086189          Prob > F          =     0.8022
        
        ------------------------------------------------------------------------------
                   x | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                2._j |    .011052   .0436359     0.25   0.802    -.0790082    .1011121
               _cons |   .0763758   .2036601     0.38   0.711     -.343454    .4962055
        ------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
          Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
        -----------------------------+------------------------------------------------
        obs_no: Identity             |
                          var(_cons) |   1.013135   .2959223       .571536    1.795937
        -----------------------------+------------------------------------------------
                       var(Residual) |   .0238012   .0068708      .0135169    .0419102
        ------------------------------------------------------------------------------
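        Under multiple imputation, the same model could be fit as, for instance (a sketch; -mi estimate- computes its own degrees of freedom, so dfmethod() is omitted here):

        Code:
        mi estimate: mixed x i._j || obs_no : , reml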



        • #5
          Dear all,

          Thank you so much for responding! These are really useful. One follow-up question for Prof. Schechter, if I may:

          I have 13 variables (rather than 2) which all share the same stem: BI_1, BI_2, BI_3, etc. When I run your code, it separates these into 13 groups. I presume -xtreg- (which is currently running) will test for the overall difference between them. That is also useful, but how can I break them down into pairs (BI_1 vs BI_2, etc.)? I can do this manually via the difference scores suggested by Prof. Williams, but I would also like to understand how to do this when using -xtreg-.

          Again many thanks to you all!



          • #6
            Well, if you want all 78 possible pairwise comparisons (I don't know what you'll do with all that but....):
            Code:
            forvalues i = 1/13 {
                forvalues k = `=`i'+1'/13 {
                    xtreg BI i._j if inlist(_j, `i', `k'), fe
                }
            }
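            In the -mi- setting, each fit would carry the -mi estimate- prefix. A sketch, assuming the reshaped long variable is named BI_ (matching the -mi reshape- stub used later in this thread):

            Code:
            forvalues i = 1/13 {
                forvalues k = `=`i'+1'/13 {
                    mi estimate: xtreg BI_ i._j if inlist(_j, `i', `k'), fe
                }
            }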



            • #7
              Clyde’s looping approach could be tweaked to use my suggested methods. But are you sure you really want to do that? If you do, remember that with 78 tests you would expect about 4 to be significant at the .05 level even if the means were all the same.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Helpful, thank you both! The purpose is to indicate in a table (with superscripts) which of the means statistically differ. I will correct for the number of tests.



                • #9
                  Here is a "use at your own risk" quick hack as an alternative:

                  Code:
                    program mi_paired_t , eclass

                        version 16.1

                        // run the paired t-test on the two variables passed to the program
                        ttest `0'

                        tempname b V

                        // collect the difference in means and its squared standard error
                        // as 1x1 coefficient and variance matrices
                        matrix `b' = r(mu_1)-r(mu_2)
                        matrix `V' = r(se)^2

                        matrix rownames `b' = "diff"
                        matrix colnames `b' = "diff"

                        matrix rownames `V' = "diff"
                        matrix colnames `V' = "diff"

                        // post the results as e(b) and e(V) so -mi estimate- can pool them
                        ereturn post `b' `V'

                        ereturn local  cmd    "mi_paired_t"
                        // return the complete-data df so -mi estimate- applies the small-sample correction
                        ereturn scalar df_r = r(df_t)

                    end
                  Define the program in memory, then call as

                  Code:
                  mi estimate , cmdok : mi_paired_t varname == varname
                  Last edited by daniel klein; 16 Dec 2022, 02:38. Reason: added returned DF to force small sample correction and match results from -xtreg- and -regress- (on the calculated difference)
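
                    With the 13 variables from #5, the program could likewise be called once per pair without any reshaping. A sketch, assuming the variables are named BI_1 ... BI_13:

                    Code:
                    forvalues i = 1/13 {
                        forvalues k = `=`i'+1'/13 {
                            mi estimate , cmdok : mi_paired_t BI_`i' == BI_`k'
                        }
                    }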



                  • #10
                    Thanks everyone! This is great. These responses will also be invaluable for anyone else who's facing the same question!



                    • #11
                       By the way, building on the idea of differences (see #3), you can obtain the difference more directly:

                      Code:
                      mi estimate (diff : _b[varname_1] - _b[varname_2]) : mean varname_1 varname_2
                      Using the loop approach from #6, you could build a list of difference expressions, then make one call to mi:

                      Code:
                       mi estimate (diff_1_2 : _b[varname_1] - _b[varname_2]) (diff_1_3 : _b[varname_1] - _b[varname_3]) ... : mean varname_1 varname_2 varname_3 ...
                      No reshaping, no new variables, no ad-hoc programs; it's all readily supported within the mi machinery.

                      Code:
                      forvalues i = 1/13 {
                          forvalues j = `=`i'+1'/13 {
                              local Diff `Diff' (diff_`i'_`j' : _b[BI_`i']-_b[BI_`j'])
                          }
                          local Vars `Vars' BI_`i'
                      }
                      
                      mi estimate `Diff' : mean `Vars'
                      Last edited by daniel klein; 16 Dec 2022, 03:25. Reason: spelled out the loop



                      • #12
                        Again my thanks for all your helpful advice!

                        Might I perhaps ask a final follow-up question about the -xtreg- approach as suggested in #2 by prof. Schechter?

                        I ran this analysis on the 13 BI variables, and also on a second set of (again 13) variables. Curiously, the F test (which I presume is an omnibus test of differences between the 13 means) shows different denominator df for the two analyses: F(12, 39710) = 301.30 vs F(12, 39695) = 592.66. Otherwise, the output refers to the same number of observations (3317) and groups (13). With everything else identical, I can't figure out the reason for the difference in df. Am I missing something here?



                        • #13
                          Are you running this with or without MI? Could missing data account for the discrepancies?
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 19.5 MP (2 processor)

                          EMAIL: [email protected]
                          WWW: https://www3.nd.edu/~rwilliam



                          • #14
                             I do not recall all the details, but in the multiple imputation framework the degrees of freedom depend on the within- and between-imputation variances. These variances are probably different for different sets of variables. The Methods and formulas section of the -mi estimate- documentation provides the details.
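
                             For reference, a sketch of Rubin's large-sample formula, with M imputations, average within-imputation variance $\bar{U}$, and between-imputation variance $B$:

                             $\nu = (M - 1)\left[1 + \frac{\bar{U}}{(1 + M^{-1})\,B}\right]^{2}$

                             so variable sets with different within/between variance ratios get different df. Stata additionally applies the Barnard-Rubin small-sample adjustment when the complete-data degrees of freedom are returned (see the edit note in #9).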



                            • #15
                              With MI, and missing values have been imputed for all these variables. This is the code:

                              Code:
                              gen long obs_no = _n
                              mi reshape long BI_, i(obs_no) j(_j)
                              mi xtset obs_no _j
                              mi estimate: xtreg BI_ i._j, fe
                              and

                              Code:
                              gen long obs_no = _n
                              mi reshape long Advice_, i(obs_no) j(_j)
                              mi xtset obs_no _j
                               mi estimate: xtreg Advice_ i._j, fe
                              I will paste the output later but the analyses need to complete first (reshaping tends to take some time on my computer).
                              Last edited by Chris Reinders Folmer; 16 Dec 2022, 10:35.

