2x2 tables with multiple imputed data

Jen Bowdoin

Join Date: May 2014

Posts: 7
#1

23 May 2014, 15:40

Does anyone have recommendations for obtaining 2x2 tables with multiply imputed data? tabulate does not work with mi estimate in Stata, but I've seen 2x2 tables in a number of studies conducted on imputed datasets. The literature on multiple imputation also indicates that you can combine parameter estimates and standard errors from the imputed datasets into a single inference using Rubin's rules. That would solve my problem, but I haven't found a way to do it in Stata either. Any suggestions?
Kieran McCaul

Join Date: Apr 2014

Posts: 60
#2

23 May 2014, 16:44

How about using proportion:

Code:

mi estimate, post: proportion var1, over(var2)
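
With over(), the command above reports the proportions of var1 within each level of var2. If the joint proportions of the four cells are wanted instead, one possible sketch is to build a four-level passive variable and estimate its proportions. The names var1, var2 and cell are placeholders, and the coding assumes both variables are 0/1; neither assumption comes from the original post.

```stata
* var1 and var2 are placeholder names, assumed to be coded 0/1.
* cell takes the values 1-4: 1 = (0,0), 2 = (0,1), 3 = (1,0), 4 = (1,1).
mi passive: generate byte cell = 1 + 2*var1 + var2

* Pooled proportion and standard error for each cell of the 2x2 table,
* combined across imputations with Rubin's rules
mi estimate: proportion cell
```

Because cell is created with mi passive, it is recomputed in every imputation from that imputation's values of var1 and var2.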
Jen Bowdoin

Join Date: May 2014

Posts: 7
#3

28 May 2014, 10:40

I'll give it a shot, but I think proportion will give me the percentages in each group and not the counts. I'd like to estimate both.
Jen Bowdoin

Join Date: May 2014

Posts: 7
#4

28 May 2014, 10:47

The only other things I can think of are (1) to run mi xeq for each m and average the results, which is not very practical with 50 imputations, especially since I have multiple imputed variables, or (2) to create a passive variable equal to the average value across m for each imputed variable. I haven't tried the second approach yet, but I think it might be faster than the first if it works. If I'm forced to use the first approach, I'll probably limit the number of imputations I use to create the estimate. I should note that I'm not planning to use the estimates for anything other than descriptive statistics.
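
The per-imputation averaging described above does not have to be done by hand. A minimal do-file sketch, assuming the data are mi set with 50 imputations, that var1 and var2 (placeholder names) are the two table variables, and that all four cells are non-empty in every imputation so that each stored matrix is 2x2:

```stata
* flong style keeps every observation in every imputation and
* provides the imputation index _mi_m
mi convert flong, clear

* Running sum of the 2x2 cell counts
matrix total = J(2, 2, 0)
forvalues m = 1/50 {
    * matcell() stores the cell frequencies for imputation m
    quietly tabulate var1 var2 if _mi_m == `m', matcell(cell)
    matrix total = total + cell
}

* Average cell counts across the 50 imputations
matrix avg = total / 50
matrix list avg
```

The averaged counts are descriptive only; they carry no standard errors, which fits the stated purpose of descriptive statistics.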

Kieran McCaul

Join Date: Apr 2014

Posts: 60
#5

28 May 2014, 15:33

I don't think that the imputed counts are all that meaningful.

Compare this with data from a survey that used a multistage, clustered design: once the data are weighted to reflect the design, the frequency counts are not that informative, because you make inferences about the population from the estimated proportions, not from the frequency counts.

Having said that, if I just run tabulate over the 10 imputed datasets together, the pooled table gives the correct imputed proportions, and if I wanted the imputed frequencies, I could simply divide its frequency counts by 10.

Code:

. tab w2score if _mi_m > 0 & _mi_m <= 10

   Frail at |
     Wave 2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     74,325       72.13       72.13
          1 |     28,725       27.87      100.00
------------+-----------------------------------
      Total |    103,050      100.00

Or I could use mi estimate: proportion and get the same proportions, this time with the correct standard errors. Multiplying the posted proportions by the number of observations then recovers the same counts.

Code:

. mi estimate, post i(1/10): proportion w2score

Multiple-imputation estimates      Imputations     =        10
Proportion estimation              Number of obs   =     10305
                                   Average RVI     =    1.7141
                                   Largest FMI     =    0.6605
                                   Complete DF     =     10304
DF adjustment:   Small sample      DF:     min     =     22.43
                                           avg     =     22.43
Within VCE type:     Analytic              max     =     22.43

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
w2score      |
           0 |   .7212518   .0072766      .7061778    .7363258
           1 |   .2787482   .0072766      .2636742    .2938222
--------------------------------------------------------------

. matrix counts = vecdiag(e(_N)'*e(b))

. matrix list counts

counts[1,2]
     w2score:  w2score:
           0         1
r1    7432.5    2872.5
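
As a quick check that the two routes agree, the arithmetic can be reproduced with display, using only numbers that appear in the output above:

```stata
* Pooled frequency for w2score == 0 divided by the 10 imputations
display 74325 / 10
* Number of observations times the MI proportion for w2score == 0
display 10305 * .7212518
```

Both come to about 7,432.5, the first entry of the counts matrix.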


Jen Bowdoin

Join Date: May 2014

Posts: 7
#6

02 Jun 2014, 11:15

Thanks, Kieran. I'm not quite sure why, but your first suggestion doesn't seem to be working. I have 50 imputations. When I run tab uscY1 if _mi_m>0 & _mi_m<=50, my 2x2 table only includes 12,441 of the 61,489 observations in my dataset. Any sense of why this is happening? I don't have any missing values for the variable, uscY1, in imputations 1 through 50. Does it have something to do with style? My data is wide right now.