Combining estimates and variances from multiple imputed datasets

Kjell Weyde

Join Date: May 2016

Posts: 125
#1

Combining estimates and variances from multiple imputed datasets

09 Nov 2018, 04:26

Hi,
I have 50 imputed datasets, created by the use of mi ice. I now would like to do Bayesian regression analyses on these data. However, since the official bayes commands are not supported by mi estimate, I thought about doing bayesian analyses on each of the 50 datasets, and then combine them in another way. Could I do this by writing a program similar to the one found at stata.com's FAQ, "How can I combine results other than coefficients in e(b) with multiply imputed data?", by Cañette and Marchenko (https://www.stata.com/support/faqs/s...mputed-data/)? If possible, could anyone give me some advice on how to modify the code below, so it can be applied to the following bayesian model:

Code:

bayes, gibbs prior({var}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress y x1 x2 i.x3

Code:

cap program drop eroctab program eroctab, eclass version 12.0 /* Step 1: perform ROC analysis */ args refvar classvar roctab `refvar' `classvar' /* Step 2: save estimate and its variance in temporary matrices*/ tempname b V mat `b' = r(area) mat `V' = r(se)^2 local N = r(N) /* Step 3: make column names and row names consistent*/ mat colnames `b' = AUC mat colnames `V' = AUC mat rownames `V' = AUC /*Step 4: post results to e()*/ ereturn post `b' `V', obs(`N') ereturn local cmd "eroctab" ereturn local title "ROC area" end mi estimate, cmdok: eroctab attack high_bmi

Best regards,
Kjell Weyde
Tags: None

Kjell Weyde

Join Date: May 2016
Posts: 125

12 Nov 2018, 06:48

When I try

Code:

cap program drop ebayes
program ebayes, eclass properties(mi)
 version 15.0
 
 args `y' `xlist'
    bayes, gibbs prior({var}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress `y' `xlist'
 
 tempname b V
 mat `b'=e(mean)
 mat `V'=e(Cov)
 local N=e(N)
 
 mat colnames `b'=bayesmean
 mat colnames `V'=bayesmean
 mat rownames `V'=bayesmean
 
 ereturn post `b' `V', obs(`N')
 ereturn local cmd "regress"
 ereturn local title "Meancoef"'
 ereturn local method "Gibbs sampling"
 ereturn local prefix "bayesian analysis"
end


mi estimate: ebayes outcome "x1 x2 i.x3"

I get the following error message:

Code:

model specification required
an error occurred when bayes executed regress
an error occurred when mi estimate executed ebayes on m=1

What am I doing wrong, and it is possible to do bayesian analyses this way?

Kjell

Comment

Steve Samuels

Join Date: Mar 2014

Posts: 1785
#3

12 Nov 2018, 10:00

I don't have Stata 15, but the likely error is the quotes around the variable list in your ebayes command. Try

Code:

mi estimate: ebayes outcome x1 x2 i.x3

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Kjell Weyde

Join Date: May 2016

Posts: 125
#4

12 Nov 2018, 10:22

Steve, thanks for your reply.
I tried, but got the same error Message.

Kjell Weyde
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1785
#5

12 Nov 2018, 11:44

A second error: the single quotes around "y" and "xlist" in the args statement. Correct with:

Code:

args y xlist

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Kjell Weyde

Join Date: May 2016

Posts: 125
#6

12 Nov 2018, 11:55

I tried, Stata issued:

Code:

invalid syntax an error occurred when mi estimate executed ebayes on m=1
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1785
#7

12 Nov 2018, 12:37

Well, you can try the following mi estimate prefix:

Code:

mi estimate, esampvaryok cmdok:

But I doubt if that will do it either (you would have gotten different error messages). Sorry, but without Stata 15, I can't help you further. I assume that you've run the standalone bayes command without problem.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Kjell Weyde

Join Date: May 2016

Posts: 125
#8

12 Nov 2018, 13:56

Will it be sufficient to do bayesian regression on each of the imputed datasets, and then compute the mean of the means and the mean of the lower and upper credible interval bounds? See for example p 2 of http://www2.stat.duke.edu/~jerry/Papers/tas10.pdf :

When the posterior distribution of the parameter of interest, or, for

likelihood-oriented statisticians, the sampling distribution of the complete-data estimator, is

approximately Gaussian, the analyst can obtain inferences by computing point and variance

estimates of interest with each dataset and combining these estimates using simple formulas.

These formulas serve to propagate the uncertainty introduced by imputation through the

analyst’s inferences.
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1785
#9

12 Nov 2018, 14:12

I'm not knowledgeable enough about this area to say much, but I doubt that the "simple formulas" are simply the averages. I suggest that you look up "Rubin's Rules". Sorry I can't help you further.

Last edited by Steve Samuels; 12 Nov 2018, 14:18.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment

Nikolay Balov (StataCorp)

StataCorp Employee

Join Date: Apr 2015
Posts: 37

#10

14 Nov 2018, 08:02

Hi Kjell,

I see a few syntactical errors in your program. Try the following:

Code:

cap program drop ebayes
program ebayes, eclass properties(mi)
  version 15.0
 
  args y xlist
  bayes, gibbs prior({var}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress `y' `xlist'
 
  tempname b V
  mat `b' = e(mean)
  mat `V' = e(Cov)
  local N = e(N)
 
  mat colnames `b' = `e(parnames)'
  mat colnames `V' = `e(parnames)'
  mat rownames `V' = `e(parnames)'
 
  ereturn post `b' `V', obs(`N')
  ereturn local cmd    = "regress"
  ereturn local title  = "Meancoef"'
  ereturn local method = "Gibbs sampling"
  ereturn local prefix = "bayesian analysis"
end

mi estimate: ebayes outcome "x1 x2 i.x3"

Nikolay

Comment

Kjell Weyde

Join Date: May 2016
Posts: 125

#11

14 Nov 2018, 13:17

Thank you very much, Nikolay Balov! The corrections made the model run (some extra variables added in the model below). Although it says "confidence interval", it is still "credible interval", right? The model output can be interpreted as if it came from a bayesian analysis with non-imputed data?The coef and SE for var seem VERY large, should I worry?

Best,
Kjell

Code:

Multiple-imputation estimates                   Imputations       =         50
Meancoef                                        Number of obs     =        651
                                                Average RVI       =     0.1813
                                                Largest FMI       =     0.1140
DF adjustment:   Large sample                   DF:     min       =   3,803.26
                                                        avg       =   1.59e+08
                                                        max       =   1.18e+09

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
M_5mC_pct    |
       logAs |  -.0107815    .007358    -1.47   0.143    -.0252029    .0036399
       logHg |  -.0017791   .0109609    -0.16   0.871    -.0232622    .0197039
       logCd |   .0093214   .0081135     1.15   0.251    -.0065809    .0252236
       logMn |  -.0030684   .0145324    -0.21   0.833    -.0315515    .0254146
       logPb |  -.0262946   .0138002    -1.91   0.057    -.0533426    .0007535
      Se_std |   .0215154   .0063219     3.40   0.001     .0091248     .033906
      logJod |  -.0046469   .0166121    -0.28   0.780    -.0372063    .0279124
    logFOLAT |   .0218687   .0208464     1.05   0.294    -.0189948    .0627323
  MORS_ALDER |  -.0004277   .0013874    -0.31   0.758     -.003147    .0022915
       _cons |   3.716607   .1101784    33.73   0.000     3.500592    3.932622
-------------+----------------------------------------------------------------
      sigma2 |   .0199093    .001115    17.86   0.000      .017724    .0220947
         var |   2.59e+52   1.29e+55     0.00   0.998    -2.53e+55    2.54e+55
------------------------------------------------------------------------------

Comment

Nikolay Balov (StataCorp)

StataCorp Employee

Join Date: Apr 2015

Posts: 37
#12

14 Nov 2018, 15:45

The parameter var is not used in the likelihood (sigma2 is) and it is only specified by its prior, which is uninformative, hence the huge estimates for it. Just replace var by sigma2 in your prior specification.

Code:

... bayes, gibbs prior({sigma2}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress `y' `xlist'
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1785
#13

15 Nov 2018, 07:33

Kjell Weyde , Thanks for the Zhou and Reiter reference. Unfortunately, the intervals in the MI output are confidence intervals as stated, not credible intervals.

Last edited by Steve Samuels; 15 Nov 2018, 07:35.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Kjell Weyde

Join Date: May 2016

Posts: 125
#14

19 Nov 2018, 04:22

Do you know how I could obtain credible intervals instead of confidence intervals? The "coefs" represent the "mean" from a bayesian analysis?

Kjell
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1785
#15

19 Nov 2018, 08:06

I don't know how to obtain credible intervals, but I guess a solution would require something like the Zhou-Reiter approach. To answer your second question: the displayed coefficients are the means of coefficients from the 10 imputations.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment

Announcement