  • Combining estimates and variances from multiply imputed datasets

    Hi,
    I have 50 imputed datasets, created with mi ice. I would now like to run Bayesian regression analyses on these data. However, since the official bayes commands are not supported by mi estimate, I thought about running the Bayesian analysis on each of the 50 datasets and then combining the results in another way. Could I do this by writing a program similar to the one in the stata.com FAQ "How can I combine results other than coefficients in e(b) with multiply imputed data?" by Cañette and Marchenko (https://www.stata.com/support/faqs/s...mputed-data/)? If possible, could anyone give me some advice on how to modify the code below so that it can be applied to the following Bayesian model:
    Code:
    bayes, gibbs prior({var}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress y x1 x2 i.x3
    Code:
    cap program drop eroctab
    program eroctab, eclass
            version 12.0
    
            /* Step 1: perform ROC analysis */
            args refvar classvar
            roctab `refvar' `classvar'
    
            /* Step 2: save estimate and its variance in temporary matrices*/
            tempname b V
            mat `b' = r(area)
            mat `V' = r(se)^2
            local N = r(N)
    
            /* Step 3: make column names and row names consistent*/
            mat colnames `b' = AUC
            mat colnames `V' = AUC
            mat rownames `V' = AUC
    
            /*Step 4: post results to e()*/
            ereturn post `b' `V', obs(`N')
            ereturn local cmd "eroctab"
            ereturn local title "ROC area"
    end
    
    mi estimate, cmdok: eroctab attack high_bmi
    Best regards,
    Kjell Weyde

  • #2
    When I try
    Code:
    cap program drop ebayes
    program ebayes, eclass properties(mi)
     version 15.0
     
     args `y' `xlist'
        bayes, gibbs prior({var}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress `y' `xlist'
     
     tempname b V
     mat `b'=e(mean)
     mat `V'=e(Cov)
     local N=e(N)
     
     mat colnames `b'=bayesmean
     mat colnames `V'=bayesmean
     mat rownames `V'=bayesmean
     
     ereturn post `b' `V', obs(`N')
     ereturn local cmd "regress"
     ereturn local title "Meancoef"'
     ereturn local method "Gibbs sampling"
     ereturn local prefix "bayesian analysis"
    end
    
    
    mi estimate: ebayes outcome "x1 x2 i.x3"
    I get the following error message:
    Code:
    model specification required
    an error occurred when bayes executed regress
    an error occurred when mi estimate executed ebayes on m=1
    What am I doing wrong, and is it possible to do Bayesian analyses this way?

    Kjell



    • #3
      I don't have Stata 15, but the likely error is the quotes around the variable list in your ebayes command. Try
      Code:
      mi estimate: ebayes outcome x1 x2 i.x3
      Steve Samuels



      • #4
        Steve, thanks for your reply.
        I tried, but got the same error message.

        Kjell Weyde



        • #5
          A second error: the macro quotes around `y' and `xlist' in the args statement; args expects bare names. Correct it with:
          Code:
          args y xlist
          Steve Samuels



          • #6
            I tried that, and Stata issued:
            Code:
            invalid syntax
            an error occurred when mi estimate executed ebayes on m=1



            • #7
              Well, you can try the following mi estimate prefix:

              Code:
              mi estimate, esampvaryok cmdok:
              But I doubt that will do it either (you would have gotten different error messages if those options were the issue). Sorry, but without Stata 15, I can't help you further. I assume that you have run the standalone bayes command without problems.
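              Applied to your program, the full call would be something like this (cmdok tells mi estimate to accept a command it does not officially support, and esampvaryok allows the estimation sample to vary across imputations):
              Code:
              mi estimate, esampvaryok cmdok: ebayes outcome x1 x2 i.x3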
              Steve Samuels



              • #8
                Will it be sufficient to do Bayesian regression on each of the imputed datasets, and then compute the mean of the posterior means and the mean of the lower and upper credible interval bounds? See for example p. 2 of http://www2.stat.duke.edu/~jerry/Papers/tas10.pdf :
                When the posterior distribution of the parameter of interest, or, for likelihood-oriented statisticians, the sampling distribution of the complete-data estimator, is approximately Gaussian, the analyst can obtain inferences by computing point and variance estimates of interest with each dataset and combining these estimates using simple formulas. These formulas serve to propagate the uncertainty introduced by imputation through the analyst's inferences.



                • #9

                  I'm not knowledgeable enough about this area to say much, but I doubt that the "simple formulas" are simply the averages. I suggest that you look up "Rubin's Rules". Sorry I can't help you further.
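                  For a single scalar parameter, the pooling step of Rubin's rules is short enough to sketch. Below is a minimal, untested sketch, assuming (hypothetically) that the m point estimates and their within-imputation variances have already been collected into Stata matrices named est and v; it only illustrates the combining formulas and is not a substitute for mi estimate.
                  Code:
                  * Minimal sketch of Rubin's rules for one scalar parameter.
                  * Assumes hypothetical m x 1 Stata matrices est (point estimates)
                  * and v (within-imputation variances), one row per imputation.
                  mata:
                      Q = st_matrix("est")        // m point estimates
                      U = st_matrix("v")          // m within-imputation variances
                      m = rows(Q)
                      Qbar = mean(Q)              // pooled point estimate
                      W    = mean(U)              // average within-imputation variance
                      B    = variance(Q)          // between-imputation variance
                      T    = W + (1 + 1/m)*B      // total variance
                      Qbar, sqrt(T)               // pooled estimate and its std. error
                  end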
                  Steve Samuels



                  • #10
                    Hi Kjell,

                    I see a few syntactical errors in your program. Try the following:

                    Code:
                    cap program drop ebayes
                    program ebayes, eclass properties(mi)
                      version 15.0
                     
                      args y xlist
                      bayes, gibbs prior({var}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress `y' `xlist'
                     
                      tempname b V
                      mat `b' = e(mean)
                      mat `V' = e(Cov)
                      local N = e(N)
                     
                      mat colnames `b' = `e(parnames)'
                      mat colnames `V' = `e(parnames)'
                      mat rownames `V' = `e(parnames)'
                     
                      ereturn post `b' `V', obs(`N')
                      ereturn local cmd    = "regress"
                      ereturn local title  = "Meancoef"
                      ereturn local method = "Gibbs sampling"
                      ereturn local prefix = "bayesian analysis"
                    end
                    
                    mi estimate: ebayes outcome "x1 x2 i.x3"
                    Nikolay



                    • #11
                      Thank you very much, Nikolay Balov! The corrections made the model run (some extra variables are included in the model below). Although the output says "confidence interval", it is still a credible interval, right? Can the model output be interpreted as if it came from a Bayesian analysis of non-imputed data? Also, the coefficient and standard error for var seem VERY large; should I worry?

                      Best,
                      Kjell

                      Code:
                      Multiple-imputation estimates                   Imputations       =         50
                      Meancoef                                        Number of obs     =        651
                                                                      Average RVI       =     0.1813
                                                                      Largest FMI       =     0.1140
                      DF adjustment:   Large sample                   DF:     min       =   3,803.26
                                                                              avg       =   1.59e+08
                                                                              max       =   1.18e+09
                      
                      ------------------------------------------------------------------------------
                                   |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      M_5mC_pct    |
                             logAs |  -.0107815    .007358    -1.47   0.143    -.0252029    .0036399
                             logHg |  -.0017791   .0109609    -0.16   0.871    -.0232622    .0197039
                             logCd |   .0093214   .0081135     1.15   0.251    -.0065809    .0252236
                             logMn |  -.0030684   .0145324    -0.21   0.833    -.0315515    .0254146
                             logPb |  -.0262946   .0138002    -1.91   0.057    -.0533426    .0007535
                            Se_std |   .0215154   .0063219     3.40   0.001     .0091248     .033906
                            logJod |  -.0046469   .0166121    -0.28   0.780    -.0372063    .0279124
                          logFOLAT |   .0218687   .0208464     1.05   0.294    -.0189948    .0627323
                        MORS_ALDER |  -.0004277   .0013874    -0.31   0.758     -.003147    .0022915
                             _cons |   3.716607   .1101784    33.73   0.000     3.500592    3.932622
                      -------------+----------------------------------------------------------------
                            sigma2 |   .0199093    .001115    17.86   0.000      .017724    .0220947
                               var |   2.59e+52   1.29e+55     0.00   0.998    -2.53e+55    2.54e+55
                      ------------------------------------------------------------------------------



                      • #12
                        The parameter {var} is not used in the likelihood ({sigma2} is); it is specified only by its prior, which is uninformative, hence the huge estimates for it. Just replace {var} with {sigma2} in your prior specification.

                        Code:
                        ... 
                         bayes, gibbs prior({sigma2}, igamma(0.1, 0.1)) mcmcsize(5000) burnin(1000): regress `y' `xlist'



                        • #13
                          Kjell Weyde, thanks for the Zhou and Reiter reference. Unfortunately, the intervals in the MI output are confidence intervals, as the output states, not credible intervals.
                          Steve Samuels



                          • #14
                            Do you know how I could obtain credible intervals instead of confidence intervals? And do the displayed coefficients represent the posterior means from the Bayesian analyses?

                            Kjell



                            • #15
                              I don't know how to obtain credible intervals, but I guess a solution would require something like the Zhou-Reiter approach. To answer your second question: the displayed coefficients are the averages, across the 50 imputations, of the per-imputation posterior means.
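                              One rough, untested sketch of that kind of approach is below. The file names (draws1.dta, ...) and the draw variable b_x1 are hypothetical; run describe on one of the files that bayes, saving() creates to see how the parameter draws are actually named. The idea is to fit the model on each completed dataset, save the MCMC draws, stack them, and take percentiles of the stacked draws as approximate pooled credible intervals.
                              Code:
                              * Rough sketch only: pool posterior draws across the 50 completed datasets.
                              * File names and the draw variable b_x1 are hypothetical; -describe- a saved
                              * file to see how bayes names the parameter draws. If the saved files record
                              * duplicate draws via a frequency variable, -expand- by it before pooling.
                              forvalues i = 1/50 {
                                  preserve
                                  mi extract `i', clear            // completed dataset number i
                                  bayes, gibbs prior({sigma2}, igamma(0.1, 0.1)) ///
                                      mcmcsize(5000) burnin(1000)  ///
                                      saving(draws`i', replace): regress y x1 x2 i.x3
                                  restore
                              }
                              clear
                              forvalues i = 1/50 {
                                  append using draws`i'            // stack all posterior draws
                              }
                              centile b_x1, centile(2.5 50 97.5)   // approximate pooled credible interval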
                              Steve Samuels
