Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Ask experts for help:how to set the prior of regression coefficients that follow dirichlet distribution in bayes model

    Hello, statistics experts,
    I have a question that troubles me for long time.
    My purpose to estimate the joint effect of air pollution on the risk of birth defects. An random effect logistic model was used and was estimated by bayes method.

    The stata programme was showed as following:

    bayesmh case = ({a:} + {b}*({w1}*pre3m_pre1m_pm25 + {w2}*pre3m_pre1m_co + {w3}*pre3m_pre1m_so2 + {w4}*pre3m_pre1m_no2 + {w5}*pre3m_pre1m_o3 )), likelihood(logit) ///
    define(a: U[code],xb) ///
    prior({a:_cons}, normal(-5, 10)) ///
    prior({b},normal(0,5)) ///
    prior({w1 w2 w3 w4 w5 },dirichlet(1, 1, 1, 1, 1)) ///
    prior({var_U}, igamma(0.01,0.01)) ///
    block({var_U},gibbs) ///
    mcmcsize(5000) dots rseed(1234)

    Note: the coefficients {wi} is the weight of different air pollution, w1+....w5=1, and wi should be between 0 and 1. So I set wi as dirichlet distribution.

    The results show as following:
    | Equal-tailed
    | Mean Std. dev. MCSE Median [95% cred. interval]
    -------------+----------------------------------------------------------------
    a |
    _cons | -.6128033 .2640545 .095993 -.6561622 -1.148229 -.1349925
    -------------+----------------------------------------------------------------
    b | .0005872 .0005011 .000108 .0004688 .0000103 .0017945
    w1 | .0709399 .070009 .016397 .0460853 .00433 .2668698
    w2 | .2155131 .1589676 .028325 .2030874 .0035754 .5265113
    w3 | .2662654 .1226952 .036767 .2907973 .0279012 .4378617
    w4 | .1239908 .1079504 .023449 .0927303 .000796 .3904645
    w5 | -1.393883 .7596525 .242501 -1.444812 -2.598773 .2032503
    var_U | 39.29819 9.156024 1.66255 37.93158 25.63195 62.13184

    According the results , the estimation of sum of wi is not equal to 1, w5 is less than 0.

    My question is how to set the prior of regression coefficients that follow dirichlet distribution in bayes model? Is the prior setting right in my programme?

    Thanks a lot.







  • #2
    any experts help me?

    Comment


    • #3
      Any experts help me? thanks a lot

      Comment


      • #4
        Just a wild guess here, as I'm a complete newbie still learning about Bayesian methods.

        You've specified a Dirichlet prior distribution for the w's. The Dirichlet has support on [0,1]. However, the likelihood function is logit, which has no restrictions on parameters. The posterior distribution of the w's therefore has support that includes negative values. Said differently, just because the prior distribution has support on the [0, 1] cube doesn't mean the posterior will as well.

        Comment


        • #5
          Originally posted by Brian Poi View Post
          Just a wild guess here, as I'm a complete newbie still learning about Bayesian methods.

          You've specified a Dirichlet prior distribution for the w's. The Dirichlet has support on [0,1]. However, the likelihood function is logit, which has no restrictions on parameters. The posterior distribution of the w's therefore has support that includes negative values. Said differently, just because the prior distribution has support on the [0, 1] cube doesn't mean the posterior will as well.
          Thanks a lot. Maybe you are right. We have used R software to esimate the same model, and the prior of w's also set by Dirichlet. The results seem right, becuase w is between 0 and 1, and the sum of w is equal to 0.99. But it takes too long to calculate the model by using R. It takes 24 hours to esimate the model for the sample size of 10,000. But i have more than 1,000,000 individual cases. So far, we have spent almost two weeks, but we have not yet estimated the results. But we used Stata to estimate this model with the sample size of 100,000, which only took less than 10 mins. So i think STATA IS MORE efficient than R. But i don't know how to set prior of w.

          Comment


          • #6
            Originally posted by Xiaohong Li View Post

            Thanks a lot. Maybe you are right. We have used R software to esimate the same model, and the prior of w's also set by Dirichlet. The results seem right, becuase w is between 0 and 1, and the sum of w is equal to 0.99. But it takes too long to calculate the model by using R. It takes 24 hours to esimate the model for the sample size of 10,000. But i have more than 1,000,000 individual cases. So far, we have spent almost two weeks, but we have not yet estimated the results. But we used Stata to estimate this model with the sample size of 100,000, which only took less than 10 mins. So i think STATA IS MORE efficient than R. But i don't know how to set prior of w.
            I strongly suspect that if R takes 24 hours while Stata (note capitalization) takes 10 minutes and if R produces estimates of the w's in [0,1], then the models you are specifying differ. Is the model really a simple logit model? Perhaps R is constraining the w parameters to be in [0, 1] and to sum to one. Specifying the logit model in Stata does not imply either of those constraints.

            In short, I don't think your R and Stata specifications are the same.

            Comment


            • #7
              Originally posted by Brian Poi View Post

              I strongly suspect that if R takes 24 hours while Stata (note capitalization) takes 10 minutes and if R produces estimates of the w's in [0,1], then the models you are specifying differ. Is the model really a simple logit model? Perhaps R is constraining the w parameters to be in [0, 1] and to sum to one. Specifying the logit model in Stata does not imply either of those constraints.

              In short, I don't think your R and Stata specifications are the same.
              Thanks for you kindly reply.

              The model we used is random effect logistic regression. The dependent variable is 0/1.



              Comment


              • #8
                I can't find any examples of its use either in the user's manual or online, but I suspect that either you're using the prior incorrectly or there's something that needs an explanation from StataCorp.

                With a flat-Dirichlet prior like you've specified, drawing solely from the prior (i.e., the log-likelihood is a constant), the first K - 1 probabilities seem spot on, but the last one is bizarre and wildly unstable in repeated draws.

                .ÿ
                .ÿversionÿ17.0

                .ÿ
                .ÿclearÿ*

                .ÿ
                .ÿ//ÿseedem
                .ÿsetÿseedÿ1955127249

                .ÿ
                .ÿquietlyÿsetÿobsÿ1

                .ÿ
                .ÿforvaluesÿiÿ=ÿ1/4ÿ{
                ÿÿ2.ÿÿÿÿÿgenerateÿbyteÿx`i'ÿ=ÿ1
                ÿÿ3.ÿ}

                .ÿgenerateÿbyteÿyÿ=ÿ1

                .ÿ
                .ÿbayesmhÿyÿ,ÿlikelihood(llf(0))ÿprior({x1ÿx2ÿx3ÿx4},ÿdirichlet(1,ÿ1,ÿ1,ÿ1))ÿinitrandom
                ÿÿ
                Burn-inÿ...
                Simulationÿ...

                Modelÿsummary
                ------------------------------------------------------------------------------
                Likelihood:ÿ
                ÿÿyÿ~ÿ1ÿ(flat)()

                Hyperprior:ÿ
                ÿÿ{x1ÿx2ÿx3ÿx4}ÿ~ÿdirichlet(4,1,1,1,1)
                ------------------------------------------------------------------------------

                BayesianÿregressionÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿMCMCÿiterationsÿÿ=ÿÿÿÿÿ12,500
                Random-walkÿMetropolis–HastingsÿsamplingÿÿÿÿÿÿÿÿÿBurn-inÿÿÿÿÿÿÿÿÿÿ=ÿÿÿÿÿÿ2,500
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿMCMCÿsampleÿsizeÿ=ÿÿÿÿÿ10,000
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿ=ÿÿÿÿÿÿÿÿÿÿ1
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿAcceptanceÿrateÿÿ=ÿÿÿÿÿÿ.2339
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿEfficiency:ÿÿminÿ=ÿÿÿÿ.001341
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿavgÿ=ÿÿÿÿÿ.03541
                Logÿmarginal-likelihoodÿ=ÿÿ8.3274328ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿmaxÿ=ÿÿÿÿÿ.05733
                ÿ
                ------------------------------------------------------------------------------
                ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿEqual-tailed
                ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿMeanÿÿÿStd.ÿdev.ÿÿÿÿÿMCSEÿÿÿÿÿMedianÿÿ[95%ÿcred.ÿinterval]
                -------------+----------------------------------------------------------------
                ÿÿÿÿÿÿÿÿÿÿx1ÿ|ÿÿ.2524885ÿÿÿÿ.194967ÿÿÿ.009529ÿÿÿ.2033094ÿÿÿ.0119862ÿÿÿ.7182582
                ÿÿÿÿÿÿÿÿÿÿx2ÿ|ÿÿ.2530415ÿÿÿ.2014287ÿÿÿ.008412ÿÿÿ.2069479ÿÿÿ.0066687ÿÿÿ.7407032
                ÿÿÿÿÿÿÿÿÿÿx3ÿ|ÿÿ.2458174ÿÿÿ.1994163ÿÿÿ.009838ÿÿÿ.1938636ÿÿÿ.0085661ÿÿÿ.7421992
                ÿÿÿÿÿÿÿÿÿÿx4ÿ|ÿ-13675.39ÿÿÿ2925.474ÿÿÿ798.925ÿÿ-14076.71ÿÿ-18507.14ÿÿ-8518.988
                ------------------------------------------------------------------------------
                Note:ÿThereÿisÿaÿhighÿautocorrelationÿafterÿ500ÿlags.

                .ÿ
                .ÿbayesmhÿyÿ,ÿlikelihood(llf(0))ÿprior({x1ÿx2ÿx3ÿx4},ÿdirichlet(1,ÿ1,ÿ1,ÿ1))ÿinitrandom
                ÿÿ
                Burn-inÿ...
                Simulationÿ...

                Modelÿsummary
                ------------------------------------------------------------------------------
                Likelihood:ÿ
                ÿÿyÿ~ÿ1ÿ(flat)()

                Hyperprior:ÿ
                ÿÿ{x1ÿx2ÿx3ÿx4}ÿ~ÿdirichlet(4,1,1,1,1)
                ------------------------------------------------------------------------------

                BayesianÿregressionÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿMCMCÿiterationsÿÿ=ÿÿÿÿÿ12,500
                Random-walkÿMetropolis–HastingsÿsamplingÿÿÿÿÿÿÿÿÿBurn-inÿÿÿÿÿÿÿÿÿÿ=ÿÿÿÿÿÿ2,500
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿMCMCÿsampleÿsizeÿ=ÿÿÿÿÿ10,000
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿ=ÿÿÿÿÿÿÿÿÿÿ1
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿAcceptanceÿrateÿÿ=ÿÿÿÿÿÿ.1657
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿEfficiency:ÿÿminÿ=ÿÿÿÿ.001535
                ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿavgÿ=ÿÿÿÿÿ.03216
                Logÿmarginal-likelihoodÿ=ÿÿ11.908795ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿmaxÿ=ÿÿÿÿÿÿÿ.052
                ÿ
                ------------------------------------------------------------------------------
                ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿEqual-tailed
                ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿMeanÿÿÿStd.ÿdev.ÿÿÿÿÿMCSEÿÿÿÿÿMedianÿÿ[95%ÿcred.ÿinterval]
                -------------+----------------------------------------------------------------
                ÿÿÿÿÿÿÿÿÿÿx1ÿ|ÿÿ.2516266ÿÿÿ.2048234ÿÿÿ.011246ÿÿÿ.2008732ÿÿÿ.0101688ÿÿÿ.7653414
                ÿÿÿÿÿÿÿÿÿÿx2ÿ|ÿÿ.2528741ÿÿÿ.1960739ÿÿÿ.009577ÿÿÿ.2109316ÿÿÿ.0064874ÿÿÿ.7024476
                ÿÿÿÿÿÿÿÿÿÿx3ÿ|ÿÿ.2275172ÿÿÿ.1835299ÿÿÿ.008048ÿÿÿ.1814989ÿÿÿ.0068541ÿÿÿ.6580437
                ÿÿÿÿÿÿÿÿÿÿx4ÿ|ÿÿ224487.5ÿÿÿ104823.7ÿÿÿ26754.2ÿÿÿ224376.9ÿÿ-978.0481ÿÿÿ409580.7
                ------------------------------------------------------------------------------
                Note:ÿThereÿisÿaÿhighÿautocorrelationÿafterÿ500ÿlags.

                .ÿ
                .ÿexit

                endÿofÿdo-file


                .


                Perhaps StataCorp's Technical Services staff can provide some guidance on how (and where) to use this prior. I recommend contacting them.

                Comment


                • #9
                  Thank you for pointing this bug to us. The problem has already been fixed in Stata 17 and Stata 18. You would just need to make sure you have the latest update for your corresponding version, and then try executing your code again.

                  Comment

                  Working...
                  X