No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • Comibining bayesmh and churdle

    I am interested in using the new Bayesian statistical analyses capabilities of Stata 14 and combine them with a double hurdle model estimation.

    Let me use a typical example. Consider modeling how much people spend at the movies. We have data on a cross-section of people and the amount they spent last month (y1). We have a number of variables to model the decision to go to the movies (z1). These variables are x1 x2 x3. We also have another set of variables to model the amount spent at the movies (x1 x2 x4).

    I am interested in simulating how much of x1 is needed to make the decision to go to the movies viable, and what impact it may have this increase in x1 in the amount spent.

    I understand that I can estimate this double hurdle model using the commands dblhurdle, dhreg or the new command churdle. However I am not sure if I can combine any of this estimations with bayesmh in order to perform the desired simulation.

    Any help on this matter will be greatly appreciated!



  • #2
    My guess is that you could, but it would require "effort" on your part. Reasoning: my quick reading of the help for bayesmh finds the possibility for incorporating user-defined likelihoods for multiple equation models ("Multivariate normal nonlinear regression"). See also page 159 of the corresponding pdf manual (bayesmh evaluators — User-defined evaluators with bayesmh) where there is an example for a Multivariate normal regression model


    • #3
      As Stephen mentioned, bayesmh has a facility for users to write their own evaluators for Bayesian models. This facility is designed for you to program your own log likelihoods or log-posterior densities. Technically, rather than programming your likelihood from scratch, you can instead call any Stata estimation command that returns a log-likelihood value evaluated at the specified parameter values within bayesmh's log-likelihood evaluator.

      Below, I show an example of a simple hurdle model using a subset of the fitness dataset from [R] churdle:

      . webuse fitness
      . set seed 17653
      . sample 10
      . churdle linear hours age, select(commute) ll(0)
      Iteration 0:   log likelihood = -2783.3352  
      Iteration 1:   log likelihood =  -2759.029  
      Iteration 2:   log likelihood = -2758.9992  
      Iteration 3:   log likelihood = -2758.9992  
      Cragg hurdle regression                         Number of obs     =      1,983
                                                      LR chi2(1)        =       3.42
                                                      Prob > chi2       =     0.0645
      Log likelihood = -2758.9992                     Pseudo R2         =     0.0006
             hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      hours        |
               age |   .0051263   .0028423     1.80   0.071    -.0004446    .0106971
             _cons |   1.170932   .1238682     9.45   0.000     .9281548    1.413709
      selection_ll |
           commute |  -.0655171   .1561046    -0.42   0.675    -.3714765    .2404423
             _cons |   .1421166   .0882658     1.61   0.107    -.0308813    .3151144
      lnsigma      |
             _cons |   .1280215     .03453     3.71   0.000      .060344     .195699
            /sigma |   1.136577    .039246                      1.062202    1.216161
      The program mychurdle (shown below) is a wrapper for churdle that returns the corresponding log-likelihood value. We can fit a Bayesian hurdle model using bayesmh as follows:

      gen byte hours0 = (hours==0) //dependent variable for the selection equation
      set seed 123
      bayesmh (hours age) (hours0 commute), llevaluator(mychurdle, parameters({lnsig}))
                         prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
      We use a two-equation specification to fit this model. The main regression is specified first and the selection regression is specified next. The additional parameter, log of the standard deviation, associated with the main regression is specified in llevaluator()'s suboption parameters(). All parameters are assigned flat priors to obtain results similar to churdle. MCMC results are saved in sim.dta. This command took about 25 minutes to run on my computer.

      Here is the mychurdle program.

      program mychurdle
              version 14.0
              args llf xb1 xb2
              tempname b
              mat `b' = ($MH_b, $MH_p)
              cap churdle linear $MH_y1 $MH_y1x1 if $MH_touse, select($MH_y2x1) ll(0) from(`b') iterate(0)
              if _rc {
                      scalar `llf' = .
              else {
                      scalar `llf' = e(ll)
      This program returns the log-likelihood value computed by churdle at the current values of model parameters. The program accepts three arguments: a temporary scalar to contain the log-likelihood value llf and temporary variables xb1 and xb2, each containing the linear predictor from the corresponding equation evaluated at the current values of model parameters. (We did not need to use these variables in our program because we did not compute the likelihood from scratch.) We stored current values of model parameters, regression coefficients from two equations $MH_b and the extra parameter log-standard-deviation stored in MH_p, in a temporary matrix b. We specified churdle's options from() and iterate() to evaluate the log likelihood at the current parameter values. Finally, we stored the resulting log-likelihood value in llf or missing value if the command failed to evaluate the log likelihood.

      Here is the output:

      . bayesmh (hours age) (hours0 commute), llevaluator(mychurdle1, parameters({lns
      > ig})) prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
      Burn-in 2500 aaaaaaaaa1000aaaaaa...2000..... done
      Simulation 10000 .........1000.........2000.........3000.........4000.........5
      > 000.........6000.........7000.........8000.........9000.........10000 done
      Model summary
        hours hours0 ~ mychurdle1(xb_hours,xb_hours0,{lnsig})
             {hours:age _cons} ~ 1 (flat)                                        (1)
        {hours0:commute _cons} ~ 1 (flat)                                        (2)
                       {lnsig} ~ 1 (flat)
      (1) Parameters are elements of the linear form xb_hours.
      (2) Parameters are elements of the linear form xb_hours0.
      Bayesian regression                              MCMC iterations  =     12,500
      Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                       MCMC sample size =     10,000
                                                       Number of obs    =      1,983
                                                       Acceptance rate  =      .2889
                                                       Efficiency:  min =     .05538
                                                                    avg =     .06266
      Log marginal likelihood = -2772.3953                          max =     .06945
                   |                                                Equal-tailed
                   |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
      hours        |
               age |  .0050916   .0027972   .000106   .0049923  -.0000372   .0107231
             _cons |  1.167265    .124755   .004889   1.175293   .9125105   1.392021
      hours0       |
           commute | -.0621282   .1549908   .006585  -.0613511  -.3623891   .2379805
             _cons |  .1425693   .0863626   .003313   .1430396  -.0254507   .3097677
             lnsig |  .1321532   .0346446   .001472   .1326704   .0663646   .2015249
      file sim.dta saved
      We can use bayesstats summary to obtain the estimate of the standard deviation:

      . bayesstats summary (sigma: exp({lnsig}))
      Posterior summary statistics                      MCMC sample size =    10,000
             sigma : exp({lnsig})
                   |                                                Equal-tailed
                   |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
             sigma |  1.141969   .0396264   .001685   1.141874   1.068616   1.223267
      You can adopt the approach above to any Stata command that returns the log likelihood and allows you to specify parameter values at which this log likelihood must be evaluated. This approach is very straightforward, but would be slower in general than programming the likelihood directly. See, for example,

      and [BAYES] bayesmh evaluators.

      We are also working on a blog entry to show how to fit this model more efficiently using bayesmh.