  • Javier Escobal
    started a topic Combining bayesmh and churdle

    Combining bayesmh and churdle

    Hello,
    I am interested in using the new Bayesian analysis capabilities of Stata 14 and combining them with a double-hurdle model estimation.

    Let me use a typical example. Consider modeling how much people spend at the movies. We have data on a cross-section of people and the amount they spent last month (y1). We have one set of variables (x1 x2 x3) to model the decision to go to the movies (z1), and another set of variables (x1 x2 x4) to model the amount spent at the movies.

    I am interested in simulating how much of x1 is needed to make the decision to go to the movies viable, and what impact this increase in x1 may have on the amount spent.

    I understand that I can estimate this double-hurdle model using the commands dblhurdle, dhreg, or the new command churdle. However, I am not sure whether I can combine any of these estimations with bayesmh in order to perform the desired simulation.
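
    For concreteness, the specification I have in mind would be something like the following (just a rough sketch using the variable names above; ll(0) assumes the amount spent is censored at zero):

    Code:
    * sketch of the double-hurdle model described above:
    * x1 x2 x3 model the participation decision, x1 x2 x4 the amount spent
    churdle linear y1 x1 x2 x4, select(x1 x2 x3) ll(0)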

    Any help on this matter will be greatly appreciated!

    best

    Javier

  • Yulia Marchenko (StataCorp)
    replied
    As Stephen mentioned, bayesmh has a facility for users to write their own evaluators for Bayesian models. This facility is designed for you to program your own log likelihoods or log-posterior densities. Technically, rather than programming your likelihood from scratch, you can instead call, within bayesmh's log-likelihood evaluator, any Stata estimation command that returns a log-likelihood value evaluated at the specified parameter values.

    Below, I show an example of a simple hurdle model using a subset of the fitness dataset from [R] churdle:

    Code:
    . webuse fitness
    . set seed 17653
    . sample 10
    . churdle linear hours age, select(commute) ll(0)
     
    Iteration 0:   log likelihood = -2783.3352  
    Iteration 1:   log likelihood =  -2759.029  
    Iteration 2:   log likelihood = -2758.9992  
    Iteration 3:   log likelihood = -2758.9992  
    
    Cragg hurdle regression                         Number of obs     =      1,983
                                                    LR chi2(1)        =       3.42
                                                    Prob > chi2       =     0.0645
    Log likelihood = -2758.9992                     Pseudo R2         =     0.0006
    
    ------------------------------------------------------------------------------
           hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    hours        |
             age |   .0051263   .0028423     1.80   0.071    -.0004446    .0106971
           _cons |   1.170932   .1238682     9.45   0.000     .9281548    1.413709
    -------------+----------------------------------------------------------------
    selection_ll |
         commute |  -.0655171   .1561046    -0.42   0.675    -.3714765    .2404423
           _cons |   .1421166   .0882658     1.61   0.107    -.0308813    .3151144
    -------------+----------------------------------------------------------------
    lnsigma      |
           _cons |   .1280215     .03453     3.71   0.000      .060344     .195699
    -------------+----------------------------------------------------------------
          /sigma |   1.136577    .039246                      1.062202    1.216161
    ------------------------------------------------------------------------------
    The program mychurdle (shown below) is a wrapper for churdle that returns the corresponding log-likelihood value. We can fit a Bayesian hurdle model using bayesmh as follows:

    Code:
    gen byte hours0 = (hours==0) //dependent variable for the selection equation
    set seed 123
    bayesmh (hours age) (hours0 commute), llevaluator(mychurdle, parameters({lnsig})) ///
            prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
    We use a two-equation specification to fit this model: the main regression is specified first, and the selection regression is specified next. The additional parameter associated with the main regression, the log of the standard deviation, is specified in llevaluator()'s suboption parameters(). All parameters are assigned flat priors to obtain results similar to churdle's. MCMC results are saved in sim.dta. This command took about 25 minutes to run on my computer.
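
    Because the MCMC results are saved in sim.dta, the standard Bayesian postestimation tools are available afterward. For example, a quick convergence check might look like this (an illustration using the parameter names defined in the model above):

    Code:
    * graphical diagnostics (trace, autocorrelation, histogram, and
    * density plots) for selected model parameters
    bayesgraph diagnostics {hours:age} {lnsig}

    * effective sample sizes for all model parameters
    bayesstats ess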

    Here is the mychurdle program.

    Code:
    program mychurdle
            version 14.0
            args llf xb1 xb2
            tempname b
            // collect the current parameter values: regression coefficients
            // ($MH_b) and the extra log-standard-deviation parameter ($MH_p)
            mat `b' = ($MH_b, $MH_p)
            // evaluate (but do not maximize) the churdle log likelihood at `b'
            cap churdle linear $MH_y1 $MH_y1x1 if $MH_touse, select($MH_y2x1) ll(0) from(`b') iterate(0)
            if _rc {
                    scalar `llf' = .
            }
            else {
                    scalar `llf' = e(ll)
            }
    end
    This program returns the log-likelihood value computed by churdle at the current values of the model parameters. It accepts three arguments: a temporary scalar llf to contain the log-likelihood value, and temporary variables xb1 and xb2, each containing the linear predictor from the corresponding equation evaluated at the current parameter values. (We did not need these variables in our program because we did not compute the likelihood from scratch.) We store the current values of the model parameters, the regression coefficients from the two equations ($MH_b) and the extra log-standard-deviation parameter ($MH_p), in a temporary matrix `b'. We specify churdle's options from() and iterate(0) to evaluate the log likelihood at the current parameter values without performing any optimization. Finally, we store the resulting log-likelihood value in llf, or a missing value if the command failed to evaluate the log likelihood.

    Here is the output:

    Code:
    . bayesmh (hours age) (hours0 commute), llevaluator(mychurdle, parameters({lns
    > ig})) prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
     
    Burn-in 2500 aaaaaaaaa1000aaaaaa...2000..... done
    Simulation 10000 .........1000.........2000.........3000.........4000.........5
    > 000.........6000.........7000.........8000.........9000.........10000 done
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      hours hours0 ~ mychurdle(xb_hours,xb_hours0,{lnsig})
    
    Priors:
           {hours:age _cons} ~ 1 (flat)                                        (1)
      {hours0:commute _cons} ~ 1 (flat)                                        (2)
                     {lnsig} ~ 1 (flat)
    ------------------------------------------------------------------------------
    (1) Parameters are elements of the linear form xb_hours.
    (2) Parameters are elements of the linear form xb_hours0.
    
    Bayesian regression                              MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =      1,983
                                                     Acceptance rate  =      .2889
                                                     Efficiency:  min =     .05538
                                                                  avg =     .06266
    Log marginal likelihood = -2772.3953                          max =     .06945
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    hours        |
             age |  .0050916   .0027972   .000106   .0049923  -.0000372   .0107231
           _cons |  1.167265    .124755   .004889   1.175293   .9125105   1.392021
    -------------+----------------------------------------------------------------
    hours0       |
         commute | -.0621282   .1549908   .006585  -.0613511  -.3623891   .2379805
           _cons |  .1425693   .0863626   .003313   .1430396  -.0254507   .3097677
    -------------+----------------------------------------------------------------
           lnsig |  .1321532   .0346446   .001472   .1326704   .0663646   .2015249
    ------------------------------------------------------------------------------
    
    file sim.dta saved
    We can use bayesstats summary to obtain the estimate of the standard deviation:

    Code:
    . bayesstats summary (sigma: exp({lnsig}))
    
    Posterior summary statistics                      MCMC sample size =    10,000
    
           sigma : exp({lnsig})
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
           sigma |  1.141969   .0396264   .001685   1.141874   1.068616   1.223267
    ------------------------------------------------------------------------------
    You can adapt the approach above to any Stata command that returns the log likelihood and allows you to specify the parameter values at which this log likelihood is to be evaluated. This approach is very straightforward but is generally slower than programming the likelihood directly. See, for example,

    http://www.stata.com/stata14/bayesian-evaluators/

    and [BAYES] bayesmh evaluators.
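
    For instance, the same wrapper pattern applied to a single-equation command such as probit might look like the sketch below (untested; mylleval is a hypothetical name, and the code assumes that probit accepts the from() and iterate() maximize options, as ml-based commands generally do). The key requirement is that the command report e(ll) evaluated at the supplied starting values when iterate(0) is specified.

    Code:
    program mylleval
            version 14.0
            args llf xb1
            tempname b
            mat `b' = $MH_b
            // evaluate (but do not maximize) the probit log likelihood at the
            // current coefficient values; copy matches coefficients by position
            cap probit $MH_y1 $MH_y1x1 if $MH_touse, from(`b', copy) iterate(0)
            if _rc {
                    scalar `llf' = .
            }
            else {
                    scalar `llf' = e(ll)
            }
    end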

    We are also working on a blog entry to show how to fit this model more efficiently using bayesmh.

  • Stephen Jenkins
    replied
    My guess is that you could, but it would require "effort" on your part. Reasoning: my quick reading of the help for bayesmh finds the possibility of incorporating user-defined likelihoods for multiple-equation models ("Multivariate normal nonlinear regression"). See also page 159 of the corresponding PDF manual ([BAYES] bayesmh evaluators, User-defined evaluators with bayesmh), where there is an example of a multivariate normal regression model.
