  • Javier Escobal
    started a topic Combining bayesmh and churdle

    Combining bayesmh and churdle

    Hello,
    I am interested in using the new Bayesian analysis capabilities of Stata 14 and combining them with a double-hurdle model estimation.

    Let me use a typical example. Consider modeling how much people spend at the movies. We have data on a cross-section of people and the amount they spent last month (y1). We have one set of variables (x1 x2 x3) to model the decision to go to the movies (z1), and another set of variables (x1 x2 x4) to model the amount spent at the movies.

    I am interested in simulating how much of x1 is needed to make the decision to go to the movies viable, and what impact this increase in x1 may have on the amount spent.

    I understand that I can estimate this double-hurdle model using the commands dblhurdle, dhreg, or the new command churdle. However, I am not sure whether I can combine any of these estimations with bayesmh in order to perform the desired simulation.
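
    For concreteness, the specification I have in mind would be something like the following (just a rough sketch using the variable names above; ll(0) assumes the amount spent is censored at zero):

    Code:
    * sketch of the double-hurdle model described above:
    * x1 x2 x3 model the participation decision, x1 x2 x4 the amount spent
    churdle linear y1 x1 x2 x4, select(x1 x2 x3) ll(0)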

    Any help on this matter will be greatly appreciated!

    best

    Javier

  • Yulia Marchenko (StataCorp)
    replied
    As Stephen mentioned, bayesmh has a facility for users to write their own evaluators for Bayesian models. This facility is designed for you to program your own log likelihoods or log-posterior densities. Technically, rather than programming your likelihood from scratch, you can instead call, within bayesmh's log-likelihood evaluator, any Stata estimation command that returns a log-likelihood value evaluated at the specified parameter values.

    Below, I show an example of a simple hurdle model using a subset of the fitness dataset from [R] churdle:

    Code:
    . webuse fitness
    . set seed 17653
    . sample 10
    . churdle linear hours age, select(commute) ll(0)
     
    Iteration 0:   log likelihood = -2783.3352  
    Iteration 1:   log likelihood =  -2759.029  
    Iteration 2:   log likelihood = -2758.9992  
    Iteration 3:   log likelihood = -2758.9992  
    
    Cragg hurdle regression                         Number of obs     =      1,983
                                                    LR chi2(1)        =       3.42
                                                    Prob > chi2       =     0.0645
    Log likelihood = -2758.9992                     Pseudo R2         =     0.0006
    
    ------------------------------------------------------------------------------
           hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    hours        |
             age |   .0051263   .0028423     1.80   0.071    -.0004446    .0106971
           _cons |   1.170932   .1238682     9.45   0.000     .9281548    1.413709
    -------------+----------------------------------------------------------------
    selection_ll |
         commute |  -.0655171   .1561046    -0.42   0.675    -.3714765    .2404423
           _cons |   .1421166   .0882658     1.61   0.107    -.0308813    .3151144
    -------------+----------------------------------------------------------------
    lnsigma      |
           _cons |   .1280215     .03453     3.71   0.000      .060344     .195699
    -------------+----------------------------------------------------------------
          /sigma |   1.136577    .039246                      1.062202    1.216161
    ------------------------------------------------------------------------------
    The program mychurdle (shown below) is a wrapper for churdle that returns the corresponding log-likelihood value. We can fit a Bayesian hurdle model using bayesmh as follows:

    Code:
    gen byte hours0 = (hours==0) //dependent variable for the selection equation
    set seed 123
    bayesmh (hours age) (hours0 commute), llevaluator(mychurdle, parameters({lnsig})) ///
            prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
    We use a two-equation specification to fit this model: the main regression is specified first, and the selection regression is specified next. The additional parameter associated with the main regression, the log of the standard deviation, is specified in llevaluator()'s suboption parameters(). All parameters are assigned flat priors to obtain results similar to churdle's. MCMC results are saved in sim.dta. This command took about 25 minutes to run on my computer.
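
    Because the MCMC results are saved in sim.dta, the standard Bayesian postestimation tools are available afterward. For example, a quick convergence check might look like this (an illustration using the parameter names defined in the model above):

    Code:
    * graphical diagnostics (trace, autocorrelation, histogram, and
    * density plots) for selected model parameters
    bayesgraph diagnostics {hours:age} {lnsig}

    * effective sample sizes for all model parameters
    bayesstats ess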

    Here is the mychurdle program.

    Code:
    program mychurdle
            version 14.0
            args llf xb1 xb2
            tempname b
            // collect the current parameter values: regression coefficients
            // ($MH_b) and the extra log-standard-deviation parameter ($MH_p)
            mat `b' = ($MH_b, $MH_p)
            // evaluate (but do not maximize) the churdle log likelihood at `b'
            cap churdle linear $MH_y1 $MH_y1x1 if $MH_touse, select($MH_y2x1) ll(0) from(`b') iterate(0)
            if _rc {
                    scalar `llf' = .
            }
            else {
                    scalar `llf' = e(ll)
            }
    end
    This program returns the log-likelihood value computed by churdle at the current values of the model parameters. It accepts three arguments: a temporary scalar llf to contain the log-likelihood value, and temporary variables xb1 and xb2, each containing the linear predictor from the corresponding equation evaluated at the current parameter values. (We did not need these variables in our program because we did not compute the likelihood from scratch.) We store the current values of the model parameters, the regression coefficients from the two equations ($MH_b) and the extra log-standard-deviation parameter ($MH_p), in a temporary matrix `b'. We specify churdle's options from() and iterate(0) to evaluate the log likelihood at the current parameter values without performing any optimization. Finally, we store the resulting log-likelihood value in llf, or a missing value if the command failed to evaluate the log likelihood.

    Here is the output:

    Code:
    . bayesmh (hours age) (hours0 commute), llevaluator(mychurdle, parameters({lns
    > ig})) prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
     
    Burn-in 2500 aaaaaaaaa1000aaaaaa...2000..... done
    Simulation 10000 .........1000.........2000.........3000.........4000.........5
    > 000.........6000.........7000.........8000.........9000.........10000 done
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      hours hours0 ~ mychurdle(xb_hours,xb_hours0,{lnsig})
    
    Priors:
           {hours:age _cons} ~ 1 (flat)                                        (1)
      {hours0:commute _cons} ~ 1 (flat)                                        (2)
                     {lnsig} ~ 1 (flat)
    ------------------------------------------------------------------------------
    (1) Parameters are elements of the linear form xb_hours.
    (2) Parameters are elements of the linear form xb_hours0.
    
    Bayesian regression                              MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =      1,983
                                                     Acceptance rate  =      .2889
                                                     Efficiency:  min =     .05538
                                                                  avg =     .06266
    Log marginal likelihood = -2772.3953                          max =     .06945
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    hours        |
             age |  .0050916   .0027972   .000106   .0049923  -.0000372   .0107231
           _cons |  1.167265    .124755   .004889   1.175293   .9125105   1.392021
    -------------+----------------------------------------------------------------
    hours0       |
         commute | -.0621282   .1549908   .006585  -.0613511  -.3623891   .2379805
           _cons |  .1425693   .0863626   .003313   .1430396  -.0254507   .3097677
    -------------+----------------------------------------------------------------
           lnsig |  .1321532   .0346446   .001472   .1326704   .0663646   .2015249
    ------------------------------------------------------------------------------
    
    file sim.dta saved
    We can use bayesstats summary to obtain the estimate of the standard deviation:

    Code:
    . bayesstats summary (sigma: exp({lnsig}))
    
    Posterior summary statistics                      MCMC sample size =    10,000
    
           sigma : exp({lnsig})
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
           sigma |  1.141969   .0396264   .001685   1.141874   1.068616   1.223267
    ------------------------------------------------------------------------------
    You can adapt the approach above to any Stata command that returns the log likelihood and allows you to specify the parameter values at which this log likelihood is to be evaluated. This approach is very straightforward but is generally slower than programming the likelihood directly. See, for example,

    http://www.stata.com/stata14/bayesian-evaluators/

    and [BAYES] bayesmh evaluators.
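
    For instance, the same wrapper pattern applied to a single-equation command such as probit might look like the sketch below (untested; mylleval is a hypothetical name, and the code assumes that probit accepts the from() and iterate() maximize options, as ml-based commands generally do). The key requirement is that the command report e(ll) evaluated at the supplied starting values when iterate(0) is specified.

    Code:
    program mylleval
            version 14.0
            args llf xb1
            tempname b
            mat `b' = $MH_b
            // evaluate (but do not maximize) the probit log likelihood at the
            // current coefficient values; copy matches coefficients by position
            cap probit $MH_y1 $MH_y1x1 if $MH_touse, from(`b', copy) iterate(0)
            if _rc {
                    scalar `llf' = .
            }
            else {
                    scalar `llf' = e(ll)
            }
    end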

    We are also working on a blog entry to show how to fit this model more efficiently using bayesmh.

  • Stephen Jenkins
    replied
    My guess is that you could, but it would require "effort" on your part. Reasoning: my quick reading of the help for bayesmh finds the possibility of incorporating user-defined likelihoods for multiple-equation models ("Multivariate normal nonlinear regression"). See also page 159 of the corresponding PDF manual ([BAYES] bayesmh evaluators, User-defined evaluators with bayesmh), where there is an example of a multivariate normal regression model.
