  • Estimation issues when running a double hurdle model

    Hello everyone.

    I am currently doing a regression analysis using data from a survey in which we asked people how much they are willing to pay to avoid blackouts. The willingness to pay (WTP) is modeled as a function of a number of socio-demographic and attitudinal variables.

    We obtained a large number of zero responses, so we decided to use a double hurdle model. In this model, we assume that people follow a two-step process when deciding their WTP: first, they decide whether they are willing to pay at all (yes/no), then they decide how much they are willing to pay (amount). These two decision steps are modeled with two equations: the participation equation and the intensity/WTP equation. We asked people their WTP for different blackout durations.
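
    For reference, the textbook Cragg-type double hurdle I have in mind is roughly the following (my notation; I am not certain this is exactly the parameterization that dblhurdle fits internally):

    Code:
     
     participation:  d* = z'gamma + u,    u ~ N(0,1),       d = 1 if d* > 0
     intensity:      y* = x'beta  + e,    e ~ N(0,sigma^2)
     observed WTP:   y  = y*  if d = 1 and y* > 0;    y = 0 otherwise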

    I have some problems with this model. I am using Stata 14 on Windows 10. With the command dblhurdle, you just need to specify the Y (the WTP amount), the covariates of the participation equation, and the covariates of the WTP equation. The problems are the following:
    1. Some models do not converge: for some blackout durations, estimation fails when I use only the Newton-Raphson technique (nr). I can make them converge by switching or combining techniques (bfgs dfp nr); a sketch of these calls is just below this list. But when they do converge, I run into the second problem.
    2. When models do converge, I either get no standard errors in the participation equation (they are shown as "-") or the p-values are 0.999/1. I would expect some variables to be significant; if ALL the variables have such high p-values, I suspect there is an issue that I do not understand.
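
    For clarity, the calls I have been trying look roughly like this (a sketch using the globals defined further down; technique(), difficult and iterate() are standard maximize options, and I am assuming dblhurdle forwards them, since it already accepts tech() and tolerance()):

    Code:
     
     * Newton-Raphson only: this is where some durations fail to converge
     dblhurdle category2h $xlist1, peq($xlist2) ll(0) technique(nr) tolerance(0.0001)
     
     * cycling BFGS, DFP and NR usually converges, but then problem 2 above appears
     * (difficult and iterate() assume dblhurdle passes standard maximize options through)
     dblhurdle category2h $xlist1, peq($xlist2) ll(0) technique(bfgs dfp nr) difficult iterate(200) tolerance(0.0001)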

    For the WTP, we used a choice card showing a number of quantities. If people choose quantity X(i), we assume that their WTP lies between X(i-1) and X(i), i.e. between the chosen quantity and the next lower one. To do that, I applied the following transformations:
    Code:
     
     gen interval_midpoint2 = (lob_2h_k + upb_2h_k) / 2
     gen category2h = .
     replace category2h = 1 if interval_midpoint2 <= 10
     replace category2h = 2 if interval_midpoint2 > 10 & interval_midpoint2 <= 20
     replace category2h = 3 if interval_midpoint2 > 20 & interval_midpoint2 <= 50
     replace category2h = 4 if interval_midpoint2 > 50 & interval_midpoint2 <= 100
     replace category2h = 5 if interval_midpoint2 > 100 & interval_midpoint2 <= 200
     replace category2h = 6 if interval_midpoint2 > 200 & interval_midpoint2 <= 400
     replace category2h = 7 if interval_midpoint2 > 400 & interval_midpoint2 <= 800
     replace category2h = 8 if interval_midpoint2 > 800
    So the actual variable we use for the WTP is category2h, which takes values from 1 to 8.
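
    For anyone reproducing this, a quick way to check the coding (not part of my original do-file, just a sanity check) would be:

    Code:
     
     * check the interval midpoints and the resulting categories
     summarize interval_midpoint2, detail
     tabulate category2h, missing    // should show only values 1 to 8, plus any unmatched (missing) cases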

    Then, the code for the double hurdle looks like this:

    Code:
     
     gen lnincome = ln(incomeM_INR)
     
     global xlist1 elbill age lnincome elPwrCt_C D_InterBoth D_Female Cl_REPrj D_HAvoid_pwrCt_1417 D_HAvoid_pwrCt_1720 D_HAvoid_pwrCt_2023 Cl_PowerCut D_PrjRES_AvdPwCt Cl_NeedE_Hou Cl_HSc_RELocPart Cl_HSc_RELocEntr Cl_HSc_UtlPart Cl_HSc_UtlEntr
     global xlist2 elbill elPwrCt_C Cl_REPrj D_Urban D_RESKnow D_PrjRES_AvdPwCt
     
     foreach var of global xlist1 {
         summarize `var', meanonly
         scalar `var'_m = r(mean)
     }
     
     **** DOUBLE HURDLE 2h ****
     
     dblhurdle category2h $xlist1, peq($xlist2) ll(0) tech(nr) tolerance(0.0001)
     
     esttab using "DH2FULLNEW.csv", replace stats(N r2_ll ll aic bic coef p t) cells(b(fmt(%10.6f) star) se(par fmt(3))) keep($xlist1 $xlist2) label
     
     nlcom (category2h: _b[category2h:_cons] + elbill_m * _b[category2h:elbill] + age_m * _b[category2h:age] + lnincome_m * _b[category2h:lnincome] + elPwrCt_C_m * _b[category2h:elPwrCt_C] + Cl_REPrj_m * _b[category2h:Cl_REPrj] + D_InterBoth_m * _b[category2h:D_InterBoth] + D_Female_m * _b[category2h:D_Female] + D_HAvoid_pwrCt_1417_m * _b[category2h:D_HAvoid_pwrCt_1417] + D_HAvoid_pwrCt_1720_m * _b[category2h:D_HAvoid_pwrCt_1720] + D_HAvoid_pwrCt_2023_m * _b[category2h:D_HAvoid_pwrCt_2023] + Cl_PowerCut_m * _b[category2h:Cl_PowerCut] + D_PrjRES_AvdPwCt_m * _b[category2h:D_PrjRES_AvdPwCt] + Cl_NeedE_Hou_m * _b[category2h:Cl_NeedE_Hou] + Cl_HSc_RELocPart_m * _b[category2h:Cl_HSc_RELocPart] + Cl_HSc_RELocEntr_m * _b[category2h:Cl_HSc_RELocEntr] + Cl_HSc_UtlPart_m * _b[category2h:Cl_HSc_UtlPart] + Cl_HSc_UtlEntr_m * _b[category2h:Cl_HSc_UtlEntr]), post
    I tried omitting some observations whose answers do not make much sense (i.e. the same WTP for different blackout durations), and I also tried eliminating random parts of the sample to see whether particular observations were causing the problem. Nothing changed, however.
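
    Concretely, the checks looked roughly like this (a sketch from memory; wtp_2h and wtp_24h are placeholder names for the per-duration WTP answers, not my actual variable names):

    Code:
     
     * drop respondents who report the same WTP for different blackout durations
     * (wtp_2h and wtp_24h are placeholders for my actual variable names)
     drop if wtp_2h == wtp_24h
     
     * re-estimate on a random 80% subsample to see whether particular observations drive the problem
     preserve
     set seed 12345
     sample 80
     dblhurdle category2h $xlist1, peq($xlist2) ll(0) tech(bfgs) tolerance(0.0001)
     restore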

    Using the command shown above, the results I get (the model converges, but the p-values in the participation equation are all 0.99 or 1) are the following:

    Code:
     
     dblhurdle category2h $xlist1, peq($xlist2) ll(0) tech(nr) tolerance(0.0001)
     
     Iteration 0:   log likelihood = -2716.2139  (not concave)
     Iteration 1:   log likelihood = -1243.5131
     Iteration 2:   log likelihood = -1185.2704  (not concave)
     Iteration 3:   log likelihood = -1182.4797
     Iteration 4:   log likelihood = -1181.1606
     Iteration 5:   log likelihood =  -1181.002
     Iteration 6:   log likelihood = -1180.9742
     Iteration 7:   log likelihood = -1180.9691
     Iteration 8:   log likelihood =  -1180.968
     Iteration 9:   log likelihood = -1180.9678
     Iteration 10:  log likelihood = -1180.9678
     
     Double-Hurdle regression                        Number of obs     =      1,043
     -------------------------------------------------------------------------------------
              category2h |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
     --------------------+----------------------------------------------------------------
     category2h          |
                  elbill |   .0000317    .000013     2.43   0.015     6.12e-06    .0000573
                     age |  -.0017308   .0026727    -0.65   0.517    -.0069693    .0035077
                lnincome |   .0133965   .0342249     0.39   0.695    -.0536832    .0804761
               elPwrCt_C |   .0465667   .0100331     4.64   0.000     .0269022    .0662312
             D_InterBoth |   .2708514   .0899778     3.01   0.003     .0944982    .4472046
                D_Female |   .0767811   .0639289     1.20   0.230    -.0485173    .2020794
                Cl_REPrj |   .0584215   .0523332     1.12   0.264    -.0441497    .1609928
     D_HAvoid_pwrCt_1417 |  -.2296727   .0867275    -2.65   0.008    -.3996555     -.05969
     D_HAvoid_pwrCt_1720 |   .3235389   .1213301     2.67   0.008     .0857363    .5613414
     D_HAvoid_pwrCt_2023 |   .5057679   .1882053     2.69   0.007     .1368922    .8746436
             Cl_PowerCut |    .090257   .0276129     3.27   0.001     .0361368    .1443773
        D_PrjRES_AvdPwCt |   .1969443   .1124218     1.75   0.080    -.0233983    .4172869
            Cl_NeedE_Hou |   .0402471   .0380939     1.06   0.291    -.0344156    .1149097
        Cl_HSc_RELocPart |    .043495   .0375723     1.16   0.247    -.0301453    .1171352
        Cl_HSc_RELocEntr |  -.0468001   .0364689    -1.28   0.199    -.1182779    .0246777
          Cl_HSc_UtlPart |   .1071663   .0366284     2.93   0.003      .035376    .1789566
          Cl_HSc_UtlEntr |  -.1016915   .0381766    -2.66   0.008    -.1765161   -.0268668
                   _cons |   .1148572   .4456743     0.26   0.797    -.7586484    .9883628
     --------------------+----------------------------------------------------------------
     peq                 |
                  elbill |   .0000723   .0952954     0.00   0.999    -.1867034    .1868479
               elPwrCt_C |   .0068171   38.99487     0.00   1.000    -76.42171    76.43535
                Cl_REPrj |   .0378404   185.0148     0.00   1.000    -362.5845    362.6602
                 D_Urban |   .0514037   209.6546     0.00   1.000    -410.8641     410.967
               D_RESKnow |   .1014026   196.2956     0.00   1.000    -384.6309    384.8337
        D_PrjRES_AvdPwCt |   .0727691   330.4314     0.00   1.000     -647.561    647.7065
                   _cons |    5.36639   820.5002     0.01   0.995    -1602.784    1613.517
     --------------------+----------------------------------------------------------------
                  /sigma |   .7507943   .0164394                      .7185736     .783015
             /covariance |  -.1497707   40.91453    -0.00   0.997    -80.34078    80.04124
    I don't know what causes the issues mentioned above. I am not sure how to post the dataset because it is a bit too large, and I cannot install the command dataex (the usual installation route is sketched below).
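
    For completeness, my understanding is that dataex is normally installed from SSC and run as below, although the installation does not work on my machine:

    Code:
     
     * usual installation and usage of dataex (this is what fails for me)
     ssc install dataex
     dataex category2h $xlist1 $xlist2 in 1/20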

    If you're willing to help out and need more information, feel free to tell me and I will send you the dataset.

    What would you do in this case? Do you have any idea about what might cause these issues? I'm not experienced enough to understand this, so any help is deeply appreciated. Thank you in advance!

  • #2
    Cross-posted at https://www.reddit.com/r/stata/comme...le_regression/

    Please note that it is a rule on Reddit and a request here that you tell each place about cross-posting elsewhere.



    • #3
      Originally posted by Nick Cox View Post
      Cross-posted at https://www.reddit.com/r/stata/comme...le_regression/

      Please note that it is a rule on Reddit and a request here that you tell each place about cross-posting elsewhere.
      I apologize for this, I wasn't aware of these rules.



      • #4
        Thanks for your reply.

        Our advice on Statalist is phrased in terms of requests, but with a logic to each one. Every time someone starts a thread they get a prompt to read https://www.statalist.org/forums/help

        Reddit is its own concern. But https://www.reddit.com/r/stata/ carries a list of rules on the right-hand side.

        As it happens, you're getting more reaction on Reddit. I don't use double hurdle models myself and don't have expertise to bring to bear.

