
  • Error message after running a double-hurdle model with alcohol purchase data

    Hi there,
    I am using a panel (2008-2010) on alcohol purchases in Scotland. The sample survey covers 3,079 households and 239,542 observations on alcohol purchases, of which 182,450 are 'zero observations' (i.e. no purchase of alcohol). This large share of zeros is the main reason for using the double-hurdle (DH) model, which accommodates the zero observations and should give efficient estimates.
    I run a basic DH model with demographic and economic variables (income, occupation, type of residential area, age of household head, household with young children, alcohol purchase level, year of purchase, promotion on alcohol) as follows:

    dblhurdle Weekly_cheap_units_new Low_income2 Intermediate_income2 Intermediate Routine_manual Urban_prosperity Comfortable_off Moderate_means Hard_pressed Total_number_of_adults Household_with_children5 Gender Age3040 Age4050 Age5060 Age60p MeanWeeklyAlcoholLvl_0wks2 MeanWeeklyAlcoholLvl_0wks3 Year_of_purchase2 Year_of_purchase3 Total_alcohol_spending Promotion_dummy if Social_Class!=6 , ll(0) peq(Total_number_of_adults Household_with_children5 MeanWeeklyAlcoholLvl_0wks2 MeanWeeklyAlcoholLvl_0wks3)


    However, the model does not run and instead I obtain the following message:

    initial values not feasible
    r(1400);

    My recent search on the web suggests drawing a random subsample, which also means I would lose some observations.

    Other than the suggestion above, could you please advise me on how to circumvent the problem I am facing? Your help would be very much appreciated. Thanks

  • #2
    Hi Ourega-Zoe,

    It is not uncommon for maximum likelihood models, or any problem that requires numerical optimization, to fail because the initial values are not feasible, meaning that the log likelihood cannot be evaluated at the values the optimizer starts from. The cause might be an identification problem or just very poor initial values. The first issue requires you to think about your model and your data. The second can often be fixed by supplying better starting values through the -from()- option, which feeds the optimizer its starting point, or by using other maximization options. Below I provide an example of how to do this.


    First I run the model:

    Code:
     . use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke, clear
    
    
    . dblhurdle cigs educ, ll(0) vsquish
    Iteration 0:   log likelihood = -4643.8586  (not concave)
    Iteration 1:   log likelihood = -3338.8689  (not concave)
    Iteration 2:   log likelihood = -3020.2734  (not concave)
    Iteration 3:   log likelihood = -2242.5549  (not concave)
    Iteration 4:   log likelihood = -1961.4711  (not concave)
    Iteration 5:   log likelihood = -1831.9725  (not concave)
    Iteration 6:   log likelihood = -1782.5579  (not concave)
    Iteration 7:   log likelihood = -1762.8551  (not concave)
    Iteration 8:   log likelihood = -1758.6351  (not concave)
    Iteration 9:   log likelihood = -1757.0137  (not concave)
    Iteration 10:  log likelihood = -1756.3723  (not concave)
    Iteration 11:  log likelihood = -1755.9694  (not concave)
    Iteration 12:  log likelihood = -1755.5693  (not concave)
    Iteration 13:  log likelihood = -1749.9021  (not concave)
    Iteration 14:  log likelihood = -1748.7417  
    Iteration 15:  log likelihood = -1745.6506  
    Iteration 16:  log likelihood = -1742.8491  
    Iteration 17:  log likelihood = -1742.5903  
    Iteration 18:  log likelihood = -1742.5775  
    Iteration 19:  log likelihood = -1742.5774  
    
    Double-Hurdle regression                        Number of obs     =        807
    ------------------------------------------------------------------------------
            cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    cigs         |
            educ |   .3409848   .4882945     0.70   0.485    -.6160548    1.298024
           _cons |   .1427075   5.185535     0.03   0.978    -10.02075    10.30617
    -------------+----------------------------------------------------------------
    peq          |
            educ |  -.0730872   .0250463    -2.92   0.004     -.122177   -.0239974
           _cons |   .6902624   .3569937     1.93   0.053    -.0094325    1.389957
    -------------+----------------------------------------------------------------
          /sigma |   20.46622   1.629942                      17.27159    23.66085
     /covariance |   17.84589    2.80837     6.35   0.000     12.34158    23.35019
    ------------------------------------------------------------------------------
    Now I supply different starting values. Here I use values close to what the model converged to, listed in the order of the coefficient table (cigs equation, peq equation, /sigma, /covariance):

    Code:
    . dblhurdle cigs educ, ll(0) vsquish from( .341 .142  -.073 .690 20.466 17.846, copy)
    Iteration 0:   log likelihood = -1742.5776  
    Iteration 1:   log likelihood = -1742.5774  
    Iteration 2:   log likelihood = -1742.5774  
    
    Double-Hurdle regression                        Number of obs     =        807
    ------------------------------------------------------------------------------
            cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    cigs         |
            educ |   .3410538   .4883448     0.70   0.485    -.6160845    1.298192
           _cons |   .1422804   5.185757     0.03   0.978    -10.02162    10.30618
    -------------+----------------------------------------------------------------
    peq          |
            educ |  -.0730905   .0250489    -2.92   0.004    -.1221854   -.0239955
           _cons |   .6903138   .3570382     1.93   0.053    -.0094682    1.390096
    -------------+----------------------------------------------------------------
          /sigma |   20.46601    1.63009                      17.27109    23.66093
     /covariance |    17.8453   2.808994     6.35   0.000     12.33977    23.35083
    ------------------------------------------------------------------------------

    Notice that the model converged after a couple of iterations instead of 19.
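
    When the converged values are not known in advance, as in your case, rough starting values can sometimes be pieced together from simpler models. What follows is only a sketch under assumptions: it takes the participation coefficients from a -probit- on an indicator for any consumption, takes the quantity coefficients and sigma from a -truncreg- on the positive observations, and starts the covariance at 0. The variable smoker and the matrices bp, bt and b0 are names made up for the illustration, the ordering of b0 simply follows the coefficient table above, and nothing guarantees that these values are feasible or close to the solution for your data.

    Code:
    use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke, clear

    * participation step: probit on an indicator for any consumption
    generate byte smoker = cigs > 0 if !missing(cigs)
    probit smoker educ
    matrix bp = e(b)

    * quantity step: truncated regression on the positive observations
    truncreg cigs educ if cigs > 0, ll(0)
    matrix bt = e(b)

    * stack as (cigs equation, peq equation, sigma, covariance)
    * assumes truncreg stores sigma as the last element of e(b)
    matrix b0 = (bt[1, 1..2], bp, bt[1, colsof(bt)], 0)
    matrix list b0

    dblhurdle cigs educ, ll(0) from(b0, copy)

    With a longer regressor list, the column range 1..2 has to be widened to cover all quantity-equation coefficients plus the constant, and the probit must use the same variables as peq().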

    I suggest that you work with the maximization options and explore the numerical characteristics of your problem.
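
    As a starting point, and assuming -dblhurdle- passes the standard -ml- maximize options through (I have not checked every option against its help file), something along these lines may help. The extra regressors income and age are only there to illustrate the checks.

    Code:
    use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke, clear

    * look for regressors on very different scales or with no variation
    summarize cigs educ income age

    * flag perfectly collinear regressors (for example a full set of dummies)
    _rmcoll educ income age

    * alternative stepping algorithm for nonconcave regions
    dblhurdle cigs educ, ll(0) difficult

    * a different optimization technique with a higher iteration limit
    dblhurdle cigs educ, ll(0) technique(bfgs) iterate(200)

    If the checks show a regressor measured in very large units, rescaling it before estimation is a cheap thing to try, since badly scaled variables can make the default starting point numerically awkward.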

    Finally, you could also think about using the new Stata 14 command -churdle- which fits hurdle models.
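
    A minimal sketch of the -churdle- syntax on the same data, assuming Stata 14 or later. Note that -churdle- fits Cragg-type hurdle models, which is not the same specification as the -dblhurdle- model with a covariance parameter shown above, so the estimates are not directly comparable.

    Code:
    use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke, clear

    * linear hurdle model: select() holds the participation regressors,
    * ll(0) sets the lower limit of the outcome
    churdle linear cigs educ, select(educ) ll(0)

    * average marginal effect of educ on the predicted outcome
    margins, dydx(educ)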



    • #3
      Hi Enrique,
      thanks very much for answering so quickly and for the examples provided. I will try them as suggested and hope for better results.
      Thanks for your help. It is very much appreciated. Zoe



      • #4
        Hi,

        I'm currently experiencing the same issue as the previous user: I run the code below and get the error message "initial values not feasible".

        How can I determine whether there is an identification problem or very poor initial values? I'm not familiar with the double-hurdle model and am unsure how to go about choosing the best model. Previously I used a tobit model, but because it failed the normality and homoscedasticity tests, I was advised to use the double-hurdle model. Thank you in advance for your help.

        Alexis


        code:

        global xlist income2 income3 num_child1 num_child3 age_y35 age_y68 age_y911 age_y1214 age_y1517
        dblhurdle yc_clothing_new $xlist if hw_kids & age_young_member < 16 , ll(0)

        result:

        initial values not feasible
        r(1400);
