
  • Probit model does not converge

    Hello,

    I am currently using Stata 15 and am trying to run a probit model of an indicator variable "Home Bias" (1 if the person owns stock in their domestic country, 0 if not) on three indicator variables for whether an individual belongs to a specific generation (MG = millennial, GX = Gen X, BB = baby boomer), plus several controls and survey-year controls. I run this model using sampling weights and robust standard errors (refer to ANALYSIS WEIGHTS for more info on the weights used).

    The model does not converge when I include the sampling weights and, with the weights included, all three generational indicators.

    Code:
    probit HB   MG GX BB age  education white male income_xtile networth_xtile yrx* [pw=wgt] if age<=36 ,  vce(robust)
    After reading similar threads, I tried to simplify the model by running it on simple combinations of the generational covariates I am interested in. The model converges when I use each generation indicator separately or in pairs; including all three is where I run into problems. I also checked that there are sufficient observations in each cell. Below are the tabulated counts for each category:
    Home Bias      MG      GX      BB
    0             255     414     732
    1            1354    2916     983
    Total        1609    3330    1715

    Additionally, following advice in similar threads, I used the -iter()- option and found that BB may be the problematic variable (its standard error is missing in the output).

    Code:
    . probit HB   MG GX BB age  education white male income_xtile networth_xtile yrx* [pw=wgt] if age<=36 ,  iter(10) vce(robust) 
    
    note: yrx1 != 0 predicts failure perfectly
          yrx1 dropped and 571 obs not used
    
    note: yrx10 omitted because of collinearity
    Iteration 0:   log pseudolikelihood =  -10994329  
    Iteration 1:   log pseudolikelihood = -9995004.5  
    Iteration 2:   log pseudolikelihood = -9962652.9  
    Iteration 3:   log pseudolikelihood = -9962035.8  
    Iteration 4:   log pseudolikelihood = -9961948.4  
    Iteration 5:   log pseudolikelihood = -9961932.9  
    Iteration 6:   log pseudolikelihood = -9961930.8  
    Iteration 7:   log pseudolikelihood = -9961930.4  
    Iteration 8:   log pseudolikelihood = -9961930.3  
    Iteration 9:   log pseudolikelihood = -9961930.3  
    Iteration 10:  log pseudolikelihood = -9961930.3  
    convergence not achieved
    
    Probit regression                               Number of obs     =      6,103
                                                    Wald chi2(14)     =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -9961930.3               Pseudo R2         =     0.0939
    
    --------------------------------------------------------------------------------
                   |               Robust
                HB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
                MG |  -4.980324   .3090391   -16.12   0.000     -5.58603   -4.374619
                GX |  -5.266788   .3430286   -15.35   0.000    -5.939112   -4.594464
                BB |  -5.707422          .        .       .            .           .
               age |   .0179968   .0089211     2.02   0.044     .0005117     .035482
         education |  -.1401491   .0152256    -9.20   0.000    -.1699908   -.1103074
             white |   .2162418   .0593501     3.64   0.000     .0999177    .3325659
              male |  -.3093737   .0855674    -3.62   0.000    -.4770827   -.1416647
      income_xtile |    .023847   .0030531     7.81   0.000      .017863     .029831
    networth_xtile |  -.0305111   .0029032   -10.51   0.000    -.0362013    -.024821
              yrx1 |          0  (omitted)
              yrx2 |   1.811163   .1970285     9.19   0.000     1.424994    2.197332
              yrx3 |   1.400022   .1790625     7.82   0.000     1.049066    1.750978
              yrx4 |   1.295315   .1721532     7.52   0.000     .9579008    1.632729
              yrx5 |   1.498705   .1551553     9.66   0.000     1.194606    1.802804
              yrx6 |   1.108657    .147112     7.54   0.000     .8203223    1.396991
              yrx7 |    1.05504   .1383681     7.62   0.000      .783844    1.326237
              yrx8 |   .7401583   .1168842     6.33   0.000     .5110695    .9692471
              yrx9 |    .356723   .1095997     3.25   0.001     .1419115    .5715345
             yrx10 |          0  (omitted)
             _cons |   7.084225   .5015207    14.13   0.000     6.101262    8.067187
    --------------------------------------------------------------------------------
    Note: 0 failures and 5 successes completely determined.
    Warning: convergence not achieved
    A potential issue I thought of is that there is no survey year containing both millennials and baby boomers with non-missing values for HB. Below are the counts by year:
    year     MG     GX     BB
    1989      0     10    561
    1992      0    111    563
    1995      0    315    361
    1998      0    498    230
    2001     40    859      0
    2004     75    557      0
    2007    190    440      0
    2010    335    350      0
    2013    360    190      0

    Would it be possible for someone to assist me in determining why this model will not converge and whether there is a possible solution to get around this issue?

    Thanks in advance!

  • #2
    In cases like this a common problem is that the dummy RHS variables don't "span" the dependent variable, i.e. that the data do not contain all four combinations of a dummy and the dependent variable: (0,0), (0,1), (1,0), (1,1). The estimates for MG, GX, and BB are huge in magnitude, which is usually a sign that there are problems.

    It wasn't clear from your post whether the model converges okay with sampling weights (?). So I would recommend:

    1. running the model without weights just to be sure there's nothing crazy about the weights.
    2. looking at the 2x2 tables of HB with MG, with GX, and with BB to see if the spanning criterion is met.
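
    The two checks above can be sketched roughly as follows, reusing the variable names from the original command. This is only a sketch: it assumes the same age<=36 restriction should apply to the tables, and the -missing- option is added just to show whether any of the dummies has missing values.

    Code:
    * 1. same specification, but without the sampling weights
    probit HB MG GX BB age education white male income_xtile networth_xtile yrx* if age<=36, vce(robust)

    * 2. 2x2 table of HB against each generation dummy in the same subsample
    foreach v of varlist MG GX BB {
        tabulate HB `v' if age<=36, missing
    }

    If any of the four cells in one of these tables is empty (or nearly so), that dummy is a likely culprit.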

    • #3
      Hi John, Thank you so much for the suggestions!

      1. The model does converge when I exclude the weights, HOWEVER only when I run the whole model with controls. If I run only the three generational variables without weights, the model converges but the coefficients are huge (similar to the output above) and Stata does not report the standard errors. I am not exactly sure how or why the weights would cause the model not to converge. The sampling weights are provided with the Survey of Consumer Finances data set to correct for nonresponse.

      The weight (X42001) is a partially design-based weight constructed at the Federal Reserve using original selection probabilities and frame information along with aggregate control totals estimated from the Current Population Survey.
      2. It does seem that the spanning criterion is met with regard to combinations of the three generation variables and HB.
      HB      MG      GX      BB
      0      255     414     732
      1     1354    2916     983
      However, I believe the issue is that I am restricting the sample to individuals 36 years old or younger, since the oldest millennial in my sample is 36. The model works if I don't include this age restriction, so I was wondering whether I can correct for the age bias in another way.
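
      A rough sketch of how the restricted sample could be inspected is below. It reuses the variable names from the original command; ngen is a hypothetical helper variable that is not in the data set. The idea is only to see what the age<=36 restriction leaves in each generation, and whether nearly every remaining observation belongs to exactly one of the three groups.

      Code:
      * counts of HB by generation within the age-restricted sample
      foreach v of varlist MG GX BB {
          tabulate HB `v' if age<=36 & !missing(HB)
      }

      * ngen (hypothetical name) = number of generation dummies equal to 1
      egen byte ngen = rowtotal(MG GX BB)
      tabulate ngen if age<=36 & !missing(HB)

      If ngen is 1 for almost every observation, the three dummies add up to (nearly) the constant, which might explain the large offsetting coefficients on MG, GX, BB, and _cons. In that case, dropping one generation as the base category would be one thing to try.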
