Probit model does not converge

Melina Murren



09 Feb 2019, 09:32

Hello,

I am currently using Stata 15. I am trying to run a probit model of an indicator variable "Home Bias" (HB: 1 if the person owns stock in their domestic country, 0 if not) on three indicator variables for whether an individual belongs to a specific generation (MG = millennial, GX = Gen X, BB = baby boomer), several controls, and survey-year controls. I run this model with sampling weights and robust standard errors (refer to ANALYSIS WEIGHTS for more information on the weights used).

The model does not converge when I include the sampling weights together with all three generational indicators.

Code:

probit HB MG GX BB age education white male income_xtile networth_xtile yrx* [pw=wgt] if age<=36, vce(robust)

Following suggestions in similar threads, I simplified the model by running it on combinations of the generational covariates I am interested in. It converges when I use each generation indicator on its own or in pairs; the problem appears only when all three are included. I also checked that there are enough observations in each category. Below are the counts for each category:

Home Bias   MG      GX      BB
0           255     414     732
1           1354    2916    983
Total       1609    3330    1715
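For reference, a table of this kind can be produced in one step with -tabstat-: because MG, GX and BB are 0/1 indicators, their sums within each value of HB are the cell counts. This is a sketch assuming the variable names above and the same age <= 36 restriction as the model.

```stata
* Number of members of each generation (sum of the 0/1 dummies) for HB = 0 and HB = 1,
* restricted to the same age range as the probit and to non-missing HB
tabstat MG GX BB if age <= 36 & !missing(HB), by(HB) statistics(sum)
```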

Also following similar threads, I limited the number of iterations with the -iter()- option, and the output suggests that BB may be the problematic variable.

Code:

. probit HB   MG GX BB age  education white male income_xtile networth_xtile yrx* [pw=wgt] if age<=36 ,  iter(10) vce(robust) 

note: yrx1 != 0 predicts failure perfectly
      yrx1 dropped and 571 obs not used

note: yrx10 omitted because of collinearity
Iteration 0:   log pseudolikelihood =  -10994329  
Iteration 1:   log pseudolikelihood = -9995004.5  
Iteration 2:   log pseudolikelihood = -9962652.9  
Iteration 3:   log pseudolikelihood = -9962035.8  
Iteration 4:   log pseudolikelihood = -9961948.4  
Iteration 5:   log pseudolikelihood = -9961932.9  
Iteration 6:   log pseudolikelihood = -9961930.8  
Iteration 7:   log pseudolikelihood = -9961930.4  
Iteration 8:   log pseudolikelihood = -9961930.3  
Iteration 9:   log pseudolikelihood = -9961930.3  
Iteration 10:  log pseudolikelihood = -9961930.3  
convergence not achieved

Probit regression                               Number of obs     =      6,103
                                                Wald chi2(14)     =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -9961930.3               Pseudo R2         =     0.0939

--------------------------------------------------------------------------------
               |               Robust
            HB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
            MG |  -4.980324   .3090391   -16.12   0.000     -5.58603   -4.374619
            GX |  -5.266788   .3430286   -15.35   0.000    -5.939112   -4.594464
            BB |  -5.707422          .        .       .            .           .
           age |   .0179968   .0089211     2.02   0.044     .0005117     .035482
     education |  -.1401491   .0152256    -9.20   0.000    -.1699908   -.1103074
         white |   .2162418   .0593501     3.64   0.000     .0999177    .3325659
          male |  -.3093737   .0855674    -3.62   0.000    -.4770827   -.1416647
  income_xtile |    .023847   .0030531     7.81   0.000      .017863     .029831
networth_xtile |  -.0305111   .0029032   -10.51   0.000    -.0362013    -.024821
          yrx1 |          0  (omitted)
          yrx2 |   1.811163   .1970285     9.19   0.000     1.424994    2.197332
          yrx3 |   1.400022   .1790625     7.82   0.000     1.049066    1.750978
          yrx4 |   1.295315   .1721532     7.52   0.000     .9579008    1.632729
          yrx5 |   1.498705   .1551553     9.66   0.000     1.194606    1.802804
          yrx6 |   1.108657    .147112     7.54   0.000     .8203223    1.396991
          yrx7 |    1.05504   .1383681     7.62   0.000      .783844    1.326237
          yrx8 |   .7401583   .1168842     6.33   0.000     .5110695    .9692471
          yrx9 |    .356723   .1095997     3.25   0.001     .1419115    .5715345
         yrx10 |          0  (omitted)
         _cons |   7.084225   .5015207    14.13   0.000     6.101262    8.067187
--------------------------------------------------------------------------------
Note: 0 failures and 5 successes completely determined.
Warning: convergence not achieved
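Two features of this output are worth checking directly: BB gets a coefficient but no standard error, and the three generation coefficients are all large and negative while the constant is large and positive. That pattern is consistent with MG + GX + BB being (nearly) constant in the estimation sample, and therefore (nearly) collinear with the constant. A minimal sketch of the check, assuming the variable names in the command above; gensum is a hypothetical helper variable.

```stata
* Sum of the three generation dummies for each observation in the age range
* (gensum is a hypothetical helper; rowtotal() counts missing values as 0)
egen gensum = rowtotal(MG GX BB) if age <= 36 & !missing(HB)
tabulate gensum

* Report regressors that are exactly collinear in this sample; positive weights
* do not change exact collinearity, so wgt is left out here
_rmcoll MG GX BB age education white male income_xtile networth_xtile yrx* if age <= 36 & !missing(HB)

* yrx1 was reported to predict failure perfectly: look at its table with HB
tabulate HB yrx1 if age <= 36
```

If gensum equals 1 for every observation, the three indicators together duplicate the constant, and only two of them can be identified alongside _cons.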

One potential issue I thought of is that there is no survey year in which both millennials and baby boomers have non-missing values for HB. Below are the counts by year:

year    MG      GX      BB
1989    0       10      561
1992    0       111     563
1995    0       315     361
1998    0       498     230
2001    40      859     0
2004    75      557     0
2007    190     440     0
2010    335     350     0
2013    360     190     0
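A table like this, plus a direct check for survey years that contain both millennials and baby boomers, can be sketched as follows. It assumes the survey year is stored in a variable named year, as the table header suggests; insample, n_MG and n_BB are hypothetical helper variables.

```stata
* Individuals in each generation by survey year (non-missing HB, age <= 36)
tabstat MG GX BB if age <= 36 & !missing(HB), by(year) statistics(sum) nototal

* Per-year counts of millennials and boomers in that sample (hypothetical helpers)
generate byte insample = age <= 36 & !missing(HB)
bysort year: egen n_MG = total(MG * insample)
bysort year: egen n_BB = total(BB * insample)

* Years in which both generations appear; "no observations" means there are none
tabulate year if n_MG > 0 & n_BB > 0
```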

Could someone help me work out why this model does not converge and whether there is a way around this problem?

Thanks in advance!

Tags: probit, regression

John Mullahy

#2

09 Feb 2019, 11:05

In cases like this, a common problem is that the dummy RHS variables don't "span" the dependent variable, i.e. the data do not contain all four combinations of each dummy and the dependent variable: (0,0), (0,1), (1,0), (1,1). The estimates for MG, GX, and BB are huge in magnitude, which is usually a sign that something is wrong.

It wasn't clear from your post whether the model converges with the sampling weights, without them, or in neither case. So I would recommend:

1. running the model without weights just to be sure there's nothing crazy about the weights.
2. looking at the 2x2 tables of HB with MG, with GX, and with BB to see if the spanning criterion is met.
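Both checks take only a few lines. A sketch using the variable names from the original post: the first command is the same probit with the [pw=wgt] weight removed, and the loop prints one 2x2 table of HB against each generation indicator. An empty cell in any of those tables means that indicator does not span HB in the sense above.

```stata
* 1. Same specification without the sampling weights
probit HB MG GX BB age education white male income_xtile networth_xtile yrx* if age <= 36, vce(robust)

* 2. 2x2 table of HB against each generation indicator in the same age range
foreach v of varlist MG GX BB {
    tabulate HB `v' if age <= 36
}
```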
Melina Murren

#3

10 Feb 2019, 09:47

Hi John, thank you so much for the suggestions!

1. The model does converge when I exclude the weights, but only when I run the whole model with controls. If I run only the three generational variables without weights, the model converges, but the coefficients are huge (similar to the output above) and Stata does not report their standard errors. I am not sure how or why the weights would prevent the model from converging. The sampling weights are provided with the Survey of Consumer Finances data set to correct for nonresponse.

The weight (X42001) is a partially design-based weight constructed at the Federal Reserve using original selection probabilities and frame information along with aggregate control totals estimated from the Current Population Survey.
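One weight-related factor that may be worth ruling out (an assumption on my part, not something established in this thread): multiplying every probability weight by the same positive constant leaves the probit point estimates and robust standard errors unchanged, but it rescales the pseudo-log-likelihood, which is around -10 million in the output in the first post. As far as I understand, Stata's default convergence checks include a scaled-gradient criterion that grows with that scale, so very large weights can make convergence harder to declare. A sketch that rescales the weight to mean 1 in the estimation range; wgt_norm is a hypothetical new variable.

```stata
* Rescale the weight to have mean 1 in the age range used by the model
* (wgt_norm is hypothetical; any positive rescaling leaves the estimates unchanged)
summarize wgt if age <= 36 & !missing(HB), meanonly
generate double wgt_norm = wgt / r(mean)

* Same model with the rescaled weight
probit HB MG GX BB age education white male income_xtile networth_xtile yrx* if age <= 36 [pw=wgt_norm], vce(robust)
```

If this version converges, its coefficients and robust standard errors should match what the original weights would give, because only the scale of the weights has changed.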

2. The spanning criterion does seem to be met for the combinations of the three generation variables and HB:

HB    MG      GX      BB
0     255     414     732
1     1354    2916    983

However, I believe the issue is that I restrict the sample to individuals aged 36 or younger, because the oldest millennial in my sample is 36. The model works if I don't include this age restriction, so I am wondering whether I can correct for the age bias in another way.
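To see how much of each generation survives the age <= 36 restriction, one can compare each generation's age range with and without it. A sketch assuming the same variable names; it only describes the samples and does not by itself correct for the age imbalance.

```stata
* Age range of each generation with non-missing HB, before and after the restriction
foreach g of varlist MG GX BB {
    display as text _newline "`g' == 1, all ages"
    summarize age if `g' == 1 & !missing(HB)
    display as text "`g' == 1, age <= 36"
    summarize age if `g' == 1 & !missing(HB) & age <= 36
}
```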