  • GLM Model (log link, poisson family) predicting values > 1 with binary outcome but only with particular independent variable

    Hello,

    Thanks in advance for reviewing my question. I am trying to run a glm model (below) with a restricted cubic spline, but it is predicting values >1 for my binary outcome. I am running the exact same model with other independent continuous spline variables and am having no issues; it is only with this particular variable that I am having difficulties. I thought it was because the values are small (between 0 and 0.0013), so I multiplied by 10k, but the issue persists. I attached a graph of the predicted probabilities and a sample of the data. I cannot think of a reason why this is happening, so any help is appreciated! e2sfca30km is the variable giving me difficulties, and kmspline1-6 is the restricted cubic spline from e2sfca30km. I have tried this model with 3-7 splines.

    E2sfca30km is a measure at the census block group level, so there are only about 9k distinct values for the 580k individuals in the dataset. 1.5% of the values equal 0. Skewness = 0.555, kurtosis = 4.17.

    Code:
    glm apncu_cat2 kmspline*, fam(poisson) link(log) vce(robust)
    Code:
    clear
    input float(apncu_cat2 kmspline1 kmspline2 kmspline3 kmspline4 kmspline5 kmspline6 e2sfca30km)
    0 .00028818857  .00004457869  5.113746e-06  1.444405e-07             0             0 .00028818857
    0 .00026230657 .000033261917  2.703544e-06  7.405989e-09             0             0 .00026230657
    0 .00023844297  .00002469184  1.291617e-06             0             0             0 .00023844297
    0  .0004159302   .0001384533  .00003760846  9.969267e-06 2.1287353e-06 1.2599354e-07  .0004159302
    0  .0004119772  .00013444884 .000035937806  9.285647e-06  1.888368e-06  9.170773e-08  .0004119772
    1 .00028010557  .00004080611  4.250433e-06   7.49899e-08             0             0 .00028010557
    0 .00027958507  .00004057076  4.198444e-06   7.15074e-08             0             0 .00027958507
    0 .00025103986 .000029003764 1.9509584e-06 1.3559164e-10             0             0 .00025103986
    0 .00028010557  .00004080611  4.250433e-06   7.49899e-08             0             0 .00028010557
    0  .0002661811 .000034817243  3.001255e-06  1.457813e-08             0             0  .0002661811
    0 .00026700884  .00003515568  3.067567e-06 1.6548881e-08             0             0 .00026700884
    0  .0004536351   .0001806384   .0000561545  .00001819453  5.471376e-06  9.093207e-07  .0004536351
    1  .0005572831   .0003282382  .00012796781  .00005453524  .00002325828  7.007693e-06  .0005572831
    0  .0003217929 .000062755695  9.952177e-06  8.650636e-07  6.584936e-10             0  .0003217929
    0 .00026771924 .000035447887 3.1252505e-06 1.8375584e-08             0             0 .00026771924
    0 .00026278905 .000033453016  2.739492e-06 8.1290095e-09             0             0 .00026278905
    1    .00039412   .0001173194   .0000290054  6.584881e-06  1.025755e-06 1.1155322e-08    .00039412
    0  .0003506344  .00008182417 .000015973721  2.301527e-06  9.386544e-08             0  .0003506344
    0  .0002377031  .00002445282  1.258366e-06             0             0             0  .0002377031
    0 .00026774168 .000035457142  3.127084e-06  1.843537e-08             0             0 .00026774168
    0  .0002678029 .000035482408  3.132092e-06 1.8599193e-08             0             0  .0002678029
    0  .0005172913  .00026659502  .00009716836  .00003847566 .000015118503 4.0678246e-06  .0005172913
    0  .0004609424  .00018962205  .00006028068    .000020139  6.338406e-06 1.1614068e-06  .0004609424
    1  .0001860779 .000011311463  7.507855e-08             0             0             0  .0001860779
    1 .00023370735  .00002318857 1.0886133e-06             0             0             0 .00023370735
    1 .00018509098 .000011122442  6.856325e-08             0             0             0 .00018509098
    1  .0004410896  .00016580876  .00004946164 .000015114076  4.144417e-06  5.508027e-07  .0004410896
    1 .00040353995  .00012616147 .000032538937  7.932543e-06  1.437277e-06  4.051566e-08 .00040353995
    1  .0007652474   .0006941111   .0003179769  .00015772582  .00007791635 .000027962524  .0007652474
    0 .00026773266  .00003545342  3.126347e-06 1.8411317e-08             0             0 .00026773266
    end
    label values apncu_cat2 apncu2
    label def apncu2 0 "Inadequate or Intermediate", modify
    label def apncu2 1 "Adequate or Adequate Plus", modify
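
    For reference, spline terms like kmspline1-6 are typically generated with mkspline; a sketch (the knot count here is an assumption — nknots(7) yields six spline terms, matching kmspline1-6):

    Code:
    * restricted cubic spline of e2sfca30km; nknots(7) produces kmspline1-kmspline6
    mkspline kmspline = e2sfca30km, cubic nknots(7)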

    [Attached image: e2sfca30km issue.png — graph of the predicted probabilities]

  • #2
    What you're doing is more simply known as Poisson regression (see below for the numerical equivalence of the two commands). Two points about Poisson regression:
    1) It is not really meant for binary variables. The computer will not break if you fit it, but for binary variables logit and probit are more appropriate.
    2) There is nothing in Poisson regression to restrict the predictions to between 0 and 1, so I do not see anything unusual in what you are reporting. I do not see a problem here at all.

    Code:
    . glm apncu_cat2 kmspline*, fam(poisson) link(log) vce(robust) nolog
    
    Generalized linear models                         No. of obs      =         30
    Optimization     : ML                             Residual df     =         24
                                                      Scale parameter =          1
    Deviance         =  14.28542523                   (1/df) Deviance =   .5952261
    Pearson          =  24.01640992                   (1/df) Pearson  =   1.000684
    
    Variance function: V(u) = u                       [Poisson]
    Link function    : g(u) = ln(u)                   [Log]
    
                                                      AIC             =   1.476181
    Log pseudolikelihood = -16.14271262               BIC             =  -67.34331
    
    ------------------------------------------------------------------------------
                 |               Robust
      apncu_cat2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       kmspline1 |    1585174    1949363     0.81   0.416     -2235506     5405855
       kmspline2 |   -8458836   1.01e+07    -0.83   0.404    -2.83e+07    1.14e+07
       kmspline3 |   2.37e+07   2.79e+07     0.85   0.397    -3.11e+07    7.84e+07
       kmspline4 |  -1.84e+07   2.19e+07    -0.84   0.401    -6.14e+07    2.45e+07
       kmspline5 |    2302166    7076294     0.33   0.745    -1.16e+07    1.62e+07
       kmspline6 |    1998675    4742274     0.42   0.673     -7296011    1.13e+07
           _cons |       -201   249.9726    -0.80   0.421    -690.9374    288.9373
    ------------------------------------------------------------------------------
    
    . poisson apncu_cat2 kmspline*, vce(robust) nolog
    
    Poisson regression                              Number of obs     =         30
                                                    Wald chi2(5)      =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -16.142713               Pseudo R2         =     0.1862
    
    ------------------------------------------------------------------------------
                 |               Robust
      apncu_cat2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       kmspline1 |    1585174    1949363     0.81   0.416     -2235507     5405855
       kmspline2 |   -8458836   1.01e+07    -0.83   0.404    -2.83e+07    1.14e+07
       kmspline3 |   2.37e+07   2.79e+07     0.85   0.397    -3.11e+07    7.84e+07
       kmspline4 |  -1.84e+07   2.19e+07    -0.84   0.401    -6.14e+07    2.45e+07
       kmspline5 |    2302166    7076294     0.33   0.745    -1.16e+07    1.62e+07
       kmspline6 |    1998675    4742274     0.42   0.673     -7296011    1.13e+07
           _cons |       -201   249.9726    -0.80   0.421    -690.9374    288.9373
    ------------------------------------------------------------------------------
    
    .
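    To see why the predictions can exceed 1: with a log link the fitted mean is exp(xb), which is bounded below by 0 but not bounded above, so any observation with a positive linear predictor gets a fitted value greater than 1. A quick check:

    Code:
    * with a log link, the fitted mean is exp(xb); it exceeds 1 whenever xb > 0
    display exp(0)     // = 1
    display exp(.5)    // > 1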

    Comment


    • #3
      The only problem I see here is that in principle you should not be using the Poisson regression model. You should use a model designed for a binary outcome, such as probit or logit.
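
      A minimal sketch of that alternative, reusing the variable names from post #1 (phat_logit is a made-up name for the prediction):

      Code:
      * logit keeps predicted probabilities strictly inside (0,1)
      logit apncu_cat2 kmspline*, vce(robust)
      predict phat_logit, pr
      summarize phat_logit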

      Comment


      • #4
        I have used it previously to model relative risks with common outcomes, per https://stats.idre.ucla.edu/stata/fa...ohort-studies/ and Zou G. A Modified Poisson Regression Approach to Prospective Studies with Binary Data. Am J Epidemiol 2004; 159(7):702-6; the outcome in question occurs in 71.2% of the dataset. I have never had this issue with any other variable, so I was just trying to understand why changing the independent predictor would have this substantial an impact on the results of the model.
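
        For context, the modified Poisson approach of Zou (2004) is commonly run with robust standard errors and the exponentiated coefficients reported as risk ratios (a sketch using the thread's variable names):

        Code:
        * modified Poisson for binary outcomes: irr reports exp(b) as risk ratios
        poisson apncu_cat2 kmspline*, vce(robust) irr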

        Comment


        • #5
          I guess my bigger question is: what characteristics of a continuous variable would lead to the model not working as it has previously and as it is supposed to?

          Comment


          • #6
            You keep saying that Poisson regression is "supposed to" give you predictions between 0 and 1, and no, it is not supposed to do that. It is a different model; it is not designed to obey the 0 to 1 bounds.

            Otherwise you can do whatever you want; it is a free world. You can fit a Poisson regression to your binary data, or you can fit a linear probability model, and the predictions are pretty close:

            Code:
            . qui poisson apncu_cat2 kmspline*, vce(robust) nolog
            
            . predict yhatthepoison
            (option n assumed; predicted number of events)
            
            . qui reg apncu_cat2 kmspline*, vce(robust) noheader
            
            . predict yhattheremedy
            (option xb assumed; fitted values)
            
            . summ yhatthepoison yhattheremedy
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            yhatthepoi~n |         30          .3    .2942041   .0549956   1.086324
            yhattherem~y |         30          .3    .2928146   .0281442   1.077846
            
            . pwcorr yhatthepoison yhattheremedy, sig
            
                         | yhatth~n yhatth~y
            -------------+------------------
            yhatthepoi~n |   1.0000 
                         |
                         |
            yhattherem~y |   0.9953   1.0000 
                         |   0.0000
            The Poisson and the Linear Probability model give you virtually the same predictions.

            As to what characteristics of a regressor can take you out of the range you feel should be obeyed, like 0 to 1, you said it yourself: continuous. Continuous regressors have a tendency to take you out of the range of your actual dependent variable. I have not really thought this through, because I do not think it is an interesting question, but I would guess that the wider the range of your continuous regressor, the more likely it is to take you out of the range of your dependent variable.

            Binary regressors, by contrast, tend not to take you out of the range of your dependent variable.

            Originally posted by sarah minion View Post
            I guess my bigger question is what characteristics of a continuous variable would lead to the model not working like it has previously and is supposed to?

            Comment


            • #7
              Ok, thank you for your responses

              Comment


              • #8
                Regarding the predictors, I conflated the continuity of the regressor with its boundedness. What I wanted to say is:
                1) if you run a regression of a 0/1 variable on predictors that are also bounded between 0 and 1, whether binary or continuous with a limited range, in my experience it is unlikely for the predictions to fall outside the range of the dependent variable, and when they do, they fall outside only by a little, as in your example.
                2) if you run a regression of a 0/1 variable on a predictor with a huge range well outside 0 to 1, then the predictions are very likely to fall outside 0 to 1, and by a lot.

                Continuity of the regressors might have something to do with it too, but I think the main factor is the range of the regressors being much larger than the range of the outcome.
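
                A toy simulation illustrates the point (all names and parameter values here are made up for illustration): a linear probability model with a wide-range continuous regressor produces many fitted values outside [0,1].

                Code:
                clear
                set seed 12345
                set obs 200
                gen x = rnormal(0, 10)               // wide-range continuous regressor
                gen y = runiform() < invlogit(.3*x)  // binary outcome related to x
                regress y x
                predict yhat
                count if yhat < 0 | yhat > 1         // counts out-of-range fitted values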

                Comment
