
  • Comparing poisson models fit with the vce(robust) option

    Hi!
    I am analysing the continuous dependent variable "inattention". This is a sum score ranging from 9 to 36. The variable is very skewed, but follows a normal distribution when log transformed. I have read that several people recommend using Poisson with the vce(robust) option in these cases, instead of log transforming the dependent variable and running a linear regression. However, several of my models contain interaction terms, and I want to assess the significance of these interactions. Normally I would use LR tests, but how do you compare models fitted with the vce(robust) option (besides AIC and BIC)?


    Best,
    Kjell Weyde

  • #2
    Dear Kjell,

    In reply to your question: you can use Wald tests (say, F and t tests) based on the robust covariance matrix. Do not use LR tests or AIC/BIC at all, because these are likelihood based.
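
    A minimal sketch of that approach (variable names here are hypothetical, not from the original post): with factor-variable notation, testparm gives a joint Wald test of the interaction terms using the robust VCE.

    Code:
    * hypothetical example: outcome y, interacted predictors x and z
    poisson y c.x##c.z, vce(robust)
    * joint Wald test of the interaction terms, based on the robust VCE
    testparm c.x#c.z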

    As an aside, I am not sure the models you are considering are appropriate for these data. Are the limits you give the min and max in the sample, or is it that by construction the variable is bounded between 9 and 36?

    Best wishes,

    Joao



    • #3
      Joao, thank you for your comments! The dependent variable is a sum score with minimum value 9 and maximum 36 (by construction): 9 questions with response alternatives 1-4 are summed. Were there any other models you thought could be better alternatives than regress logdepvar or poisson depvar, vce(robust)?

      Kjell



      • #4
        For a bounded response I would want to rescale and use fractional logit as the model of first choice. The predictions might turn out quantitatively similar, but I'd feel more comfortable with fractional logit: the variance-mean relationship for a bounded response is quite different.



        • #5
          Thanks!
          By rescale to fractional logit, do you mean dividing the score on the dependent variable by its maximum score, such as:
          Code:
           gen frac_depvar = depvar/36
          Are the assumptions of fractional logit similar to those for logistic regression?



          • #6
            By the way, the results of the different models are:

            The fractional logit gave the following result:
            Code:
            ------------------------------------------------------------------------------------------------
                                 frac_oppm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------------------------+----------------------------------------------------------------
                                  stoy10_8 |   .0278099   .0153279     1.81   0.070    -.0022323    .0578521
            Poisson With vce(robust) gave:
            Code:
            ------------------------------------------------------------------------------------------------
                                           |               Robust
                oppmerksomhetssvikt_8år_2  |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------------------------+----------------------------------------------------------------
                                  stoy10_8 |   .0169022   .0093964     1.80   0.072    -.0015143    .0353187
            And regress with log of depvar gave:
            Code:
            ------------------------------------------------------------------------------------------------
                                   logoppm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------------------------+----------------------------------------------------------------
                                  stoy10_8 |   .0066019   .0037049     1.78   0.075    -.0006659    .0138697



            • #7
              Originally posted by Kjell Weyde View Post
              By rescale to fractional logit, do you mean dividing the score on the dependent variable by its maximum score, such as:
              Code:
               gen frac_depvar = depvar/36
               No, because that does not account for the fact that the lower bound is larger than 0. Instead:

              Code:
               gen frac_depvar = (depvar - 9)/27
              Originally posted by Kjell Weyde View Post
              Are the assumptions of fractional logit similar to those for logistic regression?
              They are related but different models, so some assumptions apply and others don't.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------



              • #8
                Not quite what I would recommend. If marks are necessarily within [9, 36], then it would seem that (mark - 9) / 27 is the appropriate scaling. (NOTE: Glad to see that my friend Maarten concurs.)

                Comparing coefficients directly is not in itself informative, given the different link functions. But you can generate predictions, even from models with multiple predictors, and cross-correlate those.
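
                One way to carry that out, sketched with frac_depvar and depvar as defined earlier in the thread (the predictor x is hypothetical): fit each model, predict, and compare the fitted values. Note p_flogit is on the 0-1 scale and p_pois on the raw-score scale, so the point is how closely they agree in pattern, not equality.

                Code:
                * fit competing models and compare their fitted values
                fracreg logit frac_depvar x
                predict p_flogit

                poisson depvar x, vce(robust)
                predict p_pois

                correlate p_flogit p_pois
                scatter p_flogit p_pois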

                The consistency of the t and P values may show what often happens: models forgive a bad error specification. I think some groups obsess a little too much about error-structure assumptions, when getting the functional form right is usually the bigger deal.

                Here are some classic data from G. U. Yule (1897), JRSS 60: 812-844, on the percentage of the population receiving relief and agricultural earnings.

                Yule's own analysis was exemplary for his time and included careful discussion of a scatter plot.

                But now many practitioners would want to insist that logit would be a better model, for which we scale to proportions. (I have no argument against probit for those so inclined.)

                Code:
                clear
                input float(pauperism earnings)
                 2.4 20.75
                2.29 20.25
                1.39 19.67
                1.92  18.5
                2.98 17.67
                1.17  17.5
                3.79 17.08
                3.01    17
                2.39    17
                2.78 16.93
                3.09  16.5
                2.78 16.33
                2.61 16.25
                4.33 16.25
                3.02    16
                 4.2    16
                1.29 15.75
                5.16 15.67
                4.75  15.5
                4.64  15.5
                4.26 15.33
                1.66 15.25
                5.37    15
                3.38    15
                5.84    15
                4.63    15
                3.93    15
                4.54    15
                3.42 14.83
                5.88 14.75
                4.36 14.75
                3.85 14.75
                3.92 14.58
                4.48  14.5
                5.67  14.5
                4.91 14.33
                4.34  13.5
                5.19  12.5
                end
                
                label var earnings "Earnings of agricultural labourers, shillings/week"
                label var pauperism  "Percent of population receiving relief"
                
                gen pauperism2 = pauperism/100
                label variable pauperism2 "Fraction of population receiving relief"
                
                reg pauperism2 earnings
                predict p_regress
                
                fracreg logit pauperism2 earnings
                predict p_logit
                
                poisson pauperism2 earnings
                predict p_poisson
                
                scatter pauperism2 earnings, ms(Oh) ytitle("`: var label pauperism2'")  ///
                || line p_regress p_logit p_poisson earnings, sort lp(solid solid dash) ///
                legend(order(2 "Regression" 3 "Logit" 4 "Poisson") pos(1) col(1) ring(0)) ///
                yla(0 "0" 0.01(0.01)0.06, ang(h) format(%03.2f))
                [Image: yule.png]

                Manifestly the logit captures the modest curvature and doesn't have the worrying (to modern eyes) implication of the linear model that predictions would go negative for higher values of the predictor. But the model predictions are close.

                In fact the Poisson -- for these data -- gives almost identical predictions to the logit, which is a side-effect of having proportions only in a narrow range. Otherwise put, for small p, log [p / (1 - p)] = log p - log (1 - p) is very close to log p, so that predictions from the logit and the logarithmic link are inevitably close, even though the data are at first sight a long, long way away from a Poisson distribution.
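
                That closeness is easy to check numerically. For p = 0.03, a value typical of these data, the logit and the plain logarithm differ by exactly -ln(1 - p):

                Code:
                * logit(p) = ln(p/(1-p)) differs from ln(p) by exactly -ln(1-p)
                display logit(0.03)      // about -3.476
                display ln(0.03)         // about -3.507
                display -ln(1 - 0.03)    // about 0.030, the whole difference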

                I didn't specify vce(robust) for the Poisson but in the example I care only about fitted values.

                Still, none of that undermines a principle that a model that respects the range of the response is, with nothing else said, preferable to one that doesn't.

                Historical note: 20 shillings = 1 GBP. No inflation adjustments were made.
                Last edited by Nick Cox; 13 Dec 2016, 03:51.



                • #9
                  Dear Kjell,

                  Thanks for clarifying. I obviously concur with the advice provided by Maarten and Nick. About the assumptions, I would say that the assumptions needed for the fractional logit to be valid are similar to those needed for Poisson regression: basically you need a functional form assumption; see the original paper by Wooldridge and Papke (1996).

                  Best wishes,

                  Joao



                  • #10
                    As a side note: the Papke and Wooldridge (1996) article is great and played a major role in introducing this model in economics, but I would not call it the original. For example, using quasi-likelihood to model proportions was already discussed by Wedderburn (1974).

                    A nice discussion of this use of quasi-likelihood is in McCullagh and Nelder (1989) and in Hardin and Hilbe (2007).

                    Hardin, J. W., & Hilbe, J. M. (2007). Generalized Linear Models and Extensions. College Station, TX: Stata Press.
                    McCullagh, P., & Nelder, J. (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC.
                    Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61(3): 439-447.


                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------



                    • #11
                      Thank you very much, everyone! I will use the fracreg logit. The output is:
                      Code:
                      Fractional logistic regression                  Number of obs     =      1,396
                                                                      Wald chi2(24)     =     343.63
                                                                      Prob > chi2       =     0.0000
                      Log pseudolikelihood = -647.06999               Pseudo R2         =     0.0320
                      
                      ------------------------------------------------------------------------------------------------
                                                     |               Robust
                                           frac_oppm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------------------------+----------------------------------------------------------------
                                            stoy10_8 |   .0588769   .0333114     1.77   0.077    -.0064123    .1241661
                      This is interpreted as: for a one-unit increase in stoy10_8, the fraction increases by 0.059, or 5.9%?

                      Are there any good ways to assess the fit of the fracreg model?



                      • #12
                        Indeed. This is a Stata-readable version of the blotch data used by Wedderburn.

                        Code:
                         
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input float(id percent) str3(site gen)
                        1 .05 "S1" "G01"
                        2 0 "S1" "G02"
                        3 0 "S1" "G03"
                        4 .1 "S1" "G04"
                        5 .25 "S1" "G05"
                        6 .05 "S1" "G06"
                        7 .5 "S1" "G07"
                        8 1.3 "S1" "G08"
                        9 1.5 "S1" "G09"
                        10 1.5 "S1" "G10"
                        11 0 "S2" "G01"
                        12 .05 "S2" "G02"
                        13 .05 "S2" "G03"
                        14 .3 "S2" "G04"
                        15 .75 "S2" "G05"
                        16 .3 "S2" "G06"
                        17 3 "S2" "G07"
                        18 7.5 "S2" "G08"
                        19 1 "S2" "G09"
                        20 12.7 "S2" "G10"
                        21 1.25 "S3" "G01"
                        22 1.25 "S3" "G02"
                        23 2.5 "S3" "G03"
                        24 16.6 "S3" "G04"
                        25 2.5 "S3" "G05"
                        26 2.5 "S3" "G06"
                        27 0 "S3" "G07"
                        28 20 "S3" "G08"
                        29 37.5 "S3" "G09"
                        30 26.25 "S3" "G10"
                        31 2.5 "S4" "G01"
                        32 .5 "S4" "G02"
                        33 .01 "S4" "G03"
                        34 3 "S4" "G04"
                        35 2.5 "S4" "G05"
                        36 .01 "S4" "G06"
                        37 25 "S4" "G07"
                        38 55 "S4" "G08"
                        39 5 "S4" "G09"
                        40 40 "S4" "G10"
                        41 5.5 "S5" "G01"
                        42 1 "S5" "G02"
                        43 6 "S5" "G03"
                        44 1.1 "S5" "G04"
                        45 2.5 "S5" "G05"
                        46 8 "S5" "G06"
                        47 16.5 "S5" "G07"
                        48 29.5 "S5" "G08"
                        49 20 "S5" "G09"
                        50 43.5 "S5" "G10"
                        51 1 "S6" "G01"
                        52 5 "S6" "G02"
                        53 5 "S6" "G03"
                        54 5 "S6" "G04"
                        55 5 "S6" "G05"
                        56 5 "S6" "G06"
                        57 10 "S6" "G07"
                        58 5 "S6" "G08"
                        59 50 "S6" "G09"
                        60 75 "S6" "G10"
                        61 5 "S7" "G01"
                        62 .1 "S7" "G02"
                        63 5 "S7" "G03"
                        64 5 "S7" "G04"
                        65 50 "S7" "G05"
                        66 10 "S7" "G06"
                        67 50 "S7" "G07"
                        68 25 "S7" "G08"
                        69 50 "S7" "G09"
                        70 75 "S7" "G10"
                        71 5 "S8" "G01"
                        72 10 "S8" "G02"
                        73 5 "S8" "G03"
                        74 5 "S8" "G04"
                        75 25 "S8" "G05"
                        76 75 "S8" "G06"
                        77 50 "S8" "G07"
                        78 75 "S8" "G08"
                        79 75 "S8" "G09"
                        80 75 "S8" "G10"
                        81 17.5 "S9" "G01"
                        82 25 "S9" "G02"
                        83 42.5 "S9" "G03"
                        84 50 "S9" "G04"
                        85 37.5 "S9" "G05"
                        86 95 "S9" "G06"
                        87 62.5 "S9" "G07"
                        88 95 "S9" "G08"
                        89 95 "S9" "G09"
                        90 95 "S9" "G10"
                        end



                        • #13
                          Kjell #11: No, that is definitely not the interpretation, as fracreg does not implement the linear probability model!

                          The logit link changes the scale: the coefficient is not the slope on the probability scale, but the slope of the logit of the probability with respect to the predictor (conditional on the rest of the model).

                          fracreg is great, but I would still fire up the glm equivalent whenever I want to look at residuals, as I do for any serious project of my own.

                          Here is a reproducible example with Yule's data posted earlier.

                          Code:
                          . clear
                          
                          . input float(pauperism earnings)
                          
                               pauperism   earnings
                            1.  2.4 20.75
                            2. 2.29 20.25
                            3. 1.39 19.67
                            4. 1.92  18.5
                            5. 2.98 17.67
                            6. 1.17  17.5
                            7. 3.79 17.08
                            8. 3.01    17
                            9. 2.39    17
                           10. 2.78 16.93
                           11. 3.09  16.5
                           12. 2.78 16.33
                           13. 2.61 16.25
                           14. 4.33 16.25
                           15. 3.02    16
                           16.  4.2    16
                           17. 1.29 15.75
                           18. 5.16 15.67
                           19. 4.75  15.5
                           20. 4.64  15.5
                           21. 4.26 15.33
                           22. 1.66 15.25
                           23. 5.37    15
                           24. 3.38    15
                           25. 5.84    15
                           26. 4.63    15
                           27. 3.93    15
                           28. 4.54    15
                           29. 3.42 14.83
                           30. 5.88 14.75
                           31. 4.36 14.75
                           32. 3.85 14.75
                           33. 3.92 14.58
                           34. 4.48  14.5
                           35. 5.67  14.5
                           36. 4.91 14.33
                           37. 4.34  13.5
                           38. 5.19  12.5
                           39. end
                          
                          .
                          . label var earnings "Earnings of agricultural labourers, shillings/week"
                          
                          . label var pauperism  "Percent of population receiving relief"
                          
                          . gen pauperism2 = pauperism/100
                          
                          . label variable pauperism2 "Fraction of population receiving relief"
                          
                          . reg pauperism2 earnings
                          
                                Source |       SS           df       MS      Number of obs   =        38
                          -------------+----------------------------------   F(1, 36)        =     28.21
                                 Model |  .002770242         1  .002770242   Prob > F        =    0.0000
                              Residual |  .003535251        36  .000098201   R-squared       =    0.4393
                          -------------+----------------------------------   Adj R-squared   =    0.4238
                                 Total |  .006305493        37  .000170419   Root MSE        =    .00991
                          
                          ------------------------------------------------------------------------------
                            pauperism2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                              earnings |   -.004989   .0009393    -5.31   0.000     -.006894    -.003084
                                 _cons |   .1162599   .0150575     7.72   0.000     .0857218     .146798
                          ------------------------------------------------------------------------------
                          
                          . predict p_regress
                          (option xb assumed; fitted values)
                          
                          . fracreg logit pauperism2 earnings
                          
                          Iteration 0:   log pseudolikelihood = -26.990382  
                          Iteration 1:   log pseudolikelihood = -6.0029195  
                          Iteration 2:   log pseudolikelihood = -5.9429204  
                          Iteration 3:   log pseudolikelihood = -5.9405238  
                          Iteration 4:   log pseudolikelihood = -5.9405216  
                          Iteration 5:   log pseudolikelihood = -5.9405216  
                          
                          Fractional logistic regression                  Number of obs     =         38
                                                                          Wald chi2(1)      =      40.82
                                                                          Prob > chi2       =     0.0000
                          Log pseudolikelihood = -5.9405216               Pseudo R2         =     0.0071
                          
                          ------------------------------------------------------------------------------
                                       |               Robust
                            pauperism2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                              earnings |  -.1599957   .0250415    -6.39   0.000    -.2090762   -.1109152
                                 _cons |  -.7483478   .3891495    -1.92   0.054    -1.511067    .0143712
                          ------------------------------------------------------------------------------
                          
                          . predict p_logit
                          (option cm assumed)
                          
                          . glm pauperism2 earnings, link(logit) vce(robust) f(binomial)
                          note: pauperism2 has noninteger values
                          
                          Iteration 0:   log pseudolikelihood = -5.7617687  
                          Iteration 1:   log pseudolikelihood = -4.6258457  
                          Iteration 2:   log pseudolikelihood = -4.6094723  
                          Iteration 3:   log pseudolikelihood =  -4.609455  
                          Iteration 4:   log pseudolikelihood =  -4.609455  
                          
                          Generalized linear models                         No. of obs      =         38
                          Optimization     : ML                             Residual df     =         36
                                                                            Scale parameter =          1
                          Deviance         =  .1072531205                   (1/df) Deviance =   .0029793
                          Pearson          =  .0985563836                   (1/df) Pearson  =   .0027377
                          
                          Variance function: V(u) = u*(1-u/1)               [Binomial]
                          Link function    : g(u) = ln(u/(1-u))             [Logit]
                          
                                                                            AIC             =   .3478661
                          Log pseudolikelihood = -4.609455006               BIC             =  -130.8458
                          
                          ------------------------------------------------------------------------------
                                       |               Robust
                            pauperism2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                              earnings |  -.1599957   .0250415    -6.39   0.000    -.2090762   -.1109152
                                 _cons |  -.7483478   .3891495    -1.92   0.054    -1.511067    .0143712
                          ------------------------------------------------------------------------------
                          
                          *  next two commands require installation of -modeldiag- from Stata Journal website
                          
                          . rvfplot2
                          
                          . regplot
                          
                          * these graphs just require official commands
                          
                          . twoway function _b[_cons] + _b[earnings] * x , ra(earnings)
                          
                          . twoway function invlogit(_b[_cons] + _b[earnings] * x) , ra(earnings)



                          • #14
                            Perhaps the best way to interpret and explain the coefficient, then, is to use margins?

                            The plots you suggested are shown below. I am not very familiar with these kinds of plots; how do they look to you?

                            rvfplot2:

                            [Image: rvfplot2.png]

                            regplot:

                            [Image: regplot.png]

                            twoway function _b[_cons] + _b[stoy10_8] * x , ra(stoy10_8):

                            [Image: Graph.png]

                            twoway function invlogit(_b[_cons] + _b[stoy10_8] * x) , ra(stoy10_8):

                            [Image: Graph2.png]



                            • #15
                              Sure, you can look at margins too.
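
                              A sketch of that (your model specification abbreviated to the one predictor shown in your output; frac_oppm and stoy10_8 are taken from your earlier posts):

                              Code:
                              fracreg logit frac_oppm stoy10_8
                              * average marginal effect of stoy10_8 on the expected fraction
                              margins, dydx(stoy10_8)
                              * predicted fractions at chosen values of the predictor
                              margins, at(stoy10_8 = (0 1 2))

                              Since the fraction is (score - 9)/27, multiplying a marginal effect by 27 puts it back on the original 9-36 scale.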

                              The residual-versus-fitted plot is a kind of health check: no (obvious) pattern is in itself good news. The diagonal bands, residual = constant - predicted, reflect granularity in the observed response; those are the lines for the original response 9 (1) 36.

                              Sorry, I should have reminded you to read the help for regplot: it is not applicable when there are other predictors in the model.

                              You never show us complete model commands, so we can hardly know precisely what model you are fitting.

