Main Effects and Interaction Effects in Count Data Model

Behram Wali

Join Date: Mar 2016

Posts: 50
#1

23 Feb 2017, 23:51

Dear all,

I am estimating a count data model that predicts traffic crashes as a function of explanatory variables. While controlling for other explanatory factors, the key focus is on the effect of variation in one covariate (which happens to be an interaction term) on the response. I am seeking expert opinion on whether it is reasonable to exclude the main effects and keep only the interaction effect in the model. I have followed the discussion at http://www.statalist.org/forums/forum/general-stata-discussion/general/1374798-how-does-the-interpretation-change-if-i-drop-the-linear-terms, but I want to clarify my understanding with respect to my own data.

Consider the data description:

Code:

avecrash   // response outcome
meanspeed  // average speed - main effect
sdspeed    // sdspeed - main effect
covspeed   // interaction term: coefficient of variation for the above two variables, i.e. sdspeed/meanspeed

The two models are:

Code:

nbreg avecrash meanspeed sdspeed covspeed  // Model 1
nbreg avecrash covspeed                    // Model 2

Following Professor Phil's comment in the above thread, it is seldom desirable to include an interaction without its main effects, because the second specification (Model 2 above) forces the influence of x2 (sdspeed in my case) to be 0 when x1 (meanspeed) equals 0, while the first specification (Model 1 above) imposes no such restriction.

Here is my question: in my case, the interaction term consists of two variables that are both derived from the same underlying variable, "overall speed", from which the mean and the standard deviation of speed are calculated and then combined into the interaction. So, in my data, there is no case where meanspeed is zero while sdspeed is non-zero. Such a case is also conceptually impossible. In other words, if I understand the concept correctly, when the main effects are excluded we are not forcing the coefficient of meanspeed to be zero when sdspeed equals zero (and vice versa), because no such case exists in my data. Any guidance in this regard will be highly appreciated.

Below are the descriptive statistics for clarity.

-Behram
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#2

24 Feb 2017, 02:13

Stata does not know what your variables mean, nor what is possible or impossible on substantive grounds. So it will impose that constraint, even though that combination of values does not occur in your data.

The basic idea is easier to visualize if we look at whether or not to include the constant. Consider the example below, which looks at the association between hourly wage and hours worked. We may safely assume that if someone works 0 hours she (the example data is for women only) will get a wage of 0. So why not build that assumption into our model by excluding the constant? As you can see in the graph produced by the code below, forcing the regression line through the point (0,0) substantially worsens the fit of the model, even though that point is not present in the data. The same thing will happen when you exclude the main effects from your model: it will have a big influence on your results, even though the point 0 is not present, and cannot occur, in your data.

Code:

// open some example data
sysuse nlsw88, clear

// nobody works 0 hours
sum hours

// with constant
reg wage hours
predict xb1

// without constant
reg wage hours, nocons
predict xb2

// which line fits better?
scatter wage hours, msymbol(oh) mcolor(gs8) || ///
    line xb1 xb2 hours, sort                   ///
    legend(order(2 "with" "constant"           ///
                 3 "without" "constant"))      ///
    ytitle(hourly wage)
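
The same comparison can be sketched for main effects directly. This is only an illustration on the nlsw88 example data, not on the crash data, and the stored-estimate names full and nomain are arbitrary labels. The second model keeps only the hours-by-union interaction, so the coefficients of hours and union are fixed at 0, and the table lets you see how much the interaction coefficient moves as a result.

```stata
// open some example data
sysuse nlsw88, clear

// main effects plus interaction
poisson wage c.hours##i.union, vce(robust)
estimates store full

// interaction only: the coefficients of hours and union are fixed at 0
poisson wage 1.union#c.hours, vce(robust)
estimates store nomain

// compare the coefficients side by side
estimates table full nomain, b(%9.4f) se(%9.4f)
```

Note that 1.union#c.hours is used rather than i.union#c.hours: the latter would estimate a separate hours slope for each group, which is not an interaction-only model.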

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Behram Wali

Join Date: Mar 2016

Posts: 50
#3

24 Feb 2017, 02:47

Dear Maarten,

Thank you for your detailed and helpful response. I understand your point.

However, if I include the "conditional effects" in addition to the interaction effect, I have difficulty interpreting the results intuitively.

For example, the model with only the interaction effect is:

Code:

poisson avecrash covspeed lnadtmaj lnadtmin x4leg totnumleft rangespeed if sigornot == 1

The model with both conditional and interaction effects is:

Code:

poisson avecrash meanspeed sdspeed covspeed lnadtmaj lnadtmin x4leg totnumleft rangespeed if sigornot == 1

The interaction effects are significant in both of the above models. Now, how can I interpret the conditional effect of sdspeed (or the conditional effect of meanspeed) in the second model? For example: when sdspeed equals zero, a 1-unit increase in meanspeed changes the outcome by x units? Although the conditional effects are statistically significant in this case, I find them conceptually difficult to interpret and to relate intuitively to the response.

Also, including the conditional effects does not affect the statistical significance of the interaction effects in this un-pooled model, but it does in other un-pooled models. Should I still keep them?

Thank you for your guidance,
-Behram

Maarten Buis

Join Date: Mar 2014
Posts: 3459
#4

24 Feb 2017, 03:18

It may not influence the significance, but it does strongly affect the coefficient, which is what you ultimately care about. So yes, include the main effects.

You did not include an extract from your data, so I will use the nlsw88 data instead. Wage is not a count variable, but with the vce(robust) option Poisson regression is an attractive model for this type of variable ( http://blog.stata.com/2011/08/22/use...tell-a-friend/ ). Here we have a similar "impossible" zero value: nobody works 0 hours. To interpret the coefficients it is very helpful to include the irr option. In this case non-union members can expect their hourly wage to increase by a factor of 1.004, or (1.004-1)*100% = 0.4%, for every additional hour per week they work. Becoming a union member increases the hourly wage by 89% for someone who works 0 hours. That is not very helpful, and that is the problem you are referring to.

Code:

. // open some example data
. sysuse nlsw88, clear
(NLSW, 1988 extract)

.
. // estimate the model
. poisson wage c.hours##i.union ttl_exp grade , irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable

Iteration 0:   log pseudolikelihood = -4766.5829  
Iteration 1:   log pseudolikelihood = -4766.5828  

Poisson regression                              Number of obs     =      1,875
                                                Wald chi2(5)      =    1052.40
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -4766.5828               Pseudo R2         =     0.1199

-------------------------------------------------------------------------------
              |               Robust
         wage |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        hours |   1.004248   .0017532     2.43   0.015     1.000818     1.00769
              |
        union |
       union  |   1.886415   .2393017     5.00   0.000     1.471153    2.418893
              |
union#c.hours |
       union  |   .9866461   .0030773    -4.31   0.000     .9806331     .992696
              |
      ttl_exp |   1.038123   .0024918    15.59   0.000      1.03325    1.043018
        grade |   1.085762   .0048586    18.39   0.000     1.076281    1.095327
        _cons |   1.254774   .1029197     2.77   0.006     1.068435    1.473613
-------------------------------------------------------------------------------
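
As a minimal sketch, the percentages quoted above can be recovered from the stored coefficients rather than read off the table by hand. It assumes the same nlsw88 model as above; after poisson, _b[] holds the coefficients on the log scale, so each one is exponentiated first. The three results should be roughly 0.4, 89 and -1.3, matching the IRR column.

```stata
// open some example data and refit the model shown above
sysuse nlsw88, clear
poisson wage c.hours##i.union ttl_exp grade, irr vce(robust)

// _b[] holds the log coefficients, so exponentiate to get the IRR
// percent change in wage per extra hour for non-union members
display 100*(exp(_b[hours]) - 1)

// percent change in wage for union members who work 0 hours
display 100*(exp(_b[1.union]) - 1)

// percent change in the union effect per extra hour
display 100*(exp(_b[1.union#c.hours]) - 1)
```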

I typically solve that by centering my variable so that 0 is a meaningful value within the range of the data. For hours worked per week, 40 is a sensible choice, as that is the standard for full-time employment. You do that by creating a new variable that contains hours - 40. The effect of hours remains unchanged; only the constant and the main effect of union change. So becoming a union member increases the wage by 10% if one is employed full-time (40 hours), and this effect of becoming a union member decreases by 1.3% ((0.987 - 1)*100% = -1.3%) for every additional hour one works. Notice that this change in effect is a change in percentages and not percentage points. Also see: http://maartenbuis.nl/publications/interactions.html

Code:

. // center variable
. gen hours_c = hours - 40
(4 missing values generated)

. label var hours_c "usual hours worked, centered at 40"

.
. // reestimate the model
. poisson wage c.hours_c##i.union ttl_exp grade , irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable

Iteration 0:   log pseudolikelihood = -4766.5829  
Iteration 1:   log pseudolikelihood = -4766.5828  

Poisson regression                              Number of obs     =      1,875
                                                Wald chi2(5)      =    1052.40
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -4766.5828               Pseudo R2         =     0.1199

---------------------------------------------------------------------------------
                |               Robust
           wage |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
        hours_c |   1.004248   .0017532     2.43   0.015     1.000818     1.00769
                |
          union |
         union  |   1.101777    .026129     4.09   0.000     1.051738    1.154198
                |
union#c.hours_c |
         union  |   .9866461   .0030773    -4.31   0.000     .9806331     .992696
                |
        ttl_exp |   1.038123   .0024918    15.59   0.000      1.03325    1.043018
          grade |   1.085762   .0048586    18.39   0.000     1.076281    1.095327
          _cons |   1.486633   .0893659     6.60   0.000     1.321404    1.672523
---------------------------------------------------------------------------------
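
Centering is only a reparameterization, so the same union effect can also be obtained from the uncentered model. The sketch below assumes the nlsw88 model from above and uses lincom to evaluate the union effect at a chosen number of hours. The line for 40 hours should reproduce the IRR of about 1.10 reported for union in the centered model; the values 20 and 50 are arbitrary illustrations.

```stata
// open some example data and refit the uncentered model
sysuse nlsw88, clear
poisson wage c.hours##i.union ttl_exp grade, irr vce(robust)

// union effect at 40 hours per week, reported as an IRR
lincom 1.union + 40*1.union#c.hours, irr

// union effect at other points in the range of the data
lincom 1.union + 20*1.union#c.hours, irr
lincom 1.union + 50*1.union#c.hours, irr
```

This can be convenient when you want the effect at several values without creating a new centered variable for each one.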


Virginia Carter Leno

Join Date: Jul 2020

Posts: 2
#5

05 Oct 2020, 12:43

Hi Maarten,

This thread is very informative! I want to run a Poisson model with an interaction term to predict count of psychiatric symptoms, but to complicate matters further, the predictors are count of other (earlier) symptoms, a binary grouping variable (sex), plus an interaction of the two (to ask if earlier psychiatric symptoms have differential predictive effects in males vs. females). Is this possible with Poisson regression? And I assume I shouldn't be specifying robust standard errors if it's true count data (and there aren't any issues with impossible zero values)?

Many thanks,

Virginia Carter Leno