Issues in the interpretation results of interaction term in Poisson regression model using Stata

Emerald Chang

Join Date: Sep 2017
Posts: 50

06 May 2021, 11:52

Dear Statalisters,

I have been encountering an issue with interpreting the output/results generated by Stata. I have done some troubleshooting and googling but still can't figure out why the risk ratio of preterm birth in the intervention group could go up so high (IRR=5361.1) after adding an interaction term (log serum zinc * intervention group) to a Poisson regression model.

Output (1):

Code:

.  poisson  preterm  c.log_aZinc##i.group ,irr

Iteration 0:   log likelihood = -144.87344  
Iteration 1:   log likelihood = -144.86904  
Iteration 2:   log likelihood = -144.86904  

Poisson regression                              Number of obs     =        581
                                                LR chi2(3)        =       9.66
                                                Prob > chi2       =     0.0217
Log likelihood = -144.86904                     Pseudo R2         =     0.0323

-----------------------------------------------------------------------------------
          preterm |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
        log_aZinc |   .9636623   .9396929    -0.04   0.970     .1425264    6.515598
                  |
            group |
    Intervention  |     5361.1   24921.31     1.85   0.065     .5921151    4.85e+07
                  |
group#c.log_aZinc |
    Intervention  |   .0485165   .0747777    -1.96   0.050     .0023656    .9950238
                  |
            _cons |    .100541   .3040938    -0.76   0.448     .0002678    37.74564
-----------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

However, the risk ratio looked somewhat more reasonable, though still not entirely plausible, when I replaced log serum zinc with absolute (raw) serum zinc values.

Output (2):

Code:

. poisson  preterm  c.aZinc##i.group ,irr

Iteration 0:   log likelihood = -144.73568  
Iteration 1:   log likelihood = -144.71196  
Iteration 2:   log likelihood = -144.71191  
Iteration 3:   log likelihood = -144.71191  

Poisson regression                              Number of obs     =        581
                                                LR chi2(3)        =       9.97
                                                Prob > chi2       =     0.0188
Log likelihood = -144.71191                     Pseudo R2         =     0.0333

----------------------------------------------------------------------------------
         preterm |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
           aZinc |   .9961657   .0422645    -0.09   0.928     .9166796    1.082544
                 |
           group |
   Intervention  |   14.77571   24.07052     1.65   0.098     .6065829    359.9206
                 |
   group#c.aZinc |
   Intervention  |   .8558705   .0670031    -1.99   0.047     .7341258     .997805
                 |
           _cons |   .0977647   .0952735    -2.39   0.017     .0144767    .6602271
----------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

The rationale for including an interaction term in the Poisson regression model is to test our hypothesis that zinc levels at preconception may be a critical factor in determining how well our zinc supplements can reduce the risk of preterm birth at a later stage of pregnancy.

In this case, may I check how to interpret the result of the intervention group on its own?

- Does it mean there is a 5000-fold increase in the risk of preterm birth in the intervention group relative to the control group when log serum zinc is zero?

- Does this massive relative risk of preterm birth have anything to do with the small number of participants in the intervention group who ended up having the outcome when serum zinc is equal to zero?

- Is it advisable to normalise the data if the continuous independent variable of interest is not normally distributed for the Poisson regression analysis in Stata?

Many thanks for taking the time to read this post; any insight would be greatly appreciated.

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

06 May 2021, 12:19

Looking at your first model, you can see that as high as that IRR for 1.group is, the standard error is also enormous. This says that your data contain very little information about the difference in outcomes between groups at zero levels of log-zinc. That could certainly arise if one of the two groups is small, which you say is true of your data. Indeed, that is the likely explanation. Notice, by the way, that the standard errors for the zinc variables are, themselves, perfectly reasonable: the zinc variable is not problematic.

The use of a log transform compresses the range of the zinc variable, so the zinc coefficient gets magnified as a result, and the transformation to an IRR magnifies things even further. That is why the results in your non-log-transformed model don't look as unreasonable. But they are still unreasonable. There are really few things in the real world that are associated with an IRR of almost 15. This, too, is likely due to the occurrence of some typically uncommon events in a small group.
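
To make the scale issue concrete, the intervention effect implied by each table can be written as a function of serum zinc. This is only a worked reading of the point estimates in Outputs (1) and (2); the symbol x for serum zinc on its original scale is introduced here for illustration, a natural log is assumed, and the rounded values are approximate.

```latex
% x = serum zinc on its original scale (illustrative symbol, natural log assumed)
\begin{align*}
\mathrm{IRR}_{\log}(x) &= 5361.1 \times 0.0485165^{\ln x},
  & \mathrm{IRR}_{\log}(x) = 1 &\iff \ln x = \frac{\ln 5361.1}{-\ln 0.0485165}
     \approx \frac{8.587}{3.026} \approx 2.84 \quad (x \approx 17.1) \\
\mathrm{IRR}_{\mathrm{raw}}(x) &= 14.77571 \times 0.8558705^{x},
  & \mathrm{IRR}_{\mathrm{raw}}(x) = 1 &\iff x = \frac{\ln 14.77571}{-\ln 0.8558705}
     \approx \frac{2.693}{0.1556} \approx 17.3
\end{align*}
```

On this reading, both models imply an intervention IRR near 1 at roughly the same zinc level; the very large values in the tables are extrapolations to log zinc = 0 and zinc = 0 respectively.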

Normalizing the zinc variable will make your results incomprehensible: nobody will grasp what the values of such a variable mean in real life. And it will not solve your problem either. The problem is not the distribution of the zinc variable. It's the small sample size in one of the groups.
Emerald Chang

Join Date: Sep 2017

Posts: 50
#3

06 May 2021, 21:41

Thanks, Clyde ! Your input is much appreciated
Paul Dickman

Join Date: Apr 2014

Posts: 294
#4

07 May 2021, 02:33

I'm putting myself at risk of being thrown out of the statisticians union for making this blasphemous suggestion, but you could consider categorising serum zinc concentration. You will then get, from your model, an incidence rate ratio for the effect of the intervention (hereafter called IRR) for each category of serum zinc concentration.

I very much agree that an interaction is required, because one might expect the effect of the intervention (zinc supplements) to depend on serum zinc concentration. The next question is how to appropriately model the form of this interaction. You have examined two approaches, one where the IRR changes as a linear function of serum zinc and one where it changes as a log-linear function of serum zinc. Based on your models, there will be a different estimate of the IRR for each value of serum zinc. Your table of parameter estimates gives you an estimate for just one value of serum zinc: zero in the first approach, and one (i.e., when log serum zinc is zero) in the second. Questions of interest are: what is the functional form of the interaction, and what are the estimated IRRs for other values of serum zinc?

I am not a huge fan of categorising, but it will give some insight into the functional form (you are not constrained to linear and log-linear) and you can easily get estimates of the IRR for levels of serum zinc.
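
A minimal sketch of the categorising idea, under stated assumptions: group is coded 0/1 with 1 = Intervention, and the new variable zinc_q and the choice of quartiles are hypothetical, not taken from the original analysis.

```stata
* Hypothetical: split raw serum zinc into quartiles (zinc_q is an illustrative name)
xtile zinc_q = aZinc, nquantiles(4)

* Interaction of group with the categorised zinc variable
poisson preterm i.group##i.zinc_q, irr vce(robust)

* IRR for intervention vs control within each zinc quartile
lincom 1.group, irr
lincom 1.group + 1.group#2.zinc_q, irr
lincom 1.group + 1.group#3.zinc_q, irr
lincom 1.group + 1.group#4.zinc_q, irr
```

With few preterm births, some group-by-quartile cells may contain very few or no events, in which case fewer categories may be needed.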

Another thing you can do is center serum zinc (subtract the mean from all values). The IRR that was 14.775 (i.e., the IRR at serum zinc = 0) will now be the estimated IRR at the mean value of serum zinc.
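
Centering could be sketched as follows; c_aZinc is a hypothetical variable name, and the robust variance option is an addition here rather than part of the original models.

```stata
* Center raw serum zinc at its sample mean (c_aZinc is an illustrative name)
summarize aZinc, meanonly
generate double c_aZinc = aZinc - r(mean)

* Same interaction model; the group IRR now refers to mean serum zinc
poisson preterm c.c_aZinc##i.group, irr vce(robust)
```

Centering only shifts the reference point: the zinc and interaction IRRs and the log likelihood should be unchanged, while the group IRR and _cons move to the mean of serum zinc.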

A better solution might be to use a fractional polynomial or spline to model serum zinc and then obtain predicted IRRs for selected values of serum zinc.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#5

07 May 2021, 06:01

I agree with Paul. What are the units of measurement of serum zinc? I suspect zero is not an interesting value, and maybe not even close. Is one an interesting value? (So that log(zinc) = 0 is interesting?) Can you show a distribution of your zinc variable? I would center zinc or log(zinc) about its mean value to force the estimate on the intervention variable to be meaningful. That you might be estimating a meaningless parameter also explains why the standard error is so large.

One problem with focusing on irr is that the margins command in Stata does not compute the marginal effect on irr. If you drop the irr option and compute the marginal effect at different values of the zinc variable then you'll see what's happening.
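
A minimal sketch of this suggestion, assuming group is coded 0/1 with 1 = Intervention. The zinc values 15 to 30 in steps of 5 are placeholders, not values taken from the data, and the lincom line is an additional way to read off an IRR at one chosen zinc value.

```stata
* Fit on the log-count scale (no irr option), with robust standard errors
poisson preterm c.aZinc##i.group, vce(robust)

* Difference in predicted number of events (intervention minus control)
* at placeholder zinc values 15, 20, 25, 30
margins, dydx(group) at(aZinc = (15(5)30))

* IRR for intervention vs control at a placeholder zinc value of 20
lincom 1.group + 20*1.group#c.aZinc, irr
```

Repeating the lincom line for several zinc values shows how the intervention IRR moves across the observed range instead of at zinc = 0.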
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#6

07 May 2021, 06:06

I would use the following to get the average semi-elasticity if you don't center:

Code:

poisson preterm i.group c.log_aZinc i.group#c.log_aZinc, vce(robust)
* average semi-elasticity with respect to group
margins, eydx(group)

By the way, I doubt you can trust the nonrobust standard errors for Poisson. They could even be too large if there is underdispersion. I strongly recommend using the robust option.
Emerald Chang

Join Date: Sep 2017

Posts: 50
#7

07 May 2021, 11:23

Originally posted by Jeff Wooldridge

I agree with Paul. What are the units of measurement of serum zinc? I suspect zero is not an interesting value, and maybe not even close. Is one an interesting value? (So that log(zinc) = 0 is interesting?) Can you show a distribution of your zinc variable? I would center zinc or log(zinc) about its mean value to force the estimate on the intervention variable to be meaningful. That you might be estimating a meaningless parameter also explains why the standard error is so large.

One problem with focusing on irr is that the margins command in Stata does not compute the marginal effect on irr. If you drop the irr option and compute the marginal effect at different values of the zinc variable then you'll see what's happening.

Hi Jeff,

The unit of serum zinc is umol/L. I have discarded the idea of including log(zinc) as the independent variable in the Poisson model.

My latest approach is to standardise serum zinc, as I realised that it doesn't make sense to consider a serum zinc of 0: there must be some zinc present in an individual's blood; it is just a question of whether it is above or below the threshold.

Since no one in the dataset had a serum zinc value of 0, my guess is that Stata's IRR was effectively driven by the lowest log zinc values at which preterm and term births could be observed in both the control and intervention groups. Among the 13 participants (4 in control and 9 in intervention) whose log-transformed serum zinc levels were below the 1st percentile of the entire study cohort, only one participant, from the intervention group, had a preterm birth despite being given zinc supplements. The massive increase in the risk of preterm birth in the intervention group shown in Output (1) above may be the result of this case.

After substituting the serum zinc z-score as the independent variable in the model, the results of the Poisson model with an interaction term seem more sensible.

Code:

. poisson preterm c.z_aZinc##i.group ,irr

Iteration 0:   log likelihood = -144.73568
Iteration 1:   log likelihood = -144.71196
Iteration 2:   log likelihood = -144.71191
Iteration 3:   log likelihood = -144.71191

Poisson regression                              Number of obs     =        581
                                                LR chi2(3)        =       9.97
                                                Prob > chi2       =     0.0188
Log likelihood = -144.71191                     Pseudo R2         =     0.0333

------------------------------------------------------------------------------------
           preterm |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
           z_aZinc |   .9812178   .2054698    -0.09   0.928     .6509106    1.479141
                   |
             group |
     Intervention  |   .4378535   .1686583    -2.14   0.032     .2058029    .9315499
                   |
   group#c.z_aZinc |
     Intervention  |   .4638677   .1792338    -1.99   0.047     .2175197    .9892129
                   |
             _cons |   .0896314   .0175832   -12.30   0.000     .0610209    .1316564
------------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

Serum zinc was natural-log transformed to normalise the data for analysis, as suggested by a colleague. However, as far as I know, the assumption of normality seems more critical to linear or logistic regression than to Poisson regression. In this case, may I seek your advice on whether it is okay to enter either absolute serum zinc or the serum zinc z-score as it is in my Poisson model?

Any input is appreciated. Thank you
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#8

09 May 2021, 05:57

Emerald: I suspected the distribution looked something like that -- where zero isn't close to being an interesting value. That would also be true when you take the log (although it gets closer to zero). Remember, the distribution of your explanatory variables is irrelevant for either linear regression or Poisson or almost anything else. You want to choose transformations that are easy to interpret and provide a good fit. I wouldn't totally abandon using log(zinc) as long as you center it first before constructing the interaction. With log(zinc), you get an elasticity. For me, that makes it easy to interpret its magnitude -- but my training is in economics.
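
As a worked illustration of the elasticity reading, the slopes on log(zinc) can be taken from the point estimates in Output (1). The symbols b_z and b_gz are introduced here for illustration, the values are rounded, and they come from the non-robust fit shown earlier in the thread.

```latex
% b_z, b_{gz}: log-scale coefficients on log_aZinc and on the
% group-by-log_aZinc interaction (illustrative symbols)
\begin{align*}
\left.\frac{\partial \ln E[\text{preterm}]}{\partial \ln(\text{zinc})}\right|_{\text{control}}
  &= b_z = \ln(0.9636623) \approx -0.04 \\
\left.\frac{\partial \ln E[\text{preterm}]}{\partial \ln(\text{zinc})}\right|_{\text{intervention}}
  &= b_z + b_{gz} = \ln(0.9636623) + \ln(0.0485165) \approx -3.06
\end{align*}
```

Read this way, a 1% higher serum zinc goes with roughly a 3% lower expected preterm rate in the intervention group and essentially no change in the control group. Centering log(zinc) leaves these slopes unchanged and only moves the point at which the group effect is evaluated.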

I again strongly recommend using the vce(robust) option to obtain standard errors and confidence intervals.

Last edited by Jeff Wooldridge; 09 May 2021, 06:00.
Emerald Chang

Join Date: Sep 2017

Posts: 50
#9

11 May 2021, 20:07

Many thanks to Clyde, Paul and Jeff for your valuable inputs and suggestions