  • Interpreting continuous variable interactions and main effects in Poisson / Negative Binomial regression

    Dear all,

    I am currently trying to understand how to run a Poisson / negative binomial regression model with interactions between continuous variables for my thesis.
    I have a count variable as DV, which I call "trans" (my Y), and several continuous IVs: X1 = ln_rpat_rRD (a measure of rival innovation, as patents per unit of spending), X2 = rel_size (a measure of relative firm size), X3 = rel_ROA (a measure of relative firm performance), X4 = sthom (a measure of strategic homogeneity among rival firms), X5 = dummy variable ... to name just the most relevant ones. I have hypothesized that X2, X3, and X4 moderate the relationship between X1 and Y (with X2 and X3 having a negative moderating effect and X4 a positive one). Furthermore, I have hypothesized the relationship between X1 and Y to be positive and significant.
    I have understood that it makes sense to mean-center all my moderating IVs first, so that the value 0 of each variable has a substantive meaning (its mean). I have done this.

    I am still facing two questions that I unfortunately could not find an answer to, either on the internet or in this forum. In case I have missed the relevant post, I apologize for duplicating the question and would greatly appreciate it if you could point me towards it.

    I'm afraid I do not yet fully understand the process for conducting my regression analysis with interaction terms, as I have read that the "main effect" (i.e. the significance and the coefficient of my X1 variable ln_rpat_rRD) can no longer be easily interpreted once interaction terms are included in the model.
    Thus, my question is whether I need to pursue a step-wise process, or whether the entire analysis can be done in a single model.
    By this, I mean the following:

    1) Step-wise process: I first need to identify a regression model in which the relationship between X1 and Y is significant and positive, with no interaction term included and all other relevant IVs entered as control variables only (I would assume I need this, as this positive and significant relationship between X1 and Y forms the basis of my entire analysis). Second, once I have identified this relationship, I add one interaction term at a time to the model and analyze its significance and coefficient. In this way, I would test the three hypothesized interaction effects separately and never in combination in one model.

    2) One full model: I include all three interaction terms in one model directly, without going through this step-wise process. If this is the recommended approach, I would really need help in understanding
    a) Whether to rerun my regressions, excluding non-significant interaction terms, until only significant interaction terms remain in the model. In different contexts, I have read differing opinions: one stating that it might make sense to exclude them, and one stating that they should be kept in because they make sense theoretically.
    b) How to interpret the main effect of my X1-Y relationship. As I mentioned before, I need this relationship to be significant for the interpretation of the interaction terms to make sense at all. Yet I have no experience with plotting and interpreting graphs in Stata, as some might suggest, so if that is the way to go, I would greatly appreciate a more extensive answer.

    I would be very happy if someone could help me.

  • #2
    Welcome to the Stata Forum / Statalist,

    To start, I'd recommend taking a look at the -margins- and -marginsplot- commands for postestimation after Poisson models.
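
    As a rough sketch of what that could look like (variable names are taken from #1; the covariate list and the grid of values in at() are illustrative assumptions, not a recommended specification):

    ```stata
    * Sketch only: names follow #1; covariates and at() values are assumptions
    poisson trans c.ln_rpat_rRD##c.rel_size rel_ROA sthom

    * Effect of X1 on the linear predictor (the log of the expected count),
    * evaluated at several values of the mean-centered moderator
    margins, dydx(ln_rpat_rRD) at(rel_size=(-1(0.5)1)) predict(xb)

    * Plot those conditional effects with their confidence intervals
    marginsplot
    ```

    Dropping predict(xb) would report the effects on the predicted number of events instead, which is the default prediction after -poisson-.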

    Interaction terms should not be used in isolation: the model should contain the main effects and, if appropriate, interaction terms may be added to them.

    Usually, non-significant interaction terms will not "improve" the model. That said, sometimes they are non-significant due to lack of power.

    A more helpful answer, as remarked in the FAQ, can be given when the query presents data (full, abridged or mock) under CODE delimiters, for example by installing -dataex- from SSC.
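
    For instance, a minimal sketch (the variable list follows the names in #1 and is only an assumption about what the dataset contains):

    ```stata
    * Install the user-written command from SSC (needed once)
    ssc install dataex

    * Print a copy-and-paste data example for the first 10 observations
    dataex trans ln_rpat_rRD rel_size rel_ROA sthom in 1/10
    ```

    The resulting listing can then be pasted between CODE delimiters in a post.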
    Last edited by Marcos Almeida; 30 Jun 2017, 03:32.
    Best regards,

    Marcos



    • #3
      Dear Marcos,

      Thank you for your reply.
      Below is the -dataex- example that you requested.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte trans float(mc_L4_ln_rRD_s mc_L4_relsize_s_ln mc_L4_relROS_s mc_L4_sthom mc_L4_debt_rat L4_noSGA L4_orgslack)
      0  -.1567566   .11731642  -5.326488    .161819  -.12070493 0  .9365643
      1 .028904006  .018240513  -4.901955   .1618459  -.06999952 0 .53556466
      0  -.1691453 -.008716343  -5.673233  .16184373 -.026724854 0  .6101543
      0  -.3631491  -.04843283  -5.638961   -.899178  -.09353696 0  3.144684
      1  .08067803  -.09498632   6.401905 -2.7186046  -.09970851 0  2.997201
      2          .           .          .          .           . .         .
      2 -.07431193  -.12937951 -4.3262672   .1879928   .14366318 0  .8815936
      1   -.417279  -.07793838  -3.767993  .18800128   .05663762 0 1.4779098
      0   .1951819  .014789643   -4.85533   .1618338  -.02707594 0 1.0076804
      0   .1967204  .003574552  -9.806684  .18799643   .13440071 0 1.0428796
      end
      The command I used was
      Code:
      xtnbreg trans c.mc_L4_ln_rRD_s##c.mc_L4_relsize_s_ln c.mc_L4_ln_rRD_s##c.mc_L4_relROS_s c.mc_L4_sthom mc_L4_debt_rat L4_noSGA L4_orgslack, fe
      As you can see in the code, the main effects are included as well as the interaction terms. But how do I interpret the coefficient of mc_L4_ln_rRD_s?

      I'm afraid I don't know how to post my regression output here to make my question clearer.
      Do you have any advice, or is this already sufficient?
      Last edited by Corinna Leis; 30 Jun 2017, 08:18.



      • #4
        I'm not sure I understand how to interpret the coefficient of X1 (= mc_L4_ln_rRD_s). As this is the main relationship that I want to analyze (the relationship between X1 and Y), I would assume it needs to be significant and, in my case, positive (although it is negative here), but the forum entries I have read make it look as if this X1 coefficient can no longer be interpreted at all. So I am not sure how to run this analysis or how to interpret it in the end.
        Last edited by Corinna Leis; 30 Jun 2017, 08:18.



        • #5
          Arguably, the odds of getting a clarifying answer improve with the clarity of the query.

          In #1, it seemed your doubts were mostly related to core knowledge concerning interaction terms and count data regression models.

          However, in #3 we see that the data are actually panel data, something not mentioned before.

          What is more, it is a fixed-effects technique (which, by the way, will "leave out" the time-invariant variables...).

          On top of that, there is no output to interpret, nor any information about the reason for selecting the FE model.

          Last but not least, we will surely fail to reproduce the xtnbreg command shared in #3, for there is no clue about how the data were -xtset-.

          On such grounds, sorry, but I fear I cannot help much.
          Best regards,

          Marcos



          • #6
            Hi Marcos,

            Sorry, I was not aware that I could post my output using the {CODE} delimiters. But here you go.
            Yes, I am using an unbalanced panel dataset, declared with
            xtset ID t_d, quarterly (starting 1Q1997 and ending 4Q2014)
            I selected the negative binomial model based on the alpha test at the end of the xtnbreg regression output.
            I chose fixed effects even though the Hausman test suggests otherwise, because my thesis supervisor advised me to do so; her field of research apparently tends to be rather sceptical of random effects.
            Is there anything else that you would need in order to answer my question? Sorry, I am new here, so I didn't know precisely what you would need to see from me.
            Your help would be very much appreciated.

            Code:
            xtnbreg trans c.mc_L4_ln_rRD_s##c.mc_L4_relsize_s_ln c.mc_L4_ln_rRD_s##c.mc_L4_relROS_s c.mc_L4_sthom mc_L4_debt_rat L4_noSGA L4_orgslack, fe
            note: mc_L4_ln_rRD_s omitted because of collinearity
            
            Iteration 0:   log likelihood = -775.51954  
            Iteration 1:   log likelihood = -743.93862  
            Iteration 2:   log likelihood = -738.25253  
            Iteration 3:   log likelihood = -737.29391  
            Iteration 4:   log likelihood = -737.18144  
            Iteration 5:   log likelihood = -737.17367  
            Iteration 6:   log likelihood = -737.17359  
            Iteration 7:   log likelihood = -737.17359  
            
            Conditional FE negative binomial regression     Number of obs      =       600
            Group variable: ID                              Number of groups   =        10
            
                                                            Obs per group: min =        60
                                                                           avg =      60.0
                                                                           max =        60
            
                                                            Wald chi2(9)       =     18.68
            Log likelihood  = -737.17359                    Prob > chi2        =    0.0281
            
            -------------------------------------------------------------------------------------------------------
                                            trans |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------------------------------+----------------------------------------------------------------
                                   mc_L4_ln_rRD_s |  -1.008503   .4454698    -2.26   0.024    -1.881608   -.1353983
                               mc_L4_relsize_s_ln |  -2.245752   1.814972    -1.24   0.216    -5.803031    1.311526
                                                  |
            c.mc_L4_ln_rRD_s#c.mc_L4_relsize_s_ln |  -.7896045    2.74339    -0.29   0.773     -6.16655    4.587341
                                                  |
                                   mc_L4_ln_rRD_s |          0  (omitted)
                                   mc_L4_relROS_s |   .0311535   .0144101     2.16   0.031     .0029102    .0593969
                                                  |
                c.mc_L4_ln_rRD_s#c.mc_L4_relROS_s |  -.1832663   .0825191    -2.22   0.026    -.3450007    -.021532
                                                  |
                                      mc_L4_sthom |  -.0042235   .1083706    -0.04   0.969     -.216626    .2081789
                                   mc_L4_debt_rat |   1.894229   .9165164     2.07   0.039     .0978902    3.690568
                                         L4_noSGA |   .4649841   .2546079     1.83   0.068    -.0340383    .9640064
                                      L4_orgslack |  -.0190885   .0249952    -0.76   0.445    -.0680783    .0299012
                                            _cons |    3.33684   1.673753     1.99   0.046     .0563432    6.617336
            -------------------------------------------------------------------------------------------------------
            
             hausman fe re
            
                             ---- Coefficients ----
                         |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                         |       fe           re         Difference          S.E.
            -------------+----------------------------------------------------------------
            mc_L4_ln~D_s |   -1.008503    -1.156585         .148082        .1120009
            mc_L4_r~s_ln |   -2.245752    -.2798344       -1.965918        1.593889
                      c. |
            mc_L4_ln~D_s#|
                      c. |
            mc_L4_r~s_ln |   -.7896045     .8608161       -1.650421        1.127575
            mc_L4_relR~s |    .0311535     .0319416        -.000788        .0031288
                      c. |
            mc_L4_ln~D_s#|
                      c. |
            mc_L4_relR~s |   -.1832663    -.1797339       -.0035324         .014296
             mc_L4_sthom |   -.0042235      .051087       -.0553105        .0101274
            mc_L4_debt~t |    1.894229     1.041852        .8523774        .5956901
                L4_noSGA |    .4649841      .600622       -.1356379         .092718
             L4_orgslack |   -.0190885    -.0236521        .0045635               .
            ------------------------------------------------------------------------------
                                     b = consistent under Ho and Ha; obtained from xtnbreg
                      B = inconsistent under Ha, efficient under Ho; obtained from xtnbreg
            
                Test:  Ho:  difference in coefficients not systematic
            
                              chi2(9) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                      =       10.34
                            Prob>chi2 =      0.3240
                            (V_b-V_B is not positive definite)
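
            For anyone trying to reproduce the comparison above: -hausman fe re- presupposes two sets of stored estimates. A minimal sketch, assuming the names fe and re were created with -estimates store- (this step is not shown in the posts) and reusing the -xtset- declaration and the right-hand side quoted in this post:

            ```stata
            * Run as one block from a do-file (local macros do not persist line by line)
            xtset ID t_d, quarterly

            * Right-hand side exactly as in the command above
            local rhs c.mc_L4_ln_rRD_s##c.mc_L4_relsize_s_ln c.mc_L4_ln_rRD_s##c.mc_L4_relROS_s c.mc_L4_sthom mc_L4_debt_rat L4_noSGA L4_orgslack

            * Fixed-effects fit, stored under the assumed name fe
            xtnbreg trans `rhs', fe
            estimates store fe

            * Random-effects fit, stored under the assumed name re
            xtnbreg trans `rhs', re
            estimates store re

            * Compare the two sets of coefficients
            hausman fe re
            ```

            Since the output reports that V_b-V_B is not positive definite, the resulting chi2 is probably best read with some caution.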



            • #7
              The theme is far from my field. That said, I got the impression that the variables described in #1 have different names from those shown in the output. Also, if I got it right, there are many more time points than "individuals": just 10 groups, each followed for 60 quarters. One variable is omitted because of collinearity, and you may wish to check whether some of the other variables are directly related as well.

              The non-significant interaction term, IMHO, should be excluded. The fe and re coefficients for mc_L4_relsize_s_ln are very different, which may have something to do with what was said in #5 about time-invariant variables. The same goes for its interaction with mc_L4_ln_rRD_s, whose coefficient has the opposite sign under re. You may wish to consider the re model and see whether it "explains" the results accordingly. I also wonder whether you have checked the need for a quadratic term to improve the model.

              This is the furthest I can go. Hopefully you will get more insightful advice. Meanwhile, I recommend (re)discussing the above-mentioned aspects with your supervisor. Finally, as remarked in #2, the -margins- and -marginsplot- commands can give you a clarifying perspective on the interactions.
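
              To make the last point concrete, here is a hedged sketch rather than a tested specification. It assumes the data are already -xtset- as in #6, and the moderator values used below (1, and the grid from -10 to 10) are purely illustrative. Note that in the command from #6, mc_L4_ln_rRD_s enters twice, once through each ## expansion, which is why Stata reports one copy as omitted; grouping the moderators in parentheses lists it once.

              ```stata
              * c.x##(c.a c.b) expands to x, a, b, x#a and x#b, so X1 enters only once
              xtnbreg trans c.mc_L4_ln_rRD_s##(c.mc_L4_relsize_s_ln c.mc_L4_relROS_s) ///
                  mc_L4_sthom mc_L4_debt_rat L4_noSGA L4_orgslack, fe

              * Effect of X1 on the log count when both centered moderators equal 0,
              * i.e. at their means: this is what the X1 coefficient itself measures
              lincom mc_L4_ln_rRD_s

              * Effect of X1 when relROS is 1 unit above its mean (illustrative value)
              lincom mc_L4_ln_rRD_s + 1*c.mc_L4_ln_rRD_s#c.mc_L4_relROS_s

              * Effect of X1 over an illustrative range of relROS, on the linear predictor
              margins, dydx(mc_L4_ln_rRD_s) at(mc_L4_relROS_s=(-10(5)10)) predict(xb)
              marginsplot
              ```

              Read this way, the X1 coefficient is not uninterpretable: with mean-centered moderators it is the effect of X1 for a firm at the average of the moderators, and the interaction coefficients say how that effect shifts as a moderator moves away from its mean.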
              Last edited by Marcos Almeida; 30 Jun 2017, 15:42.
              Best regards,

              Marcos

