  • Negative binomial regression interaction + other issue

    Dear Statalisters,

    For my thesis I am using a negative binomial regression because my dependent variable is a count.
    However, I have only used OLS in the past, so I have some questions about -nbreg- that I cannot answer myself.

    First, a video tutorial mentioned that the DV in count data models cannot be too large. Some of the DV values in my dataset are large, even exceeding 1,000. Is there a threshold, or are count data models still applicable in my case?

    Second, I cannot find out how to analyze an interaction effect in a negative binomial regression. These are the hypotheses I am trying to test:

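    H1: The effect of knowledge similarity on exploitative interorganizational learning is stronger when the alliance is a joint venture.
    H2: The effect of knowledge similarity on exploitative interorganizational learning is stronger when the alliance is repeated.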

    Please find below the -dataex- output for my dataset. The unit of analysis is the firm-year. firmyearID = unique ID; NameP1 = focal firm; NameP2 = partner firm; total_exploitative = exploitative interorganizational learning (the DV); nFMk_s = knowledge similarity between the partnering firms; nFMrepeat = average percentage of repeated alliances; nFMjointventure = average percentage of joint ventures.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float firmyearID str30(NameP1 NameP2) float(total_exploitative nFMk_s nFMrepeat nFMjointventure)
     1 "Actel Corp"                    "Synopsys Inc"               0  .004682927         0  0
     2 "Adobe Systems Inc"             "AT&T Corp"                  2   .04184891         0  0
     3 "Advanced Micro Devices Inc"    "Mentor Graphics Corp"       3 .0023655505         0  0
     4 "Altera Corp"                   "Synopsys Inc"               3  .014079786         0  0
     5 "Analog Devices Inc"            "Hitachi Ltd"               17   .28879926         0  0
     6 "Andrea Electronics Corp"       "Microsoft Corp"             0 .0021314388         0  0
     7 "Apple Computer Inc"            "IBM Corp"                  69     .319171        .5  0
     8 "Atmel Corp"                    "DSP Group Inc"              0           0         0  0
     9 "BMC Software Inc"              "IBM Corp"                  17   .07380227         0  0
    10 "Cirrus Logic Inc"              "Lucent Technologies Inc"   19    .1505007         0 .5
    11 "Citrix Systems Inc"            "Sun Microsystems Inc"       7  .016647745        .5  0
    12 "Consilium Inc"                 "Hewlett-Packard Co"         0 .0087808175         1  0
    13 "Corning Inc"                   "PeopleSoft Inc"             0 .0003534922         0  0
    14 "DSP Group Inc"                 "Texas Instruments Inc"      0  .006935244         0  0
    15 "Data General Corp"             "Unisys Corp"                5   .07923308         0  0
    16 "Energy Conversion Devices Inc" "Canon Inc"                  1   .04497243         1  0
    17 "GE"                            "United Technologies Corp" 102   .33444855         0  1
    18 "Gensym Corp"                   "Motorola Inc"               0 .0031369035         0  0
    19 "Harman Intl Industries Inc"    "Compaq Computer Corp"       1  .000671141         0  0
    20 "Hewlett-Packard Co"            "Guidant Corp"               0 .0029042866  .2857143  0
    21 "IBM Corp"                      "Sun Microsystems Inc"     402    .3668424         1 .5
    22 "Integrated Silicon Solution"   "Intel Corp"                 8  .035957843         0  0
    23 "Intel Corp"                    "Lockheed Martin Corp"      13     .327844  .3333333  0
    24 "Lucent Technologies Inc"       "NEC Corp"                 413    .3758927        .2  0
    25 "Mentor Graphics Corp"          "Dassault Systemes SA"       0           0         0  0
    26 "Microsoft Corp"                "Rational Software Corp"     2   .08655567 .16666667  0
    27 "Molex Inc"                     "Teradyne Inc"               1  .006868887         1  0
    28 "Motorola Inc"                  "Sun Microsystems Inc"      66   .22535086  .3333333  0
    29 "NetManage Inc"                 "Intel Corp"                 0 .0006203474         0  0
    30 "Oracle Corp"                   "Digital Equipment Corp"    54   .11729223         0  0
    end
    Any help is much appreciated.

    Regards

    Guy Swillens

  • #2
    Well, I'm not aware of any limit on the size of the DV in a negative binomial model. It is true of all regression models that if some of the predictor variables have very wide scales spanning many orders of magnitude, it can be difficult to get convergence, but that is a separate issue. As for interaction terms, they are handled exactly the same way in a negative binomial regression as in a linear regression. To the extent that I understand the design of your study, the models would look like this:

    Code:
    nbreg total_exploitative c.nFMk_s##c.nFMjointventure // H1
    nbreg total_exploitative c.nFMk_s##c.nFMrepeat // H2
    Here I am treating your joint venture and repeat variables as continuous, which strikes me as odd, but perhaps that is because I don't really understand what these variables represent.
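
    If nFMjointventure or nFMrepeat were recoded as 0/1 indicators, the usual approach would be the i. prefix instead of c. The sketch below is only an illustration on simulated data: y, ks and jv are hypothetical stand-ins for total_exploitative, nFMk_s and a 0/1 joint-venture indicator, and the coefficients used to generate the data are made up. For a 0/1 variable the c. and i. versions are the same model (same log likelihood). The practical difference is that with i. Stata knows the variable is categorical, so -margins- can use it as a factor and compute discrete 0-to-1 changes for it rather than derivatives.

    ```stata
    * Simulated data; all names and parameter values are hypothetical
    clear
    set seed 20161008
    set obs 500
    generate double ks = runiform()                  // stand-in for nFMk_s
    generate byte   jv = runiform() < 0.3            // 0/1 stand-in indicator
    * NB2 outcome as a gamma-Poisson mixture: rgamma(2, 0.5) has mean 1 and
    * variance 0.5, so the overdispersion parameter alpha is 0.5
    generate long   y  = rpoisson(exp(1 + 1.5*ks + 0.4*jv + 0.8*ks*jv) * rgamma(2, 0.5))

    * jv treated as continuous, as in the models above
    nbreg y c.ks##c.jv
    scalar ll_cont = e(ll)

    * jv treated as a factor variable (base level 0)
    nbreg y c.ks##i.jv
    display "log likelihood, c.jv: " scalar(ll_cont) "   i.jv: " e(ll)
    ```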

    After you run each regression, you will want to estimate predicted outcomes, and perhaps marginal effects, at various combinations of values of nFMk_s and the other variable.
    The -margins- command will do that for you, and, if you like, you can also graph the results with -marginsplot-. The -margins- section of the user manuals is quite complete and includes many helpful examples, but an easier introduction would probably come from Richard Williams' excellent Stata Journal article on -margins- (st0260), http://www.stata-journal.com/sjpdf.h...iclenum=st0260. In models with interaction terms, the predicted outcomes or marginal effects are much easier to understand than the regression coefficients themselves.
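
    As a rough sketch of what that might look like for the H1 model, here is some simulated data (y, ks and jv are hypothetical stand-ins for total_exploitative, nFMk_s and nFMjointventure, with jv taking the values 0, 0.5 and 1 like a share of joint ventures). The first -margins- call gives predicted counts at every combination of the listed values, which -marginsplot- then graphs. The second gives the marginal effect of ks on the expected count at three values of jv. The particular at() values are arbitrary choices for illustration.

    ```stata
    * Simulated data; all names and parameter values are hypothetical
    clear
    set seed 20161008
    set obs 500
    generate double ks = runiform()                  // stand-in for nFMk_s
    generate double jv = floor(3*runiform())/2       // 0, 0.5 or 1
    generate long   y  = rpoisson(exp(1 + 1.5*ks + 0.4*jv + 0.8*ks*jv) * rgamma(2, 0.5))
    nbreg y c.ks##c.jv

    * Predicted counts (nbreg's default prediction) at every combination
    * of ks = 0, 0.25, ..., 1 and jv = 0, 0.5, 1
    margins, at(ks = (0(0.25)1) jv = (0 0.5 1))
    marginsplot

    * Marginal effect of ks on the expected count at three values of jv
    margins, dydx(ks) at(jv = (0 0.5 1))
    matrix ME = r(b)
    ```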

    • #3
      Thank you very much for your reply.

      I changed the continuous interaction variable to a dummy variable.

      I have two questions about the interaction and the -margins- command. They concern the following hypothesis:
      H1: The effect of knowledge complementarity on explorative interorganizational learning is stronger when the alliance is formed with a new partner.

      Please find attached a screenshot of the regression. The RepeatedallianceP1 variable is a dummy (0 = not repeated, 1 = repeated).

      Following the regression, there is a significant positive effect of 1.RepeatedallianceP1 on my DV. The 1. stands for the baseline (in this case 0 = repeated), isn't that correct?

      However, when I look at the interaction in the regression, there is no "1." in front of the RepeatedallianceP1. What does this mean? Does it use the 0 or 1 value in the interaction?

      Secondly, as for the marginsplot, I get a different representation when I specify different margins (obviously). If I use -margins RepeatedallianceP1, at(k_c = (0 0.5 0.75 1))-, I get the marginsplot shown in Figure 1. When I use -margins RepeatedallianceP1, at(k_c = (0 1))-, I get the marginsplot shown in Figure 2. Is there something wrong with Figure 1, or is it just common practice to use margins at integer values (e.g. 0 1) rather than at non-integer steps (e.g. 0 0.25 0.5 0.75 1)?

      Kind regards

      Guy Swillens


      • #4
        Following the regression, there is a significant positive effect of 1.RepeatedallianceP1 on my DV. The 1. stands for the baseline (in this case 0 = repeated), isn't that correct?
        No, that isn't correct. You are using an interaction model, so there is no such thing as the effect of RepeatedallianceP1 on your DV. There are infinitely many different effects of RepeatedallianceP1 on your DV, and they depend on the value of k_c. The 2.558458 coefficient you see in the regression output is the effect of RepeatedallianceP1 on your DV when k_c = 0. For any given value of k_c, the effect of RepeatedallianceP1 on the DV is 2.558458 - 3.489197*k_c. From this formula you can see that if k_c is sufficiently large, the effect will be negative, and at k_c = 2.558458/3.489197 (about 0.733) the effect will be exactly 0. (Strictly speaking, in a negative binomial model these coefficients are effects on the log of the expected DV, but the language is convoluted enough already, so I'm abstracting away from that in this paragraph.)
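
        The arithmetic above can be checked directly in Stata with -lincom- and -nlcom-. This is a sketch on simulated data, not the model from the screenshot: y, kc and rep are hypothetical stand-ins for the DV, k_c and RepeatedallianceP1, and the data are generated so that the true effect of rep on the log expected count is 1.2 - 2*kc, which crosses zero at kc = 0.6. -lincom- evaluates _b[1.rep] + k*_b[1.rep#c.kc] at a chosen k (with the irr option it reports the exponentiated value, i.e. the ratio of expected counts for rep = 1 versus rep = 0 at that k), and -nlcom- estimates the crossover point -_b[1.rep]/_b[1.rep#c.kc] with a delta-method standard error.

        ```stata
        * Simulated data; all names and parameter values are hypothetical
        clear
        set seed 20161008
        set obs 1000
        generate double kc  = runiform()                 // stand-in for k_c
        generate byte   rep = runiform() < 0.5           // stand-in for RepeatedallianceP1
        generate long   y   = rpoisson(exp(1 + kc + 1.2*rep - 2*kc*rep) * rgamma(2, 0.5))
        nbreg y i.rep##c.kc

        * Effect of rep on log E[y] at kc = 0: the "main effect" coefficient alone
        lincom _b[1.rep]
        * Effect of rep on log E[y] at kc = 0.5
        lincom _b[1.rep] + 0.5*_b[1.rep#c.kc]
        * The same contrast as an incidence-rate ratio
        lincom _b[1.rep] + 0.5*_b[1.rep#c.kc], irr
        * Value of kc at which the effect of rep is exactly zero
        nlcom -_b[1.rep]/_b[1.rep#c.kc]
        ```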

        However, when I look at the interaction in the regression, there is no "1." in front of the RepeatedallianceP1. What does this mean? Does it use the 0 or 1 value in the interaction?
        Stata uses the same values of the discrete variable(s) in an interaction term that it uses for the "main effects" of those variables (unless you go out of your way to code your factor-variable notation so as to force a different solution). Instead of putting the numbers in front of the variable name (as it did for the main effect in your case), it lists them underneath the heading of the interaction term. So if you look at the line immediately below that heading, you will see the 1 there.
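
        Two things may help here, sketched below on simulated data (y, kc and rep are hypothetical stand-ins for the DV, k_c and RepeatedallianceP1). The coeflegend option lists the name Stata gives each coefficient, which makes it explicit that both the main effect and the interaction refer to level 1 of the indicator (1.rep and 1.rep#c.kc). The ib1. prefix is one way of going out of your way to change that: it makes 1 the base level, so the estimated indicator and interaction terms refer to level 0 instead. It is the same model. For a 0/1 indicator those two coefficients just change sign, while the kc slope and the constant shift to describe the new base group.

        ```stata
        * Simulated data; all names and parameter values are hypothetical
        clear
        set seed 20161008
        set obs 1000
        generate double kc  = runiform()
        generate byte   rep = runiform() < 0.5
        generate long   y   = rpoisson(exp(1 + kc + 1.2*rep - 2*kc*rep) * rgamma(2, 0.5))

        * Default base level is 0: terms are named 1.rep and 1.rep#c.kc
        nbreg y i.rep##c.kc, coeflegend
        scalar int_base0 = _b[1.rep#c.kc]

        * ib1. makes 1 the base: terms are now named 0.rep and 0.rep#c.kc
        nbreg y ib1.rep##c.kc, coeflegend
        scalar int_base1 = _b[0.rep#c.kc]
        ```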

        Secondly, as for the marginsplot, I get a different representation when I specify different margins (obviously). If I use -margins RepeatedallianceP1, at(k_c = (0 0.5 0.75 1))-, I get the marginsplot shown in Figure 1. When I use -margins RepeatedallianceP1, at(k_c = (0 1))-, I get the marginsplot shown in Figure 2. Is there something wrong with Figure 1, or is it just common practice to use margins at integer values (e.g. 0 1) rather than at non-integer steps (e.g. 0 0.25 0.5 0.75 1)?
        Stata calculates the margins and marginal effects of variables specified in -at()- options at exactly the values specified in the -at()- option, no more and no less. -marginsplot- then just graphs those results. So if you specify only -at(k_c = (0 1))-, two predicted values are calculated and plotted for each value of RepeatedallianceP1, and of course with only two points you get a straight line. But a negative binomial model uses a log link, so the relationship between the expected DV and the predictor variables is non-linear. When you specify more points, with -at(k_c = (0 0.5 0.75 1))-, Stata calculates and plots more points, and since the relationship is not linear, the graph begins to show the curvilinear shape. Indeed, if you were to specify -at(k_c = (0(0.025)1))-, you would get 41 values of k_c, and the graph would look very much like a smooth curve. How much computation time you are willing to invest in smoothing the curve is up to you; there are no general rules or conventions that I'm aware of.
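
        To see why two points give a straight line while more points reveal a curve, one can compare the linear predictor with the predicted count at the same points. The sketch below uses simulated data (y, kc and rep are hypothetical stand-ins). With predict(xb), -margins- reports the log of the expected count, which is linear in kc, so the value at kc = 0.5 is exactly the average of the values at 0 and 1. The -nbreg- default, predict(n), reports the expected count exp(xb), which is convex in kc, so its midpoint lies below the straight line joining the endpoints.

        ```stata
        * Simulated data; all names and parameter values are hypothetical
        clear
        set seed 20161008
        set obs 1000
        generate double kc  = runiform()
        generate byte   rep = runiform() < 0.5
        generate long   y   = rpoisson(exp(1 + kc + 1.2*rep - 2*kc*rep) * rgamma(2, 0.5))
        nbreg y i.rep##c.kc

        * Linear predictor (log of the expected count) for rep = 0 at three values of kc
        margins, at(kc = (0 0.5 1) rep = 0) predict(xb)
        matrix XB = r(b)

        * Expected counts at the same three points (predict(n) is the nbreg default)
        margins, at(kc = (0 0.5 1) rep = 0)
        matrix NC = r(b)

        * Midpoint versus the average of the two endpoints, on each scale
        display "xb:    " el(XB,1,2) "  vs  " (el(XB,1,1) + el(XB,1,3))/2
        display "count: " el(NC,1,2) "  vs  " (el(NC,1,1) + el(NC,1,3))/2
        ```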
        Last edited by Clyde Schechter; 08 Oct 2016, 11:02.

        • #5
          Thank you very much for clearing that up. How would I be able to smooth the curve? Is there a manual on that? I did not see that process in the -margins- documentation.

          • #6
            No, I'm not suggesting that you smooth the curve. What I'm saying is that the more points you specify in the -at()- option, the smoother the curve will be. But the more points you specify, the longer the calculations will run (which may or may not be a problem for you). Anyway, that's the tradeoff. If you want a smoother curve, specify more points in -at()-.
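
            In practice that just means a denser numlist in -at()-. Here is a sketch on simulated data (y, kc and rep are hypothetical stand-ins): 0(0.025)1 asks for 41 values of kc, so -margins- computes 82 predictions, one per value of kc for each level of rep. The recast(line) and recastci(rarea) options of -marginsplot- draw the estimates as lines and the confidence intervals as shaded bands. That tends to read better than markers when the points are this dense, but it is a presentation choice, not a requirement.

            ```stata
            * Simulated data; all names and parameter values are hypothetical
            clear
            set seed 20161008
            set obs 1000
            generate double kc  = runiform()
            generate byte   rep = runiform() < 0.5
            generate long   y   = rpoisson(exp(1 + kc + 1.2*rep - 2*kc*rep) * rgamma(2, 0.5))
            nbreg y i.rep##c.kc

            * Predicted counts at 41 values of kc for each level of rep
            margins rep, at(kc = (0(0.025)1))
            matrix M = r(b)
            * Lines for the estimates, shaded bands for the confidence intervals
            marginsplot, recast(line) recastci(rarea)
            ```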
