  • When does multicollinearity increase the significance of regressors?

    I'm stuck with some unexpected findings that seem to contradict what I (thought I) knew about multicollinearity. Among other sources, the discussion of multicollinearity in this post suggests that the estimates remain unbiased while their variances and standard errors increase. I therefore always presumed that including multicollinear variables makes significant effects less likely.
    My question is: under which conditions does the opposite happen?

    Specifically, I am running regressions that examine the contingent effect of direct and indirect ties in a collaboration network on the impact of inventions. I reproduce the output of three regressions below: in the first, I include only direct ties (dt); in the second, only indirect ties (it); and in the third, both dt and it (note that the two have a correlation of 0.87).
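As a rough yardstick, the pairwise correlation of 0.87 can be translated into a variance inflation factor (VIF). This is a sketch that assumes a linear model with only these two regressors, which is much simpler than the Poisson models below; in the actual models the relevant quantity is the R² from regressing dt on all the other regressors (including the interaction terms), so the figure is only indicative.

```python
import math

rho = 0.87  # correlation between dt and it reported in the post

# With only two regressors, the R^2 of the auxiliary regression of one on the other is rho^2.
r2_aux = rho ** 2
vif = 1.0 / (1.0 - r2_aux)   # variance inflation factor
se_ratio = math.sqrt(vif)    # standard-error inflation, holding the error variance fixed

print(f"VIF = {vif:.2f}, SE inflation = {se_ratio:.2f}")
# prints "VIF = 4.11, SE inflation = 2.03"
```

In words: if adding the second variable left the error variance unchanged, the standard error of the first coefficient would roughly double.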

    Code:
    xtpoisson fwd log_assets log_breadth log_depth firm_prod degree struct team_deg ///
        team_str_hole team_size team_div team_sim team_mk claims num_cited_patents ///
        num_sbcls backcitation_struct lag it ///
        priv c.priv#c.priv pub c.pub#c.pub ///
        dt c.priv#c.dt c.priv#c.priv#c.dt c.pub#c.dt c.pub#c.pub#c.dt ///
        i.app_year i.grant i.tech_cat, fe robust
    note: 10 groups (10 obs) dropped because of only one obs per group
    
    Iteration 0:   log pseudolikelihood = -263803.46  
    Iteration 1:   log pseudolikelihood = -244586.92  
    Iteration 2:   log pseudolikelihood = -244120.53  
    Iteration 3:   log pseudolikelihood = -244117.75  
    Iteration 4:   log pseudolikelihood = -244117.75  
    
    Conditional fixed-effects Poisson regression    Number of obs     =     39,785
    Group variable: firm                            Number of groups  =        127
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =      313.3
                                                                  max =      8,114
    
                                                    Wald chi2(44)     =   55360.70
    Log pseudolikelihood  = -244117.75              Prob > chi2       =     0.0000
    
                                              (Std. Err. adjusted for clustering on firm)
    -------------------------------------------------------------------------------------
                        |               Robust
                    fwd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
             log_assets |  -.0421138   .0183623    -2.29   0.022    -.0781032   -.0061243
            log_breadth |  -.0621538   .0740653    -0.84   0.401    -.2073191    .0830115
              log_depth |  -.0143182   .0451562    -0.32   0.751    -.1028227    .0741862
              firm_prod |  -.0001577   .0000945    -1.67   0.095    -.0003428    .0000274
                 degree |   .0003697   .0004692     0.79   0.431      -.00055    .0012894
                 struct |  -.0872349   .0384485    -2.27   0.023    -.1625926   -.0118771
       team_degree_cent |  -.0000941   .0005957    -0.16   0.875    -.0012615    .0010734
          team_str_hole |   .0418548    .036082     1.16   0.246    -.0288646    .1125742
              team_size |    .037857   .0061041     6.20   0.000     .0258932    .0498208
               team_div |   .0015678   .0583958     0.03   0.979    -.1128859    .1160214
               team_sim |  -.0420113   .0478152    -0.88   0.380    -.1357274    .0517047
                team_mk |   .0080637   .0122664     0.66   0.511     -.015978    .0321053
                 claims |   .0065597   .0005373    12.21   0.000     .0055065    .0076129
      num_cited_patents |   .0014349   .0003425     4.19   0.000     .0007637    .0021061
              num_sbcls |   .0210066   .0026866     7.82   0.000      .015741    .0262721
    backcitation_struct |   .0046322   .0054924     0.84   0.399    -.0061328    .0153971
                    lag |    .001186   .1057822     0.01   0.991    -.2061433    .2085153
                     it |    -.02097   .0265965    -0.79   0.430    -.0730983    .0311582
                   priv |   .4264326   .1077134     3.96   0.000     .2153183     .637547
                        |
          c.priv#c.priv |  -.4015753   .1140479    -3.52   0.000    -.6251051   -.1780454
                        |
                    pub |   .2271145    .070996     3.20   0.001     .0879649    .3662641
                        |
            c.pub#c.pub |  -.1989517   .0592801    -3.36   0.001    -.3151385   -.0827649
                        |
                     dt |   .0026489   .0044727     0.59   0.554    -.0061175    .0114153
                        |
            c.priv#c.dt |   .0065184   .0102537     0.64   0.525    -.0135786    .0266153
                        |
     c.priv#c.priv#c.dt |  -.0025707    .009038    -0.28   0.776    -.0202848    .0151434
                        |
             c.pub#c.dt |   .0055046   .0173412     0.32   0.751    -.0284835    .0394927
                        |
       c.pub#c.pub#c.dt |  -.0078393   .0186939    -0.42   0.675    -.0444787    .0288001
                        |
               app_year |
                  2001  |  -.0546176   .1084301    -0.50   0.614    -.2671367    .1579014
                  2002  |   -.023543   .2147132    -0.11   0.913    -.4443732    .3972871
                  2003  |  -.0025697   .3204385    -0.01   0.994    -.6306176    .6254782
                  2004  |  -.1106726   .4307254    -0.26   0.797    -.9548789    .7335336
                        |
                  grant |
                  2001  |  -.3226067   .1814502    -1.78   0.075    -.6782426    .0330293
                  2002  |  -.4679425   .2522891    -1.85   0.064      -.96242     .026535
                  2003  |  -.5601343   .3512596    -1.59   0.111    -1.248591    .1283218
                  2004  |  -1.296592   .4499721    -2.88   0.004    -2.178521   -.4146625
                  2005  |  -.9112545   .5510036    -1.65   0.098    -1.991202    .1686928
                  2006  |   -.953167   .6529846    -1.46   0.144    -2.232993    .3266594
                  2007  |  -1.062879   .7565935    -1.40   0.160    -2.545775    .4200173
                  2008  |  -1.142632   .8518252    -1.34   0.180    -2.812178     .526915
                        |
               tech_cat |
                     2  |   .4231011   .0461803     9.16   0.000     .3325894    .5136128
                     3  |   .2627944   .2686015     0.98   0.328    -.2636548    .7892437
                     4  |   .3656174   .0411299     8.89   0.000     .2850043    .4462306
                     5  |   .1058336   .0698564     1.52   0.130    -.0310824    .2427496
                     6  |   .1518013   .0784977     1.93   0.053    -.0020514    .3056541
    -------------------------------------------------------------------------------------
    Code:
    xtpoisson fwd log_assets log_breadth log_depth firm_prod degree struct team_deg ///
        team_str_hole team_size team_div team_sim team_mk claims num_cited_patents ///
        num_sbcls backcitation_struct lag ///
        priv c.priv#c.priv pub c.pub#c.pub ///
        it c.priv#c.it c.priv#c.priv#c.it c.pub#c.it c.pub#c.pub#c.it ///
        i.app_year i.grant i.tech_cat, fe robust
    note: 10 groups (10 obs) dropped because of only one obs per group
    
    Iteration 0:   log pseudolikelihood = -263803.46  
    Iteration 1:   log pseudolikelihood = -244617.71  
    Iteration 2:   log pseudolikelihood =  -244151.4  
    Iteration 3:   log pseudolikelihood = -244148.57  
    Iteration 4:   log pseudolikelihood = -244148.57  
    
    Conditional fixed-effects Poisson regression    Number of obs     =     39,785
    Group variable: firm                            Number of groups  =        127
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =      313.3
                                                                  max =      8,114
    
                                                    Wald chi2(43)     =   43653.67
    Log pseudolikelihood  = -244148.57              Prob > chi2       =     0.0000
    
                                              (Std. Err. adjusted for clustering on firm)
    -------------------------------------------------------------------------------------
                        |               Robust
                    fwd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
             log_assets |  -.0417972   .0184625    -2.26   0.024    -.0779831   -.0056114
            log_breadth |  -.0609179    .074052    -0.82   0.411    -.2060571    .0842213
              log_depth |  -.0139061   .0451387    -0.31   0.758    -.1023763    .0745641
              firm_prod |  -.0001575   .0000948    -1.66   0.097    -.0003434    .0000283
                 degree |   .0003651   .0004677     0.78   0.435    -.0005516    .0012817
                 struct |  -.0865138   .0385132    -2.25   0.025    -.1619983   -.0110293
       team_degree_cent |  -.0000917   .0006003    -0.15   0.879    -.0012682    .0010849
          team_str_hole |   .0408362   .0358463     1.14   0.255    -.0294212    .1110935
              team_size |   .0373599   .0063835     5.85   0.000     .0248486    .0498713
               team_div |   .0025884   .0604123     0.04   0.966    -.1158175    .1209943
               team_sim |  -.0415313   .0475202    -0.87   0.382    -.1346692    .0516065
                team_mk |   .0075196   .0118594     0.63   0.526    -.0157243    .0307636
                 claims |   .0065466   .0005445    12.02   0.000     .0054794    .0076138
      num_cited_patents |   .0014235   .0003391     4.20   0.000     .0007589     .002088
              num_sbcls |   .0210129   .0026838     7.83   0.000     .0157527    .0262731
    backcitation_struct |   .0050463   .0056992     0.89   0.376     -.006124    .0162165
                    lag |   .0015166   .1058611     0.01   0.989    -.2059674    .2090006
                   priv |   .4084412   .1079815     3.78   0.000     .1968013    .6200812
                        |
          c.priv#c.priv |  -.3834548   .1167273    -3.29   0.001    -.6122361   -.1546735
                        |
                    pub |   .2400234   .0716232     3.35   0.001     .0996444    .3804024
                        |
            c.pub#c.pub |  -.2132266   .0570253    -3.74   0.000    -.3249941    -.101459
                        |
                     it |  -.0070345   .0077406    -0.91   0.363    -.0222058    .0081368
                        |
            c.priv#c.it |  -.0211951   .0935572    -0.23   0.821    -.2045638    .1621736
                        |
     c.priv#c.priv#c.it |   .0148457   .0691849     0.21   0.830    -.1207542    .1504456
                        |
             c.pub#c.it |   .1324775   .0920244     1.44   0.150    -.0478869     .312842
                        |
       c.pub#c.pub#c.it |  -.1396024    .072464    -1.93   0.054    -.2816292    .0024244
                        |
               app_year |
                  2001  |  -.0545978   .1084881    -0.50   0.615    -.2672305    .1580349
                  2002  |  -.0236868   .2148361    -0.11   0.912    -.4447579    .3973842
                  2003  |  -.0021412   .3206502    -0.01   0.995    -.6306041    .6263217
                  2004  |  -.1102924   .4309893    -0.26   0.798    -.9550158     .734431
                        |
                  grant |
                  2001  |  -.3217998   .1813656    -1.77   0.076    -.6772698    .0336703
                  2002  |  -.4677767   .2523486    -1.85   0.064    -.9623709    .0268175
                  2003  |  -.5597596   .3512835    -1.59   0.111    -1.248263    .1287434
                  2004  |  -1.296205   .4500562    -2.88   0.004    -2.178299   -.4141108
                  2005  |  -.9120754   .5512629    -1.65   0.098    -1.992531      .16838
                  2006  |  -.9536463   .6532563    -1.46   0.144    -2.234005    .3267125
                  2007  |  -1.063942    .756966    -1.41   0.160    -2.547568    .4196842
                  2008  |  -1.143782   .8523521    -1.34   0.180    -2.814361    .5267978
                        |
               tech_cat |
                     2  |   .4247155   .0460621     9.22   0.000     .3344355    .5149954
                     3  |   .2651247   .2684868     0.99   0.323    -.2610997    .7913492
                     4  |   .3663356    .041281     8.87   0.000     .2854264    .4472449
                     5  |    .104499   .0699651     1.49   0.135    -.0326301    .2416281
                     6  |   .1533703   .0775916     1.98   0.048     .0012936     .305447
    -------------------------------------------------------------------------------------
    Code:
    xtpoisson fwd log_assets log_breadth log_depth firm_prod degree struct team_deg ///
        team_str_hole team_size team_div team_sim team_mk claims num_cited_patents ///
        num_sbcls backcitation_struct lag ///
        priv c.priv#c.priv pub c.pub#c.pub dt it ///
        c.priv#c.dt c.priv#c.priv#c.dt c.priv#c.it c.priv#c.priv#c.it ///
        c.pub#c.dt c.pub#c.pub#c.dt c.pub#c.it c.pub#c.pub#c.it ///
        i.app_year i.grant i.tech_cat, fe robust
    note: 10 groups (10 obs) dropped because of only one obs per group
    
    Iteration 0:   log pseudolikelihood = -263803.46  
    Iteration 1:   log pseudolikelihood = -244485.64  
    Iteration 2:   log pseudolikelihood = -244016.26  
    Iteration 3:   log pseudolikelihood = -244013.44  
    Iteration 4:   log pseudolikelihood = -244013.44  
    
    Conditional fixed-effects Poisson regression    Number of obs     =     39,785
    Group variable: firm                            Number of groups  =        127
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =      313.3
                                                                  max =      8,114
    
                                                    Wald chi2(48)     =   65473.80
    Log pseudolikelihood  = -244013.44              Prob > chi2       =     0.0000
    
                                              (Std. Err. adjusted for clustering on firm)
    -------------------------------------------------------------------------------------
                        |               Robust
                    fwd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
             log_assets |  -.0416536   .0185879    -2.24   0.025    -.0780853   -.0052219
            log_breadth |   -.063431   .0738539    -0.86   0.390     -.208182      .08132
              log_depth |  -.0137816   .0450594    -0.31   0.760    -.1020963    .0745332
              firm_prod |  -.0001579   .0000946    -1.67   0.095    -.0003433    .0000275
                 degree |   .0003798   .0004742     0.80   0.423    -.0005497    .0013093
                 struct |  -.0864763   .0388271    -2.23   0.026    -.1625761   -.0103765
       team_degree_cent |  -.0001107   .0005991    -0.18   0.853    -.0012849    .0010635
          team_str_hole |   .0414972   .0363237     1.14   0.253    -.0296959    .1126903
              team_size |   .0378755   .0061385     6.17   0.000     .0258443    .0499068
               team_div |   .0007188   .0581518     0.01   0.990    -.1132566    .1146941
               team_sim |  -.0402486   .0477124    -0.84   0.399    -.1337633    .0532661
                team_mk |   .0073269    .012242     0.60   0.550     -.016667    .0313208
                 claims |   .0065359   .0005381    12.15   0.000     .0054813    .0075905
      num_cited_patents |   .0014213   .0003433     4.14   0.000     .0007484    .0020942
              num_sbcls |   .0209203    .002683     7.80   0.000     .0156617    .0261789
    backcitation_struct |   .0062702   .0058392     1.07   0.283    -.0051744    .0177148
                    lag |   .0012678   .1057345     0.01   0.990     -.205968    .2085035
                   priv |   .3277654   .1057273     3.10   0.002     .1205437    .5349871
                        |
          c.priv#c.priv |   -.333851   .1163007    -2.87   0.004    -.5617962   -.1059058
                        |
                    pub |   .3216886   .0734082     4.38   0.000     .1778112     .465566
                        |
            c.pub#c.pub |  -.2669564    .057914    -4.61   0.000    -.3804658   -.1534469
                        |
                     dt |   .0011642   .0054724     0.21   0.832    -.0095615      .01189
                     it |  -.0148946   .0319061    -0.47   0.641    -.0774295    .0476403
                        |
            c.priv#c.dt |   .0637551   .0240834     2.65   0.008     .0165524    .1109578
                        |
     c.priv#c.priv#c.dt |  -.0336542   .0185492    -1.81   0.070    -.0700099    .0027016
                        |
            c.priv#c.it |   -.352787   .1849786    -1.91   0.056    -.7153385    .0097644
                        |
     c.priv#c.priv#c.it |   .1721147   .1350795     1.27   0.203    -.0926363    .4368657
                        |
             c.pub#c.dt |  -.0737924   .0138189    -5.34   0.000    -.1008769   -.0467079
                        |
       c.pub#c.pub#c.dt |   .0548186   .0163733     3.35   0.001     .0227275    .0869098
                        |
             c.pub#c.it |   .5046088   .1258178     4.01   0.000     .2580105    .7512072
                        |
       c.pub#c.pub#c.it |  -.3730006   .0887886    -4.20   0.000    -.5470232    -.198978
                        |
               app_year |
                  2001  |  -.0545112   .1084542    -0.50   0.615    -.2670774    .1580551
                  2002  |  -.0231496   .2147579    -0.11   0.914    -.4440673    .3977681
                  2003  |  -.0026099   .3203355    -0.01   0.993    -.6304559    .6252361
                  2004  |  -.1108742   .4305263    -0.26   0.797    -.9546902    .7329418
                        |
                  grant |
                  2001  |  -.3247795   .1811756    -1.79   0.073    -.6798773    .0303182
                  2002  |  -.4692111    .251969    -1.86   0.063    -.9630614    .0246391
                  2003  |  -.5624661   .3507764    -1.60   0.109    -1.249975     .125043
                  2004  |  -1.298661   .4493931    -2.89   0.004    -2.179455   -.4178663
                  2005  |  -.9139517   .5503947    -1.66   0.097    -1.992705     .164802
                  2006  |  -.9544268   .6524444    -1.46   0.144    -2.233194    .3243408
                  2007  |  -1.065164   .7559497    -1.41   0.159    -2.546799    .4164698
                  2008  |  -1.144925   .8512318    -1.35   0.179    -2.813308     .523459
                        |
               tech_cat |
                     2  |   .4252098    .046273     9.19   0.000     .3345164    .5159031
                     3  |   .2677324   .2693331     0.99   0.320    -.2601507    .7956155
                     4  |   .3684181   .0411073     8.96   0.000     .2878494    .4489868
                     5  |   .1054494    .070141     1.50   0.133    -.0320244    .2429231
                     6  |   .1546547   .0782129     1.98   0.048     .0013602    .3079493
    -------------------------------------------------------------------------------------
    As you can see, the interaction effects are significant ONLY when both are included. Any suggestions on how to interpret this?
    Note that a simple OLS regression on the log of the response variable gives the same peculiar result: the interactions are significant only when both direct and indirect ties are included.

  • #2
    Simon:
    I'm not sure whether what follows is relevant, but the first two regressions show a different number of parameters (Wald chi2(44) vs. Wald chi2(43)), whereas, from your description, it seems that in the second model you replaced only one predictor of the first model.
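Counting the estimated coefficients in each posted model reproduces the three Wald degrees of freedom (44, 43 and 48), since the firm fixed effects are conditioned out and every reported coefficient enters the test. The count points to `it`: it also enters the first model as a main effect (right after `lag`), while the second model has no `dt` term at all. This is a bookkeeping sketch; the term lists are transcribed by hand from the posted output (names as displayed there, base levels of the factor variables excluded).

```python
# Controls shared by all three models, as displayed in the output tables.
controls = ["log_assets", "log_breadth", "log_depth", "firm_prod", "degree",
            "struct", "team_degree_cent", "team_str_hole", "team_size",
            "team_div", "team_sim", "team_mk", "claims", "num_cited_patents",
            "num_sbcls", "backcitation_struct", "lag"]
quadratics = ["priv", "c.priv#c.priv", "pub", "c.pub#c.pub"]


def tie_terms(tie):
    """Main effect of a tie variable plus its four interactions with priv and pub."""
    return [tie, f"c.priv#c.{tie}", f"c.priv#c.priv#c.{tie}",
            f"c.pub#c.{tie}", f"c.pub#c.pub#c.{tie}"]


# Reported dummy levels: app_year 2001-2004, grant 2001-2008, tech_cat 2-6.
n_dummies = 4 + 8 + 5

models = {
    "model 1": controls + ["it"] + quadratics + tie_terms("dt"),
    "model 2": controls + quadratics + tie_terms("it"),
    "model 3": controls + quadratics + ["dt", "it"]
               + tie_terms("dt")[1:] + tie_terms("it")[1:],
}

counts = {name: len(terms) + n_dummies for name, terms in models.items()}
print(counts)
# prints {'model 1': 44, 'model 2': 43, 'model 3': 48}
```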
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Dear Simon Schillebeeckx,

      Jeff Wooldridge's book has a really nice and clear discussion of this problem in the context of linear models. In short, adding regressors that are highly collinear may lead to increased significance if their inclusion causes a large enough reduction in the variance of the error term. I did not think deeply about the problem in the context of Poisson regression, but it is likely that the situation is similar.
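For the linear case only, the textbook variance formula makes this trade-off explicit. The sketch below assumes OLS with homoskedastic errors, two standardized regressors x1 and x2 with correlation ρ, a large sample of size n (so SST₁ ≈ n), and a true model y = β₁x₁ + β₂x₂ + ε with Var(ε) = σ². Here σ_u² is the error variance of whichever regression is estimated: the short one omits x2, the long one includes it.

```latex
\begin{aligned}
\operatorname{Var}(\hat\beta_j) &= \frac{\sigma_u^2}{\mathrm{SST}_j\,(1 - R_j^2)} \\
\operatorname{Var}(\hat\beta_1^{\text{short}}) &\approx \frac{\sigma^2 + \beta_2^2\,(1-\rho^2)}{n}
  && \bigl(\sigma_u^2 = \sigma^2 + \beta_2^2(1-\rho^2),\; R_1^2 = 0\bigr) \\
\operatorname{Var}(\hat\beta_1^{\text{long}}) &\approx \frac{\sigma^2}{n\,(1-\rho^2)}
  && \bigl(\sigma_u^2 = \sigma^2,\; R_1^2 = \rho^2\bigr) \\
\operatorname{Var}(\hat\beta_1^{\text{long}}) < \operatorname{Var}(\hat\beta_1^{\text{short}})
  &\iff |\beta_2|\,(1-\rho^2) > \sigma\,|\rho|
\end{aligned}
```

With ρ = 0.87, the long regression is more precise only if |β₂| exceeds roughly 3.58σ, so the collinear regressor must carry a lot of explanatory power of its own. Note also that the short regression estimates β₁ + ρβ₂ rather than β₁ (omitted-variable bias), so its point estimate shifts as well.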

      Best wishes,

      Joao



      • #4
        Goldberger's text also has a nice discussion of collinearity.



        • #5
          Thanks for the feedback, all.
          I've tried a few times to read Dr. Wooldridge's book, but unfortunately much of the maths is beyond my statistical capabilities. Which Goldberger text do you mean, Phil Bromiley? I could look into it.
          Joao Santos Silva, if adding a collinear regressor indeed reduces the variance of the error term, is that not the entire point of running a regression with additional regressors? If the variance goes down, does this mean that the regression with the collinear regressor (in my case, the third model I provided) has a better model fit?

          Thanks again!

          Simon



          • #6
            Dear Simon Schillebeeckx

            I believe Phil Bromiley was referring to Chapter 23 of the book "A Course in Econometrics" by the great Arthur Goldberger; the book is freely available here and the rest of the book is also a "must".

            About your question: adding new variables will always reduce the variance of the error (and increase the R2), but that is not a good reason to include variables. In general, the purpose of estimating a model is not to get a good fit. Adding new variables also always increases collinearity, which reduces the precision of the estimates. So adding new variables has two opposite effects on the precision of the estimates, and the overall effect depends on the particular application.
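A small simulation can illustrate the two opposite effects. It is a sketch under assumptions that are not in the thread: a linear model estimated by OLS with homoskedastic normal errors (not the Poisson models above), two regressors with correlation 0.87, n = 5000, β₁ = 0.5, an arbitrary seed, and two hypothetical values of β₂. When the collinear regressor x2 has no effect, including it roughly doubles the standard error of the x1 coefficient; when x2 has a large effect, including it removes enough error variance that the standard error falls despite the collinearity.

```python
import numpy as np

rng = np.random.default_rng(2024)          # arbitrary seed for reproducibility
n, rho, sigma, beta1 = 5000, 0.87, 1.0, 0.5

# Two standardized regressors with correlation rho (the 0.87 from the post).
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
eps = sigma * rng.standard_normal(n)


def se_first_slope(y, *regressors):
    """Classical (homoskedastic) OLS standard error of the first slope coefficient."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])   # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return float(np.sqrt(cov[1, 1]))


results = {}
for beta2 in (0.0, 8.0):  # hypothetical: x2 irrelevant vs. x2 strongly relevant
    y = beta1 * x1 + beta2 * x2 + eps
    se_short = se_first_slope(y, x1)       # x2 omitted
    se_long = se_first_slope(y, x1, x2)    # x2 included despite the collinearity
    results[beta2] = (se_short, se_long)
    print(f"beta2 = {beta2}: SE(b1) short = {se_short:.4f}, long = {se_long:.4f}")
```

The value β₂ = 8 was picked to be large relative to the error standard deviation; with weaker effects of x2, the collinearity term dominates and the long regression is the less precise one.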

            Best wishes,

            Joao

