  • Collinearity issues with -areg- and not with -reghdfe-

    Hi Statalist.

    I am having trouble diagnosing a collinearity problem. Observations in my dataset are counties across years, ranging from 1996 to 2006. I am running a regression with county fixed effects. The variables intent, defier, m0, m1, m2 and m3 are binary, and running is an integer that ranges from 0 to 100.

    -areg- drops some of the variables due to collinearity:

    Code:
    * areg drops variables
    # delimit ;
        areg ${outcome}
    
        1.intent#0.defier#1.m0
        l1.1.intent#l1.0.defier#1.m1
        l2.1.intent#l2.0.defier#1.m2
        l3.1.intent#l3.0.defier#1.m3
        
        c.running#0.defier#1.m0
        l1.c.running#l1.0.defier#1.m1
        l2.c.running#l2.0.defier#1.m2
        l3.c.running#l3.0.defier#1.m3
    
        0.defier#m0
        0.l1.defier#m1
        0.l2.defier#m2
        0.l3.defier#m3    
            
                        
        if inrange(year,1996,2006)
        & insample                
        , cluster(cty) absorb(cty);
    # delimit cr
    
    note: 0L2.defier#1.m2#cL2.running omitted because of collinearity
    note: 0L3.defier#1.m3#cL3.running omitted because of collinearity
    
    Linear regression, absorbing indicators         Number of obs     =      1,100
                                                    F(  10,     99)   =      11.86
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.5210
                                                    Adj R-squared     =     0.4683
                                                    Root MSE          =     1.4812
    
                                             (Std. Err. adjusted for 100 clusters in cty)
    -------------------------------------------------------------------------------------
                        |               Robust
             unemp_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
       intent#defier#m0 |
                 1 0 1  |  -.9089798    .678034    -1.34   0.183    -2.254346    .4363869
                        |
      L.intent#L.defier#|
                     m1 |
                 1 0 1  |  -1.139256   .6345052    -1.80   0.076    -2.398252    .1197395
                        |
                    L2. |
                 intent#|
           L2.defier#m2 |
                 1 0 1  |    -1.0513    .492405    -2.14   0.035    -2.028339   -.0742618
                        |
                    L3. |
                 intent#|
           L3.defier#m3 |
                 1 0 1  |  -1.969312   1.007312    -1.96   0.053    -3.968037    .0294126
                        |
    defier#m0#c.running |
                   0 1  |  -.0625051   .0289846    -2.16   0.033    -.1200169   -.0049933
                        |
            L.defier#m1#|
             cL.running |
                   0 1  |  -.0364417   .0252344    -1.44   0.152    -.0865122    .0136287
                        |
           L2.defier#m2#|
            cL2.running |
                   0 1  |          0  (omitted)
                        |
           L3.defier#m3#|
            cL3.running |
                   0 1  |          0  (omitted)
                        |
              defier#m0 |
                   0 1  |  -.0751216   .3238042    -0.23   0.817    -.7176194    .5673762
                        |
            L.defier#m1 |
                   0 1  |   .0497029   .2771395     0.18   0.858    -.5002019    .5996077
                        |
           L2.defier#m2 |
                   0 1  |   .4190932   .2787426     1.50   0.136    -.1339925    .9721789
                        |
           L3.defier#m3 |
                   0 1  |   1.926908   .5189397     3.71   0.000     .8972196    2.956597
                        |
                  _cons |   6.062341   .4269764    14.20   0.000     5.215127    6.909555
    --------------------+----------------------------------------------------------------
                    cty |   absorbed                                     (100 categories)

    The omitted variables are collinear before 1999 but not after 2000, so they are not collinear in the entire sample. If I estimate the regression on the 2000-2006 subsample, there are no collinearity warnings:

    Code:
    * areg doesn't drop variables if post 1999
    # delimit ;
        areg ${outcome}
    
        1.intent#0.defier#1.m0
        l1.1.intent#l1.0.defier#1.m1
        l2.1.intent#l2.0.defier#1.m2
        l3.1.intent#l3.0.defier#1.m3
        
        c.running#0.defier#1.m0
        l1.c.running#l1.0.defier#1.m1
        l2.c.running#l2.0.defier#1.m2
        l3.c.running#l3.0.defier#1.m3
    
        0.defier#m0
        0.l1.defier#m1
        0.l2.defier#m2
        0.l3.defier#m3    
            
                        
        if inrange(year,2000,2006)
        & insample                
        , cluster(cty) absorb(cty);
    # delimit cr
    
    Linear regression, absorbing indicators         Number of obs     =        700
                                                    F(  12,     99)   =       6.15
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.6168
                                                    Adj R-squared     =     0.5444
                                                    Root MSE          =     1.1326
    
                                             (Std. Err. adjusted for 100 clusters in cty)
    -------------------------------------------------------------------------------------
                        |               Robust
             unemp_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
       intent#defier#m0 |
                 1 0 1  |   .4648492   .3104504     1.50   0.137    -.1511518     1.08085
                        |
      L.intent#L.defier#|
                     m1 |
                 1 0 1  |  -.0860283   .2276794    -0.38   0.706    -.5377936    .3657371
                        |
                    L2. |
                 intent#|
           L2.defier#m2 |
                 1 0 1  |  -.1695172   .3319051    -0.51   0.611     -.828089    .4890546
                        |
                    L3. |
                 intent#|
           L3.defier#m3 |
                 1 0 1  |    .226809   .4802937     0.47   0.638    -.7261979    1.179816
                        |
    defier#m0#c.running |
                   0 1  |  -.0109445   .0076418    -1.43   0.155    -.0261074    .0042184
                        |
            L.defier#m1#|
             cL.running |
                   0 1  |   .0101195   .0064702     1.56   0.121    -.0027188    .0229577
                        |
           L2.defier#m2#|
            cL2.running |
                   0 1  |   .0129306   .0083487     1.55   0.125     -.003635    .0294962
                        |
           L3.defier#m3#|
            cL3.running |
                   0 1  |  -.0003409   .0113759    -0.03   0.976    -.0229131    .0222313
                        |
              defier#m0 |
                   0 1  |  -.5470303   .3260741    -1.68   0.097    -1.194032    .0999715
                        |
            L.defier#m1 |
                   0 1  |  -.4926713   .2377283    -2.07   0.041    -.9643758   -.0209669
                        |
           L2.defier#m2 |
                   0 1  |  -.0659969   .2842434    -0.23   0.817    -.6299975    .4980037
                        |
           L3.defier#m3 |
                   0 1  |    .023994   .2798196     0.09   0.932    -.5312288    .5792167
                        |
                  _cons |   6.464826   .2387412    27.08   0.000     5.991112     6.93854
    --------------------+----------------------------------------------------------------
                    cty |   absorbed                                     (100 categories)

    If I estimate the regression in the full sample with -reghdfe- instead of -areg-, I don't get any collinearity warnings and I get very different results.

    Code:
    * reghdfe doesn't drop variables
    
    # delimit ;
        reghdfe ${outcome}
    
        1.intent#0.defier#1.m0
        l1.1.intent#l1.0.defier#1.m1
        l2.1.intent#l2.0.defier#1.m2
        l3.1.intent#l3.0.defier#1.m3
        
        c.running#0.defier#1.m0
        l1.c.running#l1.0.defier#1.m1
        l2.c.running#l2.0.defier#1.m2
        l3.c.running#l3.0.defier#1.m3
    
        0.defier#m0
        0.l1.defier#m1
        0.l2.defier#m2
        0.l3.defier#m3    
            
                        
        if inrange(year,1996,2006)
        & insample                
        , cluster(cty) absorb(cty);
    # delimit cr
    
    (converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =      1,100
    Absorbing 1 HDFE group                            F(  12,     99) =      22.41
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.6201
                                                      Adj R-squared   =     0.5774
                                                      Within R-sq.    =     0.1373
    Number of clusters (cty)     =        100         Root MSE        =     1.3204
    
                                             (Std. Err. adjusted for 100 clusters in cty)
    -------------------------------------------------------------------------------------
                        |               Robust
             unemp_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
       intent#defier#m0 |
                 1 0 1  |   .0634621   .2583711     0.25   0.806    -.4492021    .5761264
                        |
      L.intent#L.defier#|
                     m1 |
                 1 0 1  |  -.0027516   .1981444    -0.01   0.989    -.3959132    .3904099
                        |
                    L2. |
                 intent#|
           L2.defier#m2 |
                 1 0 1  |  -.2087028   .2232519    -0.93   0.352     -.651683    .2342773
                        |
                    L3. |
                 intent#|
           L3.defier#m3 |
                 1 0 1  |   .1684336   .2454237     0.69   0.494    -.3185403    .6554075
                        |
    defier#m0#c.running |
                   0 1  |   -.016822   .0066538    -2.53   0.013    -.0300245   -.0036195
                        |
            L.defier#m1#|
             cL.running |
                   0 1  |   .0023001   .0047169     0.49   0.627    -.0070593    .0116596
                        |
           L2.defier#m2#|
            cL2.running |
                   0 1  |   .0021448   .0048294     0.44   0.658    -.0074377    .0117274
                        |
           L3.defier#m3#|
            cL3.running |
                   0 1  |   .0171726   .0048115     3.57   0.001     .0076255    .0267197
                        |
              defier#m0 |
                   0 1  |  -.2761202   .2393532    -1.15   0.251    -.7510489    .1988084
                        |
            L.defier#m1 |
                   0 1  |   -.453506    .149009    -3.04   0.003    -.7491721   -.1578398
                        |
           L2.defier#m2 |
                   0 1  |  -.0499182   .1498133    -0.33   0.740    -.3471802    .2473439
                        |
           L3.defier#m3 |
                   0 1  |   .8106551   .1642063     4.94   0.000     .4848342    1.136476
    -------------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ----------------------------------------------------------------------+
            Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
    --------------------+-------------------------------------------------|
                    cty |            0             100            100 *   |
    ----------------------------------------------------------------------+
    * = fixed effect nested within cluster; treated as redundant for DoF computation

    Any thoughts on why -areg- diagnoses collinearity in the first regression, and why -reghdfe- doesn't return any collinearity warning?

    Thank you.
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com

  • #2
    There is nothing wrong here and nothing to explain. Your expectations are incorrect.

    Collinearity only counts as collinearity when it holds across the entire estimation sample. Collinearity confined to a subset of the data does not make the variables collinear in the full sample, and it does not lead to omission of variables.
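
    A minimal sketch of this point, using simulated data rather than the poster's dataset. All variable names here (early, x1, x2, y) are hypothetical and chosen only for illustration:

    ```stata
    * Simulated illustration; none of these variables come from the original data
    clear
    set seed 12345
    set obs 200
    gen byte early = _n <= 100
    gen double x1 = rnormal()
    * x2 is an exact copy of x1 in the early half and independent noise afterwards
    gen double x2 = cond(early, x1, rnormal())
    gen double y  = 1 + 0.5*x1 - 0.3*x2 + rnormal()

    * Early subsample: x2 duplicates x1, so one of the two is omitted
    regress y x1 x2 if early

    * Full sample: x1 and x2 are no longer perfectly related, so both are kept
    regress y x1 x2
    ```

    Only the first regression should print an "omitted because of collinearity" note, because the exact linear relationship holds in that subsample alone.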

    What your results show you is that the relationships among the variables you are working with have changed since 2000. So it is not a surprise that when you smash everything into a single analysis, the results don't resemble what you found when analyzing the parts separately. Many of the coefficients change drastically between the two separate analyses. So it would seem better to analyze the data from the two eras separately.

    Finally, although it has nothing to do with the question you asked, I will note that all of these models appear to be mis-specified. Whenever you have interaction terms, you must also include the constituent effects and all lower-order interaction terms. (There are rare exceptions to this rule, but nothing I see here suggests that these analyses are among the exceptions.) The simplest way to fix this problem will be to replace # by ## throughout your regression commands. Then Stata will expand the variable list to include these other necessary terms.
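
    As a small illustration of the difference between # and ##, here is a sketch using Stata's shipped auto dataset (not the poster's data), where foreign and mpg stand in for any factor and continuous variable:

    ```stata
    sysuse auto, clear

    * Interaction only: the main effects of foreign and mpg are not included
    regress price i.foreign#c.mpg

    * Full factorial: ## adds the constituent terms automatically
    regress price i.foreign##c.mpg

    * The same model written out explicitly
    regress price i.foreign c.mpg i.foreign#c.mpg
    ```

    The second and third commands fit the same model; the first fits a different one because the constituent terms are missing.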



    • #3
      Thank you for the feedback.

      I understand that collinearity in a sub-sample does not lead to collinearity in the full sample. That is why it is puzzling that the first regression with -areg-, run on the entire sample, gives a collinearity warning when there shouldn't be one. My intuition tells me I should just trust the -reghdfe- results, but I am afraid I am missing something.

      As for mis-specification, these models deliberately include fewer variables than needed. I was trying to produce the smallest example that still showed the issue. My full model has the main effects, and has the same issues.
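
      One way to probe the puzzle described above is to build the first omitted term by hand and regress it on the remaining regressors plus the county effects. This is only a sketch: it assumes the data are already xtset by cty and year (as the lag operators require), and x_l2 and r_l2 are hypothetical names introduced here.

      ```stata
      * Hand-built copy of the term 0L2.defier#1.m2#cL2.running that -areg- omitted
      gen double x_l2 = (l2.defier == 0 & m2 == 1) * l2.running ///
          if !missing(l2.defier, m2, l2.running)

      * Auxiliary regression of that term on all other regressors and county effects
      areg x_l2 ///
          1.intent#0.defier#1.m0 ///
          l1.1.intent#l1.0.defier#1.m1 ///
          l2.1.intent#l2.0.defier#1.m2 ///
          l3.1.intent#l3.0.defier#1.m3 ///
          c.running#0.defier#1.m0 ///
          l1.c.running#l1.0.defier#1.m1 ///
          l3.c.running#l3.0.defier#1.m3 ///
          0.defier#m0 0.l1.defier#m1 0.l2.defier#m2 0.l3.defier#m3 ///
          if inrange(year,1996,2006) & insample, absorb(cty)

      * Residuals that are all zero indicate an exact linear dependence
      predict double r_l2 if e(sample), residuals
      summarize r_l2
      ```

      If x_l2 were an exact linear combination of the other terms and the county effects, the residuals would be zero up to rounding error. Residuals that are small but clearly nonzero would instead suggest that the two commands apply different numerical thresholds, although nothing in this thread confirms that.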




      Jorge Eduardo Pérez Pérez
      www.jorgeperezperez.com



      • #4
        Sergio Correia, is the method -reghdfe- uses to diagnose collinearity different from the one that -areg- uses?
        Jorge Eduardo Pérez Pérez
        www.jorgeperezperez.com
