Difference-in-difference with fixed effects - omitted because of collinearity

Erlend Sunde

Join Date: May 2022
Posts: 6

20 May 2022, 04:07

Hi, I am looking at the impact of trust (trust in government, interpersonal trust) on economic outcomes (GDP per capita) during COVID-19. I look at 37 OECD countries over the period 2017Q1-2021Q3. I compare what I define as "high-trust" and "low-trust" countries to see whether the two groups differ significantly. To do this, I estimate a difference-in-difference panel data regression with fixed effects. What is somewhat special in my case is that I do not have a control group per se, as the treatment is COVID-19 and both groups (high-trust and low-trust) are hit by the "treatment".

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(date TRUST_high countryid) double GDP_percapita float post
228 1 1 53649.2 0
229 1 1 53855.6 0
230 1 1 54274.4 0
231 1 1 55088.1 0
232 1 1 56136.4 0
233 1 1 56950.7 0
234 1 1 57103.1 0
235 1 1 57714.9 0
236 1 1   57951 0
237 1 1 57788.3 0
238 1 1   58109 0
239 1 1 58406.4 0
240 1 1 57611.9 1
241 1 1 51087.8 1
242 1 1 57265.1 1
243 1 1   56436 1
244 1 1 56654.7 1
245 1 1 59708.1 1
246 1 1 62434.1 1
228 1 2 47831.2 0
229 1 2 48261.6 0
230 1 2 48489.5 0
231 1 2 48947.6 0
232 1 2 49694.6 0
233 1 2 50331.3 0
234 1 2 50331.4 0
235 1 2 49947.8 0
236 1 2 49373.4 0
237 1 2 49532.3 0
238 1 2 49305.1 0
239 1 2 49301.3 0
240 1 2 48290.8 1
241 1 2   42721 1
242 1 2 47049.5 1
243 1 2 48394.3 1
244 1 2 49554.3 1
245 1 2 49799.2 1
246 1 2 51074.4 1
228 1 3 68623.6 0
229 1 3 68642.8 0
230 1 3 69006.8 0
231 1 3 70062.7 0
232 1 3 71151.2 0
233 1 3 72151.5 0
234 1 3 71736.2 0
235 1 3   71675 0
236 1 3 71408.4 0
237 1 3 71728.1 0
238 1 3 72184.6 0
239 1 3 72962.1 0
240 1 3 72526.9 1
241 1 3 67972.3 1
242 1 3 72966.8 1
243 1 3 73158.3 1
244 1 3 73638.9 1
245 1 3 75891.5 1
246 1 3   78268 1
228 0 4 23711.3 0
229 0 4 24299.5 0
230 0 4 24934.1 0
231 0 4 25196.3 0
232 0 4 25546.4 0
233 0 4 25915.2 0
234 0 4 25691.5 0
235 0 4 25688.7 0
236 0 4 25256.4 0
237 0 4   25460 0
238 0 4 25480.2 0
239 0 4 24622.5 0
240 0 4 25409.7 1
241 0 4 22926.9 1
242 0 4 24290.8 1
243 0 4 25892.1 1
244 0 4   27256 1
245 0 4 28300.7 1
228 0 5 14262.3 0
229 0 5 14384.9 0
230 0 5   14511 0
231 0 5 14757.9 0
232 0 5 15071.6 0
233 0 5 15354.5 0
234 0 5 15579.3 0
235 0 5   15749 0
236 0 5 15826.7 0
237 0 5 16043.4 0
238 0 5 16188.6 0
239 0 5 16457.4 0
240 0 5 16249.2 1
241 0 5 13648.9 1
242 0 5 15131.6 1
243 0 5 16156.5 1
244 0 5 16765.2 1
245 0 5 16360.5 1
246 0 5 17509.3 1
228 1 6 37633.5 0
229 1 6 38801.2 0
230 1 6   39245 0
231 1 6 39889.4 0
232 1 6 40439.1 0
233 1 6 41109.2 0
end
format %tq date

I run:

Code:

xtreg GDP_percapita TRUST_high##post i.date, fe

where TRUST_high is a dummy equal to 1 if the country is a high-trust country and 0 otherwise, and post is a dummy equal to 1 if the quarter falls within the COVID-19 period (2020Q1 onward) and 0 otherwise.
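For anyone reproducing this from the -dataex- excerpt, here is a minimal sketch of how the post definition could be checked against the quarterly date variable. The name post_chk is made up for the illustration; everything else comes from the example data.

Code:

* declare the panel: countries observed quarterly
xtset countryid date

* rebuild the COVID-19 indicator: 1 from 2020Q1 onward (tq(2020q1) is 240)
generate byte post_chk = date >= tq(2020q1)

* stops with an error if the rebuilt dummy disagrees with post
assert post_chk == post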

Then I get the following output:

Code:

note: 1.TRUST_high omitted because of collinearity.
note: 246.date omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =        700
Group variable: countryid                       Number of groups  =         37

R-squared:                                      Obs per group:
     Within  = 0.5585                                         min =         18
     Between = 0.3657                                         avg =       18.9
     Overall = 0.0310                                         max =         19

                                                F(19,644)         =      42.88
corr(u_i, Xb) = 0.0633                          Prob > F          =     0.0000

---------------------------------------------------------------------------------
  GDP_percapita | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------+----------------------------------------------------------------
   1.TRUST_high |          0  (omitted)
         1.post |   7996.809    507.212    15.77   0.000      7000.82    8992.798
                |
TRUST_high#post |
           1 1  |   1168.738   318.2788     3.67   0.000     543.7489    1793.728
                |
           date |
           229  |   527.4892   471.2242     1.12   0.263    -397.8322    1452.811
           230  |   994.2324   471.2242     2.11   0.035     68.91102    1919.554
           231  |   1710.951   471.2242     3.63   0.000     785.6299    2636.273
           232  |   2250.219   471.2242     4.78   0.000     1324.898     3175.54
           233  |   2884.216   471.2242     6.12   0.000     1958.895    3809.538
           234  |   2990.186   471.2242     6.35   0.000     2064.865    3915.508
           235  |     3211.1   471.2242     6.81   0.000     2285.779    4136.421
           236  |   3318.389   471.2242     7.04   0.000     2393.068    4243.711
           237  |   3668.522   471.2242     7.79   0.000       2743.2    4593.843
           238  |   3899.641   471.2242     8.28   0.000     2974.319    4824.962
           239  |   4377.276   471.2242     9.29   0.000     3451.954    5302.597
           240  |   -4580.13     482.08    -9.50   0.000    -5526.769   -3633.492
           241  |  -8843.763     482.08   -18.35   0.000    -9790.401   -7897.124
           242  |  -4701.144     482.08    -9.75   0.000    -5647.782   -3754.505
           243  |  -4134.444     482.08    -8.58   0.000    -5081.082   -3187.805
           244  |  -3119.941     482.08    -6.47   0.000     -4066.58   -2173.303
           245  |  -1448.576     482.08    -3.00   0.003    -2395.215   -501.9378
           246  |          0  (omitted)
                |
          _cons |   43646.96   333.2077   130.99   0.000     42992.65    44301.26
----------------+----------------------------------------------------------------
        sigma_u |  19161.688
        sigma_e |  2026.8117
            rho |  .98893561   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
F test that all u_i=0: F(36, 644) = 1463.68                  Prob > F = 0.0000

.

As you can see, the TRUST_high variable is omitted because of collinearity. I just wonder whether this is a major problem, or whether the model is estimated correctly as is. Or is there some other way I should specify the xtreg command in this case?

This might be a bit off topic from my original question, but I also wonder whether fixed effects are the most appropriate choice here or whether I should use random effects instead. I ran the Hausman test, and it clearly indicated that I should use random effects. However, the potentially relevant control variables I have are (more or less) time invariant, so including them in an RE model would be essentially the same as estimating the FE model without the controls, right?

Thank you.


Maxence Morlet

Join Date: Mar 2021

Posts: 652
#2

20 May 2022, 04:24

Take a look at this thread on the choice between fixed and random effects: https://www.statalist.org/forums/for...d-effect-model.

Also, it makes sense that Stata omits TRUST_high, as it is collinear with the unit fixed effects you've included.
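To see the collinearity directly, here is a quick check, using only the variables from the -dataex- example, that TRUST_high never varies within a country. That lack of variation is why the within transformation wipes it out.

Code:

xtset countryid date

* the within-country standard deviation of TRUST_high should be 0
xtsum TRUST_high

* stricter check: stops with an error if any country switches group over time
bysort countryid (date): assert TRUST_high == TRUST_high[1]

The interaction of TRUST_high with post does vary within country, so its coefficient remains identified.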

If I were you, I would run

Code:

xtreg GDP_percapita TRUST_high#post i.date, fe cluster(countryid)

However, the use of clustered standard errors here is debatable. They are conventional in the literature, but you have a small number of observations and fewer than 50 clusters, which may bias the standard errors (Cameron and Miller, 2015). Try the regression with heteroscedasticity-robust standard errors as well.
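Here is a sketch of how the two sets of standard errors could be put side by side. It assumes that -xtreg, fe- treats vce(robust) as clustering on the panel variable, so it switches to -areg- for heteroscedasticity-robust standard errors without clustering. It also writes the interaction as 1.TRUST_high#1.post, which is a choice made here to keep the output tidy.

Code:

* cluster-robust standard errors, clustered on country
xtreg GDP_percapita 1.TRUST_high#1.post i.date, fe vce(cluster countryid)

* heteroscedasticity-robust standard errors without clustering;
* absorbing countryid gives the same within estimator
areg GDP_percapita 1.TRUST_high#1.post i.date, absorb(countryid) vce(robust)

The point estimates should be identical across the two commands; only the standard errors change.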

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#3

20 May 2022, 04:30

DD without a control group isn't a thing. You need four group-period cells for DD: treated units pre and post, and untreated units pre and post.

What are your untreated units here? The entire world was affected by the pandemic. There's no other planet to compare Earth to.

I think you're better off doing propensity score matching.

Erlend Sunde

Join Date: May 2022

Posts: 6
#4

20 May 2022, 09:26

Thank you for your replies.

Sorry, I see now that I was a bit hasty in posting this question. This is not a standard diff-in-diff approach where the goal is to estimate the effect of the treatment (the COVID-19 shock) per se. That is barely possible, since the pandemic affected the entire globe, as you mention, Jared Greathouse. This is more of a "differential diff-in-diff" where the goal is to estimate differences in the effect of the treatment across units (countries): all countries are hit by the shock, but they may be affected in different ways.
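The "differential" estimand can be made concrete with the four group-by-period means. This is only a sketch using the variables from the -dataex- example. It leaves out the country and quarter fixed effects, so on an unbalanced panel its estimate will generally differ somewhat from the -xtreg, fe- one.

Code:

* mean GDP per capita in the four TRUST_high x post cells
tabulate TRUST_high post, summarize(GDP_percapita) means

* the coefficient on 1.TRUST_high#1.post equals
* (high post - high pre) - (low post - low pre) from the table above
regress GDP_percapita i.TRUST_high##i.post, vce(cluster countryid)

That coefficient says how much more (or less) GDP per capita changed in high-trust countries than in low-trust ones. It does not measure the effect of COVID-19 itself, which is common to both groups.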

As I understand from the thread you linked to, Maxence Morlet, you would also recommend using fixed effects in this model? And, even though it makes sense that Stata omits TRUST_high, is the omission a problem here?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17854
#5

20 May 2022, 11:01

Erlend:
in addition to previous helpful replies:
1) setting aside that your number of panels is low, if you go -xtreg-, you cannot invoke the -robust- option to get heteroskedasticity-corrected standard errors only (that is, without also clustering your standard errors on -panelid- because of serial correlation of the epsilon error). In sum, both the -robust- and -vce(cluster panelid)- options give cluster-robust standard errors under -xtreg-. This is a relevant difference between -regress- and -xtreg-;
2) if you go with default standard errors, you can compare the -fe- and -re- specifications via -hausman-; with non-default standard errors, use the community-contributed module -xtoverid- instead;
3) what follows might be due to your data excerpt, but you seem to have too few predictors:

Code:

. xtset countryid date

Panel variable: countryid (unbalanced)
 Time variable: date, 2017q1 to 2021q3
         Delta: 1 quarter

. xtreg GDP_percapita TRUST_high##post i.date, fe
note: 1.TRUST_high omitted because of collinearity.
note: 246.date omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =        100
Group variable: countryid                       Number of groups  =          6

R-squared:                                      Obs per group:
     Within  = 0.7548                                         min =          6
     Between = 0.0195                                         avg =       16.7
     Overall = 0.0098                                         max =         19

                                                F(19,75)          =      12.15
corr(u_i, Xb) = 0.0195                          Prob > F          =     0.0000

---------------------------------------------------------------------------------
  GDP_percapita | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------+----------------------------------------------------------------
   1.TRUST_high |          0  (omitted)
         1.post |   6365.616   748.7638     8.50   0.000     4874.002     7857.23
                |
TRUST_high#post |
           1 1  |  -99.20401   458.7336    -0.22   0.829    -1013.048    814.6403
                |
           date |
           229  |   422.4167   599.0407     0.71   0.483    -770.9335    1615.767
           230  |   791.6167   599.0407     1.32   0.190    -401.7335    1984.967
           231  |   1371.817   599.0407     2.29   0.025     178.4665    2565.167
           232  |     2054.7   599.0407     3.43   0.001     861.3498     3248.05
           233  |    2683.55   599.0407     4.48   0.000       1490.2      3876.9
           234  |   2605.857   633.0209     4.12   0.000     1344.814    3866.899
           235  |   2672.637   633.0209     4.22   0.000     1411.594    3933.679
           236  |   2480.737   633.0209     3.92   0.000     1219.694    3741.779
           237  |   2627.977   633.0209     4.15   0.000     1366.934    3889.019
           238  |   2771.057   633.0209     4.38   0.000     1510.014    4032.099
           239  |   2867.497   633.0209     4.53   0.000     1606.454    4128.539
           240  |  -3770.837   699.6727    -5.39   0.000    -5164.657   -2377.018
           241  |  -8117.157   699.6727   -11.60   0.000    -9510.977   -6723.338
           242  |  -4447.777   699.6727    -6.36   0.000    -5841.597   -3053.958
           243  |  -3781.097   699.6727    -5.40   0.000    -5174.917   -2387.278
           244  |  -3014.717   699.6727    -4.31   0.000    -4408.537   -1620.898
           245  |  -1776.537   699.6727    -2.54   0.013    -3170.357   -382.7179
           246  |          0  (omitted)
                |
          _cons |   41473.34   426.4742    97.25   0.000     40623.76    42322.92
----------------+----------------------------------------------------------------
        sigma_u |  20605.401
        sigma_e |  1037.5688
            rho |  .99747087   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
F test that all u_i=0: F(5, 75) = 5138.04                    Prob > F = 0.0000

. predict fitted, xb

. g sq_fitted=fitted^2

. xtreg GDP_percapita TRUST_high##post i.date fitted sq_fitted , fe
note: 1.TRUST_high omitted because of collinearity.
note: 1.TRUST_high#1.post omitted because of collinearity.
note: 246.date omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =        100
Group variable: countryid                       Number of groups  =          6

R-squared:                                      Obs per group:
     Within  = 0.8006                                         min =          6
     Between = 0.0232                                         avg =       16.7
     Overall = 0.0107                                         max =         19

                                                F(20,74)          =      14.86
corr(u_i, Xb) = 0.0221                          Prob > F          =     0.0000

---------------------------------------------------------------------------------
  GDP_percapita | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------+----------------------------------------------------------------
   1.TRUST_high |          0  (omitted)
         1.post |   23489.93   27043.83     0.87   0.388    -30396.08    77375.93
                |
TRUST_high#post |
           1 1  |          0  (omitted)
                |
           date |
           229  |  -6182.061   2384.778    -2.59   0.011    -10933.84   -1430.286
           230  |  -10670.36   4246.512    -2.51   0.014    -19131.72   -2209.003
           231  |  -16000.12    6964.89    -2.30   0.024    -29877.97   -2122.278
           232  |  -19572.84   9859.788    -1.99   0.051    -39218.89    73.22069
           233  |  -20280.98   12303.68    -1.65   0.104     -44796.6    4234.651
           234  |  -20327.86    12011.9    -1.69   0.095    -44262.09    3606.367
           235  |   -20290.1   12264.05    -1.65   0.102    -44726.74    4146.546
           236  |  -20323.27   11535.15    -1.76   0.082    -43307.55    2661.018
           237  |  -20318.36   12095.57    -1.68   0.097    -44419.31    3782.589
           238  |   -20183.3   12633.05    -1.60   0.114    -45355.19    4988.601
           239  |   -20020.3   12991.96    -1.54   0.128    -45907.34    5866.748
           240  |  -43819.56   19040.69    -2.30   0.024    -81758.96   -5880.164
           241  |   15683.87   34250.26     0.46   0.648    -52561.25       83929
           242  |  -42327.08   21278.84    -1.99   0.050    -84726.09    71.93684
           243  |  -43818.47   19076.27    -2.30   0.024    -81828.77   -5808.169
           244  |  -42095.23   16238.57    -2.59   0.011     -74451.3   -9739.165
           245  |  -31542.46    10672.2    -2.96   0.004    -52807.29   -10277.64
           246  |          0  (omitted)
                |
         fitted |   276.5893    66.9569     4.13   0.000     143.1748    410.0039
      sq_fitted |  -.0031301    .000759    -4.12   0.000    -.0046424   -.0016178
          _cons |   -6045708    1476271    -4.10   0.000     -8987243    -3104174
----------------+----------------------------------------------------------------
        sigma_u |  20600.802
        sigma_e |  941.90822
            rho |  .99791387   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
F test that all u_i=0: F(5, 74) = 6229.65                    Prob > F = 0.0000

.

As you can see, -sq_fitted- reaches statistical significance; therefore, the functional form of your regressand is misspecified (under some conditions, this means that you have to check the specification of the right-hand side of your regression equation).
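Point 2 above could be sketched as follows. -xtoverid- is community-contributed (ssc install xtoverid) and, as far as I know, may not accept factor-variable notation, so the sketch builds the interaction and the quarter dummies by hand. The names trust_post and q_ are made up for the illustration, and post itself is left out because it is collinear with the quarter dummies.

Code:

* hand-made regressors (hypothetical names)
generate byte trust_post = TRUST_high * post
* one dummy per quarter: q_1 ... q_19
tabulate date, generate(q_)

* default standard errors: classic Hausman test of -fe- against -re-
xtreg GDP_percapita trust_post q_2-q_19, fe
estimates store fe
xtreg GDP_percapita TRUST_high trust_post q_2-q_19, re
estimates store re
hausman fe re, sigmamore

* cluster-robust standard errors: overidentification test after -re-
xtreg GDP_percapita TRUST_high trust_post q_2-q_19, re vce(cluster countryid)
xtoverid

With only six countries in the excerpt the clustered version is unlikely to be informative; it is meant for the full 37-country sample.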

Kind regards,
Carlo
(Stata 19.0)


Maxence Morlet

Join Date: Mar 2021

Posts: 652
#6

20 May 2022, 11:20

Yep, two-way fixed effects (unit and time). It is no problem at all that Stata omits this dummy.
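For reference, the same two-way fixed effects specification can also be written with both sets of effects absorbed. This is a sketch assuming the community-contributed -reghdfe- is installed (ssc install reghdfe, which also needs ftools).

Code:

* country and quarter fixed effects absorbed; only the interaction is reported
reghdfe GDP_percapita 1.TRUST_high#1.post, absorb(countryid date) vce(cluster countryid)

The interaction coefficient should match the one from -xtreg- with i.date and the -fe- option. The standard errors can differ slightly because of degrees-of-freedom adjustments.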