
  • Difference-in-difference with fixed effects - omitted because of collinearity

    Hi, I am looking at the impact of trust (trust in government, interpersonal trust) on economic outcomes (GDP per capita) during COVID-19. I look at 37 OECD countries over the period 2017Q1-2021Q3. I am comparing what I define as "high-trust" and "low-trust" countries to see if there are any significant differences between the groups. To do this, I estimate a difference-in-difference panel data regression with fixed effects. What is somewhat special in my case is that I do not have a control group per se, as the treatment is COVID-19 and both groups (high-trust and low-trust) are hit by the "treatment".

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(date TRUST_high countryid) double GDP_percapita float post
    228 1 1 53649.2 0
    229 1 1 53855.6 0
    230 1 1 54274.4 0
    231 1 1 55088.1 0
    232 1 1 56136.4 0
    233 1 1 56950.7 0
    234 1 1 57103.1 0
    235 1 1 57714.9 0
    236 1 1   57951 0
    237 1 1 57788.3 0
    238 1 1   58109 0
    239 1 1 58406.4 0
    240 1 1 57611.9 1
    241 1 1 51087.8 1
    242 1 1 57265.1 1
    243 1 1   56436 1
    244 1 1 56654.7 1
    245 1 1 59708.1 1
    246 1 1 62434.1 1
    228 1 2 47831.2 0
    229 1 2 48261.6 0
    230 1 2 48489.5 0
    231 1 2 48947.6 0
    232 1 2 49694.6 0
    233 1 2 50331.3 0
    234 1 2 50331.4 0
    235 1 2 49947.8 0
    236 1 2 49373.4 0
    237 1 2 49532.3 0
    238 1 2 49305.1 0
    239 1 2 49301.3 0
    240 1 2 48290.8 1
    241 1 2   42721 1
    242 1 2 47049.5 1
    243 1 2 48394.3 1
    244 1 2 49554.3 1
    245 1 2 49799.2 1
    246 1 2 51074.4 1
    228 1 3 68623.6 0
    229 1 3 68642.8 0
    230 1 3 69006.8 0
    231 1 3 70062.7 0
    232 1 3 71151.2 0
    233 1 3 72151.5 0
    234 1 3 71736.2 0
    235 1 3   71675 0
    236 1 3 71408.4 0
    237 1 3 71728.1 0
    238 1 3 72184.6 0
    239 1 3 72962.1 0
    240 1 3 72526.9 1
    241 1 3 67972.3 1
    242 1 3 72966.8 1
    243 1 3 73158.3 1
    244 1 3 73638.9 1
    245 1 3 75891.5 1
    246 1 3   78268 1
    228 0 4 23711.3 0
    229 0 4 24299.5 0
    230 0 4 24934.1 0
    231 0 4 25196.3 0
    232 0 4 25546.4 0
    233 0 4 25915.2 0
    234 0 4 25691.5 0
    235 0 4 25688.7 0
    236 0 4 25256.4 0
    237 0 4   25460 0
    238 0 4 25480.2 0
    239 0 4 24622.5 0
    240 0 4 25409.7 1
    241 0 4 22926.9 1
    242 0 4 24290.8 1
    243 0 4 25892.1 1
    244 0 4   27256 1
    245 0 4 28300.7 1
    228 0 5 14262.3 0
    229 0 5 14384.9 0
    230 0 5   14511 0
    231 0 5 14757.9 0
    232 0 5 15071.6 0
    233 0 5 15354.5 0
    234 0 5 15579.3 0
    235 0 5   15749 0
    236 0 5 15826.7 0
    237 0 5 16043.4 0
    238 0 5 16188.6 0
    239 0 5 16457.4 0
    240 0 5 16249.2 1
    241 0 5 13648.9 1
    242 0 5 15131.6 1
    243 0 5 16156.5 1
    244 0 5 16765.2 1
    245 0 5 16360.5 1
    246 0 5 17509.3 1
    228 1 6 37633.5 0
    229 1 6 38801.2 0
    230 1 6   39245 0
    231 1 6 39889.4 0
    232 1 6 40439.1 0
    233 1 6 41109.2 0
    end
    format %tq date


    I run:

    Code:
    xtreg GDP_percapita TRUST_high##post i.date, fe
    where TRUST_high is a dummy equal to 1 if the country is a high-trust country and zero otherwise, and post is equal to 1 if the quarter falls within the COVID-19 period (2020Q1 onward) and zero otherwise.

    Then I get the following output:

    Code:
    note: 1.TRUST_high omitted because of collinearity.
    note: 246.date omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =        700
    Group variable: countryid                       Number of groups  =         37
    
    R-squared:                                      Obs per group:
         Within  = 0.5585                                         min =         18
         Between = 0.3657                                         avg =       18.9
         Overall = 0.0310                                         max =         19
    
                                                    F(19,644)         =      42.88
    corr(u_i, Xb) = 0.0633                          Prob > F          =     0.0000
    
    ---------------------------------------------------------------------------------
      GDP_percapita | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    ----------------+----------------------------------------------------------------
       1.TRUST_high |          0  (omitted)
             1.post |   7996.809    507.212    15.77   0.000      7000.82    8992.798
                    |
    TRUST_high#post |
               1 1  |   1168.738   318.2788     3.67   0.000     543.7489    1793.728
                    |
               date |
               229  |   527.4892   471.2242     1.12   0.263    -397.8322    1452.811
               230  |   994.2324   471.2242     2.11   0.035     68.91102    1919.554
               231  |   1710.951   471.2242     3.63   0.000     785.6299    2636.273
               232  |   2250.219   471.2242     4.78   0.000     1324.898     3175.54
               233  |   2884.216   471.2242     6.12   0.000     1958.895    3809.538
               234  |   2990.186   471.2242     6.35   0.000     2064.865    3915.508
               235  |     3211.1   471.2242     6.81   0.000     2285.779    4136.421
               236  |   3318.389   471.2242     7.04   0.000     2393.068    4243.711
               237  |   3668.522   471.2242     7.79   0.000       2743.2    4593.843
               238  |   3899.641   471.2242     8.28   0.000     2974.319    4824.962
               239  |   4377.276   471.2242     9.29   0.000     3451.954    5302.597
               240  |   -4580.13     482.08    -9.50   0.000    -5526.769   -3633.492
               241  |  -8843.763     482.08   -18.35   0.000    -9790.401   -7897.124
               242  |  -4701.144     482.08    -9.75   0.000    -5647.782   -3754.505
               243  |  -4134.444     482.08    -8.58   0.000    -5081.082   -3187.805
               244  |  -3119.941     482.08    -6.47   0.000     -4066.58   -2173.303
               245  |  -1448.576     482.08    -3.00   0.003    -2395.215   -501.9378
               246  |          0  (omitted)
                    |
              _cons |   43646.96   333.2077   130.99   0.000     42992.65    44301.26
    ----------------+----------------------------------------------------------------
            sigma_u |  19161.688
            sigma_e |  2026.8117
                rho |  .98893561   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------
    F test that all u_i=0: F(36, 644) = 1463.68                  Prob > F = 0.0000
    
    . 
    As you can see, the TRUST_high variable is omitted because of collinearity. I just wonder if this is a major problem or if the model is estimated correctly as is? Or is there some other way I should define the xtreg code in this case?

    This might be a bit off topic from my original question, but I also wonder whether fixed effects is the most appropriate choice here or whether I should use random effects instead. I computed the Hausman test and it clearly indicated that I should use random effects. However, the potentially relevant control variables I have are (more or less) time invariant, so including them in a RE model would be essentially the same as estimating the FE model without the controls, right?

    Thank you.

  • #2
    Take a look at this thread on the choice between fixed and random effects: https://www.statalist.org/forums/for...d-effect-model.

    Also, it makes sense that Stata omits TRUST_high, as it is collinear with the unit fixed effects you've included.

    If I were you, I would run
    Code:
    xtreg GDP_percapita TRUST_high#post i.date, fe cluster(countryid)
    However, the use of clustered standard errors here is debatable. They are conventional in the literature, but you have a very small number of observations and fewer than 50 clusters, which may bias the standard errors (Cameron and Miller, 2015). Try the regression with heteroscedasticity-robust standard errors as well.
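
    To spell out why the omission is harmless, here is a short sketch in notation that does not appear in the thread: T_i stands for TRUST_high, P_t for post, \alpha_i for the country effects and \lambda_t for the quarter effects.

    ```latex
    \begin{align*}
    y_{it} &= \alpha_i + \lambda_t + \gamma T_i + \delta P_t
              + \beta\,(T_i \times P_t) + \varepsilon_{it} \\
    T_i - \bar{T}_i &= 0 \quad \text{for every country } i \\
    P_t &= \sum_{s \ge 2020\mathrm{Q}1} \mathbf{1}[t = s]
    \end{align*}
    ```

    The fixed-effects estimator subtracts each country's time average, so the second line says that T_i is wiped out and \gamma cannot be separated from \alpha_i. The third line says that post is the sum of the quarter dummies from 2020Q1 onward, so \delta cannot be separated from \lambda_t either; Stata resolves this by dropping one quarter dummy (246.date in the output in #1), and the reported 1.post coefficient is then best read as a quarter effect rather than as a COVID effect. The interaction T_i \times P_t varies both across countries and over time, so \beta remains identified.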



    • #3
      DD without a control group isn't a thing. You need four groups of observations for DD: treated units pre and post, and untreated units pre and post.

      What are your untreated units here? The entire world was affected by the pandemic. There's no other planet to compare Earth to.

      I think you're better off doing propensity score matching.



      • #4
        Thank you for your replies.

        Sorry, I see now that I was a bit quick in posting this question. This is not a standard diff-in-diff approach where the goal is to estimate the effect of the treatment (the COVID-19 shock) per se. That is barely possible, since the pandemic affected the entire globe, as you mention, Jared Greathouse. This is more of a "differential diff-in-diff" where the goal is to estimate differences in the effect of the treatment across units (countries): all countries are hit by the shock, but they may be affected in different ways.

        As I understand from the thread you linked to, Maxence Morlet, you would also recommend using fixed effects in this model? And, as much as it makes sense that Stata omits TRUST_high, should it be a problem here?
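
        For reference, the "differential diff-in-diff" described above can be written out as follows. This is only a sketch, and the symbols are assumptions introduced here rather than notation from the thread: y is GDP per capita, T = 1 marks high-trust countries, P = 1 marks the COVID-19 period, and \beta is the coefficient on the TRUST_high#post interaction. In a balanced panel it reduces to:

        ```latex
        \beta = \bigl( E[y \mid T=1, P=1] - E[y \mid T=1, P=0] \bigr)
              - \bigl( E[y \mid T=0, P=1] - E[y \mid T=0, P=0] \bigr)
        ```

        Read this way, \beta measures how much more (or less) GDP per capita changed in high-trust countries than in low-trust countries once the pandemic started. It does not measure the effect of COVID-19 itself, which is common to both groups.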



        • #5
          Erlend:
          in addition to previous helpful replies:
          1) setting aside that your number of panels is low, if you go -xtreg-, you cannot invoke a -robust- option for heteroskedasticity-corrected standard errors only (that is, without also clustering your standard errors on -panelid- because of serial correlation of the epsilon error). In sum, both the -robust- and -vce(cluster panelid)- options call cluster-robust standard errors under -xtreg-. This is a relevant difference between -regress- and -xtreg-;
          2) if you go default (non-default) standard errors, you can compare the -fe- with the -re- specification via -hausman- (the community-contributed module -xtoverid-);
          3) what follows might be due to your data excerpt, but you seem to have a (too small) number of predictors:
          Code:
          . xtset countryid date
          
          Panel variable: countryid (unbalanced)
           Time variable: date, 2017q1 to 2021q3
                   Delta: 1 quarter
          
          . xtreg GDP_percapita TRUST_high##post i.date, fe
          note: 1.TRUST_high omitted because of collinearity.
          note: 246.date omitted because of collinearity.
          
          Fixed-effects (within) regression               Number of obs     =        100
          Group variable: countryid                       Number of groups  =          6
          
          R-squared:                                      Obs per group:
               Within  = 0.7548                                         min =          6
               Between = 0.0195                                         avg =       16.7
               Overall = 0.0098                                         max =         19
          
                                                          F(19,75)          =      12.15
          corr(u_i, Xb) = 0.0195                          Prob > F          =     0.0000
          
          ---------------------------------------------------------------------------------
            GDP_percapita | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          ----------------+----------------------------------------------------------------
             1.TRUST_high |          0  (omitted)
                   1.post |   6365.616   748.7638     8.50   0.000     4874.002     7857.23
                          |
          TRUST_high#post |
                     1 1  |  -99.20401   458.7336    -0.22   0.829    -1013.048    814.6403
                          |
                     date |
                     229  |   422.4167   599.0407     0.71   0.483    -770.9335    1615.767
                     230  |   791.6167   599.0407     1.32   0.190    -401.7335    1984.967
                     231  |   1371.817   599.0407     2.29   0.025     178.4665    2565.167
                     232  |     2054.7   599.0407     3.43   0.001     861.3498     3248.05
                     233  |    2683.55   599.0407     4.48   0.000       1490.2      3876.9
                     234  |   2605.857   633.0209     4.12   0.000     1344.814    3866.899
                     235  |   2672.637   633.0209     4.22   0.000     1411.594    3933.679
                     236  |   2480.737   633.0209     3.92   0.000     1219.694    3741.779
                     237  |   2627.977   633.0209     4.15   0.000     1366.934    3889.019
                     238  |   2771.057   633.0209     4.38   0.000     1510.014    4032.099
                     239  |   2867.497   633.0209     4.53   0.000     1606.454    4128.539
                     240  |  -3770.837   699.6727    -5.39   0.000    -5164.657   -2377.018
                     241  |  -8117.157   699.6727   -11.60   0.000    -9510.977   -6723.338
                     242  |  -4447.777   699.6727    -6.36   0.000    -5841.597   -3053.958
                     243  |  -3781.097   699.6727    -5.40   0.000    -5174.917   -2387.278
                     244  |  -3014.717   699.6727    -4.31   0.000    -4408.537   -1620.898
                     245  |  -1776.537   699.6727    -2.54   0.013    -3170.357   -382.7179
                     246  |          0  (omitted)
                          |
                    _cons |   41473.34   426.4742    97.25   0.000     40623.76    42322.92
          ----------------+----------------------------------------------------------------
                  sigma_u |  20605.401
                  sigma_e |  1037.5688
                      rho |  .99747087   (fraction of variance due to u_i)
          ---------------------------------------------------------------------------------
          F test that all u_i=0: F(5, 75) = 5138.04                    Prob > F = 0.0000
          
          . predict fitted, xb
          
          . g sq_fitted=fitted^2
          
          . xtreg GDP_percapita TRUST_high##post i.date fitted sq_fitted , fe
          note: 1.TRUST_high omitted because of collinearity.
          note: 1.TRUST_high#1.post omitted because of collinearity.
          note: 246.date omitted because of collinearity.
          
          Fixed-effects (within) regression               Number of obs     =        100
          Group variable: countryid                       Number of groups  =          6
          
          R-squared:                                      Obs per group:
               Within  = 0.8006                                         min =          6
               Between = 0.0232                                         avg =       16.7
               Overall = 0.0107                                         max =         19
          
                                                          F(20,74)          =      14.86
          corr(u_i, Xb) = 0.0221                          Prob > F          =     0.0000
          
          ---------------------------------------------------------------------------------
            GDP_percapita | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          ----------------+----------------------------------------------------------------
             1.TRUST_high |          0  (omitted)
                   1.post |   23489.93   27043.83     0.87   0.388    -30396.08    77375.93
                          |
          TRUST_high#post |
                     1 1  |          0  (omitted)
                          |
                     date |
                     229  |  -6182.061   2384.778    -2.59   0.011    -10933.84   -1430.286
                     230  |  -10670.36   4246.512    -2.51   0.014    -19131.72   -2209.003
                     231  |  -16000.12    6964.89    -2.30   0.024    -29877.97   -2122.278
                     232  |  -19572.84   9859.788    -1.99   0.051    -39218.89    73.22069
                     233  |  -20280.98   12303.68    -1.65   0.104     -44796.6    4234.651
                     234  |  -20327.86    12011.9    -1.69   0.095    -44262.09    3606.367
                     235  |   -20290.1   12264.05    -1.65   0.102    -44726.74    4146.546
                     236  |  -20323.27   11535.15    -1.76   0.082    -43307.55    2661.018
                     237  |  -20318.36   12095.57    -1.68   0.097    -44419.31    3782.589
                     238  |   -20183.3   12633.05    -1.60   0.114    -45355.19    4988.601
                     239  |   -20020.3   12991.96    -1.54   0.128    -45907.34    5866.748
                     240  |  -43819.56   19040.69    -2.30   0.024    -81758.96   -5880.164
                     241  |   15683.87   34250.26     0.46   0.648    -52561.25       83929
                     242  |  -42327.08   21278.84    -1.99   0.050    -84726.09    71.93684
                     243  |  -43818.47   19076.27    -2.30   0.024    -81828.77   -5808.169
                     244  |  -42095.23   16238.57    -2.59   0.011     -74451.3   -9739.165
                     245  |  -31542.46    10672.2    -2.96   0.004    -52807.29   -10277.64
                     246  |          0  (omitted)
                          |
                   fitted |   276.5893    66.9569     4.13   0.000     143.1748    410.0039
                sq_fitted |  -.0031301    .000759    -4.12   0.000    -.0046424   -.0016178
                    _cons |   -6045708    1476271    -4.10   0.000     -8987243    -3104174
          ----------------+----------------------------------------------------------------
                  sigma_u |  20600.802
                  sigma_e |  941.90822
                      rho |  .99791387   (fraction of variance due to u_i)
          ---------------------------------------------------------------------------------
          F test that all u_i=0: F(5, 74) = 6229.65                    Prob > F = 0.0000
          
          .
          As you can see, -sq_fitted- reaches statistical significance; therefore, the functional form of your regressand is misspecified (under some conditions, this means that you have to check your model specification on the right-hand side of your regression equation).
          Kind regards,
          Carlo
          (Stata 19.0)
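
          Point 1) above can be checked directly. The following is a minimal sketch, not a recommended final model: it uses a 20-row subset of the -dataex- excerpt from #1 (countries 1-5, 2019q3-2020q2), enters only the interaction term because the main effects are absorbed by the country and quarter effects, and stores the standard errors in two scalars whose names (se_robust, se_cluster) are made up for illustration.

          ```stata
          * Subset of the -dataex- excerpt in #1: countries 1-5, 2019q3 to 2020q2
          clear
          input float(date TRUST_high countryid) double GDP_percapita float post
          238 1 1   58109 0
          239 1 1 58406.4 0
          240 1 1 57611.9 1
          241 1 1 51087.8 1
          238 1 2 49305.1 0
          239 1 2 49301.3 0
          240 1 2 48290.8 1
          241 1 2   42721 1
          238 1 3 72184.6 0
          239 1 3 72962.1 0
          240 1 3 72526.9 1
          241 1 3 67972.3 1
          238 0 4 25480.2 0
          239 0 4 24622.5 0
          240 0 4 25409.7 1
          241 0 4 22926.9 1
          238 0 5 16188.6 0
          239 0 5 16457.4 0
          240 0 5 16249.2 1
          241 0 5 13648.9 1
          end
          format %tq date
          xtset countryid date

          * Two-way fixed effects with -vce(robust)-
          xtreg GDP_percapita 1.TRUST_high#1.post i.date, fe vce(robust)
          scalar se_robust = _se[1.TRUST_high#1.post]

          * Same model with explicit clustering on the panel identifier
          xtreg GDP_percapita 1.TRUST_high#1.post i.date, fe vce(cluster countryid)
          scalar se_cluster = _se[1.TRUST_high#1.post]

          * The two standard errors should coincide under -xtreg, fe-
          display "robust: " se_robust "   cluster: " se_cluster
          ```

          With only five clusters the standard errors themselves are not trustworthy; the point of the sketch is only that the two options give the same answer.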



          • #6
            Yep, two-way fixed effects (unit and time). No problem at all that Stata omits this dummy.
