X'X Not Full Rank: acreg, Conley SE, and High-Dimensional FE

Ariel Gomez

Join Date: Apr 2020
Posts: 4

18 May 2023, 17:03

Hello,

I am trying to use the command acreg (see https://www.stata.com/meeting/switzerland20/slides/Switzerland20_Colella.pdf) in Stata 17 to calculate Conley spatial standard errors. I am getting a warning about a matrix not being full rank and I am not sure if this is something I need to address or can ignore.

Here is a sample of my data using dataex:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(literate male disabled rural_sch young rural_schXyoung lat lon yob) long agg_obj_id_encode float mun_yob
0 1 0 1 0 0 22.1291 -104.657 1900  10  70233
0 1 0 1 0 0 22.1291 -104.657 1900  10  70233
0 1 0 1 0 0 22.1291 -104.657 1900  10  70233
0 0 0 1 0 0 22.1291 -104.657 1905  10  79913
0 0 0 1 0 0 22.1291 -104.657 1905  10  79913
0 1 0 1 0 0 22.1291 -104.657 1908  10  85814
0 0 0 1 1 1 22.1291 -104.657 1922  10 113496
0 1 0 1 1 1 22.1291 -104.657 1924  10 117520
0 0 0 0 0 0 20.9253 -100.738 1890 279  51524
0 1 0 0 0 0 20.9253 -100.738 1894 279  58828
0 0 0 0 0 0 20.9253 -100.738 1902 279  73608
1 1 0 0 0 0 20.9253 -100.738 1905 279  79447
0 0 0 0 0 0 20.9253 -100.738 1905 279  79447
0 1 0 0 0 0 20.9253 -100.738 1905 279  79447
0 1 0 0 0 0 20.9253 -100.738 1905 279  79447
0 0 0 0 0 0 20.9253 -100.738 1906 279  81448
0 1 0 0 0 0 20.9253 -100.738 1907 279  83397
0 1 0 0 0 0 20.9253 -100.738 1907 279  83397
0 1 0 0 0 0 20.9253 -100.738 1908 279  85348
0 1 0 0 1 0 20.9253 -100.738 1921 279 111029
end
label values agg_obj_id_encode agg_obj_id_encode
label def agg_obj_id_encode 10 "18083400013", modify
label def agg_obj_id_encode 279 "cve110030027", modify

I am using cohort variation in a single cross-section to run a DiD analysis with some high-dimensional fixed effects. My regression code is as follows:

Code:

acreg literate male disabled yob_fe* rural_sch young rural_schXyoung, spatial latitude(lat) longitude(lon) dist(5) id(agg_obj_id) time(yob) pfe1(agg_obj_id_encode) pfe2(mun_yob) dropsingletons

I am regressing an indicator for literacy on some controls and year-of-birth fixed effects, while absorbing locality fixed effects (agg_obj_id_encode) and municipality-by-cohort fixed effects (mun_yob). For reference, municipalities are the next administrative level above localities in my context. I also include the locality-level treatment indicator (rural_sch), an indicator for belonging to the younger cohort exposed to treatment (young), and the interaction of the two. I am ultimately interested in the coefficient and standard error on rural_schXyoung. (The year-of-birth dummies (yob_fe*) are not included in the dataex sample.)
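For completeness, here is a minimal sketch of how dummies named like yob_fe* can be generated from yob. It assumes they come from tabulate with the generate() option, which is only a guess at their construction and not necessarily how the variables in the regression above were built. The four rows are made up for illustration.

```stata
* Illustrative only: four made-up rows, not the real data
clear
input float yob
1900
1905
1905
1922
end

* One dummy per distinct birth year, numbered in ascending order of yob:
* yob_fe1 (1900), yob_fe2 (1905), yob_fe3 (1922)
tabulate yob, generate(yob_fe)

describe yob_fe*
```

With 115 distinct birth years, the same call would produce yob_fe1 through yob_fe115, which matches the variable names in the output below.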

Leaving out most of the omitted and estimated coefficients on the year-of-birth fixed effects, this is the output:

Code:

SPATIAL CORRECTION
DistCutoff: 5
LagCutoff:  0
No HAC Correction
Absorbed FE: agg_obj_id_encode and mun_yob
             554 singleton observations dropped
Included instruments: male disabled yob_fe1 yob_fe2 yob_fe3...yob_fe114 yob_fe115 rural_sch young rural_schXyoung
                                                      Number of obs =    22120
Total (centered) SS     =  1600.728742                Centered R2   =   0.0299
Total (uncentered) SS   =  1600.728742                Uncentered R2 =   0.0299
Residual SS             =  1552.847495

---------------------------------------------------------------------------------
       literate | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
----------------+----------------------------------------------------------------
           male |   .0959288   .0049651    19.32   0.000     .0861973    .1056603
       disabled |  -.0824189   .0245307    -3.36   0.001    -.1304982   -.0343396
        yob_fe1 |          0  (omitted)
        yob_fe2 |          0  (omitted)
        yob_fe3 |          0  (omitted)
...
      yob_fe105 |   4860.244   2.19e+08     0.00   1.000    -4.29e+08    4.30e+08
      yob_fe106 |  -4936.966   1.74e+08    -0.00   1.000    -3.40e+08    3.40e+08
      yob_fe107 |   -6484.07   1.88e+08    -0.00   1.000    -3.69e+08    3.69e+08
      yob_fe108 |  -18760.33   1.75e+08    -0.00   1.000    -3.44e+08    3.44e+08
      yob_fe109 |  -46066.02   2.27e+08    -0.00   1.000    -4.45e+08    4.45e+08
      yob_fe110 |          0  (omitted)
      yob_fe111 |          0  (omitted)
      yob_fe112 |          0  (omitted)
      yob_fe113 |          0  (omitted)
      yob_fe114 |          0  (omitted)
      yob_fe115 |          0  (omitted)
      rural_sch |          0  (omitted)
          young |   25670.82   1.49e+08     0.00   1.000    -2.93e+08    2.93e+08
rural_schXyoung |    .075119   .0154737     4.85   0.000     .0447912    .1054468
          _cons |  -5.80e-09   .0003258    -0.00   1.000    -.0006386    .0006386
---------------------------------------------------------------------------------
nb: total SS, model and R2s are after partialling out.
To get the corrected ones use the option correctr2
Warning: X'X matrix not of full rank. Some variables might be omitted.
Beta Coefficients and Standard Errors should be interpreted with caution.

I am concerned about the warning, but the coefficient on the interaction is quantitatively similar to estimates from other commands. My feeling is that the warning concerns the year-of-birth fixed effects, whose estimates are probably unreliable because of omission and collinearity, but those estimates are not of interest to me. If that is the case, does it make sense to ignore the warning? Or does it indicate a problem with the estimated standard errors for my variable of interest?
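One way this suspicion could be checked is to test whether the suspect regressors vary at all within the absorbed groups. The sketch below is a hedged illustration, not a diagnosis of the full sample: it uses a small extract of the dataex rows, and the *_varies variable names are hypothetical helpers introduced only for this example.

```stata
* Small extract of the dataex sample (rows chosen for illustration)
clear
input float(rural_sch young yob) long agg_obj_id_encode float mun_yob
1 0 1900  10  70233
1 0 1905  10  79913
1 1 1922  10 113496
1 1 1924  10 117520
0 0 1890 279  51524
0 0 1905 279  79447
0 0 1905 279  79447
0 1 1921 279 111029
end

* Does rural_sch vary within locality (the pfe1 group)?
* Sorting by rural_sch within group makes [1] the minimum and [_N] the maximum.
bysort agg_obj_id_encode (rural_sch): generate byte rural_varies = rural_sch[1] != rural_sch[_N]

* Do young and yob vary within municipality-by-cohort cells (the pfe2 group)?
bysort mun_yob (young): generate byte young_varies = young[1] != young[_N]
bysort mun_yob (yob): generate byte yob_varies = yob[1] != yob[_N]

* A variable that never varies within a group is perfectly collinear
* with that set of absorbed fixed effects.
summarize rural_varies young_varies yob_varies
```

If all three indicators are zero in the full data, then rural_sch is spanned by the locality fixed effects, and young and the year-of-birth dummies are spanned by the municipality-by-cohort fixed effects, which would be consistent with the omitted and exploding coefficients in the output above. Whether the standard error on rural_schXyoung is unaffected is a separate question that this check does not settle.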
