  X'X Not Full Rank: acreg, Conley SE, and High-Dimensional FE

    Hello,

    I am trying to use the command acreg (see https://www.stata.com/meeting/switzerland20/slides/Switzerland20_Colella.pdf) in Stata 17 to calculate Conley spatial standard errors. I am getting a warning that a matrix is not of full rank, and I am not sure whether I need to address it or can safely ignore it.

    Here is a sample of my data using dataex:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(literate male disabled rural_sch young rural_schXyoung lat lon yob) long agg_obj_id_encode float mun_yob
    0 1 0 1 0 0 22.1291 -104.657 1900  10  70233
    0 1 0 1 0 0 22.1291 -104.657 1900  10  70233
    0 1 0 1 0 0 22.1291 -104.657 1900  10  70233
    0 0 0 1 0 0 22.1291 -104.657 1905  10  79913
    0 0 0 1 0 0 22.1291 -104.657 1905  10  79913
    0 1 0 1 0 0 22.1291 -104.657 1908  10  85814
    0 0 0 1 1 1 22.1291 -104.657 1922  10 113496
    0 1 0 1 1 1 22.1291 -104.657 1924  10 117520
    0 0 0 0 0 0 20.9253 -100.738 1890 279  51524
    0 1 0 0 0 0 20.9253 -100.738 1894 279  58828
    0 0 0 0 0 0 20.9253 -100.738 1902 279  73608
    1 1 0 0 0 0 20.9253 -100.738 1905 279  79447
    0 0 0 0 0 0 20.9253 -100.738 1905 279  79447
    0 1 0 0 0 0 20.9253 -100.738 1905 279  79447
    0 1 0 0 0 0 20.9253 -100.738 1905 279  79447
    0 0 0 0 0 0 20.9253 -100.738 1906 279  81448
    0 1 0 0 0 0 20.9253 -100.738 1907 279  83397
    0 1 0 0 0 0 20.9253 -100.738 1907 279  83397
    0 1 0 0 0 0 20.9253 -100.738 1908 279  85348
    0 1 0 0 1 0 20.9253 -100.738 1921 279 111029
    end
    label values agg_obj_id_encode agg_obj_id_encode
    label def agg_obj_id_encode 10 "18083400013", modify
    label def agg_obj_id_encode 279 "cve110030027", modify
    I am using cohort variation within a single cross-section to run a difference-in-differences (DiD) analysis with high-dimensional fixed effects. My regression code is as follows:

    Code:
    acreg literate male disabled yob_fe* rural_sch young rural_schXyoung, spatial latitude(lat) longitude(lon) dist(5) id(agg_obj_id) time(yob) pfe1(agg_obj_id_encode) pfe2(mun_yob) dropsingletons
    I am regressing an indicator for literacy on some controls and year-of-birth fixed effects, while absorbing locality fixed effects (agg_obj_id_encode) and municipality-by-cohort fixed effects (mun_yob). For reference, municipalities are the administrative level immediately above localities in my context. I also include the locality-level treatment indicator (rural_sch), an indicator for belonging to the younger cohort exposed to treatment (young), and the interaction of the two. I am ultimately interested in the coefficient and standard error on rural_schXyoung. (The year-of-birth dummies, yob_fe*, are not included in the dataex sample.)
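
    A minimal sketch, in Python with NumPy rather than Stata, of which regressors the two absorbed fixed-effect sets leave any variation in: partial each regressor out of the locality and municipality-by-cohort dummies and look at the size of the residual. The data are a hypothetical toy example (one municipality containing two localities, four cohorts, and an assumed cohort cutoff of 1920 for young), not my sample, and the helper names are made up for illustration. Because rural_sch is constant within a locality, and young and the yob dummies are functions of year of birth, which mun_yob nests, only rural_schXyoung should keep a non-negligible residual.

```python
import numpy as np


def dummies(codes):
    """One-hot encode a list of category labels (one column per distinct level)."""
    levels = sorted(set(codes))
    return np.array([[1.0 if c == lv else 0.0 for lv in levels] for c in codes])


def residual_norm(fe, x):
    """Norm of x after partialling out the fixed-effect dummies (least-squares residual)."""
    coef, *_ = np.linalg.lstsq(fe, x, rcond=None)
    return float(np.linalg.norm(x - fe @ coef))


# Hypothetical toy data: one municipality with two localities,
# "A" (rural_sch = 1) and "B" (rural_sch = 0), four cohorts, two people per cell.
cohorts = (1905, 1910, 1920, 1925)
rows = [(lc, yob) for lc in ("A", "B") for yob in cohorts for _ in range(2)]
loc = [r[0] for r in rows]
yob = [r[1] for r in rows]
mun_yob = [f"mun1_{y}" for y in yob]  # municipality-by-cohort cell

rural_sch = np.array([1.0 if lc == "A" else 0.0 for lc in loc])
young = np.array([1.0 if y >= 1920 else 0.0 for y in yob])  # assumed cutoff
regressors = {
    "yob_fe(1905)": dummies(yob)[:, 0],  # one year-of-birth dummy
    "rural_sch": rural_sch,
    "young": young,
    "rural_schXyoung": rural_sch * young,
}

# Both absorbed fixed-effect sets: locality and municipality-by-cohort.
fe = np.hstack([dummies(loc), dummies(mun_yob)])

norms = {name: residual_norm(fe, x) for name, x in regressors.items()}
for name, value in norms.items():
    print(f"{name:16s} residual norm after absorbing FE: {value:.3g}")
```

    In this toy example the yob dummy, rural_sch, and young come out at floating-point noise, while rural_schXyoung keeps a residual norm of 1, so it is the only one of these regressors the data can identify once both fixed-effect sets are absorbed.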

    Here is the output, with most of the omitted and estimated coefficients on the year-of-birth dummies left out:

    Code:
    SPATIAL CORRECTION
    DistCutoff: 5
    LagCutoff:  0
    No HAC Correction
    Absorbed FE: agg_obj_id_encode and mun_yob
                 554 singleton observations dropped
    Included instruments: male disabled yob_fe1 yob_fe2 yob_fe3...yob_fe114 yob_fe115 rural_sch young rural_schXyoung
                                                          Number of obs =    22120
    Total (centered) SS     =  1600.728742                Centered R2   =   0.0299
    Total (uncentered) SS   =  1600.728742                Uncentered R2 =   0.0299
    Residual SS             =  1552.847495
    
    ---------------------------------------------------------------------------------
           literate | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    ----------------+----------------------------------------------------------------
               male |   .0959288   .0049651    19.32   0.000     .0861973    .1056603
           disabled |  -.0824189   .0245307    -3.36   0.001    -.1304982   -.0343396
            yob_fe1 |          0  (omitted)
            yob_fe2 |          0  (omitted)
            yob_fe3 |          0  (omitted)
    ...
          yob_fe105 |   4860.244   2.19e+08     0.00   1.000    -4.29e+08    4.30e+08
          yob_fe106 |  -4936.966   1.74e+08    -0.00   1.000    -3.40e+08    3.40e+08
          yob_fe107 |   -6484.07   1.88e+08    -0.00   1.000    -3.69e+08    3.69e+08
          yob_fe108 |  -18760.33   1.75e+08    -0.00   1.000    -3.44e+08    3.44e+08
          yob_fe109 |  -46066.02   2.27e+08    -0.00   1.000    -4.45e+08    4.45e+08
          yob_fe110 |          0  (omitted)
          yob_fe111 |          0  (omitted)
          yob_fe112 |          0  (omitted)
          yob_fe113 |          0  (omitted)
          yob_fe114 |          0  (omitted)
          yob_fe115 |          0  (omitted)
          rural_sch |          0  (omitted)
              young |   25670.82   1.49e+08     0.00   1.000    -2.93e+08    2.93e+08
    rural_schXyoung |    .075119   .0154737     4.85   0.000     .0447912    .1054468
              _cons |  -5.80e-09   .0003258    -0.00   1.000    -.0006386    .0006386
    ---------------------------------------------------------------------------------
    nb: total SS, model and R2s are after partialling out.
    To get the corrected ones use the option correctr2
    Warning: X'X matrix not of full rank. Some variables might be omitted.
    Beta Coefficients and Standard Errors should be interpreted with caution.
    I am concerned about the warning, but the coefficient on the interaction is quantitatively similar to estimates from other commands. My sense is that the warning mainly concerns the year-of-birth fixed effects, which are probably unreliable because of omission and collinearity but are not of interest to me. If so, is it reasonable to ignore the warning? Or does it indicate a problem with the standard error on my variable of interest?
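
    The Frisch-Waugh-Lovell theorem gives one way to think about the point estimate: the coefficient on a regressor depends only on the part of it that remains after partialling out everything else in the model. The sketch below (Python/NumPy, simulated data, all names hypothetical) assumes that acreg's partialling-out can leave a collinear dummy as a column of tiny nonzero residuals instead of exact zeros; that is an assumption about acreg's internals, not something its output confirms. Under that assumption the nearly collinear regressor gets an enormous, meaningless coefficient, while the coefficient on a regressor with independent variation is essentially unaffected.

```python
import numpy as np


def partial_out(controls, v):
    """Residual of v after a least-squares projection on the columns of controls."""
    coef, *_ = np.linalg.lstsq(controls, v, rcond=None)
    return v - controls @ coef


def fwl_coef(y, x, controls):
    """Frisch-Waugh-Lovell: OLS coefficient on x in a regression of y on [x, controls]."""
    rx, ry = partial_out(controls, x), partial_out(controls, y)
    return float(rx @ ry / (rx @ rx))


rng = np.random.default_rng(0)
n = 2000
absorbed = rng.normal(size=n)  # stands in for a direction spanned by the absorbed FE
x_ok = rng.normal(size=n)      # independent variation (plays the role of rural_schXyoung)
# Nearly collinear regressor (plays the role of a yob dummy): equal to the absorbed
# direction up to a 1e-9 perturbation, e.g. leftover from approximate partialling-out.
x_bad = absorbed + 1e-9 * rng.normal(size=n)
y = 0.5 * x_ok + absorbed + rng.normal(size=n)

ones = np.ones(n)
b_ok = fwl_coef(y, x_ok, np.column_stack([ones, absorbed, x_bad]))
b_bad = fwl_coef(y, x_bad, np.column_stack([ones, absorbed, x_ok]))
print(f"coefficient on x_ok:  {b_ok:.3f}")
print(f"coefficient on x_bad: {b_bad:.3g}")
```

    This only speaks to the point estimate; whether the Conley standard error on rural_schXyoung is affected is a separate question that the sketch does not settle.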