  • Category omitted from fixed effect when running regression

    I am having a confusing problem -- when I ran my code last week, everything was normal. Now, not so much.

    I have a cross-sectional data set (DHS for Colombia) and am running a diff-in-diff specification. I have three years of data (2005, 2010, 2015), and tabulating my "year" variable confirms this. My "post" variable equals 1 if year == 2015. My "policy" variable equals 1 for the geographic departments in the treated group, and my difference-in-differences variable is did = post*policy.

    When I run a regression, I am including department and year fixed effects. As I also have to cluster at the department level, the power for this group goes away, which should leave three degrees of freedom, one for each year. Now there are only two, and I'm not sure why. This first appeared when I re-ran all of my code from the start of my data cleaning file through to the regressions, but I'm not sure why this would change. When I ran it last week, there were three df, which I noticed when I plotted an event study (all three years showed up, now only two show up).

    Why would this happen? Please let me know if more explanation or code is needed. I am using reghdfe; the following is the basic format: reghdfe y did $controls [pweight = wtvar], absorb(dept year) cluster(dept)

    I actually have seven different outcomes, so I am representing them just by "y". Thank you all!

  • #2
    Maya:
    as per FAQ, please post:
    1) what you typed and what Stata gave you back;
    2) an example/excerpt of your dataset via -dataex-.
    Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Note: I am
      Last edited by Maya Ward; 12 Mar 2023, 20:13.


      • #4
        Note: I am using Stata version 15.1

        Here is a sample using dataex (it wouldn't let me include all of my control variables, as several are categorical):

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(fsay_hcR fsay_lgpchR fsay_hhR fsay_famR fsay_cookR fsay_ownwageR pc1R did post policy) byte dept float year str15 caseid
        1 0 1 1 1 1  1.4320685 0 0 0 44 2005 "    00010102 02"
        1 0 0 0 0 1  -1.200748 0 0 0 44 2005 "    00010201 04"
        1 1 1 1 0 1   1.934896 0 0 0 44 2005 "    00010201 05"
        1 0 0 0 0 1  -1.200748 0 0 0 44 2005 "    00010301 02"
        1 1 1 1 1 1   2.597703 0 0 0 44 2005 "    00010301 06"
        1 1 1 1 1 1   2.597703 0 0 0 44 2005 "    00010501 02"
        1 1 1 1 0 1   1.934896 0 0 0 44 2005 "    00010601 01"
        1 0 1 1 1 1  1.4320685 0 0 0 44 2005 "    00010901 04"
        1 1 1 1 0 1   1.934896 0 0 0 44 2005 "    00011101 02"
        0 0 0 0 0 1 -1.9910043 0 0 0 44 2005 "    00020301 03"
        1 1 0 0 0 1 -.03511305 0 0 0 44 2005 "    00020401 02"
        1 1 0 0 0 1 -.03511305 0 0 0 44 2005 "    00020601 03"
        1 1 1 1 1 1   2.597703 0 0 0 44 2005 "    00020701 01"
        1 0 0 0 0 1  -1.200748 0 0 0 44 2005 "    00020701 03"
        1 0 0 1 0 1  -.3033985 0 0 0 44 2005 "    00020701 05"
        end
        label values dept dept_names
        label def dept_names 44 "La Guajira", modify
        What I typed:

        Code:
        . tab year
        
               year |      Freq.     Percent        Cum.
        ------------+-----------------------------------
               2005 |     29,849       43.18       43.18
               2010 |     22,526       32.59       75.77
               2015 |     16,753       24.23      100.00
        ------------+-----------------------------------
              Total |     69,128      100.00
        
        #delimit ;
        g post = 0;
        replace post = 1 if year == 2015 ;
        
        g policy = 0;
        // replace policy = 1 if policytype == 1;
        replace policy = 1 if
            dept == 11 | // Bogotá
            dept == 5  | // Antioquia    
            dept == 54 | // Norte de Santander, AECID only
            dept == 68 | // Santander
            dept == 50 | // Meta
            dept == 41 | // Huila
            dept == 88 | // San Andrés
            //dept == 47 | // Magdalena
            //dept == 18 | // Caquetá
            dept == 76      // Valle
            ;
        
        // difference-in-differences term, as described in #1
        g did = post*policy;
        #delimit cr
        
        reghdfe y did $controls [pweight = wtvar], absorb(dept year) cluster(dept)
        The following is the output for when y = fsay_hcR (Respondent has final say on their own healthcare):

        Code:
        . reghdfe fsay_hcR did $controls [pweight = wtvar], absorb(dept year) cluster(dept) allbaselevels
        note: current_union is probably collinear with the fixed effects (all partialled-out values are close to zero;
        >  tol = 1.0e-09)
        (MWFE estimator converged in 3 iterations)
        note: current_union omitted because of collinearity
        
        HDFE Linear regression                            Number of obs   =     39,279
        Absorbing 2 HDFE groups                           F(  15,     32) =     190.89
        Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                          R-squared       =     0.0470
                                                          Adj R-squared   =     0.0458
                                                          Within R-sq.    =     0.0290
        Number of clusters (dept)    =         33         Root MSE        =     0.4117
        
                                           (Std. Err. adjusted for 33 clusters in dept)
        -------------------------------------------------------------------------------
                      |               Robust
             fsay_hcR |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        --------------+----------------------------------------------------------------
                  did |  -.0712436   .0222167    -3.21   0.003    -.1164976   -.0259895
                  age |    .011764   .0028608     4.11   0.000     .0059367    .0175912
                 age2 |  -.0001951   .0000411    -4.74   0.000    -.0002789   -.0001113
                      |
            wealthinx |
             poorest  |          0  (base)
              poorer  |   .0428757   .0083682     5.12   0.000     .0258302    .0599211
              middle  |   .0465615   .0137567     3.38   0.002     .0185401     .074583
              richer  |   .0496003   .0161687     3.07   0.004     .0166658    .0825349
             richest  |   .0748075   .0226024     3.31   0.002     .0287679    .1208472
                      |
                urban |    .032728   .0124933     2.62   0.013     .0072799     .058176
               eduyrs |   .0105277   .0021442     4.91   0.000       .00616    .0148954
                      |
               edulvl |
        no education  |          0  (base)
             primary  |   .0222386   .0196484     1.13   0.266    -.0177838     .062261
           secondary  |   .0299738   .0231699     1.29   0.205    -.0172217    .0771692
              higher  |   .0243469   .0298719     0.82   0.421    -.0365002     .085194
                      |
              numkids |   .0069555   .0033087     2.10   0.043      .000216     .013695
               jobnow |   .0297152   .0059258     5.01   0.000     .0176447    .0417857
        current_union |          0  (omitted)
            ethnicity |   .0089527   .0035276     2.54   0.016     .0017673    .0161381
                _cons |   .3461327   .0507949     6.81   0.000      .242667    .4495985
        -------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
                dept |        33          33           0    *|
                year |         2           0           2     |
        -----------------------------------------------------+
        * = FE nested within cluster; treated as redundant for DoF computation
        Where I first noticed the issue was in the regression for the event study, code also provided below:

        Code:
        
        g year_policy = policy*year
        
        fvset base 2010 year
        fvset base 0 policy
        fvset base 2010 year_policy
        
        // fvset base 1 con_groups
        // fvset base 1 treat_groups
        
        #delimit ;
        label define coef_treat
            0 "Control"
            2005 "2005"
            2010 "2010"
            2015 "2015" ;
        label values year_policy coef_treat ;
        label var year_policy "Treatment" ;
        
        
        
        . reghdfe pc1R i.year_policy $controls [pweight = wtvar], absorb(dept year)
        >         cluster(dept) baselevels;
        note: 0bn.year_policy is probably collinear with the fixed effects (all partialled-out values are close to zer
        > o; tol = 1.0e-09)
        note: current_union is probably collinear with the fixed effects (all partialled-out values are close to zero;
        >  tol = 1.0e-09)
        (MWFE estimator converged in 3 iterations)
        note: 0.year_policy omitted because of collinearity
        note: current_union omitted because of collinearity
        
        HDFE Linear regression                            Number of obs   =     39,279
        Absorbing 2 HDFE groups                           F(  15,     32) =     170.01
        Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                          R-squared       =     0.0358
                                                          Adj R-squared   =     0.0346
                                                          Within R-sq.    =     0.0183
        Number of clusters (dept)    =         33         Root MSE        =     1.3957
        
                                           (Std. Err. adjusted for 33 clusters in dept)
        -------------------------------------------------------------------------------
                      |               Robust
                 pc1R |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        --------------+----------------------------------------------------------------
          year_policy |
             Control  |          0  (omitted)
                2010  |          0  (base)
                2015  |   .0520408    .040778     1.28   0.211    -.0310212    .1351028
                      |
                  age |   .0451996   .0077022     5.87   0.000     .0295106    .0608885
                 age2 |  -.0004721   .0001196    -3.95   0.000    -.0007157   -.0002286
                      |
            wealthinx |
             poorest  |          0  (base)
              poorer  |   .0744988    .022523     3.31   0.002     .0286209    .1203767
              middle  |    .115128   .0621139     1.85   0.073    -.0113938    .2416498
              richer  |   .0355656   .0702949     0.51   0.616    -.1076205    .1787517
             richest  |   .0043075   .0710721     0.06   0.952    -.1404616    .1490765
                      |
                urban |   .1901572     .03164     6.01   0.000     .1257087    .2546058
               eduyrs |   .0170864   .0090113     1.90   0.067     -.001269    .0354418
                      |
               edulvl |
        no education  |          0  (base)
             primary  |   .0050659   .0747045     0.07   0.946    -.1471021    .1572339
           secondary  |   .0546609   .1041734     0.52   0.603    -.1575333    .2668552
              higher  |  -.0525286   .1211991    -0.43   0.668    -.2994031    .1943459
                      |
              numkids |   .0427943   .0071832     5.96   0.000     .0281626    .0574261
               jobnow |   .0484199   .0186195     2.60   0.014     .0104933    .0863465
        current_union |          0  (omitted)
            ethnicity |   .0218879   .0120949     1.81   0.080    -.0027485    .0465244
                _cons |  -1.676471   .1570277   -10.68   0.000    -1.996326   -1.356616
        -------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
                dept |        33          33           0    *|
                year |         2           0           2     |
        -----------------------------------------------------+
        * = FE nested within cluster; treated as redundant for DoF computation
        Thanks


        • #5
          Maya:
          the -reghdfe- note tells you exactly the reason for the omission (collinearity with the fixed effects).
          In addition, your data example covers only one year, which does not allow interested listers to delve into the issue.
          That said, I'd be more concerned about the low -Within R-sq- that both regressions report.
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Mr. Lazzaro: I do see

            0bn.year_policy is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)

            which tells me why the "control" category of year_policy is omitted (it is also listed as such in the regression output), but it does not tell me why the "2005" category is not even listed in the regression. My question was really about the second part. Why would a category of a variable not even be listed or shown?


            • #7
              Update, trying to figure out my problem: the regression is not even reading in the observations from 2005, but when I do "tab year" or "tab [any variable] year", the data shows up. The observations listed in the regression output (39,279) exactly correspond to the number of observations for years 2010 and 2015. How do I get the regression to also include year = 2005 in the sample?? Did I somehow make it ignore this? How do I reverse it?
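
One way to confirm which years actually enter the estimation is to tabulate year over e(sample) right after the regression. The following is a minimal sketch on toy data, not the poster's actual data: y, x, and ctrl are hypothetical stand-ins, and built-in -regress- is used only so the example runs without installing anything (-reghdfe- also stores e(sample)).

```stata
* Toy data: three survey years, 100 observations each (hypothetical)
clear
set seed 12345
set obs 300
gen int year = 2005 + 5*mod(_n, 3)
gen x = rnormal()
gen ctrl = rnormal()

* Assumption for illustration: the control was not collected in 2005
replace ctrl = . if year == 2005
gen y = 1 + 0.5*x + rnormal()

* Observations with any missing model variable are dropped from the estimation
regress y x ctrl i.year

* Compare the full data with the estimation sample
tab year
tab year if e(sample)
```

If a year appears in the first table but not in the second, every observation from that year was excluded from the estimation.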


              • #8
                I just solved the problem! One of my control variables didn't exist for 2005, and was throwing everything off. I need to recode that. Thank you for the comment about the R-squared; when I re-run everything I will be sure to note if that is still an issue.


                • #9
                  Did I somehow make it ignore this?
                  Almost certainly. -reghdfe- has been in widespread use for a long time. If there were a bug that would affect something like this, it would almost certainly have become known (and been fixed) early on. Your problem is almost guaranteed to be due to a problem in your data.

                  How do I reverse it?
                  I would say that the most likely reason for the omission of all year 2005 observations is that there is some other variable whose value is always missing when year == 2005. Or perhaps it just happens that for each year 2005 observation there is some model variable with a missing value. Remember that in any regression, any observation with missing value for any regression variable is automatically excluded. Your example data does not exhibit any missing values, but it also does not include all of the variables in your regression. By the way, "model variable" here means every variable mentioned in the regression command, including the pweight, the fixed effects, and the outcome, as well as all of the explanatory variables. I would look into this possibility first.
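
The check described above can be sketched as follows: count the missing values of every model variable by year, so that a variable that is entirely missing in one wave stands out. This is an illustration on toy data; y, x, ctrl, and wt are hypothetical names standing in for the outcome, a regressor, a control, and the weight.

```stata
* Toy data: three survey years, 100 observations each (hypothetical)
clear
set seed 6789
set obs 300
gen int year = 2005 + 5*mod(_n, 3)
gen y = rnormal()
gen x = rnormal()
gen ctrl = rnormal()
gen wt = 1 + runiform()

* Assumption for illustration: one control is missing for the whole 2005 wave
replace ctrl = . if year == 2005

* Per-observation count of missing values across all model variables
egen nmiss = rowmiss(y x ctrl wt)
tab year nmiss

* Which variable is responsible in the wave that drops out?
foreach v of varlist y x ctrl wt {
    quietly count if missing(`v') & year == 2005
    display "`v': " r(N) " missing values in 2005"
}
```

In a real model, the variable list would need to include the outcome, every regressor, the weight, the absorbed variables, and the cluster variable.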

                  If that doesn't turn up the problem, then I would post back with a more complete -dataex- output that includes all of the regression variables, along with observations from each of the three years.

                  Another issue that may be related is your variable year_policy, which looks mis-specified. You have calculated it as year*policy. This could be an appropriate way to set up an interaction between a dichotomous variable (policy) and a continuous variable (year). But it would then be inappropriate to enter it into the regression as i.year_policy, treating it as a discrete variable. If, on the one hand, your intent is to have an interaction between dichotomous policy and continuous year, don't calculate a new variable. Just enter i.policy##c.year into the model. If, on the other hand, your intent is to treat year as discrete, then, again, don't calculate a new variable; enter i.policy##i.year into the model. (year will be omitted by -reghdfe- as it is also present as an absorbed effect--that's not a problem.)
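
As a rough illustration of the discrete-year version, the sketch below uses factor-variable notation on toy data instead of a hand-built product variable. The data, the three "treated" departments, and the effect size are all made up. Built-in -areg- is swapped in for -reghdfe- so the block runs without extra installs, which means the year main effects are estimated here rather than absorbed.

```stata
* Toy data: 10 hypothetical departments x 3 years, 30 observations per cell
clear
set seed 2023
set obs 900
gen int dept = 1 + mod(_n, 10)
gen int year = 2005 + 5*mod(floor((_n - 1)/10), 3)
gen byte policy = inlist(dept, 1, 2, 3)
gen y = 0.2*policy*(year == 2015) + rnormal()

* ib2010.year sets 2010 as the reference year for the event study.
* 1.policy is collinear with the absorbed dept effects and is expected
* to be omitted; the interaction terms are the quantities of interest.
areg y i.policy##ib2010.year, absorb(dept) vce(cluster dept)

* With -reghdfe- installed, the command from this thread would take the form:
*   reghdfe y i.policy##ib2010.year $controls [pweight = wtvar], absorb(dept year) cluster(dept)
```

The coefficients on 1.policy#2005.year and 1.policy#2015.year are then the treated-versus-control differences relative to 2010.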

                  Added: Crossed with #8, which confirms that the problem is what I suspected.
                  Last edited by Clyde Schechter; 19 Mar 2023, 17:34.
