  • Event study using reghdfe - trying to include all Years to get an event study plot

    Hi,

    I am trying to run a very basic event study of the effect of a homeownership treatment variable (rtbphs_81) on marriagerate. The data run from 1971 to 2001, with observations only once per decade. I would like to produce an event study plot and see the estimated coefficients for the pre-treatment (pre-1981) observations.

    Any help would be greatly appreciated.

    Below is an example of my data:
    Code:
    clear
    input str28 LocalAuthority int(Year rtb) double(rtbphs_81 marriagerate)
    "Adur" 1971 0 0 .6717478839674583
    "Adur" 1981 0 0 .626276880304977
    "Adur" 1991 1030 .22543226088859708 .602944264773825
    "Adur" 2001 1513 .3311446706062596 .5539222790418418
    "Allerdale" 1971 0 0 .665259539360605
    "Allerdale" 1981 15 .0012472975220355895 .6304059641155547
    "Allerdale" 1991 1385 .11516713786795277 .6144204677606926
    "Allerdale" 2001 1887 .15691002827207717 .5820694925750728
    "Alnwick" 1971 0 0 .6434144314666913
    "Alnwick" 1981 0 0 .6197158469945355
    "Alnwick" 1991 978 .27580372250423013 .6285081240768094
    "Alnwick" 2001 1452 .40947546531302875 .6109930752282027
    "Amber Valley" 1971 0 0 .7119203696088628
    "Amber Valley" 1981 11 .0012402751155710903 .6645791329225847
    "Amber Valley" 1991 2282 .2573007103393844 .6233453142807048
    "Amber Valley" 2001 3094 .34885556432517756 .578904159247689
    Code:
    reghdfe marriagerate c.rtbphs_81#i.Year, a(lacode Year) cluster(lacode)
    This is the output that I receive from Stata:
    Code:
    reghdfe marriagerate c.rtbphs_81#i.Year, a(lacode Year) cluster(lacode)
    (MWFE estimator converged in 2 iterations)
    note: 1971b.Year#c.rtbphs_81 omitted because of collinearity
    
    HDFE Linear regression Number of obs = 1,416
    Absorbing 2 HDFE groups F( 3, 353) = 6.22
    Statistics robust to heteroskedasticity Prob > F = 0.0004
    R-squared = 0.9380
    Adj R-squared = 0.9169
    Within R-sq. = 0.0210
    Number of clusters (lacode) = 354 Root MSE = 0.0200
    
    (Std. err. adjusted for 354 clusters in lacode)
    ----------------------------------------------------------------------------------
    | Robust
    marriagerate | Coefficient std. err. t P>|t| [95% conf. interval]
    -----------------+----------------------------------------------------------------
    Year#c.rtbphs_81 |
    1971 | 0 (omitted)
    1981 | -.3906885 .4275486 -0.91 0.361 -1.231551 .4501744
    1991 | .0585169 .0170253 3.44 0.001 .0250332 .0920006
    2001 | .0503912 .0250985 2.01 0.045 .0010299 .0997526
    |
    _cons | .6067599 .002923 207.58 0.000 .6010112 .6125086
    ----------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
    Absorbed FE | Categories - Redundant = Num. Coefs |
    -------------+---------------------------------------|
    lacode | 354 354 0 *|
    Year | 4 0 4 |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    Q: How can I stop 1971 from being omitted and instead set 1981 as the base year?

    When I try this code:
    Code:
    reghdfe marriagerate c.rtbphs_81#ib1981.Year, a(lacode Year) cluster(lacode)
    I get the same output:
    Code:
    note: 1971.Year#c.rtbphs_81 omitted because of collinearity
    
    HDFE Linear regression Number of obs = 1,416
    Absorbing 2 HDFE groups F( 3, 353) = 6.22
    Statistics robust to heteroskedasticity Prob > F = 0.0004
    R-squared = 0.9380
    Adj R-squared = 0.9169
    Within R-sq. = 0.0210
    Number of clusters (lacode) = 354 Root MSE = 0.0200
    
    (Std. err. adjusted for 354 clusters in lacode)
    ----------------------------------------------------------------------------------
    | Robust
    marriagerate | Coefficient std. err. t P>|t| [95% conf. interval]
    -----------------+----------------------------------------------------------------
    Year#c.rtbphs_81 |
    1971 | 0 (omitted)
    1981 | -.3906885 .4275486 -0.91 0.361 -1.231551 .4501744
    1991 | .0585169 .0170253 3.44 0.001 .0250332 .0920006
    2001 | .0503912 .0250985 2.01 0.045 .0010299 .0997526
    |
    _cons | .6067599 .002923 207.58 0.000 .6010112 .6125086
    ----------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
    Absorbed FE | Categories - Redundant = Num. Coefs |
    -------------+---------------------------------------|
    lacode | 354 354 0 *|
    Year | 4 0 4 |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    Last edited by Neil Patel; 26 Mar 2023, 18:46.

  • #2
    Neil:
    did you give -xtreg,fe- a shot?
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Carlo:
      Yes, I have tried that, even though my dissertation tutor has advised me against using xtreg. This is what I tried, followed by the output:

      Code:
      xtreg marriagerate c.rtbphs_81#i.Year, fe vce(cluster lacode)
      It yields very different results to what I have posted above:
      Code:
      xtreg marriagerate c.rtbphs_81#i.Year, fe vce(cluster lacode)
      note: 1971b.Year#c.rtbphs_81 omitted because of collinearity.
      
      Fixed-effects (within) regression               Number of obs     =      1,416
      Group variable: lacode                          Number of groups  =        354
      
      R-squared:                                      Obs per group:
           Within  = 0.6555                                         min =          4
           Between = 0.0177                                         avg =        4.0
           Overall = 0.2899                                         max =          4
      
                                                      F(3,353)          =     342.26
      corr(u_i, Xb) = -0.1096                         Prob > F          =     0.0000
      
                                         (Std. err. adjusted for 354 clusters in lacode)
      ----------------------------------------------------------------------------------
                       |               Robust
          marriagerate | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -----------------+----------------------------------------------------------------
      Year#c.rtbphs_81 |
                 1971  |          0  (omitted)
                 1981  |  -8.868833   1.372009    -6.46   0.000    -11.56717   -6.170493
                 1991  |  -.2245455   .0095392   -23.54   0.000    -.2433063   -.2057847
                 2001  |  -.2961476    .010319   -28.70   0.000     -.316442   -.2758533
                       |
                 _cons |   .6519498   .0014377   453.46   0.000     .6491222    .6547773
      -----------------+----------------------------------------------------------------
               sigma_u |  .05057533
               sigma_e |  .03429548
                   rho |  .68501179   (fraction of variance due to u_i)
      ----------------------------------------------------------------------------------
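      One likely reason for the gap, offered as a sketch rather than a definitive diagnosis: -xtreg, fe- absorbs only the panel (lacode) effects, whereas the -reghdfe- call in the first post also absorbs Year. Adding the year dummies explicitly should bring the point estimates in line with -reghdfe-. The -xtset- line assumes that lacode is the numeric panel identifier and Year the time variable.
      Code:
      * Declare the panel structure (assumption: lacode is numeric, Year is the time variable)
      xtset lacode Year
      
      * Two-way fixed effects: panel effects via fe, year effects via i.Year
      xtreg marriagerate i.Year c.rtbphs_81#i.Year, fe vce(cluster lacode)
      The 1971 interaction would still be reported as omitted here, since adding year dummies does not change the values of rtbphs_81 in 1971.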



      • #4
        I have also managed to add some data to the dataset, which now goes back to 1961:

        Code:
        "Adur"                         1961 .6828842206402586                     0
        "Adur"                         1971 .6717478839674583                     0
        "Adur"                         1981  .626276880304977                     0
        "Adur"                         1991  .602944264773825    .22543226088859708
        "Adur"                         2001 .5539222790418418     .3311446706062596
        "Allerdale"                    1961 .6574736172390423                     0
        "Allerdale"                    1971  .665259539360605                     0
        "Allerdale"                    1981 .6304059641155547  .0012472975220355895
        "Allerdale"                    1991 .6144204677606926    .11516713786795277
        "Allerdale"                    2001 .5820694925750728    .15691002827207717
        "Alnwick"                      1961 .6471748564415026                     0
        "Alnwick"                      1971 .6434144314666913                     0
        "Alnwick"                      1981 .6197158469945355                     0
        "Alnwick"                      1991 .6285081240768094    .27580372250423013
        "Alnwick"                      2001 .6109930752282027    .40947546531302875
        I tried running reghdfe again to see whether it would now include 1971 interacted with the treatment variable (rtbphs81). However, it still omits the 1971 coefficient, and now the 1961 coefficient as well, which is confusing me:

        Code:
        reghdfe marriagerate c.rtbphs81#i.year, a(lacode year) cluster(lacode)
        (MWFE estimator converged in 2 iterations)
        note: 1961b.year#c.rtbphs81 omitted because of collinearity
        note: 1971.year#c.rtbphs81 omitted because of collinearity
        
        HDFE Linear regression                            Number of obs   =      1,770
        Absorbing 2 HDFE groups                           F(   3,    353) =       5.73
        Statistics robust to heteroskedasticity           Prob > F        =     0.0008
                                                          R-squared       =     0.9051
                                                          Adj R-squared   =     0.8808
                                                          Within R-sq.    =     0.0202
        Number of clusters (lacode)  =        354         Root MSE        =     0.0236
        
                                          (Std. err. adjusted for 354 clusters in lacode)
        ---------------------------------------------------------------------------------
                        |               Robust
           marriagerate | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
        year#c.rtbphs81 |
                  1961  |          0  (omitted)
                  1971  |          0  (omitted)
                  1981  |  -.2481771   .4121412    -0.60   0.547    -1.058738     .562384
                  1991  |   .0746717   .0245354     3.04   0.003     .0264178    .1229255
                  2001  |   .0610035   .0308826     1.98   0.049     .0002665    .1217404
                        |
                  _cons |   .6186601   .0030706   201.48   0.000     .6126211    .6246992
        ---------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
              lacode |       354         354           0    *|
                year |         5           0           5     |
        -----------------------------------------------------+
        * = FE nested within cluster; treated as redundant for DoF computation
        Is there any reason why this may be occurring?

        Thanks,
        Neil



        • #5
          Neil:
          I fail to get why your supervisor advised you against -xtreg, fe- when what you're doing is totally in line with -xtreg, fe- capabilities. Obviously, a realpolitik approach on your side is called for in this instance (and unfortunately so).
          That said:
          1) I gave your 1st code another shot, and the reason why -1971- is unavoidably omitted probably rests on the fact that both rtb==0 & rtbphs_81==0 in 1971, so the 1971 interaction term is zero for every observation. As you know, the fixed-effects estimator wipes out all the within-panel time-invariant variables:
          Code:
          . tab Year if rtb==0 & rtbphs_81==0
          
                 Year |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                 1971 |          4       66.67       66.67
                 1981 |          2       33.33      100.00
          ------------+-----------------------------------
                Total |          6      100.00
          
          .
          2) I do not have the time at the moment to make your second code palatable for Stata, but at first glance it seems that -1971- and -1961- suffer from the same disease already diagnosed in 1).
          Kind regards,
          Carlo
          (Stata 18.0 SE)
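
          The diagnosis in 1) can be checked directly on the full dataset. A minimal sketch, assuming the variable names from the first post (rtbphs_81, Year): if the minimum and maximum of rtbphs_81 within a year are both 0, the interaction for that year is a column of zeros, and it is dropped whichever base level (i.Year or ib1981.Year) is requested.
          Code:
          * Range of the treatment variable within each census year
          tabstat rtbphs_81, by(Year) statistics(min mean max n)
          
          * Count the 1971 observations with a non-zero, non-missing treatment value
          count if Year == 1971 & rtbphs_81 != 0 & !missing(rtbphs_81)
          A count of 0 would be consistent with the collinearity note in the -reghdfe- output.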



          • #6
            Carlo:

            Thank you very much for pointing that out. RTB refers to a homeownership policy that was initiated in 1981, so it didn't exist in 1971, which is why all 1971 observations are coded as zero. However, even in 1981, at the policy's inception, treatment was very limited and, in many Local Authorities, zero. Is there a way I should code this differently to get an estimate of the coefficient on 1971?

            Thank you



            • #7
              Neil:
              take a look at -help didregress- if you also have a treated group and a control group (that is, those who benefitted from the policy vs those who did not).
              If this is not the case, -fe- (no matter the way it is coded) cannot help you out and you should consider -re-, which, in turn, has its own drawbacks (the further restriction of zero correlation between the -u- error and the vector of predictors often does not hold).
              Kind regards,
              Carlo
              (Stata 18.0 SE)
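
              One way to recode the treatment for an event study, sketched under stated assumptions: replace the time-varying rtbphs_81 with a time-invariant exposure measure for each Local Authority and interact that with the year dummies, using 1981 as the base. Taking each authority's 1991 value of rtbphs_81 as the exposure is an assumption made purely for illustration, and rtb_exposure is a hypothetical variable name. -coefplot- is a community-contributed command (ssc install coefplot).
              Code:
              * Time-invariant exposure: each authority's 1991 value of rtbphs_81 (illustrative choice)
              egen double rtb_exposure = max(cond(Year == 1991, rtbphs_81, .)), by(lacode)
              
              * Event-study regression; ib1981 requests 1981 as the omitted base year
              reghdfe marriagerate c.rtb_exposure#ib1981.Year, absorb(lacode Year) vce(cluster lacode)
              
              * Plot the year-specific coefficients with their confidence intervals
              coefplot, vertical drop(_cons) yline(0)
              Because the exposure is non-zero in every year for treated authorities, the 1971 interaction is no longer a column of zeros, so its coefficient should be estimable and can serve as the pre-treatment check relative to 1981.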
