
  • Understanding -xthdidregress ra- with covariates in Stata 18

    I am trying to understand how -xthdidregress ra- works in Stata/SE 18. I can reproduce the coefficients "by hand" without covariates. However, when I add covariate(s), my estimates no longer match. Why is this?

    A reproducible example follows, using a panel of three states from 2001 to 2005. State 1's treatment begins in 2003. The estimates do not match once I include the -jobs- covariate. (Note: I use the notation provided on p. 17 of the -xthdidregress- help file.)

    Setup
    Code:
    clear all
    
    input state year gdp post2003 treatmentGroup treated jobs
    1 2001 100 0 1 0 329
    1 2002 115 0 1 0 203
    1 2003 95 1 1 1 215
    1 2004 87 1 1 1 151
    1 2005 73 1 1 1 120
    2 2001 113 0 0 0 415
    2 2002 117 0 0 0 417
    2 2003 121 1 0 0 425
    2 2004 125 1 0 0 429
    2 2005 129 1 0 0 437
    3 2001 47 0 0 0 143
    3 2002 53 0 0 0 142
    3 2003 59 1 0 0 149
    3 2004 62 1 0 0 152
    3 2005 66 1 0 0 155
    end
    
    *Set panel
    xtset state year, yearly
    WITHOUT covariates
    Code:
    . ///*** Stata Command ***///
    
    . xthdidregress ra (gdp) (treated), group(state)
    note: variable _did_cohort, containing cohort indicators formed by treatment variable treated and group variable state, was added to the dataset.
    
    Computing ATET for each cohort and time:
    Cohort 2003 (4): .... done
    
    Treatment and time information
    
    Time variable: year
    Time interval: 2001 to 2005
    Control:       _did_cohort = 0
    Treatment:     _did_cohort > 0
    -------------------------------
                      | _did_cohort
    ------------------+------------
    Number of cohorts |           2
    ------------------+------------
    Number of obs     |
        Never treated |          10
                 2003 |           5
    -------------------------------
    
    Heterogeneous-treatment-effects regression               Number of obs    = 15
                                                             Number of panels =  3
    Estimator:       Regression adjustment
    Panel variable:  state
    Treatment level: state
    Control group:   Never treated
    
                                      (Std. err. adjusted for 3 clusters in state)
    ------------------------------------------------------------------------------
                 |               Robust
    Cohort       |       ATET   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2002  |         10   .7071068    14.14   0.000     8.614096     11.3859
           2003  |        -25   .7071068   -35.36   0.000     -26.3859    -23.6141
           2004  |      -36.5   .3535534  -103.24   0.000    -37.19295   -35.80705
           2005  |      -54.5   .3535534  -154.15   0.000    -55.19295   -53.80705
    ------------------------------------------------------------------------------
    
    .
    . ///*** Attempt to Reproduce ***///
    
    . *y_t
    . gen y_t = gdp
    
    .
    . *y_g-1
    . gen y_2002 = gdp if year==2002
    (12 missing values generated)
    
    . bysort state (y_2002): replace y_2002 = y_2002[1]
    (12 real changes made)
    
    .
    . *y_t - y_g-1
    . gen dy = y_t - y_2002
    
    .
    .
    . *m_g,t(x)
    . reg dy ib2001.year if treatmentGroup==0
    
          Source |       SS           df       MS      Number of obs   =        10
    -------------+----------------------------------   F(4, 5)         =     95.15
           Model |       380.6         4       95.15   Prob > F        =    0.0001
        Residual |           5         5           1   R-squared       =    0.9870
    -------------+----------------------------------   Adj R-squared   =    0.9767
           Total |       385.6         9  42.8444444   Root MSE        =         1
    
    ------------------------------------------------------------------------------
              dy | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2002  |          5          1     5.00   0.004     2.429418    7.570582
           2003  |         10          1    10.00   0.000     7.429418    12.57058
           2004  |       13.5          1    13.50   0.000     10.92942    16.07058
           2005  |       17.5          1    17.50   0.000     14.92942    20.07058
                 |
           _cons |         -5   .7071068    -7.07   0.001    -6.817676   -3.182324
    ------------------------------------------------------------------------------
    
    . predict mhat, xb
    
    .
    . *Predicted ATET
    . egen atet = mean(y_t - y_2002 - m) if treated==1, by(year)
    (12 missing values generated)
    
    . tab atet year
    
               |               year
          atet |      2003       2004       2005 |     Total
    -----------+---------------------------------+----------
         -54.5 |         0          0          1 |         1
         -36.5 |         0          1          0 |         1
           -25 |         1          0          0 |         1
    -----------+---------------------------------+----------
         Total |         1          1          1 |         3
    WITH -jobs- covariate
    The code below is the same, except that I include -jobs- in both -xthdidregress ra- and the regression for -m_g,t(x)-.
    Code:
    . ///*** Stata Command ***///
    
    . xthdidregress ra (gdp jobs) (treated), group(state)
    note: variable _did_cohort, containing cohort indicators formed by treatment variable treated and group variable state, was added to the dataset.
    
    Computing ATET for each cohort and time:
    Cohort 2003 (4): .... done
    
    Treatment and time information
    
    Time variable: year
    Time interval: 2001 to 2005
    Control:       _did_cohort = 0
    Treatment:     _did_cohort > 0
    -------------------------------
                      | _did_cohort
    ------------------+------------
    Number of cohorts |           2
    ------------------+------------
    Number of obs     |
        Never treated |          10
                 2003 |           5
    -------------------------------
    
    Heterogeneous-treatment-effects regression               Number of obs    = 15
                                                             Number of panels =  3
    Estimator:       Regression adjustment
    Panel variable:  state
    Treatment level: state
    Control group:   Never treated
    
                                      (Std. err. adjusted for 3 clusters in state)
    ------------------------------------------------------------------------------
                 |               Robust
    Cohort       |       ATET   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2002  |   10.36765   1.78e-15  5.8e+15   0.000     10.36765    10.36765
           2003  |  -25.55636   3.55e-15 -7.2e+15   0.000    -25.55636   -25.55636
           2004  |  -36.77818          .        .       .            .           .
           2005  |  -54.77818          .        .       .            .           .
    ------------------------------------------------------------------------------
    Note: ATET computed using covariates.
    
    .
    . ///*** Attempt to Reproduce ***///
    
    . *y_t
    . gen y_t = gdp
    
    .
    . *y_g-1
    . gen y_2002 = gdp if year==2002
    (12 missing values generated)
    
    . bysort state (y_2002): replace y_2002 = y_2002[1]
    (12 real changes made)
    
    .
    . *y_t - y_g-1
    . gen dy = y_t - y_2002
    
    .
    .
    . *m_g,t(x)
    . reg dy ib2001.year jobs if treatmentGroup==0
    
          Source |       SS           df       MS      Number of obs   =        10
    -------------+----------------------------------   F(5, 4)         =     66.56
           Model |  381.020755         5  76.2041511   Prob > F        =    0.0006
        Residual |  4.57924473         4  1.14481118   R-squared       =    0.9881
    -------------+----------------------------------   Adj R-squared   =    0.9733
           Total |       385.6         9  42.8444444   Root MSE        =      1.07
    
    ------------------------------------------------------------------------------
              dy | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           2002  |   5.000742   1.069959     4.67   0.009     2.030059    7.971425
           2003  |   10.01187   1.070138     9.36   0.001     7.040695    12.98305
           2004  |   13.51707   1.070329    12.63   0.000     10.54536    16.48878
           2005  |   17.52523   1.070768    16.37   0.000      14.5523    20.49816
                 |
            jobs |  -.0014841   .0024481    -0.61   0.577    -.0082812    .0053129
           _cons |  -4.585923   1.019275    -4.50   0.011    -7.415883   -1.755963
    ------------------------------------------------------------------------------
    
    . predict mhat, xb
    
    .
    . *Predicted ATET
    . egen atet = mean(y_t - y_2002 - m) if treated==1, by(year)
    (12 missing values generated)
    
    . tab atet year
    
               |               year
          atet |      2003       2004       2005 |     Total
    -----------+---------------------------------+----------
     -54.76121 |         0          0          1 |         1
     -36.70704 |         0          1          0 |         1
     -25.10686 |         1          0          0 |         1
    -----------+---------------------------------+----------
         Total |         1          1          1 |         3

  • #2
    Hi Noah,

    Please see the attached file, which replicates -xthdidregress- for ATET(2003, 2004). The other ATETs can be computed similarly. Two things to note: the covariate $X$ is assumed to be time-invariant, and each ATET is computed from data trimmed to a traditional 2x2 DID (two groups, two periods) rather than from one regression pooled over all years. I hope this helps.
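
    For readers without the attachment, a minimal sketch of that 2x2 logic might look like the following. It assumes the data from the Setup block are in memory. The variable names dy_2x2, x_base, and m_2x2 are illustrative and not taken from the attached file. The choice of base period (2002 for t >= 2003, and t-1 for the pre-treatment year) and the use of -jobs- measured at the base period are inferred from the estimates reported in the first post, so treat them as assumptions.

    ```stata
    // Sketch: one 2x2 regression adjustment per ATET(2003, t)
    foreach t of numlist 2002/2005 {
        // Assumed base period: g-1 = 2002 after treatment, t-1 before it
        local base = cond(`t' < 2003, `t' - 1, 2002)
        preserve
        quietly {
            // Trim to the two periods that form this 2x2 comparison
            keep if inlist(year, `base', `t')
            // Outcome change from the base period to period t
            bysort state (year): generate double dy_2x2 = gdp[2] - gdp[1]
            // Covariate held fixed at its base-period value
            bysort state (year): generate double x_base = jobs[1]
            keep if year == `t'
            // Outcome-change model fitted on never-treated states only
            regress dy_2x2 x_base if treatmentGroup == 0
            predict double m_2x2, xb
            // Average of (observed change - predicted change) over treated units
            generate double diff_2x2 = dy_2x2 - m_2x2
            summarize diff_2x2 if treatmentGroup == 1
        }
        display "ATET(2003, `t') = " %9.5f r(mean)
        restore
    }
    ```

    As a check by hand for ATET(2003, 2004): the two never-treated states have changes of 8 and 9 with 2002 values of -jobs- equal to 417 and 142, so the fitted line has slope -1/275 and predicts about 8.778 for state 1 (jobs = 203 in 2002). State 1's change is 87 - 115 = -28, and -28 - 8.778 = -36.778, which agrees with the -xthdidregress- output. With only two control observations per fit, each regression is exactly determined, so this sketch speaks to the point estimates only, not the standard errors.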
