SDID: Obtain the coefficients and Std. Err. of covariates

Wenhan Yan

Join Date: Oct 2020
Posts: 56

15 Jul 2024, 09:44

Referring to the question "How to discern the coefficients and their p-values of covariates using sdid": I have the same issue when using sdid in Stata, and I tried to follow Prof. Daniel PV's method to obtain the coefficients and standard errors of the covariates. I am using "optimized", not "projected", as the covariate type. The coefficients I get from e(beta) are very different from the ones reghdfe reports. Is this because of a perfect multicollinearity issue, which ultimately causes the coefficients in e(beta) not to align with the results from reghdfe?

Here is my code and result:

Code:

sdid l_wo state date treat, vce(bootstrap) method(sdid) covariates(l_pop l_black  l_unemp l_inc l_HS l_college lgdp, optimized) seed(1234)

Synthetic Difference-in-Differences Estimator

-----------------------------------------------------------------------------
        l_wo |     ATT     Std. Err.     t      P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       treat |  -0.03294    0.00791    -4.16    0.000    -0.04845    -0.01743
-----------------------------------------------------------------------------
95% CIs and p-values are based on Large-Sample approximations.
Refer to Arkhangelsky et al., (2020) for theoretical derivations.

Code:

mat list e(beta)

e(beta)[8,1]
                   c1
    l_pop  -.00045378
  l_black   -.0013566
  l_unemp  -.00082513
    l_inc  -.00287189
     l_HS  -.00229133
l_college    .0056747
     lgdp  -.00051779
 adoption         733

Code:

egen W = mean(treat), by(state)

reghdfe l_wo l_pop l_black  l_unemp l_inc l_HS l_college lgdp if W == 0 , abs(state date) cluster(state)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      2,256
Absorbing 2 HDFE groups                           F(   7,     46) =       1.34
Statistics robust to heteroskedasticity           Prob > F        =     0.2554
                                                  R-squared       =     0.7907
                                                  Adj R-squared   =     0.7810
                                                  Within R-sq.    =     0.0458
Number of clusters (state)   =         47         Root MSE        =     0.0560

                                 (Std. err. adjusted for 47 clusters in state)
------------------------------------------------------------------------------
             |               Robust
        l_wo | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       l_pop |    -.23401   .8383909    -0.28   0.781    -1.921603    1.453583
     l_black |  -8.711286   5.632934    -1.55   0.129    -20.04979    2.627222
     l_unemp |  -.0444142   .0375408    -1.18   0.243    -.1199799    .0311515
       l_inc |   .1109817   .2650816     0.42   0.677    -.4225999    .6445633
        l_HS |  -.2006585    .314647    -0.64   0.527      -.83401    .4326931
   l_college |   .0534295   .1830538     0.29   0.772    -.3150387    .4218977
        lgdp |  -.3214158   .2657447    -1.21   0.233    -.8563321    .2135006
       _cons |   8.264076   11.40091     0.72   0.472    -14.68476    31.21291
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       state |        47          47           0    *|
        date |        48           1          47     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation


George Ford

Join Date: Aug 2014

Posts: 3128
#2

16 Jul 2024, 09:11

looks like you have only 1 control in the final estimates. I wonder if you should restrict the sample to it?
Wenhan Yan

Join Date: Oct 2020

Posts: 56
#3

17 Jul 2024, 08:52

Originally posted by George Ford:

looks like you have only 1 control in the final estimates. I wonder if you should restrict the sample to it?

Hello George,

Thanks for the reply. May I ask what you mean by my having only 1 control in the final estimates? Do you mean the control group or a control variable? Where does the result show that? According to Daniel PV, to obtain the p-values for the betas from reghdfe, I need to run the regression only on the control units (those untreated over the whole period). Is that what you mean by only 1 control in the final estimates?

The following betas are obtained when optimized is changed to projected:

Code:

mat list e(beta)

e(beta)[7,1]
                   c1
    l_pop  -.23400999
  l_black  -8.7112855
  l_unemp  -.04441418
    l_inc    .1109817
     l_HS  -.20065846
l_college   .05342949
     lgdp  -.32141575

I would also like to ask Prof. Damian Clarke about this question.
George Ford

Join Date: Aug 2014

Posts: 3128
#4

17 Jul 2024, 10:18

Daniel PV knows best. It looks like you have the right result now. I was thinking optimized might be the problem.
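
For anyone who wants to confirm that match side by side, here is a rough sketch. It assumes sdid and reghdfe are installed, that W has already been created as in post #1, and that the code is run from a do-file (because of the /// line continuations). The matrix name b_sdid is just an illustrative choice.

```stata
* Sketch only: compare sdid's projected betas with reghdfe on never-treated units.
* Assumes W was created earlier with: egen W = mean(treat), by(state)
sdid l_wo state date treat, vce(bootstrap) method(sdid) ///
    covariates(l_pop l_black l_unemp l_inc l_HS l_college lgdp, projected) seed(1234)
matrix b_sdid = e(beta)

reghdfe l_wo l_pop l_black l_unemp l_inc l_HS l_college lgdp if W == 0, ///
    abs(state date) cluster(state)

* Print each covariate's beta from sdid next to the reghdfe coefficient
foreach x in l_pop l_black l_unemp l_inc l_HS l_college lgdp {
    display %12s "`x'" %14.7f el(b_sdid, rownumb(b_sdid, "`x'"), 1) %14.7f _b[`x']
}
```

If the two columns agree, the reghdfe standard errors and p-values can be read as belonging to the projected betas.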

Wenhan Yan

Join Date: Oct 2020
Posts: 56

18 Jul 2024, 01:24

Originally posted by George Ford:

Daniel PV knows best. It looks like you have the right result now. I was thinking optimized might be the problem.

Unfortunately, Daniel PV's last activity on Statalist was on 26 Oct 2023. But I wonder why optimized would be the problem. Which one should we use, given that the two methods yield different results?

Code:

sdid l_wo state date treat, vce(bootstrap) method(sdid) covariates(l_pop l_black  l_unemp l_inc l_HS l_college lgdp , optimized) seed(1234)
Bootstrap replications (50). This may take some time.
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................     50


Synthetic Difference-in-Differences Estimator

-----------------------------------------------------------------------------
        l_wo |     ATT     Std. Err.     t      P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       treat |  -0.03294    0.00791    -4.16    0.000    -0.04845    -0.01743
-----------------------------------------------------------------------------
95% CIs and p-values are based on Large-Sample approximations.
Refer to Arkhangelsky et al., (2020) for theoretical derivations.

Code:

sdid l_wo state date treat, vce(bootstrap) method(sdid) covariates(l_pop l_black  l_unemp l_inc l_HS l_college lgdp , projected) seed(1234)
Bootstrap replications (50). This may take some time.
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................     50


Synthetic Difference-in-Differences Estimator

-----------------------------------------------------------------------------
        l_wo |     ATT     Std. Err.     t      P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       treat |  -0.02644    0.00931    -2.84    0.005    -0.04470    -0.00819
-----------------------------------------------------------------------------
95% CIs and p-values are based on Large-Sample approximations.
Refer to Arkhangelsky et al., (2020) for theoretical derivations.


George Ford

Join Date: Aug 2014

Posts: 3128
#6

18 Jul 2024, 07:41

Optimized isn't a problem, it just uses residuals for the analysis so the coef won't match up. It's presumably possible to get the t-stats on the coefs from the optimized method, but you'd have to use residuals from another model. It may be discernable from the ado file.

The help file discusses this issue and provides some citations. I'd read them and decide what you think. The key is the Kranz paper, which shows that "optimized" can cause problems, and that projected is better (at least under some conditions), but offers no theory as to why.

If including covariates, then at this point (until the issue is resolved) it might make sense to report both.
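
A minimal sketch of reporting both, reusing the command from the earlier posts. It assumes sdid stores the point estimate and its standard error in e(ATT) and e(se); please confirm with ereturn list after running sdid. The local macro names are only illustrative, and the /// continuations require a do-file.

```stata
* Sketch: run sdid under both covariate types and list the ATTs together.
* e(ATT) and e(se) are assumed to be stored by sdid; check with -ereturn list-.
foreach type in optimized projected {
    sdid l_wo state date treat, vce(bootstrap) method(sdid) ///
        covariates(l_pop l_black l_unemp l_inc l_HS l_college lgdp, `type') seed(1234)
    local att_`type' = e(ATT)
    local se_`type'  = e(se)
}

display "optimized: ATT = " %8.5f `att_optimized' "  SE = " %8.5f `se_optimized'
display "projected: ATT = " %8.5f `att_projected' "  SE = " %8.5f `se_projected'
```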
Wenhan Yan

Join Date: Oct 2020

Posts: 56
#7

18 Jul 2024, 09:36

Originally posted by George Ford:

Optimized isn't a problem, it just uses residuals for the analysis so the coef won't match up. It's presumably possible to get the t-stats on the coefs from the optimized method, but you'd have to use residuals from another model. It may be discernable from the ado file.

The help file discusses this issue and provides some citations. I'd read them and decide what you think. The key is the Kranz paper, which shows that "optimized" can cause problems, and that projected is better (at least under some conditions), but offers no theory as to why.

If including covariates, then at this point (until the issue is resolved) it might make sense to report both.

Hello George,

Thank you for your reply and the explanation. I now see why the method using reghdfe would work.

I also read the help file, and this is why I am struggling a bit with the method. If I can obtain the residuals, should I still use reghdfe to get the p-values? I know the betas, so I can compute the residuals as Y_res = Y - beta*X_it.
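
As a rough sketch, the residual formula above could be coded like this. It assumes it is run directly after the sdid call with the optimized option, so that e(beta) holds the optimized betas; the names b and l_wo_res are made up for illustration. It only builds the residualized outcome and does not by itself give p-values for the betas.

```stata
* Sketch: residualize the outcome with the betas from the optimized run.
* Run directly after: sdid ..., covariates(..., optimized)
matrix b = e(beta)
generate double l_wo_res = l_wo

foreach x in l_pop l_black l_unemp l_inc l_HS l_college lgdp {
    * look up each beta by row name, so the extra "adoption" row is ignored
    replace l_wo_res = l_wo_res - el(b, rownumb(b, "`x'"), 1) * `x'
}
```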

However, the help file mentions potential issues with the “optimized” method, even though it doesn’t specify the reasons. Since there is ready-made code to obtain the p-values, I think I’ll stick with the “projected” method for now, given that the issue remains unresolved.