Heterogeneous effects csdid

Henry Strawforrd

Join Date: Sep 2021

Posts: 228
#1


17 Oct 2022, 08:38

Is there a way to test for treatment effect heterogeneity by covariate when using csdid? Say we have a binary variable gender and we want to test H0: ATT_female > ATT_male?

PS: csdid is an amazing user-written command by Fernando Rios-Avila https://friosavila.github.io/playing...ain_csdid.html
Nick Barton

Join Date: Apr 2015

Posts: 19
#2

06 Dec 2022, 07:23

While I don't have a solution for you, this question was raised on Twitter and received responses from the authors of the Stata command as well as of the paper itself (plus the author of did_imputation as a bonus). See here (not sure if I am permitted to post these links, but I hope they are helpful): https://twitter.com/mobariz_ahmad/status/1511085669883465746
This thread includes a link to some R code that calculates the difference in ATTs between subgroups and then bootstraps the standard error of that difference: https://www.dropbox.com/s/tbzz7mqwe6...ity_analysis.R

If someone wants to adapt these for Stata, I guess we would both be grateful!
FernandoRios

Join Date: Apr 2014

Posts: 2494
#3

06 Dec 2022, 07:45

Hi there,
As of right now, that isn't fully possible.
There are two options, though. You can use CSDID to obtain the ATTs for men and women separately and then, assuming both samples are independent, test for the difference between them.
You could also get the RIFs for each one (for the aggregate ATTs) and compare them directly.
CSDID (specifically csdid_stats) now has a new option called -save-.
This allows you to "save" the RIF for any of the aggregations you ask for.
So you can simply take those RIFs and then estimate variances and covariances, with clusters, using mean or the other companion command, csdid_rif.
HTH
Nick Barton

Join Date: Apr 2015

Posts: 19
#4

17 Feb 2023, 09:11

Sorry for the very slow reply here Fernando and thank you for your input.
When you talk about testing the difference between both groups (you mention Men and Women), how would you approach testing the equality of the ATT in the two subsamples?
I see two options when estimating the ATTs separately but am not really clear which would make more sense:
1) taking the difference between the two ATTs by writing a small program and then bootstrapping the difference between ATTs for a standard error, or
2) a simpler approach with a Z-statistic based on the ATTs and standard errors in each estimation (see Clogg et al, 1995 and Paternoster et al., 1998). This would ignore any covariance between the two estimates but I don't think that "sureg" can be used in this setting for the ATT.
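For reference, the z-statistic in option 2 (in the form given by Clogg et al., 1995 and Paternoster et al., 1998) can be written as follows, with subscripts f and m for the female and male subsamples. Option 2 amounts to setting the covariance term to zero, which is only justified when the two estimates are independent (e.g. non-overlapping subsamples with no clusters spanning both):

```latex
Z = \frac{\widehat{\mathrm{ATT}}_{f} - \widehat{\mathrm{ATT}}_{m}}
         {\sqrt{\widehat{\mathrm{SE}}_{f}^{2} + \widehat{\mathrm{SE}}_{m}^{2}
                - 2\,\widehat{\mathrm{Cov}}\big(\widehat{\mathrm{ATT}}_{f},\, \widehat{\mathrm{ATT}}_{m}\big)}}
\;\overset{a}{\sim}\; N(0,1)
\quad \text{under } H_0:\ \mathrm{ATT}_{f} = \mathrm{ATT}_{m}
```

Its square is the corresponding chi2(1) Wald statistic, i.e. what a test of equality after a joint estimation would report.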
In my case I actually have a 2x2 design, so I am using the "drdid" command (from SSC), but as far as I can tell the same principles should hold: I can save the RIF for the ATT using the "stub" option. Is it reasonable to compare the mean RIF for males and females, given that this is a categorical variable and the RIF is intended for small changes in the distribution (as you say in your 2020 Stata Journal paper on RIFs)? The easiest comparison would be via the ttest command "by" a dummy for female.

FernandoRios

Join Date: Apr 2014
Posts: 2494

17 Feb 2023, 09:33

Hi Nick
Actually, if you are using DRDID, this is far easier (although you will still need a csdid subcommand).
Here is an example:

Code:

use https://friosavila.github.io/playingwithstata/drdid/lalonde.dta, clear
keep if treated==0 | sample==2
** Say you estimate ATT for Black and Nonblack
drdid re age educ married nodegree hisp re74 if black==0, ivar(id) time(year) tr(experimental) stub(nb)
drdid re age educ married nodegree hisp re74 if black==1, ivar(id) time(year) tr(experimental) stub(b)
** Then you can use the RIFs to compare the effects
csdid_rif nbatt batt

------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       nbatt |  -1002.956   574.7723    -1.74   0.081    -2129.489    123.5773
        batt |  -237.7852    409.969    -0.58   0.562     -1041.31    565.7392
------------------------------------------------------------------------------

This is similar to the mean command, but it follows the rules behind CSDID to pool results and handle missing values.
From here, just test:
test nbatt=batt

 ( 1)  nbatt - batt = 0

           chi2(  1) =    1.17
         Prob > chi2 =    0.2784
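Because the two subsamples do not overlap, the covariance between the two ATTs should be zero here, so this Wald test ought to coincide with the simple independent-samples z-test (option 2 in post #4). A quick check using only the estimates and standard errors printed above (a sketch; the scalar names are arbitrary):

```stata
* Point estimates and standard errors as reported by csdid_rif above
scalar b_nb  = -1002.956
scalar se_nb = 574.7723
scalar b_b   = -237.7852
scalar se_b  = 409.969

* z-statistic for the difference between two independent estimates
* (covariance assumed zero because the subsamples do not overlap)
scalar z_diff = (scalar(b_nb) - scalar(b_b)) / sqrt(scalar(se_nb)^2 + scalar(se_b)^2)
scalar p_diff = 2 * normal(-abs(scalar(z_diff)))

display "z = " %7.4f scalar(z_diff) "   z^2 = " %7.4f scalar(z_diff)^2 "   p = " %6.4f scalar(p_diff)
```

Squaring z gives about 1.17 with p of about 0.278, matching the chi2(1) reported by test. When clusters contain observations from both groups (as in the clustering-by-age example later in the thread), the covariance need not be zero and the two approaches need not agree.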

Hope this helps

Also, in this case the RIF works perfectly, because it is estimated separately for blacks and nonblacks. Or, in your case, for women and men.

Where the RIF fails is when you estimate the RIF for everyone and then test for differences across groups.
For example:

Code:

** for everyone
drdid re age educ married nodegree hisp re74 black , ivar(id) time(year) tr(experimental) stub(nbn)


. tabstat nbatt batt nbnatt, by(black)

Summary statistics: Mean
Group variable: black 

   black |     nbatt      batt    nbnatt
---------+------------------------------
       0 | -1002.956         . -595.4293
       1 |         . -237.7852  752.8134
---------+------------------------------
   Total | -1002.956 -237.7852 -428.4786
----------------------------------------

As you can see, the effect differs between the two groups and is more positive for blacks than for nonblacks. But nbnatt suggests that blacks actually benefit from the program, which is not true in this case.
This is the pitfall of trying to use an "unconditional" RIF and estimate separate means across categorical variables.

The first approach, however, is correct, because you are estimating "conditional" RIFS.

HTH
Fernando

Last edited by FernandoRios; 17 Feb 2023, 09:39.


Nick Barton

Join Date: Apr 2015
Posts: 19

21 Feb 2023, 08:21

Dear Fernando,
thanks for your response, which is incredibly helpful. I have two follow up questions:

1) This one is more for the record in case anyone else gets confused when trying to replicate your code above. Did you run the "keep" command when testing? I get different results and think your results are from the full sample rather than restricted.

Code:

. csdid_rif nbatt batt
------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       nbatt |  -696.5067   678.1548    -1.03   0.304    -2025.666    632.6523
        batt |  -933.8973   468.3633    -1.99   0.046    -1851.872   -15.92217
------------------------------------------------------------------------------

2) If I were to run an estimation with clustered standard errors (since treatment is assigned at the village level in my case), do I also need to include clustering in my csdid_rif command? I assume yes, since the saved RIF is for an ATT and does not keep a record of the standard error of the ATT from the two previous estimations. Just for the sake of example, I could add clustered standard errors by age to your example above:

Code:

drdid re age educ married nodegree hisp re74 if black==0, ivar(id) time(year) tr(experimental) stub(nbc) cluster(age) replace
drdid re age educ married nodegree hisp re74 if black==1, ivar(id) time(year) tr(experimental) stub(bc) cluster(age) replace
csdid_rif nbcatt bcatt // incorrect: gives the same standard errors as if I had not clustered
------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       nbatt |  -696.5067   678.1548    -1.03   0.304    -2025.666    632.6523
        batt |  -933.8973   468.3633    -1.99   0.046    -1851.872   -15.92217
------------------------------------------------------------------------------

csdid_rif nbcatt bcatt, cluster(age)
                                   (Std. err. adjusted for 40 clusters in age)
------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      nbcatt |  -696.5067   489.2556    -1.42   0.155     -1655.43    262.4165
       bcatt |  -933.8973   409.5808    -2.28   0.023    -1736.661   -131.1338
------------------------------------------------------------------------------

I think the latter is therefore correct, but would be glad of your expert opinion.
Thanks so much for your help!
Nick


FernandoRios

Join Date: Apr 2014

Posts: 2494
#7

21 Feb 2023, 08:30

Hi Nick
Yes, I did! Or at least I intended to. That was the plan, but I just copied it here rather than running it on my computer.
Thank you for pointing that out!

For the second point, you are also correct! csdid_rif allows you to do so.
In fact, if you are interested, you could even use wild bootstrap standard errors (wboot), which include clusters.
Perhaps at some point I will go back to csdid_rif and also let the command "FIX" the RIFs, so you could estimate the difference in RIFs directly. There would then be no further need for testing, since that would be included.

Hope this helps
Fernando
Nick Barton

Join Date: Apr 2015

Posts: 19
#8

22 Feb 2023, 17:06

Thanks again Fernando!

Moving away from the original question in a DID context, I wonder whether it is possible to do something similar when estimating treatment effects in a cross section with IPWRA by subgroup. Is it possible to calculate the RIF for the ATT from IPWRA? I have looked at the egen rifvar function after estimating the ATT with teffects ipwra, and also considered rifhdreg with the rwlogit option, but could not work out how to obtain a RIF for the ATT.
FernandoRios

Join Date: Apr 2014

Posts: 2494
#9

22 Feb 2023, 19:56

Yes, it's possible, but not easy!
Including an IPWRA estimator is on my to-do list. I just haven't done it yet, because it requires a bit more math than a quick sit-down would allow.
However, there is (I believe) the command dstat by Ben Jann, which can also estimate RIFs for a given data structure. Check it out: you can produce the RIFs as needed, then use csdid_rif to do the aggregations and compare the results!
Hope this was helpful.
Best wishes
Fernando
Ben Jann

Join Date: Sep 2014

Posts: 269
#10

24 Feb 2023, 02:38

The influence function of IPWRA is discussed in https://ideas.repec.org/p/bss/wpaper/35.html (Section 3.4).
ben
Ben Jann

Join Date: Sep 2014

Posts: 269
#11

24 Feb 2023, 02:51

dstat (from SSC) can be used to obtain the influence function (or the RIF) of the IPW estimate of any supported statistic; see the balance() option. Example:

Code:

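sysuse nlsw88, clear
* mean wage by union status, balancing covariates to the union==1 group;
* one RIF variable is generated per group (RIF1, RIF2)
dstat (mean) wage, over(union) balance(grade hours ttl_exp tenure, ref(1)) rif(RIF*)
* ATET from the group-specific RIFs
gen double ATET = RIF2 - RIF1
mean ATET
* SE of the ATET, rescaled by sqrt((N-1)/N)
display %9.0g _se[ATET] * sqrt((e(N)-1) / e(N))
* compare with the official IPW estimator
teffects ipw (wage) (union grade hours ttl_exp tenure), atet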

However, dstat has no support for RA or IPWRA.
ben

Nick Barton

Join Date: Apr 2015
Posts: 19

#12

24 Feb 2023, 06:29

Dear Ben, thank you very much. I had found that and started to try and implement it using your Mata code and adding a line such that the vector for the IF_ATT is stored as a variable.

Code:

keep if year==2022 & female==0 //to obtain my subsample of interest

mata
 Dnm = "treat"; Xnm = "$controls" 
 Ynm = "outcome" ; Znm = "$controls" 
 N = st_nobs()
 D = st_data(., Dnm); X = st_data(., Xnm), J(N, 1, 1)
 Y = st_data(., Ynm); Z = st_data(., Znm), J(N, 1, 1)
 
 // estimate logit and create weights
 stata("quietly logit " + Dnm + " " + Xnm ) 
 p = invlogit(X * st_matrix("e(b)")')
 w0 = p :/ (1 :- p) :* !D
 st_store(., st_addvar("double", "w0"), w0)
 // estimate regression model
 stata("quietly regress " + Ynm + " " + Znm + " if " + Dnm + "==0 [iw=w0]") 
 Zg0 = Z * st_matrix("e(b)")'
 // compute IF for eta01
 h1 = X :* (D - p)
 G11inv = invsym(cross(X, p :* (1 :- p), X) / N)
 h2 = Z :* w0 :* (Y :- Zg0)
 G21 = cross(-h2, X) / N
 G22inv = invsym(cross(Z, w0, Z) / N)
 eta01 = mean(Zg0, D)
 h3 = D :* (Zg0 :- eta01)
 G32 = colsum(-D :* Z) / N
 IF_eta01 = N/sum(D) * (h3 - (h2 - h1 * G11inv' * G21') * G22inv' * G32')
 // compute IF for eta11
 eta11 = mean(Y, D)

 IF_eta11 = N/sum(D) * D :* (Y :- eta11)
 // compute IF for ATT
 ATT = eta11 - eta01
 ATT
 
 IF_ATT = IF_eta11 - IF_eta01
 st_store(., st_addvar("double", "if_att"), IF_ATT) //added to store the IF for all observations
 // display results (point estimate, mean of IF, standard error)
 (ATT, eta11, eta01)', mean((IF_ATT, IF_eta11, IF_eta01))', sqrt(diagonal(variance((IF_ATT, IF_eta11, IF_eta01)) / N)) * sqrt((N-1)/N)
 
end

I would then calculate the RIF, put all this inside a preserve/restore block, save the RIF with the ID in a tempfile, and merge it back onto my main data, so that I have the RIF_ATT for each subsample as a separate variable. This would allow me to carry out the same procedure as above for the drdid estimations.

For the Mata code, I am struggling to add pweights (to adjust for sampling) as well as clustering, so that the Mata version reproduces the teffects ipwra results when I also include them there. Does anyone have advice on how to incorporate this?

Sorry, I am new to working with IFs and RIFs, so am likely overlooking something here. In trying to adjust from the IF to RIF, I followed the principle of adding the statistic of interest (in this case the ATT) to the IF obtained. From my understanding, the mean of the resulting RIF should equal the statistic of interest, which implies that the E(IF)=0. Does that mean I first have to normalise the IF calculated by your Stata code and then add the ATT in order to obtain a useable RIF?
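In symbols, with $\widehat{\mathrm{IF}}_i$ the elements of IF_ATT from the Mata code above, the RIF is the estimate plus the IF, and the standard error printed by the last Mata line, sqrt(variance/N) times sqrt((N-1)/N), simplifies to the root of the summed squared centered IF divided by N squared:

```latex
\widehat{\mathrm{RIF}}_i = \widehat{\mathrm{ATT}} + \widehat{\mathrm{IF}}_i ,
\qquad
\frac{1}{N}\sum_{i=1}^{N}\widehat{\mathrm{RIF}}_i
  = \widehat{\mathrm{ATT}} + \frac{1}{N}\sum_{i=1}^{N}\widehat{\mathrm{IF}}_i ,
\qquad
\widehat{\mathrm{SE}}\big(\widehat{\mathrm{ATT}}\big)
  = \sqrt{\frac{1}{N^{2}}\sum_{i=1}^{N}\Big(\widehat{\mathrm{IF}}_i - \overline{\widehat{\mathrm{IF}}}\Big)^{2}}
```

So the mean of the RIF equals the ATT exactly when the IF averages to zero; a clearly nonzero mean of the estimated IF points to a problem in how it was computed rather than to a need for normalization.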

Thanks and best regards,
Nick


FernandoRios

Join Date: Apr 2014

Posts: 2494
#13

24 Feb 2023, 07:44

Hi Nick!
I may take this code and reuse it for DRDID! (again, when I have time to plug it in with the rest of the syntax I use)
Anyway, adding weights is simple, but the code is a bit long.
1. Standardize the weights for each subgroup:
w_0 = w :* (trt:==0)
w_1 = w :* (trt:==1)
w_1 = w_1:/mean(w_1)
w_0 = w_0:/mean(w_0)

2. Clustering: if you have already saved the RIFs, you can simply use csdid_rif to get the clusters right.
Otherwise:
1. Sort the data by cluster.
2. Create the cluster index (info=panelsetup(cluster,1)).
3. Sum the IFs (the RIFs minus their mean) by cluster: srif=panelsum(rif:-mean(rif),info)
4. The variance is just the cross product of srif, divided by N^2.

3. If the IF was correctly obtained, it should have a mean of zero. If not, something is missing in the code.
Ben can probably give you further advice.
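The clustering steps can be checked on a toy case where the answer is known. In this sketch (assumptions: auto.dta as stand-in data, price playing the role of the RIF of its own mean, trunk as an arbitrary cluster variable), the centered RIFs are summed by cluster and the result is compared with mean, vce(cluster). The G/(G-1) factor is the usual small-sample adjustment and is an assumption about what mean applies.

```stata
* Toy check: the RIF of a mean is the variable itself, so the cluster-robust
* SE computed by hand should match -mean, vce(cluster)-
sysuse auto, clear
sort trunk                                    // 1. sort data by cluster

mata:
    rif  = st_data(., "price")                // RIF (N x 1)
    cl   = st_data(., "trunk")                // cluster identifier
    N    = rows(rif)
    info = panelsetup(cl, 1)                  // 2. first/last row of each cluster
    G    = rows(info)
    srif = panelsum(rif :- mean(rif), info)   // 3. cluster sums of the centered RIF (the IF)
    V    = cross(srif, srif) / N^2            // 4. cross product divided by N^2
    V    = V * G / (G - 1)                    //    small-sample adjustment (assumed)
    st_numscalar("se_manual", sqrt(V))
end

mean price, vce(cluster trunk)
display "by hand: " %9.3f scalar(se_manual) "   mean: " %9.3f _se[price]
```

The two standard errors should agree, at most up to the adjustment factor.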

Fernando
Nick Barton

Join Date: Apr 2015

Posts: 19
#14

24 Feb 2023, 08:05

Sorry, I found an error on my part. My outcome variable has a couple of missing values, which meant the sample differed between the first and second stages in the Mata code. Once I fixed the sample, the sum of the IF was indeed very close to zero (as it was in the example in Ben Jann's paper).
Ben Jann

Join Date: Sep 2014

Posts: 269
#15

24 Feb 2023, 09:18

Adding sampling weights is not difficult. You need to apply them when estimating the logit and then also multiply them into the IPWs and then you also need to take them into account when computing G (or the different parts of G; i.e. the weights need to be included in the cross-products). When analyzing the IF also apply the weights (and, possibly, clustering).
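A minimal sketch of the first part of this advice, covering only the logit stage (h1 and G11inv in the Mata code from post #12). Assumptions: nlsw88 stands in for real data, union plays the treatment, and sampw is a made-up sampling weight. The weights enter the logit, the cross product that forms G11, and (squared) the aggregation of the IF; done this way, the result should reproduce Stata's robust variance for the weighted logit. The regression stage (h2, G21, G22inv) and G32 would need the same treatment, which is not shown here.

```stata
* Stand-in data with a made-up sampling weight (illustration only)
sysuse nlsw88, clear
set seed 12345
generate double sampw = 0.5 + runiform()

* Weights applied when estimating the logit
quietly logit union grade ttl_exp tenure [pweight=sampw]
keep if e(sample)

mata:
    D  = st_data(., "union")
    X  = st_data(., "grade ttl_exp tenure"), J(rows(D), 1, 1)
    w  = st_data(., "sampw")
    N  = rows(D)
    p  = invlogit(X * st_matrix("e(b)")')
    h1 = X :* (D - p)                          // score rows, as in post #12
    // weights inside the cross product that forms G11
    G11inv = invsym(cross(X, w :* p :* (1 :- p), X) / sum(w))
    IF_b = h1 * G11inv'                        // IF of the logit coefficients
    // squared weights when aggregating the IF, with N/(N-1) adjustment
    V  = cross(IF_b, w :^ 2, IF_b) / sum(w)^2 * N / (N - 1)
    st_matrix("V_if", V)
end

* Compare with the robust VCE that logit reports under pweights
matrix list V_if
matrix list e(V)
```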