How does hdidregress estimate ATET for repeated cross sections with only one observation per unit

Kyle Huisman

Join Date: Feb 2023

Posts: 20
#1

23 May 2025, 11:41

The point estimates from xthdidregress can be obtained by subtracting each unit's outcome in the base period from its outcome in the treated period and applying teffects estimators to the transformed outcome. (At least, the IPW results can be replicated this way.) However, suppose we have only one observation per unit, spread across a number of years. There is a treatment status indicator for observations in the years before treatment, but we don't observe those pre-treatment units again in the post-treatment period. How does hdidregress ipw work in this context? Can anyone give me an example point-estimate replication that uses logit to calculate propensity scores and regress with IPW weights?

Let me give a concrete example. Say we are trying to estimate the effect of a scholarship on university attendance among students within a district, by comparing students who are eligible for the scholarship (or would have been, had it been in place) with ineligible students, before and after the program began. Say it began in 2023. The treatment is the scholarship program going into effect; each observation is an individual high school graduate's college attendance outcome within six months of graduation, for the graduating cohorts of 2016 to 2024; and treated and untreated observations are grouped by whether graduating students were eligible post-treatment or would have been eligible pre-treatment had the program been in effect. I tried replicating the hdidregress ATET for the class of 2023 using the following code:

Code:

preserve
gen treat_2023 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 1
gen control_2022 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 0
keep if treat_2023 | control_2022
gen treated = 0
replace treated = 1 if graduate_year == 2023
logit eligible_dummy i.FRPL i.sex i.racex if graduate_year == 2022
predict pscore, pr
gen ipw = .
replace ipw = 1/pscore if treat_2023 == 1
replace ipw = 1/(1 - pscore) if treat_2023 == 0
replace ipw = 1 if treat_2023 == 1
drop if graduate_year < 2022
reg attend_uni i.treat_2023##treated [pw = ipw], vce(robust)

But this isn't quite correct. I actually get a result even closer to hdidregress if I include both years of the 2x2 comparison in the logit, but the two still do not match exactly.

Ignore the fact that this is not really a staggered treatment timing case. I am using the command as a convenience tool to incorporate a selection model into the regression. I want to do this replication so that I can check covariate balance diagnostics for each ATET estimate. Can the ATETs be replicated using basic commands here? What am I missing?

Sorry that I cannot give a data example for confidentiality reasons.
Kyle Huisman

Join Date: Feb 2023

Posts: 20
#2

25 May 2025, 05:11

Please let me know if I can provide any additional details to clarify my question. I feel like I am missing something simple here. I've reviewed the estimator equations in the documentation but can't figure out exactly what I am doing wrong here. https://www.stata.com/manuals/causalhdidregress.pdf
Nick Cox

Join Date: Mar 2014

Posts: 35681
#3

25 May 2025, 05:26

I don't do this kind of thing, but just underline that our longstanding FAQ Advice covers the problem of confidentiality. The advice then is to pose your question in terms of a realistic fake data example that can be used to show your problem or in terms of a dataset anyone can use that does.

Your question very likely makes perfect sense to people who do do this, but among the many reasons people don't get answers here is that they're phrased in terms of results for data we can't access. In addition, it is the weekend, and so on.

Last edited by Nick Cox; 25 May 2025, 05:50.
FernandoRios

Join Date: Apr 2014

Posts: 2466
#4

25 May 2025, 06:20

https://friosavila.github.io/app_met..._metrics2.html
In the appendix of these slides I give the exact formulas I use for csdid.
Bottom line: your logit should use data from both the pre- and post-treatment periods.
Kyle Huisman

Join Date: Feb 2023

Posts: 20
#5

25 May 2025, 06:29

Apologies, Nick. I did not mean to rush anyone on a holiday weekend! Here is some code to generate a fake version of my dataset:

clear
* 9 cohorts (2016-2024) x 200 graduates each, so the 2022 and 2023 cohorts exist
set obs 1800
gen graduate_year = 2016 + floor((_n - 1) / 200)
gen attend_uni = runiform() > 0.5
gen eligible_dummy = runiform() > 0.5
gen school_dummy = 1 + floor(2 * runiform())
gen FRPL = runiform() > 0.5
gen sex = runiform() > 0.5
gen racex = ceil(3 * runiform())
gen student_id = _n

Note that I can manually replicate ATETs for panel data settings using the following code:

Code:

****************** xthdidregress replication *************************
clear all
use "\mpdta.dta"
gen treatx = treat
replace treatx = 0 if first_treat > year
xtset countyreal year
xthdidregress ipw (lemp) (treatx lpop), group(first_treat)
keep if first_treat == 0 | first_treat == 2004
*keep if year == 2003 | year == 2004
* Can switch to 2004, 05, 06, 07, etc.
keep if year == 2003 | year == 2007
frame copy default frame, replace
frame copy default frame2, replace
frame change frame
keep if year == 2003
frame change frame2
keep if year == 2007
rename lemp lemp_2007
rename lpop lpop_2007
frame change frame
frlink 1:1 countyreal, frame(frame2)
frget lemp_2007 lpop_2007, from(frame2)
gen lemp_diff = lemp_2007 - lemp
* Replication of ipw xthdidregress
teffects ipw (lemp_diff) (treat lpop), atet

An R version of the dataset can be found here:
https://github.com/bcallaway11/did/tree/master/data

Based on the Stata documentation, it seems that hdidregress actually uses the post-treatment period to calculate propensity scores, so I think that's one problem with the original replication attempt I gave, but I still don't get the same results when I correct it.

Let me know what additional details would be helpful.
Kyle Huisman

Join Date: Feb 2023

Posts: 20
#6

25 May 2025, 06:43

Thanks, Fernando! When I set the logit model in my original replication code to include both periods, my ATET is still a few decimals off from what I get from hdidregress.

Code:

preserve

gen treat_2023 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 1
gen control_2022 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 0

keep if treat_2023 | control_2022 


gen treated = 0
replace treated = 1 if graduate_year == 2023


logit eligible_dummy i.FRPL i.sex i.racex 
predict pscore, pr

gen ipw = .
replace ipw = 1/pscore if treat_2023 == 1
replace ipw = 1/(1 - pscore) if treat_2023 == 0
replace ipw = 1 if treat_2023 == 1


reg attend_uni i.treat_2023##treated i.graduate_year [pw = ipw], vce(robust)

bysort eligible_dummy graduate_year: sum attend_uni [aw = ipw]

restore

I will dig into the slides to see where I am still going wrong and post the corrected code here for future reference.


Kyle Huisman

Join Date: Feb 2023

Posts: 20
#7

25 May 2025, 06:54

This code does the trick:

preserve

* Step 0: Keep only obs for 2022 and 2023
keep if inlist(graduate_year, 2022, 2023) & inlist(eligible_dummy, 0, 1)

* Step 1: Define treatment and time indicators
gen D = eligible_dummy == 1
gen T = graduate_year == 2023

* Step 2: Estimate propensity score on pooled sample
logit D i.FRPL i.sex i.racex
predict pscore, pr
gen omega = pscore / (1 - pscore)

* Step 3: Define group indicators
gen D1_T1 = (D == 1 & T == 1)
gen D1_T0 = (D == 1 & T == 0)
gen D0_T1 = (D == 0 & T == 1)
gen D0_T0 = (D == 0 & T == 0)

* Step 4: Compute unweighted treated means
summarize attend_uni if D1_T1 == 1, meanonly
local E11 = r(mean)
summarize attend_uni if D1_T0 == 1, meanonly
local E10 = r(mean)

* Step 5: Compute weighted control means
gen wy_01 = attend_uni * omega if D0_T1 == 1
gen wy_00 = attend_uni * omega if D0_T0 == 1

gen omega_01 = omega if D0_T1 == 1
gen omega_00 = omega if D0_T0 == 1

summarize wy_01, meanonly
local num01 = r(sum)
summarize omega_01, meanonly
local den01 = r(sum)
local E01 = `num01' / `den01'

summarize wy_00, meanonly
local num00 = r(sum)
summarize omega_00, meanonly
local den00 = r(sum)
local E00 = `num00' / `den00'

* Step 6: Compute IPW ATET
local ipw_atet = (`E11' - `E10') - (`E01' - `E00')
display "IPW ATET(2023, 2023) = `ipw_atet'"

restore

Thanks for the helpful slides.
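
For future reference, the manual computation above can also be written as one weighted regression, which is closer to what the earlier attempts were trying to do. The sketch below is only an illustration under stated assumptions: it uses hypothetical fake data (the seed, the 1800 observations, and the names w_atet, atet_reg, and atet_manual are not from my real dataset), and it reproduces the hand-computed 2x2 estimate rather than calling hdidregress itself. The one substantive difference from the earlier regress attempts is the weight on comparison observations: pscore/(1 - pscore) here, versus 1/(1 - pscore) before. Because the model is saturated in D and T, the interaction coefficient equals the difference in weighted cell means.

```stata
* Hypothetical fake data (assumption): 9 cohorts, 2016-2024, 200 graduates each
clear
set seed 12345
set obs 1800
gen graduate_year  = 2016 + floor((_n - 1) / 200)
gen attend_uni     = runiform() > 0.5
gen eligible_dummy = runiform() > 0.5
gen FRPL           = runiform() > 0.5
gen sex            = runiform() > 0.5
gen racex          = ceil(3 * runiform())

* Same 2x2 setup as the manual computation: 2022 vs 2023 cohorts
keep if inlist(graduate_year, 2022, 2023)
gen D = eligible_dummy == 1
gen T = graduate_year == 2023

* Propensity score on the pooled sample, then ATET-style odds weights:
* 1 for eligible students, p/(1 - p) for ineligible students
logit D i.FRPL i.sex i.racex
predict double pscore, pr
gen double w_atet = cond(D == 1, 1, pscore / (1 - pscore))

* Saturated weighted regression: the interaction is the weighted 2x2 DiD
regress attend_uni i.D##i.T [pw = w_atet], vce(robust)
scalar atet_reg = _b[1.D#1.T]

* Manual check using weighted cell means
foreach d in 0 1 {
    foreach t in 0 1 {
        summarize attend_uni if D == `d' & T == `t' [aw = w_atet], meanonly
        scalar m`d'`t' = r(mean)
    }
}
scalar atet_manual = (m11 - m10) - (m01 - m00)
display "regression: " atet_reg "   manual: " atet_manual
```

The robust standard errors from this regression treat the estimated propensity score as known, so I would not expect them to match the standard errors reported by hdidregress; only the point estimate is being replicated. The same w_atet weights can be reused for covariate balance checks, for example by summarizing each covariate by D with [aw = w_atet].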
FernandoRios

Join Date: Apr 2014

Posts: 2466
#8

25 May 2025, 07:39

Also, Stata's hdidregress does something different from csdid when using regression adjustment.
In my code I run only one regression.
Stata runs four: pre and post, treated and not treated.
Nick Cox

Join Date: Mar 2014

Posts: 35681
#9

25 May 2025, 07:57

My very small point about the weekend was that answers are less frequent at weekends, not at all that questions are out of order then.
