DiD approach

Taiba Chau

Join Date: Feb 2022

Posts: 105
#1


20 May 2022, 13:14

I am trying to understand what type of data I have for a DiD analysis. I am evaluating a new policy aimed at decreasing the rate of illness among elderly patients undergoing heart surgery in hospitals. The intervention was deployed in all urban hospitals in the country on 1 March 2018. For these hospitals I have weekly illness rates for the elderly from March 2016 to March 2019. I also have similar data from the rural hospitals: weekly illness rates for the elderly between March 2016 and March 2019. I want to apply a DiD design to this.

I am trying to wrap my head around this: does it mean I have cross-sectional data or panel data? I am trying to relate this to the potential outcomes framework.

One group receives the treatment: the elderly in the urban hospitals. The control group, which is not subject to the intervention, is the elderly in rural hospitals. So I thought this meant I have repeated cross-section data, as different individuals are observed in each group at each point in time. Am I correct? Also, would this then be the 2x2 DiD model?

Last edited by Taiba Chau; 20 May 2022, 13:16.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

20 May 2022, 14:09

Panel data (in my opinion) is
1. Observing J > 1 units
2. Observing T > 1 time periods and
3. Observing the exact same units across the exact same time periods.

"... group that receives treatment which is the elderly in the urban hospitals. Then the control group are not subject to the intervention. This will be the elderly in rural hospitals. So I thought this meant I have repeated cross-section data as different individuals are observed in each group at each point in time."

Yes, this is what I would call "panel" data. "Repeated cross section" is too much of a mouthful for me, but that's what it is.

From a potential outcomes standpoint, you have an unbiased causal effect if the average outcomes of the rural hospitals and the average outcomes of the urban hospitals would have moved in parallel had the intervention not happened. The missing data problem is the counterfactual: how the outcomes of the treated units would have looked after March 1, 2018 if the treatment hadn't happened.
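In potential-outcomes notation, this condition is usually written as a parallel trends assumption. The following is a sketch only; the symbols are assumptions introduced here and do not come from the thread: Y_t(0) and Y_t(1) are the untreated and treated potential outcomes in week t, D = 1 marks the urban (treated) hospitals, and t_0 is a pre-intervention reference week.

```latex
% Parallel trends: absent the policy, both groups would have changed
% by the same amount on average between t_0 and any later week t.
\[
E\bigl[\,Y_t(0) - Y_{t_0}(0) \mid D = 1\,\bigr]
  = E\bigl[\,Y_t(0) - Y_{t_0}(0) \mid D = 0\,\bigr]
\]
% Target parameter: average effect on the treated in post-policy week t.
\[
\mathrm{ATT}_t = E\bigl[\,Y_t(1) - Y_t(0) \mid D = 1\,\bigr]
\]
% Under parallel trends (and no anticipation before the policy date),
% the unobserved counterfactual is replaced by the rural trend:
\[
\mathrm{ATT}_t
  = \bigl(E[Y_t \mid D = 1] - E[Y_{t_0} \mid D = 1]\bigr)
  - \bigl(E[Y_t \mid D = 0] - E[Y_{t_0} \mid D = 0]\bigr)
\]
```

The term E[Y_t(0) | D = 1] for weeks after 1 March 2018 is the missing counterfactual; the last line shows how the difference of differences stands in for it.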
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#3

20 May 2022, 14:14

There are 1,796 rural community hospitals and 3,343 urban community hospitals in the United States, and in my opinion not all of these will be good comparisons to each other. Would it make sense to compare hospitals in Pukwana, South Dakota to those in LA, NYC, or Detroit? Likely not. So whatever effect size you DO estimate, you'll likely need to break it down by region (e.g., an analysis of hospitals in the Northeast vs. the Southeast). You could also conduct such an analysis at the state level.
Taiba Chau

Join Date: Feb 2022

Posts: 105
#4

20 May 2022, 14:40

Thanks for the thoughts. I thought there was a difference between panel and repeated cross-sectional data. I thought panel data would be the same units observed in each group over time, whilst repeated cross-sections would be different units observed in each group at each point in time.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#5

20 May 2022, 20:19

Taiba: You have repeated cross sections. Or, you can aggregate the data at the hospital level to create a panel. I’d probably try both ways.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#6

20 May 2022, 20:25

Jeff Wooldridge I'm not understanding the difference between a repeated cross section and a panel dataset. Could you please explain?

I presume it, as you write in your book, is a little like the idiosyncrasies between model and estimator: when I say "OLS model", you and most folks know I in fact mean "a model estimated via OLS", but these, in technical writing anyways, aren't interchangeable.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#7

20 May 2022, 21:33

The data are at the individual level and you don’t (or very rarely) observe the same individuals in different time periods. You effectively observe each individual once. That’s not panel data. It’s pooled or repeated cross sections.

If the unit of observation is the hospital, then it would be panel data.
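As a rough illustration of moving from individual-level records to a hospital-level panel, here is a minimal Stata sketch on simulated data. Every variable name (hospital_id, urban, week, ill) and every number is hypothetical and chosen only for the example; it assumes one row per patient with a 0/1 illness indicator.

```stata
* Simulated individual-level records; names and values are hypothetical.
clear
set seed 2022
set obs 6000
gen hospital_id = ceil(20 * runiform())      // 20 made-up hospitals
gen byte urban  = hospital_id <= 10          // first 10 treated as urban
gen week        = ceil(12 * runiform())      // 12 made-up weeks
gen byte ill    = runiform() < 0.20          // 1 if the patient fell ill

* Repeated cross sections: each row is a different patient, seen once.
* Aggregating to hospital-by-week cells yields one row per hospital-week.
collapse (mean) illness_rate = ill (count) n_patients = ill, ///
    by(hospital_id urban week)

* The same hospitals are now followed over time, so this is a panel.
xtset hospital_id week
```

After the collapse, illness_rate is the share of patients who fell ill in each hospital-week, and n_patients records the cell size in case weighting is wanted later.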
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#8

20 May 2022, 21:51

double post
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#9

21 May 2022, 03:18

Oh I see, that makes perfect sense then
Taiba Chau

Join Date: Feb 2022

Posts: 105
#10

21 May 2022, 15:40

Thank you, that makes a lot of sense! I was wondering whether this would then be a 2x2 model, or a model with 2 groups but multiple time periods, given that the data are weekly?

Last edited by Taiba Chau; 21 May 2022, 15:50.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#11

21 May 2022, 20:30

You should use the multiple time periods. I’ll answer more completely tomorrow.
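One common way to use all the weekly periods with two groups is an event-study style regression. The following is a sketch of that general specification, not the fuller answer mentioned above. The notation is assumed for illustration: Y_{it} is the outcome for unit i in week t, D_i = 1 for the urban group, \lambda_t are week effects, and s_0 is an omitted pre-intervention reference week.

```latex
\[
Y_{it} = \alpha + \gamma D_i + \lambda_t
       + \sum_{s \neq s_0} \beta_s \, D_i \cdot \mathbf{1}[t = s]
       + \varepsilon_{it}
\]
% \beta_s for weeks s before 1 March 2018: should be near zero if the
%   two groups were trending in parallel before the policy.
% \beta_s for weeks s on or after 1 March 2018: week-specific
%   differences between groups relative to week s_0, interpreted as
%   policy effects under parallel trends.
```

Collapsing all pre-policy weeks into one period and all post-policy weeks into another reduces this to the 2x2 case with a single interaction coefficient.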
Taiba Chau

Join Date: Feb 2022

Posts: 105
#12

22 May 2022, 05:25

I was also wondering: in a regression at the firm level, would it be correct or interesting to include other fixed effects, like state or county fixed effects? I have done this in my regression and found some interesting results, but I am trying to understand the intuition behind it.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#13

22 May 2022, 06:41

You could include them if you want... but in all honesty, I may have a better idea for you. If this were my problem, I would use either my synthetic control estimator or my colleague's. When used judiciously, synthetic controls solve SO many problems of difference-in-differences designs and even account for unobserved level confounding.

I'm not saying DD isn't useful or that it isn't even an appropriate design in this instance, I'm saying SCM is a higher, more generalized form of DD that would work much better in this circumstance.
Taiba Chau

Join Date: Feb 2022

Posts: 105
#14

22 May 2022, 08:16

Thank you for your ideas. I have come across synthetic controls, but I was wondering whether, if I were to include the so-called county or state fixed effects, I would no longer have the firm fixed effects.
So the regression would then be

So now there is no firm fixed effect. Instead there is a county fixed effect, denoted by a subscript c. The outcome is still at the firm level.
Taiba Chau

Join Date: Feb 2022

Posts: 105
#15

22 May 2022, 13:34

I was also wondering how to do a coefplot for my regression results for those treated and those not treated. I tried:
quietly reg illness b22233.date##treated if inrange(date,td(14nov2020),td(30nov2020)) |(treated==1),robust
estimates store Treated

quietly reg illness b22233.date##treated if inrange(date,td(14nov2020),td(30nov2020)) |(treated==0),robust
estimates store Untreated

Then I tried to plot:
coefplot Treated Untreated, vertical drop(_cons) xlabel(1 "14 Nov 20" 25 "18 Nov 20" 50 "22 Nov 20" 75 "26 Nov 20" 100 "30 Nov 20", angle(50))

and

coefplot Treated Untreated, vertical drop(_cons) keep(*.t) xlabel(1 "14 Nov 20" 25 "18 Nov 20" 50 "22 Nov 20" 75 "26 Nov 20" 100 "30 Nov 20", angle(50))

Both gave me different results, but neither seems to be exactly correct.
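A few things in the commands above may be worth checking. The `if` conditions join the date range and the group with `|` (or), so each sample contains every observation in the date range plus every observation in the named group; `&` (and) may have been intended. Within a sample restricted to one group, the `date##treated` interaction cannot be separately identified. And `keep(*.t)` matches no coefficient when the time variable is named `date`. One hedged alternative is a single pooled regression whose interaction coefficients are plotted. The sketch below runs on simulated data; the data-generating values are invented, and `coefplot` is a community-contributed command that must be installed.

```stata
* Simulated stand-in data; variable names mirror the post, values are made up.
clear
set seed 12345
set obs 3400
gen byte treated = runiform() < 0.5
gen date = td(14nov2020) + floor(17 * runiform())   // 14nov2020 to 30nov2020
format date %td
gen illness = 0.30 + 0.05 * treated + rnormal(0, 0.10)

* One pooled regression: date effects, a treated main effect, and
* date-by-treated interactions, with 14nov2020 (= 22233) as the base date.
regress illness ib22233.date##i.treated ///
    if inrange(date, td(14nov2020), td(30nov2020)), vce(robust)

* Plot only the interaction terms: the treated-minus-untreated gap on each
* date, relative to the gap on the base date.
* Requires: ssc install coefplot
coefplot, vertical keep(*.date#1.treated) yline(0) xlabel(, angle(50))
```

Each plotted point is then a difference-in-differences estimate for one date relative to 14 Nov 2020, which is usually what an event-study plot is meant to show; separate plots by group would instead show each group's own time path.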

Last edited by Taiba Chau; 22 May 2022, 13:43.
