Diff in Diff: DRDID and CSDID

Kagiso Matswalela

Join Date: Aug 2023

Posts: 11
#466

25 Mar 2024, 13:16

Dear @FernandoRios,

Excuse my limited knowledge of csdid and its post-estimation commands; I'm still learning.
Please help with the following.

I am using csdid in Stata with firm-level panel data and not-yet-treated units as the control group. My data cover 2009-2020, and treatment occurs at various times (staggered adoption).
- I need to understand, by way of example commands, how to differentiate between the "conditional" and "unconditional" PTA, and how to apply this in my commands, e.g. csdid loglabprod logcapital loglabour lograwmaterials, ivar(firm_id) time(year) gvar(first_treat) notyet. Where would the conditional PTA fit in this command? How do I show which PTA the model uses?

- Also, how do I deal with PTA violations when the control group is the not-yet-treated units? I'm reading papers by Rambachan & Roth (2023) and Ryan, Kontopantelis, Linden, and Burgess (2019) to try to understand this. However, the control group used in their simulations is never treated, whereas I use not-yet-treated units.
- I also read about csdid2. Is documentation for its commands available in Stata, and how does it differ from the original csdid? Would it help me in any way?

Again, my apologies for these loaded questions; I'm hoping to get clarity or guidance on a solution.

Thanks and much appreciated.

Last edited by Kagiso Matswalela; 25 Mar 2024, 13:18.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2466
#467

26 Mar 2024, 07:09

Hi Kagiso
1. If you add any controls to the model (here logcapital, loglabour and lograwmaterials), you are already using the conditional PTA. If you add no controls, then it is the unconditional PTA.
2. If the PTA fails, it fails. However, there are other methods that allow for some violations of the PTA, adjusting for them before moving on to estimating treatment effects. csdid does not have any built-in feature for that.
3. csdid2 works just like csdid, but to show results you need to type estat event, estat attgt, estat group, etc.
HTH
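
To make point 1 concrete, here is a minimal sketch using the variable names from post #466. It assumes first_treat is constant within firm and equal to 0 for never-treated firms; the estat lines are only examples of the available aggregations.

```stata
* Unconditional PTA: outcome only, no covariates
csdid loglabprod, ivar(firm_id) time(year) gvar(first_treat) notyet

* Conditional PTA: covariates are listed after the outcome
* (the /// line continuation works in a do-file, not at the command line)
csdid loglabprod logcapital loglabour lograwmaterials, ///
    ivar(firm_id) time(year) gvar(first_treat) notyet

* Aggregate the group-time effects after either model
estat simple
estat event
```

Which PTA a model relies on can be read off the variable list: an outcome alone means unconditional, an outcome followed by covariates means conditional.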
Comment
Kagiso Matswalela

Join Date: Aug 2023

Posts: 11
#468

27 Mar 2024, 12:23

Hi Fernando

Thank you.
So would the other methods you mention be DID with propensity score matching and single/multiple-group interrupted time-series analysis, as described in Ryan et al. (2019), "Coping with non-parallel trends in difference-in-differences analysis", together with the alternative approaches described in Roth (2020), "Pre-test with Caution: Event-study Estimates After Testing for Parallel Trends"?

I'm just trying to figure out how I can explore some alternative options without having to redo or change my research approach.

Thanks for everything.
Comment
Andrew Sylvester

Join Date: Jan 2020

Posts: 6
#469

02 Apr 2024, 04:35

Hi Fernando

I work for the National Health Service (NHS) in England and we are using csdid to assess the impact of a specific intervention at NHS acute hospitals on patient outcomes. The rollout of the intervention was staggered over time making csdid a useful approach for us. We have a very small set of control providers depending on the time period analysed – the preferred time period of analysis would leave only a single provider in the control set – and so we are using the not yet treated option.

The models run and we're able to produce interesting results, but the natural experiment which gave us our dataset has some peculiarities, and it would be very helpful to make sure we are using the model correctly given our unusual dataset:

1) The intervention at the first treated provider occurred much earlier than at the next treated provider. This means that a large part of the 60 post-implementation months from the csdid model is driven by a single early-implementing provider. To avoid our results being largely driven by a single provider, we've used the censored event aggregation (cevent), truncated at 20 months post-implementation, as per the code below:

Code:

csdid LOS, ivar(Group_ID) time(Calendar_Month) gvar(Group_Var) notyet agg(event) saverif(mod) wboot replace rseed(123456)
use mod, clear
estat pretrend
csdid_stats cevent, wboot window(-80 20) rseed(123456)

Unfortunately we can't share the underlying data or details, but is there anything obviously problematic with the above? Is it appropriate to say that the ATTC is the average treatment effect on the treated for the first 20 periods post-implementation?

2) We also get a peculiar result when graphing our results with csdid_plot: the confidence intervals become impossibly large. When we drop the first and the last treatment groups – which coincidentally have only a single treated provider in each – then the confidence intervals on the plot become normal again. Is this a known issue with a solution?

Many thanks

Andrew

Last edited by Andrew Sylvester; 02 Apr 2024, 04:39.
Comment
Andrew Sylvester

Join Date: Jan 2020

Posts: 6
#470

04 Apr 2024, 00:59

Hi FernandoRios

An update on question 2 in my post above: we're able to get meaningful graphs when we use the Rademacher option for the wild bootstrap type in the model. Is the Rademacher option appropriate when sample sizes per group are small?

Many thanks

Andrew
Comment
Sebastian Gravesen

Join Date: Apr 2024

Posts: 3
#471

08 Apr 2024, 07:23

Hello, I am attempting to use the csdid command for my paper and have run into some issues.

I am trying to estimate the effects of the introduction of carbon taxation in 119 countries from 1989 to 2019. Since I am dealing with panel data in which multiple countries receive treatment at different times, I thought csdid would be appropriate.

When I try to run csdid in Stata, the output reports 0 observations.

The code I have run is as follows:
csdid lco2 co2price, time(year) gvar(gvar) ivar(id)

lco2 = log of co2 emission
co2price = Average price on emissions covered by a carbon tax
gvar = a given country's first year of treatment, or 0 if never treated
id = numeric country id
year = year (1989-2019)

control : never treated

first lines of stata output:
---------------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Difference-in-difference with Multiple Time Periods

Number of obs = 0
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
g1990 |
t_1989_1990 | 0 (omitted)
t_1989_1991 | 0 (omitted)

etc..
----------------------------------

Any help would be greatly appreciated!

Kind Regards,
Sebastian
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2466
#472

08 Apr 2024, 08:44

Basic question: do you have drdid installed?
Also, can you run tab year gvar?
Comment
Sebastian Gravesen

Join Date: Apr 2024

Posts: 3
#473

08 Apr 2024, 13:14

I do have drdid installed

here is tab year gvar, I hope that it is somewhat legible

      Year |     0  1990  1991  1992  2008  2010  2011  2012  2013  2014  2015  2017  2018  2019 |  Total
-----------+-------------------------------------------------------------------------------------+-------
      1989 |   118     0     0     0     0     0     0     0     0     0     0     0     0     0 |    118
      1990 |   117     1     0     0     0     0     0     0     0     0     0     0     0     0 |    118
      1991 |   115     1     2     0     0     0     0     0     0     0     0     0     0     0 |    118
      1992 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1993 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1994 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1995 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1996 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1997 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1998 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      1999 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2000 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2001 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2002 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2003 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2004 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2005 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2006 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2007 |   114     1     2     1     0     0     0     0     0     0     0     0     0     0 |    118
      2008 |   113     1     2     1     1     0     0     0     0     0     0     0     0     0 |    118
      2009 |   113     1     2     1     1     0     0     0     0     0     0     0     0     0 |    118
      2010 |   112     1     2     1     1     1     0     0     0     0     0     0     0     0 |    118
      2011 |   111     1     2     1     1     1     1     0     0     0     0     0     0     0 |    118
      2012 |   109     1     2     1     1     1     1     2     0     0     0     0     0     0 |    118
      2013 |   108     1     2     1     1     1     1     2     1     0     0     0     0     0 |    118
      2014 |   106     1     2     1     1     1     1     2     1     2     0     0     0     0 |    118
      2015 |   105     1     2     1     1     1     1     2     1     2     1     0     0     0 |    118
      2016 |   105     1     2     1     1     1     1     2     1     2     1     0     0     0 |    118
      2017 |   103     1     2     1     1     1     1     2     1     2     1     2     0     0 |    118
      2018 |   102     1     2     1     1     1     1     2     1     2     1     2     1     0 |    118
      2019 |   100     1     2     1     1     1     1     2     1     2     1     2     1     2 |    118
-----------+-------------------------------------------------------------------------------------+-------
     Total | 3,461    30    58    28    12    10     9    16     7    12     5     6     2     2 |  3,658
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2466
#474

08 Apr 2024, 13:19

That is our answer. GVAR is incorrectly defined.
Gvar has to be constant across all years within each country; it cannot switch from 0 to the treatment year partway through the panel.
Do something like
bysort id: egen mgvar = max(gvar)
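
Spelled out with the variable names from post #471 (id, year, gvar, lco2), the fix might look like the sketch below. The name mgvar is only an illustrative choice, and covariates are left out to keep the example minimal.

```stata
* First treatment year, held constant within country (stays 0 if never treated)
bysort id: egen mgvar = max(gvar)

* Check: every column of this table should now show the same count in each year
tab year mgvar

* Re-estimate with the time-invariant cohort variable
csdid lco2, ivar(id) time(year) gvar(mgvar)
```

Taking the within-country maximum works here because gvar is 0 before treatment and equals the first treatment year afterwards, so the maximum recovers the treatment year for every row of a treated country.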
Comment
Sebastian Gravesen

Join Date: Apr 2024

Posts: 3
#475

08 Apr 2024, 13:47

That seems to have fixed it, thank you kindly!
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2466
#476

08 Apr 2024, 14:01

Also, no controls! You don't have enough data to add any controls.
Comment
Toby Low

Join Date: Apr 2024

Posts: 2
#477

16 Apr 2024, 05:05

Dear FernandoRios

I hope this message finds you well. I am reaching out with a question on applying csdid, expanding on an earlier question asked by my colleague Andrew Sylvester in this forum. We are conducting our analysis in collaboration with Dr. Giuseppe Moscelli. We are facing two main challenges and would greatly appreciate your insights.

Context and Challenges:

1. Bootstrap Analysis:
- When bootstrapping our model across the entire analytical period using the `notyet` csdid approach, we encounter very wide and symmetrical confidence intervals.
- Conversely, when we specify `rademacher` as our wildbootstrap type, the confidence intervals significantly narrow and become non-symmetrical. This variance might stem from having an early adopter at the start and a late adopter at the end of our period. Removing these two providers results in more reasonable, non-symmetrical intervals using the default bootstrap method. Dropping only one does not resolve the issue.

2. Desired Analysis Range:
- We aim to focus our analysis on a specific timeframe (month -79 to month 20) and exclude data outside this range from our csdid plots. Despite trying `addplot`, adjusting csdid settings, and using the graph editor, we only managed to achieve this by excluding event time values outside our desired range, which led to an unbalanced panel due to differing calendar months remaining in our dataset for each provider. The results, although slightly different to the cevented results on the model that retains all data, seem usable and allow for the default bootstrapping approach.

Questions:

1. Could you share your thoughts on why the `rademacher` type might yield more realistic intervals in our full-provider model and whether it’s advisable to use this method for our main results?
2. Given the unbalanced panel when we restrict our analysis to the desired months, do you think it's acceptable to use these results, or should we consider an alternative approach? This is the only approach we have found to date that allows us to generate our desired graphical output, but I have some concerns around the implications of the panel now being unbalanced.

Thank you very much for your time and expertise. I apologise for the inability to share code due to data confidentiality.

Kind regards,

Toby
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2466
#478

16 Apr 2024, 07:04

Hi Toby
A couple of thoughts.
1. I'm not sure what is happening with the Rademacher wild bootstrap. I usually just apply the default WB options. I have not played around with other noise multipliers here; however, for similar applications I have tried other options, and they all produce rather similar results.
In other words, I doubt that the problem is caused by the choice of multiplier.

It may be that the problem is due to the "small" groups used for the estimation of the CIs. In that case, it is advisable to drop the events that are far in the past or far in the future.

2. For the event-study constraints you have two options.

a) As you say, restrict the data to cover only specific periods.
b) Estimate using all data, but restrict the event estimation only.
The latter could be done using
estat event, window(#1 #2)

3. Since you are using so many periods, csdid may be slow. I would suggest installing csdid2, which has the same syntax, except that you need to explicitly use "estat event" or other aggregations to see the results. To install it, type:

net install csdid2, from("https://raw.githubusercontent.com/friosavila/stpackages/main")
HTH
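
Option (b) above could be sketched as follows, reusing the variable names from post #469 (LOS, Group_ID, Calendar_Month, Group_Var) and the window from post #477. This is an illustration under those assumptions, not a tested specification for the actual data.

```stata
* Estimate all group-time effects on the full, untrimmed panel
csdid LOS, ivar(Group_ID) time(Calendar_Month) gvar(Group_Var) notyet

* Report event-study estimates only for event times -79 to 20
estat event, window(-79 20)

* Plot the event-study estimates just computed
csdid_plot
```

Because the window is applied at the aggregation step, no observations are dropped and the underlying panel stays balanced.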
Comment
Toby Low

Join Date: Apr 2024

Posts: 2
#479

16 Apr 2024, 08:31

Thanks FernandoRios, this is very helpful.

We will proceed by reporting the outputs of our unbalanced panel, as this also allows us to utilise the default WB type.

Would you mind answering one follow up question we have?

As our analysis uses the notyettreated csdid model, we are looking to limit our pre and post periods to ensure that we retain a minimum of 5 providers at each Event_Time in our analysis. We run the below lines to ensure this happens.

drop if Event_Time > 20
drop if Event_Time <-79

Below is the code we then run on our remaining unbalanced panel

csdid `depvar', ivar(Group_ID) time(Calendar_Month) gvar(Group_Var) notyet agg(event) saverif(mod) wboot replace rseed(123456)
use mod, clear
estat pretrend
csdid_stats event, wboot rseed(123456)

We then plan to report the outputs shown.

Is there a more optimal way for us to specify our model to ensure that we retain and utilise as many providers as possible in our remaining dataset? We have not previously applied and reported results from an unbalanced panel.

Thanks again for your help.

Kind regards,

Toby
Comment
Carlos GN

Join Date: May 2023

Posts: 3
#480

18 Apr 2024, 15:34

Dear @FernandoRios,

I have a question regarding the overlap assumption in csdid.

In my setup, I have a panel that includes all U.S. counties for the period 2003-2019. The outcome variable is employment. I am also adding some covariates (total population and other characteristics). Around 200 of these counties receive treatment in various years throughout this period, in a way that there are treated counties in each year.

However, I would like to know if it is possible to add state fixed effects in this setting. If there are states that, in a given year, do not have any treated counties, would this violate the overlap assumption?

Also, there is no need to add time fixed effects in this setting, is there?

Thank you in advance!
Comment
