how to identfy treated, newly treated and control firms in DID analysis

Saleh sharmah

Join Date: Aug 2022

Posts: 120
#1

how to identfy treated, newly treated and control firms in DID analysis

28 Sep 2023, 19:26

Dear members

I have an unbalanced dataset spanning from 2005 to 2020 to assess the effect of a policy mandate on a firm's outcome. The policy mandate is effective in some countries from the sample beginning period (2005) so firms in these countries should be coded as "treated". In another country, the mandate becomes effective during the sample period so firms in these countries should be coded as "newly treated" and firms located in countries that never have the mandate should be coded as zero. I already have the treaedfirm indicator equal to one if the firm operates in a country with a mandate and zero otherwise. However, I need to be able to identify the number of distinct firms that are treated through the sample period (2005-2020) the newly treated firms, and the control distinct firms that have never been treated. I further need to be able to graph them where on the Y-axis it shows the number of observations, and on the x-axis, it shows the type of the firm treatment. I greatly appreciate your help in advance. Here is an example of my dataset:
input str44 Country int Year float(policy_in_effect treatedfirm firm_id)
"United Arab Emirates" 2017 0 0 1
"United Arab Emirates" 2019 0 0 1
"United Arab Emirates" 2018 0 0 1
"United Arab Emirates" 2016 0 0 2
"United Arab Emirates" 2018 0 0 2
"United Arab Emirates" 2015 0 0 2
"United Arab Emirates" 2014 0 0 2
"United Arab Emirates" 2013 0 0 2
"United Arab Emirates" 2017 0 0 2
"United Arab Emirates" 2019 0 0 2
"United States" 2014 0 0 3
"United States" 2011 0 0 3
"United States" 2009 0 0 3
"United States" 2008 0 0 3
"United States" 2010 0 0 3
"United States" 2019 0 0 3
"United States" 2012 0 0 3
"United States" 2006 0 0 3
"United States" 2018 0 0 3
"United States" 2015 0 0 3
"United States" 2005 0 0 3
"United States" 2013 0 0 3
"United States" 2017 0 0 3
"United States" 2007 0 0 3
"United States" 2016 0 0 3
"Argentina" 2015 0 0 4
"Argentina" 2016 0 0 4
"Argentina" 2018 0 0 4
"Argentina" 2017 0 0 4
"Argentina" 2014 0 0 4
"Argentina" 2019 0 0 4
"Argentina" 2020 0 0 5
"Argentina" 2015 0 0 5
"Argentina" 2019 0 0 5
"Argentina" 2017 0 0 5
"Argentina" 2016 0 0 5
"Argentina" 2018 0 0 5
"Argentina" 2019 0 0 6
"Argentina" 2018 0 0 6
"Argentina" 2017 0 0 6
"Argentina" 2016 0 0 6
"Argentina" 2015 0 0 6
"Argentina" 2017 0 0 7
"Argentina" 2015 0 0 7
"Argentina" 2019 0 0 7
"Argentina" 2014 0 0 7
"Argentina" 2016 0 0 7
"Argentina" 2018 0 0 7
"Argentina" 2013 0 0 7
"Argentina" 2010 0 0 7
"Argentina" 2018 0 0 8
"Argentina" 2019 0 0 8
"Austria" 2018 0 0 9
"Austria" 2013 0 0 9
"Austria" 2012 0 0 9
"Austria" 2011 0 0 9
"Austria" 2014 0 0 9
"Austria" 2019 0 0 9
"Austria" 2017 0 0 9
"Austria" 2015 0 0 9
"Austria" 2010 0 0 9
"Austria" 2016 0 0 9
"Austria" 2019 0 0 10
"Austria" 2020 0 0 10
"Austria" 2018 0 0 10
"Austria" 2018 0 0 11
"Austria" 2011 0 0 11
Tags: None

Dimitriy V. Masterov

Join Date: Mar 2014
Posts: 609

29 Sep 2023, 17:37

Your data example is not very interesting since there is no variation in treatment status. I tweaked the last two firms to deviate from the "never treated" pattern. You can do something like this, assuming treatedfirm is the policy dummy:

Code:

clear
input str44 Country int Year float(policy_in_effect treatedfirm firm_id)
"United Arab Emirates" 2017 0 0 1
"United Arab Emirates" 2019 0 0 1
"United Arab Emirates" 2018 0 0 1
"United Arab Emirates" 2016 0 0 2
"United Arab Emirates" 2018 0 0 2
"United Arab Emirates" 2015 0 0 2
"United Arab Emirates" 2014 0 0 2
"United Arab Emirates" 2013 0 0 2
"United Arab Emirates" 2017 0 0 2
"United Arab Emirates" 2019 0 0 2
"United States" 2014 0 0 3
"United States" 2011 0 0 3
"United States" 2009 0 0 3
"United States" 2008 0 0 3
"United States" 2010 0 0 3
"United States" 2019 0 0 3
"United States" 2012 0 0 3
"United States" 2006 0 0 3
"United States" 2018 0 0 3
"United States" 2015 0 0 3
"United States" 2005 0 0 3
"United States" 2013 0 0 3
"United States" 2017 0 0 3
"United States" 2007 0 0 3
"United States" 2016 0 0 3
"Argentina" 2015 0 0 4
"Argentina" 2016 0 0 4
"Argentina" 2018 0 0 4
"Argentina" 2017 0 0 4
"Argentina" 2014 0 0 4
"Argentina" 2019 0 0 4
"Argentina" 2020 0 0 5
"Argentina" 2015 0 0 5
"Argentina" 2019 0 0 5
"Argentina" 2017 0 0 5
"Argentina" 2016 0 0 5
"Argentina" 2018 0 0 5
"Argentina" 2019 0 0 6
"Argentina" 2018 0 0 6
"Argentina" 2017 0 0 6
"Argentina" 2016 0 0 6
"Argentina" 2015 0 0 6
"Argentina" 2017 0 0 7
"Argentina" 2015 0 0 7
"Argentina" 2019 0 0 7
"Argentina" 2014 0 0 7
"Argentina" 2016 0 0 7
"Argentina" 2018 0 0 7
"Argentina" 2013 0 0 7
"Argentina" 2010 0 0 7
"Argentina" 2018 0 0 8
"Argentina" 2019 0 0 8
"Austria" 2018 0 0 9
"Austria" 2013 0 0 9
"Austria" 2012 0 0 9
"Austria" 2011 0 0 9
"Austria" 2014 0 0 9
"Austria" 2019 0 0 9
"Austria" 2017 0 0 9
"Austria" 2015 0 0 9
"Austria" 2010 0 0 9
"Austria" 2016 0 0 9
"Austria" 2019 0 1 10
"Austria" 2020 0 1 10
"Austria" 2018 0 1 10
"Austria" 2018 0 0 11
"Austria" 2011 0 1 11
end

bys firm_id (Year): egen start = min(treatedfirm)
bys firm_id (Year): egen end   = max(treatedfirm)
gen type = start + end
assert inlist(type,0,1,2)
label define type 0 "Never Treated" 1 "Newly Treated" 2 "Always Treated"
lab val type type
egen keeper = tag(firm_id)
graph bar (count) firm_id if keeper == 1, over(type) ylab(#10) ytitle("Firms")
drop start end keeper
bys type: distinct firm_id

Comment

Saleh sharmah

Join Date: Aug 2022

Posts: 120
#3

01 Oct 2023, 17:27

Thank you so much Dimitriy for your helpful codes. I am just wondering how to graph the treatment as the groph below:
Comment

Dimitriy V. Masterov

Join Date: Mar 2014
Posts: 609

02 Oct 2023, 17:44

That's not quite what you asked for. In the future, you should (1) provide an example dataset that captures the essential features of the data and (2) clearly spell out the output you want.

You probably want something like this:

Code:

xtset firm_id year, yearly
gen controls = (treated == 0)
gen newly_treated = (treated == 1 & treated != L1.treated & !missing(L1.treated))
sort year type

collapse (sum) treated controls newly_treated, by(year)
labvarch, after(" ")
lab var newly_treated "Newly Treated"
gen bar_label = string(newly_treated) if newly_treated > 0
tw (line treated controls year, xlab(#30, angle(45)) ylab(#15) ytitle("N Obs.")) (bar newly_treated year, mlabel(bar_label))

Comment

Saleh sharmah

Join Date: Aug 2022

Posts: 120
#5

02 Oct 2023, 19:11

Thank you so much, Dimitriy for your helpful response and useful code. The code worked perfectly well. Thanks again.
Comment
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#6

08 Oct 2023, 21:21

Dear Dimitriy

I have a follow-up inquiry if I may.I am running a staggered DID regression for my study and would like to run a stacked regression as a robustness check for my original result. However, I am not sure how to carry out stacked regression in Stata. Reading the Stata code and other comments from this forum, I have created cohort fixed effects as follows:
egen firmxccohortFE= group( firm_id treated Year)
egen indyearxccohortFE= group( indyear treated Year)
Two concerns that I have:
1- I am unsure if I correctly created the cohort fixed effects
2-How can I run the stacked regression for my staggered DID model?

Here is an example of my data:
input float(magnitude treatedxmandate log_assets indyear firm_id firmxccohortFE indyearxccohortFE)
0 0 13.5533 324 1101 7777 593
0 0 14.067616 40 2431 16460 70
0 0 14.14755 372 297 1775 688
0 0 14.223843 371 297 1774 686
0 0 14.256618 373 297 1776 690
0 0 15.242467 151 2872 19476 273
0 0 15.244625 150 2872 19475 271
0 0 15.34352 152 2872 19477 275
0 0 15.44925 154 2872 19479 279
thanks for your input
Comment

Announcement