Need your kinder help with Staggered Difference-in-Differences model

Saleh sharmah

Join Date: Aug 2022

Posts: 120
#1


16 Aug 2023, 20:43

Dear members

I am conducting a cross-country analysis to investigate the effect of a country's IFRS mandate on the tax avoidance of its public firms. My independent variable is a country's IFRS_mandate, a dummy variable equal to one when a country introduces the mandate in a given year, and zero otherwise. It is important to note that countries introduced the IFRS mandate in different years during my sample period, while some countries NEVER mandated IFRS, and some countries mandated IFRS in all years covered by my sample. Thus, I am going to use a staggered DID model to assess the effect of the IFRS mandate on firms' tax avoidance in a cross-country setting. Firm tax avoidance is my dependent variable.

I would like to get help in the following areas:
1- I would like to know how to prepare my data in order to use staggered DID to assess the effect of a country's IFRS mandate status on firms' tax avoidance. I would like to be able to identify my treatment group (firms that operate in countries that have mandated IFRS) versus the control group (firms that operate in countries that did not enforce the IFRS mandate).
2- I am thinking of using the following DID model: Y_{i,t} = α + β1 IFRS_mandate + δX_{i,t} + μ_i + π_t + ε_i.

Y_{i,t} represents the firm's tax avoidance in a given year t, X_{i,t} represents controls, μ_i represents firm FE, π_t represents year FE, and I cluster at the firm level. Is this model adequate to accomplish what I need in this study?

Here is an example of my dataset. Thanks so much in advance for your help.

input str20 Country float firm_id byte IFRS_MADNATE float tax_avoidance
"UNITED ARAB EMIRATES" 1 0 .81
"UNITED ARAB EMIRATES" 1 0 .57
"UNITED ARAB EMIRATES" 1 0 .57
"UNITED ARAB EMIRATES" 1 0 .57
"UNITED ARAB EMIRATES" 1 0 .57
"UNITED ARAB EMIRATES" 1 0 .57
"UNITED ARAB EMIRATES" 1 0 .57
"UNITED ARAB EMIRATES" 2 0 .57
"UNITED ARAB EMIRATES" 2 0 .57
"UNITED ARAB EMIRATES" 2 0 .57
"UNITED ARAB EMIRATES" 3 0 .95
"UNITED ARAB EMIRATES" 4 0 .96
"UNITED ARAB EMIRATES" 4 0 .95
"UNITED ARAB EMIRATES" 4 0 .95
"UNITED ARAB EMIRATES" 4 0 .56
"UNITED ARAB EMIRATES" 4 0 .74
"UNITED ARAB EMIRATES" 4 0 .67
"UNITED ARAB EMIRATES" 4 0 .46
"UNITED ARAB EMIRATES" 5 0 .97
"UNITED ARAB EMIRATES" 5 0 .97
"UNITED ARAB EMIRATES" 6 0 .74
"UNITED STATES" 7 0 .51
"UNITED STATES" 7 0 .55
"UNITED STATES" 7 0 .63
"UNITED STATES" 7 0 .71
"UNITED STATES" 7 0 .65
"UNITED STATES" 7 0 .88
"UNITED STATES" 7 0 .89
"UNITED STATES" 7 0 .89
"UNITED STATES" 7 0 .86
"UNITED STATES" 7 0 .85
"UNITED STATES" 7 0 .86
"UNITED STATES" 7 0 .9
"UNITED STATES" 7 0 .46
"UNITED STATES" 7 0 .45
"UNITED STATES" 7 0 .45
"ARGENTINA" 8 0 .98
"ARGENTINA" 8 0 .98
"ARGENTINA" 8 0 .98
"ARGENTINA" 8 0 .99
"ARGENTINA" 8 0 .99
"ARGENTINA" 8 0 .99
"ARGENTINA" 9 0 .88
"ARGENTINA" 9 0 .53
"ARGENTINA" 9 0 .53999996
"ARGENTINA" 9 0 .55999994
"ARGENTINA" 9 0 .57
"ARGENTINA" 9 0 .98
"ARGENTINA" 10 0 1.2
"ARGENTINA" 10 0 1.33
"ARGENTINA" 10 0 1.43
"ARGENTINA" 10 0 1.4
"ARGENTINA" 11 0 .51
"ARGENTINA" 11 0 .32
"ARGENTINA" 11 0 .31
"ARGENTINA" 11 0 .31
"ARGENTINA" 11 0 .3
"ARGENTINA" 12 0 1.05
"ARGENTINA" 12 0 1.05
"ARGENTINA" 12 0 1.05
"ARGENTINA" 13 0 .97
"ARGENTINA" 13 0 1.01
"ARGENTINA" 13 0 .99
"ARGENTINA" 13 0 .99
"ARGENTINA" 13 0 .45
"ARGENTINA" 13 0 .4
"ARGENTINA" 13 0 .39
"ARGENTINA" 13 0 .39
"ARGENTINA" 14 0 1.02
"ARGENTINA" 14 0 1.04
"AUSTRIA" 15 0 1.06
"AUSTRIA" 15 0 1.06
"AUSTRIA" 15 0 1.06
"AUSTRIA" 15 0 1.03
"AUSTRIA" 15 0 1.03
"AUSTRIA" 15 0 1.01
"AUSTRIA" 15 0 0
"AUSTRIA" 15 0 0
"AUSTRIA" 15 0 1.03
"AUSTRIA" 15 0 1.02
"AUSTRIA" 16 0 .98
"AUSTRIA" 16 0 .99
"AUSTRIA" 16 0 1.01
"AUSTRIA" 17 0 1.1
"AUSTRIA" 17 0 1.1
"AUSTRIA" 17 0 1.1
"AUSTRIA" 17 0 1.1
"AUSTRIA" 17 0 1.17
"AUSTRIA" 17 0 .46
"AUSTRIA" 17 0 .47
"AUSTRIA" 17 0 .5
"AUSTRIA" 17 0 .52
"AUSTRIA" 17 0 .52
"AUSTRIA" 18 0 .49
"AUSTRIA" 18 0 .53
"AUSTRIA" 18 0 .51
"AUSTRIA" 18 0 .51
"AUSTRIA" 18 0 .96
"AUSTRIA" 18 0 .92
"AUSTRIA" 19 0 1.0699999
end
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#2

16 Aug 2023, 20:51

Sorry, I just noticed that I did not include the year in my previous dataset example. Here is a more accurate dataset:
input str20 Country int Year float firm_id byte IFRS_MADNATE float tax_avoidance
"UNITED ARAB EMIRATES" 2013 1 0 .81
"UNITED ARAB EMIRATES" 2014 1 0 .57
"UNITED ARAB EMIRATES" 2015 1 0 .57
"UNITED ARAB EMIRATES" 2016 1 0 .57
"UNITED ARAB EMIRATES" 2017 1 0 .57
"UNITED ARAB EMIRATES" 2018 1 0 .57
"UNITED ARAB EMIRATES" 2019 1 0 .57
"UNITED ARAB EMIRATES" 2017 2 0 .57
"UNITED ARAB EMIRATES" 2018 2 0 .57
"UNITED ARAB EMIRATES" 2019 2 0 .57
"UNITED ARAB EMIRATES" 2019 3 0 .95
"UNITED ARAB EMIRATES" 2013 4 0 .96
"UNITED ARAB EMIRATES" 2014 4 0 .95
"UNITED ARAB EMIRATES" 2015 4 0 .95
"UNITED ARAB EMIRATES" 2016 4 0 .56
"UNITED ARAB EMIRATES" 2017 4 0 .74
"UNITED ARAB EMIRATES" 2018 4 0 .67
"UNITED ARAB EMIRATES" 2019 4 0 .46
"UNITED ARAB EMIRATES" 2018 5 0 .97
"UNITED ARAB EMIRATES" 2019 5 0 .97
"UNITED ARAB EMIRATES" 2019 6 0 .74
"UNITED STATES" 2005 7 0 .51
"UNITED STATES" 2006 7 0 .55
"UNITED STATES" 2007 7 0 .63
"UNITED STATES" 2008 7 0 .71
"UNITED STATES" 2009 7 0 .65
"UNITED STATES" 2010 7 0 .88
"UNITED STATES" 2011 7 0 .89
"UNITED STATES" 2012 7 0 .89
"UNITED STATES" 2013 7 0 .86
"UNITED STATES" 2014 7 0 .85
"UNITED STATES" 2015 7 0 .86
"UNITED STATES" 2016 7 0 .9
"UNITED STATES" 2017 7 0 .46
"UNITED STATES" 2018 7 0 .45
"UNITED STATES" 2019 7 0 .45
"ARGENTINA" 2014 8 0 .98
"ARGENTINA" 2015 8 0 .98
"ARGENTINA" 2016 8 0 .98
"ARGENTINA" 2017 8 0 .99
"ARGENTINA" 2018 8 0 .99
"ARGENTINA" 2019 8 0 .99
"ARGENTINA" 2015 9 0 .88
"ARGENTINA" 2016 9 0 .53
"ARGENTINA" 2017 9 0 .53999996
"ARGENTINA" 2018 9 0 .55999994
"ARGENTINA" 2019 9 0 .57
"ARGENTINA" 2020 9 0 .98
"ARGENTINA" 2009 10 0 1.2
"ARGENTINA" 2016 10 0 1.33
"ARGENTINA" 2017 10 0 1.43
"ARGENTINA" 2018 10 0 1.4
"ARGENTINA" 2015 11 0 .51
"ARGENTINA" 2016 11 0 .32
"ARGENTINA" 2017 11 0 .31
"ARGENTINA" 2018 11 0 .31
"ARGENTINA" 2019 11 0 .3
"ARGENTINA" 2018 12 0 1.05
"ARGENTINA" 2019 12 0 1.05
"ARGENTINA" 2020 12 0 1.05
"ARGENTINA" 2010 13 0 .97
"ARGENTINA" 2013 13 0 1.01
"ARGENTINA" 2014 13 0 .99
"ARGENTINA" 2015 13 0 .99
"ARGENTINA" 2016 13 0 .45
"ARGENTINA" 2017 13 0 .4
"ARGENTINA" 2018 13 0 .39
"ARGENTINA" 2019 13 0 .39
"ARGENTINA" 2018 14 0 1.02
"ARGENTINA" 2019 14 0 1.04
"AUSTRIA" 2010 15 0 1.06
"AUSTRIA" 2011 15 0 1.06
"AUSTRIA" 2012 15 0 1.06
"AUSTRIA" 2013 15 0 1.03
"AUSTRIA" 2014 15 0 1.03
"AUSTRIA" 2015 15 0 1.01
"AUSTRIA" 2016 15 0 0
"AUSTRIA" 2017 15 0 0
"AUSTRIA" 2018 15 0 1.03
"AUSTRIA" 2019 15 0 1.02
"AUSTRIA" 2018 16 0 .98
"AUSTRIA" 2019 16 0 .99
"AUSTRIA" 2020 16 0 1.01
"AUSTRIA" 2010 17 0 1.1
"AUSTRIA" 2011 17 0 1.1
"AUSTRIA" 2012 17 0 1.1
"AUSTRIA" 2013 17 0 1.1
"AUSTRIA" 2014 17 0 1.17
"AUSTRIA" 2015 17 0 .46
"AUSTRIA" 2016 17 0 .47
"AUSTRIA" 2017 17 0 .5
"AUSTRIA" 2018 17 0 .52
"AUSTRIA" 2019 17 0 .52
"AUSTRIA" 2014 18 0 .49
"AUSTRIA" 2015 18 0 .53
"AUSTRIA" 2016 18 0 .51
"AUSTRIA" 2017 18 0 .51
"AUSTRIA" 2018 18 0 .96
"AUSTRIA" 2019 18 0 .92
"AUSTRIA" 2016 19 0 1.0699999
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#3

16 Aug 2023, 22:27

The key to DID estimation with an intervention that is introduced at different times in different places is a variable that is 1 for those observations in which the mandate is in place in the country in that year, and 0 otherwise. You do not need to identify "treatment" and "control" groups for this purpose, though that may be useful for descriptive purposes.

From what you describe in #1, I was expecting IFRS_MADNATE to be that key variable. But in your example data it is always 0, so I don't know whether you have it correctly constructed. If you do have it right, then the equation you show is almost the correct model. The only error in it is that the final term, the residual error, should be εit, because it is specific to both the firm and the time, not just the firm.
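
Written out with that correction, and otherwise keeping the notation of #1, the model would read roughly as follows. The subscript c (the country in which firm i operates) on the mandate variable is added here for illustration and is not in the original post.

```latex
Y_{i,t} = \alpha + \beta_1\,\mathrm{IFRS\_mandate}_{c,t} + \delta X_{i,t} + \mu_i + \pi_t + \varepsilon_{i,t}
```

Here \mu_i is the firm fixed effect, \pi_t is the year fixed effect, and \beta_1 is the generalized DID estimate.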
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#4

16 Aug 2023, 22:39

Thanks, Clyde, for your kind response. Regarding your inquiry, my independent variable is a dummy variable; the sample data I presented above happen to show only zeros, whereas other observations not shown here have 1. My question now is how to estimate the DID model in Stata given the model and data provided above.

Thank you for your support.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

17 Aug 2023, 08:39

Again, on the assumption that your variable IFRS_MADNATE is correctly constructed, the regression command to estimate your model in Stata is:

Code:

encode Country, gen(country)
xtset country Year
xtreg tax_avoidance i.IFRS_MADNATE i.Year, fe vce(cluster country)

The coefficient of 1.IFRS_MADNATE is your generalized difference in differences estimate of the effect of IFRS_MADNATE on tax_avoidance.
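
One caveat for firm-year data like the example in #2: several firms share each country-year, so declaring country and Year as the panel and time variables would stop with a repeated-time-values error. A minimal sketch of one way around this, assuming firm_id uniquely identifies firms and no firm changes country, is to make the firm the panel unit (which also matches the firm fixed effects in #1) while still clustering at the country level. The data below are simulated purely for illustration; none of the numbers come from the thread.

```stata
* Simulated firm-year data (hypothetical): 4 countries x 5 firms x 6 years
clear
set seed 2023
set obs 4
gen country = _n
* Hypothetical adoption years; countries 3 and 4 never adopt
gen adopt_year = cond(country == 1, 2007, cond(country == 2, 2009, .))
expand 5
gen firm_id = _n
expand 6
bysort firm_id: gen Year = 2004 + _n
gen byte IFRS_MADNATE = Year >= adopt_year
gen tax_avoidance = 0.7 - 0.03*IFRS_MADNATE + rnormal(0, 0.1)

* Firms are the panel unit, so each firm-year appears only once
xtset firm_id Year
* Firm and year fixed effects, standard errors clustered by country
xtreg tax_avoidance i.IFRS_MADNATE i.Year, fe vce(cluster country)
```

Four clusters is far too few for trustworthy cluster-robust standard errors; the toy data only show that the commands run in this layout.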
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#6

21 Aug 2023, 17:43

Thank you so much Clyde for your kind help.
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#7

21 Aug 2023, 19:04

Just a follow-up question, if I may. When I declare my data, the Stata output shows:
Panel variable: firm_id (unbalanced)
Time variable: Year, 2005 to 2020, but with gaps
Delta: 1 unit.

My question is: which Stata command is appropriate in this case when I run the difference-in-differences estimation, reg or xtreg?

My supervisor wants me to use this command: reg tax_avoidance IRFS_madnate treatment_group IRFS_madnate* treatment_group,r where treatment_group identifies firms that reside in countries with the IFRS mandate, but Stata gives me the following output:

reg tax_avoidance IRFS_madnate treatment_group IRFS_madnate* treatment_group,r
note: treatment_group omitted because of collinearity.
note: IRFS_madnate omitted because of collinearity.
note: treatment_group omitted because of collinearity.

Linear regression                               Number of obs     =     37,554
                                                F(1, 37552)       =      71.03
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0020
                                                Root MSE          =     .33685

---------------------------------------------------------------------------------
                |               Robust
  tax_avoidance | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------+----------------------------------------------------------------
   IRFS_madnate |  -.0326883   .0038785    -8.43   0.000    -.0402903   -.0250862
treatment_group |          0  (omitted)
   IRFS_madnate |          0  (omitted)
treatment_group |          0  (omitted)
          _cons |   .7140223    .003298   216.50   0.000     .7075582    .7204864
---------------------------------------------------------------------------------

How can I construct the correct treatment group and the time of treatment?

Thanks again for your guidance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#8

22 Aug 2023, 08:59

The proper construction of your variables depends on whether all of the firms that were subject to the mandate became subject to it at the same time or not. Whether -reg- or -xtreg- is more appropriate for your data also depends on this same thing. If you post back with that, advice can be given.

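With regard to the strange output you are getting from reg, this is because your syntax is wrong. You cannot put IRFS_madnate* treatment_group in a -regress- command to request an interaction: in a variable list, * is a wildcard rather than a multiplication operator, so Stata read it as IRFS_madnate and treatment_group each entered a second time, and the duplicated terms show up as omitted because of collinearity. The correct syntax for that command would be: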

Code:

reg tax_avoidance IRFS_madnate treatment_group IRFS_madnate#treatment_group,r
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#9

23 Aug 2023, 06:25

Thanks, Clyde, for your kind response. Just a follow-up question, if I may. Suppose that the UK introduced the mandate in 2008 and my sample runs from 2005 to 2020. I have a variable named Policy1Year in Stata, which holds only the policy implementation year (2008 for the UK). I would like to create a treatment variable equal to 1 for UK firm-year observations from 2008 through 2020, and 0 for 2005, 2006, and 2007. Because Policy1Year records only the year 2008 for the UK, I need to fill in these years to create the treatment variable.
Here is a short example of my data:
input str28 Country int(Policy1Year Year)
"ITALY" 2007 2005
"SWEDEN" 2005 2005
"BRAZIL" . 2005
"SWEDEN" 2005 2005
"FRANCE" 2005 2005
"UNITED STATES" . 2005
"JAPAN" 2005 2005
"JAPAN" 2005 2005
"SOUTH KOREA" 2012 2005
"GERMANY" 2005 2005
"UNITED STATES" . 2005
"FRANCE" 2005 2005
"UNITED STATES" . 2005
"INDIA" 2005 2005
"NETHERLANDS" 2006 2005
"FRANCE" 2005 2005
"UNITED STATES" . 2005
"JAPAN" 2005 2005
"UNITED KINGDOM" 2008 2005
"AUSTRALIA" 2005 2005
"CANADA" 2005 2005
"UNITED STATES" . 2005
"CANADA" 2005 2005
"AUSTRALIA" 2005 2005
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#10

23 Aug 2023, 10:03

Thank you for the additional information. The key here is that different countries implemented the policies in different years. So the key variable is one that is 1 at and after the year the policy goes into effect for that country and 0 otherwise. (And it is 0 in all observations for countries that never introduce the policy--which is what I assume is the meaning of those observations where Policy1Year has missing value.)

Code:

assert !missing(Year)
by Country (Year), sort: gen policy_in_effect = Year >= Policy1Year
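
To see why the never-adopters end up with 0, here is a small self-contained check on toy data. The rows are made up for illustration, reusing the UK adoption year of 2008 and the missing Policy1Year for the United States from #9. Stata treats a missing value as larger than any number, so the comparison is false whenever Policy1Year is missing.

```stata
* Toy data (hypothetical rows) to check how the indicator behaves
clear
input str14 Country int(Policy1Year Year)
"UNITED KINGDOM" 2008 2007
"UNITED KINGDOM" 2008 2008
"UNITED KINGDOM" 2008 2009
"UNITED STATES"  .    2007
"UNITED STATES"  .    2008
end

gen byte policy_in_effect = Year >= Policy1Year

* Missing Policy1Year sorts above every number, so never-adopters get 0
assert policy_in_effect == 0 if missing(Policy1Year)
* Adopters switch on in the adoption year and stay on afterwards
assert policy_in_effect == (Year >= 2008) if Country == "UNITED KINGDOM"
list, sepby(Country)
```

In this toy set the UK rows come out as 0, 1, 1 for 2007 to 2009, and both US rows are 0.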

From there, your (generalized) DID analysis will be based on that variable in a two-way fixed-effects model:

Code:

encode Country, gen(n_country)
xtset n_country Year
* cluster on the encoded (numeric) country variable rather than the string Country
xtreg outcome_variable i.policy_in_effect i.Year /*possibly add covariates*/, fe vce(cluster n_country)

Note: In your example data there are only 13 different countries. The use of cluster robust standard errors requires a larger number of Countries. If your real data has only 13 or some other relatively small number (there is no hard and fast guideline on the minimum number of clusters, but something like 50 or 100 is usually recommended), then just use ordinary standard errors.
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#11

23 Aug 2023, 19:54

Dear Clyde,
I cannot thank you enough for your help and support. Much appreciated.
Amani Shayo

Join Date: Oct 2023

Posts: 5
#12

17 Oct 2023, 08:52

Hello Clyde,
I have seen your answers regarding staggered DiD. I have tried the commands above, but it seems I am missing something.
I am currently working on a paper on the educational outcomes of issuing driver's licenses to undocumented immigrants. I have 19 states plus the District of Columbia that adopted driver's license reforms in different time periods, and my data set runs from 2000 to 2021. Because of the variation in treatment timing, I need to use staggered DiD, specifically wooldid.
I am new to this approach, so if it is OK with you, please guide me in undertaking my study; I am rather stuck with wooldid. I have just generated a driver's license reform variable to capture all the states:
gen driver=1 if statefip==6 & year>=2015
replace driver=1 if statefip==8 & year>=2015
replace driver=1 if statefip==9 & year>=2015
replace driver=1 if statefip==10 & year>=2016
replace driver=1 if statefip==15 & year>=2016
replace driver=1 if statefip==17 & year>=2014
replace driver=1 if statefip==24 & year>=2014
replace driver=1 if statefip==32 & year>=2014
replace driver=1 if statefip==34 & year>=2020
replace driver=1 if statefip==35 & year>=2003
replace driver=1 if statefip==36 & year>=2019
replace driver=1 if statefip==41 & year>=2019
replace driver=1 if statefip==49 & year>=2005
replace driver=1 if statefip==50 & year>=2014
replace driver=1 if statefip==51 & year>=2021
replace driver=1 if statefip==53 & year>=1993
replace driver=1 if statefip==11 & year>=2014
replace driver=0 if driver==.

Please help me with how to proceed with the approach by Wooldridge.
Thanks in advance
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#13

17 Oct 2023, 10:32

I'm sorry, but I am not familiar with the -wooldid- command. It is not part of official Stata, and I know nothing about it.
Amani Shayo

Join Date: Oct 2023

Posts: 5
#14

17 Oct 2023, 10:49

Thank you for your reply.
But can you please assist me with the current official Stata approach for staggered DiD?
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#15

17 Oct 2023, 10:57

Well, sure. You've already done the hardest part. That is, assuming that states 6, 8, and 9 adopted the policy in 2015, states 10 and 15 did so in 2016, etc., you already have the key variable set up.

To provide more advice, you need to describe your educational outcome variable. And you also need to explain the organization of your data set: is it a panel data set with state-level variables, or is it perhaps individual person-level data within states and by years? Are there any covariates ("control variables") you plan to include? Am I correct in assuming that the unit of time in your data is years? Does your data also include some or all of the states that never adopted the policy?
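
In the meantime, here is a rough sketch only of what the official route could look like. It assumes Stata 17 or later, where -didregress- is available, and it uses simulated state-year data. The outcome name educ_outcome, the adoption years, and the effect size are all placeholders invented for illustration; only the names statefip, year, and driver follow #12.

```stata
* Simulated state-year data (hypothetical): 6 states, 2000-2021
clear
set seed 12345
set obs 6
gen statefip = _n
* Hypothetical adoption years; states 5 and 6 never adopt
gen adopt = cond(_n <= 2, 2014, cond(_n <= 4, 2016, .))
expand 22
bysort statefip: gen year = 1999 + _n
gen byte driver = year >= adopt
* Placeholder outcome with a built-in effect of 0.5
gen educ_outcome = rnormal() + 0.5*driver

* Two-way fixed-effects DID with state and year effects
didregress (educ_outcome) (driver), group(statefip) time(year)
```

Whether this specification is adequate for the real study still depends on the answers to the questions above, in particular on whether the data are state-level or person-level.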