Panel Data: Grouping dummy variables and creating an interaction term

Diego Spaey

Join Date: Jun 2022

Posts: 2
#1


18 Jun 2022, 03:25

Hello Everyone,

I recently started using Stata for a paper on companies' emissions (companies are identified by installation_id below) over 16 years. I am interested in measuring the impact of 3 reforms that happened at different points in time over those 16 years (2008, 2012, 2020). To that end, I want to compare emissions across companies before 2008, between 2008 and 2012, and from 2013 onward. My dataset is composed of 337 companies and 5000+ observations on emissions.

1. I understand that I should run a panel-data regression and create dummy variables for the 3 reforms. I believe I should first create a dummy variable for each year in the sample, but after that I get confused. Here is my first question:

- How do I group the "year" dummy variables into 3 "Phase" dummy variables?
Here is the code I tried and ran, but I am unsure of its relevance:

Code:

generate Phase1 = (year<2008)
generate Phase2 = (year>2007) & (year<2013)
generate Phase3 = (year>2012)
xtset installation_id year
xtreg Emissions Phase2 Phase3
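A quick sanity check on hand-made dummies like these (a suggested addition, assuming Phase1, Phase2 and Phase3 were generated exactly as above) is to confirm that every non-missing year falls in exactly one phase. Phase1 is deliberately left out of the regression: with a constant in the model, including all three dummies would be perfectly collinear, so Phase1 acts as the reference category. One caveat: Stata treats a missing value as larger than any number, so (year>2012) evaluates to 1 when year is missing; the loop below recodes such rows to missing.

```stata
* Assumes Phase1-Phase3 were created with the generate commands above.
* Each non-missing year should belong to exactly one phase.
assert Phase1 + Phase2 + Phase3 == 1 if !missing(year)

* Missing year counts as "greater than 2012" in Stata comparisons, so
* Phase3 would be 1 for those rows; set all three dummies to missing instead.
foreach v of varlist Phase1 Phase2 Phase3 {
    replace `v' = . if missing(year)
}
```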

2. Building on this, I want to add a dimension to the regression above: whether a company is listed on the stock exchange or not, i.e. whether listed companies have lower or higher emissions than non-listed ones. What term should I add to my model to reflect this?

I just started using Stata, which is why I struggle with this type of problem.

I hope I have provided enough information; let me know if I should clarify anything.
Cordially,

Diego
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#2

18 Jun 2022, 13:22

Unless you are using an ancient version of Stata, there is no reason to create dummy variables. Contemporary Stata has factor-variable notation. So:

Code:

label define phase 1 "Pre-2008" 2 "2008-2012" 3 "Post-2012"
gen byte phase:phase = 1 if year < 2008
replace phase = 2 if inrange(year, 2008, 2012)
replace phase = 3 if year > 2012 & !missing(year)
xtset installation_id year
xtreg Emissions i.phase
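Before running the regression, it can help to verify that each year landed in the intended phase. This is a suggested check, not part of the original reply, and it assumes phase was created with the code above; rows with a missing year appear in the "." column.

```stata
* Cross-tabulate year against the new phase variable, showing missing values.
tabulate year phase, missing

* Every observation with a non-missing year should have been assigned a phase.
assert !missing(phase) if !missing(year)
```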

Note: This type of analysis simply characterizes the emissions levels during those three phases. Assuming that this is all observational data, you cannot make causal inferences about the effectiveness of the reforms, as any differences observed may be attributable to other things that happened during those time periods. If you have a parallel group of otherwise comparable installations that were not subject to the reforms, then you can do a difference-in-differences analysis that might credibly get at causal inference.
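If such a comparison group exists, a difference-in-differences model could be sketched as follows. This is a hypothetical sketch: it assumes a variable named treated, which is not in the original data, coded 1 for installations subject to the reforms and 0 for comparable installations that were not, and it relies on the usual parallel-trends assumption (absent the reforms, both groups' emissions would have moved in parallel).

```stata
* HYPOTHETICAL: treated is an assumed 0/1 indicator, not part of the
* original dataset (1 = subject to the reforms, 0 = comparison group).
xtset installation_id year

* Installation fixed effects with cluster-robust standard errors.
* The phase#treated coefficients are the DiD estimates; the i.treated main
* effect does not vary within installation, so Stata reports it as omitted.
xtreg Emissions i.phase##i.treated, fe vce(cluster installation_id)

* Joint test that the reform-period differences are all zero.
testparm i.phase#i.treated
```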

Concerning the interaction with listed vs. unlisted: assuming you have a variable named listed that is 0 for unlisted companies and 1 for listed ones, you can include this interaction by coding:

Code:

xtreg Emissions i.phase##i.listed // NOTE ##, NOT # HERE
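To ask whether the phase pattern differs between listed and unlisted companies at all, a joint Wald test of the interaction terms can be run right after that regression. This is a suggested follow-up, assuming the xtreg command above was the most recent estimation.

```stata
* Run immediately after: xtreg Emissions i.phase##i.listed
* H0: all phase-by-listed interaction coefficients are zero, i.e. the
* change in emissions across phases is the same for both company types.
testparm i.phase#i.listed
```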

You will probably want to follow that up to see what the emissions actually were during each phase in both types of companies, for which the code would be:

Code:

margins i.phase#i.listed // NOTE #, NOT ## HERE
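Two further steps may make those results easier to read (suggested additions, assuming the margins command above was just run). marginsplot graphs the results of the most recent margins command, and margins with dydx() on a factor variable reports the discrete change from its base level, here the listed-minus-unlisted difference within each phase.

```stata
* Plot predicted emissions by phase, separately for listed and unlisted.
marginsplot

* Listed-minus-unlisted difference in predicted emissions, within each phase.
margins phase, dydx(listed)
```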
Diego Spaey

Join Date: Jun 2022

Posts: 2
#3

20 Jun 2022, 05:54

Hello Clyde,

Thank you for your answer, it is super useful! I will modify my model according to your feedback.
Have a good day!

Diego Spaey