Controlling for fixed effects (time, industry, law)

Jaka Petric

Join Date: Jun 2019

Posts: 5
#1

04 Jun 2019, 06:46

Hi all,

I am running a form of OLS regression to determine the effect of national culture on firms' leverage ratio. My sample consists of some 2000 firms from 19 countries for the years 2010-2018, bringing the total number of observations to around 15000. My dependent variable is the leverage ratio and my independent variables are 6 proxies for culture (scores on cultural dimensions, by country). Moreover, I control for certain firm-specific (profitability, size, etc.) and country-specific variables (GDP, taxes, etc.) that have been shown to affect capital structure choices. I cluster standard errors at the firm level.
I also want to control for Year, Industry and Law fixed effects. My question is: what is the difference between doing so with dummies and doing so with the i. prefix? Basically, which of the following two regressions should I run, or are they essentially the same?

Code:

reg Leverage IndependentVariables FirmLevelControlVariables CountryLevelControlVariables YearDummies IndustryDummies LawDummies, cluster(Firm)

or

Code:

reg Leverage IndependentVariables FirmLevelControlVariables CountryLevelControlVariables i.Year i.Industry i.Law, cluster(Firm)

Thank you already in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

04 Jun 2019, 09:49

In terms of the regression command itself, these are equivalent. But the i.Year i.Industry i.Law version is better for a few reasons:

1. You don't have to write code to create the indicator variables ("dummies") yourself. Writing that code is a waste of your time, and offers opportunities for making mistakes.

2. After the regression, you can use the -margins- command to get useful statistics for interpreting the model if you use the i. notation.
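
As a rough illustration of point 2, here is a minimal sketch using the auto dataset that ships with Stata as a stand-in (it is not the poster's data; rep78 merely plays the role of a categorical variable such as Law):

```stata
* Stand-in data: the auto dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear

* rep78 is a single categorical variable; i. expands it into indicators
regress price mpg weight i.rep78

* Adjusted predictions of price at each level of rep78
margins rep78

* Joint test that all rep78 indicators are zero
testparm i.rep78

* The base (omitted) level can be changed without creating new variables
regress price mpg weight ib3.rep78
```

With hand-made indicator variables, -margins- has no way of knowing that the separate 0/1 variables describe one categorical variable, so the `margins rep78` step has no direct equivalent.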

That said, it may be that you aren't really interested in the Year, Industry, and Law effects themselves and are including them only because it is necessary to adjust for their nuisance effects on the outcome. In that case, you might be better off using a different command entirely:

Code:

reghdfe Leverage IndependentVariables FirmLevelControlVariables CountryLevelControlVariables, ///
    absorb(Year Industry Law) cluster(Firm)

That will run more quickly and will also suppress output for Year, Industry, and Law themselves, while properly adjusting for them in the calculations. -reghdfe- is written by Sergio Correia and is available from SSC.
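
A minimal sketch of how the two approaches compare, assuming the nlswork example dataset from the Stata website as a stand-in panel (idcode plays the role of Firm; year and ind_code play the role of the fixed effects):

```stata
* One-time installation from SSC (reghdfe depends on ftools)
ssc install ftools, replace
ssc install reghdfe, replace

* Stand-in panel data (an assumption for illustration, not the poster's data)
webuse nlswork, clear

* Fixed effects as factor variables: a coefficient is reported for every level
regress ln_wage age tenure i.year i.ind_code, vce(cluster idcode)
scalar b_regress = _b[tenure]

* Same fixed effects absorbed: only age and tenure appear in the output
reghdfe ln_wage age tenure, absorb(year ind_code) vce(cluster idcode)
scalar b_reghdfe = _b[tenure]

* The two point estimates should agree up to numerical precision
display "regress: " b_regress "   reghdfe: " b_reghdfe
```

The coefficients on the variables of interest should match. Standard errors may differ slightly in some designs because the two commands can apply different degrees-of-freedom adjustments.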
Jaka Petric

Join Date: Jun 2019

Posts: 5
#3

05 Jun 2019, 04:01

Dear Clyde, thank you so much for an extensive answer and also for an alternative!
I still have a question: my results (both coefficients and significance) are different when using i.Variable instead of variableDummies. The differences are not very big, but some previously significant variables do become insignificant and vice versa. If the two methods are the same, how could that be? Or perhaps there is an error on my side in creating the dummy variables?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

05 Jun 2019, 10:01

I am quite confident that the two results should be the same (not even small rounding errors). I can only conclude that there is some error in your code for one or both methods. If you show the code and outputs I will try to find where things go awry.
Jaka Petric

Join Date: Jun 2019

Posts: 5
#5

05 Jun 2019, 16:07

Dear Clyde, I have manually created dummy variables for the 3 fixed effects variables:
Year dummies (values 0, 1, 2, 3, 4, 5, 6, 7 and 8):
2010 = 0
2011 = 1
2012 = 2
2013 = 3
2014 = 4
2015 = 5
2016 = 6
2017 = 7
2018 = 8

Law dummies (values 0, 1, 2, 3 and 4):
French origin = 0
English origin = 1
German origin = 2
Scandinavian origin = 3
EU accession = 4

Industry dummies (values 0, 1, 2 and 3):
Services = 0
Manufacturing = 1
Advanced manufacturing = 2
Primary = 3

Note that the only thing I do prior to running the regressions below is import the Excel data file (I type no other code apart from the regression code):

Code 1 (IV...independent variable; CV...control variable):

Code:

reg leverage IV1 IV2 IV3 IV4 IV5 CV1 CV2 CV3 CV4 CV5 CV6 i.Lawdummies i.Industrydummies i.yeardummies , cluster (Companyname)

Code 2:

Code:

reg leverage IV1 IV2 IV3 IV4 IV5 CV1 CV2 CV3 CV4 CV5 CV6 Lawdummies Industrydummies yeardummies , cluster (Companyname)

If I run Code 1, 2 IVs and 2 CVs are insignificant (at the 10% level): IV2, IV3 and CV5, CV6.
If I run Code 2, 3 IVs and 2 CVs are insignificant (at the 10% level): IV1, IV2, IV3 and CV5, CV6.
All the coefficients are also different, including the constant (0.4025595 for Code 1 and 0.5374583 for Code 2).
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

05 Jun 2019, 17:35

What you are showing for the regressions in #5 is different from what you said in #1. And I'm not sure, now, that I understand either of them correctly. Let me try to set things out clearly. I will focus on Law, but the same considerations apply to Industry and Year.

You have to distinguish two different ways of representing Law. One is as a single variable that takes on values 0, 1, 2, 3, or 4. If that is what you have, then you must include it in the regression as i.Law. If you use Law without the i. prefix, then Stata will misinterpret it as a continuous variable and will give you incorrect results for everything.
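
To make the difference concrete, here is a small sketch using the auto dataset shipped with Stata as a stand-in (rep78 is a single variable coded 1 through 5, much like a Law variable coded 0 through 4):

```stata
* Stand-in data (an assumption for illustration, not the poster's data)
sysuse auto, clear

* Without i.: rep78 is treated as continuous, so a single slope is estimated
regress price mpg rep78

* With i.: one indicator per non-base level (four coefficients for five levels)
regress price mpg i.rep78
```

The first model forces the effect of moving from one category to the next to be the same at every step, which is rarely meaningful for arbitrary category codes. The second estimates a separate shift for each category relative to the base.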

If, on the other hand, you have four separate indicator ("dummy") variables for Law, say, Lawdummy1, Lawdummy2, Lawdummy3, and Lawdummy4, which are all 0 when Law = 0, and take on the value 1 when Law = 1, 2, 3, or 4, respectively, then you should enter them in the regression as Lawdummy1 Lawdummy2 Lawdummy3 and Lawdummy4 (which might be more conveniently done as Lawdummy*). You can put i. in front of those as well if you wish, but it serves no real purpose unless you plan to use these variables in the -margins- command following -regress-.
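
A minimal sketch of this second representation, again with the auto dataset as a stand-in; the repdummy names are hypothetical and only mirror the Lawdummy1 ... Lawdummy4 naming above:

```stata
* Stand-in data (an assumption for illustration, not the poster's data)
sysuse auto, clear

* Build one 0/1 indicator per non-base level of rep78 (level 1 is the base)
forvalues k = 2/5 {
    generate byte repdummy`k' = (rep78 == `k') if !missing(rep78)
}

* Separate indicator variables entered directly
regress price mpg repdummy*
scalar b_manual = _b[repdummy3]

* The same model written with factor-variable notation
regress price mpg i.rep78
scalar b_factor = _b[3.rep78]

* Both routes should give the same coefficient
display "manual: " b_manual "   factor: " b_factor
```

Provided the indicators are built correctly and use the same base level, the two regressions are the same model, so every coefficient and standard error should coincide.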

I can't tell from your description which of these situations you have, or whether you have both. If you have both, you must pick which you wish to use. I think using i.Law would make your life easier. It is particularly confusing to me because you are writing as if you have a single variable you call Lawdummies (plural!) and I don't know what you mean by that.

Added: If this does not answer your question, when you post back be sure to show an example of your data, using the -dataex- command, and also show the output Stata gave you from the regression commands.
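
For reference, a minimal sketch of a -dataex- call, shown on the auto dataset as a stand-in. With your own data in memory you would list the variables used in your regressions instead:

```stata
* Recent versions of Stata include dataex; on older versions it can be
* installed from SSC with: ssc install dataex

* Stand-in data (an assumption for illustration)
sysuse auto, clear

* Produce a copy-and-paste data example of the first 20 observations
dataex price mpg rep78 in 1/20
```

The output is a block of Stata code that others can paste into their own session to recreate the example data exactly.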
Jaka Petric

Join Date: Jun 2019

Posts: 5
#7

06 Jun 2019, 03:06

Dear Clyde,
You understood it correctly in your last response. I was indeed using the dummies in a way that made Stata misinterpret them as continuous variables, as you mention. It all makes sense now. Thank you so much for helping me out with this, I am just a beginner at Stata!
Chongxian Bi

Join Date: Jun 2021

Posts: 1
#8

30 Jun 2021, 14:03

Dear Petric
I am a postgraduate student researching the influence of national culture on cash holdings.
I am having trouble collecting data. You mentioned that your sample "consists of some 2000 firms from 19 countries for the years 2010-2018, bringing the total number of observations to around 15000."
I would like to know which sources you used: Eikon, Datastream, or something else?
Thanks
Ronaldo