How to create robust standard errors, clustered by firm ID

Philippe Foultier

Join Date: May 2020

Posts: 12
#1

26 May 2020, 04:26

Hi guys,

I am currently working with a panel data set and want to run a simple pooled OLS regression. I have included time fixed effects and now, following a reference paper, want to report "Robust standard errors, clustered by firm ID". I am quite new to Stata and wonder 1) how to run a pooled OLS regression correctly (although I may already have found the way to do so myself) and, most crucially, 2) how to create these robust standard errors clustered by firm ID, which is where I am stuck. I am not an expert in econometrics and would appreciate any help with this. Thank you so much!
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#2

26 May 2020, 06:45

Technically, if you include time fixed effects, you cannot refer to the regression as pooled OLS.

Code:

regress depvar indvars i.time, cluster(firmID)
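
For readers who want something they can run as is, here is a minimal sketch on the public nlswork example data (the same file used later in this thread). It assumes ln_wage, age and tenure as stand-ins for depvar and indvars, and idcode as a stand-in for the firm ID; none of these names come from the original question. vce(cluster idcode) is the currently documented spelling of the option; the older cluster() form shown above is still accepted.

```stata
* Illustration only: nlswork stands in for a firm-year panel
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Conventional (non-robust) standard errors, with year dummies
regress ln_wage age tenure i.year

* Same coefficients, but standard errors robust and clustered by idcode
regress ln_wage age tenure i.year, vce(cluster idcode)

* Number of clusters used for the variance estimate
display e(N_clust)
```

Only the standard errors change between the two regressions; the point estimates are identical.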
Philippe Foultier

Join Date: May 2020

Posts: 12
#3

26 May 2020, 07:13

Originally posted by Andrew Musau

Technically, if you include time fixed effects, you cannot refer to the regression as pooled OLS.

Code:

regress depvar indvars i.time, cluster(firmID)

Thank you so much for the response Andrew! I will refrain from using that term, then, thanks for the suggestion. Also, I wonder, what exactly do you mean by "i.time" in your code?

Best,

Philippe
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

26 May 2020, 07:30

Philippe:
-i.time- tells Stata that -time- (your year variable) is a categorical variable; it implies the use of -fvvarlist- notation.

Last edited by Carlo Lazzaro; 26 May 2020, 07:35.

Kind regards,
Carlo
(Stata 19.0)
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#5

26 May 2020, 07:32

In #1, you state

I have included time fixed effects

The way to do this using regress is to include time dummies. In my example, the time variable is called time, and i.time includes the time dummies that capture the time fixed effects. If your time variable is named year, then you use "i.year". See factor variable notation:

Code:

help fvvarlist
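
As a rough illustration of what the i. prefix does, the sketch below again uses the nlswork example data, with ln_wage and age as placeholder variables (they are assumptions for the example, not part of the question). With i.year, Stata creates one indicator per distinct year internally and omits one as the base, so there is no need to build or drop dummies by hand.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* i.year: one indicator per distinct year, lowest year omitted as the base
regress ln_wage age i.year

* Joint test that all year effects are zero
testparm i.year

* ib88.year: same set of indicators, but year 88 is the omitted base
regress ln_wage age ib88.year
```

Changing the base year changes how the individual year coefficients are read (each is a difference from the base) but not the fit of the model.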

Nb. Crossed with Carlo's helpful reply.
Philippe Foultier

Join Date: May 2020

Posts: 12
#6

26 May 2020, 07:50

Thank you Carlo and Andrew for your responses! I have uploaded my file for a bit of context. Specifically, column L is my dependent variable, columns F-J are my independent variables, column R is another independent variable, columns T to AK are the independent variables representing my time dummies (2018-2001), and columns AM to AU represent another type of dummies ("property type" dummies). For both my property type and time dummies, I would prefer the effects to be fixed, which I assume I can achieve by leaving one year out of the time dummies and one property type out of the property type dummies.

I am having some trouble coding this, especially the property type dummies. Would the code below be roughly correct for my goal? My goal is to run an OLS regression using the above dependent and independent variables, while ensuring that the standard errors are robust and clustered by companyID (companyID is column A in my Excel sheet).

Code:
regress Dogan2019Lev indvars (columns F to J) i.time(YR2018) i.time(YR2017) ... i.time(YR2001) Healthcare_REIT ... Specialty_REIT, cluster(companyID)

Best,

Philippe

Last edited by Philippe Foultier; 26 May 2020, 08:27. Reason: removed spreadsheet
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

26 May 2020, 08:01

Philippe:
as the FAQ reminds us, please do not post attachments; use CODE delimiters and/or -dataex- to share what you're doing.
In addition, nobody on this forum will ever download spreadsheets, due to the risk of active content. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Philippe Foultier

Join Date: May 2020

Posts: 12
#8

26 May 2020, 08:06

Dear Carlo,

My apologies! I will try and rephrase my data into a valid format!
Philippe Foultier

Join Date: May 2020

Posts: 12
#9

26 May 2020, 08:21

I have attempted to use dataex; unfortunately, I have too many variables for it. I tried dropping a large portion of the observations to see whether it would work then, but the error message remained. So I'll quickly summarize my data in words here.

I have one dependent variable:
- Dogan2019Lev

I have 5 regular independent variables here:
- FirmSize
- Growthopps
- Profitability
- InterestCoverage
- Tangibility

Further, I have a group of time dummies, whose effect I'd like to be fixed. The time span is 2001 to 2018, so I plan to exclude one year to ensure the 'fixed' effect. The time dummies are as follows, each taking the usual values 0 and 1:
- YR2018
- YR2017
- YR2016
- YR2015
- YR2014
- YR2013
- YR2012
- YR2011
- YR2010
- YR2009
- YR2008
- YR2007
- YR2006
- YR2005
- YR2004
- YR2003
- YR2002
- YR2001

I next have one other independent dummy variable that represents the legal status of a certain country, with this variable represented as 0 and 1 as well. This dummy variable is very important, as I am interested in finding out the effect of the variable upon the dependent variable;
- NoLeverage

The last set of dummy variables concerns property 'types', with them representing the usual 0 and 1 if a certain property type status is met;
- DiversifiedREIT
- HealthCareREIT
- HotelREIT
- IndustrialREIT
- OfficeREIT
- ResidentialREIT
- RetailREIT
- SelfStorageREIT
- SpecialtyREIT

My data are panel data: yearly company observations over 2001 to 2018. The companies are identified by a "CompanyID" variable, which simply assigns a number to each business.

Using these dependent and independent variables, including my dummy variables, I would like to run an OLS regression, but crucially keep my time and property type effects fixed and have robust standard errors clustered by CompanyID.

I currently created this code, but I am not certain if it'll work out perfectly;

regress Dogan2019Lev* FirmSize Growthopps Profitability InterestCoverage Tangibility i.time(YR2018) i.time(YR2017) ... i.time(YR2001) Diversified_REIT ... Specialty_REIT, cluster(companyID)

* = my dependent variable; all other variables are my independent variables.

Last edited by Philippe Foultier; 26 May 2020, 08:26.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#10

26 May 2020, 09:06

Philippe:
I'm (still) not clear about what keeps you from switching to -xtreg-, since you have panel data.

Last edited by Carlo Lazzaro; 26 May 2020, 09:19.

Kind regards,
Carlo
(Stata 19.0)
Philippe Foultier

Join Date: May 2020

Posts: 12
#11

26 May 2020, 09:12

Dear Carlo,

You are right. I am reading more into xtreg as code and it looks like this is indeed optimal for my panel dataset, thank you!

Also, I am sorry to bother you again, but did I specify the other variables in my code correctly?

Best,

Philippe
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#12

26 May 2020, 09:26

Philippe:
I think that your data are not in -long- format (which is the best data layout for almost all Stata procedures).
For instance, you should have a unique -year- variable (i.e., your -timevar-) that you should use to -xtset- your dataset before running -xtreg-.
In addition, the fixed effect you're looking for is actually the panel-wise heterogeneity.
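
A minimal sketch of that workflow on the nlswork example data follows. Here idcode plays the role of the panel (firm) identifier, and ln_wage, age, tenure and race are placeholder variables chosen for illustration; they are assumptions, not the variables from this thread.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Declare the panel structure: panel identifier first, then the time variable
xtset idcode year

* Panel (idcode) fixed effects plus year dummies, with standard errors clustered by idcode
xtreg ln_wage age tenure i.year, fe vce(cluster idcode)

* race does not vary within idcode, so its indicators should be
* reported as omitted: the panel fixed effects already absorb them
xtreg ln_wage age tenure i.race i.year, fe vce(cluster idcode)
```

The second command shows a general property of the -fe- estimator: any regressor that is constant within a panel unit cannot be estimated separately from the panel fixed effect. That is worth keeping in mind if dummies such as property type never change within a firm.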

Kind regards,
Carlo
(Stata 19.0)
Philippe Foultier

Join Date: May 2020

Posts: 12
#13

26 May 2020, 09:38

Dear Carlo,

Thank you for your response. I do have my data in the long format, with one observation per year, per firm, and data for all my required variables. I have already set this up in Excel. Given that, would my code be valid?

Best,

Philippe

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

#14

26 May 2020, 09:42

Philippe:
you should have one categorical variable only for -year- and not so many dummies:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
. list idcode year in 1/10

     +---------------+
     | idcode   year |
     |---------------|
  1. |      1     70 |
  2. |      1     71 |
  3. |      1     72 |
  4. |      1     73 |
  5. |      1     75 |
     |---------------|
  6. |      1     77 |
  7. |      1     78 |
  8. |      1     80 |
  9. |      1     83 |
 10. |      1     85 |
     +---------------+

.
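
If the data currently hold one 0/1 dummy per year (and per property type), a single categorical variable can be rebuilt from them. The following is only a sketch on made-up toy data: the two firms and three years are invented for the example, and the names year and proptype are hypothetical. The dummy names YR2001 and so on, DiversifiedREIT and HealthCareREIT follow the ones listed in #9.

```stata
* Toy data: 2 hypothetical firms observed in 2001-2003, stored the "dummy" way
clear
set obs 6
generate CompanyID = ceil(_n/3)
bysort CompanyID: generate t = 2000 + _n
forvalues y = 2001/2003 {
    generate byte YR`y' = (t == `y')
}
generate byte DiversifiedREIT = (CompanyID == 1)
generate byte HealthCareREIT  = (CompanyID == 2)
drop t

* Collapse the year dummies into one categorical variable
generate int year = .
forvalues y = 2001/2003 {
    replace year = `y' if YR`y' == 1
}

* Same idea for property type; the numeric codes 1, 2, ... are arbitrary labels
generate byte proptype = .
local k = 0
foreach v of varlist DiversifiedREIT HealthCareREIT {
    local ++k
    replace proptype = `k' if `v' == 1
}

* Every observation should now have exactly one year and one property type
assert !missing(year, proptype)
list CompanyID year proptype
```

With the real data, the loop bounds would be 2001/2018 and the varlist would name all nine property-type dummies. After that, i.year and i.proptype can replace the long lists of dummies in the regression.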

Kind regards,
Carlo
(Stata 19.0)


Philippe Foultier

Join Date: May 2020

Posts: 12
#15

26 May 2020, 09:51

Carlo, thank you for your response. That is interesting. So I should create one variable that simply takes the values 2001-2018, and then use the following code to create my fixed-effect year dummies?

use "https://www.stata-press.com/data/r16/nlswork.dta"
. list idcode year in 1/18
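
A note for later readers: the two commands above only load and display Stata's example dataset; they do not create anything in your own data. Once a single year variable exists, the factor-variable prefix i. does the rest. Below is a hedged sketch of how the pieces discussed in this thread might fit together. The data are simulated purely so the example runs on its own, proptype is a hypothetical categorical variable for property type, and the coefficients in the data-generating line are arbitrary.

```stata
* Simulated stand-in for the panel described in #9 (values are random)
clear
set seed 12345
set obs 50                                   // 50 hypothetical firms
generate CompanyID = _n
generate byte proptype   = 1 + mod(_n, 9)    // 9 property types, fixed per firm
generate byte NoLeverage = mod(_n, 2)        // 0/1 dummy, fixed per firm
expand 18                                    // 18 yearly observations per firm
bysort CompanyID: generate int year = 2000 + _n
foreach v in FirmSize Growthopps Profitability InterestCoverage Tangibility {
    generate `v' = rnormal()
}
generate Dogan2019Lev = 0.5*FirmSize - 0.2*NoLeverage + rnormal()

* OLS with year and property-type dummies, standard errors clustered by firm
regress Dogan2019Lev FirmSize Growthopps Profitability InterestCoverage ///
    Tangibility NoLeverage i.year i.proptype, vce(cluster CompanyID)
```

Note that Stata variable names are case sensitive, so CompanyID and companyID would be two different variables.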