xtreg re (and xtreg fe) with many different dummy types (i.firm, i.industry, i.month, i.year)

Victoria Rogers

Join Date: Oct 2014

Posts: 138
#1

11 Oct 2014, 21:06

Dear readers,

I'm executing the following code:

Code:

xtset personID date    // date = combination of day, month and year
xtreg DV IVs i.firm i.industry i.month i.year, re vce(cluster firm)

cluster(industry) didn't work because then I received the error: "Panels are not nested within cluster"

DV= dependent variable ; IVs= independent variables

I was considering also adding i.fund, but now I'm wondering whether there's a better method to control for the fixed effects of funds, firms, industries, months and years, because the above code takes more than 1 hour to run (without i.fund).

It's about managers managing funds and their stock returns

EDIT:

Code:

xtreg DV IVs i.firm i.month i.year, re vce(cluster firm)

also takes longer than 1 hour

-year: I'm 100% sure that I need year fixed effects because of the crisis years.
-month: I'm 99% sure that I need month fixed effects because of the well-known December-January phenomena (closing of the year, start of the year, snow effect; see http://www.investopedia.com/terms/j/januaryeffect.asp) and the phenomenon called "go away in May and remember to come back in September".
-industry: 90% sure, because regulation differs per industry.
-firm: 80% sure, due to things like ?
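
One possible way to shorten the run time, sketched here under assumptions the thread does not confirm: -areg- absorbs one set of indicators instead of reporting a coefficient for each level. Note that this is OLS with firm fixed effects, not the random-effects model above, so it answers a somewhat different question. DV and IVs are the same placeholders as in the code above.

```stata
* Sketch only: DV and IVs are placeholders, as in the code above.
* areg absorbs the firm indicators instead of estimating one coefficient per firm.
areg DV IVs i.month i.year, absorb(firm) vce(cluster firm)
```

Only one variable can be absorbed this way, so the month and year indicators still enter as ordinary factor variables.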

Last edited by Victoria Rogers; 11 Oct 2014, 21:27.
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#2

11 Oct 2014, 22:45

Code:

xtreg DV IVs i.firm i.industry i.month i.year, re vce(cluster firm)

gives a P>|z| of 0.000 (p-value)

Code:

xtreg DV IVs i.industry i.month i.year, re vce(cluster firm)

gives a P>|z| of 0.211 (p-value)

So, is it wise to use all the dummy variables of the first code? Do you think that it's a good idea to also add i.fund as a control-variable?

The -xttest0- indicates that there's no random effect and that I should use OLS instead of -xtreg re-
When I use -xtreg fe- , ceteris paribus, then there is a random effect (according to fraction of variance due to u_i)

Even though it's strange that the random effect 'disappears', I would rather use -rreg- instead of the normal OLS.
However, is it a good idea to cluster(firm) with -rreg-? Example:

Code:

rreg DV IVs i.firm i.industry i.month i.year, cluster(firm)

is what I want to test, however there's no option for cluster(firm)

Last edited by Victoria Rogers; 11 Oct 2014, 23:33.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#3

12 Oct 2014, 02:59

Victoria:
- it is not a good idea to split the same query across different posts, as listers can find it difficult to follow you (and the likelihood of getting helpful replies decreases accordingly).
- you seem to be looking for the "golden code" for your problem, but the way you outline it is not that clear. You started out with a linear panel data analysis (with no continuous predictors?) and your doubt seemingly rests on: re or fe? You can test for the best specification of your linear panel data model via Hausman's test (which unfortunately does not support -vce(cluster)-):

Code:

xtreg DV IVs i.firm i.month i.year, fe
estimates store fe
xtreg DV IVs i.firm i.month i.year, re
estimates store re
hausman fe re

For more on Hausman's test (and its possible drawbacks) I would point you to -help hausman- and the related entry in the Stata 13.1 .pdf manual;
- -rreg- does not support -cluster(<whatyouwant>)- because it "performs one version of robust regression of depvar on indepvars", as reported in -help rreg-.

Kind regards,
Carlo
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#4

12 Oct 2014, 03:21

Thank you for the help. I wasn't sure if it would be better to ask many questions in 1 topic or to divide them over different topics. It indeed seems that the former would have been better.

At the moment, I'm focussing on gender which is either 1 (male) or 0 (female) but other predictors like market risk premium (continuous predictor) are also included.

I did perform Hausman tests to check which model I needed to use. It turned out to be re, which I expected because of the time-invariant variable gender that I'm interested in. However, after adding many different dummies to control for a lot of fixed effects (i.firm i.industry i.month i.year), -xttest0- indicates that there's no random effect anymore (with the dummies, the fraction of variance due to u_i becomes 0) and that I should use OLS instead of -xtreg re-.
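
For readers following along, the sequence described above can be sketched roughly as follows. DV and IVs are the placeholders from post #1, and the model is fitted with default standard errors here purely for illustration.

```stata
* Sketch only: DV and IVs are placeholders, as in post #1.
xtset personID date
xtreg DV IVs i.industry i.month i.year, re
xttest0    // Breusch-Pagan LM test; H0: Var(u_i) = 0
```

If sigma_u is estimated as 0, as happens here once i.firm is added, the test has nothing left to reject, which matches the "use OLS" message.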

Instead of using a normal OLS, it's probably better to use something like -rreg- (even though I prefer to work with -xtreg-).

My main concern is whether it's a good or bad idea to include that many different types of dummies (i.firm i.industry i.month i.year), and whether it's wise to also add i.fund.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

12 Oct 2014, 03:40

Victoria:
please see my reply to most of your questions at http://www.statalist.org/forums/foru...gher-r-squared #6

- using "normal OLS" instead of a linear panel model means ignoring the panel structure of your data. Any decent econometrics textbook highlights that pooled OLS (with -vce(cluster id)- standard errors to take multiple observations per id into account) can be an option in a limited set of instances. I would reconsider performing Hausman's test on your new model and see whether -xtreg, fe- is the way to go.
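
As a minimal sketch of the pooled OLS option mentioned above, with standard errors clustered on the panel identifier: DV and IVs are the placeholders from post #1, and personID is assumed to be the id variable used in -xtset-.

```stata
* Sketch only: DV and IVs are placeholders; personID is the panel id from post #1.
regress DV IVs i.month i.year, vce(cluster personID)
```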

Kind regards,
Carlo
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#6

12 Oct 2014, 04:09

I'm executing the following code:

Code:

xtset personID date    // date = combination of day, month and year
xtreg Excess_return male market_riskpremium i.firm i.industry i.month i.year, re vce(cluster firm)

cluster(industry) didn't work because then I received the error: "Panels are not nested within cluster"

DV= dependent variable ; IVs= independent variables

This is my most basic/simple regression but the question and aspects remain the same when I add more IVs including control-variables. It's about managers managing funds and their related stock returns.

The question is whether it's a good or bad idea to include that many different types of dummies (i.firm i.industry i.month i.year), and whether it's wise to also add i.fund.

After the Hausman tests on each model, Stata suggested -xtreg re- instead of -xtreg fe-, but thereafter I used -xttest0-, which indicated I should use OLS instead of -xtreg re-. I don't know what other Stata output you'd like to see. It's kind of a mess with all those dummies. I don't want to ignore the panel structure of my data; maybe I should just use -xtreg re- despite the -xttest0- result, since that test probably isn't well known.

dummies:
-year: I'm 100% sure that I need year fixed effects because of the crisis years.
-month: I'm 99% sure that I need month fixed effects because of the well-known December-January phenomena (closing of the year, start of the year, snow effect; see http://www.investopedia.com/terms/j/januaryeffect.asp) and the phenomenon called "go away in May and remember to come back in September".
-industry: 90% sure, because regulation differs per industry.
-firm: 80% sure, due to things like ?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

12 Oct 2014, 05:12

Victoria:
as far as "I don't know what other Stata output you'd like to see." is concerned, I mean something like the attachment.

Kind regards,
Carlo
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#8

12 Oct 2014, 06:09

Do you want the Stata output with or without the effect of the dummies?

I don't know why you would need that to solve my 2 questions, but I'll give you the important part at the end of this post.

-The question whether it's a good or bad idea to include that many different types of dummies (i.firm i.industry i.month i.year), and whether it's wise to also add i.fund.
-The question whether I should just use -xtreg re- based on the Hausman test and thereby disregard -xttest0-, which indicates that OLS would be a lot better (but which neglects my unbalanced panel data).

Someone on another website said the following about pooled OLS: "From what I have understood, the risk is that the coefficients will be correlated with the error term, thus making the estimates biased. There will be some form of endogeneity. Would it help if I include year dummies in the pooled OLS regression? It still wouldn’t capture the effects of varying intercept in the individual dimension, right?"

I don't want biased results (even though a little bias will always exist) and I mainly want to see the coefficient and significance of the intercept (alpha)

using i.industry i.month i.year without i.firm leads to

Code:

xtreg RminRF Male mrp smb hml mom i.industry i.month i.year, re vce(cluster firm)

http://i.imgur.com/5PM7u8J.png (p-value of 0.168 with positive sigma_u and rho) -->xttest0 recommends -xtreg re-

when I use i.industry i.month i.year and i.firm

Code:

xtreg RminRF Male mrp smb hml mom i.firm i.industry i.month i.year, re vce(cluster firm)

sigma_u and rho and the p-value become 0 -->xttest0 recommends OLS (because the random effect 'disappeared')

http://i.imgur.com/IP5MY9y.png

So, what would you do? Use i.firm or not and with which method?

Last edited by Victoria Rogers; 12 Oct 2014, 06:48.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

12 Oct 2014, 09:05

Victoria:
short answer: I would use

Code:

xtreg RminRF Male mrp smb hml mom i.industry i.month i.year, re vce(cluster firm)

Long answer: I would also investigate whether your problems are related to the fact that the default standard errors (SEs) for -xtreg, re- differ substantially from cluster-robust SEs, as this is a possible drawback of the standard Hausman test. Guidance on how to deal with this issue is given in http://www.stata.com/bookstore/micro...ata/index.html (pages 267-268).
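
One commonly described cluster-robust alternative to the standard Hausman test is an auxiliary (Mundlak-type) regression: add the panel means of the time-varying regressors to the random-effects model and test them jointly. The following is a simplified sketch and not necessarily the procedure on the cited pages. Variable names follow post #8; the mean_* variables are hypothetical helpers, and the means of the month and year indicators are left out for brevity, which matters in an unbalanced panel.

```stata
* Sketch only: variable names follow post #8; mean_* names are hypothetical.
* Male is time-invariant, so it has no separate panel mean.
xtset personID date
foreach v of varlist mrp smb hml mom {
    bysort personID: egen mean_`v' = mean(`v')
}
xtreg RminRF Male mrp smb hml mom mean_mrp mean_smb mean_hml mean_mom ///
    i.industry i.month i.year, re vce(cluster firm)

* Joint test of the panel means; rejection points toward fe rather than re
test mean_mrp mean_smb mean_hml mean_mom
```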

Kind regards,
Carlo
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#10

12 Oct 2014, 09:48

Thank you, Carlo. I've tried to find the possible drawback on Google because I'd prefer not to buy an expensive book for only 2 pages. Could you please tell me some more about the possible drawback, or how I can find information about the problem online (some related keywords, perhaps)?

(there are no page numbers when I check the small free part of the book after I clicked on your link)

EDIT: the number of observations drops dramatically from 140,000 to 20,000 when I use all the different types of dummies (so i.firm and i.industry included). I assume that it's wrong to use that regression output, even though the R-squared and p-value became a lot better. Could someone tell me whether it's okay to use that output?

It would be strange to drop i.firm or i.industry: strange to drop i.firm because I also have individual fixed effects due to -xtreg-, and strange to drop i.industry because I worked a lot to get that data.

Last edited by Victoria Rogers; 12 Oct 2014, 10:38.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

12 Oct 2014, 10:06

Victoria:
- I reported the page numbers of Cameron and Trivedi's textbook hoping that you could find it at your university library;
- googling with the string -robust hausman stata- will give you back some helpful results (especially some interesting Stata threads).

Kind regards,
Carlo
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#12

12 Oct 2014, 10:55

Thank you. I'm working on this project because my boss asked me to do it, so I don't have access to the library of my old university.

The number of observations drops dramatically from 140,000 to 20,000 when I include i.industry.
I assume that it's wrong to use that regression output, even though the R-squared and p-value became a lot better. Could you tell me whether it's okay to use that output while the other regression outputs in the same table contain 140,000 observations? (The 20,000 is the result of only having 21,000 observations with industry codes; the rest wasn't publicly available.)

Last edited by Victoria Rogers; 12 Oct 2014, 11:05.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#13

12 Oct 2014, 14:24

By your own words, you are caught between a rock and a hard place. You note that different industries are differently regulated, and you indicated that this is relevant to your outcome, so omitting industry seems a bad idea. Yet the industry variable is missing from 120,000 observations. I don't work in financial or economic data, so perhaps I am naïve, but it is hard for me to understand why industry would not be publicly available information for every firm. But, taking you at your word, the next question is what mechanism causes that information to be sometimes available publicly and sometimes not. If it represents some kind of Act of God, or if the original data collection by design restricted collection of that data to a random sample, then the missingness would be ignorable. But otherwise, analyzing only complete cases is likely to get you biased results. So I think you need to find out what's going on with this issue before you proceed farther.
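
A rough first look at the missingness mechanism could be sketched as below. It can only show evidence against ignorability and cannot prove it. Variable names follow the thread, and industry_missing is a hypothetical helper variable.

```stata
* Sketch only: variable names follow the thread; industry_missing is hypothetical.
generate byte industry_missing = missing(industry)

* Does the share of missing industry codes differ by year?
tabulate year industry_missing, row

* Is missingness associated with observed characteristics?
logit industry_missing Male i.year, vce(cluster firm)
```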
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#14

12 Oct 2014, 14:54

Thank you for the great help Clyde. I didn't realize that I should have clarified my problem situation better.

The industry information for mutual funds is only given for some quarters and I'm using monthly returns. So the maximum available industry data per 12 months of data is 4. However, lots of quarters of different funds don't have an industry code (Lipper classification) (on the famous economic data-site called WRDS) and sometimes mutual funds get a different industry code in the same year. So, there isn't a trustworthy way to get the industry codes for the other monthly observations. At least, I don't know a trustworthy, unbiased method.

I agree that using 20,000 observations instead of the 140,000 gives me a biased result, so it's indeed a very difficult situation unfortunately.
Hopefully someone knows a solution for this problem, or the most common method researchers use when they encounter this kind of problem.

EDIT: is it possible to automatically fill in monthly cells with an industry code if the same industry code is given for the quarter before and for the quarter after those monthly cells? To make it more clear:

January=code Valuegrowth
February=
March=
April=
May=
June=
July=code Valuegrowth
August=
September=
October=
November=
December=

It would be nice if someone knows code to automatically fill in the empty cells of February, March, April, May and June with the code Valuegrowth, while keeping the cells for August, September, October, November and December empty. If the code for July were different from the code for January, then the months in between should not be filled in automatically.
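
The rule described above could be sketched as follows: carry the last known code forward, carry the next known code backward, and fill a gap only where the two agree. This is a sketch under assumptions, not a tested solution for this dataset: fundID (fund identifier), mdate (numeric monthly date) and the helper variables are hypothetical names, and industry is assumed to hold the code, with empty cells stored as missing.

```stata
* Sketch only: fundID, mdate and the helper variable names are hypothetical.
* Last nonmissing code before each month (replace works down the data, so values cascade)
bysort fundID (mdate): generate prev_code = industry
bysort fundID (mdate): replace prev_code = prev_code[_n-1] if missing(prev_code)

* Next nonmissing code after each month: same idea with time reversed
generate negdate = -mdate
bysort fundID (negdate): generate next_code = industry
bysort fundID (negdate): replace next_code = next_code[_n-1] if missing(next_code)

* Fill only where both surrounding codes exist and agree
replace industry = prev_code if missing(industry) & !missing(prev_code) & prev_code == next_code

sort fundID mdate
drop prev_code next_code negdate
```

With this rule, months after the last known code stay empty because no later code exists to match against, and a gap between two different codes is left untouched.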

Last edited by Victoria Rogers; 12 Oct 2014, 15:53.
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#15

12 Oct 2014, 16:40

-mi impute pmm ivar, add(#) knn(2)- may be the solution; however, I'm not sure which value I must put between the parentheses after add.
http://www.stata.com/manuals13/mimiimputepmm.pdf

Does anyone know how this command works and whether it's indeed the solution for the above problem (with the months and the code)?