Panel data with fixed effects model having dummy/time-invariant variables in data set

Greg Wilson

Join Date: Sep 2019

Posts: 4
#1

29 Sep 2019, 04:25

Greetings,

I'm trying to fit a fixed-effects (FE) model to my panel data set. The panel consists of 116 companies observed over 5 years. Within these companies I measure characteristics of different managers (e.g. the CEO) as independent variables, together with a set of control variables, to see their effect on leverage (the dependent variable). The 116 companies do not all report the same number of managers, which is why I want to run a separate regression for each manager.

Furthermore, the variables are coded as following:
Dependent:
Leverage: as per average leverage for each year
Independent:
Gender: 0 Male, 1 Female (dummy)
Age: as per age by measuring point
Education: 1-4 depending on educational level
Experience: 0 if no experience, 1 if experience (dummy)
Tenure: as per tenure by measuring point
Misdem: 0 if none, 1 if (dummy)
Control variables:
Industry: 0-9 depending on type of industry (dummy) - unfortunately presented below are my old tests, showing industry as 1-10.
ROA: measured as %
Firm size: Ln(sales)

As you can see, across my 5-year time period there are many time-invariant variables (at least within my panel). For example, the Education variable does not vary: if the manager has educational level "2", this stays the same throughout my panel. All the dummy variables are also time-invariant for each manager.

From what I have read about how panel data regressions should be run, the most common options are pooled OLS or a panel regression using either a fixed effects (fe) or a random effects (re) model. As I understand it, panel regressions are superior to pooled OLS, which leaves me with the choice between fe and re. From what I understand further, whether fe or re should be used depends on a number of factors, but mainly on a Hausman test. To conduct the Hausman test, both regressions, fe and re, are run and then compared to determine which best fits one's data; a result that is significant at the 5% level means that fe should be preferred over re(?).

Anyway, to get to the problem..
I set up my panel using:

Code:

egen companynum = group(Company)
xtset companynum
xtset companynum Years, yearly

As I then try to run the fe regression using:

Code:

xtreg dep indep1 indep2 indep3 indep4 indep5 indep6 cont1 cont2 cont3 cont4 cont5, fe vce(robust)

or as in my case (using manager CEO):

Code:

xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, fe vce(robust)

I chose vce(robust) because of potential autocorrelation and heteroskedasticity, which I tested for with the Wooldridge and White tests respectively.

I get the following result:

From what I understand, this problem (omitted variables) arises because fe already accounts for everything that is time-invariant, which is why time-invariant variables don't work and get knocked out/omitted in regressions using fe.
Which presents my questions:
1. Is it possible to fix this so I can run an fe regression followed by a Hausman test? Or should I choose the re model / pooled OLS anyway, even though the Hausman test would probably say that fe is preferred, because this data set cannot be run as an fe regression?
2. Is my regression correct in terms of using "i." on Education (CEOEdu variable) and having it coded as 1-4?
3. Is my regression and reasoning correct in general, have I missed any steps in regards of the preparation of conducting this type of regression?
4. Not related to the regression or problem - is there a simple way to get the regression from STATA into some form of presentation type or other document type?

Important to add is that my knowledge about statistics (and STATA) is highly limited and time is limited to acquire knowledge, hence I have turned to this forum of great expertise for help.

I would very much appreciate fast help with how I should command my regression to get the correct output for my reporting of the results and to analyze the results. An explanation to why one regression is used instead of another with regards to my data set, if this is the case, would also be highly appreciated.

Thank you in advance and best regards,
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#2

29 Sep 2019, 12:36

The -hausman- command cannot be used following regressions with cluster-robust standard errors.
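For reference, the mechanics of the test on data where it does apply can be sketched as follows. This is only a sketch, assuming Stata's shipped example panel nlswork (loaded with webuse) and its variables ln_wage, age, tenure and hours; none of these come from the poster's data. Both models are fitted with conventional standard errors, because -hausman- refuses estimates that used vce(robust) or vce(cluster).

```stata
* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

* Fit both models with conventional (non-robust) standard errors
xtreg ln_wage age tenure hours, fe
estimates store fe

xtreg ln_wage age tenure hours, re
estimates store re

* Always-consistent estimator (fe) first, efficient estimator (re) second
hausman fe re
```

A small p-value rejects the hypothesis that the re estimates are consistent, which is the usual argument for preferring fe.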

But you can't use the Hausman test here anyway, and you can't use a fixed effects model to test your hypotheses regardless of what it would say. You have declared that you are interested in the effects of time-invariant attributes like education and sex. Those effects are simply not estimable in any fixed effects model. This is not some peculiarity of -xtreg-, nor of Stata. It is linear algebra and there is no way around it. So for your purposes, fixed effects modeling is out of the question.
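The point can be seen on any panel with time-invariant regressors. The following is a minimal sketch, again assuming the nlswork example data: race and grade do not change within a person there, so the within transformation (each value minus the person's own mean) turns them into columns of zeros and Stata should report them as omitted.

```stata
* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

* age and tenure vary over time; race and grade do not
xtreg ln_wage age tenure i.race grade, fe

* The "within" row of -xtsum- shows how much each variable
* varies over time inside a panel unit
xtsum age tenure grade race
```

In the poster's data the same check would be xtsum on CEOGen, CEOEdu, CEOExp and Industry after the xtset shown in #1.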

Going to OLS is generally not a good idea when you have panel structure in your data. I would consider it acceptable only if you first used a panel regression and the results showed that sigma_u is practically zero. Then it would be OK to use OLS. But that doesn't happen very often.
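One way to look at sigma_u is sketched below, assuming the nlswork example data and conventional standard errors. The stored results e(sigma_u), e(sigma_e) and e(rho) are left behind by -xtreg, re-, and -xttest0- runs the Breusch-Pagan Lagrange multiplier test of the hypothesis that the variance of the panel-level effect is zero.

```stata
* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

xtreg ln_wage age tenure grade, re

* Estimated variance components from the random-effects fit
display "sigma_u = " e(sigma_u)
display "sigma_e = " e(sigma_e)
display "rho     = " e(rho)

* Breusch-Pagan LM test of Var(u) = 0
xttest0
```

If sigma_u is practically zero, there is no panel-level variation left for the random effect to absorb and the re estimates are essentially the pooled OLS estimates.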

So you have a few choices. You can use -xtreg, re-. Or you can use -xthybrid-, which is available from SSC. This enables you to get both within- and between-panel estimates of the effects of all of the variables in your model. I think it is the best choice for your purposes.
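The idea behind a hybrid model can be sketched by hand with official commands only. This is an illustration under assumptions, not the -xthybrid- implementation: it uses the nlswork example data, and the names mean_* and dev_* are made up for the example. Each time-varying regressor is split into its panel mean (the between part) and the deviation from that mean (the within part), and both parts enter a random-effects model alongside the time-invariant variables.

```stata
* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

* Keep complete cases so the means match the estimation sample
keep if !missing(ln_wage, age, tenure, race)

* Split each time-varying regressor into between and within parts
foreach v of varlist age tenure {
    bysort idcode: egen mean_`v' = mean(`v')
    generate dev_`v' = `v' - mean_`v'
}

* dev_* coefficients are within estimates, mean_* coefficients are
* between estimates; time-invariant race stays in the model
xtreg ln_wage dev_age dev_tenure mean_age mean_tenure i.race, re vce(cluster idcode)

* Do the within and between effects differ?
test (dev_age = mean_age) (dev_tenure = mean_tenure)
```

-xthybrid- automates this kind of decomposition; after ssc install xthybrid, its help file documents the exact syntax.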
Greg Wilson

Join Date: Sep 2019

Posts: 4
#3

29 Sep 2019, 13:04

Hello Clyde,

Thank you for the fast reply.

Then I will look into using xthybrid and if not, go with the re. Should I cluster the sample when using re if so?

And only two other questions remain:
1. How should I use Education in the regression?
2. And is there a simple way to get the regression from STATA into some form of presentation type or other document type?

Thanks again,
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#4

29 Sep 2019, 13:15

Then I will look into using xthybrid and if not, go with the re. Should I cluster the sample when using re if so?

It is usually a good idea to do this with panel data. The only issue is whether you have enough clusters to make this valid. If there are fewer than 15, then most people would say you should not. If you have at least 50, most people will say you should. If you are in between, then it is a gray area, and if you ask 5 statisticians you will get 6 or more opinions on the matter.
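A panel of 116 companies is above the 50-cluster mark mentioned here. A minimal sketch of a clustered random-effects fit and of checking the cluster count, assuming the nlswork example data (with the data from #1 the option would be vce(cluster companynum)):

```stata
* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

* Random effects with standard errors clustered on the panel unit
xtreg ln_wage age tenure i.race, re vce(cluster idcode)

* Number of clusters the standard errors are based on
display "clusters: " e(N_clust)
```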

How should I use Education in the regression?

If you expect a monotone and more or less linear relationship between your outcome (leverage) and education, then enter it as a continuous variable. If that doesn't seem like a plausible data generating model, then enter it as a discrete variable, i.Education. The latter approach imposes no constraints on the form of the education-leverage relationship, but it is less efficient if there really is a monotone linear relationship.
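The two specifications can be compared side by side. The sketch below rests on assumptions: it uses the nlswork example data, and edu4 is a hypothetical four-level variable built from years of schooling purely for illustration.

```stata
* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

* Hypothetical 4-level education variable (1 = lowest, 4 = highest)
recode grade (0/11 = 1) (12 = 2) (13/15 = 3) (16/18 = 4), generate(edu4)

* (a) Continuous: one slope, assumes equal steps between levels
xtreg ln_wage c.edu4 age tenure, re vce(cluster idcode)

* (b) Discrete: one coefficient per level, relative to level 1
xtreg ln_wage i.edu4 age tenure, re vce(cluster idcode)

* Joint test of the level indicators, then the predicted outcome
* at each level to see whether the pattern looks monotone
testparm i.edu4
margins edu4
```

If the predictions from (b) rise or fall in roughly equal steps, the single slope in (a) summarizes the same relationship more efficiently.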

And is there a simple way to get the regression from STATA into some form of presentation type or other document type?

There are a number of both official Stata commands and user written commands devoted to preparing publication-ready or presentation-ready tables. I'm afraid I can't help you with them, though. I simply don't encounter this kind of thing much in my work flow. I usually generate the results and then pass them on to others for preparation of documents. Or, when I do write my own documents, I just copy/paste from my log into the document and edit it to my taste. I don't do that often enough to make it worth while to learn the commands that would automate the process. Perhaps somebody else who is following this thread can advise you on this.
Dan McArthur

Join Date: Feb 2018

Posts: 6
#5

30 Sep 2019, 05:34

The esttab command by Ben Jann (available on SSC) is a really good command for exporting regression tables.
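A minimal sketch of how that might look, assuming the nlswork example data; the model names re_plain and re_clust and the output file results.rtf are placeholders chosen for this example. esttab is part of the estout package on SSC, which also provides eststo for storing results.

```stata
* One-time install of the estout package if esttab is not yet available
capture which esttab
if _rc ssc install estout

* Sketch on Stata's example panel, not on the poster's data
webuse nlswork, clear
xtset idcode year

* Store two models under names of our choosing
eststo clear
eststo re_plain: xtreg ln_wage age tenure i.race, re
eststo re_clust: xtreg ln_wage age tenure i.race, re vce(cluster idcode)

* Show the table in the Results window with standard errors
esttab re_plain re_clust, se mtitles

* Write the same table to an RTF file that Word can open
esttab re_plain re_clust using results.rtf, se mtitles replace
```

The file extension given after using determines the output format, so the same call with a .csv or .tex filename produces a spreadsheet or LaTeX table instead.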
Greg Wilson

Join Date: Sep 2019

Posts: 4
#6

22 Oct 2019, 11:32

Thank you guys for the great tips!

Clyde,
I've conducted a new test using a clustered re regression. You said previously that if sigma_u is practically zero then it would be OK to use pooled OLS. Why is this? Is there a scientific reference for it?
As a matter of fact I've gotten sigma_u on one of my regressions using clustered re regression; please see the badly attached picture, as I printed it to Excel:

1. Should I consider using Pooled OLS instead or not?
For the other two regressions (managers) the sigma_u was ~40-45, so I'm guessing they're fine.

2. Another question of mine is the missing value of "Prob>chi2" or "Prob>F", it has come to my notice that they are missing in a few of my regressions.
What is the reason for this and should I be worried - how do I interpret the value "."?

3. As for the Education variable, I've heard that when using a nominal variable with a ratio-scale dependent variable (leverage), you interpret it by marginal effect instead of as a linear relationship. Does this mean anything for how I should conduct my tests with this variable? If I use it as a continuous variable, I will get just one coefficient, which can't be interpreted in terms of marginal effect? But yes, the hypothesis assumes that the higher the educational level, the lower the leverage. If that is true, would it affect how I conduct the test for any of my other variables?

Once again, thank you very much