Fixed effect versus clustered standard errors

Morten Gravesen

Join Date: May 2018

Posts: 18
#1

02 Aug 2018, 03:22

Hi, I am taking a chance asking here, as my teacher seems to be enjoying a nice vacation and is not answering my emails. I am writing my master's thesis, but I have a hard time understanding which regression model to use.

The dataset I am using has a panel structure: 1,000 firms (500 Swedish, 100 Danish, 200 Norwegian and 200 Finnish) with years ranging from 2004 to 2017. It is unbalanced and has gaps, because I have removed observations with missing values, book leverage above 1, total assets below 10 million dollars, or market-to-book ratios above 10.

The regression I am running is:

Code:

book_leverage(t) = a + b1*EFWAMB(t-1) + b2*Market-to-book(t-1) + b3*Tangibility(t-1) + b4*Profitability(t-1) + b5*Size(t-1) + e(t)

The results from different versions of this model can be seen in the attached table.

1. Which model should I trust?
2. I am confused as to why the OLS coefficients (column 1) are the same as those obtained by clustering the standard errors on both time and firm (column 9). I thought that by clustering on these two dimensions, I would remove serial correlation and heteroskedasticity, and that the coefficients would therefore differ from those of OLS.
3. I am also confused as to why the fixed effects estimates are so different from the OLS estimates.

In general, I find the literature on this matter very unsatisfying, as it is a LOT OF IFS and WHYS. There is never a clear answer to be had. My teacher says "use fixed effects", but when I ask why, he can't answer. He is one of those corporate finance dudes who just stick to fixed effects by default. However, fixed effects do not give me the results I am looking for. The paper I am following uses OLS with robust standard errors and Fama-MacBeth regressions, and its results are similar to the ones I get from doing the same, whereas the fixed effects model ruins the variable of interest, EFWAMB, which becomes small and insignificant.

So, if anybody could take a moment to reflect on my data and the variables included, and recommend which model to go with and why by answering questions 1, 2 and 3 above, I would be more than grateful.

In case you should ask for it, here is the Stata code used to estimate the models above:

OLS robust:

Code:

xtset gvkey year    // lag operators such as L1. require the panel to be declared
reg b_lev L1.efwamb L1.mb L1.tang L1.prof L1.size, robust

Fixed effects:

Code:

areg b_lev L1.efwamb L1.mb L1.tang L1.prof L1.size, absorb(gvkey)

Fixed effects with year dummies:

Code:

areg b_lev L1.efwamb L1.mb L1.tang L1.prof L1.size i.year, absorb(gvkey)

Random effects:

Code:

xtreg b_lev L1.efwamb L1.mb L1.tang L1.prof L1.size, re

Fama Macbeth cross-sectional:

Code:

xtfmb b_lev L1.efwamb L1.mb L1.tang L1.prof L1.size

The Fama-MacBeth two-pass regression is estimated manually by first running 1,000 time-series regressions, one per firm, which gives me 5*1,000 betas, using:

Code:

statsby, by(gvkey) saving(betas): reg b_lev L1.efwamb L1.mb L1.tang L1.prof L1.size
merge m:1 gvkey using betas
drop _merge

I then run 14 cross-sectional regressions, one for each year from 2004 to 2017, with the estimated betas from above as the new independent variables, which gives me 5*14 new coefficients (gammas), using:

Code:

statsby, by(year) saving(gamma): reg b_lev b1 b2 b3 b4 b5

I then open the gamma file and take the average of the 14 gammas for each of the five variables; these are the coefficient estimates reported in the table. To get the t-statistic, I simply divide each average by the square root of the variance of the 14 gammas divided by 14.

4. Why do I not get the same coefficients and t-stats as those calculated using the xtfmb command?
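A likely source of the discrepancy in question 4: the procedure above is the asset-pricing version of Fama-MacBeth (first-pass time-series betas, second-pass cross-sectional regressions on those betas), whereas, as I understand its documentation, the user-written -xtfmb- (from SSC) has no first pass. It regresses the dependent variable on the firm characteristics themselves in one cross-section per year, averages the yearly coefficients, and (without its lag() option) uses the standard deviation of the yearly coefficients divided by sqrt(T) as the standard error. The sketch below does that by hand on simulated data; x and the data-generating numbers are hypothetical stand-ins for the real regressors.

```stata
* Simulated stand-in data: 300 hypothetical firms observed 2004-2017
clear
set seed 2004
set obs 300
gen gvkey = _n
expand 14
bysort gvkey: gen year = 2003 + _n
gen x = rnormal()                              // hypothetical firm characteristic
gen b_lev = 0.3 - 0.05*x + rnormal(0, 0.1)     // true slope is -0.05
xtset gvkey year

* Step 1: one cross-sectional regression per year (14 regressions)
preserve
statsby _b, by(year) clear: regress b_lev x
* statsby stores the yearly slopes in _b_x and the intercepts in _b_cons

* Step 2: average the yearly slopes; t = mean / (sd / sqrt(T))
summarize _b_x
scalar fm_years = r(N)
scalar fm_b = r(mean)
scalar fm_t = r(mean) / (r(sd) / sqrt(r(N)))
restore

display "Fama-MacBeth slope = " fm_b "   t = " fm_t "   years = " fm_years
```

If -xtfmb- is installed (ssc install xtfmb), running -xtfmb b_lev x- on the same simulated data should reproduce fm_b and fm_t, assuming my reading of the command is right; that comparison is a quick check.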

Best regards,
Morten


Last edited by Morten Gravesen; 02 Aug 2018, 03:37.
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

02 Aug 2018, 09:24

The answer to your first question comes from substantive finance considerations, not statistics or Stata, so you will have to await your advisor's return (or seek advice from somebody else in finance who can give you a better answer.) Suffice it to say that from a statistical perspective, you should not be running multiple models like this: that decision should have been made before you ran any analyses at all (and, ideally, before you even set eyes on the data). And you certainly should not be selecting your model based on whether you like the results it produces.

Regarding question 2, the cluster robust standard error calculation corrects the standard errors for heteroscedasticity and serial correlation. It doesn't affect the coefficients at all. In fact, the analysis when you specify -cluster(gvkey)- begins by doing the ordinary regression, and then doing adjustments to the variance-covariance matrix afterwards. The coefficients themselves are not changed by this option.
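A minimal sketch of this on simulated data, where the firm and year identifiers and all numbers are hypothetical: -regress- returns exactly the same coefficient with robust, firm-clustered and year-clustered standard errors, and only the standard errors change. The two-way (firm and year) variance at the end is a hand-rolled version of the Cameron, Gelbach and Miller combination V_firm + V_year - V_firm-by-year, the kind of two-dimensional clustering discussed by Petersen (2009); in practice a dedicated user-written command would normally be used instead.

```stata
* Simulated panel: 200 hypothetical firms x 14 years
clear
set seed 12345
set obs 200
gen firm = _n
gen x_f = rnormal()          // persistent firm component of the regressor
gen u_f = rnormal()          // persistent firm component of the error
expand 14
bysort firm: gen year = 2003 + _n
gen x = x_f + rnormal()
gen y = 1 + 2*x + u_f + rnormal()

* Heteroskedasticity-robust only
regress y x, vce(robust)
scalar b_rob  = _b[x]
scalar se_rob = _se[x]
matrix V_fy = e(V)           // one obs per firm-year, so firm-by-year clusters are single obs

* Clustered by firm, then by year: the coefficient is computed the same way
regress y x, vce(cluster firm)
scalar b_firm  = _b[x]
scalar se_firm = _se[x]
matrix V_f = e(V)

regress y x, vce(cluster year)
scalar b_year = _b[x]
matrix V_y = e(V)

* Two-way clustered variance: V_firm + V_year - V_(firm x year)
matrix V_2 = V_f + V_y - V_fy
scalar se_2way = sqrt(el(V_2, 1, 1))    // row/column 1 is the coefficient on x

display "b:  robust " b_rob "   firm " b_firm "   year " b_year
display "se: robust " se_rob "   firm " se_firm "   two-way " se_2way
```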

Regarding question 3, the fixed effects model estimates the within-gvkey effects of the variables. The OLS model estimates a mix of within- and between effects. When the within- and between- effects are different, the OLS and fixed-effects estimates will be different. They can be very different, even having opposite signs. The following example is very artificial, but makes the point with unmistakable clarity:

Code:

clear
set obs 5
gen panel_id = _n
expand 2
set seed 1234
by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
by panel_id: gen x = panel_id + _n
xtset panel_id
xtreg y x, fe
regress y x

// GRAPH THE DATA TO SHOW WHAT'S HAPPENING
separate y, by(panel_id)
graph twoway connect y? x || lfit y x
Morten Gravesen

Join Date: May 2018

Posts: 18
#3

02 Aug 2018, 10:40

Hi Clyde, thank you for your response.

It is not as if I pick the one I like best. However, when people talk about panel data, they always seem to be fixed on the idea of either fixed or random effects.

Then I read a paper by Mitchell Petersen (2009) that compares different methods, and he mentions the method of clustering by two dimensions.

The issue is that I am following a paper by Baker and Wurgler (2002), and they use simple OLS with White standard errors and Fama-MacBeth regressions on their dataset. However, these methods only correct for heteroskedasticity and cross-sectional correlation in the residuals. They do not correct for serial correlation. Petersen argues that this is wrong. This is why I don't want to use the same methods as they do.

However, my advisor keeps blindly saying FIXED EFFECTS without giving me any explanation. Why fixed effects? What exactly does this model do? In what circumstances should I use fixed effects rather than clustering by two dimensions?

Clustering by two dimensions corrects the standard errors, but are the coefficients efficient at all?

By using the fixed effects model, I get completely different results from those reported by Baker and Wurgler (2002). This is why I don't like using it, because I am not smarter than those guys. Using their method of OLS with White standard errors and Fama-MacBeth gives me "correct" results, but again, these are inefficient because of serial correlation.

So again, which method should I use if I want to correct for heteroskedasticity, cross-sectional correlation and serial correlation, giving me unbiased standard errors AND efficient coefficients?

Initially I wanted to use fixed effects, but because I got results so different from those of Baker and Wurgler (2002), I decided not to. That is why I tested multiple models.

Best regards,
Morten Gravesen
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#4

02 Aug 2018, 11:25

Well, as I indicated earlier, I don't have the knowledge to respond to your question about which model is appropriate here. There are plenty of people in the finance community who are members of this Forum, and perhaps one of them will chime in with advice.

I will, however, remark on a more general point. Unless you are actually replicating a prior study using the same data that they used, the fact that you get different results does not necessarily mean you did something wrong. The scientific literature is bristling with studies that fail to replicate when the study is carried out again with new data gathered. There is a burgeoning literature on just this topic and the reasons for it are many. And while you may or may not be "smarter" than some other people, it is still possible that your results are "right" and theirs are "wrong." Certainly, a substantial discrepancy between your own results and those of prior similar studies is good cause to stop and ask questions. But you should ask those questions with equal seriousness about the other studies as you ask them about your own. Just because it's been published in a peer reviewed journal doesn't mean it's right, and just because your results are different, do not assume they are wrong. Investigate, but do so with an open mind.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#5

02 Aug 2018, 11:46

Morten:
Clyde gave excellent advice, as usual.
As far as the "fixed effects mantra" is concerned, the reason to use this specification sounds like "it is more interesting to look at variation within the same panel as time goes by than at differences between panels as time goes by".
That said, my experience tells me that it is often difficult to replicate others' research working from the published article alone. Sometimes reviewers failed to spot an error in the methods that carries over to the results; hence, you cannot replicate the authors' findings by applying the correct methods.
This is one of the reasons why Stata Press textbooks (with .dta and .do files and examples) are a useful source for learning Stata and statistics.

Kind regards,
Carlo
(Stata 19.0)
Jesse Tielens

Join Date: Jul 2018

Posts: 46
#6

02 Aug 2018, 12:26

Hi Morten,

I think the advice given by Clyde and Carlo is excellent. I'm a finance master's student myself, so maybe I could chime in a little.

The choice between fixed effects and random effects here would mostly come down to whether you're interested in a between estimator or a within estimator. In your case, you're interested in finding out what drives a firm's leverage ratio.
A fixed effects estimator essentially shows you how a firm's leverage ratio changes when its size, profitability, etc. change over time. The random effects model allows for the possibility that the relationship you're modelling is different for each firm.

You are modelling the relationship between a firm's leverage ratio and several firm-specific factors. Unless you have a reason to believe that this relationship works very differently for some firms than for others, a fixed effects model is the best choice. For example, you could say that the effect of firm size on leverage is smaller for firms in the pharmaceutical sector, because pharmaceutical firms are always highly leveraged no matter what their size (just an illustration). That would mean the relationship between leverage and size is different for different firms, and a random effects estimator could account for this difference.

Unless you have reason to believe the relationship you're documenting differs between panels, a fixed effects estimator would be the best choice. My best guess as to your professor's reasoning is that, since you haven't told him of such a reason, he believes the fixed effects model to be best.
Morten Gravesen

Join Date: May 2018

Posts: 18
#7

02 Aug 2018, 12:54

I am very grateful for all your answers. But to be clear, the choice is not between fixed effects and random effects but between fixed effects and OLS with clustered standard errors. I know that the latter corrects the standard errors for serial correlation, which I assume to be an issue in my data. However, I am worried that this model does not provide efficient coefficient estimates. Fixed effects, on the other hand, give me very odd results, very different from all the other literature out there (which uses simple OLS with White standard errors).
Jesse Tielens

Join Date: Jul 2018

Posts: 46
#8

02 Aug 2018, 13:25

A fixed effects model is essentially equivalent to running pooled OLS on the de-meaned model, in which each variable is expressed as a deviation from its firm's own mean. This way, you're only looking at variation within each firm over time and ignoring differences in levels between firms. As Clyde already mentioned, pooled OLS is much more like a random effects model in that regard: it is a mix of a within and a between estimator.

Looking at your results, this quickly becomes clear: the coefficient signs and significance for the pooled OLS and random effects models are not that different.
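A small simulated illustration of this equivalence (all names and numbers are hypothetical): subtracting each firm's own mean from y and x and running plain OLS gives the same slope as -xtreg, fe-. The standard errors from the demeaned -regress- are somewhat too small, because that regression does not account for the firm means that were estimated, which is one reason to let -xtreg, fe- or -areg- do the work.

```stata
clear
set seed 2018
set obs 100
gen firm = _n
gen a = rnormal()                     // hypothetical time-invariant firm effect
expand 10
bysort firm: gen year = 2007 + _n
gen x = a + rnormal()                 // regressor correlated with the firm effect
gen y = 1 - 0.5*x + 2*a + rnormal()
xtset firm year

xtreg y x, fe
scalar b_fe = _b[x]

* Within transformation by hand: deviations from each firm's own mean
bysort firm: egen ybar = mean(y)
bysort firm: egen xbar = mean(x)
gen y_dm = y - ybar
gen x_dm = x - xbar
regress y_dm x_dm
scalar b_dm = _b[x_dm]

display "xtreg, fe: " b_fe "   OLS on demeaned data: " b_dm
```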

Assuming 'EFWAMB' is the external finance weighted average market-to-book ratio, we can infer some things from your results. EFWAMB is high for firms that issue a lot of equity when their market valuation is very high, which is sort of evidence of market timing. You found a large negative coefficient on lagged EFWAMB in the pooled OLS and RE models; I think this means that firms that issued a lot of equity in the past year have a lower leverage ratio this year. The coefficient in the fixed effects model, by contrast, reflects how changes in a firm's lagged EFWAMB over time relate to changes in its leverage.

But which of these two questions is more relevant to your research?
-When a firm's EFWAMB changes, what is the effect of this change on its leverage?
-Do firms with a higher EFWAMB have higher or lower leverage across my time period?

The first question is best answered with a fixed effects model. The second requires a random effects or pooled OLS model, I'd say.

Last edited by Jesse Tielens; 02 Aug 2018, 13:31.
Morten Gravesen

Join Date: May 2018

Posts: 18
#9

02 Aug 2018, 13:44

Hi Jesse. Thanks again for your reply. You are correct that the EFWAMB is the weighted average market to book ratio, weighted by external finance in any given year.

What I am looking for is whether firms with a high EFWAMB have significantly lower current leverage. I am not interested in seeing an effect from any change. In that sense it is backward-looking, not used to predict future leverage.

What exactly is your reasoning for saying, that in my case then, I should use pooled OLS rather than fixed effects?

Kind regards, Morten
Jesse Tielens

Join Date: Jul 2018

Posts: 46
#10

03 Aug 2018, 04:08

At this point it's more about the theory behind the framework than about statistical knowledge. But perhaps Clyde Schechter or Carlo Lazzaro can confirm I'm not saying wrong things here.

If your hypothesis is that EFWAMB has a negative effect on the dependent variable, the leverage ratio, there are several ways to test this. You could say:
-If EFWAMB has a negative effect on leverage, then when a firm's EFWAMB increases, its leverage should decrease.

That is what a fixed effects model tests: within the panel (so the same firm), we check whether years with a high EFWAMB coincide with low leverage. On the other hand, you could also say the following:
-If EFWAMB has a negative effect on leverage, we should observe that firms with high EFWAMB have low leverage and firms with low EFWAMB have high leverage.

The random effects model also tests that, in addition to what a fixed effects model already tests. Because you're testing for quite different things, the results are of course very different. In this case, you could probably even say that the variation in EFWAMB is much greater between firms than within firms, and the FE model only uses the within-firm part. Firms are unlikely to suddenly obtain much more external financing; I'd guess it's a rather slow process. So, while there are a lot of differences between firms in their use of external financing, I would suspect there's a lot less change in how the same firm handles its financing over the years. That'd be my best bet as to why there's a very significant coefficient for EFWAMB in the POLS and RE models and a much weaker one in the FE model.
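The between-versus-within point can be checked directly with -xtsum-, which splits a variable's standard deviation into a between-firm and a within-firm part. The sketch below simulates a hypothetical EFWAMB-like variable that differs a lot across firms but moves little over time, the pattern guessed at above; on the real data one would run something like -xtsum efwamb mb tang size- after -xtset gvkey year-.

```stata
clear
set seed 99
set obs 300
gen firm = _n
gen level = rnormal(1.6, 0.5)          // hypothetical firm-specific level
expand 14
bysort firm: gen year = 2003 + _n
gen efwamb = level + rnormal(0, 0.1)   // small year-to-year movement around it
xtset firm year

xtsum efwamb
scalar sd_between = r(sd_b)
scalar sd_within  = r(sd_w)
display "between sd = " sd_between "   within sd = " sd_within
```

A within standard deviation that is small relative to the between one means fixed effects have little variation to work with, which is also relevant to the question about slowly moving variables later in the thread.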

But like I said, the choice for either model should be rooted in your theory and hypotheses.

Last edited by Jesse Tielens; 03 Aug 2018, 04:17.
Morten Gravesen

Join Date: May 2018

Posts: 18
#11

03 Aug 2018, 08:45

I must say that your answer completely confuses me.

What exactly do you mean when you say that the fixed effects model and pooled OLS are used for different objectives? I have read 10 chapters in different books and plenty of articles too, without finding any explanation whatsoever. To me it seems as if you talk about the fixed and random effects models outside their scope. I thought the fixed effects model was used to adjust for any unobservable firm-specific effect (constant over time) that is correlated with the explanatory variables, whereas the random effects model assumes that these unobservables are not correlated with the independent variables.

The external finance weighted average market-to-book ratio (EFWAMB) takes high values for firms that have issued equity in periods when their market-to-book ratio was high. Thus, if firms with high EFWAMB have lower leverage on average than firms with low EFWAMB, my model should give me a significantly negative coefficient. I am interested in point estimates, not predictions of change.

I am ONLY interested in knowing whether past attempts to time the market have a cumulative effect on current leverage. Baker and Wurgler (2002), who developed this ratio, do not explain the reasoning behind their choice of model AT ALL.

I have ruled out the random effects model, as I am thinking that things like managerial ability and attitude towards risk will influence profitability, size and the market-to-book ratio.

So to be clear, the choice is between a fixed effects model and pooled OLS with clustered standard errors.

My data cover 1,000 firms: 500 Swedish, 100 Danish, 200 Finnish and 200 Norwegian. They are selected from the Compustat Global database. The panel is unbalanced and has gaps. This is all I know about the data; now you know the same. And the hypothesis again: do firms with high EFWAMB have lower leverage than firms with low EFWAMB?

Morten Gravesen

Join Date: May 2018
Posts: 18

#12

03 Aug 2018, 09:05

The way EFWAMB is constructed, by weighting each year's market-to-book ratio by the firm's external finance in that year, divided by its total external finance up until that point in time (starting at time 0 in the sample), confuses me even further as to how I can use the fixed effects model. If there is any fixed effect from unobservable variables that influences the market-to-book ratio, this will create serial correlation in my residuals. And because EFWAMB is constructed from these market-to-book ratios, would I not remove any effect of this variable when using fixed effects?
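Written out, the construction described above makes EFWAMB in year t a running average of the firm's market-to-book ratios up to t, each weighted by that year's share of the firm's cumulative external finance. A minimal sketch, assuming hypothetical variables xfin (external finance in a year) and mb (market-to-book) on a four-year toy panel; it ignores complications such as zero or negative cumulative external finance, which would need an explicit rule.

```stata
clear
input gvkey year xfin mb
1 2004 10 1.5
1 2005 30 2.5
1 2006  0 1.0
1 2007 10 2.0
end

* Running sums within firm, in year order, from the first year in the sample
bysort gvkey (year): gen double cum_x    = sum(xfin)
bysort gvkey (year): gen double cum_x_mb = sum(xfin * mb)

* Weighted average of past market-to-book, weights = share of cumulative external finance
gen double efwamb = cum_x_mb / cum_x    // 1.5, 2.25, 2.25, 2.2
list gvkey year xfin mb efwamb
```

In the regression, L1.efwamb then uses the value accumulated through the previous year. Note how a year with no external finance (2006 here) leaves the value unchanged.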

Also, as market-to-book ratios, book leverage, size and tangibility do not vary much over time, can I even use fixed effects without losing some important information?

market-to-book  profitability  tangibility  size
1.636465        0.1699918      0.2760911    10.80538
1.659328        0.1671353      0.2467667    10.87995
1.169266        0.1664327      0.2586308    11.0566
1.62008         0.099755       0.2731158    10.93715
2.02516         0.1871107      0.2339925    11.0191
1.617319        0.1885505      0.2202311    11.10068
1.730845        0.1469131      0.2153826    11.07558
1.661407        0.1166909      0.1985463    11.06032
1.523745        0.11873        0.1896398    11.17008

Last edited by Morten Gravesen; 03 Aug 2018, 09:11.


Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#13

03 Aug 2018, 09:14

Quoting Morten: "And the hypothesis again: do firms with high EFWAMB have lower leverage than firms with low EFWAMB?"

For this hypothesis, the fixed effects model would be inappropriate, because the fixed-effects model specifically estimates changes within firms over time. The OLS model's estimates are a mixture of the within-firm effects (which this hypothesis does not address) and between-firm effects (which the hypothesis targets). Better still, I think, would be a between-effects analysis, using -xtreg, be- which provides a pure between-firms effect estimate.
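To make the between-effects idea concrete: as I understand it, -xtreg, be- is OLS on each firm's average values (unweighted unless its wls option is used), so in the simulated, hypothetical panel below it gives the same slope as -regress- on the collapsed firm means, while -xtreg, fe- recovers the different, within-firm slope built into the data.

```stata
clear
set seed 42
set obs 100
gen firm = _n
gen a = rnormal()                      // hypothetical unobserved firm trait
expand 8
bysort firm: gen year = 2009 + _n
gen x = a + rnormal()
gen y = 1 - 0.5*x + 2*a + rnormal()    // within-firm slope is -0.5
xtset firm year

xtreg y x, fe
scalar b_fe = _b[x]
xtreg y x, be
scalar b_be = _b[x]

* Between effects by hand: one observation per firm (its means), then OLS
preserve
collapse (mean) y x, by(firm)
regress y x
scalar b_means = _b[x]
restore

display "within (fe) = " b_fe "   between (be) = " b_be "   OLS on means = " b_means
```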
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#14

03 Aug 2018, 09:35

Morten:
as an aside to the previous excellent advice, please note a(nother) relevant difference between fixed effects (-fe-) and (pooled) OLS:
- the -fe- specification allows for limited endogeneity, that is, the time-invariant individual error component may be correlated with the vector of regressors;
- (pooled) OLS (just like the random effects specification) totally rules out correlation between the regressors and any error component.
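The difference in assumptions can be seen in a simulation where the unobserved firm effect is correlated with the regressor. This is a hypothetical data-generating process, not a model of leverage: the true within-firm slope is 0.5, pooled OLS is pulled away from it, random effects (which makes the same no-correlation assumption) is also pulled away, and fixed effects is not. Standard errors are clustered by firm throughout.

```stata
clear
set seed 7
set obs 500
gen firm = _n
gen a = rnormal()                 // unobserved, time-invariant firm effect
expand 10
bysort firm: gen year = 2007 + _n
gen x = a + rnormal()             // regressor correlated with the firm effect
gen y = 0.5*x + a + rnormal()     // true slope is 0.5
xtset firm year

regress y x, vce(cluster firm)
scalar b_ols = _b[x]
xtreg y x, re vce(cluster firm)
scalar b_re = _b[x]
xtreg y x, fe vce(cluster firm)
scalar b_fe = _b[x]

display "true 0.5   pooled OLS " b_ols "   RE " b_re "   FE " b_fe
```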
As a further aside, the fact that Baker and Wurgler (2002) (as per the FAQ, please give the full reference; I understand you're under heavy pressure, though) do not explain the reasoning behind their model choice (I do not know that paper, so I trust your words) will not shelter you from having to justify in your dissertation why you decided to go with (say) -fe- rather than (pooled) OLS (or the other way round).
I think I would follow Clyde's advice (assuming that you have excluded the random effects specification once and for all).

Kind regards,
Carlo
(Stata 19.0)
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#15

03 Aug 2018, 20:23

Regardless of whether you run a fixed effects model or an OLS model, if you have panel data you should use cluster-robust standard errors. If autocorrelation and heteroscedasticity are a problem, they are a problem regardless of the specification you use. Furthermore, cluster-robust standard errors are standard in finance and economics; theory aside, in practice you should never run a regression without them.

OLS measures differences between firms; for instance, the coefficient on firm size would measure the difference between large and small firms. This is subject to major endogeneity concerns in observational data, as small and large firms differ in many unobservable ways. The coefficient on size in a fixed effects model measures the difference between periods in the same firm when it had different sizes. It's much harder to argue that this change in size is correlated with other unobservable changes, as opposed to the inherent nature of the firm itself, which fixed effects remove. Thus fixed effects is usually the only plausibly consistent estimator. For this reason it's usually the only accepted choice of estimator in economics, finance and other disciplines dealing with observational data.