Panel Data which estimator to use: okay to ignore Hausman test?

Niels Meijer

Join Date: Aug 2017

Posts: 19
#1


14 Sep 2017, 09:38

Greetings,

This is my first post to this community. For context, I am a master's student in Finance working on my thesis.

I am investigating the effect of board characteristics on risk-taking in the banking industry. I have an international sample of approximately 200 banks across 46 countries, with data from 2002 to 2016. Data availability varies by bank, with an average of approximately 8.5 years per bank, so I am using (unbalanced) panel data analysis.

The data I have collected can be grouped the following way:
A: Proxies for bank risk (dependent variables); I am planning to estimate 4 otherwise identical models, each with a different risk proxy
B: Board Characteristics data (e.g., board size)
C: Bank level controls
D: Country level controls

Now I am wondering which estimator to use. It appears to come down to a choice between random effects (RE) and fixed effects (FE) estimators.

I have run a Hausman test, which rejects the null hypothesis; this would indicate that I should use the FE model. However, two papers analyzing data very similar to mine, with a similar research question, use a random effects model with the following rationale:

Quote from the paper by Pathan (2009) on a very similar topic:

The primary estimation method for Eq. (2) is generalized least square (GLS) random effect (RE) technique following Baltagi and Wu (1999) procedure. This technique is robust to first-order autoregressive (AR(1)) disturbances (if any) within unbalanced-panels and cross-sectional correlation and/or heteroskedasticity across panels. In the presence of unobserved bank fixed-effect, panel ‘Fixed-Effect’ (FE) estimation is commonly suggested (see Wooldridge, 2002, pp. 265–291, for details on FE estimation). However, such FE estimation is not suitable for this study for several reasons. First, time-invariant variable like GINDEX cannot be estimated with FE regression as it would be absorbed or wiped out in ‘within transformation’ or ‘time-demeaning’ process of the variables in FE. Second, FE estimation requires significant within panel (bank) variation of the variable values to produce consistent and efficient estimates. When the important variables on the right-hand side do not vary much over time, like the board structure variables in this paper, the FE estimates would be imprecise (Wooldridge, 2002, p. 286).[6] Third, FE estimates may aggravate the problem of multicollinearity if solved with least squares dummy variables (Baltagi, 2005). Finally, for large ‘N’ (i.e. 212) and fixed small ‘T’ (i.e. 8), which is the case with this study’s panel data set (observations on 212 BHCs over 8 years) FE estimation is inconsistent (Baltagi, 2005, p. 13). Furthermore, in case of a large N, FE estimation would lead to an enormous loss of degrees of freedom (Baltagi, 2005, p. 14). Thus, an alternative to FE, i.e. GLS RE is proposed here.

I do not have time-invariant variables, so the first argument should not be a problem for me. However, my data are obviously similar, with fairly weak within-panel (bank) variation in my explanatory variables, so the second point would also apply to me. The third and fourth points should apply to me as well.

What are your thoughts on this? I could really use some expert advice.

Another question: should I also include country dummies? I have already included several country-level control variables (e.g., GDP per capita, real interest rate). Is it okay to include both?

Kind regards,

Niels
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17850
#2

14 Sep 2017, 09:57

Niels:
welcome to the list.
Pathan's article (full reference, please) reports the usual nuisances about using -fe-.
A way to relax the restrictive (and sometimes unrealistic) assumption of no correlation between observed and unobserved variables in the -re- specification is Mundlak's approach, which is available as a user-written programme (type -search mundlak- from within Stata to install it).
As an aside, I always find it surprising that master's/PhD supervisors do not seem to be the first source of advice/discussion on these topics.

Kind regards,
Carlo
(Stata 19.0)
Niels Meijer

Join Date: Aug 2017

Posts: 19
#3

15 Sep 2017, 01:44

Thank you for your response. The full reference is: Pathan (2009). Strong boards, CEO power and bank risk-taking. Journal of Banking & Finance, 33. What exactly happens when you use the Mundlak approach, and is it academically justified to use it? I cannot find much about it. Or would it be justified to use a random effects GLS estimator like Pathan?

And coming back to my last question: "And another question, should I also include country dummies? I have already included several country control variables (e.g. GDP per capita, real interest rate etc) is it okay to include both?"

Regarding my supervisor: time with the supervisor is limited, and students are encouraged to use other sources (e.g., Statalist) for help.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17850
#4

15 Sep 2017, 02:10

Niels:
1) http://blog.stata.com/2015/10/29/fix...dlak-approach/. Moreover: Mundlak, Y. 1978: On the pooling of time series and cross section data. Econometrica 46:69-85;
2) you can include -i.country-, but I suspect that some collinearity issues will creep in;
3) I see. However, to avoid redoing things too many times, I would recommend agreeing your full research strategy (including the panel data regression specification) with her/him.

Kind regards,
Carlo
(Stata 19.0)
Niels Meijer

Join Date: Aug 2017

Posts: 19
#5

15 Sep 2017, 02:16

Thank you for your quick response, I will look into it.
daniel klein

Join Date: Mar 2014

Posts: 3911
#6

15 Sep 2017, 03:57

Not much to add, but I still wish to share my view on the arguments cited from Pathan (2009).

First, time-invariant variable like GINDEX cannot be estimated with FE regression as it would be absorbed or wiped out in ‘within transformation’ or ‘time-demeaning’ process of the variables in FE.

True. Usually this is not considered a problem in the theoretical framework of counterfactual causal inference. If a variable does not change it cannot have a causal effect in the sense it is defined within this framework. Personally, I would not be dogmatic about this framework.
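To make the 'time-demeaning' argument concrete, here is a minimal Python sketch of the within transformation on a hypothetical toy panel (the bank labels and the `board_size` and `gindex` values are invented for illustration). A regressor that never changes within a bank, like GINDEX in Pathan's study, becomes a column of zeros after demeaning, so FE has nothing left to estimate for it.

```python
from collections import defaultdict


def within_transform(ids, values):
    """Subtract each panel's (bank's) own mean from its observations."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    means = {i: sums[i] / counts[i] for i in sums}
    return [v - means[i] for i, v in zip(ids, values)]


# Hypothetical unbalanced toy panel: bank "A" observed 3 years, bank "B" 2 years.
bank = ["A", "A", "A", "B", "B"]
board_size = [10, 11, 12, 8, 8]  # changes over time within bank A
gindex = [5, 5, 5, 3, 3]         # constant within each bank (time-invariant)

print(within_transform(bank, board_size))  # some within variation survives
print(within_transform(bank, gindex))      # all zeros: wiped out by demeaning
```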

Second, FE estimation requires significant within panel (bank) variation of the variable values to produce consistent and efficient estimates. When the important variables on the right-hand side do not vary much over time, like the board structure variables in this paper, the FE estimates would be imprecise (Wooldridge, 2002, p. 286)

True. However, you need to think about whether you prefer an imprecise (still) consistent estimator over a precise (potentially) inconsistent one.
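Whether the FE estimates will be imprecise depends on how much of a regressor's variation is within banks rather than between them. Stata's -xtsum- reports overall, between and within standard deviations; the Python sketch below computes a similar split by hand, assuming a hypothetical `bank` identifier and invented board-size values. It is meant as an illustration of the idea, not as a replication of -xtsum-'s exact output.

```python
from collections import defaultdict
from statistics import mean, stdev


def between_within_sd(ids, values):
    """Split a regressor's variation into between-bank and within-bank parts."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(float(v))
    bank_means = {i: mean(vs) for i, vs in groups.items()}
    # Between: spread of the bank averages across banks.
    between = stdev(list(bank_means.values()))
    # Within: spread of each observation around its own bank's average.
    within = stdev([float(v) - bank_means[i] for i, v in zip(ids, values)])
    return between, within


# Hypothetical board-size data: large differences across banks, small changes over time.
bank = ["A", "A", "A", "B", "B", "B", "C", "C"]
board_size = [10, 10, 11, 15, 15, 15, 8, 9]

between, within = between_within_sd(bank, board_size)
print(f"between SD = {between:.2f}, within SD = {within:.2f}")
```

When the within SD is small relative to the between SD, as in this toy example, FE uses only the small within part, which is why its standard errors tend to be large.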

Third, FE estimates may aggravate the problem of multicollinearity if solved with least squares dummy variables (Baltagi, 2005).

Maybe, but then just do not estimate it this way. Compare the author's own first argument, where the within transformation is mentioned.
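The equivalence behind this reply can be checked numerically. In the Python sketch below (hypothetical toy data, with a small hand-written least-squares solver so that no third-party library is assumed), the slope from a least squares dummy variable (LSDV) regression with one dummy per bank matches the slope from the within-transformed regression, which needs no dummies at all.

```python
from collections import defaultdict


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def demean(ids, values):
    """Within transformation: subtract each bank's own mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]


# Hypothetical unbalanced toy panel (3 banks, 8 bank-years).
bank = ["A", "A", "A", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0, 2.5, 6.0]
y = [2.0, 2.9, 5.2, 7.1, 9.0, 1.0, 1.4, 5.3]

# LSDV: regress y on x plus one dummy per bank (no common constant).
labels = sorted(set(bank))
X_lsdv = [[xi] + [1.0 if bk == g else 0.0 for g in labels] for xi, bk in zip(x, bank)]
lsdv_slope = ols(X_lsdv, y)[0]

# Within estimator: demeaned y on demeaned x, no dummies needed.
xd, yd = demean(bank, x), demean(bank, y)
within_slope = sum(u * v for u, v in zip(xd, yd)) / sum(u * u for u in xd)

print(lsdv_slope, within_slope)  # identical up to floating-point rounding
```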

Finally, for large ‘N’ (i.e. 212) and fixed small ‘T’ (i.e. 8), which is the case with this study’s panel data set (observations on 212 BHCs over 8 years)

This might be a misconception. Fixed T does not refer to your observation window, but to the theoretical population. Whether N=212 is "large" is another question.

Furthermore, in case of a large N, FE estimation would lead to an enormous loss of degrees of freedom (Baltagi, 2005, p. 14).

True, but see comment on the second point.

One last word on Mundlak's approach. As useful and popular as this approach now seems to be, keep in mind that the coefficients for the time-constant variables still require the same exogeneity assumption underlying the RE model.

Edit:

Oh, I forgot: the Hausman test. Do not only look at the test statistic and the p-value. Compare the coefficients. Sometimes small differences will still lead to a rejection of the null, given a large sample. If the substantive results do not change much, then I would go with the RE model as the efficient estimator.
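The point about large samples follows from the form of the statistic. For a single coefficient, the Hausman statistic is (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), compared with a chi-squared distribution with 1 degree of freedom. The Python sketch below uses invented estimates (not results from this thread) to show that the same 0.002 difference in coefficients can move from clearly insignificant to significant once the standard errors shrink as they would with a roughly ten times larger sample.

```python
import math


def hausman_1d(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for one coefficient and its chi2(1) p-value.

    Assumes RE is efficient under the null, so Var(b_FE) - Var(b_RE) > 0.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for this formula")
    h = (b_fe - b_re) ** 2 / var_diff
    p = math.erfc(math.sqrt(h / 2.0))  # upper-tail probability of chi2 with 1 df
    return h, p


# Hypothetical estimates: the coefficients differ by only 0.002.
h_small, p_small = hausman_1d(0.052, 0.004, 0.050, 0.003)

# Same point estimates, standard errors shrunk as if the sample were 10x larger.
shrink = math.sqrt(10)
h_large, p_large = hausman_1d(0.052, 0.004 / shrink, 0.050, 0.003 / shrink)

print(f"H = {h_small:.2f}, p = {p_small:.3f}")
print(f"H = {h_large:.2f}, p = {p_large:.3f}")
```

The coefficients themselves are unchanged between the two calls, which is why comparing their substantive size matters more than the p-value alone.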

Best
Daniel

Last edited by daniel klein; 15 Sep 2017, 04:06.
Niels Meijer

Join Date: Aug 2017

Posts: 19
#7

15 Sep 2017, 04:13

Thank you for your reply Daniel, it is very helpful.

I have a question regarding the Mundlak test. I am trying to run it manually, as explained in the link Carlo provided. Should I also calculate the panel-level averages of my control variables (by the way, all my variables are time-varying)? I suspect I should, but I just want to make sure.
daniel klein

Join Date: Mar 2014

Posts: 3911
#8

15 Sep 2017, 04:25

If you want to use the Mundlak approach, you need the panel averages of all predictors. Note that the outcome is not transformed. Again, do not only look at p-values. Try to make sense of the estimated parameters.
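A minimal Python sketch of the manual Mundlak regression on a hypothetical toy panel: the bank-level average of the predictor is added as an extra regressor, while the outcome is left untransformed. For simplicity it uses pooled OLS with a small hand-written solver rather than the -re- estimator used in Stata, and it is not the code of the user-written mundlak command. With pooled OLS, the coefficient on x reproduces the within (FE) slope exactly; the Mundlak test then asks whether the coefficients on the averages are (jointly) zero.

```python
from collections import defaultdict


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def panel_means(ids, values):
    """Each observation's own-bank average (the Mundlak term)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [sums[i] / counts[i] for i in ids]


# Hypothetical unbalanced toy panel (3 banks, 8 bank-years).
bank = ["A", "A", "A", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0, 2.5, 6.0]
y = [2.0, 2.9, 5.2, 7.1, 9.0, 1.0, 1.4, 5.3]

# Mundlak regression: constant, x, and the bank mean of x; y is NOT transformed.
xbar = panel_means(bank, x)
const, b_x, b_xbar = ols([[1.0, xi, mi] for xi, mi in zip(x, xbar)], y)

# Within (FE) slope for comparison.
ybar = panel_means(bank, y)
xd = [xi - mi for xi, mi in zip(x, xbar)]
yd = [yi - mi for yi, mi in zip(y, ybar)]
fe_slope = sum(u * v for u, v in zip(xd, yd)) / sum(u * u for u in xd)

print(b_x, fe_slope)  # equal up to rounding
print(b_xbar)         # the Mundlak test asks whether this is zero
```

With several predictors, each one gets its own bank average, and the test becomes a joint Wald test on all the average terms.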

Best
Daniel
Niels Meijer

Join Date: Aug 2017

Posts: 19
#9

15 Sep 2017, 05:02

So "all predictors" includes the control variables, right? I have run the Mundlak test and it rejects the null hypothesis, which would indicate the use of FE.

However, I was planning to use year fixed effects in my regression, but it won't allow me to use i.year in the Mundlak regression. Does this affect the outcome of my Mundlak test? My year dummies are very strongly significant, which makes sense in this context (banks are riskier during financial crises, etc.).

Kind regards,

Niels