Problems with panel data: significant results under pooled OLS but no significance under both xtreg be or fe

martijn hoorn

Join Date: May 2018

Posts: 4
#1

23 May 2018, 04:10

In my research I try to explain the level of capex/sales by CEO overconfidence (a dummy variable) and various controls. I use a panel dataset containing 259 firms over a period of 11 years.

I started with the Hausman test to check whether random effects was an appropriate specification. It wasn't (in finance it almost never is, to my understanding).
I then went on with a fixed-effects analysis (using the Stata command xtreg, fe), but the results were far from significant. Using pooled OLS I find significance at 1% or better (and yes, I told Stata the data are a panel before I used the regress command). I then expected to find very significant results in the between-effects analysis (using the Stata command xtreg, be), because the output from the OLS regression should be driven by at least one of the two (within and/or between effects). However, the between-effects results were also insignificant.

So I find significance at 1% or better using OLS, where I capture both within and between effects. But when I analyze them separately, I find no effect at all.

I tried clustering on company id and using the robust option.

To check that the fixed-effects regression was working properly, I also tried the least-squares dummy variable approach (a dummy for every firm). This of course gave exactly the same output as the xtreg, fe command.

From what I read online, fixed effects is generally preferred over pooled OLS (I also performed a Hausman test to check whether random effects could be used). However, I read some papers stating that when an independent variable changes slowly over time (as overconfidence does), fixed effects can fail to detect an effect even when one is there.

To wrap up:
1) Can I use pooled OLS?
2) Is there something wrong with my analysis: significance with pooled OLS, but no significance with either between or within effects?
3) What kind of tests/commands can I use to check this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17699
#2

23 May 2018, 08:43

Martijn:
interested listers can have only a weak insight into what's the matter with your data unless you share what you typed and what Stata gave you back (as per FAQ). Thanks.
You should not go for the most significant regression model, but for the one that gives the fairest and truest representation of the data generating process: see what others did in the past when presented with the same research goal.
Pooled OLS rarely outperforms -xtreg- when it comes to panel data analysis (see -xtreg- entry in Stata .pdf manual for further details).
See also -help xttest0-.

Kind regards,
Carlo
(Stata 19.0)
martijn hoorn

Join Date: May 2018

Posts: 4
#3

23 May 2018, 09:20

Thanks for your reply. Regarding your comment about sharing what I typed and what Stata gave back: do you mean like this?

* I run a random-effects model (xtreg, re)

-estimates store random_effects

-xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

CAPEXSALES[Companykey,t] = Xb + u[Companykey] + e[Companykey,t]

Estimated results:
          |       Var     sd = sqrt(Var)
----------+-----------------------------
CAPEXSA~S |  .0203019       .1424849
        e |  .0020315       .0450721
        u |   .007848       .0885889

Test: Var(u) = 0
chibar2(01) = 1290.89
Prob > chibar2 = 0.0000

-xtreg , fe

Fixed-effects (within) regression               Number of obs     =      1,170
Group variable: Companykey                      Number of groups  =        222

R-sq:                                           Obs per group:
     within  = 0.0317                                         min =          1
     between = 0.0006                                         avg =        5.3
     overall = 0.0117                                         max =         11

                                                F(16,932)         =       1.91
corr(u_i, Xb)  = 0.0090                         Prob > F          =     0.0166

F test that all u_i=0: F(221, 932) = 40.94                   Prob > F = 0.0000

-hausman . random_effects

------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

chi2(16) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 53.90
Prob>chi2 = 0.0000

From this I would conclude that fixed effects is the better model (preferred over both pooled OLS and xtreg, re). However, there is considerable debate about using firm fixed effects when your independent variable varies little or slowly over time. The R^2 reported by the fixed-effects analysis is also very low.

I do not really care about the significance; I just found it very odd that I find results with pooled OLS but with neither the between nor the within estimator. Am I incorrect to say that at least one of those two should be significant if the pooled OLS is significant?
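
One way to see why the answer need not be yes is the textbook decomposition for a single regressor. This is only a sketch under simplifying assumptions (one regressor, no controls, and a between regression weighted by the number of years T_i per firm); the symbols are introduced here for illustration and do not come from the posts in this thread.

```latex
% Within, between and total sums of squares of regressor x (firm i, year t)
S_{xx}^{W} = \sum_i \sum_t (x_{it} - \bar{x}_i)^2, \qquad
S_{xx}^{B} = \sum_i T_i (\bar{x}_i - \bar{x})^2, \qquad
S_{xx}^{T} = S_{xx}^{W} + S_{xx}^{B}

% The cross-products S_{xy} split the same way, which defines the two estimators
\hat\beta_{W} = \frac{S_{xy}^{W}}{S_{xx}^{W}}, \qquad
\hat\beta_{B} = \frac{S_{xy}^{B}}{S_{xx}^{B}}

% Pooled OLS is then a weighted average of the within and between slopes
\hat\beta_{OLS} = \frac{S_{xy}^{W} + S_{xy}^{B}}{S_{xx}^{W} + S_{xx}^{B}}
                = \lambda\,\hat\beta_{W} + (1 - \lambda)\,\hat\beta_{B},
\qquad \lambda = \frac{S_{xx}^{W}}{S_{xx}^{T}}
```

So the pooled point estimate is pulled between the within and between estimates, but nothing comparable holds for the p-values: pooled OLS uses all firm-year observations at once, and its default standard errors treat them as independent. With controls the weights become matrices, and an unweighted between regression on an unbalanced panel breaks the exact identity, so this is a guide rather than an exact accounting.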
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17699
#4

23 May 2018, 09:30

Martijn:
your approach to test -fe- vs -re- specification is right (for the future, please share what you typed and what Stata gave you back via CODE delimiters - see the FAQ on that).
The argument against using -fe- when time-varying regressors change slowly over time makes sense, but should be contrasted against the risk of having biased coefficients under -re-, if -re- is not the right specification for your data.
Finally, I cannot tell whether you clustered your standard errors on -panelid- in pooled OLS: hence, I cannot comment on your results (that you did not show).

Kind regards,
Carlo
(Stata 19.0)
martijn hoorn

Join Date: May 2018

Posts: 4
#5

23 May 2018, 10:34

So I did not use OLS with cluster(panelid). I am "familiar" with using clustered standard errors with the xtreg command, but are they also used for pooled OLS? From my understanding there is a rule of thumb for deciding whether to use clustered standard errors: if the standard errors are larger when clustered, you should use the clustered ones (I don't know whether the same holds for pooled OLS).

OK, now some code (and output) using code delimiters:

Code:

xtset Companykey Year

Code:

regress CAPEXSALES Ceooverconfidence Leverage lnTotalAssets ROA laggedROA PercentageofTotalSharesOwned OwnershipSQ Tenure TenureSQ Sector15 Sector20 Sector25 Sector30 Sector35 Sector40 Sector45 Sector50 Sector55 Sector60 crisis, robust

Linear regression Number of obs = 1,716
F(20, 1695) = 52.26
Prob > F = 0.0000
R-squared = 0.5693
Root MSE = .10822

----------------------------------------------------------------------------------------------
| Robust
CAPEXSALES | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
Ceooverconfidence | .0175321 .0052063 3.37 0.001 .0073207 .0277435
Controls
Sector fixed effects
Crisis dummy (=1 2008-2009-2010)

Now without robust

Code:

. regress CAPEXSALES Ceooverconfidence Leverage lnTotalAssets ROA laggedROA PercentageofTotalSharesOwned OwnershipSQ Tenure TenureSQ Sector15 Sector20 Sector25 Sector30 Sector35 Sector40 Sector45 Sector50 Sector55 Sector60 crisis

----------------------------------------------------------------------------------------------
CAPEXSALES | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
Ceooverconfidence | .0175321 .0058593 2.99 0.003 .0060399 .0290243

Now with cluster

Code:

regress CAPEXSALES Ceooverconfidence Leverage lnTotalAssets ROA laggedROA PercentageofTotalSharesOwned OwnershipSQ Tenure TenureSQ Sector15 Sector20 Sector25 Sector30 Sector35 Sector40 Sector45 Sector50 Sector55 Sector60 crisis, cluster(Companykey)

| Robust
CAPEXSALES | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
Ceooverconfidence | .0175321 .0101312 1.73 0.085 -.0024193 .0374836

The standard errors are larger. Should I prefer this model?

Now when using dummies for every firm

Code:

. regress CAPEXSALES Ceooverconfidence Leverage lnTotalAssets ROA laggedROA PercentageofTotalSharesOwned OwnershipSQ Tenure TenureSQ _IPERMNOCOD_* crisis

----------------------------------------------------------------------------------------------
CAPEXSALES | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
Ceooverconfidence | .0032198 .0061016 0.53 0.598 -.0087491 .0151887

Now when using xtreg within effect followed by between

Code:

xtreg CAPEXSALES Ceooverconfidence Leverage lnTotalAssets ROA laggedROA PercentageofTotalSharesOwned OwnershipSQ Tenure TenureSQ Sector15 Sector20 Sector25 Sector30 Sector35 Sector40 Sector45 Sector50 Sector55 Sector60 crisis, fe

R-sq:                                           Obs per group:
     within  = 0.0235                                         min =          1
     between = 0.0487                                         avg =        6.7
     overall = 0.0402                                         max =         11

                                                F(10,1450)        =       3.48
corr(u_i, Xb)  = 0.1075                         Prob > F          =     0.0002

----------------------------------------------------------------------------------------------
CAPEXSALES | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
Ceooverconfidence | .0032198 .0061016 0.53 0.598 -.0087491 .0151887

Indeed, this is the same as pooled OLS with dummies for every firm.

Code:

xtreg CAPEXSALES Ceooverconfidence Leverage lnTotalAssets ROA laggedROA PercentageofTotalSharesOwned OwnershipSQ Tenure TenureSQ Sector15 Sector20 Sector25 Sector30 Sector35 Sector40 Sector45 Sector50 Sector55 Sector60 crisis, be

----------------------------------------------------------------------------------------------
CAPEXSALES | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
Ceooverconfidence | .0278997 .0174407 1.60 0.111 -.0064604 .0622599

So there is some or even strong significance when using pooled OLS (depending on the standard errors), but no significance with either the between or the within estimator.

To me the little-variation argument makes sense in explaining why I do not find results when using FE. However, this does not explain why I do not find results under BE.
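
To line up the three estimators on one sample, a side-by-side table can help. The sketch below is an illustration on Stata's example panel nlswork (loaded with webuse, which needs internet access), not on the data in this thread; ln_wage, union, tenure and age are stand-ins for CAPEXSALES, Ceooverconfidence and the controls.

```stata
* Illustration only: nlswork stands in for the firm panel discussed above
webuse nlswork, clear
xtset idcode year

* Pooled OLS with standard errors clustered on the panel identifier
regress ln_wage union tenure age, vce(cluster idcode)
estimates store pooled

* Within (fixed-effects) estimator, also with panel-clustered standard errors
xtreg ln_wage union tenure age, fe vce(cluster idcode)
estimates store within

* Between estimator: a regression on panel means (conventional standard errors)
xtreg ln_wage union tenure age, be
estimates store between

* Coefficient and standard error of the variable of interest, side by side
estimates table pooled within between, keep(union) b(%9.4f) se(%9.4f)
```

In the output posted above, the pooled coefficient (.0175) lies between the within (.0032) and between (.0279) coefficients, and once the pooled standard errors are clustered its p-value (0.085) is close to the between p-value (0.111). That pattern is consistent with the unclustered significance reflecting understated standard errors rather than an effect that the other two estimators miss.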
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17699
#6

23 May 2018, 11:28

Martijn:
the reason for invoking clustered standard errors in pooled OLS is that each panel is composed of non-independent observations.
That said, I would find a regression strategy aimed at hunting for "the most significant" regression model at risk of not passing muster with any average reviewer.

Kind regards,
Carlo
(Stata 19.0)
martijn hoorn

Join Date: May 2018

Posts: 4
#7

23 May 2018, 12:16

Indeed, my reviewer does not care about significance and told me to be wary of p-hunting, so I am not interested in that. For me it comes down to two questions.

Do you know how to formally test whether my overconfidence variable indeed does not have enough variation for a fixed-effects analysis?

Do you have an explanation for why pooled OLS shows significance but xtreg BE and FE do not? (I do not care about BE; I just ran it to explain the low significance under FE, figuring that at least one of the two should reproduce my pooled OLS findings. Again, I am not interested in significance, just confused about my results.)
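
On the first question, a common starting point is descriptive rather than a formal test: look at how much of the dummy's variation is within firms and how many firms ever switch. The sketch below again uses Stata's example panel nlswork instead of the thread's data, with union standing in for a slowly changing dummy such as Ceooverconfidence; sd_union and tag are hypothetical helper names.

```stata
* Illustration only: union stands in for the overconfidence dummy
webuse nlswork, clear
xtset idcode year

* Overall, between and within standard deviations of the dummy
xtsum union

* Share of panels that are ever 0 / ever 1, and how persistent each state is
xttab union

* Period-to-period transition probabilities between the two states
xttrans union

* Count panels in which the dummy actually changes:
* only these contribute to the fixed-effects coefficient
bysort idcode: egen double sd_union = sd(union)
egen byte tag = tag(idcode)
count if tag & sd_union > 0 & !missing(sd_union)
```

A small within standard deviation in xtsum, or only a handful of switching panels in the final count, would suggest that the fixed-effects estimate rests on very few firms. The counts here ignore observations that drop out of the regression because other variables are missing, so they are an upper bound on the switchers in the estimation sample.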
Amin Sofla

Join Date: May 2018

Posts: 67
#8

24 May 2018, 02:52

There are many materials that can help you understand the differences between pooled OLS and FE, e.g., Baltagi, Griffin, and Xiong (2000). I am guessing that in your case the independent variable is correlated with unobserved heterogeneity. If there is a strong correlation between CEO overconfidence and the firm-specific unobserved characteristics, you can apply ‘xthtaylor’.

Additional comment: In the formation of the ‘investment-intense firm’-‘overconfident CEO’ dyads, there is sorting as well, and the two-way nature of this matching will further complicate empirical inference (Sorensen 2007). You can use the ‘matchingMarkets’ package in R to carry out your estimation routine (I am not aware of any user-written Stata package on the subject).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17699
#9

24 May 2018, 02:56

Martijn:
in your pooled OLS you invoked the -robust- instead of the -cluster- option for standard errors.
Please note that, unlike -xtreg-, under -regress- those two options do different jobs: -robust- accounts for heteroskedasticity only, whereas -cluster- also accounts for autocorrelation within panels.
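
The difference can be checked directly. The sketch below is an illustration on Stata's example panel nlswork, not on the data in this thread, and assumes a reasonably recent Stata version in which -xtreg, fe- treats vce(robust) as clustering on the panel variable.

```stata
* Illustration only: nlswork stands in for the firm panel discussed above
webuse nlswork, clear
xtset idcode year

* Pooled OLS: the two options give different standard errors
regress ln_wage union tenure age, vce(robust)
regress ln_wage union tenure age, vce(cluster idcode)

* Fixed effects: vce(robust) is implemented as clustering on the panel
* variable, so these two calls should report the same standard errors
xtreg ln_wage union tenure age, fe vce(robust)
xtreg ln_wage union tenure age, fe vce(cluster idcode)
```

Comparing the standard error on union across the four outputs shows which choices matter: the coefficients within each pair are identical, and only the standard errors move.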

Kind regards,
Carlo
(Stata 19.0)
Bright Tree

Join Date: Mar 2020

Posts: 85
#10

10 Jan 2021, 17:01

Dear Prof. Lazzaro, I wondered whether the -cluster- option likewise serves to account for autocorrelation, whereas -robust- accounts for heteroskedasticity.

I got the warning below when I added the -cluster- and -i()- options:

Warning: estimated covariance matrix of moment conditions not of full rank.
overidentification statistic not reported, and standard errors and
model tests should be interpreted with caution.
Possible causes:
number of clusters insufficient to calculate robust covariance matrix
singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem.

But I have already used the -partial- option. I wondered whether I could partial out the variable I cluster on.
Thank you.

Last edited by Bright Tree; 10 Jan 2021, 17:33.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17699
#11

11 Jan 2021, 07:58

Bright:
please detail the Stata commands you're referring to. Thanks.
More substantively, please call me Carlo, just like all on (and many more off) this forum do. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Bright Tree

Join Date: Mar 2020

Posts: 85
#12

16 Jan 2021, 08:26

Dear Carlo, I very much appreciate your invaluable advice. I changed the cluster variable and the warning did not show up again. Thank you.