Comparing regression models with fixed effects and multi clustering

Lutfi Ozturker

Join Date: Apr 2017
Posts: 40

16 Jul 2022, 18:32

Hi,

I have unbalanced panel data for 2235 companies from 32 countries over 14 years for which I run the following six regressions: (CVs: additional 12 control variables)

1-) reg Y X CVs ,r
2-) reg Y X CVs i.country i.year,r
3-) reghdfe Y X CVs,absorb(country year) vce(cluster country)
4-) reghdfe Y X CVs,absorb(country year) vce(cluster year)
5-) reghdfe Y X CVs,absorb(country year) vce(cluster country year)
6-) reghdfe Y X CVs,noabsorb vce(cluster country year)

Their outputs are as follows:

Y	1	2	3	4	5	6
X	1.218***	0.821***	0.821***	0.821**	0.821*	1.218**
	(7.43)	(7.07)	(2.73)	(2.83)	(2.14)	(2.31)
Additional control variables	YES	YES	YES	YES	YES	YES
Time fixed effects	NO	YES	YES	YES	YES	NO
Country fixed effects	NO	YES	YES	YES	YES	NO
Clustered standard errors (country)	NO	NO	YES	NO	YES	YES
Clustered standard errors (year)	NO	NO	NO	YES	YES	YES
N. of Obs.	22296	22296	22296	22296	22296	22296
F-stat	24.75	10.60	198.69	759.65	-	64.01
R²	0.1032	0.1459	0.1459	0.1459	0.1459	0.1032

Based on the outputs, could we conclude that

1-) Employing time and country fixed effects, and
2-) Clustering for country or year or both

is necessary?

Last but not least, does the F-stat help to choose the best model among the five above? Why can't I get an F-stat for the fifth model?

Best,

Lutfi

Last edited by Lutfi Ozturker; 16 Jul 2022, 18:39.

Tags: clustering standard error, fixed effects, regression, regressionanalysis

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

17 Jul 2022, 00:01

I do not think that you can answer your question 2-) by looking at your output after the fact. The level at which you cluster needs to be chosen before you fit the regression. After the fact, you just discover that clustering gives you larger standard errors, as expected, because you have fewer effective observations.

You do not say what Y and X are, or what the previous literature says about the relationship between the two. Based on the information you provide, you should include the fixed effects because they make a big difference.
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#3

17 Jul 2022, 15:59

Thank you, Joro.

Y is the observed riskiness of the companies. X is a measure of similarity to all other companies from the same country. Please remember there are 2235 companies from 32 countries over 14 years.

The literature is ambiguous about the relationship between Y and X, yet the consistently significant and positive relationship between the two in each of the six models above makes sense.

Would you then update your insightful comments, please?
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#4

17 Jul 2022, 17:34

By the way, the coefficients and R2 are the same, but the t-stats and F-stats are not, for the following two models, which I expected to produce identical results:

reg Y X CVs i.country i.year,r ............................model-2 above
reghdfe Y X CVs,absorb(country year) .............alternative code for model-2

This made me worried about comparing my six models above, of which the first two use "reg" and the following four use "reghdfe". If the outputs of "reg" and "reghdfe" for the same model are not identical, then I suppose I cannot compare my six models either. Should I then rewrite the code for models 3, 4, 5 and 6 without "reghdfe"? That is easier said than done, though: I have not managed to do it.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

17 Jul 2022, 18:22

Model 2 specifies robust standard errors. The "alternative code for model-2" does not. That is why the t-stats and F-stats are different. (Had you looked at them, you would have observed the standard errors are also different, as are the p-values and confidence intervals.)
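To make the comparison like for like, the same variance estimator can be requested in both commands. This is only a sketch: `CVs` stands in for the twelve control variables as elsewhere in the thread, and -reghdfe- is a community-contributed command that is assumed to be installed. Small differences in degrees-of-freedom adjustments between the two commands may still remain.

```stata
* Model 2 as posted: heteroskedasticity-robust standard errors
regress Y X CVs i.country i.year, vce(robust)

* -reghdfe- counterpart with the robust variance requested explicitly
* (without a vce() option, -reghdfe- reports unadjusted standard errors)
reghdfe Y X CVs, absorb(country year) vce(robust)
```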
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

17 Jul 2022, 23:39

Clyde explained why you are getting different variances in #4: the -,r- at the end of your first command calls for robust variance, whereas your -reghdfe- variances are homoskedastic.

I think you should include country and year fixed effects. Given how you described your Y and X, I see no reason why they, and their relationship, should not depend on the country and year.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#7

18 Jul 2022, 00:47

Lutfi:
as an aside to the previous excellent advice, I would not recommend
1-) reg Y X CVs ,r
2-) reg Y X CVs i.country i.year,r
with panel data.
With -robust- standard errors you are taking care of heteroskedasticity only.
Instead, you should use -vce(cluster panelid)-, because in a panel your observations are by definition not independent.
This issue crops up from time to time on this forum because, unlike under -regress-, the -robust- and -vce(cluster panelid)- options do the very same job under -xtreg- (they both invoke cluster-robust standard errors).
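The distinction can be sketched as follows, assuming the panel is declared with company as the panel identifier and with `CVs` standing in for the control variables, as elsewhere in the thread:

```stata
xtset company year

* -regress-: robust deals with heteroskedasticity only
regress Y X CVs i.country i.year, vce(robust)

* -regress-: clustering on the panel id also allows for
* correlation among observations of the same company
regress Y X CVs i.country i.year, vce(cluster company)

* -xtreg, fe-: these two options request the same
* cluster-robust standard errors
xtreg Y X CVs i.year, fe vce(robust)
xtreg Y X CVs i.year, fe vce(cluster company)
```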

Kind regards,
Carlo
(Stata 19.0)
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#8

18 Jul 2022, 02:59

Thank you all.

Based on the responses and the definition of Y and X, which of the six models (or an alternate one) would you suggest?

Please also give the code for your model suggestion for the sake of clarity. For instance, I guess I cannot employ the following model code:
xtreg Y X CVs i.country i.year,fe vce (cluster company)
because the "country" is not a time-varying regressor in the model. I'm not sure though if this is what Carlo advised. Nevertheless, with or without one or more fixed effects, is multi-clustering necessary, too? This second question goes to Joro specifically who advised fixed effects for both country and year. Yet, everybody is more than welcome to suggest a model code for sure.

PS-1: This is the setting of the panel:

xtset company year

PS-2:I have unbalanced panel data for 2235 companies from 32 countries over 14 years.

Last edited by Lutfi Ozturker; 18 Jul 2022, 03:36.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#9

18 Jul 2022, 03:35

Lutfi:
as far as the following code is concerned:

Code:

xtreg Y X CVs i.country i.year,fe vce(cluster company)

while the -fe- estimator will wipe out time-invariant variables (as in all likelihood country is), my previous comment rested on the fact that -regress....,r- does not mirror a panel data code setup.

Kind regards,
Carlo
(Stata 19.0)
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#10

18 Jul 2022, 04:23

Thank you Carlo.

Since

Code:

xtreg Y X CVs i.country i.year,fe vce (cluster company)

produces "country omitted because of collinearity" error message, which model/code do you recommend? What about multi-clustering? Do you think it is needed for my set-up or is only the company level satisfactory?

PS: Y is the observed riskiness of the companies. X is a measure of similarity to all other companies from the same country. There are 2235 companies from 32 countries over 14 years.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#11

18 Jul 2022, 04:35

Lutfi:
the code produces exactly what is expected.
Under the -fe- specification, -i.country- is basically redundant (as the estimator will give you back no coefficient at all).
I would go:

Code:

xtreg Y X CVs i.year,fe vce (cluster company)

I do not see multi-clustering as that useful here.

Kind regards,
Carlo
(Stata 19.0)

Lutfi Ozturker

Join Date: Apr 2017
Posts: 40

#12

18 Jul 2022, 10:00

Thank you Carlo.

Taking into account that X is a measure of similarity to all other companies from the same country, should we sacrifice bank fixed effects (in your code above) and prefer country fixed effects instead as follows:

Code:

reg Y X CVs i.country i.year,r cluster (company)

I'm not trying to favour "reg" to "xtreg" but just trying to implement country and time fixed effects should they be preferable to company and time fixed effects in the setup.

The table below adds the output of your model, numbered 7, and of my alternative above, numbered 8, to the six models reported earlier:

Y	1	2	3	4	5	6	7	8
X	1.218***	0.821***	0.821***	0.821**	0.821*	1.218**	2.198**	0.821***
	(7.43)	(7.07)	(2.73)	(2.83)	(2.14)	(2.31)	(2.42)	(3.63)
Additional control variables	YES	YES	YES	YES	YES	YES	YES	YES
Time fixed effects	NO	YES	YES	YES	YES	NO	YES	YES
Country fixed effects	NO	YES	YES	YES	YES	NO	NO	YES
Company fixed effects	NO	NO	NO	NO	NO	NO	YES	NO
Clustered standard errors (country)	NO	NO	YES	NO	YES	YES	NO	NO
Clustered standard errors (year)	NO	NO	NO	YES	YES	YES	NO	NO
Clustered standard errors (company)	NO	NO	NO	NO	NO	NO	YES	YES
N. of Obs.	22296	22296	22296	22296	22296	22296	22296	22296
F-stat	24.75	10.60	198.69	759.65	-	64.01	2.85	4.15
R²	0.1032	0.1459	0.1459	0.1459	0.1459	0.1032	0.1063	0.1459

Last edited by Lutfi Ozturker; 18 Jul 2022, 10:27.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#13

18 Jul 2022, 10:20

Lutfi:
under the -fe- specification, each and every time-invariant variable will be wiped out: this holds for -i.country-, -i.bank-, -i.whatever- if there is no within-panel variation.
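Whether country really has no within-panel variation can be checked directly. A minimal sketch, assuming country is a numeric variable with no missing values:

```stata
* Stops with an error if any company is recorded under more than one country
bysort company (country): assert country[1] == country[_N]
```

If the assertion passes, country is constant within each company and is absorbed by the company fixed effects.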

Kind regards,
Carlo
(Stata 19.0)
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#14

18 Jul 2022, 11:25

I assume model 7 (your code)

Code:

xtreg Y X CVs i.year,fe vce (cluster company)

employs company and time fixed effects whereas model 8

Code:

reg Y X CVs i.country i.year,r cluster (company)

employs country and time fixed effects, is this correct? Therefore, we have to choose two fixed effects at most at a time since three of them (company,country,time) cannot be employed simultaneously, is this also correct?

Taking into account that X is a measure of similarity to all other companies from the same country, should we sacrifice company fixed effects (in model 7) and prefer country fixed effects instead (model 8)?

PS-1: Y is the observed riskiness of the companies.

PS-2: This is the setting of the panel:

xtset company year

PS-3:I have unbalanced panel data for 2235 companies from 32 countries over 14 years.

Last edited by Lutfi Ozturker; 18 Jul 2022, 11:32.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#15

18 Jul 2022, 14:44

Lutfi:
I would go with -model 7-, as your -panelid- is company.
In addition, with -model 7- you investigate both company and time fixed effects.
As an aside, time is actually time-varying; therefore, except for the reference year (and possibly another year omitted due to collinearity), you will have the coefficients for the remaining years.
Finally, with more than 2000 panels, -vce(cluster panelid)- is mandatory.
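For readers who want to keep all models in the -reghdfe- syntax used earlier in the thread, model 7 can be sketched in two ways. This assumes -reghdfe- is installed and that `CVs` stands in for the control variables. The coefficient on X should coincide, while the reported number of observations and the standard errors may differ slightly because -reghdfe- drops singleton groups and adjusts degrees of freedom in its own way.

```stata
xtset company year

* Model 7: company fixed effects via -fe-, year dummies, clustering on company
xtreg Y X CVs i.year, fe vce(cluster company)

* Same specification with both sets of fixed effects absorbed
reghdfe Y X CVs, absorb(company year) vce(cluster company)
```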

Kind regards,
Carlo
(Stata 19.0)
