Clustering standard errors

Ali Mirzaei

Join Date: Jan 2021

Posts: 3
#1

22 Jan 2021, 21:34

I have a cross-sectional dataset at the firm level, and my variable of interest (RHS) is a country-level variable. The response variable (LHS) is firm performance (about 1000 firms in 50 countries). It seems I have to cluster the standard errors at the country level. My question is whether it is a problem if the number of observations per cluster varies significantly. In other words, since I have some countries with only one firm and others with more than 200 firms, can I still cluster at the country level? Or must I cluster at the bank level (which is equivalent to robust standard errors, because the data are a pure cross-section)?
Thank you,
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

22 Jan 2021, 21:47

It sounds like you want to set the bank as the panel variable in your -xtset- command and cluster the vce at the country level. Variation in the number of observations per cluster is not a problem. But singleton clusters cause some difficulties. If you can combine some of the countries that have only one observation into fewer, larger clusters in a way that makes sense from a real-world perspective, the analysis will be a little easier: you won't have to cope with missing F statistics.
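To see why unequal cluster sizes are not an obstacle in themselves, here is a minimal sketch of the cluster-robust (sandwich) variance calculation. It is written in Python with NumPy rather than Stata, the data are simulated, and every name in it (cluster_robust_se, the cluster sizes, the coefficients) is an assumption made for illustration, not anything from the dataset described in #1. It uses the common small-sample factor G/(G-1) x (N-1)/(N-K).

```python
import numpy as np

def cluster_robust_se(X, y, groups):
    """OLS coefficients and cluster-robust standard errors (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    labels = np.unique(groups)
    G = len(labels)
    # "Meat" of the sandwich: one summed score vector per cluster,
    # regardless of how many observations the cluster contains.
    meat = np.zeros((k, k))
    for g in labels:
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Simulated data (assumption): 50 countries with very unequal sizes,
# including 10 singleton countries and 3 countries with 200 firms each.
rng = np.random.default_rng(0)
sizes = np.array([1] * 10 + [5] * 20 + [20] * 15 + [200] * 3 + [50] * 2)
country = np.repeat(np.arange(len(sizes)), sizes)
n = len(country)
x = rng.normal(size=len(sizes))[country]          # country-level regressor
country_shock = rng.normal(size=len(sizes))[country]
y = 1.0 + 0.5 * x + country_shock + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_country, se_country = cluster_robust_se(X, y, country)
beta_firm, se_firm = cluster_robust_se(X, y, np.arange(n))  # one firm per cluster
print("clustered by country:", se_country)
print("clustered by firm:   ", se_firm)
```

Each cluster contributes a single summed score vector to the middle of the sandwich, so singleton countries and 200-firm countries go through the same formula. When every firm is its own cluster, the expression reduces to the usual heteroskedasticity-robust (HC1) estimator, which is the equivalence mentioned in #1. With a country-level regressor and country-level shocks, the country-clustered standard error on x will typically be the larger of the two.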

Another approach, if you and your audience can tolerate the use of random rather than fixed effects, is to do this as a 3-level model. This would be a more faithful reflection of the actual data design. I know that in finance and economics, random effects models are viewed skeptically, so this may not be an option for you. But that, or something like it, is what I would probably do.

Last edited by Clyde Schechter; 22 Jan 2021, 21:49.
Ali Mirzaei

Join Date: Jan 2021

Posts: 3
#3

22 Jan 2021, 23:30

Thank you very much Clyde for your valuable comments.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

23 Jan 2021, 01:34

I would say pretty much what Clyde says: You definitely should cluster at the country level, and at first order, different cluster sizes are not a huge problem.

Then, depending on how deep you want to go, at second order the combination of a small number of clusters and vastly different cluster sizes is a bit of a problem.

The suggested solution to this problem is the bootstrap; you can check

MacKinnon and Webb, “Wild Bootstrap Inference for Wildly Different Cluster Sizes,” Journal of Applied Econometrics 32(2), pp. 233-254, 2017.

MacKinnon and Webb, together or separately, might also have other papers on the issue.
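
For readers who want to see the mechanics, below is a rough sketch of a restricted wild cluster bootstrap test of a single coefficient, again in Python with NumPy and on simulated data. It is not taken from the paper: the function names, the Rademacher (+1/-1) weights, the number of replications, and the data-generating process are all my assumptions. In Stata, the user-written boottest package is the usual tool for this.

```python
import numpy as np

def cluster_t(X, y, idx, G, j):
    """t-statistic for H0: beta_j = 0 using cluster-robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the observation-level scores within each cluster.
    scores = np.zeros((G, k))
    np.add.at(scores, idx, X * resid[:, None])
    meat = scores.T @ scores
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])

def wild_cluster_bootstrap_p(X, y, groups, j, reps=999, seed=0):
    """Bootstrap p-value for H0: beta_j = 0 (restricted residuals, Rademacher weights)."""
    rng = np.random.default_rng(seed)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    G = len(labels)
    t_obs = cluster_t(X, y, idx, G, j)
    # Impose the null: re-estimate the model without regressor j.
    X_r = np.delete(X, j, axis=1)
    beta_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r
    t_star = np.empty(reps)
    for b in range(reps):
        # One sign flip per cluster, shared by every firm in that cluster.
        v = rng.choice([-1.0, 1.0], size=G)
        y_star = fitted_r + resid_r * v[idx]
        t_star[b] = cluster_t(X, y_star, idx, G, j)
    p_value = np.mean(np.abs(t_star) >= abs(t_obs))
    return t_obs, p_value

# Simulated data (assumption): 50 countries, sizes from 1 to 200 firms.
rng = np.random.default_rng(1)
sizes = np.array([1] * 10 + [5] * 20 + [20] * 15 + [200] * 3 + [50] * 2)
country = np.repeat(np.arange(len(sizes)), sizes)
n = len(country)
x = rng.normal(size=len(sizes))[country]
y = 1.0 + 0.5 * x + rng.normal(size=len(sizes))[country] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

t_obs, p_boot = wild_cluster_bootstrap_p(X, y, country, j=1)
print("cluster-robust t:", t_obs, " wild cluster bootstrap p:", p_boot)
```

The key design point is that the random weight is drawn once per country, not once per firm, so the within-country dependence of the residuals is preserved in every bootstrap sample. The p-value is the share of bootstrap t-statistics at least as large in absolute value as the observed one.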
Ali Mirzaei

Join Date: Jan 2021

Posts: 3
#5

23 Jan 2021, 04:02

Thank you Joro. Much appreciated.