What is the correct way to use double-clustering in reghdfe?

Hakan Gunduz

Join Date: Dec 2018

Posts: 49
#1

11 Jan 2019, 03:39

Hi all,

I have a panel data set, and I believe the error terms are correlated along two dimensions: within firm and within time period. What is the correct way to use double-clustering in reghdfe?

These two give me different results:

Code:

reghdfe y1 x1 x2, absorb(year) cluster(companyid year)

and

Code:

reghdfe y1 x1 x2, absorb(year) cluster(companyid#year)

A paper I've been following states: "clustering by firm and year to account for correlations among error terms within firm and within time in the analyses".

Which one should I use?

Many thanks.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

14 Jan 2019, 12:29

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions. For example, I don't know how many observations you have per firm-year.

The documentation for reghdfe notes that "each cluster variable must have at least 50 different categories..." I suspect you're better off with companyid and year rather than companyid#year. Do you really have more than one observation per company-year? If not, companyid#year defines clusters at the observation level, which doesn't seem to make sense to me. There is some work that claims larger clusters are better because the estimates are robust to within-cluster heteroskedasticity but not to cross-cluster heteroskedasticity.
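The two checks mentioned above (observations per company-year, and the number of categories in each cluster variable) can be sketched in a few lines of Stata. This is only an illustration under assumptions: the variable names companyid, year, y1, x1 and x2 are taken from the question, while n_in_cell, firm_tag and year_tag are hypothetical helper names introduced here.

```stata
* Sketch only: companyid, year, y1, x1, x2 come from the question;
* n_in_cell, firm_tag and year_tag are hypothetical helper variables.

* 1. How many observations fall in each company-year cell?
*    If n_in_cell is 1 everywhere, cluster(companyid#year) puts
*    every observation in its own cluster.
bysort companyid year: generate n_in_cell = _N
tabulate n_in_cell

* 2. How many distinct categories does each cluster variable have?
*    tag() marks exactly one observation per category with 1.
egen byte firm_tag = tag(companyid)
egen byte year_tag = tag(year)
count if firm_tag
count if year_tag

* 3. Two-way clustering: one cluster dimension per variable listed.
reghdfe y1 x1 x2, absorb(year) cluster(companyid year)
```

In reghdfe, cluster(companyid year) requests two-way clustering by firm and by year, whereas cluster(companyid#year) requests one-way clustering on the firm-year interaction, which is why the two commands in the question report different standard errors. If the count for year comes out small, the rule of thumb quoted from the documentation suggests treating the year dimension of the clustering with caution.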