Robust or Cluster SE?

Felix Bittmann

Join Date: Aug 2018

Posts: 750
#1


23 Jul 2025, 11:12

Dear all,
my situation is as follows:
dependent variable: log wage

explanatory variables:
- gender (binary)
- occupation (100+ categories)
Sample size is large (100k+)

In a first step, I want to show the gender effect, under control of occupation. The model is like this:

Code:

reghdfe logincome i.gender i.occupation $controls

I can either use robust standard errors, or cluster by occupation. Since occupation is very relevant from a theoretical and empirical point of view and has many levels, this choice seems fine. The results are as follows (I post images since the data are in a secure environment):

As you can see, the SEs are much larger with clustering. So the conservative choice would be to report the clustered SEs.
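For reference, the two variants being compared might be run as follows. This is only a sketch using the variable names from the model above; the `noabsorb` option, the stored-estimate names `m_robust` and `m_cluster`, and the 0/1 coding of gender are assumptions, not something confirmed by the original command.

```stata
* Same specification, two variance estimators.
* noabsorb is an assumption: recent reghdfe versions expect absorb() or noabsorb.
reghdfe logincome i.gender i.occupation $controls, noabsorb vce(robust)
estimates store m_robust

reghdfe logincome i.gender i.occupation $controls, noabsorb vce(cluster occupation)
estimates store m_cluster

* Side-by-side coefficient and SE for gender (assumes gender is coded 0/1)
estimates table m_robust m_cluster, keep(1.gender) b se
```

The point estimates are identical across the two runs; only the standard errors change.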

Next, I would like to test whether the effect of gender varies by occupation. The model is as follows:

Code:

reghdfe logincome i.gender##i.occupation $controls
margins occupation, dydx(gender)
marginsplot

Again, I can either choose robust or cluster SEs for reghdfe. The thing is, the results differ a lot. And now, the cluster SEs are much smaller.

So I wonder what to report here or what other options I have. Do you have any idea why standard errors differ so widely?

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Tags: cluster, reghdfe, regression
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#2

23 Jul 2025, 23:40

There’s no theoretical reason to cluster here unless your sample was obtained by first sampling occupations and then people within occupations. That seems unlikely. Probably it’s roughly a random sample.

Would you cluster by race if you had that information?

With heterogeneity in the effects by occupation, the clustered standard errors are systematically too large. We shouldn’t use them just because clustering matters.
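The point about heterogeneous effects can be explored with a small simulation. The following is a minimal sketch, assuming a simple random sample in which the gender effect differs across occupations. All variable names, the seed, and the parameter values are made up for illustration and are not taken from the data discussed in this thread. It uses `regress` rather than `reghdfe` so that it runs without extra packages.

```stata
* Simulated data; every name and number below is hypothetical.
clear
set seed 20250723
set obs 20000

* 100 occupations assigned at random: no cluster sampling is involved
gen int occupation = ceil(100 * runiform())
gen byte female = runiform() < 0.5

* One intercept shift and one gender effect per occupation
sort occupation
by occupation: gen double occ_fe = rnormal() if _n == 1
by occupation: replace occ_fe = occ_fe[1]
by occupation: gen double tau = -0.20 + 0.10 * rnormal() if _n == 1
by occupation: replace tau = tau[1]

gen double y = occ_fe + tau * female + rnormal()

* Pooled gender effect with occupation dummies, two variance estimators
regress y i.female i.occupation, vce(robust)
regress y i.female i.occupation, vce(cluster occupation)
```

Because occupations play no role in how the simulated sample is drawn, any gap between the two standard errors on `1.female` comes from the effect heterogeneity rather than from the sampling design.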
Felix Bittmann

Join Date: Aug 2018

Posts: 750
#3

24 Jul 2025, 02:35

Dear Prof. Wooldridge, many thanks for the helpful reply. Indeed, sampling is not related to occupations. However, there is also a cluster variable available regarding sampling districts. When I use this variable for clustering, the results are very similar to the robust SE version. BTW: I have also found this relevant publication: https://economics.mit.edu/sites/defa...Clustering.pdf
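Clustering on the sampling unit could look roughly like this. It is a sketch only: `district` is a hypothetical name for the sampling-district identifier, and `noabsorb` is an assumption about the installed reghdfe version.

```stata
* district is a hypothetical variable name for the sampling districts
reghdfe logincome i.gender##i.occupation $controls, noabsorb vce(cluster district)
margins occupation, dydx(gender)
marginsplot
```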

Best wishes

