  • Robust or Cluster SE?

    Dear all,
    my situation is as follows:
    dependent variable: log wage

    explanatory variables:
    - gender (binary)
    - occupation (100+ categories)
    Sample size is large (100k+)

    As a first step, I want to show the gender effect, controlling for occupation. The model is as follows:
    Code:
    reghdfe logincome i.gender i.occupation $controls
    I can either use robust standard errors or cluster by occupation. Since occupation is very relevant from a theoretical and empirical point of view and has many levels, clustering on it seems reasonable. The results are as follows (I post images since the data are in a secure environment):
    [Image: mainres.png]
    As you can see, SEs are much larger using the cluster SEs. So, the conservative thing to do is to report the cluster SEs.
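
    For reference, a minimal sketch of how the two variants compared above could be requested. The `noabsorb` option is an assumption added here (the command as posted has no `absorb()`), and `$controls` is the global from the post; the stored-estimate names are arbitrary.

```stata
* Sketch only: same specification as in the post, fitted twice with different VCE options.
* Assumption: noabsorb is added so that reghdfe runs without absorbing any fixed effects.

* (1) heteroskedasticity-robust standard errors
reghdfe logincome i.gender i.occupation $controls, noabsorb vce(robust)
estimates store m_robust

* (2) standard errors clustered by occupation
reghdfe logincome i.gender i.occupation $controls, noabsorb vce(cluster occupation)
estimates store m_cluster

* Coefficients and standard errors side by side (the table lists every coefficient,
* so with 100+ occupation categories it will be long)
estimates table m_robust m_cluster, b(%9.4f) se(%9.4f)
```

    The point estimates are identical across the two fits; only the standard errors change.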

    Next, I would like to test whether the effect of gender varies by occupation. The model is as follows:
    Code:
    reghdfe logincome i.gender##i.occupation $controls
    margins occupation, dydx(gender)
    marginsplot
    Again, I can choose either robust or cluster SEs for reghdfe. This time the results differ a lot, and the cluster SEs are now much smaller.
    [Image: rob.png]

    [Image: cluster.png]

    So I wonder what to report here and what other options I have. Do you have any idea why the standard errors differ so widely?
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar

  • #2
    There’s no theoretical reason to cluster here unless your sample was obtained by first sampling occupations and then people within occupations. That seems unlikely. Probably it’s roughly a random sample.

    Would you cluster by race if you had that information?

    With heterogeneity in the effects by occupation, the clustered standard errors are systematically too large. We shouldn’t use them just because clustering matters.


    • #3
      Dear Prof. Wooldridge, many thanks for the helpful reply. Indeed, sampling is not related to occupations. However, there is also a cluster variable available regarding sampling districts. When I use this variable for clustering, the results are very similar to the robust SE version. BTW: I have also found this relevant publication: https://economics.mit.edu/sites/defa...Clustering.pdf
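
      A minimal sketch of the district-clustered version of the interaction model, in case it helps others. The variable name `district` is hypothetical (a stand-in for the sampling-district identifier), and `noabsorb` is again an assumption that is not part of the command posted in #1.

```stata
* Sketch only: "district" is a hypothetical name for the sampling-district identifier.
* Assumption: noabsorb is added so that reghdfe runs without absorbing any fixed effects.
reghdfe logincome i.gender##i.occupation $controls, noabsorb vce(cluster district)

* Gender effect within each occupation, based on the district-clustered VCE
margins occupation, dydx(gender)
marginsplot
```

      Replacing `vce(cluster district)` with `vce(robust)` gives the robust version for comparison.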
      Best wishes

