Dear all,
My situation is as follows:
Dependent variable: log wage
Explanatory variables:
- gender (binary)
- occupation (100+ categories)
The sample size is large (100k+).
In a first step, I want to show the gender effect, controlling for occupation. The model is:
Code:
reghdfe logincome i.gender i.occupation $controls
I can either use robust standard errors or cluster by occupation. Since occupation is very relevant from a theoretical and empirical point of view and has many levels, clustering on it seems reasonable. The results are as follows (I post images since the data are in a secure environment):

As you can see, the SEs are much larger with clustering. So the conservative choice is to report the cluster SEs.
Next, I would like to test whether the effect of gender varies by occupation. The model is as follows:
Code:
reghdfe logincome i.gender##i.occupation $controls
margins occupation, dydx(gender)
marginsplot
Again, I can choose either robust or cluster SEs for reghdfe. The thing is, the results differ a lot, and now the cluster SEs are much smaller.

So I wonder what to report here, or what other options I have. Do you have any idea why the standard errors differ so widely?
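To make the comparison concrete, the two variance specifications could be written roughly as below. This is only a sketch: it assumes the standard vce() option of reghdfe, and the vce() choices are not shown in the commands above. Depending on the reghdfe version, an absorb() or noabsorb option may also be required.

```stata
* Sketch only: the vce() options are an assumption, not copied from the original runs.

* Step 1: gender effect controlling for occupation
reghdfe logincome i.gender i.occupation $controls, vce(robust)
reghdfe logincome i.gender i.occupation $controls, vce(cluster occupation)

* Step 2: gender effect allowed to vary by occupation
reghdfe logincome i.gender##i.occupation $controls, vce(robust)
margins occupation, dydx(gender)

reghdfe logincome i.gender##i.occupation $controls, vce(cluster occupation)
margins occupation, dydx(gender)
marginsplot
```

The only difference within each pair of runs is the vce() option, so any difference in the reported SEs comes from the variance estimator and not from the point estimates.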