I am using "reghdfe" for linear regression with multiple fixed effects. My data are not exactly a panel: they record every employee hired by a random sample of firms in five cities over five years, and I have been analyzing the characteristics of the newly hired workers.
The dependent variable is whether a newly hired worker is female or not. For independent variables, I have age, experience, education, and so on, with time, city, and firm fixed effects.
When I run "distinct firm_id", I see thousands of firms, and a substantial share of them hire new workers more than three times, in different periods.
However, when I run a linear probability model with the options "absorb(i.time i.city i.firm_id) vce(cluster i.firm_id)", the reported number of clusters is only around 400.
I was wondering why this is happening.
- Is it just because I misunderstand what "cluster" means in reghdfe? Is this not the number of distinct ids of the firms?
- Is it because only 400 firms have a sufficient number of hires to estimate some parameters?
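
To check both possibilities, I could count the distinct firms that actually enter the estimation sample and compare that with the count in the full data. Below is a minimal sketch of that check. The variable names (female, age, experience, education, time, city) are placeholders for my actual variables, and the plain variable names in absorb() and vce(cluster) are assumed to be equivalent to the i. notation I used above.

```stata
* Sketch only: variable names are placeholders, not the real dataset's names
reghdfe female age experience education, absorb(time city firm_id) vce(cluster firm_id)

* Distinct firms in the full data
egen firm_tag_all = tag(firm_id)
count if firm_tag_all == 1

* Distinct firms among the observations actually used in the regression
egen firm_tag_est = tag(firm_id) if e(sample)
count if firm_tag_est == 1

* Observations excluded from the estimation sample, and how many of
* those have a missing value in the outcome or a covariate
count if !e(sample)
count if missing(female, age, experience, education)
```

If the second count matches the roughly 400 clusters reported, then the cluster count is the number of distinct firms in the estimation sample, and the gap from the full data would come from observations that were dropped (for example through missing values, or singleton groups, which reghdfe drops by default).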
