reghdfe: number of clusters much smaller than the number of distinct values

Chanwoo Kim

Join Date: Nov 2022

Posts: 6
#1


31 Mar 2023, 07:32

I have been using "reghdfe" to run linear regressions with multiple fixed effects. My data are not exactly a panel: they record every employee hired by randomly selected firms in five cities over five years, and I have been analyzing the characteristics of the newly hired workers.

The dependent variable is whether a newly hired worker is female. The independent variables include age, experience, and education, and the model has time, city, and firm fixed effects.

When I run "distinct firm_id", there are thousands of firms, and a substantial share of them hire new workers more than three times across different periods.

However, when I run a linear probability model with the options "absorb(i.time i.city i.firm_id) vce(cluster i.firm_id)", the reported number of clusters is only around 400.

Why is this happening?
Am I misunderstanding what "cluster" means in reghdfe? Isn't it the number of distinct firm IDs?

Or is it because only about 400 firms have enough hires to estimate some of the parameters?
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

31 Mar 2023, 08:14

What happens if you type

vce(cluster firm_id)

?
Chanwoo Kim

Join Date: Nov 2022

Posts: 6
#3

31 Mar 2023, 08:45

If I use "noabsorb vce(cluster firm_id)" and include all the previously absorbed variables as dummy variables instead, the number of clusters is still very low.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#4

31 Mar 2023, 09:46

Chanwoo:
have you already ruled out missing-value issues?

Kind regards,
Carlo
(Stata 19.0)
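The missing-value check can be sketched in Stata as follows. This is a minimal illustration on a hypothetical toy dataset: the variable names female, age, and educ stand in for the real model variables, and the data are invented. It counts distinct firms in the raw data, firms with at least one complete row (those that survive listwise deletion), and firms with at least two complete rows. The last count is only a rough stand-in for reghdfe's default dropping of singleton observations: it looks at the firm fixed effect alone, whereas reghdfe also considers the other absorbed effects and repeats the pruning.

```stata
* Hypothetical toy data; replace with the real model variables
clear
input firm_id female age educ
1 1 30 12
1 0 .  16
2 0 45 12
2 1 28 14
3 1 33 .
4 0 50 12
end

* A row is usable only if every model variable is non-missing
egen byte n_miss = rowmiss(female age educ)
generate byte complete = (n_miss == 0)

* Number of complete rows per firm, plus a flag marking one row per firm
bysort firm_id: egen n_complete = total(complete)
by firm_id: generate byte first = (_n == 1)

* Distinct firms in the raw data
count if first
local n_all = r(N)

* Firms with at least one complete row (survive listwise deletion)
count if first & n_complete >= 1
local n_any = r(N)

* Firms with at least two complete rows (not singletons in the firm FE alone)
count if first & n_complete >= 2
local n_multi = r(N)

display "all firms: `n_all'   any complete row: `n_any'   2+ complete rows: `n_multi'"
```

On the real data, listing every regressor and absorbed variable in rowmiss() and comparing these counts with the number of clusters that reghdfe reports should show how much of the gap comes from missing values and how much comes from singletons.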
Chanwoo Kim

Join Date: Nov 2022

Posts: 6
#5

31 Mar 2023, 14:57

Dear Carlo,

Thank you very much for your comment. I'll check that again.
I guess the number of clusters is the number of categories of my cluster variable.