  • reghdfe: number of clusters much smaller than the number of distinct values

    I have been using "reghdfe" to run linear regressions with multiple fixed effects. My data are not exactly a panel: they record every employee hired by randomly selected firms in five cities over five years, and I am analyzing the characteristics of the newly hired workers.

    The dependent variable is whether a newly hired worker is female. The independent variables include age, experience, and education, among others, and I include time, city, and firm fixed effects.

    When I run "distinct firm_id", it reports thousands of firms, and a substantial share of them hire new workers more than three times across different periods.

    However, when I estimate a linear probability model with the options "absorb(i.time i.city i.firm_id) vce(cluster i.firm_id)", the reported number of clusters is only around 400.

    I was wondering why this is happening.
    1. Is it just because I misunderstand what "cluster" means in reghdfe? Is this not the number of distinct firm IDs?
    2. Is it because only 400 firms have a sufficient number of hires to estimate some parameters?

  • #2
    What happens if you type

    vce(cluster firm_id)

    ?

    • #3
      If I use "noabsorb vce(cluster firm_id)" and include all the previously absorbed variables as dummy variables, the number of clusters is still very low.

      • #4
        Chanwoo:
        have you already ruled out issues with missing values?
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Dear Carlo,

          Thank you very much for the comment. I'll check that again.
          I guess the number of clusters is the number of categories of my cluster variable.
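One likely source of the gap is that the cluster count reported after estimation refers only to the estimation sample, not to the full dataset. Observations with a missing value in any model variable are dropped, and reghdfe by default also drops singleton observations (observations that are alone in some fixed-effect group, such as a firm that appears only once); this can remove whole firms. The stdlib-only Python sketch below mimics those two steps on a made-up toy dataset. The records, variable names, and the helper functions `estimation_sample` and `n_clusters` are hypothetical illustrations under these assumptions, not reghdfe's actual code.

```python
from collections import Counter

# Hypothetical toy data: one dict per new hire; None marks a missing value.
rows = [
    {"firm_id": 1, "city": "A", "time": 1, "female": 1, "age": 30},
    {"firm_id": 1, "city": "A", "time": 2, "female": 0, "age": 41},
    {"firm_id": 2, "city": "A", "time": 1, "female": 0, "age": 25},    # firm hires once: singleton
    {"firm_id": 3, "city": "B", "time": 1, "female": 1, "age": None},  # missing regressor
    {"firm_id": 3, "city": "B", "time": 2, "female": 0, "age": 35},
    {"firm_id": 4, "city": "B", "time": 2, "female": 1, "age": 29},
    {"firm_id": 4, "city": "B", "time": 1, "female": 0, "age": 33},
]

def estimation_sample(rows, fe_keys, variables):
    """Simplified sketch of how an estimation sample could be formed:
    1) listwise deletion of rows with a missing model variable or FE value;
    2) repeated removal of singletons (rows alone in any FE group) until
       no singletons remain, since one removal can create new singletons."""
    sample = [r for r in rows if all(r[k] is not None for k in fe_keys + variables)]
    while True:
        counts = {k: Counter(r[k] for r in sample) for k in fe_keys}
        kept = [r for r in sample if all(counts[k][r[k]] > 1 for k in fe_keys)]
        if len(kept) == len(sample):
            return kept
        sample = kept

def n_clusters(sample, cluster_var):
    """Number of distinct values of the cluster variable in the given rows."""
    return len({r[cluster_var] for r in sample})

sample = estimation_sample(rows, ["time", "city", "firm_id"], ["female", "age"])
print(n_clusters(rows, "firm_id"), n_clusters(sample, "firm_id"))  # → 4 2
```

Here firm 2 is dropped as a singleton, and firm 3 loses one row to a missing value and then its remaining row becomes a singleton, so only two of the four firms survive as clusters. In Stata, running "distinct firm_id if e(sample)" right after the regression should give a count comparable to the reported number of clusters.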
