
  • Singletons, Cluster-Robust Standard Errors in reghdfe

    Dear all, I am trying to use reghdfe to estimate the following fixed-effects model:

    [Image: regression model.png]

    My Stata code is:

    reghdfe delta_Y delta_X delta_Z if (year==2000|year == 2005|year == 2010) [aweight=population], absorb(city_year) cluster(city_year)

    However, Stata reports the following error:

    (dropped 360 singleton observations)
    insufficient observations
    r(2001);


    I clicked the link in the error message, which leads to a note on singletons and cluster-robust standard errors: http://scorreia.com/research/singletons.pdf


    I don't fully understand the error message. How should I fix the code above?





  • #2
    reghdfe is from SSC (FAQ Advice #12). If you have panel data with city as the panel identifier and year as the time variable, it makes no sense to include city-year pair dummies to capture city-year fixed effects, because each combination of city and year identifies a single observation. Yet this is what the author claims to have done. Therefore, either the author has multiple observations per city-year (implying that city is not the panel identifier) or he/she is misunderstanding something.
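A short derivation makes the point concrete. The posted model is only an image, so the notation here is assumed: c indexes cities, t years, w_ct is the population weight, and alpha_ct is one fixed effect per city-year cell. With exactly one observation per cell, weighted least squares solves

```latex
\Delta Y_{ct} = \beta_1 \Delta X_{ct} + \beta_2 \Delta Z_{ct} + \alpha_{ct} + \varepsilon_{ct}

\min_{\beta_1,\beta_2,\{\alpha_{ct}\}} \; \sum_{c,t} w_{ct}\,\bigl(\Delta Y_{ct} - \beta_1 \Delta X_{ct} - \beta_2 \Delta Z_{ct} - \alpha_{ct}\bigr)^2

\hat\alpha_{ct}(\beta_1,\beta_2) = \Delta Y_{ct} - \beta_1 \Delta X_{ct} - \beta_2 \Delta Z_{ct}
\;\Longrightarrow\; \text{every residual equals } 0 \text{ for any } (\beta_1,\beta_2)
```

Because the cell effect can absorb each observation exactly whatever values the slopes take, the sum of squares reaches zero for every (beta_1, beta_2), and the slopes are not identified. With 360 observations in 360 city-year cells, reghdfe drops all 360 as singletons and is left with nothing to estimate ("insufficient observations"), while areg fits the data perfectly (R-squared = 1) and omits both slopes.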



    • #3
      Replying to Andrew Musau (#2):
      Hi Andrew, thank you for your comments. Do you mean that reghdfe cannot be applied to my regression model and that I should try other commands such as areg or xtreg? I also tried the following code with areg:

      areg delta_Y delta_X delta_Z if (year==2000|year == 2005|year == 2010) [aweight=weight], absorb(city_year) cluster(city_year)

      but Stata reports the following:


      note: delta_X omitted because of collinearity
      note: delta_Z omitted because of collinearity

      Linear regression, absorbing indicators         Number of obs     =        360
      Absorbed variable: city_year                    No. of categories =        360
                                                      F(   0,    359)   =          .
                                                      Prob > F          =          .
                                                      R-squared         =     1.0000

                                (Std. Err. adjusted for 360 clusters in city_year)
      ------------------------------------------------------------------------------
                   |               Robust
           delta_Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           delta_X |          0  (omitted)
           delta_Z |          0  (omitted)
             _cons |   .0342651          .        .       .            .           .
      ------------------------------------------------------------------------------

      There seems to be a collinearity problem here. xtreg does not work for me because it does not allow a time-varying variable (population) as the regression weight.


      Which command do you think I can use to run this regression successfully? Or should I replace the city-year pair effects with separate city and year dummies in the model?




      • #4
        Jason, in what Andrew said there was an implicit question and a comment anticipating your answer.

        1. How have you generated this city_year variable?

        2. How many observations do you have per city/year? If you have only one observation per city/year, you cannot include city/year fixed effects.

        You can see how many observations you have per city/year, for example, like this:

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . bysort foreign rep: count
        
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Domestic, rep78 = 1
          2
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Domestic, rep78 = 2
          8
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Domestic, rep78 = 3
          27
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Domestic, rep78 = 4
          9
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Domestic, rep78 = 5
          2
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Domestic, rep78 = .
          4
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Foreign, rep78 = 3
          3
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Foreign, rep78 = 4
          9
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Foreign, rep78 = 5
          9
        ----------------------------------------------------------------------------------------------------------
        -> foreign = Foreign, rep78 = .
          1
        
        .
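A more compact check, sketched below on the same auto data, stores the size of each cell and counts the cells that contain a single observation, which are the singletons reghdfe drops. The variable name n_cell is only an illustrative choice.

```stata
sysuse auto, clear

* Tabulate how many observations share each foreign/rep78 combination
duplicates report foreign rep78

* Store the size of each cell, then count the one-observation cells
bysort foreign rep78: generate n_cell = _N
count if n_cell == 1
```

With these data the count is 1 (the single foreign car with missing rep78). Run on city and year in data with one observation per city-year, every observation would have n_cell == 1.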



        • #5
          Thanks, Joro Kolev, for answering.



          • #6
            Hi @Joro Kolev, thanks for your comments. First, to answer your questions:
            (1) I used the command gen city_year = _n to create the city/year variable after duplicating the data by city and year.
            (2) Yes, there is only one observation per city/year. For example, for year = 2000 and city = London, there is only one observation of Y, X, and Z.
            So I think that is the singleton problem you pointed out. I have switched the model to:
            [Image: regression model.png]

            I will try to estimate these two models next.
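The difference between gen city_year = _n and a proper city-year identifier can be sketched on the auto data, with foreign and rep78 standing in for city and year; obs_id and cell_id are hypothetical names. _n numbers observations, so it always creates one group per observation, whereas egen group() numbers the distinct combinations of the listed variables.

```stata
sysuse auto, clear

* _n gives every observation its own value: one "group" per observation
generate obs_id = _n

* egen group() numbers distinct foreign/rep78 combinations instead;
* observations with missing rep78 get a missing group unless -missing- is specified
egen cell_id = group(foreign rep78)

quietly summarize obs_id
display "obs_id groups:  " r(max)
quietly summarize cell_id
display "cell_id groups: " r(max)
```

Here obs_id has 74 values and cell_id has 8. In data where each city-year has exactly one observation, both approaches would yield one group per observation, which is why a city-year fixed effect absorbs all the variation.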



            • #7
              I am not sure what you concluded at the end, but with your data structure you can estimate city and year fixed effects; you cannot estimate city-by-year fixed effects.

              E.g., the following fixed effects regression is feasible.

              Code:
              areg Y X Z i.year, absorb(city)
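The suggestion above can be tried on simulated data before touching the real panel. This is a minimal sketch under stated assumptions: 20 hypothetical cities observed in 2000, 2005, and 2010, with lower-case y, x, z as made-up stand-ins for Y, X, Z. It confirms that each city-year cell holds one observation and then fits the feasible city and year fixed-effects model.

```stata
clear
set seed 12345
set obs 20                                   // 20 hypothetical cities
generate city = _n
expand 3                                     // three periods per city
bysort city: generate year = 2000 + 5*(_n - 1)   // 2000, 2005, 2010
generate x = rnormal()
generate z = rnormal()
generate y = 1 + 0.5*x - 0.3*z + rnormal()

* One observation per city-year cell: -isid- exits with an error otherwise
isid city year

* Feasible: city effects absorbed, year effects entered as dummies
areg y x z i.year, absorb(city)
```

Swapping absorb(city) for an identifier built from city and year would reproduce the problem from #3, because that identifier takes a distinct value for every one of the 60 observations.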



              • #8
                Yes, I agree with you, @Joro Kolev. The fixed-effects model you suggest is consistent with my thinking, but I also add state/year fixed effects to the city and year fixed effects. My code would look like this:

                Code:
                reghdfe Y X Z, absorb(city year state_year)
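A sketch of this specification on simulated data follows, assuming 40 hypothetical cities nested in 4 hypothetical states (the state variable and the data are invented for illustration). The state-year identifier is built with egen group() from state and year rather than from _n. Because year effects are nested within state-year effects, listing year alongside state_year is redundant and should not change the slope estimates. The reghdfe line runs only if reghdfe (SSC) is installed; the areg line uses built-in commands only.

```stata
clear
set seed 2021
set obs 40                                   // 40 hypothetical cities
generate city  = _n
generate state = ceil(city/10)               // 4 hypothetical states, 10 cities each
expand 3
bysort city: generate year = 2000 + 5*(_n - 1)
generate x = rnormal()
generate z = rnormal()
generate y = 0.5*x - 0.3*z + rnormal()

* Build the state-year identifier from the two variables, not from _n
egen state_year = group(state year)

* Built-in version: city effects absorbed, state-year dummies entered explicitly;
* state_year dummies that are collinear with the city effects are omitted
areg y x z i.state_year, absorb(city)

* reghdfe version, run only when reghdfe is installed (ssc install reghdfe)
capture which reghdfe
if _rc == 0 {
    reghdfe y x z, absorb(city state_year)
}
```

Every city has three observations and every state-year cell has ten, so no singletons arise and all 120 observations are used.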

