Singletons, Cluster-Robust Standard Errors in reghdfe

Jason Su

Join Date: Oct 2020

Posts: 13
#1

24 Mar 2021, 20:10

Dear all, I am trying to use the command reghdfe to estimate the following fixed-effects model:

My Stata code is

reghdfe delta_Y delta_X delta_Z if (year==2000|year == 2005|year == 2010) [aweight=population], absorb(city_year) cluster(city_year)

However, the system reports the error message:

(dropped 360 singleton observations)
insufficient observations
r(2001);

I clicked the link in the error message, which took me to the note on singletons and cluster-robust standard errors: http://scorreia.com/research/singletons.pdf

I don't fully understand the error message. How should I fix the code above?
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#2

25 Mar 2021, 06:18

reghdfe is from SSC (FAQ Advice #12). If you have panel data with city as the panel identifier and year as the time variable, it makes no sense to include city-year pair dummies to capture city-year fixed effects, as each combination of city and year represents a single observation. Yet this is what the author claims to have done. Therefore, either the author has multiple city-year observations (implying that city is not the panel identifier) or he/she is misunderstanding something.
Jason Su

Join Date: Oct 2020

Posts: 13
#3

25 Mar 2021, 09:47

Originally posted by Andrew Musau:

reghdfe is from SSC (FAQ Advice #12). If you have panel data with city as the panel identifier and year as the time variable, it makes no sense to include city-year pair dummies to capture city-year fixed effects, as each combination of city and year represents a single observation. Yet this is what the author claims to have done. Therefore, either the author has multiple city-year observations (implying that city is not the panel identifier) or he/she is misunderstanding something.

Hey Andrew, thank you for your comments. So you mean that reghdfe can't be applied to my regression model and I should try other commands such as areg or xtreg? I also tried the following code with areg:

areg delta_Y delta_X delta_Z if (year==2000|year == 2005|year == 2010) [aweight=weight], absorb(city_year) cluster(city_year)

but the system reports the following message:

note: delta_X omitted because of collinearity
note: delta_Z omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =        360
Absorbed variable: city_year                    No. of categories =        360
                                                F(   0,    359)   =          .
                                                Prob > F          =          .
                                                R-squared         =     1.0000

                             (Std. Err. adjusted for 360 clusters in city_year)
------------------------------------------------------------------------------
             |               Robust
     delta_Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     delta_X |          0  (omitted)
     delta_Z |          0  (omitted)
       _cons |   .0342651          .        .       .            .           .
------------------------------------------------------------------------------

It seems there is a collinearity problem here. The command xtreg doesn't work for me because it doesn't allow a time-varying variable such as population as the regression weight.

So which command do you think I can use to run the regression successfully? Or should I replace the city-year pair effect in the model with separate dummy variables for city and year?

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

25 Mar 2021, 10:41

Jason, in what Andrew said there was an implicit question, and a comment anticipating your answer.

1. How have you generated this city_year variable?

2. How many observations do you have per city/year? If you have only one observation per city/year, you cannot include city/year fixed effects.

You can see how many observations you have per city/year for example like this:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. bysort foreign rep: count

----------------------------------------------------------------------------------------------------------
-> foreign = Domestic, rep78 = 1
  2
----------------------------------------------------------------------------------------------------------
-> foreign = Domestic, rep78 = 2
  8
----------------------------------------------------------------------------------------------------------
-> foreign = Domestic, rep78 = 3
  27
----------------------------------------------------------------------------------------------------------
-> foreign = Domestic, rep78 = 4
  9
----------------------------------------------------------------------------------------------------------
-> foreign = Domestic, rep78 = 5
  2
----------------------------------------------------------------------------------------------------------
-> foreign = Domestic, rep78 = .
  4
----------------------------------------------------------------------------------------------------------
-> foreign = Foreign, rep78 = 3
  3
----------------------------------------------------------------------------------------------------------
-> foreign = Foreign, rep78 = 4
  9
----------------------------------------------------------------------------------------------------------
-> foreign = Foreign, rep78 = 5
  9
----------------------------------------------------------------------------------------------------------
-> foreign = Foreign, rep78 = .
  1

.
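
Applied to the data in this thread, the same check might look like the sketch below. It assumes the identifiers are stored in variables named city and year (the thread does not confirm these names), and n_obs is a hypothetical helper variable.

```stata
* Assumption: variables named city and year identify the cells of interest.
* n_obs is a hypothetical helper holding the number of rows in each cell.
bysort city year: generate n_obs = _N

* If every row shows n_obs == 1, each city/year cell is a singleton.
tabulate n_obs

* Alternative summary of the same counts.
duplicates report city year
```

If every cell has exactly one observation, a city/year fixed effect leaves no within-cell variation to estimate the slopes from.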


Andrew Musau

Join Date: Oct 2014

Posts: 10187
#5

25 Mar 2021, 10:46

Thanks Joro Kolev for answering.
Jason Su

Join Date: Oct 2020

Posts: 13
#6

25 Mar 2021, 16:43

Hi @Joro Kolev, thanks for your comments. First, to answer your questions:
(1) I used the command gen city_year = _n to create the city/year dummy variable after duplicating the data by city and year.
(2) Yes, there is only one observation per city/year. For example, for year = 2000 and city = London, there is only one observation of Y, X and Z.
So I think that is the singleton problem you pointed to. I have switched the model to

So I will try to estimate these two models in the next stage.
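
For reference, gen city_year = _n stores the row number, so the identifier is unique in every row by construction. A minimal sketch of building the identifier from the underlying variables and counting singleton groups follows; it assumes variables named city and year exist, and cy_id and cell_size are hypothetical names.

```stata
* Assumption: variables named city and year exist in the data in memory.
* cy_id and cell_size are hypothetical names used only for illustration.
egen cy_id = group(city year)

* Number of observations sharing each city/year identifier.
bysort cy_id: generate cell_size = _N

* Counts the observations that form a group of size one (singletons).
count if cell_size == 1
```

With one observation per city/year, as described above, both ways of building the identifier put every observation in its own group, which matches the 360 observations and 360 categories in the areg output.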
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

26 Mar 2021, 02:56

I am not sure what you concluded in the end, but with your data structure you can estimate city and year fixed effects; you cannot estimate cityXyear fixed effects.

E.g., the following fixed effects regression is feasible.

Code:

areg Y X Z i.year, absorb(city)
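
With the variable names from the first post, a comparable specification in reghdfe might be sketched as follows. The variables city and year are assumed to be numeric identifiers, and clustering on city is an illustrative choice rather than something settled in this thread.

```stata
* Assumptions: city and year are numeric identifiers already in the data;
* clustering on city is illustrative and not taken from the thread.
reghdfe delta_Y delta_X delta_Z if inlist(year, 2000, 2005, 2010) ///
    [aweight=population], absorb(city year) vce(cluster city)
```

Here each city contributes several years, so neither absorbed dimension consists of one-observation groups.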
Jason Su

Join Date: Oct 2020

Posts: 13
#8

26 Mar 2021, 15:35

Yeah, I agree with you @Joro Kolev. The fixed-effects model you suggest is consistent with what I had in mind, but I would also add state/year fixed effects to the city and year fixed effects. My code would look like this:

Code:

reghdfe Y X Z , absorb(city year state_year)