Robust option and cluster option used together in regression

Peter Li

Join Date: May 2019

Posts: 45
#1

27 May 2019, 22:37

Is it appropriate to use the robust option (Huber-White estimator) and cluster option together?

For example, assuming my observations are clustered within states:

Code:

reg y x i.state, cluster(state) robust

My reason to add the robust option is based on the following reading:

Such robust standard errors can deal with a collection of minor concerns about failure to meet assumptions, such as minor problems about normality, heteroscedasticity, or some observations that exhibit large residuals, leverage or influence. For such minor problems, the robust option may effectively deal with these concerns.

However, I rarely see people use the two options together. Is it redundant?

Thank you!

Best,
Peter

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

27 May 2019, 22:38

Yes, it is redundant. The -cluster(state)- vce is also robust.
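
One way to check this on your own data is to fit the model both ways and compare the standard errors. This is only a minimal sketch: it assumes the variables y, x, and state from post #1 are in memory, and the stored-estimate names cl_only and cl_robust are arbitrary.

```stata
* Cluster-robust VCE only
regress y x i.state, vce(cluster state)
estimates store cl_only

* Cluster option plus robust, as written in post #1
regress y x i.state, cluster(state) robust
estimates store cl_robust

* Side-by-side coefficients and standard errors; the two columns should match
estimates table cl_only cl_robust, se
```

If the two columns agree, adding robust on top of the cluster option changed nothing.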

Peter Li

Join Date: May 2019

Posts: 45
#3

27 May 2019, 22:56

Clyde Schechter, Thank you so much!

Two related follow-up questions:

1. In general, what is the rule in terms of clustering? Say I have 8 states (and about 100 obs nested within), is it sufficient for clustering? What is the minimum number of states I need to have for cluster(state) to work properly?

2. I'm using a survey sample. Could I add weights to my existing model like this:

Code:

reg y x i.state [pweight=personweight], cluster(state)

Will the inclusion of weight do anything since my SE is already robust?

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#4

27 May 2019, 23:50

1. This is difficult to give an exact answer to. I think everybody would agree that 8 clusters is not enough. Everybody agrees that cluster robust standard errors require a "sufficiently large" number of clusters to be valid. But there is no consensus about the minimum sufficient number. I have heard some say that 15 is sufficient and I have seen others who think 50 is the minimum. Fortunately, you are not in this gray area: 8 is clearly too few by all accounts.

2. If you are using a survey sample, you must use the pweight. This has nothing to do with the robustness of the SE. If you fail to use your sample weights, the coefficients can be biased, and, indeed, very seriously so! If you don't use your pweights your results will not be worth the paper you print them on.

Moreover, with survey data you should not be using the cluster(state) vce anyway. You should use the -svyset- command to specify strata, primary (and higher, if there are any) sampling units, as well as pweights, and then use the -svy:- prefix on your regress command. You cannot use the robust or cluster() vce with the -svy:- prefix. But the -svy:- prefix is the correct approach with survey data.
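
The workflow described above might look roughly like the following. This is a sketch under assumptions, not the correct setup for any particular survey: psu_id and stratum_id are placeholder names for the primary sampling unit and stratum variables, and personweight is the weight variable from post #3. The real design variables come from the survey's documentation.

```stata
* Declare the survey design once (psu_id and stratum_id are placeholder names)
svyset psu_id [pweight=personweight], strata(stratum_id)

* Review the design Stata has recorded
svydescribe

* Fit the regression with design-based standard errors
svy: regress y x i.state
```

Once the data are -svyset-, the weights and the sampling design are applied by the -svy:- prefix, so no weight expression or vce option goes on the regress command itself.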

Peter Li

Join Date: May 2019

Posts: 45
#5

28 May 2019, 00:59

Clyde Schechter I need to look into the -svy- command. Thanks for the heads up!

For my first question, I found this:

With a small number of clusters (M << 50), or very unbalanced cluster sizes, the cure can be worse than the disease, i.e. inference using the cluster-robust estimator may be incorrect more often than when using the

But is there literature I could formally cite?

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#6

28 May 2019, 12:50

http://cameron.econ.ucdavis.edu/rese...5_February.pdf is a complete, and pretty technical, review of cluster robust standard errors. It contains a section on the problem of few clusters, which emphasizes that there is no agreed upon threshold, and in fact the degree of error from using them may well depend on specifics of the data involved. This is a non-paywalled version of Cameron AC, Miller DL. A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources 2015 50(2):317-372.