  • Robust option and cluster option used together in regression

    Is it appropriate to use the robust option (Huber-White estimator) and cluster option together?

    For example, assuming my observations are clustered within states:

    Code:
     reg y x i.state, cluster(state) robust
    My reason to add the robust option is based on the following reading:

    Such robust standard errors can deal with a collection of minor concerns about failure to meet assumptions, such as minor problems about normality, heteroscedasticity, or some observations that exhibit large residuals, leverage or influence. For such minor problems, the robust option may effectively deal with these concerns.
    However, I rarely see people use the two options together. Is it redundant?

    Thank you!

    Best,
    Peter



  • #2
    Yes, it is redundant. The -cluster(state)- vce is also robust.
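
    A quick way to check this yourself is to fit the model both ways and compare the standard errors. The sketch below uses Stata's bundled auto dataset as a stand-in, with price, mpg, and rep78 playing the roles of y, x, and state (these substitutions are assumptions for illustration, not the original poster's data):

    ```stata
    * Stand-in data: price ~ y, mpg ~ x, rep78 ~ state (illustrative only)
    sysuse auto, clear

    * Cluster-robust standard errors alone
    regress price mpg, cluster(rep78)
    estimates store cl_only

    * Same model with the robust option added
    regress price mpg, cluster(rep78) robust
    estimates store cl_robust

    * Show coefficients and standard errors side by side
    estimates table cl_only cl_robust, se
    ```

    If the robust option is redundant here, the two columns of standard errors should be identical.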

    • #3
      Clyde Schechter, Thank you so much!

      Two related follow-up questions:

      1. In general, what is the rule in terms of clustering? Say I have 8 states (and about 100 obs nested within), is it sufficient for clustering? What is the minimum number of states I need to have for cluster(state) to work properly?

      2. I'm using a survey sample. Could I add weights to my existing model like this:
      Code:
       reg y x i.state [pweight=personweight], cluster(state)
      Will the inclusion of weight do anything since my SE is already robust?

      • #4
        1. This is difficult to give an exact answer to. I think everybody would agree that 8 clusters is not enough. Everybody agrees that cluster robust standard errors require a "sufficiently large" number of clusters to be valid. But there is no consensus about the minimum sufficient number. I have heard some say that 15 is sufficient and I have seen others who think 50 is the minimum. Fortunately, you are not in this gray area: 8 is clearly too few by all accounts.

        2. If you are using a survey sample, you must use the pweight. This has nothing to do with the robustness of the SE. If you fail to use your sample weights, the coefficients can be biased, and, indeed, very seriously so! If you don't use your pweights your results will not be worth the paper you print them on.

        Moreover, with survey data you should not be using the cluster(state) vce anyway. You should use the -svyset- command to specify strata, primary (and higher, if there are any) sampling units, as well as pweights, and then use the -svy:- prefix on your regress command. You cannot use the robust or cluster() vce with the -svy:- prefix. But the -svy:- prefix is the correct approach with survey data.
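
        A minimal sketch of that workflow, assuming the data contain a primary sampling unit identifier and a stratum identifier. The names psu and stratum are hypothetical placeholders, and personweight is the weight variable from the question above:

        ```stata
        * Declare the survey design once; psu and stratum are hypothetical names
        svyset psu [pweight = personweight], strata(stratum)

        * Fit the model with design-based standard errors; no vce() option is given
        svy: regress y x i.state
        ```

        The actual PSU, strata, and weight variables should be taken from the survey's own documentation.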

        • #5
          Clyde Schechter I need to look into the -svy- command. Thanks for the heads up!

          For my first question, I found this:
          With a small number of clusters (M << 50), or very unbalanced cluster sizes, the cure can be worse than the disease, i.e. inference using the cluster-robust estimator may be incorrect more often than when using the
          But is there literature I could formally cite?

          • #6
            http://cameron.econ.ucdavis.edu/rese...5_February.pdf is a complete, and pretty technical, review of cluster robust standard errors. It contains a section on the problem of few clusters, which emphasizes that there is no agreed upon threshold, and in fact the degree of error from using them may well depend on specifics of the data involved. This is a non-paywalled version of Cameron AC, Miller DL. A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources 2015 50(2):317-372.
