  • Clustering standard errors

    I have a cross-sectional dataset at the firm level, and my variable of interest (RHS) is a country-level variable. The response variable (LHS) is firm performance (about 1000 firms in 50 countries). It seems I have to cluster the standard errors at the country level. My question is whether it is a problem if the number of observations per cluster varies significantly. In other words, since some countries have only one firm and others have more than 200, can I still cluster at the country level? Or must I cluster at the bank level (which is equivalent to robust standard errors, because the data are a pure cross-section)?
    Thank you,

  • #2
    It sounds like you want to set the bank as the panel variable in your -xtset- command and cluster the vce at the country level. Variation in the number of observations per cluster is not a problem. But singleton clusters cause some difficulties. If you can combine some of the countries that have only one observation into fewer, larger clusters in a way that makes sense from a real-world perspective, the analysis will be a little easier: you won't have to cope with missing F statistics.
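To make the point about unequal cluster sizes concrete, here is a minimal sketch in Python with numpy (not Stata, and not anything posted in this thread). It simulates roughly 1000 firms in 50 countries with cluster sizes from 1 to 200, then computes OLS standard errors clustered by country and, for comparison, clustered by firm. The data-generating process, the function name `ols_cluster_se`, and the finite-sample factor G/(G-1) * (N-1)/(N-K) are assumptions chosen for illustration.

```python
import numpy as np

def ols_cluster_se(X, y, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    labels = np.unique(clusters)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        idx = clusters == g
        score = X[idx].T @ resid[idx]      # sum of x_i * e_i within cluster g
        meat += np.outer(score, score)
    # Assumed finite-sample correction (a common "CR1"-style choice).
    adj = G / (G - 1) * (n - 1) / (n - k)
    V = adj * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V)), G

# Hypothetical data: 50 countries with very unequal numbers of firms.
rng = np.random.default_rng(0)
sizes = np.array([1] * 10 + [5] * 20 + [20] * 16 + [90] * 2 + [200] * 2)
country = np.repeat(np.arange(len(sizes)), sizes)
x_country = rng.normal(size=len(sizes))    # country-level regressor
u_country = rng.normal(size=len(sizes))    # country-level shock in the error
x = x_country[country]
y = 1.0 + 0.5 * x + u_country[country] + rng.normal(size=len(country))
X = np.column_stack([np.ones_like(x), x])

beta, se_country, G = ols_cluster_se(X, y, country)
# One cluster per firm reduces to heteroskedasticity-robust standard errors.
_, se_firm, _ = ols_cluster_se(X, y, np.arange(len(y)))

print("clusters:", G, "firms:", len(y))
print("slope:", beta[1])
print("SE clustered by country:", se_country[1])
print("SE clustered by firm (robust):", se_firm[1])
```

In a setup like this, where both the regressor and part of the error are constant within a country, the firm-level (robust) standard error will typically understate the uncertainty, which is why clustering at the country level is the safer choice even with unequal cluster sizes.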

    Another approach, if you and your audience can tolerate the use of random rather than fixed effects, is to do this as a 3-level model. This would be a more faithful reflection of the actual data design. I know that in finance and economics, random effects models are viewed skeptically, so this may not be an option for you. But that, or something like it, is what I would probably do.
    Last edited by Clyde Schechter; 22 Jan 2021, 21:49.

    • #3
      Thank you very much Clyde for your valuable comments.

      • #4
        I would say pretty much what Clyde says: You definitely should cluster at the country level, and at first order, different cluster sizes are not a huge problem.

        Then, depending on how deep you want to go, at second order the combination of a small number of clusters and vastly different cluster sizes is a bit of a problem.

        The suggested solution to this problem is the bootstrap. See:

        MacKinnon and Webb, “Wild Bootstrap Inference for Wildly Different Cluster Sizes,” Journal of Applied Econometrics 32(2), pp. 233-254, 2017.

        MacKinnon and Webb, together or separately, might also have other papers on the issue.
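For readers unfamiliar with the idea, here is a minimal sketch of a wild cluster bootstrap test of a zero slope, in Python with numpy. It is a generic version (null imposed, Rademacher weights drawn once per cluster, bootstrap-t) and not necessarily the exact procedure studied in the paper above. The simulated data, the function names `cluster_t_stat` and `wild_cluster_bootstrap_p`, and the finite-sample factor are all assumptions made for illustration.

```python
import numpy as np

def cluster_t_stat(X, y, clusters, j):
    """t-statistic for coefficient j using a cluster-robust variance."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    labels, codes = np.unique(clusters, return_inverse=True)
    G = len(labels)
    scores = np.zeros((G, k))
    np.add.at(scores, codes, X * resid[:, None])   # per-cluster sums of x_i * e_i
    adj = G / (G - 1) * (n - 1) / (n - k)          # assumed CR1-style correction
    V = adj * XtX_inv @ (scores.T @ scores) @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])

def wild_cluster_bootstrap_p(X, y, clusters, j, n_boot=999, seed=0):
    """Bootstrap p-value for H0: beta_j = 0, with the null imposed."""
    rng = np.random.default_rng(seed)
    t_obs = cluster_t_stat(X, y, clusters, j)
    # Restricted fit: dropping column j imposes beta_j = 0.
    X_r = np.delete(X, j, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r
    labels, codes = np.unique(clusters, return_inverse=True)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # One Rademacher draw per cluster flips the sign of all its residuals.
        v = rng.choice([-1.0, 1.0], size=len(labels))
        y_star = fitted_r + v[codes] * resid_r
        t_boot[b] = cluster_t_stat(X, y_star, clusters, j)
    p_value = np.mean(np.abs(t_boot) >= abs(t_obs))
    return t_obs, p_value

# Hypothetical data: 50 countries, cluster sizes from 1 to 200 firms.
rng = np.random.default_rng(1)
sizes = np.array([1] * 10 + [5] * 20 + [20] * 16 + [90] * 2 + [200] * 2)
country = np.repeat(np.arange(len(sizes)), sizes)
x = rng.normal(size=len(sizes))[country]           # country-level regressor
y = 1.0 + 0.5 * x + rng.normal(size=len(sizes))[country] + rng.normal(size=len(country))
X = np.column_stack([np.ones_like(x), x])

t_obs, p_boot = wild_cluster_bootstrap_p(X, y, country, j=1)
print("cluster-robust t statistic:", t_obs)
print("wild cluster bootstrap p-value:", p_boot)
```

The bootstrap p-value replaces the usual t-distribution critical values, which is where the small number of clusters and the unequal sizes tend to cause trouble. In Stata, the user-written -boottest- package is a common way to do this kind of test.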

        • #5
          Thank you Joro. Much appreciated.
