  • Two-way clustering for logistic regression

    Hello everyone,

    I want to run a logistic regression where I cluster by both year and industry.
    I know Stata can do one-way clustering; however, to correct for cross-sectional and time-series dependence, I would need two-way clustering.
    From prior posts, I gather that this is not straightforward and actually entails multiple steps.
    Can someone please provide me with the commands to follow to run my two-way clustered logistic regression?

    P.S. I am using Stata/SE 16.0

    Thank you!

  • #2
    You can do it with vcemway from SSC. However, the robust variance estimator should not be interpreted in nonlinear models the same way as in linear models. If you use this estimator when fitting a nonlinear model, you are admitting that your model is misspecified. See https://www.stata.com/support/faqs/s...nce-estimator/.

    Code:
    ssc install vcemway, replace
    help vcemway
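
    Once installed, a call might look like the sketch below. The variable names y, x1, x2, year, and industry are hypothetical placeholders for the poster's data, and the syntax shown (the estimation command as a prefix-style argument, with both clustering variables in cluster()) is my recollection of the help file, so please confirm it with help vcemway before relying on it.

    ```stata
    * Hypothetical variable names: y is the binary outcome, x1 and x2 are regressors
    * Both clustering dimensions go inside a single cluster() option
    vcemway logit y x1 x2, cluster(year industry)
    ```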
    Last edited by Andrew Musau; 02 Jun 2023, 05:56.


    • #3
      Andrew: I agree with you almost always, but not in this case. It's true that if you have independent observations -- such as from random sampling -- and you use vce(robust) with logit, you are admitting that the logit model is wrong. But not with clustering, which is being done primarily to account for correlation across observations. The way the clustering works, the standard errors are robust to misspecification of the model (and then it is up to you to argue it is a good enough approximation). But even if you think P(y_it = 1|x_it) follows a logit, you still have to obtain standard errors that allow for correlation across t (and i, in the case of two-way clustering). Part of the issue is what we mean by "correct specification." For me, it would be having P(y_it = 1|x_it) without taking a stand on serial correlation (or cross-sectional correlation).

      I will say that I'm not a big fan of two-way clustering because it often seems to be used simply because it is expected, without thinking through whether it is warranted. It need not work well with small T. Unfortunately, heterogeneity across time in coefficients can make it seem that clustering across t is needed when it is not. But it's not for me to decide if one wants to use two-way clustering.
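
      For readers wondering what the "multiple steps" of two-way clustering involve, here is a minimal sketch of one common construction: fit the same logit three times (clustered by each dimension and by their intersection) and combine the variance matrices as V_year + V_industry - V_intersection. It uses the nlswork example dataset and the variables union, age, grade, year, and ind_code purely as stand-ins. None of these come from the original question, and the finite-sample adjustments may differ from what vcemway applies by default.

      ```stata
      * Stand-in data (assumption): nlswork is a Stata example dataset, not the poster's data
      webuse nlswork, clear

      * Keep one common estimation sample so all three fits use the same observations
      keep if !missing(union, age, grade, year, ind_code)

      * Identifier for the year-by-industry intersection
      egen long yr_ind = group(year ind_code)

      * Fit 1: clustered by year
      quietly logit union age grade, vce(cluster year)
      matrix V_year = e(V)

      * Fit 2: clustered by industry
      quietly logit union age grade, vce(cluster ind_code)
      matrix V_ind = e(V)

      * Fit 3: clustered by the intersection; point estimates are the same in all fits
      quietly logit union age grade, vce(cluster yr_ind)
      matrix V_both = e(V)
      matrix b = e(b)

      * Two-way variance: add the one-way matrices, subtract the intersection
      matrix V_2way = V_year + V_ind - V_both

      * Post the coefficients with the combined variance and show the table
      ereturn post b V_2way
      ereturn display
      ```

      The combined matrix is not guaranteed to be positive semidefinite, so a negative diagonal entry (and a missing standard error) is possible, especially with few clusters in either dimension.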


      • #4
        Thanks for the correction, Jeff Wooldridge. Indeed, #1 does mention clustering.
