Two-way clustering for logistic regression

Christelle Alkhoury

Join Date: Feb 2023

Posts: 36
#1

02 Jun 2023, 05:03

Hello everyone,

I want to run a logistic regression where I cluster by both year and industry.
I know Stata can do one-way clustering; however, to correct for cross-sectional and time-series dependence I would need two-way clustering.
As I gathered from prior posts, this is not straightforward and actually entails multiple steps.
Can someone please provide me with the commands to follow to run my two-way clustered logistic regression?

P.S. I am using Stata/SE 16.0

Thank you!
Andrew Musau

Join Date: Oct 2014

Posts: 10219
#2

02 Jun 2023, 05:53

You can do it with vcemway from SSC. But you should not think about the robust variance estimator in nonlinear models the same way as in linear models. If you use this estimator when fitting a nonlinear model, you are admitting that your model is misspecified. See https://www.stata.com/support/faqs/s...nce-estimator/.

Code:

ssc install vcemway, replace
help vcemway
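
Once the package is installed, a call might look like the sketch below. The variable names (y, x1, x2, industry, year) are hypothetical placeholders, and the syntax follows my reading of the vcemway help file (the estimation command as a prefix-style argument, plus a cluster() option listing both dimensions), so check help vcemway before relying on it.

```stata
* Hypothetical variables: y (binary outcome), x1 x2 (regressors),
* industry and year (the two clustering dimensions).
ssc install vcemway, replace

* One-way clustering, built into Stata, for comparison
logit y x1 x2, vce(cluster industry)

* Two-way clustering by industry and year via vcemway (assumed syntax)
vcemway logit y x1 x2, cluster(industry year)
```

Comparing the standard errors from the two logit fits shows how much the second clustering dimension matters in a given sample.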

Last edited by Andrew Musau; 02 Jun 2023, 05:56.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#3

02 Jun 2023, 11:30

Andrew: I agree with you almost always, but not in this case. It's true that if you have independent observations -- such as from random sampling -- and you use vce(robust) with logit, you are admitting that the logit model is wrong. But not with clustering, which is being done primarily to account for correlation across observations. The way the clustering works, the standard errors are robust to misspecification of the model (and then it is up to you to argue it is a good enough approximation). But even if you think P(y_it = 1|x_it) follows a logit, you still have to obtain standard errors that allow for correlation across t (and i, in the case of two-way clustering). Part of the issue is what we mean by "correct specification." For me, it would be having P(y_it = 1|x_it) without taking a stand on serial correlation (or cross-sectional correlation).

I will say that I'm not a big fan of two-way clustering because it often seems to be used simply because it is expected, without thinking through whether it is warranted. It need not work well with small T. Unfortunately, heterogeneity across time in coefficients can make it seem that clustering across t is needed when it is not. But it's not for me to decide if one wants to use two-way clustering.
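
For reference, a common construction of the two-way clustered variance (the multiway approach of Cameron, Gelbach and Miller) combines three one-way cluster sums inside the usual sandwich. This is only a sketch: the notation below is introduced here for illustration, and finite-sample corrections are ignored.

```latex
\[
\hat V_{\text{two-way}}
  = \hat A^{-1}\bigl(\hat B_{\text{industry}} + \hat B_{\text{year}}
      - \hat B_{\text{industry}\cap\text{year}}\bigr)\hat A^{-1},
\qquad
\hat B_{G} = \sum_{g \in G}
  \Bigl(\sum_{i \in g} \hat s_i\Bigr)\Bigl(\sum_{i \in g} \hat s_i\Bigr)',
\]
\[
\hat s_i = \bigl(y_i - \Lambda(x_i\hat\beta)\bigr)\,x_i',
\qquad
\hat A = \sum_i \Lambda(x_i\hat\beta)\bigl(1 - \Lambda(x_i\hat\beta)\bigr)\,x_i' x_i ,
\]
```

Here \Lambda is the logistic function, x_i is a row vector of regressors, and G is the set of clusters for the given dimension (industries, years, or industry-year cells). The subtraction removes the terms counted in both one-way sums; in finite samples the resulting matrix need not be positive semi-definite.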
Andrew Musau

Join Date: Oct 2014

Posts: 10219
#4

02 Jun 2023, 14:42

Thanks for the correction, Jeff Wooldridge. Indeed, #1 does mention clustering.