Clustering standard errors with logistic regression for panel data

Lucas da Silva

Join Date: May 2021

Posts: 3
#1


24 Jul 2021, 05:00

Hello, this is my first time using Statalist, so I apologise in advance for any mistakes.

I was wondering if I should use clustered standard errors when running a logistic regression on panel data (with fixed effects). I don't see any examples of people doing this elsewhere on Statalist. However, if it were linear regression, I would have to cluster my standard errors with this panel data.

I am using the British Election Study panel data. Each individual responds to one or more waves of the survey, which are the time variable. I am asking whether various attitudes affect the likelihood that individuals will vote for the Labour Party. My code (with clustered standard errors) is as follows:
clogit notVoteLab immigLibLab econWorse i.education i.partner i.religion i.wave, group(id) vce(cluster id)

Meanwhile, without clustered SEs, I would use this code:
xtlogit notVoteLab immigLibLab econWorse i.education i.partner i.religion i.wave, fe
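For anyone who wants to see the two sets of standard errors side by side, here is a minimal sketch. It assumes the variable names from the commands above and that id and wave uniquely identify each observation; the stored-estimate names (conventional, clustered) are only illustrative. Since clogit with group(id) and xtlogit with fe fit the same conditional fixed-effects logit, the coefficients should agree and only the standard errors should differ.

```stata
* Declare the panel structure (required by xtlogit, harmless for clogit)
xtset id wave

* Conditional fixed-effects logit with conventional standard errors
clogit notVoteLab immigLibLab econWorse i.education i.partner i.religion i.wave, group(id)
estimates store conventional

* Same model with standard errors clustered on the individual
clogit notVoteLab immigLibLab econWorse i.education i.partner i.religion i.wave, group(id) vce(cluster id)
estimates store clustered

* Put coefficients and standard errors from both fits in one table
estimates table conventional clustered, b(%9.4f) se(%9.4f)
```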

Additionally, I'm wondering if it makes sense to introduce wave (i.e. time variable) fixed effects?

Thank you for any help and let me know if I need to provide more or different information.
Lucas da Silva

Join Date: May 2021

Posts: 3
#2

24 Jul 2021, 05:01

Just to clarify, I already introduced the wave fixed effects in the examples ("i.wave").
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#3

24 Jul 2021, 11:12

Jeff Wooldridge has made the point several times on the forum that using the -vce(robust)- option in nonlinear models such as logit amounts to admitting that your model is misspecified. See, e.g., https://www.statalist.org/forums/for...nal-regression. This is not the case for linear models. The same point is made by Bill Sribney in the following Stata FAQ: https://www.stata.com/support/faqs/s...nce-estimator/
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

24 Jul 2021, 11:55

I am not aware of any reason why robust / clustered errors would cause any harm. Mechanically, the "standard" standard errors are the robust standard errors plus an extra assumption that simplifies their structure.

If I were you I would cluster the standard errors if the command allows.

Andrew is right to point out that clustering works quite differently in linear and non-linear models. So we can pretty much forget any intuition we have gained from linear models when we move to non-linear models.
Lucas da Silva

Join Date: May 2021

Posts: 3
#5

25 Jul 2021, 09:09

Thanks Andrew and Joro!

I suppose the fact that I have repeated observations over time means that I have autocorrelation within panels. Thus, clustered standard errors would be necessary to account for that. clogit appears to allow for that with the "vce(cluster id)" option. Please feel free to let me know if you think I've drawn the wrong conclusion though.
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#6

25 Jul 2021, 09:30

This thread may help: https://www.statalist.org/forums/for...tandard-errors. What is implied is that you can use logit with indicators for the fixed effects and cluster provided that your \(T\) dimension is large.
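A minimal sketch of what that suggestion could look like, reusing the variable names from #1. This is an illustration under the stated condition, not a recommendation: it assumes the number of waves per individual is large enough that estimating one dummy per person does not bias the slope coefficients.

```stata
* Unconditional logit with individual (i.id) and wave (i.wave) indicators,
* with standard errors clustered on the individual.
* Assumption: T (waves per person) is large; with small T the individual
* dummies make the coefficient estimates unreliable.
logit notVoteLab immigLibLab econWorse i.education i.partner i.religion i.wave i.id, vce(cluster id)
```

With many individuals this adds one parameter per person, so it can be slow to fit.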

Last edited by Andrew Musau; 25 Jul 2021, 09:38.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

25 Jul 2021, 10:07

Lucas, skim through this paper https://www.bls.gov/osmr/research-pa...f/ec180020.pdf
and see what Professor Wooldridge and his coauthors recommend when you estimate a panel data logit and suspect autocorrelation. And then be kind and report back what you have read, because this is a new issue (the paper is still unpublished) and apparently the only person around here familiar with it is Professor Wooldridge himself. So it will be useful for posterity to hear what your take on the issue was after reading the paper.

I still maintain that using the cluster-robust variance is not going to hurt. The cluster-robust variance has the form Bread*Meat*Bread, and it reduces to the standard variance Bread when certain assumptions hold so that Bread=inv(Meat). Even if these extra assumptions hold, using the full Bread*Meat*Bread expression does no harm: the cancellation that we impose by theory when we report Bread alone will show up empirically in Bread*Meat*Bread, because Bread=inv(Meat) will then hold approximately in the data.
But I cannot cite any paper saying that a robust variance helps in the case of within-panel correlation. Asserting that something does not hurt is not the same as saying that it will help...
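To spell out the Bread and Meat argument, here is a sketch in standard maximum-likelihood notation. The symbols are my own labels rather than anything from the paper: \(\ell_i\) is the log-likelihood contribution of panel \(i\) (summed over its waves), \(s_i\) is the corresponding score vector, and finite-sample corrections are ignored.

```latex
\[
\widehat{V}_{\text{cluster}}
  = \underbrace{\widehat{A}^{-1}}_{\text{Bread}}\;
    \underbrace{\widehat{B}}_{\text{Meat}}\;
    \underbrace{\widehat{A}^{-1}}_{\text{Bread}},
\qquad
\widehat{A} = -\sum_{i} \frac{\partial^{2} \ell_i}{\partial\beta\,\partial\beta'},
\qquad
\widehat{B} = \sum_{i} s_i s_i'
\]
% If the information matrix equality holds, A = B, i.e. Bread = inv(Meat):
\[
\widehat{A} = \widehat{B}
\;\Longrightarrow\;
\widehat{V}_{\text{cluster}}
  = \widehat{A}^{-1}\,\widehat{A}\,\widehat{A}^{-1}
  = \widehat{A}^{-1}
\]
```

So when the extra assumptions hold, the sandwich collapses to the Bread alone, which is the conventional variance.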

What Andrew suggests above (abandoning the conditional logit, and doing a standard logit with manually included panel fixed effects) is an option if you have large T. So do say what your N and your T are.

Another possibility is the correlated random effects logit, aka the Mundlak device, which involves including the panel averages of the regressors. Check the paper above again, I think it is discussed there.
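A rough sketch of the panel-averages idea, using the variable names from #1. Several things here are assumptions: the m_ variable names are made up for illustration, only the two attitude variables are averaged (the categorical regressors would need averages of their individual dummies), and a pooled logit with clustered standard errors is just one way to estimate such a model.

```stata
* Panel averages of the time-varying attitude variables
* (m_immigLibLab and m_econWorse are hypothetical names)
bysort id: egen double m_immigLibLab = mean(immigLibLab)
bysort id: egen double m_econWorse   = mean(econWorse)

* Pooled logit with the panel averages added as regressors,
* standard errors clustered on the individual
logit notVoteLab immigLibLab econWorse m_immigLibLab m_econWorse i.education i.partner i.religion i.wave, vce(cluster id)
```

Note that the averages above use every wave in which the variable is observed; if some observations drop out of the estimation sample, the averages should ideally be computed over the estimation sample only.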