
  • Logistic regression with fixed effects for states and robust SEs with clusters at address

    I'm trying to identify the correct commands in Stata 15.1 to create a logistic regression model for a binary outcome that would control for fixed effects at the state level and provide robust standard errors clustered at practice address. The data are from a one-time survey of ~240 physicians and we want to account for the fact that physicians in the same practice likely have correlation in their responses. The respondents came from 27 states and 180 practices.

    A colleague suggested using clogit, but I received an error message of "groups (strata) are not nested within clusters" when I tried the following command:
    clogit depvar indvars, group(state) vce(cluster practice_address)

    For my continuous outcomes, I was advised to use: areg depvar indvars, cluster(practice_address) robust a(state). This ran without producing any error messages. In reading through this forum, it doesn't appear there's an analogous command for logistic regression.

    Thanks for any advice!
    Laura
    Last edited by Laura Zatz; 02 Nov 2018, 15:42.

  • #2
    groups (strata) are not nested within clusters
    This implies that states are not nested within practices; rather, the opposite holds: practices are nested within states. With any panel estimator (in Stata, those that require you to xtset your data), you will not be able to run your model with state as the panel identifier while clustering at the practice level, because groups (strata) must be nested within clusters. The areg regression that you ran does not take into account the panel structure of your data and is equivalent to

    Code:
    regress depvar indvars i.state, cluster(practice_address)
    However, this does not matter because the state fixed effects are introduced through state dummies (absorbed in areg). That said, state fixed effects are already accounted for once you have fixed effects at the practice level (because practices are nested within states). Therefore, you can simply run

    Code:
    clogit depvar indvars, group(practice_address) vce(cluster practice_address)
    and be confident that you have accounted for state fixed effects.



    • #3
      I missed one big thing while reading your question, i.e.,

      The data are from a one-time survey
      So your data is a cross-section and you have the following structure:

      Physician \(\rightarrow\) Practice \(\rightarrow\) State


      While the advice in #2 holds in general,

      ~240 physicians... from 27 states and 180 practices
      implies that a large number of practices in your data have a single physician (and thus no within-variation). You need sufficient variation in your group variable (especially in the absence of temporal variation), so I think the best you can do with these data is

      Code:
      clogit depvar indvars, group(state) vce(cluster state)
      You cannot cluster your standard errors at a lower level than state. You can simply explain that clustering at a lower level is possible with unconditional fixed effects through the inclusion of dummies in the model, but not with conditional fixed effects. The former is biased for logit.
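      To make the contrast concrete, here is a sketch using the thread's placeholder names (depvar, indvars, state, practice_address stand in for the actual variables and have not been run against the data):

      ```stata
      * Unconditional fixed effects: state dummies permit clustering below
      * the group level, but the estimator is biased for logit
      * (incidental parameters problem).
      logit depvar indvars i.state, vce(cluster practice_address)

      * Conditional fixed effects: the state effects are conditioned out of
      * the likelihood, and clusters cannot sit below the group level.
      clogit depvar indvars, group(state) vce(cluster state)
      ```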



      • #4
        Andrew, thanks for your quick responses! I have three follow-up questions:

        1) With your proposed model specification (clogit depvar indvars, group(state) vce(cluster state)), is it possible to add "robust" to account for potential model misspecification?

        2) Would my original model specification be theoretically sound with vce(bootstrap) instead of vce(cluster)? clogit depvar indvars, group(state) vce(bootstrap practice_address)

        3) Are you suggesting the areg code for my linear regression should be changed as well?

        Thanks,
        Laura



        • #5
          1) With your proposed model specification (clogit depvar indvars, group(state) vce(cluster state)), is it possible to add "robust" to account for potential model misspecification?
          In clogit, robust standard errors are equivalent to clustering at the group level. You can verify that the following commands are equivalent:

          Code:
          clogit depvar indvars, group(state) vce(cluster state)
          clogit depvar indvars, group(state) robust
          2) Would my original model specification be theoretically sound with vce(bootstrap) instead of vce(cluster)? clogit depvar indvars, group(state) vce(bootstrap practice_address)
          In general, clustering your standard errors or bootstrapping them will result in very similar results. However, even if you choose to bootstrap your standard errors, the number of replications will be based on the number of clusters in your group variable. You cannot escape the fact that groups must be nested within clusters, whether you bootstrap or cluster. Try this out and see for yourself

          Code:
          clogit depvar indvars, group(state) vce(bootstrap practice_address)

          3) Are you suggesting the areg code for my linear regression should be changed as well?
          No, the areg regression is perfectly fine because unconditional fixed effects are valid for linear models. In fact, if they were valid for logit, you could simply run the following with no issues:

          Code:
          logit depvar indvars i.state, cluster(practice_address)
          However, the unconditional fixed effects model for logit is biased due to the incidental parameters problem. This is covered in most standard econometrics textbooks. That is why we run conditional logit (clogit) instead.
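          The incidental parameters bias is easy to see in a toy simulation. With groups of size two, theory says the unconditional FE logit slope converges to roughly twice the true value, while clogit remains consistent. A sketch on simulated data (not the OP's):

          ```stata
          * Toy illustration of the incidental parameters problem
          clear
          set seed 12345
          set obs 400                            // 200 groups of 2 observations
          gen id = ceil(_n/2)
          bysort id: gen a = rnormal() if _n==1  // group-specific effect
          bysort id: replace a = a[1]
          gen x = rnormal()
          gen y = runiform() < invlogit(x + a)   // true slope on x is 1

          logit y x i.id        // unconditional FE: slope inflated (toward ~2)
          clogit y x, group(id) // conditional FE: close to the true value
          ```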



          • #6
            Thank you for your response, Andrew!

            For #1, the model results are indeed the same using "robust" or "vce(cluster state)".

            Unfortunately, a lot of my observations get dropped.
            "note: multiple positive outcomes within groups encountered.
            note: 17 groups (86 obs) dropped because of all positive or all negative outcomes."

            I was trying to find posts on this forum that relate to this, and I came across one suggesting -xtlogit, re-. Do you think that could work in this instance? I recognize that states are typically modeled as fixed effects, but this paper by Clark and Linzer (2012) suggests that random effects could be a viable option (preferred in certain instances). I've never worked with xtlogit before.

            For #2, the code with bootstrap wouldn't run. I received the error "no observations".

            Thanks,
            Laura



            • #7
              Unfortunately, a lot of my observations get dropped.
              "note: multiple positive outcomes within groups encountered.
              note: 17 groups (86 obs) dropped because of all positive or all negative outcomes."
              I wouldn't lose sleep over this. The fixed effects estimator utilizes the within variation in your data, and for some sets of observations there is no within variation. You still have 240-86=154 observations that are used in the regression (approximately 64% of all observations).

              I was trying to find posts on this forum that relate to this and I came across this one suggesting -xtlogit, re. Do you think that could work in this instances? I recognize that states are typically modeled as fixed effects, but this paper by Clark and Linzer (2012) suggests that random effects could be a viable option (preferred in certain instances). I've never worked with xtlogit before.
              I would not switch to random effects simply because of the 86 observations that are dropped. You still have sufficient within variation in your data for the fixed effects results to be useful. Practices differ between disciplines, but for us in economics, we believe that the decision as to which estimation technique provides a valid representation of the data generating process should be determined by the sample data and not assumed a priori. For this reason, we use a Hausman test to choose between random and fixed effects. David Drukker demonstrates here how to run such a test for logit:

              https://www.stata.com/statalist/arch.../msg00669.html

              In your case, you won't be able to use xtlogit for either fixed or random effects at the state level, because it requires you to xtset your data beforehand and your observations are at the physician level. You could do this by reorganizing your data, but I think the easiest way is to stick with clogit for the fixed effects and melogit for the random effects logit model. After this, run the Hausman test.
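              That sequence could look like the following sketch (placeholder names from the thread; before trusting hausman, verify that the common coefficients from the two estimators actually line up, since equation names can differ):

              ```stata
              * Conditional (fixed effects) logit at the state level
              clogit depvar indvars, group(state)
              estimates store fe

              * Random effects logit with a state-level random intercept
              melogit depvar indvars || state:
              estimates store re

              * Hausman test: FE consistent under both hypotheses, RE efficient under H0
              hausman fe re
              ```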

              For #2, the code with bootstrap wouldn't run. I received the error "no observations".
              This confirms that you will not be able to cluster or bootstrap your standard errors at a lower level than your group variable.



              • #8
                Andrew Musau Sorry to return to an old thread after so long, but I'm interested in the point you make on clogit:
                You need sufficient variation in your group variable
                I've noticed the same in a clogit analysis across 3 waves of panel data clustered at the regional level. I only have about 30 regions (clusters), and my model similarly does not converge, but I was interested in the theoretical reasons for this.

                Why does a clogit need variation in the clusters to converge?

                And what exactly is the nature of this variation? I've noticed in another panel model of 9,000 individuals over 3 waves that when I cluster on their id, clogit converges fine, even though their id clearly doesn't change over time. So I assume the variation isn't a change in state across time (i.e. from one region to another) but rather the number of possible states (i.e. the number of possible regions).

                Sorry to revive an old thread, but seeing this in action I would love to know why!

                All the best,

                John



                • #9
                  The quote


                  ~240 physicians... from 27 states and 180 practices
                  implies that a large number of practices in your data have a single physician (and thus no within-variation). You need sufficient variation in your group variable (especially in the absence of temporal variation), so I think the best you can do with these data is
                  refers to the inclusion of practice fixed effects. As an observation in the OP's dataset is a physician, practices with one physician are singletons, and one therefore cannot estimate practice fixed effects where there is no within variation (i.e., no multiple physicians within a practice). For clusters, the only requirements are that you have enough units within a cluster (rule of thumb: 30) and that your group variable is nested within clusters, so that within-cluster correlation is properly accounted for. Stata, by default, will not allow you to specify clusters where the latter condition does not hold, but there is an undocumented option, -nonest-, that can override the default. Convergence problems are common in maximum likelihood estimation and are not necessarily caused by the level at which you choose to cluster your standard errors.
                  Last edited by Andrew Musau; 30 Dec 2020, 15:22.



                  • #10
                    Hi Andrew,

                    Thanks a lot, that makes sense.

                    So are you saying that clustering doesn't cause convergence problems in clogit?

                    I was playing with my binary outcome model to investigate this, removing the clustering option and seeing how things looked, and I noticed that I have a control which doesn't often change for people (where they live) and that including it stops my clogit model from converging, but never stops a linear probability model (xtreg, fe) of the same relationship from running. I was wondering why this is. I read that it's because a linear probability model does not rely on maximum likelihood estimation, but I'm not sure what that means, or what its theoretical implications are.

                    I'd appreciate your thoughts!

                    All the best,

                    John



                    • #11
                      Yes, the estimation mechanism is very different for conditional fixed effects where the fixed effects are conditioned out of the likelihood function. For unconditional fixed effects, you either use demeaning or include indicators to purge the fixed effects.

                      I was wondering why this is. I read that it's because a linear probability model does not rely on maximum likelihood estimation, but I'm not sure what that means, or what its theoretical implications are.
                      You can estimate a linear model using maximum likelihood, so the issue is not that you are using maximum likelihood but whether the fixed effects are conditional or unconditional.



                      • #12
                        Ok, so just to confirm, it is the presence of this infrequently appearing control, rather than the clustering level, which is causing issues with convergence?

                        Also, I'm still not sure I understand why clogit can't converge but xtlogit, fe can. Is it just the manner in which each handles fixed effects?

                        Thanks again



                        • #13
                          clogit and xtlogit, fe are the same estimator. Are you sure you are estimating the same model? The only difference that I know of is that clogit allows clustered standard errors, whereas for xtlogit you need to bootstrap.

                          Ok, so just to confirm, it is the presence of this infrequently appearing control, rather than the clustering level, which is causing issues with convergence?
                          It is easy to test this. Just estimate the model with convergence problems without clustering the standard errors. I think you will still have the same convergence problems.



                          • #14
                            I'm an idiot. I wrote my response in a hurry; what I meant to say is that I'm still not sure I understand why clogit can't converge but the linear probability model (xtreg, fe) can. Is it the manner in which each handles fixed effects? By which I mean that the fixed effects are treated as conditional in clogit and as unconditional in xtreg, fe. And why does this difference in how the estimators treat fixed effects cause clogit not to converge while xtreg converges fine? You were totally right, by the way: I have the same convergence problems with and without clustering!



                            • #15
                              If you refer to the manual entries of these estimators and the references therein, you will find information on how the estimation is done. xtreg is a linear estimator whereas clogit is a nonlinear estimator. In general, nonlinear equations do not generate well-defined residuals which can be minimized. Least squares estimation is therefore not possible. Instead, one estimates by maximum likelihood (ML), and this can lead to convergence problems if the likelihood is not well behaved. For linear models, apart from OLS, you can also estimate using ML or the generalized method of moments (GMM). If you have a large number of observations within a group (30+), you can estimate an unconditional fixed effects logit model, and the incidental parameters problem will not be much of an issue. See the following thread, for example:

                              https://www.statalist.org/forums/for...sectional-data

                              On the other hand, with a small number of observations and convergence problems, nothing stops you from first estimating an unconditional fixed effects logit and thereafter, using the estimates as starting values for your conditional logit. This may just do the trick.

                              Code:
                              logit y x1 x2 x3 i.catvar
                              mat b = e(b)
                              clogit y x1 x2 x3, group(catvar) from(b, skip)
