Logistic Regression with Clustering

Karisma Morton

Join Date: Jun 2015

Posts: 3
#1


24 Jun 2015, 14:59

Hello users,

I have data for about 15,000 ninth graders. I am interested in running logistic models that investigate the impact of a number of independent variables (e.g., ses, grades) on student enrollment in a particular course in ninth grade (1=enrolled, 0=not enrolled). Naturally, the likelihood of enrollment would be influenced by how these students were clustered at both the middle and high school levels, and I would therefore like to account for that. Middle schools are not nested within high schools. I would like to run a fixed-effects model that accounts for these middle and high school clusters by estimating the standard errors accurately. However, I am not certain how to go about doing this. Could I accomplish this by simply using the code: logit y x, cluster(middleschool, highschool)? Any suggestions would be appreciated.

Thanks!
Tags: clustering, logistic regression, non-hierarchical clusters
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

24 Jun 2015, 20:41

Welcome to Statalist, Karisma!

Code:

logit y x, cluster(school_id)

will adequately capture the clustering by school. If you are interested in quantifying variation between and within schools, then you need melogit (mixed-effects logit), to which you can add school type as a predictor. You also need melogit if you want to assess the relative impact of school characteristics (e.g., size, number of sections) compared with individual characteristics.
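A minimal sketch of the two options in this reply, run on simulated data so it is self-contained. The variable names (`enrolled`, `ses`, `grades`, `school_id`, `school_size`) and the simulated effect sizes are hypothetical stand-ins, not the poster's data; `vce(cluster ...)` is the documented spelling of the older `cluster()` option.

```stata
* Simulated stand-in data; every variable name here is hypothetical.
clear
set seed 2015
set obs 3000
generate school_id = ceil(runiform()*30)          // 30 schools
generate ses = rnormal()
generate grades = rnormal()
generate school_size = 3 + 0.1*school_id          // hypothetical school trait (hundreds of students)
* one random intercept per school, copied to every student in that school
bysort school_id: generate double u = rnormal(0, 0.5) if _n == 1
bysort school_id (u): replace u = u[1]
generate enrolled = runiform() < invlogit(-0.5 + 0.8*ses + 0.6*grades + u)

* (a) Ordinary logit: only the standard errors account for clustering by school
logit enrolled ses grades, vce(cluster school_id)

* (b) Random-intercept (mixed-effects) logit: the between-school variance is
*     estimated explicitly, and school-level predictors can be added
melogit enrolled ses grades school_size || school_id:
estat icc   // share of latent-scale variance lying between schools
```

In (a) the coefficients are population-averaged and the clustering only widens the standard errors; in (b) the school effect is part of the model, which is what makes between- versus within-school comparisons possible.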

Last edited by Steve Samuels; 24 Jun 2015, 20:52.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Karisma Morton

Join Date: Jun 2015

Posts: 3
#3

25 Jun 2015, 10:20

Thank you for your response.

I should clarify that each student in my sample attended one middle school and went on to attend one high school, and as such is associated with two school IDs. Therefore, I'm not certain that clustering on school_id would capture students' classification into a middle school cluster AND a high school cluster. Essentially, I would like to know how certain student-level predictors influence the dependent variable, given that students belong to two separate clusters that are not nested. Any thoughts on how I could run a fixed-effects model that takes these two clusters into account?
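One way to write the fixed-effects version described here, sketched on simulated data with hypothetical names (`ms_id`, `hs_id`, `enrolled`, `ses`, `grades`): both school classifications enter as indicator sets, so school-level differences in enrollment are absorbed into school-specific intercepts. In Stata 14, official `logit` accepts a single variable in `vce(cluster ...)`, so a call like `cluster(middleschool, highschool)` is not valid syntax; the sketch clusters the standard errors on high school only, which is an assumption about which dimension matters more, not a full two-way correction.

```stata
* Simulated stand-in data; all names are hypothetical.
clear
set seed 2015
set obs 3000
generate ms_id = ceil(runiform()*40)   // 40 middle schools
generate hs_id = ceil(runiform()*20)   // 20 high schools, not nested in ms_id
generate ses = rnormal()
generate grades = rnormal()
generate enrolled = runiform() < invlogit(-0.5 + 0.8*ses + 0.6*grades + 0.02*ms_id - 0.03*hs_id)

* Fixed effects for both (non-nested) school classifications as indicator sets;
* vce(cluster) takes one variable, so SEs are clustered on high school only
logit enrolled ses grades i.ms_id i.hs_id, vce(cluster hs_id)
```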
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

25 Jun 2015, 11:44

Karisma, you may also want to check whether a "two-way crossed random-effects" melogit would do the trick in your case.
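A minimal sketch of the two-way crossed random-effects model suggested here, again on simulated data with hypothetical names. The `_all: R.hs_id` term gives each high school its own random intercept, and `|| ms_id:` adds one per middle school; since students are not nested, this is Stata's documented device for crossed effects. Putting the classification with fewer schools in the `_all: R.` term usually keeps estimation faster, and `intmethod(laplace)` is used here as an assumed speed trade-off, since adaptive quadrature over many crossed effects can be slow.

```stata
* Simulated stand-in data; all names are hypothetical.
clear
set seed 2015
set obs 3000
generate ms_id = ceil(runiform()*40)   // 40 middle schools
generate hs_id = ceil(runiform()*20)   // 20 high schools, crossed with ms_id
generate ses = rnormal()
generate grades = rnormal()
* one random intercept per middle school and one per high school
bysort ms_id: generate double u_ms = rnormal(0, 0.5) if _n == 1
bysort ms_id (u_ms): replace u_ms = u_ms[1]
bysort hs_id: generate double u_hs = rnormal(0, 0.5) if _n == 1
bysort hs_id (u_hs): replace u_hs = u_hs[1]
generate enrolled = runiform() < invlogit(-0.5 + 0.8*ses + 0.6*grades + u_ms + u_hs)

* Crossed random intercepts: _all: R.hs_id gives one intercept per high school,
* and ms_id: adds one per middle school
melogit enrolled ses grades || _all: R.hs_id || ms_id:, intmethod(laplace)
```

The output reports one variance component for high schools and one for middle schools, so the two sources of clustering are modeled separately rather than folded into a single cluster variable.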

Best,

Marcos

