Logistic Regression with Clustering

Jon Jenssen

Join Date: Apr 2019

Posts: 1
#1

08 Apr 2019, 08:39

Hello users,

I am new to the forum, but hoped you would be able to settle a small dispute we are having. Currently we are analyzing data from a hospital setting. Patients have been recruited based on the presence of respiratory symptoms (yes/no), e.g. cough, sore throat, inspiratory wheezing, etc.
The patients have been recruited from 6 different hospitals over the course of four influenza seasons (winters of 2010, 2011, 2012, 2013). On this, we are planning to run a logistic model that investigates the associations between the different respiratory symptoms and the disease of interest (the outcome, e.g. Human Metapneumovirus). Since it is a hospital setting, all patients are sick, and the ones that are negative for Human Metapneumovirus will have some other respiratory disease. Which other disease(s) are most prevalent among the "controls" will differ somewhat depending on the season (e.g. season one was a strong influenza season, while season two was more dominated by Respiratory Syncytial Virus).

The discussion has ended up with four different views on how to address this problem.
1. Keeping it simple: Using a standard logistic regression with hospital and season as independent variables in the model:
logistic disease cough i.hospital i.season

2. Clustering 1: Taking the possible clustering arising from having different hospitals into account by using melogit (or possibly xtgee).

3. Clustering 2: Taking the possible clustering arising from both different hospitals and seasons into account by using melogit.

4. Clustering 3: Since it is the same four seasons and the same hospitals in all seasons, there was a suggestion of viewing this as a "two-way crossed random effects" situation (not sure how to do this in Stata).

This is still at the discussion level, so unfortunately I cannot give an example of the data. But I am interested in hearing what others outside our small cluster think before the argument heats up when the data arrive…

Best wishes,
Jon
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

08 Apr 2019, 18:59

Well, I am skeptical of using random effects for season. You have only four seasons. That is a very small sample of season-space, and I would be quite reluctant to take the estimate of the variance component seriously based on such a small n (no matter how many patient observations you have in each season). Also, arguably the four seasons are in fact a census of season space, which would make a random effect a pointless mis-specification. So I would handle this just as a regular fixed effect.

The same considerations might apply to hospitals, depending on how many you have. If you have 30 or more, then it becomes a reasonable number to support random effects at that level. On that assumption I would opt for #2, being sure to include a seasonal fixed effect variable.
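
As a rough sketch of what #2 with a seasonal fixed effect might look like, assuming the variable names used in the command in #1 (disease, cough, hospital, season) and a single symptom as the predictor of interest. The -xtgee- lines are only the population-averaged alternative mentioned in #1; the exchangeable working correlation is an illustrative choice, not a recommendation:

```stata
* Random intercept for hospital; season enters as indicator (fixed-effect) variables
melogit disease cough i.season || hospital:, or

* Population-averaged alternative via GEE, with hospital as the panel variable
xtset hospital
xtgee disease cough i.season, family(binomial) link(logit) corr(exchangeable) vce(robust) eform
```

If the number of hospitals turns out to be too small to support a random effect, the all-fixed-effects version is simply the -logistic- command already shown in #1.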

Option #3 does not really make much sense to me unless you mean it as the same thing as #4. There is no reason to consider hospitals nested in seasons, nor seasons in hospitals.

In the event you end up settling on #4, notwithstanding my concerns about the appropriateness of modeling season as a random effect, you will find a section on how to estimate models with crossed random effects in the PDF documentation that comes installed with Stata. Run -help mixed- and then click on the blue link near the top to open the -mixed- chapter of the PDF documentation. Go to the link for Remarks and examples, and then click on the link for Crossed-effects models.
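
For orientation only, a minimal sketch of the crossed-effects syntax applied to a logistic model, again assuming the variable names from #1. The PDF documentation remains the place to check the details:

```stata
* Crossed random intercepts for season and hospital (#4)
* _all: treats the whole dataset as one group, and R.season adds a random
* intercept for each level of season, crossed with the hospital intercepts
melogit disease cough || _all: R.season || hospital:, or
```

Putting the factor with fewer levels (here season) in the R. term keeps the estimation problem smaller, but the variance estimate for season still rests on only four levels.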