Clustering hospitalizations by patient using "vce(cluster)"

Susan Gutierrez

#1

25 Nov 2022, 14:00

Hi all,

I am using a large database to study length of hospital stay in patients with a rare disease. I have data on 4,326 patients who had a total of 16,428 hospitalizations over the study period. Many of the patients were hospitalized frequently, so there are multiple observations (hospitalizations) per patient. I want to perform a simple linear regression in which my independent variable is median household income (dichotomized as above vs. below the median) and my dependent variable is length of hospital stay. I would like to cluster the standard errors of this regression by patient.

I used the following code to perform linear regression:
"regress length_of_stay ib2.hhi_abovebelow, vce(cluster patient_id)"
Where:
- ib2.hhi_abovebelow = Household income (above and below median)
- patient_id = unique patient identifier

The income coefficient is statistically significant, and the standard errors are adjusted for 4,326 clusters (one cluster per patient).
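A minimal sketch, on simulated data, of what the clustered regression above does. The sample sizes, the 1/2 coding of `hhi_abovebelow` and the data-generating numbers are assumptions, not the poster's data. After `regress ..., vce(cluster patient_id)`, Stata stores the number of observations in `e(N)` and the number of clusters in `e(N_clust)`: every hospitalization enters the estimation, the point estimates are the same as plain OLS, and clustering only changes how the standard errors are computed.

```stata
* Simulated stand-in data (hypothetical): 4,326 patients with 1+ stays each
clear
set seed 12345
set obs 4326
generate patient_id = _n
* Assumed coding: 1 and 2 are the two income groups; level 2 is the base
generate hhi_abovebelow = 1 + (runiform() < 0.5)
* Patient-level effect shared by all of a patient's stays (within-patient correlation)
generate u = rnormal(0, 2)
generate n_stays = 1 + rpoisson(2.8)
expand n_stays
generate length_of_stay = max(1, round(5 + (hhi_abovebelow == 2) + u + rnormal(0, 3)))

* Plain OLS and clustered OLS give the same coefficient
regress length_of_stay ib2.hhi_abovebelow
scalar b_ols = _b[1.hhi_abovebelow]
regress length_of_stay ib2.hhi_abovebelow, vce(cluster patient_id)
scalar b_cl = _b[1.hhi_abovebelow]

* e(N) counts every hospitalization, e(N_clust) counts patients
display "observations used: " e(N) "   clusters: " e(N_clust)
display "same coefficient: " (reldif(b_ols, b_cl) < 1e-10)
```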

Problem: After looking at the total cohort (16,428 hospitalizations in 4,326 patients), I used the same command ("regress length_of_stay ib2.hhi_abovebelow, vce(cluster patient_id)") to look only at the first hospitalization per patient, dropping all subsequent hospitalizations. In this case, I had 4,326 patients with 4,326 hospitalizations. The output I got from this command was exactly the same as the output for the total cohort.
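A sketch of how the restriction to each patient's first hospitalization might be done, again on simulated data; `admit_date` is a hypothetical admission-date variable, and the other numbers are assumptions. If the drop worked, the regression header should report 4,326 observations rather than 16,428, so identical output suggests the command was run on the full data. With exactly one observation per cluster, `vce(cluster patient_id)` gives the same standard errors as `vce(robust)`: each cluster contributes a single score, and the clustered small-sample factor (N-1)/(N-k) x G/(G-1) reduces to the N/(N-k) used by `vce(robust)` when G = N.

```stata
* Simulated stand-in data (hypothetical); admit_date is an assumed variable
clear
set seed 12345
set obs 4326
generate patient_id = _n
generate hhi_abovebelow = 1 + (runiform() < 0.5)   // assumed 1/2 coding
generate u = rnormal(0, 2)                         // patient-level effect
generate n_stays = 1 + rpoisson(2.8)
expand n_stays
generate admit_date = td(01jan2015) + floor(2000 * runiform())
format admit_date %td
generate length_of_stay = max(1, round(5 + (hhi_abovebelow == 2) + u + rnormal(0, 3)))

* Keep only the earliest hospitalization of each patient
bysort patient_id (admit_date): keep if _n == 1
isid patient_id   // stops with an error if any patient still has more than one row

* One observation per cluster: clustered and robust SEs coincide
regress length_of_stay ib2.hhi_abovebelow, vce(cluster patient_id)
scalar se_cl = _se[1.hhi_abovebelow]
display "observations used: " e(N) "   clusters: " e(N_clust)
regress length_of_stay ib2.hhi_abovebelow, vce(robust)
scalar se_rb = _se[1.hhi_abovebelow]
display "clustered SE: " se_cl "   robust SE: " se_rb
```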

Questions:
- When I use the cluster option, is it simply telling Stata to include one observation (hospitalization) per cluster (patient) and drop all additional observations within that cluster?
- Is there a better approach I should use to perform linear regression that accounts for the fact that the same patient will have multiple hospitalizations?

Thanks in advance.
Apologies for not using dataex - please let me know if it will be useful for this query.
Carlo Lazzaro

#2

26 Nov 2022, 02:08

Susan:
first off, (more) positive replies are difficult to give, as you did not post what Stata gave you back (as per the FAQ).
That said:
1) as far as your problem is concerned, -vce(cluster clustvar)- does not drop any observations: it adjusts the standard errors for the correlation among observations sharing the same -clustvar- (patients, in your example), taking into account all the observations each cluster is composed of;
2) given that a simple OLS is in all likelihood insufficient to give you any valuable information about the data-generating process you're interested in, I think you'll be better off with -xtreg, re-, as a share of your patients was hospitalized more than once (this situation is similar to the one analyzed via the so-called shared frailty models; see -stcox-).
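For Carlo's -xtreg, re- suggestion, a minimal sketch on simulated data (variable names follow the original post; the data and the 1/2 coding of `hhi_abovebelow` are assumptions). `xtset` declares hospitalizations as nested within patients; the random-effects model then adds a patient-level error term, and `vce(cluster patient_id)` keeps the standard errors robust to remaining within-patient correlation. `e(N_g)` reports the number of patients and `e(rho)` the share of residual variance attributed to the patient effect.

```stata
* Simulated stand-in data (hypothetical)
clear
set seed 2022
set obs 4326
generate patient_id = _n
generate hhi_abovebelow = 1 + (runiform() < 0.5)   // assumed 1/2 coding
generate u = rnormal(0, 2)                         // patient-level effect
generate n_stays = 1 + rpoisson(2.8)
expand n_stays
generate length_of_stay = max(1, round(5 + (hhi_abovebelow == 2) + u + rnormal(0, 3)))

* Declare the panel: hospitalizations nested within patients (no time variable needed)
xtset patient_id

* Random-effects GLS with patient-clustered standard errors
xtreg length_of_stay ib2.hhi_abovebelow, re vce(cluster patient_id)
display "hospitalizations: " e(N) "   patients: " e(N_g)
display "share of variance due to the patient effect (rho): " e(rho)
```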

Kind regards,
Carlo
(Stata 19.0)
Susan Gutierrez

#3

29 Nov 2022, 16:09

Thank you, Carlo, for your answer. I will look into -xtreg, re- as an option for my analysis. I appreciate it.