  • Clustering hospitalizations by patient using "vce(cluster)"

    Hi all,

    I am using a large database to study length of hospital stay in patients with a rare disease. I have data on 4,326 patients who had a total of 16,428 hospitalizations over the study period. Many of the patients have frequent hospitalizations, so there are multiple observations (hospitalizations) per patient. I want to perform a simple linear regression analysis in which my independent variable is "median household income" (dichotomized as above or below the median) and my dependent variable is "length of hospital stay". I would like to cluster this regression by patient.

    I used the following code to perform linear regression:
    "regress length_of_stay ib2.hhi_abovebelow, vce (cluster patient_id)"
    Where:
    - ib2.hhi_abovebelow = Household income (above and below median)
    - patient_id = unique patient identifier

    The output is statistically significant and the standard error is adjusted for the 4,326 clusters (each cluster representing a patient).

    Problem: After looking at the total cohort (16,428 hospitalizations with 4,326 patients) I used this same command ("regress length_of_stay ib2.hhi_abovebelow, vce (cluster patient_id)") to look only at the first hospitalization per patient by dropping all subsequent hospitalizations. In this case, I had 4,326 patients with 4,326 hospitalizations. The output that I got when I used the above command is exactly the same as the output I got when looking at the total cohort.
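    For reference, one way to build and verify a first-hospitalization subset before rerunning the model might look like the sketch below. It is not necessarily the sequence of commands used here: admission_date is a hypothetical variable name that does not appear in this post, and the other names are taken from the command above.

    ```stata
    * Sketch only: admission_date is a hypothetical (assumed) variable name.
    preserve

    * Keep the earliest hospitalization for each patient.
    bysort patient_id (admission_date): keep if _n == 1

    * Check that the subset actually took effect before re-estimating.
    count              // should report 4,326 if one row per patient remains
    isid patient_id    // exits with an error if any patient still has more than one row

    * Same model, now on one observation per patient.
    regress length_of_stay ib2.hhi_abovebelow, vce(cluster patient_id)

    restore
    ```

    If count still reports 16,428, the second regression was run on the full data, which would explain identical output.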

    Questions:
    - When I am using the cluster command, is this simply telling Stata to include one observation (hospitalization) per cluster (by patient) and drop all additional observations within that cluster?
    - Is there a better approach I should use to perform linear regression that accounts for the fact that the same patient will have multiple hospitalizations?


    Thanks in advance.
    Apologies for not using dataex - please let me know if it will be useful for this query.

  • #2
    Susan:
    first off, it is difficult to give a more helpful reply, as you did not post what Stata gave you back (as the FAQ recommends).
    That said:
    1) as far as your problem is concerned, -vce(cluster clusterid)- adjusts the standard errors for correlation within each cluster (patients, in your example) and takes into account all the observations each cluster is composed of; it does not drop any observations;
    2) given that a simple OLS is in all likelihood insufficient to give you any valuable information about the data generating process you're interested in, I think you'll be better off with -xtreg, re-, as a share of your patients was hospitalized more than once (this situation is similar to the one analyzed via the so-called shared frailty models; see -stcox-).
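    The two points above can be sketched as follows. This is a minimal illustration using the variable names from the original post, assuming the full data set of 16,428 hospitalizations is in memory; it is not output from the poster's data, and adding vce(cluster patient_id) to the random-effects model is a choice made for this sketch.

    ```stata
    * Pooled OLS with cluster-robust standard errors.
    * Every hospitalization enters the estimation; only the
    * standard errors are adjusted for within-patient correlation.
    regress length_of_stay ib2.hhi_abovebelow, vce(cluster patient_id)
    display e(N)         // number of observations used in estimation
    display e(N_clust)   // number of clusters (patients)

    * Random-effects alternative: declare the panel identifier,
    * then fit a model with a patient-level random intercept.
    xtset patient_id
    xtreg length_of_stay ib2.hhi_abovebelow, re vce(cluster patient_id)
    ```

    Comparing e(N) with e(N_clust) after -regress- shows directly whether all hospitalizations, rather than one per patient, were used.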
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you Carlo for your answer. I will look into -xtreg,re- as an option for my analysis. Appreciate it.
