Dear Statalist members,
I am running a linear mixed-effects analysis of hospital-acquired infection rates over time in Stata 13. I have two questions about fitting the mixed-effects model and about the role of the residual variance.
The objective of my analysis is to determine whether hospital-acquired infection rates in hospitals participating in infection surveillance change over the years. My dependent variable is 'infect_rate' and my independent variable is 'year'. Hospitals report infections per trimester (three-month period), with at least one trimester reported per year.
The infection rate is calculated as [number of infections per trimester] / [number of patient hospitalisation days per trimester].
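For illustration, here is the same definition as a formula with hypothetical counts for one hospital h in one trimester t. The numbers are invented and do not come from the data:

```latex
\text{infect\_rate}_{ht} = \frac{\text{infections}_{ht}}{\text{patient-days}_{ht}},
\qquad \text{e.g.}\quad \frac{12}{8000} = 0.0015 = 1.5 \text{ per 1000 patient-days}
```

The unit chosen here affects every variance the model reports. If the rate is expressed per 1000 patient-days instead of per patient-day, all variances, including the residual variance, are multiplied by 1000^2 = 10^6.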
Several factors influence the infection rate. Because participation in the surveillance program is optional, the set of participating hospitals changes over time, so I selected hospitals with at least 5 data points over the 15-year period. In addition, hospital characteristics (number of beds, average length of stay of patients) are known to influence the infection rate. To account for this, I fitted a model with random effects at the hospital level.
Here is an example of the model I am fitting:
mixed infect_rate year ||hosp: bedsize lengthofstay
(see attached figure)
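In `mixed` syntax, variables listed after `|| hosp:` are random coefficients that vary by hospital. They are not fixed-effect covariates. A random intercept is included by default, and the default covariance structure for these random effects is independent. Under that reading, the command above corresponds approximately to the model below, where i indexes observations (trimesters) and j indexes hospitals:

```latex
\text{infect\_rate}_{ij} = \beta_0 + \beta_1\,\text{year}_{ij}
  + u_{0j} + u_{1j}\,\text{bedsize}_{ij} + u_{2j}\,\text{lengthofstay}_{ij} + \varepsilon_{ij},
\qquad u_{kj} \sim N(0, \sigma^2_{u_k}),\quad \varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)
```

In this model, bedsize and lengthofstay have no overall (fixed) coefficient. If the intention is instead to adjust the year slope for these hospital characteristics, the usual specification would put them in the fixed part: `mixed infect_rate year bedsize lengthofstay || hosp:`.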
As I understand it, getting the most accurate estimate of the change over time (the slope of the independent variable 'year') requires the best-fitting model, i.e. the one with the smallest residual variance. In linear regression, forward or backward stepwise selection lets you keep only the variables that contribute significantly to the model. For this model, however, I cannot tell whether a variable's contribution is significant. I can add and remove variables and watch how the residual variance changes, but nothing indicates each variable's relative contribution.
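The following is a minimal sketch of one standard way to compare nested mixed models: a likelihood-ratio test. It assumes both models are fitted by maximum likelihood, which is the default for `mixed` in Stata 13. Log-likelihoods from the `reml` option are not comparable across models with different fixed effects. It also assumes both models are fitted on exactly the same observations. Here, ℓ0 is the log-likelihood of the smaller model, ℓ1 that of the larger model, and k the number of extra parameters:

```latex
LR = 2\,(\ell_1 - \ell_0) \;\sim\; \chi^2_{k} \quad \text{(approximately, under } H_0\text{)},
\qquad \mathrm{AIC} = -2\ell + 2p
```

In Stata this corresponds to running `estimates store` after each `mixed` fit and then `lrtest` on the two stored results. AIC and BIC are available through `estat ic`. Each fixed effect's individual contribution already appears in the z statistics and p-values of the `mixed` coefficient table. For random-effect variances, the null value lies on the boundary (variance = 0), so the χ² reference distribution is conservative.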
This brings me to my two questions:
1. How do I best fit this model if I cannot identify which variables contribute significantly?
2. At what point does the residual variance become acceptable? I think it depends on the context, because the residual variance depends on the scale of the variables. If the residual variance is 5, does that mean the model fits with an infect_rate variation of ±5 (comparable to a standard deviation)?
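As a worked example of the number in question 2: a variance is expressed in squared units of the outcome, so it must be square-rooted to get a standard deviation on the infect_rate scale. Assuming approximately normal residuals, about 95% of them fall within ±1.96 standard deviations:

```latex
\hat\sigma_\varepsilon = \sqrt{\hat\sigma^2_\varepsilon} = \sqrt{5} \approx 2.24,
\qquad \pm 1.96 \times 2.24 \approx \pm 4.38
```

`mixed` reports var(Residual) by default. Its `stddeviations` option displays the variance components as standard deviations instead.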
Thanks in advance,
Koen