Hi guys,
I have a question about the validity of an approach rather than its execution in Stata. Please forgive me if those questions generally aren't addressed here, as I'm still fairly new to Statalist.
I am analyzing a dataset of about 500 emergency room visits looking at emergency department (ED) utilization among patients with diagnosed heart failure. One outcome of primary interest is whether these patients are admitted to the hospital from the ED or whether they're treated at the ED then released. I'm using an ANOVA model to predict this, using a total of 13 categorical independent variables.
Before the study, when we discussed which predictors to include, we planned to include patients' race (black/white/etc.) and ethnicity (Hispanic/non-Hispanic) so we could check for disparities in that regard. Now that I have the dataset, only 1.4% of the sample is Hispanic and 94% of the sample is of the same race. Unsurprisingly, the ANOVA model finds that these variables are not statistically significant predictors of the outcome, and they slightly lower the model's adjusted R-squared because they use up degrees of freedom.
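To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. It assumes exactly 500 visits, which is only approximate (I said "about 500"), and the variable names are just mine for illustration.

```python
# Rough counts implied by the percentages quoted above.
# Assumption: exactly 500 visits (the real total is only "about 500").
n_visits = 500
share_hispanic = 0.014       # 1.4% of the sample is Hispanic
share_majority_race = 0.94   # 94% of the sample is of the same race

# Convert the shares to approximate visit counts.
n_hispanic = round(n_visits * share_hispanic)
n_minority_race = round(n_visits * (1 - share_majority_race))

print(f"Hispanic visits: ~{n_hispanic}")
print(f"Visits in all other race categories combined: ~{n_minority_race}")
```

Under that assumption, the ethnicity comparison rests on roughly 7 visits, and all of the non-majority race categories together account for roughly 30.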
I know that stepwise techniques that determine predictive variables post hoc based on p-values in multivariable models are frowned upon. I'm wondering if it would be considered legitimate to remove these independent variables from the model, not because of the lack of predictive value, but because of the lack of variability of these variables in my dataset. Can anyone with a stronger statistical foundation than mine help me here? Am I honor-bound to include these variables in my final model because I'd planned to include them, or is it acceptable to drop them because of how little they vary in my data?
Thanks!