Hello,
I'm working on implementing a two-stage residual inclusion (2SRI) model with a Gamma family and log link.
The primary outcome is total healthcare costs that have been standardized across hospitals, so it essentially represents resource utilization rather than actual dollar costs. More importantly, it is non-zero and positively skewed. For the initial models I included patient characteristics/demographics as covariates; clinical features will be included in later models.
The main predictor of interest is opioid exposure during hospitalization, which we've measured as the number of days that a patient received an opioid. This is a continuous variable. To disentangle unobserved confounding related to disease severity, it is instrumented in the first stage using the cumulative days for similar patients at the same hospital in the prior year.
We have healthcare costs coming from 47 hospitals, and because the data are multilevel we're clustering by hospital. Additionally, from my reading, my understanding is that the standard errors (and hence the t/z-statistics and p-values) of the 2SRI estimates β̂ of β, as displayed in the Stata output, are not correct (i.e. they cannot be used to construct asymptotic confidence intervals or to conduct asymptotic hypothesis tests). We're therefore also bootstrapping to approximate the asymptotically correct standard errors (ACSE). For this proof of concept I did 5 replications, though this will be increased later.
Of particular interest is resource utilization across a variety of high-risk infant categories (dx_premature, px_chd, nonnecabd, mednec, surgnec, dx_vlbw, dx_elbw, ecmo). These are dichotomous dummy variables, and because patients are complex they can have multiple flags (e.g. a patient could have px_chd and dx_vlbw). There's one other category, dx_hie, which I've left out of the model so that it serves as the reference for the intercept when interpreting the beta coefficients with all other covariates set to their reference category.
HTML Code:
/*************************************************
** Apply GLM for the 2SRI first stage.          **
*************************************************/
glm all_opioid_sum ib0.female ib1.race ib2.ethnicity_num ib0.insurance ///
    ib0.dx_premature ib0.px_chd ib0.nonnecabd ib0.mednec ib0.surgnec ///
    ib0.dx_vlbw ib0.dx_elbw ib0.ecmo ib0.nicu_icu i.ccc_count2 ib0.vent ///
    ib0.tpn move_avg_1year, ///
    family(gamma) link(log) vce(cluster hospital_number)

/*************************************************
** Save the first stage residuals.              **
*************************************************/
predict Xuhat, response

/*************************************************
** Apply GLM for the 2SRI second stage.         **
*************************************************/
glm infla_total_suc all_opioid_sum ib0.female ib1.race ib2.ethnicity_num ///
    ib0.insurance ib0.dx_premature ib0.px_chd ib0.nonnecabd ib0.mednec ///
    ib0.surgnec ib0.dx_vlbw ib0.dx_elbw ib0.ecmo ib0.nicu_icu i.ccc_count2 ///
    ib0.vent ib0.tpn Xuhat, ///
    family(gamma) link(log) ///
    vce(bootstrap, reps(5) cluster(hospital_number) bca)

margins, dydx(*)

/*************************************************
** End Stata program for bootstrapping.         **
*************************************************/
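One point about the code above: vce(bootstrap) on the second-stage glm resamples and refits only that command, so Xuhat is treated as fixed data and the first stage is not re-estimated in each replicate. For comparison, here is a minimal sketch of bootstrapping both stages together. The program name twosri_boot, the local macro xvars, and the seed value are hypothetical placeholders, and this is an illustration under those assumptions rather than a tested implementation.

```stata
* Sketch only: re-estimate BOTH 2SRI stages inside each bootstrap replicate.
* twosri_boot and xvars are hypothetical names, not part of the original do-file.
capture program drop twosri_boot
program define twosri_boot, eclass
    * Covariates shared by both stages (copied from the models above)
    local xvars ib0.female ib1.race ib2.ethnicity_num ib0.insurance ///
        ib0.dx_premature ib0.px_chd ib0.nonnecabd ib0.mednec ib0.surgnec ///
        ib0.dx_vlbw ib0.dx_elbw ib0.ecmo ib0.nicu_icu i.ccc_count2 ///
        ib0.vent ib0.tpn

    * First stage: exposure on covariates plus the instrument
    glm all_opioid_sum `xvars' move_avg_1year, family(gamma) link(log)

    * Recompute the first-stage residuals for this resample
    capture drop Xuhat
    predict double Xuhat, response

    * Second stage: outcome on exposure, covariates, and the residual
    glm infla_total_suc all_opioid_sum `xvars' Xuhat, family(gamma) link(log)
end

* Resample whole hospitals; reps(5) mirrors the proof of concept above
bootstrap _b, reps(5) cluster(hospital_number) seed(12345): twosri_boot
```

With factor variables and only 47 clusters, some resamples may lack a category entirely, so it is worth checking how many replicates fail before relying on the resulting standard errors.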
I'm a little confused by the margins command and by the choice between average marginal effects (AMEs) and marginal effects at the means (MEMs). Both seem slightly counterintuitive when I think about the beta coefficients from the model and how to interpret them when holding the other covariates at their reference values.
From my reading, my interpretation of the marginal effect for all_opioid_sum would be: on average, a 1-unit increase in days on an opioid is associated with a 3804 increase in the standardized unit cost, with the other covariates at their observed values. Is this an appropriate interpretation for a continuous variable?
Additionally, how would I get the marginal effect (AME or MEM) for patients who had dx_hie, which is essentially the intercept when all covariates are set at the reference?
I was hoping to produce a figure of these effects, but my attempt to add in the intercept (dx_hie) did not have the desired result.
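For reference, with a log link the two quantities can be written out as below. The notation is assumed for illustration: x_i is the full second-stage covariate vector for patient i (including the first-stage residual), x_{ik} is days on an opioid, and x̄ is the vector of sample means.

```latex
\begin{align*}
E[y_i \mid x_i] &= \exp(x_i'\beta) \\
\frac{\partial E[y_i \mid x_i]}{\partial x_{ik}} &= \beta_k \exp(x_i'\beta) \\
\widehat{\mathrm{AME}}_k &= \frac{1}{n}\sum_{i=1}^{n} \hat\beta_k \exp(x_i'\hat\beta) \\
\widehat{\mathrm{MEM}}_k &= \hat\beta_k \exp(\bar{x}'\hat\beta)
\end{align*}
```

Because the effect on the cost scale depends on exp(x'β), it differs across patients, which is why the AME and MEM generally do not coincide and why neither equals the coefficient itself. For the dummy variables entered with the i. prefix, margins reports a discrete change from the base level rather than this derivative.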
HTML Code:
margins, dydx(_cons dx_premature px_chd nonnecabd mednec surgnec dx_vlbw dx_elbw ecmo) at(all_opioid_sum=(5(5)60)) atmeans
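As far as I can tell, dydx() accepts only covariates that appear in the model, so _cons cannot be listed there. One possible way to describe the reference (dx_hie) group is as a predicted cost rather than a marginal effect. The following is a minimal sketch under that assumption, run after the second-stage glm above, with all other covariates (including Xuhat) left at their observed values.

```stata
* Sketch only: predicted standardized cost with every high-risk flag set to 0,
* i.e. the profile that the omitted dx_hie category is meant to represent.
margins, at(dx_premature=0 px_chd=0 nonnecabd=0 mednec=0 surgnec=0 ///
    dx_vlbw=0 dx_elbw=0 ecmo=0)

* The same profile over a grid of opioid days, for plotting
margins, at(all_opioid_sum=(5(5)60) dx_premature=0 px_chd=0 nonnecabd=0 ///
    mednec=0 surgnec=0 dx_vlbw=0 dx_elbw=0 ecmo=0)
marginsplot
```

Adding atmeans to either command would evaluate the prediction at the sample means of the remaining covariates instead of averaging over their observed values.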