Analyzing length of stay in clustered data (back transform)

Shadassa Ourshalimian

Join Date: Mar 2019

Posts: 17
#1

Analyzing length of stay in clustered data (back transform)

01 Mar 2019, 15:49

Hello,

I'm a relatively new Stata user and am working on a project. I'm looking at length of stay (days) which is heavily right skewed and analgesic usage (days), the data is clustered within hospitals so I'm using fixed effects modeling.

Code:

xtmixed log_LOS ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region || hospital_number :, mle variance nostderr

I think this is a fair approach, however the interpretation is not as intuitive as being able to say that change in a drugs usage increases LOS by X days. I'm curious about potentially using Duan's smearing to retransform the coefficient as a potential way to make the interpretation more audience friendly.

One of the issues is the clustering within hospitals, which to me adds a layer of complexity.

Another thought was to leave LOS untransformed and run a median (quantile) regression with clustered bootstrapping.

Code:

bootstrap, cluster(hospital_number) reps(100) seed(5) : qreg length_of_stay post_ty_iv_usage post_opioid_usage post_keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region, quantile(.5)

This seems like a reasonable approach to being able to interpret the results as a change in drug use results in a median increase in LOS by X. However, I have not used the boostrap clustered command in Stata and want to verify that it is accurately accounting for patients being clustered within hospitals.

Hopefully I've given any readers enough information. Any suggestions/advice is welcome.

Thank you,
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#2

01 Mar 2019, 16:11

I have only a passing acquaintance with Duan's smearing, so I won't try to comment on that aspect of your question. I hope somebody who does know it well will do so.

With regard to your -bootstrap- approach, you have properly assured that the bootstrapping will respect the nesting of observations within hospital by re-sampling at the hospital level, not the individual observation level. I think you should also add the -idcluster()- option to -bootstrap- here, as it might be needed by -qreg-. (Added: actually, probably it won't be needed, but it can't hurt to include it. If you've already tried running it without -idcluster()- and it runs fine, then just ignore me here.)

Finally, I suggest yet another approach. Instead of log-transforming the length of stay, why not use a generalized linear model with a log link:

Code:

meglm LOS ty_iv_usage opioid_usage keto_usage age_year sex /// ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region /// || hospital_number:, link(log) family(gamma)

That way the coefficients will already be in the LOS metric, not the log_LOS metric, but you will have the scale compression and variance reduction that comes with log-transformation. Note that I have used the gamma family rather than the Gaussian family, though you might prefer the opposite. I generally prefer the gamma family for this kind of thing because I find it plays better with the wide scale of a length-of-stay variable. But you can decide that for yourself.
Comment
Shadassa Ourshalimian

Join Date: Mar 2019

Posts: 17
#3

01 Mar 2019, 17:42

Thank you for the keen response, I wasn't familiar with the meglm command but I think that this is potentially a much better/simpler approach. However, I have one concern since there is a mass of zeros for patients who had same-day-discharges, it's my understanding that the gamma distribution does not play well with zero values.

Curious what your thoughts are on a zero inflated Gamma or Poisson and suggested syntax.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#4

01 Mar 2019, 17:56

I don't think -meglm- can accommodate a zero-inflated gamma. But Poisson is definitely an option. And you can then simplify the coding a bit by using -mepoisson- instead of -meglm-.

But in length-of-stay data, at least the length-of-stay data I have worked with, the minimum value has been 1. And if you really do have 0 length-of-stay values, you cannot have included them in the log-transformed regression.
Comment

Shadassa Ourshalimian

Join Date: Mar 2019
Posts: 17

03 Mar 2019, 17:29

Hello Clyde,

My apologies, I was multitasking and looking at the incorrect variable. You are correct, length of stay is has a minimum value of 1 so my concern is moot. I went ahead and tried to run the meglm, but encountered a little hiccup.

The model output gave me: cannot compute an improvement -- discontinuous region encountered

My thoughts are that there is collinearity between the drugs: ty_iv_usage opioid_usage keto_usage. However, I don't think it's extreme by running a regression and looking at the VIF they're pretty low.

My strategy was to re-run the model starting with only one variable and working my way up.

Code:

(worked) meglm  length_of_stay ty_iv_usage  || hospital_number:,  link(log) family(gamma) 

meglm  length_of_stay ty_iv_usage opioid_usage  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay opioid_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region || hospital_number:,  link(log) family(gamma)

(worked) meglm  length_of_stay ty_iv_usage opioid_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay  keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year  || hospital_number:,  link(log) family(gamma)

meglm  length_of_stay  keto_usage opioid_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year  || hospital_number:,  link(log) family(gamma)

(worked) meglm  length_of_stay   age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region || hospital_number:,  link(log) family(gamma)

The only codes that ran I've marked with (worked), the rest failed due to a discontinuous region. It seems like the trouble variables are opioid_usage and keto_usage. My next step was trying to mess with the start values (IV) and startgrid (.1 1 10) but it didn't seem to have any effect.

Code:

meglm  length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region || hospital_number:,  link(log) family(gamma) startvalues(iv) startgrid(.1 1 10)

Lastly I tried to orthogonalize ty_iv_usage opioid_usage keto_usage to see if that impacted it's ability to run but it was unsuccessful.

Do you have any thoughts that might put me on the right track?

Once again, thanks!

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#6

03 Mar 2019, 17:42

Well, you've already taken all the reasonable steps to try to get this model to converge. I think at this point I would abandon it. Poisson is often better behaved. I would try

Code:

mepoisson length_of_stay ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year ib3.region || hospital_number:

and see if that works.

But I think first I would see if there is something odd about the distribution of length_of_stay in connection with some value of opioid_usage and keto_usage. Maybe do some histograms or -kdensity- plots conditional on all four combinations of those two variables and see if one of them looks bizarre in some way.
Comment

Announcement