I am trying to understand whether I have enough variability in a given variable within groups.
My data is at the physician-hospital-month level. I have an outcome variable (cost) for each physician-hospital-month. I also computed the mean outcome for the colleagues of a given physician in a given hospital-month. It is essentially a leave-out mean, i.e., the mean over all physicians in the hospital for a given month, excluding the physician identified in the given row.
The leave-out mean, therefore, varies at the physician-hospital-month level. However, it should vary little within hospital-months, especially in large hospitals (excluding one physician when computing the average of a group of physicians in a given hospital-month will have a smaller effect in larger groups). I don't know how I can show this with the data.
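To make the group-size argument concrete, here is a short derivation. The notation is mine and is only a sketch: $y_i$ is the cost of physician $i$, $N$ is the number of physicians in the hospital-month, and $\bar{y}$ is the hospital-month mean, assuming all $N$ physicians have a nonmissing cost.

```latex
\begin{align*}
\bar{y}_{-i} &= \frac{1}{N-1}\sum_{j \neq i} y_j = \frac{N\bar{y} - y_i}{N-1}, \\
\bar{y}_{-i} - \bar{y} &= \frac{N\bar{y} - y_i - (N-1)\bar{y}}{N-1} = -\frac{y_i - \bar{y}}{N-1}, \\
\operatorname{SD}_i\!\left(\bar{y}_{-i}\right) &= \frac{\operatorname{SD}_i\!\left(y_i\right)}{N-1}.
\end{align*}
```

If this is right, the within hospital-month SD of the leave-out mean is the within hospital-month SD of cost itself divided by $N-1$, so it should shrink quickly as the number of physicians grows.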
I tried the following. First, I estimated the SD and mean within hospital-month. Then, I computed the coefficient of variation (CV) among physicians within a given hospital-month to have a scaled measure of variability. Finally, I summarized the CV across all hospital-month combinations. I am not quite happy with this, as it only shows how a scaled measure of variability varies across groups. Instead, I would like to investigate whether there is *enough variability within groups* (across different physicians within hospital-month). Does this make sense?
Code:
. isid physician_id hosp_id ym // each row is identified by physician-hospital-month
.
. bys hosp_id ym: egen sd_within_hosp_mon = sd(avg_peer_cost) // computing SD among physicians within hospital-month
(67,409 missing values generated)
. bys hosp_id ym: egen mean_within_hosp_mon = mean(avg_peer_cost) // computing mean among physicians within hospital-month
(67,409 missing values generated)
. gen cv_within_hosp_mon = sd_within_hosp_mon/mean_within_hosp_mon // computing coefficient of variation (CV) among physicians within hospital-month
(67,482 missing values generated)
.
. bys hosp_id ym: gen hosp_ym_first = _n==1 // tagging first obs within hospital-ym
.
. su cv_within_hosp_mon if hosp_ym_first, d // extent to which CV varies across different hospital-months
                     cv_within_hosp_mon
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0037104              0
 5%     .0086899              0
10%     .0122923              0       Obs             345,161
25%     .0214667              0       Sum of Wgt.     345,161

50%     .0415179                      Mean           .0801513
                        Largest       Std. Dev.        .11824
75%      .087763       1.414214
90%     .1812275       1.414214       Variance       .0139807
95%     .2834875       1.414214       Skewness       4.381938
99%     .6075859       1.414214       Kurtosis       29.97145
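One rough way I could imagine looking at within-group variability directly, rather than at how the CV varies across groups, is sketched below. It reuses the variables created above; `n_phys`, `dev_within`, and `size_bin` are names I made up for illustration, and the bin cutoffs are arbitrary assumptions, not something taken from the data.

```stata
* number of physicians with a nonmissing peer mean in each hospital-month
bys hosp_id ym: egen n_phys = count(avg_peer_cost)

* deviation of each physician's peer mean from the hospital-month mean
gen dev_within = avg_peer_cost - mean_within_hosp_mon

* overall spread vs. spread left after removing hospital-month means
su avg_peer_cost
su dev_within

* within hospital-month SD by group size (cutoffs are arbitrary;
* the top cutoff is assumed to exceed the largest group)
egen size_bin = cut(n_phys), at(2 5 10 20 50 100000)
tabstat sd_within_hosp_mon if hosp_ym_first, by(size_bin) stat(n mean p50)
```

Comparing the SD of `dev_within` with the overall SD of `avg_peer_cost` would indicate how much of the variation is within hospital-month, and the `tabstat` table should show whether the within SD falls as groups get larger.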
