Exploratory analysis: investigating within-group variability of a given variable

Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#1

25 Jul 2022, 13:28

I am trying to understand whether a given variable has enough variability within groups.

My data are at the physician-hospital-month level. I have an outcome variable (cost) for each physician-hospital-month. I also computed the mean outcome for the colleagues of a given physician in a given hospital-month. It is essentially a leave-one-out mean, i.e., the mean over all physicians in the hospital for a given month, excluding the physician identified in the given row.
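For readers who want to reproduce such a variable, a minimal sketch of a leave-one-out mean follows. It assumes the outcome is stored in a variable named cost; total_cost, n_cost and avg_peer_cost_chk are hypothetical names used only for illustration, and this is not necessarily how avg_peer_cost was built.

```stata
* Sketch only: cost is an assumed variable name; total_cost, n_cost and
* avg_peer_cost_chk are hypothetical helper names.
bysort hosp_id ym: egen double total_cost = total(cost)   // group sum of non-missing costs
bysort hosp_id ym: egen long n_cost = count(cost)         // number of non-missing costs in the group
* Leave-one-out mean: remove the row's own cost from the group total.
* Left missing when the physician is alone in the group or has a missing cost.
generate double avg_peer_cost_chk = (total_cost - cost) / (n_cost - 1) if n_cost > 1 & !missing(cost)
```

Rows with a missing own cost are left missing here; a different convention may be preferable depending on how missing costs should be treated.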

The leave-one-out mean, therefore, varies at the physician-hospital-month level. However, it should vary little within hospital-months, especially in large hospitals (excluding one physician when computing the average over a group of physicians in a given hospital-month has a smaller effect in larger groups). I don't know how I can show this with the data.
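The intuition can be made explicit with a short derivation. It assumes (this is an assumption, not something stated above) that all N physicians in a hospital-month have a non-missing cost y_i, with group mean \bar{y} and leave-one-out mean \bar{y}_{-i}.

```latex
\bar{y}_{-i} = \frac{N\bar{y} - y_i}{N-1}
\quad\Longrightarrow\quad
\bar{y}_{-i} - \bar{y} = \frac{\bar{y} - y_i}{N-1}
\quad\Longrightarrow\quad
\operatorname{SD}_i\!\left(\bar{y}_{-i}\right) = \frac{\operatorname{SD}_i\!\left(y_i\right)}{N-1}
```

Under that assumption the leave-one-out means in a group average to \bar{y}, so their within-group SD (and CV) equals that of the raw cost divided by N-1. The variability shrinks mechanically as the group grows.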

I tried the following. First, I estimated the SD and the mean within hospital-month. Then, I computed the coefficient of variation (CV) among physicians within a given hospital-month to get a scaled measure of variability. Finally, I summarized the CV across all hospital-month combinations. I am not quite happy with this, as it shows how a scaled measure of variability varies across groups. Instead, I would like to investigate whether there is *enough variability within groups* (across different physicians within a hospital-month). Does this make sense?

Code:

. isid physician_id hosp_id ym // each row is identified by physician-hospital-month
.
. bys hosp_id ym: egen sd_within_hosp_mon = sd(avg_peer_cost) // computing SD among physicians within hospital-month
(67,409 missing values generated)
. bys hosp_id ym: egen mean_within_hosp_mon = mean(avg_peer_cost) // computing mean among physicians within hospital-month
(67,409 missing values generated)
. gen cv_within_hosp_mon = sd_within_hosp_mon/mean_within_hosp_mon // computing coefficient of variation (CV) among physicians within hospital-month
(67,482 missing values generated)
.
. bys hosp_id ym: gen hosp_ym_first = _n==1 // tagging first obs within hospital-ym
.
. su cv_within_hosp_mon if hosp_ym_first, d // extent to which CV varies across different hospital-months

                     cv_within_hosp_mon
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0037104              0
 5%     .0086899              0
10%     .0122923              0       Obs             345,161
25%     .0214667              0       Sum of Wgt.     345,161

50%     .0415179                      Mean           .0801513
                        Largest       Std. Dev.        .11824
75%      .087763       1.414214
90%     .1812275       1.414214       Variance       .0139807
95%     .2834875       1.414214       Skewness       4.381938
99%     .6075859       1.414214       Kurtosis       29.97145
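One way to check whether the CV falls with hospital size is to tabulate it by group-size bins. The sketch below reuses the variables created above; n_peer and size_bin are hypothetical names, and the cutpoints are arbitrary assumptions that would need adjusting to the actual size distribution.

```stata
* Sketch only: n_peer and size_bin are hypothetical names; cutpoints are arbitrary.
bysort hosp_id ym: egen long n_peer = count(avg_peer_cost)   // physicians with a non-missing peer mean
* Each bin is labelled by its left endpoint; sizes outside [2, 100000) become missing.
egen size_bin = cut(n_peer), at(2,5,10,20,50,100000)
* One row per hospital-month: number of groups, median CV and mean CV by size bin.
tabstat cv_within_hosp_mon if hosp_ym_first, by(size_bin) statistics(n p50 mean)
```

If the median CV drops steadily across bins, that would be consistent with the within-group variability being small in large hospitals.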

Last edited by Paula de Souza Leao Spinola; 25 Jul 2022, 13:31.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#2

26 Jul 2022, 04:01

Paula:
1) if you want to go -fe-, your idea makes sense, as we know that little within-panel variation makes the -fe- estimator unhappy;
2) the qualitative issue concerning a quantitative topic is: how much within-panel variation is enough to run the -fe- estimator with no worries? Unfortunately, I do not know of any hard-and-fast rule on that.
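As a rough way to look at within-panel variation directly, -xtsum- splits the SD of a variable into overall, between, and within components. The sketch below assumes hospital-month is the panel of interest; hosp_ym is a hypothetical identifier, and -xtset- will replace any panel settings already declared.

```stata
* Sketch only: hosp_ym is a hypothetical group identifier for hospital-month.
egen long hosp_ym = group(hosp_id ym)
xtset hosp_ym            // declare the panel variable only; no time variable
xtsum avg_peer_cost      // reports overall, between, and within SD
```

A within SD that is small relative to the between SD would indicate that most of the variation is across hospital-months rather than across physicians within them.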

Kind regards,
Carlo
(Stata 19.0)