  • Exploratory analysis: investigating within-group variability of a given variable

    I am trying to understand whether a given variable has enough variability within groups.

    My data are at the physician-hospital-month level. I have an outcome variable (cost) for each physician-hospital-month. I also computed the mean outcome among a given physician's colleagues in the same hospital-month. It is essentially a leave-one-out mean: the mean over all physicians in the hospital for a given month, excluding the physician identified in the given row.

    This leave-one-out mean therefore varies at the physician-hospital-month level. However, it should vary little within hospital-months, especially in large hospitals, because excluding one physician from the average over a group of physicians has a smaller effect in larger groups. I don't know how to show this with the data.
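
    One way to make this intuition precise is a short derivation. The notation is an assumption introduced here, not part of the data description: $n$ is the number of physicians in a hospital-month, $y_i$ is physician $i$'s cost, $\bar{y}$ is the full hospital-month mean, and $\bar{y}_{-i}$ is the leave-one-out mean (avg_peer_cost).

    ```latex
    \begin{align*}
    \bar{y}_{-i} &= \frac{1}{n-1}\sum_{j \neq i} y_j = \frac{n\bar{y} - y_i}{n-1}, \\
    \frac{1}{n}\sum_{i=1}^{n} \bar{y}_{-i} &= \frac{n^2\bar{y} - n\bar{y}}{n(n-1)} = \bar{y}, \\
    \bar{y}_{-i} - \bar{y} &= -\frac{y_i - \bar{y}}{n-1}, \\
    \operatorname{SD}\left(\bar{y}_{-i}\right) &= \frac{\operatorname{SD}\left(y_i\right)}{n-1}.
    \end{align*}
    ```

    Under this notation, the within-hospital-month SD of the leave-one-out mean is the SD of own cost divided by $n-1$, so it shrinks mechanically as the number of physicians in the hospital-month grows.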

    I tried the following. First, I estimated the SD and mean within hospital-month. Then, I computed the coefficient of variation (CV) among physicians within a given hospital-month to get a scaled measure of variability. Finally, I summarized the CV across all hospital-month combinations. I am not quite happy with this because it only shows how a scaled measure of variability varies across groups. Instead, I would like to investigate whether there is *enough variability within groups* (across different physicians within a hospital-month). Does this make sense?

    Code:
    . isid physician_id hosp_id ym // each row is identified by physician-hospital-month
    
    . 
    . bys hosp_id ym: egen sd_within_hosp_mon = sd(avg_peer_cost) // computing SD among physicians within hospital-m
    > onth
    (67,409 missing values generated)
    
    . bys hosp_id ym: egen mean_within_hosp_mon = mean(avg_peer_cost) // computing mean among physicians within hosp
    > ital-month
    (67,409 missing values generated)
    
    . gen cv_within_hosp_mon = sd_within_hosp_mon/mean_within_hosp_mon // computing coefficient of variation (CV) am
    > ong physicians within hospital-month
    (67,482 missing values generated)
    
    . 
    . bys hosp_id ym: gen hosp_ym_first = _n==1 // tagging first obs within hospital-ym
    
    . 
    . su cv_within_hosp_mon if hosp_ym_first, d // extent to which CV varies across different hospital-months
    
                         cv_within_hosp_mon
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     .0037104              0
     5%     .0086899              0
    10%     .0122923              0       Obs             345,161
    25%     .0214667              0       Sum of Wgt.     345,161
    
    50%     .0415179                      Mean           .0801513
                            Largest       Std. Dev.        .11824
    75%      .087763       1.414214
    90%     .1812275       1.414214       Variance       .0139807
    95%     .2834875       1.414214       Skewness       4.381938
    99%     .6075859       1.414214       Kurtosis       29.97145
    Last edited by Paula de Souza Leao Spinola; 25 Jul 2022, 13:31.

  • #2
    Paula:
    1) if you want to go -fe-, your idea makes sense, as we know that little within-panel variation makes the -fe- estimator unhappy;
    2) the qualitative issue concerning a quantitative topic is: how much within-panel variation is enough to run the -fe- estimator with no worries? Unfortunately, I do not know of any hard and fast rule on that.
    Kind regards,
    Carlo
    (Stata 19.0)
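
    As a rough sketch of how the within-panel variation mentioned in this reply could be inspected, one option is to treat each hospital-month as a panel and let -xtsum- split the standard deviation into between and within components. The variable hosp_ym is a hypothetical name introduced here for illustration; avg_peer_cost, hosp_id and ym come from the original post. This describes the variation but does not settle how much is "enough".

    ```stata
    * Sketch only: hosp_ym is a hypothetical group identifier, not in the original post
    * Build one numeric id per hospital-month combination
    egen long hosp_ym = group(hosp_id ym)

    * Declare hospital-month as the panel (no time variable is needed for -xtsum-)
    xtset hosp_ym

    * Report overall, between (across hospital-months) and within
    * (across physicians inside a hospital-month) standard deviations
    xtsum avg_peer_cost
    ```

    The "within" row of the -xtsum- output is the piece that a hospital-month fixed-effects specification would rely on, so comparing it with the "between" row gives a first impression of how much identifying variation is left.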
