
  • How do I depict the 75th, 85th and 95th percentiles in a box plot?

    Dear STATALIST forum,

    I'm looking for ways of visually depicting the association between a dichotomous explanatory variable (parod1) and a continuous dependent variable (cacs_tot) in a box plot displaying the 75th, 85th and 95th percentiles.

    I tried a "normal" box plot but the results are not clearly visible due to a lot of outliers (please see the attached screenshot).

    In the adjusted model, certain confounders are included ("ageatvisitone", diabetes (diab1) and smoking, as you can see in the example below).

    The data are overdispersed and zero-inflated (not because of missing values; most individuals simply have the value 0 for the dependent variable), so negative binomial regression (nbreg) was used for the analysis.

    The mean and median of the dependent variable "cacs_tot" are 0, as you can see below. The percentiles of the dependent variable are as follows:

    percentiles:   10%   25%   50%   75%   90%
                     0     0     0    21   138


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int cacs_tot float parod1 byte cqed001 float(ageatvisitone diab1 smoking bmi) byte(sex cqah001) float ldlformattedresult int sbp_mean float medhyp byte(intakta kvarvarande)
    30 0 4 53.1 0 1 24.8 0 4 2.7 113 0 14 28
     0 0 4   62 0 1 22.5 1 3 3.2 114 0  9 30
     0 0 4 51.4 0 1   26 0 5 2.2 120 0 17 30
     0 0 4 63.3 2 1 34.9 1 3   6 114 0 16 31
     . 0 4 57.5 1 1 28.5 0 1 6.9 132 1 13 32
    end

    I have also conducted a quantile regression to make inferences on the 75th, 85th and 95th percentiles, and the results are statistically significant.

    Regards,
    Niko

  • #2
    The means can't be zero! For a variable that is never negative, any positive values make the mean positive.

    Box plots here aren't very helpful in my view. Quantile plots would work better, but we need to use a transformed scale that can accommodate zeros. log1p(), which computes log(1 + x), is a good example; cube roots or square roots are other choices.
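
    As a minimal sketch of those three transformations, assuming a small made-up variable y (hypothetical values, not the thread's data), each one sends 0 to 0 and so keeps the zeros on the plot:

    ```stata
    clear
    input y
    0
    0
    21
    138
    end

    * log1p(y) is ln(1 + y), so it is defined at y = 0
    gen y_log1p = log1p(y)
    * square and cube roots are also defined at 0 for non-negative y
    gen y_sqrt = sqrt(y)
    gen y_cbrt = y^(1/3)

    list, noobs
    ```

    Which scale reads best depends on how long the upper tail is; log1p() pulls in large values most strongly of the three.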

    mylabels is from the Stata Journal.

    Supplying a good y axis title would help. I don't know what CACS_Tot means or what its units are. Give it a good variable label.
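
    A minimal sketch of attaching a variable label so that graphs pick it up as the axis title. The data and the label text here are placeholders (assumptions for illustration); substitute the real meaning and units of your variable:

    ```stata
    clear
    set obs 5
    gen y = _n - 1
    gen toshow = log1p(y)

    * placeholder label text: replace with the real name and units
    label variable toshow "Outcome name (units)"

    * the y axis title now defaults to the variable label
    scatter toshow y
    ```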

    Code:
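    clear
    set obs 200
    gen parod1 = _n > 100
    
    * fake data: many zeros in each group, squared random integers otherwise
    bysort parod1 : gen wanted = 0 if _n < cond(parod1 == 0, 60, 70)
    set seed 2803
    replace wanted = runiformint(1, 50)^2 if wanted == .
    
    * log1p() maps 0 to 0, so the zeros stay on the plot
    gen toshow = log1p(wanted)
    * plotting positions (cumulative probabilities) within each group
    bysort parod1 (wanted) : gen pp = (_n - 0.5) / _N
    
    * the mean is positive even though most values are 0
    su wanted
    
    * 95th, 85th and 75th percentiles of the transformed variable, by group
    foreach pp in 95 85 75 {
        by parod1 : egen pc`pp' = pctile(toshow), p(`pp')
    }
    
    * axis labels in original units, placed at their log1p() positions
    mylabels 0 10 20 50 100 200 500 1000 2000, myscale(log1p(@)) local(yla)
    
    * quantile plot by group, with horizontal lines at the three percentiles
    scatter toshow pp, by(parod1, note("")) xla(0 0.25 "0.25" 0.5 "0.5"  0.75 "0.75" 1) ///
    xtitle(Cumulative probability) xli(0.75 0.85 0.95) yla(`yla') || line pc?? pp, ///
    legend(order(2 "95%" 3 "85%" 4 "75%"))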

    [Attached image: q758595.png]
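
    To go with the graph, here is a hedged sketch of listing the same three percentiles numerically, by group, on the original scale. The data are simulated and the names wanted, p75, p85, p95 and tag are illustrative choices, not anything from the real dataset:

    ```stata
    clear
    set obs 200
    set seed 2803
    gen parod1 = _n > 100

    * hypothetical outcome: about 60% zeros, squared random integers otherwise
    gen wanted = cond(runiform() < 0.6, 0, runiformint(1, 50)^2)

    * group-specific percentiles, stored as constants within each group
    foreach p in 75 85 95 {
        bysort parod1 : egen p`p' = pctile(wanted), p(`p')
    }

    * show one row per group
    egen tag = tag(parod1)
    list parod1 p75 p85 p95 if tag, noobs
    ```

    egen's pctile() is used here because it accepts any percentile, including the 85th.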



    • #3
      Thank you Nick!

      Regards
      Niko
      Last edited by Niko Vahasarja; Yesterday, 00:50.
