  • Help Creating Equal Deciles for a Skewed Subset of Data in Stata

    I'm working with a dataset where I need to create decile groups for a specific variable, "totaldose_ach_sum", representing medication doses.
    Notably, the majority of the observations in my sample have a dose score of 0. This is the summary for totaldose_ach_sum > 0:

    [Image: skewed.png]

    I want to create deciles only for observations where totaldose_ach_sum is greater than 0.

    Here is the approach I've tried using the xtile function:

    * xtile creates decile itself, so the variable must not exist beforehand
    xtile decile = totaldose_ach_sum if totaldose_ach_sum > 0, nq(10)
    replace decile = 0 if totaldose_ach_sum == 0


    This method does not seem to yield an equal distribution across the deciles, as shown below:
    [Image: xtile.png]

    As an alternative, I tried using a ranking approach:

    gen decile_score = .
    * Number of positive doses and the target size of each decile
    count if totaldose_ach_sum > 0
    local total_n = r(N)
    local n_per_decile = ceil(`total_n'/10)
    * field ranks the highest dose as 1; tied doses share the same rank
    egen rank = rank(totaldose_ach_sum) if totaldose_ach_sum > 0, field
    forval i = 1/10 {
        local lower = (`i' - 1) * `n_per_decile' + 1
        local upper = `i' * `n_per_decile'
        replace decile_score = `i' if rank >= `lower' & rank <= `upper'
    }
    replace decile_score = 0 if totaldose_ach_sum == 0

    Unfortunately, the distribution remains uneven, as shown below:
    [Image: rank.png]


    Could anyone suggest adjustments or alternative methods to achieve an equal distribution of deciles for this subset of data?

    Thank you for your help.

  • #2
    Equal frequencies can't be achieved here by any stretch. Whatever you do, your biggest bin has 87% of the values. The constraint is simple: observations with the same value must be assigned to the same bin.
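
    A minimal sketch of that constraint on made-up data, not the poster's dataset. The variable name dose and the split of 87 tied values out of 100 are assumptions chosen only to mirror the 87% figure above:

```stata
* Toy data (hypothetical): 100 positive doses, 87 of them tied at the value 1
clear
set obs 100
generate dose = cond(_n <= 87, 1, _n)

* xtile puts every tied value on the same side of each cutpoint
xtile decile = dose, nq(10)

* The 87 tied observations land in a single bin, and some bins are left empty
tabulate decile
```

    Because most of the quantile cutpoints coincide at the tied value, no choice of cutpoints can spread those observations over several bins.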

    What is the point of binning anyway? See references in https://www.stata-journal.com/articl...article=dm0095 for why binning of this kind is usually a bad idea.

    Detail: xtile is a command, not a function.



    • #3
      Thanks for the explanation, Nick.
