Help Creating Equal Deciles for a Skewed Subset of Data in Stata

Adrianna Ahmad

Join Date: May 2024

Posts: 5
#1

07 May 2024, 23:25

I'm working with a dataset where I need to create decile groups for a specific variable, "totaldose_ach_sum", which represents medication doses.
Notably, the majority of the observations in my sample have a dose of 0. [summary output for totaldose_ach_sum > 0 not shown]

I want to create deciles only for observations where totaldose_ach_sum is greater than 0.

Here is the approach I've tried using the xtile function:

* xtile creates decile itself; a prior "gen decile = ." makes it fail with "variable decile already defined"
xtile decile = totaldose_ach_sum if totaldose_ach_sum > 0, nq(10)
replace decile = 0 if totaldose_ach_sum == 0

This method does not seem to yield an equal distribution across the deciles. [output not shown]

As an alternative, I tried using a ranking approach:

gen decile_score = .
count if totaldose_ach_sum > 0
local total_n = r(N)
local n_per_decile = ceil(`total_n'/10)
egen rank = rank(totaldose_ach_sum) if totaldose_ach_sum > 0, field
forval i = 1/10 {
    local lower = (`i' - 1) * `n_per_decile' + 1
    local upper = `i' * `n_per_decile'
    replace decile_score = `i' if rank >= `lower' & rank <= `upper'
}
replace decile_score = 0 if totaldose_ach_sum == 0

Unfortunately, the distribution remains uneven. [output not shown]

Could anyone suggest adjustments or alternative methods to achieve an equal distribution of deciles for this subset of data?

Thank you for your help
Nick Cox

Join Date: Mar 2014

Posts: 35754
#2

08 May 2024, 03:05

Equal frequencies can't be achieved here by any stretch. Whatever you do, your biggest bin has 87% of the values. The constraint is simple: observations with the same value must be assigned to the same bin.
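
A minimal sketch of that constraint, using made-up data rather than the poster's. The variable name dose and the split of 87 tied values against 13 distinct ones are assumptions chosen only to mirror the 87% figure:

```stata
* Illustrative data only: 100 positive doses, 87 of them tied at the value 1
clear
set obs 100
generate dose = cond(_n <= 87, 1, _n)

* xtile keeps tied values together, so the bins cannot have equal frequencies
xtile decile = dose, nq(10)
tabulate decile

* Check that every distinct dose value falls in exactly one bin
bysort dose (decile): assert decile[1] == decile[_N]
```

In this sketch the tabulation should show the 87 tied observations in a single bin and several deciles with no observations at all. Breaking the ties arbitrarily would even out the counts, but only by putting identical doses into different groups.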

What is the point of binning anyway? See references in https://www.stata-journal.com/articl...article=dm0095 for why binning of this kind is usually a bad idea.

Detail: xtile is a command, not a function.
Adrianna Ahmad

Join Date: May 2024

Posts: 5
#3

09 May 2024, 19:09

Thanks for the explanation, Nick.