I'm working with a dataset where I need to create decile groups for a specific variable, "totaldose_ach_sum", representing medication doses.
Notably, the majority of the observations in my sample have a dose score of 0. Below is the summary of totaldose_ach_sum for observations with totaldose_ach_sum > 0:

I want to create deciles only for observations where totaldose_ach_sum is greater than 0.
Here is the approach I tried with the xtile command:
* xtile creates decile itself (missing where the if condition fails), so no gen beforehand
xtile decile = totaldose_ach_sum if totaldose_ach_sum > 0, nq(10)
replace decile = 0 if totaldose_ach_sum == 0
This method does not seem to yield equally sized deciles, as shown below:

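One likely reason for the uneven groups, offered as a hedged sketch: xtile assigns categories by comparing each value with the percentile cutpoints, so all observations that share a dose value land in the same decile. If a single dose value covers more than a tenth of the positive observations, equal-sized deciles are impossible without splitting ties. The toy data below is hypothetical (100 positive doses with only four distinct values), and the helper variable tie_size is illustrative; with real data you would drop the toy-data lines and restrict everything to totaldose_ach_sum > 0.

```stata
* Hypothetical toy data standing in for the positive doses:
* 100 observations but only 4 distinct values, so most are tied.
clear
set obs 100
generate double totaldose_ach_sum = 1
replace totaldose_ach_sum = 2 if _n > 40
replace totaldose_ach_sum = 3 if _n > 55
replace totaldose_ach_sum = 4 if _n > 80

* xtile cuts at the 10th, 20th, ..., 90th percentiles; every observation
* with the same value falls on the same side of each cutpoint,
* so tied values always share a decile.
xtile decile = totaldose_ach_sum, nq(10)
tabulate decile

* Size of the largest block of tied values versus the ideal decile size.
bysort totaldose_ach_sum: generate long tie_size = _N
summarize tie_size, meanonly
display "largest tie block: " r(max) "   ideal decile size: " _N/10
```

If the largest tie block is well above the ideal decile size, no cutpoint-based method (xtile or rank-based) can produce equal groups while keeping identical doses together.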
As an alternative, I tried using a ranking approach:
gen decile_score = .
count if totaldose_ach_sum > 0
local total_n = r(N)
* ceil() gives every group this size, so the last decile(s) can come out smaller
local n_per_decile = ceil(`total_n'/10)
* field: the highest value gets rank 1 (decile 1 = highest doses) and tied values share a rank
egen rank = rank(totaldose_ach_sum) if totaldose_ach_sum > 0, field
forval i = 1/10 {
    local lower = (`i' - 1) * `n_per_decile' + 1
    local upper = `i' * `n_per_decile'
    replace decile_score = `i' if rank >= `lower' & rank <= `upper'
}
replace decile_score = 0 if totaldose_ach_sum == 0
Unfortunately, the distribution remains uneven, as shown below:

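If equal counts matter more than keeping tied doses together, one option (a swapped-in technique, not taken from the attempts above) is to break ties explicitly: sort the positive doses in ascending order with a seeded random tiebreaker, number them 1 to N, and set decile = floor(10 * (position - 1) / N) + 1, which yields groups whose sizes differ by at most one observation. The trade-off is that observations with the same dose can end up in adjacent deciles. The toy data and the variable names tiebreak, pos, rank_pos and decile_eq are hypothetical; note that the sort changes the order of the data.

```stata
* Hypothetical toy data: 20 zeros plus 100 heavily tied positive doses.
clear
set obs 120
generate double totaldose_ach_sum = 0
replace totaldose_ach_sum = 1 if _n > 20
replace totaldose_ach_sum = 2 if _n > 60
replace totaldose_ach_sum = 3 if _n > 95

* Seeded random tiebreaker so the split of tied doses is reproducible
* (for a given initial data order).
set seed 12345
generate double tiebreak = runiform()

* Ascending position among positive doses: 1 = lowest dose.
generate byte pos = totaldose_ach_sum > 0
sort pos totaldose_ach_sum tiebreak
by pos: generate long rank_pos = _n if pos
count if pos
local n_pos = r(N)

* Deciles 1-10 whose sizes differ by at most one; zeros form group 0.
generate byte decile_eq = floor(10 * (rank_pos - 1) / `n_pos') + 1 if pos
replace decile_eq = 0 if totaldose_ach_sum == 0
tabulate decile_eq
```

egen's rank() with the unique option also gives every observation a distinct rank, but it breaks ties arbitrarily; the explicit seeded tiebreaker makes the assignment repeatable across runs.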
Could anyone suggest adjustments or alternative methods to achieve an equal distribution of deciles for this subset of data?
Thank you for your help.