Quintiles by value

Alysson Francisco

Join Date: Jun 2018

Posts: 6
#1


21 Jun 2019, 11:12

Dear Statalist,

I want to divide a specific variable into quintiles, not by the number of observations, which is what the xtile command does, but by the values of the observations.

The code I used is this:

Code:

xtile port_LB = LB_a_1, nq(5)
table port_LB, contents(n LB_a_1 min LB_a_1 max LB_a_1 mean LB_a_1)

These quintiles are formed by the number of observations, as we can see in the first column.
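A minimal sketch of the same check using -tabstat- instead of -table, contents()-, assuming ten gross_profit values copied from the -dataex- listing in #3; the variable name q5 is hypothetical. Because -xtile- cuts at percentiles, each of the five groups here should get two of the ten observations, whatever the values are.

```stata
* Sketch: equal-count groups from -xtile-, summarized with -tabstat-.
* The ten values are copied from the -dataex- example in #3; q5 is a hypothetical name.
clear
input float gross_profit
.04665191
.0776501
.08390924
.1576813
.2029267
.27466056
.3014265
.44742125
.56055456
.6777239
end

* xtile assigns groups by percentile cut points, so group counts are (nearly) equal
xtile q5 = gross_profit, nq(5)

* count, range and mean of gross_profit within each group
tabstat gross_profit, by(q5) statistics(n min max mean)
```

The N column is then constant across groups, while the min and max columns show that the groups cover ranges of very different widths.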

Is there any way to do this by the values of the observations? Does anyone have any idea how to do this?

Thank you!

Last edited by Alysson Francisco; 21 Jun 2019, 11:22.
Clyde Schechter

Join Date: Apr 2014

Posts: 30124
#2

21 Jun 2019, 11:51

Quoting #1: "These quintiles are formed by the number of observations, as we can see in the first column. Is there any way to do this by the values of the observations? Does anyone have any idea how to do this?"

I don't understand what you mean by this. Can you explain, or, better, provide some example data and then show what you want the result to look like.

Alysson Francisco

Join Date: Jun 2018
Posts: 6
#3

21 Jun 2019, 14:52

Hi, Clyde.

This is part of my dataset. I renamed LB_a_1 to gross_profit and port_LB to quintiles.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float gross_profit byte quintiles
  .2029267 3
 .27466056 4
  .0776501 1
  .1576813 2
 .23777103 3
  .3014265 4
 .08390924 1
 .16547313 2
  .2385117 3
  .3221767 4
  .0828101 1
 .16660033 2
  .1750131 2
 .23489387 3
 .06181969 1
 .12803297 1
 .19096455 3
 .25748774 3
.070972264 1
 .14391461 2
  .2282676 3
  .3017316 4
  .3786891 4
 .08248884 1
  .1910788 3
 .16843036 2
 .44742125 5
 .09678757 1
 .19497286 3
  .3208095 4
 .44742125 5
 .12232751 1
  .2221124 3
  .3037178 4
   .389981 5
.065729104 1
  .1168596 1
 .16843036 2
 .28889105 4
 .09105394 1
  .1777167 2
 .22721837 3
 .29732373 4
  .0836201 1
 .15665257 2
 .21167165 3
  .2921291 4
 .07056332 1
 .15794978 2
 .23639095 3
  .2989278 4
 .05041697 1
 .09325901 1
  .1244713 1
  .1809014 3
 .04665191 1
 .13138473 1
  .2094828 3
 .27780348 4
 .08002219 1
  .1402248 2
   .210746 3
 .28453812 4
 .06527575 1
 .12133432 1
 .18889205 3
  .1666926 2
 .24155883 3
 .07321396 1
 .15547764 2
   .227936 3
  .3193174 4
 .07096003 1
 .14997013 2
  .2033346 3
 .27716222 4
 .06871334 1
 .14448956 2
 .19901946 3
 .18628158 3
 .07198721 1
 .14604652 2
  .2338707 3
  .3221313 4
 .07961577 1
 .16129784 2
  .2428835 3
  .3909354 5
 .56055456 5
 .17199475 2
  .3452226 4
 .50433874 5
  .6777239 5
 .18824925 3
   .364068 4
   .459493 5
  .5533788 5
 .14893661 2
 .28366256 4
  .4243927 5
end

If I run the xtile command, these are the results:

[Image: stata1.png]

Here N(gross_profit) gives the number of observations in the dataset for each quintile. So the quintiles are based on the number of observations: the xtile command takes the total number of observations and divides it by 5 to generate the quintiles, and I do not want this.

I would like to generate quintiles based on the values of gross_profit.

For example, if the lowest value of gross_profit is 0.04 and the highest is 1.85, I would like to form quintiles between them based on the values of the gross_profit variable. So, if the first quintile ends at gross_profit = 0.2, the observations between 0.04 and 0.2 form the first quintile, no matter how many observations fall there.

The point is to generate the quintiles without specifying the cut-off values by hand, because there are a lot of variables to do this for.
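One possible reading of this request is five intervals of equal width between the minimum and the maximum. This is only a guess at what is wanted, and as #4 and #5 point out, such groups are not quintiles. Under that assumption, a minimal sketch needs only -summarize- and -generate-, with no cut-offs typed by hand, so the loop can run over many variables. The *_bin names are hypothetical, and the ten values are again copied from #3.

```stata
* Sketch: five equal-WIDTH intervals between min and max (these are not quintiles).
* The ten values are copied from #3; the *_bin variable names are hypothetical.
clear
input float gross_profit
.04665191
.0776501
.08390924
.1576813
.2029267
.27466056
.3014265
.44742125
.56055456
.6777239
end

* loop so the same rule can be applied to many variables
foreach v of varlist gross_profit {
    * -summarize, meanonly- leaves r(min) and r(max) behind
    summarize `v', meanonly
    * width is (max - min)/5; bin k covers [min + (k-1)*width, min + k*width)
    generate byte `v'_bin = floor(5 * (`v' - r(min)) / (r(max) - r(min))) + 1 if !missing(`v')
    * the maximum itself computes to bin 6, so close the last interval by folding it into bin 5
    replace `v'_bin = 5 if `v'_bin == 6
}

tabstat gross_profit, by(gross_profit_bin) statistics(n min max)
```

With these ten values the groups hold 4, 2, 1, 1 and 2 observations, which is the "no matter how many observations fall there" behaviour described above.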

Thank you for any help!

Last edited by Alysson Francisco; 21 Jun 2019, 15:01.


Clyde Schechter

Join Date: Apr 2014

Posts: 30124
#4

21 Jun 2019, 16:08

Quoting #3: "So, if the first quintile ends at gross_profit = 0.2, for example, the observations between 0.04 and 0.2 form the first quintile, no matter how many observations fall there."

OK, but where did that number 0.2 come from? How did you derive that number from the data? And what would be the upper ends of the other 4 groups? How are they calculated from the data?

By the way, whatever the answers to the above questions, you should not refer to these groups as quintiles. The word quintile specifically means five groups containing equal numbers of members (or as equal as possible given ties) based on the ordering.
Nick Cox

Join Date: Mar 2014

Posts: 35734
#5

21 Jun 2019, 23:05

I share most of Clyde's puzzlement, not least over terminology: whatever you want, they can't be quintile-based bins or classes.

However, I find it a little easier to guess what you may be seeking, as similar-seeming questions arise here once every few years. These threads and references may help.

https://www.stata.com/statalist/arch.../msg00883.html

The thread starting with https://www.stata.com/statalist/arch.../msg00480.html

https://www.statalist.org/forums/for...g-jenks-splits

See also cluster kmeans and the community-contributed command group1d:

Code:

ssc install group1d
help group1d
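A minimal sketch of the -cluster kmeans- route, again assuming the ten values copied from #3 and a hypothetical grouping variable km5. It does not show group1d, whose syntax does not appear in this thread. k-means with its default L2 (Euclidean) measure groups values around five centres instead of cutting at fixed counts or fixed widths. Its result can depend on the random starting centres, so a seed is fixed in start() here.

```stata
* Sketch: one-dimensional k-means grouping; km5 is a hypothetical name.
* The ten values are copied from #3.
clear
input float gross_profit
.04665191
.0776501
.08390924
.1576813
.2029267
.27466056
.3014265
.44742125
.56055456
.6777239
end

* five groups; krandom(2019) fixes the randomly chosen starting observations
* name(km5) also becomes the name of the new grouping variable
cluster kmeans gross_profit, k(5) name(km5) start(krandom(2019))

* group numbers are arbitrary labels and need not follow the ordering of values
tabstat gross_profit, by(km5) statistics(n min max mean)
```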

Despite some small creative pride in the program group1d (creation here being translation rather than origination), which goes back to Hartigan's wonderful 1975 book Clustering Algorithms, I am sceptical that the so-called natural breaks people seek are usually anything more than arbitrary, unrepeatable small gaps in a continuum.

I have it in mind to extend the program to include L_1 as well as L_2 as a criterion.
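For readers who want the two criteria spelled out, here is my reading, which the thread does not state. An L_2 criterion chooses k groups of the values to minimize the within-group sum of squared deviations from each group mean, which is what k-means minimizes and presumably what group1d uses now. An L_1 version would minimize the sum of absolute deviations, most naturally from each group median:

$$
\mathrm{L}_2:\ \min_{G_1,\ldots,G_k}\ \sum_{g=1}^{k}\sum_{i\in G_g}\left(x_i-\bar{x}_g\right)^2,
\qquad
\mathrm{L}_1:\ \min_{G_1,\ldots,G_k}\ \sum_{g=1}^{k}\sum_{i\in G_g}\left|x_i-m_g\right|
$$

Here $x_i$ are the values, $G_g$ is the $g$-th group, $\bar{x}_g$ its mean and $m_g$ its median. For a fixed group, the mean minimizes the sum of squared deviations and the median minimizes the sum of absolute deviations, which is why each criterion pairs with its own centre.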

With the kind of data in #3, I would be more likely to work with log profit than with profit.
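A minimal sketch of that suggestion, assuming gross_profit is strictly positive (ln() returns missing for zero or negative values). It uses the same ten values copied from #3, and log_profit and log_bin are hypothetical names. Equal-width intervals on the log scale correspond to equal ratios on the original scale, so in this example the low values are spread over more groups than with raw profit.

```stata
* Sketch: equal-width intervals on log profit; log_profit and log_bin are hypothetical names.
* The ten values are copied from #3 and are all positive.
clear
input float gross_profit
.04665191
.0776501
.08390924
.1576813
.2029267
.27466056
.3014265
.44742125
.56055456
.6777239
end

* natural log; zero or negative profit would become missing here
generate double log_profit = ln(gross_profit)

* five equal-width intervals on the log scale (equal ratios on the raw scale)
summarize log_profit, meanonly
generate byte log_bin = floor(5 * (log_profit - r(min)) / (r(max) - r(min))) + 1 if !missing(log_profit)
* the maximum computes to bin 6; fold it into bin 5
replace log_bin = 5 if log_bin == 6

tabstat gross_profit, by(log_bin) statistics(n min max)
```

Here the groups hold 2, 1, 2, 2 and 3 observations, compared with 4, 2, 1, 1 and 2 for equal widths on the raw scale.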
Alysson Francisco

Join Date: Jun 2018

Posts: 6
#6

06 Jul 2019, 13:12

Thank you, Clyde and Nick, for your answers. They really helped me to solve my problem.

Thank you again!
