Ranking - Statalist

Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#16

15 Dec 2023, 14:22

Well, you do not show the actual exact code you have used, so nobody can be certain that there isn't an error in the code somewhere. But my best guess is that you are getting this error message because of a data problem. I suspect that there is some school with fewer than 8 observations: without more observations you cannot divide these into 10 deciles. You either need to get more data, or choose coarser quantiles for this analysis.
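
One way to check this guess is to count the non-missing grades in each school and list the schools that fall short. This is only a sketch: n_notas and una_vez are hypothetical variable names, and the cutoff of 9 assumes deciles, since xtile requires nquantiles() to be at most the number of observations plus one.

Code:

* Non-missing grades per school (egen's count() ignores missing values)
bysort colegio: egen n_notas = count(nota_general)

* Tag one observation per school so that each school is listed once
egen byte una_vez = tag(colegio)

* Schools with too few grades for nq(10)
list colegio n_notas if una_vez & n_notas < 9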
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#17

15 Dec 2023, 14:37

So how do I eliminate schools that have fewer than 8 observations?

levelsof colegio, local(colegios)
gen ranking_cole_alum = .
foreach p of local colegios {
    xtile pd = nota_general if colegio == `p', nq(100)
    replace ranking_cole_alum = pd if colegio == `p'
    drop pd
}

Last edited by Katherine Oleas; 15 Dec 2023, 14:43.

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

#18

15 Dec 2023, 14:51

Code:

levelsof colegio, local(colegios)
gen ranking_cole_alum = .
foreach p of local colegios {
    quietly count if !missing(nota_general)
    if r(N) > 8 {
        xtile pd = nota_general if colegio == `p', nq(100)
        replace ranking_cole_alum = pd if colegio == `p'
        drop pd
    }
}


Katherine Oleas

Join Date: Aug 2021

Posts: 80
#19

20 Dec 2023, 09:49

Good morning, I already used the code you recommended, but I keep getting this error:
nquantiles() must be less than or equal to number of observations plus one

Nick Cox

Join Date: Mar 2014
Posts: 35699

#20

20 Dec 2023, 10:17

You're asking for 100 bins when there are fewer observations than that to do it with. The request makes very limited sense since some (if not most) of the bins are doomed to be empty. Stata won't even try.
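
The limit can be seen on a small example. The sketch below uses the foreign cars in the auto dataset, which number 22; bin100 and bin4 are hypothetical variable names. The first xtile call asks for more bins than 22 + 1 and is expected to stop with the nquantiles() error, while the second stays within the limit.

Code:

sysuse auto, clear
keep if foreign

* 100 bins from 22 observations: capture the error instead of halting
capture noisily xtile bin100 = mpg, nq(100)
display "return code: " _rc

* 4 bins from 22 observations is within the limit
xtile bin4 = mpg, nq(4)
tabulate bin4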

I wonder whether you're confusing quantile bins with percentile rank. Let's rank foreign cars in the auto dataset on their mpg:

Code:

. sysuse auto, clear
(1978 automobile data)

. keep if foreign
(52 observations deleted)

. sort mpg

. l make mpg

     +----------------------+
     | make             mpg |
     |----------------------|
  1. | Peugeot 604       14 |
  2. | Audi 5000         17 |
  3. | Volvo 260         17 |
  4. | Toyota Corona     18 |
  5. | Toyota Celica     18 |
     |----------------------|
  6. | Fiat Strada       21 |
  7. | Datsun 810        21 |
  8. | Audi Fox          23 |
  9. | Datsun 200        23 |
 10. | VW Dasher         23 |
     |----------------------|
 11. | Datsun 510        24 |
 12. | BMW 320i          25 |
 13. | Honda Accord      25 |
 14. | VW Rabbit         25 |
 15. | VW Scirocco       25 |
     |----------------------|
 16. | Renault Le Car    26 |
 17. | Honda Civic       28 |
 18. | Mazda GLC         30 |
 19. | Toyota Corolla    31 |
 20. | Datsun 210        35 |
     |----------------------|
 21. | Subaru            35 |
 22. | VW Diesel         41 |
     +----------------------+

There are only 22 such cars. What can be defended is an estimate of percentile rank. For more detail, see https://www.stata.com/support/faqs/s...ting-positions

As the FAQ explains, there are different recipes. For most purposes I like best 100 * (rank - 0.5) / sample size. That splits the difference between 100 * rank / sample size, which produces 100/n to 100, and 100 (rank - 1)/sample size, which produces 0 to 100 - 100/n.

Code:

. egen rank = rank(mpg)

. gen pcrank = 100 * (rank - 0.5) / 22

. l make mpg rank pcrank

     +----------------------------------------+
     | make             mpg   rank     pcrank |
     |----------------------------------------|
  1. | Peugeot 604       14      1   2.272727 |
  2. | Audi 5000         17    2.5   9.090909 |
  3. | Volvo 260         17    2.5   9.090909 |
  4. | Toyota Corona     18    4.5   18.18182 |
  5. | Toyota Celica     18    4.5   18.18182 |
     |----------------------------------------|
  6. | Fiat Strada       21    6.5   27.27273 |
  7. | Datsun 810        21    6.5   27.27273 |
  8. | Audi Fox          23      9   38.63636 |
  9. | Datsun 200        23      9   38.63636 |
 10. | VW Dasher         23      9   38.63636 |
     |----------------------------------------|
 11. | Datsun 510        24     11   47.72727 |
 12. | BMW 320i          25   13.5   59.09091 |
 13. | Honda Accord      25   13.5   59.09091 |
 14. | VW Rabbit         25   13.5   59.09091 |
 15. | VW Scirocco       25   13.5   59.09091 |
     |----------------------------------------|
 16. | Renault Le Car    26     16   70.45454 |
 17. | Honda Civic       28     17         75 |
 18. | Mazda GLC         30     18   79.54546 |
 19. | Toyota Corolla    31     19   84.09091 |
 20. | Datsun 210        35   20.5   90.90909 |
     |----------------------------------------|
 21. | Subaru            35   20.5   90.90909 |
 22. | VW Diesel         41     22   97.72727 |
     +----------------------------------------+

Now tied values must receive the same rank, and so the same percentile rank. In this toy dataset, only 13 bins could possibly be populated because of ties. (groups is from the Stata Journal, and just a convenient tool to illustrate the point.)

Code:

 
. groups mpg rank pcrank

  +-----------------------------------------+
  | mpg   rank     pcrank   Freq.   Percent |
  |-----------------------------------------|
  |  14      1   2.272727       1      4.55 |
  |  17    2.5   9.090909       2      9.09 |
  |  18    4.5   18.18182       2      9.09 |
  |  21    6.5   27.27273       2      9.09 |
  |  23      9   38.63636       3     13.64 |
  |-----------------------------------------|
  |  24     11   47.72727       1      4.55 |
  |  25   13.5   59.09091       4     18.18 |
  |  26     16   70.45454       1      4.55 |
  |  28     17         75       1      4.55 |
  |  30     18   79.54546       1      4.55 |
  |-----------------------------------------|
  |  31     19   84.09091       1      4.55 |
  |  35   20.5   90.90909       2      9.09 |
  |  41     22   97.72727       1      4.55 |
  +-----------------------------------------+
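
The same recipe can be applied school by school to the grades data discussed in this thread. This is a minimal sketch, assuming the variables colegio and nota_general from the earlier posts; rank_nota, n_nota and pcrank_nota are hypothetical names.

Code:

* Rank grades within each school; tied grades share the same rank
bysort colegio: egen rank_nota = rank(nota_general)

* Number of non-missing grades in each school
bysort colegio: egen n_nota = count(nota_general)

* Percentile rank within school: 100 * (rank - 0.5) / sample size
gen pcrank_nota = 100 * (rank_nota - 0.5) / n_nota

Unlike xtile, this needs no minimum number of observations per school, although a percentile rank based on a handful of students is still a rough estimate.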


Katherine Oleas

Join Date: Aug 2021

Posts: 80
#21

20 Dec 2023, 14:42

I have 3770 schools, and I want to get the deciles for each school. I want to understand the logic of how to use the command so that I don't get the error I mentioned: levelsof colegio, local (colegies) gen ranking_cole_alum=• foreach p of local colegios{ quietly count if !missing(nota_general) if r(N) > 8 { xtile pd = nota_general it colegio == `p', nq(10) replace ranking_cole_alum = pd if colegio== `p' drop pd } }
Nick Cox

Join Date: Mar 2014

Posts: 35699
#22

20 Dec 2023, 15:00

We're evidently at cross-purposes here. You're not engaging with my attempt at explaining why xtile won't play when you ask for 100 bins -- and also changing the question.

But the count in #21 is barely relevant to your problem. You are counting how many observations are !missing(nota_general) but the number of such observations over all 3770 schools is likely to be enormously greater than 8. The fact that the command is inside your loop over schools doesn't itself force the code to look at each school in turn.

Other way round, 9 or 10 observations would be insufficient for binning into deciles to work.

What is important for each application of xtile is how many are not missing in each colegio.

Code:

it colegio == `p'

is presumably a typo for

Code:

if colegio == `p'

There are other typos in your code which may be contributing to your problems if they occur in your real code.

To make your code readable, please use CODE delimiters and respect Stata's separation of commands.

Quantile binning is popular in some social science fields, but highly problematic with small samples and/or frequent ties.
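
Putting the points above together, a loop that counts within each school before calling xtile might look like the sketch below. It assumes colegio is numeric (a string identifier would need "`p'" in the comparisons), and minobs is a hypothetical local. The value 9 is only the bare minimum implied by the error message for nq(10); a larger cutoff may be more defensible.

Code:

local minobs 9
levelsof colegio, local(colegios)
gen ranking_cole_alum = .
foreach p of local colegios {
    * Count non-missing grades in this school only
    quietly count if colegio == `p' & !missing(nota_general)
    if r(N) >= `minobs' {
        xtile pd = nota_general if colegio == `p', nq(10)
        replace ranking_cole_alum = pd if colegio == `p'
        drop pd
    }
}

Schools below the cutoff are skipped, so their students keep a missing ranking_cole_alum.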
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#23

20 Dec 2023, 21:33

My apologies for the mistakes in the previous code. Unfortunately, at the place where I work I cannot access my computer and I have to write from my cell phone, which is why there are errors in the code.
The dataset I am working with has the grade for each student (nota_general). What I am trying to do is get the deciles at the school level (colegio). For example, for school A, which has 500 students, I need the deciles based on their grades; for school B, which has 413 students, I need its deciles too. So the original code I was using was:

Code:

levelsof colegio, local(colegies)
gen ranking_cole_alum = .
foreach p of local colegies {
    xtile pd = nota_general if colegio == `p', nq(10)
    replace ranking_cole_alum = pd if colegio == `p'
    drop pd
}

It was not 100 but 10, it was my typing error.

However, when running the code I get the following error:
nquantiles() must be less than or equal to number of observations plus one

I'm not sure why this error occurs. Then Clyde suggested to me that perhaps the problem could be caused by the number of observations that each school has and suggested I use the following command:

Code:

levelsof colegio, local(colegies)
gen ranking_cole_alum = .
foreach p of local colegies {
    quietly count if !missing(nota_general)
    if r(N) > 8 {
        xtile pd = nota_general if colegio == `p', nq(10)
        replace ranking_cole_alum = pd if colegio == `p'
        drop pd
    }
}

At first it seems that the command works but it reaches a point where I get the error message again:
nquantiles() must be less than or equal to number of observations plus one

So I don't know how to solve it or what could be happening.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#24

20 Dec 2023, 22:22

I think at this point it is unlikely that anyone can help you without example data that reproduces your problem. I understand that it may be difficult or impossible for you to do that given the conditions at the place where you are working. But it is hard to see how your problem might get resolved without that.