Ranking

Katherine Oleas

Join Date: Aug 2021

Posts: 80
#1

Ranking

29 Nov 2023, 13:19

Hello everyone. I have a dataset with a grade variable for university entrance, and based on that variable I want to create a grade ranking variable at the general, provincial and cantonal levels. Does anyone know what code I should use?
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

29 Nov 2023, 13:52

You have left a great deal unspecified here. Suffice it to say that the solution to your problem will likely involve some application of the -egen- -rank()- function. So I'll refer you to -help egen-; scroll down to the -rank()- function and read up on its various options and what they do. The specific way you will use it will depend on how you want the ranks computed, and also on how provinces and cantons are identified in your data set.
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#3

30 Nov 2023, 10:28

Sorry, I think I need to explain more. I have a variable called grade_uni that contains the grades for university entrance, and I want to create percentiles to know which centile each observation falls in. I want to do this at the general level, and also by province and by canton, for which I have the identifiers prov_id and canton_id, respectively. But I don't know what code I should use.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#4

30 Nov 2023, 11:44

I'm sorry, but I still don't understand your starting point, nor your goal here.

I don't know if you are starting with one observation per university, or if you have student-level data and one observation per student. If the latter, do you first want to in some way aggregate the student-level data to come up with some "average" value of grade_uni for each university? Or do you want to do the percentiles for the student-level observations?

Also, in #1 you spoke of ranking, but now you speak of percentiles. These are related, but different things, and now I am unclear about which you want. While you can, of course, calculate percentiles once you have ranks, in Stata you can directly calculate the percentiles without first getting the ranks. So do you actually need the ranks, or do you just need the percentiles?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

30 Nov 2023, 12:06

I think Katherine is talking about data from Ecuador. I don't know anything about education in Ecuador, but I do guess that in general grade can be anything from an ordered letter grade such as F, E, D, C, B, A through a percent (mark out of 100) to something else.

Percentiles won't work well even if in principle there are 100 distinct possibilities, because the distribution of marks is likely to be lumpy.

https://www.stata.com/support/faqs/s...ing-positions/ may help here.

But I agree with Clyde Schechter: we need more detail to give better advice.
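
A minimal sketch of the rank-based alternative mentioned in #4 and #5, assuming one observation per student and the variable names grade_uni and prov_id given in #3. The new variable names (overall_rank, overall_n, overall_pctpos and the prov_ versions) are hypothetical, and the (rank - 0.5)/n formula is just one common plotting-position convention, not necessarily the one anybody here would choose:

```stata
* Overall: average ranks for ties (the egen rank() default), lowest grade = rank 1
egen overall_rank = rank(grade_uni)
egen overall_n    = count(grade_uni)        // number of non-missing grades
gen  overall_pctpos = 100 * (overall_rank - 0.5) / overall_n

* Same calculation within each province
bysort prov_id: egen prov_rank = rank(grade_uni)
bysort prov_id: egen prov_n    = count(grade_uni)
gen  prov_pctpos = 100 * (prov_rank - 0.5) / prov_n
```

Tied grades share the same position, so lumpy marks give lumpy positions rather than arbitrary splits across bins. Observations with missing grade_uni get missing results.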
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#6

01 Dec 2023, 07:50

Sorry for the confusion. What I have are the grades for each student, and I want to see which percentile each one is in.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#7

01 Dec 2023, 11:18

So, assuming one observation per student in the data set:

Code:

xtile overall_percentile = grade_uni, nq(100)
by prov_id, sort: egen province_percentile = xtile(grade_uni), nq(100)
by canton_id, sort: egen canton_percentile = xtile(grade_uni), nq(100)

Notes:
Read what Nick Cox said in #5 and take it to heart. This approach can only give usable results if the grades themselves are sufficiently fine-grained that you can actually break them down into 100 subgroups. So, for example, this would work well with USA SAT scores, which range from 400 to 1600 in increments of 10. But it will fail abysmally with a five category letter grade, and probably produce unusable results even with a 1-100 grade range because there will be too many ties in the distribution coupled with too many uninstantiated levels. If your data are not sufficiently fine-grained, change nq() to a lower number and settle for a smaller number of quantiles that are workable with the data.

-xtile()- is not an official Stata -egen- function, but is part of the -egenmore- suite which you can install from SSC. There are other useful functions in that suite as well. If you are perhaps working from a computer installation that does not permit you to install these additional Stata ados, post back and I will show you code that works only with native Stata commands and functions.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#8

01 Dec 2023, 11:43

https://www.stata.com/support/faqs/s...ting-positions may help here.

Despite repeated requests, you have yet to show us what your data look like.
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#9

01 Dec 2023, 12:17

Thank you for the help. I can't show you data from the database because it is confidential and I only have access to institutional computers. Because of this, I can't install packages like those suggested by Clyde. To clarify a little, my general grade variable ranges from 0 to 10, which is Ecuador's grading system. To create the percentile variable I used the command: xtile ranking_general= unit, n(10). This code helped me create what I needed. Now I need to create something similar, but by province.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#10

01 Dec 2023, 13:19

Code:

levelsof prov_id, local(provinces)
gen province_decile = .
foreach p of local provinces {
    xtile pd = grade_uni if prov_id == `p', nq(10)
    replace province_decile = pd if prov_id == `p'
    drop pd
}

Note: Assumes prov_id is a numeric variable. If it is a string variable, replace `p' in the above by `"`p'"' in the -xtile- and -replace- commands.

As for not showing example data, confidentiality would not usually be a barrier. What is most important is not the values of the variables but the data organization. So you could replace any student_id variable by sequential numbers starting at 1, and replace the values of grade_uni by random numbers having the same range of values as the range of the real grade_uni, and then show that modified data set by using the -dataex- command.

In this particular case, your problem was solvable without the example data. But it would have gone faster with it. And in most situations, asking for code with no example data shown is a lost cause.
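
The anonymization described above might be sketched as follows. This is only an illustration under stated assumptions: fake_id and fake_grade are hypothetical names, the 0 to 10 range comes from #9, and the data set is assumed to have at least 20 observations:

```stata
preserve                                   // work on a temporary copy of the data
gen long   fake_id    = _n                 // sequential ID replacing any real student ID
gen double fake_grade = runiform(0, 10) if !missing(grade_uni)   // random, same range, same missingness
dataex fake_id prov_id canton_id fake_grade in 1/20
restore                                    // real data are left unchanged
```

Only the -dataex- output would be posted. If prov_id and canton_id are themselves considered sensitive, they could be recoded in the same way before running -dataex-.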
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#11

01 Dec 2023, 13:37

Thank you very much. The code helped us a lot. 😊🎊
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#12

01 Dec 2023, 13:50

I tried to replicate the code at the canton level and I get a "no observations" error. The canton variable, i_cantones, is also numeric.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#13

01 Dec 2023, 14:29

Well, "no observations" means just what it says. There must be some canton_id(s) for which all values of grade_uni are missing, so no quantiles can be calculated.

You can identify that (those) canton(s) by running:

Code:

tabstat grade_uni, statistic(count) by(canton_id)

Find the cantons in the resulting table that show a count of 0 for grade_uni.

Then you have to figure out why all the grade_uni values are missing in that (those) canton(s). It may mean that your data set is incorrect, in which case you need to go back and re-create that data set, fixing whatever errors were made making the original version.

If, however, there is a good reason why those cantons' values of grade_uni are all missing, and this is normal and expected, then you can program around them as follows:

Code:

levelsof canton_id, local(cantons)
gen canton_decile = .
foreach c of local cantons {
    capture xtile cd = grade_uni if canton_id == `c', nq(10)
    if c(rc) == 0 { // NORMAL SITUATION--COMPUTE DECILES
        replace canton_decile = cd if canton_id == `c'
        drop cd
    }
    else if inlist(c(rc), 2000, 2001) { // NO OR TOO FEW OBSERVATIONS; SKIP
        continue
    }
    else { // UNEXPECTED ERROR; BREAK WITH ERROR MESSAGE
        display "Unexpected error: canton_id == `c'"
        error `c(rc)'
    }
}

This code attempts to calculate the deciles for each canton. Where successful, it saves the results in variable canton_decile, as before. If it encounters an error condition due to no, or insufficient, observations, it just skips this canton and moves on to the next. If it encounters any other kind of error when attempting to compute the deciles, it halts execution with an error message and identifies the offending canton_id.
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#14

04 Dec 2023, 09:42

Clyde, thank you very much for your help.
Comment
Katherine Oleas

Join Date: Aug 2021

Posts: 80
#15

15 Dec 2023, 12:31

Good afternoon. I'm using the code you recommended to get the results by canton. Now I want to do the same thing at the school level (the school variable is numeric with value labels) and I get this message:
nquantiles() must be less than or equal to number of observations plus one

Last edited by Katherine Oleas; 15 Dec 2023, 12:36.
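
The message in #15 says that -xtile- needs nquantiles() to be at most the number of observations plus one, so with nq(10) a school needs at least 9 non-missing grades. One possible way around it, offered as a sketch only: count the usable grades per school first and compute deciles only where there are enough. The names school_id, n_grades, tmp_decile and school_decile are hypothetical, and school_id is assumed to be numeric as described in #15:

```stata
* Number of non-missing grades in each school
bysort school_id: egen n_grades = count(grade_uni)

* Loop only over schools with enough observations for nq(10)
levelsof school_id if n_grades >= 9, local(schools)
gen school_decile = .
foreach s of local schools {
    xtile tmp_decile = grade_uni if school_id == `s', nq(10)
    replace school_decile = tmp_decile if school_id == `s'
    drop tmp_decile
}
```

Schools with fewer than 9 grades are left with missing school_decile. Whether deciles within such small schools would be meaningful anyway is a separate question, in line with the caution about nq() in #7.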
Comment
