Ranking variables (instead of observations)

Sem vanWesten

Join Date: Apr 2016

Posts: 12
#1

28 Apr 2016, 07:35

My data come from a questionnaire that asked respondents to place 20 items on a scale of importance to them. The lower end of the scale contained a "bin" into which respondents could throw any of the 20 items that they found completely unimportant. The result is a dataset with 20 variables (1 for every item). Every variable holds a number between 1 and 100, or 0 if the item was thrown in the bin.
I would like to recode these scores into a ranking of the variables within every respondent, so that every variable holds a number between 1 and 20 showing where that respondent ranked the item.
Example:
Current:

             item1  item2  item3  item4  item5  item6  item7  item8  etc.
respondent1     67     44     29      7      0     99     35     22
respondent2      0     42     69     50     12      0     67    100
etc.

What I would like:

             item1  item2  item3  item4  item5  item6  item7  item8  etc.
respondent1      7      6      4      2      1      8      5      3
respondent2      1      4      7      5      3      1      6      8
etc.

As you can see with respondent2, I would like items that tie to get the same rank and the ranking to then skip a number.
I have found a lot of info on how to rank observations but I have not found out how to rank variables yet. Is there anyone that knows how to do this? I think there should be something like an egen command but I can only find the version for ranking observations.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#2

28 Apr 2016, 07:37

Cross-posted at http://stackoverflow.com/questions/3...tions-in-stata and indeed already answered.

Please note our cross-posting policy at http://www.statalist.org/forums/help#crossposting which is that you should tell us about it.
Sem vanWesten

Join Date: Apr 2016

Posts: 12
#3

28 Apr 2016, 07:47

I was just about to add that I had posted this question there as well. The advice there was just to restructure with -reshape long-. I am looking into that and understand the benefits, but I am still puzzled about how to rank the data within each observation rather than over the whole dataset, and then get the ranks back under the correct original items. For me, at my knowledge level, the question was not yet answered.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#4

28 Apr 2016, 07:55

"They" includes me. I gave you a link on Stack Overflow to an article you can read, which explains how to do it rowwise. As the strong advice is to reshape first, that's secondary.
Sem vanWesten

Join Date: Apr 2016

Posts: 12
#5

29 Apr 2016, 02:17

I just got a perfect answer from ander2ed on stackoverflow:

/* reshape long first */
reshape long item, i(respondant) j(itemNum)

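/* Rank observations within each respondant, accounting for ties */
by respondant (item), sort : gen rank = _n
/* run the tie step under by as well, so it never compares across respondants */
by respondant (item) : replace rank = rank[_n-1] if item == item[_n-1] & _n > 1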

/* reshape back to wide format */
drop item // optional, you can keep and just include in reshape wide
reshape wide rank, i(respondant) j(itemNum)

A small addition about a problem I ran into: I believe that if you decide not to drop item, you have to include it in the reshape wide:

reshape wide rank item, i(respondant) j(itemNum)
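
A minimal sketch for checking this route against the example from post #1. The data are typed in with input from that example. The variable name rank and the by prefix on the replace line are choices made for this sketch; the prefix keeps the tie comparison from reaching back into the previous respondent's rows.

```stata
* example data from post #1 (two respondents, eight items)
clear
input str11 respondent item1 item2 item3 item4 item5 item6 item7 item8
"respondent1" 67 44 29  7  0 99 35  22
"respondent2"  0 42 69 50 12  0 67 100
end

* one row per respondent-item pair
reshape long item, i(respondent) j(itemNum)

* rank within respondent, lowest score first
bysort respondent (item): gen rank = _n

* tied scores take the rank of the first row in the tie; _n counts within respondent here
by respondent (item): replace rank = rank[_n-1] if item == item[_n-1] & _n > 1

* back to one row per respondent, keeping only the ranks
drop item
reshape wide rank, i(respondent) j(itemNum)
list
```

For these two respondents the listing should show 7 6 4 2 1 8 5 3 and 1 4 7 5 3 1 6 8, which is the table asked for in #1.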

Nick Cox

Join Date: Mar 2014
Posts: 35651
#6

29 Apr 2016, 02:45

Evidently you've decided to keep your wide structure.

OK, in that case, as mentioned in http://stackoverflow.com/questions/3...t-observations you could just use rowranks from the Stata Journal (which you must install first: search rowranks and then click on pr0046).

This code adapts an example posted by ander2ed on Stack Overflow.

Code:

* set up fake data
clear *
set obs 2
gen respondent = "respondent" + string(_n)
set seed 123456789
forvalues i = 1/5 {
    gen item`i' = ceil(runiform()*100)
}
replace item2 = item1 if respondent == "respondent2"

* method 1: use rowranks

rowranks item* , gen(rank1-rank5) method(low)

list

     +---------------------------------------------------------------------------------------------+
     |  respondent   item1   item2   item3   item4   item5   rank1   rank2   rank3   rank4   rank5 |
     |---------------------------------------------------------------------------------------------|
  1. | respondent1      35      14      87       8      56       3       2       5       1       4 |
  2. | respondent2      27      27      36      33      88       1       1       4       3       5 |
     +---------------------------------------------------------------------------------------------+


* method 2 (ander2ed): reshape, rank and reshape back

drop rank1-rank5 // clear the method 1 results so that reshape wide can create these names again
reshape long item, i(respondent) j(itemNum)

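/* Rank observations within each respondent, accounting for ties */
by respondent (item), sort : gen rank = _n
/* run the tie step under by as well, so it never compares across respondents */
by respondent (item) : replace rank = rank[_n-1] if item == item[_n-1] & _n > 1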

/* reshape back to wide format */
drop item // optional, you can keep and just include in reshape wide
reshape wide rank, i(respondent) j(itemNum)

list

     +-----------------------------------------------------+
     |  respondent   rank1   rank2   rank3   rank4   rank5 |
     |-----------------------------------------------------|
  1. | respondent1       3       2       5       1       4 |
  2. | respondent2       1       1       4       3       5 |
     +-----------------------------------------------------+

Clearly you can ensure the same results. Note that the egen function rank() gives you scope for ranking in different ways, important when, as in this example, there are ties.

Similarly, rowranks offers different ranking rules. I just used method(low) to ensure compatibility with ander2ed's results.
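
As a sketch of the egen route, assuming the data are in long layout: the track option of rank() gives the lowest value rank 1 and lets tied values share the lower rank, while rank() with no option gives ties the mean of the ranks they span. The values are typed in from the listing above rather than regenerated, and rank_low and rank_mean are names made up for this sketch.

```stata
* values copied from the listing above
clear
input str11 respondent item1 item2 item3 item4 item5
"respondent1" 35 14 87  8 56
"respondent2" 27 27 36 33 88
end

* one row per respondent-item pair
reshape long item, i(respondent) j(itemNum)

* lowest value gets rank 1; tied values share the lower rank
bysort respondent: egen rank_low = rank(item), track

* default rule: tied values get the mean of the ranks they span
bysort respondent: egen rank_mean = rank(item)

sort respondent itemNum
list, sepby(respondent)
```

For respondent2, the tied items 1 and 2 should get rank_low 1 and rank_mean 1.5. Staying in long layout also means no second reshape is needed.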

The trade-off here is delicate:

1. The main merit of reshape long is that you get a better data structure for most Stata purposes. That is, from that point of view, you should not reshape back!

2. But if for at least some purposes you really want the wide structure, it is manifestly not necessary at all to go the reshape - rank - reshape route as rowranks gives you the ranks from one command.

Last edited by Nick Cox; 29 Apr 2016, 03:36.


Sem vanWesten

Join Date: Apr 2016

Posts: 12
#7

01 May 2016, 00:48

Thanks for the extra info, Nick. I do need the wide structure because of the rest of my dataset, which has 700 variables in total; I should have mentioned that in the original post. I already tried the "double reshape" way, but I still have a few more ranking lists to make, so I will definitely look into the rowranks package more closely. I had read about it in your article, but it was too technical for me to get a good picture of how to use it in practice. The example you posted here makes it a lot clearer. Thanks for that!
Nick Cox

Join Date: Mar 2014

Posts: 35651
#8

01 May 2016, 04:28

My papers aren't usually described as too technical!

In this case, I perhaps should have emphasised that with the formal publication of rowranks there was necessarily a help file too, and the help file contains a simple example.
