Finding second highest value across variables

He Krau

Join Date: Jul 2023

Posts: 3
#1

02 Jul 2023, 11:44

Hello,

Here's hoping this is not super basic, but I'm severely stuck.

I want to create a dummy that indicates whether the variable v1 has the second highest (or, more generally, the n-th highest) value among the variables v1, v2, v3 and v4 within the same observation. There may be ties between v1, v2 and v3.

Thus far I have tried rank in egen and rowsort, but they do not seem to do what I want to achieve. Where do I go from here?

Thank you so much for your time!

Last edited by He Krau; 02 Jul 2023, 11:58.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

02 Jul 2023, 12:07

How do you want to handle ties? If v1 is tied with one of the others for first place, should we consider v1 to be first place or second? What if v1 is tied with one of the others for second place?
He Krau

Join Date: Jul 2023

Posts: 3
#3

02 Jul 2023, 12:13

If v1 is tied for first place, the dummy should be 0. If v1 is tied for second place, the dummy should be 1.

Thank you very much for your reply!
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

02 Jul 2023, 12:36

OK. As you did not provide example data, I have made a toy data set to demonstrate the approach.

Code:

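// CREATE A DEMONSTRATION DATA SET
clear*
set obs 20
set seed 314159
forvalues i = 1/4 {
    gen v`i' = runiformint(1, 5)
}

// DEMONSTRATE THE APPROACH
gen `c(obs_t)' obs_no = _n
reshape long v, i(obs_no) j(index)
// Within each observation, sort ascending by value; among tied values, v1 (index 1) sorts last
gsort obs_no v -index
// v1 is second highest if it sits in position 3 of 4 and is strictly below position 4
by obs_no: gen byte wanted = (index[3] == 1) & (v[3] != v[4])
reshape wide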

The code also assumes, but does not verify, that the variables v1 through v4 never contain missing values.
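One way to check that assumption before running the code is an -assert- on the original wide-layout variables. This is only a sketch: it assumes the variables are numeric and are named v1 through v4, as in the demonstration data.

```stata
* Count observations in which any of v1-v4 is missing
count if missing(v1, v2, v3, v4)

* Halt with an error unless no observation has a missing value
assert !missing(v1, v2, v3, v4)
```

If the -assert- fails, the missing values need a decision (drop, recode, or treat specially) before either approach in this thread can be trusted.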

In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted their time as you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
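As an illustration only, for the four variables discussed in this thread a -dataex- call might look like the following. The restriction to the first 10 observations is an arbitrary choice for the example.

```stata
* Produce a copy-and-paste-ready data example of v1-v4 for a forum post
dataex v1 v2 v3 v4 in 1/10
```

The output shown in the Results window can then be pasted directly into a post.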

Last edited by Clyde Schechter; 02 Jul 2023, 12:41.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

02 Jul 2023, 14:18

You can count how many of v2, v3 and v4 are higher than v1. That count is 1 if and only if v1 ranks second under the rule in #3: if v1 is tied for first place, the count is 0, and if v1 is tied for second place, the count is still 1.

Code:

gen higher = 0
foreach v in v2 v3 v4 {
    replace higher = higher + (`v' > v1)
}
gen is_second = higher == 1
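The same count extends to the "n-th highest" case raised in #1: under the tie rule in #3, v1 is n-th highest when exactly n - 1 of the other variables are strictly higher. A minimal sketch, in which the local macro n and the variable names n_higher and is_nth are illustrative and not from the thread:

```stata
* Sketch: flag whether v1 is the n-th highest of v1-v4 within each observation.
* Assumes v1-v4 contain no missing values; in Stata a missing numeric value
* compares as larger than any nonmissing number, which would distort the count.
local n = 3    // hypothetical choice: third highest

gen byte n_higher = 0
foreach v in v2 v3 v4 {
    replace n_higher = n_higher + (`v' > v1)
}
gen byte is_nth = (n_higher == `n' - 1)
```

With n set to 2 this reproduces is_second from the code above.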

Last edited by Nick Cox; 02 Jul 2023, 14:22.
He Krau

Join Date: Jul 2023

Posts: 3
#6

05 Jul 2023, 17:29

Thank you so much, your intuitions are quite on point!
-dataex- is noted for next time, if it should come to that.