Comparing Observations in Twos

Mengxiao Liu

Join Date: Jun 2020

Posts: 6
#1


31 Aug 2022, 07:42

Hello Stata,

I have 38,420 observations and 12 variables. Each variable is a numerical attribute of an observation, and higher values are better. I want to select observations based on these variables by first throwing out dominated observations. That is, if observation 2 is lower than observation 1 on all attributes, I delete observation 2. Given the number of observations in my data, and assuming that I compare two observations at a time, I will be doing 15 rounds of comparisons, or 38,419 comparisons in total.

Is there any Stata command or package that can help me do it? If not, do you have any recommendations on the algorithm I should use?

Thank you so much.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1369
#2

31 Aug 2022, 10:10

Please provide an extract of your data using -dataex-. See also the Statalist FAQ (esp. #12).
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

31 Aug 2022, 10:51

It is not clear why you think that at most 38,419 comparisons will need to be made.

I can reproduce that number by assuming that in the first round, you compare 19,210 pairs of observations and drop from each pair one observation that is dominated, and proceed forward with 9,605 pairs of observations, again dropping one from each pair, and so on.

But there is nothing that guarantees that in any comparison one of the two observations will dominate the other.

Suppose there are 2 variables x and y, and it turns out that in every observation, x+y=1, or stated differently, y=1-x. Then in any pair of observations, say (x1,y1) and (x2,y2), if x1>x2 then (1-x1)<(1-x2) and thus y1<y2. So no matter what two observations you choose to compare, neither will dominate the other.
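The counterexample above can be checked directly. This is a minimal sketch, assuming five made-up observations with x = 0.1, ..., 0.5 and y = 1 - x; the variable name n_dominators is hypothetical and only used for illustration.

```stata
clear
set obs 5
generate x = _n/10
generate y = 1 - x

* For each observation, count how many other observations are strictly
* larger on both x and y (x[`i'] is the value of x in observation i).
generate n_dominators = 0
forvalues i = 1/`=_N' {
    quietly replace n_dominators = n_dominators + 1 if x[`i'] > x & y[`i'] > y
}

* n_dominators stays 0 everywhere: a larger x always comes with a smaller y.
list
```

Because no observation is ever dominated here, a pairwise "tournament" would not eliminate anything, however the pairs are chosen.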
Mengxiao Liu

Join Date: Jun 2020

Posts: 6
#4

31 Aug 2022, 12:17

Here is a MWE:

Code:

sysuse auto
dataex make price mpg headroom gear_ratio in 1/5
list

* Example generated by -dataex-. For more info, type help dataex
clear
input str18 make int(price mpg) float(headroom gear_ratio)
"AMC Concord"   4099 22 2.5 3.58
"AMC Pacer"     4749 17   3 2.53
"AMC Spirit"    3799 22   3 3.08
"Buick Century" 4816 20 4.5 2.93
"Buick Electra" 7827 15   4 2.41
end

I would like to compare the observations in pairs and eliminate the dominated ones. For example, Buick Century dominates AMC Pacer because it is larger on all 4 variables. I cannot throw out any other observations because none of them is strictly dominated.
Here are my questions:
1. Is there any package that can help me achieve this task?
2. If not, what algorithm can do this job as quickly as possible?
3. For those observations that cannot be eliminated, is there a way to judge which ones are better?

Thank you!
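For reference, one brute-force way to flag strictly dominated observations on the example data is sketched below. This is not a dedicated package or a recommendation from the thread, just an illustration under stated assumptions: the local macro attrs and the variable dominated are hypothetical names, higher is treated as better for every attribute (including price, as in the example), and no values are missing (Stata treats missing as larger than any number, so missing values would need separate handling).

```stata
clear
input str18 make int(price mpg) float(headroom gear_ratio)
"AMC Concord"   4099 22 2.5 3.58
"AMC Pacer"     4749 17   3 2.53
"AMC Spirit"    3799 22   3 3.08
"Buick Century" 4816 20 4.5 2.93
"Buick Electra" 7827 15   4 2.41
end

local attrs price mpg headroom gear_ratio

generate byte dominated = 0
forvalues i = 1/`=_N' {
    * Build the condition "observation i is strictly larger than the
    * current observation on every attribute".
    local cond 1
    foreach v of local attrs {
        local cond `cond' & `v'[`i'] > `v'
    }
    quietly replace dominated = 1 if `cond'
}

* Only AMC Pacer is flagged (it is dominated by Buick Century).
list make dominated
drop if dominated
```

Each pass of the loop compares one candidate dominator against all observations at once, so the loop runs N times rather than once per pair. All flags are set against the full data before anything is dropped, so the result does not depend on the order of the observations.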

Originally posted by Hemanshu Kumar

Please provide an extract of your data using -dataex-. See also the Statalist FAQ (esp. #12).
Mengxiao Liu

Join Date: Jun 2020

Posts: 6
#5

31 Aug 2022, 12:21

Yes, you are absolutely right. I guess I should have been more rigorous and mentioned this is the minimum number of comparisons. In the worst-case scenario, I would have to make 38,420 * 38,419 = 1,476,057,980 comparisons (?).
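The counts mentioned here can be reproduced in Stata. A small sketch, assuming that one "comparison" means one directed check of whether observation i dominates observation j; if a single comparison settles both directions for a pair, the worst case is the number of unordered pairs instead, which is half as large.

```stata
* Single-elimination count: one observation dropped per comparison
display %15.0fc 38420 - 1

* Directed checks (does i dominate j?) over all ordered pairs: 1,476,057,980
display %15.0fc 38420 * 38419

* Unordered pairs, if one comparison covers both directions: 738,028,990
display %15.0fc comb(38420, 2)
```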

Originally posted by William Lisowski

It is not clear why you think that at most 38,419 comparisons will need to be made.

I can reproduce that number by assuming that in the first round, you compare 19,210 pairs of observations and drop from each pair one observation that is dominated, and proceed forward with 9,605 pairs of observations, again dropping one from each pair, and so on.

But there is nothing that guarantees that in any comparison one of the two observations will dominate the other.

Suppose there are 2 variables x and y, and it turns out that in every observation, x+y=1, or stated differently, y=1-x. Then in any pair of observations, say (x1,y1) and (x2,y2), if x1>x2 then (1-x1)<(1-x2) and thus y1<y2. So no matter what two observations you choose to compare, neither will dominate the other.