Comparing two lists

Tim Sulls

Join Date: May 2022

Posts: 25
#1

Comparing two lists

22 Aug 2023, 07:52

I was thinking through a possible data scenario. It is made up, so maybe my question doesn't make sense.

Code:

clear all
set obs 100
g id = _n
g indicator = uniform() < .3
g avgscore1 = runiformint(0, 100)
g avgscore2 = runiformint(0, 100)
egen avgscore1rank = rank(-avgscore1)
egen avgscore2rank = rank(-avgscore2)
g avgscore1ranktop25 = avgscore1rank <= 25
g avgscore2ranktop25 = avgscore2rank <= 25

1) A statistical test of whether the proportion of "indicator == 1" among observations with "avgscore1ranktop25 == 1" is the same as among those with "avgscore2ranktop25 == 1", regardless of "id".
2) A statistical test of whether the set of "id" values with "avgscore1ranktop25 == 1" is the same as the set with "avgscore2ranktop25 == 1".

Last edited by Tim Sulls; 22 Aug 2023, 08:27.
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#2

22 Aug 2023, 10:07

You may be talking about a test for the difference of two proportions, but it's hard to tell. Your use of vocabulary here is sufficiently nonstandard as to make it hard to know what you want. I'd suggest that you either: 1) Describe your problem more substantively, using more "ordinary" rather than "statistical" language, since I'd presume you have more experience in ordinary language; or 2) Get a friend to help you reword your question in more standard statistical language.
Tim Sulls

Join Date: May 2022

Posts: 25
#3

22 Aug 2023, 10:43

Thank you Mike Lacy.

I attempt ordinary language below. I know the data are random, so any differences will likely not be statistically significant, and what I see will differ from what you see. But this is just for my learning, so imagine that the data are real.

1)

I want to know if the proportion of "indicator == 1" in the "avgscore1ranktop25 == 1" is the same as its representation in the "avgscore2ranktop25 == 1"

tab indicator if avgscore1ranktop25 == 1 (36%)
tab indicator if avgscore2ranktop25 == 1 (40%)

Shows me that there is a difference of 4 percentage points. How would I compare these differences with a statistical test?

2)

I want to know if the distribution of "ids" when "avgscore1ranktop25 == 1" is the same when "avgscore2ranktop25 == 1"

tab id if avgscore1ranktop25 == 1
tab id if avgscore2ranktop25 == 1

Shows me that only 6 of the ids are in both sets. How would I compare these differences with a statistical test?
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#4

22 Aug 2023, 13:00

Sorry, I'll have to leave this one to someone else. By ordinary language, I had in mind that you might fare better describing your question in substantive terms, that is, in the context of the research question you are pursuing, but I'm not getting that here.
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#5

22 Aug 2023, 13:51

OP, it seems like maybe you want to say something like this: Suppose a sample of prospective graduate students are given a standardized math, reading, and writing test designed to evaluate the likelihood of success in graduate school. Scores on each test are converted to an index scaled from 0 to 100. An analyst is considering students with a writing score where students earned more than 30 points on the index (or earned around 30% of the points). The analyst observes that some of those students have a score ranked in the bottom 25% of students on the math test, and some of those same students have a score ranked in the bottom 25% of the reading test. The analyst would like to test the hypothesis that roughly the same number of students (measured as the proportion of total students) are in the bottom 25% on the math and reading test respectively. Obviously, in the sample overall, 25% are in the bottom 25% on the math and reading test, but is the same true only considering those students with a score greater than 30% on the writing test?

The analyst considers that a two-tailed t-test comparing the proportions might be appropriate, but recognizes these two scores are not drawn from two independent samples. (OP: notably, and contrary to this thought experiment, the way you generate the simulated data implies that the variables of interest are entirely independent of one another.) Is there an appropriate test for the difference in proportions across these two variables considering they come from the same sample?
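One candidate for that last question is McNemar's test, which compares two proportions measured on the same units. Below is a minimal sketch, not a definitive recommendation. The variable names writing, math_bottom25, and reading_bottom25 are hypothetical (they stand for the writing index and two 0/1 flags from the thought experiment) and do not exist in the simulated data in #1.

Code:

* Hypothetical variables: writing (0-100 index), math_bottom25 and reading_bottom25 (0/1 flags)
* Paired 2x2 table of the two flags among students scoring above 30 on writing
tabulate math_bottom25 reading_bottom25 if writing > 30 & !missing(writing)
* McNemar's test of equal marginal proportions for the two paired flags
mcc math_bottom25 reading_bottom25 if writing > 30 & !missing(writing)

The test is driven by the discordant pairs only (students in the bottom 25% on one test but not the other), which is why it does not require the two flags to be independent.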

Does that adequately represent what you're trying to do here OP?
Tim Sulls

Join Date: May 2022

Posts: 25
#6

22 Aug 2023, 14:06

That may be similar. But let me try with a research question and context.

Imagine a group is evaluating ice cream sundaes (ids) and scores them 1-100 (avgscore1). Sundaes can be of two types (indicator == 0 or 1). Another group evaluates the same sundaes and scores them 1-100 (avgscore2).

I want to see whether the groups have similar preferences by testing whether the proportion of type 0/1 sundaes in the top 25 for group 1 is the same as in the top 25 for group 2. I then want to see whether preferences for individual sundaes (ids), as measured by score, are the same between the groups.
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#7

22 Aug 2023, 14:16

So observations are actually sundaes? Like someone made 100 sundaes, and two people (or groups of people and you've taken the average) rank the 100 sundaes from best to worst? And you want to see whether those two people (or two groups on average) have different preferences?
Tim Sulls

Join Date: May 2022

Posts: 25
#8

22 Aug 2023, 14:52

Thanks for helping me get through this. It is a useful lesson.

Here are definitions

id = unique sundae
indicator = 0 (vanilla base sundae) and 1 (chocolate base sundae)
avgscore1 = average 1-100 score of 10 people (group 1) who tried that sundae with 100 being the best score
avgscore2 = average 1-100 score of 10 OTHER people (group 2) who tried that sundae with 100 being the best score
avgscore1ranktop25 = is the sundae in the top 25 for people in group 1 (using avgscore1)
avgscore2ranktop25 = is the sundae in the top 25 for people in group 2 (using avgscore2)

I have two questions:

1) Are sundaes with the same base flavor (indicator = 0/1) represented in the same proportion in group 2's top 25 (avgscore2ranktop25) as in group 1's top 25 (avgscore1ranktop25)? In other words, is the proportion of vanilla in the top 25 the same across groups?
2) Is the distribution of scores across unique sundaes (id) similar between group 1 (avgscore1) and group 2 (avgscore2)? In other words, are the preferences for each unique sundae the same across groups?
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#9

22 Aug 2023, 15:25

This feels like something someone might want to do in an experimental design. Group 1 might be a control group and group 2 might be an intervention group. If you think about it, if these are random samples from the same population exposed to the same stimulus without any intervention, you should definitely expect the preferences to be the same. It's a little weird here because you've actually summarized your observations across different stimuli (the sundaes) and now you're looking for something like a difference in the rankings across groups. I essentially never work with data that looks like this, so I might not be the best to advise here. I'd guess maybe you could set up an ANOVA for this? I don't know, maybe Mike Lacy is willing to jump back in...
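A rank-based comparison might be a simpler starting point than an ANOVA. Here is a rough sketch using the variables from #1, assuming every sundae was scored by both groups so the two scores are paired by id. I am not claiming this is the right design for real data.

Code:

* Spearman rank correlation: do the two groups order the sundaes similarly?
spearman avgscore1 avgscore2
* Kendall's tau as an alternative measure of rank agreement
ktau avgscore1 avgscore2
* Wilcoxon matched-pairs signed-rank test: does one group score systematically higher?
signrank avgscore1 = avgscore2

The two correlations speak to agreement in the ordering, while signrank speaks to a shift in the level of the scores, so they answer different questions. With the simulated data the correlations should be close to zero, because the two scores are generated independently.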

I didn't realize this, but apparently Mike and I are in the same field (Sociology). Small world.
Tim Sulls

Join Date: May 2022

Posts: 25
#10

22 Aug 2023, 20:45

Thanks for trying. I didn't realize that my thought exercise this morning would pose a challenge beyond me.

I think we can also take avgscore1 to mean person 1's score and avgscore2 person 2's score.

You mention my exact curiosity. I would expect preferences to be the same, so the proportion of indicator in each group's top 25 should be the same. But I am just looking for the Stata command to test this expectation. That is question 1.

Question 2 is asking whether the distribution of scores by id between group1 (or person 1) is the same as group2 (or person 2). Is there a command to compare these distributions?
George Ford

Join Date: Aug 2014
Posts: 3185

#11

22 Aug 2023, 21:08

Maybe?

Code:

ttest indicator if avgscore1ranktop25 | avgscore2ranktop25, by(avgscore2ranktop25)
g same = avgscore1ranktop25 == avgscore2ranktop25
ttest same if avgscore1ranktop25 | avgscore2ranktop25, by(avgscore2ranktop25)

I don't think the last part is quite right, but might be helpful.
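Another angle, offered as a sketch and not a definitive answer, using the variables from #1. It assumes each top-25 list contains exactly 25 sundaes (ties in rank() could change that) and plugs in the 36% and 40% reported in #3.

Code:

* Question 1, one group at a time: is base flavor associated with making the top 25?
tabulate indicator avgscore1ranktop25, column exact
tabulate indicator avgscore2ranktop25, column exact
* Question 1, direct comparison of 36% vs 40% with 25 sundaes in each list.
* Caution: prtesti treats the two lists as independent samples, which they are not
* (same sundaes, and some appear in both lists), so this is only a rough guide.
prtesti 25 0.36 25 0.40
* Question 2: do the two top-25 lists overlap more than chance would predict?
tabulate avgscore1ranktop25 avgscore2ranktop25, exact

For the last table, two unrelated lists of 25 drawn from 100 sundaes would be expected to share 25*25/100 = 6.25 ids, so the 6 shared ids reported in #3 are about what chance alone would produce.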

Last edited by George Ford; 22 Aug 2023, 21:12.
