Assessing differences between ordinal ranks based on frequencies

Harry Coleman

Join Date: May 2019

Posts: 4
#1

26 Jun 2019, 04:30

Hi Stata users!

I am using Stata 14.1 and trying to assess whether the ordinal ranks of the most frequently diagnosed diseases differ between men and women, and across seasons. In the dataex example below, there is an id variable (srno), a binary sex variable (sex2: 0 = men; 1 = women), a categorical season variable (seas: 1 = winter; 2 = pre-monsoon; 3 = southwest monsoon; 4 = post-monsoon), and a categorical diagnosis variable (diag), which records the condition diagnosed for each individual. I hand-picked the example rows so that all four seasons are represented.

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input int srno byte sex2 float seas str40 diag
4734 0 1 "Gastritis"
4735 1 1 "Upper Respiratory Tract Infection (URTI)"
4736 0 1 "Cough"
4737 1 1 "Cold"
4738 1 1 "Diarrhea"
6177 0 2 "SOB"
6178 0 2 "Upper Respiratory Tract Infection (URTI)"
6179 0 2 "Upper Respiratory Tract Infection (URTI)"
6180 0 2 "Cold"
6181 0 2 "Gastritis"
8089 1 3 "Gastritis"
8090 1 3 "Tinea"
8091 0 3 "Gastritis"
8092 0 3 "Fever"
8093 0 3 "Low Back Ache"
10669 0 4 "Tinea"
10670 0 4 "Icterus"
10671 0 4 "Cough"
10672 0 4 "Tinea"
end
[/CODE]

I can quickly see which diagnoses are made most frequently, by sex and by season, using the tabsort command (from the tab_chi package: ssc install tab_chi). For example, here are the five most frequent diagnoses for each sex:

tabsort diag if sex2==0

RevisedDiagnosis          Freq.   Percent     Cum.
Cough                     1,477     12.05    12.05
Musculoskeletal Pain      1,327     10.83    22.88
Road Traffic Accident     1,023      8.35    31.23
Tinea                       981      8.00    39.23
Cold                        795      6.49    45.72

tabsort diag if sex2==1

RevisedDiagnosis          Freq.   Percent     Cum.
Musculoskeletal Pain        530     12.10    12.10
Cough                       490     11.19    23.29
Cold                        349      7.97    31.26
Fever                       322      7.35    38.62
Gastritis                   279      6.37    44.99

Now, this is both a statistics question and a Stata-istics question (I apologise for the former). I believe I should use the Wilcoxon-Mann-Whitney or Kruskal-Wallis test to see whether these two rankings differ, but I don't understand what form the data should take to make these tests possible. Following other Statalist posts, I have created two new variables that hold the rank of each diagnosis for males and for females, but I don't see how a single variable could contain the information these tests need. I may well be making an error in the choice of test, the dependent variable or something else. I appreciate any help!

Kind regards,

Harry
Nick Cox

Join Date: Mar 2014

Posts: 35685
#2

26 Jun 2019, 04:44

The fact that you can rank diagnoses by frequency doesn't make diagnosis an ordinal variable.

So, what you want here is more like a chi-square test.
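A minimal sketch of what that could look like with the variables from the first post (diag, sex2, seas), assuming the full dataset is in memory. The chi2 and column options are standard options of tabulate; restricting the table to the most common diagnoses is left out here and would be a separate judgement call.

[CODE]
* Cross-tabulate diagnosis by sex, with column percentages
* and a Pearson chi-square test of association
tabulate diag sex2, chi2 column

* The same idea across the four seasons
tabulate diag seas, chi2 column
[/CODE]

With many rare diagnoses the expected frequencies in some cells may be small, so the chi-square P-values are best read with some caution.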
Harry Coleman

Join Date: May 2019

Posts: 4
#3

26 Jun 2019, 05:56

Does statistical satiation exist? Is there a German compound phrase for statistical shame?!

I realise the large oversight in my question. I had thought it possible, statistically, to assess differences between ranks, and even more, which position changes are the most 'influential'. I hadn't really considered whether that is something you should do when you have a categorical dependent variable.

My sincere thanks anyway. I need a break.