Hi Stata users!
I am using Stata 14.1. I am trying to assess whether the rank order of the most frequently diagnosed diseases differs between men and women, and across seasons. In the dataex below, there is an id variable (srno), a binary sex variable (sex2: 0 = men; 1 = women), a categorical season variable (seas: 1 = winter; 2 = pre-monsoon; 3 = southwest monsoon; 4 = post-monsoon), and a categorical diagnosis variable (diag, stored as a string), which records the diagnosed condition of each individual. I hand-picked the dataex rows so that all four seasons are represented.
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input int srno byte sex2 float seas str40 diag
4734 0 1 "Gastritis"
4735 1 1 "Upper Respiratory Tract Infection (URTI)"
4736 0 1 "Cough"
4737 1 1 "Cold"
4738 1 1 "Diarrhea"
6177 0 2 "SOB"
6178 0 2 "Upper Respiratory Tract Infection (URTI)"
6179 0 2 "Upper Respiratory Tract Infection (URTI)"
6180 0 2 "Cold"
6181 0 2 "Gastritis"
8089 1 3 "Gastritis"
8090 1 3 "Tinea"
8091 0 3 "Gastritis"
8092 0 3 "Fever"
8093 0 3 "Low Back Ache"
10669 0 4 "Tinea"
10670 0 4 "Icterus"
10671 0 4 "Cough"
10672 0 4 "Tinea"
end
[/CODE]
I can quickly see which diagnoses are made most frequently, by sex and by season, using the tabsort command (part of the tab_chi package: ssc install tab_chi). For example, by sex (I show only the top five diagnoses from the output):
tabsort diag if sex2==0
tabsort diag if sex2==1
Now, this is both a statistics question and a Stata question (I apologise for the former). I believe I should use the Wilcoxon-Mann-Whitney or Kruskal-Wallis test to see whether the ranking of diagnoses differs between the sexes (or across seasons), but I don't understand what form the data should take to make these tests possible. Following other Statalist posts, I have managed to create two new variables holding the rank of each diagnosis for males and for females, but I don't see how a single variable could contain the information these tests need. I don't rule out that I am making an error in my choice of test, my choice of dependent variable, or something else. I appreciate any help!
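For concreteness, here is a minimal sketch of the kind of layout I mean by "two rank variables", run on the dataex rather than my full data. It is only an illustration and not necessarily the exact code from the posts I followed; the names n, n0, n1, rank_m and rank_f are placeholders chosen for this example. It assumes the dataex example above is already in memory, and it replaces the data in memory, so run it on a copy (or wrap it in preserve/restore).

```stata
* Sketch only: assumes the dataex example (srno sex2 seas diag) is in memory.
* One row per diagnosis-sex pair, with the frequency in n
contract diag sex2, freq(n)

* One row per diagnosis: n0 = count for men, n1 = count for women
reshape wide n, i(diag) j(sex2)

* A diagnosis never recorded for one sex has a count of zero, not missing
replace n0 = 0 if missing(n0)
replace n1 = 0 if missing(n1)

* Rank 1 = most frequent diagnosis; tied diagnoses share the same rank
egen rank_m = rank(n0), field
egen rank_f = rank(n1), field

gsort rank_m diag
list diag n0 n1 rank_m rank_f, noobs
```

This leaves one observation per diagnosis with a separate rank for each sex, which is exactly the shape I don't know how to feed into the tests mentioned above.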
Output of tabsort diag if sex2==0 (men), top five diagnoses:
RevisedDiagnosis      | Freq. | Percent | Cum.
Cough                 | 1,477 | 12.05   | 12.05
Musculoskeletal Pain  | 1,327 | 10.83   | 22.88
Road Traffic Accident | 1,023 |  8.35   | 31.23
Tinea                 |   981 |  8.00   | 39.23
Cold                  |   795 |  6.49   | 45.72

Output of tabsort diag if sex2==1 (women), top five diagnoses:
RevisedDiagnosis      | Freq. | Percent | Cum.
Musculoskeletal Pain  |   530 | 12.10   | 12.10
Cough                 |   490 | 11.19   | 23.29
Cold                  |   349 |  7.97   | 31.26
Fever                 |   322 |  7.35   | 38.62
Gastritis             |   279 |  6.37   | 44.99
Kind regards,
Harry