Good morning.
I have been doing some reading on how best to analyse the data collected for my university thesis, which are ordered categorical (ordinal) data.
My exposure variable is the source of drinking water, ranked using the JMP water ladder (Drinking water | JMP (washdata.org)) i.e.
1. Safely managed
2. Basic
3. Limited
4. Unimproved
5. Surface water
My outcome variable of cca result is also ordinal:
1. Negative
2. Trace
3. +
4. ++
5. +++
I was advised by another student studying medical statistics to use the Chi2 test to look for an association between the two, and then to run ordinal logistic regression if there is an association. However, having done further reading, it seems to me that Kendall's Tau or the Spearman test is more appropriate, given the small sample size of n=253 (which makes me lean towards Kendall's Tau) and the fact that my outcome variable is ordinal.
However, given my extremely limited knowledge, I am getting a little confused as to which test to use.
From what I understand, Kendall's Tau and Spearman are tests for correlation (linear relationship), whereas the Chi2 test is a test for association.
Running the Chi2 test gives me the following result:
. tab hhwater_ladder cca_code, row chi exact
+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
Enumerating sample-space combinations:
stage 5: enumerations = 1
stage 4: enumerations = 52
stage 3: enumerations = 1540
stage 2: enumerations = 28204
stage 1: enumerations = 0
hhwater_la |                        cca_code
      dder |       Neg      trace          +         ++        +++ |     Total
-----------+-------------------------------------------------------+----------
         2 |        11         15         14         17         42 |        99
           |     11.11      15.15      14.14      17.17      42.42 |    100.00
-----------+-------------------------------------------------------+----------
         3 |         0          0          2          1          1 |         4
           |      0.00       0.00      50.00      25.00      25.00 |    100.00
-----------+-------------------------------------------------------+----------
         5 |         9         15         11         23         92 |       150
           |      6.00      10.00       7.33      15.33      61.33 |    100.00
-----------+-------------------------------------------------------+----------
     Total |        20         30         27         41        135 |       253
           |      7.91      11.86      10.67      16.21      53.36 |    100.00
Pearson chi2(8) = 17.5166 Pr = 0.025
Fisher's exact = 0.024
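As a cross-check outside Stata, the Pearson statistic can be recomputed by hand from the cell counts in the table. The following is only a minimal Python sketch, assuming the counts are exactly as tabulated; the function name `pearson_chi2` and the variable names are made up for illustration and are not part of any Stata or Python library.

```python
# Cell counts copied from the tabulation (rows: ladder 2, 3, 5;
# columns: Neg, trace, +, ++, +++). Assumed to be transcribed correctly.
observed = [
    [11, 15, 14, 17, 42],
    [0, 0, 2, 1, 1],
    [9, 15, 11, 23, 92],
]


def pearson_chi2(table):
    """Return (chi2, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = pearson_chi2(observed)
# Count cells whose expected frequency falls below the usual rule-of-thumb of 5
small_cells = sum(e < 5 for row in expected for e in row)
print(f"chi2({df}) = {chi2:.4f}; expected counts below 5: {small_cells}")
```

The cells with expected counts below 5 all sit in the ladder 3 (Limited) row, which has only 4 households, so that sparse row is presumably what makes the exact test worth reporting alongside the Pearson statistic.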
Having run Kendall's Tau test, I obtained the following result:
. ktau hhwater_ladder cca_code, stats (taua taub p)
Number of obs = 253
Kendall's tau-a = 0.1030
Kendall's tau-b = 0.1798
Kendall's score = 3283
SE of score = 1053.988 (corrected for ties)
Test of H0: hhwater_ladder and cca_code are independent
Prob > |z| = 0.0018 (continuity corrected)
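To see where these numbers come from, the score, tau-a and tau-b can be rebuilt from the same cell counts by counting concordant and discordant pairs. This is a rough Python sketch, not Stata's implementation: `kendall_from_table` is a hypothetical helper, the standard error is copied from the output above rather than recomputed, and the continuity correction is assumed to subtract 1 from the absolute score.

```python
import math

# Cell counts copied from the tabulation (rows: ladder 2, 3, 5;
# columns: Neg, trace, +, ++, +++). Assumed to be transcribed correctly.
observed = [
    [11, 15, 14, 17, 42],
    [0, 0, 2, 1, 1],
    [9, 15, 11, 23, 92],
]


def kendall_from_table(table):
    """Return (score, tau_a, tau_b) from an r x c table of ordered categories."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell with every cell in a strictly lower row
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    pairs = table[i][j] * table[k][l]
                    if l > j:
                        concordant += pairs
                    elif l < j:
                        discordant += pairs
    n = sum(map(sum, table))
    n0 = n * (n - 1) // 2  # total number of pairs
    ties_x = sum(t * (t - 1) // 2 for t in map(sum, table))        # tied on row variable
    ties_y = sum(t * (t - 1) // 2 for t in map(sum, zip(*table)))  # tied on column variable
    score = concordant - discordant
    tau_a = score / n0
    tau_b = score / math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return score, tau_a, tau_b


score, tau_a, tau_b = kendall_from_table(observed)

SE_SCORE = 1053.988  # taken from the ktau output, not recomputed here
z = (abs(score) - 1) / SE_SCORE           # assumed form of the continuity correction
p_two_sided = math.erfc(z / math.sqrt(2))  # two-sided normal p-value for H0: independent

print(f"score = {score}, tau-a = {tau_a:.4f}, tau-b = {tau_b:.4f}")
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```

The gap between tau-a and tau-b comes entirely from ties: with only three water categories and five result categories, most pairs of households are tied on at least one variable, and tau-b shrinks the denominator to account for that.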
Running the Spearman test gives the following:
. spearman hhwater_ladder cca_code, stats (rho p)
Number of obs = 253
Spearman's rho = 0.1953
Test of H0: hhwater_ladder and cca_code are independent
Prob > |t| = 0.0018
From what I glean from these results, given the significant p-value for the test of the null hypothesis, it appears that there is no correlation between water source and cca result.
However, given the p-values from the Chi2 and Fisher's exact tests, is it still worthwhile to perform an ordinal logistic regression, as there may still be a relationship, albeit not a linear one?
Again, many thanks in advance