Need help interpreting Wilcoxon rank sum test results

Raju Parakkal

Join Date: Jul 2015

Posts: 11
#1

08 Feb 2016, 22:54

Hi, I have the following result from running a Wilcoxon rank sum test. This is survey data using a 7-point Likert scale (the neutral middle response was dropped from the data). The response variable is labeled lhop and the grouping variable is labeled comexpgr. The z statistic is significant, so there is a statistically significant difference between the two groups, TrustCom and TrustExp.

But my question is: can I also conclude that TrustCom ranks higher than TrustExp with respect to lhop, because TrustCom has a higher rank sum (14388.5) than expected (12300.5) while TrustExp has a lower rank sum (42227.5) than expected (44315.5)? This is my first time posting on the forum and I hope I am posting correctly (I did read the FAQ first). Thanks for any help you can offer.

. ranksum lhop, by( comexpgr )

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

    comexpgr |      obs    rank sum    expected
-------------+---------------------------------
    TrustCom |       73     14388.5     12300.5
    TrustExp |      263     42227.5     44315.5
-------------+---------------------------------
    combined |      336       56616       56616

unadjusted variance   539171.92
adjustment for ties   -24047.19
                     ----------
adjusted variance     515124.73

Ho: lhop(comexpgr==TrustCom) = lhop(comexpgr==TrustExp)
             z =   2.909
    Prob > |z| =   0.0036
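
The figures in this output can be reproduced by hand, which makes the direction of the result easier to read. The following is a minimal Python sketch (not Stata, and not part of the original post) that assumes the usual normal approximation for the rank-sum test: a group of size n1 out of N observations has expected rank sum n1(N+1)/2, and the unadjusted variance is n1*n2*(N+1)/12. The tie adjustment is copied from the output rather than recomputed, because it depends on the raw data. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the ranksum output above
n_com, n_exp = 73, 263            # group sizes (TrustCom, TrustExp)
rank_sum_com = 14388.5            # observed rank sum for TrustCom
tie_adjustment = -24047.19        # taken from the output, not recomputed

n_total = n_com + n_exp
expected_com = n_com * (n_total + 1) / 2          # 12300.5
expected_exp = n_exp * (n_total + 1) / 2          # 44315.5
unadjusted_var = n_com * n_exp * (n_total + 1) / 12
adjusted_var = unadjusted_var + tie_adjustment

# Normal approximation: a positive z means the first group's ranks run high
z = (rank_sum_com - expected_com) / sqrt(adjusted_var)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"expected rank sums: {expected_com}, {expected_exp}")
print(f"unadjusted variance: {unadjusted_var:.2f}")
print(f"z = {z:.3f}, Prob > |z| = {p_two_sided:.4f}")
```

Because z is computed from the first group's rank sum minus its expectation, its positive sign is consistent with TrustCom responses tending to rank higher than TrustExp responses.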
Nick Cox

Join Date: Mar 2014

Posts: 35699
#2

09 Feb 2016, 02:31

On the rare occasions I use these tests I reach for the porder option to summarize what is happening.

The key to unlocking the door here is to recast the problem as an estimation problem as well as a testing problem. Roger Newson's work is central here.

http://www.stata-journal.com/sjpdf.h...iclenum=gn0062 gives a way in. See especially the 2002 paper and the program somersd (SJ, SSC, Roger's own website: search for download locations).
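
As a rough illustration of the estimation view, the following Python sketch (used here only for illustration; it is not Stata code) derives the Mann-Whitney U statistic from the rank sum reported in #1 and converts it into an estimated probability that a randomly chosen TrustCom response exceeds a randomly chosen TrustExp response. It assumes tied pairs count as one half, which is what a midrank-based rank sum implies; whether porder reports exactly this quantity should be checked against the Stata documentation.

```python
# Values copied from the ranksum output in #1
n_com, n_exp = 73, 263
rank_sum_com = 14388.5

# Mann-Whitney U for TrustCom: rank sum minus its minimum possible value
u_com = rank_sum_com - n_com * (n_com + 1) / 2

# Share of (TrustCom, TrustExp) pairs in which the TrustCom response is
# higher, with tied pairs counted as one half
p_order = u_com / (n_com * n_exp)

# Somers' D is the same information rescaled to the range -1 to 1
somers_d = 2 * p_order - 1

print(f"U = {u_com}, P(TrustCom > TrustExp) = {p_order:.3f}, D = {somers_d:.4f}")
```

A probability above 0.5 (equivalently, a positive D) points to TrustCom responses tending to be higher, which is the directional statement the test statistic alone does not spell out.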
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#3

09 Feb 2016, 08:05

Thank you so much for the suggestion and the information. After reading the Newson 2002 article, I downloaded the somersd package and found it very useful. I have one more question, if you don't mind. I'm not sure if I missed it in the article, but does the somersd command need any assumption about the distribution of the data? The data I'm using are from a survey and range from 1 to 7, for both the predictor and response variables. Thanks.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#4

09 Feb 2016, 08:07

Roger will probably tune in, but apart from the usual independence assumptions, I would assume not.
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#5

09 Feb 2016, 08:10

That's great. Thanks, again.
Roger Newson

Join Date: Apr 2014

Posts: 317
#6

09 Feb 2016, 08:27

Thanks to Nick for alerting me to this post. No, the somersd command does not assume a great deal, except that individuals are sampled from populations, and that the Central Limit Theorem works. And the Central Limit Theorem works more quickly (i.e., at smaller sample sizes) for Somers' D than for most other statistics known to science. If the smaller of the two sample sizes is 8 or more, then the Central Limit Theorem seems to work like a dream.

I hope this helps. Let me know if you have any more queries.

Best wishes

Roger
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#7

09 Feb 2016, 10:28

Thanks for responding to my post. Much appreciated. My sample sizes are in the hundreds for both the groups, with the total number of observations (N) above 300. So, the Central Limit Theorem should work. I have two more queries, if you don't mind answering.

(1) In the somersd command, do I first put the response variable or the predictor/grouping variable after typing somersd? In Stata, we usually put the response/dependent variable first but I noticed in your 2002 article (p. 53) that you put the grouping variable (US) first. Please clarify this aspect.

(2) Can I interpret the somersd results like I would do with regression results? As an example, this is the result I got for one of my somersd estimations (lhop is the response variable and comexp is the grouping/predictor variable with groups 0 and 1). Can I interpret that (a) groups 0 and 1 are statistically different from each other for lhop and (b) compared to group 0, respondents in group 1 have significantly higher values on lhop? Thank you so much.

. somersd comexp lhop
Somers' D with variable: comexp
Transformation: Untransformed
Valid observations: 336

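Symmetric 95% CI
------------------------------------------------------------------------------
             |              Jackknife
      comexp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lhop |   .2175113   .0755702     2.88   0.004     .0693965    .3656262
------------------------------------------------------------------------------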
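
To see how the columns of this table relate to one another, here is a minimal Python sketch (an illustration, not output from somersd) that rebuilds z, the two-sided P-value, and the symmetric 95% interval from the reported coefficient and jackknife standard error. It assumes the interval is the estimate plus or minus the standard normal critical value times the standard error, which is what "Symmetric 95% CI" with "Transformation: Untransformed" suggests; the variable names are made up for the example.

```python
from statistics import NormalDist

# Values copied from the somersd output above
coef = 0.2175113      # estimated Somers' D
std_err = 0.0755702   # jackknife standard error

std_normal = NormalDist()
z = coef / std_err
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

# Symmetric 95% interval: estimate +/- critical value * standard error
crit = std_normal.inv_cdf(0.975)
ci_low = coef - crit * std_err
ci_high = coef + crit * std_err

print(f"z = {z:.2f}, P>|z| = {p_two_sided:.3f}")
print(f"95% CI: {ci_low:.7f} to {ci_high:.7f}")
```

The interval lies entirely above zero, so the estimate is distinguishable from "no tendency either way" at the 5% level under these assumptions.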
Roger Newson

Join Date: Apr 2014

Posts: 317
#8

09 Feb 2016, 13:20

In reply to your queries:

1. In the somersd command, we have a single X-variable followed by a list of one or more Y-variables. However, this is a notational convention, and is followed because there is only one X and possibly multiple Y-variables. Sometimes, the single X-variable may be a "response" (like a possibly-censored lifetime), and the multiple Y-variables might be competing predictors. See Section 2.1 of Newson (2006), and also Newson (2010).

2. If the entire confidence interval for Somers' D(Y|X) is positive, then there is a tendency for individuals with larger X-values to have larger Y-values most of the time. And, if the entire confidence interval is negative, then there is a tendency for individuals with larger X-values to have smaller Y-values most of the time. This follows from the definition of Somers' D. You may have met Harrell's c-statistic, defined as c(Y|X)=(D(Y|X)+1)/2, which is an equivalent parameter, and is also discussed in Newson (2006) and Newson (2010).
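
The definition referred to in point 2 can be sketched in a few lines of Python. This is an illustration only, not the somersd implementation: it computes the sample Somers' D(Y|X) by brute force as the sum of sign(x_i - x_j) * sign(y_i - y_j) over all pairs, divided by the number of pairs that differ on X, and then applies c = (D + 1)/2. The toy data and the function names somers_d and harrell_c are hypothetical; the three numbers converted at the end are the estimate and confidence limits reported in #7.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def somers_d(x, y):
    """Somers' D(Y|X): concordance of Y with X over pairs that differ on X."""
    num = 0   # sum of sign(x_i - x_j) * sign(y_i - y_j)
    den = 0   # number of pairs with x_i != x_j
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        sx = sign(xi - xj)
        num += sx * sign(yi - yj)
        den += sx * sx
    return num / den

def harrell_c(d):
    """Harrell's c from Somers' D, using c = (D + 1) / 2."""
    return (d + 1) / 2

# Hypothetical toy data: x is a 0/1 group indicator, y a 1-7 response
x = [0, 0, 0, 1, 1, 1]
y = [2, 3, 5, 3, 6, 7]
d_toy = somers_d(x, y)

# Estimate and confidence limits reported in #7, converted to the c scale
c_est, c_low, c_high = (harrell_c(v) for v in (0.2175113, 0.0693965, 0.3656262))
print(d_toy, c_est, c_low, c_high)
```

Because c is a linear, increasing function of D, converting the confidence limits directly gives limits for c, and an interval for D that excludes 0 corresponds to an interval for c that excludes 0.5.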

I hope this helps.

Best wishes

Roger

References

Newson R. Confidence intervals for rank statistics: Somers' D and extensions. The Stata Journal 2006; 6(3): 309–334. Download from
http://www.stata-journal.com/article...rticle=snp15_6

Newson RB. Comparing the predictive power of survival models using Harrell’s c or Somers’ D. The Stata Journal 2010; 10(3): 339–358. Download from
http://www.stata-journal.com/article...article=st0198
Joseph Coveney

Join Date: Apr 2014

Posts: 4412
#9

09 Feb 2016, 17:23

Originally posted by Raju Parakkal

This is survey data using a 7-point Likert scale (the neutral middle response was dropped from the data).

Is that commonly done?

Are the results described something like, "Of those respondents who expressed an opinion (either agreed or disagreed) . . ."?

If the difference between groups is such that one is more neutral, is the interpretation complicated, for example, by differentially dropping observations?
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#10

09 Feb 2016, 18:12

Yes, that really helped and I am quite thankful. But as I did a little more research on Somers' D, I came across this information that said the two variables need to have a monotonic relationship for the somersd results to be valid. Is that true? I checked my variables for monotonicity using the Spearman test and they do not have a monotonic relationship (the Spearman's values were very low). Can I still use the somersd estimation? If not, is there something I can do to still use the somersd estimation? Thank you very much.
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#11

09 Feb 2016, 18:23

Originally posted by Joseph Coveney

Is that commonly done?

Are the results described something like, "Of those respondents who expressed an opinion (either agreed or disagreed) . . ."?

If the difference between groups is such that one is more neutral, is the interpretation complicated, for example, by differentially dropping observations?

From what I understand, people often drop the neutral middle value when using the Likert scale for surveys. But the survey for this project had already been done with the neutral response option included in the questionnaire (before I joined the project). So, I run both tests, one with the middle value included in the data and one with it excluded. Where the theory behind this research project allows it, I use that reasoning to decide which group the neutral respondents should be assigned to. But when that is not possible for a particular question, I drop the middle value and explain that in the data section of the paper.
Joseph Coveney

Join Date: Apr 2014

Posts: 4412
#12

09 Feb 2016, 18:34

Originally posted by Raju Parakkal

From what I understand, people are dropping the middle neutral value when using the Likert scale for surveys. But the survey for this project was already done with the neutral response option included in the questionnaire (before I joined the project).

Oh, I know that instrument developers often omit the neutral ("neither agree nor disagree") response option from items when preparing the questionnaire. That is, they force respondents to decide.

But that's different from first gathering data with the neutral response available to respondents, and then dropping those neutral responses from the dataset afterward.
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#13

09 Feb 2016, 19:01

I agree. That's why I am doing both the tests, including and excluding the neutral responses, and presenting both the results.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#14

09 Feb 2016, 19:30

I agree with Joseph. Dropping the middle value is just throwing away information and indeed almost certainly giving wrong answers here.
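
A small made-up example shows how differential dropping can distort a group comparison. The Python sketch below uses invented responses (they are not the survey data discussed in this thread) and a hypothetical helper, prob_higher, which estimates the probability that a response from one group exceeds a response from the other, counting ties as one half. One group is mostly neutral, so removing category 4 discards most of that group and none of the other.

```python
def prob_higher(a, b):
    """P(random a > random b), counting tied pairs as one half."""
    wins = sum((ai > bi) + 0.5 * (ai == bi) for ai in a for bi in b)
    return wins / (len(a) * len(b))

NEUTRAL = 4  # middle category of a 1-7 scale

# Hypothetical responses, invented for illustration only
group_a = [4] * 8 + [7] * 2      # mostly neutral, a few strongly agree
group_b = [3] * 5 + [5] * 5      # mildly disagree or mildly agree

with_neutral = prob_higher(group_a, group_b)

# Same comparison after discarding the neutral responses
a_kept = [v for v in group_a if v != NEUTRAL]
b_kept = [v for v in group_b if v != NEUTRAL]
without_neutral = prob_higher(a_kept, b_kept)

print(f"with neutral: {with_neutral:.2f}, without: {without_neutral:.2f}")
print(f"observations dropped: A {len(group_a) - len(a_kept)}, B {len(group_b) - len(b_kept)}")
```

In this contrived case the estimate moves from a modest 0.6 to 1.0 simply because eight of the ten group A respondents were removed, so the comparison after dropping describes only "those who expressed an opinion" and not the groups as surveyed.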
Raju Parakkal

Join Date: Jul 2015

Posts: 11
#15

09 Feb 2016, 19:37

Yes, you're right. I'm definitely keeping the middle values for those questions where I can theoretically justify classifying them in one group or the other. For other questions, I think I might have to provide some logical reasoning to retain them in the data so the results make sense.