  • Need help interpreting Wilcoxon rank sum test results

    Hi, I have the following result from running a Wilcoxon rank sum test. This is survey data using a 7-point Likert scale (the neutral middle response was dropped from the data). The response variable is labeled lhop and the grouping variable is labeled comexpgr. The z statistic is significant, so there is a statistically significant difference between the two groups, TrustCom and TrustExp.

    But my question is: can I also conclude that TrustCom has higher ranks than TrustExp with respect to lhop, because TrustCom's rank sum (14388.5) is higher than expected (12300.5) while TrustExp's rank sum (42227.5) is lower than expected (44315.5)? This is my first time posting on the forum, and I hope I am posting correctly (I did read the FAQ first). Thanks for any help you can offer.


    . ranksum lhop, by( comexpgr )

    Two-sample Wilcoxon rank-sum (Mann-Whitney) test

        comexpgr |      obs    rank sum    expected
    -------------+---------------------------------
        TrustCom |       73     14388.5     12300.5
        TrustExp |      263     42227.5     44315.5
    -------------+---------------------------------
        combined |      336       56616       56616

    unadjusted variance   539171.92
    adjustment for ties   -24047.19
                          ----------
    adjusted variance     515124.73

    Ho: lhop(comexpgr==TrustCom) = lhop(comexpgr==TrustExp)
                 z =   2.909
        Prob > |z| =   0.0036
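As a quick check on what the ranksum output contains, the sketch below (Python, used only for illustration; the thread itself uses Stata) recomputes the expected rank sums, the unadjusted variance and z from the group sizes and TrustCom's rank sum. Under Ho each group's expected rank sum is n_g*(N+1)/2, and the unadjusted variance is n1*n2*(N+1)/12. The tie adjustment needs the raw data, so the adjusted variance is simply copied from the output. The function name is hypothetical.

```python
import math

def ranksum_summary(r1, n1, n2, adjusted_var):
    """Expected rank sums, unadjusted variance and z for a two-sample
    Wilcoxon rank-sum test, given group 1's observed rank sum."""
    n = n1 + n2
    e1 = n1 * (n + 1) / 2               # expected rank sum for group 1 under Ho
    e2 = n2 * (n + 1) / 2               # expected rank sum for group 2 under Ho
    unadj_var = n1 * n2 * (n + 1) / 12  # variance before the tie adjustment
    z = (r1 - e1) / math.sqrt(adjusted_var)
    return e1, e2, unadj_var, z

# Numbers copied from the ranksum output (TrustCom treated as group 1).
e1, e2, v, z = ranksum_summary(r1=14388.5, n1=73, n2=263, adjusted_var=515124.73)
print(f"expected: {e1} {e2}")           # expected: 12300.5 44315.5
print(f"unadjusted variance: {v:.2f}")  # unadjusted variance: 539171.92
print(f"z = {z:.3f}")                   # z = 2.909
```

The sign of z carries the direction: it is positive because TrustCom's observed rank sum lies above its expected value, which is exactly the comparison the question is about.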


  • #2
    On the rare occasions I use these tests I reach for the porder option to summarize what is happening.

    The key to unlocking the door here is to recast the problem as an estimation problem as well as a testing problem. Roger Newson's work is central here.

    http://www.stata-journal.com/sjpdf.h...iclenum=gn0062 gives a way in. See especially the 2002 paper and the program somersd (SJ, SSC, Roger's own website: search for download locations).
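The porder idea can be illustrated from the numbers in the original output. A minimal sketch, assuming porder estimates P(Y1 > Y2) + 0.5*P(Y1 = Y2) for the first group listed (ties counted as one half, the usual convention): that quantity equals the Mann-Whitney U statistic for group 1 divided by n1*n2, where U1 = R1 - n1*(n1+1)/2. The function name is hypothetical.

```python
def porder_from_ranksum(r1, n1, n2):
    """Estimate P(Y1 > Y2) + 0.5 * P(Y1 == Y2) from group 1's rank sum
    (midranks for ties) via the Mann-Whitney U statistic."""
    u1 = r1 - n1 * (n1 + 1) / 2   # Mann-Whitney U for group 1
    return u1 / (n1 * n2)

# TrustCom rank sum and group sizes from the ranksum output.
p = porder_from_ranksum(14388.5, 73, 263)
print(f"{p:.4f}")  # 0.6088
```

A value above 0.5 (about 0.61 here) says that a randomly chosen TrustCom respondent tends to give a higher lhop response than a randomly chosen TrustExp respondent. This is an estimate of the size of the difference, not just a test of whether one exists.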



    • #3
      Thank you so much for the suggestion and the information. After reading the Newson 2002 article, I downloaded the somersd package and found it very useful. I have one more question, if you don't mind. I may have missed it in the article, but does the somersd command require any assumption about the distribution of the data? My data come from a survey, and both the predictor and response variables range from 1 to 7. Thanks.



      • #4
        Roger will probably tune in, but apart from the usual independence assumptions, I would assume not.



        • #5
          That's great. Thanks, again.



          • #6
            Thanks to Nick for alerting me to this post. No, the somersd command does not assume a great deal, except that individuals are sampled from populations and that the Central Limit Theorem works. And the Central Limit Theorem works more quickly (i.e., at smaller sample sizes) for Somers' D than for most other statistics known to science. If the smaller of the two sample sizes is 8 or more, then the Central Limit Theorem seems to work like a dream.

            I hope this helps. Let me know if you have any more queries.

            Best wishes

            Roger



            • #7
              Thanks for responding to my post. Much appreciated. My sample sizes are in the hundreds for both the groups, with the total number of observations (N) above 300. So, the Central Limit Theorem should work. I have two more queries, if you don't mind answering.

              (1) In the somersd command, should the response variable or the predictor/grouping variable come first after somersd? In Stata, we usually put the response/dependent variable first, but I noticed in your 2002 article (p. 53) that you put the grouping variable (US) first. Please clarify this.

              (2) Can I interpret the somersd results as I would interpret regression results? For example, here is the result from one of my somersd estimations (lhop is the response variable, and comexp is the grouping/predictor variable with groups 0 and 1). Can I conclude that (a) groups 0 and 1 differ significantly in lhop and (b) compared with group 0, respondents in group 1 have significantly higher values of lhop? Thank you so much.

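              . somersd comexp lhop
              Somers' D with variable: comexp
              Transformation: Untransformed
              Valid observations: 336

              Symmetric 95% CI

              ------------------------------------------------------------------------------
                           |              Jackknife
                    comexp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      lhop |   .2175113   .0755702     2.88   0.004     .0693965    .3656262
              ------------------------------------------------------------------------------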
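For a binary X, Somers' D(Y|X) is P(Y1 > Y0) - P(Y1 < Y0), which is 2*porder - 1, so it can also be computed from the earlier ranksum output as 2*(R1 - E1)/(n1*n2). The sketch below does that, assuming comexp == 1 identifies the same respondents as TrustCom (the thread does not say so, but the two numbers agree to seven decimals). It then rebuilds the symmetric 95% CI as coefficient +/- 1.959964 * SE from the reported jackknife standard error; small last-digit differences from the Stata output come from rounding of the displayed inputs.

```python
# From the ranksum output: TrustCom rank sum and group sizes.
r1, n1, n2 = 14388.5, 73, 263
n = n1 + n2
d_from_ranksum = 2 * (r1 - n1 * (n + 1) / 2) / (n1 * n2)
print(f"D from ranksum: {d_from_ranksum:.7f}")  # D from ranksum: 0.2175113

# From the somersd output: estimate and jackknife standard error.
coef, se = 0.2175113, 0.0755702
z_crit = 1.959964                     # two-sided 95% standard normal quantile
lo, hi = coef - z_crit * se, coef + z_crit * se
print(f"95% CI: {lo:.4f} to {hi:.4f}")  # 95% CI: 0.0694 to 0.3656
print(f"z = {coef / se:.2f}")           # z = 2.88
```

Because the whole interval is above zero, this matches the positive z from ranksum: both point to higher lhop values in the group coded 1.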



              • #8
                In reply to your queries:

                1. In the somersd command, we have a single X-variable followed by a list of one or more Y-variables. However, this is a notational convention, and is followed because there is only one X and possibly multiple Y-variables. Sometimes, the single X-variable may be a "response" (like a possibly-censored lifetime), and the multiple Y-variables might be competing predictors. See Section 2.1 of Newson (2006), and also Newson (2010).

                2. If the entire confidence interval for Somers' D(Y|X) is positive, then there is a tendency for individuals with larger X-values to have larger Y-values most of the time. And, if the entire confidence interval is negative, then there is a tendency for individuals with larger X-values to have smaller Y-values most of the time. This follows from the definition of Somers' D. You may have met Harrell's c-statistic, defined as c(Y|X)=(D(Y|X)+1)/2, which is an equivalent parameter, and is also discussed in Newson (2006) and Newson (2010).

                I hope this helps.

                Best wishes

                Roger

                References

                Newson RB. Confidence intervals for rank statistics: Somers' D and extensions. The Stata Journal 2006; 6(3): 309-334. Download from
                http://www.stata-journal.com/article...rticle=snp15_6

                Newson RB. Comparing the predictive power of survival models using Harrell's c or Somers' D. The Stata Journal 2010; 10(3): 339-358. Download from
                http://www.stata-journal.com/article...article=st0198
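To make the definition in point 2 concrete, the sketch below computes Somers' D(Y|X) by brute force: over all pairs of observations with different X-values, it takes (concordant pairs - discordant pairs) divided by the number of such pairs, and then converts the result to Harrell's c = (D + 1)/2. The data are a hypothetical toy example, not the thread's data, and the function names are made up; exact fractions keep the output easy to check by hand. This only illustrates the parameter; it is not how somersd computes its estimates or jackknife standard errors.

```python
from fractions import Fraction
from itertools import combinations

def somers_d(x, y):
    """Somers' D(Y|X) by brute force over all pairs of observations:
    (concordant - discordant) / (number of pairs with unequal X)."""
    conc = disc = untied_x = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue                  # pairs tied on X are ignored
        untied_x += 1
        s = (xi - xj) * (yi - yj)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1                 # pairs tied on Y only enter the denominator
    return Fraction(conc - disc, untied_x)

def harrell_c(d):
    """Harrell's c(Y|X) = (D(Y|X) + 1) / 2."""
    return (d + 1) / 2

# Hypothetical toy data: binary group x and a 1-7 Likert response y.
x = [0, 0, 0, 0, 1, 1, 1]
y = [2, 3, 5, 6, 4, 6, 7]
d = somers_d(x, y)
print(d, harrell_c(d))  # 7/12 19/24
```

Here 9 of the 12 cross-group pairs are concordant, 2 are discordant and 1 is tied on Y, giving D = 7/12 and c = 19/24. Reversing the direction of Y flips the sign of D, which is why the sign of the confidence interval tells you the direction of the tendency.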



                • #9
                  Originally posted by Raju Parakkal
                  This is survey data using a 7-point Likert scale (the neutral middle response was dropped from the data).
                  Is that commonly done?

                  Are the results described something like, "Of those respondents who expressed an opinion (either agreed or disagreed) . . ."?

                  If the difference between groups is such that one is more neutral, is the interpretation complicated, for example, by differentially dropping observations?



                  • #10
                    Yes, that really helped, and I am very thankful. But while doing a little more research on Somers' D, I came across a claim that the two variables need to have a monotonic relationship for the somersd results to be valid. Is that true? I checked my variables for monotonicity using Spearman's rank correlation, and they do not have a monotonic relationship (the Spearman coefficients were very low). Can I still use the somersd estimation? If not, is there something I can do so that I can still use it? Thank you very much.



                    • #11
                      Originally posted by Joseph Coveney

                      Is that commonly done?

                      Are the results described something like, "Of those respondents who expressed an opinion (either agreed or disagreed) . . ."?

                      If the difference between groups is such that one is more neutral, is the interpretation complicated, for example, by differentially dropping observations?

                      From what I understand, people often drop the neutral middle value when using a Likert scale in surveys. But the survey for this project had already been conducted with the neutral response option included in the questionnaire (before I joined the project). So I run both analyses, one with the middle value included in the data and one with it excluded. Where theory relevant to this project allows, I use it to decide which group the neutral respondents should be assigned to. When that is not theoretically possible for a particular question, I drop the middle value and explain this in the data section of the paper.
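The include/exclude comparison described above can be sketched as a simple sensitivity analysis: compute the same rank statistic with and without the neutral responses and compare the two. The responses below are hypothetical, with group 1 choosing the neutral value 4 more often (the situation raised in #9), and the helper names are made up for illustration. For a binary grouping, Somers' D is computed here directly as P(Y1 > Y0) - P(Y1 < Y0) over all cross-group pairs.

```python
from fractions import Fraction

def somers_d_binary(y0, y1):
    """Somers' D(Y|X) for binary X: P(Y1 > Y0) - P(Y1 < Y0) over all cross-group pairs."""
    gt = sum(a > b for a in y1 for b in y0)
    lt = sum(a < b for a in y1 for b in y0)
    return Fraction(gt - lt, len(y0) * len(y1))

def drop_neutral(ys, neutral=4):
    """Remove the neutral middle response (4 on a 1-7 scale)."""
    return [v for v in ys if v != neutral]

# Hypothetical 1-7 responses; group 1 picks the neutral 4 more often.
y0 = [1, 2, 5, 6, 6, 7]
y1 = [3, 4, 4, 4, 4, 5]

print("all responses:  ", somers_d_binary(y0, y1))                              # -11/36
print("neutral dropped:", somers_d_binary(drop_neutral(y0), drop_neutral(y1)))  # -1/4
```

In this toy case D moves from -11/36 to -1/4 once the four neutral responses are removed, so the two analyses need not agree; reporting both shows how much the conclusion depends on that choice.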



                      • #12
                        Originally posted by Raju Parakkal
                        From what I understand, people are dropping the middle neutral value when using the Likert scale for surveys. But the survey for this project was already done with the neutral response option included in the questionnaire (before I joined the project).
                        Oh, I know that instrument developers often omit the neutral ("neither agree nor disagree") response option from items when preparing the questionnaire. That is, they force respondents to decide.

                        But that's different from first gathering data with the neutral response available to respondents, and then dropping those neutral responses from the dataset afterward.



                        • #13
                          I agree. That's why I am doing both the tests, including and excluding the neutral responses, and presenting both the results.



                          • #14
                            I agree with Joseph. Dropping the middle value is just throwing away information and indeed almost certainly giving wrong answers here.



                            • #15
                              Yes, you're right. I'm definitely keeping the middle values for those questions where I can theoretically justify classifying them in one group or the other. For other questions, I think I might have to provide some logical reasoning to retain them in the data so the results make sense.
