  • Non-parametric test for skewness in between-subject design

    Dear all,

    I have survey data with a treatment (0, 1) and a dependent variable that takes values from 0 to 8 (it counts how many of 8 binary questions a subject answered with 1). For the purpose of my research I recode it to a 0 to 3 scale, where 0 collects the most extreme values of the DV (0, 1, 7, 8). I have a between-subject design and want to test for differences in that variable across treatments. Here is a minimal example:
    Code:
    clear
    input float(id treatment DV DV_recode)
      1 0 5 2
      2 1 8 0
      3 1 1 0
      4 1 8 0
      5 0 6 1
      6 1 0 0
      7 1 0 0
      8 1 3 2
      9 1 4 3
     10 0 4 3
     11 0 5 2
     12 0 7 0
     13 1 0 0
     14 1 2 1
     15 0 3 2
     16 0 6 1
     17 1 8 0
     18 1 8 0
     19 1 0 0
     20 0 1 0
     21 0 2 1
     22 0 2 1
     23 1 5 2
     24 0 6 1
     25 1 3 2
    end


    My current idea is to do a Mann-Whitney U test on the non-recoded variable and a chi-squared test for each treatment separately to test whether it differs significantly from random. However, the Mann-Whitney test mainly picks up a shift in location and the chi-squared test picks up any departure from the reference distribution, whereas my research question concerns differences in skewness between treatment and control (treatment having more DV_recode 0's than control). Is there a more formal test for that, other than simply comparing the skewness of both groups?

    Thank you!
    Sincerely,
    Radoslav

    P.S. I read that Pearson residuals can help show where the differences in a chi-squared test come from. Would that be a suitable option? If so, how is it implemented in Stata?
    Last edited by Radoslav Velev; 30 Jul 2018, 17:51.

  • #2
    It seems from your construction of DV_recode that you're actually interested in dispersion and not skew. You can get at that with
    Code:
    robvar DV, by(treatment)
    or even
    Code:
    mixed DV i.treatment, residuals(independent, by(treatment)) nolog
    (the LR test given after that is for heteroskedasticity), although given that you have a sum of only a handful of binomial outcomes I'd probably go with the Levene / Brown-Forsythe approach of robvar.
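
    For reading the robvar output, here is a minimal sketch, assuming the example data from the first post is in memory. robvar reports three statistics: W0 (Levene's original test, centered at the group means), W50 (Brown-Forsythe, centered at the medians) and W10 (centered at the 10% trimmed means). The stored-result names below are as I recall them from the help file, so run return list after robvar if they don't match.

    ```stata
    * Assumes the example dataset from post #1 is in memory
    robvar DV, by(treatment)

    * W50 is centered at the medians and is the more robust choice under non-normality
    display "Levene W0          = " %6.3f r(w0)  "   p = " %6.4f r(p_w0)
    display "Brown-Forsythe W50 = " %6.3f r(w50) "   p = " %6.4f r(p_w50)
    ```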


    • #3
      Was there an a priori hypothesis that Treatment 1 would have more observations at the extremes? If not, then the analysis is directed to proving an association suggested by the data.

      As you have survey data, you'll also need a survey analysis, starting with svyset. Please describe the survey design; what the "treatment" and dv are; and what the original study questions were. Also show us the result of
      Code:
      svy: tab DV_recode treat, col
      Last edited by Steve Samuels; 30 Jul 2018, 20:58.
      Steve Samuels
      Statistical Consulting

      Stata 14.2


      • #4
        Thank you for your comments! Joseph, I am more interested in the skewness of DV_recode than in the variance of DV (although I already reject the null with robvar, meaning that the variance differs between the treatments). Steve, this is an a priori hypothesis, because my treatment is a modified design that is intended to better capture extreme outcomes. Here are the results of svy: tab DV_recode treat, col:



        Code:
        . svy: tab DV_recode treat, col
        (running tabulate on estimation sample)
        
        Number of strata   =         1                  Number of obs     =        108
        Number of PSUs     =         2                  Population size   =        108
                                                        Design df         =          1
        
        -------------------------------
        RECODE of |      treatment     
               DV |     0      1  Total
        ----------+--------------------
                0 | .0943  .5273  .3148
                1 | .4151  .2545  .3333
                2 | .3019  .1455  .2222
                3 | .1887  .0727  .1296
                  | 
            Total |     1      1      1
        -------------------------------
          Key:  column proportion
        Note: Variance estimate degrees of freedom = 1 are less than nominal table degrees of freedom = 3
        
          Pearson:
            Uncorrected   chi2(3)         =   23.9282
            Design-based  F(2.26, 2.26)   =  4.93e+15     P = 0.0000


        • #5
          Any advice on this? I am mostly stuck on figuring out how to implement the specifics of my hypothesis in terms of skewness into formal statistical tests.


          • #6
            Sorry, I have no advice to offer.
            Steve Samuels
            Statistical Consulting

            Stata 14.2


            • #7
              Originally posted by Radoslav Velev
              Any advice on this? I am mostly stuck on figuring out how to implement the specifics of my hypothesis in terms of skewness into formal statistical tests.
              Well, I think that DV_recode isn't going to show you skew. You might be able to do something with resampling the difference in the coefficients of skewness between the two groups, but I'm not sure how that would work when there is also a survey design to accommodate. What exactly are the specifics of your hypothesis in terms of skewness?
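
              To make the resampling idea concrete, here is a rough sketch of a bootstrap for the difference in skewness between the two groups. It assumes the example data from #1 is in memory, uses DV (swap in whichever variable you settle on), and ignores the survey design entirely. skewdiff is a made-up program name, and the seed and number of replications are arbitrary.

              ```stata
              * Hypothetical helper: skewness in treatment minus skewness in control
              capture program drop skewdiff
              program define skewdiff, rclass
                  version 14
                  quietly summarize DV if treatment == 1, detail
                  local s1 = r(skewness)
                  quietly summarize DV if treatment == 0, detail
                  return scalar diff = `s1' - r(skewness)
              end

              * Resample within each arm so the group sizes stay fixed
              bootstrap diff = r(diff), reps(2000) seed(12345) strata(treatment) nodots: skewdiff

              * Percentile confidence interval for the difference
              estat bootstrap, percentile
              ```

              A percentile interval that excludes zero would be some evidence of a difference in skew, although with groups this small the interval is likely to be wide.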


              • #8
                Thank you for your comments! I am sorry for the delayed response. My hypothesis is that the skew of DV_recode is different for the Treatment and Control. From this, I think I need to compare the two skewness measures but I cannot find a way to calculate the confidence intervals of those in Stata. Is there a way to compute those or maybe a formal test?

                I found at https://brownmath.com/stat/shape.htm (section "Inference") that this should be possible by calculating the standard error of skewness (SES), so I wonder how it is implemented in Stata.

                P.S. I also found a similar thread https://www.stata.com/statalist/arch.../msg00244.html suggesting the use of -poisson- but I am not sure how it relates to my question.
                Last edited by Radoslav Velev; 02 Aug 2018, 07:53.


                • #9
                  Hi Radoslav. I have not been following this thread very closely, but I wonder if you might find something useful by looking at the following.

                  Code:
                  ssc describe moments2
                  You can choose whichever measure of skewness you want via the type() option for moments2, and I expect you can find a formula for the relevant variance in Joanes & Gill (1998). (Taking the square root of that variance will give you the SE.)
                  Joanes DN, Gill CA. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician). 1998 Apr;47(1):183-9.
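
                  As a rough sketch of what that could look like without moments2, the loop below computes the adjusted skewness G1 and its normal-theory standard error for each group, using the formulas in Joanes & Gill. It assumes the example data from #1 is in memory and uses DV purely for illustration. Note that summarize reports the moment-based g1, and that the SE formula assumes sampling from a normal distribution, which is a strong assumption for a short discrete scale.

                  ```stata
                  * Assumes the example data from #1 is in memory; DV is an illustrative choice
                  foreach g in 0 1 {
                      quietly summarize DV if treatment == `g', detail
                      local n  = r(N)
                      local g1 = r(skewness)    // moment-based g1, as reported by summarize
                      * Adjusted skewness: G1 = g1 * sqrt(n(n-1)) / (n-2)
                      local G1 = `g1' * sqrt(`n' * (`n' - 1)) / (`n' - 2)
                      * Normal-theory SE of G1: sqrt(6n(n-1) / ((n-2)(n+1)(n+3)))
                      local se = sqrt(6 * `n' * (`n' - 1) / ((`n' - 2) * (`n' + 1) * (`n' + 3)))
                      display "treatment = `g':  n = `n'   G1 = " %6.3f `G1' "   SE = " %6.3f `se'
                  }
                  ```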
                  HTH.
                  --
                  Bruce Weaver
                  Version: Stata/MP 18.5 (Windows)


                  • #10
                    Originally posted by Radoslav Velev
                    My hypothesis is that the skew of DV_recode is different for the Treatment and Control.
                    It just seems to me that the skew of DV_recode, with only four categories, will reflect its mean, and that, in turn, reflects nothing but the magnitude of the dispersion of the original variable, DV. That is, you're going in a roundabout fashion to testing the hypothesis that your original DV exhibits a difference in variance between the two groups. That's something that you can examine in a more straightforward manner, which is what I suggested, and what you have apparently already done.
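
                    One way to see this with the example data from #1: the recode appears to follow DV_recode = max(0, 3 - |DV - 4|), that is, it decreases with the distance of DV from the scale midpoint of 4. That rule is inferred from the posted observations rather than stated anywhere, so the sketch below checks it with an assert. The variable names dist and check are made up for illustration.

                    ```stata
                    * Assumes the example data from #1 is in memory
                    generate byte dist  = abs(DV - 4)        // distance from the scale midpoint
                    generate byte check = max(0, 3 - dist)   // inferred recode rule (an assumption)
                    assert check == DV_recode                // holds for the 25 posted observations

                    * A lower mean of DV_recode goes with a larger mean distance from 4
                    tabstat DV_recode dist, by(treatment) statistics(mean sd)
                    ```

                    Strictly, this measures spread around the midpoint 4 rather than around each group's own mean, but the two will be close whenever the group means sit near the middle of the scale.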


                    • #11
                      Joseph, thank you for your reply! I now fully understand your point: the skew of DV_recode is actually a reflection of the dispersion of DV. Bruce, thanks for your reply too; the paper you cited is applicable to my case as well.
