Non-parametric test for skewness in between-subject design

Radoslav Velev

Join Date: Apr 2017

Posts: 12
#1

30 Jul 2018, 16:56

Dear all,

I have survey data with a treatment indicator (0/1) and a dependent variable, DV, that takes values from 0 to 8: it counts how many of 8 binary questions a subject answered with 1. For the purpose of my research I recode it to 0-3, where 0 marks the most extreme values of DV (0, 1, 7 or 8), 1 marks 2 or 6, 2 marks 3 or 5, and 3 marks 4. I have a between-subject design and want to test differences in this variable across treatments. Here is a minimal example:

Code:

clear
input float(id treatment DV DV_recode)
1 0 5 2
2 1 8 0
3 1 1 0
4 1 8 0
5 0 6 1
6 1 0 0
7 1 0 0
8 1 3 2
9 1 4 3
10 0 4 3
11 0 5 2
12 0 7 0
13 1 0 0
14 1 2 1
15 0 3 2
16 0 6 1
17 1 8 0
18 1 8 0
19 1 0 0
20 0 1 0
21 0 2 1
22 0 2 1
23 1 5 2
24 0 6 1
25 1 3 2
end
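
A minimal Stata sketch, assuming the mapping described above is complete: DV_recode can be written as a folded distance of DV from the scale midpoint 4, max(0, 3 - |DV - 4|). Low values therefore flag answers far from the middle in either direction, which is spread rather than asymmetry. DV_fold is a hypothetical variable name used only for the check.

```stata
* All possible scores on the eight binary questions: 0 to 8
clear
set obs 9
generate byte DV = _n - 1

* The recode from the post: 0/1/7/8 -> 0, 2/6 -> 1, 3/5 -> 2, 4 -> 3
recode DV (0 1 7 8 = 0) (2 6 = 1) (3 5 = 2) (4 = 3), generate(DV_recode)

* The same mapping written as a folded distance from the midpoint 4
generate byte DV_fold = max(0, 3 - abs(DV - 4))

* Stops with an error if the two versions ever disagree
assert DV_fold == DV_recode
list DV DV_recode DV_fold, noobs
```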

My current idea is to run a Mann-Whitney U test on the original (non-recoded) DV and, for each treatment separately, a chi-squared test of whether DV_recode differs significantly from random. However, neither test addresses skewness as such, whereas my research question concerns differences in skewness between treatment and control (the treatment group having more DV_recode 0's than control). Is there a more formal test for that than simply comparing the skewness of the two groups?
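
A minimal Stata sketch of the two tests proposed above, run on the 25 example observations. Here they are entered as counts per treatment and DV value, tallied from the example data, and then expanded. The rank-sum test works on ranks rather than means.

```stata
* Example data from the post, entered as counts per (treatment, DV) and expanded
clear
input byte(treatment DV freq)
0 1 1
0 2 2
0 3 1
0 4 1
0 5 2
0 6 3
0 7 1
1 0 4
1 1 1
1 2 1
1 3 2
1 4 1
1 5 1
1 8 4
end
expand freq
drop freq

* Same recode as in the post: distance of DV from the midpoint 4, folded to 0-3
generate byte DV_recode = max(0, 3 - abs(DV - 4))

* Wilcoxon rank-sum (Mann-Whitney) test on the original 0-8 score
ranksum DV, by(treatment)

* Pearson chi-squared test of DV_recode by treatment; with 25 observations
* the expected counts are small, so Fisher's exact test is requested as well
tabulate DV_recode treatment, chi2 exact
```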

Thank you!
Sincerely,
Radoslav

P.S. I read about Pearson residuals, which could help show where the differences in the chi-squared test come from. Would that be a suitable option? If so, how is it implemented in Stata?
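
A minimal Stata sketch using the DV_recode by treatment counts tallied from the example data above. The expected and cchi2 options of tabulate show each cell's expected count and its contribution (O - E)^2/E to the Pearson chi-squared. The signed Pearson residuals (O - E)/sqrt(E) are then computed by hand; the variable names are illustrative. Cells with large absolute residuals are the ones driving the test.

```stata
* DV_recode x treatment counts tallied from the example data in the first post
clear
input byte(DV_recode treatment observed)
0 0 2
0 1 9
1 0 5
1 1 1
2 0 3
2 1 3
3 0 1
3 1 1
end

* Built-in display: expected counts and each cell's contribution to chi2
tabulate DV_recode treatment [fweight=observed], chi2 expected cchi2
scalar chi2_tab = r(chi2)

* Signed Pearson residuals (O - E)/sqrt(E) from row and column totals
bysort DV_recode: egen rowtot = total(observed)
bysort treatment: egen coltot = total(observed)
egen grand = total(observed)
generate double expected = rowtot * coltot / grand
generate double pearson = (observed - expected) / sqrt(expected)
sort DV_recode treatment
list DV_recode treatment observed expected pearson, noobs sepby(DV_recode)
```

The squared residuals sum to the Pearson chi-squared, so the two displays describe the same decomposition.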

Last edited by Radoslav Velev; 30 Jul 2018, 17:51.
Joseph Coveney

Join Date: Apr 2014

Posts: 4398
#2

30 Jul 2018, 19:01

It seems that, given how you constructed DV_recode, you're actually interested in dispersion and not skew. You can get at that with

Code:

robvar DV, by(treatment)

or even

Code:

mixed DV i.treatment, residuals(independent, by(treatment)) nolog

(the LR test given after that is for heteroskedasticity), although, given that your DV is a sum of only a handful of binary outcomes, I'd probably go with the Levene / Brown-Forsythe approach of robvar.
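
To show what robvar's Brown-Forsythe statistic (W50) measures, here is a minimal sketch on the example data. W50 should equal a one-way ANOVA F statistic computed on each observation's absolute deviation from its group median. It assumes robvar leaves W50 in r(w50), as its documented stored results indicate; the final display puts the two numbers side by side.

```stata
* Example data from the first post, entered as counts and expanded
clear
input byte(treatment DV freq)
0 1 1
0 2 2
0 3 1
0 4 1
0 5 2
0 6 3
0 7 1
1 0 4
1 1 1
1 2 1
1 3 2
1 4 1
1 5 1
1 8 4
end
expand freq
drop freq

* Levene (W0) and Brown-Forsythe (W50, W10) tests for equal variance of DV
robvar DV, by(treatment)
scalar w50 = r(w50)

* W50 by hand: ANOVA F on absolute deviations from each group's median
bysort treatment: egen double med = median(DV)
generate double absdev = abs(DV - med)
oneway absdev treatment
display "robvar W50 = " %8.4f scalar(w50) "   oneway F = " %8.4f r(F)
```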
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

30 Jul 2018, 20:01

Was there an a priori hypothesis that Treatment 1 would have more observations at the extremes? If not, then the analysis is directed to proving an association suggested by the data.

As you have survey data, you'll also need a survey analysis, starting with svyset. Please describe the survey design; what the "treatment" and dv are; and what the original study questions were. Also show us the result of

Code:

svy: tab DV_recode treat, col

Last edited by Steve Samuels; 30 Jul 2018, 20:58.

Steve Samuels
Statistical Consulting

Stata 14.2

Radoslav Velev

Join Date: Apr 2017
Posts: 12

31 Jul 2018, 03:43

Thank you for your comments! Joseph, I am more interested in the skewness of DV_recode than in the variance of DV (although robvar already rejects the null, meaning that the variance of DV differs between the treatments). Steve, this is an a priori hypothesis, because my treatment is a modified design intended to better capture extreme outcomes. Here are the results of svy: tab DV_recode treat, col:

Code:

. svy: tab DV_recode treat, col
(running tabulate on estimation sample)

Number of strata   =         1                  Number of obs     =        108
Number of PSUs     =         2                  Population size   =        108
                                                Design df         =          1

-------------------------------
RECODE of |      treatment     
DV        |     0      1  Total
----------+--------------------
        0 | .0943  .5273  .3148
        1 | .4151  .2545  .3333
        2 | .3019  .1455  .2222
        3 | .1887  .0727  .1296
          | 
    Total |     1      1      1
-------------------------------
  Key:  column proportion
Note: Variance estimate degrees of freedom = 1 are less than nominal table degrees of freedom = 3

  Pearson:
    Uncorrected   chi2(3)         =   23.9282
    Design-based  F(2.26, 2.26)   =  4.93e+15     P = 0.0000
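
The per-group chi-squared check against "random" proposed in the first post could be sketched as follows, under two stated assumptions. (1) The counts are reconstructed from the column proportions above: 53 control and 55 treated respondents reproduce both the proportions and the uncorrected chi2(3) of 23.93. (2) "Random" is taken to mean that each of the eight answers is an independent 50/50 draw, so DV ~ Binomial(8, 0.5). The sketch ignores the survey design.

```stata
* Counts reconstructed from the column proportions (53 control, 55 treated)
clear
input byte(DV_recode treatment observed)
0 0  5
1 0 22
2 0 16
3 0 10
0 1 29
1 1 14
2 1  8
3 1  4
end

* Assumed null: DV ~ Binomial(8, 0.5), so out of 256 equally likely answer
* patterns DV_recode = 0, 1, 2, 3 occur 1+8+8+1, 28+28, 56+56 and 70 times
generate double p0 = cond(DV_recode == 0, 18, cond(DV_recode == 1, 56, cond(DV_recode == 2, 112, 70))) / 256

* Expected counts and Pearson contributions within each treatment group
bysort treatment: egen ntot = total(observed)
generate double expected = ntot * p0
generate double contrib = (observed - expected)^2 / expected

* One goodness-of-fit chi-squared (3 df) per treatment group
collapse (sum) X2 = contrib, by(treatment)
generate double p = chi2tail(3, X2)
list treatment X2 p, noobs
```

Because this null is only one possible reading of "random", the result answers a different question from the treatment-control comparison.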

Radoslav Velev

Join Date: Apr 2017

Posts: 12
#5

31 Jul 2018, 17:21

Any advice on this? I am mostly stuck on figuring out how to implement the specifics of my hypothesis in terms of skewness into formal statistical tests.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

31 Jul 2018, 19:43

Sorry, I have no advice to offer.

Joseph Coveney

Join Date: Apr 2014

Posts: 4398
#7

31 Jul 2018, 20:37

Originally posted by Radoslav Velev

Any advice on this? I am mostly stuck on figuring out how to implement the specifics of my hypothesis in terms of skewness into formal statistical tests.

Well, I think that DV_recode isn't going to show you skew. You might be able to do something by resampling the difference in the coefficients of skew between the two groups, but I'm not sure how that would work when the survey design has to be accommodated. What exactly are the specifics of your hypothesis in terms of skewness?
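
A minimal sketch of the resampling idea, assuming simple random sampling. It ignores the survey design, which is exactly the open issue mentioned above. skewdiff is a hypothetical helper program that uses the skewness reported by summarize, detail. bootstrap's strata() option resamples within each treatment group so that the group sizes stay fixed.

```stata
* Example data from the first post, entered as counts and expanded
clear
input byte(treatment DV freq)
0 1 1
0 2 2
0 3 1
0 4 1
0 5 2
0 6 3
0 7 1
1 0 4
1 1 1
1 2 1
1 3 2
1 4 1
1 5 1
1 8 4
end
expand freq
drop freq
generate byte DV_recode = max(0, 3 - abs(DV - 4))

* Hypothetical helper: skewness in treatment 1 minus skewness in treatment 0
* (the grouping variable name "treatment" is hard-coded)
capture program drop skewdiff
program define skewdiff, rclass
    syntax varname
    tempname s1
    quietly summarize `varlist' if treatment == 1, detail
    scalar `s1' = r(skewness)
    quietly summarize `varlist' if treatment == 0, detail
    return scalar diff = `s1' - r(skewness)
end

* Observed difference, kept for comparison
skewdiff DV_recode
scalar obs_diff = r(diff)

* Resample within treatment groups; percentile CI for the difference
bootstrap diff = r(diff), reps(999) seed(20180731) strata(treatment) nodots: skewdiff DV_recode
estat bootstrap, percentile
```

A few replicates may come back missing when a resampled group has no variation in DV_recode; bootstrap reports these.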
Radoslav Velev

Join Date: Apr 2017

Posts: 12
#8

02 Aug 2018, 07:25

Thank you for your comments! I am sorry for the delayed response. My hypothesis is that the skew of DV_recode differs between treatment and control. So I think I need to compare the two skewness measures, but I cannot find a way to calculate their confidence intervals in Stata. Is there a way to compute these, or perhaps a formal test?

I found at https://brownmath.com/stat/shape.htm (section "Inference") that this should be possible by calculating the standard error of skewness (SES), so I wonder how it is done in Stata.
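
For reference, the quantities on that page can be written as follows. This is a sketch: the SES formula is the normal-theory standard error, so for a four-valued, clearly non-normal DV_recode it is at best rough. The two-group z statistic in the last line is an added assumption that treats the groups as independent samples.

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1

\mathrm{SES} = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}

z = \frac{G_{1,T} - G_{1,C}}{\sqrt{\mathrm{SES}_T^2 + \mathrm{SES}_C^2}}
```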

P.S. I also found a similar thread https://www.stata.com/statalist/arch.../msg00244.html suggesting the use of -poisson- but I am not sure how it relates to my question.

Last edited by Radoslav Velev; 02 Aug 2018, 07:53.
Bruce Weaver

Join Date: May 2014

Posts: 1129
#9

02 Aug 2018, 09:08

Hi Radoslav. I have not been following this thread very closely, but I wonder if you might find something useful by looking at the following.

Code:

ssc describe moments2

You can choose whichever measure of skewness you want via the type() option for moments2, and I expect you can find a formula for the relevant variance in Joanes & Gill (1998). (Taking the square root of that variance will give you the SE.)
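
A minimal sketch of that route in base Stata, without moments2. summarize, detail reports g1 = m3/m2^(3/2). The sketch rescales it to G1 and pairs it with the normal-theory variance discussed by Joanes & Gill, whose square root is the SE. The two-group z assumes independent groups and approximately normal estimates; given how far DV_recode is from normal, it should be read as rough.

```stata
* Example data from the first post, entered as counts and expanded
clear
input byte(treatment DV freq)
0 1 1
0 2 2
0 3 1
0 4 1
0 5 2
0 6 3
0 7 1
1 0 4
1 1 1
1 2 1
1 3 2
1 4 1
1 5 1
1 8 4
end
expand freq
drop freq
generate byte DV_recode = max(0, 3 - abs(DV - 4))

* Adjusted skewness G1 and its normal-theory SE for each treatment group
forvalues t = 0/1 {
    quietly summarize DV_recode if treatment == `t', detail
    scalar n`t'  = r(N)
    scalar G`t'  = sqrt(n`t' * (n`t' - 1)) / (n`t' - 2) * r(skewness)
    scalar se`t' = sqrt(6 * n`t' * (n`t' - 1) / ((n`t' - 2) * (n`t' + 1) * (n`t' + 3)))
    display "treatment `t': n = " n`t' ", G1 = " %6.3f G`t' ", SE = " %6.3f se`t'
}

* Wald-type z for the difference, treating the two groups as independent
scalar z = (G1 - G0) / sqrt(se1^2 + se0^2)
display "z = " %6.3f z ", two-sided p = " %6.4f 2 * normal(-abs(z))
```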
Joanes DN, Gill CA. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician). 1998 Apr;47(1):183-9.
HTH.

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)
Joseph Coveney

Join Date: Apr 2014

Posts: 4398
#10

02 Aug 2018, 17:15

Originally posted by Radoslav Velev

My hypothesis is that the skew of DV_recode is different for the Treatment and Control.

It just seems to me that the skew of DV_recode, with only four categories, will reflect its mean, and that, in turn, reflects nothing but the magnitude of the dispersion of the original variable, DV. That is, you're going in a roundabout fashion to testing the hypothesis that your original DV exhibits a difference in variance between the two groups. That's something that you can examine in a more straightforward manner, which is what I suggested, and what you have apparently already done.
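
A quick way to see this point on the example data is to put the spread of DV next to the mean and skewness of DV_recode for each group, as in the minimal sketch below. In these example data, the group with the larger SD of DV also shows the lower mean and the more positive skew in DV_recode.

```stata
* Example data from the first post, entered as counts and expanded
clear
input byte(treatment DV freq)
0 1 1
0 2 2
0 3 1
0 4 1
0 5 2
0 6 3
0 7 1
1 0 4
1 1 1
1 2 1
1 3 2
1 4 1
1 5 1
1 8 4
end
expand freq
drop freq
generate byte DV_recode = max(0, 3 - abs(DV - 4))

* Spread of DV alongside the mean and skewness of DV_recode, by treatment
tabstat DV DV_recode, by(treatment) statistics(n mean sd skewness) columns(statistics)
```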
Radoslav Velev

Join Date: Apr 2017

Posts: 12
#11

03 Aug 2018, 05:45

Joseph, thank you for your reply! I now fully understand your point: the skew of DV_recode is actually a reflection of the dispersion of DV. Bruce, thanks for your reply too; I find the paper you cited applicable in my case as well.