Kruskal-Wallis H test with Binary Variable across multiple groups

John Richmond

Join Date: Feb 2021

Posts: 3
#1


03 Feb 2021, 07:58

Hi, I am a qualitative researcher who is relatively new to Stata. I have found reading this forum very helpful so far. I have a question: I am trying to test whether means differ significantly between multiple groups.

I used the Kruskal-Wallis H test for a number of activities (each activity is scored 1, 2, 3, 4, 5, Likert style) across multiple demographic categories (Region with 4 options, Size with 3 options, Affiliation with 2 options, and Responsibility with 4 options). This works great.

However, can I run a similar test for another set of activity variables that are scored as binary (1, 0), across the same groups? I did produce the table using the binary variables and it looks great, but I was concerned that, because the activity variables are binary, I might be misinterpreting my table and significance levels.

1) Can I use the Kruskal-Wallis test with binary variables?
2) If I cannot, what other test could I run? (Ideally one that handles grouping variables with multiple categories.)

Helpful input is appreciated. Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35691
#2

03 Feb 2021, 08:16

The mean of a binary variable encodes exactly the same information as the ranks on two values (which are necessarily heavily tied). I can't see any advantage of using Kruskal-Wallis over using a generalised linear model with binomial family (and logit link). Indeed the latter tells you much more. All the K-W tells you is whether the groups differ through a P-value whereas the GLM allows many more questions to be answered (or raised).
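
To make the comparison concrete, here is a minimal sketch on simulated data. The variable names activity and region are hypothetical stand-ins, not the variables in the original poster's dataset. It runs -kwallis- on a binary outcome and then fits a GLM with binomial family and logit link, followed by a joint test across groups and the predicted proportion in each group.

```stata
* Simulated toy data: hypothetical variables, not the poster's dataset
clear
set seed 12345
set obs 200
generate region = ceil(4 * runiform())                // 4 hypothetical regions
generate activity = runiform() < 0.2 + 0.1 * region   // binary 0/1 outcome

* Kruskal-Wallis on the binary outcome: a single overall P-value
kwallis activity, by(region)

* A plain cross-tabulation with a chi-square test, for comparison
tabulate region activity, chi2 row

* Binomial GLM with logit link: estimates for each region as well
glm activity i.region, family(binomial) link(logit)
testparm i.region      // joint test that all regions share one proportion
margins region         // predicted proportion of 1s in each region
```

The -testparm- line addresses roughly the same "do the groups differ?" question as K-W, while the coefficients and -margins- output show which groups differ and by how much.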

Last edited by Nick Cox; 03 Feb 2021, 08:23.
John Richmond

Join Date: Feb 2021

Posts: 3
#3

03 Feb 2021, 09:02

Thank you so much, Nick. I can easily run a regression on each activity (binary variable) by each of the categories. That is what you mean, right? I think what you're saying is that this will show direction of association, significance and more, over and above the K-W test.

But is reporting K-W on binary variables (as you say, ranks on two values) across multiple groups incorrect or unacceptable in some way? I want to know whether the means reported by activity for each group are statistically different. Here is the anonymized table.

Attached Files
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

03 Feb 2021, 09:04

John:
welcome to this forum.
-glm- (as suggested by Nick), -logit- or -logistic- can also give you the chance to interact your possible predictors (groups; regions and so on).
See -fvvarlist- notation in this respect.
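
As a rough illustration of the factor-variable notation, the sketch below fits a logit model with an interaction between two grouping variables. The data are simulated and the names activity, region and size are hypothetical.

```stata
* Simulated toy data: hypothetical variables
clear
set seed 2021
set obs 400
generate region = ceil(4 * runiform())                 // 4 hypothetical regions
generate size = ceil(3 * runiform())                   // 3 hypothetical size classes
generate activity = runiform() < 0.3 + 0.05 * region   // binary 0/1 outcome

* ## requests both main effects and their interaction
logit activity i.region##i.size

* Joint test of the interaction terms only
testparm i.region#i.size
```

Typing -help fvvarlist- lists the full set of operators (i., c., #, ##).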

Kind regards,
Carlo
(Stata 19.0)
Bruce Weaver

Join Date: May 2014

Posts: 1132
#5

03 Feb 2021, 11:45

Hi John. Adding to the advice that you got about how to handle your dichotomous outcome variable, I will suggest that you consider using the -ologit- command to analyze the Likert-type outcome variables. Type -help ologit- for more info. You can also find lots of good info about both binary logit and ordered (or ordinal) logit models on Richard Williams' website. See this course page, for example:
https://www3.nd.edu/~rwilliam/xsoc73994/index.html

Section I has lots of info about binary logit models; and Section II has some info about ordered logit models.
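
For what it is worth, a minimal -ologit- sketch on simulated data might look like the following. The names score and region are hypothetical, and the simulated scores are unrelated to region by construction.

```stata
* Simulated toy data: hypothetical variables
clear
set seed 42
set obs 300
generate region = ceil(4 * runiform())   // 4 hypothetical regions
generate score = ceil(5 * runiform())    // Likert-type item scored 1 to 5

* Ordered logit of the Likert-type outcome on region
ologit score i.region

* Joint test that the region coefficients are all zero
testparm i.region
```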

One issue to be aware of with logit models is that rules-of-thumb about how many variables you can include are related to the number of events rather than to the number of observations. For binary logit models, an event is an observation with the outcome equal to a Yes if Yes is less frequent than No, or a No if No is less frequent than Yes. One good source on this issue is Mike Babyak's (2004) article on overfitting:
https://people.duke.edu/~mababyak/pa...regression.pdf

Babyak recommended that one should have 10-15 events-per-variable. But Frank Harrell now recommends 20 EPV in his checklist. I'm not entirely sure how to apply this rule of thumb in the case of multinomial or ordinal logit models. Perhaps someone else can comment on that.
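
Here is one way the events-per-variable arithmetic for a binary outcome could be sketched. The data are simulated, the name activity is hypothetical, and the count of 3 coefficients is an assumption that corresponds to a single 4-level predictor entered as i.region.

```stata
* Simulated toy data: hypothetical binary outcome
clear
set seed 7
set obs 250
generate activity = runiform() < 0.25

* Events = number of observations in the less frequent outcome category
quietly count if activity == 1
local n1 = r(N)
quietly count if activity == 0
local n0 = r(N)
local events = min(`n1', `n0')

* Assumed number of estimated coefficients, excluding the constant
local k = 3

display "events = `events', EPV = " %5.1f `events'/`k'
```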

HTH.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
John Richmond

Join Date: Feb 2021

Posts: 3
#6

04 Feb 2021, 03:17

Thank you Nick, Carlo, and Bruce for responding. I am now exploring the data a few different ways based on your recommendations and resources (which I am still reviewing, so bear with me). Without appearing too obtuse, can you comment on whether the table I reported (and its significance levels) is at the very least comprehensible and hasn't violated any rules? All the subgroups are independent of each other.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

04 Feb 2021, 03:43

John:
my gut feeling is that such rich tables are unlikely to pass muster with any decent technical journal (and this might be a sound reason to think again about the way your results are reported).
Personally I consider tabulating what the devil does during her/his leisure time (as her/his working time is devoted to -stepwise-) and sometimes I have ended up in similar situations (and stubbornly tabulating took me nowhere).
More substantively, from your tables I got the (misleading) feeling of no relationship among regions, organization size and affiliation, that I find hard to believe holds in the real world (that's why, just like Nick did, I propose a regression approach to analyze your data).

Kind regards,
Carlo
(Stata 19.0)