Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Displaying p-values from Kruskal-Willis tests in a loop

    Hi there! I'd really appreciate some feedback on some code I wrote (below). Basically, I'd like to run Kruskal-Wallis tests on a long list of variables, and display only those with p-values less than 0.05.

    Code:
    sysuse auto, clear
    foreach var of varlist price mpg rep78 {
        display "`var'"
        quietly kwallis `var', by(weight)
        if chi2tail(r(df), r(chi2)) < 0.05 {
            local test `test' `var'
        }
    }
    display `"`test'"'
    After running the KW test on each variable individually, I can see that this code seems to have produced the correct result (i.e., no p-values less than 0.05). But it's hard to confirm without having a significant p-value to check against.

    Thanks for any/all feedback!

  • #2
    This is code I use for exploratory chi-square test for binary variables in my dataset looking at complete response rates for 4 binary variables. You could possibly adapt this easily for your purpose. You will get the logic. Uses frames and Stata18.

    Code:
    local x gender mdsrgrp tazy2 ep7
    local by cr1        
    local subset    "" 
    
    
    tempname chi
    frame create `chi' str30 varname str50 var chi2 N
    
    foreach v of local x{
        qui tab `v' `by' `subset', chi2
        frame post `chi' ("`v'") (`"`: variable label `v''"') (round(r(p),0.0001)) (r(N))
    }
    frame change `chi'
    frame `chi' { 
        gsort chi2
        drop if chi2 > 0.2
    }
    list, noobs

    Comment


    • #3
      [CODE]
      sysuse auto, clear foreach var of varlist price mpg rep78 {
      display "`var'"
      quietly kwallis `var', by(weight)
      if chi2tail(r(df), r(chi2)) < 0.5 { //change here to 0.5 just to see it works
      local test `test' `var'
      }
      }

      display `"`test'"'
      /CODE]

      Comment


      • #4
        As
        Code:
        weight
        takes on 64 distinct values in the auto dataset, it's an implausible choice here. At the same time it's clearly just part of an arbitrary example.

        I wouldn't apply round() to a P-value as in #2 for two reasons:

        1. round() is not a good way to round for presentation purposes unless the second argument is a power of 2, This point often arises here on Statalist. The issue is precision: as at machine level Stata works in binary, so also it doesn't work with decimals in the way that people expect from arithmetic as learned in childhood.

        https://journals.sagepub.com/doi/10....867X1801800311 Section 4 addresses the main point.

        2. More fundamentally, if you care about P-values at all, you should want the full precision to be preserved on a first loop through the data, so that later you can decide what to work with and what to present.

        Here's a first stab at my guess of what the OP might want. I've borrowed some notions from earlier posts in the thread.

        Code:
         
        . sysuse auto, clear
        (1978 automobile data)
        
        . mykwallis mpg weight price, by(foreign) frame(kw_results)
        
        Kruskal–Wallis equality-of-populations rank test
        
          +---------------------------+
          |  foreign | Obs | Rank sum |
          |----------+-----+----------|
          | Domestic |  52 |  1688.50 |
          |  Foreign |  22 |  1086.50 |
          +---------------------------+
        
          chi2(1) =  9.564
             Prob = 0.0020
        
          chi2(1) with ties =  9.614
                       Prob = 0.0019
        
        Kruskal–Wallis equality-of-populations rank test
        
          +---------------------------+
          |  foreign | Obs | Rank sum |
          |----------+-----+----------|
          | Domestic |  52 |  2379.50 |
          |  Foreign |  22 |   395.50 |
          +---------------------------+
        
          chi2(1) = 25.800
             Prob = 0.0001
        
          chi2(1) with ties = 25.804
                       Prob = 0.0001
        
        Kruskal–Wallis equality-of-populations rank test
        
          +---------------------------+
          |  foreign | Obs | Rank sum |
          |----------+-----+----------|
          | Domestic |  52 |  1862.00 |
          |  Foreign |  22 |   913.00 |
          +---------------------------+
        
          chi2(1) =  1.083
             Prob = 0.2980
        
          chi2(1) with ties =  1.083
                       Prob = 0.2980
        
        . frame change kw_results
        
        . list
        
             +-------------------------------+
             | varname      chisq     Pvalue |
             |-------------------------------|
          1. |     mpg   9.563952   .0019844 |
          2. |  weight   25.80003   3.79e-07 |
          3. |   price   1.083074   .2980108 |
             +-------------------------------+
        
        . format chisq %2.1f
        
        . format Pvalue %6.5f
        
        . list
        
             +---------------------------+
             | varname   chisq    Pvalue |
             |---------------------------|
          1. |     mpg     9.6   0.00198 |
          2. |  weight    25.8   0.00000 |
          3. |   price     1.1   0.29801 |
             +---------------------------+
        Here is mykwallis.ado

        Code:
        *! 1.0.0 free to anyone who wants to take it further 
        program mykwallis 
                version 16 
                syntax varlist(numeric) [if] [in] , by(varname) frame(str)
                
                marksample touse 
                markout `touse' `by', strok 
                quietly count if `touse'
                if r(N) == 0 error 2000 
                if r(N) == 1 error 2001 
                
                frame create `frame' str32 varname chisq Pvalue 
                
                foreach v of local varlist {
                        kwallis `v' if `touse', by(`by')
                        frame post `frame' ("`v'") (r(chi2)) (chi2tail(r(df), r(chi2))) 
                }
        end
        It is just a sketch. Much could be done differently or in addition. Do you prefer the adjustment for ties?

        All that said, conservative statistical thinking is that you should think about penalising yourself for so many significance tests performed shotgun-fashion.

        Comment


        • #5
          I appreciate all the input - thank you all so much!

          Comment

          Working...
          X