  • Measuring Nominal X Nominal association


    Reading a previous thread in which a measure of “correlation” [sic] was requested for a binary X nominal crosstabulation prompted me to think about the absence from the Stata world (built-in or user-written) of an elegant but almost forgotten measure of association, “Goodman and Kruskal’s Tau” (Goodman, L. A., & Kruskal, W. H. 1954. "Measures of association for cross classifications." Journal of the American Statistical Association, 49, 732-764). I wanted to offer a brief didactic note and code fragment for it here, as I think what follows is a bit too scholarly for that previous thread. Other data analysis programs (e.g., SPSS) do include Tau.

    Tau is an asymmetric measure of association, normed to the 0-1 range, for two non-ordered (nominal) categorical variables. Although it was originally derived from the same proportional reduction in expected prediction error (PRE) rationale as Goodman and Kruskal's Lambda (available in Stata as Nick Cox's -lambda- at SSC), Tau is much superior to the latter, for reasons I’ll leave aside here. Tau can also be understood and calculated as an R² measure, [1 - (conditional variation)/(total variation)], with variation measured by the Simpson diversity index (see, e.g., -ssc describe entropyetc-), and this is how I like to approach it.
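
    To make the R² formulation concrete, here is a sketch in notation introduced purely for illustration (the symbols are assumptions, not taken from the 1954 paper): f_{ij} is the count in row i (response category) and column j (explanatory category), f_{i+} and f_{+j} are the row and column totals, and N is the grand total. E_1 and E_2 are the total and conditional variation, named to match the code in this post.

    ```latex
    \begin{align*}
    E_1 &= N\Bigl(1 - \sum_i \bigl(f_{i+}/N\bigr)^2\Bigr)
        && \text{total variation: Simpson index of the response marginals, times } N \\
    E_2 &= \sum_j f_{+j}\Bigl(1 - \sum_i \bigl(f_{ij}/f_{+j}\bigr)^2\Bigr)
        && \text{conditional variation: Simpson index within each column, weighted by } f_{+j} \\
    \tau &= \frac{E_1 - E_2}{E_1} = 1 - \frac{E_2}{E_1}
    \end{align*}
    ```

    Under these definitions, Tau is 0 when every column has the same distribution as the response marginals (E_2 = E_1), and 1 when each column falls entirely within a single response category (E_2 = 0).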

    I’d say that Tau has never had the use or recognition it might deserve, given its simple and elegant rationale, and its connections to other measures. My completely unsupported explanation would be that Tau is a severe judge of relationships, giving uncomfortably low values on the 0/1 scale.

    While I don’t know that Tau deserves an “official” SSC entry, here’s some code I’ve used to calculate it, for whatever interest it might have to others.

    Code:
    cap mata mata drop gkt()
    mata:
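    // Goodman and Kruskal tau from a matrix of cell counts
    // (rows = response categories, columns = explanatory categories).
    void gkt(string scalar sf) {
            f = st_matrix(sf)
            ncol = cols(f)
            N = sum(f)
            // Total variation: Simpson index of the row marginals, times N
            rowmarg = rowsum(f) / N
            E1 = (1 - (rowmarg' * rowmarg)) * N
            printf("   Total  variation= %f\n", E1)
            // Conditional variation: Simpson index within each column,
            // weighted by that column's total
            coltot = colsum(f)
            f = f :/ coltot
            E2 = 0
            for (j = 1; j <= ncol; j++) {
                    p = f[., j]
                    next = (1 - (p' * p)) * coltot[j]
                    printf("   Variation for col = %f:  %f\n", j, next)
                    E2 = E2 + next
            }
            printf("   Sum conditional variation = %f\n", E2)
            // Leave the results in Stata scalars for the calling program
            st_rclear()
            st_numscalar("tau", (E1 - E2) / E1)
            st_numscalar("E1", E1)
            st_numscalar("E2", E2)
    }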
    end
    //
    capture prog drop gktau
    program gktau, rclass
    * This program calculates the Goodman and Kruskal tau measure,
    * using the Simpson index
    // Use: gktau ResponseVariable ExplanatoryVariable
    syntax varlist(min=2 max=2) [if] [in]
    marksample touse
    tempname f
    local y: word 1 of `varlist'
    local x: word 2 of `varlist'
    tab2 `y' `x' if `touse', matcell(`f') col chi2
    di ""
    return add
    // Could be calculated in Stata, but more convenient in Mata.
    mata: gkt("`f'")
    di as text "   G & K Tau = ", as result %7.4f tau
    return scalar tau = tau
    return scalar E1 = E1
    return scalar E2 = E2
    end
    //
    // Illustration
    sysuse auto, clear
    gktau rep78 foreign
    Last edited by Mike Lacy; 28 Sep 2023, 09:38.

  • #2
    Thanks Mike Lacy. For fun, I read the auto data into SPSS and used CROSSTABS to compute tau.

    Code:
    CROSSTABS
      /TABLES=rep78 BY foreign
      /FORMAT=AVALUE TABLES
      /STATISTICS=LAMBDA
      /CELLS=COUNT
      /COUNT ROUND CELL.
    From the help for CROSSTABS /STATISTICS:

    LAMBDA. Display lambda (symmetric and asymmetric) and Goodman and Kruskal’s tau in the Directional Measures table.
    Here is the key part of the output:

    [Image: gktau_example_SPSS.png]

    Notice that the GK tau is a directional measure. The value you got above matches the value with rep78 as the dependent variable (DV). To get the value with foreign as the DV, you need to switch the order of the variables:

    Code:
    . gktau foreign rep78
    
    -> tabulation of foreign by rep78 if __000000
    
    +-------------------+
    | Key               |
    |-------------------|
    |     frequency     |
    | column percentage |
    +-------------------+
    
               |                   Repair record 1978
    Car origin |         1          2          3          4          5 |     Total
    -----------+-------------------------------------------------------+----------
      Domestic |         2          8         27          9          2 |        48
               |    100.00     100.00      90.00      50.00      18.18 |     69.57
    -----------+-------------------------------------------------------+----------
       Foreign |         0          0          3          9          9 |        21
               |      0.00       0.00      10.00      50.00      81.82 |     30.43
    -----------+-------------------------------------------------------+----------
         Total |         2          8         30         18         11 |        69
               |    100.00     100.00     100.00     100.00     100.00 |    100.00
    
              Pearson chi2(4) =  27.2640   Pr = 0.000
    
       Total  variation= 29.2173913
       Variation for col = 1:  0
       Variation for col = 2:  0
       Variation for col = 3:  5.4
       Variation for col = 4:  9
       Variation for col = 5:  3.27272727
       Sum conditional variation = 17.6727273
       G & K Tau =   0.3951
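
    As a check, the printed quantities can be reproduced by hand from the table above, taking each variation term as (total) × (1 − sum of squared proportions). This is only a worked restatement of the output, with E_1 and E_2 standing for the "Total variation" and "Sum conditional variation" lines:

    ```latex
    \begin{align*}
    E_1 &= 69\left(1 - \frac{48^2 + 21^2}{69^2}\right) = \frac{2016}{69} \approx 29.2174 \\
    E_2 &= 2(1 - 1^2 - 0^2) + 8(1 - 1^2 - 0^2) + 30(1 - 0.9^2 - 0.1^2) \\
        &\quad + 18(1 - 0.5^2 - 0.5^2) + 11\left(1 - \frac{2^2 + 9^2}{11^2}\right) \\
        &= 0 + 0 + 5.4 + 9 + 3.2727 \approx 17.6727 \\
    \tau &= 1 - \frac{17.6727}{29.2174} \approx 0.3951
    \end{align*}
    ```

    The two columns with repair record 1 or 2 contribute nothing to E_2 because every car in them is domestic, so there is no remaining variation in origin within those columns.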
    Cheers,
    Bruce
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 19.5 (Windows)



    • #3
      Following up on #2, you can add a METHOD subcommand with the EXACT keyword to CROSSTABS in SPSS to get exact p-values (if you wish).

      Code:
      CROSSTABS
        /TABLES=rep78 BY foreign
        /FORMAT=AVALUE TABLES
        /STATISTICS=LAMBDA
        /CELLS=COUNT
        /COUNT ROUND CELL
        /METHOD=EXACT TIMER(5).

      [Image: gktau_example_SPSS_exact.png]

      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 19.5 (Windows)
