
  • Accounting for survey design when using Kendall's Tau

    Hello,

    I am working with household-level income and expenditure data collected using a two-stage stratified sampling design, the strata being region (urban vs rural) and PSU.

    I have ranked households into wealth quintiles using income (quintinc) and expenditure (quintexp) separately and want to assess how comparable these two methods are. Specifically, I want to see how good household expenditure is as a predictor for household income. I believe Kendall's Tau B is the best way to assess this as it can deal with ties (output below).

    Code:
    . ktau quintexp quintinc, stats(taub p)
    
      Number of obs =   24238
    Kendall's tau-a =       0.4830
    Kendall's tau-b =       0.6104
    Kendall's score =  1.4e+08
        SE of score = 1200861.722   (corrected for ties)
    
    Test of Ho: quintexp and quintinc are independent
         Prob > |z| =       0.0000  (continuity corrected)
    While the -ktau- command lets me run a tau-b test, I don't see an option to specify the sampling design, and I am not sure how much the estimates will be affected by it.

    I found a similar post on here where someone recommended using -somersd- (output below).

    Code:
    . somersd quintexp quintinc [pwei=weights], taua tdist transf(z) cluster(psu) wstrata(region)
    Kendall's tau-a with variable: quintexp
    Transformation: Fisher's z
    Within strata defined by: region
    Valid observations: 24238
    Number of clusters: 1605
    Degrees of freedom: 1604
    
    Symmetric 95% CI for transformed Kendall's tau-a
                                    (Std. Err. adjusted for 1,605 clusters in psu)
    ------------------------------------------------------------------------------
                 |              Jackknife
        quintexp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        quintexp |   1.061161   .0046923   226.15   0.000     1.051957    1.070364
        quintinc |   .4812417   .0091699    52.48   0.000     .4632553     .499228
    ------------------------------------------------------------------------------
    
    Asymmetric 95% CI for untransformed Kendall's tau-a
                       Tau_a     Minimum     Maximum 
        quintexp   .78610767   .78256598   .78959848 
        quintinc   .44723747   .43273369    .4615098

    However:
    1. I cannot use this to do a Tau B test. ("option taub not allowed")
    2. I'm not completely sure I understand how to interpret this output
    3. While I've read the help files, I don't understand what the tdist and transf(z) options are doing and if they're necessary here.
    I guess the bottom-line is: Should I stick with an 'unadjusted' Tau B, or should I go for a Tau A using -somersd- and just live with the fact that it cannot deal with ties? Or is there a better method out there?

  • #2
    To answer your bottom-line question: you have no choice. Only with tau_a from somersd will you have properly weighted estimates, along with p-values and standard errors that respect the cluster sample design. The tau_a and tau_b from ktau will be biased as estimates of the population value and will have incorrect p-values.

    There's one error in your command: you should omit wstrata(region). The strata that option refers to are not equivalent to sampling strata. Unfortunately, there's no way that I know of to specify sampling strata with somersd. One consequence is that standard errors for tau-a may be conservative.
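
    As a sketch of what that change looks like, here is the command from the original post with only wstrata(region) removed. It assumes the same variable names (quintexp, quintinc, weights, psu) and that the user-written somersd package is installed from SSC.

    ```stata
    * somersd is a user-written package; install it once with: ssc install somersd
    * Same command as in the original post, minus wstrata(region):
    *   pweight      -> estimates refer to the population
    *   cluster(psu) -> jackknife standard errors respect PSU clustering
    somersd quintexp quintinc [pweight=weights], taua tdist transf(z) cluster(psu)
    ```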

    1. The output for ktau is self-explanatory, with formulas given in the manual.

    2. transf(z), where z is Fisher's z transformation, is meant to make p-values and confidence intervals more accurate. tdist bases inference on the t-distribution rather than the Gaussian distribution. With clustered data and the cluster() option, this ensures that the degrees of freedom are based on the number of clusters, not on the number of individuals. The z-transformed values are shown in the first part of the output, under "Symmetric 95% CI". You would not put that table in a report. The tau-a values and their CIs are in the section headed "Asymmetric 95% CI".

    3. The order of variables in the somersd command affects the output. Here quintexp is the first variable in the command, and we want the tau_a for its association with quintinc. This is the second line in the "Asymmetric 95% CI" section; the weighted estimate of tau_a is 0.4472. But what is the meaning of tau_a for quintexp with itself (0.786) in the first line? It is the probability that two values of quintexp randomly selected from the population are not equal. You can just ignore it.
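
    To see how the two parts of the output relate, the check below back-transforms the Fisher's z values from the "Symmetric" table with the hyperbolic tangent, which is the inverse of Fisher's z. This is only an illustrative check using the numbers printed in the output above; the results should agree with the "Asymmetric" table up to rounding.

    ```stata
    * Inverse of Fisher's z is tanh(); inputs are copied from the output above
    * quintinc row: coefficient, then lower and upper confidence limits
    display tanh(.4812417)
    display tanh(.4632553)
    display tanh(.499228)
    * quintexp row (tau-a of quintexp with itself)
    display tanh(1.061161)
    ```

    The confidence limits are symmetric on the z scale, so after back-transformation they are no longer symmetric around the tau-a estimate, which is why the second table is labelled "Asymmetric".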
    Last edited by Steve Samuels; 24 Jul 2018, 20:53.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Thanks Steve Samuels, this is really helpful!

      As a follow-up, do I still have to worry about survey design when performing a tau test if I tried to deal with it while ranking households as per my code below?

      Code:
      xtile byte quintexp = annualexp [pweight = weights], n(5)



      • #4
        You do need to consider the survey design. The pweight in the xtile statement ensures that the quintile assignments refer to the population, not to the sample. In somersd, you still need the cluster(psu) option and (again) the pweights.
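
        Put together, the two steps might look like the sketch below. It assumes the quintile variables have not been created yet, and annualinc is a hypothetical name for the income variable (only annualexp appears in this thread).

        ```stata
        * Step 1: population-referenced quintiles (pweights used in xtile)
        * annualinc is a hypothetical variable name, not taken from the thread
        xtile quintexp = annualexp [pweight = weights], nquantiles(5)
        xtile quintinc = annualinc [pweight = weights], nquantiles(5)

        * Optional check: weighted shares should each be close to 20%
        tabulate quintexp [aweight = weights]

        * Step 2: the association still needs the weights and the clustering
        somersd quintexp quintinc [pweight = weights], taua tdist transf(z) cluster(psu)
        ```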
        Last edited by Steve Samuels; 27 Jul 2018, 13:01.
        Steve Samuels
        Statistical Consulting

        Stata 14.2
