Accounting for survey design when using Kendall's Tau

Asad Zaidi

Join Date: Jul 2018
Posts: 2

21 Jul 2018, 08:41

Hello,

I am working with household-level income and expenditure data collected using a two-stage stratified sampling design, the strata being region (urban vs rural) and PSU.

I have ranked households into wealth quintiles using income (quintinc) and expenditure (quintexp) separately and want to assess how comparable these two methods are. Specifically, I want to see how good household expenditure is as a predictor for household income. I believe Kendall's Tau B is the best way to assess this as it can deal with ties (output below).

Code:

. ktau quintexp quintinc, stats(taub p)

  Number of obs =   24238
Kendall's tau-a =       0.4830
Kendall's tau-b =       0.6104
Kendall's score =  1.4e+08
    SE of score = 1200861.722   (corrected for ties)

Test of Ho: quintexp and quintinc are independent
     Prob > |z| =       0.0000  (continuity corrected)

While the -ktau- command lets me run a Tau B test, I don't see an option to specify the sampling design, and I am not sure how much the estimates are affected by it.

I found a similar post on here where someone recommended using -somersd- (output below).

Code:

. somersd quintexp quintinc [pwei=weights], taua tdist transf(z) cluster(psu) wstrata(region)
Kendall's tau-a with variable: quintexp
Transformation: Fisher's z
Within strata defined by: region
Valid observations: 24238
Number of clusters: 1605
Degrees of freedom: 1604

Symmetric 95% CI for transformed Kendall's tau-a
                                (Std. Err. adjusted for 1,605 clusters in psu)
------------------------------------------------------------------------------
             |              Jackknife
    quintexp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    quintexp |   1.061161   .0046923   226.15   0.000     1.051957    1.070364
    quintinc |   .4812417   .0091699    52.48   0.000     .4632553     .499228
------------------------------------------------------------------------------

Asymmetric 95% CI for untransformed Kendall's tau-a
                   Tau_a     Minimum     Maximum 
    quintexp   .78610767   .78256598   .78959848 
    quintinc   .44723747   .43273369    .4615098

However:

I cannot use this to do a Tau B test. ("option taub not allowed")
I'm not completely sure I understand how to interpret this output.
While I've read the help files, I don't understand what the tdist and transf(z) options are doing or whether they're necessary here.

I guess the bottom line is: should I stick with an 'unadjusted' Tau B, or should I go for a Tau A using -somersd- and just live with the fact that it cannot deal with ties? Or is there a better method out there?

Steve Samuels

Join Date: Mar 2014

Posts: 1785
#2

24 Jul 2018, 20:48

To answer your bottom-line question: you have no choice. Only tau_a from somersd will give properly weighted estimates, with p-values and standard errors that respect the cluster sample design. The tau_a and tau_b from ktau will be biased as estimates of the population value, and their p-values will be wrong.

There's one error in your command: you should omit wstrata(region). The strata that option refers to are not equivalent to sampling strata. Unfortunately, there's no way that I know of to specify sampling strata with somersd. One consequence is that standard errors for tau_a may be conservative.
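
A minimal sketch of the call without wstrata(), assuming the variable names from your post (weights, psu) and leaving every other option as you had it:

Code:

* Same command as in post #1, minus wstrata(region)
somersd quintexp quintinc [pweight=weights], taua tdist transf(z) cluster(psu)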

1. The output for ktau is self-explanatory, with formulas given in the manual.

2. transf(z), where z is Fisher's z, is meant to make p-values and confidence intervals more accurate. tdist bases inference on the t distribution rather than the Gaussian distribution. With clustered data and the cluster() option, this ensures that the degrees of freedom are based on the number of clusters, not on the number of individuals. The z-transformed values are shown in the first part of the output, headed "Symmetric 95% CI". You would not put that table in a report. The tau_a values and CIs are in the section headed "Asymmetric 95% CI".

3. The order of variables in the somersd command affects the output. Here quintexp is the first variable in the command, and we want the tau_a for its association with quintinc. This is the second line in the "Asymmetric 95% CI" section; the weighted estimate of tau_a is 0.4472. But what is the meaning of tau_a for quintexp with itself (0.786) in the first line? It is the probability that two values of quintexp randomly selected from the population are not equal. You can just ignore it.
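
As a check on points 2 and 3, the figures in the "asymmetric" table can be reproduced by hand from the numbers in your output. This is only a sketch, meant to be run from a do-file, using Stata's built-in tanh() and invttail() functions with the coefficient, standard error, and degrees of freedom copied from post #1:

Code:

* Fisher's z is atanh(tau), so tanh() back-transforms the first table
display tanh(.4812417)      // tau_a, quintexp with quintinc: about .447
display tanh(1.061161)      // tau_a, quintexp with itself: about .786

* tdist: critical value from t with 1604 df (1,605 clusters minus one)
scalar tcrit = invttail(1604, .025)

* Symmetric limits on the z scale become asymmetric limits for tau_a
display tanh(.4812417 - tcrit*.0091699)   // lower limit: about .433
display tanh(.4812417 + tcrit*.0091699)   // upper limit: about .462

The limits are symmetric around the coefficient on the z scale, and the back-transformation is what makes them asymmetric around tau_a.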

Last edited by Steve Samuels; 24 Jul 2018, 20:53.

Steve Samuels
Statistical Consulting

Stata 14.2
Asad Zaidi

Join Date: Jul 2018

Posts: 2
#3

27 Jul 2018, 12:41

Thanks Steve Samuels, this is really helpful!

As a follow-up, do I still have to worry about survey design when performing a tau test if I tried to deal with it while ranking households as per my code below?

Code:

xtile byte quintexp = annualexp [pweight = weights], n(5)
Steve Samuels

Join Date: Mar 2014

Posts: 1785
#4

27 Jul 2018, 12:58

You do need to consider the survey design. The pweight in the xtile statement ensures that the quintile assignments refer to the population, not to the sample. In somersd, you need the cluster(psu) and (again) the pweight option.
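
Putting the two steps together, a sketch might look like the following. The name annualinc is hypothetical (your post shows only annualexp); everything else follows commands already posted in this thread:

Code:

* Population-referenced quintiles: pweight in both xtile calls
xtile byte quintexp = annualexp [pweight = weights], n(5)
xtile byte quintinc = annualinc [pweight = weights], n(5)   // annualinc is an assumed name

* Design-based tau_a: weights plus clustering on PSU
somersd quintexp quintinc [pweight = weights], taua tdist transf(z) cluster(psu)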

Last edited by Steve Samuels; 27 Jul 2018, 13:01.

Steve Samuels
Statistical Consulting

Stata 14.2