How to display confidence intervals around Kramer's V

Ricky Gettys

Join Date: Jul 2016

Posts: 21
#1

05 Jul 2016, 14:35

I am using Kramer's V to compare voting records and income levels in select countries across a period of 20 years. The survey data varies widely, with differing numbers of rows and columns.

How can I get confidence intervals for Kramer's V in Stata?

The base command is

Code:

tab [var1] [var2], V

but since I want to compare the statistics, it would be more helpful if I could get a confidence interval for each output.

The closest thing I could find was

Code:

pwcorr [var1] [var2], sig

but this is inadequate because, given the number of rows and columns I am using (2 rows, 2-9 columns), I think I need Kramer's V.
I would be happy if I could just get a standard error for Kramer's V, but a confidence interval is of course preferable.

Any suggestions?

Last edited by Ricky Gettys; 05 Jul 2016, 14:45.
Tags: confidence intervals, correlation coefficient, Kramer's V, phi coefficient
Nick Cox

Join Date: Mar 2014

Posts: 35810
#2

05 Jul 2016, 15:08

Harald Cramér was the person in question.

https://en.wikipedia.org/wiki/Harald_Cram%C3%A9r

I'd bootstrap:

Code:

sysuse auto, clear
tab foreign rep78, V
ret li
bootstrap V = r(CramersV), reps(10000) nodots : tab foreign rep78, V
estat bootstrap
Ricky Gettys

Join Date: Jul 2016

Posts: 21
#3

06 Jul 2016, 09:51

So just to walk through the code...

Code:

ret li

is a return list (http://www.stata.com/help.cgi?return) which gets the results from the tab foreign rep78, V command?

Code:

bootstrap V = r(CramersV), reps(10000) nodots : tab foreign rep78, V

I think I get bootstrap (http://www.stata.com/manuals13/rbootstrap.pdf), but there are some syntactical things I am not picking up from the documentation, such as why you used V = after the bootstrap command, the r(CramersV), and the nodots option. Does the : tab foreign rep78, V part just mean that bootstrap will repeat that command before providing statistics?

Code:

estat bootstrap

is the actual command to provide statistics.

Sorry, I am going for thoroughness as I will need to explain this to a few other coders.
Nick Cox

Join Date: Mar 2014

Posts: 35810
#4

06 Jul 2016, 10:13

Yes indeed. bootstrap is calling tabulate repeatedly with different samples.

bootstrap needs to know which part of the output to use and tabulate won't calculate the statistic you want unless you specify the corresponding option.
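
To make the pieces concrete, the same commands can be annotated as follows. This is a sketch on the auto data from #2; the seed() value and the smaller reps() are arbitrary choices for illustration, and the /// continuation and // comments assume the code is run from a do-file.

```stata
sysuse auto, clear

* tabulate only computes and returns Cramér's V when the V option is given
tabulate foreign rep78, V
return list                 // r(CramersV) is among the returned results

* "V = r(CramersV)" tells bootstrap which returned result to collect and
* labels it V; the command after the colon is rerun on every resample.
* nodots suppresses the progress dots; seed() makes the run reproducible.
bootstrap V = r(CramersV), reps(1000) nodots seed(12345) : ///
    tabulate foreign rep78, V

* report normal, percentile, and bias-corrected intervals side by side
estat bootstrap, all
```

Comparing the interval types from estat bootstrap, all is one way to see how sensitive the result is to the choice of bootstrap interval.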
Ricky Gettys

Join Date: Jul 2016

Posts: 21
#5

06 Jul 2016, 10:19

Using this method with 1000 reps instead of 10000, each bootstrap command takes about 30 seconds to run (with 10000 reps it took much longer). Any advice on finding the right balance between robustness (number of reps) and saving time (and money)?

Last edited by Ricky Gettys; 06 Jul 2016, 10:26.
Nick Cox

Join Date: Mar 2014

Posts: 35810
#6

06 Jul 2016, 10:49

I am of the generation for whom, when learning, a quick job came back in 3 hours and for a long job you were subject to the caprice of the operators and it might be a week before your card deck [NB] was loaded. (You didn't even get to see the computer.)

Seriously, no, I can't judge any trade-off between accuracy, time and money but my own. But I'd factor in how much you care about getting it right. If you want to publish these results, many reviewers would carp at low numbers of replications. If it's worth publishing, it's worth doing it properly. I can't guess how fragile this measure is for your data.
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#7

06 Jul 2016, 10:56

A Google search for "sampling distribution of Cramer's V" turns up http://www.people.vcu.edu/~pdattalo/...inalAssoc.html, which in turn claims that Cramer's V has a known sampling distribution and references pages 15-16 in Liebetrau, Albert M. (1983). Measures of association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32.

I have neither access to that publication nor the time to look into it if I did. But I bring it to your attention because this seems to imply that the standard error of Cramer's V can be calculated from a formula, saving you the time and trouble of bootstrapping.

I hope this helps.
Ricky Gettys

Join Date: Jul 2016

Posts: 21
#8

06 Jul 2016, 11:29

I didn't want to sound ungrateful for the bootstrap help; it is a very useful tool indeed! Thank you. I might as well prod around for other answers just in case, so that others can look at the forum and save time in their projects. (And since my dataset is so large, small time reductions can save me hundreds of dollars.)

As for the formula, as soon as I am able to find it, in theory I would be able to use values from the ret li command and then insert them?
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#9

06 Jul 2016, 11:41

"As for the formula, as soon as I am able to find it, in theory I would be able to use values from the ret li command and then insert them?"

I can't answer that, as I don't know what the formula is. But presumably it can be calculated either from the returned statistics or from numbers in the cells of the cross-tabulation (which you can capture in a matrix using the -matcell()- option of -tab-.) Whether this will be simple or difficult, I have no idea.
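
As a rough illustration of the matcell() route, the sketch below captures the cell counts and recomputes Pearson's chi-squared and Cramér's V from them in Mata. It uses the auto data from #2 rather than the original poster's data, and the matrix name cells and the Mata variable names are arbitrary. It does not compute a standard error, since no formula for one is given in this thread.

```stata
sysuse auto, clear

* capture the cell counts in a matrix called cells
tabulate foreign rep78, matcell(cells) chi2 V
matrix list cells

* recompute chi-squared and V directly from the counts
mata:
    O    = st_matrix("cells")              // observed counts
    n    = sum(O)                          // total sample size
    E    = rowsum(O) * colsum(O) / n       // expected counts under independence
    chi2 = sum((O - E):^2 :/ E)            // Pearson chi-squared
    k    = min((rows(O), cols(O)))         // smaller table dimension
    V    = sqrt(chi2 / (n * (k - 1)))
    chi2, V
end
```

The two numbers displayed at the end should agree with what tabulate reports. Note that for 2 x 2 tables tabulate reports a signed V, whereas this calculation always gives a non-negative value.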
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#10

07 Jul 2016, 06:08

The attached do-file (output shown below) computes confidence intervals for Cramér's V. The method is from Michael Smithson's Confidence Intervals (Thousand Oaks, Calif.: Sage, 2003), pp. 39–41. The do-file contains a Stata program (adapted from Smithson's SAS programs made available on his book's companion website) that takes the output of tabulate , chi2 and computes the confidence bounds. The do-file illustrates its use with a couple of worked examples (one from Smithson's book and another from the Web). The program could be put into an ado-file that acts as a wrapper to call tabulate to make the process seamless. The computation is straightforward, but could be condensed even more by implementing it in Mata within the ado-file.
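
For readers who want the arithmetic behind the program, the calculation it performs can be summarized as follows. This is a reading of the code shown in this post, not a quotation from Smithson; the symbols R, C, k, Δ, α and F are notation introduced here for illustration.

```latex
% R, C: number of rows and columns; N: sample size; chi^2: Pearson statistic
\[
V = \sqrt{\frac{\chi^2}{N\,(k-1)}}, \qquad
k = \min(R, C), \qquad
\mathrm{df} = (R-1)(C-1).
\]
% F(x; df, Delta) is the noncentral chi-squared CDF, alpha = 1 - level/100.
% The noncentrality bounds solve
\[
F(\chi^2;\, \mathrm{df},\, \Delta_L) = 1 - \frac{\alpha}{2}, \qquad
F(\chi^2;\, \mathrm{df},\, \Delta_U) = \frac{\alpha}{2},
\]
% and are converted to the scale of V by
\[
V_L = \sqrt{\frac{\Delta_L + \mathrm{df}}{N\,(k-1)}}, \qquad
V_U = \sqrt{\frac{\Delta_U + \mathrm{df}}{N\,(k-1)}}.
\]
```

In the program, npnchi2() supplies the two noncentrality values, and a missing result from npnchi2() is replaced by 0 before the conversion.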

. version 14.1

.
. clear *

. set more off

.
. /* Adapted from http://psychology3.anu.edu.au/people/smithson/details/CIstuff/CramersV.sas
>    and http://psychology3.anu.edu.au/people/smithson/details/CIstuff/Noncchi.sas */
. program define CramérCI, rclass
  1.         version 14.1
  2.         syntax , Chi(real) N(integer) Rows(integer) Columns(integer) [Level(real `c(level)')]
  3.
.         local r = `rows' - 1
  4.         local c = `columns' - 1
  5.         local df = `r' * `c'
  6.         local p = (1 - `level' / 100) / 2
  7.
.         tempname ncplo ncphi
  8.         scalar define `ncplo' = npnchi2(`df', `chi', 1-`p')
  9.         scalar define `ncphi' = npnchi2(`df', `chi', `p')
 10.
.         local nk = `n' * min(`r', `c')
 11.
.         return scalar V = sqrt(`chi' / `nk')
 12.         return scalar lb = sqrt( (cond(mi(`ncplo'), 0, `ncplo') + `df') / `nk' )
 13.         return scalar ub = sqrt( (cond(mi(`ncphi'), 0, `ncphi') + `df') / `nk' )
 14. end

.
. *
. * Worked examples that illustrate usage
. *
.
. /* Smithson's example (Google: smithson cramer confidence interval) */
.
. input str8 school byte(countA countB countC countD)

        school    countA    countB    countC    countD
  1. "Private"  6 14 17  9
  2. "Public"  30 32 17  3
  3. end

. quietly reshape long count, i(school) j(grade) string

. tabulate school grade [fweight=count], V chi2

           |                    grade
    school |         A          B          C          D |     Total
-----------+--------------------------------------------+----------
   Private |         6         14         17          9 |        46
    Public |        30         32         17          3 |        82
-----------+--------------------------------------------+----------
     Total |        36         46         34         12 |       128

          Pearson chi2(3) =  17.2858   Pr = 0.001
               Cramér's V =   0.3675

.
. CramérCI , chi(`r(chi2)') n(`r(N)') rows(`r(r)') columns(`r(c)')

. return list

scalars:
                 r(ub) =  .5448120990246206
                 r(lb) =  .2233354242907042
                  r(V) =  .3674852578354116

.
. /* Yano's example https://rpubs.com/estopub/114408 (scroll down to where he mentions K. Y.'s blog) */
.
. import delimited HairEyeColor.csv, clear
(5 vars, 32 obs)

. // https://forge.scilab.org/index.php/p/rdataset/source/file/master/csv/datasets/HairEyeColor.csv
. tabulate hair eye [fweight=freq], V chi2

           |                     Eye
      Hair |      Blue      Brown      Green      Hazel |     Total
-----------+--------------------------------------------+----------
     Black |        20         68          5         15 |       108
     Blond |        94          7         16         10 |       127
     Brown |        84        119         29         54 |       286
       Red |        17         26         14         14 |        71
-----------+--------------------------------------------+----------
     Total |       215        220         64         93 |       592

          Pearson chi2(9) = 138.2898   Pr = 0.000
               Cramér's V =   0.2790
. CramérCI , chi(`r(chi2)') n(`r(N)') rows(`r(r)') columns(`r(c)')

. return list

scalars:
                 r(ub) =  .3258588669614554
                 r(lb) =  .2345887776820346
                  r(V) =  .2790446233426584

.
. exit

end of do-file

.

If you decide to use this method, then I recommend that you examine the method's coverage (help simulate) before relying on it. I recommend that regardless of whose method you end up using.
Attached Files

Gettys.do (1.6 KB, 1 view)
Ricky Gettys

Join Date: Jul 2016

Posts: 21
#11

08 Jul 2016, 13:58

Wow, that do-file seems to work great.
When I run the do-file and the bootstrap on the same data, I get the following:

Bootstrap method:
Results (retrieved in 1 minute): Cramér's V: 0.11235217 CI: [0.084647, 0.11407]

Do-file method:
Results (retrieved instantly): Cramér's V: 0.11235217 CI: [0.1072993, 0.14674927]

Both report the same Cramér's V, but the bootstrap interval puts the estimate very close to its upper bound (consistently, across many different tests), while the do-file's interval puts it very close to its lower bound. Any insight into why this would be?