  • How to display confidence intervals around Kramer's V

    I am using Kramer's V to compare voting records and income levels in select countries across a period of 20 years. The survey data varies widely, with differing numbers of rows and columns.

    How can I get confidence intervals for Kramer's V in Stata?

    The base command is
    Code:
    tab [var1] [var2], V
    Since I want to compare the statistics across tables, it would be more helpful to have a confidence interval for each one.

    The closest thing I could find was
    Code:
    pwcorr [var1] [var2], sig
    but this is inadequate, because I think I need Kramer's V for the table sizes I am using (2 rows, 2 to 9 columns).
    I would be happy with just a standard error for Kramer's V, but a confidence interval is of course preferable.

    Any suggestions?
    Last edited by Ricky Gettys; 05 Jul 2016, 14:45.

  • #2
    Harald Cramér was the person in question.


    https://en.wikipedia.org/wiki/Harald_Cram%C3%A9r

    I'd bootstrap:


    Code:
    sysuse auto, clear
    tab foreign rep78, V
    ret li
    bootstrap V = r(CramersV), reps(10000) nodots : tab foreign rep78, V
    estat bootstrap



    • #3
      So just to walk through the code...
      Code:
      ret li
      is a return list (http://www.stata.com/help.cgi?return) which gets the results from the tab foreign rep78, V command?
      Code:
      bootstrap V = r(CramersV), reps(10000) nodots : tab foreign rep78, V
      I think I get bootstrap (http://www.stata.com/manuals13/rbootstrap.pdf), but there are some syntactical things I am not picking up from the manual: why you put V = after the bootstrap command, what r(CramersV) refers to, and what the nodots option does. Does the : tab foreign rep78, V part just mean that bootstrap will run that command repeatedly before providing statistics?
      Code:
      estat bootstrap
      is the actual command to provide statistics.

      Sorry, I am going for thoroughness as I will need to explain this to a few other coders.



      • #4
        Yes indeed. bootstrap is calling tabulate repeatedly with different samples.

        bootstrap needs to know which part of the output to use and tabulate won't calculate the statistic you want unless you specify the corresponding option.
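
        To make the pieces concrete, here is a minimal annotated sketch of the same call on the auto data from #2. The seed value and the reduced reps(1000) are illustrative assumptions, not recommendations.
        Code:
        sysuse auto, clear
        * tabulate leaves Cramér's V in r(CramersV) only when the V option is given
        tabulate foreign rep78, V
        return list
        * "V =" names the statistic; r(CramersV) is the expression evaluated after each resample
        * nodots suppresses the replication dots; seed() makes the run reproducible
        bootstrap V = r(CramersV), reps(1000) seed(12345) nodots: tabulate foreign rep78, V
        * report all available bootstrap confidence intervals
        estat bootstrap, all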



        • #5
          Using this method with 1000 reps instead of 10000, each bootstrap command takes about 30 seconds to run (with 10000 reps it took much longer). Any advice on finding the right balance between robustness (number of reps) and saving time (and money)?
          Last edited by Ricky Gettys; 06 Jul 2016, 10:26.



          • #6
            I am of the generation for whom, when learning, a quick job came back in 3 hours and for a long job you were subject to the caprice of the operators and it might be a week before your card deck [NB] was loaded. (You didn't even get to see the computer.)

            Seriously, no, I can't judge any trade-off between accuracy, time and money but my own. But I'd factor in how much you care about getting it right. If you want to publish these results, many reviewers would carp at low numbers of replications. If it's worth publishing, it's worth doing it properly. I can't guess how fragile this measure is for your data.
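
            One way to look at the trade-off empirically is to time a few replication counts and see how much the interval moves between them. A rough sketch on the auto data from #2; the replication counts and the seed are arbitrary assumptions, and timings on a large survey dataset will of course differ.
            Code:
            sysuse auto, clear
            foreach n of numlist 200 1000 5000 {
                timer clear 1
                timer on 1
                quietly bootstrap V = r(CramersV), reps(`n') seed(2016) nodots: tabulate foreign rep78, V
                timer off 1
                display "reps = `n'"
                * percentile interval for this number of replications
                estat bootstrap, percentile
                * elapsed seconds for the bootstrap call
                timer list 1
            }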



            • #7
              A Google search for "sampling distribution of Cramer's V" turns up http://www.people.vcu.edu/~pdattalo/...inalAssoc.html, which in turn claims that Cramer's V has a known sampling distribution and references pages 15-16 in Liebetrau, Albert M. (1983). Measures of association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32.

              I have neither access to that publication nor the time to look into it if I did. But I bring it to your attention because this seems to imply that the standard error of Cramer's V can be calculated from a formula, saving you the time and trouble of bootstrapping.


              I hope this helps.



              • #8
                I didn't want to sound ungrateful for the bootstrap help, it is a very useful tool indeed! Thank you. I might as well prod around for other answers just in case, so others can look at the forum and save time in their projects. (And since my dataset is so large, small time reductions can save me hundreds of dollars)

                As for the formula, as soon as I am able to find it, in theory I would be able to use values from the ret li command and then insert them?



                • #9
                  "As for the formula, as soon as I am able to find it, in theory I would be able to use values from the ret li command and then insert them?"
                  I can't answer that, as I don't know what the formula is. But presumably it can be calculated either from the returned statistics or from numbers in the cells of the cross-tabulation (which you can capture in a matrix using the -matcell()- option of -tab-). Whether this will be simple or difficult, I have no idea.
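
                  A minimal sketch of capturing both kinds of input, using the auto data from #2 (the matrix name cells is an arbitrary choice):
                  Code:
                  sysuse auto, clear
                  tabulate foreign rep78, chi2 V matcell(cells)
                  * cell counts of the cross-tabulation
                  matrix list cells
                  * returned scalars: chi-squared, observations, rows, columns, Cramér's V
                  display r(chi2) "  " r(N) "  " r(r) "  " r(c) "  " r(CramersV)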



                  • #10
                    The attached do-file (output shown below) computes confidence intervals for Cramér's V. The method is from Michael Smithson, Confidence Intervals (Thousand Oaks, Calif.: Sage, 2003), pp. 39–41. The do-file contains a Stata program (adapted from Smithson's SAS programs made available on his book's companion website) that takes the output of tabulate, chi2 and computes the confidence bounds. The do-file illustrates its use with a couple of worked examples (one from Smithson's book and another from the Web). The program could be put into an ado-file that acts as a wrapper to call tabulate to make the process seamless. The computation is straightforward, but could be condensed even more by implementing it in Mata within the ado-file.

                    . version 14.1

                    .
                    . clear *

                    . set more off

                    .
                    . /* Adapted from http://psychology3.anu.edu.au/people/smithson/details/CIstuff/CramersV.sas
                    >    and http://psychology3.anu.edu.au/people/smithson/details/CIstuff/Noncchi.sas */
                    . program define CramérCI, rclass
                      1.         version 14.1
                      2.         syntax , Chi(real) N(integer) Rows(integer) Columns(integer) [Level(real `c(level)')]
                      3.
                    .         local r = `rows' - 1
                      4.         local c = `columns' - 1
                      5.         local df = `r' * `c'
                      6.         local p = (1 - `level' / 100) / 2
                      7.
                    .         tempname ncplo ncphi
                      8.         scalar define `ncplo' = npnchi2(`df', `chi', 1-`p')
                      9.         scalar define `ncphi' = npnchi2(`df', `chi', `p')
                     10.
                    .         local nk = `n' * min(`r', `c')
                     11.
                    .         return scalar V = sqrt(`chi' / `nk')
                     12.         return scalar lb = sqrt( (cond(mi(`ncplo'), 0, `ncplo') + `df') / `nk' )
                     13.         return scalar ub = sqrt( (cond(mi(`ncphi'), 0, `ncphi') + `df') / `nk' )
                     14. end

                    .
                    . *
                    . * Worked examples that illustrate usage
                    . *
                    .
                    . /* Smithson's example (Google: smithson cramer confidence interval) */
                    .
                    . input str8 school byte(countA countB countC countD)

                            school    countA    countB    countC    countD
                      1. "Private"  6 14 17  9
                      2. "Public"  30 32 17  3
                      3. end

                    . quietly reshape long count, i(school) j(grade) string

                    . tabulate school grade [fweight=count], V chi2

                               |                    grade
                        school |         A          B          C          D |     Total
                    -----------+--------------------------------------------+----------
                       Private |         6         14         17          9 |        46
                        Public |        30         32         17          3 |        82
                    -----------+--------------------------------------------+----------
                         Total |        36         46         34         12 |       128

                              Pearson chi2(3) =  17.2858   Pr = 0.001
                                   Cramér's V =   0.3675

                    .
                    . CramérCI , c(`r(chi2)') n(`r(N)') r(`r(r)') c(`r(c)')

                    . return list

                    scalars:
                                     r(ub) =  .5448120990246206
                                     r(lb) =  .2233354242907042
                                      r(V) =  .3674852578354116

                    .
                    . /* Yano's example https://rpubs.com/estopub/114408 (scroll down to where he mentions K. Y.'s blog) */
                    .
                    . import delimited HairEyeColor.csv, clear
                    (5 vars, 32 obs)

                    . // https://forge.scilab.org/index.php/p/rdataset/source/file/master/csv/datasets/HairEyeColor.csv
                    . tabulate hair eye [fweight=freq], V chi2

                               |                     Eye
                          Hair |      Blue      Brown      Green      Hazel |     Total
                    -----------+--------------------------------------------+----------
                         Black |        20         68          5         15 |       108
                         Blond |        94          7         16         10 |       127
                         Brown |        84        119         29         54 |       286
                           Red |        17         26         14         14 |        71
                    -----------+--------------------------------------------+----------
                         Total |       215        220         64         93 |       592

                              Pearson chi2(9) = 138.2898   Pr = 0.000
                                   Cramér's V =   0.2790

                    . CramérCI , c(`r(chi2)') n(`r(N)') r(`r(r)') c(`r(c)')

                    . return list

                    scalars:
                                     r(ub) =  .3258588669614554
                                     r(lb) =  .2345887776820346
                                      r(V) =  .2790446233426584

                    .
                    . exit

                    end of do-file


                    .


                    If you decide to use this method, then I recommend that you examine the method's coverage (help simulate) before relying on it. I recommend that regardless of whose method you end up using.



                    • #11
                      Wow, that ado file seems to work great.
                      When I run the ado file and the bootstrap for the same data, I get the following:

                      Bootstrap Method:
                      Results (retr. in 1 minute): Cramer's V: 0.11235217 CI: [0.084647, 0.11407]

                      Ado Method:
                      Results (retrieved instantly): Cramer's V: 0.11235217 CI: [0.1072993, 0.14674927]

                      Both give the same Cramer's V, but in the bootstrap interval the estimate sits very close to the upper bound (consistently, across many different tests), while in the ado file's interval it sits very close to the lower bound. Any insight into why this would be?
