Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to compute simultaneous confidence intervals for multinomial proportions?

    Hello everybody,

    I would like to use a multinomial variable in one sample of subjects to make inferences about the proportion of the population in each category, with confidence intervals.
    Apparently there is a 'Multinomial CI' package in R (http://cran.at.r-project.org/web/pac...tinomialCI.pdf).
    Do you know how to do the same in STATA?

    Thank you very much in advance.

    Thomas
    Last edited by tsimonso; 07 Aug 2014, 03:20.

  • #2
    I'm resurrecting this old thread because I have the same question: Does anyone know of a Stata program to compute simultaneous confidence intervals for multinomial proportions?

    Thanks,
    Bruce
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)

    Comment


    • #3
      Another polite "bump" from me.
      I would also like to use simultaneous CIs for multinomial proportions without asking someone to do it in SAS or R for me.
      I looked at the help section to find out if there were any packages/utilities I could use but the search only returned mlogit.
      Unless I am missing something obvious like an option from a contingency table convenience command?

      Thanks in advance for your help.
      Gaby
      Stata BE ver 17
      MacOS Ventura

      Comment


      • #4
        Here's one hack that *might* help: You can run -mlogit y- without any predictors. Then, you can use -margins, predict- to get CIs based on parameters and standard errors estimated simultaneously. I can't figure out how to get -margins- to generate the prediction and CI for more than one outcome at a time, but you can do them serially:
        Code:
        sysuse auto
        mlogit rep78
        margins, predict(pr outcome(1))
        margins, predict(pr outcome(2))
        I don't know if this will give a correct result, as I don't have anything with which to compare. And, at this small sample size, I wouldn't trust these results.

        Comment


        • #5
          Hello Mike,
          Thanks for the advice - I tried it but up to now I haven't been able to find a way to verify the reliability of the results so I ended up using the procedure in SAS that Bruce Weaver suggested.
          I am surprised that this has not been dealth with by Stata yet.
          I think some people are creating indicator variables for each level of a category and calculating binomial CIs from each of these indicator variables.
          I am hoping that this thread might come to the attention of the luminaries of Statalist!
          Thanks again
          Gaby
          Stata BE ver 17
          MacOS Ventura

          Comment


          • #6
            Mike, I think that what you are looking for is
            Code:
            margins, predict()
            As you can see, the lower bound of the confidence interval for the first category is negative (which does not make sense). This is circumvented by the proportion command since it applies the logit transformation to the confidence intervals (and then transfoms it back to the original metric).

            What I am curious is the following. We know that, given a multinomial variable, the outcome given by proportion for a given category k is exactly the same as what we would get by applying proportion to a binomial variable which is 1 if the observation is in the category k and 0 otherwise. E.g. for the category 2 of rep78 in the auto dataset we get
            Code:
            sysuse auto, clear
            gen rep78_2=(rep78==2) if !missing(rep78)
            proportion rep78, citype(logit)
            proportion rep78_2, citype(logit)
            However, as some discussion suggests (see for example https://www.stata.com/statalist/arch.../msg00483.html), more acceptable confidence interval for binomial variables can be found by using ci instead (e.g. those under the wilson and agresti options which provide better coverage than that one using the exact formula for the binomial distribution). Hence, to the extent that the confidence intervals for each category of a multinomial variable reported by proportion command are the same that result running the command for each (binomial) category, and given that these later confidence intervals should be corrected, do the confidence intervals provided by proportion make sense at all? I have seen Gaby's suggestion floating around the internet, namely treating every category of the multinomial variable as a binomial variable per se and use the ci command with the desired method, but without making any reference to a paper or theory that says why this should be the a reasonable procedure (in the case it is reasonable). I hope someone from the Stata team can help us out on this one!

            Thank you
            Last edited by Juan del Pozo; 04 Sep 2018, 05:34. Reason: typo

            Comment


            • #7
              I am resurrecting this old thread again, because AFAICT, there is still no (easy) way to get simultaneous confidence intervals for multinomial proportions. The words simultaneous and multinomial appear nowhere in the documentation for -proportion-, for example. And the main resource that is cited appears to discuss binomial proportions only.

              I also wanted to follow up on the example Juan del Pozo posted in #6 and to try Mike Lacy's -mlogit- suggestion in #4. Here's a summary of what I did.
              • Using the auto dataset, I generated CIs for each level of rep78 by 3 different methods: -ci proportions-, -proportion-, and -mlogit-.
              • I assumed -mlogit- + -margins- would produce a Wald CI, so used the wald option for both -ci proportions- and -proportion-.
              • I compared the proportions, SEs, and confidence bounds for the 3 methods.
              My (somewhat clunky) code can be found at the end of this post. I ran it under v18, but it should work for v15 or later. Here are the main results and my summary of what they show.

              Code:
              . * Compare the estimated proportions  
              . * p  = proportion from -ci proportions-
              . * p1 = proportion from -proportion-
              . * m1 = proportion from -mlogit- plus -margins-
              . list p p1 m1 prange in 1/5
              
                   +-----------------------------------------+
                   |        p         p1         m1   prange |
                   |-----------------------------------------|
                1. | .0289855   .0289855   .0289855        0 |
                2. |  .115942    .115942    .115942        0 |
                3. | .4347826   .4347826   .4347826        0 |
                4. | .2608696   .2608696   .2608696        0 |
                5. | .1594203   .1594203   .1594203        0 |
                   +-----------------------------------------+
              
              . * Compare the estimated SEs
              . * se = SE from -ci proportions-
              . * p2 = SE from -proportion-
              . * m2 = SE from -mlogit- plus -margins-
              . list se p2 m2 serange in 1/5
              
                   +------------------------------------------+
                   |       se         p2         m2   serange |
                   |------------------------------------------|
                1. | .0201966   .0201966   .0201966         0 |
                2. | .0385422   .0385422   .0385422         0 |
                3. | .0596787   .0596787   .0596787         0 |
                4. | .0528625   .0528625   .0528625         0 |
                5. | .0440694   .0440694   .0440694         0 |
                   +------------------------------------------+
              
              . * Compare the estimated lower bounds  
              . * lb = lower bound from -ci proportions-
              . * p5 = lower bound from -proportion-
              . * m5 = lower bound from -mlogit- plus -margins-
              . list lb p5 m5 lb1v2 lb1v3 lb2v3 in 1/5
              
                   +--------------------------------------------------------------------+
                   |       lb          p5          m5      lb1v2      lb1v3       lb2v3 |
                   |--------------------------------------------------------------------|
                1. |        0   -.0113162   -.0105991   .0113162   .0105991   -.0007171 |
                2. | .0404007    .0390323    .0404007   .0013684          0   -.0013684 |
                3. | .3178145    .3156956    .3178145   .0021189          0   -.0021189 |
                4. |  .157261    .1553841     .157261   .0018769          0   -.0018769 |
                5. | .0730459    .0714813    .0730459   .0015647          0   -.0015647 |
                   +--------------------------------------------------------------------+
              
              . * Compare the estimated upper bounds  
              . * ub = upper bound from -ci proportions-
              . * p6 = upper bound from -proportion-
              . * m6 = upper bound from -mlogit- plus -margins-
              . list ub p6 m6 ub1v2 ub1v3 ub2v3 in 1/5
              
                   +---------------------------------------------------------------+
                   |       ub         p6         m6       ub1v2   ub1v3      ub2v3 |
                   |---------------------------------------------------------------|
                1. | .0685702   .0692872   .0685702   -.0007171       0   .0007171 |
                2. | .1914833   .1928518   .1914833   -.0013684       0   .0013684 |
                3. | .5517507   .5538696   .5517507   -.0021189       0   .0021189 |
                4. | .3644782    .366355   .3644782   -.0018769       0   .0018769 |
                5. | .2457946   .2473593   .2457946   -.0015647       0   .0015647 |
                   +---------------------------------------------------------------+
              
              .
              . * SUMMARY
              . * All 3 methods return exactly the same
              . * estimated proportions and SEs.
              . * Regarding CIs, -ci proportions-
              . * and -mlogit- with -margins- return
              . * the same CIs EXCEPT that -mlogit-
              . * reports a lower bound < 0 for
              . * rep78 = 1.  -proportions- returns
              . * different limits.

              Here is the code, in case anyone else wants to tinker.

              Code:
              * Follow up on Juan del Pozo's example in #6 here:
              * https://www.statalist.org/forums/forum/general-stata-discussion/general/134445-how-to-compute-simultaneous-confidence-intervals-for-multinomial-proportions
              
              * Code was generated using Stata 18.0,
              * but should work with Stata 15 or higher
              version 15
              cls
              sysuse auto, clear
              keep foreign rep78
              keep if !missing(foreign, rep78)
              generate x = .
              generate p = .
              generate se = .
              generate lb = .
              generate ub = .
              forvalues i = 1/5 {
               quietly {
               replace x = rep78==`i'
               ci proportions x, wald
               replace p = r(proportion) if _n==`i'
               replace se = r(se) if _n==`i'
               replace lb = r(lb) if _n==`i'
               replace ub = r(ub) if _n==`i'
               }
              }
              proportion rep78, citype(wald)
              matrix p = r(table)'
              svmat p
              
              * Use -mlogit- as suggested by M Lacy
              quietly mlogit rep78
              margins, predict()
              matrix m = r(table)'
              svmat m
              
              drop p3 p4 p7 p8 p9 m3 m4 m7 m8 m9
              
              generate prange = max(p, p1, m1) - min(p, p1, m1)
              generate serange = max(se, p2, m2) - min(se, p2, m2)
              generate lbrange = max(lb, p5, m5) - min(lb, p5, m5)
              generate ubrange = max(ub, p6, m6) - min(ub, p6, m6)
              
              generate lb1v2 = lb-p5
              generate lb1v3 = lb-m5
              generate lb2v3 = p5-m5
              
              generate ub1v2 = ub-p6
              generate ub1v3 = ub-m6
              generate ub2v3 = p6-m6
              
              * Compare the estimated proportions  
              * p  = proportion from -ci proportions-
              * p1 = proportion from -proportion-
              * m1 = proportion from -mlogit- plus -margins-
              list p p1 m1 prange in 1/5
              * Compare the estimated SEs
              * se = SE from -ci proportions-
              * p2 = SE from -proportion-
              * m2 = SE from -mlogit- plus -margins-
              list se p2 m2 serange in 1/5
              * Compare the estimated lower bounds  
              * lb = lower bound from -ci proportions-
              * p5 = lower bound from -proportion-
              * m5 = lower bound from -mlogit- plus -margins-
              list lb p5 m5 lb1v2 lb1v3 lb2v3 in 1/5
              * Compare the estimated upper bounds  
              * ub = upper bound from -ci proportions-
              * p6 = upper bound from -proportion-
              * m6 = upper bound from -mlogit- plus -margins-
              list ub p6 m6 ub1v2 ub1v3 ub2v3 in 1/5
              
              * SUMMARY
              * All 3 methods return exactly the same
              * estimated proportions and SEs.
              * Regarding CIs, -ci proportions-
              * and -mlogit- with -margins- return
              * the same CIs EXCEPT that -mlogit-
              * reports a lower bound < 0 for
              * rep78 = 1.  -proportions- returns
              * different limits.
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)

              Comment


              • #8
                Hi Bruce
                i have a suggestion based on simulations
                see it here
                https://friosavila.github.io/stata_do/stata_do5.html

                Comment


                • #9
                  Originally posted by Bruce Weaver View Post
                  I am resurrecting this old thread again, because AFAICT, there is still no (easy) way to get simultaneous confidence intervals for multinomial proportions.
                  Try the attached ado-file.

                  It implements the two methods that the SAS blogger, Rick Wicklin, calls “Goodman (1965)” and “Quesenberry and Hurst (1964)” in his post of an updated SAS algorithm for computing confidence intervals for multinomial proportions. He mentions that an earlier investigation indicates that the first of these methods be favored for the purpose in most cases.

                  There's no helpfile, but its syntax is
                  Code:
                  mpci varname [if] [in] [fweight], options
                  where options are
                  Code:
                  method( goodman | qandh ) level(#)
                  For this version the variable needs to be numeric, but remedying that wouldn't be difficult in a future version. The output could be prettied, too, although it does deliver the results in a return matrix, r(table) if you want to use them further.

                  It's an exercise in Mata class programming and so the code is a bit verbose, and it uses (simple) inheritance for the two methods when a template method pattern (composition) would have been cuter.

                  Here it is in action where it matches the SAS results for Goodman's method shown in the first link above. (I've also attached a rudimentary test do-file.)

                  .ÿ
                  .ÿversionÿ18.0

                  .ÿ
                  .ÿclearÿ*

                  .ÿ
                  .ÿinputÿstr20ÿcategoryÿintÿcount

                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿcategoryÿÿÿÿÿcount
                  ÿÿ1.ÿNeuroticÿÿÿÿÿÿÿÿÿÿÿÿÿÿ91
                  ÿÿ2.ÿDepressedÿÿÿÿÿÿÿÿÿÿÿÿÿ49
                  ÿÿ3.ÿSchizophrenicÿÿÿÿÿÿÿÿÿ37
                  ÿÿ4.ÿ"Personalityÿdisorder"ÿÿ43
                  ÿÿ5.ÿend

                  .ÿ
                  .ÿencodeÿcategory,ÿgenerate(cat)ÿlabel(Diagnoses)

                  .ÿlist,ÿnoobs

                  ÿÿ+-----------------------------------------------------+
                  ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿcategoryÿÿÿcountÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿcatÿ|
                  ÿÿ|-----------------------------------------------------|
                  ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿÿÿÿÿÿ91ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿ|
                  ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿDepressedÿÿÿÿÿÿ49ÿÿÿÿÿÿÿÿÿÿÿÿÿÿDepressedÿ|
                  ÿÿ|ÿÿÿÿÿÿÿÿSchizophrenicÿÿÿÿÿÿ37ÿÿÿÿÿÿÿÿÿÿSchizophrenicÿ|
                  ÿÿ|ÿPersonalityÿdisorderÿÿÿÿÿÿ43ÿÿÿPersonalityÿdisorderÿ|
                  ÿÿ+-----------------------------------------------------+

                  .ÿ
                  .ÿ//ÿGoodman'sÿmethodÿ(theÿdefault)
                  .ÿmpciÿcatÿ[fweight=count]
                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProp.ÿÿLBÿÿÿÿUB
                  ÿÿÿÿÿÿÿÿÿÿÿDepressedÿ0.2227ÿ0.1609ÿ0.2999
                  ÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿ0.4136ÿ0.3342ÿ0.4978
                  Personalityÿdisorderÿ0.1955ÿ0.1375ÿ0.2702
                  ÿÿÿÿÿÿÿSchizophrenicÿ0.1682ÿ0.1146ÿ0.2401

                  .ÿreturnÿlist

                  scalars:
                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿr(N)ÿ=ÿÿ220
                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿr(r)ÿ=ÿÿ4

                  matrices:
                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿr(table)ÿ:ÿÿ4ÿxÿ3

                  .ÿmatrixÿlistÿr(table)

                  r(table)[4,3]
                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿProportionÿÿÿÿÿÿÿÿÿÿlbÿÿÿÿÿÿÿÿÿÿub
                  ÿÿÿDepressedÿÿÿ.22272727ÿÿÿ.16085875ÿÿÿÿ.2998874
                  ÿÿÿÿNeuroticÿÿÿ.41363636ÿÿÿ.33420248ÿÿÿ.49783321
                  Personalit~rÿÿÿ.19545455ÿÿÿ.13746901ÿÿÿ.27023577
                  Schizophre~cÿÿÿ.16818182ÿÿÿ.11455134ÿÿÿ.24011208

                  .ÿ
                  .ÿ//ÿAlthernativeÿQuesenberryÿandÿHurstÿmethod
                  .ÿmpciÿcatÿ[fweight=count],ÿm(q)
                  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProp.ÿÿLBÿÿÿÿUB
                  ÿÿÿÿÿÿÿÿÿÿÿDepressedÿ0.2227ÿ0.1546ÿ0.3099
                  ÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿ0.4136ÿ0.3253ÿ0.5079
                  Personalityÿdisorderÿ0.1955ÿ0.1317ÿ0.2801
                  ÿÿÿÿÿÿÿSchizophrenicÿ0.1682ÿ0.1094ÿ0.2498

                  .ÿ
                  .ÿ//ÿAnotherÿexample
                  .ÿquietlyÿsysuseÿauto,ÿclear

                  .ÿ
                  .ÿmpciÿrep78
                  ÿÿProp.ÿÿLBÿÿÿÿUB
                  1ÿ0.0290ÿ0.0057ÿ0.1349
                  2ÿ0.1159ÿ0.0490ÿ0.2503
                  3ÿ0.4348ÿ0.2936ÿ0.5874
                  4ÿ0.2609ÿ0.1501ÿ0.4136
                  5ÿ0.1594ÿ0.0768ÿ0.3018

                  .ÿ
                  .ÿexit

                  endÿofÿdo-file


                  .
                  Attached Files
                  Last edited by Joseph Coveney; 03 Jun 2023, 18:59.

                  Comment


                  • #10
                    Thanks FernandoRios and Joseph Coveney.
                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 18.5 (Windows)

                    Comment


                    • #11
                      Hi:

                      The difference between -ci proportions- and -proportion- is because the former uses normal distribution for the critical value whereas the later uses Student's t distribution.

                      Comment

                      Working...
                      X