How to compute simultaneous confidence intervals for multinomial proportions?

tsimonso

Join Date: Jul 2014

Posts: 3
#1

How to compute simultaneous confidence intervals for multinomial proportions?

07 Aug 2014, 03:09

Hello everybody,

I would like to use a multinomial variable in one sample of subjects to make inferences about the proportion of the population in each category, with confidence intervals.
Apparently there is a 'Multinomial CI' package in R (http://cran.at.r-project.org/web/pac...tinomialCI.pdf).
Do you know how to do the same in STATA?

Thank you very much in advance.

Thomas

Last edited by tsimonso; 07 Aug 2014, 03:20.
Tags: categorical
Bruce Weaver

Join Date: May 2014

Posts: 1165
#2

27 Jul 2017, 14:56

I'm resurrecting this old thread because I have the same question: Does anyone know of a Stata program to compute simultaneous confidence intervals for multinomial proportions?

Thanks,
Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
1 like
Comment
Gaby Kassler

Join Date: Apr 2016

Posts: 31
#3

11 Apr 2018, 20:32

Another polite "bump" from me.
I would also like to use simultaneous CIs for multinomial proportions without asking someone to do it in SAS or R for me.
I looked at the help section to find out if there were any packages/utilities I could use but the search only returned mlogit.
Unless I am missing something obvious like an option from a contingency table convenience command?

Thanks in advance for your help.
Gaby

Stata BE ver 17
MacOS Ventura
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2449
#4

11 Apr 2018, 21:28

Here's one hack that *might* help: You can run -mlogit y- without any predictors. Then, you can use -margins, predict- to get CIs based on parameters and standard errors estimated simultaneously. I can't figure out how to get -margins- to generate the prediction and CI for more than one outcome at a time, but you can do them serially:

Code:

sysuse auto mlogit rep78 margins, predict(pr outcome(1)) margins, predict(pr outcome(2))

I don't know if this will give a correct result, as I don't have anything with which to compare. And, at this small sample size, I wouldn't trust these results.
1 like
Comment
Gaby Kassler

Join Date: Apr 2016

Posts: 31
#5

15 Apr 2018, 08:39

Hello Mike,
Thanks for the advice - I tried it but up to now I haven't been able to find a way to verify the reliability of the results so I ended up using the procedure in SAS that Bruce Weaver suggested.
I am surprised that this has not been dealth with by Stata yet.
I think some people are creating indicator variables for each level of a category and calculating binomial CIs from each of these indicator variables.
I am hoping that this thread might come to the attention of the luminaries of Statalist!
Thanks again
Gaby

Stata BE ver 17
MacOS Ventura
Comment
Juan del Pozo

Join Date: Jul 2018

Posts: 17
#6

04 Sep 2018, 05:29

Mike, I think that what you are looking for is

Code:

margins, predict()

As you can see, the lower bound of the confidence interval for the first category is negative (which does not make sense). This is circumvented by the proportion command since it applies the logit transformation to the confidence intervals (and then transfoms it back to the original metric).

What I am curious is the following. We know that, given a multinomial variable, the outcome given by proportion for a given category k is exactly the same as what we would get by applying proportion to a binomial variable which is 1 if the observation is in the category k and 0 otherwise. E.g. for the category 2 of rep78 in the auto dataset we get

Code:

sysuse auto, clear gen rep78_2=(rep78==2) if !missing(rep78) proportion rep78, citype(logit) proportion rep78_2, citype(logit)

However, as some discussion suggests (see for example https://www.stata.com/statalist/arch.../msg00483.html), more acceptable confidence interval for binomial variables can be found by using ci instead (e.g. those under the wilson and agresti options which provide better coverage than that one using the exact formula for the binomial distribution). Hence, to the extent that the confidence intervals for each category of a multinomial variable reported by proportion command are the same that result running the command for each (binomial) category, and given that these later confidence intervals should be corrected, do the confidence intervals provided by proportion make sense at all? I have seen Gaby's suggestion floating around the internet, namely treating every category of the multinomial variable as a binomial variable per se and use the ci command with the desired method, but without making any reference to a paper or theory that says why this should be the a reasonable procedure (in the case it is reasonable). I hope someone from the Stata team can help us out on this one!

Thank you

Last edited by Juan del Pozo; 04 Sep 2018, 05:34. Reason: typo
Comment

Bruce Weaver

Join Date: May 2014
Posts: 1165

02 Jun 2023, 13:37

I am resurrecting this old thread again, because AFAICT, there is still no (easy) way to get simultaneous confidence intervals for multinomial proportions. The words simultaneous and multinomial appear nowhere in the documentation for -proportion-, for example. And the main resource that is cited appears to discuss binomial proportions only.

I also wanted to follow up on the example Juan del Pozo posted in #6 and to try Mike Lacy's -mlogit- suggestion in #4. Here's a summary of what I did.

Using the auto dataset, I generated CIs for each level of rep78 by 3 different methods: -ci proportions-, -proportion-, and -mlogit-.
I assumed -mlogit- + -margins- would produce a Wald CI, so used the wald option for both -ci proportions- and -proportion-.
I compared the proportions, SEs, and confidence bounds for the 3 methods.

My (somewhat clunky) code can be found at the end of this post. I ran it under v18, but it should work for v15 or later. Here are the main results and my summary of what they show.

Code:

. * Compare the estimated proportions  
. * p  = proportion from -ci proportions-
. * p1 = proportion from -proportion-
. * m1 = proportion from -mlogit- plus -margins-
. list p p1 m1 prange in 1/5

     +-----------------------------------------+
     |        p         p1         m1   prange |
     |-----------------------------------------|
  1. | .0289855   .0289855   .0289855        0 |
  2. |  .115942    .115942    .115942        0 |
  3. | .4347826   .4347826   .4347826        0 |
  4. | .2608696   .2608696   .2608696        0 |
  5. | .1594203   .1594203   .1594203        0 |
     +-----------------------------------------+

. * Compare the estimated SEs
. * se = SE from -ci proportions-
. * p2 = SE from -proportion-
. * m2 = SE from -mlogit- plus -margins-
. list se p2 m2 serange in 1/5

     +------------------------------------------+
     |       se         p2         m2   serange |
     |------------------------------------------|
  1. | .0201966   .0201966   .0201966         0 |
  2. | .0385422   .0385422   .0385422         0 |
  3. | .0596787   .0596787   .0596787         0 |
  4. | .0528625   .0528625   .0528625         0 |
  5. | .0440694   .0440694   .0440694         0 |
     +------------------------------------------+

. * Compare the estimated lower bounds  
. * lb = lower bound from -ci proportions-
. * p5 = lower bound from -proportion-
. * m5 = lower bound from -mlogit- plus -margins-
. list lb p5 m5 lb1v2 lb1v3 lb2v3 in 1/5

     +--------------------------------------------------------------------+
     |       lb          p5          m5      lb1v2      lb1v3       lb2v3 |
     |--------------------------------------------------------------------|
  1. |        0   -.0113162   -.0105991   .0113162   .0105991   -.0007171 |
  2. | .0404007    .0390323    .0404007   .0013684          0   -.0013684 |
  3. | .3178145    .3156956    .3178145   .0021189          0   -.0021189 |
  4. |  .157261    .1553841     .157261   .0018769          0   -.0018769 |
  5. | .0730459    .0714813    .0730459   .0015647          0   -.0015647 |
     +--------------------------------------------------------------------+

. * Compare the estimated upper bounds  
. * ub = upper bound from -ci proportions-
. * p6 = upper bound from -proportion-
. * m6 = upper bound from -mlogit- plus -margins-
. list ub p6 m6 ub1v2 ub1v3 ub2v3 in 1/5

     +---------------------------------------------------------------+
     |       ub         p6         m6       ub1v2   ub1v3      ub2v3 |
     |---------------------------------------------------------------|
  1. | .0685702   .0692872   .0685702   -.0007171       0   .0007171 |
  2. | .1914833   .1928518   .1914833   -.0013684       0   .0013684 |
  3. | .5517507   .5538696   .5517507   -.0021189       0   .0021189 |
  4. | .3644782    .366355   .3644782   -.0018769       0   .0018769 |
  5. | .2457946   .2473593   .2457946   -.0015647       0   .0015647 |
     +---------------------------------------------------------------+

.
. * SUMMARY
. * All 3 methods return exactly the same
. * estimated proportions and SEs.
. * Regarding CIs, -ci proportions-
. * and -mlogit- with -margins- return
. * the same CIs EXCEPT that -mlogit-
. * reports a lower bound < 0 for
. * rep78 = 1.  -proportions- returns
. * different limits.

Here is the code, in case anyone else wants to tinker.

Code:

* Follow up on Juan del Pozo's example in #6 here:
* https://www.statalist.org/forums/forum/general-stata-discussion/general/134445-how-to-compute-simultaneous-confidence-intervals-for-multinomial-proportions

* Code was generated using Stata 18.0,
* but should work with Stata 15 or higher
version 15
cls
sysuse auto, clear
keep foreign rep78
keep if !missing(foreign, rep78)
generate x = .
generate p = .
generate se = .
generate lb = .
generate ub = .
forvalues i = 1/5 {
 quietly {
 replace x = rep78==`i'
 ci proportions x, wald
 replace p = r(proportion) if _n==`i'
 replace se = r(se) if _n==`i'
 replace lb = r(lb) if _n==`i'
 replace ub = r(ub) if _n==`i'
 }
}
proportion rep78, citype(wald)
matrix p = r(table)'
svmat p

* Use -mlogit- as suggested by M Lacy
quietly mlogit rep78
margins, predict()
matrix m = r(table)'
svmat m

drop p3 p4 p7 p8 p9 m3 m4 m7 m8 m9

generate prange = max(p, p1, m1) - min(p, p1, m1)
generate serange = max(se, p2, m2) - min(se, p2, m2)
generate lbrange = max(lb, p5, m5) - min(lb, p5, m5)
generate ubrange = max(ub, p6, m6) - min(ub, p6, m6)

generate lb1v2 = lb-p5
generate lb1v3 = lb-m5
generate lb2v3 = p5-m5

generate ub1v2 = ub-p6
generate ub1v3 = ub-m6
generate ub2v3 = p6-m6

* Compare the estimated proportions  
* p  = proportion from -ci proportions-
* p1 = proportion from -proportion-
* m1 = proportion from -mlogit- plus -margins-
list p p1 m1 prange in 1/5
* Compare the estimated SEs
* se = SE from -ci proportions-
* p2 = SE from -proportion-
* m2 = SE from -mlogit- plus -margins-
list se p2 m2 serange in 1/5
* Compare the estimated lower bounds  
* lb = lower bound from -ci proportions-
* p5 = lower bound from -proportion-
* m5 = lower bound from -mlogit- plus -margins-
list lb p5 m5 lb1v2 lb1v3 lb2v3 in 1/5
* Compare the estimated upper bounds  
* ub = upper bound from -ci proportions-
* p6 = upper bound from -proportion-
* m6 = upper bound from -mlogit- plus -margins-
list ub p6 m6 ub1v2 ub1v3 ub2v3 in 1/5

* SUMMARY
* All 3 methods return exactly the same
* estimated proportions and SEs.
* Regarding CIs, -ci proportions-
* and -mlogit- with -margins- return
* the same CIs EXCEPT that -mlogit-
* reports a lower bound < 0 for
* rep78 = 1.  -proportions- returns
* different limits.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Comment

FernandoRios

Join Date: Apr 2014

Posts: 2534
#8

02 Jun 2023, 20:34

Hi Bruce
i have a suggestion based on simulations
see it here
https://friosavila.github.io/stata_do/stata_do5.html
3 likes
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#9

03 Jun 2023, 18:49

Originally posted by Bruce Weaver View Post

I am resurrecting this old thread again, because AFAICT, there is still no (easy) way to get simultaneous confidence intervals for multinomial proportions.

Try the attached ado-file.

It implements the two methods that the SAS blogger, Rick Wicklin, calls “Goodman (1965)” and “Quesenberry and Hurst (1964)” in his post of an updated SAS algorithm for computing confidence intervals for multinomial proportions. He mentions that an earlier investigation indicates that the first of these methods be favored for the purpose in most cases.

There's no helpfile, but its syntax is

Code:

mpci varname [if] [in] [fweight], options

where options are

Code:

method( goodman | qandh ) level(#)

For this version the variable needs to be numeric, but remedying that wouldn't be difficult in a future version. The output could be prettied, too, although it does deliver the results in a return matrix, r(table) if you want to use them further.

It's an exercise in Mata class programming and so the code is a bit verbose, and it uses (simple) inheritance for the two methods when a template method pattern (composition) would have been cuter.

Here it is in action where it matches the SAS results for Goodman's method shown in the first link above. (I've also attached a rudimentary test do-file.)

.ÿ
.ÿversionÿ18.0

.ÿ
.ÿclearÿ*

.ÿ
.ÿinputÿstr20ÿcategoryÿintÿcount

ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿcategoryÿÿÿÿÿcount
ÿÿ1.ÿNeuroticÿÿÿÿÿÿÿÿÿÿÿÿÿÿ91
ÿÿ2.ÿDepressedÿÿÿÿÿÿÿÿÿÿÿÿÿ49
ÿÿ3.ÿSchizophrenicÿÿÿÿÿÿÿÿÿ37
ÿÿ4.ÿ"Personalityÿdisorder"ÿÿ43
ÿÿ5.ÿend

.ÿ
.ÿencodeÿcategory,ÿgenerate(cat)ÿlabel(Diagnoses)

.ÿlist,ÿnoobs

ÿÿ+-----------------------------------------------------+
ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿcategoryÿÿÿcountÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿcatÿ|
ÿÿ|-----------------------------------------------------|
ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿÿÿÿÿÿ91ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿ|
ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿDepressedÿÿÿÿÿÿ49ÿÿÿÿÿÿÿÿÿÿÿÿÿÿDepressedÿ|
ÿÿ|ÿÿÿÿÿÿÿÿSchizophrenicÿÿÿÿÿÿ37ÿÿÿÿÿÿÿÿÿÿSchizophrenicÿ|
ÿÿ|ÿPersonalityÿdisorderÿÿÿÿÿÿ43ÿÿÿPersonalityÿdisorderÿ|
ÿÿ+-----------------------------------------------------+

.ÿ
.ÿ//ÿGoodman'sÿmethodÿ(theÿdefault)
.ÿmpciÿcatÿ[fweight=count]
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProp.ÿÿLBÿÿÿÿUB
ÿÿÿÿÿÿÿÿÿÿÿDepressedÿ0.2227ÿ0.1609ÿ0.2999
ÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿ0.4136ÿ0.3342ÿ0.4978
Personalityÿdisorderÿ0.1955ÿ0.1375ÿ0.2702
ÿÿÿÿÿÿÿSchizophrenicÿ0.1682ÿ0.1146ÿ0.2401

.ÿreturnÿlist

scalars:
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿr(N)ÿ=ÿÿ220
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿr(r)ÿ=ÿÿ4

matrices:
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿr(table)ÿ:ÿÿ4ÿxÿ3

.ÿmatrixÿlistÿr(table)

r(table)[4,3]
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿProportionÿÿÿÿÿÿÿÿÿÿlbÿÿÿÿÿÿÿÿÿÿub
ÿÿÿDepressedÿÿÿ.22272727ÿÿÿ.16085875ÿÿÿÿ.2998874
ÿÿÿÿNeuroticÿÿÿ.41363636ÿÿÿ.33420248ÿÿÿ.49783321
Personalit~rÿÿÿ.19545455ÿÿÿ.13746901ÿÿÿ.27023577
Schizophre~cÿÿÿ.16818182ÿÿÿ.11455134ÿÿÿ.24011208

.ÿ
.ÿ//ÿAlthernativeÿQuesenberryÿandÿHurstÿmethod
.ÿmpciÿcatÿ[fweight=count],ÿm(q)
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿProp.ÿÿLBÿÿÿÿUB
ÿÿÿÿÿÿÿÿÿÿÿDepressedÿ0.2227ÿ0.1546ÿ0.3099
ÿÿÿÿÿÿÿÿÿÿÿÿNeuroticÿ0.4136ÿ0.3253ÿ0.5079
Personalityÿdisorderÿ0.1955ÿ0.1317ÿ0.2801
ÿÿÿÿÿÿÿSchizophrenicÿ0.1682ÿ0.1094ÿ0.2498

.ÿ
.ÿ//ÿAnotherÿexample
.ÿquietlyÿsysuseÿauto,ÿclear

.ÿ
.ÿmpciÿrep78
ÿÿProp.ÿÿLBÿÿÿÿUB
1ÿ0.0290ÿ0.0057ÿ0.1349
2ÿ0.1159ÿ0.0490ÿ0.2503
3ÿ0.4348ÿ0.2936ÿ0.5874
4ÿ0.2609ÿ0.1501ÿ0.4136
5ÿ0.1594ÿ0.0768ÿ0.3018

.ÿ
.ÿexit

endÿofÿdo-file

.
Attached Files

mpci.ado (4.3 KB, 1 view)

mpci_harness.do (497 Bytes, 1 view)

Last edited by Joseph Coveney; 03 Jun 2023, 18:59.
5 likes
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1165
#10

03 Jun 2023, 19:24

Thanks FernandoRios and Joseph Coveney.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment
Enrique Saint-Pierre

Join Date: Sep 2024

Posts: 1
#11

30 Sep 2024, 18:32

Hi:

The difference between -ci proportions- and -proportion- is because the former uses normal distribution for the critical value whereas the later uses Student's t distribution.
Comment

Announcement