How to estimate the 95%CI of Absolute Difference Between Two Independent Proportions

Qiguo Lian

Join Date: May 2014
Posts: 32

How to estimate the 95%CI of Absolute Difference Between Two Independent Proportions

11 Aug 2025, 08:10

I have two variable (0= negative, 1=positive) generated from two different Guidelines on HP, one is old published in 2021, and the other is updated in last year.

I want to estimated the Absolute Difference of Two Independent Proportions with the 95%CI as the Tables in JAMA Network Open（https://jamanetwork.com/journals/jam...cle/2687386）. How can I do it efficiently?

Click image for larger version

Name: 220950_856.png
Views: 1
Size: 248.5 KB
ID: 1780877

I tried the commands:

HTML Code:

proportion hp2024, cformat(%9.4f)
scalar prop_2024 = r(table)["b",2]
scalar se2024 = r(table)["se", 2]

proportion hp2001, cformat(%9.4f)
scalar prop_2001 = r(table)["b",2]
scalar se2001 = r(table)["se", 2]

scalar se=sqrt(se2024^2 +se2001^2 )
scalar z = invnormal(0.975)
scalar p=prop_2001-prop_2024
scalar ci_lower = p - z * se
scalar ci_upper = p + z * se
format p ci_lower ci_upper %9.4f
list p ci_lower ci_lower

I want if there are other user written commands to do it more efficient and flexible. Like:

HTML Code:

command_name hp2024 hp2001
command_name hp2024 hp2001 if sex==1
svy: command_name hp2024 hp2001

The dataset is listed below:

HTML Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte (hp2024 hp2001 sex)
1 1 2
1 1 1
1 1 1
1 1 2
1 1 1
1 1 2
0 0 2
0 0 1
1 1 2
0 1 2
1 1 1
0 0 2
0 0 2
1 1 1
1 1 2
1 1 1
0 0 1
0 1 2
0 0 1
0 0 2
0 0 2
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 2
0 1 1
0 0 2
0 0 2
1 1 2
1 1 2
0 0 2
0 0 1
0 0 1
1 1 2
1 1 2
1 1 2
0 0 2
0 0 2
1 1 1
0 0 2
1 1 1
1 1 1
0 0 1
0 0 2
1 1 1
1 1 1
1 1 2
1 1 1
0 0 2
0 0 2
0 0 2
1 1 2
1 1 2
1 1 2
0 1 1
0 0 2
0 0 1
0 1 1
0 0 1
0 0 1
1 1 1
1 1 1
0 0 2
0 0 2
0 0 1
1 1 1
1 1 2
0 0 2
0 0 2
0 1 2
1 1 1
0 0 2
0 0 2
1 1 1
0 1 1
1 1 1
0 0 1
0 1 2
0 0 1
1 1 1
0 1 1
1 1 1
1 1 2
0 1 1
0 0 2
0 0 2
0 1 2
0 0 2
0 1 1
0 1 1
0 1 1
0 0 2
0 0 2
0 0 2
1 1 2
1 1 1
0 1 2
end

Last edited by Qiguo Lian; 11 Aug 2025, 08:27.

Tags: None

Felix Bittmann

Join Date: Aug 2018

Posts: 767
#2

11 Aug 2025, 08:22

Code:

prtest hp2024 = hp2001

However, I am not sure the command allows for the svy prefix.

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Comment
Qiguo Lian

Join Date: May 2014

Posts: 32
#3

11 Aug 2025, 08:38

Originally posted by Felix Bittmann View Post

Code:

prtest hp2024 = hp2001

However, I am not sure the command allows for the svy prefix.

Thank you for remind me the prtest command, which is useful too. It seems that svy does not allows with prtest.
Comment

Tiago Pereira

Join Date: Jan 2016
Posts: 415

12 Aug 2025, 09:32

It is unclear the design of your study. I have assumed that the proportions are risks. You can also use GLM if the assumption of independence is OK.

Code:

  clear
    set obs 10000
    gene time = runiform()>0.50
    *! Risk of 60% for time 1
    gene status = runiform()>0.40 if time ==1
    *! Risk of 40% for time 0
    replace status = runiform()>0.60 if time ==0
    glm status time, family(binomial) link(identity) vce(robust)
    tab status time, col

Last edited by Tiago Pereira; 12 Aug 2025, 09:37.

Comment

Qiguo Lian

Join Date: May 2014

Posts: 32
#5

12 Aug 2025, 10:48

Originally posted by Tiago Pereira View Post

It is unclear the design of your study. I have assumed that the proportions are risks. You can also use GLM if the assumption of independence is OK.

Code:

clear set obs 10000 gene time = runiform()>0.50 *! Risk of 60% for time 1 gene status = runiform()>0.40 if time ==1 *! Risk of 40% for time 0 replace status = runiform()>0.60 if time ==0 glm status time, family(binomial) link(identity) vce(robust) tab status time, col

Thanks. The design is estimating the prevalence of a disease using two cutoffs (one was released in 2011 and the updated was published in 2024).
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#6

12 Aug 2025, 18:23

Originally posted by Qiguo Lian View Post

I have two variable (0= negative, 1=positive) generated from two different Guidelines on HP, one is old published in 2021, and the other is updated in last year.

I want to estimated the Absolute Difference of Two Independent Proportions with the 95%CI as the Tables in JAMA Network Open

Those data in that article were paired, not independent. Yours look paired, too. Are you sure that a method for use with independent observations is what you're looking for?
1 like
Comment
Qiguo Lian

Join Date: May 2014

Posts: 32
#7

05 Sep 2025, 23:27

Originally posted by Joseph Coveney View Post

Those data in that article were paired, not independent. Yours look paired, too. Are you sure that a method for use with independent observations is what you're looking for?

You are right. The data are paired. Among the same population, we want to check the difference of HP prevalence between two different Guidelines. Any suggestions?
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#8

06 Sep 2025, 04:13

Originally posted by Qiguo Lian View Post

You are right. . . . Any suggestions?

Sure. Here are three.

Code:

version 19 clear * quietly input byte (hp2024 hp2001 sex) <data snippet redacted for brevity> end generate `c(obs_t)' pid = _n quietly reshape long hp, i(pid) j(gdn) quietly recode gdn (2001=0) (2024 = 1) rename hp htn // One glm htn i.gdn##i.sex, family(binomial) link(identity) vce(cluster pid) nolog margins , dydx(gdn) // Two xtgee htn i.gdn##i.sex, i(pid) family(binomial) link(logit) nolog xtcorr, compact margins , dydx(gdn) // Three xtlogit htn i.gdn##i.sex, i(pid) nolog margins , dydx(gdn) exit

Complete do-file and corresponding log file attached for your convenience.

The first two above are marginal ("population average") and the third is "individual specific". Of the three, the last might be the most informative.

There's a fourth that I haven't shown above, but that I would also attempt if it were me and would favor if the algorithm converged, namely, fitting a generalized linear mulitlevel / hierarchical model with a binomial distribution family and an identity link function using gllamm, a user-written Stata estimation command that you can find on SSC and the Stata Journal site via search gllamm and on its own website via your favorite search engine.
Attached Files

Paired risk difference.do (1.3 KB, 1 view)

Paired risk difference.smcl (10.1 KB, 1 view)
1 like
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 443
#9

06 Sep 2025, 14:02

Joseph Coveney Very helpful, as always. I'm curious if you can share why you would prefer the multilevel model with binomial family distribution and identity link.
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1144
#10

06 Sep 2025, 15:14

After reading Joseph's suggestion about using gllamm with family = binomial and link = identity, I wondered why one could not use meglm to estimate the same model. But when I consulted the documentation, I discovered that the binomial + identity combination is not permitted (see p. 6).

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#11

06 Sep 2025, 17:37

Originally posted by Erik Ruzek View Post

Joseph Coveney . . . share why you would prefer the multilevel model with binomial family distribution and identity link.

Sure:

1. OP's research question regards the risk difference between diagnostic criteria. Sex is included as a covariate.

2. The multilevel logistic model above indicates an interaction of diagnostic criteria and sex.

3. I would like to know whether that interaction prevails also in the additive case (the linear binomial model) before deciding whether to base the conclusion on the marginal (over sex) or to report separate risk differences.

As an aside, I suspect that, because the predictors are categorical and with the interaction term the model is saturated, gllamm will likely converge. But if not, then some might suggest fitting a linear probability model using, say, xtreg . . ., vce(robust) as an alternative.
1 like
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4496
#12

06 Sep 2025, 18:50

Bruce Weaver - just to note that many families in the meglm command do not allow the identity link (though they are allowed using glm); several versions ago, I added this to the wishlist for the next version of Stata
1 like
Comment
Qiguo Lian

Join Date: May 2014

Posts: 32
#13

07 Sep 2025, 03:26

Originally posted by Joseph Coveney View Post

Sure:

1. OP's research question regards the risk difference between diagnostic criteria. Sex is included as a covariate.

2. The multilevel logistic model above indicates an interaction of diagnostic criteria and sex.

3. I would like to know whether that interaction prevails also in the additive case (the linear binomial model) before deciding whether to base the conclusion on the marginal (over sex) or to report separate risk differences.

As an aside, I suspect that, because the predictors are categorical and with the interaction term the model is saturated, gllamm will likely converge. But if not, then some might suggest fitting a linear probability model using, say, xtreg . . ., vce(robust) as an alternative.

Many thanks!
Comment

Announcement

How to estimate the 95%CI of Absolute Difference Between Two Independent Proportions

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment