Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to estimate the 95%CI of Absolute Difference Between Two Independent Proportions

    I have two variable (0= negative, 1=positive) generated from two different Guidelines on HP, one is old published in 2021, and the other is updated in last year.

    I want to estimated the Absolute Difference of Two Independent Proportions with the 95%CI as the Tables in JAMA Network Open(https://jamanetwork.com/journals/jam...cle/2687386). How can I do it efficiently?

    Click image for larger version

Name:	220950_856.png
Views:	1
Size:	248.5 KB
ID:	1780877

    I tried the commands:
    HTML Code:
    proportion hp2024, cformat(%9.4f)
    scalar prop_2024 = r(table)["b",2]
    scalar se2024 = r(table)["se", 2]
    
    proportion hp2001, cformat(%9.4f)
    scalar prop_2001 = r(table)["b",2]
    scalar se2001 = r(table)["se", 2]
    
    scalar se=sqrt(se2024^2 +se2001^2 )
    scalar z = invnormal(0.975)
    scalar p=prop_2001-prop_2024
    scalar ci_lower = p - z * se
    scalar ci_upper = p + z * se
    format p ci_lower ci_upper %9.4f
    list p ci_lower ci_lower
    I want if there are other user written commands to do it more efficient and flexible. Like:
    HTML Code:
    command_name hp2024 hp2001
    command_name hp2024 hp2001 if sex==1
    svy: command_name hp2024 hp2001
    The dataset is listed below:

    HTML Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte (hp2024 hp2001 sex)
    1 1 2
    1 1 1
    1 1 1
    1 1 2
    1 1 1
    1 1 2
    0 0 2
    0 0 1
    1 1 2
    0 1 2
    1 1 1
    0 0 2
    0 0 2
    1 1 1
    1 1 2
    1 1 1
    0 0 1
    0 1 2
    0 0 1
    0 0 2
    0 0 2
    1 1 1
    1 1 1
    1 1 1
    1 1 1
    1 1 1
    1 1 1
    1 1 2
    0 1 1
    0 0 2
    0 0 2
    1 1 2
    1 1 2
    0 0 2
    0 0 1
    0 0 1
    1 1 2
    1 1 2
    1 1 2
    0 0 2
    0 0 2
    1 1 1
    0 0 2
    1 1 1
    1 1 1
    0 0 1
    0 0 2
    1 1 1
    1 1 1
    1 1 2
    1 1 1
    0 0 2
    0 0 2
    0 0 2
    1 1 2
    1 1 2
    1 1 2
    0 1 1
    0 0 2
    0 0 1
    0 1 1
    0 0 1
    0 0 1
    1 1 1
    1 1 1
    0 0 2
    0 0 2
    0 0 1
    1 1 1
    1 1 2
    0 0 2
    0 0 2
    0 1 2
    1 1 1
    0 0 2
    0 0 2
    1 1 1
    0 1 1
    1 1 1
    0 0 1
    0 1 2
    0 0 1
    1 1 1
    0 1 1
    1 1 1
    1 1 2
    0 1 1
    0 0 2
    0 0 2
    0 1 2
    0 0 2
    0 1 1
    0 1 1
    0 1 1
    0 0 2
    0 0 2
    0 0 2
    1 1 2
    1 1 1
    0 1 2
    end
    Last edited by Qiguo Lian; 11 Aug 2025, 08:27.

  • #2
    Code:
    prtest hp2024 = hp2001
    However, I am not sure the command allows for the svy prefix.
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar

    Comment


    • #3
      Originally posted by Felix Bittmann View Post
      Code:
      prtest hp2024 = hp2001
      However, I am not sure the command allows for the svy prefix.
      Thank you for remind me the prtest command, which is useful too. It seems that svy does not allows with prtest.

      Comment


      • #4
        It is unclear the design of your study. I have assumed that the proportions are risks. You can also use GLM if the assumption of independence is OK.


        Code:
          clear
            set obs 10000
            gene time = runiform()>0.50
            *! Risk of 60% for time 1
            gene status = runiform()>0.40 if time ==1
            *! Risk of 40% for time 0
            replace status = runiform()>0.60 if time ==0
            glm status time, family(binomial) link(identity) vce(robust)
            tab status time, col
        Last edited by Tiago Pereira; 12 Aug 2025, 09:37.

        Comment


        • #5
          Originally posted by Tiago Pereira View Post
          It is unclear the design of your study. I have assumed that the proportions are risks. You can also use GLM if the assumption of independence is OK.


          Code:
           clear
          set obs 10000
          gene time = runiform()>0.50
          *! Risk of 60% for time 1
          gene status = runiform()>0.40 if time ==1
          *! Risk of 40% for time 0
          replace status = runiform()>0.60 if time ==0
          glm status time, family(binomial) link(identity) vce(robust)
          tab status time, col
          Thanks. The design is estimating the prevalence of a disease using two cutoffs (one was released in 2011 and the updated was published in 2024).

          Comment


          • #6
            Originally posted by Qiguo Lian View Post
            I have two variable (0= negative, 1=positive) generated from two different Guidelines on HP, one is old published in 2021, and the other is updated in last year.

            I want to estimated the Absolute Difference of Two Independent Proportions with the 95%CI as the Tables in JAMA Network Open
            Those data in that article were paired, not independent. Yours look paired, too. Are you sure that a method for use with independent observations is what you're looking for?

            Comment


            • #7
              Originally posted by Joseph Coveney View Post
              Those data in that article were paired, not independent. Yours look paired, too. Are you sure that a method for use with independent observations is what you're looking for?
              You are right. The data are paired. Among the same population, we want to check the difference of HP prevalence between two different Guidelines. Any suggestions?

              Comment


              • #8
                Originally posted by Qiguo Lian View Post
                You are right. . . . Any suggestions?
                Sure. Here are three.
                Code:
                version 19
                
                clear *
                
                quietly input byte (hp2024 hp2001 sex)
                <data snippet redacted for brevity>
                end
                
                generate `c(obs_t)' pid = _n
                quietly reshape long hp, i(pid) j(gdn)
                quietly recode gdn (2001=0) (2024 = 1)
                rename hp htn
                
                // One
                glm htn i.gdn##i.sex, family(binomial) link(identity) vce(cluster pid) nolog
                margins , dydx(gdn)
                
                // Two
                xtgee htn i.gdn##i.sex, i(pid) family(binomial) link(logit) nolog
                xtcorr, compact
                margins , dydx(gdn)
                
                // Three
                xtlogit htn i.gdn##i.sex, i(pid) nolog
                margins , dydx(gdn)
                
                exit
                Complete do-file and corresponding log file attached for your convenience.

                The first two above are marginal ("population average") and the third is "individual specific". Of the three, the last might be the most informative.

                There's a fourth that I haven't shown above, but that I would also attempt if it were me and would favor if the algorithm converged, namely, fitting a generalized linear mulitlevel / hierarchical model with a binomial distribution family and an identity link function using gllamm, a user-written Stata estimation command that you can find on SSC and the Stata Journal site via search gllamm and on its own website via your favorite search engine.
                Attached Files

                Comment


                • #9
                  Joseph Coveney Very helpful, as always. I'm curious if you can share why you would prefer the multilevel model with binomial family distribution and identity link.

                  Comment


                  • #10
                    After reading Joseph's suggestion about using gllamm with family = binomial and link = identity, I wondered why one could not use meglm to estimate the same model. But when I consulted the documentation, I discovered that the binomial + identity combination is not permitted (see p. 6).
                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 19.5 (Windows)

                    Comment


                    • #11
                      Originally posted by Erik Ruzek View Post
                      Joseph Coveney . . . share why you would prefer the multilevel model with binomial family distribution and identity link.
                      Sure:

                      1. OP's research question regards the risk difference between diagnostic criteria. Sex is included as a covariate.

                      2. The multilevel logistic model above indicates an interaction of diagnostic criteria and sex.

                      3. I would like to know whether that interaction prevails also in the additive case (the linear binomial model) before deciding whether to base the conclusion on the marginal (over sex) or to report separate risk differences.

                      As an aside, I suspect that, because the predictors are categorical and with the interaction term the model is saturated, gllamm will likely converge. But if not, then some might suggest fitting a linear probability model using, say, xtreg . . ., vce(robust) as an alternative.

                      Comment


                      • #12
                        Bruce Weaver - just to note that many families in the meglm command do not allow the identity link (though they are allowed using glm); several versions ago, I added this to the wishlist for the next version of Stata

                        Comment


                        • #13
                          Originally posted by Joseph Coveney View Post
                          Sure:

                          1. OP's research question regards the risk difference between diagnostic criteria. Sex is included as a covariate.

                          2. The multilevel logistic model above indicates an interaction of diagnostic criteria and sex.

                          3. I would like to know whether that interaction prevails also in the additive case (the linear binomial model) before deciding whether to base the conclusion on the marginal (over sex) or to report separate risk differences.

                          As an aside, I suspect that, because the predictors are categorical and with the interaction term the model is saturated, gllamm will likely converge. But if not, then some might suggest fitting a linear probability model using, say, xtreg . . ., vce(robust) as an alternative.
                          Many thanks!

                          Comment

                          Working...
                          X