
  • help with difference in proportions in multilevel data

I have a dataset where eyes are nested within subjects. I would like to know whether the proportions for var1 and var2 are different, such that p<0.05 would indicate a significant difference (so the null hypothesis is that var1 = var2). If the data were not nested, I would run
    prtest var1=var2

    However, the p-value from prtest does not account for the fact that some subjects have data on both eyes.

    Is there a way to bootstrap a 95% CI and p-value for prtest that would account for the fact that some subjects have both eyes in the study? Or is there another statistical test one can use to assess whether var1 and var2 differ significantly while accounting for the multilevel nature of the data?

    I realize that a GEE or mixed model could account for the multilevel nature of the data, but these test a different hypothesis: that var1 is associated with var2, so p<0.05 would indicate a significant association. That isn't really the question we are trying to answer, which is whether the two proportions are significantly different.

    Thank you for your help.
    Last edited by AC Thompson; 22 May 2020, 11:33.

  • #2
    I think you have to -cluster()- the standard errors on subjects.
    I hope that the following toy example is helpful:
    Code:
    set obs 10
    g id=_n
    g eyes=0
    g x=runiform()
    expand 2
    bysort id: replace eyes=1 if _n==2 & id<=6
    bysort id: replace eyes=. if _n==2 & id>6
    label define eyes 0 "right" 1 "left"
    label val eyes eyes
    . logistic eyes x, vce(cluster id)
    
    Logistic regression                             Number of obs     =         16
                                                    Wald chi2(1)      =       0.32
                                                    Prob > chi2       =     0.5728
    Log pseudolikelihood = -10.525824               Pseudo R2         =     0.0056
    
                                        (Std. Err. adjusted for 10 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
            eyes | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .5231353   .6009756    -0.56   0.573       .05505     4.97131
           _cons |    .756365   .3143038    -0.67   0.502     .3349799    1.707828
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Without contradicting Carlo's suggestion, I think that knowing something about the frequency of various data patterns would be helpful here. I'd suggest you post tabulations for var1 X var2 for each of the following subsets of observations:

      1) Subjects with only one eye in the study, tabulation for the right eye, and a separate one for the left eye;
      2) Subjects with two eyes in the study, same two tabulations.
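
      For instance, those tabulations might be produced along these lines, assuming one row per eye with an id variable and a 0/1 eye indicator (variable names here are hypothetical, not from the original post):

      Code:
      * count how many eyes each subject contributes
      bysort id: gen n_eyes = _N

      * 1) subjects with only one eye in the study, one tabulation per eye
      tab var1 var2 if n_eyes == 1 & eye == 0   // right eye only
      tab var1 var2 if n_eyes == 1 & eye == 1   // left eye only

      * 2) subjects with both eyes in the study, same two tabulations
      tab var1 var2 if n_eyes == 2 & eye == 0   // right eyes
      tab var1 var2 if n_eyes == 2 & eye == 1   // left eyes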



      • #4
        Hi, I think that a logistic regression with clustered standard errors (like xtgee or mixed) tests a different null hypothesis. In that case the null hypothesis is that var1 is not associated with var2; if p<0.05, we reject the null and conclude that var1 is significantly associated with var2. The output is an odds ratio.

        But what we want to show is the opposite: that the proportion for var1 is significantly different from the proportion for var2, so that p<0.05 means the two proportions differ significantly (not that the variables are significantly associated). The way to do this with just one eye per subject would be prtest. But since this is multilevel data with more than one eye per subject, one would need to somehow adjust the 95% CI and p-value.

        I'm just wondering if someone might have code to do that, or if there is another statistical test that could provide an adjusted 95% CI and p-value for the difference in proportions in multilevel data?

        I'll show the breakdown here in case this is helpful:

        For left eye:

                   |      var1
              var2 |    No    Yes |  Total
        -----------+--------------+-------
                No |    65     22 |     87
               Yes |     8      3 |     11
        -----------+--------------+-------
             Total |    73     25 |     98

        For right eye:

                   |      var1
              var2 |    No    Yes |  Total
        -----------+--------------+-------
                No |    66     17 |     83
               Yes |     7      2 |      9
        -----------+--------------+-------
             Total |    73     19 |     92

        For both eyes retained:

                   |      var1
              var2 |    No    Yes |  Total
        -----------+--------------+-------
                No |   131     39 |    170
               Yes |    15      5 |     20
        -----------+--------------+-------
             Total |   146     44 |    190



        • #5
          I'm assuming there is a way to bootstrap the 95% CI for the difference in proportions while taking into account the clustering of eyes within subjects, but I have not been able to find code to do this.




            • #7
              As an example, I tried:

              Code:
              seed 200
              cl(id eye)
              bootstrap, reps(1000): prtest var1=var2
              estat bootstrap, all

              This does not work; I get an error:
              Code:
              expression list required
              r(100);

              There is probably a way to get the 95% CI and p-value for a difference in proportions, clustering at the ID level, by writing a program for bootstrapping. Does anyone know how to do this?
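
              [A note on the error: prtest is an r-class command, so bootstrap needs an explicit expression list naming which returned scalar to resample; also, the random seed is set with -set seed- and clustering is passed as a -cluster()- option to bootstrap, not as standalone commands. A minimal sketch along those lines, with the stored result r(P_diff) holding the difference in proportions:]

              Code:
              * reproducible cluster bootstrap of the difference in proportions
              set seed 200
              * r(P_diff) is the scalar prtest stores for prop(var1) - prop(var2);
              * cluster(id) resamples whole subjects, so both eyes move together
              bootstrap diff=r(P_diff), reps(1000) cluster(id) nodots: prtest var1 == var2
              estat bootstrap, all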




              • #8
                I do hope that the following toy example can help you somehow:
                Code:
                . set obs 10
                number of observations (_N) was 0, now 10
                
                . g id=_n
                
                . g var1=1 in 1/6
                (4 missing values generated)
                
                . replace var1=0 if var1==.
                (4 real changes made)
                
                . g var2=0 in 1/3
                (7 missing values generated)
                
                
                . replace var2=1 in 4/10
                (7 real changes made)
                
                . expand 2
                (10 observations created)
                
                . prtest var1==var2
                
                Two-sample test of proportions                  var1: Number of obs =       20
                                                                var2: Number of obs =       20
                ------------------------------------------------------------------------------
                    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                        var1 |         .6   .1095445                      .3852967    .8147033
                        var2 |         .7   .1024695                      .4991635    .9008365
                -------------+----------------------------------------------------------------
                        diff |        -.1        .15                     -.3939946    .1939946
                             |  under Ho:    .150831    -0.66   0.507
                ------------------------------------------------------------------------------
                        diff = prop(var1) - prop(var2)                            z =  -0.6630
                    Ho: diff = 0
                
                    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                 Pr(Z < z) = 0.2537         Pr(|Z| > |z|) = 0.5073          Pr(Z > z) = 0.7463
                
                . return list
                
                scalars:
                                 r(N1) =  20
                                 r(N2) =  20
                                 r(P1) =  .6
                                 r(P2) =  .7
                             r(P_diff) =  -.1
                                r(se1) =  .1095445115010332
                                r(se2) =  .102469507659596
                           r(se_diff0) =  .1508310312899836
                            r(se_diff) =  .15
                                r(lb1) =  .3852967027539412
                                r(ub1) =  .8147032972460588
                                r(lb2) =  .4991634554736407
                                r(ub2) =  .9008365445263593
                            r(lb_diff) =  -.393994597681008
                            r(ub_diff) =  .1939945976810081
                                  r(z) =  -.6629935441317957
                                r(p_l) =  .2536673443907445
                                  r(p) =  .5073346887814889
                                r(p_u) =  .7463326556092555
                              r(level) =  95
                
                . bootstrap r(P_diff), reps(1000) cluster(id) nodots : prtest var1==var2
                
                Warning:  Because prtest is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations
                          are used in calculating the statistics and so assumes that all observations are used.  This means that no observations will
                          be excluded from the resampling because of missing values or other reasons.
                
                          If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded.  Be sure that
                          the dataset in memory contains only the relevant data.
                
                Bootstrap results                               Number of obs     =         20
                                                                Replications      =      1,000
                
                      command:  prtest var1==var2
                        _bs_1:  r(P_diff)
                
                                                     (Replications based on 10 clusters in id)
                ------------------------------------------------------------------------------
                             |   Observed   Bootstrap                         Normal-based
                             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       _bs_1 |        -.1    .265759    -0.38   0.707     -.620878     .420878
                ------------------------------------------------------------------------------
                
                . estat bootstrap, all
                
                Bootstrap results                               Number of obs     =         20
                                                                Replications      =       1000
                
                      command:  prtest var1==var2
                        _bs_1:  r(P_diff)
                
                                                     (Replications based on 10 clusters in id)
                ------------------------------------------------------------------------------
                             |    Observed               Bootstrap
                             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       _bs_1 |         -.1      .0159   .26575895    -.620878    .420878   (N)
                             |                                            -.6         .4   (P)
                             |                                            -.5         .6  (BC)
                ------------------------------------------------------------------------------
                (N)    normal confidence interval
                (P)    percentile confidence interval
                (BC)   bias-corrected confidence interval
                
                .
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Originally posted by Carlo Lazzaro View Post
                  Yes, this code works:
                  Code:
                  bootstrap r(P_diff), reps(1000) cluster(id) nodots : prtest var1==var2

                  Thank you for your help!
