
  • Comparing one binary variable to another

    I have data from a survey on two questions (yes/no answers). I want to test the hypothesis that "no" was chosen more often in one question than in the other. How can I do this? Thanks in advance.

  • #2
    See

    Code:
    help prtest



    • #3
      I'd interpret what lies behind Jack's question differently than I think Andrew Musau does, which leads me to a minor "rant and rave" about the documentation for -prtest-.

      My guess is that Jack has the same variable measured on the same people at two different times, in two different contexts, or in relation to two different objects, leading to "related" or "dependent" proportions. (Do you like dogs? y/n; Do you like cats? y/n) This is not what -prtest var1 == var2- handles. Instead it handles an unusual data structure in which N1 of the _N subjects' observations contain var1 and are missing on var2, and N2 of the subjects have observations that contain var2 and are missing on var1. This unusual data representation is what Example 2 in the PDF documentation for -help prtest- shows:

      Code:
      use http://www.stata-press.com/data/r15/cure
      list
      prtest cure1 == cure2
      // two independent groups SE
      di sqrt(r(se1)^2 + r(se2)^2)
      -prtest- does do the right thing here, presuming that the different observations are for different individuals. However, I think that few of us would lay out data on different groups of individuals as is done in this example. We'd instead represent these data with one variable for the outcome "cure" and one for the "group," rather than two variables for the same outcome measured on different people. Therefore, I think that giving this example without explaining it is likely to mislead even moderately experienced users. (From many years of teaching and consulting about such matters, I can testify that the distinction between related and independent means or proportions is commonly misunderstood, so this is a spot in which careful documentation is in order.)

      I think that a naive user might well try to use -prtest var1 == var2- to test related proportions, which it will happily and incorrectly do with a very typical data structure. (I'm looking at the documentation for v. 15.1, not v. 16, by the way.)

      For what I take to be the more common situation of two variables on each observation/individual, as I'm guessing Jack has, the relevant test would be what many of us call the McNemar test of related proportions, about which see "McNemar" under -help symmetry-. (-symmetry-, by its reference to case/control examples, would also lead to confusion for analysts who are not epi/biostat people, but that infelicity is a different matter. I personally use -tab2 var1 var2- followed by -bitesti-, not -symmetry-.)
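
      To make the related-proportions case concrete, here is a minimal sketch. The variable names (likes_dogs, likes_cats) and the discordant-pair counts fed to -bitesti- are hypothetical; substitute your own.

      Code:
      * McNemar-type test of related proportions (hypothetical 0/1 variables)
      symmetry likes_dogs likes_cats
      * or by hand: cross-tabulate, then an exact binomial test on the discordant pairs
      tab2 likes_dogs likes_cats
      * suppose the two discordant cells hold 9 and 21 observations (30 discordant pairs):
      bitesti 30 9 0.5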




      • #4
        Mike Lacy, yes, I was assuming independent groups. Thanks for pointing out that this will not be valid for dependent proportions. One extra detail: note that prtest can handle a conventional data structure (one outcome variable and a group indicator), e.g.,

        Code:
        use http://www.stata-press.com/data/r15/cure
        list
        prtest cure1 == cure2
        // two independent groups SE
        di sqrt(r(se1)^2 + r(se2)^2)
        egen cure = rowmax(cure?)        // combine the two outcome variables into one
        gen group2 = !missing(cure2)     // group indicator: 1 if the observation carried cure2
        prtest cure, by(group2)



        • #5
          To give more information: in group 1, 49 participants selected "yes" and 12 selected "no". In group 2, 37 participants selected "yes" and 30 selected "no". How exactly can I test whether the difference between the two is statistically significant?

          Thanks in advance,
          Jack



          • #6
            Are these dependent or independent groups? That is, are the participants in group 1 the same as those in group 2?



            • #7
              No, the participants are divided at the beginning of the survey, half into a treatment group and half into a control group. The treatment group sees a box with a warning label, and the control group just sees a box. Those in the treatment group "accepted" the box at a lower rate, but I need to see whether that difference is statistically significant.



              • #8
                Then see the example in #4. The following recreates your responses:

                Code:
                set obs `=49+37'                        // 86 "yes" responses in total
                gen group1 = _n < 50                    // first 49 are group 1, remaining 37 are group 2
                gen yes = 1
                set obs `=_N+12+30'                     // append the 42 "no" responses
                replace yes = 0 if missing(yes)
                replace group1 = 1 in -12/l             // last 12 "no" responses belong to group 1
                replace group1 = 0 if missing(group1)   // the other 30 "no" responses are group 2
                lab define response 1 "yes" 0 "no"
                lab define group1 1 "group 1" 0 "group 2"
                lab values yes response
                lab values group1 group1
                tab yes group1
                prtest yes, by(group)                   // group abbreviates group1
                Result:

                Code:
                . tab yes group1
                
                           |        group1
                       yes |   group 2    group 1 |     Total
                -----------+----------------------+----------
                        no |        30         12 |        42
                       yes |        37         49 |        86
                -----------+----------------------+----------
                     Total |        67         61 |       128
                
                .
                . prtest yes, by(group)
                
                Two-sample test of proportions               group 2: Number of obs =       67
                                                             group 1: Number of obs =       61
                ------------------------------------------------------------------------------
                       Group |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     group 2 |   .5522388   .0607504                      .4331702    .6713074
                     group 1 |   .8032787   .0508972                       .703522    .9030354
                -------------+----------------------------------------------------------------
                        diff |  -.2510399   .0792536                     -.4063742   -.0957056
                             |  under Ho:   .0830934    -3.02   0.003
                ------------------------------------------------------------------------------
                        diff = prop(group 2) - prop(group 1)                      z =  -3.0212
                    Ho: diff = 0
                
                    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                 Pr(Z < z) = 0.0013         Pr(|Z| > |z|) = 0.0025          Pr(Z > z) = 0.9987

                So yes, the proportion of "no" responses does differ between the two groups (p = 0.003).
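
                As an aside, if you only have the four cell counts, the immediate form -prtesti- gives the same test without constructing a dataset. A minimal sketch, using the sample sizes and the (rounded) proportions of "yes" implied by the counts in #5; the sign of z flips relative to the output above because group 1 is entered first:

                Code:
                * group 1: 49/61 = .8033 "yes";  group 2: 37/67 = .5522 "yes"
                prtesti 61 .8033 67 .5522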



                • #9
                  That's great. Thanks very much for your help.



                  • #10
                    And notice that the z test from -prtest- is equivalent to the Pearson chi-squared test you can request when you use -tabulate-: z is the square root of the chi-squared statistic, and only the sign differs for this example.

                    Code:
                    tabulate yes group1, chi2
                    display "z = " sqrt(r(chi2)) ", p = " r(p)
                    prtest yes, by(group)
                    Output:
                    Code:
                    . tabulate yes group1, chi2
                    
                               |        group1
                           yes |   group 2    group 1 |     Total
                    -----------+----------------------+----------
                            no |        30         12 |        42
                           yes |        37         49 |        86
                    -----------+----------------------+----------
                         Total |        67         61 |       128
                    
                              Pearson chi2(1) =   9.1275   Pr = 0.003
                    
                    . display "z = " sqrt(r(chi2)) ", p = " r(p)
                    z = 3.0211769, p = .00251794
                    
                    . prtest yes, by(group)
                    
                    Two-sample test of proportions               group 2: Number of obs =       67
                                                                 group 1: Number of obs =       61
                    ------------------------------------------------------------------------------
                           Group |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         group 2 |   .5522388   .0607504                      .4331702    .6713074
                         group 1 |   .8032787   .0508972                       .703522    .9030354
                    -------------+----------------------------------------------------------------
                            diff |  -.2510399   .0792536                     -.4063742   -.0957056
                                 |  under Ho:   .0830934    -3.02   0.003
                    ------------------------------------------------------------------------------
                            diff = prop(group 2) - prop(group 1)                      z =  -3.0212
                        Ho: diff = 0
                    
                        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                     Pr(Z < z) = 0.0013         Pr(|Z| > |z|) = 0.0025          Pr(Z > z) = 0.9987

                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 18.5 (Windows)

