
  • Comparing one binary variable to another

    I have data from a survey on two questions (yes/no answers). I want to test the hypothesis that "no" was chosen more often in one question than in the other. How can I do this? Thanks in advance.

  • #2
    See

    Code:
    help prtest



    • #3
      I'd interpret what lies behind Jack's question differently than I think Andrew Musau does, which leads me to a minor "rant and rave" about the documentation for -prtest-.

      My guess is that Jack has the same variable measured on the same people at two different times, in two different contexts, or in relation to two different objects, leading to "related" or "dependent" proportions. (Do you like dogs? y/n; Do you like cats? y/n) This is not what -prtest var1 == var2- handles. Instead it handles an unusual data structure in which N1 of the _N subjects' observations contain var1 and are missing on var2, and N2 of the subjects have observations that contain var2 and are missing on var1. This unusual data representation is what Example 2 in the PDF documentation for -help prtest- shows:

      Code:
      use http://www.stata-press.com/data/r15/cure
      list
      prtest cure1 == cure2
      // two independent groups SE
      di sqrt(r(se1)^2 + r(se2)^2)
      -prtest- does do the right thing here, presuming that the different observations are for different individuals. However, I think that few of us would lay out data on different groups of individuals as is done in this example. We'd instead represent these data with one variable for the outcome "cure" and one for the "group," rather than two variables for the same outcome measured on different people. Therefore, I think that giving this example without explaining it is likely to mislead even moderately experienced users. (From many years of teaching and consulting about such matters, I can testify that the distinction between related and independent means or proportions is commonly misunderstood, so this is a spot in which careful documentation is in order.)

      I think that a naive user might well try to use -prtest var1 == var2- to test related proportions, which it will happily and incorrectly do with a very typical data structure. (I'm looking at the documentation for v. 15.1, not v. 16, by the way.)

      For what I take to be the more common situation of two variables on each observation/individual, as I'm guessing Jack has, the relevant test would be what many of us call the McNemar test of related proportions, about which see "McNemar" under -help symmetry-. (-symmetry-, by its reference to case/control examples, would also lead to confusion for analysts who are not epi/biostat people, but that infelicity is a different matter. I personally use -tab2 var1 var2- followed by -bitesti-, not -symmetry-.)
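
      To make the related-proportions case concrete, here is a minimal sketch. The variable names (likes_dogs, likes_cats) and the discordant-pair counts fed to -bitesti- are hypothetical; substitute your own.

      Code:
      * McNemar-type test of related proportions (hypothetical 0/1 variables)
      symmetry likes_dogs likes_cats
      * or by hand: cross-tabulate, then an exact binomial test on the discordant pairs
      tab2 likes_dogs likes_cats
      * suppose the two discordant cells hold 9 and 21 observations (30 discordant pairs):
      bitesti 30 9 0.5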




      • #4
        Mike Lacy, yes, I was assuming independent groups. Thanks for pointing out that this will not be valid for dependent proportions. One extra detail: note that prtest can handle a conventional data structure (one outcome variable and a group indicator), e.g.,

        Code:
        use http://www.stata-press.com/data/r15/cure
        list
        prtest cure1 == cure2
        // two independent groups SE
        di sqrt(r(se1)^2 + r(se2)^2)
        egen cure = rowmax(cure?)        // combine the two outcome variables into one
        gen group2 = !missing(cure2)     // group indicator: 1 if the observation carried cure2
        prtest cure, by(group2)



        • #5
          To give more information: in group 1, 49 participants selected "yes" and 12 selected "no". In group 2, 37 participants selected "yes" and 30 selected "no". How exactly can I test whether the difference between the two is statistically significant?

          Thanks in advance,
          Jack



          • #6
            Are these dependent or independent groups? That is, are the participants in group 1 the same as those in group 2?



            • #7
              No, the participants are divided at the beginning of the survey, half into a treatment group and half into a control group. The treatment group sees a box with a warning label, and the control group just sees a box. Those in the treatment group "accepted" the box at a lower rate, but I need to see whether that difference is statistically significant.



              • #8
                Then see the example in #4. The following recreates your responses:

                Code:
                set obs `=49+37'                        // 86 "yes" responses in total
                gen group1 = _n < 50                    // first 49 are group 1, remaining 37 are group 2
                gen yes = 1
                set obs `=_N+12+30'                     // append the 42 "no" responses
                replace yes = 0 if missing(yes)
                replace group1 = 1 in -12/l             // last 12 "no" responses belong to group 1
                replace group1 = 0 if missing(group1)   // the other 30 "no" responses are group 2
                lab define response 1 "yes" 0 "no"
                lab define group1 1 "group 1" 0 "group 2"
                lab values yes response
                lab values group1 group1
                tab yes group1
                prtest yes, by(group)                   // group abbreviates group1
                Result:

                Code:
                . tab yes group1
                
                           |        group1
                       yes |   group 2    group 1 |     Total
                -----------+----------------------+----------
                        no |        30         12 |        42
                       yes |        37         49 |        86
                -----------+----------------------+----------
                     Total |        67         61 |       128
                
                .
                . prtest yes, by(group)
                
                Two-sample test of proportions               group 2: Number of obs =       67
                                                             group 1: Number of obs =       61
                ------------------------------------------------------------------------------
                       Group |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     group 2 |   .5522388   .0607504                      .4331702    .6713074
                     group 1 |   .8032787   .0508972                       .703522    .9030354
                -------------+----------------------------------------------------------------
                        diff |  -.2510399   .0792536                     -.4063742   -.0957056
                             |  under Ho:   .0830934    -3.02   0.003
                ------------------------------------------------------------------------------
                        diff = prop(group 2) - prop(group 1)                      z =  -3.0212
                    Ho: diff = 0
                
                    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                 Pr(Z < z) = 0.0013         Pr(|Z| > |z|) = 0.0025          Pr(Z > z) = 0.9987

                So yes, the proportion of "no" responses does differ between the two groups (p = 0.003).
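
                As an aside, if you only have the four cell counts, the immediate form -prtesti- gives the same test without constructing a dataset. A minimal sketch, using the sample sizes and the (rounded) proportions of "yes" implied by the counts in #5; the sign of z flips relative to the output above because group 1 is entered first:

                Code:
                * group 1: 49/61 = .8033 "yes";  group 2: 37/67 = .5522 "yes"
                prtesti 61 .8033 67 .5522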



                • #9
                  That's great. Thanks very much for your help.



                  • #10
                    And notice that the z test from -prtest- is equivalent to the Pearson chi-squared test you can request when you use -tabulate-: z is the square root of the chi-squared statistic, and only the sign differs for this example.

                    Code:
                    tabulate yes group1, chi2
                    display "z = " sqrt(r(chi2)) ", p = " r(p)
                    prtest yes, by(group)
                    Output:
                    Code:
                    . tabulate yes group1, chi2
                    
                               |        group1
                           yes |   group 2    group 1 |     Total
                    -----------+----------------------+----------
                            no |        30         12 |        42
                           yes |        37         49 |        86
                    -----------+----------------------+----------
                         Total |        67         61 |       128
                    
                              Pearson chi2(1) =   9.1275   Pr = 0.003
                    
                    . display "z = " sqrt(r(chi2)) ", p = " r(p)
                    z = 3.0211769, p = .00251794
                    
                    . prtest yes, by(group)
                    
                    Two-sample test of proportions               group 2: Number of obs =       67
                                                                 group 1: Number of obs =       61
                    ------------------------------------------------------------------------------
                           Group |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         group 2 |   .5522388   .0607504                      .4331702    .6713074
                         group 1 |   .8032787   .0508972                       .703522    .9030354
                    -------------+----------------------------------------------------------------
                            diff |  -.2510399   .0792536                     -.4063742   -.0957056
                                 |  under Ho:   .0830934    -3.02   0.003
                    ------------------------------------------------------------------------------
                            diff = prop(group 2) - prop(group 1)                      z =  -3.0212
                        Ho: diff = 0
                    
                        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                     Pr(Z < z) = 0.0013         Pr(|Z| > |z|) = 0.0025          Pr(Z > z) = 0.9987

                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 18.5 (Windows)

