Help with Testing Difference of Proportion in an Overlapping Group

Edward Sutanto

Join Date: Jul 2019
Posts: 11

07 Oct 2019, 14:42

Hi Statalist members,

I am having difficulty testing a difference in proportions. I am using a nationally representative dataset (hence the svy commands) and am interested in the use of nicotine products (either cigarettes or e-cigarettes) in school. The group I want to analyze is dual users (those using both cigarettes and e-cigarettes). My aim is to know whether there was a difference in the prevalence of using cigarettes vs e-cigarettes in school among dual users (so I am thinking of using chi-square test).

Using the svy: proportion command, I already have the proportions of in-school use of each product among dual users (cigarettes: 61.47%; e-cigarettes: 95.98%). However, I am unable to test whether these two proportions differ. How could I test this?
Attached below is a fraction of my dataset. Any help would be greatly appreciated. Thank you very much.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Smokers Vapers Smoke_School Vape_School Survey_Weight Unique_ID usergroup)
1 0 1 . 2734.12  1 1
0 1 . 1 3498.22  2 2
1 0 1 .  130.29  3 1
1 0 1 . 1231.22  4 1
1 1 0 1  490.43  5 3
1 1 1 1 2342.11  6 3
0 1 . 0 3453.56  7 2
0 1 . 0  636.78  8 2
1 1 1 1 2956.23  9 3
0 1 . 1 3453.22 10 2
0 1 . 1 2341.19 11 2
0 1 . 0 3458.54 12 2
1 0 0 . 1989.24 13 1
1 0 0 . 2312.33 14 1
1 1 0 1 3432.11 15 3
1 0 0 .  657.45 16 1
1 1 0 0  456.94 17 3
1 0 1 .  357.23 18 1
0 1 . 0  689.23 19 2
1 1 1 1 1423.12 20 3
0 1 . 1 2425.23 21 2
1 0 0 . 2345.11 22 1
0 1 . 1 1632.12 23 2
0 1 . 0 1982.12 24 2
1 1 1 1  267.42 25 3
1 0 0 . 1838.38 26 1
end
label values Smokers var1
label def var1 0 "No", modify
label def var1 1 "Yes", modify
label values Vapers Vapers
label def Vapers 0 "No", modify
label def Vapers 1 "Yes", modify
label values Smoke_School Smoke_Private
label def Smoke_Private 0 "No", modify
label def Smoke_Private 1 "Yes", modify
label values Vape_School Vape_Private
label def Vape_Private 0 "No", modify
label def Vape_Private 1 "Yes", modify
label values usergroup usergroup
label def usergroup 1 "Exclusive Smokers", modify
label def usergroup 2 "Exclusive Vapers", modify
label def usergroup 3 "Dual Users", modify

Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

07 Oct 2019, 16:36

My aim is to know whether there was a difference in the prevalence of using cigarettes vs e-cigarettes in school among dual users (so I am thinking of using chi-square test).

What?? By definition, the prevalence of using cigarettes and the prevalence of using e-cigarettes among dual users is 100%. What am I not understanding here?
Edward Sutanto

Join Date: Jul 2019

Posts: 11
#3

07 Oct 2019, 18:00

Hi Clyde, sorry for the misunderstanding. I am comparing the prevalence of using cigarettes in school vs e-cigarettes in school among dual users. There might be dual users who use only cigarettes in school, only e-cigarettes in school, or both (thus it is not 100%).

I already have each proportion from:
svy: proportion Smoke_School if usergroup==3
svy: proportion Vape_School if usergroup==3

However, I am unable to test their difference-in-proportions. I hope this explanation helps to clear up my question.
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#4

07 Oct 2019, 18:12

Please disregard what I wrote in #2. I see that I misread your post. You are concerned about the prevalence of in-school use of these modalities among dual users (defined by overall use, not just in school). So what you have here is paired data, made more complicated by the complex survey design. You do not provide any variable that looks like an identifier for strata or primary (or other order) sampling units. So I'll assume there are no strata and that the individual Unique_ID is the primary (and only order) sampling unit. If that's not true, modify the code accordingly.

Code:

svyset [pweight = Survey_Weight], psu(Unique_ID)
reshape long @_School, i(Unique_ID) j(_modality) string
encode _modality, gen(modality)
svy, subpop(if usergroup == 3): clogit _School i.modality, group(Unique_ID)

The coefficient of 2.modality will be the log of the Vape:Smoke odds ratio. The closer that log odds ratio is to 0 the more similar the probabilities of in-school vaping and smoking.
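
If an odds ratio is easier to read than its log, the same model can be asked to report exponentiated coefficients. This is a minimal sketch, assuming the data have already been svyset and reshaped as in the code above. The only change is the -or- option, which alters the display and not the fit.

```stata
* Same model as above; -or- displays exp(b), so the 2.modality row
* reads as the Vape:Smoke odds ratio instead of its log
svy, subpop(if usergroup == 3): clogit _School i.modality, group(Unique_ID) or
```

An odds ratio of 1 corresponds to a log odds ratio of 0, so the test reported on the 2.modality row is the test of no difference between in-school vaping and in-school smoking.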

Now, this code does not run in your example data. Once you set aside the missing values and the Unique_IDs who either both Vape and Smoke or neither Vape nor Smoke in school, you are left with only two informative Unique_IDs. Moreover, both of those two Vape but do not Smoke, so the Vape:Smoke odds ratio is infinite and the -clogit- estimation cannot converge. Presumably your real data set is appreciably larger and has enough discordant pairs in different directions that you will not encounter this difficulty there.
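
One way to check for this in the real data is to count the discordant pairs directly. The sketch below is meant to be run on the original wide layout, before the -reshape- above. It is unweighted and intended only as a look at the raw counts, not as an estimate.

```stata
* Run on the original wide data, before the -reshape- above
* Cross-tabulate in-school smoking against in-school vaping among dual users
tabulate Smoke_School Vape_School if usergroup == 3, missing

* Only discordant pairs inform the conditional logit
* Vape in school but do not Smoke in school
count if usergroup == 3 & Smoke_School == 0 & Vape_School == 1
* Smoke in school but do not Vape in school
count if usergroup == 3 & Smoke_School == 1 & Vape_School == 0
```

In the example data these two counts are 2 and 0, which is the situation described above. If either count is zero or very small in the real data, the same convergence problem is likely to appear.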

Added: Crossed with #3.

Last edited by Clyde Schechter; 07 Oct 2019, 18:14.
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#5

07 Oct 2019, 18:17

In response to #3: it is not a good idea to use -svy: proportion whatever if usergroup == 3-. In order for the survey characteristics to be properly accounted for, you must have the entire survey sample available for use, and the -if- condition blocks that. The correct way to do subset analyses in survey data is to use the -subpop()- option in the -svy:- prefix. See what I have done in #4 for an example.
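
Applied to the two commands in #3, that might look like the sketch below. It assumes the original wide layout and the same design as in #4 (no strata, Unique_ID as the sampling unit), which should be adjusted to the real design.

```stata
* Assumed design, as in #4: no strata, individuals as sampling units
svyset [pweight = Survey_Weight], psu(Unique_ID)

* subpop() restricts estimation to dual users without dropping
* the rest of the sample the way an -if- condition would
svy, subpop(if usergroup == 3): proportion Smoke_School
svy, subpop(if usergroup == 3): proportion Vape_School
```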
Edward Sutanto

Join Date: Jul 2019

Posts: 11
#6

08 Oct 2019, 07:13

I see. Thank you very much for the answer!