
  • Help with Testing Difference of Proportion in an Overlapping Group

    Hi Statalist members,

    I am having difficulty testing a difference in proportions. I am using a nationally representative dataset (hence the svy commands) and am interested in the use of nicotine products (cigarettes or e-cigarettes) in school. The group I want to analyze is dual users (those who use both cigarettes and e-cigarettes). My aim is to know whether there was a difference in the prevalence of using cigarettes vs e-cigarettes in school among dual users (so I am thinking of using chi-square test).

    Using the svy: proportion command, I have already estimated the proportion of dual users who use each product in school (cigarettes: 61.47%; e-cigarettes: 95.98%). However, I am unable to test whether these two proportions differ. How could I test this?
    Attached below is a fraction of my dataset. Any help would be greatly appreciated. Thank you very much.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(Smokers Vapers Smoke_School Vape_School Survey_Weight Unique_ID usergroup)
    1 0 1 . 2734.12  1 1
    0 1 . 1 3498.22  2 2
    1 0 1 .  130.29  3 1
    1 0 1 . 1231.22  4 1
    1 1 0 1  490.43  5 3
    1 1 1 1 2342.11  6 3
    0 1 . 0 3453.56  7 2
    0 1 . 0  636.78  8 2
    1 1 1 1 2956.23  9 3
    0 1 . 1 3453.22 10 2
    0 1 . 1 2341.19 11 2
    0 1 . 0 3458.54 12 2
    1 0 0 . 1989.24 13 1
    1 0 0 . 2312.33 14 1
    1 1 0 1 3432.11 15 3
    1 0 0 .  657.45 16 1
    1 1 0 0  456.94 17 3
    1 0 1 .  357.23 18 1
    0 1 . 0  689.23 19 2
    1 1 1 1 1423.12 20 3
    0 1 . 1 2425.23 21 2
    1 0 0 . 2345.11 22 1
    0 1 . 1 1632.12 23 2
    0 1 . 0 1982.12 24 2
    1 1 1 1  267.42 25 3
    1 0 0 . 1838.38 26 1
    end
    label values Smokers var1
    label def var1 0 "No", modify
    label def var1 1 "Yes", modify
    label values Vapers Vapers
    label def Vapers 0 "No", modify
    label def Vapers 1 "Yes", modify
    label values Smoke_School Smoke_Private
    label def Smoke_Private 0 "No", modify
    label def Smoke_Private 1 "Yes", modify
    label values Vape_School Vape_Private
    label def Vape_Private 0 "No", modify
    label def Vape_Private 1 "Yes", modify
    label values usergroup usergroup
    label def usergroup 1 "Exclusive Smokers", modify
    label def usergroup 2 "Exclusive Vapers", modify
    label def usergroup 3 "Dual Users", modify

  • #2
    "My aim is to know whether there was a difference in the prevalence of using cigarettes vs e-cigarettes in school among dual users (so I am thinking of using chi-square test)."
    What?? By definition, the prevalence of using cigarettes and the prevalence of using e-cigarettes among dual users is 100%. What am I not understanding here?



    • #3
      Hi Clyde, sorry for the misunderstanding. I am comparing the prevalence of using cigarettes in school with the prevalence of using e-cigarettes in school among dual users. Some dual users may use only cigarettes in school, some only e-cigarettes, and some both (so neither prevalence is 100%).

      I already have each proportion from:
      svy: proportion Smoke_School if usergroup==3
      svy: proportion Vape_School if usergroup==3

      However, I am unable to test the difference between these two proportions. I hope this explanation clears up my question.



      • #4
        Please disregard what I wrote in #2. I see that I misread your post. You are concerned about the prevalence of in-school use of these modalities among dual users (defined by overall use, not just in-school use). So what you have here is paired data, and it is made more complicated because it comes from a complex survey design. You do not provide any variable that looks like an identifier for strata or for primary (or other order) sampling units. So I'll assume there are no strata and that the individual Unique_ID is the primary (and only order) sampling unit. If that's not true, modify the code accordingly.

        Code:
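        * Declare the survey design: probability weights, Unique_ID as the PSU
        svyset [pweight = Survey_Weight], psu(Unique_ID)

        * Go long: one row per person per product; _modality is "Smoke" or "Vape"
        reshape long @_School, i(Unique_ID) j(_modality) string

        * Numeric version of _modality (alphabetical order: 1 = Smoke, 2 = Vape)
        encode _modality, gen(modality)

        * Conditional logit within person, restricted to dual users via subpop()
        svy, subpop(if usergroup == 3): clogit _School i.modality, group(Unique_ID)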
        The coefficient of 2.modality will be the log of the Vape:Smoke odds ratio. The closer that log odds ratio is to 0, the more similar the probabilities of in-school vaping and smoking.
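
        To read the estimate as an odds ratio instead of a log odds ratio, one option is the -or- reporting option of -clogit-. This is only a sketch: it assumes the -svyset-, -reshape-, and -encode- steps above have already been run, and that the real data contain discordant pairs in both directions so that the model converges.

        ```stata
        * Assumes the data are already svyset, reshaped long, and encoded as above
        * Same model as before; -or- only changes how the results are displayed
        svy, subpop(if usergroup == 3): clogit _School i.modality, group(Unique_ID) or
        ```

        An odds ratio near 1 for 2.modality corresponds to a log odds ratio near 0.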

        Now, this code does not run in your example data. The reason for that is that between missing values, and Unique_IDs where they either both Vape and Smoke or neither Vape nor Smoke in school, you are left with only two informative Unique_IDs. Moreover, both of those two Vape but do not Smoke, so the Vape:Smoke odds ratio is infinite and the -clogit- estimation cannot converge. Presumably your real data set is appreciably larger and has enough discordant pairs in different directions that you will not encounter this difficulty there.
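
        The number of informative pairs can be checked directly in wide format. The sketch below re-enters only the seven dual users from the example in #1. It is unweighted, so it is a diagnostic of how many discordant pairs there are, not a survey estimate.

        ```stata
        * Sketch only: the seven dual users (usergroup == 3) from the example in #1
        clear
        input float(Smoke_School Vape_School Unique_ID usergroup)
        0 1  5 3
        1 1  6 3
        1 1  9 3
        0 1 15 3
        0 0 17 3
        1 1 20 3
        1 1 25 3
        end

        * Unweighted cross-tabulation: the off-diagonal cells are the discordant pairs
        tabulate Smoke_School Vape_School if usergroup == 3

        * Discordant pairs in each direction
        count if usergroup == 3 & Smoke_School == 0 & Vape_School == 1
        count if usergroup == 3 & Smoke_School == 1 & Vape_School == 0
        ```

        In the example, the first count is 2 (Unique_IDs 5 and 15) and the second is 0, which is why the estimated odds ratio is infinite.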

        Added: Crossed with #3.
        Last edited by Clyde Schechter; 07 Oct 2019, 18:14.



        • #5
          In response to #3: it is not a good idea to use -svy: proportion whatever if usergroup == 3-. In order for the survey characteristics to be properly accounted for, you must have the entire survey sample available for use, and the -if- condition blocks that. The correct way to do subset analyses in survey data is to use the -subpop()- option in the -svy:- prefix. See what I have done in #4 for an example.
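
          Applied to the two commands in #3, that could look like the sketch below. It assumes the original wide-format data from #1 are in memory (that is, before the -reshape- in #4) and, as in #4, that there are no strata and Unique_ID is the only sampling unit.

          ```stata
          * Assumes the wide-format data from #1 are in memory (before -reshape-)
          * Survey design as assumed in #4: no strata, Unique_ID as the PSU
          svyset [pweight = Survey_Weight], psu(Unique_ID)

          * Dual users as a subpopulation, keeping the full sample for variance estimation
          svy, subpop(if usergroup == 3): proportion Smoke_School
          svy, subpop(if usergroup == 3): proportion Vape_School
          ```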



          • #6
            I see. Thank you very much for the answer!
