
  • Survey data: Probability weights and adjusting for clustering in a logistic regression (non-random sample)

    Hi everyone,

    I am hoping for some timely help with a dataset I am working on. I am struggling a bit with the code, and no one around me is familiar with Stata. I would really appreciate any guidance!

    Description of the dataset:
    • 4351 Observations
    • We oversampled e-cigarette users and balanced the sample on age, sex, LGBTQ status, and race/ethnicity. We then created one weight variable (outwgt) to match the population target proportions.
    • Not a random sample, but an online national convenience sample.
    • We also collected data on participants' state (53 states) and region (5 regions). The primary sampling unit is id, i.e., the individual participant. There was no multi-stage clustering or data collection by state or region.
    • The outcome variables are binary, and I would prefer odds ratios as the output.

    What do I want to do? Assess associations between the outcome variable and my independent variables.

    Approach 1.

    Steps I took:


    1. I am setting the survey weight and entering cluster information through the code below:

    Code:
    svyset id [pweight=outwgt], strata(state) vce(linearized) singleunit(missing) || id, strata(region)
    I think this code is incorrect, as it may be telling Stata that I sampled first by region, then by state, and then by id. I simply want to adjust for clustering effects, but I think the code I have entered tells Stata that multi-stage sampling took place.

    2. Next I conducted a logistic regression

    Code:
    svy linearized : logistic covidsymptoms i.totalvape i.agecat i.gender i.lgbtq i.newre i.complysip
    Issues I need help with:

    I get the following message:
    Code:
    Note: Stage 1 is sampled with replacement; further stages will be ignored for variance estimation.
    So I am not sure whether Stata is ignoring the region data altogether.
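
    A minimal sketch of one way to check what the declaration from step 1 actually stored, assuming that svyset command has already been run on the data in memory. Because no finite population correction is given for stage 1, Stata treats stage 1 as sampled with replacement, which is why the note says that later stages (here, the part with strata(region)) are not used for variance estimation.

    ```stata
    * Show the survey design settings currently stored with the data
    svyset

    * Tabulate strata and sampling units for the declared design
    svydescribe
    ```

    Both commands only report on the declared design; neither changes the settings or the estimates.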


    Need feedback on Approach 1 and what it is doing.


    Approach 2.


    I tried to change my approach to the following steps:

    1. I removed the state and region information from the svyset command:

    Code:
     svyset id [pweight=outwgt]
    
          pweight: outwgt
              VCE: linearized
      Single unit: missing
         Strata 1: <one>
             SU 1: id
            FPC 1: <zero>
    2. And, then I ran a logistic regression:

    Code:
    . logistic symptoms totalvape i.agecat i.gender i.lgbtq i.newre i.complysip, vce(cluster state)
    I think this code accounts for both the weight and the state-level clustering, but I am not sure how to add region to this regression.
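
    One point worth checking, sketched here under stated assumptions: without the svy: prefix, logistic does not pick up the settings stored by svyset, so the weight would need to be given directly on the command line. The sketch below passes outwgt as a pweight, keeps vce(cluster state), and enters region as a categorical covariate. Treating region as a covariate rather than as a design variable is an assumption about the intended analysis, not something established in this post; the variable names are the ones used in the commands above.

    ```stata
    * Sketch only: weight supplied directly because the svy: prefix is not used
    * Assumption: region is adjusted for as a categorical covariate (i.region)
    logistic symptoms totalvape i.agecat i.gender i.lgbtq i.newre i.complysip i.region [pweight=outwgt], vce(cluster state)
    ```

    With pweights, logistic reports odds ratios with robust standard errors, here clustered on state.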

    Please help.
    Shivani
