Survey data: Probability weights and adjusting for clustering in a logistic regression (non-random sample)

Shivani Gaiha


28 May 2020, 14:34

Hi everyone,

I am hoping for some timely help with a dataset I am working on. I am struggling with the code, and no one around me is familiar with Stata. Any help would be much appreciated!

Description of the dataset:
4,351 observations

We oversampled e-cigarette users and balanced the sample on age, sex, LGBTQ status, and race/ethnicity. We then created a single weight variable (outwgt) so that the weighted sample matches the population target proportions.

This is not a random sample; it is an online national convenience sample.

We also collected each participant's state (53 states) and region (5 regions). The primary sampling unit is the individual participant (id). There was no multistage cluster sampling, and data collection was not organized by state or region.

The outcome variables are binary, and I would prefer odds ratios as the output.

What I want to do: assess the associations between the outcome variable and my independent variables.

Approach 1.

Steps I took:

1. I set the survey weight and entered the clustering information with the code below:

Code:

svyset id [pweight=outwgt], strata(state) vce(linearized) singleunit(missing) || id, strata(region)

I suspect this code is wrong, because it may be telling Stata that I sampled first by region, then by state, and then by id. I simply want to adjust for clustering, but I think this code tells Stata that the sampling was multistage.

2. Next, I ran a logistic regression:

Code:

svy linearized : logistic covidsymptoms i.totalvape i.agecat i.gender i.lgbtq i.newre i.complysip
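For comparison with step 1, here is a minimal sketch of a one-stage design in which state is the clustering unit (PSU), with no strata and no `||` second stage. It runs on simulated stand-in data: the seed, the values, and the data-generating lines are hypothetical and only mimic the variable names used in this post (id, state, region, outwgt, totalvape, symptoms). Treating state as the clustering unit is an assumption about what "adjusting for clustering" should mean here, not something the post establishes.

```stata
* Simulated stand-in data (hypothetical values; replace with the real survey file)
clear
set seed 2020
set obs 4351
generate id        = _n
generate state     = mod(_n - 1, 53) + 1   // 53 states, every state present
generate region    = ceil(state/11)        // each state falls in one of 5 regions
generate outwgt    = 0.5 + runiform()      // hypothetical weights
generate totalvape = runiform() < 0.3
generate symptoms  = runiform() < invlogit(-1 + 0.5*totalvape)

* One stage only: state is the PSU, outwgt is the pweight, no strata, no "||"
svyset state [pweight=outwgt], vce(linearized)

* Show the design Stata will use for variance estimation
svydescribe

* Odds ratios with design-based (linearized) standard errors
svy: logistic symptoms i.totalvape
display "PSUs used: " e(N_psu) "   strata: " e(N_strata)
```

With this setup the header of the `svy` output should report 53 PSUs and a single stratum, which is a quick way to confirm that Stata sees one stage of clustering and nothing else.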

Issues I need help with:

I get the following message:

Code:

Note: Stage 1 is sampled with replacement; further stages will be ignored for variance estimation.

So I am not sure whether Stata is ignoring the region data altogether.

I would appreciate feedback on Approach 1 and on what it is actually doing.
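Because no finite population correction is given for stage 1, Stata treats the first stage as sampled with replacement and, as the note says, ignores later stages when estimating variances. In Approach 1 that means state acts as a stratification variable for individual PSUs, and region (declared only at stage 2) does not enter the standard errors. The sketch below checks this on hypothetical simulated data with the post's variable names: the standard error from the two-stage `svyset` should match the one from the same design with the second stage dropped.

```stata
* Simulated stand-in data (hypothetical values)
clear
set seed 2020
set obs 4351
generate id        = _n
generate state     = mod(_n - 1, 53) + 1
generate region    = ceil(state/11)
generate outwgt    = 0.5 + runiform()
generate totalvape = runiform() < 0.3
generate symptoms  = runiform() < invlogit(-1 + 0.5*totalvape)

* Design from Approach 1: two stages, stage 2 stratified by region
svyset id [pweight=outwgt], strata(state) vce(linearized) singleunit(missing) || id, strata(region)
svy linearized: logistic symptoms i.totalvape
scalar se_two = _se[1.totalvape]

* Same stage 1, second stage dropped
svyset id [pweight=outwgt], strata(state) vce(linearized) singleunit(missing)
svy linearized: logistic symptoms i.totalvape
scalar se_one = _se[1.totalvape]

* Relative difference should be (essentially) zero: stage 2 is ignored
display reldif(scalar(se_two), scalar(se_one))
```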

Approach 2.

I then changed my approach as follows:

1. I removed the state and region information from the svyset command:

Code:

svyset id [pweight=outwgt]

      pweight: outwgt
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: id
        FPC 1: <zero>

2. Then I ran a logistic regression:

Code:

. logistic symptoms totalvape i.agecat i.gender i.lgbtq i.newre i.complysip, vce(cluster state)

I think this code takes care of both the weight and the state-level clustering, but I am not sure how to add region to this regression.
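One point about Approach 2: `svyset` settings are used only by commands run with the `svy:` prefix, and the `logistic` command as written has no `[pweight=...]`, so it applies the state clustering but not outwgt. The sketch below, again on hypothetical simulated data using the post's variable names, fits the model both ways with the weight included. The coefficients should agree; inference can differ slightly because `svy` uses a t distribution with (number of PSUs minus number of strata) degrees of freedom. It then adds region as an indicator covariate, which is one possible way to bring region in (an assumption, not the only option). Since states nest within regions, clustering on region instead would leave only 5 clusters, which is generally too few for reliable cluster-robust standard errors.

```stata
* Simulated stand-in data (hypothetical values)
clear
set seed 2020
set obs 4351
generate id        = _n
generate state     = mod(_n - 1, 53) + 1
generate region    = ceil(state/11)
generate outwgt    = 0.5 + runiform()
generate totalvape = runiform() < 0.3
generate symptoms  = runiform() < invlogit(-1 + 0.5*totalvape)

* (a) Design-based: weights plus state as a one-stage PSU
svyset state [pweight=outwgt]
svy: logistic symptoms i.totalvape
scalar b_svy = _b[1.totalvape]

* (b) Same weights and clustering passed directly to logistic (no svy prefix)
logistic symptoms i.totalvape [pweight=outwgt], vce(cluster state)
scalar b_cl = _b[1.totalvape]

* Point estimates should agree; display the relative difference
display reldif(scalar(b_svy), scalar(b_cl))

* (c) One option for region: include it as a covariate, keep state clustering
svy: logistic symptoms i.totalvape i.region
```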

Please help.
Shivani