Hi everyone,
I am hoping to get some timely help with a dataset I am working on. I am struggling a bit with the code, and no one around me is familiar with Stata. I would really appreciate any guidance!
Description of the dataset:
- 4,351 observations
- We oversampled e-cigarette users and balanced the sample on age, sex, LGBTQ status, and race/ethnicity. We then created a single weight variable (outwgt) so that the sample matches the population target proportions.
- Not a random sample, but an online national convenience sample.
- We also collected data on the state (53 states) and region (5 regions) of participants. The primary sampling unit is id, i.e., the individual participant. There was no multi-stage sampling or clustered data collection by state or region.
- The outcome variables are binary, and I would prefer odds ratios as the output.
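In case it is useful, here is a minimal sketch of how the structure described above could be checked. It only uses the variable names already mentioned (id, outwgt, state, region), and I have not confirmed that it covers everything a reviewer would want to see.
Code:
* Confirm that id uniquely identifies each observation (one row per participant)
isid id
* Summarize the weight variable to see its range and spot missing values
summarize outwgt
* Compact overview of state and region, including the number of distinct values
codebook state region, compact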
What I want to do: assess associations between the outcome variable and my independent variables.
Approach 1.
Steps I took:
1. I am setting the survey weight and entering cluster information through the code below:
Code:
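svyset id [pweight=outwgt], strata(state) vce(linearized) singleunit(missing) || id, strata(region)
I think this code is incorrect, because it may be telling Stata that I sampled first by region, then by state, and then by id. I simply want to adjust for clustering effects, but I think the code as entered tells Stata that multi-stage sampling took place.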
2. Next, I ran a logistic regression:
Code:
svy linearized : logistic covidsymptoms i.totalvape i.agecat i.gender i.lgbtq i.newre i.complysip
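Issues I need help with:
I get a message that:
Code:
Note: Stage 1 is sampled with replacement; further stages will be ignored for variance estimation.
And so I am not sure whether Stata is ignoring the region data altogether.
I would appreciate feedback on Approach 1 and what it is actually doing.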
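In case it helps with diagnosing Approach 1, here is a minimal sketch of how I understand the declared design can be inspected after running the svyset line above. As far as I know, svyset typed with no arguments redisplays the stored settings, and svydescribe tabulates strata and sampling units. The single option should restrict the table to strata that contain only one sampling unit, which seems relevant because singleunit(missing) is set.
Code:
* Redisplay the survey design that Stata currently has stored
svyset
* Tabulate the strata and the number of sampling units in each
svydescribe
* Show only strata that contain a single sampling unit, if there are any
svydescribe, single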
Approach 2.
I tried to change my approach to the following steps:
1. I removed the state and region information from the svyset command:
Code:
svyset id [pweight=outwgt]
      pweight: outwgt
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: id
        FPC 1: <zero>
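2. And then I ran a logistic regression:
Code:
. logistic symptoms totalvape i.agecat i.gender i.lgbtq i.newre i.complysip, vce(cluster state)
I think this code takes care of both the weight and the state-level clustering, but I am not sure how to add region to this regression.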
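For comparison, below is a sketch of two variants I am considering; I have not confirmed that either is the right approach. As far as I can tell, the logistic command above does not mention outwgt, and the svyset settings are only used when the svy prefix is present, so Variant A passes the weight explicitly. Variant B keeps the svy prefix with the single-stage svyset declaration and enters region as a categorical covariate. Both assume that region is stored as a numeric variable (i.region does not work on a string variable) and that the outcome is covidsymptoms, as in Approach 1.
Code:
* Variant A: weighted logistic regression with state-clustered standard errors
logistic covidsymptoms i.totalvape i.agecat i.gender i.lgbtq i.newre i.complysip i.region [pweight=outwgt], vce(cluster state)
* Variant B: survey-weighted logistic regression using the single-stage svyset declaration
svy: logistic covidsymptoms i.totalvape i.agecat i.gender i.lgbtq i.newre i.complysip i.region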
Please help.
Shivani
