Hi experts,
My full sample has about 6,000 participants. About 350 of them were selected from the full sample to undergo an additional medical test, and they form the subsample. In the full sample, the association between the primary predictor X and the outcome Y is significant (p < 0.05) across several adjustment models. In the subsample, however, the association between X and Y is not significant. We suspect this is because the subsample's distributions of age, race, etc. differ from those of the full sample. For example, more than 70% of the full sample is Black, whereas only about half of the subsample is. So I thought we should weight the subsample up to the full sample, to make the subsample resemble the full sample, and we expected the association between X and Y to then be significant in the subsample.
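The difference in composition described above can be checked directly by comparing covariate means in the full sample and in the subsample. Below is a minimal sketch run on simulated data, since the real files are not available; `black`, `age`, and the selection rule are hypothetical stand-ins, not the actual study variables.

```stata
* Hypothetical simulated stand-in for full.dta: 6000 participants, about 70%
* Black. The selection rule (Black participants less likely to be chosen)
* is invented purely for illustration.
clear
set seed 12345
set obs 6000
generate black = runiform() < 0.70
generate age   = rnormal(40, 10)
generate sub   = runiform() < invlogit(-2.3 - 0.8*black)

* Means by group: the sub==1 row is the subsample and the Total row is the
* full sample, so the two can be compared side by side
tabstat black age, by(sub) statistics(mean n)
```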
1. After creating the weights, we included `[pweight=wt]` in the regression models, but the association between X and Y in the subsample is still not significant, even with the upweighting. Do you have any suggestions as to why?
2. Are the weights created correctly (see the code below)? I used pweights rather than fweights because we don't know how the subsample was selected. Is `gen wt = ( obspr / _b[obspr] ) * e(N)` correct, or is anything else wrong with my code?
`full.dta` is the full-sample dataset; it contains an indicator variable `sub` marking whether each participant is in the subsample.
`sub.dta` is the subsample dataset. My Stata code is below. Thanks!
```stata
use "1-data\full.dta", clear
keep if sub==1 // sub is the subsample indicator
save "1-data\sub.dta", replace

use "1-data\full.dta", clear
*predict each participant's probability of being selected for the subsample.
*Only covariates a b c d are used, because the primary predictor X and other
*covariates have too much missing data (checked with "misstable tree")
logistic sub a b c d
predict obspr, p
quietly total obspr // sum of the predicted probabilities
gen wt = ( obspr / _b[obspr] ) * e(N)
*_b[obspr] is the sum of obspr and e(N) is the subsample size, so I think wt = (p / sum of p) * N
*codebook wt if sub // check coverage of the weights (nearly all)
```
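For comparison, here is a minimal sketch of one common alternative, inverse-probability-of-selection weighting, in which each subsample member is weighted by 1/obspr so that participants who were unlikely to be selected count more. Note that the posted `wt` is proportional to `obspr` itself rather than to `1/obspr`. The sketch also displays what `total` actually leaves in `e(N)`. The data are simulated; the selection mechanism, the outcome model for `x` and `y`, and the names `ipw` and `ipw_norm` are all hypothetical.

```stata
* SIMULATED data, a hypothetical stand-in for full.dta. a b c d mirror the
* post's covariates; the selection and outcome models are invented.
clear
set seed 2024
set obs 6000
generate a = rnormal()
generate b = rnormal()
generate c = runiform() < 0.70        // 0/1 covariate, e.g. race
generate d = rnormal()
generate sub = runiform() < invlogit(-2.3 + 0.3*a - 0.8*c)
generate x = rnormal()
generate y = 0.2*x + 0.5*c + rnormal()

* Selection model, as in the post
quietly logistic sub a b c d
predict obspr, pr

* What -total- stores: e(N) is the number of observations summed
* (everyone with nonmissing obspr), not the subsample size
quietly total obspr
display "sum of obspr = " _b[obspr] ",  e(N) = " e(N)

* Inverse-probability-of-selection weights, defined only for the subsample
generate ipw = 1/obspr if sub == 1

* Optional rescaling so the weights sum to the subsample size; with
* pweights a constant rescaling leaves estimates and robust SEs unchanged
quietly count if sub == 1
local nsub = r(N)
quietly summarize ipw
generate ipw_norm = ipw * `nsub' / r(sum)

* Unweighted vs. weighted fits in the subsample
quietly regress y x a b c d if sub == 1
estimates store unweighted
quietly regress y x a b c d if sub == 1 [pweight = ipw_norm]
estimates store weighted
estimates table unweighted weighted, b se p
```

Because a logistic model with an intercept reproduces the observed number of events, the sum of `obspr` is, up to convergence tolerance, the number of selected participants. Comparing the `b` and `se` columns of the final table shows whether weighting moves the point estimate for `x` or mainly widens its standard error.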