  • Weighting subsample to full sample- using STATA-logistic

    Hi experts,

    My full sample has ~6000 participants and the subsample has ~350. The ~350 participants were chosen from the full sample to undergo an additional medical test. In the full sample, the association between the primary predictor X and the outcome Y is significant across different adjustment models (p<0.05). In the subsample, however, it is not. We thought this was because the subsample has different age, race, and other covariate distributions than the full sample. For example, over 70% of the full sample is Black, compared with almost half of the subsample. So I thought we should "up"-weight the subsample to make it resemble the full sample, and we expected the association between X and Y to become significant in the subsample.

    1. After creating the weights, we include `[pweight=wt]` in the regression models, but the association between Y and X in the subsample is still not significant, even after upweighting. Can you suggest why it is still not significant?

    2. Is the weight created correctly (see code below)? I use pweights instead of fweights because we don't know how the subsample was selected. Is `gen wt = ( obspr / _b[obspr] ) * e(N)` correct, or is something wrong with my code?

    `full.dta` is the full-sample dataset, with the "sub" indicator showing whether each participant is in the subsample.
    `sub.dta` is the subsample dataset. Below is my Stata code. Thanks!


    use "1-data\full.dta", clear
    keep if sub==1 // sub is the indicator of subsample
    save "1-data\sub.dta", replace

    use "1-data\full.dta", clear
    * predict the probability of being selected for the subsample
    logistic sub a b c d
    * only covariates a b c d are used as predictors, because the primary predictor X and the other
    * covariates have too many missing values (judging by the missing-data patterns from "misstable tree")

    predict obspr , p
    quietly total obspr // get sum of the probs
    gen wt = ( obspr / _b[obspr] ) * e(N)
    *_b[obspr] is the sum of obspr, e(N) is subsample size, I think wt=(p/sum of p)*N

    *codebook wt if sub // check coverage of the weighting probs (nearly all)






  • #2
    Cross-posted from http://stackoverflow.com/questions/4...tic-regression

    I did suggest there that you post here, but you are also asked to tell us about cross-posting. http://www.statalist.org/forums/help#crossposting



    • #3
      Originally posted by Nick Cox
      Cross-posted from http://stackoverflow.com/questions/4...tic-regression

      I did suggest there that you post here, but you are also asked to tell us about cross-posting. http://www.statalist.org/forums/help#crossposting

      Thanks for the reminder, Nick!



      • #4
        The pweights may get the relative weights (and hence the point estimates) right, but the point remains that the subsample only has 350 cases while the full sample has 6,000 cases. So, a much smaller sample size alone could account for the lack of statistical significance. How do the actual estimates compare between full and subsample? If more or less similar, this could reinforce the idea that differences in sample size are what is critical here, rather than differences in the relationship between X and Y.
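One way to make this comparison concrete is to fit the same model in both samples and set the coefficients and standard errors side by side. The sketch below is only illustrative: `y` and `x` are placeholder names (the thread refers to them only as Y and X), while `sub` and the covariates `a b c d` are taken from the original post. If the variability is similar in both samples, standard errors scale roughly with 1/sqrt(n), so going from about 6,000 to about 350 cases would inflate them by a factor of roughly 4 even with identical point estimates.

```stata
use "1-data\full.dta", clear

* Same model in the full sample and in the subsample
* (y and x are placeholder names for the outcome Y and predictor X)
logit y x a b c d
estimates store full

logit y x a b c d if sub == 1
estimates store subsample

* Coefficients and standard errors side by side, with sample sizes
estimates table full subsample, b(%9.3f) se(%9.3f) stats(N)

* Rough inflation of standard errors expected from sample size alone
display sqrt(6000/350)
```

If the two columns of coefficients are close and only the standard errors differ by about that factor, sample size alone is a plausible explanation for the loss of significance.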

        Further, you say you don't know how the subsample was selected. So the relationship between X and Y may be different for it than it is for the full sample. Who knows, maybe these cases were selected precisely because X and Y did not seem to be related for them.

        I haven't checked your weight coding. But I suspect the above points may be the most critical ones.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)
        WWW: https://academicweb.nd.edu/~rwilliam/



        • #5
          Originally posted by Richard Williams
          The pweights may get the relative weights (and hence the point estimates) right, but the point remains that the subsample only has 350 cases while the full sample has 6,000 cases. So, a much smaller sample size alone could account for the lack of statistical significance. How do the actual estimates compare between full and subsample? If more or less similar, this could reinforce the idea that differences in sample size are what is critical here, rather than differences in the relationship between X and Y.

          Further, you say you don't know how the subsample was selected. So the relationship between X and Y may be different for it than it is for the full sample. Who knows, maybe these cases were selected precisely because X and Y did not seem to be related for them.

          I haven't checked your weight coding. But I suspect the above points may be the most critical ones.

          Thanks for your reply, Richard. I agree that with such a small sample size there is no guarantee of getting the same significant results as in the full sample. And yes, the estimates are similar between the full sample and the subsample; only the significance changed. Could you also look at my weight coding? I'm curious whether I coded it correctly. Thanks again!



          • #6
            Well, you should know from your full sample what the % black should be, as well as the values of whatever other variables you think you need to adjust for. So, I would do something like

            svy: mean black

            or maybe something like

            svy: tabulate race

            or maybe

            svy: tabulate race gender

            If it doesn't look right, you can go over your code more carefully. And even if your code did look right, you should do something like this to confirm that there isn't some problem you overlooked.
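The `svy:` prefix only works once the data have been `svyset`. A minimal sketch of the check follows, meant to be run right after the weight-construction code in the original post, with the full data, `obspr`, and `wt` still in memory. The 0/1 indicator `black` is a hypothetical variable name, and the inverse-probability weight `ipw` is an assumption added for comparison, not something proposed in the thread. Two things may be worth keeping in mind when reading the output: after `total obspr` on the full data, `e(N)` is the number of observations used by that command rather than the subsample size, and `wt` as coded is proportional to `obspr`, so it gives larger weights to participants who were more likely to be selected.

```stata
* Target: unweighted composition of the full sample
mean black

* Weighted composition of the subsample using wt from the original post
svyset [pweight = wt]
svy, subpop(sub): mean black

* For comparison: weights equal to the inverse of the predicted
* selection probability (an assumed alternative, not from the thread)
generate ipw = 1/obspr
svyset [pweight = ipw]
svy, subpop(sub): mean black
```

Whichever weighted subsample mean lands closer to the full-sample mean indicates which weight is doing the intended job; the same comparison can be repeated with `svy, subpop(sub): tabulate race`.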