
  • Use of subpop command in SVY survey regression

    Hello,

    I'm working with survey data to examine the association between, for example, BMI and heart attack outcomes, while accounting for survey design effects (using SVY commands). I noticed in the SVY help files the following quote:

    "Warning: Using if or in restrictions will often not produce correct variance estimates for subpopulations. To compute estimates for subpopulations, use the subpop() option."

    Questions:

    1. I have a set of inclusion/exclusion criteria that I would like to apply to the cohort. Is it OK for me to drop the excluded records from the data set before running my regressions? Or do I have to use the subpop() option?

    Code:
    use "http://www.stata-press.com/data/r13/nhanes2.dta", clear
    svydescribe
    
    generate included=1 if region==3
    replace included=0 if region!=3
    
    /*Do I use the subpop command to apply my exclusion criteria?*/
    svy, subpop(included): logistic heartatk c.bmi i.race i.diabetes c.age
    
    /*Or do I just drop the excluded records, since I'm not interested in analyzing these as a sub-population, but rather as a final study cohort with exclusion criteria applied?*/
    svy: logistic heartatk c.bmi i.race i.diabetes c.age if included==1

    2. Also, for interaction terms, is it valid to use the "##" operator (and the lincom command) with the SVY prefix?

    Code:
    svy: logistic heartatk c.bmi##i.female i.race i.diabetes c.age if included==1
    
    /*odds ratios for effect of BMI on heart attack, for men*/
    lincom _b[c.bmi]+_b[c.bmi#0.female], or
    
    /*odds ratios for effect of BMI on heart attack, for women*/
    lincom _b[c.bmi]+_b[c.bmi#1.female], or

    3. Finally, if I want stratified estimates, then I should be using the subpop option (to include the entire sample in the variance calculations), correct?

    Code:
    generate male=1 if female==0
    replace male=0 if female==1
    
    /**********effect of BMI on heart attack, for MEN**********/
    
    /*use subpop?*/
    svy, subpop(male): logistic heartatk c.bmi i.race i.diabetes c.age
    
    /*but don't use "if" restrictions?*/
    svy: logistic heartatk c.bmi i.race i.diabetes c.age if female==0
    
    
    /**********effect of BMI on heart attack, for WOMEN**********/
    
    /*use subpop?*/
    svy, subpop(female): logistic heartatk c.bmi i.race i.diabetes c.age
    
    /*but don't use "if" restrictions?*/
    svy: logistic heartatk c.bmi i.race i.diabetes c.age if female==1

    Thanks

    ALL CODE TOGETHER:

    Code:
    use "http://www.stata-press.com/data/r13/nhanes2.dta", clear
    svydescribe
    
    /*QUESTION 1*/
    
    generate included=1 if region==3
    replace included=0 if region!=3
    
    /*Do I use the subpop command to apply my exclusion criteria?*/
    svy, subpop(included): logistic heartatk c.bmi i.race i.diabetes c.age
    
    /*Or do I just drop the excluded records, since I'm not interested in analyzing these as a sub-population, but rather as a final study cohort with exclusion criteria applied?*/
    svy: logistic heartatk c.bmi i.race i.diabetes c.age if included==1
    
    /*QUESTION 2*/
    
    svy: logistic heartatk c.bmi##i.female i.race i.diabetes c.age if included==1
    
    /*odds ratios for effect of BMI on heart attack, for men*/
    lincom _b[c.bmi]+_b[c.bmi#0.female], or
    
    /*odds ratios for effect of BMI on heart attack, for women*/
    lincom _b[c.bmi]+_b[c.bmi#1.female], or
    
    /*QUESTION 3*/
    
    generate male=1 if female==0
    replace male=0 if female==1
    
    /**********effect of BMI on heart attack, for MEN**********/
    
    /*use subpop?*/
    svy, subpop(male): logistic heartatk c.bmi i.race i.diabetes c.age
    
    /*but don't use "if" restrictions?*/
    svy: logistic heartatk c.bmi i.race i.diabetes c.age if female==0
    
    
    /**********effect of BMI on heart attack, for WOMEN**********/
    
    /*use subpop?*/
    svy, subpop(female): logistic heartatk c.bmi i.race i.diabetes c.age
    
    /*but don't use "if" restrictions?*/
    svy: logistic heartatk c.bmi i.race i.diabetes c.age if female==1
    Last edited by Jenny Williams; 03 Sep 2017, 23:00.

  • #2
    You shouldn't drop cases; use subpop. UCLA had/has a FAQ on this, but it seems to have been moved, so instead see p. 3 of

    https://www3.nd.edu/~rwilliam/stats3/SvyCautionsX.pdf

    Way back when, Austin Nichols posted on how you could drop cases, but I don't have the link handy.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Here is Austin's post on how you can draw a subsample and get accurate results:

      https://www.stata.com/statalist/arch.../msg00810.html

      It seems pretty complicated though. Unless the full data set is absolutely monstrous, I would use the subpop option.
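
      A minimal sketch of how the two approaches could be compared side by side, assuming the nhanes2 data from the first post (which is already svyset). The indicator insouth and the stored-estimate names est_subpop and est_if are made up for illustration. The point estimates should match; whether the standard errors differ depends on whether the if restriction removes whole PSUs or strata from the design.

      Code:
      use "http://www.stata-press.com/data/r13/nhanes2.dta", clear
      
      /*0/1 indicator with no missing values (hypothetical name)*/
      generate byte insouth = (region == 3)
      
      /*subpop(): every observation stays in the design for the variance calculation*/
      svy, subpop(insouth): logistic heartatk c.bmi i.race i.diabetes c.age
      estimates store est_subpop
      
      /*if: excluded observations are dropped before the variance is computed*/
      svy: logistic heartatk c.bmi i.race i.diabetes c.age if insouth == 1
      estimates store est_if
      
      /*coefficients and standard errors for both fits in one table*/
      estimates table est_subpop est_if, b se

      If the two columns of standard errors agree, the restriction happened to leave the design intact in this sample. That is not guaranteed in general, which is the reason for the warning in the help file.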
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Hello Professor Williams,

        Thanks for your response and for the links! Appreciated. A few follow up questions:

        1. If my data contains bootstrap replicate weights, and if I am using these to calculate standard errors, then does it matter if I use "keep if" versus "subpop"? In other words, does the subpop option only become necessary if I am svysetting my data with PSU and strata variable (and not just weights)?

        2. My survey data contains missing information for some covariates included in the regression models (e.g., age is missing for some individuals). Thus, when running a "svy: logistic outcomeY exposureX age" model, some records are naturally dropped from the models.

        Do I have to explicitly specify non-missing age, and then use this in the subpop option? (see code below)

        Or, do the SVY commands automatically use the full sample in the standard error calculations, including both missing and non-missing age records, and thus I can let the models delete missing age as per the usual listwise deletion default?

        If the latter is the case, then how is this any different than specifying a sub-sample of dropped records using "keep if"???

        Code:
        use "http://www.stata-press.com/data/r13/nhanes2.dta", clear
        
        generate randomsort=runiform()
        sort randomsort
        replace age=. in 1/1000
        
        /*Do I have to explicitly specify non-missing age, and then use this in the subpop option?*/
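        /*0/1 indicator: a missing value in the subpop() variable would itself drop the record from the estimation sample*/
        generate byte include = !missing(age)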
        svy, subpop(include): logistic heartatk c.bmi i.race i.diabetes c.age
        
        /*Or, do the SVY commands automatically use the full sample in the standard error calculations, including both missing and non-missing age records, and thus I can let the models delete missing age as per the usual listwise deletion default?*/
        svy: logistic heartatk c.bmi i.race i.diabetes c.age
        
        /*If the latter is the case, then how is this any different than specifying a sub-sample of dropped records using "keep if"???*/
        svy: logistic heartatk c.bmi i.race i.diabetes c.age if age!=.
        Last edited by Jenny Williams; 04 Sep 2017, 20:39.



        • #5
          This is when we could use Austin Nichols back on the list. But my guess is that (a) you should not use keep if; use subpop, and (b) Stata can figure out how to handle missing data. You do not need to explicitly exclude it.

          Anyone who knows better can feel free to correct me.
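
          One way to check guess (b) empirically, sketched under the assumption that the nhanes2 data and the model from post #4 are used. The variable names u and complete and the seed are made up for illustration. The sketch fits the model once with an explicit complete-case subpopulation and once with default listwise deletion, then prints the stored design counts so they can be compared.

          Code:
          use "http://www.stata-press.com/data/r13/nhanes2.dta", clear
          
          /*knock out age for 1,000 random records, as in post #4*/
          set seed 12345
          generate double u = runiform()
          sort u
          replace age = . in 1/1000
          
          /*0/1 flag for records with no missing model variables (hypothetical name)*/
          generate byte complete = !missing(heartatk, bmi, race, diabetes, age)
          
          /*explicit complete-case subpopulation*/
          svy, subpop(complete): logistic heartatk c.bmi i.race i.diabetes c.age
          display "subpop:   N=" e(N) " N_sub=" e(N_sub) " PSUs=" e(N_psu) " strata=" e(N_strata) " df=" e(df_r)
          
          /*default handling of missing values*/
          svy: logistic heartatk c.bmi i.race i.diabetes c.age
          display "listwise: N=" e(N) " PSUs=" e(N_psu) " strata=" e(N_strata) " df=" e(df_r)

          If the PSU and strata counts and the standard errors match across the two fits, the missing values did not change the design in this sample. If they differ, the explicit subpop() version is the one that keeps the full design in the variance calculation.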
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Thanks for the help!
