
  • Complex survey design with repeated waves of study data svyset misspecification

    Dear Statalist members,

    I am using the National Health and Aging Trends Study (NHATS) to evaluate the number of hospitalizations per person per year in a subpopulation of persons with dementia and multiple chronic conditions who are followed over 3 waves/years of data (round). Follow-up is one interview per year. My goal is to fit a Poisson model, either a multilevel/hierarchical model or a GEE.
    Based on the NHATS technical papers 1 & 2, Round 1 (2011) of NHATS used a stratified three-stage sampling design:
    1. 95 PSUs, which are counties or groups of counties (selected with probability proportional to size).
    2. At the second stage, 655 SSUs within the sampled PSUs. These are ZIP clusters from a sampling frame constructed from a 20% subsample of persons enrolled in Medicare as of Sept 2010 who resided in the 95 PSUs sampled for NHATS. ZIP clusters were also sampled with probability proportional to size.
    3. The final stage was the selection of beneficiaries within each sampled ZIP cluster/SSU who were age 65 and older, with oversamples of the oldest age groups and of Black non-Hispanic persons. The probabilities of selection at each of the three stages were designed to yield equal probability samples and targeted sample sizes by age group and race/ethnicity. The number of people (SPID) included is 8,245. My subpopulation is 909.
    NHATS provided me this svyset code:

    svyset w1varunit [pweight=w1anfinwgt0], strat(w1varstrat)

    The stratum variable (w1varstrat) and the cluster variable (w1varunit) allow users to compute variance estimates using Taylor series linearization.
    w1anfinwgt0 = sampling analytic weight
    However, when I run the code below, I see that the waves are not taken into account.

    svyset w1varunit [pweight=w1anfinwgt0], strat(w1varstrat)

    svy linearized, subpop(variable) : mepoisson dependent variable b1.i.independent variable covariates, irr



    Output: with survey weights and strata
    Survey: Poisson regression (running mepoisson on estimation sample)
    Number of strata = 56 Number of obs = 23,140
    Number of PSUs = 112 Population size = 105,193,875.00
    Subpop. no. obs = 1,927
    Subpop. size = 6,939,508.90
    Design df = 56
    F( 22, 35) = 14.57
    Prob > F =
    Using just the subsample, I know my results should look more like this:

    Output on subpopulation unweighted:
    GEE population-averaged model          Number of obs     =  1,927
    Group variable: spid                   Number of groups  =    909
    Link: log                              Obs per group:
    Family: Poisson                                      min =      1
    Correlation: exchangeable                            avg =    2.1
                                                         max =      3
                                           Wald chi2(22)     = 331.34
    Scale parameter: 1.3196                Prob > chi2       =
    I have tried to modify the svyset command, but I have been unsuccessful. If anyone knows how to incorporate my repeated measures of SPID, I would be extremely grateful.

    This is other code I have tried that didn't work:

    svyset w1varstrat [pw = w1anfinwgt0], strata(w1varunit)|| round, strata(spid),
    Note: Stage 1 is sampled with replacement; further stages will be ignored for variance estimation.

    pweight: w1anfinwgt0
    VCE: linearized
    Single unit: missing
    Strata 1: w1varunit
    SU 1: w1varstrat
    FPC 1: <zero>

    . svydes

    Survey: Describing stage 1 sampling units

    pweight: w1anfinwgt0
    VCE: linearized
    Single unit: missing
    Strata 1: w1varunit
    SU 1: w1varstrat
    FPC 1: <zero>

                                      #Obs per Unit
                               --------------------------
     Stratum   #Units     #Obs      min     mean      max
    -------- -------- -------- -------- -------- --------
           1       56   12,633       42    225.6      381
           2       56   12,102       39    216.1      336
    -------- -------- -------- -------- -------- --------
           2      112   24,735       39    220.8      381
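
    A quick way to check the design roles is to declare the design as in the NHATS-supplied command and then describe it. The svydes table above reports 2 strata with 56 units each, which is what one would expect when the stratum and unit variables trade places, whereas the first estimation output reported 56 strata and 112 PSUs. A minimal sketch, assuming the NHATS-supplied roles (w1varstrat as stratum, w1varunit as cluster) are the intended ones:

    ```stata
    * Declare the design with the roles from the NHATS-supplied command
    * (assumption: w1varstrat is the stratum, w1varunit the sampling unit)
    svyset w1varunit [pweight=w1anfinwgt0], strata(w1varstrat)

    * Tabulate strata and sampling units; compare the counts with the table above
    svydescribe
    ```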




  • #2
    If you are using a subsample, you need to use the subpop option. Your code shows a generic version, so I can't tell exactly what you did. Used correctly, the subpop option should produce a set of cases akin to what you find in the unweighted analysis, but read the discussion of how it works carefully, as it is a bit tricky.
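
    As a rough illustration of why the header looks different: with subpop(), the "Number of obs" line counts the whole estimation sample (23,140 in the first output), while "Subpop. no. obs" counts the subpopulation (1,927, the same figure as "Number of obs" in the unweighted GEE). The sketch below contrasts subpop() with an if qualifier. The names insubpop, y, and x are hypothetical placeholders, not NHATS variables.

    ```stata
    * Sketch only: insubpop is a hypothetical 0/1 subpopulation indicator,
    * y and x are hypothetical outcome and covariate names.
    svyset w1varunit [pweight=w1anfinwgt0], strata(w1varstrat)

    * subpop() keeps every observation for variance estimation;
    * the estimates refer to cases with insubpop == 1.
    svy linearized, subpop(insubpop): poisson y x, irr

    * Not equivalent: restricting with -if- drops cases before the
    * design-based variance is computed.
    svy linearized: poisson y x if insubpop == 1, irr
    ```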

    Apparently the people who released the NHATS data set think that a single set of weights is appropriate. Aside from that, I don't think the svyset statement is where you want to deal with repeated measures; you have to do that in the setup for mepoisson by specifying either random or fixed effects and a correlation structure if you are doing the former. The help file for mepoisson deals with "multilevel models", e.g. patients nested in hospitals measured on a single occasion, but repeated measures nested in patients is a special case. I assume that you can use it for longitudinal data but please verify that independently.
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago



    • #3
      Dear Richard,

      Thank you so much for your help and for throwing me a lifeline! I will try it out asap.

      Have a wonderful day!

      Best wishes,
      Janet



      Janet MacNeil Vroomen, Ph.D.
      Post-doctorate Associate
      Program on Aging
      Yale University



      • #4
        Dear Statalist or Prof Richard Campbell,

        I am using Stata 14. When I tried the NHATS code, I got this message from Stata:

        Syntax:
        svyset w1varunit [pweight=w1anfinwgt0], strat(w1varstrat)

        svy linearized, subpop(possprobge2mcc) : mepoisson n_edhosp b2.i.newdemclas_r b1.i.cgdmchf diab_complicnew chf_complicnew n_mcc number_caregivers ///
        i.livealone i.dr1dgender i.del1higstschl i.r1d2intvrage i.r1bmi3cat i.drl1dracehisp || spid:

        Error message

        (running mepoisson on estimation sample)
        survey final weights not allowed with multilevel models;
        a final weight variable was svyset using the [pw=exp] syntax, but multilevel models require that each stage-level weight variable is svyset using the stage's corresponding weight() option
        an error occurred when svy executed mepoisson
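
        The error message asks for one weight per sampling stage, supplied through each stage's weight() option, instead of a single final weight in [pw=exp]. The sketch below only illustrates that syntax. The weights wt_person and wt_round are hypothetical (NHATS supplies a single final weight in this thread), and the sketch leaves out the w1varunit clustering, so it should not be read as a recommended NHATS design.

        ```stata
        * Sketch only: wt_person and wt_round are hypothetical stage-level weights.
        * Stage 1 units are persons (spid); stage 2 units are the observations (_n),
        * here the yearly rounds within a person.
        svyset spid, strata(w1varstrat) weight(wt_person) || _n, weight(wt_round)

        * The model's grouping level (spid) lines up with the first svyset stage
        svy linearized, subpop(possprobge2mcc): mepoisson n_hospstay i.cgdmchf || spid:
        ```

        How a single final weight should be split across stages, and whether the geographic PSU needs its own level, are design questions to check against the NHATS documentation.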

        So, for now, I removed the pweight from the svyset command, and this is what was displayed:

        Syntax:
        svyset w1varunit, strat(w1varstrat)
        svy linearized, subpop(possprobge2mcc) : mepoisson n_hospstay i.cgdmchf || spid:

        Output:
        (running mepoisson on estimation sample)

        Survey: Mixed-effects Poisson regression

        Number of strata = 56 Number of obs = 23,832
        Number of PSUs = 112 Population size = 23,832
        Subpop. no. obs = 2,115
        Subpop. size = 2,115
        Design df = 56
        F( 7, 50) = 25.24
        Prob > F = 0.0000

        ------------------------------------------------------------------------------
                     |             Linearized
          n_hospstay |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                     |
             cgdmchf |
                  2  |   1.309829   .2622466     4.99   0.000     .7844855    1.835172
                  3  |   .3557816   .2825009     1.26   0.213    -.2101357    .9216988
                  4  |   1.803099    .242885     7.42   0.000     1.316541    2.289656
                  5  |   .8204583   .2082461     3.94   0.000     .4032913    1.237625
                  6  |   1.471425   .2075564     7.09   0.000     1.055639     1.88721
                  7  |   .9253856    .223169     4.15   0.000     .4783243    1.372447
                  8  |   1.802185   .2066184     8.72   0.000     1.388278    2.216091
                     |
               _cons |  -1.876387    .192247    -9.76   0.000    -2.261504    -1.49127
        -------------+----------------------------------------------------------------
        spid         |
           var(_cons)|   .6653813      .0704                      .5382964    .8224694
        ------------------------------------------------------------------------------


        The problem is that I don't see the repeated measures. I have three rounds of data, so my population should be 909, and I would hope to see observations per group with min=1 and max=3. I know I am doing something wrong.
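
        The svy header above lists strata, PSUs, and subpopulation counts but has no lines for groups or observations per group, although the spid var(_cons) row shows that the spid grouping entered the model. One way to see the panel structure directly is to count persons and rounds inside the subpopulation. A minimal sketch, assuming possprobge2mcc is coded 1 for subpopulation members; n_rounds and one_per_person are hypothetical helper names:

        ```stata
        * Sketch only: count persons and rounds per person in the subpopulation.
        preserve
        keep if possprobge2mcc == 1

        * Rounds contributed by each person
        bysort spid: generate n_rounds = _N

        * Flag exactly one row per person
        egen byte one_per_person = tag(spid)

        * Number of distinct persons in the subpopulation
        count if one_per_person

        * Minimum, mean, and maximum rounds per person
        summarize n_rounds if one_per_person
        restore
        ```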

        I also tried the below syntax but I don't see the repeated measures either.

        syntax:
        svy linearized, subpop(possprobge2mcc) : mepoisson n_hospstay i.cgdmchf || spid: || round:
        (running mepoisson on estimation sample)

        Survey: Mixed-effects Poisson regression

        Number of strata = 56 Number of obs = 23,832
        Number of PSUs = 112 Population size = 23,832
        Subpop. no. obs = 2,115
        Subpop. size = 2,115
        Design df = 56
        F( 7, 50) = 26.02
        Prob > F = 0.0000

        ------------------------------------------------------------------------------
                     |             Linearized
          n_hospstay |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                     |
             cgdmchf |
                  2  |   1.323356    .258098     5.13   0.000     .8063233    1.840388
                  3  |   .3109146   .2746337     1.13   0.262    -.2392428    .8610719
                  4  |   1.760982   .2323047     7.58   0.000      1.29562    2.226344
                  5  |    .816739   .2066508     3.95   0.000     .4027678     1.23071
                  6  |   1.522114   .2019334     7.54   0.000     1.117593    1.926635
                  7  |   .9643293   .2210411     4.36   0.000     .5215308    1.407128
                  8  |   1.834725   .2032946     9.02   0.000     1.427477    2.241973
                     |
               _cons |   -1.97622   .1888941   -10.46   0.000     -2.35462   -1.597819
        -------------+----------------------------------------------------------------
        spid         |
           var(_cons)|   .4904738   .0712084                      .3666969     .656031
        -------------+----------------------------------------------------------------
        spid>round   |
           var(_cons)|    .311078   .0652603                      .2043406    .4735696
        ------------------------------------------------------------------------------

        If anyone sees what I am doing wrong, I would love to know.

        Sincerely,
        Janet
