
  • Accounting for complex survey design AND longitudinal/correlated data

    Dear Statalisters,

    Say I am interested in analyzing data from a labor force survey using a regression model. The data contains the following:
    1. Survey weights
    2. Information on survey design effects such as clustering/stratum variables
    3. Repeated observations

    And say I am interested in running a linear regression model using a GEE/marginal/population averaged model (treating the clustering as a nuisance).

    Is there a way to handle all three in Stata? As I see it, here are my options:
    A. Model that accounts for #1 (survey weights) and #2 (survey design) using svy and svyset commands.
    B. Model that accounts for #1 (survey weights) and #3 (repeated observations) using [pweight=] and xtgee commands.
    C. But nothing that handles #1, #2, and #3 together.

    1. If not, would option A (survey weights plus svy commands) be robust enough to account for the clustering from repeated observations (which is nested within the overall PSU/cluster variable)?
    2. Or, would option B (survey weights plus GEE) be robust enough to account for any survey design factors?
    3. Or, is there a way to incorporate the clustering by repeated observations into the survey design clustering variable, so that both get accounted for? And then I can just run a svy model that incorporates weights plus clustering (repeated observations) plus robust variance estimation (to account for survey design factors to some extent). Similar to this paper (see page 12): https://support.sas.com/resources/pa...AS404-2014.pdf

    I was reading this presentation and it suggests that SUDAAN may be the only program that handles both survey information AND repeated observations (as of 2016): http://www.itcproject.org/files/Anal..._using_GEE.pdf

    Sample data:

    Code:
    //example survey data
    use "https://stats.idre.ucla.edu/stat/stata/faq/svysmall", clear
    
    //create repeated measures
    rename y y1
    set seed 99999
    generate y2=floor((9-3+1)*runiform()+3)
    generate y3=floor((9-3+1)*runiform()+3)
    
    //respondent identifier
    generate id=_n
    
    //reshape from wide to long for analysis
    list, sepby(id)
    reshape long y, i(id) j(time)
    list, sepby(id)
    
    //Option A: survey regression that accounts for weights and survey design
    svyset house [pweight = wt], strata(eth)
    svy: regress y x1 x2 x3
    
    //Option B: GEE model that accounts for weights but no survey design
    //coefficients are the same as above, but standard errors are different
    xtset id
    xtgee y x1 x2 x3 [pweight=wt], family(gaussian) link(identity) corr(exchangeable)
    
    //Option C: is there a way to incorporate the clustering by ID into the House cluster to account for clustering by individuals (repeated observations) as the lowest level of clustering?
    svyset id [pweight = wt], strata(eth)
    svy: regress y x1 x2 x3
    Last edited by Jenny Williams; 20 Apr 2018, 17:31.
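
    A possible sketch of option 3, not a tested recommendation: it assumes the long-format data created above and that each respondent (id) belongs to exactly one household (house). svyset accepts later sampling stages after ||, so the respondent can be declared as a second-stage unit beneath the PSU. As far as I understand the svyset documentation, when no finite population correction is given for the first stage, the linearized variance is computed from the first-stage clusters only, so this should reproduce the Option A standard errors rather than add anything to them.

    ```stata
    // Assumption: house is the PSU, eth the stratum, id the respondent nested in house
    svyset house [pweight = wt], strata(eth) || id
    // Show the declared stages to confirm the nesting looks as intended
    svydescribe
    svy: regress y x1 x2 x3
    ```

    Declaring id alone as the PSU, as in the Option C code above, drops the household level from the variance calculation.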

  • #2
    The mixed/me commands let you use svyset data (unlike the xt commands). Will they do what you need? For some simple examples, see

    https://www3.nd.edu/~rwilliam/xsoc73994/Multilevel.pdf
    Last edited by Richard Williams; 20 Apr 2018, 17:22.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Hi Jenny,

      I have a similar issue with my data, which combine a complex survey design (clusters) with repeated measures on individuals within those clusters. Essentially, my quasi-experimental data set needs to account for your points #2 (Information on survey design effects such as clustering/stratum variables) and #3 (Repeated observations) from your original post. I'm hoping to use GEE (and not ME) as I'm interested in population-average effects and am treating clustering as a nuisance. It looks like this thread hasn't been active in a while. Have you had any luck in the past year or so? Everything I've read thus far about the xtgee command says that it can only account for one level of clustering.
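
      One way to see the one-level limitation, sketched on the example data from post #1 (variable names y, x1-x3, wt, eth, house, and id are taken from that post; nothing here has been checked against real survey data). With an independence working correlation, xtgee and svy: glm solve the same weighted estimating equations, so the coefficients should agree. The difference is in the standard errors: xtgee clusters on the panel variable only, while svy clusters on the PSU and uses the strata.

      ```stata
      // GEE with independence working correlation: robust SEs cluster on id only
      xtset id
      xtgee y x1 x2 x3 [pweight = wt], family(gaussian) link(identity) ///
          corr(independent) vce(robust)

      // Same mean model under the survey design: SEs cluster on house within eth strata
      svyset house [pweight = wt], strata(eth)
      svy: glm y x1 x2 x3, family(gaussian) link(identity)
      ```

      If respondents are nested within PSUs, clustering on the PSU also allows for correlation among repeated observations on the same person, which is the usual argument for the svy version when clustering is treated as a nuisance.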


      • #4
        I'm having the same problem, Jenny! I'm using a proportional hazards marginal structural model in the context of a complex sampling scheme. So I have (A) repeated observations, (B) weights (inverse probability of treatment and inverse probability of selection), and (C) stratum and PSU variables I want to incorporate. It's very frustrating!

        One thing I'm experimenting with is using fixed effects representing the stratum and PSU variables to handle issue C, then using [pw=weight] and cluster(<person ID>) to handle issues A and B. It's not very elegant, and it might explode if you have a lot of strata/PSUs, but it might be worth a shot!
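
        A minimal sketch of that workaround, written as a linear regression on the example data from post #1 instead of the proportional hazards model described above (y, x1-x3, wt, eth, house, and id are that post's variable names, used here only for illustration):

        ```stata
        // Stratum (eth) and PSU (house) enter as fixed effects,
        // probability weights via [pweight=], SEs clustered on the person
        regress y x1 x2 x3 i.eth i.house [pweight = wt], vce(cluster id)
        ```

        If PSUs are nested within strata, the stratum indicators are collinear with the PSU indicators and Stata will omit them. Any covariate that does not vary within a PSU will be dropped for the same reason.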


        • #5
          PS - I tried using GLLAMM and/or melogit, which claim to be able to handle all of the above. Neither would converge for me no matter what I did, but they might be worth exploring for anybody else having this issue!
