Accounting for complex survey design AND longitudinal/correlated data

Jenny Williams

Join Date: Sep 2017

Posts: 35
#1

20 Apr 2018, 17:12

Dear Statalisters,

Say I am interested in analyzing data from a labor force survey using a regression model. The data contains the following:
1. Survey weights
2. Information on survey design effects such as clustering/stratum variables
3. Repeated observations

And say I want to fit a linear regression as a GEE/marginal/population-averaged model (treating the clustering as a nuisance).

Is there a way to handle all three in Stata? As I see it, here are my options:
A. Model that accounts for #1 (survey weights) and #2 (survey design) using svy and svyset commands.
B. Model that accounts for #1 (survey weights) and #3 (repeated observations) using [pweight=] and xtgee commands.
C. But no option that handles #1, #2, and #3 together.

1. If not, would option A (survey weights plus svy commands) be robust enough to account for any clustering (subsumed within the overall PSU/cluster variable)?
2. Or, would option B (survey weights plus GEE) be robust enough to account for any survey design factors?
3. Or, is there a way to incorporate the clustering by repeated observations into the survey design clustering variable, so that both get accounted for? And then I can just run a svy model that incorporates weights plus clustering (repeated observations) plus robust variance estimation (to account for survey design factors to some extent). Similar to this paper (see page 12): https://support.sas.com/resources/pa...AS404-2014.pdf

I was reading this presentation and it suggests that SUDAAN may be the only program that handles both survey information AND repeated observations (as of 2016): http://www.itcproject.org/files/Anal..._using_GEE.pdf

Sample data:

Code:

//example survey data
use "https://stats.idre.ucla.edu/stat/stata/faq/svysmall", clear

//create repeated measures
rename y y1
set seed 99999
generate y2=floor((9-3+1)*runiform()+3)
generate y3=floor((9-3+1)*runiform()+3)

//respondent identifier
generate id=_n

//reshape from wide to long for analysis
list, sepby(id)
reshape long y, i(id) j(time)
list, sepby(id)

//Option A: survey regression that accounts for weights and survey design
svyset house [pweight = wt], strata(eth)
svy: regress y x1 x2 x3

//Option B: GEE model that accounts for weights but no survey design
//coefficients are the same as above, but standard errors are different
xtset id
xtgee y x1 x2 x3 [pweight=wt], family(gaussian) link(identity) corr(exchangeable)

//Option C: is there a way to incorporate the clustering by ID into the House cluster to account for clustering by individuals (repeated observations) as the lowest level of clustering?
svyset id [pweight = wt], strata(eth)
svy: regress y x1 x2 x3

Last edited by Jenny Williams; 20 Apr 2018, 17:31.

Richard Williams

Join Date: Apr 2014

Posts: 5008
#2

20 Apr 2018, 17:17

The mixed/me commands let you use svyset data (unlike the xt commands). Will they do what you need? For some simple examples, see

https://www3.nd.edu/~rwilliam/xsoc73994/Multilevel.pdf

Last edited by Richard Williams; 20 Apr 2018, 17:22.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Jackie Knee

Join Date: Jun 2018

Posts: 2
#3

07 May 2019, 10:04

Hi Jenny,

I have a similar issue with my data, which combine a complex survey design (clusters) with repeated measures on individuals within those clusters. Essentially, my quasi-experimental data set needs to account for your points #2 (Information on survey design effects such as clustering/stratum variables) and #3 (Repeated observations) from your original post. I'm hoping to use GEE (and not ME) as I'm interested in population-averaged effects and am treating clustering as a nuisance. It looks like this thread hasn't been active in a while; have you had any luck in the past year or so? Everything I've read thus far about the xtgee command says that it can only account for one level of clustering.
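For anyone comparing approaches, here is a minimal sketch (not a tested answer to the question) of one common workaround when the repeated observations are nested inside the PSU. It assumes that a working-independence marginal model is acceptable, in which case the point estimates are those of a weighted pooled regression and the only question is where to cluster. It reuses the example data and variable names (house, eth, wt, id) from post #1. The second model ignores the strata, so its standard errors need not match the svy results.

```stata
// data setup copied from post #1
use "https://stats.idre.ucla.edu/stat/stata/faq/svysmall", clear
rename y y1
set seed 99999
generate y2 = floor((9-3+1)*runiform()+3)
generate y3 = floor((9-3+1)*runiform()+3)
generate id = _n
reshape long y, i(id) j(time)

// design-based fit: PSU = house, strata = eth, pweights = wt
// each id sits inside exactly one house, so clustering on house
// also allows correlation among a person's repeated observations
svyset house [pweight = wt], strata(eth)
svy: regress y x1 x2 x3

// same point estimates from a weighted pooled regression with
// cluster-robust standard errors at the PSU level (strata ignored here)
regress y x1 x2 x3 [pweight = wt], vce(cluster house)
```

Clustering on id instead (Option C in post #1) would treat different people in the same house as independent, which is why the higher-level cluster is used in this sketch.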
Gabriel Schwartz

Join Date: Jul 2019

Posts: 2
#4

16 Jul 2019, 08:26

I'm having the same problem, Jenny! I'm using a proportional hazards marginal structural model in the context of a complex sampling scheme. So I have (A) repeated observations, (B) weights (inverse probability of treatment and inverse probability of selection), and (C) stratum and PSU variables I want to incorporate. It's very frustrating!

One thing I'm experimenting with is using fixed effects representing the stratum and PSU variables to handle issue C, then using [pw=weight] and cluster(<person ID>) to handle issues A and B. It's not very elegant, and it might explode if you have a lot of strata/PSUs, but it might be worth a shot!
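A rough sketch of that fixed-effects idea, written for the linear example in post #1 rather than a proportional hazards model. The use of eth as the stratum and house as the PSU comes from that post; treating them as numeric variables that factor-variable notation (i.) accepts is an assumption. Stata will drop any indicators that are collinear (for example, stratum indicators when PSUs are nested in strata), and covariates that do not vary within a PSU cannot be estimated alongside PSU indicators.

```stata
// data setup copied from post #1
use "https://stats.idre.ucla.edu/stat/stata/faq/svysmall", clear
rename y y1
set seed 99999
generate y2 = floor((9-3+1)*runiform()+3)
generate y3 = floor((9-3+1)*runiform()+3)
generate id = _n
reshape long y, i(id) j(time)

// stratum (eth) and PSU (house) enter as indicator variables;
// pweights carry the survey weights, and clustering on the person
// identifier allows for correlation across repeated observations
regress y x1 x2 x3 i.eth i.house [pweight = wt], vce(cluster id)
```

Note that this changes the estimand: coefficients become within-PSU effects, not the population-averaged effects a design-based svy model targets.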
Gabriel Schwartz

Join Date: Jul 2019

Posts: 2
#5

17 Jul 2019, 12:52

PS - I tried using GLLAMM and/or melogit, which claim to be able to handle all of the above. I couldn't get them to converge no matter what I did, but they might be worth exploring for anybody else having this issue!
