Dear Statalisters,
Say I am interested in analyzing data from a labor force survey using a regression model. The data contains the following:
1. Survey weights
2. Information on survey design effects such as clustering/stratum variables
3. Repeated observations
And say I am interested in running a linear regression model using a GEE/marginal/population averaged model (treating the clustering as a nuisance).
Is there a way to handle all three in Stata? As I see it, here are my options:
A. Model that accounts for #1 (survey weights) and #2 (survey design) using svy and svyset commands.
B. Model that accounts for #1 (survey weights) and #3 (repeated observations) using [pweight=] and xtgee commands.
C. But no option that accounts for #1, #2 and #3 together.
1. If not, would option A (survey weights plus svy commands) be robust enough to account for any clustering (subsumed within the overall PSU/cluster variable)?
2. Or, would option B (survey weights plus GEE) be robust enough to account for any survey design factors?
3. Or, is there a way to incorporate the clustering by repeated observations into the survey design clustering variable, so that both get accounted for? And then I can just run a svy model that incorporates weights plus clustering (repeated observations) plus robust variance estimation (to account for survey design factors to some extent). Similar to this paper (see page 12): https://support.sas.com/resources/pa...AS404-2014.pdf
I was reading this presentation and it suggests that SUDAAN may be the only program that handles both survey information AND repeated observations (as of 2016): http://www.itcproject.org/files/Anal..._using_GEE.pdf
Sample data:
Code:
//example survey data
use "https://stats.idre.ucla.edu/stat/stata/faq/svysmall", clear

//create repeated measures
rename y y1
set seed 99999
generate y2=floor((9-3+1)*runiform()+3)
generate y3=floor((9-3+1)*runiform()+3)

//respondent identifier
generate id=_n

//reshape from wide to long for analysis
list, sepby(id)
reshape long y, i(id) j(time)
list, sepby(id)

//Option A: survey regression that accounts for weights and survey design
svyset house [pweight = wt], strata(eth)
svy: regress y x1 x2 x3

//Option B: GEE model that accounts for weights but no survey design
//coefficients are the same as above, but standard errors are different
xtset id
xtgee y x1 x2 x3 [pweight=wt], family(gaussian) link(identity) corr(exchangeable)

//Option C: is there a way to incorporate the clustering by ID into the House cluster
//to account for clustering by individuals (repeated observations) as the lowest level of clustering?
svyset id [pweight = wt], strata(eth)
svy: regress y x1 x2 x3