I have a survey that was administered two ways - one by phone and one by mail. Before using any variable in an analysis, I have to check that there is no difference between the two modes of administration - meaning that the average for the mail survey has to be statistically indistinguishable from the average for the phone survey.
If the phone results are no different from the mail results, I can pool the data. Otherwise I can only use the mail version.
I created a diagnostic table by running:
svyset _n [pweight=rweight], jkrweight(rwgt*) vce(jackknife)
tabulate gendersex
svy, subpop(if datayr==2007): tabulate gendersex
svyset _n [pweight=cweight], jkrweight(cwgt*) vce(jackknife)
svy, subpop(if datayr==2007): tabulate gendersex
svyset _n [pweight=mweight], jkrweight(mwgt*) vce(jackknife)
svy, subpop(if datayr==2007): tabulate gendersex
At that point, I have to do everything manually. If it is a binary variable, I take those numbers and manually calculate a difference test for two proportions. If the variable is on a Likert scale, I use the standard errors from a tabstat to calculate differences. If it is continuous, I calculate the summary statistics and then manually calculate the difference - yet again.
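To make the manual step concrete, here is a minimal sketch of the two-proportion difference test as I do it by hand. The numbers are hypothetical placeholders, not results from my data, and the scalar names are made up for illustration. The sketch also assumes the phone and mail estimates can be treated as independent, which may not be strictly right for this design.

```stata
* Hypothetical placeholder values: replace with the proportion and
* jackknife standard error reported by each svy: tabulate run.
scalar p_phone  = 0.50
scalar se_phone = 0.02
scalar p_mail   = 0.54
scalar se_mail  = 0.03

* Difference in proportions and its standard error,
* assuming the two estimates are independent.
scalar diff    = p_phone - p_mail
scalar se_diff = sqrt(se_phone^2 + se_mail^2)

* z statistic and two-sided p-value from the standard normal.
scalar z    = diff / se_diff
scalar pval = 2 * normal(-abs(z))

display "diff = " diff "   z = " z "   p = " pval
```

Repeating this by hand for every variable, under three different svyset declarations, is the part I would like to automate.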
Given that a model might have 8 variables, this can be an awful lot of manual work - and in my mind it's all because I can't put the tables together across survey designs and then analyze them.
This seems like a limitation in my knowledge of Stata rather than a limitation of Stata itself. Does anyone have an idea how I might make these comparisons more efficiently?
My thanks in advance.
Tim Huerta
Ohio State University