I'm working with data from a clustered sample in which observations were drawn with a known probability, the inverse of which serves as the sampling weight (pweight). There are two ways to obtain the correct point estimates: i) using reg yvar xvar [pw = pweight], or ii) using svyset [pw = pweight] and then svy : reg yvar xvar. These return identical point estimates (as they should). However, once cluster-robust standard errors are introduced, the "manual" approach and the svyset approach return slightly different results. By "manual" I mean a command of the form reg yvar xvar [pw = pweight], cluster(clustervar), as opposed to svyset clustervar [pw = pweight] followed by svy : reg yvar xvar. The code example further down illustrates this with some numbers.
The standard errors are very close to one another but not identical (72.48 vs. 71.48 for mpg, and 0.969 vs. 0.956 for weight). Stata labels the ones from the svyset regression "Linearized", so I suppose that's where the difference comes from, potentially a Taylor expansion? Could somebody point me towards the precise (mathematical) difference? Are there patterns, i.e. is one always larger than the other?
I'm using Stata 13. I've posted this question before in the Cross Validated community but have not received an answer http://stats.stackexchange.com/quest...-survey-design.
Code:
sysuse auto
set seed 92122
* a variable containing random integers from 1 thru 4 designating fake clusters
gen mycluster = ceil(4*uniform())
* random probability weights as the inverse of some random sampling probability
gen mypw = 1/uniform()
* run the "manual" regression
reg price mpg weight [pw = mypw], cluster(mycluster)
* using svy design
svyset mycluster [pw = mypw]
svy : reg price mpg weight
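One way to probe the gap is sketched below. It rests on the assumption, not confirmed by the output itself, that the two variance estimators differ only by a finite-sample scaling factor. It is meant to run right after the example above; the scalar names (s_N, s_k, s_se_...) are made up for illustration. From the rounded figures already quoted, both ratios come out at roughly 1.014, which is at least consistent with a common factor. The candidate sqrt((N-1)/(N-k)) in the last line is a guess to be tested, with N the number of observations and k the number of coefficients including the constant.

```stata
* Hypothetical diagnostic; assumes mycluster and mypw from the example exist
* and that the svyset declaration from the example is still in effect.

* "manual" cluster-robust regression: store the standard errors
quietly reg price mpg weight [pw = mypw], cluster(mycluster)
scalar s_se_mpg_manual    = _se[mpg]
scalar s_se_weight_manual = _se[weight]
* number of observations, and number of coefficients incl. the constant
scalar s_N = e(N)
scalar s_k = colsof(e(b))

* survey-design regression: store the linearized standard errors
quietly svy : reg price mpg weight
scalar s_se_mpg_svy    = _se[mpg]
scalar s_se_weight_svy = _se[weight]

* if the gap is a pure scaling factor, the two ratios should coincide
display "ratio of SEs, mpg:    " s_se_mpg_manual / s_se_mpg_svy
display "ratio of SEs, weight: " s_se_weight_manual / s_se_weight_svy

* candidate factor (an assumption, to be compared with the ratios above)
display "sqrt((N-1)/(N-k)):    " sqrt((s_N - 1)/(s_N - s_k))
```

If the two ratios agree with each other and with the candidate factor, the difference would be a degrees-of-freedom adjustment rather than anything specific to the linearization; if they do not, the explanation has to lie elsewhere.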