Robust standard errors and working correlation structure in xtgee (generalized estimating equations)

Lisa Dinkler

Join Date: Aug 2018

Posts: 2
#1


13 Aug 2018, 08:33

Hi!

I have binary outcome data clustered within individuals and want to use the xtgee command to account for correlated residuals within individuals.

However, I have trouble understanding how cluster-robust standard errors (vce(robust) in xtgee) and the working correlation matrix specified in xtgee relate to each other.

According to Stata help:

vce(robust) specifies that the Huber/White/sandwich estimator of
variance is to be used in place of the default conventional variance
estimator (see Methods and formulas in [XT] xtgee). Use of this
option causes xtgee to produce valid standard errors even if the
correlations within group are not as hypothesized by the specified
correlation structure. Under a noncanonical link, it does, however,
require that the model correctly specifies the mean. The resulting
standard errors are thus labeled "semirobust" instead of "robust" in
this case. Although there is no vce(cluster clustvar) option,
results are as if this option were included and you specified
clustering on the panel variable.

1. I am wondering whether there is any point in specifying a within-group correlation structure (the default is exchangeable) if vce(robust) produces "valid standard errors even if the correlations within group are not as hypothesized by the specified correlation structure"?

2. I am also wondering what Stata does if an independent correlation structure is specified together with vce(robust). Does Stata just "ignore" my specification and allow for within-group correlation anyway?

3. Also, is there any reason why someone would want to run a GEE with an independent working correlation structure? In my understanding, GEE is used to adjust for within-group correlation, so if one thinks that within-group residuals are uncorrelated (=independent), one could just use OLS?

4. And lastly, couldn't a glm with vce(cluster clustvar) be used instead of xtgee? In my data, glm with vce(cluster clusterid) and GEE with vce(robust) and an independent working correlation structure yield exactly the same coefficients. Do both models in this case just estimate a within-group correlation?

vce(cluster clustvar) specifies that the standard errors allow for
intragroup correlation, relaxing the usual requirement that the
observations be independent. That is to say, the observations are
independent across groups (clusters) but not necessarily within
groups. clustvar specifies to which group each observation belongs,
for example, vce(cluster personid) in data with repeated
observations on individuals. vce(cluster clustvar) affects the
standard errors and variance-covariance matrix of the estimators but
not the estimated coefficients; see [U] 20.22 Obtaining robust
variance estimates.

Thank you!

Last edited by Lisa Dinkler; 13 Aug 2018, 08:51.
Tags: working correlation, xtgee
Andrew Musau

Join Date: Oct 2014

Posts: 10186
#2

14 Aug 2018, 10:19

1. I am wondering whether there is any point in specifying a within-group correlation structure (the default is exchangeable) if vce(robust) produces "valid standard errors even if the correlations within group are not as hypothesized by the specified correlation structure"?

2. I am also wondering what Stata does if an independent correlation structure is specified together with vce(robust). Does Stata just "ignore" my specification and allow for within-group correlation anyway?

Note that your coefficients will differ depending on which within-group correlation structure you choose, so for no. 2 the answer is no: Stata does not ignore what you specify. With an independent working correlation, the point estimates are those of the pooled model, and vce(robust) adjusts only the standard errors for within-group correlation.

If your goal is only valid inference (that is, determining whether a particular variable matters, without regard to by how much), then yes (no. 1), the choice of correlation structure is not critical. Robust standard errors are robust to arbitrary within-group correlation. Compare the range of the t-statistics in the following example under different correlation structures (my data have gaps, so I can only use independent, exchangeable and unstructured for comparison).

Code:

* eststo and esttab come from the user-written estout package (ssc install estout)
webuse union
xtset id year
* Columns 1-3: conventional standard errors
eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(ind) nolog
eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(exc) nolog
eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(uns) nolog
* Columns 4-6: robust standard errors
eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(ind) robust nolog
eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(exc) robust nolog
eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(uns) robust nolog
esttab est*

Code:

. esttab est*
------------------------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)             (6)
                    union           union           union           union           union           union
------------------------------------------------------------------------------------------------------------
age                0.0117***      0.00988***      0.00736*         0.0117***      0.00988**       0.00736*
                   (4.99)          (4.74)          (2.55)          (3.54)          (3.18)          (2.51)
grade              0.0485***       0.0606***       0.0644***       0.0485***       0.0606***       0.0644***
                   (7.55)          (5.59)          (5.61)          (3.48)          (4.56)          (5.07)
not_smsa           -0.221***       -0.126**        -0.162**        -0.221**        -0.126*         -0.162**
                  (-6.22)         (-2.60)         (-3.14)         (-3.10)         (-2.05)         (-2.83)
south              -0.647***       -0.575***       -0.552***       -0.647***       -0.575***       -0.552***
                 (-19.77)        (-11.81)        (-10.74)        (-10.27)         (-9.80)         (-9.84)
_cons              -1.942***       -2.163***       -2.168***       -1.942***       -2.163***       -2.168***
                 (-18.40)        (-14.57)        (-13.14)         (-9.84)        (-11.41)        (-11.98)
------------------------------------------------------------------------------------------------------------
N                   26200           26200           26200           26200           26200           26200
------------------------------------------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

If we focus on the t-statistics for the coefficient on south, they range from -10.74 to -19.77 across the different correlation structures when conventional standard errors are used (columns 1-3). With robust standard errors (columns 4-6), the range is only -9.80 to -10.27. So, with the latter, whatever correlation structure we specify, we end up with a similar story about the association between the outcome and this variable (that is how you interpret the entry in the manual).
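For anyone who wants the mechanics behind this pattern, here is a sketch in standard GEE notation. The symbols are my own shorthand and do not come from the manual excerpt quoted above: D_i is the derivative of the mean vector mu_i with respect to beta, A_i is the diagonal matrix of variance functions, R(alpha) is the working correlation matrix, and phi is the scale parameter.

```latex
\begin{align*}
V_i &= \phi\, A_i^{1/2} R(\alpha) A_i^{1/2}, \\
\sum_{i=1}^{n} D_i^{\top} V_i^{-1} (y_i - \mu_i) &= 0
  \quad \text{(estimating equation solved for } \hat\beta\text{)}, \\
\widehat{\operatorname{Var}}_{\text{model}}(\hat\beta) &= B^{-1},
  \qquad B = \sum_{i=1}^{n} D_i^{\top} V_i^{-1} D_i, \\
\widehat{\operatorname{Var}}_{\text{robust}}(\hat\beta) &= B^{-1} M B^{-1},
  \qquad M = \sum_{i=1}^{n} D_i^{\top} V_i^{-1}
  (y_i - \hat\mu_i)(y_i - \hat\mu_i)^{\top} V_i^{-1} D_i .
\end{align*}
```

Because R(alpha) enters the estimating equation through V_i, the coefficients change with the working correlation. The model-based variance is only correct if V_i is the true covariance, whereas the middle term M uses the observed within-group residual cross-products, so the sandwich form stays valid under a misspecified R(alpha). Any finite-sample scaling that Stata applies on top of this is omitted from the sketch.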

3. Also, is there any reason why someone would want to run a GEE with an independent working correlation structure? In my understanding, GEE is used to adjust for within-group correlation, so if one thinks that within-group residuals are uncorrelated (=independent), one could just use OLS?

4. And lastly, couldn't a glm with vce(cluster clustvar) be used instead of xtgee? In my data, glm with vce(cluster clusterid) and GEE with vce(robust) and an independent working correlation structure yield exactly the same coefficients. Do both models in this case just estimate a within-group correlation?

xtgee gives you the flexibility to specify a within-group correlation structure, without ruling out the independence assumption that is the default in other estimators. So yes, you can use OLS, glm or xtgee to estimate the same model, as below; you just need to choose the right options.

Code:

webuse grunfeld
regress invest mvalue kstock, nolog
xtgee invest mvalue kstock, corr(ind) nmp nolog
glm invest mvalue kstock, family(gaussian) link(identity) nolog
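On question 4 specifically, the same comparison can be sketched for a binary outcome on the union data from the earlier example. This assumes the panel identifier is idcode (the earlier code abbreviates it as id); the stored-estimate names gee_ind and glm_cl are arbitrary labels chosen for illustration.

```stata
webuse union, clear
xtset idcode year

* GEE with independence working correlation; robust SEs cluster on the panel variable
xtgee union age grade not_smsa south, family(binomial) link(logit) corr(independent) vce(robust) nolog
estimates store gee_ind

* Pooled logit fitted by glm, with cluster-robust SEs on the same identifier
glm union age grade not_smsa south, family(binomial) link(logit) vce(cluster idcode) nolog
estimates store glm_cl

* Side-by-side coefficients and standard errors
estimates table gee_ind glm_cl, b(%9.4f) se(%9.4f)
```

The coefficients should match, since both commands fit the same pooled logit. The standard errors should be very close, but may differ slightly if the two commands apply different finite-sample corrections.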

Last edited by Andrew Musau; 14 Aug 2018, 10:46.