  • Robust standard errors and working correlation structure in xtgee (generalized estimating equations)

    Hi!

    I have binary outcome data clustered within individuals and want to use the xtgee command to account for correlated residuals within individuals.

    However, I have trouble understanding how cluster-robust standard errors (vce(robust) in xtgee) and the working correlation matrix specified in xtgee relate to each other.

    According to Stata help:

    vce(robust) specifies that the Huber/White/sandwich estimator of
    variance is to be used in place of the default conventional variance
    estimator (see Methods and formulas in [XT] xtgee). Use of this
    option causes xtgee to produce valid standard errors even if the
    correlations within group are not as hypothesized by the specified
    correlation structure. Under a noncanonical link, it does, however,
    require that the model correctly specifies the mean. The resulting
    standard errors are thus labeled "semirobust" instead of "robust" in
    this case. Although there is no vce(cluster clustvar) option,
    results are as if this option were included and you specified
    clustering on the panel variable.
    1. I am wondering if there is any use of specifying a within-group correlation structure (the default is exchangeable) if vce(robust) produces "valid standard errors even if the correlations within group are not as hypothesized by the specified correlation structure"?

    2. I am also wondering what Stata does if an independent correlation structure is specified together with vce(robust). Does Stata just "ignore" my specification and allow for within-group correlation anyway?

    3. Also, is there any reason why someone would want to run a GEE with an independent working correlation structure? In my understanding, GEE is used to adjust for within-group correlation, so if one thinks that within-group residuals are uncorrelated (=independent), one could just use OLS?

    4. And lastly, couldn't - instead of xtgee - a glm with vce(cluster clustvar) be used? In my data, glm with vce(cluster clusterid) and gee with vce(robust) and independent working correlation structure yield exactly the same coefficients. Do both models in this case just estimate a within-group correlation?

    vce(cluster clustvar) specifies that the standard errors allow for
    intragroup correlation, relaxing the usual requirement that the
    observations be independent. That is to say, the observations are
    independent across groups (clusters) but not necessarily within
    groups. clustvar specifies to which group each observation belongs,
    for example, vce(cluster personid) in data with repeated
    observations on individuals. vce(cluster clustvar) affects the
    standard errors and variance-covariance matrix of the estimators but
    not the estimated coefficients; see [U] 20.22 Obtaining robust
    variance estimates.
    Thank you!
    Last edited by Lisa Dinkler; 13 Aug 2018, 08:51.

  • #2
    1. I am wondering if there is any use of specifying a within-group correlation structure (the default is exchangeable) if vce(robust) produces "valid standard errors even if the correlations within group are not as hypothesized by the specified correlation structure"?

    2. I am also wondering what Stata does if an independent correlation structure is specified together with vce(robust). Does Stata just "ignore" my specification and allow for within-group correlation anyway?

    One thing to note is that your coefficients will differ depending on which within-group correlation structure you choose, so for no. 2 the answer is no: Stata does not ignore what you specify. If your goal is only valid inference (that is, determining whether a particular variable is important in explaining something, without regard to by how much), then yes (no. 1), it is not critical to specify any particular correlation structure, because robust standard errors are robust to arbitrary within-group correlation. Look at the range of the t-statistics in the following example under different correlation structures (the data have gaps, so I can only use independent, exchangeable and unstructured for comparison).


    Code:
    * eststo and esttab come from the user-written estout package (ssc install estout)
    webuse union, clear
    xtset id year
    eststo clear
    * Columns 1-3: conventional (model-based) standard errors
    eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(ind) nolog
    eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(exc) nolog
    eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(uns) nolog
    * Columns 4-6: same models with robust (sandwich) standard errors
    eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(ind) robust nolog
    eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(exc) robust nolog
    eststo: xtgee union age grade not_smsa south, family(binomial) link(logit) corr(uns) robust nolog
    esttab est*

    Code:
    . esttab est*
    
    ------------------------------------------------------------------------------------------------------------
                          (1)             (2)             (3)             (4)             (5)             (6)  
                        union           union           union           union           union           union  
    ------------------------------------------------------------------------------------------------------------
    age                0.0117***      0.00988***      0.00736*         0.0117***      0.00988**       0.00736*  
                       (4.99)          (4.74)          (2.55)          (3.54)          (3.18)          (2.51)  
    
    grade              0.0485***       0.0606***       0.0644***       0.0485***       0.0606***       0.0644***
                       (7.55)          (5.59)          (5.61)          (3.48)          (4.56)          (5.07)  
    
    not_smsa           -0.221***       -0.126**        -0.162**        -0.221**        -0.126*         -0.162**
                      (-6.22)         (-2.60)         (-3.14)         (-3.10)         (-2.05)         (-2.83)  
    
    south              -0.647***       -0.575***       -0.552***       -0.647***       -0.575***       -0.552***
                     (-19.77)        (-11.81)        (-10.74)        (-10.27)         (-9.80)         (-9.84)  
    
    _cons              -1.942***       -2.163***       -2.168***       -1.942***       -2.163***       -2.168***
                     (-18.40)        (-14.57)        (-13.14)         (-9.84)        (-11.41)        (-11.98)  
    ------------------------------------------------------------------------------------------------------------
    N                   26200           26200           26200           26200           26200           26200  
    ------------------------------------------------------------------------------------------------------------
    t statistics in parentheses
    * p<0.05, ** p<0.01, *** p<0.001
    If we focus on the t-statistics for the coefficient on south, they range from -10.74 to -19.77 under the different correlation structures with conventional standard errors (columns 1-3). With robust standard errors (columns 4-6), the range is only -9.80 to -10.27. So with robust standard errors, whichever correlation structure we specify, we end up with a similar story about the association between the outcome and this variable (that is how to interpret the entry in the manual).
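
    The reasoning behind this can be sketched in standard GEE notation (following Liang and Zeger; the symbols below are assumed notation and do not appear in the Stata output). The coefficients solve

    $$\sum_{i=1}^{m} D_i^\top V_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) = 0, \qquad V_i = \phi\, A_i^{1/2} R(\alpha) A_i^{1/2},$$

    where $D_i = \partial \mu_i / \partial \beta^\top$, $A_i$ is the diagonal matrix of variance functions for panel $i$, and $R(\alpha)$ is the working correlation matrix chosen with corr(). The conventional and robust variance estimators are

    $$\hat V_{\text{model}} = B^{-1}, \qquad \hat V_{\text{robust}} = B^{-1} M B^{-1},$$

    $$B = \sum_{i} D_i^\top V_i^{-1} D_i, \qquad M = \sum_{i} D_i^\top V_i^{-1}\, r_i r_i^\top\, V_i^{-1} D_i, \qquad r_i = y_i - \hat\mu_i.$$

    Because $R(\alpha)$ enters the estimating equation through $V_i$, changing corr() changes the coefficients. $B^{-1}$ is correct only if $V_i$ is the true covariance of $y_i$, whereas the middle term $M$ uses the observed residual cross-products within each panel, so the sandwich stays valid when $R(\alpha)$ is wrong, provided the mean model is right. With $R = I$ the estimating equation reduces to the score equation of the pooled GLM, which is why corr(ind) reproduces the glm coefficients.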



    3. Also, is there any reason why someone would want to run a GEE with an independent working correlation structure? In my understanding, GEE is used to adjust for within-group correlation, so if one thinks that within-group residuals are uncorrelated (=independent), one could just use OLS?
    4. And lastly, couldn't - instead of xtgee - a glm with vce(cluster clustvar) be used? In my data, glm with vce(cluster clusterid) and gee with vce(robust) and independent working correlation structure yield exactly the same coefficients. Do both models in this case just estimate a within-group correlation?

    xtgee gives you the flexibility to specify a within-group correlation structure, but it does not stop you from specifying independence, which is what other estimators assume by default. So yes, you can use OLS, glm or xtgee to estimate the same model, as below; you just need to choose the right options.

    Code:
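    webuse grunfeld, clear
    xtset company year
    * All three commands fit the same linear model (regress has no iteration log, so no nolog)
    regress invest mvalue kstock
    * nmp uses N - p rather than N as the divisor for the scale parameter
    xtgee invest mvalue kstock, corr(ind) nmp nolog
    glm invest mvalue kstock, family(gaussian) link(identity) nolog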
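
    For question 4, a minimal sketch of the same comparison on the binary union data used above. It assumes idcode is the panel identifier in that dataset (abbreviated as id in the earlier code); the stored names glm_cl and gee_ind are arbitrary labels chosen for illustration.

    Code:
    webuse union, clear
    xtset idcode year

    * Pooled logit via glm, standard errors clustered on the person identifier
    glm union age grade not_smsa south, family(binomial) link(logit) vce(cluster idcode) nolog
    estimates store glm_cl

    * GEE with independent working correlation and sandwich standard errors
    xtgee union age grade not_smsa south, family(binomial) link(logit) corr(independent) vce(robust) nolog
    estimates store gee_ind

    * Coefficients and standard errors side by side
    estimates table glm_cl gee_ind, b(%9.4f) se(%9.4f)

    The coefficients in the two columns should coincide, since with an independent working correlation both commands solve the same estimating equations. The standard errors should be close, though they may differ slightly if the two commands apply different finite-sample adjustments.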
    Last edited by Andrew Musau; 14 Aug 2018, 10:46.
