xtgee: using qic for selecting optimal correlation structure

Mohsin Khan

#1

13 Aug 2015, 05:12

Hi Statalisters,

I have a question regarding selecting the optimal correlation structure in Generalized Estimating Equations.

I went through

Cui, James. "QIC program and model selection in GEE analyses." Stata Journal 7.2 (2007): 209.

and

Hardin, James W & Hilbe, Joseph M. Generalized estimating equations (GEE). Chapman and Hall/CRC, 2012.

in order to identify which correlation structure to use. According to both texts, the correlation structure that minimises the QIC should be used. What I find puzzling is that, with the same data, the structure that minimises the QIC changes when I switch from log of sales to log of employees as the proxy for firm size. With log of sales, it comes out as stationary of order 1:

Code:

qic cino asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lsale_1 td_1, family(binomial 1) link(logit) corr(sta1) robust nolog nodisplay

QIC and QIC_u
___________________________________________
Corr = sta1
Family = binomial 1
Link = logit
p = 12
Trace = 24.885
QIC = 188.596
QIC_u = 162.827
___________________________________________

With log of employees, it comes out as autoregressive of order 1:

Code:

qic cino asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lemp_1 td_1, family(binomial 1) link(logit) corr(ar1) robust nolog nodisplay

QIC and QIC_u
___________________________________________
Corr = ar1
Family = binomial 1
Link = logit
p = 12
Trace = 24.246
QIC = 189.616
QIC_u = 165.125
___________________________________________

I have not posted the QIC for the other structures (ind, exc, etc.) in order to save space. Is there a reason why, for essentially the same data, the best-fitting correlation structure should change when a single variable is swapped? From my limited understanding, I thought that the correlation structure was a property of the data as a whole and not so dependent on one variable. But then again, I may be wrong. Can someone please shed some light on this?

Thanking you in advance,
Mohsin
Federico Tedeschi

#2

23 Jul 2019, 03:56

My understanding is that the QIC values may change depending on the included covariates, since they relate to the error distribution. I compare it to testing for Normality: sometimes the square-root transformation normalizes the data, but the logarithmic one gives you a close-to-Normal distribution of the residuals (which is what we usually want). I admit that this analogy involves two very different models (an "empty" one vs. one with many covariates), while you are talking about very similar models. I see two possible explanations (a combination of them is also possible, of course):
1) The two variables (log of sales and log of employees) are such weak proxies of firm size (or at least one of them is), even after the best linear transformation, that they are very different covariates (I mean that you cannot use a linear transformation of one as a proxy for the other), so that they give rise to very different distributions of the regression error;
2) The two models are basically equivalent in terms of QIC, so which one actually "wins" depends on details, and even very slight modifications of the model may change the "winner".
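To make the role of the covariates concrete, the reported figures can be decomposed. This is only a sketch: it assumes the qic program follows the usual definitions described in Cui (2007), and the symbols are notation introduced here rather than taken from the output. Q is the quasi-likelihood evaluated under independence at the coefficients estimated with working correlation R, and the second term is the reported Trace.

```latex
\begin{align*}
\mathrm{QIC}(R)   &= -2\,Q\bigl(\hat\beta(R);\, I\bigr) + 2\,\operatorname{trace}\bigl(\hat\Omega_I^{-1}\hat V_R\bigr), \\
\mathrm{QIC}_u(R) &= -2\,Q\bigl(\hat\beta(R);\, I\bigr) + 2p. \\[6pt]
\text{lsale\_1, sta1:}\quad -2Q &\approx 188.596 - 2(24.885) = 138.826,
  & 138.826 + 2(12) &= 162.826 \approx 162.827, \\
\text{lemp\_1, ar1:}\quad  -2Q &\approx 189.616 - 2(24.246) = 141.124,
  & 141.124 + 2(12) &= 165.124 \approx 165.125.
\end{align*}
```

Both posted outputs agree with this decomposition up to rounding. Since the quasi-likelihood term and the trace term are both computed from the fitted coefficients, each depends on which covariates are in the model, so the ranking of correlation structures need not stay the same when one covariate is swapped. How close the contest actually is can only be judged from the QIC values for the other structures, which were not posted.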
Federico