  • xtgee: using qic for selecting optimal correlation structure

    Hi Statalisters,

    I have a question regarding selecting the optimal correlation structure in Generalized Estimating Equations.

    I went through

    Cui, James. "QIC program and model selection in GEE analyses." Stata Journal 7.2 (2007): 209.
    and

    Hardin, James W & Hilbe, Joseph M. Generalized estimating equations (GEE). Chapman and Hall/CRC, 2012.


    in order to identify which correlation structure to use. Both texts say to choose the correlation structure that minimises the QIC. What I find puzzling is that, with the same data, switching the firm-size proxy from log of sales to log of employees changes which correlation structure minimises the QIC. With log of sales, it is stationary of order 1:


    Code:
     qic cino asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lsale_1 td_1, family(binomial 1) link(logit) corr(sta1) robust nolog nodisplay
    
                  QIC and QIC_u
    ___________________________________________
    Corr =                 sta1
    Family =         binomial 1
    Link =                logit
    p =                      12
    Trace =              24.885
    QIC =               188.596
    QIC_u =             162.827
    ___________________________________________
    With log of employees, it is autoregressive of order 1:


    Code:
    qic cino asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lemp_1 td_1, family(binomial 1) link(logit) corr(ar1) robust nolog nodisplay
    
                  QIC and QIC_u
    ___________________________________________
    Corr =                  ar1
    Family =         binomial 1
    Link =                logit
    p =                      12
    Trace =              24.246
    QIC =               189.616
    QIC_u =             165.125
    ___________________________________________
    I have not posted the QIC for the other structures (ind, exc, etc.) to save space. Is there a reason why, for essentially the same data, the best-fitting correlation structure should change when a single variable is swapped? From my limited understanding, I thought the correlation structure was a property of the data as a whole, and not so dependent on one variable. But then again, I may be wrong. Can someone please shed some light on this?

    Thanking you in advance,
    Mohsin
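
    The comparison across structures mentioned above can be sketched as a loop. This is only an illustration: it assumes Cui's user-written qic command is installed and that the panel has already been declared with xtset, and it uses only the structure abbreviations that appear in this thread.

    ```stata
    * Sketch only: assumes the user-written -qic- (Cui 2007) is installed
    * and the data have already been -xtset-.
    * Covariates are taken from the first model in the post;
    * replace lsale_1 with lemp_1 to repeat the comparison for the other proxy.
    local xvars asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lsale_1 td_1

    * Correlation structures named in the thread, in qic's abbreviated form
    foreach c in ind exc ar1 sta1 {
        qic cino `xvars', family(binomial 1) link(logit) corr(`c') robust nolog nodisplay
    }
    ```

    Listing all four QIC values for each proxy shows not only which structure wins but also how far behind the runner-up is.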




  • #2
    My understanding is that the QIC values may change depending on the included covariates, since they relate to the error distribution. I relate it to testing for Normality: sometimes you find that the square-root transformation normalizes the data, but the logarithmic one gives you a close-to-Normal distribution of residuals (which is what we usually want). But I admit I'm talking about two very different models (an "empty" one vs. one with many covariates), while you're talking about very similar models. I see two possible explanations (a combination of them is also possible, of course):
    1) The two variables (log of sales and log of employees) are such weak proxies of firm size (or at least one of them is), even after the best linear transformation, that they are very different covariates (I mean you cannot use a linear transformation of one as a proxy for the other), so they give rise to very different distributions of the regression error;
    2) The two models are basically equivalent in terms of QIC, so which one actually "wins" depends on details, and even very slight modifications of the model may change the "winner".
    Federico
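
    To make the dependence on the covariates concrete, here is a rough sketch. It assumes qic reports QIC and QIC_u in their usual form (the formulas are not stated in this thread): Q is the quasi-likelihood evaluated under independence at the estimates obtained with working correlation R, \hat\Omega_I is the model-based variance under independence, \hat V_R is the robust variance under R, and p is the number of parameters. The "Trace" line in the output is taken to be the trace term.

    ```latex
    \begin{align*}
    \mathrm{QIC}(R)   &= -2\,Q(\hat\beta_R; I) + 2\,\operatorname{trace}\!\left(\hat\Omega_I^{-1}\hat V_R\right),\\
    \mathrm{QIC}_u(R) &= -2\,Q(\hat\beta_R; I) + 2p.
    \end{align*}

    % Posted output, sta1 with lsale_1 (p = 12, Trace = 24.885):
    \begin{align*}
    -2Q &= 162.827 - 2(12) = 138.827,\\
    \mathrm{QIC} &= 138.827 + 2(24.885) = 188.597 \approx 188.596.
    \end{align*}

    % Posted output, ar1 with lemp_1 (p = 12, Trace = 24.246):
    \begin{align*}
    -2Q &= 165.125 - 2(12) = 141.125,\\
    \mathrm{QIC} &= 141.125 + 2(24.246) = 189.617 \approx 189.616.
    \end{align*}
    ```

    The posted numbers agree with these formulas to rounding. Both terms are evaluated at the fitted coefficients, so replacing one covariate changes the quasi-likelihood term and the trace term for every correlation structure, and when the candidates are close a small shift can be enough to reorder them.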
