
  • GEE: Correlation structure

    Hi All,

    I am doing an analysis where my dependent variable is binary: presence (1) or absence (0) of a chief innovation officer (cino) in the top management team. My dataset is a balanced panel of 100 firms observed over 5 years. Since the same firms are repeated, I initially used the exchangeable correlation structure. However, the Wald chi2 and Prob > chi2 (16.48 and 0.1243, respectively) are much weaker than those under the independent correlation structure (37.43 and 0.0001). Moreover, the independent variables that are significant also vary between the two setups. Can someone please guide me on how to decide which correlation structure to use?

    Kind regards,
    Mohsin

  • #2
    Welcome to Statalist, Mohsin!

    FAQ section 12 asks that you show the exact commands and all results of those commands. Please do so, and put the commands and results inside CODE delimiters, as that section asks. I'm curious why you chose only the independence and exchangeable options to compare. More realistic would be "unstructured" and "ar". For an example of comparing correlation structures, see the section on estat wcorrelation in the manual entry for xtgee postestimation.
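To make the candidate structures concrete, here is a minimal Python sketch (used only for illustration, not Stata code) of the working correlation matrices for one firm observed over T = 5 years. The value RHO = 0.5 is hypothetical; in xtgee the correlation parameters are estimated from the data, and estat wcorrelation displays the estimated matrix after fitting. The sketch also counts the free parameters of an unstructured matrix.

```python
"""Working correlation matrices for one cluster (firm) with T time points."""


def independent(t: int) -> list[list[float]]:
    # corr(ind): no within-firm correlation between years
    return [[1.0 if i == j else 0.0 for j in range(t)] for i in range(t)]


def exchangeable(t: int, rho: float) -> list[list[float]]:
    # corr(exc): every pair of years shares the same correlation rho
    return [[1.0 if i == j else rho for j in range(t)] for i in range(t)]


def ar1(t: int, rho: float) -> list[list[float]]:
    # corr(ar 1): correlation decays as rho**lag for years |i - j| apart
    return [[rho ** abs(i - j) for j in range(t)] for i in range(t)]


def n_unstructured_params(t: int) -> int:
    # corr(uns): each pair of distinct years gets its own correlation
    return t * (t - 1) // 2


if __name__ == "__main__":
    T, RHO = 5, 0.5  # five years per firm; RHO is a hypothetical value
    for name, mat in [("ind", independent(T)),
                      ("exc", exchangeable(T, RHO)),
                      ("ar 1", ar1(T, RHO))]:
        print(name)
        for row in mat:
            print("  " + " ".join(f"{v:6.4f}" for v in row))
    print("uns free correlations:", n_unstructured_params(T))
```

With T = 5 the unstructured matrix has 10 correlations to estimate, versus one for exc or ar 1, which may make corr(uns) harder to fit when the outcome is rare.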
    Last edited by Steve Samuels; 25 Jul 2015, 21:18.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Hi Steven,

      Thank you for helping me organise my post. Please find below the commands and results:

      Ind structure:
      Code:
      xtgee cino tmt ten oceo0 dceo coo aroa_1 ari_1 asg_1 alat_1 hhi_1, family(binomial 1) link(logit) corr(ind) nolog
      
      Results:
      GEE population-averaged model                   Number of obs      =       500
      Group variable:                         id      Number of groups   =       100
      Link:                                logit      Obs per group: min =         5
      Family:                           binomial                     avg =       5.0
      Correlation:                   independent                     max =         5
                                                      Wald chi2(10)      =     36.22
      Scale parameter:                         1      Prob > chi2        =    0.0001
      
      Pearson chi2(500):                  377.96      Deviance           =    185.20
      Dispersion (Pearson):             .7559285      Dispersion         =  .3704039
      
      ------------------------------------------------------------------------------
              cino |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               tmt |   .3322662   .0579783     5.73   0.000     .2186309    .4459015
               ten |  -.0103753   .0380548    -0.27   0.785    -.0849613    .0642107
             oceo0 |  -.1421095   .4680038    -0.30   0.761     -1.05938    .7751611
              dceo |   .3836575   .4472361     0.86   0.391    -.4929091    1.260224
               coo |   .1534408   .5733054     0.27   0.789    -.9702172    1.277099
            aroa_1 |   .2502174   2.142053     0.12   0.907    -3.948129    4.448564
             ari_1 |  -.2168398   1.289913    -0.17   0.867    -2.745023    2.311343
             asg_1 |   .0600475   .3922838     0.15   0.878    -.7088146    .8289096
            alat_1 |  -.3129255    .169878    -1.84   0.065    -.6458802    .0200293
             hhi_1 |  -.0017149   .0007905    -2.17   0.030    -.0032642   -.0001656
             _cons |  -4.767029   .7401897    -6.44   0.000    -6.217774   -3.316284
      ------------------------------------------------------------------------------
      Exc structure:
      Code:
      xtgee cino tmt ten oceo0 dceo coo aroa_1 ari_1 asg_1 alat_1 hhi_1, family(binomial 1) link(logit) corr(exc) nolog
      
      Results:
      GEE population-averaged model                   Number of obs      =       500
      Group variable:                         id      Number of groups   =       100
      Link:                                logit      Obs per group: min =         5
      Family:                           binomial                     avg =       5.0
      Correlation:                  exchangeable                     max =         5
                                                      Wald chi2(10)      =     14.50
      Scale parameter:                         1      Prob > chi2        =    0.1515
      
      ------------------------------------------------------------------------------
              cino |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               tmt |   .1105161   .0626282     1.76   0.078    -.0122329    .2332651
               ten |   .0759874   .0389548     1.95   0.051    -.0003625    .1523373
             oceo0 |    .200225   .5087608     0.39   0.694    -.7969278    1.197378
              dceo |   .2377148   .4733204     0.50   0.616    -.6899762    1.165406
               coo |   .1153037   .4676706     0.25   0.805    -.8013138    1.031921
            aroa_1 |  -2.875153   1.415628    -2.03   0.042    -5.649733   -.1005725
             ari_1 |  -.0933946   .6585011    -0.14   0.887    -1.384033    1.197244
             asg_1 |   .1679215   .2662192     0.63   0.528    -.3538586    .6897016
            alat_1 |   .2121619    .240677     0.88   0.378    -.2595563    .6838802
             hhi_1 |   .0013722   .0007857     1.75   0.081    -.0001678    .0029123
             _cons |  -5.816824   1.213502    -4.79   0.000    -8.195245   -3.438403
      ------------------------------------------------------------------------------
      Uns structure gives the following error:

      Code:
      xtgee cino tmt ten oceo0 dceo coo aroa_1 ari_1 asg_1 alat_1 hhi_1, family(binomial 1) link(logit) corr(uns) nolog
      convergence not achieved
      r(430);

      Finally, using ar structure:

      Code:
      xtgee cino tmt ten oceo0 dceo coo aroa_1 ari_1 asg_1 alat_1 hhi_1, family(binomial 1) link(logit) corr(ar 1) nolog
      
      Results:
      GEE population-averaged model                   Number of obs      =       500
      Group and time vars:              id fyear      Number of groups   =       100
      Link:                                logit      Obs per group: min =         5
      Family:                           binomial                     avg =       5.0
      Correlation:                         AR(1)                     max =         5
                                                      Wald chi2(10)      =      5.74
      Scale parameter:                         1      Prob > chi2        =    0.8366
      
      ------------------------------------------------------------------------------
              cino |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               tmt |   .0252484   .0632784     0.40   0.690     -.098775    .1492718
               ten |   .0217006   .0376289     0.58   0.564    -.0520507    .0954519
             oceo0 |   .0232131   .4863239     0.05   0.962    -.9299643    .9763905
              dceo |   .0473541   .4276206     0.11   0.912    -.7907669    .8854752
               coo |    .124459   .4151263     0.30   0.764    -.6891736    .9380916
            aroa_1 |  -2.197934   1.087229    -2.02   0.043    -4.328864   -.0670045
             ari_1 |  -.1314868   .7020972    -0.19   0.851    -1.507572    1.244598
             asg_1 |   .0709553   .1916394     0.37   0.711     -.304651    .4465616
            alat_1 |   .2346929   .2187694     1.07   0.283    -.1940874    .6634731
             hhi_1 |    .000734   .0009118     0.81   0.421     -.001053     .002521
             _cons |  -3.985506   1.094861    -3.64   0.000    -6.131395   -1.839617
      ------------------------------------------------------------------------------
      The overall model's significance decreases from ind to exc to ar. Moreover, which independent variables are significant also varies across the models.
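The Prob > chi2 values above are upper-tail probabilities of a chi-squared distribution with 10 degrees of freedom (one per covariate; the constant is not tested). The Wald statistic is built from the coefficients and their estimated covariance matrix, so it changes whenever a different working correlation changes the estimates or their standard errors. A minimal stdlib Python check follows; it uses the closed form that holds only for even degrees of freedom, an implementation choice made here so that no statistics library is needed.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


# Wald chi2(10) statistics reported in the three xtgee outputs above
for label, stat in [("ind", 36.22), ("exc", 14.50), ("ar 1", 5.74)]:
    p = chi2_sf_even_df(stat, 10)
    print(f"{label:>4}: Wald chi2(10) = {stat:6.2f}  Prob > chi2 = {p:.4f}")
```

The recomputed values agree with the printed ones up to the rounding of the displayed statistics (for example, 14.50 gives about 0.1514 against the printed 0.1515).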

      I am following past papers in which authors used GEE, but for other top management team executives: chief operating officers, chief strategy officers and chief marketing officers. In their analyses, the executive was present in at least 20% of firm-years. In my data, a cino is present (=1) in only 35 of the 500 firm-years, a mere 7%. Does this low prevalence affect the choice of regression model I should be using?
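The arithmetic behind the 7% figure, plus a quantity some authors use to judge how many predictors a binary outcome can support: the number of events (here, firm-years with a cino) per predictor. A threshold of roughly 10 events per predictor is a commonly cited, and debated, rule of thumb rather than a firm requirement; the count of 10 predictors is taken from the xtgee models above, and the helper name is hypothetical.

```python
def outcome_summary(events: int, total: int, n_predictors: int) -> tuple[float, float]:
    """Return (prevalence, events per predictor) for a binary outcome.

    'Events' here means the count of the rarer outcome class, which limits
    how much information the data carry about the coefficients.
    """
    rarer = min(events, total - events)
    return events / total, rarer / n_predictors


# 35 of 500 firm-years have a cino; 10 covariates in the posted models
prevalence, epp = outcome_summary(events=35, total=500, n_predictors=10)
print(f"prevalence = {prevalence:.0%}, events per predictor = {epp:.1f}")
```

Because a firm with a cino in one year is likely to have one in neighbouring years too, the 35 events may come from fewer than 35 distinct firms, so the information about the coefficients may be thinner still.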

      Thank you in advance!
      Mohsin




      • #4
        Thank you for using the CODE delimiters. It makes your results very easy to use.

        I see that you did not use the option vce(robust). Repeat the analyses with it. It gives valid standard errors no matter what the working correlation structure is; without it, the standard errors will almost certainly be biased. I myself would favor the ar 1 working correlation model, as it is likely to reduce standard errors when used with vce(robust).
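In the xtgee tables, z is Coef. divided by Std. Err., and P>|z| and the 95% interval follow from the standard normal distribution. vce(robust) replaces the standard errors with robust (sandwich) ones but leaves the coefficients unchanged, so only these derived columns move. A minimal stdlib Python sketch that recomputes them from the hhi_1 row of the corr(ind) output in #3 (model-based standard errors); the function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wald_row(coef: float, se: float) -> tuple[float, float, float, float]:
    """Return (z, two-sided p, lower 95% CI, upper 95% CI) for one coefficient."""
    z = coef / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p, coef - Z_975 * se, coef + Z_975 * se


# hhi_1 row from the corr(ind) output: Coef. and model-based Std. Err.
z, p, lo, hi = wald_row(-0.0017149, 0.0007905)
print(f"z = {z:.2f}, P>|z| = {p:.3f}, 95% CI = [{lo:.7f}, {hi:.7f}]")
```

Plugging in a robust standard error for the same coefficient would give the corresponding robust z, p-value and interval.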
        I notice that your model is quite simple, with no interactions and (apparently) no non-linear terms. What is the goal of the analysis?
        Steve Samuels


        • #5
          Thank you for your reply. I tried vce(robust) and the significance improved a lot. You're right that it makes more sense to use ar1. I tried uns again, but it still did not converge. As for the model, yes, what I shared was quite basic; I haven't added the interaction terms yet. I was experimenting with the models to understand how they work. Now I have a fair idea and will work on adding the different interactions. Thanks again!



          • #6
            Hi Steven/Statalisters,

            I have a follow-up question regarding the correlation structure. I went through
            Cui, James. "QIC program and model selection in GEE analyses." Stata Journal 7(2) (2007): 209.
            and
            Hardin, James W., and Joseph M. Hilbe. Generalized Estimating Equations. Chapman and Hall/CRC, 2012.
            to identify which correlation structure to use. According to these texts, the correlation structure that minimises the QIC should be used. What I find puzzling is that, with the same data, when I switch from log of sales to log of employees as the proxy for firm size, the structure that minimises the QIC changes. Using log of sales, it is stationary of order 1 (sta1):

            Code:
            qic cino asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lsale_1 td_1, family(binomial 1) link(logit) corr(sta1) robust nolog nodisplay
            
                          QIC and QIC_u
            ___________________________________________
            Corr =                 sta1
            Family =         binomial 1
            Link =                logit
            p =                      12
            Trace =              24.885
            QIC =               188.596
            QIC_u =             162.827
            ___________________________________________
            Using log of employees instead, it is autoregressive of order 1 (ar1):

            Code:
            qic cino asg_1 ten_1 coo_1 tmt_1 fyear dc_1 ari_1 hhi_1 oc0_1 lemp_1 td_1, family(binomial 1) link(logit) corr(ar1) robust nolog nodisplay
            
                          QIC and QIC_u
            ___________________________________________
            Corr =                  ar1
            Family =         binomial 1
            Link =                logit
            p =                      12
            Trace =              24.246
            QIC =               189.616
            QIC_u =             165.125
            ___________________________________________
            To save space, I have not posted the QIC for the other structures (ind, exc, etc.). Is there a reason why, for essentially the same data, the best-fitting correlation structure should change simply because one variable is swapped? From my limited understanding, the correlation structure describes the data overall and should not depend so much on a single variable. But then again, I could be wrong. Can someone please shed some light on this?
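Following Cui (2007), QIC = -2Q + 2*trace(Omega_I * V_R) and QIC_u = -2Q + 2p, where Q is the quasi-likelihood under the independence model evaluated at the GEE estimates, Omega_I is the inverse of the model-based variance under independence, and V_R is the robust variance. Hence QIC = QIC_u + 2*(Trace - p), which makes it possible to check the qic output and to see how much of QIC is penalty. A minimal Python sketch using the two outputs above (helper names are hypothetical):

```python
def qic_from_qicu(qic_u: float, trace: float, p: int) -> float:
    # QIC = -2Q + 2*trace and QIC_u = -2Q + 2p  =>  QIC = QIC_u + 2*(trace - p)
    return qic_u + 2.0 * (trace - p)


def minus_two_q(qic_u: float, p: int) -> float:
    # Recover the -2Q (fit) part from QIC_u by removing its 2p penalty
    return qic_u - 2.0 * p


# (Trace, QIC, QIC_u) as printed by qic above; p = 12 in both models
outputs = {
    "sta1 (lsale_1)": (24.885, 188.596, 162.827),
    "ar1 (lemp_1)": (24.246, 189.616, 165.125),
}
P = 12
for label, (trace, qic, qic_u) in outputs.items():
    print(f"{label}: -2Q = {minus_two_q(qic_u, P):.3f}, "
          f"QIC recomputed = {qic_from_qicu(qic_u, trace, P):.3f} (printed {qic})")
```

The recomputed values match the printed QIC to within rounding. These two QIC values come from different mean models (lsale_1 versus lemp_1); QIC is meant for comparing working correlation structures within one mean model, so the two numbers should not be compared with each other.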

            Moreover, I read in these texts that GEE is robust to misspecification of the correlation structure. Does that mean I should not pay too much attention to the choice of correlation structure?

            Thanking you all in advance,
            Mohsin



            • #7
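              Yes. With vce(robust), GEE gives consistent coefficient estimates and valid standard errors even if the working correlation structure is misspecified, provided the mean model is correctly specified; the choice of structure affects only the precision of the estimates.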
              Steve Samuels


              • #8
                Thank you again, Steve. You have been very helpful! Do you know why the "best correlation structure according to QIC" might change from sta1 to ar1 just by changing one variable? Shouldn't the best correlation structure be based on the data overall, rather than being so strongly influenced by one variable?
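One point that may be relevant: GEE estimates the working correlation from the residuals of the fitted mean model, so swapping lsale_1 for lemp_1 changes the residuals and therefore the estimated correlations. In addition, with the same lag-1 correlation, sta1 and ar1 differ only at lags of 2 or more. The sketch below compares them for a hypothetical lag-1 correlation of 0.3 (not a value estimated from these data); when the two matrices are this close, a small change in the residuals could plausibly reorder their QIC values.

```python
def ar1(t: int, rho: float) -> list[list[float]]:
    # Autoregressive of order 1: rho**lag at every lag
    return [[rho ** abs(i - j) for j in range(t)] for i in range(t)]


def stationary1(t: int, rho: float) -> list[list[float]]:
    # Stationary of order 1 (sta1): rho at lag 1, zero at lags of 2 or more
    return [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0)
             for j in range(t)] for i in range(t)]


def max_abs_diff(a: list[list[float]], b: list[list[float]]) -> float:
    # Largest element-wise gap between two matrices of the same shape
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


T, RHO = 5, 0.3  # five years per firm; RHO is hypothetical
a, s = ar1(T, RHO), stationary1(T, RHO)
for lag in range(1, T):
    print(f"lag {lag}: ar1 = {a[0][lag]:.4f}, sta1 = {s[0][lag]:.4f}")
print("largest difference:", round(max_abs_diff(a, s), 4))
```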

                As always, I sincerely appreciate your help!
                Mohsin



                • #9
                  Sorry, but I can't answer your latest question. I suggest that you ask it in a new thread. I'll just observe that you could put both variables into a model.
                  Steve Samuels


                  • #10
                    That's understandable. Thank you for your help so far.
