Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fixed-effects model using simultaneous ANOVA

    Hello everyone,

    I’d like to measure the CEO-effect (i.e. proportion of variance in a firm-level outcome variable that can be attributed to the presence of individual CEOs) within my unbalanced panel data using Stata 12.0 for Windows.

    My model looks as follows:

    FirmPerformance_ijkt = GrandMean + Year_i + Industry_j + Company_k + CEO_t + Error_ijkt

    In accordance with previous papers, I would like to estimate the effects with a fixed-effect model using simultaneous ANOVA, which is supposed to be “… a modified version of sequential ANOVA. While sequential ANOVA does not estimate the covariance among categories, and instead attributes covariance between two categories to whichever is entered earlier in the model, simultaneous ANOVA estimates the covariance between different categories and adjusts the category estimates accordingly.” (Crossland/Hambrick 2007, p. 780). Another article states “… simultaneous ANOVA allows for a full set of covariance effects but does not assume randomness in the model errors.” (Mackey 2008, p. 1360).

    Unfortunately, I couldn’t find anything in how to do a simultaneous ANOVA in Stata so far. Does anyone has an idea?
    Is it maybe just something like:

    . anova a b c a#b a#c b#c a#b#c a##b##c ?

    Thanks in advance!
    ---------------------------------------

    Crossland, C. und D. C. Hambrick (2007): How National Systems Differ in Their Constraints on Corporate Executives: A Study of CEO Effects in Three Countries; in: Strategic Management Journal 28 (8); p. 767–789.

    Mackey, A. (2008): The Effect of CEOs on Firm Performance; in: Strategic Management Journal 29 (12); p. 1357-1367.

  • #2
    I'm not an anova expert, but is it possible to frame the problem in terms of a linear regression, and then run e.g. reghdfe (ssc) or areg and compute the results?
    Note though that if you go that way, the CEOs fixed effects would be inconsistent, and so your estimate for their influence. Maybe you can run two models, one without absorbing for CEO FEs, one absorbing for them, and compare the R2s.

    Comment


    • #3
      Thanks a lot for the answer.

      After some more research, I found out that apparently authors in this field refer to “simultaneous” instead of “sequential” ANOVA, when they make us of partial (sometime also referred to as unique or simultaneous) sums of squares instead of sequential sums of squares. Therefore, the answer is quit trivial: anova performance year industry company ceo

      In this regard I have an additional question. For further calculations, I would like to know the intercepts (constant terms) for each CEO of the underlying regression model of the anova. With the command “regress” after the anova run I’ll get the results. Unfortunately, for some intercepts I just get the result “0 (omitted)”.
      Does someone know why this happens und what I have to do in order to get all intercept values?

      Thanks in advance!

      Comment


      • #4
        By default Anova produces partial SS. If you want Sequential SS, just use 'sequential' as an option after 'anova' command. Saying that, whichever you think is justified, at least one problem can be identified from your code which resulting 'zero' estimates:
        anova a b c a#b a#c b#c a#b#c a##b##c
        In the code above, this portion:
        Code:
        anova a b c a#b a#c b#c a#b#c
        is already a full factorial model which is equivalent to:
        Code:
        anova a##b##c
        No wonder the redundant parameters will return 'zero'. Also you might be loosing degrees of freedom if either a, b or c is continuous variable. In that case you need to tell Stata that which variable is continuous i.e. c.a#c.b means they are continuous variable. see
        Code:
        help fvvarlist
        Just for your reference, you omitted the 'y' from your equation code, so did I. For sure we Know there is a "y' in the equation. But as per rules of this forum you are expected to show 'exact' code you used and 'exact' results Stata returned.
        Last edited by Roman Mostazir; 13 May 2015, 18:25.
        Roman

        Comment


        • #5
          Thanks for your reply.

          I think you got my answer wrong. My ‘final’ anova command I use for the analysis is:

          anova performance i.year i.industry i.company i.year

          with ‘performance’ being my continuous variable (dependent variable) and ‘year’, ‘industry’, ‘company’ and ‘ceo’ being the categorical variables. I’m also aware of the differences in using partial or sequential SS.

          As said before, I would now like to know all intercepts of the ‘ceo’ variable of the underlying regression model after the anova run, but for a lot values I just get “0 (omitted)” by using the ‘regress’ command after the anova run. Does anyone know why and how to fix it?

          Comment


          • #6

            I think you got my answer wrong. My ‘final’ anova command I use for the analysis is:

            anova performance i.year i.industry i.company i.year
            Hmm..I was referring to your codes in #1. But where is 'ceo' in your final model? I can't see it. That is why the FAQ section suggests to provide the 'exact' codes you have used and provide the Stata outputs so that people can easily spot your problem and help you out more efficiently. Now looking at your reply#5, my guess is you need 'margins'. Tried that? Though not sure why 'reg' will provide lots of 'zero' for 'ceo' as it should provide the coefficients for other categories in compared to the reference category. My guess cannot go further unless some outputs are presented.

            Best.
            Roman

            Comment


            • #7
              Sorry, you are of course totally right, I typed in twice ‘year’ instead of ‘ceo’ in the presented code. Here the ‘correct’ command I used:

              anova performance i.year i.industry i.company i.ceo

              Then, I get normally looking anova results. Afterwards I type in ‘regress’ in order to find out the intercepts for the variables of the underlying regression model. For both the ‘year’ and ‘industry’ variables the tables are completely filled, whereas for ‘company’ some values are missing and for ‘ceo’ quite a lot. Unfortunately, I couldn’t figure out a pattern why some values are omitted. For the sake of completeness, I attached a screenshot of the results of the regress command.

              Thanks in advance for any help!

              Comment


              • #8
                I need to catch up a meeting, so making a quick reply hoping you can sort it out. 'reg' is not giving you the intercepts for each category. It is providing you the difference (coefficient) for each category compared to the reference category, in this case ceo = 1. Try:

                Code:
                anova performance i.year i.industry i.company i.ceo
                
                margins i.year i.ceo
                See the results. For future,: it is more convenient and legible if you use code delimiters to post the outputs rather screenshots. Click the A button in the right hand side and click '#' and copy and paste the outputs, Stata codes in between the CODE delimiters.

                Best,

                Roman

                Comment


                • #9
                  Thanks, concerning the stata commands that is exactly what I was looking for. However, if I try to use the margins command for the ‘ceo’ variable, I get a “not estimable” notification (the same occurs for ‘industry’ and ‘company’). See the following stata output:

                  Code:
                   anova performance i.year i.industry i.company i.ceo
                  
                                             Number of obs =    1013     R-squared     =  0.4853
                                             Root MSE      = 12.7618     Adj R-squared =  0.2999
                  
                                    Source |  Partial SS    df       MS           F     Prob > F
                                -----------+----------------------------------------------------
                                     Model |  114242.222   268  426.276947       2.62     0.0000
                                           |
                                      year |  1304.54193     9  144.949104       0.89     0.5337
                                  industry |  646.920433    11  58.8109485       0.36     0.9703
                                   company |  72992.8089   126  579.308007       3.56     0.0000
                                       ceo |  26083.3129   122  213.797646       1.31     0.0194
                                           |
                                  Residual |  121171.066   744  162.864336  
                                -----------+----------------------------------------------------
                                     Total |  235413.288  1012  232.621826  
                  
                  . margins i.ceo
                  
                  Predictive margins                                Number of obs   =       1013
                  
                  Expression   : Linear prediction, predict()
                  
                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           ceo |
                            1  |          .  (not estimable)
                            2  |          .  (not estimable)
                            3  |          .  (not estimable)
                            4  |          .  (not estimable)
                            5  |          .  (not estimable)
                  
                  ...
                  As far as I can tell, it seems like I have problems with collinearity. Using the ‘noestimcheck’ option for the ‘margins’ command, results are calculated. However, for all ‘ceo’ values that where labeled “0 (omitted)” by using the ‘regress’ command, I’ll now get the identical numerical values for each ‘ceo’ value. See the following stata output:

                  Code:
                  . margins i.ceo, noestimcheck
                  
                  Predictive margins                                Number of obs   =       1013
                  
                  Expression   : Linear prediction, predict()
                  
                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           ceo |
                            1  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                            2  |   .0656615   12.82456     0.01   0.996    -25.07001    25.20133
                            3  |   .9846664   14.00152     0.07   0.944     -26.4578    28.42713
                            4  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                            5  |  -.2455593   8.153771    -0.03   0.976    -16.22666    15.73554
                            6  |   .0731999   18.16001     0.00   0.997    -35.51977    35.66617
                            7  |   33.69258   10.71732     3.14   0.002     12.68703    54.69814
                            8  |   22.90153    14.3586     1.59   0.111    -5.240803    51.04387
                            9  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                  
                  .....
                          253  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                          254  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                          255  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                          256  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                          257  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                          258  |   1.677583   .6844242     2.45   0.014      .336136    3.019029
                  ------------------------------------------------------------------------------
                  Does anyone know how to fix this? Are my data just too sparse to correctly conduct this kind of analysis? For the sake of completness see an overview of the data:

                  Code:
                  . summarize year industry company ceo
                  
                      Variable |       Obs        Mean    Std. Dev.       Min        Max
                  -------------+--------------------------------------------------------
                          year |      1013    2006.803    2.788406       2002       2011
                      industry |      1013    6.023692    2.928568          1         12
                       company |      1013    125.0237    124.1759          1        559
                           ceo |      1013    129.1125     74.7792          1        258
                  Thanks in advance for any help!
                  Last edited by Thomas Lindner; 16 May 2015, 05:15.

                  Comment


                  • #10
                    To be honest, I don't know as I don't have the data for further diagnosis. However, looking at your output, my first guess would be because the 'ceo' variable includes 258 categories !! you may have `zero' in 'performance' values in relation to some of the 'ceo' values. But I am not sure if this is the source of the problem. This can be checked with some descriptive stats. Could you please check it for the identical estimates related to 'ceo' values. Run the following to check it:

                    Code:
                    foreach j of numlist 1 4 9 253 254 255 256 257 258 {
                    tab ceo if ceo == `j' ,sum(performance)
                    }
                    See the returned results.

                    Saying that, another possibility popping in my mind. I don't know what kind of variable 'ceo' is but could you treat it as a continuous variable? Would it make more sense? Interpreting and presenting estimates from 258 categories seems far away from being a decent idea unless you have strong reasons behind that. If a linear fit treating 'ceo' as continuous variable make sense, you can use 'margins' for post estimation means at certain intervals of 'ceo' and plot the effects..for example:

                    Code:
                    anova performance ceo
                    
                    margins,at(ceo=(1 (50) 250)
                    
                    marginsplot,xdim(ceo)
                    Not sure if that solves your problems.

                    Best,
                    Roman

                    Comment


                    • #11
                      Thanks again for your answer.

                      The ‘ceo’ variable is the name of a CEO in charge for a respective company, which conducts business in the related industry and year. So the first value of the ‘ceo’ variable might be “Steve Jobs”, working for “Apple” (‘company’ variable) which is in the “Technology Sector” (‘industry’ variable) in “2008” (‘year’ variable) and the company had an return of sales in that year of “15,7 %” (‘performance’ variable). So the 'ceo' variable should be a categorical variable in my opinion. The ‘performance’ variable is at least in my sample never 0 or identical to another value.

                      I don’t want to report the different ‘ceo’ intercepts directly, but I would like to know them for further calculations. Therefore, your suggested margins plot doesn’t really solve my problem.

                      My stata output for running your for-loop looks as follows:

                      Code:
                      . foreach j of numlist 1 4 9 253 254 255 256 257 258 {
                        2. 
                      . tab ceo if ceo == `j' ,sum(performance)
                        3. 
                      . }
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                                1 |   13.868823   4.3292114           3
                      ------------+------------------------------------
                            Total |   13.868823   4.3292114           3
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                               4  |   2.4502015   1.3382299           9
                      ------------+------------------------------------
                            Total |   2.4502015   1.3382299           9
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                                9 |   3.5016346   3.1326687           7
                      ------------+------------------------------------
                            Total |   3.5016346   3.1326687           7
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                             253  |   4.9944388   .39269566           7
                      ------------+------------------------------------
                            Total |   4.9944388   .39269566           7
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                              254  |   2.6932446   1.2328878          10
                      ------------+------------------------------------
                            Total |   2.6932446   1.2328878          10
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                             255  |   1.8880489   7.5637416           2
                      ------------+------------------------------------
                            Total |   1.8880489   7.5637416           2
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                              256 |   .70206482   2.3068254           5
                      ------------+------------------------------------
                            Total |   .70206482   2.3068254           5
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                             257  |  -1.9966718   1.7356451           4
                      ------------+------------------------------------
                            Total |  -1.9966718   1.7356451           4
                      
                                  |       Summary of performance
                              ceo |        Mean   Std. Dev.       Freq.
                      ------------+------------------------------------
                              258 |  -.97718828   2.8261166           2
                      ------------+------------------------------------
                            Total |  -.97718828   2.8261166           2
                      Looking at the raw data I just can't figure out a particular pattern why some values of 'ceo' are properly calculated and others not. Anyone an idea?

                      Thanks in advance for any help!

                      Comment


                      • #12
                        First of all, you are correct. 'ceo' should be categorical. Now we have a basic problem here. Because I focused on the the anova command and partial/sequential-SS issue, I didn't notice what kind of dataset you have. Now from post-1 and from the post above #11 it is clear that you have a panel dataset, therefore, your model is misspecified. You are fitting a single-level model with multi-level structure (firm – year–ceo) . You either need 'repeated measured anova' or 'mixed' ('mixed' is preferred, see help mixed). After fitting the right model then try estimating the `ceo' means with `margins'.
                        Last edited by Roman Mostazir; 18 May 2015, 10:35.
                        Roman

                        Comment


                        • #13
                          You are totally right, I have an unbalanced panel dataset over the timespan of 10 years. I should have mentioned that right away.

                          Within the greater context of my analysis, I was going to estimate the so called “CEO-effect” (i.e. the portion in firm performance variance explained by the respective CEO in charge) not only with an ANOVA as above (which most papers do), but also with a multi-level model using ‘xtmixed’ in order to explicitly account for the nested structure of the data. Articles applying this MLM approach used the following Stata command:

                          Code:
                           xtmixed performance || industry: || firm: || ceo: || year: , var
                          As you can see, no cross-level covariances are modeled. If, however, I run the model with my data, the “year-effect” (i.e. the portion in firm performance variance explained by the year) is unusually high compared to my ANOVA estimates. In other articles the different effects were at least in comparable range. Therefore, I’m not sure if the MLM model is correctly specified for my dataset. Any suggestions?

                          For the sake of completeness see my Stata output for the partial ANOVA in comparison to the MLM approach:

                          Code:
                          . xtmixed performance || industry: || company: || ceo:  || year: , var technique(bfgs)
                          
                          ...
                          
                          Computing standard errors:
                          
                          Mixed-effects ML regression                     Number of obs      =      1013
                          
                          -----------------------------------------------------------
                                          |   No. of       Observations per Group
                           Group Variable |   Groups    Minimum    Average    Maximum
                          ----------------+------------------------------------------
                                 industry |       12         20       84.4        308
                                  company |      138          1        7.3         10
                                      ceo |      258          1        3.9         10
                                     year |     1013          1        1.0          1
                          -----------------------------------------------------------
                          
                                                                          Wald chi2(0)       =         .
                          Log likelihood = -4129.8384                     Prob > chi2        =         .
                          
                          ------------------------------------------------------------------------------
                           performance |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                 _cons |   1.448331   .8931908     1.62   0.105    -.3022908    3.198953
                          ------------------------------------------------------------------------------
                          
                          ------------------------------------------------------------------------------
                            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                          -----------------------------+------------------------------------------------
                          industry: Identity           |
                                            var(_cons) |   .9782162   3.447573      .0009784    978.0579
                          -----------------------------+------------------------------------------------
                          company: Identity            |
                                            var(_cons) |   47.34626   14.25791       26.2394    85.43141
                          -----------------------------+------------------------------------------------
                          ceo: Identity                |
                                            var(_cons) |   28.49681   13.00188      11.65277    69.68884
                          -----------------------------+------------------------------------------------
                          year: Identity               |
                                            var(_cons) |   141.8553   1694.618      9.62e-09    2.09e+12
                          -----------------------------+------------------------------------------------
                                         var(Residual) |   20.26191   1694.602      1.31e-70    3.14e+72
                          ------------------------------------------------------------------------------
                          LR test vs. linear regression:       chi2(4) =   134.35   Prob > chi2 = 0.0000
                          
                          
                          
                          anova performance i.year i.industry i.company i.ceo
                          
                                                     Number of obs =    1013     R-squared     =  0.4861
                                                     Root MSE      = 12.7608     Adj R-squared =  0.3000
                          
                                            Source |  Partial SS    df       MS           F     Prob > F
                                        -----------+----------------------------------------------------
                                             Model |  114425.437   269  425.373373       2.61     0.0000
                                                   |
                                              year |  1397.10497     9  155.233885       0.95     0.4778
                                          industry |  2149.56357    11   195.41487       1.20     0.2828
                                           company |  82560.7515   126   655.24406       4.02     0.0000
                                               ceo |  26266.5283   123  213.549011       1.31     0.0195
                                                   |
                                          Residual |   120987.85   743  162.836945   
                                        -----------+----------------------------------------------------
                                             Total |  235413.288  1012  232.621826
                          Thanks for any help!

                          Comment


                          • #14
                            Your 'mixed-model' is incorrectly specified no doubt. I am guessing you have a three-level model, therefore two random effects but your specifications are causing confusions. Therefore, before suggesting a model, a sample of your data will be useful to define the hierarchical structure of the dataset. Could you please provide a sample of the data set. Just run the command below and copy the output and paste here with the code-delimiters.

                            Code:
                            list industry firm ceo year performance in 1/30,clean
                            Roman

                            Comment


                            • #15
                              Thanks again for your answer Roman! Here is my Stata output:

                              Code:
                              . list industry company ceo year performance in 1/30, clean
                              
                                     industry   company   ceo   year   perform~e  
                                1.          1         1     1   2002   5.3644982  
                                2.          1         1     1   2003   6.2105323  
                                3.          1         1     1   2004   7.0976729  
                                4.          1         1     1   2005    7.420506  
                                5.          1         1     1   2006    5.764411  
                                6.          1         1     1   2007   6.6186186  
                                7.          1         1     1   2008   6.7345012  
                                8.          1         1     1   2009   2.7605634  
                                9.          1         1     1   2010   5.3399887  
                               10.          1         1     1   2011   5.8963093  
                               11.          2         3     2   2002   12.060431  
                               12.          2         3     2   2003   .88963312  
                               13.          2         3     2   2004   7.9966868  
                               14.          2         3     2   2005   5.8898665  
                               15.          2         3     2   2006   .61083222  
                               16.          2         3     3   2007   7.8003133  
                               17.          2         3     3   2008   7.3470911  
                               18.          2         3     3   2009   14.298538  
                               19.          2         3     3   2010   13.635128  
                               20.          2         3     3   2011    14.47492  
                               21.          2         4     4   2002   8.4291881  
                               22.          2         4     5   2003    7.098982  
                               23.          2         4     5   2004   8.6865757  
                               24.          2         4     5   2005    5.725521  
                               25.          2         4     5   2006   2.7502536  
                               26.          2         4     5   2007   7.6726214  
                               27.          2         4     5   2008   10.115243  
                               28.          2         4     5   2009   4.2877742  
                               29.          2         4     5   2010   2.7087993  
                               30.          2         4     6   2011   5.5520171
                              As you might see, there is not much variation in this first couple entries. However, my data gets more sparse and variable for other companies as for instance no performance indicators were available. See exemplary the following output:

                              Code:
                              . list industry company ceo year performance in 980/1010, clean
                              
                                      industry   company   ceo   year   performa~e  
                               980.          5       455   246   2008    -4.505062  
                               981.          5       455   247   2009   -4.1491119  
                               982.          5       455   247   2010   -14.133701  
                               983.          3       469   248   2002     2.963495  
                               984.          3       469   248   2003     3.491723  
                               985.          3       469   248   2004    6.3760305  
                               986.          3       469   248   2005    6.2084257  
                               987.          3       469   248   2006    4.6043165  
                               988.          3       478   249   2002   -1.0733204  
                               989.          3       478   249   2003     .7910689  
                               990.          3       478   249   2004     5.238406  
                               991.          3       478   250   2005    .65075203  
                               992.         10       496   251   2003   -3.6533432  
                               993.         10       505   252   2003    7.4573977  
                               994.         10       505   252   2004    16.992098  
                               995.         10       505   252   2005    9.5537914  
                               996.         10       505   252   2006    12.289505  
                               997.          8       509   253   2002    5.5996051  
                               998.          8       509   253   2003   -5.5830062  
                               999.          1       511   254   2002    4.9023817  
                              1000.          1       511   254   2003    4.3374542  
                              1001.          8       534   255   2008    7.6471916  
                              1002.          8       534   255   2009   -4.2525691  
                              1003.          8       534   255   2010    .55300577  
                              1004.          8       534   255   2011    .27123184  
                              1005.          8       535   256   2008     8.397254  
                              1006.          8       535   256   2009    7.9749495  
                              1007.          8       535   256   2010    4.3267016  
                              1008.          8       535   257   2011    -5.956808  
                              1009.          8       536   258   2008    25.454743  
                              1010.          8       536   258   2009    22.419817
                              The "general idea" behind my data is described in post #11.

                              Thanks for any help!

                              Comment

                              Working...
                              X