
  • Calculating performance covariance for group members, across years

    Hello, I previously asked a question about the covariance of performance among group members, but because of my limited knowledge I could not make sense of the kind and detailed replies I received.

    I have therefore decided to describe my dataset, and the problems I am having in writing the code, in more detail.

    My data are panel data listing the performance of firms in a given year (fyear) within a given industry (sich). Each firm has a unique id (gvkey).

    I would like to see how the performance of firms in a specific industry covaries across years. High covariance would imply that the performance of firms in the same industry moves in the same direction across years (firms all improving or all failing compared to the year before), while low covariance would imply that their performance moves in more or less random directions.

    Three issues remain unsolved as I try to write the code:
    1. Firm composition for each sich varies from year to year. That is, some firms go bankrupt and drop out of the industry, while new firms are added to the industry.
    2. I want to create a three-year moving window to measure covariance, that is, how the performance of firms within a specific industry covaries across years t-2, t-1, and t. I do not know how to implement this in my code.
    3. I want to create a variable that stores the covariance score for each observation. That is, for each firm-year-industry observation, I would like to calculate the performance covariance measure and save it.

    I am still unfamiliar with Stata (though struggling along), so I would really appreciate help from the community. Thank you!

    Below is an example of my data:
    Code:
    clear
    input int(fyear sich) long gvkey float ni
    1991 3460 1009 1.084
    1992 3460 1009 2.388
    1993 3460 1009 3.362
    1994 3460 1009 4.855
    1991 3743 1010 156.959
    1992 3743 1010 -17.81
    1993 3743 1010 112.249
    1994 3743 1010 25.2
    1995 3743 1010 84.3
    1996 3743 1010 92.4
    1997 3743 1010 216.4
    1998 3743 1010 67.2
    1999 3743 1010 77.3
    2000 3743 1010 79.9
    2001 3743 1010 145.7
    2002 3743 1010 79.7
    2003 3743 1010 352.5
    1991 3661 1013 22.025
    1992 3661 1013 21.026
    1993 3661 1013 31.636
    1994 3661 1013 39.071
    1995 3661 1013 55.186
    1996 3661 1013 87.463
    1997 3661 1013 108.837
    1998 3661 1013 146.727
    1999 3661 1013 87.635
    2000 3661 1013 868.1
    2001 3661 1013 -1287.7
    2002 3661 1013 -1145
    2003 3661 1013 -76.7
    2004 3661 1013 16.4
    2005 3661 1013 110.7
    2006 3661 1013 65.7
    2007 3661 1013 106.3
    2008 3661 1013 -41.9
    2009 3661 1013 -474.3
    2010 3661 1013 62
    1991 3812 1017 12.302
    1992 3812 1017 .484
    1993 3812 1017 1.617
    1994 3812 1017 1.769
    1991 3861 1021 -1.187
    1992 3861 1021 -.528
    1993 3861 1021 -1.256
    1994 3861 1021 -4.184
    1995 3861 1021 .924
    1996 3861 1021 .701
    1997 3861 1021 1.549
    1998 3861 1021 -3.328
    1999 3861 1021 -2.207
    2000 3861 1021 -.808
    2001 3861 1021 -1.738
    2002 3861 1021 .084
    2003 3861 1021 -1.515
    2004 3861 1021 1.345
    2005 3861 1021 1.9
    2006 3861 1021 1.005
    2007 3861 1021 -4.673
    2008 3844 1021 -11.049
    1991 3580 1033 .603
    1992 3580 1033 -.276
    1993 3580 1033 -.374
    1991 2834 1034 5.081
    1992 2834 1034 16.176
    1993 2834 1034 8.621
    1994 2834 1034 -2.386
    1995 2834 1034 18.817
    1996 2834 1034 -11.461
    1997 2834 1034 17.408
    1998 2834 1034 24.211
    1999 2834 1034 36.972
    2000 2834 1034 55.508
    2001 2834 1034 -37.914
    2002 2834 1034 -99.661
    2003 2834 1034 13.833
    2004 2834 1034 -314.737
    2005 2834 1034 133.769
    2006 2834 1034 82.544
    2007 2834 1034 -13.581
    1991 3440 1036 37.01
    1992 3440 1036 25.684
    1993 3440 1036 39.811
    1994 3585 1036 62.143
    1995 3585 1036 78.519
    1996 3585 1036 94.92
    1997 3585 1036 137.978
    1998 3443 1036 99.688
    1999 3443 1036 88.91
    2000 3443 1036 56.55
    1991 3663 1037 -.725
    1992 3663 1037 -.5
    1993 3663 1037 .383
    1994 3663 1037 .767
    1995 3663 1037 -1.101
    1996 3663 1037 -2.761
    1997 3663 1037 .937
    1998 3663 1037 -3.48
    1999 3663 1037 -1.121
    2000 3663 1037 1.164
    2001 3663 1037 .67
    end

  • #2
    I don't have a solution to offer, but here are some thoughts.

    If we forget about the three-year window, one approach would be to do a multi-level model and get the intra-class correlation at the industry level:

    Code:
    mixed ni /*possibly some representation of time*/ || sich: || gvkey:
    estat icc
    If, however, we do that in the example data, we find (and really you can see this by just scanning the data by eye) that this ni variable is extremely noisy and there is no suggestion of any consistent patterns at either the firm or industry level. The inclusion of either a time trend or year indicators does not improve that situation. Now perhaps this example is not at all representative of the real data set, but this suggests that the prospects of finding firm or industry level variation are poor. Perhaps this is improved by the inclusion of other predictor variables that are not shown in the data example, nor discussed in the post. But unless there is something different, I would not expect to find much.

    Doing a similar analysis over a three year window might make matters even worse. While it would reduce noise attributable to time-dependent shocks, it would drastically reduce the number of observations available to estimate the variance components at every level, and, at least in the example, adjusting for the time-dependent shocks did not seem to make much difference in the results.
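
    As a rough illustration of the two time representations mentioned above (a linear trend and year indicators), the sketch below fits both versions of the three-level model on a simulated panel. The simulated data, the seed, and the variable name year_c are assumptions made purely for illustration; none of them come from the original data set.

```stata
* Hypothetical simulated panel: 20 industries x 5 firms x 6 years
clear
set seed 12345
set obs 20
gen sich = _n
gen u_s = rnormal()                        // industry-level random effect
expand 5
bysort sich: gen gvkey = sich*100 + _n
gen u_f = rnormal()                        // firm-level random effect
expand 6
bysort gvkey: gen fyear = 1990 + _n
gen ni = u_s + u_f + 0.1*(fyear - 1990) + rnormal()

* (a) linear time trend, centered on the mean year
summarize fyear, meanonly
gen double year_c = fyear - r(mean)
mixed ni c.year_c || sich: || gvkey:
estat icc

* (b) year indicators instead of a trend
mixed ni i.fyear || sich: || gvkey:
estat icc
```

    In either version, the industry-level ICC is the one reported for the sich level; comparing (a) and (b) shows whether the choice of time representation changes it much.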



    • #3
      Again, Mr. Schechter, thank you very much for the help! I now understand what the -mixed- command does and how to get the ICC at the industry level.

      In the previous question, I used the variable NI (net income) to measure each firm's performance level. As you mentioned, however, that variable is extremely noisy. Fortunately, there are other conventionally used performance variables, such as RoA (Return on Assets), that can serve as alternatives. I believe that with these alternative measures and the full data set there could be meaningful variation (in performance covariance) at the industry level.

      And here goes my code for that
      Code:
      levelsof sich, local(levels)
      foreach l of local levels {
        mixed roa || sich: || gvkey:
        estat icc
        local icc_'roa` 'r(icc2)'
      }
      Now, the remaining problems are:
      1) The code runs, yet I do not know how to store the icc2 values explicitly in a variable.
      2) I do not understand how to include "some representation of time" so as to create the time window I want (performance covariance over the past three years).

      Would it be possible to get some advice on these? I find it extremely encouraging to receive valuable comments from this community.

      Thank you!
      Last edited by Jason Lee; 03 Sep 2017, 00:53.



      • #4
        Well, there are several problems with your code.

        1. The loop will just do the exact same thing each time through. Even though the local macro l will iterate through the levels of sich, it is never referred to inside the loop, so nothing ever changes.

        2. I assume what you meant to do is restrict the analysis in the loop to a single industry each time through. This requires the use of an -if- clause. If you do that, then you are no longer working with a three-level model, because the top level has been reduced to a single industry.

        3. -local icc_'roa` is a syntax error: it should be icc_`roa' (the quotes in opposite order). But even that is not going to do anything useful for you because local macro roa itself is never defined.
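
        For illustration, here is a minimal sketch of how local macro quoting works. The macro name outcome and the value 0.25 are hypothetical placeholders, not anything from the original code.

```stata
* A local macro is dereferenced with a left quote (`) first and an apostrophe (') last.
local outcome roa                  // hypothetical macro holding a variable name
display "icc_`outcome'"            // shows icc_roa
local icc_`outcome' = 0.25         // defines a local named icc_roa (placeholder value)
display `icc_roa'                  // dereferences the macro that was just defined
```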

        So what I think you actually want is more like this:

        Code:
        levelsof sich, local(levels)
        gen icc = .
        foreach l of local levels {
            mixed roa if sich == `l' || gvkey:
            estat icc
            replace icc = `r(icc2)' if sich == `l'
        }
        Note that this will not run on your example data because in that example, sich 3844 has only a single observation, so -mixed- will fail there. I assume that in your real data every sich has enough observations to do a regression, even if you restrict to a 3 year window.
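
        If some industries do turn out to have too few observations, one possible workaround (a sketch, not part of the code above) is to wrap the estimation in -capture- so that a failure leaves icc missing for that industry rather than stopping the loop. The simulated data, the seed, and the iterate(50) cap below are assumptions for illustration only; industry 6 is deliberately given a single observation.

```stata
* Hypothetical simulated data: 5 industries with 8 firms x 5 years, plus one
* industry (sich == 6) that has a single observation
clear
set seed 2017
set obs 6
gen sich = _n
expand cond(sich == 6, 1, 8)
bysort sich: gen gvkey = sich*100 + _n
gen u_f = rnormal()                        // firm-level random effect
expand cond(sich == 6, 1, 5)
bysort gvkey: gen fyear = 1990 + _n
gen roa = u_f + rnormal()

gen icc = .
levelsof sich, local(levels)
foreach l of local levels {
    capture mixed roa if sich == `l' || gvkey:, iterate(50)
    if _rc != 0 {
        continue                           // estimation failed: icc stays missing
    }
    capture estat icc
    if _rc == 0 {
        replace icc = r(icc2) if sich == `l'
    }
}
```

        Afterwards, -tabulate sich if missing(icc)- lists the industries for which no ICC could be computed.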

        To do this with a three year rolling window, I think your best bet is to install Robert Picard's -rangerun- program from SSC. You will also need to install -rangestat- if you do not already have it. (Also from SSC, by Robert Picard, Nick Cox, & Roberto Ferrer.) The code would go something like this:

        Code:
        capture program drop jason_lee
        program define jason_lee
            mixed ni || gvkey: // OR USE roa OR OTHER OUTCOME
            estat icc
            gen icc = `r(icc2)'
            exit
        end
        
        rangerun jason_lee, interval(fyear -2 0) by(sich) use(ni gvkey fyear sich)
        As for time, in a three year window I think this is not as important, because the variables in question may be more stable over a three year period than they are over the full range of all years. And in any case, with only three years you don't have a lot of different ways to specify time here.

        Note that I have not been able to properly test this on your example data because in your data there is only one observation in each combination of sich and fyear, so it is not possible to get a meaningful mixed model estimation going. I assume that in your real data, you will not encounter this problem, and will have multiple gvkey's in each sich-fyear combination.






        • #5
          On thinking about it more, however, I'm not convinced that this is what you want. The code shown in #4 will get you information about the consistency with which individual firms perform over a three year window, and it will give you that separately for each industry. But it will not tell you anything about the extent to which different firms' performances in the same industry are correlated. For that, you cannot restrict the analysis to a single industry, you need the three level model, and you need the ICC at the sich level, not the gvkey level. So it would be:

          Code:
          capture program drop jason_lee
          program define jason_lee
              mixed ni || sich: || gvkey: // OR USE roa OR OTHER OUTCOME
              estat icc
              gen icc = `r(icc3)'
              exit
          end
          
          rangerun jason_lee, interval(fyear -2 0) use(ni gvkey fyear sich)



          • #6
            I am grateful for your reply, Mr. Schechter.

            -rangerun- is a command that would really help my analysis, since my dataset consists of 100,000 observations in total.
            Yet every time I execute the rangerun command, the program seems to freeze. The reason I was so late to reply is that
            Stata did not finish running the code even after two full days, at least on my computer.

            That being said, is it possible to write the code using -levelsof- instead?
            I want to find out whether my data set is simply too large for this analysis even with rangerun, or whether that specific command does not work on my computer.

            Thank you for your time!



            • #7
              See the first code in #4.

              No matter what you do, though, this is going to be very slow in a very large data set.

              Added: See Robert Picard's post below. This is almost certainly a better way to go.

              If, for whatever reason, you decide not to do that, the code in #4 is not the correct model. What I intended is that the mechanism of initializing an icc variable to missing values, and then -replace-ing it with the value of `r(icc3)' (not `r(icc2)') would work. So what I meant is the overall structure of the first block in #4 but the analysis of #5.
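
              A minimal sketch of that combination might look like the following, looping over years rather than industries so that each fit uses the three-year window ending in that year. The simulated data, the seed, and the iterate(10) cap are assumptions for illustration, and this loop has not been run on the real data.

```stata
* Hypothetical simulated panel: 15 industries x 4 firms x 6 years
clear
set seed 42
set obs 15
gen sich = _n
gen u_s = rnormal()                        // industry-level random effect
expand 4
bysort sich: gen gvkey = sich*100 + _n
gen u_f = rnormal()                        // firm-level random effect
expand 6
bysort gvkey: gen fyear = 1990 + _n
gen ni = u_s + u_f + rnormal()

gen icc = .
gen byte converged = .
levelsof fyear, local(years)
foreach y of local years {
    * three-level model on the window [y-2, y]; early years have shorter windows
    capture mixed ni if inrange(fyear, `y' - 2, `y') || sich: || gvkey:, iterate(10)
    if _rc == 0 {
        local conv = e(converged)
        capture estat icc
        if _rc == 0 {
            replace icc = r(icc3) if fyear == `y'
            replace converged = `conv' if fyear == `y'
        }
    }
}
```

              As with the rangerun version, this yields one ICC per year, shared by all observations in that year.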
              Last edited by Clyde Schechter; 05 Sep 2017, 09:52.



              • #8
                The first thing that can be done to speed up Clyde's code in #5 is to note that the results are the same for all observations with the same value for fyear. That's because rangerun runs the user's program for each observation in the dataset using a subset of the data defined by the current observation's interval bounds. There's no point in running the program more than once per value of fyear. You can control this by using an invalid interval (where the upper bound is less than the lower bound) for repeat observations within the same group.

                The second thing is to note that mixed optimizes using an iterative method and some cases may not converge easily or at all. You can observe this when used with the data example in #1 by using the verbose option in rangerun. You can speed up things by imposing a limit on the number of iterations.

                As with all rolling window problems, I would think that keeping track of the number of observations in each window is important. I also add a variable to confirm that the model converged.

                Code:
                capture program drop jason_lee
                program define jason_lee
                    mixed ni || sich: || gvkey: , iterate(10)
                    gen long nobs = e(N)
                    gen byte converged = e(converged)
                    estat icc
                    gen icc = `r(icc3)'
                    exit
                end
                
                bysort fyear (sich): gen high = cond(_n == 1, fyear, -1)
                rangerun jason_lee, interval(fyear -2 high) use(ni gvkey fyear sich) verbose



                • #9
                  Thank you so much! Now I clearly see that some of the cases do not converge, which led to the seemingly endless iterations.
                  The revised code solves the problem well, and the results now come up after some time.

                  Yet I found that the code runs only 25 times, computing the ICC for the whole sample for each fyear (25 years in total):
                  Code:
                  count if icc~=.
                    25
                  My aim, though, was to compute the ICC for each specific industry in each fyear (which would amount to the number of sic codes in a given year * 25 years).
                  Would it be possible to compute the ICC for each industry, for each year, using the three-year window? I think I am almost there, thanks to all the useful comments!



                  • #10
                    No, you can't. If you do the analysis separately for each industry, then there is no variation at the industry level in the regression and there is no ICC to compute that expresses within-firm coherence.



                    • #11
                      Thank you, Mr. Schechter! Now I see why measuring the ICC for each industry does not make sense. It seems that I need to find an alternative way to calculate the coherence of performance within each group.

                      Thank you again for all the useful comments!



                      • #12
                        It seems that I need to find an alternative way to calculate the coherence of performance within each group.
                        Perhaps the range or the standard deviation or the median absolute deviation or the interquartile range? Or if the performance indicator is substantially influenced by industry so that these are not really comparable, you could take one of these and form a ratio to the industry mean (or median).
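
                        A minimal sketch of these dispersion measures, computed within each industry-year using only built-in -egen- functions, could look like this. The simulated data, the seed, and all new variable names are hypothetical. For a three-year window, something along the lines of -rangestat- (from SSC, mentioned in #4) with interval(fyear -2 0) and by(sich) might serve instead, though that is not shown or tested here.

```stata
* Hypothetical simulated panel: 10 industries x 6 firms x 5 years
clear
set seed 7
set obs 10
gen sich = _n
expand 6
bysort sich: gen gvkey = sich*100 + _n
expand 5
bysort gvkey: gen fyear = 1990 + _n
gen ni = 10*sich + 5*rnormal()             // industry-specific performance level

* dispersion of performance within each industry-year
bysort sich fyear: egen mean_ni = mean(ni)
bysort sich fyear: egen sd_ni   = sd(ni)
bysort sich fyear: egen iqr_ni  = iqr(ni)
bysort sich fyear: egen mad_ni  = mad(ni)  // median absolute deviation from the median
bysort sich fyear: egen max_ni  = max(ni)
bysort sich fyear: egen min_ni  = min(ni)
gen range_ni = max_ni - min_ni

* ratio to the industry-year mean, for comparability across industries
gen rel_sd = sd_ni / abs(mean_ni) if mean_ni != 0
```

                        Lower values of sd_ni, iqr_ni, mad_ni, or rel_sd would indicate that firms in that industry-year performed more alike.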
