Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating standard deviation of income for every five years for each firm

    Hi,

    I have to compute standard deviation of next five years earnings (roa in data) of each firm. For example Firm A's standard deviation of ROA in year t will be for year t+1 to t+5. Following is the data sample.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(firmid year) float roa
    10006 1975   .044461
    10006 1976  .0468497
    10006 1977  .0455249
    10006 1978  .0445072
    10006 1979  .0470507
    10006 1980  .0403393
    10006 1981  .0383093
    10006 1982  .0275599
    10006 1983  .0005601
    10007 1989  .2495441
    10010 1986  .0703543
    10010 1987  .0953522
    10010 1988  .1139882
    10010 1989  .1082189
    10010 1990  .0898358
    10010 1991   .079795
    10010 1992 -.0218884
    10010 1993 -.1065698
    10010 1994  .0344516
    10012 1987  .2495441
    10012 1988  .2142017
    10012 1989  -.029544
    10012 1990  .0114192
    10012 1991 -.1401488
    10012 1992 -.0454312
    end
    Thank you.

  • #2
    Not sure what is referred to Firm-A here, I don't see any Firm-A in your data example. See below if this is what you wanted:

    Code:
    so firmid year
    byso firmid : gen n =_n
    gen five_y_sd =.
    
    
    levelsof firmid, loc(levels)
    foreach v of loc levels {
         su roa if n <= 5 & `v' == firmid
         replace five_y_sd = r(sd) if n <= 5 & `v' == firmid
    }
    
    li firmid year roa n five_y_sd in 1/19, noobs clean
    
     firmid   year         roa   n   five_y_sd
         10006   1975     .044461   1   .0012382  
         10006   1976    .0468497   2   .0012382  
         10006   1977    .0455249   3   .0012382  
         10006   1978    .0445072   4   .0012382  
         10006   1979    .0470507   5   .0012382  
         10006   1980    .0403393   6          .  
         10006   1981    .0383093   7          .  
         10006   1982    .0275599   8          .  
         10006   1983    .0005601   9          .  
         10007   1989    .2495441   1          .  
         10010   1986    .0703543   1   .0170879  
         10010   1987    .0953522   2   .0170879  
         10010   1988    .1139882   3   .0170879  
         10010   1989    .1082189   4   .0170879  
         10010   1990    .0898358   5   .0170879  
         10010   1991     .079795   6          .  
         10010   1992   -.0218884   7          .  
         10010   1993   -.1065698   8          .  
         10010   1994    .0344516   9          .
    If you don't want missing values in five_y_sd, change the condition in the last line to:

    Code:
    replace five_y_sd = r(sd) if `v' == firmid
    Last edited by Roman Mostazir; 09 Oct 2018, 06:50.
    Roman

    Comment


    • #3
      I've tried SSC rangestat here:

      Code:
      bysort firmid (year): generate low = year[1]
      bysort firmid (year): generate high = year[5]
      rangestat (sd) roa, interval(year low high) by(firmid)
      Code:
      . list, clean noobs
      
          firmid   year         roa    low   high      roa_sd  
           10006   1975     .044461   1975   1979   .00123819  
           10006   1976    .0468497   1975   1979   .00123819  
           10006   1977    .0455249   1975   1979   .00123819  
           10006   1978    .0445072   1975   1979   .00123819  
           10006   1979    .0470507   1975   1979   .00123819  
           10006   1980    .0403393   1975   1979   .00123819  
           10006   1981    .0383093   1975   1979   .00123819  
           10006   1982    .0275599   1975   1979   .00123819  
           10006   1983    .0005601   1975   1979   .00123819  
           10007   1989    .2495441   1989      .           .  
           10010   1986    .0703543   1986   1990   .01708788  
           10010   1987    .0953522   1986   1990   .01708788  
           10010   1988    .1139882   1986   1990   .01708788  
           10010   1989    .1082189   1986   1990   .01708788  
           10010   1990    .0898358   1986   1990   .01708788  
           10010   1991     .079795   1986   1990   .01708788  
           10010   1992   -.0218884   1986   1990   .01708788  
           10010   1993   -.1065698   1986   1990   .01708788  
           10010   1994    .0344516   1986   1990   .01708788  
           10012   1987    .2495441   1987   1991   .16593441  
           10012   1988    .2142017   1987   1991   .16593441  
           10012   1989    -.029544   1987   1991   .16593441  
           10012   1990    .0114192   1987   1991   .16593441  
           10012   1991   -.1401488   1987   1991   .16593441  
           10012   1992   -.0454312   1987   1991   .16593441

      Comment


      • #4
        dave Airey's helpful suggestion can be corrected and tweaked to

        Code:
         
         rangestat (sd) roa, interval(year 1 5) by(firmid)
        In Dave's code

        Code:
        year[1]
        is a typo for

        Code:
        year + 1
        and so forth. The main point is that there is no need for a loop. One command line suffices.

        Comment


        • #5
          Thank you. I'm enjoying trying to learn rangestat! Can you comment on the output of your command when I run it vs the other two above?

          Code:
          . list, clean noobs
          
              firmid   year         roa      roa_sd  
               10006   1975     .044461   .00272769  
               10006   1976    .0468497   .00367525  
               10006   1977    .0455249   .00752926  
               10006   1978    .0445072   .01827851  
               10006   1979    .0470507   .01830155  
               10006   1980    .0403393   .01944883  
               10006   1981    .0383093   .01909174  
               10006   1982    .0275599           .  
               10006   1983    .0005601           .  
               10007   1989    .2495441           .  
               10010   1986    .0703543   .01381626  
               10010   1987    .0953522   .05533744  
               10010   1988    .1139882   .09154467  
               10010   1989    .1082189   .08111265  
               10010   1990    .0898358    .0802909  
               10010   1991     .079795   .07098376  
               10010   1992   -.0218884   .09971719  
               10010   1993   -.1065698           .  
               10010   1994    .0344516           .  
               10012   1987    .2495441   .13092866  
               10012   1988    .2142017   .06412232  
               10012   1989    -.029544   .07656833  
               10012   1990    .0114192   .06697546  
               10012   1991   -.1401488           .  
               10012   1992   -.0454312           .

          Comment


          • #6
            Your code uses fixed years where moving years are needed.

            Comment


            • #7
              I see. I reread the post and noted the SD is for t+1 to t+5 for a given t. So the SD for 1975 is for years 1976,1977,1978,1979,1980, and the SD for 1976 is for 1977,1978,1979,1980,1981 etc.

              Thanks for the clarification and correction and for rangestat. Very nice contribution.

              Comment


              • #8
                Quite so. Thanks. Almost all of the credit belongs to Robert Picard as first author.

                Comment


                • #9
                  Thank you so much for your help.

                  Comment


                  • #10
                    One more thing, I want SD for at least 5 years, means if a company has 20 years data (1995-2015) it shows SD for 1995 till 2010. How this condition can be added to the above code?

                    Nick and Dave, can you please help me on this?

                    Comment


                    • #11
                      I don't understand what that means. The code above calculates SDs for windows of at most 5 years. If you want something else, show what code you are using instead and give an example showing what is wrong.

                      Alternatively, see the help for how to get the count as well and then ignore values based on fewer than 5 years.

                      Comment


                      • #12
                        Dear Nick

                        After using this code
                        Code:
                         rangestat (sd) roa, interval(year 1 5) by(firmid)
                        I get the following result.

                        Code:
                        * Example generated by -dataex-. To install: ssc install    dataex
                        clear
                        input int(firmid year) float roa double roa_sd
                        10006 1975   .044461 .002727692098149268
                        10006 1976  .0468497 .003675253857833364
                        10006 1977  .0455249 .007529261329403107
                        10006 1978  .0445072  .01827851389877486
                        10006 1979  .0470507  .01830154664930318
                        10006 1980  .0403393  .01944882503111198
                        10006 1981  .0383093  .01909174212367013
                        10006 1982  .0275599                   .
                        10006 1983  .0005601                   .
                        10007 1989  .2495441                   .
                        10010 1986  .0703543 .013816260580260344
                        10010 1987  .0953522  .05533743863453224
                        10010 1988  .1139882  .09154466547308558
                        10010 1989  .1082189  .08111264657163188
                        10010 1990  .0898358  .08029090136289406
                        10010 1991   .079795  .07098376459096417
                        10010 1992 -.0218884  .09971718608620192
                        10010 1993 -.1065698                   .
                        10010 1994  .0344516                   .
                        10012 1987  .2495441  .13092866381877577
                        end
                        This is absolutely what I needed but what shall I do if SD is calculated using data for less than 5 years as we see for firmid 10006 roa_sd of 1979, 1980 and 1981 are calculated using less than 5 years ahead data of roa. I need to calculate SD using at least 5 years ahead data and showing rest as missing.

                        Comment


                        • #13
                          Already explained in the last paragraph of #11:

                          Code:
                          rangestat (count) roa (sd) roa, interval(year 1 5) by(firmid)
                          replace roa_sd = . if roa_count < 5

                          Comment


                          • #14
                            Oh that's brilliant, I was thinking the count would be total count so how can it be useful but that's also on rolling basis

                            Thank you so much.

                            Comment

                            Working...
                            X