Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapse gives a different result

    Hi,

    I have divided my panel data into 5 portfolios depending on the leverage change. I want to calculate the mean return per portfolio. My data is structured as following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long gvkey float time double ri float change5
     17404 408     6.2868 2
    100174 408  .67875385 3
     12383 408    20.2034 1
     28272 408 -3.0111372 4
     15444 408  10.393393 3
    100816 408  14.393223 2
    101538 408   8.974827 5
    100951 408  10.318255 3
    100650 408   7.108283 3
    102283 408  16.589928 5
    104643 408   4.458046 5
     18860 408 -.97833276 3
    101048 408  22.843945 1
    100171 408  25.716532 1
      4439 408  11.580002 2
     12368 408  16.007114 3
     11749 408   9.778547 1
    208224 408  10.131335 3
    101343 408  4.8086047 4
     19565 408  11.478603 2
      2411 408  1.5684724 3
     14620 408  34.344685 1
     63477 408  22.201073 5
    100095 408  12.224221 1
    102576 408  12.202585 5
    102569 408  15.309072 5
     16299 408   13.66235 2
    100131 408  4.9301624 4
    100913 408   4.228711 4
    100619 408  20.178556 1
    end
    format %tm time
    To calculate the mean return per portfolio, I have used the following code:
    Code:
    bysort change5 year month: egen mean_port = mean(ri)
    And to give me the mean returns per portfolio:
    Code:
    forvalues i=1/5 {
        sum mean_port if (change5 == `i')
       }
    In other post, someone told me to collapse the data, but when I do it, the mean returns differ. To collapse the data I have used:
    Code:
    collapse (mean) ri, by(change5 time year month)
    The resulting data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year byte month float(time change5) double mean_port
    1995  1 420 1  1.0816451685714283
    1995  2 421 1  3.6309387985714285
    1995  3 422 1  1.0072868880952377
    1995  4 423 1  5.0430555279047615
    1995  5 424 1  2.9058011104761903
    1995  6 425 1   2.851414689047618
    1995  7 426 1   7.513776112857145
    1995  8 427 1 -1.9839238423809529
    1995  9 428 1   6.297235610476192
    1995 10 429 1 -1.9108445761904764
    1995 11 430 1   2.716185952380952
    1995 12 431 1  2.0948071990476187
    1996  1 432 1  2.3413777818181822
    1996  2 433 1   2.690949502727273
    1996  3 434 1   4.891548433636363
    1996  4 435 1  1.2471700777272732
    1996  5 436 1   2.880885366818182
    1996  6 437 1  2.4656674173913045
    1996  7 438 1 -1.8710875078260867
    1996  8 439 1   6.912583163478258
    1996  9 440 1   .7871459691304344
    1996 10 441 1   5.783322285652174
    1996 11 442 1  1.7836946360869574
    1996 12 443 1  3.0109286543478255
    1997  7 450 1 -.43301152388888897
    1997  8 451 1  .33338073421052644
    1997  9 452 1   8.364553181578946
    1997 10 453 1 -.30252995263157895
    1997 11 454 1  1.1336856173684209
    1997 12 455 1 -1.2938973271052634
    end
    format %tm time
    My question is, do I have to collapse the data to obtain the mean returns or is the correct way to not collapse the data.

    Thanks


  • #2
    I have manually checked some portfolios and they seem to have the same mean returns, collapsed and not collapsed, so I do not understand why the mean returns differ when I use the following code:
    Code:
     forvalues i=1/5 {    
    sum mean_port if (change5 == `i')    
    }
    Last edited by Juan Gonzalex; 27 Jun 2022, 03:00.

    Comment


    • #3
      Juan:
      yout -forvalues- code mirrors:
      Code:
      bysort change5: egen wanted2=mean(ri)
      Try it and see in what it differs from your other codes.
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        Hi Carlo,

        Thank you for answering. When I compute the code you give me, I indeed get the same mean returns as with the -forvalues-. But when I collapse it and I compute a t-test, the t-test cannot be computed because the stdev is 0. The strange thing is that when I collapse the entire dataset,
        Code:
         
         collapse (mean) ri, by(change5 time year month)
        if I look for example at change5 == 1, year == 2000 and month == 1 in the collapsed dataset and in the normal dataset, the mean returns do not differ. But when I summarize the mean returns for each portfolio, they do differ.

        Comment


        • #5
          Juan:
          in your -collapse- code you're calculating the mean if -ri- -by- all the variables reported within brackets.
          Despite I cannot replicate the issue you're complaining about with your data excerpt, I'munder the impression that your code returns the very same value for each observation each panel os composed of, lke in the following code:
          Code:
          . bysort gvkey time change5 : egen mean_port = mean(ri)
          
          .  list in 1/10
               +---------------------------------------------------+
               | gvkey     time           ri   change5   mean_port |
               |---------------------------------------------------|
            1. |  2411   1994m1    1.5684724         3    1.568472 |
            2. |  4439   1994m1    11.580002         2       11.58 |
            3. | 11749   1994m1     9.778547         1    9.778547 |
            4. | 12368   1994m1    16.007114         3    16.00711 |
            5. | 12383   1994m1      20.2034         1     20.2034 |
               |---------------------------------------------------|
            6. | 14620   1994m1    34.344685         1    34.34468 |
            7. | 15444   1994m1    10.393393         3    10.39339 |
            8. | 16299   1994m1     13.66235         2    13.66235 |
            9. | 17404   1994m1       6.2868         2      6.2868 |
           10. | 18860   1994m1   -.97833276         3   -.9783328 |
               +---------------------------------------------------+
          
          .
          Kind regards,
          Carlo
          (Stata 19.0)

          Comment


          • #6
            When I compute the following code:
            Code:
            sum mean_port if change5 == 1
            In both the collapsed and in the not collapsed dataset, the means are different, although when looking at each individual portfolio, the returns are the same for both datasets.

            Comment


            • #7
              This
              Code:
              bysort change5 year month: egen mean_port = mean(ri)
              is by change5 year month

              but this
              Code:
               
               collapse (mean) ri, by(change5 time year month)
              also includes -time-.

              I can't replicate either because your data example doesn't include year, but trying putting the same variables in both statements.

              hth,
              Jeph


              Comment


              • #8
                Let me put an easier example:
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input long gvkey float time int year byte month double ri float(change5 mean_port)
                100744 421 1995 2    2.223146 1  3.630939
                 11217 421 1995 2    6.982219 1  3.630939
                102576 421 1995 2   1.7892718 1  3.630939
                  8020 421 1995 2   10.545576 1  3.630939
                 12383 421 1995 2  -4.6700835 1  3.630939
                100862 421 1995 2   15.906179 1  3.630939
                 11217 422 1995 3  -13.510084 1 1.0072869
                102283 422 1995 3   11.712492 1 1.0072869
                 15444 422 1995 3    6.452191 1 1.0072869
                100095 422 1995 3   12.685835 1 1.0072869
                101538 422 1995 3   -.8816004 1 1.0072869
                 16299 422 1995 3   -3.021711 1 1.0072869
                100060 422 1995 3    9.742522 1 1.0072869
                100744 422 1995 3   -9.421879 1 1.0072869
                100131 422 1995 3    9.857952 1 1.0072869
                100862 422 1995 3  -.79218745 1 1.0072869
                  2410 422 1995 3    11.65806 1 1.0072869
                206248 422 1995 3   -7.708663 1 1.0072869
                102576 422 1995 3   6.2134624 1 1.0072869
                102696 422 1995 3   -8.328384 1 1.0072869
                101310 422 1995 3   13.336002 1 1.0072869
                  8020 422 1995 3   -3.154838 1 1.0072869
                208224 422 1995 3   4.2889357 1 1.0072869
                  4439 422 1995 3   12.612331 1 1.0072869
                101048 422 1995 3  -16.001278 1 1.0072869
                100980 422 1995 3  -14.046944 1 1.0072869
                 12383 422 1995 3   -.5391896 1 1.0072869
                 12383 423 1995 4     8.60126 1  5.043056
                102283 423 1995 4   4.7450423 1  5.043056
                 11217 423 1995 4   11.380541 1  5.043056
                100862 423 1995 4   .98056793 1  5.043056
                  8020 423 1995 4    7.155502 1  5.043056
                  2410 423 1995 4   3.0635595 1  5.043056
                100131 423 1995 4   5.7053924 1  5.043056
                 15444 423 1995 4   2.8944254 1  5.043056
                100744 423 1995 4   23.725437 1  5.043056
                100060 423 1995 4  -5.9101877 1  5.043056
                102576 423 1995 4 -.121194124 1  5.043056
                208224 423 1995 4   1.2536168 1  5.043056
                101048 423 1995 4  13.8799305 1  5.043056
                  4439 423 1995 4    6.401515 1  5.043056
                101538 423 1995 4    4.154551 1  5.043056
                101310 423 1995 4    1.165235 1  5.043056
                 16299 423 1995 4   4.2894006 1  5.043056
                206248 423 1995 4   .12905598 1  5.043056
                100980 423 1995 4   10.086894 1  5.043056
                102696 423 1995 4    -3.21576 1  5.043056
                100095 423 1995 4   5.5393815 1  5.043056
                100744 424 1995 5  -2.5345922 1  2.905801
                  8020 424 1995 5     2.75079 1  2.905801
                101310 424 1995 5    3.702712 1  2.905801
                208224 424 1995 5   5.9866548 1  2.905801
                206248 424 1995 5   3.9435863 1  2.905801
                 16299 424 1995 5    2.766633 1  2.905801
                100980 424 1995 5  10.2055435 1  2.905801
                100060 424 1995 5   10.158777 1  2.905801
                  4439 424 1995 5    9.227503 1  2.905801
                100862 424 1995 5    4.736769 1  2.905801
                101538 424 1995 5   2.1997213 1  2.905801
                end
                format %tm time
                The average mean_port in this sample is 3.06. When I collapse this sample by:
                Code:
                collapse (mean) ri, by(change5 time year month)
                ren ri mean_port
                The average mean_port is 4.09. How is this possible?

                Comment


                • #9
                  Juan:
                  1) your first code is:
                  Code:
                  bysort change5 year: egen mean_port2 = mean(ri)
                  -month- is not included-

                  2) your second code (that includes -month-) is:
                  Code:
                  . list in 1/6
                  
                       +-----------------------------------------------------------------------------+
                       |  gvkey     time   year   month           ri   change5   mean_p~t   mean_p~2 |
                       |-----------------------------------------------------------------------------|
                    1. | 100744   1995m2   1995       2     2.223146         1   3.630939    3.60979 |
                    2. |  11217   1995m2   1995       2     6.982219         1   3.630939    3.60979 |
                    3. | 102576   1995m2   1995       2    1.7892718         1   3.630939    3.60979 |
                    4. |   8020   1995m2   1995       2    10.545576         1   3.630939    3.60979 |
                    5. |  12383   1995m2   1995       2   -4.6700835         1   3.630939    3.60979 |
                       |-----------------------------------------------------------------------------|
                    6. | 100862   1995m2   1995       2    15.906179         1   3.630939    3.60979 |
                       +-----------------------------------------------------------------------------+
                  
                  .
                  
                  Sum -ri- in 1/6 and calculates its mean (= 
                  5.4627181
                  )
                  Last edited by Carlo Lazzaro; 27 Jun 2022, 09:33.
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  Comment

                  Working...
                  X