
  • Long short portfolio return based on a factor rank for panel data

    Hi,

    I have monthly panel data on securities (id) and time (months). I have monthly returns for the securities, monthly returns for various factors (equity indices), and sensitivities (betas) to those factors.

    Now, for every month, I want to rank those betas (high to low) and create quintile portfolios (Q1-Q5), where Q1 holds the top 20% of the ranking and Q5 the bottom 20%. I then want to create a new monthly time series of Q1-Q5 returns for each factor, similar to the Fama-French methodology.

    Thanks,
    John.

  • #2
    There have been a number of posts related to similar issues. Please check them out.
    You can certainly use egen with percentile by month to do your ranks. Then, create a variable that makes quintiles from the full percentiles. Finally, you can do by month and quintile calculations to do the portfolios.
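    A minimal sketch of that recipe, with hypothetical variable names (beta for the sensitivity to be ranked, ret for the security return, month for the time index) standing in for whatever your data actually uses:

    Code:
    * rank betas within each month, cut the ranks into quintiles,
    * then compute the mean return of each month-quintile portfolio
    bysort month : egen rk = rank(beta)
    by month : egen n = count(beta)
    gen byte q = ceil(5 * rk / n)        // 1 = lowest betas, 5 = highest
    bysort month q : egen port_ret = mean(ret)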



    • #3
      Thanks, this is very helpful. I have over a million observations and am trying to figure out the most efficient way to do this (without do-loops and multiple steps).

      I am including a snapshot of my data for a particular id, ex_ret (returns) and factor returns (R3, dax, cac), and newdate (time index).

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float newdate int year byte month long id float ex_ret double(R3 dax cac)
      408 1994  1 28 -.0645    .03060191641549337 -.039365944906206485  .029462085706556174
      409 1994  2 28 -.0162  -.024160546504141345  -.03944063009483556  -.04119362067067478
      410 1994  3 28  .0803  -.043724768795500024   .01986067882021647   -.0695155872102281
      411 1994  4 28 -.0282   .011436857553822177   .05291335186652346   .04094288853153305
      412 1994  5 28  .0378   .011018487472685079  -.05266298007996517  -.06005032235367891
      413 1994  6 28  .1099   -.02737752161383289    -.048108285942567  -.05238192099589101
      414 1994  7 28 -.0238   .030993533215755376  .059891178765046904   .10780306868039857
      416 1994  9 28  .0237   -.02126590024915853  -.09087827914228253   -.0917414791496638
      417 1994 10 28  .0159    .01653603090596456   .02976512986206048  .014073357751685833
      418 1994 11 28  .0493  -.036488252803620225 -.011280971988241073  .037351314173836414
      419 1994 12 28  .0081   .015572097901300763  .028472947770302515  -.04798145172771462
      420 1995  1 28  .0373    .02191116548991423  -.04049691917705478 -.043892518233361155
      421 1995  2 28 -.0117    .04079570294049795   .04002928851662557 -.011641105391105522
      423 1995  4 28  .0377   .026145608073318893  .048554293947227434   .03245132085903357
      424 1995  5 28 -.0084   .036327509558550464  .037813625405518136  .018386890293646374
      425 1995  6 28 -.1104   .028917681007853302 -.003938494481806054 -.025864335480478284
      426 1995  7 28  .0154    .04015517343605435   .06469027270589711   .04050976687696095
      427 1995  8 28  .0101   .008876060874758007  .008820321443702373 -.019109648447315775
      428 1995  9 28 -.0337    .03874873321794037  -.02290567437039548  -.05045058445824502
      429 1995 10 28  .0305  -.008635545778235332  -.00874698222254744   .01433364228555023
      430 1995 11 28  -.018    .04435071596904483  .034558630201438234  .008369690651771844
      432 1996  1 28  .0184   .029024891958708388    .0959500949473795   .07989197049963659
      433 1996  2 28  .0065   .014751367690996275 .0013804885553045931  -.01492095127797688
      434 1996  3 28 -.0001   .010051696459511428  .004980695761152898  .027126257201445014
      435 1996  4 28 -.0456   .018961221023904518   .00779606335005445   .05096779100277615
      436 1996  5 28 -.0579   .025591966932002608  .014988524099391443 -.011889728736164562
      437 1996  6 28  .1171 -.0032286016737809176  .007310838445807599  .024325581225030923
      439 1996  8 28 -.0148   .030336297636648357  .028495764853336603 -.012673451558369409
      end
      format %tm newdate



      • #4
        Hi,

        I am not sure whether I am calculating the portfolio returns correctly. The number of observations is drastically lower. I want to run a regression using the new Q5-Q1 return, but it is missing for many ids (securities).

        My code for the Q5 (quintile 5) - Q1 (quintile 1) portfolio return:

        bysort newdate : egen R3_80 = pctile(R3), p(80)
        by newdate : R3_q5 = mean (ex_ret) if R3 > R3_80

        bysort newdate : egen R3_20 = pctile(R3), p(20)
        by newdate : R3_q1 = mean (ex_ret) if R3 < R3_20

        Q5_Q1_R3 = R3_q5 - R3_q1



        • #5
          This isn't legal code. It peters off into pseudocode. It will certainly fail at line 2.

          If you haven't tried it yet, you should try it first to find the elementary bugs in it.

          If you did try equivalent code first which Stata ran, please show us that exact code.

          Statalist is capricious. No one has to answer anything.

          Whether you get an answer depends simply on someone being able and willing to answer. You can increase the chance of someone answering by asking a good question and pushing as far as you possibly can.



          • #6
            Hi,

            Here's the code that I ran, but it is not giving me the desired result.

            . xtile q_R3 = R3, nq(5)

            . sort newdate

            . by newdate : egen R3_q5 = mean (ex_ret) if q_R3 == 5
            (530467 missing values generated)

            . by newdate : egen R3_q1 = mean (ex_ret) if q_R3 == 1
            (528077 missing values generated)

            I think the code is generating the quintile return for each date, but I need it populated for each date and for each id (not sure if this is making sense), because otherwise I cannot use the Q5-Q1 return with my panel data (id, ymdate).

            Any suggestions/help is much appreciated.

            Best,
            John



            • #7
              That makes sense. Each of the variables will have missing values in the other quintile bins unless you explicitly spread the means.

              Code:
              xtile q_R3 = R3, nq(5)
              bysort newdate : egen R3_q5 = mean(cond(q_R3 == 5, ex_ret, .) )
              by newdate : egen R3_q1 = mean(cond(q_R3 == 1, ex_ret, .))
              See e.g. Section 9 of http://www.stata-journal.com/sjpdf.h...iclenum=dm0055

              (Please use CODE delimiters for code, as requested)
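              One more line turns these into the Q5-Q1 spread from #4. Also note that -xtile- as used in #6 pools all months together; if the quintiles should be re-formed every month, as in the original question, the ranking has to happen within newdate. An untested sketch, with new variable names to avoid clashes:

              Code:
              * spread from the two means already computed
              gen Q5_Q1_R3 = R3_q5 - R3_q1

              * per-month variant: -xtile- is not byable, so rank within newdate instead
              bysort newdate : egen rkm = rank(R3)
              by newdate : egen nm = count(R3)
              gen byte q_R3m = ceil(5 * rkm / nm)
              by newdate : egen R3_q5m = mean(cond(q_R3m == 5, ex_ret, .))
              by newdate : egen R3_q1m = mean(cond(q_R3m == 1, ex_ret, .))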



              • #8
                Hi Nick,

                Thank you. This makes a lot of sense. Now the # of missing values is much smaller. Thanks again for all the help.

                Another follow-up question: how do I test whether the difference between R3_q5 and R3_q1 is statistically significant?

                Best,
                John.



                • #9
                  John,

                  I am currently working on a project in which I also sort assets into portfolios based on a characteristic. I have therefore implemented a new Stata program called -xtpsort-, which implements the Fama and French (1993) type methodology. The program is not published yet, but maybe it is useful for you (see attached). Comments and feedback are appreciated.

                  Best,
                  Daniel

                  Attached Files



                  • #10
                    How do I test that the difference between R3_q5 and R3_q1 is statistically significant
                    While there are some calculations you could do in Stata that would create the appearance of answering this question, the proposed test is in fact meaningless, and you should not attempt it.

                    You have defined R3_q5 and R3_q1 to be the means of the 1st and 5th quintiles of the same variable R3. Therefore they are, by definition, different. It is not meaningful to pose a null hypothesis that they are equal: they cannot possibly be equal, by definition. This goes even beyond the frequently seen "straw man" null hypothesis. It's an "Escher drawing" null hypothesis: an illusion, an impossibility. If you were to actually perform the calculations and conclude that the difference is not "statistically significant", the only valid interpretation of that finding would be that it is a Type II error.



                    • #11
                      Hi Daniel,

                      Thanks. This is very interesting and helpful.

                      I am curious whether it is possible to generate dependent sorts (i.e. sorts on sorts), e.g. creating market-cap buckets (quintiles 1 to 5) and then, within these buckets, analyzing R3 quintiles (e.g. R3_q5, R3_q1, and R3_q5_minus_q1).

                      It would be helpful if you have any examples showing the different options, groupings, etc.

                      Best,
                      John.



                      • #12
                        Hi Clyde,

                        Thanks for your comments. I share some of the same concerns that you have in terms of validity of testing the difference in q5 and q1 returns.

                        The only case I can possibly think of is this: if, for any reason, the difference is not statistically significant, then I have a problem at hand. It could be that R3 does not have enough cross-sectional dispersion period by period to make the q1 and q5 returns distinct from each other (though I agree that this is a remote possibility).

                        Also since I am partly replicating an academic study, I am trying to see if I get similar statistical significance levels (a validation/confirmation of previous results).

                        Any help/suggestion is much appreciated.

                        Best,
                        John



                        • #13
                          I don't think #12 addresses Clyde's point at all, despite the opener.

                          The point is that to talk about statistical significance at all, you need to identify at least one test that makes sense here. Statistical significance can't mean just substantial dispersion, however defined.

                          What tests in previous studies are you trying to replicate?



                          • #14
                            Hi John,

                            The -xtpsort- program does not construct the dummy variables identifying the portfolios for you; it assumes the portfolio dummies have been created beforehand. However, once you have the dummies, you can compare any portfolios, so yes, the program can be used for comparisons of dependent sorts. When doing this, you have to make sure that the if-condition when calling the program excludes all observations from portfolios other than the two you want to compare.
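                            For what it's worth, one hypothetical way to build such dummies for a dependent sort (the variable mktcap is assumed here and does not appear in the thread's data):

                            Code:
                            * first stage: market-cap quintiles within each month
                            bysort newdate : egen rk1 = rank(mktcap)
                            by newdate : egen n1 = count(mktcap)
                            gen byte size_q = ceil(5 * rk1 / n1)

                            * second stage: R3 quintiles within month x size bucket
                            bysort newdate size_q : egen rk2 = rank(R3)
                            by newdate size_q : egen n2 = count(R3)
                            gen byte R3_q = ceil(5 * rk2 / n2)

                            * dummies for the two portfolios to compare, e.g. within size_q == 1
                            gen byte d_q5 = R3_q == 5
                            gen byte d_q1 = R3_q == 1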

                            Best,
                            Daniel





                            • #15
                              Also since I am partly replicating an academic study, I am trying to see if I get similar statistical significance levels (a validation/confirmation of previous results).
                              But that wouldn't necessarily be a validation or confirmation of previous results, nor would getting different statistical significance results necessarily be a disconfirmation. p-values depend on too many things besides the actual effects being studied. Confirmation or validation of a study consists of demonstrating that you get similar effects.

                              And the fact that somebody did something ridiculous in a study and it didn't get caught in peer review isn't a reason to replicate it.

                              if by any reason, the difference is not statistically significant then I have a problem at hand. It could be that R3 does not have enough cross-sectional dispersion period by period to make the q1 and q5 returns distinct from each other
                              But as I said in #10, they are by definition different from each other. Whether a putative "statistical significance" test found them to be so or not would depend entirely on your sample size, the variance of R3, measurement error in R3, and nothing else. If your concern is that R3 has too little variance (whatever that means in your context), then just calculate the variance of R3 and appraise it. At least that will actually answer your real concern, and it won't be confounded with extraneous issues.

                              Added: Crossed with #13.
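                              For concreteness, appraising the cross-sectional dispersion of R3 directly might look like this (a sketch, not from the thread):

                              Code:
                              * cross-sectional standard deviation of R3 within each month
                              bysort newdate : egen R3_sd = sd(R3)
                              * summarize that dispersion once per month, not once per observation
                              egen byte first = tag(newdate)
                              summarize R3_sd if first, detail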
