
  • Long short portfolio return based on a factor rank for panel data

    Hi,

    I have monthly panel data on securities (id) and time (months). I have monthly returns for the securities, monthly returns for various factors (equity indices), and sensitivities (betas) to those factors.

    Now, for every month, I want to rank those betas (high to low) and create quintile portfolios (Q1-Q5), where Q1 holds the top 20% of the ranking and Q5 the bottom 20%. I then want to create a new monthly time series of Q1-Q5 returns for each factor, similar to the Fama-French methodology.

    Thanks,
    John.

  • #2
    There have been a number of posts related to similar issues. Please check them out.
    You can certainly use egen with percentile by month to do your ranks. Then, create a variable that makes quintiles from the full percentiles. Finally, you can do by month and quintile calculations to do the portfolios.
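    A minimal sketch of that recipe, with hypothetical variable names (beta for the sensitivity to be ranked, ret for the security return, month for the time index) standing in for whatever your data actually uses:

    Code:
    * rank betas within each month, cut the ranks into quintiles,
    * then compute the mean return of each month-quintile portfolio
    bysort month : egen rk = rank(beta)
    by month : egen n = count(beta)
    gen byte q = ceil(5 * rk / n)        // 1 = lowest betas, 5 = highest
    bysort month q : egen port_ret = mean(ret)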



    • #3
      Thanks, this is very helpful. I have over a million observations and am trying to figure out the most efficient way to do this (without do-loops and multiple steps).

      I am including a snapshot of my data for a particular id, ex_ret (returns) and factor returns (R3, dax, cac), and newdate (time index).

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float newdate int year byte month long id float ex_ret double(R3 dax cac)
      408 1994  1 28 -.0645    .03060191641549337 -.039365944906206485  .029462085706556174
      409 1994  2 28 -.0162  -.024160546504141345  -.03944063009483556  -.04119362067067478
      410 1994  3 28  .0803  -.043724768795500024   .01986067882021647   -.0695155872102281
      411 1994  4 28 -.0282   .011436857553822177   .05291335186652346   .04094288853153305
      412 1994  5 28  .0378   .011018487472685079  -.05266298007996517  -.06005032235367891
      413 1994  6 28  .1099   -.02737752161383289    -.048108285942567  -.05238192099589101
      414 1994  7 28 -.0238   .030993533215755376  .059891178765046904   .10780306868039857
      416 1994  9 28  .0237   -.02126590024915853  -.09087827914228253   -.0917414791496638
      417 1994 10 28  .0159    .01653603090596456   .02976512986206048  .014073357751685833
      418 1994 11 28  .0493  -.036488252803620225 -.011280971988241073  .037351314173836414
      419 1994 12 28  .0081   .015572097901300763  .028472947770302515  -.04798145172771462
      420 1995  1 28  .0373    .02191116548991423  -.04049691917705478 -.043892518233361155
      421 1995  2 28 -.0117    .04079570294049795   .04002928851662557 -.011641105391105522
      423 1995  4 28  .0377   .026145608073318893  .048554293947227434   .03245132085903357
      424 1995  5 28 -.0084   .036327509558550464  .037813625405518136  .018386890293646374
      425 1995  6 28 -.1104   .028917681007853302 -.003938494481806054 -.025864335480478284
      426 1995  7 28  .0154    .04015517343605435   .06469027270589711   .04050976687696095
      427 1995  8 28  .0101   .008876060874758007  .008820321443702373 -.019109648447315775
      428 1995  9 28 -.0337    .03874873321794037  -.02290567437039548  -.05045058445824502
      429 1995 10 28  .0305  -.008635545778235332  -.00874698222254744   .01433364228555023
      430 1995 11 28  -.018    .04435071596904483  .034558630201438234  .008369690651771844
      432 1996  1 28  .0184   .029024891958708388    .0959500949473795   .07989197049963659
      433 1996  2 28  .0065   .014751367690996275 .0013804885553045931  -.01492095127797688
      434 1996  3 28 -.0001   .010051696459511428  .004980695761152898  .027126257201445014
      435 1996  4 28 -.0456   .018961221023904518   .00779606335005445   .05096779100277615
      436 1996  5 28 -.0579   .025591966932002608  .014988524099391443 -.011889728736164562
      437 1996  6 28  .1171 -.0032286016737809176  .007310838445807599  .024325581225030923
      439 1996  8 28 -.0148   .030336297636648357  .028495764853336603 -.012673451558369409
      end
      format %tm newdate



      • #4
        Hi,

        I am not sure whether I am calculating the portfolio returns correctly. The number of observations is drastically lower. I want to run a regression using the new Q5-Q1 return, but it is missing for many ids (securities).

        My code for the Q5 (quintile 5) - Q1 (quintile 1) portfolio return:

        bysort newdate : egen R3_80 = pctile(R3), p(80)
        by newdate : R3_q5 = mean (ex_ret) if R3 > R3_80

        bysort newdate : egen R3_20 = pctile(R3), p(20)
        by newdate : R3_q1 = mean (ex_ret) if R3 < R3_20

        Q5_Q1_R3 = R3_q5 - R3_q1



        • #5
          This isn't legal code. It peters off into pseudocode. It will certainly fail at line 2.

          If you haven't tried it yet, you should try it first to find the elementary bugs in it.

          If you did try equivalent code first which Stata ran, please show us that exact code.

          Statalist is capricious. No one has to answer anything.

          Whether you get an answer depends simply on someone being able and willing to answer. You can increase the chance of someone answering by asking a good question and pushing as far as you possibly can.



          • #6
            Hi,

            Here's the code that I ran, but it is not giving me the desired result.

            . xtile q_R3 = R3, nq(5)

            . sort newdate

            . by newdate : egen R3_q5 = mean (ex_ret) if q_R3 == 5
            (530467 missing values generated)

            . by newdate : egen R3_q1 = mean (ex_ret) if q_R3 == 1
            (528077 missing values generated)

            I think the code is generating the quintile return for each date, but I need it populated for each date and for each id (not sure if this is making sense), because otherwise I cannot use the Q5-Q1 return with my panel data (id, ymdate).

            Any suggestions/help is much appreciated.

            Best,
            John



            • #7
              That makes sense. Each of the variables will have missing values in the other quintile bins unless you explicitly spread the means.

              Code:
              xtile q_R3 = R3, nq(5)
              bysort newdate : egen R3_q5 = mean(cond(q_R3 == 5, ex_ret, .) )
              by newdate : egen R3_q1 = mean(cond(q_R3 == 1, ex_ret, .))
              See e.g. Section 9 of http://www.stata-journal.com/sjpdf.h...iclenum=dm0055

              (Please use CODE delimiters for code, as requested)
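              One more line turns these into the Q5-Q1 spread from #4. Also note that -xtile- as used in #6 pools all months together; if the quintiles should be re-formed every month, as in the original question, the ranking has to happen within newdate. An untested sketch, with new variable names to avoid clashes:

              Code:
              * spread from the two means already computed
              gen Q5_Q1_R3 = R3_q5 - R3_q1

              * per-month variant: -xtile- is not byable, so rank within newdate instead
              bysort newdate : egen rkm = rank(R3)
              by newdate : egen nm = count(R3)
              gen byte q_R3m = ceil(5 * rkm / nm)
              by newdate : egen R3_q5m = mean(cond(q_R3m == 5, ex_ret, .))
              by newdate : egen R3_q1m = mean(cond(q_R3m == 1, ex_ret, .))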



              • #8
                Hi Nick,

                Thank you. This makes a lot of sense. Now the # of missing values is much smaller. Thanks again for all the help.

                Another follow-up question: how do I test whether the difference between R3_q5 and R3_q1 is statistically significant?

                Best,
                John.



                • #9
                  John,

                  I am currently working on a project in which I also sort assets into portfolios based on a characteristic. I have therefore implemented a new Stata program called -xtpsort-, which implements the Fama and French (1993) type methodology. The program is not published yet, but maybe it is useful for you (see attached). Comments and feedback are appreciated.

                  Best,
                  Daniel

                  Attached Files



                  • #10
                    How do I test that the difference between R3_q5 and R3_q1 is statistically significant
                    While there are some calculations you could do in Stata that would create the appearance of answering this question, the proposed test is in fact meaningless, and you should not attempt it.

                    You have defined R3_q5 and R3_q1 to be the means of the 1st and 5th quintiles of the same variable R3. Therefore they are, by definition, different. It is not meaningful to pose a null hypothesis that they are equal: they cannot possibly be equal, by definition. This goes even beyond the frequently seen "straw man" null hypothesis. It's an "Escher drawing" null hypothesis: an illusion, an impossibility. If you were to actually perform the calculations and conclude that the difference is not "statistically significant", the only valid interpretation of that finding would be that it is a Type II error.



                    • #11
                      Hi Daniel,

                      Thanks. This is very interesting and helpful.

                      I am curious whether it is possible to generate dependent sorts (i.e. sorts on sorts), e.g. creating market-cap buckets (quintiles 1 to 5) and then, within these buckets, analyzing R3 quintiles (e.g. R3_q5, R3_q1, and R3_q5_minus_q1).

                      It would be helpful if you have any examples showing the different options, groupings, etc.

                      Best,
                      John.



                      • #12
                        Hi Clyde,

                        Thanks for your comments. I share some of the same concerns that you have in terms of validity of testing the difference in q5 and q1 returns.

                        The only case I can possibly think of is this: if, for any reason, the difference is not statistically significant, then I have a problem at hand. It could be that R3 does not have enough cross-sectional dispersion period by period to make the q1 and q5 returns distinct from each other (though I agree that this is a remote possibility).

                        Also since I am partly replicating an academic study, I am trying to see if I get similar statistical significance levels (a validation/confirmation of previous results).

                        Any help/suggestion is much appreciated.

                        Best,
                        John



                        • #13
                          I don't think #12 addresses Clyde's point at all, despite the opener.

                          The point is that to talk about statistical significance at all, you need to identify at least one test that makes sense here. Statistical significance can't mean just substantial dispersion, however defined.

                          What tests in previous studies are you trying to replicate?



                          • #14
                            Hi John,

                            The -xtpsort- program does not construct the dummy variables identifying the portfolios for you; it assumes the portfolio dummies have been created beforehand. However, once you have the dummies, you can compare any portfolios, so yes, the program can be used for comparisons of dependent sorts. When doing this, you have to make sure that the if-condition when calling the program excludes all observations from portfolios other than the two you want to compare.
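                            For what it's worth, one hypothetical way to build such dummies for a dependent sort (the variable mktcap is assumed here and does not appear in the thread's data):

                            Code:
                            * first stage: market-cap quintiles within each month
                            bysort newdate : egen rk1 = rank(mktcap)
                            by newdate : egen n1 = count(mktcap)
                            gen byte size_q = ceil(5 * rk1 / n1)

                            * second stage: R3 quintiles within month x size bucket
                            bysort newdate size_q : egen rk2 = rank(R3)
                            by newdate size_q : egen n2 = count(R3)
                            gen byte R3_q = ceil(5 * rk2 / n2)

                            * dummies for the two portfolios to compare, e.g. within size_q == 1
                            gen byte d_q5 = R3_q == 5
                            gen byte d_q1 = R3_q == 1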

                            Best,
                            Daniel





                            • #15
                              Also since I am partly replicating an academic study, I am trying to see if I get similar statistical significance levels (a validation/confirmation of previous results).
                              But that wouldn't necessarily be a validation or confirmation of previous results, nor would getting different statistical significance results necessarily be a disconfirmation. p-values depend on too many things besides the actual effects being studied. Confirmation or validation of a study consists of demonstrating that you get similar effects.

                              And the fact that somebody did something ridiculous in a study and it didn't get caught in peer review isn't a reason to replicate it.

                              if by any reason, the difference is not statistically significant then I have a problem at hand. It could be that R3 does not have enough cross-sectional dispersion period by period to make the q1 and q5 returns distinct from each other
                              But as I said in #10, they are by definition different from each other. Whether a putative "statistical significance" test found them to be so or not would depend entirely on your sample size, the variance of R3, measurement error in R3, and nothing else. If your concern is that R3 has too little variance (whatever that means in your context), then just calculate the variance of R3 and appraise it. At least that will actually answer your real concern, and it won't be confounded with extraneous issues.

                              Added: Crossed with #13.
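                              For concreteness, appraising the cross-sectional dispersion of R3 directly might look like this (a sketch, not from the thread):

                              Code:
                              * cross-sectional standard deviation of R3 within each month
                              bysort newdate : egen R3_sd = sd(R3)
                              * summarize that dispersion once per month, not once per observation
                              egen byte first = tag(newdate)
                              summarize R3_sd if first, detail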
