Long-short portfolio return based on a factor rank for panel data

Daniel Hoechle

#16

16 Oct 2017, 13:30

Clyde and Nick,

Here is a (slightly simplified) note on the methodology, which is a standard technique in empirical finance.

People often sort common stocks into, say, quintiles based on a characteristic such as the book-to-market ratio or firm size. They then compute the monthly return (i.e., the relative change in value) of each of the five portfolios (or "groups of stocks") and compare the (risk-adjusted) mean return of the top-quintile portfolio with that of the bottom-quintile portfolio. The goal of this procedure is to discover a firm characteristic that picks stocks with, on average, better returns than the overall stock market.
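A minimal Stata sketch of the sort-and-compare procedure described above, under stated assumptions: the panel and the names stock, month, ret and bm are hypothetical and simulated here; quintiles are assigned by rank within each month (ties broken arbitrarily, unlike egenmore's xtile()); portfolios are equal-weighted; and no risk adjustment is done. The last step tests whether the time-series mean of the monthly Q5 - Q1 return is zero.

```stata
clear
set seed 2017
// HYPOTHETICAL DATA: 200 stocks x 60 months of simulated returns
set obs 200
gen stock = _n
expand 60
bysort stock: gen month = _n
// sorting characteristic (stands in for, e.g., book-to-market)
gen bm = rnormal()
// monthly return, simulated to rise slightly with bm
gen ret = 0.005 + 0.002 * bm + rnormal(0, 0.05)

// QUINTILES OF bm WITHIN EACH MONTH (rank-based; ties broken arbitrarily)
bysort month (bm): gen byte q = ceil(5 * _n / _N)

// EQUAL-WEIGHTED MEAN RETURN OF EACH QUINTILE PORTFOLIO IN EACH MONTH
collapse (mean) ret, by(month q)
reshape wide ret, i(month) j(q)

// LONG-SHORT (Q5 - Q1) RETURN; TEST THAT ITS TIME-SERIES MEAN IS ZERO
gen ls = ret5 - ret1
ttest ls == 0
```

With real data, drop the simulation block and substitute your own variable names.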

Best,
Daniel
john Abe

#17

16 Oct 2017, 13:43

Hi Nick, Clyde

I understand your viewpoint and concerns.

Testing the statistical significance of Q5-Q1 factor returns is a standard procedure in the finance literature (e.g., studies on factor models such as Fama-French, Carhart, Sharpe, etc.).

Best,
John.
Clyde Schechter

#18

16 Oct 2017, 13:43

Daniel,

This sounds different from what John Abe was asking for. I understood him to be looking at testing the means of the top and bottom quintiles of the returns distribution for a significant difference, which is not a meaningful use of statistical testing: the top quintile of any variable has a higher mean than its bottom quintile by construction, so the test cannot tell you anything. You are describing testing the mean returns in the top and bottom quintiles of the distribution of some other variable. That would be meaningful. Perhaps John will clarify what he actually means.

Added: Crossed with #17.
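To illustrate the distinction drawn here, a small Stata simulation (hypothetical data; ret and factor are independent standard normal noise). Splitting ret by its own quintiles makes Q5 and Q1 differ by construction, so the t-test comes out overwhelmingly "significant" even though there is no effect at all; splitting ret by quintiles of a separate variable gives a test that actually means something.

```stata
clear
set seed 12345
set obs 1000
// HYPOTHETICAL DATA: returns and a factor that are independent pure noise
gen ret = rnormal()
gen factor = rnormal()

// (a) Quintiles of the outcome itself: Q5 and Q1 differ by construction
xtile q_ret = ret, nq(5)
ttest ret if inlist(q_ret, 1, 5), by(q_ret)
local p_outcome = r(p)

// (b) Quintiles of a separate variable: a meaningful comparison
xtile q_factor = factor, nq(5)
ttest ret if inlist(q_factor, 1, 5), by(q_factor)
local p_factor = r(p)

display "p-value, sorted on ret itself: " `p_outcome'
display "p-value, sorted on factor:     " `p_factor'
```

Because factor is pure noise here, the second p-value should usually be unremarkable, while the first is essentially zero every time.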
john Abe

#19

16 Oct 2017, 14:50

Thanks, Daniel, for your comments.

I meant the "risk-adjusted" factor return (after accounting for other firm characteristics), and then testing whether the residual Q1 and Q5 quintile returns are significantly different. If the difference is not statistically significant, then the factor is not useful in identifying securities that, on average, have a better return profile.

Best,
John.
Clyde Schechter

#20

16 Oct 2017, 15:21

But John, you have not addressed the question I posed in #18. What are Q1 and Q5 the quintiles of? Are they quintiles of the risk adjusted factor return itself, or of some other variable?
john Abe

#21

16 Oct 2017, 15:29

Hi Clyde,

The Q1 and Q5 returns are of the residual after taking out the effects of the market and other firm characteristics. In this case the distributions of the original factor and of the residual are different, so it is technically some other variable.
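One possible reading of "the residual after taking out the effects of the market and other firm characteristics", sketched in Stata: at each date, regress the return cross-sectionally on the control characteristics and keep the residual. The variable names (ret, mkt_beta, size) and the simulated panel are hypothetical placeholders, and the adjustment actually intended may differ (for example, a time-series factor model instead).

```stata
clear
set seed 2017
// HYPOTHETICAL PANEL: 100 assets x 12 dates; ret, mkt_beta, size are made-up names
set obs 100
gen asset = _n
expand 12
bysort asset: gen date = _n
gen mkt_beta = rnormal(1, 0.3)
gen size = rnormal()
gen ret = 0.01 * mkt_beta + 0.005 * size + rnormal(0, 0.05)

// CROSS-SECTIONAL REGRESSION AT EACH DATE; KEEP THE RESIDUAL
gen double residual = .
levelsof date, local(dates)
foreach d of local dates {
    quietly regress ret mkt_beta size if date == `d'
    quietly predict double r_tmp if e(sample), residuals
    quietly replace residual = r_tmp if date == `d'
    drop r_tmp
}
```

The resulting residual can then be compared across quintiles of a different variable, as discussed in this thread.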

Best,
John.
Clyde Schechter

#22

16 Oct 2017, 16:00

OK, I don't understand the financial jargon here, so I can't figure out which variables in your example data are which. I'll put it abstractly, and you can plug in the actual variable names. Let's call the residual after taking out the effects... variable residual, and let's call the other thing factor. It's not just "technically" another variable; it has to actually be another variable for this to give you something other than garbage. Finally, let's call your date variable date. Substitute the actual variable names in this code.

Code:

egen quintile = xtile(factor), by(date) nq(5)
levelsof date, local(dates)
foreach d of local dates {
    display "Date: " %td `d'
    ttest residual if inlist(quintile, 1, 5) & date == `d', by(quintile)
}

Notes:
1. Requires the -egenmore- package so that you can use the -xtile()- -egen- function.
2. I assume you want to test the equality of the mean residual in the 1st and 5th quintiles of factor, separately at each date. That is what this code does.
3. The mean residuals in the 1st and 5th quintiles will be part of the output of the -ttest- command. They will not be saved in the data set.
4. If you need those means saved in the data, I would use a slightly different approach; post back if that's what you want.
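If the means do need to be kept, one built-in option (a sketch under stated assumptions) is statsby, which reruns the same ttest for each date and replaces the data with one row per date holding the Q1 and Q5 mean residuals and the test results. The simulated data are hypothetical, and quintiles are assigned by rank within date rather than with egenmore's xtile(). Note that ttest's r(t) refers to the difference Q1 minus Q5.

```stata
clear
set seed 1234
// HYPOTHETICAL DATA with the variable names used above: date, residual, factor
set obs 100
gen asset = _n
expand 10
bysort asset: gen date = _n
gen residual = rnormal()
gen factor = rgamma(10, 4)

// QUINTILES OF factor WITHIN EACH date (rank-based; ties broken arbitrarily)
bysort date (factor): gen byte quintile = ceil(5 * _n / _N)

// ONE ROW PER date: Q1 and Q5 mean residuals plus the t-test results
statsby mean_q1=(r(mu_1)) mean_q5=(r(mu_2)) t_stat=(r(t)) p_value=(r(p)), ///
    by(date) clear: ttest residual if inlist(quintile, 1, 5), by(quintile)
```

Because statsby replaces the data in memory, the results can be merged back onto the asset-level data by date if needed.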
john Abe

#23

16 Oct 2017, 16:13

Hi Clyde,

Thank you. I think it would be important to save the 1st and 5th quintile residual means if possible.

Best,
John.
Clyde Schechter

#24

16 Oct 2017, 16:38

So the following code creates a toy data set with the relevant variables (date, residual, factor) and then does the calculations. At the end the dataset will contain the following additional variables:

quintile: the quintile of factor into which this asset falls on this date

quintile_mean_residual: the mean value of residual in this asset's quintile of factor on this date. You may not care about its values for quintiles other than 1 and 5; feel free to disregard the results for quintiles 2-4.

t_stat: the t-statistic for a test of equality of quintile_mean_residual in quintiles 1 and 5 on this date

p_value: the p-value of that test

Code:

clear*
capture program drop myprogram
program define myprogram
    // fitted values = mean of residual within each quintile on this date
    regress residual i.quintile
    predict quintile_mean_residual
    // test equality of the quintile 1 and quintile 5 means
    test 5.quintile = 1.quintile
    // sqrt(F) is |t|; quintile 1 is the base level, so the sign of
    // _b[5.quintile] (the Q5 - Q1 difference) gives the signed t-statistic
    gen t_stat = sign(_b[5.quintile]) * sqrt(r(F))
    gen p_value = r(p)
    exit
end

// CREATE DEMONSTRATION DATA SET
set seed 1234
set obs 100
gen asset = _n
expand 10
by asset, sort: gen date = mdy(1, 1, 2000+_n-1)
format date %td
gen residual = rnormal()
gen factor = rgamma(10, 4)

// SEPARATE ASSETS INTO QUINTILES BY LEVELS OF FACTOR AT EACH DATE
egen byte quintile = xtile(factor), by(date) nq(5)

// CALCULATE MEAN RESIDUALS IN EACH QUINTILE AND
// TEST EQUALITY OF 1ST AND 5TH QUINTILES AT EACH DATE
runby myprogram, by(date)

Notes:
1. You need to get the -runby- program for this. It was written by Robert Picard and me, and it is available at SSC.
2. Evidently, you will want to run this with your actual data. So you will need to skip the part where a demonstration data set is created, and you will need to change the variable names in the code to correspond to the actual variable names for date, residual, and factor in your data everywhere they appear above.
3. As there may be complications in your real data not foreseen as I wrote this code, I suggest that you first test the code on a small sample of your data. When testing it, add the -verbose- option to the -runby- command so that if errors crop up, you will be able to see what's happening and not just be left mystified by the absence of results. If there are no problems encountered, or after you fix those you do find, then run it using the entire data set and, if your data set is large, remove the -verbose- option (unless you want to see the gory details for every single date.)

Last edited by Clyde Schechter; 16 Oct 2017, 16:43.
Kate Lussy

#25

01 May 2019, 02:05

Originally posted by Nick Cox:

That makes sense. Each of the variables will have missing values for the other quintile bins, unless you explicitly spread the means.

Code:

xtile q_R3 = R3, nq(5)
bysort newdate : egen R3_q5 = mean(cond(q_R3 == 5, ex_ret, .))
by newdate : egen R3_q1 = mean(cond(q_R3 == 1, ex_ret, .))

See e.g. Section 9 of http://www.stata-journal.com/sjpdf.h...iclenum=dm0055

(Please use CODE delimiters for code, as requested)

Hello Nick,

Thank you for the code, it's extremely helpful since I was also struggling with the missing values.

I was wondering how to compute value-weighted excess returns (ex_ret in the code). Thank you in advance for your help!

Best,
Kate
Nick Cox

#26

01 May 2019, 02:10

#25 Sorry, but I don't understand your question. ex_ret is in the code you cite.
Kate Lussy

#27

01 May 2019, 02:14

I apologise for that. What I meant is: how would the code change if I want to generate value-weighted excess returns for a particular quintile portfolio? I want to value-weight with MV, the market value.
Thank you!
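A minimal Stata sketch of value weighting in the style of Nick's code, assuming the variables newdate, q_R3, ex_ret and MV from the thread (the six-row dataset is made up purely to show the arithmetic). Each portfolio's return is sum(MV * ex_ret) / sum(MV), computed with egen total() over stocks that have both values, and the Q5 and Q1 results are then spread to every observation of the date with the same cond() trick. Many studies weight by market value at the start of the period (for example, the previous month's MV) rather than the contemporaneous value; if so, build that lagged variable first and use it in place of MV.

```stata
clear
// TOY DATA (made up): newdate, quintile q_R3, excess return ex_ret, market value MV
input newdate q_R3 ex_ret MV
1 1  0.02 100
1 1  0.04 300
1 5  0.10  50
1 5 -0.02 150
2 1  0.01 200
2 5  0.03 200
end

// VALUE-WEIGHTED RETURN OF EACH (date, quintile) PORTFOLIO:
// sum(MV * ex_ret) / sum(MV) over stocks with both values present
gen double w_ret = ex_ret * MV
gen double w = MV if !missing(ex_ret, MV)
bysort newdate q_R3: egen double vw_num = total(w_ret)
by newdate q_R3: egen double vw_den = total(w)
gen double vw_ret = vw_num / vw_den

// SPREAD THE Q5 AND Q1 PORTFOLIO RETURNS TO EVERY OBSERVATION OF THE DATE
bysort newdate: egen double R3_vw_q5 = max(cond(q_R3 == 5, vw_ret, .))
by newdate: egen double R3_vw_q1 = max(cond(q_R3 == 1, vw_ret, .))
gen double R3_vw_ls = R3_vw_q5 - R3_vw_q1
```

For date 1 in the toy data, the Q1 portfolio return is (0.02*100 + 0.04*300) / 400 = 0.035.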
Kate Lussy

#28

01 May 2019, 05:00

Originally posted by Nick Cox:

#25 Sorry, but I don't understand your question. ex_ret is in the code you cite.

Hi Nick,

Maybe as a follow-up: instead of trying to value-weight within egen, I tried to use asgen. However, with asgen I am not sure how to handle the missing values, as you did with the cond() function.

I thought the code could be:

bys ymdate : asgen ExUSD = F_Excess_USD_w(cond(IVOL_w_5 == 1, F_Excess_USD_w, .)), w(MV_USD_w)

But, then I get an error message: unknown function F_Excess_USD_w(). I replaced mean, as in your example, with the variable of interest because with asgen, I did not need the mean function. However, I don't know what to put in its place.

Thank you in advance. I very much appreciate your help.
Nick Cox

#29

01 May 2019, 06:47

Sorry, but I didn’t write asgen and have never used it. But my wild guess is that your syntax is a long way from what it supports. Attaullah Shah will no doubt address this.
Kate Lussy

#30

01 May 2019, 07:42

Originally posted by Nick Cox:

Sorry, but I didn’t write asgen and have never used it. But my wild guess is that your syntax is a long way from what it supports. Attaullah Shah will no doubt address this.

I completely understand that. In that case, would you have a recommendation as to how to account for value weighting in your code:
bysort newdate : egen R3_q5 = mean(cond(q_R3 == 5, ex_ret, .))
by newdate : egen R3_q1 = mean(cond(q_R3 == 1, ex_ret, .))

Thank you kindly. I very much appreciate it.
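Another route, sketched under the same assumptions (newdate, q_R3, ex_ret and MV as in the thread; the toy data are made up), is to collapse to one row per date and quintile with MV as an analytic weight, which gives the MV-weighted mean return of each portfolio, and then reshape so that the Q5 - Q1 spread is a single subtraction. This replaces the stock-level data in memory, so run it on a copy (for example inside preserve/restore) if you still need the original observations.

```stata
clear
// TOY DATA (made up), same layout as the thread: newdate, q_R3, ex_ret, MV
input newdate q_R3 ex_ret MV
1 1  0.02 100
1 1  0.04 300
1 5  0.10  50
1 5 -0.02 150
2 1  0.01 200
2 5  0.03 200
end

// ONE ROW PER (date, quintile): MV-weighted mean of ex_ret
collapse (mean) ex_ret [aweight = MV], by(newdate q_R3)

// ONE ROW PER DATE WITH ex_ret1 ... ex_ret5, THEN THE Q5 - Q1 SPREAD
reshape wide ex_ret, i(newdate) j(q_R3)
gen double vw_ls = ex_ret5 - ex_ret1
```

The portfolio-level dataset is also a convenient place to test the spread over time, e.g. with ttest vw_ls == 0.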