converting regression output (only the intercept) into a new variable/column

LydiaSmit

Join Date: Jul 2014

Posts: 77
#1

22 Jul 2014, 17:11

Dear readers,

Is the following possible: converting regression output (only the intercept which is _cons) into a new variable/column?
If so, please tell me how. I need to make about 600 regressions and need the 600 intercepts (_cons) as 1 variable/column.

Lydia

Last edited by LydiaSmit; 22 Jul 2014, 18:03.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

22 Jul 2014, 18:29

You need to tell us more about what these 600 regressions are. If they correspond to 600 subsets of your population defined by the values of a variable or combination of variables, see -help statsby-

If they are based on different sets of dependent or independent variables, it isn't clear how you would reasonably integrate these as a new variable in the existing data set: which results would go in which observation (row)? You might want to store them in a matrix instead, or in a separate data set. If that's what you're trying to do, tell us more about how you're generating the 600 regressions and I (or someone else) can show you how to write a loop to save the constant terms in a matrix or data set.

Or maybe you have something else entirely in mind?
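For the first case (one regression per subset defined by a variable), a minimal sketch of the -statsby- route might look like the following. The data are simulated, and the variable names fund, ret, MRP, SMB and HML are assumptions borrowed from later in this thread, not a description of the actual data set.

```stata
* Toy panel: 3 hypothetical funds x 40 months (all values simulated)
clear
set seed 12345
set obs 120
gen fund = ceil(_n/40)
gen MRP = rnormal()
gen SMB = rnormal()
gen HML = rnormal()
gen ret = 0.1*fund + 0.9*MRP + 0.2*SMB - 0.1*HML + rnormal(0, 0.5)

* One regression per fund; keep only the intercept and call it alpha
preserve
statsby alpha=_b[_cons], by(fund) clear: regress ret MRP SMB HML
tempfile alphas
save `alphas'
restore

* Attach each fund's alpha to every observation of that fund
merge m:1 fund using `alphas', nogenerate
```

After the merge, alpha is constant within each fund and sits next to the original variables, so it can be listed or tabulated alongside fund.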
LydiaSmit

Join Date: Jul 2014

Posts: 77
#3

22 Jul 2014, 18:56

Thank you for the quick reply Clyde.

r - Rf = beta1 x ( Km - Rf ) + beta2 x SMB + beta3 x HML + alpha.

I'm running regressions based on the above formula, wherein the betas are the coefficients and the alpha is the intercept (_cons).
In my dataset I've got a variable 'Fund' with a thousand funds and I want to make a variable with a thousand alphas next to the variable 'Fund'.
I also have the other variables of the above formula.

gen ret=r-Rf
gen MRP=Km-Rf

regress ret MRP SMB HML

First I want 600 regressions without 'bysort date', and after that with 'bysort date'. At first the date doesn't matter; later it does.

Hopefully that's possible.

Last edited by LydiaSmit; 22 Jul 2014, 19:00.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#4

22 Jul 2014, 20:01

So, is the data xtset by fund and date, you have multiple records for each fund, and you want to run a separate regression for each fund? Something like

bysort fund: regress ret MRP SMB HML

I think that is manageable, but confirm that this is in the ballpark first. I am thinking something like

Code:

gen alpha = .
egen fund = group(fundid)
forval fundnum = 1/600 {
    quietly regress ret MRP SMB HML if fund == `fundnum'
    replace alpha = _b[_cons] if e(sample)
}

If this is nowhere close to what you want then please describe your data more. Maybe show what the commands for the first regression would be, and then we can try to figure out the other 599.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 5008
#5

22 Jul 2014, 20:08

I am also curious why you would want to do this! It seems like kind of an esoteric task. Is there some great finance theory behind this all? The intercept is the expected value when all Xs = 0; is there some reason for thinking that is interesting and is it even possible?

LydiaSmit

Join Date: Jul 2014

Posts: 77
#6

22 Jul 2014, 21:19

Thank you for the quick replies Richard.

The command for the 1st regression type is: regress ret MRP SMB HML
The data is indeed xtset by fund and date. I also have multiple records for each fund, one row for each date.
However, I do not want a separate regression for each fund, because on each date each fund has a different return. So, for the 1st regression type I need a separate regression for each date/row. So the date doesn't matter for the commands in this case, at least I think so.

For the 2nd regression type, the dates do matter, and then I want to regress by date, so "bysort date: regress ret MRP SMB HML"
(@Richard, due to my previous topic about equalportfolio and valueportfolio)

The intercept, which is called the alpha, is the added value of a fund manager. A fund can have a better return than the market because of factors like company size. "If the fund manager captures the factor exposures perfectly, the expected alpha would be zero, minus the expense ratio (ER) of the fund. An alpha greater than this suggests that the fund manager is adding value beyond the underlying factor exposures." (The factors in my case are MRP, SMB and HML; when those Xs = 0 and the _cons is > 0, the manager is adding value.)
Richard Williams

Join Date: Apr 2014

Posts: 5008
#7

22 Jul 2014, 21:43

Well, maybe my question should be, what is the second regression? If you just regress ret MRP SMB HML then every single case in the data set will be included. How do you get 599 more regressions after that? Don't you want/need some kind of sample selection? You say "So, for the 1st regression type I need a separate regression for each date/row." You can't run a regression on just 1 case.

Going just from the quote: if I were interested in added value, I would think in terms of better-than-predicted performance. So, I might do

Code:

regress ret MRP SMB HML
predict resid, resid

The residual would indicate how much better (or worse) the case did than predicted by the Xs. Is that Alpha? Indeed, as I look at this formula

r - Rf = beta1 x ( Km - Rf ) + beta2 x SMB + beta3 x HML + alpha

alpha looks to me the same as residual. But that only requires one regression, not 600.

If I am still totally off the mark, maybe you could present some of your data; maybe even a hand calculation of alpha for the first case or two.

Last edited by Richard Williams; 22 Jul 2014, 22:08.

Richard Williams

Join Date: Apr 2014

Posts: 5008
#8

22 Jul 2014, 21:52

Or, given that there is no constant in the equation as you wrote it, maybe it is

Code:

regress ret MRP SMB HML
predict resid, resid
gen alpha = resid + _b[_cons]

Last edited by Richard Williams; 22 Jul 2014, 21:55.

LydiaSmit

Join Date: Jul 2014

Posts: 77
#9

23 Jul 2014, 06:01

The first table of http://www.efficientfrontier.com/ef/101/roll101.htm shows exactly which dataset I have. However, I have a few thousand funds and 50 years of data. Therefore, I also have a column/variable with the few thousand funds. After the tables there's a short explanation that the intercept (without the residual) is indeed the alpha. Maybe I'm wrong, but I always thought that the intercept is also called the constant.

"You can't run a regression on just 1 case" So it's impossible to make a regression for each date (month) in this case and put the intercepts in a new column/variable in the same dataset?

In my dataset some funds are managed by multiple managers. For example, fund A is managed by manager A1 from 1jan1990-1march1992 and by manager A2 from 1june1994-1dec2000 (I don't know if there are time-data gaps for a few funds). I need to know which manager is responsible for which alpha (@Richard, and what the monthly alpha is for the equal-weighted and value-weighted portfolios).

The return data is based on a 'date' variable with a format like 1jan1990.
The management periods of each manager are based on 4 'date' variables (starting quarter, starting year, ending quarter, ending year).

Last edited by LydiaSmit; 23 Jul 2014, 06:24.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#10

23 Jul 2014, 06:44

As I read that example, there are 120 monthly records for a single fund, Dodge and Cox. You say you have data for thousands of funds, so I think my original suggestion is more or less right: You should run a separate regression for each fund.

In their example, there is just one alpha for the entire fund over the period studied. If you need separate alphas for each manager, then maybe you should treat these as separate funds, e.g. run a regression for Dodge & Cox 1990-1992, another regression for Dodge and Cox 1993-2000.

I assume the number 600 comes from 600 months over 50 years. But that is not what is done in the page you cite. There, it was one fund with records for 120 months. It may be legit to do something monthly instead, but if so it is different from your example. If you want to come up with an alpha for a fund, you have to have multiple records for that fund.

In short, it sounds to me like you want to clone what they did, but instead of just doing it for one fund you want to do it for thousands of funds, possibly treating some of those funds as different funds if they have different managers. If you don't want to clone what they did, you need to clarify what it is you want to do instead.

If I was just doing one fund, I might want to put in dummy variables for manager, but I don't see how you can do that for thousands of funds.

Richard Williams

Join Date: Apr 2014

Posts: 5008
#11

23 Jul 2014, 06:58

I would tweak my earlier code to

Code:

gen alpha = .
levelsof fund, local(fundnum)
foreach id of local fundnum {
    quietly regress ret MRP SMB HML if fund == `id'
    replace alpha = _b[_cons] if e(sample)
}

LydiaSmit

Join Date: Jul 2014

Posts: 77
#12

23 Jul 2014, 07:34

Thank you for the quick reply.

"If you need separate alphas for each manager, then maybe you should treat these as separate funds, e.g. run a regression for Dodge & Cox 1990-1992, another regression for Dodge and Cox 1993-2000." It would be nice if I could run that kind of regressions for each manager, however, the periods for each manager are different for each fund. If all funds were based on manager A manages from 1990-1992 and manager B manages 1993-2000....then it would have been a lot less complicated.

The number 600 indeed comes from 12 x 50. If a fund in my dataset is managed by only 1 manager, then it's OK to clone (do the exact same thing as) what was done on the page I cited. However, that still leaves the funds that were managed by several managers in different periods.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#13

23 Jul 2014, 07:47

You basically just need to add a variable, e.g. manager, to each record. Have it coded 1 for the first manager, 2 for the next manager, etc. There is no need for the time periods to be the same across funds. Then you could have something like

Code:

egen newfund = group(fund manager)

You'd have to figure out what to do when you don't know who the manager was.

I assume you have this manager data already, but if it is not in electronic form you'll have to figure out how to add it to your data.

LydiaSmit

Join Date: Jul 2014

Posts: 77
#14

23 Jul 2014, 08:17

I already have a variable/column with managernames. That code is very useful. Thank you.

So now I can use the code below to get an alpha (regression output: intercept/_cons) for each fund in a new column/variable in the same dataset?

Code:

gen alpha = .
levelsof fund, local(fundnum)
foreach id of local fundnum {
    quietly regress ret MRP SMB HML if fund == `id'
    replace alpha = _b[_cons] if e(sample)
}

Could you tell me why you put a '.' after 'gen alpha ='?
Do I need to put the varname 'newfund' in the above commands instead of 'id' or instead of 'fundnum'?

I also have fund returns for periods in which no manager managed the specific fund.
Is it possible to drop those observations, based on the following:
The return data is based on a 'date' variable with a format like 1jan1990.
The management periods of each manager are based on 4 'date' variables (starting quarter, starting year, ending quarter, ending year).
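One possible way to do that kind of drop, sketched under assumptions: the return date is taken to be a Stata daily date, and the four period variables are given the hypothetical names startq, starty, endq and endy. The idea is to convert everything to quarterly dates and keep only observations whose quarter falls inside the management period. The three rows below are made up purely for illustration.

```stata
* Toy rows: one management period, 1990q1 through 1992q1 (made-up data)
clear
input str10 sdate startq starty endq endy
"1jan1990" 1 1990 1 1992
"1jun1993" 1 1990 1 1992
"1mar1992" 1 1990 1 1992
end

* Daily date, as assumed for the real 'date' variable
gen date = date(sdate, "DMY")
format date %td

* Put the return date and the period bounds on the same quarterly scale
gen qdate   = qofd(date)
gen startqd = yq(starty, startq)
gen endqd   = yq(endy, endq)
format qdate startqd endqd %tq

* Drop returns that fall outside the management period
drop if !inrange(qdate, startqd, endqd)
```

Here the June 1993 row is dropped and the other two are kept. How missing start or end values should be treated is a separate decision that would need checking against the real data before dropping anything.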

Last edited by LydiaSmit; 23 Jul 2014, 08:44.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#15

23 Jul 2014, 08:52

The first line initializes alpha at missing. It will get replaced as you run your regressions.

Using manager names worries me a bit, as Adams in 2000 would get coded ahead of Smith in 1980. But maybe it doesn't matter whether the regressions for a company are run in chronological order. Maybe somebody else can suggest a way to create a manager id variable that would increase by 1 every time the manager changed, e.g. Smith in 1980 would be manager 1, Jones in 1983 would be manager 2, ... Adams in 2000 would be manager 10, etc.

You've just copied my original code and that does not account for manager changes. So yes, you need something like

Code:

gen alpha = .
egen newfund = group(fund manager)
levelsof newfund, local(fundnum)
foreach id of local fundnum {
    quietly regress ret MRP SMB HML if newfund == `id'
    replace alpha = _b[_cons] if e(sample)
}
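On the chronological manager id asked about above, one possible sketch is a running count of manager changes within each fund. The toy data, the variable year (standing in for the real date variable) and the name manager_id are all assumptions for illustration.

```stata
* Toy data: manager names per fund and year (made up)
clear
input fund str10 manager year
1 "Smith" 1980
1 "Smith" 1981
1 "Jones" 1983
1 "Adams" 2000
2 "Adams" 1990
2 "Brown" 1991
end

* Within each fund, in time order, add 1 whenever the name differs
* from the previous record; the first record of a fund starts at 1
bysort fund (year): gen manager_id = sum(manager != manager[_n-1])

* One group per fund-manager spell, usable in place of group(fund manager)
egen newfund = group(fund manager_id)
```

With this numbering, Smith, Jones and Adams in fund 1 become managers 1, 2 and 3 in chronological order. A manager who leaves and later returns to the same fund would get a new id, which may or may not be what is wanted.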
