Calculate market betas for stocks

John Harperson started a topic Calculate market betas for stocks

30 Apr 2015, 15:09
Calculate market betas for stocks

Hi all,
I just started using STATA, so I don't have much knowledge about it.
My problem is the following:
I have monthly return data for all NYSE stocks for 40 years and have to calculate an individual beta for each stock on a rolling basis.
So for a particular stock and a particular month, I want to regress the returns from the last 5 years on the market returns.
For example stock Apple: beta in January 1980 would be a regression of Apple's returns from January 1975 to December 1979 on the market returns during this period.
The beta for February 1980 would then include date from February 1975 to January 1980.
I know that I have to work with the following command: rolling _b, window(60) saving(betas_`i', replace): regress stockexcessreturn marketexcessreturn
However, I don't know how to apply this command exactly to my problem so that STATA is calculating the beta for each month (is that already included in the rolling command?).
And how can I include that STATA should should just use the returns from a particular stock to calculate its beta and not from other stocks? I think you can't combine the command rolling with by companyID, can you?
I have to include another restriction where I am unsure how to code this: STATA should just calculate the beta for a particular stock if at least 30 return observation were available during the last 60 months (the 5year period).
I know that these are a lot of questions! However, it would make me really happy if someone can help me out...
Thank you very, very much in advance!
Tags: None
Attaullah Shah replied

09 Apr 2020, 06:22
For rolling betas, you can also explore asreg, see this video
Leave a comment:
Guest replied

09 Apr 2020, 03:00
Hi there,

I'm not very familiar with stata yet and would like some help!
My problem is similar to that of the original post (#1), but I have some more problems.

I have three different data sets, these are the following:
The file sp500inclusions contains information about when the stock of a company was included in the S&P 500 index. Permno is a stock security identifier. Dateann refers to the date when S&P announced the inclusion of a new stock and Dateeff to the date when the stock was actually included (usually a few days later).

The file sp500stockret contains daily stock returns for each stock in sp500inclusions. For each stock it covers the period starting 1 year prior to it's Dateann until 1 year after it's Dateeff.

The file sp500marketret contains the index return of the S&P 500 (sprtrn), which serves as a proxy for the market return, as well as the yield on short-term US government debt (rf), which serves as a proxy for the risk-free rate.

For each stock I want to estimate pre- and post-inclusion betas. For pre-inclusion betas I will have to use 1 year of stock returns until 10 days prior to the inclusion announcement. For post-inclusion betas I will have to use 1 year of stock returns starting 10 days after effective inclusion.

What's the best way to handle this? I find it especially tricky how to combine the data from the different sets and how to make the rolling window work in such a way that it takes exactly the right period for the pre- and post-inclusions.

Thank you very much in advance for your help!

Best,
Guest
Last edited by sladmin; 13 Mar 2024, 08:10. Reason: anonymize original poster
Leave a comment:
Clyde Schechter replied

15 Oct 2018, 10:25
I know I have to perform some sort of loop to repeat the command for all stocks.

No, you don't. What you have to do is get your data into a more useful layout, and then the -rolling- command (or several other approaches that you can find in many threads here--this is a very frequently asked question) will handle it.

Code:

rename (NESTLER-whatever) ret= reshape long ret, i(Date) j(stock) string xtset stock Date

In the above code, replace whatever by the name of the last stock before EUROSTOXX.

From there, the -rolling- command, modified as follows:

Code:

rolling _b, window(365) stepsize(365) saving(beta_practice, replace): reg ret EUROSTOXX, robust

will do it.

-rolling- is by now pretty old, and there are other community-contributed commands that can be used to do this more efficiently. You can find models of these earlier in this thread. What all of them have in common, and also with -rolling-, is that the require the data to be in long layout. Your wide layout is easy on human eyes, but it doesn't play well with most Stata commands. When dealing with panel data like this, you should form the habit of working with it in long layout--that will save you much time and frustration. Wide layout is best for a handful of Stata commands, but only a handful.

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Leave a comment:
Jan Scheur replied

15 Oct 2018, 07:53
Goodafternoon,

I want to calculate the betas over all the stocks in my sample, there are 600 stocks in the sample. I have stock return data for 18 years for every stock. I already calculated the beta of one stock, however, now I want to do this for the stocks in my sample. I used the following to calculate the beta:

rolling _b, window(365) stepsize(365) saving(beta_practice, replace): reg NESTLER EUROSTOXX, robust

I know I have to perform some sort of loop to repeat the command for all stocks. I have daily stock returns, the idea is that I get a beta for every stock for every year. A simplified sample looks like this:
Date NESTLER (...) EUROSTOXX

03jan2000 x x x

04jan2000 x x x

(...) (...) (...) (...)

31dec2017 x x x

Between NESTLER and EUROSTOXX are 599 other stock returns. EUROSTOXX is the index.
Thanks in advance.

Regards,
Jan Scheur
Leave a comment:
Arjan Trinks replied

20 Dec 2017, 05:19
@#31

Hi Robert,

Many thanks for your quick solution. This seems to work perfectly for the problem I posted. (#30)
Will save me hours of waiting (and coffee consumption).
Again, thanks so much!
Leave a comment:
Johanna Moser replied

16 Dec 2017, 07:51
Thank you so much Clyde! You are brilliant!
Leave a comment:
Clyde Schechter replied

14 Dec 2017, 10:58
Yes. This suggests that there are some groups where the -newey- command cannot be executed as requested. In order to see what is going on, you can add the -verbose- option to the -runby- command. This will generate a lot of output, but you can break it off after you get some error messages coming out and you'll have a sense of what's going on. It's likely a problem with your data, perhaps groups that do not have enough non-missing observations to run with the number of lags you have calculated. (You have calculated the number of lags on the basis of the number of non-missing values of ex_return, but the estimation sample might actually be smaller than that due to missing values in the other variables.)
2 likes
Leave a comment:

Johanna Moser replied

14 Dec 2017, 09:50

Thank you so so much Clyde!!!!

I adjusted the code:

Code:

capture program drop my_newey
program define my_newey
    tsset week
 quietly summarize ex_return
 scalar lags = floor(4*(r(N)/100)^(2/9))
    newey ex_return mkt_rf hml smb twexb_change_percentage,lag(`=lags')
    if e(N) >= 40 {
        gen n_obs = e(N)
        foreach v of varlist mkt_rf hml smb twexb {
            gen b_`v' = _b[`v']
            gen se_`v' = _se[`v']
   gen t_`v' = _b[`v'] / _se[`v']
   gen p_`v' = 2*ttail(e(df_r), abs(_b[`v'] / _se[`v']))
        }
    }
    exit
end

gen year = yofd(dofw(week))
runby my_newey, by(company_id year) status

Only the number of errors that it displays worries me a little. Do you think there is something wrong?

Click image for larger version

Name: Stata.PNG
Views: 1
Size: 5.2 KB
ID: 1422421

Again, thank you so much for your help!

Leave a comment:

Clyde Schechter replied

14 Dec 2017, 08:13
Here's the gist of it. Bear in mind that I don't know anything about -newey-. I discovered that you have to specify a lag() option to make it run. So I arbitrarily picked -lag(1)- to illustrate the approach. Change that to whatever you need. I also don't know which output from this regression you would consider "the beta." So I generated the regression coefficients and standard errors for all variables. Again, tailor this code to your actual needs.

Code:

capture program drop my_newey program define my_newey tsset week newey ex_return mkt_rf hml smb twexb, lag(1) if e(N) >= 40 { gen n_obs = e(N) foreach v of varlist mkt_rf hml smb twexb { gen b_`v' = _b[`v'] gen se_`v' = _se[`v'] } } exit end gen year = yofd(dofw(week)) runby my_newey, by(company_id year) status

The key points are:

1. -runby-, is written by Robert Picard and me, and you can install it with -ssc install runby-. It's purpose is precisely for projects like this: run the same commands on chunks of the data set and capture results as you go along. You could also do this with official -statsby-, but -runby- is much faster and, in my opinion, simpler to use.

2. The simplest way to determine whether 40 observations are available is to actually run -newey- and see how many it says it had. So that's the approach I used. -newey- seems to run very quickly on a set of up to 52 observations, so I don't think this will create much inefficiency compared to explicitly counting observations with missing values.

3. The data that -newey- is applied to in this approach is not the original data set, but is a chunk of a clone of the original data set. That chunk does not carry with it the information about how the data are -tsset- or -xtset-, so the -tsset- needs to be done inside the program my_newey. (I guess this means that it isn't really a full "clone" of the original.)
3 likes
Leave a comment:

Johanna Moser replied

14 Dec 2017, 01:40

Hi Clyde,
first of all, thank you for your answer!!!!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int company_id float(week ex_return) double(mkt_rf smb hml twexb_change_percentage)
6 1820  -1.922182   .17 -1.21  1.85                  .
6 1821  4.0626664  1.07   -.3   .13 -.0319025921651096
6 1822   3.007111  -.22   .47  -.96 -.5542856415845588
6 1823 -1.3971034   .79 -1.21   .09  .2684526551012281
6 1824 -3.1557686  1.76 -1.23   .11  .0166934967665756
6 1825   1.702802   .83  1.09  -.77  -.398238226404705
6 1826      -.099  -.03  -.37   .66  .2480528809463563
6 1827      -.099   .95  -.66   .87 -.4569744501027921
6 1828  1.6549115  -.33   .75 -1.05 -.2111390276694827
6 1829  1.6241304   .47  -.84  -.74 -.4712981869436382
6 1830      -.115   .95  -.44  -.56 -.1046784368881651
6 1831  1.5944017    .9  -.54  -.23  .2399777486939216
6 1832  2.4060085   .09   .54  1.36 -.5425828016846495
6 1833  3.1676884   .71 -1.18  1.41  -.874923088734785
6 1834  2.2699523   .73   .69  -.24 -1.114025020836144
6 1835  -5.537357  -.49  -.38  1.53 -.9998190772054948
6 1836   2.757852  1.15   .25  -.53 -.1408511913849036
6 1837 -2.1260319   .68 -1.17   .18 -.0387217579454937
6 1838  2.3050244  1.28   -.1   .65 -.3285368222462772
6 1839      -.134 -1.06  1.35   .26  1.493984971669278
6 1840  -4.895905   .68  -.58    .4  .4127188016588153
6 1841     -3.868  1.36 -1.24   .72 -1.093935111206834
6 1842 -1.8496017  -.58  2.14 -1.42   .458236478468209
6 1843  2.0846431  2.16  -.13 -1.32 -.0730010972286157
6 1844   6.778552  1.78  -.75   -.3 -.0450502304535142
6 1845  -7.376065  -.83  1.13    .2 -.0694329860945689
6 1846   2.495696  2.21  -.67   .37 -.1279917376069615
6 1847  -2.655373   .84  1.45 -1.13   .259751211377123
6 1848  -.9825652 -1.41   .28   .02  .0174858453188828
6 1849  -.9901929  2.14   .64  -.96  .0220194612202905
6 1850  -2.770867   -.8    .5  1.36  .1129493497945198
6 1851  -5.570546  -.32   .99  -.39  .6373720117661943
6 1852  -4.442923  1.16   .68   -.5  1.553472254247672
6 1853      -.116     0     0   .68  1.090733736844261
6 1854   5.922151   .64  -.44  2.01 -.0910189631752859
6 1855  2.2616682  1.91   .61 -1.06 -.0330792971773431
6 1856  -7.052444  1.31 -1.33   .14  .5664963964832207
6 1857  2.8770745  -.35   -.1 -1.71  .8852092428921295
6 1858 -3.4896426   .15 -1.08  1.45 -1.404561685347355
6 1859     -1.118  -.98  -2.2  1.41 -.0493520007365284
6 1860   6.952707   .42  -.76   .63  .1843313036539588
6 1861   -24.6463   .34    .1 -1.41  .0902320588945803
6 1862     -6.368 -1.66  -1.2  -.52 -.0712446993730011
6 1863   6.561666  2.23   .17 -1.08  .6997006016826614
6 1864      4.895   .25   .35  -.16  .6612963318752312
6 1865   5.252143   .71 -1.03   .45  .1477350975779138
6 1866  -5.189746   -.3   -.8  1.58 -.1752095354420207
6 1867  -7.264857  1.46   .67  -.58  .1490418287527331
6 1868   6.288256  1.14  -.78  -.38  .3260769967213287
6 1869 -1.3268193   -.7   .16   .58  .4610708980489516
6 1870  2.3170245  -.34   .75   .53  .0933166276018613
6 1871   7.020857   .78   .29  -.18 -.2994205773346552
6 1872  2.1152222  -.48  -.19   .59  .2020980500156001
6 1873 -3.3678696 -2.32   .12   .79  .3911540455130725
6 1874  1.5783932  1.28 -1.63  -.12   .153957456110667
6 1875   .4454862  1.67   .16 -1.02  .4569010137881136
6 1876  -.3727253  2.27  -.41  -.82  .4881379781279293
6 1877   5.755884   2.7 -1.78  -.71  -.385689339900815
6 1878  -2.244938  -.82   1.1   .59  .0982943704509014
6 1879 -2.2256596  1.61  -.41 -1.33 -.2600022923156712
6 1880 -1.1849566 -2.02  1.61  1.58 -.3573739433812891
6 1881 -2.2958024 -1.55   .11  -.26  .3787079875696428
6 1882  -7.963169  1.42   .62 -1.04  .0257729824619537
6 1883   5.999561  1.14  -.01   .67 -.1300734599436812
6 1884  -2.971563  -.52  1.17    .9  .1297248010610148
6 1885   5.211444   1.4  -.08  -.33 -.0456346003402419
6 1886 -3.4847865  -2.7  2.03  -.53  .1548764870368509
6 1887  -6.509349  1.49  1.18 -1.41  .2939756694820437
6 1888  10.445005  1.79  1.32 -1.52 -.0497799068507639
6 1889      -.106 -1.38  2.01  -.04  .0653751129112623
6 1890 -3.1958876  1.37  -.29  -.22 -.0057706853306983
6 1891   8.879507  2.62   .23  -.95  .1038783304443812
6 1892  10.000383  1.12   .41   -.1   .337460133789865
6 1893      -.106 -1.27   .86   .07  .3939867768188034
6 1894   .3830918   .17 -1.35  -.16 -.2448665906312937
6 1895 -1.0615385 -1.05   .16   .96  .3271194636593105
6 1896  -3.012621  -.86 -2.46   .57 -.1892189129051771
6 1897       -2.1   .58   .03   .21  .1150970498324158
6 1898  3.9696326 -1.98   .47    .8 -.0010219087006377
6 1899  -4.033569 -2.62 -2.91  2.93  .3228242574991119
6 1900  -6.234449 -1.07   .41  -.19 -.2889855456295752
6 1901   .4314783  -.87 -1.91   .86 -.5526763941798342
6 1902  -4.967865  3.88 -1.28  -.91 -.0426313069936908
6 1903 -1.2393637  -.06   .77  -.01 -.1184936406014964
6 1904  -5.850127   .61  -.33   .78  .0928083795060269
6 1905  1.1165122    .3   .42   .19  .0793587973647026
6 1906   4.716277 -1.54  2.47  -.14  -.157461874813189
6 1907 -2.4078505   .28     0   .17  .1498915678020208
6 1908 -1.2854706   3.4  -1.5 -1.43  .3435052948506073
6 1909 -4.2756667   .86   -.2 -1.11  .2215313302014492
6 1910  12.934478   .19   .79  -.91 -.0955055204233619
6 1911   8.685209  1.88 -1.42   .77  .2315794639625562
6 1912 -13.237313  -.33  -.02    .6 -.0087725472314976
6 1913  2.2195814   .94 -1.31   .77  .2505496074960811
6 1914  -5.219636 -1.66  -.28  2.11  .1090870977031732
6 1915  1.0956048   .04 -1.66  1.08 -.0147391725530304
6 1916  -1.877148  3.32 -2.04  -.89 -.1623581272594788
6 1917   15.56065   .78  -.48   .48 -.5026317906461209
6 1918    -1.6645  1.12 -1.01  1.47 -.0740971962808138
6 1919   .4271005  1.06   .01  -.02  .2193838202896143
end
format %tw week

I hope this gives a clearer idea of what my data looks like. In total, i have 2137 companys with weekly obeservations for the time period 1995-2015.
What I have to do is a Newey-Regression which creates yearly Beta per Company. It should NOT be rolling, hence i want to do a separate single regression for each company's data for each calender year.

Additionally companies need to have at least 40 non-missing-values in the respective year, otherwise no regression should be run in the respective year.
Thank you so much!

Last edited by Johanna Moser; 14 Dec 2017, 01:44.

Date	NESTLER	(...)	EUROSTOXX
03jan2000	x	x	x
04jan2000	x	x	x
(...)	(...)	(...)	(...)
31dec2017	x	x	x

Announcement