Run more than one regression with same independent variables

Ernest Adler

Join Date: May 2015
Posts: 3


28 May 2015, 09:47

Hello all,

For my thesis I have to run several regressions with certain factors as independent variables (F&F 5-factor model). However, I need to run around 1400 regressions on the same factors, as I have that many different dependent variables. Here follows an example of my data as it is ordered now. All numbers are returns.

Date	Mkt-rf	SMB	HML	RMW	CMA	Bank A	Bank B	Bank C
31-01-1986	0,020	0,11	0,34	0,004	0,33	.	0,045	.
28-02-1986	0,012	0,03	0,04	0,45	0,01	.	-0,09	.
31-03-1986	-0,01	-0,12	0,55	0,04	0,04	.	-0,07	0,07
30-04-1986	0,01	0,035	-0,09	0,03	0,04	.	.	0,09
31-05-1986	0,003	0,001	0,004	0,01	0,02	0,045	.	0,05
30-06-1986	0,002	0,11	-0,56	-0,09	-0,01	0,022	.	0,07
31-07-1986	0,02	-0,02	-0,04	-0,03	-0,07	0,045	.	-0,08
31-08-1986	-0,004	0,01	-0,09	-0,03	0,10	0,03	.	-0,09
30-09-1986	-0,1	0,22	0,06	-0,02	0,04	0,008	.	-0,01

NOTE: these numbers are fictional, the example is just to show what I mean. The factors have returns at all dates, while Bank A for example can have 25 returns, and Bank B 50.

What I need is something like this:
reg Bank A Mkt-rf SMB HML RMW CMA
reg Bank B Mkt-rf SMB HML RMW CMA

However, running these manually would simply take too much time, and I can't figure out how to run more than one regression at the same time. Is there an alternative?

This question has my priority, but if someone also knows how to easily import the beta's and p-values of the regressions that would also be nice.

I'm using STATA version 12. I know I demand a lot, but hopefully someone can help me with this.

Best regards,
Ernest Adler

Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#2

28 May 2015, 12:34

"Bank A", "Bank B", "Bank C" and "Mkt-rf" are not valid variable names, so let's assume that your variables are called Bank_A, Bank_B, Bank_C and Mkt_rf. Assume further that all "Bank" variables are next to each other in your dataset so that we can get a list with ds. To run more than one regression with one command, you can loop over the dependent variables.

Code:

ds Bank_A - Bank_C
local banks `r(varlist)'
foreach bank of local banks {
  di "Bank: `bank'"
  reg `bank' Mkt_rf SMB HML RMW CMA
}

Clyde Schechter

Join Date: Apr 2014

Posts: 30354
#3

28 May 2015, 13:02

Friedrich's solution does not respond to the original poster's request of "how to easily import the beta's and p-values of the regressions that would also be nice." Also, as the original post states that there are 1400 different banks, the output from that loop of regressions will be quite unwieldy.

With 1400 outcomes (all of them banks?), a naming scheme like Bank_A, Bank_B, etc. will also be unwieldy. So I'm going to assume that Ernest will rename these variables to Bank1 through Bank1400. It also isn't clear whether, in referring to the regression p-values, Ernest wants the p-values associated with each coefficient in each regression or the overall p-value for each regression model. With the renaming done, I would recommend the following:

Code:

gen long obsno = _n
reshape long Bank, i(obsno) j(bank_num)
rename Bank outcome
statsby _b _se e(F) e(df_m) e(df_r), saving(regression_results, replace) by(bank_num): reg outcome Mkt_rf SMB HML RMW CMA

The file regression_results.dta will now contain all of the coefficients, their standard errors, the F statistic, and the model and residual degrees of freedom for each model. (The last three will have names that look like _eq2_stat1 through _eq2_stat3: I recommend renaming them immediately.) Ernest can then calculate the p-values for the overall regression using the Ftail() function, and those for the individual variables using the ttail() function.
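As a rough sketch of that last step, the code below uses the auto dataset shipped with Stata as a stand-in: foreign plays the role of bank_num, price is the outcome, and mpg and weight replace the five factors (all of these substitutions are assumptions for illustration only). It names the saved statistics F, df_m and df_r directly in -statsby-, which sidesteps the renaming step, and it relies on -statsby- storing coefficients and standard errors as _b_varname and _se_varname.

```stata
* Stand-in data: foreign ~ bank_num, price ~ outcome, mpg/weight ~ factors
sysuse auto, clear

* One regression per group; the results replace the data in memory
statsby _b _se F=e(F) df_m=e(df_m) df_r=e(df_r), by(foreign) clear: regress price mpg weight

* Overall p-value of each model from the F statistic
gen p_model = Ftail(df_m, df_r, F)

* t statistics and two-sided p-values for each coefficient
foreach v in mpg weight {
    gen t_`v' = _b_`v' / _se_`v'
    gen p_`v' = 2 * ttail(df_r, abs(t_`v'))
}

list foreign p_model p_mpg p_weight, noobs
```

With the bank data, the same pattern should carry over by replacing by(foreign) with by(bank_num) and the variable list with the outcome and the five factors.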
Ernest Adler

Join Date: May 2015

Posts: 3
#4

28 May 2015, 13:55

First of all, thanks for replying to my questions.

I see now that I was a bit careless in my opening post. The Mkt-rf variable is in reality called MktRF. Furthermore, the banks all have their original names (so where I wrote Bank A, the variable is in reality AMERICANPIONEER, for example). Having said that, I tried Friedrich's solution first. This solution works, but I can only check the last 20 regressions or so (probably because too many regressions are run at once?). Also, as Clyde mentioned, it is harder to copy all the necessary outcomes this way.

The outcomes that I need are the coefficients of all the variables, their standard errors, the t-values and the p-values. I also need the R-squared, the F-value and the p-value of each model. I tried Clyde's solution, but I can't get that one to work.

Sorry for my beginner skills. Again, thanks for replying, I appreciate the help from both of you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30354
#5

28 May 2015, 14:56

"I tried Clyde's solution, but I can't get that one to work."

If the problem is that your variable names, being bank names, are unsystematic, so that you can't easily make them look like Bank1-Bank1400, then try the following. I assume that your bank-name variables are all together at the end of your dataset, with nothing else interspersed among them. Let's say the first one is AMERICANPIONEER and the last one is ZEROBANK:

Code:

unab bank_names: AMERICANPIONEER-ZEROBANK
rename (`bank_names') Bank#, addnumber

That will convert the names to the form Bank1-Bank1400 needed for my code in #3.

If that isn't the problem (or if it still doesn't work after you do this), please show exactly what code you ran and exactly how Stata responded by pasting (do not retype anything) directly from your Results window or log-file into a code block here on the forum. (See the FAQ for how to create a code block.) I'll try to figure out how to fix it for you.

By the way, you didn't originally mention that you needed the R2 statistics. You can capture those by adding e(r2) to the list of parameters in the -statsby- command.

Last edited by Clyde Schechter; 28 May 2015, 14:58.

Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

28 May 2015, 15:14

You can store the results with the regsave module from SSC.

Code:

ssc install regsave

The example below uses the auto data to demonstrate how this works.

Code:

sysuse auto, clear
ren price MktRF
ren mpg SMB
ren rep78 HML
ren headroom RMW
ren trunk CMA
ren weight JPMorgan_Chase
ren length Bank_of_America
ren turn Citigroup

* Collect the dependent variables (adjacent in the dataset) in a local macro
ds JPMorgan_Chase - Citigroup
local banks `r(varlist)'
local i = 0
foreach bank of local banks {
  local i = `i' + 1
  di "Bank: `bank'"
  reg `bank' MktRF SMB HML RMW CMA
  * Model F statistic and its p-value
  local f = e(F)
  local p = Ftail(`e(df_m)',`e(df_r)',`e(F)')
  * The first bank creates results.dta; later banks are added to it
  if `i' == 1 {
    regsave using results.dta, tstat pval table(`bank') addlabel(F, `f', p, `p') replace
  }
  else {
    regsave using results.dta, tstat pval table(`bank') addlabel(F, `f', p, `p') append
  }
}

The file results.dta contains these variables:

Code:

+-----------------------------------------------------------+
|          var   JPMorgan_C~e   Bank_of_Am~a      Citigroup |
|-----------------------------------------------------------|
|   MktRF_coef    .0705303177    .0007755397    .0000657132 |
| MktRF_stderr    .0191870164    .0005565662    .0001337312 |
|  MktRF_tstat    3.675939798     1.39343667    .4913825989 |
|   MktRF_pval    .0004916819    .1683845222    .6248634458 |
|     SMB_coef   -59.57649612   -1.868157387   -.3305487931 |
|   SMB_stderr    11.85547352    .3438969553    .0826312527 |
|    SMB_tstat   -5.025231361   -5.432317257    -4.00028801 |
|     SMB_pval    4.42246e-06    9.54228e-07    .0001688887 |
|     HML_coef   -138.7820282   -2.313806534    -1.22577095 |
|   HML_stderr    54.31847763    1.575639844    .3785933256 |
|    HML_tstat   -2.554969072   -1.468486905   -3.237698317 |
|     HML_pval     .013047494    .1469475925    .0019236589 |
|     RMW_coef      80.710289    1.783444047    .2113008946 |
|   RMW_stderr    75.52352142    2.190743923    .5263900161 |
|    RMW_tstat    1.068677545      .81408149    .4014150798 |
|     RMW_pval    .2892927527    .4186647236    .6894729733 |
|     CMA_coef    44.75694656    1.891320944    .2698600888 |
|   CMA_stderr     16.7351799     .485444665    .1166422144 |
|    CMA_tstat    2.674422741    3.896058798    2.313571453 |
|     CMA_pval    .0095242811    .0002392972    .0239671711 |
|   _cons_coef    3474.100098    199.4847412    46.21289825 |
| _cons_stderr    421.6842651    12.23198032    2.939089298 |
|  _cons_tstat    8.238628387    16.30845833    15.72354317 |
|   _cons_pval    1.36572e-11    2.71718e-24    1.73519e-23 |
|            N             69             69             69 |
|           r2    .7720876932    .7670263648    .6471169591 |
|            F    42.68441391    41.48337173    23.10588074 |
|            p    5.70477e-19    1.12876e-18    4.23521e-13 |
+-----------------------------------------------------------+

Edit 1: Table showing results.dta replaced by output of list, noobs sep(0).
Edit 2: p-value for F-test added to code and results.

Last edited by Friedrich Huebler; 28 May 2015, 15:37.


Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

28 May 2015, 18:56

For some reason the regression results are saved as strings in the example above. To convert the data to numeric variables, run the commands below.

Code:

use results, clear
destring JPMorgan_Chase - Citigroup, replace

Finally, you can rearrange the data to have one observation per bank.

Code:

count
local n = `r(N)'
forval i = 1/`n' {
  local v`i' = var[`i']
}
drop var
xpose, clear varname
forval i = 1/`n' {
  ren v`i' `v`i''
}
ren _varname bank
order bank

Here is a subset of the variables after xpose.

Code:

list bank - MktRF_pval N - p, noobs sep(0)

+---------------------------------------------------------------------------------------------------+
|            bank   MktRF_~f   MktRF_~r   MktRF_~t   MktRF_~l    N         r2          F          p |
|---------------------------------------------------------------------------------------------------|
|  JPMorgan_Chase   .0705303    .019187    3.67594   .0004917   69   .7720877   42.68441   5.70e-19 |
| Bank_of_America   .0007755   .0005566   1.393437   .1683845   69   .7670264   41.48337   1.13e-18 |
|       Citigroup   .0000657   .0001337   .4913826   .6248634   69    .647117   23.10588   4.24e-13 |
+---------------------------------------------------------------------------------------------------+


Ernest Adler

Join Date: May 2015

Posts: 3
#8

29 May 2015, 05:05

I have tried Friedrich's code and it works like a charm; this is exactly what I needed. Thank you so much for your help. With this code I can make a lot of progress in a short amount of time. Clyde, your help is also really appreciated. Thanks for all the replies, you two have really helped me a lot.

Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#9

31 May 2015, 13:26

Originally posted by Friedrich Huebler

For some reason the regression results are saved as strings in the example above. To convert the data to numeric variables, run the commands below.

Code:

use results, clear
destring JPMorgan_Chase - Citigroup, replace

The previous version of regsave had a bug: when the addlabel() option was used the results were saved as strings. The author of regsave submitted an updated version to SSC (distribution date 20150530) that no longer has this bug. With the new version it is not necessary to destring the results because they are saved as numeric variables.
Da GXHI

Join Date: Nov 2020

Posts: 17
#10

09 Nov 2020, 18:34

Dear all,

I have a similar problem but I need to perform weighted regressions where weights are variables, say w_JPMorgan_Chase - w_Citigroup.

Could someone please help me rearrange the code to include weights for each regression?

Thanks a lot!