Regression by stocks and output variables and t-values

Tristan Sun

Join Date: May 2018

Posts: 5
#1


08 May 2018, 12:03

Hi all,

I am running a simple time-series regression to examine stock liquidity in the financial market. This is a panel dataset, and each stock has a unique ID number in the ID column. The dataset contains 120 stocks over the past 3 years. The column "year" holds the year variable.

My regression is like: y= a + bx + c

Here, I want to run the regression for each stock and extract the coefficient b and its corresponding t-value.
In the end I should get 120 coefficients and t-values.

My code is like:

Code:

gen beta_x=.
qui levelsof ID if ID > 0 & ID < 121 , local(year)
foreach v of local year{
qui reg y x,r if ID == `v'
qui replace beta_x= _b[_x] if ID == `v'
}
**Use collapse to summarize data**
collapse beta_x, by(ID)
outreg2 using c_1, bdec(3) stats (coef tstat) excel replace dec(2)

However, I can't get the desired results. Can someone help me ? THANKS !
Tags: foreach, output, regression, rolling, stocks
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

08 May 2018, 14:43

Well there is one clear problem in the code:

Code:

qui reg y x,r if ID == `v'

should be

Code:

qui reg y x if ID == `v', r

-if- qualifiers precede the comma that sets off options in Stata syntax.

The other clear problem is:

Code:

qui replace beta_x= _b[_x] if ID == `v'

should be

Code:

qui replace beta_x= _b[x] if ID == `v'

The _b[] reference does begin with an underscore character, but you do not add an underscore before the name of the variable whose coefficient you are trying to access.

With those changes, your code has a good chance of running. It will not, however, produce any t-statistics because you do nothing to create them. If you want those, the easiest way to get them is to save _se[x], and then calculate t as the quotient of beta and the standard error.

Pitfalls may also arise if there are some IDs for which there are not enough observations to do the desired regression.
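
A minimal sketch of the corrected loop with the t-statistic added, for illustration only. It assumes simulated data in place of the real dataset; the names t_x and ids are illustrative and not from the original post. -capture- is one possible way to keep the loop running when a stock has too few observations to fit the regression.

```stata
* simulated demonstration data: 120 stocks, 36 observations each (an assumption)
clear all
set seed 321
set obs 120
gen ID = _n
expand 36
gen y = runiform()
gen x = runiform()

gen beta_x = .
gen t_x    = .
quietly levelsof ID, local(ids)
foreach v of local ids {
    * capture lets the loop continue if the regression fails for this stock
    capture quietly regress y x if ID == `v', robust
    if _rc == 0 {
        quietly replace beta_x = _b[x] if ID == `v'
        * t-statistic = coefficient divided by its standard error
        quietly replace t_x    = _b[x] / _se[x] if ID == `v'
    }
}

* reduce to one row per stock
collapse (mean) beta_x t_x, by(ID)
list in 1/5
```

Stocks for which the regression could not be run keep missing values in beta_x and t_x, so they are easy to spot afterwards.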

In the future, when asking for help with code, please show a data example that goes with the code, and, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Andrew Musau

Join Date: Oct 2014
Posts: 10195

08 May 2018, 14:53

Clyde gives good advice as always. I find it easier to store results in a matrix, which can then be exported to Excel or some other editor. Based on your example, here is one way:

Code:

qui levelsof ID if  ID > 0 & ID < 121 , local(year)
local n: word count `year'
matrix R = J(`n',2,.)

local i=1
foreach v of local year{
qui reg y x if ID == `v',r 
matrix R[`i',1]= _b[x]
matrix R[`i',2]= round(_b[x]/_se[x], 0.01)
local ++i
}

putexcel set "results.xls", sheet("Coefficients and t-statistics") 
putexcel A1=("Coefficient")
putexcel B1=("t-statistic")
putexcel A2=matrix(R)


Tristan Sun

Join Date: May 2018

Posts: 5
#4

08 May 2018, 16:28

Clyde Schechter Dear Mighty Clyde, Thank you so much. I will -dataex- next time to share the sample.

I think your code works, but it gives me 'matrix e(b) not found; run/post a regression, or specify varlist for non-regression outputs'.

I ran the normal regression first, y = a +bx +c , and ran your code. However, it only gave me the regression results for the whole sample. I am new to STATA. Could you please advise ? Thanks.
Tristan Sun

Join Date: May 2018

Posts: 5
#5

08 May 2018, 16:48

Andrew Musau Hi Andrew, Thanks. The code works fine and 'results.xls' was generated. However, there is nothing in the xls file. I think there are some issues with my current STATA version. Six months ago I could open the xls results exported from STATA, but now I cannot open the xls file. STATA only gives me a warning message.
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#6

09 May 2018, 03:25

Can you post the commands and errors that you get?

'matrix e(b) not found; run/post a regression, or specify varlist for non-regression outputs'.

This implies that your regressions did not run. For diagnostic purposes, it may also be useful to include the following lines

Code:

qui levelsof ID if ID > 0 & ID < 121 , local(year)
local n: word count `year'
di `n'
matrix R = J(`n',2,.)
local i=1
foreach v of local year{
qui reg y x if ID == `v',r
matrix R[`i',1]= _b[x]
matrix R[`i',2]= round(_b[x]/_se[x], 0.01)
local ++i
}
mat list R
putexcel set "results.xls", sheet("Coefficients and t-statistics")
putexcel A1=("Coefficient")
putexcel B1=("t-statistic")
putexcel A2=matrix(R)

So if matrix R has all elements, then the issue arises when you are exporting output to Excel. Otherwise, there may be some problems with your implementation of the regressions.

Tristan Sun

Join Date: May 2018
Posts: 5

09 May 2018, 07:53

Andrew Musau Thank you very much. The code works perfectly !

Can I ask you how to extract R square and adjusted R square from each regression ?
I used the following

Code:

qui levelsof ID if ID > 0 & ID < 121 , local(year)
local n: word count `year'
di `n'
matrix R = J(`n',4,.)
local i=1
foreach v of local year{
qui reg y x if ID == `v',r
matrix R[`i',1]= _b[x]
matrix R[`i',2]= round(_b[x]/_se[x], 0.01)
matrix R[`i',3]= e(r2)
matrix R[`i',4]= e(r2_a)
local ++i
}

mat list R

However, the system says "r2 not found"

Thank You Very much.


Andrew Musau

Join Date: Oct 2014

Posts: 10195
#8

09 May 2018, 08:14

Your syntax looks fine to me. Try running

Code:

foreach v of local year{
reg y x if ID == `v',r
di e(r2)
di e(r2_a)
}

and see if you can spot the problem, i.e., if you still get the error.
Tristan Sun

Join Date: May 2018

Posts: 5
#9

10 May 2018, 04:28

Andrew Musau Thank you very much Andrew.

Robert Picard

Join Date: Mar 2014
Posts: 1536

#10

10 May 2018, 09:29

You can also use runby (from SSC) to run any number of commands on data subsets defined using by-groups. For each distinct value of ID, runby will run the myreg program defined below. Before running myreg, runby replaces the data in memory with the subset of observations defined using the current distinct value of ID. This is more efficient as there is no need to use the if qualifier to restrict commands to a particular subset and makes for simpler code. With runby, what's left in memory accumulates and when runby finishes processing all by-groups, the data in memory is replaced with the accumulated results.

Code:

* create a demonstration dataset with 120 stocks, each with 36 months of data
clear all
set seed 321
set obs 120
gen ID = _n
expand 36
bysort ID: gen time = _n
gen y = runiform()
gen x = runiform()

* do a single case that we'll spot check later on
reg y x if ID == 21, robust

program myreg
    capture noisily reg y x, robust
    keep in 1
    keep ID
    gen nobs = e(N)
    gen b_x  = _b[x]
    gen t_x  = _b[x] / _se[x]
    gen r2   = e(r2)
    gen r2a  = e(r2_a)
end
runby myreg, by(ID)

* spot check one case
list if ID == 21

The myreg program runs the desired regression and then reduces the data to a single observation with a single variable (ID). The program then creates the desired variables using the estimation results. Here's the output:

Code:

. * do a single case that we'll spot check later on
. reg y x if ID == 21, robust

Linear regression                               Number of obs     =         36
                                                F(1, 34)          =       0.08
                                                Prob > F          =     0.7835
                                                R-squared         =     0.0020
                                                Root MSE          =     .30277

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0509101   .1838154    -0.28   0.783    -.4244679    .3226478
       _cons |   .5201195    .095969     5.42   0.000     .3250871     .715152
------------------------------------------------------------------------------

. 
. program myreg
  1.         capture noisily reg y x, robust
  2.         keep in 1
  3.         keep ID
  4.         gen nobs = e(N)
  5.         gen b_x  = _b[x]
  6.         gen t_x  = _b[x] / _se[x]
  7.         gen r2   = e(r2)
  8.         gen r2a  = e(r2_a)
  9. end

. runby myreg, by(ID)

--------------------------------------
Number of by-groups    =           120
by-groups with errors  =             0
by-groups with no data =             0
Observations processed =         4,320
Observations saved     =           120
--------------------------------------

. 
. * spot check one case
. list if ID == 21

     +----------------------------------------------------------+
     | ID   nobs         b_x         t_x         r2         r2a |
     |----------------------------------------------------------|
 21. | 21     36   -.0509101   -.2769631   .0020239   -.0273283 |
     +----------------------------------------------------------+
