  • Regression output and t-tests

    Dear Stata Community

    I am performing a regression analysis with multiple factors (see the regression output below) and stumbled upon one question. The regression output provides, among others, the estimates of the coefficients, their standard errors, and the t-statistics. After analysing the data I noticed that the standard error of a coefficient corresponds to the square root of the variance of the respective coefficient, i.e. basically the standard deviation. However, many statistical books and papers define the one-sample t-test as the estimate of the coefficient divided by the standard error, with the standard error being equal to the standard deviation divided by the square root of the number of observations.

    More precisely, I regress a portfolio excess return on the Fama-French three factors using WLS and want to assess the statistical significance of the constant/intercept. The intercept represents the mean of 213 individual regressions, and thus its statistical significance should be assessed on the basis of the standard error, not the standard deviation. Hence, am I using the wrong regression command, or is there a logical (and statistical) error in how I calculate the t-test?

    Thank you and kind regards

    Code:
     regress ewportexc_t mktrf smb hml [aweight=number]
    (sum of wgt is   4.2480e+04)
    
          Source |       SS       df       MS              Number of obs =     214
    -------------+------------------------------           F(  3,   210) =  190.66
           Model |  .688963173     3  .229654391           Prob > F      =  0.0000
        Residual |  .252948617   210  .001204517           R-squared     =  0.7315
    -------------+------------------------------           Adj R-squared =  0.7276
           Total |   .94191179   213  .004422121           Root MSE      =  .03471
    
    ------------------------------------------------------------------------------
     ewportexc_t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           mktrf |   1.153562   .0507859    22.71   0.000     1.053446    1.253677
             smb |   .2523661   .0632534     3.99   0.000     .1276731    .3770591
             hml |    .436299    .067152     6.50   0.000     .3039206    .5686773
           _cons |   .0003275   .0024164     0.14   0.892    -.0044361     .005091
    ------------------------------------------------------------------------------

  • #2
    After analysing the data I noticed that the standard error of a coefficient corresponds to the square root of the variance of the respective coefficient, i.e. basically the standard deviation. However, many statistical books and papers define the one-sample t-test as the estimate of the coefficient divided by the standard error, with the standard error being equal to the standard deviation divided by the square root of the number of observations.
    This is formulated as though you see a contradiction here. I do not think there is one. Note that the square root of the coefficient does not equal the standard deviation of the respective predictor.

    Best
    Daniel



    • #3
      However, calculating the t-test on my own, using the estimate of _cons (i.e. .0003275) as the numerator and the square root of the variance of the coefficient (here, of _cons) divided by the square root of 213 as the denominator, yields a different t-statistic than the one provided in the output. So there seems to be a contradiction, or I am on the wrong path... As written in my initial post, I do not address the square root of the coefficient but the square root of the variance of the coefficient; thus, I cannot follow what you are trying to say in your post.

      Thanks
      Andreas



      • #4
        Andreas, it might help if you showed your math. But I have a feeling you are trying to divide by the square root of 213 when that was already done when calculating the standard error of the coefficient.
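        As background (a standard OLS result, stated here for reference), the standard error of a coefficient is the square root of the corresponding diagonal element of the estimated covariance matrix, and in an intercept-only model this already reduces to the familiar sd/sqrt(n):

        Code:
        % SE of the j-th OLS coefficient
        \widehat{SE}(\hat\beta_j) = \sqrt{ s^2 \, [(X'X)^{-1}]_{jj} },
        \qquad s^2 = \frac{\hat\varepsilon'\hat\varepsilon}{n-k}
        % intercept-only model: X is a column of ones, so (X'X)^{-1} = 1/n
        \widehat{SE}(\hat\beta_0) = \sqrt{s^2/n} = s/\sqrt{n}

        So the sqrt(n) factor is already inside the reported standard error; dividing by sqrt(n) again would double-count it.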
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          I will try to provide all my calculations below:

          Code:
          . regress ewportexc_t mktrf smb hml [aweight=number]
          (sum of wgt is   4.2480e+04)
          
                Source |       SS       df       MS              Number of obs =     214
          -------------+------------------------------           F(  3,   210) =  190.66
                 Model |  .688963173     3  .229654391           Prob > F      =  0.0000
              Residual |  .252948617   210  .001204517           R-squared     =  0.7315
          -------------+------------------------------           Adj R-squared =  0.7276
                 Total |   .94191179   213  .004422121           Root MSE      =  .03471
          
          ------------------------------------------------------------------------------
           ewportexc_t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 mktrf |   1.153562   .0507859    22.71   0.000     1.053446    1.253677
                   smb |   .2523661   .0632534     3.99   0.000     .1276731    .3770591
                   hml |    .436299    .067152     6.50   0.000     .3039206    .5686773
                 _cons |   .0003275   .0024164     0.14   0.892    -.0044361     .005091
          ------------------------------------------------------------------------------
          
          matrix b_ew=e(b)
          matrix var_ew=e(V)
          scalar alpha_ew=b_ew[1,4]
          scalar var_alpha_ew=var_ew[4,4]
          scalar sd_alpha_ew=sqrt(var_alpha_ew)  // scalar, not gen: this is a single number
          display sd_alpha_ew
          .00241642
          As can be seen, the square root of the variance of the constant equals the standard error provided in the output, i.e. it corresponds to the standard deviation. This gives the t-statistic of 0.14 shown in the output.

          Code:
          scalar sqrtn_ew=sqrt(e(N))  // square root of the number of observations, i.e. sqrt(214)
          scalar ttest_ew=alpha_ew/(sd_alpha_ew/sqrtn_ew)
          display ttest_ew
          1.9825038
          The standard error is affected by sample size: with increasing sample size the SE tends towards zero, thereby improving the accuracy of the estimate of the population mean. As can be seen, this t-statistic differs from the one provided in the output.
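          For reference, the t-statistic in the regression table is simply the coefficient divided by its reported standard error, with no further division by the square root of N; a minimal check, reusing the scalars defined above:

          Code:
          scalar t_ew = alpha_ew/sd_alpha_ew   // .0003275/.0024164
          display t_ew                         // about 0.14, as in the regression output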



          • #6
            The standard error already reflects the sample size. I don't understand why you think further adjustments are necessary.
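            A quick way to see this (a minimal sketch using the auto toy dataset): -mean- reports the standard error of the mean directly, and it coincides with the _cons standard error from an intercept-only -regress-, because the division by sqrt(N) is already built into both.

            Code:
            sysuse auto, clear
            mean mpg             // Std. Err. column is sd/sqrt(N)
            regress mpg          // Std. Err. of _cons is the same value
            display _se[_cons]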



            • #7
              I do not address the square root of the coefficient but the square root of the variance of the coefficient, thus, I cannot follow what you are trying to say in your post.
              I admit my post was quite confusing to read, sorry. I was trying to point out that you are probably mixing up coefficients and predictors. In a t-test, you do not divide the standard deviation of the coefficient (i.e. the mean difference) by the square root of the number of observations to obtain its standard error (as you seem to have in mind for the regression coefficient). Instead, you divide the standard deviation of the predictor by the square root of the number of observations.

              To restate in other words what Richard has already clarified much better: I was trying to point out that the standard deviation (i.e. the square root of the variance) of a coefficient is what we call its standard error. Thus, while your statement

              the square root of the variance of respective coefficient, i.e. basically the standard deviation
              is true, the standard deviation of the coefficient already accounts for the sample size.

              You can verify that the t-test can be expressed as a linear regression model.
              Code:
              sysuse auto , clear
              
              ttest mpg = 0
              
              reg mpg
              Note that the coefficient, standard error, t-statistic, etc. are exactly the same in both cases.

              Edit.

              In case my point is still unclear, note that

              Code:
              summarize mpg
              di sqrt(r(Var))/(sqrt(r(N)))
              i.e. dividing the standard deviation of mpg (not the Mean or _cons) by the square root of the number of observations reproduces what is termed the standard error in both outputs.

              Best
              Daniel
              Last edited by daniel klein; 06 Jul 2015, 10:36.

