  • Regression output and t-tests

    Dear Stata Community

    I am performing a regression analysis with multiple factors (see the regression output below) and stumbled upon one question. The regression output provides, among others, the estimates of the coefficients, their standard errors, and the t-statistics. After analysing the data I noticed that the standard error of a coefficient corresponds to the square root of the variance of the respective coefficient, i.e. basically the standard deviation. However, many statistical books and papers define the one-sample t-test as the estimate of the coefficient divided by the standard error, with the standard error being equal to the standard deviation divided by the square root of the number of observations.

    More precisely, I regress a portfolio excess return on the Fama-French three factors using WLS and want to assess the statistical significance of the constant/intercept. The intercept represents the mean of 213 individual regressions, and thus its statistical significance should be assessed on the basis of the standard error, not the standard deviation. Hence, am I using the wrong regression command, or is there a logical (and statistical) error in how I calculate the t-test?

    Thank you and kind regards

    Code:
     regress ewportexc_t mktrf smb hml [aweight=number]
    (sum of wgt is   4.2480e+04)
    
          Source |       SS       df       MS              Number of obs =     214
    -------------+------------------------------           F(  3,   210) =  190.66
           Model |  .688963173     3  .229654391           Prob > F      =  0.0000
        Residual |  .252948617   210  .001204517           R-squared     =  0.7315
    -------------+------------------------------           Adj R-squared =  0.7276
           Total |   .94191179   213  .004422121           Root MSE      =  .03471
    
    ------------------------------------------------------------------------------
     ewportexc_t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           mktrf |   1.153562   .0507859    22.71   0.000     1.053446    1.253677
             smb |   .2523661   .0632534     3.99   0.000     .1276731    .3770591
             hml |    .436299    .067152     6.50   0.000     .3039206    .5686773
           _cons |   .0003275   .0024164     0.14   0.892    -.0044361     .005091
    ------------------------------------------------------------------------------

  • #2
    After analysing the data I noticed that the standard error of a coefficient corresponds to the square root of the variance of the respective coefficient, i.e. basically the standard deviation. However, many statistical books and papers define the one-sample t-test as the estimate of the coefficient divided by the standard error, with the standard error being equal to the standard deviation divided by the square root of the number of observations.
    This is formulated as though you see a contradiction here. I do not think there is one. Note that the square root of the coefficient does not equal the standard deviation of the respective predictor.

    Best
    Daniel



    • #3
      However, calculating the t-test on my own, using the estimate of _cons (i.e. .0003275) as the numerator and the square root of the variance of the coefficient (here, of _cons) divided by the square root of 213 as the denominator, yields a different t-statistic than the one provided in the output. So there seems to be a contradiction, or I am on the wrong path... As written in my initial post, I do not address the square root of the coefficient but the square root of the variance of the coefficient; thus, I cannot follow what you are trying to say in your post.

      Thanks
      Andreas



      • #4
        Andreas, it might help if you showed your math. But I have a feeling you are trying to divide by the square root of 213 when that was already done when calculating the standard error of the coefficient.
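        As background (a standard OLS result, stated here for reference), the standard error of a coefficient is the square root of the corresponding diagonal element of the estimated covariance matrix, and in an intercept-only model this already reduces to the familiar sd/sqrt(n):

        Code:
        % SE of the j-th OLS coefficient
        \widehat{SE}(\hat\beta_j) = \sqrt{ s^2 \, [(X'X)^{-1}]_{jj} },
        \qquad s^2 = \frac{\hat\varepsilon'\hat\varepsilon}{n-k}
        % intercept-only model: X is a column of ones, so (X'X)^{-1} = 1/n
        \widehat{SE}(\hat\beta_0) = \sqrt{s^2/n} = s/\sqrt{n}

        So the sqrt(n) factor is already inside the reported standard error; dividing by sqrt(n) again would double-count it.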
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          I will try to provide all my calculations below:

          Code:
          . regress ewportexc_t mktrf smb hml [aweight=number]
          (sum of wgt is   4.2480e+04)
          
                Source |       SS       df       MS              Number of obs =     214
          -------------+------------------------------           F(  3,   210) =  190.66
                 Model |  .688963173     3  .229654391           Prob > F      =  0.0000
              Residual |  .252948617   210  .001204517           R-squared     =  0.7315
          -------------+------------------------------           Adj R-squared =  0.7276
                 Total |   .94191179   213  .004422121           Root MSE      =  .03471
          
          ------------------------------------------------------------------------------
           ewportexc_t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 mktrf |   1.153562   .0507859    22.71   0.000     1.053446    1.253677
                   smb |   .2523661   .0632534     3.99   0.000     .1276731    .3770591
                   hml |    .436299    .067152     6.50   0.000     .3039206    .5686773
                 _cons |   .0003275   .0024164     0.14   0.892    -.0044361     .005091
          ------------------------------------------------------------------------------
          
          matrix b_ew=e(b)
          matrix var_ew=e(V)
          scalar alpha_ew=b_ew[1,4]
          scalar var_alpha_ew=var_ew[4,4]
          scalar sd_alpha_ew=sqrt(var_alpha_ew)  // scalar, not gen: this is a single number
          display sd_alpha_ew
          .00241642
          As can be seen, the square root of the variance of the constant equals the standard error provided in the output, i.e. it corresponds to the standard deviation. This gives the t-statistic of 0.14 shown in the output.

          Code:
          scalar sqrtn_ew=sqrt(e(N))  // square root of the number of observations, i.e. sqrt(214)
          scalar ttest_ew=alpha_ew/(sd_alpha_ew/sqrtn_ew)
          display ttest_ew
          1.9825038
          The standard error is affected by sample size: with increasing sample size the SE tends towards zero, thereby improving the accuracy of the estimate of the population mean. As can be seen, this t-statistic differs from the one provided in the output.
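          For reference, the t-statistic in the regression table is simply the coefficient divided by its reported standard error, with no further division by the square root of N; a minimal check, reusing the scalars defined above:

          Code:
          scalar t_ew = alpha_ew/sd_alpha_ew   // .0003275/.0024164
          display t_ew                         // about 0.14, as in the regression output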



          • #6
            The standard error already reflects the sample size. I don't understand why you think further adjustments are necessary.
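            A quick way to see this (a minimal sketch using the auto toy dataset): -mean- reports the standard error of the mean directly, and it coincides with the _cons standard error from an intercept-only -regress-, because the division by sqrt(N) is already built into both.

            Code:
            sysuse auto, clear
            mean mpg             // Std. Err. column is sd/sqrt(N)
            regress mpg          // Std. Err. of _cons is the same value
            display _se[_cons]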



            • #7
              I do not address the square root of the coefficient but the square root of the variance of the coefficient, thus, I cannot follow what you are trying to say in your post.
              I admit my post was quite confusing to read, sorry. I was trying to point out that you are probably mixing up coefficients and predictors. In a t-test, you do not divide the standard deviation of the coefficient (i.e. the mean difference) by the square root of the number of observations to obtain its standard error (as you seem to have in mind for the regression coefficient). Instead, you divide the standard deviation of the predictor by the square root of the number of observations.

              To restate in other words what Richard has already clarified much better: I was trying to point out that the standard deviation (i.e. the square root of the variance) of a coefficient is what we call its standard error. Thus, while your statement

              the square root of the variance of respective coefficient, i.e. basically the standard deviation
              is true, the standard deviation of the coefficient already accounts for the sample size.

              You can verify that the t-test can be expressed as a linear regression model.
              Code:
              sysuse auto , clear
              
              ttest mpg = 0
              
              reg mpg
              Note that the coefficient, standard error, t-statistic, etc. are exactly the same in both cases.

              Edit.

              In case my point is still unclear, note that

              Code:
              summarize mpg
              di sqrt(r(Var))/(sqrt(r(N)))
              i.e. dividing the standard deviation of mpg (not the Mean or _cons) by the square root of the number of observations reproduces what is termed the standard error in both outputs.

              Best
              Daniel
              Last edited by daniel klein; 06 Jul 2015, 10:36.

