Why does my regression have extremely large t-statistics?

Jae Li

Join Date: May 2017
Posts: 184


19 Jan 2018, 12:16

Dear Statalists,

I have run an OLS regression. The estimated coefficient on the constant term is slightly higher than the correct value, but its t-statistic is far larger than the correct t-statistic (i.e., 5.99).

Here is part of the data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id ym exret_m1) double mktrf
 295 -377 .01937785 .0668
1481 -377 .01937785 .0668
2977 -377 .01937785 .0668
1156 -377 .01937785 .0668
4854 -377 .01937785 .0668
2188 -377 .01937785 .0668
2597 -377 .01937785 .0668
1749 -377 .01937785 .0668
 607 -377 .01937785 .0668
1156 -377 .01937785 .0668
 528 -377 .01937785 .0668
2889 -377 .01937785 .0668
2867 -377 .01937785 .0668
2877 -377 .01937785 .0668
2430 -377 .01937785 .0668
2692 -377 .01937785 .0668
1481 -377 .01937785 .0668
1879 -377 .01937785 .0668
2600 -377 .01937785 .0668
3058 -377 .01937785 .0668
2971 -377 .01937785 .0668
 528 -377 .01937785 .0668
1630 -377 .01937785 .0668
2289 -377 .01937785 .0668
2386 -377 .01937785 .0668
2740 -377 .01937785 .0668
2877 -377 .01937785 .0668
3058 -377 .01937785 .0668
1792 -377 .01937785 .0668
2975 -377 .01937785 .0668
1553 -377 .01937785 .0668
1156 -377 .01937785 .0668
2600 -377 .01937785 .0668
2600 -377 .01937785 .0668
2430 -377 .01937785 .0668
2650 -377 .01937785 .0668
2188 -377 .01937785 .0668
2889 -377 .01937785 .0668
2634 -377 .01937785 .0668
2837 -377 .01937785 .0668
2839 -377 .01937785 .0668
2597 -377 .01937785 .0668
2837 -377 .01937785 .0668
 528 -377 .01937785 .0668
2975 -377 .01937785 .0668
1156 -377 .01937785 .0668
1879 -377 .01937785 .0668
2914 -377 .01937785 .0668
2692 -377 .01937785 .0668
1792 -377 .01937785 .0668
2430 -377 .01937785 .0668
2364 -377 .01937785 .0668
2692 -377 .01937785 .0668
2634 -377 .01937785 .0668
1630 -377 .01937785 .0668
2364 -377 .01937785 .0668
2867 -377 .01937785 .0668
2947 -377 .01937785 .0668
2333 -377 .01937785 .0668
2817 -377 .01937785 .0668
2188 -377 .01937785 .0668
2386 -377 .01937785 .0668
2722 -377 .01937785 .0668
3045 -377 .01937785 .0668
2650 -377 .01937785 .0668
1553 -377 .01937785 .0668
2600 -377 .01937785 .0668
4854 -377 .01937785 .0668
1553 -377 .01937785 .0668
3058 -377 .01937785 .0668
2364 -377 .01937785 .0668
2740 -377 .01937785 .0668
1792 -377 .01937785 .0668
1461 -377 .01937785 .0668
3058 -377 .01937785 .0668
2430 -377 .01937785 .0668
2585 -377 .01937785 .0668
2839 -377 .01937785 .0668
2947 -377 .01937785 .0668
1792 -377 .01937785 .0668
2333 -377 .01937785 .0668
2235 -377 .01937785 .0668
2837 -377 .01937785 .0668
1097 -377 .01937785 .0668
2839 -377 .01937785 .0668
2289 -377 .01937785 .0668
1481 -377 .01937785 .0668
1097 -377 .01937785 .0668
1156 -377 .01937785 .0668
2364 -377 .01937785 .0668
2585 -377 .01937785 .0668
2971 -377 .01937785 .0668
2333 -377 .01937785 .0668
 528 -377 .01937785 .0668
2585 -377 .01937785 .0668
1792 -377 .01937785 .0668
 800 -377 .01937785 .0668
3043 -377 .01937785 .0668
1879 -377 .01937785 .0668
 800 -377 .01937785 .0668
end
format %tm ym

Listed 100 out of 69940837 observations
Use the count() option to list more

The following is the output:

Code:

reg exret mktrf  //excess return and market factor

      Source |       SS           df       MS      Number of obs   = 5,609,182
-------------+----------------------------------   F(1, 5609180)   >  99999.00
       Model |   9219.0744         1   9219.0744   Prob > F        =    0.0000
    Residual |  4592.82575 5,609,180  .000818805   R-squared       =    0.6675
-------------+----------------------------------   Adj R-squared   =    0.6675
       Total |  13811.9002 5,609,181  .002462374   Root MSE        =    .02861

------------------------------------------------------------------------------
    exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mktrf |   .8327924   .0002482  3355.47   0.000      .832306    .8332789
       _cons |   .0054813   .0000122   450.34   0.000     .0054575    .0055052
------------------------------------------------------------------------------

Can any Stata expert spot the issue? Is the dependent variable or the independent variable wrong? I hope to hear from you soon. Many thanks in advance!


Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

19 Jan 2018, 13:45

Since there is no variation in any of the variables you show us (other than id), a solution is not possible. I think you need to explain your data in much more detail (i.e., explain each variable and why there are so many duplicates).

Jae Li

Join Date: May 2017
Posts: 184

19 Jan 2018, 15:09

@Rich Goldstein Hi Rich, thank you for your reply! I made a mistake: I forgot to include one more important variable, P. Let me now explain each variable:

id is the company identifier.
ym is the calendar month (a Stata %tm year-month date).
P is the decile portfolio, numbered 1 to 10.
exret_m is the monthly portfolio excess return.
mktrf is the monthly market excess return.

Here is a better excerpt of my data for your review:


Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id ym P exret_m) double mktrf
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 348 1   .036809705                 .061
2 349 1 -.0011732077               -.0225
2 349 1 -.0011732077               -.0225
2 349 1 -.0011732077               -.0225
2 349 1 -.0011732077               -.0225
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 426 2    .04106256  .037200000000000004
2 428 2   .015792679                .0335
2 428 2   .015792679                .0335
2 428 2   .015792679                .0335
2 428 2   .015792679                .0335
2 428 2   .015792679                .0335
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 543 3    -.0401035 -.026099999999999998
3 562 3    .01261218                .0171
3 562 3    .01261218                .0171
3 562 3    .01261218                .0171
3 562 3    .01261218                .0171
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 564 4    .01944657  .013999999999999999
3 565 4  -.005436508               -.0196
3 565 4  -.005436508               -.0196
3 565 4  -.005436508               -.0196
3 565 4  -.005436508               -.0196
3 565 4  -.005436508               -.0196
end
format %tm ym


Listed 100 out of 275 observations
Use the count() option to list more
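One way to confirm the structure described above is to check that exret_m and mktrf never vary within a portfolio-month, and to count how many rows repeat each portfolio-month. The following is a minimal Stata sketch run on a few rows copied from the excerpt above; the helper variable copies is hypothetical, and on the full data set you would skip the clear/input lines.

```stata
* Minimal sketch: a few rows copied from the -dataex- excerpt above
clear
input float(id ym P exret_m) double mktrf
2 348 1   .036809705    .061
2 348 1   .036809705    .061
2 348 1   .036809705    .061
2 349 1 -.0011732077  -.0225
2 349 1 -.0011732077  -.0225
2 426 2    .04106256   .0372
2 428 2   .015792679   .0335
2 428 2   .015792679   .0335
end
format %tm ym

* 1) Do exret_m and mktrf vary only across portfolio-months?
*    -assert- stops with an error if any value differs within a (P, ym) cell.
bysort P ym (exret_m): assert exret_m == exret_m[1]
bysort P ym (mktrf): assert mktrf == mktrf[1]

* 2) How many rows carry each portfolio-month's values?
*    (copies is a hypothetical helper variable)
bysort P ym: generate long copies = _N
tabulate copies
```

If both assertions pass on the full data, every firm-level row is just another copy of its portfolio-month's values, so the regression effectively sees each portfolio-month many times over.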

When I run OLS using the following code:

bys P: reg exret_m mktrf

Each portfolio regression produces an extremely large t-statistic and an extremely small standard error for the estimated constant term (_b[_cons]), as in the following example:

Code:

-> P = 1

      Source |       SS           df       MS      Number of obs   =  11396617
-------------+----------------------------------   F(1, 11396615)  >  99999.00
       Model |  10825.1192         1  10825.1192   Prob > F        =    0.0000
    Residual |  8564.82914  11396615  .000751524   R-squared       =    0.5583
-------------+----------------------------------   Adj R-squared   =    0.5583
       Total |  19389.9483  11396616  .001701378   Root MSE        =    .02741

------------------------------------------------------------------------------
    exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mktrf |   .6507903   .0001715  3795.29   0.000     .6504542    .6511264
       _cons |   .0061281   8.19e-06   748.64   0.000     .0061121    .0061442
------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> P = 2

      Source |       SS           df       MS      Number of obs   = 5,609,182
-------------+----------------------------------   F(1, 5609180)   >  99999.00
       Model |   9219.0744         1   9219.0744   Prob > F        =    0.0000
    Residual |  4592.82575 5,609,180  .000818805   R-squared       =    0.6675
-------------+----------------------------------   Adj R-squared   =    0.6675
       Total |  13811.9002 5,609,181  .002462374   Root MSE        =    .02861

------------------------------------------------------------------------------
    exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mktrf |   .8327924   .0002482  3355.47   0.000      .832306    .8332789
       _cons |   .0054813   .0000122   450.34   0.000     .0054575    .0055052
------------------------------------------------------------------------------

This confuses me very much. Do you know why this happens? Thanks a lot!

Last edited by Jae Li; 19 Jan 2018, 15:14.


Michael Droste

Join Date: Sep 2017

Posts: 24
#4

19 Jan 2018, 15:43

Why do you think the 'correct' t-stat is 5.99? You have a ton of observations and you are running a univariate regression on variables that are highly related; it should not be surprising that the t-statistics are huge. Are you trying to replicate results using someone else's dataset? Are you sure that your specification is correct, i.e., that the model you're estimating is not supposed to include additional terms, fixed effects, etc.? Robust or cluster-robust standard errors will probably make the t-stat go down, but almost certainly not by three orders of magnitude. It is tough for us to say more without knowing more about what you're doing.

Last edited by Michael Droste; 19 Jan 2018, 15:46.
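How much sample size alone, here in the form of exact duplicates, drives the t-statistics can be illustrated with simulated data (the numbers below are made up, not the poster's). The sketch fits the same regression before and after replacing every observation with 100 identical copies via -expand-. The coefficients stay the same, but the t-statistic on the constant grows by a factor close to 10, the square root of the number of copies.

```stata
* Simulated data only (not the poster's data): duplicating every row
* changes the OLS standard errors but not the coefficients
clear
set seed 12345
set obs 200
generate double x = rnormal()
generate double y = 0.005 + 0.8*x + rnormal(0, 0.03)

* Regression on 200 distinct observations
regress y x
scalar b_base = _b[_cons]
scalar t_base = _b[_cons]/_se[_cons]

* Replace each observation with 100 identical copies, then refit
expand 100
regress y x
scalar b_dup = _b[_cons]
scalar t_dup = _b[_cons]/_se[_cons]

display "constant:          " b_base "  vs  " b_dup
display "t on the constant: " t_base "  vs  " t_dup
display "ratio of t-statistics: " t_dup/t_base "  (compare sqrt(100) = 10)"
```

The copies add no new information, yet -regress- treats them as 20,000 independent observations, which is why the reported precision is so inflated.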
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#5

19 Jan 2018, 15:45

Well, this is very strange-looking data. In your 100-observation example there are actually only 8 distinct observations; everything else is a duplicate of one of those. In a data set like that, it is unsurprising to see very small standard errors and very large t-statistics.
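A count of distinct observations like this can be reproduced with Stata's -duplicates- and -egen, tag()- commands. Below is a minimal sketch on a few rows copied from the poster's second -dataex- excerpt; the flag variable first is a hypothetical name.

```stata
* Minimal sketch: a few rows copied from the -dataex- excerpt above
clear
input float(id ym P exret_m) double mktrf
2 348 1   .036809705    .061
2 348 1   .036809705    .061
2 349 1 -.0011732077  -.0225
3 543 3    -.0401035  -.0261
3 543 3    -.0401035  -.0261
3 543 3    -.0401035  -.0261
end

* Tabulate how many surplus copies each distinct observation has
duplicates report

* Flag one representative of each distinct observation and count them
egen byte first = tag(id ym P exret_m mktrf)
count if first
display "distinct observations: " r(N) " of " _N
```

-duplicates drop- would delete the surplus copies, but it is worth finding out why they exist first, since such heavy duplication may point to a problem in how the data set was built.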

Moreover, if we run your -by P, sort: reg exret_m mktrf- command on your example data, each regression has only two distinct combinations of mktrf and exret_m, so every one of those regressions amounts to fitting a line through just two points. The fit is perfect, the standard errors are zero, and the t-statistics (if they could be calculated) would be infinite.

I suspect there is something wrong with this data set. It does not seem sensible to have large numbers of replicate observations like that. If that is, indeed, the real data, then your results, whether you like them or not, are what the data deliver.
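Why replicate observations shrink the standard errors follows from the usual OLS formulas. As a simplifying assumption, suppose the data consist of n distinct observations on p regressors (here p = 2: mktrf and the constant), and every observation is repeated exactly k times. In the real data the number of copies varies across portfolio-months, so this is only an idealization.

```latex
\begin{aligned}
\hat\beta &= (X'X)^{-1}X'y, \qquad s^2 = \frac{e'e}{n-p}, \qquad \widehat{\operatorname{Var}}(\hat\beta) = s^2\,(X'X)^{-1} \\[4pt]
X_k'X_k &= k\,X'X, \qquad X_k'y_k = k\,X'y \;\Rightarrow\; \hat\beta_k = \hat\beta \\[4pt]
e_k'e_k &= k\,e'e \;\Rightarrow\; s_k^2 = \frac{k\,e'e}{kn-p} \\[4pt]
\widehat{\operatorname{Var}}(\hat\beta_k) &= s_k^2\,(k\,X'X)^{-1} = \frac{e'e}{kn-p}\,(X'X)^{-1} \\[4pt]
\frac{t_k}{t} &= \frac{\operatorname{se}(\hat\beta)}{\operatorname{se}(\hat\beta_k)} = \sqrt{\frac{kn-p}{n-p}} \;\approx\; \sqrt{k}
\end{aligned}
```

The point estimates are therefore identical, while every t-statistic is inflated by roughly the square root of the number of copies, even though the copies carry no additional information.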

Jae Li

Join Date: May 2017
Posts: 184

20 Jan 2018, 07:58

@Michael Droste @Clyde Schechter Hi Michael and Clyde, thank you for your replies! Yes, I am replicating a paper, and its t-statistic for the second decile portfolio is 5.99. However, the paper doesn't explain how the t-statistics were computed; it just says that t-statistics are shown below the coefficient estimates, so I have no idea whether they used additional terms. Your posts gave me more ideas, so I tried adding more specifications, such as random effects, fixed effects, and robust or cluster-robust standard errors. All the trials basically give only two different results; here is the output:

Code:

 . xtreg exret_m mktrf if P ==1, re vce(cluster id)

Random-effects GLS regression                   Number of obs     = 11,396,617
Group variable: id                              Number of groups  =     10,397

R-sq:                                           Obs per group:
     within  = 0.5630                                         min =          1
     between = 0.6727                                         avg =    1,096.1
     overall = 0.5583                                         max =     16,901

                                                Wald chi2(1)      =  151446.11
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                (Std. Err. adjusted for 10,397 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    exret_m1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mktrf |   .6484091   .0016662   389.16   0.000     .6451435    .6516747
       _cons |    .005136   .0001138    45.15   0.000     .0049131     .005359
-------------+----------------------------------------------------------------
     sigma_u |  .01152767
     sigma_e |  .02683067
         rho |  .15582957   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Code:

. xtreg exret_m mktrf if P ==1, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs     = 11,396,617
Group variable: id                              Number of groups  =     10,397

R-sq:                                           Obs per group:
     within  = 0.5630                                         min =          1
     between = 0.6727                                         avg =    1,096.1
     overall = 0.5583                                         max =     16,901

                                                F(1,10396)        =  151179.05
corr(u_i, Xb)  = 0.0212                         Prob > F          =     0.0000

                                (Std. Err. adjusted for 10,397 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mktrf |   .6482425   .0016672   388.82   0.000     .6449745    .6515106
       _cons |   .0061434     .00001   612.85   0.000     .0061238    .0061631
-------------+----------------------------------------------------------------
     sigma_u |  .01258796
     sigma_e |  .02683067
         rho |  .18040445   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The data were downloaded from the CRSP database, and exret_m is the average portfolio excess return, i.e., the average monthly portfolio return minus the risk-free rate.

But these trials also give high t-statistics. Do you know of other specifications that might give better results? Many thanks for your advice!

Last edited by Jae Li; 20 Jan 2018, 08:08.
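If the paper's decile regressions are time-series regressions with one observation per portfolio-month (an assumption; the thread does not say how the paper set them up), one option is to collapse to that level before regressing. So that it runs on its own, the sketch first builds a small hypothetical data set (10 deciles, 120 months, each portfolio-month repeated 50 times, all values simulated); with the real data only the last two commands would be needed.

```stata
* Hypothetical data only: 120 months x 10 deciles, each cell repeated 50 times
clear
set seed 2018
set obs 120
generate int ym = ym(2000, 1) + _n - 1
format %tm ym
generate double mktrf = rnormal(0.006, 0.045)
expand 10
bysort ym: generate byte P = _n
generate double exret_m = 0.005 + (0.6 + 0.04*P)*mktrf + rnormal(0, 0.02)
expand 50        // mimic many firm-level rows sharing the same portfolio-month values

* Keep one observation per portfolio-month; (mean) changes nothing here
* because the values are identical within each (P, ym) cell
collapse (mean) exret_m mktrf, by(P ym)

* One time-series regression per decile, on 120 observations each
bysort P: regress exret_m mktrf
```

Whether the resulting t-statistics line up with the paper also depends on its choice of standard errors (plain OLS or a heteroskedasticity- or autocorrelation-robust variant), which the thread does not report.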
