  • Why are my t-statistics extremely large?

    Dear Statalists,

    I have run an OLS regression. The estimated coefficient on the constant term is slightly higher than the correct answer, but its t-statistic is far larger than the correct t-statistic (i.e., 5.99).

    Here is the partial data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id ym exret_m1) double mktrf
     295 -377 .01937785 .0668
    1481 -377 .01937785 .0668
    2977 -377 .01937785 .0668
    1156 -377 .01937785 .0668
    4854 -377 .01937785 .0668
    2188 -377 .01937785 .0668
    2597 -377 .01937785 .0668
    1749 -377 .01937785 .0668
     607 -377 .01937785 .0668
    1156 -377 .01937785 .0668
     528 -377 .01937785 .0668
    2889 -377 .01937785 .0668
    2867 -377 .01937785 .0668
    2877 -377 .01937785 .0668
    2430 -377 .01937785 .0668
    2692 -377 .01937785 .0668
    1481 -377 .01937785 .0668
    1879 -377 .01937785 .0668
    2600 -377 .01937785 .0668
    3058 -377 .01937785 .0668
    2971 -377 .01937785 .0668
     528 -377 .01937785 .0668
    1630 -377 .01937785 .0668
    2289 -377 .01937785 .0668
    2386 -377 .01937785 .0668
    2740 -377 .01937785 .0668
    2877 -377 .01937785 .0668
    3058 -377 .01937785 .0668
    1792 -377 .01937785 .0668
    2975 -377 .01937785 .0668
    1553 -377 .01937785 .0668
    1156 -377 .01937785 .0668
    2600 -377 .01937785 .0668
    2600 -377 .01937785 .0668
    2430 -377 .01937785 .0668
    2650 -377 .01937785 .0668
    2188 -377 .01937785 .0668
    2889 -377 .01937785 .0668
    2634 -377 .01937785 .0668
    2837 -377 .01937785 .0668
    2839 -377 .01937785 .0668
    2597 -377 .01937785 .0668
    2837 -377 .01937785 .0668
     528 -377 .01937785 .0668
    2975 -377 .01937785 .0668
    1156 -377 .01937785 .0668
    1879 -377 .01937785 .0668
    2914 -377 .01937785 .0668
    2692 -377 .01937785 .0668
    1792 -377 .01937785 .0668
    2430 -377 .01937785 .0668
    2364 -377 .01937785 .0668
    2692 -377 .01937785 .0668
    2634 -377 .01937785 .0668
    1630 -377 .01937785 .0668
    2364 -377 .01937785 .0668
    2867 -377 .01937785 .0668
    2947 -377 .01937785 .0668
    2333 -377 .01937785 .0668
    2817 -377 .01937785 .0668
    2188 -377 .01937785 .0668
    2386 -377 .01937785 .0668
    2722 -377 .01937785 .0668
    3045 -377 .01937785 .0668
    2650 -377 .01937785 .0668
    1553 -377 .01937785 .0668
    2600 -377 .01937785 .0668
    4854 -377 .01937785 .0668
    1553 -377 .01937785 .0668
    3058 -377 .01937785 .0668
    2364 -377 .01937785 .0668
    2740 -377 .01937785 .0668
    1792 -377 .01937785 .0668
    1461 -377 .01937785 .0668
    3058 -377 .01937785 .0668
    2430 -377 .01937785 .0668
    2585 -377 .01937785 .0668
    2839 -377 .01937785 .0668
    2947 -377 .01937785 .0668
    1792 -377 .01937785 .0668
    2333 -377 .01937785 .0668
    2235 -377 .01937785 .0668
    2837 -377 .01937785 .0668
    1097 -377 .01937785 .0668
    2839 -377 .01937785 .0668
    2289 -377 .01937785 .0668
    1481 -377 .01937785 .0668
    1097 -377 .01937785 .0668
    1156 -377 .01937785 .0668
    2364 -377 .01937785 .0668
    2585 -377 .01937785 .0668
    2971 -377 .01937785 .0668
    2333 -377 .01937785 .0668
     528 -377 .01937785 .0668
    2585 -377 .01937785 .0668
    1792 -377 .01937785 .0668
     800 -377 .01937785 .0668
    3043 -377 .01937785 .0668
    1879 -377 .01937785 .0668
     800 -377 .01937785 .0668
    end
    format %tm ym
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 69940837 observations
    Use the count() option to list more

    The following is the output:

    Code:
    reg exret mktrf  //excess return and market factor
    
          Source |       SS           df       MS      Number of obs   = 5,609,182
    -------------+----------------------------------   F(1, 5609180)   >  99999.00
           Model |   9219.0744         1   9219.0744   Prob > F        =    0.0000
        Residual |  4592.82575 5,609,180  .000818805   R-squared       =    0.6675
    -------------+----------------------------------   Adj R-squared   =    0.6675
           Total |  13811.9002 5,609,181  .002462374   Root MSE        =    .02861
    
    ------------------------------------------------------------------------------
        exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           mktrf |   .8327924   .0002482  3355.47   0.000      .832306    .8332789
           _cons |   .0054813   .0000122   450.34   0.000     .0054575    .0055052
    ------------------------------------------------------------------------------

    Is there any Stata professional who can spot the issue? Is the dependent variable or independent variable wrong? Hope to hear from you soon! Many thanks in advance!

  • #2
    Since there is no variation in any of the variables you show us (other than id), a solution is not possible. I think you need to explain your data in much more detail (i.e., explain each variable and why there are so many duplicates).
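
A minimal sketch of how the lack of variation can be checked, using the first five rows of the posted excerpt. The choice of `summarize` and `duplicates report` here is only one way to do this, and the five-row subset is an assumption made to keep the example short.

```stata
clear
input float(id ym exret_m1) double mktrf
 295 -377 .01937785 .0668
1481 -377 .01937785 .0668
2977 -377 .01937785 .0668
1156 -377 .01937785 .0668
4854 -377 .01937785 .0668
end
format %tm ym

* In this excerpt the minimum equals the maximum for both variables,
* so there is nothing for a regression line to fit
summarize exret_m1 mktrf

* Count rows that repeat the same month/return/market-factor combination
duplicates report ym exret_m1 mktrf
```

Running the same two commands on the full data set would show how many distinct month/return/factor combinations it really contains.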


    • #3
      @Rich Goldstein Hi Rich, thank you for your reply! I made a mistake: I forgot to include one more important variable, P. Let me explain each of the variables:

      id is the company number.
      ym is the month of the year.
      P is the decile portfolio, numbered 1 to 10.
      exret_m is the monthly portfolio excess return.
      mktrf is the monthly market excess return.

      Here is a better version of my partial data for your review:

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id ym P exret_m) double mktrf
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 348 1   .036809705                 .061
      2 349 1 -.0011732077               -.0225
      2 349 1 -.0011732077               -.0225
      2 349 1 -.0011732077               -.0225
      2 349 1 -.0011732077               -.0225
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 426 2    .04106256  .037200000000000004
      2 428 2   .015792679                .0335
      2 428 2   .015792679                .0335
      2 428 2   .015792679                .0335
      2 428 2   .015792679                .0335
      2 428 2   .015792679                .0335
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 543 3    -.0401035 -.026099999999999998
      3 562 3    .01261218                .0171
      3 562 3    .01261218                .0171
      3 562 3    .01261218                .0171
      3 562 3    .01261218                .0171
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 564 4    .01944657  .013999999999999999
      3 565 4  -.005436508               -.0196
      3 565 4  -.005436508               -.0196
      3 565 4  -.005436508               -.0196
      3 565 4  -.005436508               -.0196
      3 565 4  -.005436508               -.0196
      end
      format %tm ym
      ------------------ copy up to and including the previous line ------------------

      Listed 100 out of 275 observations
      Use the count() option to list more


      When I run OLS with the following command:

      bys P: reg exret_m mktrf

      each portfolio yields an extremely large t-statistic and an extremely small standard error for the estimated constant term (_cons), as in the following example:

      Code:
      -> P = 1
      
            Source |       SS           df       MS      Number of obs   =  11396617
      -------------+----------------------------------   F(1, 11396615)  >  99999.00
             Model |  10825.1192         1  10825.1192   Prob > F        =    0.0000
          Residual |  8564.82914  11396615  .000751524   R-squared       =    0.5583
      -------------+----------------------------------   Adj R-squared   =    0.5583
             Total |  19389.9483  11396616  .001701378   Root MSE        =    .02741
      
      ------------------------------------------------------------------------------
          exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             mktrf |   .6507903   .0001715  3795.29   0.000     .6504542    .6511264
             _cons |   .0061281   8.19e-06   748.64   0.000     .0061121    .0061442
      ------------------------------------------------------------------------------
      
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      -> P = 2
      
            Source |       SS           df       MS      Number of obs   = 5,609,182
      -------------+----------------------------------   F(1, 5609180)   >  99999.00
             Model |   9219.0744         1   9219.0744   Prob > F        =    0.0000
          Residual |  4592.82575 5,609,180  .000818805   R-squared       =    0.6675
      -------------+----------------------------------   Adj R-squared   =    0.6675
             Total |  13811.9002 5,609,181  .002462374   Root MSE        =    .02861
      
      ------------------------------------------------------------------------------
          exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             mktrf |   .8327924   .0002482  3355.47   0.000      .832306    .8332789
             _cons |   .0054813   .0000122   450.34   0.000     .0054575    .0055052
      ------------------------------------------------------------------------------

      This confuses me very much! Do you know why this happens? Thanks a lot!
      Last edited by Jae Li; 19 Jan 2018, 15:14.


      • #4
        Why do you think the 'correct' t-stat is 5.99? You have a ton of observations and you are running a univariate regression on variables that are highly related - it should not be surprising that the t statistics are huge. Are you trying to replicate results using someone else's dataset? Are you sure that your specification is correct, i.e. the model you're estimating is not supposed to include additional terms, fixed effects, etc? Robust or cluster-robust standard errors will probably make the t-stat go down, but almost certainly not by three orders of magnitude. It is tough for us to say more without knowing anything else about what you're doing.
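
As a rough illustration of the robust and cluster-robust options mentioned above, here is a sketch on simulated data. Everything in it is hypothetical: the panel dimensions, the variable name `exret`, the data-generating process, and in particular the choice to cluster by month, which the thread does not settle.

```stata
clear
set seed 2018

* Hypothetical panel: 24 months, one market-factor draw per month
set obs 24
generate ym = _n
generate double mktrf = rnormal(0.005, 0.04)

* 200 hypothetical firms per month
expand 200
bysort ym: generate id = _n
generate double exret = 0.005 + 0.8*mktrf + rnormal(0, 0.03)

regress exret mktrf                    // conventional standard errors
regress exret mktrf, vce(robust)       // heteroskedasticity-robust
regress exret mktrf, vce(cluster ym)   // clustered by month (an assumption)
```

The coefficients are identical across the three calls; only the standard errors, and therefore the t-statistics, can differ.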
        Last edited by Michael Droste; 19 Jan 2018, 15:46.


        • #5
          Well, this is very strange looking data. In your 100-observation example there are actually only 8 different observations. Everything else is a duplicate of one of those. In a data set like that, it is unsurprising to see very small standard errors and very large t-statistics.
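
To see why replicated rows shrink the standard errors, here is a small simulated sketch (the sample size of 50, the 100 copies, and the variable names `x` and `y` are arbitrary assumptions). Copying every observation k times leaves the coefficients unchanged but inflates the residual degrees of freedom, so for a simple regression on n distinct points the slope's standard error shrinks by the factor sqrt((k*n - 2)/(n - 2)).

```stata
clear
set seed 12345

* 50 distinct (hypothetical) observations
set obs 50
generate double x = rnormal()
generate double y = 0.005 + 0.8*x + rnormal(0, 0.5)

regress y x
scalar se_distinct = _se[x]

* Replace every observation with 100 identical copies
expand 100
regress y x
scalar se_dup = _se[x]

* The two numbers displayed below should agree
display se_distinct / se_dup
display sqrt((100*50 - 2)/(50 - 2))
```

The t-statistic grows by the same factor, which is roughly the square root of the number of copies.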

          Moreover, if we run your -by P, sort: reg exret_m mktrf- command on your example data, each regression sees only two distinct (mktrf, exret_m) pairs within its P, so every one of those regressions amounts to fitting a line to just two points. The fit is perfect, the standard errors are zero, and the t-statistics (if they could be calculated) would be infinite.
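
The two-point situation can be reproduced with the P = 1 rows of the posted example, which contain 21 copies of one month and 4 copies of another. This is only a sketch of that subset, rebuilt by hand with `expand`.

```stata
clear
input float(ym exret_m) double mktrf
348   .036809705    .061
349 -.0011732077  -.0225
end
format %tm ym

* Recreate the replication pattern in the posted P = 1 rows:
* 21 copies of month 348 and 4 copies of month 349
expand cond(ym == 348, 21, 4)

* A line through two distinct points fits them exactly
regress exret_m mktrf
```

With only two distinct points the residuals are zero up to rounding, so the reported standard errors carry no useful information.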

          I suspect there is something wrong with this data set. It does not seem sensible to have large numbers of replicate observations like that. If that is, indeed, the real data, then your results, whether you like them or not, are what the data delivers.
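
If the replicates turn out to be unintended, one possible cleanup is to keep a single row per portfolio-month before regressing. The sketch below uses simulated data, and it assumes that one observation per P and ym is what the analysis calls for; nothing in the thread confirms that. The replication counts and the data-generating process are made up.

```stata
clear
set seed 42

* Hypothetical data: 60 months x 10 decile portfolios
set obs 60
generate ym = _n + 347
format %tm ym
generate double mktrf = rnormal(0.005, 0.04)
expand 10
bysort ym: generate P = _n
generate double exret_m = 0.005 + 0.1*P*mktrf + rnormal(0, 0.02)

* Mimic the replicated rows: copy each portfolio-month 1 to 30 times
expand ceil(30*runiform())
count

* Keep one row per distinct portfolio-month and verify uniqueness
duplicates drop P ym exret_m mktrf, force
isid P ym

bysort P: regress exret_m mktrf
```

After the cleanup each regression has one observation per month, so the standard errors reflect the number of months rather than the number of replicated rows.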


          • #6
            @Michael Droste @Clyde Schechter Hi Michael and Clyde, thank you for your replies! Yes, I am replicating a paper, and their t-statistic for the second decile portfolio is 5.99. However, the paper doesn't explain how the t-statistics were computed; it only says that t-statistics are shown below the coefficient estimates, so I have no idea whether they used additional terms. Your posts gave me more ideas, so I tried further specifications, such as random effects, fixed effects, and robust or cluster-robust options. All of these trials give essentially only two different results. Here is the output:

            Code:
             . xtreg exret_m mktrf if P ==1, re vce(cluster id)
            
            Random-effects GLS regression                   Number of obs     = 11,396,617
            Group variable: id                              Number of groups  =     10,397
            
            R-sq:                                           Obs per group:
                 within  = 0.5630                                         min =          1
                 between = 0.6727                                         avg =    1,096.1
                 overall = 0.5583                                         max =     16,901
            
                                                            Wald chi2(1)      =  151446.11
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
            
                                            (Std. Err. adjusted for 10,397 clusters in id)
            ------------------------------------------------------------------------------
                         |               Robust
                exret_m1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   mktrf |   .6484091   .0016662   389.16   0.000     .6451435    .6516747
                   _cons |    .005136   .0001138    45.15   0.000     .0049131     .005359
            -------------+----------------------------------------------------------------
                 sigma_u |  .01152767
                 sigma_e |  .02683067
                     rho |  .15582957   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------

            Code:
            . xtreg exret_m mktrf if P ==1, fe vce(cluster id)
            
            Fixed-effects (within) regression               Number of obs     = 11,396,617
            Group variable: id                              Number of groups  =     10,397
            
            R-sq:                                           Obs per group:
                 within  = 0.5630                                         min =          1
                 between = 0.6727                                         avg =    1,096.1
                 overall = 0.5583                                         max =     16,901
            
                                                            F(1,10396)        =  151179.05
            corr(u_i, Xb)  = 0.0212                         Prob > F          =     0.0000
            
                                            (Std. Err. adjusted for 10,397 clusters in id)
            ------------------------------------------------------------------------------
                         |               Robust
                exret_m1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   mktrf |   .6482425   .0016672   388.82   0.000     .6449745    .6515106
                   _cons |   .0061434     .00001   612.85   0.000     .0061238    .0061631
            -------------+----------------------------------------------------------------
                 sigma_u |  .01258796
                 sigma_e |  .02683067
                     rho |  .18040445   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            The data were downloaded from the CRSP database, and exret_m is the average portfolio excess return, i.e., the average monthly portfolio return minus the risk-free rate.

            But the above trials also give high t-statistics. Do you know of other specifications that may give better results? Many thanks for your advice!
            Last edited by Jae Li; 20 Jan 2018, 08:08.
