  • Weighted Least Squares vs. Transformation of Data

    Dear Stata Community

    I am trying to run a weighted least squares (WLS) regression in Stata, and I want to cross-verify my results by using the aweight option as well as by transforming the data in my dataset. The purpose of the regression is a calendar-time portfolio approach as outlined by Mitchell and Stafford (2000) or Kothari and Warner (1997), among others.

    As weights, I want to use the square root of the number of firms available in each month, i.e., I want to estimate the following regression:
    HTML Code:
     regress y*sqrt(n)=b0+b1*x1*sqrt(n)+b2*x2*sqrt(n)+b3*x3*sqrt(n)
    According to the technical note in the -regress- help file, this is equivalent to using [aweight=n]: the regression is transformed as shown above, and the coefficients (though not the mean squared error, i.e. the variance of the residuals) should coincide. Unfortunately, they do not coincide when I run the regression as outlined above.

    For clarification, the code I run is shown below:

    Regression with transformed data:
    HTML Code:
     gen yt=y*sqrt(n)
    gen x1t=x1*sqrt(n)
    gen x2t=x2*sqrt(n)
    gen x3t=x3*sqrt(n)
    regress yt x1t x2t x3t
    
    Hence, for my data:
    
    gen a=ewportexc_t*sqrt(number)
    gen b=mktexcess*sqrt(number)
    gen c=smbl*sqrt(number)
    gen d=hml*sqrt(number)
    
     regress a b c d
    
          Source |       SS       df       MS              Number of obs =     214
    -------------+------------------------------           F(  3,   210) =  185.89
           Model |  135.306525     3  45.1021752           Prob > F      =  0.0000
        Residual |  50.9518619   210  .242627914           R-squared     =  0.7264
    -------------+------------------------------           Adj R-squared =  0.7225
           Total |  186.258387   213  .874452523           Root MSE      =  .49257
    
    ------------------------------------------------------------------------------
               a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               b |   1.136941   .0506906    22.43   0.000     1.037014    1.236869
               c |   .2326233   .0634894     3.66   0.000      .107465    .3577816
               d |   .4220965    .067242     6.28   0.000     .2895407    .5546522
           _cons |   .0082994   .0341087     0.24   0.808    -.0589398    .0755387
    ------------------------------------------------------------------------------
    Regression with raw data:
    HTML Code:
     regress y x1 x2 x3 [aweight=n]
    
    Hence, for my data:
    
    . regress ewportexc_t mktexcess smb hml [aweight=number]
    (sum of wgt is   4.2480e+04)
    
          Source |       SS       df       MS              Number of obs =     214
    -------------+------------------------------           F(  3,   210) =  186.81
           Model |  .685174046     3  .228391349           Prob > F      =  0.0000
        Residual |  .256737744   210  .001222561           R-squared     =  0.7274
    -------------+------------------------------           Adj R-squared =  0.7235
           Total |   .94191179   213  .004422121           Root MSE      =  .03497
    
    ------------------------------------------------------------------------------
     ewportexc_t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       mktexcess |   1.137592    .050611    22.48   0.000     1.037821    1.237363
             smb |   .2333695    .063853     3.65   0.000     .1074945    .3592444
             hml |   .4231413   .0675373     6.27   0.000     .2900034    .5562791
           _cons |   .0002512   .0024346     0.10   0.918    -.0045481    .0050505
    ------------------------------------------------------------------------------
    As you can see, the coefficients differ, especially the constant term, which is very important for my further analysis because it gives the monthly abnormal return that is not explained by smb, hml, or mktexcess. Does anyone have an idea why the coefficients do not coincide, or a suggestion on how to address the problem?

    Thank you in advance and kind regards
    Andreas Mueller

  • #2
    In the first regression you forgot to multiply the constant by the weight, and that regression should be run without a constant.
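
    The point above can be sketched in Stata as follows. This is a minimal illustration on simulated data, not the original poster's dataset: the variable names y, x1, x2, x3 and n follow the question, while the helper variable w, the seed, and the simulated coefficients are assumptions made up for the example.

    ```stata
    * Hypothetical simulated data (assumption: not the poster's dataset)
    clear
    set seed 12345
    set obs 214
    generate n  = ceil(50 + 300*runiform())   // stand-in for firms per month
    generate x1 = rnormal()
    generate x2 = rnormal()
    generate x3 = rnormal()
    generate y  = 0.001 + 1.1*x1 + 0.2*x2 + 0.4*x3 + rnormal()/sqrt(n)

    * Multiply every term by sqrt(n), including the constant
    generate w   = sqrt(n)                    // transformed constant: 1*sqrt(n)
    generate yt  = y*w
    generate x1t = x1*w
    generate x2t = x2*w
    generate x3t = x3*w

    * Transformed regression: w replaces the intercept, so suppress _cons
    regress yt x1t x2t x3t w, noconstant
    matrix b_trans = e(b)

    * Weighted regression on the raw data
    regress y x1 x2 x3 [aweight=n]
    matrix b_wls = e(b)

    * Relative difference between the two coefficient vectors
    display mreldif(b_trans, b_wls)
    ```

    If the sketch behaves as expected, the coefficient on w in the first regression matches _cons in the second, and the slopes and standard errors agree as well. Root MSE and the sums of squares still differ because aweights are rescaled to sum to the number of observations, and R-squared differs because Stata reports it differently for a model fitted with noconstant.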


    • #3
      Dear Joao,

      Thank you very much for the answer to my question. With hindsight, the solution is so obvious...

      Regards
      Andreas


      • #4
        Don't feel bad; everybody falls for that one at some point ;-)

        Glad I could help,

        Joao
