Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Linear regressions

    Hello,

    My objective is to create 3 new variables that return the coefficients from the following regression: reti,t= ai+B1i RMRFt+B2iSMBt+B3iHMLt+ei,t. So the 3 variables should return B1, B2 and B3.
    This regression should be done by permno (indicative of each stock) in every month t over the period t-36 to t-1. The regression should not return anything if there is any missing value in the range t-36 to t-1.

    I believe this is basically a simple regression over a 36-month rolling window. But I do not know how to do it.. Hope you can help me.



    input double permno long date double(ret MktRF SMB HML RF)
    10000 312 . .65 1.22 .56 .56
    10000 313 -.2571428716182709 7.13 -.65 -.93 .53
    10000 314 .36538460850715637 4.88 -.51 -.47 .6
    10000 315 -.09859155118465424 -1.31 2.84 -2.91 .52
    10000 316 -.22265625 4.62 -1.31 -.11 .49
    10000 317 -.005025125574320555 1.03 -.9 1.39 .52
    10000 318 -.08080808073282242 -6.45 -3.38 4.77 .52
    10000 319 -.6153846383094788 6.07 -4.18 3.51 .46
    10000 320 -.05714285746216774 -8.6 2.28 3.19 .45
    10000 321 -.24242424964904785 4.66 -2.49 -1.34 .46
    10000 322 .05999999865889549 1.17 -1.92 -.05 .39
    10000 323 -.37735849618911743 -3.27 .08 .33 .49
    10000 324 -.21212121844291687 12.47 -1.8 -3.17 .42
    10000 325 0 4.39 3.49 -5.98 .43
    10000 326 -.38461539149284363 1.64 .37 1.64 .47
    10000 327 -.0625 -2.11 -1.7 -.32 .44
    10000 328 -.06666667014360428 .11 -.52 .12 .38
    10000 329 . 3.94 -2.19 1.06 .48
    10001 312 . .65 1.22 .56 .56
    10001 313 .020408162847161293 7.13 -.65 -.93 .53
    10001 314 .025200003758072853 4.88 -.51 -.47 .6
    10001 315 .009900989942252636 -1.31 2.84 -2.91 .52
    10001 316 -.009803921915590763 4.62 -1.31 -.11 .49
    10001 317 -.013069307431578636 1.03 -.9 1.39 .52
    10001 318 -.010204081423580647 -6.45 -3.38 4.77 .52
    10001 319 .07216494530439377 6.07 -4.18 3.51 .46
    10001 320 -.003076923545449972 -8.6 2.28 3.19 .45
    10001 321 .03921568766236305 4.66 -2.49 -1.34 .46
    10001 322 .056603774428367615 1.17 -1.92 -.05 .39
    10001 323 .014999999664723873 -3.27 .08 .33 .49
    10001 324 -.0357142873108387 12.47 -1.8 -3.17 .42
    10001 325 -.07407407462596893 4.39 3.49 -5.98 .43
    10001 326 .03680000081658363 1.64 .37 1.64 .47
    10001 327 -.03921568766236305 -2.11 -1.7 -.32 .44
    10001 328 -.0714285746216774 .11 -.52 .12 .38
    10001 329 .051428571343421936 3.94 -2.19 1.06 .48
    10001 330 .021276595070958138 3.85 -.63 .71 .46
    10001 331 .0833333358168602 3.52 -.75 -.93 .47
    10001 332 -.02230769209563732 -2.59 .53 .28 .45
    10001 333 .019999999552965164 -23.24 -8.42 4.23 .6
    10001 334 -.029411764815449715 -7.77 2.76 3.08 .35
    10001 335 -.03353535383939743 6.81 .13 -4.45 .39
    10001 336 .06382978707551956 4.21 -.7 5.17 .29
    10001 337 .07999999821186066 4.75 3.35 -1.65 .46
    10001 338 -.07629629969596863 -2.27 6.15 .75 .44
    10001 339 .030612245202064514 .56 .96 1.68 .46
    10001 340 .019801979884505272 -.29 -2.65 2.28 .51
    10001 341 -.01203883532434702 4.79 2.12 -1.11 .49
    10001 342 .029999999329447746 -1.25 -.21 2.27 .51
    10001 343 .029126213863492012 -3.31 .08 2.08 .59
    10001 344 -.021132076159119606 3.3 -1.26 -.69 .62


  • #2
    Luis:
    the following thread may be usefuil: http://www.statalist.org/forums/foru...c-coefficients
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    Comment


    • #3
      Thanks for your answer but is not exactly what I am looking for. I need a rolling window based on the prior 36 months.

      Comment


      • #4
        Hi Luis,

        To the best of my knowledge, there is no easy way to perform a running linear model on t-x observations. If there is, I should be very interested in learning about it myself. However, I’ve put together a loop that will compute what you have in mind. You can set the local period to 36 if this is your desired running period. Since your example data do not include more than, at max, 33 months of a single stock id, I made the parameter variable for the sake of code testing.
        Code:
        * generating variables
        capture drop B*
        gen B1 = .
        gen B2 = .
        gen B3 = .
        
        * defining constants
        local period = 10 // set number of months in to include in running linear model
        local counter = 1
        levelsof permno, local(stock_group)
        
            foreach stock in `stock_group' { // looping over permno groups
                qui count if permno == `stock'
                local obs = r(N)
                    forvalues i = 1/`obs' { // looping over observations within group
                        if `i' > `period' {
                            local ub = `counter' - 1
                            local lb = `counter' - `period'
                            qui reg ret MktRF SMB HML if permno == `stock' in `lb'/`ub'
                            mat B = e(b)
                                forvalues j = 1/3 { // looping over the three coefficients
                                    local coef = B[1,`j']
                                    qui replace B`j' = `coef' in `counter'
                                    }
                            }
                        local `counter++1'
                        }
                }
        The result is the following three variables:
        Code:
        . list permno date B*
        
             +---------------------------------------------------+
             | permno   date          B1          B2          B3 |
             |---------------------------------------------------|
          1. |  10000    312           .           .           . |
          2. |  10000    313           .           .           . |
          3. |  10000    314           .           .           . |
          4. |  10000    315           .           .           . |
          5. |  10000    316           .           .           . |
             |---------------------------------------------------|
          6. |  10000    317           .           .           . |
          7. |  10000    318           .           .           . |
          8. |  10000    319           .           .           . |
          9. |  10000    320           .           .           . |
         10. |  10000    321           .           .           . |
             |---------------------------------------------------|
         11. |  10000    322   -.0143497    .0233847    -.025888 |
         12. |  10000    323   -.0214537    .0044178   -.0437205 |
         13. |  10000    324   -.0034166    .0272796   -.0178577 |
         14. |  10000    325   -.0253416   -.0158852   -.0457147 |
         15. |  10000    326    -.026923   -.0116368   -.0513066 |
             |---------------------------------------------------|
         16. |  10000    327   -.0326573   -.0257506   -.0642102 |
         17. |  10000    328   -.0341195   -.0290519   -.0694975 |
         18. |  10000    329   -.0295798   -.0173819   -.0685643 |
         19. |  10001    312           .           .           . |
         20. |  10001    313           .           .           . |
             |---------------------------------------------------|
         21. |  10001    314           .           .           . |
         22. |  10001    315           .           .           . |
         23. |  10001    316           .           .           . |
         24. |  10001    317           .           .           . |
         25. |  10001    318           .           .           . |
             |---------------------------------------------------|
         26. |  10001    319           .           .           . |
         27. |  10001    320           .           .           . |
         28. |  10001    321           .           .           . |
         29. |  10001    322    .0033208   -.0006798    .0025692 |
         30. |  10001    323    .0017064   -.0049901   -.0014833 |
             |---------------------------------------------------|
         31. |  10001    324    .0019333   -.0044863   -.0014693 |
         32. |  10001    325   -.0013644   -.0073757   -.0015889 |
         33. |  10001    326     -.00073   -.0107427    .0024265 |
         34. |  10001    327    .0015319   -.0054373    .0073364 |
         35. |  10001    328    .0036817   -.0009017    .0119079 |
             |---------------------------------------------------|
         36. |  10001    329    .0019802   -.0056175     .011678 |
         37. |  10001    330    .0026174   -.0057151    .0131562 |
         38. |  10001    331    .0025089   -.0043781    .0141631 |
         39. |  10001    332    .0033047   -.0001245    .0167092 |
         40. |  10001    333    .0051395    .0050686    .0194405 |
             |---------------------------------------------------|
         41. |  10001    334    .0030841   -.0027612    .0148719 |
         42. |  10001    335    .0051358    -.010108    .0108124 |
         43. |  10001    336    .0060146   -.0111458    .0152604 |
         44. |  10001    337    .0048016   -.0106799    .0107507 |
         45. |  10001    338    .0036418    -.005132    .0067632 |
             |---------------------------------------------------|
         46. |  10001    339    .0045457   -.0091896    .0056927 |
         47. |  10001    340    .0047879   -.0094996     .006128 |
         48. |  10001    341    .0046542   -.0089206    .0054986 |
         49. |  10001    342    .0039554   -.0073475    .0071754 |
         50. |  10001    343    .0039042    -.007342     .007049 |
             |---------------------------------------------------|
         51. |  10001    344    .0059015    -.004611     .009349 |
             +---------------------------------------------------+
        The observations from 1 to 10 (36 when you change the parameter) are missing since there doesn't exist at least 10 prior observations yet. However, starting from the 11th observation, the variable B1 holds the coefficient of MktRF from the model below and so on.
        Code:
        . reg ret MktRF SMB HML if permno == 10000 in 1/10
        
              Source |       SS           df       MS      Number of obs   =         9
        -------------+----------------------------------   F(3, 5)         =      0.50
               Model |  .124858357         3  .041619452   Prob > F        =    0.6995
            Residual |  .417655649         5   .08353113   R-squared       =    0.2301
        -------------+----------------------------------   Adj R-squared   =   -0.2318
               Total |  .542514005         8  .067814251   Root MSE        =    .28902
        
        ------------------------------------------------------------------------------
                 ret |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               MktRF |  -.0143497   .0338893    -0.42   0.690    -.1014648    .0727654
                 SMB |   .0233847    .080583     0.29   0.783    -.1837605    .2305299
                 HML |   -.025888   .0773589    -0.33   0.751    -.2247455    .1729694
               _cons |  -.0736962   .1114298    -0.66   0.538    -.3601355    .2127432
        ------------------------------------------------------------------------------
        Basically what I'm doing is, for each observation in your dataset - where you have at least 10 (36) prior observations within the same stock id - I estimate a separate model like the one above. I then return the coefficients and store them in the three variables B1, B2, and B3. No standrd errors are stored, but you can easily append the loop and get it to store these estimates as well.

        I am sorry for the somewhat complicated solution, but I couldn't come up with anything more simple. However, I hope this is what you had in mind.
        Last edited by Mathias Pedersen Heinze; 28 Sep 2016, 07:32. Reason: Updated result table

        Comment


        • #5
          You can use the rolling command to perform regressions over a rolling window of time. If you have a very large dataset, this may be slow, in which case you may want to use rangestat (from SSC). Here's a quick example:

          Code:
          clear
          input double permno long date double(ret MktRF SMB HML RF)
          10000 312 . .65 1.22 .56 .56
          10000 313 -.2571428716182709 7.13 -.65 -.93 .53
          10000 314 .36538460850715637 4.88 -.51 -.47 .6
          10000 315 -.09859155118465424 -1.31 2.84 -2.91 .52
          10000 316 -.22265625 4.62 -1.31 -.11 .49
          10000 317 -.005025125574320555 1.03 -.9 1.39 .52
          10000 318 -.08080808073282242 -6.45 -3.38 4.77 .52
          10000 319 -.6153846383094788 6.07 -4.18 3.51 .46
          10000 320 -.05714285746216774 -8.6 2.28 3.19 .45
          10000 321 -.24242424964904785 4.66 -2.49 -1.34 .46
          10000 322 .05999999865889549 1.17 -1.92 -.05 .39
          10000 323 -.37735849618911743 -3.27 .08 .33 .49
          10000 324 -.21212121844291687 12.47 -1.8 -3.17 .42
          10000 325 0 4.39 3.49 -5.98 .43
          10000 326 -.38461539149284363 1.64 .37 1.64 .47
          10000 327 -.0625 -2.11 -1.7 -.32 .44
          10000 328 -.06666667014360428 .11 -.52 .12 .38
          10000 329 . 3.94 -2.19 1.06 .48
          10001 312 . .65 1.22 .56 .56
          10001 313 .020408162847161293 7.13 -.65 -.93 .53
          10001 314 .025200003758072853 4.88 -.51 -.47 .6
          10001 315 .009900989942252636 -1.31 2.84 -2.91 .52
          10001 316 -.009803921915590763 4.62 -1.31 -.11 .49
          10001 317 -.013069307431578636 1.03 -.9 1.39 .52
          10001 318 -.010204081423580647 -6.45 -3.38 4.77 .52
          10001 319 .07216494530439377 6.07 -4.18 3.51 .46
          10001 320 -.003076923545449972 -8.6 2.28 3.19 .45
          10001 321 .03921568766236305 4.66 -2.49 -1.34 .46
          10001 322 .056603774428367615 1.17 -1.92 -.05 .39
          10001 323 .014999999664723873 -3.27 .08 .33 .49
          10001 324 -.0357142873108387 12.47 -1.8 -3.17 .42
          10001 325 -.07407407462596893 4.39 3.49 -5.98 .43
          10001 326 .03680000081658363 1.64 .37 1.64 .47
          10001 327 -.03921568766236305 -2.11 -1.7 -.32 .44
          10001 328 -.0714285746216774 .11 -.52 .12 .38
          10001 329 .051428571343421936 3.94 -2.19 1.06 .48
          10001 330 .021276595070958138 3.85 -.63 .71 .46
          10001 331 .0833333358168602 3.52 -.75 -.93 .47
          10001 332 -.02230769209563732 -2.59 .53 .28 .45
          10001 333 .019999999552965164 -23.24 -8.42 4.23 .6
          10001 334 -.029411764815449715 -7.77 2.76 3.08 .35
          10001 335 -.03353535383939743 6.81 .13 -4.45 .39
          10001 336 .06382978707551956 4.21 -.7 5.17 .29
          10001 337 .07999999821186066 4.75 3.35 -1.65 .46
          10001 338 -.07629629969596863 -2.27 6.15 .75 .44
          10001 339 .030612245202064514 .56 .96 1.68 .46
          10001 340 .019801979884505272 -.29 -2.65 2.28 .51
          10001 341 -.01203883532434702 4.79 2.12 -1.11 .49
          10001 342 .029999999329447746 -1.25 -.21 2.27 .51
          10001 343 .029126213863492012 -3.31 .08 2.08 .59
          10001 344 -.021132076159119606 3.3 -1.26 -.69 .62
          end
          save "main_statalist.dta", replace
          
          * ----------------------------- using rolling ----------------------------------
          
          * declare the data as panel and perform regressions on a 10 month window
          tsset permno date
          rolling _b, window(10) saving(betas_statalist, replace) reject(e(N) < 10):  ///
              regress ret MktRF SMB HML
          
          * match results to the correct observation (target is -10 to -1 month)
          use "betas_statalist.dta", clear
          gen date = end + 1
          merge 1:1 permno date using "main_statalist.dta", keep(match using) nogen
          sort permno date
          
          
          * ----------------------------- using rangestat (from SSC) ---------------------
          
          * define a linear regression in Mata using quadcross() - help mata cross(), example 2
          mata:
          mata clear
          mata set matastrict on
          real rowvector myreg(real matrix Xall)
          {
              real colvector y, b, Xy
              real matrix X, XX
          
              y = Xall[.,1]                // dependent var is first column of Xall
              X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
              X = X,J(rows(X),1,1)         // add a constant
              
              XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
              Xy = quadcross(X, y)
              b  = invsym(XX) * Xy
              
              return(rows(X), b')
          }
          end
          
          * regressions with a constant over a rolling window of 10 periods by permno
          rangestat (myreg) ret MktRF SMB HML, by(permno) interval(date -10 -1) casewise
          
          * the Mata function returns first the number of observations and then as many
          * variables as there are independent variables (plus the constant) for the betas
          rename (myreg1 myreg2 myreg3 myreg4 myreg5) (nobs B_MktRF B_SMB B_HML B_cons)
          
          * reject results if the number of obs is less than 10
          foreach v of varlist B_MktRF B_SMB B_HML B_cons {
              replace `v' = . if nobs < 10
          }

          Comment

          Working...
          X