
  • Rolling Regressions through MACRO

    Hi everyone,

    I want to run some macro regressions but I don't know anything about macros in Stata.

    I have a data set for which I want to estimate coefficients. I want Stata to estimate the coefficients each month from the previous 60 months of data, then roll forward by one month, i.e. drop the first month and add the 61st month to get the results for the next observation. I also want Stata to save the estimated coefficients as "const b1 b2 b3". I have many firms for which I want to do this, with 312 months of data for each firm. For each firm, the first results will appear at the 61st observation, because the first 60 months are used to estimate it. A loop can then roll forward and generate results for every observation from the 61st through the 312th.

    My dependent variable is return.
    The three independent variables are mkt_rf, hml and smb. I want estimated coefficients for these three variables plus the coefficient for the constant, based on 60 months of historical data and rolled forward as explained above.

    The firm ID is gvkey. For each firm ID (gvkey) I have months numbered from 1 to 312.

    My data looks like this.
    gvkey months return mkt_rf smb hml
    10026 1 0.015873 0.95 0.68 -0.42
    10026 2 0.375 0.02 0.79 0.88
    10026 3 0.056818 -0.09 0.72 -0.01
    10026 4 0.193548 0.18 -1.12 0.43
    10026 5 -0.04505 0.69 0.03 0.2
    10026 6 0.075472 -0.64 0.83 -0.03
    10026 7 0.052632 0.39 0.36 -0.43
    10026 8 -0.075 -0.15 0.45 -0.16
    10026 9 0 0.53 0.11 -0.04
    10026 10 -0.0991 0.09 0.75 -0.43


  • #2
    You can do this with the -rolling- command. Read -help rolling- for instructions. You will need to -xtset gvkey months- first, because -rolling- needs both the panel and the time variable declared. You don't need to know anything about macros to do this. -rolling- will create a new data set containing the regression parameters. If you want those in the original data, you can just -merge- the two data sets together.
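
    A minimal sketch of that approach, assuming the variable names from post #1. The file name rolling_betas is hypothetical, and the "end + 1" step is only one way to line each 60-month window up with the month that follows it:

    ```stata
    * Sketch only: variable names follow post #1; "rolling_betas" is a hypothetical file name
    xtset gvkey months
    rolling _b, window(60) saving(rolling_betas, replace): regress return mkt_rf smb hml

    * The saved file holds one row per firm and window, with the variables
    * gvkey, start, end, _b_mkt_rf, _b_smb, _b_hml and _b_cons.
    * Attach each window's estimates to the month right after the window:
    preserve
    use rolling_betas, clear
    gen months = end + 1
    save rolling_betas, replace
    restore
    merge 1:1 gvkey months using rolling_betas, keep(master match) nogenerate
    ```

    With this alignment the first 60 months of each firm have missing coefficients, which matches the request in post #1 that results start at the 61st observation.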



    • #3
      Thanks mate. It's working, but it's taking too much time. I think this is happening because of my data, which has more than 200,000 observations.



      • #4
        Indeed, rolling regressions can take a long time. Recently I ran one on around 8,000 observations under a Poisson model, and it struck me as quite time-consuming. In my case, with a sample far smaller than two hundred thousand observations, it was nothing that a nice pause for a cup of coffee could not make up for...
        Best regards,

        Marcos



        • #5
          rangestat (from SSC) is several orders of magnitude faster at performing regressions over a rolling window of time than any other solution in Stata. The only downside is that the Mata code to perform the regression must be supplied by the user. I have shown several times on this list how to do this when all that is needed is the regression coefficients. Here's an example that matches the description and size of the data in the initial post. With just over 200,000 regressions performed, the whole thing runs in less than 3 seconds on my computer.

          Code:
          * fake data for 650 firms, each with 312 monthly obs
          clear
          set seed 3424
          set obs 650
          gen gvkey = _n
          expand 312
          bysort gvkey: gen months = _n
          gen return = runiform()
          gen mkt_rf = runiform()
          gen smb    = runiform()
          gen hml    = runiform()
          
          * ------------ regressions over a rolling window using -rangestat- --------
          * define a linear regression in Mata using quadcross() - help mata cross(), example 2
          mata:
          mata clear
          mata set matastrict on
          real rowvector myreg(real matrix Xall)
          {
              real colvector y, b, Xy
              real matrix X, XX
          
              y = Xall[.,1]                // dependent var is first column of Xall
              X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
              X = X,J(rows(X),1,1)         // add a constant
              
              XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
              Xy = quadcross(X, y)
              b  = invsym(XX) * Xy
              
              return(rows(X), b')
          }
          end
          
          * -rangestat- is from SSC; to install, type: ssc install rangestat
          rangestat (myreg) return mkt_rf smb hml, by(gvkey) interval(months -60 -1) casewise
          rename myreg* (nobs b_mkt_rf b_smb b_hml b_cons)
          You can easily check the rangestat results for any observation by manually performing the regression. Here are two examples:

          Code:
          . * replicate results for observation 100
          . list gvkey months nobs b_mkt_rf b_smb b_hml b_cons in 100
          
               +-------------------------------------------------------------------------+
               | gvkey   months   nobs    b_mkt_rf        b_smb        b_hml      b_cons |
               |-------------------------------------------------------------------------|
          100. |     1      100     60   .08052623   -.04333148   -.24286608   .54064748 |
               +-------------------------------------------------------------------------+
          
          . regress return mkt_rf smb hml if gvkey == gvkey[100] & inrange(months, months[100]-60, months[100]-1)
          
                Source |       SS           df       MS      Number of obs   =        60
          -------------+----------------------------------   F(3, 56)        =      1.51
                 Model |   .35887484         3  .119624947   Prob > F        =    0.2230
              Residual |  4.44834553        56  .079434742   R-squared       =    0.0747
          -------------+----------------------------------   Adj R-squared   =    0.0251
                 Total |  4.80722037        59  .081478311   Root MSE        =    .28184
          
          ------------------------------------------------------------------------------
                return |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                mkt_rf |   .0805262   .1285504     0.63   0.534    -.1769912    .3380436
                   smb |  -.0433315   .1339048    -0.32   0.747     -.311575     .224912
                   hml |  -.2428661   .1197586    -2.03   0.047    -.4827713   -.0029608
                 _cons |   .5406475   .1183699     4.57   0.000     .3035241    .7777708
          ------------------------------------------------------------------------------
          
          . 
          . * replicate results for observation 1000
          . list gvkey months nobs b_mkt_rf b_smb b_hml b_cons in 1000
          
                +-----------------------------------------------------------------------+
                | gvkey   months   nobs   b_mkt_rf       b_smb        b_hml      b_cons |
                |-----------------------------------------------------------------------|
          1000. |     4       64     60   .1857548   .15517557   -.06921163   .34908404 |
                +-----------------------------------------------------------------------+
          
          . regress return mkt_rf smb hml if gvkey == gvkey[1000] & inrange(months, months[1000]-60, months[1000]-1)
          
                Source |       SS           df       MS      Number of obs   =        60
          -------------+----------------------------------   F(3, 56)        =      1.12
                 Model |  .268983879         3  .089661293   Prob > F        =    0.3474
              Residual |  4.46951787        56  .079812819   R-squared       =    0.0568
          -------------+----------------------------------   Adj R-squared   =    0.0062
                 Total |  4.73850175        59  .080313589   Root MSE        =    .28251
          
          ------------------------------------------------------------------------------
                return |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                mkt_rf |   .1857548   .1281928     1.45   0.153    -.0710462    .4425558
                   smb |   .1551756   .1414196     1.10   0.277     -.128122    .4384731
                   hml |  -.0692116   .1304275    -0.53   0.598    -.3304894    .1920661
                 _cons |    .349084   .1297137     2.69   0.009     .0892362    .6089319
          ------------------------------------------------------------------------------
          Note that the number of observations is returned in the nobs variable. You can use that to exclude results if the number of observations within the window is below a desired threshold.
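
          A minimal sketch of such a filter, assuming the variable names created by the rename above and an illustrative threshold of a full 60-month window (pick whatever minimum suits your analysis):

          ```stata
          * Sketch only: set coefficients to missing when the window has fewer than 60 obs
          * (60 is an assumed threshold, not something required by rangestat)
          foreach v of varlist b_mkt_rf b_smb b_hml b_cons {
              replace `v' = . if nobs < 60
          }
          ```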



          • #6
            Thanks for the code, Sir. However, I already got what I wanted. I will definitely try this code the next time I perform similar regressions.



            • #7
              Hello,

              Robert Picard, can you help me with the Mata code in #5? How should I put my variables in the matrix? For example, my dependent variable is "ret" and my 3 independent variables are "market", "size" and "value", plus a constant.

              Thanks in advance,



              • #8
                rangestat has been updated since then and now has a built-in regression function, so there is no need to define a Mata function anymore. There's an example of how to run a regression over a rolling window of time in the help file.
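
                As a rough sketch of what this can look like, using the variable names from post #1 and assuming a recent rangestat from SSC. The output variable names mentioned in the comments are my understanding of what the help file documents, so check -help rangestat- for your installed version:

                ```stata
                * Sketch only: built-in (reg) statistic over the previous 60 months, by firm
                rangestat (reg) return mkt_rf smb hml, by(gvkey) interval(months -60 -1)

                * Expected new variables (an assumption; confirm in the listing below):
                * reg_nobs, reg_r2, reg_adj_r2, plus b_* and se_* for each regressor
                * and for the constant (b_cons, se_cons)
                describe
                ```

                The first variable after (reg) is the dependent variable and the rest are the regressors, so no matrix has to be set up by hand.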



                • #9
                  Hello,

                  In continuation of #7, I copied and pasted the Mata code from #5 exactly:



                  Code:
                   
                  mata:
                  mata clear
                  mata set matastrict on
                  real rowvector myreg(real matrix Xall)
                  {
                      real colvector y, b, Xy
                      real matrix X, XX

                      y = Xall[.,1]                // dependent var is first column of Xall
                      X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                      X = X,J(rows(X),1,1)         // add a constant

                      XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                      Xy = quadcross(X, y)
                      b  = invsym(XX) * Xy

                      return(rows(X), b')
                  }
                  end
                  and then ran the rangestat command:
                  Code:
                  rangestat (myreg) ret mktr_f smb hml, by(permno) interval(months -60 -1) casewise
                  I got the results, but I am not sure whether they are correct. Please help.



                  • #10
                    In continuation of #8, please update rangestat and use the new version. To update, type in Stata's command window:
                    Code:
                    adoupdate rangestat, update
                    There is no need to mess around with Mata with the new version. Read the help file; it explains how you can confirm that the results are correct.
