  • ASREG: Rolling window regressions and by groups regressions (rolling betas)

    Hi everyone
    Thanks to Kit Baum for uploading version 1 of asreg on SSC. The program can be downloaded by typing
    ssc install asreg
    A short description and examples of asreg are presented below:


    To estimate rolling window regressions in Stata, the conventional method is to use Stata's rolling command. However, that command is too slow, especially for larger data sets. As a speedy alternative, I have written asreg, which fits a model of depvar on indepvars using linear regression in a user-defined rolling window or by a grouping variable. asreg is an order of magnitude faster than estimating rolling window regressions through conventional methods such as Stata loops or Stata's official rolling command. asreg has the same speed efficiency as asrol (ssc des asrol). All calculations of the rolling windows, estimation of regression parameters, and writing of the results to Stata variables are done in the Mata language.
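For comparison, here is a minimal sketch of the conventional approach mentioned above, using official Stata's rolling prefix on the grunfeld data (which is already declared as panel data). The window length of 10 and the model are chosen only for illustration. With the clear option, the coefficient estimates replace the data in memory, so they would have to be merged back for further processing.

```stata
webuse grunfeld, clear
* official rolling prefix: one regression per 10-year window within each company
* (illustrative window length; the results replace the data in memory)
rolling _b, window(10) clear: regress invest mvalue kstock
describe
```

The resulting data set holds one row per company and window, with the window bounds in start and end.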

    Rolling window calculations require lots of looping over observations. The problem is compounded by different data structures such as unbalanced panel data, data with many duplicates, and data with many missing values. Some data sets have both time series gaps and many duplicate observations across groups. asreg does not use the same static code for all types of data structures. Instead, asreg intelligently identifies the data structure and matches one of its rolling window routines to the data characteristics. Therefore, the rolling window regressions are fast even in larger data sets.

    asreg writes all regression outputs to the data in memory as separate variables. This eliminates the need for writing results to a separate file and then merging those results back to the data for further processing. New variables from the regression results follow these naming conventions:

    observations: the variable containing the number of observations is named obs_N

    regression slopes: a prefix of _b_ is added to the name of each independent variable

    constant: the variable containing the constant of the regression is named _b_cons

    r-squared: r-squared and adj. r-squared are named _R2 and _AdjR2, respectively

    standard errors: a prefix of _se_ is added to the name of each independent variable

    residuals: the variable containing residuals is named _residuals

    fitted: the variable containing fitted values is named _fitted


    asreg has the following options.

    1. window:

    Option window(rangevar #) specifies the length of the rolling window. The window option accepts up to two arguments. If we have already declared our data as panel or time series data, asreg will automatically pick up the time variable. In such cases, option window can have one argument, that is, the length of the window, e.g., window(5). If our data are not declared as time series or panel data, then we have to specify the time variable as the first argument of the option window. For example, if our time variable is year and we want a rolling window of 24, then option window will look like:

       window(year 24)
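To make the idea of a window concrete, here is a small sketch that fits a single window by hand with official regress on the grunfeld data. It assumes that a window of 10 ending in 1944 covers the current year and the nine preceding years (1935 to 1944); that reading of the window definition is an assumption, not something stated in this post.

```stata
webuse grunfeld, clear
* Assumption: the 10-year window ending in 1944 covers 1935-1944 inclusive
regress invest mvalue kstock if company == 1 & inrange(year, 1935, 1944)
display "observations in this window: " e(N)
```

A rolling regression repeats this fit for every ending year and every company.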
    2. recursive:

    The option recursive specifies that a recursive window be used. In time series analysis, a recursive window refers to a window where the starting period is held fixed, the ending period advances, and the window size grows. asreg allows a recursive window either by invoking the option recursive or by setting the length of the window greater than or equal to the sample size per group. For example, if the sample size of our data set is 1000 observations per group, we can run a recursive analysis by setting the window length to 1000 or more.
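The growing window described above can be sketched with a plain loop over official regress. This is only an illustration on the grunfeld data for one company and three ending years, not the way asreg computes its results.

```stata
webuse grunfeld, clear
* recursive window for company 1: the start is fixed at the first year of data,
* the end advances by one year, so the estimation sample grows by one each time
forvalues y = 1940/1942 {
    quietly regress invest mvalue kstock if company == 1 & year <= `y'
    display "window ending `y': N = " e(N)
}
```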

    3. by:

    asreg is byable. Hence, it can be run on groups as specified by the option by(varlist) or the bysort varlist: prefix. An example of such a regression is the Fama and MacBeth (1973) second-stage regression, which is estimated cross-sectionally in each time period. Therefore, the grouping variable in this case would be the time variable. Assume that our dependent variable is named stock_returns, the independent variable stock_betas, and the time variable month_id. Then, to estimate the cross-sectional regression for each month, the asreg command will look like:

      . bys month_id: asreg stock_returns stock_betas
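For readers who want to cross-check such by-group results against official Stata, a possible sketch uses the statsby prefix. The grunfeld data stand in here for the hypothetical stock data, with year playing the role of month_id; the model is for illustration only.

```stata
webuse grunfeld, clear
* one cross-sectional regression per year; the results replace the data in memory
statsby _b, by(year) clear: regress invest mvalue kstock
list year _b_mvalue _b_kstock _b_cons in 1/3
```

Unlike asreg, this leaves one row per group, so the coefficients would need to be merged back to the original data.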

    4. minimum:

    asreg estimates regressions only where the number of observations is greater than the number of regressors. However, there is a way to limit the regression estimates to windows with a desired minimum number of observations. The option minimum can be used for this purpose. If option min is used, asreg finds the required number of observations for the regression such that:

    obs = max(number of regressors (including the intercept), minimum observations as specified by the option min)

    For example, if we have 4 explanatory variables, then the number of regressors will be equal to 4 plus 1, i.e., 5. Therefore, if asreg receives the value of 8 from the option min, the required number of observations will be max(5, 8) = 8. If a specific rolling window does not have that many observations, values of the new variables will be replaced with missing values.
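The arithmetic of this rule can be reproduced with Stata's max() function. The second line uses a hypothetical min(3) to show the case where the number of regressors is the binding limit.

```stata
* 4 explanatory variables plus the intercept, with min(8) requested
display max(4 + 1, 8)
* hypothetical min(3): the regressor count binds instead
display max(4 + 1, 3)
```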


    fitted: reports residuals and fitted values for the last observation in the rolling window. If option window is not specified, then the residuals are calculated within each group as specified by the option by(varlist) or the bysort varlist: prefix

    serror: reports standard errors for each explanatory variable

    other: the most commonly used regression statistics, such as the number of observations, slope coefficients, r-squared, and adjusted r-squared, are written to new variables by default. Therefore, if these statistics are not needed, they can be dropped once asreg has run.

    The following examples can be run by clicking the links in the asreg help file.

    Example 1: Regression for each company in a rolling window of 10 years

        . webuse grunfeld
        . bys company: asreg invest mvalue kstock, wind(year 10)

    The grunfeld data set is already declared as panel data, so we can omit the word year from the option window. Therefore, the command can also be written as shown below:

        . bys company: asreg invest mvalue kstock, wind(10)

    Example 2: Regression for each company in a recursive window

        . webuse grunfeld
        . bys company: asreg invest mvalue kstock, wind(year 10) rec

    or, equivalently, with a window at least as long as the sample for each company:

        . bys company: asreg invest mvalue kstock, wind(year 1000)

    Example 3: Using option minimum

        . webuse grunfeld
        . bys company: asreg invest mvalue kstock, wind(10) min(5)

    Example 4: Reporting standard errors

        . webuse grunfeld
        . bys company: asreg invest mvalue kstock, wind(10) se

    Example 5: Reporting standard errors, fitted values and residuals

        . webuse grunfeld
        . bys company: asreg invest mvalue kstock, wind(10) se fit

    Example 6: No window - by groups regressions

        . webuse grunfeld
        . bys company: asreg invest mvalue kstock

    Example 7: Yearly cross-sectional regressions

        . webuse grunfeld
        . bys year: asreg invest mvalue kstock

    Please note: The beta version leaves behind a variable named TimeDiff. I shall fix this and upload a revised version to SSC soon.
    Since this is a beta version, I would appreciate any bug reports or suggestions on what other options could be added.
    Last edited by Attaullah Shah; 03 May 2017, 12:08.
    Attaullah Shah, PhD.
    Associate Professor of Finance, Institute of Management Sciences Peshawar, Pakistan

  • #2
    As an update to the previous post, I am happy to share that an updated version of asreg is now available on SSC. New users can type
    ssc install asreg
    and existing users can update the ado files by
    adoupdate asreg, update
    The updated version fixes a minor bug in the earlier version that was leaving behind a temporary variable.
    Attaullah Shah, PhD.


    • #3
      Hi Attaullah,

      Thank you so much for this very helpful package. I was hoping you might be able to help me understand why I am receiving the following error:
      asregw(): 3301 subscript invalid
      <istmt>: - function returned error

      I am running the following regression:
      bys permno: asreg retlessrf mkt_prem, wind(10) se fit

      I ran tsset on the data beforehand (unbalanced panel with gaps). Do you have any ideas? I'm really hoping to get this command to work as it seems much preferable to the rolling command.

      Thank you!


      • #4
        I can't try to debug asreg for you, or alternatively explain what's wrong with your code, not least because its Mata code is hidden and there is no data example here.

        While Attaullah Shah is explaining what went wrong, an alternative is rangestat from SSC by Robert Picard and friends. Here is some token code. The regression here is presumably just silly.

        webuse grunfeld, clear
        rangestat (reg) mvalue kstock, interval(year -9 0) by(company)
        gen fitted = b_cons + b_kstock * kstock

        I don't understand the exact definition of windows in asreg from a quick glance at its help. If window(10) means (e.g.) the previous 10 values, not including the present observation, your rangestat equivalent would be interval(year -10 -1).


        • #5
          Dear Katharine Adame, the updated version of asreg requires two arguments in the option window. Therefore, changing the command to the following should solve the problem. Please note that I assume your time variable is named year. If it is named differently, replace year with the actual name of the variable.
          bys permno: asreg retlessrf mkt_prem, wind(year 10) se fit
          Attaullah Shah, PhD.


          • #6
            Dear Attaullah Shah,

            Your programme seems very helpful for my intended analysis. Unfortunately, I am experiencing a problem when running the described command.

            I have an unbalanced panel data set with:
            xtset GlobalCompanyCode DataYearFiscal, yearly

            After preparing my data set (deleting observations with missing values, computing the variables and so on), I ran the following command:

            bys GlobalCompanyKey: asreg ActualLeverage_w FirmSize_w RDExp_w AdvertisingExp_w TangibleAssets_w , wind(DataYearFiscal 5)

            The command runs smoothly without any error message and also generates the variables mentioned in your post, such as beta coefficients and R².
            However, when I browse my data, those variables contain only missing values.

            Could you please tell me what I am doing wrong?
            Thank you so much in advance
            Kind regards
            Stefanie Peiffer

            Student at Maastricht University, the Netherlands


            • #7
              You are asking for an equivalent of

              regress ActualLeverage_w FirmSize_w RDExp_w AdvertisingExp_w TangibleAssets_w
              for windows of 5 observations. So with 4 predictors you are estimating 5 parameters from 5 values.

              That can't be done usefully. It's on all fours with fitting a straight line to 2 distinct points. Here is a dopey example.

              . sysuse auto
              (1978 Automobile Data)

              . regress mpg price weight trunk length in 1/5

                    Source |       SS           df       MS      Number of obs   =         5
              -------------+----------------------------------   F(4, 0)         =         .
                     Model |        38.8         4         9.7   Prob > F        =         .
                  Residual |           0         0           .   R-squared       =    1.0000
              -------------+----------------------------------   Adj R-squared   =         .
                     Total |        38.8         4         9.7   Root MSE        =         0

              ------------------------------------------------------------------------------
                       mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     price |   -.000583          .        .       .            .           .
                    weight |  -.0072496          .        .       .            .           .
                     trunk |  -.0949042          .        .       .            .           .
                    length |   .1212445          .        .       .            .           .
                     _cons |   24.12378          .        .       .            .           .
              ------------------------------------------------------------------------------

              You need a longer window or a simpler model. No program can solve this otherwise. (If asreg doesn't give you R-square identically 1, that is a side puzzle.)
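The same point can be read from the stored results: after regress, e(N) holds the number of observations and e(df_r) the residual degrees of freedom. A small sketch on the auto data, repeating the regression above:

```stata
sysuse auto, clear
quietly regress mpg price weight trunk length in 1/5
* 5 observations minus 5 estimated parameters leaves no residual degrees of freedom
display "observations: " e(N) "   residual df: " e(df_r)
```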

              Alternatively, if you meant to pool your companies, don't supply the by: prefix.
              Last edited by Nick Cox; 13 May 2018, 09:17.


              • #8
                Thank you Nick Cox for your quick response.
                I am using Stata for the first time, for my Master's thesis, so I should probably say more about what I want to do.

                I am replicating an academic paper, which outlines the following model:

                Actual Leverage = β0 + β1*FirmSize + β2*RDExp + β3*AdvertisingExp + β4*TangibleAssets + 20 year dummy variables and 10 industry dummy variables

                I want to run this regression for each company in my data set, based on a past number of years (I can choose the number of years freely, so I could also use a window of 10 if that helps).
                So I thought I needed to use a rolling regression, where asreg seemed a faster and better alternative, as the new variables are stored in the original data set.

                I will then use all the β coefficients to calculate a new variable called TargetLeverage for each observation in my data set.

                So, if I understand you correctly, I need to choose a window that is bigger than my number of independent variables?

                Thank you so much for your help


                • #9
                  10 years is far too few for the model you cite if you fit it the way you did. Put differently, it's not at all the same model, as it appears that it would be applied to data for several companies.

                  I never try to fit a model with 35 parameters, but if I did I would want hundreds of data points at least. I am not an economist and I fear that you need advice from someone who is.