  • ASREG: Rolling window regressions and by groups regressions (rolling betas)

    Hi everyone
    Thanks to Kit Baum for uploading version 1 of asreg to SSC. The program can be installed by typing
    Code:
    ssc install asreg
    A short description and examples of asreg are presented below:


    Description

    To estimate rolling window regressions in Stata, the conventional method is to use Stata's official rolling command. However, that command is slow, especially for larger data sets. As a speedy alternative, I have written asreg, which fits a model of depvar on indepvars using linear regression in a user-defined rolling window or by a grouping variable. asreg is an order of magnitude faster than estimating rolling window regressions through conventional methods such as Stata loops or the official rolling command. asreg has the same speed efficiency as asrol (ssc des asrol). All calculations of the rolling windows, estimation of regression parameters, and writing of the results to Stata variables are done in the Mata language.
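
    For comparison, the conventional approach might look like the sketch below on the grunfeld data used in the examples further down. The file name betas is an arbitrary name chosen for illustration, and the two commands are only meant to show the difference in workflow: rolling saves its results to a separate file that has to be merged back, whereas asreg adds the results to the data in memory.

    Code:
    . webuse grunfeld, clear
    . rolling _b, window(10) saving(betas, replace): regress invest mvalue kstock
    . bys company: asreg invest mvalue kstock, wind(year 10)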

    Rolling window calculations require lots of looping over observations. The problem is compounded by different data structures such as unbalanced panel data, data with many duplicates, and data with many missing values. Some data sets have both time series gaps and many duplicate observations across groups. asreg does not use one static routine for all types of data structures. Instead, asreg identifies the data structure and matches one of its rolling window routines to the data characteristics. Therefore, the rolling window regressions are fast even in larger data sets.

    asreg writes all regression outputs to the data in memory as separate variables. This eliminates the need for writing results to a separate file and then merging those results back to the data for further processing. New variables from the regression results use the following naming conventions:

    observations: the variable containing the number of observations is named _Nobs

    regression slopes: a prefix of _b_ is added to the name of each independent variable

    constant: the variable containing the constant of the regression is named _b_cons

    r-squared: r-squared and adj. r-squared are named _R2 and _adjR2, respectively

    standard errors: a prefix of _se_ is added to the name of each independent variable

    residuals: the variable containing residuals is named _residuals

    fitted: the variable containing fitted values is named _fitted
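
    One way to see which variables a run has created is to describe everything that starts with an underscore. This is a minimal sketch, assuming the grunfeld data (which should contain no other variables with such names) and the se and fit options described further down:

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(year 10) se fit
    . describe _*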



    Options


    asreg has the following options.


    1. window:

    Option window(rangevar #) specifies the length of the rolling window. The window option accepts up to two arguments. If the data have already been declared as panel or time series data, asreg automatically picks up the time variable. In such cases, option window needs only one argument, the length of the window, e.g., window(5). If the data are not declared as time series or panel data, the time variable must be given as the first argument of option window. For example, if the time variable is year and we want a rolling window of 24, option window will look like:

    Code:
       window(year 24)
    2. recursive:

    The option recursive specifies that a recursive window be used. In time series analysis, a recursive window is one where the starting period is held fixed, the ending period advances, and the window size grows. asreg allows a recursive window either by invoking the option recursive or by setting the length of the window greater than or equal to the sample size per group. For example, if the sample size of our data set is 1000 observations per group, we can run a recursive analysis by setting the window length to 1000 or more.


    3. by:

    asreg is byable. Hence, it can be run on groups as specified by option by(varlist) or the bysort varlist: prefix. An example of such a regression is the Fama and MacBeth (1973) second-stage regression, which is estimated cross-sectionally in each time period. The grouping variable in this case is therefore the time variable. Assume that the dependent variable is named stock_returns, the independent variable stock_betas, and the time variable month_id. To estimate the cross-sectional regression for each month, the asreg command will look like:

    Code:
      . bys month_id: asreg stock_returns stock_betas

    4. minimum:

    asreg estimates a regression only where the number of observations is greater than the number of regressors. However, the estimates can be limited to windows with at least a desired number of observations. The option minimum can be used for this purpose. If option min is used, asreg finds the required number of observations for each regression as:

    obs = max(number of regressors (including the intercept), minimum observations as specified by the option min)

    For example, if we have 4 explanatory variables, then the number of regressors will be equal to 4 plus 1, i.e. 5. Therefore, if asreg receives the value of 8 from the option min, the required number of observations will be max(5,8) = 8. If a specific rolling window does not have that many observations, the new variables are set to missing for that window.
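
    As a quick check of this rule against Example 3 below, which has two explanatory variables and min(5), the required number of observations can be worked out with Stata's max() function. This is only an arithmetic sketch of the rule as stated above, not a call into asreg itself:

    Code:
    . display max(2 + 1, 5)
    5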


    Statistics_Options


    fitted: reports residuals and fitted values for the last observation in the rolling window. If option window is not specified, then the residuals are calculated within each group as specified by the option by(varlist) or the bysort varlist: prefix.

    serror: reports standard errors for each explanatory variable.

    other: the most commonly used regression statistics, such as the number of observations, slope coefficients, r-squared, and adjusted r-squared, are written to new variables by default. Therefore, if these statistics are not needed, they can be dropped once asreg has run.
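
    Dropping unwanted default statistics might look like the following sketch. The variable names _R2 and _adjR2 are taken from the output listing shown later in this thread, so they are an assumption for other versions of asreg:

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(year 10)
    . drop _R2 _adjR2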


    Examples
    The following examples can be run by clicking the links inside the help file of asreg.
    Example 1: Regression for each company in a rolling window of 10 years

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(year 10)
    The grunfeld data set is a panel data set, so we can omit the word year from the option window. Therefore, the command can also be written as shown below:

    Code:
    . bys company: asreg invest mvalue kstock, wind(10)

    Example 2: Regression for each company in a recursive window

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(year 10) rec
    OR

    Code:
    . bys company: asreg invest mvalue kstock, wind(year 1000)

    Example 3: Using option minimum

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(10) min(5)

    Example 4: Reporting standard errors

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(10) se

    Example 5: Reporting standard errors, fitted values and residuals

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock, wind(10) se fit

    Example 6: No window - by groups regressions

    Code:
    . webuse grunfeld, clear
    . bys company: asreg invest mvalue kstock

    Example 7: Yearly cross-sectional regressions

    Code:
    . webuse grunfeld, clear
    . bys year: asreg invest mvalue kstock

    Please note: The beta version leaves behind a variable named TimeDiff. I shall fix this and upload a revised version to SSC soon.
    Since this is a beta version, I would appreciate any bug reporting or suggestions on what other options can be added.
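
    Until then, a leftover TimeDiff can be removed by hand. A minimal sketch, assuming the variable is not needed once asreg has finished; capture keeps the line from stopping a do-file when the variable does not exist:

    Code:
    . capture drop TimeDiff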
    Last edited by Attaullah Shah; 03 May 2017, 12:08.

  • Attaullah Shah
    replied
    Since your window is 15, you should use the option min(15) to suppress the regression coefficients estimated from fewer than 15 observations. The example below shows the same idea with a window of 10 and min(10) on the grunfeld data.
    Code:
     . webuse grunfeld, clear
     
     . bys company: asreg invest mvalue kstock, wind(year 10) min(10)
    
         +--------------------------------------------------------------------+
         | _Nobs         _R2      _adjR2   _b_mvalue   _b_kstock      _b_cons |
         |--------------------------------------------------------------------|
      1. |     .           .           .           .           .            . |
      2. |     .           .           .           .           .            . |
      3. |     .           .           .           .           .            . |
      4. |     .           .           .           .           .            . |
      5. |     .           .           .           .           .            . |
         |--------------------------------------------------------------------|
      6. |     .           .           .           .           .            . |
      7. |     .           .           .           .           .            . |
      8. |     .           .           .           .           .            . |
      9. |     .           .           .           .           .            . |
     10. |    10   .46011588   .30586328   .05693085   .46800615    96.771625 |
         |--------------------------------------------------------------------|
     11. |    10    .5614129    .4361023   .08887698   .88729908   -126.96142 |
     12. |    10   .78380282   .72203219   .08872388   1.2292866   -214.23699 |
     13. |    10   .71488429   .63342265   .12516278   .42876425   -160.50868 |
     14. |    10   .45098464   .29412311   .11227122   .29657166   -65.906213 |
     15. |    10   .52009389   .38297786    .0912647   .16681578     85.23875 |
         |--------------------------------------------------------------------|
     16. |    10   .72287646    .6436983   .10511862   .17118093    38.538037 |
     17. |    10   .88167177    .8478637   .11069522   .16073767    27.814834 |
     18. |    10   .91376003   .88912004   .13377585   .20041587   -91.965695 |
     19. |    10   .95105686    .9370731    .1749349   .25069983   -299.34944 |
     20. |    10   .94840015   .93365733   .17671561   .34521899   -390.57569 |
         +--------------------------------------------------------------------+
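
    Applied to the command quoted in the question below, this might look like the following untested sketch. The variable names i, time, interestrate, gfc, cpiinflation, ex_ln and outputgap are taken from that post:

    Code:
    . bys i: asreg interestrate gfc cpiinflation ex_ln outputgap, wind(time 15) min(15)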
    You can explore more about asreg here https://fintechprofessor.com/2017/12...ions-in-stata/
    Last edited by Attaullah Shah; 29 Apr 2022, 05:21.

  • Nick Cox
    replied
    #29 overlaps with https://www.statalist.org/forums/for...ng-regressions

  • vrinda gupta
    replied
    Hi Attaullah Shah,
    Sir, I am working on quarterly panel data with N=51 and T=40. I want to run a rolling regression with a window of 15 quarters, where the first regression uses quarters 1 to 15, the next uses quarters 2 to 16, and so on. I ran the code:

    bys i : asreg interestrate gfc cpiinflation ex_ln outputgap, wind(time 15)
    where i is countryid and time represents quarter

    However, I get the following output where first five values are missing and then 15 keeps on repeating:
    _Nobs _R2 _adjR2 _b_gfc _b_cpiinflation _b_ex_ln _b_outputgap _b_cons
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    6 -1.14E-13 0 -1.16E-10 9.10E-13 9.5
    7 0 0 2.91E-11 0 9.5
    8 0 0 0 0 9.5
    9 0 0 0 5.68E-14 9.5
    10 0 0 9.10E-13 0 9.5
    11 -3.55E-15 0 0 3.55E-15 9.5
    12 -8.88E-16 0 -9.10E-13 3.55E-15 9.5
    13 4.44E-16 0 -4.55E-13 0 9.5
    14 1.39E-17 0 0 0 9.5
    15 3.55E-15 0 0 3.55E-15 9.5
    15 3.55E-15 0 -2.27E-13 3.55E-15 9.5
    15 0.57158481 0.4547443 -0.5949871 0 17.013916 -0.8415596 -15.058147
    15 0.67449925 0.58572632 0.07154758 0 19.157318 -0.8017373 -18.6109
    15 0.75565307 0.68901299 -0.038887 0 21.698423 -0.8302239 -22.609611
    15 0.81294023 0.76192392 -0.1736552 0 24.240944 -0.8290036 -27.00011
    15 0.84876895 0.80752412 -0.1587135 0 25.290433 -0.9173466 -29.078082
    15 0.87536891 0.84137862 -0.0433958 0 26.486147 -0.9977948 -31.588413
    15 0.9058569 0.88018151 0.01865548 0 27.644144 -1.2697714 -34.196145
    15 0.92468799 0.90414836 0.06770972 0 27.385615 -1.4162443 -34.279654
    15 0.93796095 0.92104121 0.06605368 0 26.001546 -1.6653594 -31.422473
    15 0.86956058 0.83398619 0.14156781 0 23.343965 -1.2346861 -26.792364
    15 0.76071796 0.69545922 0.17696004 0 20.424027 -1.0653102 -21.314451
    15 0.61998184 0.51634052 0.16265453 0 16.405722 -1.2292738 -12.935763
    15 0.47588814 0.33294854 0.05461825 0 12.551712 -1.3845503 -4.1147822
    15 0.2774562 0.0803988 -0.0420422 0 8.4163467 -1.0194785 6.2416644
    15 0.09098482 -0.1569284 -0.1745475 0 1.1061531 -0.3376111 25.246986
    15 0.09001539 -0.1581622 -0.1568791 0 1.7089324 -0.2492659 23.751656
    15 0.10733139 -0.1361237 -0.3230389 0 1.1345878 -0.1792229 25.099124
    15 0.33269165 0.15069846 -0.2819006 0 5.5779585 -0.6825067 14.222327
    15 0.60416234 0.49620662 0.52108639 0 19.738484 -1.5661441 -22.344993
    15 0.70282334 0.62177516 -0.0934901 0 20.879848 -1.4431357 -25.717625
    15 0.80732218 0.75477368 -0.264455 0 26.188266 -1.587023 -41.526525
    15 0.87953144 0.84667637 0.12671152 0 31.331196 -1.8803631 -57.017177
    15 0.93054224 0.91159922 0.16418257 0 35.649422 -1.7035194 -70.749674
    15 0.89404824 0.86515231 -0.1491169 0 33.063561 -1.0927707 -65.205697

    Can you please help me with it?

  • Mira Bellenbaum
    replied
    Nick Cox too bad, thank you so much for your help, though!! I really appreciate it

  • Nick Cox
    replied
    Sorry, no more ideas from me.

    To debug further, you need to comment out the quietly to see where the code fails.

  • Mira Bellenbaum
    replied
    Nick Cox you are right! Could have seen that myself! Thank you soooo much!!

    SO sorry to annoy you again but the code above still doesn't run. When I do it it gives me
    quietly {
    insufficient observations.

    I really don't know how to cope with this all as you might have noticed. Do you have any idea how I can solve this?

    THANK YOU SO SO MUCH

  • Nick Cox
    replied
    Your code so far pays no attention to sic_d. It refers to sic, which evidently is quite different.


    You need to loop over integers 1/12 and refer to the result of the recode.


    Code:
    quietly { 
        gen double indmun=.
        gen double inddyn=.
    
        * don't use -sum()- (undocumented since Stata 9) 
        bysort sic_d fyear : egen double indsales = total(sale)
        by sic_d fyear : gen int indcount=_n
    
        tempname indmean 
    
        forvalues i=1/12 {
            forvalues curyear=1980/2010 {
    
                summarize sale if sic_d==`i' & inrange(fyear, `curyear' - 4, `curyear'), meanonly 
    
                if r(N) > 1 { 
                    scalar `indmean' = r(mean)
    
                    regress indsales fyear if sic_d==`i' & inrange(fyear, `curyear' - 4, `curyear') & indcount==1, vce(r)
    
                    replace indmun = _b[fyear] / `indmean' if sic_d==`i' & fyear==`curyear'
                    replace inddyn = _se[fyear] / `indmean' if sic_d==`i' & fyear==`curyear'
                }
            }
        }
    }

  • Mira Bellenbaum
    replied
    Nick Cox thanks again for your kind help!!

    Yes, there were a lot of them, BUT we transformed sic into a dummy variable (see picture). That's why I typed in "1-12": it describes the clusters. I clustered the sic codes according to the sic website to see the regression by industries like "Construction" and not by the sic codes, if you know what I mean. Furthermore, we want to drop the clusters sic_d 8, sic_d 10 and sic_d 12 from our dataset later on.
    Should I rather cluster them after running the dynamism/munificence code? And if so, what would I type in instead of "1-12"? Our original dataset contains sic codes from 100 to 9997, and they are of type str16.

  • Nick Cox
    replied
    Are there any values with SIC codes "01" "02" ... "11" "12" ? because that is what your code requires. The data example in #17 has 4-character SIC codes.

  • Mira Bellenbaum
    replied
    Nick Cox , thank you sooooooo much for your help again! You really are a life saver I adjusted the code accordingly:

    quietly {

        gen double indmun=.
        gen double inddyn=.

        bysort sic fyear : egen double indsales = total(sale)
        by sic fyear : gen int indcount=_n

        tempname indmean

        forvalues industry=01/12 {
            local i : di %02.0f `industry'
            forvalues curyear=1980/2010 {

                summarize sale if sic=="`i'" & inrange(fyear, `curyear' - 4, `curyear'), meanonly

                if r(N) > 1 {
                    scalar `indmean' = r(mean)
                    regress indsales fyear if sic=="`i'" & inrange(fyear, `curyear' - 4, `curyear') & indcount==1, vce(r)

                    replace indmun = _b[fyear] / `indmean' if sic=="`i'" & fyear==`curyear'
                    replace inddyn = _se[fyear] / `indmean' if sic=="`i'" & fyear==`curyear'
                }
            }
        }
    }



    The code runs through now, however, it does not generate any values... The variables exist in the datasets but only with ".".
    Furthermore, the output shows only the beginning of the file whenever I run the whole code (you can see this in my screenshot). This is probably not normal, is it?

  • Nick Cox
    replied
    Possibly

    Code:
    summarize sale if sic=="`i'" & inrange(fyear, `curyear' - 4, `curyear'), meanonly
    
    if r(N) > 1 {
        scalar `indmean' = r(mean)
        regress indsales fyear if sic=="`i'" & inrange(fyear, `curyear' - 4, `curyear') & indcount==1, vce(r)
    
        replace indmun = _b[fyear] / `indmean' if sic=="`i'" & fyear==`curyear'
        replace inddyn = _se[fyear] / `indmean' if sic=="`i'" & fyear==`curyear'
    }

  • Mira Bellenbaum
    replied
    Nick Cox thank you so much for your reply!

    I am sorry, I was told that we could use asreg to simplify our problem, that's why I posted it here.

    I tried your code, however, I get an error message right behind the open brace (see screenshot). Do you know why this happens?

    I tried to solve it by plugging in the following after the brace, but the error message still occurs.
    if r(N) == 0 { /*handle cases without observations*/ } else { /* run normal code*/ }


    Thank you in advance!!

  • Nick Cox
    replied
    This question (#17) is misplaced, as it has nothing to do with asreg (SSC).


    The code is hard to follow given a puzzling attempt to define local macros in advance. Many Stata programmers have a strong prejudice against that, as recently discussed at https://www.statalist.org/forums/for...-single-quotes

    Another problem is that a loop over 01/12 won't do what you want here unless you insist on a leading zero when it is needed. This was explained in https://www.stata-journal.com/articl...article=pr0051
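
    The leading-zero point can be checked with a small sketch. The macro names industry and i follow the code below, and the loop does nothing except display the formatted values:

    Code:
    forvalues industry = 1/12 {
        local i : display %02.0f `industry'
        display "`i'"
    }

    This should display 01, 02, ..., 12, whereas `industry' on its own gives 1, 2, ..., 12, which never matches a string code such as "01".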

    Nothing here is tested, but it includes several simplifications and intended small improvements.

    Code:
    quietly {
        gen double indmun=.
        gen double inddyn=.
    
        * don't use -sum()- (undocumented since Stata 9)
        bysort sic fyear : egen double indsales = total(sale)
        by sic fyear : gen int indcount=_n
    
        tempname indmean
    
        forvalues industry=01/12 {
            local i : di %02.0f `industry'
            forvalues curyear=1980/2010 {
    
                summarize sale if sic=="`i'" & inrange(fyear, `curyear' - 4, `curyear'), meanonly
                scalar `indmean' = r(mean)
    
                regress indsales fyear if sic=="`i'" & inrange(fyear, `curyear' - 4, `curyear') & indcount==1, vce(r)
    
                replace indmun = _b[fyear] / `indmean' if sic=="`i'" & fyear==`curyear'
                replace inddyn = _se[fyear] / `indmean' if sic=="`i'" & fyear==`curyear'
            }
        }
    }
    Last edited by Nick Cox; 23 Dec 2020, 09:18.

  • Mira Bellenbaum
    replied
    Nick Cox is there any chance that you have an idea how to solve our problem? I would really, really appreciate your help!! Thank you so much in advance!
