
  • New on SSC: -rangerun-, an efficient alternative to -rolling-, -statsby-, and looping over observations

    Thanks to Kit Baum, a new command called rangerun (with Nick Cox) is now available on SSC. To use it, you need to install rangestat as well. To install both, type in Stata's Command window:
    Code:
    ssc install rangerun
    ssc install rangestat
    rangerun runs a user-supplied Stata program for each observation in the sample. At each pass, the data in memory is cleared and replaced with observations that fall within the interval bounds specified for the current observation. The user's program is run and results are collected from new variables left in memory when the program terminates without error. rangerun provides the same functionality as rangestat except that, instead of using Mata functions, statistics are calculated by running a user-written Stata program that may contain any number of Stata statistical commands.

    rangerun provides a general solution to any type of rolling window problem. To contrast the mechanics of rolling with those of rangerun, here's the last example from rolling's help file, followed by the rangerun equivalent:

    Code:
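    * rolling (official): 10-quarter window, ratio of mean to median
    webuse lutkepohl2, clear
    tsset qtr
    rolling ratio=(r(mean)/r(p50)), window(10): summarize inc, detail
    list in 1/10
    
    * rangerun: same statistic, computed by a user-written program
    clear all
    program myprog
      * skip windows with fewer than 10 observations
      if _N < 10 exit
      summarize inc, detail
      gen ratio = r(mean)/r(p50)
    end
    webuse lutkepohl2, clear
    rangerun myprog, interval(qtr -9 0) use(inc)
    list qtr inc ratio in 1/19, sep(0)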
    With rolling, execution times increase exponentially as the data size increases. With rangerun (and rangestat), execution times increase linearly with the number of observations in the sample. If you double the number of observations, the run time for rangerun will be twice as long. For large problems, rangerun will be orders of magnitude faster than rolling.

    Everything that can be done with statsby can also be done with rangerun. Since statsby's execution times also increase exponentially as the data size increases, rangerun will be orders of magnitude faster for large data problems. See the help file for a fully spelled out example that shows how to specify the interval bounds so that rangerun only runs the user's program once per group.

    Because rangerun manages all the data in Mata and runs the user's program from Mata, there is very little overhead. The commands in the user's program do not require in or if qualifiers since the data in memory is only populated with observations in range for the current observation. So even with the most efficient Stata loop, rangerun will be significantly faster (and more convenient). Here's an example that calculates a regression on a rolling window of 7 years (including the current observation) and stores the constant term. A minimum of 4 observations is required:

    Code:
    clear all
    webuse grunfeld
    
    * explicit loop over observations
    local nobs = _N
    gen alpha = .
    quietly forvalues i = 1/`nobs' {
      capture regress invest kstock if company == company[`i'] & inrange(year, year[`i']-6, year[`i'])
      if _rc == 0 & e(N) >= 4 replace alpha = _b[_cons] in `i'
    }
    
    * same calculation with rangerun
    program myprog
      if _N < 4 exit
      regress invest kstock
      gen alpha_rr = _b[_cons]
    end
    rangerun myprog, interval(year -6 0) by(company)
    
    * both approaches must agree
    assert alpha == alpha_rr
    The help file for rangerun includes the above examples as well as many others. Each example is completely self-contained and can be run easily via a click to run link.
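
    Since results are collected from any new variables left in memory, the same program can return more than one statistic. The following is only a sketch under that assumption: the program name myprog2 and the variables beta_rr and nobs_rr are made up for illustration and do not come from the help file.

    Code:
    clear all
    webuse grunfeld
    
    * hypothetical variant of myprog that stores several results per window
    program myprog2
      if _N < 4 exit
      regress invest kstock
      gen alpha_rr = _b[_cons]
      gen beta_rr  = _b[kstock]
      gen nobs_rr  = e(N)
    end
    rangerun myprog2, interval(year -6 0) by(company)
    
    list company year alpha_rr beta_rr nobs_rr in 1/10, sep(0)
    The first three observations of each company should have missing results because their windows contain fewer than 4 observations.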

  • #2
    Hi Robert,

    Very appealing indeed. Would you be able to provide the syntax in "program myprog" and in "rangerun" that would replicate the following "statsby" call, with the same output variable names? In particular, I am having a hard time replicating "saving( ,replace)". Thanks a lot in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long country float(date F1_exret CAR MOM12 VAL crisis)
     1 564   1.7234702    -.9702867 0 0 0
     2 564     .669481     .4198689 0 0 0
     3 564   2.1774395    1.8967996 0 0 0
     4 564    1.716818   -.10568553 0 0 0
     5 564    1.654546    1.5605747 0 0 0
     6 564   .12005723    -.3819725 0 0 0
     7 564    2.088872    -.3025167 0 0 0
     8 564   1.9542092    -.1429573 0 0 0
     9 564    1.913284   -1.5697012 0 0 0
    10 564   1.3895004     -.333147 0 0 0
    11 564   -.6983489   -.07097614 0 0 0
     1 565    2.525541    -.9401287 0 0 0
     2 565   1.6382786     .4823666 0 0 0
     3 565   .03053062     1.928773 0 0 0
     4 565    .7797985   -.10141976 0 0 0
     5 565    .7721716     1.435958 0 0 0
     6 565   .08164307    -.3535634 0 0 0
     7 565    .3472369   -.28043684 0 0 0
     8 565     .639734   -.13630654 0 0 0
     9 565    2.077236    -1.665879 0 0 0
    10 565   .42019585   -.29749677 0 0 0
    11 565   .00641382  -.071866564 0 0 0
     1 566    3.032741    -.9757972 0 0 0
     2 566   4.3885446     .5083358 0 0 0
     3 566    1.172018    1.9126217 0 0 0
     4 566    2.522024   -.06745326 0 0 0
     5 566    2.524133     1.362949 0 0 0
     6 566   1.9629337    -.3860264 0 0 0
     7 566  -1.1966026   -.22607984 0 0 0
     8 566   2.6356275    -.1197861 0 0 0
     9 566    3.909091   -1.7282636 0 0 0
    10 566   1.3118774   -.25941315 0 0 0
    11 566    4.587566   -.02108716 0 0 0
     1 567  -.56466633    -.8579798 0 0 0
     2 567    3.371995    .52388686 0 0 0
     3 567  -1.5325208    1.8431644 0 0 0
     4 567   -1.367512   -.03876706 0 0 0
     5 567  -1.3995237    1.3607465 0 0 0
     6 567  -1.0875815   -.49573874 0 0 0
     7 567  -1.8963182    -.2006677 0 0 0
     8 567  -1.4711365   -.08216292 0 0 0
     9 567   -.7537519   -1.8450867 0 0 0
    10 567   -.7283884   -.20468663 0 0 0
    11 567  -2.9526434 -.0027084006 0 0 0
     1 568   2.5070984    -.8590295 0 0 0
     2 568   .50293773     .6651217 0 0 0
     3 568 -.006127452     1.853255 0 0 0
     4 568    .4609633   -.04480395 0 0 0
     5 568    .3641363    1.2643485 0 0 0
     6 568    1.428138    -.6417536 0 0 0
     7 568   -1.360379    -.1843808 0 0 0
     8 568   2.1976347   -.09202287 0 0 0
     9 568    4.773852   -1.8170186 0 0 0
    10 568      .44334   -.16763383 0 0 0
    11 568    .8057225   .023917945 0 0 0
     1 569     .990099    -.7633621 0 0 0
     2 569   -.3746195    .55767906 0 0 0
     3 569   1.8444757    1.7321197 0 0 0
     4 569    1.369636     .0508652 0 0 0
     5 569   1.3550035    1.2670403 0 0 0
     6 569    1.280933    -.8702254 0 0 0
     7 569    3.720657    -.0807259 0 0 0
     8 569   1.4383608  -.016056275 0 0 0
     9 569   -.4724613   -1.9291434 0 0 0
    10 569    .8247426   -.04843285 0 0 0
    11 569   1.7705812     .1002419 0 0 0
     1 570   -5.047853    -.8147603 0 0 0
     2 570    .9358605    .56120753 0 0 0
     3 570   -.4091456    1.6798683 0 0 0
     4 570   -.5243028    .07973593 0 0 0
     5 570   -.4091025     1.256953 0 0 0
     6 570   -.7381708    -.8494334 0 0 0
     7 570   2.7175086   -.04664638 0 0 0
     8 570  .008587967   .007487852 0 0 0
     9 570   -8.746262   -1.9676385 0 0 0
    10 570   -.3508842   -.03478042 0 0 0
    11 570  -2.1770217    .12800634 0 0 0
     1 571     8.77635    -.7335489 0 0 0
     2 571    6.455671    .36474895 0 0 0
     3 571   3.4316096    1.8096303 0 0 0
     4 571   4.2555833    .10649563 0 0 0
     5 571   4.3205576    1.4366283 0 0 0
     6 571   1.0064203   -1.1170341 0 0 0
     7 571    .7825066   -.11768645 0 0 0
     8 571    7.374245  -.016813857 0 0 0
     9 571    7.689019   -1.6399684 0 0 0
    10 571    3.053912    -.1539106 0 0 0
    11 571    6.473903    .06145909 0 0 0
     1 572   4.7234306    -.9422983 0 0 0
     2 572    4.622026     .3383433 0 0 0
     3 572    .7943925    1.8703525 0 0 0
     4 572     1.73724    .20057704 0 0 0
     5 572   1.7297754    1.0474254 0 0 0
     6 572   1.9633347   -1.0726296 0 0 0
     7 572   -.2342022    .10246935 0 0 0
     8 572   .57314026    .03735296 0 0 0
     9 572    1.793277   -1.7764182 0 0 0
    10 572    .8708991  -.014342692 0 0 0
    11 572   1.6970334     .2091683 0 0 0
     1 573  -4.5697327   -1.0226897 0 0 0
    end
    format %tm date
    label values country country
    label def country 1 "AUD", modify
    label def country 2 "CAD", modify
    label def country 3 "CHF", modify
    label def country 4 "DKK", modify
    label def country 5 "EUR", modify
    label def country 6 "GBP", modify
    label def country 7 "JPY", modify
    label def country 8 "NOK", modify
    label def country 9 "NZD", modify
    label def country 10 "RUB", modify
    label def country 11 "SEK", modify
    Code:
    statsby _b _se, by(date) saving(beta_model, replace): regress F1_exret CAR MOM12 VAL if crisis==0, vce(robust)



    • #3
      rangerun, like rangestat, runs the user-defined program for each observation in the data in memory. You want to run a regression once for each by-group so runby (from SSC) is a better tool for your situation. Look for the "Example: panel-specific regression" section in the help file. As applied to your problem, it should go something like:

      Code:
      clear all
      
      program doit
          regress F1_exret CAR MOM12 VAL if crisis==0, vce(robust)
          keep in 1
          
          gen long nobs = e(N)
          foreach v in CAR MOM12 VAL _cons {
              gen double b_`v' = _b[`v']
              gen double se_`v' = _se[`v']
          }
          keep date nobs b_* se_*
      end
      
      use "test_data.dta", clear
      runby doit, by(date)
      list
      and the results from the list command:
      Code:
      . list
      
           +-------------------------------------------------------------------------------------------------------+
           |   date   nobs        b_CAR      se_CAR   b_MOM12   se_MOM12   b_VAL   se_VAL      b__cons    se__cons |
           |-------------------------------------------------------------------------------------------------------|
        1. | 2007m1     11    .07658284   .20992476         0          0       0        0    1.3372117   .29115652 |
        2. | 2007m2     11   -.42948128   .19703566         0          0       0        0    .84716182   .23691064 |
        3. | 2007m3     11   -.32126374   .28498127         0          0       0        0    2.4409049   .51513761 |
        4. | 2007m4     11    .01633305   .29539946         0          0       0        0   -.94382252   .50075689 |
        5. | 2007m5     11   -1.0987914    .4048005         0          0       0        0    1.1015742   .37358851 |
           |-------------------------------------------------------------------------------------------------------|
        6. | 2007m6     11    .36274807    .2537449         0          0       0        0    1.2497644   .33828324 |
        7. | 2007m7     11    2.0838004   .94224578         0          0       0        0   -1.3400714   .73222936 |
        8. | 2007m8     11    -.5235185   .72774564         0          0       0        0    4.8745252   .83789653 |
        9. | 2007m9     11   -.41926504   .33138473         0          0       0        0    1.8427588   .47456698 |
           +-------------------------------------------------------------------------------------------------------+
      
      .



      • #4
        You can also use asreg for this.
        Code:
        ssc install asreg
        
        bys date: asreg F1_exret CAR MOM12 VAL if crisis==0, robust
        duplicates drop date, force
        
        list date _Nobs _b_CAR _se_CAR _b_MOM12 _se_MOM12 _b_VAL _se_VAL _b_cons _se_cons
        
        *And appreciate the results
        
             +----------------------------------------------------------------------------------------------------------+
             |    date   _Nobs       _b_CAR    _se_CAR   _b_MOM12   _se_M~12   _b_VAL   _se_VAL      _b_cons   _se_cons |
             |----------------------------------------------------------------------------------------------------------|
          1. |  2007m1      11    .07658284   .2099248          0          0        0         0    1.3372117   .2911565 |
          2. |  2007m2      11   -.42948128   .1970357          0          0        0         0    .84716182   .2369106 |
          3. |  2007m3      11   -.32126374   .2849813          0          0        0         0    2.4409049   .5151376 |
          4. |  2007m4      11    .01633305   .2953995          0          0        0         0   -.94382252   .5007569 |
          5. |  2007m5      11   -1.0987914   .4048005          0          0        0         0    1.1015742   .3735885 |
             |----------------------------------------------------------------------------------------------------------|
          6. |  2007m6      11    .36274807   .2537449          0          0        0         0    1.2497644   .3382832 |
          7. |  2007m7      11    2.0838004   .9422458          0          0        0         0   -1.3400714   .7322294 |
          8. |  2007m8      11    -.5235185   .7277457          0          0        0         0    4.8745252   .8378965 |
          9. |  2007m9      11   -.41926504   .3313847          0          0        0         0    1.8427588    .474567 |
         10. | 2007m10       .            .          .          .          .        .         .            .          . |
        Last edited by Attaullah Shah; 19 Jul 2020, 03:34.
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



        • #5
          Thanks Robert and Attaullah.

          Is runby significantly faster than statsby though? I'm running regressions by group (date, about 120 of them). Each group has about 3000 observations. The regression has about 30 independent variables. The first tests I have run do not show much speed improvement. Any clue?

          Same question for asreg.

          Finally, how does "asreg" compare to "runby" in terms of speed?



          • #6
            asreg is an order of magnitude faster than statsby. I have never used runby, so you will have to compare it with the other commands yourself.


            • #7
              Let's be clear: asreg (SSC), statsby (official) and runby (SSC) are not aimed at the same problem but they do overlap in what they can do: otherwise the thread would not have evolved to this point.

              The only time comparisons that can be taken seriously are for a specified dataset, specified code, and a particular stated machine with particular hardware and OS and other applications minimised.

              Speed ratios aren't constant for pairs of programs, but highly variable.

              Even then, in this kind of case: it is not statsby or runby that does (in this example) repeated regressions with robust standard errors, but statsby or runby with extra code, and what that extra code is will be crucial.

              Both statsby and runby are highly general tools and in effect extensible in that they will work with the user's own code.

              asreg, as I understand it, has a very different flavour: it does what it does and is focused on certain classes of regression problem, especially some common in applied economics.

              I agree with @Attaullah Shah: If Francois Durant wants to know what is fastest for his problem, he has to find out for himself.
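
              For anyone who wants to run such a comparison, here is a minimal timing sketch using Stata's official timer command. It is an illustration only: it uses the small grunfeld dataset rather than the data in #2, the program name doit_timing is made up, and the timings it reports say nothing about any other dataset, code, or machine.

              Code:
              clear all
              
              * hypothetical program, patterned on the one in #3; runby calls it once per by-group
              program doit_timing
                  regress invest mvalue kstock, vce(robust)
                  keep in 1
                  foreach v in mvalue kstock _cons {
                      gen double b_`v' = _b[`v']
                      gen double se_`v' = _se[`v']
                  }
                  keep company b_* se_*
              end
              
              timer clear
              
              * timer 1: statsby (official)
              webuse grunfeld, clear
              timer on 1
              statsby _b _se, by(company) clear: regress invest mvalue kstock, vce(robust)
              timer off 1
              
              * timer 2: runby (SSC)
              webuse grunfeld, clear
              timer on 2
              runby doit_timing, by(company)
              timer off 2
              
              * elapsed seconds for each timer
              timer list
              Reloading the data before each timed block keeps the two runs independent; only the commands between timer on and timer off are counted.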



              • #8
                This is what Robert Picard writes above:
                "With rolling, execution times increase exponentially as the data size increases. With rangerun (and rangestat), execution times increase linearly with the number of observations in the sample. If you double the number of observations, the run time for rangerun will be twice as long. For large problems, rangerun will be orders of magnitude faster than rolling. [...] Since statsby's execution times also increase exponentially as the data size increases, rangerun will be orders of magnitude faster for large data problems. See the help file for a fully spelled out example that shows how to specify the interval bounds so that rangerun only runs the user's program once per group.

                Because rangerun manages all the data in Mata and runs the user's program from Mata, there is very little overhead. The commands in the user's program do not require in or if qualifiers since the data in memory is only populated with observations in range for the current observation. So even with the most efficient Stata loop, rangerun will be significantly faster (and more convenient)."

                I would prefer not to program all of them just to compare, because the code I have is quite complex, changing it increases the probability of errors, and it takes a long time to run. I would like a general idea of the fastest way to execute the regression setup I described above:
                "I'm running regressions by group (date, about 120 of them). Each group has about 3000 observations. The regression has about 30 independent variables."
                The extra code in the program part is pretty much what Robert wrote in #3. I hope this is enough information to get some direction.

                How does rangerun compare to all the above? Thanks a lot for your help.










                • #9
                  Did you try asreg and its speed?


                  • #10
                    asreg seems to be much faster. I'm currently experimenting. Thanks a lot!
                    Last edited by Francois Durant; 21 Jul 2020, 23:19.



                    • #11
                      Robert Picard already explained in #3 that rangerun has no advantage over statsby for your problem.



                      • #12
                        asreg seems to work fantastic speedwise. Thanks Attaullah!



                        • #13
                          Again, rangerun will run a user-defined program once for each observation in the data and is therefore not the tool to use for your by-group application.

                          The overhead needed to run about 120 regressions is minimal, so long execution times are almost entirely due to the speed of Stata's regress command. FWIW, you can get a significant speed increase over regress by implementing a bare-bones regression procedure coded in Mata. rangestat (from SSC) has such a procedure built in, but it does not support robust standard errors. The code, however, is public, and I show in this post how to extend it to calculate what you want. Since rangestat is optimized to calculate only once for all observations with the same interval, you can perform your by-group regressions using something like:

                          Code:
                          rangestat (reg_robust) F1_exret CAR, interval(date 0 0)
                          You can indeed do the same with asreg (from SSC) and it might even be a bit more convenient to use, but note that the author has chosen not to make the source code public.
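
                          For reference, the same by-group pattern can be tried with rangestat's built-in (reg) function, which does not report robust standard errors. This is a minimal sketch on the grunfeld data, not the reg_robust extension mentioned above; it assumes rangestat is installed from SSC and uses describe to show the result variables rather than assuming their names.

                          Code:
                          webuse grunfeld, clear
                          
                          * interval(company 0 0): all observations of a company share the same
                          * interval, so there is one regression per company
                          rangestat (reg) invest mvalue kstock, interval(company 0 0)
                          
                          * the results are stored in new variables added to the data
                          describe
                          
                          * keep one row of results per company
                          bysort company (year): keep if _n == 1
                          list, sep(0)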



                          • #14
                            Thanks for bringing it up Robert, I appreciate it.



                            • #15
                              Hi Robert Sir
                              I am facing an issue while applying -rangerun- to generate quantile regression coefficients (alpha and beta) over an entire time series for each cross-sectional unit (panel data). I want to generate 1% and 50% quantile coefficients for, let's say, 10 banks over 15 years. I am using daily stock market data and want the beta coefficients of the independent variables for the series, specifically at year-end. That part is not an issue: once results are generated for the entire dataset, the year-end values can be separated out. However, I am facing a problem with the syntax. Could you please help me with the commands?
