  • Speed up Rangestat and asrol

    Hi Statalists,

    I have two datasets from which I need to calculate two variables: the Roll (1984) liquidity measure, and the 4-year rolling-window standard deviation of the residuals from a cross-sectional regression.

    1) For Roll liquidity:

    Roll liquidity = 2*sqrt(- covariance (price_change_t, price_change_t-1))
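    Written out for the 21-trading-day window used in the code below, the measure can be stated as follows. This is a restatement under two assumptions: the covariance is the sample covariance with an n - 1 denominator, and the measure is only defined when that covariance is negative. The second line shows that every windowed sum is a difference of two running sums, which is what makes a fast one-pass computation possible.

```latex
\mathrm{Roll}_t = 2\sqrt{-\widehat{C}_t}, \qquad
\widehat{C}_t = \frac{1}{n_t - 1}\left[\sum_{\tau \in W_t} \Delta p_\tau\,\Delta p_{\tau-1}
  - \frac{1}{n_t}\Big(\sum_{\tau \in W_t} \Delta p_\tau\Big)\Big(\sum_{\tau \in W_t} \Delta p_{\tau-1}\Big)\right],
\qquad W_t = \{t-20,\dots,t\}

\sum_{\tau \in W_t} z_\tau = S_t - S_{t-21}, \qquad S_t = \sum_{\tau \le t} z_\tau
```

    Here the sums run over the trading days in W_t on which both price changes are observed, n_t is the number of such days, and z stands for any of the summed quantities.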

    I have an unbalanced panel of daily stock prices (id, date, price). I use rangestat to calculate the covariance over a 21-day window. The data are large, nearly 5,300,000 rows (about 800 firms over 14 years), and the run is taking a very long time; I cannot tell when it will finish.

    Code:
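    ssc install rangestat
    
    encode id, gen(firm)
    format %td date
    sort firm date
    
    * consecutive trading-day counter within firm, so L. means "previous trading day"
    by firm: gen obs_count = _n
    xtset firm obs_count
    gen change_prc = prc - L.prc
    gen lag_change_prc = L.change_prc
    
    * restrict the sample only after the lags have been built
    drop if year(date) < 2004 | year(date) > 2017
    
    * covariance over the current and previous 20 trading days
    rangestat (cov) lag_change_prc change_prc, by(firm) interval(obs_count -20 0)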
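    Because the window here is a fixed count of 21 consecutive trading days (obs_count is consecutive within firm), every windowed sum equals a running sum minus the running sum 21 observations earlier, so the whole job reduces to a few passes over the data. The sketch below swaps in that running-sum technique; it is not how rangestat works internally. Assumptions: the n - 1 sample covariance (compare with rangestat's (cov) output on one firm before relying on it), the Roll measure set to missing when the covariance is not negative (one common convention), and hypothetical toy data created after `clear`, so run it in a fresh session. Running sums can lose precision through cancellation when the numbers are large, which is another reason to spot-check.

```stata
* Sketch: rolling 21-observation covariance from running sums (no rangestat).
* Hypothetical toy data; firm, obs_count, prc, change_prc and lag_change_prc
* mirror the names used in the post.
clear
set seed 12345
set obs 600
gen firm = ceil(_n/200)                  // 3 hypothetical firms, 200 days each
bysort firm: gen obs_count = _n
gen double prc = 50 + sum(rnormal())     // hypothetical price path
xtset firm obs_count
gen double change_prc = prc - L.prc
gen double lag_change_prc = L.change_prc

* Zero out pairs with a missing member so the running sums skip them
gen byte ok = !missing(change_prc, lag_change_prc)
gen double x = cond(ok, lag_change_prc, 0)
gen double y = cond(ok, change_prc, 0)

* Running sums within firm, in time order (sum() restarts for each by-group)
sort firm obs_count
by firm: gen double Sx  = sum(x)
by firm: gen double Sy  = sum(y)
by firm: gen double Sxy = sum(x*y)
by firm: gen double Sn  = sum(ok)

* Window obs_count-20 .. obs_count = running sum now minus running sum 21 obs back
foreach v in Sx Sy Sxy Sn {
    by firm: gen double w`v' = `v' - cond(_n > 21, `v'[_n-21], 0)
}

* Sample covariance (n - 1 denominator), then the Roll measure where cov < 0
gen double cov21 = (wSxy - wSx*wSy/wSn) / (wSn - 1) if wSn > 1
gen double roll = 2*sqrt(-cov21) if cov21 < 0
```

    On the real file, drop the toy-data lines at the top and run the rest after the xtset and lag lines from the post.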

    2) For the 4-year rolling-window standard deviation of the residuals from a cross-sectional regression, I have an unbalanced panel with over 35,000 firm-year observations from 2005 to 2017.
    First, I run the cross-sectional regression reg accruals cf_1lag cf cf_1lead rev ppe for each industry-year. I use runby, as suggested in some earlier posts here, to get the residuals, and then I want to calculate the standard deviation of the residuals over a rolling 4-year window.
    Again, asrol takes forever to produce the results.

    Code:
    ssc install runby
    
    capture program drop one_regression
    program define one_regression
        if _N > 10 {
            capture noisily reg accruals cf_1lag cf cf_1lead rev ppe, noconstant
            if c(rc) == 0 { // REGRESSION WENT OK
                predict r, residuals   // plain -predict r- would store fitted values (xb)
            }
            else if inlist(c(rc), 2000, 2001) { // NO OR INSUFFICIENT OBSERVATIONS
                gen r = .
            }
            else { // THERE WAS AN UNEXPECTED PROBLEM
                gen comment = "Unexpected error `c(rc)'"
            }
        }
        exit
    end
    
    runby one_regression, by(year industry) status
    replace r = 0 if missing(r)   // caution: fills missing residuals with 0 instead of leaving them missing
    rename r residuals
    
    * use asrol to obtain the standard deviation of the residuals rolling 4 years
    sort firm year
    bys firm: gen t = _n
    tsset firm t
    asrol residuals, w(year 4) s(sd) g(sd)
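    In `one_regression`, `predict r` with no option stores the linear prediction (fitted values), not the residuals; the residuals need `predict r, residuals`. The short check below illustrates the difference on Stata's bundled auto dataset. The regression is arbitrary and only for illustration, and the snippet loads new data, so run it separately from your own work.

```stata
* Sketch: after -regress-, -predict- defaults to fitted values (xb);
* the -residuals- option gives depvar minus xb.
sysuse auto, clear
regress price mpg weight, noconstant
predict double fitted            // default statistic: linear prediction
predict double resid, residuals  // price - linear prediction
gen double check = price - fitted - resid
summarize check                  // should be zero up to rounding
```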

    I cannot upload a data sample here because the problem is the run time, which depends on the actual size of the data.
    Apologies for the long post; I put both questions together because they share one topic: how to speed up the computation.

    Can anyone tell me whether I did something wrong in the code? And how can I check when the commands will finish?

    I really appreciate your help.

    Kind regards,
    Ken
    Last edited by Ken Yang; 26 Mar 2019, 05:46.

  • #2
    rangestat doesn't emit signals about its progress. As for speeding it up, the code is visible. Any suggestions you have about improving it are welcome. I can't see that you are doing anything wrong in calling it.

    asrol is (also) from SSC, as the Statalist FAQ asks you to explain. I can't speak on its behalf.



    • #3
      Thank you so much for your quick reply, Nick Cox.

      Can you advise whether it is better to use rangestat instead of asrol for calculating the standard deviation of the residuals?

      Code:
      rangestat (sd) residuals, by(firm) interval(year -3 0)   // 4-year window: years t-3 to t
      Thank you.
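      A 4-year window ending in year t covers years t-3 through t. Because each window holds at most four annual values, one fast alternative to any rolling-window command is to build three lags with time-series operators on year and compute the standard deviation row by row; under xtset firm year, a missing year produces a missing lag rather than pulling in an older year. This is a sketch under stated assumptions: the sample SD with an n - 1 denominator (compare with rangestat or asrol on a few firms), at least two non-missing values per window, and hypothetical toy data created after `clear`.

```stata
* Sketch: 4-year rolling SD of residuals (years t-3..t) within firm,
* using lags on year so that panel gaps give missing values.
* Hypothetical toy data below.
clear
set seed 2019
set obs 40
gen firm = ceil(_n/10)                 // 4 hypothetical firms
bysort firm: gen year = 2004 + _n      // years 2005-2014
drop if firm == 2 & year == 2009       // introduce a gap
gen double residuals = rnormal()
xtset firm year

gen double r1 = L1.residuals
gen double r2 = L2.residuals
gen double r3 = L3.residuals
egen double m4 = rowmean(residuals r1 r2 r3)
egen n4 = rownonmiss(residuals r1 r2 r3)

* Sum of squared deviations over the non-missing values in the window
gen double ss = 0
foreach v of varlist residuals r1 r2 r3 {
    quietly replace ss = ss + (`v' - m4)^2 if !missing(`v')
}
gen double sd4 = sqrt(ss / (n4 - 1)) if n4 >= 2
```

      Only lags are involved, so the run time grows linearly with the number of rows.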



      • #4
        It really shouldn't matter much, but in any case I am not a good person to ask. I don't use asrol (which I didn't write) because I do use rangestat (which I helped to write).

        You may find a speed difference; there is a price to pay for rangestat's generality.



        • #5
          Thank you so much again Nick Cox.



          • #6
            I have the following comments on your code:
            Code:
            bys firm: gen t=_n
            tsset firm t
            asrol residuals, w(year 4) s(sd) g(sd)
            Why do you use tsset firm t before asrol? Earlier versions of asrol supported time-series settings and automatically picked up the panel and time identifiers; in the new version you need to specify these variables explicitly in the asrol syntax. So if you need the statistics for each firm, the code is
            Code:
            bys firm: asrol  residuals , w(year 4) s(sd) g(sd)
            This is much faster than using by. Even for a bigger dataset, asrol is faster. One case in which asrol is slow is when the time variable (year here) has missing values, so make sure this is not the case.
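            Following that advice, a quick check of the time variable before calling asrol could look like the sketch below. The toy data are hypothetical. The duplicate check with isid is an extra step I would add, not something the reply asks for: a rolling window keyed on year is ambiguous when a firm has two rows for the same year.

```stata
* Sketch: inspect and clean the time variable before a rolling-window command.
* Hypothetical toy data below.
clear
input firm year residuals
1 2005  .1
1 2006 -.2
1 .     .3
2 2005  .4
2 2006 -.1
end
count if missing(year)     // observations with no year
drop if missing(year)
isid firm year             // stops with an error if any firm-year is duplicated
```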


            For the cross-sectional regression, you can also try asreg, which is much faster. It also supports the noconstant option:
            Code:
            ssc install asreg
            bys industry year : asreg accruals cf_1lag cf cf_1lead rev ppe, noconstant
            Regards
            --------------------------------------------------
            Attaullah Shah, PhD.
            Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
            FinTechProfessor.com
            https://asdocx.com
            Check out my asdoc program, which sends outputs to MS Word.
            For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
