Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collect rolling correlations using statsby

    Dear all,

    I have an unbalanced panel dataset of stock and market returns where permno is the panel variable and date is the time variable. The data has been tsset. The sample of interest covers the period from January 1996 through October 2012.

    Now I'm trying to estimate 60-months rolling correlations between the stock and market returns. Rolling over 60 months is done by looping. I would like to use the -statsby- command to collect correlation coefficients for the different stocks (permno).

    So far, my code looks like this (later I will append all files):
    Code:
        * 3. Correlations
        // Daily sample: months from 01Jan1996-31Oct2012
        local k = 1996*12+1     // k=Jan1996
        local l = 2012*12+10    // l=Oct2012
            forvalues i=`k'(1)`l'{
            disp `i'
            quietly use date permno stock_ret market_ret if year(date)*12+month(date)<=`i'&year(date)*12+month(date)>`i'-60 using "C:\...", clear
            if     (r(N)>=200){
                quietly statsby rho_s_m = r(rho), by(permno) saving(C:/.../corr`i', replace): corr stock_ret market_ret
                }
                    }
    However, tow issues arised when running the code.
    1. It takes a lot of time. After 30 minutes, the very first file "corr23953" hasn't been finished, yet.

    2. When choosing the "noisily" option, every now and then there is the message

    "no observations
    captured error running (correlate stock_ret market_ret), posting missing values"

    popping up.

    My questions are: What did I do wrong? Is there a more efficient way to collect rolling correlations in this case?

    Help is highly appreciated.

    Best,
    Christopher

  • #2
    First, if you want rolling statistics, instead of building a loop around -statsby- I would use the -rolling- command.

    Next, you have made your life more complicated than it needs be with your way of handling dates. Use Stata monthly dates instead.

    Next, your -if (r(N) >= 200) {- is incorrect. In fact, I'm amazed you got any output from it at all. It should generate a syntax error because r(N) is not defined at that point in the code.

    So, here's how I would approach this:

    Code:
    use my_data_file, clear
    gen int mdate = mofd(date)
    format mdate %tm
    keep if inrange(mdate, tm(1996m1), tm(2012m10)
    xtset permno mdate
    
    rolling r(rho), window(60) reject(r(N) < 200)) saving(correlations, replace): corr stock_ret market_ret
    Notes: Not tested. Beware of typos, other minor errors. This will produce a single file containing all of your correlations, and there will be variables indicating which permno and which window they apply to.

    Note also that if your data set is large this is going to take a long time. So be patient.


    Comment


    • #3
      Thank you very much indeed, Clyde!

      However, the problem is that when I -xtset- the data, Stata shows the error message:
      "repeated time values within panel"

      Note that the original dataset is daily, so I have observations of "stock_ret" and "market_ret" for every day. My goal, however, is to obtain monthly rolling correlations.

      Comment


      • #4
        OK, but what does that mean? Correlations between monthly means? Mean correlations for each month of daily values? etc.

        Comment


        • #5
          Sorry, should have stated this more clearly from the beginning. Indeed, I'm looking for mean correlations for each month of daily values.

          Comment


          • #6
            This seems to get less clear as we proceed. How do you reconcile "mean correlations of each month of daily values" with a 60 month window? Do you mean you want to do a correlation for each month, and then average those monthly correlations over 60 month windows? I worry about that approach if only because, in general, averaging correlation coefficients is rarely a useful or meaningful thing to do.

            Or perhaps the monthly aspect of the data is more about framing? That is, do you want a 60 month window that correlates the daily observations, and perhaps with the windows separated by a month instead of by a day?

            Nick's suggestion in #4 of reducing the data set to monthly means and then correlating the monthly means over a 60 month window makes sense to me. But I gather that isn't what you have in mind based on what you say in #5.
            Last edited by Clyde Schechter; 15 Dec 2016, 10:54.

            Comment


            • #7
              Sorry for the confusion. Of course the window length has to be specified in days, not months. What confused me here was that in financial markets a month typically consists of 20 or 21 trading days, but that can obviously vary due to weekends or holidays. My first guess to “circumvent” this was specifying the window length in months … Never mind.
              Let’s assume, on average, 250 trading days per year, i.e. 1250 trading days per five years (60 months). I want to require a minimum of 200 valid observations. If the number of observations is less than that, the correlation value should be set missing. Having -tsset- the (daily) data, I would have something like:
              Code:
              rolling r(rho), window(1250) reject(r(N) < 200)) saving(correlations, replace): corr stock_ret market_ret
              More specifically, the obtained correlations then serve as an input for a “market beta” variable that is averaged on a monthly basis.
              The code seems to work perfectly well. However, it is very slow as you mentioned in #2.

              Is there a way to combine the -rangestat- command with Mata to speed things up? Something like:
              Code:
              mata:
              mata clear
              mata set matastrict on
              
              real rowvector mycorr(real matrix X)
              {
                real matrix R
              
                R = correlation(X)
                return(rows(X), R[2,1])
              
              }
              
              end
              
              
              rangestat (mycorr) stock_ret market_ret, interval(date -1250 0) by(permno) casewise
              
              rename (myreg1 myreg2) (nobs rho_s_m)
              
              // Reject results if window < 1250 or number of obs < 200
              isid permno date
              by permno: replace rho_s_m = . if _n < 1250 | nobs < 200

              Comment


              • #8
                Edit -rangestat- command:

                Code:
                ...
                 rename (mycorr1 mycorr2) (nobs rho_s_m)
                ...

                Comment

                Working...
                X