  • Using rangestat when there are gaps in the date identifier

    Hi,

    I am trying to use rangestat to perform rolling regressions over a data panel. My data correspond to daily observations as of Friday of each week. Dates are formatted as daily (%td). This is the command I am using:

    xtset ticker date_end, delta(7)
    rangestat (reg) excess_ret excess_mkt anomaly, interval(date_end -260 0) by(ticker)

    Looking at the results of this command, I find that the number of observations in the regressions is equal to 38, which roughly matches 260/7. I also realize that changing the interval to 1820 (which is equal to 260 * 7) gives 260 observations in each regression. However, I am not sure if this approximation is right.

    I also tried to format the dates to weekly (%tw), but this creates repeated values within groups. Also, I tried creating an index 1,...,n, to identify weeks, but this seems problematic since I also have gaps in my data.

    I would like to know if there is any way to correctly format the dates to get the results I want without having to approximate the size of the interval.

    Thanks.

  • #2
    I am completely confused by your description of the situation. It is completely unclear whether you have daily or weekly observations. And knowing that your date variable is formatted %td doesn't necessarily make it daily. You can format any number with %td, or any other date-time format. But that doesn't change how the date variable actually works or what it contains. I can assure you that no amount of re-formatting of your data will solve this problem. It is a question of figuring out what kind of data you actually have, and then perhaps having to transform it to some other data, or perhaps changing the arguments of the -interval()- option.

    It is also unclear what you want the results to look like. Just what time range do you want? (Never mind the representation in the variables--explain carefully in words what actual dates should be included in your rolling regression.)

    I think in order to understand what is going on, you should show an example of your data. Please use the -dataex- command to do that. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
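
    As a rough illustration of the request above, a call along these lines would post the first 20 observations. The variable names are taken from the command in #1; the choice of 20 rows is only an assumption about what makes a useful example.

    ```stata
    * List the first 20 observations of the relevant variables in a form
    * that others can paste straight into Stata
    dataex ticker date_end excess_ret excess_mkt anomaly in 1/20
    ```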


    • #3
      Clyde Schechter makes excellent points.

      My guess is that you are troubled by intervals 261 [sic] days long capturing different numbers of observations because of the occurrence of not just weekends but also holidays, to which two answers are

      1. Why worry? The data are what they are.

      2. Use a business calendar.

      rangestat by the way ignores xtset settings. Indeed, its longest root was for tackling problems in which xtset or tsset is either fairly useless (because data are mostly gaps) or completely useless (because there can be multiple observations with the same identifier and time).


      • #4
        Hi all.

        Thanks for your advice and sorry for any confusion I may have caused.

        Going back to my question, my data look like this:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long ticker int date_end float(excess_ret excess_mkt anomaly)
        1 16072    .104384  2.3311381  2.1091588
        1 16079   9.828722   .5370846  2.0708468
        1 16086   4.985981    .650857   3.864417
        1 16093   .4456983  1.6564808 -.13345626
        1 16100  .26526782 -2.3136055   .7814444
        1 16107 -.03845385  1.1595386   4.051586
        1 16114  -.6696562   2.138609   1.380939
        1 16121  -4.104428   .4275139  -1.267698
        1 16128  -5.097979  .10247156 -2.2284052
        1 16135  11.290337  1.1343857  -2.939527
        1 16142  -1.793276  -3.795134   .3828012
        1 16149  -6.670278 -1.9485587    1.66777
        1 16156  3.5154846   .8989193 -2.2360399
        1 16163 -2.3583226  3.9483554 -1.1824989
        1 16170  1.9345993  1.4185913   .8184172
        1 16177   9.790351 -.06364479 -1.0510018
        1 16184   3.701107  -.7074504   .0581892
        1 16191 -1.4558245  -2.704473   .7970957
        1 16198 -3.8589404  -1.055533 -2.0847619
        1 16205  -5.385038 -3.1201224 -2.1353164
        end
        format %td date_end
        label values ticker ticker
        label def ticker 1 "A3M.MC", modify

        My sample includes observations on the Friday of each week from January 2004 to December 2019 for 120 tickers. For each ticker and date, I want to run a regression using the previous 260 observations available (or weeks available). When using rangestat to achieve this, I run the command

        xtset ticker date_end, delta(7)
        rangestat (reg) excess_ret excess_mkt anomaly, interval(date_end -260 0) by(ticker)
        However, it seems to me rangestat is counting days and not observations or rows. Thus, the total number of observations I get in each regression is 38 and not 260 as I would like. Notice that 38 is close to 260/7.

        To get what I need, I was thinking about creating an index variable like week = 1, ... , N, and then running the rangestat command again using this variable in the interval. But I want to be sure there is no other alternative, like formatting the date or modifying the rangestat function to do what I want directly using my date variable.

        I hope I was clearer in this post.


        Thanks again for your support.


        • #5
          Indeed; rangestat (a command, not a function) just does what you tell it to do, to look for observations in the interval [date_end - 260, date_end] and the solution for you is to work with a different counter. But don't use Stata's weekly functions -- which are not a good idea here -- or try messing with the display format -- as Clyde's already explained, that won't work.

          floor(date_end/7) or ceil(date_end/7) will map successive Friday dates to successive integers.
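
          A minimal sketch of that suggestion, assuming the variable names from #4; week is a hypothetical name introduced here. Stata daily date 0 is Friday 1 January 1960, so every Friday date is an exact multiple of 7 and dividing by 7 yields consecutive integers. Note that offsets -259 through 0 span 260 weeks, whereas -260 through 0 would span 261.

          ```stata
          * Weekly counter: successive Fridays map to successive integers
          gen long week = floor(date_end/7)

          * Rolling regression over the current week and the 259 weeks before it
          rangestat (reg) excess_ret excess_mkt anomaly, interval(week -259 0) by(ticker)

          * reg_nobs holds the number of observations used in each window;
          * it falls below 260 wherever a ticker has gaps or too little history
          summarize reg_nobs
          ```

          If the aim is instead the last 260 observations regardless of gaps, one alternative (not proposed above, and to be run in place of the first block rather than after it) is to count observations within each ticker; obs is again a hypothetical name.

          ```stata
          * Observation counter within ticker, in date order
          bysort ticker (date_end): gen long obs = _n

          * Rolling regression over the current and the 259 preceding observations
          rangestat (reg) excess_ret excess_mkt anomaly, interval(obs -259 0) by(ticker)
          ```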


          • #6
            Understood.
            Thank you very much to both of you.
