Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Consecutive observations by mofd

    Hello everyone,
    I am working with panel data and I would like to kindly ask you for some guidance. The problem may seem trivial to some of you, but I have to admit I do not know how to deal with my problem.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double crsp_fundno long caldt float totalreturn
    2706 15279   .024748646
    2706 16282  -.013114043
    2706 16314    .01105325
    2706 16344   .013855805
    2706 16373    .01221272
    2706 16405    .02567463
    2706 16436   .023424506
    2706 16467  -.011213606
    2706 16495    .00991609
    2706 16526  -.011934737
    2706 16555  -.002844098
    2706 16587   .022079043
    2706 16617   .007660258
    2706 16646   .015898954
    2706 16679   .006798665
    2706 16709   .004056035
    2706 16740  -.013472764
    2706 16770   .021835493
    2706 16800   .012771065
    2706 16832    .02111781
    2706 16860 -.0013869605
    2706 16891    .00344977
    2706 16919   .008940833
    2706 16952   -.02251356
    2706 16982  .0006903863
    2706 17013  .0041847653
    2706 17044   .017356306
    2706 17073   .013650396
    2706 17105   .020194326
    2706 17135   .014516724
    2706 17164   .005006196
    2706 17197   .007528771
    2706 17225  .0013551157
    2706 17255   .005412986
    2706 17286    .02227978
    2706 17317   .013191413
    2706 17346  -.007165887
    2706 17378  -.008541029
    2706 17409   .008599109
    2706 17437    .02756282
    2706 17470     .0204282
    2706 17500  -.015645668
    2706 17531  -.004752705
    2706 17562   -.02759434
    2706 17591  -.013820929
    2706 17622   -.01256211
    2706 17652   .031387217
    2706 17682   .005068702
    2706 17713   -.04542773
    2706 17744  -.014363908
    2706 17773  -.001528342
    2706 17805   -.07752702
    2706 17836   -.11563985
    2706 17864   -.03855668
    2706 17897    .04511837
    2706 17927    -.0402251
    2706 17955    -.0566357
    2706 17987    .04560557
    2706 18017    .06875563
    2706 18046    .05882353
    2706 18078   .004115226
    2706 18109    .05840164
    2706 18140   .027105516
    2706 18170   .033930253
    2706 18200  -.006381039
    2706 18231    .03119266
    2706 18262   .009373898
    2706 18291  -.014362657
    2706 18319   .013661202
    2706 18352    .03504043
    2706 18382   .012152778
    2706 18410   -.04631218
    2706 18443   -.01528777
    2706 18473    .04840183
    2706 18505  -.014808362
    2708 15279    .02570281
    2708 16405    .04043605
    2708 16436    .03043185
    2708 16467   -.01996008
    2708 16495   .016972166
    2708 16526  -.015353805
    2708 16555   -.01016949
    2708 16587   .028767124
    2708 16617   .007989348
    2708 16646   .027741084
    2708 16679   .003213368
    2708 16709   .010890455
    2708 16740  -.015842838
    2708 16770    .03090792
    2708 16800   .014739787
    2708 16832   .029126214
    2708 16860  -.003144654
    2708 16891   .010094637
    2708 16919    .01436602
    2708 16952  -.032635465
    2708 16982  .0006365372
    2708 17013 -.0006361323
    2708 17044    .01909612
    2708 17073    .01623985
    2708 17105    .02704364
    end
    format %td caldt
    This is just a small part of the entire data.
    I am using 60 month estimation window to get alpha for funds in my sample, in the later stage. The data was provided on consecutive monthly basis. Normally in a ideal circumstances I would use

    bys crsp_fundno: gen obs=_N
    drop if obs<60
    drop obs,

    and then run the rangestat with interval of 60. But the problem is that in the meantime, I had to drop observations when a fund for example was too volatile or too small. Thus, there is a possibility that I will not have 60 consecutive months for my estimation window for a fund (for example, in the 60 month window for a fund with 65 monthly observations, if any (except first or last 5) month is missing). This is slightly (or more) too much for my current STATA level. This is why I would like to kindly ask you which command should I use to check how many time such situation happens in my sample that although a fund has 60 months but those are not consecutive, and what command should I use with the rangestat (reg) command the way it would not calculate anything when the window consist of 60 months that are not consecutive?

    Thank you very much!

    Best,
    Rafał



  • #2
    Here is a way to get the results you want:

    Code:
    // CREATE A MONTHLY DATE VARIABLE
    gen mdate = mofd(caldt)
    format mdate %tm
    
    by crsp_fundno (mdate), sort: gen gap = mdate != mdate[_n-1] + 1 & _n > 1
    
    rangestat (max) gap (mean) totalreturn (count) N = totalreturn, ///
        by(crsp_fundno) interval(mdate -59 0)
        
    replace totalreturn_mean = . if N < 60 | gap_max
    The first step is to create a monthly date variable--it is hopeless to work with the daily date variables for this problem. When the essence of the data is monthly, you need a monthly date variable.

    The next thing is to identify non-consecutive values of the monthly date variable.

    Finally -rangestat- comes in. You did not describe much about your rolling window estimation. You did not say what you are estimating, nor the location of the rolling window with respect to the index observation. So, for illustration purposes, the code above shows how to get a rolling 60 day lagged window including the index date and calculate the mean of totalreturn in the window. Note that in addition to calculating the rolling mean, we also calculate the number of non-missing ob servations of totalreturn in the window (N), and whether or not the window contains any gaps (gap_max)

    -rangestat- does not allow -if- or -in- conditions. So you can't get rangestat to refrain from calculating anything when the window lacks 60 months or is non-consecutive. What you have to do is just let -rangestast- calculate everything and then replace the result by a missing value when the additional requirements for the window are not met.

    Comment


    • #3
      Oh apologies Clyde for poor description.
      This is how my code for rangestat looked like:

      rangestat (reg) rirf mkt, interval(datecode -60 0) by(crsp_fundno) excludeself,

      which is used to calculate constant for each fund, updated monthly based on a rolling estimation window. Thus for each fund in month I run time-series regression using sixty months of returns data.

      Comment


      • #4
        I am ashamed to say that I incorrectly presented my problem. If you could reflect on my problem again I would really appreciate that. Big apologies.
        Short introduction of variables and logic behind analysis:
        1. Crsp_fundnos are part of crsp_portnos ; a portno can have 1, or many fundos.
        2. Crsp_fundnos are used to calculate value weighted returns for a crsp_portno in each month; thus in the sample there are duplicates of portnos in the same month.
        3. I cannot delete the duplicates, because I will lose other information attached to crsp_fundno
        4. mkt is market return
        5. rirf is return of the portno minus riskfree rate.

        For my analysis I need to run a moving estimation window of 60 months for my crsp_portnos and their respective rirfs over mkts with -rangestat (reg) rirf mkt, interval(datecode -60 0) by(crsp_fundno) excludeself,-
        Therefore, what I need to do first is to drop crsp_portnos that have less than 60 months in the entire sample period. Then, since the window cannot have gaps, I also need to drop crsp_portnos that have these gaps.

        But when I run command -by crsp_portno (mofd), sort: gen gap = mofd != mofd[_n-1] + 1 & _n > 1-, I get "0" when the months are consecutive, 1 when the months are not consecutive, and 1 when the months are duplicates (eg. 5th last observation). Therefore, I cannot simply drop observations for which gap==1. May I kindly ask you for help again how to work with this problem?

        Extract of the sample:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(crsp_fundno crsp_portno) float(mofd gap totalreturn) double rf
        27912 1000502 540 0  -.07568027                .0016
        27914 1000502 540 1  -.07568027                .0016
        27912 1000502 541 0  .005501574                .0016
        27914 1000502 541 1  .005501574                .0016
        27912 1000502 542 0 -.008718219                .0021
        27914 1000502 542 1 -.008718219                .0021
        27914 1000502 543 0  -.05552376                .0021
        27912 1000502 543 1  -.05552376                .0021
        27912 1000502 544 0   .12067322                .0024
        27914 1000502 544 1   .12067322                .0024
        27914 1000502 545 0 -.017471263                .0023
        27912 1000502 545 1 -.017471263                .0023
        27914 1000502 546 0   .06445238                .0024
        27912 1000502 546 1   .06445238                .0024
        27912 1000502 547 0  -.04817391                 .003
        27914 1000502 547 1  -.04817391                 .003
        27914 1000502 548 0  .008999858                .0029
        27912 1000502 548 1  .008999858                .0029
        27914 1000502 549 0  .010574147                .0027
        27912 1000502 549 1  .010574147                .0027
        27912 1000502 550 0   .06298639                .0031
        27914 1000502 550 1   .06298639                .0031
        27914 1000502 551 0 -.011978362                .0032
        27912 1000502 551 1 -.011978362                .0032
        27912 1000502 552 0   .05915558 .0034999999999999996
        27914 1000502 552 1   .05915558 .0034999999999999996
        27912 1000502 553 0 -.001705507 .0034000000000000002
        27914 1000502 553 1 -.001705507 .0034000000000000002
        27914 1000502 554 0  .015842576                .0037
        27912 1000502 554 1  .015842576                .0037
        27912 1000502 555 0  -.01658717                .0036
        27914 1000502 555 1  -.01658717                .0036
        27912 1000502 556 0 -.071240306                .0043
        27914 1000502 556 1 -.071240306                .0043
        27914 1000502 557 0   -.0097166                 .004
        27912 1000502 572 1   .01296426                .0032
        27912 1000502 573 0   .03977862                .0032
        27912 1000502 574 0  -.02461743 .0034000000000000002
        27912 1000502 575 0  .010231923                .0027
        27912 1000502 576 0  -.06853478                .0021
        end
        More than grateful for your help.

        Last edited by Rafal Krol; 23 Dec 2021, 09:43.

        Comment


        • #5
          If you change --by crsp_portno (mofd), sort: gen gap = mofd != mofd[_n-1] + 1 & _n > 1- to
          Code:
          -by crsp_portno (mofd), sort: gen gap = mofd > mofd[_n-1] + 1 & _n > 1-
          you will identify gaps without marking duplicates.

          Comment


          • #6
            Thank you very much Clyde. I really do appreciate it.

            Comment


            • #7
              Hello everyone,
              I would like to kindly ask for some assistance, again. The code worked perfectly fine, but I would like to ask for a little adjustment.
              The code used so far:
              Code:
              bys crsp_portno: gen obs=_N
              drop if obs<60
              by crsp_portno (mofd), sort: gen gap = mofd > mofd[_n-1] + 1 & _n > 1
              bysort crsp_portno (mofd): egen todrop = max(gap>0)
              drop if todrop
              I can see that a company that has a history of say 500 months might be deleted because it has a gap at the very beginning, and thus I might lose many important companies in similar situations.
              May I kindly ask you to tell me how to possibly "trim" the unneeded part, and to preserve the longest interval?

              Thank you in advance.

              Comment


              • #8
                I'm using the example data from #4 here in working up the code. Consider the following:
                Code:
                by crsp_portno (mofd), sort: gen byte run = sum(mofd > mofd[_n-1] + 1)
                by crsp_portno run (mofd), sort: gen duration = mofd[_N] - mofd[1]
                drop if duration < 60
                This will eliminate from the data any run of consecutive dates (but allowing the same date to be repeated) that does not have a duration of at least 60 days beginning to end. You then say you want to keep the longest such run. But what if the same crsp_portno has two or more runs that are each, say, 130 days, and those are all "the longest?" Which one do you want to keep? The following will keep, from among those, the one that is chronologically latest:
                Code:
                by crsp_portno duration run (mofd), sort: keep if run == run[_N]
                That is the easiest selection to code. If you want something different, post back with an explanation.

                Added: By the way, though not relevant to the current question, I notice that you are using -rangestat- with -interval(datecode -60 0)-. Presumably datecode is a daily date. But, I wanted to point out that this is a 61 day window, not a 60 day window, as both interval endpoints are included.
                Last edited by Clyde Schechter; 25 May 2022, 13:58.

                Comment


                • #9
                  Thank you very much Clyde. If anything comes up I will let you know. Thank you for the valuable insight on rangestat. I doubt that I would ever notice this error....

                  Comment


                  • #10
                    Please ignore this message. Made a mistake in my post and need to check it. If there will be a possibility to edit this message I will do so, if not I will post a new one.
                    Last edited by Rafal Krol; 10 Jul 2022, 09:29.

                    Comment


                    • #11
                      Hello everyone,
                      Adjusting my study, I would like to run the rolling window but I would like to make sure, when I run the estimation on 60 month rolling window, that there is at least 24 non-missing observations in terms of returns. At the moment, the code makes sure that there are at least 60 observations and the longest run is kept.

                      This time (I am sorry for the changes, but you did not waste your time as I learned a lot from your insights) I would like to make sure that there are at least 60 observations (otherwise I cannot run the 60 month window) first, which is done by

                      Code:
                      by crsp_portno (mofd), sort: gen duration = mofd[_N] - mofd[1]
                      drop if duration < 60
                      and now - before running regressions, I need to make sure that there are at least 24 non-missing observations in terms of returns which is variable "totalreturn" in any 60 month rolling window.
                      I think that it is the simplest possible explanation. Before this post, my needs were slightly different and covered only some scenarios, such as consecutive months, then the gaps and lastly the longest period.

                      May I kindly ask you to help me with this one again?
                      More than thankful for any help in my query.

                      Best,
                      Rafał

                      Comment


                      • #12
                        I'm not sure I understand what you are asking for, but I think it's this:
                        Code:
                        by crsp_portno (mofd), sort: egen n_totalreturn = count(totalreturn)
                        drop if n_totalreturn < 24
                        Added: Sorry, that's to assure there are at least 24 non missing values of total_return in the data corresponding to crsp_portno. I realize now you want each window to have 24 values. That's more complicated, and you can't do it by -drop-ing observations, because dropping from one window can interfere with other windows that overlap it. What you have to do is just run the regressions for all of the windows and then go back and remove the results from those windows where the sample size turned out to be < 24.

                        You can use -rangestat- to do the regressions and then rely on the variable reg_nobs it creates:

                        Code:
                        rangestat (reg) totalreturn whatever, by(crsp_fundno) excludeself interval(date -60 -1)
                        foreach v of varlist reg_42-se_cons {
                            replace `v' = . if reg_nobs < 24
                        }
                        Note: It isn't clear what your regression command actually is and whetehr totalreturn is on the right or left hand side, so I've just glossed it in the above. Replace the italicized material with the actual regression.
                        Last edited by Clyde Schechter; 10 Jul 2022, 10:30.

                        Comment


                        • #13
                          Total return is left and a benchmark is right hand side. Sorry. I think this is what I needed!
                          Thank you very much Clyde.

                          Comment

                          Working...
                          X