Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • tsegen : rolling percentile

    I have tried to generate rolling 5th percentile of the variable rateofreturn using tsegen(SSC).

    Code:
    tsegen pct = rowpctile(L(0/59).rateofreturn,24),p(5)

    The result that I get, however, does not match when I do it in Excel. For example, in row 60 Stata gives -.045 whereas Excel gives .1215. Is something wrong with my code?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(mdate id) double rateofreturn float pct
    420 1  .48     .
    421 1  .89     .
    422 1 1.18     .
    423 1    1     .
    424 1  1.5     .
    425 1  1.6     .
    426 1  1.3     .
    427 1 1.01     .
    428 1 1.44     .
    429 1 1.55     .
    430 1  1.2     .
    431 1 1.51     .
    432 1  .97     .
    433 1 1.21     .
    434 1  .42     .
    435 1  .82     .
    436 1 1.14     .
    437 1 1.09     .
    438 1  .18     .
    439 1  .62     .
    440 1 1.24     .
    441 1 1.18     .
    442 1  .85     .
    443 1    1   .42
    444 1   .7   .42
    445 1  .84   .42
    446 1  .66   .42
    447 1  .82   .42
    448 1  .71   .42
    449 1 1.38   .42
    450 1 2.28   .42
    451 1  .86   .42
    452 1  .66   .42
    453 1  .96   .42
    454 1  .75   .42
    455 1  .76   .42
    456 1  .77   .42
    457 1  .78   .42
    458 1 1.03   .42
    459 1  .39  .405
    460 1  .43   .42
    461 1  .34   .39
    462 1  .58   .39
    463 1 -.43   .34
    464 1 -.55   .18
    465 1 -.23  -.23
    466 1 1.27  -.23
    467 1 1.05  -.23
    468 1 1.38  -.23
    469 1  .75  -.23
    470 1 1.53  -.23
    471 1 1.61  -.23
    472 1  .51  -.23
    473 1  .53  -.23
    474 1  .71  -.23
    475 1  .14  -.23
    476 1   .4  -.23
    477 1  .45  -.23
    478 1  .89  -.23
    479 1 1.53 -.045
    480 1   .9 -.045
    481 1 1.98 -.045
    482 1  .21 -.045
    483 1 -.09  -.16
    484 1   .1  -.16
    485 1 1.01  -.16
    486 1  .67  -.16
    487 1  .48  -.16
    488 1    1  -.16
    489 1  .28  -.16
    490 1  .05  -.16
    491 1   .5  -.16
    492 1 1.94  -.16
    493 1 1.44  -.16
    494 1 1.64  -.16
    495 1 1.36  -.16
    496 1 -.31  -.27
    497 1 -.07  -.27
    498 1  .85  -.27
    499 1 1.53  -.27
    500 1   .4  -.27
    501 1  .92  -.27
    502 1  .58  -.27
    503 1  .21  -.27
    504 1  .91  -.27
    505 1 -.31  -.31
    506 1  .56  -.31
    507 1  .75  -.31
    508 1 -.13  -.31
    509 1  .32  -.31
    510 1 -1.6  -.37
    511 1 -.77  -.49
    512 1 1.57  -.49
    513 1  .77  -.49
    514 1    2  -.49
    515 1 2.02  -.49
    516 1 3.45  -.49
    517 1 1.02  -.49
    518 1 1.09  -.49
    519 1 1.22  -.49
    end
    format %tm mdate




  • #2
    For appropriate context you could read the methods and formulas in https://www.stata.com/manuals/rsummarize.pdf and the discussion of the many ways to calculate percentiles (quantiles) in e.g. the paper referenced at https://www.jstor.org/stable/2684934

    Your example is just the 5% percentile of the first 60 values. After reading in your example data (thanks), it is sufficient to go

    Code:
    . su rateofreturn in 1/60, detail
    
                            rateofreturn
    -------------------------------------------------------------
          Percentiles      Smallest
     1%         -.55           -.55
     5%        -.045           -.43
    10%         .365           -.23       Obs                  60
    25%           .6            .14       Sum of Wgt.          60
    
    50%         .855                      Mean               .877
                            Largest       Std. Dev.      .5077144
    75%        1.205           1.55
    90%         1.52            1.6       Variance       .2577739
    95%        1.575           1.61       Skewness      -.3183908
    99%         2.28           2.28       Kurtosis        3.93004
    to see first that tsegen is giving you what summarize would give you.

    Let's turn the magnifying glass on the tail of lower values.

    Code:
    keep in 1/60
    sort rateofreturn
    
    . list rate in 1/5
    
         +----------+
         | rateof~n |
         |----------|
      1. |     -.55 |
      2. |     -.43 |
      3. |     -.23 |
      4. |      .14 |
      5. |      .18 |
         +----------+
    
    . di (-0.23 + 0.14)/2
    -.045
    So that's a hand-calculation confirming the Stata estimate of the 5% point as just half-way between the 3rd and 4th ranked values in a sample of 60.

    Your (unnamed) Excel function is also producing a value between those two values. It's just using a different rule. It should be documented somewhere in the Excel universe; otherwise detective work and/or reading of the literature may unearth what the rule is.

    My wild guess is that Excel wants here to find for the 5% point the value with rank

    1 + (60 - 1) * 5/100

    and that it interpolates linearly when that result is not an integer.

    Summary: There are different small rules for this. Stata uses one; Excel uses another one in your application.

    Comment


    • #3
      Thanks for the insight Nick Cox . In Excel, I used =PERCENTILE.INC(array,k).

      Comment


      • #4
        OK, so presumably there is documentation of the exact rule. If not, abandon software that doesn't explain its statistical rules.

        Comment


        • #5
          I think another general point that this example illustrates is that there are different approaches to defining percentiles. In idealized continuous data distributions, there is no ambiguity. But in finite data samples, the desired quantile falls in between observed data points, and any value inside that gap is a potentially valid estimate of the quantile. Different software package handle this differently. In fact, even the same software package sometimes offers different ways. See, for example Stata's -pctile- command which, by default, uses the middle of the gap (just as -summ- does), but has an -altdef- option which will, instead, interpolate. Stata's PDF documentation fully explains both methods, so you know what you are getting.

          Comment

          Working...
          X