  • How to compute percentiles with rangestat?

    Dear Statalisters,

    Is there a way to compute percentiles with rangestat? I tried putting (pctile) in the part of the command syntax where the statistic of interest is specified:

    rangestat (pctile, p90) var_x, int(year -3 -1) by(group_var)

    but Stata reports that it is not a valid statistic.

    Any guidance will be highly appreciated.

    Thank you.

    Kind regards,
    Mg

  • #2
    A good approach is to seek guidance in the output of help rangestat.

    Doing so first of all tells us that the syntax (pctile, p90) you used is not among those supported by rangestat, so it is no surprise that it did not work.

    Scrolling to nearly the end of the help rangestat output takes us to a section titled

    Moving quantiles using a user-supplied Mata function
    and it is there you will find an example that may point the way to obtain what you need.


    • #3
      Dear William,

      Thank you so much. Let me try the approach you pointed out.

      Kind regards,
      Mg


      • #4
        I am attaching a short data excerpt.

        I installed mm_quantile() after reading the "Moving quantiles using a user-supplied Mata function" section in the rangestat help, as suggested.

        Here is my code:
        mata
        mata clear
        real rowvector myquantile(real colvector X) {
        return(mm_quantile(X, 1, (0.1, 0.25, 0.5, 0.75, 0.9)))
        }
        end

        rangestat (myquantile) complaints, interval(yr -2 0) by(region)

        However, I get the following error:
        _editvalue(): 3200 conformability error
        _mm_quantile(): - function returned error
        mm_quantile(): - function returned error
        mm_quantile(): - function returned error
        myquantile(): - function returned error
        do_flex_stats(): - function returned error
        <istmt>: - function returned error

        This is the first time I am using Mata, so I am a bit clueless. From a few posts I read, it seems this could be a matrix dimension issue, but I could not figure it out.

        Please advise on how to tackle this issue.

        Thank you.

        Kind regards,
        Mg


        • #5
          What's biting here is that you are requesting 5 quantile values, but there are cases where the number of observations (within the same region and within the interval) is less than 5. All you need to do is add a line that returns a missing value when the number of observations is insufficient. If the data example you posted in #4 is saved as "dataex.dta", the following code should do what you want. I have added a couple of manual checks to verify that the problem has been set up properly:
          Code:
          clear all
          use "dataex.dta"
          isid region yr id batch, sort
          
          mata:  
              real rowvector myquantile(real colvector X) {
                  if (rows(X) < 5) return(.)
                  return(mm_quantile(X, 1, (0.1, 0.25, 0.5, 0.75, 0.9)))
              }
          end 
          
          rangestat (count) qobs=complaints (myquantile) complaints, ///
              interval(yr -2 0) by(region)
          
          rename myquantile* (q10 q25 q50 q75 q90)
          
          * spot check results for observation 60
          list if yr == yr[60] & region == region[60]
          sum complaints if inrange(yr, yr[60]-2, yr[60]) & region == region[60], detail
          
          * spot check results for observation 243
          list if yr == yr[243] & region == region[243]
          sum complaints if inrange(yr, yr[243]-2, yr[243]) & region == region[243], detail
          and the spot check results:
          Code:
          . * spot check results for observation 60
          . list if yr == yr[60] & region == region[60]
          
               +----------------------------------------------------------------------------------+
               |   batch      id   region   compla~s     yr   qobs   q10   q25   q50    q75   q90 |
               |----------------------------------------------------------------------------------|
           59. | 4.3e+06   10006      123          2   1979      8     2     3     7   14.5    25 |
           60. | 4.3e+06   10006      123          3   1979      8     2     3     7   14.5    25 |
           61. | 4.3e+06   10321      123          4   1979      8     2     3     7   14.5    25 |
           62. | 4.3e+06   10321      123         25   1979      8     2     3     7   14.5    25 |
           63. | 4.3e+06   10321      123          3   1979      8     2     3     7   14.5    25 |
               +----------------------------------------------------------------------------------+
          
          . sum complaints if inrange(yr, yr[60]-2, yr[60]) & region == region[60], detail
          
                                   complaints
          -------------------------------------------------------------
                Percentiles      Smallest
           1%            2              2
           5%            2              3
          10%            2              3       Obs                   8
          25%            3              4       Sum of Wgt.           8
          
          50%            7                      Mean                9.5
                                  Largest       Std. Dev.      8.468429
          75%         14.5             10
          90%           25             10       Variance       71.71429
          95%           25             19       Skewness       .8660691
          99%           25             25       Kurtosis       2.333757
          
          . 
          . * spot check results for observation 243
          . list if yr == yr[243] & region == region[243]
          
               +---------------------------------------------------------------------------------+
               |   batch      id   region   compla~s     yr   qobs   q10   q25   q50   q75   q90 |
               |---------------------------------------------------------------------------------|
          240. | 4.4e+06   10015      251          7   1982     20    .5     1     4   6.5    12 |
          241. | 4.4e+06   10015      251         15   1982     20    .5     1     4   6.5    12 |
          242. | 4.4e+06   10015      251          6   1982     20    .5     1     4   6.5    12 |
          243. | 4.4e+06   10015      251          0   1982     20    .5     1     4   6.5    12 |
               +---------------------------------------------------------------------------------+
          
          . sum complaints if inrange(yr, yr[243]-2, yr[243]) & region == region[243], detail
          
                                   complaints
          -------------------------------------------------------------
                Percentiles      Smallest
           1%            0              0
           5%            0              0
          10%           .5              1       Obs                  20
          25%            1              1       Sum of Wgt.          20
          
          50%            4                      Mean                5.1
                                  Largest       Std. Dev.      5.892904
          75%          6.5              7
          90%           12              9       Variance       34.72632
          95%           20             15       Skewness       2.198081
          99%           25             25       Kurtosis       7.778285
          
          .
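
          The eyeball comparison above can also be automated. What follows is only a small sketch: it assumes the code from this post has just been run (so that qobs and q10 through q90 exist), and that mm_quantile() and summarize use the same percentile definition, as the matching output suggests. The tolerance 1e-8 is an arbitrary choice.

          ```stata
          * automated version of the two spot checks (observations 60 and 243)
          foreach i in 60 243 {
              * same window as rangestat: current year and the two before, same region
              quietly summarize complaints ///
                  if inrange(yr, yr[`i']-2, yr[`i']) & region == region[`i'], detail

              * the window count should match the number of observations summarized
              assert qobs[`i'] == r(N)

              * each rolling quantile should match the percentile from summarize
              foreach p in 10 25 50 75 90 {
                  assert reldif(q`p'[`i'], r(p`p')) < 1e-8
              }
          }
          ```

          If any assert fails, Stata stops with an error, which would point to a window or by-group that was not set up as intended. Note that the check only makes sense for observations with at least 5 values in the window, since the quantiles are missing otherwise.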


          • #6
            Using your sample data, I get the same result that you do, and I am equally at a loss as to what the problem is.

            Since your intent in post #1 was to get the 90th percentile, the following change to your Mata code - based on the documentation in help mf_mm_quantile - should get what you need,
            Code:
            return(mm_quantile(X, 1, 0.9))
            Added in edit: crossed with Robert's definitive answer.
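
            For the question as asked in #1, the single-percentile route might look like the following minimal, self-contained sketch. It assumes rangestat and moremata (which provides mm_quantile()) are installed from SSC. The toy data and the function name myp90 are made up for illustration, and the wildcard in list avoids relying on how rangestat names the result variable.

            ```stata
            * assumes: ssc install rangestat  and  ssc install moremata
            clear all
            input group_var year var_x
            1 2000 10
            1 2001 20
            1 2002 30
            1 2003 40
            1 2004 50
            2 2000  5
            2 2001 15
            2 2002 25
            end

            mata:
                // hypothetical helper: 90th percentile of the values in the window
                real rowvector myp90(real colvector X) {
                    return(mm_quantile(X, 1, 0.9))
                }
            end

            * window = the three years before the current one, within group_var
            rangestat (myp90) var_x, interval(year -3 -1) by(group_var)

            list group_var year var_x myp90*, sepby(group_var)
            ```

            Because only one quantile is requested, any window with at least one observation can be handled; the first year of each group has no earlier data in its window, so its result should be missing.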


            • #7
              Dear Robert and dear William,

              Thank you very much for your responses!

              Just reached my office and will be running the code soon.

              Kind regards,
              Mg


              • #8
                Dear Robert and dear William,

                The code is working just fine.

                Thank you again for the help!

                Kind regards,
                Mg


                • #9
                  You can also use the perc(k) option of asrol (available from SSC) for rolling-window percentiles. You will find that asrol is blazingly fast. To install asrol:

                  Code:
                   
                   ssc install asrol
                  The option perc(k) returns the k-th percentile of the values in a range. It must be used in combination with the option stat(median). Without perc(k), stat(median) returns the median (the 50th percentile) of the values in a given window; with perc(k), stat(median) returns the k-th percentile instead. For example, to find the 75th percentile of the values in a rolling window, specify perc(.75) together with stat(median).

                  The following example finds the 75th percentile of the variable profitability in a rolling window of 5 years for each industry in each country.

                  Code:
                   
                   bys country industry : asrol profitability, window(year 5) stat(median) perc(.75)
                  Note:
                  The calculation of percentiles follows a method similar to the one used by summarize and _pctile. Therefore, the percentile values might differ slightly from those calculated with centile. For details on the different definitions of percentiles, see Hyndman and Fan (1996).

                  Hyndman, R., & Fan, Y. (1996). Sample Quantiles in Statistical Packages. The American Statistician, 50(4), 361-365. doi:10.2307/2684934
                  Regards
                  --------------------------------------------------
                  Attaullah Shah, PhD.
                  Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                  FinTechProfessor.com
                  https://asdocx.com
                  Check out my asdoc program, which sends outputs to MS Word.
                  For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
