  • Generating a count variable using the 90th percentile centered on a five-day window


    Hi, I am working with a panel dataset of 300 districts from 1959 to 1999 in Stata 13. I have daily mean, minimum and maximum temperature data; an example is given below.

    I am trying to identify the days on which the daily maximum temperature of a district in the given period exceeds its 90th percentile centred on a 5-day window for the base period (in my case 1959 to 1999). Specifically, I need to generate a variable which takes the value 1 if the maximum temperature exceeds the 90th percentile centred on a 5-day window and 0 otherwise.
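One way to state the target variable precisely, assuming that "a 5-day window centred on day t" means day t plus two calendar days on either side, pooled over every year of the base period. The symbols are introduced here for illustration: $T^{\max}_{i,t}$ is the maximum temperature of district $i$ on date $t$, $c(\cdot)$ is the calendar day of a date (ignoring the year), and $Q_{0.9}$ is the 90th percentile of the set.

```latex
E_{i,t} = \mathbf{1}\!\left[\, T^{\max}_{i,t} >
  Q_{0.9}\!\left(\left\{ T^{\max}_{i,s} \;:\; |c(s) - c(t)| \le 2,\;
  1959 \le \operatorname{year}(s) \le 1999 \right\}\right) \right]
```

With about 40 years of data, the set inside $Q_{0.9}$ has roughly $5 \times 40 = 200$ elements per district and calendar day. Whether the distance $|c(s) - c(t)|$ should wrap around the new year is a further choice this definition leaves open.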

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(DT MT YEAR) double(LAT LON) float(MEANT MINT MAXT id)
    1 1 1969 7.5 94.5  25.2 20.32 30.08  1
    1 1 1970 7.5 94.5  27.7 25.42 29.97  2
    1 1 1971 7.5 94.5 26.72 24.26 29.18  3
    1 1 1972 7.5 94.5 25.39 21.73 29.05  4
    1 1 1973 7.5 94.5 27.63 24.88 30.38  5
    1 1 1974 7.5 94.5 26.39 24.41 28.38  6
    1 1 1975 7.5 94.5 26.74 23.62 29.86  7
    1 1 1978 7.5 94.5 27.17 23.86 30.48  8
    1 1 1980 7.5 94.5 26.98 23.75  30.2  9
    1 1 1981 7.5 94.5  26.1 22.04 30.15 10
    1 1 1982 7.5 94.5 26.98 24.41 29.55 11
    1 1 1983 7.5 94.5 25.73 21.64 29.81 12
    1 1 1988 7.5 94.5 25.88 22.01 29.76 13
    1 1 1989 7.5 94.5 27.24 23.92 30.55 14
    1 2 1969 7.5 94.5 25.27 20.83 29.71 15
    end
    (I had a similar issue some time ago and received very helpful suggestions (https://www.statalist.org/forums/for...nuous-sequence); however, I am not sure how to modify that code to incorporate the 90th percentile centered on a 5-day window.)

    Any suggestions will be appreciated.



  • #2

    Hi
    It seems that rangestat (by Robert Picard, Nicholas J. Cox and Roberto Ferrer, available from SSC) is what I need for the above problem (I also looked at tsegen).

    As I need a count variable that takes the value 1 if the maximum temperature exceeds the 90th percentile of a 5-day window centered on the day in question, computed over the base period (in my case 1959 to 1999), I use the following code:

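    Code:
    * needs rangestat and moremata (for mm_quantile): ssc install rangestat, ssc install moremata
    use DATAFILE, clear
    rename (DT MT) (DATE MONTH)
    sort DATE MONTH YEAR LAT LON
    * one identifier per grid point; punct() keeps e.g. 7.5/94.5 distinct from 7.59/4.5
    egen district_idd = concat(LAT LON), punct(_)
    encode district_idd, generate(district_id)
    gen ddate = mdy(MONTH, DATE, YEAR)
    format ddate %td
     
    ***RANGESTAT (CALCULATE 90 PERCENTILE)****
    tsset district_id ddate
    * Mata function returning the 90th percentile of MAXT within each window
    mata:
        mata clear
        real rowvector myquantile(real colvector X) {
            return(mm_quantile(X, 1, 0.9))
        }
    end
    rangestat (myquantile) MAXT, interval(ddate -7 7) by(district_id)
    rename myquantile1 p90
    label var p90 "p90"
    
    *GENERATE COUNT VARIABLE*
    * 1 if MAXT exceeds p90, missing if either value is missing
    gen max90exceeds = MAXT > p90 if !missing(MAXT, p90)
    * number the runs of identical max90exceeds values within each district
    by district_id (ddate), sort: gen max90spell = sum(max90exceeds != max90exceeds[_n-1])
    by district_id max90spell (ddate), sort: gen max90spell_length = _N
    * ">= 0" sets no minimum spell length, so HWD90 is 1 exactly where max90exceeds == 1
    gen byte HWD90 = max90spell_length >= 0 & max90exceeds == 1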
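Because the spell-length condition in the last line never binds, HWD90 only becomes a "heat wave day" flag once a minimum run length is imposed. A minimal sketch on made-up data, assuming a hypothetical threshold of 3 consecutive exceedance days and a simple day counter t in place of ddate. It reuses the spell logic above and assumes consecutive rows within a district are consecutive days, so gaps in the dates would need separate handling.

```stata
* Made-up exceedance sequence for one district (hypothetical values)
clear
input byte district_id int t byte max90exceeds
1 1 0
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 1
1 8 0
end
* Number runs of identical max90exceeds values within each district
by district_id (t), sort: gen max90spell = sum(max90exceeds != max90exceeds[_n-1])
by district_id max90spell (t), sort: gen max90spell_length = _N
* Hypothetical minimum spell length of 3 consecutive exceedance days
local minlen 3
gen byte HWD90 = max90exceeds == 1 & max90spell_length >= `minlen'
list t max90exceeds max90spell_length HWD90
```

Here only days 2 to 4 are flagged: days 6 and 7 exceed the percentile but form a run of only 2 days.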


    My concern is that I am not sure the code above uses the historical data across all years to calculate the percentile. The window should be defined for each of the 365 calendar days and pool all years, separately for each district: for May 3, for example, the percentile should use the data for May 1 to May 5 in every year. With about 40 years of data (1959 to 1999), the 5-day window gives a sample of roughly 40 × 5 = 200 observations for each calendar day. In other words, the moving window is based on the calendar day, not on the full date.
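The calendar-day pooling just described can be sketched in base Stata without rangestat. This is a swapped-in technique, not the rangestat/mm_quantile code from this thread: each observation is copied into the windows of the five calendar days within 2 days of it, and collapse takes the 90th percentile per district and calendar day. The toy data, the non-leap reference year 1999 used to number calendar days, folding 29 February into 28 February, and the names cal_day, target_day and n_window are assumptions for illustration. collapse's (p90) uses Stata's own percentile definition, which may differ slightly from mm_quantile's.

```stata
* Toy panel with made-up values: one district, three non-leap years (1961-1963)
clear
set obs 1095
gen district_id = 1
gen ddate = mdy(1, 1, 1961) + _n - 1
format ddate %td
gen MAXT = 25 + mod(_n, 10)
* Calendar day on a fixed non-leap reference year; 29 February folded into 28 February
gen cal_day = doy(mdy(month(ddate), day(ddate), 1999))
replace cal_day = 59 if month(ddate) == 2 & day(ddate) == 29
tempfile panel
save `panel'

* Copy each observation into the windows of the 5 calendar days within +/-2 days of it
gen long obs_id = _n
expand 5
bysort obs_id: gen offset = _n - 3                   // -2, -1, 0, 1, 2
gen target_day = mod(cal_day + offset - 1, 365) + 1  // wraps 31 Dec <-> 1 Jan
* 90th percentile and window size per district and calendar day, pooled over all years
collapse (p90) p90 = MAXT (count) n_window = MAXT, by(district_id target_day)
rename target_day cal_day

* Attach the calendar-day percentile to every year of the panel
merge 1:m district_id cal_day using `panel', nogenerate
gen byte max90exceeds = MAXT > p90 if !missing(MAXT, p90)
sort district_id ddate
```

n_window should equal 5 times the number of years (15 in the toy data, roughly 200 over the base period). expand temporarily multiplies the data set by 5, which may matter for memory with 300 districts.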

    I am not sure whether I have applied rangestat properly to capture this.
    I would appreciate any comments.



    Last edited by Jaweriah Abdullah; 08 Feb 2019, 00:59.



    • #3
      I don't think your code is quite right. But the more I read this thread, the more confusing I find it. You said you are trying to use a 5-day window, so why would you use -7 to 7 in the interval for -rangestat-?

      In any case, if you want your window to cover 5 days before to 5 days after the date but include those dates in all 40 years, then your code will give you incorrect answers. Your variable ddate gives the full date, so your -rangestat- will only look at observations within 5 (or, as you coded it, 7) days in the same year. What you need is this:

      Code:
      gen day_of_year = doy(ddate)
      rangestat (myquantile) MAXT, by(district_id) interval(day_of_year -5 5)  // OR IS IT - 7 to 7???
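Two caveats on a doy()-based window, shown on made-up dates. First, doy() counts 29 February, so after February the same calendar date gets a different day number in leap years (1 March is day 61 in 1972 but day 60 in 1973). Second, day numbers do not wrap, so windows at the end of December do not reach into January and vice versa. Also, if the window is meant as in the May 1 to 5 example for May 3 (two days on either side), the interval would be -2 2. The fix for the leap-year shift below, numbering days on a fixed non-leap reference year (1999, a hypothetical choice) and folding 29 February into 28 February, is an assumption rather than something proposed in the thread; cal_day is a hypothetical name.

```stata
* Made-up dates to illustrate how doy() numbers days
clear
input int(YEAR MONTH DATE)
1972 3 1
1973 3 1
1972 2 29
end
gen ddate = mdy(MONTH, DATE, YEAR)
format ddate %td
* doy() counts 29 February: 1 March is day 61 in 1972 (leap year) but day 60 in 1973
gen day_of_year = doy(ddate)
* Hypothetical fix: number days on a non-leap year and fold 29 February into 28 February
gen cal_day = doy(mdy(MONTH, DATE, 1999))
replace cal_day = doy(mdy(2, 28, 1999)) if MONTH == 2 & DATE == 29
list YEAR MONTH DATE day_of_year cal_day
```

cal_day could then replace day_of_year in the interval() option; the year-boundary wrap would still need separate handling.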



      • #4
        Dear Clyde

        Thank you very much for correcting the code. I will try it.
        I was initially trying a fifteen-day window, which is why I had used -7 and 7 in the code.
        It was remiss of me not to change it to 5 before posting.

        Thanks
