Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Ratio 2 Variables Cumulatively

    Good afternoon,

    I start with 4 variables in the first 4 columns.
    I then run a do file which averages the top half of the (performancerating) by ddate, then averages the second half of the (performancerating) by ddate, then ratios that by dividing one over the other (improvementratioall)

    The code achieves this by working on the entire dataset
    Would it be possible to modify the code so it would ratio all performances up to but not including the current ddate?

    For example at (racetyperuns) 12 it would calculate the average performancerating for the first 6 runs, the second 6 run,s and then ratio them.

    Thanks
    Hans

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str23 racedate str21 horse str1 racecode int performancerating float(ddate tfrlto racetyperuns ismissing islower ishigher lowernum lowerden highernum higherden firsthalf secondhalf improvementratioall)
    "24/06/2015" "Zzoro" "f" 59 20263  .  0 1 .5 .5    0 .5    0 .5        0        0        .
    "23/07/2015" "Zzoro" "f" 67 20292 59  1 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "14/08/2015" "Zzoro" "f" 86 20314 67  2 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "13/10/2015" "Zzoro" "f" 77 20374 86  3 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "12/04/2016" "Zzoro" "f" 87 20556 77  4 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "04/05/2016" "Zzoro" "f" 78 20578 87  5 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "21/05/2016" "Zzoro" "f" 70 20595 78  6 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "29/07/2016" "Zzoro" "f" 63 20664 70  7 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "05/10/2016" "Zzoro" "f" 91 20732 63  8 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "21/10/2016" "Zzoro" "f" 85 20748 91  9 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "01/05/2017" "Zzoro" "f" 72 20940 85 10 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "19/05/2017" "Zzoro" "f" 96 20958 72 11 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "22/06/2017" "Zzoro" "f" 89 20992 96 12 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "30/06/2017" "Zzoro" "f" 88 21000 89 13 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "15/09/2017" "Zzoro" "f" 90 21077 88 14 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "27/09/2017" "Zzoro" "f" 79 21089 90 15 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "18/10/2017" "Zzoro" "f" 65 21110 79 16 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "25/03/2018" "Zzoro" "f" 86 21268 65 17 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "21/04/2018" "Zzoro" "f" 54 21295 86 18 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "30/04/2018" "Zzoro" "f" 83 21304 54 19 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "28/05/2018" "Zzoro" "f" 85 21332 83 20 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "01/06/2018" "Zzoro" "f" 77 21336 85 21 0  1  0 1650 21 1541 21 78.57143 73.38095 .9339394
    "11/07/2018" "Zzoro" "f" 86 21376 77 22 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "08/08/2018" "Zzoro" "f" 80 21404 86 23 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "18/08/2018" "Zzoro" "f" 71 21414 80 24 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "27/08/2018" "Zzoro" "f" 86 21423 71 25 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "21/09/2018" "Zzoro" "f" 84 21448 86 26 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "22/10/2018" "Zzoro" "f" 76 21479 84 27 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "06/05/2019" "Zzoro" "f" 83 21675 76 28 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "12/06/2019" "Zzoro" "f" 70 21712 83 29 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "10/07/2019" "Zzoro" "f" 84 21740 70 30 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "01/08/2019" "Zzoro" "f" 82 21762 84 31 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "17/08/2019" "Zzoro" "f" 81 21778 82 32 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "28/08/2019" "Zzoro" "f" 86 21789 81 33 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "13/09/2019" "Zzoro" "f" 84 21805 86 34 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "29/09/2019" "Zzoro" "f" 84 21821 84 35 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "01/11/2019" "Zzoro" "f" 28 21854 84 36 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "19/11/2019" "Zzoro" "f" 72 21872 28 37 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "04/06/2020" "Zzoro" "f" 27 22070 72 38 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "13/07/2020" "Zzoro" "f" 56 22109 27 39 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "22/08/2020" "Zzoro" "f" 70 22149 56 40 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "31/08/2020" "Zzoro" "f" 74 22158 70 41 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    "12/10/2020" "Zzoro" "f" 46 22200 74 42 0  0  1 1650 21 1541 21 78.57143 73.38095 .9339394
    end
    format %td ddate
    The code I'm using so far...

    gen ddate = daily(racedate, "DMY")
    format ddate %td
    bysort horse racecode (ddate): gen tfrlto = (performancerating[_n-1])
    bysort horse racecode (ddate): gen racetyperuns = _n-1
    gen ismissing = missing(tfrlto)
    replace ismissing = . if(racetyperuns == .)
    bysort ismissing horse (racetyperuns) : gen islower = _n <= _N/2 if mod(_N, 2) == 0
    by ismissing horse : replace islower = cond(_n < _N/2, 1, cond(_n == (_N + 1)/2, 0.5, 0)) if mod(_N, 2) == 1
    gen ishigher = 1 - islower
    by ismissing horse : egen lowernum = total(islower * tfrlto)
    by ismissing horse : egen lowerden = total(islower)
    by ismissing horse : egen highernum = total(ishigher * tfrlto)
    by ismissing horse : egen higherden = total(ishigher)
    gen firsthalf = lowernum/lowerden
    gen secondhalf = highernum/higherden
    gen improvementratioall = (secondhalf/firsthalf)
    sort horse ddate

  • #2
    Well, you've actually already done all the hard parts. It's just a matter of wrapping the core calculations in a program, converting horse to a labeled numeric variable, and executing the program under -rangerun-. -rangerun- is written by Robert Picard and is available from SSC. To use -rangerun-, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    Code in bold face is code that is added to, or different from, yours.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str23 racedate str21 horse str1 racecode int performancerating
    "24/06/2015" "Zzoro" "f" 59
    "23/07/2015" "Zzoro" "f" 67
    "14/08/2015" "Zzoro" "f" 86
    "13/10/2015" "Zzoro" "f" 77
    "12/04/2016" "Zzoro" "f" 87
    "04/05/2016" "Zzoro" "f" 78
    "21/05/2016" "Zzoro" "f" 70
    "29/07/2016" "Zzoro" "f" 63
    "05/10/2016" "Zzoro" "f" 91
    "21/10/2016" "Zzoro" "f" 85
    "01/05/2017" "Zzoro" "f" 72
    "19/05/2017" "Zzoro" "f" 96
    "22/06/2017" "Zzoro" "f" 89
    "30/06/2017" "Zzoro" "f" 88
    "15/09/2017" "Zzoro" "f" 90
    "27/09/2017" "Zzoro" "f" 79
    "18/10/2017" "Zzoro" "f" 65
    "25/03/2018" "Zzoro" "f" 86
    "21/04/2018" "Zzoro" "f" 54
    "30/04/2018" "Zzoro" "f" 83
    "28/05/2018" "Zzoro" "f" 85
    "01/06/2018" "Zzoro" "f" 77
    "11/07/2018" "Zzoro" "f" 86
    "08/08/2018" "Zzoro" "f" 80
    "18/08/2018" "Zzoro" "f" 71
    "27/08/2018" "Zzoro" "f" 86
    "21/09/2018" "Zzoro" "f" 84
    "22/10/2018" "Zzoro" "f" 76
    "06/05/2019" "Zzoro" "f" 83
    "12/06/2019" "Zzoro" "f" 70
    "10/07/2019" "Zzoro" "f" 84
    "01/08/2019" "Zzoro" "f" 82
    "17/08/2019" "Zzoro" "f" 81
    "28/08/2019" "Zzoro" "f" 86
    "13/09/2019" "Zzoro" "f" 84
    "29/09/2019" "Zzoro" "f" 84
    "01/11/2019" "Zzoro" "f" 28
    "19/11/2019" "Zzoro" "f" 72
    "04/06/2020" "Zzoro" "f" 27
    "13/07/2020" "Zzoro" "f" 56
    "22/08/2020" "Zzoro" "f" 70
    "31/08/2020" "Zzoro" "f" 74
    "12/10/2020" "Zzoro" "f" 46
    end
    
    gen ddate = daily(racedate, "DMY")
    format ddate %td
    bysort horse racecode (ddate): gen tfrlto = (performancerating[_n-1])
    bysort horse racecode (ddate): gen racetyperuns = _n-1
    gen ismissing = missing(tfrlto)
    encode horse, gen(_horse)
    drop horse
    rename _horse horse
    
    capture program drop one_horse_one_date
    program define one_horse_one_date
        bysort ismissing horse (racetyperuns) : gen islower = _n <= _N/2 if mod(_N, 2) == 0
        by ismissing horse : replace islower = cond(_n < _N/2, 1, cond(_n == (_N + 1)/2, 0.5, 0)) if mod(_N, 2) == 1
        gen ishigher = 1 - islower
        by ismissing horse : egen lowernum = total(islower * tfrlto)
        by ismissing horse : egen lowerden = total(islower)
        by ismissing horse : egen highernum = total(ishigher * tfrlto)
        by ismissing horse : egen higherden = total(ishigher)
        gen firsthalf = lowernum/lowerden
        gen secondhalf = highernum/higherden
        gen improvementratioall = (secondhalf/firsthalf)
        keep if !ismissing
        keep in L
        exit
    end
    
    rangerun one_horse_one_date, by(horse) interval(ddate . 0)
    By the way, I eliminated the -replace ismissing = . if (racetyperuns == .)- command for two reasons. First, because of the way you created variable racetyperuns, it can never be missing, so the command does nothing at all. Second of all, if it could do something, it would disrupt the nice 1/0 coding of the ismissing variable, which would actually mess things up at the end of program one_horse_one_date.
    Last edited by Clyde Schechter; 21 Sep 2023, 10:30.

    Comment


    • #3
      Thanks very much, Clyde. Thats works fine. What I should have mentioned is that I have 104808 separate horses in the actual dataset. When the code reaches 'encode' I'm getting a 'too many values' error. As far as I understand encode only allows about 65,000 numeric values. Do you have any suggestions for a work-around this issue? Thanks again.

      Comment


      • #4
        Code:
        rename horse horse_name
        egen horse = group(horse_name)
        The limitation on -encode- is secondary to the limitation on value labels. You have too many different horses, as you noted. -egen, group- has no such limitation, but you cannot label the numbers with values. Unfortunately, the code requires a numeric variable for horse. So, the above solution will do the trick and still enable you to identify which horse is which.

        Comment


        • #5
          Thanks, Clyde. For clarity could you put the entire revised code up again please. I'm running it now, taking a long time. May not be practical. Just want to be sure I have the code correct....

          Comment


          • #6
            I made a few additional changes to the code which may (or may not) somewhat speed things up, removing horse from the sorting of the various -by- commands inside program one_horse_one_date. It is not mention to sort on horse there because, running under -rangerun-, there will only be a single value of horse in active memory at any time. Similarly, as the -ismissing == 1- observations play no real role in the calculation, it makes sense drop them at the beginning of program one_horse_one_date, and thereby eliminate all of the sorting on that variable as well.
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str23 racedate str21 horse str1 racecode int performancerating
            "24/06/2015" "Zzoro" "f" 59
            "23/07/2015" "Zzoro" "f" 67
            "14/08/2015" "Zzoro" "f" 86
            "13/10/2015" "Zzoro" "f" 77
            "12/04/2016" "Zzoro" "f" 87
            "04/05/2016" "Zzoro" "f" 78
            "21/05/2016" "Zzoro" "f" 70
            "29/07/2016" "Zzoro" "f" 63
            "05/10/2016" "Zzoro" "f" 91
            "21/10/2016" "Zzoro" "f" 85
            "01/05/2017" "Zzoro" "f" 72
            "19/05/2017" "Zzoro" "f" 96
            "22/06/2017" "Zzoro" "f" 89
            "30/06/2017" "Zzoro" "f" 88
            "15/09/2017" "Zzoro" "f" 90
            "27/09/2017" "Zzoro" "f" 79
            "18/10/2017" "Zzoro" "f" 65
            "25/03/2018" "Zzoro" "f" 86
            "21/04/2018" "Zzoro" "f" 54
            "30/04/2018" "Zzoro" "f" 83
            "28/05/2018" "Zzoro" "f" 85
            "01/06/2018" "Zzoro" "f" 77
            "11/07/2018" "Zzoro" "f" 86
            "08/08/2018" "Zzoro" "f" 80
            "18/08/2018" "Zzoro" "f" 71
            "27/08/2018" "Zzoro" "f" 86
            "21/09/2018" "Zzoro" "f" 84
            "22/10/2018" "Zzoro" "f" 76
            "06/05/2019" "Zzoro" "f" 83
            "12/06/2019" "Zzoro" "f" 70
            "10/07/2019" "Zzoro" "f" 84
            "01/08/2019" "Zzoro" "f" 82
            "17/08/2019" "Zzoro" "f" 81
            "28/08/2019" "Zzoro" "f" 86
            "13/09/2019" "Zzoro" "f" 84
            "29/09/2019" "Zzoro" "f" 84
            "01/11/2019" "Zzoro" "f" 28
            "19/11/2019" "Zzoro" "f" 72
            "04/06/2020" "Zzoro" "f" 27
            "13/07/2020" "Zzoro" "f" 56
            "22/08/2020" "Zzoro" "f" 70
            "31/08/2020" "Zzoro" "f" 74
            "12/10/2020" "Zzoro" "f" 46
            end
            
            gen ddate = daily(racedate, "DMY")
            format ddate %td
            bysort horse racecode (ddate): gen tfrlto = (performancerating[_n-1])
            bysort horse racecode (ddate): gen racetyperuns = _n-1
            gen ismissing = missing(tfrlto)
            rename horse horse_name
            egen horse = group(horse_name)
            
            capture program drop one_horse_one_date
            program define one_horse_one_date
                keep if !ismissing
                sort racetyperuns
                gen islower = _n <= _N/2 if mod(_N, 2) == 0
                replace islower = cond(_n < _N/2, 1, cond(_n == (_N + 1)/2, 0.5, 0)) if mod(_N, 2) == 1
                gen ishigher = 1 - islower
                egen lowernum = total(islower * tfrlto)
                egen lowerden = total(islower)
                egen highernum = total(ishigher * tfrlto)
                egen higherden = total(ishigher)
                gen firsthalf = lowernum/lowerden
                gen secondhalf = highernum/higherden
                gen improvementratioall = (secondhalf/firsthalf)
                keep in L
                exit
            end
            
            rangerun one_horse_one_date, by(horse) interval(ddate . 0)
            I guess a long time is in the eyes of the beholder. I am accustomed to running calculations that take weeks to conclude, and you are apparently dissatisfied after less than an hour. I have to say that with over 104,000 horses, and, if the example you showed is typical, each horse having about 40 observations, then, no, this is not going to finish in a matter of minutes. Only you can decide how long is practical given the benefit you can derive from getting the answers. But I would imagine that on a typical setup, this calculation should probably reach a conclusion in something like 3 hours.

            Comment


            • #7
              That's the code I have entered. Yes the time is an issue. It needs to be run daily as part of a routine, so multiple hours would be impractical. But very much appreciate your input, Clyde. Thanks again.

              Comment

              Working...
              X