Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • New Variable for Average Value of First Half of Observations

    Good morning,

    I would like to generate a newvar that returns the average of the 'rating' variable, for the first half of the 'run_no'

    So in the example provided the 'run_no' goes in ascending order of date from 1-7.
    The first half of the run_no is therefore 3.5, so in this case with an odd number, the first half of the run_no can be rounded up to the first 4 'run_no' values.
    Therefore in this case I need the average 'rating' for the first half of the 'run_no' which is the average of 76-75-70-65, giving an average of 71.5 which can be rounded down to 71.

    If there were for example 36 'run_no', then the average 'rating' for the first 18 of these would be required etc...

    Thank you.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 ddate byte(run_no rating)
    "01-Aug-21" 7 98
    "26-Sep-20" 6 80
    "16-Aug-20" 5 93
    "31-Jul-20" 4 76
    "21-Jul-20" 3 75
    "30-Jun-20" 2 70
    "12-Jun-20" 1 65
    end

  • #2
    In your case you can just go

    Code:
    su rating if run_no <= ceil(run_no/_N)  , meanonly

    and look for r(mean) and round down. But presumably you would not be asking this without some other grouping variable, something like


    Code:
    egen count = count(run),  by(group) 
    
    egen wanted = mean(cond(run <= ceil(count/2), run_no, .)), by(group) 
    
    replace wanted = floor(wanted)
    This code populates all observations with the new means. If you want to populate only the observations for which they are calculated the code could be

    Code:
    egen wanted = mean(run_no) if  run <= ceil(count/2), by(group)
    See also https://www.stata-journal.com/articl...article=dm0055 -- especially Sections 9 and 10;

    Comment


    • #3
      Thanks very much Nick. I will take a look and provide feedback later today.

      Comment


      • #4
        First line of code should be


        Code:
         
         su rating if run_no <= ceil(_N/2)  , meanonly

        Comment


        • #5
          Hi Nick

          Your correct, there is a grouping variable, I've included it here as var4.
          Your code required one small modification in line 3 (run_no changed to rating) and is working as I need it now.

          Many thanks for your help

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str9 ddate byte(run_no rating) str8 var4 float(count wanted)
          "01-Aug-21" 7 98 "Paco Boy" 7 71
          "26-Sep-20" 6 80 "Paco Boy" 7 71
          "16-Aug-20" 5 93 "Paco Boy" 7 71
          "31-Jul-20" 4 76 "Paco Boy" 7 71
          "21-Jul-20" 3 75 "Paco Boy" 7 71
          "30-Jun-20" 2 70 "Paco Boy" 7 71
          "12-Jun-20" 1 65 "Paco Boy" 7 71
          end
          Code:
          su rating if run_no <= ceil(_N/2) , meanonly
          egen count = count(run_no), by(var4)
          egen wanted = mean(cond(run_no <= ceil(count/2), rating, .)), by(var4)
          replace wanted = floor(wanted)

          Comment


          • #6
            You are correct. Sorry about the silly error.

            Comment


            • #7
              Good evening,

              I wanted to make a small modification to this code...
              At the moment the wanted column is returning the average of the first half of the entire 7 runs in all its cells...
              In the example wanted returns 71 as the run_no is divided by 2 to give 3.5 then rounded up to 4. The average of the first 4 ratings are then returned and rounded down to 71 in the wanted column

              What I would like to do now is make the wanted column cumulative....for example run_no 7 takes the average rating of the first half of previous runs and returns them in wanted 1 (eg run_no - 1 = 6 runs, so average rating of run_no 1/2/3 is returned in wanted 1 (average(75+70+65 = 70)
              I hope that makes sense...

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str9 ddate byte(run_no rating) str8 var4 float(count wanted var7)
              "01-Aug-21" 7 98 "Paco Boy" 7 71   70
              "26-Sep-20" 6 80 "Paco Boy" 7 71   70
              "16-Aug-20" 5 93 "Paco Boy" 7 71 67.5
              "31-Jul-20" 4 76 "Paco Boy" 7 71 67.5
              "21-Jul-20" 3 75 "Paco Boy" 7 71 67.5
              "30-Jun-20" 2 70 "Paco Boy" 7 71   65
              "12-Jun-20" 1 65 "Paco Boy" 7 71    .
              end

              Comment


              • #8
                Good afternoon,

                Perhaps this explanation will be more coherent....

                Top Row: run_no1 =4
                Wanted: returns the average rating for all run_no <=4

                in this example --- average of 76/75/70/65 = 71.5

                2nd row: wanted returns average rating for run_no 1/2/3 = 70
                and so on down through the dataset

                Thanks

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str9 ddate byte(run_no rating) str8 var4 float(run_no1 wanted)
                "01-Aug-21" 7 98 "Paco Boy" 4 71.5
                "26-Sep-20" 6 80 "Paco Boy" 3   70
                "16-Aug-20" 5 93 "Paco Boy" 3   70
                "31-Jul-20" 4 76 "Paco Boy" 2 67.5
                "21-Jul-20" 3 75 "Paco Boy" 2 67.5
                "30-Jun-20" 2 70 "Paco Boy" 1   65
                "12-Jun-20" 1 65 "Paco Boy" 1    .
                end

                Comment

                Working...
                X