  • Comparing means to date from two separate groups

    Code:
    clear
    inp    id     date     charityid     donated
    1    18265    1    100    
    2    18263    1    30    
    3    18264    1    60    
    5    18269    1    10    
    1    18267    2    90    
    2    18264    2    10    
    3    18271    2    20    
    4    18268    2    200    
    1    18273    3    25    
    2    18272    3    40    
    5    18272    3    75    
    end
    For these data I am trying to first calculate the mean amount donated by a given id to date, excluding the date in question.

    Code:
    rangestat (mean) giving, interval(date -5000 -1) by(id)
    Next, I want to compare this amount to how much their peers have given, up to that date, to the same charities that this person has given to so far. I calculated what the result should be, but it is unclear to me how to start this process.

    Code:
    clear
    inp    id     date     charityid     giving     giving_mean     others_giving_mean
    1    18265    1    100    .    .    
    2    18263    1    30    .    .    
    3    18264    1    60    .    .    
    5    18269    1    10    .    .    
    1    18267    2    90    100    45    
    2    18264    2    10    30    .    
    3    18271    2    20    60    70    
    4    18268    2    200    .    .    
    1    18273    3    25    95    55    
    2    18272    3    40    20    80    
    5    18272    3    75    10    95    
    end


  • #2
    Your code refers to a variable named giving, which does not occur in your example data. Please provide data and code that are consistent with each other.



    • #3
      Originally posted by Clyde Schechter:
      Your code refers to a variable named giving, which does not occur in your example data. Please provide data and code that are consistent with each other.
      Yes, this is a typo. The first block should be as follows:
      Code:
      clear
      inp    id     date     charityid     giving
      1    18265    1    100    
      2    18263    1    30    
      3    18264    1    60    
      5    18269    1    10    
      1    18267    2    90    
      2    18264    2    10    
      3    18271    2    20    
      4    18268    2    200    
      1    18273    3    25    
      2    18272    3    40    
      5    18272    3    75    
      end



      • #4
        OK. Your -rangestat- code looks correct. As for the second part of your problem, I don't understand how you arrived at the answers you want. Here is some code that I believe answers the question you pose, but the answers it gives are quite different from what you show.

        Code:
        clear
        inp    id     date     charityid     giving     giving_mean     others_giving_mean
        1    18265    1    100    .    .    
        2    18263    1    30    .    .    
        3    18264    1    60    .    .    
        5    18269    1    10    .    .    
        1    18267    2    90    100    45    
        2    18264    2    10    30    .    
        3    18271    2    20    60    70    
        4    18268    2    200    .    .    
        1    18273    3    25    95    55    
        2    18272    3    40    20    80    
        5    18272    3    75    10    95    
        end
        
        
        rename giving_mean wanted_self
        rename others_giving_mean wanted_others
        
        capture program drop one_donation
        program define one_donation
            tempvar previous_donor
            by id, sort: egen byte `previous_donor' = max(charityid == pfx_charityid)
            summ giving if id != pfx_id & `previous_donor', meanonly
            gen others_giving = r(mean)
            exit
        end
        
        rangestat (mean) giving, by(id) interval(date -5000 -1)
        
        rangerun one_donation, sprefix(pfx_) interval(date -5000 -1)
        sort date id charityid
        list, clean abbrev(24)
        which produces:

        Code:
               id    date   charityid   giving   wanted_self   wanted_others   giving_mean   others_giving  
          1.    2   18263           1       30             .               .             .               .  
          2.    2   18264           2       10            30               .            30               .  
          3.    3   18264           1       60             .               .             .              30  
          4.    1   18265           1      100             .               .             .        33.33333  
          5.    1   18267           2       90           100              45           100              20  
          6.    4   18268           2      200             .               .             .            57.5  
          7.    5   18269           1       10             .               .             .              58  
          8.    3   18271           2       20            60              70            60              86  
          9.    2   18272           3       40            20              80            20               .  
         10.    5   18272           3       75            10              95            10               .  
         11.    1   18273           3       25            95              55            95              33
        In each case my others_giving is the average of all donations made previously by any other donors who donated to that charity up to that point. For example, in observation 3 of the output, we are looking at id3's donation to charity 1 on date 18264. Previous to that, on date 18263, id2 has given 30 to charity 1, and nobody else has, so I calculate a mean of 30, whereas you show nothing for that observation. In observation 4, we are focusing on id1's donation of 100 to charity 1 on date 18265. Previous to that, both ids 2 and 3 have donated to charity 1, and their donations to all charities up to that point total 30 + 10 + 60 = 100 over three donations, so the mean is 33.33. Again, you show nothing. Looking at observation 5, id 1 gives to charity 2 on date 18267. The only previous donor to charity 2 is id2, who up to that point has donated 30 to charity 1 on 18263 and 10 to charity 2 on 18264, for an average of 20. But you show 45.

        So evidently, either I misunderstand your intent, or you have calculated things incorrectly. Can you clarify this?
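The arithmetic for observation 5 under this reading can be checked by hand using only official Stata commands. What follows is a minimal sketch, assuming the corrected example data from #3; the variable name prior_donor and the scalar m5 are made up for illustration and appear nowhere else in the thread.

```stata
clear
input id date charityid giving
1 18265 1 100
2 18263 1 30
3 18264 1 60
5 18269 1 10
1 18267 2 90
2 18264 2 10
3 18271 2 20
4 18268 2 200
1 18273 3 25
2 18272 3 40
5 18272 3 75
end

* Focal donation: id 1 gives to charity 2 on date 18267.
* Flag every row of any donor who gave to charity 2 before that date.
by id, sort: egen byte prior_donor = max(charityid == 2 & date < 18267)

* Mean of all earlier donations (to any charity) by those donors,
* excluding the focal donor.
summarize giving if id != 1 & prior_donor & date < 18267, meanonly
scalar m5 = r(mean)
display m5
```

Only id 2 qualifies, with earlier donations of 30 and 10, so this displays 20, the value in the others_giving column for that observation.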



        • #5
          Hi Clyde, I could not understand the following line of code, specifically the use of the max() function in this context. I would be grateful if you could explain it:

          egen byte `previous_donor' = max(charityid == pfx_charityid)



          • #6
            The point here is to determine whether or not the person (id) in the other observation has donated to the charity in the current observation (pfx_charityid). The expression charityid == pfx_charityid will be 0 when the person has not, and 1 when they have. By taking the max over all of that person's observations (within the time frame) we determine whether or not they have ever donated to that charity during that time period, because the maximum will be 1 if and only if at least one of those observations gives 1. This is a fairly standard Stata trick.
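The same trick can be seen in isolation on a toy data set. This is a small sketch with invented data and an invented variable name (gave_to_2); it fixes the charity at 2 instead of using pfx_charityid, so it runs without -rangerun-.

```stata
clear
input id charityid
1 1
1 2
2 2
3 1
3 3
end

* (charityid == 2) is 1 on rows for charity 2 and 0 elsewhere;
* the within-id maximum is 1 if any of that id's rows is for charity 2.
by id, sort: egen byte gave_to_2 = max(charityid == 2)
list, sepby(id) noobs
```

Here gave_to_2 is 1 on both rows of id 1 and on the single row of id 2, and 0 on both rows of id 3, who never gave to charity 2.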

            Added: I forgot to mention in the original response that -ramgerim- is not part of official Stata. It was written by Robert Picard, and is available from SSC.
            Last edited by Clyde Schechter; 02 Jul 2018, 15:59.



            • #7
              Typo time: Clyde Schechter means rangerun by Robert Picard and another from SSC.



              • #8
                Nick Cox Thanks for correcting my typo. Sorry about the attribution on -rangerun-; I forgot that you were also an author of that one.



                • #9
                  Thanks. I was more concerned about the possibility that someone would go looking for rangerim. That program rangerun really is almost all by Robert Picard.



                  • #10
                    Clyde Schechter

                    Thanks for this response. My original example did have a typo for others_giving. However, the output is off and I am trying to figure out where in the code this is occurring. The goal of others_giving is to calculate the average others have given to the charities the focal id has given to, to date. This includes the charity of the unit of analysis.

                    Code:
                           id    date   charityid   giving   wanted_self   wanted_others   giving_mean   others_giving
                      1.    2   18263           1       30             .               .             .               .
                      2.    2   18264           2       10            30               .            30               .
                      3.    3   18264           1       60             .               .             .              30
                      4.    1   18265           1      100             .               .             .        33.33333
                      5.    1   18267           2       90           100              45           100              20
                      6.    4   18268           2      200             .               .             .            57.5
                      7.    5   18269           1       10             .               .             .              58
                      8.    3   18271           2       20            60              70            60              86
                      9.    2   18272           3       40            20              80            20               .
                     10.    5   18272           3       75            10              95            10               .
                     11.    1   18273           3       25            95              55            95              33
                    Observation 3 is correct. One person (id == 2) has given to this charity (charityid == 1) and they gave

                    The rest are incorrect.

                    Observation 4, this person (id == 1) has only given to this charity (charityid == 1) to this date. Two people have given to it previously (id == 3 for 60 and id == 2 for 30, for a total of 90). Others_giving should be equal to 45 (90/2) and not 33.333.

                    Observation 5, id == 1 has now given to two charities (charityid == 1 and charityid == 2). The total given to date to both of these charities by other people is 100 (30+10+60) over three donations, so others_giving should be equal to 33.333 and not 20.

                    Thank you for your help on this.
                    Last edited by Chris James; 03 Jul 2018, 11:36.



                    • #11
                      Well, the following code appears to implement what you are describing. And it now agrees with you on observations 4 and 5, but not on later observations:

                      Code:
                      clear
                      inp    id     date     charityid     giving     giving_mean     others_giving_mean
                      1    18265    1    100    .    .    
                      2    18263    1    30    .    .    
                      3    18264    1    60    .    .    
                      5    18269    1    10    .    .    
                      1    18267    2    90    100    45    
                      2    18264    2    10    30    .    
                      3    18271    2    20    60    70    
                      4    18268    2    200    .    .    
                      1    18273    3    25    95    55    
                      2    18272    3    40    20    80    
                      5    18272    3    75    10    95    
                      end
                      
                      
                      rename giving_mean wanted_self
                      rename others_giving_mean wanted_others
                      
                      capture program drop one_donation
                      program define one_donation
                          tempvar previous_donor past_recipient
                          //    IDENTIFY CHARITIES TO WHICH INDEX DONOR HAS GIVEN SO FAR
                          by charityid, sort: egen `past_recipient' = max(id == pfx_id)
                          //    IDENTIFY DONORS WHO HAVE GIVEN TO THOSE CHARITIES
                          by id, sort: egen byte `previous_donor' = max(`past_recipient')
                          //    CALCULATE MEAN DONATIONS TO PAST RECIPIENTS BY PREVIOUS DONORS
                          //    EXCLUDING THE INDEX DONOR
                          summ giving if id != pfx_id & `previous_donor' & `past_recipient', meanonly
                          gen others_giving = r(mean)
                          exit
                      end
                      
                      rangestat (mean) giving, by(id) interval(date -5000 -1)
                      
                      sort date
                      rangerun one_donation, sprefix(pfx_) interval(date -5000 -1)
                      sort date id charityid
                      list, clean abbrev(24)
                      with output

                      Code:
                             id    date   charityid   giving   wanted_self   wanted_others   giving_mean   others_giving  
                        1.    2   18263           1       30             .               .             .               .  
                        2.    2   18264           2       10            30               .            30               .  
                        3.    3   18264           1       60             .               .             .               .  
                        4.    1   18265           1      100             .               .             .               .  
                        5.    1   18267           2       90           100              45           100              45  
                        6.    4   18268           2      200             .               .             .               .  
                        7.    5   18269           1       10             .               .             .               .  
                        8.    3   18271           2       20            60              70            60        46.66667  
                        9.    2   18272           3       40            20              80            20              80  
                       10.    5   18272           3       75            10              95            10        63.33333  
                       11.    1   18273           3       25            95              55            95              55




                      • #12
                        Clyde Schechter Thanks. I appreciate your helping me as I tried for weeks to get this to work in Stata and had to move to an outside program to finish but would love to know if it could be done in Stata as I think it has multiple applications.

                        I am a little confused, as I do not see the agreement you are referencing. To clear up any typo from the start, I have edited the wanted_others to fit the calculation of the average others have given to the charities the focal id has given to to date.

                        Code:
                        id date charityid giving wanted_self wanted_others
                        1. 2 18263 1 30 .   .
                        2. 2 18264 2 10 30 .
                        3. 3 18264 1 60 . 30
                        4. 1 18265 1 100 . 45
                        5. 1 18267 2 90 100 33.33
                        6. 4 18268 2 200 . 50
                        7. 5 18269 1 10 . 63.33
                        8. 3 18271 2 20 60  73.33
                        9. 2 18272 3 40 20 80
                        10. 5 18272 3 75 10 .
                        11. 1 18273 3 25 95 55.63
                        Note that the date of the focal observation should not be included, and this is why wanted_others == . for observation 10.
                        Last edited by Chris James; 03 Jul 2018, 13:27.



                        • #13
                          Well, now I see that we do not agree, but I still do not understand why my answers are not what you want. Let's focus for a minute on observation no. 5.

                          The focal donor in observation #5 is id = 1. The only charity previously donated to by id 1 is charityid 1 (in observation 4). The donors to charityid 1 prior to date 18267 are ids 1, 2, and 3. We do not count id 1's previous donation, so we tally up a donation of 30 from id 2 plus 60 from id 3 = 90, for an average donation of 45, which is what my code calculates, but you are looking for an answer of 33.33. I don't understand why.



                          • #14
                            Clyde Schechter Ah, I believe I see the confusion. It is subtle and I most likely was not clear, sorry about that. I am trying to calculate others_giving as the average others have given to all other charities you have given to to date.

                            For observation 5, id==1 has given to charityid==1 on 18265 and now charityid==2 on 18267 (the focal observation).

                            Up to 18267 there have been 3 donations to these two charities (excluding what id==1 has given). Id==2 gave 30 to charityid==1 on 18263, id==2 gave 10 to charityid==2 on 18264, and id==3 gave 60 to charityid==1 on 18264. Overall, id==1 witnesses 100 being given, across 3 donations, to the charities she has given to or is currently considering, an average of 33.33 (100/3). I want to see how this running average affects the amount they give compared to the average they have given to date (giving_mean).

                            I am excluding what others give the day you give because without a time stamp it is hard to know who came first.
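The observation 5 figure under this definition can be reproduced directly with -summarize-. This is only a sketch, assuming the corrected data from #3 and hard-coding the focal donor (id 1), the focal date (18267) and the two charities involved; the scalar name m is illustrative.

```stata
clear
input id date charityid giving
1 18265 1 100
2 18263 1 30
3 18264 1 60
5 18269 1 10
1 18267 2 90
2 18264 2 10
3 18271 2 20
4 18268 2 200
1 18273 3 25
2 18272 3 40
5 18272 3 75
end

* Donations by anyone other than id 1, strictly before 18267,
* to the charities id 1 has given to or is giving to now (1 and 2).
summarize giving if id != 1 & date < 18267 & inlist(charityid, 1, 2), meanonly
scalar m = r(mean)
display m
```

Three donations qualify (30, 60 and 10), so the displayed mean is 100/3, about 33.33.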




                            • #15
                              We're almost there now. In your example, my revised code agrees with your results (up to rounding errors) for all but one observation (#10). I don't see how you arrived at your answer in observation 10. The date is 18272. The index donor is id 5 and the donation is to charityid 3. id 5 has previously donated only to charity 1. Previous donations to charity 1 include those in observations 1, 3, 4, and 7. The donation in #7 doesn't count because it, too, is from id 5. So we have the donations in observations 1, 3, and 4, which are 30 + 60 + 100 = 190, which when divided by 3 gives 63.33, but you have a missing value for that observation. Perhaps your result is in error here? Or is there another aspect of this I'm missing?

                              Code:
                              clear
                              input obs_no id date charityid giving wanted_self wanted_others
                              1. 2 18263 1 30 .   .
                              2. 2 18264 2 10 30 .
                              3. 3 18264 1 60 . 30
                              4. 1 18265 1 100 . 45
                              5. 1 18267 2 90 100 33.33
                              6. 4 18268 2 200 . 50
                              7. 5 18269 1 10 . 63.33
                              8. 3 18271 2 20 60  73.33
                              9. 2 18272 3 40 20 80
                              10. 5 18272 3 75 10 .
                              11. 1 18273 3 25 95 55.63
                              end
                              
                              capture program drop one_donation
                              program define one_donation
                                  tempvar previous_donor past_recipient
                                  //    IDENTIFY CHARITIES TO WHICH INDEX DONOR HAS GIVEN SO FAR
                                  by charityid, sort: egen `past_recipient' = max(id == pfx_id)
                                  //    IDENTIFY DONORS WHO HAVE GIVEN TO THOSE CHARITIES
                                  by id, sort: egen byte `previous_donor' = max(`past_recipient')
                                  //    CALCULATE MEAN DONATIONS TO PAST RECIPIENTS BY PREVIOUS DONORS
                                  //    EXCLUDING THE INDEX DONOR
                                  summ giving if id != pfx_id & date < pfx_date ///
                                      &`previous_donor' & `past_recipient', meanonly
                                  gen others_giving = r(mean)
                                  exit
                              end
                              
                              rangestat (mean) giving, by(id) interval(date -5000 -1)
                              
                              rangerun one_donation, sprefix(pfx_) interval(date -5000 0) verbose
                              sort date id charityid
                              order wanted_others, after(others_giving)
                              
                              list, noobs clean abbrev(16)
                              which produces

                              Code:
                                  obs_no   id    date   charityid   giving   wanted_self   giving_mean   others_giving   wanted_others  
                                       1    2   18263           1       30             .             .               .               .  
                                       2    2   18264           2       10            30            30               .               .  
                                       3    3   18264           1       60             .             .              30              30  
                                       4    1   18265           1      100             .             .              45              45  
                                       5    1   18267           2       90           100           100        33.33333           33.33  
                                       6    4   18268           2      200             .             .              50              50  
                                       7    5   18269           1       10             .             .        63.33333           63.33  
                                       8    3   18271           2       20            60            60        73.33334           73.33  
                                       9    2   18272           3       40            20            20              80              80  
                                      10    5   18272           3       75            10            10        63.33333               .  
                                      11    1   18273           3       25            95            95          55.625           55.63
                              Notes: The changes from the previous version of the code are quite slight: the -rangerun- interval now ends at 0 rather than -1, and the summ command gains the condition date < pfx_date. The trick is to expand the range of observations to include the current date so that a complete set of charities to which the index id has donated can be identified, and then exclude observations with date equal to the index date from the calculation of the mean.

                              Note also that I have added a new variable, obs_no to the data. This is just for convenience in discussing the listing of inputs and outputs. It plays no actual role in the code, and you do not need to modify your data set to include it for production purposes.
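As a cross-check on small examples, the same rule can be written as a brute-force loop using only official Stata commands. This is a sketch under stated assumptions, not a replacement for the -rangerun- solution: the names others_check, own_charity, fid and fdate are invented here, and, unlike the code above, the loop does rely on obs_no as a row identifier.

```stata
clear
input obs_no id date charityid giving
1 2 18263 1 30
2 2 18264 2 10
3 3 18264 1 60
4 1 18265 1 100
5 1 18267 2 90
6 4 18268 2 200
7 5 18269 1 10
8 3 18271 2 20
9 2 18272 3 40
10 5 18272 3 75
11 1 18273 3 25
end

gen double others_check = .
forvalues i = 1/`=_N' {
    * Look up the focal donor and date by obs_no, because the sort
    * below changes the physical order of the observations.
    summarize id if obs_no == `i', meanonly
    local fid = r(mean)
    summarize date if obs_no == `i', meanonly
    local fdate = r(mean)

    * Charities the focal donor has given to up to AND including
    * the focal date.
    by charityid, sort: egen byte own_charity = max(id == `fid' & date <= `fdate')

    * Mean of other donors' strictly earlier gifts to those charities;
    * r(mean) is missing when no such gifts exist.
    summarize giving if own_charity & id != `fid' & date < `fdate', meanonly
    quietly replace others_check = r(mean) if obs_no == `i'
    drop own_charity
}

sort obs_no
list obs_no id date charityid giving others_check, noobs clean
```

Working through the example by hand, this gives the same values as the others_giving column above, including 63.33 for obs_no 10. The loop makes one pass over the data per observation, so it is only practical for small data sets.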



