Comparing means to date from two separate groups

Chris James

Join Date: Mar 2015

Posts: 30
#16

04 Jul 2018, 16:35

Clyde Schechter, your output is all correct, thanks.

Out of curiosity, I have two questions.

First, is it possible to exclude the focal charity, so that the average given to date covers every charity you have given to except the one you are currently giving to?

Second, imagine we had an indicator volunteer that is == 1 if the person has ever volunteered (e.g., ids 1, 3, 5 are volunteers). Is there a straightforward way to update this to calculate one variable for volunteers and another variable for non-volunteers?

Just curious.

Last edited by Chris James; 04 Jul 2018, 16:42.

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

#17

05 Jul 2018, 08:22

First, is it possible to exclude the focal charity, so that the average given to date covers every charity you have given to except the one you are currently giving to?

Yes. Just add -& charityid != pfx_charityid- to the -if- qualifier in the -summ, meanonly- command.
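As a rough sketch of that change, here is how the modified -summ- command might look inside a program shaped like the one_donation program posted in this thread. The program name one_donation_excl and the result variable wanted_others_excl are hypothetical names chosen for illustration, and the sketch assumes the data contain id, date, charityid, and giving, and that -rangerun- (from SSC) is installed.

```stata
* Sketch only: relies on the pfx_ scalars created by -rangerun, sprefix(pfx_)-
capture program drop one_donation_excl        // hypothetical program name
program define one_donation_excl
    tempvar previous_donor past_recipient
    // charities to which the index donor has given so far
    by charityid, sort: egen `past_recipient' = max(id == pfx_id)
    // donors who have given to those charities
    by id, sort: egen byte `previous_donor' = max(`past_recipient')
    // same mean as before, now also excluding the focal charity
    summ giving if id != pfx_id & date < pfx_date ///
        & `previous_donor' & `past_recipient' ///
        & charityid != pfx_charityid, meanonly
    gen wanted_others_excl = r(mean)          // hypothetical variable name
end

rangerun one_donation_excl, sprefix(pfx_) interval(date -5000 0)
```

The only substantive difference from the original condition is the extra -& charityid != pfx_charityid- term; everything else is carried over unchanged.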

Second, imagine we had an indicator volunteer that is == 1 if the person has ever volunteered (e.g., ids 1, 3, 5 are volunteers). Is there a straightforward way to update this to calculate one variable for volunteers and another variable for non-volunteers?

I'm not entirely sure I understand what you mean here. Is it this?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id date charityid giving wanted_self wanted_others volunteer)
2 18263 1  30   .     . 0
2 18264 2  10  30     . 0
3 18264 1  60   .    30 1
1 18265 1 100   .    45 1
1 18267 2  90 100 33.33 1
4 18268 2 200   .    50 0
5 18269 1  10   . 63.33 1
3 18271 2  20  60 73.33 1
2 18272 3  40  20    80 0
5 18272 3  75  10     . 1
1 18273 3  25  95 55.63 1
end


capture program drop one_donation
program define one_donation
    tempvar previous_donor past_recipient
    //    IDENTIFY CHARITIES TO WHICH INDEX DONOR HAS GIVEN SO FAR
    by charityid, sort: egen `past_recipient' = max(id == pfx_id)
    //    IDENTIFY DONORS WHO HAVE GIVEN TO THOSE CHARITIES
    by id, sort: egen byte `previous_donor' = max(`past_recipient')
    //    CALCULATE MEAN DONATIONS TO PAST RECIPIENTS BY PREVIOUS DONORS
    //    EXCLUDING THE INDEX DONOR
    summ giving if id != pfx_id & date < pfx_date ///
        &`previous_donor' & `past_recipient' & volunteer, meanonly
    gen volunteer_giving = r(mean)
    summ giving if id != pfx_id & date < pfx_date ///
        &`previous_donor' & `past_recipient' & !volunteer, meanonly
    gen non_volunteer_giving = r(mean)

    exit
end

rangestat (mean) giving, by(id) interval(date -5000 -1)

rangerun one_donation, sprefix(pfx_) interval(date -5000 0)


Chris James

Join Date: Mar 2015

Posts: 30
#18

06 Jul 2018, 08:31

This was it, thanks Clyde Schechter

For my education, what is the "pfx_" doing in all of the above?

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#19

06 Jul 2018, 09:25

Notice the option -sprefix(pfx_)- in the -rangerun- command. That option tells Stata, when running -rangerun-, to create scalars that contain the values of the variables in the focused observation, and to give those scalars names that are the variable names with pfx_ in front of them. (I just chose pfx_ as the particular prefix for its mnemonic value: anything that is legal as part of a variable name would do.) In the program one_donation, when I refer to pfx_id, that means the value of the variable id in the focused observation. Similarly, pfx_date is the value of the variable date in the focused observation.

This is explained in the help file for -rangerun-, and there is a good example shown there.
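To make the role of the prefix concrete, here is a minimal sketch on a small made-up data set. The program name mean_of_others and the variable others_mean are hypothetical, and the three observations are invented for illustration only. It assumes -rangerun- is installed from SSC, along with -rangestat-, which I believe it depends on.

```stata
* ssc install rangerun
* ssc install rangestat
clear
input float(id date giving)
1 1 10
2 2 20
1 3 30
end

capture program drop mean_of_others           // hypothetical program name
program define mean_of_others
    // pfx_id and pfx_date are scalars holding the values of id and date
    // in the focused observation
    summ giving if id != pfx_id & date < pfx_date, meanonly
    gen others_mean = r(mean)                 // hypothetical variable name
end

rangerun mean_of_others, sprefix(pfx_) interval(date -5000 0)
list
```

With a different prefix, say -sprefix(cur_)-, the same scalars would be named cur_id and cur_date, and the program would need to refer to them by those names instead.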