Question related to loop/foreach vvv

Nader Mehri

Join Date: Jun 2019

Posts: 189
#1

21 Mar 2020, 18:42

Hi,
Using the sample data below, I am trying to obtain the number of values of b1 in id 50_512 that exceed the maximum value of b1 in id 50_511. I would like to do the same thing for b2. Given that I have 65 ids in my real data, I was wondering whether there is a way to obtain this information using a loop.
Thanks,
Nader

Code:

clear
// generate example data
set obs 15
set seed 666
gen n=_n
generate b1 = runiform(0,1)
generate b2 = runiform(-50,50)
gen Age_Group=""
replace Age_Group="50_51" if n<6
replace Age_Group="50_51" if n>5 & n<11
replace Age_Group="50_51" if n>10

gen sample_label=.
replace sample_label=1 if n<6
replace sample_label=2 if n>5 & n<11
replace sample_label=3 if n>10
label define sample_label 1 "1.NCG" 2 "2.caregivers<14h/w" 3 "3.caregivers>=14h/w"
label values sample_label sample_label

*id is the combination of Age_Group and sample_label
gen id=""
replace id="50_511" if n<6
replace id="50_512" if n>5 & n<11
replace id="50_513" if n>10
Tags: foreach, loop
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#2

21 Mar 2020, 19:01

Your question is not clearly posed.

Is id 50_511 a single reference group, and for all other id's you want to find the number of observations of b1 and b2 that exceed the maximum values of b1 and b2 (respectively) for that reference group (id 50_511)?

Code:

// 50_511 IS THE REFERENCE GROUP FOR ALL ID'S
summ b1 if id == "50_511"
local b1max = r(max)
summ b2 if id == "50_511"
local b2max = r(max)
by id (sample_label), sort: egen exceeds_ref_b1_max = total(b1 > `b1max')
by id (sample_label): egen exceeds_ref_b2_max = total(b2 > `b2max')

Or do you want to use id 50_512 as the reference group for 50_513, then 50_513 as the reference group for 50_514, etc., using each sample as the reference group for the next sample in its age group?

Code:

// EACH SAMPLE IS THE REFERENCE GROUP FOR THE NEXT ONE IN ITS AGE GROUP
rangestat (max) b1 b2, by(Age_Group) interval(sample_label -1 -1)
by id (sample_label), sort: egen exceeds_ref_b1_max = total(b1 > b1_max)
by id (sample_label): egen exceeds_ref_b2_max = total(b2 > b2_max)

Or maybe something altogether different? If so, please post back with a clearer explanation and show what the results for your example data should look like.

-rangestat- is written by Robert Picard, Nick Cox, and Robert Ferrer. It is available from SSC.

Note that no loops are required for this.

Note also that the code assumes (but does not verify) that b1 and b2 are never missing. If that is not correct, the various (b1 > …) and (b2 > …) expressions have to be modified to (b1 > … & !missing(b1)) etc.
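
To illustrate the point about missing values, here is a minimal sketch on made-up data (the values and the variable names naive and guarded are invented for this illustration, not taken from the example above). Stata treats numeric missing as larger than any number, so an unguarded comparison counts a missing b1 as exceeding the reference maximum.

```stata
* Minimal sketch with made-up values; naive and guarded are illustrative names.
clear
input str6 id b1
"50_511" .2
"50_511" .4
"50_512" .5
"50_512" .
"50_512" .1
end

* reference maximum taken from id 50_511
summarize b1 if id == "50_511", meanonly
local b1max = r(max)

* unguarded: the missing b1 in 50_512 counts as exceeding the maximum
by id, sort: egen naive = total(b1 > `b1max')

* guarded: missing values are excluded from the count
by id: egen guarded = total(b1 > `b1max' & !missing(b1))

list, noobs sepby(id)
```

With these values, id 50_512 should show naive = 2 but guarded = 1, because only .5 genuinely exceeds the reference maximum.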
Nader Mehri

Join Date: Jun 2019

Posts: 189
#3

21 Mar 2020, 19:40

Wow! This is great, Clyde. Thanks. I would like to have id == "50_511" as the reference group for both subsequent ids (50_512 and 50_513). Your suggested code correctly captures the values that exceed the max of b1 and b2 when id = 50_512, but it does not capture my desired values when id = 50_513.

Clyde Schechter

Join Date: Apr 2014
Posts: 30114

21 Mar 2020, 21:12

Yes, it does capture the desired values for id = 50_513:

Code:

. clear

. // generate example data
. set obs 15
number of observations (_N) was 0, now 15

. set seed 666

. gen n=_n

. generate b1 = runiform(0,1)

. generate b2 = runiform(-50,50)

. gen Age_Group=""
(15 missing values generated)

. replace Age_Group="50_51" if n<6
variable Age_Group was str1 now str5
(5 real changes made)

. replace Age_Group="50_51" if n>5 & n<11
(5 real changes made)

. replace Age_Group="50_51" if n>10
(5 real changes made)

.
. gen sample_label=.
(15 missing values generated)

. replace sample_label=1 if n<6
(5 real changes made)

. replace sample_label=2 if n>5 & n<11
(5 real changes made)

. replace sample_label=3 if n>10
(5 real changes made)

. label define sample_label 1 "1.NCG" 2 "2.caregivers<14h/w" 3 "3.caregivers>=14h/w"

. label values sample sample

.
. *id is the combination of Age_Group and sample_label
. gen id=""
(15 missing values generated)

. replace id="50_511" if n<6
variable id was str1 now str6
(5 real changes made)

. replace id="50_512" if n>5 & n<11
(5 real changes made)

. replace id="50_513" if n>10
(5 real changes made)

.
. //  50_511 IS THE REFERENCE GROUP FOR ALL ID'S
. summ b1 if id == "50_511"

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          b1 |          5    .2687425    .1628331    .066487   .4540368

. local b1max = r(max)

. summ b2 if id == "50_511"

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          b2 |          5    12.56845    12.72637  -.6126878   30.38014

. local b2max = r(max)

.
. by id (sample_label), sort: egen exceeds_ref_b1_max = total(b1 > `b1max')

. by id (sample_label): egen exceeds_ref_b2_max = total(b2 > `b2max')

.
. list, noobs clean

     n         b1          b2   Age_Gr~p   sample~l       id   ex~1_max   ex~2_max 
     1   .1498635    30.38014      50_51          1   50_511          0          0 
     2   .4540368     1.08842      50_51          1   50_511          0          0 
     3   .2755941    15.10049      50_51          1   50_511          0          0 
     4    .066487     16.8859      50_51          1   50_511          0          0 
     5   .3977311   -.6126878      50_51          1   50_511          0          0 
     6   .5504283    3.418392      50_51          2   50_512          3          2 
     7   .2672185    36.74982      50_51          2   50_512          3          2 
     8   .3493752     48.4127      50_51          2   50_512          3          2 
     9   .8180724   -5.858458      50_51          2   50_512          3          2 
    10   .8836644   -31.82873      50_51          2   50_512          3          2 
    11   .4745677    29.30499      50_51          3   50_513          2          0 
    12   .2509398    7.094431      50_51          3   50_513          2          0 
    13   .7385626   -41.07987      50_51          3   50_513          2          0 
    14   .0835627   -11.28583      50_51          3   50_513          2          0 
    15   .0451471   -10.41394      50_51          3   50_513          2          0

So we see that the maximum value of b1 is .4540368 and the maximum value of b2 is 30.38014 for id = 50_511.

If we look at the results for observations 11-15 (id = 50_513) we see that b1 exceeds .4540368 in observations 11 and 13, which is 2 observations, and b2 exceeds 30.38014 not at all, which is 0 observations. So this gives what you said you wanted. Perhaps you can explain better what you want if this is not it.


Nader Mehri

Join Date: Jun 2019

Posts: 189
#5

21 Mar 2020, 21:42

Sorry, my bad! The code does capture my desired values for the ids. However, it works only for the age group "50_51". In my real dataset, I have three age groups: "50_51", "60_61", and "70_71". I wonder whether there is a way to extend this code to the two additional age groups. For example, I would like 60_611 to be the reference for the two subsequent ids, 60_612 and 60_613.

Last edited by Nader Mehri; 21 Mar 2020, 22:16.

Clyde Schechter

Join Date: Apr 2014
Posts: 30114

21 Mar 2020, 22:47

Code:

by Age_Group (sample_label), sort: egen b1_max = max(cond(sample_label == 1, b1, .))
by Age_Group: egen b2_max = max(cond(sample_label == 1, b2, .))

by Age_Group sample_label: egen exceeds_ref_b1_max = total(b1 > b1_max)
by Age_Group sample_label: egen exceeds_ref_b2_max = total(b2 > b2_max)
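
As a rough check of the approach above, the following sketch runs the same idea on made-up data with two age groups. The data values are invented for illustration, and the foreach over b1 and b2 and the !missing() guard are additions of this sketch, not part of the code above. Within each age group, max(cond(sample_label == 1, ...)) picks up the maximum from sample 1 only, and that value is then compared against every observation in the group.

```stata
* Minimal sketch with made-up values for two age groups.
clear
input str5 Age_Group sample_label b1 b2
"50_51" 1 .2  10
"50_51" 1 .4  30
"50_51" 2 .5  35
"50_51" 2 .3 -20
"50_51" 3 .9   5
"60_61" 1 .6  -5
"60_61" 1 .7  12
"60_61" 2 .5  40
"60_61" 2 .8  11
"60_61" 3 .9  15
end

foreach v of varlist b1 b2 {
    * reference maximum: largest value among sample_label == 1 in each age group
    by Age_Group (sample_label), sort: egen `v'_max = max(cond(sample_label == 1, `v', .))
    * per age group and sample, count the observations above that maximum
    by Age_Group sample_label, sort: egen exceeds_ref_`v'_max = total(`v' > `v'_max & !missing(`v'))
}

list, noobs sepby(Age_Group)
```

In the listing, b1_max and b2_max should be constant within each age group, and the counts for sample_label == 1 should be 0, since no value in the reference sample can exceed its own maximum.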
