Problem writing a loop with multiple layers

Luke Benvenuto

Join Date: Nov 2016

Posts: 7
#1


21 Jul 2017, 16:21

I can't figure out how to write a loop for this, and I think a loop is the only way to solve it. I do not think rangestat will work.

I would like to:
For each id (observation), determine the time from Start_Date to the date by which 50% of the persons (ids) who are in the same group and have an overlapping interval have been transplanted.

Transplanted = 1 means the person was transplanted
id = person
end_date is the last date of follow-up; the person may or may not have been transplanted by then.

[CODE]
clear
input float id str8 group float Start_Date long end_date float Transplanted
1 "09054" 16807 16839 1
2 "09054" 16812 16841 1
3 "09054" 16831 16845 1
4 "09054" 16838 16848 0
5 "09054" 16852 16878 1
6 "09054" 16891 16897 1
7 "09054" 16898 16900 0
8 "09054" 16835 16909 1
9 "09054" 16877 16912 1
10 "09054" 16908 16916 1
11 "09952" 16877 16918 0
12 "09952" 16926 16932 1
13 "09952" 16940 16946 1
14 "09952" 16840 16954 1
15 "09952" 16926 16965 1
16 "00952" 16908 16966 1
17 "00952" 16960 16967 1
18 "00952" 16961 16969 1
19 "00952" 16968 16979 0
20 "00952" 16944 16982 1
21 "29002" 16988 16995 0
22 "29002" 16975 16999 1
23 "23002" 16971 17008 1
24 "23002" 16937 17014 0
25 "23002" 17017 17022 1
26 "23002" 17015 17024 1
27 "23002" 16926 17026 0
28 "23002" 16924 17032 1
29 "23002" 16982 17034 1
30 "23002" 16996 17035 1
end
format %d Start_Date
format %d end_date
[/CODE]

I calculated the number of people who had an overlapping interval in the same group using:
* running count within group, in order of Start_Date (by: needs sorted data)
bysort group (Start_Date): generate Total_Group = _n
**
generate Number_Removed = .
local N = _N
quietly forval i = 1/`N' {
    * same-group intervals that ended before observation i started
    count if group == group[`i'] & end_date < Start_Date[`i']
    replace Number_Removed = r(N) in `i'
}
**
generate Total_in_Group_At_Start = Total_Group - Number_Removed

Thank you

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

21 Jul 2017, 17:14

I'm not sure I understand exactly what you're trying to do. But to the extent I understand it, I agree that -rangestat- alone cannot do the job, but I believe a couple of applications of -rangejoin- can do the heavy lifting here. Does this get what you want?

Code:

clear
input float id str8 group float Start_Date long end_date float Transplanted
1 "09054" 16807 16839 1
2 "09054" 16812 16841 1
3 "09054" 16831 16845 1
4 "09054" 16838 16848 0
5 "09054" 16852 16878 1
6 "09054" 16891 16897 1
7 "09054" 16898 16900 0
8 "09054" 16835 16909 1
9 "09054" 16877 16912 1
10 "09054" 16908 16916 1
11 "09952" 16877 16918 0
12 "09952" 16926 16932 1
13 "09952" 16940 16946 1
14 "09952" 16840 16954 1
15 "09952" 16926 16965 1
16 "00952" 16908 16966 1
17 "00952" 16960 16967 1
18 "00952" 16961 16969 1
19 "00952" 16968 16979 0
20 "00952" 16944 16982 1
21 "29002" 16988 16995 0
22 "29002" 16975 16999 1
23 "23002" 16971 17008 1
24 "23002" 16937 17014 0
25 "23002" 17017 17022 1
26 "23002" 17015 17024 1
27 "23002" 16926 17026 0
28 "23002" 16924 17032 1
29 "23002" 16982 17034 1
30 "23002" 16996 17035 1
end
format %d Start_Date
format %d end_date

//    INTERVALS OVERLAP IF THE START OR END POINT OF ONE
//    INTERVAL LIES INSIDE THE OTHER
//    USE RANGEJOIN TWICE, ONCE FOR START, ONCE FOR ENDPOINT
//    TO PAIR EACH OBSERVATION WITH ALL OTHERS IN GROUP
//    WITH OVERLAPPING FOLLOW-UP INTERVALS
tempfile copy
save `copy'
rangejoin Start_Date Start_Date end_date using `copy', by(group)
tempfile holding
save `holding'
use `copy', clear
rangejoin end_date Start_Date end_date using `copy', by(group)
append using `holding'
//    SOMETIMES BOTH OCCUR; JUST KEEP ONE OF THESE
duplicates drop
//    IF A OVERLAPS WITH B THEN MAKE B OVERLAP WITH A AS WELL
save "`holding'", replace
rename (id Start_Date end_date Transplanted) =_V
rename *_U *
rename *_V *_U
append using `holding'
duplicates drop

//    GET A RUNNING COUNT OF TRANSPLANTS DONE AMONG THE MATCHES
by id (end_date_U), sort: gen n_transplants = sum(Transplanted_U)
by id: gen overlap_group_size = _N
//    FIND FIRST END DATE AMONG OVERLAPS WHERE THE NUMBER OF TRANSPLANTS
//    DONE IS AT LEAST HALF THE SIZE OF THE OVERLAP GROUP
by id: egen date_half_transplanted = min(cond(2*n_transplants >= overlap_group_size, ///
    end_date_U, .))
format date_half_transplanted %td

Note: In this code, I assume that if a person is transplanted, the date of their transplant is their end_date. (You don't actually say how we are to know when a person is transplanted; I just thought that end_date might be a sensible guess about that.)
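
A possible follow-up step, offered only as a sketch: the original question asks for the time from Start_Date to that date, and the code above leaves several paired rows per id in memory. Assuming that paired dataset is still loaded, something like the following would reduce it to one row per id and compute the gap in days. The variable name days_to_half is made up for illustration and does not appear elsewhere in this thread.

```stata
* Sketch only: assumes the paired dataset built above is still in memory.
* days_to_half is a hypothetical name chosen for this example.

* date_half_transplanted is constant within id, so one row per id is enough
bysort id: keep if _n == 1
keep id group Start_Date end_date Transplanted date_half_transplanted

* Stata daily dates are integers, so the difference is a count of days;
* it is missing when half of the overlap group is never transplanted
generate days_to_half = date_half_transplanted - Start_Date

list id group Start_Date date_half_transplanted days_to_half, sepby(group) noobs
```

If the paired rows are still needed afterwards, wrapping this in -preserve- and -restore- would keep them available.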

Added: -rangejoin-, like -rangestat-, is from SSC. It was written by Robert Picard.
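
For anyone who has not installed these yet, a minimal sketch of the setup: -rangejoin- relies on -rangestat- being installed as well, so both are fetched from SSC here.

```stata
* One-time installation from SSC; -rangejoin- needs -rangestat- to be present
ssc install rangestat
ssc install rangejoin
```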


Luke Benvenuto

Join Date: Nov 2016

Posts: 7
#3

22 Jul 2017, 05:50

Yes, sorry for the lack of clarity. end_date is the date of transplant if they were transplanted. I will apply this code. It will take me a little time, as I am still a relative beginner, but it looks like it will work.

Thank you.