Generating variable by group if pt developed complication after procedure

Tara Boyle

Join Date: Nov 2022
Posts: 137

28 Nov 2022, 08:36

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float colorectal str3 hospitalid float MI str3 expected_mi float(date2 date4each MI2) str1 expected float max
  0 "12A" 0 "no"  22252     . 0 ""  .
112 "12A" 1 "yes" 22253 22253 1 "1" .
  0 "13A" 0 "no"  22678 22646 0 ""  .
  0 "13A" 1 "no"  22665 22619 0 ""  .
113 "13A" 0 "no"  22619 22619 0 "0" 0
114 "13A" 0 "no"  22646 22646 0 "0" 0
  0 "14A" 1 "yes" 22720 22705 1 ""  .
115 "14A" 0 "no"  22734 22734 0 "0" 0
116 "14A" 0 "no"  22705 22705 0 "1" 0
end
format %td date2
format %td date4each

I’m trying to create a new dataset in which, if the patient had an MI within 45 days, this is stored as 1 ON THE SAME LINE AS THE PATIENT'S COLORECTAL PROCEDURE.

COLORECTAL = PT HAD A COLORECTAL SURGERY

HOSPITALID = ANY TIME PT WAS ADMITTED TO HOSPITAL, AND DATE2 = WHEN THE PATIENT WAS ADMITTED

I can’t seem to do it. I hope to create something that looks like the 'expected' column...

Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

28 Nov 2022, 08:55

I can't quite figure out what you want here. The "expected" variable that you set as a model does not seem consistent with what you say in words. In particular, let's look at patient id 13A. This patient has an MI on 20 Jan 2022, which is just 19 days after colorectal surgery on 1 Jan 2022. So why isn't expected set to 1 in this situation?

Also, what cannot in any case be discerned from your example is whether "within 45 days" means within the 45 days after colorectal surgery, or within the 45 days before colorectal surgery, or both of those.

Vilma Antonov

Join Date: Aug 2022
Posts: 47

28 Nov 2022, 08:58

OK, I'm sure there is an easier way of solving this, but I solved a similar problem using this code:

Code:

gen days = datediff(date4each, date2, "day")
keep if MI==1
gen MI_within_45=1 if days<46
replace MI_within_45=0 if days>45
keep hospitalid MI_within_45
save "/Volumes/USB 128GB/statahelp.dta"

* Use your original file and do following:
merge m:1 hospitalid using "/Volumes/USB 128GB/statahelp.dta"
drop _merge

I had a little trouble understanding which date was the surgery date and took it to be date4each; if that is wrong, just switch the two dates around.


Tara Boyle

Join Date: Nov 2022

Posts: 137
#4

28 Nov 2022, 09:37

Clyde Schechter - sorry, this should read as 45 days after the colorectal procedure. You are correct; the error was on my end. Sorry. I have been trying to find ways of solving this problem for at least 72 hours.

Interesting, Vilma Antonov, that you use the keep command. This was my last resort.

I was wondering whether there is another way that keeps the entire dataset and just records, on the same line, whether the patient had an MI within 45 days.

But if keep is the only idea people have, I will use it and create a new .dta file, then combine the datasets and match.

Andrew Musau

Join Date: Oct 2014
Posts: 10186

28 Nov 2022, 10:48

You want to copy the MI date upwards and the colorectal date downwards to match the dates when computing the differences. Otherwise, your description is not too easy to follow.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float colorectal str3 hospitalid float MI str3 expected_mi float(date2 date4each MI2) str1 expected float max
  0 "12A" 0 "no"  22252     . 0 ""  .
112 "12A" 1 "yes" 22253 22253 1 "1" .
  0 "13A" 0 "no"  22678 22646 0 ""  .
  0 "13A" 1 "no"  22665 22619 0 ""  .
113 "13A" 0 "no"  22619 22619 0 "0" 0
114 "13A" 0 "no"  22646 22646 0 "0" 0
  0 "14A" 1 "yes" 22720 22705 1 ""  .
115 "14A" 0 "no"  22734 22734 0 "0" 0
116 "14A" 0 "no"  22705 22705 0 "1" 0
end
format %td date2
format %td date4each

g MI_date= date2*MI
bys hospitalid (date2): g colorectal_date= date2*(colorectal>0)
by hospitalid: replace colorectal_date= colorectal_date[_n-1] if !colorectal_date & colorectal_date[_n-1] & _n>1
gsort hospitalid -date2
by hospitalid: replace MI_date= MI_date[_n-1] if !MI_date & MI_date[_n-1] & _n>1
g wanted = inrange(MI_date- colorectal_date+1, 0, 45) if colorectal

Res.:

Code:

. sort hospitalid date2


. l hospitalid colorectal MI date2 expected MI_date-wanted , sepby(hospitalid)

     +-------------------------------------------------------------------------------+
     | hospit~d   colore~l   MI       date2   expected   MI_date   colore~e   wanted |
     |-------------------------------------------------------------------------------|
  1. |      12A          0    0   03dec2020                22253          .        . |
  2. |      12A        112    1   04dec2020          1     22253      22253        1 |
     |-------------------------------------------------------------------------------|
  3. |      13A        113    0   05dec2021          0     22665      22619        0 |
  4. |      13A        114    0   01jan2022          0     22665      22646        1 |
  5. |      13A          0    1   20jan2022                22665      22646        . |
  6. |      13A          0    0   02feb2022                    .      22646        . |
     |-------------------------------------------------------------------------------|
  7. |      14A        116    0   01mar2022          1     22720      22705        1 |
  8. |      14A          0    1   16mar2022                22720      22705        . |
  9. |      14A        115    0   30mar2022          0         .      22734        0 |
     +-------------------------------------------------------------------------------+

Last edited by Andrew Musau; 28 Nov 2022, 11:11.


Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#6

28 Nov 2022, 10:52

Code:

rangestat (sum) wanted = MI, by(hospitalid) interval(date2 0 45)
replace wanted = min(wanted, 1)
replace wanted = . if colorectal == 0

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

Note: code assumes (but does not verify) that the variable MI is always 0 or 1.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1373
#7

28 Nov 2022, 10:54

One quick way to do this might be using the community-contributed rangestat command (available from SSC):

Code:

rangestat (max) wanted = MI , interval(date2 0 45) by(hospitalid)
replace wanted = . if colorectal == 0
Tara Boyle

Join Date: Nov 2022

Posts: 137
#8

29 Nov 2022, 05:36

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float colorectal str3 hospitalid float(MI date2)
  0 "12A" 0 22252
112 "12A" 1 22253
  0 "13A" 0 22678
  0 "13A" 1 22665
113 "13A" 0 22619
114 "13A" 0 22646
115 "14A" 0 22720
  0 "14A" 1 22734
116 "14A" 0 22705
  0 "14A" 0 22705
end
format %td date2

The code does not work for patient 14A, who had procedure 116 but did not develop an MI within 45 days, and then had procedure 115 on 16 Mar and developed an MI within 45 days.

The code generates the max or the sum, depending on whose code you use, but it gives the inaccurate picture that patient 14A developed an MI after both procedures 115 and 116...

So I think I will need to stick to keep and then merge the different datasets.

Last edited by Tara Boyle; 29 Nov 2022, 06:14.
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#9

29 Nov 2022, 06:46

The code does not work for patient 14A who had procedure 116 but did not develop a MI in 45 days. But then had a procedure 115 on 16 Mar and developed a MI within 45 days.

But patient 14A did develop an MI on 30 Mar 2022, which is within 45 days of colorectal procedure 116, which took place on 1 Mar 2022. The MI was, in fact, within 45 days of both procedures 115 and 116. So the code is correct.
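
As a quick check of the arithmetic, the two gaps can be computed directly with Stata's td() date literals. This is only an illustrative sketch; the dates are the ones shown for patient 14A in the example data in #8.

```stata
* Days from each colorectal procedure to the MI on 30 Mar 2022 (patient 14A)
display td(30mar2022) - td(01mar2022)   // gap after procedure 116
display td(30mar2022) - td(16mar2022)   // gap after procedure 115
```

The gaps are 29 and 14 days, so both fall inside a 0 to 45 day window.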
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#10

29 Nov 2022, 07:11

Well, the code is a correct implementation of what you asked for in #1. But perhaps what you asked for is not what you meant. Perhaps if there are multiple colorectal surgeries that all occurred within the 45 days preceding an MI, you only want the last of those surgeries to be marked 1 in the new variable. This might make sense if, for example, you are conducting a study of the incidence of MI following colorectal procedures using time-to-event analysis and you want to treat an intervening colorectal procedure as censoring the observation following the preceding colorectal procedure(s). In that case, the following code will give you what you want:

Code:

assert inlist(MI, 0, 1)
gsort hospitalid -date2
by hospitalid: gen next_colorectal_date = .
by hospitalid: replace next_colorectal_date = cond(colorectal[_n-1], date2[_n-1], ///
    next_colorectal_date[_n-1]) if _n > 1
format next_colorectal_date %td
gen lower = cond(colorectal, date2, 1)
gen upper = cond(colorectal, min(date2+45, next_colorectal_date-1), 0)
rangestat (max) wanted = MI, by(hospitalid) interval(date2 lower upper)
drop lower upper

Last edited by Clyde Schechter; 29 Nov 2022, 07:13.
Tara Boyle

Join Date: Nov 2022

Posts: 137
#11

29 Nov 2022, 10:47

Another question about the code. I don’t know if I’ll be able to use rangestat because of a problem highlighted in a previous thread, but I would still like to know how this part of the code works:

rangestat (max) wanted = MI , interval(date2 0 45)

Example:

Section 1

Pt 13A had procedure 114 on 1 Jan 2022, but no MI.

Section 2

That same pt develops an MI on 20 Jan 2022.

So would Stata use the code and say that for Section 1 the pt had no MI? Is that correct?

But in Section 2 the pt gets an MI within 45 days of the Section 1 date. HOWEVER, the code actually says date2, so theoretically wouldn’t that mean 20 Jan 2022?

How does Stata interpret this correctly and calculate 45 days from the Section 1 date? Just trying to understand how this works.
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#13

29 Nov 2022, 11:22

The -interval()- option in -rangestat- is a bit complicated and unintuitive. I'll try to explain how it works. First some terminology.

Think of -rangestat- as processing each observation of the data set separately. I'll refer to the observation being processed at any given time as the current observation, and the values of its variables as their current values. I'll refer to other observations as "source" observations and the values of their variables as "source" values. (Note: the current observation itself is also considered a source observation, unless the -excludeself- option has been specified.)

So, when -rangestat- is processing an observation and the -interval()- option is -interval(date2 0 45)-, that is interpreted as:

1. Find the current value of variable date2.
2. Add 0 to the current value of date2 to get the lower limit for inclusion of source observations in the range for calculating the statistics requested.*
3. Add 45 to the current value of date2 to get the upper limit for inclusion of source observations in the range for calculating the statistics requested.*
4. Select all observations for which the source value of date2 falls between the lower and upper limits calculated in steps 2 and 3.
5. Calculate the requested statistics (sum, max, mean, whatever) using the observations selected in step 4.
6. Set the current value(s) of the variable(s) for the requested statistics to the results calculated in step 5.

* This is how it works when the second and third arguments in the -interval()- option are specified as constants. When they are specified as variables, the current values of those variables are used as the lower and upper limits for inclusion.

Note: The above are done restricted to exact matching between current and source observations on the variables given in the -by()- option (if any).

Concretely, regarding patient 13A, consider what happens when the observation for procedure 114 on patient 13A is the current observation.
Step 1: The current value of date2 is 1 Jan 2022.
Step 2: The lower limit is therefore 1 Jan 2022 + 0 = 1 Jan 2022.
Step 3: The upper limit is 1 Jan 2022 + 45 = 15 Feb 2022.
Step 4: Select all observations of patient 13A (because hospitalid is in the -by()- option) whose (source) values of date2 fall between 1 Jan 2022 and 15 Feb 2022.
Step 5: Calculate the maximum value of variable MI among those observations. The 20 Jan 2022 observation for patient 13A does fall between 1 Jan 2022 and 15 Feb 2022, so it is among those source observations used to calculate the maximum. The source value of MI for this 20 Jan 2022 observation is 1. And since all values of MI are either 0 or 1, and we have just found a 1 for MI in the 20 Jan 2022 source observation, the maximum value of MI for all these observations must be 1.
Step 6: Set wanted = 1.
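
For anyone who wants to see that logic spelled out, here is a minimal sketch that imitates the six steps with an explicit loop in official Stata, without -rangestat-. It is not how -rangestat- is implemented and is far slower; it only mirrors the description above. The variable name wanted_manual is made up for this illustration, and the data are the relevant columns of the example in #1.

```stata
* Example data: the relevant variables from the -dataex- listing in #1
clear
input float colorectal str3 hospitalid float(MI date2)
  0 "12A" 0 22252
112 "12A" 1 22253
  0 "13A" 0 22678
  0 "13A" 1 22665
113 "13A" 0 22619
114 "13A" 0 22646
  0 "14A" 1 22720
115 "14A" 0 22734
116 "14A" 0 22705
end
format %td date2

* wanted_manual is a hypothetical name used only for this illustration
gen wanted_manual = .
forvalues i = 1/`=_N' {
    * Steps 1-4: select source observations of the same patient whose date2
    * lies between the current date2 and the current date2 + 45
    summarize MI if hospitalid == hospitalid[`i'] ///
        & inrange(date2, date2[`i'], date2[`i'] + 45), meanonly
    * Steps 5-6: store the maximum of MI in the current observation
    quietly replace wanted_manual = r(max) in `i'
}
list hospitalid colorectal MI date2 wanted_manual, sepby(hospitalid)
```

For the observation with procedure 114 the loop stores 1, matching Step 6 above. The sketch leaves out the final step of setting the result to missing where colorectal == 0.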
