create a new variable or new dataset?

Denise Vella

Join Date: Aug 2022

Posts: 187
#1

create a new variable or new dataset?

09 Nov 2023, 16:27

Find sample data below.

An individual can have a kidney stone......

first treatment is is to monitor (1), if this fails - the patient can have a stent(2) and if this fails, procedure 3 (nephrostomy)

I would like to see the

individual (or rather individuals) who had procedure 1 and see if this has failed to become procedure 2 and subsequently if 2 has failed to become procedure 3.

The aim is to compare it to those individuals who have procedure (2) first and calculate the risk of failure to become procedure 3.
I can do the failure statistics, or rather survival statistics.

But I can not think of a way to create the data.

1. Should I create a new dataset?
2. Should I create a new variable; using - bys + date + laterality +mrn

It would be easier to have one row for each patient, generate a variable that indicates if the procedure (1) has failed to become (2) to become (3) .
However not sure if this is possible, perhaps an expert in data cleaning can help.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float(procedure mrn gender age barcodeno fail laterality date2) 1 12 1 22 12 1 1 14641 2 12 1 25 17 1 1 15342 3 12 1 87 20 0 1 16805 2 103 2 90 76 1 2 16802 3 103 2 100 98 0 2 18295 1 65 2 87 90 0 2 17563 end format %td date2 label values gender f label def f 1 "male", modify label def f 2 "female", modify label values laterality right label def right 1 "Right", modify label def right 2 "left", modify

/mrn is the Scottish way for specifying patient identification number
Of course this data is falsified.
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30147

09 Nov 2023, 16:45

I think I would do this as follows:

Code:

isid mrn date2, sort
assert !missing(procedure)

collapse (first) first_procedure = procedure first_date = date2 start_age = age ///
    (last) last_procedure = procedure last_date = date2 gender, by(mrn laterality)

stset last_date, failure(last_procedure == 3) origin(first_date)

Comment

Denise Vella

Join Date: Aug 2022

Posts: 187
#3

10 Nov 2023, 04:04

Clyde Schechter - thank you for this.
I didn't know the -collapse- had first and last options.

I guess, trying to take it a step further

If I wanted to to see the

individual (or rather individuals) who had procedure 1 and see if this has failed to become procedure 2 (Let's call this FIRST FAILURE) - risk failure x
(eg pt mrn 12 who had procedure 1 ( year: 2000) then subsequently had procedure 2 (year 2002)

AND

compare it to the individual who started with procedure 2 which then failed to become procedure 3 and subsequently if 2 has failed to become procedure 3 (LAST FAILURE) - risk failure y
(eg pt mrn 103- had procedure 2 then this failed to become procedure 3 - 2 years later)

I have achieved the aim to compare those individuals who have LAST FAILURE (procedure 3) - with the function - collapse(last) -thanks clyde (see post 2)

However, are you able to suggest a modification to review what happens -inbetween first and last, ie comparing, risk failure x to risk of failure y.-. As there is no -inbetween- option with collapse
I tried the below, which of course only works with the last date.

Code:

collapse (first) first_procedure = procedure first_date= date2 start_age = age /// (last) first_failure= procedure first_failuredate=date2 gender, by(mrn laterality)
Comment

Denise Vella

Join Date: Aug 2022
Posts: 187

10 Nov 2023, 05:29

Perhaps I wasn't that articulate, this is what I Would like to see (FIRST FAILURE - 2ND FAILURE - 3RD FAILURE) and the dates for 1st, failure, 2ndfailure and 3rd failure.
Clye has shown me how to do first and last, but what happens for the 'inbetween i.e 2nd failure')
Then I will just keep one line for each mrn

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(procedure mrn gender age barcodeno failure laterality opdate firstfail failure_2nd failure_3rd)
1  12 1  22 12 1 1 14641 1 1 0
2  12 1  25 17 1 1 15342 1 1 0
3  12 1  87 20 0 1 16805 1 1 0
1  65 2  87 90 0 2 17563 0 0 0
2 103 2  90 76 1 2 16802 1 0 0
3 103 2 100 98 0 2 18295 1 0 0
end
format %td opdate
label values procedure proc
label def proc 1 "monitor", modify
label def proc 2 "stent", modify
label def proc 3 "nephrostomy", modify
label values gender f
label def f 1 "male", modify
label def f 2 "female", modify
label values laterality right
label def right 1 "Right", modify
label def right 2 "left", modify

Last edited by Denise Vella; 10 Nov 2023, 05:38.

Comment

Denise Vella

Join Date: Aug 2022
Posts: 187

10 Nov 2023, 07:29

Hello, managed to solve it

Code:

/Create order
bys mrn gender laterality: gen total_noofentries = _N
bys mrn gender laterality: gen orderof_entries= _n

//Create first - 2nd - 3rd failures
bys mrn gender laterality: gen test_FIRST_failure = 1 if fail ==1 & orderof_entries == 1
bys mrn gender laterality: gen test_SECON_failure = 1 if fail ==1 & orderof_entries == 2
bys mrn gender laterality: gen test_THIRD_failure = 1 if fail ==1 & orderof_entries == 3

//Create dates for first - 2nd - 3rd failures
gen first_datefailure = date2 if test_FIRST_failure == 1
format first_datefailure %td

//create dates for Second failure
gen second_datefailure = date2 if test_SECON_failure == 1
format second_datefailure %td

//trying to get everything in one row

//First failure 
replace test_FIRST_failure = 0 if test_FIRST_failure !=1 
rangestat (max) Final_1st_revision = test_FIRST_failure, interval(test_FIRST_failure 0 1) by(mrn gender laterality)

//First date faiure
replace first_datefailure = 0 if first_datefailure ==. 
rangestat (max) Final_1_date_revision = first_datefailure, interval(first_datefailure 0 .) by(mrn gender laterality)
format Final_1_date_revision %td



//Second failure 
replace test_SECON_failure = 0 if test_SECON_failure !=1 
rangestat (max) Final_2nd_rev = test_SECON_failure, interval(test_SECON_failure 0 1) by(mrn gender laterality)


replace second_datefailure = 0 if second_datefailure ==. 
rangestat (max) Second_2_date_revision = second_datefailure, interval(second_datefailure 0 .) by(mrn gender laterality)
format Second_2_date_revision %td


keep if orderof_entries == 1

Announcement