String and dummy variable

Layo Olabisi

Join Date: Jan 2020

Posts: 10
#1

09 Jan 2020, 00:10

I am trying to create a dummy variable that indicates whether each discharge involves a readmission. I only have data on the date of admission. A readmission is a subsequent hospitalization for the same patientid within 30 days of the index claim.

Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

09 Jan 2020, 00:51

Please present a data example with a few observations. If you have a fully updated copy of Stata 14 or a later version, see

Code:

help dataex

Otherwise

Code:

ssc install dataex
help dataex

Layo Olabisi

#3

09 Jan 2020, 14:39

Thank you. I hope this makes sense.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 admitdate
"28-Apr-10"
"17-Mar-10"
"16-Apr-10"
"18-May-10"
"18-Aug-10"
"27-Sep-10"
"28-Oct-10"
"31-Aug-10"
"24-Sep-10"
"25-Oct-10"
"21-Nov-10"
"8-Jul-10"
"6-Jun-10"
"7-Jul-10"
"28-Feb-10"
"2-Apr-10"
"3-May-10"
"21-Mar-10"
"5-Sep-10"
"29-Sep-10"
"9-Jan-10"
"13-Feb-10"
"7-Jun-10"
"9-Jul-10"
end

Layo Olabisi

#4

09 Jan 2020, 14:44

I have uploaded a sample spreadsheet for clarification.

Attached file: Sample.xlsx (16.1 KB)

Andrew Musau

#5

09 Jan 2020, 15:58

The code below assigns a value of 1 to every observation of a patientid with at least one readmission within 30 days of a previous admission, and 0 otherwise.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(patientid age) str9 admitdate byte systolic int(procedure1 procedure2 procedure3 procedure4 procedure5 diagnosis1 diagnosis2 diagnosis3 diagnosis4 diagnosis5) byte aha_id
1 72 "28-Apr-10" 97 3610 1135 3813 5225 7302 1380 4560 3045 5273 3907 8
2 78 "17-Mar-10" 81 1253 3402 5113 4611 5350 3605 7463 8480 2865 3860 7
2 55 "16-Apr-10" 99 7630 7930 2576 5741 7262 2254 5432 4953 8581 2851 9
2 64 "18-May-10" 64 3362 3595 1999 2430 5833 3651 2667 5965 2976 8016 9
3 58 "18-Aug-10" 99 4322 8801 1039 6368 2008 5081 3482 6221 4265 1927 7
end

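* Convert the string dates (two-digit years read as 20xx) to Stata daily dates
gen admit_date = date(admitdate, "DM20Y")
format admit_date %td
* Days since the same patient's previous admission (missing for the first one)
bys patientid (admit_date): gen difference= admit_date- admit_date[_n-1]
* 1 for every observation of a patient with any gap of 30 days or less (missing counts as > 30)
bys patientid: egen wanted= max(difference<=30)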

Result:

Code:

. l patientid admit_date difference wanted, sepby(patientid)

     +------------------------------------------+
     | patien~d   admit_d~e   differ~e   wanted |
     |------------------------------------------|
  1. |        1   28apr2010          .        0 |
     |------------------------------------------|
  2. |        2   17mar2010          .        1 |
  3. |        2   16apr2010         30        1 |
  4. |        2   18may2010         32        1 |
     |------------------------------------------|
  5. |        3   18aug2010          .        0 |
     +------------------------------------------+
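The code in #5 produces a patient-level flag, whereas #1 asks about each discharge. A minimal sketch of admission-level flags follows. It assumes, as in #1, that only admission dates are available, so the 30-day window runs from one admission date to the next; the names readmit30 and index30 are hypothetical. Like -wanted-, it treats a same-day readmission (a gap of 0 days) as a readmission.

```stata
* Sketch: admission-level readmission flags (readmit30, index30 are hypothetical names)
clear
input byte patientid str9 admitdate
1 "28-Apr-10"
2 "17-Mar-10"
2 "16-Apr-10"
2 "18-May-10"
3 "18-Aug-10"
end

* "DM20Y": day, month, then a two-digit year read as 20xx
gen admit_date = date(admitdate, "DM20Y")
format admit_date %td

* readmit30 = 1 if this admission is within 30 days of the patient's previous admission
bysort patientid (admit_date): gen byte readmit30 = (admit_date - admit_date[_n-1]) <= 30
* index30 = 1 if the patient's next admission is within 30 days of this one
bysort patientid (admit_date): gen byte index30 = (admit_date[_n+1] - admit_date) <= 30
* A patient's first (or last) admission has no neighbor; missing compares as > 30, so the flag is 0

list patientid admit_date readmit30 index30, sepby(patientid)
```

In the sample, only patient 2's 16apr2010 admission is a readmission (30 days after 17mar2010), and 17mar2010 is its index admission; the 18may2010 admission comes 32 days later and is not flagged.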

Layo Olabisi

#6

09 Jan 2020, 16:56

Thank you so much. This was very helpful.
As a follow-up, I need to create another dummy variable that has an inclusion for certain groups and exclusion criteria for the first three characters.

Andrew Musau

#7

09 Jan 2020, 17:02

dummy variable that has an inclusion for certain groups and exclusion criteria for the first three characters

I do not get what you mean here. Can you give an example?
Layo Olabisi

#8

09 Jan 2020, 17:16

I have to create a dummy variable for whether a procedure involved CABG. The inclusion criterion is any procedure code in the range 3610-3616, and the exclusion criterion is any procedure code whose first 3 characters are 350 or 351.

Layo Olabisi

#9

09 Jan 2020, 20:48

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int procedure1
3610
1253
7630
3362
4322
2849
5248
1257
4455
5214
end

I need to create a dummy variable for whether a procedure involved CABG or not. The inclusion criterion is any procedure code in the range 3610-3616, and the exclusion criteria are procedure code where the first 3 characters are either 350/351.
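With only one procedure variable, as in the example above, -inrange()- can be used directly and no loop is needed. A minimal sketch, assuming every code is a 4-digit integer (so that its first three digits are floor(procedure1/10)); incl, excl and CABG are hypothetical names:

```stata
* Sketch: single procedure variable, 4-digit integer codes assumed
clear
input int procedure1
3610
1253
7630
3362
4322
2849
5248
1257
4455
5214
end

* Inclusion: code between 3610 and 3616 inclusive
gen byte incl = inrange(procedure1, 3610, 3616)
* Exclusion: first three digits are 350 or 351 (valid only for 4-digit codes)
gen byte excl = inlist(floor(procedure1/10), 350, 351)
* Hypothetical CABG dummy: included and not excluded
gen byte CABG = incl & !excl

list procedure1 incl excl CABG
```

For a single code the two ranges cannot overlap, so excl never changes the result here; the exclusion only matters when several procedure fields of the same admission are checked, as in the wide data from #5. If the codes were stored as strings, inlist(substr(code, 1, 3), "350", "351") would be the string counterpart (code being a hypothetical string variable).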

Andrew Musau

#10

10 Jan 2020, 00:50

Do you need to tag a patientid if any of the inclusion criteria are fulfilled? If so, using the data structure from #5:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(patientid age) str9 admitdate byte systolic int(procedure1 procedure2 procedure3 procedure4 procedure5 diagnosis1 diagnosis2 diagnosis3 diagnosis4 diagnosis5) byte aha_id
1 72 "28-Apr-10" 97 3610 1135 3813 5225 7302 1380 4560 3045 5273 3907 8
2 78 "17-Mar-10" 81 1253 3402 5113 4611 5350 3605 7463 8480 2865 3860 7
2 55 "16-Apr-10" 99 7630 7930 2576 5741 7262 2254 5432 4953 8581 2851 9
2 64 "18-May-10" 64 3362 3595 1999 2430 5833 3651 2667 5965 2976 8016 9
3 58 "18-Aug-10" 99 4322 8801 1039 6368 2008 5081 3482 6221 4265 1927 7
end

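* Build the list 3610 3611 ... 3616 in a local macro
local values ""
forval i = 3610(1)3616 {
     local values "`values' `i'"
}
* tag = 1 if any procedure variable equals one of the listed codes
egen tag= anymatch(procedure*), values(`values')
* Spread the flag to all observations of the same patient
bys patientid: egen CABG_included = max(tag)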

Defining a dummy for the exclusion criteria is more difficult because -egen- with the -anymatch()- function does not accept functions within its -values()- option. That is why, for example, I cannot use the -inrange()- function in the code above and had to resort to defining a local macro. The easiest way to handle this is a long layout (i.e., reshape long procedure), but I will post a solution that works with the wide layout later in the day (or someone else on the list may come up with a better suggestion). If you need to tag individual observations rather than the entire patientid, omit the last line of the code above.

Result:

Code:

. l patientid procedure* CABG_included, sepby(patientid)

     +----------------------------------------------------------------------------+
     | patien~d   proced~1   proced~2   proced~3   proced~4   proced~5   CABG_i~d |
     |----------------------------------------------------------------------------|
  1. |        1       3610       1135       3813       5225       7302          1 |
     |----------------------------------------------------------------------------|
  2. |        2       1253       3402       5113       4611       5350          0 |
  3. |        2       7630       7930       2576       5741       7262          0 |
  4. |        2       3362       3595       1999       2430       5833          0 |
     |----------------------------------------------------------------------------|
  5. |        3       4322       8801       1039       6368       2008          0 |
     +----------------------------------------------------------------------------+
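#10 mentions that a long layout is the easiest route, because there -inrange()- can be applied to one code at a time. A minimal sketch of that approach, assuming the wide procedure1-procedure5 variables from #5 and introducing hypothetical variables admission, procnum, incl and excl:

```stata
* Sketch: long-layout version of the inclusion/exclusion flags
clear
input byte patientid int(procedure1 procedure2 procedure3 procedure4 procedure5)
1 3610 1135 3813 5225 7302
2 1253 3402 5113 4611 5350
2 7630 7930 2576 5741 7262
2 3362 3595 1999 2430 5833
3 4322 8801 1039 6368 2008
end

* -reshape- needs a variable that uniquely identifies each admission (row)
gen int admission = _n
reshape long procedure, i(admission) j(procnum)

* One code per row, so -inrange()- works; take the max within each admission
bysort admission: egen byte incl = max(inrange(procedure, 3610, 3616))
bysort admission: egen byte excl = max(inrange(procedure, 3500, 3519))

* Back to one row per admission (patientid, incl, excl are constant within admission)
reshape wide procedure, i(admission) j(procnum)
list patientid procedure1 incl excl
```

The same max() step with by patientid: instead of by admission would give patient-level flags, as CABG_included does above.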

Andrew Musau

#11

10 Jan 2020, 02:30

exclusion criteria are procedure code where the first 3 characters are either 350/351.

This is easier than I thought. For 4-digit codes, those whose first 3 digits are 350 or 351 are exactly the numbers 3500-3519, so the same approach as in #10 applies.

Code:

local values ""
forval i = 3500(1)3519 {
     local values "`values' `i'"
}
egen tag2= anymatch(procedure*), values(`values')
bys patientid: egen CABG_excluded = max(tag2)
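Putting #10 and #11 together: the CABG dummy requested in #8 presumably means at least one inclusion code and no exclusion code. Also, the -values()- option of -egen, anymatch()- takes an integer numlist (see help egen), so a range such as 3610/3616 should be accepted directly, without building a local macro in a loop. A sketch under those assumptions, with hypothetical names incl_tag, excl_tag, CABG and CABG_patient:

```stata
* Sketch: combined CABG dummy on the wide data from #5
clear
input byte patientid int(procedure1 procedure2 procedure3 procedure4 procedure5)
1 3610 1135 3813 5225 7302
2 1253 3402 5113 4611 5350
2 7630 7930 2576 5741 7262
2 3362 3595 1999 2430 5833
3 4322 8801 1039 6368 2008
end

* values() takes an integer numlist, so ranges can be typed directly
egen byte incl_tag = anymatch(procedure*), values(3610/3616)
egen byte excl_tag = anymatch(procedure*), values(3500/3519)

* Admission level: any inclusion code and no exclusion code
gen byte CABG = incl_tag & !excl_tag
* Patient level, as in #10: 1 if any admission of the patient qualifies
bysort patientid: egen byte CABG_patient = max(CABG)

list patientid procedure1 incl_tag excl_tag CABG CABG_patient, sepby(patientid)
```

Whether the exclusion should apply per admission (as here) or across all of a patient's admissions depends on the study definition, which the thread does not specify.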
Layo Olabisi

#12

10 Jan 2020, 11:10

Thank you so much for consistently helping me with this. I am grateful.

Layo Olabisi

#13

10 Jan 2020, 11:13

I am confused about what data or code I need to run the Elixhauser module. I am guessing it is the diagnosis codes. I have been asked to run the module to create Elixhauser comorbidity flags and a count of comorbidities. This is the same dataset I have been using for all my previous questions.

Someone previously posted something similar, but there was no response. Can you help with this?
Thank you.

Nick Cox

Join Date: Mar 2014

Posts: 35698
#14

10 Jan 2020, 11:26

#13 seems to be a completely different question. In any case, you've already asked it at https://www.statalist.org/forums/for...auser-question

Please don't ask the same question in different places. Anyone able to comment should please follow the cited thread.

Layo Olabisi

#15

10 Jan 2020, 12:01

OK, Nick. Thanks!