Tag the first violent incident: Panel data

Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#1

Tag the first violent incident: Panel data

20 Apr 2016, 10:17

Hello group,

I have a panel data containing violent incidents by patients in a mental health facility. A patient can have multiple admissions. I need to know in which admission the first incident occur. I created
the order_admin variable which is the order of the admission (first, second m third , son admission) . So for the first patient , the first incident occurred in the 3rd admission. So I would like a column with the number 3 for all admissions of that patients.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input double admission str9 patientID long number_incidents float order_admin 349150988 "000235" 0 5 349140796 "000235" 0 2 241131237 "000235" 0 1 349150932 "000235" 0 4 141140620 "000235" 1 3 349151105 "000235" 2 6 240140009 "000255" 0 1 241150015 "000255" 2 2 241150843 "000255" 3 3 875150007 "000262" 0 6 875140165 "000262" 0 5 875140131 "000262" 0 4 875150077 "000262" 0 8 825130128 "000262" 0 2 441130892 "000262" 0 1 875140044 "000262" 0 3 875150016 "000262" 1 7 end

Thank you in advance,
Marvin
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35563

20 Apr 2016, 10:28

This is an FAQ. The first date (here just a serial identifier, but that changes nothing essential) is just the minimum date, and conditions defining which kind of first date are just to be specified by extra syntax.

See http://www.stata.com/support/faqs/da...t-occurrences/ and http://www.stata-journal.com/article...article=dm0055

Code:

clear
input double admission str9 patientID long number_incidents float order_admin
349150988 "000235" 0 5
349140796 "000235" 0 2
241131237 "000235" 0 1
349150932 "000235" 0 4
141140620 "000235" 1 3
349151105 "000235" 2 6
240140009 "000255" 0 1
241150015 "000255" 2 2
241150843 "000255" 3 3
875150007 "000262" 0 6
875140165 "000262" 0 5
875140131 "000262" 0 4
875150077 "000262" 0 8
825130128 "000262" 0 2
441130892 "000262" 0 1
875140044 "000262" 0 3
875150016 "000262" 1 7
end

sort patientID order_admin 
by patientID : egen first = min(cond(number > 0, order_admin, .))
format admission %10.0f 
list, sepby(patientID) 


     +----------------------------------------------------+
     | admission   patien~D   number~s   order_~n   first |
     |----------------------------------------------------|
  1. | 241131237     000235          0          1       3 |
  2. | 349140796     000235          0          2       3 |
  3. | 141140620     000235          1          3       3 |
  4. | 349150932     000235          0          4       3 |
  5. | 349150988     000235          0          5       3 |
  6. | 349151105     000235          2          6       3 |
     |----------------------------------------------------|
  7. | 240140009     000255          0          1       2 |
  8. | 241150015     000255          2          2       2 |
  9. | 241150843     000255          3          3       2 |
     |----------------------------------------------------|
 10. | 441130892     000262          0          1       7 |
 11. | 825130128     000262          0          2       7 |
 12. | 875140044     000262          0          3       7 |
 13. | 875140131     000262          0          4       7 |
 14. | 875140165     000262          0          5       7 |
 15. | 875150007     000262          0          6       7 |
 16. | 875150016     000262          1          7       7 |
 17. | 875150077     000262          0          8       7 |
     +----------------------------------------------------+

Comment

Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

20 Apr 2016, 10:38

Nick's solution is shorter and more elegant but I wanted to show that there are other ways to get the same result.

Code:

gen first = 0
levelsof patientID, local(patients)
foreach p of local patients {
  sum order_admin if "`p'" == patientID & number_incidents > 0
  replace first = r(min) if "`p'" == patientID
}
format admission %9.0f
sort patientID order_admin
list, sepby(patientID)

     +----------------------------------------------------+
     | admission   patien~D   number~s   order_~n   first |
     |----------------------------------------------------|
  1. | 241131237     000235          0          1       3 |
  2. | 349140796     000235          0          2       3 |
  3. | 141140620     000235          1          3       3 |
  4. | 349150932     000235          0          4       3 |
  5. | 349150988     000235          0          5       3 |
  6. | 349151105     000235          2          6       3 |
     |----------------------------------------------------|
  7. | 240140009     000255          0          1       2 |
  8. | 241150015     000255          2          2       2 |
  9. | 241150843     000255          3          3       2 |
     |----------------------------------------------------|
 10. | 441130892     000262          0          1       7 |
 11. | 825130128     000262          0          2       7 |
 12. | 875140044     000262          0          3       7 |
 13. | 875140131     000262          0          4       7 |
 14. | 875140165     000262          0          5       7 |
 15. | 875150007     000262          0          6       7 |
 16. | 875150016     000262          1          7       7 |
 17. | 875150077     000262          0          8       7 |
     +----------------------------------------------------+

Last edited by Friedrich Huebler; 20 Apr 2016, 11:06. Reason: Edit: wrong table included in original post.

Comment

Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#4

20 Apr 2016, 11:01

Thank you so much both! I was trying to do this with the _N, _n-1, n system variables but it dint work. Friedrich: thank you so much for showing another way to get the same results but I will use Nicks codes. I havent used macro syntax. It is a new world for me.

Nick: I would need to flag the second occurrence, I guess nature/concept of your syntax will not work, right? I will need another strategy?
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35563

20 Apr 2016, 11:49

Yes, I would change approach for that. If you put the observations with no incidents separately, then the first, second, etc. dates are just the first, second dates in order among the observations with any incidents.

To understand this, pepper the commands below with list statements to see what each step does.

Code:

clear
input double admission str9 patientID long number_incidents float order_admin
349150988 "000235" 0 5
349140796 "000235" 0 2
241131237 "000235" 0 1
349150932 "000235" 0 4
141140620 "000235" 1 3
349151105 "000235" 2 6
240140009 "000255" 0 1
241150015 "000255" 2 2
241150843 "000255" 3 3
875150007 "000262" 0 6
875140165 "000262" 0 5
875140131 "000262" 0 4
875150077 "000262" 0 8
825130128 "000262" 0 2
441130892 "000262" 0 1
875140044 "000262" 0 3
875150016 "000262" 1 7
end

gen any = number > 0
bysort any patientID (order_admin) : gen first = order_admin[1] if any
by any patientID (order_admin) : gen second = order_admin[2] if any
bysort patientID (first) : replace first = first[1]
bysort patientID (second) : replace second = second[1]

sort patientID order_admin
list patientID order number first second, sepby(patientID)

. list patientID order number first second, sepby(patientID)

     +-------------------------------------------------+
     | patien~D   order_~n   number~s   first   second |
     |-------------------------------------------------|
  1. |   000235          1          0       3        6 |
  2. |   000235          2          0       3        6 |
  3. |   000235          3          1       3        6 |
  4. |   000235          4          0       3        6 |
  5. |   000235          5          0       3        6 |
  6. |   000235          6          2       3        6 |
     |-------------------------------------------------|
  7. |   000255          1          0       2        3 |
  8. |   000255          2          2       2        3 |
  9. |   000255          3          3       2        3 |
     |-------------------------------------------------|
 10. |   000262          1          0       7        . |
 11. |   000262          2          0       7        . |
 12. |   000262          3          0       7        . |
 13. |   000262          4          0       7        . |
 14. |   000262          5          0       7        . |
 15. |   000262          6          0       7        . |
 16. |   000262          7          1       7        . |
 17. |   000262          8          0       7        . |
     +-------------------------------------------------+

Last edited by Nick Cox; 20 Apr 2016, 11:51.

Comment

Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#6

20 Apr 2016, 16:45

Nick! this is great! The strategy is perfect!
Comment

Marvin Aliaga

Join Date: Feb 2015
Posts: 255

27 Apr 2016, 09:24

Hi Nick Cox,

I am sorry to come back to this post but I have a somewhat similar issue that I couldn't fix using you previous suggestions and other personal posts.

The unit of analysis in this date-set is the incident. Look at the patient below. this patient has 5 admissions in the my data-set. My final goal is to collapse the data and have a variable for the number of incidents this patient had in the first admission in 2015. In this case, it will be 2 incidents for the first admission in 2015 (admission id 6461). All the incidents after the first 2015 admission should be deleted (I don't need it). I also need to collapse and get the number of incidents for the prior admission before the first admission for 2015. In this case this patient had 3 incidents in the "prior" admission before 2015 (admission id=1209). I do not need the incidents before the "prior" admission. That is data from older admissions such as 881 should be deleted.

The logic of this analysis is to see if previous "violent" incidents (DV) predict future violent incidents (IV). I was trying to tag incidents based on conditions and then keep only those incidents that apply for the patient so then I can collapse my data. However, I couldn't find a effective way.

patient_id	admission_id	admission_date	incident_date
00043190H	8881	20-Apr-13	9/2/2013
00043190H	8881	21-Apr-13	9/18/2013
00043190H	1209	16-Apr-14	9/16/2014
00043190H	1209	16-Apr-14	2/3/2015
00043190H	1209	16-Apr-14	9/15/2014
00043190H	6461	10-Aug-15	8/16/2015
00043190H	6461	10-Aug-15	8/21/2015
00043190H	1024	22-Sep-15	10/13/2015
00043190H	1024	22-Sep-15	9/24/2015
00043190H	2149	22-Mar-16	4/12/2016

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 patient_id int admission_id float(incident_date admission_date)
"3190H" 8881 19603 19468
"3190H" 8881 19619 19469
"3190H" 1209 19982 19829
"3190H" 1209 20122 19829
"3190H" 1209 19981 19829
"3190H" 6461 20316 20310
"3190H" 6461 20321 20310
"3190H" 1024 20374 20353
"3190H" 1024 20355 20353
"3190H" 2149 20556 20535
"1485K" 4895 20221 20201
"1485K" 4895 20283 20202
"8711P" 8963 20134 20113
end
format %td incident_date
format %td admission_date

I would appreciate any help. thank you so much!
Marvin

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30034

27 Apr 2016, 10:38

First, there is a small problem with the data you posted with -dataex-. Your admission_date variable is not always consistent within admission_id. In the code I show below, I "fix" this by picking one date for each admission_id, though it may be that the error is actually that there should be separate admission_id's. Anyway, this allows the process to proceed. You can deal with this appropriately in your actual data as needed. Also, you don't say what to do if somebody doesn't have any admissions in 2015, or if they do but have no admissions prior to 2015. The code below always keeps exactly the first 2015 admission (if any) and the last pre-2015 admission (if there is one). So a given patient can end up with zero, one or two admissions retained depending on the circumstances. It should be fairly simple for you to modify the code to drop the cases with only one admission if you prefer that.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 patient_id int admission_id float(incident_date admission_date)
"3190H" 8881 19603 19468
"3190H" 8881 19619 19469
"3190H" 1209 19982 19829
"3190H" 1209 20122 19829
"3190H" 1209 19981 19829
"3190H" 6461 20316 20310
"3190H" 6461 20321 20310
"3190H" 1024 20374 20353
"3190H" 1024 20355 20353
"3190H" 2149 20556 20535
"1485K" 4895 20221 20201
"1485K" 4895 20283 20202
"8711P" 8963 20134 20113
end
format %td incident_date
format %td admission_date

list, noobs clean

// NOTE THAT ADMISSION DATE IS NOT CONSISTENT WITHIN ADMISSION ID
// WILL MODIFY THE DATA TO FIX THAT 
replace admission_date = td(23apr2015) if admission_id == 4895
replace admission_date = td(20apr2013) if admission_id == 8881

// VERIFY ADMISSION DATE CONSISTENT WITHIN ADMISSIONS & NEVER MISSING
by patient_id admission_id (admission_date), sort: ///
    assert admission_date[1]==admission_date[_N]
assert !missing(admission_date)

// CREATE A TEMPORARY DATA SET OF JUST ADMISSION ID's AND ADMISSION DATES
preserve
keep patient_id admission_id admission_date
duplicates drop

// DISTINGUISH ADMISSIONS BEFORE 2015 (-1), IN 2015 (0), AFTER 2015 (1)
gen int era = sign(yofd(admission_date)-2015)
// DROP ADMISSIONS AFTER 2015
drop if era == 1
// KEEP FIRST ADMISSION IN ERA 0 AND LAST IN ERA 1
by patient_id era (admission_date), sort: drop if _n > 1 & era == 0
by patient_id era (admission_date): drop if _n < _N & era == -1

// SAVE THIS IN A TEMPORARY FILE
isid patient_id admission_date, sort
tempfile targeted_admissions
save `targeted_admissions'

// BRING BACK ORIGINAL DATA & MERGE, RETAINING ONLY MATCHES
restore
merge m:1 patient_id admission_id admission_date using `targeted_admissions', ///
    assert(match master) keep(match) nogenerate

// SHOW THE REDUCED DATA
list, noobs clean

// NOW COLLAPSE TO NUMBER OF INCIDENTS PER ADMISSION
collapse (count) n_incidents = incident_date, by(patient_id era)
format n_incidents %1.0f
label define era -1 "Last Before 2015" 0 "First 2015"
label values era era
list, noobs clean

Note: The two -by patient_id era (admission_date): drop if- statements can be combined into one using a -cond()- expression in the -if- clause, but I preferred to keep the code more transparent and easy to read.

Comment

Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#9

27 Apr 2016, 11:47

I am sorry for the discrepancies among my admission ID and date variables. It was a merely typo when I was creating my sample dataset.

"Also, you don't say what to do if somebody doesn't have any admissions in 2015, or if they do but have no admissions prior to 2015"
So if they don't have any admission for 2015, for example: an patient that was admitted in 11/13/2014 and have 2 incidents, should not be in the dataset. On the other hand, if a patient do not have any admissions prior 2015 but have admission in 2015, this patient should be included and he/she will have a missing (".") for prior incidents cause he/she did not have any prior admissions if possible. If this is too complicated ,we can code them as 0.

To clarify, the goal is to see if prior violent incidents (in this case, prior violence incident in the last admission) predicts violent behaviors in their next admission (in this case 2015). I arbitrarily chose the first admission in 2015 to control for length of stay etc.

I dont quit understand why you create a temporary file and then merge it. After you do the code below, the data was perfect. Couldn't we just collapse after this?

// DISTINGUISH ADMISSIONS BEFORE 2015 (-1), IN 2015 (0), AFTER 2015 (1) gen int era = sign(yofd(admission_date)-2015) // DROP ADMISSIONS AFTER 2015 drop if era == 1 // KEEP FIRST ADMISSION IN ERA 0 AND LAST IN ERA 1 by patient_id era (admission_date), sort: drop if _n > 1 & era == 0 by patient_id era (admission_date): drop if _n < _N & era == -1
Finally, I think I need to reshape my data after the final collapse. I think I need a dataset where each row is one patient. So for example patient 1485K should have 2 incident in 2015 (first) and . missing for before (or 0 if tis too complicated). I can do this by myself- no need for explanation.

Thank you so much! I hope this make things more clear.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30034
#10

27 Apr 2016, 12:26

I dont quit understand why you create a temporary file and then merge it. After you do the code below, the data was perfect. Couldn't we just collapse after this?

// DISTINGUISH ADMISSIONS BEFORE 2015 (-1), IN 2015 (0), AFTER 2015 (1) gen int era = sign(yofd(admission_date)-2015) // DROP ADMISSIONS AFTER 2015 drop if era == 1 // KEEP FIRST ADMISSION IN ERA 0 AND LAST IN ERA 1 by patient_id era (admission_date), sort: drop if _n > 1 & era == 0 by patient_id era (admission_date): drop if _n < _N & era == -1

No, because at that point there is only one observation for each admission, so if you -collapse- there, the count of incidents in each era will always be 1. The data at this point simply identifies each admission that is to be considered. The -merge- back to the original data then, in effect, expands the data to include all of the incidents in those admissions. Then, you can collapse to get a count of incidents in each of those admissions.

Yes, you will probably want to -reshape wide- because you want to use the two counts as separate variables in a model. The gist of it would be to use era as the -j()- variable, but you can't because at that point it's coded -1 0. So when you're ready to -reshape- you need a little maneuver first:

Code:

label drop era replace era = era+1 reshape wide n_incidents, i(patient_id) j(era) rename n_incidents0 incidents_before_2015 rename n_incidents1 incidents_2015
Comment
Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#11

28 Apr 2016, 10:51

Hi Clyde,

i am sorry for the late reply. I was cleaning my data first in order to run your syntax, as well as trying to understand what you did. By the way I don't know how you came up with this efficient strategy. It's amazing and extremely efficient and "beautiful" !!

However, there is an error; we are keeping patients that do not have 2015 admissions but have previous admissions. The first filter should be keep patients that have at least one admission in 2015. that would be my sample, then as we did already create variables for their previous violent incident history and the number of incidents in 2015 . These is related to your questions which I answered in my previous post.

How can we solve that? This is my try:

by patient_id: gen byte adm15= yofd(admission_date)== 2015
by patient_id: ege adm15_group=max(adm15)
keep if adm15_group==1/// keep patient that have at least 1 2015 admission.

and then run your code. What do you think?

Thank you!
Marvin

Last edited by Marvin Aliaga; 28 Apr 2016, 11:02.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30034
#12

28 Apr 2016, 12:03

Your code has a typo in the second line: the final n in -egen- is missing. Other than that it should work. A slight improvement: you can combine the first two lines into a single command:

Code:

by patient_id, sort: egen adm15_group = max(yofd(admission_date) == 2015) keep if adm15_group == 1
Comment

Announcement