Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tag the first violent incident: Panel data

    Hello group,

    I have a panel data containing violent incidents by patients in a mental health facility. A patient can have multiple admissions. I need to know in which admission the first incident occur. I created
    the order_admin variable which is the order of the admission (first, second m third , son admission) . So for the first patient , the first incident occurred in the 3rd admission. So I would like a column with the number 3 for all admissions of that patients.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double admission str9 patientID long number_incidents float order_admin
    349150988 "000235" 0 5
    349140796 "000235" 0 2
    241131237 "000235" 0 1
    349150932 "000235" 0 4
    141140620 "000235" 1 3
    349151105 "000235" 2 6
    240140009 "000255" 0 1
    241150015 "000255" 2 2
    241150843 "000255" 3 3
    875150007 "000262" 0 6
    875140165 "000262" 0 5
    875140131 "000262" 0 4
    875150077 "000262" 0 8
    825130128 "000262" 0 2
    441130892 "000262" 0 1
    875140044 "000262" 0 3
    875150016 "000262" 1 7
    end
    Thank you in advance,
    Marvin

  • #2
    This is an FAQ. The first date (here just a serial identifier, but that changes nothing essential) is just the minimum date, and conditions defining which kind of first date are just to be specified by extra syntax.

    See http://www.stata.com/support/faqs/da...t-occurrences/ and http://www.stata-journal.com/article...article=dm0055


    Code:
    clear
    input double admission str9 patientID long number_incidents float order_admin
    349150988 "000235" 0 5
    349140796 "000235" 0 2
    241131237 "000235" 0 1
    349150932 "000235" 0 4
    141140620 "000235" 1 3
    349151105 "000235" 2 6
    240140009 "000255" 0 1
    241150015 "000255" 2 2
    241150843 "000255" 3 3
    875150007 "000262" 0 6
    875140165 "000262" 0 5
    875140131 "000262" 0 4
    875150077 "000262" 0 8
    825130128 "000262" 0 2
    441130892 "000262" 0 1
    875140044 "000262" 0 3
    875150016 "000262" 1 7
    end
    
    sort patientID order_admin 
    by patientID : egen first = min(cond(number > 0, order_admin, .))
    format admission %10.0f 
    list, sepby(patientID) 
    
    
         +----------------------------------------------------+
         | admission   patien~D   number~s   order_~n   first |
         |----------------------------------------------------|
      1. | 241131237     000235          0          1       3 |
      2. | 349140796     000235          0          2       3 |
      3. | 141140620     000235          1          3       3 |
      4. | 349150932     000235          0          4       3 |
      5. | 349150988     000235          0          5       3 |
      6. | 349151105     000235          2          6       3 |
         |----------------------------------------------------|
      7. | 240140009     000255          0          1       2 |
      8. | 241150015     000255          2          2       2 |
      9. | 241150843     000255          3          3       2 |
         |----------------------------------------------------|
     10. | 441130892     000262          0          1       7 |
     11. | 825130128     000262          0          2       7 |
     12. | 875140044     000262          0          3       7 |
     13. | 875140131     000262          0          4       7 |
     14. | 875140165     000262          0          5       7 |
     15. | 875150007     000262          0          6       7 |
     16. | 875150016     000262          1          7       7 |
     17. | 875150077     000262          0          8       7 |
         +----------------------------------------------------+

    Comment


    • #3
      Nick's solution is shorter and more elegant but I wanted to show that there are other ways to get the same result.
      Code:
      gen first = 0
      levelsof patientID, local(patients)
      foreach p of local patients {
        sum order_admin if "`p'" == patientID & number_incidents > 0
        replace first = r(min) if "`p'" == patientID
      }
      format admission %9.0f
      sort patientID order_admin
      list, sepby(patientID)
      
           +----------------------------------------------------+
           | admission   patien~D   number~s   order_~n   first |
           |----------------------------------------------------|
        1. | 241131237     000235          0          1       3 |
        2. | 349140796     000235          0          2       3 |
        3. | 141140620     000235          1          3       3 |
        4. | 349150932     000235          0          4       3 |
        5. | 349150988     000235          0          5       3 |
        6. | 349151105     000235          2          6       3 |
           |----------------------------------------------------|
        7. | 240140009     000255          0          1       2 |
        8. | 241150015     000255          2          2       2 |
        9. | 241150843     000255          3          3       2 |
           |----------------------------------------------------|
       10. | 441130892     000262          0          1       7 |
       11. | 825130128     000262          0          2       7 |
       12. | 875140044     000262          0          3       7 |
       13. | 875140131     000262          0          4       7 |
       14. | 875140165     000262          0          5       7 |
       15. | 875150007     000262          0          6       7 |
       16. | 875150016     000262          1          7       7 |
       17. | 875150077     000262          0          8       7 |
           +----------------------------------------------------+
      Last edited by Friedrich Huebler; 20 Apr 2016, 11:06. Reason: Edit: wrong table included in original post.

      Comment


      • #4
        Thank you so much both! I was trying to do this with the _N, _n-1, n system variables but it dint work. Friedrich: thank you so much for showing another way to get the same results but I will use Nicks codes. I havent used macro syntax. It is a new world for me.

        Nick: I would need to flag the second occurrence, I guess nature/concept of your syntax will not work, right? I will need another strategy?

        Comment


        • #5
          Yes, I would change approach for that. If you put the observations with no incidents separately, then the first, second, etc. dates are just the first, second dates in order among the observations with any incidents.

          To understand this, pepper the commands below with list statements to see what each step does.

          Code:
          clear
          input double admission str9 patientID long number_incidents float order_admin
          349150988 "000235" 0 5
          349140796 "000235" 0 2
          241131237 "000235" 0 1
          349150932 "000235" 0 4
          141140620 "000235" 1 3
          349151105 "000235" 2 6
          240140009 "000255" 0 1
          241150015 "000255" 2 2
          241150843 "000255" 3 3
          875150007 "000262" 0 6
          875140165 "000262" 0 5
          875140131 "000262" 0 4
          875150077 "000262" 0 8
          825130128 "000262" 0 2
          441130892 "000262" 0 1
          875140044 "000262" 0 3
          875150016 "000262" 1 7
          end
          
          gen any = number > 0
          bysort any patientID (order_admin) : gen first = order_admin[1] if any
          by any patientID (order_admin) : gen second = order_admin[2] if any
          bysort patientID (first) : replace first = first[1]
          bysort patientID (second) : replace second = second[1]
          
          sort patientID order_admin
          list patientID order number first second, sepby(patientID)
          
          . list patientID order number first second, sepby(patientID)
          
               +-------------------------------------------------+
               | patien~D   order_~n   number~s   first   second |
               |-------------------------------------------------|
            1. |   000235          1          0       3        6 |
            2. |   000235          2          0       3        6 |
            3. |   000235          3          1       3        6 |
            4. |   000235          4          0       3        6 |
            5. |   000235          5          0       3        6 |
            6. |   000235          6          2       3        6 |
               |-------------------------------------------------|
            7. |   000255          1          0       2        3 |
            8. |   000255          2          2       2        3 |
            9. |   000255          3          3       2        3 |
               |-------------------------------------------------|
           10. |   000262          1          0       7        . |
           11. |   000262          2          0       7        . |
           12. |   000262          3          0       7        . |
           13. |   000262          4          0       7        . |
           14. |   000262          5          0       7        . |
           15. |   000262          6          0       7        . |
           16. |   000262          7          1       7        . |
           17. |   000262          8          0       7        . |
               +-------------------------------------------------+
          Last edited by Nick Cox; 20 Apr 2016, 11:51.

          Comment


          • #6
            Nick! this is great! The strategy is perfect!

            Comment


            • #7
              Hi Nick Cox,

              I am sorry to come back to this post but I have a somewhat similar issue that I couldn't fix using you previous suggestions and other personal posts.

              The unit of analysis in this date-set is the incident. Look at the patient below. this patient has 5 admissions in the my data-set. My final goal is to collapse the data and have a variable for the number of incidents this patient had in the first admission in 2015. In this case, it will be 2 incidents for the first admission in 2015 (admission id 6461). All the incidents after the first 2015 admission should be deleted (I don't need it). I also need to collapse and get the number of incidents for the prior admission before the first admission for 2015. In this case this patient had 3 incidents in the "prior" admission before 2015 (admission id=1209). I do not need the incidents before the "prior" admission. That is data from older admissions such as 881 should be deleted.

              The logic of this analysis is to see if previous "violent" incidents (DV) predict future violent incidents (IV). I was trying to tag incidents based on conditions and then keep only those incidents that apply for the patient so then I can collapse my data. However, I couldn't find a effective way.
              patient_id admission_id admission_date incident_date
              00043190H 8881 20-Apr-13 9/2/2013
              00043190H 8881 21-Apr-13 9/18/2013
              00043190H 1209 16-Apr-14 9/16/2014
              00043190H 1209 16-Apr-14 2/3/2015
              00043190H 1209 16-Apr-14 9/15/2014
              00043190H 6461 10-Aug-15 8/16/2015
              00043190H 6461 10-Aug-15 8/21/2015
              00043190H 1024 22-Sep-15 10/13/2015
              00043190H 1024 22-Sep-15 9/24/2015
              00043190H 2149 22-Mar-16 4/12/2016


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str5 patient_id int admission_id float(incident_date admission_date)
              "3190H" 8881 19603 19468
              "3190H" 8881 19619 19469
              "3190H" 1209 19982 19829
              "3190H" 1209 20122 19829
              "3190H" 1209 19981 19829
              "3190H" 6461 20316 20310
              "3190H" 6461 20321 20310
              "3190H" 1024 20374 20353
              "3190H" 1024 20355 20353
              "3190H" 2149 20556 20535
              "1485K" 4895 20221 20201
              "1485K" 4895 20283 20202
              "8711P" 8963 20134 20113
              end
              format %td incident_date
              format %td admission_date
              I would appreciate any help. thank you so much!
              Marvin

              Comment


              • #8
                First, there is a small problem with the data you posted with -dataex-. Your admission_date variable is not always consistent within admission_id. In the code I show below, I "fix" this by picking one date for each admission_id, though it may be that the error is actually that there should be separate admission_id's. Anyway, this allows the process to proceed. You can deal with this appropriately in your actual data as needed. Also, you don't say what to do if somebody doesn't have any admissions in 2015, or if they do but have no admissions prior to 2015. The code below always keeps exactly the first 2015 admission (if any) and the last pre-2015 admission (if there is one). So a given patient can end up with zero, one or two admissions retained depending on the circumstances. It should be fairly simple for you to modify the code to drop the cases with only one admission if you prefer that.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str5 patient_id int admission_id float(incident_date admission_date)
                "3190H" 8881 19603 19468
                "3190H" 8881 19619 19469
                "3190H" 1209 19982 19829
                "3190H" 1209 20122 19829
                "3190H" 1209 19981 19829
                "3190H" 6461 20316 20310
                "3190H" 6461 20321 20310
                "3190H" 1024 20374 20353
                "3190H" 1024 20355 20353
                "3190H" 2149 20556 20535
                "1485K" 4895 20221 20201
                "1485K" 4895 20283 20202
                "8711P" 8963 20134 20113
                end
                format %td incident_date
                format %td admission_date
                
                list, noobs clean
                
                // NOTE THAT ADMISSION DATE IS NOT CONSISTENT WITHIN ADMISSION ID
                // WILL MODIFY THE DATA TO FIX THAT 
                replace admission_date = td(23apr2015) if admission_id == 4895
                replace admission_date = td(20apr2013) if admission_id == 8881
                
                // VERIFY ADMISSION DATE CONSISTENT WITHIN ADMISSIONS & NEVER MISSING
                by patient_id admission_id (admission_date), sort: ///
                    assert admission_date[1]==admission_date[_N]
                assert !missing(admission_date)
                
                // CREATE A TEMPORARY DATA SET OF JUST ADMISSION ID's AND ADMISSION DATES
                preserve
                keep patient_id admission_id admission_date
                duplicates drop
                
                // DISTINGUISH ADMISSIONS BEFORE 2015 (-1), IN 2015 (0), AFTER 2015 (1)
                gen int era = sign(yofd(admission_date)-2015)
                // DROP ADMISSIONS AFTER 2015
                drop if era == 1
                // KEEP FIRST ADMISSION IN ERA 0 AND LAST IN ERA 1
                by patient_id era (admission_date), sort: drop if _n > 1 & era == 0
                by patient_id era (admission_date): drop if _n < _N & era == -1
                
                // SAVE THIS IN A TEMPORARY FILE
                isid patient_id admission_date, sort
                tempfile targeted_admissions
                save `targeted_admissions'
                
                // BRING BACK ORIGINAL DATA & MERGE, RETAINING ONLY MATCHES
                restore
                merge m:1 patient_id admission_id admission_date using `targeted_admissions', ///
                    assert(match master) keep(match) nogenerate
                
                // SHOW THE REDUCED DATA
                list, noobs clean
                
                // NOW COLLAPSE TO NUMBER OF INCIDENTS PER ADMISSION
                collapse (count) n_incidents = incident_date, by(patient_id era)
                format n_incidents %1.0f
                label define era -1 "Last Before 2015" 0 "First 2015"
                label values era era
                list, noobs clean
                Note: The two -by patient_id era (admission_date): drop if- statements can be combined into one using a -cond()- expression in the -if- clause, but I preferred to keep the code more transparent and easy to read.

                Comment


                • #9
                  I am sorry for the discrepancies among my admission ID and date variables. It was a merely typo when I was creating my sample dataset.

                  "Also, you don't say what to do if somebody doesn't have any admissions in 2015, or if they do but have no admissions prior to 2015"
                  So if they don't have any admission for 2015, for example: an patient that was admitted in 11/13/2014 and have 2 incidents, should not be in the dataset. On the other hand, if a patient do not have any admissions prior 2015 but have admission in 2015, this patient should be included and he/she will have a missing (".") for prior incidents cause he/she did not have any prior admissions if possible. If this is too complicated ,we can code them as 0.

                  To clarify, the goal is to see if prior violent incidents (in this case, prior violence incident in the last admission) predicts violent behaviors in their next admission (in this case 2015). I arbitrarily chose the first admission in 2015 to control for length of stay etc.

                  I dont quit understand why you create a temporary file and then merge it. After you do the code below, the data was perfect. Couldn't we just collapse after this?

                  // DISTINGUISH ADMISSIONS BEFORE 2015 (-1), IN 2015 (0), AFTER 2015 (1) gen int era = sign(yofd(admission_date)-2015) // DROP ADMISSIONS AFTER 2015 drop if era == 1 // KEEP FIRST ADMISSION IN ERA 0 AND LAST IN ERA 1 by patient_id era (admission_date), sort: drop if _n > 1 & era == 0 by patient_id era (admission_date): drop if _n < _N & era == -1
                  Finally, I think I need to reshape my data after the final collapse. I think I need a dataset where each row is one patient. So for example patient 1485K should have 2 incident in 2015 (first) and . missing for before (or 0 if tis too complicated). I can do this by myself- no need for explanation.


                  Thank you so much! I hope this make things more clear.


                  Comment


                  • #10
                    I dont quit understand why you create a temporary file and then merge it. After you do the code below, the data was perfect. Couldn't we just collapse after this?

                    // DISTINGUISH ADMISSIONS BEFORE 2015 (-1), IN 2015 (0), AFTER 2015 (1) gen int era = sign(yofd(admission_date)-2015) // DROP ADMISSIONS AFTER 2015 drop if era == 1 // KEEP FIRST ADMISSION IN ERA 0 AND LAST IN ERA 1 by patient_id era (admission_date), sort: drop if _n > 1 & era == 0 by patient_id era (admission_date): drop if _n < _N & era == -1
                    No, because at that point there is only one observation for each admission, so if you -collapse- there, the count of incidents in each era will always be 1. The data at this point simply identifies each admission that is to be considered. The -merge- back to the original data then, in effect, expands the data to include all of the incidents in those admissions. Then, you can collapse to get a count of incidents in each of those admissions.

                    Yes, you will probably want to -reshape wide- because you want to use the two counts as separate variables in a model. The gist of it would be to use era as the -j()- variable, but you can't because at that point it's coded -1 0. So when you're ready to -reshape- you need a little maneuver first:

                    Code:
                    label drop era
                    replace era = era+1
                    reshape wide n_incidents, i(patient_id) j(era)
                    rename n_incidents0 incidents_before_2015
                    rename n_incidents1 incidents_2015

                    Comment


                    • #11
                      Hi Clyde,

                      i am sorry for the late reply. I was cleaning my data first in order to run your syntax, as well as trying to understand what you did. By the way I don't know how you came up with this efficient strategy. It's amazing and extremely efficient and "beautiful" !!

                      However, there is an error; we are keeping patients that do not have 2015 admissions but have previous admissions. The first filter should be keep patients that have at least one admission in 2015. that would be my sample, then as we did already create variables for their previous violent incident history and the number of incidents in 2015 . These is related to your questions which I answered in my previous post.

                      How can we solve that? This is my try:

                      by patient_id: gen byte adm15= yofd(admission_date)== 2015
                      by patient_id: ege adm15_group=max(adm15)
                      keep if adm15_group==1/// keep patient that have at least 1 2015 admission.

                      and then run your code. What do you think?


                      Thank you!
                      Marvin
                      Last edited by Marvin Aliaga; 28 Apr 2016, 11:02.

                      Comment


                      • #12
                        Your code has a typo in the second line: the final n in -egen- is missing. Other than that it should work. A slight improvement: you can combine the first two lines into a single command:

                        Code:
                        by patient_id, sort: egen adm15_group = max(yofd(admission_date) == 2015)
                        keep if adm15_group == 1

                        Comment

                        Working...
                        X