Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • replacing based on more than 1 condition

    Hi there.

    I have a list of procedures of each patient who came to hospital.
    I have grouped the data to present each hospitalID in order of admission to hospital (Date2).

    I would like to find out if they developed a surgical site infection (SSI2) within 30 days of their date of procedure (date2)

    hospitalid = id of patient
    expected_ssi = is what i would like to see ie has the patient developed a SSI within 30 days

    ssi2 = is the result i generated by using this code I wrote

    gen SSI2 = 0
    replace SSI2 =1 if SSI == 1 & procedureid == 1 & date2+30



    The problem with this code is that it does not consider a situation where the patient did not have a procedure on the admission,
    eg pt 12A admitted on 3 Dec 2020 but developed a SSI this should be coded SSI2 = 1

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float procedureid str3 hospitalid float SSI str11 date str3 expected_ssi float(date2 SSI2)
    1 "12A" 0 "1 Dec 2020"  "no"  22250 0
    . "12A" 1 "3 Dec 2020"  "yes" 22252 0
    . "13A" 0 "5 Dec 2021"  "no"  22619 0
    1 "13A" 0 "1 Jan 2022"  "no"  22646 0
    . "13A" 1 "20 Jan 2022" "yes" 22665 0
    1 "14A" 1 "1 Mar 2022"  "yes" 22705 1
    . "14A" 0 "2 Mar 2022"  "no"  22706 0
    . "15A" 1 "15 Mar 2022" "yes" 22719 0
    end
    format %td date2

  • #2
    maybe,
    Code:
    bysort hospitalid (date): gen wanted = SSI & date2 <= date2[1] + 30

    Comment


    • #3
      I don’t think this would work

      as the code will sort everything by hospitalid which means
      - procedureid will lose its sorting

      - your code indicates one will take the first value of date2 for each hospital ID to calculate + 30days. This brings about a problem as seen

      for pt 13A STATA will calculate the 30 days from 5 Dec 2021 when the 30 days should be calculated from the time the pt had a procedure ie 1 Jann2022

      Comment


      • #4
        How about something like this?
        Code:
        sort hospitalid date2
        by hospitalid: egen proc_date = max(cond(missing(procedureid),.,date2))
        format %td proc_date
        
        gen byte wanted = cond(missing(proc_date),.,SSI*(date2-proc_date<=30))
        which produces:
        Code:
        . li procedureid hospitalid SSI date2 proc_date wanted, noobs ab(20) sepby(hospitalid)
        
          +-----------------------------------------------------------------+
          | procedureid   hospitalid   SSI       date2   proc_date   wanted |
          |-----------------------------------------------------------------|
          |           1          12A     0   01dec2020   01dec2020        0 |
          |           .          12A     1   03dec2020   01dec2020        1 |
          |-----------------------------------------------------------------|
          |           .          13A     0   05dec2021   01jan2022        0 |
          |           1          13A     0   01jan2022   01jan2022        0 |
          |           .          13A     1   20jan2022   01jan2022        1 |
          |-----------------------------------------------------------------|
          |           1          14A     1   01mar2022   01mar2022        1 |
          |           .          14A     0   02mar2022   01mar2022        0 |
          |-----------------------------------------------------------------|
          |           .          15A     1   15mar2022           .        . |
          +-----------------------------------------------------------------+

        Comment


        • #5
          my bad. how about,
          Code:
          bysort hospitalid (procedureid date2): gen wanted = SSI & inrange(date2, date2[1], date2[1] + 30)

          Comment


          • #6
            Hi, may I ask Hemanshu Kumar nice code...

            sort hospitalid date2
            by hospitalid: egen proc_date = max(cond(missing(procedureid),.,date2))
            format %td proc_date

            Regarding the bold text - can you let me know i'm understanding correctly what I am telling stata to do. I've correlated the code with explanation in respective colours

            Cond:
            //For all the missing variables for procedureId generate Missing if True, and date2 if not missing
            //Then Generate the maximum date

            For this code : again seeking clarification regarding the underlined section

            gen byte wanted = cond(missing(proc_date),.,SSI*(date2-proc_date<=30))

            //generate a wanted variable
            //for each missing procedure date, generate . if true if not give the value of the SSI if (date2 - proc_date is = or less than 30 days)

            ADDITIONAL QUestion: What does the * represent, should it not represent a multiplication?


            Comment


            • #7
              Code:
              by hospitalid: egen proc_date =max(cond(missing(procedureid),.,date2))
              Yes, the interpretation is:
              • For all the missing observations for procedureId, generate missing if true, and date2 if not missing
              • Then generate the maximum of these dates for each hospitalid (and this maximum value will be missing for a hospitalid if procedureid is missing for all observations of that hospitalid)
              Code:
              gen byte wanted = cond(missing(proc_date),.,SSI*(date2-proc_date<=30))
              The interpretation is:
              • generate a wanted variable
              • which is missing if proc_date is missing
              • and if proc_date is non-missing, check if the difference between date2 and proc_date is no more than 30 days. If this is true (i.e. equals 1) AND SSI is 1, then return 1, else return 0.
              Yes, the asterisk is a product and serves as the AND operator described above.

              Comment


              • #8
                Excellent, Hemanshu Kumar why does the * asterisk represent and AND here rather than a multiplication?
                Are there some rules I can refer too?

                Comment


                • #9
                  There is no separate rule here. A logical evaluation (here, the evaluation of date2-proc_date<=30) results in a binary (0 = false, 1 = true), and SSI is already binary. So the product of the two will be 1 only if both are 1, which is exactly how an AND operator works.

                  Comment


                  • #10
                    I'm just practicing.


                    But I've now changed what if procedureid is changed to 0 instead of missing.
                    However, same context that if the patient had a procedure this is coded as 1 but 0 if the patient didn't have a procedure (rather than missing)

                    Then with the same aim in mind:
                    I would like to find out if they developed a surgical site infection (SSI2) within 30 days of their date of procedure (date2)

                    The difference here:
                    A patient like 14A may have had 2 separate procedures (one on 1 Mar and one on 15 Mar) and thus I would like to see if they were ever admitted within 30 dates of their respective procedure for a SSI

                    Thus I tweaked the code:

                    gen date4each = date2 if procedureid == 1
                    format date4each %td

                    Aim: To generate a new variable called proc_date, and for each procedureid leave as 0 if 0 and if not present the respected date found under 'date4each'

                    bys hospitalid: egen proc_date = cond(procedureid,.,date4each)
                    format %td proc_date

                    Problem here: Stata tells me unknown egen function cond()


                    I will then move on with the following command to continue to pursue the above aim.
                    gen byte wanted = cond(missing(proc_date),.,SSI*(date4each-proc_date<=30))




                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float procedureid str3 hospitalid float SSI str11 date str3 expected_ssi float(date2 date4each)
                    1 "12A" 0 "1 Dec 2020"  "no"  22250 22250
                    0 "12A" 1 "3 Dec 2020"  "yes" 22252     .
                    0 "13A" 0 "5 Dec 2021"  "no"  22619     .
                    1 "13A" 0 "1 Jan 2022"  "no"  22646 22646
                    0 "13A" 1 "20 Jan 2022" "yes" 22665     .
                    1 "14A" 0 "1 Mar 2022"  "no"  22705 22705
                    0 "14A" 1 "2 Mar 2022"  "yes" 22706     .
                    1 "14A" 0 "15 Mar 2022" "no"  22719 22719
                    0 "14A" 0 "16 Mar 2022" "no"  22720     .
                    end
                    format %td date2
                    format %td date4each

                    Comment


                    • #11
                      The egen command takes a very specific set of functions, and even if they have the same name as a regular Stata function (as in the case of max), they do something a little different. See
                      Code:
                      help egen

                      Comment


                      • #12
                        Thanks for this Hemanshu Kumar I have read the help and tried again. I've nearly found a solution but need your help to the last part in the puzzle.

                        Again same aim in mind to see if the patient develop a SSI within 30 days when procedureid == 1 by HospitalID

                        I have changed the data to check if my code is fullproof = see PT 14A who has had 2 procedures.

                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input float procedureid str3 hospitalid float SSI str11 date str3 expected_ssi float(date4each date2 wanteddate2 difference SSI3)
                        1 "12A" 0 "1 Dec 2020"  "no"  22250 22250 22250   0 .
                        0 "12A" 1 "3 Dec 2020"  "yes"     . 22252 22250   2 1
                        0 "13A" 0 "5 Dec 2021"  "no"      . 22619 22250 369 .
                        1 "13A" 0 "1 Jan 2022"  "no"  22646 22646 22646   0 .
                        0 "13A" 1 "20 Jan 2022" "yes"     . 22665 22646  19 1
                        1 "14A" 0 "1 Mar 2022"  "no"  22705 22705 22705   0 .
                        1 "14A" 0 "15 Mar 2022" "no"  22719 22719 22719   0 .
                        0 "14A" 0 "16 Mar 2022" "no"      . 22720 22719   1 .
                        0 "14A" 1 "2 Mar 2022"  "yes"     . 22706 22719 -13 1
                        end
                        format %td date4each
                        format %td date2
                        format %td wanteddate2

                        The part where I cheated and need help:

                        date2 represents the date when a patient (represented by hospitalid attended the hospital)

                        I need to make sure that each patientid who had a procedureid = 1 , the date is replicated on each visit to the hospital by hospitaID until that same patient has another procedureid == 1 whereby this new date is replicated for the subsequent times the patient is admitted to hospital.

                        I couldn't solve the above problem so I cheated and instead created a wanteddate2 variable with the dates I want to see. This was in order to determine if the code below works to see if a patient developed a SSI within 30 days of procedureID date if procedure id ==1

                        So I would appreciate help with the above ^^

                        For demonstration purposed this is the code I subsequently used:

                        I then wanted to determine if the patient (hospitalid) developed a SSI within 30 days when procedureid == 1
                        Therefore code used:

                        gen difference = date2-wanteddate2
                        by hospitalid: gen SSI3 = 1 if SSI == 1 & difference <= 30

                        In summary, I nearly found a solution to my question but haven't fully solved it. Any solutions pls?

                        Of course if you have a simpler solution to my problem always looking to see how i can improve, let me know

                        Comment


                        • #13
                          Hi all, I have solved the problem . For anyone who refers to this in the future:
                          I learnt about the cascade effect show in red. If of course you have any other thoughts how to write the code please share !

                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input float procedureid str3 hospitalid float SSI str11 date str3 expected_ssi float(date2 date4each SSI2)
                          1 "12A" 0 "1 Dec 2020"  "no"  22250 22250 0
                          0 "12A" 1 "3 Dec 2020"  "yes" 22252 22250 1
                          0 "13A" 0 "5 Dec 2021"  "no"  22619 22250 0
                          1 "13A" 0 "1 Jan 2022"  "no"  22646 22646 0
                          0 "13A" 1 "20 Jan 2022" "yes" 22665 22646 1
                          1 "14A" 0 "1 Mar 2022"  "no"  22705 22705 0
                          1 "14A" 0 "15 Mar 2022" "no"  22719 22719 0
                          0 "14A" 0 "16 Mar 2022" "no"  22720 22719 0
                          0 "14A" 1 "2 Mar 2022"  "yes" 22706 22719 1
                          end
                          format %td date2
                          format %td date4each

                          //Creating the dates when patient had a procedure otherwise it remains missing
                          gen date4each = date2 if procedureid == 1
                          format date4each %td

                          //creating procedure date the same as preceeding procedure date if pt did not have a procedure
                          replace date4each = date4each[_n-1] if date4each >=.
                          format %td date4each


                          //creating ssi within 30 days
                          gen SSI2 = 0
                          replace SSI2 = 1 if SSI == 1 & date2-date4each<=30

                          Comment

                          Working...
                          X