  • Generating variable by group if pt developed complication after procedure

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float colorectal str3 hospitalid float MI str3 expected_mi float(date2 date4each MI2) str1 expected float max
      0 "12A" 0 "no"  22252     . 0 ""  .
    112 "12A" 1 "yes" 22253 22253 1 "1" .
      0 "13A" 0 "no"  22678 22646 0 ""  .
      0 "13A" 1 "no"  22665 22619 0 ""  .
    113 "13A" 0 "no"  22619 22619 0 "0" 0
    114 "13A" 0 "no"  22646 22646 0 "0" 0
      0 "14A" 1 "yes" 22720 22705 1 ""  .
    115 "14A" 0 "no"  22734 22734 0 "0" 0
    116 "14A" 0 "no"  22705 22705 0 "1" 0
    end
    format %td date2
    format %td date4each




    I’m trying to create a new dataset in which, if the patient had an MI within 45 days, this is stored as 1 ON THE SAME LINE AS THE PATIENT'S COLORECTAL PROCEDURE.

    COLORECTAL = PT HAD A COLORECTAL SURGERY

    HOSPITALID = ANY TIME PT ADMITTED TO HOSPITAL AND DATE2 WHEN PATIENT ADMITTED


    I can’t seem to do it. I hope to create something that looks like the 'expected' column...



  • #2
    I can't quite figure out what you want here. The "expected" variable that you set as a model does not seem consistent with what you say in words. In particular, let's look at patient id 13A. This patient has an MI on 20 Jan 2022, which is just 19 days after colorectal surgery on 1 Jan 2022. So why isn't expected set to 1 in this situation?

    Also, what cannot in any case be discerned from your example is whether "within 45 days" means within the 45 days after colorectal surgery, or within the 45 days before colorectal surgery, or both of those.
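The three readings of "within 45 days" can be told apart with a signed day difference. This is only a minimal sketch on two made-up date pairs; the variable names surg_date, mi_date, gap, and mi_* are hypothetical and do not come from the original data.

```stata
* Two illustrative surgery/MI date pairs (invented for this sketch)
clear
input float(surg_date mi_date)
22646 22665
22646 22610
end
format %td surg_date mi_date

* Signed gap: positive when the MI follows the surgery
gen gap = mi_date - surg_date

gen byte mi_after  = inrange(gap, 0, 45)     // MI in the 45 days after surgery
gen byte mi_before = inrange(gap, -45, 0)    // MI in the 45 days before surgery
gen byte mi_either = inrange(gap, -45, 45)   // MI within 45 days on either side

list
```

The first pair (MI 19 days after surgery) is flagged only by mi_after and mi_either; the second (MI 36 days before surgery) only by mi_before and mi_either.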



    • #3
      OK, I'm sure there is an easier way of solving this, but I solved a similar problem using this code:
      Code:
      gen days = datediff(date4each, date2, "day")
      keep if MI==1
      gen MI_within_45=1 if days<46
      replace MI_within_45=0 if days>45
      keep hospitalid MI_within_45
      save "/Volumes/USB 128GB/statahelp.dta"
      
      * Use your original file and do following:
      merge m:1 hospitalid using "/Volumes/USB 128GB/statahelp.dta"
      drop _merge
      I had a little trouble understanding which date was the surgery date, and put it as date4each, but if not, just switch it around



      • #4
        Clyde Schechter - sorry, this should read as 45 days after the colorectal procedure. You are correct; the error was on my end. Sorry. I have been trying to find ways of solving this problem for at least 72 hours.

        Interesting, Vilma Antonov, that you use the keep command. This was my last resort.

        I was wondering whether there is another way that keeps the entire dataset and just stores, on the same line, whether the patient had an MI within 45 days.

        But if keep is the only idea people have, I will use this and create a new dta file. Then combine datasets and match.



        • #5
          You want to copy the MI date upwards and the colorectal date downwards to match the dates when computing the differences. Otherwise, your description is not too easy to follow.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float colorectal str3 hospitalid float MI str3 expected_mi float(date2 date4each MI2) str1 expected float max
            0 "12A" 0 "no"  22252     . 0 ""  .
          112 "12A" 1 "yes" 22253 22253 1 "1" .
            0 "13A" 0 "no"  22678 22646 0 ""  .
            0 "13A" 1 "no"  22665 22619 0 ""  .
          113 "13A" 0 "no"  22619 22619 0 "0" 0
          114 "13A" 0 "no"  22646 22646 0 "0" 0
            0 "14A" 1 "yes" 22720 22705 1 ""  .
          115 "14A" 0 "no"  22734 22734 0 "0" 0
          116 "14A" 0 "no"  22705 22705 0 "1" 0
          end
          format %td date2
          format %td date4each
          
          g MI_date= date2*MI
          bys hospitalid (date2): g colorectal_date= date2*(colorectal>0)
          by hospitalid: replace colorectal_date= colorectal_date[_n-1] if !colorectal_date & colorectal_date[_n-1] & _n>1
          gsort hospitalid -date2
          by hospitalid: replace MI_date= MI_date[_n-1] if !MI_date & MI_date[_n-1] & _n>1
          g wanted = inrange(MI_date- colorectal_date+1, 0, 45) if colorectal
          Res.:

          Code:
          . sort hospitalid date2
          
          
          . l hospitalid colorectal MI date2 expected MI_date-wanted , sepby(hospitalid)
          
               +-------------------------------------------------------------------------------+
               | hospit~d   colore~l   MI       date2   expected   MI_date   colore~e   wanted |
               |-------------------------------------------------------------------------------|
            1. |      12A          0    0   03dec2020                22253          .        . |
            2. |      12A        112    1   04dec2020          1     22253      22253        1 |
               |-------------------------------------------------------------------------------|
            3. |      13A        113    0   05dec2021          0     22665      22619        0 |
            4. |      13A        114    0   01jan2022          0     22665      22646        1 |
            5. |      13A          0    1   20jan2022                22665      22646        . |
            6. |      13A          0    0   02feb2022                    .      22646        . |
               |-------------------------------------------------------------------------------|
            7. |      14A        116    0   01mar2022          1     22720      22705        1 |
            8. |      14A          0    1   16mar2022                22720      22705        . |
            9. |      14A        115    0   30mar2022          0         .      22734        0 |
               +-------------------------------------------------------------------------------+
          Last edited by Andrew Musau; 28 Nov 2022, 11:11.



          • #6
            Code:
            rangestat (sum) wanted = MI, by(hospitalid) interval(date2 0 45)
            replace wanted = min(wanted, 1)
            replace wanted = . if colorectal == 0
            -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

            Note: code assumes (but does not verify) that the variable MI is always 0 or 1.



            • #7
              One quick way to do this might be using the community-contributed rangestat command (available from SSC):

              Code:
              rangestat (max) wanted = MI , interval(date2 0 45) by(hospitalid)
              replace wanted = . if colorectal == 0



              • #8
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input float colorectal str3 hospitalid float(MI date2)
                  0 "12A" 0 22252
                112 "12A" 1 22253
                  0 "13A" 0 22678
                  0 "13A" 1 22665
                113 "13A" 0 22619
                114 "13A" 0 22646
                115 "14A" 0 22720
                  0 "14A" 1 22734
                116 "14A" 0 22705
                  0 "14A" 0 22705
                end
                format %td date2

                The code does not work for patient 14A who had procedure 116 but did not develop a MI in 45 days. But then had a procedure 115 on 16 Mar and developed a MI within 45 days.

                The code generates the max or the sum, depending on whose code you use, but either way it presents an inaccurate picture: that patient 14A developed an MI after both procedure 115 and procedure 116...

                So I think I will need to stick with keep and then merge the different datasets.

                [Attached image: Screenshot 2022-11-29 at 13.10.33.png]



                Last edited by Tara Boyle; 29 Nov 2022, 06:14.



                • #9
                  "The code does not work for patient 14A who had procedure 116 but did not develop a MI in 45 days. But then had a procedure 115 on 16 Mar and developed a MI within 45 days."
                  But patient 14A did develop an MI on 30 Mar 2022, which is within 45 days of colorectal procedure 116, which took place on 1 Mar 2022. The MI was, in fact, within 45 days of both procedures 115 and 116. So the code is correct.
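The day counts can be checked directly with Stata's td() date literals. A quick sketch, using the dates from the example data in #8:

```stata
* Days from each colorectal procedure to the MI on 30 Mar 2022 (patient 14A)
display td(30mar2022) - td(01mar2022)   // procedure 116: 29 days
display td(30mar2022) - td(16mar2022)   // procedure 115: 14 days
```

Both differences are at most 45, so both procedure rows qualify under the rule as stated in #1.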



                  • #10
                    Well, the code is a correct implementation of what you asked for in #1. But perhaps what you asked for is not what you meant. Perhaps if there are multiple colorectal surgeries that all occurred within the 45 days preceding an MI, you only want the last of those surgeries to be marked 1 in the new variable. This might make sense if, for example, you are conducting a study of the incidence of MI following colorectal procedures using time-to-event analysis and you want to treat an intervening colorectal procedure as censoring the observation following the preceding colorectal procedure(s). In that case, the following code will give you what you want:
                    Code:
                    assert inlist(MI, 0, 1)
                    gsort hospitalid -date2
                    by hospitalid: gen next_colorectal_date = .
                    by hospitalid: replace next_colorectal_date = cond(colorectal[_n-1], date2[_n-1], ///
                        next_colorectal_date[_n-1]) if _n > 1
                    format next_colorectal_date %td
                    
                    gen lower = cond(colorectal, date2, 1)
                    gen upper = cond(colorectal, min(date2+45, next_colorectal_date-1), 0)
                    rangestat (max) wanted = MI, by(hospitalid) interval(date2 lower upper)
                    drop lower upper
                    Last edited by Clyde Schechter; 29 Nov 2022, 07:13.



                    • #11
                      Another question about the code. I don't know whether I'll be able to use rangestat, because of a problem highlighted in a previous thread, but I would still like to know how the code works (bold section):

                      rangestat (max) wanted = MI , interval(date2 0 45)

                      Example:

                      Section 1: Pt 13A had procedure 114 on 1 Jan 2022, but no MI.

                      Section 2: That same pt develops an MI on 20 Jan 2022.

                      So would Stata use the code and say that for Section 1 the pt had no MI? Is that correct?

                      But for Section 2 the pt gets an MI within 45 days of the Section 1 date. However, the code says date2, so theoretically wouldn't that mean 20 Jan 2022?

                      How does Stata interpret this correctly and calculate 45 days from the Section 1 date? Just trying to understand how this works.



                        • #13
                          The -interval()- option in -rangestat- is a bit complicated and unintuitive. I'll try to explain how it works. First some terminology.

                          Think of -rangestat- as processing each observation of the data set separately. I'll refer to the observation being processed at any given time as the current observation, and the values of its variables as their current values. I'll refer to other observations as "source" observations and the values of their variables as "source" values. (Note: the current observation itself is also considered a source observation, unless the -excludeself- option has been specified.)

                          So, when -rangestat- is processing an observation and the -interval()- option is -interval(date2 0 45)-, that is interpreted as:
                          1. Find the current value of variable date2.
                          2. Add 0 to the current value of date2 to get the lower limit for inclusion of source observations in the range for calculating the statistics requested.*
                          3. Add 45 to the current value of date2 to get the upper limit for inclusion of source observations in the range for calculating the statistics requested.*
                          4. Select all observations for which the source value of date2 falls between the upper and lower limits calculated in steps 2 and 3.
                          5. Calculate the requested statistics (sum, max, mean, whatever) using the observations selected in step 4.
                          6. Set the current value(s) of the variable(s) for the requested statistics to the results calculated in step 5.
                          * This is how it works when the second and third argument in the -interval()- parameter are specified as constants. When they are specified as variables, the current values of those variables are used as the lower and upper limits for inclusion.
                          Note: The above are done restricted to exact matching between current and source variables on the variables given in the -by()- option (if any).

                          Concretely, regarding patient 13A, consider what happens when the observation for procedure 114 on patient 13A is processed.
                          Step 1: The current value of date2 is 1 Jan 2022.
                          Step 2: The lower limit is therefore 1 Jan 2022 + 0 = 1 Jan 2022.
                          Step 3: The upper limit is 1 Jan 2022 + 45 = 15 Feb 2022.
                          Step 4: Select all observations of patient 13A (because hospitalid is in the -by()- option) whose (source) values of date2 fall between 1 Jan 2022 and 15 Feb 2022.
                          Step 5: Calculate the maximum value of variable MI among those observations. The 20 Jan 2022 observation for patient 13A does fall between 1 Jan 2022 and 15 Feb 2022, so it is among those source observations used to calculate the maximum. The source value of MI for this 20 Jan 2022 observation is 1. And since all values of MI are either 0 or 1, and we have just found a 1 for MI in the 20 Jan 2022 source observation, the maximum value of MI for all these observations must be 1.
                          Step 6: Set wanted = 1.
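The six steps can also be traced by hand with an explicit loop over observations. This is only an illustrative sketch of the logic described above, not how -rangestat- is implemented internally; the variable name wanted_manual is hypothetical, and the limits are assumed to be inclusive. It uses patient 13A's rows from the example data and needs no SSC packages.

```stata
* Hand-rolled imitation of:
*   rangestat (max) wanted = MI, interval(date2 0 45) by(hospitalid)
clear
input float colorectal str3 hospitalid float(MI date2)
113 "13A" 0 22619
114 "13A" 0 22646
  0 "13A" 1 22665
  0 "13A" 0 22678
end
format %td date2

gen wanted_manual = .
forvalues i = 1/`=_N' {
    * Steps 1-3: limits are built from the CURRENT observation's date2
    local lo = date2[`i'] + 0
    local hi = date2[`i'] + 45
    * Steps 4-5: maximum of MI over SOURCE observations of the same patient
    * whose date2 lies in [lo, hi]
    quietly summarize MI if hospitalid == hospitalid[`i'] & inrange(date2, `lo', `hi')
    * Step 6: store the result in the current observation only
    quietly replace wanted_manual = r(max) in `i'
}

list, sepby(hospitalid)
```

Procedure 113 (5 Dec 2021) gets 0 because the MI on 20 Jan 2022 falls one day past its 45-day window, while procedure 114 (1 Jan 2022) gets 1. To blank out the non-procedure rows as in #7, follow with replace wanted_manual = . if colorectal == 0.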

