  • String and dummy variable

    I am trying to create a dummy variable that indicates whether each discharge involves a readmission. I only have data on the date of admission. A readmission is defined as a subsequent hospitalization for the same patientId within 30 days of the index claim.

  • #2
    Please present a data example with a few cases. If you have a fully updated Stata 14 or any later version, see

    Code:
    help dataex
    Otherwise

    Code:
    ssc install dataex
    help dataex
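
    As a rough illustration of what to run once dataex is available, the call below lists a few variables for the first 20 observations. The variable names are assumptions taken from later posts in this thread; substitute whatever your dataset actually contains, and include the patient identifier so that readmissions can be worked out.

    ```stata
    * Sketch only: variable names are assumed, adjust them to your data
    * dataex accepts a varlist plus optional if/in qualifiers
    dataex patientid admitdate procedure1 in 1/20
    ```

    Copy the resulting output, from the opening comment line through -end-, into your post.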



    • #3
      Thank you. I hope this makes sense
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str9 admitdate
      "28-Apr-10"
      "17-Mar-10"
      "16-Apr-10"
      "18-May-10"
      "18-Aug-10"
      "27-Sep-10"
      "28-Oct-10"
      "31-Aug-10"
      "24-Sep-10"
      "25-Oct-10"
      "21-Nov-10"
      "8-Jul-10"
      "6-Jun-10"
      "7-Jul-10"
      "28-Feb-10"
      "2-Apr-10"
      "3-May-10"
      "21-Mar-10"
      "5-Sep-10"
      "29-Sep-10"
      "9-Jan-10"
      "13-Feb-10"
      "7-Jun-10"
      "9-Jul-10"
      
      end



      • #4
        I uploaded a sample just for better clarification.
        Attached Files



        • #5
          This will assign a value of 1 to every observation of any patientid with at least one readmission within 30 days, and 0 otherwise.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte(patientid age) str9 admitdate byte systolic int(procedure1 procedure2 procedure3 procedure4 procedure5 diagnosis1 diagnosis2 diagnosis3 diagnosis4 diagnosis5) byte aha_id
          1 72 "28-Apr-10" 97 3610 1135 3813 5225 7302 1380 4560 3045 5273 3907 8
          2 78 "17-Mar-10" 81 1253 3402 5113 4611 5350 3605 7463 8480 2865 3860 7
          2 55 "16-Apr-10" 99 7630 7930 2576 5741 7262 2254 5432 4953 8581 2851 9
          2 64 "18-May-10" 64 3362 3595 1999 2430 5833 3651 2667 5965 2976 8016 9
          3 58 "18-Aug-10" 99 4322 8801 1039 6368 2008 5081 3482 6221 4265 1927 7
          end
          
          gen admit_date = date(admitdate, "DM20Y")
          format admit_date %td
          bys patientid (admit_date): gen difference= admit_date- admit_date[_n-1]
          bys patientid: egen wanted= max(difference<=30)
          Result:

          Code:
          . l patientid admit_date difference wanted, sepby(patientid)
          
               +------------------------------------------+
               | patien~d   admit_d~e   differ~e   wanted |
               |------------------------------------------|
            1. |        1   28apr2010          .        0 |
               |------------------------------------------|
            2. |        2   17mar2010          .        1 |
            3. |        2   16apr2010         30        1 |
            4. |        2   18may2010         32        1 |
               |------------------------------------------|
            5. |        3   18aug2010          .        0 |
               +------------------------------------------+
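
          If the flag is wanted per discharge rather than per patient, one possible variation is sketched below. It assumes admit_date has been created as in the code above; the names readmit_next and is_readmit are hypothetical. The first variable marks an index admission that is followed by another admission within 30 days, the second marks the admission that is itself the readmission. Two admissions on the same date would count as a readmission here, which may or may not be what you want.

          ```stata
          * Sketch: observation-level flags (readmit_next, is_readmit are made-up names)
          * 1 if the NEXT admission of the same patient is within 30 days of this one
          bysort patientid (admit_date): gen byte readmit_next = _n < _N & (admit_date[_n+1] - admit_date <= 30)

          * 1 if THIS admission is within 30 days of the patient's previous one
          bysort patientid (admit_date): gen byte is_readmit = _n > 1 & (admit_date - admit_date[_n-1] <= 30)

          list patientid admit_date readmit_next is_readmit, sepby(patientid)
          ```

          The explicit _n < _N and _n > 1 conditions keep the first and last admission of each patient at 0 instead of relying on how missing values compare.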



          • #6
            Thank you so much. This was so helpful
            However, as a follow-up question, I also have to create another dummy variable that has inclusion criteria for certain groups of codes and exclusion criteria based on the first three characters of the code.



            • #7
              dummy variable that has an inclusion for certain groups and exclusion criteria for the first three characters
              I do not get what you mean here. Can you give an example?



              • #8
                I have to create a dummy variable for a procedure that involved CABG. The inclusion criterion is a procedure code in the group 3610-3616, and the exclusion criterion is a procedure code whose first 3 characters are either 350 or 351.



                • #9
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input int procedure1
                  3610
                  1253
                  7630
                  3362
                  4322
                  2849
                  5248
                  1257
                  4455
                  5214
                  end
                  I need to create a dummy variable for whether a procedure involved CABG or not. The inclusion criterion is any procedure code in the group 3610-3616, and the exclusion criterion is any procedure code whose first 3 characters are either 350 or 351.



                  • #10
                    Do you need to tag a patientid if any of the inclusion criteria are fulfilled? If so, from your data structure in #5

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input byte(patientid age) str9 admitdate byte systolic int(procedure1 procedure2 procedure3 procedure4 procedure5 diagnosis1 diagnosis2 diagnosis3 diagnosis4 diagnosis5) byte aha_id
                    1 72 "28-Apr-10" 97 3610 1135 3813 5225 7302 1380 4560 3045 5273 3907 8
                    2 78 "17-Mar-10" 81 1253 3402 5113 4611 5350 3605 7463 8480 2865 3860 7
                    2 55 "16-Apr-10" 99 7630 7930 2576 5741 7262 2254 5432 4953 8581 2851 9
                    2 64 "18-May-10" 64 3362 3595 1999 2430 5833 3651 2667 5965 2976 8016 9
                    3 58 "18-Aug-10" 99 4322 8801 1039 6368 2008 5081 3482 6221 4265 1927 7
                    end
                    
                    local values ""
                    forval i= 3610(1)3616{
                         local values "`values' `i'"
                    }
                    egen tag= anymatch(procedure*), values(`values')
                    bys patientid: egen CABG_included = max(tag)
                    Defining a dummy for the exclusion criteria is more difficult, as the -anymatch()- function of -egen- does not accept functions within its -values()- option. That is why, for example, I cannot use the -inrange()- function in the code above and instead built the list of values in a local macro. The easiest way to handle this is a long layout (i.e., reshape long procedure), but I will post a solution that works with a wide layout later in the day (or someone else on the list may come up with a better suggestion). If you need to tag observations rather than the entire patientid, omit the last line of the code above.

                    Result:

                    Code:
                    . l patientid procedure* CABG_included, sepby(patientid)
                    
                         +----------------------------------------------------------------------------+
                         | patien~d   proced~1   proced~2   proced~3   proced~4   proced~5   CABG_i~d |
                         |----------------------------------------------------------------------------|
                      1. |        1       3610       1135       3813       5225       7302          1 |
                         |----------------------------------------------------------------------------|
                      2. |        2       1253       3402       5113       4611       5350          0 |
                      3. |        2       7630       7930       2576       5741       7262          0 |
                      4. |        2       3362       3595       1999       2430       5833          0 |
                         |----------------------------------------------------------------------------|
                      5. |        3       4322       8801       1039       6368       2008          0 |
                         +----------------------------------------------------------------------------+
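
                    A possibly shorter variant, offered as a sketch: as far as I can tell from -help egen-, the -values()- option of -anymatch()- takes an integer numlist, and numlists allow range shorthand, so the loop that builds the local macro may not be needed. The names tag_alt and CABG_included_alt are hypothetical and chosen only to avoid clashing with the variables created above.

                    ```stata
                    * Sketch: pass the range as a numlist instead of building a local macro
                    egen byte tag_alt = anymatch(procedure*), values(3610/3616)

                    * spread the observation-level tag to all observations of the patient
                    bysort patientid: egen byte CABG_included_alt = max(tag_alt)
                    ```

                    If this runs on your version of Stata, tag_alt should agree with tag from the loop-based code.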



                    • #11
                      exclusion criteria are procedure code where the first 3 characters are either 350/351.
                      This is easier than I thought. The 4-digit numbers whose first 3 digits are 350 or 351 are exactly the numbers in the range 3500-3519. So the same approach as in #10 applies.

                      Code:
                      local values ""
                      forval i= 3500(1)3519{
                                local values "`values' `i'"
                      }
                      egen tag2= anymatch(procedure*), values(`values')
                      bys patientid: egen CABG_excluded = max(tag2)
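
                      The range 3500-3519 only covers codes with exactly four digits. If codes of other lengths can occur, a sketch that tests the first three characters directly, still in the wide layout, is given below. The names tag3, CABG_excluded_alt and CABG are hypothetical. The last line combines the two criteria at the observation level, which is my assumption about how inclusion and exclusion are meant to interact; it relies on tag from #10 and tag2 from the code above.

                      ```stata
                      * Sketch: check the first three characters of every procedure code
                      gen byte tag3 = 0
                      foreach v of varlist procedure* {
                          * string() converts the numeric code so that substr() can be applied
                          replace tag3 = 1 if inlist(substr(string(`v'), 1, 3), "350", "351")
                      }
                      bysort patientid: egen byte CABG_excluded_alt = max(tag3)

                      * Assumption: CABG = meets the inclusion criterion and not the exclusion criterion
                      gen byte CABG = tag == 1 & tag2 == 0
                      ```

                      Missing codes convert to "." and are therefore never tagged.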



                      • #12
                        Thank you so much for consistently helping me with this. I am grateful



                        • #13
                          I am confused as to what data/code I might need to run the Elixhauser module. I am guessing it is the diagnosis codes. I have been asked to run the module to create Elixhauser comorbidity flags and a count of comorbidities. This is the same dataset I have been using for all previous questions.

                          Someone had previously posted something similar but there wasn't any response given. Can you help with this?
                          Thank you.



                          • #14
                            #13 seems a completely different question. In any case, you've already asked at https://www.statalist.org/forums/for...auser-question

                            Please don't ask the same question in different places. Anyone able to comment should please follow the cited thread.



                            • #15
                              Ok! Nick. Thanks
