Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create variable that pulls in information from other rows

    Hi everyone,

    I am studying hospital readmissions and would like to know the number of days that pass between hospitalizations. For each row, I have the patients ID, a unique ID for the hospitalization (admission ID), date of admission, and date of discharge. For each admission ID, I would like to know the date of their last discharge. From there, I can create a new variable to calculate number of days between last discharge and current admission date.

    Is there a way to create a variable that would pull in discharge dates from other rows in the data set?

    Here is an example with the new variable I would like to create in blue.
    Patient ID Admission ID Admit Date Discharge Date New var: last discharge date
    12345 135 1/1/2010 1/30/2010 ​.
    12345 234 2/5/2010 2/12/2010 1/30/2010
    12345 333 5/2/2010 5/28/2010 2/12/2010
    999 196 11/6/2007 11/7/2007 .
    999 247 12/1/2007 12/4/2007 11/7/2007

    Appreciate your help!

  • #2
    Welcome to Statalist, Susan!

    The basic idea would be to do something like:

    Code:
    bysort patient_id (admission_id): gen last_discharge_date = discharge_date[_n-1]
    assuming that admission ID is numeric, and always higher for later admissions.

    In the future, do post a data extract using the dataex command, as recommended in the Statalist FAQ. It helps clarify some critical things like the storage type of a given variable, that a table like the one you showed does not.
    Last edited by Hemanshu Kumar; 27 Oct 2022, 13:57.

    Comment


    • #3
      Hello Susan Gutierrez. I was just about to post some code when I saw that Hemanshu Kumar beat me to it. My solution was pretty similar to that one, but I thought that if all you really want is the number of days from the previous discharge to the current admission, you can get that without computing a last_discharge_date variable. See my code below.

      I hope this helps.

      Code:
      * Read in the sample data
      clear
      input ptid admid str10(datestr1 datestr2)
      12345     135     "1/1/2010"     "1/30/2010"
      12345     234     "2/5/2010"     "2/12/2010"
      12345     333     "5/2/2010"     "5/28/2010"
      999     196     "11/6/2007" "11/7/2007"
      999     247     "12/1/2007" "12/4/2007"
      end
      
      * Convert string date variables to elapsed time date variables
      generate indate = date(datestr1, "MDY")
      generate outdate = date(datestr2, "MDY")
      * Change he display format
      format indate outdate %td
      list
      
      * Generate days from previous discharge to current admission
      bysort ptid (indate): generate daysbtwn = indate - outdate[_n-1]
      list
      
      /*
      References
      https://stats.oarc.ucla.edu/stata/modules/using-dates-in-stata/
      https://stats.oarc.ucla.edu/stata/faq/stata-faq-how-can-i-count-the-number-of-days-between-two-dates/
      https://stats.oarc.ucla.edu/stata/seminars/notes/counting-from-_n-to-_n/
      */
      Output from the final -list- command:
      Code:
      . list
      
           +--------------------------------------------------------------------------+
           |  ptid   admid    datestr1    datestr2      indate     outdate   daysbtwn |
           |--------------------------------------------------------------------------|
        1. |   999     196   11/6/2007   11/7/2007   06nov2007   07nov2007          . |
        2. |   999     247   12/1/2007   12/4/2007   01dec2007   04dec2007         24 |
        3. | 12345     135    1/1/2010   1/30/2010   01jan2010   30jan2010          . |
        4. | 12345     234    2/5/2010   2/12/2010   05feb2010   12feb2010          6 |
        5. | 12345     333    5/2/2010   5/28/2010   02may2010   28may2010         79 |
           +--------------------------------------------------------------------------+
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 19.5 (Windows)

      Comment


      • #4
        Hemanshu Kumar - this code worked perfectly! Thank you so much. And I'll be sure to use dataex going forward.

        Bruce Weaver Yes - you are right that my focus is "days between hospitalization" and knowing the last discharge date is only a means to get to this final answer. I tried your code and this was my first time using the input command to convert dates to string variables. I was able to manually enter two rows of data, follow your code, and it seemed to work. I have 16,000 participants in this study so manual entry won't be possible - is there a fast way to do this for all 16,000 entries?
        Also - I'm new to stata so thanks for bearing with me if this is a pretty basic question. Appreciate it.

        Comment


        • #5
          Hello Susan. I used the -input- command to create a small dataset, not to convert date variables to string variables. I read them in as strings because that seemed the easiest approach, given the example you showed in #1.

          If you already have indate and outdate as proper date variables, then all you need is the line starting with bysort:

          Code:
          bysort ptid (indate): generate daysbtwn = indate - outdate[_n-1]
          Replace indate and outdate with the names of your admission and discharge date variables; and replace daysbtwn with whatever variable name you want for the days in between the previous discharge date and the current admission date.
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 19.5 (Windows)

          Comment


          • #6
            Bruce Weaver After fixing the date formats, I was able to use the code you suggested to calculate the days between each admission. This was a huge help. Thanks so much!

            Comment

            Working...
            X