Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding first/last date and associated values.

    I have a data set that has multiple dates and associated values stored as separate variables like this: 'creatinine_1_date', 'creatinine_1_value', 'creatinine_2_date', 'creatinine_2_value' ... 'creatinine_350_date', 'creatinine_350_value' etc, but in no particular chronological order. I want to find the first and last date and associated value for each observation. Finding the last date is easy using:

    egen lastdate = rowmax(creatinine_*_date)

    but I cannot figure out how to find the associated value that corresponds to this date. Can anyone provide help? I'm using Stata 13.1.

  • #2
    if you really want that, why not just use summarize the variable? use the "format" option if you want the value in date format (I presume you have used format command to format your new variable), correct?

    Comment


    • #3
      Thanks for your reply, but I think I was not clear in my original post. My data is stored like this
      id creatinine_1_date creatinine_1_value creatinine_2_date creatinine_2_value creatinine_3_date creatinine_3_value
      1 25jun2003 1.5 01aug2005 2.5 01aug2008 3.0
      2 15jul2003 1.6 01aug2006 1.4 01aug2009 1.2
      3 22jul2003 2.2 01aug2007 1.1 01aug2010 1.9

      I want to find the last date for each row, which I have done, but then I also want to generate a variable lastcr that is the corresponding creatinine_*_value that is associated with the last date (i.e. if creatinine_3_date is the last date in the row, I also want lastcr = creatinine_3_value). I hope that is clearer.

      Comment


      • #4
        Consider reshaping to long form (and back to wide, if you need to). Then you can simply sort the dates, and extract first and last by id groups. An example:

        Code:
        clear all
        set more off
        
        *----- example data -----
        
        input ///
        id cr1date     cr1val     cr2date     cr2val     cr3date     cr3val
        1     1     1.5     6     2.5     4     3.0
        2     45     1.6     3     1.4     34     1.2
        3     56     2.2     8     1.1     12     1.9
        end
        
        format cr?date %td
        
        list
        
        *----- what you want -----
        
        reshape long cr@val cr@date, i(id) j(obs)
        
        bysort id (crdate): gen first = crval[1]
        by id: gen last = crval[_N]
        
        list, sepby(id)
        If you have missings then some further work would be needed. I assume also that no two dates are the same for each original observation (if so, and they turn out to be first or last, then you need to decide which value to use).

        See -help reshape-, -help by-, and -help subscripting-.
        Last edited by Roberto Ferrer; 09 Aug 2014, 20:00.
        You should:

        1. Read the FAQ carefully.

        2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

        3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

        4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

        Comment


        • #5
          Roberto Ferrer offered good advice, this kind of problem is easily dealt with when you reshape to long format. If you really want to stay in wide, I fiddled with the toy data set you posted to do the following:

          Code:
          . egen lastdate = rowmax(creatinine_*_date)
          
          . gen last_value = .
          (3 missing values generated)
          
          . forvalues i = 1/3{
            2. replace last_value = creatinine_`i'_value if creatinine_`i'_date == lastdate
            3. }
          (0 real changes made)
          (0 real changes made)
          (3 real changes made)
          
          . format lastdate %td
          
          . list
          
               +----------------------------------------------------------------------------------------------+
               | id   cr~1_date   ~1_value   c~2_date   ~2_value   c~3_date   ~3_value    lastdate   last_v~e |
               |----------------------------------------------------------------------------------------------|
            1. |  1   25-Jun-03        1.5   1-Aug-05        2.5   1-Aug-08          3   01aug2008          3 |
            2. |  2   15-Jul-03        1.6   1-Aug-06        1.4   1-Aug-09        1.2   01aug2009        1.2 |
            3. |  3   22-Jul-03        2.2   1-Aug-07        1.1   1-Aug-10        1.9   01aug2010        1.9 |
               +----------------------------------------------------------------------------------------------+
          
          .
          end of do-file
          .
          Richard T. Campbell
          Emeritus Professor of Biostatistics and Sociology
          University of Illinois at Chicago

          Comment

          Working...
          X