Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate difference between first and last records, where number of records may vary

    I have a dataset where subjects are seen at a variable number of visits and at each visit they have various measurements made. I want to calculate the difference in measurements taken at visit 'n' and visit (some other n) but the number of visits may vary (e.g. maybe the first and last visits, but some subjects will have 20 visits and others only 3). I know how to do this by brute force, but am wondering if there is a simple way to do such a calculation.

    Many thanks.
    Don

  • #2
    how about:

    bys id visit: gen diff=measure[_N] - measure[1]

    Comment


    • #3
      whoops - don't include "visit" ; if you need a specific order, do the sorting first and then use the -gen- statement

      Comment


      • #4
        or, in one statement: bys patient (visit): gen diff=measure[_N] - measure[1]

        Comment


        • #5
          Rich
          Many thanks for the rapid response, but it doesn't work.
          Here is a sample of my data set.
          id is the identifier. Date is the date of measurement. Spell1 reflects the particular cycle the subject is in. There can be many 'spells', but this can be ignored for now.
          Visspell1 is the visit number for spell '1'. recnum is the record number for the visit. Some subjects will have many records at a visit, so I am only interested in the first record
          eq5d is the measurement I am interested in.
          I want to calculate the difference between eq5d at visspell1 = 5 and visspell1 == 1 (in both cases with recnum == 1), and for each id
          How do I do this?
          id date spell1 visspell1 recnum eq5d
          10001 30nov2006 . . 1 .088
          10001 12feb2007 1 1 1 .088
          10001 01may2007 1 2 1 .293
          10001 14dec2007 1 3 1 .
          10001 30may2008 1 4 2 .516
          10001 30may2008 1 4 1 .516
          10001 30may2008 1 4 3 .516
          10001 21nov2008 1 5 2 .62
          10001 21nov2008 1 5 1 .62

          10002 15feb2005 1 1 1 .587

          Don

          Comment


          • #6
            Don: We need to see your version of Rich's code to comment. But I guess it won't work because your data structure is more complicated than you implied. You need to define a sort order very carefully:

            Code:
            gen recnum1 = recnum == 1 
            bysort recnum1 id (date spell1 visspell1) : gen diff = measure[_N] - measure[1] if recnum1
            Last edited by Nick Cox; 29 May 2014, 09:26.

            Comment


            • #7
              Nick,
              Your suggest is close, but intermittent (i.e. some differences are correct and others not), and I don't see why.
              Here is some output.
              Don


              id date spell1 eq5d diffeq visspell1
              10001 12feb2007 1 .088 .532 1
              10001 01may2007 1 .293 .532 2
              10001 14dec2007 1 .532 3
              10001 30may2008 1 .516 .532 4
              10001 21nov2008 1 .62 .532 5
              10001 08may2009 1 .62 .532 6 This difference is right (0.62 - 0.088)
              10002 26aug2005 1 .587 .14 1
              10002 01dec2005 1 .727 .14 2
              10002 08may2006 1 .587 .14 3
              10002 30jan2007 1 .055 .14 4
              10002 09aug2007 1 .587 .14 5 This difference is NOT right. While the eq5d varies over time, the first and last are the same, but diffeq == 0.14.
              10004 10jun2010 1 .228 .069 1
              10004 09dec2010 1 1 .069 2
              10004 14apr2011 1 .691 .069 3
              10004 29sep2011 1 .796 .069 4
              10004 19apr2012 1 .069 5
              10004 20sep2012 1 .76 .069 6 This difference is NOT right. It should be 0.76 - 0.228 = .532, but diffeq is 0.069
              10005 01jun2006 1 -.016 .707 1
              10005 25oct2006 1 .691 .707 2
              10005 01mar2007 1 .62 .707 3
              10005 10sep2007 1 .088 .707 4
              10005 13mar2008 1 .796 .707 5
              10005 05sep2008 1 .727 .707 6
              10005 12mar2009 1 .691 .707 7
              10005 04sep2009 1 .691 .707 8
              10005 03sep2010 1 .691 .707 9
              10005 08sep2011 1 .707 10
              10005 27jul2012 1 .691 .707 11 This difference is CORRECT
              10006 02oct2006 1 .055 .741 1
              10006 09jan2007 1 .623 .741 2
              10006 10sep2007 1 .727 .741 3
              10006 25mar2008 1 .796 .741 4
              10006 15sep2008 1 .727 .741 5
              10006 19mar2009 1 .727 .741 6
              10006 21sep2009 1 .691 .741 7
              10006 18mar2010 1 .727 .741 8
              10006 10mar2011 1 .691 .741 9
              10006 08mar2012 1 .796 .741 10 CORRECT
              10008 24apr2008 1 -.016 . 1
              10008 24jul2008 1 .689 . 2
              10008 06oct2008 1 .62 . 3
              10008 06apr2009 1 .587 . 4
              10008 01oct2009 1 .186 . 5
              10008 08apr2010 1 -.016 . 6
              10008 25jun2010 1 -.016 . 7 NO ANSWER. Considered missing, but real difference is 0
              10010 21jun2007 1 -.003 .799 1
              10010 13sep2007 1 .691 .799 2
              10010 27mar2008 1 .587 .799 3
              10010 11sep2008 1 .691 .799 4
              10010 13jan2009 1 .055 .799 5 WRONG. Real diff is 0.055 - -0.003 = 0.058, NOT 0.788
              10011 16jun2006 1 .656 . 1
              10011 01sep2006 1 .796 . 2
              10011 16mar2007 1 .796 . 3
              10011 07sep2007 1 .796 . 4
              10011 29feb2008 1 .796 . 5
              10011 12sep2008 1 .796 . 6
              10011 06mar2009 1 .796 . 7
              10011 11sep2009 1 .796 . 8
              10011 01oct2010 1 .656 . 9
              10011 09sep2011 1 . . 10
              10011 28sep2012 1 , . 11 CORRECT. MISSING - something = ., but it would be nice to handle this differently
              10013 14jul2004 1 -.016 . 1
              10013 15oct2004 1 .62 . 2 WRONG. Should be 0.78
              10014 05jun2007 1 .2580001 .7419999 1
              10014 14sep2007 1 1 .7419999 2
              10014 14mar2008 1 1 .7419999 3
              10014 19sep2008 1 1 .7419999 4
              10014 13mar2009 1 1 .7419999 5
              10014 04sep2009 1 1 .7419999 6
              10014 26feb2010 1 1 .7419999 7
              10014 03sep2010 1 1 .7419999 8
              10014 04mar2011 1 1 .7419999 9
              10014 09sep2011 1 1 .7419999 10
              10014 14sep2012 1 1 .7419999 11 CORRECT
              10015 05apr2004 1 -.016 1.016 1
              10015 17feb2005 1 .364 1.016 2
              10015 22sep2005 1 .2600001 1.016 3
              10015 10apr2006 1 .727 1.016 4
              10015 01sep2006 1 .691 1.016 5
              10015 09mar2007 1 .691 1.016 6
              10015 14sep2007 1 .691 1.016 7
              10015 14mar2008 1 .727 1.016 8
              10015 05sep2008 1 .796 1.016 9
              10015 06mar2009 1 .796 1.016 10
              10015 04sep2009 1 1 1.016 11
              10015 03sep2010 1 .796 1.016 12
              10015 09sep2011 1 1 1.016 13
              10015 31aug2012 1 1 1.016 14 CORRECT

              Comment


              • #8
                It's very hard to read your sample data as . for missing and . for decimal points get mixed in a copy and paste. Please format it as code.

                Comment


                • #9
                  Nick
                  Here you are. Apologies for not doing so the first time. Getting this in the right format for this post was a lesson in itself.
                  Don
                  id date spell1 visspell1 recnum eq5d diffeq
                  10001 30nov2006 . . 1 .088
                  10001 12feb2007 1 1 1 .088 .532
                  10001 01may2007 1 2 1 .293 .532
                  10001 14dec2007 1 3 1 .532
                  10001 30may2008 1 4 1 .516 .532
                  10001 21nov2008 1 5 1 .62 .532
                  10001 08may2009 1 6 1 .62 .532 CORRECT
                  10002 15feb2005 . . 1 .587 .
                  10002 17may2005 . . 1 .691 .
                  10002 26aug2005 1 1 1 .587 .14
                  10002 01dec2005 1 2 1 .727 .14
                  10002 08may2006 1 3 1 .587 .14
                  10002 30jan2007 1 4 1 .055 .14
                  10002 09aug2007 1 5 1 .587 .14 INCORRECT SHOULD BE ‘0’
                  10002 21jul2009 . . 1 .055 .
                  10002 29oct2009 . . 1 .587 .
                  10002 29apr2010 . . 1 .727 .
                  10002 28oct2010 . . 1 .727 .
                  10004 07jun2006 . . 1 .691 .
                  10004 30aug2006 . . 1 .76 .
                  10004 27apr2007 . . 1 .796 .
                  10004 07sep2007 . . 1 .796 .
                  10004 10jun2010 1 1 1 .228 .069
                  10004 09dec2010 1 2 1 1... .069
                  10004 14apr2011 1 3 1 .691 .069
                  10004 29sep2011 1 4 1 .796 .069
                  10004 19apr2012 1 5 1 . .069
                  10004 20sep2012 1 6 1 .76 .069 INCORRECT SHOULD BE ‘.532’
                  10005 22oct2004 , . 1 -.016 .
                  10005 23jun2005 , . 1 .516 .
                  10005 01jun2006 1 1 1 -.016 .707
                  10005 25oct2006 1 2 1 .691 .707
                  10005 01mar2007 1 3 1 .62 .707
                  10005 10sep2007 1 4 1 .088 .707
                  10005 13mar2008 1 5 1 .796 .707
                  10005 05sep2008 1 6 1 .727 .707
                  10005 12mar2009 1 7 1 .691 .707
                  10005 04sep2009 1 8 1 .691 .707
                  10005 03sep2010 1 9 1 .691 .707
                  10005 08sep2011 1 10 1 . .707
                  10005 27jul2012 1 11 1 .691 .707 CORRECT
                  10006 02oct2006 1 1 1 .055 .741
                  10006 09jan2007 1 2 1 .623 .741
                  10006 10sep2007 1 3 1 .727 .741
                  10006 25mar2008 1 4 1 .796 .741
                  10006 15sep2008 1 5 1 .727 .741
                  10006 19mar2009 1 6 1 .727 .741
                  10006 21sep2009 1 7 1 .691 .741
                  10006 18mar2010 1 8 1 .727 .741
                  10006 10mar2011 1 9 1 .691 .741
                  10006 08mar2012 1 10 1 .796 .741 CORRECT
                  10008 18sep2007 . . 1 . .
                  10008 15jan2008 . . 1 .088 .
                  10008 24apr2008 1 1 1 -.016 .
                  10008 24jul2008 1 2 1 .689 .
                  10008 06oct2008 1 3 1 .62 .
                  10008 06apr2009 1 4 1 .587 .
                  10008 01oct2009 1 5 1 .186 .
                  10008 08apr2010 1 6 1 -.016 . CORRECT
                  10008 25jun2010 1 7 1 -.016 . NO ANSWER (MISSING)
                  10008 17sep2010 . . 1 .848 .
                  10008 11mar2011 . . 1 .725 .
                  10008 05aug2011 . . 1 -.016 .
                  10008 27jan2012 . . 1 .725 .
                  10008 31aug2012 . . 1 .725 .
                  10010 21jun2007 1 1 1 -.003 .799
                  10010 13sep2007 1 2 1 .691 .799
                  10010 27mar2008 1 3 1 .587 .799
                  10010 11sep2008 1 4 1 .691 .799
                  10010 13jan2009 1 5 1 .055 .799 WRONG, SHOULD BE 0.058
                  10010 02apr2009 . . 1 .796 .
                  10010 08oct2009 . . 1 .691 .
                  10010 06may2010 . . 1 .796 .
                  10010 16sep2010 . . 1 .796 .
                  10010 25apr2011 . . 1 .796 .
                  10010 20oct2011 . . 1 .796 .
                  10010 16apr2012 . . 1 .796 .
                  10011 16jun2006 1 1 1 .656 .
                  10011 01sep2006 1 2 1 .796 .
                  10011 16mar2007 1 3 1 .796 .
                  10011 07sep2007 1 4 1 .796 .
                  10011 29feb2008 1 5 1 .796 .
                  10011 12sep2008 1 6 1 .796 .
                  10011 06mar2009 1 7 1 .796 .
                  10011 11sep2009 1 8 1 .796 .
                  10011 01oct2010 1 9 1 .656 .
                  10011 09sep2011 1 10 1 . .
                  10011 28sep2012 1 11 1 . . CORRECT (MISSING)
                  10013 14jul2004 1 1 1 -.016 .
                  10013 15oct2004 1 2 1 .62 . WRONG SHOULD BE 0.78
                  10013 17jan2005 . . 1 .516 .
                  10013 11apr2005 . . 1 .743 .
                  10013 06oct2005 . . 1 .691 .
                  10013 08mar2006 . . 1 .62 .
                  10013 18aug2006 . . 1 .587 .
                  10013 18dec2006 . . 1 .587 .
                  10013 07mar2007 . . 1 .691 .
                  10013 13sep2007 . . 1 .62 .
                  10013 07jan2008 . . 1 1 .
                  10013 11aug2008 . . 1 1 .
                  10013 11feb2009 . . 1 1 .
                  10013 10aug2009 . . 1 .814 .
                  10013 09aug2010 . . 1 .76 .
                  10013 07jul2011 . . 1 . .
                  10013 19jul2012 . . 1 . .
                  10014 05jun2007 1 1 1 .258 .7419999
                  10014 14sep2007 1 2 1 1 .7419999
                  10014 14mar2008 1 3 1 1 .7419999
                  10014 19sep2008 1 4 1 1 .7419999
                  10014 13mar2009 1 5 1 1 .7419999
                  10014 04sep2009 1 6 1 1 .7419999
                  10014 26feb2010 1 7 1 1 .7419999
                  10014 03sep2010 1 8 1 1 .7419999
                  10014 04mar2011 1 9 1 1 .7419999
                  10014 09sep2011 1 10 1 1 .7419999
                  10014 14sep2012 1 11 1 1 .7419999 CORRECT

                  Comment


                  • #10
                    This appears to be a sort order issue.
                    Your data example is still not lined up right (see for example line 4 which has too few values) so I can't import it into Stata to play with the whole thing.
                    However the issue, as near as I can tell without importing the full data example, is that the visspell1 numbering is not necessarily in date order. The code sorts on date first, then spell, then visspell1. So you end up taking the difference between the first and last date within recnum=1.

                    It sounds like maybe what you want is:
                    Code:
                    gen recnum1 = recnum == 1
                    bysort recnum1 id (spell1 visspell1) : gen diff = measure[_N] - measure[1] if recnum1
                    but you need to think through exactly what you mean by first and last and then write your sort statement accordingly.

                    Comment


                    • #11
                      Sorry, but your copying hasn't worked to produce an unambiguous dataset. Beyond the header I count 6 field, 7, 7, 6, ...

                      I must get back to the day job; anyone else wanting to have a go would surely want to know what your code was, still not given.

                      Comment


                      • #12
                        Actually, looking more carefully I realized that my date sorting issue was caused by the fact that I was pasting the raw data into excel to separate and then into the Stata data editor. I ended up getting the dates as strings instead of real dates. If I convert it back into a date variable, Nick's code works fine for id=10002.

                        So from there I'll ask the same question Nick did: What code did you type to get the results you posted?

                        Comment


                        • #13
                          Nick and others
                          Here is the corrected version of the data.
                          The issue I think lies in the [_N] and [1] bits of the measure. I was perhaps unclear as to what I meant by 'first' and 'last' values.
                          What I want is the difference between the 'last' value and the 'first' value of a measure WITHIN a spell. In id 10001, below, the spell does not start until
                          the second line (i.e. spell1 and visspell1 are missing data in the first row. The spell1 ENDS, for my purposes, at row 7 (on 08may2009) The diff is correct because the values of eq5d are the same in row 1 (which should not be included, but which would be eq5d[1] and row 2, which actually is the beginning of the spell (i.e. spell1[1]) and ends on row 7.
                          For id 10002, the spell starts on 26aug2005 and ends on 09Aug2007, but the patient is followed until 08oct2010, which would be eq5d{_N]. IF we take the value for eq5d on 08oct2010, (.727) and subtract from it the value for eq5d[1], we get .14, (.727-.587 =.14), but what I want is the value for eq5d when the SPELL STARTS (i.e. visspell1 == 1) and when the spell ENDS (in this case visspell1=5, but in other instances it could be a lot of other numbers.) Basically I want the value of the measure for visspell1[1] and visspell1[_N].
                          How do I get that?
                          It is a conditional in that the command would be subtract the value of eq5d for visspell1[_N] from the value of eq5d for visspell1[1]
                          i.e. diff = eq5d (if visspell1[_N] - eq5d (if visspell1[1])
                          Is that possible.
                          Thanks
                          Don

                          id date spell1 visspell1 recnum eq5d diffeq
                          10001 30nov2006 . . 1 .088 .
                          10001 12feb2007 1 1 1 .088 .532
                          10001 01may2007 1 2 1 .293 .532
                          10001 14dec2007 1 3 1 . .532
                          10001 30may2008 1 4 1 .516 .532
                          10001 21nov2008 1 5 1 .62 .532
                          10001 08may2009 1 6 1 .62 .532 CORRECT
                          10002 15feb2005 . . 1 .587 .
                          10002 17may2005 . . 1 .691 .
                          10002 26aug2005 1 1 1 .587 .14
                          10002 01dec2005 1 2 1 .727 .14
                          10002 08may2006 1 3 1 .587 .14
                          10002 30jan2007 1 4 1 .055 .14
                          10002 09aug2007 1 5 1 .587 .14 INCORRECT SHOULD BE ‘0’
                          10002 21jul2009 . . 1 .055 .
                          10002 29oct2009 . . 1 .587 .
                          10002 29apr2010 . . 1 .727 .
                          10002 28oct2010 . . 1 .727 .
                          10004 07jun2006 . . 1 .691 .
                          10004 30aug2006 . . 1 .76 .
                          10004 27apr2007 . . 1 .796 .
                          10004 07sep2007 . . 1 .796 .
                          10004 10jun2010 1 1 1 .228 .069
                          10004 09dec2010 1 2 1 1... .069
                          10004 14apr2011 1 3 1 .691 .069
                          10004 29sep2011 1 4 1 .796 .069
                          10004 19apr2012 1 5 1 . .069
                          10004 20sep2012 1 6 1 .76 .069 INCORRECT SHOULD BE ‘.532’
                          10005 22oct2004 , . 1 -.016 .
                          10005 23jun2005 , . 1 .516 .
                          10005 01jun2006 1 1 1 -.016 .707
                          10005 25oct2006 1 2 1 .691 .707
                          10005 01mar2007 1 3 1 .62 .707
                          10005 10sep2007 1 4 1 .088 .707
                          10005 13mar2008 1 5 1 .796 .707
                          10005 05sep2008 1 6 1 .727 .707
                          10005 12mar2009 1 7 1 .691 .707
                          10005 04sep2009 1 8 1 .691 .707
                          10005 03sep2010 1 9 1 .691 .707
                          10005 08sep2011 1 10 1 . .707
                          10005 27jul2012 1 11 1 .691 .707 CORRECT
                          10006 02oct2006 1 1 1 .055 .741
                          10006 09jan2007 1 2 1 .623 .741
                          10006 10sep2007 1 3 1 .727 .741
                          10006 25mar2008 1 4 1 .796 .741
                          10006 15sep2008 1 5 1 .727 .741
                          10006 19mar2009 1 6 1 .727 .741
                          10006 21sep2009 1 7 1 .691 .741
                          10006 18mar2010 1 8 1 .727 .741
                          10006 10mar2011 1 9 1 .691 .741
                          10006 08mar2012 1 10 1 .796 .741 CORRECT
                          10008 18sep2007 . . 1 . .
                          10008 15jan2008 . . 1 .088 .
                          10008 24apr2008 1 1 1 -.016 .
                          10008 24jul2008 1 2 1 .689 .
                          10008 06oct2008 1 3 1 .62 .
                          10008 06apr2009 1 4 1 .587 .
                          10008 01oct2009 1 5 1 .186 .
                          10008 08apr2010 1 6 1 -.016 . CORRECT
                          10008 25jun2010 1 7 1 -.016 . NO ANSWER (MISSING)
                          10008 17sep2010 . . 1 .848 .
                          10008 11mar2011 . . 1 .725 .
                          10008 05aug2011 . . 1 -.016 .
                          10008 27jan2012 . . 1 .725 .
                          10008 31aug2012 . . 1 .725 .
                          10010 21jun2007 1 1 1 -.003 .799
                          10010 13sep2007 1 2 1 .691 .799
                          10010 27mar2008 1 3 1 .587 .799
                          10010 11sep2008 1 4 1 .691 .799
                          10010 13jan2009 1 5 1 .055 .799 WRONG, SHOULD BE 0.058
                          10010 02apr2009 . . 1 .796 .
                          10010 08oct2009 . . 1 .691 .
                          10010 06may2010 . . 1 .796 .
                          10010 16sep2010 . . 1 .796 .
                          10010 25apr2011 . . 1 .796 .
                          10010 20oct2011 . . 1 .796 .
                          10010 16apr2012 . . 1 .796 .
                          10011 16jun2006 1 1 1 .656 .
                          10011 01sep2006 1 2 1 .796 .
                          10011 16mar2007 1 3 1 .796 .
                          10011 07sep2007 1 4 1 .796 .
                          10011 29feb2008 1 5 1 .796 .
                          10011 12sep2008 1 6 1 .796 .
                          10011 06mar2009 1 7 1 .796 .
                          10011 11sep2009 1 8 1 .796 .
                          10011 01oct2010 1 9 1 .656 .
                          10011 09sep2011 1 10 1 . .
                          10011 28sep2012 1 11 1 . . CORRECT (MISSING)
                          10013 14jul2004 1 1 1 -.016 .
                          10013 15oct2004 1 2 1 .62 . WRONG SHOULD BE 0.78
                          10013 17jan2005 . . 1 .516 .
                          10013 11apr2005 . . 1 .743 .
                          10013 06oct2005 . . 1 .691 .
                          10013 08mar2006 . . 1 .62 .
                          10013 18aug2006 . . 1 .587 .
                          10013 18dec2006 . . 1 .587 .
                          10013 07mar2007 . . 1 .691 .
                          10013 13sep2007 . . 1 .62 .
                          10013 07jan2008 . . 1 1 .
                          10013 11aug2008 . . 1 1 .
                          10013 11feb2009 . . 1 1 .
                          10013 10aug2009 . . 1 .814 .
                          10013 09aug2010 . . 1 .76 .
                          10013 07jul2011 . . 1 . .
                          10013 19jul2012 . . 1 . .
                          10014 05jun2007 1 1 1 .258 .7419999
                          10014 14sep2007 1 2 1 1 .7419999
                          10014 14mar2008 1 3 1 1 .7419999
                          10014 19sep2008 1 4 1 1 .7419999
                          10014 13mar2009 1 5 1 1 .7419999
                          10014 04sep2009 1 6 1 1 .7419999
                          10014 26feb2010 1 7 1 1 .7419999
                          10014 03sep2010 1 8 1 1 .7419999
                          10014 04mar2011 1 9 1 1 .7419999
                          10014 09sep2011 1 10 1 1 .7419999
                          10014 14sep2012 1 11 1 1 .7419999 CORRECT

                          Comment


                          • #14
                            The [_N] and [1] designations are fine as long as you properly define the group that you want to look within.
                            I think what you want is probably something like:
                            Code:
                            bysort recnum id spell (visspell1) : gen diff=eq5d[_N] - eq5d[1] if recnum==1

                            Comment


                            • #15
                              Sarah
                              BIngo!. That is it. Many thanks. I guess my fundamental misunderstandings were (a) not to initially define the question correctly, and (b) not understand the bysort command.
                              Don

                              Comment

                              Working...
                              X