Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • subtracting last from first rows

    I have a dataset that has repeated values of a variable measured at different times as indicated by a variable 'seqsij' which takes the values of 1,2,3...N for each new value of a variable sij_sp within a given ID. Each value of seqsij and sij_sp constitues a new row. I want to subtract the value of the LAST measure of sij_sp from the FIRST measure of sij_sp.

    I tried the codes suggested by Nick Cox in response to a query by Roman Mostazir on 02Mar 2015; i.e.

    bysort id (pre) : gen change = kimsob[_n] - kimsob[1]

    but my code is

    #
    Code:
    bysort id (seqsij) :gen sijdif = sij_sp[_N] - sij_sp[1]
    #
    Code:

    When I do this I get all missing values.

    I also tried the suggestion of Klaudia Erhardt in response to the same query where Klaudia used
    bysort id: gen change = cond(kinsob != kinsob[_n-1], kinsob - kinsob[_n-1],.)

    but my code was
    #
    Code:
    bysort id (seqsij): gen sijdif =cond(sij_sp != sij_sp[_n-1], sij_sp - sij_sp[_n-1],.)
    #
    and this worked, but it does not give the FIRST - LAST value; it just provides the value of the difference between the current value and the immediately preceding one.

    If I try

    ​#
    Code:
    bysort id (seqsij sij_sp): gen sijdif2 = cond(sij_sp[1]!= sij_sp[_N],sij_sp[1] - sij_sp[_N],.)
    #
    I get all missing values.

    Where is my error?

    Many thanks.

    Don

  • #2
    The allusion is to http://www.statalist.org/forums/foru...acting-by-rows where I suggested using _N (not _n) .

    Let's start from the top and assume an identifier, time variable and outcome variable id time y. Then

    Code:
     
    bysort id (time) : gen change = y[_N] - y[1]
    would be change from first to last except that if either of those values of y were missing, then so too would the change be missing. If you have missings, you would need to segregate them or otherwise ignore them. One way to do that would be

    Code:
     
    gen ispresent = !missing(y) 
    bysort ispresent id (time) : gen change = y[_N] - y[1]
    bysort id (change) : replace change = change[1]
    As I understand it, the equivalent for your code would be

    Code:
    gen ispresent = !missing(y) 
    bysort ispresent id (seqsij) : gen sijdif = sij_sp[_N] - sig_sp[1] 
    bysort id (seqsij) : replace sijdif = sijdif[1] 
    




    Comment


    • #3
      There is an abundance of missing data for each variable in the dataset, save for id and time (which I call 'edate'); however, for the variables seqsij and sij_sp there are no missing data. Any given id may have 'n' rows of data; seqsij and sij_sp may be present in only 3 or 4 of these rows, but for every seqsij value there is a corresponding sij_sp value and my interpretation of _N is that it reflects the last value of seqsij for that id, not the last observation for the ID; hence the need to bysort id (seqsij).

      In such an instance, do I need the 'missing' step.
      At any rate, my time variable is called edate. So I tried

      #
      Code:
      gen ispresent = !missing(edate)
      bysort ispresent id (seqsij) : gen sijdif = sij_sp[_N] - sij[_sp[1]
      bysort id (seqsij) : replace sijdif = sijdif[1]
      #
      When I did the first bysort command, and then did a sum sijdif, I got all missing values
      The second bysort command had 0 changes made.

      Don

      Comment


      • #4
        You say that

        1. For seqsij and sij_sp there are no missing data.

        and also that

        2. seqsij and sij_sp may be present in only 3 or 4 of the rows, meaning observations, for each id.

        On the face of it #2 contradicts #1 and would explain missing values.

        Missing naturally, means empty for string variables and system missing . or higher for numeric variables.

        Presumably these data are confidential but the lack of a concrete example is most frustrating here. Is it possible to post
        the results of

        Code:
        describe seqsij and sij_sp
        gen nmissing = missing(seqsij) + missing(sij_sp)
        tab nmissing
        ?

        Last edited by Nick Cox; 14 Mar 2015, 12:02.

        Comment


        • #5
          Nick
          I did what you suggested and got values of
          nmissing 0 = 550
          nmissing 2 = 5648
          Meaning that there are far more missing than non-missing.

          There is nothing overly confidential about the data. Here is a snippet.

          #
          Code:
           
          id edate seqsij sij_sp
          4 11-Feb-03 1 0
          4 27-Jan-05 2 1
          4 31-Aug-06 3 5
          4 11-Sep-07 4 2
          4 31-Jul-03 . .
          6 16-Nov-05 . .
          6 11-May-05 . .
          7 16-Jun-05 1 0
          7 20-Dec-05 2 2
          7 1-Jun-07 3 5
          7 29-Aug-07 . .
          7 12-Feb-07 . .
          7 8-Jan-03 . .
          7 4-Feb-05 . .
          8 29-Nov-04 1 1
          8 29-Dec-05 2 2
          8 11-Nov-07 3 4
          8 24-Apr-02 . .
          #
          Don

          Comment


          • #6
            Thanks. What I suggested works, but you really do need to do work with missings on seqsij not edate. (There is a stray typo in the last code block of my #2, but you translated to one of your variables, just the wrong one.)

            Code:
            . clear
            
            . input id str9 edate     seqsij  sij_sp
            
                        id      edate     seqsij     sij_sp
              1. 4       "11-Feb-03"     1       0
              2. 4       "27-Jan-05"     2       1
              3. 4       "31-Aug-06"     3       5
              4. 4       "11-Sep-07"     4       2
              5. 4       "31-Jul-03"     .       .
              6. 6       "16-Nov-05"     .       .
              7. 6       "11-May-05"     .       .
              8. 7       "16-Jun-05"     1       0
              9. 7       "20-Dec-05"     2       2
             10. 7       "1-Jun-07"      3       5
             11. 7       "29-Aug-07"     .       .
             12. 7       "12-Feb-07"     .       .
             13. 7       "8-Jan-03"      .       .
             14. 7       "4-Feb-05"      .       .
             15. 8       "29-Nov-04"     1       1
             16. 8       "29-Dec-05"     2       2
             17. 8       "11-Nov-07"     3       4
             18. 8       "24-Apr-02"     .       .
             19. end
            
            .
            . gen ispresent = !missing(seqsij)
            
            . bysort ispresent id (seqsij) : gen diff = sij_sp[_N] - sij_sp[1]
            (8 missing values generated)
            
            . bysort id (seqsij) : replace diff = diff[1]
            (6 real changes made)
            
            .
            . list , sepby(id)
            
                 +----------------------------------------------------+
                 | id       edate   seqsij   sij_sp   ispres~t   diff |
                 |----------------------------------------------------|
              1. |  4   11-Feb-03        1        0          1      2 |
              2. |  4   27-Jan-05        2        1          1      2 |
              3. |  4   31-Aug-06        3        5          1      2 |
              4. |  4   11-Sep-07        4        2          1      2 |
              5. |  4   31-Jul-03        .        .          0      2 |
                 |----------------------------------------------------|
              6. |  6   16-Nov-05        .        .          0      . |
              7. |  6   11-May-05        .        .          0      . |
                 |----------------------------------------------------|
              8. |  7   16-Jun-05        1        0          1      5 |
              9. |  7   20-Dec-05        2        2          1      5 |
             10. |  7    1-Jun-07        3        5          1      5 |
             11. |  7   29-Aug-07        .        .          0      5 |
             12. |  7    8-Jan-03        .        .          0      5 |
             13. |  7    4-Feb-05        .        .          0      5 |
             14. |  7   12-Feb-07        .        .          0      5 |
                 |----------------------------------------------------|
             15. |  8   29-Nov-04        1        1          1      3 |
             16. |  8   29-Dec-05        2        2          1      3 |
             17. |  8   11-Nov-07        3        4          1      3 |
             18. |  8   24-Apr-02        .        .          0      3 |
                 +----------------------------------------------------+
            .
            Last edited by Nick Cox; 14 Mar 2015, 12:36.

            Comment


            • #7
              Nick
              It works just the way I want. MANY thanks.
              Next time I will post sample data as a matter of course.
              Don

              Comment

              Working...
              X