Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to find 2 consecutive values for a dataset with repeated observations

    Hello,

    I have the following data below with id, result and results-date. I would like to tag by id and date, observations that have at least 2 consecutive values of >=20 and have this be "1" in the new variable. If an observation does not have 2 consecutive values, I would like this tagged "0" . So for example, id 1 in my example dataset below has a 20 and immediately followed by a 20, so in the new variable, id 1 will be tagged a 1. I used the following code below but I am not getting this at all.


    [CODE]

    egen tag = tag(id resultdate)
    egen ngood = total(tag * (result >= 20)) , by(id)




    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    
    clear
    
    input byte(id result) int resultdate
    
    1 20 20089
    
    1 20 20211
    
    1 35 20226
    
    1 28 20227
    
    1 33 20241
    
    1 32 20281
    
    2 12 20089
    
    2 11 20211
    
    2 13 20226
    
    2 12 20241
    
    2 12 20270
    
    2 12 20298
    
    3 23 20089
    
    3 25 20211
    
    3 28 20226
    
    4 31 20089
    
    4 12 20211
    
    4 14 20226
    
    4 27 20227
    
    4 10 20241
    
    5 20 20089
    
    5 12 20211
    
    5 13 20226
    
    5 16 20241
    
    end
    
    format %tdnn/dd/CCYY resultdate
    Last edited by May Blake; 16 Feb 2020, 21:43.

  • #2
    Code:
    by id (resultdate), sort: gen wanted = min(result, result[_n-1]) >= 20 & _n > 1
    by id (wanted), sort: replace wanted = wanted[_N]

    Comment


    • #3
      Clyde Schechter answered the question very nicely. For a broader context see tsspell from SSC and https://www.stata-journal.com/articl...article=dm0029


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(id result) int resultdate
      1 20 20089
      1 20 20211
      1 35 20226
      1 28 20227
      1 33 20241
      1 32 20281
      2 12 20089
      2 11 20211
      2 13 20226
      2 12 20241
      2 12 20270
      2 12 20298
      3 23 20089
      3 25 20211
      3 28 20226
      4 31 20089
      4 12 20211
      4 14 20226
      4 27 20227
      4 10 20241
      5 20 20089
      5 12 20211
      5 13 20226
      5 16 20241
      end
      format %tdnn/dd/CCYY resultdate
      
      bysort id (resultdate) : gen pseudot = _n
      
      tsset id pseudot
             panel variable:  id (unbalanced)
              time variable:  pseudot, 1 to 6
                      delta:  1 unit
      
      tsspell, cond(result >= 20 & result < .)
      
      bysort id _spell (resultdate) : gen length = _N if _spell
      (12 missing values generated)
      
      sort id resultdate
      
      list, sepby(id _spell)
      
           +-------------------------------------------------------------------+
           | id   result   resultd~e   pseudot   _seq   _spell   _end   length |
           |-------------------------------------------------------------------|
        1. |  1       20    1/1/2015         1      1        1      0        6 |
        2. |  1       20    5/3/2015         2      2        1      0        6 |
        3. |  1       35   5/18/2015         3      3        1      0        6 |
        4. |  1       28   5/19/2015         4      4        1      0        6 |
        5. |  1       33    6/2/2015         5      5        1      0        6 |
        6. |  1       32   7/12/2015         6      6        1      1        6 |
           |-------------------------------------------------------------------|
        7. |  2       12    1/1/2015         1      0        0      0        . |
        8. |  2       11    5/3/2015         2      0        0      0        . |
        9. |  2       13   5/18/2015         3      0        0      0        . |
       10. |  2       12    6/2/2015         4      0        0      0        . |
       11. |  2       12    7/1/2015         5      0        0      0        . |
       12. |  2       12   7/29/2015         6      0        0      0        . |
           |-------------------------------------------------------------------|
       13. |  3       23    1/1/2015         1      1        1      0        3 |
       14. |  3       25    5/3/2015         2      2        1      0        3 |
       15. |  3       28   5/18/2015         3      3        1      1        3 |
           |-------------------------------------------------------------------|
       16. |  4       31    1/1/2015         1      1        1      1        1 |
           |-------------------------------------------------------------------|
       17. |  4       12    5/3/2015         2      0        0      0        . |
       18. |  4       14   5/18/2015         3      0        0      0        . |
           |-------------------------------------------------------------------|
       19. |  4       27   5/19/2015         4      1        2      1        1 |
           |-------------------------------------------------------------------|
       20. |  4       10    6/2/2015         5      0        0      0        . |
           |-------------------------------------------------------------------|
       21. |  5       20    1/1/2015         1      1        1      1        1 |
           |-------------------------------------------------------------------|
       22. |  5       12    5/3/2015         2      0        0      0        . |
       23. |  5       13   5/18/2015         3      0        0      0        . |
       24. |  5       16    6/2/2015         4      0        0      0        . |
           +-------------------------------------------------------------------+
      
      . 
      .
      Last edited by Nick Cox; 17 Feb 2020, 02:19.

      Comment


      • #4
        Thank you so much Nick and Clyde. Both of these solutions worked very well for my dataset. Thank you for taking the time to help me out.

        Comment

        Working...
        X