Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identifying and flagging matches

    I have a large dataset with rows that include a unique identifier 'id' and a unique time variable called 'timeto'. There are variable numbers of duplicate id's but all in the dataset are duplicated at one point or another. I would like to create a flag indicating when two rows with the same id have 'timeto' values that are <= 2. If there are more than one, I'd like to count up. Examples below:

    starting data:

    id timeto
    1 3
    2 7
    2 8
    2 0
    2 14
    2 15
    3 4
    3 6

    ending data:
    id timeto flag
    1 3 0
    2 7 1
    2 8 1
    2 0 0
    2 14 2
    2 15 2
    3 4 1
    3 6 1

    Eventually, for those 'pairs' of rows with the same ID and timeto within 2, I'd be planning to pivot wide so there is a single row for each ID. Any help would be appreciated!


  • #2
    What would you want to do if the data were like this?

    id timeto
    1 3
    2 7
    2 8
    2 9
    2 10
    2 0
    2 14
    2 15
    3 4
    3 6

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      What would you want to do if the data were like this?

      id timeto
      1 3
      2 7
      2 8
      2 9
      2 10
      2 0
      2 14
      2 15
      3 4
      3 6
      Hi! thanks for the response. In this case:

      id timeto flag
      1 3 0
      2 7 1
      2 8 1
      2 9 1
      2 10 1
      2 0 0
      2 14 2
      2 15 2
      3 4 1
      3 6 1

      I expect the situation with 4 values close to one another to be rare (the 2, 7-10 combo you included above) just based on the nature of the data, so if there were a rare occurrence I think i could figure it out more manually.

      Comment


      • #4
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(id timeto)
        1  3
        2  7
        2  8
        2  9
        2 10
        2  0
        2 14
        2 15
        3  4
        3  6
        end
        
        gen `c(obs_t)' obs_no = _n  
        by id (obs_no), sort: gen byte close = inrange(abs(timeto[_n+1]-timeto), 0, 2) ///
            | inrange(abs(timeto-timeto[_n-1]), 0, 2)
        by id (obs_no): gen flag = sum(close == 1 & (close[_n-1] == 0 | _n == 1))
        replace flag = 0 if !close

        Comment

        Working...
        X