Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using countmatch or rangestat with a time interval

    For a +/- 2 month-year interval, I want to calculate for a given id, how many times did that id appear in the id2 column.

    based on the link below, I tried:
    1) countmatch id id2
    2) rangestat (count) n=id2, interval(id2 id id)

    But what I am unable to figure out is how to count it over a time interval?

    Would you have any advice to do this?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(id monthyear id2)
    4217 525 .
    3217 526 4217
    4217 526 .
    3217 528 4217
    4217 528 .
    3217 529 .
    4217 529 4217
    3217 530 4217
    4217 530 3217
    4217 530 4217
    4217 533 .
    4217 533 4217
    3217 535 4217
    4217 535 4217
    4217 535 4217
    4217 535 4217
    4217 535 4217
    3217 536 4217
    3217 536 4217
    4217 536 4217
    3217 537 4217
    3217 537 4217
    4217 685 .
    end
    format %tm monthyear
    Dear all, I am working on a project and have run into a problem with my dataset. I have a dataset with two variables labled ID1 and ID2. I want to generate a
    Last edited by Harry Kaj; 01 Nov 2019, 17:11.

  • #2
    I m not familiar with -countmatch-. I don't think -rangestat- can do this. You need -rangejoin- (by Robert Picard, available from SSC):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(id monthyear id2)
    4217 525 .
    3217 526 4217
    4217 526 .
    3217 528 4217
    4217 528 .
    3217 529 .
    4217 529 4217
    3217 530 4217
    4217 530 3217
    4217 530 4217
    4217 533 .
    4217 533 4217
    3217 535 4217
    4217 535 4217
    4217 535 4217
    4217 535 4217
    4217 535 4217
    3217 536 4217
    3217 536 4217
    4217 536 4217
    3217 537 4217
    3217 537 4217
    4217 685 .
    end
    format %tm monthyear
    
    
    tempfile copy
    clonevar link = id2
    save `copy'
    
    drop link
    clonevar link = id
    gen long obs_no = _n
    rangejoin monthyear -2 2 using `copy', by(link)
    
    collapse (count) wanted = link (first) id monthyear id2, by(obs_no)

    Comment


    • #3
      This seems a little simpler. rangerun is also from SSC. (In #1 please explain that the commands mentioned are from SSC: FAQ Advice #12.)


      Code:
      program myprog
          count if id == id2
          gen wanted = r(N)
      end
      
      rangerun myprog , int(monthyear -2 2) use(id*)

      Comment


      • #4
        Thanks, Clyde. This code gives me exactly what I am looking for.

        Originally posted by Clyde Schechter View Post
        I m not familiar with -countmatch-. I don't think -rangestat- can do this. You need -rangejoin- (by Robert Picard, available from SSC):

        Nick, thanks for your response.

        The rangerun command seems to be counting how many times id==id2 in the specific time interval. But what I am looking for is, for a fixed id, how many times the specific id==id2 in that time interval. For example, for id 4217 in monthyear 2004m8, id2==4217 11 times in the +/- 2 month interval.

        Comment


        • #5
          That being so, did you try the code with a by() option?

          Comment


          • #6
            I think that the rangerun solution that Nick was thinking of requires comparing the id of the current observation with the id2 of observations within the time window. That requires using the sprefix()option to create scalars that hold the values of each variable for the current observation. Similarly, Clyde's solution do not correctly handle the case where there is no match within the window. Here's a restating of both approach that yield the same results:

            Code:
            clear all
            
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double(id monthyear id2)
            4217 525 .
            3217 526 4217
            4217 526 .
            3217 528 4217
            4217 528 .
            3217 529 .
            4217 529 4217
            3217 530 4217
            4217 530 3217
            4217 530 4217
            4217 533 .
            4217 533 4217
            3217 535 4217
            4217 535 4217
            4217 535 4217
            4217 535 4217
            4217 535 4217
            3217 536 4217
            3217 536 4217
            4217 536 4217
            3217 537 4217
            3217 537 4217
            4217 685 .
            end
            format %tm monthyear
            gen long obs = _n
            
            program myprog
                count if rr_id == id2
                gen wanted = r(N)
            end
            
            rangerun myprog , int(monthyear -2 2) sprefix(rr_)
            
            preserve
            keep if !mi(id2)
            keep id2 monthyear
            rename id2 id
            tempfile hold
            save "`hold'"
            restore
            rangejoin monthyear -2 2 using "`hold'", by(id)
            bysort obs (monthyear): gen wanted2 = _N
            replace wanted2 = 0 if mi(monthyear_U)
            by obs: keep if _n == 1
            drop monthyear_U
            list
            and the results:
            Code:
            . list
            
                 +-------------------------------------------------+
                 |   id   monthy~r    id2   obs   wanted   wanted2 |
                 |-------------------------------------------------|
              1. | 4217    2003m10      .     1        1         1 |
              2. | 3217    2003m11   4217     2        0         0 |
              3. | 4217    2003m11      .     3        2         2 |
              4. | 3217     2004m1   4217     4        1         1 |
              5. | 4217     2004m1      .     5        5         5 |
                 |-------------------------------------------------|
              6. | 3217     2004m2      .     6        1         1 |
              7. | 4217     2004m2   4217     7        4         4 |
              8. | 3217     2004m3   4217     8        1         1 |
              9. | 4217     2004m3   3217     9        4         4 |
             10. | 4217     2004m3   4217    10        4         4 |
                 |-------------------------------------------------|
             11. | 4217     2004m6      .    11        6         6 |
             12. | 4217     2004m6   4217    12        6         6 |
             13. | 3217     2004m8   4217    13        0         0 |
             14. | 4217     2004m8   4217    14       11        11 |
             15. | 4217     2004m8   4217    15       11        11 |
                 |-------------------------------------------------|
             16. | 4217     2004m8   4217    16       11        11 |
             17. | 4217     2004m8   4217    17       11        11 |
             18. | 3217     2004m9   4217    18        0         0 |
             19. | 3217     2004m9   4217    19        0         0 |
             20. | 4217     2004m9   4217    20       10        10 |
                 |-------------------------------------------------|
             21. | 3217    2004m10   4217    21        0         0 |
             22. | 3217    2004m10   4217    22        0         0 |
             23. | 4217     2017m2      .    23        0         0 |
                 +-------------------------------------------------+
            
            .

            Comment


            • #7
              Thank you -- this works great!

              Originally posted by Robert Picard View Post
              I think that the rangerun solution that Nick was thinking of requires comparing the id of the current observation with the id2 of observations within the time window. That requires using the sprefix()option to create scalars that hold the values of each variable for the current observation. Similarly, Clyde's solution do not correctly handle the case where there is no match within the window. Here's a restating of both approach that yield the same results:

              Comment

              Working...
              X