Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding observations relative to other observations

    Dear Stata-Listers

    I have a problem which took me hours to think about but which may be easily solvable for more advanced users of STATA.
    Basically, I want to identify and tag observations that fulfill several conditions relative to other observations.
    I have a dataset of 500’000+ observations from emergency services. I am seeking the occurrence of a rare “constellation” following an incident (incident==1).
    I want to tag any observation, within ten minutes in start_datetime after the start_datetime of an incident observation, which is not itself an incident (incident==0), and which is a duplicate of the incident observation on some vars (frequency, dose, los, drid) but has a different value in var fid.
    In the below example, I would have obs 3 and 17 tagged, but no other obs.

    I need a tag variable (tag) which identifies the observation described above. At best, I want to create a new var (pair), with identifies the pair of the incident obs and the tagged obs so I could easily find the two belonging together.

    Any help is very appreciated. Thank you in advance!
    Daniel

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float incident double(start_datetime stop_datetime) str23 frequency float dose int los str34(drid fid) float(tag pair)
    1 1798954011000 1798954011000 "10-0-0-0" 2  5 "0x443178A53734C2B48498256FD0FD9BFB" "0x2A3BC31D862F19FC4D5F126B396CC2C0" 0 1
    0 1798954251000 1798977166000 "10-0-0-0" 1  5 "0x443178A53734C2B48498256FD0FD9BFB" "0x2A3BC31D862F19FC4D5F126B396CC2C0" 0 .
    0 1798954402000 1798985780000 "10-0-0-0" 2  5 "0x443178A53734C2B48498256FD0FD9BFB" "1xCBB12BE6F88769437BD13D8B267C656A" 1 1
    0 1799174378000 1801055663000 "1"        8 23 "0x904ABD4141385038BB85D9360748F82D" "0xDA5D3CAAAE366ED618B67695412629E4" 0 .
    0 1802429977000 1802680184000 "1"        8  6 "0x904ABD4141385038BB85D9360748F82D" "0xD3E24E45FE938598C4FB9B04C357CF9C" 0 .
    0 1802589402000 1803299158000 "1"        . 15 "0x904ABD4141385038BB85D9360748F82D" "0xEFD5D7A721BF321A5D24BC16CCAE1AD6" 0 .
    0 1802884729000 1803812926000 "1"        4 15 "0x904ABD4141385038BB85D9360748F82D" "0x2E652FE474BC4B58379E3AF3F204CC7C" 0 .
    0 1803569988000 1804067168000 "10"       1  9 "0x94F82965444657777314E059B961A3D9" "0xF740A242D8DEB84484D37603FA4D1427" 0 .
    0 1803978958000 1804084104000 "10-0-0-0" 4 16 "0x443178A53734C2B48498256FD0FD9BFB" "0x7329746111FB668F195EE365445EA3FE" 0 .
    0 1805180417000 1805353442000 "1"        2 17 "0x904ABD4141385038BB85D9360748F82D" "0xA7BF6C03C0C74BBF715C43CF24A719C9" 0 .
    0 1805605993000 1806078192000 "1"        1  7 "0x904ABD4141385038BB85D9360748F82D" "0xB92BA80DF45C4AA5CB369A0F10EDEBF4" 0 .
    1 1805744265000 1805744265000 "10-0-0-0" 5  8 "0x443178A53734C2B48498256FD0FD9BFB" "0xCCECF11E3F31434C3F880935700C096B" 0 .
    0 1805744385000 1808150593000 "1"        2  5 "0x904ABD4141385038BB85D9360748F82D" "0x9BA86B2F4CE3E01AD1616374801C3B2E" 0 .
    0 1807004217000 1807111244000 "10-0-0-0" 5  8 "0x443178A53734C2B48498256FD0FD9BFB" "0x794D2F7DA3409BB4D1932CB6BF3BCEA5" 0 .
    0 1810530166000 1811176640000 "1"        1 11 "0x904ABD4141385038BB85D9360748F82D" "0x12914E68A073DCA6238E1DB3DAEB1631" 0 .
    1 1831318408000 1831629487000 "10"       2 15 "0x94F82965444657777314E059B961A3D9" "0xD3767AD986390DDDC0CCBD5E123011D5" 0 2
    0 1831318708000 1832856920000 "10"       2 15 "0x94F82965444657777314E059B961A3D9" "1xD3767AD986390DDDC0CCBD5E123011D5" 1 2
    0 1832327702000 1832431659000 "10-0-0-0" 1 10 "0x443178A53734C2B48498256FD0FD9BFB" "0x8CEB9320117B09AA5BE56FCD54143581" 0 .
    0 1834510257000 1834823348000 "10"       5 14 "0x94F82965444657777314E059B961A3D9" "0x6E5267472EC1B116D3A2DB66EF534E0C" 0 .
    0 1835215610000 1835430926000 "10"       2  9 "0x94F82965444657777314E059B961A3D9" "0x834F6AFA88A0B217B84222732816CC14" 0 .
    end
    format %tc start_datetime
    format %tc stop_datetime

  • #2
    The following code will leave in memory a data set of pairs consisting of incidents and any non-incident meeting the criteria you describe.
    Code:
    gen `c(obs_t)' obs_no = _n
    preserve
    drop if incident
    tempfile copy
    save `copy'
    restore
    
    keep if incident
    local 10_minutes = msofminutes(10)
    rangejoin start_datetime 1 `10_minutes' using `copy', by(frequency dose los drid)
    drop if fid == fid_U | missing(obs_no_U)
    If you wish, you can merge this back to the original data. In that case, the "tag" will simply be the presence of the paired data in the "tagged" observations. (The paired variables are all suffixed with _U.)

    -rangejoin- is written by Robert Picard, and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer. It, too, is available from SSC.

    Comment


    • #3
      Consider this:

      Code:
      gen incident_num = sum(incident)
      bysort incident_num (start_datetime): gen byte within_window = (start_datetime - start_datetime[1] < 600000)
      duplicates tag incident_num frequency dose los drid if within_window, gen(my_tag)
      
      egen my_pair = group(incident_num my_tag) if my_tag != 0
      replace my_tag = 0 if incident == 1 | my_tag == .
      drop within_window
      which produces:
      Code:
      . list, noobs sepby(incident_num)
      
        +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
        | incident       start_datetime        stop_datetime   freque~y   dose   los                                 drid                                  fid   tag   pair   incide~m   my_tag   my_pair |
        |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
        |        1   02jan2017 05:26:51   02jan2017 05:26:51   10-0-0-0      2     5   0x443178A53734C2B48498256FD0FD9BFB   0x2A3BC31D862F19FC4D5F126B396CC2C0     0      1          1        0         1 |
        |        0   02jan2017 05:30:51   02jan2017 11:52:46   10-0-0-0      1     5   0x443178A53734C2B48498256FD0FD9BFB   0x2A3BC31D862F19FC4D5F126B396CC2C0     0      .          1        0         . |
        |        0   02jan2017 05:33:22   02jan2017 14:16:20   10-0-0-0      2     5   0x443178A53734C2B48498256FD0FD9BFB   1xCBB12BE6F88769437BD13D8B267C656A     1      1          1        1         1 |
        |        0   04jan2017 18:39:38   26jan2017 13:14:23          1      8    23   0x904ABD4141385038BB85D9360748F82D   0xDA5D3CAAAE366ED618B67695412629E4     0      .          1        0         . |
        |        0   11feb2017 10:59:37   14feb2017 08:29:44          1      8     6   0x904ABD4141385038BB85D9360748F82D   0xD3E24E45FE938598C4FB9B04C357CF9C     0      .          1        0         . |
        |        0   13feb2017 07:16:42   21feb2017 12:25:58          1      .    15   0x904ABD4141385038BB85D9360748F82D   0xEFD5D7A721BF321A5D24BC16CCAE1AD6     0      .          1        0         . |
        |        0   16feb2017 17:18:49   27feb2017 11:08:46          1      4    15   0x904ABD4141385038BB85D9360748F82D   0x2E652FE474BC4B58379E3AF3F204CC7C     0      .          1        0         . |
        |        0   24feb2017 15:39:48   02mar2017 09:46:08         10      1     9   0x94F82965444657777314E059B961A3D9   0xF740A242D8DEB84484D37603FA4D1427     0      .          1        0         . |
        |        0   01mar2017 09:15:58   02mar2017 14:28:24   10-0-0-0      4    16   0x443178A53734C2B48498256FD0FD9BFB   0x7329746111FB668F195EE365445EA3FE     0      .          1        0         . |
        |        0   15mar2017 07:00:17   17mar2017 07:04:02          1      2    17   0x904ABD4141385038BB85D9360748F82D   0xA7BF6C03C0C74BBF715C43CF24A719C9     0      .          1        0         . |
        |        0   20mar2017 05:13:13   25mar2017 16:23:12          1      1     7   0x904ABD4141385038BB85D9360748F82D   0xB92BA80DF45C4AA5CB369A0F10EDEBF4     0      .          1        0         . |
        |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
        |        1   21mar2017 19:37:45   21mar2017 19:37:45   10-0-0-0      5     8   0x443178A53734C2B48498256FD0FD9BFB   0xCCECF11E3F31434C3F880935700C096B     0      .          2        0         . |
        |        0   21mar2017 19:39:45   18apr2017 16:03:13          1      2     5   0x904ABD4141385038BB85D9360748F82D   0x9BA86B2F4CE3E01AD1616374801C3B2E     0      .          2        0         . |
        |        0   05apr2017 09:36:57   06apr2017 15:20:44   10-0-0-0      5     8   0x443178A53734C2B48498256FD0FD9BFB   0x794D2F7DA3409BB4D1932CB6BF3BCEA5     0      .          2        0         . |
        |        0   16may2017 05:02:46   23may2017 16:37:20          1      1    11   0x904ABD4141385038BB85D9360748F82D   0x12914E68A073DCA6238E1DB3DAEB1631     0      .          2        0         . |
        |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
        |        1   11jan2018 19:33:28   15jan2018 09:58:07         10      2    15   0x94F82965444657777314E059B961A3D9   0xD3767AD986390DDDC0CCBD5E123011D5     0      2          3        0         2 |
        |        0   11jan2018 19:38:28   29jan2018 14:55:20         10      2    15   0x94F82965444657777314E059B961A3D9   1xD3767AD986390DDDC0CCBD5E123011D5     1      2          3        1         2 |
        |        0   23jan2018 11:55:02   24jan2018 16:47:39   10-0-0-0      1    10   0x443178A53734C2B48498256FD0FD9BFB   0x8CEB9320117B09AA5BE56FCD54143581     0      .          3        0         . |
        |        0   17feb2018 18:10:57   21feb2018 09:09:08         10      5    14   0x94F82965444657777314E059B961A3D9   0x6E5267472EC1B116D3A2DB66EF534E0C     0      .          3        0         . |
        |        0   25feb2018 22:06:50   28feb2018 09:55:26         10      2     9   0x94F82965444657777314E059B961A3D9   0x834F6AFA88A0B217B84222732816CC14     0      .          3        0         . |
        +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      The variables my_tag and my_pair replicate your tag and pair, respectively.
      Last edited by Hemanshu Kumar; 25 Jun 2023, 09:32.

      Comment


      • #4
        Dear Clyde

        many, many thanks. The code you provide works perfectly for identifying the observations I need. The resulting dataset lists only the incident cases, not the "tagged" related observations. But with the info in obs_u var, I probably manage to identify them back after merging?


        @Hemanshu: Thank you very much, too. Your suggestion misses my requirements on var fid as far as I see, but that would probably require only a few more steps.
        Last edited by Daniel Schnyder; 26 Jun 2023, 10:33.

        Comment


        • #5
          Your suggestion misses my requirements on var fid as far as I see, but that would probably require only a few more steps.
          Sorry, I missed that. This should do it:

          Code:
          gen incident_num = sum(incident)
          bysort incident_num (start_datetime): gen byte within_window = (start_datetime - start_datetime[1] < 600000)
          duplicates tag incident_num frequency dose los drid if within_window, gen(my_tag)
          duplicates tag incident_num frequency dose los drid fid if within_window, gen(my_tag2)
          replace my_tag = 0 if my_tag2
          
          egen my_pair = group(incident_num my_tag) if my_tag != 0
          replace my_tag = 0 if incident == 1 | my_tag == .
          drop within_window my_tag2

          Comment

          Working...
          X