Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify a subset of observations filtered on the value of var2 when var1 =x

    Hi there

    I have a large clinical dataset that includes data from multiple sites. Each entry represents an individual admission to hospital. Some hospitals have been contributing to the dataset for longer than others (i.e. later start date) but none have dropped out.

    I wish to include only sites that have been contributing data for the whole time period. I'm planning to do this by extracting all data for those sites that contributed entries in the first year, which is a proxy for the whole time period since there is no drop-out.

    The general principle would be to identify all the values of var2 within the subset of observations where var1 =x, then filter the whole dataset on those identified values of var2

    In the sample (invented) data below I want to extract all entries from sites that have contributed data in the year 2000 (i.e. sites 1 and 2 but not sites 3 and 4). I want all entries for sites 1 and 2, not just the ones from the year 2000.

    I can do this manually as shown below. Is there a more elegant/non manual way to do this?

    clear
    input pt_id year site age sex died
    1 2000 1 50 1 1
    2 2000 2 65 0 1
    3 2001 1 66 0 0
    4 2001 1 30 0 1
    5 2001 2 80 1 1
    5 2002 1 94 1 1
    6 2002 2 62 0 0
    7 2002 3 72 1 0
    8 2003 1 38 0 0
    9 2003 1 50 1 0
    10 2003 1 18 1 0
    11 2003 2 49 0 0
    12 2003 3 56 1 0
    13 2003 4 66 0 1
    14 2004 1 22 0 0
    15 2004 2 64 0 0
    16 2004 3 51 1 0
    17 2004 4 32 1 1
    18 2004 5 53 0 1
    end

    tab site if year ==2000

    site | Freq. Percent Cum.
    ------------+-----------------------------------
    1 | 1 50.00 50.00
    2 | 1 50.00 100.00
    ------------+-----------------------------------
    Total | 2 100.00


    keep if inlist (site,1,2)

    Thanks!


  • #2
    Aleece:
    welcome to this forum.
    You may want to consider:
    Code:
    . bysort site (year): gen wanted=1 if year[1]==2000
    
    . replace wanted=0 if wanted==.
    
    
    . list
    
         +-------------------------------------------------+
         | pt_id   year   site   age   sex   died   wanted |
         |-------------------------------------------------|
      1. |     1   2000      1    50     1      1        1 |
      2. |     3   2001      1    66     0      0        1 |
      3. |     4   2001      1    30     0      1        1 |
      4. |     5   2002      1    94     1      1        1 |
      5. |     9   2003      1    50     1      0        1 |
         |-------------------------------------------------|
      6. |    10   2003      1    18     1      0        1 |
      7. |     8   2003      1    38     0      0        1 |
      8. |    14   2004      1    22     0      0        1 |
      9. |     2   2000      2    65     0      1        1 |
     10. |     5   2001      2    80     1      1        1 |
         |-------------------------------------------------|
     11. |     6   2002      2    62     0      0        1 |
     12. |    11   2003      2    49     0      0        1 |
     13. |    15   2004      2    64     0      0        1 |
     14. |     7   2002      3    72     1      0        0 |
     15. |    12   2003      3    56     1      0        0 |
         |-------------------------------------------------|
     16. |    16   2004      3    51     1      0        0 |
     17. |    13   2003      4    66     0      1        0 |
     18. |    17   2004      4    32     1      1        0 |
     19. |    18   2004      5    53     0      1        0 |
         +-------------------------------------------------+
    
    . drop if wanted==0
    
    
    . list
    
         +-------------------------------------------------+
         | pt_id   year   site   age   sex   died   wanted |
         |-------------------------------------------------|
      1. |     1   2000      1    50     1      1        1 |
      2. |     3   2001      1    66     0      0        1 |
      3. |     4   2001      1    30     0      1        1 |
      4. |     5   2002      1    94     1      1        1 |
      5. |     9   2003      1    50     1      0        1 |
         |-------------------------------------------------|
      6. |    10   2003      1    18     1      0        1 |
      7. |     8   2003      1    38     0      0        1 |
      8. |    14   2004      1    22     0      0        1 |
      9. |     2   2000      2    65     0      1        1 |
     10. |     5   2001      2    80     1      1        1 |
         |-------------------------------------------------|
     11. |     6   2002      2    62     0      0        1 |
     12. |    11   2003      2    49     0      0        1 |
     13. |    15   2004      2    64     0      0        1 |
         +-------------------------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      That's great, thank you!

      Comment

      Working...
      X