Identify a subset of observations filtered on the value of var2 when var1 =x

Aleece MacPhail

Join Date: Oct 2023

Posts: 2
#1

Identify a subset of observations filtered on the value of var2 when var1 =x

09 Oct 2023, 23:58

Hi there

I have a large clinical dataset that includes data from multiple sites. Each entry represents an individual admission to hospital. Some hospitals have been contributing to the dataset for longer than others (i.e. later start date) but none have dropped out.

I wish to include only sites that have been contributing data for the whole time period. I'm planning to do this by extracting all data for those sites that contributed entries in the first year, which is a proxy for the whole time period since there is no drop-out.

The general principle would be to identify all the values of var2 within the subset of observations where var1 =x, then filter the whole dataset on those identified values of var2

In the sample (invented) data below I want to extract all entries from sites that have contributed data in the year 2000 (i.e. sites 1 and 2 but not sites 3 and 4). I want all entries for sites 1 and 2, not just the ones from the year 2000.

I can do this manually as shown below. Is there a more elegant/non manual way to do this?

clear
input pt_id year site age sex died
1 2000 1 50 1 1
2 2000 2 65 0 1
3 2001 1 66 0 0
4 2001 1 30 0 1
5 2001 2 80 1 1
5 2002 1 94 1 1
6 2002 2 62 0 0
7 2002 3 72 1 0
8 2003 1 38 0 0
9 2003 1 50 1 0
10 2003 1 18 1 0
11 2003 2 49 0 0
12 2003 3 56 1 0
13 2003 4 66 0 1
14 2004 1 22 0 0
15 2004 2 64 0 0
16 2004 3 51 1 0
17 2004 4 32 1 1
18 2004 5 53 0 1
end

tab site if year ==2000

site | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 50.00 50.00
2 | 1 50.00 100.00
------------+-----------------------------------
Total | 2 100.00

keep if inlist (site,1,2)

Thanks!
Tags: None

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

10 Oct 2023, 00:15

Aleece:
welcome to this forum.
You may want to consider:

Code:

. bysort site (year): gen wanted=1 if year[1]==2000

. replace wanted=0 if wanted==.


. list

     +-------------------------------------------------+
     | pt_id   year   site   age   sex   died   wanted |
     |-------------------------------------------------|
  1. |     1   2000      1    50     1      1        1 |
  2. |     3   2001      1    66     0      0        1 |
  3. |     4   2001      1    30     0      1        1 |
  4. |     5   2002      1    94     1      1        1 |
  5. |     9   2003      1    50     1      0        1 |
     |-------------------------------------------------|
  6. |    10   2003      1    18     1      0        1 |
  7. |     8   2003      1    38     0      0        1 |
  8. |    14   2004      1    22     0      0        1 |
  9. |     2   2000      2    65     0      1        1 |
 10. |     5   2001      2    80     1      1        1 |
     |-------------------------------------------------|
 11. |     6   2002      2    62     0      0        1 |
 12. |    11   2003      2    49     0      0        1 |
 13. |    15   2004      2    64     0      0        1 |
 14. |     7   2002      3    72     1      0        0 |
 15. |    12   2003      3    56     1      0        0 |
     |-------------------------------------------------|
 16. |    16   2004      3    51     1      0        0 |
 17. |    13   2003      4    66     0      1        0 |
 18. |    17   2004      4    32     1      1        0 |
 19. |    18   2004      5    53     0      1        0 |
     +-------------------------------------------------+

. drop if wanted==0


. list

     +-------------------------------------------------+
     | pt_id   year   site   age   sex   died   wanted |
     |-------------------------------------------------|
  1. |     1   2000      1    50     1      1        1 |
  2. |     3   2001      1    66     0      0        1 |
  3. |     4   2001      1    30     0      1        1 |
  4. |     5   2002      1    94     1      1        1 |
  5. |     9   2003      1    50     1      0        1 |
     |-------------------------------------------------|
  6. |    10   2003      1    18     1      0        1 |
  7. |     8   2003      1    38     0      0        1 |
  8. |    14   2004      1    22     0      0        1 |
  9. |     2   2000      2    65     0      1        1 |
 10. |     5   2001      2    80     1      1        1 |
     |-------------------------------------------------|
 11. |     6   2002      2    62     0      0        1 |
 12. |    11   2003      2    49     0      0        1 |
 13. |    15   2004      2    64     0      0        1 |
     +-------------------------------------------------+

.

Kind regards,
Carlo
(Stata 19.0)

Comment

Aleece MacPhail

Join Date: Oct 2023

Posts: 2
#3

10 Oct 2023, 00:37

That's great, thank you!
Comment

Announcement

Identify a subset of observations filtered on the value of var2 when var1 =x

Comment

Comment