Dropping groups of observations that have few observations across years

Mona Elsayed

09 Sep 2022, 09:58

Hello everyone,

I have the following hypothetical data, which contains the year, an individual ID (indiv_id), and each individual's occupation code (occ_id).

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end

I've generated a new variable, "observ", that counts the number of individuals working in each occupation in each year:

Code:

bysort occ_id year: egen observ = count(occ_id)
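For reference, count(occ_id) counts the nonmissing values of occ_id within each occupation-year cell, so as long as occ_id has no missing values it equals the cell size _N. A minimal check on the example data, where observ_n is a hypothetical helper name that is not part of the original post:

```stata
* Example data from the first post
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end

* Number of individuals in each occupation-year cell, as in the post
bysort occ_id year: egen observ = count(occ_id)

* Same quantity taken directly from the cell size (hypothetical helper)
by occ_id year: generate observ_n = _N
assert observ == observ_n

* Two-way table of the same counts, to spot cells with fewer than 3 individuals
tabulate occ_id year
```

The tabulate step is optional; it simply shows which occupation-year cells fall below the threshold before anything is dropped.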

Now, I would like to keep only occupations that have at least 3 individuals in each of the two years.

I thought of generating a variable "hint" equal to 1 if "observ" < 3, then generating another variable that sums "hint" for each occupation across the two years (so it takes the value 0, 1, or 2), and finally dropping the occupation if that sum is >= 1. However, I am struggling to find the exact command that creates such a variable.

I hope someone can help.
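One way to implement the "hint" plan literally is sketched below. It is an assumption-laden sketch, not the thread's own answer: summing hint over all observations would count individuals rather than years (occupation 1 would get 3 instead of 2), so egen's tag() flags one observation per occupation-year cell and only those are summed. The names hint, onepercell and nlow are hypothetical.

```stata
* Example data from the first post
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end

* Number of individuals in each occupation-year cell
bysort occ_id year: egen observ = count(occ_id)

* hint = 1 when the cell has fewer than 3 individuals, 0 otherwise
generate hint = observ < 3

* Flag exactly one observation per occupation-year cell, so each year
* is counted once rather than once per individual
egen onepercell = tag(occ_id year)

* Number of years (0, 1 or 2) in which the occupation has fewer than 3 individuals
bysort occ_id: egen nlow = total(hint * onepercell)

* Strict rule: drop an occupation if it is sparse in any year
drop if nlow >= 1
```

Replacing the last line with drop if nlow >= 2 would give the relaxed rule that tolerates one sparse year.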

Thanks
Andrew Musau

09 Sep 2022, 10:21

Code:

bys occ_id (observ): keep if observ[1]>=3

Within each occupation, the observations are sorted by observ in ascending order, so observ[1] is the smallest yearly count. The code therefore keeps occupations whose minimum count across years is at least 3.
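The same minimum can be stored explicitly with egen's min(), which some may find easier to read. The sketch below also adds a check that is not in the original answer: an occupation observed in only one year would pass the observ[1] test if that single year had 3 or more individuals, so nyears requires both years to be present. In the example data all five occupations appear in both years, so the result is the same. The names minobs, onepercell and nyears are hypothetical.

```stata
* Example data from the first post
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end

bysort occ_id year: egen observ = count(occ_id)

* Smallest yearly count within each occupation (the value observ[1] picks up)
bysort occ_id: egen minobs = min(observ)

* Number of distinct years in which the occupation appears
egen onepercell = tag(occ_id year)
bysort occ_id: egen nyears = total(onepercell)

* Keep occupations with at least 3 individuals in every year, present in both years
keep if minobs >= 3 & nyears == 2
```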
Mona Elsayed

09 Sep 2022, 11:57

Thanks so much, Andrew. It works for the case I described.
Do you have any idea how to generate the variable I mentioned, which sums the "hint" variable for each occupation across the two years? I need it in case I want to relax the condition and allow fewer than 3 observations in only one of the years.

Andrew Musau

09 Sep 2022, 16:27

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end

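* hint: 1/_N for each member of an occupation-year cell with fewer than 3 individuals, 0 otherwise
bys occ_id year: g hint=(_N<3)*(1/_N)
* wanted: number of years (0, 1 or 2) in which the occupation has fewer than 3 individuals
by occ_id: egen wanted= total(hint)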

Result:

Code:

. l, sepby(occ)

     +------------------------------------------+
     | year   indiv_id   occ_id   hint   wanted |
     |------------------------------------------|
  1. | 1998          1        1      1        2 |
  2. | 2018          1        1     .5        2 |
  3. | 2018          2        1     .5        2 |
     |------------------------------------------|
  4. | 1998          2        2      0        0 |
  5. | 1998          4        2      0        0 |
  6. | 1998          6        2      0        0 |
  7. | 1998         11        2      0        0 |
  8. | 2018          3        2      0        0 |
  9. | 2018          8        2      0        0 |
 10. | 2018         11        2      0        0 |
     |------------------------------------------|
 11. | 1998         10        3     .5        2 |
 12. | 1998          5        3     .5        2 |
 13. | 2018          4        3     .5        2 |
 14. | 2018         10        3     .5        2 |
     |------------------------------------------|
 15. | 1998          7        4      1        2 |
 16. | 2018          9        4     .5        2 |
 17. | 2018          5        4     .5        2 |
     |------------------------------------------|
 18. | 1998          3        5      0        0 |
 19. | 1998          9        5      0        0 |
 20. | 1998          8        5      0        0 |
 21. | 2018         12        5      0        0 |
 22. | 2018          7        5      0        0 |
 23. | 2018          6        5      0        0 |
     +------------------------------------------+

.
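Once wanted exists, the strict and relaxed rules differ only in the cutoff. A minimal sketch, reusing the construction above: keep if wanted == 0 reproduces the strict rule, while keep if wanted <= 1 tolerates one sparse year. In this example wanted is either 0 or 2, so both rules keep occupations 2 and 5; they would differ only for an occupation that is sparse in exactly one year.

```stata
* Example data from the first post
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end

* Construction from the answer above: each sparse year contributes exactly 1 to wanted
bysort occ_id year: generate hint = (_N < 3) * (1/_N)
by occ_id: egen wanted = total(hint)

* Relaxed rule: allow at most one year with fewer than 3 individuals
keep if wanted <= 1
```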

Mona Elsayed

09 Sep 2022, 17:44

Thanks so much