Hello everyone,
I have this hypothetical data set, which contains the year, an individual ID (indiv_id), and the code of each individual's occupation (occ_id).
I've generated a new variable, "observ", that counts the number of individuals working in a given occupation in a given year, using the egen command shown in the second code block below.
Now, I would like to keep only occupations that have at least 3 individuals in each of the two years.
I thought of generating a new variable "hint" that equals 1 if "observ" < 3, and then generating another variable that sums "hint" for each occupation across the 2 years, so that it takes a value of 0, 1, or 2. I would then drop the occupation if that sum is >= 1. However, I am struggling to find the exact command that creates such a variable.
I hope someone can help.
Thanks
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(year indiv_id occ_id)
1998  1 1
1998  2 2
1998  3 5
1998  4 2
1998  5 3
1998  6 2
1998  7 4
1998  8 5
1998  9 5
1998 10 3
1998 11 2
2018  1 1
2018  2 1
2018  3 2
2018  4 3
2018  5 4
2018  6 5
2018  7 5
2018  8 2
2018  9 4
2018 10 3
2018 11 2
2018 12 5
end
Code:
bysort occ_id year: egen observ = count(occ_id)
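For reference, the "hint" idea could be sketched roughly as follows. This is only a sketch, not a tested solution: it assumes "observ" has already been created as above, that there are exactly two years in the data, and that an occupation missing entirely from one year should also be dropped. The variable names tag, n_small, and n_years are made up for illustration.

```stata
* Flag occupation-years with fewer than 3 individuals
gen byte hint = observ < 3

* Tag one row per occupation-year so each year's flag is counted once
egen byte tag = tag(occ_id year)

* Number of occupation-years with fewer than 3 individuals (0, 1, or 2)
bysort occ_id: egen n_small = total(hint * tag)

* Number of distinct years in which the occupation appears at all
bysort occ_id: egen n_years = total(tag)

* Keep occupations that appear in both years with no small year
keep if n_small == 0 & n_years == 2
```

If I have counted correctly, this should leave only occupations 2 and 5 in the example data, but I am not sure this is the best way to do it.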
