  • Dropping groups of observations that have few observations across years

    Hello everyone,

    I have hypothetical data containing the year, an individual ID (indiv_id), and each individual's occupation code (occ_id).

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(year indiv_id occ_id)
    1998  1 1
    1998  2 2
    1998  3 5
    1998  4 2
    1998  5 3
    1998  6 2
    1998  7 4
    1998  8 5
    1998  9 5
    1998 10 3
    1998 11 2
    2018  1 1
    2018  2 1
    2018  3 2
    2018  4 3
    2018  5 4
    2018  6 5
    2018  7 5
    2018  8 2
    2018  9 4
    2018 10 3
    2018 11 2
    2018 12 5
    end
    I've generated a new variable, "observ", that counts the number of individuals working in a given occupation in a given year:

    Code:
    bysort occ_id year: egen observ = count(occ_id)
    Now, I would like to keep only occupations that have at least 3 individuals in each of the two years.

    I thought of generating a new variable "hint" that equals 1 if "observ" < 3, and then a second variable that sums "hint" for each occupation across the two years, so that it takes the value 0, 1, or 2. I would then drop occupations where that sum is >= 1. However, I am struggling to find the command that creates such a variable.

    I hope someone can help.

    Thanks



  • #2
    Code:
    bys occ_id (observ): keep if observ[1]>=3
    The lowest value is sorted first, so the code keeps occupations where the minimum value of the count variable is at least 3.
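One case the minimum-count check cannot see is an occupation that is entirely absent in one of the years, because a count of zero never appears in the data. A minimal sketch of a variant that also requires the occupation to be present in both years follows. It assumes the example data from #1 are in memory; min_observ, yr_tag, and n_years are hypothetical helper names, not part of the original thread.

```stata
* Assumes the example data from #1 are in memory.
* min_observ, yr_tag and n_years are hypothetical helper names.
capture drop observ
bysort occ_id year: egen observ = count(occ_id)

* Smallest occupation-year count within each occupation
egen min_observ = min(observ), by(occ_id)

* Number of distinct years in which the occupation appears
egen yr_tag = tag(occ_id year)
egen n_years = total(yr_tag), by(occ_id)

* Keep occupations with at least 3 individuals in both years
keep if min_observ >= 3 & n_years == 2
```

With the example data every occupation appears in both years, so this keeps the same occupations (2 and 5) as the one-line solution above.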



    • #3
      Thanks so much Andrew. It works directly for the case I stated.
      Do you have any idea how I could generate the variable I mentioned, which sums "hint" for each occupation across the two years? I need it in case I want to relax the condition and allow fewer than 3 observations in only one year.



      • #4
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(year indiv_id occ_id)
        1998  1 1
        1998  2 2
        1998  3 5
        1998  4 2
        1998  5 3
        1998  6 2
        1998  7 4
        1998  8 5
        1998  9 5
        1998 10 3
        1998 11 2
        2018  1 1
        2018  2 1
        2018  3 2
        2018  4 3
        2018  5 4
        2018  6 5
        2018  7 5
        2018  8 2
        2018  9 4
        2018 10 3
        2018 11 2
        2018 12 5
        end
        
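        * hint is 1/_N for each observation in an occupation-year with fewer than 3 individuals, else 0
        bys occ_id year: g hint = (_N<3) * (1/_N)
        * each such occupation-year sums to 1, so wanted counts the years below the threshold
        by occ_id: egen wanted = total(hint)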
        Result:

        Code:
        . l, sepby(occ)
        
             +------------------------------------------+
             | year   indiv_id   occ_id   hint   wanted |
             |------------------------------------------|
          1. | 1998          1        1      1        2 |
          2. | 2018          1        1     .5        2 |
          3. | 2018          2        1     .5        2 |
             |------------------------------------------|
          4. | 1998          2        2      0        0 |
          5. | 1998          4        2      0        0 |
          6. | 1998          6        2      0        0 |
          7. | 1998         11        2      0        0 |
          8. | 2018          3        2      0        0 |
          9. | 2018          8        2      0        0 |
         10. | 2018         11        2      0        0 |
             |------------------------------------------|
         11. | 1998         10        3     .5        2 |
         12. | 1998          5        3     .5        2 |
         13. | 2018          4        3     .5        2 |
         14. | 2018         10        3     .5        2 |
             |------------------------------------------|
         15. | 1998          7        4      1        2 |
         16. | 2018          9        4     .5        2 |
         17. | 2018          5        4     .5        2 |
             |------------------------------------------|
         18. | 1998          3        5      0        0 |
         19. | 1998          9        5      0        0 |
         20. | 1998          8        5      0        0 |
         21. | 2018         12        5      0        0 |
         22. | 2018          7        5      0        0 |
         23. | 2018          6        5      0        0 |
             +------------------------------------------+
        
        .
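Since wanted counts the years in which an occupation has fewer than 3 individuals, the strict and relaxed rules differ only in the cutoff applied to it. The sketch below assumes the example data from #1 are in memory. It rebuilds the count with an integer-valued variant based on egen's tag() function, which avoids relying on fractions such as 1/3 summing exactly to 1 if the threshold is ever raised. The names small, yr_tag, and n_small_years are hypothetical.

```stata
* Assumes the example data from #1 are in memory.
* small, yr_tag and n_small_years are hypothetical names.
capture drop small
capture drop yr_tag
capture drop n_small_years

* 1 for every observation in an occupation-year with fewer than 3 individuals
bysort occ_id year: generate byte small = (_N < 3)

* Flag exactly one observation per occupation-year
egen byte yr_tag = tag(occ_id year)

* Count the flagged occupation-years that are below the threshold
egen n_small_years = total(small * yr_tag), by(occ_id)

* n_small_years == 0: at least 3 individuals in both years (strict rule)
* n_small_years <= 1: at most one year below the threshold (relaxed rule)
keep if n_small_years <= 1
```

In the example data occupations 1, 3, and 4 fall below the threshold in both years, so the strict and relaxed rules both keep only occupations 2 and 5.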



        • #5
          Thanks so much
