Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to recode variables based on frequency counts

    Hi again -- I have a categorical variable with 57 categories and would like to create a new variable that brings the number of categories down based on a rule that if a category has less than 10 observations, it will become a new "other" category. Please see my data below. Thank you for any guidance.

    Code:
     tab big_publishers
    
                  big_publishers |      Freq.     Percent        Cum.
    -----------------------------+-----------------------------------
                       2K Sports |          1        0.19        0.19
                             3DO |          1        0.19        0.37
           Acclaim Entertainment |         25        4.63        5.00
                      Activision |         42        7.78       12.78
                           Atari |         14        2.59       15.37
                           Atlus |          2        0.37       15.74
              BAM! Entertainment |          6        1.11       16.85
                          Bandai |          5        0.93       17.78
                     Buena Vista |          2        0.37       18.15
                          Capcom |         19        3.52       21.67

  • #2
    Scott:
    perhaps the following toy-example gives what you're after:
    Code:
    . set obs 27
    number of observations (_N) was 0, now 27
    
    . g big_publishers="2K Sports" in 1
    (26 missing values generated)
    
    . replace big_publishers="3DO" in 2
    (1 real change made)
    
    . replace big_publishers="Acclaim Entertainment" if big_publishers==""
    variable big_publishers was str9 now str21
    (25 real changes made)
    
    . bysort big_publishers: g count=_n
    
    . bysort big_publishers: egen flag= count(count)
    
    . bysort big_publishers: gen new_big_publishers=big_publishers if flag>=10
    (2 missing values generated)
    
    . bysort big_publishers: replace new_big_publishers="Others" if flag<10
    (2 real changes made)
    
    . list big_publishers count flag new_big_publishers
    
         +--------------------------------------------------------------+
         |        big_publishers   count   flag      new_big_publishers |
         |--------------------------------------------------------------|
      1. |             2K Sports       1      1                  Others |
      2. |                   3DO       1      1                  Others |
      3. | Acclaim Entertainment       1     25   Acclaim Entertainment |
      4. | Acclaim Entertainment       2     25   Acclaim Entertainment |
      5. | Acclaim Entertainment       3     25   Acclaim Entertainment |
         |--------------------------------------------------------------|
      6. | Acclaim Entertainment       4     25   Acclaim Entertainment |
      7. | Acclaim Entertainment       5     25   Acclaim Entertainment |
      8. | Acclaim Entertainment       6     25   Acclaim Entertainment |
      9. | Acclaim Entertainment       7     25   Acclaim Entertainment |
     10. | Acclaim Entertainment       8     25   Acclaim Entertainment |
         |--------------------------------------------------------------|
     11. | Acclaim Entertainment       9     25   Acclaim Entertainment |
     12. | Acclaim Entertainment      10     25   Acclaim Entertainment |
     13. | Acclaim Entertainment      11     25   Acclaim Entertainment |
     14. | Acclaim Entertainment      12     25   Acclaim Entertainment |
     15. | Acclaim Entertainment      13     25   Acclaim Entertainment |
         |--------------------------------------------------------------|
     16. | Acclaim Entertainment      14     25   Acclaim Entertainment |
     17. | Acclaim Entertainment      15     25   Acclaim Entertainment |
     18. | Acclaim Entertainment      16     25   Acclaim Entertainment |
     19. | Acclaim Entertainment      17     25   Acclaim Entertainment |
     20. | Acclaim Entertainment      18     25   Acclaim Entertainment |
         |--------------------------------------------------------------|
     21. | Acclaim Entertainment      19     25   Acclaim Entertainment |
     22. | Acclaim Entertainment      20     25   Acclaim Entertainment |
     23. | Acclaim Entertainment      21     25   Acclaim Entertainment |
     24. | Acclaim Entertainment      22     25   Acclaim Entertainment |
     25. | Acclaim Entertainment      23     25   Acclaim Entertainment |
         |--------------------------------------------------------------|
     26. | Acclaim Entertainment      24     25   Acclaim Entertainment |
     27. | Acclaim Entertainment      25     25   Acclaim Entertainment |
         +--------------------------------------------------------------+
    
    tab new_big_publishers
    
       new_big_publishers |      Freq.     Percent        Cum.
    ----------------------+-----------------------------------
    Acclaim Entertainment |         25       92.59       92.59
                   Others |          2        7.41      100.00
    ----------------------+-----------------------------------
                    Total |         27      100.00
    Last edited by Carlo Lazzaro; 25 May 2018, 11:52.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Does this help?

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . tab rep78
      
           Repair |
      Record 1978 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |          2        2.90        2.90
                2 |          8       11.59       14.49
                3 |         30       43.48       57.97
                4 |         18       26.09       84.06
                5 |         11       15.94      100.00
      ------------+-----------------------------------
            Total |         69      100.00
      
      . bysort rep78 : gen freq = _N
      
      . replace rep78 = 6 if freq < 10
      (15 real changes made)
      
      . label def rep78 6 "other"
      
      . label val rep78 rep78
      
      . tab rep78
      
           Repair |
      Record 1978 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                3 |         30       40.54       40.54
                4 |         18       24.32       64.86
                5 |         11       14.86       79.73
            other |         15       20.27      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      EDIT: Seems similar in spirit to Carlo's.

      Comment


      • #4
        Definitely the same in spirit and it worked brilliantly. Thanks to both of you. Very kind.

        Comment

        Working...
        X