  • Keep then reshape

    I have sales data for two years (2017 and 2018). I want to keep only the top 5 firms of 2017 by sales; that is, the new data set will contain only those firms that were in the "top five" by sales in 2017 (so there will be exactly 5 firms in both 2017 and 2018). After that, I want to reshape the data set so that it has three variables: firm, sales2017, and sales2018.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year str1 firms long sales
    2017 "a"  46546
    2017 "b"   2343
    2017 "c" 234234
    2017 "d"  34342
    2017 "e"   2342
    2017 "f"   2343
    2017 "g"     34
    2017 "h"    234
    2017 "i"   6786
    2018 "a"    345
    2018 "b"   5675
    2018 "c"  56345
    2018 "d"   5675
    2018 "e"    453
    2018 "f"    543
    2018 "g"   2342
    2018 "h" 343443
    2018 "i"    234
    2018 "j"    343
    end

  • #2
    You have two firms, b and f, tying for 5th place in 2017 at 2343, and the next firm, e, is just below at 2342. Perhaps you made these data up, but this sort of quirk shows up the arbitrariness of this kind of procedure. Here I make a start and retire midway, disenchanted with the entire goal:

    Code:
    . bysort year (sales) : gen wanted = year == 2017 & _n > _N - 5
    
    . bysort firms (year) : replace wanted = wanted[1] 
    (5 real changes made)
    
    . 
    . sort wanted year sales 
    
    . list, sepby(wanted)  
    
         +--------------------------------+
         | year   firms    sales   wanted |
         |--------------------------------|
      1. | 2017       g       34        0 |
      2. | 2017       h      234        0 |
      3. | 2017       e     2342        0 |
      4. | 2017       f     2343        0 |
      5. | 2018       j      343        0 |
      6. | 2018       e      453        0 |
      7. | 2018       f      543        0 |
      8. | 2018       g     2342        0 |
      9. | 2018       h   343443        0 |
         |--------------------------------|
     10. | 2017       b     2343        1 |
     11. | 2017       i     6786        1 |
     12. | 2017       d    34342        1 |
     13. | 2017       a    46546        1 |
     14. | 2017       c   234234        1 |
     15. | 2018       i      234        1 |
     16. | 2018       a      345        1 |
     17. | 2018       d     5675        1 |
     18. | 2018       b     5675        1 |
     19. | 2018       c    56345        1 |
         +--------------------------------+
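    One way to make the tie visible, rather than letting the sort order break it silently, is to rank 2017 sales with the rank() function of egen and its field option, which gives the largest value rank 1 and gives tied values the same rank. This is a minimal sketch, assuming the example data from the question are in memory; the variable names rank2017 and top are hypothetical. It keeps both tied firms, which is only one of several defensible choices.

```stata
* Assumes the -dataex- example data from the question are in memory.

* Rank 2017 sales: with -field-, the largest value gets rank 1
* and tied values share the same rank (missing for 2018 rows)
egen rank2017 = rank(sales) if year == 2017, field

* Flag 2017 observations ranked 5th or better
* (a missing rank counts as larger than 5, so 2018 rows get 0)
gen byte top = year == 2017 & rank2017 <= 5

* Copy each firm's flag to all of its years (max within firm)
bysort firms (top) : replace top = top[_N]

* Firms b and f share rank 5, so this should report 6, not 5
count if top & year == 2017

* Keep the flagged firms and go wide: firms, sales2017, sales2018
keep if top
drop rank2017 top
reshape wide sales, i(firms) j(year)
list, noobs
```

    Whether to keep both tied firms, as here, or to break the tie by some other rule is a substantive decision about the data, not a coding one.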

    • #3
      Like Nick, I found your goal not well defined. I chose a different approach, reshaping first.
      Code:
      . reshape wide sales, i(firms) j(year)
      (note: j = 2017 2018)
      
      Data                               long   ->   wide
      -----------------------------------------------------------------------------
      Number of obs.                       19   ->      10
      Number of variables                   3   ->       3
      j variable (2 values)              year   ->   (dropped)
      xij variables:
                                        sales   ->   sales2017 sales2018
      -----------------------------------------------------------------------------
      
      . gsort -sales2017
      
      . list
      
           +-----------------------------+
           | firms   sal~2017   sal~2018 |
           |-----------------------------|
        1. |     c     234234      56345 |
        2. |     a      46546        345 |
        3. |     d      34342       5675 |
        4. |     i       6786        234 |
        5. |     b       2343       5675 |
           |-----------------------------|
        6. |     f       2343        543 |
        7. |     e       2342        453 |
        8. |     h        234     343443 |
        9. |     g         34       2342 |
       10. |     j          .        343 |
           +-----------------------------+
      
      // keep in 1/5
      // once you figure out how to break ties
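      Once a tie-break rule is chosen, William's approach can be finished with a second sort key. This is a minimal sketch, assuming the wide data produced by reshape wide above are in memory; breaking 2017 ties by larger 2018 sales is a hypothetical rule used only for illustration (alphabetical order, or keeping all tied firms, would be equally defensible).

```stata
* Assumes the wide data (firms, sales2017, sales2018) are in memory.

* Sort by 2017 sales, descending; break ties by 2018 sales,
* descending (a hypothetical rule). By default, gsort places
* missing values last in a descending sort, so firm j ends up last.
gsort -sales2017 -sales2018

* With the tie resolved, the first 5 observations are the top 5
keep in 1/5
list, noobs
```

      With these data, b (2018 sales 5675) is kept ahead of f (2018 sales 543).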

      • #4
        I agree with William that reshaping first is easier.

        • #5
          Many thanks, Nick Cox and William Lisowski. I did indeed make a mistake in creating the example data; I will keep this in mind next time. Thanks again!
