Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Copy highest, 2nd highest,... nth highest value into columns 1, 2, ..., n


    Hi to all!
    I have a data set with Countries, operators and dates. I know want to have the highest date (most recent) per group into a new column that only contains most the recent dates across groups. The same with the 2nd ... nth highest dates. There's about up to 10 of those different dates. The dates within a country group and in one column should obviously be constant.
    Country Operator Date highest date 2nd highest date 3 highest date
    US AT&T 2017 2017 2016 2004
    US Sprint 2016 2017 2016 2004
    US AT&T 2004 2017 2016 2004
    UK EE 2004 2004 2001 .
    UK O2 2001 2004 2001 .
    France Orange 2012 2012 . .
    France SFR 2012 2012 . .
    Can anyone help me set up those columns to the right, please? I tried to work with groups and ranks but nothing really worked.
    Thanks a lot!
    Bests

  • #2
    Please read and act on FAQ Advice #12 and #6.

    Data example in requested form; show us your code attempted; error reports like "nothing really worked" don't allow us to explain what you did wrong; real names are preferred here.

    Stata has variables, not columns.

    That said, this shows some technique.

    Code:
    . webuse grunfeld, clear 
    
    . 
    . bysort company (mvalue) : gen year1 = year[_N] 
    
    . by company (mvalue) : gen year2 = year[_N-1] 
    
    . by company (mvalue) : gen year3 = year[_N-2] 
    
    . 
    . list company year mvalue year? in 18/20 
    
         +-------------------------------------------------+
         | company   year   mvalue   year1   year2   year3 |
         |-------------------------------------------------|
     18. |       1   1937   5387.1    1953    1954    1937 |
     19. |       1   1954   5593.6    1953    1954    1937 |
     20. |       1   1953   6241.7    1953    1954    1937 |
         +-------------------------------------------------+
    Last edited by Nick Cox; 12 Aug 2018, 05:45.

    Comment


    • #3
      Thanks! Sorry, I've just read over the relevant FAQ paragraphs and will change that.

      Thanks for your help. This approach however, only gives me the highest value as long as those observations have distinct years assigned to them. So, if there's 1954 assigned twice it gives me that as the highest and second highest value. Can this be prevented? Thanks!

      I tried

      Code:
      sort company year
      egen date_ranks = rank(year), by(company)
      bysort company: gen year_1 ==  year if rank == 1
      bysort company: gen year_2 ==  year if rank == 2
      and so on but my ranks are partly decimal numbers (avg ranks). Any idea how to fix this?
      It is more about putting the right year in the right variable (year_n). I can "fill up" all the observations per company easily.
      Thanks!
      Last edited by John Stata; 12 Aug 2018, 06:18.

      Comment


      • #4
        Data example in requested form that shows the difficulty please.

        Also cross-posted at https://stackoverflow.com/questions/...n-new-variable

        Please read the entirety of the FAQ Advice -- as every prompt reminds you -- including policy on cross-posting.
        Last edited by Nick Cox; 12 Aug 2018, 07:20.

        Comment


        • #5
          Code:
          bys company: egen rank = rank(year), f 
          tab rank, gen(ear_)
          
          foreach v of var ear_* {
          by company: egen y`v' = max(cond(`v'==1,year,.))
          }
          drop rank ear_*
          For future posts, please notice to use -dataex- (as guided in section 12.2 of FAQ) to give out a small example of your data.

          Comment

          Working...
          X