Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keeping variables with the highest three values

    Hello,

    I have a dataset with 100 countries (please see the sample screenshot below) where I would like to keep only 3 variables with the highest values for each country.

    I was not able to come up with the solution/code. I would really appreciate if you could share your thoughts on this.

    Many thanks in advance.

    Click image for larger version

Name:	sa.PNG
Views:	1
Size:	6.7 KB
ID:	1349896




  • #2
    Please study http://www.statalist.org/forums/help#stata on not presuming Excel. not using screenshots and on giving us data as CODE we can copy, paste and run.

    In general, there seems be no reason why the three highest values for each country should occur in the same variables.

    This may help. See also http://www.stata-journal.com/sjpdf.h...iclenum=pr0046

    Code:
    . clear
    
    . set obs 2
    obs was 0, now 2
    
    . gen country = cond(_n == 1, "X", "Y")
    
    . version 10: set seed 2803
    
    . foreach v in healthcare crime economy education environment {
      2.         gen `v'_3ip = runiform()
      3. }
    
    . reshape long @_3ip, i(country) j(which) string
    (note: j = crime economy education environment healthcare)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                        2   ->      10
    Number of variables                   6   ->       3
    j variable (5 values)                     ->   which
    xij variables:
    crime_3ip economy_3ip ... healthcare_3ip  ->   _3ip
    -----------------------------------------------------------------------------
    
    . rename _3ip score
    
    . sort country score
    
    . list
    
         +----------------------------------+
         | country         which      score |
         |----------------------------------|
      1. |       X         crime   .2176738 |
      2. |       X       economy   .3009803 |
      3. |       X    healthcare   .5476295 |
      4. |       X     education    .827122 |
      5. |       X   environment   .9975566 |
         |----------------------------------|
      6. |       Y       economy   .1240616 |
      7. |       Y     education   .2128766 |
      8. |       Y    healthcare   .2639835 |
      9. |       Y         crime   .3344298 |
     10. |       Y   environment   .9873822 |
         +----------------------------------+
    
    . bysort country (score): keep if _n > _N - 3  
    (4 observations deleted)
    
    . by country : gen rank = _N - _n + 1
    
    . list
    
         +-----------------------------------------+
         | country         which      score   rank |
         |-----------------------------------------|
      1. |       X    healthcare   .5476295      3 |
      2. |       X     education    .827122      2 |
      3. |       X   environment   .9975566      1 |
      4. |       Y    healthcare   .2639835      3 |
      5. |       Y         crime   .3344298      2 |
         |-----------------------------------------|
      6. |       Y   environment   .9873822      1 |
         +-----------------------------------------+
    
    . drop which
    
    . reshape wide score, i(country) j(rank)
    (note: j = 1 2 3)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                        6   ->       2
    Number of variables                   3   ->       4
    j variable (3 values)              rank   ->   (dropped)
    xij variables:
                                      score   ->   score1 score2 score3
    -----------------------------------------------------------------------------
    
    . list
    
         +------------------------------------------+
         | country     score1     score2     score3 |
         |------------------------------------------|
      1. |       X   .9975566    .827122   .5476295 |
      2. |       Y   .9873822   .3344298   .2639835 |
         +------------------------------------------+
    Last edited by Nick Cox; 19 Jul 2016, 03:48.

    Comment


    • #3
      Gokay:
      surely les efficient than Nick's solution:
      Code:
      . set obs 2
      
      . g country=_n
      
      . g A=runiform()*2
      
      . g B=runiform()*3
      
      . g C=runiform()*4
      
      . g D=runiform()*5
      
      . egen First=rowmax( A B C D)
      
      . foreach var of varlist A-D {
        2. replace `var'=. if `var'== First
        3.  }
      
      
      . egen Second=rowmax( A B C D)
      
      . foreach var of varlist A-D {
        2. replace `var'=. if `var'== Second
        3.  }
      
      
      . egen Third=rowmax( A B C D)
      
      . foreach var of varlist A-D {
        2. replace `var'=. if `var'== Third
        3.  }
      
      
      . drop A B C D
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        Dear Nick,

        Code works very well, many thanks for your help - really appreciated.

        Best wishes,

        Gokay


        PS: My apologies for posting the screenshot - I understand why it is not helpful and will stick to the rules from now on.


        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          Gokay:
          surely les efficient than Nick's solution:
          Code:
          . set obs 2
          
          . g country=_n
          
          . g A=runiform()*2
          
          . g B=runiform()*3
          
          . g C=runiform()*4
          
          . g D=runiform()*5
          
          . egen First=rowmax( A B C D)
          
          . foreach var of varlist A-D {
          2. replace `var'=. if `var'== First
          3. }
          
          
          . egen Second=rowmax( A B C D)
          
          . foreach var of varlist A-D {
          2. replace `var'=. if `var'== Second
          3. }
          
          
          . egen Third=rowmax( A B C D)
          
          . foreach var of varlist A-D {
          2. replace `var'=. if `var'== Third
          3. }
          
          
          . drop A B C D
          Dear Carlo,

          Many thanks for help, I will also try your code.

          Best wishes,

          Gokay

          Comment

          Working...
          X