  • removing min value between variables, but only the first

    Hello Statalisters!

    Another potentially trivial question, but again something I'm incapable of finding a solution to.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(var1 var2 var3 var4 var5)
    69   . 50  . 30
    40  50 40 60 80
    20  25 20 30  .
    80 100 20 40 50
    end

    I have 4 variables, each a different task. For each observation, an efficiency score is reported if the observation carried out that task. For the purposes of my research I wish to remove the minimum value in each observation, but only one instance of it. For example, for the second observation I wish to remove one of the 40s, and for the 3rd observation I wish to remove the 20. I'm able to remove all minimum values with the following code, but I don't know how to remove just one:

    Code:
    foreach var of varlist var1-var4  {
     sum `var', meanonly
     qui gen `var'2 = `var' if `var' != r(min)
    
     }
    
    If anyone has any advice that would be much appreciated!
    
    Thanks in advance statalisters!!!

  • #2
    since you actually show 5 variables, I am a little confused; however, if your data set is relatively small, you can download -rowsort- (use -search- to find and get instructions for downloading) and then:
    Code:
    rowsort var1-var5, gen(newvar1-newvar5)
    you can then either delete newvar1 and use the other 4 or you can leave newvar1 in but just analyze the other 4

    if you have a large data set, follow the advice in the help file for -rowsort-

    note that you have not said how you want to use the resulting data; for certain uses that I can imagine, this would not be the correct strategy



    • #3
      I suspect that you have a poor data structure for what you want. Consider this

      Code:
      clear 
      
      input byte(var1 var2 var3 var4 var5)
      69   . 50  . 30
      40  50 40 60 80
      20  25 20 30  .
      80 100 20 40 50
      end
      
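      * row identifier, needed by reshape
      gen id = _n 
      
      * long layout: one row per id and variable
      reshape long var, i(id) j(which) 
      
      * within id, sorted by value, var[1] is the minimum (missings sort last);
      * drop the 2nd sorted value if it ties with the minimum, and drop missings
      bysort id (var) : drop if (var == var[1] & _n == 2) | missing(var) 
      
      list, sepby(id) 
      
      * back to the original wide layout
      reshape wide var, i(id) j(which) 
      
      list, sepby(id)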
      I've added the reshape wide at the end, but my instinct is that you will have many other problems if you persist with your present structure.
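
      For anyone who needs to stay in the wide layout, here is a minimal sketch of a different technique from the reshape code above: compute the row minimum with egen's rowmin() and blank only the first variable that equals it. It assumes the goal is to remove exactly one copy of the minimum in every observation, with ties resolved in favour of the first variable in var1-var5 order. The names rowmin, done and the _trim suffix are illustrative and do not come from the thread.

```stata
clear
input byte(var1 var2 var3 var4 var5)
69   . 50  . 30
40  50 40 60 80
20  25 20 30  .
80 100 20 40 50
end

* row minimum across the five variables (missing values are ignored)
egen rowmin = rowmin(var1-var5)

* flag: becomes 1 once the minimum has been met in this observation
gen byte done = 0

foreach v of varlist var1-var5 {
    * work on a copy so the original variable stays intact
    gen `v'_trim = `v'
    * blank only the first variable that equals the row minimum
    replace `v'_trim = . if `v' == rowmin & !done
    replace done = 1 if `v' == rowmin
}

list var1_trim-var5_trim
```

      On the example data this should blank var5 in the first observation, var1 in the second and third, and var3 in the fourth. Note that this is not equivalent to the reshape code, which drops a value only when the two smallest values in an observation are tied.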



      • #4
        Hi Both, thank you !

        Apologies Rich, I meant var1-var5 and I'm still not sure how your method gets me to the solution.

        Nick, thank you - yes, that is a possible solution, but indeed it does mess up the data a little later down the line. Presumably there's just no way round this with the current structure? I'm not sure how better to structure my data.

        Code:
        foreach var of varlist var1-var4  {  
        sum `var', meanonly  
         qui gen `var'2 = `var' if `var' != r(min)
         qui gen `var'3 = `var' if `var' != r(min)  
        }
        Do you think it would be possible to do something like this - and then for all the `var'3 variables, perhaps there is a way of only keeping 1 non missing value in a row? After that I could just merge the rest.

        Thanks again



        • #5
          I think I've already answered this, as my code includes a reshape wide for anyone insisting on that. I advise against that based on the evidence here (I can't infer anything else about your data or your goals), but it's in my code.

          I don't see how your latest code can possibly do what you want. Your minimum now arises from summarizing each variable separately, which is not at all what (I think you said) you wanted.
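
          To illustrate the distinction, a small sketch: summarize returns one minimum per variable, taken across all observations, whereas egen's rowmin() returns one minimum per observation, taken across the variables. The names colmin1 and rowmin are illustrative only.

```stata
clear
input byte(var1 var2 var3 var4 var5)
69   . 50  . 30
40  50 40 60 80
20  25 20 30  .
80 100 20 40 50
end

* minimum of var1 over all observations: a single number
summarize var1, meanonly
scalar colmin1 = r(min)
display colmin1

* minimum over var1-var5 within each observation: one value per row
egen rowmin = rowmin(var1-var5)
list var1-var5 rowmin
```

          Here the minimum of var1 is 20, while the row minima are 30, 40, 20 and 20, so comparing each variable with its own r(min) cannot single out the smallest score within an observation.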

          If I am misunderstanding you, then you may need to spell out more fully what you have and what you want. Explaining why would help too, as at first sight this is a bizarre request. If there were a rationale, another approach might become evident.

          Also, I don't understand the complaint that any solution messes up the data, as doing precisely that appears an inevitable consequence of what you are asking for, namely to overwrite some values with missing. Further, there is some arbitrariness in which of two or more tied values are omitted, so you need to ensure that doesn't bite.
