
  • Extract the name of the variable with the highest value in each observation

    Dear listers,

    I would like to fill a cell with the name of the variable that has the highest value in each row, taken from a list of variables. In other words, if there are variables A, B and C with weights 1.5, 1.6 and 1.9, respectively, I would like to show 'C', because that variable has the highest value in the row. This is exactly the same problem as in a previous post: https://www.statalist.org/forums/for...ariables-names

    However, my code returns 'no observations'.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(Ashley Avonmouth Bedminster)
    .47862031997167737  1.532034471353277   .335260066474374
    194.98321551370202  765.8470150728051  862.8241100970403
    1.0459228370345555 1.6267469649369481 .36140040372976157
    .47862031997167737  1.532034471353277   .335260066474374
    1.6624821901728994  5.725894733221313 3.0836531482405807
    end

    Code:
     local vars   Ashley-Bedminster
     
    egen  m2=rowmax(`vars')  
    gen m=""
    
    foreach var of local vars {
    replace m = "`var'" if m2==`vars'
    }
    tab m
    Any help would be much appreciated. I have probably missed some very basic syntax.

    Kind regards,

    Kim
    Last edited by sungwook kim; 23 Nov 2018, 07:13.

  • #2
    There are two issues that prevent you from getting your desired result:
    1. How you define a local macro
    2. Precision

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(Ashley Avonmouth Bedminster)
    .47862031997167737  1.532034471353277   .335260066474374
    194.98321551370202  765.8470150728051  862.8241100970403
    1.0459228370345555 1.6267469649369481 .36140040372976157
    .47862031997167737  1.532034471353277   .335260066474374
    1.6624821901728994  5.725894733221313 3.0836531482405807
    end
    
    *TO STORE FULL LIST OF VARIABLE NAMES IN A LOCAL MACRO
    qui ds
    local vars `r(varlist)'
    
    *ALTERNATIVELY, INCLUDE ALL VARIABLES EXPLICITLY
    local vars "Ashley Avonmouth Bedminster"
    
    *PRECISION: USE DOUBLE STORAGE TYPE, YOUR NUMBERS HAVE MANY DIGITS
    egen double m2= rowmax( `vars' )
    gen wanted=""
    foreach var of local vars {
    replace wanted = "`var'" if m2==`var'
    }
    drop m2
    Result:

    Code:
    
    . l
    
         +------------------------------------------------+
         |    Ashley   Avonmouth   Bedmins~r       wanted |
         |------------------------------------------------|
      1. | .47862032   1.5320345   .33526007    Avonmouth |
      2. | 194.98322   765.84702   862.82411   Bedminster |
      3. | 1.0459228    1.626747    .3614004    Avonmouth |
      4. | .47862032   1.5320345   .33526007    Avonmouth |
      5. | 1.6624822   5.7258947   3.0836531    Avonmouth |
         +------------------------------------------------+
    Last edited by Andrew Musau; 23 Nov 2018, 08:02.
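
    A minimal sketch of the precision point, using the first row of the data from #1. It assumes the default storage type has not been changed with -set type-, so that -egen- without an explicit type creates a float. The names m2_float and m2_double are hypothetical, chosen only for this illustration.

    ```stata
    * Sketch: a float row maximum compared with double data
    clear
    input double(Ashley Avonmouth Bedminster)
    .47862031997167737  1.532034471353277   .335260066474374
    end

    * no type given: stored as float (assuming the default type is float)
    egen m2_float = rowmax(Ashley Avonmouth Bedminster)
    * explicit double: keeps the full precision of the source variables
    egen double m2_double = rowmax(Ashley Avonmouth Bedminster)

    * show many digits so any rounding in the float copy is visible
    format Avonmouth m2_float m2_double %21.0g
    list Avonmouth m2_float m2_double

    * the equality test from the loop, once per storage type
    count if m2_float  == Avonmouth
    count if m2_double == Avonmouth
    ```

    If the float copy was rounded, the first count finds no match while the second finds the row, which is the 'no observations' symptom from #1.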



    • #3
      Dear Andrew,

      Thank you very much, as always, for your advice.

      If so, how can I put many consecutive string variables in a local macro? For example, is Ashley - Bedminster not possible?
      I would like to shorten the list rather than write out all of them.

      Kind regards,

      Kim



      • #4
        From #2:

        Code:
        *TO STORE FULL LIST OF VARIABLE NAMES IN A LOCAL MACRO
        qui ds
        local vars `r(varlist)'
        This will do it if you want to include all variables. However, if you need to restrict the list, e.g., to leave out string variables, consider the following example:

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . desc
        
        Contains data from C:\Program Files (x86)\Stata15\ado\base/a/auto.dta
          obs:            74                          1978 Automobile Data
         vars:            12                          13 Apr 2016 17:45
         size:         3,182                          (_dta has notes)
        -------------------------------------------------------------------------------------------------------------------------------------------------------
                      storage   display    value
        variable name   type    format     label      variable label
        -------------------------------------------------------------------------------------------------------------------------------------------------------
        make            str18   %-18s                 Make and Model
        price           int     %8.0gc                Price
        mpg             int     %8.0g                 Mileage (mpg)
        rep78           int     %8.0g                 Repair Record 1978
        headroom        float   %6.1f                 Headroom (in.)
        trunk           int     %8.0g                 Trunk space (cu. ft.)
        weight          int     %8.0gc                Weight (lbs.)
        length          int     %8.0g                 Length (in.)
        turn            int     %8.0g                 Turn Circle (ft.)
        displacement    int     %8.0g                 Displacement (cu. in.)
        gear_ratio      float   %6.2f                 Gear Ratio
        foreign         byte    %8.0g      origin     Car type
        -------------------------------------------------------------------------------------------------------------------------------------------------------
        Sorted by: foreign

        Here, all variables have a numeric storage type except "make", which is a string variable. Therefore, to exclude strings:

        Code:
        . qui ds, has(type numeric)
        
        . local vars `r(varlist)'
        
        . di "`vars'"
        price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign
        To exclude a small subset from the list, you can subtract elements in two local macros. Say I wanted to exclude "rep78", "length" and "turn" from the above list, I would just define a macro with the exclusion list and do the subtraction as below:

        Code:
        . local toexclude "rep78 length turn"
        
        . local mylist: list vars - toexclude
        
        . di "`mylist'"
        price mpg headroom trunk weight displacement gear_ratio foreign

        "If so, how can I put many consecutive string variables in a local macro? For example, is Ashley - Bedminster not possible?
        I would like to shorten the list rather than write out all of them."
        So, for all string variables:

        Code:
        ds, has(type string)
        local vars `r(varlist)'
        To exclude some string variables from that list, proceed exactly as above. That does not apply in your case, though: Ashley - Bedminster are numeric variables in the dataex example in #1.
        Last edited by Andrew Musau; 23 Nov 2018, 09:19.



        • #5
          I'm going to assume that in the actual dataset the variables in the variable list are adjacent to each other, as they are in the data in post #1. In that case, we don't need to worry about keeping or eliminating unwanted variables, and three changes are needed to your original code.
          Code:
          local vars   Ashley-Bedminster
           
          egen double m2=rowmax(`vars')  
          gen m=""
          
          foreach var of varlist `vars' {
          replace m = "`var'" if m2==`var'
          }
          tab m
          First, as Andrew mentioned, m2 needs to be double to accurately store the maximum value of the list of double variables.

          Second, "Ashley-Bedminster" is just a string. To have foreach treat it as a varlist and expand it into a list of individual variables, you need a different syntax: "foreach var of varlist" rather than "foreach var of local".

          Third, repair a typo in your original code: the comparison inside the loop should be m2==`var', not m2==`vars'.
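
          On the question in #3 about storing a run of consecutive variables in a local macro: one option is -unab-, which expands a varlist abbreviation into the full list of names. A minimal sketch, assuming the data from #1 are in memory; the names rowmaxval and maxname are hypothetical and used only to avoid clashing with variables created earlier in the thread.

          ```stata
          * expand the range once and keep the individual names in a local macro
          unab vars : Ashley-Bedminster
          display "`vars'"

          * the macro now holds separate names, so -foreach ... of local- works
          egen double rowmaxval = rowmax(`vars')
          gen maxname = ""
          foreach var of local vars {
              replace maxname = "`var'" if rowmaxval == `var'
          }
          ```

          With the three variables from #1, the display line should show Ashley Avonmouth Bedminster. The range depends on the current variable order, so it picks up whatever lies between the two endpoints in the dataset.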



          • #6
            Following the old dictum to "never compare floating point numbers for strict equality," my approach would be slightly different, i.e., no "if m2 == `var'".
            Code:
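            * maxval must exist before the loop; start it below any value in the data
            gen double maxval = mindouble()
            gen maxvar = ""
            foreach v of varlist Ashley-Bedminster {
               replace maxvar = "`v'" if (`v' > maxval)
               replace maxval = `v' if (`v' > maxval)
            }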



            • #7
              I agree with Mike Lacy. This is what I would do.

              Code:
              gen double maxval = Ashley
              gen whichmax = "Ashley"
              
              foreach v of varlist Avonmouth Bedminster {
                     replace whichmax = "`v'" if `v' > maxval
                     replace maxval = `v' if `v' > maxval
              }
              Although ties are unlikely to be an issue if the data resemble those in #1, note that when several variables are equal to the maximum, this approach identifies the first of them. If you have reason to want the last instead, use >= rather than >.

              Also, think very carefully about what you want to do with missing values. The code above does not ignore them.
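
              On the missing-values caveat: Stata treats numeric missing values as larger than any nonmissing number, so a plain `v' > maxval test lets a missing value win. The following is a minimal sketch of one possible policy, namely skipping missing values and leaving the result empty when every variable in a row is missing. It assumes the data from #1 are in memory and that maxval and whichmax do not already exist; it is one choice among several, not the only reasonable one.

              ```stata
              * start with "no maximum found yet" in every row
              gen double maxval = .
              gen whichmax = ""

              foreach v of varlist Ashley-Bedminster {
                  * take `v' only if it is nonmissing and beats the running maximum;
                  * both replaces see the same maxval because the name is updated first
                  replace whichmax = "`v'" if !missing(`v') & (missing(maxval) | `v' > maxval)
                  replace maxval   = `v'   if !missing(`v') & (missing(maxval) | `v' > maxval)
              }
              ```

              As in the code above, ties go to the first variable in the list. Rows in which all variables are missing keep maxval missing and whichmax empty.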



              • #8
                Dear all,

                Thank you very much for all of your valuable advice!

                Kind regards,

                Kim

