Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating new variable that picks up the a string value with the most frequency

    Hi

    Please, I want to create a string variable that picks up the value "EN" from the string variable location_ncreif (as shown in the example) as this is the value with the highest frequency
    i would normally do the following

    bysort snl_instn_key year location_ncreif : gen ttl_no_location_ncreif = _N
    bysort snl_instn_key year : egen max_no = max(ttl_no_location_ncreif)
    bysort snl_instn_key year : gen newvar = location_ncreif if max_no == ttl_no_location_ncreif

    (as shown in the second example)

    but that would leave the last 2 observations without the value "EN" because, of course, they don't match the condition

    so please, how can i create a variable with the value "EN" for the whole year
    and is there a command where i can directly generate a variable that picks up the string value with the highest frequency?

    Thank you in advance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 snl_instn_key float year str9 location_ncreif
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "EN"       
    "102910" 2002 "SE"       
    "102910" 2002 "unkncreif"
    end


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 snl_instn_key float year str9 location_ncreif float(ttl_no_location_ncreif max_no) str9 ncreif_FE_no
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "EN"        19 19 "EN"
    "102910" 2002 "SE"         1 19 ""  
    "102910" 2002 "unkncreif"  1 19 ""  
    end



  • #2
    Code:
    bysort snl_instn_key year location_ncreif : gen ttl_no_location_ncreif = _N
    by snl_instn_key year (ttl_no_location_ncreif), sort: gen wanted = location_ncreif[_N]
    Note: you do not say what you want to do if two different values of location_ncreif are tied for the largest number of occurrences. The above code will break this tie randomly and irreproducibly. If you have a rule for how you want to break such ties, the code can be modified to take it into account.

    Comment


    • #3
      Thank you so much Schechter. The code much helped. Also thank you for raising the note, i was thinking of how to break the tie theoretically to avoid/minimize the basis. But in the meantime, please, how can i modify the code such that the variable "wanted" pick up the two names if it is a tie? for example, ENSE or so? Thanks again

      Comment


      • #4
        The mode() function of egen accepts string arguments IIRC.

        Comment


        • #5
          Thank you Cox. Also, that works great. Additionally, it automatically identifies the tie cases by generating missing values for them.

          Comment

          Working...
          X