  • Find the highest frequency.

    Dear All, suppose the data set is:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str75 firmname double profit float(profitsd id)
    "中科搏锐(北京)科技有限公司"                         0 14.516553 1
    "中科搏锐(北京)科技有限公司"                         0 14.516553 1
    "中科搏锐(北京)科技有限公司"       -32.459999084472656 14.516553 1
    "中科搏锐(北京)科技有限公司"                         0 14.516553 1
    "中科搏锐(北京)科技有限公司"                         0 14.516553 1
    "中米(北京)农业科技股份有限公司"  -196.3000030517578  82.23363 2
    "中米(北京)农业科技股份有限公司"  -196.3000030517578  82.23363 2
    "中米(北京)农业科技股份有限公司"  -196.3000030517578  82.23363 2
    "中米(北京)农业科技股份有限公司" -12.420000076293945  82.23363 2
    "中米(北京)农业科技股份有限公司"  -196.3000030517578  82.23363 2
    "优伴(北京)文化产业有限公司"                         0  45.66051 3
    "优伴(北京)文化产业有限公司"                         0  45.66051 3
    "优伴(北京)文化产业有限公司"                         0  45.66051 3
    "优伴(北京)文化产业有限公司"        -102.0999984741211  45.66051 3
    "优伴(北京)文化产业有限公司"                         0  45.66051 3
    "北京万合鸿瑞科技有限公司"               8.699999809265137  8.322645 4
    "北京万合鸿瑞科技有限公司"               -9.90999984741211  8.322645 4
    "北京万合鸿瑞科技有限公司"               8.699999809265137  8.322645 4
    "北京万合鸿瑞科技有限公司"               8.699999809265137  8.322645 4
    "北京万合鸿瑞科技有限公司"               8.699999809265137  8.322645 4
    end
    For each id, I want to replace every value of profit with the value that occurs most frequently within that id. For instance, all the values for id=1 should become 0 (the most frequent value). Similarly, for id=2, all the values should become -196.3000030518, and so on. Any suggestions are appreciated!
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    More or less, you want:
    Code:
    by id, sort: egen modal_value = mode(profit)
    But your question as posed is indeterminate. For some ids there may be two (or more) different values that occur equally often, and more often than any others. In that situation the code above returns a missing value, so you need some rule for breaking ties.

    Read -help egen- and scroll down to the information on the options you can specify to decide which to keep when there is more than one possible answer. If none of those options is suitable for your purposes, post back with a full explanation of how you want to handle this, and I'll try to craft some code for the purpose.
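
    As a minimal sketch of the tie-breaking options referred to above: -egen-'s mode() function accepts minmode and maxmode (among others) to pick the smallest or largest of the tied values. The variable names modal_value and modal_min are illustrative only, and choosing minmode is an assumption about how ties should be resolved, not something the original question specifies. The explicit double storage type is requested so that the result can match profit exactly.

    ```stata
    * Sketch only: modal_value and modal_min are illustrative variable names.
    * Without a tie-breaking option, mode() yields missing where modes are tied.
    by id, sort: egen double modal_value = mode(profit)

    * Count observations with no unique mode (0 for the example data above)
    count if missing(modal_value)

    * One possible tie-break (an assumption): take the smallest tied value
    by id, sort: egen double modal_min = mode(profit), minmode

    * Overwrite profit with the within-id modal value
    replace profit = modal_min
    ```

    In the example data every id has a single most frequent value, so both variables agree; they would differ only for ids with tied modes.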



    • #3
      Hi Clyde, thanks for your answer. In fact, the question was taken from someone else's post (http://bbs.pinggu.org/thread-6540379-1-1.html).
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)
