  • How to keep only the observations that are not repeated on a single string variable?

    Hello,

    I have a dataset in which a single variable has repeated values, and that variable is a string. I would like to write a program that takes, for example, the first observation of the variable, compares it with all the others, and, if the value is repeated, deletes all but one of those observations, so that each value appears only once instead of many times.

    Something like this:

    var x
    1 SAN JAVIER
    2 SAN JAVIER
    3 MATEGUA
    4 OROBAYAYA
    5 SAN JAVIER
    6 OROBAYAYA
    7 SUCRE

  • #2
    All you need is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte var str10 x
    1 "SAN JAVIER"
    2 "SAN JAVIER"
    3 "MATEGUA"   
    4 "OROBAYAYA" 
    5 "SAN JAVIER"
    6 "OROBAYAYA" 
    7 "SUCRE"     
    end
    
    * keep one observation per distinct value of x (which one is kept is arbitrary)
    by x, sort: keep if _n == 1
    Now, there is a problem. The first and second observations both have x = "SAN JAVIER", as does the fifth. You say you want to retain only one of those, but you don't say which one, and you will be left with a different value of var depending on how you do it. The code above keeps one of them at random, and it will not necessarily be the same one each time you run the code. It looks to me like var is not of great importance; it is just a sequence number that can easily be replaced. But if your real data have other variables, you may find that this operation discards important information, and your downstream analyses may become inconsistent garbage. So give the matter some thought.
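    If you need the choice to be reproducible, one option (a sketch, assuming as in the example that var uniquely identifies observations) is to use var as a secondary sort key, so that within each value of x the observation with the smallest var is always kept. The sketch also uses -duplicates tag- first so you can look at what is about to be dropped; the new variable name dup is just an illustrative choice.

```stata
* Example data from #1 (var assumed to be a unique sequence number)
clear
input byte var str10 x
1 "SAN JAVIER"
2 "SAN JAVIER"
3 "MATEGUA"
4 "OROBAYAYA"
5 "SAN JAVIER"
6 "OROBAYAYA"
7 "SUCRE"
end

* Flag repeated values of x: dup counts the other observations sharing that value
duplicates tag x, generate(dup)
list var x dup if dup > 0

* Sort by x, and by var within x, then keep the first observation of each x.
* Because var is unique, the same observation is kept on every run.
bysort x (var): keep if _n == 1
list
```

    To keep the last occurrence instead, change the condition to _n == _N. If your real data have other variables that differ across the repeated observations, check the first listing before dropping anything.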

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also lets those who want to help you create a faithful copy of your example on which to try out their code, which in turn makes it more likely that their answer will actually work in your data.
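    As a minimal illustration, using the auto dataset that ships with Stata (chosen here only because everyone has it), this prints the first five observations of two variables as -input- code that others can paste and run. With your own data you would substitute your variable names and range.

```stata
* Load a dataset that ships with Stata (used here only as an illustration)
sysuse auto, clear
* Print observations 1-5 of make and price as runnable -input- code
dataex make price in 1/5
```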



    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Thank you very much for the code; I will run it. I will also use -dataex- next time.
