  • Removing duplicate words from a string variable

    I have a string variable holding a list of ICD codes in each observation. Some of the codes are repeated.
    One record looks like this: "I120 N185 I509 D649 J159 A402 K658 I109 J960 N185 I509 I109 R104X I120 I509 N185"
    There are duplicates of I120, N185, I509 and I109. How can I remove them?
    I tried this https://stackoverflow.com/questions/...he-same-string but it seems to remove duplicates only when they follow one directly after the other.

  • #2
    Two methods here.


    Code:
    clear
    set obs 1 
    gen testex = "I120 N185 I509 D649 J159 A402 K658 I109 J960 N185 I509 I109 R104X I120 I509 N185"
    gen id = 1 
    
    split testex, gen(word)
    local nvars : word count `r(varlist)'
    
    * method 1: blank out any word that repeats an earlier word in the same observation
    
    quietly forval j = 2/`nvars' { 
        local K = `j' - 1 
        forval k = 1/`K' { 
            replace word`j' = "" if word`k' == word`j' & word`k' != "" 
        }
    }
    
    egen wanted = concat(word*), p(" ")
    
    l wanted 
    
    * method 2: reshape long, keep the first occurrence of each word, reshape wide again
    
    keep id word*  
    reshape long word, i(id) j(which)
    
    bysort id word (which) : keep if _n == 1 
    bysort id (which) : replace which = _n 
    reshape wide word, i(id) j(which)
    
    egen wanted = concat(word*), p(" ") 
    
    l wanted
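    Note also itrim(), which reduces runs of multiple internal blanks to a single blank. That is useful here because words that were blanked out can leave extra spaces in wanted.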
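
    As a minimal sketch of that cleanup step, assuming wanted has been created by either method above, the extra blanks could be removed like this. The wrapping in trim() is an addition not in the code above; it strips any leading or trailing blanks as well.

    Code:
    * assumes the string variable wanted already exists (see above)
    * itrim() collapses consecutive internal blanks to one blank
    * trim() removes leading and trailing blanks
    replace wanted = trim(itrim(wanted))
    
    l wanted

    In recent versions of Stata the same functions are documented as stritrim() and strtrim(); the older names should still work.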


