  • strgroup 1.0.5 released

    The Stata command strgroup is updated to version 1.0.5 with a 4.5x performance improvement. Install with:

    Code:
    net install strgroup, from("https://raw.githubusercontent.com/reifjulian/strgroup/master") replace

    strgroup performs fuzzy string matching using the Levenshtein edit distance. It groups similar strings based on a user-specified similarity threshold, which is useful for identifying potential matches between datasets that don't merge cleanly due to typos, abbreviations, or other inconsistencies. Detailed documentation is available at https://github.com/reifjulian/strgroup. For syntax and usage instructions, type help strgroup at the Stata command prompt.
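
    To get a feel for how the threshold affects grouping, here is a minimal sketch on a made-up list of names. The names and the two threshold values are hypothetical and chosen only for illustration. The sketch uses only the gen() and threshold() options; see help strgroup for exactly how the edit distance is compared against the threshold.

    Code:
    clear
    input str20 name
    "Acme Corp"
    "Acme Corp."
    "Acme Corporation"
    "Zenith Ltd"
    end

    * Stricter threshold: only near-identical strings should share a group
    strgroup name, gen(strict) threshold(0.15)

    * Looser threshold: more distant variants may be pulled into the same group
    strgroup name, gen(loose) threshold(0.5)

    * Compare the two group assignments side by side
    list name strict loose

    Comparing the strict and loose columns shows which strings are grouped together only once the threshold is relaxed.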

    Example: Identify potential matches between two datasets that didn't merge.
    Code:
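    * Build a "using" dataset in which one make contains a typo
    sysuse auto, clear
    tempfile t
    keep make price
    replace make = make + "a" in 5
    save "`t'"

    * Merge it onto the original makes; the altered record will not match
    sysuse auto, clear
    keep make
    merge 1:1 make using "`t'"

    * Group the unmatched records by string similarity
    strgroup make if _merge!=3, gen(group) threshold(0.25)
    list make group if _merge!=3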
    
           +-------------------------------+
           | make                    group |
           |-------------------------------|
        5. | Buick Electra 225a          1 |
       79. | Buick Electra 225           1 |
           +-------------------------------+

    Associate Professor of Finance and Economics
    University of Illinois
    www.julianreif.com