  • Dealing with unifying string observations

    Hi all and thanks in advance for the help.

    I am trying to merge crime data from across the US, which I collected from different cities. While some cities use unified codes for the type of crime (NIBRS, NCIC), others just write in the crime as free text.

    Thankfully, most write it in rather similarly (e.g., dataset A: "Larceny" while dataset B: "Larceny - Theft"), but I am struggling with how to unify these 'unique' datasets with each other. Right now I think I need to go through the crimes manually and match them, then assign a numeric code to each type of crime, then replace the string with a uniform value for that crime.

    Does anyone have another way in mind?
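
The manual workflow described above (match spellings, standardize the string, assign a numeric code) could be sketched in Stata roughly as follows. This is only a minimal illustration: the variable names crime_raw, crime_key, crime_std and crime_code and the three sample spellings are made up for the example, and a real crosswalk would need one line (or one row in a lookup file) per offense.

```stata
* Hypothetical sample: three raw spellings of the same offense
clear
input str20 crime_raw
"Larceny"
"Larceny - Theft"
"LARCENY "
end

* Normalize case and whitespace so trivial differences disappear
generate crime_key = lower(strtrim(stritrim(crime_raw)))

* Hand-built crosswalk: map each normalized spelling to one standard label
generate crime_std = ""
replace crime_std = "larceny" if inlist(crime_key, "larceny", "larceny - theft")

* Count spellings that the crosswalk does not cover yet
count if crime_std == ""

* Convert the standard label into a labeled numeric variable
encode crime_std, generate(crime_code)
```

Normalizing first keeps the hand-matching list short, since only genuinely different wordings then need an entry. With many cities, the same mapping could instead live in a separate crosswalk dataset that is joined on with -merge m:1-.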

  • #2
    Michael:
    I'd check whether the -split- command can be of some help.
    Kind regards,
    Carlo
    (Stata 19.0)
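
As a rough sketch of the -split- suggestion, assuming the extra detail is always separated by " - " as in the "Larceny - Theft" example (the variable name crime_raw and the stub part are hypothetical):

```stata
* Hypothetical sample with and without a " - " suffix
clear
input str20 crime_raw
"Larceny"
"Larceny - Theft"
end

* Split on the " - " separator; new variables are named part1, part2, ...
split crime_raw, parse(" - ") generate(part)

* part1 holds the leading category, part2 any trailing detail
list crime_raw part1 part2
```

Here part1 would serve as a first-pass common key; whether that is enough depends on how the other cities format their entries.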


    • #3
      The solutions to these kinds of problems are often quite specific to the exact nature of messiness in your data. You've given one example, but it would be good to have a proper data example (use the dataex command) with a decent sample of the types of problems you want to fix.
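
For illustration, producing such an example might look like the following, assuming -dataex- is available (it ships with recent Stata releases; on older installations it can be obtained with -ssc install dataex-). The variables city and crime_raw are placeholders for whatever the real data contain.

```stata
* Hypothetical stand-in for the real data
clear
input str12 city str20 crime_raw
"City A" "Larceny"
"City B" "Larceny - Theft"
end

* Print a copy-and-paste data example for the listed variables
dataex city crime_raw
```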