Dealing with unifying string observations

Michael LoGiudice

Join Date: Aug 2022

Posts: 2
#1

22 Jun 2023, 03:08

Hi all and thanks in advance for the help.

I am trying to merge crime data from across the US, which I collected from different cities. While some cities use unified codes for the type of crime (NIBRS, NCIC), others just write in the crime as free text.

Thankfully, most write it in rather similarly (e.g., dataset A: "Larceny" while dataset B: "Larceny - Theft"), but I am struggling with how to unify these 'unique' datasets with each other. Right now I think I need to go through the crimes manually and match them, then assign a numeric code to each type of crime, and finally replace the string with a uniform value for that crime.
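
A minimal sketch of that plan in Stata might look like the following. The variable names (crime, crime_clean, crime_type, crime_code) and the toy values are assumptions for illustration only, not taken from the actual datasets. The idea is to normalize case and spacing first, map variants onto a common label, and then let -encode- assign the numeric codes:

```stata
* Toy data: all names and values here are hypothetical
clear
input str30 crime
"Larceny"
"Larceny - Theft"
"LARCENY "
"Burglary"
end

* Normalize case and whitespace before matching
generate str30 crime_clean = strtrim(stritrim(strlower(crime)))

* Collapse spelling variants onto one uniform label
generate str20 crime_type = ""
replace crime_type = "larceny"  if strpos(crime_clean, "larceny") > 0
replace crime_type = "burglary" if strpos(crime_clean, "burglary") > 0

* Check that the first three variants were unified
assert crime_type == "larceny" in 1/3

* Assign a numeric code, keeping the label as a value label
encode crime_type, generate(crime_code)
tabulate crime_code
```

Substring matching with strpos() is only one option and can over-match, so cross-tabulating crime_clean against crime_type afterwards is a sensible check.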

Does anyone have another way in mind?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17735
#2

23 Jun 2023, 02:43

Michael:
I'd check whether the -split- command can be of some help.

Kind regards,
Carlo
(Stata 19.0)
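
For reference, -split- breaks a string variable into parts at a chosen separator. A small sketch, assuming a hypothetical variable crime with hyphen-separated entries like the example in #1 (the stub name part is also an assumption):

```stata
* Toy data: names and values are hypothetical
clear
input str30 crime
"Larceny - Theft"
"Larceny"
"Assault - Aggravated"
end

* Split on the hyphen; this creates part1 and part2
split crime, parse("-") generate(part)

* Remove the blanks left around the separator
replace part1 = strtrim(part1)
replace part2 = strtrim(part2)

* The first two observations now share the same leading part
assert part1 == "Larceny" in 1/2
list
```

Whether the leading part is the right key depends on how each city orders its descriptions, so this is a starting point, not a general fix.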
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#3

23 Jun 2023, 02:47

The solutions to these kinds of problems are often quite specific to the exact nature of messiness in your data. You've given one example, but it would be good to have a proper data example (use the dataex command) with a decent sample of the types of problems you want to fix.
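
One way to build such an example is to reduce the data to one row per distinct spelling before calling dataex. This is only a sketch: the variable crime, the frequency variable n, and the toy values are hypothetical. dataex is included in recent Stata releases and can be installed with ssc install dataex on older ones.

```stata
* Toy data: names and values are hypothetical
clear
input str30 crime
"Larceny"
"Larceny - Theft"
"Larceny"
"Burglary"
end

* One row per distinct spelling, with its frequency in n
contract crime, freq(n)

* Most common spellings first
gsort -n

* Produce a copy-and-paste data example for the forum
dataex crime n
```

Note that -contract- replaces the data in memory, so run it on a copy (for example inside preserve and restore) when working with the real files.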