  • Check if values are in a list

    I'm working with a dataset that includes names of cities and states, and I need to make sure the cities and states are spelled right, e.g., that "Sacramento" isn't entered as "Sarcamento." I've got almost 15,000 observations with thousands of cities, so I can't do it manually, but I do have a list of the universe of all possible (correctly spelled) cities and states. Is there a way to check that every city-state entry I have is contained in that list?

    Cities and states are in separate variables for both the observations I have and the universe of possible names, so the variables look something like:

    City State PossibleCity PossibleState
    "City 1" "State 1" "PossibleCity 1" "PossibleState 1"
    "City 2" "State 2" "PossibleCity 2" "PossibleState 2"

    Thanks!

  • #2
    Merge the two files and look for mismatches.


    • #3
      What's the difference between City and PossibleCity if both are variables in the same record?


      Assuming you have the universe of correctly spelled city names, you would have to put each city name in quotes and list them in the code, as below:

      Code:
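      * flag observations whose city appears in the hard-coded list
      gen matchcity=1 if inlist(city, "cityname1", "cityname2", "cityname3...", ///
      "cityname5", "cityname6", "cityname7", "cityname8".....)

      * list the observations whose city was not found
      list city state if matchcity==.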
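      The above would be very time-consuming if you have to type each city and state name. Note also that inlist() accepts only a limited number of string arguments (10, per the Stata documentation), so a long list would have to be split across several inlist() calls.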

      Another option is to do an m:1 merge and then list the observations in the master data that did not match the using data (i.e., the data with all city names correctly spelled).
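
      A rough sketch of that merge approach, in case it helps. It assumes City, State, PossibleCity and PossibleState are all string variables sitting in the one dataset in memory, as in the example in post #1; those names come from that example and may differ in the real data. Run it in one go from a do-file so the temporary file name stays defined.

      ```stata
      * Sketch only: variable names are taken from the example in post #1.

      * 1. Build a lookup file holding each valid city-state pair once.
      preserve
      keep PossibleCity PossibleState
      drop if missing(PossibleCity)
      rename (PossibleCity PossibleState) (City State)
      duplicates drop
      tempfile universe
      save `universe'
      restore

      * 2. Match every observation against the lookup file on City and State.
      merge m:1 City State using `universe', keep(master match)

      * 3. _merge == 1 marks pairs in the data that are not in the universe.
      list City State if _merge == 1
      ```

      Matching is exact, so differences in capitalization or stray spaces will also show up as mismatches; cleaning both sets of variables the same way before step 1 may cut down on false alarms.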

      I am sure there is better code, or a Stata module, that does this more efficiently. Until someone posts that, try this.
