I have a list of street addresses, cities, and a number that was assigned to each observation based (roughly) on neighborhood/area. However, some observations are missing the number, and I want to try to fill those in. The best way I can think of to do this is by matching on street name within a city. Of course, this is a little difficult because a lot of the address data is fairly messy, and addresses don't follow perfect patterns in the first place.
The first part is fairly straightforward: I just split the address into words and then create a new variable from address2 and address3, which should (usually) be most of the street name, or at least enough to match.
Code:
split address
gen new_address = address2 + " " + address3
The second part is where I'm getting tripped up. I think I'd want something like the following (I'm deliberately writing this in plain words rather than code, because I'm just describing what I think I'd need):
gen matched_string = 1 if [multiple observations have the same city and new_address]
Then I could just replace number wherever I get a match:
Code:
gen number_2 = .
replace number_2 = number if matched_string == 1
If someone could help me with that middle part, I'd appreciate it. Here is some example data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str22 address str12 city byte number
"12 Grassy Knoll Rd"     "Springfield"  13
"62 Prince Street"       "Woodside"     42
"8732 Rockledge Road"    "Woodside"     23
"306 Howard Street"      "Woodside"     34
"9453 NW Hill Field Dr." "Woodside"     22
"123 Winter Oaks Lan"    "Springfield"   .
"100 Bo Mountain Dr"     "Woodside"     51
"711 Church St."         "Woodside"     13
"9274 San Juan Court"    "Collegeville" 44
"40 Pilgrim Drive"       "Collegeville" 78
"93 Redwood Street"      "Collegeville" 87
"Piper St. PO Box 10"    "Collegeville" 56
"150 Winter Oaks Lane"   "Springfield"  76
"1000 Prince St"         "Springfield"   .
"5 NW Hill Field"        "Woodside"      .
end
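For what it's worth, here is a rough sketch of the kind of thing I imagine for the middle part, assuming that by-groups on city and new_address are a reasonable tool for this. It assumes the example data above is loaded and new_address has already been created as described. Two caveats I'm aware of: if a group contains several different nonmissing numbers, this simply takes the smallest one, and addresses with fewer than three words would produce a nearly blank new_address that could group unrelated observations.

```stata
* Flag groups in which more than one observation shares city and new_address
bysort city new_address: gen byte matched_string = _N > 1

* Within each group, sort by number (missing values sort last in Stata),
* then copy the first, i.e. smallest nonmissing, number to every member
* of a flagged group; unflagged observations are left missing
bysort city new_address (number): gen number_2 = number[1] if matched_string == 1

* Inspect the result
list address city number matched_string number_2, sepby(city)
```

If this does what I think it does, "123 Winter Oaks Lan" should pick up 76 from "150 Winter Oaks Lane", and "5 NW Hill Field" should pick up 22 from "9453 NW Hill Field Dr.", while "1000 Prince St" stays missing because the only other Prince address is in a different city.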
