Search and Select based on Labels

Julian Emmler

Join Date: Mar 2017

Posts: 7
#1

Search and Select based on Labels

04 Dec 2017, 06:42

Hello everyone,

I have a problem with inconsistent coding of a variable I'm working with and would like to automate the clean-up, however I'm stuck on how to do it. Since my data is confidential I will try to illustrate the problem with an example:

I have individual level panel data, where at each point in time for each person their place of work (variable "wp") is recorded. Unfortunately over time the coding changed and often differnt values and different labels are assigned to the same place over time. However, one part of the label (normally the name of the city) is contained in each label. So if one would look at the values of "wp" and its labels it would look something like this:

Value Label

110 "Dresden"

113 "KR Dresden"

117 "PR Berlin Stadt"

119 "Berlin"

221 "Dresden KRF"

As one can see, the different labels all contain the name of the city ("Dresden" and "Berlin") but have different values and pre-/post-fixes. The values are somewhat mixed, so the values for one place have not to be consecutive. So the only option I would see to clean this up swiftly, is to supply Stata with the names that should be in every label of one place and then let it search for these names in all the labels and collect those labels and values that contain these "unchanging" names.

I tried something like this:

Code:

local names "Dresden Berlin" foreach v of local namen{ replace wp_new=v2+1 if v2!=. replace wp_new=1 if wp=="`v'":cities }

where "cities" is the name of the object that contains the labels. However, this obsviously does not do the trick since it only collects those variables for which the labels exactly match "Dresden" and "Berlin". However, what I would need is selection based on a part of the label. I looked at the "lookfor" command as well but it doesn't seem to do the trick neither. If anyone has an idea how to solve this, please let me know. If this needs clarification I'm happy to supply it.
Tags: None
Felix Holub

Join Date: Jan 2016

Posts: 42
#2

04 Dec 2017, 09:23

If I were you I'd translate the variable into a string. Then apply certain tools, say regular expressions, to the string, and then (if you want) back into a numerical variable using encode.
Comment

Value	Label
110	"Dresden"
113	"KR Dresden"
117	"PR Berlin Stadt"
119	"Berlin"
221	"Dresden KRF"

Announcement

Search and Select based on Labels

Comment