Hello everyone,
I have a problem with inconsistent coding of a variable I'm working with and would like to automate the clean-up, however I'm stuck on how to do it. Since my data is confidential I will try to illustrate the problem with an example:
I have individual level panel data, where at each point in time for each person their place of work (variable "wp") is recorded. Unfortunately over time the coding changed and often differnt values and different labels are assigned to the same place over time. However, one part of the label (normally the name of the city) is contained in each label. So if one would look at the values of "wp" and its labels it would look something like this:
As one can see, the different labels all contain the name of the city ("Dresden" and "Berlin") but have different values and pre-/post-fixes. The values are somewhat mixed, so the values for one place have not to be consecutive. So the only option I would see to clean this up swiftly, is to supply Stata with the names that should be in every label of one place and then let it search for these names in all the labels and collect those labels and values that contain these "unchanging" names.
I tried something like this:
where "cities" is the name of the object that contains the labels. However, this obsviously does not do the trick since it only collects those variables for which the labels exactly match "Dresden" and "Berlin". However, what I would need is selection based on a part of the label. I looked at the "lookfor" command as well but it doesn't seem to do the trick neither. If anyone has an idea how to solve this, please let me know. If this needs clarification I'm happy to supply it.
I have a problem with inconsistent coding of a variable I'm working with and would like to automate the clean-up, however I'm stuck on how to do it. Since my data is confidential I will try to illustrate the problem with an example:
I have individual level panel data, where at each point in time for each person their place of work (variable "wp") is recorded. Unfortunately over time the coding changed and often differnt values and different labels are assigned to the same place over time. However, one part of the label (normally the name of the city) is contained in each label. So if one would look at the values of "wp" and its labels it would look something like this:
Value | Label |
110 | "Dresden" |
113 | "KR Dresden" |
117 | "PR Berlin Stadt" |
119 | "Berlin" |
221 | "Dresden KRF" |
As one can see, the different labels all contain the name of the city ("Dresden" and "Berlin") but have different values and pre-/post-fixes. The values are somewhat mixed, so the values for one place have not to be consecutive. So the only option I would see to clean this up swiftly, is to supply Stata with the names that should be in every label of one place and then let it search for these names in all the labels and collect those labels and values that contain these "unchanging" names.
I tried something like this:
Code:
local names "Dresden Berlin" foreach v of local namen{ replace wp_new=v2+1 if v2!=. replace wp_new=1 if wp=="`v'":cities }
Comment