Merging String Variables

Lidiya Bayliyeva

Join Date: Mar 2017

Posts: 1
#1


14 Mar 2017, 00:39

I would like to merge two data sets that share string variables. If I encode the string variable in one data set and then do the same in the other, I end up with two variables that have been encoded differently. My merge no longer works properly, because the numeric codes assigned to the labels differ. How do I ensure that Stata assigns the same numeric codes to the string values in both data sets when encoding, so that the merge works properly? Is there a way to create unique labels for each string category in each data set and then ask Stata to merge using the same labels?
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

14 Mar 2017, 02:28

Don't encode. The string format is fine for merging. Encoding is useful for dummy variables or for variables that sort observations into a limited number of groups. For xtset you'll also need a numeric variable, but I find it best to create a new variable for that purpose, either a copy of the ID variable that is then encoded, or one created with egen group.
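
A minimal sketch of that workflow, assuming two hypothetical files (firms_a.dta and firms_b.dta) that share a string identifier firm_name and a numeric year; none of these names come from the original question:

```stata
* Hypothetical file and variable names, for illustration only
use "firms_a.dta", clear

* Merge directly on the string identifier (plus year); no encode needed
merge 1:1 firm_name year using "firms_b.dta"
tabulate _merge

* For xtset, keep the string ID and add a separate numeric ID
egen firm_id = group(firm_name)
xtset firm_id year
```

Because firm_id is created after the merge, it is built from the combined data, so there is only one set of codes to keep track of.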
Nick Cox

Join Date: Mar 2014

Posts: 35659
#3

14 Mar 2017, 02:32

You can merge on string variables directly and that is the advised procedure if string variables are identifiers for your dataset.

If strings that mean the same to you are in fact different, e.g. through extra spaces, different punctuation, or differing use of lower and upper case, then that won't work as you want. For example, "lidiya", "Lidiya", " lidiya " and " Lidiya " are all different strings so far as Stata is concerned, and the corresponding observations won't match in a merge.
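
One way to handle the space and case differences is to clean the key in each dataset before merging. This is only a sketch: the variable names are made up, strtrim() and stritrim() assume Stata 14 or later (older versions use trim() and itrim()), and punctuation differences would need additional work, e.g. with subinstr().

```stata
* Toy data reproducing the four variants mentioned above
clear
input str12 name
"lidiya"
"Lidiya"
" lidiya "
" Lidiya "
end

* Trim leading/trailing spaces, collapse internal runs of spaces, lower-case
generate name_clean = lower(strtrim(stritrim(name)))

* All four observations should now share the single value "lidiya"
tabulate name_clean
```

The same cleaning has to be applied to both datasets, and the merge then uses name_clean as the key.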

In principle the way to ensure identical results from encode is documented and easy to explain: you just use a label() option to refer to a single set of value labels previously defined. But encode is a red herring here: if the value labels would be identical, then so would the corresponding strings be identical.
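
For completeness, the label() route might look like the sketch below. The file names, the variable region and the label name regionlbl are all hypothetical. The value labels built in the first dataset are saved to a do-file and reloaded before encoding the second, so strings seen in both get the same codes.

```stata
* First dataset: encode and let encode create the value label regionlbl
use "data_a.dta", clear
encode region, generate(region_n) label(regionlbl)
label save regionlbl using "regionlbl.do", replace
save "data_a_enc.dta", replace

* Second dataset: define the same value label first, then encode against it
use "data_b.dta", clear
do "regionlbl.do"
encode region, generate(region_n) label(regionlbl)
save "data_b_enc.dta", replace
```

Strings in the second dataset that are not yet in regionlbl are added with new codes unless the noextend option is specified, in which case encode stops with an error.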

If you have problems with strings that should be the same but aren't, then encode can't help you because it must produce, or can only work with, different value labels corresponding to different strings.