Understanding how -encode- determines label values

Andrew Dickens

Join Date: May 2014
Posts: 9

14 Jan 2015, 14:24

Hi all,

I'm running Stata 12 and noticed something odd about how encode assigns numeric values and labels. Here I encode some example string variables in two separate ways:

1) I encode the strings using value labels that are identical to the original string values
2) I encode one of the strings using a value label that is different from the original string values

With approach 1, the string variables are encoded in the intended order when viewed with labels, and tabulating v1 without its labels shows the values (1, 2, 3); the same holds for v2.

With approach 2, the string variables are again encoded in the intended order when viewed with labels, but tabulating v2 without its labels now shows the values (4, 5, 6).

Why?

For my actual data I need v2 to have the values (1, 2, 3), but I do not want the attached labels to be identical to the original strings. I suspect the problem is in how I'm using label define, but I have been unsuccessful in troubleshooting this seemingly trivial problem.

Code:

 
  // 1) define labels same as string variables
  
  clear all
  inp str20(v1)
  "Second string"
  "String one"
  "And number three"
  end
  
  input str20(v2)
  "Third string"
  "Another string one"
  "Second again"
  end
  
  label define order1 1 "String one" 2 "Second string" 3 "And number three"
  encode v1, g(_v1) label(order1)
  drop v1
  rename _v1 v1
  
  label define order2 1 "Another string one" 2 "Second again" 3 "Third string"
  encode v2, g(_v2) label(order2)
  drop v2
  rename _v2 v2
  
  tab v1
  tab v1, nol
  tab v2
  tab v2, nol
  
  
  // 2) define labels different than string variables
  
  clear all
  inp str20(v1)
  "Second string"
  "String one"
  "And number three"
  end
  
  input str20(v2)
  "Third string"
  "Another string one"
  "Second again"
  end
  
  label define order3 1 "String one" 2 "Second string" 3 "And number three"
  encode v1, g(_v1) label(order3)
  drop v1
  rename _v1 v1
  
  label define order4 1 "One" 2 "Two" 3 "Three"
  encode v2, g(_v2) label(order4)
  drop v2
  rename _v2 v2
  
  tab v1
  tab v1, nol
  tab v2
  tab v2, nol


Joe Canner

Join Date: Mar 2014

Posts: 580
#2

14 Jan 2015, 15:12

Andrew,

As I understand the encode command with the label() option, if the value label you supply doesn't contain the strings in your data (as with order4 in your 4th example), the strings that are in your data but not in the value label are added to it. Because order4 already uses 1, 2, and 3, the added strings get the values 4, 5, and 6.
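
A minimal sketch of this behavior and of one possible workaround, assuming the same example strings as in the original post. The variable names v2_ext and v2_num are hypothetical, chosen only for illustration. The first part lists order4 after encoding so the added entries are visible. The second part encodes against a label whose text matches the strings (with noextend, so encode stops rather than adding anything if a string is missing from the label), then overwrites the label text with label define, modify so that the codes stay 1, 2, 3.

```stata
clear all
input str20 v2
"Third string"
"Another string one"
"Second again"
end

// Part 1: label text that matches none of the strings in v2
label define order4 1 "One" 2 "Two" 3 "Three"
encode v2, generate(v2_ext) label(order4)
label list order4          // order4 now also contains the three strings from v2

// Part 2: encode against matching label text, then change the text only
label define order2 1 "Another string one" 2 "Second again" 3 "Third string"
encode v2, generate(v2_num) label(order2) noextend
label define order2 1 "One" 2 "Two" 3 "Three", modify

tabulate v2_num            // shown with the new label text
tabulate v2_num, nolabel   // underlying codes
```

This is only one way to do it. Whether it suits the actual data depends on being able to list every distinct string in the label definition up front.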

Regards,
Joe
Andrew Dickens

Join Date: May 2014

Posts: 9
#3

15 Jan 2015, 07:03

Hi Joe,

Makes sense to me now. Thanks for clarifying.

Best,

Andrew