
  • Understanding how -encode- determines label values

    Hi all,

    I'm running Stata 12 and noticed something odd about how encode assigns numeric values and labels. Below I encode some example string variables in two ways:

    1) I encode each string variable using a value label whose text is identical to the original string values
    2) I encode one of the string variables using a value label whose text differs from the original string values

    With approach 1, the strings are encoded in the intended order when viewed with labels, and tabulating v1 without its labels shows the values (1, 2, 3); the same holds for v2.

    With approach 2, the strings are again shown in the intended order when viewed with labels, but tabulating v2 without its labels now shows the values (4, 5, 6).

    Why?

    For my actual data I need v2 to have the values (1, 2, 3), but I do not want the attached labels to be identical to the original strings. I suspect the problem lies in how I'm using label define, but I have been unable to troubleshoot this seemingly trivial issue.

    Code:
     
      // 1) define labels same as string variables
      
      clear all
      inp str20(v1)
      "Second string"
      "String one"
      "And number three"
      end
      
      input str20(v2)
      "Third string"
      "Another string one"
      "Second again"
      end
      
      label define order1 1 "String one" 2 "Second string" 3 "And number three"
      encode v1, g(_v1) label(order1)
      drop v1
      rename _v1 v1
      
      label define order2 1 "Another string one" 2 "Second again" 3 "Third string"
      encode v2, g(_v2) label(order2)
      drop v2
      rename _v2 v2
      
      tab v1
      tab v1, nol
      tab v2
      tab v2, nol
      
      
      // 2) define labels different from string variables
      
      clear all
      inp str20(v1)
      "Second string"
      "String one"
      "And number three"
      end
      
      input str20(v2)
      "Third string"
      "Another string one"
      "Second again"
      end
      
      label define order3 1 "String one" 2 "Second string" 3 "And number three"
      encode v1, g(_v1) label(order3)
      drop v1
      rename _v1 v1
      
      label define order4 1 "One" 2 "Two" 3 "Three"
      encode v2, g(_v2) label(order4)
      drop v2
      rename _v2 v2
      
      tab v1
      tab v1, nol
      tab v2
      tab v2, nol
     

  • #2
    Andrew,

    As I understand encode with the label() option, if the text in the value label doesn't match the strings in your data (as with order4 in your second example), the strings that are in your data but not in the value label are added to that label. Because order4 already uses the values 1, 2, and 3, the added strings get the values 4, 5, and 6.
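
    One possible way to get the values (1, 2, 3) with different display labels is sketched below: encode against a label whose text matches the strings, then attach a second label to the same numeric codes. This is only a sketch using the example data from the question. The label names order2 and order4 are reused from the original post, and the noextend option is added here on the assumption that you would rather have encode stop with an error than silently extend the label.

```stata
clear all
input str20 v2
"Third string"
"Another string one"
"Second again"
end

// Step 1: encode against a label whose text matches the strings exactly.
// noextend asks encode to refuse to run if any string in v2 is missing
// from order2, instead of appending it to the label with a new value.
label define order2 1 "Another string one" 2 "Second again" 3 "Third string"
encode v2, generate(_v2) label(order2) noextend
drop v2
rename _v2 v2

// Step 2: attach a different label to the same numeric codes 1, 2, 3.
label define order4 1 "One" 2 "Two" 3 "Three"
label values v2 order4

// Check the result: labelled view, then the underlying numeric values.
tab v2
tab v2, nolabel

// Inspect what each label actually contains.
label list order2 order4
```

    Running label list on the label used in the original approach 2 is also a quick way to see the extra entries that encode appended.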

    Regards,
    Joe


    • #3
      Hi Joe,

      Makes sense to me now. Thanks for clarifying.

      Best,

      Andrew
