
  • Problem with Gen & string/numeric variable conversion

    Dear forum members,

    I am new to Stata and have a problem that I can't find a straightforward answer to. I'm working with a data set that has a mix of many string and numeric variables. When I run a command such as "gen X = Y-Z", I get a type mismatch error. If I then encode the string variable, the command goes through without an error, but the number in X is not correct (it is not the actual result of Y-Z). This happens every time I use a variable created with the "encode" command.

    Could you please advise how to resolve this?

    I appreciate your help.

    Best,
    Nathan

  • #2
    If a variable, though technically a string, actually holds numbers, you want the -destring- command; see
    Code:
    h destring
    Posting a -dataex- data example within CODE blocks makes it much easier to respond to questions; please read and follow the advice in the FAQ.


    • #3
      I agree with Rich Goldstein. The help and manual entries warn not to use encode if you need destring and there is a longer (either long-winded or gentle and leisurely) discussion at https://journals.sagepub.com/doi/pdf...867X1801800413

      In a nutshell: If your strings are akin to "Alabama" "Alaska" "Arizona" and you need a numeric variable, encode is right. Unless you specify otherwise you will get numeric values like 1 2 3 and the distinct strings will be mapped to value labels. That can be useful, as when tsset or xtset requires a numeric identifier.
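      A minimal sketch of that case, using a made-up panel (the variable names state, year and state_id are purely illustrative and not from the original question):
      Code:
      clear
      input str7 state year
      "Alabama" 2000
      "Alabama" 2001
      "Alaska"  2000
      "Alaska"  2001
      end
      * encode assigns 1, 2, ... in alphabetical order and keeps the names as value labels
      encode state, generate(state_id)
      * xtset needs a numeric panel identifier, which state_id now provides
      xtset state_id year
      * nolabel shows the underlying integer codes rather than the labels
      list state state_id, nolabel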

      If your strings are akin to "1" "10" "100" "2" "20" "200" and you want a numeric variable, destring is right, but watch out for why you have a string variable. There might be a problem like metadata in your observations, or commas used as decimal points, or missing values represented in an unStataish manner.
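      One way to check for such problems before converting, sketched on invented data (the variable y, its values, and the use of "NA" as a missing-value code are assumptions for illustration only):
      Code:
      clear
      input str8 y
      "1,5"
      "2,25"
      "NA"
      "10"
      end
      * list the values that cannot be read as numbers as they stand
      tabulate y if missing(real(y))
      * blank out the hypothetical missing-value code
      replace y = "" if y == "NA"
      * dpcomma tells destring to treat the comma as a decimal point
      destring y, generate(y_num) dpcomma
      list
      If the tabulate step turns up something unexpected, it is usually better to fix the cause than to reach for destring's force option, which silently turns anything nonnumeric into missing.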

      But encode is quite wrong here: if those were the only distinct values, they would be mapped to 1 2 3 4 5 6 and the strings would appear as value labels, so the new variable might look fine in a list or tabulate or the Data Editor. But typically the results of calculations with such a new variable will be quite wrong, as you have found.
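      The difference is easy to see side by side. A small sketch with the six strings above (the variable names s, s_enc and s_num are made up for the example):
      Code:
      clear
      input str3 s
      "1"
      "10"
      "100"
      "2"
      "20"
      "200"
      end
      * encode numbers the distinct strings 1 to 6 in alphabetical order
      encode s, generate(s_enc)
      * destring keeps the numeric values the strings actually represent
      destring s, generate(s_num)
      * without nolabel, s_enc would display its value labels and look identical to s
      list s s_enc s_num, nolabel
      Any arithmetic on s_enc uses the codes 1 to 6, not 1, 10, 100, 2, 20, 200, which is why a difference computed from an encoded variable comes out wrong.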


      • #4
        It worked! Thank you very much, Rich and Nick!
