  • Using the encode command to destring variables issue

    Hi!

    When I use the encode command to destring variables, it seems to change the values of the variables.

    For example, when I highlight a random value (-1,2760%), it now says "1740". Why does the value change?

    I'm new to Stata, so all help is appreciated!



  • #2
    The point of -encode- is to map strings to numeric variables. By default, -encode- assigns the integers 1, 2, 3, ... to the distinct string values in alphanumeric (sorted) order. When your string values are "A" "B" "C" "D" "E" this produces 1 2 3 4 5 and everyone is happy. But imagine that your string values are "1" "11" "111" "1111" ... "2" "21" "211" ... or whatever. The result then follows the sorted string order just given, say 1 2 3 4 ... 42 43 44, and has nothing to do with the numbers the strings contain.
    -encode- does not look inside the string value for a numeric value. It does not ask what numbers are there to extract.
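
    A minimal sketch of the difference, using made-up toy data (the variable names s, s_encoded and s_numeric are hypothetical and not from the original question):

    ```stata
    * Toy data: numbers stored as strings (values are illustrative only)
    clear
    input str4 s
    "10"
    "9"
    "100"
    "2"
    end

    * encode numbers the distinct strings 1, 2, 3, ... in sorted string
    * order and keeps the original text as value labels
    encode s, generate(s_encoded)

    * destring reads the numeric content of each string instead
    destring s, generate(s_numeric)

    * nolabel shows the integer codes behind s_encoded
    list s s_encoded s_numeric, nolabel
    ```

    Because the strings sort as "10" "100" "2" "9", the codes in s_encoded follow that order, while s_numeric holds the actual numbers.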

    So, the problem is encapsulated in your title. You are using -encode- when the -destring- command is what you want. They solve quite different problems. Look at the help for -destring- to see what options you need to cope with decimal commas and percent signs.




    • #3
      Thank you for the response!

      I've tried destring in every way I can think of, but Stata keeps telling me that my variable contains nonnumeric characters. A typical value of the variable is "-1.00338443161457%".

      I've tried replacing the dots with commas, and I've tried it without the percent sign as well.



      • #4
        What exactly did you type?
        You should consider the ignore() option for the "%", dpcomma for decimal commas, and perhaps (but be very careful with it) the force option, which converts to missing every observation containing nonnumeric characters other than those listed in ignore().
        I advise you to use the generate() option instead of replace, so you can check whether everything went well.

        Example:
        Code:
        destring var1, ignore("%") dpcomma gen(var1_) force
        tab var1 if var1_==.
        The tab line reports all values of the string variable that Stata couldn't convert with the above command.
        Check whether there is another character that should be added to the ignore() option.
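
        If the decimal separator is a period, as in the value quoted in #3, a variant without dpcomma may be worth trying first. This is only a sketch on made-up data; the second value is invented for illustration:

        ```stata
        * Toy data with period decimals and a trailing percent sign
        clear
        input str20 var1
        "-1.00338443161457%"
        "2.5%"
        end

        * Strip the "%" only; no force, so destring refuses to convert
        * and reports if any other nonnumeric character is present
        destring var1, ignore("%") generate(var1_)

        * Compare the original strings with the converted values
        list var1 var1_
        ```

        Leaving out force at first is a deliberate choice here: any unexpected character then stops the conversion instead of silently becoming a missing value.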



        • #5
          Try something like this
          Code:
          destring strvar, gen(numvar) dpcomma percent force
          tab strvar if missing(numvar)
          That will handle any decimal commas and percent signs. If there are other problems, the tabulation will show the values containing the problematic characters.
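
          A small self-contained illustration of the command above, on made-up data (only the first value comes from the thread; "3,5%" and "n/a" are invented to show the behaviour):

          ```stata
          * Toy data: decimal commas, percent signs, and one bad value
          clear
          input str12 strvar
          "-1,2760%"
          "3,5%"
          "n/a"
          end

          * dpcomma reads the comma as the decimal separator; percent
          * removes "%" and divides by 100; force turns values that
          * still contain nonnumeric characters into missing
          destring strvar, generate(numvar) dpcomma percent force

          * Inspect the converted values next to the original strings
          list strvar numvar

          * Show which strings could not be converted
          tabulate strvar if missing(numvar)
          ```

          Note that percent stores fractions, so "-1,2760%" becomes -.01276, whereas ignore("%") would keep the value on the percent scale.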

          Please also (re-)read FAQ Advice Section 12 on showing us exact code used and exact results displayed.
