
  • Convert non-numeric values to numeric ones (destring and encode do not solve it properly)

    Dear all,

    I would like to convert some non-numeric observations in x1 (numbers that Stata does not recognize as numbers) to numeric ones. I can identify those observations, but encode does not solve the problem, and destring with the force option sets those values to missing, whereas I need to keep them.
    Code:
    . encode x1, generate(newx1) label(new_x1)
    
    .
    . destring x1, gen(x2) force  
    x1: contains nonnumeric characters; x2 generated as byte
    (21644 missing values generated)
    If I tabulate newx1, the result does not seem to make sense, because higher values are placed before lower values (see the last 4 rows):
    Code:
                    6,75 |          2        0.02       93.91
       6,754513888888888 |          1        0.01       93.92
       6,770833333333333 |          1        0.01       93.93
       6,816666666666666 |          1        0.01       93.94
       6,833333333333333 |          5        0.04       93.98
                   6,875 |          2        0.02       93.99
                      60 |          1        0.01       94.00
                   68,75 |          1        0.01       94.01
                       7 |        115        0.91       94.91
                    7,05 |          1        0.01       94.92
    My question is: how can I convert those fake numbers into real numbers?

    Thanks for the help in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20 x1 long newx1 byte x2
    "35"                 699 35
    "."                    1  .
    "."                    1  .
    "."                    1  .
    "10"                  67 10
    "."                    1  .
    "16,666666666666668" 207  .
    "22,5"               387  .
    "21,416666666666668" 366  .
    "10,291666666666666"  72  .
    "."                    1  .
    "."                    1  .
    "10"                  67 10
    "26"                 485 26
    end
    label values newx1 new_x1
    label def new_x1 1 ".", modify
    label def new_x1 67 "10", modify
    label def new_x1 72 "10,291666666666666", modify
    label def new_x1 207 "16,666666666666668", modify
    label def new_x1 366 "21,416666666666668", modify
    label def new_x1 387 "22,5", modify
    label def new_x1 485 "26", modify
    label def new_x1 699 "35", modify

  • #2
    Are those commas in your data what I would write as a decimal point? If yes, see
    Code:
    help set dp
    If no, how do I interpret 16.6666.....?

    Added in edit: if you do use -set dp-, follow it with -destring-, not -encode-.
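
    Another route that leaves x1 itself untouched is the dpcomma option of -destring-, which reads a comma as the decimal separator. A minimal sketch on a few values from the posted example; the variable name wanted is made up, and blanking the "." placeholders first is a precaution I am assuming is harmless rather than something I know to be required:

    ```stata
    clear
    input str20 x1
    "35"
    "."
    "22,5"
    "10,291666666666666"
    end

    * Precaution (assumption): turn the "." placeholders into empty strings,
    * which destring converts to numeric missing
    replace x1 = "" if x1 == "."

    * dpcomma tells destring to treat "," as the decimal separator
    destring x1, generate(wanted) dpcomma

    list x1 wanted
    ```

    Because generate() is used, the original strings stay in x1 and can be compared against wanted.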



    • #3
      The problem is the commas in x1: Stata doesn't recognize values with commas as numeric. Just replace the commas with dots:
      Code:
      replace x1 = subinstr(x1, ",", ".", 1)
      destring x1, gen(wanted)
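
      For context on why -encode- looked wrong: it assigns integer codes 1, 2, 3, ... to the distinct strings in alphabetical order, so "60" sorts before "7", and a code such as 387 for "22,5" is a category label, not a value. Below is a minimal sketch of the same fix on a few rows of the posted example, done on a copy and followed by checks. The names x1_clean and wanted are only illustrative.

      ```stata
      clear
      input str20 x1
      "35"
      "."
      "10"
      "16,666666666666668"
      "22,5"
      end

      * Work on a copy so the original strings stay available
      generate str20 x1_clean = subinstr(x1, ",", ".", 1)

      * force is not needed here: every value is now a number or "."
      destring x1_clean, generate(wanted)

      * "." becomes numeric missing; the other values keep their magnitude
      assert missing(wanted) if x1 == "."
      assert wanted == 22.5 if x1 == "22,5"
      assert wanted == 35 if x1 == "35"
      list x1 wanted
      ```

      The last argument of subinstr() is the number of occurrences to replace; 1 is enough here because each value contains at most one comma.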



      • #4
        Dear Rich and Wouter, thanks for your help. It worked perfectly. I was not aware that the commas were the problem.
