  • Non-numeric string variable with missing value to numeric variable that recognises the missing value

    I have a string variable of the following form:

    delivery

    air
    ship
    ship
    air
    air
    NA
    NA
    ship
    air
    air

    I am trying to convert this string variable to a numeric variable in which air is coded 0 and ship is coded 1, with matching value labels, while NA is recognised as a missing value.

    So far I have tried destring with generate(), which results in the following error:

    Code:
    destring delivery, ignore("NA") gen(delivery_new)

    Code:
    delivery: contains characters not specified in ignore(); no generate

    Similarly, I tried encode, which converted the strings to numeric values successfully but still did not recognise NA as missing:

    Code:
    label define deliverylabel 0 "ship" 1 "air"
    encode delivery, gen(delivery_new) label(deliverylabel)

    I am not sure what the best way to tackle this is. Any insights would be hugely appreciated.

    Thanks,
    Keshab





  • #2
    Try something like the following:
    Code:
    label define Modes 0 air 1 ship .n NA
    encode delivery, generate(delivery_new) label(Modes)
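
    To see the suggestion in context, here is a minimal sketch that rebuilds the ten observations from the first post and applies the same two commands. The data entry step is an assumption made only for illustration (the original variable already exists in the poster's dataset), and the cross-tabulation is just one way to check the result.

    ```stata
    * Rebuild the example data from the first post (illustration only).
    clear
    input str4 delivery
    "air"
    "ship"
    "ship"
    "air"
    "air"
    "NA"
    "NA"
    "ship"
    "air"
    "air"
    end

    * Map air to 0, ship to 1, and NA to the extended missing value .n.
    label define Modes 0 "air" 1 "ship" .n "NA"
    encode delivery, generate(delivery_new) label(Modes)

    * Cross-check the string variable against the new numeric variable;
    * the missing option keeps the .n rows in the table.
    tabulate delivery delivery_new, missing
    ```

    If the mapping works as intended, the NA rows should appear under the missing code .n (labelled NA) rather than as a third nonmissing category.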



    • #3
      @Joseph Coveney: Thanks much; that did work quite nicely after replacing .n with "."

      I am loving this forum, as it is making the start of my Stata journey much more effective.



      • #4
        Originally posted by Keshab Parajuli
        that did work quite nicely after replacing .n with "."
        Thanks for reporting back. Your statement seems a bit strange, though; you cannot define a label for system missing values. Thus,

        Code:
        label define Modes 0 air 1 ship . NA
        will exit with the error message

        Code:
        may not label .
        You probably meant that you followed the encode command with something like

        Code:
        replace delivery_new = . if delivery_new == .n
        which is usually not necessary because extended missing values are treated the same way as system missing values (one exception is multiple imputation, where extended missing values are treated differently).
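
        As a rough check of that point, the sketch below assumes delivery_new was created by the encode call with the Modes label from #2, so that the NA rows hold .n. It relies only on Stata's ordering of missing values, in which . is smaller than .a through .z and every missing value is larger than any nonmissing number.

        ```stata
        * Assumption: delivery_new holds 0, 1, and .n (for NA), as in #2.

        * Both conditions should select the NA rows: missing() is true for
        * extended missing values, and .n sorts above the system missing value.
        list delivery delivery_new if missing(delivery_new)
        list delivery delivery_new if delivery_new >= .

        * An equality test against . matches system missing only, so this
        * should count no observations while the NA rows are coded .n.
        count if delivery_new == .

        * summarize excludes the .n rows just as it would exclude . rows.
        summarize delivery_new
        ```

        The practical consequence is that conditions written as missing(x) or x >= . catch both kinds of missing value, whereas x == . does not.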

        Best
        Daniel
