  • Loop Encode but ignore "."

    Hello,

    I have a dataset with ~881k observations and 621 variables that I'm just beginning to clean.

    The vast majority of the data is numeric but I have a chunk of 87 variables that are stored as string and contain either "yes", "no" or ".". I can encode them in a loop using the below but this also encodes "." as a number.

    I have read the help files on encode and searched several posts and articles, but I'm beginning to think that either it isn't possible and I need to approach it another way, or I'm missing something plainly obvious that I just cannot see.

    foreach v of varlist var1-var87 {
        encode `v', generate(new`v')
        drop `v'
        rename new`v' `v'
    }

    Any advice would be appreciated.

    I'm using Stata 16 SE.

    Aidan

  • #2
    Code:
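    foreach v of varlist var1-var87 {
        // blank out "." first: -encode- treats an empty string as missing
        replace `v' = "" if `v' == "."
        encode `v', generate(new`v')
        // swap the encoded copy in under the original name
        drop `v'
        rename new`v' `v'
    }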
    That will do it. One caution: if the values of these variables, apart from ".", are "yes" and "no", then -encode-, which works in alphabetical order, will encode no as 1 and yes as 2. That's not going to be very useful if you have to use these variables in analyses. I recommend that you instead first create a value label that has yes as 1 and no as 0, and then force -encode- to use that:

    Code:
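    // define the coding once, before the loop: no = 0, yes = 1
    label define yesno 0 "no" 1 "yes"
    foreach v of varlist var1-var87 {
        // blank out "." first: -encode- treats an empty string as missing
        replace `v' = "" if `v' == "."
        // label(yesno) makes -encode- use the 0/1 coding defined above
        encode `v', generate(new`v') label(yesno)
        drop `v'
        rename new`v' `v'
    }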
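
    To see the effect on a small scale, here is a minimal sketch on a toy dataset. The three observations and the names var1 and var2 are made up for illustration and are not from the original data. It also adds the -noextend- option of -encode-, on the assumption that you would rather get an error than have an unexpected value (say "Yes" or "n/a") silently added to the label. Note that -clear- drops whatever data are in memory, so run this in a fresh session.

    Code:
    clear
    // toy data: strings containing "yes", "no" or "."
    input str3 var1 str3 var2
    "yes" "no"
    "no"  "."
    "."   "yes"
    end

    label define yesno 0 "no" 1 "yes"
    foreach v of varlist var1-var2 {
        replace `v' = "" if `v' == "."
        // noextend: stop with an error if a value is not already in yesno
        encode `v', generate(new`v') label(yesno) noextend
        drop `v'
        rename new`v' `v'
    }

    // every value should now be 0, 1 or missing
    foreach v of varlist var1-var2 {
        assert inlist(`v', 0, 1) | missing(`v')
    }
    list, nolabel

    If the -assert- passes, the same loop with var1-var87 should behave the same way on the full data, provided the only strings present are "yes", "no" and ".".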


    • #3
      Thanks Clyde, and thank you for the recommendation!
