
  • Cleaning Raw Data--How to edit non-numeric data records using commands

    Hello!

    I'm sure this has been addressed before, but I have been unable to find a forum post or YouTube video on this specific topic. My apologies!

    I'm currently cleaning a raw data set. The data are from a mail survey, and write-in responses were coded verbatim by undergraduate research assistants. Consequently, in a number of cases, I have a few non-numeric values in what should be a numeric variable. I'm wondering what commands will allow me to edit these records.

    As a specific example, in variable "acresowned", I have been trying to convert an entry of "5000ish" to "5000". The recode and replace commands, which I hoped would work, appear to be unable to edit the non-numeric records; the error is "type mismatch". At the moment, I've resorted to manual edits in the data editor, but the lack of documented changes in a do-file makes me uncomfortable.


    Thank you!
    Matthew Houser
    Stata 15.0, Mac



  • #2
    Since you haven't shown us your exact (attempted) code, I can't be sure, but it appears you are not using -replace- (at least) correctly. The following certainly works:
    Code:
    replace var="5000" if var=="5000ish"
    destring var,replace
    That is, while the variable is a string variable, you need to put the values in double quotes. Once all values are "really" numeric, you can use -destring-.
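
    When it is not obvious which records are the problem, the entries that block -destring- can be listed first. The following is only a sketch: the variable name comes from the question, the data rows are made up for illustration, and it relies on real() returning missing for strings that cannot be read as numbers.

    ```stata
    * Illustrative data only; "acresowned" is the variable named in the question
    clear
    input str10 acresowned
    "5000ish"
    "120"
    "45.5"
    end

    * real() gives missing for non-numeric strings, so this lists the offenders
    list acresowned if missing(real(acresowned)) & !missing(acresowned)

    * Record the fix in the do-file, then convert the variable to numeric
    replace acresowned = "5000" if acresowned == "5000ish"
    destring acresowned, replace
    ```

    Keeping one -replace- line per corrected value in the do-file gives the documented record of changes that manual edits in the data editor lack.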

    Added: the current version is 15.1 and the update is free, so do it; see
    Code:
    help update



    • #3
      Rich,

      Thanks, this did the trick. I was missing the quotes around the 5000.

      Much appreciated!

      Matthew



      • #4
        As a general comment, Stata has a range of string functions that are helpfully described in the output of -help string functions-.
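
        As a rough illustration of what a few of those functions do, here is a sketch with made-up values. The variable names are hypothetical, and strtrim() assumes Stata 14 or later.

        ```stata
        * Made-up values; only built-in string functions are used
        clear
        input str12 raw
        " 5000ish "
        "1,200"
        "300 acres"
        end

        gen trimmed  = strtrim(raw)                   // drop leading/trailing blanks
        gen nocomma  = subinstr(trimmed, ",", "", .)  // remove every comma
        gen hasalpha = regexm(nocomma, "[A-Za-z]")    // 1 if any letter remains
        list
        ```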



        • #5
          If you run into this problem again, you can also use the following functions:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str8 var1
          "5000ish" 
          "40"      
          "maybe 10"
          "100"     
          "1OO" 
          "~60"
          "70.6"    
          end
          1) using regular expressions:
          Code:
          gen clean1=ustrregexra(var1, "[a-zA-Z~]", "")  //remove only alpha characters & "~"
          destring clean1, gen(ok1)
          2) using sieve from egenmore (available from SSC)
          Code:
          ssc install egenmore
          egen clean2=sieve(var1), char(0123456789.)
          destring clean2, gen(ok2)
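
          Because both approaches strip characters silently, it may be worth reviewing every value that was not numeric to begin with. This is a sketch of one possible check, not part of the methods above; it repeats a few of the example rows so that it runs on its own.

          ```stata
          * Subset of the example data, repeated so the sketch is self-contained
          clear
          input str8 var1
          "5000ish"
          "1OO"
          "70.6"
          end

          gen clean1 = ustrregexra(var1, "[a-zA-Z~]", "")
          destring clean1, gen(ok1)

          * Show original and cleaned values for entries that were not numeric
          list var1 ok1 if missing(real(var1))
          ```

          In this listing, the entry typed with the letter O instead of zeros is reduced to the single digit 1, which is the kind of case that needs a manual decision.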

          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1
