  • I have the problem of converting string data into numerical data in STATA

    Hi all
    I am a new user of Stata. Today I ran into a problem converting string data into numeric data. When I run destring longtermdebt, generate(longtermdebt_n), Stata reports: longtermdebt: contains nonnumeric characters; no generate. (longtermdebt is my variable name.) I also typed browse if real(longtermdebt)==. and in the Data Editor most of the data shown are NA. I am confused because most of my data are actually numeric, but the variable is not converted to numeric data successfully.
    Could someone help me?

    Thanks

  • #2
    Export your data to Excel (or they probably already come from an Excel file) and remove all the NA entries, leaving those cells empty. Then import the data into Stata again and everything will work. At the moment your data should come out in red, shouldn't they?

    Dario



    • #3
      There is no need to go via excel.
      With a data example:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str2 var1
      "1"
      "2"
      "3"
      "NA"
      "5"
      "6"
      "NA"
      "8"
      "9"
      "10"
      end
      Then do:
      Code:
      replace var1="" if var1=="NA"
      destring var1, gen(var1numeric)
      Or alternatively:
      Code:
      replace var1="" if var1=="NA"
      destring var1, replace
      Or:
      Code:
      destring var1, gen(var1numeric) force
      The force option creates a missing value for each observation that previously held a non-numeric value.
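      As a further sketch, not part of the original reply: the real() function that the question already uses in its browse command can also do the conversion directly, since it returns missing for any string that is not a valid number. The variable name var1real and the shortened data below are made up for illustration.

      ```stata
      * Hypothetical, shortened version of the data example above
      clear
      input str2 var1
      "1"
      "NA"
      "5"
      "10"
      end

      * real() returns missing (.) for strings such as "NA"
      generate double var1real = real(var1)

      * Count how many observations could not be converted
      count if missing(var1real)
      ```

      Unlike destring, this leaves var1 untouched and gives no warning about non-numeric values, so it is worth inspecting the observations that end up missing.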
      Last edited by Jorrit Gosens; 16 Aug 2018, 09:45.



      • #4
        Welcome to Statalist.

        Here is another approach.
        Code:
        destring longtermdebt, generate(longtermdebt_n) force
        tabulate longtermdebt if missing(longtermdebt_n)
        This will cause destring to skip over the "NA" values, replacing them (and any other non-numeric data) with missing values when it creates longtermdebt_n. The tabulate command then lets you check that nothing else caused problems, such as a number with a comma in it. Hopefully the tabulate output will confirm that "NA" was the only non-numeric data.
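        If the tabulate output does reveal numbers containing commas, one possible follow-up is sketched below. It assumes commas are only thousands separators; the data values are invented for illustration. The ignore() option of destring removes the listed characters before converting, so force is not needed once "NA" has been blanked out.

        ```stata
        * Hypothetical data: these longtermdebt values are made up
        clear
        input str10 longtermdebt
        "1200"
        "NA"
        "3,450"
        "780"
        end

        * Blank out the "NA" placeholder so destring treats it as missing
        replace longtermdebt = "" if longtermdebt == "NA"

        * Strip commas during conversion instead of forcing them to missing
        destring longtermdebt, generate(longtermdebt_n) ignore(",")

        * Lists any nonempty string that still failed to convert
        list longtermdebt if missing(longtermdebt_n) & longtermdebt != ""
        ```

        Without force, destring refuses to convert if any other unexpected character remains, which makes leftover problems visible rather than silently turning them into missing values.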

        Added in edit: crossed in cyberspace with Jorrit's more complete advice.
