Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to stop stata converting numeric variables to string when importing from excel

    Hi All,

    I am importing a spreadsheet from excel. Many variables are numeric in excel, but when imported to Stata they become string. These variables contain decimal points and therefore it is difficult (??impossible) to destring?

    How can I prevent the variables being converted to string on import? Alternatively, how do you destring variables which contain decimals?

    Many thanks,
    Simone

  • #2
    Stata does not convert variables to string, unless it is needed.

    See the data example below. This was imported from excel, with decimal point and thousand separators as formatting options in excel.
    Stata recognizes variable A as numeric values, keeps them that way, and even preserves the format/display options in Stata.
    Variable B is imported as string because there is some text value in one of the observations, and Stata wants to preserve the observation until you tell it to drop it.
    Decimal points or thousand separators are not an issue, text values are.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double A str6 B
    5000.3 "5000.3"
    5000.3 "n/a"   
    end
    After copying above data example, you can run the following code to see what happens:
    Code:
    destring(B), generate(C)
    destring(B), generate(C) force
    If some or all of your data, which you think is numeric is being imported as string, then have another look at your data. Some values will in fact have text values. Identifying them can be done with e.g.,:
    Code:
     list B C if C==.
    (following on the data example and code above)

    If you are sure that all of your variables should be numeric but are imported as stirng, as in the example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double A str6(B C D)
    5000.3 "5000.3" "5000.3" "5000.3"
    5000.3 "n/a"    "n/a"    "n/a"   
    end
    Then you can destirng all of it by doing:
    Code:
    destring, replace force
    But note that you need to be really sure of what your data contains before you do this.

    Lastly, note the use of dataex in this answer, and learn how to use it when you post a question here. It makes answering questions much easier, and improves your chances for a helpful answer. Read more in the FAQ: https://www.statalist.org/forums/help#stata

    Comment


    • #3
      Jorrit makes all the key points, but extra spin in possible.. A short answer to

      How can I prevent the variables being converted to string on import?
      is: Use exactly the same rules within Excel for your data as used by Stata. That may not be practical or congenial, but those are other matters.

      However, as I understand it, Excel has no concept of numeric variable insofar as -- given numbers or text in a column -- the contents of other cells in a column can still be whatever you like them to be.

      It's not clear what causes Simone problems on import, and it won't be decimal points alone.
      Last edited by Nick Cox; 25 Jun 2018, 01:55.

      Comment


      • #4
        Thank you, Jorrit and Nick. The force command has been useful. Use of dataex noted. Thanks again.

        Simone

        Comment

        Working...
        X