Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with individual ID when importing an excel sheet to stata

    Hi. I have an excel sheet that includes individual ID variable. Whenever I import the excel sheet to stata, the individual ID observations get rounded up, even though my variable is string.

    This is my command:
    Code:
    import excel "/Users/salmanouh/Desktop/New_excel.xlsx", sheet("Sheet1") firstrow allstring clear
    and this is an example of my dataset
    input str23 indid
    "1812010001101"
    "1812010002101"
    "18120100021_180201"
    "18120100021_180202"
    "18120100021_180203"
    "18120100021_180301"
    "18120100021_180302"
    "18120100021_180303"
    "18120100021_180304"
    "1812010003101"
    "1812010003102"
    "1812010003103"
    "1812010003104"
    "1812010004101"
    "1812010004102"
    "1812010004103"
    "1812010004104"
    "1812010004105"
    "18120100041_180301"
    "18120100041_180302"
    "1812010005101"
    "1812010005102"
    "1812010005103"
    "1812010005104"
    "1812010005105"
    "1812010006101"
    "1812010006102"
    "1812010006103"
    "1812010006104"
    "1812010006105"
    "1812010006106"
    "1812010006107"
    "1812010006108"
    "1812010006109"
    "1812010006110"
    "1812010006111"
    "1812010008101"
    "1812010008102"
    "1812010008103"
    "1812010008104"
    "1812010009101"
    "1812010009102"
    "18120100091_180301"
    "18120100091_180302"
    "18120100091_180303"
    "1812010010101"
    "1812010010102"
    "1812010010103"
    "1812010010104"
    "1812010010105"
    "1812010010106"
    "1812010011101"
    "1812010011102"
    "1812010011103"
    "1812010011104"
    "1812010011105"
    "1812010012101"
    "1812010012102"
    "1812010012103"
    "1812010012104"
    "1812010013101"
    "1812010013102"
    "1812010013103"
    "1812010013104"
    "1812010013105"
    "1812010013106"
    "1812010014101"
    "1812010014102"
    "1812010014103"
    "1812010014104"
    "1812010014105"
    "1812010015101"
    "1812010015102"
    "1812010015103"
    "1812010015104"
    "1812010017101"
    "1812010017102"
    "1812010017103"
    "18120100171_180301"
    "18120100171_180302"
    "18120100171_180303"
    "18120100171_180501"
    "18120100171_180502"
    "18120100171_180503"
    "18120100171_180504"
    "1812010018101"
    "1812010018102"
    "1812010019101"
    "1812010019102"
    "1812010019103"
    "1812010019104"
    "1812010019105"
    "1812010019106"
    "1812010019107"
    "1812010024101"
    "1812010024102"
    "1812010024103"
    "1812010024104"
    "1812010024105"
    "1812010024106"
    end

    The dataex doesn't show the observations that get rounded up, but in the remaining dataset, I have observations such as 1.81802210360e+16. Does anyone know how to exactly show the individual ID?

  • #2
    The longest string value you show is 18 characters long so should not be a problem. In contrast

    1.81802210360e+16

    is a numeric value, so how do you get that from a string identifier?

    What is the longest string value you have within MS Excel?

    Comment


    • #3
      Thank you for your reply. The longest string value I have within excel is 23 characters. Regarding your first question, I am not quite sure, I have formatted the individual ID in excel as "Number" since this is the only format that shows me the entire value of the ID. I then imported it using the command I posted including "allstring"

      Comment


      • #4
        I don't know enough about what MS Excel does to say more helpfully.

        Comment


        • #5
          I am not sure that Excel is showing you the entire value of your individual ID.

          I opened a new Excel workbook, formatted column A to be number with zero digits to the right of the decimal point, then typed in the 23-digit number
          Code:
          12345678901234567890123
          which Excel then displayed as
          Code:
          12345678901234500000000
          so it had 15 digits of precision and then was rounded.

          In general, 15 digits of precision is the limit for most applications, if they use the standard data storage formats. Below are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

          byte - 7 bits -127 100
          int - 15 bits -32,767 32,740
          long - 31 bits -2,147,483,647 2,147,483,620
          float - 24 bits -16,777,216 16,777,216
          double - 53 bits -9,007,199,254,740,992 9,007,199,254,740,992

          Both Excel and Stata "double" formats are limited to 15 digits of precision. In general, identification numbers - especially long ones - should be stored as strings rather than numbers.

          So you need to go back to Excel and (a) see if your ID numbers are as accurate as you think they are, in which case they are likely stored as text rather than numbers, and (b) ensure that they are stored as text and exported as such, because the problem you are having is in your Excel data, not in how Stata imports it.

          Comment


          • #6
            William Lisowski Thanks a lot for your reply! As you exactly mentioned, I just figured out that the ID values are not accurate on excel to start with, and that excel replaced the last digits to zeros as in your example. I guess I will need to figure out how to keep the original ID values in excel before I import them to Stata

            Comment

            Working...
            X