Problem with individual ID when importing an excel sheet to stata

Salma Nooh

Join Date: Jun 2020

Posts: 31
#1

Problem with individual ID when importing an excel sheet to stata

02 Aug 2020, 05:23

Hi. I have an excel sheet that includes individual ID variable. Whenever I import the excel sheet to stata, the individual ID observations get rounded up, even though my variable is string.

This is my command:

Code:

import excel "/Users/salmanouh/Desktop/New_excel.xlsx", sheet("Sheet1") firstrow allstring clear

and this is an example of my dataset
input str23 indid
"1812010001101"
"1812010002101"
"18120100021_180201"
"18120100021_180202"
"18120100021_180203"
"18120100021_180301"
"18120100021_180302"
"18120100021_180303"
"18120100021_180304"
"1812010003101"
"1812010003102"
"1812010003103"
"1812010003104"
"1812010004101"
"1812010004102"
"1812010004103"
"1812010004104"
"1812010004105"
"18120100041_180301"
"18120100041_180302"
"1812010005101"
"1812010005102"
"1812010005103"
"1812010005104"
"1812010005105"
"1812010006101"
"1812010006102"
"1812010006103"
"1812010006104"
"1812010006105"
"1812010006106"
"1812010006107"
"1812010006108"
"1812010006109"
"1812010006110"
"1812010006111"
"1812010008101"
"1812010008102"
"1812010008103"
"1812010008104"
"1812010009101"
"1812010009102"
"18120100091_180301"
"18120100091_180302"
"18120100091_180303"
"1812010010101"
"1812010010102"
"1812010010103"
"1812010010104"
"1812010010105"
"1812010010106"
"1812010011101"
"1812010011102"
"1812010011103"
"1812010011104"
"1812010011105"
"1812010012101"
"1812010012102"
"1812010012103"
"1812010012104"
"1812010013101"
"1812010013102"
"1812010013103"
"1812010013104"
"1812010013105"
"1812010013106"
"1812010014101"
"1812010014102"
"1812010014103"
"1812010014104"
"1812010014105"
"1812010015101"
"1812010015102"
"1812010015103"
"1812010015104"
"1812010017101"
"1812010017102"
"1812010017103"
"18120100171_180301"
"18120100171_180302"
"18120100171_180303"
"18120100171_180501"
"18120100171_180502"
"18120100171_180503"
"18120100171_180504"
"1812010018101"
"1812010018102"
"1812010019101"
"1812010019102"
"1812010019103"
"1812010019104"
"1812010019105"
"1812010019106"
"1812010019107"
"1812010024101"
"1812010024102"
"1812010024103"
"1812010024104"
"1812010024105"
"1812010024106"
end

The dataex doesn't show the observations that get rounded up, but in the remaining dataset, I have observations such as 1.81802210360e+16. Does anyone know how to exactly show the individual ID?
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35810
#2

02 Aug 2020, 05:29

The longest string value you show is 18 characters long so should not be a problem. In contrast

1.81802210360e+16

is a numeric value, so how do you get that from a string identifier?

What is the longest string value you have within MS Excel?
Comment
Salma Nooh

Join Date: Jun 2020

Posts: 31
#3

02 Aug 2020, 06:12

Thank you for your reply. The longest string value I have within excel is 23 characters. Regarding your first question, I am not quite sure, I have formatted the individual ID in excel as "Number" since this is the only format that shows me the entire value of the ID. I then imported it using the command I posted including "allstring"
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35810
#4

02 Aug 2020, 07:36

I don't know enough about what MS Excel does to say more helpfully.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

02 Aug 2020, 08:56

I am not sure that Excel is showing you the entire value of your individual ID.

I opened a new Excel workbook, formatted column A to be number with zero digits to the right of the decimal point, then typed in the 23-digit number

Code:

12345678901234567890123

which Excel then displayed as

Code:

12345678901234500000000

so it had 15 digits of precision and then was rounded.

In general, 15 digits of precision is the limit for most applications, if they use the standard data storage formats. Below are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

byte - 7 bits	-127	100
int - 15 bits	-32,767	32,740
long - 31 bits	-2,147,483,647	2,147,483,620
float - 24 bits	-16,777,216	16,777,216
double - 53 bits	-9,007,199,254,740,992	9,007,199,254,740,992

Both Excel and Stata "double" formats are limited to 15 digits of precision. In general, identification numbers - especially long ones - should be stored as strings rather than numbers.

So you need to go back to Excel and (a) see if your ID numbers are as accurate as you think they are, in which case they are likely stored as text rather than numbers, and (b) ensure that they are stored as text and exported as such, because the problem you are having is in your Excel data, not in how Stata imports it.

Comment

Salma Nooh

Join Date: Jun 2020

Posts: 31
#6

02 Aug 2020, 12:44

William Lisowski Thanks a lot for your reply! As you exactly mentioned, I just figured out that the ID values are not accurate on excel to start with, and that excel replaced the last digits to zeros as in your example. I guess I will need to figure out how to keep the original ID values in excel before I import them to Stata
Comment

Announcement