numeric vs string identifier

natalia malancu

#1

01 Jan 2016, 16:07

Hi guys!

This is technically not a Stata question. Reading N. Cox's article "Speaking Stata: On numbers and strings" (The Stata Journal 2(3): 314–329, 2002), I started wondering in which cases a numeric identifier is really a must. The dataset I am currently using provides both: pid, a person identifier in numeric format, and xwaveid, a string identifier meant to help match people across waves. The only difference between the two is a leading zero (e.g. xwaveid 0100003, pid 100003). The automated merging .do file creates pid from xwaveid with

gen long pid = real(xwaveid)
label var pid "XWAVEID as long integer"

and I can't think of a reason to choose one over the other.

Thanks,
Natalia
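A minimal sketch of the round trip between the two identifiers, on made-up toy values. Only xwaveid, pid and the real() call come from the post; it also assumes (not stated by the dataset documentation) that every xwaveid is exactly seven digits, so the leading zero dropped by real() can be restored with a zero-padded format.

```stata
* Hypothetical toy data: 7-character xwaveid values with leading zeros
clear
input str7 xwaveid
"0100003"
"0100004"
"0200015"
end

* Numeric identifier, created as in the automated merging .do file
gen long pid = real(xwaveid)

* pid has lost the leading zero
list xwaveid pid

* Assuming all xwaveid values have exactly 7 digits, the string form
* can be rebuilt from pid with a zero-padded %07.0f format
gen str7 xwaveid_check = string(pid, "%07.0f")
assert xwaveid_check == xwaveid
```

Under that assumption the two variables carry the same information, so the choice comes down to what the commands you plan to use require and how much storage each type takes.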

http://ageconsearch.umn.edu

Last edited by natalia malancu; 01 Jan 2016, 16:16.
Nick Cox

#2

01 Jan 2016, 17:32

Stata tends to prefer numeric identifiers where there is a choice. For example, tsset and xtset insist on a numeric panel identifier.
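A sketch of what that means in practice, on a hypothetical two-person panel (the variable wave and the data are made up for illustration). xtset refuses a string panel variable; two common workarounds are converting the digits with real() or numbering the distinct strings with egen's group() function.

```stata
* Hypothetical toy panel: two people observed in two waves
clear
input str7 xwaveid wave
"0100003" 1
"0100003" 2
"0200015" 1
"0200015" 2
end

* xtset rejects a string panel variable; capture the error instead of stopping
capture xtset xwaveid wave
display "xtset with a string panel variable returned code " _rc

* Option 1: convert the digits directly (possible because xwaveid is all digits)
gen long pid = real(xwaveid)
xtset pid wave

* Option 2: number the distinct strings 1, 2, ... in sorted order of xwaveid
egen long gid = group(xwaveid)
xtset gid wave
```

real() keeps the original digits, so pid can still be matched against other files; group() works for any strings, but its codes carry no meaning outside the current dataset.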

Another broad issue that can bite is the storage required. For example, a 9-digit integer identifier fits in 4 bytes as a long but needs 9 bytes as a str9.
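The storage arithmetic can be checked directly. This sketch builds 1,000 hypothetical 9-digit identifiers, stores them once as a long and once as a str9, and asks describe for the storage types and the per-observation width of the dataset. A long holds integers up to 2,147,483,620, so every 9-digit value fits.

```stata
* Hypothetical example: 1,000 nine-digit identifiers stored two ways
clear
set obs 1000
gen long id_num = 100000000 + _n

* Explicit %9.0f format so all nine digits are written out as text
gen str9 id_str = string(id_num, "%9.0f")

* Storage types: long (4 bytes) versus str9 (9 bytes)
describe id_num id_str
confirm long variable id_num
confirm str9 variable id_str

* Width of the dataset in bytes per observation: 4 + 9
quietly describe
display "bytes per observation: " r(width)
```

With millions of person-wave observations, the 5 bytes saved per observation by the long adds up.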

Last edited by Nick Cox; 01 Jan 2016, 18:08.
natalia malancu

#3

02 Jan 2016, 04:21

Thanks, Nick.