  • numeric vs string identifier

    hi guys!

    This is technically not a Stata question. Reading N. Cox's (2002) article "Speaking Stata: On numbers and strings", The Stata Journal (2002) 2, Number 3, pp. 314–329, I started wondering in which cases a numeric identifier is really a must. The dataset I am currently using provides both: pid (person identifier, in numeric format) and xwaveid (identifier meant to help match people across waves, in string format). The only difference between the two is a leading zero, e.g. xwaveid 0100003, pid 100003. The code from the automated merging .do file is: gen long pid=real(xwaveid); label var pid "XWAVEID as long integer". I can't think of a reason to choose one over the other.


    thanks,
    natalia
    Last edited by natalia malancu; 01 Jan 2016, 16:16.

  • #2
    Stata tends to prefer numeric identifiers where there is a choice. For example tsset and xtset insist on numeric identifiers.
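
    A minimal sketch of that point, using a made-up four-row panel (the variables wave and income and all values are assumptions for illustration; only xwaveid and pid come from the question). It shows xtset refusing the string identifier and accepting the numeric one, and that the leading zero can be recovered with a display format:

    ```stata
    clear
    input str7 xwaveid int wave double income
    "0100003" 1 100
    "0100003" 2 110
    "0100004" 1 200
    "0100004" 2 210
    end

    * xtset does not accept a string panel variable, so this fails
    capture noisily xtset xwaveid wave
    display "return code from xtset on the string id: " _rc

    * numeric version, as in the merging do-file from the question
    gen long pid = real(xwaveid)
    label var pid "XWAVEID as long integer"
    xtset pid wave

    * nothing is lost: the 7-character string can be rebuilt from pid
    assert string(pid, "%07.0f") == xwaveid
    ```

    If the final assert passes, the two identifiers carry the same information, so the choice comes down to which commands need to use them.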

    Another broad issue that can bite is the storage required. For example, a 9-digit integer identifier fits in 4 bytes as a long but needs 9 bytes as a str9.
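
    A quick way to see the difference, sketched with invented data (id_num and id_str are hypothetical names, and the 1,000 observations are arbitrary):

    ```stata
    clear
    set obs 1000

    * a 9-digit identifier stored as a long (4 bytes per observation)
    gen long id_num = 100000000 + _n

    * the same identifier stored as a str9 (9 bytes per observation)
    gen str9 id_str = string(id_num, "%9.0f")

    * the storage type column shows long versus str9
    describe id_num id_str

    * both variables hold the same values
    assert real(id_str) == id_num
    ```

    With many observations, or several string identifiers per observation, those extra bytes add up.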
    Last edited by Nick Cox; 01 Jan 2016, 18:08.


    • #3
      thanks nick.
