Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert "string" ID to numeric with non-numeric characters

    I am trying to convert a non-numeric string ID variable to a numeric variable but I get an error message
    "too many values". after I use: encode id, gen(copy_id)

    My dataset has >75,000 and I understand this may be part of the problem. Please help because I cant do subsequent analysis if this ID variable is not converted to a numeric variable. I have more than 10 datasets to merge on this ID. I cant extract the data using dataex because I am working in a secure environment where I cant copy the data except as an image below.

    Example ID's are shown below.

    Click image for larger version

Name:	IDs.PNG
Views:	1
Size:	141.0 KB
ID:	1664940


    Thanks.

  • #2
    Unfortunately you have run up against the limit of the number of codings allowed with one label (65,536 -- see "help limits" for more limits). You can generate an integer for each unique value using:

    Code:
    egen wanted = group(id)
    Which should suffice for your purposes, but you won't be able to assign a value label to the values.

    Comment


    • #3
      My only concern is if I run the syntax on other datasets, will it create the same "wanted" ID in other datasets to enable me to merge on ID.

      Comment


      • #4
        The best way is to create a so-called crosswalk dataset (a master dataset of IDs) by (i) appending the ID variable for all eleven datasets, (ii) then contracting the resulting appended dataset and (iii) finally generate-ing a numeric co-primary key based upon the row number. You can then use that master ID datasets' row-based numerical identifier in all subsequent work with the eleven datasets simply by merge-ing on it whenever the numerical identifier is needed.

        You could also try a hash function (for example, from Mata) and take your chances with collisions. (In your case, I reckon that you'd be fairly safe.)

        Comment

        Working...
        X