  • Generate string identifier variable

    Hello, I am trying to create a coded ID based on a string variable that is 18 characters long and has over 58 million unique values. I tried generating a variable using _n and using egen's group() function, but both result in a numeric variable. I then tried tostring, replace on the new variable, but it misbehaves and leaves me with duplicate values. I welcome any suggestions. The goal is to "anonymize" a billing ID.

  • #2
    You don't show your exact code (contrary to our FAQ Advice #12), but my guess is that you did something like

    Code:
    gen obsno = _n 
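    That's legal in Stata and it won't complain, but (unless you have changed the default with set type) you now have a float variable. A float holds integers exactly only up to 16,777,216, so with 58 million observations some distinct observation numbers end up stored as the same value, and that bites when you convert to string.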
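A small demonstration of the float problem may help. This is a sketch, not code from the thread: it creates two consecutive integers just above 16,777,216 as long, copies them into a float, and checks for duplicates. The variable names l and f are made up for illustration.

```stata
* Sketch: a float cannot hold every integer above 16,777,216 exactly.
* Variable names l and f are hypothetical, chosen for illustration.
clear
set obs 2
generate long l = 16777215 + _n   // 16777216 and 16777217, both stored exactly
generate float f = l              // both values round to 16777216 in a float
format l f %12.0f
list l f
duplicates report f               // f has one distinct value in two observations
```

The same collapse happens to observation numbers stored as float in a 58-million-observation dataset, which is why asking for long (or double) fixes it.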

    Code:
    gen wanted = strofreal(_n)
    might work directly, but

    Code:
    gen long wanted = _n 
    tostring wanted, replace 
    should work too.


    • #3
      My apologies; I have no doubt it was my fault or lack of skill. After confirming that my tcn field (stored as a string variable) had no duplicates, the code I used was:

      gen c2tcn=_n(tcn)
      tostring c2tcn, replace
      duplicates tag c2tcn, gen(dup)

      and then I found the new duplicate values.

      My second attempt was:
      egen c2tcn=group(tcn)

      I'll try your suggestions, thank you.
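The same fix can be applied to the second attempt: egen, like generate, accepts an optional storage type, so asking for long avoids relying on the default. Below is a sketch with three made-up 18-character IDs (hypothetical data; only the names tcn and c2tcn come from the thread). isid then confirms that the new string ID is unique.

```stata
* Sketch with hypothetical IDs; tcn and c2tcn are the thread's variable names.
clear
input str18 tcn
"100000000000000003"
"100000000000000001"
"100000000000000002"
end
egen long c2tcn = group(tcn)   // request long explicitly, as with gen long
tostring c2tcn, replace        // long integers convert to string without loss
isid c2tcn                     // exits with an error if c2tcn has duplicates
list
```

group() numbers the distinct values of tcn in sorted order, so the codes follow the sort order of the original IDs rather than their order in the data.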


      • #4
        gen long ... was the winner. Thank you!


        • #5
          Good that you solved your problem, but

          _n(tcn)

          is puzzling syntax. _n isn't a function; it's a kind of built-in variable that holds the current observation number.
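To illustrate what a built-in variable means here, a sketch with hypothetical toy data: _n gives the current observation number, square brackets subscript a variable (tcn[_n-1] is the previous observation's value), and under by: the count restarts within each group. If _n(tcn) was meant as a running number within each value of tcn, the by: form is probably closer to the intent, but that reading is only a guess. The variables obsno, prev, and within are made up for illustration.

```stata
* Sketch with hypothetical data; obsno, prev and within are illustrative names.
clear
input str18 tcn
"B"
"A"
"B"
end
generate long obsno = _n                        // 1, 2, 3: observation numbers
generate str18 prev = tcn[_n-1]                 // previous value; empty in obs 1
bysort tcn (obsno): generate long within = _n   // restarts at 1 in each tcn group
list
```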
