  • How to add leading zeros for string variables containing both numbers and letters

    Hi Statalisters,

    I am trying to add leading zeros to my string variable cusip using the code -gen cusipnew = string(real(cusip),"%09.0f")-.
    However, the result is not ideal because this string variable contains both digits and letters: for observations containing letters, the result is missing ("."). Another tricky problem is that the values do not all have the same number of characters, so some need leading zeros and some do not. Is there a better solution to this problem?

    Thanks in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str17 cusip str9 cusipnew
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "361105"    "000361105"
    "886309"    "000886309"
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "02376R102" "."        
    "125141101" "125141101"
    "125141101" "125141101"
    "125141101" "125141101"
    "125141101" "125141101"
    "125141101" "125141101"
    "125141101" "125141101"
    end
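    As a minimal sketch of why the original approach returns "." for some rows: real() returns missing for any string that is not a valid number, and string() then formats that missing value as ".". The three values below are taken from the example data; the helper variable asnum is a hypothetical name introduced only for illustration.

```stata
* Sketch using a hand-typed subset of the example data above.
clear
input str9 cusip
"361105"
"02376R102"
"125141101"
end

* real() gives missing (.) when the string is not a valid number,
* e.g. "02376R102", which contains the letter R.
* Stored as double: a float cannot hold a 9-digit value like 125141101 exactly.
* (asnum is a hypothetical helper variable for this illustration.)
generate double asnum = real(cusip)

* Formatting a missing value with %09.0f yields the string "."
generate str9 cusipnew = string(asnum, "%09.0f")
list, noobs separator(0)
```

    Padding the string directly, as in the replies below, avoids the round trip through a number altogether.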

  • #2
    Code:
    generate long len = strlen(cusip)
    summarize len, meanonly
    generate str cusipnew = substr("0" * r(max) + cusip, -r(max), .)
    drop len
    list, noobs separator(0)



    • #3
      Originally posted by Joseph Coveney
      Code:
      generate long len = strlen(cusip)
      summarize len, meanonly
      generate str cusipnew = substr("0" * r(max) + cusip, -r(max), .)
      drop len
      list, noobs separator(0)

      Thanks Joseph. Your method converts to numeric first and then converts back.
      I also tried another piece of code that works as well:

      -gen cusipnew = substr("000000000", 1, 9 - length(cusip)) + cusip-

      which adds leading zeros directly.
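      A minimal sketch of how this expression behaves, assuming CUSIPs are at most 9 characters long: for "361105", 9 - length(cusip) is 3, so substr() keeps the first three zeros, while a value that already has 9 characters gets an empty prefix. The variable nzeros is a hypothetical helper added only to make that count visible.

```stata
* Sketch on a subset of the example data; assumes no cusip exceeds 9 chars.
clear
input str9 cusip
"361105"
"886309"
"02376R102"
"125141101"
end

* How many zeros each value needs (0 when it already has 9 characters).
* nzeros is a hypothetical helper variable for this illustration.
generate byte nzeros = 9 - length(cusip)

* Take exactly that many zeros from a 9-zero string and prepend them.
generate str9 cusipnew = substr("000000000", 1, nzeros) + cusip
list, noobs separator(0)
```

      Because the string is never converted to a number, values containing letters such as "02376R102" pass through unchanged.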



      • #4
        Code:
        gen cusipnew = substr(9 * "0", 1, 9 - length(cusip)) + cusip
        will work in Stata 16 and a few versions back.



        • #5
          Also:
          Code:
          gen cusipnew_1 = substr(9 * "0" + cusip, -9, .)
          Code:
          gen cusipnew_2 = "0" * ( 9-strlen(cusip) ) + cusip
          One might also use trim(cusip) in place of cusip in the code above, to strip any leading or trailing blanks before padding.
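          As a rough check, assuming CUSIPs of at most 9 characters, the approaches from posts #3, #4 and #5 can be run side by side on a subset of the example data; they should agree. The variable names v3, v5a, v5b and v5t are hypothetical labels for this comparison only.

```stata
* Sketch comparing the padding expressions posted in this thread.
* v3, v5a, v5b, v5t are hypothetical names used only for the comparison.
clear
input str9 cusip
"361105"
"886309"
"02376R102"
"125141101"
end

* Post #3 (and #4): take only as many zeros as are missing, then append
generate str9 v3 = substr("000000000", 1, 9 - length(cusip)) + cusip
* Post #5, first form: prepend nine zeros, then keep the last 9 characters
generate str9 v5a = substr(9 * "0" + cusip, -9, .)
* Post #5, second form: build exactly the number of zeros needed
generate str9 v5b = "0" * (9 - strlen(cusip)) + cusip
* Post #5 with trim(), in case values carry stray leading/trailing blanks
generate str9 v5t = substr(9 * "0" + trim(cusip), -9, .)

* All four should produce identical strings on these data
assert v3 == v5a & v5a == v5b & v5b == v5t
list, noobs separator(0)
```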



          • #6
            From post #3, commenting on post #2:

            Your method converts to numeric first and then converts back.
            is not in fact the case. It is simply more cautious code that first calculates the maximum length of a cusip in the data, rather than assuming that it is 9. So in
            Code:
            generate str cusipnew = substr("0" * r(max) + cusip, -r(max), .)
            if the longest cusip is 9 characters long, r(max) will be 9 and the command becomes
            Code:
            generate str cusipnew = substr("0" * 9 + cusip, -9, .)
            as seen later in post #5.
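            A minimal sketch of the general version, assuming the target width should be the longest cusip found in the data: summarize, meanonly leaves that length in r(max), which is saved here in a local macro (maxlen, a hypothetical name) so later commands cannot overwrite it, followed by a check that every padded value has that width.

```stata
* Sketch on a subset of the example data; maxlen is a hypothetical local name.
clear
input str9 cusip
"361105"
"02376R102"
"125141101"
end

* Longest cusip length in the data ends up in r(max)
generate long len = strlen(cusip)
summarize len, meanonly
local maxlen = r(max)
display `maxlen'

* Prepend maxlen zeros, then keep the last maxlen characters
generate str cusipnew = substr("0" * `maxlen' + cusip, -`maxlen', .)
drop len

* Every padded value should now be exactly maxlen characters long
assert strlen(cusipnew) == `maxlen'
list, noobs separator(0)
```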
            Last edited by William Lisowski; 23 Feb 2020, 08:08.



            • #7
              Thank you all for sharing; these are helpful!

