
  • How to add one or two leading zeros to a string variable while keeping the other values unchanged?

    Dear Statalisters,

    I have a string variable called "oldcodes" and, as in other forum questions I have read, Stata removed the leading zeros. Some codes in oldcodes have 4 characters and others have 3, and I want to keep each code at its original length. However, when I imported the codes "004" "005" "006" "021" "022" "023" from Excel, Stata removed the leading zeros, so these 3-character codes became "4" "5" "6" "21" "22" "23". I used the command gen oldcodes = substr("0" * 4 + newcodes, -4, .) to keep the 4-character codes; however, the codes "014A" "014B" "014C" became "14A" "14B" "14C", and of course the codes "21" "22" "23" became "0021" "0022" "0023". Please note that I have around 4,879 observations, and about 400 of these have leading zeros that were dropped. How can I keep the 4-character codes as they are and restore the leading zeros of the 3-character codes?

    As a side note, I will later convert this variable to a numeric variable, because I need to use it as an ID variable for a panel data set. Therefore, I think these codes need to be consistent before I encode the variable to numeric. I greatly appreciate your help.

    Code: gen oldcodes = substr("0" * 4 + newcodes, -4, .)

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 oldcodes str4 newcodes
    "901A" "901A"
    "901B" "901B"
    "901C" "901C"
    "901D" "901D"
    "902Z" "902Z"
    "4" "0004"
    "5" "0005"
    "6" "0006"
    "21" "0021"
    "22" "0022"
    "23" "0023"
    "014A" "14A"
    "014B" "14B"
    "014C" "14C"
    "A05A" "A05A"
    "A05B" "A05B"
    "A06Z" "A06Z"
    "A07A" "A07A"
    "A07B" "A07B"
    "A07C" "A07C"
    "A11A" "A11A"
    "A11B" "A11B"
    "A11C" "A11C"
    "A11D" "A11D"
    "A13A" "A13A"
    "A13B" "A13B"
    "A13C" "A13C"
    "A13D" "A13D"
    "A13E" "A13E"
    "A15A" "A15A"
    "A15B" "A15B"
    "A15C" "A15C"
    "A16Z" "A16Z"
    "A17A" "A17A"
    "A17B" "A17B"
    "A18Z" "A18Z"
    "A36A" "A36A"
    "A36B" "A36B"
    "A42A" "A42A"
    "A42B" "A42B"
    end


    Sincerely,
    Rosie Syy

  • #2
    Is this perhaps the result you wanted?
    Code:
    . generate wanted = oldcodes
    
    . replace wanted = substr("000"+wanted,-3,.) if length(wanted)<4
    (6 real changes made)
    
    . list oldcodes wanted, clean noobs
    
        oldcodes   wanted  
            901A     901A  
            901B     901B  
            901C     901C  
            901D     901D  
            902Z     902Z  
               4      004  
               5      005  
               6      006  
              21      021  
              22      022  
              23      023  
            014A     014A  
            014B     014B  
            014C     014C  
            A05A     A05A  
            A05B     A05B  
            A06Z     A06Z  
            A07A     A07A  
            A07B     A07B  
            A07C     A07C  
            A11A     A11A  
            A11B     A11B  
            A11C     A11C  
            A11D     A11D  
            A13A     A13A  
            A13B     A13B  
            A13C     A13C  
            A13D     A13D  
            A13E     A13E  
            A15A     A15A  
            A15B     A15B  
            A15C     A15C  
            A16Z     A16Z  
            A17A     A17A  
            A17B     A17B  
            A18Z     A18Z  
            A36A     A36A  
            A36B     A36B  
            A42A     A42A  
            A42B     A42B  
    
    .
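The condition on the replace command is what separates the two cases: only values shorter than four characters are padded, and they are padded to three characters, so the four-character codes are never touched. A minimal self-contained sketch on four of the codes from the example data follows. It uses strlen(), which to my knowledge is the documented name for the length() function used above, and the variable name wanted is carried over from the answer.

```stata
* Sketch only: four of the codes from the -dataex- example
clear
input str5 oldcodes
"901A"
"4"
"21"
"014A"
end

generate wanted = oldcodes

* Prepend three zeros, then keep the last three characters,
* but only for codes shorter than four characters.
replace wanted = substr("000" + wanted, -3, .) if strlen(wanted) < 4

list oldcodes wanted, clean noobs
```

Padding every value to a fixed width of four instead would turn "21" into "0021", which is the outcome the original post wanted to avoid.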



    • #3
      Your post is a little confusing, because you talk about creating a variable oldcodes from newcodes, which seems odd, so I don't know exactly what you are starting with and what you are ending up with. It looks like neither oldcodes nor newcodes satisfies your requirement of being 4 characters long, packed with leading zeroes. The code to get that from either oldcodes or newcodes is, however, exactly what you tried, and I don't see any reason why it should not have worked for you. It works for me:

      Code:
      . gen zero_packed_4_chars1 = substr("0"*4+oldcodes, -4, .)
      
      . gen zero_packed_4_chars2 = substr("0"*4+newcodes, -4, .)
      
      . assert zero_packed_4_chars1 == zero_packed_4_chars2 & strlen(zero_packed_4_chars1) == 4
      
      . list, noobs clean
      
          oldcodes   newcodes   zero_p~1   zero_p~2  
              901A       901A       901A       901A  
              901B       901B       901B       901B  
              901C       901C       901C       901C  
              901D       901D       901D       901D  
              902Z       902Z       902Z       902Z  
                 4       0004       0004       0004  
                 5       0005       0005       0005  
                 6       0006       0006       0006  
                21       0021       0021       0021  
                22       0022       0022       0022  
                23       0023       0023       0023  
              014A        14A       014A       014A  
              014B        14B       014B       014B  
              014C        14C       014C       014C  
              A05A       A05A       A05A       A05A  
              A05B       A05B       A05B       A05B  
              A06Z       A06Z       A06Z       A06Z  
              A07A       A07A       A07A       A07A  
              A07B       A07B       A07B       A07B  
              A07C       A07C       A07C       A07C  
              A11A       A11A       A11A       A11A  
              A11B       A11B       A11B       A11B  
              A11C       A11C       A11C       A11C  
              A11D       A11D       A11D       A11D  
              A13A       A13A       A13A       A13A  
              A13B       A13B       A13B       A13B  
              A13C       A13C       A13C       A13C  
              A13D       A13D       A13D       A13D  
              A13E       A13E       A13E       A13E  
              A15A       A15A       A15A       A15A  
              A15B       A15B       A15B       A15B  
              A15C       A15C       A15C       A15C  
              A16Z       A16Z       A16Z       A16Z  
              A17A       A17A       A17A       A17A  
              A17B       A17B       A17B       A17B  
              A18Z       A18Z       A18Z       A18Z  
              A36A       A36A       A36A       A36A  
              A36B       A36B       A36B       A36B  
              A42A       A42A       A42A       A42A  
              A42B       A42B       A42B       A42B  
      
      .
      Added: Crossed with #2.
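The original post mentions later turning the cleaned codes into a numeric panel identifier. One possible way to do that, shown here only as a hedged sketch, is egen's group() function, which assigns the integers 1, 2, ... to the distinct values of a variable in sorted order. The variable wanted is assumed to hold the cleaned codes, and code_id is a hypothetical name not used anywhere in the thread.

```stata
* Sketch only: a few cleaned codes, with one repeated to mimic panel data
clear
input str4 wanted
"004"
"021"
"014A"
"901A"
"004"
end

* One integer per distinct code; repeated codes share the same id
egen long code_id = group(wanted)

list wanted code_id, clean noobs
```

The encode command is an alternative that also attaches the original strings as value labels. Either way, the padding has to be settled first, since "4" and "004" would otherwise be treated as different units.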



      • #4
        Wow! Thank you so much, William Lisowski, you're my savior. It worked. Thank you.



        • #5
          Clyde Schechter, thank you for your response, and apologies for any confusion I may have caused. I wanted to be as precise as possible about the result I wanted and the problem I had: keeping the 4-character codes as they are, and keeping the 3-character codes as they are with the leading zeros that Stata removed. William Lisowski provided the missing information I couldn't figure out. Thank you so much for all your help and quick responses. This forum is very helpful. Thank you.
