Dear Statalisters,
I have a string variable called "oldcodes" and, as in other forum questions I have read, Stata removed the leading zeroes. Some codes in oldcodes have 4 characters and some have 3, and I want each code to keep its original length. However, when I imported the 3-character codes "004" "005" "006" "021" "022" "023" from Excel, Stata removed the leading zeroes and they became "4" "5" "6" "21" "22" "23". I used the command gen oldcodes = substr("0" * 4 + newcodes, -4, .) to restore the 4-character codes; however, the codes "014A" "014B" "014C" became "14A" "14B" "14C", and of course the codes "21" "22" "23" became "0021" "0022" "0023". I have around 4,879 observations, and about 400 of these have lost their leading zeroes. How can I keep the 4-character codes as they are and restore the leading zeroes on the 3-character codes?
As a side note, I will later convert this variable into a numeric variable, because I need to use it as an ID variable for a panel data set. I therefore think the codes need to be consistent before I encode them to a numeric variable. I greatly appreciate your help.
Code: gen oldcodes = substr("0" * 4 + newcodes, -4, .)
* Example generated by -dataex-. For more info, type help dataex
clear
input str5 oldcodes str4 newcodes
"901A" "901A"
"901B" "901B"
"901C" "901C"
"901D" "901D"
"902Z" "902Z"
"4" "0004"
"5" "0005"
"6" "0006"
"21" "0021"
"22" "0022"
"23" "0023"
"014A" "14A"
"014B" "14B"
"014C" "14C"
"A05A" "A05A"
"A05B" "A05B"
"A06Z" "A06Z"
"A07A" "A07A"
"A07B" "A07B"
"A07C" "A07C"
"A11A" "A11A"
"A11B" "A11B"
"A11C" "A11C"
"A11D" "A11D"
"A13A" "A13A"
"A13B" "A13B"
"A13C" "A13C"
"A13D" "A13D"
"A13E" "A13E"
"A15A" "A15A"
"A15B" "A15B"
"A15C" "A15C"
"A16Z" "A16Z"
"A17A" "A17A"
"A17B" "A17B"
"A18Z" "A18Z"
"A36A" "A36A"
"A36B" "A36B"
"A42A" "A42A"
"A42B" "A42B"
end
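For reference, a minimal sketch of what the padding expression above does to strings of different lengths. The literal strings are taken from the examples in this post purely for illustration; the expression left-pads with zeroes and then keeps the last 4 characters, so every shorter code comes out 4 characters long regardless of its original length.

```stata
* "0" * 4 is string duplication and yields "0000";
* substr(s, -4, .) keeps the last 4 characters of s.
display substr("0" * 4 + "21", -4, .)     // 2-character input  -> 0021
display substr("0" * 4 + "14A", -4, .)    // 3-character input  -> 014A
display substr("0" * 4 + "901A", -4, .)   // 4-character input  -> 901A
```

This is why the expression alone cannot tell a code that should be "021" from one that should be "0021": both arrive as "21" once the zeroes are gone.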
Sincerely,
Rosie Syy