  • convert hexadecimal to binary

    I have variables whose values are hexadecimal strings 32 characters long, the result of applying an MD5 hash function to an original string. I need to convert those values into 128-bit binary numbers.
    I have been searching the forum but cannot find a command that performs this apparently simple task in Stata. Most likely I am overlooking it. Any help?
    Thanks, Luca

  • #2
    Check out deci (https://ideas.repec.org/c/boc/bocode/s457195.html)

    Code:
    ssc install deci
    help deci


    • #3
      I had checked that out already. deci does not take hex as input, but thanks anyway.


      • #4
        From their online help:

        Code:
            . clear all
        
            . set obs 1000
        
            . gen x1=int(1000*runiform())
        
            . gen str3 x2=string(x1)
        
            . Convert variable 'x1' from decimal system to binary system and generate a new variable 'x1_1':
        
            . deci x1, f(10) t(2) gen(x1_1)
        
            . Convert variable 'x2' from decimal system to octal system:
        
            . deci x2, f(10) t(8)
        
            . Convert variable 'x2' from octal system to binary system and generate a new variable 'x2_1':
        
            . deci x2, f(8) t(2) g(x2_1)
        
            . Convert variable 'x2_1' from binary system to hexadecimal system:
        
            . deci x2_1, f(2) t(16)
        Then I tried:
        Code:
        deci x2_1, f(16) t(2) g(x3)
        and it seems to work.

        Again, sorry if this is not what you were talking about. I do hope other users will be able to help. Good day.
        Last edited by Ken Chui; 31 Mar 2021, 08:42.


        • #5
          Stata's numeric storage types hold at most 64 bits: byte (1 byte), int (2 bytes), long and float (4 bytes each), and double (8 bytes). It does not appear possible to store a 128-bit numeric value in Stata.
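
As a small illustration of this limit (a sketch, not part of the original reply): a double has a 53-bit significand, so whole numbers are only guaranteed to be exact up to 2^53, far short of 128 bits. The commands below use only built-in display formats and string functions.

```stata
* 2^53 is the last point up to which every whole number is exact in a double
display %20.0f 2^53

* 2^53 + 1 cannot be represented and is rounded back to 2^53
display %20.0f 2^53 + 1

* the comparison is therefore true: this displays 1
display 2^53 + 1 == 2^53

* a 32-character hex string encodes 32 * 4 = 128 bits: this displays 128
display strlen("ffffffffffffffffffffffffffffffff") * 4
```

This is why the rest of the thread keeps the values as strings rather than numbers.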


          • #6
            William is correct: Stata cannot natively store such large numbers as numeric values. Stata has -inbase- and -inten- commands that are useful when the numbers do not need decimal representation. For your task, you would need to write your own way to store and parse the numbers as strings. Here I show a way that works specifically for base-16 to base-2 conversion, taking strings as input and producing strings as output. I take advantage of the fact that 16 is a power of 2 (16 = 2^4), so each hexadecimal digit corresponds to exactly four binary digits. Going from hex to binary therefore only requires reading off each hexadecimal digit, converting it to a binary string (left-padded with zeros as needed), and joining the pieces together. You could also use the previously mentioned approaches by parsing a few digits at a time, following the same general idea.

            I have two programs. -base16to2- converts a string representation of hex to a string representation of binary, and checks the input for valid characters (0-9 or a-f, case-insensitive). The second program, -hex2bin-, is a convenience program that applies this conversion over a variable list. It does not check input itself; the assumption is that invalid hexadecimal strings should yield missing values (empty strings, in this case).

            Code:
            capture program drop base16to2
            program define base16to2, rclass
              version 14
              args hex
            
              /* check for valid input: the whole string must consist of hex digits */
              /* (third argument 1 = case-insensitive match) */
              if ustrregexm("`hex'", "^[0-9a-f]+$", 1) != 1 {
                di in red `"Hex value "`hex'" contains invalid characters (only 0-9 and a-f are allowed)."'
                exit 198
              }
              local hex = strlower("`hex'")
            
              local hex_list "0123456789abcdef"
             
              tempname l idx res next
              local `l' = length("`hex'")
              local `res' = ""
              local `idx' = .
             
              forval hi = 1/``l'' {
                local `idx' = strpos("`hex_list'", substr("`hex'", `hi', 1)) - 1
                
                /* count out the hex value with powers of 2 */
                forval bi = 4(-1)1 {
                  local `next' = cond(int(mod(``idx'', 2^`bi') / 2^`=`bi'-1') >= 1, "1", "0")
                  local `res' = "``res''" + "``next''"
                }
              }
             
              return local base16 = "`hex'"
              return local base2 = "``res''"
            end
            
            capture program drop hex2bin
            program define hex2bin
              version 14
              syntax varlist (min=1 str) [if] [in], GENerate(name)
              marksample touse , strok
             
              unab hexvars : `varlist'
              local nhexvars : word count `hexvars'
            
              local vi 0
              foreach hex of varlist `hexvars' {
                local ++vi
                
                qui gen `generate'`vi' = ""
                forval i = 1/`=_N' {
                  qui cap base16to2 `=`hex'[`i']'
                  qui replace `generate'`vi' = "`r(base2)'" if `touse' & _n==`i'
                }
              }
            
              if `nhexvars'==1 {
                rename `generate'1 `generate'
              }
            end
            
            * start here
            input str9 value str32 hex
            "1" "1"
            "23" "17"
            "1" "00000000000000000000000000000001"
            "2^124" "10000000000000000000000000000000"
            "(2^128)-1" "ffffffffffffffffffffffffffffffff"
            "missing" "wrong"
            "42" "2a"
            end
            
            base16to2 0123aA
            ret list
            
            hex2bin hex , gen(want)
            list
            Results:

            Code:
            . base16to2 0123aA
            . ret list
            
            macros:
                          r(base2) : "000000010010001110101010"
                         r(base16) : "0123aa"
            
            . hex2bin hex , gen(want)
            
            . list
            
                 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                 |     value                                hex                                                                                                                               want |
                 |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
              1. |         1                                  1                                                                                                                               0001 |
              2. |        23                                 17                                                                                                                           00010111 |
              3. |         1   00000000000000000000000000000001   00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 |
              4. |     2^124   10000000000000000000000000000000   00010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |
              5. | (2^128)-1   ffffffffffffffffffffffffffffffff   11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 |
                 |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
              6. |   missing                              wrong                                                                                                                                    |
              7. |        42                                 2a                                                                                                                           00101010 |
                 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
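
The digit-by-digit idea described in this post can also be applied to a whole variable at once, instead of calling a program once per observation. The following is only a sketch under stated assumptions: the program name `hex2bin_vec` and its internals are hypothetical and not part of the original answer, it handles a single string variable, and it returns an empty string for any value that is not made up entirely of hex digits.

```stata
* Hypothetical helper (not from the original post): converts one string
* variable of hex digits to a string variable of binary digits.
capture program drop hex2bin_vec
program define hex2bin_vec
  version 14
  syntax varname(string), GENerate(name)
  confirm new variable `generate'

  * hex digits and their 4-bit patterns, in the same order
  local digits  "0123456789abcdef"
  local nibbles "0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111"

  tempvar low len ok pos
  quietly generate `low' = strlower(`varlist')
  quietly generate long `len' = strlen(`low')
  * a value is valid only if it consists entirely of hex digits
  quietly generate byte `ok' = ustrregexm(`low', "^[0-9a-f]+$")
  quietly generate `generate' = ""
  quietly generate byte `pos' = 0

  * longest valid value determines how many character positions to visit
  quietly summarize `len' if `ok', meanonly
  if r(N) == 0 exit
  local maxlen = r(max)

  forvalues i = 1/`maxlen' {
    * position (1 to 16) of the i-th character within the digit list
    quietly replace `pos' = strpos("`digits'", substr(`low', `i', 1))
    forvalues d = 1/16 {
      local bits : word `d' of `nibbles'
      * append the 4-bit pattern for observations whose i-th digit is digit d
      quietly replace `generate' = `generate' + "`bits'" if `ok' & `i' <= `len' & `pos' == `d'
    }
  }
end
```

With the example data from this post in memory, `hex2bin_vec hex, generate(want2)` followed by `assert want2 == want` is a quick way to compare the two approaches.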


            • #7
              In reply to Ken Chui's post (#4):

              When I try this on my hex variable I get the following error message:
              charlist md5_1
              0123456789abcdef
              . deci md5_1, f(16) t(2) g(x3)
              inbase: cannot convert nonintegers
              r(198);

              As pointed out by William and Leonardo, it may not be a conversion issue but a storage capacity issue with such big numbers.



              • #8
                In reply to Leonardo Guizzetti's post (#6):
                Excellent Leonardo, problem solved. Grazie!


                • #9
                  Say the -base16to2- program accepted a second argument (a parameter h0, for instance), so that line 4 of the program above became
                  Code:
                  args hex h0
                  How would the convenience program -hex2bin- have to be modified to accommodate that extra argument?
                  Thank you
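
One possible way to do this, offered as a sketch rather than a definitive answer: give -hex2bin- an optional `h0()` option and hand its contents to -base16to2- as the second positional argument. The option name and the use of `_rc` to skip failed conversions are assumptions made here for illustration; what `h0` actually does inside -base16to2- is not specified in the thread.

```stata
* Sketch only: h0() is a hypothetical option that is passed through to
* -base16to2- as its second positional argument.
capture program drop hex2bin
program define hex2bin
  version 14
  syntax varlist(min=1 string) [if] [in], GENerate(name) [h0(string)]
  marksample touse, strok

  local nhexvars : word count `varlist'
  local vi 0
  foreach hex of varlist `varlist' {
    local ++vi
    qui gen `generate'`vi' = ""
    forval i = 1/`=_N' {
      * quote both arguments so an empty hex value cannot shift h0 into first place
      capture base16to2 "`=`hex'[`i']'" "`h0'"
      * store the result only when the conversion succeeded
      if _rc == 0 {
        qui replace `generate'`vi' = "`r(base2)'" if `touse' & _n == `i'
      }
    }
  }

  if `nhexvars' == 1 {
    rename `generate'1 `generate'
  }
end
```

A call would then look like `hex2bin hex, gen(want) h0(1)`, where the value `1` is only a placeholder.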
