Converting from hex to base64 - python

I have been tasked with taking a hex string and converting it to base64. My problem is that my code produces incorrect output.
Input given:
49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
Output expected:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
Output from my program:
kk7aga2lY2NL5nIHLe6uiBicMLS3IGxp1spAhIHBe0ub9ub3rmQNXVzaOTe3
How I expect my code to work:
NOTE: I understand that Python has built-in ways to convert something to base64, and that int() can convert any base to decimal. I have opted to write my own versions as a way to understand the problem better.
Take a string of hexadecimal text and convert it to a decimal number.
Convert the decimal number to a binary number.
Separate the binary number into chunks of 24 bits.
Split the chunk of 24 bits into 4 sections of 6 bits each.
Convert each 6 bits into a decimal number.
Convert the decimal number into the base64-encoded letter/number, from "A" (index 0) to "/" (index 63).
My code:
def convertToDecimal(text, original_base):
    '''
    Program assumes user is using it correctly. It does not bother checking for weird cases like a base being 0 or a negative.
    '''
    decimal = 0  # used for converting to decimal base first
    exp = len(text) - 1  # starting exponent of the base (8^1, 8^0, etc)
    hex_nums = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
    # convert original_base to a decimal number
    for val in text:
        if val in hex_nums:  # have to worry about letters
            val = hex_nums[val]
        decimal += int(val) * (original_base ** exp)
        exp -= 1
    return decimal
def base64(text, base):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    bin_num = bin(convertToDecimal(text, base))[2:]
    base64_val = ""
    # used for indexing each chunk of binary values
    i = 0
    j = 25
    # used for each chunk of six bits to be converted into a number
    six_bit_chunk = ""
    while True:
        if j > len(bin_num):
            # prevents index out of bounds error
            break
        bin_chunk = bin_num[i:j]  # take a chunk of 24 bits
        for x in range(0, 24, 6):
            six_bit_chunk += bin_chunk[x:x + 6]  # take a smaller chunk of 6 bits
            index = convertToDecimal(six_bit_chunk, 2)  # use the 6 bits to create a decimal value from 0-63
            base64_val += letters[index]
            six_bit_chunk = ""
        i += 25
        j += 25
    return base64_val
new_text = base64("49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d",
                  16)
print(new_text)
To debug this, I looked at what each six-bit chunk produces. Each chunk gives the correct decimal number and the correct base64 letter/number.

Can't you just iterate over the binary representation in groups of 6 directly instead of grouping by 24 and separating into 4 groups again?
It still has to be left-padded with zeros to a multiple of 24 bits before splitting into groups of 6, because bin() drops leading zeros.
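To see why the padding matters, here is a minimal sketch that uses only the first three bytes of the test input. 0x49 is 0100 1001, so bin() drops its leading 0. Without padding, the first 6-bit group becomes 100100 (36, "k"), which matches the start of the wrong output in the question. Padding back to 4 bits per hex digit restores the expected "SSdt". The variable names are illustrative.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
hex_text = "49276d"  # first three bytes of the test input
bits = bin(int(hex_text, 16))[2:]
# bin() dropped the leading 0 of 0x49 (0100 1001), so only 23 bits remain.
print(len(bits))
# Without padding, the first 6-bit group is misaligned.
first_unpadded = ALPHABET[int(bits[:6], 2)]
print(first_unpadded)
# Each hex digit is exactly 4 bits, so pad back to 4 * len(hex_text) bits.
padded = bits.zfill(4 * len(hex_text))
# Decode each aligned 6-bit group into a base64 character.
decoded = "".join(ALPHABET[int(padded[k:k + 6], 2)] for k in range(0, len(padded), 6))
print(decoded)
```

Padding to 4 * len(hex_text) gives the same result as padding to a multiple of 24 whenever the hex input is a whole number of 3-byte groups.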
My function:
def base64_v2(text, base):
    base64_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    n1 = 24
    n2 = 6
    dec_num = int(text, base)
    bin_str = bin(dec_num)[2:]
    # pad with zeros to the left to get multiple of 'n1'
    remainder = divmod(len(bin_str), n1)[1]
    if remainder > 0:
        additional_zeros = n1 - remainder
        bin_str = ('0' * additional_zeros) + bin_str
    return ''.join(
        base64_letters[int(bin_str[i:i+n2], 2)]
        for i in range(0, len(bin_str), n2))
My function passes all the following test cases:
sample_data = [
(
'49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d',
'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t',
),
(
'24e1f218372064987d5457b639819715bddb51558cc4acb3ce6eeacd7afa2751bac0a5c7c3c636f24babff4ad68473db',
'JOHyGDcgZJh9VFe2OYGXFb3bUVWMxKyzzm7qzXr6J1G6wKXHw8Y28kur/0rWhHPb',
),
(
'79d6e3a9d5bfba9902274e1b1c2d956fa92d16ba9a972825eb0b32aecd2401653624f900fe79b4550c24aae136ae7260',
'edbjqdW/upkCJ04bHC2Vb6ktFrqalygl6wsyrs0kAWU2JPkA/nm0VQwkquE2rnJg',
),
(
'd251b89c473d6336dfeb3714d92c6b2080c863624ca2746e1380bd0642893ba64ebdae23bb7ad9a1f3ad9efbdc2f18e6',
'0lG4nEc9Yzbf6zcU2SxrIIDIY2JMonRuE4C9BkKJO6ZOva4ju3rZofOtnvvcLxjm',
),
(
'5c2b578064178ad329f37041b063ec05c3ce8f202bb44e9a1260c6ded11ddd91d25ac83bba31bac7987e2da3a188c23d',
'XCtXgGQXitMp83BBsGPsBcPOjyArtE6aEmDG3tEd3ZHSWsg7ujG6x5h+LaOhiMI9',
),
(
'018b79f5c3c1a4f59d12cda25c5ca2a29c4c1fdd1cfdf3f0a4faf350fe384d21bcfd83a8350b49231cf8536595f2a43a',
'AYt59cPBpPWdEs2iXFyiopxMH90c/fPwpPrzUP44TSG8/YOoNQtJIxz4U2WV8qQ6',
),
(
'a617cfbe469cecd19f5ac75303a3049319ffb03d9a757c690d7c09d94dbabd6dce2314e1f409e6285fc0a0220eb803fe',
'phfPvkac7NGfWsdTA6MEkxn/sD2adXxpDXwJ2U26vW3OIxTh9AnmKF/AoCIOuAP+',
),
(
'2bc40041dbe6937e1113b191fd136bcdd741169e9e81809e83ad3104d447d700ed9d1ab5cfbc113c0731b855fde98f87',
'K8QAQdvmk34RE7GR/RNrzddBFp6egYCeg60xBNRH1wDtnRq1z7wRPAcxuFX96Y+H',
),
(
'1904435f5964861c5284b08e0ebf3d201da3ec795a3d9b0f7d5056c4369daed59d42cac72c897356a46305f7aac5fcb4',
'GQRDX1lkhhxShLCODr89IB2j7HlaPZsPfVBWxDadrtWdQsrHLIlzVqRjBfeqxfy0',
),
]
for i, (input_str, expected_output) in enumerate(sample_data):
    print('i ', i)
    print('input_str ', input_str)
    print('expected_output', expected_output)
    function_output = base64_v2(input_str, 16)
    print('function_output', function_output)
    assert expected_output == function_output
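Both integer-based approaches lose information in two cases. If the input starts with three or more zero bytes (000000...), those bytes vanish in the integer. If the input length is not a multiple of 3 bytes, the result is misaligned instead of ending in "=" padding. A byte-oriented sketch of the same steps avoids both problems. It is still hand-written rather than the base64 module. The name hex_to_base64 is hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def hex_to_base64(hex_text):
    # Work on bytes instead of one big integer so leading zero bytes are kept.
    data = bytes.fromhex(hex_text)
    out = []
    # Walk the input 3 bytes (24 bits) at a time.
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)
        # Pack the chunk into a 24-bit integer, padding a short final chunk with zero bytes.
        block = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indices, most significant first.
        for shift in (18, 12, 6, 0):
            out.append(ALPHABET[(block >> shift) & 0x3F])
        # Characters that only encode padding bits are replaced by '='.
        if missing:
            out[-missing:] = "=" * missing
    return "".join(out)
```

For example, hex_to_base64("4d") gives "TQ==", the standard encoding of the single byte "M".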

I'm not entirely sure this is correct, but with a few changes I got the output of the function to match the expected output you provided.
The changes (also taking into account the comment from user @JamesKPolt):
everywhere you had 25, I used 24 instead (bin_num[i:j] then covers exactly 24 bits, and no bit is skipped between chunks)
add zero padding to the beginning of bin_num to bring its length up to a multiple of 24 (bin() drops leading zeros)
The modified code:
def base64(text, base):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    bin_num = bin(convertToDecimal(text, base))[2:]
    # pad with zeros to the left to get multiple of 24
    remainder = divmod(len(bin_num), 24)[1]
    if remainder > 0:
        additional_zeros = 24 - remainder
        bin_num = ('0' * additional_zeros) + bin_num
    base64_val = ""
    # used for indexing each chunk of binary values
    i = 0
    j = 24
    # used for each chunk of six bits to be converted into a number
    six_bit_chunk = ""
    while True:
        if j > len(bin_num):
            # prevents index out of bounds error
            break
        bin_chunk = bin_num[i:j]  # take a chunk of 24 bits
        for x in range(0, 24, 6):
            six_bit_chunk += bin_chunk[x:x + 6]  # take a smaller chunk of 6 bits
            index = convertToDecimal(six_bit_chunk, 2)  # use the 6 bits to create a decimal value from 0-63
            base64_val += letters[index]
            six_bit_chunk = ""
        i += 24
        j += 24
    return base64_val
This modified function passes all of the following test cases:
sample_data = [
(
'49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d',
'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t',
),
(
'24e1f218372064987d5457b639819715bddb51558cc4acb3ce6eeacd7afa2751bac0a5c7c3c636f24babff4ad68473db',
'JOHyGDcgZJh9VFe2OYGXFb3bUVWMxKyzzm7qzXr6J1G6wKXHw8Y28kur/0rWhHPb',
),
(
'79d6e3a9d5bfba9902274e1b1c2d956fa92d16ba9a972825eb0b32aecd2401653624f900fe79b4550c24aae136ae7260',
'edbjqdW/upkCJ04bHC2Vb6ktFrqalygl6wsyrs0kAWU2JPkA/nm0VQwkquE2rnJg',
),
(
'd251b89c473d6336dfeb3714d92c6b2080c863624ca2746e1380bd0642893ba64ebdae23bb7ad9a1f3ad9efbdc2f18e6',
'0lG4nEc9Yzbf6zcU2SxrIIDIY2JMonRuE4C9BkKJO6ZOva4ju3rZofOtnvvcLxjm',
),
(
'5c2b578064178ad329f37041b063ec05c3ce8f202bb44e9a1260c6ded11ddd91d25ac83bba31bac7987e2da3a188c23d',
'XCtXgGQXitMp83BBsGPsBcPOjyArtE6aEmDG3tEd3ZHSWsg7ujG6x5h+LaOhiMI9',
),
(
'018b79f5c3c1a4f59d12cda25c5ca2a29c4c1fdd1cfdf3f0a4faf350fe384d21bcfd83a8350b49231cf8536595f2a43a',
'AYt59cPBpPWdEs2iXFyiopxMH90c/fPwpPrzUP44TSG8/YOoNQtJIxz4U2WV8qQ6',
),
(
'a617cfbe469cecd19f5ac75303a3049319ffb03d9a757c690d7c09d94dbabd6dce2314e1f409e6285fc0a0220eb803fe',
'phfPvkac7NGfWsdTA6MEkxn/sD2adXxpDXwJ2U26vW3OIxTh9AnmKF/AoCIOuAP+',
),
(
'2bc40041dbe6937e1113b191fd136bcdd741169e9e81809e83ad3104d447d700ed9d1ab5cfbc113c0731b855fde98f87',
'K8QAQdvmk34RE7GR/RNrzddBFp6egYCeg60xBNRH1wDtnRq1z7wRPAcxuFX96Y+H',
),
(
'1904435f5964861c5284b08e0ebf3d201da3ec795a3d9b0f7d5056c4369daed59d42cac72c897356a46305f7aac5fcb4',
'GQRDX1lkhhxShLCODr89IB2j7HlaPZsPfVBWxDadrtWdQsrHLIlzVqRjBfeqxfy0',
),
]
for i, (input_str, expected_output) in enumerate(sample_data):
    print('i ', i)
    print('input_str ', input_str)
    print('expected_output', expected_output)
    function_output = base64(input_str, 16)
    print('function_output', function_output)
    assert expected_output == function_output
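To check outputs like these independently, the standard library can serve as a reference. bytes.fromhex() keeps every byte, and base64.b64encode() returns bytes, so the result is decoded to a str. This is only a sketch for verification and does not replace the hand-written version the question asks for.

```python
import base64

hex_text = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"
# Convert hex to raw bytes, base64-encode them, then turn the result into a str.
reference = base64.b64encode(bytes.fromhex(hex_text)).decode("ascii")
print(reference)
```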


Splitting arrays in unequal chunks using np.array_split: overcomplicating?

I am trying to convert an octal number to decimal.
The inputs are strings of numbers such as "23", "23 24", or "23 24 25". My code works for inputs like these, but it cannot handle, say, "23 240" or "23 240 1", i.e., when the numbers have different lengths, the array is split incorrectly.
I think I've overcomplicated it by using arrays. Is there a way to assess each input individually (i.e. "23" then "240" then "1"), and then put these back into the desired output "19 160 1"?
Code:
import numpy as np
def decode(code):
    decimals = []
    code = code.split()
    for n in code:
        p = len(n)
        for digit in n:
            decimal = int(digit) * (8**(p - 1))
            decimals.append(decimal)
            p -= 1
    split_input = np.array_split(decimals, len(code))
    sum_decimals = []
    for number in split_input:
        sum_decimal = sum(map(int, number))
        sum_decimals.append(str(sum_decimal))
    separate_outputs = " ".join(sum_decimals)
    return str(separate_outputs)
Does this work? Try running it in an online interpreter such as Google Colab.
CODE:
import numpy as np
def decode(code):
    if isinstance(code, str):
        decimals = []
        code = code.split(' ')
        for n in code:
            p = len(n)
            for digit in n:
                decimal = int(digit) * (8**(p - 1))
                decimals.append(decimal)
                p -= 1
        split_input = np.array_split(decimals, len(code))
        sum_decimals = []
        for number in split_input:
            sum_decimal = sum(map(int, number))
            sum_decimals.append(str(sum_decimal))
        separate_outputs = " ".join(sum_decimals)
        return str(separate_outputs)
INPUT:
str1 = input('Enter octals:') #23 240 1
decode(str1)
OUTPUT:
19 160 1
EXTRAS:
If you copy the code, make sure to fix the indentation, because mixed tabs and spaces can lead to errors.
You could also add an else branch that raises an error if the function does not receive a string, or add several if...else branches to handle different input types such as str, int, list, tuple, etc.
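np.array_split() cuts the flat list of digit values into len(code) nearly equal pieces. The pieces therefore line up with the tokens only when the digit counts happen to fit. For "23 240", the five values are split 3 + 2 instead of 2 + 3. Converting each token on its own avoids the problem and needs no NumPy. The sketch below is hypothetical: it uses Horner's method (multiply the running value by 8, then add the next digit) and adds the type check suggested in EXTRAS. int(token, 8) would do the per-token conversion as a built-in.

```python
def octal_token_to_int(token):
    # Horner's method: shift the running value one octal place, then add the next digit.
    value = 0
    for digit in token:
        d = int(digit)
        if d > 7:
            raise ValueError(f"{digit!r} is not an octal digit")
        value = value * 8 + d
    return value

def decode(code):
    # Accept one space-separated string, or a list/tuple of tokens.
    if isinstance(code, str):
        tokens = code.split()
    elif isinstance(code, (list, tuple)):
        tokens = [str(item) for item in code]
    else:
        raise TypeError(f"expected str, list or tuple, got {type(code).__name__}")
    # Each token is converted separately, so tokens of different lengths cannot interfere.
    return " ".join(str(octal_token_to_int(t)) for t in tokens)
```

With this version, decode("23 240") returns "19 160".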

int 111 to binary 111(decimal 7)

Problem: Take a number, for example 37 (binary 100101).
Count the 1s in its binary representation, build a binary number made of that many 1s (here 111), and print the decimal value of that binary number (7).
num = bin(int(input()))
st = str(num)
count=0
for i in st:
    if i == "1":
        count +=1
del st
vt = ""
for i in range(count):
    vt = vt + "1"
vt = int(vt)
print(vt)
I am a newbie and stuck here.
I wouldn't recommend your approach, but to show where you went wrong:
num = bin(int(input()))
st = str(num)
count = 0
for i in st:
    if i == "1":
        count += 1
del st
# start the string representation of the binary value correctly
vt = "0b"
for i in range(count):
    vt = vt + "1"
# tell the `int()` function that it should consider the string as a binary number (base 2)
vt = int(vt, 2)
print(vt)
Note that the code below does exactly the same thing as yours, only more concisely:
ones = bin(int(input())).count('1')
vt = int('0b' + '1' * ones, 2)
print(vt)
It uses the string method count() to store the number of ones in ones, and it uses Python's ability to repeat a string a given number of times with the multiplication operator *.
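A string of k binary ones always has the value 2**k - 1 (for example, 0b111 == 7), so the string building can be skipped entirely. Here is a minimal sketch with a hypothetical function name. On Python 3.10+, n.bit_count() can replace bin(n).count("1").

```python
def ones_value(n):
    # Count the 1 bits in n's binary representation ('0b' contains no '1').
    ones = bin(n).count("1")
    # A binary number made of k ones equals 2**k - 1, computed here with a shift.
    return (1 << ones) - 1

print(ones_value(37))
```

Unlike the int('0b' + '1' * ones, 2) version, this also handles an input of 0. That version ends up parsing '0b', which raises ValueError.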
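Try this once you have the required binary (an int whose decimal digits are all 0 or 1, such as 111):
def binaryToDecimal(binary):
    decimal, i = 0, 0
    while binary != 0:
        dec = binary % 10  # take the last (decimal) digit, which is 0 or 1
        decimal = decimal + dec * pow(2, i)  # weight it by the matching power of 2
        binary = binary // 10  # drop that digit
        i += 1
    print(decimal)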
In one line:
print(int(format(int(input()), 'b').count('1') * '1', 2))
Let's break it down, inside out:
format(int(input()), 'b')
The built-in format() function takes the integer read from the input and returns a string formatted according to the Format Specification Mini-Language. Here, the format spec 'b' gives the binary representation, without the '0b' prefix.
Then, we have
.count('1')
This str method returns the total number of occurrences of '1' in the string returned by the format function.
In Python, multiplying a string by an integer n gives the string repeated n times:
x = 'a' * 3
print(x) # prints 'aaa'
Thus, if we multiply the string '1' by the number returned by count(), we get a string containing only ones, exactly as many as there are ones in the binary representation of the input. We can then interpret that string as a base-2 number and convert it to an integer, like this:
int(number_string, 2)
So, we have
int(format(int(input()), 'b').count('1') * '1', 2)
Finally, let's print the whole thing:
print(int(format(int(input()), 'b').count('1') * '1', 2))

Python: How do I convert file to custom base number and back?

I have a file that I want to convert into a custom base (base 86, for example, with a custom alphabet).
I have tried converting the file with hexlify and then into my custom base, but it's too slow: 8 seconds for 60 KB.
import binascii
def HexToBase(Hexa, AlphabetList, OccurList, threshold=10):
    number = int(Hexa, 16)  # base 16 to base 10
    alphabet = GetAlphabet(AlphabetList, OccurList, threshold)
    # GetAlphabet returns a list of all chars that occur more than threshold times
    b_nbr = len(alphabet)  # get the base
    out = ''
    while number > 0:
        out = alphabet[(number % b_nbr)] + out
        number = number // b_nbr
    return out
with open("File.jpg", "rb") as file:
    binary_data = file.read()
HexToBase(binascii.hexlify(binary_data), ['a', 'b'], [23, 54])
Could anyone help me find the right solution?
Sorry for my poor English, I'm French. Thanks for your help!
First you can replace:
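int(binascii.hexlify(binary_data), 16) # timeit: 14.349809918712538
By:
int.from_bytes(binary_data, byteorder='big') # timeit: 3.3330371951720164
Use byteorder='big' to get the same number as the hexlify version (the timing above was measured with byteorder='little').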
Second you can use the divmod function to speed up the loop:
out = ""
while number > 0:
    number, m = divmod(number, b_nbr)
    out = alphabet[m] + out
# timeit: 3.8345545611298126 vs 7.472579440019706
For a comparison of divmod versus % and // on large numbers, see "Is divmod() faster than using the % and // operators?".
(Remark: I expected that building a list and then making a string with "".join would be faster than out = ... + out, but that was not the case with CPython 3.6.)
Everything put together gave me a speed up factor of 6.
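The question also asks how to convert back. Here is a minimal round-trip sketch, assuming the alphabet has no repeated characters. The function names are hypothetical, and GetAlphabet from the question is not used. Leading zero bytes disappear in the integer, so the original byte length has to be stored alongside the encoded text.

```python
def bytes_to_base(data, alphabet):
    # Read the bytes as one big-endian integer
    # (same value as int(binascii.hexlify(data), 16) for non-empty data).
    number = int.from_bytes(data, "big")
    base = len(alphabet)
    out = ""
    while number > 0:
        number, m = divmod(number, base)
        out = alphabet[m] + out
    return out

def base_to_bytes(text, alphabet, length):
    # Map each symbol back to its digit value.
    digit_of = {ch: i for i, ch in enumerate(alphabet)}
    base = len(alphabet)
    number = 0
    for ch in text:
        number = number * base + digit_of[ch]
    # The original length restores any leading zero bytes.
    return number.to_bytes(length, "big")
```

For example, bytes_to_base(b"\x0f", "01") gives "1111", and base_to_bytes("1111", "01", 1) gives b"\x0f" back.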

Represent number as a bytes using 16-bit blocks

I wish to convert a number like 683550 (0xA6E1E) to b'\x1e\x6e\x0a\x00', where the number of bytes is a multiple of 2 and the bytes object is only as long as it needs to be to represent the number.
This is as far as I got:
"{0:0{1}x}".format(683550,8)
giving:
'000a6e1e'
Use the int.to_bytes() method:
num = 683550
data = num.to_bytes((num.bit_length()+15)//16*2, "little")
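The expression above rounds the bit length up to a whole number of 16-bit blocks (2 bytes each). Wrapped in a small function with a hypothetical name, the sketch shows the edge cases. 0 gives an empty bytes object, and negative numbers raise OverflowError because to_bytes() is unsigned by default.

```python
def to_16bit_blocks(num, byteorder="little"):
    # Round the bit length up to whole 16-bit blocks, then count 2 bytes per block.
    n_bytes = (num.bit_length() + 15) // 16 * 2
    return num.to_bytes(n_bytes, byteorder)

print(to_16bit_blocks(683550))
```

The printed value is b'\x1en\n\x00'. This is the same object as b'\x1e\x6e\x0a\x00', because Python shows printable bytes as ASCII characters.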
Using Python 3:
def encode_to_my_hex_format(num, bytes_group_len=2, byteorder='little'):
    """
    :param byteorder: can take the values 'little' or 'big'
    """
    bytes_needed = abs(-len(bin(num)[2: ]) // 8)
    if bytes_needed % bytes_group_len:
        bytes_needed += bytes_group_len - bytes_needed % bytes_group_len
    num_in_bytes = num.to_bytes(bytes_needed, byteorder)
    encoded_num_in_bytes = b''
    for index in range(0, len(num_in_bytes), bytes_group_len):
        bytes_group = num_in_bytes[index: index + bytes_group_len]
        if byteorder == 'little':
            bytes_group = bytes_group[-1: -len(bytes_group) -1 : -1]
        encoded_num_in_bytes += bytes_group
    encoded_num = ''
    for byte in encoded_num_in_bytes:
        encoded_num += r'\x' + hex(byte)[2: ].zfill(2)
    return encoded_num
print(encode_to_my_hex_format(683550))

Am I missing something or is this Microsoft algorithm for calculating the excel column characters incorrect?

I'm trying to write a function in Python that takes in a column number and outputs the corresponding Excel column code (for example: 5 -> "E", 27 -> "AA"). I tried implementing the algorithm given here: http://support.microsoft.com/kb/833402, which is the following Visual Basic:
Function ConvertToLetter(iCol As Integer) As String
    Dim iAlpha As Integer
    Dim iRemainder As Integer
    iAlpha = Int(iCol / 27)
    iRemainder = iCol - (iAlpha * 26)
    If iAlpha > 0 Then
        ConvertToLetter = Chr(iAlpha + 64)
    End If
    If iRemainder > 0 Then
        ConvertToLetter = ConvertToLetter & Chr(iRemainder + 64)
    End If
End Function
My python version:
def excelcolumn(colnum):
    alpha = colnum // 27
    remainder = colnum - (alpha*26)
    out = ""
    if alpha > 0:
        out = chr(alpha+64)
    if remainder > 0:
        out = out + chr(remainder+64)
    return out
This works fine until column number 53 which results in "A[", as alpha = 53 // 27 == 1 and thus remainder = 53 - 1*26 == 27 meaning the second character chr(64+27) will be "[". Am I missing something? My VBA skills are quite lackluster so that might be the issue.
edit: I am using Python 3.3.1
The Microsoft formula is incorrect. I'll bet they never tested it beyond 53. When I tested it myself in Excel it gave the same incorrect answer that yours did.
Here's how I'd do it:
def excelcolumn(colnum):
    alpha, remainder = colnum // 26, colnum % 26
    out = "" if alpha == 0 else chr(alpha - 1 + ord('A'))
    out += chr(remainder + ord('A'))
    return out
Note that this assumes a 0-based column number, while the VBA code assumes a 1-based one.
If you need to extend beyond 701 columns you need something slightly different as noted in the comments:
def excelcolumn(colnum):
    if colnum < 26:
        return chr(colnum + ord('A'))
    return excelcolumn(colnum // 26 - 1) + chr(colnum % 26 + ord('A'))
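The claim that the KB formula breaks at 53 can be checked mechanically. The sketch below ports the VBA directly (VBA's Int() floors, which matches // for positive numbers). It compares the port against a 1-based bijective base-26 conversion and finds the first column where they disagree. Both function names are hypothetical.

```python
def kb_833402(colnum):
    # Direct port of the VBA from the KB article (1-based column numbers).
    alpha = colnum // 27
    remainder = colnum - alpha * 26
    out = ""
    if alpha > 0:
        out = chr(alpha + 64)
    if remainder > 0:
        out += chr(remainder + 64)
    return out

def column_name(colnum):
    # Bijective base-26: shift to 0-based before each divmod so 'Z' maps to 25.
    name = ""
    while colnum > 0:
        colnum, r = divmod(colnum - 1, 26)
        name = chr(r + ord("A")) + name
    return name

# First column (up to 256, i.e. column IV) where the two versions disagree.
first_bad = next(c for c in range(1, 257) if kb_833402(c) != column_name(c))
print(first_bad, kb_833402(first_bad), column_name(first_bad))
```

Columns 1 to 52 match, and column 53 is the first mismatch ("A[" versus "BA").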
Here is one way to do it:
def xl_col_to_name(col_num):
    col_str = ''
    while col_num:
        remainder = col_num % 26
        if remainder == 0:
            remainder = 26
        # Convert the remainder to a character.
        col_letter = chr(ord('A') + remainder - 1)
        # Accumulate the column letters, right to left.
        col_str = col_letter + col_str
        # Get the next order of magnitude.
        col_num = int((col_num - 1) / 26)
    return col_str
Which gives:
>>> xl_col_to_name(5)
'E'
>>> xl_col_to_name(27)
'AA'
>>> xl_col_to_name(256)
'IV'
>>> xl_col_to_name(1000)
'ALL'
This is taken from the utility functions in the XlsxWriter module.
I am going to answer your specific question:
... is this Microsoft algorithm for calculating the excel column characters incorrect?
YES.
Generally speaking, when you want to have the integer division (typically called DIV) of two numbers, and the remainder (typically called MOD), you should use the same value as the denominator. Thus, you should use either 26 or 27 in both places.
So, the algorithm is incorrect (and it is easy to see that with iCol=27, where iAlpha=1 and iRemainder=1, while it should be iRemainder=0).
In this particular case, the number should be 26. Since the remainder then starts at zero, you should (generally speaking) add ord("A") (= 65) instead of 64. The two errors cancel out for some inputs, which is why the formula works in some cases.
The (hardly acceptable) confusion may stem from the fact that from A to Z there are 26 columns, from A to ZZ there are 26 + 26^2 = 26*27 = 702 columns, from A to ZZZ there are 26 + 26^2 + 26^3 = 18,278 columns, and so on.
Code that works for any column and is non-recursive:
def excelcolumn(colnum):
    if colnum < 1:
        raise ValueError("Index is too small")
    result = ""
    while True:
        if colnum > 26:
            colnum, r = divmod(colnum - 1, 26)
            result = chr(r + ord('A')) + result
        else:
            return chr(colnum + ord('A') - 1) + result
(taken from here).
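The inverse conversion is useful for testing any of these functions. In bijective base 26, each letter is a digit from 1 to 26 and there is no zero digit. The sketch below repeats the non-recursive excelcolumn from the answer above so that it runs on its own. It adds a hypothetical column_number inverse and checks the round trip. It also confirms that "ZZZ" is column 26 + 26^2 + 26^3 = 18,278.

```python
def excelcolumn(colnum):
    # Non-recursive 1-based version from the answer above.
    if colnum < 1:
        raise ValueError("Index is too small")
    result = ""
    while True:
        if colnum > 26:
            colnum, r = divmod(colnum - 1, 26)
            result = chr(r + ord('A')) + result
        else:
            return chr(colnum + ord('A') - 1) + result

def column_number(name):
    # Inverse: read the letters as digits 1..26 in bijective base 26.
    number = 0
    for ch in name.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number
```

For example, column_number("XFD") gives 16384.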
