How to use Hashlib to MD5 hash a number? - python

It seems everyone is doing this:
import hashlib
# initializing string
str = "GeeksforGeeks"
# encoding GeeksforGeeks using encode()
# then sending to md5()
result = hashlib.md5(str.encode())
However, I want to hash plain numbers. Something like
result = hashlib.md5(0)
#or
var = 5
result = hashlib.md5(var)
isn't working, and I've tried lots of other variations. What's the right syntax?

Hashes operate on a sequence of bytes.
An integer in Python is just simply a logical value; it has no definite size or byte-wise representation. If you want to hash numbers, you need to decide what form to put the number in before hashing it.
The simplest option would be to hash the string representation of the number. Do this by calling str and hashing that result. E.g.
var = 5
hash_input = str(var)
result = hashlib.md5(hash_input)
Another option would be to choose a fixed size, and hash the binary representation of the number:
var = 5
hash_input = struct.pack('<I', var) # Little-endian 32-bit unsigned
result = hashlib.md5(hash_input)
The correct way to do this totally depends on what exactly you're trying to accomplish, which you haven't told us.

Hashing plain numbers is ambiguous, to say the least.
The number should be converted to bytes before being fed to the digest algorithm. The first problem that you'll run into is how much byte size the number occupies, could be 4 byte, 8 byte or whatever. Then comes endianness, the order of bytes in memory. All these would result in different digest for seemingly the same number. (For simplicity, I've assumed the number is int)
>>> hashlib.md5(b'4').hexdigest()
'a87ff679a2f3e71d9181a67b7542122c'
>>> i = 4
>>> hashlib.md5(i.to_bytes(2, 'big')).hexdigest()
'c244b9cdf7853b5693a295e384c07367'
>>> hashlib.md5(i.to_bytes(4, 'big')).hexdigest()
'ea4959eb64a1f09be580d950964f3843'
>>> hashlib.md5(i.to_bytes(8, 'big')).hexdigest()
'59cff542fae7e0c4267e45740a12c9a0'
So, your solution would be to convert the int to str and encode it so you can get a digest from it.
# either this
>>> hashlib.md5(b'4').hexdigest()
# or
>>> hashlib.md5(str(4).encode()).hexdigest()
Hope that helps.

Related

Pack into c types and obtain the binary value back

I'm using the following code to pack an integer into an unsigned short as follows,
raw_data = 40
# Pack into little endian
data_packed = struct.pack('<H', raw_data)
Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.
def get_bin_str(data):
bin_asc = binascii.hexlify(data)
result = bin(int(bin_asc.decode("utf-16-le"), 16))
trimmed_res = result[2:]
return trimmed_res
print(get_bin_str(data_packed))
Unfortunately, it throws the following error,
result = bin(int(bin_asc.decode("utf-16-le"), 16)) ValueError: invalid
literal for int() with base 16: '㠲〰'
How do I properly decode the bytes in little-endian to binary data properly?
Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.
>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex() # the two little-endian bytes are 0x28 (40) and 0x00 (0)
2800
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)
unpack returns a tuple, so index it to get the single value
>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40
To print in binary use string formatting. Either of these work work best. bin() doesn't let you specify the number of binary digits to display and the 0b needs to be removed if not desired.
>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)
Python does not have an "unsigned short" type, so the output of struct.pack() is a byte array. To see what's in it, just print it:
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'
What's that? Well, the character (, which is decimal 40 in the ascii table, followed by a null byte. If you had used a number that does not map to a printable ascii character, you'd see something less surprising:
>>> struct.pack("<H", 11)
b'\x0b\x00'
Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: From low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
Anyway, you can also look at the bytes directly:
>>> print(data_packed[0])
40
Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:
>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'
The two high bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
What's wrong with your unpacking code?
Just for fun let's see what your sequence of transformations in get_bin_str was doing.
>>> binascii.hexlify(data_packed)
b'2800'
Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:
>>> b'2800'.decode("utf-16-le")
'㠲〰'
In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.
>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40

Convert binary to signed, little endian 16bit integer in Python

Trying to a convert a binary list into a signed 16bit little endian integer
input_data = [['1100110111111011','1101111011111111','0010101000000011'],['1100111111111011','1101100111111111','0010110100000011']]
Desired Output =[[-1074, -34, 810],[-1703, -39, 813]]
This is what I've got so far. It's been adapted from: Hex string to signed int in Python 3.2?,
Conversion from HEX to SIGNED DEC in python
results = []
for i in input_data:
hex_convert = [hex(int(x,2)) for x in i]
convert = [int(y[4:6] + y[2:4], 16) for y in hex_convert]
results.append(convert)
print (results)
output: [[64461, 65502, 810], [64463, 65497, 813]]
This is works fine, but the above are unsigned integers. I need signed integers capable of handling negative values. I then tried a different approach:
results_2 = []
for i in input_data:
hex_convert = [hex(int(x,2)) for x in i]
to_bytes = [bytes(j, 'utf-8') for j in hex_convert]
split_bits = [int(k, 16) for k in to_bytes]
convert_2 = [int.from_bytes(b, byteorder = 'little', signed = True) for b in to_bytes]
results_2.append(convert_2)
print (results_2)
Output: [[108191910426672, 112589973780528, 56282882144304], [108191943981104, 112589235583024, 56282932475952]]
This result is even more wild than the first. I know my approach is wrong, and it doesn't help that i've never been able to get my head around binary conversion etc, but I feel i'm on the right path with:
(b, byteorder = 'little', signed = True)
but can't work out where i'm wrong. Any help explaining this concept would be greatly appreciated.
This result is even more wild than the first. I know my approach is wrong... but can't work out where i'm wrong.
The problem is in the conversion to bytes. Let's look at it a step at a time:
int(x, 2)
Fine; we treat the string as a base-2 representation of the integer value, and get that integer. Only problem is it's a) unsigned and b) big-endian.
hex(int(x,2))
What this does is create a string representation of the integer, in base 16, with a 0x prefix. Notably, there are two text characters per byte that we want. This is already heading is down the wrong path.
You might have thought of using hexadecimal because you've seen \xAB style escapes inside string representations. This is a completely different thing. The string '\xAB' contains one character. The string '0xAB' contains four.
From there, everything else is still nonsense. Converting to bytes with a text encoding just means that the text character 0 for example is replaced with the byte value 48 (since in UTF-8 it's encoded with a single byte with that value). For this data we get the same results with UTF-8 that we would by assuming plain ASCII (since UTF-8 is "ASCII transparent" and there are no non-ASCII characters in the text).
So how do we do it?
We want to convert the integer from the first step into the bytes used to represent it. Just as there is a .from_bytes class method allowing us to create an integer from underlying bytes, there is an instance method allowing us to get the bytes that would represent the integer.
So, we use .to_bytes, specifying the length, signedness and endianness that was assumed when we created the int from the binary string - that gives us bytes that correspond to that string. Then, we re-create the integer from those bytes, except now specifying the proper signedness and endianness. The reason that .to_bytes makes us specify a length is because the integer doesn't have a particular length - there are a minimum number of bytes required to represent it, but you could use as many more as you like. (This is especially important if you want to handle signed values, since it will do sign-extension automatically.)
Thus:
for i in input_data:
values = [int(x,2) for x in i]
as_bytes = [x.to_bytes(2, byteorder='big', signed=False) for x in values]
reinterpreted = [int.from_bytes(x, byteorder='little', signed=True) for x in as_bytes]
results_2.append(reinterpreted)
But let's improve the organization of the code a bit. I will first make a function to handle a single integer value, and then we can use comprehensions to process the list. In fact, we can use nested comprehensions for the nested list.
def as_signed_little(binary_str):
# This time, taking advantage of positional args and default values.
as_bytes = int(binary_str, 2).to_bytes(2, 'big')
return int.from_bytes(as_bytes, 'little', signed=True)
# And now we can do:
results_2 = [[as_signed_little(x) for x in i] for i in input_data]

How to convert floating point number in python?

How to convert floating point number to base-16 numbers, 8 hexadecimal digits per 32-bit FLP number in python?
eg : input = 1.2717441261e+20 output wanted : 3403244E
If you want the byte values of the IEEE-754 representation, the struct module can do this:
>>> import struct
>>> f = 1.2717441261e+20
>>> struct.pack('f', f)
'\xc9\x9c\xdc`'
This is a string version of the bytes, which can then be converted into a string representation of the hex values:
>>> struct.pack('f', f).encode('hex')
'c99cdc60'
And, if you want it as a hex integer, parse it as such:
>>> s = struct.pack('f', f).encode('hex')
>>> int(s, 16)
3382500448
To display the integer as hex:
>>> hex(int(s, 16))
'0xc99cdc60'
Note that this does not match the hex value in your question -- if your value is the correct one you want, please update the question to say how it is derived.
There are several possible ways to do so, but none of them leads to the result you wanted.
You can code this float value into its IEEE binary representation. This leads indeed to a 32 bit number (if you do it with single precision). But it leads to different results, no matter which endianness I suppose:
import struct
struct.pack("<f", 1.2717441261e+20).encode("hex")
# -> 'c99cdc60'
struct.pack(">f", 1.2717441261e+20).encode("hex")
# -> '60dc9cc9'
struct.unpack("<f", "3403244E".decode("hex"))
# -> (687918336.0,)
struct.unpack(">f", "3403244E".decode("hex"))
# -> (1.2213533295835077e-07,)
As the other one didn't fit result-wise, I'll take the other answers and include them here:
float.hex(1.2717441261e+20)
# -> '0x1.b939919e12808p+66'
Has nothing to do with 3403244E as well, so maybe you want to clarify what exactly you mean.
There are surely other ways to do this conversation, but unless you specify which method you want, no one is likely to be able to help you.
There is something wrong with your expected output :
import struct
input = 1.2717441261e+20
buf = struct.pack(">f", input)
print ''.join("%x" % ord(c) for c in struct.unpack(">4c", buf) )
Output :
60dc9cc9
Try float.hex(input) if input is already a float.
Try float.hex(input). This should convert a number into a string representing the number in base 16, and works with floats, unlike hex(). The string will begin with 0x however, and will contain 13 digits after the decimal point, so I can't help you with the 8 digits part.
Source: http://docs.python.org/2/library/stdtypes.html#float.hex

Getting Raw Binary Representation of a file in Python

I'd like to get the exact sequence of bits from a file into a string using Python 3. There are several questions on this topic which come close, but don't quite answer it. So far, I have this:
>>> data = open('file.bin', 'rb').read()
>>> data
'\xa1\xa7\xda4\x86G\xa0!e\xab7M\xce\xd4\xf9\x0e\x99\xce\xe94Y3\x1d\xb7\xa3d\xf9\x92\xd9\xa8\xca\x05\x0f$\xb3\xcd*\xbfT\xbb\x8d\x801\xfanX\x1e\xb4^\xa7l\xe3=\xaf\x89\x86\xaf\x0e8\xeeL\xcd|*5\xf16\xe4\xf6a\xf5\xc4\xf5\xb0\xfc;\xf3\xb5\xb3/\x9a5\xee+\xc5^\xf5\xfe\xaf]\xf7.X\x81\xf3\x14\xe9\x9fK\xf6d\xefK\x8e\xff\x00\x9a>\xe7\xea\xc8\x1b\xc1\x8c\xff\x00D>\xb8\xff\x00\x9c9...'
>>> bin(data[:][0])
'0b11111111'
OK, I can get a base-2 number, but I don't understand why data[:][x], and I still have the leading 0b. It would also seem that I have to loop through the whole string and do some casting and parsing to get the correct output. Is there a simpler way to just get the sequence of 01's without looping, parsing, and concatenating strings?
Thanks in advance!
I would first precompute the string representation for all values 0..255
bytetable = [("00000000"+bin(x)[2:])[-8:] for x in range(256)]
or, if you prefer bits in LSB to MSB order
bytetable = [("00000000"+bin(x)[2:])[-1:-9:-1] for x in range(256)]
then the whole file in binary can be obtained with
binrep = "".join(bytetable[x] for x in open("file", "rb").read())
If you are OK using an external module, this uses bitstring:
>>> import bitstring
>>> bitstring.BitArray(filename='file.bin').bin
'110000101010000111000010101001111100...'
and that's it. It just makes the binary string representation of the whole file.
It is not quite clear what the sequence of bits is meant to be. I think it would be most natural to start at byte 0 with bit 0, but it actually depends on what you want.
So here is some code to access the sequence of bits starting with bit 0 in byte 0:
def bits_from_char(c):
i = ord(c)
for dummy in range(8):
yield i & 1
i >>= 1
def bits_from_data(data):
for c in data:
for bit in bits_from_char(c):
yield bit
for bit in bits_from_data(data):
# process bit
(Another note: you would not need data[:][0] in your code. Simply data[0] would do the trick, but without copying the whole string first.)
To convert raw binary data such as b'\xa1\xa7\xda4\x86' into a bitstring that represents the data as a number in binary system (base-2) in Python 3:
>>> data = open('file.bin', 'rb').read()
>>> bin(int.from_bytes(data, 'big'))[2:]
'1010000110100111110110100011010010000110...'
See Convert binary to ASCII and vice versa.

Is there a way to pad to an even number of digits?

I'm trying to create a hex representation of some data that needs to be transmitted (specifically, in ASN.1 notation). At some points, I need to convert data to its hex representation. Since the data is transmitted as a byte sequence, the hex representation has to be padded with a 0 if the length is odd.
Example:
>>> hex2(3)
'03'
>>> hex2(45)
'2d'
>>> hex2(678)
'02a6'
The goal is to find a simple, elegant implementation for hex2.
Currently I'm using hex, stripping out the first two characters, then padding the string with a 0 if its length is odd. However, I'd like to find a better solution for future reference. I've looked in str.format without finding anything that pads to a multiple.
def hex2(n):
x = '%x' % (n,)
return ('0' * (len(x) % 2)) + x
To be totally honest, I am not sure what the issue is. A straightforward implementation of what you describe goes like this:
def hex2(v):
s = hex(v)[2:]
return s if len(s) % 2 == 0 else '0' + s
I would not necessarily call this "elegant" but I would certainly call it "simple."
Python's binascii module's b2a_hex is guaranteed to return an even-length string.
the trick then is to convert the integer into a bytestring. Python3.2 and higher has that built-in to int:
from binascii import b2a_hex
def hex2(integer):
return b2a_hex(integer.to_bytes((integer.bit_length() + 7) // 8, 'big'))
Might want to look at the struct module, which is designed for byte-oriented i/o.
import struct
>>> struct.pack('>i',678)
'\x00\x00\x02\xa6'
#Use h instead of i for shorts
>>> struct.pack('>h',1043)
'\x04\x13'

Categories