Convert string with HEX MD5 to base64 encoding - python

I need to convert a HEX-type md5 string to the base64 version in Python.
For example, if I had MD5: 4297f44b13955235245b2497399d7a93
I need the code to produce Qpf0SxOVUjUkWySXOZ16kw==
This is identical to another SO asking for a C# implementation, but I need the Python code. This is similar to this SO asking to convert a single binary number to base64 in Python.

Depending on the version of Python you are running, the following will work:
Python 2
base64.b64encode("4297f44b13955235245b2497399d7a93".decode("hex"))
Python 3
base64.b64encode(bytes.fromhex("4297f44b13955235245b2497399d7a93"))
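For Python 3, the same conversion can be wrapped in a small helper. This is only a minimal sketch; the name hex_to_base64 is an illustrative choice and not part of any library.

```python
import base64


def hex_to_base64(hex_digest):
    """Convert a hex string (e.g. an MD5 hexdigest) to base64 text."""
    raw = bytes.fromhex(hex_digest)  # 32 hex characters -> 16 raw bytes
    return base64.b64encode(raw).decode("ascii")


result = hex_to_base64("4297f44b13955235245b2497399d7a93")
print(result)
```

The key point is that the hex text is turned back into raw bytes first; base64-encoding the hex characters themselves would give a longer, different string.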

Related

What is the Python equivalent of Ruby's Base64.urlsafe_encode64(Digest::SHA256.hexdigest(STRING))

I am trying to port parts of a Ruby project to Python and cannot figure out the equivalent of Base64.urlsafe_encode64(Digest::SHA256.hexdigest(STRING)). The closest I have gotten is base64.urlsafe_b64encode(hashlib.sha256(STRING.encode('utf-8')).digest()); however, given the input StackOverflow it returns b'HFqE4xhK0TPtcmK7rNQMl3bsQRnD-sNum5_K9vY1G98=' in Python and MWM1YTg0ZTMxODRhZDEzM2VkNzI2MmJiYWNkNDBjOTc3NmVjNDExOWMzZmFjMzZlOWI5ZmNhZjZmNjM1MWJkZg== in Ruby.
Full Python & Ruby Code:
Ruby
require "base64"
require "digest"
string= "StackOverflow"
output= Base64.urlsafe_encode64(Digest::SHA256.hexdigest(string))
puts output
Python
import hashlib
import base64
string = str("StackOverflow")
output = base64.urlsafe_b64encode(hashlib.sha256(string.encode('utf-8')).digest())
print(str(output))
In your original Python code you used digest instead of hexdigest. The two are not the same thing (digest returns the raw bytes, hexdigest returns their hex representation as a string), so the results differ. Porting code between languages is much easier if you dissect it piece by piece: split the one-liner into separate statements and print the output at each stage to check what is happening.
Jamming everything into one line is messy, and it makes it easy to overlook details that play a major role in bug fixing.
Write your code "spaced out" at first; you can collapse it into a single line later, although long one-liners are not very readable.
What you are looking for is:
string = "StackOverflow"
output = hashlib.sha256(string.encode('utf-8')).hexdigest()
output = base64.urlsafe_b64encode(output.encode('utf-8'))
print(output.decode('utf-8'))
It gives the same result as Base64.urlsafe_encode64(Digest::SHA256.hexdigest(string)) in Ruby.
You need to use the hexdigest method instead of digest on the hash in Python to get the same output as in your Ruby example (since you use the hexdigest function there).
Also note that the hexdigest method returns a string instead of bytes, so you'll need to encode the result again (with .encode("utf-8")).
Here's a full example:
import hashlib
import base64
string = "StackOverflow"
output = base64.urlsafe_b64encode(hashlib.sha256(string.encode("utf-8")).hexdigest().encode("utf-8"))
print(str(output))
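Following the advice to split the one-liner into stages, here is a minimal sketch that shows where digest and hexdigest diverge. The variable names are illustrative only.

```python
import base64
import hashlib

text = "StackOverflow"
sha = hashlib.sha256(text.encode("utf-8"))

raw_digest = sha.digest()      # 32 raw bytes
hex_digest = sha.hexdigest()   # 64-character str of hex digits

# Encoding the raw bytes (what the original Python attempt did).
from_raw = base64.urlsafe_b64encode(raw_digest).decode("ascii")

# Encoding the hex text (what the Ruby snippet does).
from_hex = base64.urlsafe_b64encode(hex_digest.encode("utf-8")).decode("ascii")

print(len(raw_digest), from_raw)
print(len(hex_digest), from_hex)
```

Printing the lengths makes the mismatch obvious: the hex text is twice as long as the raw digest, so its base64 form is twice as long as well.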

Validate base64 in python

I would like to check whether a string is base64 encoded in Python. As the built-in module is very forgiving, I tried the following:
s = b'111='
b64encode(b64decode(s)) == s
To my surprise it returned False. Indeed, b64encode(b64decode(s)) returns b'110='. I expected it to return True as I'm under the impression '111=' is a valid base64 string.
My python version is 3.6.4
$ python --version
Python 3.6.4
Why is this? Can someone explain this?
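The three characters of '111=' carry 18 bits, but only 16 of them (two bytes) are data, so the last 2 bits of the third character are unused. "0" and "1" differ only in those unused bits, so both are valid encodings of the same plaintext. b64encode(), however, always sets the unused bits to zero and therefore only ever emits "0" there.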
In order to be robust you will need to calculate how many padding bits and bytes the encoded text should have and then only compare the significant bytes and bits.
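One possible approach, sketched below, avoids masking bits by hand and instead separates two questions: is the input well formed (alphabet, length, padding), and is it the canonical encoding (unused trailing bits set to zero)? The function names and the regular expression are assumptions made for this illustration, and the pattern covers only the standard alphabet without whitespace.

```python
import base64
import re

# Standard alphabet, groups of four, with optional "=" padding at the end.
_B64_RE = re.compile(
    rb"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def is_well_formed_base64(data):
    """True if data has a valid alphabet, length, and padding."""
    return _B64_RE.fullmatch(data) is not None


def is_canonical_base64(data):
    """True if data is well formed and its unused trailing bits are zero."""
    if not is_well_formed_base64(data):
        return False
    return base64.b64encode(base64.b64decode(data)) == data


print(is_well_formed_base64(b"111="), is_canonical_base64(b"111="))
```

With this split, b'111=' counts as well formed but not canonical, while b'110=' passes both checks.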

Python Encoding that ignores leading 0s

I'm writing code in Python 3.5 that uses hashlib to compute the MD5 hash of each packet, given a pcap file and the password. I am traversing the pcap file using pyshark. Currently, the values it produces are not the same as the MD5 hashes on the packets in the pcap file.
One of the reasons I have attributed this to is that in the hex representation of the packet, the values are represented with leading 0s. E.g., the protocol number is shown as b'06', but the value I am updating the hashlib object with is b'6'. And these two values are not the same for some reason:
>> b'06'==b'6'
False
The way I am encoding integers is:
(hex(int(value))[2:]).encode()
I am doing this encoding because otherwise it would result in this error: "TypeError: Unicode-objects must be encoded before hashing"
I was wondering if I could get some help finding a python encoding library that ignores leading 0s or if there was any way to get the inbuilt hex method to ignore the leading 0s.
Thanks!
Hashing b'06' and b'6' gives different results because, in this context, '06' and '6' are different.
The b string prefix in Python tells the Python interpreter to convert each character in the string into a byte. Thus, b'06' will be converted into the two bytes 0x30 0x36, whereas b'6' will be converted into the single byte 0x36. Just as hashing b'a' and b' a' (note the space) produces different results, hashing b'06' and b'6' will similarly produce different results.
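To make the difference concrete, here is a small sketch of three ways to turn the integer 6 into bytes. Which form matches the hashes in the pcap file depends on how those hashes were originally computed, which the question does not say, so treat the choice between them as an open assumption.

```python
import hashlib

value = 6

unpadded = hex(value)[2:].encode()       # b'6'    -> one byte, 0x36
padded = format(value, "02x").encode()   # b'06'   -> two bytes, 0x30 0x36
raw_byte = bytes([value])                # b'\x06' -> one byte, 0x06

# Different input bytes give different digests.
digest_unpadded = hashlib.md5(unpadded).hexdigest()
digest_padded = hashlib.md5(padded).hexdigest()

print(unpadded, padded, raw_byte)
print(digest_unpadded == digest_padded)
```

The format spec "02x" pads to two hex digits, which reproduces the leading zero seen in the packet's hex representation.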
If you don't understand why this happens, I recommend looking up how bytes work, both within Python and more generally - Python's handling of bytes has always been a bit counterintuitive, so don't worry if it seems confusing! It's also important to note that the way Python represents bytes has changed between Python 2 and Python 3, so be sure to check which version of Python any information you find is talking about. You can comment here, too,

Python - hashing binary value

I wanted to use the SHA-1 algorithm to calculate the checksum of some data; the thing is that in Python the hashlib input is given as a string.
Is it possible to calculate SHA-1 in Python, but somehow give raw bytes as input?
I am asking because if I wanted to calculate the hash of a file in C, I would use the OpenSSL library and just pass normal bytes, but in Python I need to pass a string, so if I calculated the hash of some specific file I would get different results in the two languages.
In Python 2.x, str objects can be arbitrary byte streams. So yes, you can just pass the data into the hashlib functions as strs.
>>> import hashlib
>>> "this is binary \0\1\2"
'this is binary \x00\x01\x02'
>>> hashlib.sha1("this is binary \0\1\2").hexdigest()
'17c27af39d476f662be60be7f25c8d3873041bb3'
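In Python 3 the hashlib functions take bytes objects rather than str. The sketch below hashes a bytes literal directly and then hashes the same bytes read back from a file opened in binary mode. The helper name sha1_of_file and the chunk size are illustrative assumptions.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in binary mode."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


data = b"this is binary \0\1\2"
direct = hashlib.sha1(data).hexdigest()

# Write the same bytes to a temporary file and hash that.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    from_file = sha1_of_file(tmp.name)
finally:
    os.remove(tmp.name)

print(direct == from_file)
```

Because the file is opened with "rb", no text decoding takes place, so the digest depends only on the bytes in the file.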

How to define a binary string in Python in a way that works with both py2 and py3?

I am writing a module that is supposed to work in both Python 2 and 3 and I need to define a binary string.
Usually this would be something like data = b'abc', but this code fails on Python 2.5 with invalid syntax.
How can I write the above code in a way that will work in all versions of Python 2.5+?
Note: this has to be binary (it can contain any kind of characters, 0xFF), this is very important.
I would recommend the following:
from six import b
That requires the six module, of course.
If you don't want that, here's another version:
import sys
if sys.version < '3':
    def b(x):
        return x
else:
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
More info.
These solutions (essentially the same) work, are clean, as fast as you are going to get, and can support all 256 byte values (which none of the other solutions here can).
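The claim about all 256 byte values can be checked on Python 3 with a short sketch. Only the Python 3 branch of the helper is reproduced here, and the test string is an assumption made for this illustration.

```python
import codecs


def b(x):
    # Python 3 branch of the helper: each code point 0-255 maps to one byte.
    return codecs.latin_1_encode(x)[0]


all_chars = "".join(chr(i) for i in range(256))
encoded = b(all_chars)

print(type(encoded), len(encoded))
```

Latin-1 maps code points 0 through 255 one-to-one onto byte values, which is why it can represent arbitrary binary data written as escapes in an ordinary string literal.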
If the string only has ASCII characters, call encode. This will give you a str in Python 2 (just like b'abc'), and a bytes in Python 3:
'abc'.encode('ascii')
If not, rather than putting binary data in the source, create a data file, open it with 'rb' and read from it.
You could store the data base64-encoded.
First step would be to transform into base64:
>>> import base64
>>> base64.b64encode(b"\x80\xFF")
b'gP8='
This is to be done once, and using the b or not depends on the version of Python you use for it.
In the second step, you put this byte string into a program without the b.
Then it is ensured that it works in py2 and py3.
import base64
x = 'gP8='
base64.b64decode(x.encode("latin1"))
gives you a str '\x80\xff' in 2.6 (should work in 2.5 as well) and a b'\x80\xff' in 3.x.
As an alternative to the two steps above, you can do the same with hex data:
import binascii
x = '80FF'
binascii.unhexlify(x) # `bytes()` in 3.x, `str()` in 2.x
