Python3 and hmac . How to handle string not being binary

Python3 and hmac . How to handle string not being binary - python

I had a script in Python2 that was working great.
def _generate_signature(data):
return hmac.new('key', data, hashlib.sha256).hexdigest()
Where data was the output of json.dumps.
Now, if I try to run the same kind of code in Python 3, I get the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/hmac.py", line 144, in new
return HMAC(key, msg, digestmod)
File "/usr/lib/python3.4/hmac.py", line 42, in __init__
raise TypeError("key: expected bytes or bytearray, but got %r" %type(key).__name__)
TypeError: key: expected bytes or bytearray, but got 'str'
If I try something like transforming the key to bytes like so:
bytes('key')
I get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
I'm still struggling to understand the encodings in Python 3.

You can use bytes literal: b'key'
def _generate_signature(data):
return hmac.new(b'key', data, hashlib.sha256).hexdigest()
In addition to that, make sure data is also bytes. For example, if it is read from file, you need to use binary mode (rb) when opening the file.

Not to resurrect an old question but I did want to add something I feel is missing from this answer, to which I had trouble finding an appropriate explanation/example of anywhere else:
Aquiles Carattino was pretty close with his attempt at converting the string to bytes, but was missing the second argument, the encoding of the string to be converted to bytes.
If someone would like to convert a string to bytes through some other means than static assignment (such as reading from a config file or a DB), the following should work:
(Python 3+ only, not compatible with Python 2)
import hmac, hashlib
def _generate_signature(data):
key = 'key' # Defined as a simple string.
key_bytes= bytes(key , 'latin-1') # Commonly 'latin-1' or 'ascii'
data_bytes = bytes(data, 'latin-1') # Assumes `data` is also an ascii string.
return hmac.new(key_bytes, data_bytes , hashlib.sha256).hexdigest()
print(
_generate_signature('this is my string of data')
)

try
codecs.encode()
which can be used both in python2.7.12 and 3.5.2
import hashlib
import codecs
import hmac
a = "aaaaaaa"
b = "bbbbbbb"
hmac.new(codecs.encode(a), msg=codecs.encode(b), digestmod=hashlib.sha256).hexdigest()

for python3 this is how i solved it.
import codecs
import hmac
def _generate_signature(data):
return hmac.new(codecs.encode(key), codecs.encode(data), codecs.encode(hashlib.sha256)).hexdigest()

Related

Upload file to Databricks DBFS with Python API

I'm following the Databricks example for uploading a file to DBFS (in my case .csv):
import json
import requests
import base64
DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)
def dbfs_rpc(action, body):
""" A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
response = requests.post(
BASE_URL + action,
headers={'Authorization': 'Bearer %s' % TOKEN },
json=body
)
return response.json()
# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']
with open('/a/local/file') as f:
while True:
# A block can be at most 1MB
block = f.read(1 << 20)
if not block:
break
data = base64.standard_b64encode(block)
dbfs_rpc("add-block", {"handle": handle, "data": data})
# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})
When using the tutorial as is, I get an error:
Traceback (most recent call last):
File "db_api.py", line 65, in <module>
data = base64.standard_b64encode(block)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 95, in standard_b64encode
return b64encode(s)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
I tried doing with open('./sample.csv', 'rb') as f: before passing the blocks to base64.standard_b64encode but then getting another error:
TypeError: Object of type 'bytes' is not JSON serializable
This happens when the encoded block data is being sent into the API call.
I tried skipping encoding entirely and just passing the blocks into the post call. In this case the file gets created in the DBFS but has 0 bytes size.
At this point I'm trying to make sense of it all. It doesn't want a string but it doesn't want bytes either. What am I doing wrong? Appreciate any help.

In Python we have strings and bytes, which are two different entities note that there is no implicit conversion between them, so you need to know when to use which and how to convert when necessary. This answer provides nice explanation.
With the code snippet I see two issues:
This you already got - open by default reads the file as text. So your block is a string, while standard_b64encode expects bytes and returns bytes. To read bytes from file it needs to be opened in binary mode:
with open('/a/local/file', 'rb') as f:
Only strings can be encoded as JSON. There's no source code available for dbfs_rpc (or I can't find it), but apparently it expects a string, which it internally encodes. Since your data is bytes, you need to convert it to string explicitly and that's done using decode:
dbfs_rpc("add-block", {"handle": handle, "data": data.decode('utf8')})

Is it possible for str.encode(encoding='utf-8', errors='strict') to raise UnicodeError?

I am writing some code that needs to work with both Py2.7 and Py3.7+.
I need to write text to a file using UTF-8 encoding. My code looks like this:
import six
...
content = ...
if isinstance(content, six.string_types):
content = content.encode(encoding='utf-8', errors='strict')
# write 'content' to file
Above, is it possible for content.encode() to raise UnicodeError from either Py2.7 or Py3.7+? I cannot think of a scenario where this is possible. I am not a Python expert, so I think there there must be an edge case.
Here is my reasoning why I think it will never raise UnicodeError:
six.string_types covers three types: Py2.7 str & unicode, Py3.7+ str
All of these types can always encode as UTF-8.

Yes, it's possible:
import six
content = ''.join(map(chr, range(0x110000)))
if isinstance(content, six.string_types):
content = content.encode(encoding='utf-8', errors='strict')
Result (Try it online!, using Python 3.7.4):
Traceback (most recent call last):
File ".code.tio", line 5, in <module>
content = content.encode(encoding='utf-8', errors='strict')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 55296-57343: surrogates not allowed
And UnicodeEncodeErrors are UnicodeErrors.

How to create UUID from bytes when bytes comes from input() in Python

I'm using Python3 to get some useful UUID from string like this: \227L\310\210\r\232MG\2110\221?h$0\313
When I trying to make it manually by placing string to code, all works fine:
>>> from uuid import UUID
>>> print(UUID(bytes=b"\227L\310\210\r\232MG\2110\221?h$0\313"))
974cc888-0d9a-4d47-8930-913f682430cb
>>>
But if I take this string from input() function, Python adds some characters there (backslashes), so I can't use such string as UUID bytes argument:
from uuid import UUID
from codecs import encode
def give_client_id():
input_str = input('Paste bynary string: ')
print(input_str)
bytes_str = bytes(input_str, 'ascii')
print(bytes_str)
encoded_str = input_str.encode()
print(encoded_str)
print(UUID(bytes=bytes_str))
if __name__ == "__main__":
give_client_id()
This code always returns the same for me:
Paste bynary string: \227L\310\210\r\232MG\2110\221?h$0\313
\227L\310\210\r\232MG\2110\221?h$0\313
b'\\227L\\310\\210\\r\\232MG\\2110\\221?h$0\\313'
b'\\227L\\310\\210\\r\\232MG\\2110\\221?h$0\\313'
Traceback (most recent call last):
File "give_client_id_from_bytes.py", line 19, in <module>
give_client_id()
File "give_client_id_from_bytes.py", line 15, in give_client_id
print(UUID(bytes=bytes_str))
File "/usr/lib/python3.8/uuid.py", line 178, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
How can I force Python accept such string in UUID or re-format this string to something more useful?

Attribute error when encoding with base64

I have two keys(secret key and public key) that are generated using curve25519. I want to encode the two keys using base64.safe_b64encode but i keep getting an error. Is there any way I can encode using this?
This is my code:
import libnacl.public
import libnacl.secret
import libnacl.utils
from tinydb import TinyDB
from hashlib import sha256
import json
import base64
pikeys = libnacl.public.SecretKey()
piprivkey = pikeys.sk
pipubkey = pikeys.pk
piprivkey = base64.safe_b64encode(piprivkey)
pipubkey = base64.safe_b64encode(pipubkey)
print("encoded priv", piprivkey)
print("encoded pub", pipubkey)
This is the error I got:
Traceback (most recent call last):
File "/home/pi/Desktop/finalcode/pillar1.py", line 130, in <module>
File "/home/pi/Desktop/finalcode/pillar1.py", line 50, in generatepillar1key
piprivkey = base64.safe_b64encode(piprivkey)
AttributeError: 'module' object has no attribute 'safe_b64encode'

The reason you get this error is because the base64 library does not have a function named safe_base64encode. What do you even mean by safe_base64encode? Why do you want to encode both of your keys with base64? there is a urlsafe encoding function and there is the regular base64 encoding function.
encoded_data = base64.b64encode(data_to_encode)
or
encoded_data = base64.urlsafe_b64encode(data_to_encode)
The latter one will just have a different alphabet with - instead of + and _ instead of / so it's urlsafe. I'm not sure what you want to do but refer to the docs here

The error is telling you that the function safe_b64encode does not exist in the base64 module. Perhaps you meant to use base64.urlsafe_b64encode(s)?

How to hash a variable in Python?

This example works fine example:
import hashlib
m = hashlib.md5()
m.update(b"Nobody inspects")
r= m.digest()
print(r)
Now, I want to do the same thing but with a variable: var= "hash me this text, please". How could I do it following the same logic of the example ?

The hash.update() method requires bytes, always.
Encode unicode text to bytes first; what you encode to is a application decision, but if all you want to do is fingerprint text for then UTF-8 is a great choice:
m.update(var.encode('utf8'))
The exception you get when you don't is quite clear however:
>>> import hashlib
>>> hashlib.md5().update('foo')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
If you are getting the hash of a file, open the file in binary mode instead:
from functools import partial
hash = hashlib.md5()
with open(filename, 'rb') as binfile:
for chunk in iter(binfile, partial(binfile.read, 2048)):
hash.update(chunk)
print hash.hexdigest()

Try this. Hope it helps.
The variable var has to be utf-8 encoded. If you type in a string i.e. "Donald Duck", the var variable will be b'Donald Duck'. You can then hash the string with hexdigest()
#!/usr/bin/python3
import hashlib
var = input('Input string: ').encode('utf-8')
hashed_var = hashlib.md5(var).hexdigest()
print(hashed_var)

I had the same issue as the OP. I couldn't get either of the previous answers to work for me for some reason, but a combination of both helped come to this solution.
I was originally hashing a string like this;
str = hashlib.sha256(b'hash this text')
text_hashed = str.hexdigest()
print(text_hashed)
Result;d3dba6081b7f171ec5fa4687182b269c0b46e77a78611ad268182d8a8c245b40
My solution to hash a variable;
text = 'hash this text'
str = hashlib.sha256(text.encode('utf-8'))
text_hashed = str.hexdigest()
print(text_hashed)
Result; d3dba6081b7f171ec5fa4687182b269c0b46e77a78611ad268182d8a8c245b40

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python3 and hmac . How to handle string not being binary - python

You can use bytes literal: b'key' def _generate_signature(data): return hmac.new(b'key', data, hashlib.sha256).hexdigest() In addition to that, make sure data is also bytes. For example, if it is read from file, you need to use binary mode (rb) when opening the file.

try codecs.encode() which can be used both in python2.7.12 and 3.5.2 import hashlib import codecs import hmac a = "aaaaaaa" b = "bbbbbbb" hmac.new(codecs.encode(a), msg=codecs.encode(b), digestmod=hashlib.sha256).hexdigest()

for python3 this is how i solved it. import codecs import hmac def _generate_signature(data): return hmac.new(codecs.encode(key), codecs.encode(data), codecs.encode(hashlib.sha256)).hexdigest()

Related

Upload file to Databricks DBFS with Python API

Is it possible for str.encode(encoding='utf-8', errors='strict') to raise UnicodeError?

How to create UUID from bytes when bytes comes from input() in Python

Attribute error when encoding with base64

How to hash a variable in Python?

Categories

Resources