Python Bytes & Lists & Encryption - python

I'm using Fernet to encrypt my data with this implementation. Let's assume that I have these three values:
data = [fernet.encrypt("Hello".encode()), fernet.encrypt("Stack".encode()), fernet.encrypt("Overflow".encode())]
After this operation, Python automatically converts the bytes to strings, and I write them to a CSV file. When I need to decrypt them like:
fernet.decrypt(data)
It gives me an error saying that only bytes can be decrypted. I also checked that the data in the CSV file is already bytes, but in string form:
b'gAAAAABiVUw5BzOkOv3VxlV5xa57Iaf0R4dzPbgsrnheAME8uYeslCZfTx9GeyRWe7l9VMM-gdDXiPZ4zsAXoXkG6T1dyXH6EztcqirrPhXX3YCt65_3xXvykVTDPdbEXs51cHvR-3HH'

An end-to-end usage example for encoding, writing to text, reading, and decoding.
The Fernet documentation can be referenced here.
from cryptography.fernet import Fernet
# Auto-generate a secret key.
key = Fernet.generate_key()
f = Fernet(key)
# Encode the string 'Hello' and encrypt.
encoded = f.encrypt('Hello'.encode())
This creates a bytestring (a bytes object) as:
b'gAAAAABiVVOOeO-hUG2QaKCVOyshntpbqVbxnexIVsFr7ttBGmKhHlDeM7jkTCjPPGphZxbh4D15X82pts3hKes12DjzwI8_jQ=='
Write, read and decrypt:
# Write the *decoded* encrypted string to a TXT file.
with open('/tmp/encoded.txt', 'w') as fh:
    fh.write(encoded.decode())
# Read the encrypted string from TXT file.
with open('/tmp/encoded.txt') as fh:
    encoded = fh.read()
# Encode the string, pass through fernet for decryption,
# and decode the bytes output.
f.decrypt(encoded.encode()).decode()
Output:
'Hello'

fernet.encrypt returns bytes (I assume; you don't say which implementation you're using, so I'm guessing it's this one). .decode() them to a string. Then your CSV will contain "gAAA...", not "b'gAAA...'". When reading those again from the CSV, .encode() the string before passing it to fernet.decrypt.
fernet.encrypt returns bytes
bytes.decode() turns bytes into str
CSV wants str
str.encode() turns str into bytes
fernet.decrypt wants bytes
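The chain above can be sketched end to end with the standard csv module. This is only a minimal sketch, not the asker's actual code. To avoid third-party dependencies, the hypothetical helpers fake_encrypt and fake_decrypt stand in for fernet.encrypt and fernet.decrypt (they only Base64-encode, which is NOT encryption), and an in-memory io.StringIO stands in for the CSV file.

```python
import base64
import csv
import io


def fake_encrypt(plaintext: bytes) -> bytes:
    """Hypothetical stand-in for fernet.encrypt (NOT real encryption).

    It only mimics the shape of a Fernet token: ASCII-safe bytes.
    """
    return base64.urlsafe_b64encode(plaintext)


def fake_decrypt(token: bytes) -> bytes:
    """Hypothetical stand-in for fernet.decrypt; it is given bytes."""
    if not isinstance(token, bytes):
        raise TypeError("token must be bytes")
    return base64.urlsafe_b64decode(token)


tokens = [fake_encrypt(word.encode()) for word in ("Hello", "Stack", "Overflow")]

# Write: bytes.decode() turns each token into a plain str, so the CSV
# holds the token text itself rather than its "b'...'" representation.
buffer = io.StringIO()  # stands in for a file opened with newline=""
csv.writer(buffer).writerow([token.decode("ascii") for token in tokens])

# Read: str.encode() turns each cell back into bytes before decrypting.
buffer.seek(0)
row = next(csv.reader(buffer))
decrypted = [fake_decrypt(cell.encode("ascii")).decode() for cell in row]
print(decrypted)
```

Note that each token is decrypted on its own; the list as a whole is never passed to the decrypt function.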

Related

Save bytes in a .txt and read out as bytes later

I have written a small python script which encrypts a message with rsa.
Now I want to save the bytes in a txt to read them later.
But when I use str(...) on it I don't know how to convert the string back.
For example I encrypted "Test" to b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'
and saved it as a string.
When I apply bytes(...) on it I get the error: TypeError: string argument without an encoding.
What can I do in order to do this?
You've saved the Python string representation of a binary byte array (bytestring).
To get the actual bytes back from such a representation, pass it through ast.literal_eval():
>>> import ast
>>> s = r"b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'"
>>> b = ast.literal_eval(s)
>>> b
b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'
Better yet, just save the binary bytes to your file without passing through a string:
encrypted_bytes = my_rsa("Test")
with open("encrypted.bin", "wb") as f:
    f.write(encrypted_bytes)
# ...
with open("encrypted.bin", "rb") as f:
    encrypted_bytes = f.read()
If you really want a "text-safe" format for those bytes, use base64.b64encode() and base64.b64decode().
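A minimal sketch of that Base64 route, assuming the goal is a plain .txt file: the sample bytes below are arbitrary and stand in for the RSA ciphertext, and the temporary directory and file name are only there to keep the example self-contained.

```python
import base64
import os
import tempfile

# Arbitrary sample bytes standing in for the RSA output.
encrypted_bytes = b"Y\xf8\xbc\xca\x14\x0f\x80\xd3\n/"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encrypted.txt")  # hypothetical file name

    # bytes -> Base64 bytes -> ASCII text that is safe in a text file.
    with open(path, "w", encoding="ascii") as f:
        f.write(base64.b64encode(encrypted_bytes).decode("ascii"))

    # ASCII text -> original bytes.
    with open(path, encoding="ascii") as f:
        restored = base64.b64decode(f.read())

print(restored == encrypted_bytes)
```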

Convert Hex Encoded GZIP string back to uncompressed string

I'm having trouble converting a compressed, hex-encoded string back into its original format, without introducing numerous / seemingly erroneous backslashes + unconverted unicode characters.
The code I'm using to do this process is:
import gzip
from io import StringIO, BytesIO

def string_to_bytes(input_str: str) -> bytes:
    """
    Read the given string, encode it in utf-8, gzip compress
    the data and return it as a byte array.
    """
    bio = BytesIO()
    bio.write(input_str.encode("utf-8"))
    bio.seek(0)
    stream = BytesIO()
    compressor = gzip.GzipFile(fileobj=stream, mode='w')
    while True:  # until EOF
        chunk = bio.read(8192)
        if not chunk:  # EOF?
            compressor.close()
            return stream.getvalue()
        compressor.write(chunk)

def bytes_to_string(input_bytes: bytes) -> str:
    """
    Decompress the given byte array (which must be valid
    compressed gzip data) and return the decoded text (utf-8).
    """
    bio = BytesIO()
    stream = BytesIO(input_bytes)
    decompressor = gzip.GzipFile(fileobj=stream, mode='r')
    while True:  # until EOF
        chunk = decompressor.read(8192)
        if not chunk:
            decompressor.close()
            bio.seek(0)
            return bio.read().decode("utf-8")
        bio.write(chunk)
    return None
In the script I'm running the input_string gets compressed + saved as hex with:
saved_hex = string_to_bytes(input_string).hex()
This gets stored as a BINARY datatype in a Snowflake database (using the HEX binary format).
This gets loaded out from there like so:
hex_bytes = bytes.fromhex(hex_html)
html_string = bytes_to_string(hex_bytes)
And the results are coming out like:
href\\\\\\\\u003d\\\\\\\\\\\\x22https://www.google.com/advanced_search\\\\\\\\\\\\x22 target\\\\\\\\u003d\\\\\\\\\\\\x22_blank\\\\\\\\\\\\x22\\\\\\\\u003eadvanced search\\\\\\\\u003c/a\\\\\\\\u003e to find results...
There are multiple backslashes that I'm unable to convert back to a single backslash (in the case of the unicode characters) or remove entirely.
Is there any way to more efficiently:
Gzip compress the string
Convert to Hex
Decode the hex + decompress - without adding any of these weird unconverted unicode characters?
Thank you all for the answers - foolishly I realised that:
I was adding an additional json.dumps() to the input string (further encoding it as a string and adding all the additional back-slashes).
Snowflake stores the data as BINARY, which must first be converted to a hex string using TO_VARCHAR(saved_hex_data) before you can call bytes_to_string(bytes.fromhex(output_string)) on it.
At which point everything is preserved as before, many thanks again.
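For the "more efficiently" part of the question, a shorter sketch is possible with gzip.compress() and gzip.decompress() from the standard library, which work on in-memory bytes in one call. The helper names compress_to_hex and decompress_from_hex and the sample HTML are made up for illustration. The last lines reproduce the extra json.dumps() step described above to show where the backslashes come from.

```python
import gzip
import json


def compress_to_hex(text: str) -> str:
    """UTF-8 encode, gzip compress, and return the result as a hex string."""
    return gzip.compress(text.encode("utf-8")).hex()


def decompress_from_hex(hex_text: str) -> str:
    """Reverse of compress_to_hex: hex -> gzip bytes -> UTF-8 text."""
    return gzip.decompress(bytes.fromhex(hex_text)).decode("utf-8")


html = '<a href="https://www.google.com/advanced_search" target="_blank">advanced search</a>'

# The compress/hex round trip is lossless: nothing gets escaped.
round_tripped = decompress_from_hex(compress_to_hex(html))
print(round_tripped == html)

# An extra json.dumps() before compressing is what introduces escapes;
# the round trip then faithfully returns the already-escaped text.
double_encoded = json.dumps(html)
print(decompress_from_hex(compress_to_hex(double_encoded)))
```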

Bytes encoding in cryptography module giving error

I am using the cryptography module's Fernet for encoding.
The Fernet technique converts the data to bytes using a key, and then we can convert the bytes back to a string using that same key.
I want to convert the encoded bytes to a string and store that string. (It is important for me to convert it to a string).
So, I used the following way:
f = Fernet(key)
mystr = str(f.encrypt(bytes(mystr, "utf-8"))) # convert mystr to bytes
But now, when I try to convert the string back to bytes, I am unable to decrypt it again.
mystr = str(f.decrypt(bytes(mystr, "utf-8"))) # convert mystr back to a string
I get the following error:
  File "C:\Users\Me\Desktop\Python\Encode.py", line 155, in encode
    data = str(f.decrypt(bytes(data, "utf-8")))
  File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\cryptography\fernet.py", line 75, in decrypt
    timestamp, data = Fernet._get_unverified_token_data(token)
  File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\site-packages\cryptography\fernet.py", line 107, in _get_unverified_token_data
    raise InvalidToken
cryptography.fernet.InvalidToken
I tried decrypting like:
mystr = str(f.decrypt(bytes(mystr, "ascii")))
or
mystr = str(f.decrypt(bytes(mystr, "base64")))
but, the error is still present.
You should use print() to see what you have in your variables after using bytes() and str().
When you use
bytes('abc', 'utf-8')
then you get
b'abc'
and when you use
str(b'abc')
then you get
"b'abc'"
instead of 'abc', and the b prefix and the quotes change everything: now you have a string with 6 characters instead of 3.
You should use encode() to create bytes and decode() to create a string again:
'abc'.encode('utf-8').decode('utf-8')

How to encrypt JSON in python, error using cryptography

This question has already been answered here:
How to encrypt JSON in python
However, I'm getting an error when using the cryptography module.
raise TypeError("{} must be bytes".format(name))
TypeError: data must be bytes
Here's my code:
from cryptography.fernet import Fernet
key= b'F9tdtAlS5kqVL5_uxKCnOPailXUqKsJmxbHWGLv_H-c='
with open('info.json', 'rb') as loader1:
    params = json.load(loader1)
if xyz(x, y)==True:
    fernet = Fernet(key)
    encrypted=fernet.encrypt(params)
    print(encrypted)
    with open('info.json', 'wb') as writer1:
        json.dump(encrypted, writer1)
    print("Operation was a success")
else:
    print("error")
If you look at the original answer, they read the raw contents of the JSON file and don't use json.load, so the content they encrypt is already in bytes format. You, however, are passing in a parsed JSON object, hence the error "data must be bytes". A quick fix is to convert the JSON object to a string using json.dumps and then encode it to bytes before passing it to fernet.encrypt().
To encode a string to bytes, see:
https://www.geeksforgeeks.org/python-convert-string-to-bytes/
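A minimal sketch of that conversion, using only the standard json module. The params dictionary is a made-up stand-in for whatever info.json contains, and the encryption call itself is left out so the example stays dependency-free.

```python
import json

# Hypothetical stand-in for the object returned by json.load().
params = {"user": "alice", "retries": 3}

# Parsed object -> JSON text -> UTF-8 bytes. This bytes value is the
# kind of input fernet.encrypt() accepts.
payload = json.dumps(params).encode("utf-8")
print(type(payload).__name__)  # bytes

# After decrypting you get the same bytes back; reverse the two steps.
restored = json.loads(payload.decode("utf-8"))
print(restored == params)
```

Because the encrypted result is itself a bytes object, it would be written with writer1.write(encrypted) on a file opened in 'wb' mode rather than with json.dump(), which cannot serialize bytes.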

Base64 encoding issue in Python

I need to save a params file in Python, and this file contains some parameters that I don't want to leave in plain text, so I encode the entire file as Base64 (I know that this isn't the most secure encoding in the world, but it works for the kind of data that I need to use).
The encoding works well: I encode the content of my file (a simple txt with its own extension) and save the file. The problem comes with decoding. I print the encoded text before saving the file and the encoded text read back from the saved file, and they are exactly the same, but for a reason I don't understand, decoding the text from the saved file gives me this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 1: invalid start byte. Decoding the text before saving the file works fine.
Any idea to resolve this issue?
This is my code, I have tried converting all to bytes, to string, and everything...
params = open('params.bpr','r').read()
paramsencoded = base64.b64encode(bytes(params,'utf-8'))
print(paramsencoded)
paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')
newparams = open('paramsencoded.bpr','w+',encoding='utf-8')
newparams.write(str(paramsencoded))
newparams.close()
params2 = open('paramsencoded.bpr',encoding='utf-8').read()
print(params2)
paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')
paramsdecoded = base64.b64decode(str(params2))
print(str(paramsdecoded,'utf-8'))
Your error lies in your handling of the bytes object returned by base64.b64encode(). You called str() on the object:
newparams.write(str(paramsencoded))
That doesn't decode the bytes object:
>>> bytesvalue = b'abc='
>>> str(bytesvalue)
"b'abc='"
Note the b'...' notation. You produced the representation of the bytes object, which is a string containing Python syntax that can reproduce the value for debugging purposes (you can copy that string value and paste it into Python to re-create the same bytes value).
This may not be that easy to notice at first, as base64.b64encode() otherwise only produces output with printable ASCII bytes.
But your decoding problem originates from there, because the value read back from the file includes the b' characters at the start. Those first two characters are interpreted as Base64 data too; the b is a valid Base64 character, and the ' is ignored by the parser:
>>> bytesvalue = b'hello world'
>>> base64.b64encode(bytesvalue)
b'aGVsbG8gd29ybGQ='
>>> str(base64.b64encode(bytesvalue))
"b'aGVsbG8gd29ybGQ='"
>>> base64.b64decode(str(base64.b64encode(bytesvalue))) # with str()
b'm\xa1\x95\xb1\xb1\xbc\x81\xdd\xbd\xc9\xb1\x90'
>>> base64.b64decode(base64.b64encode(bytesvalue)) # without str()
b'hello world'
Note how the output is completely different, because the Base64 decoding now starts from the wrong place: the b supplies the first 6 bits of the first byte (making the first decoded byte 6C, 6D, 6E or 6F, so l, m, n or o in ASCII).
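One way to catch this kind of mistake early is the validate flag of base64.b64decode(). By default, characters outside the Base64 alphabet are silently discarded; with validate=True they raise binascii.Error instead. The following is a small sketch with illustrative variable names, not part of the original code:

```python
import base64
import binascii

encoded = base64.b64encode(b"hello world")
mangled = str(encoded)  # the repr "b'aGVsbG8gd29ybGQ='", not the data

# Lenient (default) mode: the quotes are discarded and the leading "b"
# is decoded as if it were data, so the result is garbage.
lenient = base64.b64decode(mangled)
print(lenient)

# Strict mode: the stray quote characters are rejected.
try:
    base64.b64decode(mangled, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```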
You could properly decode the value (using paramsencoded.decode('ascii') or str(paramsencoded, 'ascii')) but you shouldn't treat any of this data as text.
Instead, open your files in binary mode. Reading and writing then operates with bytes objects, and the base64.b64encode() and base64.b64decode() functions also operate on bytes, making for a perfect match:
import base64

with open('params.bpr', 'rb') as params_source:
    params = params_source.read()  # bytes object
params_encoded = base64.b64encode(params)
print(params_encoded.decode('ascii'))  # base64 data is always ASCII data
params_decoded = base64.b64decode(params_encoded)
with open('paramsencoded.bpr', 'wb') as new_params:
    new_params.write(params_encoded)  # write binary data
with open('paramsencoded.bpr', 'rb') as new_params:
    params_written = new_params.read()
print(params_written.decode('ascii'))  # still Base64 data, so decode as ASCII
params_decoded = base64.b64decode(params_written)  # decode the bytes value
print(params_decoded.decode('utf8'))  # assuming the original source was UTF-8
I explicitly use bytes.decode(codec) rather than str(..., codec) to avoid accidental str(...) calls.
