I have written a small python script which encrypts a message with rsa.
Now I want to save the bytes in a txt to read them later.
But when I use str(...) on it I don't know how to convert the string back.
For example I encrypted "Test" to b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'
and saved it as a string.
When I aply bytes(...) on it I get the Error: TypeError: string argument without an encoding.
What can I do in order to do this?
You've saved the Python string representation of a binary byte array (bytestring).
To get the actual bytes back from such a representation, pass it through ast.literal_eval():
>>> import ast
>>> s = r"b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'"
>>> b = ast.literal_eval(s)
b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'
Better yet, just save the binary bytes to your file without passing through a string:
encrypted_bytes = my_rsa("Test")
with open("encrypted.bin", "wb") as f:
f.write(encrypted_bytes)
# ...
with open("encrypted.bin", "rb") as f:
encrypted_bytes = f.read()
If you really want a "text-safe" format for those bytes, use base64.b64encode() and base64.b64decode().
Related
I'm using Fernet to encrypt my data with this implementation. Let's assume that I have these three data:
data = [fernet.encrypt("Hello".encode()), fernet.encrypt("Stack".encode()), fernet.encrypt("Overflow".encode())]
After this operation, Python automatically converts bytes to string, and I'm writing them to a csv file. When I need to decrypt them like:
fernet.decrypt(data)
It gives me an error like you can only decrypt only bytes etc. I also checked that my data in the csv file is already bytes but string form.
b'gAAAAABiVUw5BzOkOv3VxlV5xa57Iaf0R4dzPbgsrnheAME8uYeslCZfTx9GeyRWe7l9VMM-gdDXiPZ4zsAXoXkG6T1dyXH6EztcqirrPhXX3YCt65_3xXvykVTDPdbEXs51cHvR-3HH'
An end-to-end usage example for encoding, writing to text, reading, and decoding.
The Fernet documentation can be referenced here.
from cryptography.fernet import Fernet
# Auto-generate a secret key.
key = Fernet.generate_key()
f = Fernet(key)
# Encode the string 'Hello' and encrypt.
encoded = f.encrypt('Hello'.encode())
This creates a bytestring (a bytes object) as:
b'gAAAAABiVVOOeO-hUG2QaKCVOyshntpbqVbxnexIVsFr7ttBGmKhHlDeM7jkTCjPPGphZxbh4D15X82pts3hKes12DjzwI8_jQ=='
Write, read and decrypt:
# Write the *decoded* encrypted string to a TXT file.
with open('/tmp/encoded.txt', 'w') as fh:
fh.write(encoded.decode())
# Read the encrypted string from TXT file.
with open('/tmp/encoded.txt') as fh:
encoded = fh.read()
# Encode the string, pass through fernet for decryption,
# and decode the bytes output.
f.decrypt(encoded.encode()).decode()
Output:
'Hello'
fernet.encrypt returns bytes (I assume, you're not being specific which implementation you're using, I'm guessing this one). .decode() them to a string. Then your CSV will contain "gAAA...", not "b'gAAA...'". When reading those again from the CSV, .encode() the string before passing it to fernet.decrypt.
fernet.encrypt returns bytes
bytes.decode() turns bytes into str
CSV wants str
str.encode() turns str into bytes
fernet.decrypt wants bytes
I'm having trouble converting a compressed, hex-encoded string back into its original format, without introducing numerous / seemingly erroneous backslashes + unconverted unicode characters.
The code I'm using to do this process is:
import gzip
from io import StringIO, BytesIO
def string_to_bytes(input_str: str) -> bytes:
"""
Read the given string, encode it in utf-8, gzip compress
the data and return it as a byte array.
"""
bio = BytesIO()
bio.write(input_str.encode("utf-8"))
bio.seek(0)
stream = BytesIO()
compressor = gzip.GzipFile(fileobj=stream, mode='w')
while True: # until EOF
chunk = bio.read(8192)
if not chunk: # EOF?
compressor.close()
return stream.getvalue()
compressor.write(chunk)
def bytes_to_string(input_bytes: bytes) -> str:
"""
Decompress the given byte array (which must be valid
compressed gzip data) and return the decoded text (utf-8).
"""
bio = BytesIO()
stream = BytesIO(input_bytes)
decompressor = gzip.GzipFile(fileobj=stream, mode='r')
while True: # until EOF
chunk = decompressor.read(8192)
if not chunk:
decompressor.close()
bio.seek(0)
return bio.read().decode("utf-8")
bio.write(chunk)
return None
In the script I'm running the input_string gets compressed + saved as hex with:
saved_hex = string_to_bytes(input_string).hex()
This gets stored as a BINARY datatype in a Snowflake database (using the HEX binary format).
This gets loaded out from there like so:
hex_bytes = bytes.fromhex(hex_html)
html_string = bytes_to_string(hex_bytes)
And the results are coming out like:
href\\\\\\\\u003d\\\\\\\\\\\\x22https://www.google.com/advanced_search\\\\\\\\\\\\x22 target\\\\\\\\u003d\\\\\\\\\\\\x22_blank\\\\\\\\\\\\x22\\\\\\\\u003eadvanced search\\\\\\\\u003c/a\\\\\\\\u003e to find results...
Where there's multiple backslashes which I'm unable to convert back to a single backslash (in the case of the unicode characters) or remove entirely.
Is there any way to more efficiently:
Gzip compress the string
Convert to Hex
Decode the hex + decompress - without adding any of these weird unconverted unicode characters?
Thank you all for the answers - foolishly I realised that:
I was adding an additional json.dumps() to the input string (further encoding it as a string and adding all the additional back-slashes).
Snowflake saves the data as bytes, which must be converted to binary first using TO_VARCHAR(saved_hex_data) before you can call bytes_to_string(bytes.fromhex(output_string)) on it.
At which point everything is preserved as before, many thanks again.
I need to save a params file in python and this params file contains some parameters that I won't leave on plain text, so I codify the entire file to base64 (I know that this isn't the most secure encoding of the world but it works for the kind of data that I need to use).
With the encoding, everything works well. I encode the content of my file (a simply txt with a proper extension) and save the file. The problem comes with the decode. I print the text coded before save the file and the text coded from the file saved and there are exactly the same, but for a reason I don't know, the decode of the text of the file saved returns me this error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 1: invalid start byte and the decode of the text before save the file works well.
Any idea to resolve this issue?
This is my code, I have tried converting all to bytes, to string, and everything...
params = open('params.bpr','r').read()
paramsencoded = base64.b64encode(bytes(params,'utf-8'))
print(paramsencoded)
paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')
newparams = open('paramsencoded.bpr','w+',encoding='utf-8')
newparams.write(str(paramsencoded))
newparams.close()
params2 = open('paramsencoded.bpr',encoding='utf-8').read()
print(params2)
paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')
paramsdecoded = base64.b64decode(str(params2))
print(str(paramsdecoded,'utf-8'))
Your error lies in your handling of the bytes object returned by base64.b64encode(), you called str() on the object:
newparams.write(str(paramsencoded))
That doesn't decode the bytes object:
>>> bytesvalue = b'abc='
>>> str(bytesvalue)
"b'abc='"
Note the b'...' notation. You produced the representation of the bytes object, which is a string containing Python syntax that can reproduce the value for debugging purposes (you can copy that string value and paste it into Python to re-create the same bytes value).
This may not be that easy to notice at first, as base64.b64encode() otherwise only produces output with printable ASCII bytes.
But your decoding problem originates from there, because when decoding the value read back from the file includes the b' characters at the start. Those first two characters are interpreted as Base64 data too; the b is a valid Base64 character, and the ' is ignored by the parser:
>>> bytesvalue = b'hello world'
>>> base64.b64encode(bytesvalue)
b'aGVsbG8gd29ybGQ='
>>> str(base64.b64encode(bytesvalue))
"b'aGVsbG8gd29ybGQ='"
>>> base64.b64decode(str(base64.b64encode(bytesvalue))) # with str()
b'm\xa1\x95\xb1\xb1\xbc\x81\xdd\xbd\xc9\xb1\x90'
>>> base64.b64decode(base64.b64encode(bytesvalue)) # without str()
b'hello world'
Note how the output is completely different, because the Base64 decoding is now starting from the wrong place, as b is the first 6 bits of the first byte (making the first decoded byte a 6C, 6D, 6E or 6F bytes, so m,n, o or p ASCII).
You could properly decode the value (using paramsencoded.decode('ascii') or str(paramsencoded, 'ascii')) but you should't treat any of this data as text.
Instead, open your files in binary mode. Reading and writing then operates with bytes objects, and the base64.b64encode() and base64.b64decode() functions also operate on bytes, making for a perfect match:
with open('params.bpr', 'rb') as params_source:
params = params_source.read() # bytes object
params_encoded = base64.b64encode(params)
print(params_encoded.decode('ascii')) # base64 data is always ASCII data
params_decoded = base64.b64decode(params_encoded)
with open('paramsencoded.bpr', 'wb') as new_params:
newparams.write(params_encoded) # write binary data
with open('paramsencoded.bpr', 'rb') as new_params:
params_written = new_params.read()
print(params_written.decode('ascii')) # still Base64 data, so decode as ASCII
params_decoded = base64.b64decode(params_written) # decode the bytes value
print(params_decoded.decode('utf8')) # assuming the original source was UTF-8
I explicitly use bytes.decode(codec) rather than str(..., codec) to avoid accidental str(...) calls.
How can I convert bytes to string without changing data ?
E.g
Input:
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
Output:
'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
I want to write an image data using StringIO with some additional data, Below is my code snippet,
img_buf = StringIO()
f = open("Sample_image.jpg", "rb")
file_data = f.read()
img_buf.write('\r\n' + file_data + '\r\n')
This works fine with python 2.7 but I want it to be working with python 3.4.
on read operation file_data = f.read() returns bytes object data something like this
b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
While writting data using img_buf it accepts only String data, so unable to write file_data with some additional characters.
So I want to convert file_data as it is in String object without changing its data. Something like this
'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
so that I can concat and write the image data.
I don't want to decode or encode data. Any suggestions would be helpful for me. thanks in advance.
It is not clear what kind of output you desire. If you are interested in aesthetically translating bytes to a string representation without encoding:
s = str(file_data)[1:]
print(s)
# '\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
This is the informal string representation of the original byte string (no conversion).
Details
The official string representation looks like this:
s
# "'\\xb4\\xeb7s\\x14q[\\xc4\\xbb\\x8e\\xd4\\xe0\\x01\\xec+\\x8f\\xf8c\\xff\\x00 \\xeb\\xff'"
String representation handles how a string looks. Double escape characters and double quotes are implicitly interpreted in Python to do the right thing so that the print function outputs a formatted string.
String intrepretation handles what a string means. Each block of characters means something different depending on the applied encoding. Here we interpret these blocks of characters (e.g. \\xb4, \\xeb, 7, s) with the UTF-8 encoding. Blocks unrecognized by this encoding are replaced with a default character, �:
file_data.decode("utf-8", "replace")
# '��7s\x14q[Ļ���\x01�+��c�\x00 ��'
Converting from bytes to strings is required for reliably working with strings.
In short, there is a difference in string output between how it looks (representation) and what it means (interpretation). Clarify which you prefer and proceed accordingly.
Addendum
If your question is "how do I concatenate a byte string?", here is one approach:
buffer = io.BytesIO()
with buffer as f:
f.write(b"\r\n")
f.write(file_data)
f.write(b"\r\n")
print(buffer.getvalue())
# b'\r\n\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff\r\n'
Equivalently:
buffer = b""
buffer += b"\r\n"
buffer += file_data
buffer += b"\r\n"
buffer
# b'\r\n\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff\r\n'
I have a .bin file that holds data, however I am not sure of what format or encoding. I want to be able to transform the data into something readable. Formatting is not a problem, I can do that later.
My issue is parsing the file. I've tried to use struct, binascii and codecs with no such luck.
with open(sys.argv[1], 'rb') as f:
data = f.read()
lists = list(data)
# Below returns that each item is class 'bytes' and a number that appears to be <255
# However, if I add type(i) == bytes it spits an error
for i in lists:
print("Type: ", type(data))
print(i, "\n")
# Below returns that the class is 'bytes' and prints like this: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xdd\x07\x00\x00\x0b\x00\x00\x00\x18\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08#\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa0=\xa1D\xc0\x00\x00\x00\x00t\xdfe#
# To my knowledge, this looks like hex notation.
print("Data type: ", type(data))
print(data)
However, there should be someway to convert this into characters I can read i.e. letters or numbers, represented in a string. I seem to be over-complicating things, as I'm sure there's an inbuilt method that is being elusive.
Use binascii.hexlify:
>>> import binascii
>>> binascii.hexlify(b'\x00t\xdfe#')
b'0074df6540'