Python - Base64 - zlib - how to convert encoded Gzip file to Dictionary - python

When pulling data from a Solar API, a base64 encoded GZip file is sent.
When decoded, the data is received as a string but clearly is a nested dictionary.
Attempting to convert the string to a dictionary is unsuccessful.
I have tried to convert a string output from "json_str" to a dictionary "json_dict" but received an error of:
ValueError: dictionary update sequence element #0 has length 1; 2 is required
My Code:
import base64
import zlib
compressed = b"..."  # Full byte string attached below
json_str = zlib.decompress(base64.b64decode(compressed), 16 + zlib.MAX_WBITS).decode('utf-8')
json_dict = dict(json_str)
Link to compressed byte string (.txt file):
Drive Link to code
My Thoughts:
This conversion should succeed, since the output string is a nested dictionary as specified by the API reference. Maybe I am decoding the string from base64 incorrectly?
Any help is much appreciated.
Thanks
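The ValueError is consistent with dict() being handed a plain string: it iterates over the string character by character and expects every element to be a key/value pair. Assuming the decompressed text is JSON, as the API reference suggests, json.loads is the usual way to parse it. A minimal sketch follows; the sample payload and its field names are made up for illustration and stand in for the real API response.

```python
import base64
import gzip
import json
import zlib

# Hypothetical payload standing in for the real API response.
sample = {"site": {"id": 1, "readings": [{"kwh": 3.2}, {"kwh": 4.1}]}}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8")))

# Same decoding steps as in the question: base64 -> gzip -> UTF-8 text.
json_str = zlib.decompress(
    base64.b64decode(encoded), 16 + zlib.MAX_WBITS
).decode("utf-8")

# Parse the JSON text instead of calling dict() on the string.
json_dict = json.loads(json_str)
print(json_dict["site"]["readings"][0]["kwh"])
```

If the string turns out to be a Python literal rather than JSON (single quotes, True/None), ast.literal_eval may be the better fit.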

Related

Python Bytes & Lists & Encryption

I'm using Fernet to encrypt my data with this implementation. Let's assume that I have these three data:
data = [fernet.encrypt("Hello".encode()), fernet.encrypt("Stack".encode()), fernet.encrypt("Overflow".encode())]
After this operation, Python automatically converts bytes to string, and I'm writing them to a csv file. When I need to decrypt them like:
fernet.decrypt(data)
It gives me an error saying that only bytes can be decrypted. I also checked that the data in the CSV file is already bytes, but in string form:
b'gAAAAABiVUw5BzOkOv3VxlV5xa57Iaf0R4dzPbgsrnheAME8uYeslCZfTx9GeyRWe7l9VMM-gdDXiPZ4zsAXoXkG6T1dyXH6EztcqirrPhXX3YCt65_3xXvykVTDPdbEXs51cHvR-3HH'
An end-to-end usage example for encoding, writing to text, reading, and decoding.
The Fernet documentation can be referenced here.
from cryptography.fernet import Fernet
# Auto-generate a secret key.
key = Fernet.generate_key()
f = Fernet(key)
# Encode the string 'Hello' and encrypt.
encoded = f.encrypt('Hello'.encode())
This creates a bytestring (a bytes object) as:
b'gAAAAABiVVOOeO-hUG2QaKCVOyshntpbqVbxnexIVsFr7ttBGmKhHlDeM7jkTCjPPGphZxbh4D15X82pts3hKes12DjzwI8_jQ=='
Write, read and decrypt:
# Write the *decoded* encrypted string to a TXT file.
with open('/tmp/encoded.txt', 'w') as fh:
    fh.write(encoded.decode())
# Read the encrypted string from TXT file.
with open('/tmp/encoded.txt') as fh:
    encoded = fh.read()
# Encode the string, pass through fernet for decryption,
# and decode the bytes output.
f.decrypt(encoded.encode()).decode()
Output:
'Hello'
fernet.encrypt returns bytes (I assume; you don't say which implementation you're using, so I'm guessing it is this one). .decode() them to a string. Then your CSV will contain "gAAA...", not "b'gAAA...'". When reading those again from the CSV, .encode() the string before passing it to fernet.decrypt.
fernet.encrypt returns bytes
bytes.decode() turns bytes into str
CSV wants str
str.encode() turns str into bytes
fernet.decrypt wants bytes
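The chain above can be sketched with the standard library alone. The token below is a placeholder bytes value, not a real Fernet token, and csv_cell is a hypothetical helper that writes one cell and reads it back. It shows why the CSV ended up holding "b'...'": csv.writer stringifies non-string values with str().

```python
import csv
import io

# Placeholder bytes standing in for the output of fernet.encrypt().
token = b"gAAAAABexample"


def csv_cell(value):
    """Write a single value as a CSV cell and read it back as text."""
    buf = io.StringIO()
    csv.writer(buf).writerow([value])
    buf.seek(0)
    return next(csv.reader(buf))[0]


# Writing bytes directly stores their repr, including the b'...' wrapper.
bad = csv_cell(token)

# Decoding first stores only the token text.
good = csv_cell(token.decode())

# Encoding the text read from the CSV gives back the original bytes,
# which is the form fernet.decrypt expects.
recovered = good.encode()
print(bad, good, recovered == token)
```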

How to attach a gz file to a json object in python?

I have an api that returns a gz file. The application from where I am running the api accepts only json formats. Is there a way to attach the returned gz file to a json object?
Would converting the gz file to base64 format and then creating a json object like
{ "file": "the base64 format" } work?
print(json.dumps({'file': base64.b64decode(response_alert.content)}))
I get the error
Object of type bytes is not JSON serializable
String encoding is str→bytes and decoding is bytes→str. Base64 is the reverse, as it encodes binary data as characters rather than characters as binary data. However, since many of its use cases involve ASCII protocols like SMTP, base64.b64encode actually requires and returns bytes (ASCII in the latter case). You therefore want
json.dumps(dict(file=base64.b64encode(response_alert.content).decode()))
which takes advantage of the default encoding (UTF-8) supporting ASCII text. On the other end, you don’t have to bother encoding back to bytes, since str is accepted by base64.b64decode.
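A minimal round trip under these assumptions might look like the sketch below. gz_bytes is a stand-in for response_alert.content, and the "file" key follows the question.

```python
import base64
import gzip
import json

# Hypothetical stand-in for response_alert.content (raw gzip bytes).
gz_bytes = gzip.compress(b"alert data")

# Sender: bytes -> base64 bytes -> ASCII str, which JSON can carry.
payload = json.dumps({"file": base64.b64encode(gz_bytes).decode("ascii")})

# Receiver: JSON -> base64 str -> original gzip bytes.
restored = base64.b64decode(json.loads(payload)["file"])
print(gzip.decompress(restored))
```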

Convert Hex Encoded GZIP string back to uncompressed string

I'm having trouble converting a compressed, hex-encoded string back into its original format, without introducing numerous / seemingly erroneous backslashes + unconverted unicode characters.
The code I'm using to do this process is:
import gzip
from io import StringIO, BytesIO
def string_to_bytes(input_str: str) -> bytes:
    """
    Read the given string, encode it in utf-8, gzip compress
    the data and return it as a byte array.
    """
    bio = BytesIO()
    bio.write(input_str.encode("utf-8"))
    bio.seek(0)
    stream = BytesIO()
    compressor = gzip.GzipFile(fileobj=stream, mode='w')
    while True:  # until EOF
        chunk = bio.read(8192)
        if not chunk:  # EOF?
            compressor.close()
            return stream.getvalue()
        compressor.write(chunk)
def bytes_to_string(input_bytes: bytes) -> str:
    """
    Decompress the given byte array (which must be valid
    compressed gzip data) and return the decoded text (utf-8).
    """
    bio = BytesIO()
    stream = BytesIO(input_bytes)
    decompressor = gzip.GzipFile(fileobj=stream, mode='r')
    while True:  # until EOF
        chunk = decompressor.read(8192)
        if not chunk:
            decompressor.close()
            bio.seek(0)
            return bio.read().decode("utf-8")
        bio.write(chunk)
    return None
In the script I'm running the input_string gets compressed + saved as hex with:
saved_hex = string_to_bytes(input_string).hex()
This gets stored as a BINARY datatype in a Snowflake database (using the HEX binary format).
This gets loaded out from there like so:
hex_bytes = bytes.fromhex(hex_html)
html_string = bytes_to_string(hex_bytes)
And the results are coming out like:
href\\\\\\\\u003d\\\\\\\\\\\\x22https://www.google.com/advanced_search\\\\\\\\\\\\x22 target\\\\\\\\u003d\\\\\\\\\\\\x22_blank\\\\\\\\\\\\x22\\\\\\\\u003eadvanced search\\\\\\\\u003c/a\\\\\\\\u003e to find results...
Where there's multiple backslashes which I'm unable to convert back to a single backslash (in the case of the unicode characters) or remove entirely.
Is there any way to more efficiently:
Gzip compress the string
Convert to Hex
Decode the hex + decompress - without adding any of these weird unconverted unicode characters?
Thank you all for the answers - foolishly I realised that:
I was adding an additional json.dumps() to the input string (further encoding it as a string and adding all the additional back-slashes).
Snowflake stores the data as BINARY, which must first be converted to a hex string using TO_VARCHAR(saved_hex_data) before you can call bytes_to_string(bytes.fromhex(output_string)) on it.
At which point everything is preserved as before, many thanks again.
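For the "more efficient" part of the question, a shorter sketch is possible with gzip.compress and gzip.decompress, which handle the whole payload in one call. The helper names compress_to_hex and decompress_from_hex are made up for this example, and the HTML snippet is adapted from the output shown above. The last step reproduces the extra json.dumps() mentioned in the resolution.

```python
import gzip
import json


def compress_to_hex(text: str) -> str:
    """UTF-8 encode, gzip compress, and return the result as a hex string."""
    return gzip.compress(text.encode("utf-8")).hex()


def decompress_from_hex(hex_text: str) -> str:
    """Reverse compress_to_hex: hex -> gzip bytes -> UTF-8 text."""
    return gzip.decompress(bytes.fromhex(hex_text)).decode("utf-8")


html = '<a href="https://www.google.com/advanced_search" target="_blank">advanced search</a>'

# A plain round trip returns the text unchanged.
round_tripped = decompress_from_hex(compress_to_hex(html))

# An extra json.dumps() before compression escapes the quotes, so the
# round trip returns the escaped text until json.loads() undoes it.
escaped = decompress_from_hex(compress_to_hex(json.dumps(html)))
print(round_tripped == html, escaped == html, json.loads(escaped) == html)
```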

Python gpg decryption not printing non english characters correctly

I am trying to decrypt a json message body having mix of numeric and non english characters. The decrypted string is not showing non English characters properly.
Details:-
1) Input is base64 encoded and gpg encrypted
2) I am using python base64 and gnupg modules to decode and decrypt the message.
The output is displayed as (part of the output due to the data sensitivity):-
{"id":"123","name":"ååéå業é¡
I am expecting the output as
{{"id":"123","name":"豐國業銀"
Here is the python code I am using for the above task:
import json
import os
import base64
import gnupg
gpg = gnupg.GPG()
with open('item2.json', 'r') as file:
    json_data = json.load(file)
for value in json_data['items']:
    data = value['payload']
    print(data)
    str_data = base64.b64decode(data)
    print(str_data)
    decrypted_data = gpg.decrypt(str_data, passphrase=output)
    print(decrypted_data)
It's probably caused by a character encoding mismatch. It will also make a difference whether you're using Python 2 or Python 3.
Since Python 2 has finally reached its EOL, I'm going to assume you (and everyone to subsequently read this) need Python 3 code. There are also a number of possible explanations stemming from the way the python-gnupg module accesses the gpg executable.
For both security and ease-of-use reasons, I recommend using the Python bindings module that ships with the GPGME C API instead.
import json
import os
import base64
import gpg
c = gpg.Context(armor=True)
with open('item2.json', 'r') as file:
    json_data = json.load(file)
for value in json_data['items']:
    data = value['payload']
    print(data)
    str_data = base64.b64decode(data.encode())
    print(str_data)
    decrypted_data, result, verify_result = c.decrypt(str_data)
    print(decrypted_data)
Now decrypted_data only contains the plaintext, result contains information about the key(s) the encrypted data is encrypted to, and verify_result contains information about the key(s) the data was signed with, if any.
More details are in the included documentation; a draft copy is available here for convenience.
In all likelihood, however, the cause was the base64.b64decode call, which should be given bytes rather than str. That means you either need to encode data (as I have done above) or read the JSON data in as bytes instead of strings.
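The garbled name in the question is what UTF-8 text typically looks like when its bytes are decoded with a single-byte codec, which fits the character encoding mismatch suspected at the start of this answer. The sketch below reproduces that effect with the standard library only. The choice of Latin-1 as the wrong codec is an assumption for illustration, not something the question confirms.

```python
# Expected text taken from the question.
expected = "豐國業銀"

# The decrypted payload would carry the name as UTF-8 bytes.
utf8_bytes = expected.encode("utf-8")

# Decoding UTF-8 bytes with a single-byte codec yields one accented
# Latin character (or control character) per byte instead of the text.
garbled = utf8_bytes.decode("latin-1")

# Reversing the wrong step and decoding as UTF-8 recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(len(expected), len(garbled), repaired == expected)
```

If the decrypted output is available as bytes, decoding it explicitly as UTF-8 before printing avoids depending on whichever default encoding is in effect.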

Python Read base 64 string from xml file to byte array similar to QT

I am reading a base64 string from an XML file. The base64 string written to the XML file is generated by a Qt C++ program. I have a Qt C++ program that parses the XML file and is able to recover the data from the base64 string. I want to do the same with Python, but I am unable to figure out how to read the base64 string as a byte array the way Qt C++ does.
My Qt C++ code, which parses the XML and decodes the data, looks like the code below. For simplicity I am including just a part of the encoded base64 string.
QByteArray datastring = "AAAAAAAAAAA/YGJN0vGp/D9wYk3S8an8P3iTdLxqfvo/";
// Decode the base64 text into raw bytes.
QByteArray xByteArray = QByteArray::fromBase64(datastring);
QDataStream xInStream(xByteArray);
// Read the bytes back as a sequence of doubles.
while (!xInStream.atEnd())
{
    double d;
    xInStream >> d;
    qDebug() << d;
    pPlotCurve->addXAxisValue(d);
}
I am able to decode the data using Qt C++; the output looks like [0.0, 0.002 0.004,......]
I am trying to decode the same base64 string in Python, but I am getting scrambled data. My Python code is:
import base64
# I parse the xml file and retrieve the data below
datastring = 'AAAAAAAAAAA/YGJN0vGp/D9wYk3S8an8P3iTdLxqfvo/'
out = base64.b64decode(datastring)
print(out)
When I decode the string with the above code I get scrambled data. How can I perform the same operation as the Qt C++ program above and get the same output? I tried a lot of methods but nothing worked.
base64.b64decode will take your base64 string and return bytes. Those bytes are just a pile of bits with no meaning. You need to tell Python how to turn those bits into meaningful data.
Use the struct module to tell Python how to interpret your binary data:
import base64
import struct
data_base64 = 'AAAAAAAAAAA/YGJN0vGp/D9wYk3S8an8P3iTdLxqfvo/'
data_bytes = base64.b64decode(data_base64)
# '>ddddx' big endian, 4 8-byte floating point plus 1 extra byte
struct.unpack('>ddddx', data_bytes)
# gives (0.0, 0.002, 0.004, 0.006)
To unpack an arbitrary number of doubles do:
[struct.unpack_from('>d', data_bytes, offset)[0] for offset in range(0, len(data_bytes), 8)]
# gives [0.0, 0.002, 0.004, 0.006]
Note that your example data is 33 bytes long. If you use this second method make sure the length of your bytes is a multiple of 8, otherwise you will get struct.error: unpack_from requires a buffer of at least 8 bytes.
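One way to handle input whose length is not a multiple of 8 is to unpack only the complete doubles and ignore the trailing bytes. This is a sketch under the assumption that the leftover byte is just the cut-off start of the next value, as in the shortened example string.

```python
import base64
import struct

data_bytes = base64.b64decode('AAAAAAAAAAA/YGJN0vGp/D9wYk3S8an8P3iTdLxqfvo/')

# Number of complete 8-byte doubles; any trailing bytes are ignored.
count = len(data_bytes) // 8

# '>' selects big endian, and a repeat count before 'd' unpacks
# that many doubles in a single call.
values = list(struct.unpack('>%dd' % count, data_bytes[:count * 8]))
print(values)
```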
