How do I decompress a compressed base64 string? - python

The following is the string I want to decompress:

I have tried zlib:
import zlib
decompressed_data = zlib.decompress(data)
I get the following error:
TypeError: a bytes-like object is required, not 'str'
Then I did:
data = bytes(data, "utf-8")
decompressed_data = zlib.decompress(data)
I get an error again:
Error -3 while decompressing data: incorrect header check

You first need to decode the base64, then zlib decompress:
import zlib, base64
decompressed_data = zlib.decompress(base64.b64decode(data))
Looking at your data, it appears to be UTF-8 encoded XML, so we're almost there:
xml = decompressed_data.decode("utf-8")
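To check the order of operations end to end, here is a minimal sketch that builds its own sample payload, compresses and base64-encodes it, and then reverses both steps. The `original_xml` string is made up for illustration and is not the asker's data.

```python
import base64
import zlib

# Hypothetical sample payload; stands in for the asker's string.
original_xml = '<?xml version="1.0" encoding="UTF-8"?><note>hello</note>'

# Producer side: text -> UTF-8 bytes -> zlib -> base64 text.
data = base64.b64encode(zlib.compress(original_xml.encode("utf-8"))).decode("ascii")

# Consumer side: undo the steps in reverse order.
decompressed_data = zlib.decompress(base64.b64decode(data))
xml = decompressed_data.decode("utf-8")

print(xml == original_xml)
```

Note that `base64.b64decode` accepts the `str` directly, so the `bytes(data, "utf-8")` step from the question is not needed.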

Related

How to turn a string into a binary object in python

I'm using this library to download and decode MMS PDUs:
https://github.com/pmarti/python-messaging
The sample code almost works, except that this method:
mms = MMSMessage.from_data(response)
Is throwing an exception:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
Which seems to obviously be some sort of binary formatting problem.
In the sample code, the HTTP response is passed directly into the from_data method. In my case, however, it comes through with HTTP headers attached, so I'm splitting the response on the double CRLF and passing in just the PDU data:
data = buf.getvalue()
split = data.split("\r\n\r\n");
mms = MMSMessage.from_data(split[1].strip())
This throws an error BUT if I first write the exact same data to a file then use the from_file method it works:
data = buf.getvalue()
split = data.split("\r\n\r\n");
f = open('dump','w+')
f.write(split[1])
f.close()
path = 'dump'
mms = MMSMessage.from_file(path)
I looked in the from_file method, and all it does is load the contents and then pass it into the same method as the from_data method, so the first way should Just Work™.
What I did notice is that the file is opened in binary format, and the content is loaded like this:
data = array.array('B')
with open(filename, 'rb') as f:
    data.fromfile(f, num_bytes)
return self.decode_data(data)
So it seems obvious that somehow what I'm passing into the first function is actually a "string representation of binary data" and what's being read from the file is "actual binary data".
I tried using bytearray like this to "binaryfy" the string:
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
but that throws the error:
Traceback (most recent call last):
  File "decodepdu.py", line 41, in <module>
    mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 0: ordinal not in range(128)
which seems weird because it's using an 'ascii' codec but I specified utf8 encoding.
Anyway at this point I'm in over my head because I'm not really all that familiar with python, so for now I'm just writing the content to a temporary file but I would really rather not.
Any help would be most appreciated!
Okay thanks to Paul M. in the comments, this works:
data = buf.getvalue()
split = data.split("\r\n\r\n");
pdu = array.array('B');
pdu.fromstring(split[1]);
mms = MMSMessage.from_data(pdu);
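For readers on Python 3, where `array.fromstring` is no longer available, the same idea might look like the sketch below. The `response` bytes are a made-up stand-in for the HTTP response, and the final call to `MMSMessage.from_data(pdu)` is left out because whether the library runs on Python 3 is not something this sketch assumes.

```python
import array

# Hypothetical raw HTTP response: headers, a blank line, then a binary body.
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/vnd.wap.mms-message\r\n"
    b"\r\n"
    b"\x8c\x84\x98abc"
)

# Split once on the first blank line; everything after it is the body.
headers, _, body = response.partition(b"\r\n\r\n")

# array('B') holds unsigned bytes, the same type that from_file builds.
pdu = array.array("B", body)

print(list(pdu))
```

The body is deliberately not passed through `strip()`, since leading or trailing bytes of binary data that happen to look like whitespace would be removed.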

XML to base64 using python

How do I convert a complete XML file to a base64 string using Python or Scala?
I have tried the base64 module, but it requires a bytes-like object to be passed to it. How do I do that with XML, given its multiline structure and hierarchy?
Could anyone give an example of how to do it?
Thanks.
Python solution:
import base64
# convert file content to base64 encoded string
with open("input.xml", "rb") as file:
    encoded = base64.encodebytes(file.read()).decode("utf-8")
# output base64 content
print(encoded)
decoded = base64.decodebytes(encoded.encode('utf-8'))
# write decoded base64 content to file
with open("output.xml", "wb") as file:
    file.write(decoded)
# output decoded base64 content
print(decoded.decode('utf-8'))

What is the encoding getting used in tf.gfile.GFile()?

tf.gfile.GFile() does not accept an 'encoding' argument. From here I gathered that GFile returns only a byte stream, but that seems to have changed now:
with tf.gfile.GFile("./data/squad/test1.txt", mode = "rb") as file1:
    print(file1.read(n = 2), type(file1.read(n = 2)))
with tf.gfile.GFile("./data/squad/test1.txt", mode = "r") as file1:
    print(file1.read(n = 2), type(file1.read(n = 2)))
output:
b'as' <class 'bytes'>
as <class 'str'>
So what exactly is the encoding that it uses while reading those strings? Is it UTF-8, or is it platform dependent, as with Python's built-in open()?
As far as I understand the implementation, tf.io.gfile.GFile is always using UTF-8: https://github.com/tensorflow/tensorflow/blob/b3376f73ccfd6ae8721a946daf064675ee19b427/tensorflow/python/lib/io/file_io.py#L100
def write(self, file_content):
    """Writes file_content to the file. Appends to the end of the file."""
    self._prewrite_check()
    self._writable_file.append(compat.as_bytes(file_content))
It converts str to bytes using tf.compat.as_bytes, which encodes to UTF-8.
It is just a byte stream, so it is up to you to know the byte encoding of your text.
You can use a library to detect the encoding and use that as the decoding method. As of today (June 2020), one of the best Python libraries for encoding detection is chardet, by the awesome guys at Mozilla:
pip install chardet
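If you know it is 'utf-8' you can decode it directly with bstream.decode('utf-8'); otherwise, detect the encoding first:
import chardet
bstream = file1.read()  # file1 opened in "rb" mode, so this is bytes
info = chardet.detect(bstream)  # dict with 'encoding' and 'confidence' keys
enc = info['encoding']
confidence = info['confidence']  # how sure chardet is about its guess
text = bstream.decode(enc)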
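To see why the byte encoding matters when a file is read in "rb" mode, here is a small sketch using only the standard library (no TensorFlow or chardet). The sample word is an assumption chosen for illustration; it shows the same bytes producing different text under two codecs.

```python
# The same bytes yield different text depending on the codec used to decode them.
raw = "caf\u00e9".encode("utf-8")  # the word "cafe" with an accented e, as UTF-8 bytes

as_utf8 = raw.decode("utf-8")      # correct codec: the original word comes back
as_latin1 = raw.decode("latin-1")  # wrong codec: no error, but two garbage characters

print(repr(as_utf8), repr(as_latin1))
```

Decoding with the wrong codec does not necessarily raise an exception, which is why a confidence value such as the one chardet reports is worth checking.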

decompress zip string in Python 2.7

I am trying to decompress a base64-encoded string in Python 2.7.
I can verify that my string is valid by running this in the command line:
echo -n "MY_BASE64_ENCODED_STRING" | base64 -d | zcat
However, if I run this in Python 2.7:
b64_data = 'MY_BASE64_ENCODED_STRING'
text_data = zlib.decompress(base64.b64decode(b64_data))
I get an exception:
Error -3 while decompressing data: incorrect header check
Should I pass extra parameters to zlib.decompress to make it work?
As noted in the comments, your data is in gzip format and not just zlib compressed data. In Python 2.7, you can use GzipFile with StringIO to process the string:
>>> from gzip import GzipFile
>>> from StringIO import StringIO
>>> from base64 import b64decode
>>> data = 'H4sIAEm2algAAytJLS7hAgDGNbk7BQAAAA=='
>>> GzipFile(fileobj=StringIO(b64decode(data))).read()
'test\n'
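In Python 3 the same thing can be done without a file wrapper. The sketch below is one possible approach, not the answerer's: it passes `zlib.MAX_WBITS | 32` as the `wbits` argument so that zlib auto-detects either a zlib or a gzip header. The helper name `decompress_b64` is hypothetical, and the sample inputs are generated in the snippet rather than taken from the question.

```python
import base64
import gzip
import zlib

def decompress_b64(b64_text):
    """Base64-decode, then inflate data in either zlib or gzip format."""
    raw = base64.b64decode(b64_text)
    # wbits = MAX_WBITS | 32 asks zlib to auto-detect a zlib or gzip header.
    return zlib.decompress(raw, zlib.MAX_WBITS | 32)

# Sample inputs generated here for illustration.
gzip_b64 = base64.b64encode(gzip.compress(b"test\n"))
zlib_b64 = base64.b64encode(zlib.compress(b"test\n"))

print(decompress_b64(gzip_b64), decompress_b64(zlib_b64))
```

If the data is known to be gzip, `gzip.decompress(base64.b64decode(b64_text))` is a simpler alternative.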

Base 64 encode a JSON variable in Python

I have a variable that stores a JSON value. I want to base64 encode it in Python, but the error 'does not support the buffer interface' is thrown. I know that base64 needs bytes to convert, but as I am a newbie in Python, I have no idea how to convert JSON to a base64 encoded string. Is there a straightforward way to do it?
In Python 3.x you need to convert your str object to a bytes object for base64 to be able to encode them. You can do that using the str.encode method:
>>> import json
>>> import base64
>>> d = {"alg": "ES256"}
>>> s = json.dumps(d) # Turns your json dict into a str
>>> print(s)
{"alg": "ES256"}
>>> type(s)
<class 'str'>
>>> base64.b64encode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/base64.py", line 56, in b64encode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
>>> base64.b64encode(s.encode('utf-8'))
b'eyJhbGciOiAiRVMyNTYifQ=='
If you pass the output of your_str_object.encode('utf-8') to the base64 module, you should be able to encode it fine.
Here are two methods that work on Python 3.
encodestring is deprecated; the suggested replacement is encodebytes.
import json
import base64
with open('test.json') as jsonfile:
    data = json.load(jsonfile)
print(type(data))  # dict
datastr = json.dumps(data)
print(type(datastr))  # str
print(datastr)
encoded = base64.b64encode(datastr.encode('utf-8'))  # method 1
print(encoded)
print(base64.encodebytes(datastr.encode()))  # method 2
You could encode the string first, as UTF-8 for example, then base64 encode it:
import base64
data = '{"hello": "world"}'
enc = data.encode() # UTF-8 by default in Python 3
print(base64.b64encode(enc))
This also works in 2.7 :)
Here's a function that you can feed a string and it will output a base64 string.
import base64
def b64EncodeString(msg):
    msg_bytes = msg.encode('ascii')
    base64_bytes = base64.b64encode(msg_bytes)
    return base64_bytes.decode('ascii')
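Putting the pieces from these answers together, a round trip from a Python object to a base64 string and back might look like the following sketch. The helper names `json_to_b64` and `b64_to_json` are hypothetical, chosen only for this example.

```python
import base64
import json

def json_to_b64(obj):
    """Serialize obj to JSON, then return it as a base64 str."""
    json_bytes = json.dumps(obj).encode("utf-8")
    return base64.b64encode(json_bytes).decode("ascii")

def b64_to_json(b64_text):
    """Reverse json_to_b64: base64 str back to a Python object."""
    return json.loads(base64.b64decode(b64_text).decode("utf-8"))

token = json_to_b64({"alg": "ES256"})
print(token)
print(b64_to_json(token))
```

Returning a `str` rather than `bytes` from the encoder is a design choice here; it makes the result easy to embed in other text such as headers or JSON fields.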
