Serialize Python objects without the file system

I want to serialize a trained scikit-learn pipeline object so I can reload it later for predictions. From what I've seen, pickle and joblib dump are two common methods for this, with joblib being the preferred approach.
In my case I want to store the serialized Python object in a database, load it from there, deserialize it, and use it for predictions. Is it possible to serialize the object without any file system access?

Yes. With the pickle library you can get the serialized version of an object by using pickle.dumps instead of pickle.dump:
serialized_object = pickle.dumps(obj)
This returns a bytes object, which you can store in your database either directly or after encoding it to base64.
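For completeness, a minimal sketch of the read side (the variable names are illustrative): after fetching the bytes back out of the database, pickle.loads reverses the process.
# Recreate the live object from the bytes fetched from the database.
restored_pipeline = pickle.loads(serialized_object)
predictions = restored_pipeline.predict(X_new)  # X_new: your feature matrix (assumption)
Keep in mind that unpickling data from an untrusted source can execute arbitrary code, so only load objects you stored yourself.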

You can do this:
import joblib
from io import BytesIO
import base64
with BytesIO() as tmp_bytes:
    joblib.dump({"test": "test"}, tmp_bytes)
    bytes_obj = tmp_bytes.getvalue()
    base64_obj = base64.b64encode(bytes_obj)
Here, bytes_obj is a bytes object and base64_obj is its base64-encoded version. Use whichever suits your database.
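Loading works the same way in reverse, again without touching the file system; a minimal sketch, assuming bytes_obj holds the bytes fetched back from your database:
import joblib
from io import BytesIO

# Wrap the raw bytes in a file-like buffer and let joblib read from it.
restored = joblib.load(BytesIO(bytes_obj))
If you stored the base64 version instead, decode it first with base64.b64decode before wrapping it in BytesIO.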

Related

Pandas - to_csv() and from_csv()

I've been working with to_csv()/read_csv() to read/write a data frame that a user works with in an applet, where one of the columns is a datetime.datetime object, and it seems like to_csv automatically converts the datetimes to strings. Is this correct? If so, is there a way to preserve the dates as datetime rather than having them converted to strings? I've read through the documentation, and I can't seem to find the answer. Thank you.
To preserve the exact structure of a DataFrame, complete with data types, check out the pickle module, which "serializes" any Python object to disk and reloads it back into a Python environment.
Use df.to_pickle instead of df.to_csv, optionally with a compression argument (see the docs):
# Save to pickle
df.to_pickle('pickle-file.pkl')
# Pickle with compression
df.to_pickle('pickle-file.pkl.gz', compression='gzip')
# Load pickle from disk
df = pd.read_pickle('pickle-file.pkl')
# or...
df = pd.read_pickle('pickle-file.pkl.gz', compression='gzip')
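To see that the dtype actually survives the round trip, here is a quick sketch (the column name and values are illustrative):
import pandas as pd

df = pd.DataFrame({"ts": pd.to_datetime(["2020-01-01", "2020-06-15"])})
df.to_pickle('pickle-file.pkl')

restored = pd.read_pickle('pickle-file.pkl')
print(restored["ts"].dtype)  # datetime64[ns], not the strings you get back from to_csv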

Persist generator object into some file type with python

I have a generator object obtained from sklearn:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
df_fit = clf.fit(df_training, df_label.values.ravel())
generator=clf.staged_decision_function(df_training)
However, I wish to persist the generator object in the generator variable to some file so that I can retrieve it later by reading the file back into a variable. I tried using pickle with
pickle.dumps(generator)
The approach failed, and from Google I learned that generator objects cannot be pickled. I have tried saving it as txt, but I don't think that is the proper way to do it. Any ideas?
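One common workaround, sketched under the assumption that the staged outputs fit in memory: exhaust the generator into a list of numpy arrays and pickle the list instead of the generator itself.
import pickle

# Generators hold live execution state and cannot be pickled,
# but the values they yield usually can. Materialize them first.
staged_scores = list(clf.staged_decision_function(df_training))

with open("staged_scores.pkl", "wb") as f:
    pickle.dump(staged_scores, f)

# Later: load the list back and iterate over it like the generator.
with open("staged_scores.pkl", "rb") as f:
    staged_scores = pickle.load(f)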

Read Binary string in Python, zlib

I want to store a large JSON (dict) from Python in DynamoDB.
After some investigation, it seems that zlib is the way to go to get a good level of compression. Using the code below, I'm able to encode the dict:
ranking_compressed = zlib.compress(simplejson.dumps(response["Item"]["ranking"]).encode('utf-8'))
The (string?) then looks like this: b'x\x9c\xc5Z\xdfo\xd3....
I can directly decompress this and get the dict back with:
ranking_decompressed = simplejson.loads(str(zlib.decompress(ranking_compressed).decode('utf-8')))
All good so far. However, when I put this in DynamoDB and then read it back, using the same decompress code as above, the (string?) now looks like this:
Binary(b'x\x9c\xc5Z\xdf...
The error I get is:
bytes-like object is required, not 'Binary'
I've tried accessing the Binary with e.g. .data, but I can't reach it.
Any help is appreciated.
Boto3 Binary objects have a value property.
# in general...
binary_obj.value
# for your specific case...
ranking_decompressed = simplejson.loads(str(zlib.decompress(response["Item"]["ranking_compressed"].value).decode('utf-8')))
Oddly, this seems to be documented nowhere except in the source code for the Binary class here.
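For reference, a minimal end-to-end sketch of the round trip through DynamoDB, assuming a table named rankings with a string key id (both names are illustrative):
import zlib
import boto3
import simplejson

table = boto3.resource("dynamodb").Table("rankings")
ranking = {"scores": [1, 2, 3]}

# Compress and store; the resource layer stores raw bytes as a Binary attribute.
compressed = zlib.compress(simplejson.dumps(ranking).encode("utf-8"))
table.put_item(Item={"id": "example", "ranking": compressed})

# Read back: .value unwraps the Binary object into plain bytes for zlib.
item = table.get_item(Key={"id": "example"})["Item"]
restored = simplejson.loads(zlib.decompress(item["ranking"].value).decode("utf-8"))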

How to convert a numpy array to leveldb or lmdb format

I'm trying to convert a numpy array that was created using pylearn2 into leveldb/lmdb so that I can use in Caffe.
This is the script that I used to create the dataset.
After running this script, a couple of files are generated, among which are test.pkl, test.npy, train.pkl, train.npy.
I don't know if there is a direct way of converting to leveldb/lmdb, so assuming there is none, I need to be able to read each image and its corresponding label so that I can then save them into a leveldb/lmdb database.
I was told I need to use the pickle file for reading, since it provides a dictionary-like structure. However, trying to do
import cPickle as pickle
pickle.load( open( "N:/pylearn2-master/datasets/cifar10/pylearn2_gcn_whitened/test.pkl", "rb" ) )
outputs
<pylearn2.datasets.cifar10.CIFAR10 at 0xde605f8>
and I don't know what the correct way of accessing the items in the unpickled object is, or whether I need to read from the numpy array directly.
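Under the assumption that the CIFAR10 object is a pylearn2 DenseDesignMatrix subclass (worth verifying against your pylearn2 version), the examples and labels are typically exposed as numpy arrays on the .X and .y attributes; a minimal sketch:
import cPickle as pickle

with open("N:/pylearn2-master/datasets/cifar10/pylearn2_gcn_whitened/test.pkl", "rb") as f:
    dataset = pickle.load(f)

# DenseDesignMatrix-style datasets keep the flattened images in .X
# and the labels in .y (assumption; check your pylearn2 version).
images = dataset.X  # shape: (num_examples, num_features)
labels = dataset.y  # shape: (num_examples, 1)

for image, label in zip(images, labels):
    pass  # write each pair into your leveldb/lmdb database here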

How do I serve binary data from Tornado?

I have a numpy array that I want to serve using Tornado, but when I try to write it using self.write(my_np_array) I just get an AssertionError.
What am I doing wrong?
File "server.py", line 28, in get
self.write(values)
File "/usr/lib/python2.7/site-packages/tornado/web.py", line 468, in write
chunk = utf8(chunk)
File "/usr/lib/python2.7/site-packages/tornado/escape.py", line 160, in utf8
assert isinstance(value, unicode)
Not exactly sure what your goal is, but if you want to get a string representation of the object, you can do
self.write(str(your_object))
If you want to serve the numpy array as a Python object in order to use it on a different client, you need to pickle the object first:
import pickle
self.write(pickle.dumps(your_object))
The object can then be retrieved with
your_object = pickle.loads(sent_object)
Keep in mind that it is dangerous to unpickle objects from an untrusted source as it can lead to malicious code execution.
Edit:
If you want to transfer a numpy array and use it in JavaScript, you don't need a binary representation.
Just convert the numpy array to a list:
your_numpy_list = your_numpy_object.tolist()
and convert it to JSON:
import json
self.write(json.dumps(your_numpy_list))
On the JavaScript side, you just parse the result string
var result = JSON.parse(resultString)
and create the typed array from it
var typedResult = new Float32Array(result)
voila!
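Putting the pieces together, a minimal handler sketch (the handler name, route, and port are illustrative, not from the original question):
import json
import numpy as np
import tornado.ioloop
import tornado.web

class ArrayHandler(tornado.web.RequestHandler):
    def get(self):
        values = np.linspace(0.0, 1.0, num=5, dtype=np.float32)
        self.set_header("Content-Type", "application/json")
        # tolist() converts numpy scalars to plain Python floats,
        # which json.dumps can serialize.
        self.write(json.dumps(values.tolist()))

if __name__ == "__main__":
    app = tornado.web.Application([(r"/values", ArrayHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()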
