Python pickle._dumps works while pickle.dumps fails

I ran into an issue recently where I tried to pickle (Python 3.6) an object of type twilio.rest.chat.v2.service.channel.ChannelInstance (the code for it can be found here).
The relevant code was just this:
picked_rv = pickle.dumps(value)
On our server it would result in:
TypeError: can't pickle _thread.RLock objects
and on my local system:
AttributeError: Can't pickle local object 'init_logging.<locals>.NotProductionHandler'
Both the RLock and the NotProductionHandler are objects in the same environment as the ChannelInstance but not actually referenced by it at all, i.e. they are objects I instantiated in the same context while making a call to Twilio and getting the ChannelInstance back.
However, as soon as I use the pure-Python pickle dumps instead of the C implementation (_pickle, formerly cPickle), like this:
picked_rv = pickle._dumps(value)
it works just fine. Can anyone explain the difference in behavior? I'm baffled by both the error and the discrepancy: why would the C pickler try to pull in the other variables and objects in the context, and why does the pure-Python implementation work?
This was not an issue on Twilio 6.18.0, where the C pickler worked just fine, but I suspect the issue is with pickle itself rather than Twilio.
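For reference, pickle.dumps normally resolves to the C implementation from _pickle, while the pure-Python implementation stays available under the underscore names. A small sketch of forcing the pure-Python pickler explicitly (pure_python_dumps is just a name made up for this example):
import io
import pickle

def pure_python_dumps(obj, protocol=None):
    """Serialize obj with the pure-Python pickler (pickle._Pickler),
    bypassing the C pickler that pickle.dumps normally delegates to."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol).dump(obj)
    return buf.getvalue()

# pickled_rv = pure_python_dumps(value)   # value being the ChannelInstance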
Thanks in advance.

Related

How to specify the object type so it is recognized prior to running the code in Python?

Suppose I have the following code excerpt:
import pickle
with open('my_object.pkl','rb') as f:
    object = pickle.load(f)
My question is:
Suppose object is from a class I defined previously; how can I specify this in the code so that my interpreter knows, prior to running the code, what the object's class is? My goal is to have the auto-completion of my IDE (I use VS Code) recognize the object so I can auto-complete and easily browse the methods and attributes of that object.
It depends on the Python version and the IDE, but in general it looks like an additional assertion of the instance type is the only way so far. This will trigger VS Code's autocompletion:
import pickle
with open('my_object.pkl','rb') as f:
    object = pickle.load(f)
assert isinstance(object, YourType)
# and now you can use autocompletion with the object
The following issue is tracking that feature: #82.
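A self-contained sketch of the pattern (MyClass and the file name are just placeholders for this example):
import pickle

class MyClass:
    def greet(self):
        return "hello"

# write a pickle first, so there is something to load
with open('my_object.pkl', 'wb') as f:
    pickle.dump(MyClass(), f)

with open('my_object.pkl', 'rb') as f:
    obj = pickle.load(f)

# pickle.load returns Any, so the IDE knows nothing about obj; the
# assertion narrows it to MyClass for both the runtime and the type
# checker, and autocompletion then offers greet()
assert isinstance(obj, MyClass)
print(obj.greet())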

Error with pickle module. AttributeError: class has no attribute '__new__'

I have been using Python and Abaqus for a long time, but when I upgraded my Python from 2.7 to 3.5.2 some errors occurred. I try to pickle some object A of my class:
f = open(utilsDir + "aclass.log", 'wb')
pickle.dump(A,f,protocol=2)
f.close()
and then unpickle it with Abaqus' Python, which is still 2.7:
filepath = utilsDir + 'aclass.log'
A1 = pickle.load(open(filepath, 'rb'))
It all worked before I updated my Python, but now I get the AttributeError mentioned in the title.
This is old and the answer will not help the OP, but in case anyone stumbles on this with code they can modify: this error usually appears when the class pickled under Python 2 is not a new-style class, i.e. does not inherit from object.
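Concretely, on the Python 2 side (the Abaqus interpreter here) the difference is roughly:
# Python 2.7 (e.g. the Abaqus interpreter)

# Old-style class: has no __new__, so unpickling a protocol-2 pickle of
# one of its instances fails with "class has no attribute '__new__'".
class A:
    pass

# New-style class: inherits __new__ from object, so unpickling works.
class A(object):
    pass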

Module Instantiation in myhdl

I'm currently looking into MyHDL to see if it's worth using or not. However, I've come across a hiccup regarding the instantiation of modules. I've got two files, one that's a module and one that's the testbench. Inside the testbench, I've instantiated the module following the example they have on the website:
http://www.myhdl.org/examples/flipflops.html
The instantiation specifically is this line: dff_inst = dff(q, d, clk)
However, I get an error when I try to run the testbench:
Exception TypeError: 'isinstance() arg 2 must be a class, type, or tuple of classes and types' in <generator object _LabelGenerator at 0x7f6070b2ea50> ignored
I assume this has something to do with the fact that I have two separate files, so my guess is that Python isn't finding the dff module (since it's in a separate file). I tried adding an import dff line, but that simply gave me a 'module' object is not callable TypeError, which makes sense.
Looking in the documentation, they don't have a full .py file, so I'm not sure how they're linking these testbenches with the module. They specifically mention a hierarchy system and being able to instantiate other modules, but I can't seem to get it to work.
From what I understand from the documentation, it looks like they're writing the testbench and the module in the same file. However, they also seem to imply you can import modules, but I can't figure out how that's done. Is there just some simple thing I'm overlooking?
After experimenting a bit, it seems like I just need to use the following statement: from dff import dff, which makes a lot of sense.
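For anyone else hitting this, a minimal two-file sketch (the dff body is paraphrased from the myhdl.org flip-flop example; newer MyHDL releases also want a @block decorator on it):
# dff.py -- the "module" is just a Python function
from myhdl import always

def dff(q, d, clk):
    @always(clk.posedge)
    def logic():
        q.next = d
    return logic

# testbench.py -- "import dff" alone makes dff(q, d, clk) raise
# TypeError: 'module' object is not callable, because dff then names the
# module; import the function from the module instead:
from dff import dff
from myhdl import Signal

q, d, clk = Signal(bool(0)), Signal(bool(0)), Signal(bool(0))
dff_inst = dff(q, d, clk)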

Serialize a python function with dependencies

I have tried multiple approaches to pickle a Python function with dependencies, following many recommendations on StackOverflow (such as dill, cloudpickle, etc.), but all seem to run into a fundamental issue that I cannot figure out.
I have a main module that tries to pickle a function from an imported module, sends it over ssh to be unpickled and executed at a remote machine.
So main has:
import dill  # for example
import modulea
serial = dill.dumps(modulea.func)
send(serial)  # pseudocode: ship the bytes over ssh
On the remote machine:
import dill
serial = receive()  # pseudocode: read the bytes off the ssh channel
funcremote = dill.loads(serial)
funcremote()
If the functions being pickled and sent are top level functions defined in main itself, everything works. When they are in an imported module, the loads function fails with messages of the type "module modulea not found".
It appears that the module name is pickled along with the function name. I do not see any way to "fix up" the pickle to remove the dependency, or alternately, to create a dummy module in the receiver to become the recipient of the unpickling.
Any pointers will be much appreciated.
--prasanna
I'm the dill author. I do this exact thing over ssh, but with success. Currently, dill and any of the other serializers pickle modules by reference… so to successfully pass a function defined in a file, you have to ensure that the relevant module is also installed on the other machine. I do not believe there is any object serializer that serializes modules directly (i.e. not by reference).
Having said that, dill does have some options to serialize object dependencies. For example, for class instances, the default in dill is to not serialize class instances by reference… so the class definition can also be serialized and sent with the instance. In dill, you can also (using a very new feature) serialize file handles by serializing the file itself, instead of doing so by reference. But again, if you have the case of a function defined in a module, you are out of luck, as modules are serialized by reference pretty darn universally.
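As a rough sketch of that class-instance case (Greeter is a made-up name; byref is the relevant dill setting):
import dill

class Greeter(object):
    def hello(self):
        return "hello from a shipped class definition"

# byref=False asks dill to pickle the class definition along with the
# instance, rather than a mere by-reference "import Greeter" record.
payload = dill.dumps(Greeter(), byref=False)

# On another machine the payload can be loaded without Greeter being
# defined or importable there (this sketch just loads it locally).
restored = dill.loads(payload)
print(restored.hello())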
You might be able to use dill to do so, however, just not by pickling the object, but by extracting the source and sending the source code. In pathos.pp and pyina, dill is used to extract the source and the dependencies of any object (including functions), and pass them to another computer/process/etc. However, since this is not an easy thing to do, dill can also use the failover of trying to extract a relevant import and send that instead of the source code.
You can understand, hopefully, this is a messy messy thing to do (as noted in one of the dependencies of the function I am extracting below). However, what you are asking is successfully done in the pathos package to pass code and dependencies to different machines across ssh-tunneled ports.
>>> import dill
>>>
>>> print dill.source.importable(dill.source.importable)
from dill.source import importable
>>> print dill.source.importable(dill.source.importable, source=True)
def _closuredsource(func, alias=''):
    """get source code for closured objects; return a dict of 'name'
    and 'code blocks'"""
    #FIXME: this entire function is a messy messy HACK
    #       - pollutes global namespace
    #       - fails if name of freevars are reused
    #       - can unnecessarily duplicate function code
    from dill.detect import freevars
    free_vars = freevars(func)
    func_vars = {}
    # split into 'funcs' and 'non-funcs'
    for name,obj in list(free_vars.items()):
        if not isfunction(obj):
            # get source for 'non-funcs'
            free_vars[name] = getsource(obj, force=True, alias=name)
            continue
        # get source for 'funcs'
        #…snip… …snip… …snip… …snip… …snip…
        # get source code of objects referred to by obj in global scope
        from dill.detect import globalvars
        obj = globalvars(obj) #XXX: don't worry about alias?
        obj = list(getsource(_obj,name,force=True) for (name,_obj) in obj.items())
        obj = '\n'.join(obj) if obj else ''
        # combine all referred-to source (global then enclosing)
        if not obj: return src
        if not src: return obj
        return obj + src
    except:
        if tried_import: raise
        tried_source = True
        source = not source
    # should never get here
    return
I imagine something could also be built around the dill.detect.parents method, which provides a list of pointers to all parent objects for any given object… and one could reconstruct all of any function's dependencies as objects… but this is not implemented.
BTW: to establish an ssh tunnel, just do this:
>>> t = pathos.Tunnel.Tunnel()
>>> t.connect('login.university.edu')
39322
>>> t
Tunnel('-q -N -L39322:login.university.edu:45075 login.university.edu')
Then you can work across the local port with ZMQ, or ssh, or whatever. If you want to do so with ssh, pathos also has that built in.

We need to pickle any sort of callable

Recently a question was posed regarding some Python code attempting to facilitate distributed computing through the use of pickled processes. Apparently that functionality has historically been possible, but for security reasons it is now disabled. On the second attempt at transmitting a function object through a socket, only the reference was transmitted. Correct me if I am wrong, but I do not believe this issue is related to Python's late binding. Given the presumption that process and thread objects cannot be pickled, is there any way to transmit a callable object? We would like to avoid transmitting compressed source code for each job, as that would probably make the entire attempt pointless. Only the Python core library can be used, for portability reasons.
You could marshal the bytecode and pickle the other function things:
import marshal
import pickle
marshaled_bytecode = marshal.dumps(your_function.func_code)
# In this process, other function things are lost, so they have to be sent separately.
pickled_name = pickle.dumps(your_function.func_name)
pickled_arguments = pickle.dumps(your_function.func_defaults)
pickled_closure = pickle.dumps(your_function.func_closure)
# Send the marshaled bytecode and the other function things through a socket (they are byte strings).
send_through_a_socket((marshaled_bytecode, pickled_name, pickled_arguments, pickled_closure))
In another python program:
import marshal
import pickle
import types
# Receive the marshaled bytecode and the other function things.
marshaled_bytecode, pickled_name, pickled_arguments, pickled_closure = receive_from_a_socket()
your_function = types.FunctionType(marshal.loads(marshaled_bytecode), globals(), pickle.loads(pickled_name), pickle.loads(pickled_arguments), pickle.loads(pickled_closure))
And any references to globals inside the function would have to be recreated in the script that receives the function.
In Python 3, the function attributes used are __code__, __name__, __defaults__ and __closure__.
Please do note that send_through_a_socket and receive_from_a_socket do not actually exist, and you should replace them by actual code that transmits data through sockets.
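A rough Python 3 translation of the same idea (pack_function/unpack_function and the transport are placeholders, just like send_through_a_socket above):
import marshal
import pickle
import types

def pack_function(func):
    # marshal the code object; pickle the remaining function attributes
    return (marshal.dumps(func.__code__),
            pickle.dumps(func.__name__),
            pickle.dumps(func.__defaults__),
            pickle.dumps(func.__closure__))

def unpack_function(payload, namespace):
    marshaled_code, name, defaults, closure = payload
    return types.FunctionType(marshal.loads(marshaled_code),
                              namespace,  # globals for the rebuilt function
                              pickle.loads(name),
                              pickle.loads(defaults),
                              pickle.loads(closure))

# Example round trip in one process; in practice the tuple would travel
# over a socket between two interpreters of the same Python version.
def add(a, b=1):
    return a + b

rebuilt = unpack_function(pack_function(add), globals())
print(rebuilt(2))   # 3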
