How to execute a pickle-serialized class on a remote host with Python - python

I used pickle to serialize a Python object locally. After deserializing it on the remote host, an error was reported: ModuleNotFoundError: No module named 'plugins'.
I want the remote host to execute my function without seeing my source file.
This is my file structure
|plugins/
|one/
|--test1.py
|main.py
The file is located in the plugins directory
class Plugin:
    def start(self):
        pass
o=Plugin()
import pickle
pickle.dumps(o)  # dumps, not loads: this produces the bytes that are served to the remote host
This is the code that is executed remotely
import pickle,requests
result=requests.get('http://127.0.0.1:8000/').content
o=pickle.loads(result)
o.start()
Remote host throws an exception
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-16-bc409313ddfa> in <module>
----> 1 o=pickle.loads(result)
ModuleNotFoundError: No module named 'plugins'

It might help if you would explain in more detail what you want to accomplish here. It seems to me as if you want to implement a protocol where arbitrary code can be executed on a remote host.
I think it is quite possible that the pickle module is not what you are looking for.
Quote from the documentation:
Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
So in order to make your example work as it is, the remote host must get the source of the 'plugins' module or at least the source of the 'Plugin' class.
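The by-reference behaviour described in the quote can be reproduced in a single process. The sketch below is only an illustration: the module name plugins_demo is a made-up stand-in for the asker's plugins package, and deleting it from sys.modules is used to imitate a remote host that does not have the source.

```python
import pickle
import sys
import types

# Build a throwaway module named "plugins_demo" (a hypothetical stand-in for
# the asker's "plugins" package) and define a class inside it.
demo = types.ModuleType("plugins_demo")
exec(
    "class Plugin:\n"
    "    def start(self):\n"
    "        return 'started'\n",
    demo.__dict__,
)
sys.modules["plugins_demo"] = demo

payload = pickle.dumps(demo.Plugin())

# The payload names the module and the class; the method body is not in it.
names_in_payload = b"plugins_demo" in payload and b"Plugin" in payload

# Imitate the remote host, where the module cannot be imported.
del sys.modules["plugins_demo"]
try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = str(exc)

print(names_in_payload, error)
```

The error message names the missing module, which matches the traceback in the question.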
Update:
As you explained in your comment, your aim is to execute code remotely. This cannot be done with the pickle module.
You may want to have a look at RPyC, a library for remote procedure calls and distributed-computing:
https://en.wikipedia.org/wiki/RPyC
https://rpyc.readthedocs.io/en/latest/

I think dill could solve your problem. It extends pickle.
dill.dump(your_class_obj, local_file) # dump to file
dill.load(local_file) # load in other place

Related

ImportError for top-level package when trying to use dill to pickle entire package source code alongside instance

I have the following project structure:
Package1
|--__init__.py
|--__main__.py
|--Module1.py
|--Module2.py
where Module1.py contains something like:
import dill as pickle
import Package1.Module2
# from https://stackoverflow.com/questions/52402783/pickle-class-definition-in-module-with-dill
def mainify(obj):
    import __main__
    import inspect
    import ast
    s = inspect.getsource(obj)
    m = ast.parse(s)
    co = compile(m, "<string>", "exec")
    exec(co, __main__.__dict__)
def Module1():
    """I hope the details of this class are not necessary for this example. I can add detail if necessary
    """
obj_to_pickle = Module1()
def write_session():
    mainify(Module1)
    mainify(Module2)
    with FileHandler.open_file(...) as f:
        pickle.dump(obj_to_pickle, f)
I run the code as a module via python -m Package1 ..., thus __main__.py is the entry point to package execution, though I hope these details aren't relevant (I can improve my example if necessary).
Now, when I try to load the pickled object, I get ModuleNotFoundError: No module named Package1.
How can I tell dill in this situation that Package1 is the package? The mainify function seems to be getting the modules' source code into the pickle, but I believe the import statement in Module1.py, import Package1.Module2, is causing the ImportError. How can I tell dill to understand the reference to Package1?
NOTE: this reference can be fixed by adding the directory that Package1 is in via sys.path.append. But the whole point of pickling the package source alongside the instance is to make the pickled instance loadable without needing to do this.
Relevant posts:
Pickle class definition in module with dill
Why dill dumps external classes by reference, no matter what?
@courtyardz, I'm a contributor to dill, and your question is similar to others that have been asked in the past.
First, let me explain that generally dill assumes that all the modules necessary to deserialize an object are importable in the "unpickling" environment. Therefore modules are almost always saved by reference, with the current exception of modules that are not properly installed, like local modules (e.g. located in the working directory) or modules at non-canonical paths added to sys.path. There's also a function that's able to save the complete state of a module, which can be restored afterwards, but not the module itself.
That said, what exactly do you need? Is it to serialize an object alongside its class (including any objects in the module's namespace that it refers to), or do you really need the whole module?
If you need to transfer the complete module to an interpreter session where it's not available, like in a different machine, this problem is under active discussion here: https://github.com/uqfoundation/dill/issues/123. There's no complete solution for this currently, but one possibility is to ship the module as a ZIP archive, and load it using the zipimport module (indirectly, by saving the zip file to disk, maybe in a temporary location, and adding its path to sys.path as described in Python's documentation).
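The ZIP route mentioned above can be sketched with the standard library alone. This is a minimal illustration and not part of dill: the module name shipped_plugin, its source, and the helper names build_archive and install_archive are all hypothetical, and sender and receiver are played by the same process.

```python
import importlib
import io
import os
import pickle
import sys
import tempfile
import zipfile

# Hypothetical module name and source, standing in for the code to ship.
MODULE_NAME = "shipped_plugin"
MODULE_SOURCE = (
    "class Plugin:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n"
    "    def start(self):\n"
    "        return 'started ' + self.name\n"
)


def build_archive(module_name, source):
    """Sender side: pack one module's source into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(module_name + ".py", source)
    return buffer.getvalue()


def install_archive(archive_bytes):
    """Receiver side: save the ZIP in a temporary directory and add it to sys.path."""
    directory = tempfile.mkdtemp()
    zip_path = os.path.join(directory, "shipped.zip")
    with open(zip_path, "wb") as handle:
        handle.write(archive_bytes)
    sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    return zip_path


archive_bytes = build_archive(MODULE_NAME, MODULE_SOURCE)
zip_path = install_archive(archive_bytes)

# The module is importable from the archive, so pickles that refer to it load.
module = importlib.import_module(MODULE_NAME)
payload = pickle.dumps(module.Plugin("demo"))

# Forget the loaded module so that unpickling has to import it from the ZIP again.
del sys.modules[MODULE_NAME]
restored = pickle.loads(payload)
result = restored.start()
print(result)
```

Note that the receiver still ends up with the source on disk, inside the archive, so this does not hide the code from the remote host.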
If you just need to serialize an object with its class, note that doing so has the limitation that objects of that class pickled by separate calls to dill.dump() or dill.dumps() will end up having different (although identical) classes when unpickled. This may or may not be a problem. There's also an open discussion about forcing the serialization of a class by value: https://github.com/uqfoundation/dill/issues/424.
The workaround you are trying to use should work because dill pickles classes defined in the __main__ module by value, as well as "orphaned" classes, i.e. classes that can't be found in the module where they were defined. However, for this to work the object must be created by the __main__.Module1 class (I suppose this is a class, even though you used def instead of class in your code example), not the Package1.Module1.Module1 class. If the class references global objects in Module1 in its methods, you may need to use the option recurse=True with dill.dump(s).
A simpler workaround, that may not work for your specific case as it involves multiple modules, is to temporarily change the __module__ attribute of the class. For example, at a module's body:
import dill
class X:
    pass
obj = X()
X.__module__ = None # temporarily orphan the class
with open('/path/to/file.pkl', 'wb') as file:
    dill.dump(obj, file) # X will be pickled by value because __module__ is None
X.__module__ = __name__ # de-orphan the class
Going back to your example, if you can't create the object with the "mainified" class, you may change the object's class temporarily too:
import __main__
obj_to_pickle = Module1()
def write_session():
    mainify(Module1)
    mainify(Module2)
    obj_to_pickle.__class__ = __main__.Module1
    with FileHandler.open_file(...) as f:
        pickle.dump(obj_to_pickle, f)
    obj_to_pickle.__class__ = Module1
If the object has instance attributes of types defined in Package1, it won't work however.

Serialize a python function with dependencies

I have tried multiple approaches to pickle a python function with dependencies, following many recommendations on StackOverflow, (such as dill, cloudpickle, etc.) but all seem to run into a fundamental issue that I cannot figure out.
I have a main module that tries to pickle a function from an imported module, sends it over ssh to be unpickled and executed at a remote machine.
So main has:
import dill  # for example
import modulea
serial = dill.dumps(modulea.func)
send(serial)  # send() stands for the transfer over ssh
On the remote machine:
import dill
serial = receive()  # receive() stands for the transfer over ssh
funcremote = dill.loads(serial)
funcremote()
If the functions being pickled and sent are top level functions defined in main itself, everything works. When they are in an imported module, the loads function fails with messages of the type "module modulea not found".
It appears that the module name is pickled along with the function name. I do not see any way to "fix up" the pickle to remove the dependency, or alternately, to create a dummy module in the receiver to become the recipient of the unpickling.
Any pointers will be much appreciated.
--prasanna
I'm the dill author. I do this exact thing over ssh, but with success. Currently, dill and any of the other serializers pickle modules by reference… so to successfully pass a function defined in a file, you have to ensure that the relevant module is also installed on the other machine. I do not believe there is any object serializer that serializes modules directly (i.e. not by reference).
Having said that, dill does have some options to serialize object dependencies. For example, for class instances, the default in dill is to not serialize class instances by reference, so the class definition can also be serialized and sent with the instance. In dill, you can also (using a very new feature) serialize file handles by serializing the file, instead of doing so by reference. But again, if you have the case of a function defined in a module, you are out of luck, as modules are serialized by reference pretty darn universally.
You might be able to use dill to do so, however, not by pickling the object but by extracting the source and sending the source code. In pathos.pp and pyina, dill is used to extract the source and the dependencies of any object (including functions), and pass them to another computer/process/etc. However, since this is not an easy thing to do, dill can also fall back to extracting a relevant import and sending that instead of the source code.
You can understand, hopefully, this is a messy messy thing to do (as noted in one of the dependencies of the function I am extracting below). However, what you are asking is successfully done in the pathos package to pass code and dependencies to different machines across ssh-tunneled ports.
>>> import dill
>>>
>>> print dill.source.importable(dill.source.importable)
from dill.source import importable
>>> print dill.source.importable(dill.source.importable, source=True)
def _closuredsource(func, alias=''):
"""get source code for closured objects; return a dict of 'name'
and 'code blocks'"""
#FIXME: this entire function is a messy messy HACK
# - pollutes global namespace
# - fails if name of freevars are reused
# - can unnecessarily duplicate function code
from dill.detect import freevars
free_vars = freevars(func)
func_vars = {}
# split into 'funcs' and 'non-funcs'
for name,obj in list(free_vars.items()):
if not isfunction(obj):
# get source for 'non-funcs'
free_vars[name] = getsource(obj, force=True, alias=name)
continue
# get source for 'funcs'
#…snip… …snip… …snip… …snip… …snip…
# get source code of objects referred to by obj in global scope
from dill.detect import globalvars
obj = globalvars(obj) #XXX: don't worry about alias?
obj = list(getsource(_obj,name,force=True) for (name,_obj) in obj.items())
obj = '\n'.join(obj) if obj else ''
# combine all referred-to source (global then enclosing)
if not obj: return src
if not src: return obj
return obj + src
except:
if tried_import: raise
tried_source = True
source = not source
# should never get here
return
I imagine something could also be built around the dill.detect.parents method, which provides a list of pointers to all parent objects for any given object… and one could reconstruct all of any function's dependencies as objects… but this is not implemented.
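The idea of sending source text instead of a pickle can be sketched with the standard library only. This is not how dill or pathos do it internally; it is a simplified illustration that uses inspect.getsource, a hypothetical module named modulea_demo written to a temporary directory, and a single process playing both sender and receiver.

```python
import importlib
import inspect
import os
import sys
import tempfile

# Sender side: write a hypothetical module to disk and import it, so that
# inspect can locate the source file of its function.
source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, "modulea_demo.py"), "w") as handle:
    handle.write("def func():\n    return 'hello from func'\n")
sys.path.insert(0, source_dir)
importlib.invalidate_caches()
modulea_demo = importlib.import_module("modulea_demo")

# Ship the function's name and source text rather than a pickle that
# refers to the module by name.
serial = (modulea_demo.func.__name__, inspect.getsource(modulea_demo.func))

# Receiver side: run the source in a fresh namespace and call the function.
# Nothing named "modulea_demo" has to be importable here.
name, source = serial
namespace = {}
exec(source, namespace)
funcremote = namespace[name]
result = funcremote()
print(result)
```

This sketch ignores everything the function depends on (imports, globals, closures), which is exactly the messy part the answer refers to, and exec on received text is as unsafe as unpickling untrusted data.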
BTW: to establish an ssh tunnel, just do this:
>>> t = pathos.Tunnel.Tunnel()
>>> t.connect('login.university.edu')
39322
>>> t
Tunnel('-q -N -L39322:login.university.edu:45075 login.university.edu')
Then you can work across the local port with ZMQ, or ssh, or whatever. If you want to do so with ssh, pathos also has that built in.

cPickle.load throwing ImportError in Python

I have Python 2.7.3 installed on my Windows 7 computer. When I run the following code
import nltk, json, cPickle, itertools
import numpy as np
from nltk.tokenize import word_tokenize
from pprint import pprint
t_given_a = json.load(open('conditional_probability.json','rb'))
a_unconditional = json.load(open('age.json','rb'))
t_unconditional = cPickle.load(open('freqdist.pkl','rb'))['distribution']
The command prompt gives me the error
"ImportError: No Module named Multiarray."
I'm fairly new to Python and I'm not exactly sure why this error happened. I searched other threads and many suggested to use 'rb' instead of 'r'. I have rb to begin with and it's still throwing me that error. Any suggestion?
When you pickle an object in Python, it saves the object's class as a string of package name + class name. On unpickling, Python tries to import that module and find that class in order to recreate the object. If that module is not importable, you get an ImportError.
Make sure the module that provides multiarray is installed and importable (multiarray is normally part of NumPy's compiled core, so a missing or broken NumPy installation is a likely cause). If you don't know which module it is, ask whoever gave you that pickle file.
From the docs:
Note that functions (built-in and user-defined) are pickled by “fully
qualified” name reference, not by value. This means that only the
function name is pickled, along with the name of the module the
function is defined in. Neither the function’s code, nor any of its
function attributes are pickled. Thus the defining module must be
importable in the unpickling environment, and the module must contain
the named object, otherwise an exception will be raised.
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class’s code
or data is pickled
[...] These restrictions are why picklable functions and classes must be
defined in the top level of a module
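When it is unclear which modules a pickle file depends on, one option is to record the names the unpickler asks for. The sketch below is written for Python 3 (the question uses Python 2 and cPickle), and the names RecordingUnpickler and load_and_report are made up for this illustration. It only overrides find_class, the documented hook pickle uses to resolve module and class names.

```python
import io
import pickle
from fractions import Fraction


class RecordingUnpickler(pickle.Unpickler):
    """Unpickler that notes every (module, name) pair the pickle asks for."""

    def __init__(self, file):
        super().__init__(file)
        self.requested = []

    def find_class(self, module, name):
        # Record the request first, then let pickle do the normal import.
        self.requested.append((module, name))
        return super().find_class(module, name)


def load_and_report(data):
    """Return (value or the ImportError raised, list of requested names)."""
    unpickler = RecordingUnpickler(io.BytesIO(data))
    try:
        value = unpickler.load()
    except ImportError as exc:
        value = exc
    return value, unpickler.requested


value, requested = load_and_report(pickle.dumps(Fraction(1, 3)))
print(value, requested)
```

If loading fails, the last entry in the recorded list is the module and name that could not be imported. Like any unpickling, this should only be run on data from a trusted source.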

I'm getting AttributeError when I call a method in other class

I'm very new to Python and I have a code like this:
class Configuration:
    @staticmethod
    def test():
        return "Hello World"
When I call the method test from other python code like this:
import test
test.Configuration.test()
I get an error like this:
Traceback (most recent call last):
File "example.py", line 3, in <module>
test.Configuration.test()
AttributeError: 'module' object has no attribute 'test'
Where am I making the mistake?
Edit:
My directory structure:
root
--example.py
--test
----__init.py__
----Configuration.py
Python module names and the classes they contain are separate. You need to use the full path:
import test
print test.Configuration.Configuration.test()
Your test package has a module named Configuration, and inside that module is your Configuration class.
Note that Python, unlike Java, lets you define methods outside classes too, no need to make this a static method. Nor do you need to use a separate file per class.
Try to rename your module to something other than 'test', since this is the name of a standard library module (http://docs.python.org/2/library/test.html) and probably you're importing that module instead of your own. Another option is to add the directory containing your test module into the PYTHONPATH environment variable, so that python may find it instead of the standard library module (but this is not advised as it shadows the standard module and you won't be able to import it later).
To check which file you're importing from, do:
import test
print test
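In Python 3 the same check can be done without importing the module at all. The sketch below uses importlib.util.find_spec; the helper name module_origin is made up for this illustration, and json is used only as an example of a module that exists.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level import of `name` would load, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


origin = module_origin("json")
missing = module_origin("no_such_module_for_demo")
print(origin, missing)
```

Calling module_origin("test") in the asker's project would show whether the local package or the standard library one wins.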

Python : How to read pickle dump?

I have a pickle dump which I got from a friend and he asked me to read it like :
f = open('file.pickle')
import pickle
l = pickle.loads(f.read())
But I get an ImportError saying no module named sql.models
Can someone help me understand what is happening ?
You are missing the code required to reconstruct the pickled objects.
Pickles store the location where the class can be imported from, together with the instance attributes. The original module is still required to recreate the instances. From the documentation:
Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
Similarly, classes are pickled by named reference, so the same restrictions in the unpickling environment apply. Note that none of the class’s code or data is pickled, so in the following example the class attribute attr is not restored in the unpickling environment:
class Foo:
    attr = 'a class attr'
picklestring = pickle.dumps(Foo)
These restrictions are why picklable functions and classes must be defined in the top level of a module.
In other words, the original data used to create the pickle includes at least one instance of a custom class that originates in a module named sql.models.
Do be careful reading arbitrary pickles, even from friends. A pickle is just a stack language that recreates arbitrary Python structures. With enough determination and skill, someone can construct a pickle that spawns a secret back-door server on your computer. The pickle documentation warns you explicitly:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
This has been a problem in the past, even for experienced developers.
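One common mitigation is to restrict which globals a pickle may load by overriding Unpickler.find_class. The sketch below is written for Python 3 and follows that pattern. The allow-list SAFE_BUILTINS is an arbitrary choice for illustration, and this reduces the risk rather than making untrusted pickles safe.

```python
import builtins
import io
import os
import pickle

# Arbitrary allow-list for this illustration; everything else is refused.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only hand out a few harmless builtins.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allow-listed builtins still load.
safe_value = restricted_loads(pickle.dumps([1, 2, range(3)]))

# A pickle that refers to os.system is rejected before anything is called.
try:
    restricted_loads(pickle.dumps(os.system))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(safe_value, blocked)
```

A pickle like the friend's, which needs sql.models, would also be refused here unless that module's classes were added to the allow-list on purpose.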
