Pickle with a specific module name - python

I am using the pickle library to serialise a custom object, let's call it A, which is defined in a.py.
If I pickle an object of type A in a.py as follows:
import pickle

class A:
    ...

if __name__ == "__main__":
    inst = A("some param")
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)
Then there is a problem with loading this object from storage if the class A is not available under the __main__ namespace at load time. See this other question. This is because pickle records that it should look for the class A in __main__, since that's where it was when pickle.dump() happened.
Now, there are two approaches to dealing with this:
1. Deal with it at the deserialisation end.
2. Deal with it at the serialisation end.
For 1, there are various options (see the above link, for example), but I want to avoid these, because I think it makes sense to give the 'pickler' responsibility regarding its data.
For 2, we could just avoid pickling when the module is under the __main__ namespace, but that doesn't seem very flexible. We could alternatively modify A.__module__, and set it to the name of the module (as done here).
Pickle uses this __module__ variable to find where to import the class A from, so setting it before .dump() works:
if __name__ == "__main__":
    inst = A("some param")
    A.__module__ = 'a'
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)
Q: is this a good idea? It seems like it's implementation dependent, not interface dependent. That is, pickle could decide to use another method of locating modules to import, and this approach would break. Is there an alternative that uses pickle's interface?
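A self-contained check of the mechanism (hedged: since there is no real a.py here, a stand-in module named "a" is registered in sys.modules to play its role):

```python
import pickle
import sys
import types

class A:
    def __init__(self, param):
        self.param = param

# Stand-in for the real a.py module, so the example is self-contained;
# in the real setup the module 'a' is importable from disk.
fake_a = types.ModuleType("a")
fake_a.A = A
sys.modules["a"] = fake_a

# Point the class at that module before dumping, as in the question.
A.__module__ = "a"
data = pickle.dumps(A("some param"))

# The pickle stream now references a.A rather than __main__.A,
# and loading resolves the class through sys.modules["a"].
assert b"__main__" not in data
obj = pickle.loads(data)
assert isinstance(obj, A) and obj.param == "some param"
```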

Another way around it would be to import the file itself:
import pickle
import a

class A:
    pass

def main():
    inst = a.A()
    print(inst.__module__)
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)

if __name__ == "__main__":
    main()
Note that this works because the import statement is purely an assignment of the name a to a module object; it doesn't fall into infinite recursion when you import a within a.py, because on re-import Python finds the partially initialised module already registered in sys.modules.
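The caching behaviour behind this can be checked directly with any module (types is just an arbitrary stdlib example):

```python
import sys
import types as first_import

# A second import of the same module is just a sys.modules lookup;
# no module code runs again, which is why a.py can import a safely.
import types as second_import

assert second_import is first_import
assert sys.modules["types"] is first_import
```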

Related

Python multiprocessing - mapping private method

Generally I'm aware of the pickle mechanism, but I can't understand why this example:
from multiprocessing import Pool

class Foo:
    attr = 'a class attr'

    def __test(self, x):
        print(x, self.attr)

    def test2(self):
        with Pool(4) as p:
            p.map(self.__test, [1, 2, 3, 4])

if __name__ == '__main__':
    f = Foo()
    f.test2()
complains about the __test method?
return _ForkingPickler.loads(res)
AttributeError: 'Foo' object has no attribute '__test'
After changing def __test to def _test (one underscore) everything works fine. Am I missing some basic knowledge of pickling or "private" methods?
This appears to be a flaw in the name mangling magic. The actual name of a name-mangled private function incorporates the class name, so Foo.__test is actually named Foo._Foo__test, and other methods in the class just implicitly look up that name when they use self.__test.
Problem is, the magic extends to preserving the __name__ unmangled; Foo._Foo__test.__name__ is "__test". And pickle uses the __name__ to serialize the method. When it tries to deserialize it on the other end, it tries to look up plain __test, without applying the name mangling, so it can't find _Foo__test (the real name).
I don't think there is any immediate solution here aside from not using a private method directly (using it indirectly via another non-private method or global function would be fine); even if you try to pass self._Foo__test, it'll still pickle the unmangled name from __name__.
The longer term solution would be to file a bug on the Python bug tracker; there may be a clever way to preserve the "friendly" __name__ while still allowing pickle to seamlessly mangle as needed.
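The mismatch is easy to see without multiprocessing at all: the mangled attribute is what actually exists on the class, but its __name__, which pickle serialises, stays unmangled:

```python
class Foo:
    def __test(self, x):
        return x

# the attribute actually stored on the class uses the mangled name
assert hasattr(Foo, "_Foo__test")
assert not hasattr(Foo, "__test")

# but __name__ is preserved unmangled, and that is what pickle
# uses to look the method up again when deserialising
assert Foo._Foo__test.__name__ == "__test"
```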

Import or not to import classmethod?

I hope this isn't a stupid question, but I found some code where they imported classmethod and some code where they didn't - is there a difference?
I'm using Python 3.6, but I think the code was originally for Python 2.7 (it used from __builtin__ import).
import unittest
from selenium import webdriver
from builtins import classmethod  # original code was: from __builtin__ import classmethod

class HomePageTest(unittest.TestCase):
    @classmethod
    def setUp(cls):
        # create a new Firefox session
        cls.driver = webdriver.Firefox()
        cls.driver.implicitly_wait(30)
        cls.driver.maximize_window()
        # navigate to the application home page
        cls.driver.get("http://demo-store.seleniumacademy.com/")

    def test_search_field(self):
        pass

    # My tests without @classmethod

    @classmethod
    def tearDown(cls):
        # close the browser window
        cls.driver.quit()

if __name__ == '__main__':
    unittest.main(verbosity=2)
Normally you only import builtins or __builtin__ if you have a variable in your code with the same name as a built-in and still want to access the built-in name. The documentation of the module explains it rather well:
builtins — Built-in objects
This module provides direct access to all ‘built-in’ identifiers of Python; for example, builtins.open is the full name for the built-in function open(). See Built-in Functions and Built-in Constants for documentation.
This module is not normally accessed explicitly by most applications, but can be useful in modules that provide objects with the same name as a built-in value, but in which the built-in of that name is also needed. For example, in a module that wants to implement an open() function that wraps the built-in open(), this module can be used directly:
import builtins

def open(path):
    f = builtins.open(path, 'r')
    return UpperCaser(f)

class UpperCaser:
    '''Wrapper around a file that converts output to upper-case.'''
    def __init__(self, f):
        self._f = f

    def read(self, count=-1):
        return self._f.read(count).upper()
However, in your case there seems to be no classmethod definition in the file, so you don't actually need the from builtins import classmethod.
In Python 3, there's no need to import the builtins module, or anything inside it. When the lookup for a name in the current scope fails, builtins is looked up as a fallback.
If you need to keep the code working on both versions, consider explicitly checking the Python version first:

import sys

if sys.version_info[0] == 2:
    from __builtin__ import classmethod
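The fallback lookup is easy to demonstrate: shadow a built-in locally and use the builtins module to reach the original (a minimal sketch; the doubling wrapper is purely for illustration):

```python
import builtins

# shadow the built-in len in this module
def len(seq):
    # delegate to the real built-in, then double the result
    # (illustrative only)
    return builtins.len(seq) * 2

assert len([1, 2]) == 4           # our shadowing version
assert builtins.len([1, 2]) == 2  # the original is still reachable
```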

namedtuple pickling fails when variable name doesn't match typename

The Python code below fails with the error pickle.PicklingError: Can't pickle <class '__main__.SpecialName'>: it's not found as __main__.SpecialName
import pickle
from collections import namedtuple

different_SpecialName = namedtuple('SpecialName', 'foo bar')

def save():
    foo = different_SpecialName(1, 2)
    with open('foosave.pkl', 'wb') as f:
        pickle.dump(foo, f)

if __name__ == '__main__':
    save()
This seems like bad behaviour on the part of the pickle module, as it depends on the name of a variable. Changing different_SpecialName to SpecialName and re-running the code allows it to complete successfully. Changing the code to the version below, where a variable named SpecialName is bound to the same value as different_SpecialName, also lets the code run successfully:
import pickle
from collections import namedtuple

different_SpecialName = namedtuple('SpecialName', 'foo bar')
# create a new variable with the 'correct' name
SpecialName = different_SpecialName

def save():
    # foo = different_SpecialName(1, 2)
    foo = SpecialName(1, 2)
    with open('foosave.pkl', 'wb') as f:
        pickle.dump(foo, f)

if __name__ == '__main__':
    save()
My questions: is this fundamentally a pickle (and cPickle) bug? It seems like pickle shouldn't be looking up the class definition via the name of a variable (although I'm not sure what else it could do). Or is this instead an issue with the namedtuple API? I browsed the namedtuple documentation and couldn't find anything that explicitly told me to name my namedtuple variables the same as the typename argument (the first argument to the namedtuple() function).
It's not a bug. pickle requires that
the class definition must be importable and live in the same module as when the object was stored.
From the perspective of the namedtuple's __reduce__ method, the type name is SpecialName (that's what you passed it after all). So when unpickling, it will try to import the module it was declared in and look for SpecialName. But since you didn't save it as SpecialName, it can't find it.
Without resorting to namedtuples, you can produce the exact same problem with:
class Foo:
    pass

Bar = Foo
del Foo
and trying to pickle and unpickle a Bar(); under the hood, you've effectively done the same thing with your mismatched names for a namedtuple.
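A self-contained sketch of both cases, using pickle.dumps so no file is involved:

```python
import pickle
from collections import namedtuple

# matching names: the class is findable where pickle looks for it
SpecialName = namedtuple('SpecialName', 'foo bar')
assert pickle.loads(pickle.dumps(SpecialName(1, 2))) == SpecialName(1, 2)

# mismatched names: dumping fails because the module has no
# attribute matching the typename ('Mismatched')
different_name = namedtuple('Mismatched', 'foo bar')
try:
    pickle.dumps(different_name(1, 2))
    raised = False
except pickle.PicklingError:
    raised = True
assert raised
```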

Can I "fake" a package (or at least a module) in python for testing purposes?

I want to fake a package in python. I want to define something so that the code can do
from somefakepackage.morefakestuff import somethingfake
And somefakepackage is defined in code, and so is everything below it. Is that possible? The reason for doing this is to trick my unittest into thinking that I have a package (or, as I said in the title, a module) on the Python path, which is actually just something mocked up for this unittest.
Sure. Define a class, put the stuff you need inside it, and assign the class to the module's name in sys.modules:
class fakemodule(object):
    @staticmethod
    def method(a, b):
        return a + b

import sys
sys.modules["package.module"] = fakemodule
You could also use a separate module (call it fakemodule.py):
import fakemodule, sys
sys.modules["package.module"] = fakemodule
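A runnable version of the idea, using a made-up top-level name (note: for a dotted name like package.module, the parent package entry would also need to be in sys.modules, since the import machinery consults it first):

```python
import sys

class fakemodule:
    @staticmethod
    def method(a, b):
        return a + b

# "fakemod" is a made-up name for this demo
sys.modules["fakemod"] = fakemodule

import fakemod
assert fakemod.method(2, 3) == 5
```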
Yes, you can make a fake module:
from types import ModuleType

m = ModuleType("fake_module")

import sys
sys.modules[m.__name__] = m

# Some scripts may expect a file; even though this file doesn't exist,
# it may be used by Python in error messages or introspection.
m.__file__ = m.__name__ + ".py"

# add a function
def my_function():
    return 10
m.my_function = my_function
Note that this example uses an actual module (a ModuleType instance), since some Python code may expect real modules rather than a dummy class.
This can be made into a utility function:
def new_module(name, doc=None):
    import sys
    from types import ModuleType
    m = ModuleType(name, doc)
    m.__file__ = name + '.py'
    sys.modules[name] = m
    return m

print(new_module("fake_module", doc="doc string"))
Now other scripts can run:
import fake_module
I took some of the ideas from the other answers and turned them into a Python decorator, @modulize, which converts a function into a module. This module can then be imported as usual. Here is an example.
@modulize('my_module')
def my_dummy_function(__name__):  # the function takes one parameter, __name__
    # put the module code here
    def my_function(s):
        print(s, 'bar')

    # the function must return locals()
    return locals()

# import the module as usual
from my_module import my_function
my_function('foo')  # foo bar
The code for the decorator is as follows:
import sys
from types import ModuleType

class MockModule(ModuleType):
    def __init__(self, module_name, module_doc=None):
        ModuleType.__init__(self, module_name, module_doc)
        if '.' in module_name:
            package, module = module_name.rsplit('.', 1)
            get_mock_module(package).__path__ = []
            setattr(get_mock_module(package), module, self)

    def _initialize_(self, module_code):
        self.__dict__.update(module_code(self.__name__))
        self.__doc__ = module_code.__doc__

def get_mock_module(module_name):
    if module_name not in sys.modules:
        sys.modules[module_name] = MockModule(module_name)
    return sys.modules[module_name]

def modulize(module_name, dependencies=[]):
    for d in dependencies:
        get_mock_module(d)
    return get_mock_module(module_name)._initialize_
The project can be found here on GitHub. In particular, I created this for programming contests which only allow the contestant to submit a single .py file. This allows one to develop a project with multiple .py files and then combine them into one .py file at the end.
You could fake it with a class which behaves like somethingfake:
try:
    from somefakepackage.morefakestuff import somethingfake
except ImportError:
    class somethingfake(object):
        # define what you'd expect of somethingfake, e.g.:
        @staticmethod
        def somefunc():
            ...
        somefield = ...
TL;DR
Patch sys.modules using unittest.mock:
mock.patch.dict(
    sys.modules,
    {'somefakepackage': mock.Mock()},
)
Explanation
Other answers correctly recommend fixing sys.modules, but the proper way to do it is by patching it with mock.patch. That means replacing it temporarily (only while the tests run) with a fake object that optionally imitates the desired behaviour, and restoring it once the tests are finished so that other test cases are not affected.
The code in the TL;DR section will simply stop your missing package from raising ImportError. To give the fake package contents and imitate the desired behaviour, instantiate mock.Mock(…) with proper arguments (e.g. add attributes via Mock's **kwargs).
Full code example
The code below temporarily patches sys.modules so that it includes somefakepackage and makes it importable from the dependent modules without ImportError.
import sys
import unittest
from unittest import mock

class SomeTestCase(unittest.TestCase):
    def test_smth(self):
        # implement your testing logic, for example:
        self.assertEqual(
            123,
            somefakepackage_dependent.some_func(),
        )

    @classmethod
    def setUpClass(cls):  # called once before all the tests
        # define what to patch sys.modules with
        cls._modules_patcher = mock.patch.dict(
            sys.modules,
            {'somefakepackage': mock.Mock()},
        )
        # actually patch it
        cls._modules_patcher.start()
        # make the package globally visible and import it,
        # just as if you had imported it in the usual way,
        # placing the import statement at the top of the file,
        # but relying on a patched dependency
        global somefakepackage_dependent
        import somefakepackage_dependent

    @classmethod
    def tearDownClass(cls):  # called once after all the tests
        # restore the initial sys.modules state
        cls._modules_patcher.stop()
To read more about setUpClass/tearDownClass methods, see unittest docs.
unittest's built-in mock subpackage is actually a very powerful tool. It's worth diving deeper into its documentation to get a better understanding.
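A compact, self-contained sketch of the same idea outside a test class (the package and attribute names are the hypothetical ones from the question; the parent and submodule entries are both patched, since the import machinery checks both):

```python
import sys
from unittest import mock

fake = mock.Mock()
fake.morefakestuff.somethingfake = 42  # made-up attribute for the demo

with mock.patch.dict(sys.modules, {
    "somefakepackage": fake,
    "somefakepackage.morefakestuff": fake.morefakestuff,
}):
    # inside the patch, the fake package is importable
    from somefakepackage.morefakestuff import somethingfake
    assert somethingfake == 42

# once the patch exits, the fake entries are gone again
assert "somefakepackage" not in sys.modules
```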

Intercepting module calls?

I'm trying to 'intercept' all calls to a specific module and reroute them to another object, so that I can have a fairly simple plugin architecture.
For example, in main.py
import renderer
renderer.draw('circle')
In renderer.py
specificRenderer = OpenGLRenderer()

# Then, I'd like to route all calls from main.py so that
# specificRenderer.methodName(methodArgs) is called,
# i.e. the above example would call specificRenderer.draw('circle')
This means that any function can just import renderer and use it, without worrying about the details. It also means that I can completely change the renderer just by creating another object and assigning it to the 'specificRenderer' value in renderer.py
Any ideas?
In renderer.py:
import sys

if __name__ != "__main__":
    sys.modules[__name__] = OpenGLRenderer()
The module name is now mapped to the OpenGLRenderer instance, and import renderer in other modules will get the same instance.
Actually, you don't even need the separate module. You can just do:
import sys

sys.modules["renderer"] = OpenGLRenderer()
import renderer  # gives the current module access to the "module"
... first thing in your main module. Imports of renderer in other modules, once again, will refer to the same instance.
Are you sure you really want to do this in the first place? It isn't really how people expect modules to behave.
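A self-contained check of the trick, with a stand-in renderer class (FakeRenderer and its draw method are made up for the demo):

```python
import sys

class FakeRenderer:
    def draw(self, shape):
        return "drew a " + shape

# put an instance where the import system will find it
sys.modules["renderer"] = FakeRenderer()

import renderer  # resolves to the instance, not a real module
assert renderer.draw("circle") == "drew a circle"
```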
The simplest way to do that is to have main.py do
from renderer import renderer
instead, then just name specificRenderer renderer.
My answer is very similar to @kindall's, although I got the idea elsewhere. It goes a step further in the sense that it replaces the module object that's usually put in sys.modules with an instance of a class of your own design. At a minimum, such a class would need to look something like this:
File renderer.py:
class _renderer(object):
    def __init__(self, specificRenderer):
        self.specificRenderer = specificRenderer

    def __getattr__(self, name):
        return getattr(self.specificRenderer, name)

if __name__ != '__main__':
    import sys
    # from some_module import OpenGLRenderer
    sys.modules[__name__] = _renderer(OpenGLRenderer())
The __getattr__() method simply forwards most attribute accesses to the real renderer object. The advantage of this level of indirection is that you can add your own attributes to the private _renderer class and access them through the imported renderer object just as though they were part of an OpenGLRenderer object. If you give them the same name as something already in an OpenGLRenderer object, they will be called instead, and are free to forward, log, ignore, and/or modify the call before passing it along - which can sometimes be very handy.
Class instances placed in sys.modules are effectively singletons, so if the module is imported in other scripts in the application, they will all share the single instance created by the first one.
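A runnable sketch of the proxy idea (Engine, its draw method, the info extra, and the module name renderer_proxy are all made up for the demo):

```python
import sys

class Engine:  # stand-in for OpenGLRenderer
    def draw(self, shape):
        return "engine drew " + shape

class _renderer:
    def __init__(self, specificRenderer):
        self.specificRenderer = specificRenderer

    def __getattr__(self, name):
        # called only when normal lookup fails, so it forwards
        # unknown attributes to the wrapped renderer
        return getattr(self.specificRenderer, name)

    def info(self):
        # an extra attribute of our own, reachable through the proxy
        return "proxy around " + type(self.specificRenderer).__name__

sys.modules["renderer_proxy"] = _renderer(Engine())

import renderer_proxy
assert renderer_proxy.draw("circle") == "engine drew circle"
assert renderer_proxy.info() == "proxy around Engine"
```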
If you don't mind that import renderer results in an object rather than a module, then see kindall's brilliant solution.
If you want @property to work (i.e. each time you fetch renderer.mytime, you want the function corresponding to OpenGLRenderer.mytime to be called) and you want to keep renderer as a module, then it's impossible. Example:
import time

class OpenGLRenderer(object):
    @property
    def mytime(self):
        return time.time()
If you don't care about properties, i.e. it's OK for you that mytime gets called only once (at module load time) and keeps returning the same timestamp, then it's possible to do it by copying all symbols from the object to the module:
# renderer.py
specificRenderer = OpenGLRenderer()

for name in dir(specificRenderer):
    globals()[name] = getattr(specificRenderer, name)
However, this is a one-time copy. If you later add methods or other attributes to specificRenderer dynamically, or change some attributes, they won't be automatically copied to the renderer module. This can be fixed, however, with some ugly __setattr__ hacking.
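The one-time nature of the copy is easy to see with a plain dict standing in for the module's globals() (the class and its method are made up for the demo):

```python
class OpenGLRendererDemo:  # stand-in renderer
    def describe(self):
        return "demo renderer"

specificRenderer = OpenGLRendererDemo()

# snapshot plays the role of the renderer module's globals()
snapshot = {}
for name in dir(specificRenderer):
    snapshot[name] = getattr(specificRenderer, name)

# copied attributes work...
assert snapshot["describe"]() == "demo renderer"

# ...but later additions to the object never reach the snapshot
specificRenderer.extra = 42
assert "extra" not in snapshot
```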
Edit: This answer does not do what the OP wants; it doesn't instantiate an object and then let calls to a module be redirected to that same object. This answer is about changing which rendering module is being used.
Easiest might be to import the OpenGLRenderer in the main.py program like this:
import OpenGLRenderer as renderer
That's code in just one place, and in the rest of your module OpenGLRenderer can be referred to as renderer.
If you have several modules like main.py, you could have your renderer.py file be just the same line:
import OpenGLRenderer as renderer
and then other modules can use
from renderer import renderer
If OpenGLRenderer doesn't quite quack right yet, you can monkeypatch it to work as you need in the renderer.py module.
