(You may read this question for some background)
I would like to have a gracefully-degrading way to pickle objects in Python.
When pickling an object, let's call it the main object, sometimes the Pickler raises an exception because it can't pickle a certain sub-object of the main object. For example, an error I've been getting a lot is "can’t pickle module objects." That is because I am referencing a module from the main object.
I know I can write up a little something to replace that module with a facade that would contain the module's attributes, but that would have its own issues(1).
So what I would like is a pickling function that automatically replaces modules (and any other hard-to-pickle objects) with facades that contain their attributes. That may not produce a perfect pickling, but in many cases it would be sufficient.
Is there anything like this? Does anyone have an idea how to approach this?
(1) One issue would be that the module may be referencing other modules from within it.
You can decide and implement how any previously-unpicklable type gets pickled and unpickled: see standard library module copy_reg (renamed to copyreg in Python 3.*).
Essentially, you need to provide a function which, given an instance of the type, reduces it to a tuple -- with the same protocol as the __reduce__ special method (except that the __reduce__ special method takes no arguments, since when provided it's called directly on the object, while the function you provide will take the object as the only argument).
Typically, the tuple you return has 2 items: a callable, and a tuple of arguments to pass to it. The callable must be registered as a "safe constructor" or equivalently have an attribute __safe_for_unpickling__ with a true value. Those items will be pickled, and at unpickling time the callable will be called with the given arguments and must return the unpickled object.
For example, suppose that you want to just pickle modules by name, so that unpickling them just means re-importing them (i.e. suppose for simplicity that you don't care about dynamically modified modules, nested packages, etc, just plain top-level modules). Then:
>>> import sys, pickle, copy_reg
>>> def savemodule(module):
...     return __import__, (module.__name__,)
...
>>> copy_reg.pickle(type(sys), savemodule)
>>> s = pickle.dumps(sys)
>>> s
"c__builtin__\n__import__\np0\n(S'sys'\np1\ntp2\nRp3\n."
>>> z = pickle.loads(s)
>>> z
<module 'sys' (built-in)>
I'm using the old-fashioned ASCII form of pickle so that s, the string containing the pickle, is easy to examine: it instructs unpickling to call the built-in __import__ function, with the string sys as its sole argument. And z shows that this does indeed give us back the built-in sys module as the result of the unpickling, as desired.
Now, you'll have to make things a bit more complex than just __import__ (you'll have to deal with saving and restoring dynamic changes, navigate a nested namespace, etc), and thus you'll have to also call copy_reg.constructor (passing as argument your own function that performs this work) before you register, via copy_reg.pickle, the module-saving function that returns your other function (and, if in a separate run, also before you unpickle those pickles you made using said function). But I hope this simple case helps to show that there's really nothing much to it that's at all "intrinsically" complicated!-)
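For readers on Python 3, where copy_reg is named copyreg, the same idea can be sketched with a per-pickler dispatch table instead of a global registration. This is only a minimal sketch under assumptions: the names reduce_module, ModuleAwarePickler and dumps_with_modules are hypothetical, and it assumes a module can simply be re-imported by name when unpickling. It uses importlib.import_module rather than __import__ so that a dotted name such as json.decoder comes back as the submodule itself.

```python
import copyreg
import importlib
import io
import json.decoder
import pickle
import types


def reduce_module(module):
    # Same protocol as __reduce__: a callable plus the arguments to call it with.
    return importlib.import_module, (module.__name__,)


class ModuleAwarePickler(pickle.Pickler):
    # Hypothetical subclass: it works on a private copy of the global table,
    # so the module handler does not affect other pickle.dumps() calls.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[types.ModuleType] = reduce_module


def dumps_with_modules(obj):
    buffer = io.BytesIO()
    ModuleAwarePickler(buffer).dump(obj)
    return buffer.getvalue()


# The "main object" references a module, which plain pickle would reject.
main_object = {"label": "main", "helper": json.decoder}
restored = pickle.loads(dumps_with_modules(main_object))
print(restored["helper"] is json.decoder)
```

Because the pickle only stores the module name, any state that was patched into the module at runtime is not carried over; that would need the more elaborate constructor mentioned above.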
How about the following, which is a wrapper you can use to wrap some modules (maybe any module) in something that's pickle-able. You could then subclass the Pickler object to check if the target object is a module, and if so, wrap it. Does this accomplish what you desire?
import pickle

class PickleableModuleWrapper(object):
    def __init__(self, module):
        # make a copy of the module's namespace in this instance
        self.__dict__ = dict(module.__dict__)
        # remove anything that's going to give us trouble during pickling
        self.remove_unpickleable_attributes()

    def remove_unpickleable_attributes(self):
        # iterate over a snapshot, since entries are deleted along the way
        for name, value in list(self.__dict__.items()):
            try:
                pickle.dumps(value)
            except Exception:
                del self.__dict__[name]

p = pickle.dumps(PickleableModuleWrapper(pickle))
wrapped_mod = pickle.loads(p)
Hmmm, something like this?
import sys
for name in dir(someobject):
    attrib = getattr(someobject, name)  # dir() returns names, so fetch the value
    if type(attrib) == type(sys):  # is a module
        # put in a facade: either recursively list the module and do the same thing,
        # or just put in something like str('modulename_module')
        pass
    else:
        # proceed with normal pickle
        pass
Obviously, this would go into an extension of the pickle class with a reimplemented dump method...
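One way this "extension of the pickle class" might look on Python 3.8 or later is a Pickler subclass that overrides reducer_override. The sketch below is an assumption-laden illustration, not a complete solution: DegradingPickler, is_picklable and dumps_degrading are hypothetical names, the facade is a plain types.SimpleNamespace, and attributes that the standard pickler rejects (including nested modules) are silently dropped.

```python
import io
import pickle
import threading
import types


def is_picklable(value):
    # Probe with the standard pickler; anything it rejects is left out.
    try:
        pickle.dumps(value)
        return True
    except Exception:
        return False


class DegradingPickler(pickle.Pickler):
    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            # Keep only the attributes the standard pickler accepts.
            attrs = {k: v for k, v in vars(obj).items() if is_picklable(v)}
            # Rebuild as a SimpleNamespace whose __dict__ is filled from attrs.
            return types.SimpleNamespace, (), attrs
        return NotImplemented  # fall back to normal pickling


def dumps_degrading(obj):
    buffer = io.BytesIO()
    DegradingPickler(buffer).dump(obj)
    return buffer.getvalue()


# A stand-in module built in memory so the sketch is self-contained.
demo = types.ModuleType("demo_module")
demo.answer = 42
demo.lock = threading.Lock()  # not picklable, so it is dropped

facade = pickle.loads(dumps_degrading({"dep": demo}))["dep"]
print(facade.answer, hasattr(facade, "lock"))
```

The unpickled facade is a snapshot of attribute values, not the module itself, so functions and classes inside it still resolve by reference to the real module when loaded.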
I have a challenge where I'm given a function where I can pass only a single argument which must be a builtin (no modules of any kind), for example chr or IndexError and use its attributes and call its functions to get access to other builtin types.
For example, if I choose the getattr function, I can access the builtins like this:
def main(a):
    builtins = a(a, '__self__')
main(getattr)
Most other functions aren't of much help for my challenge. I know that the attributes are deep and a lot of information can be extracted.
This is a good reference: https://book.hacktricks.xyz/misc/basic-python/bypass-python-sandboxes
What can I get access to using an Ellipsis object, in Python written as ... ?
Subclasses can be accessed using ....__class__.__base__.__subclasses__(), which returns a list. From there I could loop over the list to find the class whose __name__ attribute is catch_warnings; that class's _module attribute gives access to all the builtins. I cannot rely on a fixed position in that list, because the index at which the class appears is always different.
The python version I target is 3.9.
globalEx1.py:
globals()['a']='100'
def setvalue(val):
    globals()['a'] = val
globalEx2.py:
from globalEx1 import *
print a
setvalue('200')
print a
On executing globalEx2.py:
Output:
100
100
How can I change the value of globals()['a'] using a function, so that the change is reflected across the .py files?
Each module has its own globals. Python is behaving exactly as expected. Updating globalEx1's a to point to something else isn't going to affect where globalEx2's a is pointing.
There are various ways around this, depending on exactly what you want.
1. re-import a after the setvalue() call
2. return a and assign it, like a = setvalue().
3. import globalEx1 and use globalEx1.a instead of a. (Or use import globalEx1 as and a shorter name.)
4. pass globalEx2's globals() as an argument to setvalue and set the value on that instead.
5. make a a mutable object containing your value, like a list, dict or types.SimpleNamespace, and mutate it in setvalue.
6. use inspect inside setvalue to get the caller's globals from its stack frame. (Convenient, but brittle.)
The last option looks suitable for me. It will do the job with minimal code change, but can I update the globals of multiple modules the same way, or does it only give me the caller's globals?
Option 6 is actually the riskiest. The caller itself basically becomes a hidden parameter to the function, so something like a decorator from another module can break it without warning. Option 4 just makes that hidden parameter explicit, so it's not so brittle.
If you need this to work across more than two modules, option 6 isn't good enough, since it only gives you the current call stack. Option 3 is probably the most reliable for what you seem to be trying to do.
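The difference between a copied name and an attribute looked up on the module can be sketched in a single file. To stay self-contained, the sketch builds a stand-in for globalEx1.py in memory and registers it in sys.modules; that setup is an assumption made for the demonstration, not something the question does.

```python
import sys
import types

# In-memory stand-in for globalEx1.py (setup for this sketch only).
source = "a = '100'\ndef setvalue(val):\n    global a\n    a = val\n"
globalEx1 = types.ModuleType("globalEx1")
exec(source, globalEx1.__dict__)
sys.modules["globalEx1"] = globalEx1

# Copies the current bindings into this namespace, like "import *" does.
from globalEx1 import a, setvalue

setvalue('200')
print(a)             # 100: the local name still points at the old string
print(globalEx1.a)   # 200: attribute access reads the module's current global
```

Any number of modules that read globalEx1.a this way see the update, which is why the module-attribute approach scales beyond two files.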
How does option 1 work? I mean, is it about running "from globalEx1 import *" again? I ask because I have many variables like 'a'.
A module becomes an object when imported the first time and it's saved in the sys.modules cache, so importing it again doesn't execute the module again. A from ... import (even with the *) just gets attributes from that module object and adds them to the local scope (which is the module globals if done at the top level, that is, outside of any definition.)
The module object's __dict__ is basically its globals, so any function that alters the module's globals will affect the resulting module object's attrs, even if it's done after the module was imported.
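A quick way to check these two claims is shown below, using the standard json module purely as a convenient example.

```python
import importlib
import json
import sys

# The first import stored the module object in the sys.modules cache.
print(sys.modules["json"] is json)

# Importing again returns the cached object instead of re-running the module.
print(importlib.import_module("json") is json)

# A function's globals are the very same dict as its module's __dict__.
print(json.loads.__globals__ is vars(json))
```

All three checks compare object identity, so they show that there is a single shared namespace per module rather than copies.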
We cannot do 'from globalEx1 import *' inside a Python function. Is there any alternative to this?
The star syntax is only allowed at the top level. But remember that it's just reading attributes from the module object. So you can get a dict of all the module attributes like
return vars(globalEx1)
This will give you more than * would. The star import skips names that begin with an _ by default, or, if the module defines __all__, imports only the names listed there. You can filter the resulting dict with a dict comprehension, and even .update() the globals dict for some other module with the result.
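A minimal sketch of that filtering might look like the following. The helper name star_import_names is hypothetical, and the demo module is built in memory only to keep the example self-contained.

```python
import types


def star_import_names(module):
    # Mirror "from module import *": honour __all__ when it exists,
    # otherwise take every name that does not start with an underscore.
    public = getattr(module, "__all__", None)
    if public is None:
        public = [name for name in vars(module) if not name.startswith("_")]
    return {name: getattr(module, name) for name in public}


demo = types.ModuleType("demo_module")
demo.a = "100"
demo._hidden = "skipped"
print(star_import_names(demo))   # {'a': '100'}

demo.__all__ = ["_hidden"]
print(star_import_names(demo))   # {'_hidden': 'skipped'}
```

The returned dict can then be passed to globals().update(...) in whichever module needs the refreshed names.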
But rather than re-implementing this filtering logic, you could just use exec to make it the top level. Then the only weird key you'd get is __builtins__
namespace = {}
exec('from globalEx1 import *', namespace)
del namespace['__builtins__']
return namespace
Then you can globals().update(namespace) or whatever.
Using exec like this is probably considered bad form, but then so is import * to begin with, honestly.
This is an interesting problem, related to the fact that strings are immutable. The line from globalEx1 import * creates two references in the globalEx2 module: a and setvalue. globalEx2.a initially refers to the same string object as globalEx1.a, since that's how imports work.
However, once you call setvalue, which operates on the globals of globalEx1, the value referenced by globalEx1.a is replaced by another string object. Since strings are immutable, there is no way to do this in place. The value of globalEx2.a remains bound to the original string object, as it should.
You have a couple of workarounds available here. The most pythonic is to fix the import in globalEx2:
import globalEx1
print globalEx1.a
globalEx1.setvalue('200')
print globalEx1.a
Another option would be to use a mutable container for a, and access that:
globalEx1.py:
globals()['a']=['100']
def setvalue(val):
    globals()['a'][0] = val
globalEx2.py:
from globalEx1 import *
print a[0]
setvalue('200')
print a[0]
A third, and wilder option, is to make globalEx2's setvalue a copy of the original function, but with its __globals__ attribute set to the namespace of globalEx2 instead of globalEx1:
from functools import update_wrapper
from types import FunctionType
from globalEx1 import *
# rebuild setvalue from the same code object, bound to this module's globals
_setvalue = FunctionType(setvalue.__code__, globals(), name=setvalue.__name__,
                         argdefs=setvalue.__defaults__,
                         closure=setvalue.__closure__)
_setvalue = update_wrapper(_setvalue, setvalue)
_setvalue.__kwdefaults__ = setvalue.__kwdefaults__
setvalue = _setvalue
del _setvalue
print(a)
...
The reason you have to make the copy is that __globals__ is a read-only attribute, and also you don't want to mess with the function in globalEx1. See https://stackoverflow.com/a/13503277/2988730.
Globals are imported only once at the beginning with the import statement. Thus, if the global is an immutable object like str, int, etc, any update will not be reflected. However, if the global is a mutable object like list, etc, updates will be reflected. For example,
globalEx1.py:
globals()['a']=[100]
def setvalue(val):
    globals()['a'][0] = val
The output will be changed as expected:
[100]
[200]
Aside
It's easier to define globals like normal variables:
a = [100]
def setvalue(value):
    a[0] = value
Or when editing value of immutable objects:
a = 100
def setvalue(value):
    global a
    a = value
I am trying to save a dictionary that contains a lambda function in django.core.cache. The example below fails silently.
from django.core.cache import cache
cache.set("lambda", {"name": "lambda function", "function":lambda x: x+1})
cache.get("lambda")
#None
I am looking for an explanation for this behaviour. Also, I would like to know if there is a workaround without using def.
"The example below fails silently."
No, it doesn't. The cache.set() call should give you an error like:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Why? Internally, Django is using Python's pickle library to serialize the value you are attempting to store in cache. When you want to pull it out of cache again with your cache.get() call, Django needs to know exactly how to reconstruct the cached value. And due to this desire not to lose information or incorrectly/improperly reconstruct a cached value, there are several restrictions on what kinds of objects can be pickled. You'll note that only these types of functions may be pickled:
functions defined at the top level of a module
built-in functions defined at the top level of a module
And there is this further explanation about how pickling functions works:
Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
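As for a workaround without def, one option is to build the callable from pieces that pickle can already find by name. The sketch below uses plain pickle rather than Django's cache so that it runs anywhere; it assumes the cache backend pickles values in the same way, and the choice of functools.partial with operator.add is just one possible substitute for this particular lambda.

```python
import functools
import operator
import pickle

# A lambda has no importable name, so pickling the dict fails.
entry = {"name": "lambda function", "function": lambda x: x + 1}
try:
    pickle.dumps(entry)
    lambda_pickled = True
except Exception:
    lambda_pickled = False
print(lambda_pickled)

# partial(operator.add, 1) behaves like "lambda x: 1 + x" and is picklable,
# because operator.add can be looked up by its qualified name.
entry = {"name": "add one", "function": functools.partial(operator.add, 1)}
restored = pickle.loads(pickle.dumps(entry))
print(restored["function"](2))
```

This only covers callables that can be composed from importable functions; anything more involved still needs a named, module-level function.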
I have a situation where there's a complex object that can be referenced by a unique name like package.subpackage.MYOBJECT. While it's possible to pickle this object using the standard pickle algorithm, the resulting data string will be very big.
I'm looking for a way to get the same pickling semantics for an object that already exist for classes and functions: Python's pickle just dumps their fully qualified names, not their code. This way only a string like package.subpackage.MYOBJECT would be dumped, and upon unpickling the object would be imported, just as happens for functions and classes.
It seems that this task boils down to making the object aware of the variable name it's bound to, but I have no clue how to do that.
Here's short example to explain myself clearly (obvious imports are skipped).
File bigpackage/bigclasses/models.py:
class SomeInterface():
    __metaclass__ = ABCMeta
    @abstractmethod
    def operation(self):
        pass
class ImplementationA(SomeInterface):
    def operation(self):
        print "ImplementationA"
class ImplementationB(SomeInterface):
    def operation(self):
        print "ImplementationB"
IMPL_A = ImplementationA()
IMPL_B = ImplementationB()
File bigpackage/bigclasses/tasks.py:
@celery.task
def background_task(impl, somearg):
    assert isinstance(impl, SomeInterface)
    impl.operation()
    print somearg
File bigpackage/bigclasses/work.py:
from bigpackage.bigclasses.models import IMPL_A, IMPL_B
from bigpackage.bigclasses.tasks import background_task
background_task.submit(IMPL_A, "arg1")
background_task.submit(IMPL_B, "arg2")
Here I have a trivial background Celery task that accepts one of two available implementations of SomeInterface as an argument. The task's arguments are pickled by Celery, passed to a queue and executed on some worker server that runs exactly the same code base. My idea is to avoid deep pickling of IMPL_A and IMPL_B and instead pass them as bigpackage.bigclasses.models.IMPL_A and bigpackage.bigclasses.models.IMPL_B respectively. That would help with performance and total traffic for the queue server, and would also provide some safety against changes in IMPL_A and IMPL_B that make them non-pickleable (for example, a lambda anywhere in the object's attribute hierarchy).
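One possible approach, sketched here for Python 3.9 or later, is to have each instance carry its own qualified name and return it from __reduce__, so that unpickling just looks the object up again. Everything named below is an assumption for illustration: NamedSingleton is a hypothetical base class, the name is passed in explicitly rather than discovered, and demo_models is an in-memory stand-in for bigpackage.bigclasses.models.

```python
import pickle
import pkgutil
import sys
import types


class NamedSingleton:
    """Hypothetical base: instances pickle as a reference to a global name."""

    def __init__(self, qualified_name):
        # "package.module:ATTRIBUTE", the form pkgutil.resolve_name accepts.
        self._qualified_name = qualified_name

    def __reduce__(self):
        # Only this short string is stored; the loader imports the module
        # and fetches the attribute instead of rebuilding the object.
        return pkgutil.resolve_name, (self._qualified_name,)


# In-memory stand-in for the real models module (setup for this sketch only).
models = types.ModuleType("demo_models")
sys.modules["demo_models"] = models
models.IMPL_A = NamedSingleton("demo_models:IMPL_A")

data = pickle.dumps(models.IMPL_A)
print(pickle.loads(data) is models.IMPL_A)
```

The pickle documentation also allows __reduce__ to return a plain string naming a module-level global, which achieves the same effect when the object lives in the module where its class is defined.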
I'm trying to mimic methods.grep from Ruby which simply returns a list of available methods for any object (class or instance) called upon, filtered by regexp pattern passed to grep.
Very handy for investigating objects in an interactive prompt.
def methods_grep(self, pattern):
    """ returns list of object's method by a regexp pattern """
    from re import search
    return [meth_name for meth_name in dir(self)
            if search(pattern, meth_name)]
Because of a Python limitation that is not quite clear to me, it unfortunately can't simply be inserted into the object class, the common ancestor:
object.mgrep = classmethod(methods_grep)
# TypeError: can't set attributes of built-in/extension type 'object'
Is there some workaround to inject this into all classes, or do I have to stick with a global function like dir?
There is a module called forbiddenfruit that enables you to patch built-in objects. It also allows you to reverse the changes. You can find it here https://pypi.python.org/pypi/forbiddenfruit/0.1.1
from forbiddenfruit import curse
curse(object, "methods_grep", classmethod(methods_grep))
Of course, using this in production code is likely a bad idea.
There is no workaround AFAIK. I find it quite annoying that you can't alter built-in classes. Personal opinion though.
One way would be to create a base object and force all your objects to inherit from it.
But I don't see the problem to be honest. You can simply use methods_grep(object, pattern), right? You don't have to insert it anywhere.
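The base-object suggestion could be sketched as a small mixin. GrepMixin and the Account class are hypothetical names used only for illustration, and this only helps for classes you control, not for built-ins.

```python
import re


class GrepMixin:
    @classmethod
    def mgrep(cls, pattern):
        # dir(cls) lists class-level names; instance-only attributes are not included.
        return [name for name in dir(cls) if re.search(pattern, name)]


class Account(GrepMixin):
    def deposit(self, amount):
        pass

    def withdraw(self, amount):
        pass


print(Account.mgrep("^(dep|with)"))   # ['deposit', 'withdraw']
```

Because mgrep is a classmethod, it can be called on the class or on an instance with the same result.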