Python multiprocessing - mapping private method - python

Generally I'm aware of the pickle mechanism, but can't understand why this example:
from multiprocessing import Pool

class Foo:
    attr = 'a class attr'

    def __test(self, x):
        print(x, self.attr)

    def test2(self):
        with Pool(4) as p:
            p.map(self.__test, [1, 2, 3, 4])

if __name__ == '__main__':
    f = Foo()
    f.test2()
complains about the __test method?

return _ForkingPickler.loads(res)
AttributeError: 'Foo' object has no attribute '__test'

After changing def __test to def _test (one underscore), everything works fine. Am I missing some basic knowledge of pickling or "private" methods?

This appears to be a flaw in the name mangling magic. The actual name of a name-mangled private function incorporates the class name, so Foo.__test is actually named Foo._Foo__test, and other methods in the class just implicitly look up that name when they use self.__test.
Problem is, the magic extends to preserving the __name__ unmangled; Foo._Foo__test.__name__ is "__test". And pickle uses the __name__ to serialize the method. When it tries to deserialize it on the other end, it tries to look up plain __test, without applying the name mangling, so it can't find _Foo__test (the real name).
I don't think there is any immediate solution here aside from not using a private method directly (using it indirectly via another non-private method or global function would be fine); even if you try to pass self._Foo__test, it'll still pickle the unmangled name from __name__.
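Here is a minimal sketch of that indirect route; the wrapper name _test_wrapper is invented for illustration, but the rest mirrors the question's code:

from multiprocessing import Pool

class Foo:
    attr = 'a class attr'

    def __test(self, x):  # mangled to _Foo__test, but __name__ stays '__test'
        print(x, self.attr)

    def _test_wrapper(self, x):
        # Not name-mangled, so pickle can find it again by name.
        self.__test(x)

    def test2(self):
        with Pool(4) as p:
            p.map(self._test_wrapper, [1, 2, 3, 4])

if __name__ == '__main__':
    Foo().test2()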
The longer term solution would be to file a bug on the Python bug tracker; there may be a clever way to preserve the "friendly" __name__ while still allowing pickle to seamlessly mangle as needed.

Related

Pickle with a specific module name

I am using the pickle library to serialise a custom object, let's call it A, which is defined in a.py.
If I pickle an object of type A in a.py as follows:
import pickle

class A:
    ...

if __name__ == "__main__":
    inst = A("some param")
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)
Then there is a problem with loading this object from storage if the class A is not explicitly in the namespace __main__. See this other question. This is because pickle knows that it should look for the class A in __main__, since that's where it was when pickle.dump() happened.
Now, there are two approaches to dealing with this:
Deal with it at the deserialisation end,
Deal with it at the serialisation end.
For 1, there are various options (see the above link, for example), but I want to avoid these, because I think it makes sense to give the 'pickler' responsibility regarding its data.
For 2, we could just avoid pickling while the defining module is running as __main__, but that doesn't seem very flexible. We could alternatively modify A.__module__, and set it to the name of the module (as done here).
Pickle uses this __module__ variable to find where to import the class A from, so setting it before .dump() works:
if __name__ == "__main__":
    inst = A("some param")
    A.__module__ = 'a'
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)
Q: is this a good idea? It seems like it's implementation dependent, not interface dependent. That is, pickle could decide to use another method of locating modules to import, and this approach would break. Is there an alternative that uses pickle's interface?
Another way around it would be to import the file itself:
import pickle
import a

class A:
    pass

def main():
    inst = a.A()
    print(inst.__module__)
    with open("a.pickle", 'wb') as dumpfile:
        pickle.dump(inst, dumpfile)

if __name__ == "__main__":
    main()
Note that it works because the import statement is purely an assignment of the name a to a module object; it doesn't go into infinite recursion when you import a within a.py.
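For completeness, a sketch of the loading side under the same assumptions: because inst.__module__ is 'a', pickle imports module a and looks A up there, so this should work from any other script:

import pickle

with open("a.pickle", 'rb') as dumpfile:
    inst = pickle.load(dumpfile)
print(type(inst))  # <class 'a.A'>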

Idiomatic way to access a static method within the class in Python

I understood that static methods should always be referred to via the class name they belong to. But I see that they can also be accessed via the self keyword.
This is a bit confusing, and I don't see the interpreter throwing an error.
import unittest

class TestA(unittest.TestCase):
    @staticmethod
    def fun1():
        return True

    @staticmethod
    def fun2():
        return False

    def test_one(self):
        assert TestA.fun1() == True

    def test_two(self):
        assert self.fun2() == False

if __name__ == '__main__':
    unittest.main()
What is the right way to access a staticmethod: like TestA.fun1 above, which is clear to me, or as self.fun2, which is mildly concerning because no instance is actually passed to fun2?
Either way is acceptable, as described in the documentation:
It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class.
In some sense, the point of a staticmethod is to allow you to call the method without worrying about whether you're calling it on an instance or a class.
Either way "works", but if possible you should call using the instance. This allows it to work properly in the event that you subclass TestA. That is, it will properly find the implementation of the static method for the type of 'self' rather than the hard coded TestA. If you were in a context where you had a "cls" variable (such as a class method or perhaps factory function) you should call on that. You would only name the class object directly if you always want to call the TestA implementation, typically when you don't have a self or cls reference in the local scope.

static variables inside a python method

In a Python method, I would like to have a local variable whose value persists between calls to the method.
This question shows how to declare such "static variables" (C++ terminology) inside functions. I tried to do the same in an instance method, and failed.
Here's a working minimal example that reproduces the problem. You can copy-paste it into an interpreter.
class SomeClass(object):
    def some_method(self):
        if not hasattr(SomeClass.some_method, 'some_static_var'):
            SomeClass.some_method.some_static_var = 1  # breaks here
        for i in range(3):
            print SomeClass.some_method.some_static_var
            SomeClass.some_method.some_static_var += 1

if __name__ == '__main__':
    some_instance = SomeClass()
    some_instance.some_method()
On the line labeled "# breaks here", I get:
AttributeError: 'instancemethod' object has no attribute 'some_static_var'
I realize there's an easy workaround, where I make some_static_var a member variable of SomeClass. However, the variable really has no use outside of the method, so I'd much prefer to keep it from cluttering up SomeClass' namespace if I could.
In python 2, you have to deal with bound and unbound methods. These do not have a __dict__ attribute, like functions do:
#python 2
>>> '__dict__' in dir(SomeClass.some_method)
False
>>> def stuff():
...     pass
...
>>> '__dict__' in dir(stuff)
True
In python 3, your code works fine! The concept of bound/unbound methods is gone, everything is a function.
#python 3
>>> '__dict__' in dir(SomeClass.some_method)
True
Back to making your code work, you need to put the attribute on the thing which has a __dict__: the actual function:
if not hasattr(SomeClass.some_method.__func__, 'some_static_var'):
    # etc.
Read more on im_func and __func__ here
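Putting it together, a minimal Python 2 sketch of the fix, reusing the names from the question:

class SomeClass(object):
    def some_method(self):
        # __func__ reaches the underlying function, which does have a __dict__.
        func = SomeClass.some_method.__func__
        if not hasattr(func, 'some_static_var'):
            func.some_static_var = 1
        for i in range(3):
            print(func.some_static_var)
            func.some_static_var += 1

some_instance = SomeClass()
some_instance.some_method()  # prints 1, 2, 3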
It is up to you to decide whether this makes your code more or less readable - for me, making these types of things class attributes is almost always the way to go; it doesn't matter that only one method is accessing said attribute, it's where I look for "static" type vars. I value readable code over clean namespaces.
This last paragraph was of course an editorial, everyone is entitled to their opinion :-)
You can't set attributes on method objects.
Creating class attributes instead (that is, SomeClass.some_var = 1) is the standard Python way. However, we might be able to suggest more appropriate fixes if you give us a high-level overview of your actual problem (what are you writing this code for?).
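For instance, a minimal sketch of that standard way, reusing the question's names:

class SomeClass(object):
    some_var = 1  # class attribute: persists across calls, shared by instances

    def some_method(self):
        for i in range(3):
            print(SomeClass.some_var)
            SomeClass.some_var += 1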
Use the global keyword to access file-level variables
my_static = None

class MyClass(object):
    def some_method(self):
        global my_static
        if my_static is None:
            my_static = 0
        else:
            my_static = my_static + 1
        print my_static

if __name__ == '__main__':
    instance = MyClass()
    instance.some_method()
    instance.some_method()
Outputs:
0
1
Although, as mentioned elsewhere, a class variable would be preferable.

Why Is The property Decorator Only Defined For Classes?

tl;dr: How come property decorators work with class-level function definitions, but not with module-level definitions?
I was applying property decorators to some module-level functions, thinking they would allow me to invoke the methods by mere attribute lookup.
This was particularly tempting because I was defining a set of configuration functions, like get_port, get_hostname, etc., all of which could have been replaced with their simpler, more terse property counterparts: port, hostname, etc.
Thus, config.get_port() would just be the much nicer config.port.
I was surprised when I found the following traceback, proving that this was not a viable option:
TypeError: int() argument must be a string or a number, not 'property'
I knew I had seen some precedent for property-like functionality at module-level, as I had used it for scripting shell commands using the elegant but hacky pbs library.
The interesting hack below can be found in the pbs library source code. It enables the ability to do property-like attribute lookups at module-level, but it's horribly, horribly hackish.
# this is a thin wrapper around THIS module (we patch sys.modules[__name__]).
# this is in the case that the user does a "from pbs import whatever"
# in other words, they only want to import certain programs, not the whole
# system PATH worth of commands. in this case, we just proxy the
# import lookup to our Environment class
class SelfWrapper(ModuleType):
    def __init__(self, self_module):
        # this is super ugly to have to copy attributes like this,
        # but it seems to be the only way to make reload() behave
        # nicely. if i make these attributes dynamic lookups in
        # __getattr__, reload sometimes chokes in weird ways...
        for attr in ["__builtins__", "__doc__", "__name__", "__package__"]:
            setattr(self, attr, getattr(self_module, attr))

        self.self_module = self_module
        self.env = Environment(globals())

    def __getattr__(self, name):
        return self.env[name]
Below is the code for inserting this class into the import namespace. It actually patches sys.modules directly!
# we're being run as a stand-alone script, fire up a REPL
if __name__ == "__main__":
    globs = globals()
    f_globals = {}
    for k in ["__builtins__", "__doc__", "__name__", "__package__"]:
        f_globals[k] = globs[k]

    env = Environment(f_globals)
    run_repl(env)

# we're being imported from somewhere
else:
    self = sys.modules[__name__]
    sys.modules[__name__] = SelfWrapper(self)
Now that I've seen what lengths pbs has to go through, I'm left wondering why this facility of Python isn't built into the language directly. The property decorator in particular seems like a natural place to add such functionality.
Is there any particular reason or motivation for why this isn't built directly in?
This is related to a combination of two factors: first, that properties are implemented using the descriptor protocol, and second that modules are always instances of a particular class rather than being instantiable classes.
This part of the descriptor protocol is implemented in object.__getattribute__ (the relevant code is PyObject_GenericGetAttr starting at line 1319). The lookup rules go like this:
1. Search through the class mro for a type dictionary that has name
2. If the first matching item is a data descriptor, call its __get__ and return its result
3. If name is in the instance dictionary, return its associated value
4. If there was a matching item from the class dictionaries and it was a non-data descriptor, call its __get__ and return the result
5. If there was a matching item from the class dictionaries, return it
6. Raise AttributeError
The key to this is at number 3 - if name is found in the instance dictionary (as it will be with modules), then its value will just be returned - it won't be tested for descriptorness, and its __get__ won't be called. This leads to this situation (using Python 3):
>>> class F:
...     def __getattribute__(self, attr):
...         print('hi')
...         return object.__getattribute__(self, attr)
...
>>> f = F()
>>> f.blah = property(lambda: 5)
>>> f.blah
hi
<property object at 0xbfa1b0>
You can see that .__getattribute__ is being invoked, but isn't treating f.blah as a descriptor.
It is likely that the reason for the rules being structured this way is an explicit tradeoff between the usefulness of allowing descriptors on instances (and, therefore, in modules) and the extra code complexity that this would lead to.
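A small demonstration of that rule in action: the very same property object triggers __get__ when found on a class, but is returned verbatim when found in an instance dictionary (names here are illustrative):

class C:
    pass

C.x = property(lambda self: 5)
c = C()
print(c.x)  # 5 -- found in the class dict, so __get__ is called

c.__dict__['y'] = property(lambda self: 5)
print(c.y)  # <property object at ...> -- instance dict, no __get__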
Properties are a feature specific to classes (new-style classes specifically) so by extension the property decorator can only be applied to class methods.
A new-style class is one that derives from object, i.e. class Foo(object):
Further info: Can modules have properties the same way that objects can?

Use a class in the context of a different module

I want to modify some classes in the standard library to use a different set of globals from the ones that other classes in that module use.
Example
The following is for illustration only:
# module_a.py
my_global = []

class A:
    def __init__(self):
        my_global.append(self)

class B:
    def __init__(self):
        my_global.append(self)
In this example, if I create an instance of A via A(), it will call append on the object named by my_global. But now I wish to create a new module, import B into it, and have B use the my_global from the module it's been imported into, instead of the my_global from the module where B was originally defined.
# module_b.py
from module_a import B
my_global = []
Related
I'm struggling to explain my problem; here is my previous attempt, which did in fact ask something completely different:
Clone a module and make changes to the copy
Update0
The example above is only for illustration of what I'm trying to achieve.
Since there is no variable scope for classes (unlike, say, C++), I think a reference to a globals mapping is not stored in a class, but is instead attached to every function when it is defined.
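A quick way to see this, assuming the module_a.py from above is importable:

import module_a

# A function's __globals__ is the dict of the module where it was defined,
# no matter where it is imported or called from.
print(module_a.B.__init__.__globals__ is vars(module_a))  # True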
Update1
An example was requested from the standard library:
Many (maybe all?) of the classes in the threading module make use of globals such as _allocate_lock, get_ident, and _active, defined here and here. One cannot change these globals without changing it for all the classes in that module.
You can't change the globals without affecting all other users of the module, but what you sort of can do is create a private copy of the whole module.
I trust you are familiar with sys.modules, and that if you remove a module from there, Python forgets it was imported, but old objects referencing it will continue to do so. When imported again, a new copy of the module will be made.
A hacky solution to your problem would be something like this:
import sys
import threading
# Remove the original module, but keep it around
main_threading = sys.modules.pop('threading')
# Get a private copy of the module
import threading as private_threading
# Cover up evidence by restoring the original
sys.modules['threading'] = main_threading
# Modify the private copy
private_threading._allocate_lock = my_allocate_lock()
And now, private_threading.Lock has globals entirely separate from threading.Lock!
Needless to say, the module wasn't written with this in mind, and especially with a system module such as threading you might run into problems. For example, threading._active is supposed to contain all running threads, but with this solution, neither copy's _active will have them all. The code may also eat your socks and set your house on fire, etc. Test rigorously.
Okay, here's a proof-of-concept that shows how to do it. Note that it only goes one level deep -- properties and nested functions are not adjusted. To implement that, as well as make this more robust, each function's globals() should be compared to the globals() that should be replaced, and only make the substitution if they are the same.
from types import FunctionType

def migrate_class(cls, globals):
    """Recreates a class substituting the passed-in globals for the
    globals already in the existing class. This proof-of-concept
    version only goes one level deep (i.e. properties and other nested
    functions are not changed)."""
    cls_name = cls.__name__  # saved up front so the loops below don't clobber it
    bases = cls.__bases__
    new_dict = dict()
    if hasattr(cls, '__slots__'):
        new_dict['__slots__'] = cls.__slots__
        for name in cls.__slots__:
            if hasattr(cls, name):
                attr = getattr(cls, name)
                if callable(attr):
                    func_code = attr.__code__
                    attr = FunctionType(func_code, globals)
                new_dict[name] = attr
    if hasattr(cls, '__dict__'):
        od = getattr(cls, '__dict__')
        for name, attr in od.items():
            if callable(attr):
                closure = attr.__closure__
                defaults = attr.__defaults__
                kwdefaults = attr.__kwdefaults__
                func_code = attr.__code__
                attr = FunctionType(func_code, globals, name, defaults, closure)
                if kwdefaults:
                    attr.__kwdefaults__ = kwdefaults
            new_dict[name] = attr
    return type(cls_name, bases, new_dict)
After having gone through this exercise, I am really curious as to why you need to do this?
"One cannot change these globals without changing it for all the classes in that module." That's the root of the problem isn't it, and a good explanation of the problem with global variables in general. The use of globals in threading tethers its classes to those global objects.
By the time you jerry-rig something to find and monkey patch each use of a global variable within an individual class from the module, are you any further ahead of just reimplementing the code for your own use?
The only work around that "might" be of use in your situation is something like mock. Mock's patch decorators/context managers (or something similar) could be used to swap out a global variable for the life-time of a given object. It works well within the very controlled context of unit testing, but in any other circumstances I wouldn't recommend it and would think about just reimplementing the code to suit my needs.
Globals are bad for exactly this reason, as I am sure you know well enough.
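For what it's worth, a hedged sketch of that mock-based swap, using the question's module_a example:

from unittest import mock  # the standalone 'mock' package on Python 2
import module_a

registry = []
with mock.patch.object(module_a, 'my_global', registry):
    module_a.B()  # B's global lookup now lands in our registry
print(registry)   # [<module_a.B object at ...>]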
I'd try to reimplement A and B (maybe by subclassing them) in my own module, with all references to my_global replaced by an injected dependency, which I'll call registry here:
class A(orig.A):
    def __init__(self, registry):
        self.registry = registry
        self.registry.append(self)
    # more updated methods
If you are creating all instances of A yourself you are pretty much done. You might want to create a factory which hides away the new init parameter.
my_registry = []

def A_in_my_registry():
    return A(my_registry)
If foreign code creates orig.A instances for you, and you would rather have new A instances, you have to hope the foreign code is customizable with factories. If not, derive from the foreign classes and update them to use (newly injected) A factories instead. ... And rinse and repeat for the creation of those updated classes. I realize this can be tedious to almost impossible depending on the complexity of the foreign code, but most std libs are quite flat.
--
Edit: Monkey patch std lib code.
If you don't mind monkey patching std lib code, you could also try to modify the original classes to work with a redirection level which defaults to the original globals, but is customizable per instance:
import orig

class A(orig.A):
    def __init__(self, registry=orig.my_globals):
        self.registry = registry
        self.registry.append(self)
    # more updated methods

orig.A = A
As before, you will need to control the creation of the A instances that should use non-standard globals, but you won't have different A classes around as long as you monkey patch early enough.
If you use Python 3, you can subclass B and redefine the __globals__ attribute of the __init__ method like this:
from module_a import B

function = type(lambda: 0)  # similar to 'from types import FunctionType as function', but faster

my_global = []

class My_B(B):
    __init__ = function(B.__init__.__code__, globals(), '__init__',
                        B.__init__.__defaults__, B.__init__.__closure__)
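A usage sketch under the same assumptions (the module_a.py defined at the top of the question):

# My_B.__init__ now resolves its globals in *this* module, so new
# instances land in our my_global rather than module_a's.
My_B()
print(my_global)  # [<My_B object at ...>]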
IMHO it is not possible to override global variables...
Globals are rarely a good idea.
Implicit variables are rarely a good idea.
An implicitly-used global is easy to indict as also "rarely good".
Additionally, you don't want A.__init__() doing anything "class-level" like updating some mysterious collection that exists for the class as a whole. That's often a bad idea.
Rather than mess with an implicit class-level collection, you want a Factory in module_a that (1) creates A or B instances and (2) updates an explicit collection.
You can then use this factory in module_b, except with a different collection.
This can promote testability by exposing an implicit dependency.
module_a.py

class Factory(object):
    def __init__(self, collection):
        self.collection = collection

    def make(self, name, *args, **kw):
        obj = eval(name)(*args, **kw)
        self.collection.append(obj)
        return obj

module_collection = []
factory = Factory(module_collection)
module_b.py

import module_a

module_collection = []
factory = module_a.Factory(module_collection)
Now a client can do this:

import module_b

a = module_b.factory.make("A")
b = module_b.factory.make("B")
print(module_b.module_collection)
You can make the API a bit more fluent by making the factory callable (implementing __call__ instead of make).
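A hedged sketch of that variant; note that this version takes the class object directly, which also avoids the eval() in make:

class Factory(object):
    def __init__(self, collection):
        self.collection = collection

    def __call__(self, cls, *args, **kw):
        obj = cls(*args, **kw)
        self.collection.append(obj)
        return obj

registry = []
make_tracked = Factory(registry)  # make_tracked(A) creates and records an A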
The point is to make the collection explicit via a factory class.
