Python multiprocessing initialising the Pool

Python multiprocessing initialising the Pool - python

I have a large read-only object that I would like the child processes to use but unfortunately this object cannot be pickled. Given that it's read-only I thought about declaring it as a global and then using an initializing function in the Pool where I perform the necessary copying. My code is something like:
def f(processes, args):
global pool
pool = multiprocessing.Pool(processes,setGlobal,[args])
def setGlobal(args):
# global object to be used by the child processes...
global obj
obj = copy.deepcopy(args)
The function setGlobal performs the initialization. My first question concerns the arguments to setGlobal (which are passed as a list). Do these need to be pickle-able? The errors I'm getting seem to suggest that they do. If so, how can I make the unpickle-able read-only object visible to my child processes?

Related

Python garbage collection and singletons classes

I have a singleton class and I do not understand how the Python garbage collector is not removing the instance.
I'm using - from singleton_decorator import singleton
example of my class:
from singleton_decorator import singleton
#singleton
class FilesRetriever:
def __init__(self, testing_mode: bool = False):
self.testing_mode = testing_mode
test example:
def test_singletone(self):
FilesRetriever(testing_mode=True)
mode = FilesRetriever().testing_mode
print("mode 1:" + str(mode))
mode = FilesRetriever().testing_mode
print("mode 2:" + str(mode))
count_before = gc.get_count()
gc.collect()
count_after = gc.get_count()
mode = FilesRetriever().testing_mode
print("mode 3:" + str(mode))
print("count_before:" + str(count_before))
print("count_after:" + str(count_after))
test output:
mode 1:True
mode 2:True
mode 3:True
count_before:(306, 10, 5)
count_after:(0, 0, 0)
I would expect after the garbage collector runs automatically or after I ran it in my test that the instance of _SingletonWrapper (the class in the decorator implementation) will be removed because nothing is pointing to it. and then the value of "print("mode 3:" + str(mode))" will be False because that is the default (and the instance was re-created)

So the code and garbage collection is working as intended. Look at the code for the singleton decorator you are referring to. decorator
Just because you call gc.collect() and your code isn't holding a reference somewhere doesn't mean other code isn't.
In the decorator it creates an instance then stores that instance in a variable within the decorator. So even though you collected relative to your code. Their code is still holding a reference to that instance and so it doesn't get collected.
This would be expected behavior from a singleton since that is its whole purpose. Is to store an instance somewhere that can be retrieved and used instead of creating a new instance every time. So you wouldn't want that instance to be trashed unless you need to replace the instance.
To answer your comment
No, you are not getting the instance to the _SingletonWrapper. When you write FileRetriever() what you're actually doing is invoking the __call__ method of the instance of _SingletonWrapper. When you use #singleton() that returns an instance not the class object.
Again while in your code you are not storing it anywhere doesn't mean it isn't stored some where else. When you define a class what you are doing in a sense is in the global scope of the module it is creating a variable that holds that class definition. So in you code in the global scope it has something like this,
FileRetriever = (class:
def __init__(self):
blahblahblah
So now your class definition is stored in a variable call FileRetriever.
So now you're using a decorator, so now it looks like this based on the code in the single decorator.
FileRetriever = _SingletonWrapper(class: blahblah)
Now you're class is wrapped and stored in the variable FileRetriever.
Now you invoke the _SingletonWrapper.__call__() when you run FileRetriever().
Because __call__ is an instance method. It can hold a reference to you're original class and instance of the class you declared and so even if you remove all you're references to that class this code is still holding that reference.
If you truly want to remove all references to you're singleton which I'm not sure why you would want to. You need to remove all references to the wrapper as well as you're class. So something like FileRetriever = None might cause the gc to collect it. But you would lose you're original class definition in the process.

Multiprocessing pool with parameters that cannot be serialized

The pool function of multiprocessing pickles all the parameters that are passed into it and then recreates them in the pool.
In my example, I have some parameters that cannot be pickeled (they are c++ objects) and they take a lot of time to create.
Is there any way I can pass those parameters into the pool without having to make them serializable?

multiprocessing.Pool allows to pass an initializer function which is executed every time a new worker process is spawned.
You can use this function to initialize you C++ objects. Each process will have its own copy.
from multiprocessing import Pool
parameters = None
def initializer():
global parameters
parameters = initialize_cpluplus_objects()
def function():
parameters.do_something()
pool = Pool(initializer=initializer_function)
pool.apply_async(function)

Parallel processing loop using multiprocessing Pool

I want to process a large for loop in parallel, and from what I have read the best way to do this is to use the multiprocessing library that comes standard with Python.
I have a list of around 40,000 objects, and I want to process them in parallel in a separate class. The reason for doing this in a separate class is mainly because of what I read here.
In one class I have all the objects in a list and via the multiprocessing.Pool and Pool.map functions I want to carry out parallel computations for each object by making it go through another class and return a value.
# ... some class that generates the list_objects
pool = multiprocessing.Pool(4)
results = pool.map(Parallel, self.list_objects)
And then I have a class which I want to process each object passed by the pool.map function:
class Parallel(object):
def __init__(self, args):
self.some_variable = args[0]
self.some_other_variable = args[1]
self.yet_another_variable = args[2]
self.result = None
def __call__(self):
self.result = self.calculate(self.some_variable)
The reason I have a call method is due to the post I linked before, yet I'm not sure I'm using it correctly as it seems to have no effect. I'm not getting the self.result value to be generated.
Any suggestions?
Thanks!

Use a plain function, not a class, when possible. Use a class only when there is a clear advantage to doing so.
If you really need to use a class, then given your setup, pass an instance of Parallel:
results = pool.map(Parallel(args), self.list_objects)
Since the instance has a __call__ method, the instance itself is callable, like a function.
By the way, the __call__ needs to accept an additional argument:
def __call__(self, val):
since pool.map is essentially going to call in parallel
p = Parallel(args)
result = []
for val in self.list_objects:
result.append(p(val))

Pool.map simply applies a function (actually, a callable) in parallel. It has no notion of objects or classes. Since you pass it a class, it simply calls __init__ - __call__ is never executed. You need to either call it explicitly from __init__ or use pool.map(Parallel.__call__, preinitialized_objects)

If I have an #staticmethod in my python webapp, do I need to protect it with threading.RLock()?

I have a webapp built in python running off the paste server. If I have declared an #staticmethod that assigns state to method-scoped variables do I have to protect it with eg threading.RLock() (or is there a better way) in order to prevent multiple HTTP requests (I'm assuming paste as a server contains some sort of threadpool for serving incoming requests) from interfering with eachother's state?
I should point out I am using Grok as my framework.
so -
#staticmethod
def doSomeStuff():
abc = 1
...some code...
abc = 5
given the above, is it thread safe inside grok/paste between threads (again, assuming requests are dealt with in threads?)

Local variables are created for each method call separately, no matter if it's static method, class method, non-static method or a stand-alone function, the same way as in Java. Unless you copy the references to those objects somewhere outside explicitly, so that they survive the method and can be accessed from other threads, you don't have to lock anything.
For example, this is safe unless CoolClass uses any shared state between instances:
def my_safe_method(*args):
my_cool_object = CoolClass()
my_cool_object.populate_from_stuff(*args)
return my_cool_object.result()
This is likely to be unsafe since the object reference may be shared between the threads (depends on what get_cool_inst does):
def my_suspicious_method(*args):
my_cool_object = somewhere.get_cool_inst()
my_cool_object.populate_from_stuff(*args)
# another thread received the same instance
# and modified it
# (my_cool_object is still local, but it's a reference to a shared object)
return my_cool_object.result()
This can be unsafe too, if publish shares the reference:
def my_suspicious_method(*args):
my_cool_object = CoolClass()
# puts somewhere into global namespace, other threads access it
publish(my_cool_object)
my_cool_object.prepare(*args)
# another thread modifies it now
return my_cool_object.result()
EDIT: The code sample you've provided is completely thread safe, #staticmethod didn't change anything in that respect.

thread Locking/unlocking in constructor/destructor in python

I have a class that is only ever accessed externally through static methods. Those static methods then create an object of the class to use within the method, then they return and the object is presumably destroyed. The class is a getter/setter for a couple config files and now I need to place thread locks on the access to the config files.
Since I have several different static methods that all need read/write access to the config files that all create objects in the scope of the method, I was thinking of having my lock acquires done inside of the object constructor, and then releasing in the destructor.
My coworker expressed concern that it seems like that could potentially leave the class locked forever if something happened. And he also mentioned something about how the destructor in python was called in regards to the garbage collector, but we're both relatively new to python so that's an unknown.
Is this a reasonable solution or should I just lock/unlock in each of the methods themselves?
Class A():
rateLock = threading.RLock()
chargeLock = threading.RLock()
#staticmethod
def doZStuff():
a = A()
a.doStuff('Z')
#staticmethod
def doYStuff():
a = A()
a.doStuff('Y')
#synchronized(lock)
def doStuff(self, type):
if type == 'Z':
otherstuff()
elif type == 'B':
evenmorestuff()
Is it even possible to get it to work that way with the decorator on doStuff() instead of doZStuff()
Update
Thanks for the answers everyone. The problem I'm facing is mostly due to the fact that it doesn't really make sense to access my module asynchronously, but this is just part of an API. And the team accessing our stuff through the API was complaining about concurrency issues. So I don't need the perfect solution, I'm just trying to make it so they can't crash our side or get garbage data back

Class A():
rateLock = threading.RLock()
chargeLock = threading.RLock()
def doStuff(self,ratefile,chargefile):
with A.rateLock:
with open(ratefile) as f:
# ...
with A.chargeLock:
with open(chargefile) as f:
# ...
Using the with statement will guarantee that the (R)Lock is acquired and released in pairs. The release will be called even if there an exception occurs within the with-block.
You might also want to think about placing your locks around the file access block with open(...) as ... as tightly as you can so that the locks are not held longer than necessary.
Finally, the creation and garbage collection of a=A() will not affect the locks
if (as above) the locks are class attributes (as opposed to instance attributes). The class attributes live in A.__dict__, rather than a.__dict__. So the locks will not be garbage collected until A itself is garbage collected.

You are right with the garbage collection, so it is not a good idea.
Look into decorators, for writing synchronized functions.
Example: http://code.activestate.com/recipes/465057-basic-synchronization-decorator/
edit
I'm still not 100% sure what you have in mind, so my suggestion may be wrong:
class A():
lockZ = threading.RLock()
lockY = threading.RLock()
#staticmethod
#synchroized(lockZ)
def doZStuff():
a = A()
a.doStuff('Z')
#staticmethod
#synchroized(lockY)
def doYStuff():
a = A()
a.doStuff('Y')
def doStuff(self, type):
if type == 'Z':
otherstuff()
elif type == 'B':
evenmorestuff()

However, if you HAVE TO acquire and release locks in constructors and destructors, then you really, really, really should give your design another chance. You should change your basic assumptions.
In any application: a "LOCK" should always be held for a short time only - as short as possible. That means - in probably 90% of all cases, you will acquire the lock in the same method that will also release the lock.
There should hardly be NEVER EVER a reason to lock/unlock an object in a RAII style. This is not what it was meant to become ;)
Let me give you an ekxample: you manage some ressources, those res. can be read from many threads at once but only one thread can write to them.
In a "naive" implementation you would have one lock per object, and whenever someone wants to write to it, then you will LOCK it. When multiple threads want to write to it, then you have it synchronyzed fairly, all safe and well, BUT: When thread says "WRITE", then we will stall, until the other threads decide to release the lock.
But please understand that locks, mutex - all these primitives were created to synchronize only a few lines of your source code. So, instead of making the lock part of you writeable object, you have only a lock for the very short time where it really is required. You have to invest more time and thoughts in your interfaces. But, LOCKS/MUTEXES were never meant to be "held" for more than a few microseconds.

I don't know which platform you are on, but if you need to lock a file, well, you should probably use flock() if it is available instead of rolling your own locking routines.
Since you've mentioned that you are new to python, I must say that most of the time threads are not the solution in python. If your activity is CPU-bound, you should consider using multiprocessing. There is no concurrent execution because of GIL, remember? (this is true for most cases). If your activity is I/O bound, which I guess is the case, you should, perhaps, consider using an event-driven framework, like Twisted. That way you won't have to worry about deadlocks at all, I promise :)

Releasing locks in the destruction of objects is risky as has already been mentioned because of the garbage collector, because deciding when to call the __del__() method on objects is exclusively decided by the GC (usually when the refcount reaches zero) but in some cases, if you have circular references, it might never be called, even when the program exits.
If you are treating one specific configfile inside a class instance, then you might put a lock object from the Threading module inside it.
Some example code of this:
from threading import Lock
class ConfigFile:
def __init__(file):
self.file = file
self.lock = Lock()
def write(self, data):
self.lock.aquire()
<do stuff with file>
self.lock.release()
# Function that uses ConfigFile object
def staticmethod():
config = ConfigFile('myconfig.conf')
config.write('some data')
You can also use locks in a With statement, like:
def write(self, data):
with self.lock:
<do stuff with file>
And Python will aquire and release the lock for you, even in case of errors that happens while doing stuff with the file.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python multiprocessing initialising the Pool - python

Related

Python garbage collection and singletons classes

Multiprocessing pool with parameters that cannot be serialized

Parallel processing loop using multiprocessing Pool

If I have an #staticmethod in my python webapp, do I need to protect it with threading.RLock()?

thread Locking/unlocking in constructor/destructor in python

Categories

Resources