The Pool methods of multiprocessing pickle all the parameters that are passed to them and then recreate the objects in the worker processes.
In my example, I have some parameters that cannot be pickled (they are C++ objects), and they take a lot of time to create.
Is there any way I can pass those parameters into the pool without having to make them serializable?
multiprocessing.Pool accepts an initializer function that is executed every time a new worker process is spawned.
You can use this function to initialize your C++ objects. Each process will have its own copy.
from multiprocessing import Pool

parameters = None

def initializer():
    global parameters
    # build the expensive, unpicklable objects once per worker process
    parameters = initialize_cplusplus_objects()

def function():
    parameters.do_something()

pool = Pool(initializer=initializer)
pool.apply_async(function)
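A small self-contained variant of the same pattern that you can run as-is, with an ordinary Python class standing in for the expensive C++ objects:

from multiprocessing import Pool

parameters = None

class ExpensiveObject(object):
    """Stand-in for the slow, unpicklable C++ object."""
    def do_something(self):
        return 42

def initializer():
    global parameters
    parameters = ExpensiveObject()  # created once per worker process

def function():
    return parameters.do_something()

if __name__ == '__main__':
    pool = Pool(processes=2, initializer=initializer)
    print(pool.apply_async(function).get())  # prints 42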
I have a large read-only object that I would like the child processes to use but unfortunately this object cannot be pickled. Given that it's read-only I thought about declaring it as a global and then using an initializing function in the Pool where I perform the necessary copying. My code is something like:
import copy
import multiprocessing

def f(processes, args):
    global pool
    pool = multiprocessing.Pool(processes, setGlobal, [args])

def setGlobal(args):
    # global object to be used by the child processes...
    global obj
    obj = copy.deepcopy(args)
The function setGlobal performs the initialization. My first question concerns the arguments to setGlobal (which are passed as a list). Do these need to be pickle-able? The errors I'm getting seem to suggest that they do. If so, how can I make the unpickle-able read-only object visible to my child processes?
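For what it's worth, here is a sketch of one way this is often handled on platforms that fork (Linux and similar): assign the module-level global before the Pool is created, so the child processes inherit it directly and nothing has to be pickled. The obj.lookup call is only a placeholder for whatever read-only access is actually needed:

import multiprocessing

obj = None  # module-level global the children will read

def worker(x):
    # usable here without pickling: forked children inherit the
    # parent's memory, including this global
    return obj.lookup(x)  # placeholder for the real read-only use

def f(processes, unpicklable_object):
    global obj
    obj = unpicklable_object  # assign BEFORE the pool is created
    pool = multiprocessing.Pool(processes)
    return pool.map(worker, range(10))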
I'm trying to create a generic function caller class with the purpose of calling GUI functions from outside the main thread. The class has a signal and two functions: one is called to emit the signal and runs from a side thread. The other runs the actual function and is connected to the signal in the main thread.
In keeping this class generic, I want to be able to pass any number of arguments, as would normally be done using *args or **kwargs. But when initializing the Signal I'm required to define what arguments are passed.
This is my generic function caller (code stripped down for clarity):
from PySide.QtCore import QObject, Signal

class FunctionCaller(QObject):
    _request = Signal(int)

    def __init__(self, function=str):
        super().__init__()
        self.function = function
        self._request.connect(self._do_function)

    #Slot(*args)
    def _do_function(self, *args):
        """This runs in the GUI thread to handle the thing."""
        self.function(*args)

    def doFunction(self, *args):
        self._request.emit(*args)
Here's what appears in the main thread:
functionCaller = FunctionCaller(function = someGUIfunction)
foo = functionCaller.doFunction # foo is a global variable
And here's where I call the function in the side thread:
foo(0xc000, 'string')
The error I get when running the function with 3 arguments: _request(int) only accepts 1 arguments, 3 given!
Is there a way to do this or must this function be less generic than I hoped, only working for a set number and types of arguments?
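One common workaround (not from the original post, just a sketch) is to declare the signal with a single generic object argument, pack the arguments into a tuple when emitting, and unpack them again in the slot:

from PySide.QtCore import QObject, Signal, Slot

class FunctionCaller(QObject):
    _request = Signal(object)  # one generic payload instead of fixed argument types

    def __init__(self, function=str):
        super().__init__()
        self.function = function
        self._request.connect(self._do_function)

    @Slot(object)
    def _do_function(self, args):
        # runs in the GUI thread; unpack the tuple and call the real function
        self.function(*args)

    def doFunction(self, *args):
        self._request.emit(args)  # emit the whole args tuple as a single object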
I have to fetch functions, and the times at which they should execute, from MySQL and then save them into Redis. Then, from Redis, I have to execute the functions at the prescribed times. I want to use rq as the scheduler, but I cannot work out the model in which I should save the imported data into Redis.
I am totally new to Python and Redis.
If you install rq there is a file (for me it was ~/lib/python2.7/site-packages/rq/queue.py, which in turn calls job.py) that clearly states the enqueue and enqueue_call functions:
def enqueue_call(self, func, args=None, kwargs=None,
                 timeout=None, result_ttl=None, description=None,
                 depends_on=None):
    """Creates a job to represent the delayed function call and enqueues it.

    It is much like `.enqueue()`, except that it takes the function's args
    and kwargs as explicit arguments. Any kwargs passed to this function
    contain options for RQ itself.
    etc...."""

def enqueue(self, f, *args, **kwargs):
    """Creates a job to represent the delayed function call and enqueues it.

    Expects the function to call, along with the arguments and keyword
    arguments.
    etc...."""
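Based on those signatures, a minimal usage sketch might look like this (say_hello is an illustrative function; in practice it should live in a module that the worker can import):

from redis import Redis
from rq import Queue

def say_hello(name):
    print('hello, %s' % name)

q = Queue(connection=Redis())

# enqueue passes *args/**kwargs straight through to the function
q.enqueue(say_hello, 'world')

# enqueue_call takes the function's args explicitly, leaving keyword
# arguments free for RQ options such as timeout
q.enqueue_call(func=say_hello, args=('world',), timeout=60)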
I want to process a large for loop in parallel, and from what I have read the best way to do this is to use the multiprocessing library that comes standard with Python.
I have a list of around 40,000 objects, and I want to process them in parallel in a separate class. The reason for doing this in a separate class is mainly because of what I read here.
In one class I have all the objects in a list and via the multiprocessing.Pool and Pool.map functions I want to carry out parallel computations for each object by making it go through another class and return a value.
# ... some class that generates the list_objects
pool = multiprocessing.Pool(4)
results = pool.map(Parallel, self.list_objects)
And then I have a class which I want to process each object passed by the pool.map function:
class Parallel(object):
    def __init__(self, args):
        self.some_variable = args[0]
        self.some_other_variable = args[1]
        self.yet_another_variable = args[2]
        self.result = None

    def __call__(self):
        self.result = self.calculate(self.some_variable)
The reason I have a __call__ method is the post I linked above, yet I'm not sure I'm using it correctly, as it seems to have no effect; the self.result value is never generated.
Any suggestions?
Thanks!
Use a plain function, not a class, when possible. Use a class only when there is a clear advantage to doing so.
If you really need to use a class, then given your setup, pass an instance of Parallel:
results = pool.map(Parallel(args), self.list_objects)
Since the instance has a __call__ method, the instance itself is callable, like a function.
By the way, the __call__ needs to accept an additional argument:
def __call__(self, val):
since pool.map is essentially going to call in parallel
p = Parallel(args)
result = []
for val in self.list_objects:
    result.append(p(val))
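Putting that together, a minimal runnable sketch of the callable-instance pattern (the names and the toy computation are placeholders, not your real code):

import multiprocessing

class Parallel(object):
    def __init__(self, factor):  # whatever shared configuration you need
        self.factor = factor

    def __call__(self, obj):  # receives one item from list_objects
        return obj * self.factor  # placeholder for the real computation

if __name__ == '__main__':
    list_objects = range(10)  # stand-in for the 40,000 real objects
    pool = multiprocessing.Pool(4)
    results = pool.map(Parallel(3), list_objects)
    print(results)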
Pool.map simply applies a function (actually, any callable) in parallel. It has no notion of objects or classes. Since you pass it a class, it simply calls __init__; __call__ is never executed. You need to either call it explicitly from __init__ or use pool.map(Parallel.__call__, preinitialized_objects).
Does celery purge/fail to copy instance variables when a task is handled by delay?
# tasks.py
from celery.task import Task

class MyContext(object):
    a = 1

class MyTask(Task):
    def run(self):
        print self.context.a

# caller
from tasks import MyTask, MyContext

c = MyContext()
t = MyTask()
t.context = c
print t.context.a
# Shows 1

t.delay()
=====Worker Output
Task tasks.MyTask[d30e1c37-d094-4809-9f72-89ff37b81a85]
raised exception: AttributeError("'NoneType' object has no attribute 'a'",)
It looks like this has been asked before here, but I do not see an answer.
This doesn't work because the instance that actually runs isn't the same instance on which you call the delay method. Every worker instantiates its own singleton for each task.
In short, celery isn't designed for task objects to carry data. Data should be passed to the task through the delay or apply_async methods. If the context object is simple and can be pickled, just pass it to delay. If it's complex, a better approach may be to pass a database id so that the task can retrieve it in the worker.
http://docs.celeryproject.org/en/latest/userguide/tasks.html#instantiation
Also, note that in celery 2.5 delay and apply_async were class methods.
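A sketch of that idea in the class-based style used above (the context value travels as a task argument rather than as instance state; the names are illustrative):

from celery.task import Task

class MyTask(Task):
    def run(self, a):
        # the data arrives through the task arguments, serialized by celery
        print a

# caller side: pass the data itself (or a database id) to delay
t = MyTask()
t.delay(a=1)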