Consider a simple workflow like this:
from dask.distributed import Client
import time
with Client() as client:
    futs = client.map(time.sleep, list(range(10)))
The above code will submit the futures and then almost immediately cancel them, since the context manager closes as soon as the block ends. It's possible to keep the context manager open until the tasks complete by calling client.gather, but that blocks further execution in the current process.
I am interested in submitting tasks to multiple clusters (e.g. local and distributed) within the same process, ideally without blocking the current process. It's straightforward to do with explicit definition of different clients and clusters, but is it also possible with context managers (one for each unique client/cluster)?
It might sound like a bit of an anti-pattern, but maybe there is a way to close the cluster only after all of the submitted futures have finished. I tried fire_and_forget and also tried passing shutdown_on_close=False, but the latter doesn't seem to be implemented.
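For reference, a rough sketch of the blocking variant just described (client.gather keeps the with block open until every future has finished, at the cost of stalling this process):

from dask.distributed import Client
import time

with Client() as client:
    futs = client.map(time.sleep, list(range(10)))
    # Blocks here until all futures complete, so nothing else in
    # this process can run in the meantime.
    results = client.gather(futs)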
For some Dask cluster/scheduler types, such as the dask-cloudprovider ECSCluster, the approach described above using the with block and shutdown_on_close=False would work fine.
Both ECSCluster and SLURMCluster are derived from SpecCluster. However, ECSCluster passes its **kwargs (including shutdown_on_close) down to the SpecCluster constructor via this call:
super().__init__(**kwargs)
(see the ECSCluster code here)
SLURMCluster does not: it calls the JobQueueCluster constructor which in turn instantiates SpecCluster with only a subset of its parameters:
super().__init__(
    scheduler=scheduler,
    worker=worker,
    loop=loop,
    security=security,
    silence_logs=silence_logs,
    asynchronous=asynchronous,
    name=name,
)
See the JobQueueCluster code here
Therefore SLURMCluster/JobQueueCluster ignores shutdown_on_close (and the other optional parameters). It looks like an update to JobQueueCluster would be required for your use case.
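For a cluster class that does forward the keyword, the with-block approach from the question would look roughly like this (a sketch only: the required ECSCluster configuration is omitted, the import path may differ between dask-cloudprovider versions, and fire_and_forget is used so the tasks survive the client releasing its futures):

from dask.distributed import Client, fire_and_forget
from dask_cloudprovider.aws import ECSCluster  # older versions: from dask_cloudprovider import ECSCluster
import time

# shutdown_on_close=False is forwarded to SpecCluster, so the remote
# cluster is not torn down when the with block exits.
with ECSCluster(shutdown_on_close=False) as cluster:
    with Client(cluster) as client:
        futs = client.map(time.sleep, range(10))
        fire_and_forget(futs)  # keep the tasks running after the client closes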
I have a thread pool for which I'd like to limit not only the maximum number of workers but also the maximum number of jobs that can be submitted to the pool at once. The reason for limiting the jobs is that they are generated much faster than the pool's workers can execute them, which can quickly exhaust all available memory.
Here is how I'd like to interface with a "blocking" thread pool:
with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(100_000_000):
        executor.submit(do_work, i, block=True)
But block=True is not a thing on the executor.
Is there a blocking threadpool I can use which will block submission to the queue if the number of jobs in queue is at max_size? If not, what would be the best way to implement a blocking threadpool?
Looking at the implementation, there seems to be a relatively non-intrusive way to define one yourself:
import queue
from concurrent.futures import ThreadPoolExecutor

class BlockingThreadPoolExecutor(ThreadPoolExecutor):
    def __init__(self, *, queue_size=0, **kwargs):
        super().__init__(**kwargs)
        # Bounded queue: put() blocks once queue_size items are pending.
        self._work_queue = queue.Queue(maxsize=queue_size)
All this does is replace the executor's unbounded work queue with a bounded queue.Queue (SimpleQueue cannot be given a maximum size). Calls to submit will now block on their call to self._work_queue.put when the queue is full.
(This definition assumes you'll use keyword arguments, even though ThreadPoolExecutor.__init__ does not require them.)
All the standard warnings about modifying private class details apply, but this is a pretty minimal change. As long as no future version of ThreadPoolExecutor renames the _work_queue attribute or starts relying on an interface that queue.Queue doesn't provide, it should work fine.
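A hypothetical usage sketch (do_work and the sizes are placeholders):

# At most 10 threads run at a time and at most 50 jobs wait in the queue;
# the next submit() blocks until a worker frees up a slot.
with BlockingThreadPoolExecutor(max_workers=10, queue_size=50) as executor:
    for i in range(100_000_000):
        executor.submit(do_work, i)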
Most importantly, what kind of function are you trying to parallelize?
In many cases a thread pool might not be what you want, since due to the GIL only one thread can manipulate Python objects at a time.
In that case you will have to use multiprocessing to get an actual benefit out of parallelizing.
If not, coroutines (asyncio) might also be a good fit for your problem and could avoid the issue.
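If the work does turn out to be CPU-bound, a rough sketch of the multiprocessing route using the standard library's ProcessPoolExecutor (do_work is assumed to be a picklable, module-level function):

from concurrent.futures import ProcessPoolExecutor

# Each call runs in a separate worker process, so the GIL of the
# main process is no longer the bottleneck.
with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(do_work, range(100)))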
How to achieve something like:
def call_me():
    # doing some stuff which requires distributed locking
    ...

def i_am_calling():
    # other logic
    call_me()
    # other logic
This code runs in a multithreaded environment. How can I arrange it so that only a single thread from the thread pool is responsible for running the call_me() part of i_am_calling()?
It depends on the exact requirement at hand and on the system architecture/solution. One approach is to use a lock so that only one thread (or process) executes the protected section at a time.
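A minimal sketch of that idea with threading.Lock, assuming the goal is that at most one thread executes call_me() at any moment (the function names mirror the question):

import threading

_call_me_lock = threading.Lock()

def call_me():
    # doing some stuff which requires locking
    ...

def i_am_calling():
    # other logic
    with _call_me_lock:  # only one thread at a time gets past this point
        call_me()
    # other logic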
You can also build the logic around apply_async from the multiprocessing module, which lets you invoke a number of different functions (not just the same function) with pool.apply_async. It uses only one worker process when the function is invoked only once, but you can bundle up tasks ahead of time and submit them to the various worker processes. There is also pool.apply, which submits a task to the pool but blocks until the function completes and the result is available; its equivalent is pool.apply_async(func, args, kwargs).get(), or a callback passed to pool.apply_async instead of calling get(). It should also be noted that pool.apply(f, args) ensures that only one of the workers of the pool will execute f(args).
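A small sketch of that difference, with a placeholder function f:

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        r1 = pool.apply(f, (3,))              # blocks until f(3) is done
        r2 = pool.apply_async(f, (4,)).get()  # same effect, via get()
        res = pool.apply_async(f, (5,))       # returns immediately
        # ... do other work here ...
        r3 = res.get()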
You can also run the respective call in its own thread using executor.submit from concurrent.futures, which is part of the standard Python library. asyncio can be coupled with concurrent.futures so that it can await functions executed in the thread or process pools provided by concurrent.futures, as highlighted in this example.
If you would like to run some routine functionality at a regular interval, you can base the logic on threading.Timer.
I've got the following problem:
I have two different classes; let's call them the interface and the worker. The interface is supposed to accept requests from outside and multiplex them to several workers.
Contrary to almost every example I have found, I have several peculiarities:
The workers are not supposed to be recreated for every request.
The workers are different; a request for workers[0] cannot be answered by workers[1]. This multiplexing is done in interface.
I have a number of function-like calls which are difficult to model via events or simple queues.
There are a few different requests, which would make one queue per request difficult.
For example, assume that each worker is storing a single integer number (let's say the number of calls this worker received). In non-parallel processing, I'd use something like this:
class interface(object):
    workers = None  # set somewhere else.

    def get_worker_calls(self, worker_id):
        return self.workers[worker_id].get_calls()

class worker(object):
    calls = 0

    def get_calls(self):
        self.calls += 1
        return self.calls
This, obviously, doesn't work. What does?
Or, maybe more relevantly, I don't have experience with multiprocessing. Is there a design paradigm I'm missing that would easily solve the above?
Thanks!
For reference, I have considered several approaches, and I was unable to find a good one:
Use one request and one answer queue. I've discarded this idea since it would either block the interface for the answer time of the current worker (making it scale badly), or would require me to send around extra information.
Use of one request queue. Each message contains a pipe to return the answer to that request. After fixing the issue of being unable to send pipes via pipes, I ran into problems with the pipe closing unless both ends are sent over the connection.
Use of one request queue. Each message contains a queue to return the answer to that request. This fails since I cannot send queues via queues, and the reduction trick doesn't work either.
The above also applies to the respective Manager-generated objects.
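One pattern that sidesteps sending queues or pipes through queues is to create one request/response queue pair per worker up front, in the parent process, and hand them to each worker process at startup. A rough sketch of that idea (the names and the single-counter behaviour are only illustrative):

import multiprocessing as mp

def worker_loop(request_q, response_q):
    calls = 0
    for msg in iter(request_q.get, None):  # None acts as a shutdown signal
        if msg == "get_calls":
            calls += 1
            response_q.put(calls)

class Interface(object):
    def __init__(self, n_workers):
        # Queues are created before the processes start, so nothing
        # ever has to be sent through another queue or pipe later.
        self.request_qs = [mp.Queue() for _ in range(n_workers)]
        self.response_qs = [mp.Queue() for _ in range(n_workers)]
        self.procs = [mp.Process(target=worker_loop, args=(rq, sq))
                      for rq, sq in zip(self.request_qs, self.response_qs)]
        for p in self.procs:
            p.start()

    def get_worker_calls(self, worker_id):
        self.request_qs[worker_id].put("get_calls")
        return self.response_qs[worker_id].get()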
Multiprocessing means you have two or more separate processes running. There is no way to access memory from one process in another directly (as you can with multithreading).
Your best shot is to use some kind of external queue mechanism; you can start with Celery or RQ. RQ is simpler, but Celery has built-in monitoring.
But you have to know that this will work only if Celery/RQ is able to "pack" (pickle) the needed functions/classes and send them to the other process. Therefore you have to use module-level functions (defined at the top of the file, not belonging to any class).
You can always implement it yourself; Redis is very simple, and ZeroMQ and RabbitMQ are also good.
The Beaver library is a good example of how to deal with multiprocessing in Python using a ZeroMQ queue.
First of all, I know I can use threading to accomplish such a task, like so:
import Queue
import threading

# called by each thread
def do_stuff(q, arg):
    result = heavy_operation(arg)
    q.put(result)

operations = range(1, 10)
q = Queue.Queue()

for op in operations:
    t = threading.Thread(target=do_stuff, args=(q, op))
    t.daemon = True
    t.start()

s = q.get()
print s
However, in Google App Engine there's something called ndb tasklets, and according to their documentation you can execute code in parallel using them.
Tasklets are a way to write concurrently running functions without threads; tasklets are executed by an event loop and can suspend themselves blocking for I/O or some other operation using a yield statement. The notion of a blocking operation is abstracted into the Future class, but a tasklet may also yield an RPC in order to wait for that RPC to complete.
Is it possible to accomplish something like the example with threading above?
I already know how to handle retrieving entities using get_async() (I got it from the examples on the doc page), but it's very unclear to me when it comes to parallel code execution.
Thanks.
The answer depends on what your heavy_operation really is. If heavy_operation uses RPCs (remote procedure calls, such as datastore access, UrlFetch, etc.), then the answer is yes.
In "how to understand appengine ndb.tasklet?" I asked a similar question; you may find more details there.
Can I put any kind of code inside a function, decorate it as an ndb.tasklet, and then use it as an async function later? Or must it be an App Engine RPC?
The Answer
Technically yes, but it will not run asynchronously. When you decorate a non-yielding function with @tasklet, its Future's value is computed and set when you call that function. That is, it runs through the entire function when you call it. If you want to achieve asynchronous operation, you must yield on something that does asynchronous work. Generally in GAE it will work its way down to an RPC call.
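For illustration, a minimal sketch of a tasklet that genuinely runs asynchronously because it yields datastore RPCs (the keys k1 and k2 are placeholders):

from google.appengine.ext import ndb

@ndb.tasklet
def get_both(key1, key2):
    # Yielding a tuple of futures lets both get_async() RPCs run
    # in parallel on the event loop.
    ent1, ent2 = yield key1.get_async(), key2.get_async()
    raise ndb.Return((ent1, ent2))

future = get_both(k1, k2)   # starts the tasklet without blocking
# ... other work can run here ...
ent1, ent2 = future.get_result()  # blocks until both RPCs are done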
How can I make the following object instantiation asynchronous:
class IndexHandler(tornado.web.RequestHandler):
    def get(self, id):
        # Async the following
        data = MyDataFrame(id)
        self.write(data.getFoo())
The MyDataFrame returns a pandas DataFrame object and can take some time depending on the file it has to parse.
MyDataFrame() is a synchronous interface; to use it without blocking you need to do one of two things:
Rewrite it to be asynchronous. You can't really make an __init__ method asynchronous, so you'll need to refactor things into a static factory function instead of a constructor. In most cases this path only makes sense if the method depends on network I/O (and not reading from the filesystem or processing the results on the CPU).
Run it on a worker thread and asynchronously wait for its result on the main thread. From the way you've framed the question, this sounds like the right approach for you. I recommend the concurrent.futures package (in the standard library since Python 3.2; available via pip install futures for 2.x).
This would look something like:
@tornado.gen.coroutine
def get(self, id):
    data = yield executor.submit(MyDataFrame, id)
    self.write(data.getFoo())
where executor is a global instance of ThreadPoolExecutor.
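Putting the pieces together, a rough sketch of how this might look (the pool size is arbitrary, and MyDataFrame/getFoo are the placeholders from the question):

from concurrent.futures import ThreadPoolExecutor
import tornado.gen
import tornado.web

# One shared pool for the process; MyDataFrame calls run here
# instead of blocking the IOLoop thread.
executor = ThreadPoolExecutor(max_workers=4)

class IndexHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self, id):
        data = yield executor.submit(MyDataFrame, id)
        self.write(data.getFoo())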