parallel processing of DAG - python

I'm trying hard to figure out how I can process a directed acyclic graph in parallel. Each node should only be able to "execute" when all its input nodes have been processed beforehand. Imagine a class Task with the following interface:
class Task(object):
    result = None
    def inputs(self):
        ''' List all requirements of the task. '''
        return ()
    def run(self):
        pass
I can not think of a way to process the graph that could be represented
by this structure asynchronously with a maximum number of workers at the
same time, except for one method.
I think the optimal processing would be achieved by creating a thread
for each task, waiting for all inputs to be processed. But, spawning
a thread for each task immediately instead of consecutively (i.e. when the
task is ready to be processed) does not sound like a good idea to me.
import threading
class Runner(threading.Thread):
    def __init__(self, task):
        super(Runner, self).__init__()
        self.task = task
        self.start()
    def run(self):
        threads = [Runner(r) for r in self.task.inputs()]
        [t.join() for t in threads]
        self.task.run()
Is there a better way to achieve this behaviour? Also, this approach
currently has no way to limit the number of tasks running at
a time.

Have one master thread push items to a queue once they are ready to be processed. Then have a pool of workers listen on the queue for tasks to work on. (Python provides a synchronized queue in the Queue module, renamed to lower-case queue in Python 3).
The master first creates a map from dependencies to dependent tasks. Every task that doesn't have any dependencies can go into the queue. Every time a task is completed, the master uses the dictionary to figure out which dependent tasks there are, and puts them into the queue if all their dependencies are now met.
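The scheme described above could be sketched roughly as follows. This is only a minimal illustration under some assumptions, not a definitive implementation: it uses Python 3 (queue instead of Queue), expects objects with the Task interface from the question, and assumes that the list passed in contains every node of the graph exactly once and that the graph has no cycles. The names run_dag, ready, done and num_workers are made up for this sketch, and storing a raised exception in task.result is just one possible way to keep a failing task from stalling the run.

```python
import queue
import threading
from collections import defaultdict


def run_dag(tasks, num_workers=4):
    """Run every task in `tasks` once all of its inputs() have finished."""
    # Map each dependency to the tasks waiting on it, and count how many
    # unfinished inputs every task still has.
    dependents = defaultdict(list)
    remaining = {}
    for task in tasks:
        deps = list(task.inputs())
        remaining[task] = len(deps)
        for dep in deps:
            dependents[dep].append(task)

    ready = queue.Queue()  # master -> workers: tasks whose inputs are done
    done = queue.Queue()   # workers -> master: tasks that have finished

    def worker():
        while True:
            task = ready.get()
            if task is None:  # sentinel: no more work
                break
            try:
                task.run()
            except Exception as exc:
                task.result = exc  # assumption: record the failure, keep going
            done.put(task)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Tasks without dependencies can start right away.
    for task, count in remaining.items():
        if count == 0:
            ready.put(task)

    # Master loop: every finished task may unlock some of its dependents.
    for _ in range(len(remaining)):
        finished = done.get()
        for child in dependents[finished]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.put(child)

    # Tell the workers to stop and wait for them.
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()
```

At most num_workers tasks run at the same time, and no thread is created per task. Note that in this sketch the dependents of a failed task still run; a real scheduler would probably want to skip them.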

Celery (http://www.celeryproject.org/) is a widely used distributed task queue for Python. It should be able to help you with this.

Related

Kill a worker thread after a certain time in python2.7

I'm working on a Python 2.7 script using threading.
There is one global connection object, which has to be used by each thread.
Code Example:
from threading import Thread
import time
class Connection:
    def __init__(self):
        self.connected = True
    def send_command(self, command):
        return str(command)+' result'
class Config:
    def __init__(self):
        self.conn = Connection()
    def do_remote_config(self):
        time.sleep(2)
        return self.conn.send_command('my config')
    def do_other_remote_config(self):
        time.sleep(2)
        return self.conn.send_command('my other config')
class Executor:
    def execute(self):
        config = Config()
        worker1 = Worker(config.do_remote_config)
        worker1.start()
        worker1.join()
        print(worker1.result)
        worker2 = Worker(config.do_other_remote_config)
        worker2.start()
        worker2.join()
        print(worker2.result)
class Worker(Thread):
    def __init__(self, method):
        super(Worker, self).__init__()
        self.result = None
        self.method = method
    def run(self):
        try:
            self.result = self.method()
        except Exception as ex:
            self.result = ex
if __name__ == "__main__":
    e = Executor()
    e.execute()
In order to ensure that none of the threads runs for more than 10 minutes, I wanted to kill each thread in case the time limit is reached. Unfortunately, it turns out that Python threads cannot be killed.
Thread Kill Pill Option:
Due to the actual complexity of the worker threads, it is unfortunately not possible to build some kind of kill trigger that lets a worker thread end itself. So it seems that I really need to get rid of threading here, because threads by nature cannot be killed.
Multiprocess Option:
Using the multiprocessing module, separate processes could be used. Those could then be killed after a certain time. However, I did not find a way to pass on my connection object in such a way that it can be used by several processes.
Remote Procedure Calls (RPC) option:
RPCs seem to introduce an unnecessary level of complexity and the kill switch could presumably still not be implemented.
Question:
Which Python technologies would work best in order to be able to use the connection object with all workers while ensuring that each worker can reliably be killed after 10 minutes?
Thanks very much!

Too long for a comment, too abstract for an answer.
I would say that multiprocessing is the way to go if you wish to be able to interrupt processing at an arbitrary moment without revamping the whole processing. All other methods demand some sort of cooperation from the threads being interrupted.
Certainly, splitting the whole process into pieces demands some processing changes as well. All shared file-like resources (open files, sockets, pipes) have to be opened before the processes are forked and must be carefully orchestrated. Probably the safest approach would be like this:
You have a master socket that a master process listen()s on. The master also runs a worker pool. It's essential to create the master socket before the pool, to make the socket available to the workers.
The master delivers new job tasks to the workers and receives results if needed via multiprocessing primitives.
When a new client arrives, the master orders a selected worker from the pool to accept() the connection and goes back to waiting for new clients and its other activities. The worker accept()s the connection, thus creating a private socket to communicate with that particular client; no other worker can or should access the client socket.
If workers need to communicate with each other, all necessary communication primitives must be created before the pool and distributed among the workers by the master.
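As for the kill-after-a-time-limit part specifically, here is a rough sketch of how a child process can be terminated from the outside without any cooperation from the job. It is written for Python 3 and assumes a POSIX system where the "fork" start method is available, so that the child inherits whatever was created before the fork. run_with_timeout and slow_job are hypothetical names, and the sketch does not deal with sharing the connection object or with passing results back (that would need a Queue or Pipe).

```python
import multiprocessing as mp
import time


def run_with_timeout(func, args=(), timeout=600.0):
    """Run func(*args) in a child process; kill it after `timeout` seconds.

    Returns True if the child ended on its own (successfully or not),
    False if it had to be terminated.
    """
    # Assumption: POSIX with the "fork" start method, so the child inherits
    # objects that were created before this call.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=func, args=args)
    proc.start()
    proc.join(timeout)      # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()    # sends SIGTERM on POSIX
        proc.join()         # reap the terminated child
        return False
    return True


def slow_job(seconds):
    """Hypothetical stand-in for a worker that may take too long."""
    time.sleep(seconds)
```

For the 10 minute limit from the question this would be called as run_with_timeout(job, timeout=600). Keep in mind that a process killed this way gets no chance to clean up, so anything it shares with other processes may be left in an inconsistent state.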

Sending completed jobs back to correct process in python

I'd like to create a set of processes with the following structure:
main, which dequeues requests from an external source. main generates a variable number of worker processes.
worker which does some preliminary processing on job requests, then sends data to gpuProc.
gpuProc, which accepts job requests from worker processes. When it has received enough requests, it sends the batch to a process that runs on the GPU. After getting the results back, it then has to send the completed batch of requests back to the worker processes, so that each request returns to the worker that submitted it.
One could envision doing this with a number of queues. Since the number of worker processes is variable, it would be ideal if gpuProc had a single input queue into which workers put their job request and their specific return queue as a tuple. However, this isn't possible--you can only share vanilla queues in python via inheritance, and manager.Queues() fail with:
RemoteError:
---------------------------------------------------------------------------
Unserializable message: ('#RETURN', ('Worker 1 asked proc to do some work.', <Queue.Queue instance at 0x7fa0ba14d908>))
---------------------------------------------------------------------------
Is there a pythonic way to do this without invoking some external library?

multiprocessing.Queue is implemented with a pipe, a deque and a thread.
When you call queue.put() the object ends up in the deque and the thread takes care of pushing it into the pipe.
You cannot share threads between processes for obvious reasons. Therefore you need to use something else.
Regular pipes and sockets can be easily shared.
Nevertheless I'd rather use a different architecture for your program.
The main process would act as an orchestrator routing the tasks to two different Pools of processes, one for CPU-bound jobs and the other for GPU-bound ones. This would imply you need to share more information between the workers but it's way more robust and scalable.
Here you get a draft:
from multiprocessing import Pool
# do_cpu_work, do_post_gpu_work and do_gpu_work are placeholders for your own functions.
def cpu_worker(job_type, data):
    if job_type == "first_computation":
        results = do_cpu_work()
    elif job_type == "compute_gpu_results":
        results = do_post_gpu_work()
    return results
def gpu_worker(data):
    return do_gpu_work()
class Orchestrator:
    def __init__(self):
        self.cpu_pool = Pool()
        self.gpu_pool = Pool()
    def new_task(self, task):
        """Entry point for a new task. The task will be run by the CPU workers and the results handled by the cpu_job_done method."""
        self.cpu_pool.apply_async(cpu_worker, args=["first_computation", task], callback=self.cpu_job_done)
    def cpu_job_done(self, results):
        """Once the first CPU computation is done, send its results to a GPU worker. Its results will be handled by the gpu_job_done method."""
        self.gpu_pool.apply_async(gpu_worker, args=[results], callback=self.gpu_job_done)
    def gpu_job_done(self, results):
        """GPU computation done, send the data back for the last CPU computation phase. Results will be handled by the task_done method."""
        self.cpu_pool.apply_async(cpu_worker, args=["compute_gpu_results", results], callback=self.task_done)
    def task_done(self, results):
        """Here you get your final results for the task."""
        print(results)

Dynamically reordering jobs in a multiprocessing pool in Python

I'm writing a python script (for cygwin and linux environments) to run regression testing on a program that is run from the command line using subprocess.Popen(). Basically, I have a set of jobs, a subset of which need to be run depending on the needs of the developer (on the order of 10 to 1000). Each job can take anywhere from a few seconds to 20 minutes to complete.
I have my jobs running successfully across multiple processors, but I'm trying to eke out some time savings by intelligently ordering the jobs (based on past performance) to run the longer jobs first. The complication is that some jobs (steady state calculations) need to be run before others (the transients based on the initial conditions determined by the steady state).
My current method of handling this is to run the parent job and all child jobs recursively on the same process, but some jobs have multiple, long-running children. Once the parent job is complete, I'd like to add the children back to the pool to farm out to other processes, but they would need to be added to the head of the queue. I'm not sure I can do this with multiprocessing.Pool. I looked for examples with Manager, but they all are based on networking it seems, and not particularly applicable. Any help in the form of code or links to a good tutorial on multiprocessing (I've googled...) would be much appreciated. Here's a skeleton of the code for what I've got so far, commented to point out the child jobs that I would like spawned off on other processors.
import multiprocessing
import subprocess
class Job(object):
    def __init__(self, popenArgs, runTime, children):
        self.popenArgs = popenArgs #list to be fed to popen
        self.runTime = runTime #Approximate runTime for the job
        self.children = children #Jobs that require this job to run first
def runJob(job):
    subprocess.Popen(job.popenArgs).wait()
    ####################################################
    #I want to remove this, and instead kick these back to the pool
    for j in job.children:
        runJob(j)
    ####################################################
def main(jobs):
    # This jobs argument contains only jobs which are ready to be run
    # ie no children, only parent-less jobs
    jobs.sort(key=lambda job: job.runTime, reverse=True)
    multiprocessing.Pool(4).map(runJob, jobs)

First, let me second Armin Rigo's comment: There's no reason to use multiple processes here instead of multiple threads. In the controlling process you're spending most of your time waiting on subprocesses to finish; you don't have CPU-intensive work to parallelize.
Using threads will also make it easier to solve your main problem. Right now you're storing the jobs in attributes of other jobs, an implicit dependency graph. You need a separate data structure that orders the jobs in terms of scheduling. Also, each tree of jobs is currently tied to one worker process. You want to decouple your workers from the data structure you use to hold the jobs. Then the workers each draw jobs from the same queue of tasks; after a worker finishes its job, it enqueues the job's children, which can then be handled by any available worker.
Since you want the child jobs to be inserted at the front of the line when their parent is finished, a stack-like container would seem to fit your needs; the Queue module provides a thread-safe LifoQueue class that you can use.
import threading
import subprocess
from Queue import LifoQueue
class Job(object):
    def __init__(self, popenArgs, runTime, children):
        self.popenArgs = popenArgs
        self.runTime = runTime
        self.children = children
def run_jobs(queue):
    while True:
        job = queue.get()
        subprocess.Popen(job.popenArgs).wait()
        for child in job.children:
            queue.put(child)
        queue.task_done()
# Parameter 'jobs' contains the jobs that have no parent.
def main(jobs):
    job_queue = LifoQueue()
    num_workers = 4
    jobs.sort(key=lambda job: job.runTime)
    for job in jobs:
        job_queue.put(job)
    for i in range(num_workers):
        t = threading.Thread(target=run_jobs, args=(job_queue,))
        t.daemon = True
        t.start()
    job_queue.join()
A couple of notes: (1) We can't know when all the work is done by monitoring the worker threads, since they don't keep track of the work to be done. That's the queue's job. So the main thread monitors the queue object to know when all the work is complete (job_queue.join()). We can thus mark the worker threads as daemon threads, so the process will exit whenever the main thread does without waiting on the workers. We thereby avoid the need for communication between the main thread and the worker threads in order to tell the latter when to break out of their loops and stop.
(2) We know all the work is done when all tasks that have been enqueued have been marked as done (specifically, when task_done() has been called a number of times equal to the number of items that have been enqueued). It wouldn't be reliable to use the queue's being empty as the condition that all work is done; the queue might be momentarily and misleadingly empty between popping a job from it and enqueuing that job's children.
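In case the ordering is not obvious: the jobs are sorted by ascending runTime before being pushed because a LIFO queue hands back the most recently added item first, so the longest job is the first one to come out. A tiny sketch of that behaviour, assuming Python 3 (where the module is called queue) and using made-up (run time, name) tuples in place of real Job objects:

```python
import queue

# Hypothetical (run_time_in_seconds, name) pairs standing in for Job objects.
jobs = [(30, "medium"), (1200, "long"), (5, "short")]

stack = queue.LifoQueue()
for job in sorted(jobs):              # ascending run time, like jobs.sort(key=...)
    stack.put(job)

order = []
order.append(stack.get()[1])          # the longest job comes out first
stack.put((60, "child of long"))      # its child is pushed once it has finished
while not stack.empty():              # fine here: single thread, no workers running
    order.append(stack.get()[1])

print(order)  # ['long', 'child of long', 'medium', 'short']
```

The child jumps ahead of the older, shorter jobs, which is the "head of the queue" behaviour asked for in the question.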

Python queues - have at most n threads running

The scenario:
I have a really large DB model migration going on for a new build, and I'm working on boilerplating how we will go about migrating current live data from a webapp into the local test databases.
I'd like to setup in python a script that will concurrently process the migration of my models. I have from_legacy and to_legacy methods for my model instances. What I have so far loads all my instances and creates threads for each, with each thread subclassed from the core threading modules with a run method that just does the conversion and saves the result.
I'd like to make the main loop in the program build a big stack of instances of these threads, and start to process them one by one, running only at most 10 concurrently as it does its work, and feeding the next in to be processed as others finish migrating.
What I can't figure out is how to utilize the queue correctly to do this? If each thread represents the full task of migration, should I load all the instances first and then create a Queue with maxsize set to 10, and have that only track currently running queues? Something like this perhaps?
currently_running = Queue(maxsize=10)
for model in models:
    task = Migrate(model) #this is subclassed thread
    currently_running.put(task)
    task.start()
In this case relying on the put call to block while it is at capacity? If I were to go this route, how would I call task_done?
Or rather, should the Queue include all the tasks (not just the started ones) and use join to block to completion? Does calling join on a queue of threads start the included threads?
What is the best methodology to approach the "at most have N running threads" problem and what role should the Queue play?

Although not documented, the multiprocessing module has a ThreadPool class which, as its name implies, creates a pool of threads. It shares the same API as the multiprocessing.Pool class.
You can then send tasks to the thread pool using pool.apply_async:
import multiprocessing.pool as mpool
def worker(task):
    # work on task
    print(task) # substitute your migration code here.
# create a pool of 10 threads
pool = mpool.ThreadPool(10)
N = 100
for task in range(N):
    pool.apply_async(worker, args = (task, ))
pool.close()
pool.join()

This should probably be done using semaphores; the example in the documentation is a hint of what you're trying to accomplish.
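A rough sketch of the semaphore idea, assuming Python 3 and the standard threading.BoundedSemaphore. The migrate function and the integer "models" are hypothetical placeholders for the real migration code, and the running/peak counters exist only to show that the limit holds.

```python
import threading
import time

MAX_RUNNING = 10                      # at most this many migrations at once
slots = threading.BoundedSemaphore(MAX_RUNNING)

running = 0                           # bookkeeping only, to show the limit holds
peak = 0
counter_lock = threading.Lock()


def migrate(model):
    """Hypothetical stand-in for the real per-model migration."""
    global running, peak
    with slots:                       # blocks while MAX_RUNNING threads are inside
        with counter_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)              # placeholder for the actual work
        with counter_lock:
            running -= 1


threads = [threading.Thread(target=migrate, args=(m,)) for m in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)
```

Note that this still creates one thread per model; the semaphore only limits how many of them do work at the same time. With a very large number of models, a fixed pool of worker threads is likely the cheaper option.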

Best multiprocessing approach in this toy environment

I'd like to increase the speed of my project using multiprocessing.
from multiprocessing import Queue, Process
def build(something):
    # ... Build something ...
    return something
# Things I want to build.
# Each of these things requires DIFFERENT TIME to be built.
some_things = [a_house, a_rocket, a_car]
#________________________________
# My approach
def do_work(queue, func, args):
    queue.put(func(*args))
# Initialize a result queue
queue = Queue()
# Here I'll need to distribute the tasks (in case there are many)
# through each process. For example process 1 build a house and a rocket
# and so on. Anyway this is not the case..
procs = [Process(target=do_work, args=thing) for thing in some_things]
# Finally, Retrieve things from the queue
results = []
while not queue.empty():
    results.append(queue.get())
The problem here is that when a process finishes building its things, it sits idle until the other processes finish, whereas I want it to pick up something else to do.
How can I achieve this? I think I could use a pool of workers but I don't really understand how to use it because I need to retrieve the results. Can someone help with this?

There are a couple of techniques you can use:
Use a shared-memory Array to communicate between the main process and all the child processes. Put dicts as input values and set a flag once an output value has been computed.
Use Pipes to communicate job init data from the master to the workers, and results back from the workers to the master. This works well if you can serialize the data easily.
Both of these classes are detailed here: http://docs.python.org/2/library/multiprocessing.html
