I'd like to increase the speed of my project using multiprocessing.
from multiprocessing import Queue, Process
def build(something):
    # ... Build something ...
    return something
# Things I want to build.
# Each of these things requires DIFFERENT TIME to be built.
some_things = [a_house, a_rocket, a_car]
#________________________________
# My approach
def do_work(queue, func, args):
    queue.put(func(*args))
# Initialize a result queue
queue = Queue()
# Here I'll need to distribute the tasks (in case there are many)
# across the processes. For example, process 1 builds a house and a rocket,
# and so on. Anyway, this is not the case here.
procs = [Process(target=do_work, args=(queue, build, (thing,))) for thing in some_things]
for p in procs:
    p.start()
for p in procs:
    p.join()
# Finally, Retrieve things from the queue
results = []
while not queue.empty():
    results.append(queue.get())
Here the problem is that if a process finishes building its thing, it sits there until the other processes finish, whereas I want that process to move on and do something else.
How can I achieve this? I think I could use a pool of workers, but I don't really understand how to use it, because I need to retrieve the results. Can someone help with this?
There are a couple of techniques you can use:
Use a shared-memory Array to communicate between the main process and all the child processes. Put dicts as input values and set a flag once an output value has been computed.
Use Pipes to communicate job init data from the master to the workers, and results back from the workers to the master. This works well if you can serialize the data easily.
Both of these classes are detailed here: http://docs.python.org/2/library/multiprocessing.html
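For instance, a minimal sketch of the Pipe-based approach might look like this (the build function and task names are placeholders borrowed from the question, not tested code):
from multiprocessing import Process, Pipe

def build(something):
    # ... build something ... (placeholder for the real work)
    return something

def worker(conn, func, args):
    # Compute the result and send it back to the parent through the pipe.
    conn.send(func(*args))
    conn.close()

if __name__ == '__main__':
    some_things = ['a_house', 'a_rocket', 'a_car']  # stand-ins for the real objects
    parent_conns = []
    for thing in some_things:
        parent_conn, child_conn = Pipe()
        Process(target=worker, args=(child_conn, build, (thing,))).start()
        parent_conns.append(parent_conn)
    # Each recv() returns as soon as that particular worker has sent its result.
    results = [conn.recv() for conn in parent_conns]
    print(results)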
Related
I have this code:
import time
from PIL import Image

fog_coeff = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

counter = 0
start = time.time()
for f in fog_coeff:
    foggy_images = am.add_fog(images[0:278], fog_coeff=f)
    for img in foggy_images:
        im = Image.fromarray(img)
        im.save('./result/' + str(counter) + '-' + str(f) + '.jpg')
        counter += 1
print("time taken: " + str(time.time() - start))
I want to parallelize this. How can I do it? My main idea was to take each value from the fog_coeff list and give it to a separate core. Each core would then process 278 images. Is this the right direction? If so, how can I proceed?
You have two options for this: threads or processes. Threads are allowed to share memory, but they are limited in what they can run concurrently, so you can use ordinary variables to share the results, for example, but you will have to use locks to avoid data races.
Processes, on the other hand, do not share memory by default, which gives you full concurrency at the OS level. You will have to use some form of inter-process communication, such as sockets, to send the output back to the main process, or have the workers write their results to files.
The answer would depend on which of these two mechanisms you choose.
Edit: elaborating on multiprocessing.
This is done with the multiprocessing library. You basically define the function you want another process to run and then launch it in a separate process. Processes are handled by the OS, not by Python, so the OS scheduler decides where each process executes. There are higher-level tools like process pools that let you keep, say, four processes running at all times (if you are on a quad-core machine), but you cannot tell the OS how it should schedule those processes; it may also want to run its own background processes.
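As a rough sketch of the process-pool route for the fog example above (am, images and the output path come straight from the question; treat this as illustrative rather than tested):
from multiprocessing import Pool
from PIL import Image

def add_one_coeff(f):
    # Each worker process handles one fog coefficient and its 278 images.
    foggy_images = am.add_fog(images[0:278], fog_coeff=f)
    for i, img in enumerate(foggy_images):
        Image.fromarray(img).save('./result/' + str(i) + '-' + str(f) + '.jpg')

if __name__ == '__main__':
    fog_coeff = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    pool = Pool()                       # one worker per CPU core by default
    pool.map(add_one_coeff, fog_coeff)  # each coefficient goes to a free worker
    pool.close()
    pool.join()
Note that the shared counter from the original loop is replaced by a per-coefficient index, so the workers don't need to share any state.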
I am a beginner in Python threading. I want to create a program that has multiple threads waiting in the background and, at some point, executes a function f(x) asynchronously. f(x) takes a long time to compute (it computes gradients).
I plan to run the program for several steps (e.g. 100 steps), and each step has several values of x (e.g. 10 values), but I want to compute f(x) for all 10 values in parallel to save time.
I looked at the multiprocessing module in Python, but I need help with how to set up the threads and processes.
It's as easy as a python import:
from multiprocessing import Pool
pool = Pool(5)
pool.map(f, [<list of inputs>])
Now, if your asynchronous functions need to save their results to the same place, it gets a little bit trickier:
from multiprocessing import Pool, Manager
from functools import partial

manager = Manager()
l = manager.list()  # use a manager list, as it is safe to share between processes

def func(l, x):
    # ... do the work and store the result in the shared list ...
    l.append(f(x))

pool = Pool(5)
pool.map(partial(func, l), [<list of inputs>])
# result will now be stored in l.
And there you go.
If you want to run a script that fires off parallel tasks and manages a pool of processes, you will want to use multiprocessing.Pool.
However, it's not clear what your platform is; you could look into something like celery to handle queues for you (or AWS Lambda for potentially larger-scale work that would benefit from third party infrastructure management).
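For instance, a minimal sketch of the multiprocessing.Pool pattern for this case, with f standing in for the gradient computation and made-up per-step inputs:
from multiprocessing import Pool

def f(x):
    return x * x  # stand-in for the expensive gradient computation

if __name__ == '__main__':
    pool = Pool(10)                    # one worker per value of x
    for step in range(100):
        xs = range(10)                 # the 10 values of x for this step
        results = pool.map(f, xs)      # blocks until all 10 results are ready
        # ... use 'results' for this step ...
    pool.close()
    pool.join()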
I am trying to insert some number (100) of data sets into SQL Server using Python. I am using multi-threading to create 100 threads in a loop. All of them start at the same time, and this is bogging down the database. I want to group my threads into sets of 5, and once one group is done, I would like to start the next group of threads, and so on. As I am new to Python and multi-threading, any help would be highly appreciated. Please find my code below.
for row in datasets:
    argument1 = row[0]
    argument2 = row[1]
    jobs = []
    t = Thread(target=insertDataIntoSQLServer, args=(argument1, argument2,))
    jobs.append(t)
    t.start()
for t in jobs:
    t.join()
On Python 2 and 3 you could use a multiprocessing.pool.ThreadPool. This is like a multiprocessing.Pool, but using threads instead of processes.
from multiprocessing.pool import ThreadPool

datasets = [(1, 2, 3), (4, 5, 6)]  # Iterable of datasets.

def insertfn(data):
    pass  # shove data to SQL server

pool = ThreadPool()
pool.map(insertfn, datasets)
By default, a Pool will create as many worker threads as your CPU has cores. Using more threads will probably not help, because they will be fighting for CPU time.
Note that I've grouped data into tuples. That is one way to get around the one argument restriction for pool workers.
On Python 3 you can also use a ThreadPoolExecutor.
Note however that on Python implementations (like the "standard" CPython) that have a Global Interpreter Lock, only one thread at a time can be executing Python bytecode. So using large numbers of threads will not automatically increase performance. Threads might help with operations that are I/O bound. If one thread is waiting for I/O, another thread can run.
First note that your code doesn't work as you intended: it sets jobs to an empty list every time through the loop, so after the loop is over you only join() the last thread created.
So repair that, by moving jobs=[] out of the loop. After that, you can get exactly what you asked for by adding this after t.start():
if len(jobs) == 5:
    for t in jobs:
        t.join()
    jobs = []
I'd personally use some kind of pool (as other answers suggest), but it's easy to directly get what you had in mind.
You can create a ThreadPoolExecutor and specify max_workers=5.
See here.
And you can use functools.partial to turn your functions into the required 0-argument functions.
EDIT: You can pass the args in with the function name when you submit to the executor. Thanks, Roland Smith, for reminding me that partial is a bad idea. There was a better way.
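A small sketch of what that looks like, reusing the function and variable names from the question (nothing here is tested against a real database):
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    for row in datasets:
        # submit() takes the callable followed by its arguments directly,
        # so no functools.partial wrapper is needed.
        executor.submit(insertDataIntoSQLServer, row[0], row[1])
# Leaving the with-block waits for all queued inserts to finish.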
I'm writing a python script (for cygwin and linux environments) to run regression testing on a program that is run from the command line using subprocess.Popen(). Basically, I have a set of jobs, a subset of which need to be run depending on the needs of the developer (on the order of 10 to 1000). Each job can take anywhere from a few seconds to 20 minutes to complete.
I have my jobs running successfully across multiple processors, but I'm trying to eke out some time savings by intelligently ordering the jobs (based on past performance) to run the longer jobs first. The complication is that some jobs (steady state calculations) need to be run before others (the transients based on the initial conditions determined by the steady state).
My current method of handling this is to run the parent job and all child jobs recursively on the same process, but some jobs have multiple, long-running children. Once the parent job is complete, I'd like to add the children back to the pool to farm out to other processes, but they would need to be added to the head of the queue. I'm not sure I can do this with multiprocessing.Pool. I looked for examples with Manager, but they all are based on networking it seems, and not particularly applicable. Any help in the form of code or links to a good tutorial on multiprocessing (I've googled...) would be much appreciated. Here's a skeleton of the code for what I've got so far, commented to point out the child jobs that I would like spawned off on other processors.
import multiprocessing
import subprocess
class Job(object):
    def __init__(self, popenArgs, runTime, children):
        self.popenArgs = popenArgs  # list to be fed to Popen
        self.runTime = runTime      # approximate run time for the job
        self.children = children    # jobs that require this job to run first

def runJob(job):
    subprocess.Popen(job.popenArgs).wait()
    ####################################################
    # I want to remove this, and instead kick these back to the pool
    for j in job.children:
        runJob(j)
    ####################################################

def main(jobs):
    # This jobs argument contains only jobs which are ready to be run,
    # ie no children, only parent-less jobs
    jobs.sort(key=lambda job: job.runTime, reverse=True)
    multiprocessing.Pool(4).map(runJob, jobs)
First, let me second Armin Rigo's comment: There's no reason to use multiple processes here instead of multiple threads. In the controlling process you're spending most of your time waiting on subprocesses to finish; you don't have CPU-intensive work to parallelize.
Using threads will also make it easier to solve your main problem. Right now you're storing the jobs in attributes of other jobs, an implicit dependency graph. You need a separate data structure that orders the jobs in terms of scheduling. Also, each tree of jobs is currently tied to one worker process. You want to decouple your workers from the data structure you use to hold the jobs. Then the workers each draw jobs from the same queue of tasks; after a worker finishes its job, it enqueues the job's children, which can then be handled by any available worker.
Since you want the child jobs to be inserted at the front of the line when their parent is finished, a stack-like container would seem to fit your needs; the Queue module provides a thread-safe LifoQueue class that you can use.
import threading
import subprocess
from Queue import LifoQueue
class Job(object):
    def __init__(self, popenArgs, runTime, children):
        self.popenArgs = popenArgs
        self.runTime = runTime
        self.children = children

def run_jobs(queue):
    while True:
        job = queue.get()
        subprocess.Popen(job.popenArgs).wait()
        for child in job.children:
            queue.put(child)
        queue.task_done()

# Parameter 'jobs' contains the jobs that have no parent.
def main(jobs):
    job_queue = LifoQueue()
    num_workers = 4
    jobs.sort(key=lambda job: job.runTime)
    for job in jobs:
        job_queue.put(job)
    for i in range(num_workers):
        t = threading.Thread(target=run_jobs, args=(job_queue,))
        t.daemon = True
        t.start()
    job_queue.join()
A couple of notes: (1) We can't know when all the work is done by monitoring the worker threads, since they don't keep track of the work to be done. That's the queue's job. So the main thread monitors the queue object to know when all the work is complete (job_queue.join()). We can thus mark the worker threads as daemon threads, so the process will exit whenever the main thread does without waiting on the workers. We thereby avoid the need for communication between the main thread and the worker threads in order to tell the latter when to break out of their loops and stop.
(2) We know all the work is done when all tasks that have been enqueued have been marked as done (specifically, when task_done() has been called a number of times equal to the number of items that have been enqueued). It wouldn't be reliable to use the queue's being empty as the condition that all work is done; the queue might be momentarily and misleadingly empty between popping a job from it and enqueuing that job's children.
The scenario:
I have a really large DB model migration going on for a new build, and I'm working out boilerplate for how we will go about migrating current live data from a webapp into the local test databases.
I'd like to set up a Python script that will concurrently process the migration of my models. I have from_legacy and to_legacy methods for my model instances. What I have so far loads all my instances and creates a thread for each, with each thread subclassed from the core threading module, with a run method that just does the conversion and saves the result.
I'd like to make the main loop in the program build a big stack of instances of these threads, and start to process them one by one, running only at most 10 concurrently as it does its work, and feeding the next in to be processed as others finish migrating.
What I can't figure out is how to utilize the queue correctly to do this. If each thread represents the full task of migration, should I load all the instances first and then create a Queue with maxsize set to 10, and have it only track the currently running threads? Something like this, perhaps?
currently_running = Queue(maxsize=10)
for model in models:
    task = Migrate(model)  # this is a subclassed Thread
    currently_running.put(task)
    task.start()
In this case, am I relying on the put call to block while the queue is at capacity? If I were to go this route, how would I call task_done?
Or rather, should the Queue include all the tasks (not just the started ones) and use join to block to completion? Does calling join on a queue of threads start the included threads?
What is the best methodology to approach the "at most have N running threads" problem and what role should the Queue play?
Although not documented, the multiprocessing.pool module has a ThreadPool class which, as its name implies, creates a pool of threads. It shares the same API as the multiprocessing.Pool class.
You can then send tasks to the thread pool using pool.apply_async:
import multiprocessing.pool as mpool
def worker(task):
    # work on task
    print(task)  # substitute your migration code here.

# create a pool of 10 threads
pool = mpool.ThreadPool(10)
N = 100

for task in range(N):
    pool.apply_async(worker, args=(task,))
pool.close()
pool.join()
This should probably be done using semaphores; the example in the documentation is a hint of what you're trying to accomplish.
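A rough sketch of the semaphore approach, assuming Migrate is your Thread subclass and models is your list of instances; the acquire/release pairing is what caps the number of concurrently running threads at 10:
import threading

pool_sema = threading.BoundedSemaphore(value=10)  # at most 10 migrations at once

class Migrate(threading.Thread):
    def __init__(self, model):
        threading.Thread.__init__(self)
        self.model = model

    def run(self):
        try:
            pass  # the from_legacy/to_legacy conversion goes here
        finally:
            pool_sema.release()  # free a slot for the next waiting thread

threads = [Migrate(model) for model in models]  # 'models' is your list of instances
for t in threads:
    pool_sema.acquire()  # blocks once 10 threads are already running
    t.start()
for t in threads:
    t.join()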