Good afternoon,
I am trying to parallelize a linear programming solving scheme; the code is partially reproduced below. The solving method makes use of the PuLP library, which uses subprocesses to run solver commands.
from collections import OrderedDict
from time import time
from multiprocessing import Queue, Process
from queue import Empty
from os import getpid, path, mkdir
import sys
import pulp as plp
SOLVER = None
NUMBER_OF_PROCESSES = 12
# other parameters
def choose_solver():
    """Choose an initial solver"""
    if SOLVER == "CHOCO":
        solver = plp.PULP_CHOCO_CMD()
    elif SOLVER == "GLPK":
        solver = plp.GLPK_CMD(msg=0)
    elif SOLVER == "GUROBI":
        solver = plp.GUROBI_CMD(msg=0)
    else:
        solver = plp.PULP_CBC_CMD(msg=0)
    return solver
# other functions that are not multiprocess relevant
def is_infeasible(status):
    """Wrapper around PuLP infeasible statuses"""
    return status in (plp.LpStatusInfeasible, plp.LpStatusUndefined)
def feasible_problems(input_var, output_var, initial_problem, solver):
    """Perform LP solving on an initial
    problem, return the feasible ones"""
    input_gt = input_var - TOL >= 0
    input_lt = input_var + TOL <= 0
    output_eq_input = (output_var - input_var == 0)
    output_eq_zero = (output_var == 0)
    problem_a = initial_problem.deepcopy()
    problem_a += input_gt
    problem_a += output_eq_input
    problem_b = initial_problem.deepcopy()
    problem_b += input_lt
    problem_b += output_eq_zero
    problem_a.solve(solver)
    problem_b.solve(solver)
    status_act = problem_a.status
    status_inact = problem_b.status
    if is_infeasible(status_act):
        return (problem_b,)
    else:
        if is_infeasible(status_inact):
            return (problem_a,)
        else:
            return (problem_a, problem_b)
def worker(q, r, start_problem, start_idx, to_check):
    """Worker spawned in a new process.
    Iterates over the neuron expression list.
    Sends a new job to the tasks queue if two activations are available.
    """
    problem = start_problem
    solver = choose_solver()
    for idx in range(start_idx, len(to_check) + 1):
        if idx == len(to_check):
            r.put_nowait(problem)
        else:
            output_var, input_var = to_check[idx]
            pbs = feasible_problems(input_var, output_var, problem, solver)
            if len(pbs) == 1:
                problem = pbs[0]
            elif len(pbs) == 2:
                q.put_nowait((idx + 1, pbs[0]))
                problem = pbs[1]
def overseer(init_prob, neuron_exprs):
    """Running in the initial process,
    this function creates the tasks and results queues,
    keeps track of the currently running processes
    and spawns new processes when there are enough resources
    for them to run.
    """
    tasks = Queue()
    results = Queue()
    working_processes = {}
    init_p = Process(target=worker,
                     args=(tasks, results, init_prob, 0, neuron_exprs))
    init_p.start()
    working_processes[init_p.pid] = init_p
    res_list = []
    while len(working_processes) > 0:
        if len(working_processes) <= NUMBER_OF_PROCESSES:
            # if there is enough room in the working queue,
            # spawn a new process and add it
            try:
                (idx, problem) = tasks.get(timeout=1)
            except Empty:
                break
            proc = Process(target=worker, args=(tasks,
                           results, problem, idx, neuron_exprs))
            proc.start()
            working_processes[proc.pid] = proc
        to_del = []
        for pid in working_processes:
            pwork = working_processes[pid]
            pwork.join(timeout=0)
            if pwork.exitcode is not None:
                to_del.append(pid)
        for pid in to_del:
            # deleting working process
            del working_processes[pid]
    results.join_thread()
    for i in range(results.qsize()):
        elt = results.get()
        res_list.append(elt)
    return res_list
def test_multi(init_prob, neuron_exprs):
    print("Testing multi process mode")
    now = time()
    init_prob, exprs = # some function that calculates those
    res = overseer(init_prob, exprs)
    print("Time spent: {:.4f}s".format(time() - now))
    for idx, problem in enumerate(res):
        if not path.exists("results"):
            mkdir("results")
        problem.writeLP("results/" + str(idx))
if __name__ == '__main__':
    torch_model = read_model(MODEL_PATH)
    print("Number of neurons: ", count_neurons(torch_model))
    print("Expected number of facets: ",
          theoretical_number(torch_model, DIM_INPUT))
    prob, to_check, hp, var_dict = init_problem(torch_model)
    test_multi(prob, to_check)
In my worker, I perform some costly calculations that may result in two different problems; if that happens, I send one problem to the tasks queue while keeping the other for the current worker process. My overseer takes a task from the queue and launches a new process when it can. to_check is a list of PuLP expressions.
What I want to do is fill the working_processes dictionary with processes that are actually running, then look for their results at each iteration and remove those that have finished. The expected behaviour would be to keep spawning new processes as old ones terminate, which does not seem to be the case: instead I am hanging indefinitely. I successfully take the tasks from the queue, but my program hangs once I spawn more than NUMBER_OF_PROCESSES.
I'm quite new to multiprocessing, so maybe there is something wrong with how I'm doing it. Does anyone have any idea?
Take a look at the ProcessPoolExecutor from concurrent.futures.
Executor objects allow you to specify a pool of workers with a capped size. You can submit all your jobs at once, and the executor runs through them, picking up new jobs as old ones complete.
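For instance, here is a minimal sketch of that approach (not the poster's actual code: solve_from and run_all are made-up names, and it assumes the worker is refactored to return the subproblems it spawns instead of pushing them onto a shared queue):
from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

def solve_from(idx, problem, to_check):
    """Hypothetical worker: advance one branch and return any spawned subproblems."""
    spawned = []
    # ... same loop as worker(), but instead of q.put_nowait((idx + 1, pb))
    # append (idx + 1, pb) to spawned ...
    return problem, spawned

def run_all(init_prob, to_check, max_workers=12):  # e.g. NUMBER_OF_PROCESSES
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(solve_from, 0, init_prob, to_check)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                finished, spawned = fut.result()
                results.append(finished)
                # resubmit spawned subproblems; the executor caps the concurrency
                for idx, pb in spawned:
                    pending.add(pool.submit(solve_from, idx, pb, to_check))
    return results
The executor replaces the hand-rolled overseer/queue bookkeeping, so nothing ever blocks on an undrained multiprocessing.Queue.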
Related
I'm trying to use multiprocessing for a function that can potentially segfault (I have no control over this ATM). In cases where the child process hits a segfault, I want only that child to fail, but all other child tasks to continue and return their results.
I've already switched from multiprocessing.Pool to concurrent.futures.ProcessPoolExecutor to avoid the issue of the child process hanging forever (or until an arbitrary timeout) as documented in this bug: https://bugs.python.org/issue22393.
However, the issue I face now is that when the first child task hits a segfault, all in-flight child processes get marked as broken (concurrent.futures.process.BrokenProcessPool).
Is there a way to only mark actually broken child processes as broken?
Code I'm running in Python 3.7.4:
import concurrent.futures
import ctypes
from time import sleep

def do_something(x):
    print(f"{x}; in do_something")
    sleep(x*3)
    if x == 2:
        # raise a segmentation fault internally
        return x, ctypes.string_at(0)
    return x, x-1

nums = [1, 2, 3, 1.5]
executor = concurrent.futures.ProcessPoolExecutor()
result_futures = []
for num in nums:
    # Using submit with a list instead of map lets you get past the first exception
    # Example: https://stackoverflow.com/a/53346191/7619676
    future = executor.submit(do_something, num)
    result_futures.append(future)

# Wait for all results
concurrent.futures.wait(result_futures)

# After a segfault is hit for any child process (i.e. it is "terminated abruptly"),
# the process pool becomes unusable and all running/pending child processes'
# results are set to broken
for future in result_futures:
    try:
        print(future.result())
    except concurrent.futures.process.BrokenProcessPool:
        print("broken")
Result:
(1, 0)
broken
broken
(1.5, 0.5)
Desired result:
(1, 0)
broken
(3, 2)
(1.5, 0.5)
multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor both make assumptions about how to handle the concurrency of the interactions between the workers and the main process that are violated if any one process is killed or segfaults, so they do the safe thing and mark the whole pool as broken. To get around this, you will need to build up your own pool with different assumptions directly using multiprocessing.Process instances.
This might sound intimidating but a list and a multiprocessing.Manager will get you pretty far:
import multiprocessing
import ctypes
import queue
from time import sleep

def do_something(job, result):
    while True:
        x = job.get()
        print(f"{x}; in do_something")
        sleep(x*3)
        if x == 2:
            # raise a segmentation fault internally
            return x, ctypes.string_at(0)
        result.put((x, x-1))

nums = [1, 2, 3, 1.5]

if __name__ == "__main__":
    # you ARE using the spawn context, right?
    ctx = multiprocessing.get_context("spawn")
    manager = ctx.Manager()
    job_queue = manager.Queue(maxsize=-1)
    result_queue = manager.Queue(maxsize=-1)
    pool = [
        ctx.Process(target=do_something, args=(job_queue, result_queue), daemon=True)
        for _ in range(multiprocessing.cpu_count())
    ]
    for proc in pool:
        proc.start()
    for num in nums:
        job_queue.put(num)
    try:
        while True:
            # Timeout is our only signal that no more results coming
            print(result_queue.get(timeout=10))
    except queue.Empty:
        print("Done!")
    print(pool)  # will see one dead Process
    for proc in pool:
        proc.kill()  # avoid stderr spam
This "Pool" is a little inflexible, and you will probably want to customize it for your application's specific needs. But you can definitely skip right over segfaulting workers.
When I went down this rabbit hole, where I was interested in cancelling specific submissions to a worker pool, I eventually wound up writing a whole library to integrate into Trio async apps: trio-parallel. Hopefully you won't need to go that far!
Based on @Richard Sheridan's answer, I ended up using the code below. This version doesn't require setting a timeout, which is something I couldn't do for my use case.
import ctypes
import multiprocessing
from typing import Dict
from time import sleep

def do_something(x, result):
    print(f"{x} starting")
    sleep(x * 3)
    if x == 2:
        # raise a segmentation fault internally
        y = ctypes.string_at(0)
    y = x
    print(f"{x} done")
    result.put(y)

def wait_for_process_slot(
    processes: Dict,
    concurrency: int = multiprocessing.cpu_count() - 1,
    wait_sec: int = 1,
) -> int:
    """Blocks main process if `concurrency` processes are already running.
    Alternative to `multiprocessing.Semaphore.acquire`
    useful for when child processes might fail and not be able to signal.
    Relies instead on the main's (parent's) tracking of `multiprocessing.Process`es.
    """
    counter = 0
    while True:
        counter = sum([1 for i, p in processes.items() if p.is_alive()])
        if counter < concurrency:
            return counter
        sleep(wait_sec)

if __name__ == "__main__":
    # "spawn" results in an OSError b/c pickling a segfault fails?
    ctx = multiprocessing.get_context()
    manager = ctx.Manager()
    results_queue = manager.Queue(maxsize=-1)

    concurrency = multiprocessing.cpu_count() - 1  # reserve 1 CPU for waiting
    nums = [3, 1, 2, 1.5]
    all_processes = {}
    for idx, num in enumerate(nums):
        num_running_processes = wait_for_process_slot(all_processes, concurrency)
        p = ctx.Process(target=do_something, args=(num, results_queue), daemon=True)
        all_processes.update({idx: p})
        p.start()

    # Wait for the last batch of processes not blocked by wait_for_process_slot to finish
    for p in all_processes.values():
        p.join()

    # Check last batch of processes for bad processes
    # Relies on all processes having finished (the p.joins above)
    bad_nums = [idx for idx, p in all_processes.items() if p.exitcode != 0]
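Not part of the original answer, but as a possible follow-up: since every process has been joined at this point, results_queue can be drained without racing a producer, and bad_nums tells you which inputs never produced a result. A sketch of such a continuation inside the `if __name__ == "__main__":` block:
    # Hypothetical continuation: all workers are joined, so the queue already
    # holds every result that will ever arrive.
    survivors = []
    while not results_queue.empty():
        survivors.append(results_queue.get_nowait())
    print("succeeded:", survivors)
    print("crashed inputs:", [nums[idx] for idx in bad_nums])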
I have some expensive long-running functions that I'd like to run on multiple cores. This is easy to do with multiprocessing. But I will also need to periodically run a function that calculates a value based on the state (global variables) of a specific process. I think this should be possible by simply spawning a thread on the subprocess.
Here's a simplified example. Please suggest how I can call process_query_state().
import multiprocessing
import time

def process_runner(x: int):
    global xx
    xx = x
    while True:
        time.sleep(0.1)
        xx += 1  # actually an expensive calculation

def process_query_state() -> int:
    y = xx * 2  # actually an expensive calculation
    return y

def main():
    processes = {}
    for x in range(10):
        p = multiprocessing.get_context('spawn').Process(target=process_runner, args=(x,))
        p.start()
        processes[x] = p

    while True:
        time.sleep(1)
        print(processes[3].process_query_state())  # this doesn't actually work

if __name__ == '__main__':
    main()
I see two problems:
A Process is not an RPC (Remote Procedure Call) mechanism, so you can't execute another function like process_query_state from the main process. You can only use a queue to send some information to the other process, and that process has to periodically check whether there is a new message.
A process can run only one function, so it would either have to pause that function when it gets a message to run the other function, or it would have to run threads inside the process to run many functions at the same time.
EDIT: This may cause another problem: if two functions work on the same data at the same time, one can change a value before the other has used the old value, and this can produce wrong results.
I created an example that uses queues to send a message to process_runner; it periodically checks whether there is a message, runs process_query_state, and sends the result back to the main process.
The main process waits for the result from the selected process (this blocks the code), but if you want to work with more processes it would have to be made more complex.
import multiprocessing
import time

def process_query_state():
    y = xx * 2  # actually an expensive calculation
    return y

def process_runner(x: int, queue_in, queue_out):
    global xx
    xx = x

    # reverse direction
    q_in = queue_out
    q_out = queue_in

    while True:
        time.sleep(0.1)
        xx += 1  # actually an expensive calculation

        # run other function - it will block main calculations
        # but this way it will use correct `xx` (other calculations will not change it)
        if not q_in.empty():
            if q_in.get() == 'run':
                result = process_query_state()
                q_out.put(result)

def main():
    processes = {}

    for x in range(4):
        ctx = multiprocessing.get_context('spawn')
        q_in = ctx.Queue()
        q_out = ctx.Queue()
        p = ctx.Process(target=process_runner, args=(x, q_in, q_out))
        p.start()
        processes[x] = (p, q_in, q_out)

    while True:
        time.sleep(1)

        q_in = processes[3][1]
        q_out = processes[3][2]

        q_out.put('run')

        # non blocking version
        #if not q_in.empty():
        #    print(q_in.get())

        # blocking version
        print(q_in.get())

if __name__ == '__main__':
    main()
I have a large number of tasks (40,000 to be exact) that I am using a Pool to run in parallel. To maximize efficiency, I pass the list of all tasks at once to starmap and let them run.
I would like it so that if my program is interrupted with Ctrl+C, the currently running tasks are allowed to finish but no new ones are started. I have figured out the signal handling needed to catch the Ctrl+C just fine using the recommended method, and this works well (at least with Python 3.6.9, which I am using):
import os
import signal
import random as rand
import multiprocessing as mp

def init():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def child(a, b, c):
    st = rand.randrange(5, 20+1)
    print("Worker thread", a+1, "sleep for", st, "...")
    os.system("sleep " + str(st))

pool = mp.Pool(initializer=init)
try:
    pool.starmap(child, [(i, 2*i, 3*i) for i in range(10)])
    pool.close()
    pool.join()
    print("True exit!")
except KeyboardInterrupt:
    pool.terminate()
    pool.join()
    print("Interrupted exit!")
The problem is that Pool seems to have no function to let the currently running tasks complete and then stop; it only has terminate and close. In the example above I use terminate, but this is not what I want, as it immediately kills all running tasks (whereas I want the currently running tasks to run to completion). close, on the other hand, simply prevents adding more tasks, but calling close then join will wait for all pending tasks to complete (40,000 of them in my real case), whereas I only want the currently running tasks to finish, not all of them.
I could gradually add my tasks one by one or in chunks so I could use close and join when interrupted, but this seems less efficient unless there is a way to manually add a new task as soon as one finishes (which I'm not seeing how to do from the Pool documentation). It really seems like my use case would be common and that Pool should have a function for this, but I have not seen this question asked anywhere (or maybe I'm just not searching for the right thing).
Does anyone know how to accomplish this easily?
I tried to do something similar with concurrent.futures - see the last code block in this answer: it attempts to throttle adding tasks to the pool and only adds new tasks as tasks complete. You could change the logic to fit your needs. Maybe keep the number of pending work items slightly greater than the number of workers so you don't starve the executor. Something like:
import concurrent.futures
import random as rand
import signal
import sys
import time

def child(*args, n=0):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    a, b, c = args
    st = rand.randrange(1, 5)
    time.sleep(st)
    x = f"Worker {n} thread {a+1} slept for {st} - args:{args}"
    return (n, x)

if __name__ == '__main__':
    nworkers = 5  # ncpus?
    results = []
    fs = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        data = ((i, 2*i, 3*i) for i in range(100))
        for n, args in enumerate(data):
            try:
                # limit pending tasks
                while len(executor._pending_work_items) >= nworkers + 2:
                    # wait till one completes and get the result
                    futures = concurrent.futures.wait(fs, return_when=concurrent.futures.FIRST_COMPLETED)
                    #print(futures)
                    results.extend(future.result() for future in futures.done)
                    print(f'{len(results)} results so far')
                    fs = list(futures.not_done)
                print(f'add a new task {n}')
                fs.append(executor.submit(child, *args, **{'n': n}))
            except KeyboardInterrupt as e:
                print('ctrl-c!!', file=sys.stderr)
                # don't add any more tasks
                break
        # get leftover results as they finish
        for future in concurrent.futures.as_completed(fs):
            print(f'{len(executor._pending_work_items)} tasks pending:')
            result = future.result()
            results.append(result)
    results.sort()
    # separate the results from the value used to sort
    for n, result in results:
        print(result)
Here is a way to get the results sorted in submission order without modifying the task. It uses a dictionary to relate each future to its submission order and uses it for the sort key.
# same imports
def child(*args):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    a, b, c = args
    st = rand.randrange(1, 5)
    time.sleep(st)
    x = f"Worker thread {a+1} slept for {st} - args:{args}"
    return x

if __name__ == '__main__':
    nworkers = 5  # ncpus?
    sort_dict = {}
    results = []
    fs = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        data = ((i, 2*i, 3*i) for i in range(100))
        for n, args in enumerate(data):
            try:
                # limit pending tasks
                while len(executor._pending_work_items) >= nworkers + 2:
                    # wait till one completes and grab it
                    futures = concurrent.futures.wait(fs, return_when=concurrent.futures.FIRST_COMPLETED)
                    results.extend(future for future in futures.done)
                    print(f'{len(results)} futures completed so far')
                    fs = list(futures.not_done)
                future = executor.submit(child, *args)
                fs.append(future)
                print(f'task {n} added - future:{future}')
                sort_dict[future] = n
            except KeyboardInterrupt as e:
                print('ctrl-c!!', file=sys.stderr)
                # don't add any more tasks
                break
        # get leftover futures as they finish
        for future in concurrent.futures.as_completed(fs):
            print(f'{len(executor._pending_work_items)} tasks pending:')
            results.append(future)
    # sort the futures
    results.sort(key=lambda f: sort_dict[f])
    # get the results
    for future in results:
        print(future.result())
You could also just add an attribute to each future and sort on that (no need for the dictionary)
...
future = executor.submit(child, *args)
# add an attribute to the future that can be sorted on
future.submitted = n
fs.append(future)
...
results.sort(key=lambda f: f.submitted)
I have several processes, say A_step1, A_step2, B_step1, B_step2... They must run such that step1 finishes before step2 starts. This is what I've done:
from subprocess import check_call
check_call(A_step1)
check_call(A_step2)
check_call(B_step1)
check_call(B_step2)
However, I want the A and B processes to run in parallel. Is there any way to achieve this in Python?
Many thanks
You can probably put the related processes in a function and then run those functions asynchronously. For the asynchronous part, I would recommend the multiprocessing module.
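For example, here is a minimal sketch of that idea (the echo commands below are placeholders for the real A_step1/A_step2/B_step1/B_step2 commands):
from multiprocessing import Process
from subprocess import check_call

# Placeholder commands; substitute the real A_step1, A_step2, B_step1, B_step2.
A_step1, A_step2 = ["echo", "A step 1"], ["echo", "A step 2"]
B_step1, B_step2 = ["echo", "B step 1"], ["echo", "B step 2"]

def run_chain(*commands):
    """Run the given commands sequentially, so step 1 finishes before step 2 starts."""
    for cmd in commands:
        check_call(cmd)

if __name__ == "__main__":
    # Each chain gets its own process, so the A and B chains run in parallel.
    pa = Process(target=run_chain, args=(A_step1, A_step2))
    pb = Process(target=run_chain, args=(B_step1, B_step2))
    pa.start()
    pb.start()
    pa.join()
    pb.join()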
One common strategy is to use queues as a mechanism to allow a coordinator (typically your main process) to dole out work and as a way to allow workers to tell the coordinator when they have completed something.
Here is a simplified example. You can experiment with the random sleep times to convince yourself that none of the step-2 work will begin until both workers have finished their step-1 jobs.
from multiprocessing import Process, Manager
from time import sleep
from random import randint

def main():
    # Some queues so that we can tell the workers to advance
    # to the next step, and so that the workers can tell
    # us when they have completed a step.
    workQA = Manager().Queue()
    workQB = Manager().Queue()
    progQ = Manager().Queue()

    # Start the worker processes.
    pA = Process(target=workerA, args=(workQA, progQ))
    pB = Process(target=workerB, args=(workQB, progQ))
    pA.start()
    pB.start()

    # Step through some work.
    for step in (1, 2):
        workQA.put(step)
        workQB.put(step)
        done = []
        while True:
            item_done = progQ.get()
            print item_done
            done.append(item_done)
            if len(done) == 2:
                break

    # Tell the workers to stop and wait for everything to wrap up.
    workQA.put('stop')
    workQB.put('stop')
    pA.join()
    pB.join()

def workerA(workQ, progQ):
    do_work('A', workQ, progQ)

def workerB(workQ, progQ):
    do_work('B', workQ, progQ)

def do_work(worker, workQ, progQ):
    # Of course, in your real code the two workers won't
    # be doing the same thing.
    while True:
        step = workQ.get()
        if step == 1:
            do_step(worker, step, progQ)
        elif step == 2:
            do_step(worker, step, progQ)
        else:
            return

def do_step(worker, step, progQ):
    n = randint(1, 5)
    msg = 'worker={} step={} sleep={}'.format(worker, step, n)
    sleep(n)
    progQ.put(msg)

main()
Example output:
worker=B step=1 sleep=2
worker=A step=1 sleep=4
worker=A step=2 sleep=1
worker=B step=2 sleep=3
class Job(object):
    def __init__(self, name):
        self.name = name
        self.depends = []
        self.waitcount = 0

    def work(self):
        pass  # does some work

    def add_dependent(self, another_job):
        self.depends.append(another_job)
        self.waitcount += 1
so, waitcount is based on the number of jobs you have in depends
job_board = {}

# create a dependency tree
for i in range(1000):
    # create random jobs
    j = Job(<new name goes here>)
    # add jobs to depends if dependent
    # record it in job_board
    job_board[j.name] = j

# example
# jobC is in self.depends of jobA and jobB
# jobC would have a waitcount of 2

rdyQ = Queue.Queue()

def worker():
    try:
        job = rdyQ.get()
        success = job.work()
        # if this job was successful create dependent jobs
        if success:
            for dependent_job in job.depends:
                dependent_job.waitcount -= 1
                if dependent_job.waitcount == 0:
                    rdyQ.put(dependent_job)
and then I would create threads:
for i in range(10):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

for job_name, job_obj in job_board.iteritems():
    if job_obj.waitcount == 0:
        rdyQ.put(job_obj)

while True:
    # until all jobs finished, wait
    pass
Now here is an example:
# example
# jobC is in self.depends of jobA and jobB
# jobC would have a waitcount of 2
Now in this scenario, when both jobA and jobB were running and both tried to decrement the waitcount of jobC, weird things were happening, so I put in a lock:
waitcount_lock = threading.Lock()
and changed this code to:
# if this job was successful create dependent jobs
if success:
    for dependent_job in job.depends:
        with waitcount_lock:
            dependent_job.waitcount -= 1
            if dependent_job.waitcount == 0:
                rdyQ.put(dependent_job)
and strange things still happen, i.e. the same job is being processed by multiple threads, as if the job was put into the queue twice.
Is it not best practice to have/modify nested objects when complex objects are being passed amongst threads?
Here's a complete, executable program that appears to work fine. I expect you're mostly seeing "weird" behavior because, as I suggested in a comment, you're counting job successors instead of job predecessors. So I renamed things with "succ" and "pred" in their names to make that much clearer. daemon threads are also usually a Bad Idea, so this code arranges to shut down all the threads cleanly when the work is over. Note too the use of assertions to verify that implicit beliefs are actually true ;-)
import threading
import Queue
import random

NTHREADS = 10
NJOBS = 10000

class Job(object):
    def __init__(self, name):
        self.name = name
        self.done = False
        self.succs = []
        self.npreds = 0

    def work(self):
        assert not self.done
        self.done = True
        return True

    def add_dependent(self, another_job):
        self.succs.append(another_job)
        another_job.npreds += 1

def worker(q, lock):
    while True:
        job = q.get()
        if job is None:
            break
        success = job.work()
        if success:
            for succ in job.succs:
                with lock:
                    assert succ.npreds > 0
                    succ.npreds -= 1
                    if succ.npreds == 0:
                        q.put(succ)
        q.task_done()

jobs = [Job(i) for i in range(NJOBS)]
for i, job in enumerate(jobs):
    # pick some random successors
    possible = xrange(i+1, NJOBS)
    succs = random.sample(possible,
                          min(len(possible),
                              random.randrange(10)))
    for succ in succs:
        job.add_dependent(jobs[succ])

q = Queue.Queue()
for job in jobs:
    if job.npreds == 0:
        q.put(job)
print q.qsize(), "ready jobs initially"

lock = threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(q, lock))
           for _ in range(NTHREADS)]
for t in threads:
    t.start()

q.join()
# add sentinels so threads end cleanly
for t in threads:
    q.put(None)
for t in threads:
    t.join()
for job in jobs:
    assert job.done
    assert job.npreds == 0
CLARIFYING THE LOCK
In a sense, the lock in this code protects "too much". The potential problem it's addressing is that multiple threads may try to decrement the .npreds member of the same Job object simultaneously. Without mutual exclusion, the stored value at the end of that may be anywhere from 1 smaller than its initial value, to the correct result (the initial value minus the number of threads trying to decrement it).
But there's no need to also mutate the queue under lock protection. Queues do their own thread-safe locking. So, e.g., the code could be written like so instead:
for succ in job.succs:
    with lock:
        npreds = succ.npreds = succ.npreds - 1
    assert npreds >= 0
    if npreds == 0:
        q.put(succ)
It's generally best practice to hold a lock for as little time as possible. However, I find this rewrite harder to follow. Pick your poison ;-)