CPU locked when using multiprocessing with Python

My code:
import multiprocessing as mp
import Queue    # Python 2 stdlib module; mp.Queue.get(False) raises Queue.Empty

def create_rods(folder="./", kappas=10, allowed_kappa_error=.3,
                radius_correction_ratio=0.1):
    """
    Create one rod for each rod_data and for each file
    returns [RodGroup1, RodGroup2, ...]
    """
    names, files = import_files(folder=folder)
    if len(files) == 0:
        print "No files to import."
        raise ValueError
    states = [None for dummy_ in range(len(files))]
    processes = []
    states_queue = mp.Queue()
    for index in range(len(files)):
        process = mp.Process(target=create_rods_process,
                             args=(kappas, allowed_kappa_error,
                                   radius_correction_ratio, names,
                                   files, index, states_queue))
        processes.append(process)
    run_processes(processes)    # This part seems to take a lot of time.
    try:
        while True:
            [index, state] = states_queue.get(False)
            states[index] = state
    except Queue.Empty:
        pass
    return names, states

def create_rods_process(kappas, allowed_kappa_error,
                        radius_correction_ratio, names,
                        files, index, states_queue):
    """
    Process of method.
    """
    state = SystemState(kappas, allowed_kappa_error,
                        radius_correction_ratio, names[index])
    data = import_data(files[index])
    for dataline in data:
        parameters = tuple(dataline)
        new_rod = Rod(parameters)
        state.put_rod(new_rod)
    state.check_rods()
    states_queue.put([index, state])

def run_processes(processes, time_out=None):
    """
    Runs all processes using all cores.
    """
    running = []
    cpus = mp.cpu_count()
    try:
        while True:
            #for cpu in range(cpus):
            next_process = processes.pop()
            running.append(next_process)
            next_process.start()
    except IndexError:
        pass
    if not time_out:
        try:
            while True:
                for process in running:
                    if not process.is_alive():
                        running.remove(process)
        except TypeError:
            pass
    else:
        for process in running:
            process.join(time_out)
I expect the processes to end, but one process gets stuck. I don't know whether the problem is in the run_processes() method or in the create_rods() method. With join() the CPUs are freed, but the program doesn't go on.

From Python's multiprocessing programming guidelines:
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the Queue.cancel_join_thread method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
Joining processes before draining their Queues results in a deadlock. You need to be sure the queues are emptied before joining the processes.
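A minimal sketch of that ordering, applied to your pattern: drain the result queue first (one result per child is assumed here), and only then join the children. The worker below is just a stand-in for your create_rods_process, not your actual code.
import multiprocessing as mp

def worker(index, out_q):
    # Placeholder for the per-file work; any picklable result is fine.
    out_q.put([index, "state for %d" % index])

def run_all(n_items):
    out_q = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, out_q)) for i in range(n_items)]
    for p in procs:
        p.start()
    # Drain the queue FIRST: one result is expected per child, so get()
    # exactly n_items results before joining.
    results = [None] * n_items
    for _ in range(n_items):
        index, state = out_q.get()      # blocks until a child reports
        results[index] = state
    # Only now is it safe to join: the children's feeder threads have
    # nothing left to flush into the underlying pipe.
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    print(run_all(4))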

Related

Get error flag/message from a queued process in Python multiprocessing

I am preparing a Python multiprocessing tool where I use Process and Queue commands. The queue puts another script in a process to run in parallel. As a sanity check, I want to check in the queue whether any error happened in my other script and return a flag/message if there was one (status = os.system() will run the process and status is a flag for error). But I can't pass errors from the queue/child in the consumer process back to the parent process. The following are the main parts of my code (shortened):
import os
import time
from multiprocessing import Process, Queue, Lock

command_queue = Queue()
lock = Lock()

p = Process(target=producer, args=(command_queue, lock, test_config_list_path))
for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock))
    consumers.append(c)

p.daemon = True
p.start()
for c in consumers:
    c.daemon = True
    c.start()

p.join()
for c in consumers:
    c.join()

if error_flag:
    Stop_this_process_and_send_a_message!

def producer(queue, lock, ...):
    for config_path in test_config_list_path:
        queue.put((config_path, process_to_be_queued))

def consumer(queue, lock):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag = 1
    time.sleep(3)
Now I want to get that error_flag and use it in the main code to handle things, but it seems I can't pass error_flag from the consumer (child) processes back to the main part of the code. I'd appreciate it if someone could help with this.
Given your update, I also pass a multiprocessing.Event instance to your to_do process. This allows you to simply wait on the event in the main process, which blocks until set is called on it. Naturally, when to_do or one of its threads detects a script error, it would call set on the event after setting error_flag.value to True. This wakes up the main process, which can then call terminate on the process, which will do what you want. On a normal completion of to_do, it is still necessary to call set on the event, since the main process blocks until the event has been set; in that case the main process will just call join on the process.
Using a multiprocessing.Value instance alone would have required periodically checking its value in a loop, so I think waiting on a multiprocessing.Event is better. I have also made a couple of other updates to your code with comments, so please review them:
import multiprocessing
from ctypes import c_bool
...

def to_do(event, error_flag):
    # Run the tests
    wrapper_threads.main(event, error_flag)
    # on error or normal process completion:
    event.set()

def git_pull_change(path_to_repo):
    repo = Repo(path)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        # Call to time.sleep(some_number_of_seconds) should go here, right?
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    while True:
        status = git_pull_change(git_path)
        if status:
            # The repo was just pulled, so no point in doing it again:
            #repo = Repo(git_path)
            #repo.remotes.origin.pull()
            event = multiprocessing.Event()
            error_flag = multiprocessing.Value(c_bool, False, lock=False)
            process = multiprocessing.Process(target=to_do, args=(event, error_flag))
            process.start()
            # wait for an error or normal process completion:
            event.wait()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                process.terminate()    # Kill the process
            else:
                process.join()
            break
You should always tag multiprocessing questions with the platform you are running on. Since I do not see your process-creating code within an if __name__ == '__main__': block, I have to assume you are running on a platform that uses OS fork calls to create new processes, such as Linux.
That means your newly created processes inherit the value of error_flag when they are created but for all intents and purposes, if a process modifies this variable, it is modifying a local copy of this variable that exists in an address space that is unique to that process.
You need to create error_flag in shared memory and pass it as an argument to your process:
from multiprocessing import Value
from ctypes import c_bool
...

error_flag = Value(c_bool, False, lock=False)

for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock, error_flag))
    consumers.append(c)
...
if error_flag.value:
    ...
    #Stop_this_process_and_send_a_message!

def consumer(queue, lock, error_flag):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag.value = True
    time.sleep(3)
But I have a few questions/comments for you. You have the following statement in your original code:
if error_flag:
    Stop_this_process_and_send_a_message!
This statement is located after you have already joined all the started processes. So what processes are there to stop, and where are you sending a message to? You have potentially multiple consumers, any of which might be setting error_flag (by the way, there is no need to do this under a lock, since setting the value to True is an atomic action). And since you are joining all your processes, i.e. waiting for them to complete, I am not sure why you are making them daemon processes. You are also passing a Lock instance to your producer and consumers, but it is not being used at all.
Your consumers return when they get a None record from the queue. So if you have N consumers, the last N items put on the queue need to be None sentinels.
I also see no need for having the producer process. The main process could just as well write all the records to the queue, either before or even after it starts the consumer processes (see the sketch below).
The call to time.sleep(3) you have at the end of function consumer is unreachable.
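A minimal sketch of that simplification, under the stated assumptions: no separate producer process, one None sentinel per consumer, and error_flag shared via multiprocessing.Value. The command strings are placeholders; in your code each queue item was a (config_path, command) tuple.
import os
from ctypes import c_bool
from multiprocessing import Process, Queue, Value

def consumer(queue, error_flag):
    while True:
        elem = queue.get()
        if elem is None:            # sentinel: no more work for this consumer
            return
        status = os.system(elem)    # elem is the command line to run (placeholder)
        if status:
            error_flag.value = True

if __name__ == '__main__':
    commands = ['echo test %d' % i for i in range(10)]   # placeholder work items
    consumer_num = 3
    queue = Queue()
    error_flag = Value(c_bool, False, lock=False)
    for cmd in commands:
        queue.put(cmd)
    for _ in range(consumer_num):
        queue.put(None)              # one sentinel per consumer
    consumers = [Process(target=consumer, args=(queue, error_flag))
                 for _ in range(consumer_num)]
    for c in consumers:
        c.start()
    for c in consumers:
        c.join()
    if error_flag.value:
        print('At least one command failed.')
    else:
        print('All commands succeeded.')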
Update: the code summary above is the inner process that runs some tests in parallel. I removed the def wrapper from it; just assume it is the wrapper_threads in the following code summary. Here I'll add the parent process, which checks a variable (let's assume a commit in my git repo). The following process is meant to run indefinitely, and when there is a change it will trigger the multiprocessing code in the main question:
def to_do():
    # Run the tests
    wrapper_threads.main()

def git_pull_change(path_to_repo):
    repo = Repo(path)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    process = None
    while True:
        status = git_pull_change(git_path)
        if status:
            repo = Repo(git_path)
            repo.remotes.origin.pull()
            process = multiprocessing.Process(target=to_do)
            process.start()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                os.system('pkill -U user XXX')
                break
Now I want to propagate that error_flag from the child process to this process and stop process XXX. The problem is that I don't know how to bring that error_flag to this (grand)parent process.

ProcessPoolExecutor, BrokenProcessPool handling

In this documentation ( https://pymotw.com/3/concurrent.futures/ ) it says:
"The ProcessPoolExecutor works in the same way as ThreadPoolExecutor, but uses processes instead of threads. This allows CPU-intensive operations to use a separate CPU and not be blocked by the CPython interpreter’s global interpreter lock."
This sounds great! It also says:
"If something happens to one of the worker processes to cause it to exit unexpectedly, the ProcessPoolExecutor is considered “broken” and will no longer schedule tasks."
This sounds bad :( So I guess my question is: what is considered "unexpectedly"? Does that just mean the exit signal is not 1? Can I safely exit the thread and still keep processing a queue? The example is as follows:
from concurrent import futures
import os
import signal

with futures.ProcessPoolExecutor(max_workers=2) as ex:
    print('getting the pid for one worker')
    f1 = ex.submit(os.getpid)
    pid1 = f1.result()

    print('killing process {}'.format(pid1))
    os.kill(pid1, signal.SIGHUP)

    print('submitting another task')
    f2 = ex.submit(os.getpid)
    try:
        pid2 = f2.result()
    except futures.process.BrokenProcessPool as e:
        print('could not start new tasks: {}'.format(e))
I haven't seen it happen in real life, but from the code it looks like "unexpectedly" means that the set of ready objects returned by wait does not contain the result queue's reader; a worker's sentinel became readable instead, because the worker died.
from concurrent.futures.process:
reader = result_queue._reader

while True:
    _add_call_item_to_queue(pending_work_items,
                            work_ids_queue,
                            call_queue)

    sentinels = [p.sentinel for p in processes.values()]
    assert sentinels
    ready = wait([reader] + sentinels)
    if reader in ready:    # <===================================== THIS
        result_item = reader.recv()
    else:
        # Mark the process pool broken so that submits fail right now.
        executor = executor_reference()
        if executor is not None:
            executor._broken = True
            executor._shutdown_thread = True
            executor = None
        # All futures in flight must be marked failed
        for work_id, work_item in pending_work_items.items():
            work_item.future.set_exception(
                BrokenProcessPool(
                    "A process in the process pool was "
                    "terminated abruptly while the future was "
                    "running or pending."
                ))
            # Delete references to object. See issue16284
            del work_item
The wait function is platform-dependent, but assuming a Linux OS (from multiprocessing.connection, with all timeout-related code removed):
def wait(object_list, timeout=None):
    '''
    Wait till an object in object_list is ready/readable.
    Returns list of those objects in object_list which are ready/readable.
    '''
    with _WaitSelector() as selector:
        for obj in object_list:
            selector.register(obj, selectors.EVENT_READ)
        while True:
            ready = selector.select(timeout)
            if ready:
                return [key.fileobj for (key, events) in ready]
            else:
                # some timeout code
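On the "can I keep processing a queue" part: a broken pool never recovers, but nothing stops you from catching BrokenProcessPool, creating a fresh executor, and resubmitting the remaining work. A minimal sketch of that pattern under those assumptions (work and run_all are illustrative names, not library API):
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def work(item):
    # Stand-in for the real task.
    return item * item

def run_all(items, max_workers=2):
    pending = list(items)
    results = []
    while pending:
        ex = ProcessPoolExecutor(max_workers=max_workers)
        try:
            # Submit one item at a time (kept simple for the sketch) and only
            # drop an item from `pending` once its result has arrived.
            while pending:
                results.append(ex.submit(work, pending[0]).result())
                pending.pop(0)
        except BrokenProcessPool:
            # A worker died unexpectedly; this executor will never schedule
            # tasks again, so throw it away and retry with a fresh one.
            # Note: if one particular item reliably kills its worker, this
            # retries it forever; add a retry limit in real code.
            pass
        finally:
            ex.shutdown(wait=False)
    return results

if __name__ == '__main__':
    print(run_all(range(10)))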

Why doesn't a Python process with input and output queues join once it is done?

This simple Python3 program using multiprocessing does not seem to work as expected.
All the worker processes share an input queue from which they consume data. They also share an output queue, to which each one writes its result once it is fully done. I find that this program hangs at the process join(). Why is that?
#!/usr/bin/env python3
import multiprocessing

def worker_func(in_q, out_q):
    print("A worker has started")
    w_results = {}
    while not in_q.empty():
        v = in_q.get()
        w_results[v] = v
    out_q.put(w_results)
    print("A worker has finished")

def main():
    # Input queue to share among processes
    fpaths = [str(i) for i in range(10000)]
    in_q = multiprocessing.Queue()
    for fpath in fpaths:
        in_q.put(fpath)

    # Create processes and start them
    N_PROC = 2
    out_q = multiprocessing.Queue()
    workers = []
    for _ in range(N_PROC):
        w = multiprocessing.Process(target=worker_func, args=(in_q, out_q,))
        w.start()
        workers.append(w)
    print("Done adding workers")

    # Wait for processes to finish
    for w in workers:
        w.join()
    print("Done join of workers")

    # Collate worker results
    out_results = {}
    while not out_q.empty():
        out_results.update(out_q.get())

if __name__ == "__main__":
    main()
I get this result from this program when N_PROC = 2:
$ python3 test.py
Done adding workers
A worker has started
A worker has started
A worker has finished
<---- I do not get "A worker has finished" from second worker
<---- I do not get "Done join of workers"
It does not work even with a single child process N_PROC = 1:
$ python3 test.py
Done adding workers
A worker has started
A worker has finished
<---- I do not get "Done join of workers"
If I try a smaller input queue with say 1000 items, everything works fine.
I am aware of some old StackOverflow questions that say that the Queue has a limit. Why is this not documented in the Python3 docs?
What is an alternative solution I can use? I want to use multi-processing (not threading), to split the input among N processes. Once their shared input queue is empty, I want each process to collect its results (can be a big/complex data structure like dict) and return it back to the parent process. How to do this?
This is a classic bug caused by your design. When the workers are terminating, they stall because they have not been able to flush all the data they put into out_q, and this deadlocks your program. It has to do with the size of the pipe buffer underlying the queue.
When you are using a multiprocessing.Queue, you should empty it before trying to join the processes that feed it, to make sure a Process does not stall waiting for all its objects to be flushed into the Queue. So putting your out_q.get() calls before joining the processes should solve your problem. You can use a sentinel pattern to detect the end of the computations.
#!/usr/bin/env python3
import multiprocessing
from multiprocessing.queues import Empty

def worker_func(in_q, out_q):
    print("A worker has started")
    w_results = {}
    while not in_q.empty():
        try:
            v = in_q.get(timeout=1)
            w_results[v] = v
        except Empty:
            pass
    out_q.put(w_results)
    out_q.put(None)
    print("A worker has finished")

def main():
    # Input queue to share among processes
    fpaths = [str(i) for i in range(10000)]
    in_q = multiprocessing.Queue()
    for fpath in fpaths:
        in_q.put(fpath)

    # Create processes and start them
    N_PROC = 2
    out_q = multiprocessing.Queue()
    workers = []
    for _ in range(N_PROC):
        w = multiprocessing.Process(target=worker_func, args=(in_q, out_q,))
        w.start()
        workers.append(w)
    print("Done adding workers")

    # Collate worker results
    out_results = {}
    n_proc_end = 0
    while not n_proc_end == N_PROC:
        res = out_q.get()
        if res is None:
            n_proc_end += 1
        else:
            out_results.update(res)

    # Wait for processes to finish
    for w in workers:
        w.join()
    print("Done join of workers")

if __name__ == "__main__":
    main()
Also, note that your code has a race condition in it. The queue in_q can be emptied between the moment you check not in_q.empty() and the call to get. You should use a non-blocking get (or a get with a timeout, as above) to make sure you don't deadlock waiting on an empty queue.
Finally, you are trying to implement something that looks like a multiprocessing.Pool, which handles this kind of communication in a more robust way. You can also look at the concurrent.futures API, which is even more robust and, in some sense, better designed.
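As a minimal sketch of that alternative, multiprocessing.Pool distributes the items and collects the results without you managing any queues; process_one here is just a placeholder for the per-item work.
import multiprocessing

def process_one(fpath):
    # Placeholder for the per-item work; return whatever partial result you need.
    return fpath, fpath

def main():
    fpaths = [str(i) for i in range(10000)]
    with multiprocessing.Pool(processes=2) as pool:
        # chunksize keeps IPC overhead low when there are many small items.
        out_results = dict(pool.map(process_one, fpaths, chunksize=100))
    print(len(out_results))

if __name__ == "__main__":
    main()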

Thread queue is running serially, not in parallel?

I'm making remote API calls using threads, with no join, so that the program can make the next API call without waiting for the last one to complete.
Like so:
def run_single_thread_no_join(function, args):
    thread = Thread(target=function, args=(args,))
    thread.start()
    return
The problem was that I needed to know when all the API calls were completed, so I moved to code that uses a queue and join().
The threads now seem to run serially.
I can't seem to figure out how to get the join to work so that threads execute in parallel.
What am I doing wrong?
def run_que_block(methods_list, num_worker_threads=10):
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.
    :param methods_list: example ((method name, args), (method_2, args), (method_3, args)
    :param num_worker_threads: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    # log = StandardLogger(logger_name='run_que_block')

    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                print(item)
            msg = threading.current_thread().name, item
            # log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it
    def _worker():
        while True:
            item = q.get()
            if item is None:
                break
            method_returns.append(item)
            _output(item)
            q.task_done()

    # Create the queue and thread pool.
    q = Queue()
    threads = []

    # starts worker threads.
    for i in range(num_worker_threads):
        t = threading.Thread(target=_worker)
        t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)

    for method in methods_list:
        q.put(method[0](*method[1]))

    # block until all tasks are done
    q.join()

    # stop workers
    for i in range(num_worker_threads):
        q.put(None)
    for t in threads:
        t.join()

    return method_returns
You're doing all the work in the main thread:
for method in methods_list:
    q.put(method[0](*method[1]))
Assuming each entry in methods_list is a callable and a sequence of arguments for it, this does all the work in the main thread and then puts the result of each function call in the queue, which doesn't allow any parallelization aside from the printing (which is generally not a big enough cost to justify the thread/queue overhead).
Presumably, you want the threads to do the work for each function, so change that loop to:
for method in methods_list:
    q.put(method)  # Don't call it; queue it to be called in a worker
and change the _worker function so it calls the function that does the work in the thread:
def _worker():
    while True:
        item = q.get()
        if item is None:        # Sentinel: no more work for this thread
            break
        method, args = item     # Extract and unpack callable and arguments
        result = method(*args)  # Call the callable in this worker thread
        method_returns.append(result)
        _output(result)
        q.task_done()
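For example, with the corrected loop, each entry in methods_list is a (callable, args) pair and the calls now run in the worker threads. The fetch function and URLs below are only illustrative stand-ins for the real remote API calls; run_que_block is the function defined above.
import time

def fetch(url):
    # Stand-in for a real remote API call.
    time.sleep(1)
    return 'fetched %s' % url

urls = ['https://example.com/%d' % i for i in range(20)]
methods_list = [(fetch, (u,)) for u in urls]

# With 10 worker threads, the 20 one-second "calls" finish in roughly
# 2 seconds instead of 20, since they now overlap.
results = run_que_block(methods_list)
print(len(results))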

Why am I unable to join this thread in python?

I am writing a multithreading class. The class has a parallel_process() function that is overridden with the parallel task. The data to be processed is put in the queue. The worker() function in each thread keeps calling parallel_process() until the queue is empty. Results are put in the results Queue object. The class definition is:
import threading
try:
    from Queue import Queue
except ImportError:
    from queue import Queue

class Parallel:
    def __init__(self, pkgs, common=None, nthreads=1):
        self.nthreads = nthreads
        self.threads = []
        self.queue = Queue()
        self.results = Queue()
        self.common = common
        for pkg in pkgs:
            self.queue.put(pkg)

    def parallel_process(self, pkg, common):
        pass

    def worker(self):
        while not self.queue.empty():
            pkg = self.queue.get()
            self.results.put(self.parallel_process(pkg, self.common))
            self.queue.task_done()
        return

    def start(self):
        for i in range(self.nthreads):
            t = threading.Thread(target=self.worker)
            t.daemon = False
            t.start()
            self.threads.append(t)

    def wait_for_threads(self):
        print('Waiting on queue to empty...')
        self.queue.join()
        print('Queue processed. Joining threads...')
        for t in self.threads:
            t.join()
            print('...Thread joined.')

    def get_results(self):
        results = []
        print('Obtaining results...')
        while not self.results.empty():
            results.append(self.results.get())
        return results
I use it to create a parallel task:
class myParallel(Parallel):  # return square of numbers in a list
    def parallel_process(self, pkg, common):
        return pkg**2

p = myParallel(range(50), nthreads=4)
p.start()
p.wait_for_threads()
r = p.get_results()
print('FINISHED')
However, not all threads join every time the code is run. Sometimes only two join, sometimes none do. I do not think I am blocking the threads from finishing. What reason could there be for join() not to work here?
This statement may lead to errors:
while not self.queue.empty():
    pkg = self.queue.get()
With multiple threads pulling items from the queue, there's no guarantee that self.queue.get() will return a valid item, even if you check whether the queue is empty beforehand. Here is a possible scenario:
1. Thread 1 checks the queue, the queue is not empty, and control proceeds into the while loop.
2. Control passes to Thread 2, which also checks the queue, finds it is not empty, and enters the while loop. Thread 2 gets an item from the queue. The queue is now empty.
3. Control passes back to Thread 1, which calls get() on the now-empty queue. A blocking get() hangs forever; a non-blocking get_nowait() raises an Empty exception.
You should use a try/except around a non-blocking get to handle this:
try:
    pkg = self.queue.get_nowait()
except Empty:    # Empty comes from the queue module (Queue on Python 2)
    pass
@Brendan Abel identified the cause. I'd like to suggest a different solution: queue.join() is usually a Bad Idea too. Instead, create a unique value to use as a sentinel:
class Parallel:
    _sentinel = object()
At the end of __init__(), add one sentinel to the queue for each thread:
for i in range(nthreads):
    self.queue.put(self._sentinel)
Change the start of worker() like so:
while True:
    pkg = self.queue.get()
    if pkg is self._sentinel:
        break
By the construction of the queue, each thread keeps getting items until it sees its own sentinel value, so there's no need to mess with the unpredictable queue.empty().
Also remove the queue.join() and queue.task_done() cruft.
This will give you reliable code that's easy to modify for fancier scenarios. For example, if you want to add more work items while the threads are running, fine - just write another method to say "I'm done adding work items now", and move the loop adding sentinels into that.
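Putting those pieces together, a minimal consolidated sketch of the reworked class might look like the following (not the answerer's verbatim code; same public interface as the original, with the sentinel replacing both queue.join() and the empty() check):
import threading
from queue import Queue   # Python 3; use "from Queue import Queue" on Python 2

class Parallel:
    _sentinel = object()

    def __init__(self, pkgs, common=None, nthreads=1):
        self.nthreads = nthreads
        self.threads = []
        self.queue = Queue()
        self.results = Queue()
        self.common = common
        for pkg in pkgs:
            self.queue.put(pkg)
        for _ in range(nthreads):          # one sentinel per worker thread
            self.queue.put(self._sentinel)

    def parallel_process(self, pkg, common):
        pass

    def worker(self):
        while True:
            pkg = self.queue.get()
            if pkg is self._sentinel:      # the queue is drained for this thread
                break
            self.results.put(self.parallel_process(pkg, self.common))

    def start(self):
        for _ in range(self.nthreads):
            t = threading.Thread(target=self.worker)
            t.start()
            self.threads.append(t)

    def wait_for_threads(self):
        for t in self.threads:
            t.join()

    def get_results(self):
        # Safe to drain here: all worker threads have already been joined.
        results = []
        while not self.results.empty():
            results.append(self.results.get())
        return results

class MyParallel(Parallel):                # squares the numbers in a list
    def parallel_process(self, pkg, common):
        return pkg ** 2

p = MyParallel(range(50), nthreads=4)
p.start()
p.wait_for_threads()
print(sorted(p.get_results()))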
