I am trying to create a simple producer/consumer pattern in Python using multiprocessing. It works, but it hangs on p.join().
from multiprocessing import Pool, Queue

que = Queue()

def consume():
    while True:
        element = que.get()
        if element is None:
            print('break')
            break
    print('Consumer closing')

def produce(nr):
    que.put([nr] * 1000000)
    print('Producer {} closing'.format(nr))

def main():
    p = Pool(5)
    p.apply_async(consume)
    p.map(produce, range(5))
    que.put(None)
    print('None')
    p.close()
    p.join()

if __name__ == '__main__':
    main()
Sample output:
~/Python/Examples $ ./multip_prod_cons.py
Producer 1 closing
Producer 3 closing
Producer 0 closing
Producer 2 closing
Producer 4 closing
None
break
Consumer closing
However, it works perfectly when I change one line:
que.put([nr] * 100)
It is 100% reproducible on a Linux system running Python 3.4.3 or Python 2.7.10. Am I missing something?
There is quite a lot of confusion here. What you are writing is not a producer/consumer scenario but a mess which misuses another pattern, usually referred to as a "pool of workers".
The pool of workers pattern is an application of the producer/consumer one in which there is one producer that schedules the work and many consumers that consume it. In this pattern, the owner of the Pool ends up being the producer while the workers are the consumers.
In your example, instead, you have a hybrid solution where one worker ends up being a consumer and the others act as a sort of middleware. The whole design is very inefficient: it duplicates most of the logic already provided by the Pool and, more importantly, is very error prone. What you end up suffering from is a deadlock.
Putting an object into a multiprocessing.Queue is an asynchronous operation. It blocks only if the Queue is full, and your Queue has unbounded size.
This means your produce function returns immediately, so the call to p.map does not block as you expect it to. The related worker processes instead wait until the actual message goes through the Pipe that the Queue uses as its communication channel.
What happens next is that you terminate your consumer prematurely: the None "message" you put in the Queue gets delivered before all the lists your produce function creates have been pushed through the Pipe.
You notice the issue once you call p.join, but the real situation is the following:
- the p.join call is waiting for all the worker processes to terminate.
- the worker processes are waiting for the big lists to go through the Queue's Pipe.
- as the consumer worker is long gone, nobody drains the Pipe, which is obviously full.
The issue does not show up if your lists are small enough to go through before you send the termination message to the consume function.
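For reference, here is a minimal sketch of the pool-of-workers shape described above, where the owner of the Pool schedules the work and collects the results, with no hand-rolled Queue at all (the function name and sizes are taken from the question; the restructuring is mine, not the asker's code):

from multiprocessing import Pool

def produce(nr):
    # build the data inside the worker and hand it back through the Pool
    return [nr] * 1000000

def main():
    p = Pool(5)
    results = p.map(produce, range(5))  # blocks until every worker has returned
    p.close()
    p.join()
    print('Got {} lists back'.format(len(results)))

if __name__ == '__main__':
    main()

Here the Pool's own result-handling machinery drains the pipes for you, so there is no window in which a full pipe can deadlock the join.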
Related
I'm using python's multiprocessing to analyse some large texts. After some days trying to figure out why my code was hanging (i.e. the processes didn't end), I was able to recreate the problem with the following simple code:
import multiprocessing as mp

for y in range(65500, 65600):
    print(y)

    def func(output):
        output.put("a" * y)

    if __name__ == "__main__":
        output = mp.Queue()
        process = mp.Process(target=func, args=(output,))
        process.start()
        process.join()
As you can see, if the item to put in the queue gets too large, the process just hangs.
It doesn't freeze: if I write more code after output.put(), that code will run, but the process still never stops.
This starts happening when the string reaches about 65500 characters, though it may vary depending on your interpreter.
I was aware that mp.Queue has a maxsize argument, but after some searching I found out that it limits the Queue's size in number of items, not the size of the items themselves.
Is there a way around this?
The data I need to put inside the Queue in my original code is very very large...
Your queue fills up with no consumer to empty it.
From the definition of Queue.put:
If the optional argument block is True (the default) and timeout is None (the default), block if necessary until a free slot is available.
Assuming there is no deadlock possible between producer and consumer (and assuming your original code does have a consumer, since your sample doesn't), eventually the producers should be unblocked and terminate. Check the code of your consumer (or add it to the question, so we can have a look).
Update
This is not the problem, because the queue has not been given a maxsize, so put should succeed until you run out of memory.
This is not the behavior of Queue. As elaborated in this ticket, the part blocking here is not the queue itself, but the underlying pipe. From the linked resource (inserts between "[]" are mine):
A queue works like this:
- when you call queue.put(data), the data is added to a deque, which can grow and shrink forever
- then a thread pops elements from the deque, and sends them so that the other process can receive them through a pipe or a Unix socket (created via socketpair). But, and that's the important point, both pipes and Unix sockets have a limited capacity (used to be 4k - pagesize - on older Linux kernels for pipes, now it's 64k, and between 64k-120k for Unix sockets, depending on tunable sysctls).
- when you do queue.get(), you just do a read on the pipe/socket
[..] when size [becomes too big] the writing thread blocks on the write syscall.
And since a join is performed before dequeing the item [note: that's your process.join], you just deadlock, since the join waits for the sending thread to complete, and the write can't complete since the pipe/socket is full!
If you dequeue the item before waiting for the submitter process, everything works fine.
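Applied to the minimal example above (reusing mp and func from the question's snippet), that just means calling get before join; a sketch:

if __name__ == "__main__":
    output = mp.Queue()
    process = mp.Process(target=func, args=(output,))
    process.start()
    data = output.get()   # drain the pipe first...
    process.join()        # ...so the child's feeder thread can flush and the child can exit
    print(len(data))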
Update 2
I understand. But I don't actually have a consumer (if it is what I think it is); I will only get the results from the queue once the process has finished putting them into the queue.
Yeah, this is the problem. The multiprocessing.Queue is not a storage container. You should use it exclusively for passing data between "producers" (the processes that generate the data that enters the queue) and "consumers" (the processes that "use" that data). As you now know, leaving the data there is a bad idea.
How can I get an item from the queue if I cannot even put it there first?
put and get hide away the problem of putting together the data if it fills up the pipe, so you only need to set up a loop in your "main" process to get items out of the queue and, for example, append them to a list. The list is in the memory space of the main process and does not clog the pipe.
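A rough sketch of that loop, reusing output and process from the example and assuming the producer pushes several items and finishes with a None sentinel (the sentinel is my assumption, not part of the original code):

results = []
while True:
    item = output.get()
    if item is None:       # sentinel: the producer is done
        break
    results.append(item)   # stored in the main process's memory, not in the pipe
process.join()             # safe now, the pipe has been drained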
I've been stuck on this issue all day, and I have not been able to find any solutions relating to what I am trying to accomplish.
I am trying to pass Queues to threads spawned in sub-processes. The Queues were created in the entrance file and passed to each sub-process as a parameter.
I am making a modular program to a) run a neural network, b) automatically update the network models when needed, and c) log events/images from the neural network to the servers. My former program used only one CPU core running multiple threads and was getting quite slow, so I decided I needed to sub-process certain parts of the program so they can run in their own memory spaces to their fullest potential.
Sub-processes:
- Client-server communication
- Webcam control and image processing
- Inference for the neural networks (there are 2 neural networks, each with its own process)
4 total sub-processes.
As I develop, I need to communicate across each process so they are all on the same page with events from the servers and whatnot. So Queue would be the best option as far as I can tell.
(Clarify: 'Queue' from the 'multiprocessing' module, NOT the 'queue' module)
~~ However ~~
Each of these sub-processes spawns its own thread(s). For example, the 1st sub-process will spawn multiple threads: one thread per Queue to listen to the events from the different servers and hand them to different areas of the program; one thread to listen to the Queue receiving images from one of the neural networks; one thread to listen to the Queue receiving live images from the webcam; and one thread to listen to the Queue receiving the output from the other neural network.
I can pass the Queues to the sub-processes without issue and can use them effectively. However, when I try to pass them to the threads within each sub-process, I get the above error.
I am fairly new to multiprocessing; however, the methodology behind it looks to be relatively the same as threads except for the shared memory space and GIL.
This is from Main.py; the program entrance.
from lib.client import Client, Image
from multiprocessing import Queue, Process

class Main():

    def __init__(self, server):
        self.KILLQ = Queue()
        self.CAMERAQ = Queue()
        self.CLIENT = Client((server, 2005), self.KILLQ, self.CAMERAQ)
        self.CLIENT_PROCESS = Process(target=self.CLIENT.do, daemon=True)
        self.CLIENT_PROCESS.start()

if __name__ == '__main__':
    m = Main('127.0.0.1')
    while True:
        m.KILLQ.put("Hello world")
And this is from client.py (in a folder called lib)
class Client():

    def __init__(self, connection, killq, cameraq):
        self.TCP_IP = connection[0]
        self.TCP_PORT = connection[1]
        self.CAMERAQ = cameraq
        self.KILLQ = killq
        self.BUFFERSIZE = 1024
        self.HOSTNAME = socket.gethostname()
        self.ATTEMPTS = 0
        self.SHUTDOWN = False
        self.START_CONNECTION = MakeConnection((self.TCP_IP, self.TCP_PORT))
        # self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
        # self.KILLQ_THREAD.start()

    def do(self):
        # The function run as the subprocess from Main.py
        print(self.KILLQ.get())

    def _listen(self, q):
        # This is threaded multiple times, listening to each Queue (as 'q', passed when the thread is created)
        while True:
            print(q.get())
# self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
This is where the error is thrown. If I leave this line commented out, the program runs fine. I can read from the queue in this sub-process without issue (i.e. in the function 'do'), but not in a thread under this sub-process (i.e. in the function '_listen').
I need to be able to communicate across each process so they can be in step with the main program (i.e. in the case of a neural network model update, the inference sub-process needs to shut down so the model can be updated without causing errors).
Any help with this would be great!
I am also very open to other methods of communication that would work as well. If you believe a better communication approach would work, it would need to be fast enough to support real-time streaming of 4K images sent to the server from the camera.
Thank you very much for your time! :)
The queue is not the problem. The ones from the multiprocessing package are designed to be picklable, so that they can be shared between processes.
The issue is, that your thread KILLQ_THREAD is created in the main process. Threads are not to be shared between processes. In fact, when a process is forked following POSIX standards, threads that are active in the parent process are not part of the process image that is cloned to the new child's memory space. One reason is that the state of mutexes at the time of the call to fork() might lead to deadlocks in the child process.
You'll have to move the creation of your thread to your child process, i.e.
def do(self):
    self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
    self.KILLQ_THREAD.start()
Presumably, KILLQ is supposed to signal the child processes to shut down. In that case, especially if you plan to use more than one child process, a queue is not the best method to achieve that. Since Queue.get() and Queue.get_nowait() remove the item from the queue, each item can only be retrieved and processed by one consumer. Your producer would have to put multiple shutdown signals into the queue. In a multi-consumer scenario, you also have no reasonable way to ensure that a specific consumer receives any specific item. Any item put into the queue can potentially be retrieved by any of the consumers reading from it.
For signalling, especially with multiple recipients, it is better to use an Event.
You'll also notice that your program appears to hang quickly after starting it. That's because you start both your child process and the thread with daemon=True.
When your Client.do() method looks like above, i.e. creates and starts the thread, then exits, your child process ends right after the call to self.KILLQ_THREAD.start() and the daemonic thread immediately ends with it. Your main process does not notice anything and continues to put Hello world into the queue until it eventually fills up and queue.Full raises.
Here's a condensed code example using an Event for shutdown signalling in two child processes with one thread each.
main.py
import time
from lib.client import Client
from multiprocessing import Process, Event

class Main:

    def __init__(self):
        self.KILLQ = Event()
        self._clients = (Client(self.KILLQ), Client(self.KILLQ))
        self._procs = [Process(target=cl.do, daemon=True) for cl in self._clients]
        [proc.start() for proc in self._procs]

if __name__ == '__main__':
    m = Main()
    # do sth. else
    time.sleep(1)
    # signal for shutdown
    m.KILLQ.set()
    # grace period for both shutdown prints to show
    time.sleep(.1)
client.py
import multiprocessing
from threading import Thread

class Client:

    def __init__(self, killq):
        self.KILLQ = killq

    def do(self):
        # non-daemonic thread! We want the process to stick around until the thread
        # terminates on the signal set by the main process
        self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,))
        self.KILLQ_THREAD.start()

    @staticmethod
    def _listen(q):
        while not q.is_set():
            print("in thread {}".format(multiprocessing.current_process().name))
        print("{} - master signalled shutdown".format(multiprocessing.current_process().name))
Output
[...]
in thread Process-2
in thread Process-1
in thread Process-2
Process-2 - master signalled shutdown
in thread Process-1
Process-1 - master signalled shutdown
Process finished with exit code 0
As for methods of inter-process communication, you might want to look into a streaming server solution.
Miguel Grinberg has written an excellent tutorial on Video Streaming with Flask back in 2014 with a more recent follow-up from August 2017.
I'm working on a small bit of Python (2.7.3) to run a script which constantly monitors a message/queue broker and processes the entries it finds. Due to volume, the processing is done in a multiprocessing fashion that looks a bit like this:
result = Queue()
q = JoinableQueue()
process_pool = []

for i in range(args.max_processes):
    q.put(i)
    p = Process(target=worker, args=(q, result, broker, ...))
    process_pool.append(p)

# Start processes
for p in process_pool:
    p.start()

# Block until completion
q.join()
logger.warning("All processes completed")
Despite the code regularly iterating and logging that all processes had completed, I found PIDs gradually stacking up beyond args.max_processes.
I added an additional block to the end of this:
for p in process_pool:
    if p.is_alive():
        logger.warning("Process with pid %s is still alive - terminating" % p.pid)
        try:
            p.terminate()
        except Exception as e:
            logger.warning("PROBLEM KILLING PID: stack: %s" % e)
I reaped all processes to clear the slate and started again, and I can clearly see the logger intermittently showing instances where a PID is found to still be alive even after it has flagged completion to the parent process, AND the terminate call fails to kill it.
I added logger output to the individual threads and each one logs success indicating that it's completed cleanly prior to signaling completion to the parent process, and yet it still hangs around.
Because I plan to run this as a service, over time the number of vagrant processes lying around can cause problems as they stack up into the thousands.
I'd love some insight on what I'm missing and doing wrong.
Thank you
Edit: Update - adding overview of worker block for question completeness:
The worker interacts with a message/queue broker; I'll omit the details somewhat since, outside of some debug logging messages, everything is wrapped in a try/except block and each worker appears to run to completion, even on the occasions when a child process gets left behind.
def worker(queue, result_queue, broker, other_variables...):
    logger.warning("Launched individual thread")
    job = queue.get()
    try:
        message_broker logic
    except Exception as e:
        result_queue.put("Caught exception: %s" % e.message)
    logger.warning("Individual thread completed cleanly...")
    queue.task_done()
To reiterate the problem: without any exceptions being thrown and caught, I can see all the logging indicating that n workers are started, run to completion, and finish with good status on each iteration. The blocking q.join() call, which cannot complete until all workers complete, returns each time, but some very small number of processes get left behind. I can see them with ps -ef, and if I monitor their count over time it gradually increases until it breaks Python's multiprocessing capabilities. I added code to look for these instances and manually terminate them, which works insofar as it detects the hung processes, but it does not seem able to terminate them, despite the processes having reported good completion. What am I missing?
Thanks again!
I am creating a custom job scheduler with a web frontend in Python 3.4 on Linux. This program creates a daemon (consumer) thread that waits for jobs to become available in a PriorityQueue. These jobs can be added manually through the web interface, which puts them in the queue. When the consumer thread finds a job, it executes a program using subprocess.run and waits for it to finish.
The basic idea of the worker thread:
class Worker(threading.Thread):

    def __init__(self, queue):
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                proc = subprocess.run("myprogram", timeout=my_timeout)
                # do some more things
            except TimeoutExpired:
                # do some administration
                self.queue.add(job)
However:
This consumer should be able to receive some kind of signal from the frontend (main thread) that it should stop the current job and instead work on the next job in the queue (saving the state of the current job and adding it to the end of the queue again). This can (and will most likely) happen while blocked on subprocess.run().
The subprocesses can simply be killed (the program that is executed saves some state in a file), but the worker thread needs to do some administration on the killed job to make sure it can be resumed later on.
There can be multiple such worker threads.
Signal handlers are not an option (since they are always handled by the main thread which is a webserver and should not be bothered with this).
Having an event loop in which the process actively polls for events (such as the child exiting, the timeout occurring or the interrupt event) is in this context not really a solution but an ugly hack. The jobs are performance-heavy and constant context switches are unwanted.
What synchronization primitives should I use to interrupt this thread or to make sure it waits for several events at the same time in a blocking fashion?
I think you've accidentally glossed over a simple solution: your second bullet point says that you have the ability to kill the programs that are running in subprocesses. Notice that subprocess.call returns the return code of the subprocess. This means that you can let the main thread kill the subprocess and just check the return code to see if you need to do any cleanup. Even better, you could use subprocess.check_call instead, which will raise an exception for you if the return code isn't 0. I don't know what platform you're working on, but on Linux, killed processes generally don't return 0.
It could look something like this:
class Worker(threading.Thread):

    def __init__(self, queue):
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                subprocess.check_call("myprogram", timeout=my_timeout)
                # do some more things
            except (TimeoutExpired, subprocess.CalledProcessError):
                # do some administration
                self.queue.add(job)
Note that if you're using Python 3.5, you can use subprocess.run instead, and set the check argument to True.
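For example, a minimal variant of the loop body under that assumption, keeping the hypothetical "myprogram" and my_timeout names from the snippets above:

try:
    # raises TimeoutExpired on timeout and CalledProcessError on a non-zero exit code
    subprocess.run("myprogram", timeout=my_timeout, check=True)
except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
    # do some administration
    self.queue.add(job)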
If you have a strong need to handle the cases where the worker needs to be interrupted when it isn't running the subprocess, then I think you're going to have to use a polling loop, because I don't think the behavior you're looking for is supported for threads in Python. You can use a threading.Event object to pass the "stop working now" pseudo-signal from your main thread to the worker, and have the worker periodically check the state of that event object.
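A rough sketch of that polling approach, with a hypothetical stop_event shared between the main thread and the worker (the names here are illustrative, not from the question):

import queue
import threading

stop_event = threading.Event()   # the main thread calls stop_event.set() to interrupt workers

def run_jobs(job_queue):
    while not stop_event.is_set():
        try:
            job = job_queue.get(timeout=1)   # wake up periodically to re-check the event
        except queue.Empty:
            continue
        # ... run the job here, checking stop_event between steps ...
        job_queue.task_done()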
If you're willing to consider using multiple processes instead of threads, consider switching over to the multiprocessing module, which would allow you to handle signals. There is more overhead to spawning full-blown subprocesses instead of threads, but you're essentially looking for signal-like asynchronous behavior, and I don't think Python's threading library supports anything like that. One benefit, though, would be that you would be freed from the Global Interpreter Lock, so you may actually see some speed benefits if your worker processes (formerly threads) are doing anything CPU intensive.
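If you went that route, the shape could look something like this sketch, where each worker process installs its own SIGTERM handler so the cleanup happens in the worker rather than in the webserver's main thread (all names here are illustrative, and the sleep stands in for the real work):

import os
import signal
import time
from multiprocessing import Process

def worker():
    interrupted = {'flag': False}

    def on_term(signum, frame):
        interrupted['flag'] = True       # record the interrupt; do job administration here

    signal.signal(signal.SIGTERM, on_term)   # handled in this worker process
    while not interrupted['flag']:
        time.sleep(0.1)                      # stand-in for the real, long-running work
    print('worker shutting down cleanly')

if __name__ == '__main__':
    p = Process(target=worker)
    p.start()
    time.sleep(1)
    os.kill(p.pid, signal.SIGTERM)           # p.terminate() would send SIGTERM as well
    p.join()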
Hi guys!
My application is a bot. It simply receives a message, processes it, and returns a result.
But there are a lot of messages, and I'm creating a separate thread to process each one, which makes the application quite a bit slower.
So, is there any way to reduce CPU usage by replacing threads with something else?
You probably want processes rather than threads. Spawn processes at startup, and use Pipes to talk to them.
http://docs.python.org/dev/library/multiprocessing.html
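A minimal sketch of that setup, with one worker process spawned at startup and a Pipe for the messages (handler and the echo logic are placeholders for the real message processing):

from multiprocessing import Process, Pipe

def handler(conn):
    while True:
        msg = conn.recv()            # blocks until the parent sends something
        if msg is None:              # sentinel to shut the worker down
            break
        conn.send(msg.upper())       # stand-in for the real message processing

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=handler, args=(child_conn,))
    p.start()
    parent_conn.send('hello')
    print(parent_conn.recv())        # -> 'HELLO'
    parent_conn.send(None)
    p.join()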
Threads and processes have the same speed.
Your problem is not which one you use, but how many you use.
The answer is to have only a fixed number of threads or processes, say 10.
You then create a Queue (use the Queue module) to store all messages from your robot.
The 10 threads will constantly be working, and every time they finish, they wait for a new message in the Queue.
This saves you from the overhead of creating and destroying threads.
See http://docs.python.org/library/queue.html for more info.
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
You could try creating only a limited number of workers and distributing the work between them. Python's multiprocessing.Pool would be the thing to use.
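For instance, a small sketch with a fixed pool of workers processing a batch of messages (process_message and the sample messages are placeholders, not from the question):

from multiprocessing import Pool

def process_message(msg):
    return msg[::-1]   # stand-in for the real processing

if __name__ == '__main__':
    messages = ['first', 'second', 'third']
    pool = Pool(processes=4)
    replies = pool.map(process_message, messages)
    pool.close()
    pool.join()
    print(replies)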
You might not even need threads. If your server can handle each request quickly, you can just make it all single-threaded using something like Twisted.