Understanding the Threading and Queue combination in Python

I started looking into creating threads in Python. I did some background reading first to understand how threads work in Python, and I also read about the use of Queue in Python and how it can help solve common threading problems. I was able to understand separate code samples for each. Then I came across the following tutorial:
http://www.ibm.com/developerworks/aix/library/au-threadingpython/
It shows how Thread and Queue fit together in Python and how the combination can speed up execution under certain circumstances.
I am having difficulty understanding some areas of the code:
def main():
    # spawn a pool of threads, and pass them the queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()
    # populate the queue with data
    for host in hosts:
        queue.put(host)
    # wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
In the first for loop, multiple threads are created and the queue instance is passed to each of them. But as I understand it, the queue is still empty at that point.
In the next for loop,
for host in hosts:
the host values are pushed into the queue. Now, how does this queue data get assigned to the threads?
Lastly, what is the use of queue.join() with relevance to this program?

"In the first for loop the multiple threads are created and a queue instance is passed to it. But from my understanding the queue is empty as of now."
Yes, the threads are started but they have no work to do yet.
for host in hosts:
"The host values are pushed into the queue. Now how is this Queue data assigned to threads?"
by the Queue instance
"what is the use of queue.join() with relevance to this program?"
join causes your program to wait until the threads have all finished processing and their outputs are collected by the queue instance. Your program will block at this point until the queue has completed.
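For context, the tutorial's ThreadUrl worker roughly follows the sketch below (a Python 3 rendering; the tutorial itself is Python 2 and uses urllib2, so treat the body as an approximation). The key point is that each thread loops on queue.get(), so the Queue hands each host to whichever idle thread asks for it, and task_done() is what queue.join() waits for:
import threading
import urllib.request

class ThreadUrl(threading.Thread):
    """Pulls hosts off the shared queue and fetches them."""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            host = self.queue.get()   # blocks until a host is available
            with urllib.request.urlopen(host) as page:
                page.read(1024)
            self.queue.task_done()    # lets queue.join() know this item is finished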

Related

How to fix 'TypeError: can't pickle _thread.lock objects' when passing a Queue to a thread in a child process

I've been stuck on this issue all day, and I have not been able to find any solutions relating to what I am trying to accomplish.
I am trying to pass Queues to threads spawned in sub-processes. The Queues were created in the entrance file and passed to each sub-process as a parameter.
I am making a modular program to a) run a neural network, b) automatically update the network models when needed, and c) log events/images from the neural network to the servers. My former program utilized only one CPU core running multiple threads and was getting quite slow, so I decided I needed to sub-process certain parts of the program so they can run in their own memory spaces to their fullest potential.
Sub-processes:
Client-Server communication
Webcam control and image processing
Inferencing for the neural networks (there are 2 neural networks with their own process each)
4 total sub-processes.
As I develop, I need to communicate across each process so they are all on the same page with events from the servers and whatnot. So Queue would be the best option as far as I can tell.
(Clarify: 'Queue' from the 'multiprocessing' module, NOT the 'queue' module)
However:
Each of these sub-processes spawn their own thread(s). For example, the 1st sub-process will spawn multiple threads: One thread per Queue to listen to the events from the different servers and hand them to different areas of the program; one thread to listen to the Queue receiving images from one of the neural networks; one thread to listen to the Queue receiving live images from the webcam; and one thread to listen to the Queue receiving the output from the other neural network.
I can pass the Queues to the sub-processes without issue and can use them effectively. However, when I try to pass them to the threads within each sub-process, I get the above error.
I am fairly new to multiprocessing; however, the methodology behind it looks to be relatively the same as threads except for the shared memory space and GIL.
This is from Main.py, the program entry point.
from lib.client import Client, Image
from multiprocessing import Queue, Process

class Main():
    def __init__(self, server):
        self.KILLQ = Queue()
        self.CAMERAQ = Queue()
        self.CLIENT = Client((server, 2005), self.KILLQ, self.CAMERAQ)
        self.CLIENT_PROCESS = Process(target=self.CLIENT.do, daemon=True)
        self.CLIENT_PROCESS.start()

if __name__ == '__main__':
    m = Main('127.0.0.1')
    while True:
        m.KILLQ.put("Hello world")
And this is from client.py (in a folder called lib)
class Client():
    def __init__(self, connection, killq, cameraq):
        self.TCP_IP = connection[0]
        self.TCP_PORT = connection[1]
        self.CAMERAQ = cameraq
        self.KILLQ = killq
        self.BUFFERSIZE = 1024
        self.HOSTNAME = socket.gethostname()
        self.ATTEMPTS = 0
        self.SHUTDOWN = False
        self.START_CONNECTION = MakeConnection((self.TCP_IP, self.TCP_PORT))
        # self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
        # self.KILLQ_THREAD.start()

    def do(self):
        # The function run as the subprocess from Main.py
        print(self.KILLQ.get())

    def _listen(self, q):
        # This is threaded multiple times, listening to each Queue
        # (passed as 'q' when the thread is created)
        while True:
            print(self.q.get())
# self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
This is where the error is thrown. If I leave this line commented out, the program runs fine. I can read from the queue in this sub-process without issue (i.e. in the function do), but not from a thread under this sub-process (i.e. in the function _listen).
I need to be able to communicate across each process so they can be in step with the main program (i.e. in the case of a neural network model update, the inference sub-process needs to shut down so the model can be updated without causing errors).
Any help with this would be great!
I am also very open to other methods of communication that would work as well. If you believe a better communication approach would work, it would need to be fast enough to support real-time streaming of 4K images sent to the server from the camera.
Thank you very much for your time! :)
The queue is not the problem. The ones from the multiprocessing package are designed to be picklable, so that they can be shared between processes.
The issue is that your thread KILLQ_THREAD is created in the main process. Threads are not meant to be shared between processes. In fact, when a process is forked following POSIX standards, threads that are active in the parent process are not part of the process image that is cloned into the new child's memory space. One reason is that the state of mutexes at the time of the call to fork() might lead to deadlocks in the child process.
You'll have to move the creation of your thread to your child process, i.e.
def do(self):
    self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
    self.KILLQ_THREAD.start()
Presumably, KILLQ is supposed to signal the child processes to shut down. In that case, especially if you plan to use more than one child process, a queue is not the best method to achieve that. Since Queue.get() and Queue.get_nowait() remove the item from the queue, each item can only be retrieved and processed by one consumer. Your producer would have to put multiple shutdown signals into the queue. In a multi-consumer scenario, you also have no reasonable way to ensure that a specific consumer receives any specific item. Any item put into the queue can potentially be retrieved by any of the consumers reading from it.
For signalling, especially with multiple recipients, it is better to use an Event.
You'll also notice that your program appears to hang quickly after starting it. That's because you start both your child process and the thread with daemon=True.
When your Client.do() method looks like the above, i.e. it creates and starts the thread and then exits, your child process ends right after the call to self.KILLQ_THREAD.start() and the daemonic thread immediately ends with it. Your main process does not notice anything and continues to put "Hello world" into the queue until it eventually fills up and queue.Full is raised.
Here's a condensed code example using an Event for shutdown signalling in two child processes with one thread each.
main.py
import time
from lib.client import Client
from multiprocessing import Process, Event

class Main:
    def __init__(self):
        self.KILLQ = Event()
        self._clients = (Client(self.KILLQ), Client(self.KILLQ))
        self._procs = [Process(target=cl.do, daemon=True) for cl in self._clients]
        [proc.start() for proc in self._procs]

if __name__ == '__main__':
    m = Main()
    # do sth. else
    time.sleep(1)
    # signal for shutdown
    m.KILLQ.set()
    # grace period for both shutdown prints to show
    time.sleep(.1)
client.py
import multiprocessing
from threading import Thread

class Client:
    def __init__(self, killq):
        self.KILLQ = killq

    def do(self):
        # non-daemonic thread! We want the process to stick around until the thread
        # terminates on the signal set by the main process
        self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,))
        self.KILLQ_THREAD.start()

    @staticmethod
    def _listen(q):
        while not q.is_set():
            print("in thread {}".format(multiprocessing.current_process().name))
        print("{} - master signalled shutdown".format(multiprocessing.current_process().name))
Output
[...]
in thread Process-2
in thread Process-1
in thread Process-2
Process-2 - master signalled shutdown
in thread Process-1
Process-1 - master signalled shutdown
Process finished with exit code 0
As for methods of inter process communication, you might want to look into a streaming server solution.
Miguel Grinberg has written an excellent tutorial on Video Streaming with Flask back in 2014 with a more recent follow-up from August 2017.

Python Queues in Multithreading and Continual Processing

I have several threads doing several tasks. One thread listens to data from a UDP socket. Another thread processes the data into JSON objects and queues the data. Another thread sends the information.
What I'm trying to work out at the moment is how a queue behaves when it is empty. Most routines do something like:
while not q.empty():
    object = q.get()
I need to find out how to process items from the queue only when there is actually data on the queue.
I guess I could put in a while True loop with a sleep(1). But the trouble is that data may hit the queue faster than the sleep interval. If I take the sleep(1) out, then my usual understanding is that the while loop will just eat CPU.
So I guess I need some way of telling the thread that there is data on the queue and that it should then run a processing routine?
You can use a condition variable, as used in classic synchronization problems, to avoid busy waiting.
You can read more about the producer-consumer problem.
In this case, the producer threads add one or more items to the queue, whereas a consumer is a thread waiting for an item in the queue.
Look at the threading.Condition class for usage and more details.
Ref: https://docs.python.org/3/library/threading.html#threading.Condition
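As an illustration only (a minimal sketch with a hypothetical shared buffer, not your code), a bare-bones producer/consumer built on threading.Condition looks like this; cond.wait() sleeps without burning CPU until a producer calls notify():
import threading
from collections import deque

buffer = deque()
cond = threading.Condition()

def producer(item):
    with cond:
        buffer.append(item)
        cond.notify()              # wake one waiting consumer

def consumer():
    while True:
        with cond:
            while not buffer:      # re-check after every wakeup
                cond.wait()        # releases the lock and sleeps; no busy waiting
            item = buffer.popleft()
        print("processing", item)  # placeholder for the real work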
Edit:
If you use the Queue class from the queue module, then you don't have to implement the nitty-gritty of thread lock and condition management for the queue yourself.
Ref: https://docs.python.org/3.8/library/queue.html
def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()

q = queue.Queue()
threads = []
for i in range(num_worker_threads):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

for item in source():
    q.put(item)

# block until all tasks are done
q.join()

# stop workers
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
In the above example, each worker thread blocks in q.get() waiting for an item. When an item becomes available, the Queue object notifies the waiting threads and exactly one of them gets the item from the queue.

Multiprocessing pool.join() hangs under some circumstances

I am trying to create a simple producer/consumer pattern in Python using multiprocessing. It works, but it hangs on pool.join().
from multiprocessing import Pool, Queue

que = Queue()

def consume():
    while True:
        element = que.get()
        if element is None:
            print('break')
            break
    print('Consumer closing')

def produce(nr):
    que.put([nr] * 1000000)
    print('Producer {} closing'.format(nr))

def main():
    p = Pool(5)
    p.apply_async(consume)
    p.map(produce, range(5))
    que.put(None)
    print('None')
    p.close()
    p.join()

if __name__ == '__main__':
    main()
Sample output:
~/Python/Examples $ ./multip_prod_cons.py
Producer 1 closing
Producer 3 closing
Producer 0 closing
Producer 2 closing
Producer 4 closing
None
break
Consumer closing
However, it works perfectly when I change one line:
que.put([nr] * 100)
It is 100% reproducible on a Linux system running Python 3.4.3 or Python 2.7.10. Am I missing something?
There is quite a lot of confusion here. What you are writing is not a producer/consumer scenario but a mess that misuses another pattern usually referred to as a "pool of workers".
The pool-of-workers pattern is an application of the producer/consumer one in which there is one producer which schedules the work and many consumers which consume it. In this pattern, the owner of the Pool ends up being the producer while the workers are the consumers.
In your example, instead, you have a hybrid solution where one worker ends up being a consumer and the others act as a sort of middleware. The whole design is very inefficient, duplicates most of the logic already provided by the Pool and, more importantly, is very error prone. What you end up suffering from is a deadlock.
Putting an object into a multiprocessing.Queue is an asynchronous operation. It blocks only if the Queue is full, and your Queue has infinite size.
This means your produce function returns immediately, therefore the call to p.map is not blocking as you expect it to. The related worker processes instead wait until the actual message goes through the Pipe which the Queue uses as its communication channel.
What happens next is that you terminate your consumer prematurely, as you put the None "message" into the Queue and it gets delivered before all the lists your produce function creates are properly pushed through the Pipe.
You notice the issue once you call p.join, but the real situation is the following:
The p.join call is waiting for all the worker processes to terminate.
The worker processes are waiting for the big lists to go through the Queue's Pipe.
As the consumer worker is long gone, nobody drains the Pipe, which is obviously full.
The issue does not show if your lists are small enough to go through before you actually send the termination message to the consume function.
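To make the point concrete, here is one possible restructuring, offered only as a sketch and not as the asker's original design: a dedicated consumer Process keeps draining the queue while the Pool runs only producers, and a Manager queue is used so it can be passed to the Pool workers as an argument.
from multiprocessing import Pool, Process, Manager

def consume(q):
    # Dedicated consumer: keeps draining, so producers never block on a full Pipe.
    while True:
        element = q.get()
        if element is None:          # sentinel: all producers are done
            break
    print('Consumer closing')

def produce(args):
    nr, q = args
    q.put([nr] * 1000000)
    print('Producer {} closing'.format(nr))

def main():
    with Manager() as manager:
        q = manager.Queue()          # proxy queue, safe to pass to Pool workers
        consumer = Process(target=consume, args=(q,))
        consumer.start()
        with Pool(5) as pool:
            pool.map(produce, [(nr, q) for nr in range(5)])
        q.put(None)                  # producers are finished; let the consumer exit
        consumer.join()

if __name__ == '__main__':
    main()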

producer/consumer-like multithreading in python

A typical producer/consumer problem is solved in Python like below:
from queue import Queue

job_queue = Queue(maxsize=10)

def manager():
    while i_have_some_job_do:
        job = get_data_from_somewhere()
        job_queue.put(job)  # blocks only if queue is currently full

def worker():
    while True:
        data = job_queue.get()  # blocks until data available
        # get things done
But I have a variant of the producer/consumer problem (not one strictly speaking, so let me call it manager/worker):
The manager puts some jobs into a Queue, and the workers should keep getting the jobs and doing them. But when a worker gets a job, it does not remove the job from the Queue (unlike Queue.get()). It is the manager that is able to remove a job from the Queue.
So how does a worker get a job without removing it from the queue? Maybe a get followed by a put is OK?
How does the manager remove a particular job from the queue?
Perhaps your workers can't remove jobs completely, but consider letting them move jobs from the original queue to a different "job done" queue. The move itself should be cheap and fast, and the manager can then process the "job done" queue, removing elements it agrees are done and moving the others back to the worker queue.
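A minimal sketch of that two-queue idea (the names process, accepted, and manager_review are hypothetical placeholders, not the asker's code): workers move finished jobs to a "done" queue instead of deleting them, and the manager reviews that queue, requeueing anything it rejects.
from queue import Queue

job_queue = Queue(maxsize=10)
done_queue = Queue()

def process(job):
    return job                      # placeholder for the real work

def accepted(job, result):
    return True                     # placeholder for the manager's acceptance check

def worker():
    while True:
        job = job_queue.get()               # take the job off the work queue
        result = process(job)
        done_queue.put((job, result))       # park it for the manager to review
        job_queue.task_done()

def manager_review():
    while True:
        job, result = done_queue.get()
        if not accepted(job, result):
            job_queue.put(job)              # manager disagrees: send it back to the workers
        done_queue.task_done()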

Threads in Python again

guys!
My application is a bot. It simply receives a message, process it and returns result.
But there are a lot of messages, and I'm creating a separate thread to process each one, which makes the application noticeably slower.
So, is there any way to reduce CPU usage by replacing the threads with something else?
You probably want processes rather than threads. Spawn processes at startup, and use Pipes to talk to them.
http://docs.python.org/dev/library/multiprocessing.html
Threads and processes have the same speed.
Your problem is not which one you use, but how many you use.
The answer is to have only a fixed handful of threads or processes, say 10.
You then create a Queue (use the Queue module) to store all the messages from your bot.
The 10 threads will constantly be working, and every time they finish, they wait for a new message on the Queue.
This saves you the overhead of creating and destroying threads.
See http://docs.python.org/library/queue.html for more info.
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
You could try creating only a limited number of workers and distributing the work between them. Python's multiprocessing.Pool would be the thing to use.
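A minimal sketch of that idea, assuming hypothetical handle_message and get_messages stand-ins for the bot's own code: a fixed-size multiprocessing.Pool avoids per-message thread creation and, for CPU-bound processing, sidesteps the GIL.
from multiprocessing import Pool

def handle_message(msg):
    # CPU-heavy processing runs in a worker process
    return msg.upper()

def get_messages():
    # stand-in for the bot's message source
    return ["hello", "world"]

if __name__ == '__main__':
    with Pool(processes=10) as pool:   # fixed pool: no per-message spawn cost
        for result in pool.imap_unordered(handle_message, get_messages()):
            print(result)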
You might not even need threads. If your server can handle each request quickly, you can just make it all single-threaded using something like Twisted.
