Python multithreading design suggestions

I have a question regarding my current threading design -- my current process spawns a new thread and continues the main thread until the termination condition. The process waits until all threads finish before terminating. The issue I am having is that each new thread spawned needs to see if the previous thread spawned is done. Should I simply set up a queue and use just one thread to process all the tasks? Or is it possible to spawn a thread, somehow check if the previous thread is done and process the task only once that thread in question is done?
Thanks for your help.

If all of the threads aside from the initial "main" thread are supposed to run sequentially, then yes, you should use a task queue and a single worker thread.
A queue.Queue can help with this (and allows your main thread to call .join() on it if it needs to wait for all of the queued tasks to be completed).
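A minimal sketch of that pattern, assuming the queued tasks are plain callables (the worker function and the None sentinel used for shutdown are illustrative):

import queue
import threading

def worker(q):
    # Process tasks one at a time, in the order they were queued.
    while True:
        task = q.get()
        if task is None:      # sentinel telling the worker to shut down
            q.task_done()
            break
        task()
        q.task_done()

q = queue.Queue()
threading.Thread(target=worker, args=(q,), daemon=True).start()

# The main thread keeps queueing work until its termination condition is met.
for i in range(3):
    q.put(lambda i=i: print("task", i))

q.put(None)   # ask the worker to stop
q.join()      # block until every queued task has been processed

Because there is only one worker, each task necessarily waits for the previous one to finish, which is exactly the ordering constraint described in the question.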

Look at Gevent.
You can create a Greenlet object for each of your tasks.
Each greenlet is a lightweight green thread.
from gevent import monkey
monkey.patch_all()

import gevent
from gevent import Greenlet

class Task(Greenlet):
    def __init__(self, name):
        Greenlet.__init__(self)
        self.name = name

    def _run(self):
        print("Task %s: some task..." % self.name)

t1 = Task("task1")
t2 = Task("task2")
t1.start()
t2.start()

# here we are waiting for all tasks
gevent.joinall([t1, t2])

Related

How to fix 'TypeError: can't pickle _thread.lock objects' when passing a Queue to a thread in a child process

I've been stuck on this issue all day, and I have not been able to find any solutions relating to what I am trying to accomplish.
I am trying to pass Queues to threads spawned in sub-processes. The Queues were created in the entrance file and passed to each sub-process as a parameter.
I am making a modular program to a) run a neural network, b) automatically update the network models when needed, and c) log events/images from the neural network to the servers. My former program utilized only one CPU core running multiple threads and was getting quite slow, so I decided I needed to sub-process certain parts of the program so that they can run in their own memory spaces to their fullest potential.
Sub-process:
Client-Server communication
Webcam control and image processing
Inferencing for the neural networks (there are 2 neural networks with their own process each)
4 total sub-processes.
As I develop, I need to communicate across each process so they are all on the same page with events from the servers and whatnot. So Queue would be the best option as far as I can tell.
(Clarify: 'Queue' from the 'multiprocessing' module, NOT the 'queue' module)
~~ However ~~
Each of these sub-processes spawn their own thread(s). For example, the 1st sub-process will spawn multiple threads: One thread per Queue to listen to the events from the different servers and hand them to different areas of the program; one thread to listen to the Queue receiving images from one of the neural networks; one thread to listen to the Queue receiving live images from the webcam; and one thread to listen to the Queue receiving the output from the other neural network.
I can pass the Queues to the sub-processes without issue and can use them effectively. However, when I try to pass them to the threads within each sub-process, I get the above error.
I am fairly new to multiprocessing; however, the methodology behind it looks to be relatively the same as threads except for the shared memory space and GIL.
This is from Main.py; the program entrance.
from lib.client import Client, Image
from multiprocessing import Queue, Process

class Main():

    def __init__(self, server):
        self.KILLQ = Queue()
        self.CAMERAQ = Queue()
        self.CLIENT = Client((server, 2005), self.KILLQ, self.CAMERAQ)
        self.CLIENT_PROCESS = Process(target=self.CLIENT.do, daemon=True)
        self.CLIENT_PROCESS.start()

if __name__ == '__main__':
    m = Main('127.0.0.1')
    while True:
        m.KILLQ.put("Hello world")
And this is from client.py (in a folder called lib)
class Client():

    def __init__(self, connection, killq, cameraq):
        self.TCP_IP = connection[0]
        self.TCP_PORT = connection[1]
        self.CAMERAQ = cameraq
        self.KILLQ = killq
        self.BUFFERSIZE = 1024
        self.HOSTNAME = socket.gethostname()
        self.ATTEMPTS = 0
        self.SHUTDOWN = False
        self.START_CONNECTION = MakeConnection((self.TCP_IP, self.TCP_PORT))
        # self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
        # self.KILLQ_THREAD.start()

    def do(self):
        # The function run as the subprocess from Main.py
        print(self.KILLQ.get())

    def _listen(self, q):
        # This is threaded multiple times, listening to each Queue
        # (passed in as 'q' when the thread is created)
        while True:
            print(q.get())
# self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
This is where the error is thrown. If I leave this line commented, the program runs fine. I can read from the queue in this sub-process without issue (i.e., in the function 'do'), but not in a thread under this sub-process (i.e., in the function '_listen').
I need to be able to communicate across each process so they can be in step with the main program (i.e. in the case of a neural network model update, the inference sub-process needs to shut down so the model can be updated without causing errors).
Any help with this would be great!
I am also very open to other methods of communication that would work as well. If you believe a better communication mechanism would work, it would need to be fast enough to support real-time streaming of 4K images sent to the server from the camera.
Thank you very much for your time! :)
The queue is not the problem. The ones from the multiprocessing package are designed to be picklable, so that they can be shared between processes.
The issue is that your thread KILLQ_THREAD is created in the main process. Threads are not to be shared between processes. In fact, when a process is forked following POSIX standards, threads that are active in the parent process are not part of the process image that is cloned to the new child's memory space. One reason is that the state of mutexes at the time of the call to fork() might lead to deadlocks in the child process.
You'll have to move the creation of your thread to your child process, i.e.
def do(self):
    self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
    self.KILLQ_THREAD.start()
Presumably, KILLQ is supposed to signal the child processes to shut down. In that case, especially if you plan to use more than one child process, a queue is not the best method to achieve that. Since Queue.get() and Queue.get_nowait() remove the item from the queue, each item can only be retrieved and processed by one consumer. Your producer would have to put multiple shutdown signals into the queue. In a multi-consumer scenario, you also have no reasonable way to ensure that a specific consumer receives any specific item. Any item put into the queue can potentially be retrieved by any of the consumers reading from it.
For signalling, especially with multiple recipients, an Event is the better fit.
You'll also notice that your program appears to hang quickly after starting it. That's because you start both your child process and the thread with daemon=True.
When your Client.do() method looks like the above, i.e. it creates and starts the thread and then exits, your child process ends right after the call to self.KILLQ_THREAD.start(), and the daemonic thread immediately ends with it. Your main process does not notice anything and continues to put Hello world into the queue until the queue eventually fills up and queue.Full is raised.
Here's a condensed code example using an Event for shutdown signalling in two child processes with one thread each.
main.py
import time
from lib.client import Client
from multiprocessing import Process, Event

class Main:

    def __init__(self):
        self.KILLQ = Event()
        self._clients = (Client(self.KILLQ), Client(self.KILLQ))
        self._procs = [Process(target=cl.do, daemon=True) for cl in self._clients]
        [proc.start() for proc in self._procs]

if __name__ == '__main__':
    m = Main()
    # do sth. else
    time.sleep(1)
    # signal for shutdown
    m.KILLQ.set()
    # grace period for both shutdown prints to show
    time.sleep(.1)
client.py
import multiprocessing
from threading import Thread

class Client:

    def __init__(self, killq):
        self.KILLQ = killq

    def do(self):
        # non-daemonic thread! We want the process to stick around until the thread
        # terminates on the signal set by the main process
        self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,))
        self.KILLQ_THREAD.start()

    @staticmethod
    def _listen(q):
        while not q.is_set():
            print("in thread {}".format(multiprocessing.current_process().name))
        print("{} - master signalled shutdown".format(multiprocessing.current_process().name))
Output
[...]
in thread Process-2
in thread Process-1
in thread Process-2
Process-2 - master signalled shutdown
in thread Process-1
Process-1 - master signalled shutdown
Process finished with exit code 0
As for methods of inter process communication, you might want to look into a streaming server solution.
Miguel Grinberg has written an excellent tutorial on Video Streaming with Flask back in 2014 with a more recent follow-up from August 2017.

Trying to understand python multithreading

Please consider this code:
import threading

def printer():
    for i in range(2):
        with lock:
            print(['foo', 'bar', 'baz'])

def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in range(2)]
    for t in threads:
        t.start()
        t.join()

main()
I can understand this code and it is clear: We create two threads and we run them sequentially - we run second thread only when first thread is finished. Ok, now consider another variant:
import threading

def printer():
    for i in range(2):
        with lock:
            print(['foo', 'bar', 'baz'])

def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

main()
What happens here? OK, we run them in parallel, but what is the purpose of making the main thread wait for the child threads in the second variant? How can it influence the output?
In the second variant, the ordering of execution is much less defined.
The lock is released each time through the loop in printer. In both variants, you have two threads, and each thread runs its loop body twice.
In the first variant, since only one thread runs at a time, you know the total ordering.
In the second variant, each time the lock is released, the thread running may change.
So you might get
thread 1 loop 1
thread 1 loop 2
thread 2 loop 1
thread 2 loop 2
or perhaps
thread 2 loop 1
thread 1 loop 1
thread 1 loop 2
thread 2 loop 2
The only constraint is that loop1 within a given thread runs before loop 2 within that thread and that the two print statements come together since the lock is held for both of them.
In this particular case I'm not sure the call to t.join() in the second variant has an observable effect. It guarantees that the main thread will be the last thread to end, but I'm not sure that in this code you can observe that in any way. In more complex code, joining the threads can be important so that cleanup actions are only performed after all threads terminate. This can also be very important if you have daemon threads, because the entire program will terminate when all non-daemon threads terminate.
To better understand multithreading in Python, you first need to understand the relationship between the main thread and the child threads.
The main thread is the entry point of the program; it is created by the system when you run your script. For example, in your script, the main function runs in the main thread.
A child thread, by contrast, is created by your main thread when you instantiate the Thread class.
The most important thing is how the main thread controls a child thread. Basically, the Thread instance is all that the main thread knows about and can control in that child thread. When a child thread is created, it does not run immediately; it starts only when the main thread calls start() on the thread instance. After the child thread has been started, you can assume that the main thread and the child thread are running in parallel.
Another important thing is how the main thread knows that the child thread's task is done. Although the main thread knows nothing about how the child thread performs its task, it is aware of the child thread's running status: Thread.is_alive() lets the main thread check whether a thread is still running. In practice, Thread.join() is commonly used to make the main thread wait until a child thread is done; this call blocks the main thread.
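A minimal sketch of that relationship (the sleeping worker function is just a stand-in for real work):

import threading
import time

def worker():
    time.sleep(0.5)    # simulate some work in the child thread

t = threading.Thread(target=worker)
t.start()              # the child thread now runs in parallel with the main thread
print(t.is_alive())    # True: the main thread can inspect the child's status
t.join()               # the main thread blocks here until the child is done
print(t.is_alive())    # False: the child thread has finished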
Okay, let's examine the two scripts you are confused about. For the first script:
for t in threads:
    t.start()
    t.join()
The child threads in the loop are started and then joined one by one. Note that start() does not block the main thread, while join() makes the main thread wait until that child thread is done. Thus the threads run sequentially.
While for the second script:
for t in threads:
    t.start()
for t in threads:
    t.join()
All child threads are started in the first loop. Since Thread.start() does not block the main thread, all child threads are running in parallel after the first loop. In the second loop, the main thread waits for each child thread to finish, one by one.
Now you should notice the difference between these two scripts: in the first one, the child threads run one by one, while in the second one, they run simultaneously.
There are other useful topics in Python threading:
(1) How do you handle KeyboardInterrupt, e.g., when you want to terminate the program with Ctrl-C? Only the main thread receives the exception, so you have to handle the termination of the child threads yourself (see the sketch after this list).
(2) Multithreading vs. multiprocessing. Although we say that threads run in parallel, they are not truly parallel at the CPU level (because of the GIL). So if your application is CPU intensive, try multiprocessing; if it is I/O intensive, multithreading may be sufficient.
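A minimal sketch of point (1), assuming the workers periodically check a shared threading.Event so the main thread can ask them to stop after Ctrl-C (the worker body is illustrative):

import threading
import time

stop = threading.Event()

def worker():
    while not stop.is_set():
        time.sleep(0.1)    # do a small chunk of work, then re-check the flag

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

try:
    while any(t.is_alive() for t in threads):
        time.sleep(0.1)
except KeyboardInterrupt:
    # Only the main thread receives the interrupt; it tells the workers to stop.
    stop.set()
for t in threads:
    t.join()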
By the way, reading through the Python threading documentation and trying some code may help you understand it.
Hope this is helpful. Thanks.

Python - Notifying another thread blocked on subprocess

I am creating a custom job scheduler with a web frontend in python 3.4 on linux. This program creates a daemon (consumer) thread that waits for jobs to come available in a PriorityQueue. These jobs can manually be added through the web interface which adds them to the queue. When the consumer thread finds a job, it executes a program using subprocess.run, and waits for it to finish.
The basic idea of the worker thread:
import subprocess
import threading

class Worker(threading.Thread):
    def __init__(self, queue):
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                proc = subprocess.run("myprogram", timeout=my_timeout)
                # do some more things
            except subprocess.TimeoutExpired:
                # do some administration
                self.queue.put(job)
However:
This consumer should be able to receive some kind of signal from the frontend (main thread) that it should stop the current job and instead work on the next job in the queue (saving the state of the current job and adding it to the end of the queue again). This can (and will most likely) happen while blocked on subprocess.run().
The subprocesses can simply be killed (the program that is executed saves some state in a file), but the worker thread needs to do some administration on the killed job to make sure it can be resumed later on.
There can be multiple such worker threads.
Signal handlers are not an option (since they are always handled by the main thread which is a webserver and should not be bothered with this).
Having an event loop in which the process actively polls for events (such as the child exiting, the timeout occurring or the interrupt event) is in this context not really a solution but an ugly hack. The jobs are performance-heavy and constant context switches are unwanted.
What synchronization primitives should I use to interrupt this thread or to make sure it waits for several events at the same time in a blocking fashion?
I think you've accidentally glossed over a simple solution: your second bullet point says that you have the ability to kill the programs that are running in subprocesses. Notice that subprocess.call returns the return code of the subprocess. This means that you can let the main thread kill the subprocess and just check the return code to see if you need to do any cleanup. Even better, you could use subprocess.check_call instead, which will raise an exception for you if the return code isn't 0. I don't know what platform you're working on, but on Linux, killed processes generally don't return 0.
It could look something like this:
class Worker(threading.Thread):
    def __init__(self, queue):
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                subprocess.check_call("myprogram", timeout=my_timeout)
                # do some more things
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                # do some administration
                self.queue.put(job)
Note that if you're using Python 3.5, you can use subprocess.run instead, and set the check argument to True.
If you have a strong need to handle the cases where the worker needs to be interrupted when it isn't running the subprocess, then I think you're going to have to use a polling loop, because I don't think the behavior you're looking for is supported for threads in Python. You can use a threading.Event object to pass the "stop working now" pseudo-signal from your main thread to the worker, and have the worker periodically check the state of that event object.
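A minimal sketch of that polling approach, with an illustrative stop_event shared between the main thread and the worker; the timeout on queue.get() is what gives the worker a regular chance to notice the event:

import queue
import subprocess
import threading

class Worker(threading.Thread):
    def __init__(self, job_queue, stop_event):
        super().__init__()
        self.queue = job_queue
        self.stop_event = stop_event

    def run(self):
        while not self.stop_event.is_set():
            try:
                # Wake up periodically so stop_event gets re-checked.
                job = self.queue.get(timeout=1)
            except queue.Empty:
                continue
            try:
                subprocess.run("myprogram", timeout=60, check=True)
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                # do some administration, then requeue the job
                self.queue.put(job)

The main thread would create the event with threading.Event(), pass it to each Worker, and call stop_event.set() when it wants the workers to wind down between jobs.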
If you're willing to consider using multiple processes instead of threads, consider switching over to the multiprocessing module, which would allow you to handle signals. There is more overhead to spawning full-blown subprocesses instead of threads, but you're essentially looking for signal-like asynchronous behavior, and I don't think Python's threading library supports anything like that. One benefit, though, would be that you would be freed from the Global Interpreter Lock (PDF link), so you may actually see some speed benefits if your worker processes (formerly threads) are doing anything CPU intensive.

python threads synchronization

I have a python app that goes like this:
main thread is GUI
there is a config thread that is started even before GUI
config thread starts a few others independent threads
=> How can I let the GUI know that all of those "independent threads" (from step 3) have finished? How do I detect it in my program (just give me a general idea)?
I know about semaphores, but I couldn't figure it out, since this is a little more logically complicated than what I am used to when dealing with threads.
PS: all those threads are QThreads from PyQt, if that is of any importance, but I doubt it.
Thanks.
The Queue module is great for communicating between threads without worrying about locks or other mutexes. It features a pair of methods, task_done() and join() that are used to signal the completion of tasks and to wait for all tasks to be completed. Here's an example from the docs:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done

Parent Thread exiting before Child Threads [python]

I'm using Python in a webapp (CGI for testing, FastCGI for production) that needs to send an occasional email (when a user registers or something else important happens). Since communicating with an SMTP server takes a long time, I'd like to spawn a thread for the mail function so that the rest of the app can finish up the request without waiting for the email to finish sending.
I tried using thread.start_new(func, (args)), but the parent returns and exits before the sending is complete, thereby killing the sending thread before it does anything useful. Is there any way to keep the process alive long enough for the child thread to finish?
Take a look at the thread.join() method. Basically it will block your calling thread until the child thread has returned (thus preventing it from exiting before it should).
Update:
To avoid making your main thread unresponsive to new requests you can use a while loop.
while threading.active_count() > 1:  # the main thread itself counts as one
    # ... look for new requests to handle ...
    time.sleep(0.1)
    # or try joining your threads with a timeout
    #for thread in my_threads:
    #    thread.join(0.1)
Update 2:
It also looks like thread.start_new(func, args) is obsolete. It was updated to thread.start_new_thread(function, args[, kwargs]). You can also create threads with the higher-level threading package (this is the package that provides the active_count() used in the previous code block):
import threading
my_thread = threading.Thread(target=func, args=(), kwargs={})
my_thread.daemon = True
my_thread.start()
You might want to use threading.enumerate, if you have multiple workers and want to see which one(s) are still running.
Other alternatives include using threading.Event: the main thread sets the event and starts the worker thread off. The worker thread clears the event when it finishes its work, and the main thread checks whether the event is set or cleared to figure out whether it can exit.
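A minimal sketch of that Event-based approach, with send_email standing in for the slow SMTP work:

import threading
import time

busy = threading.Event()

def send_email():
    time.sleep(1)    # stand-in for the slow SMTP conversation
    busy.clear()     # the worker signals that it has finished

busy.set()           # the main thread marks the worker as busy before starting it
threading.Thread(target=send_email).start()

# ... finish handling the rest of the request ...

while busy.is_set():     # the main thread waits until the worker clears the event
    time.sleep(0.05)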
