An exception is raised in threading._wait_for_tstate_lock when I transfer huge data between a Process and a Thread via a multiprocessing.Queue.
My minimal working example looks a bit complex at first - sorry, I will explain. The original application loads a lot of (not so important) files into RAM. This is done in a separate process to save resources, and the main GUI thread shouldn't freeze.
The GUI starts a separate Thread to prevent the GUI event loop from freezing.
This separate Thread then starts one Process which does the actual work.
a) This Thread instantiates a multiprocessing.Queue (be aware that this is multiprocessing and not threading!)
b) This queue is given to the Process for sharing data from the Process back to the Thread.
The Process does some work (3 steps) and .put()s each result into the multiprocessing.Queue.
When the Process ends, the Thread takes over again, collects the data from the Queue and stores it in its own attribute MyThread.result.
The Thread tells the GUI main loop/thread to call a callback function when it has time for it.
The callback function (MyWindow::callback_thread_finished()) gets the results from MyWindow.thread.result.
The problem: if the data put into the Queue is too big, something happens that I don't understand - MyThread never ends. I have to cancel the application via Ctrl+C.
I got some hints from the docs, but my problem is that I did not fully understand the documentation. I have the feeling that the key to my problem can be found there.
Please see the two red warning boxes in "Pipes and Queues" (Python 3.5 docs).
This is the full output:
MyWindow::do_start()
Running MyThread...
Running MyProcess...
MyProcess stoppd.
^CProcess MyProcess-1:
Exception ignored in: <module 'threading' from '/usr/lib/python3.5/threading.py'>
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 1288, in _shutdown
t.join()
File "/usr/lib/python3.5/threading.py", line 1054, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
util._exit_function()
File "/usr/lib/python3.5/multiprocessing/util.py", line 314, in _exit_function
_run_finalizers()
File "/usr/lib/python3.5/multiprocessing/util.py", line 254, in _run_finalizers
finalizer()
File "/usr/lib/python3.5/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/queues.py", line 198, in _finalize_join
thread.join()
File "/usr/lib/python3.5/threading.py", line 1054, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
This is the minimal working example:
#!/usr/bin/env python3
import multiprocessing
import threading
import time
import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk
from gi.repository import GLib
class MyThread (threading.Thread):
"""This thread just starts the process."""
def __init__(self, callback):
threading.Thread.__init__(self)
self._callback = callback
def run(self):
print('Running MyThread...')
self.result = []
queue = multiprocessing.Queue()
process = MyProcess(queue)
process.start()
process.join()
while not queue.empty():
process_result = queue.get()
self.result.append(process_result)
print('MyThread stoppd.')
GLib.idle_add(self._callback)
class MyProcess (multiprocessing.Process):
def __init__(self, queue):
multiprocessing.Process.__init__(self)
self.queue = queue
def run(self):
print('Running MyProcess...')
for i in range(3):
self.queue.put((i, 'x'*102048))
print('MyProcess stoppd.')
class MyWindow (Gtk.Window):
def __init__(self):
Gtk.Window.__init__(self)
self.connect('destroy', Gtk.main_quit)
GLib.timeout_add(2000, self.do_start)
def do_start(self):
print('MyWindow::do_start()')
# The process need to be started from a separate thread
# to prevent the main thread (which is the gui main loop)
# from freezing while waiting for the process result.
self.thread = MyThread(self.callback_thread_finished)
self.thread.start()
def callback_thread_finished(self):
result = self.thread.result
for r in result:
print('{} {}...'.format(r[0], r[1][:10]))
if __name__ == '__main__':
win = MyWindow()
win.show_all()
Gtk.main()
Possible duplicate but quite different and IMO without an answer for my situation: Thread._wait_for_tstate_lock() never returns.
Workaround
Using a Manager, i.e. changing the line queue = multiprocessing.Queue() to queue = multiprocessing.Manager().Queue(), solves the problem. But I don't know why. My intention with this question is to understand the things behind it, not only to make my code work. I also don't really know what a Manager() is and whether it has other (problem-causing) implications.
According to the second warning box in the documentation you are linking to, you can get a deadlock when you join a process before processing all items in the queue. So starting the process, immediately joining it and then processing the items in the queue is the wrong order of steps. You have to start the process, then receive the items, and only when all items are received can you call the join method. Define some sentinel value to signal that the process has finished sending data through the queue - None, for example, if that can't be a regular value you expect from the process.
class MyThread(threading.Thread):
"""This thread just starts the process."""
def __init__(self, callback):
threading.Thread.__init__(self)
self._callback = callback
self.result = []
def run(self):
print('Running MyThread...')
queue = multiprocessing.Queue()
process = MyProcess(queue)
process.start()
while True:
process_result = queue.get()
if process_result is None:
break
self.result.append(process_result)
process.join()
print('MyThread stoppd.')
GLib.idle_add(self._callback)
class MyProcess(multiprocessing.Process):
def __init__(self, queue):
multiprocessing.Process.__init__(self)
self.queue = queue
def run(self):
print('Running MyProcess...')
for i in range(3):
self.queue.put((i, 'x' * 102048))
self.queue.put(None)
print('MyProcess stoppd.')
The documentation in question reads:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
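For completeness, a sketch of what MyThread.run() looks like with a manager queue, matching the questioner's workaround (just a sketch, not from the original post; with a manager queue the original start/join/drain order is fine, because the child process has no internal feeder thread that must be flushed before it can exit):
def run(self):
    print('Running MyThread...')
    self.result = []
    manager = multiprocessing.Manager()   # runs a small server process
    queue = manager.Queue()               # a proxy to a queue living inside that server process
    process = MyProcess(queue)
    process.start()
    process.join()                        # safe here: put() already delivered the data to the manager
    while not queue.empty():
        self.result.append(queue.get())
    print('MyThread stoppd.')
    GLib.idle_add(self._callback)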
Related
I'm trying to use a cluster of computers to run millions of small simulations. To do this I tried to set up two "servers" on my main computer: one to add input variables to a queue exposed to the network, and one to take care of the results.
This is the code for putting stuff into the simulation variables queue:
"""This script reads start parameters and calls on run_sim to run the
simulations"""
import time
from multiprocessing import Process, freeze_support, Manager, Value, Queue, current_process
from multiprocessing.managers import BaseManager
class QueueManager(BaseManager):
pass
class MultiComputers(Process):
def __init__(self, sim_name, queue):
self.sim_name = sim_name
self.queue = queue
super(MultiComputers, self).__init__()
def get_sim_obj(self, offset, db):
"""returns a list of lists from a database query"""
def handle_queue(self):
self.sim_nr = 0
sims = self.get_sim_obj()
self.total = len(sims)
while len(sims) > 0:
if self.queue.qsize() > 100:
self.queue.put(sims[0])
self.sim_nr += 1
print(self.sim_nr, round(self.sim_nr/self.total * 100, 2), self.queue.qsize())
del sims[0]
def run(self):
self.handle_queue()
if __name__ == '__main__':
freeze_support()
queue = Queue()
w = MultiComputers('seed_1_hundred', queue)
w.start()
QueueManager.register('get_queue', callable=lambda: queue)
m = QueueManager(address=('', 8001), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()
And then this queue is run to take care of the results of the simulations:
__author__ = 'axa'
from multiprocessing import Process, freeze_support, Queue
from multiprocessing.managers import BaseManager
import time
class QueueManager(BaseManager):
pass
class SaveFromMultiComp(Process):
def __init__(self, sim_name, queue):
self.sim_name = sim_name
self.queue = queue
super(SaveFromMultiComp, self).__init__()
def run(self):
res_got = 0
with open('sim_type1_' + self.sim_name, 'a') as f_1:
with open('sim_type2_' + self.sim_name, 'a') as f_2:
while True:
if self.queue.qsize() > 0:
while self.queue.qsize() > 0:
res = self.queue.get()
res_got += 1
if res[0] == 1:
f_1.write(str(res[1]) + '\n')
elif res[0] == 2:
f_2.write(str(res[1]) + '\n')
print(res_got)
time.sleep(0.5)
if __name__ == '__main__':
queue = Queue()
w = SaveFromMultiComp('seed_1_hundred', queue)
w.start()
m = QueueManager(address=('', 8002), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()
These scripts work as expected for the first ~700-800 simulations; after that I get the following error in the terminal running the result-receiving script:
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python35\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "C:\Python35\lib\threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "C:\Python35\lib\multiprocessing\managers.py", line 177, in accepter
t.start()
File "C:\Python35\lib\threading.py", line 844, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Can anyone give some insight into where and how the threads are spawned? Is a new thread spawned every time I call queue.get(), or how does it work?
And I would be very glad if someone knows what I can do to avoid this failure. (I'm running the script with Python 3.5, 32-bit.)
All signs point to your system running out of the resources it needs to launch a thread (probably memory, but you could be leaking threads or other resources). You could use OS monitoring tools (top for Linux, Resource Monitor for Windows) to look at the number of threads and memory usage to track this down, but I would recommend you just use an easier, more efficient programming pattern.
While not a perfect comparison, you are essentially running into the C10K problem: blocking threads that wait for results do not scale well and are prone to resource errors like this. The solution was to implement async I/O patterns (one blocking thread that dispatches work to other workers), and this is pretty straightforward to do with web servers.
A framework like Python's aiohttp should be a good fit for what you want. You just need a handler that can receive the ID of the remote job and its result. The framework should take care of the scaling for you.
So in your case you can keep your launching code, but after it starts the process on the remote machine, kill the thread. Have the remote code then send an HTTP message to your server with 1) its ID and 2) its result. Throw in a little extra code to ask it to try again if it does not get a 200 'OK' status code, and you should be in much better shape.
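As a rough sketch of that idea (the route, port, payload fields and save_result are all made up for illustration, not part of the original code), the result-collecting side with aiohttp could look like this:
from aiohttp import web

async def receive_result(request):
    # Hypothetical payload from a remote worker: {"sim_id": ..., "result": ...}
    data = await request.json()
    save_result(data["sim_id"], data["result"])   # placeholder for your own storage code
    return web.Response(text="OK")                # a 200 response tells the worker not to retry

app = web.Application()
app.add_routes([web.post("/result", receive_result)])

if __name__ == "__main__":
    web.run_app(app, port=8002)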
I think you have too many threads running for your system. I would first check the system resources and then rethink the program.
Try limiting your threads and using as few as possible.
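As a general illustration of that advice (it does not address the manager's internal threads, and handle_result and the result source are placeholders), a fixed-size pool keeps the number of worker threads bounded no matter how much work arrives:
from concurrent.futures import ThreadPoolExecutor

def handle_result(res):
    # placeholder: write the result wherever it belongs
    pass

def consume(results):
    # At most 8 worker threads exist at any time, however many results come in.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for res in results:   # `results` is any iterable of incoming results
            pool.submit(handle_result, res)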
Running the following minimized and reproducible code example, Python (e.g. 3.7.3 and 3.8.3) emits the message below when Ctrl+C is first pressed, rather than terminating the program.
Traceback (most recent call last):
File "main.py", line 44, in <module>
Main()
File "main.py", line 41, in __init__
self.interaction_manager.join()
File "/home/user/anaconda3/lib/python3.7/threading.py", line 1032, in join
self._wait_for_tstate_lock()
File "/home/user/anaconda3/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
Only when a second Ctrl+C is pressed after that does the program terminate.
What is the rationale behind this design? What would be an elegant way to avoid the need for more than a single Ctrl+C or underlying signal?
Here's the code:
from threading import Thread
from queue import Queue, Empty
def get_event(queue: Queue, block=True, timeout=None):
""" just a convenience wrapper for avoiding try-except clutter in code """
try:
element = queue.get(block, timeout)
except Empty:
element = Empty
return element
class InteractionManager(Thread):
def __init__(self):
super().__init__()
self.queue = Queue()
def run(self):
while True:
event = get_event(self.queue, block=True, timeout=0.1)
class Main(object):
def __init__(self):
# kick off the user interaction
self.interaction_manager = InteractionManager()
self.interaction_manager.start()
# wait for the interaction manager object shutdown as a signal to shutdown
self.interaction_manager.join()
if __name__ == "__main__":
Main()
Prehistoric related question: Interruptible thread join in Python
Python waits for all non-daemon threads before exiting. The first Ctrl+C merely kills the explicit self.interaction_manager.join(); the second Ctrl+C kills the internal join() that threading performs during shutdown. Either declare the thread as an expendable daemon thread, or signal it to shut down.
A thread can be declared as expendable by setting daemon=True, either as a keyword or attribute:
class InteractionManager(Thread):
def __init__(self):
super().__init__(daemon=True)
self.queue = Queue()
def run(self):
while True:
event = get_event(self.queue, block=True, timeout=0.1)
A daemon thread is killed abruptly, and may fail to cleanly release resources if it holds any.
Graceful shutdown can be coordinated using a shared flag, such as threading.Event or a boolean value:
shutdown = False
class InteractionManager(Thread):
def __init__(self):
super().__init__()
self.queue = Queue()
def run(self):
while not shutdown:
event = get_event(self.queue, block=True, timeout=0.1)
def main():
    interaction_manager = InteractionManager()
    interaction_manager.start()
    try:
        interaction_manager.join()
    finally:
        global shutdown
        shutdown = True
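The same idea can be expressed with a threading.Event instead of a bare boolean; this is just a sketch building on the code above (get_event is the helper from the question), with the small advantage that no global statement is needed because the event object itself is never reassigned:
import threading
from queue import Queue

shutdown = threading.Event()

class InteractionManager(threading.Thread):
    def __init__(self):
        super().__init__()
        self.queue = Queue()

    def run(self):
        # The loop exits as soon as the event is set instead of checking a
        # module-level boolean.
        while not shutdown.is_set():
            event = get_event(self.queue, block=True, timeout=0.1)

def main():
    interaction_manager = InteractionManager()
    interaction_manager.start()
    try:
        interaction_manager.join()
    finally:
        shutdown.set()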
I am getting a BrokenPipeError when threads which use a multiprocessing.JoinableQueue spawn processes. It seems to happen after the program has finished its work and tries to exit, because it does everything it is supposed to do. What does it mean? Is there a way to fix this, or is it safe to ignore?
import requests
import multiprocessing
from multiprocessing import JoinableQueue
from queue import Queue
import threading
class ProcessClass(multiprocessing.Process):
def __init__(self, func, in_queue, out_queue):
super().__init__()
self.in_queue = in_queue
self.out_queue = out_queue
self.func = func
def run(self):
while True:
arg = self.in_queue.get()
self.func(arg, self.out_queue)
self.in_queue.task_done()
class ThreadClass(threading.Thread):
def __init__(self, func, in_queue, out_queue):
super().__init__()
self.in_queue = in_queue
self.out_queue = out_queue
self.func = func
def run(self):
while True:
arg = self.in_queue.get()
self.func(arg, self.out_queue)
self.in_queue.task_done()
def get_urls(host, out_queue):
r = requests.get(host)
out_queue.put(r.text)
print(r.status_code, host)
def get_title(text, out_queue):
print(text.strip('\r\n ')[:5])
if __name__ == '__main__':
def test():
q1 = JoinableQueue()
q2 = JoinableQueue()
for i in range(2):
t = ThreadClass(get_urls, q1, q2)
t.daemon = True
t.setDaemon(True)
t.start()
for i in range(2):
t = ProcessClass(get_title, q2, None)
t.daemon = True
t.start()
for host in ("http://ibm.com", "http://yahoo.com", "http://google.com", "http://amazon.com", "http://apple.com",):
q1.put(host)
q1.join()
q2.join()
test()
print('Finished')
Program output:
200 http://ibm.com
<!DOC
200 http://google.com
<!doc
200 http://yahoo.com
<!DOC
200 http://apple.com
<!DOC
200 http://amazon.com
<!DOC
Finished
Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\Python\33\lib\multiprocessing\connection.py", line 313, in _recv_bytes
nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109]
The pipe has been ended
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python\33\lib\threading.py", line 901, in _bootstrap_inner
self.run()
File "D:\Progs\Uspat\uspat\spider\run\threads_test.py", line 31, in run
arg = self.in_queue.get()
File "C:\Python\33\lib\multiprocessing\queues.py", line 94, in get
res = self._recv()
File "C:\Python\33\lib\multiprocessing\connection.py", line 251, in recv
buf = self._recv_bytes()
File "C:\Python\33\lib\multiprocessing\connection.py", line 322, in _recv_bytes
raise EOFError
EOFError
....
(cut same errors for other threads.)
If I switch the JoinableQueue to queue.Queue for the multithreading part, everything is fixed - but why?
This is happening because you're leaving the background threads blocked in a multiprocessing.Queue.get call when the main thread exits, but it only happens under certain conditions:
A daemon thread is running and blocking on a multiprocessing.Queue.get when the main thread exits.
A multiprocessing.Process is running.
The multiprocessing context is something other than 'fork'.
The exception is telling you that the other end of the Connection that the multiprocessing.JoinableQueue is listening to while it's inside a get() call sent an EOF. Generally this means the other side of the Connection has closed. It makes sense that this happens during shutdown - Python is cleaning up all objects prior to exiting the interpreter, and part of that cleanup involves closing all the open Connection objects.
What I haven't been able to figure out yet is why it only (and always) happens if a multiprocessing.Process has been spawned (not forked, which is why it doesn't happen on Linux by default) and is still running. I can even reproduce it if I create a multiprocessing.Process that just sleeps in a while loop and doesn't take any Queue objects at all. For whatever reason, the presence of a running, spawned child process seems to guarantee the exception will be raised. It might simply cause the order in which things are destroyed to be just right for the race condition to occur, but that's a guess.
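To make those conditions concrete, here is a stripped-down sketch of the situation described above (not from the original post; whether the traceback actually appears can vary with platform and Python version):
import multiprocessing
import threading
import time

def sleeper():
    # A spawned child that is still alive when the main thread exits.
    while True:
        time.sleep(1)

def blocked_get(q):
    # A daemon thread stuck in multiprocessing.Queue.get() at interpreter shutdown.
    q.get()

if __name__ == '__main__':
    q = multiprocessing.Queue()
    t = threading.Thread(target=blocked_get, args=(q,))
    t.daemon = True
    t.start()
    p = multiprocessing.Process(target=sleeper)
    p.daemon = True
    p.start()
    time.sleep(1)
    # The main thread ends here; with the 'spawn' start method this combination
    # tends to produce the EOFError/BrokenPipeError traceback during cleanup.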
In any case, using a queue.Queue instead of multiprocessing.JoinableQueue is a good way to fix it, since you don't actually need a multiprocessing.Queue there. You could also make sure that the background threads and/or background processes are shut down before the main thread, by sending sentinels to their queues. So, make both run methods check for the sentinel:
def run(self):
    for arg in iter(self.in_queue.get, None):  # None is the sentinel
        self.func(arg, self.out_queue)
        self.in_queue.task_done()
    self.in_queue.task_done()  # account for the sentinel itself
And then send the sentinels when you're done:
threads = []
for i in range(2):
t = ThreadClass(get_urls, q1, q2)
t.daemon = True
t.setDaemon(True)
t.start()
threads.append(t)
procs = []
for i in range(2):
t = ProcessClass(get_title, q2, None)
t.daemon = True
t.start()
procs.append(t)
for host in ("http://ibm.com", "http://yahoo.com", "http://google.com", "http://amazon.com", "http://apple.com",):
q1.put(host)
q1.join()
# All items have been consumed from input queue, lets start shutting down.
for t in procs:
q2.put(None)
t.join()
for t in threads:
q1.put(None)
t.join()
q2.join()
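For the simpler fix mentioned at the start (using queue.Queue where no process boundary is crossed), the only change in test() would be how q1 is created; q2 still has to be a multiprocessing queue because the ProcessClass workers read from it. A sketch of just that change:
from queue import Queue                  # plain in-process, thread-safe queue
from multiprocessing import JoinableQueue

q1 = Queue()            # only the main thread and the ThreadClass workers use q1
q2 = JoinableQueue()    # q2 crosses into the ProcessClass children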
Here's some slimmed down code that demonstrates my use of threading:
import threading
import Queue
import time
def example():
""" used in MainThread as the example generator """
while True:
yield 'asd'
class ThreadSpace:
""" A namespace to be shared among threads/functions """
# set this to True to kill the threads
exit_flag = False
class MainThread(threading.Thread):
def __init__(self, output):
super(MainThread, self).__init__()
self.output = output
def run(self):
# this is a generator that contains a While True
for blah in example():
self.output.put(blah)
if ThreadSpace.exit_flag:
break
time.sleep(0.1)
class LoggerThread(threading.Thread):
def __init__(self, output):
super(LoggerThread, self).__init__()
self.output = output
def run(self):
while True:
data = self.output.get()
print data
def main():
# start the logging thread
logging_queue = Queue.Queue()
logging_thread = LoggerThread(logging_queue)
logging_thread.daemon = True
logging_thread.start()
# launch the main thread
main_thread = MainThread(logging_queue)
main_thread.start()
try:
while main_thread.isAlive():
time.sleep(0.5)
except KeyboardInterrupt:
ThreadSpace.exit_flag = True
if __name__ == '__main__':
main()
I have one main thread which gets data yielded to it from a blocking generator. In the real code, this generator yields network-related data it sniffs from a socket.
I then have a logging daemon thread which prints the data to the screen.
To exit the program cleanly, I'm catching a KeyboardInterrupt, which sets exit_flag to True - this tells the main thread to return.
9 times out of 10, this will work fine and the program will exit cleanly. However, there are a few occasions when I'll receive one of the following two errors:
Error 1:
^CTraceback (most recent call last):
File "demo.py", line 92, in <module>
main('')
File "demo.py", line 87, in main
time.sleep(0.5)
KeyboardInterrupt
Error 2:
Exception KeyboardInterrupt in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
I've run this exact sample code a few times and haven't been able to replicate the errors. The only difference between this and the real code is the example() generator. This, like I said, yields network data from the socket.
Can you see anything wrong with how I'm handling the threads?
KeyboardInterrupts are received by arbitrary threads. If the receiver isn't the main thread, it dies, the main thread is unaffected, ThreadSpace.exit_flag remains false, and the script keeps running.
If you want SIGINT to work, you can have each thread catch KeyboardInterrupt and call thread.interrupt_main() to get Python to exit, or use the signal module as the official documentation explains.
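Applied to the code in the question, that would look roughly like this (Python 2, matching the example; only MainThread.run() changes, everything else stays as posted):
import thread  # Python 2 module; the Python 3 equivalent is _thread

class MainThread(threading.Thread):
    def __init__(self, output):
        super(MainThread, self).__init__()
        self.output = output

    def run(self):
        try:
            for blah in example():
                self.output.put(blah)
                if ThreadSpace.exit_flag:
                    break
                time.sleep(0.1)
        except KeyboardInterrupt:
            # If this worker happened to receive the Ctrl+C, forward it to the
            # main thread so the shutdown logic in main() still runs.
            thread.interrupt_main()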
I am trying to run a simple multi-process application in Python. The main thread spawns 1 to N processes and waits until they are all done processing. Each process runs an infinite loop, so it can potentially run forever without user interruption, so I put in some code to handle a KeyboardInterrupt:
#!/usr/bin/env python
import sys
import time
from multiprocessing import Process
def main():
# Set up inputs..
# Spawn processes
Proc( 1).start()
Proc( 2).start()
class Proc ( Process ):
def __init__ ( self, procNum):
self.id = procNum
Process.__init__(self)
def run ( self ):
doneWork = False
while True:
try:
# Do work...
time.sleep(1)
sys.stdout.write('.')
if doneWork:
print "PROC#" + str(self.id) + " Done."
break
except KeyboardInterrupt:
print "User aborted."
sys.exit()
# Main Entry
if __name__=="__main__":
main()
The problem is that when using CTRL-C to exit, I get an additional error even though the processes seem to exit immediately:
......User aborted.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "C:\Python26\lib\atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "C:\Python26\lib\multiprocessing\util.py", line 281, in _exit_function
p.join()
File "C:\Python26\lib\multiprocessing\process.py", line 119, in join
res = self._popen.wait(timeout)
File "C:\Python26\lib\multiprocessing\forking.py", line 259, in wait
res = _subprocess.WaitForSingleObject(int(self._handle), msecs)
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
File "C:\Python26\lib\atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "C:\Python26\lib\multiprocessing\util.py", line 281, in _exit_function
p.join()
File "C:\Python26\lib\multiprocessing\process.py", line 119, in join
res = self._popen.wait(timeout)
File "C:\Python26\lib\multiprocessing\forking.py", line 259, in wait
res = _subprocess.WaitForSingleObject(int(self._handle), msecs)
KeyboardInterrupt
I am running Python 2.6 on Windows. If there is a better way to handle multiprocessing in Python, please let me know.
This is a very old question, but it seems like the accepted answer does not actually fix the problem.
The main issue is that you need to handle the keyboard interrupt in the parent process as well. In addition, in the while loop you just need to exit the loop; there is no need to call sys.exit().
I've tried to match the example in the original question as closely as possible. The doneWork code does nothing in the example, so I have removed it for clarity.
import sys
import time
from multiprocessing import Process
def main():
# Set up inputs..
# Spawn processes
try:
processes = [Proc(1), Proc(2)]
[p.start() for p in processes]
[p.join() for p in processes]
except KeyboardInterrupt:
pass
class Proc(Process):
def __init__(self, procNum):
self.id = procNum
Process.__init__(self)
def run(self):
while True:
try:
# Do work...
time.sleep(1)
sys.stdout.write('.')
except KeyboardInterrupt:
print("User aborted.")
break
# Main Entry
if __name__ == "__main__":
main()
Rather than just forcing sys.exit(), you want to send a signal to your threads to tell them to stop. Look into using signal handlers with threads in Python.
You could potentially do this by changing your while True: loop to be while keep_processing:, where keep_processing is some sort of global variable that gets set on the KeyboardInterrupt exception. I don't think this is good practice though.
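One caveat with that approach here: these workers are processes, not threads, so a plain module-level variable set in the parent will not be seen by the children. A shared primitive such as multiprocessing.Event does work across the process boundary. A rough sketch of that variant (class and variable names follow the question; the details are illustrative only):
import sys
import time
from multiprocessing import Process, Event

class Proc(Process):
    def __init__(self, procNum, stop_event):
        self.id = procNum
        self.stop_event = stop_event
        Process.__init__(self)

    def run(self):
        while not self.stop_event.is_set():
            try:
                time.sleep(1)
                sys.stdout.write('.')
            except KeyboardInterrupt:
                # On Windows the children may see the Ctrl+C as well; ignore it
                # here and let the event set by the parent end the loop.
                pass
        print("PROC#" + str(self.id) + " done.")

if __name__ == "__main__":
    stop_event = Event()
    procs = [Proc(1, stop_event), Proc(2, stop_event)]
    for p in procs:
        p.start()
    try:
        for p in procs:
            p.join()
    except KeyboardInterrupt:
        stop_event.set()   # ask the children to finish their loops
        for p in procs:
            p.join()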