I'm doing some calculations with Sage.
I am playing around with fork. I have a very simple test case which is basically like this:
def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            # some dummy matrix calculation
            pass
        finally:
            os._exit(0)
(Look below for _fork_test_func() for some matrix calculations.)
And I'm getting:
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
With this (incomplete) backtrace:
Crashed Thread: 0 Dispatch queue: com.apple.root.default-priority
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Application Specific Information:
BUG IN LIBDISPATCH: flawed group/semaphore logic
Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority
0 libsystem_kernel.dylib 0x00007fff8c6d1d46 __kill + 10
1 libcsage.dylib 0x0000000101717f33 sigdie + 124
2 libcsage.dylib 0x0000000101717719 sage_signal_handler + 364
3 libsystem_c.dylib 0x00007fff86b1094a _sigtramp + 26
4 libdispatch.dylib 0x00007fff89a66c74 _dispatch_thread_semaphore_signal + 27
5 libdispatch.dylib 0x00007fff89a66f3e _dispatch_apply2 + 143
6 libdispatch.dylib 0x00007fff89a66e30 dispatch_apply_f + 440
7 libBLAS.dylib 0x00007fff906ca435 APL_dtrsm + 1963
8 libBLAS.dylib 0x00007fff906702b6 cblas_dtrsm + 882
9 matrix_modn_dense_double.so 0x0000000108612615 void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 2853
10 matrix_modn_dense_double.so 0x0000000108611daa void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 698
11 matrix_modn_dense_double.so 0x0000000108612ccf void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::operator()<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long) + 831
12 ??? 0x00007f99e481a028 0 + 140298940424232
Thread 1:
0 libsystem_kernel.dylib 0x00007fff8c6d26d6 __workq_kernreturn + 10
1 libsystem_c.dylib 0x00007fff86b24f4c _pthread_workq_return + 25
2 libsystem_c.dylib 0x00007fff86b24d13 _pthread_wqthread + 412
3 libsystem_c.dylib 0x00007fff86b0f1d1 start_wqthread + 13
Thread 2:
0 libsystem_kernel.dylib 0x00007fff8c6d26d6 __workq_kernreturn + 10
1 libsystem_c.dylib 0x00007fff86b24f4c _pthread_workq_return + 25
2 libsystem_c.dylib 0x00007fff86b24d13 _pthread_wqthread + 412
3 libsystem_c.dylib 0x00007fff86b0f1d1 start_wqthread + 13
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x00007fff5ec8e418 rcx: 0x00007fff5ec8df28 rdx: 0x0000000000000000
rdi: 0x000000000000b8f7 rsi: 0x0000000000000004 rbp: 0x00007fff5ec8df40 rsp: 0x00007fff5ec8df28
r8: 0x00007fff5ec8e418 r9: 0x0000000000000000 r10: 0x000000000000000a r11: 0x0000000000000202
r12: 0x00007f99ea500de0 r13: 0x0000000000000003 r14: 0x00007fff5ec8e860 r15: 0x00007fff906ca447
rip: 0x00007fff8c6d1d46 rfl: 0x0000000000000202 cr2: 0x00007fff74a29848
Logical CPU: 0
Is there something special I need to do after a fork? I looked up the fork decorator of Sage and it looks like it basically does the same.
The crash also happens with the fork decorator of Sage itself. Another test case:
def fork_test2():
    def test():
        # do some stuff
        pass
    from sage.parallel.decorate import fork
    test_ = fork(test, verbose=True)
    test_()
Even simpler test case:
def _fork_test_func():
    while True:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100)])
        m.right_kernel()

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            _fork_test_func()
        finally:
            os._exit(0)
Results in a slightly different crash:
python(48672) malloc: *** error for object 0x11185f000: pointer being freed already on death-row
*** set a breakpoint in malloc_error_break to debug
With backtrace:
Crashed Thread: 1 Dispatch queue: com.apple.root.default-priority
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Application Specific Information:
*** error for object 0x11185f000: pointer being freed already on death-row
Thread 0:: Dispatch queue: com.apple.main-thread
0 matrix2.so 0x0000000107fa403f __pyx_pw_4sage_6matrix_7matrix2_6Matrix_71right_kernel_matrix + 27551
1 ??? 0x000000000000000d 0 + 13
Thread 1 Crashed:: Dispatch queue: com.apple.root.default-priority
0 libsystem_kernel.dylib 0x00007fff8c6d239a __semwait_signal_nocancel + 10
1 libsystem_c.dylib 0x00007fff86b17e1b nanosleep$NOCANCEL + 138
2 libsystem_c.dylib 0x00007fff86b7b9a8 usleep$NOCANCEL + 54
3 libsystem_c.dylib 0x00007fff86b67eca __abort + 203
4 libsystem_c.dylib 0x00007fff86b67dff abort + 192
5 libsystem_c.dylib 0x00007fff86b43905 szone_error + 580
6 libsystem_c.dylib 0x00007fff86b43f7d free_large + 229
7 libsystem_c.dylib 0x00007fff86b3b8f8 free + 199
8 libBLAS.dylib 0x00007fff906b0431 __APL_dgemm_block_invoke_0 + 132
9 libdispatch.dylib 0x00007fff89a65f01 _dispatch_call_block_and_release + 15
10 libdispatch.dylib 0x00007fff89a620b6 _dispatch_client_callout + 8
11 libdispatch.dylib 0x00007fff89a631fa _dispatch_worker_thread2 + 304
12 libsystem_c.dylib 0x00007fff86b24d0b _pthread_wqthread + 404
13 libsystem_c.dylib 0x00007fff86b0f1d1 start_wqthread + 13
The same happens also for this:
def fork_test2():
    from sage.parallel.decorate import fork
    test_ = fork(_fork_test_func, verbose=True)
    test_()
-- but only if you used some other matrix calculations before.
This test case also reproduces the crash in a fresh Sage session:
def _fork_test_func(iterator=None):
    if not iterator:
        import itertools
        iterator = itertools.count()
    for i in iterator:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100)])
        m.right_kernel()

def fork_test():
    _fork_test_func(range(10))
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            _fork_test_func()
        finally:
            os._exit(0)
I have downloaded the binaries for MacOSX 64bit of Sage 5.8.
(Note that I also asked on ask.sagemath.org here.)
Both of these crash reports indicate that a multi-threaded process called fork(). That greatly restricts the set of operations that are safe to execute in the child: essentially you can only call execve() and related functions, along with the other functions on the list of async-signal-safe functions.
This is documented in the CAVEATS section of the fork(2) manpage as well as in the standard:
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
Since many APIs in Mac OS X frameworks will cause the process to become multithreaded, if you want the fork child to be fully usable, you must limit your operations in the parent process before fork to APIs documented not to make a process multithreaded (essentially only POSIX APIs).
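One way to stay within that rule is to do the work in a child that calls exec right away, so it starts as a fresh single-threaded program instead of a copy of the threaded parent. A minimal sketch, assuming Python 3.7+ and only the standard library; the helper name run_in_fresh_interpreter is made up for illustration and is not part of Sage:

```python
import subprocess
import sys


def run_in_fresh_interpreter(source):
    """Run Python source in a newly exec'd interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", source],  # new program image, no inherited threads
        capture_output=True,
        text=True,
        check=True,                      # raise if the child exits with an error
    )
    return completed.stdout.strip()


# The child computes the value and reports it through stdout.
answer = run_in_fresh_interpreter("print(sum(i * i for i in range(10)))")
```

The trade-off is that the child does not share the parent's in-memory state, so inputs and results have to be passed explicitly (here through the command line and stdout).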
To take advantage of several CPU cores in a Python program, I am using the multiprocessing module and sending data via its Pipe class. But when the main program closes the sending end, the child processes are blocking on recv() instead of raising an EOFError exception. This is caused by open file descriptors, which need to be closed in the other process context first, as described in these (and other) answers:
Why doesn't pipe.close() cause EOFError during pipe.recv() in python multiprocessing?
Python multiprocessing pipe recv() doc unclear or did I miss anything?
My problem is that when consecutively creating two Processes with Pipes, the second one inherits the remaining, "parent" end file descriptor of the first one's Pipe. So closing the first Pipe will lead to hanging instead of EOFError again, even though each Pipe's unused ends were closed as recommended.
This code illustrates the problem, Linux only:
import os
import time
import multiprocessing as mp
import subprocess


class MeasurementWriter:
    def __init__(self, name):
        self.name = name
        self.parent_conn = None
        self.worker = None

    def open(self):
        conn_pair = mp.Pipe()
        self.worker = mp.Process(target=self.run, name=self.name, args=(conn_pair,))
        self.worker.start()
        self.parent_conn, child_conn = conn_pair
        print('pid %d started %d; fds: %d %d'
              % (os.getpid(), self.worker.pid,
                 self.parent_conn.fileno(), child_conn.fileno()))
        # Close the other end, as it is not needed in our process context
        child_conn.close()
        subprocess.call(["ls", "-l", "/proc/%d/fd" % os.getpid()])

    def close(self):
        if self.parent_conn is None:
            print('not open')
            return
        print('closing pipe', self.parent_conn.fileno())
        self.parent_conn.close()
        print('joining worker')
        self.worker.join()  # HANGS if more than one mp.Process has been started!

    def run(self, conn_pair):
        parent_conn, conn = conn_pair
        print('%s pid %d started; fds: %d %d'
              % (self.name, os.getpid(), parent_conn.fileno(), conn.fileno()))
        # Close the other end, as it is not needed in our process context
        parent_conn.close()
        time.sleep(0.5)
        print(self.name, 'parent_conn.closed =', parent_conn.closed)
        subprocess.call(["ls", "-l", "/proc/%d/fd" % os.getpid()])
        try:
            print(self.name, 'recv blocking...')
            data = conn.recv()
            print(self.name, 'recv', data)
        except EOFError:
            print(self.name, 'EOF')
        conn.close()


if __name__ == '__main__':
    a = MeasurementWriter('A')
    a.open()
    # Increase fd numbers to make them recognizable
    n = open('/dev/null')
    z = open('/dev/zero')
    # Wait for debug printing to complete
    time.sleep(1)
    b = MeasurementWriter('B')
    b.open()  # Comment out to see a clean exit
    time.sleep(2)
    # Clean up
    a.close()  # HANGS: The parent_conn fd is still open in the second Process
    b.close()
The output is as follows (some uninteresting fd lines omitted). Tested with Python 3.5 and 3.8.10 under Linux:
pid 592770 started 592771; fds: 3 4
A pid 592771 started; fds: 3 4
total 0
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 3 -> 'socket:[8294651]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> 'pipe:[8294653]'
A parent_conn.closed = True
total 0
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 4 -> 'socket:[8294652]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> /dev/null
l-wx------ 1 acolomb acolomb 64 Mar 18 19:02 6 -> 'pipe:[8294653]'
A recv blocking...
pid 592770 started 592774; fds: 7 8
B pid 592774 started; fds: 7 8
total 0
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 3 -> 'socket:[8294651]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 4 -> /dev/null
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> 'pipe:[8294653]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 6 -> /dev/zero
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 7 -> 'socket:[8294672]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 9 -> 'pipe:[8294674]'
B parent_conn.closed = True
total 0
l-wx------ 1 acolomb acolomb 64 Mar 18 19:02 10 -> 'pipe:[8294674]'
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 3 -> 'socket:[8294651]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 4 -> /dev/null
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> 'pipe:[8294653]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 6 -> /dev/zero
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 8 -> 'socket:[8294673]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 9 -> /dev/null
B recv blocking...
closing pipe 3
joining worker
We can see that the youngest process (B) has inherited fd number 3 that belongs to A's Pipe on the parent end. Therefore closing it will not lead to terminating A's process, as it is still referenced. How can I avoid subsequent child processes inheriting the file descriptors of another child's Pipe objects?
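The effect can be reproduced without multiprocessing. In this minimal sketch (POSIX only, standard library), os.dup stands in for the extra copy of the parent end that a later fork hands to a sibling process. It uses a plain os.pipe rather than the socket pair behind mp.Pipe(), and the helper and variable names are illustrative:

```python
import os
import select


def read_would_return(read_fd, timeout=0.1):
    """Return True if a read on read_fd would not block (data or EOF is pending)."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return bool(ready)


read_fd, write_fd = os.pipe()
inherited_copy = os.dup(write_fd)  # stands in for the fd a forked sibling still holds

os.close(write_fd)                 # the "parent" closes its own end
# No EOF yet: the duplicate keeps the write side of the pipe open.
blocked_before = not read_would_return(read_fd)

os.close(inherited_copy)           # the last remaining writer goes away
# Now the reader sees EOF, which os.read reports as an empty bytes object.
eof_after = read_would_return(read_fd) and os.read(read_fd, 1) == b""
os.close(read_fd)
```

The reader only gets EOF once every copy of the writing end is closed, which is why B holding fd 3 keeps A waiting in recv().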
For this simple example, switching the order of the .close() calls would probably help, but in reality they may be started in random order based on user interactions. The intended use is to write several output streams (one MeasurementWriter instance for each) with transparent compression being handled in an associated child process, to not block the main process regularly.
One suggestion I found at https://microeducate.tech/using-python-multiprocessing-pipes/ keeps track of all pipe ends in the parent process using a list, then closing all unrelated ones in newly created child processes. But I have no good place for such a "manager", as these objects come and go during the app lifetime.
In a real-life situation one process would probably be in a loop doing recv calls on its connection. Since we see that getting an EOFError exception is undependable when the connection is closed on the other end, the simplest solution is for the sending end to signal "end of file" by issuing a send call on the connection with a special sentinel item that cannot be mistaken for a normal data item. None is often suitable for that purpose.
So modify the close method to be:
def close(self):
    if self.parent_conn is None:
        print('not open')
        return
    print('closing pipe', self.parent_conn.fileno())
    self.parent_conn.send(None)  # Sentinel
    self.parent_conn.close()
    print('joining worker')
    self.worker.join()  # Returns once the worker has received the sentinel
And a more realistic run method might be:
def run(self, conn_pair):
    parent_conn, conn = conn_pair
    print('%s pid %d started; fds: %d %d'
          % (self.name, os.getpid(), parent_conn.fileno(), conn.fileno()))
    # Close the other end, as it is not needed in our process context
    parent_conn.close()
    time.sleep(0.5)
    print(self.name, 'parent_conn.closed =', parent_conn.closed)
    subprocess.call(["ls", "-l", "/proc/%d/fd" % os.getpid()])
    try:
        while True:
            print(self.name, 'recv blocking...')
            data = conn.recv()
            if data is None:  # Sentinel?
                break
            print(self.name, 'recv', data)
    except EOFError:
        print(self.name, 'EOF')
    conn.close()
I am not sure whether this is an iOS issue or whether this is an issue with Kivy or even with Python (e.g. https://bugs.python.org/issue37788), but I am experiencing some problems with threading.
I have built an iPad app using the Kivy framework that makes several calls to an API, and uses the threading module to asynchronously make requests. Below is the code that handles the API requests:
import json
import requests
import base64
import threading


def thread(function):
    def wrap(*args, **kwargs):
        t = threading.Thread(target=function, args=args, kwargs=kwargs)
        t.start()
        return t
    return wrap


class MathPixAPI:
    stroke_url = '*******************'
    header = {
        "content-type": "application/json",
        "app_id": "*******************",
        "app_key": "*******************"
    }

    @thread
    def post_data(self, file_name: str, root):
        """
        Posts a base64 encoded image to the MathPixAPI then updates the data DictProperty of the ExpressionWriter that
        calls this function
        :param file_name: The name of the file - e.g. "image.png"
        :param root: The ExpressionWriter that calls the function
        """
        image_uri = "data:image/png;base64," + base64.b64encode(open(file_name, "rb").read()).decode()
        r = requests.post("https://api.mathpix.com/v3/text",
                          data=json.dumps({'src': image_uri}),
                          headers=self.header)
        root.data = json.loads(r.text)
The app makes no more than 5 asynchronous requests at one time, and is called from the function below:
def get_image_data(self):
    """
    The function first saves the ExpressionWriter.canvas as a PNG file to the user_data_directory (automatically
    determined depending on the device the user is running the app on). Then this image is sent to the MathPix API
    which then returns data on the handwritten answer (see api.py for more details). The api call updates self.data
    which in turn calls self._on_data().
    """
    file_name = f'{App.get_running_app().user_data_dir}/image_{self.number}.png'
    self.export_to_png(file_name)
    MathPixAPI().post_data(file_name, self)
This works really well, up until the 20th-25th request, upon which the program halts. In Xcode I receive the following error log:
2021-04-09 18:11:02.300179+0100 ccc-writer-3[4261:4790641] [Animation] +[UIView setAnimationsEnabled:] being called from a background thread. Performing any operation from a background thread on UIView or a subclass is not supported and may result in unexpected and insidious behavior. trace=(
0 UIKitCore 0x0000000187cbb538 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 17859896
1 libdispatch.dylib 0x0000000101ce56c0 _dispatch_client_callout + 20
2 libdispatch.dylib 0x0000000101ce71f8 _dispatch_once_callout + 136
3 UIKitCore 0x0000000187cbb4bc 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 17859772
4 UIKitCore 0x0000000187cbb628 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 17860136
5 UIKitCore 0x0000000187abbd64 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 15764836
6 UIKitCore 0x0000000187aae150 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 15708496
7 UIKitCore 0x00000001877b2f20 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 12582688
8 UIKitCore 0x0000000187cb2b30 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 17824560
9 UIKitCore 0x0000000187aacd50 8518EAE3-832B-3FF0-9FA5-9DBE3041F26C + 15703376
10 ccc-writer-3 0x0000000100822960 -[SDL_uikitviewcontroller showKeyboard] + 108
11 ccc-writer-3 0x0000000100823164 UIKit_ShowScreenKeyboard + 60
12 ccc-writer-3 0x00000001007ec490 SDL_StartTextInput + 92
... [A whole bunch of memory addresses] ...
74 ccc-writer-3 0x0000000100610df4 _PyEval_EvalFrameDefault + 5432
75 ccc-writer-3 0x000000010054dfe0 function_code_fastcall + 120
76 ccc-writer-3 0x00000001005505f8 method_vectorcall + 264
77 ccc-writer-3 0x000000010054d95c PyVectorcall_Call + 104
78 ccc-writer-3 0x0000000100770c40 t_bootstrap + 80
79 ccc-writer-3 0x000000010065e8e8 pythread_wrapper + 28
80 libsystem_pthread.dylib 0x00000001cfbb3cb0 _pthread_start + 320
81 libsystem_pthread.dylib 0x00000001cfbbc778 thread_start + 8
)
2021-04-09 18:11:02.308745+0100 ccc-writer-3[4261:4790641] *** Assertion failure in -[_UISimpleFenceProvider trackSystemAnimationFence:], _UISimpleFenceProvider.m:51
2021-04-09 18:11:02.311976+0100 ccc-writer-3[4261:4790641] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'main thread only'
*** First throw call stack:
(0x184dc686c 0x199de1c50 0x184ccc000 0x18606091c 0x186cd20bc 0x187777d30 0x1877cb888 0x186c00e58 0x1875b2610 0x1871c71b8 0x1871c54d0 0x1871c51f0 0x1871c674c 0x1871c67c8 0x1871c682c 0x187541c94 0x1871c3478 0x1871c2b88 0x1877b7f58 0x1877b2fc8 0x187cb2b30 0x187aacd50 0x100822960 0x100823164 0x1007ec490 0x1008d30b8 0x10055680c 0x100614e6c 0x100610df4 0x100615e98 0x10054e160 0x10055058c 0x100614e6c 0x100611d50 0x10054dfe0 0x100614e6c 0x100610df4 0x10054dfe0 0x100614e6c 0x100610df4 0x100615e98 0x10054e160 0x100550664 0x10054d95c 0x100612888 0x100615e98 0x10060f87c 0x100859f50 0x10085e5dc 0x10085f020 0x100bc7598 0x100bc5c78 0x100beac94 0x100591568 0x1005908dc 0x100c116a0 0x1005908dc 0x100610d94 0x10054dfe0 0x100614e6c 0x100610df4 0x100615e98 0x10054e160 0x10055058c 0x100614e6c 0x100611d50 0x100615e98 0x10060f87c 0x100859f50 0x10085e5dc 0x10085f020 0x100bc7598 0x100bc5c78 0x100bcbdc8 0x100beac94 0x100591568 0x1005908dc 0x100610d94 0x10054dfe0 0x10054d95c 0x100612888 0x10054dfe0 0x100614e6c 0x100610df4 0x10054dfe0 0x100614e6c 0x100610df4 0x10054dfe0 0x1005505f8 0x10054d95c 0x100770c40 0x10065e8e8 0x1cfbb3cb0 0x1cfbbc778)
libc++abi.dylib: terminating with uncaught exception of type NSException
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'main thread only'
terminating with uncaught exception of type NSException
I am not sure what "main thread only" means nor do I have any idea how to resolve this issue. Can anyone clarify what this means and explain what the problem is? How can I stop my program from halting?
Thanks in advance.
I fixed this issue in Xcode by amending the runtime API checking.
Navigate to:
Product > Scheme > Edit Scheme > Run / Debug > Diagnostics
then deselect Main Thread Checker
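The trace suggests that a UIKit call (UIKit_ShowScreenKeyboard) was reached from a Python worker thread (t_bootstrap), so another option to consider is keeping every UI-facing update on the main thread and letting the workers only hand over their results. A minimal standard-library sketch of that hand-off; fetch, drain_results and the fake payload are hypothetical stand-ins, not the app's or Kivy's API:

```python
import queue
import threading

results = queue.Queue()             # thread-safe hand-off from workers to the main thread
main_ident = threading.get_ident()  # identity of the thread that owns the "UI"
applied = []                        # (job_id, thread ident) for every applied result


def fetch(job_id):
    """Hypothetical stand-in for a blocking API request; it never touches UI state."""
    results.put((job_id, {"text": "result %d" % job_id}))


def drain_results():
    """Apply all pending results; meant to be called from the main thread only."""
    while True:
        try:
            job_id, data = results.get_nowait()
        except queue.Empty:
            return
        # In the app, this is where root.data would be assigned.
        applied.append((job_id, threading.get_ident()))


workers = [threading.Thread(target=fetch, args=(i,)) for i in range(5)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
drain_results()
```

In a Kivy app the drain step would normally be scheduled on Kivy's own main loop (for example with Clock.schedule_once or the kivy.clock.mainthread decorator) rather than called by hand. Treat that as a pointer to check against the Kivy documentation, not as tested code.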
Running the following code:
import threading
import pyaudio
from matplotlib import pyplot as plt


def output():
    p = pyaudio.PyAudio()
    stream_ = p.open(format=pyaudio.paFloat32,
                     channels=1,
                     rate=8000,
                     output=True)
    stream_.stop_stream()
    stream_.close()
    p.terminate()


output_thread = threading.Thread(target=output, args=())
output_thread.start()
output_thread.join()
fig = plt.figure()
ax0 = fig.add_subplot(111)
ax0.plot([1,2,3])
plt.show()
causes Python to crash with the error below. How might I solve this? I am running Python 3.8, PyAudio 0.2.11, and Matplotlib 3.3.1 on macOS 10.15.5.
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Illegal instruction: 4
Termination Reason: Namespace SIGNAL, Code 0x4
Terminating Process: exc handler [4840]
Application Specific Information:
The current event queue and the main event queue are not the same. This is probably because _TSGetMainThread was called for the first time off the main thread. _TSGetMainThread was called for the first time here:
0 CarbonCore 0x00007fff36993345 _TSGetMainThread + 138
1 CarbonCore 0x00007fff3699324a GetThreadGlobals + 26
2 CarbonCore 0x00007fff3699d5e4 NewPtrClear + 14
3 CarbonCore 0x00007fff369b4aba AVLInit + 62
4 CarbonCore 0x00007fff369b49f1 __INIT_Folders_block_invoke + 9
5 libdispatch.dylib 0x00007fff6f603658 _dispatch_client_callout + 8
6 libdispatch.dylib 0x00007fff6f6047de _dispatch_once_callout + 20
OK, if I run:
fig = plt.figure()
before I create the thread, I can avoid this crash. I'm guessing this lets _TSGetMainThread be called for the first time on the main thread.
I am trying to accelerate my Python 3 script with multiprocessing by running 4 processes simultaneously. However, my processes never reach 100% CPU utilization. The core of my code simply reads an .mp3 recording, does some recognition on it using scikit-learn, then saves the results to a .json file.
Here is my top output:
top - 17:07:07 up 18 days, 3:31, 4 users, load average: 3.73, 3.67, 3.87
Tasks: 137 total, 1 running, 75 sleeping, 18 stopped, 0 zombie
%Cpu(s): 32.8 us, 20.3 sy, 0.0 ni, 46.3 id, 0.0 wa, 0.0 hi, 0.5 si, 0.1 st
KiB Mem : 8167880 total, 2683088 free, 4314756 used, 1170036 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 3564064 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5832 am 20 0 1887644 776736 24076 S 63.0 9.5 201:10.19 python3
5829 am 20 0 1956336 845556 24348 S 55.0 10.4 200:31.20 python3
5830 am 20 0 2000772 890260 23820 S 55.0 10.9 200:39.80 python3
5834 am 20 0 2430932 1.260g 24252 S 50.3 16.2 200:45.52 python3
4657 am 20 0 108116 4460 3424 S 0.3 0.1 1:11.48 sshd
6564 root 20 0 0 0 0 I 0.3 0.0 7:30.08 kworker/2:1
1 root 20 0 225212 6660 4452 S 0.0 0.1 0:26.33 systemd
......
As you can see in the previous output, there is no heavy load on the memory so the limited CPU utilization cannot be related to I/O or Memory.
Is there any way to 'force' Python to use 100% of the CPU? How can I debug my code to figure out what is causing this? And if I am missing something obvious, how can I change my code to reach 100% CPU utilization?
Here is a small overview of my main multi-processing code:
# -*- coding: utf-8 -*-
import os
import time
import logging
import cProfile
import multiprocessing as mp
from packages.Recognizer import Recognizer
from packages.RecordingFile import RecordingFile
from packages.utils.pickle_utils import pickle_load

_PRINTS = True


class ServerSER:
    def __init__(self, date, model_fname, results_path,
                 nprocs=1, run_type="server"):
        # bunch of inits
        pass

    def process_files(self):
        # Setup a list of processes that we want to run
        self.processes = [mp.Process(target=self.recognition_func, args=("processes/p" + str(p),
                                                                          self.model_obj, p, self.output))
                          for p in range(self.nprocs)]
        # Run processes
        for p in self.processes:
            p.start()
        # Exit the completed processes
        for p in self.processes:
            p.join()
        # Get process results from the output queue
        self.results = []
        for p in self.processes:
            try:
                r = self.output.get_nowait()
                self.results.append(r)
            except Exception as e:
                print(e)
        return [e[1][0] for e in self.results]

    def recognition_func(self, pfolder, model_obj, pos, output, profile=True):
        # start profiling
        pr = cProfile.Profile()
        pr.enable()
        # start logging
        logger_name = "my-logger" + str(pos)
        logging.basicConfig(format='%(asctime)s %(levelname)5s %(message)s',
                            level=logging.INFO, filename=logger_name + ".txt")
        logging.info("Start logging for process number " + str(pos))
        # start processing until no files available
        while len([f for f in os.listdir(pfolder) if ".mp3" in f]) > 0:
            # get oldest file
            oldest_file = [f for f in self.sorted_ls(pfolder) if ".mp3" in f][0]
            # process
            try:
                recording = RecordingFile(pfolder=pfolder,
                                          base_url=self.base_url,
                                          fpath=oldest_file,
                                          results_path=self.results_path)
                if _PRINTS:
                    msg = "%10s : %50s" % ("PROCESSING", oldest_file)
                    logging.info(msg)
                    # prints
                    print("------------------------------------------------------------------------")
                    print("%10s : %50s" % ("PROCESSING", oldest_file))
                    print("------------------------------------------------------------------------")
                # recognize for file
                _ = Recognizer(recording=recording,
                               duration_step=1,
                               channel=1,
                               model_obj=self.model_obj)
                # clean ups
                recording.delete_files()
                print("%10s." % ("DONE"))
                print("------------------------------------------------------------------------")
            except Exception as e:
                self.errors[oldest_file] = e
                logging.warning(e)
                print(e, " while processing ", oldest_file)
        # put results in queue
        self.output.put((pos, [self.errors, self.durations]))
        # save profiling results
        # pr.print_stats(sort='time')
        pr.disable()
        pr.dump_stats("incode_cprofiler_output" + str(pos) + ".txt")
        return True
output of uname -a:
Linux 4.15.0-70-generic #79-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
output of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel Core Processor (Skylake, IBRS)
Stepping: 3
CPU MHz: 3696.000
BogoMIPS: 7392.00
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-3
EDIT
When playing a bit with the number of processes, the following happens:
With 1 process, CPU usage is at 110%.
For 2 processes, CPU usage is at 80%.
With 6 processes, each process is around 50% CPU usage.
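One way to narrow this down is to compare CPU time with wall-clock time around a single step inside a worker: a ratio near 1 means the step is computing, a ratio near 0 means it is mostly waiting (on disk, network, locks, or child processes). A minimal standard-library sketch; cpu_fraction is a hypothetical helper, and the two calls at the bottom are only stand-ins for a real step such as the Recognizer call:

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds / wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time of this process, all threads included
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


# Stand-ins: a step that only waits, and a step that only computes.
_, waiting_ratio = cpu_fraction(time.sleep, 0.2)
_, computing_ratio = cpu_fraction(sum, range(2_000_000))
```

Because process_time counts every thread in the process, a ratio above 1 is possible when a library runs its own threads, which may be worth keeping in mind when reading the 110% figure for a single process.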
I am using the pynetdicom library to receive and process medical DICOM images. The processing is performed in the callback function "on_association_released". However, when receiving certain studies, Python crashes, apparently because a child thread crashes.
From the OS X crash report, the libdispatch library seems to be the cause, but I am not sure how or why.
This is the function:
def on_association_released(self):
    if not self.auto_process:
        self.incoming = []
        return
    dicoms = [Dicom(f=x) for x in self.incoming]
    self.incoming = []
    incoming = Study(dicom_list=dicoms)
    log.info("Incoming study: {incoming}".format(**locals()))
    completed_tasks = {}
    time.sleep(1)
    for task in AVAILABLE_PROCESS_TASKS:
        log.info("Trying task: {task}".format(**locals()))
        process_task = task(study=incoming)
        try:
            if process_task.valid:
                log.info("{incoming} is valid for {process_task}".format(**locals()))
                try:
                    process_task.process()
                except Exception as e:
                    log.warning(
                        'Failed to perform {process_task} on {incoming}: \n {e}'.format(**locals())
                    )
                else:
                    log.info("Completed {process_task} for {incoming} !".format(**locals()))
            else:
                log.warning("{incoming} is not a valid study for {process_task}".format(**locals()))
        except Exception as e:
            log.warning("{incoming} could not be assessed by {process_task}".format(**locals()))
            myemail.nhs_mail(recipients=[admin],
                             subject=f"dicomserver {VERSION}: Failed to start listener",
                             message=f"{incoming} could not be assessed by {process_task}: {e.args}"
                             )
This is the final log message from the application log:
2019-03-15 12:19:06 I [process.py:on_association_released:171] Incoming study: Study(1.2.826.0.1.2112370.55.1.12145941)
This is the OSX Crash Report:
Process: Python [84177]
Path: /Library/Frameworks/Python.framework/Versions/3.6/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 3.6.1 (3.6.1)
Code Type: X86-64 (Native)
Parent Process: Python [84175]
Responsible: Terminal [346]
User ID: 503
Date/Time: 2019-03-15 12:19:06.371 +0000
OS Version: Mac OS X 10.11.6 (15G1108)
Report Version: 11
Anonymous UUID: E7340644-9523-1C6B-0B2B-74D6043CFED6
Time Awake Since Boot: 590000 seconds
System Integrity Protection: enabled
Crashed Thread: 1
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000110
VM Regions Near 0x110:
-->
__TEXT 0000000100000000-0000000100001000 [ 4K] r-x/rwx SM=COW /Library/Frameworks/Python.framework/Versions/3.6/Resources/Python.app/Contents/MacOS/Python
Application Specific Information:
*** multi-threaded process forked ***
crashed on child side of fork pre-exec
This is the top of Thread 1 crash trace:
Thread 1 Crashed:
0 libdispatch.dylib 0x00007fff8e6cc661 _dispatch_queue_push_queue + 345
1 libdispatch.dylib 0x00007fff8e6cab06 _dispatch_queue_wakeup_with_qos_slow + 126
2 libdispatch.dylib 0x00007fff8e6d113f _dispatch_mach_msg_send + 1952
3 libdispatch.dylib 0x00007fff8e6d08dc dispatch_mach_send + 262
4 libxpc.dylib 0x00007fff86858fc9 xpc_connection_send_message_with_reply + 131
5 com.apple.CoreFoundation 0x00007fff8ef43b3f __66-[CFPrefsSearchListSource generationCountFromListOfSources:count:]_block_invoke_2 + 143
6 com.apple.CoreFoundation 0x00007fff8ef4396d _CFPrefsWithDaemonConnection + 381
7 com.apple.CoreFoundation 0x00007fff8ef42af6 __66-[CFPrefsSearchListSource generationCountFromListOfSources:count:]_block_invoke + 150
8 com.apple.CoreFoundation 0x00007fff8ef42893 -[CFPrefsSearchListSource generationCountFromListOfSources:count:] + 179
9 com.apple.CoreFoundation 0x00007fff8ef42174 -[CFPrefsSearchListSource alreadylocked_copyDictionary] + 324
10 com.apple.CoreFoundation 0x00007fff8ef41dbc -[CFPrefsSearchListSource alreadylocked_copyValueForKey:] + 60
11 com.apple.CoreFoundation 0x00007fff8ef41d4c ___CFPreferencesCopyAppValueWithContainer_block_invoke + 60
12 com.apple.CoreFoundation 0x00007fff8ef39a70 +[CFPrefsSearchListSource withSearchListForIdentifier:container:perform:] + 608
13 com.apple.CoreFoundation 0x00007fff8ef397c7 _CFPreferencesCopyAppValueWithContainer + 183
14 com.apple.SystemConfiguration 0x00007fff998b3a9b SCDynamicStoreCopyProxiesWithOptions + 163
15 _scproxy.cpython-36m-darwin.so 0x000000010f0f5a63 get_proxy_settings + 35
16 org.python.python 0x000000010006a604 _PyCFunction_FastCallDict + 436
17 org.python.python 0x00000001000f33e4 call_function + 612
18 org.python.python 0x00000001000f8d84 _PyEval_EvalFrameDefault + 21892
The issue looks awfully similar to a long-standing problem with Python on macOS.
The root cause, as far as I understand it, is that fork() is hard to get right if there are threads involved, unless you immediately exec().
macOS "protects" against the possible pitfalls by crashing a process if it accesses certain system functionality, such as libdispatch, after it has forked but before it has called exec.
Unfortunately, these calls can happen in unexpected places, such as in _scproxy.cpython-36m-darwin.so, which is shown at position 15 of the stack trace.
There are a number of Python bugs filed about this (1, 2, 3, for example), but there's no silver bullet as far as I know.
In your particular case, it might be possible to prevent the crash by running your Python interpreter with the environment variable no_proxy=*. This should prevent calls to the system configuration framework (through _scproxy) to find proxy settings.
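If changing the shell environment is awkward, the variable can also be set from inside the program. A minimal sketch, assuming it runs before anything forks or looks up proxies; urllib.request.proxy_bypass is standard library, and the host name is only a placeholder:

```python
import os
import urllib.request

# Must run before any fork or proxy lookup; "*" means "bypass proxies for every host".
os.environ["no_proxy"] = "*"

# With no_proxy=* the environment already answers the question, so urllib
# should not need to consult the system proxy configuration.
bypassed = bool(urllib.request.proxy_bypass("example.invalid"))
```

Whether this is enough depends on which code path triggers the proxy lookup in your application, so it is worth confirming against a study that used to crash.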