I've been having an odd issue with PyCharm and subprocesses created by the multiprocessing library locking up forever. I'm using Windows with Python 3.5. What I'm trying to do is:
Start a background thread to block on stdin (waiting for input)
Have the main thread check occasionally for input from stdin and then delegate the work to Python processes created using multiprocessing
However, I've found that newly created multiprocessing Processes lock up forever if and only if the following conditions are met:
I'm running the code via Pycharm (both the latest and older versions)
The background thread is blocking on stdin
Here's the simplest example I can create that reproduces the problem:
import multiprocessing
import threading
import sys
def noop():
pass
def consume():
while True:
sys.stdin.readline()
if __name__ == '__main__':
# create a daemon thread to block on stdin
thread = threading.Thread(target=consume, daemon=True)
thread.start()
# create a background process
process = multiprocessing.Process(target=noop)
process.start()
I've Googled various combinations of "PyCharm stdin multiprocessing hang ..." and had no luck at finding an explanation, and I can't figure out why a thread of the main process blocking on stdin should ever cause a subprocess to also block/hang, let alone why it would only happen when running the script in PyCharm. The only think I can guess is that there might be some monkey-patching of either stdin or the multiprocessing library going on.
Has anyone else encountered this problem? Can anyone explain to me why this only occurs in PyCharm, and how I can make it work regardless of the Python editor I'm using?
I faced the same problem when I was trying to do multiple API calls to fetch data from a remote server. I replaced multiprocessing dummy with ThreadPoolExecutor. It works in the same way as dummy.
Following is a short snippet of a running code to write the response to a json file:
uids = [] # an array of the requisite parameters used in requests
with open('flight_config.json', 'w') as f:
futures = []
for i in range(chunk_index, len(uids)):
print('For uid[{}], fetching started:'.format(i))
chunk_index += 1
auth_token = get_header()
with ThreadPoolExecutor(max_workers=7) as executor:
future_to_url = {executor.submit(fetch_response_from_api, uid=uid, auth_token=auth_token): uid for uid in
uids[i]}
for future in concurrent.futures.as_completed(future_to_url):
result = future_to_url[future]
try:
data = future.result()
print(data)
except Exception as exc:
print('%r generated an exception: %s' % (result, exc))
else:
print('%r page is %d bytes' % (result, len(data)))
Related
I have a python written a python module to query a database and read this into a dataframe.
Some of these queries are quite big and are causing the module to exit. i.e. I get:
Exited
printed to the screen. Digging a bit deeper, I find
Memory cgroup out of memory: Kill process
So it's running out of memory - my question is how do I capture that kill signal so I can print a useful error message e.g. you need to request more resources to run this command...
Currently I have:
import signal
import pandas as pd
kill_now = False
def exit_gracefully(signum, frame, ):
kill_now = True
signal.signal(signal.SIGINT, exit_gracefully)
signal.signal(signal.SIGTERM, exit_gracefully)
sql_reader = pd.read(query, conn, chunksize=1000)
table_data = []
while not kill_now:
for data in sql_reader:
table_data.append(data)
break
if kill_now:
print("ran out of memory...")
But this doesn't catch the "Killed" signal
Background: I'm trying to do 100's of dymola simulations with the python-dymola interface. I managed to run them in a for-loop. Now I want them to run while multi-threading so I can run multiple models parallel (which will be much faster). Since probably nobody uses the interface, I wrote some simple code that also shows my problem:
1: Turn a for-loop into a definition that is run into another for-loop BUT both the def and the for-loop share the same variable 'i'.
2: Turn a for-loop into a definition and use multi-threading to execute it. A for-loop runs the command one by one. I want to run them parallel with a maximum of x threads at the same time. The result should be the same as when executing the for-loop
Example-code:
import os
nSim = 100
ndig='{:01d}'
for i in range(nSim):
os.makedirs(str(ndig.format(i)))
Note that the name of the created directories are just the numbers from the for-loop (this is important). Now instead of using the for-loop, I would love to create the directories with multi-threading (note: probably not interesting for this short code but when calling and executing 100's of simulation models it definitely is interesting to use multi-threading).
So I started with something simple I thought, turning the for-loop into a function that then is run inside another for-loop and hoped to have the same result as with the for-loop code above but got this error:
AttributeError: 'NoneType' object has no attribute 'start'
(note: I just started with this, because I did not use the def-statement before and the thread package is also new. After this I would evolve towards the multi-threading.)
1:
import os
nSim = 100
ndig='{:01d}'
def simulation(i):
os.makedirs(str(ndig.format(i)))
for i in range(nSim):
simulation(i=i).start
After that failed, I tried to evolve to multi-threading (converting the for-loop into something that does the same but with multi-threading and by that running the code parallel instead of one by one and with a maximum number of threads):
2:
import os
import threading
nSim = 100
ndig='{:01d}'
def simulation(i):
os.makedirs(str(ndig.format(i)))
if __name__ == '__main__':
i in range(nSim)
simulation_thread[i] = threading.Thread(target=simulation(i=i))
simulation_thread[i].daemon = True
simulation_thread[i].start()
Unfortunately that attempt failed as well and now I got the error:
NameError: name 'i' is not defined
Does anybody has suggestions for issues 1 or 2?
Both examples are incomplete. Here's a complete example. Note that target gets passed the name of the function target=simulation and a tuple of its arguments args=(i,). Don't call the function target=simulation(i=i) because that just passes the result of the function, which is equivalent to target=None in this case.
import threading
nSim = 100
def simulation(i):
print(f'{threading.current_thread().name}: {i}')
if __name__ == '__main__':
threads = [threading.Thread(target=simulation,args=(i,)) for i in range(nSim)]
for t in threads:
t.start()
for t in threads:
t.join()
Output:
Thread-1: 0
Thread-2: 1
Thread-3: 2
.
.
Thread-98: 97
Thread-99: 98
Thread-100: 99
Note you usually don't want more threads that CPUs, which you can get from multiprocessing.cpu_count(). You can use create a thread pool and use queue.Queue to post work that the threads execute. An example is in the Python Queue documentation.
Cannot call .start like this
simulation(i=i).start
on an non-threading object. Also, you have to import the module as well
It seems like you forgot to add 'for' and indent the code in your loop
i in range(nSim)
simulation_thread[i] = threading.Thread(target=simulation(i=i))
simulation_thread[i].daemon = True
simulation_thread[i].start()
to
for i in range(nSim):
simulation_thread[i] = threading.Thread(target=simulation(i=i))
simulation_thread[i].daemon = True
simulation_thread[i].start()
If you would like to have max number of thread in a pool, and to run all items in the queue. We can continue #mark-tolonen answer and do like this:
import threading
import queue
import time
def main():
size_of_threads_pool = 10
num_of_tasks = 30
task_seconds = 1
q = queue.Queue()
def worker():
while True:
item = q.get()
print(my_st)
print(f'{threading.current_thread().name}: Working on {item}')
time.sleep(task_seconds)
print(f'Finished {item}')
q.task_done()
my_st = "MY string"
threads = [threading.Thread(target=worker, daemon=True) for i in range(size_of_threads_pool)]
for t in threads:
t.start()
# send the tasks requests to the worker
for item in range(num_of_tasks):
q.put(item)
# block until all tasks are done
q.join()
print('All work completed')
# NO need this, as threads are while True, so never will stop..
# for t in threads:
# t.join()
if __name__ == '__main__':
main()
This will run 30 tasks of 1 second in each, using 10 threads.
So total time would be 3 seconds.
$ time python3 q_test.py
...
All work completed
real 0m3.064s
user 0m0.033s
sys 0m0.016s
EDIT: I found another higher-level interface for asynchronously executing callables.
Use concurrent.futures, see the example in the docs:
import concurrent.futures
import urllib.request
URLS = ['http://www.foxnews.com/',
'http://www.cnn.com/',
'http://europe.wsj.com/',
'http://www.bbc.co.uk/',
'http://some-made-up-domain.com/']
# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
with urllib.request.urlopen(url, timeout=timeout) as conn:
return conn.read()
# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
# Start the load operations and mark each future with its URL
future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
data = future.result()
except Exception as exc:
print('%r generated an exception: %s' % (url, exc))
else:
print('%r page is %d bytes' % (url, len(data)))
Note the max_workers=5 that will tell the max number of threads, and
note the for loop for url in URLS that you can use.
In a nutshell
I get a BrokenProcessPool exception when parallelizing my code with concurrent.futures. No further error is displayed. I want to find the cause of the error and ask for ideas of how to do that.
Full problem
I am using concurrent.futures to parallelize some code.
with ProcessPoolExecutor() as pool:
mapObj = pool.map(myMethod, args)
I end up with (and only with) the following exception:
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
Unfortunately, the program is complex and the error appears only after the program has run for 30 minutes. Therefore, I cannot provide a nice minimal example.
In order to find the cause of the issue, I wrapped the method that I run in parallel with a try-except-block:
def myMethod(*args):
try:
...
except Exception as e:
print(e)
The problem remained the same and the except block was never entered. I conclude that the exception does not come from my code.
My next step was to write a custom ProcessPoolExecutor class that is a child of the original ProcessPoolExecutor and allows me to replace some methods with cusomized ones. I copied and pasted the original code of the method _process_worker and added some print statements.
def _process_worker(call_queue, result_queue):
"""Evaluates calls from call_queue and places the results in result_queue.
...
"""
while True:
call_item = call_queue.get(block=True)
if call_item is None:
# Wake up queue management thread
result_queue.put(os.getpid())
return
try:
r = call_item.fn(*call_item.args, **call_item.kwargs)
except BaseException as e:
print("??? Exception ???") # newly added
print(e) # newly added
exc = _ExceptionWithTraceback(e, e.__traceback__)
result_queue.put(_ResultItem(call_item.work_id, exception=exc))
else:
result_queue.put(_ResultItem(call_item.work_id,
result=r))
Again, the except block is never entered. This was to be expected, because I already ensured that my code does not raise an exception (and if everything worked well, the exception should be passed to the main process).
Now I am lacking ideas how I could find the error. The exception is raised here:
def submit(self, fn, *args, **kwargs):
with self._shutdown_lock:
if self._broken:
raise BrokenProcessPool('A child process terminated '
'abruptly, the process pool is not usable anymore')
if self._shutdown_thread:
raise RuntimeError('cannot schedule new futures after shutdown')
f = _base.Future()
w = _WorkItem(f, fn, args, kwargs)
self._pending_work_items[self._queue_count] = w
self._work_ids.put(self._queue_count)
self._queue_count += 1
# Wake up queue management thread
self._result_queue.put(None)
self._start_queue_management_thread()
return f
The process pool is set to be broken here:
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
...
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
...
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else: #THIS BLOCK IS ENTERED WHEN THE ERROR OCCURS
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
...
It is (or seems to be) a fact that a process terminates, but I have no clue why. Are my thoughts correct so far? What are possible causes that make a process terminate without a message? (Is this even possible?) Where could I apply further diagnostics? Which questions should I ask myself in order to come closer to a solution?
I am using python 3.5 on 64bit Linux.
I think I was able to get as far as possible:
I changed the _queue_management_worker method in my changed ProcessPoolExecutor module such that the exit code of the failed process is printed:
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
...
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
...
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else:
# BLOCK INSERTED FOR DIAGNOSIS ONLY ---------
vals = list(processes.values())
for s in ready:
j = sentinels.index(s)
print("is_alive()", vals[j].is_alive())
print("exitcode", vals[j].exitcode)
# -------------------------------------------
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
...
Afterwards I looked up the meaning of the exit code:
from multiprocessing.process import _exitcode_to_name
print(_exitcode_to_name[my_exit_code])
whereby my_exit_code is the exit code that was printed in the block I inserted to the _queue_management_worker. In my case the code was -11, which means that I ran into a segmentation fault. Finding the reason for this issue will be a huge task but goes beyond the scope of this question.
If you are using macOS, there is a known issue with how some versions of macOS uses forking that's not considered fork-safe by Python in some scenarios. The workaround that worked for me is to use no_proxy environment variable.
Edit ~/.bash_profile and include the following (it might be better to specify list of domains or subnets here, instead of *)
no_proxy='*'
Refresh the current context
source ~/.bash_profile
My local versions the issue was seen and worked around are: Python 3.6.0 on
macOS 10.14.1 and 10.13.x
Sources:
Issue 30388
Issue 27126
I have the following situation:
I receive a request on a socketio server. I answer it (socket.emit(..)) and then start something with heavy computation load in another thread.
If the heavy computation is caused by subprocess.Popen (using subprocess.PIPE) it totally blocks every incoming request as long as it is being executed although it happens in a separate thread.
No problem - in this thread it was suggested to asynchronously read the result of the subprocess with a buffer size of 1 so that between these reads other threads have the chance to do something. Unfortunately this did not help for me.
I also already monkeypatched eventlet and that works fine - as long as I don't use subprocess.Popen with subprocess.PIPE in the thread.
In this code sample you can see that it only happens using subprocess.Popen with subprocess.PIPE. When uncommenting #functionWithSimulatedHeavyLoad() and instead comment functionWithHeavyLoad() everything works like charm.
from flask import Flask
from flask.ext.socketio import SocketIO, emit
import eventlet
eventlet.monkey_patch()
app = Flask(__name__)
socketio = SocketIO(app)
import time
from threading import Thread
#socketio.on('client command')
def response(data, type = None, nonce = None):
socketio.emit('client response', ['foo'])
thread = Thread(target = testThreadFunction)
thread.daemon = True
thread.start()
def testThreadFunction():
#functionWithSimulatedHeavyLoad()
functionWithHeavyLoad()
def functionWithSimulatedHeavyLoad():
time.sleep(5)
def functionWithHeavyLoad():
from datetime import datetime
import subprocess
import sys
from queue import Queue, Empty
ON_POSIX = 'posix' in sys.builtin_module_names
def enqueueOutput(out, queue):
for line in iter(out.readline, b''):
if line == '':
break
queue.put(line)
out.close()
# just anything that takes long to be computed
shellCommand = 'find / test'
p = subprocess.Popen(shellCommand, universal_newlines=True, shell=True, stdout=subprocess.PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target = enqueueOutput, args = (p.stdout, q))
t.daemon = True
t.start()
t.join()
text = ''
while True:
try:
line = q.get_nowait()
text += line
print(line)
except Empty:
break
socketio.emit('client response', {'text': text})
socketio.run(app)
The client receives the message 'foo' after the blocking work in the functionWithHeavyLoad() function is completed. It should receive the message earlier, though.
This sample can be copied and pasted in a .py file and the behavior can be instantly reproduced.
I am using Python 3.4.3, Flask 0.10.1, flask-socketio1.2, eventlet 0.17.4
Update
If I put this into the functionWithHeavyLoad function it actually works and everything's fine:
import shlex
shellCommand = shlex.split('find / test')
popen = subprocess.Popen(shellCommand, stdout=subprocess.PIPE)
lines_iterator = iter(popen.stdout.readline, b"")
for line in lines_iterator:
print(line)
eventlet.sleep()
The problem is: I used find for heavy load in order to make the sample for you more easily reproducable. However, in my code I actually use tesseract "{0}" stdout -l deu as the sell command. This (unlike find) still blocks everything. Is this rather a tesseract issue than eventlet? But still: how can this block if it happens in a separate thread where it reads line by line with context switch when find does not block?
Thanks to this question I learned something new today. Eventlet does offer a greenlet friendly version of subprocess and its functions, but for some odd reason it does not monkey patch this module in the standard library.
Link to the eventlet implementation of subprocess: https://github.com/eventlet/eventlet/blob/master/eventlet/green/subprocess.py
Looking at the eventlet patcher, the modules that are patched are os, select, socket, thread, time, MySQLdb, builtins and psycopg2. There is absolutely no reference to subprocess in the patcher.
The good news is that I was able to work with Popen() in an application very similar to yours, after I replaced:
import subprocess
with:
from eventlet.green import subprocess
But note that the currently released version of eventlet (0.17.4) does not support the universal_newlines option in Popen, you will get an error if you use it. Support for this option is in master (here is the commit that added the option). You will either have to remove that option from your call, or else install the master branch of eventlet direct from github.
I'm using ftplib for connecting and getting file list from FTP server.
The problem I have is that the connection hangs from time to time and I don't know why. I'm running python script as a daemon, using threads.
See what I mean:
def main():
signal.signal(signal.SIGINT, signal_handler)
app.db = MySQLWrapper()
try:
app.opener = FTP_Opener()
mainloop = MainLoop()
while not app.terminate:
# suspend main thread until the queue terminates
# this lets to restart the queue automatically in case of unexpected shutdown
mainloop.join(10)
while (not app.terminate) and (not mainloop.isAlive()):
time.sleep(script_timeout)
print time.ctime(), "main: trying to restart the queue"
try:
mainloop = MainLoop()
except Exception:
time.sleep(60)
finally:
app.db.close()
app.db = None
app.opener = None
mainloop = None
try:
os.unlink(PIDFILE)
except:
pass
# give other threads time to terminate
time.sleep(1)
print time.ctime(), "main: main thread terminated"
MainLoop() has some functions for FTP connect, download specific files and disconnect from the server.
Here's how I get the file's list:
file_list = app.opener.load_list()
And how FTP_Opener.load_list() function looks like:
def load_list(self):
attempts = 0
while attempts<=ftp_config.load_max_attempts:
attempts += 1
filelist = []
try:
self._connect()
self._chdir()
# retrieve file list to 'filelist' var
self.FTP.retrlines('LIST', lambda s: filelist.append(s))
filelist = self._filter_filelist(self._parse_filelist(filelist))
return filelist
except Exception:
print sys.exc_info()
self._disconnect()
sleep(0.1)
print time.ctime(), "FTP Opener: can't load file list"
return []
Why sometimes the FTP connection hangs and how can I monitor this? So if it happens I would like to terminate the thread somehow and start a new one.
Thanks
If you are building for robustness, I would highly recommend that you look into using an event-driven method. One such which have FTP support is Twisted (API).
The advantage is that you don't block the thread while waiting for i/O and you can create simple timer functions to monitor your connections if you so prefer. It also scales a lot better. It is slightly more complicated to code using event-driven patterns, so if this is just a simple script it may or may not be worth the extra effort, but since you write that you are writing a daemon, it might be worth looking into.
Here is an example of an FTP client: ftpclient.py