Subprocess not thread safe, alternatives? - python

I'm using Python 2.7 and do not have the option of upgrading or back-porting subprocess32. I am using subprocess in a threaded environment, where it usually works fine. Sometimes, however, the subprocess creation does not return, so the thread hangs. Even strace gives no output when a hang occurs, so I get no feedback.
E.g. this call can cause a hang (the data returned is small, so it is not a pipe-buffer issue):
process = subprocess.Popen(cmd,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT)
I have subsequently read that subprocess is not thread safe in Python 2.7 and that "various issues" were fixed in newer versions. I am using multiple threads calling subprocess.
I have reproduced this problem with the following code (a minimal example, not my actual code), which starts several threads, each running a subprocess:
import os, time, threading, sys
from subprocess import Popen

i = 0

class Process:
    def __init__(self, args):
        self.args = args

    def run(self):
        global i
        retcode = -1
        try:
            self.process = Popen(self.args)
            i += 1
            if i == 10:
                sys.stdout.write("Complete\n")
            while self.process.poll() is None:
                time.sleep(1.0)
            retcode = self.process.returncode
        except:
            sys.stdout.write("ERROR\n")
        return retcode

def main():
    processes = [Process(["/bin/cat"]) for _ in range(10)]
    # start all processes
    for p in processes:
        t = threading.Thread(target=Process.run, args=(p,))
        t.daemon = True
        t.start()
    sys.stdout.write("all threads started\n")
    # wait for Ctrl+C
    while True:
        time.sleep(1.0)

main()
This will often result in one or more of the Popen calls never returning. Does anybody have more information on this, or a solution/alternative?
I am thinking of using the deprecated commands.getoutput instead, but I do not know whether that is thread safe. It certainly seems to work correctly for the code above.

If the bulk of what your threads are doing is just waiting on subprocesses, you can accomplish this much more effectively with coroutines. In Python 2 you would implement this with generators, so the necessary changes to the run function are:
replace time.sleep(1.0) with yield to pass control to another routine
replace return retcode with self.retcode = retcode or similar, since generators can't return a value before Python 3.3 (see the sketch below)
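As a minimal sketch of the modified run (assuming the Process class from the question; the global i counter from the demo is omitted), it might look something like this:
    def run(self):
        retcode = -1
        try:
            self.process = Popen(self.args)
            while self.process.poll() is None:
                yield  # hand control back to the loop driving the generators
            retcode = self.process.returncode
        except:
            sys.stdout.write("ERROR\n")
        self.retcode = retcode  # generators can't return a value before Python 3.3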
Then the main function could be something like this:
def main():
    processes = [Process(["/bin/cat"]) for _ in range(10)]
    # since p.run() is a generator this doesn't run any of the code yet
    routines = [p.run() for p in processes]
    while routines:
        # iterate in reverse so we can remove routines while iterating without skipping any
        for routine in reversed(routines):
            try:
                next(routine)  # continue the routine to the next yield
            except StopIteration:
                # this routine has finished, we no longer need to check it
                routines.remove(routine)
This is intended to give you a place to start from; I'd recommend adding print statements around the yields, or using pythontutor, to better understand the order of execution.
This has the benefit of never having any threads waiting on anything: just one thread doing a section of processing at a time, which can be much more efficient than many idling threads.

Related

Python3 Process.join() not actually waiting on Linux when the process is created in multi-thread

I need to put a timeout on a process that is created inside a thread, but I encountered some strange behavior and I'm not sure how to proceed.
The following code, executed on Linux, produces a weird bug: if the number of threads is greater than 2 (my laptop has 8 cores) or the code is executed in a loop a few times, process.join() doesn't actually wait for the process to finish or for the timeout to expire, but just goes on to the next instruction.
If the same code is executed on Windows with Python 3.9, it gives a circular import error in the libraries for no apparent reason.
If it is executed with Python 3.8, it works almost perfectly until around 256 threads, then gives the same strange behaviour on process.join() as on Linux.
Error on windows Python 3.9:
ImportError: cannot import name 'Queue' from partially initialized module 'multiprocessing.queues' (most likely due to a circular import)
Furthermore, if I remove the return value from the process (i.e. I remove the Queue), then on Linux process.join() starts working properly for arbitrarily large n_threads. However, running the code in a loop still gives the error even for very small n_threads.
import random
from multiprocessing import Process, Queue
from threading import Thread

def dummy_process():
    return random.randint(1, 10)

# function to retrieve the process return value
def process_returner(queue, function, args):
    queue.put(function(*args))

# function that creates the process with a timeout
def execute_with_timeout(function, args, timeout=3):
    q = Queue()
    p1 = Process(
        target=process_returner,
        args=(q, function, args),
        name="P",
    )
    p1.start()
    p1.join(timeout=timeout)  # SOMETIMES IT DOES NOT WAIT FOR THE PROCESS TO FINISH
    if p1.exitcode is None:
        print(f"Oops, {p1} timeouts!")  # SO IT RAISES THIS ERROR even if nowhere near 3 seconds have passed
        raise TimeoutError
    p1.terminate()
    return q.get() if not q.empty() else None

# thread that just calls the new process and stores the return value in the given array
def dummy_thread(result_array, index):
    try:
        result_array[index] = execute_with_timeout(dummy_process, args=())
    except TimeoutError:
        pass

def test():
    # in a loop because with n_threads as low as 4 the error is not so common
    for _ in range(10):
        n_threads = 8
        results = [-1] * n_threads
        threads = set()
        for i in range(n_threads):
            t = Thread(target=dummy_thread, args=(results, i))
            threads.add(t)
            t.start()
        for t in threads:
            t.join()
        print(results)

if __name__ == '__main__':
    test()
I ran into a similar problem when using the multiprocessing module on Linux. Process.join() started returning immediately instead of waiting. exitcode would be equal to None and is_alive() would return True.
It turns out the problem wasn't in the Python code. I was calling my Python program from a Bash script that would sometimes execute trap "" SIGCHLD. Normally, setting trap only affects the script itself, but trap "" some_signal tells the shell's child processes to ignore the signal as well. Blocking SIGCHLD interferes with the multiprocessing module.
In my case, adding signal.signal(signal.SIGCHLD, signal.SIG_DFL) to the beginning of the Python program fixed the problem.
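If you suspect the same cause, a minimal sketch of that fix (the exact remedy depends on what changed the signal disposition in your environment) is to restore the default SIGCHLD handler at the top of the program:
import signal

# Restore default SIGCHLD handling in case a parent process (e.g. a shell
# running `trap "" SIGCHLD`) left the signal set to be ignored.
signal.signal(signal.SIGCHLD, signal.SIG_DFL)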

Multiprocessing does not work and hangs on join on windows 10 [duplicate]

I have a question about understanding the Queue in the multiprocessing module in Python 3.
This is what they say in the programming guidelines:
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the Queue.cancel_join_thread method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
An example which will deadlock is the following:
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()        # this deadlocks
    obj = queue.get()
A fix here would be to swap the last two lines (or simply remove the p.join() line).
So apparently, queue.get() should not be called after a join().
However there are examples of using queues where get is called after a join like:
import multiprocessing as mp
import random
import string

# define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                            string.ascii_lowercase
                            + string.ascii_uppercase
                            + string.digits)
                       for i in range(length))
    output.put(rand_str)

if __name__ == "__main__":
    # Define an output queue
    output = mp.Queue()

    # Setup a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output))
                 for x in range(2)]

    # Run processes
    for p in processes:
        p.start()

    # Exit the completed processes
    for p in processes:
        p.join()

    # Get process results from the output queue
    results = [output.get() for p in processes]
    print(results)
I've run this program and it works (it was also posted as a solution to the Stack Overflow question Python 3 - Multiprocessing - Queue.get() does not respond).
Could someone help me understand what the rule for the deadlock is here?
The queue implementation in multiprocessing that allows data to be transferred between processes relies on standard OS pipes.
OS pipes are not infinitely long, so the process which queues data could be blocked in the OS during the put() operation until some other process uses get() to retrieve data from the queue.
For small amounts of data, such as the one in your example, the main process can join() all the spawned subprocesses and then pick up the data. This often works well, but does not scale, and it is not clear when it will break.
But it will certainly break with large amounts of data. The subprocess will be blocked in put() waiting for the main process to remove some data from the queue with get(), but the main process is blocked in join() waiting for the subprocess to finish. This results in a deadlock.
Here is an example where a user had this exact issue. I posted some code in an answer there that helped him solve his problem.
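For reference, a minimal sketch of the ordering the quoted guideline recommends (drain the queue before joining), using the same f and queue as the deadlock example above:
if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()   # drain the queue first so the feeder thread can flush the pipe
    p.join()            # the child can now terminate, so join() returns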
Don't call join() on a process object before you have got all messages from the shared queue.
I used the following workaround to allow processes to exit before all of their results have been processed:
import queue  # for queue.Empty

results = []
while True:
    try:
        result = resultQueue.get(False, 0.01)
        results.append(result)
    except queue.Empty:
        pass
    allExited = True
    for t in processes:
        if t.exitcode is None:
            allExited = False
            break
    if allExited and resultQueue.empty():
        break
It could be shortened, but I left it longer to be clearer for newcomers.
Here resultQueue is the multiprocessing.Queue that was shared with the multiprocessing.Process objects. After this block of code you will have the results list containing all the messages from the queue.
The problem is that the input buffer of the queue's pipe may become full, causing the writer(s) to block indefinitely until there is enough space to receive the next message. So you have three ways to avoid blocking:
Increase the multiprocessing.connection.BUFFER size (not so good)
Decrease message size or its amount (not so good)
Fetch messages from the queue immediately as they come (good way)

Python script is hanging AFTER multithreading

I know there are a few questions and answers related to hanging threads in Python, but my situation is slightly different as the script is hanging AFTER all the threads have been completed. The threading script is below, but obviously the first 2 functions are simplified massively.
When I run the script shown, it works. When I use my real functions, the script hangs AFTER THE LAST LINE. So all the scenarios are processed (and a message is printed to confirm), then logStudyData() collates all the results and writes them to a csv, "Script Complete" is printed, and THEN it hangs.
The script with threading functionality removed runs fine.
I have tried enclosing the main script in try...except but no exception gets logged. If I use a debugger with a breakpoint on the final print and then step it forward, it hangs.
I know there is not much to go on here, but short of including the whole 1500-line script, I don't know what else to do. Any suggestions welcome!
def runScenario(scenario):
    # Do a bunch of stuff
    with lock:
        # access global variables
        pass
    pass

def logStudyData():
    # Combine results from all scenarios into a df and write to csv
    pass

def worker():
    global q
    while True:
        next_scenario = q.get()
        if next_scenario is None:
            break
        runScenario(next_scenario)
        print(next_scenario, " is complete")
        q.task_done()

import threading
from queue import Queue

global q, lock
q = Queue()
threads = []
scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']
num_worker_threads = 6
lock = threading.Lock()

for i in range(num_worker_threads):
    print("Thread number ", i)
    this_thread = threading.Thread(target=worker)
    this_thread.start()
    threads.append(this_thread)

for scenario_name in scenario_list:
    q.put(scenario_name)

q.join()
print("q.join completed")
logStudyData()
print("script complete")
As the docs for Queue.get say:
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
In other words, there is no way get can ever return None, except by you calling q.put(None) on the main thread, which you don't do.
Notice that the example directly below those docs does this:
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
The second one is technically necessary, but you usually get away with not doing it.
But the first one is absolutely necessary. You need to either do this, or come up with some other mechanism to tell your workers to quit. Without that, your main thread just tries to exit, which means it tries to join every worker, but those workers are all blocked forever on a get that will never happen, so your program hangs forever.
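For concreteness, a minimal sketch of that sentinel shutdown applied to the code in the question (same q, threads, num_worker_threads and scenario_list):
for scenario_name in scenario_list:
    q.put(scenario_name)

q.join()                          # wait until every scenario has been processed

for i in range(num_worker_threads):
    q.put(None)                   # one sentinel per worker tells it to break out of its loop

for t in threads:
    t.join()                      # every worker has now seen a sentinel and returned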
Building a thread pool may not be rocket science (if only because rocket scientists tend to need their calculations to be deterministic and hard real-time…), but it's not trivial, either, and there are plenty of things you can get wrong. You may want to consider using one of the two already-built threadpools in the Python standard library, concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.Pool. This would reduce your entire program to:
import concurrent.futures

def work(scenario):
    runScenario(scenario)
    print(scenario, " is complete")

scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as x:
    results = list(x.map(work, scenario_list))

print("q.join completed")
logStudyData()
print("script complete")
Obviously you'll still need a lock around any mutable variables you change inside runScenario—although if you're only using a mutable variable there because you couldn't figure out how to return values to the main thread, that's trivial with an Executor: just return the values from work, and then you can use them like this:
for result in x.map(work, scenario_list):
    do_something(result)

How do you kill Futures once they have started?

I am using the new concurrent.futures module (which also has a Python 2 backport) to do some simple multithreaded I/O. I am having trouble understanding how to cleanly kill tasks started using this module.
Check out the following Python 2/3 script, which reproduces the behavior I'm seeing:
#!/usr/bin/env python
from __future__ import print_function
import concurrent.futures
import time

def control_c_this():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future1 = executor.submit(wait_a_bit, name="Jack")
        future2 = executor.submit(wait_a_bit, name="Jill")
        for future in concurrent.futures.as_completed([future1, future2]):
            future.result()
        print("All done!")

def wait_a_bit(name):
    print("{n} is waiting...".format(n=name))
    time.sleep(100)

if __name__ == "__main__":
    control_c_this()
While this script is running it appears impossible to kill cleanly using the regular Control-C keyboard interrupt. I am running on OS X.
On Python 2.7 I have to resort to kill from the command line to kill the script. Control-C is just ignored.
On Python 3.4, Control-C works if you hit it twice, but then a lot of strange stack traces are dumped.
Most documentation I've found online talks about how to cleanly kill threads with the old threading module. None of it seems to apply here.
And all the methods provided within the concurrent.futures module to stop stuff (like Executor.shutdown() and Future.cancel()) only work when the Futures haven't started yet or are complete, which is pointless in this case. I want to interrupt the Future immediately.
My use case is simple: When the user hits Control-C, the script should exit immediately like any well-behaved script does. That's all I want.
So what's the proper way to get this behavior when using concurrent.futures?
It's kind of painful. Essentially, your worker threads have to be finished before your main thread can exit; you cannot exit unless they do. The typical workaround is to have some global state that each thread can check to determine whether it should do more work.
Here's the quote explaining why. In essence, if threads exited when the interpreter does, bad things could happen.
Here's a working example. Note that Ctrl+C takes at most 1 second to propagate because of the sleep duration in the child thread.
#!/usr/bin/env python
from __future__ import print_function
import concurrent.futures
import time
import sys

quit = False

def wait_a_bit(name):
    while not quit:
        print("{n} is doing work...".format(n=name))
        time.sleep(1)

def setup():
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    future1 = executor.submit(wait_a_bit, "Jack")
    future2 = executor.submit(wait_a_bit, "Jill")

    # main thread must be doing "work" to be able to catch a Ctrl+C
    # http://www.luke.maurits.id.au/blog/post/threads-and-signals-in-python.html
    while not (future1.done() and future2.done()):
        time.sleep(1)

if __name__ == "__main__":
    try:
        setup()
    except KeyboardInterrupt:
        quit = True
I encountered this too, but the issue I had was that many futures (tens of thousands) would be waiting to run, and just pressing Ctrl-C left them waiting instead of actually exiting. I was using concurrent.futures.wait to run a progress loop and needed to add a try ... except KeyboardInterrupt to handle cancelling unfinished Futures.
POLL_INTERVAL = 5

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(do_work, arg) for arg in large_set_to_do_work_over]

    # next line returns instantly
    done, not_done = concurrent.futures.wait(futures, timeout=0)
    try:
        while not_done:
            # next line 'sleeps' this main thread, letting the thread pool run
            freshly_done, not_done = concurrent.futures.wait(not_done, timeout=POLL_INTERVAL)
            done |= freshly_done
            # more polling stats calculated here and printed every POLL_INTERVAL seconds...
    except KeyboardInterrupt:
        # only futures that are not done will prevent exiting
        for future in not_done:
            # cancel() returns False if it's already done or currently running,
            # and True if it was able to cancel it; we don't need that return value
            _ = future.cancel()
        # wait for running futures that the above for loop couldn't cancel (note timeout)
        _ = concurrent.futures.wait(not_done, timeout=None)
If you're not interested in keeping exact track of what got done and what didn't (i.e. don't want a progress loop), you can replace the first wait call (the one with timeout=0) with not_done = futures and still leave the while not_done: logic.
The for future in not_done: cancel loop can probably behave differently based on that return value (or be written as a comprehension), but waiting for futures that are done or canceled isn't really waiting - it returns instantly. The last wait with timeout=None ensures that pool's running jobs really do finish.
Again, this only works correctly if the do_work that's being called actually, eventually returns within a reasonable amount of time. That was fine for me - in fact, I want to be sure that if do_work gets started, it runs to completion. If do_work is 'endless' then you'll need something like cdosborn's answer that uses a variable visible to all the threads, signaling them to stop themselves.
Late to the party, but I just had the same problem.
I want to kill my program immediately and I don't care what's going on. I don't need a clean shutdown beyond what Linux will do.
I found that replacing geitda's code in the KeyboardInterrupt exception handler with os.kill(os.getpid(), 9) exits immediately after the first ^C.
import os
import subprocess

main = str(os.getpid())

def ossystem(c):
    return subprocess.Popen(c, shell=True, stdout=subprocess.PIPE).stdout.read().decode("utf-8").strip()

def killexecutor():
    print("Killing")
    pids = ossystem('ps -a | grep scriptname.py').split('\n')
    for pid in pids:
        pid = pid.split(' ')[0].strip()
        if str(pid) != main:
            os.kill(int(pid), 9)
...
killexecutor()

How to list Processes started by multiprocessing Pool?

While attempting to store multiprocessing's Process instance in the multiprocessing list variable poolList, I am getting the following exception:
SimpleQueue objects should only be shared between processes through inheritance
The reason I would like to store the PROCESS instances in a variable is to be able to terminate all or just some of them later (if, for example, a PROCESS freezes). If storing a PROCESS in a variable is not an option, I would like to know how to get or list all the PROCESSES started by a multiprocessing POOL. That would be very similar to what the .current_process() method does, except .current_process() gets only a single process, while I need all the processes started or all the processes currently running.
Two questions:
Is it even possible to store an instance of the Process (as a result of mp.current_process())?
Currently I am only able to get a single process from inside the function that the process is running (from inside myFunct(), using the .current_process() method).
Instead, I would like to list all the processes currently being run by multiprocessing. How do I achieve that?
import multiprocessing as mp

poolList = mp.Manager().list()

def myFunct(arg):
    print 'myFunct(): current process:', mp.current_process()
    try: poolList.append(mp.current_process())
    except Exception, e: print e

    for i in range(110):
        for n in range(500000):
            pass
        poolDict[arg] = i
    print 'myFunct(): completed', arg, poolDict

from multiprocessing import Pool
pool = Pool(processes=2)

myArgsList = ['arg1', 'arg2', 'arg3']

pool = Pool(processes=2)
pool.map_async(myFunct, myArgsList)
pool.close()
pool.join()
To list the processes started by a Pool() instance (which is what you mean, if I understand you correctly), there is the pool._pool list, and it contains the instances of the processes.
However, it is not part of the documented interface and hence, really should not be used.
BUT...it seems a little bit unlikely that it would change just like that anyway. I mean, should they stop having an internal list of processes in the pool? And not call that _pool?
And it also annoys me that there isn't at least a method to get the processes, or something similar.
And handling it breaking due to some name change should not be that difficult.
But still, use at your own risk:
from multiprocessing import pool

# Have to run in main
if __name__ == '__main__':
    # Create 3 worker processes
    _my_pool = pool.Pool(3)

    # Loop, terminate, and remove from the process list
    # Use a copy [:] of the list to remove items correctly
    for _curr_process in _my_pool._pool[:]:
        print("Terminating process " + str(_curr_process.pid))
        _curr_process.terminate()
        _my_pool._pool.remove(_curr_process)

    # If you call _repopulate, the pool will again contain 3 worker processes.
    _my_pool._repopulate_pool()

    for _curr_process in _my_pool._pool[:]:
        print("After repopulation " + str(_curr_process.pid))
The example creates a pool and manually terminates all processes.
It is important that you remember to delete the processes you terminate from the pool yourself if you want Pool() to continue working as usual.
_my_pool._repopulate_pool() increases the number of worker processes to 3 again; it is not needed to answer the question, but it gives a little bit of behind-the-scenes insight.
Yes, you can get all active child processes and perform an action based on the name of the process, e.g.:
multiprocessing.Process(target=foo, name="refresh-reports")
and then:
for p in multiprocessing.active_children():
    if p.name == "refresh-reports":
        p.terminate()
You're creating a managed List object, but then letting the associated Manager object expire.
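In other words, a small sketch of that first point (the variable names here are illustrative): keep a reference to the Manager alive for as long as you use the list it created.
import multiprocessing as mp

manager = mp.Manager()      # keep this reference alive...
poolList = manager.list()   # ...so the managed list it created remains usable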
Process objects are not shareable because they aren't pickle-able; that is, they aren't simple.
Oddly, the multiprocessing module doesn't have the equivalent of threading.enumerate() -- that is, you can't list all outstanding processes. As a workaround, I just store procs in a list. I never terminate() a process, but do sys.exit(0) in the parent. It's rough, because the workers will leave things in an inconsistent state, but it's okay for smaller programs.
To kill a frozen worker, I suggest: 1) the worker receives "heartbeat" jobs in a queue every now and then, 2) if the parent notices that worker A hasn't responded to a heartbeat in a certain amount of time, then p.terminate(). Consider restating the problem in another SO question, as it's interesting.
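A rough sketch of that heartbeat idea (all names here are hypothetical, and the details depend on how your workers receive jobs): the parent periodically pings each worker through its job queue and terminates any worker whose last reply is too old.
import time
from multiprocessing import Process, Queue

HEARTBEAT_TIMEOUT = 10.0  # seconds of silence before a worker is considered frozen

def worker(job_queue, heartbeat_queue, worker_id):
    while True:
        job = job_queue.get()
        if job == "heartbeat":
            heartbeat_queue.put((worker_id, time.time()))  # reply to the parent's ping
        elif job is None:
            break       # sentinel: exit cleanly
        else:
            pass        # ... do the real work here ...

def check_workers(workers, job_queues, heartbeat_queue, last_seen):
    for q in job_queues:
        q.put("heartbeat")                     # ping every worker
    while not heartbeat_queue.empty():
        worker_id, ts = heartbeat_queue.get()  # record replies as they arrive
        last_seen[worker_id] = ts
    now = time.time()
    for worker_id, p in enumerate(workers):
        if p.is_alive() and now - last_seen[worker_id] > HEARTBEAT_TIMEOUT:
            p.terminate()                      # frozen worker: kill it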
To be honest the map stuff is much easier than using a Manager.
Here's a Manager example I've used. A worker adds stuff to a shared list. Another worker occasionally wakes up, processes everything on the list, then goes back to sleep. The code also has verbose logs, which are essential for ease in debugging.
# producer adds to fixed-sized list; scanner uses them
import logging, multiprocessing, sys, time

def producer(objlist):
    '''
    add an item to list every sec; ensure fixed size list
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(1)
        except KeyboardInterrupt:
            return
        msg = 'ding: {:04d}'.format(int(time.time()) % 10000)
        logger.info('put: %s', msg)
        del objlist[0]
        objlist.append(msg)

def scanner(objlist):
    '''
    every now and then, run calculation on objlist
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(5)
        except KeyboardInterrupt:
            return
        logger.info('items: %s', list(objlist))

def main():
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO
    )
    logger.info('setup')

    # create fixed-length list, shared between producer & consumer
    manager = multiprocessing.Manager()
    my_objlist = manager.list(  # pylint: disable=E1101
        [None] * 10
    )

    multiprocessing.Process(
        target=producer,
        args=(my_objlist,),
        name='producer',
    ).start()

    multiprocessing.Process(
        target=scanner,
        args=(my_objlist,),
        name='scanner',
    ).start()

    logger.info('running forever')
    try:
        manager.join()  # wait until both workers die
    except KeyboardInterrupt:
        pass
    logger.info('done')

if __name__ == '__main__':
    main()
