There are a lot of similar questions and answers, but I still can't find a reliable answer.
So, I have a function that can possibly run for too long. The function is private, in the sense that I cannot change its code.
I want to restrict its execution time to 60 seconds.
I tried following approaches:
Python signals. They don't work on Windows or in a multithreaded environment (mod_wsgi).
Threads. A nice approach, but a thread cannot be stopped, so it keeps running even after a TimeoutException is raised.
The multiprocessing module. I have pickling problems that I don't know how to solve. I want to write a time_limit decorator, but there are problems with importing the required function at the top level. The long-running function is an instance method, and wrapping it doesn't help either...
So, are there good solutions to the above problem?
How do I kill a thread that I started?
How do I use subprocesses and avoid the pickling problems?
Is the subprocess module of any help?
Thank you.
I think the multiprocessing approach is your only real option. You're correct that threads can't be killed (nicely) and signals have cross-platform issues. Here is one multiprocessing implementation:
import multiprocessing
import Queue

def timed_function(return_queue):
    do_other_stuff()
    return_queue.put(True)
    return

def main():
    return_queue = multiprocessing.Manager().Queue()
    proc = multiprocessing.Process(target=timed_function, args=(return_queue,))
    proc.start()

    try:
        # wait for 60 seconds for the function to return a value
        return_queue.get(timeout=60)
    except Queue.Empty:
        # timeout expired
        proc.terminate()  # kill the subprocess
        # other cleanup
I know you said that you have pickling issues, but those can almost always be resolved with refactoring. For example, you said that your long function is an instance method. You can wrap those kinds of functions to use them with multiprocessing:
class TestClass(object):
    def timed_method(self, return_queue):
        do_other_stuff()
        return_queue.put(True)
        return
To use that method in a pool of workers, add this wrapper to the top-level of the module:
def _timed_method_wrapper(TestClass_object, return_queue):
    return TestClass_object.timed_method(return_queue)
Now you can, for example, use apply_async on this class method from a different method of the same class:
def run_timed_method(self):
    return_queue = multiprocessing.Manager().Queue()
    pool = multiprocessing.Pool()
    result = pool.apply_async(_timed_method_wrapper, args=(self, return_queue))
I'm pretty sure that these wrappers are only necessary if you're using a multiprocessing.Pool instead of launching the subprocess with a multiprocessing.Process object. Also, I bet a lot of people would frown on this construct, because you're breaking the nice, clean abstraction that classes provide and creating a dependency between the class and this other random wrapper function hanging around. You'll have to decide whether making your code uglier is worth it.
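For comparison, here is a minimal sketch (mine, not part of the answer above) of the same timeout enforced with a plain multiprocessing.Process and join(); run_with_timeout is a hypothetical helper, and on Windows (or Python 2) the instance and its bound method still need to be picklable, which is exactly the caveat discussed above:

import multiprocessing

def run_with_timeout(obj, seconds=60):
    # run obj.timed_method in a child process and enforce a time limit
    return_queue = multiprocessing.Manager().Queue()
    proc = multiprocessing.Process(target=obj.timed_method, args=(return_queue,))
    proc.start()
    proc.join(seconds)       # wait at most `seconds` for the child to finish
    if proc.is_alive():      # still running, so the time limit was exceeded
        proc.terminate()     # kill the subprocess
        proc.join()
        raise RuntimeError("timed out after %s seconds" % seconds)
    return return_queue.get_nowait()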
An answer to "Is it possible to kill a process on Windows from within Python?" may help:
You need to kill that subprocess or thread: see "Terminating a subprocess on Windows".
Maybe TerminateThread also helps.
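For what it's worth, if the long-running code is already wrapped in a multiprocessing.Process or a subprocess.Popen object, terminating it is one call on both platforms (a sketch; proc and popen stand for such objects):

proc.terminate()   # multiprocessing.Process: uses TerminateProcess on Windows
popen.kill()       # subprocess.Popen: also maps to TerminateProcess on Windows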
I would like to run a series of commands (which take a long time), but I do not want to wait for the completion of each command. How can I go about this in Python?
I looked at os.fork() and subprocess.Popen(), but I don't think those are what I need.
Code
import time

def command1():
    time.sleep(10)   # stands in for a long-running command

def command2():
    time.sleep(10)

def command3():
    time.sleep(10)
I would like to call
command1()
command2()
command3()
without having to wait.
Use Python's multiprocessing module.

def func(arg1):
    pass  # ... do something ...

from multiprocessing import Process

p = Process(target=func, args=(arg1,), name='func')
p.start()

The complete documentation is here: https://docs.python.org/2/library/multiprocessing.html
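Applied to the three commands from the question, the same pattern would look roughly like this (a sketch; command1, command2 and command3 are the placeholder functions defined in the question):

from multiprocessing import Process

if __name__ == '__main__':
    procs = [Process(target=cmd, name=cmd.__name__)
             for cmd in (command1, command2, command3)]
    for p in procs:
        p.start()    # start() returns immediately, so nothing blocks here
    # ... the main program continues without waiting ...
    for p in procs:
        p.join()     # optionally wait for all of them at the end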
EDIT:
You can also use Python's threading module. Note, however, that only implementations such as Jython and IronPython have no GIL (Global Interpreter Lock); on CPython, threads remain subject to the GIL, so they help mainly for I/O-bound work.
https://docs.python.org/2/library/threading.html
This example may be suitable for you:
#!/usr/bin/env python3
import os
import time

def forked(fork_func):
    def do_fork():
        pid = os.fork()
        if pid == 0:
            # child process: run the function, then exit
            fork_func()
            os._exit(0)
        else:
            # parent process: return the child's pid and continue immediately
            return pid
    return do_fork

@forked
def command1():
    time.sleep(2)

@forked
def command2():
    time.sleep(1)

command1()
command2()
print("Hello")
You just use the @forked decorator on your functions.
There is one caveat: os.fork() is Unix-only, and the child processes are not waited on, so they can keep running (or linger as zombies) after the main program finishes.
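If that matters, one option (my addition, not part of the answer) is to keep the PIDs returned by the decorated calls and reap the children explicitly:

pids = [command1(), command2()]
# ... do other work here ...
for pid in pids:
    os.waitpid(pid, 0)   # block until each forked child has exited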
The most straightforward way is to use Python's own multiprocessing:
import time
from multiprocessing import Process

def command1():
    time.sleep(10)   # the long-running work
...

call1 = Process(target=command1)
call1.start()
...
This module was introduced precisely to ease the burden of controlling the execution, in a separate process, of functions available in the same code base. Of course, that could already be done with os.fork or subprocess, but multiprocessing emulates Python's own threading module interface as far as possible. The immediate advantage of multiprocessing over threading is that the worker processes can use different CPU cores and actually run in parallel, while threading, due to the language's design limitations (the GIL), is effectively restricted to a single executing worker at a time, and therefore to a single core even when several are available.
Now, note that there are still peculiarities, especially if you are, for example, calling these from inside a web request. Check this question and its answers from a few days ago:
Stop a background process in flask without creating zombie processes
I'm currently using the standard multiprocessing in python to generate a bunch of processes that will run indefinitely. I'm not particularly concerned with performance; each thread is simply watching for a different change on the filesystem, and will take the appropriate action when a file is modified.
Currently, I have a solution that works, for my needs, in Linux. I have a dictionary of functions and arguments that looks like:
job_dict['func1'] = {'target': func1, 'args': (args,)}
For each, I create a process:
import multiprocessing

for k in job_dict.keys():
    jobs[k] = multiprocessing.Process(target=job_dict[k]['target'],
                                      args=job_dict[k]['args'])
With this, I can keep track of each one that is running, and, if necessary, restart a job that crashes for any reason.
This does not work in Windows. Many of the functions I'm using are wrappers, using various functools functions, and I get messages about not being able to serialize the functions (see What can multiprocessing and dill do together?). I have not figured out why I do not get this error in Linux, but do in Windows.
If I import dill before starting my processes in Windows, I do not get the serialization error. However, the processes do not actually do anything. I cannot figure out why.
I then switched to the multiprocessing implementation in pathos, but did not find an analog to the simple Process class within the standard multiprocessing module. I was able to generate threads for each job using pathos.pools.ThreadPool. This is not the intended use for map, I'm sure, but it started all the threads, and they ran in Windows:
import pathos

tp = pathos.pools.ThreadPool()

for k in job_dict.keys():
    tp.uimap(job_dict[k]['target'], job_dict[k]['args'])
However, now I'm not sure how to monitor whether a thread is still active, which I need so that I can restart threads that crash for one reason or another. Any suggestions?
I'm the pathos and dill author. The Process class is buried deep within pathos at pathos.helpers.mp.process.Process, where mp itself is the actual fork of the multiprocessing library. Everything in multiprocessing should be accessible from there.
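Based on that path, usage would presumably look something like this (an untested sketch; the attribute path is quoted from the answer above, and job_dict comes from the question):

import pathos.helpers

Process = pathos.helpers.mp.process.Process   # path taken from the answer above

jobs = {}
for k in job_dict.keys():
    jobs[k] = Process(target=job_dict[k]['target'], args=job_dict[k]['args'])
    jobs[k].start()

# jobs[k].is_alive() can then be used to detect and restart crashed workers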
Another thing to know about pathos is that it keeps the pool alive for you until you remove it from the held state. This helps reduce overhead in creating "new" pools. To remove a pool, you do:
>>> # create
>>> p = pathos.pools.ProcessPool()
>>> # remove
>>> p.clear()
There's no such mechanism for a Process however.
For multiprocessing, Windows is different from Linux and Mac… because Windows doesn't have a proper fork like Linux does… Linux can share objects across processes, while on Windows there is no sharing… a fully independent new process is created… and therefore the serialization has to be better for the object to pass across to the other process -- just as if you were sending the object to another computer. On Linux, you'd have to do this to get the same behavior:
def check(obj, *args, **kwds):
    """check pickling of an object across another process"""
    import subprocess
    import dill
    fail = True
    try:
        _obj = dill.dumps(obj, *args, **kwds)
        fail = False
    finally:
        if fail:
            print "DUMP FAILED"
    msg = "python -c import dill; print dill.loads(%s)" % repr(_obj)
    print "SUCCESS" if not subprocess.call(msg.split(None, 2)) else "LOAD FAILED"
I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function, which consists of a while loop, and while that loop runs I want to continue with the program, and eventually change the condition for the while loop and end that process. This is the code:
class Example():
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print "Still looping"
            time.sleep(1)
        print "Finished loop"

    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for func1 to run for 5 s before end_loop() is called, changes the condition, and ends the loop (and thus also the function). I.e., I want one process to start and "go" into func1 while, at the same time, time.sleep(5) is called, so that the processes "split" when arriving at func1: one process enters the function while the other continues down the program and starts with the time.sleep(5) execution.
This must be the most basic example of multiprocessing, yet I've had trouble finding a simple way to do it!
Thank you
EDIT1: regarding do_something. In my real problem, do_something is replaced by some code that communicates with another program via a socket, receives packets with coordinates every 0.02 s, and stores them in member variables of the class. I want this constant updating of the coordinates to start, and then be able to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for either. What I want is just to start the process p1 and then continue with time.sleep and ex.end_loop().
The first problem with your code is the calls
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
With ex.func1() you're calling the function and passing its return value as the target parameter. Since the function doesn't return anything, you're effectively calling
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes which, by default, cannot simply share data afaik. Have a look at Sharing state between processes in the package's documentation to read about this. Especially take the first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might want to also have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. So, instead of simply setting a flag to stop the loop, it might be better to send a message to signal the loop should be terminated.
Also note that processes are a heavyweight form of concurrency: they spawn multiple OS processes, which comes with relatively big overhead. multiprocessing's main purpose is to avoid the problems imposed by Python's Global Interpreter Lock (google this to read more...). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and also allow access to the same data (although you really should read about synchronization when doing this...).
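To illustrate the threading suggestion, here is a minimal sketch (mine, not part of the original answer) that replaces the boolean flag with a threading.Event:

import threading
import time

class Example(object):
    def __init__(self):
        self.stop_event = threading.Event()

    def func1(self):
        while not self.stop_event.is_set():
            print "Still looping"
            time.sleep(1)
        print "Finished loop"

    def end_loop(self):
        self.stop_event.set()

ex = Example()
t = threading.Thread(target=ex.func1)   # note: no parentheses after func1
t.start()       # the loop runs in a background thread
time.sleep(5)   # the main thread continues immediately
ex.end_loop()   # signal the loop to stop
t.join()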
I'm afraid multiprocessing is an inherently complex subject, so I think you will need to advance your programming/Python skills to use it successfully. But I'm sure you'll manage; the Python documentation on the topic is comprehensive and there are a lot of other resources about it.
To tackle your EDIT2 problem, you could try using a shared memory Value.
import time
from multiprocessing import Process, Value

class Example():
    def func1(self, cond):
        while (cond.value == 1):
            print('do something')
            time.sleep(1)
        return

if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.
I'm trying to write a program that executes a piece of code in such a way that the user can stop its execution at any time without stopping the main program. I thought I could do this using threading.Thread, but then I ran the following code in IDLE (Python 3.3):
from threading import *
import math

def f():
    eval("math.factorial(1000000000)")

t = Thread(target=f)
t.start()
The last line doesn't return: I eventually restarted the shell. Is this a consequence of the Global Interpreter Lock, or am I doing something wrong? I didn't see anything specific to this problem in the threading documentation (http://docs.python.org/3/library/threading.html)
I tried to do the same thing using a process:
from multiprocessing import *
import math

def f():
    eval("math.factorial(1000000000)")

p = Process(target=f)
p.start()
p.is_alive()
The last line returns False, even though I ran it only a few seconds after I started the process! Based on my processor usage, I am forced to conclude that the process never started in the first place. Can somebody please explain what I am doing wrong here?
Thread.start() never returns! Could this have something to do with the C implementation of the math library?
As @eryksun pointed out in the comments: math.factorial() is implemented as a C function that doesn't release the GIL, so no other Python code may run until it returns.
Note: the multiprocessing version should work as is: each Python process has its own GIL.
factorial(1000000000) has hundreds of millions of digits. Try import time; time.sleep(10) as a dummy calculation instead.
If you have issues with multithreaded code in IDLE then try the same code from the command line, to make sure that the error persists.
If p.is_alive() returns False after p.start() has already been called, then it might mean that there is an error in the f() function, e.g., a MemoryError.
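One way to see such a failure (my suggestion, not part of the original answer) is to check the child's exit code after it dies:

p = Process(target=f)
p.start()
p.join()
print(p.exitcode)   # 0 = clean exit, > 0 = exception or sys.exit(code), < 0 = killed by a signal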
On my machine, p.is_alive() returns True and one of the CPUs is at 100% if I paste your code from the question into a Python shell.
Unrelated: remove wildcard imports such as from multiprocessing import *. They may shadow other names in your code, so you can't be sure what a given name means; e.g., threading could define an eval function (it doesn't, but it could) with similar but different semantics that might break your code silently.
I want my program to be able to handle ridiculous inputs from the user gracefully
If you pass user input directly to eval() then the user can do anything.
Is there any way to get a process to print, say, an error message without constructing a pipe or other similar structure?
It is ordinary Python code:
print(message) # works
The difference is that if several processes run print() then the output might be garbled. You could use a lock to synchronize print() calls.
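A minimal sketch of that locking idea (mine; the lock has to be created in the parent and passed to the workers):

from multiprocessing import Process, Lock

def worker(lock, n):
    with lock:   # only one process prints at a time
        print('message from worker', n)

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=worker, args=(lock, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()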
I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.
I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?
I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.
You'll probably benefit from the threading or multiprocessing modules. You don't actually need to create all those Thread-based classes yourself; there is a simpler method using Pool.map:
from multiprocessing import Pool

def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...

pool = Pool(processes=4)
result = pool.map(fetch_url, image_url_list)
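A possible body for fetch_url, reusing the urllib.urlretrieve call mentioned in the question (an illustrative assumption; the 'thumbs' directory and filename scheme are made up):

import os
import urllib

def fetch_url(url):
    # save each thumbnail under a name derived from the URL (illustrative only)
    filename = os.path.join('thumbs', url.rsplit('/', 1)[-1])
    try:
        urllib.urlretrieve(url, filename)
        return filename
    except IOError as e:
        return e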
As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in Python.
As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.
Here is a tutorial on threading in Python:
http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
Here's an example of how to use threading.Thread. Just replace the class name and the run function with your own. Note that threading is great for IO-restricted applications like yours and can really speed them up. Using Python threading strictly for computation in standard Python doesn't help, because only one thread can compute at a time.
import threading, time

class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple

    def run(self):
        # sleeps 3 seconds then prints 'pong' x times
        time.sleep(3)
        printString = 'pong' * self.multiple
        print printString

pingInstance = Ping(3)
pingInstance.start()  # your run function will be called with the start function
print "pingInstance is alive? : %d" % pingInstance.isAlive()  # will return True, or 1
print "Number of threads alive: %d" % threading.activeCount()
# main thread + class instance
time.sleep(3.5)
print "Number of threads alive: %d" % threading.activeCount()
print "pingInstance is alive?: %d" % pingInstance.isAlive()
# isAlive returns false when your thread reaches the end of its run function.
# only main thread now
You have these choices:
Threads: easiest, but doesn't scale well.
Twisted: medium difficulty; scales well, but shares a single CPU due to the GIL and being single-threaded.
Multiprocessing: hardest; scales well if you know how to write your own event loop.
I recommend just using threads unless you need an industrial scale fetcher.
You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.