I made a downloader based on a scheduler. I want to start downloading files at 12 am and stop before 6 am. I managed to start at 12 am using APScheduler, but I am not sure how to terminate all the tasks at 6 am. I am using multithreading, as I need to download as many things as possible by 6 am. I tried scheduling an exit at 6 am, but it doesn't stop the threads and only exits the current thread. I want to exit all threads by 6 am. Here is the code I tried:
from concurrent.futures import ThreadPoolExecutor, as_completed
from apscheduler.schedulers.blocking import BlockingScheduler
import multiprocessing

executor = ThreadPoolExecutor(max_workers=multiprocessing.cpu_count() * 5)
urls = [...]  # list of all urls

def download(url):
    ...  # downloader here

def main_download():
    futures = [executor.submit(download, url) for url in urls]
    for future in as_completed(futures):
        ...  # do something

scheduler = BlockingScheduler(timezone="Asia/Kolkata")
job = scheduler.add_job(main_download, trigger="cron", hour=0)  # midnight (12 am)

def kill_all():  # kill everything
    job.remove()
    scheduler.remove_all_jobs()
    scheduler.shutdown()
    quit(1)
    # already tried exit, raise KeyboardInterrupt, sys.exit

scheduler.add_job(kill_all, trigger="cron", hour=6)  # kill everything
scheduler.start()
But it still continues downloading everything. Is there a good way to stop all threads? As it stands, this exits only the current thread, and the other threads continue to run.
The best solution might be using os._exit(n). exit, quit, and even sys.exit raise SystemExit, which only terminates the current thread and can be caught by handlers, whereas os._exit(n) exits the process without calling cleanup handlers or flushing stdio buffers. So your solution might just be to use os._exit(n). But be warned: it is not a graceful way to exit and should only be used in special cases, so use it carefully.
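As a rough sketch of how that could look in the code above (assuming the scheduler object from the question; downloads in progress are cut off mid-transfer):

import os

def kill_all():
    scheduler.remove_all_jobs()  # scheduler from the question above
    # os._exit() ends the whole process, worker threads included,
    # skipping cleanup handlers and stdio flushing
    os._exit(1)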
The difference between them is explained briefly in the following Stack Overflow thread
Related
I am writing a queue processing application which uses threads for waiting on and responding to queue messages delivered to the app. For the main part of the application, it just needs to stay active. For a code example like:
while True:
    pass
or
while True:
    time.sleep(1)
Which one will have the least impact on a system? What is the preferred way to do nothing, but keep a python app running?
I would imagine time.sleep() will have less overhead on the system. Using pass will cause the loop to immediately re-evaluate and peg the CPU, whereas using time.sleep will allow the execution to be temporarily suspended.
EDIT: just to prove the point, if you launch the python interpreter and run this:
>>> while True:
... pass
...
You can watch Python start eating up 90-100% CPU instantly, versus:
>>> import time
>>> while True:
... time.sleep(1)
...
Which barely even registers on the Activity Monitor (using OS X here but it should be the same for every platform).
Why sleep? You don't want to sleep, you want to wait for the threads to finish.
So
# store the threads you start in a your_threads list, then
for a_thread in your_threads:
    a_thread.join()
See: threading.Thread.join
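For illustration, a minimal self-contained sketch of that pattern (the worker function here is made up):

import threading
import time

def worker(n):
    time.sleep(n)  # placeholder for real work
    print('worker %d done' % n)

your_threads = []
for n in range(3):
    t = threading.Thread(target=worker, args=(n,))
    t.start()
    your_threads.append(t)

# the main thread blocks here until every worker has finished
for a_thread in your_threads:
    a_thread.join()
print('all threads joined')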
If you are looking for a short, zero-cpu way to loop forever until a KeyboardInterrupt, you can use:
from threading import Event
Event().wait()
Note: Due to a bug, this only works on Python 3.2+. In addition, it appears to not work on Windows. For this reason, while True: sleep(1) might be the better option.
For some background, Event objects are normally used for waiting for long running background tasks to complete:
from threading import Event, Thread
from time import sleep

def do_task():
    sleep(10)
    print('Task complete.')
    event.set()

event = Event()
Thread(target=do_task).start()
event.wait()
print('Continuing...')
Which prints:
Task complete.
Continuing...
signal.pause() is another solution, see https://docs.python.org/3/library/signal.html#signal.pause
Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Not on Windows. (See the Unix man page signal(2).)
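A minimal sketch of that approach, assuming a Unix platform and a hypothetical handler:

import signal

def handler(signum, frame):
    print('Received signal %d, shutting down' % signum)

signal.signal(signal.SIGTERM, handler)  # install a handler first
signal.pause()  # sleep until any signal arrives, then run its handler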
I've always seen/heard that using sleep is the better way to do it. Using sleep will keep your Python interpreter's CPU usage from going wild.
You don't give much context to what you are really doing, but maybe Queue could be used instead of an explicit busy-wait loop? If not, I would assume sleep would be preferable, as I believe it will consume less CPU (as others have already noted).
[Edited according to additional information in comment below.]
Maybe this is obvious, but anyway: what you could do in a case where you are reading information from blocking sockets is to have one thread read from the socket and post suitably formatted messages into a Queue, and then have the rest of your "worker" threads reading from that queue; the workers will then block on reading from the queue without the need for either pass or sleep.
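A rough sketch of that layout (sock and handle are placeholders for illustration):

import queue
import threading

msg_queue = queue.Queue()

def handle(msg):
    pass  # placeholder for real message processing

def reader(sock):
    # the only thread that touches the blocking socket
    while True:
        data = sock.recv(4096)
        if not data:
            break
        msg_queue.put(data)  # hand the message to the workers

def worker():
    while True:
        msg = msg_queue.get()  # blocks until a message arrives; no busy-wait
        handle(msg)
        msg_queue.task_done()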
Running a method as a background thread with sleep in Python:
import threading
import time

class ThreadingExample(object):
    """ Threading example class

    The run() method will be started and it will run in the background
    until the application exits.
    """

    def __init__(self, interval=1):
        """ Constructor

        :type interval: int
        :param interval: Check interval, in seconds
        """
        self.interval = interval

        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()        # Start the execution

    def run(self):
        """ Method that runs forever """
        while True:
            # Do something
            print('Doing something important in the background')
            time.sleep(self.interval)

example = ThreadingExample()
time.sleep(3)
print('Checkpoint')
time.sleep(2)
print('Bye')
I have some test cases where I start a webserver process and then run some URL tests to check that every function runs fine.
The server process start-up time depends on the system where it is executed. It's a matter of seconds, and I work with a time.sleep(5) for now.
But honestly I'm not a huge fan of sleep(), since it might work for my systems, but what if the test runs on a system where the server needs 6 seconds to start? It's never really safe to go that way.
Tests will fail for no reason at all.
So the question is: is there a nice way to check whether the process has really started?
I use the Python multiprocessing module.
Example:
from multiprocessing import Process
import testapp.server
import requests
import testapp.config as cfg
import time
p = Process(target=testapp.server.main)
p.start()
time.sleep(5)
testurl=cfg.server_settings["protocol"] + cfg.server_settings["host"] + ":" +str(cfg.server_settings["port"]) + "/test/12"
r = requests.get(testurl)
p.terminate()
assert int(r.text)==12
So it would be nice to avoid the sleep() and really check when the process started ...
You should use is_alive (docs), but that will almost always return True once you have called start() on the process. If you want to make sure the process is already doing something important, there's no getting around time.sleep (at least from this end; see the last paragraph for another idea).
In any case, you could implement is_alive like this:
p = Process(target=testapp.server.main)
p.start()
while not p.is_alive():
    time.sleep(0.1)
do_something_once_alive()
As you can see we still need to "sleep" and check again (just 0.1 seconds), but it will probably be much less than 5 seconds until is_alive returns True.
If both is_alive and time.sleep aren't accurate enough for you to know if the process really does something specific yet, and if you're controlling the other program as well, you should have it raise another kind of flag so you know you're good to go.
I suggest creating your process with a connection object as argument (other synchronization primitives may work) and use the send() method within your child process to notify your parent process that business can go on. Use the recv() method on the parent end of the connection object.
import multiprocessing as mp

def worker(conn):
    conn.send(0)  # argument object must be picklable
    # your worker is ready to do work and just signaled it to the parent

out_conn, in_conn = mp.Pipe()
process = mp.Process(target=worker, args=(out_conn,))
process.start()
in_conn.recv()  # will block until something is received
# worker in child process signaled it is ready; business can go on
I am using the new concurrent.futures module (which also has a Python 2 backport) to do some simple multithreaded I/O. I am having trouble understanding how to cleanly kill tasks started using this module.
Check out the following Python 2/3 script, which reproduces the behavior I'm seeing:
#!/usr/bin/env python
from __future__ import print_function
import concurrent.futures
import time

def control_c_this():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future1 = executor.submit(wait_a_bit, name="Jack")
        future2 = executor.submit(wait_a_bit, name="Jill")
        for future in concurrent.futures.as_completed([future1, future2]):
            future.result()
        print("All done!")

def wait_a_bit(name):
    print("{n} is waiting...".format(n=name))
    time.sleep(100)

if __name__ == "__main__":
    control_c_this()
While this script is running it appears impossible to kill cleanly using the regular Control-C keyboard interrupt. I am running on OS X.
On Python 2.7 I have to resort to kill from the command line to kill the script. Control-C is just ignored.
On Python 3.4, Control-C works if you hit it twice, but then a lot of strange stack traces are dumped.
Most documentation I've found online talks about how to cleanly kill threads with the old threading module. None of it seems to apply here.
And all the methods provided within the concurrent.futures module to stop stuff (like Executor.shutdown() and Future.cancel()) only work when the Futures haven't started yet or are complete, which is pointless in this case. I want to interrupt the Future immediately.
My use case is simple: When the user hits Control-C, the script should exit immediately like any well-behaved script does. That's all I want.
So what's the proper way to get this behavior when using concurrent.futures?
It's kind of painful. Essentially, your worker threads have to be finished before your main thread can exit. You cannot exit unless they do. The typical workaround is to have some global state that each thread can check to determine whether it should do more work or not.
Here's the quote explaining why. In essence, if threads exited when the interpreter does, bad things could happen.
Here's a working example. Note that Ctrl-C takes at most 1 second to propagate because of the sleep duration of the child threads.
#!/usr/bin/env python
from __future__ import print_function
import concurrent.futures
import time
import sys

quit = False

def wait_a_bit(name):
    while not quit:
        print("{n} is doing work...".format(n=name))
        time.sleep(1)

def setup():
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    future1 = executor.submit(wait_a_bit, "Jack")
    future2 = executor.submit(wait_a_bit, "Jill")

    # main thread must be doing "work" to be able to catch a Ctrl+C
    # http://www.luke.maurits.id.au/blog/post/threads-and-signals-in-python.html
    while not (future1.done() and future2.done()):
        time.sleep(1)

if __name__ == "__main__":
    try:
        setup()
    except KeyboardInterrupt:
        quit = True
I encountered this, but the issue I had was that many futures (tens of thousands) would be waiting to run, and just pressing Ctrl-C left them waiting, not actually exiting. I was using concurrent.futures.wait to run a progress loop and needed to add a try ... except KeyboardInterrupt to handle cancelling unfinished Futures.
POLL_INTERVAL = 5

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(do_work, arg) for arg in large_set_to_do_work_over]
    # next line returns instantly
    done, not_done = concurrent.futures.wait(futures, timeout=0)
    try:
        while not_done:
            # next line 'sleeps' this main thread, letting the thread pool run
            freshly_done, not_done = concurrent.futures.wait(not_done, timeout=POLL_INTERVAL)
            done |= freshly_done
            # more polling stats calculated here and printed every POLL_INTERVAL seconds...
    except KeyboardInterrupt:
        # only futures that are not done will prevent exiting
        for future in not_done:
            # cancel() returns False if it's already done or currently running,
            # and True if it was able to cancel it; we don't need that return value
            _ = future.cancel()
        # wait for running futures that the above for loop couldn't cancel (note timeout)
        _ = concurrent.futures.wait(not_done, timeout=None)
If you're not interested in keeping exact track of what got done and what didn't (i.e. don't want a progress loop), you can replace the first wait call (the one with timeout=0) with not_done = futures and still leave the while not_done: logic.
The for future in not_done: cancel loop can probably behave differently based on that return value (or be written as a comprehension), but waiting for futures that are done or canceled isn't really waiting - it returns instantly. The last wait with timeout=None ensures that pool's running jobs really do finish.
Again, this only works correctly if the do_work that's being called actually, eventually returns within a reasonable amount of time. That was fine for me - in fact, I want to be sure that if do_work gets started, it runs to completion. If do_work is 'endless' then you'll need something like cdosborn's answer that uses a variable visible to all the threads, signaling them to stop themselves.
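For example, a sketch of such a shared-flag approach using threading.Event (this do_work is a stand-in for the real task):

import concurrent.futures
import threading
import time

stop = threading.Event()

def do_work(arg):
    while not stop.is_set():   # each worker polls the shared flag
        time.sleep(1)          # stand-in for one unit of real work

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(do_work, n) for n in range(4)]
    try:
        # poll instead of blocking so the main thread can catch Ctrl-C
        while not all(f.done() for f in futures):
            time.sleep(1)
    except KeyboardInterrupt:
        stop.set()  # workers notice the flag and return on their own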
Late to the party, but I just had the same problem.
I want to kill my program immediately and I don't care what's going on. I don't need a clean shutdown beyond what Linux will do.
I found that replacing geitda's code in the KeyboardInterrupt exception handler with os.kill(os.getpid(), 9) exits immediately after the first ^C.
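In other words, a sketch of the modified handler (setup() refers to the earlier example):

import os

try:
    setup()  # the polling loop from the earlier answer
except KeyboardInterrupt:
    # SIGKILL (signal 9) cannot be caught or ignored; the whole process,
    # including all worker threads, dies immediately
    os.kill(os.getpid(), 9)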
import os
import subprocess

main = str(os.getpid())

def ossystem(c):
    return subprocess.Popen(c, shell=True, stdout=subprocess.PIPE).stdout.read().decode("utf-8").strip()

def killexecutor():
    print("Killing")
    pids = ossystem('ps -a | grep scriptname.py').split('\n')
    for pid in pids:
        pid = pid.split(' ')[0].strip()
        if str(pid) != main:
            os.kill(int(pid), 9)

...

killexecutor()
I have some code which runs routinely, and every now and then (like once a month) the program seems to hang somewhere and I'm not sure where.
I thought I would implement [what has turned out to be not quite] a "quick fix" of checking how long the program has been running for. I decided to use multithreading to call the function, and then while it is running, check the time.
For example:
import datetime
import threading

def myfunc():
    # Code goes here
    pass

t = threading.Thread(target=myfunc)
t.start()
d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        print 'Exiting!'
        raise SystemExit(0)
However, this does not close the other thread (myfunc).
What is the best way to go about killing the other thread?
The docs could be clearer about this. Raising SystemExit tells the interpreter to quit, but "normal" exit processing is still done. Part of normal exit processing is .join()-ing all active non-daemon threads. But your rogue thread never ends, so exit processing waits forever to join it.
As #roippi said, you can do
t.daemon = True
before starting it. Normal exit processing does not wait for daemon threads; your OS should kill them when the main process exits.
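A minimal sketch of the daemon approach, with a made-up never-ending worker:

import threading
import time

def rogue():
    while True:       # a worker that never returns on its own
        time.sleep(1)

t = threading.Thread(target=rogue)
t.daemon = True  # must be set before start()
t.start()

time.sleep(3)
# the main thread ends here; exit processing skips daemon threads,
# so the process terminates and the rogue thread dies with it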
Another alternative:
import os
os._exit(13) # whatever exit code you want goes there
That stops the interpreter "immediately", and skips all normal exit processing.
Pick your poison ;-)
There is no way to kill a thread. You must kill the target from within the target. The best way is with a hook and a queue. It goes something like this.
import threading
import datetime
from Queue import Queue, Empty

# add a kill_hook arg to your function; kill_hook
# is a queue used to pass messages to the worker thread
def myfunc(kill_hook=None):
    # Code goes here

    # put this somewhere which is periodically checked;
    # an ideal place to check the hook is when logging
    try:
        if kill_hook.get_nowait():  # or use kill_hook.get(True, 5) to wait longer
            print 'Exiting!'
            raise SystemExit(0)
    except Empty:
        pass

q = Queue()  # the queue used to pass the kill call
t = threading.Thread(target=myfunc, args=(q,))
t.start()

d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        # if your kill criteria are met, put something in the queue
        q.put(1)
I originally found this answer somewhere online, possibly this. Hope this helps!
Another solution would be to use a separate instance of Python to monitor the other Python thread, killing it from the system level with psutil.
Wow, I like the daemon and stealth os._exit solutions too!
I have a long process that I've scheduled to run in a thread, because otherwise it will freeze the UI in my wxPython application.
I'm using:
threading.Thread(target=myLongProcess).start()
to start the thread and it works, but I don't know how to pause and resume the thread. I looked in the Python docs for the above methods, but wasn't able to find them.
Could anyone suggest how I could do this?
I did some speed tests as well; the time to set the flag and for action to be taken is pleasantly fast: 0.00002 seconds on a slow two-processor Linux box.
Example of thread pause test using set() and clear() events:
import threading
import time

# This function gets called by our thread, so it basically becomes the thread init...
def wait_for_event(e):
    while True:
        print('\tTHREAD: This is the thread speaking, we are Waiting for event to start..')
        event_is_set = e.wait()
        print('\tTHREAD: WHOOOOOO HOOOO WE GOT A SIGNAL : %s' % event_is_set)
        # or for Python >= 3.6
        # print(f'\tTHREAD: WHOOOOOO HOOOO WE GOT A SIGNAL : {event_is_set}')
        e.clear()

# Main code
e = threading.Event()
t = threading.Thread(name='pausable_thread',
                     target=wait_for_event,
                     args=(e,))
t.start()

while True:
    print('MAIN LOOP: still in the main loop..')
    time.sleep(4)
    print('MAIN LOOP: I just set the flag..')
    e.set()
    print('MAIN LOOP: now Im gonna do some processing')
    time.sleep(4)
    print('MAIN LOOP: .. some more processing im doing yeahhhh')
    time.sleep(4)
    print('MAIN LOOP: ok ready, soon we will repeat the loop..')
    time.sleep(2)
There is no method for other threads to forcibly pause a thread (any more than there is for other threads to kill that thread) -- the target thread must cooperate by occasionally checking appropriate "flags" (a threading.Condition might be appropriate for the pause/unpause case).
If you're on a Unix-y platform (anything but Windows, basically), you could use multiprocessing instead of threading: that is much more powerful and lets you send signals to the "other process". SIGSTOP unconditionally pauses a process and SIGCONT continues it (if your process needs to do something right before it pauses, consider also the SIGTSTP signal, which the other process can catch to perform such pre-suspension duties). There may be ways to obtain the same effect on Windows, but I'm not knowledgeable about them.
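A rough sketch of that signal-based approach (Unix only; the worker body is a placeholder):

import multiprocessing
import os
import signal
import time

def my_long_process():
    while True:
        time.sleep(1)  # placeholder for the real work

if __name__ == '__main__':
    p = multiprocessing.Process(target=my_long_process)
    p.start()
    time.sleep(3)
    os.kill(p.pid, signal.SIGSTOP)  # unconditionally pause the child
    time.sleep(3)
    os.kill(p.pid, signal.SIGCONT)  # resume it
    p.terminate()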
You can use signals: http://docs.python.org/library/signal.html#signal.pause
To avoid using signals you could use a token passing system. If you want to pause it from the main UI thread you could probably just use a Queue.Queue object to communicate with it.
Just pop a message telling the thread to sleep for a certain amount of time onto the queue.
Alternatively you could simply continuously push tokens onto the queue from the main UI thread. The worker should just check the queue every N seconds (0.2 or something like that). When there are no tokens to dequeue the worker thread will block. When you want it to start again just start pushing tokens on to the queue from the main thread again.
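A sketch of that token-passing idea (the unit of work is a placeholder):

import queue
import threading
import time

tokens = queue.Queue()

def do_one_unit():
    time.sleep(0.2)  # placeholder for one unit of real work

def worker():
    while True:
        tokens.get()   # blocks (i.e. the thread pauses) when the queue is empty
        do_one_unit()

threading.Thread(target=worker, daemon=True).start()

# main/UI thread: push tokens to let the worker run; stop pushing to pause it
for _ in range(10):
    tokens.put(None)
time.sleep(5)  # worker drains the tokens, then blocks until more arrive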
The multiprocessing module works fine on Windows. See the documentation here (end of first paragraph):
http://docs.python.org/library/multiprocessing.html
On the wxPython IRC channel, we had a couple fellows trying multiprocessing out and they said it worked. Unfortunately, I have yet to see anyone who has written up a good example of multiprocessing and wxPython.
If you (or anyone else on here) come up with something, please add it to the wxPython wiki page on threading here: http://wiki.wxpython.org/LongRunningTasks
You might want to check that page out regardless as it has several interesting examples using threads and queues.
You might take a look at the Windows API for thread suspension.
As far as I'm aware there is no POSIX/pthread equivalent. Furthermore, I cannot ascertain whether thread handles/IDs are made available from Python. There are also potential issues with Python: since its scheduling is done by the native scheduler, it is unlikely to expect threads to suspend, particularly threads suspended while holding the GIL, amongst other possibilities.
I had the same issue. It is more effective to use time.sleep(1800) in the thread loop to pause the thread execution. E.g.:
MON, TUE, WED, THU, FRI, SAT, SUN = range(7)  # Enumerate days of the week

Thread 1:

def run(self):
    while not self.exit:
        try:
            localtime = time.localtime(time.time())
            # Evaluate stock
            if localtime.tm_hour > 16 or localtime.tm_wday > FRI:
                # do something
                pass
            else:
                print('Waiting to evaluate stocks...')
                time.sleep(1800)
        except:
            print(traceback.format_exc())

Thread 2:

def run(self):
    while not self.exit:
        try:
            localtime = time.localtime(time.time())
            if localtime.tm_hour >= 9 and localtime.tm_hour <= 16:
                # do something
                pass
            else:
                print('Waiting to update stocks indicators...')
                time.sleep(1800)
        except:
            print(traceback.format_exc())