I have some testcases where I start a webserver process and then
run some URL tests to check if every function runs fine.
The server process start-up time depends on the system where it is executed. It's a matter of seconds, and for now I work with time.sleep(5).
But honestly I'm not a huge fan of sleep(), since it might work on my systems, but what if the test runs on a system where the server needs 6 seconds to start? (So it's never really safe to go that way.)
Tests would then fail for no real reason.
So the question is: is there a nice way to check whether the process has really started?
I use the Python multiprocessing module.
Example:
from multiprocessing import Process
import testapp.server
import requests
import testapp.config as cfg
import time
p = Process(target=testapp.server.main)
p.start()
time.sleep(5)
testurl=cfg.server_settings["protocol"] + cfg.server_settings["host"] + ":" +str(cfg.server_settings["port"]) + "/test/12"
r = requests.get(testurl)
p.terminate()
assert int(r.text)==12
So it would be nice to avoid the sleep() and really check when the process started ...
You should use is_alive (see the multiprocessing docs), but that will almost always return True right after you call start() on the process. If you want to make sure the process is already doing something important, there's no getting around time.sleep (at least from this end; look at the last paragraph for another idea).
In any case, you could implement is_alive like this:
p = Process(target=testapp.server.main)
p.start()
while not p.is_alive():
    time.sleep(0.1)
do_something_once_alive()
As you can see we still need to "sleep" and check again (just 0.1 seconds), but it will probably be much less than 5 seconds until is_alive returns True.
If both is_alive and time.sleep aren't accurate enough for you to know if the process really does something specific yet, and if you're controlling the other program as well, you should have it raise another kind of flag so you know you're good to go.
I suggest creating your process with a connection object as an argument (other synchronization primitives may work too) and using the send() method within your child process to notify your parent process that business can go on. Use the recv() method on the parent end of the connection.
import multiprocessing as mp

def worker(conn):
    conn.send(0)  # the argument object must be picklable
    # your worker is ready to do work and has just signaled it to the parent

if __name__ == "__main__":  # required where processes are spawned (e.g. Windows, macOS)
    out_conn, in_conn = mp.Pipe()
    process = mp.Process(target=worker,
                         args=(out_conn,))
    process.start()
    in_conn.recv()  # will block until something is received
    # the worker in the child process signaled it is ready; business can go on
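For the web-server case specifically, another option (not from the answers above) is to poll the server's TCP port until it accepts connections, with an upper bound on the wait. The helper below is a sketch; `wait_for_port` is a hypothetical name, and it assumes that "accepting TCP connections" is a good enough proxy for "ready to serve requests", which holds for most servers but not for ones that listen before finishing their own initialization.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.1):
    """Poll until a TCP connection to (host, port) succeeds.

    Hypothetical helper (not part of multiprocessing or requests).
    Returns True as soon as something accepts connections on the port,
    or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: not up yet, retry after a short pause.
            time.sleep(interval)
    return False
```

In the original test this would replace time.sleep(5) with something like `assert wait_for_port(cfg.server_settings["host"], cfg.server_settings["port"])`, assuming `host` holds a bare hostname. The test then waits only as long as the server actually needs, yet still fails cleanly if the server never comes up.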
I am very new to threading and the concepts are still somewhat fuzzy.
But as of now I have a requirement in which I spin up an arbitrary number of threads from my Python program, and then my Python program should indicate to the user running the process which threads have finished executing. Below is my first try:
import threading
from threading import Thread
from time import sleep

def exec_thread(n):
    name = threading.current_thread().name
    filename = name + ".txt"
    with open(filename, "w+") as file:
        file.write(f"My name is {name} and my main thread is {threading.main_thread()}\n")
        sleep(n)
        file.write(f"{name} exiting\n")

t1 = Thread(name="First", target=exec_thread, args=(10,))
t2 = Thread(name="Second", target=exec_thread, args=(2,))
t1.start()
t2.start()
while len(threading.enumerate()) > 1:
    print("Waiting ... !")
    sleep(5)
print("The threads are done")
So this basically tells me when all the threads are done executing.
But I want to know as soon as any one of my threads has completed execution, so that I can tell the user to check the output file for that thread.
I cannot use thread.join() since that would block my main program, and the user would not know anything until everything is complete, which might take hours. The user wants to know as soon as some results are available.
Now I know that we can check whether individual threads are active by calling thread.is_alive(), but I was hoping for a more elegant solution in which the child threads can somehow communicate with the main thread and say "I am done!"
Many thanks for any answers in advance.
The simplest and most straightforward way to indicate a single thread is "done" is to put the required notification in the thread's implementation method, as the very last step. For example, you could print out a notification to the user.
Or, you could use events, see: https://docs.python.org/3/library/threading.html#event-objects
This is one of the simplest mechanisms for communication between
threads: one thread signals an event and other threads wait for it.
An event object manages an internal flag that can be set to true with
the set() method and reset to false with the clear() method. The
wait() method blocks until the flag is true.
So, the "final act" in your thread implementation would be to set an event object, and your main thread can wait until it's set.
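With several threads, one way to apply this is a separate Event per thread, polled by the main thread with a short timeout so it never blocks for long. This is a sketch; `run_with_events` and the (name, seconds) input format are hypothetical, and `time.sleep(n)` stands in for the real work.

```python
import threading
import time


def exec_thread(n, done_event):
    time.sleep(n)        # stand-in for the real work
    done_event.set()     # final act: signal "I am done"


def run_with_events(durations):
    """Start one thread per (name, seconds) pair; report each as it finishes."""
    events = {}
    for name, seconds in durations:
        ev = threading.Event()
        events[name] = ev
        threading.Thread(name=name, target=exec_thread, args=(seconds, ev)).start()

    finished = []
    while events:
        for name, ev in list(events.items()):
            # wait() returns True as soon as the flag is set, or False after
            # the timeout, so the main thread keeps looping and stays responsive.
            if ev.wait(timeout=0.05):
                print(f"{name} is done")
                finished.append(name)
                del events[name]
    return finished
```

For example, `run_with_events([("First", 10), ("Second", 2)])` would report "Second" after about 2 seconds, without waiting for "First".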
Or, for an even fancier and more flexible mechanism, use queues: https://docs.python.org/3/library/queue.html
Each thread writes an "I'm done" object to the queue when done, and the main thread can read those notifications from the queue in sequence as each thread completes.
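A minimal sketch of the queue variant, assuming each thread reports its own name; `report_as_completed` is a hypothetical helper and `time.sleep(n)` stands in for the real work. The `finally` clause makes sure a thread reports even if its work raises, so the main thread is never left waiting for a notification that will not come.

```python
import queue
import threading
import time


def exec_thread(n, done_queue):
    try:
        time.sleep(n)  # stand-in for the real work
    finally:
        # Always report completion, even if the work raised an exception.
        done_queue.put(threading.current_thread().name)


def report_as_completed(durations):
    done_queue = queue.Queue()
    threads = [
        threading.Thread(name=name, target=exec_thread, args=(seconds, done_queue))
        for name, seconds in durations
    ]
    for t in threads:
        t.start()

    order = []
    for _ in threads:
        name = done_queue.get()  # blocks until the next thread reports
        print(f"{name} finished, its output file is ready")
        order.append(name)
    return order
```

If the main thread must keep doing other things in between (for example, a UI loop), `done_queue.get(timeout=...)` combined with catching `queue.Empty` avoids blocking indefinitely.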
I have some code which runs routinely, and every now and then (like once a month) the program seems to hang somewhere and I'm not sure where.
I thought I would implement [what has turned out to be not quite] a "quick fix" of checking how long the program has been running for. I decided to use multithreading to call the function, and then while it is running, check the time.
For example:
import datetime
import threading

def myfunc():
    # Code goes here
    pass

t = threading.Thread(target=myfunc)
t.start()
d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        print('Exiting!')
        raise SystemExit(0)
However, this does not close the other thread (myfunc).
What is the best way to go about killing the other thread?
The docs could be clearer about this. Raising SystemExit tells the interpreter to quit, but "normal" exit processing is still done. Part of normal exit processing is .join()-ing all active non-daemon threads. But your rogue thread never ends, so exit processing waits forever to join it.
As @roippi said, you can do
t.daemon = True
before starting it. Normal exit processing does not wait for daemon threads. Your OS should then kill them when the main process exits.
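The daemon approach can be combined with `Thread.join(timeout)`, which also replaces the busy-wait loop on `threading.active_count()`. The sketch below uses a hypothetical helper, `run_with_deadline`; note that nothing here kills the thread. A late thread is simply abandoned, and because it is a daemon, interpreter shutdown will not wait for it.

```python
import threading
import time


def run_with_deadline(func, seconds):
    """Run func in a daemon thread; return True if it finished within `seconds`."""
    t = threading.Thread(target=func, daemon=True)
    t.start()
    t.join(timeout=seconds)  # blocks for at most `seconds`
    return not t.is_alive()  # still alive means it overran the deadline
```

Usage in the original scenario would look like `if not run_with_deadline(myfunc, 60): print('Exiting!'); raise SystemExit(1)`.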
Another alternative:
import os
os._exit(13) # whatever exit code you want goes there
That stops the interpreter "immediately", and skips all normal exit processing.
Pick your poison ;-)
There is no supported way to kill a thread from outside. The target must end itself from within. The best way is with a hook and a queue. It goes something like this:
import datetime
import queue
import threading
import time

# add a kill_hook arg to your function; kill_hook
# is a queue used to pass messages from the main thread
def myfunc(kill_hook):
    while True:
        # Code goes here

        # put this somewhere which is periodically checked.
        # an ideal place to check the hook is when logging
        try:
            if kill_hook.get_nowait():  # or use kill_hook.get(True, 5) to wait longer
                print('Exiting!')
                raise SystemExit(0)  # inside a thread, this ends only this thread
        except queue.Empty:
            pass

q = queue.Queue()  # the queue used to pass the kill call
t = threading.Thread(target=myfunc, args=(q,))
t.start()
d1 = datetime.datetime.utcnow()
while threading.active_count() > 1:
    if (datetime.datetime.utcnow() - d1).total_seconds() > 60:
        # if your kill criteria are met, put something in the queue
        q.put(1)
        break
    time.sleep(1)  # avoid spinning the CPU while waiting
I originally found this answer somewhere online. Hope this helps!
Another solution would be to use a separate instance of Python, monitor the other Python process, and kill it at the system level with psutil.
Wow, I like the daemon and stealth os._exit solutions too!
I have a process that needs to perform a bunch of actions "later" (usually after 10-60 seconds). The problem is that there can be a lot of those "later" actions (thousands), so using a thread per task is not viable. I know of tools like gevent and eventlet, but one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it).
What I'm wondering is: what are my options? Suggestions are welcome along the lines of libraries (if you've used any of the ones mentioned, please share your experiences), techniques (Python's "coroutine" support, one thread that sleeps for a while and checks a queue), how to make use of zeromq's poll or event loop to do the job, or something else.
Consider using a priority queue with one or more worker threads to service the tasks. The main thread can add work to the queue, using as the priority a timestamp of the soonest time it should be serviced. Worker threads pop work off the queue, sleep until that time is reached, do the work, and then pop another item off the queue.
How about a more fleshed out answer. mklauber makes a good point. If there's a chance all of your workers might be sleeping when you have new, more urgent work, then a queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, which is available from the heapq module. Instead, we'll make use of a different synchronization primitive; a condition variable, which in python is spelled threading.Condition.
The approach is fairly simple: peek at the heap, and if the work is due, pop it off and do that work. If there is work but it's scheduled in the future, just wait on the condition until then; if there's no work at all, wait until notified.
The producer does its fair share of the work: every time it adds new work, it notifies the condition, so if there are sleeping workers, they'll wake up and recheck the queue for newer work.
import heapq, time, threading

START_TIME = time.time()
SERIALIZE_STDOUT = threading.Lock()

def consumer(message):
    """the actual work function. nevermind the locks here, this just keeps
    the output nicely formatted. a real work function probably won't need
    it, or might need quite different synchronization"""
    with SERIALIZE_STDOUT:
        print(time.time() - START_TIME, message)

def produce(work_queue, condition, timeout, message):
    """called to put a single item onto the work queue."""
    prio = time.time() + float(timeout)
    with condition:
        heapq.heappush(work_queue, (prio, message))
        condition.notify()

def worker(work_queue, condition):
    condition.acquire()
    stopped = False
    while not stopped:
        now = time.time()
        if work_queue:
            prio, data = work_queue[0]
            if data == 'stop':
                stopped = True
                # wake any other worker still waiting, so it sees 'stop' too
                condition.notify_all()
                continue
            if prio < now:
                heapq.heappop(work_queue)
                condition.release()
                # do some work!
                consumer(data)
                condition.acquire()
            else:
                condition.wait(prio - now)
        else:
            # the queue is empty, wait until notified
            condition.wait()
    condition.release()

if __name__ == '__main__':
    # first set up the work queue and worker pool
    work_queue = []
    cond = threading.Condition()
    pool = [threading.Thread(target=worker, args=(work_queue, cond))
            for _ignored in range(4)]
    # map() is lazy in Python 3, so start the threads with an explicit loop
    for t in pool:
        t.start()
    # now add some work
    produce(work_queue, cond, 10, 'Grumpy')
    produce(work_queue, cond, 10, 'Sneezy')
    produce(work_queue, cond, 5, 'Happy')
    produce(work_queue, cond, 10, 'Dopey')
    produce(work_queue, cond, 15, 'Bashful')
    time.sleep(5)
    produce(work_queue, cond, 5, 'Sleepy')
    produce(work_queue, cond, 10, 'Doc')
    # and just to make the example a bit more friendly, tell the threads to stop after all
    # the work is done
    produce(work_queue, cond, float('inf'), 'stop')
    for t in pool:
        t.join()
This answer has actually two suggestions - my first one and another I have discovered after the first one.
sched
I suspect you are looking for the sched module.
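For reference, the basic sched API in a single thread looks like this: `enter(delay, priority, action, argument)` queues `action(*argument)` to run after `delay` seconds, and `run()` executes events in time order, blocking until the queue is empty. This is a minimal sketch; `log` and `action` are illustrative names only.

```python
import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)
log = []


def action(label):
    log.append(label)  # stand-in for the real deferred work


# Events are kept in one internal heap, so thousands of them cost no extra threads.
scheduler.enter(0.2, 1, action, ("later",))
scheduler.enter(0.1, 1, action, ("sooner",))

scheduler.run()  # blocks until every queued event has run
```

After `run()` returns, `log` is `["sooner", "later"]`. Because `run()` blocks, feeding it new events from other threads needs some care, which is why a dedicated scheduling thread is a natural fit.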
EDIT: after rereading it, my bare suggestion seemed of little help. So I decided to test the sched module to see if it can work as I suggested. Here is my test; I would use it with a single thread, more or less this way:
import sched
import threading
import time

class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

    def run(self):
        self.scheduler.run()

    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))

    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registered event", event)
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
First, I'd create a thread class which would have its own scheduler and a queue. At least one event would be registered in the scheduler: one for invoking a method for scheduling events from the queue.
class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The method for scheduling events from the queue would lock the queue, schedule each event, empty the queue and schedule itself again, so that it looks for new events a bit later. Note that the polling period for new events is short (one second); you may change it:
    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registered event", event)
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())
The class should also have a method for scheduling user events. Naturally, this method should lock the queue while updating it:
    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))
Finally, the class should invoke the scheduler main method:
    def run(self):
        self.scheduler.run()
Here is a usage example:
def print_time():
    print("scheduled:", time.time())

if __name__ == "__main__":
    st = SchedulingThread()
    st.start()
    st.schedule(print_time, 10)
    while True:
        print("main thread:", time.time())
        time.sleep(5)
    st.join()
Its output on my machine is:
$ python schedthread.py
main thread: 1311089765.77
Registered event (10, 1, <function print_time at 0x2f4bb0>, ())
main thread: 1311089770.77
main thread: 1311089775.77
scheduled: 1311089776.77
main thread: 1311089780.77
main thread: 1311089785.77
This code is just a quick-and-dirty example; it may need some work. However, I have to confess that I am a bit fascinated by the sched module, which is why I suggested it. You may want to look for other suggestions as well :)
APScheduler
Searching Google for solutions like the one I posted, I found this amazing APScheduler module. It is so practical and useful that I bet it is your solution. My previous example would be much simpler with this module:
from apscheduler.scheduler import Scheduler  # APScheduler 2.x API
import time

sch = Scheduler()
sch.start()

@sch.interval_schedule(seconds=10)
def print_time():
    print("scheduled:", time.time())
    sch.unschedule_func(print_time)

while True:
    print("main thread:", time.time())
    time.sleep(5)
(Unfortunately I did not find how to schedule an event to execute only once, so the function event should unschedule itself. I bet it can be solved with some decorator.)
If you have a bunch of tasks that need to get performed later, and you want them to persist even if you shut down the calling program or your workers, you should really look into Celery, which makes it super easy to create new tasks, have them executed on any machine you'd like, and wait for the results.
From the Celery page, "This is a simple task adding two numbers:"
from celery.task import task

@task
def add(x, y):
    return x + y
You can execute the task in the background, or wait for it to finish:
>>> result = add.delay(8, 8)
>>> result.wait() # wait for and return the result
16
You wrote:
one of the problem is that the process uses zeromq for communication so I would need some integration (eventlet already has it)
It seems your choice will be heavily influenced by these details, which are a bit unclear: how zeromq is being used for communication, how many resources the integration will require, and what your requirements and available resources are.
There's a project called django-ztask which uses zeromq and provides a task decorator similar to celery's. However, it is (obviously) Django-specific and so may not be suitable in your case. I haven't used it; I prefer celery myself.
Been using celery for a couple of projects (these are hosted at ep.io PaaS hosting, which provides an easy way to use it).
Celery looks like quite a flexible solution, allowing delayed tasks, callbacks, task expiration and retrying, limiting task execution rate, etc. It may be used with Redis, Beanstalk, CouchDB, MongoDB or an SQL database.
Example code (definition of task and asynchronous execution after a delay):
from celery.decorators import task

@task
def my_task(arg1, arg2):
    pass  # Do something

result = my_task.apply_async(
    args=[sth1, sth2],  # Arguments that will be passed to `my_task()` function.
    countdown=3,  # Number of seconds to wait before the task is executed.
)
See also a section in celery docs.
Have you looked at the multiprocessing module? It comes standard with Python. It is similar to the threading module, but runs each task in a process. You can use a Pool() object to set up a worker pool, then use the .map() method to call a function with the various queued task arguments.
Pyzmq has an ioloop implementation with a similar api to that of the tornado ioloop. It implements a DelayedCallback which may help you.
Presuming your process has a run loop which can receive signals, and each action is short enough for the actions to run sequentially, use signals and POSIX alarm():
signal.alarm(time)
If time is non-zero, this function requests that a
SIGALRM signal be sent to the process in time seconds.
This depends on what you mean by "those "later" actions can be a lot" and on whether your process already uses signals. Given the phrasing of the question, it's unclear why an external Python package would be needed.
Another option is to use the Python GLib bindings, in particular their timeout functions.
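A minimal sketch of how a single alarm can serve many deferred actions, assuming a Unix platform (SIGALRM and alarm() do not exist on Windows): the pending actions sit in a heapq min-heap, only one alarm is armed at a time for the earliest one, and the handler runs whatever is due and re-arms. The names `pending`, `done`, `schedule` and `arm_next` are hypothetical. alarm() only accepts whole seconds, the handler runs in the main thread between bytecodes, and calling schedule() while the handler may fire can interleave heap updates, so treat this as an outline rather than production code.

```python
import heapq
import math
import signal
import time

pending = []  # min-heap of (due_time, label) for the deferred actions
done = []     # labels of actions that have run, in order


def arm_next():
    """Arm one SIGALRM for the earliest pending action (replaces any previous alarm)."""
    if pending:
        delay = max(1, math.ceil(pending[0][0] - time.time()))
        signal.alarm(delay)


def on_alarm(signum, frame):
    # Run everything whose due time has passed, then re-arm for the next one.
    now = time.time()
    while pending and pending[0][0] <= now:
        _, label = heapq.heappop(pending)
        done.append(label)  # stand-in for the real deferred work
    arm_next()


def schedule(delay, label):
    heapq.heappush(pending, (time.time() + delay, label))
    arm_next()


signal.signal(signal.SIGALRM, on_alarm)  # must be called from the main thread
```

For example, after `schedule(2, "b")` and `schedule(1, "a")`, the process's normal run loop keeps going and `done` becomes `["a", "b"]` roughly two seconds later.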
It's a good choice as long as you don't want to make use of multiple cores and as long as the dependency on GLib is no problem. It handles all events in the same thread which prevents synchronization issues. Additionally, its event framework can also be used to watch and handle IO-based (i.e. sockets) events.
UPDATE:
Here's a live session using GLib:
>>> import time
>>> import glib
>>>
>>> def workon(thing):
... print("%s: working on %s" % (time.time(), thing))
... return True # use True for repetitive and False for one-time tasks
...
>>> ml = glib.MainLoop()
>>>
>>> glib.timeout_add(1000, workon, "this")
2
>>> glib.timeout_add(2000, workon, "that")
3
>>>
>>> ml.run()
1311343177.61: working on this
1311343178.61: working on that
1311343178.61: working on this
1311343179.61: working on this
1311343180.61: working on this
1311343180.61: working on that
1311343181.61: working on this
1311343182.61: working on this
1311343182.61: working on that
1311343183.61: working on this
Well, in my opinion you could use something called "cooperative multitasking". It's a Twisted-based technique and it's really cool. Just look at the PyCon 2010 presentation: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-cooperative-multitasking-with-twisted-getting-things-done-concurrently-11-3352182
You will need a transport queue to do this too, though.
Simple: you can inherit your class from Thread and give it a timeout parameter, so that each instance of your class waits for its own timeout before doing its work.
I have a long process that I've scheduled to run in a thread, because otherwise it will freeze the UI in my wxpython application.
I'm using:
threading.Thread(target=myLongProcess).start()
to start the thread and it works, but I don't know how to pause and resume the thread. I looked in the Python docs for the above methods, but wasn't able to find them.
Could anyone suggest how I could do this?
I did some speed tests as well: the time between setting the flag and the action being taken is pleasantly fast, 0.00002 s on a slow two-processor Linux box.
Example of thread pause test using set() and clear() events:
import threading
import time

# This function gets called by our thread, so it basically becomes the thread's body
def wait_for_event(e):
    while True:
        print('\tTHREAD: This is the thread speaking, we are Waiting for event to start..')
        event_is_set = e.wait()
        print('\tTHREAD: WHOOOOOO HOOOO WE GOT A SIGNAL : %s' % event_is_set)
        # or for Python >= 3.6
        # print(f'\tTHREAD: WHOOOOOO HOOOO WE GOT A SIGNAL : {event_is_set}')
        e.clear()

# Main code
e = threading.Event()
t = threading.Thread(name='pausable_thread',
                     target=wait_for_event,
                     args=(e,))
t.start()

while True:
    print('MAIN LOOP: still in the main loop..')
    time.sleep(4)
    print('MAIN LOOP: I just set the flag..')
    e.set()
    print('MAIN LOOP: now Im gonna do some processing')
    time.sleep(4)
    print('MAIN LOOP: .. some more processing im doing yeahhhh')
    time.sleep(4)
    print('MAIN LOOP: ok ready, soon we will repeat the loop..')
    time.sleep(2)
There is no method for other threads to forcibly pause a thread (any more than there is for other threads to kill that thread) -- the target thread must cooperate by occasionally checking appropriate "flags" (a threading.Condition might be appropriate for the pause/unpause case).
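Here is one way such a cooperating thread could look, sketched with a threading.Event as the "keep running" flag rather than a Condition. The class and method names (`PausableWorker`, `pause`, `resume`, `stop`) are hypothetical, and the loop body stands in for one short chunk of `myLongProcess`. The pause takes effect at the next check, so the long process must be split into reasonably small steps.

```python
import threading
import time


class PausableWorker(threading.Thread):
    """Worker that checks a 'running' Event between small units of work."""

    def __init__(self):
        super().__init__(daemon=True)
        self._running = threading.Event()
        self._running.set()  # start in the running state
        self._stop_requested = threading.Event()
        self.steps = 0

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def stop(self):
        self._stop_requested.set()
        self._running.set()  # wake the thread if paused so it can exit

    def run(self):
        while not self._stop_requested.is_set():
            self._running.wait()  # blocks here while paused
            if self._stop_requested.is_set():
                break
            self.steps += 1  # stand-in for one chunk of the long process
            time.sleep(0.01)
```

In a wxPython application, the handlers for hypothetical "Pause" and "Resume" buttons would simply call `worker.pause()` and `worker.resume()`; neither call blocks the UI thread.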
If you're on a unix-y platform (anything but Windows, basically), you could use multiprocessing instead of threading. That is much more powerful, and lets you send signals to the "other process": SIGSTOP should unconditionally pause a process and SIGCONT continues it. If your process needs to do something right before it pauses, consider also the SIGTSTP signal, which the other process can catch to perform such pre-suspension duties. (There may be ways to obtain the same effect on Windows, but I'm not knowledgeable about them, if any.)
You can use signals: http://docs.python.org/library/signal.html#signal.pause
To avoid using signals you could use a token-passing system. If you want to pause the thread from the main UI thread, you could probably just use a Queue.Queue object (queue.Queue in Python 3) to communicate with it.
Just put a message onto the queue telling the thread to sleep for a certain amount of time.
Alternatively you could simply continuously push tokens onto the queue from the main UI thread. The worker should just check the queue every N seconds (0.2 or something like that). When there are no tokens to dequeue the worker thread will block. When you want it to start again just start pushing tokens on to the queue from the main thread again.
The multiprocessing module works fine on Windows. See the documentation here (end of first paragraph):
http://docs.python.org/library/multiprocessing.html
On the wxPython IRC channel, a couple of fellows tried multiprocessing out and said it worked. Unfortunately, I have yet to see anyone write up a good example of multiprocessing with wxPython.
If you (or anyone else on here) come up with something, please add it to the wxPython wiki page on threading here: http://wiki.wxpython.org/LongRunningTasks
You might want to check that page out regardless as it has several interesting examples using threads and queues.
You might take a look at the Windows API for thread suspension.
As far as I'm aware there is no POSIX/pthread equivalent. Furthermore, I cannot ascertain whether thread handles/IDs are made available from Python. There are also potential issues with Python itself: since its threads are scheduled by the native scheduler, it's unlikely to expect threads to be suspended, particularly if a thread is suspended while holding the GIL, among other possibilities.
I had the same issue. It is more effective to use time.sleep(1800) in the thread loop to pause the thread execution.
e.g.:
import time
import traceback

MON, TUE, WED, THU, FRI, SAT, SUN = range(7)  # Enumerate days of the week
Thread 1:
def run(self):
    while not self.exit:
        try:
            localtime = time.localtime(time.time())
            # Evaluate stock
            if localtime.tm_hour > 16 or localtime.tm_wday > FRI:
                # do something
                pass
            else:
                print('Waiting to evaluate stocks...')
                time.sleep(1800)
        except Exception:
            print(traceback.format_exc())
Thread 2:
def run(self):
    while not self.exit:
        try:
            localtime = time.localtime(time.time())
            if localtime.tm_hour >= 9 and localtime.tm_hour <= 16:
                # do something
                pass
            else:
                print('Waiting to update stocks indicators...')
                time.sleep(1800)
        except Exception:
            print(traceback.format_exc())