One-time wake a waiting thread in Python

I have a Python object that is updated regularly, from an asynchronous I/O callback. A few threads need to wait for this object to be updated.
This "update" event is an instantaneous event, and as well the action of telling the waiting threads to wake up should be an atomic one.
I have looked for a few solutions:
Condition objects need you to acquire a lock first, which is a no-go: I need several threads to reach the wait unhindered, without fighting to acquire a lock.
Event objects introduce a race condition: a thread that reaches the wait before the event is cleared won't wait at all, and conversely the event could be cleared again before any waiting thread has woken up.
The best solution would be an equivalent of the POSIX pause/kill combo, but for threads (at least the best that I can think of).
So, questions:
Does the pause/kill combo have an equivalent for Python 2.7 threads, and which one?
If not, what is the best compromise (in terms of reliability) for my use case, using the Python 2.7 standard library?
Here is something similar to what I would like to achieve:
# Would be derived from threading.Event
# and the perfect solution for me, if it existed
ev = InstantEvent()

def update(*args):
    # do stuff with args
    ev.notifyAll()

if __name__ == "__main__":
    # do startup stuff
    ev.wait()
    # do more stuff

Each thread can wait on the Event object with wait(), as you see. Even more: it can check the event periodically (wait with a timeout) and do its own work inside the loop; the callback sets the event when it is done. You can combine several events by checking one, then another, and so on. I cannot see the problem. If you want an exclusive reaction to the event, i.e. to allow only one thread/listener to process the callback's completion, use a semaphore instead of an event. See more about Python threading: https://docs.python.org/3/library/threading.html
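For illustration, the wait-with-timeout loop described above might look like this (a minimal sketch; the loop structure and names are mine, not the asker's):

import threading

ev = threading.Event()

def update(*args):
    # asynchronous I/O callback: publish the new state, then wake the waiters
    ev.set()

def worker():
    while True:
        if ev.wait(1.0):          # returns True if the event was set (Python 2.7+)
            # react to the update; note that clearing the event here
            # re-introduces the race the question describes
            ev.clear()
        else:
            pass                  # timeout expired: do other periodic work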

Related

How can I "wake up" an event loop to notify it that a Future was completed from another thread?

When using python async/asyncio, I often create and complete asyncio.Future objects from threads that are not the thread running the event loop.
Unless I complete those futures in the thread that is running the event loop or via a function that notifies that loop of the completion, the event loop often does not "notice" that the futures are completed.
Is there a way to "notify" an event loop that it should check a Future for completion if that future was readied (via set_result) externally?
Why I am asking this
The threads which ready futures need to a) have very low latency, and b) check whether the future has been readied, synchronously, later on (e.g. via future.done()).
The event loop awaiting the Futures does not need to have low latency in being notified that they're ready--it can be notified a few milliseconds late.
Ideally there would be a performant way to notify the event loop that a Future had been readied after readying it synchronously in a thread.
Even if that's not possible, the event loop could poll readiness on an interval, so long as the futures were synchronously readied as quickly as possible in threads.
What I have tried
The "correct" way to solve this problem is with call_soon_threadsafe, e.g.:
def do_in_thread(future):
    future.get_loop().call_soon_threadsafe(future.set_result, "the result")
That notifies the event loop of Future readiness reliably, but does not work for two reasons:
It has significant (8-10x) overhead versus calling future.set_result in my benchmarks.
It doesn't ready the Future until the event loop runs, which means I can't reliably check if the Future is done, which I need to do. For example, this won't work:
def do_in_thread(future):
    future.get_loop().call_soon_threadsafe(future.set_result, "the result")
    assert future.done()  # Fails
One thing that does seem to work is to notify the event loop by intentionally failing a second call to set_result via call_soon_threadsafe, and swallowing the InvalidStateError, like this:
from asyncio import Future, InvalidStateError

def ensure_result(f, res):
    try:
        f.set_result(res)
    except InvalidStateError:
        pass

def in_thread(fut: Future):
    fut.set_result("the result")
    fut.get_loop().call_soon_threadsafe(ensure_result, fut, "the result")
That still has overhead, but I could remove the overhead of calling call_soon_threadsafe by tracking Futures in a thread-shared data structure and polling calls to ensure_result occasionally. However, I'm still not sure:
Does that reliably work? Is set_result failing with InvalidStateError guaranteed to notify the event loop that an await on the given Future can return, or is that an undocumented implementation detail I'm relying on?
Is there a better way to achieve that periodic-wakeup that doesn't involve me keeping track of/polling such Futures myself?
In a perfect world, there would be a loop.poll_all_pending_futures() or loop.update_future_state(fut) method which would achieve this efficiently, but I don't know of one.
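For what the question's "track and poll" idea could look like in code, here is a sketch (the names pending, ready_in_thread, and drain, and the 50 ms interval, are illustrative only; it reuses ensure_result from above and leans on the same possibly-undocumented wake-up behavior the question asks about):

import asyncio
import threading

pending = []                          # futures readied synchronously in threads
pending_lock = threading.Lock()

def ready_in_thread(fut, result):
    fut.set_result(result)            # synchronous: fut.done() is True immediately
    with pending_lock:
        pending.append(fut)

async def drain(interval=0.05):
    # runs on the event loop; each pass gives the loop a chance to run the
    # done-callbacks queued by set_result in the threads
    while True:
        with pending_lock:
            batch = pending[:]
            del pending[:]
        for fut in batch:
            ensure_result(fut, None)  # swallows the InvalidStateError
        await asyncio.sleep(interval)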

What is the difference between pygame.event and pygame.fastevent?

According to the pygame.event documentation, this function gets all the events from the queue.
get events from the queue
get(eventtype=None) -> Eventlist
get(eventtype=None, pump=True) -> Eventlist
This will get all the messages and remove them from the queue.
According to the pygame.fastevent documentation, this function gets all the events from the queue.
get all events from the queue
get() -> list of Events
This will get all the messages and remove them from the queue.
So what's the difference?
I think the difference is about multithreading: event "should be called from the main thread" and fastevent is used in "multithread environments" - but I don't see any difference (in this case for example)
The idea behind fastevent was to remove some limitation in SDL's event handling code to be able to process more than 12,700 events per second.
You can still find the original documentation here and here. A few quotes:
Digging through the SDL_WaitEvent() code is when I got my nose rubbed in the fact that SDL is designed to work on operating systems that do not support threads. SDL_WaitEvent waits for events to arrive in the queue the simplest way possible: it uses SDL_Delay() to wait for 10 milliseconds and then checks to see if there are any events waiting. Checking the queue 100 times per second and possibly polling for input events 100 times per second is great for single threaded games, and it gives you the same results whether your OS supports threads or not.
...
To figure out how fast the test should run I looked at the SDL event code and saw that the queue can hold at most 127 items and since SDL_WaitEvent() looks at the queue 100 times per second we know that SDL_WaitEvent() can not remove more than (127 * 100) = 12,700 events/second and you can't push more than 12,700 events into the queue in a second.
...
Because I would like to use the same library for both the client and the server I wanted to see if I could make this code run a little faster.
...
The next step was to write my own version of SDL_WaitEvent() and use a semaphore and a condition variable to control access to the event queue. A semaphore is a simple mutual exclusion operator also known as a mutex. A mutex is used by threads to keep more than one thread from touching a variable or running a section of code at the same time. Having more than one thread changing the same data at the same time leads to horrible bugs that are hard to find. In this case I needed a mutex to keep the contents of the queue consistent. A condition variable is just a way for one thread to let other threads know that something has happened. One or more threads can wait on a condition, and when another thread signals that the condition has occurred the waiting threads wake up and go about their business.
...
When I tested that code I found that it got the SDL events and it was able to push over 30,000 events per second from my event pushing thread to the main SDL thread. I believe that the speed I'm seeing is only limited by the speed of my test machine, and not by anything in my code or in SDL. ...
Note that this does not remove the limitation that the event handling functions must be called from the main SDL thread. But it allows you to post events safely from other threads.
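In practice the usage difference is small; here is a sketch of posting from a worker thread with fastevent (assuming pygame 1.x or early pygame 2, where fastevent still exists; it is deprecated in recent pygame 2 releases, whose regular pygame.event.post is itself thread-safe):

import threading
import pygame

pygame.init()
pygame.display.set_mode((320, 240))   # an initialized display is needed for events
pygame.fastevent.init()

def producer():
    # posting from a worker thread is the case fastevent was written for
    pygame.fastevent.post(pygame.event.Event(pygame.USEREVENT, count=1))

threading.Thread(target=producer).start()

# the main SDL thread still does the reading
running = True
while running:
    for event in pygame.fastevent.get():
        if event.type == pygame.USEREVENT:
            print(event.count)
            running = False
    pygame.time.wait(10)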

Is python's 'if' polling?

I am trying to wait for any of multiple multiprocessing events at the same time, so I came up with code like this:
if e1.wait(timeout) or e2.wait(timeout):
    # this part will be reached if either of the
    # two events is set or the wait timed out
It works as the comment says. But how does this work? Is the if polling both methods all the time, or is it called as soon as one event gets set?
Bonus question: Is there some clever way to adjust the code to wait for any number of events, i.e. a list of events? if True in [e1.wait(timeout),e2.wait(timeout)] does not work as expected.
It only waits on the first one; the second wait is reached only if the first times out. This is due to Python's support of short-circuiting.
wait on a thread or process is blocking, so it will block the current thread from going further until the timeout elapses or the event is set. The semantics of or in Python are short-circuit, which means that if the first call returns true, the second one will not be made - as simonzack said.
Waiting on a number of threads would be rather hard to implement and maintain for a variety of threads. I would suggest you use message passing instead: have each process put a message on a Queue when it is finished. Then you can simply check whether the queue has reached len(n), where n is the number of threads/processes, or block on queue.get(), as sketched below. See more here: Queues in multiprocessing
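A sketch of that message-passing approach (the worker function and the count of four are illustrative):

import multiprocessing

def worker(n, queue):
    # ... do the actual work ...
    queue.put(n)                     # announce completion

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, queue))
             for i in range(4)]
    for p in procs:
        p.start()
    first = queue.get()              # blocks until *any* worker finishes
    print("worker %d finished first" % first)
    for p in procs:
        p.join()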

Fair semaphore in python

Is it possible to have a fair semaphore in python, one that guarantees that blocking threads are unblocked in the order they call acquire()?
You might have to build one from other moving parts. For example, create a Queue.Queue() to which each listener posts a brand-new Event() on which it then waits. When it is time to wake up one of the waiting threads, pop off the item on the queue that has been waiting longest — it will be one of those event objects — and release the thread through event.set().
Obviously, you could also use a semaphore per waiting thread, but I like the semantics of an Event since it can clearly only happen once, while a semaphore has the semantics that its value could support many waiting threads.
To set the system up:
import Queue
big_queue = Queue.Queue()
Then, to wait:
import threading
myevent = threading.Event()
big_queue.put(myevent)
myevent.wait()
And to release one of the waiting threads:
event = big_queue.get()
event.set()
I suppose the weakness of this approach is that the thread doing the set/release has to wait for a waiting thread to come along, whereas a true semaphore would let several releases proceed even if no one was waiting yet?
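Assembled into a class, the approach might look like this (a sketch; the FairSemaphore name and interface are mine, not part of the standard library):

import Queue        # 'queue' on Python 3
import threading

class FairSemaphore(object):
    """Wakes exactly one waiter at a time, in FIFO order."""

    def __init__(self):
        self._waiters = Queue.Queue()

    def acquire(self):
        event = threading.Event()
        self._waiters.put(event)     # enqueued in arrival order
        event.wait()                 # block until released

    def release(self):
        # wakes the longest-waiting thread; blocks until someone is waiting,
        # which is exactly the weakness noted above
        self._waiters.get().set()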
With Brandon having addressed the "fair semaphore" question, it might be useful to look at a related problem of barriers, a waiting point for threads to reach and then be released at the same time: http://docs.python.org/py3k/whatsnew/3.2.html#threading

Putting a thread to sleep until event X occurs

I'm writing to many files in a threaded app and I'm creating one handler per file. I have a HandlerFactory class that manages the distribution of these handlers. What I'd like to do is this:
thread A requests and gets foo.txt's file handle from the HandlerFactory class
thread B requests foo.txt's file handler
handler class recognizes that this file handle has been checked out
handler class puts thread B to sleep
thread A closes the file handle using a wrapper method from HandlerFactory
HandlerFactory notifies sleeping threads
thread B wakes and successfully gets foo.txt's file handle
This is what I have so far,
def get_handler(self, file_path, type):
    self.lock.acquire()
    if file_path not in self.handlers:
        self.handlers[file_path] = open(file_path, type)
    elif not self.handlers[file_path].closed:
        time.sleep(1)
    self.lock.release()
    return self.handlers[file_path]
I believe this covers the sleeping and handler retrieval successfully, but I am unsure how to wake up all threads, or even better wake up a specific thread.
What you're looking for is known as a condition variable.
Condition Variables
Here is the Python 2 library reference.
For Python 3 it can be found here
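Applied to the factory in the question, a condition-variable version might look like this (a sketch only; release_handler and checked_out are names I introduce, with error handling omitted):

import threading

class HandlerFactory(object):
    def __init__(self):
        self.cond = threading.Condition()
        self.handlers = {}           # file_path -> open file object
        self.checked_out = set()     # paths whose handle is currently in use

    def get_handler(self, file_path, mode):
        with self.cond:
            while file_path in self.checked_out:
                self.cond.wait()     # sleep until some handle is released
            if file_path not in self.handlers:
                self.handlers[file_path] = open(file_path, mode)
            self.checked_out.add(file_path)
            return self.handlers[file_path]

    def release_handler(self, file_path):
        with self.cond:
            self.checked_out.discard(file_path)
            self.cond.notify_all()   # wake every waiter; each re-checks its path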
Looks like you want a threading.Semaphore associated with each handler (other synchronization objects like Events and Conditions are also possible, but a Semaphore seems simplest for your needs). Specifically, use a BoundedSemaphore: for your use case, it will immediately raise an exception for the programming error of releasing the semaphore more times than it was acquired -- and that is exactly the raison d'être of the bounded version of semaphores ;-).
Initialize each semaphore to a value of 1 when you build it (so that means the handler is available). Each using-thread calls acquire on the semaphore to get the handler (that may block it), and release on it when it's done with the handler (that will unblock exactly one of the waiting threads). That's simpler than the acquire/wait/notify/release lifecycle of a Condition, and more future-proof too, since as the docs for Condition say:
The current implementation wakes up exactly one thread, if any are waiting. However, it's not safe to rely on this behavior. A future, optimized implementation may occasionally wake up more than one thread.
while with a Semaphore you're playing it safe (its semantics are safe to rely on: if a semaphore is initialized to N, there are at all times between 0 and N [inclusive] threads that have successfully acquired the semaphore and not yet released it).
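A sketch of the per-handler BoundedSemaphore idea (the helper names are illustrative):

import threading

handler_locks = {}    # file_path -> BoundedSemaphore guarding that handler

def make_handler(file_path, mode):
    handler_locks[file_path] = threading.BoundedSemaphore(1)  # 1 = available
    return open(file_path, mode)

def use_handler(file_path, handler, data):
    handler_locks[file_path].acquire()   # blocks while another thread holds it
    try:
        handler.write(data)
    finally:
        # releasing more times than acquired would raise ValueError --
        # the programming-error check that motivates the bounded version
        handler_locks[file_path].release()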
You do realize that Python has a giant lock (the GIL), so you don't get most of the benefits of multi-threading, right?
Unless there is some reason for the master thread to do something with the results of each worker, you may wish to consider just forking off another process for each request. You won't have to deal with locking issues then. Have the children do what they need to do, then die. If they do need to communicate back, do it over a pipe, with XMLRPC, or through a sqlite database (which is threadsafe).
