I recently wrote a program that used a simple producer/consumer pattern. It initially had a bug related to improper use of threading.Lock that I eventually fixed. But it made me wonder whether it's possible to implement the producer/consumer pattern in a lockless manner.
Requirements in my case were simple:
One producer thread.
One consumer thread.
Queue has place for only one item.
Producer can produce next item before the current one is consumed. The current item is therefore lost, but that's OK for me.
Consumer can consume current item before the next one is produced. The current item is therefore consumed twice (or more), but that's OK for me.
So I wrote this:
QUEUE_ITEM = None

# this is executed in one threading.Thread object
def producer():
    global QUEUE_ITEM
    while True:
        i = produce_item()
        QUEUE_ITEM = i

# this is executed in another threading.Thread object
def consumer():
    global QUEUE_ITEM
    while True:
        i = QUEUE_ITEM
        consume_item(i)
My question is: Is this code thread-safe?
Immediate comment: this code isn't really lockless - I use CPython and it has GIL.
I tested the code a little and it seems to work. It translates to some LOAD and STORE ops which are atomic because of the GIL. But I also know that the del x operation isn't atomic when x implements a __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break. Or not?
Another question is: What kind of restrictions (for example on produced items' type) do I have to impose to make the above code work fine?
My questions are only about theoretical possibility to exploit CPython's and GIL's quirks in order to come up with lockless (i.e. no locks like threading.Lock explicitly in code) solution.
Trickery will bite you. Just use Queue to communicate between threads.
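For reference, a minimal sketch of that suggestion using the standard library's queue.Queue (called Queue.Queue on Python 2); produce_item and consume_item are the functions from the question. Note that a bounded Queue blocks instead of dropping or re-reading items, so the semantics differ slightly from the question's requirements:

import queue       # the module is named Queue on Python 2
import threading

q = queue.Queue(maxsize=1)   # a single-slot queue, as in the question

def producer():
    while True:
        q.put(produce_item())    # blocks while the slot is still full

def consumer():
    while True:
        consume_item(q.get())    # blocks while the slot is empty

threading.Thread(target=producer, daemon=True).start()
threading.Thread(target=consumer, daemon=True).start()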
Yes this will work in the way that you described:
That the producer may produce a skippable element.
That the consumer may consume the same element.
But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break.
I don't see a "del" here. If a del happens in consume_item then the del may occur in the producer thread. I don't think this would be a "problem".
Don't bother using this though. You will end up using up CPU on pointless polling cycles, and it is not any faster than using a queue with locks since Python already has a global lock.
This is not really thread safe because producer could overwrite QUEUE_ITEM before consumer has consumed it and consumer could consume QUEUE_ITEM twice. As you mentioned, you're OK with that but most people aren't.
Someone with more knowledge of CPython internals will have to answer your more theoretical questions.
I think it's possible that a thread is interrupted while producing/consuming, especially if the items are big objects.
Edit: this is just a wild guess. I'm no expert.
Also the threads may produce/consume any number of items before the other one starts running.
You can use a list as the queue as long as you stick to append/pop since both are atomic.
QUEUE = []

# this is executed in one threading.Thread object
def producer():
    global QUEUE
    while True:
        i = produce_item()
        QUEUE.append(i)

# this is executed in another threading.Thread object
def consumer():
    global QUEUE
    while True:
        try:
            i = QUEUE.pop(0)
        except IndexError:
            # queue is empty
            continue
        consume_item(i)
In a class scope like below, you can even clear the queue.
class Atomic(object):
    def __init__(self):
        self.queue = []

    # this is executed in one threading.Thread object
    def producer(self):
        while True:
            i = produce_item()
            self.queue.append(i)

    # this is executed in another threading.Thread object
    def consumer(self):
        while True:
            try:
                i = self.queue.pop(0)
            except IndexError:
                # queue is empty
                continue
            consume_item(i)
    # There's the possibility the producer is still working on its current item.
    def clear_queue(self):
        self.queue = []
You'll have to find out which list operations are atomic by looking at the bytecode generated.
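If you want to check, a small sketch using the dis module (the exact bytecode varies between CPython versions, so treat this as illustrative):

import dis

def append_item(queue, item):
    queue.append(item)

def pop_item(queue):
    return queue.pop(0)

# list.append and list.pop are implemented in C, so each shows up as a single
# method call in the bytecode and runs to completion without the GIL being
# released in the middle of the operation.
dis.dis(append_item)
dis.dis(pop_item)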
The __del__ could be a problem as you said. It could be avoided if only there was a way to prevent the garbage collector from invoking the __del__ method on the old object before we finish assigning the new one to QUEUE_ITEM. We would need something like:
increase the reference counter on the old object
assign a new one to `QUEUE_ITEM`
decrease the reference counter on the old object
I'm afraid I don't know if it is possible, though.
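In pure Python, roughly the same effect can be sketched by keeping a temporary reference to the old object while rebinding, so that its (possible) __del__ runs at a point the producer chooses; whether that is acceptable depends on the use case:

def producer():
    global QUEUE_ITEM
    while True:
        i = produce_item()
        old = QUEUE_ITEM      # keep the old object alive (its refcount stays > 0)
        QUEUE_ITEM = i        # rebind; the old item's __del__ does not fire yet
        del old               # the old item's __del__ (if any) runs here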
Related
I have a Python script that generates [str, float] tuples which are then indexed into ElasticSearch using a custom function which eventually calls helper.streaming_bulk().
This is how the generator is implemented:
doc_ids: List[str] = [...]
docs = ((doc_id, get_value(doc_id)) for doc_id in doc_ids)
get_value() calls a remote service that computes a float value per document id.
Next, these tuples are passed on to update_page_quality_bulk():
for success, item in update_page_quality_bulk(
        islice(doc_qualities, size)
):
    total_success += success
    if not success:
        logging.error(item)
Internally, update_page_quality_bulk() creates the ElasticSearch requests.
One of the advantages of using a generator here is that the first size elements can be fed into update_page_quality_bulk() through islice().
In order to make the entire process faster, I would like to parallelize the get_value() calls. As mentioned, these are remote calls, so the local compute cost is negligible, but the duration is significant.
The order of the tuples does not matter, neither which elements are passed into update_page_quality_bulk(). On a high level, I would like to make the get_value() calls (up to x in parallel) for any n tuples and pass on whichever ones are finished first.
My naive attempt was to define get_value() as asynchronous:
async def get_value():
    ...
and await the call in the generator:
docs = ((doc_id, await get_value(doc_id)) for doc_id in doc_ids)
However, this raises an error in the subsequent islice() call:
TypeError: 'async_generator' object is not iterable
Removing the islice call and passing the unmodified docs generator to update_page_quality_bulk() causes the same error to be raised when looping over the tuples to convert them into ElasticSearch requests.
I am aware that the ElasticSearch client provides asynchronous helpers, but they don't seem applicable here because I need to generate the actions first.
According to this answer, it seems like I have to change the implementation to using a queue.
This answer implies that it cannot be done without using multiprocessing due to Python GIL, but that answer is not marked as correct and is quite old too.
Generally, I am looking for a way to change the current logic as little as possible while parallelizing the get_value() calls.
So, you want to pass a "synchronous-looking" generator to a call that expects a normal lazy generator, such as islice, and keep getting the results for it in parallel.
It sounds like a work for asyncio.as_completed: you use your plain generator to create tasks - these are run in parallel by the asyncio machinery, and the results are made available as the tasks are completed (d'oh!).
However, since update_page_quality_bulk is not asyncio-aware, it will never yield control to the asyncio loop, so the loop cannot complete the tasks that have results ready. This would likely block.
Calling update_page_quality_bulk in another thread probably won't work either. I did not try it here, but I'd say you can't just iterate over docs in a different thread than the one where it (and its tasks) were created.
So, first things first - the "generator expression" syntax does not work when you want some terms of the generator to be calculated asynchronously, as you found out - we refactor that so that the tuples are created in a coroutine function - and we wrap all those calls in tasks (some of the asyncio functions do the wrapping in a task automatically).
Then we can use the asyncio machinery to schedule all the calls and call update_page_quality_bulk as the results arrive. The problem is that as_completed, as stated above, can't be passed directly to a non-async function: the asyncio loop would never get control back. Instead, we keep picking up the results of the tasks in the main thread and call the sync function in another thread - using a queue to pass the fetched results. And finally, so that the results can be consumed as they become available inside update_page_quality_bulk, we create a small wrapper class around queue.Queue, so that it can be consumed as an iterator - this is transparent to the code consuming the iterator.
# example code: untested
import asyncio
import logging
import threading
from queue import Queue

async def get_doc_values(doc_id):
    loop = asyncio.get_running_loop()
    # run_in_executor runs the synchronous function in parallel in a thread pool -
    # check the docs: you might want to pass a custom executor with more than
    # the default number of workers, instead of None:
    return doc_id, await loop.run_in_executor(None, get_value, doc_id)

def update_es(iterator):
    # this function runs in a separate thread
    total_success = 0   # local tally; adapt if you need the total elsewhere
    for success, item in update_page_quality_bulk(iterator):
        total_success += success
        if not success:
            logging.error(item)

sentinel = Ellipsis  # "...": a nice sentinel that also works for multiprocessing

class Iterator:
    """This allows the queue, fed in the main thread by the tasks as they are completed,
    to behave like an ordinary iterator, which can be consumed by "update_page_quality_bulk"
    in another thread.
    """
    def __init__(self, source_queue):
        self.source = source_queue

    def __iter__(self):
        return self

    def __next__(self):
        value = self.source.get()
        if value is sentinel:
            raise StopIteration()
        return value

async def main():
    queue = Queue()
    iterator = Iterator(queue)
    es_worker = threading.Thread(target=update_es, args=(iterator,))
    es_worker.start()
    for doc_value_task in asyncio.as_completed(get_doc_values(doc_id) for doc_id in doc_ids):
        doc_value = await doc_value_task
        queue.put(doc_value)
    queue.put(sentinel)  # tell the worker thread that no more results are coming
    es_worker.join()

asyncio.run(main())
I have a relatively simple scenario in some Python code where I have two threads, one of which sets a value and the other is waiting for it to be set. My instinct was to reach for threading.Condition to implement this but I got wondering whether I could simply use threading.Event instead.
So, I have something like this:
value = None
readyToRead = threading.Event()

def set():
    # executes in thread 1
    global value
    value = computeValue()
    readyToRead.set()

def get():
    # executes in thread 2
    readyToRead.wait()
    useValue(value)
I suppose I am uneasy because access to value is not actually mutex protected and I think in some languages at least it might not be safe simply to rely on the ordering implied by the statements in the code.
Is this a valid use of Event in Python?
Yes, this is a valid use case for Event.
value is protected here: it is assigned before readyToRead.set() is called and only read after readyToRead.wait() returns.
If you increase the number of threads, every consumer thread has to wait. In that case you could use a Semaphore instead.
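A minimal runnable sketch of the pattern from the question (set and get are renamed setter and getter here to avoid shadowing the built-ins; computeValue and useValue are stand-ins):

import threading
import time

value = None
readyToRead = threading.Event()

def computeValue():
    time.sleep(0.5)          # simulate some work
    return 42

def useValue(v):
    print("consumer got", v)

def setter():
    # executes in thread 1
    global value
    value = computeValue()
    readyToRead.set()        # publish: wake up any waiters

def getter():
    # executes in thread 2
    readyToRead.wait()       # block until the value has been set
    useValue(value)

t1 = threading.Thread(target=setter)
t2 = threading.Thread(target=getter)
t2.start()
t1.start()
t1.join()
t2.join()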
I have a defer.inlineCallbacks function for incrementally updating a large (>1k) list one piece at a time. This list may change at any time, and I'm getting bugs because of that behavior.
The simplest representation of what I'm doing is:-
@defer.inlineCallbacks
def _get_details(self, dt=None):
    data = self.data
    for e in data:
        if needs_update(e):
            more_detail = yield get_more_detail(e)
            do_the_update(e, more_detail)
    schedule_future(self._get_details)
self.data is a list of dictionaries which is initially populated with basic information (e.g. a name and ID) at application start. _get_details will run whenever allowed to by the reactor to get more detailed information for each item in data, updating the item as it goes along.
This works well when self.data does not change, but once it is changed (can be at any point) the loop obviously refers to the wrong information. In fact in that situation it would be better to just stop the loop entirely.
I'm able to set a flag in my class (which the inlineCallback can then check) when the data is changed.
Where should this check be conducted?
How does the inlineCallback code execute compared to a normal deferred (and indeed to a normal python generator).
Does code execution stop every time it encounters yield (i.e. can I rely on the code between one yield and the next being atomic)?
In the case of unreliable large lists, should I even be looping through the data (for e in data), or is there a better way?
the Twisted reactor never preempts your code while it is executing -- you have to voluntarily yield to the reactor by returning a value. This is why it is such a terrible thing to write Twisted code that blocks on I/O, because the reactor is not able to schedule any tasks while you are waiting for your disk.
So the short answer is that yes, execution is atomic between yields.
Without @inlineCallbacks, the _get_details function returns a generator. The @inlineCallbacks decorator simply wraps the generator in a Deferred that traverses the generator until it reaches a StopIteration exception or a defer.returnValue exception. When either of those conditions is reached, inlineCallbacks fires its Deferred. It's quite clever, really.
I don't know enough about your use case to help with your concurrency problem. Maybe make a copy of the list with tuple() and update that. But it seems like you really want an event-driven solution and not a state-driven one.
You need to protect access to shared resource (self.data).
You can do this with: twisted.internet.defer.DeferredLock.
http://twistedmatrix.com/documents/current/api/twisted.internet.defer.DeferredLock.html
Method acquire
Attempt to acquire the lock. Returns a Deferred that fires on lock
acquisition with the DeferredLock as the value. If the lock is locked,
then the Deferred is placed at the end of a waiting list.
Method release
Release the lock. If there is a waiting list, then the first Deferred in that waiting list will be called back.
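For illustration, a hedged sketch of the DeferredLock suggestion applied to the question's loop (needs_update, get_more_detail, do_the_update and schedule_future come from the question; any code that mutates self.data would have to acquire the same lock for this to help):

from twisted.internet import defer

class Updater(object):
    def __init__(self):
        self.data = []
        self._lock = defer.DeferredLock()

    @defer.inlineCallbacks
    def _get_details(self, dt=None):
        yield self._lock.acquire()     # wait until nobody else holds the lock
        try:
            for e in self.data:
                if needs_update(e):
                    more_detail = yield get_more_detail(e)
                    do_the_update(e, more_detail)
        finally:
            self._lock.release()
        schedule_future(self._get_details)

    @defer.inlineCallbacks
    def replace_data(self, new_data):
        # mutations take the same lock, so they can't interleave with the update loop
        yield self._lock.acquire()
        try:
            self.data = new_data
        finally:
            self._lock.release()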
@defer.inlineCallbacks
def _get_details(self, dt=None):
    data = self.data
    i = 0
    while i < len(data):
        e = data[i]
        if needs_update(e):
            more_detail = yield get_more_detail(e)
            if i >= len(data) or data[i] != e:
                break
            do_the_update(e, more_detail)
        i += 1
    schedule_future(self._get_details)
Based on more testing, the following are my observations.
for e in data iterates through elements, with the element still existing even if data itself does not, both before and after the yield statement.
As far as I can tell, execution is atomic between one yield and the next.
Looping through the data is more transparently done by using a counter. This also allows for checking whether the data has changed. The check can be done anytime after yield because any changes must have occurred before yield returned. This results in the code shown above.
self.data is a list of dictionaries...once it is changed (can be at any point) the loop obviously refers to the wrong information
If you're modifying a list while you iterate it, as Raymond Hettinger would say, "You're living in the land of sin and you deserve everything that happens to you." :) Scenarios like this should be avoided, or the list should be immutable. To circumvent this problem, you can use self.data.pop() or a DeferredQueue object to store data. This way you can add and remove elements at any time without causing adverse effects. Example with a list:
@defer.inlineCallbacks
def _get_details(self, dt=None):
    try:
        data = yield self.data.pop()
    except IndexError:
        schedule_future(self._get_details)
        defer.returnValue(None)  # exit function
    if needs_update(data):
        more_detail = yield get_more_detail(data)
        do_the_update(data, more_detail)
    schedule_future(self._get_details)
Take a look at DeferredQueue because a Deferred is returned when the get() function is called, which you can chain callbacks to handle each element you pop from the queue.
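A small sketch of that DeferredQueue idea (process_item is a stand-in for whatever work each element needs):

from twisted.internet import defer

data = defer.DeferredQueue()

def process_item(item):
    print("processing", item)
    # chain the next get() so every element put into the queue gets handled
    data.get().addCallback(process_item)

data.get().addCallback(process_item)        # start waiting for the first element
data.put({"id": 1, "name": "basic info"})   # producers can put items at any time
data.put({"id": 2, "name": "more info"})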
I've been testing a dirty hack inspired by this http://docs.python.org/2/library/contextlib.html .
The main idea is to bring try/finally idea onto class level and get reliable and simple class destructor.
class Foo():
    def __init__(self):
        self.__res_mgr__ = self.__acquire_resources__()
        self.__res_mgr__.next()

    def __acquire_resources__(self):
        try:
            # Acquire some resources here
            print "Initialize"
            self.f = 1
            yield
        finally:
            # Release the resources here
            print "Releasing Resources"
            self.f = 0

f = Foo()
print "testing resources"
print f.f
But it always gives me:
Initialize
testing resources
1
and never "Releasing Resources". I'm basing my hope on:
As of Python version 2.5, the yield statement is now allowed in the
try clause of a try ... finally construct. If the generator is not
resumed before it is finalized (by reaching a zero reference count or
by being garbage collected), the generator-iterator’s close() method
will be called, allowing any pending finally clauses to execute. Source link
But it seems that when the class member is garbage collected together with the instance, their ref counts don't decrease, so as a result the generator's close() and thus the finally block are never called. As for the second part of the quote
"or by being garbage collected"
I just don't know why it's not true. Any chance to make this utopia work? :)
BTW this works on module level:
def f():
    try:
        print "ack"
        yield
    finally:
        print "release"

a = f()
a.next()
print "testing"
Output will be as I expect:
ack
testing
release
NOTE: In my task I'm not able to use a with manager because I'm releasing the resource inside the end_callback of the thread (it will be outside of any with block). So I wanted to get a reliable destructor for cases when the callback won't be called for some reason.
The problem you are having is caused by a reference cycle and an implicit __del__ defined on your generator (it's so implicit, CPython doesn't actually show __del__ when you introspect, because only the C level tp_del exists, no Python-visible __del__ is created). Basically, when a generator has a yield inside:
A try block, or equivalently
A with block
it has an implicit __del__-like implementation. On Python 3.3 and earlier, if a reference cycle contains an object whose class implements __del__ (technically, has tp_del in CPython), unless the cycle is manually broken, the cyclic garbage collector cannot clean it up, and just sticks it in gc.garbage (import gc to gain access), because it doesn't know which objects (if any) must be collected first to clean up "nicely".
Because your class's __acquire_resources__(self) contains a reference to the instance's self, you form a reference cycle:
self -> self.__res_mgr__ (generator object) -> generator frame (referencing locals which includes) -> self
Because of this reference cycle, and the fact that the generator has a try/finally in it (creating tp_del equivalent to __del__), the cycle is uncollectable, and your finally block never gets executed unless you manually advance self.__res_mgr__ (which defeats the whole purpose).
Your experiment happens to display this problem automatically because the reference cycle is implicit/automatic, but any accidental reference cycle where an object in the cycle has a class with __del__ will trigger the same problem, so even if you just did:
class Foo():
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def __del__(self):
        # Release the resources here
        print "Releasing Resources"
        self.f = 0
if the "resources" involved could conceivably lead to a reference cycle with an instance of Foo, you'd have the same problem.
The solution here is one or both of:
Make your class a context manager so users provide the information necessary for deterministic finalization (by using with blocks), as well as providing an explicit cleanup method (e.g. close) for when with blocks aren't feasible (e.g. the object is part of another object's state that is cleaned up through its own resource management); a sketch of this option follows after the list. This is also the only way to provide deterministic cleanup on most non-CPython interpreters, where reference counting semantics have never been used (so all finalizers are called non-deterministically, if at all).
Move to Python 3.4 or higher, where PEP 442 resolves the issue with uncollectable cyclic garbage (it's technically still possible to produce such cycles on CPython, but only via third party extensions that continue to use tp_del instead of updating to use the tp_finalize slot that allows cyclic garbage to be cleaned properly). It's still non-deterministic cleanup (if a reference cycle exists, you're waiting on the cyclic gc to run, sometime), but it's possible, where pre-3.4, cyclic garbage of this sort could not be cleaned up at all.
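A minimal sketch of the first option, keeping the question's prints and resource flag; the explicit close() covers cases, such as the thread end_callback mentioned in the question, where a with block isn't available:

class Foo(object):
    def __init__(self):
        # Acquire some resources here
        print("Initialize")
        self.f = 1

    def close(self):
        # Release the resources here; safe to call more than once
        if self.f:
            print("Releasing Resources")
            self.f = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

# Deterministic cleanup with a with block:
with Foo() as f:
    print("testing resources")
    print(f.f)

# ...or explicitly, e.g. from a thread's end_callback:
f = Foo()
try:
    print(f.f)
finally:
    f.close()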
I'm sure this is not a very pythonic situation. But I'm not actually using this in any production code, I'm just considering how (if?) this could work. It doesn't have to be python specific, but I'd like a solution that at least WORKS within python framework.
Basically, I have a thread-safe singleton object that implements __enter__ and __exit__ (so it can be used with with).
class Singleton(object):
    l = threading.Lock()

    def __enter__(self):
        self.l.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.l.release()
In my example, one thread gets the singleton, and inside the with statement it enters an infinite loop.
def infinite():
    with Singleton():
        while True:
            pass
The goal of this experiment is to get the infinite thread out of its infinite loop WITHOUT killing the thread. Specifically using the Singleton object. First I was thinking of using an exception called from a different thread:
Singleton():
    ....
    def killMe():
        raise exception
But this obviously doesn't raise the exception in the other thread. What I thought next is that since the enter and exit methods acquire a class variable lock, is there any method that can be called on the Lock that will cause the thread that has acquired it to throw an exception?
Or, what I would probably do in C++ is just delete this or somehow call the destructor of the object from itself. Is there ANY way to do this in python? I know that if it's possible it will be a total hack job. But again, this is basically a thought experiment.
In Python, there is a somewhat undocumented way of raising an exception in another thread, though there are some caveats. See this recipe for "killable threads":
http://code.activestate.com/recipes/496960-thread2-killable-threads/
http://sebulba.wikispaces.com/recipe+thread2
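The core of that recipe is CPython's PyThreadState_SetAsyncExc, reachable through ctypes. A minimal sketch of the idea (CPython-only; the exception is only delivered when the target thread next executes Python bytecode, so it will not interrupt a blocking C call):

import ctypes
import threading
import time

class KillMe(Exception):
    """Raised asynchronously inside the target thread."""

def async_raise(thread_ident, exc_type):
    # Ask the interpreter to raise exc_type in the thread with this ident.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread_ident), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # More than one thread state was affected: undo and complain.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(thread_ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def infinite():
    try:
        while True:
            pass
    except KillMe:
        print("left the infinite loop without killing the thread")

t = threading.Thread(target=infinite)
t.start()
time.sleep(0.5)
async_raise(t.ident, KillMe)
t.join()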