I have a relatively simple scenario in some Python code where I have two threads, one of which sets a value and the other is waiting for it to be set. My instinct was to reach for threading.Condition to implement this but I got wondering whether I could simply use threading.Event instead.
So, I have something like this:
import threading

value = None
readyToRead = threading.Event()

def set():
    # executes in thread 1
    global value
    value = computeValue()
    readyToRead.set()

def get():
    # executes in thread 2
    readyToRead.wait()
    useValue(value)
I suppose I am uneasy because access to value is not actually mutex protected and I think in some languages at least it might not be safe simply to rely on the ordering implied by the statements in the code.
Is this a valid use of Event in Python?
Yes, this is a valid use case for Event, and access to value is safe here. The assignment to value happens before readyToRead.set(), and wait() only returns after set() has been called, so the reader is guaranteed to see the assigned value (the GIL and the Event's internal lock provide the necessary ordering). If you have several threads waiting for the value, they can all call wait() on the same Event and will all be released when it is set; if you instead want to release only a limited number of waiters, a Semaphore is an option.
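For comparison, the same hand-off written with the threading.Condition the question first considered might look like the sketch below. This is only a sketch: it assumes computeValue() never returns None, since None is used as the "not set yet" marker.

import threading

value = None
cond = threading.Condition()

def set():
    # executes in thread 1
    global value
    with cond:
        value = computeValue()
        cond.notify_all()

def get():
    # executes in thread 2
    with cond:
        # wait_for() re-checks the predicate each time the condition is notified
        cond.wait_for(lambda: value is not None)
    useValue(value)

The Event version is simpler because the "value is ready" state never needs to be reset; Condition is the more general tool when the predicate can change back and forth.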
I did some research on multi-threading for a programming project that uses it (first-timer here...). I would appreciate it if you deemed my statements below correct or, rather, commented on the ones that are wrong or need correction.
1. A lock is an object which can be passed to functions, methods, ... by reference. A function (in this example) can then make use of that lock object reference in order to safely operate on data (a variable in this example). It does this by acquiring the lock, modifying the variable and then releasing the lock.
2. A thread can be created to target a function, which may obtain a reference to a lock (to then achieve what is stated above).
3. A lock does not protect a specific variable, object etc.
4. A lock does not protect or do anything unless it is acquired (and released).
5. Thus, it is the responsibility of the programmer to use the lock in order to achieve the desired protection.
6. If a lock is acquired inside a function executed by thread A, this has no immediate influence on any other running thread B. Not even if the functions targeted by threads A and B have a reference to the same lock object.
7. Only if the function targeted by thread B wants to acquire the same lock (i.e. via the same referenced lock object), which was already acquired by the function targeted by thread A at that time, does the lock convey influence on both threads, in that thread B will pause further execution until the function targeted by thread A releases the lock again.
8. Thus, a locked lock only ever pauses execution of a thread if its targeted function wants (and waits) to acquire the very same lock itself. Thus, by thread A acquiring the lock, it can only prevent thread B from acquiring the same lock, nothing more, nothing less.
If I want to use a lock to prevent race conditions when setting a variable, I (as the programmer) need to:
9. pass a lock to all functions targeted by threads that will want to set the variable and
10. acquire the lock in every function and every time before I set the variable (and release it afterwards). (*)
11. If I create even only one thread targeting a function without providing it a reference to the lock object and let it set the variable, or if I set the variable via a thread whose targeted function has the lock object but doesn't acquire it prior to the operation, I will have failed to implement thread-safe setting of the variable.
12. (*) The lock should be held as long as the variable must not be accessed by other threads. Right now, I like to compare that to a database transaction... I lock the database (~ acquire a lock) until my set of instructions is completed, then I commit (~ release the lock).
13. Example: If I wanted to create a class whose member _value should be set in a thread-safe fashion, I would implement one of these two versions:
import threading

class Version1:
    def __init__(self):
        self._value: int = 0
        self._lock: threading.Lock = threading.Lock()

    def getValue(self) -> int:
        """Getting won't be protected in this example."""
        return self._value

    def setValue(self, val: int) -> None:
        """This will be made thread-safe by the member lock."""
        with self._lock:
            self._value = val

v1 = Version1()
t1_1 = threading.Thread(target=v1.setValue, args=(1,))
t1_2 = threading.Thread(target=v1.setValue, args=(2,))
t1_1.start()
t1_2.start()

class Version2:
    def __init__(self):
        self._value: int = 0

    def getValue(self) -> int:
        """Getting won't be protected in this example."""
        return self._value

    def setValue(self, val: int, lock: threading.Lock) -> None:
        """This will be made thread-safe by the injected lock."""
        with lock:
            self._value = val

v2 = Version2()
l = threading.Lock()
t2_1 = threading.Thread(target=v2.setValue, args=(1, l))
t2_2 = threading.Thread(target=v2.setValue, args=(2, l))
t2_1.start()
t2_2.start()
14. In Version1, I, as the class provider, can guarantee that setting _value is always thread-safe... ...because in Version2, the user of my class might pass two different lock objects to the two spawned threads and thus render the lock protection useless.
15. If I want to give the user of my class the freedom to include the setting of _value in a larger collection of steps that should be executed in a thread-safe manner, I could inject a Lock reference into Version1's __init__ function and assign that to the _lock member. Thus, the thread-safe operation of the class would be guaranteed while still allowing the user of the class to use "her own" lock for that purpose.
A score from 0-15 will now rate how well I have (mis)understood locks... :-D
1. It's also quite common to use global variables for locks. It depends on what the lock is protecting.
2. True, although somewhat meaningless. Any function can use a lock, not just the function that's the target of a thread.
3. If you mean there's no direct link between a lock and the data it protects, that's true. But you can define a data structure that contains a value that needs protecting and a reference to its lock.
4. True. Although as I say in 3, you can define a data structure that packages the data and lock. You could make this a class and have the class methods automatically acquire the lock as needed.
5. Correct. But see 4 for how you can automate this.
6. Correct.
7. Correct.
8. Correct.
9. Correct if it's not a global lock.
10. Partially correct. You should also often acquire the lock if you're merely reading the variable. If reading the object is not atomic (e.g. it's a list and you're reading multiple elements, or you read the same scalar variable multiple times and expect it to be stable), you need to prevent another thread from modifying it while you're reading.
11. Correct.
12. Correct.
13. Correct. This is an example of what I described above in 3 and 4.
14. Correct. Which is why the design in 13 is often better.
15. This is tricky, because the granularity of the locking needs to reflect all the objects that need to be protected. Your class only protects the assignment of that one variable -- it will release the lock before all the other steps associated with the caller-provided lock have been completed.
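To illustrate that last point, one common workaround is to let the class expose its lock so that a caller can hold it across a larger sequence of steps. The sketch below is my addition, not part of the original post: Version3 is a hypothetical name, and an RLock is used so the accessor methods can still be called while the caller already holds the lock.

import threading

class Version3:
    def __init__(self, lock=None):
        self._value: int = 0
        # Use the caller-supplied lock if given, otherwise a private one.
        self.lock = lock if lock is not None else threading.RLock()

    def getValue(self) -> int:
        with self.lock:
            return self._value

    def setValue(self, val: int) -> None:
        with self.lock:
            self._value = val

v3 = Version3()

def increment():
    # The whole read-modify-write is one atomic step because the caller
    # holds the same (reentrant) lock around both accessor calls.
    with v3.lock:
        v3.setValue(v3.getValue() + 1)

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()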
I want to read and process some data from an external service. I ask the service if there is any data; if something was returned, I process it and ask again (so data can be processed immediately when it's available); otherwise I wait for a notification that data is available. This can be written as an infinite loop:
def loop(self):
    while True:
        data = yield self.get_data_nonblocking()
        if data is not None:
            yield self.process_data(data)
        else:
            yield self.data_available

def on_data_available(self):
    self.data_available.fire()
How can data_available be implemented here? It could be a Deferred but a Deferred cannot be reset, only recreated. Are there better options?
Can this loop be integrated into the Twisted event loop? I can read and process data right in on_data_available and write some code instead of the loop checking get_data_nonblocking but I feel like then I'll need some locks to make sure data is processed in the same order it arrives (the code above enforces it because it's the only place where it's processed). Is this a good idea at all?
Consider the case of a TCP connection. The receiver buffer for a TCP connection can either have data in it or not. You can get that data, or get nothing, without blocking by using the non-blocking socket API:
data = socket.recv(1024)
if data:
    self.process_data(data)
You can wait for data to be available using select() (or any of the basically equivalent APIs):
socket.setblocking(False)
while True:
    data = socket.recv(1024)
    if data:
        self.process_data(data)
    else:
        select([socket], [], [])
Of these, only select() is particularly Twisted-unfriendly (though the Twisted idiom is certainly not to make your own socket.recv calls). You could replace the select call with a Twisted-friendly version though (implement a Protocol with a dataReceived method that fires a Deferred - sort of like your on_data_available method - toss in some yields and make this whole thing an inlineCallbacks generator).
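For illustration, a rough sketch of that Twisted-friendly replacement might look like the following. The names NotifyingProtocol, wait_for_data, and get_data_nonblocking are made up here, and process_data is assumed to return either a Deferred or a plain value.

from twisted.internet.defer import Deferred, inlineCallbacks
from twisted.internet.protocol import Protocol

class NotifyingProtocol(Protocol):
    # Buffers incoming bytes and fires a Deferred when data arrives.
    def __init__(self):
        self._buffer = b""
        self._waiting = None

    def dataReceived(self, data):
        self._buffer += data
        if self._waiting is not None:
            d, self._waiting = self._waiting, None
            d.callback(None)

    def get_data_nonblocking(self):
        # Return whatever has been buffered so far, or None if nothing is there.
        data, self._buffer = self._buffer, b""
        return data or None

    def wait_for_data(self):
        # Deferred that fires the next time dataReceived is called.
        self._waiting = Deferred()
        return self._waiting

@inlineCallbacks
def loop(proto, process_data):
    while True:
        data = proto.get_data_nonblocking()
        if data is not None:
            yield process_data(data)
        else:
            yield proto.wait_for_data()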
But though that's one way you can get data from a TCP connection, that's not the API that Twisted encourages you to use to do so. Instead, the API is:
class SomeProtocol(Protocol):
    def dataReceived(self, data):
        # Your logic here
        ...
I don't see how your case is substantially different. What if, instead of the loop you wrote, you did something like this:
class YourDataProcessor(object):
    def process_data(self, data):
        # Your logic here
        ...

class SomeDataGetter(object):
    def __init__(self, processor):
        self.processor = processor

    def on_available_data(self):
        data = self.get_data_nonblocking()
        if data is not None:
            self.processor.process_data(data)
Now there are no Deferreds at all (except perhaps in whatever implements on_available_data or get_data_nonblocking but I can't see that code).
If you leave this roughly as-is, you are guaranteed in-order execution because Twisted is single-threaded (except in a couple of places that are very clearly marked) and in a single-threaded program, an earlier call to process_data must complete before any later call to process_data could be made (excepting, of course, the case where process_data reentrantly invokes itself - but that's another story).
If you switch this back to using inlineCallbacks (or any equivalent "coroutine" flavored drink mix) then you are probably introducing the possibility of out-of-order execution.
For example, if get_data_nonblocking returns a Deferred and you write something like this:
@inlineCallbacks
def on_available_data(self):
    data = yield self.get_data_nonblocking()
    if data is not None:
        self.processor.process_data(data)
Then you have changed on_available_data to say that a context switch is allowed when calling get_data_nonblocking. In this case, depending on your implementation of get_data_nonblocking and on_available_data, it's entirely possible that:
1. on_available_data is called
2. get_data_nonblocking is called and returns a Deferred
3. on_available_data tells execution to switch to another context (via yield / inlineCallbacks)
4. on_available_data is called again
5. get_data_nonblocking is called again and returns a Deferred (perhaps the same one! perhaps a new one! depends on how it's implemented)
6. The second invocation of on_available_data tells execution to switch to another context (same reason)
7. The reactor spins around for a while and eventually an event arrives that causes the Deferred returned by the second invocation of get_data_nonblocking to fire.
8. Execution switches back to the second on_available_data frame
9. process_data is called with whatever data the second get_data_nonblocking call returned
10. Eventually the same things happen to the first set of objects and process_data is called again with whatever data the first get_data_nonblocking call returned
Now perhaps you've processed data out of order - again, this depends on more details of other parts of your system.
If so, you can always re-impose order. There are a lot of different possible approaches to this. Twisted itself doesn't come with any APIs that are explicitly in support of this operation so the solution involves writing some new code. Here's one idea (untested) for an approach - a queue-like class that knows about object sequence numbers:
from twisted.internet.defer import Deferred

class SequencedQueue(object):
    """
    A queue-like type which guarantees objects come out of the queue in the order
    defined by a sequence number associated with the objects when they are put into
    the queue.

    Application code manages sequence number assignment so that sequence numbers don't
    have to have the same order as `put` calls on this type.
    """
    def __init__(self):
        # The sequence number of the object that should be given out
        # by the next call to `get`
        self._next_sequence = 0
        # The sequence number of the next result that needs to be provided.
        self._next_result = 0
        # A holding area for objects past _next_sequence
        self._queue = {}
        # A holding area for the Deferreds waiting on those objects
        self._waiting = {}

    def put(self, sequence, object):
        """
        Put an object into the queue at a particular point in the sequence.
        """
        if sequence < self._next_sequence:
            # Programming error. The sequence number
            # of the object being put has already been used.
            raise ValueError("sequence number already used")
        self._queue[sequence] = object
        self._check_waiters()

    def get(self):
        """
        Get an object from the queue which has the next sequence number
        following whatever was previously gotten.
        """
        result = self._waiting[self._next_sequence] = Deferred()
        self._next_sequence += 1
        self._check_waiters()
        return result

    def _check_waiters(self):
        """
        Find any Deferreds previously given out by get calls which can now be given
        their results and give them to them.
        """
        while True:
            seq = self._next_result
            if seq in self._queue and seq in self._waiting:
                self._next_result += 1
                # XXX Probably a re-entrancy bug here. If a callback calls back in to
                # put then this loop might run recursively
                self._waiting.pop(seq).callback(self._queue.pop(seq))
            else:
                break
The expected behavior (modulo any bugs I accidentally added) is something like:
q = SequencedQueue()
d1 = q.get()
d2 = q.get()
# Nothing in particular happens
q.put(1, "second result")
# Still nothing: sequence 0 hasn't been put yet
q.put(0, "first result")
# Now d1 fires with "first result" and afterwards d2 fires with "second result"
Using this, just make sure you assign sequence numbers in the order you want data dispatched rather than the order it actually shows up somewhere. For example:
@inlineCallbacks
def on_available_data(self):
    sequence = self._process_order
    data = yield self.get_data_nonblocking()
    if data is not None:
        self._process_order += 1
        self.sequenced_queue.put(sequence, data)
Elsewhere, some code can consume the queue sort of like:
@inlineCallbacks
def queue_consumer(self):
    while True:
        data = yield self.sequenced_queue.get()
        yield self.process_data(data)
I am writing a piece of code where I have one class A and two threads B and C.
I create an instance a of A. I then start both threads, first B then C.
B calls a function func_name in A by a.func_name(). So far so fine.
C, on the other hand, needs to access the result, which is a list, say list_a, defined inside func_name() in class A and accessed via the instance a.
I have to match a set of strings by using a for loop like this,
if self.string_variable in a.list_a:
    print "found"
but it gives me an error:
A instance has no attribute list_a
Can someone please help me?
You will need some kind of synchronization primitive – exactly which one depends on further details of your design and requirements.
Assuming the list a.list_a is to be created once and is not modified later, thread C needs to wait until a.func_name() returns. This can be achieved by adding a threading.Event instance to A. In A.__init__(), add
self.event = threading.Event()
At the end of A.func_name(), add
self.event.set()
Before thread C tries to access a.list_a, add
a.event.wait()
to wait until a.func_name() has finished in thread B.
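Putting those three pieces together, a minimal sketch might look like this (compute_strings and the "needle" argument are placeholders, not from the original code):

import threading

class A:
    def __init__(self):
        self.list_a = []
        self.event = threading.Event()

    def func_name(self):
        self.list_a = compute_strings()   # placeholder for the real work
        self.event.set()                  # signal that list_a is ready

a = A()

def thread_c(string_variable):
    a.event.wait()                        # block until func_name() has finished
    if string_variable in a.list_a:
        print("found")

threading.Thread(target=a.func_name).start()                   # thread B
threading.Thread(target=thread_c, args=("needle",)).start()    # thread C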
In general, synchronization between threads is a complex topic and an error-prone task. You should only do this if you really need to.
I'm sure this is not a very pythonic situation. But I'm not actually using this in any production code; I'm just considering how (if?) this could work. It doesn't have to be Python specific, but I'd like a solution that at least WORKS within the Python framework.
Basically, I have a thread-safe singleton object that implements __enter__ and __exit__ (so it can be used with a with statement).
class Singleton:
    l = threading.Lock()

    def __enter__(self):
        self.l.acquire()

    def __exit__(self, *exc):
        self.l.release()
In my example, one thread gets the singleton, and inside the with statement it enters an infinite loop.
def infinite():
    with Singleton():
        while True:
            pass
The goal of this experiment is to get the infinite thread out of its infinite loop WITHOUT killing the thread. Specifically using the Singleton object. First I was thinking of using an exception called from a different thread:
class Singleton:
    ...

    def killMe(self):
        raise Exception
But this obviously doesn't raise the exception in the other thread. What I thought next is that since the enter and exit methods acquire a class variable lock, is there any method that can be called on the Lock that will cause the thread that has acquired it to throw an exception?
Or, what I would probably do in C++ is just delete this or somehow call the destructor of the object from itself. Is there ANY way to do this in python? I know that if it's possible it will be a total hack job. But again, this is basically a thought experiment.
In Python, there is a somewhat undocumented way of raising an exception in another thread, though there are some caveats. See this recipe for "killable threads":
http://code.activestate.com/recipes/496960-thread2-killable-threads/
http://sebulba.wikispaces.com/recipe+thread2
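Very roughly, those recipes work by asking the CPython interpreter to schedule an exception in the target thread via ctypes. A sketch of the idea, with all the caveats from the links (CPython-specific and best-effort), looks like this:

import ctypes
import threading

def async_raise(thread, exctype):
    # Ask CPython to raise exctype in the given thread. The exception is only
    # delivered the next time that thread executes Python bytecode, so it
    # cannot interrupt a blocking C call.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # Revert if more than one thread state was affected.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(thread.ident), 0)
        raise SystemError("PyThreadState_SetAsyncExc failed")

# Against the loop from the question: because the exception propagates out of
# the `with Singleton():` block, __exit__ still runs and the lock is released.
# t = threading.Thread(target=infinite)
# t.start()
# async_raise(t, KeyboardInterrupt)
# t.join()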
I recently wrote a program that used a simple producer/consumer pattern. It initially had a bug related to improper use of threading.Lock that I eventually fixed. But it made me think whether it's possible to implement producer/consumer pattern in a lockless manner.
Requirements in my case were simple:
One producer thread.
One consumer thread.
The queue has room for only one item.
Producer can produce next item before the current one is consumed. The current item is therefore lost, but that's OK for me.
Consumer can consume current item before the next one is produced. The current item is therefore consumed twice (or more), but that's OK for me.
So I wrote this:
QUEUE_ITEM = None

# this is executed in one threading.Thread object
def producer():
    global QUEUE_ITEM
    while True:
        i = produce_item()
        QUEUE_ITEM = i

# this is executed in another threading.Thread object
def consumer():
    global QUEUE_ITEM
    while True:
        i = QUEUE_ITEM
        consume_item(i)
My question is: Is this code thread-safe?
Immediate comment: this code isn't really lockless - I use CPython and it has the GIL.
I tested the code a little and it seems to work. It translates to some LOAD and STORE ops, which are atomic because of the GIL. But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break. Or not?
Another question is: What kind of restrictions (for example on produced items' type) do I have to impose to make the above code work fine?
My questions are only about the theoretical possibility of exploiting CPython's and the GIL's quirks in order to come up with a lockless (i.e. no locks like threading.Lock explicitly in the code) solution.
Trickery will bite you. Just use Queue to communicate between threads.
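A minimal sketch of that suggestion, using the standard library queue module (called Queue on Python 2). Unlike the original code it never drops or re-reads items, and the consumer blocks instead of spinning:

import queue          # `Queue` on Python 2
import threading

q = queue.Queue()     # unbounded; every produced item is delivered exactly once

def producer():
    while True:
        q.put(produce_item())        # produce_item() as in the question

def consumer():
    while True:
        consume_item(q.get())        # blocks until an item is available

threading.Thread(target=producer, daemon=True).start()
threading.Thread(target=consumer, daemon=True).start()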
Yes, this will work in the way that you described:
That the producer may produce a skippable element.
That the consumer may consume the same element.
But I also know that the del x operation isn't atomic when x implements the __del__ method. So if my item has a __del__ method and some nasty scheduling happens, things may break.
I don't see a "del" here. If a del happens in consume_item then the del may occur in the producer thread. I don't think this would be a "problem".
Don't bother using this though. You will end up using up CPU on pointless polling cycles, and it is not any faster than using a queue with locks since Python already has a global lock.
This is not really thread-safe because the producer could overwrite QUEUE_ITEM before the consumer has consumed it, and the consumer could consume QUEUE_ITEM twice. As you mentioned, you're OK with that, but most people aren't.
Someone with more knowledge of CPython internals will have to answer your more theoretical questions.
I think it's possible that a thread is interrupted while producing/consuming, especially if the items are big objects.
Edit: this is just a wild guess. I'm no expert.
Also the threads may produce/consume any number of items before the other one starts running.
You can use a list as the queue as long as you stick to append/pop since both are atomic.
QUEUE = []

# this is executed in one threading.Thread object
def producer():
    global QUEUE
    while True:
        i = produce_item()
        QUEUE.append(i)

# this is executed in another threading.Thread object
def consumer():
    global QUEUE
    while True:
        try:
            i = QUEUE.pop(0)
        except IndexError:
            # queue is empty
            continue
        consume_item(i)
In a class scope like below, you can even clear the queue.
class Atomic(object):
    def __init__(self):
        self.queue = []

    # this is executed in one threading.Thread object
    def producer(self):
        while True:
            i = produce_item()
            self.queue.append(i)

    # this is executed in another threading.Thread object
    def consumer(self):
        while True:
            try:
                i = self.queue.pop(0)
            except IndexError:
                # queue is empty
                continue
            consume_item(i)

    # There's the possibility the producer is still working on its current item.
    def clear_queue(self):
        self.queue = []
You'll have to find out which list operations are atomic by looking at the bytecode generated.
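For example, the dis module in the standard library shows the bytecode. The sketch below is just a quick way to look: each statement compiles to several instructions, but an operation that boils down to a single call into a C built-in (like list.append) is not interrupted part-way while the GIL is held.

import dis

def append_item(queue, item):
    queue.append(item)

def swap_queue(obj):
    obj.queue = []

# The disassembly shows the instruction sequence for each statement; the
# append itself is one CALL into C code, which another thread cannot
# interleave with under CPython's GIL.
dis.dis(append_item)
dis.dis(swap_queue)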
The __del__ could be a problem, as you said. It could be avoided if only there was a way to prevent the garbage collector from invoking the __del__ method on the old object before we finish assigning the new one to QUEUE_ITEM. We would need something like:
increase the reference counter on the old object
assign a new one to `QUEUE_ITEM`
decrease the reference counter on the old object
I'm afraid I don't know if it is possible, though.