Concurrent access to a shared resource using conditions in Python threads

I have the following pretty basic sample code for working with conditions in Python:
import threading
import random
import time

class Producer(threading.Thread):
    """
    Produces random integers to a list
    """

    def __init__(self, integers, condition):
        """
        Constructor.

        @param integers list of integers
        @param condition condition synchronization object
        """
        threading.Thread.__init__(self)
        self.integers = integers
        self.condition = condition

    def run(self):
        """
        Thread run method. Appends random integers to the integers list
        at random times.
        """
        while True:
            integer = random.randint(0, 256)
            self.condition.acquire()
            print 'condition acquired by %s' % self.name
            self.integers.append(integer)
            print '%d appended to list by %s' % (integer, self.name)
            print 'condition notified by %s' % self.name
            self.condition.notify()
            print 'condition released by %s' % self.name
            self.condition.release()
            time.sleep(1)

class Consumer(threading.Thread):
    """
    Consumes random integers from a list
    """

    def __init__(self, integers, condition):
        """
        Constructor.

        @param integers list of integers
        @param condition condition synchronization object
        """
        threading.Thread.__init__(self)
        self.integers = integers
        self.condition = condition

    def run(self):
        """
        Thread run method. Consumes integers from the list
        """
        while True:
            self.condition.acquire()
            print 'condition acquired by %s' % self.name
            while True:
                if self.integers:
                    integer = self.integers.pop()
                    print '%d popped from list by %s' % (integer, self.name)
                    break
                print 'condition wait by %s' % self.name
                self.condition.wait()
            print 'condition released by %s' % self.name
            self.condition.release()

def main():
    integers = []
    condition = threading.Condition()
    t1 = Producer(integers, condition)
    t2 = Consumer(integers, condition)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == '__main__':
    main()
As per my understanding, when the consumer calls the wait() method, it releases the condition and goes to sleep.
When the producer calls notify(), it seems that the consumers do not reacquire the condition before they try to pop from the integer list.
Is this not a race condition?

The consumers do not need to explicitly reacquire the condition after being woken up from wait(), because wait() re-acquires it before returning.
What gets released while waiting is a lock that is always associated with the condition, whether passed in explicitly or created implicitly.
From the docs:
A condition variable is always associated with some kind of lock; this can be passed in or one will be created by default. [...] The lock is part of the condition object: you don’t have to track it separately.
The lock is acquired and released implicitly by acquiring / releasing the condition, as well as when calling wait() and when waking up from it.
The acquire() and release() methods also call the corresponding methods of the associated lock.
[...] The wait() method releases the lock, and then blocks until another thread awakens it by calling notify() or notify_all(). Once awakened, wait() re-acquires the lock and returns.
So there is always at most one thread that can hold the lock, and thus safely modify the shared resource, at any given point in time.
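For illustration, here is a minimal Python 3 sketch of the same handshake using the with statement, which acquires and releases the condition's underlying lock implicitly (the names are illustrative):

import threading

items = []
cond = threading.Condition()  # owns a lock internally

def producer():
    with cond:                # acquires the underlying lock
        items.append(42)
        cond.notify()         # wake one waiter; lock is still held here
                              # lock released when the with-block exits

def consumer():
    with cond:                # acquires the underlying lock
        while not items:      # guard against spurious wakeups
            cond.wait()       # releases the lock while blocked,
                              # re-acquires it before returning
        print(items.pop())

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start()
t2.start()
t1.join()
t2.join()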

Related

How to properly make a queue? Why does sleep on consumer make the queue work?

I'm trying to implement a queue. This is old code, taken either from some tutorial I did a while ago or from some experimentation I did while reading the docs, or a mix of the two. The thing is, I'm not sure whether the code is mine, but I'm trying to use it as an example to learn from. The script has a producer that produces numbers into a list and 2 consumers competing to grab those numbers and add them up; the one with the highest sum wins.
So, here's my question: in the following code, the consume_numbers function has a time.sleep(0.01) line which makes the code run. Without it, the code hangs; with it, it runs smoothly. Can someone explain why this happens and how I could implement a queue without this issue?
import concurrent.futures
import time
import random
import threading
import queue

class MyQueue(queue.Queue):
    def __init__(self, maxsize=10):
        super().__init__()
        self.maxsize = maxsize
        self.numbers = []

    def set_number(self, number):
        self.put(number)
        self.numbers.append(number)

    def get_number(self):
        return self.get()

def produce_random_numbers(q: MyQueue, maxcount: int, evnt: threading.Event):
    count = 0
    while not evnt.is_set():
        num = random.randint(1, 5)
        q.set_number(num)
        count += 1
        if count > maxcount:
            event.set()

def consume_numbers(q: MyQueue, consumed: list, evnt: threading.Event):
    while not q.empty() or not evnt.is_set():
        num = q.get_number()
        time.sleep(0.01)
        consumed.append(num)

if __name__ == "__main__":
    q = MyQueue(maxsize=10)
    event = threading.Event()
    cons1 = []
    cons2 = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
        ex.submit(produce_random_numbers, q, 50, event)
        ex.submit(consume_numbers, q, cons1, event)
        ex.submit(consume_numbers, q, cons2, event)
        event.set()

    print(f'Generated Numbers: {q.numbers}')
    print(f'Numbers Consumed by Thread1 which summed up to {sum(cons1)} are: {cons1}')
    print(f'Numbers Consumed by Thread2 which summed up to {sum(cons2)} are: {cons2}')

    if sum(cons1) > sum(cons2):
        print("Thread1 Wins!")
    elif sum(cons1) < sum(cons2):
        print("Thread2 Wins!")
    else:
        print("It's a tie!")
Thanks!
The code does not implement a queue from scratch; it extends queue.Queue to add memory. An Event object is used to signal to the consumers that the producer thread has finished. There are hidden race conditions in the consumers when there is only one item left on the queue.
The check not q.empty() or not evnt.is_set() runs the loop body if there is something in the queue or if the event has not been set yet. It could happen that:
One thread sees that the queue is not empty and enters the loop.
A thread switch happens, and the other thread consumes the last item.
A switch back to the first thread happens; it calls get_number() and blocks forever.
A similar race condition happens with the evnt.is_set() check:
The producer adds the last item to the queue, and a thread switch happens before the event is set.
A consumer gets the last item and goes back to the loop condition. As the event has not been set yet, the loop body runs again and get_number() blocks forever.
Having the threads wait minimizes the chance of these conditions happening. Without waiting, it is very likely that a single consumer thread will consume all the queue items, while the other one is still entering its loop.
Using timeouts is cumbersome. A useful idiom that avoids events altogether is to use iter() with an impossible value as a sentinel:
# --- snip ---

def produce_random_numbers(q: MyQueue, maxcount: int, n_consumers: int):
    for _ in range(maxcount):
        num = random.randint(1, 5)
        q.set_number(num)
    for _ in range(n_consumers):
        q.put(None)  # <--- I use put to add one sentinel per consumer

def consume_numbers(q: MyQueue, consumed: list):
    for num in iter(q.get_number, None):
        consumed.append(num)

if __name__ == "__main__":
    q = MyQueue(maxsize=10)
    cons1 = []
    cons2 = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
        ex.submit(produce_random_numbers, q, 500000, 2)
        ex.submit(consume_numbers, q, cons1)
        ex.submit(consume_numbers, q, cons2)

    print(f'Generated Numbers: {q.numbers}')
# --- snip ---
There are some other issues and things I would have done differently:
The event.set() after the with... block is useless: the event has already been set by the producer.
There is a typo in the producer: the global event variable is used instead of the local evnt parameter. Fortunately, both names refer to the same object.
As there is only one producer there is no problem here, but with several producers the order of MyQueue.numbers could differ from the order in which the items were added to the queue:
put is called on one thread
a thread switch happens
a put + append happens in the new thread
a thread switch happens back, and the first value is appended out of order
Instead of defining MyQueue.set_number, I would have overridden put; a sketch follows.
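A hedged sketch of that last point: the extra lock is my own assumption, there to keep the numbers list in queue order when several producers call put concurrently, and None sentinels are deliberately not recorded.

import queue
import threading

class MyQueue(queue.Queue):
    def __init__(self, maxsize=10):
        super().__init__(maxsize)
        self.numbers = []
        self._numbers_lock = threading.Lock()  # keeps numbers in queue order

    def put(self, item, block=True, timeout=None):
        # Enqueue and record under one lock so MyQueue.numbers keeps the
        # same order as the queue even with several producers.
        with self._numbers_lock:
            super().put(item, block, timeout)
            if item is not None:  # don't record sentinel values
                self.numbers.append(item)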

How to allow a class's variables to be modified concurrently by multiple threads

I have a class (MyClass) which contains a queue (self.msg_queue) of actions that need to be run and I have multiple sources of input that can add tasks to the queue.
Right now I have three functions that I want to run concurrently:
MyClass.get_input_from_user()
Creates a window in tkinter that has the user fill out information and when the user presses submit it pushes that message onto the queue.
MyClass.get_input_from_server()
Checks the server for a message, reads the message, and then puts it onto the queue. This method uses functions from MyClass's parent class.
MyClass.execute_next_item_on_the_queue()
Pops a message off of the queue and then acts upon it. It is dependent on what the message is, but each message corresponds to some method in MyClass or its parent which gets run according to a big decision tree.
Process description:
After the class has joined the network, I have it spawn three threads (one for each of the above functions). Each threaded function adds items to the queue with "self.msg_queue.put(message)" and removes items from the queue with "self.msg_queue.get_nowait()".
Problem description:
The issue I am having is that each thread seems to be modifying its own queue object; they are not sharing the msg_queue of the class of which the three functions are all members.
I am not familiar enough with multiprocessing to know which error messages are the important ones; however, it states that it cannot pickle a weakref object (it gives no indication of which object that is), and that within the queue.put() call the line "self._sem.acquire(block, timeout)" yields a '[WinError 5] Access is denied' error. Would it be safe to assume that the failure is the queue's reference not copying over properly?
[I am using Python 3.7.2 and the multiprocessing package's Process and Queue.]
[I have seen multiple Q/As about having threads shuttle information between classes--create a master harness that generates a queue and then pass that queue as an argument to each thread. If the functions didn't have to use other functions from MyClass I could see adapting this strategy by having those functions take in a queue and use a local variable rather than class variables.]
[I am fairly confident that this error is not the result of passing my queue to the tkinter object as my unit tests on how my GUI modifies its caller's queue work fine]
Below is a minimal reproducible example for the queue's error:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self):
        while True:
            self.my_q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self):
        while True:
            self.counter = 0
            self.my_q.put(self.counter)
            time.sleep(1)

    def output_function(self):
        while True:
            try:
                var = self.my_q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        process_A = Process(target=self.input_function_A)
        process_B = Process(target=self.input_function_B)
        process_C = Process(target=self.output_function)
        process_A.start()
        process_B.start()
        process_C.start()
        # without this it generates the WinError;
        # with this it still behaves as if the two input functions do not modify the queue
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
Indeed - these are not "threads", they are "processes". If you were using multithreading rather than multiprocessing, the self.my_q instance would be the same object, placed at the same memory address on the computer.
Multiprocessing instead forks (or, on Windows, spawns) the process, and any data in the original process (the one executing the "run" call) is duplicated when it is used - so each subprocess sees its own "Queue" instance, unrelated to the others.
The correct way to have various processes share a multiprocessing.Queue object is to pass it as a parameter to the target methods. The simplest way to reorganize your code so that it works is:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self, q):
        while True:
            q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self, q):
        while True:
            self.counter = 0
            q.put(self.counter)
            time.sleep(1)

    def output_function(self, q):
        while True:
            try:
                var = q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        # pass the one shared Queue instance to every worker explicitly
        process_A = Process(target=self.input_function_A, args=(self.my_q,))
        process_B = Process(target=self.input_function_B, args=(self.my_q,))
        process_C = Process(target=self.output_function, args=(self.my_q,))
        process_A.start()
        process_B.start()
        process_C.start()
        # keep the parent process alive while the output process runs
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
As you can see, since your class is not actually sharing any data through the instance's attributes, this class design does not make much sense for your application, other than grouping the different workers in the same code block.
It would be possible to have a magic multiprocess class with an internal method that starts the worker methods and shares the Queue instance, so that a project with a lot of these would need much less boilerplate.
Something along the lines of:
from multiprocessing import Queue
from multiprocessing import Process
import queue  # needed for queue.Empty below
import time

class MPWorkerBase:
    def __init__(self, *args, **kw):
        self.queue = None
        self.is_parent_process = False
        self.is_child_process = False
        self.processes = []
        # ensure this can be used as a collaborative mixin
        super().__init__(*args, **kw)

    def run(self):
        if self.is_parent_process or self.is_child_process:
            # workers already initialized
            return
        self.queue = Queue()
        cls = self.__class__
        for name in dir(cls):
            method = getattr(cls, name)
            if callable(method) and getattr(method, "_MP_worker", False):
                process = Process(target=self._start_worker, args=(self.queue, name))
                self.processes.append(process)
                process.start()
        # Setting this attribute only after spawning the workers ensures
        # the child processes keep the initial value for it.
        self.is_parent_process = True

    def _start_worker(self, queue, method_name):
        # this method is called in a new spawned process - attribute
        # changes here no longer reflect attributes on the
        # object in the initial process
        # overwrite queue in this process with the queue object sent over the wire:
        self.queue = queue
        self.is_child_process = True
        # call the worker method
        getattr(self, method_name)()

    def __del__(self):
        for process in self.processes:
            process.join()

def worker(func):
    """Decorator to mark a method as a worker that should
    run in its own subprocess.
    """
    func._MP_worker = True
    return func

class MyTest(MPWorkerBase):
    def __init__(self):
        super().__init__()
        self.counter = 0

    @worker
    def input_function_A(self):
        while True:
            self.queue.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    @worker
    def input_function_B(self):
        while True:
            self.counter = 0
            self.queue.put(self.counter)
            time.sleep(1)

    @worker
    def output_function(self):
        while True:
            try:
                var = self.queue.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

if __name__ == '__main__':
    test = MyTest()
    test.run()

Can I assume my threads are done when threading.active_count() returns 1?

Given the following class:
from abc import ABCMeta, abstractmethod
from time import sleep
import threading
from threading import active_count, Thread

class ScraperPool(metaclass=ABCMeta):
    Queue = []
    ResultList = []

    def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
        # Initialize attributes
        self.MaxNumWorkers = MaxNumWorkers
        self.ItemsPerWorker = ItemsPerWorker
        self.Queue = Queue  # For testing purposes.

    def initWorkerPool(self, PrintIDs=True):
        for w in range(self.NumWorkers()):
            Thread(target=self.worker, args=(w + 1, PrintIDs,)).start()
            sleep(1)  # Explicitly wait one second for this worker to start.

    def run(self):
        self.initWorkerPool()
        # Wait until all workers (i.e. threads) are done.
        while active_count() > 1:
            print("Active threads: " + str(active_count()))
            sleep(5)
        self.HandleResults()

    def worker(self, id, printID):
        if printID:
            print("Starting worker " + str(id) + ".")
        while len(self.Queue) > 0:
            self.scraperMethod()
        if printID:
            print("Worker " + str(id) + " is quitting.")
        # TODO: kill this thread.
        return

    def NumWorkers(self):
        return 1  # Simplified for testing purposes.

    @abstractmethod
    def scraperMethod(self):
        pass

class TestScraper(ScraperPool):
    def scraperMethod(self):
        # print("I am scraping.")
        # print("Scraping. Threads#: " + str(active_count()))
        temp_item = self.Queue[-1]
        self.Queue.pop()
        self.ResultList.append(temp_item)

    def HandleResults(self):
        print(self.ResultList)

ScraperPool.register(TestScraper)

scraper = TestScraper(Queue=["Jaap", "Piet"])
scraper.run()
print(threading.active_count())
# print(scraper.ResultList)
When all the threads are done, there's still one active thread - threading.active_count() on the last line gets me that number.
The active thread is <_MainThread(MainThread, started 12960)> - as printed with threading.enumerate().
Can I assume that all my threads are done when active_count() == 1?
Or can, for instance, imported modules start additional threads, so that my threads are actually done while active_count() > 1 - which is also the condition for the loop I'm using in the run method?
You can assume that your threads are done when active_count() reaches 1. The problem is, if any other module creates a thread, you'll never get to 1. You should manage your threads explicitly.
Example: You can put the threads in a list and join them one at a time. The relevant changes to your code are:
def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
    # Initialize attributes
    self.MaxNumWorkers = MaxNumWorkers
    self.ItemsPerWorker = ItemsPerWorker
    self.Queue = Queue  # For testing purposes.
    self.WorkerThreads = []

def initWorkerPool(self, PrintIDs=True):
    for w in range(self.NumWorkers()):
        thread = Thread(target=self.worker, args=(w + 1, PrintIDs,))
        self.WorkerThreads.append(thread)
        thread.start()
        sleep(1)  # Explicitly wait one second for this worker to start.

def run(self):
    self.initWorkerPool()
    # Wait until all workers (i.e. threads) are done. Waiting in order,
    # so some threads further in the list may finish first, but we
    # will get to all of them eventually.
    while self.WorkerThreads:
        self.WorkerThreads.pop(0).join()
    self.HandleResults()
According to the docs, active_count() includes the main thread, so if you're at 1 you're most likely done; but if you have another source of new threads in your program, you may be done before active_count() hits 1.
I would recommend implementing an explicit join method on your ScraperPool: keep track of your workers and explicitly join them to the main thread when needed, instead of checking whether you're done with active_count() calls.
Also keep the GIL in mind...
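A minimal sketch of that suggestion, assuming the pool tracks its workers in a WorkerThreads list as in the answer above:

from threading import Thread

class ScraperPool:
    def __init__(self):
        self.WorkerThreads = []  # filled with Thread objects by initWorkerPool()

    def join(self, timeout=None):
        # Wait for exactly the threads this pool started, so threads
        # created elsewhere in the program cannot skew the check.
        for thread in self.WorkerThreads:
            thread.join(timeout)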

How to run Threads synchronously with Python?

I need to use 3 threads to print array items sequentially using Python.
Each Thread will print one array item.
I need the threads to sleep for a random number of seconds and then print the item.
This function will be executed N times. This N value is given by the user.
The items must be printed in a specific order, which means I have to somehow block the other threads from executing while the previous one is not done.
I've been trying a lot of different solutions but I can't figure out how to make it work.
I've tried using Semaphores, Locks and Events, but without success at synchronizing. In every case the items printed in a random order driven by the time.sleep() calls, not in the required sequence. How can I block a thread from executing the function until the previous thread has finished, so that the sequence works?
Which tool should I use to make it work? Any help is appreciated.
import threading
import random
import time
import Queue  # Python 2

# q, semaphore, e and cores are not defined in the original snippet;
# they are assumed to be globals, e.g.:
q = Queue.Queue()
semaphore = threading.Semaphore()
e = threading.Event()
cores = ['first', 'second', 'third']  # the array items, one per thread

class myThread(threading.Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        super(myThread, self).__init__()
        self.target = target
        self.name = name
        return

    def run(self):
        while True:
            if not q.empty():
                semaphore.acquire()
                try:
                    time_sleep = random.randrange(0, 10)
                    print "thread " + self.name + ". Sleeping for " + str(time_sleep) + " seconds"
                    time.sleep(time_sleep)
                    print cores[int(self.name)]
                    if int(self.name) == len(cores) - 1:
                        item = q.get()
                        print 'Executed the sequence ' + str(item + 1) + ' times. Will still execute ' + str(q.qsize()) + ' more times'
                        e.set()
                finally:
                    semaphore.release()
                if int(self.name) != len(cores) - 1:
                    e.wait()
        return

if __name__ == '__main__':
    for i in range(2):
        q.put(i)
    for i in range(3):
        t = myThread(name=i)
        t.start()
There are many, many approaches to this. A simple one is to use a shared Queue of numbers.
Each thread can sleep for however long it wants to, take a number from the queue when it wakes up, and print it. They will come out in the order they were pushed to the queue.
If your numbers are sequential, or can be generated dynamically, you can also do it in constant memory using a shared counter, as described in this answer.
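A rough sketch of that shared-queue idea (all names here are illustrative, and the small lock is an extra assumption that makes the take-and-print step atomic, so the printed order matches the queue order):

import queue
import random
import threading
import time

q = queue.Queue()
for i in range(9):  # the numbers to print, in the desired order
    q.put(i)

print_lock = threading.Lock()

def worker():
    while True:
        time.sleep(random.uniform(0, 0.5))  # sleep first, as in the question
        with print_lock:                    # take + print atomically
            try:
                item = q.get_nowait()
            except queue.Empty:
                return                      # nothing left, thread exits
            print(item)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()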
If you didn't care about order you could just use a lock to synchronize access. In this case, though, how about a list of events: each thread gets its own event slot and, when done, sets the next slot in the list. This scheme could be fancied up by returning a context manager so that you don't need to release explicitly.
import threading

class EventSlots(object):
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.events = [threading.Event() for _ in range(num_slots)]
        self.events[0].set()

    def wait_slot(self, slot_num):
        self.events[slot_num].wait()
        self.events[slot_num].clear()

    def release_slot(self, slot_num):
        self.events[(slot_num + 1) % self.num_slots].set()

def worker(event_slots, slot_num):
    for i in range(5):
        event_slots.wait_slot(slot_num)
        print('slot', slot_num, 'iteration', i)
        event_slots.release_slot(slot_num)

NUM = 3
slots = EventSlots(NUM)
threads = []
for slot in range(NUM):
    t = threading.Thread(target=worker, args=(slots, slot))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

Threading lock in python not working as desired

I am trying to protect data inside my thread from the main thread. I have the following code:
import threading
import random
import time

lock = threading.Lock()

def createstuff(data):
    t = threading.Thread(target=func, args=(data,))
    t.start()

def func(val):
    with lock:
        print 'acquired'
        time.sleep(2)
        print ('Value: %s, : %s' % (val, threading.currentThread().getName()))
        print 'released\n'

ags_list = ['x']
createstuff(ags_list)

rn = random.randint(5, 50)
print 'random no:', rn
ags_list[0] = rn
It produces the Output:
acquired
random no: 10
Value: [10], : Thread-1
released
Why does changing the list in the main thread cause the list inside the other thread to mutate even though it is locked? What can I do to prevent it? Thanks.
The lock only works if you use it everywhere you mutate the list; it's not a magic spell that protects the object when you take the lock in only one place.
To protect the list you need to take the lock in both threads:
import threading
import random
import time

lock = threading.Lock()

def createstuff(data):
    t = threading.Thread(target=func, args=(data,))
    t.start()

def func(val):
    with lock:
        print 'thread: acquired'
        time.sleep(2)
        print ('Value: %s, : %s' % (val, threading.currentThread().getName()))
        print 'thread: released'

ags_list = ['x']
createstuff(ags_list)

with lock:
    print 'main: acquired'
    rn = random.randint(5, 50)
    print 'random no:', rn
    ags_list[0] = rn
    print 'main: released'
You could create a thread-safe list such as:
import threading

class ThreadSafeList(list):
    def __init__(self, *args):
        super(ThreadSafeList, self).__init__(*args)
        self.lock = threading.Lock()

    def __setitem__(self, idx, value):
        with self.lock:
            print 'list acquired'
            super(ThreadSafeList, self).__setitem__(idx, value)
            print 'list released'
and then use it:
def createstuff(data):
    t = threading.Thread(target=func, args=(data,))
    t.start()

def func(val):
    time.sleep(2)
    print ('Value: %s, : %s' % (val, threading.currentThread().getName()))

args_list = ThreadSafeList(['x'])
createstuff(args_list)

rn = random.randint(5, 50)
print 'random no:', rn
args_list[0] = rn
Of course this is only an example that needs to be completed and improved; here I preferred to focus on the main point.
That said, you do not actually need a lock in the thread, because reading a value from a list is (as far as I can tell) an atomic, read-only action: the mutation of the list can only happen before or after the value is read, not while it is being read. So in the end you should not have any race issue in this particular example.
If you were modifying the list's values, or doing a non-atomic access to the data, then the lock would be useful.
N.B.: in case you thought it could work any other way: the mutex mechanism (implemented through Lock) does not protect data; it prevents two threads of execution from executing at the same time. If Thread A acquires a lock before Thread B tries to acquire the same lock, Thread B waits until Thread A releases it before doing its work.
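To make that concrete, here is a minimal Python 3 sketch (the names are illustrative): the lock serializes the two read-modify-write blocks, so the increments cannot interleave.

import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:        # only one thread may run this block at a time
            counter += 1  # the read-modify-write is now effectively atomic

t1 = threading.Thread(target=bump, args=(100000,))
t2 = threading.Thread(target=bump, args=(100000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)  # always 200000 with the lock; typically less without it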
In Python, lists are passed by reference, so any change to the list is reflected anywhere else that list is being used.
Here is a link to the docs that might help clear a few more things up.
import threading
import random
import time

lock = threading.Lock()

def createstuff(data):
    t = threading.Thread(target=func, args=(data,))
    t.start()

def func(val):
    with lock:
        print 'acquired'
        time.sleep(2)
        print ('Value: %s, : %s' % (val, threading.currentThread().getName()))
        print 'released\n'

ags_list = ['x']
# you would need to create a separate copy of that list
# and hand that copy to the thread:
new_ags_list = ags_list[:]
createstuff(new_ags_list)

rn = random.randint(5, 50)
print 'random no:', rn
ags_list[0] = rn  # mutates the original only; the thread's copy is unaffected
