How to run Threads synchronously with Python?

I need to use 3 threads to print array items sequentially using Python.
Each Thread will print one array item.
I need the threads to sleep for a random number of seconds and then print the item.
This function will be executed N times, where N is given by the user.
The items must be printed in a specific order, which means I have to somehow block the other threads from executing while the previous one is not done.
I've been trying a lot of different solutions but I can't figure out how to make it work.
I've tried to use Semaphores, Lock and Events, but without success on the synchronization. In all cases the items were printed in random order, determined by the time.sleep calls rather than by the intended sequence. How can I block a thread from executing the function, and check whether the previous thread has finished, so that the sequence works?
Which tool should I use to make it work? Any help is appreciated.
import threading
import time
import random
import Queue

# The globals referenced below were missing from the post; assumed definitions:
q = Queue.Queue()
semaphore = threading.Semaphore()
e = threading.Event()
cores = ['red', 'green', 'blue']  # one item per thread (example values)

class myThread(threading.Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        super(myThread, self).__init__()
        self.target = target
        self.name = name
        return

    def run(self):
        while True:
            if not q.empty():
                semaphore.acquire()
                try:
                    time_sleep = random.randrange(0, 10)
                    print "thread " + self.name + ". Sleeping for " + str(time_sleep) + " seconds"
                    time.sleep(time_sleep)
                    print cores[int(self.name)]
                    if int(self.name) == len(cores) - 1:
                        item = q.get()
                        print 'Executed the sequence ' + str(item + 1) + ' times. Will still execute ' + str(q.qsize()) + ' more times'
                        e.set()
                finally:
                    semaphore.release()
                if int(self.name) != len(cores) - 1:
                    e.wait()
        return

if __name__ == '__main__':
    for i in range(2):
        q.put(i)
    for i in range(3):
        t = myThread(name=i)
        t.start()

There are many, many approaches to this. A simple one is to use a shared Queue of numbers.
Each thread can sleep for however long it wants to, take a number from the queue when it wakes up, and print it. They will come out in the order they were pushed to the queue.
If your numbers are sequential, or can be generated dynamically, you can also do it in constant memory using a shared counter, as described in this answer.
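For instance, a minimal sketch of the queue idea (not the poster's code; the items and thread count are made up):

import queue
import random
import threading
import time

q = queue.Queue()
for item in ['red', 'green', 'blue']:  # hypothetical items, pushed in the desired order
    q.put(item)

def worker():
    time.sleep(random.randrange(0, 10))  # sleep for however long you want
    print(q.get())                       # get() hands items out in FIFO order

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()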

If you didn't care about order, you could just use a lock to synchronize access. Since you do, though, how about a list of events? Each thread gets its own event slot and signals the next slot in the list when done. This scheme could be fancied up by returning a context manager so that you don't need to release explicitly (a sketch of that variant follows the code).
import threading

class EventSlots(object):
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.events = [threading.Event() for _ in range(num_slots)]
        self.events[0].set()

    def wait_slot(self, slot_num):
        self.events[slot_num].wait()
        self.events[slot_num].clear()

    def release_slot(self, slot_num):
        self.events[(slot_num + 1) % self.num_slots].set()

def worker(event_slots, slot_num):
    for i in range(5):
        event_slots.wait_slot(slot_num)
        print('slot', slot_num, 'iteration', i)
        event_slots.release_slot(slot_num)

NUM = 3
slots = EventSlots(NUM)
threads = []
for slot in range(NUM):
    t = threading.Thread(target=worker, args=(slots, slot))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
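To illustrate the "fancied up" variant mentioned above, here is a sketch using contextlib; the slot() method is my addition, not part of the code above:

import contextlib
import threading

class EventSlots(object):
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.events = [threading.Event() for _ in range(num_slots)]
        self.events[0].set()

    @contextlib.contextmanager
    def slot(self, slot_num):
        # wait for this slot's turn, run the body, then signal the
        # next slot even if the body raises
        self.events[slot_num].wait()
        self.events[slot_num].clear()
        try:
            yield
        finally:
            self.events[(slot_num + 1) % self.num_slots].set()

def worker(event_slots, slot_num):
    for i in range(5):
        with event_slots.slot(slot_num):  # no explicit release needed
            print('slot', slot_num, 'iteration', i)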

Related

How to properly make a queue? Why does sleep on consumer make the queue work?

I'm trying to implement a queue. This is old code which was either taken from some kind of tutorial that I did some time ago, or from some kind of experimentation that I did reading the docs, or a mix of the two. The thing is, I'm not sure if the code is mine or not, but I'm trying to use it as an example to learn from. The script has a producer that produces numbers in a list and two consumers competing to grab those numbers and add them up; the one with the highest sum wins.
So, here's my question: in the following code, in the consume_numbers function, I have a time.sleep(0.01) line which makes the code run. Without it the code hangs; with it, it runs smoothly. Can someone explain why this happens, and how I could implement a queue without this issue?
import concurrent.futures
import time
import random
import threading
import queue

class MyQueue(queue.Queue):
    def __init__(self, maxsize=10):
        super().__init__()
        self.maxsize = maxsize
        self.numbers = []

    def set_number(self, number):
        self.put(number)
        self.numbers.append(number)

    def get_number(self):
        return self.get()

def produce_random_numbers(q: MyQueue, maxcount: int, evnt: threading.Event):
    count = 0
    while not evnt.is_set():
        num = random.randint(1, 5)
        q.set_number(num)
        count += 1
        if count > maxcount:
            event.set()

def consume_numbers(q: MyQueue, consumed: list, evnt: threading.Event):
    while not q.empty() or not evnt.is_set():
        num = q.get_number()
        time.sleep(0.01)
        consumed.append(num)

if __name__ == "__main__":
    q = MyQueue(maxsize=10)
    event = threading.Event()
    cons1 = []
    cons2 = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
        ex.submit(produce_random_numbers, q, 50, event)
        ex.submit(consume_numbers, q, cons1, event)
        ex.submit(consume_numbers, q, cons2, event)
    event.set()
    print(f'Generated Numbers: {q.numbers}')
    print(f'Numbers Consumed by Thread1 which summed up to {sum(cons1)} are: {cons1}')
    print(f'Numbers Consumed by Thread2 which summed up to {sum(cons2)} are: {cons2}')
    if sum(cons1) > sum(cons2):
        print("Thread1 Wins!")
    elif sum(cons1) < sum(cons2):
        print("Thread2 Wins!")
    else:
        print("It's a tie!")
Thanks!
The code does not implement a queue from scratch, but extends queue.Queue to add memory. There is an event object that is used to signal to the consumers that the producer thread has finished. There are hidden race conditions in the consumers when there is only one item left on the queue.
The check not q.empty() or not evnt.is_set() will run the loop body if there is something in the queue or if the event has not been set yet. It could happen that:
One thread sees that the queue is not empty and enters the loop
A thread switch happens, and the other thread consumes the last item
A switch happens to the first thread, which calls get_number() and blocks
A similar race condition happens with the evnt.is_set() check:
The producer adds the last item to the queue, and a thread switch happens before it can set the event
A consumer gets the last item and goes back to the loop condition
As the event has not been set yet, the loop body runs again and get_number() blocks
Having the threads wait minimizes the chance of these conditions happening. Without waiting, it is very likely that a single consumer thread will consume all the queue items, while the other one is still entering its loop.
Using timeouts is cumbersome. A useful idiom that avoids events altogether is to use iter() with an impossible value as a sentinel:
# --- snip ---

def produce_random_numbers(q: MyQueue, maxcount: int, n_consumers: int):
    for _ in range(maxcount):
        num = random.randint(1, 5)
        q.set_number(num)
    for _ in range(n_consumers):
        q.put(None)  # <--- I use put to put one sentinel per consumer

def consume_numbers(q: MyQueue, consumed: list):
    for num in iter(q.get_number, None):
        consumed.append(num)

if __name__ == "__main__":
    q = MyQueue(maxsize=10)
    cons1 = []
    cons2 = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
        ex.submit(produce_random_numbers, q, 500000, 2)
        ex.submit(consume_numbers, q, cons1)
        ex.submit(consume_numbers, q, cons2)
    print(f'Generated Numbers: {q.numbers}')

# --- snip ---
There are some other issues and things I would have done differently:
The event.set() after the with... block is useless: the event has already been set by the producer
There is a typo in the producer: the global event variable is used instead of the local evnt parameter. Fortunately, both refer to the same object.
As there is only one producer, there will be no problem. Otherwise the order of MyQueue.numbers could be different from the order in which the items were added to the queue:
put is called on one thread
a thread switch happens
a put + append happens in the new thread
a thread switch happens, and the first value is appended
Instead of defining MyQueue.set_number, I would have overridden put.
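For example, a minimal sketch of what overriding put could look like (the None check is my assumption, so that the sentinels from the iter() idiom above are not recorded):

import queue

class MyQueue(queue.Queue):
    def __init__(self, maxsize=10):
        super().__init__(maxsize)
        self.numbers = []

    def put(self, item, block=True, timeout=None):
        super().put(item, block, timeout)
        if item is not None:  # don't record sentinel values
            self.numbers.append(item)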

Difference in starting threading.Thread objects from a list in python3

I am trying to do an exercise about the use of multi-threading in Python. This is the task: "Write a program that increments a counter shared by two or more threads up until a certain threshold. Consider various numbers of threads you can use and various initial values and thresholds. Every thread increases the value of the counter by one, if this is lower than the threshold, every 2 seconds."
My attempt at solving the problem is the following:
from threading import Thread
import threading
import time

lock = threading.Lock()

class para:
    def __init__(self, value):
        self.para = value

class myT(Thread):
    def __init__(self, nome, para, end, lock):
        Thread.__init__(self)
        self.nome = nome
        self.end = end
        self.para = para
        self.lock = lock

    def run(self):
        while self.para.para < self.end:
            self.lock.acquire()
            self.para.para += 1
            self.lock.release()
            time.sleep(2)
            print(self.nome, self.para.para)

para = para(1)
threads = []
for i in range(2):
    t = myT('Thread' + str(i), para, 15, lock)
    threads.append(t)
for i in range(len(threads)):
    threads[i].start()
    threads[i].join()
print('End code')
I have found an issue:
for i in range(len(threads)):
    threads[i].start()
    threads[i].join()
The for loop makes just one thread start while the others are not started (in fact, the output is just the thread named 'Thread0' incrementing the variable). While if I type manually:
threads[0].start()
threads[1].start()
threads[0].join()
threads[1].join()
I get the correct output, meaning that both threads are working at the same time.
Writing the joins outside that loop, in a second loop just for the joins, seems to solve the issue, but I do not completely understand why:
for i in range(len(threads)):
    threads[i].start()
for i in range(len(threads)):
    threads[i].join()
I wanted to ask here for an explanation of the correct way to solve the task using multi-threading in Python.
Here's an edit of your code and some observations.
Threads share the same memory space; therefore, there's no need to pass a reference to the Lock object - it can simply live in global scope.
The Lock object supports __enter__ and __exit__ and can therefore be used as a context manager in a with statement.
In the first loop we build a list of all threads and also start them. Once they're all started, we use another loop to join them. Starting and joining inside the same loop, as in your code, serializes the threads: the main thread waits for each thread to finish before starting the next one, so the first thread does all the work before the second ever runs.
So now it looks like this:
from threading import Thread, Lock

class para:
    def __init__(self, value):
        self.para = value

class myT(Thread):
    def __init__(self, nome, para, end):
        super().__init__()
        self.nome = nome
        self.end = end
        self.para = para

    def run(self):
        while self.para.para < self.end:
            with LOCK:
                self.para.para += 1
            print(self.nome, self.para.para)

para = para(1)
LOCK = Lock()
threads = []
NTHREADS = 2
for i in range(NTHREADS):
    t = myT(f'Thread-{i}', para, 15)
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print('End code')

Can I assume my threads are done when threading.active_count() returns 1?

Given the following class:
from abc import ABCMeta, abstractmethod
from time import sleep
import threading
from threading import active_count, Thread

class ScraperPool(metaclass=ABCMeta):
    Queue = []
    ResultList = []

    def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
        # Initialize attributes
        self.MaxNumWorkers = MaxNumWorkers
        self.ItemsPerWorker = ItemsPerWorker
        self.Queue = Queue  # For testing purposes.

    def initWorkerPool(self, PrintIDs=True):
        for w in range(self.NumWorkers()):
            Thread(target=self.worker, args=(w + 1, PrintIDs,)).start()
            sleep(1)  # Explicitly wait one second for this worker to start.

    def run(self):
        self.initWorkerPool()
        # Wait until all workers (i.e. threads) are done.
        while active_count() > 1:
            print("Active threads: " + str(active_count()))
            sleep(5)
        self.HandleResults()

    def worker(self, id, printID):
        if printID:
            print("Starting worker " + str(id) + ".")
        while (len(self.Queue) > 0):
            self.scraperMethod()
        if printID:
            print("Worker " + str(id) + " is quitting.")
        # TODO: kill this thread.
        return

    def NumWorkers(self):
        return 1  # Simplified for testing purposes.

    @abstractmethod
    def scraperMethod(self):
        pass

class TestScraper(ScraperPool):
    def scraperMethod(self):
        # print("I am scraping.")
        # print("Scraping. Threads#: " + str(active_count()))
        temp_item = self.Queue[-1]
        self.Queue.pop()
        self.ResultList.append(temp_item)

    def HandleResults(self):
        print(self.ResultList)

ScraperPool.register(TestScraper)

scraper = TestScraper(Queue=["Jaap", "Piet"])
scraper.run()
print(threading.active_count())
# print(scraper.ResultList)
When all the threads are done, there's still one active thread - threading.active_count() on the last line gets me that number.
The active thread is <_MainThread(MainThread, started 12960)> - as printed with threading.enumerate().
Can I assume that all my threads are done when active_count() == 1?
Or can, for instance, imported modules start additional threads, so that my threads are actually done even when active_count() > 1 (which is also the loop condition I'm using in the run method)?
You can assume that your threads are done when active_count() reaches 1. The problem is, if any other module creates a thread, you'll never get to 1. You should manage your threads explicitly.
Example: You can put the threads in a list and join them one at a time. The relevant changes to your code are:
def __init__(self, Queue, MaxNumWorkers=0, ItemsPerWorker=50):
    # Initialize attributes
    self.MaxNumWorkers = MaxNumWorkers
    self.ItemsPerWorker = ItemsPerWorker
    self.Queue = Queue  # For testing purposes.
    self.WorkerThreads = []

def initWorkerPool(self, PrintIDs=True):
    for w in range(self.NumWorkers()):
        thread = Thread(target=self.worker, args=(w + 1, PrintIDs,))
        self.WorkerThreads.append(thread)
        thread.start()
        sleep(1)  # Explicitly wait one second for this worker to start.

def run(self):
    self.initWorkerPool()
    # Wait until all workers (i.e. threads) are done. Waiting in order,
    # so some threads further in the list may finish first, but we
    # will get to all of them eventually.
    while self.WorkerThreads:
        self.WorkerThreads.pop(0).join()
    self.HandleResults()
According to the docs, active_count() includes the main thread, so if you're at 1 then you're most likely done; but if you have another source of new threads in your program, then you may be done before active_count() hits 1.
I would recommend implementing an explicit join method on your ScraperPool: keep track of your workers and explicitly join them to the main thread when needed, instead of polling with active_count() calls.
Also, remember the GIL...
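A minimal sketch of that recommendation, building on the WorkerThreads list from the answer above (the join_workers name is made up):

from abc import ABCMeta

class ScraperPool(metaclass=ABCMeta):
    def __init__(self):
        self.WorkerThreads = []  # filled in by initWorkerPool, as shown above

    def join_workers(self):
        # Wait for exactly the threads we started; threads created by
        # other modules don't matter here, unlike with active_count().
        for thread in self.WorkerThreads:
            thread.join()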

Python: multithreading complex objects

class Job(object):
    def __init__(self, name):
        self.name = name
        self.depends = []
        self.waitcount = 0

    def work(self):
        pass  # does some work

    def add_dependent(self, another_job):
        self.depends.append(another_job)
        self.waitcount += 1
so, waitcount is based on the number of jobs you have in depends
job_board = {}

# create a dependency tree
for i in range(1000):
    # create random jobs
    j = Job(<new name goes here>)
    # add jobs to depends if dependent
    # record it in job_board
    job_board[j.name] = j

# example
# jobC is in self.depends of jobA and jobB
# jobC would have a waitcount of 2
rdyQ = Queue.Queue()

def worker():
    try:
        job = rdyQ.get()
        success = job.work()
        # if this job was successful create dependent jobs
        if success:
            for dependent_job in job.depends:
                dependent_job.waitcount -= 1
                if dependent_job.waitcount == 0:
                    rdyQ.put(dependent_job)
    finally:
        rdyQ.task_done()  # assumed; the try block was left unclosed in the post
and then I would create the threads:
for i in range(10):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

for job_name, job_obj in job_board.iteritems():
    if job_obj.waitcount == 0:
        rdyQ.put(job_obj)

while True:
    pass  # until all jobs finished, wait
Now here is an example:
# example
# jobC is in self.depends of jobA and jobB
# jobC would have a waitcount of 2
Now in this scenario, if both jobA and jobB were running and they both tried to decrement jobC's waitcount, weird things were happening.
So I put in a lock:
waitcount_lock = threading.Lock()
and changed this code to:
# if this job was successful create dependent jobs
if success:
    for dependent_job in job.depends:
        with waitcount_lock:
            dependent_job.waitcount -= 1
            if dependent_job.waitcount == 0:
                rdyQ.put(dependent_job)
and strange things still happened,
i.e. the same job was being processed by multiple threads, as if the job had been put into the queue twice.
Is it not best practice to modify nested objects when complex objects are being passed amongst threads?
Here's a complete, executable program that appears to work fine. I expect you're mostly seeing "weird" behavior because, as I suggested in a comment, you're counting job successors instead of job predecessors. So I renamed things with "succ" and "pred" in their names to make that much clearer. daemon threads are also usually a Bad Idea, so this code arranges to shut down all the threads cleanly when the work is over. Note too the use of assertions to verify that implicit beliefs are actually true ;-)
import threading
import Queue
import random

NTHREADS = 10
NJOBS = 10000

class Job(object):
    def __init__(self, name):
        self.name = name
        self.done = False
        self.succs = []
        self.npreds = 0

    def work(self):
        assert not self.done
        self.done = True
        return True

    def add_dependent(self, another_job):
        self.succs.append(another_job)
        another_job.npreds += 1

def worker(q, lock):
    while True:
        job = q.get()
        if job is None:
            break
        success = job.work()
        if success:
            for succ in job.succs:
                with lock:
                    assert succ.npreds > 0
                    succ.npreds -= 1
                    if succ.npreds == 0:
                        q.put(succ)
        q.task_done()

jobs = [Job(i) for i in range(NJOBS)]
for i, job in enumerate(jobs):
    # pick some random successors
    possible = xrange(i+1, NJOBS)
    succs = random.sample(possible,
                          min(len(possible),
                              random.randrange(10)))
    for succ in succs:
        job.add_dependent(jobs[succ])

q = Queue.Queue()
for job in jobs:
    if job.npreds == 0:
        q.put(job)
print q.qsize(), "ready jobs initially"

lock = threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(q, lock))
           for _ in range(NTHREADS)]
for t in threads:
    t.start()
q.join()
# add sentinels so threads end cleanly
for t in threads:
    q.put(None)
for t in threads:
    t.join()
for job in jobs:
    assert job.done
    assert job.npreds == 0
CLARIFYING THE LOCK
In a sense, the lock in this code protects "too much". The potential problem it's addressing is that multiple threads may try to decrement the .npreds member of the same Job object simultaneously. Without mutual exclusion, the stored value at the end of that may be anywhere from 1 smaller than its initial value, to the correct result (the initial value minus the number of threads trying to decrement it).
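As an illustration (my example, not part of the answer), an unprotected decrement of a shared attribute can lose updates in exactly this way:

import threading

class Counter(object):
    def __init__(self, n):
        self.n = n

c = Counter(0)

def dec():
    for _ in range(100000):
        c.n -= 1  # load, subtract, store: three steps, not atomic

threads = [threading.Thread(target=dec) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Without a lock this may print a value greater than -400000,
# because some decrements overwrite others' stores.
print(c.n)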
But there's no need to also mutate the queue under lock protection. Queues do their own thread-safe locking. So, e.g., the code could be written like so instead:
for succ in job.succs:
    with lock:
        npreds = succ.npreds = succ.npreds - 1
    assert npreds >= 0
    if npreds == 0:
        q.put(succ)
It's generally best practice to hold a lock for as little time as possible. However, I find this rewrite harder to follow. Pick your poison ;-)

How to close Threads in Python?

I have an issue with too many unfinished threads.
I think that the queue's .join() just closes the queue, not the threads using it.
In my script I need to check 280k domains, and for each domain get the list of its MX records and obtain the IPv6 addresses of those servers, if they have any.
I used threads, and thanks to them the script is many times faster. But there is a problem: although there is a join() on the queue, the number of alive threads keeps growing until an error occurs saying that no new thread can be created (an OS limitation?).
How can I terminate/close/stop/reset the threads after each for-loop iteration, when I retrieve a new domain from the database?
Thread Class definition...
class MX_getAAAA_thread(threading.Thread):
    def __init__(self, queue, id_domain):
        threading.Thread.__init__(self)
        self.queue = queue
        self.id_domain = id_domain

    def run(self):
        while True:
            self.mx = self.queue.get()
            res = dns.resolver.Resolver()
            res.lifetime = 1.5
            res.timeout = 0.5
            try:
                answers = res.query(self.mx, 'AAAA')
                ip_mx = str(answers[0])
            except:
                ip_mx = "N/A"
            lock.acquire()
            sql = "INSERT INTO mx (id_domain,mx,ip_mx) VALUES (" + str(self.id_domain) + ",'" + str(self.mx) + "','" + str(ip_mx) + "')"
            try:
                cursor.execute(sql)
                db.commit()
            except:
                db.rollback()
            print "MX", '>>', ip_mx, ' :: ', str(self.mx)
            lock.release()
            self.queue.task_done()
Thread class in use...
(The main for-loop is not shown here; this is just part of its body)
try:
    answers = resolver.query(domain, 'MX')
    qMX = Queue.Queue()
    for i in range(len(answers)):
        t = MX_getAAAA_thread(qMX, id_domain)
        t.setDaemon(True)
        threads.append(t)
        t.start()
    for mx in answers:
        qMX.put(mx.exchange)
    qMX.join()
except NoAnswer as e:
    print "MX - Error: No Answer"
except Timeout as etime:
    print "MX - Error: dns.exception.Timeout"

print "end of script"
I tried to:
for thread in threads:
    thread.join()
after the queue was done, but thread.join() never stops waiting, despite the fact that there is no need to wait: once queue.join() returns, there is nothing left for the threads to do.
What I often do when my thread involves an infinite loop like this is to change the condition to something I can control from the outside. For example:
def run(self):
    self.keepRunning = True
    while self.keepRunning:
        pass  # do stuff
That way, I can change the keepRunning property from the outside and set it to false to gracefully terminate the thread the next time it checks the loop condition.
Btw. as you seem to spawn exactly one thread for each item you put into the queue, you don’t even need to have the threads loop at all, although I would argue that you should always enforce a maximum limit of threads that can be created in this way (i.e. for i in range(min(len(answers), MAX_THREAD_COUNT)):)
Alternative
In your case, instead of terminating the threads in each for-loop iteration, you could just reuse the threads. From what I gather from your thread’s source, all that makes a thread unique to an iteration is the id_domain property you set on its creation. You could however just provide that as well with your queue, so the threads are completely independent and you can reuse them.
This could look like this:
qMX = Queue.Queue()
threads = []
for i in range(MAX_THREAD_COUNT):
    t = MX_getAAAA_thread(qMX)
    t.daemon = True
    threads.append(t)
    t.start()

for id_domain in enumerateIdDomains():
    answers = resolver.query(id_domain, 'MX')
    for mx in answers:
        qMX.put((id_domain, mx.exchange))  # insert a tuple

qMX.join()
for thread in threads:
    thread.keepRunning = False
Of course, you would need to change your thread a bit then:
class MX_getAAAA_thread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        self.keepRunning = True
        while self.keepRunning:
            id_domain, mx = self.queue.get()
            # do stuff
I do not see why you need a Queue in the first place.
After all in your design every thread just processes one task.
You should be able to pass that task to the thread on creation.
This way you do not need a Queue and you get rid of the while-loop:
class MX_getAAAA_thread(threading.Thread):
    def __init__(self, id_domain, mx):
        threading.Thread.__init__(self)
        self.id_domain = id_domain
        self.mx = mx
Then you can get rid of the while-loop inside the run method:
def run(self):
    res = dns.resolver.Resolver()
    res.lifetime = 1.5
    res.timeout = 0.5
    try:
        answers = res.query(self.mx, 'AAAA')
        ip_mx = str(answers[0])
    except:
        ip_mx = "N/A"
    with lock:
        sql = "INSERT INTO mx (id_domain,mx,ip_mx) VALUES (" + str(self.id_domain) + ",'" + str(self.mx) + "','" + str(ip_mx) + "')"
        try:
            cursor.execute(sql)
            db.commit()
        except:
            db.rollback()
        print "MX", '>>', ip_mx, ' :: ', str(self.mx)
Create one thread for each task:
for mx in answers:
    t = MX_getAAAA_thread(id_domain, mx)
    t.setDaemon(True)
    threads.append(t)
    t.start()
and join them:
for thread in threads:
    thread.join()
Joining the threads will do the trick, but the joins in your case are blocking indefinitely because your threads aren't ever exiting your run loop. You need to exit the run method so that the threads can be joined.
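For example, a minimal Python 3 style sketch of that sentinel approach (the names are made up, not the poster's): each worker leaves run() when it sees None, so the final joins return.

import queue
import threading

class Worker(threading.Thread):  # hypothetical worker
    def __init__(self, q):
        threading.Thread.__init__(self)
        self.q = q

    def run(self):
        while True:
            item = self.q.get()
            if item is None:  # sentinel: leave run() so join() can return
                break
            # ... process item ...
            self.q.task_done()

q = queue.Queue()
workers = [Worker(q) for _ in range(4)]
for w in workers:
    w.start()
for item in ['a', 'b', 'c']:  # stand-ins for the real work items
    q.put(item)
q.join()  # wait for the real work to finish
for _ in workers:
    q.put(None)  # one sentinel per worker
for w in workers:
    w.join()  # returns, because run() exited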
