I have an application with one producer, many consumers, and a queue that lets them communicate.
Each consumer should collect some data from the queue, say qsize()/number_of_consumers elements, but it must stop its work when a sentinel appears.
I have code like this:
frame = 0
elems_max = 10
while frame is not None:
    frames = []
    for _ in range(elems_max):
        frame = queue_in.get()
        if frame:
            frames.append(frame)
        else:
            break
    process_data(frames)
As you can see, None is the sentinel for this queue, and when it appears I want to stop my worker process. I also want to fetch more than one element at a time for data processing.
What is the fastest way to achieve this (in Python 3.5)?
I understand that you want to break out of the outer while loop when encountering a None.
You can keep a boolean variable that is True while the while loop must keep executing, and becomes False when it should stop.
That would look like this:
frame = 0
elems_max = 10
running = True
while running and frame is not None:
    frames = []
    for _ in range(elems_max):
        frame = queue_in.get()
        if frame is not None:
            frames.append(frame)
        else:
            running = False
            break
    process_data(frames)
The break statement exits the inner for loop, but not the outer while.
However, since running has been set to False, the while loop will stop.
Edit, based on your comment:
It is not possible to include a break statement in a comprehension, nor an else clause after its condition, as you wanted to do:
frames = [f for i in range(elems_max) if queue_in.get() is not None else break]
However, you can build your list, and then truncate it at the first None:
frames = [queue_in.get() for _ in range(elems_max)]
try:
    none_id = frames.index(None)
    frames = frames[:none_id]
except ValueError:
    pass
This is not very efficient, because potentially many elements are pulled off the queue for nothing, and any elements fetched after the None are silently dropped.
I would prefer a manual construction, to avoid this hazard.
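Alternatively, here is a sketch using the two-argument form of the built-in iter(), which treats None as a sentinel automatically, plus itertools.islice() to cap each batch at elems_max items:
from itertools import islice

# iter(queue_in.get, None) keeps calling get() and stops for good
# as soon as it returns None, so the sentinel never lands in frames.
stream = iter(queue_in.get, None)
while True:
    frames = list(islice(stream, elems_max))
    if not frames:  # the sentinel was hit and nothing is left
        break
    process_data(frames)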
One more solution, based on a generator.
This might not be what you expected, but the syntax is rather simple, so you may like it.
The idea is to wrap the fetching of the data in a generator that stops on a None value:
def queue_data_generator(queue, count):
    for _ in range(count):
        item = queue.get()
        if item is None:
            return  # ends the generator; raising StopIteration here breaks on Python 3.7+ (PEP 479)
        yield item
Then, instantiate this generator, and simply iterate over it:
g = queue_data_generator(queue_in, elems_max)
frames = [frame for frame in g]
The frames list will contain all the frames from queue_in up to the first None (or at most elems_max of them).
The usage is rather simple, but you have to set it up by defining the generator first.
I think it's pretty elegant though.
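If you need to keep draining the queue batch by batch until the sentinel, the batch length tells you whether the generator stopped early; a sketch, assuming a batch is only ever short because the sentinel was seen:
while True:
    frames = list(queue_data_generator(queue_in, elems_max))
    if frames:
        process_data(frames)
    if len(frames) < elems_max:  # the generator ended early, so the sentinel was seen
        break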
I would do the following (kinda pseudocode):
class CInputQueue:
    def get(self, preferred_n):
        # do sync stuff
        # take <= preferred_n elements (this also lets you balance the load)
        # or throw an exception
        raise Exception("No data, no work, no life.")
elems_max = 10
try:
    while True:
        process_data(queue_in.get(elems_max))
except Exception:
    pass  # break
I assume that data processing takes much more time than 0 ms, so I use an exception. I know it's not okay to use exceptions for flow control, but for a worker this really is exceptional: its "life" is built around processing data, and now there is no work for it, not even a sleep task.
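Here is a minimal runnable sketch of this idea, with hypothetical names (NoWork, shared_queue); it wraps a standard queue and raises NoWork when the None sentinel arrives:
class NoWork(Exception):
    """Raised when the sentinel arrives and nothing is left to hand out."""

class CInputQueue:
    def __init__(self, inner):
        self.inner = inner  # a queue.Queue shared with the producer

    def get(self, preferred_n):
        items = []
        for _ in range(preferred_n):
            item = self.inner.get()
            if item is None:          # the sentinel
                self.inner.put(None)  # re-post it so other consumers see it too
                break
            items.append(item)
        if not items:
            raise NoWork("No data, no work, no life.")
        return items

# Worker loop; shared_queue is assumed to be fed by the producer.
elems_max = 10
q_in = CInputQueue(shared_queue)
try:
    while True:
        process_data(q_in.get(elems_max))
except NoWork:
    pass  # break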
Related
I want to loop over tasks, again and again, until a certain condition is reached, before continuing with the rest of the workflow.
What I have so far is this:
# Loop task
class MyLoop(Task):
    def run(self):
        loop_res = prefect.context.get("task_loop_result", 1)
        print(loop_res)
        if loop_res >= 10:
            return loop_res
        raise LOOP(result=loop_res + 1)
But as far as I understand, this does not work across multiple tasks.
Is there a way to go back further and loop over several tasks at a time?
The solution is simply to create a single task that itself creates a new flow with one or more parameters and calls flow.run(). For example:
class MultipleTaskLoop(Task):
    def run(self):
        # Get previous value
        loop_res = prefect.context.get("task_loop_result", 1)
        # Create subflow
        with Flow('Subflow', executor=LocalDaskExecutor()) as flow:
            x = Parameter('x', default=1)
            loop1 = print_loop()
            add = add_value(x)
            loop2 = print_loop()
            loop1.set_downstream(add)
            add.set_downstream(loop2)
        # Run subflow and extract result
        subflow_res = flow.run(parameters={'x': loop_res})
        new_res = subflow_res.result[add]._result.value
        # Loop
        if new_res >= 10:
            return new_res
        raise LOOP(result=new_res)
where print_loop simply prints "loop" in the output and add_value adds one to the value it receives.
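For reference, print_loop and add_value might look something like this sketch (assuming Prefect 1.x's task decorator):
from prefect import task

@task
def print_loop():
    # Print a marker so each pass of the subflow is visible in the output.
    print("loop")

@task
def add_value(x):
    # Add one to the value carried through the loop.
    return x + 1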
Unless I'm missing something, the answer is no.
Prefect flows are DAGs, and what you are describing (looping over multiple tasks in order again and again until some condition is met) would create a cycle, so you can't do it.
This may or may not be helpful, but you could merge all of the tasks you want to loop into one task, and loop within that task until your exit condition has been met, as sketched below.
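A sketch of that single-task approach (do_step_one and do_step_two are hypothetical stand-ins for your looped tasks):
from prefect import Task

class LoopedWork(Task):
    def run(self):
        value = 1
        # The whole loop lives inside one task, so the flow stays a DAG.
        while value < 10:
            do_step_one(value)          # formerly the first looped task
            value = do_step_two(value)  # formerly the second looped task
        return value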
The following code gathers data from an API, and the try/except clause helps handle several kinds of errors (authentication, index, anything).
There's one error (an authentication error) for which I'm using while True to repeat the API call, to make sure I get the data; it succeeds after a try or two. However, if I hit any other error, the loop repeats infinitely and I can't break out of it to move on to the next iteration. I tried to create a counter so that when it reaches a certain number the code would pass, continue, or break, but it's not working.
## Create an array to loop over:
data_array_query = pd.date_range(start_date, end_date, freq='6H')
# This is my idea but it is not working
# Create a counter
counter = 0
# Loop through the just-created array
for idx in range(len(data_array_query) - 1):
    ## If the counter reaches the max, move on to the next for-loop element
    while True:
        if counter >= 5:
            break
        else:
            try:
                start_date = data_array_query[idx]
                end_date = data_array_query[idx + 1]
                print('from', start_date, 'to', end_date)
                df = api.query(domain, site_slug, resolution, data_series_collection, start_date=str(start_date), end_date=str(end_date), env='prod', from_archive=True, phase='production').sort_index()
                print(df.info())
                break
            except Exception as e:
                print(e)
                counter += 1
                print(counter)
The output of running this code for a couple of days shows that when it fails 5 times (the counter max I set up), it does break, but it stops for all the remaining dates, and I only want it to move on to the next date.
Any help will be appreciated.
You need a break statement to get out of a while True loop; pass and continue will not end it. They make sense in for loops, which have a fixed number of iterations, whereas a while True loop can go on forever (hence the different names). Note also that your counter is never reset, so once it reaches 5, every later date is skipped as well; see the sketch below.
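For example, a sketch based on the code in the question (same hypothetical api.query call), with the counter reset moved inside the for loop so that five failures only skip the current date:
for idx in range(len(data_array_query) - 1):
    counter = 0  # reset the retry budget for every date
    while counter < 5:
        try:
            start_date = data_array_query[idx]
            end_date = data_array_query[idx + 1]
            df = api.query(domain, site_slug, resolution, data_series_collection,
                           start_date=str(start_date), end_date=str(end_date),
                           env='prod', from_archive=True, phase='production').sort_index()
            break  # success: leave the retry loop, move on to the next date
        except Exception as e:
            print(e)
            counter += 1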
I have a piece of code with a multiprocessing implementation:
q = range(len(aaa))
w = range(len(aab))
e = range(len(aba))
paramlist = list(itertools.product(q, w, e))

def f(combinations):
    q = combinations[0]
    w = combinations[1]
    e = combinations[2]
    # the rest of the function

if __name__ == '__main__':
    pool = mul.Pool(4)
    res_p = pool.map(f, paramlist)
    for _ in tqdm.tqdm(res_p, total=len(paramlist)):
        pass
    pool.close()
    pool.join()
where aaa, aab, and aba are lists of triples, e.g.:
aaa = [[1,2,3], [3,5,1], ...]
I wanted to use imap() so that I could follow the calculation progress with the tqdm module.
But why does map() give list(res_p) the correct length, while after changing to imap() the list is empty? And can you track progress using map()?
tqdm doesn't work with map because map is blocking: it waits for all results and then returns them as a list. By the time your loop executes, the only progress left to be made is whatever happens in that loop itself; the parallel phase has already completed.
imap does not block, since it returns just an iterator, i.e. a thing you can ask for the next result, and the next result, and the next result. Only when you do that, by looping over it, is each result waited for, one after another. Because it is an iterator, once all results have been consumed (at the end of your loop) it is empty, so there is nothing left to put in a list. If you wish to keep the results, you could append each one inside the loop, or change the code to this:
res_p = list(tqdm.tqdm(pool.imap(f, paramlist), total=len(paramlist)))
for res in res_p:
    ...  # Do stuff
I was reading the code of a proxy server developed in Python.
I don't understand the method _read_write, which uses select to shuttle data between the client and server sockets.
def _read_write(self):
    time_out_max = self.timeout / 3
    socs = [self.client, self.target]
    count = 0
    while 1:
        count += 1
        (recv, _, error) = select.select(socs, [], socs, 3)
        if error:
            break
        if recv:
            for in_ in recv:
                data = in_.recv(BUFLEN)
                if in_ is self.client:
                    out = self.target
                else:
                    out = self.client
                if data:
                    out.send(data)
                    count = 0
        if count == time_out_max:
            break
Could someone please help me understand it?
Here is my quick and dirty annotation:
def _read_write(self):
    # This allows us to get multiple
    # lower-level timeouts before we give up.
    # (but see later note about Python 3)
    time_out_max = self.timeout / 3
    # We have two sockets we care about
    socs = [self.client, self.target]
    # Loop until error or timeout
    count = 0
    while 1:
        count += 1
        # select is very efficient. It will let
        # other processes execute until we have
        # data or an error.
        # We only care about receive and error
        # conditions, so we pass in an empty list
        # for transmit, and assign transmit results
        # to the _ variable to ignore.
        # We also pass a timeout of 3 seconds, which
        # is why it's OK to divide the timeout value
        # by 3 above.
        # Note that select doesn't read anything for
        # us -- it just blocks until data is ready.
        (recv, _, error) = select.select(socs, [], socs, 3)
        # If we have an error, break out of the loop
        if error:
            break
        # If we have receive data, it's from the client
        # for the target, or the other way around, or
        # even both. Loop through and deal with whatever
        # receive data we have and send it to the other
        # port.
        # BTW, "if recv" is redundant here -- (a) in
        # general (except for timeouts) we'll have
        # receive data here, and (b) the for loop won't
        # execute if we don't.
        if recv:
            for in_ in recv:
                # Read data up to a max of BUFLEN
                data = in_.recv(BUFLEN)
                # Dump the data out the other side.
                # Indexing probably would have been
                # more efficient than this if/else.
                if in_ is self.client:
                    out = self.target
                else:
                    out = self.client
                # I think this may be a bug. IIRC,
                # send is not required to send all the
                # data, but I don't remember and cannot
                # be bothered to look it up right now.
                if data:
                    out.send(data)
                    # Reset the timeout counter.
                    count = 0
        # This is ugly -- should be >=, then it might
        # work even on Python 3...
        if count == time_out_max:
            break
    # We're done with the loop and exit the function on
    # either a timeout or an error.
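On the possible partial-send bug flagged above: socket.sendall() retries internally until the whole buffer has been written, so that spot could be hardened with a sketch like this:
if data:
    # sendall() keeps calling send() until every byte is transmitted,
    # which avoids silently dropping the tail of a large buffer.
    out.sendall(data)
    count = 0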
I am writing some code to build a table of variable-length (Huffman) codes, and I wanted to use the multiprocessing module for fun. The idea is to have each process try to get a node from the queue. They do work on the node, and either put that node's two children back into the work queue, or put the finished variable-length code into the result queue. They also pass messages to a message queue, which should be printed by a thread in the main process. Here is the code so far:
import Queue
import multiprocessing as mp
from threading import Thread
from collections import Counter, namedtuple
Node = namedtuple("Node", ["child1", "child2", "weight", "symbol", "code"])
def _sort_func(node):
    return node.weight
def _encode_proc(proc_number, work_queue, result_queue, message_queue):
    while True:
        try:
            # get a node from the work queue
            node = work_queue.get(timeout=0.1)
            # if it is an end node, add the symbol-code pair to the result queue
            if node.child1 == node.child2 == None:
                message_queue.put("Symbol processed! : proc%d" % proc_number)
                result_queue.put({node.symbol: node.code})
            # otherwise do some work and add some nodes to the work queue
            else:
                message_queue.put("More work to be done! : proc%d" % proc_number)
                node.child1.code.append(node.code + '0')
                node.child2.code.append(node.code + '1')
                work_queue.put(node.child1)
                work_queue.put(node.child2)
        except Queue.Empty:  # everything is probably done
            return
def _reporter_thread(message_queue):
    while True:
        try:
            message = message_queue.get(timeout=0.1)
            print message
        except Queue.Empty:  # everything is probably done
            return
def _encode_tree(tree, symbol_count):
    """Uses multiple processes to walk the tree and build the huffman codes."""
    # Create a manager to manage the queues, and a pool of workers.
    manager = mp.Manager()
    worker_pool = mp.Pool()
    # create the queues you will be using
    work = manager.Queue()
    results = manager.Queue()
    messages = manager.Queue()
    # add work to the work queue, and start the message printing thread
    work.put(tree)
    message_thread = Thread(target=_reporter_thread, args=(messages,))
    message_thread.start()
    # add the workers to the pool and close it
    for i in range(mp.cpu_count()):
        worker_pool.apply_async(_encode_proc, (i, work, results, messages))
    worker_pool.close()
    # get the results from the results queue, and update the table of codes
    table = {}
    while symbol_count > 0:
        try:
            processed_symbol = results.get(timeout=0.1)
            table.update(processed_symbol)
            symbol_count -= 1
        except Queue.Empty:
            print "WAI DERe NO SYMBOLzzzZzz!!!"
        finally:
            print "Symbols to process: %d" % symbol_count
    return table
def make_huffman_table(data):
    """
    data is an iterable containing the string that needs to be encoded.
    Returns a dictionary mapping symbols to codes.
    """
    # Build a list of Nodes out of the characters in data
    nodes = [Node(None, None, weight, symbol, bytearray()) for symbol, weight in Counter(data).items()]
    nodes.sort(reverse=True, key=_sort_func)
    symbols = len(nodes)
    append_node = nodes.append
    while len(nodes) > 1:
        # make a new node out of the two nodes with the lowest weight and add it to the list of nodes
        child2, child1 = nodes.pop(), nodes.pop()
        new_node = Node(child1, child2, child1.weight + child2.weight, None, bytearray())
        append_node(new_node)
        # then resort the nodes
        nodes.sort(reverse=True, key=_sort_func)
    top_node = nodes[0]
    return _encode_tree(top_node, symbols)
def chars(fname):
    """
    A simple generator to make reading from files without loading them
    totally into memory a simple task.
    """
    f = open(fname)
    char = f.read(1)
    while char != '':
        yield char
        char = f.read(1)
    f.close()
    raise StopIteration
if __name__ == "__main__":
    text = chars("romeo-and-juliet.txt")
    table = make_huffman_table(text)
    print table
The current output of this is:
More work to be done! : proc0
WAI DERe NO SYMBOLzzzZzz!!!
Symbols to process: 92
WAI DERe NO SYMBOLzzzZzz!!!
Symbols to process: 92
WAI DERe NO SYMBOLzzzZzz!!!
Symbols to process: 92
It just repeats the last bit forever. After the first process adds work to the queue, everything just stops. Why is that? Am I misunderstanding or misusing the queues? Sorry for all the code to read.
Your first problem is trying to use timeouts. They're almost never a good idea. They may be a good idea if you can't possibly think of a reliable way to do something efficiently, and you use timeouts only as a first step in checking whether something is really done.
That said, the primary problem is that multiprocessing is often very bad at reporting exceptions that occur in worker processes. Your code is actually dying here:
node.child1.code.append(node.code + '0')
The error message you're not seeing is "an integer or string of size 1 is required". You can't append a bytearray to a bytearray. You want to do :
node.child1.code.extend(node.code + '0')
^^^^^^
instead, and in the similar line for child2. As is, because the first worker process to take something off the work queue dies, nothing more is ever added to the work queue. That explains everything you've seen - so far ;-)
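For illustration, a quick Python 2 session showing the difference:
>>> ba = bytearray()
>>> ba.append(bytearray('0'))
Traceback (most recent call last):
  ...
TypeError: an integer or string of size 1 is required
>>> ba.extend(bytearray('0'))
>>> ba
bytearray(b'0')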
No timeouts
FYI, the usual approach to avoid timeouts (which are flaky - unreliable) is to put a special sentinel value on a queue. Consumers know it's time to quit when they see the sentinel, and use a plain blocking .get() to retrieve items from the queue. So first thing is to create a sentinel; e.g., add this near the top:
ALL_DONE = "all done"
Best practice is also to .join() threads and processes - that way the main program knows (doesn't just guess) when they're done too.
So, you can change the end of _encode_tree() like so:
    for i in range(1, symbol_count + 1):
        processed_symbol = results.get()
        table.update(processed_symbol)
        print "Symbols to process: %d" % (symbol_count - i)
    for i in range(mp.cpu_count()):
        work.put(ALL_DONE)
    worker_pool.join()
    messages.put(ALL_DONE)
    message_thread.join()
    return table
The key here is that the main program knows all the work is done when, and only when, no symbols remain to be processed. Until then, it can unconditionally .get() results from the results queue. Then it puts a number of sentinels on the work queue equal to the number of workers. They'll each consume a sentinel and quit. Then we wait for them to finish (worker_pool.join()). Then a sentinel is put on the message queue, and we wait for that thread to end too. Only then does the function return.
Now nothing ever terminates early, everything is shut down cleanly, and the output of your final table isn't mixed up anymore with various other output from the workers and the message thread. _reporter_thread() gets rewritten like so:
def _reporter_thread(message_queue):
    while True:
        message = message_queue.get()
        if message == ALL_DONE:
            break
        else:
            print message
and similarly for _encode_proc(). No more timeouts or try/except Queue.Empty: fiddling. You don't even have to import Queue anymore :-)
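For completeness, _encode_proc() rewritten the same way might look like this sketch (using the ALL_DONE sentinel and the extend() fix from above):
def _encode_proc(proc_number, work_queue, result_queue, message_queue):
    while True:
        node = work_queue.get()  # plain blocking get -- no timeout
        if node == ALL_DONE:     # sentinel: this worker is finished
            return
        if node.child1 == node.child2 == None:
            message_queue.put("Symbol processed! : proc%d" % proc_number)
            result_queue.put({node.symbol: node.code})
        else:
            message_queue.put("More work to be done! : proc%d" % proc_number)
            node.child1.code.extend(node.code + '0')
            node.child2.code.extend(node.code + '1')
            work_queue.put(node.child1)
            work_queue.put(node.child2)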