Looping tasks in Prefect - python

I want to loop over tasks, again and again, until reaching a certain condition before continuing the rest of the workflow.
What I have so far is this:
import prefect
from prefect import Task
from prefect.engine.signals import LOOP

# Loop task
class MyLoop(Task):
    def run(self):
        loop_res = prefect.context.get("task_loop_result", 1)
        print(loop_res)
        if loop_res >= 10:
            return loop_res
        raise LOOP(result=loop_res + 1)
But as far as I understand, this does not work for multiple tasks.
Is there a way to go back further and loop over several tasks at a time?

The solution is simply to create a single task that itself creates a new flow with one or more parameters and calls flow.run(). For example:
import prefect
from prefect import Flow, Parameter, Task
from prefect.engine.signals import LOOP
from prefect.executors import LocalDaskExecutor

class MultipleTaskLoop(Task):
    def run(self):
        # Get previous value
        loop_res = prefect.context.get("task_loop_result", 1)

        # Create subflow
        with Flow('Subflow', executor=LocalDaskExecutor()) as flow:
            x = Parameter('x', default=1)
            loop1 = print_loop()
            add = add_value(x)
            loop2 = print_loop()
            loop1.set_downstream(add)
            add.set_downstream(loop2)

        # Run subflow and extract result
        subflow_res = flow.run(parameters={'x': loop_res})
        new_res = subflow_res.result[add]._result.value

        # Loop
        if new_res >= 10:
            return new_res
        raise LOOP(result=new_res)
where print_loop simply prints "loop" in the output and add_value adds one to the value it receives.
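For reference, the two helper tasks could be as simple as the following sketch (their names come from the snippet above; the bodies are assumptions):
from prefect import task

@task
def print_loop():
    print("loop")

@task
def add_value(x):
    return x + 1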

Unless I'm missing something, the answer is no.
Prefect flows are DAGs, and what you are describing (looping over multiple tasks in order, again and again, until some condition is met) would create a cycle, so you can't do it.
This may or may not be helpful, but you could try to merge all of the tasks you want to loop into a single task, and loop within that task until your exit condition has been met.
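A minimal sketch of that idea, reusing the LOOP signal from the question (the two merged steps are assumptions based on the helper tasks described above):
import prefect
from prefect import Task
from prefect.engine.signals import LOOP

class CombinedLoop(Task):
    def run(self):
        value = prefect.context.get("task_loop_result", 1)
        print("loop")            # what print_loop did
        value = value + 1        # what add_value did
        if value >= 10:
            return value
        raise LOOP(result=value)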

Related

How do I run a conditional statement "only once" and every time it changes?

I might be asking a simple question. I have a python program that runs every minute, but I would like a block of code to run only when the condition changes. My code looks like this:
# def shortIndicator():
a = int(indicate_5min.value5)
b = int(indicate_10min.value10)
c = int(indicate_15min.value15)
if a + b + c == 3:
    print("Trade posible!")
else:
    print("Trade NOT posible!")

# This lets the processor work more than it should.
"""run_once = 0 # This lets the processor work more than it should.
while 1:
    if run_once == 0:
        shortIndicator()
        run_once = 1"""
I've run it without using a function, but then I get an output every minute. I've tried to run it as a function; when I enable the commented code it sort of runs, but the CPU usage is also higher. Is there perhaps a smarter way of doing this?
It's really not clear what you mean, but if you only want to print a notification when the result changes, add another variable to remember the previous result.
import time

def shortIndicator():
    return indicate_5min.value5 and indicate_10min.value10 and indicate_15min.value15

previous = None
while True:
    indicator = shortIndicator()
    if previous is None or indicator != previous:
        if indicator:
            print("Trade possible!")
        else:
            print("Trade NOT possible!")
    previous = indicator
    # take a break so as not to query too often
    time.sleep(60)
Initializing previous to None creates a third state which is only true the first time the while loop executes; by definition, the result cannot be identical to the previous result because there isn't really a previous result the first time.
Perhaps also notice the boolean shorthand inside the function, which is simpler and more idiomatic than converting each value to an int and checking their sum.
I'm guessing the time.sleep is what you were looking for to reduce the load of running this code repeatedly, though that part of the question remains really unclear.
Finally, check the spelling of possible.
If I understand it correctly, you can save the previous output to a file, read it back at the beginning of the program, and print output only if the previous output was different.
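A minimal sketch of that file-based idea (the file name, and reusing shortIndicator from the answer above, are assumptions):
import os

STATE_FILE = "last_indicator.txt"  # hypothetical state file

def read_previous():
    if not os.path.exists(STATE_FILE):
        return None
    with open(STATE_FILE) as f:
        return f.read().strip()

current = "possible" if shortIndicator() else "NOT possible"
if current != read_previous():
    print("Trade %s!" % current)
    with open(STATE_FILE, "w") as f:
        f.write(current)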

How to jump out the current while loop and run the next loop whenever meeting a certain condition?

The Python script that I am using is not exactly as below, but I just want to show the logic. Let me explain what I am trying to do: I have a database, and I want to fetch one population (a number A) at a time from the database until all numbers are fetched. Each time it fetches a number, it does some calculations (A to B) and stores the result in C. After everything is fetched, the database is updated with C. The while condition just works like a 'switch'.
The thing is that I don't want to fetch a negative number, so when it does fetch one, I want to immediately jump out of the current loop and get the next number, until it is not negative. I am a beginner in Python. The following script is what I could write, but clearly it doesn't work. I think something like continue, break or try+except should be used here, but I have no idea.
for _ in range(db_size):
    condition = True
    while condition:
        # Get a number from the database
        A = db.get_new_number()
        # Regenerate a new number if A is negative
        if A < 0:
            A = db.get_new_number()
        B = myfunc1(A)
        if B is None:
            continue
        C = myfunc2(B)
        db.update(C)
Use a while loop that repeats until the condition is met.
for _ in range(db_size):
    condition = True
    while condition:
        # Get a number from the database
        while True:
            A = db.get_new_number()
            if A is None:
                raise Exception("Ran out of numbers!")
            # Regenerate a new number if A is negative
            if A >= 0:
                break
        B = myfunc1(A)
        if B is None:
            continue
        C = myfunc2(B)
        db.update(C)
My code assumes that db.get_new_number() returns None when it runs out. Another possibility would be for it to raise an exception itself, then you don't need that check here.
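A sketch of that exception-based variant (the exception class is an assumption, not part of the original db API):
class OutOfNumbers(Exception):
    """Hypothetical: raised by db.get_new_number() once the database is exhausted."""

for _ in range(db_size):
    try:
        # Keep fetching until a non-negative number comes back
        while True:
            A = db.get_new_number()  # assumed to raise OutOfNumbers when empty
            if A >= 0:
                break
    except OutOfNumbers:
        break
    B = myfunc1(A)
    if B is None:
        continue
    C = myfunc2(B)
    db.update(C)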

Multiprocessing on same dict/list

I am fairly new to python, kindly excuse me if any information is insufficient. As part of the curriculum, I got introduced to python for quants/finance. I am studying multiprocessing and trying to understand it better. I tried modifying the given problem and now I am mentally stuck on the problem.
Problem:
I have a function which gives me ticks, in ohlc format.
{'scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000}
every minute. I wish to do the following calculations concurrently and, preferably, append/insert the results into the same list:
Find the Moving Average of the last 5 close data
Find the Median of the last 5 open data
Save the tick data to a database.
so expected data is likely to be
{'scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000,'MA_5_open':300.25,'Median_5_close':300.50}
Assuming that the data is going to a db, it's fairly easy to write a simple db-insert routine; I don't see that as a great challenge. I can spawn a process to execute an insert statement every minute.
How do I sync 3 different functions/processes (a function to insert into the db, a function to calculate the average, a function to calculate the median), while holding the last 5 ticks in memory to calculate the 5-period simple Moving Average, and push the results back to the dict/list?
This assumption is what challenges me in writing the multiprocessing routine. Can someone guide me? I don't want to use a pandas dataframe.
==== REVISION/UPDATE ====
The reason why I don't want any solution based on pandas/numpy is that my objective is to understand the basics, not the nuances of a new library. Please don't mistake my need for understanding as arrogance or not being open to suggestions.
The advantage of having
p1 = Process(target=Median, args=(sourcelist,))
p2 = Process(target=Average, args=(sourcelist,))
p3 = Process(target=insertdb, args=(updatedlist,))
would help me understand the possibility of scaling processes based on the number of functions/algo components. But how do I make sure p1 and p2 are in sync, while p3 executes only after p1 and p2?
Here is an example of how to use multiprocessing:
from functools import partial
from multiprocessing import Pool, cpu_count
from statistics import median

def db_func(ma, med):
    db.save(something)  # placeholder: save to your database here

def backtest_strat(d, db_func):
    a = d.get('avg')
    db_func(sum(a) / len(a), median(a))

if __name__ == '__main__':
    with Pool(cpu_count()) as p:
        bs = partial(backtest_strat, db_func=db_func)
        print(p.map(bs, [{'avg': [1, 2, 3, 4, 5], 'median': [1, 2, 3, 4, 5]}]))
Also see:
https://stackoverflow.com/a/24101655/2026508
Note that this will not speed up anything unless there are a lot of slices.
So for the speed-up part:
def get_slices(data):
    for slice in data:
        yield {'avg': [1, 2, 3, 4, 5], 'median': [1, 2, 3, 4, 5]}

p.map(bs, get_slices(data))
From what I understand, multiprocessing works by message passing via pickles, so when pool.map is called it has access to all three things: the two arrays and the db_func function. There are of course other ways to go about it, but hopefully this shows one of them.
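As a side note on the sync question itself: with plain Process objects, the most direct way to make sure p3 runs only after p1 and p2 have finished is to join them first. A minimal sketch, where Median, Average, insertdb and the two lists are hypothetical stand-ins for the question's names:
from multiprocessing import Process

# Hypothetical stand-ins for the question's functions and data
def Median(source): pass
def Average(source): pass
def insertdb(updated): pass
sourcelist, updatedlist = [], []

if __name__ == '__main__':
    p1 = Process(target=Median, args=(sourcelist,))
    p2 = Process(target=Average, args=(sourcelist,))
    p1.start()
    p2.start()
    p1.join()   # wait until both computations are done
    p2.join()
    p3 = Process(target=insertdb, args=(updatedlist,))
    p3.start()  # only then run the database insert
    p3.join()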
Question: how should I make sure p1 and p2 are in sync while p3 executes after p1 and p2?
If you sync all Processes, computing one Task (p1, p2, p3) can't be faster than the slowest Process is.
In the meantime the other Processes sit idle.
This is called the "Producer-Consumer Problem".
Solution: using a Queue, all data is serialized and no explicit synchronization is required.
# Process-1
def Producer():
    task_queue.put(data)

# Process-2
def Consumer(task_queue):
    data = task_queue.get()
    # process data
You want multiple Consumer Processes, plus one Process that gathers all results.
If you don't want to use a Queue, you have to use synchronization primitives instead.
This example lets all Processes run independently.
Only the Result Process waits until it is notified.
This example uses an unlimited task buffer, tasks = mp.Manager().list().
Its size could be minimized if list entries of finished tasks are reused.
If you have some very fast algorithms, combine several of them into one Process.
import multiprocessing as mp
from random import randrange

# Base class for all WORKERS
class Worker(mp.Process):
    tasks = mp.Manager().list()
    task_ready = mp.Condition()
    lock = mp.Lock()
    parties = mp.Manager().Value(int, 0)
    workers = []

    @classmethod
    def start(cls, workers):
        cls.workers = workers
        # Number of compute WORKERS that must report a result per task
        cls.parties.value = sum(1 for w in workers if not isinstance(w, Result))
        for w in workers:
            mp.Process.start(w)

    @classmethod
    def join(cls):
        # Wait until all Data processed
        for w in cls.workers:
            mp.Process.join(w)

    def get_task(self):
        for i, task in enumerate(Worker.tasks):
            if task is None:
                continue
            if self.__class__.__name__ not in task['result']:
                return (i, task['range'])
        return (None, None)

    # Main Process Loop
    def run(self):
        while True:
            # Get a Task for this WORKER
            idx, _range = self.get_task()
            if idx is None:
                break
            # Compute with self Method this _range
            result = self.compute(_range)
            # Update Worker.tasks
            with Worker.lock:
                task = Worker.tasks[idx]
                task['result'][self.__class__.__name__] = result
                parties = len(task['result'])
                Worker.tasks[idx] = task
            # If Last, notify Process Result
            if parties == Worker.parties.value:
                with Worker.task_ready:
                    Worker.task_ready.notify()

class Result(Worker):
    # Main Process Loop
    def run(self):
        while True:
            with Worker.task_ready:
                Worker.task_ready.wait()
            # Get (idx, _range) from tasks List
            idx, _range = self.get_task()
            if idx is None:
                break
            # process Task Results
            # Mark this tasks List Entry as done for reuse
            Worker.tasks[idx] = None

class Average(Worker):
    def compute(self, _range):
        # Average of the 'close' values in DATA[_range]
        values = [rec['close'] for rec in DATA[_range[0]:_range[1]]]
        return sum(values) / len(values)

class Median(Worker):
    def compute(self, _range):
        # Median of the 'open' values in DATA[_range]
        values = sorted(rec['open'] for rec in DATA[_range[0]:_range[1]])
        return values[len(values) // 2]

if __name__ == '__main__':
    DATA = mp.Manager().list()
    WORKERS = [Result(), Average(), Median()]
    Worker.start(WORKERS)
    # Example creates a Task every 5 Records
    for i in range(1, 16):
        DATA.append({'id': i, 'open': 300 + randrange(0, 5), 'close': 300 + randrange(-5, 5)})
        if i % 5 == 0:
            Worker.tasks.append({'range': (i - 5, i), 'result': {}})
    Worker.join()
Tested with Python: 3.4.2

for loop generator with a sentinel

I have an application with one producer and many consumers, and a queue which connects them.
Each consumer should collect some data from the queue, let's say qsize()/number_of_consumers items, but it must stop working when a sentinel appears.
I have such a code:
frame = 0
elems_max = 10
while frame is not None:
    frames = []
    for _ in range(elems_max):
        frame = queue_in.get()
        if frame:
            frames.append(frame)
        else:
            break
    process_data(frames)
As you can see, None is the sentinel for this queue, and when it appears I want to break out of my working process. I also want to get more than one element for data processing.
What is the fastest method to achieve this (in Python 3.5)?
I understand that you want to break your outer while when encountering a None.
You can hold a boolean variable that is True while the while loop must keep executing, and False when it should stop.
This would look like this:
frame = 0
elems_max = 10
running = True
while running and frame is not None:
    frames = []
    for _ in range(elems_max):
        frame = queue_in.get()
        if frame is not None:
            frames.append(frame)
        else:
            running = False
            break
    process_data(frames)
The break instruction will break the inner for, but not the outer while.
However, having set running to False, the while loop will stop.
Based on your comment: it is not possible to include a break statement in a comprehension, nor an else clause, as you wanted to do:
frames = [f for i in range(elems_max) if queue_in.get() is not None else break]
However, you can build your list, and then remove all the elements after a None:
frames = [queue_in.get() for _ in range(elems_max)]
try:
    noneId = frames.index(None)
    frames = frames[:noneId]
except ValueError:
    pass
This is not very efficient, because potentially many elements will be appended in frames for nothing.
I would prefer a manual construction, to avoid this hazard.
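The manual construction mentioned above could be a plain loop, for example:
frames = []
for _ in range(elems_max):
    frame = queue_in.get()
    if frame is None:
        break  # stop at the sentinel without issuing useless gets
    frames.append(frame)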
One more solution, based on a generator.
This might not be what you expected, but the syntax is rather simple, so you may like it.
The idea is to wrap the getting of the data inside of a generator, that breaks on a None value:
def queue_data_generator(queue, count):
    for _ in range(count):
        item = queue.get()
        if item is None:
            return  # ends the generator (raising StopIteration here breaks on newer Pythons, PEP 479)
        else:
            yield item
Then, instantiate this generator, and simply iterate over it:
g = queue_data_generator(queue_in, elems_max)
frames = [frame for frame in g]
The frames list will contain all the frames contained in queue_in, until the first None.
The usage is rather simple, but you have to set it up by defining the generator.
I think it's pretty elegant though.
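For what it's worth, the built-in iter(callable, sentinel) form expresses the same idea without a custom generator (assuming, as above, that None is the sentinel):
from itertools import islice

# iter() keeps calling queue_in.get until it returns None; islice caps it at elems_max items
frames = list(islice(iter(queue_in.get, None), elems_max))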
I would do the following (kinda pseudocode):
class CInputQueue:
    def get(self, preferred_N):
        # do sync stuff
        # take <= N elements (you can balance the load)
        # or throw an exception
        raise Exception("No data, no work, no life.")
elems_max = 10
try:
    while True:
        process_data(queue_in.get(elems_max))
except:
    pass  # break
I assume that data processing takes much more time than 0 ms, so I use an exception. I know that it's not okay to use exceptions for flow control, but for the worker it really is an exception: its "life" is built around processing data, and there is no work for it, not even a sleep task.

Different inputs for different processes in python multiprocessing

Please bear with me as this is a bit of a contrived example of my real application. Suppose I have a list of numbers and I wanted to add a single number to each number in the list using multiple (2) processes. I can do something like this:
import multiprocessing

my_list = list(range(100))
my_number = 5
data_line = [{'list_num': i, 'my_num': my_number} for i in my_list]

def worker(data):
    return data['list_num'] + data['my_num']

pool = multiprocessing.Pool(processes=2)
pool_output = pool.map(worker, data_line)
pool.close()
pool.join()
Now however, there's a wrinkle to my problem. Suppose that I wanted to alternate adding two numbers (instead of just adding one). So around half the time, I want to add my_number1 and the other half of the time I want to add my_number2. It doesn't matter which number gets added to which item on the list. However, the one requirement is that I don't want to be adding the same number simultaneously at the same time across the different processes. What this boils down to essentially (I think) is that I want to use the first number on Process 1 and the second number on Process 2 exclusively so that the processes are never simultaneously adding the same number. So something like:
my_num1 = 5
my_num2 = 100
data_line = [{'list_num': i, 'my_num1': my_num1, 'my_num2': my_num2} for i in my_list]

def worker(data):
    # if in Process 1:
    return data['list_num'] + data['my_num1']
    # if in Process 2:
    return data['list_num'] + data['my_num2']
    # and so forth
Is there an easy way to specify specific inputs per process? Is there another way to think about this problem?
multiprocessing.Pool lets you pass an initializer function, which is executed in each worker process before the actual given function is run.
You can use it together with a global variable to let your function know which process it is running in.
You probably also want to control the initial number each process gets; you can use a Queue to tell the processes which number to pick up.
This solution is not optimal but it works.
import multiprocessing

process_number = None

def initializer(queue):
    global process_number
    process_number = queue.get()  # atomically get the process index

def function(value):
    print("I'm process %s" % process_number)
    return value[process_number]

def main():
    queue = multiprocessing.Queue()
    for index in range(multiprocessing.cpu_count()):
        queue.put(index)
    pool = multiprocessing.Pool(initializer=initializer, initargs=[queue])
    tasks = [{0: 'Process-0', 1: 'Process-1', 2: 'Process-2'}, ...]
    print(pool.map(function, tasks))

if __name__ == '__main__':
    main()
My PC is a dual core, so as you can see only Process-0 and Process-1 do the processing.
I'm process 0
I'm process 0
I'm process 1
I'm process 0
I'm process 1
...
['Process-0', 'Process-0', 'Process-1', 'Process-0', ... ]
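Tying this back to the original question, a hedged sketch: each worker could add "its" number, chosen by the process index it received in the initializer (the nums key and the two-process pool are assumptions):
import multiprocessing

process_number = None

def initializer(queue):
    global process_number
    process_number = queue.get()

def worker(data):
    # data is e.g. {'list_num': 3, 'nums': [5, 100]}
    return data['list_num'] + data['nums'][process_number]

if __name__ == '__main__':
    my_list = list(range(100))
    data_line = [{'list_num': i, 'nums': [5, 100]} for i in my_list]
    queue = multiprocessing.Queue()
    for index in range(2):
        queue.put(index)
    pool = multiprocessing.Pool(processes=2, initializer=initializer, initargs=[queue])
    print(pool.map(worker, data_line))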
