The script below is supposed to schedule a job at two specified times while passing a different parameter to each run. The output should be 1 and 2, but instead it is 2 and 2: somehow only the last value is passed to both jobs.
import schedule, threading, time

def run_script(n):
    # Check this output
    print(n)

def run_threaded(job_func):
    job_thread = threading.Thread(target=job_func)
    job_thread.start()

times = ["13:56", "13:56"]

for n, row in enumerate(times, 1):
    schedule.every().day.at(row).do(run_threaded, lambda: run_script(n))

while True:
    schedule.run_pending()
    time.sleep(1)
If you set up the jobs manually, without the loop, like this:
schedule.every().day.at("18:56").do(run_threaded, lambda: run_script(1))
schedule.every().day.at("18:56").do(run_threaded, lambda: run_script(2))
This works fine! However, when the jobs are generated by the loop, the result is not the same.
Does anyone have any idea why and how to solve this?
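For what it's worth, this looks like Python's usual late-binding behaviour for closures: the lambda captures the variable n itself, not its value at definition time, so by the time the jobs fire every lambda sees the final value of n. A minimal sketch of the common workaround, binding the current value through a default argument (not from the original post):

for n, row in enumerate(times, 1):
    # n=n freezes the current value of n for this particular lambda
    schedule.every().day.at(row).do(run_threaded, lambda n=n: run_script(n))

functools.partial(run_script, n) would work equally well in place of the lambda.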
I'm working on a project that needs to run two different CPU-intensive functions, so a multiprocessing approach seems to be the way to go. The challenge I'm facing is that one function has a slower runtime than the other. For the sake of argument, let's say that execute has a runtime of 0.1 seconds while update takes a full second to run. The goal is that while update is running, execute will have calculated an output value 10 times. Once update has finished, it needs to pass a set of parameters to execute, which can then continue generating output with the new parameters. After some time, update needs to run again and once more generate a new set of parameters.
Furthermore, both functions require different sets of input variables.
The image link below should hopefully visualize my conundrum a bit better.
function runtime visualisation
From what I've gathered (https://zetcode.com/python/multiprocessing/), using an asymmetric mapping approach might be the way to go, but it doesn't really seem to work. Any help is greatly appreciated.
Pseudo Code
from multiprocessing import Pool
from datetime import datetime
import time
import numpy as np

class MyClass():
    def __init__(self, inital_parameter_1, inital_parameter_2):
        self.parameter_1 = inital_parameter_1
        self.parameter_2 = inital_parameter_2

    def execute(self, input_1, input_2, time_in):
        print('starting execute function for time:' + str(time_in))
        time.sleep(0.1)  # wait for 100 milliseconds
        # generate some output
        output = (self.parameter_1 * input_1) + (self.parameter_2 + input_2)
        print('exiting execute function')
        return output

    def update(self, update_input_1, update_input_2, time_in):
        print('starting update function for time:' + str(time_in))
        time.sleep(1)  # wait for 1 second
        # generate parameters
        self.parameter_1 += update_input_1
        self.parameter_2 += update_input_2
        print('exiting update function')

    def smap(f):
        return f()

if __name__ == "__main__":
    update_input_1 = 3
    update_input_2 = 4
    input_1 = 0
    input_2 = 1

    # initialize class
    my_class = MyClass(1, 2)

    # total runtime (arbitrary)
    runtime = int(10e6)

    # update_time (arbitrary)
    update_time = np.array([10, 10e2, 15e4, 20e5])

    for current_time in range(runtime):
        # if time equals update time run both functions simultaneously until update is complete
        if any(update_time == current_time):
            with Pool() as pool:
                res = pool.map_async(my_class.smap, [my_class.execute(input_1, input_2, current_time),
                                                     my_class.update(update_input_1, update_input_2, current_time)])
        # otherwise run only execute
        else:
            output = my_class.execute(input_1, input_2, current_time)

        # increment input
        input_1 += 1
        input_2 += 2
I confess to not being able to fully follow your code vis-à-vis your description, but I see some issues:
Method update is not returning any value other than None, which is implicitly returned due to the lack of a return statement.
Your with Pool() ...: block will call terminate upon block exit, which is immediately after your call to pool.map_async, which is non-blocking. But you have no provision to wait for the completion of this submitted task (terminate will most likely kill the running task before it completes).
What you are passing to map_async is the worker function name and an iterable. But you are invoking execute and update in the current main process and using their return values as the elements of the iterable, and those return values are definitely not functions suitable for passing to smap. So no multiprocessing is actually being done, and this is just plain wrong.
You are also creating and destroying process pools over and over again. Much better to create the process pool just once.
I would therefore recommend at least the following changes. But note that this code potentially generates tasks much faster than they can be completed; given your current runtime value you could have millions of tasks queued up, which could be quite a strain on system resources such as memory. So I've inserted some code that throttles task submission so that the number of incomplete submitted tasks is never more than three times the number of CPU cores available.
# we won't need heavy-duty numpy for what we are doing:
#import numpy as np
from multiprocessing import cpu_count
from threading import Lock

... # etc.

if __name__ == "__main__":
    update_input_1 = 3
    update_input_2 = 4
    input_1 = 0
    input_2 = 1

    # initialize class
    my_class = MyClass(1, 2)

    # total runtime (arbitrary)
    runtime = int(10e6)

    # update_time (arbitrary)
    # we don't need overhead of numpy (remove import of numpy):
    #update_time = np.array([10, 10e2, 15e4, 20e5])
    update_time = [10, 10e2, 15e4, 20e5]

    tasks_submitted = 0
    lock = Lock()

    execute_output = []
    def execute_result(result):
        global tasks_submitted
        with lock:
            tasks_submitted -= 1
        # result is the return value from method execute
        # do something with it, e.g. execute_output.append(result)
        pass

    update_output = []
    def update_result(result):
        global tasks_submitted
        with lock:
            tasks_submitted -= 1
        # result is the return value from method update
        # do something with it, e.g. update_output.append(result)
        pass

    n_processors = cpu_count()
    with Pool() as pool:
        for current_time in range(runtime):
            # if time equals update time run both functions simultaneously until update is complete
            #if any(update_time == current_time):
            if current_time in update_time:
                # run both update and execute:
                pool.apply_async(my_class.update, args=(update_input_1, update_input_2, current_time), callback=update_result)
                with lock:
                    tasks_submitted += 1
            pool.apply_async(my_class.execute, args=(input_1, input_2, current_time), callback=execute_result)
            with lock:
                tasks_submitted += 1

            # increment input
            input_1 += 1
            input_2 += 2

            while tasks_submitted > n_processors * 3:
                time.sleep(.05)

        # Ensure all tasks have completed:
        pool.close()
        pool.join()

    assert(tasks_submitted == 0)
I want to loop over tasks, again and again, until reaching a certain condition before continuing the rest of the workflow.
What I have so far is this:
# Loop task
class MyLoop(Task):
    def run(self):
        loop_res = prefect.context.get("task_loop_result", 1)
        print(loop_res)
        if loop_res >= 10:
            return loop_res
        raise LOOP(result=loop_res + 1)
But as far as I understand this does not work for multiple tasks.
Is there a way to go back further and loop over several tasks at a time?
The solution is simply to create a single task that itself creates a new flow with one or more parameters and calls flow.run(). For example:
class MultipleTaskLoop(Task):
    def run(self):
        # Get previous value
        loop_res = prefect.context.get("task_loop_result", 1)

        # Create subflow
        with Flow('Subflow', executor=LocalDaskExecutor()) as flow:
            x = Parameter('x', default=1)
            loop1 = print_loop()
            add = add_value(x)
            loop2 = print_loop()
            loop1.set_downstream(add)
            add.set_downstream(loop2)

        # Run subflow and extract result
        subflow_res = flow.run(parameters={'x': loop_res})
        new_res = subflow_res.result[add]._result.value

        # Loop
        if new_res >= 10:
            return new_res
        raise LOOP(result=new_res)
where print_loop simply prints "loop" in the output and add_value adds one to the value it receives.
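For reference, minimal definitions of those two helpers might look like this (just a sketch using Prefect 1.x's task decorator; they are not part of the original answer):

from prefect import task

@task
def print_loop():
    # simply prints "loop" in the output
    print("loop")

@task
def add_value(x):
    # adds one to the value it receives
    return x + 1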
Unless I'm missing something, the answer is no.
Prefect flows are DAGs, and what you are describing (looping over multiple tasks in order again and again until some condition is met) would make a cycle, so you can't do it.
This may or may not be helpful, but you could try and make all of the tasks you want to loop into one task, and loop within that task until your exit condition has been met.
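As an illustration of that last suggestion, here is a minimal sketch of a single combined task that loops over two inline steps (step_one and step_two are hypothetical placeholders, not from the original question or answers):

import prefect
from prefect import Task
from prefect.engine.signals import LOOP

def step_one(value):
    # placeholder for the first of the formerly separate tasks
    print("loop")
    return value

def step_two(value):
    # placeholder for the second of the formerly separate tasks
    return value + 1

class CombinedLoop(Task):
    def run(self):
        value = prefect.context.get("task_loop_result", 1)
        # run both steps inline, in order
        value = step_one(value)
        value = step_two(value)
        if value >= 10:
            return value          # exit condition met, continue with the rest of the flow
        raise LOOP(result=value)  # otherwise repeat the whole sequence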
I'm used to multiprocessing, but now I have a problem where mp.Pool isn't the tool that I need.
I have a process that prepares input and another process that uses it. I'm not using up all of my cores, so I want to have the two go at the same time, with the first getting the batch ready for the next iteration. How do I do this? And (importantly) what is this sort of thing called, so that I can go and google it?
Here's a dummy example. The following code takes 8 seconds:
import time

def make_input():
    time.sleep(1)
    return "cthulhu r'lyeh wgah'nagl fhtagn"

def make_output(input):
    time.sleep(1)
    return input.upper()

start = time.time()
for i in range(4):
    input = make_input()
    output = make_output(input)
    print(output)
print(time.time() - start)
CTHULHU R'LYEH WGAH'NAGL FHTAGN
CTHULHU R'LYEH WGAH'NAGL FHTAGN
CTHULHU R'LYEH WGAH'NAGL FHTAGN
CTHULHU R'LYEH WGAH'NAGL FHTAGN
8.018263101577759
If I were preparing input batches at the same time as I was doing the output, it would take four seconds. Something like this:
next_input = make_input()
start = time.time()
for i in range(4):
    res = do_at_the_same_time(
        output = make_output(next_input),
        next_input = make_input()
    )
    print(output)
print(time.time() - start)
But, obviously, that doesn't work. How can I accomplish what I'm trying to accomplish?
Important note: I tried the following, but it failed because the executing worker was working in the wrong scope (like, for my actual use-case). In my dummy use-case, it doesn't work because it prints in a different process.
def proc(i):
    if i == 0:
        return make_input()
    if i == 1:
        return make_output(next_input)

next_input = make_input()
for i in range(4):
    pool = mp.Pool(2)
    next_input = pool.map(proc, [0, 1])[0]
    pool.close()
So I need a solution where the second processes happens in the same scope or environment as the for loop, and where the first has output that can be gotten from that scope.
You should be able to use Pool. If I understand it correctly, you want one worker to prepare the input for another worker, which then runs and does something more with it. Given your example functions, this should do just that:
pool = mp.Pool(2)
for i in range(4):
    next_input = pool.apply(make_input)
    pool.apply_async(make_output, (next_input,), callback=print)
pool.close()
pool.join()
We prepare a pool with 2 workers, then run the loop, executing our pair of tasks on each iteration.
We delegate make_input to a worker using apply(), which waits for the function to complete, and assign the result to next_input. Note: in this example we could have used a single-worker pool and just run next_input = make_input() in the same process your script runs in, delegating only make_output().
Now the more interesting bit: by using apply_async() we ask a worker to run make_output, passing the single parameter next_input to it and telling it to run print (or any other function) with the result of make_output as the argument to the function registered as callback.
Then we close() the pool so it accepts no more jobs, and join() to wait for the processes to complete their work.
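For completeness, a self-contained version combining this approach with the dummy functions from the question might look like the sketch below; with the 1-second sleeps, each make_output runs in the background while the next make_input is being prepared, so the total should drop well below the sequential 8 seconds.

import multiprocessing as mp
import time

def make_input():
    time.sleep(1)
    return "cthulhu r'lyeh wgah'nagl fhtagn"

def make_output(input):
    time.sleep(1)
    return input.upper()

if __name__ == '__main__':
    start = time.time()
    pool = mp.Pool(2)
    for i in range(4):
        # blocks for ~1 second; meanwhile the other worker may still be
        # running the previous iteration's make_output
        next_input = pool.apply(make_input)
        # returns immediately; print is called with the result once it is ready
        pool.apply_async(make_output, (next_input,), callback=print)
    pool.close()
    pool.join()
    print(time.time() - start)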
I have a pretty specific problem. I want to measure execution time of the generator loop (with the yield keyword). However, I don't know in what intervals next() will be called on this generator. This means I can't just get the timestamp before and after the loop. I thought getting the timestamp at the beginning and end of each iteration will do the trick but I'm getting very inconsistent results.
Here's the test code:
import time

def gen(n):
    total = 0
    for i in range(n):
        t1 = time.process_time_ns()
        # Something that takes time
        x = [i ** i for i in range(i)]
        t2 = time.process_time_ns()
        yield x
        total += t2 - t1
    print(total)

def main():
    for i in gen(100):
        pass
    for i in gen(100):
        time.sleep(0.001)
    for i in gen(100):
        time.sleep(0.01)

if __name__ == '__main__':
    main()
Typical output for me looks something like this:
2151918
9970539
11581393
As you can see it looks like the delay outside of the loop somehow influences execution time of the loop itself.
What is the reason of this behavior? How can I avoid this inconsistency? Maybe there's some entirely different way of doing what I'm trying to achieve?
You can switch the yield x and total += t2 - t1 lines to only count the time it takes to create x.
For a more in-depth discussion, see also: Behaviour of Python's "yield"
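In other words, a minimal sketch of that reordering (only the order of the two lines at the end of the loop body changes):

def gen(n):
    total = 0
    for i in range(n):
        t1 = time.process_time_ns()
        # Something that takes time
        x = [i ** i for i in range(i)]
        t2 = time.process_time_ns()
        total += t2 - t1  # accumulate the elapsed time before handing x back to the caller
        yield x
    print(total)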
Please bear with me as this is a bit of a contrived example of my real application. Suppose I have a list of numbers and I wanted to add a single number to each number in the list using multiple (2) processes. I can do something like this:
import multiprocessing

my_list = list(range(100))
my_number = 5
data_line = [{'list_num': i, 'my_num': my_number} for i in my_list]

def worker(data):
    return data['list_num'] + data['my_num']

pool = multiprocessing.Pool(processes=2)
pool_output = pool.map(worker, data_line)
pool.close()
pool.join()
Now, however, there's a wrinkle to my problem. Suppose I wanted to alternate between adding two numbers instead of just one: around half the time I want to add my_number1, and the other half of the time I want to add my_number2. It doesn't matter which number gets added to which item in the list. The one requirement is that the different processes must never be adding the same number at the same time. What this essentially boils down to (I think) is that I want Process 1 to use the first number and Process 2 to use the second number exclusively, so that the processes are never simultaneously adding the same number. So something like:
my_num1 = 5
my_num2 = 100
data_line = [{'list_num': i, 'my_num1': my_num1, 'my_num2': my_num2} for i in my_list]

def worker(data):
    # if in Process 1:
    return data['list_num'] + data['my_num1']
    # if in Process 2:
    return data['list_num'] + data['my_num2']
    # and so forth
Is there an easy way to specify specific inputs per process? Is there another way to think about this problem?
multiprocessing.Pool lets you pass an initializer function, which is executed in each worker process before the actual worker function is run.
You can use it together with a global variable to let your function know which process it is running in.
You probably want to control which initial number each process gets. You can use a Queue to tell each process which number to pick up.
This solution is not optimal but it works.
import multiprocessing

process_number = None

def initializer(queue):
    global process_number
    process_number = queue.get()  # atomically get the process index

def function(value):
    print("I'm process %s" % process_number)
    return value[process_number]

def main():
    queue = multiprocessing.Queue()
    for index in range(multiprocessing.cpu_count()):
        queue.put(index)
    pool = multiprocessing.Pool(initializer=initializer, initargs=[queue])
    tasks = [{0: 'Process-0', 1: 'Process-1', 2: 'Process-2'}, ...]
    print(pool.map(function, tasks))
My PC is a dual core; as you can see, only Process-0 and Process-1 are used.
I'm process 0
I'm process 0
I'm process 1
I'm process 0
I'm process 1
...
['Process-0', 'Process-0', 'Process-1', 'Process-0', ... ]
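Applied to the original example, the worker could use the per-process index to decide which number to add; a minimal sketch along those lines (reusing the initializer above and assuming a two-worker pool):

def worker(data):
    # process_number is set once per worker by the initializer
    if process_number == 0:
        return data['list_num'] + data['my_num1']
    else:
        return data['list_num'] + data['my_num2']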