So I'm having a tough time wrapping my head around the multiprocessing library and all of its functionality. Basically, what I'm trying to accomplish is to start a separate process from a background thread, passing it a function object and its positional and keyword arguments.
I have a thread that is started at the beginning, and its job is to execute functions that are passed to it via dependency injection. Once the thread detects that a new job is scheduled, it takes the job and executes it. The problem is that I have no idea how long that job will take, and I would like to terminate it if, say, 10 minutes have passed. Since this can't be accomplished with the threading module, I decided to take a look at multiprocessing, since its processes can be terminated.
Dependency injection is solved via a decorator that wraps each function (intended to be executed by the thread) and passes the function object and its positional and keyword arguments to the thread that's going to execute it via * and **. The thread ends up with all the arguments and the function object (this works).
The problem begins when I try to create a Pool and assign work to a single worker. Since I have no idea what the function's input arguments are, how am I able to use the apply_async function with * and **?
import multiprocessing
import threading
import time

def intercept(callback):
    def wrapper(*args, **kwargs):
        # pass callback, args and kwargs to the thread
        pass
    return wrapper

@intercept
def do_some_work(first, second, third=None):
    time.sleep(10)

def bg_thread():
    while True:
        # acquire callback, args and kwargs from the intercept decorator
        # if a new job is scheduled, create a process and execute it
        # if the process did not finish within the timeout, terminate it
        p = multiprocessing.Pool()
        ret = p.apply_async(callback, args, kwargs)
        p.close()
        try:
            ret.get(5)
        except:
            p.terminate()

t = threading.Thread(target=bg_thread)
t.start()

do_some_work()
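For context, here is roughly how I picture the thread's side of it. This is only a sketch: the queue.Queue handing jobs over is my placeholder for however the decorator and the thread actually communicate, and the timeout handling is what I'm aiming for rather than what I have.
import multiprocessing
import queue

jobs = queue.Queue()  # placeholder: however intercept hands work to the thread

def bg_thread():
    while True:
        callback, args, kwargs = jobs.get()
        pool = multiprocessing.Pool(processes=1)
        # apply_async already takes the positional arguments as a tuple and
        # the keyword arguments as a dict, so no * / ** unpacking is needed.
        result = pool.apply_async(callback, args, kwargs)
        pool.close()
        try:
            result.get(timeout=600)   # give the job up to 10 minutes
        except multiprocessing.TimeoutError:
            pool.terminate()          # kill the worker that is still running
        pool.join()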
Basically I'm trying to open a new process every time I call a function. The problem is that when I get the PID inside the function, it is the same as in other functions, even if those functions haven't finished yet.
I'm wrapping my function with a decorator:
from multiprocessing import Pipe, Process

def run_in_process(function):
    """Runs a function isolated in a new process.

    Args:
        function (function): Function to execute.
    """
    def wrapper(*args):
        parent_connection, child_connection = Pipe()
        process = Process(target=function, args=(*args, child_connection))
        process.start()
        response = parent_connection.recv()
        process.join()
        return response
    return wrapper
And declaring the function like this:
import os

@run_in_process
def example(data, pipe):
    print(os.getpid())
    pipe.send("Just an example here!")
    pipe.close()
Obs1.: This code is running inside an AWS Lambda.
Obs2.: These invocations don't finish before the next one starts, because the task takes at least 10 seconds.
[Logs of executions 1, 2 and 3]
You can see in the logs that each one is a different execution and that they ran at the "same" time.
The question is: why do they have the same PID, even though they are running concurrently? Shouldn't they have different PIDs?
I absolutely need to execute this function in an isolated process.
Your Lambda function could have been running in multiple containers at once in the AWS cloud. If you've been heavily testing your function with multiple concurrent requests, it is quite possible that the AWS orchestration created additional instances to handle the traffic.
With serverless, you lose some of the visibility into exactly how your code is being executed, but does it really matter?
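If you want to see this for yourself, one thing you could log is a marker created at module import time. CONTAINER_MARKER and handler below are my own names, not anything AWS provides; since module-level state survives between invocations in the same container, identical markers mean the same container, and different containers can legitimately report the same PID.
import os
import uuid

# Created once per container, when the module is first imported.
CONTAINER_MARKER = uuid.uuid4().hex

def handler(event, context):
    print('container={} pid={}'.format(CONTAINER_MARKER, os.getpid()))
    return {'container': CONTAINER_MARKER, 'pid': os.getpid()}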
I use multiprocessing.Pool like so to execute a number of tasks.
def execute(task):
    # run task, return result
    ...

def on_completion(task_result):
    # process task result
    ...

async_results = [pool.apply_async(execute,
                                  args=[task],
                                  callback=on_completion)
                 for task in self.tasks]
# wait for results
My completion handler is invoked by the pool in a nice, serialized way so I don't have to worry about thread safety in its implementation.
However, I would also like to be notified when a task is started. Is there an elegant way to accomplish the following?
def on_start(arg):  # Whatever arg(s) were passed to the execute function
    # Called when task starts to run
    ...

pool.apply_async(run_task,
                 args=[task],
                 start_callback=on_start,
                 completion_callback=on_completion)
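The closest workaround I've found so far is to wrap the task so the worker itself reports the start through a Manager queue, which the parent drains. To be clear, start_callback above is only the API I wish existed; notifying_execute and the queue below are my own additions and just a sketch:
import multiprocessing
import time

def execute(task):
    time.sleep(1)          # placeholder for the real work
    return task * 2

def on_completion(result):
    print('finished:', result)

def notifying_execute(started_queue, task):
    # Runs in the worker: report the start, then do the real work.
    started_queue.put(task)
    return execute(task)

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    started = manager.Queue()   # a Manager queue can be passed to pool workers
    tasks = [1, 2, 3, 4]
    with multiprocessing.Pool(2) as pool:
        pending = [pool.apply_async(notifying_execute,
                                    args=(started, task),
                                    callback=on_completion)
                   for task in tasks]
        while pending:
            # Drain start notifications in the parent, so this handler runs
            # serialized in the main process, like the completion callback.
            while not started.empty():
                print('started:', started.get())
            pending = [r for r in pending if not r.ready()]
            time.sleep(0.1)
    while not started.empty():
        print('started:', started.get())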
I have some code that does the same thing to several files in a Python 3 application, so it seems like a great candidate for multiprocessing. I'm trying to use Pool to assign work to some number of processes. I'd like the code to continue doing other things (mainly displaying things for the user) while these calculations are going on, so I'd like to use the map_async function of the multiprocessing.Pool class for this. I would expect that after calling this, the code would continue and the result would be handled by the callback I've specified, but this doesn't seem to be happening. The following code shows three ways I've tried calling map_async and the results I've seen:
import multiprocessing
import time

NUM_PROCS = 4

def func(arg_list):
    arg1 = arg_list[0]
    arg2 = arg_list[1]
    print('start func')
    print('arg1 = {0}'.format(arg1))
    print('arg2 = {0}'.format(arg2))
    time.sleep(1)
    result1 = arg1 * arg2
    print('end func')
    return result1

def callback(result):
    print('result is {0}'.format(result))

def error_handler(error1):
    print('error in call\n {0}'.format(error1))

def async1(arg_list1):
    # This is how my understanding of map_async suggests I should
    # call it. When I execute this, the target function func() is not called.
    with multiprocessing.Pool(NUM_PROCS) as p1:
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)

def async2(arg_list1):
    with multiprocessing.Pool(NUM_PROCS) as p1:
        # If I call the wait function on the result for a small
        # amount of time, then the target function func() is called
        # and executes successfully in 2 processes, but the callback
        # function is never called, so the results are not processed.
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)
        r1.wait(0.1)

def async3(arg_list1):
    # If I explicitly call join on the pool, then the target function func()
    # successfully executes in 2 processes and the callback function is also
    # called, but by calling join the processing is not asynchronous any more,
    # as join blocks the main process until the other processes are finished.
    with multiprocessing.Pool(NUM_PROCS) as p1:
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)
        p1.close()
        p1.join()

def main():
    arg_list1 = [(5, 3), (7, 4), (-8, 10), (4, 12)]
    async3(arg_list1)
    print('pool executed successfully')

if __name__ == '__main__':
    main()
When async1, async2, or async3 is called in main, the results are as described in the comments for each function. Could anyone explain why the different calls behave the way they do? Ultimately I'd like to call map_async as done in async1, so I can do something else in the main process while the worker processes are busy. I have tested this code with Python 2.7 and 3.6, on an older RHEL 6 Linux box and a newer Ubuntu VM, with the same results.
This is happening because when you use the multiprocessing.Pool as a context manager, pool.terminate() is called when you leave the with block, which immediately exits all workers, without waiting for in-progress tasks to finish.
New in version 3.3: Pool objects now support the context management protocol – see Context Manager Types. __enter__() returns the pool object, and __exit__() calls terminate().
IMO using terminate() as the __exit__ method of the context manager wasn't a great design choice, since it seems most people intuitively expect close() will be called, which will wait for in-progress tasks to complete before exiting. Unfortunately all you can do is refactor your code away from using a context manager, or refactor your code so that you guarantee you don't leave the with block until the Pool is done doing its work.
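For instance, async1 from the question behaves the way you expected if you simply stay inside the with block until the results are in. Below is a condensed sketch of the question's code with that change; the r1.wait() placement is my suggestion:
import multiprocessing
import time

NUM_PROCS = 4

def func(arg_list):
    arg1, arg2 = arg_list
    time.sleep(1)
    return arg1 * arg2

def callback(result):
    print('result is {0}'.format(result))

def error_handler(error1):
    print('error in call\n {0}'.format(error1))

def async1(arg_list1):
    with multiprocessing.Pool(NUM_PROCS) as p1:
        r1 = p1.map_async(func,
                          arg_list1,
                          callback=callback,
                          error_callback=error_handler)
        # ... do other main-process work here while the workers run ...
        # Leaving the with block calls terminate(), so wait first.
        r1.wait()

if __name__ == '__main__':
    async1([(5, 3), (7, 4), (-8, 10), (4, 12)])
    print('pool executed successfully')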
I'm importing multiple Python modules from different directories and want to run them simultaneously, each in its own thread.
Here's my parent:
import sys
import thread
sys.path.append('/python/loanrates/test')
import test2
thread.start_new_thread(test2.main())
and here's one of the child modules:
import json

def main():
    data = 'ello world'
    print data
    with open('D:/python/loanrates/test/it_worked.json', 'w') as f:
        json.dump(data, f)

if __name__ == '__main__':
    main()
but I am getting this error:
TypeError: start_new_thread expected at least 2 arguments, got 1
What is a simple way I can get this thread started (and then run multiple threads the same way)?
You also need to provide a tuple with the arguments to run the function with. If you have none, pass an empty tuple.
thread.start_new_thread(test2.main, ())
From the docs of thread.start_new_thread(function, args[, kwargs]) (boldface mine):
Start a new thread and return its identifier. The thread executes the function function with the argument list args (which must be a tuple). The optional kwargs argument specifies a dictionary of keyword arguments. When the function returns, the thread silently exits. When the function terminates with an unhandled exception, a stack trace is printed and then the thread exits (but other threads continue to run).
You can also:
from threading import Thread

thread = Thread(target=test2.main, args=(), kwargs={})
thread.start()  # starts the thread
thread.join()   # wait for it to finish
Read more on this approach to creating and working with threads here.
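If you then want several of these child modules running at once, the same pattern extends naturally. A sketch, where test3 is hypothetical and stands in for any other module imported the same way as test2:
import sys
import threading

sys.path.append('/python/loanrates/test')
import test2
# import test3  # hypothetical: any other child module, imported the same way

threads = [threading.Thread(target=mod.main) for mod in (test2,)]
for t in threads:
    t.start()
for t in threads:
    t.join()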
I would like to implement an async, callback-style function in Python... This is what I came up with, but I am not sure how to actually return to the main process and call the function.
import multiprocessing
import time

funcs = {}

def runCallback(uniqueId):
    '''
    I want this to be run in the main process.
    '''
    funcs[uniqueId]()

def someFunc(delay, uniqueId):
    '''
    This function runs in a separate process and just sleeps.
    '''
    time.sleep(delay)
    ### HERE I WANT TO CALL runCallback IN THE MAIN PROCESS ###
    # This does not work... It calls runCallback in the separate process:
    runCallback(uniqueId)

def setupCallback(func, delay):
    uniqueId = id(func)
    funcs[uniqueId] = func
    proc = multiprocessing.Process(target=someFunc, args=(delay, uniqueId))
    proc.start()
    return uniqueId
Here is how I want it to work:
def aFunc():
    return None

setupCallback(aFunc, 10)
### some code that gets run before aFunc is called ###
### aFunc runs 10s later ###
There is a gotcha here, because I want this to be a bit more complex. Basically, when the code in the main process is done running, I want to examine the funcs dict and run any of the callbacks that have not yet run. This means that runCallback also needs to remove entries from the funcs dict. The funcs dict is not shared with the separate processes, so I think runCallback needs to be called in the main process?
It is unclear why you are using the multiprocessing module here.
To call a function with delay in the same process you could use threading.Timer.
threading.Timer(10, aFunc).start()
Timer has a .cancel() method if you'd like to cancel the callback later:
t = threading.Timer(10, runCallback, args=[uniqueId, funcs])
t.start()
timers.append((t, uniqueId))
# do other stuff
# ...
# run callbacks right now
for t, uniqueId in timers:
    t.cancel()  # after this, runCallback() won't be called by Timer()
                # (if it hasn't been called already)
    runCallback(uniqueId, funcs)
Where runCallback() is modified to remove the function from the dict before calling it:
def runCallback(uniqueId, funcs):
    f = funcs.pop(uniqueId, None)  # GIL protects this code, with some caveats
    if f is not None:
        f()
To do exactly what you're trying to do, you would need to set up a signal handler in the parent process to run the callback (or just remove the callback function that the child runs, if it doesn't need access to any of the parent process's memory) and have the child process send a signal. If your logic gets any more complex, you'll probably need another type of inter-process communication (IPC), such as pipes or sockets.
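A rough sketch of the pipe-based variant, where waitAndNotify, runPendingCallbacks and the connections list are my own names: the child only reports which callback is due, and the parent decides when to run it.
import multiprocessing
import time

def waitAndNotify(delay, uniqueId, conn):
    # Runs in the child process: just wait, then tell the parent
    # which callback is due. The callback itself stays in the parent.
    time.sleep(delay)
    conn.send(uniqueId)
    conn.close()

def setupCallback(func, delay, funcs, connections):
    uniqueId = id(func)
    funcs[uniqueId] = func
    parent_conn, child_conn = multiprocessing.Pipe()
    multiprocessing.Process(target=waitAndNotify,
                            args=(delay, uniqueId, child_conn)).start()
    connections.append(parent_conn)
    return uniqueId

def runPendingCallbacks(funcs, connections):
    # Call this from the main process when it has finished its own work.
    for conn in connections:
        if conn.poll():
            f = funcs.pop(conn.recv(), None)
            if f is not None:
                f()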
Another possibility is using threads instead of processes, then you can just run the callback from the second thread. You'll need to add a lock to synchronize access to the funcs dict.
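A minimal sketch of that thread-based version, reusing the names from the question; the lock is the only addition:
import threading
import time

funcs = {}
funcs_lock = threading.Lock()

def runCallback(uniqueId):
    # Safe to call from any thread: the lock guards the shared dict.
    with funcs_lock:
        f = funcs.pop(uniqueId, None)
    if f is not None:
        f()

def someFunc(delay, uniqueId):
    # Runs in a worker thread of the same process, so it sees the same funcs.
    time.sleep(delay)
    runCallback(uniqueId)

def setupCallback(func, delay):
    uniqueId = id(func)
    with funcs_lock:
        funcs[uniqueId] = func
    threading.Thread(target=someFunc, args=(delay, uniqueId)).start()
    return uniqueId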