AWS Lambda function not running concurrently - python

I have an AWS Lambda function that is invoked by another function: the first function processes the data and, when it is finished, invokes the second function n times, so n instances should run at the same time.
For example, the second function takes about 5 seconds per invocation; I want all the invocations to run at the same time they are triggered, for a total run time of about 5 seconds.
Instead, the whole thing takes longer than that: the invocations run one at a time, each waiting until the previous one is finished, so the process takes 5*n seconds.
AWS states that I can scale the function up to 1,000 concurrent executions in my region. How can I make these invocations run concurrently? I don't need a code example, just a general approach I can look into to fix the problem.
The first function's handler looks like this (I have left out the code that builds json_file):
def lambda_handler(event=None, context=None):
    for n in range(len(json_file)):
        response = client.invoke(
            FunctionName='docker-selenium-lambda-prod-demo',
            InvocationType='RequestResponse',
            Payload=json.dumps(json_file[n])
        )
        responseJson = json.load(response['Payload'])
where json_file[n] is the payload being sent to the second function.

As you can see in the boto3 docs for the invoke function:
Invokes a Lambda function. You can invoke a function synchronously (and wait for the response), or asynchronously. To invoke a function asynchronously, set InvocationType to Event.
Since you are using RequestResponse, your code waits until each invoked Lambda has terminated before starting the next one.
You can either change InvocationType to Event or use something like ThreadPoolExecutor and wait until all executions are finished.
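If you want to keep RequestResponse (so the first function can collect the results), a minimal sketch with ThreadPoolExecutor might look like this; json_file and the function name are taken from the question, everything else is illustrative:

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

client = boto3.client('lambda')

def invoke_one(payload):
    # each call still waits for its own response, but the calls run in
    # parallel threads instead of one after another
    response = client.invoke(
        FunctionName='docker-selenium-lambda-prod-demo',
        InvocationType='RequestResponse',
        Payload=json.dumps(payload),
    )
    return json.load(response['Payload'])

def lambda_handler(event=None, context=None):
    # json_file is assumed to be built earlier, as in the question
    with ThreadPoolExecutor(max_workers=max(len(json_file), 1)) as executor:
        return list(executor.map(invoke_one, json_file))

With InvocationType='Event' you would instead fire and forget: invoke() returns immediately (status 202) and the invocations run concurrently on their own.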

Related

Python multiprocessing - Subprocesses with same PID

Basically, I'm trying to open a new process every time I call a function. The problem is that when I get the PID inside the function, it is the same PID as in other invocations, even when those invocations haven't finished yet.
I'm wrapping my function with a decorator:
from multiprocessing import Pipe, Process

def run_in_process(function):
    """Runs a function isolated in a new process.

    Args:
        function (function): Function to execute.
    """
    def wrapper(*args):
        parent_connection, child_connection = Pipe()
        process = Process(target=function, args=(*args, child_connection))
        process.start()
        response = parent_connection.recv()
        process.join()
        return response
    return wrapper
And declaring the function like this:
import os

@run_in_process
def example(data, pipe):
    print(os.getpid())
    pipe.send("Just an example here!")
    pipe.close()
Obs. 1: This code is running inside an AWS Lambda.
Obs. 2: These Lambdas didn't finish before the next one started, because each task takes at least 10 seconds.
[Screenshots of the logs for executions 1, 2 and 3 were attached to the original question.]
You can see in the logs that each one is a different execution and that they ran at the "same" time.
The question is: why do they have the same PID even though they are running concurrently? Shouldn't they have different PIDs?
I absolutely need to execute this function in an isolated process.
Your Lambda function could have been running in multiple containers at once in the AWS cloud. If you've been heavily testing your function with multiple concurrent requests, it is quite possible that AWS's orchestration created additional instances to handle the traffic. Each container has its own isolated process space, so the same PID can legitimately show up in several containers at the same time.
With serverless, you lose some of the visibility into exactly how your code is being executed, but does it really matter?
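One way to check this for yourself (a hypothetical sketch, not part of the original answer) is to pair the PID with a marker generated once per container at import time; invocations reporting the same marker ran in the same container, while different markers mean AWS spun up separate containers:

import os
import uuid

# created once per container, when the module is first imported
COLD_START_ID = uuid.uuid4().hex

def lambda_handler(event=None, context=None):
    # same COLD_START_ID + same PID => same container handled both calls
    print(f"container={COLD_START_ID} pid={os.getpid()}")
    return {"container": COLD_START_ID, "pid": os.getpid()}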

How to interrupt a function when cancelled from the UI in Python?

I have a function that extracts data, like this:
@controller.route('/scrap_all_categories')
def scrap_all_categories():
    result = ETL_jobs.extract_all_category(url, conn)
    return result
extract_all_category is a function which extracts data from a website using BeautifulSoup. It takes about 30 minutes to finish, and it runs until the job is done; the keyboard interrupt has no effect.
I need to create a function to cancel this process. Which Python functions can I use to interrupt this job?
Thanks.
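One common approach (not from the original thread, just a sketch reusing controller, ETL_jobs, url and conn from the question; the /cancel_scrap endpoint and current_job handle are hypothetical) is to run the job in a separate process so another endpoint can kill it:

from multiprocessing import Process

current_job = None  # hypothetical module-level handle to the running job

@controller.route('/scrap_all_categories')
def scrap_all_categories():
    global current_job
    current_job = Process(target=ETL_jobs.extract_all_category, args=(url, conn))
    current_job.start()  # return immediately instead of blocking for 30 min
    return 'started'

@controller.route('/cancel_scrap')
def cancel_scrap():
    if current_job is not None and current_job.is_alive():
        current_job.terminate()  # forcibly stops the worker process
        return 'cancelled'
    return 'nothing running'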

How to interrupt the current method to execute another method in Python [closed]

In Python, I have two methods. In method A, I receive parameters and put them into a parameter list. In method B, I process the items of the parameter list and put the results into a log list. At the end of method A, I read the log list in a while loop to pick up the result for the parameters that were just passed to A. What I would like to know is how to pause A partway through so that method B can start; otherwise A loops endlessly.
I added a sleep call, expecting A to yield and B to execute, but it has no effect.
import json
import threading
from time import sleep

from flask import request  # get_data() suggests Flask; an assumption

# uuid, msg and do() are defined elsewhere in the asker's code;
# only the syntax errors and indentation are fixed here
queque_list = []
log_list = []

def A():
    try:
        datas = request.get_data()
        data = json.loads(datas)
        global queque_list, log_list
        queque_list.append({"data": data})
    finally:
        while 1:
            sleep(3)
            if len(log_list) > 0:
                for logdata in log_list:
                    if logdata.get('uuid') == uuid:
                        return logdata.get('msg')

def B(task):
    try:
        do(task)
    finally:
        log_list.append({"uuid": uuid, "msg": msg})

def C():
    while True:
        if len(queque_list) > 0:
            task = queque_list.pop(0)
            B(task)

t = threading.Thread(target=C)
t.start()
I expect method A to pause when it reaches its finally block and to wait for method B to finish before continuing. Instead, A reaches the finally block, B never executes, and the method loops endlessly.
You can use queue.Queue to send messages between the threads: the put() method sends a message and the get() method waits for a message in another thread. With this, you can make the threads work in lock-step.
I'm not sure what you are trying to do, but perhaps you can get away with doing all the work in a single thread, for simplicity.
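A minimal sketch of that suggestion (illustrative names, with do() standing in for the question's processing step B): A blocks on results.get() instead of polling log_list, and the worker thread replies through a second queue:

import threading
import queue

tasks = queue.Queue()
results = queue.Queue()

def do(task):
    # stand-in for the real processing done in B
    return {"uuid": task["uuid"], "msg": "processed"}

def worker():
    while True:
        task = tasks.get()        # blocks until A submits something
        results.put(do(task))     # hand the result straight back

threading.Thread(target=worker, daemon=True).start()

def A(data):
    tasks.put(data)
    return results.get()          # waits here until the worker replies

print(A({"uuid": "abc"}))         # -> {'uuid': 'abc', 'msg': 'processed'}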

How to schedule tasks without exiting existing loop?

I have struggled with this question for about a week -- time to ask someone who can bang out an answer in a couple minutes.
I am trying to run a Python program once every 10 seconds. There are a lot of questions of this sort: Use sched module to run at a given time, Python threading.timer - repeat function every 'n' seconds, How to execute a function asynchronously every 60 seconds in Python?
Normally the solutions using sched or time.sleep would work, but I am trying to start a scheduled process from within cmd2, which is already running its own command loop. (When you exit cmd2, it exits this loop.)
Because of this, when I start a function to repeat every 10 seconds, I enter another loop nested within cmd2 and I am unable to enter cmd2 commands. I can only get back to cmd2 by exiting the sub-loop that is repeating the function, and thus the function stops repeating.
Evidently threading will solve this problem. I have tried threading.Timer without success. Perhaps the real problem is that I do not understand threads or multiprocessing.
Here is an example of code that is roughly isomorphic to the code I'm using; this version, built on the sched module, I did get to work:
import cmd2
import repeated

class prompt(cmd2.Cmd):
    """this lets you enter commands"""
    def default(self, line):
        return cmd2.Cmd.default(self, line)
    def do_exit(self, line):
        return True
    def do_repeated(self, line):
        repeated.func_1()
Where repeated.py looks like this:
import sched
import time

def func_2(sc):
    print('doing stuff')
    sc.enter(10, 0, func_2, (sc,))

def func_1():
    s = sched.scheduler(time.time, time.sleep)
    s.enter(0, 0, func_2, (s,))
    s.run()
http://docs.python.org/2/library/queue.html?highlight=queue#Queue
Can you instantiate a Queue object outside of cmd2? There can be one thread that watches the queue and takes jobs from it at periodic intervals, while cmd2 is free to run or not run. The thread that processes the queue, and the queue object itself, need to be in the outer scope, of course.
To schedule something at a particular time, you can insert a tuple which has the target time in it. Or you can have the thread just check at regular intervals, if that's good enough.
[Edit: if you have a process that is intended to repeat, you can have it requeue itself at the end of its operation.]
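A sketch of that idea (illustrative; the queue and thread live in the outer scope, created before cmd2's command loop starts):

import queue
import threading
import time

jobs = queue.Queue()

def watcher():
    while True:
        while not jobs.empty():
            jobs.get()()       # each queued item is a callable; run it
        time.sleep(10)         # then wait for the next interval

threading.Thread(target=watcher, daemon=True).start()
# ...now start cmd2's cmdloop(); the watcher keeps draining the queue

A do_ command can then just jobs.put(some_function) and return immediately, leaving the prompt usable.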
As soon as I asked the question, I was able to figure it out. Don't know why that happens sometimes.
This code:
import threading

def f():
    # do something here ...
    # call f() again in 60 seconds
    threading.Timer(60, f).start()

# start calling f now and every 60 sec thereafter
f()
(From here: How to execute a function asynchronously every 60 seconds in Python?)
This actually works for what I was trying to do. There are evidently some subtleties in how the function is passed as an argument to threading.Timer. When I included the arguments, or even just the parentheses after the function, I got recursion depth errors; i.e., the function called itself constantly, with no delay.
So anyone else who has a problem like this: pay attention to how you pass the function in threading.Timer(60, f).start(). If you write threading.Timer(60, f()).start() or something similar, it will probably not work.
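If the repeated function takes arguments, the same rule applies: pass the function object and supply the arguments separately through Timer's args parameter, rather than calling the function yourself. A small sketch:

import threading

def f(message):
    print(message)
    # pass f itself plus its args; writing f(message) here would call it
    # immediately and recurse with no delay
    threading.Timer(60, f, args=(message,)).start()

f('tick')  # prints now, then every 60 seconds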

thread for callback function in timeout_add

On which thread does the callback function get executed, every "interval" milliseconds, when we schedule a function using the following method?
def glib.timeout_add(interval, callback, ...)
https://developer.gnome.org/pygobject/stable/glib-functions.html#function-glib--timeout-add
In the thread which is running the default main loop.
If it's not documented, you'll either have to read the source code, or you can print the return value of thread.get_ident() from inside the callback function and compare it to values printed from inside known threads in your code.
It's possible that the ident won't match any of the other threads, in which case the callbacks run on a thread created internally just for that purpose.
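A minimal way to run that check, sketched with the modern PyGObject bindings (GLib.timeout_add rather than the old glib.timeout_add the question links to):

import threading
from gi.repository import GLib

def on_timeout():
    print('callback thread:', threading.get_ident())
    return True  # returning True keeps the timeout scheduled

print('main thread:', threading.get_ident())
GLib.timeout_add(1000, on_timeout)  # fire every 1000 ms
GLib.MainLoop().run()               # the callback runs in this loop's thread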
