I need to call an async function inside an AWS Lambda function. When executing this, it shows this error:
await wasn't used with future
Here is my code:
import asyncio
import functools

import boto3

s3_client = boto3.client('s3')

async def main():
    bucketName = 'data-store-test'
    folder = 'Contacts-Aggre'
    lastModifiedFolderPath = await getLastModifiedBucketPath(bucketName, folder)
    print('Received lastModifiedFolderPath:', lastModifiedFolderPath)

async def getLastModifiedBucketPath(bucketName, prefix):
    loop = asyncio.get_running_loop()
    bucket_objects = await asyncio.gather(*loop.run_in_executor(None, functools.partial(s3_client.list_objects_v2, Bucket=bucketName, Prefix=prefix)))
    all = bucket_objects['Contents']
    latest = max(all, key=lambda x: x['LastModified'])
    folderPaths = latest['Key'].split('/')
    lastModifiedFolder = folderPaths[1] if len(folderPaths) >= 2 else folderPaths[0]
    return lastModifiedFolder

def lambda_handler(event, context):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
What is the issue in this code? I created another question related to this, but no one answered it, which is why I created this one.
The getLastModifiedBucketPath method is used to get the name of the last modified folder at a specific bucket location. If I remove the asyncio.gather part, it works, but I need to return the value after it executes.
I'm just learning asyncio this week, so take this with a grain of salt, but my belief is:
loop.run_in_executor returns a future.
This future is not awaited; when you try to splat it with *, you get that error (try just that snippet in your REPL and you'll see).
gather needs the splatted argument to be an iterable of awaitables, not a single future.
Try awaiting that run_in_executor call before using it as an argument to gather and see what happens.
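For example, a minimal sketch of that fix; with a single call there is no need for gather at all:

bucket_objects = await loop.run_in_executor(
    None,
    functools.partial(s3_client.list_objects_v2, Bucket=bucketName, Prefix=prefix),
)
# bucket_objects is now the response dict itself, so bucket_objects['Contents'] works directly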
Related
I created an API in Python and I want to start some long-running function from an endpoint, but I also want to tell the user that the endpoint completed successfully and that the task has started executing.
I want this because the user should not have to wait for the function to finish.
Represented in pseudocode, it would look something like this:
async def my_endpoint(context):
    func_name = context.func_name
    <something_validation_block>
    return 204 if all right
So, how can this be done in one function?
I tried something like:
async def handle(context):
    <validate_block>
    threading.Thread(
        target=long_func, args=(context,),
    ).start()
    return 204
But unfortunately it does not work. :(
First, asyncio has a function named asyncio.to_thread (docs).
It provides a friendly way to combine async code with threading.
(Or you can run the task in a thread pool yourself; docs.)
Then you can use asyncio.create_task(coro) to run an async function in the background.
It will return a Task object, which is awaitable, or you can use task.add_done_callback to handle the result.
import asyncio
import time

def block() -> str:
    print("block function start")
    time.sleep(1)  # blocking call; this is why it runs in an executor below
    print("block function done")
    return "result"

async def main() -> int:
    # run the blocking function in the default thread pool executor
    task = asyncio.get_running_loop().run_in_executor(None, block)
    # handle the result when it completes, without awaiting it here
    task.add_done_callback(lambda task: print("task with result:", task.result()))
    print("return 204")
    return 204

asyncio.run(main())
Output:
block function start
return 204
block function done
task with result: result
NOTE: Save a reference to tasks, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done.
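For completeness, here is a minimal sketch of the asyncio.to_thread variant mentioned above (Python 3.9+), keeping a strong reference to the task as the note advises. The names are illustrative:

import asyncio
import time

background_tasks = set()  # strong references keep tasks from being garbage collected

def block() -> str:
    time.sleep(1)  # blocking work, pushed into a worker thread
    return "result"

async def main() -> int:
    # to_thread wraps the blocking call in a coroutine; create_task schedules it
    task = asyncio.create_task(asyncio.to_thread(block))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    print("return 204")
    return 204

# note: under a bare asyncio.run the pending task may be cancelled at shutdown;
# in a long-running server the loop stays alive and the task completes normally
asyncio.run(main())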
I have very recently started checking out asyncio for Python. The use case is that I hit an endpoint /refresh_data and the data is refreshed (from an S3 bucket), but this should be a non-blocking operation: other API endpoints should still be able to be serviced.
So in my controller, I have:
def refresh_data():
    myservice.refresh_data()
    return jsonify(dict(ok=True))
and in my service I have:
async def refresh_data():
    try:
        s_result = await self.s3_client.fetch()
    except (FileNotFound, IOError) as e:
        logger.info("problem")

    gather = {i.pop("x"): i async for i in summary_result}
    # ... some other stuff
and in my client:
async def fetch():
    result = pd.read_parquet("bucket", "pyarrow", cols, filts).to_dict(orient="col1")
    return result
And when I run it, I see this error:
TypeError: 'async for' requires an object with __aiter__ method, got coroutine
I don't know how to move past this. Adding async definitely makes it return a coroutine type, but either I have implemented this messily or I haven't fully understood the asyncio package in Python. I have been working off of simple examples, but I'm not sure what's wrong with what I've done here.
Objective:
I am trying to scrape multiple URLs simultaneously. I don't want to make too many requests at the same time, so I am using this solution to limit it.
Problem:
Requests are being made for ALL tasks instead of for a limited number at a time.
Stripped-down Code:
async def download_all_product_information():
    # TO LIMIT THE NUMBER OF CONCURRENT REQUESTS
    async def gather_with_concurrency(n, *tasks):
        semaphore = asyncio.Semaphore(n)

        async def sem_task(task):
            async with semaphore:
                return await task

        return await asyncio.gather(*(sem_task(task) for task in tasks))

    # FUNCTION TO ACTUALLY DOWNLOAD INFO
    async def get_product_information(url_to_append):
        url = 'https://www.amazon.com.br' + url_to_append
        print('Product Information - Page ' + str(current_page_number) + ' for category ' + str(
            category_index) + '/' + str(len(all_categories)) + ' in ' + gender)
        source = await get_source_code_or_content(url, should_render_javascript=True)
        time.sleep(random.uniform(2, 5))
        return source

    # LOOP WHERE STUFF GETS DONE
    for current_page_number in range(1, 401):
        for gender in os.listdir(base_folder):
            all_tasks = []
            # check all products in the current page
            all_products_in_current_page = open_list(os.path.join(base_folder, gender, category, current_page))
            for product_specific_url in all_products_in_current_page:
                current_task = asyncio.create_task(get_product_information(product_specific_url))
                all_tasks.append(current_task)
            await gather_with_concurrency(random.randrange(8, 15), *all_tasks)

async def main():
    await download_all_product_information()

# just to make sure there are not any problems caused by two event loops
if asyncio.get_event_loop().is_running():  # only patch if needed (i.e. running in Notebook, Spyder, etc.)
    import nest_asyncio
    nest_asyncio.apply()

# for asynchronous functionality
if __name__ == '__main__':
    asyncio.run(main())
What am I doing wrong? Thanks!
What is wrong is this line:
current_task = asyncio.create_task(get_product_information(product_specific_url))
When you create a "task" it is immediately scheduled for execution. As soon as your code yields execution to the asyncio loop (at any "await" expression), asyncio will loop over all your tasks, executing them.
The semaphore, in the original snippet you pointed to, guarded the creation of the tasks themselves, ensuring only "n" tasks would be active at a time. What is passed in to gather_with_concurrency in that snippet are coroutines.
Coroutines, unlike tasks, are objects that are ready to be awaited but are not yet scheduled. They can be passed around freely, just like any other object; they will only be executed when they are either awaited or wrapped by a task (and then when the code passes control to the asyncio loop).
In your code, you create the coroutine with the get_product_information call and immediately wrap it in a task. They are all started at once at the await expression in the line that calls gather_with_concurrency.
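To see the difference, here is a tiny self-contained sketch (the names are illustrative, not from the code above):

import asyncio

async def work(n):
    print("started", n)
    await asyncio.sleep(0.1)

async def demo():
    t = asyncio.create_task(work(1))  # scheduled immediately
    c = work(2)                       # bare coroutine object; nothing runs yet
    await asyncio.sleep(0)            # yield to the loop: "started 1" prints here
    await asyncio.gather(t, c)        # only now does work(2) start

asyncio.run(demo())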
The fix is simple: do not create a task at this point; that should happen only inside the code guarded by your semaphore. Add just the raw coroutines to your list:
...
all_coroutines = []
# check all products in the current page
all_products_in_current_page = open_list(os.path.join(base_folder, gender, category, current_page))
for product_specific_url in all_products_in_current_page:
    current_coroutine = get_product_information(product_specific_url)
    all_coroutines.append(current_coroutine)
await gather_with_concurrency(random.randrange(8, 15), *all_coroutines)
There is still an unrelated bug in this code that will make concurrency fail: you are making a synchronous call to time.sleep inside get_product_information. This will stall the asyncio loop at that point until the sleep is over. The correct thing to do is to use await asyncio.sleep(...).
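For instance, the delay line inside get_product_information would become:

await asyncio.sleep(random.uniform(2, 5))  # yields to the event loop instead of blocking it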
I'm creating a script that posts a message to both Discord and Twitter, depending on some input. I have two methods (in separate .py files), post_to_twitter and post_to_discord. What I want to achieve is that both of these try to execute even if the other fails (e.g. if there is some exception with login).
Here is the relevant code snippet for posting to Discord:
def post_to_discord(message, channel_name):
    client = discord.Client()

    @client.event
    async def on_ready():
        channel = # getting the right channel
        await channel.send(message)
        await client.close()

    client.run(discord_config.token)
and here is the snippet for the posting-to-Twitter part (stripped of the try-except blocks):
def post_to_twitter(message):
    auth = tweepy.OAuthHandler(twitter_config.api_key, twitter_config.api_key_secret)
    auth.set_access_token(twitter_config.access_token, twitter_config.access_token_secret)
    api = tweepy.API(auth)
    api.update_status(message)
Now, both of these work perfectly fine on their own and when called synchronously from the same method:
def main(message):
    post_discord.post_to_discord(message)
    post_tweet.post_to_twitter(message)
However, I just cannot get them to work concurrently (i.e. to try to post to Twitter even if Discord fails, or vice versa). I've already tried a couple of different approaches with multi-threading and with asyncio.
Among others, I've tried the solution from this question, but got the error No module named 'IPython'. When I omitted the IPython line and changed the methods to async, I got this error: RuntimeError: Cannot enter into task <ClientEventTask state=pending event=on_ready coro=<function post_to_discord.<locals>.on_ready at 0x7f0ee33e9550>> while another task <Task pending name='Task-1' coro=<main() running at post_main.py:31>> is being executed..
To be honest, I'm not even sure if asyncio would be the right approach for my use case, so any insight is much appreciated.
Thank you.
In this case, running the two things in completely separate threads (and completely separate event loops) is probably the easiest option at your level of expertise. For example, try this:
import post_discord
import post_tweet

import concurrent.futures

def main(message):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        fut1 = pool.submit(post_discord.post_to_discord, message)
        fut2 = pool.submit(post_tweet.post_to_twitter, message)
        # here, closing the threadpool will wait for both futures to complete

    # make exceptions visible
    for fut in (fut1, fut2):
        try:
            fut.result()
        except Exception as e:
            print("error: ", e)
I am trying to forward messages to internal topics in Faust, as suggested in this example from the Faust docs:
https://faust.readthedocs.io/en/latest/playbooks/vskafka.html
I have the following code:
in_topic = app.topic("in_topic", internal=False, partitions=1)
batching = app.topic("batching", internal=True, partitions=1)
....

@app.agent(in_topic)
async def process(stream):
    async for event in stream:
        event.forward(batching)
        yield
But I always get the following error when running my pytest:
AttributeError: 'str' object has no attribute 'forward'
Was this feature removed, do I need to specify the topic differently to get an event, or is this an issue with pytest?
You are using the syntax backwards, that's why!
in_topic = app.topic("in_topic", internal=False, partitions=1)
batching = app.topic("batching", internal=True, partitions=1)
....

@app.agent(in_topic)
async def process(stream):
    async for event in stream:
        await batching.send(value=event)
        yield
Should actually work.
Edit:
This is also the only way that I could make correct use of pytest with Faust.
The sink method is only usable if you delete all but the last sink of the mocked agent, which does not seem intuitive.
If you first import your sink and then add this decorator to your test, everything should work fine:
from my_path import my_sink, my_agent
from unittest.mock import patch, AsyncMock

@patch(__name__ + '.my_sink.send', new_callable=AsyncMock)
async def test_my_agent(mock_send, test_app):  # patch injects the mock as the first argument
    async with my_agent.test_context() as agent:
        payload = 'This_works_fine'
        await agent.put(payload)
        my_sink.send.assert_called_with(value=payload)
Try the next one:
@app.agent(in_topic, sink=[batching])
async def process(stream):
    async for event in stream:
        yield event
The answer from @Florian Hall works fine. I also found out that the forward method is only implemented for Events; in my case I received a str. The reason could be the difference between Faust Topics and Channels. Another thing is that using pytest and the Faust test_context() behaves weirdly; for example, it forces you to yield even if you don't have a sink defined.
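If you do need real Event objects (which have .forward), here is a possible sketch, assuming Faust's Stream.events() iterator, which yields the raw events rather than their decoded values:

@app.agent(in_topic)
async def process(stream):
    # events() yields faust Event objects instead of plain values,
    # so .forward() exists; it is a coroutine, hence the await
    async for event in stream.events():
        await event.forward(batching)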