What is the default concurrency level with asyncio in Python?

I'm using Python Asyncio to do a lot of HTTP requests.
I'm wondering what is the default concurrency level of Asyncio or how many HTTP requests would be happening in parallel at any given time?
You can find the code I'm using to make the HTTP requests below:
async def call_url(self, session, url):
    response = await session.request(method='GET', url=url)
    return response

async def main(self, url_list):
    async with aiohttp.ClientSession() as session:
        res = await asyncio.gather(*[self.call_url(session, url) for url in url_list])
        return res

There is no built-in limit in asyncio, but there is one in aiohttp. The TCPConnector limits the number of connections to 100 by default. You can override it by creating a TCPConnector with a different limit and passing it to the session.
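For example, a rough sketch of overriding the default (the values 200 and 20 are arbitrary, chosen only for illustration):

import asyncio
import aiohttp

async def main(url_list):
    # Raise the per-session connection cap from the default of 100,
    # and optionally cap simultaneous connections per host as well.
    connector = aiohttp.TCPConnector(limit=200, limit_per_host=20)
    async with aiohttp.ClientSession(connector=connector) as session:
        async def call_url(url):
            response = await session.request(method='GET', url=url)
            return response
        return await asyncio.gather(*[call_url(url) for url in url_list])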

Related

Using a single ClientSession instead of creating one for every request slowed down my HTTP calls

My Python program makes HTTP requests to several different sites once every few hours. At first, I didn't know that the recommended way to use aiohttp is to create only one ClientSession and use it for every request in the program's lifetime, so I created a new ClientSession for every call. The time between request and response was 0.3 to 0.5 seconds.
After learning that I should use just one ClientSession, which is supposed to be faster, I modified my code. But now the time between request and response is 0.5 to 1.5 seconds, and I see response times above 1 second all the time, which never happened before.
Why is the recommended way slower?
I really don't want to change it back, because it is cleaner now, and I did other adjustments (which I am sure don't affect the response time) in the same commit. Is there any way I can use one shared ClientSession and make it as fast as before?
Here are the code examples:
Before:
async def my_func1():
    async with aiohttp.ClientSession() as session:
        async with session.post(...) as resp:
            # process response

async def my_func2():
    async with aiohttp.ClientSession() as session:
        async with session.get(...) as resp:
            # process response

await asyncio.gather(my_func1(), my_func2())
After:
async def my_func1(session: ClientSession):
    async with session.post(...) as resp:
        # process response

async def my_func2(session: ClientSession):
    async with session.get(...) as resp:
        # process response

async with aiohttp.ClientSession() as session:
    await asyncio.gather(my_func1(session), my_func2(session))
After using a different ClientSession for each site, as @antfuentes87 suggested, the problem seems solved.
After running for two days, response times stay at 0.2 to 0.5 seconds.
Code sample:
async def my_func1(session: ClientSession):
    async with session.post(...) as resp:
        # process response

async def my_func2(session: ClientSession):
    async with session.get(...) as resp:
        # process response

async with ClientSession(site1) as session1, ClientSession(site2) as session2:
    await asyncio.gather(my_func1(session1), my_func2(session2))

How to schedule the execution of an async function in Python and immediately return

I need to implement a proxy server in Python that forwards client requests to some API if it doesn't have the data in its cache. The requirement for when the data isn't present in the cache is to not let the client wait at all, but rather to send back something like "you'll have your data soon" and meanwhile send a request to the API. It was my understanding that I need to use async/await for this, but I could not make it work no matter what I tried. I am using the asyncio and aiohttp libraries for this.
So let's say I have my function that sends a request to the api:
async def fetch(url, page_num):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            resp = await response.json()
            cache[page_num] = (resp, datetime.now())
            return resp
What I would like is the following behavior:
if not_in_cache(page_number):
    fetch(url, page_number)  # this needs to return immediately so the client won't wait!!!
    return Response("we're working on it")  # send back a response without data
So on the one hand I want the method to immediately return a response to the client, but in the background I want it to get the data and store it in the cache. How can you accomplish that with async/await?
Create a task. Instead of:
if not_in_cache(page_number):
    await fetch(url, page_number)
    return Response(...)
write:
if not_in_cache(page_number):
    asyncio.create_task(fetch(url, page_number))
    return Response(...)
Don't forget to read the asyncio docs: Coroutines and Tasks
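One caveat from the asyncio docs: the event loop only keeps a weak reference to tasks created with create_task, so keep your own reference to prevent the task from being garbage-collected before it finishes. A rough sketch (schedule_fetch is a hypothetical helper wrapping the fetch coroutine above):

import asyncio

background_tasks = set()

def schedule_fetch(url, page_number):
    # Keep a strong reference so the task isn't garbage-collected mid-flight,
    # and drop it automatically once the task completes.
    task = asyncio.create_task(fetch(url, page_number))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)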

Python ThreadPoolExecutor: close all threads when I get a result

I am running a web scraper class whose method name is self.get_with_random_proxy_using_chain.
I am trying to send multithreaded calls to the same URL, and would like that once any thread gets a result, the method returns a response and closes the other still-active threads.
So far my code looks like this (probably naive):
from concurrent.futures import ThreadPoolExecutor, as_completed

# class initiation etc
max_workers = cpu_count() * 5
urls = [url_to_open] * 50

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_url = []
    for url in urls:  # i had to do a loop to include sleep not to overload the proxy server
        future_to_url.append(executor.submit(self.get_with_random_proxy_using_chain,
                                             url,
                                             timeout,
                                             update_proxy_score,
                                             unwanted_keywords,
                                             unwanted_status_codes,
                                             random_universe_size,
                                             file_path_to_save_streamed_content))
        sleep(0.5)
    for future in as_completed(future_to_url):
        if future.result() is not None:
            return future.result()
But it runs all the threads.
Is there a way to close all threads once the first future has completed?
I am using Windows and Python 3.7.x.
So far I found this link, but I don't manage to make it work (the program still runs for a long time).
As far as I know, running futures cannot be cancelled. Quite a lot has been written about this, and there are even some workarounds.
But I would suggest taking a closer look at the asyncio module. It is quite well suited for such tasks.
Below is a simple example, when several concurrent requests are made, and upon receiving the first result, the rest are canceled.
import asyncio
from typing import Set

from aiohttp import ClientSession

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def wait_for_first_response(tasks):
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for p in pending:
        p.cancel()
    return done.pop().result()

async def request_one_of(*urls):
    tasks = set()
    async with ClientSession() as session:
        for url in urls:
            task = asyncio.create_task(fetch(url, session))
            tasks.add(task)
        return await wait_for_first_response(tasks)

async def main():
    response = await request_one_of("https://wikipedia.org", "https://apple.com")
    print(response)

asyncio.run(main())
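For completeness, the thread-pool "workarounds" mentioned above usually amount to cancelling work that has not started yet. A rough sketch (requires Python 3.9+ for cancel_futures, so it would not help on the asker's 3.7 setup; scrape stands in for self.get_with_random_proxy_using_chain):

from concurrent.futures import ThreadPoolExecutor, as_completed

def first_result(scrape, urls, max_workers):
    executor = ThreadPoolExecutor(max_workers=max_workers)
    futures = [executor.submit(scrape, url) for url in urls]
    try:
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                return result
    finally:
        # Cancels only futures still waiting in the queue; threads that are
        # already running finish their current call regardless.
        executor.shutdown(wait=False, cancel_futures=True)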

FastApi communication with another API

I started using FastAPI only recently, and as an exercise I want to connect my FastAPI API to a validation service on another server, but I do not know how to do this; I have not found anything that helps me in the official documentation. Will I have to do it with Python code? Or is there another way?
FastApi docs
Thank you for your help, and excuse my English.
The accepted answer certainly works, but it is not an efficient solution. With each request, the ClientSession is closed, so we lose the advantages [0] of ClientSession: connection pooling, keep-alives, etc.
We can use the startup and shutdown events [1] in FastAPI, which are triggered when the server starts and shuts down respectively. In these events it is possible to create a ClientSession instance and use it during the runtime of the whole application (and therefore utilize its full potential).
The ClientSession instance is stored in the application state. [2]
Here I answered a very similar question in the context of the aiohttp server: https://stackoverflow.com/a/60850857/752142
from __future__ import annotations

import asyncio
from typing import Final

from aiohttp import ClientSession
from fastapi import Depends, FastAPI
from starlette.requests import Request

app: Final = FastAPI()

@app.on_event("startup")
async def startup_event():
    setattr(app.state, "client_session", ClientSession(raise_for_status=True))

@app.on_event("shutdown")
async def shutdown_event():
    await asyncio.wait_for(app.state.client_session.close(), timeout=5.0)

def client_session_dep(request: Request) -> ClientSession:
    return request.app.state.client_session

@app.get("/")
async def root(
    client_session: ClientSession = Depends(client_session_dep),
) -> str:
    async with client_session.get(
        "https://example.com/", raise_for_status=True
    ) as the_response:
        return await the_response.text()
[0] https://docs.aiohttp.org/en/stable/client_reference.html
[1] https://fastapi.tiangolo.com/advanced/events/
[2] https://www.starlette.io/applications/#storing-state-on-the-app-instance
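As a side note, newer FastAPI versions steer users away from the on_event hooks and toward a lifespan handler; the same one-session-per-application pattern can be expressed that way. A rough sketch, assuming a FastAPI version with lifespan support:

from contextlib import asynccontextmanager

from aiohttp import ClientSession
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create one shared ClientSession at startup and close it on shutdown.
    app.state.client_session = ClientSession(raise_for_status=True)
    try:
        yield
    finally:
        await app.state.client_session.close()

app = FastAPI(lifespan=lifespan)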
You will need to code it with Python.
If you're using async, you should use an HTTP client that is also async, for example aiohttp.
import aiohttp

@app.get("/")
async def slow_route():
    async with aiohttp.ClientSession() as session:
        async with session.get("http://validation_service.com") as resp:
            data = await resp.text()
            # do something with data

Trying to do multiple requests simultaneously and then add them to a set with aiohttp and Python

I have the code below that makes GET requests to an HTTP endpoint. However, doing them one at a time is super slow, so the code below does them 50 at a time. But I need to add the results to a set (I figured a set would be fastest, because this script returns duplicate objects). Right now, it just returns the objects in a string 50 at a time, when I need them separated so I can sort them after they are all in a set. I'm new to Python, so I'm not sure what else to try.
import asyncio
from aiohttp import ClientSession

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def run(r):
    url = "http://httpbin.org/get"
    tasks = []
    # Fetch all responses within one Client session,
    # keep connection alive for all requests.
    async with ClientSession() as session:
        for i in range(r):
            task = asyncio.ensure_future(fetch(url.format(i), session))
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
        # you now have all response bodies in this variable
        print(responses)

def print_responses(result):
    print(result)

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(20))
loop.run_until_complete(future)
Right now, it just dumps all of the request responses into the result; I need it to add each response to a set so I can work with the data later.
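A rough sketch of one way to do that, assuming the bodies should be deduplicated as decoded strings (the name collect_unique is a placeholder):

import asyncio
from aiohttp import ClientSession

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def collect_unique(url, n):
    # Gather all responses, then deduplicate by adding the decoded bodies to a set.
    async with ClientSession() as session:
        responses = await asyncio.gather(*(fetch(url, session) for _ in range(n)))
    unique = {body.decode() for body in responses}
    return sorted(unique)

print(asyncio.run(collect_unique("http://httpbin.org/get", 20)))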
