The Getting Started docs for aiohttp give the following client example:
import asyncio
import aiohttp

async def fetch_page(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            assert response.status == 200
            return await response.read()

loop = asyncio.get_event_loop()
with aiohttp.ClientSession(loop=loop) as session:
    content = loop.run_until_complete(
        fetch_page(session, 'http://python.org'))
    print(content)
And they give the following note for Python 3.4 users:
If you are using Python 3.4, please replace await with yield from and
async def with a @coroutine decorator.
If I follow these instructions I get:
import aiohttp
import asyncio

@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return (yield from response.text())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)
However, this will not run, because async with is not supported in Python 3.4:
$ python3 client.py
  File "client.py", line 7
    async with session.get(url) as response:
              ^
SyntaxError: invalid syntax
How can I translate the async with statement to work with Python 3.4?
Just don't use the result of session.get() as a context manager; use it as a coroutine directly instead. The request context manager that session.get() produces would normally release the request on exit, but so does using response.text(), so you could ignore that here:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        response = yield from session.get(url)
        return (yield from response.text())
The request wrapper returned here doesn't have the required asynchronous methods (__aenter__ and __aexit__); they are omitted entirely when not using Python 3.5 (see the relevant source code).
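For context, here is roughly what the Python 3.5 statement expands to, which is why it can't work on 3.4 where those methods don't exist. This is a simplified sketch, not aiohttp's actual code; the real desugaring also forwards exception details to __aexit__:

async def fetch(session, url):
    # Rough 3.5 desugaring of: async with session.get(url) as response: ...
    mgr = session.get(url)
    response = await mgr.__aenter__()
    try:
        return await response.text()
    finally:
        # Simplified: the real protocol passes (exc_type, exc, tb) here
        await mgr.__aexit__(None, None, None)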
If you have more statements between the session.get() call and accessing the response.text() awaitable, you probably want to use a try:..finally: anyway to release the connection; the Python 3.5 release context manager also closes the response if an exception occurred. Because a yield from response.release() is needed here, this can't be encapsulated in a context manager before Python 3.5:
import sys

@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        response = yield from session.get(url)
        try:
            # other statements
            return (yield from response.text())
        finally:
            if sys.exc_info()[0] is not None:
                # on exceptions, close the connection altogether
                response.close()
            else:
                yield from response.release()
aiohttp's examples are implemented using 3.4 syntax. Based on the json client example, your function would be:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        resp = yield from session.get(url)
        try:
            return (yield from resp.text())
        finally:
            yield from resp.release()
Update:
Note that Martijn's solution works for simple cases, but may lead to unwanted behavior in specific cases:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(5):
        response = yield from session.get(url)
        # Any actions that may lead to an error:
        1/0
        return (yield from response.text())

# exception + warning "Unclosed response"
Besides the exception, you'll also get the "Unclosed response" warning. This may lead to a connection leak in a complex app. You will avoid this problem if you manually call resp.release()/resp.close():
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(5):
        resp = yield from session.get(url)
        try:
            # Any actions that may lead to an error:
            1/0
            return (yield from resp.text())
        except Exception:
            # .close() on exception.
            resp.close()
            raise  # bare raise preserves the original traceback
        finally:
            # .release() otherwise, to return the connection into the free connection pool.
            # It's ok to release a closed response:
            # https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client_reqrep.py#L664
            yield from resp.release()

# exception only
I think it's better to follow the official examples (and the __aexit__ implementation) and call resp.release()/resp.close() explicitly.
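For comparison, on Python 3.5+ this close-on-error/release-otherwise policy is exactly what the built-in async context manager applies, so the explicit handling above collapses to the following (a minimal sketch in 3.5 syntax):

async def fetch(session, url):
    with aiohttp.Timeout(5):
        # __aexit__ closes the response on an exception and releases it otherwise
        async with session.get(url) as resp:
            return await resp.text()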
I have a pretty complicated API with custom parameters and headers, so I created a class to wrap around it. Here's a contrived example:
import asyncio
import aiohttp

# The wrapper class around my API
class MyAPI:
    def __init__(self, base_url: str):
        self.base_url = base_url

    async def send(self, session, method, url) -> aiohttp.ClientResponse:
        request_method = getattr(session, method.lower())
        full_url = f"{self.base_url}/{url}"
        async with request_method(full_url) as response:
            return response

async def main():
    api = MyAPI("https://httpbin.org")
    async with aiohttp.ClientSession() as session:
        response = await api.send(session, "GET", "/uuid")
        print(response.status)  # 200 OK
        print(await response.text())  # Exception: Connection closed

asyncio.run(main())
Why is my session closed? I didn't exit the session's context manager.
If I ignore the wrapper class, everything works as expected:
async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://httpbin.org/uuid") as response:
            print(await response.text())
You can't call response.text() once you have left the request_method(full_url) context.
If you write:
async with request_method(full_url) as response:
    text = await response.text()
    return response.status, text
then the send() method returns without error.
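Wired back into the wrapper class, a minimal sketch of a send() that reads the body while the request context is still open (returning plain data instead of the live ClientResponse; the tuple shape is just one possible choice):

class MyAPI:
    def __init__(self, base_url: str):
        self.base_url = base_url

    async def send(self, session, method, url):
        request_method = getattr(session, method.lower())
        full_url = f"{self.base_url}/{url}"
        async with request_method(full_url) as response:
            # Read the body before the connection is released on context exit
            text = await response.text()
            return response.status, text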
I have a script that checks the status code for a couple hundred thousand supplied websites, and I was trying to integrate a Semaphore into the flow to speed up processing. The problem is that whenever I integrate a Semaphore, I just get a list populated with None objects, and I'm not entirely sure why.
I have been mostly copying code from other sources as I don't fully grok asynchronous programming yet, but it seems like when I debug I should be getting results out of the function, yet something goes wrong when I gather the results. I've tried juggling around my looping, my gathering, ensuring futures, etc., but nothing seems to return a list of things that work.
async def fetch(session, url):
    try:
        async with session.head(url, allow_redirects=True) as resp:
            return url, resp.real_url, resp.status, resp.reason
    except Exception as e:
        return url, None, e, 'Error'

async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)

async def run(urls):
    timeout = 15
    tasks = []
    sem = asyncio.Semaphore(100)
    conn = aiohttp.TCPConnector(limit=64, ssl=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        for url in urls:
            task = asyncio.wait_for(bound_fetch(sem, session, url), timeout)
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
        # responses = [await f for f in tqdm.tqdm(asyncio.as_completed(tasks), total=len(tasks))]
        return responses

urls = ['https://google.com', 'https://yahoo.com']
loop = asyncio.ProactorEventLoop()
data = loop.run_until_complete(run(urls))
I've commented out the progress bar component, but that implementation returns the desired results when there is no semaphore.
Any help would be greatly appreciated. I am furiously reading up on asynchronous programming, but I can't wrap my mind around it yet.
You should explicitly return the result of the awaited coroutine.
Replace this code...
async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)
... with this:
async def bound_fetch(sem, session, url):
    async with sem:
        return await fetch(session, url)
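The underlying rule: a coroutine with no return statement evaluates to None when awaited, which is what gather() was collecting. A minimal illustration:

import asyncio

async def no_return():
    1 + 1  # does some work, but returns nothing

async def main():
    print(await no_return())  # prints: None

asyncio.run(main())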
I'm refactoring the following functions to use requests_async instead of aiohttp.
Original code:
async def ping_url(session, url):
    logger.debug('Ping: %s', url)
    try:
        response = await session.get(url)
    except asyncio.TimeoutError:
        logger.info('Ping timeout for %s', url)
        return ('TIMEOUT', url)
    else:
        logger.debug('Ping done for %s', url)
        return (response.status, url)

async def ping_urls(urls, headers=None):
    async with aiohttp.ClientSession(timeout=timeout, headers=headers) as session:
        tasks = [
            asyncio.create_task(
                ping_url(session, url)
            ) for url in urls
        ]
        ping_results = await asyncio.gather(*tasks)
        return ping_results
After refactoring:
async def ping_url(session, url):
    logger.debug('Ping: %s', url)
    try:
        response = await session.get(url)
    except requests_async.exceptions.Timeout:
        logger.info('Ping timeout for %s', url)
        return ('TIMEOUT', url)
    else:
        logger.debug('Ping done for %s', url)
        return (response.status_code, url)

async def ping_urls(urls, headers=None):
    async with requests_async.Session() as session:
        session.headers.update(headers or {})
        session.timeout = 2
        tasks = [
            asyncio.create_task(
                ping_url(session, url)
            ) for url in urls
        ]
        ping_results = await asyncio.gather(*tasks)
        return ping_results
If I run the ping_urls coro on my machine with the same URL pinged by my lambda, there is no significant difference in performance between the two versions.
Inside my AWS lambda, the requests_async version is 3 times slower than the aiohttp version.
Any idea on what could be the cause of the difference, and why it affects the code only when it's executed in the AWS lambda?
I've been checking that the versions of dependencies installed by my packaging scripts are the same as the ones I use locally.
I have also edited my lambda to execute both versions sequentially in the same invocation to ensure that the execution context is the same, but the performance difference is the same (3:1).
My code is as follows:
import asyncio
import aiohttp

urls = [
    'http://www.163.com/',
    'http://www.sina.com.cn/',
    'https://www.hupu.com/',
    'http://www.csdn.net/'
]

async def get_url_data(u):
    """
    read url data
    :param u:
    :return:
    """
    print('running ', u)
    resp = await aiohttp.ClientSession().get(url=u)
    headers = resp.headers
    print(u, headers)
    return headers

async def request_url(u):
    """
    main func
    :param u:
    :return:
    """
    res = await get_url_data(u)
    return res

loop = asyncio.get_event_loop()
task_lists = asyncio.wait([request_url(u) for u in urls])
loop.run_until_complete(task_lists)
loop.close()
When I run my code, it displays a warning message:
Unclosed client session
Can anybody give me a solution for this?
Thanks a lot
You should close the session in the end.
You have 2 options:
You can close the session manually:
import aiohttp
session = aiohttp.ClientSession()
# use the session here
session.close()
Or you can use it with a context manager:
import aiohttp
import asyncio

async def fetch(client):
    async with client.get('http://python.org') as resp:
        assert resp.status == 200
        return await resp.text()

async def main(loop):
    async with aiohttp.ClientSession(loop=loop) as client:
        html = await fetch(client)
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
The client session supports the context manager protocol for self closing.
If you are not using the context manager, the proper way to close the session also needs an await. Many answers on the internet miss that part, and few people actually notice it, presumably because most people use the more convenient context manager. But the manual await session.close() is essential when/if you are closing a class-wide session inside tearDownClass() when doing unit testing.
import aiohttp
session = aiohttp.ClientSession()
# use the session here
await session.close()
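For example, a minimal sketch of the unit-testing scenario mentioned above (class and names are illustrative, and the loop= argument is deprecated in newer aiohttp versions); since tearDownClass() is a regular synchronous method, the coroutine has to be driven through the loop explicitly:

import asyncio
import unittest
import aiohttp

class ApiTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.loop = asyncio.new_event_loop()
        cls.session = aiohttp.ClientSession(loop=cls.loop)

    @classmethod
    def tearDownClass(cls):
        # session.close() is a coroutine, so it must be awaited via the loop
        cls.loop.run_until_complete(cls.session.close())
        cls.loop.close()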
You should use ClientSession with the async context manager for proper blocking/freeing of resources:
async def get_url_data(u):
    """
    read url data
    :param u:
    :return:
    """
    print('running ', u)
    async with aiohttp.ClientSession() as session:
        resp = await session.get(url=u)
        headers = resp.headers
        print(u, headers)
        return headers
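Note that aiohttp's docs recommend reusing a single session across many requests rather than opening one per request; a minimal sketch of that variant, reusing the urls list from the question:

async def get_url_data(session, u):
    # Reuse one session (and its connection pool) for every URL
    resp = await session.get(url=u)
    print(u, resp.headers)
    return resp.headers

async def main():
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(get_url_data(session, u) for u in urls))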
I am trying to teach myself Python's async functionality. To do so I have built an async web scraper. I would like to limit the total number of connections I have open at once, to be a good citizen on servers. I know that semaphores are a good solution, and the asyncio library has a Semaphore class built in. My issue is that Python complains when using yield from in an async function, as you are combining yield and await syntax. Below is the exact syntax I am using...
import asyncio
import aiohttp

sema = asyncio.BoundedSemaphore(5)

async def get_page_text(url):
    with (yield from sema):
        try:
            resp = await aiohttp.request('GET', url)
            if resp.status == 200:
                ret_val = await resp.text()
        except:
            raise ValueError
        finally:
            await resp.release()
        return ret_val
Raising this Exception:
File "<ipython-input-3-9b9bdb963407>", line 14
with (yield from sema):
^
SyntaxError: 'yield from' inside async function
Some possible solutions I can think of...
Just use the @asyncio.coroutine decorator
Use threading.Semaphore? This seems like it may cause other issues
Try this in the beta of Python 3.6 for this reason.
I am very new to Python's async functionality so I could be missing something obvious.
You can use the async with statement to get an asynchronous context manager:
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession

sema = asyncio.BoundedSemaphore(5)

async def hello(url):
    async with ClientSession() as session:
        async with sema, session.get(url) as response:
            response = await response.read()
            print(response)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello("http://httpbin.org/headers"))
Example taken from here. The page is also a good primer for asyncio and aiohttp in general.
OK, so this is really silly, but I just replaced yield from with await in the semaphore context manager and it is working perfectly.
sema = asyncio.BoundedSemaphore(5)

async def get_page_text(url):
    with (await sema):
        try:
            resp = await aiohttp.request('GET', url)
            if resp.status == 200:
                ret_val = await resp.text()
        except:
            raise ValueError
        finally:
            await resp.release()
        return ret_val
For the semaphore only:
sem = asyncio.Semaphore(10)

# ... later
async with sem:
    # work with shared resource
which is equivalent to:
sem = asyncio.Semaphore(10)

# ... later
await sem.acquire()
try:
    # work with shared resource
finally:
    sem.release()
ref:
https://docs.python.org/3/library/asyncio-sync.html#asyncio.Semaphore