I am trying to teach myself Python's async functionality. To do so I have built an async web scraper. I would like to limit the total number of connections I have open at once, to be a good citizen on servers. I know that semaphores are a good solution, and the asyncio library has a semaphore class built in. My issue is that Python complains when I use yield from in an async function, because that combines yield and await syntax. Below is the exact syntax I am using...
import asyncio
import aiohttp
sema = asyncio.BoundedSemaphore(5)
async def get_page_text(url):
    with (yield from sema):
        try:
            resp = await aiohttp.request('GET', url)
            if resp.status == 200:
                ret_val = await resp.text()
        except:
            raise ValueError
        finally:
            await resp.release()
    return ret_val
This raises the following exception:
File "<ipython-input-3-9b9bdb963407>", line 14
with (yield from sema):
^
SyntaxError: 'yield from' inside async function
Some possible solutions I can think of:
Just use the @asyncio.coroutine decorator
Use threading.Semaphore? This seems like it may cause other issues
Try this in the Python 3.6 beta.
I am very new to Python's async functionality so I could be missing something obvious.
You can use the async with statement to get an asynchronous context manager:
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession
sema = asyncio.BoundedSemaphore(5)
async def hello(url):
    async with ClientSession() as session:
        async with sema, session.get(url) as response:
            response = await response.read()
            print(response)
loop = asyncio.get_event_loop()
loop.run_until_complete(hello("http://httpbin.org/headers"))
Example taken from here. The page is also a good primer for asyncio and aiohttp in general.
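OK, so this is really silly, but I just replaced yield from with await in the semaphore context manager and it is working perfectly. (Note that the with (await sema) form was deprecated in Python 3.7 and removed in Python 3.9; async with sema is the supported spelling.)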
sema = asyncio.BoundedSemaphore(5)
async def get_page_text(url):
    with (await sema):
        try:
            resp = await aiohttp.request('GET', url)
            if resp.status == 200:
                ret_val = await resp.text()
        except:
            raise ValueError
        finally:
            await resp.release()
    return ret_val
For the semaphore only:
sem = asyncio.Semaphore(10)
# ... later
async with sem:
    # work with shared resource
which is equivalent to:
sem = asyncio.Semaphore(10)
# ... later
await sem.acquire()
try:
    # work with shared resource
finally:
    sem.release()
ref:
https://docs.python.org/3/library/asyncio-sync.html#asyncio.Semaphore
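To see the limit in action without touching the network, here is a minimal sketch that uses only asyncio. The names worker, run_workers and the state dict are made up for this illustration, and asyncio.sleep stands in for a real request. It tracks how many workers are inside the async with sem block at the same moment.

```python
import asyncio

async def worker(sem, state):
    # Only `limit` workers can be inside this block at once.
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for a network request
        state["active"] -= 1

async def run_workers(n_tasks=20, limit=5):
    # Create the semaphore inside the running loop.
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(worker(sem, state) for _ in range(n_tasks)))
    return state["peak"]

peak = asyncio.run(run_workers())
print("peak concurrency:", peak)
```

The recorded peak should never exceed the limit passed to the semaphore, no matter how many tasks are scheduled.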
Related
I'm making a Python module for interacting with an API. I'd like it to be fast, so I chose to use asyncio and aiohttp. I'm quite new to async programming and I'm not quite sure how to reuse the same session for every request. Also, I'd like to spare my end-users the hassle of creating the loop etc. I came up with this class for my base client:
import asyncio
import aiohttp
class BaseClient:
    API_BASE_URL = "dummyURL"
    API_VERSION = 3
    async def __aenter__(self):
        self._session = aiohttp.ClientSession(raise_for_status=True)
        return self
    async def __aexit__(self, exc_type, exc, tb):
        await self._session.close()
        #remove the next line when aiohttp 4.0 is released
        await asyncio.sleep(0.250)
    async def _get(self, endpoint: str) -> None:
        url = f"{self.API_BASE_URL}/{endpoint}/?v={self.API_VERSION}"
        async with self._session.get(url) as resp:
            json_body = await resp.json()
            return json_body
    async def list_forums(self):
        endpoint = "forums"
        return await self._get(endpoint)
async def main():
    async with BaseClient() as client:
        forums = await client.list_forums()
        print(forums)
asyncio.run(main())
Is that the right way to reuse the same session? Is it possible to refactor BaseClient in such a way that my end-users would only have to do the following:
client = BaseClient()
forums = client.list_forums()
Thanks for your help.
I am trying to process requests asynchronously with aiohttp. The requests are defined in my class as follows:
class Async():
    async def get_service_1(self, zip_code, session):
        url = SERVICE1_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        return await response
    async def get_service_2(self, zip_code, session):
        url = SERVICE2_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        return await response
    async def gather(self, zip_code):
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(
                self.get_service_1(zip_code, session),
                self.get_service_2(zip_code, session)
            )
    def get_async_requests(self, zip_code):
        asyncio.set_event_loop(asyncio.SelectorEventLoop())
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(self.gather(zip_code))
        loop.close()
        return results
When I call get_async_requests to get the results, I get the following error:
TypeError: object ClientResponse can't be used in 'await' expression
Where am I going wrong in the code? Thank you in advance.
When you await something like session.request(), the I/O starts, but aiohttp returns as soon as it receives the headers; it doesn't wait for the response body to finish. (This lets you react to a status code without waiting for the entire body of the response.) The ClientResponse object you get back is not itself awaitable, which is why return await response raises the TypeError.
You need to await a method that reads the body. If you're expecting a response that contains text, that would be response.text(). If you're expecting JSON, that's response.json(). This would look something like
response = await session.get(url)
return await response.text()
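The difference can be reproduced without aiohttp. The sketch below uses a hypothetical FakeResponse class and fake_request function as stand-ins (they are not part of aiohttp); the point is only that awaiting a plain object raises TypeError, while awaiting one of its coroutine methods works.

```python
import asyncio

class FakeResponse:
    """Hypothetical stand-in for aiohttp's ClientResponse (not the real class)."""
    status = 200

    async def text(self):
        # The body is only read when this coroutine is awaited.
        return "hello"

async def fake_request(url):
    # Stand-in for session.request('GET', url): resolves to a response object.
    await asyncio.sleep(0)
    return FakeResponse()

async def wrong(url):
    response = await fake_request(url)
    return await response  # TypeError: the response object is not awaitable

async def right(url):
    response = await fake_request(url)
    return await response.text()  # await the method that reads the body

async def demo():
    try:
        await wrong("http://example.invalid")
        wrong_raised = False
    except TypeError:
        wrong_raised = True
    return wrong_raised, await right("http://example.invalid")

wrong_raised, body = asyncio.run(demo())
print(wrong_raised, body)
```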
I have a script that checks the status code for a couple hundred thousand supplied websites, and I was trying to integrate a Semaphore into the flow to speed up processing. The problem is that whenever I integrate a Semaphore, I just get a list populated with None objects, and I'm not entirely sure why.
I have been mostly copying code from other sources, as I don't fully grok asynchronous programming yet. When I debug, it looks like I should be getting results out of the function, but something goes wrong when I gather the results. I've tried juggling around my looping, my gathering, ensuring futures, etc., but nothing returns a list of usable results.
async def fetch(session, url):
    try:
        async with session.head(url, allow_redirects=True) as resp:
            return url, resp.real_url, resp.status, resp.reason
    except Exception as e:
        return url, None, e, 'Error'
async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)
async def run(urls):
    timeout = 15
    tasks = []
    sem = asyncio.Semaphore(100)
    conn = aiohttp.TCPConnector(limit=64, ssl=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        for url in urls:
            task = asyncio.wait_for(bound_fetch(sem, session, url), timeout)
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
        # responses = [await f for f in tqdm.tqdm(asyncio.as_completed(tasks), total=len(tasks))]
    return responses
urls = ['https://google.com', 'https://yahoo.com']
loop = asyncio.ProactorEventLoop()
data = loop.run_until_complete(run(urls))
I've commented out the progress bar component, but that implementation returns the desired results when there is no semaphore.
Any help would be greatly appreciated. I am furiously reading up on asynchronous programming, but I can't wrap my mind around it yet.
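You need to explicitly return the result of the awaited coroutine. Without a return statement, bound_fetch discards what fetch returned and implicitly returns None.
Replace this code...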
async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)
... with this:
async def bound_fetch(sem, session, url):
    async with sem:
        return await fetch(session, url)
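Here is a small network-free sketch of the same mistake, assuming a dummy fetch that just returns a tuple (the function names and the fixed status 200 are invented for the illustration). It runs both variants through asyncio.gather so the difference is visible side by side.

```python
import asyncio

async def fetch(url):
    # Dummy stand-in for the real HTTP call.
    await asyncio.sleep(0)
    return url, 200

async def bound_fetch_no_return(sem, url):
    async with sem:
        await fetch(url)  # result is discarded, so this coroutine returns None

async def bound_fetch(sem, url):
    async with sem:
        return await fetch(url)  # result is passed back to the caller

async def main():
    sem = asyncio.Semaphore(2)
    urls = ["a", "b"]
    broken = await asyncio.gather(*(bound_fetch_no_return(sem, u) for u in urls))
    fixed = await asyncio.gather(*(bound_fetch(sem, u) for u in urls))
    return broken, fixed

broken, fixed = asyncio.run(main())
print(broken)
print(fixed)
```

The first list contains only None values, while the second contains the tuples that fetch produced, in the same order as the input URLs.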
My code is as follows:
import asyncio
import aiohttp
urls = [
    'http://www.163.com/',
    'http://www.sina.com.cn/',
    'https://www.hupu.com/',
    'http://www.csdn.net/'
]
async def get_url_data(u):
    """
    read url data
    :param u:
    :return:
    """
    print('running ', u)
    resp = await aiohttp.ClientSession().get(url=u)
    headers = resp.headers
    print(u, headers)
    return headers
async def request_url(u):
    """
    main func
    :param u:
    :return:
    """
    res = await get_url_data(u)
    return res
loop = asyncio.get_event_loop()
task_lists = asyncio.wait([request_url(u) for u in urls])
loop.run_until_complete(task_lists)
loop.close()
When I run my code, it displays a warning message:
Unclosed client session
Can anybody suggest a solution?
Thanks a lot
You should close the session when you are done with it.
You have 2 options:
You can close the session manually:
import aiohttp
session = aiohttp.ClientSession()
# use the session here
session.close()
Or you can use it with a context manager:
import aiohttp
import asyncio
async def fetch(client):
    async with client.get('http://python.org') as resp:
        assert resp.status == 200
        return await resp.text()
async def main(loop):
    async with aiohttp.ClientSession(loop=loop) as client:
        html = await fetch(client)
        print(html)
loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
The client session supports the context manager protocol for self closing.
If you are not using context manager, the proper way to close it would also need an await. Many answers on the internet miss that part, and few people actually notice it, presumably because most people use the more convenient context manager. But the manual await session.close() is essential when/if you are closing a class-wide session inside the tearDownClass() when doing unittesting.
import aiohttp
session = aiohttp.ClientSession()
# use the session here
await session.close()
You should use ClientSession as an async context manager so that its resources are acquired and released properly:
async def get_url_data(u):
    """
    read url data
    :param u:
    :return:
    """
    print('running ', u)
    async with aiohttp.ClientSession() as session:
        resp = await session.get(url=u)
        headers = resp.headers
        print(u, headers)
        return headers
The Getting Started docs for aiohttp give the following client example:
import asyncio
import aiohttp
async def fetch_page(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            assert response.status == 200
            return await response.read()
loop = asyncio.get_event_loop()
with aiohttp.ClientSession(loop=loop) as session:
    content = loop.run_until_complete(
        fetch_page(session, 'http://python.org'))
    print(content)
And they give the following note for Python 3.4 users:
If you are using Python 3.4, please replace await with yield from and async def with a @coroutine decorator.
If I follow these instructions I get:
import aiohttp
import asyncio
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return (yield from response.text())
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)
However, this will not run, because async with is not supported in Python 3.4:
$ python3 client.py
File "client.py", line 7
async with session.get(url) as response:
^
SyntaxError: invalid syntax
How can I translate the async with statement to work with Python 3.4?
Just don't use the result of session.get() as a context manager; use it as a coroutine directly instead. The request context manager that session.get() produces would normally release the request on exit, but so does using response.text(), so you could ignore that here:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        response = yield from session.get(url)
        return (yield from response.text())
The request wrapper returned here doesn't have the required asynchronous methods (__aenter__ and __aexit__); they are omitted entirely when not using Python 3.5 (see the relevant source code).
If you have more statements between the session.get() call and accessing the response.text() awaitable, you probably want to use a try:..finally: anyway to release the connection; the Python 3.5 release context manager also closes the response if an exception occurred. Because a yield from response.release() is needed here, this can't be encapsulated in a context manager before Python 3.5:
import sys
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        response = yield from session.get(url)
        try:
            # other statements
            return (yield from response.text())
        finally:
            if sys.exc_info()[0] is not None:
                # on exceptions, close the connection altogether
                response.close()
            else:
                yield from response.release()
aiohttp's examples are implemented using 3.4 syntax. Based on the JSON client example, your function would be:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        resp = yield from session.get(url)
        try:
            return (yield from resp.text())
        finally:
            yield from resp.release()
Update:
Note that Martijn's solution works for simple cases, but may lead to unwanted behavior in specific cases:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(5):
        response = yield from session.get(url)
        # Any actions that may lead to error:
        1/0
        return (yield from response.text())
# exception + warning "Unclosed response"
Besides the exception, you'll also get the warning "Unclosed response". This may lead to leaked connections in a complex app. You can avoid the problem by calling resp.release()/resp.close() manually:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(5):
        resp = yield from session.get(url)
        try:
            # Any actions that may lead to error:
            1/0
            return (yield from resp.text())
        except Exception as e:
            # .close() on exception.
            resp.close()
            raise e
        finally:
            # .release() otherwise to return connection into free connection pool.
            # It's ok to release closed response:
            # https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client_reqrep.py#L664
            yield from resp.release()
# exception only
I think it's better to follow official examples (and __aexit__ implementation) and call resp.release()/resp.close() explicitly.
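For readers on Python 3.5+, the same close-on-error, release-otherwise pattern can be sketched with async/await. FakeResponse and read_body below are hypothetical names invented for this illustration (this is not aiohttp's class); the stand-in only records which cleanup method ran, so the behavior can be checked without a network connection.

```python
import asyncio

class FakeResponse:
    """Hypothetical stand-in that records which cleanup method was called."""

    def __init__(self):
        self.calls = []

    async def text(self):
        return "body"

    def close(self):
        # Drops the connection entirely.
        self.calls.append("close")

    async def release(self):
        # Hands the connection back to the pool.
        self.calls.append("release")

async def read_body(resp, fail=False):
    try:
        if fail:
            raise RuntimeError("simulated failure before reading the body")
        body = await resp.text()
    except Exception:
        # On any error, close the response and re-raise.
        resp.close()
        raise
    # On success, release the response.
    await resp.release()
    return body

ok = FakeResponse()
body = asyncio.run(read_body(ok))

bad = FakeResponse()
try:
    asyncio.run(read_body(bad, fail=True))
    failed = False
except RuntimeError:
    failed = True

print(body, ok.calls, failed, bad.calls)
```

With a real aiohttp session on Python 3.5+, async with session.get(url) as resp is the usual way to get equivalent cleanup without writing it by hand.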