I have a process which generates some data then uploads it to a web server. Both operations take some time, so I would like to speed the process up by sending the data asynchronously. I also do not want to overload the server, so I want to restrict uploading to one file at a time. I can't get this working and was hoping someone could point me in the right direction.
I have set up testing using a simple Flask server. I have created a test example using requests, which works fine. However, when I try to upload data with asyncio, it throws an exception some of the time. It seems to be related to how quickly I make the calls: sometimes it's fine and sometimes it throws, but once it starts failing the failures seem to cascade. I am trying to ensure I'm only making one call at a time, which should mimic requests.
Given that it works fine with requests, I'm guessing I'm doing something wrong on the asyncio side. I have tried a range of different implementations with no luck.
This is the code for the Flask server:
from flask import Flask
from flask import jsonify

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def upload():
    return jsonify({'status': 'OK'})

if __name__ == "__main__":
    app.run(debug=True, port=8080)
This is the code for the requests and asyncio upload implementations:
import asyncio
import requests
from aiohttp import ClientSession

url = 'http://127.0.0.1:8080'

def get_data():
    return {'a': 'b'}

async def send_data(session, data):
    async with session.post(url, json=data) as response:
        response_data = await response.json()
        print(response_data)

async def upload_asyncio(loop):
    """attempt at asyncio upload"""
    task = None
    async with ClientSession() as session:
        for _ in range(20):
            # dummy get data function for testing
            data = get_data()
            # I want to wait until the previous upload has completed before uploading the next chunk
            if task is not None:
                await task
            task = asyncio.create_task(send_data(session, data))
        # wait for the last upload while the session is still open
        if task is not None:
            await task

def upload_requests():
    """working equivalent using requests"""
    for _ in range(20):
        data = get_data()
        with requests.post(url, json=data) as resp:
            response_data = resp.json()
            print(response_data)

#upload_requests()
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(upload_asyncio(loop))
The exception that gets raised is:
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Users\John\anaconda3\envs\py310\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\John\anaconda3\envs\py310\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
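Independent of the exception above, the "generate while uploading, but only one upload at a time" requirement can be sketched with a single consumer draining an asyncio.Queue. This is only a minimal sketch under stated assumptions: fake_upload is a hypothetical stand-in for the session.post(...) call, the sleeps stand in for real work, and it is not claimed to fix the WinError 10054 traceback.

```python
import asyncio


async def fake_upload(item, log):
    # Hypothetical stand-in for the real session.post(...) call.
    log.append(("start", item))
    await asyncio.sleep(0.01)
    log.append(("end", item))


async def uploader(queue, upload, log):
    # Single consumer: the next upload cannot start before the previous one ends.
    while True:
        item = await queue.get()
        if item is None:  # sentinel: the producer is finished
            break
        await upload(item, log)


async def run(n_items=5):
    log = []
    # Bounded queue, so the producer cannot run far ahead of the uploader.
    queue = asyncio.Queue(maxsize=1)
    consumer = asyncio.create_task(uploader(queue, fake_upload, log))
    for i in range(n_items):
        await asyncio.sleep(0.005)  # hypothetical stand-in for generating data
        await queue.put(i)
    await queue.put(None)
    await consumer
    return log


if __name__ == "__main__":
    print(asyncio.run(run()))
```

Because only one coroutine ever calls the upload function, uploads stay strictly sequential while data generation overlaps with the upload in progress.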
I am building a REST API with FastAPI. I implemented the data layer separately from the FastAPI application, meaning I do not have direct access to the database session in my FastAPI application.
I have access to the storage object, which has methods such as close_session that allow me to close the current session.
Is there an equivalent of Flask's teardown_request in FastAPI?
Flask Implementation
from models import storage
.....
.....
@app.teardown_request
def close_session(exception=None):
    storage.close_session()
I have looked at FastAPI's on_event('shutdown') and on_event('startup'). These two only run when the application is shutting down or starting up.
You can do this with a dependency.
Credit to williamjemir (see the GitHub discussion).
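from typing import AsyncIterator

from fastapi import FastAPI, Depends
from models import storage

async def close_session() -> AsyncIterator[None]:
    """Close the current session after every request."""
    print('Closing current session')
    # Everything after the yield runs once the request has been handled.
    yield
    storage.close_session()  # method name taken from the question
    print('db session closed.')

# Registering the dependency on the app applies it to every route.
app = FastAPI(dependencies=[Depends(close_session)])

@app.get('/')
def home():
    return "Hello World"

if __name__ == '__main__':
    import uvicorn
    uvicorn.run(app)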
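The control flow of a dependency with yield can be sketched without FastAPI at all. FastAPI turns such a dependency into a context manager internally, so contextlib.asynccontextmanager gives a rough picture of what happens around each request. FakeStorage, events, and handle_request below are hypothetical names used only for illustration, and the try/finally is an addition so the session is closed even when the handler raises.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the order in which things happen


class FakeStorage:
    """Hypothetical stand-in for the question's storage object."""

    def close_session(self):
        events.append("session closed")


storage = FakeStorage()


@asynccontextmanager
async def close_session():
    events.append("request started")
    try:
        yield  # the path operation runs here
    finally:
        # Runs after the handler, whether it returned or raised.
        storage.close_session()


async def handle_request():
    # Roughly what happens around a request that uses the dependency.
    async with close_session():
        events.append("handler ran")
        return "Hello World"


if __name__ == "__main__":
    print(asyncio.run(handle_request()), events)
```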
Use FastAPI middleware.
A "middleware" is a function that works with every request before it is processed by any specific path operation. And also with every response before returning it.
It takes each request that comes to your application.
It can then do something to that request or run any needed code.
Then it passes the request to be processed by the rest of the application (by some path operation).
It then takes the response generated by the application (by some path operation).
It can do something to that response or run any needed code.
Then it returns the response.
Example:
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    # do things before the request
    response = await call_next(request)
    # do things after the response
    return response
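Applied to the teardown question, the "after the response" step is where the session would be closed. Here is a minimal sketch of just that flow in plain asyncio, with no FastAPI import: FakeStorage, fake_call_next, and failing_call_next are hypothetical stand-ins, and the function has the same (request, call_next) shape as the middleware above so the body could be moved into it.

```python
import asyncio

calls = []  # records what ran, in order


class FakeStorage:
    """Hypothetical stand-in for the question's storage object."""

    def close_session(self):
        calls.append("close_session")


storage = FakeStorage()


async def close_session_middleware(request, call_next):
    try:
        response = await call_next(request)
    finally:
        # Teardown runs for successful and failing requests alike.
        storage.close_session()
    return response


async def fake_call_next(request):
    # Hypothetical stand-in for the rest of the application.
    calls.append(f"handled {request}")
    return {"status": 200}


async def failing_call_next(request):
    raise RuntimeError("handler failed")


if __name__ == "__main__":
    print(asyncio.run(close_session_middleware("/", fake_call_next)), calls)
```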
references:
https://fastapi.tiangolo.com/tutorial/middleware/
I have a FastAPI application for testing/development purposes. What I want is that any request that arrives to my app to automatically be sent, as is, to another app on another server, with exactly the same parameters and same endpoint. This is not a redirect, because I still want the app to process the request and return values as usual. I just want to initiate a similar request to a different version of the app on a different server, without waiting for the answer from the other server, so that the other app gets the request as if the original request was sent to it.
How can I achieve that? Below is a sample code that I use for handling the request:
@app.post("/my_endpoint/some_parameters")
def process_request(
    params: MyParamsClass,
    pwd: str = Depends(authenticate),
):
    # send the same request to http://my_other_url/my_endpoint/
    return_value = process_the_request(params)
    return return_value.as_json()
You could use the AsyncClient() from the httpx library. You can create the client inside the startup event handler, store it on the app instance (app.state), and reuse it every time you need it. You can explicitly close the client once you are done with it, using the shutdown event handler.
Working Example
Main Server
When building the request that is about to be forwarded to the other server, the main server uses request.stream() to read the body of the client's request. This provides an async iterator, so if the client sends a request with a large body (for instance, a large file upload), the main server does not have to wait for the entire body to be received and loaded into memory before forwarding it. That is what would happen if you used await request.body() instead, and it would likely cause server issues if the body could not fit into RAM.
You can add multiple routes in the same way the /upload one has been defined below, specifying the path as well as the HTTP method for the endpoint. Note that the /upload route below uses Starlette's path convertor to capture arbitrary paths. You could also specify the exact path parameters if you wish, but the approach below is more convenient when there are many of them. Either way, the path will be evaluated against the endpoint in the other server below, where you can explicitly specify the path parameters.
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from starlette.background import BackgroundTask
import httpx

app = FastAPI()

@app.on_event('startup')
async def startup_event():
    client = httpx.AsyncClient(base_url='http://127.0.0.1:8001/')  # this is the other server
    app.state.client = client

@app.on_event('shutdown')
async def shutdown_event():
    client = app.state.client
    await client.aclose()

async def _reverse_proxy(request: Request):
    client = request.app.state.client
    url = httpx.URL(path=request.url.path, query=request.url.query.encode('utf-8'))
    # request.stream() forwards the body chunk by chunk instead of buffering it
    req = client.build_request(
        request.method, url, headers=request.headers.raw, content=request.stream()
    )
    r = await client.send(req, stream=True)
    return StreamingResponse(
        r.aiter_raw(),
        status_code=r.status_code,
        headers=r.headers,
        background=BackgroundTask(r.aclose)  # close the upstream response once it has been sent
    )

app.add_route('/upload/{path:path}', _reverse_proxy, ['POST'])

if __name__ == '__main__':
    import uvicorn
    uvicorn.run(app, host='0.0.0.0', port=8000)
The Other Server
Again, for simplicity, the Request object is used to read the body, but you can instead define UploadFile, Form and other parameters as usual. The server below is listening on port 8001.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post('/upload/{p1}/{p2}')
async def upload(p1: str, p2: str, q1: str, request: Request):
    return {'p1': p1, 'p2': p2, 'q1': q1, 'body': await request.body()}

if __name__ == '__main__':
    import uvicorn
    uvicorn.run(app, host='0.0.0.0', port=8001)
Test the example above
import httpx
url = 'http://127.0.0.1:8000/upload/hello/world'
files = {'file': open('file.txt', 'rb')}
params = {'q1': 'This is a query param'}
r = httpx.post(url, params=params, files=files)
print(r.content)
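The question also asks for the mirrored request to be sent without waiting for the other server's answer. A minimal sketch of that fire-and-forget part in plain asyncio follows. It assumes the handler is an async def, so that a running event loop exists; fake_send is a hypothetical stand-in for something like an httpx client call, and handle, mirror_in_background, and demo are illustrative names.

```python
import asyncio

# Strong references keep fire-and-forget tasks from being garbage-collected mid-flight.
background_tasks = set()


def mirror_in_background(send, payload):
    """Schedule send(payload) on the running loop and return without awaiting it."""
    task = asyncio.create_task(send(payload))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def handle(payload, send):
    # Kick off the mirrored call, then carry on with the normal processing.
    mirror_in_background(send, payload)
    return {"processed": payload}


received = []  # what the hypothetical "other server" has seen so far


async def fake_send(payload):
    # Hypothetical stand-in for something like `await client.post(...)`.
    await asyncio.sleep(0.01)
    received.append(payload)


async def demo():
    result = await handle({"a": "b"}, fake_send)
    seen_at_return = list(received)  # still empty: the mirror has not finished
    await asyncio.gather(*background_tasks)  # only so the demo can inspect the outcome
    return result, seen_at_return, list(received)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the mirrored call fails, the exception stays on the background task, so a real version would probably want to log it in the done callback.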
I do not want to await my function. Look at this code:
loop = asyncio.new_event_loop() # Needed because of flask server
asyncio.set_event_loop(loop) # Needed because of flask server
task = loop.create_task(mainHandler(code, send, text)) #Start mainHandler
return formResponse(True, "mainHandler Started") # Run this without awaiting the task
This is part of a function called on a flask server endpoint. I don't really care how mainHandler is started, all I want is that it starts and the function (not mainHandler) immediately returns without awaiting. I have tried scheduling tasks, using futures and just running it but all to no avail. Even this question describing exactly what I need did not help: How to run an Asyncio task without awaiting? Does anyone have experience with this?
When trying to use asyncio.create_task in this example:
from flask import Flask
from flask import request
import asyncio

app = Flask(__name__)

async def mainHandler(code: str, sends: int, sendText: str):
    await asyncio.sleep(60)  # simulate the time the function takes

@app.route('/endpoint')
def index():
    code = request.args.get('code', default="NOTSET", type=str)
    send = request.args.get('send', default=0, type=int)
    text = request.args.get('text', default="NOTSET", type=str).replace("+", " ")
    if code != "NOTSET" and send > 0 and text != "NOTSET":
        asyncio.create_task(mainHandler(code, send, text))
        return {"error": False}
    else:
        return {"error": True}

if __name__ == '__main__':
    app.debug = True
    app.run(host="0.0.0.0", port=80)
It throws:
RuntimeError: no running event loop
when calling asyncio.create_task(mainHandler(code, send, text)).
To sum this up for everyone coming here in the future:
The version of Flask I used was synchronous, so there was no running event loop inside the view. I switched to Sanic, which is similar in syntax (only minor differences), faster, and supports async, so the task can be started with asyncio.ensure_future(mainHandler(code, send, text)).
According to gre_gor (thank you!), you should also be able to use pip install flask[async] to install Flask with async support.
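For anyone who has to stay on a synchronous framework, another option is to run one event loop in a background thread and hand coroutines to it with asyncio.run_coroutine_threadsafe. This is only a minimal sketch, not what I ended up using: start_background_loop and endpoint are hypothetical names, and main_handler is a shortened stand-in for mainHandler.

```python
import asyncio
import threading


def start_background_loop():
    """Start an event loop in a daemon thread and return the loop."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop


async def main_handler(code, sends, send_text):
    await asyncio.sleep(0.01)  # stand-in for the long-running work
    return f"{code}:{sends}:{send_text}"


def endpoint(loop, code, sends, text):
    # Synchronous caller: schedule the coroutine on the background loop
    # and return immediately without waiting for it to finish.
    future = asyncio.run_coroutine_threadsafe(main_handler(code, sends, text), loop)
    return {"error": False}, future


if __name__ == "__main__":
    loop = start_background_loop()
    response, future = endpoint(loop, "abc", 2, "hi")
    print(response)                  # available right away
    print(future.result(timeout=5))  # only the demo waits for the task
    loop.call_soon_threadsafe(loop.stop)
```

The returned concurrent.futures.Future is optional to use; the endpoint can ignore it if the result is not needed.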
I am attempting to optimize a simple web scraper that I made. It gets a list of urls from a table on a main page and then goes to each of those "sub" urls and gets information from those pages. I was able to successfully write it synchronously and using concurrent.futures.ThreadPoolExecutor(). However, I am trying to optimize it to use asyncio and httpx as these seem to be very fast for making hundreds of http requests.
I wrote the following script using asyncio and httpx. However, I keep getting the following errors:
httpcore.RemoteProtocolError: Server disconnected without sending a response.
RuntimeError: The connection pool was closed while 4 HTTP requests/responses were still in-flight.
It appears that I keep losing connection when I run the script. I even attempted running a synchronous version of it and get the same error. I was thinking that the remote server was blocking my requests, however, I am able to run my original program and go to each of the urls from the same IP address without issue.
What would cause this exception and how do you fix it?
import httpx
import asyncio

async def get_response(client, url):
    resp = await client.get(url, headers=random_user_agent())  # Gets a random user agent.
    html = resp.text
    return html

async def main():
    async with httpx.AsyncClient() as client:
        tasks = []
        # Get list of urls to parse.
        urls = get_events('https://main-url-to-parse.com')
        # Get the responses for the detail page for each event
        for url in urls:
            tasks.append(asyncio.ensure_future(get_response(client, url)))
        detail_responses = await asyncio.gather(*tasks)
        for resp in detail_responses:
            event = get_details(resp)  # Parse url and get desired info

asyncio.run(main())
I had the same issue. The problem occurs when one of the tasks passed to asyncio.gather raises an exception: the exception propagates out of the async with block, which causes the httpx client to call __aexit__ and close the connection pool while the other requests are still in flight. You can work around it by passing return_exceptions=True to asyncio.gather:
async def main():
    async with httpx.AsyncClient() as client:
        tasks = []
        # Get list of urls to parse.
        urls = get_events('https://main-url-to-parse.com')
        # Get the responses for the detail page for each event
        for url in urls:
            tasks.append(asyncio.ensure_future(get_response(client, url)))
        detail_responses = await asyncio.gather(*tasks, return_exceptions=True)
        for resp in detail_responses:
            # Failed requests come back as exception objects; handle or skip them here.
            if isinstance(resp, Exception):
                continue
            event = get_details(resp)  # Parse url and get desired info
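To show what return_exceptions=True hands back, here is a small self-contained sketch. fake_fetch is a hypothetical stand-in for the httpx call, and the URLs are made up. Results come back in the same order as the inputs, with exception objects in the slots of the requests that failed.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for `await client.get(url)`.
    await asyncio.sleep(0)
    if "bad" in url:
        raise ConnectionError(f"could not fetch {url}")
    return f"<html>{url}</html>"


async def fetch_all(urls):
    results = await asyncio.gather(
        *(fake_fetch(url) for url in urls), return_exceptions=True
    )
    pages, failures = [], []
    # gather preserves input order, so results can be paired with their URLs.
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            failures.append((url, result))
        else:
            pages.append(result)
    return pages, failures


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "bad", "c"])))
```

Keeping the failed URLs around makes it easy to retry just those afterwards.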
I'm working on an application that will have to consult multiple APIs for information and after processing the data, will output the answer to a client. The client uses a browser to connect to a web server to forward the request, afterwards, the web server will look for the information needed from the multiple APIs and after joining the responses from those APIs will then give an answer to the client.
The web server was built using Flask, and a Python module that extracts the needed information from each API was also implemented. Since consulting each API takes time, I would like to give the web server a timeout for responding, so that once the requests are sent only the responses that arrive within the time budget are used.
My proposed solution:
Use a Redis Queue and an RQ worker to enqueue the requests for each API and store the responses on the Queue then wait for the timeout and collect the responses that were able to respond in the allowed time. Afterwards, process the information and give the response to the user.
The Flask web server is set up something like this:
@app.route('/result', methods=["POST"])
def show_result():
    inputText = request.form["question"]
    tweetModule = Twitter()
    tweeterResponse = tweetModule.ask(params=inputText)
    redditObject = RedditModule()
    redditResponse = redditObject.ask(params=inputText)
    edmunds = Edmunds()
    edmundsJson = edmunds.ask(params=inputText)
    # More APIs could be consulted here
    # Send each request async and then synchronize the responses from the queue
    template = env.get_template('templates/result.html')
    return render_template(template, resp=resp)
The worker:
conn = redis.from_url(redis_url)
if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
Let's assume each module handles its own queueing process.
I can see some problems ahead:
What happens to the information stored on the queue that did not make it to the timeout?
How can I make Flask wait and then extract the responses from the Queue?
Is it possible that information could get mixed if two clients ask in the same time-frame?
Is there a better way to handle the async requests and then synchronize the response?
Thanks!
In such cases I prefer a combination of HTTPX and flask[async].
First - HTTPX
HTTPX offers a standard synchronous API by default, but also gives you the option of an async client if you need it.
Async is a concurrency model that is far more efficient than multi-threading, and can provide significant performance benefits and enable the use of long-lived network connections such as WebSockets.
If you're working with an async web framework then you'll also want to use an async client for sending outgoing HTTP requests.
>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>
Second - Using async and await in Flask
Routes, error handlers, before request, after request, and teardown functions can all be coroutine functions if Flask is installed with the async extra (pip install flask[async]). It requires Python 3.7+ where contextvars.ContextVar is available. This allows views to be defined with async def and use await.
For example, you should do something like this:
import asyncio
import httpx
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/async', methods=['GET', 'POST'])
async def async_form():
    if request.method == 'POST':
        ...
        async with httpx.AsyncClient() as client:
            # The three requests run concurrently; gather returns when all have finished.
            tweeterResponse, redditResponse, edmundsJson = await asyncio.gather(
                client.get(f'https://api.tweeter....../id?id={request.form["tweeter_id"]}', timeout=None),
                client.get(f'https://api.redditResponse.....?key={APIKEY}&reddit={request.form["reddit_id"]}'),
                client.post(f'https://api.edmundsJson.......', data=inputText)
            )
        ...
        resp = {
            "tweeter_response": tweeterResponse,
            "reddit_response": redditResponse,
            "edmunds_json": edmundsJson
        }
        template = env.get_template('templates/result.html')
        return render_template(template, resp=resp)
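The example above waits for all three calls to finish. If only the responses that arrive within a time budget should be used, as the question describes, one option is asyncio.wait with a timeout. This is a minimal sketch under stated assumptions: fake_api is a hypothetical stand-in for the real API calls, and the names and delays are made up.

```python
import asyncio


async def fake_api(name, delay):
    # Hypothetical stand-in for one API call that takes `delay` seconds.
    await asyncio.sleep(delay)
    return f"{name} response"


async def ask_all(delays, timeout):
    # One task per API, remembered by name so results can be labelled.
    tasks = {
        asyncio.create_task(fake_api(name, delay)): name
        for name, delay in delays.items()
    }
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    # Anything still running after the budget is cancelled and ignored.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # task.result() would re-raise if a finished call had failed;
    # the fake calls in this sketch never do.
    return {tasks[task]: task.result() for task in done}


if __name__ == "__main__":
    print(asyncio.run(ask_all({"fast": 0.01, "slow": 1.0}, timeout=0.2)))
```

Each request builds its own set of tasks, so results from two clients asking at the same time are kept apart in this sketch.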