HTTP server kick-off background python script without blocking - python

I'd like to be able to trigger a long-running python script via a web request, in bare-bones fashion. Also, I'd like to be able to trigger other copies of the script with different parameters while initial copies are still running.
I've looked at Flask, aiohttp, and queueing possibilities. Flask and aiohttp seem to have the least overhead to set up. I plan on executing the existing Python script via subprocess.run (however, I did consider refactoring the script into libraries that could be used in the web response function).
With aiohttp, I'm trying something like:
ingestion_service.py:
import time

from aiohttp import web

routes = web.RouteTableDef()

@routes.get("/ingest_pipeline")
async def test_ingest_pipeline(request):
    '''
    Get the job_conf specified from the request and activate the script
    '''
    # subprocess.run the command with lookup of job conf file
    response = web.Response(text="Received data ingestion request")
    await response.prepare(request)
    await response.write_eof()
    # eventually this would be a subprocess.run call
    time.sleep(80)
    return response

def init_func(argv):
    app = web.Application()
    app.add_routes(routes)
    return app
But though the initial request returns immediately, subsequent requests block until the initial request is complete. I'm running a server via:
python -m aiohttp.web -H localhost -P 8080 ingestion_service:init_func
I know that multithreading and concurrency may provide better solutions than asyncio. In this case, I'm not looking for a robust solution, just something that will allow me to run multiple scripts at once via http request, ideally with minimal memory costs.

OK, there were a couple of issues with what I was doing. Namely, time.sleep() is blocking, so asyncio.sleep() should be used. However, since I'm interested in spawning a subprocess, I can use asyncio.subprocess to do that in a non-blocking fashion.
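As a minimal illustration (my sketch, not the final answer below): before switching to a subprocess, the only change the handler above needs is awaiting asyncio.sleep() instead of calling time.sleep(), so the event loop can serve other requests while this one waits:

import asyncio
from aiohttp import web

async def test_ingest_pipeline(request):
    response = web.Response(text="Received data ingestion request")
    await response.prepare(request)
    await response.write_eof()
    # awaiting suspends this coroutine and lets the event loop handle other requests
    await asyncio.sleep(80)
    return response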
nb:
asyncio: run one function threaded with multiple requests from websocket clients
https://docs.python.org/3/library/asyncio-subprocess.html
Using these helps, but there's still an issue with the web handler terminating the subprocess. Luckily, there's a solution here:
https://docs.aiohttp.org/en/stable/web_advanced.html
aiojobs has an "atomic" decorator that will protect the handler from cancellation until it completes. So, code along these lines will function:
from aiojobs.aiohttp import setup, atomic
import asyncio
from aiohttp import web

@atomic
async def ingest_pipeline(request):
    # be careful what you pass through to shell, lest you
    # give away the keys to the kingdom
    shell_command = "[your command here]"
    response_text = f"running {shell_command}"
    response_code = 200
    response = web.Response(text=response_text, status=response_code)
    await response.prepare(request)
    await response.write_eof()
    ingestion_process = await asyncio.create_subprocess_shell(
        shell_command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    stdout, stderr = await ingestion_process.communicate()
    return response

def init_func(argv):
    app = web.Application()
    setup(app)
    app.router.add_get('/ingest_pipeline', ingest_pipeline)
    return app
This is very bare bones, but might help others looking for a quick skeleton for a temporary internal solution.
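To exercise the skeleton end to end (assuming it is saved as ingestion_service.py and launched the same way as in the question), a quick manual test might look like:

python -m aiohttp.web -H localhost -P 8080 ingestion_service:init_func
# in another shell:
curl "http://localhost:8080/ingest_pipeline"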


How to stop execution of FastAPI endpoint after a specified time to reduce CPU resource usage/cost?

Use case
The client micro service, which calls /do_something, has a timeout of 60 seconds in the request/post() call. This timeout is fixed and can't be changed. So if /do_something takes 10 minutes, the client micro service stops waiting for the response after 60 seconds, while /do_something keeps consuming CPU for the full 10 minutes, which increases the cost. We have a limited budget.
The current code looks like this:
import time
from random import randrange

from fastapi import FastAPI
from uvicorn import Server, Config

app = FastAPI()

def some_func(text):
    """
    Some computationally heavy function
    whose execution time depends on input text size
    """
    randinteger = randrange(1, 120)
    time.sleep(randinteger)  # simulate processing of text
    return text

@app.get("/do_something")
async def do_something():
    response = some_func(text="hello world")
    return {"response": response}

# Running
if __name__ == '__main__':
    server = Server(Config(app=app, host='0.0.0.0', port=3001))
    server.run()
Desired Solution
Here, /do_something should stop processing the current request after 60 seconds and wait for the next request to process.
If execution of the endpoint is force-stopped after 60 seconds, we should be able to log it with a custom message.
This should not kill the service and should work with multithreading/multiprocessing.
I tried the following, but when the timeout happens the server gets killed. Any solution to fix this?
import logging
import time
import timeout_decorator
from random import randrange

from fastapi import FastAPI
from uvicorn import Server, Config

app = FastAPI()

@timeout_decorator.timeout(seconds=2, timeout_exception=StopIteration, use_signals=False)
def some_func(text):
    """
    Some computationally heavy function
    whose execution time depends on input text size
    """
    randinteger = randrange(1, 30)
    time.sleep(randinteger)  # simulate processing of text
    return text

@app.get("/do_something")
async def do_something():
    try:
        response = some_func(text="hello world")
    except StopIteration:
        logging.warning('Stopped < /do_something > endpoint due to timeout!')
        response = None
    else:
        logging.info('Completed < /do_something > endpoint')
    return {"response": response}

# Running
if __name__ == '__main__':
    server = Server(Config(app=app, host='0.0.0.0', port=3001))
    server.run()
This answer is not about improving CPU time (as you mentioned in the comments section), but rather explains what happens when you define an endpoint with normal def or async def, and provides solutions for running blocking operations inside an endpoint.
You are asking how to stop the processing of a request after a while, in order to process further requests. It does not really make sense to start processing a request and then (60 seconds later) stop it as if it never happened (wasting server resources all that time and having other requests waiting). You should instead leave the handling of requests to the FastAPI framework itself. When you define an endpoint with async def, it is run on the main thread (in the event loop), i.e., the server processes requests sequentially, as long as there is no await call inside the endpoint (just like in your case). The keyword await passes function control back to the event loop. In other words, it suspends the execution of the surrounding coroutine, and tells the event loop to let something else run, until the awaited task completes (and has returned the result data). The await keyword only works within an async function.
Since you perform a heavy CPU-bound operation inside your async def endpoint (by calling your some_func() function), and you never give up control for other requests to run in the event loop (e.g., by awaiting for some coroutine), the server will be blocked and wait for that request to be fully processed and complete, before moving on to the next one(s)—have a look at this answer for more details.
Solutions
One solution would be to define your endpoint with normal def instead of async def. In brief, when you declare an endpoint with normal def instead of async def in FastAPI, it is run in an external threadpool that is then awaited, instead of being called directly (as it would block the server); hence, FastAPI would still work asynchronously.
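A minimal sketch of that first option, reusing the example from the question (illustrative only): declaring the route with plain def lets FastAPI run it in its threadpool, so other requests keep being served while the heavy call runs:

import time
from fastapi import FastAPI

app = FastAPI()

def some_func(text):
    time.sleep(10)  # stand-in for the heavy work from the question
    return text

@app.get("/do_something")
def do_something():  # plain def: FastAPI runs this in an external threadpool
    return {"response": some_func(text="hello world")}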
Another solution, as described in this answer, is to keep the async def definition and run the CPU-bound operation in a separate thread and await it, using Starlette's run_in_threadpool(), thus ensuring that the main thread (event loop), where coroutines are run, does not get blocked. As described by @tiangolo here, "run_in_threadpool is an awaitable function, the first parameter is a normal function, the next parameters are passed to that function directly. It supports sequence arguments and keyword arguments". Example:
from fastapi.concurrency import run_in_threadpool
res = await run_in_threadpool(cpu_bound_task, text='Hello world')
Since this is about a CPU-bound operation, it would be preferable to run it in a separate process, using ProcessPoolExecutor, as described in the link provided above. In this case, this could be integrated with asyncio, in order to await the process to finish its work and return the result(s). Note that, as described in the link above, it is important to protect the main loop of code to avoid recursive spawning of subprocesses, etc—essentially, your code must be under if __name__ == '__main__'. Example:
import concurrent.futures
from functools import partial
import asyncio

loop = asyncio.get_running_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
    res = await loop.run_in_executor(pool, partial(cpu_bound_task, text='Hello world'))
About Request Timeout
With regard to the recent update on your question about the client having a fixed 60s request timeout: if you are not behind a proxy such as Nginx that would allow you to set the request timeout, and/or you are not using gunicorn, which would also allow you to adjust the request timeout, you could use a middleware, as suggested here, to set a timeout for all incoming requests. The suggested middleware (an example is given below) uses asyncio's wait_for() function, which waits for an awaitable function/coroutine to complete with a timeout. If a timeout occurs, it cancels the task and raises asyncio.TimeoutError.
Regarding your comment below:
My requirement is not unblocking next request...
Again, please read carefully the first part of this answer to understand that if you define your endpoint with async def and do not await some coroutine inside, but instead perform some CPU-bound task (as you already do), it will block the server until it is completed (and even the approach below won't work as expected). That's like saying that you would like FastAPI to process one request at a time; in that case, there is no reason to use an ASGI framework such as FastAPI, which takes advantage of the async/await syntax (i.e., processing requests asynchronously) in order to provide fast performance. Hence, you either need to drop the async definition from your endpoint (as mentioned earlier above), or, preferably, run your synchronous CPU-bound task using ProcessPoolExecutor, as described earlier.
Also, your comment in some_func():
Some computationally heavy function whose execution time depends on
input text size
indicates that instead of (or along with) setting a request timeout, you could check the length of the input text (using a dependency function, for instance) and raise an HTTPException in case the text's length exceeds some pre-defined value, which is known beforehand to require more than 60s to complete the processing. In that way, your system won't waste resources trying to perform a task which you already know will not be completed.
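A rough sketch of that idea (the threshold, parameter names, and status code here are made up for illustration): a dependency rejects oversized input before any processing starts:

from fastapi import FastAPI, Depends, HTTPException

app = FastAPI()
MAX_TEXT_LENGTH = 10_000  # hypothetical length known to exceed the 60s budget

def check_text_length(text: str) -> str:
    if len(text) > MAX_TEXT_LENGTH:
        raise HTTPException(status_code=413,
                            detail="Input text too long to process within the time limit")
    return text

@app.get("/do_something")
async def do_something(text: str = Depends(check_text_length)):
    # the request only reaches this point if the text passed the length check
    return {"length": len(text)}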
Working Example
import time
import uvicorn
import asyncio
import concurrent.futures
from functools import partial
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from starlette.status import HTTP_504_GATEWAY_TIMEOUT

REQUEST_TIMEOUT = 2  # adjust timeout as desired

app = FastAPI()

@app.middleware('http')
async def timeout_middleware(request: Request, call_next):
    try:
        return await asyncio.wait_for(call_next(request), timeout=REQUEST_TIMEOUT)
    except asyncio.TimeoutError:
        return JSONResponse({'detail': 'Request exceeded the time limit for processing'},
                            status_code=HTTP_504_GATEWAY_TIMEOUT)

def cpu_bound_task(text):
    time.sleep(5)
    return text

@app.get('/')
async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        res = await loop.run_in_executor(pool, partial(cpu_bound_task, text='Hello world'))
    return {'response': res}

if __name__ == '__main__':
    uvicorn.run(app)

Running a Tornado Server within a Jupyter Notebook

Taking the standard Tornado demonstration and pushing the IOLoop into a background thread allows querying of the server within a single script. This is useful when the Tornado server is an interactive object (see Dask or similar).
import requests
import tornado.ioloop
import tornado.web
from concurrent.futures import ThreadPoolExecutor

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

pool = ThreadPoolExecutor(max_workers=2)
loop = tornado.ioloop.IOLoop()
app = make_app()
app.listen(8888)
fut = pool.submit(loop.start)
print(requests.get("http://localhost:8888"))
The above works just fine in a standard Python script (though it is missing safe shutdown). Jupyter notebooks are an optimal environment for these interactive Tornado servers. However, when it comes to Jupyter this idea breaks down, as there is already an active running loop:
>>> import asyncio
>>> asyncio.get_event_loop()
<_UnixSelectorEventLoop running=True closed=False debug=False>
This is seen when running the above script in a Jupyter notebook: both the server and the request client are trying to open a connection in the same thread, and the code hangs. Building a new asyncio loop and/or Tornado IOLoop does not seem to help, and I suspect I am missing something in Jupyter itself.
The question: Is it possible to have a live Tornado server running in the background within a Jupyter notebook so that standard python requests or similar can connect to it from the primary thread? I would prefer to avoid asyncio in the code presented to users if possible, due to its relative complexity for novice users.
Based on my recent PR to streamz, here is something that works, similar to your idea:
class InNotebookServer(object):
    def __init__(self, port):
        self.port = port
        self.loop = get_ioloop()
        self.start()

    def _start_server(self):
        from tornado.web import Application, RequestHandler
        from tornado.httpserver import HTTPServer
        from tornado import gen

        class Handler(RequestHandler):
            source = self

            @gen.coroutine
            def get(self):
                self.write('Hello World')

        application = Application([
            ('/', Handler),
        ])
        self.server = HTTPServer(application)
        self.server.listen(self.port)

    def start(self):
        """Start HTTP server and listen"""
        self.loop.add_callback(self._start_server)

_io_loops = []

def get_ioloop():
    from tornado.ioloop import IOLoop
    import threading
    if not _io_loops:
        loop = IOLoop()
        thread = threading.Thread(target=loop.start)
        thread.daemon = True
        thread.start()
        _io_loops.append(loop)
    return _io_loops[0]
To call in the notebook
In [2]: server = InNotebookServer(9005)
In [3]: import requests
requests.get('http://localhost:9005')
Out[3]: <Response [200]>
Part 1: Let's get nested tornado(s)
To find the information you need, you would have had to follow a trail of breadcrumbs, starting with what is described in the release notes of IPython 7.
In particular, they will point you to more information in the async and await sections of the documentation, and to this discussion,
which suggests the use of nest_asyncio.
The crux is the following:
A) Either you trick Python into running two nested event loops (this is what nest_asyncio does), or
B) you schedule coroutines on the already existing event loop (I'm not sure how to do that with Tornado).
I'm pretty sure you know all that, but other readers will appreciate it.
There is unfortunately no way to make it totally transparent to users, unless you control the deployment (like on a JupyterHub) and can add these lines to the IPython startup scripts that are automatically loaded. But I think the following is simple enough.
import nest_asyncio
nest_asyncio.apply()
# rest of your tornado setup and start code.
Part 2: Gotcha: synchronous code blocks the event loop.
The previous section only takes care of being able to run the Tornado app. But note that any synchronous code will block the event loop; thus, when running print(requests.get("http://localhost:8000")), the server will appear not to work, because you are blocking the event loop, which will resume only when the blocking code finishes, and that code is itself waiting for the event loop to resume... (understanding this deadlock is an exercise left to the reader). You need to either issue print(requests.get("http://localhost:8000")) from another kernel, or use aiohttp.
Here is how to use aiohttp in a similar way as requests.
import aiohttp
session = aiohttp.ClientSession()
await session.get('http://localhost:8889')
In this case, as aiohttp is non-blocking, things will appear to work properly. Here you can also see some extra IPython magic, where async code is auto-detected and run on the current event loop.
A cool exercise could be to run requests.get in a loop in another kernel, run sleep(5) in the kernel where Tornado is running, and see that we stop processing requests...
Part 3: Disclaimer and other routes:
This is quite tricky, and I would advise against using it in production; warn your users that this is not the recommended way of doing things.
This does not completely solve your case either: you would need to run things outside the main thread, which I'm not sure is possible.
You can also try to play with other loop runners like trio and curio; they might allow you to do things you can't with asyncio by default, like nesting, but here be dragons. I highly recommend trio and the multiple blog posts around its creation, especially if you are teaching async.
Enjoy, hope that helped, and please report bugs, as well as things that did work.
You can make the Tornado server run in the background using the %%script --bg magic command. The --bg option tells Jupyter to run the code of the current cell in the background.
Just create the Tornado server in one cell, along with the magic command, and run that cell.
Example:
%%script python --bg

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

loop = tornado.ioloop.IOLoop.current()
app = make_app()
app.listen(8000)  # 8888 was being used by jupyter in my case
loop.start()
And then you can use requests in a separate cell to connect to the server:
import requests
print(requests.get("http://localhost:8000"))
# prints <Response [200]>
One thing to note here is that if you stop/interrupt the kernel on any cell, the background script will also stop. So you'll have to run this cell again to start the server.

concurrent connections in urllib3

Using a loop to make multiple requests to various websites, how is it possible to do this with a proxy in urllib3?
The code reads in a list of URLs and uses a for loop to connect to each site; however, currently it does not connect past the first URL in the list. There is a proxy in place as well.
from urllib3 import ProxyManager

url_list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']
for i in url_list:
    http = ProxyManager("PROXY-PROXY")
    http_get = http.request('GET', i, preload_content=False).read().decode()
I have removed the URLs and proxy information from the above code. The first URL in the list will run fine, but after this nothing else occurs; it just waits. I have tried the clear() method to reset the connection each time through the loop.
Unfortunately, urllib3 is synchronous and blocks. You could use it with threads, but that is a hassle and usually leads to more problems. The main approach these days is to use an asynchronous networking library; Twisted and asyncio (perhaps with aiohttp) are the popular packages.
I'll provide an example using trio framework and asks:
import asks
import trio

asks.init('trio')

path_list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']
results = []

async def grabber(path):
    r = await s.get(path)
    results.append(r)

async def main(path_list):
    async with trio.open_nursery() as n:
        for path in path_list:
            n.start_soon(grabber, path)

s = asks.Session()
trio.run(main, path_list)
Using threads is not really that much of a hassle since Python 3.2, when concurrent.futures was added:
from urllib3 import ProxyManager
from concurrent.futures import ThreadPoolExecutor, wait

url_list: list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']
thread_pool = ThreadPoolExecutor(max_workers=min(len(url_list), 20))
# the ProxyManager is thread-safe, so it can be shared by the worker threads
http: ProxyManager = ProxyManager("PROXY-PROXY")

tasks = []
for url in url_list:
    # bind the current url as a default argument so each task keeps its own
    # copy instead of closing over the loop variable (late binding)
    def send_request(this_url: str = url) -> str:
        return http.request('GET', this_url, preload_content=False).read().decode()
    tasks.append(thread_pool.submit(send_request))

wait(tasks)
all_responses: list = [task.result() for task in tasks]
Later Python versions offer an event loop via asyncio. Issues I've had with asyncio are usually related to portability of libraries (e.g., aiohttp via pydantic), most of which are not pure Python and have external libc dependencies. This can be an issue if you have to support a lot of Docker apps, which might use musl libc (Alpine) or glibc (everyone else).
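For completeness, here is a rough asyncio-based equivalent of the original loop (my sketch, not part of the answers above; the URLs and proxy address are placeholders, and it relies on aiohttp's per-request proxy parameter):

import asyncio
import aiohttp

url_list = ['https://URL1.com', 'http://URL2.com', 'http://URL3.com']
PROXY = "http://PROXY-PROXY"  # placeholder proxy address

async def fetch(session, url):
    async with session.get(url, proxy=PROXY) as resp:
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as session:
        # issue all requests concurrently and gather the response bodies
        return await asyncio.gather(*(fetch(session, url) for url in url_list))

all_responses = asyncio.run(main())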

Handling async requests using multiple APIs - Python Flask Redis

I'm working on an application that will have to consult multiple APIs for information and, after processing the data, will output the answer to a client. The client uses a browser to connect to a web server to forward the request; afterwards, the web server looks for the information needed from the multiple APIs and, after joining the responses from those APIs, gives an answer to the client.
The web server was built using Flask, and a module that extracts the needed information from each API was also implemented (Python). Since the consulting process for each API takes time, I would like to give the web server a timeout for responding; therefore, after the requests are sent, only those that respond within the time buffer will be used.
My proposed solution:
Use a Redis queue and an RQ worker to enqueue the requests for each API and store the responses on the queue, then wait for the timeout and collect the responses that arrived within the allowed time. Afterwards, process the information and give the response to the user.
The Flask web server is set up something like this:
@app.route('/result', methods=["POST"])
def show_result():
    inputText = request.form["question"]
    tweetModule = Twitter()
    tweeterResponse = tweetModule.ask(params=inputText)
    redditObject = RedditModule()
    redditResponse = redditObject.ask(params=inputText)
    edmunds = Edmunds()
    edmundsJson = edmunds.ask(params=inputText)
    # More APIs could be consulted here
    # Send each request async and then synchronize the responses from the queue
    template = env.get_template('templates/result.html')
    return render_template(template, resp=resp)
The worker:
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
And let's assume each module handles its own queueing process.
I can see some problems ahead:
What happens to the information stored on the queue that did not make it to the timeout?
How can I make Flask wait and then extract the responses from the Queue?
Is it possible that information could get mixed if two clients ask in the same time-frame?
Is there a better way to handle the async requests and then synchronize the response?
Thanks!
In such cases I prefer a combination of HTTPX and flask[async]
First - HTTPX
HTTPX offers a standard synchronous API by default, but also gives you the option of an async client if you need it.
Async is a concurrency model that is far more efficient than multi-threading, and can provide significant performance benefits and enable the use of long-lived network connections such as WebSockets.
If you're working with an async web framework then you'll also want to use an async client for sending outgoing HTTP requests.
>>> async with httpx.AsyncClient() as client:
... r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>
Second - Using async and await in Flask
Routes, error handlers, before request, after request, and teardown functions can all be coroutine functions if Flask is installed with the async extra (pip install flask[async]). It requires Python 3.7+ where contextvars.ContextVar is available. This allows views to be defined with async def and use await.
For example, you should do something like this:
import asyncio
import httpx
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/async', methods=['GET', 'POST'])
async def async_form():
    if request.method == 'POST':
        ...
        async with httpx.AsyncClient() as client:
            tweeterResponse, redditResponse, edmundsJson = await asyncio.gather(
                client.get(f'https://api.tweeter....../id?id={request.form["tweeter_id"]}', timeout=None),
                client.get(f'https://api.redditResponse.....?key={APIKEY}&reddit={request.form["reddit_id"]}'),
                client.post(f'https://api.edmundsJson.......', data=inputText)
            )
        ...
        resp = {
            "tweeter_response": tweeterResponse,
            "reddit_response": redditResponse,
            "edmunds_json": edmundsJson
        }
        template = env.get_template('templates/result.html')
        return render_template(template, resp=resp)

Gevent async server with blocking requests

I have what I would think is a pretty common use case for Gevent. I need a UDP server that listens for requests, and based on the request submits a POST to an external web service. The external web service essentially only allows one request at a time.
I would like to have an asynchronous UDP server so that data can be immediately retrieved and stored so that I don't miss any requests (this part is easy with the DatagramServer gevent provides). Then I need some way to send requests to the external web service serially, but in such a way that it doesn't ruin the async of the UDP server.
I first tried monkey patching everything and what I ended up with was a quick solution, but one in which my requests to the external web service were not rate limited in any way and which resulted in errors.
It seems like what I need is a single non-blocking worker to send requests to the external web service in serial while the UDP server adds tasks to the queue from which the non-blocking worker is working.
What I need is information on running a gevent server with additional greenlets for other tasks (especially with a queue). I've been using the serve_forever function of the DatagramServer and think that I'll need to use the start method instead, but haven't found much information on how it would fit together.
Thanks,
EDIT
The answer worked very well. I've adapted the UDP server example code with the answer from @mguijarr to produce a working example for my use case:
import gevent.monkey
# patch the standard library before other imports so sockets are cooperative
gevent.monkey.patch_all()

from gevent.server import DatagramServer
import gevent.queue
from urllib.request import urlopen

n = 0

def process_request(q):
    while True:
        request = q.get()
        print(request)
        print(urlopen('https://test.com').read())

class EchoServer(DatagramServer):
    __q = gevent.queue.Queue()
    __request_processing_greenlet = gevent.spawn(process_request, __q)

    def handle(self, data, address):
        print('%s: got %r' % (address[0], data))
        global n
        n += 1
        print(n)
        self.__q.put(n)
        self.socket.sendto(('Received %s bytes' % len(data)).encode(), address)

if __name__ == '__main__':
    print('Receiving datagrams on :9000')
    EchoServer(':9000').serve_forever()
Here is how I would do it:
Write a function taking a "queue" object as argument; this function will continuously process items from the queue. Each item is supposed to be a request for the web service.
This function could be a module-level function, not part of your DatagramServer instance:
def process_requests(q):
    while True:
        request = q.get()
        # do your magic with 'request'
        ...
In your DatagramServer, run the function within a greenlet (like a background task):
self.__q = gevent.queue.Queue()
self.__request_processing_greenlet = gevent.spawn(process_requests, self.__q)
When you receive a UDP request in your DatagramServer instance, push the request to the queue:
self.__q.put(request)
This should do what you want. You still call 'serve_forever' on DatagramServer, no problem.
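As a side note on the question about start() versus serve_forever(): a rough sketch (my illustration, not part of the original answer) of driving the server with start() while other greenlets run alongside it could look like this:

import gevent
import gevent.queue
from gevent.server import DatagramServer

def process_requests(q):
    while True:
        request = q.get()
        # forward 'request' to the external web service here, one at a time
        print('processing', request)

if __name__ == '__main__':
    q = gevent.queue.Queue()
    worker = gevent.spawn(process_requests, q)

    server = DatagramServer(':9000', handle=lambda data, address: q.put(data))
    server.start()            # begin accepting datagrams without blocking this greenlet
    gevent.joinall([worker])  # keep the main greenlet alive while the worker runs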
