Can a Python RQ job reschedule itself and keep depending jobs around? - python

I have a Python RQ job that downloads a resource from a webserver.
In case of a non-responding webserver, can the download-job reschedule itself and retry the download after a certain interval?
Several transformation-jobs depend on the download-job via
job_queue.enqueue(transformation_task, depends_on=download_job)
If the download-job could reschedule itself, are the dependent jobs kept along, and would finally execute, once the download-job finishes?

I asked the question on the python-rq GitHub project and the functionality is now included in version 1.5.0 of RQ.
RQ lets you now easily retry failed jobs. To configure retries, use RQ’s Retry object that accepts max and interval arguments.
Dependent jobs are kept in the deferred job registry until the job they depend upon succeeds and are executed only then.
For example:
from redis import Redis
from rq import Queue, Retry
from somewhere import randomly_failing_task, dependent_task
job_queue = Queue(connection=Redis())
randomly_failing_job = job_queue.enqueue(randomly_failing_task, retry=Retry(max=3))
dependent_job = job_queue.enqueue(dependent_task, depends_on=randomly_failing_job)
And the sample tasks:
from random import choice
def randomly_failing_task():
print('I am a task, I will fail 50% of the times :/')
success = choice([True, False])
if success:
print('I succeed :)')
else:
print('I failed :(')
raise Exception('randomly_failing_task failed!')
def dependent_task():
print('I depend upon the randomly_failing_task.')
print('I am only executed, once the randomly_failing_task succeeded.’)

Related

How to stop execution of FastAPI endpoint after a specified time to reduce CPU resource usage/cost?

Use case
The client micro service, which calls /do_something, has a timeout of 60 seconds in the request/post() call. This timeout is fixed and can't be changed. So if /do_something takes 10 mins, /do_something is wasting CPU resources since the client micro service is NOT waiting after 60 seconds for the response from /do_something, which wastes CPU for 10 mins and this increases the cost. We have limited budget.
The current code looks like this:
import time
from uvicorn import Server, Config
from random import randrange
from fastapi import FastAPI
app = FastAPI()
def some_func(text):
"""
Some computationally heavy function
whose execution time depends on input text size
"""
randinteger = randrange(1,120)
time.sleep(randinteger)# simulate processing of text
return text
#app.get("/do_something")
async def do_something():
response = some_func(text="hello world")
return {"response": response}
# Running
if __name__ == '__main__':
server = Server(Config(app=app, host='0.0.0.0', port=3001))
server.run()
Desired Solution
Here /do_something should stop the processing of the current request to endpoint after 60 seconds and wait for next request to process.
If execution of the end point is force stopped after 60 seconds we should be able to log it with custom message.
This should not kill the service and work with multithreading/multiprocessing.
I tried this. But when timeout happends the server is getting killed.
Any solution to fix this?
import logging
import time
import timeout_decorator
from uvicorn import Server, Config
from random import randrange
from fastapi import FastAPI
app = FastAPI()
#timeout_decorator.timeout(seconds=2, timeout_exception=StopIteration, use_signals=False)
def some_func(text):
"""
Some computationally heavy function
whose execution time depends on input text size
"""
randinteger = randrange(1,30)
time.sleep(randinteger)# simulate processing of text
return text
#app.get("/do_something")
async def do_something():
try:
response = some_func(text="hello world")
except StopIteration:
logging.warning(f'Stopped /do_something > endpoint due to timeout!')
else:
logging.info(f'( Completed < /do_something > endpoint')
return {"response": response}
# Running
if __name__ == '__main__':
server = Server(Config(app=app, host='0.0.0.0', port=3001))
server.run()
This answer is not about improving CPU time—as you mentioned in the comments section—but rather explains what would happen, if you defined an endpoint with normal def or async def, as well as provides solutions when you run blocking operations inside an endpoint.
You are asking how to stop the processing of a request after a while, in order to process further requests. It does not really make that sense to start processing a request, and then (60 seconds later) stop it as if it never happened (wasting server resources all that time and having other requests waiting). You should instead let the handling of requests to FastAPI framework itself. When you define an endpoint with async def, it is run on the main thread (in the event loop), i.e., the server processes the requests sequentially, as long as there is no await call inside the endpoint (just like in your case). The keyword await passes function control back to the event loop. In other words, it suspends the execution of the surrounding coroutine, and tells the event loop to let something else run, until the awaited task completes (and has returned the result data). The await keyword only works within an async function.
Since you perform a heavy CPU-bound operation inside your async def endpoint (by calling your some_func() function), and you never give up control for other requests to run in the event loop (e.g., by awaiting for some coroutine), the server will be blocked and wait for that request to be fully processed and complete, before moving on to the next one(s)—have a look at this answer for more details.
Solutions
One solution would be to define your endpoint with normal def instead of async def. In brief, when you declare an endpoint with normal def instead of async def in FastAPI, it is run in an external threadpool that is then awaited, instead of being called directly (as it would block the server); hence, FastAPI would still work asynchronously.
Another solution, as described in this answer, is to keep the async def definition and run the CPU-bound operation in a separate thread and await it, using Starlette's run_in_threadpool(), thus ensuring that the main thread (event loop), where coroutines are run, does not get blocked. As described by #tiangolo here, "run_in_threadpool is an awaitable function, the first parameter is a normal function, the next parameters are passed to that function directly. It supports sequence arguments and keyword arguments". Example:
from fastapi.concurrency import run_in_threadpool
res = await run_in_threadpool(cpu_bound_task, text='Hello world')
Since this is about a CPU-bound operation, it would be preferable to run it in a separate process, using ProcessPoolExecutor, as described in the link provided above. In this case, this could be integrated with asyncio, in order to await the process to finish its work and return the result(s). Note that, as described in the link above, it is important to protect the main loop of code to avoid recursive spawning of subprocesses, etc—essentially, your code must be under if __name__ == '__main__'. Example:
import concurrent.futures
from functools import partial
import asyncio
loop = asyncio.get_running_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
res = await loop.run_in_executor(pool, partial(cpu_bound_task, text='Hello world'))
About Request Timeout
With regards to the recent update on your question about the client having a fixed 60s request timeout; if you are not behind a proxy such as Nginx that would allow you to set the request timeout, and/or you are not using gunicorn, which would also allow you to adjust the request timeout, you could use a middleware, as suggested here, to set a timeout for all incoming requests. The suggested middleware (example is given below) uses asyncio's .wait_for() function, which waits for an awaitable function/coroutine to complete with a timeout. If a timeout occurs, it cancels the task and raises asyncio.TimeoutError.
Regarding your comment below:
My requirement is not unblocking next request...
Again, please read carefully the first part of this answer to understand that if you define your endpoint with async def and not await for some coroutine inside, but instead perform some CPU-bound task (as you already do), it will block the server until is completed (and even the approach below wont' work as expected). That's like saying that you would like FastAPI to process one request at a time; in that case, there is no reason to use an ASGI framework such as FastAPI, which takes advantage of the async/await syntax (i.e., processing requests asynchronously), in order to provide fast performance. Hence, you either need to drop the async definition from your endpoint (as mentioned earlier above), or, preferably, run your synchronous CPU-bound task using ProcessPoolExecutor, as described earlier.
Also, your comment in some_func():
Some computationally heavy function whose execution time depends on
input text size
indicates that instead of (or along with) setting a request timeout, you could check the length of input text (using a dependency fucntion, for instance) and raise an HTTPException in case the text's length exceeds some pre-defined value, which is known beforehand to require more than 60s to complete the processing. In that way, your system won't waste resources trying to perform a task, which you already know will not be completed.
Working Example
import time
import uvicorn
import asyncio
import concurrent.futures
from functools import partial
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from starlette.status import HTTP_504_GATEWAY_TIMEOUT
from fastapi.concurrency import run_in_threadpool
REQUEST_TIMEOUT = 2 # adjust timeout as desired
app = FastAPI()
#app.middleware('http')
async def timeout_middleware(request: Request, call_next):
try:
return await asyncio.wait_for(call_next(request), timeout=REQUEST_TIMEOUT)
except asyncio.TimeoutError:
return JSONResponse({'detail': f'Request exceeded the time limit for processing'},
status_code=HTTP_504_GATEWAY_TIMEOUT)
def cpu_bound_task(text):
time.sleep(5)
return text
#app.get('/')
async def main():
loop = asyncio.get_running_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
res = await loop.run_in_executor(pool, partial(cpu_bound_task, text='Hello world'))
return {'response': res}
if __name__ == '__main__':
uvicorn.run(app)

Exectuted Python Tasks stops on Ubuntu without exception

I'm building a flask app which processes some big files from users. Since this process can take a few seconds, I want the task to be done by an executor. I'm using Flask-executor and it works very well on my Windows developing machine.
But on the Ubuntu host, it simply doesn't: After submitting a task the job method gets called, logs one line to the log file and then stops. No return statement, no exception thrown or something else. How is this even possible?
Here I create the executor:
from flask_executor import Executor
executor = Executor()
executor.init_app(app)
After the file upload, I submit the job:
task_list.append((process_upload, full_file_path, current_user, readme_path))
Nothing special, so here's my method:
def process_upload(file_path: str, user: User, readme_path: str):
try:
_process_upload(file_path, user, readme_path)
except BaseException:
logger.error(traceback.format_exc())
def _process_upload(file_path: str, user: User, readme_path: str):
logger.info(f" --------- processing file {file_path} in background from user {user.username}")
logger.info("1")
path_only, file_name_with_ext = os.path.split(file_path)
logger.info("2")
The process method is actually bigger and does some fancy things, but I'm not even getting that far. Here's the log file:
2022-02-22 10:56:02,055 - ThreadPoolExecutor-0_0 - INFO - --------- processing file /var/www/myproject/files/test/test.dat in background from user test
I know that it's not just log files because the task isn't done after all.
So how the hell is it possible that the submitted task just stops? And why does it work on Windows but not on a 08/15 Ubuntu? How can i find and fix the problem?

Aborting code execution in a Python Process without terminating the process

Let's say I have a (websocket) API, api.py, as such:
from flask import Flask, request
from flask_socketio import SocketIO, emit
from worker import Worker
app = Flask()
socketio = SocketIO(app)
worker = Worker()
worker.start()
#socketio.on('connect')
def connect():
print("Client", request.sid, "connected")
#socketio.on('get_results')
def get_results(query):
"""
The only endpoing of the API.
"""
print("Client", request.sid, "requested results for query", query)
# Set the worker to work, wait for results to be ready, and
# send the results back to the client.
worker.task_queue.put(query)
results = worker.result_queue.get()
emit("results", results)
#socketio.on('disconnect')
def disconnect():
print("Client", request.sid, "disconnected, perhaps before results where ready")
# What to do here?
socketio.run(app, host='')
The a API will serve many clients but only has a single worker to produce the results that should be served. worker.py:
from multiprocessing import Process, Queue
class Worker(Process):
def __init__(self):
super().__init__()
self.task_queue = Queue()
self.result_queue = Queue()
self.some_stateful_variable = 0
# Do other computationally expensive work
def reset_state(self):
# Computationally inexpensive.
pass
def do_work(self, task):
# Computationally expensive. Takes long time.
# Modifies internal state.
pass
def run(self):
while True:
task = self.task_queue.get()
results = self.do_work(task)
self.result_queue.put(results)
The worker gets a request, i.e. a task to do, and sets forth producing a result. When the result is ready, the client will be served it.
But not all clients are patient. They may leave, i.e. disconnect from the API, before the results are ready. They don't want them, and the worker therefore ends up working on a task that does not need to finish. That makes other client in queue wait unnecessarily. How to avoid this situation, and get the worker to abort executing do_work for a task that does not need to finish?
In client side: when user closes browser tab or leave the page send request to your Flask server, the request should contain id of the task you would like to cancel.
In server side put cancel status for the task in database or any shared variable between Flask Server and your Worker Process
Divide Task processing in several stages and check status of task in database before each stage, if status is canceled - stop the task processing.
Another choice for point 1 is to do some monitoring in Server side in separate Process - count interval between status requests from client side.
I've handled similar problems by launching an entirely separate process via:
sp.call('start python path\\worker.py', shell=True)
worker.py would then report its PID back to the api.py via redis, then its straightforward to kill the process at any point from api.py
Of course, how viable that is for you will depend on how much data resides within api.py and is shared to worker.py - whether its feasible for that to also pass via redis or not is for you to decide.
The added benefit is you decouple socket from heavy compute - and you can go quasi-multi-core (single thread per worker.py). You could go full multi core by incorporating multiprocessing into each worker.py if you wished.

How to write python code that will work with threads or coroutines and will complete in deterministic time?

What I mean by "deterministic time"? For example AWS offer a service "AWS Lambda". The process started as lambda function has time limit, after that lambda function will stop execution and will assume that task was finished with error. And example task - send data to http endpoint. Depending of a network connection to http endpoint, or other factors, process of sending data can take a long time. If I need to send the same data to the many endpoints, then full process time will take one process time times endpoints amount. Which increase a chance that lambda function will be stopped before all data will be send to all endpoints.
To solve this I need to send data to different endpoints in parallel mode using threads.
The problem with threads - started thread can't be stopped. If http request will take more time than it dedicated by lambda function time limit, lambda function will be aborted and return error. So I need to use timeout with http request, to abort it, if it take more time than expected.
If http request will be canceled by timeout or endpoint will return error, I need to save not processed data somewhere to not lost the data. The time needed to save unprocessed data can be predicted, because I control the storage where data will be saved.
And the last part that consume time - procedure or loop where threads are scheduled executor.submit(). If there is only one endpoint or small number of them then the consumed time will be small. And there is no necessary to control this. But if I have deal with many endpoints, I have to take this into account.
So basically full time will consists of:
scheduling threads
http request execution
saving unprocessed data
There is example of how I can manage time using threads
import concurrent.futures
from functools import partial
import requests
import time
start = time.time()
def send_data(data):
host = 'http://127.0.0.1:5000/endpoint'
try:
result = requests.post(host, json=data, timeout=(0.1, 0.5))
# print('done')
if result.status_code == 200:
return {'status': 'ok'}
if result.status_code != 200:
return {'status': 'error', 'msg': result.text}
except requests.exceptions.Timeout as err:
return {'status': 'error', 'msg': 'timeout'}
def get_data(n):
return {"wait": n}
def done_cb(a, b, future):
pass # save unprocessed data
def main():
executor = concurrent.futures.ThreadPoolExecutor()
futures = []
max_time = 0.5
for i in range(1):
future = executor.submit(send_data, *[{"wait": 10}])
future.add_done_callback(partial(done_cb, 2, 3))
futures.append(future)
if time.time() - s_time > max_time:
print('stopping creating new threads')
# save unprocessed data
break
try:
for item in concurrent.futures.as_completed(futures, timeout=1):
item.result()
except concurrent.futures.TimeoutError as err:
pass
I was thinking of how I can use asyncio library instead of threads, to do the same thing.
import asyncio
import time
from functools import partial
import requests
start = time.time()
def send_data(data):
...
def get_data(n):
return {"wait": n}
def done_callback(a,b, future):
pass # save unprocessed data
def main(loop):
max_time = 0.5
futures = []
start_appending = time.time()
for i in range(1):
event_data = get_data(1)
future = (loop.run_in_executor(None, send_data, event_data))
future.add_done_callback(partial(done_callback, 2, 3))
futures.append(future)
if time.time() - s_time > max_time:
print('stopping creating new futures')
# save unprocessed data
break
finished, unfinished = loop.run_until_complete(
asyncio.wait(futures, timeout=1)
)
_loop = asyncio.get_event_loop()
result = main(_loop)
Function send_data() the same as in previous code snipped.
Because request library is not async code I use run_in_executor() to create future object. The main problems I have is that done_callback() is not executed when the thread that started but executor done it's job. But only when the futures will be "processed" by asyncio.wait() expression.
Basically I seeking the way to start execute asyncio future, like ThreadPoolExecutor start execute threads, and not wait for asyncio.wait() expression to call done_callback(). If you have other ideas how to write python code that will work with threads or coroutines and will complete in deterministic time. Please share it, I will be glad to read them.
And other question. If thread or future done its job, it can return result, that I can use in done_callback(), for example to remove message from queue by id returned in result. But if thread or future was canceled, I don't have result. And I have to use functools.partial() pass in done_callback additional data, that can help me to understand for what data this callback was called. If passed data are small this is not a problem. If data will be big, I need to put data in array/list/dictionary and pass in callback only index of array or put "full data: in callback.
Can I somehow get access to variable that was passed to future/thread, from done_callback(), that was triggered on canceled future/thread?
You can use asyncio.wait_for to wait for a future (or multiple futures, when combined with asyncio.gather) and cancel them in case of a timeout. Unlike threads, asyncio supports cancellation, so you can cancel a task whenever you feel like it, and it will be cancelled at the first blocking call it makes (typically a network call).
Note that for this to work, you should be using asyncio-native libraries such as aiohttp for HTTP. Trying to combine requests with asyncio using run_in_executor will appear to work for simple tasks, but it will not bring you the benefits of using asyncio, such as being able to spawn a massive number of tasks without encumbering the OS, or the possibility of cancellation.

Check if in celery task

How to check that a function in executed by celery?
def notification():
# in_celery() returns True if called from celery_test(),
# False if called from not_celery_test()
if in_celery():
# Send mail directly without creation of additional celery subtask
...
else:
# Send mail with creation of celery task
...
#celery.task()
def celery_test():
notification()
def not_celery_test():
notification()
Here is one way to do it by using celery.current_task. Here is the code to be used by the task:
def notification():
from celery import current_task
if not current_task:
print "directly called"
elif current_task.request.id is None:
print "called synchronously"
else:
print "dispatched"
#app.task
def notify():
notification()
This is code you can run to exercise the above:
from core.tasks import notify, notification
print "DIRECT"
notification()
print "NOT DISPATCHED"
notify()
print "DISPATCHED"
notify.delay().get()
My task code in the first snippet was in a module named core.tasks. And I shoved the code in the last snippet in a custom Django management command. This tests 3 cases:
Calling notification directly.
Calling notification through a task executed synchronously. That is, this task is not dispatched through Celery to a worker. The code of the task executes in the same process that calls notify.
Calling notification through a task run by a worker. The code of the task executes in a different process from the process that started it.
The output was:
NOT DISPATCHED
called synchronously
DISPATCHED
DIRECT
directly called
There is no line from the print in the task on the output after DISPATCHED because that line ends up in the worker log:
[2015-12-17 07:23:57,527: WARNING/Worker-4] dispatched
Important note: I initially was using if current_task is None in the first test but it did not work. I checked and rechecked. Somehow Celery sets current_task to an object which looks like None (if you use repr on it, you get None) but is not None. Unsure what is going on there. Using if not current_task works.
Also, I've tested the code above in a Django application but I've not used it in production. There may be gotchas I don't know.

Categories