I want to implement a Slack slash command that has to run a function pipeline taking roughly 30 seconds to complete. Since Slack slash commands only allow 3 seconds to respond, how do I go about implementing this? I referred to this but don't know how to implement it.
Please bear with me; I am doing this for the first time.
This is what I have tried. I know how to respond with an ok status within 3 seconds, but I don't understand how to then trigger the pipeline:
import requests
import json
from bottle import route, run, request
from S3_download import s3_download
from index import main_func

@route('/action')
def action():
    pipeline()
    return "ok"

def pipeline():
    s3_download()
    p = main_func()
    print(p)

if __name__ == "__main__":
    run(host='0.0.0.0', port=8082, debug=True)
I came across this article. Is using AWS Lambda the only solution?
Can't we do this entirely in Python?
Something like this:
from boto import sqs
from bottle import route, request

@route('/action', method='POST')
def action():
    # grab the response_url from the incoming Slack request
    params = request.forms.get('response_url')
    sqs_queue = get_sqs_connection(queue_name)
    message_object = sqs.message.Message()
    message_object.set_body(params)
    sqs_queue.write(message_object)
    return "request under process"
and you can have another process that consumes the queue and calls the long-running function:
sqs_queue = get_sqs_connection(queue_name)
for sqs_msg in sqs_queue.get_messages(10, wait_time_seconds=5):
    processed_msg = json.loads(sqs_msg.get_body())
    response = pipeline(processed_msg)
    if response:
        sqs_queue.delete_message(sqs_msg)
You can run this second process in a separate standalone Python file, as a daemon process or via cron.
I've used Amazon SQS here, but there are other queue options available.
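Once the worker has run the pipeline, it can send the result back to Slack: slash commands include a response_url, and Slack accepts a delayed response as a JSON POST to that URL for up to 30 minutes. A minimal sketch, assuming the queued message body is the response_url as in the route above:

import requests

def respond_to_slack(response_url, result):
    # Delayed response: Slack renders whatever "text" we POST to the response_url.
    requests.post(response_url, json={"text": "Pipeline finished: {}".format(result)})

The consumer loop above could then call respond_to_slack(processed_msg, response) once pipeline() returns.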
You have an option or two for doing this in a single process, but it's fraught with peril. If you spin up a new Thread to handle the long process, you might end up deploying or crashing in the middle and losing it.
If durability is important to you, look into background-task workers like SQS, Lambda, or even a Celery task queue backed with Redis. A separate task has some interesting failure modes, and these tools will help you deal with them better than just spawning a thread.
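For example, with Celery backed by Redis, the route only enqueues the work and returns inside Slack's 3-second window, while a worker process runs the 30-second pipeline. This is only a rough sketch of that idea, not the poster's code; the broker URL is an assumption:

from celery import Celery
from bottle import route, run, request
from S3_download import s3_download
from index import main_func

celery_app = Celery('slack_tasks', broker='redis://localhost:6379/0')

@celery_app.task
def run_pipeline(response_url):
    s3_download()            # the same long-running steps as in the question
    result = main_func()
    # ...POST the result back to response_url here...

@route('/action', method='POST')
def action():
    run_pipeline.delay(request.forms.get('response_url'))
    return "request under process"   # well within Slack's 3-second limit

The worker side is started separately, e.g. with celery -A <module> worker.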
I'm currently creating a web app using Python Flask and I've run into a roadblock; I'm not sure I'm even thinking about it correctly.
My website's homepage is just a simple landing page with a text input that is required to perform the website's function. What I am trying to accomplish is for the web app to do two things after the text is input. First, the server takes the username input and performs a function that doesn't return anything to the user, but creates a bunch of data that is logged into an SQLite database and used later in the process. Then, the server returns the web page for a survey that has to be taken after the username is input.
However, the function the server performs can take upwards of 2 minutes depending on the user. The way I currently have it coded, the server performs the function and only then returns the web page, so the user is stuck at a loading screen for up to 2 minutes.
#app.route("/survey")
def main(raw_user):
raw_user = request.args.get("SteamID") <
games = createGameDict(user_obj) <----- the function
tag_lst = get_tags(games) <
return render_template("survey_page.html")
Since the survey doesn't depend on the user input, instead of having the user sit at a loading screen, I would like them to be able to start the survey while the function works in the background. Is that possible, and how would I do it?
Update: I've had to solve this problem a number of times in Flask, so I wrote a small Flask extension called Flask-Executor to do it for me. It's a wrapper for concurrent.futures that provides a few handy features, and is my preferred way of handling background tasks that don't require distribution in Flask.
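For reference, a minimal sketch of what using Flask-Executor looks like in this kind of situation; build_user_data here is just a stand-in for the slow database-building work described above:

from flask import Flask, request, render_template
from flask_executor import Executor

app = Flask(__name__)
executor = Executor(app)

def build_user_data(steam_id):
    # stand-in for the slow work that populates the sqlite database
    ...

@app.route('/survey')
def survey():
    steam_id = request.args.get('SteamID')
    executor.submit(build_user_data, steam_id)  # returns a future; the request is not blocked
    return render_template('survey_page.html')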
For more complex background tasks, something like Celery is your best bet. For simpler use cases, however, what you want is the threading module.
Consider the following example:
from flask import Flask
from time import sleep

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

@app.route('/')
def index():
    some_object = 'This is a test'
    slow_function(some_object)
    return 'hello'

if __name__ == '__main__':
    app.run()
Here, we create a function, slow_function() that sleeps for five seconds before returning. When we call it in our route function it blocks the page load. Run the example and hit http://127.0.0.1:5000 in your browser, and you'll see the page wait five seconds before loading, after which the test message is printed in your terminal.
What we want to do is to put slow_function() on a different thread. With just a couple of additional lines of code, we can use the threading module to separate out the execution of this function onto a different thread:
from flask import Flask
from time import sleep
from threading import Thread

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

@app.route('/')
def index():
    some_object = 'This is a test'
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return 'hello'

if __name__ == '__main__':
    app.run()
What we're doing here is simple. We're creating a new instance of Thread and passing it two things: the target, which is the function we want to run, and args, the argument(s) to be passed to the target function. Notice that there are no parentheses on slow_function, because we're not running it - functions are objects, so we're passing the function itself to Thread. As for args, this expects an iterable such as a list or tuple. Even if you only have one argument, wrap it so args gets what it's expecting.
With our thread ready to go, thr.start() executes it. Run this example in your browser, and you'll notice that the index route now loads instantly. But wait another five seconds and sure enough, the test message will print in your terminal.
Now, we could stop here - but in my opinion at least, it's a bit messy to actually have this threading code inside the route itself. What if you need to call this function in another route, or a different context? Better to separate it out into its own function. You could make threading behaviour a part of slow_function itself, or you could make a "wrapper" function - which approach you take depends a lot on what you're doing and what your needs are.
Let's create a wrapper function, and see what it looks like:
from flask import Flask
from time import sleep
from threading import Thread

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

def async_slow_function(some_object):
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return thr

@app.route('/')
def index():
    some_object = 'This is a test'
    async_slow_function(some_object)
    return 'hello'

if __name__ == '__main__':
    app.run()
The async_slow_function() function is doing pretty much exactly what we were doing before - it's just a bit neater now. You can call it in any route without having to rewrite your threading logic all over again. You'll notice that this function actually returns the thread - we don't need that for this example, but there are other things you might want to do with that thread later, so returning it makes the thread object available if you ever need it.
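For instance, if some later code needs to be sure the background work has finished, the returned thread can be joined; this usage is just an illustration, not part of the original example:

thr = async_slow_function(some_object)
# ... do other work while slow_function runs in the background ...
print(thr.is_alive())  # True while slow_function is still running
thr.join()             # block only at the point where the work really must be done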
Okay, so I have a relatively simple problem, I think, and it's like I'm hitting a brick wall with it. I have a Flask app and a webpage that allows you to run a number of scripts on the server side using Celery and Redis (as the broker).
All I want to do is give a task a name/id when I start it (the task will be represented as a button on the client side), i.e.:
@app.route('/start_upgrade/<task_name>')
def start_upgrade(task_name):
    example_task.delay(1, 2, task_name=task_name)
Then, after the task has kicked off, I want to see whether the task is running/waiting/finished in a separate request, preferably something like:
@app.route('/check_upgrade_status/<task_name>')
def get_task_status(task_name):
    task = celery.get_task_by_name(task_name)  # pseudocode
    task_state = task.state
    return task_state
But I can't find anything like that in the docs. I am very new to celery though just FYI so assume I know nothing. Also just to be extra obvious, I need to be able to query the task state from python, no CLI commands please.
Any alternative methods of achieving my goal of querying the queue are also welcome.
I ended up figuring out a solution to my question based on arthur's post.
In conjunction with Redis, I created these functions:
import redis
from celery.result import AsyncResult

redis_cache = redis.StrictRedis(host='localhost', port=6379, db=0)

def check_task_status(task_name):
    task_id = redis_cache.get(task_name)
    return AsyncResult(task_id).status

def start_task(task, task_name, *args, **kwargs):
    response = task.delay(*args, **kwargs)
    redis_cache.set(task_name, response.id)
This lets me assign specific names to tasks. Note that I haven't actually tested this yet, but it makes sense to me.
Example usage:
start_task(example_task, "example_name", 1, 2)
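and checking it later would presumably be as simple as the lines below (note that under Python 3 the id read back from Redis comes out as bytes, so it may need a .decode() before being handed to AsyncResult):

status = check_task_status("example_name")
print(status)  # e.g. 'PENDING', 'STARTED' or 'SUCCESS'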
When you start a task with delay or apply_async, an AsyncResult object is created that contains the id of the task. To get it, you just have to store it in a variable.
For example
@app.route('/start_upgrade/<task_name>')
def start_upgrade(task_name):
    res = example_task.delay(1, 2, task_name=task_name)
    print(res.id)
You can store this id and maybe associate it with something else in a database (or just print it like I did in the example).
Then you can check the status of your task in a Python console with:
from celery.result import AsyncResult
AsyncResult(your_task_id).status
Take a look at the result documentation; you should find what you need there: http://docs.celeryproject.org/en/latest/reference/celery.result.html
In Bash, it is possible to execute a command in the background by appending &. How can I do it in Python?
import requests

while True:
    data = raw_input('Enter something: ')
    requests.post(url, data=data)     # Don't wait for it to finish.
    print('Sending POST request...')  # This should appear immediately.
Here's a hacky way to do it:
try:
    requests.get("http://127.0.0.1:8000/test/", timeout=0.0000000001)
except requests.exceptions.ReadTimeout:
    pass
Edit: for those of you that observed that this will not await a response - that is my understanding of the question "fire and forget... do not wait for it to finish". There are much more thorough and complete ways to do it with threads or async if you need response context, error handling, etc.
I use multiprocessing.dummy.Pool. I create a singleton thread pool at the module level, and then use pool.apply_async(requests.get, [params]) to launch the task.
This command gives me a future, which I can add to a list with other futures indefinitely until I'd like to collect all or some of the results.
multiprocessing.dummy.Pool is, against all logic and reason, a THREAD pool and not a process pool.
Example (works in both Python 2 and 3, as long as requests is installed):
from multiprocessing.dummy import Pool

import requests

pool = Pool(10)  # Creates a pool with ten threads; more threads = more concurrency.
                 # "pool" is a module attribute; you can be sure there will only
                 # be one of them in your application
                 # as modules are cached after initialization.

if __name__ == '__main__':
    futures = []
    for x in range(10):
        futures.append(pool.apply_async(requests.get, ['http://example.com/']))
    # futures is now a list of 10 futures.
    for future in futures:
        print(future.get())  # For each future, wait until the request is
                             # finished and then print the response object.
The requests will be executed concurrently, so running all ten of these requests should take no longer than the longest one. This strategy will only use one CPU core, but that shouldn't be an issue because almost all of the time will be spent waiting for I/O.
Elegant solution from Andrew Gorcester. In addition, without using futures, it is possible to use the callback and error_callback arguments (see the doc) in order to perform asynchronous processing:
from multiprocessing.dummy import Pool
import requests

pool = Pool(10)

def on_success(r: requests.Response):
    if r.status_code == 200:
        print(f'Post succeeded: {r}')
    else:
        print(f'Post failed: {r}')

def on_error(ex: Exception):
    print(f'Post request failed: {ex}')

pool.apply_async(requests.post, args=['http://server.host'],
                 kwds={'json': {'key': 'value'}},
                 callback=on_success, error_callback=on_error)
According to the docs, you should move to another library:
Blocking Or Non-Blocking?
With the default Transport Adapter in place, Requests does not provide
any kind of non-blocking IO. The Response.content property will block
until the entire response has been downloaded. If you require more
granularity, the streaming features of the library (see Streaming
Requests) allow you to retrieve smaller quantities of the response at
a time. However, these calls will still block.
If you are concerned about the use of blocking IO, there are lots of
projects out there that combine Requests with one of Python’s
asynchronicity frameworks.
Two excellent examples are grequests and requests-futures.
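For example, requests-futures keeps the familiar requests interface while returning futures; a minimal sketch:

from requests_futures.sessions import FuturesSession

session = FuturesSession()
future = session.get('http://example.com/')  # returns immediately
# ... carry on with other work ...
response = future.result()                   # block only if/when the response is needed
print(response.status_code)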
Simplest and most Pythonic solution: using threading
A simple way to send a POST/GET, or to execute any other function, without waiting for it to finish is to use the built-in Python module threading.
import threading
import requests

def send_req():
    requests.get("http://127.0.0.1:8000/test/")

for x in range(100):
    threading.Thread(target=send_req).start()  # starts a new thread and continues
Other important features of threading (a short sketch of these follows below):
You can turn these threads into daemons using thread_obj.daemon = True
You can wait for one to finish executing and then continue using thread_obj.join()
You can check whether a thread is alive using thread_obj.is_alive(), which returns True or False
You can even check the active thread count with threading.active_count()
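A quick sketch of those calls together, reusing send_req() from above:

t = threading.Thread(target=send_req)
t.daemon = True                  # daemon threads won't keep the interpreter alive on exit
t.start()

print(t.is_alive())              # True while send_req() is still running
print(threading.active_count())  # includes the main thread
t.join()                         # wait for this particular thread to finish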
Official Documentation
If you can write the code to be executed as a separate, standalone Python program, here is a possible solution based on subprocess.
Otherwise you may find this question and its related answer useful: the trick is to use the threading library to start a separate thread that executes the separated task.
A caveat with both approaches is the number of items (that is to say, the number of threads) you have to manage. If there are too many items in the parent, you may consider halting each batch of items until at least some threads have finished, but I think this kind of management is non-trivial.
For a more sophisticated solution you can use an actor-based approach; I have not used this library myself, but I think it could help in that case.
from multiprocessing.dummy import Pool
import requests

pool = Pool()

def on_success(r):
    print('Post succeed')

def on_error(ex):
    print('Post requests failed')

def call_api(url, data, headers):
    requests.post(url=url, data=data, headers=headers)

def pool_processing_create(url, data, headers):
    pool.apply_async(call_api, args=[url, data, headers],
                     callback=on_success, error_callback=on_error)
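Calling it would look something like this; the URL, payload and headers are placeholders:

pool_processing_create('http://example.com/api',
                       data={'key': 'value'},
                       headers={'Accept': 'application/json'})
# returns immediately; on_success or on_error fires when the request completes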
TLDR;
To run an initialization function for each process that is spawned by celery, you can use the worker_process_init signal. As you can read in the docs, handlers for that signal should not be blocking for more than 4 seconds.
But what are the options, if I have to run an init function that takes more than 4 seconds to execute?
Problem
I use a C extension module to run certain operations within celery tasks. This module requires an initialization that might take several seconds (maybe 4 to 10). Since I would prefer not to run this init function for every task, but rather once for every process that is spawned, I made use of the worker_process_init signal:
# lib.py
import isclient  # C extension module

client = None

def init():
    global client
    client = isclient.Client()  # this might take a while

def create_ne_list(text):
    return client.ne_receiventities4datachunk(text)

# celery.py
from celery import Celery
from celery.signals import worker_process_init
from lib import init

celery = Celery(include=[
    'isc.ne.tasks'
])
celery.config_from_object('celeryconfig')

@worker_process_init.connect
def process_init(sender=None, conf=None, **kwargs):
    init()

if __name__ == '__main__':
    celery.start()

# tasks.py
from celery import celery
from lib import create_ne_list as cnl

@celery.task(time_limit=1200)
def create_ne_list(text):
    return cnl(text)
What happens when I run this code is what I described in my earlier question (Celery: stuck in infinitely repeating timeouts (Timed out waiting for UP message)). In short: since my init function takes longer than 4 seconds, it sometimes happens that a worker gets killed and restarted, and during the restart gets killed again, because that's what automatically happens after 4 seconds of unresponsiveness. This eventually results in an infinite kill-and-restart loop.
Another option would be to run my init function only once per worker, using the worker_init signal. If I do that, I get a different problem: now the queued-up tasks get stuck for some reason.
When I start the worker with a concurrency of 3 and then send a couple of tasks, the first three get finished and the remaining ones are never touched. (I assume it might have something to do with the fact that the client object needs to be shared between multiple processes and that the C extension, for some reason, doesn't support that. But to be honest, I'm relatively new to multiprocessing, so I can only guess.)
Question
So, the question remains: How can I run an init function per process that takes longer than 4 seconds? Is there a correct way to do that and what way would that be?
Celery limits the process init timeout to 4.0 seconds.
Check the source code.
To work around this limit, you can change it before you create the Celery app:
from celery.concurrency import asynpool
asynpool.PROC_ALIVE_TIMEOUT = 10.0  # set this long enough
Note that there is no configuration or setting to change this value.
@changhwan's answer is no longer the only method as of celery 4.4.0. Here is the pull request that added the config option for this feature.
Use the config option
With celery ^4.4.0, this value is configurable. Use the celery application config option worker_proc_alive_timeout. From the stable version docs:
worker_proc_alive_timeout
Default: 4.0.
The timeout in seconds (int/float) when waiting for a new worker process to start up.
Example:
from celery import Celery
from celery.signals import worker_process_init

app = Celery('app')
app.conf.worker_proc_alive_timeout = 10

@worker_process_init.connect
def long_init_function(*args, **kwargs):
    import time
    time.sleep(8)
I have a simple Flask web app that makes many HTTP requests to an external service when a user pushes a button. On the client side I have an AngularJS app.
The server-side code looks like this (using multiprocessing.dummy):
worker = MyWorkerClass()
pool = Pool(processes=10)
result_objs = [pool.apply_async(worker.do_work, (q,))
               for q in queries]

pool.close()  # Close the pool
pool.join()   # Wait for all tasks to finish

errors = not all(obj.successful() for obj in result_objs)

# extract results only from successful tasks
items = [obj.get() for obj in result_objs if obj.successful()]
As you can see, I'm using apply_async because I want to inspect each task later and extract its result only if the task didn't raise an exception.
I understood that in order to show a progress bar on client side, I need to publish somewhere the number of completed tasks so I made a simple view like this:
@app.route('/api/v1.0/progress', methods=['GET'])
def view_progress():
    return jsonify(dict(progress=session['progress']))
That will show the content of a session variable. Now, during the process, I need to update that variable with the number of completed tasks (the total number of tasks to complete is fixed and known).
Any ideas about how to do that? Am I working in the right direction?
I have seen similar questions on SO, like this one, but I'm not able to adapt the answer to my case.
Thank you.
For interprocess communication you can use a multiprocessing.Queue, and your workers can put_nowait tuples with progress information onto it while doing their work. Your main process can update whatever view_progress is reading until all results are ready.
A bit like in this example usage of a Queue, with a few adjustments:
In the writers (workers) I'd use put_nowait instead of put, because working is more important than waiting to report that you are working (but perhaps you judge otherwise and decide that informing the user is part of the task and should never be skipped).
The example just puts strings on the queue; I'd use collections.namedtuple for more structured messages. On tasks with many steps, this lets you raise the resolution of your progress report and tell the user more.
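A rough sketch of that layout; the names, the number of workers, and the step counts are all invented for illustration:

import multiprocessing
from collections import namedtuple

Progress = namedtuple('Progress', ['task_id', 'step', 'total_steps'])

def worker(task_id, progress_queue):
    total_steps = 3
    for step in range(1, total_steps + 1):
        ...  # do one chunk of the real work here
        try:
            progress_queue.put_nowait(Progress(task_id, step, total_steps))
        except Exception:
            pass  # never let progress reporting break the actual work

if __name__ == '__main__':
    progress_queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, progress_queue))
             for i in range(4)]
    for p in procs:
        p.start()

    # The main process drains the queue and updates whatever
    # view_progress reads (a dict, the session store, etc.).
    finished = 0
    while finished < len(procs):
        msg = progress_queue.get()
        print('task %s: step %s of %s' % (msg.task_id, msg.step, msg.total_steps))
        if msg.step == msg.total_steps:
            finished += 1

    for p in procs:
        p.join()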
In general the approach you are taking is okay, I do it in a similar way.
To calculate the progress you can use an auxiliary function that counts the completed tasks:
def get_progress(result_objs):
    done = 0
    errors = 0
    for r in result_objs:
        if r.ready():
            done += 1
            if not r.successful():
                errors += 1
    return (done, errors)
Note that as a bonus this function returns how many of the "done" tasks ended in errors.
The big problem is for the /api/v1.0/progress route to find the array of AsyncResult objects.
Unfortunately AsyncResult objects cannot be serialized to a session, so that option is out. If your application supports a single set of async tasks at a time, then you can just store this array as a global variable. If you need to support multiple clients, each with a different set of async tasks, then you will need to figure out a strategy to keep the client's session data on the server.
I implemented the single client solution as a quick test. My view functions are as follows:
results = None

@app.route('/')
def index():
    global results
    results = [pool.apply_async(do_work) for n in range(20)]
    return render_template('index.html')

@app.route('/api/v1.0/progress')
def progress():
    global results
    total = len(results)
    done, errored = get_progress(results)
    return jsonify({'total': total, 'done': done, 'errored': errored})
I hope this helps!
I think you should be able to update the number of completed tasks using multiprocessing.Value and multiprocessing.Lock.
In your main code, use:
processes = multiprocessing.Value('i', 10)
lock = multiprocessing.Lock()
And then, when you call worker.dowork, pass the lock object and the value to it:
worker.dowork(lock, processes)
In your worker.dowork code, decrease "processes" by one when the work is finished:
lock.acquire()
processes.value -= 1
lock.release()
Now, "processes.value" should be accessible from your main code, and be equal to the number of remaining processes. Make sure you acquire the lock before acessing processes.value, and release the lock afterwards