I am trying to schedule tasks using Celery and Python for a Flask app. I basically want to run a function in another directory every x amount of time and make it a celery task. I import the function test_check and I try and put it under a celery task called testcheck(), however, I get the error:
working outside of application context
How can I fix this? Here is my setup:
from app import app
from celery import Celery
from datetime import timedelta
from app.mod_check.views import test_check
celery = Celery(__name__,
broker='amqp://guest:#localhost/',
backend='amqp://guest:#localhost/'
)
celery.config_from_object(__name__)
#celery.task
def add(x, y):
print "celery working!"
return x + y
#celery.task
def testcheck():
test_check()
CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks2.testcheck',
'schedule': timedelta(seconds=5),
#'args': (16, 16)
},
}
CELERY_TIMEZONE = 'Europe/London'
Whatever test_check is, it does something that needs a request context. Since Celery tasks are not part of the HTTP request/response cycle, you need to set up a request context manually.
with app.test_request_context():
test_check()
Related
I'm using celery and celery-beat without Django and I have a task which needs to modify celery-beat schedule when run.
Now I have the following code (module called celery_tasks):
# __init__.py
from .celery import app as celery_app
__all__ = ['celery_app']
#celery.py
from celery import Celery
import config
celery_config = config.get_celery_config()
app = Celery(
__name__,
include=[
'celery_tasks.tasks',
],
)
app.conf.update(celery_config)
# tasks.py
from celery_tasks import celery_app
from celery import shared_task
#shared_task
def start_game():
celery_app.conf.beat_schedule = {
'process_round': {
'task': 'celery_tasks.tasks.process_round',
'schedule': 5,
},
}
I start celery with the following command:
celery worker -A celery_tasks -E -l info --beat
start_game executes and exists normally, but beat process_round task never runs.
How can I force-reload beat schedule (restarting all workers doesn't seem as a good idea)?
the problem with normal celery backend when you start the celerybeat process. it will create a config file and write all tasks and schedules in to that file so it cannot change dynamically
you can use the package
celerybeat-sqlalchemy-scheduler so you can edit schedule on DB itself so that celerybeat will pickup the new schedule from DB itself
also there is another package celery-redbeat which using redis-server as backend
you can refer this this also
Using schedule config also seems bad idea. What if initially process_round task will be active and check if game is not started task just do nothing.
I'd like to call generate_async_audio_service from a view and have it asynchronously generate audio files for the list of words using a threading pool and then commit them to a database.
I keep running into an error that I'm working out of the application context even though I'm creating a new polly and s3 instance each time.
How can I generate/upload multiple audio files at once?
from flask import current_app,
from multiprocessing.pool import ThreadPool
from Server.database import db
import boto3
import io
import uuid
def upload_audio_file_to_s3(file):
app = current_app._get_current_object()
with app.app_context():
s3 = boto3.client(service_name='s3',
aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'))
extension = file.filename.rsplit('.', 1)[1].lower()
file.filename = f"{uuid.uuid4().hex}.{extension}"
s3.upload_fileobj(file,
app.config.get('S3_BUCKET'),
f"{app.config.get('UPLOADED_AUDIO_FOLDER')}/{file.filename}",
ExtraArgs={"ACL": 'public-read', "ContentType": file.content_type})
return file.filename
def generate_polly(voice_id, text):
app = current_app._get_current_object()
with app.app_context():
polly_client = boto3.Session(
aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'),
region_name=app.config.get('AWS_REGION')).client('polly')
response = polly_client.synthesize_speech(VoiceId=voice_id,
OutputFormat='mp3', Text=text)
return response['AudioStream'].read()
def generate_polly_from_term(vocab_term, gender='m'):
app = current_app._get_current_object()
with app.app_context():
audio = generate_polly('Celine', vocab_term.term)
file = io.BytesIO(audio)
file.filename = 'temp.mp3'
file.content_type = 'mp3'
return vocab_term.id, upload_audio_file_to_s3(file)
def generate_async_audio_service(terms):
pool = ThreadPool(processes=12)
results = pool.map(generate_polly_from_term, terms)
# do something w/ results
This is not necessarily a fleshed-out answer, but rather than putting things into comments I'll put it here.
Celery is a task manager for python. The reason you would want to use this is if you have tasks pinging Flask, but they take longer to finish than the interval of tasks coming in, then certain tasks will be blocked and you won't get all of your results. To fix this, you hand it to another process. This goes like so:
1) Client sends a request to Flask to process audio files
2) The files land in Flask to be processed, Flask will send an asyncronous task to Celery.
3) Celery is notified of the task and stores its state in some sort of messaging system (RabbitMQ and Redis are the canonical examples)
4) Flask is now unburdened from that task and can receive more
5) Celery finishes the task, including the upload to your database
Celery and Flask are then two separate python processes communicating with one another. That should satisfy your multithreaded approach. You can also retrieve the state from a task through Flask if you want the client to verify that the task was/was not completed. The route in your Flask app.py would look like:
#app.route('/my-route', methods=['POST'])
def process_audio():
# Get your files and save to common temp storage
save_my_files(target_dir, files)
response = celery_app.send_tast('celery_worker.files', args=[target_dir])
return jsonify({'task_id': response.task_id})
Where celery_app comes from another module worker.py:
import os
from celery import Celery
env = os.environ
# This is for a rabbitMQ backend
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL', 'amqp://0.0.0.0:5672/0')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
Then, your celery process would have a worker configured something like:
from celery import Celery
from celery.signals import after_task_publish
env = os.environ
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')
# Set celery_app with name 'tasks' using the above broker and backend
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
#celery_app.task(name='celery_worker.files')
def async_files(path):
# Get file from path
# Process
# Upload to database
# This is just if you want to return an actual result, you can fill this in with whatever
return {'task_state': "FINISHED"}
This is relatively basic, but could serve as a starting point. I will say that some of Celery's behavior and setup is not always the most intuitive, but this will leave your flask app available to whoever wants to send files to it without blocking anything else.
Hopefully that's somewhat helpful
I'm running a Celery worker in Python with the celery module v3.1.25, and a Celery client in node.js with the node-celery npm package v0.2.7 (not the latest).
The Python Celery worker works fine when sending a job using a Python Celery client.
Problem: When using a node-celery client to send a task to the Celery backend, we get an error in the JS console:
(STDERR) Celery should be configured with json serializer
Python Celery worker is configured with:
app = Celery('tasks',
broker='amqp://test:test#192.168.1.26:5672//',
backend='amqp://',
task_serializer='json',
include=['proj.tasks'])
node-celery client is configured with:
var celery = require('node-celery')
var client = celery.createClient({
CELERY_BROKER_URL: 'amqp://test:test#192.168.1.26:5672//',
CELERY_RESULT_BACKEND: 'amqp',
CELERY_TASK_SERIALIZER: "json"
});
client.on('connect', function() {
console.log('connected');
client.call('proj.tasks.getPriceEstimates', [start_latitude, start_longitude],
function(result) {
console.log('result: ', result);
client.end();
})
});
Is this a problem with the configuration on the Python Celery worker? Did we miss out on a configuration parameter which can change the return serialization format to json?
Update
Updated with result_serializers and accept_content parameters as suggested by ChillarAnand
from __future__ import absolute_import, unicode_literals
from celery import Celery
app = Celery('tasks',
broker='amqp://test:test#192.168.1.26:5672//',
backend='amqp://',
task_serializer='json',
result_serializer='json',
accept_content=['application/json'],
include=['proj.tasks'])
But node.js Celery client still thinks that its not in json, throwing the same error message.
It gives that error because the results were in the form of 'application/x-python-serialize'.
Checked this to be the case, as RabbitMQ management console shows the results to be content_type: application/x-python-serialize
This forum post says that it is because the tasks were created before the configs were loaded.
Here are how my files are like:
proj/celery.py
from __future__ import absolute_import, unicode_literals
from celery import Celery
app = Celery('tasks',
broker='amqp://test:test#192.168.1.26:5672//',
backend='amqp://',
task_serializer='json',
result_serializer='json',
accept_content=['application/json'],
include=['proj.tasks'])
proj/tasks.py
from __future__ import absolute_import, unicode_literals
from .celery import app
#app.task
def myTask():
...
return ...
Is there a better way to structure the code to ensure that the configs are loaded before the tasks?
When configuring a serializer, you should specify content type, task serializer and result serializer as well.
app = Celery(
broker='amqp://guest#localhost//',
backend='amqp://',
include=['proj.tasks'],
task_serializer='json',
result_serializer='json',
accept_content = ['application/json'],
)
I'm using Celery 4.0.1 with Django 1.10 and I have troubles scheduling tasks (running a task works fine). Here is the celery configuration:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.BROKER_URL = 'amqp://{}:{}#{}'.format(settings.AMQP_USER, settings.AMQP_PASSWORD, settings.AMQP_HOST)
app.conf.CELERY_DEFAULT_EXCHANGE = 'myapp.celery'
app.conf.CELERY_DEFAULT_QUEUE = 'myapp.celery_default'
app.conf.CELERY_TASK_SERIALIZER = 'json'
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_IGNORE_RESULT = True
app.conf.CELERY_DISABLE_RATE_LIMITS = True
app.conf.BROKER_POOL_LIMIT = 2
app.conf.CELERY_QUEUES = (
Queue('myapp.celery_default'),
Queue('myapp.queue1'),
Queue('myapp.queue2'),
Queue('myapp.queue3'),
)
Then in tasks.py I have:
#app.task(queue='myapp.queue1')
def my_task(some_id):
print("Doing something with", some_id)
In views.py I want to schedule this task:
def my_view(request, id):
app.add_periodic_task(10, my_task.s(id))
Then I execute the commands:
sudo systemctl start rabbitmq.service
celery -A myapp.celery_app beat -l debug
celery worker -A myapp.celery_app
But the task is never scheduled. I don't see anything in the logs. The task is working because if in my view I do:
def my_view(request, id):
my_task.delay(id)
The task is executed.
If in my configuration file if I schedule the task manually, like this it works:
app.conf.CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.my_task',
'schedule': 10.0,
'args': (66,)
},
}
I just can't schedule the task dynamically. Any idea?
EDIT: (13/01/2018)
The latest release 4.1.0 have addressed the subject in this ticket #3958 and has been merged
Actually you can't not define periodic task at the view level, because the beat schedule setting will be loaded first and can not be rescheduled at runtime:
The add_periodic_task() function will add the entry to the beat_schedule setting behind the scenes, and the same setting can also can be used to set up periodic tasks manually:
app.conf.CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.my_task',
'schedule': 10.0,
'args': (66,)
},
}
which means if you want to use add_periodic_task() it should be wrapped within an on_after_configure handler at the celery app level and any modification on runtime will not take effect:
app = Celery()
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
sender.add_periodic_task(10, my_task.s(66))
As mentioned in the doc the the regular celerybeat simply keep track of task execution:
The default scheduler is the celery.beat.PersistentScheduler, that simply keeps track of the last run times in a local shelve database file.
In order to be able to dynamically manage periodic tasks and reschedule celerybeat at runtime:
There’s also the django-celery-beat extension that stores the schedule in the Django database, and presents a convenient admin interface to manage periodic tasks at runtime.
The tasks will be persisted in django database and the scheduler could be updated in task model at the db level. Whenever you update a periodic task a counter in this tasks table will be incremented, and tells the celery beat service to reload the schedule from the database.
A possible solution for you could be as follow:
from django_celery_beat.models import PeriodicTask, IntervalSchedule
schedule= IntervalSchedule.objects.create(every=10, period=IntervalSchedule.SECONDS)
task = PeriodicTask.objects.create(interval=schedule, name='any name', task='tasks.my_task', args=json.dumps([66]))
views.py
def update_task_view(request, id)
task = PeriodicTask.objects.get(name="task name") # if we suppose names are unique
task.args=json.dumps([id])
task.save()
I have a Flask web hosting with no access to cron command.
How can I execute some Python function every hour?
You can use BackgroundScheduler() from APScheduler package (v3.5.3):
import time
import atexit
from apscheduler.schedulers.background import BackgroundScheduler
def print_date_time():
print(time.strftime("%A, %d. %B %Y %I:%M:%S %p"))
scheduler = BackgroundScheduler()
scheduler.add_job(func=print_date_time, trigger="interval", seconds=60)
scheduler.start()
# Shut down the scheduler when exiting the app
atexit.register(lambda: scheduler.shutdown())
Note that two of these schedulers will be launched when Flask is in debug mode. For more information, check out this question.
I'm a little bit new with the concept of application schedulers, but what I found here for APScheduler v3.3.1 , it's something a little bit different. I believe that for the newest versions, the package structure, class names, etc., have changed, so I'm putting here a fresh solution which I made recently, integrated with a basic Flask application:
#!/usr/bin/python3
""" Demonstrating Flask, using APScheduler. """
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask
def sensor():
""" Function for test purposes. """
print("Scheduler is alive!")
sched = BackgroundScheduler(daemon=True)
sched.add_job(sensor,'interval',minutes=60)
sched.start()
app = Flask(__name__)
#app.route("/home")
def home():
""" Function for test purposes. """
return "Welcome Home :) !"
if __name__ == "__main__":
app.run()
I'm also leaving this Gist here, if anyone have interest on updates for this example.
Here are some references, for future readings:
APScheduler Doc: https://apscheduler.readthedocs.io/en/latest/
daemon=True: https://docs.python.org/3.4/library/threading.html#thread-objects
You could make use of APScheduler in your Flask application and run your jobs via its interface:
import atexit
# v2.x version - see https://stackoverflow.com/a/38501429/135978
# for the 3.x version
from apscheduler.scheduler import Scheduler
from flask import Flask
app = Flask(__name__)
cron = Scheduler(daemon=True)
# Explicitly kick off the background thread
cron.start()
#cron.interval_schedule(hours=1)
def job_function():
# Do your work here
# Shutdown your cron thread if the web process is stopped
atexit.register(lambda: cron.shutdown(wait=False))
if __name__ == '__main__':
app.run()
I've tried using flask instead of a simple apscheduler what you need to install is
pip3 install flask_apscheduler
Below is the sample of my code:
from flask import Flask
from flask_apscheduler import APScheduler
app = Flask(__name__)
scheduler = APScheduler()
def scheduleTask():
print("This test runs every 3 seconds")
if __name__ == '__main__':
scheduler.add_job(id = 'Scheduled Task', func=scheduleTask, trigger="interval", seconds=3)
scheduler.start()
app.run(host="0.0.0.0")
For a simple solution, you could add a route such as
#app.route("/cron/do_the_thing", methods=['POST'])
def do_the_thing():
logging.info("Did the thing")
return "OK", 200
Then add a unix cron job that POSTs to this endpoint periodically. For example to run it once a minute, in terminal type crontab -e and add this line:
* * * * * /opt/local/bin/curl -X POST https://YOUR_APP/cron/do_the_thing
(Note that the path to curl has to be complete, as when the job runs it won't have your PATH. You can find out the full path to curl on your system by which curl)
I like this in that it's easy to test the job manually, it has no extra dependencies and as there isn't anything special going on it is easy to understand.
Security
If you'd like to password protect your cron job, you can pip install Flask-BasicAuth, and then add the credentials to your app configuration:
app = Flask(__name__)
app.config['BASIC_AUTH_REALM'] = 'realm'
app.config['BASIC_AUTH_USERNAME'] = 'falken'
app.config['BASIC_AUTH_PASSWORD'] = 'joshua'
To password protect the job endpoint:
from flask_basicauth import BasicAuth
basic_auth = BasicAuth(app)
#app.route("/cron/do_the_thing", methods=['POST'])
#basic_auth.required
def do_the_thing():
logging.info("Did the thing a bit more securely")
return "OK", 200
Then to call it from your cron job:
* * * * * /opt/local/bin/curl -X POST https://falken:joshua#YOUR_APP/cron/do_the_thing
You could try using APScheduler's BackgroundScheduler to integrate interval job into your Flask app. Below is the example that uses blueprint and app factory (init.py) :
from datetime import datetime
# import BackgroundScheduler
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask
from webapp.models.main import db
from webapp.controllers.main import main_blueprint
# define the job
def hello_job():
print('Hello Job! The time is: %s' % datetime.now())
def create_app(object_name):
app = Flask(__name__)
app.config.from_object(object_name)
db.init_app(app)
app.register_blueprint(main_blueprint)
# init BackgroundScheduler job
scheduler = BackgroundScheduler()
# in your case you could change seconds to hours
scheduler.add_job(hello_job, trigger='interval', seconds=3)
scheduler.start()
try:
# To keep the main thread alive
return app
except:
# shutdown if app occurs except
scheduler.shutdown()
Hope it helps :)
Ref :
https://github.com/agronholm/apscheduler/blob/master/examples/schedulers/background.py
Another alternative might be to use Flask-APScheduler which plays nicely with Flask, e.g.:
Loads scheduler configuration from Flask configuration,
Loads job definitions from Flask configuration
More information here:
https://pypi.python.org/pypi/Flask-APScheduler
You may use flask-crontab module, which is quite easy.
Step 1: pip install flask-crontab
Step 2:
from flask import Flask
from flask_crontab import Crontab
app = Flask(__name__)
crontab = Crontab(app)
Step 3:
#crontab.job(minute="0", hour="6", day="*", month="*", day_of_week="*")
def my_scheduled_job():
do_something()
Step 4: On cmd, hit
flask crontab add
Done. now simply run your flask application, and you can check your function will call at 6:00 every day.
You may take reference from Here (Official DOc).
A complete example using schedule and multiprocessing, with on and off control and parameter to run_job()
the return codes are simplified and interval is set to 10sec, change to every(2).hour.do()for 2hours. Schedule is quite impressive it does not drift and I've never seen it more than 100ms off when scheduling. Using multiprocessing instead of threading because it has a termination method.
#!/usr/bin/env python3
import schedule
import time
import datetime
import uuid
from flask import Flask, request
from multiprocessing import Process
app = Flask(__name__)
t = None
job_timer = None
def run_job(id):
""" sample job with parameter """
global job_timer
print("timer job id={}".format(id))
print("timer: {:.4f}sec".format(time.time() - job_timer))
job_timer = time.time()
def run_schedule():
""" infinite loop for schedule """
global job_timer
job_timer = time.time()
while 1:
schedule.run_pending()
time.sleep(1)
#app.route('/timer/<string:status>')
def mytimer(status, nsec=10):
global t, job_timer
if status=='on' and not t:
schedule.every(nsec).seconds.do(run_job, str(uuid.uuid4()))
t = Process(target=run_schedule)
t.start()
return "timer on with interval:{}sec\n".format(nsec)
elif status=='off' and t:
if t:
t.terminate()
t = None
schedule.clear()
return "timer off\n"
return "timer status not changed\n"
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)
You test this by just issuing:
$ curl http://127.0.0.1:5000/timer/on
timer on with interval:10sec
$ curl http://127.0.0.1:5000/timer/on
timer status not changed
$ curl http://127.0.0.1:5000/timer/off
timer off
$ curl http://127.0.0.1:5000/timer/off
timer status not changed
Every 10sec the timer is on it will issue a timer message to console:
127.0.0.1 - - [18/Sep/2018 21:20:14] "GET /timer/on HTTP/1.1" 200 -
timer job id=b64ed165-911f-4b47-beed-0d023ead0a33
timer: 10.0117sec
timer job id=b64ed165-911f-4b47-beed-0d023ead0a33
timer: 10.0102sec
You might want to use some queue mechanism with scheduler like RQ scheduler or something more heavy like Celery (most probably an overkill).