Configure Celery with JSON Serializer (Python + node.js)

I'm running a Celery worker in Python with the celery module v3.1.25, and a Celery client in node.js with the node-celery npm package v0.2.7 (not the latest).
The Python Celery worker works fine when sending a job using a Python Celery client.
Problem: When using a node-celery client to send a task to the Celery backend, we get an error in the JS console:
(STDERR) Celery should be configured with json serializer
Python Celery worker is configured with:
app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             task_serializer='json',
             include=['proj.tasks'])
node-celery client is configured with:
var celery = require('node-celery')
var client = celery.createClient({
    CELERY_BROKER_URL: 'amqp://test:test@192.168.1.26:5672//',
    CELERY_RESULT_BACKEND: 'amqp',
    CELERY_TASK_SERIALIZER: "json"
});
client.on('connect', function() {
    console.log('connected');
    client.call('proj.tasks.getPriceEstimates', [start_latitude, start_longitude],
        function(result) {
            console.log('result: ', result);
            client.end();
        });
});
Is this a problem with the configuration on the Python Celery worker? Did we miss out on a configuration parameter which can change the return serialization format to json?
Update
Updated with the result_serializer and accept_content parameters as suggested by ChillarAnand:
from __future__ import absolute_import, unicode_literals
from celery import Celery

app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             task_serializer='json',
             result_serializer='json',
             accept_content=['application/json'],
             include=['proj.tasks'])
But the node.js Celery client still thinks the result is not in JSON and throws the same error message.
It gives that error because the results are encoded as 'application/x-python-serialize'.
I confirmed this is the case: the RabbitMQ management console shows the results with content_type: application/x-python-serialize.
This forum post says that it is because the tasks were created before the configs were loaded.
Here is what my files look like:
proj/celery.py
from __future__ import absolute_import, unicode_literals
from celery import Celery

app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             task_serializer='json',
             result_serializer='json',
             accept_content=['application/json'],
             include=['proj.tasks'])
proj/tasks.py
from __future__ import absolute_import, unicode_literals
from .celery import app

@app.task
def myTask():
    ...
    return ...
Is there a better way to structure the code to ensure that the configs are loaded before the tasks?

When configuring a serializer, you should specify the accepted content types as well as the task serializer and the result serializer.
app = Celery(
    broker='amqp://guest@localhost//',
    backend='amqp://',
    include=['proj.tasks'],
    task_serializer='json',
    result_serializer='json',
    accept_content=['application/json'],
)
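The question runs Celery 3.1.25, where the canonical setting names are the uppercase CELERY_* ones rather than the lowercase keyword style above. A minimal sketch for 3.1 (setting names from the 3.1 configuration docs; the broker URL is taken from the question) that applies the serializer config on the app object before any task is sent or executed might look like:

from __future__ import absolute_import, unicode_literals
from celery import Celery

app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             include=['proj.tasks'])

# Configure serializers before the worker imports proj.tasks, so results are
# encoded as JSON instead of application/x-python-serialize.
app.conf.update(
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    CELERY_ACCEPT_CONTENT=['application/json'],
)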

Related

Celery jobs not running on heroku (python/django app)

I have a Django app set up with some scheduled tasks. The app is deployed on Heroku with Redis. The task runs if invoked synchronously in the console, or locally when I also have Redis and Celery running. However, the scheduled jobs are not running on Heroku.
My task:
@shared_task(name="send_emails")
def send_emails():
    .....
celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from celery.schedules import crontab

# set the default Django settings module for the 'celery' program.
# this is also used in manage.py
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')

# Get the base REDIS URL, default to redis' default
BASE_REDIS_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379')

app = Celery('my_app')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

app.conf.broker_url = BASE_REDIS_URL

# this allows you to schedule items in the Django admin.
app.conf.beat_scheduler = 'django_celery_beat.schedulers.DatabaseScheduler'

# These are the scheduled jobs
app.conf.beat_schedule = {
    'send_emails_crontab': {
        'task': 'send_emails',
        'schedule': crontab(hour=9, minute=0),
        'args': (),
    }
}
In Procfile:
worker: celery -A my_app worker --beat -S django -l info
I've spun up the worker with heroku ps:scale worker=1 -a my-app.
I can see the registered tasks under [tasks] in the worker logs.
However, the scheduled tasks are not running at their scheduled time. Calling send_emails.delay() in the production console does work.
How do I get the worker to stay alive and / or run the job at the scheduled time?
I have a workaround using a command and heroku scheduler. Just unsure if that's the best way to do it.
If you're on the free tier, you should know that Heroku dynos sleep, and if your scheduled task becomes due while your dyno is sleeping, it won't run.
Here are a few ideas.
Run a console and check the dyno's datetime; the dyno uses US local time.
A free dyno sleeps after 30 minutes of inactivity and only gets 450 hours per month.
Try switching from Celery to APScheduler (the script below uses a BlockingScheduler); you need to add a clock.py script like this:
from myapp import myfunction
from datetime import datetime
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.schedulers.blocking import BlockingScheduler
from time import monotonic, sleep, ctime
import os

sched = BlockingScheduler()
hour = int(os.environ.get("SEARCH_HOUR"))
minutes = int(os.environ.get("SEARCH_MINUTES"))

@sched.scheduled_job('cron', day_of_week='mon-sun', hour=hour, minute=minutes)
def scheduled_job():
    print('This job: execute myfunction every day at ', hour, ':', minutes)
    # My function
    myfunction()

sched.start()
In Procfile:
clock: python clock.py
and run:
heroku ps:scale clock=1 --app thenameapp
Regards.
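If you would rather keep Celery beat instead of a clock process, another commonly suggested setup (a sketch, not tested against this app) is to run the worker and the beat scheduler as separate dynos, so the scheduler is not tied to the worker process; the scheduler path follows django_celery_beat's documented usage:

worker: celery -A my_app worker -l info
beat: celery -A my_app beat --scheduler django_celery_beat.schedulers:DatabaseScheduler -l info

Then scale both process types, e.g. heroku ps:scale worker=1 beat=1 -a my-app.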

Celery not publishing message to non-task based RabbitMQ Queue

I'm trying to connect a Django Rest Framework backend to RabbitMQ using Celery. I have a few micro-services that use RabbitMQ as a message bus between the backend and those services. When I have the backend call a task that pushes a message to the micro-services' message bus, the message is never put into the queue for one of those services to grab. The queues I declare in my tasks.py are created in RabbitMQ but are not being used by any of the producers.
testcelery/celery.py
...
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "testcelery.settings")

app = Celery("testcelery")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()
testcelery/settings.py
...
INSTALLED_APPS = [
    ...
    "backend_processing",
]

CELERY_BROKER_URL = "amqp://testuser@rabbitmq_host:5672"
CELERY_ACCEPT_CONTENT = ["application/json"]
CELERY_TASK_SERIALIZER = "json"
backend_processing/tasks.py
...
from celery import shared_task
from testcelery.celery import app as celery_app
from kombu import Exchange, Queue

test_queue = Queue("test_queue", Exchange("test"), "test", durable=True)

@shared_task
def run_processing(data, producer=None):
    with celery_app.producer_or_acquire(producer) as producer:
        producer.publish(data,
                         declare=[test_queue],
                         retry=True,
                         exchange=test_queue.exchange,
                         routing_key=test_queue.routing_key,
                         delivery_mode="persistent")
What do I need to modify in my configuration to allow Celery to publish to queues other than the ones defined in settings.py?
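One way to narrow this down (a sketch, not an answer from the thread; the broker URL and payload are placeholders) is to publish to the same queue with plain kombu, outside of Celery, and confirm that the exchange, binding and routing key behave as expected:

from kombu import Connection, Exchange, Queue

test_exchange = Exchange("test", type="direct")
test_queue = Queue("test_queue", test_exchange, routing_key="test", durable=True)

# Reuse the broker URL from settings.py in practice.
with Connection("amqp://testuser@rabbitmq_host:5672//") as conn:
    producer = conn.Producer(serializer="json")
    # declare=[test_queue] creates and binds the queue if it does not exist yet.
    producer.publish({"ping": "pong"},
                     exchange=test_exchange,
                     routing_key="test",
                     declare=[test_queue],
                     retry=True)

If this message shows up in test_queue but the task's publish does not, the problem is in how the task publishes (exchange type, binding, or routing key) rather than in the RabbitMQ setup itself.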

Reload celery beat config

I'm using celery and celery-beat without Django, and I have a task which needs to modify the celery-beat schedule when it runs.
Now I have the following code (module called celery_tasks):
# __init__.py
from .celery import app as celery_app

__all__ = ['celery_app']

# celery.py
from celery import Celery
import config

celery_config = config.get_celery_config()

app = Celery(
    __name__,
    include=[
        'celery_tasks.tasks',
    ],
)
app.conf.update(celery_config)

# tasks.py
from celery_tasks import celery_app
from celery import shared_task

@shared_task
def start_game():
    celery_app.conf.beat_schedule = {
        'process_round': {
            'task': 'celery_tasks.tasks.process_round',
            'schedule': 5,
        },
    }
I start celery with the following command:
celery worker -A celery_tasks -E -l info --beat
start_game executes and exits normally, but the process_round beat task never runs.
How can I force-reload the beat schedule (restarting all the workers doesn't seem like a good idea)?
The problem with the default celery beat scheduler is that when you start the celerybeat process, it creates a schedule file and writes all tasks and schedules into that file, so the schedule cannot be changed dynamically.
You can use the celerybeat-sqlalchemy-scheduler package, which lets you edit the schedule in a database so that celerybeat picks up the new schedule from the DB itself.
There is also another package, celery-redbeat, which uses a Redis server as the backend.
You can refer to that as well.
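For the celery-redbeat route, a minimal sketch (the setting name and the RedBeatSchedulerEntry API follow RedBeat's documentation; the Redis URL is an assumption) might look like:

# celery.py (addition)
app.conf.redbeat_redis_url = 'redis://localhost:6379/1'

# tasks.py
from celery import shared_task
from celery.schedules import schedule
from redbeat import RedBeatSchedulerEntry

from celery_tasks import celery_app

@shared_task
def start_game():
    # Persist the periodic task in Redis; a beat process started with
    # `celery beat -A celery_tasks -S redbeat.RedBeatScheduler` picks it up
    # without a restart.
    entry = RedBeatSchedulerEntry(
        'process_round',
        'celery_tasks.tasks.process_round',
        schedule(run_every=5),
        app=celery_app,
    )
    entry.save()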
Relying on the schedule config also seems like a bad idea. What if process_round is active from the start and simply checks whether the game has started, doing nothing if it has not?

Flask and Celery on Heroku: sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) SSL error: decryption failed or bad record mac

I'm trying to deploy a flask app on heroku that uses background tasks in Celery. I've implemented the application factory pattern so that the celery processes are not bound to any one instance of the flask app.
This works locally, and I have yet to see an error. But when deployed to heroku, the same results always occur: the celery task (I'm only using one) succeeds the first time it is run, but any subsequent celery calls to that task fail with sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) SSL error: decryption failed or bad record mac. If I restart the celery worker, the cycle continues.
There are multiple issues that show this same error, but none specify a proper solution. I initially believed implementing the application factory pattern would have prevented this error from manifesting, but it's not quite there.
In app/__init__.py I create the celery and db objects:
celery = Celery(__name__, broker=Config.CELERY_BROKER_URL)
db = SQLAlchemy()

def create_app(config_name):
    app = Flask(__name__)
    app.config.from_object(config[config_name])
    db.init_app(app)
    return app
My flask_celery.py file creates the actual Flask app object:
import os
from app import celery, create_app
app = create_app(os.getenv('FLASK_CONFIG', 'default'))
app.app_context().push()
And I start celery with this command:
celery worker -A app.flask_celery.celery --loglevel=info
This is what the actual celery task looks like:
@celery.task()
def task_process_stuff(stuff_id):
    stuff = Stuff.query.get(stuff_id)
    stuff.processed = True
    db.session.add(stuff)
    db.session.commit()
    return stuff
Which is invoked by:
task_process_stuff.apply_async(args=[stuff.id], countdown=10)
Library Versions
Flask 0.12.2
SQLAlchemy 1.1.11
Flask-SQLAlchemy 2.2
Celery 4.0.2
The solution was to add db.engine.dispose() at the beginning of the task, disposing of all db connections before any work begins:
@celery.task()
def task_process_stuff(stuff_id):
    db.engine.dispose()
    stuff = Stuff.query.get(stuff_id)
    stuff.processed = True
    db.session.commit()
    return stuff
As I need this functionality across all of my tasks, I added it to task_prerun:
from celery.signals import task_prerun

@task_prerun.connect
def on_task_init(*args, **kwargs):
    db.engine.dispose()
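An alternative worth considering (a sketch, not part of the original answer; the handler name is made up) is to dispose of the inherited connection pool once per forked worker process instead of before every task, using Celery's worker_process_init signal:

from celery.signals import worker_process_init

@worker_process_init.connect
def reset_db_connections(**kwargs):
    # Drop the connections inherited from the parent process so each forked
    # worker opens its own fresh SSL connections on first use.
    db.engine.dispose()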

Celery Consumer SQS Messages

I am new to Celery and SQS, and would like to use it to periodically check messages stored in SQS and then fire a consumer. The consumer and Celery both live on EC2, while the messages are sent from GAE using boto library.
Currently, I am confused about:
In the message body in creating_msg_gae.py, what task information should I put there? I assume this should be the name of my Celery task?
In the same message body, is url treated as the argument to be processed by my consumer (the function do_something_url(url) in tasks.py)?
Currently, I am running Celery with the command celery worker -A celery_c -l info; from the command line, it seems like Celery checks SQS periodically. Do I need to create a PeriodicTask in Celery instead?
I really appreciate any suggestions to help me with this issue.
creating_msg_gae.py
import base64
import json

from boto import sqs

conn = sqs.connect_to_region("us-east-1",
                             aws_access_key_id='aaa',
                             aws_secret_access_key='bbb')
my_queue = conn.get_queue('uber_batch')

msg = {'properties': {'content_type': 'application/json',
                      'content_encoding': 'utf-8',
                      'body_encoding': 'base64',
                      'delivery_tag': None,
                      'delivery_info': {'exchange': None, 'routing_key': None}}}
body = {'id': 'theid',
        ###########Question 1#######
        'task': 'what task name I should put here?',
        'url': ['my_s3_address']}
msg.update({'body': base64.encodestring(json.dumps(body))})
my_queue.write(my_queue.new_message(json.dumps(msg)))
My Celery file system looks like:
./ce_folder/
celery_c.py, celeryconfig.py, tasks.py, __init__.py
celeryconfig.py
import os

BROKER_BACKEND = "SQS"
AWS_ACCESS_KEY_ID = 'aaa'
AWS_SECRET_ACCESS_KEY = 'bbb'
os.environ.setdefault("AWS_ACCESS_KEY_ID", AWS_ACCESS_KEY_ID)
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", AWS_SECRET_ACCESS_KEY)

BROKER_URL = 'sqs://'
# All transport options go in a single dict; assigning BROKER_TRANSPORT_OPTIONS
# several times in a row would overwrite the earlier values.
BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',
    'visibility_timeout': 60,
    'polling_interval': 30,
}

CELERY_DEFAULT_QUEUE = 'uber_batch'
CELERY_DEFAULT_EXCHANGE = CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_EXCHANGE_TYPE = CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_ROUTING_KEY = CELERY_DEFAULT_QUEUE

CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}
celery_c.py
from __future__ import absolute_import
from celery import Celery

app = Celery('uber')
app.config_from_object('celeryconfig')

if __name__ == '__main__':
    app.start()
tasks.py
from __future__ import absolute_import
from celery_c import app

@app.task
def do_something_url(url):
    # ..download file from url
    # ..do some calculations
    # ..upload results files to s3 and return the result url
    return result_url
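On question 1: with the layout above, the registered task name would be tasks.do_something_url (module path plus function name). Rather than hand-building the message body, a simpler sketch (assuming the sending side can install celery and reach SQS with the same celeryconfig) is to let a Celery app construct and enqueue the message via send_task:

# send_job.py (hypothetical helper, not part of the original setup)
from celery import Celery

app = Celery('uber')
app.config_from_object('celeryconfig')  # same SQS broker settings as the worker

# Celery serializes the call into its own message format and writes it to the
# uber_batch SQS queue; the worker then runs tasks.do_something_url(url).
app.send_task('tasks.do_something_url',
              args=['my_s3_address'],
              queue='uber_batch')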
