Celery tasks executing successfully without queues
Setup:
BROKER_URL = "amqp://user:pass@localhost:5672/test"
# Celery Data Format
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERYD_TASK_SOFT_TIME_LIMIT = 60
CELERY_IGNORE_RESULT = True
@app.task
def test(a, b, c):
    print("doing something here...")
Command:
celery worker -A proj -E -l INFO
With the above setup, the worker executes tasks successfully.
I then introduced queues for the Celery tasks and added the following configuration on top of the previous setup:
from kombu.entity import Exchange, Queue
CELERY_QUEUES = (
    Queue('high', Exchange('high'), routing_key='high'),
    Queue('normal', Exchange('normal'), routing_key='normal'),
    Queue('low', Exchange('low'), routing_key='low'),
)
CELERY_DEFAULT_QUEUE = 'normal'
CELERY_DEFAULT_EXCHANGE = 'normal'
CELERY_DEFAULT_ROUTING_KEY = 'normal'
CELERY_ROUTES = {
    'myapp.tasks.test': {'queue': 'high'},
}
Command:
celery worker -A proj -E -l INFO -n worker.high -Q high
Call:
test.delay(1, 2, 3)
When I call the task with this queue setup, the worker does not run it. Did I miss any configuration?
Change CELERY_ROUTES to CELERY_TASK_ROUTES (the setting was renamed in version 4).
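For reference, here is a minimal sketch of the same routing expressed with Celery 4's lowercase setting names (the old uppercase names still work but are deprecated); the queue and task names are reused from the question:
# Celery 4+ lowercase settings; a sketch reusing the queue/task names above
from kombu import Exchange, Queue

task_queues = (
    Queue('high', Exchange('high'), routing_key='high'),
    Queue('normal', Exchange('normal'), routing_key='normal'),
    Queue('low', Exchange('low'), routing_key='low'),
)
task_default_queue = 'normal'
task_default_exchange = 'normal'
task_default_routing_key = 'normal'
task_routes = {
    'myapp.tasks.test': {'queue': 'high'},
}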
First, make sure the connection is established in both the RabbitMQ logs and the high worker's logs.
Then, try to change your CELERY_ROUTES to:
CELERY_ROUTES = {
    'myapp.tasks.test': {
        'exchange': 'high',
        'exchange_type': 'direct',
        'routing_key': 'high'
    }
}
or call the task with an explicit queue, for example:
test_task = test.signature(args=(1, 2, 3), queue='high', immutable=True)
test_task.apply_async()
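Equivalently, you can pass the queue directly to apply_async; a minimal sketch using the task from the question:
# Route this one call to the 'high' queue without building a signature
test.apply_async(args=(1, 2, 3), queue='high')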
I'm new to Celery; I was following tutorials and set it up with Docker.
I'm having an issue with sending and executing Celery tasks.
I have 4 Docker containers: one for the RabbitMQ server, one for the Celery producer, and 2 for workers.
Celery tasks file:
"""
CELERY MAIN FILE
"""
from celery import Celery
from time import sleep
celery_obj = Celery()
celery_obj.config_from_object('celery_config') #config file we created in same folder
@celery_obj.task
def add(num1, num2):
    print("executing add function")
    sleep(5)
    return num1 + num2
My celery config file for Producer:
"""
CELERY CONFIGURATION FILE
"""
from kombu import Exchange, Queue
broker_url = "pyamqp://rabbitmq_user:123@172.17.0.2/res_opt_rabbitmq_vhost"
result_backend = 'rpc://'
#celery_result_backend = ""
celery_imports = ('res_opt_code.tasks')
task_queues = (
    Queue('worker_A_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_A_rabbitmq_queue'),
    Queue('worker_B_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_B_rabbitmq_queue')
)
Config file for worker_A:
"""
CELERY CONFIGURATION FILE
"""
from kombu import Exchange, Queue
broker_url = "pyamqp://rabbitmq_user:123@172.17.0.2/res_opt_rabbitmq_vhost"
result_backend = 'rpc://'
#celery_result_backend = ""
celery_imports = ('worker_code.tasks')
task_queues = (
    Queue('worker_A_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_A_rabbitmq_queue'),
    Queue('worker_B_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_B_rabbitmq_queue')
)
Command for starting celery on producer:
celery -A tasks worker --loglevel=DEBUG -f log_file.txt
Command for starting Celery on the worker:
celery -A tasks worker -n celery_worker_A -Q worker_A_kombu_queue --loglevel=DEBUG
Function call from producer:
from tasks import add
add.apply_async([4,4],routing_key='worker_A_rabbitmq_queue')
# also tried calling the task locally, but there are no logs for the function and the result stays PENDING
add.delay(4,4)
Could you please help me figure out what I'm doing wrong here?
In the logs I can see worker_A connected, but there are no logs for the function.
I did some further troubleshooting and changed the argument in apply_async from routing_key to queue, and it works with the queue argument.
I was following this tutorial:
https://www.youtube.com/watch?v=TM1a3m65zaA
old:
add.apply_async([4,4],routing_key='worker_A_rabbitmq_queue')
new:
add.apply_async([4,4],queue='worker_A_rabbitmq_queue')
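If you'd rather not pass queue= on every call, a hedged alternative is to route the task in the Celery config; the task path 'tasks.add' below is an assumption based on the tasks.py shown above:
# Assumed route config (new-style lowercase setting); with this in place,
# add.delay(4, 4) would be published to worker_A_kombu_queue automatically
task_routes = {
    'tasks.add': {'queue': 'worker_A_kombu_queue'},
}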
I run celery:
celery multi start --app=myapp fast_worker slow_worker \
    -Q:fast_worker fast-queue \
    -Q:slow_worker slow-queue \
    -c:fast_worker 1 -c:slow_worker 1 \
    --logfile=%n.log --pidfile=%n.pid
And celerybeat:
celery beat -A myapp
Task:
@task.periodic_task(run_every=timedelta(seconds=5), ignore_result=True)
def test_log_task_queue():
    import time
    time.sleep(10)
    print "test_log_task_queue"
Routing:
CELERY_ROUTES = {
    'myapp.tasks.test_log_task_queue': {
        'queue': 'slow-queue',
        'routing_key': 'slow-queue',
    },
}
I use RabbitMQ. When I open the RabbitMQ admin panel, I see that my tasks are in slow-queue, but in the logs I see task output from both workers. Why do both workers execute my tasks, even when the task is not in the worker's queue?
It looks like celery multi creates something like shared queues. To fix this problem, I added the -X option:
celery multi start --app=myapp fast_worker slow_worker \
    -Q:fast_worker fast-queue \
    -Q:slow_worker slow-queue \
    -X:fast_worker slow-queue \
    -X:slow_worker fast-queue \
    -c:fast_worker 1 -c:slow_worker 1 \
    --logfile=%n.log --pidfile=%n.pid
I am trying to schedule a task that runs every 10 minutes using Django 1.9.8, Celery 4.0.2, RabbitMQ 2.1.4, Redis 2.10.5. These are all running within Docker containers in Linux (Fedora 25). I have tried many combinations of things that I found in Celery docs and from this site. The only combination that has worked thus far is below. However, it only runs the periodic task initially when the application starts, but the schedule is ignored thereafter. I have absolutely confirmed that the scheduled task does not run again after the initial time.
My (almost-working) setup that only runs one-time:
settings.py:
INSTALLED_APPS = (
    ...
    'django_celery_beat',
    ...
)
BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
    user=os.environ['RABBIT_USER'],
    password=os.environ['RABBIT_PASS'],
    hostname=RABBIT_HOSTNAME,
    vhost=os.environ.get('RABBIT_ENV_VHOST', ''))

# We don't want to have dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
    BROKER_URL += BROKER_HEARTBEAT
BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10
# Celery configuration
# configure queues, currently we have only one
CELERY_DEFAULT_QUEUE = 'default'
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
)
# Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False
# By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600
# Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1
# Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json']
CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000
celeryconf.py
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "web_portal.settings")
app = Celery('web_portal')
CELERY_TIMEZONE = 'UTC'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
tasks.py
from celery.schedules import crontab
from .celeryconf import app as celery_app
@celery_app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls email_scanner every 10 minutes
    sender.add_periodic_task(
        crontab(hour='*',
                minute='*/10',
                second='*',
                day_of_week='*',
                day_of_month='*'),
        email_scanner.delay(),
    )
@app.task
def email_scanner():
    dispatch_list = scanning.email_scan()
    for dispatch in dispatch_list:
        validate_dispatch.delay(dispatch)
    return
run_celery.sh -- Used to start celery tasks from docker-compose.yml
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
cd web_portal
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery beat -l info --pidfile=/tmp/celerybeat-web_portal.pid -s /tmp/celerybeat-schedule &"
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h"
I have also tried using a CELERYBEAT_SCHEDULE in settings.py in lieu of the @celery_app.on_after_finalize.connect decorator and block in tasks.py, but the scheduler never ran even once.
settings.py (not working at all scenario)
(same as before except also including the following)
CELERYBEAT_SCHEDULE = {
    'email-scanner-every-5-minutes': {
        'task': 'tasks.email_scanner',
        'schedule': timedelta(minutes=10)
    },
}
The Celery 4.0.2 documentation online presumes that I should instinctively know many givens, but I am new in this environment. If anybody knows where I can find a tutorial OTHER THAN docs.celeryproject.org and http://django-celery-beat.readthedocs.io/en/latest/ which both assume that I am already a Django master, I would be grateful. Or let me know of course if you see something obviously wrong in my setup. Thanks!
I found a solution that works. I could not get CELERYBEAT_SCHEDULE or the Celery task decorators to work, and I suspect that it may be at least partially due to the manner in which I started the Celery beat task.
The working solution goes the whole nine yards and uses the Django database scheduler. I downloaded the GitHub project https://github.com/celery/django-celery-beat and incorporated all of the code as another "app" in my project. This enabled Django-Admin access to maintain the cron / interval / periodic task tables via a browser. I also modified my run_celery.sh as follows:
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
celery beat -A web_portal.celeryconf -l info --pidfile=/tmp/celerybeat-web_portal.pid -S django --detach
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h -l info"
After adding a scheduled task via the django-admin web interface, the scheduler started working fine.
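For completeness, the same schedule can also be created in code instead of through django-admin; this is a sketch using django-celery-beat's models, and the task path 'tasks.email_scanner' is an assumption based on the tasks.py above:
# Sketch: register the periodic task programmatically with django-celery-beat
from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10,
    period=IntervalSchedule.MINUTES,
)
PeriodicTask.objects.get_or_create(
    interval=schedule,
    name='Email scanner every 10 minutes',  # hypothetical display name
    task='tasks.email_scanner',             # assumed registered task name
)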
I am new to Celery and SQS, and would like to use it to periodically check messages stored in SQS and then fire a consumer. The consumer and Celery both live on EC2, while the messages are sent from GAE using boto library.
Currently, I am confused about:
In the message body of creating_msg_gae.py, what task information should I put? I assume this would be the name of my Celery task?
In the message body of creating_msg_gae.py, is url the argument to be processed by my consumer (the function do_something_url(url) in tasks.py)?
Currently, I am running Celery with the command celery worker -A celery_c -l info; from the command line it looks like Celery checks SQS periodically. Do I need to create a PeriodicTask in Celery instead?
I really appreciate any suggestions to help me with this issue.
creating_msg_gae.py
import base64
import json

from boto import sqs

conn = sqs.connect_to_region("us-east-1",
                             aws_access_key_id='aaa',
                             aws_secret_access_key='bbb')
my_queue = conn.get_queue('uber_batch')
msg = {'properties': {'content_type': 'application/json',
                      'content_encoding': 'utf-8',
                      'body_encoding': 'base64',
                      'delivery_tag': None,
                      'delivery_info': {'exchange': None, 'routing_key': None}},}
body = {'id': 'theid',
        ###########Question 1#######
        'task': 'what task name I should put here?',
        'url': ['my_s3_address']}
msg.update({'body':base64.encodestring(json.dumps(body))})
my_queue.write(my_queue.new_message(json.dumps(msg)))
My Celery file system looks like:
./ce_folder/
celery_c.py, celeryconfig.py, tasks.py, __init__.py
celeryconfig.py
import os
BROKER_BACKEND = "SQS"
AWS_ACCESS_KEY_ID = 'aaa'
AWS_SECRET_ACCESS_KEY = 'bbb'
os.environ.setdefault("AWS_ACCESS_KEY_ID", AWS_ACCESS_KEY_ID)
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", AWS_SECRET_ACCESS_KEY)
BROKER_URL = 'sqs://'
BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',
    'visibility_timeout': 60,
    'polling_interval': 30,
}
CELERY_DEFAULT_QUEUE = 'uber_batch'
CELERY_DEFAULT_EXCHANGE = CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_EXCHANGE_TYPE = CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_ROUTING_KEY = CELERY_DEFAULT_QUEUE
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}
celery_c.py
from __future__ import absolute_import
from celery import Celery
app = Celery('uber')
app.config_from_object('celeryconfig')
if __name__ == '__main__':
    app.start()
tasks.py
from __future__ import absolute_import
from celery_c import app
@app.task
def do_something_url(url):
    # download file from url
    # do some calculations
    # upload result files to s3 and return the result url
    return result_url
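Regarding question 1: by default Celery registers a task under '<module>.<function_name>', so the task above would normally be named 'tasks.do_something_url' (assuming tasks.py is importable as a top-level module called tasks). A quick sketch to check the registered name:
# Print the name Celery registered for the task; this is the value that
# belongs in the message's 'task' field
from tasks import do_something_url
print(do_something_url.name)  # e.g. 'tasks.do_something_url'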
I'm a bit confused on what my configuration should look like to set up a topic exchange.
http://www.rabbitmq.com/tutorials/tutorial-five-python.html
This is what I'd like to accomplish:
Task1 -> send to QueueOne and QueueFirehose
Task2 -> send to QueueTwo and QueueFirehose
then:
Task1 -> consume from QueueOne
Task2 -> consume from QueueTwo
TaskFirehose -> consume from QueueFirehose
I only want Task1 to consume from QueueOne and Task2 to consume from QueueTwo.
The problem now is that when Task1 and Task2 run, they also drain QueueFirehose, and the TaskFirehose task never executes.
Is there something wrong with my config, or am I misunderstanding something?
CELERY_QUEUES = {
    "QueueOne": {
        "exchange_type": "topic",
        "binding_key": "pipeline.one",
    },
    "QueueTwo": {
        "exchange_type": "topic",
        "binding_key": "pipeline.two",
    },
    "QueueFirehose": {
        "exchange_type": "topic",
        "binding_key": "pipeline.#",
    },
}
CELERY_ROUTES = {
    "tasks.task1": {
        "queue": "QueueOne",
        "routing_key": "pipeline.one",
    },
    "tasks.task2": {
        "queue": "QueueTwo",
        "routing_key": "pipeline.two",
    },
    "tasks.firehose": {
        "queue": "QueueFirehose",
        "routing_key": "pipeline.#",
    },
}
Assuming that you actually meant something like this:
Task1 -> send to QueueOne
Task2 -> send to QueueTwo
TaskFirehose -> send to QueueFirehose
then:
Worker1 -> consume from QueueOne, QueueFirehose
Worker2 -> consume from QueueTwo, QueueFirehose
WorkerFirehose -> consume from QueueFirehose
This might not be exactly what you meant, but I think it should cover many scenarios, and hopefully yours too.
Something like this should work:
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * The rest processes the 'default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data \
    -Q default -L:4,5 DEBUG
For more options and reference: http://celery.readthedocs.org/en/latest/reference/celery.bin.multi.html
This was straight from the documentation.
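If you also want the queue and exchange bindings declared explicitly in code rather than left to the defaults, a sketch of matching kombu declarations (the exchange name 'pipeline' is an assumption) might look like:
# Sketch: topic-exchange bindings matching the binding keys from the question
from kombu import Exchange, Queue

pipeline_exchange = Exchange('pipeline', type='topic')

CELERY_QUEUES = (
    Queue('QueueOne', pipeline_exchange, routing_key='pipeline.one'),
    Queue('QueueTwo', pipeline_exchange, routing_key='pipeline.two'),
    Queue('QueueFirehose', pipeline_exchange, routing_key='pipeline.#'),
)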
I too had a similar situation, and I tackled it in a slightly different way. I couldn't use celery multi with supervisord.
So instead I created multiple programs in supervisord, one for each worker. The workers will be in different processes anyway, so just let supervisord take care of everything for you.
The config file looks something like this:
; ==================================
; celery worker supervisor example
; ==================================
[program:Worker1]
; Set full path to celery program if using virtualenv
command=celery worker -A proj --loglevel=INFO -Q QueueOne,QueueFirehose
directory=/path/to/project
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/worker1.log
stderr_logfile=/var/log/celery/worker1.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
Similarly, for Worker2 and WorkerFirehose, edit the corresponding lines to make:
[program:Worker2]
; Set full path to celery program if using virtualenv
command=celery worker -A proj --loglevel=INFO -Q QueueTwo,QueueFirehose
and
[program:WorkerFirehose]
; Set full path to celery program if using virtualenv
command=celery worker -A proj --loglevel=INFO -Q QueueFirehose
Include them all in the supervisord.conf file, and that should do it.