Multiple Python-Celery Scripts conflict and doesn't execute - python

I have two different python scripts in different directories that has celery schedulers.
Script 1:
import requests
from celery import Celery
from celery.schedules import crontab
import subprocess
celery = Celery()
celery.conf.enable_utc = False
#celery.task()
def proxy():
response = requests.get(url="XYZ")
proxies = response.text
paid_proxies = open("paid_proxies.txt", "w+")
paid_proxies.write(proxies.strip())
paid_proxies.close()
celery.conf.beat_schedule = {
"proxy-api": {
"task": "scheduler1.proxy",
"schedule": crontab(minute="*/5")
}
}
Commands that I use for executing it:
celery beat -A scheduler1.celery
celery worker -A scheduler1.celery
Script 2:
from celery import Celery
from celery.schedules import crontab
import subprocess
celery = Celery()
celery.conf.enable_utc = False
#celery.task()
def daily():
subprocess.run(["python3", "cross_validation.py"])
celery.conf.beat_schedule = {
"daily-scraper": {
"task": "scheduler2.daily",
"schedule": crontab(day_of_week="*", hour=15, minute=23)
}
}
Commands that I use for executing it:
celery beat -A scheduler2.celery
celery worker -A scheduler2.celery
The issues is when I execute Script 1, it works perfectly. But when I try to execute Script 2, I get this error as Scheduler2 tries to execute tasks of scheduler1:
[2019-09-14 15:10:00,127: ERROR/MainProcess] Received unregistered task of type 'scheduler1.proxy'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you're using relative imports?
Please see
http://docs.celeryq.org/en/latest/internals/protocol.html
for more information.
The full contents of the message body was:
'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)
Traceback (most recent call last):
File "/home/PycharmProjects/data_scraping/venv/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 559, in on_task_received
strategy = strategies[type_]
KeyError: 'scheduler1.proxy'
I tried referring multiple answers but nothing worked.

The problem that you are seeing is that celery is using the same "broker" in both project 1 and project 2. In order to use two different celery projects simultaneously, all you have to do is give them different brokers. You can specify a broker using the broker_url setting.
We typically use redis as a broker, so it is very simple to put one project on redis db 0 and the other project on redis db 1. That said, there is a lot of thinking that normally goes into which broker to use, and deciding on a broker is outside the scope of this particular question.

Related

How to send a task with schedule to celery while worker is running?

I've problem to creating and adding beat schedule task to the celery worker and running that when the celery worker is currently running
well this is my code
from celery import Celery
from celery.schedules import crontab
app = Celery('proj2')
app.config_from_object('proj2.celeryconfig')
#app.task
def hello() -> None:
print(f'Hello')
app.add_periodic_task(10.0, hello)
so when i am running the celery
$ celery -A proj2 worker --beat -E
this works very well and the task will run every 10 seconds
But
I need to create a task with schedule and sending that to the worker to run it automatically
Imagine i have run the worker celery and every things is ok
well i go to the python shell and do following code
>>> from proj2.celery import app
>>>
>>> app.conf.beat_schedule = {
... 'hello-every-5': {
... 'tasks': 'proj2.tasks.bye' # i have a task called bye in tasks module
... 'schedule': 5.0,
... }
...}
>>>
it's not working. Also there is no error
but it seems does not send to the worker
p.s: i have also used add_periodic_task method. still not working

Celery task_routes and task_create_missing_queues note working

I have configured a CentOS server with (server A) :
Redis 6.2.6
Python 3.8.1
Celery 5.2.1
Flower
On another server (server B) I send tasks to redis that are executed by a worker on server A.
I want to try to implement multiple queues and change de name of the default queue : queues "high", "normal" and "low", with "normal" beeing the default queue (instead of celery queue).
If I hardcode the queue in the task, it works (they are send to the right queue).
But if I try to do it with "task_routes", it doesn't.
I tried to:
hardcode route in this settings
implement a routing function
implement a routing class with the routing function
But it never works. Moreover, even though I put the "task_create_missing_queue" parameter to False, all tasks are send to celery queue (the default queue is still celery and not normal...).
Any ideas ?
Here is the code for my celeryconfig.py file (with routing function:
from celery import Celery
from kombu import Exchange, Queue
from celery.exceptions import Reject
import re
task_create_missing_queues = False
task_queues = (
Queue('high', Exchange('high'), routing_key='high'),
Queue('normal', Exchange('normal'), routing_key='normal'),
Queue('low', Exchange('low'), routing_key='low')
)
task_default_queue = 'normal'
task_default_exchange = 'normal'
task_default_routing_key = 'normal'
def route_tasks (name, args, kwargs, options, task=None, **kw):
if ':' not in name:
return {'queue': 'normal'}
namespace, _ = name.split(':')
return {'queue': namespace}
task_routes = (route_tasks,)
And my tasks are defined with a name like this with the decorator (when I hardcode the queue for each task, it is there that I add "queue="queue_name") (my celery app is called celery):
#celery.task(bind=True, name="high:long_task")
I can see in flower that the configuration is well taken into account.
My worker is started with multi and -Q high,normal,low.
Many thanks!

Reload celery beat config

I'm using celery and celery-beat without Django and I have a task which needs to modify celery-beat schedule when run.
Now I have the following code (module called celery_tasks):
# __init__.py
from .celery import app as celery_app
__all__ = ['celery_app']
#celery.py
from celery import Celery
import config
celery_config = config.get_celery_config()
app = Celery(
__name__,
include=[
'celery_tasks.tasks',
],
)
app.conf.update(celery_config)
# tasks.py
from celery_tasks import celery_app
from celery import shared_task
#shared_task
def start_game():
celery_app.conf.beat_schedule = {
'process_round': {
'task': 'celery_tasks.tasks.process_round',
'schedule': 5,
},
}
I start celery with the following command:
celery worker -A celery_tasks -E -l info --beat
start_game executes and exists normally, but beat process_round task never runs.
How can I force-reload beat schedule (restarting all workers doesn't seem as a good idea)?
the problem with normal celery backend when you start the celerybeat process. it will create a config file and write all tasks and schedules in to that file so it cannot change dynamically
you can use the package
celerybeat-sqlalchemy-scheduler so you can edit schedule on DB itself so that celerybeat will pickup the new schedule from DB itself
also there is another package celery-redbeat which using redis-server as backend
you can refer this this also
Using schedule config also seems bad idea. What if initially process_round task will be active and check if game is not started task just do nothing.

Django celerybeat periodic task only runs once

I am trying to schedule a task that runs every 10 minutes using Django 1.9.8, Celery 4.0.2, RabbitMQ 2.1.4, Redis 2.10.5. These are all running within Docker containers in Linux (Fedora 25). I have tried many combinations of things that I found in Celery docs and from this site. The only combination that has worked thus far is below. However, it only runs the periodic task initially when the application starts, but the schedule is ignored thereafter. I have absolutely confirmed that the scheduled task does not run again after the initial time.
My (almost-working) setup that only runs one-time:
settings.py:
INSTALLED_APPS = (
...
'django_celery_beat',
...
)
BROKER_URL = 'amqp://{user}:{password}#{hostname}/{vhost}/'.format(
user=os.environ['RABBIT_USER'],
password=os.environ['RABBIT_PASS'],
hostname=RABBIT_HOSTNAME,
vhost=os.environ.get('RABBIT_ENV_VHOST', '')
# We don't want to have dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
BROKER_URL += BROKER_HEARTBEAT
BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10
# Celery configuration
# configure queues, currently we have only one
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
)
# Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False
# By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600
# Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1
# Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json']
CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000
celeryconf.py
coding=UTF8
from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "web_portal.settings")
app = Celery('web_portal')
CELERY_TIMEZONE = 'UTC'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
tasks.py
from celery.schedules import crontab
from .celeryconf import app as celery_app
#celery_app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
# Calls email_scanner every 10 minutes
sender.add_periodic_task(
crontab(hour='*',
minute='*/10',
second='*',
day_of_week='*',
day_of_month='*'),
email_scanner.delay(),
)
#app.task
def email_scanner():
dispatch_list = scanning.email_scan()
for dispatch in dispatch_list:
validate_dispatch.delay(dispatch)
return
run_celery.sh -- Used to start celery tasks from docker-compose.yml
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
cd web_portal
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery beat -l info --pidfile=/tmp/celerybeat-web_portal.pid -s /tmp/celerybeat-schedule &"
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default#%h"
I have also tried using a CELERYBEAT_SCHEDULER in the settings.py in lieu of the #celery_app.on_after finalize_connect decorator and block in tasks.py, but the scheduler never ran even once.
settings.py (not working at all scenario)
(same as before except also including the following)
CELERYBEAT_SCHEDULE = {
'email-scanner-every-5-minutes': {
'task': 'tasks.email_scanner',
'schedule': timedelta(minutes=10)
},
}
The Celery 4.0.2 documentation online presumes that I should instinctively know many givens, but I am new in this environment. If anybody knows where I can find a tutorial OTHER THAN docs.celeryproject.org and http://django-celery-beat.readthedocs.io/en/latest/ which both assume that I am already a Django master, I would be grateful. Or let me know of course if you see something obviously wrong in my setup. Thanks!
I found a solution that works. I could not get CELERYBEAT_SCHEDULE or the celery task decorators to work, and I suspect that it may be at least partially due with the manner in which I started the Celery beat task.
The working solution goes the whole 9 yards to utilize Django Database Scheduler. I downloaded the GitHub project "https://github.com/celery/django-celery-beat" and incorporated all of the code as another "app" in my project. This enabled Django-Admin access to maintain the cron / interval / periodic task(s) tables via a browser. I also modified my run_celery.sh as follows:
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
celery beat -A web_portal.celeryconf -l info --pidfile=/tmp/celerybeat- web_portal.pid -S django --detach
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default#%h -l info "
After adding a scheduled task via the django-admin web interface, the scheduler started working fine.

Setting up periodic tasks in Celery (celerybeat) dynamically using add_periodic_task

I'm using Celery 4.0.1 with Django 1.10 and I have troubles scheduling tasks (running a task works fine). Here is the celery configuration:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.BROKER_URL = 'amqp://{}:{}#{}'.format(settings.AMQP_USER, settings.AMQP_PASSWORD, settings.AMQP_HOST)
app.conf.CELERY_DEFAULT_EXCHANGE = 'myapp.celery'
app.conf.CELERY_DEFAULT_QUEUE = 'myapp.celery_default'
app.conf.CELERY_TASK_SERIALIZER = 'json'
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_IGNORE_RESULT = True
app.conf.CELERY_DISABLE_RATE_LIMITS = True
app.conf.BROKER_POOL_LIMIT = 2
app.conf.CELERY_QUEUES = (
Queue('myapp.celery_default'),
Queue('myapp.queue1'),
Queue('myapp.queue2'),
Queue('myapp.queue3'),
)
Then in tasks.py I have:
#app.task(queue='myapp.queue1')
def my_task(some_id):
print("Doing something with", some_id)
In views.py I want to schedule this task:
def my_view(request, id):
app.add_periodic_task(10, my_task.s(id))
Then I execute the commands:
sudo systemctl start rabbitmq.service
celery -A myapp.celery_app beat -l debug
celery worker -A myapp.celery_app
But the task is never scheduled. I don't see anything in the logs. The task is working because if in my view I do:
def my_view(request, id):
my_task.delay(id)
The task is executed.
If in my configuration file if I schedule the task manually, like this it works:
app.conf.CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.my_task',
'schedule': 10.0,
'args': (66,)
},
}
I just can't schedule the task dynamically. Any idea?
EDIT: (13/01/2018)
The latest release 4.1.0 have addressed the subject in this ticket #3958 and has been merged
Actually you can't not define periodic task at the view level, because the beat schedule setting will be loaded first and can not be rescheduled at runtime:
The add_periodic_task() function will add the entry to the beat_schedule setting behind the scenes, and the same setting can also can be used to set up periodic tasks manually:
app.conf.CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.my_task',
'schedule': 10.0,
'args': (66,)
},
}
which means if you want to use add_periodic_task() it should be wrapped within an on_after_configure handler at the celery app level and any modification on runtime will not take effect:
app = Celery()
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
sender.add_periodic_task(10, my_task.s(66))
As mentioned in the doc the the regular celerybeat simply keep track of task execution:
The default scheduler is the celery.beat.PersistentScheduler, that simply keeps track of the last run times in a local shelve database file.
In order to be able to dynamically manage periodic tasks and reschedule celerybeat at runtime:
There’s also the django-celery-beat extension that stores the schedule in the Django database, and presents a convenient admin interface to manage periodic tasks at runtime.
The tasks will be persisted in django database and the scheduler could be updated in task model at the db level. Whenever you update a periodic task a counter in this tasks table will be incremented, and tells the celery beat service to reload the schedule from the database.
A possible solution for you could be as follow:
from django_celery_beat.models import PeriodicTask, IntervalSchedule
schedule= IntervalSchedule.objects.create(every=10, period=IntervalSchedule.SECONDS)
task = PeriodicTask.objects.create(interval=schedule, name='any name', task='tasks.my_task', args=json.dumps([66]))
views.py
def update_task_view(request, id)
task = PeriodicTask.objects.get(name="task name") # if we suppose names are unique
task.args=json.dumps([id])
task.save()

Categories