In the Django admin panel I created about 1500 celery-beat periodic tasks, and 100 of them have the same crontab schedule 15 4 * 1 * (m/h/d/dM/MY) UTC (Minute(s)/Hour(s)/Day(s) Of The Week/Day(s) Of The Month/Month(s) Of The Year). All of them were enabled.
But at 04:15 on day-of-month 1, one of the periodic tasks did not send its due task to the Celery worker, while all the other tasks did. In the Django admin panel this periodic task's last_run_time is None, which indicates that it was never triggered.
I tried configuring the crontab schedule to 15 * * * * (m/h/d/dM/MY) UTC and then it ran successfully at minute 15. So I wonder: is there any limit on the number of celery-beat periodic tasks?
celery.py
import os

from celery import Celery, signals
from celery.schedules import crontab
from django.conf import settings
from django.utils import timezone

app = Celery('myapp', broker=os.getenv('BROKER_URL', None))

@signals.setup_logging.connect
def setup_logging(**kwargs):
    """Set up logging (left as a no-op so Celery does not override Django's logging config)."""
    pass

# celery_once configuration
app.conf.ONCE = {
    'backend': 'celery_once.backends.Redis',
    'settings': {
        'url': 'redis://localhost:6379/0',
        'default_timeout': 60 * 60
    }
}

app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

app.conf.update(
    CELERY_ALWAYS_EAGER=bool(os.getenv('CELERY_ALWAYS_EAGER', False)),
    CELERY_DISABLE_RATE_LIMITS=True,
    # CELERYD_MAX_TASKS_PER_CHILD=5,
    CELERY_TASK_RESULT_EXPIRES=3600,
    # Uncomment below if you want to use the django-celery backend
    # CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
    CELERY_ACCEPT_CONTENT=['json'],
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    # Store the periodic task information in the django-celery-beat database scheduler
    CELERYBEAT_SCHEDULER="django_celery_beat.schedulers.DatabaseScheduler",
    # periodic tasks setup
    CELERYBEAT_SCHEDULE={
        'check_alerts_task': {
            'task': 'devices.tasks.alert_task',
            'schedule': 300.0  # run every 5 minutes
        },
        'weather_request': {
            'task': 'organizations.tasks.weather_request',
            'schedule': 60.0  # run every minute
        },
        'update_crontab': {
            'task': 'devices.tasks.update_crontab',
            'schedule': crontab(hour=0, minute=0)
        },
        'export_data_cleanup': {
            'task': 'devices.tasks.delete_expiry_export_data',
            'schedule': crontab(minute=0)
        }
    }
)
celery.log
[01/Jun/2020 04:15:09] INFO [celery.beat:271] Scheduler: Sending due task Report_first (reportAPI.tasks.report)
Your crontab is actually wrong. The field order is (minute hour day-of-month month day-of-week), so 15 4 * 1 * means 04:15 every day in the 1st month (January). What you want for the 1st day of every month is 15 4 1 * *.
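If you define the same schedule in code, the equivalent celery crontab object for "04:15 UTC on the first day of every month" would be something like:

from celery.schedules import crontab

# 04:15 on day-of-month 1, every month
schedule = crontab(minute=15, hour=4, day_of_month=1)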
I am working with a Celery worker, with Redis as the broker and as the backend.
Steps:
The add task picks the data from its queue, processes it, and the result is sent to the read queue.
The read task reads the data from the read queue and prints the result.
Before it can be picked up by the second task, the message gets deleted, with the warning below:
[2023-02-02 16:41:29,992: WARNING/MainProcess] Received and deleted unknown message. Wrong destination?!?
The full contents of the message body was: body: {'task_id': 'e54849dc-3bc7-409c-afab-5c69b3310d99', 'status': 'SUCCESS', 'result': 7, 'traceback': None, 'children': []} (120b)
{content_type:'application/json' content_encoding:'utf-8'
delivery_info:{'exchange': '', 'routing_key': 'read'} headers={}}
Code snippet:
from kombu import Queue, Exchange
from celery import Celery

celery = Celery('tasks', broker="redis://localhost:6379/0", backend="rpc://")

CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'direct'
CELERY_DEFAULT_ROUTING_KEY = 'default'

celery.conf.update(
    CELERY_ROUTES={
        "celery_worker.celery_worker.add": {
            "queue": "add",
            "routing_key": "add"
        },
        "celery_worker.celery_worker.read": {
            "queue": "read",
            "routing_key": "read"
        }
    },
    CELERY_QUEUES=(
        Queue(CELERY_DEFAULT_QUEUE, Exchange(CELERY_DEFAULT_EXCHANGE),
              routing_key=CELERY_DEFAULT_ROUTING_KEY),
        Queue("add", Exchange(CELERY_DEFAULT_EXCHANGE),
              routing_key="add"),
        Queue("read", Exchange(CELERY_DEFAULT_EXCHANGE),
              routing_key="read"),
    ),
    CELERY_CREATE_MISSING_QUEUES=True,
    CELERYD_PREFETCH_MULTIPLIER=1)

@celery.task(name='add', acks_late=True)
def add(x, y):
    print(f"Order Complete!{x}, {y}")
    return x + y

@celery.task(name='read', acks_late=True, queue="read", routing_key="read")
def read(data):
    print("data")
    print(f"Order Completedread!{data}")
    return data

task_1 = add.apply_async((4, 3), queue='add', routing_key="add", reply_to="read")
Please help me resolve this issue.
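For reference, a minimal sketch of the described add → read hand-off that uses an explicit callback (link) instead of reply_to, assuming the same tasks and queues as above. With reply_to, the result of add is published to the read queue as a result message rather than a task message, which matches the result payload shown in the warning; a callback signature delivers it as a task argument instead:

# Sketch: chain the two tasks so the result of add() becomes the argument of read()
task_1 = add.apply_async(
    (4, 3),
    queue='add',
    routing_key='add',
    link=read.s().set(queue='read', routing_key='read'),
)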
Here's my DAG:
I am trying to spin up an EMR cluster and use its cluster id when adding steps.
The use case is: spin up a cluster and save its cluster id somewhere in S3.
But the xcom_pull is not working; the XCom value comes back as None.
from datetime import timedelta

from airflow import DAG
from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
from airflow.providers.amazon.aws.operators.emr_create_job_flow import EmrCreateJobFlowOperator
from airflow.providers.amazon.aws.sensors.emr_job_flow import EmrJobFlowSensor
from airflow.utils.dates import days_ago

SPARK_STEPS = [
    {
        'Name': 'emr_spin',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['/usr/lib/spark/bin/run-example', 'SparkPi', '10'],
        },
    }
]

JOB_FLOW_OVERRIDES = {
    'Name': 'airflow_trial',
    'ReleaseLabel': 'emr-5.29.0',
    'Applications': [{'Name': 'Spark'}],
    'Instances': {
        'InstanceGroups': [
            {
                'Name': "Master",
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 1,
            }
        ],
        'KeepJobFlowAliveWhenNoSteps': False,
        'TerminationProtected': False,
    },
    'Steps': [],
    'JobFlowRole': 'EMR_EC2_DefaultRole',
    'ServiceRole': 'EMR_DefaultRole',
}

with DAG(
    dag_id='xcom',
    default_args={
        'owner': 'airflow'
    },
    dagrun_timeout=timedelta(hours=2),
    start_date=days_ago(2),
    schedule_interval='0 3 * * *',
    tags=['example'],
) as dag:
    # [START howto_operator_emr_automatic_steps_tasks]
    job_flow_creator = EmrCreateJobFlowOperator(
        task_id='create_job_flow',
        job_flow_overrides=JOB_FLOW_OVERRIDES,
        aws_conn_id='aws_default',
        emr_conn_id='emr_default',
    )

    step_adder = EmrAddStepsOperator(
        task_id='add_steps',
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_job_flow', key='return_value') }}",
        aws_conn_id='aws_default',
        steps=SPARK_STEPS,
    )
Output:
XCom
Key Value
job_flow_id None
Please help. I want to save the cluster id in JSON, but I'm not getting it using XCom.
There are various ways to pass the job_flow_id; please try them and let me know the outcome.
First, with XCom, try using xcom_pull simply as below:
step_adder = EmrAddStepsOperator(
    task_id='add_steps',
    job_flow_id="{{ task_instance.xcom_pull('create_job_flow', key='return_value') }}",
    aws_conn_id='aws_default',
    steps=SPARK_STEPS,
)
Second, with the output attribute of the EmrCreateJobFlowOperator:
step_adder = EmrAddStepsOperator(
    task_id='add_steps',
    job_flow_id=job_flow_creator.output,
    aws_conn_id='aws_default',
    steps=SPARK_STEPS,
)
I'm not sure, but also try job_flow_creator alone, without .output.
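One extra thing worth checking, whichever variant you use: the DAG above never declares an ordering between the two tasks, so add_steps can render its template before create_job_flow has pushed anything to XCom, which would also leave job_flow_id as None. A minimal sketch, assuming the same task objects:

# Inside the same `with DAG(...) as dag:` block. With the Jinja xcom_pull
# variant the dependency must be declared by hand; with job_flow_creator.output
# (an Airflow 2.x XComArg) the dependency is created automatically.
job_flow_creator >> step_adder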
I have the following problem: I'm using a Python process that must wait X number of seconds. The process by itself works correctly; the problem appears when I run it as a Celery task.
When the worker runs time.sleep(X) in one task, it pauses all the other tasks in the worker. For example:
I have worker A, which can run 4 tasks at the same time (q, w, e and r). Task r has a sleep of 1800 seconds. The worker runs the 4 tasks at the same time, but when task r hits the sleep, the worker stops q, w and e too.
Is this normal? Do you know how I can solve this problem?
EDIT:
This is an example of my celery.py with my beat schedule and queues:
app.conf.update(
    CELERY_DEFAULT_QUEUE='default',
    CELERY_QUEUES=(
        Queue('search', routing_key='search.#'),
        Queue('tests', routing_key='tests.#'),
        Queue('default', routing_key='tasks.#'),
    ),
    CELERY_DEFAULT_EXCHANGE='tasks',
    CELERY_DEFAULT_EXCHANGE_TYPE='topic',
    CELERY_DEFAULT_ROUTING_KEY='tasks.default',
    CELERY_TASK_RESULT_EXPIRES=10,
    CELERYD_TASK_SOFT_TIME_LIMIT=1800,
    CELERY_ROUTES={
        'tests.tasks.volume': {
            'queue': 'tests',
            'routing_key': 'tests.volume',
        },
        'tests.tasks.summary': {
            'queue': 'tests',
            'routing_key': 'tests.summary',
        },
        'search.tasks.links': {
            'queue': 'search',
            'routing_key': 'search.links',
        },
        'search.tasks.urls': {
            'queue': 'search',
            'routing_key': 'search.urls',
        },
    },
    CELERYBEAT_SCHEDULE={
        # heavy one
        'each-hour-summary': {
            'task': 'tests.tasks.summary',
            'schedule': crontab(minute='0', hour='*/1'),
            'args': (),
        },
        'each-hour-volume': {
            'task': 'tests.tasks.volume',
            'schedule': crontab(minute='0', hour='*/1'),
            'args': (),
        },
        'links-each-cuarter': {
            'task': 'search.tasks.links',
            'schedule': crontab(minute='*/15'),
            'args': (),
        },
        'urls-each-ten': {
            'schedule': crontab(minute='*/10'),
            'task': 'search.tasks.urls',
            'args': (),
        },
    }
)
test.tasks.py
@app.task
def summary():
    execute_sumary()  # heavy task, ~1 hour approx

@app.task
def volume():
    execute_volume()  # not important, less than 5 minutes
and search.tasks.py
@app.task
def links():
    free = search_links()  # returns a boolean
    if free:
        process_links()
    else:
        time.sleep(1080)  # <-------- the sleep I have problems with
        process_links()

@app.task
def urls():
    execute_urls()  # not important, less than 1 minute
Well, I have 2 workers: A for the search queue and B for tests and default.
The problem is with A: when it takes the "links" task and executes the time.sleep(), it stops the other tasks the worker is doing.
Because worker B is working correctly, I think the problem is the time.sleep() call.
If you only have one process/thread, a call to sleep() will block it. This means that no other task will run...
You set CELERYD_TASK_SOFT_TIME_LIMIT=1800 but your sleep is 1080.
Only one or two tasks can run within that time interval.
Set CELERYD_TASK_SOFT_TIME_LIMIT > (1080 + (work time)) * 3.
Set a higher --concurrency (> 4) when starting the Celery worker.
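For illustration, a rough sketch of those two suggestions applied to the celery.py excerpt above (the old-style uppercase setting names are assumed to match the rest of the config; adjust the numbers to your real task run times):

app.conf.update(
    # Soft time limit comfortably above (1080 + work time) * 3
    CELERYD_TASK_SOFT_TIME_LIMIT=7200,
    # More worker processes, so one sleeping task doesn't starve the rest
    # (equivalent to starting the worker with --concurrency=8)
    CELERYD_CONCURRENCY=8,
)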
I'm using Django + Celery to asynchronously process data.
Here is my settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake'
    }
}
And here is my Celery task:
from celery import shared_task
from django.core.cache import cache

@shared_task
def process():
    my_data = cache.get('hello')
    if my_data is None:
        my_data = 'something'
    cache.set('hello', my_data)
It's very simple. However, every time I call the task, cache.get('hello') always returns None. I have no clue why. Could someone help me?
I also tried with Memcached and these settings:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'TIMEOUT': 60 * 60 * 60 * 24,
        'OPTIONS': {
            'MAX_ENTRIES': 5000,
        }
    }
}
Of course, memcached is running as a daemon, but the code is still not working...
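One thing worth ruling out: Django's LocMemCache is per-process, so the web process and the Celery worker each get a private cache and never see each other's keys; a genuinely shared backend (like the Memcached config above) is needed for the task to read what the web side wrote. A small diagnostic task (hypothetical, not from the question) to check whether the two processes share a backend:

from celery import shared_task
from django.core.cache import cache

@shared_task
def check_cache():
    # With a shared backend this returns whatever the web process stored;
    # with LocMemCache each process has its own cache, so it returns None.
    return cache.get('hello')

# In the Django shell / a view:
#   cache.set('hello', 'seeded-from-web', timeout=3600)
#   check_cache.delay().get()  # 'seeded-from-web' only if the backend is shared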
I am using Django 1.3's logging feature and trying to implement a TimedRotatingFileHandler to rotate logs every hour. The logger rotates successfully every hour, but it seems that on every log request it truncates the file: the file contains only the last written message. Is this an issue in the Django handler, or am I missing something? The logging dictionary is below:
LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': "%(asctime)s:%(pathname)s:%(lineno)s: %(message)s",
            'datefmt': "%d/%b/%Y %H:%M:%S"
        },
    },
    'handlers': {
        'logfile': {
            'level': 'DEBUG',
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'filename': "/tmp/log1.log",
            'when': 'hour',
            'interval': 0,
            'formatter': 'standard',
        },
    },
    'loggers': {
        'collection': {
            'handlers': ['logfile'],
            'level': 'DEBUG',
        },
    }
}
Please note: when the interval is set to 1, the log does not get rotated. Is this a bug in Django?
You need to set:
'when' : 'H',
'interval' : 1,
From the code, the 'when' events currently supported are:
S - Seconds
M - Minutes
H - Hours
D - Days
midnight - roll over at midnight
W{0-6} - roll over on a certain day; 0 - Monday
interval is the number of those units to count (e.g. when == 'H' and interval == 2 rolls over every 2 hours).
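Applied to the handler from the question, that would look roughly like this (backupCount is an optional extra, not something the question used):

'handlers': {
    'logfile': {
        'level': 'DEBUG',
        'class': 'logging.handlers.TimedRotatingFileHandler',
        'filename': '/tmp/log1.log',
        'when': 'H',        # rotate on hour units...
        'interval': 1,      # ...every 1 hour
        'backupCount': 24,  # optional: keep a day's worth of rotated files
        'formatter': 'standard',
    },
},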
Whenever you are creating a log file, just add a datetime stamp to the filename. This will make sure that the file will never be truncated.
I guess there are multiple processes writing to your log file; in that case you can use ConcurrentLogHandler to avoid the truncation.
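As a rough sketch of that suggestion (assuming the ConcurrentLogHandler package, which ships a cloghandler module and rotates by size rather than by time):

'handlers': {
    'logfile': {
        'level': 'DEBUG',
        'class': 'cloghandler.ConcurrentRotatingFileHandler',
        'filename': '/tmp/log1.log',
        'maxBytes': 10 * 1024 * 1024,  # rotate by size instead of time
        'backupCount': 24,
        'formatter': 'standard',
    },
},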