I have the following problem: I'm running a Python process that must wait X seconds. The process works correctly on its own; the problem appears when I run it as a Celery task.
When the worker hits the time.sleep(X) in one task, it pauses all the other tasks in that worker. For example:
I have worker A, which can run 4 tasks at the same time (q, w, e and r). Task r has a sleep of 1800 seconds. The worker runs all 4 tasks concurrently, but as soon as task r reaches the sleep, the worker stops q, w and e as well.
Is this normal? Do you know how I can solve this problem?
EDIT:
This is an example of my celery.py with my beat schedule and queues:
app.conf.update(
    CELERY_DEFAULT_QUEUE='default',
    CELERY_QUEUES=(
        Queue('search', routing_key='search.#'),
        Queue('tests', routing_key='tests.#'),
        Queue('default', routing_key='tasks.#'),
    ),
    CELERY_DEFAULT_EXCHANGE='tasks',
    CELERY_DEFAULT_EXCHANGE_TYPE='topic',
    CELERY_DEFAULT_ROUTING_KEY='tasks.default',
    CELERY_TASK_RESULT_EXPIRES=10,
    CELERYD_TASK_SOFT_TIME_LIMIT=1800,
    CELERY_ROUTES={
        'tests.tasks.volume': {
            'queue': 'tests',
            'routing_key': 'tests.volume',
        },
        'tests.tasks.summary': {
            'queue': 'tests',
            'routing_key': 'tests.summary',
        },
        'search.tasks.links': {
            'queue': 'search',
            'routing_key': 'search.links',
        },
        'search.tasks.urls': {
            'queue': 'search',
            'routing_key': 'search.urls',
        },
    },
    CELERYBEAT_SCHEDULE={
        # heavy one
        'each-hour-summary': {
            'task': 'tests.tasks.summary',
            'schedule': crontab(minute='0', hour='*/1'),
            'args': (),
        },
        'each-hour-volume': {
            'task': 'tests.tasks.volume',
            'schedule': crontab(minute='0', hour='*/1'),
            'args': (),
        },
        'links-each-cuarter': {
            'task': 'search.tasks.links',
            'schedule': crontab(minute='*/15'),
            'args': (),
        },
        'urls-each-ten': {
            'schedule': crontab(minute='*/10'),
            'task': 'search.tasks.urls',
            'args': (),
        },
    }
)
tests.tasks.py

@app.task
def summary():
    execute_sumary()  # heavy task, ~1 hour approx.

@app.task
def volume():
    execute_volume()  # not important, takes less than 5 minutes

and search.tasks.py

@app.task
def links():
    free = search_links()  # returns a boolean
    if free:
        process_links()
    else:
        time.sleep(1080)  # <-------- the sleep I have problems with
        process_links()

@app.task
def urls():
    execute_urls()  # not important, takes less than 1 minute
Well, I have 2 workers: A for the search queue and B for tests and default.
The problem is with A: when it picks up the "links" task and executes the time.sleep(), it stops the other tasks the worker is running.
Since worker B works correctly, I think the problem is the time.sleep() call.
If you only have one process/thread, a call to sleep() will block it. This means that no other task will run...
You set CELERYD_TASK_SOFT_TIME_LIMIT=1800 but your sleep is 1080.
Only one or two tasks can complete within that time limit.
Set CELERYD_TASK_SOFT_TIME_LIMIT > (1080 + work time) * 3.
Set a higher --concurrency (> 4) when starting the Celery worker.
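For example, a minimal sketch of those two changes (the 600-second "work time" and the project name "proj" are assumptions, not values from the question):

# Raise the soft time limit well above (sleep + work time) * 3.
app.conf.update(
    CELERYD_TASK_SOFT_TIME_LIMIT=(1080 + 600) * 3,
)

# Start worker A for the "search" queue with more than 4 worker processes:
#   celery -A proj worker -Q search --concurrency=8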
Related
I am working with a Celery worker with Redis as the broker and backend.
Steps:
The add process picks the data from its queue, processes it, and sends the result to the read queue.
The read process reads the data from the read queue and prints the result.
Before the second task can pick it up, the message gets deleted with the message below:
[2023-02-02 16:41:29,992: WARNING/MainProcess] Received and deleted unknown message. Wrong destination?!?
The full contents of the message body was: body: {'task_id': 'e54849dc-3bc7-409c-afab-5c69b3310d99', 'status': 'SUCCESS', 'result': 7, 'traceback': None, 'children': []} (120b)
{content_type:'application/json' content_encoding:'utf-8'
delivery_info:{'exchange': '', 'routing_key': 'read'} headers={}}
Code snippet:
from kombu import Queue, Exchange
from celery import Celery
celery = Celery('tasks', broker="redis://localhost:6379/0", backend="rpc://")
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'direct'
CELERY_DEFAULT_ROUTING_KEY = 'default'
celery.conf.update(
    CELERY_ROUTES={
        "celery_worker.celery_worker.add": {
            "queue": "add",
            "routing_key": "add"
        },
        "celery_worker.celery_worker.read": {
            "queue": "read",
            "routing_key": "read"
        }
    },
    CELERY_QUEUES=(
        Queue(CELERY_DEFAULT_QUEUE, Exchange(CELERY_DEFAULT_EXCHANGE),
              routing_key=CELERY_DEFAULT_ROUTING_KEY),
        Queue("add", Exchange(CELERY_DEFAULT_EXCHANGE),
              routing_key="add"),
        Queue("read", Exchange(CELERY_DEFAULT_EXCHANGE),
              routing_key="read"),
    ),
    CELERY_CREATE_MISSING_QUEUES=True,
    CELERYD_PREFETCH_MULTIPLIER=1)
@celery.task(name='add', acks_late=True)
def add(x, y):
    print(f"Order Complete!{x}, {y}")
    return x + y

@celery.task(name='read', acks_late=True, queue="read", routing_key="read")
def read(data):
    print("data")
    print(f"Order Completedread!{data}")
    return data

task_1 = add.apply_async((4, 3), queue='add', routing_key="add", reply_to="read")
Please help me resolve this issue.
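As an aside, if the goal is simply to hand add's return value to the read task, a hedged sketch of one way to wire that is a link callback instead of reply_to (this is an assumption about the intent, not a confirmed fix for the warning above):

# Run add on the "add" queue; when it succeeds, enqueue read on the "read"
# queue with add's result as its argument.
task_1 = add.apply_async(
    (4, 3),
    queue='add',
    routing_key='add',
    link=read.s().set(queue='read', routing_key='read'),
)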
I am trying to build a sensor for the execution of a pipeline/graph. The sensor checks at different intervals and executes a job containing different ops. The job requires some resource_defs and config. In the official documentation I don't see how I can define resource_defs for the job. A small hint would be great.
Question: where or how do I define resource_defs in a sensor? Do I even have to define them? It's not mentioned in the official documentation:
https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors
### defining the job
@job(
    resource_defs={"some_API_Module": API_module, "db_Module": db},
    config={key: value}
)
def job_pipeline():
    op_1()  # the API is used as a required resource
    op_2()  # the db is used as a required resource

### defining the sensor that triggers the job
@sensor(job=job_pipeline)
def job_pipeline_sensor():
    ### some calculation
    yield RunRequest(run_key="", run_config={key: value})
It seems like you may be missing building your run config from within the sensor. The configuration you pass to a RunRequest should contain the resource configuration you want to run the job with, and will look exactly like the run configuration you'd configure from the launchpad. Something like:
### defining the sensor that triggers the job
@sensor(job=job_pipeline)
def job_pipeline_sensor():
    ### some calculation
    run_config = {
        "ops": {
            "op_1": {
                "config": {
                    "key": "value"
                },
            },
            "op_2": {
                "config": {
                    "key": "value",
                },
            },
        },
        "resources": {
            "some_API_Module": {
                "config": {"key": "value"}
            },
            "db_Module": {
                "config": {"key": "value"}
            },
        },
    }
    yield RunRequest(run_key="<unique_value>", run_config=run_config)
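Note that, for the config to apply, the keys under "ops" and "resources" must match the job's op names and resource_defs keys (op_1, op_2, some_API_Module and db_Module in this example). Assuming a Dagster version that exposes validate_run_config, a quick sanity check before yielding the RunRequest could look like:

from dagster import validate_run_config

# Raises a config error if the op or resource keys don't match job_pipeline
validate_run_config(job_pipeline, run_config)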
In the Django admin panel, I created about 1500 celery-beat periodic tasks, and 100 of them have the same crontab schedule 15 4 * 1 * (m/h/d/dM/MY) UTC (Minute(s)/Hour(s)/Day(s) Of The Week/Day(s) Of The Month/Month(s) Of The Year). All of them were enabled.
But at 04:15 on day-of-month 1, one of the periodic tasks did not send its due task to the Celery worker while all the other tasks did. In the Django admin panel, this periodic task's last_run_time is None, which indicates that it was never triggered.
I tried changing the crontab schedule to 15 * * * * (m/h/d/dM/MY) UTC and then it ran successfully at minute 15. So I wonder: is there any limit on the number of celery-beat periodic tasks?
celery.py
app = Celery('myapp', broker=os.getenv('BROKER_URL', None))
@signals.setup_logging.connect
def setup_logging(**kwargs):
    """Setup logging."""
    pass

app.conf.ONCE = {
    'backend': 'celery_once.backends.Redis',
    'settings': {
        'url': 'redis://localhost:6379/0',
        'default_timeout': 60 * 60
    }
}
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
from django.utils import timezone
app.conf.update(
    CELERY_ALWAYS_EAGER=bool(os.getenv('CELERY_ALWAYS_EAGER', False)),
    CELERY_DISABLE_RATE_LIMITS=True,
    # CELERYD_MAX_TASKS_PER_CHILD=5,
    CELERY_TASK_RESULT_EXPIRES=3600,
    # Uncomment below if you want to use the django-celery backend
    # CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
    CELERY_ACCEPT_CONTENT=['json'],
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    # Uncomment below if you want to store the periodic task information in the django-celery backend
    CELERYBEAT_SCHEDULER="django_celery_beat.schedulers.DatabaseScheduler",
    # periodic tasks setup
    CELERYBEAT_SCHEDULE={
        'check_alerts_task': {
            'task': 'devices.tasks.alert_task',
            'schedule': 300.0  # run every 5 minutes
        },
        'weather_request': {
            'task': 'organizations.tasks.weather_request',
            'schedule': 60.0  # run every minute
        },
        'update_crontab': {
            'task': 'devices.tasks.update_crontab',
            'schedule': crontab(hour=0, minute=0)
        },
        'export_data_cleanup': {
            'task': 'devices.tasks.delete_expiry_export_data',
            'schedule': crontab(minute=0)
        }
    }
)
celery.log
[01/Jun/2020 04:15:09] INFO [celery.beat:271] Scheduler: Sending due task Report_first (reportAPI.tasks.report)
Your crontab is actually wrong (the field order is minute hour day month weekday):
15 4 * 1 * -> 04:15 every day of the 1st month (January)
What you want for the 1st day of the month is 15 4 1 * *
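If you define the schedule in code rather than through the admin panel, the equivalent celery.schedules.crontab call for "04:15 on the 1st of every month" would be a sketch like:

from celery.schedules import crontab

# 04:15 on the first day of every month
schedule = crontab(minute=15, hour=4, day_of_month=1)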
I am new to Python. I have a requirement to run a single function in parallel for several processes, where some of them may depend on others.
My dataset looks like this:
[
    {
        'process_id': 1,
        'dependency_id': 0
    },
    {
        'process_id': 2,
        'dependency_id': 0
    },
    {
        'process_id': 3,
        'dependency_id': 0
    },
    {
        'process_id': 4,
        'dependency_id': 2
    },
    {
        'process_id': 5,
        'dependency_id': 2
    },
    {
        'process_id': 6,
        'dependency_id': 1
    }
]
Here,
process_id 1, 2 and 3 should run in parallel because they have dependency_id 0.
process_id 4 has dependency_id 2, so it should wait until process 2 is done.
process_id 5 also has dependency_id 2, so it should wait until process 2 is done. Processes 4 and 5 wait for process 2 and then run in parallel.
process_id 6 has dependency_id 1, so it should wait until process 1 is done.
I have tried the following:
from multiprocessing import Process
from time import sleep

process = []
process_list = [{'process_id': 1, 'dependency_id': 0}, {'process_id': 2, 'dependency_id': 0}, {'process_id': 3, 'dependency_id': 0}, {'process_id': 4, 'dependency_id': 2}, {'process_id': 5, 'dependency_id': 2}, {'process_id': 6, 'dependency_id': 1}]

def process_fn(process_id):
    print(process_id)

def main():
    for item in process_list:
        p = Process(target=process_fn, args=(item['process_id'],))
        process.append(p)
        p.start()
But I want to know how to include that dependency logic in the multiprocessing code. Can anyone please help me with a solution?
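One possible approach (a sketch, assuming each process waits on at most one dependency as in the dataset above) is to give every process a multiprocessing.Event and have dependents wait on their dependency's event before doing their work:

from multiprocessing import Process, Event

process_list = [{'process_id': 1, 'dependency_id': 0}, {'process_id': 2, 'dependency_id': 0},
                {'process_id': 3, 'dependency_id': 0}, {'process_id': 4, 'dependency_id': 2},
                {'process_id': 5, 'dependency_id': 2}, {'process_id': 6, 'dependency_id': 1}]

def process_fn(process_id, done, dependency_done):
    # Block until the dependency has finished (if there is one).
    if dependency_done is not None:
        dependency_done.wait()
    print(process_id)  # placeholder for the real work
    done.set()         # signal dependents that this process has finished

if __name__ == '__main__':
    # One Event per process_id; dependency_id 0 means "no dependency".
    events = {item['process_id']: Event() for item in process_list}
    procs = []
    for item in process_list:
        dep_event = events.get(item['dependency_id'])  # None when dependency_id is 0
        p = Process(target=process_fn,
                    args=(item['process_id'], events[item['process_id']], dep_event))
        procs.append(p)
        p.start()
    for p in procs:
        p.join()

Processes 1, 2 and 3 start working immediately, while 4, 5 and 6 block on wait() until their dependency calls done.set(), which matches the rules above.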
I'm using Django + Celery to asynchronously process data.
Here is my settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake'
    }
}
And here is my Celery task:
from celery import shared_task
from django.core.cache import cache

@shared_task
def process():
    my_data = cache.get('hello')
    if my_data is None:
        my_data = 'something'
        cache.set('hello', my_data)
It's very simple. However, every time I call the task, cache.get('hello') always returns None. I have no clue why. Could someone help me?
I also tried with Memcached and these settings:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'TIMEOUT': 60 * 60 * 60 * 24,
        'OPTIONS': {
            'MAX_ENTRIES': 5000,
        }
    }
}
Of course, memcached is running as a daemon. But the code is still not working...