I'm trying to call a task and create a queue for that task if it doesn't exist then immediately insert to that queue the called task. I have the following code:
#task
def greet(name):
return "Hello %s!" % name
def run():
result = greet.delay(args=['marc'], queue='greet.1',
routing_key='greet.1')
print result.ready()
then I have a custom router:
class MyRouter(object):
def route_for_task(self, task, args=None, kwargs=None):
if task == 'tasks.greet':
return {'queue': kwargs['queue'],
'exchange': 'greet',
'exchange_type': 'direct',
'routing_key': kwargs['routing_key']}
return None
this creates an exchange called greet.1 and a queue called greet.1 but the queue is empty. The exchange should be just called greet which knows how to route a routing key like greet.1 to the queue called greet.1.
Any ideas?
When you do the following:
task.apply_async(queue='foo', routing_key='foobar')
Then Celery will take default values from the 'foo' queue in CELERY_QUEUES,
or if it does not exist then automatically create it using (queue=foo, exchange=foo, routing_key=foo)
So if 'foo' does not exist in CELERY_QUEUES you will end up with:
queues['foo'] = Queue('foo', exchange=Exchange('foo'), routing_key='foo')
The producer will then declare that queue, but since you override the routing_key,
actually send the message using routing_key = 'foobar'
This may seem strange but the behavior is actually useful for topic exchanges,
where you publish to different topics.
It's harder to do what you want though, you can create the queue yourself
and declare it, but that won't work well with automatic message publish retries.
It would be better if the queue argument to apply_async could support
a custom kombu.Queue instead that will be both declared and used as the destination.
Maybe you could open an issue for that at http://github.com/celery/celery/issues
Related
I have created a celery task as below
import os
import time
from celery import Celery
from dotenv import load_dotenv
load_dotenv()
celery = Celery(__name__)
celery.conf.broker_url = os.environ.get("CELERY_BROKER_URL")
celery.conf.result_backend = os.environ.get("CELERY_RESULT_BACKEND")
#celery.task(name="create_task")
def message_sender(sender_func, numbers: list, message: str):
sender_func(numbers, message)
return "Sent Successful"
And calling the task as below
modem_conn = Modem()
task = message_sender.apply_async(
kwargs={
"sender_func": modem_conn.sms,
"numbers": ["00000000"],
"message": "sms sent",
}
)
But I am getting bellow error
kombu.exceptions.EncodeError: Object of type method is not JSON serializable
But if I call the task without delay or apply_async, then it workes. What could be the problem here and how can I achive this.
All I want to do is pass a function or instance while calling the celery task.
The celery task is run in another instance than your app, and both instances communicate via the broker. Since you don't "call" the task function, but only send messages with serialized data that tell the worker which function to call, you can't send objects or functions. This is similar to multiprocessing, where only serialized text messages can be sent between the processes.
My approach would be to make the function known to the worker and then send e.g. a string with the name of the function and call it.
sender:
task = message_sender.apply_async(
kwargs={
"sender_func": "sms",
"numbers": ["00000000"],
"message": "sms sent",
}
)
worker:
#celery.task(name="create_task")
def message_sender(sender_func, numbers: list, message: str):
modem_conn = Modem()
if sender_func == "sms":
modem_conn.sms(numbers, message)
return "Sent Successful"
You could also use getattr or locals()
I have a view that sends a message to a RabbitMQ queue.
message = {'origin': 'Bytes CSV',
'data': {'csv_key': str(csv_entry.key),
'csv_fields': csv_fields
'order_by': order_by,
'filters': filters}}
...
queue_service.send(message=message, headers={}, exchange_name=EXCHANGE_IN_NAME,
routing_key=MESSAGES_ROUTING_KEY.replace('#', 'bytes_counting.create'))
On my consumer, I have a long process to generate a CSV.
def create(self, data):
csv_obj = self._get_object(key=data['csv_key'])
if csv_obj.status == CSVRequestStatus.CANCELED:
self.logger.info(f'CSV {csv_obj.key} was canceled by the user')
return
result = self.generate_result_data(filters=data['filters'], order_by=data['order_by'], csv_obj=csv_obj)
csv_data = self._generate_csv(result=result, csv_fields=data['csv_fields'], csv_obj=csv_obj)
file_key = self._post_csv(csv_data=csv_data, csv_obj=csv_obj)
csv_obj.status = CSVRequestStatus.READY
csv_obj.status_additional = CSVRequestStatusAdditional.SUCCESS
csv_obj.file_key = file_key
csv_obj.ready_at = timezone.now()
csv_obj.save(update_fields=['status', 'status_additional', 'ready_at', 'file_key'])
self.logger.info(f'CSV {csv_obj.name} created')
The long proccess happens inside self._generate_csv, because self.generate_result_data returns a queryset, which is lazy.
As you can see, if a user changes the status of the csv_request through an endpoint BEFORE the message starts to be consumed the proccess will not be evaluated. My goal is to let this happen during the execution of self._generate_csv.
So far I tried to use Threading, but unsuccessfully.
How can I achive my goal?
Thanks a lot!
Why don't you checkout Celery library ? Using celery with django with RabbitMQ backend is much easier than directly leveraging rabbitmq queues.
Celery has an inbuilt function revoke to terminate an ongoing task:
>>> from celery.task.control import revoke
>>> revoke(task_id, terminate=True)
related SO answer
celery docs
For your use case, you probably want something like (code snippets):
## celery/tasks.py
from celery import app
#app.task(queue="my_queue")
def create_csv(message):
# ...snip...
pass
## main.py
from celery import uuid, current_app
def start_task(task_id, message):
current_app.send_task(
"create_csv",
args=[message],
task_id=task_id,
)
def kill_task(task_id):
current_app.control.revoke(task_id, terminate=True)
## signals.py
from django.dispatch import receiver
from .models import MyModel
from .main import kill_task
# choose appropriate signal to listen for DB change
#receiver(models.signals.post_save, sender=MyModel)
def handler(sender, instance, **kwargs):
kill_task(instance.task_id)
Use celery.uuid to generate task IDs which can be stored in DB or cache and use the same task ID to control the task i.e. request termination.
Since self._generate_csv is the slowest, the obvious solution would be to work with this function.
To do this, you can divide the creation of the csv file into several pieces. After creating each piece, check the status and see if you can continue to create the file. At the very end, glue all the pieces into a finished file.
Here is a method for combining multiple files into one.
I am trying to access taskid of celery task in different helper function with a AsyncResult but unable to acces it
https://celery.readthedocs.io/en/latest/reference/celery.result.html#celery.result.AsyncResult
iam a getting None on this link and Tried with some other links like https://docs.celeryproject.org/en/latest/userguide/tasks.html
def anotherFunction(data):
try:
fvlfn
except Exception as identifier:
logging.exception(identifier)
#app.task(bind=True)
def send(self):
try:
TASKID = self.request.id
anotherFunction('no if s')
except Exception as identifier:
self.update_state(state='ALMOST DONE')
logging.exception(self.request.id)
I want to access taskid in anotherFunction without passing it
You can always use the Celery application object to get the information about the current task: myapp.current_task.request.id (where myapp is an instance of the Celery class)
However, just because you can does not mean that is what you should do. Many of us prefer explicitly passing the data needed for function to run, instead of using some global object.
I have a RabbitMQ topic exchange named experiment. I'm building a consumer where I'd like to receive all messages whose routing key begins with "foo" and all messages whose routing key begins with "bar".
According to the RabbitMQ docs, and based on my own experimentation in the management UI, it should be possible to have one exchange, one queue, and two bindings (foo.# and bar.#) that connect them.
I can't figure out how to express this using Kombu's ConsumerMixin. I feel like I should be able to do:
q = Queue(exchange=exchange, routing_key=['foo.#', 'bar.#'])
...but it does not like that at all. I've also tried:
q.bind_to(exchange=exchange, routing_key='foo.#')
q.bind_to(exchange=exchange, routing_key='bar.#')
...but every time I try I get:
kombu.exceptions.NotBoundError: Can't call method on Queue not bound to a channel
...which I guess manes sense. However I can't see a place in the mixin's interface where I can easily hook onto the queues once they are bound to the channel. Here's the base (working) code:
from kombu import Connection, Exchange, Queue
from kombu.mixins import ConsumerMixin
class Worker(ConsumerMixin):
exchange = Exchange('experiment', type='topic')
q = Queue(exchange=exchange, routing_key='foo.#', exclusive=True)
def __init__(self, connection):
self.connection = connection
def get_consumers(self, Consumer, channel):
return [Consumer(queues=[self.q], callbacks=[self.on_task])]
def on_task(self, body, message):
print body
message.ack()
if __name__ == '__main__':
with Connection('amqp://guest:guest#localhost:5672//') as conn:
worker = Worker(conn)
worker.run()
...which works, but only gives me foo messages. Other than creating a new Queue for each routing key I'm interested in and passing them all to the Consumer, is there a clean way to do this?
After digging a little bit, I found a way to accomplish this that is fairly close to the first idea I had. Instead of passing a routing_key string to the Queue, pass a bindings list. Each element in the list is an instance of a binding object that specifies the exchange and the routing key.
An example is worth a thousand words:
from kombu import Exchange, Queue, binding
exchange = Exchange('experiment', type='topic')
q = Queue(exchange=exchange, bindings=[
binding(exchange, routing_key='foo.#'),
binding(exchange, routing_key='bar.#')
], exclusive=True)
And it works great!
Here is a small adjustment of the answer by smitelli. When the bindings parameter is used for defining bindings, the exchange parameter is ignored.
Adjusted example:
from kombu import Exchange, Queue, binding
exchange = Exchange('experiment', type='topic')
q = Queue(bindings=[
binding(exchange, routing_key='foo.#'),
binding(exchange, routing_key='bar.#'),
])
The exchange parameter is discarded during the Queue init:
if self.bindings:
self.exchange = None
How to check that a function in executed by celery?
def notification():
# in_celery() returns True if called from celery_test(),
# False if called from not_celery_test()
if in_celery():
# Send mail directly without creation of additional celery subtask
...
else:
# Send mail with creation of celery task
...
#celery.task()
def celery_test():
notification()
def not_celery_test():
notification()
Here is one way to do it by using celery.current_task. Here is the code to be used by the task:
def notification():
from celery import current_task
if not current_task:
print "directly called"
elif current_task.request.id is None:
print "called synchronously"
else:
print "dispatched"
#app.task
def notify():
notification()
This is code you can run to exercise the above:
from core.tasks import notify, notification
print "DIRECT"
notification()
print "NOT DISPATCHED"
notify()
print "DISPATCHED"
notify.delay().get()
My task code in the first snippet was in a module named core.tasks. And I shoved the code in the last snippet in a custom Django management command. This tests 3 cases:
Calling notification directly.
Calling notification through a task executed synchronously. That is, this task is not dispatched through Celery to a worker. The code of the task executes in the same process that calls notify.
Calling notification through a task run by a worker. The code of the task executes in a different process from the process that started it.
The output was:
NOT DISPATCHED
called synchronously
DISPATCHED
DIRECT
directly called
There is no line from the print in the task on the output after DISPATCHED because that line ends up in the worker log:
[2015-12-17 07:23:57,527: WARNING/Worker-4] dispatched
Important note: I initially was using if current_task is None in the first test but it did not work. I checked and rechecked. Somehow Celery sets current_task to an object which looks like None (if you use repr on it, you get None) but is not None. Unsure what is going on there. Using if not current_task works.
Also, I've tested the code above in a Django application but I've not used it in production. There may be gotchas I don't know.