Related
from kafka import KafkaProducer, errors, admin, KafkaConsumer
SERVERS = ['localhost:9092']
TEST_TOPIC = 'test-topic'
DATA = [{'A':'A'}, {'A':'A'}, {'A':'A'}]
class TestKafkaConsumer(unittest.TestCase):
#classmethod
def setUpClass(self):
self._producer = KafkaProducer(bootstrap_servers=SERVERS, value_serializer=lambda x:dumps(x).encode('utf-8'))
def _send_data(self):
for data in DATA:
print(self._producer.send(TEST_TOPIC, value=data))
def test_basic_processing(self):
self._send_data()
received = []
consumer = KafkaConsumer(TEST_TOPIC, bootstrap_servers=SERVERS)
for msg in consumer:
message = json.loads(msg.value.decode('utf-8'))
received.append(message)
if (len(received) >= len(DATA)):
self.assertEqual(received, DATA)
This should succeed pretty quickly, as it just sends the data to the the Kafka broker in a pretty straightforward manner. However, it times out; the consumer never reads a single message. If I move the consumer portion to a different file and run it in a different terminal window, the messages are "consumed" pretty instantly. Why is the unittest not working for a consumer in this unittest?
You're producing records with your producer and then you're reading, this might be your problem.
When your consumer is started, you already had produced records, so, from the consumer point of view, there are no new messages.
You should run your consumer in a different thread, before your producer start producing.
Yannick
I'm using Python 2.7 (sigh), celery==3.1.19, librabbitmq==1.6.1, rabbitmq-server-3.5.6-1.noarch, and redis 2.8.24 (from redis-cli info).
I'm attempting to send a message from a celery producer to a celery consumer, and obtain the result back in the producer. There is 1 producer and 1 consumer, but 2 rabbitmq's (as brokers) and 1 redis (for results) in between.
The problem I'm facing is:
In the consumer, I get back get an AsyncResult via async_result =
ZipUp.delay(unique_directory), but async_result.ready() never
returns True (at least for 9 seconds it doesn't) - even for a
consumer task that does essentially nothing but return a string.
I can see, in the rabbitmq management web interface, my message
being received by the rabbitmq exchange, but it doesn't show up in
the corresponding rabbitmq queue. Also, a log message sent by the
very beginning of the ZipUp task doesn't appear to be getting
logged.
Things work if I don't try to get a result back from the AsyncResult! But I'm kinda hoping to get the result of the call - it's useful :).
Below are configuration specifics.
We're setting up Celery as follows for returns:
CELERY_RESULT_BACKEND = 'redis://%s' % _SHARED_WRITE_CACHE_HOST_INTERNAL
CELERY_RESULT = Celery('TEST', broker=CELERY_BROKER)
CELERY_RESULT.conf.update(
BROKER_HEARTBEAT=60,
CELERY_RESULT_BACKEND=CELERY_RESULT_BACKEND,
CELERY_TASK_RESULT_EXPIRES=100,
CELERY_IGNORE_RESULT=False,
CELERY_RESULT_PERSISTENT=False,
CELERY_ACCEPT_CONTENT=['json'],
CELERY_TASK_SERIALIZER='json',
CELERY_RESULT_SERIALIZER='json',
)
We have another Celery configuration that doesn't expect a return value, and that works - in the same program. It looks like:
CELERY = Celery('TEST', broker=CELERY_BROKER)
CELERY.conf.update(
BROKER_HEARTBEAT=60,
CELERY_RESULT_BACKEND=CELERY_BROKER,
CELERY_TASK_RESULT_EXPIRES=100,
CELERY_STORE_ERRORS_EVEN_IF_IGNORED=False,
CELERY_IGNORE_RESULT=True,
CELERY_ACCEPT_CONTENT=['json'],
CELERY_TASK_SERIALIZER='json',
CELERY_RESULT_SERIALIZER='json',
)
The celery producer's stub looks like:
#CELERY_RESULT.task(name='ZipUp', exchange='cognition.workflow.ZipUp_%s' % INTERNAL_VERSION)
def ZipUp(directory): # pylint: disable=invalid-name
""" Task stub """
_unused_directory = directory
raise NotImplementedError
It's been mentioned that using queue= instead of exchange= in this stub would be simpler. Can anyone confirm that (I googled but found exactly nothing on the topic)? Apparently you can just use queue= unless you want to use fanout or something fancy like that, since not all celery backends have the concept of an exchange.
Anyway, the celery consumer starts out with:
#task(queue='cognition.workflow.ZipUp_%s' % INTERNAL_VERSION, name='ZipUp')
#StatsInstrument('workflow.ZipUp')
def ZipUp(directory): # pylint: disable=invalid-name
'''
Zip all files in directory, password protected, and return the pathname of the new zip archive.
:param directory Directory to zip
'''
try:
LOGGER.info('zipping up {}'.format(directory))
But "zipping up" doesn't get logged anywhere. I searched every (disk-backed) file on the celery server for that string, and got two hits: /usr/bin/zip, and my celery task's code - and no log messages.
Any suggestions?
Thanks for reading!
It appears that using the following task stub in the producer solved the problem:
#CELERY_RESULT.task(name='ZipUp', queue='cognition.workflow.ZipUp_%s' % INTERNAL_VERSION)
def ZipUp(directory): # pylint: disable=invalid-name
""" Task stub """
_unused_directory = directory
raise NotImplementedError
In short, it's using queue= instead of exchange= .
What is the easiest way to create a delay (or parking) queue with Python, Pika and RabbitMQ? I have seen an similar questions, but none for Python.
I find this an useful idea when designing applications, as it allows us to throttle messages that needs to be re-queued again.
There are always the possibility that you will receive more messages than you can handle, maybe the HTTP server is slow, or the database is under too much stress.
I also found it very useful when something went wrong in scenarios where there is a zero tolerance to losing messages, and while re-queuing messages that could not be handled may solve that. It can also cause problems where the message will be queued over and over again. Potentially causing performance issues, and log spam.
I found this extremely useful when developing my applications. As it gives you an alternative to simply re-queuing your messages. This can easily reduce the complexity of your code, and is one of many powerful hidden features in RabbitMQ.
Steps
First we need to set up two basic channels, one for the main queue, and one for the delay queue. In my example at the end, I include a couple of additional flags that are not required, but makes the code more reliable; such as confirm delivery, delivery_mode and durable. You can find more information on these in the RabbitMQ manual.
After we have set up the channels we add a binding to the main channel that we can use to send messages from the delay channel to our main queue.
channel.queue_bind(exchange='amq.direct',
queue='hello')
Next we need to configure our delay channel to forward messages to the main queue once they have expired.
delay_channel.queue_declare(queue='hello_delay', durable=True, arguments={
'x-message-ttl' : 5000,
'x-dead-letter-exchange' : 'amq.direct',
'x-dead-letter-routing-key' : 'hello'
})
x-message-ttl (Message - Time To Live)
This is normally used to automatically remove old messages in the
queue after a specific duration, but by adding two optional arguments we
can change this behaviour, and instead have this parameter determine
in milliseconds how long messages will stay in the delay queue.
x-dead-letter-routing-key
This variable allows us to transfer the message to a different queue
once they have expired, instead of the default behaviour of removing
it completely.
x-dead-letter-exchange
This variable determines which Exchange used to transfer the message from hello_delay to hello queue.
Publishing to the delay queue
When we are done setting up all the basic Pika parameters you simply send a message to the delay queue using basic publish.
delay_channel.basic_publish(exchange='',
routing_key='hello_delay',
body="test",
properties=pika.BasicProperties(delivery_mode=2))
Once you have executed the script you should see the following queues created in your RabbitMQ management module.
Example.
import pika
connection = pika.BlockingConnection(pika.ConnectionParameters(
'localhost'))
# Create normal 'Hello World' type channel.
channel = connection.channel()
channel.confirm_delivery()
channel.queue_declare(queue='hello', durable=True)
# We need to bind this channel to an exchange, that will be used to transfer
# messages from our delay queue.
channel.queue_bind(exchange='amq.direct',
queue='hello')
# Create our delay channel.
delay_channel = connection.channel()
delay_channel.confirm_delivery()
# This is where we declare the delay, and routing for our delay channel.
delay_channel.queue_declare(queue='hello_delay', durable=True, arguments={
'x-message-ttl' : 5000, # Delay until the message is transferred in milliseconds.
'x-dead-letter-exchange' : 'amq.direct', # Exchange used to transfer the message from A to B.
'x-dead-letter-routing-key' : 'hello' # Name of the queue we want the message transferred to.
})
delay_channel.basic_publish(exchange='',
routing_key='hello_delay',
body="test",
properties=pika.BasicProperties(delivery_mode=2))
print " [x] Sent"
You can use RabbitMQ official plugin: x-delayed-message .
Firstly, download and copy the ez file into Your_rabbitmq_root_path/plugins
Secondly, enable the plugin (do not need to restart the server):
rabbitmq-plugins enable rabbitmq_delayed_message_exchange
Finally, publish your message with "x-delay" headers like:
headers.put("x-delay", 5000);
Notice:
It does not ensure your message's safety, cause if your message expires just during your rabbitmq-server's downtime, unfortunately the message is lost. So be careful when you use this scheme.
Enjoy it and more info in rabbitmq-delayed-message-exchange
FYI, how to do this in Spring 3.2.x.
<rabbit:queue name="delayQueue" durable="true" queue-arguments="delayQueueArguments"/>
<rabbit:queue-arguments id="delayQueueArguments">
<entry key="x-message-ttl">
<value type="java.lang.Long">10000</value>
</entry>
<entry key="x-dead-letter-exchange" value="finalDestinationTopic"/>
<entry key="x-dead-letter-routing-key" value="finalDestinationQueue"/>
</rabbit:queue-arguments>
<rabbit:fanout-exchange name="finalDestinationTopic">
<rabbit:bindings>
<rabbit:binding queue="finalDestinationQueue"/>
</rabbit:bindings>
</rabbit:fanout-exchange>
NodeJS implementation.
Everything is pretty clear from the code.
Hope it will save somebody's time.
var ch = channel;
ch.assertExchange("my_intermediate_exchange", 'fanout', {durable: false});
ch.assertExchange("my_final_delayed_exchange", 'fanout', {durable: false});
// setup intermediate queue which will never be listened.
// all messages are TTLed so when they are "dead", they come to another exchange
ch.assertQueue("my_intermediate_queue", {
deadLetterExchange: "my_final_delayed_exchange",
messageTtl: 5000, // 5sec
}, function (err, q) {
ch.bindQueue(q.queue, "my_intermediate_exchange", '');
});
ch.assertQueue("my_final_delayed_queue", {}, function (err, q) {
ch.bindQueue(q.queue, "my_final_delayed_exchange", '');
ch.consume(q.queue, function (msg) {
console.log("delayed - [x] %s", msg.content.toString());
}, {noAck: true});
});
Message in Rabbit queue can be delayed in 2 ways
- using QUEUE TTL
- using Message TTL
If all messages in queue are to be delayed for fixed time use queue TTL.
If each message has to be delayed by varied time use Message TTL.
I have explained it using python3 and pika module.
pika BasicProperties argument 'expiration' in milliseconds has to be set to delay message in delay queue.
After setting expiration time, publish message to a delayed_queue ("not actual queue where consumers are waiting to consume") , once message in delayed_queue expires, message will be routed to a actual queue using exchange 'amq.direct'
def delay_publish(self, messages, queue, headers=None, expiration=0):
"""
Connect to RabbitMQ and publish messages to the queue
Args:
queue (string): queue name
messages (list or single item): messages to publish to rabbit queue
expiration(int): TTL in milliseconds for message
"""
delay_queue = "".join([queue, "_delay"])
logging.info('Publishing To Queue: {queue}'.format(queue=delay_queue))
logging.info('Connecting to RabbitMQ: {host}'.format(
host=self.rabbit_host))
credentials = pika.PlainCredentials(
RABBIT_MQ_USER, RABBIT_MQ_PASS)
parameters = pika.ConnectionParameters(
rabbit_host, RABBIT_MQ_PORT,
RABBIT_MQ_VHOST, credentials, heartbeat_interval=0)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
channel.queue_declare(queue=queue, durable=True)
channel.queue_bind(exchange='amq.direct',
queue=queue)
delay_channel = connection.channel()
delay_channel.queue_declare(queue=delay_queue, durable=True,
arguments={
'x-dead-letter-exchange': 'amq.direct',
'x-dead-letter-routing-key': queue
})
properties = pika.BasicProperties(
delivery_mode=2, headers=headers, expiration=str(expiration))
if type(messages) not in (list, tuple):
messages = [messages]
try:
for message in messages:
try:
json_data = json.dumps(message)
except Exception as err:
logging.error(
'Error Jsonify Payload: {err}, {payload}'.format(
err=err, payload=repr(message)), exc_info=True
)
if (type(message) is dict) and ('data' in message):
message['data'] = {}
message['error'] = 'Payload Invalid For JSON'
json_data = json.dumps(message)
else:
raise
try:
delay_channel.basic_publish(
exchange='', routing_key=delay_queue,
body=json_data, properties=properties)
except Exception as err:
logging.error(
'Error Publishing Data: {err}, {payload}'.format(
err=err, payload=json_data), exc_info=True
)
raise
except Exception:
raise
finally:
logging.info(
'Done Publishing. Closing Connection to {queue}'.format(
queue=delay_queue
)
)
connection.close()
Depends on your scenario and needs, I would recommend the following approaches,
Using the official plugin, https://www.rabbitmq.com/blog/2015/04/16/scheduling-messages-with-rabbitmq/, but it will have a capacity issue if the total count of delayed messages exceeds certain number (https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/issues/72), it will not have the high availability option and it will suffer lose of data when it runs out of delayed time during a MQ restart.
Implement a set of cascading delayed queues just like NServiceBus did (https://docs.particular.net/transports/rabbitmq/delayed-delivery).
The following code is what I use to get a count of number of consumers :
import pika
connection = pika.BlockingConnection(pika.ConnectionParameters(host='IP ADDRESS'))
channel = connection.channel()
this=channel.queue_declare(queue="Queue_name",passive=True)
print this.method.consumer_count
Now the count that I obtain are the number of active consumers. However, when consumers are consuming from the queue, this count is printed as zero. Now I need the total number of consumers consuming from the queue. This appears RabbitMQ Management
(as consumers : 0 active
25 Total)
Is there a way to obtain a count of the total number of consumers consuming from a queue, while there are messages in the queue?
Thank you in advance
Following is an answer to this question. However, it uses HTTP API and not pika.
import subprocess
import os
import json
#Execute in command line
def execute_command(command):
proc = subprocess.Popen(command,shell=True,stdout=subprocess.PIPE)
script_response = proc.stdout.read().split('\n')
resp=json.loads(script_response[7])
print resp[0]['name']
print resp[0]['consumers']
######### MAIN #########
if __name__ == '__main__':
execute_command('curl -i -u guest:guest http://*IP ADDRESS*:15672/api/queues/')
Please refer : http://hg.rabbitmq.com/rabbitmq-management/raw-file/3646dee55e02/priv/www-api/help.html
A simple option:
self._channel = self._connection.channel()
queue_state = self._channel.queue_declare(queue=self.__queue_name, passive=True, durable=True)
print(queue_state.method.consumer_count)
print(queue_state.method.message_count)
How can I retrieve a list of tasks in a queue that are yet to be processed?
EDIT: See other answers for getting a list of tasks in the queue.
You should look here:
Celery Guide - Inspecting Workers
Basically this:
my_app = Celery(...)
# Inspect all nodes.
i = my_app.control.inspect()
# Show the items that have an ETA or are scheduled for later processing
i.scheduled()
# Show tasks that are currently active.
i.active()
# Show tasks that have been claimed by workers
i.reserved()
Depending on what you want
If you are using Celery+Django simplest way to inspect tasks using commands directly from your terminal in your virtual environment or using a full path to celery:
Doc: http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=revoke#inspecting-workers
$ celery inspect reserved
$ celery inspect active
$ celery inspect registered
$ celery inspect scheduled
Also if you are using Celery+RabbitMQ you can inspect the list of queues using the following command:
More info: https://linux.die.net/man/1/rabbitmqctl
$ sudo rabbitmqctl list_queues
if you are using rabbitMQ, use this in terminal:
sudo rabbitmqctl list_queues
it will print list of queues with number of pending tasks. for example:
Listing queues ...
0b27d8c59fba4974893ec22d478a7093 0
0e0a2da9828a48bc86fe993b210d984f 0
10#torob2.celery.pidbox 0
11926b79e30a4f0a9d95df61b6f402f7 0
15c036ad25884b82839495fb29bd6395 1
celerey_mail_worker#torob2.celery.pidbox 0
celery 166
celeryev.795ec5bb-a919-46a8-80c6-5d91d2fcf2aa 0
celeryev.faa4da32-a225-4f6c-be3b-d8814856d1b6 0
the number in right column is number of tasks in the queue. in above, celery queue has 166 pending task.
If you don't use prioritized tasks, this is actually pretty simple if you're using Redis. To get the task counts:
redis-cli -h HOST -p PORT -n DATABASE_NUMBER llen QUEUE_NAME
But, prioritized tasks use a different key in redis, so the full picture is slightly more complicated. The full picture is that you need to query redis for every priority of task. In python (and from the Flower project), this looks like:
PRIORITY_SEP = '\x06\x16'
DEFAULT_PRIORITY_STEPS = [0, 3, 6, 9]
def make_queue_name_for_pri(queue, pri):
"""Make a queue name for redis
Celery uses PRIORITY_SEP to separate different priorities of tasks into
different queues in Redis. Each queue-priority combination becomes a key in
redis with names like:
- batch1\x06\x163 <-- P3 queue named batch1
There's more information about this in Github, but it doesn't look like it
will change any time soon:
- https://github.com/celery/kombu/issues/422
In that ticket the code below, from the Flower project, is referenced:
- https://github.com/mher/flower/blob/master/flower/utils/broker.py#L135
:param queue: The name of the queue to make a name for.
:param pri: The priority to make a name with.
:return: A name for the queue-priority pair.
"""
if pri not in DEFAULT_PRIORITY_STEPS:
raise ValueError('Priority not in priority steps')
return '{0}{1}{2}'.format(*((queue, PRIORITY_SEP, pri) if pri else
(queue, '', '')))
def get_queue_length(queue_name='celery'):
"""Get the number of tasks in a celery queue.
:param queue_name: The name of the queue you want to inspect.
:return: the number of items in the queue.
"""
priority_names = [make_queue_name_for_pri(queue_name, pri) for pri in
DEFAULT_PRIORITY_STEPS]
r = redis.StrictRedis(
host=settings.REDIS_HOST,
port=settings.REDIS_PORT,
db=settings.REDIS_DATABASES['CELERY'],
)
return sum([r.llen(x) for x in priority_names])
If you want to get an actual task, you can use something like:
redis-cli -h HOST -p PORT -n DATABASE_NUMBER lrange QUEUE_NAME 0 -1
From there you'll have to deserialize the returned list. In my case I was able to accomplish this with something like:
r = redis.StrictRedis(
host=settings.REDIS_HOST,
port=settings.REDIS_PORT,
db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)
pickle.loads(base64.decodestring(json.loads(l[0])['body']))
Just be warned that deserialization can take a moment, and you'll need to adjust the commands above to work with various priorities.
To retrieve tasks from backend, use this
from amqplib import client_0_8 as amqp
conn = amqp.Connection(host="localhost:5672 ", userid="guest",
password="guest", virtual_host="/", insist=False)
chan = conn.channel()
name, jobs, consumers = chan.queue_declare(queue="queue_name", passive=True)
A copy-paste solution for Redis with json serialization:
def get_celery_queue_items(queue_name):
import base64
import json
# Get a configured instance of a celery app:
from yourproject.celery import app as celery_app
with celery_app.pool.acquire(block=True) as conn:
tasks = conn.default_channel.client.lrange(queue_name, 0, -1)
decoded_tasks = []
for task in tasks:
j = json.loads(task)
body = json.loads(base64.b64decode(j['body']))
decoded_tasks.append(body)
return decoded_tasks
It works with Django. Just don't forget to change yourproject.celery.
This worked for me in my application:
def get_celery_queue_active_jobs(queue_name):
connection = <CELERY_APP_INSTANCE>.connection()
try:
channel = connection.channel()
name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
active_jobs = []
def dump_message(message):
active_jobs.append(message.properties['application_headers']['task'])
channel.basic_consume(queue=queue_name, callback=dump_message)
for job in range(jobs):
connection.drain_events()
return active_jobs
finally:
connection.close()
active_jobs will be a list of strings that correspond to tasks in the queue.
Don't forget to swap out CELERY_APP_INSTANCE with your own.
Thanks to #ashish for pointing me in the right direction with his answer here: https://stackoverflow.com/a/19465670/9843399
The celery inspect module appears to only be aware of the tasks from the workers perspective. If you want to view the messages that are in the queue (yet to be pulled by the workers) I suggest to use pyrabbit, which can interface with the rabbitmq http api to retrieve all kinds of information from the queue.
An example can be found here:
Retrieve queue length with Celery (RabbitMQ, Django)
I think the only way to get the tasks that are waiting is to keep a list of tasks you started and let the task remove itself from the list when it's started.
With rabbitmqctl and list_queues you can get an overview of how many tasks are waiting, but not the tasks itself: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
If what you want includes the task being processed, but are not finished yet, you can keep a list of you tasks and check their states:
from tasks import add
result = add.delay(4, 4)
result.ready() # True if finished
Or you let Celery store the results with CELERY_RESULT_BACKEND and check which of your tasks are not in there.
As far as I know Celery does not give API for examining tasks that are waiting in the queue. This is broker-specific. If you use Redis as a broker for an example, then examining tasks that are waiting in the celery (default) queue is as simple as:
connect to the broker
list items in the celery list (LRANGE command for an example)
Keep in mind that these are tasks WAITING to be picked by available workers. Your cluster may have some tasks running - those will not be in this list as they have already been picked.
The process of retrieving tasks in particular queue is broker-specific.
I've come to the conclusion the best way to get the number of jobs on a queue is to use rabbitmqctl as has been suggested several times here. To allow any chosen user to run the command with sudo I followed the instructions here (I did skip editing the profile part as I don't mind typing in sudo before the command.)
I also grabbed jamesc's grep and cut snippet and wrapped it up in subprocess calls.
from subprocess import Popen, PIPE
p1 = Popen(["sudo", "rabbitmqctl", "list_queues", "-p", "[name of your virtula host"], stdout=PIPE)
p2 = Popen(["grep", "-e", "^celery\s"], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(["cut", "-f2"], stdin=p2.stdout, stdout=PIPE)
p1.stdout.close()
p2.stdout.close()
print("number of jobs on queue: %i" % int(p3.communicate()[0]))
If you control the code of the tasks then you can work around the problem by letting a task trigger a trivial retry the first time it executes, then checking inspect().reserved(). The retry registers the task with the result backend, and celery can see that. The task must accept self or context as first parameter so we can access the retry count.
#task(bind=True)
def mytask(self):
if self.request.retries == 0:
raise self.retry(exc=MyTrivialError(), countdown=1)
...
This solution is broker agnostic, ie. you don't have to worry about whether you are using RabbitMQ or Redis to store the tasks.
EDIT: after testing I've found this to be only a partial solution. The size of reserved is limited to the prefetch setting for the worker.
from celery.task.control import inspect
def key_in_list(k, l):
return bool([True for i in l if k in i.values()])
def check_task(task_id):
task_value_dict = inspect().active().values()
for task_list in task_value_dict:
if self.key_in_list(task_id, task_list):
return True
return False
With subprocess.run:
import subprocess
import re
active_process_txt = subprocess.run(['celery', '-A', 'my_proj', 'inspect', 'active'],
stdout=subprocess.PIPE).stdout.decode('utf-8')
return len(re.findall(r'worker_pid', active_process_txt))
Be careful to change my_proj with your_proj
To get the number of tasks on a queue you can use the flower library, here is a simplified example:
from flower.utils.broker import Broker
from django.conf import settings
def get_queue_length(queue):
broker = Broker(settings.CELERY_BROKER_URL)
queues_result = broker.queues([queue])
return queues_result.result()[0]['messages']