I want to share a BlockingChannel across multiple Python processes so that I can send basic_ack from a process other than the one that received the message.
How can I share the BlockingChannel across multiple Python processes?
The channel is created with the following code:
self.__connection__ = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
self.__channel__ = self.__connection__.channel()
I tried to dump the channel using pickle, but pickle cannot serialize it and fails with the error "can't pickle select.epoll objects".
This is the code I used:
filepath = "temp/" + "merger_channel.sav"
pickle.dump(self.__channel__, open(filepath, 'wb'))
GOAL:
The goal is to send basic_ack on this channel from other Python processes.
It is an antipattern to share a channel between multiple threads and it's quite unlikely you will manage to share it between processes.
The rule of thumb is 1 connection per process and 1 channel per thread.
You can read more on this matter at the following links:
13 common RabbitMQ mistakes
RabbitMQ best practices
This SO thread gives an in-depth analysis of RabbitMQ and concurrent consumption
If you want to pair message consumption together with multiprocessing, the usual pattern is to let the main process receive the messages, deliver their payload to a pool of worker processes and acknowledge them once they are done.
Simple example using pika.BlockingChannel and concurrent.futures.ProcessPoolExecutor:
def ack_message(channel, delivery_tag, _future):
    """Called once the message has been processed.

    Acknowledge the message to RabbitMQ.
    """
    channel.basic_ack(delivery_tag=delivery_tag)

for message in channel.consume(queue='example'):
    method, properties, body = message
    future = pool.submit(process_message, body)
    # use partial to pass channel and delivery tag to the callback function
    ack_message_callback = functools.partial(ack_message, channel, method.delivery_tag)
    future.add_done_callback(ack_message_callback)
The above loop will endlessly consume messages from the example queue and submit them to the pool of processes. You can control how many messages are processed concurrently via the RabbitMQ consumer prefetch setting. See the channel's basic_qos method in pika to set it from Python.
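As a rough sketch of how the prefetch limit and the acknowledgement could fit together, the variant below sets basic_qos before consuming. It also hands the ack back to the connection's own thread through add_callback_threadsafe (available on BlockingConnection in recent pika releases), because executor done-callbacks usually run in a helper thread and pika channels are not thread-safe. The names process_message, schedule_ack and consume_with_prefetch are made up for illustration; connection, channel and pool are assumed to be created by the caller.

```python
import functools


def process_message(body):
    """Hypothetical worker function; replace with the real processing."""
    return body.upper()


def schedule_ack(connection, channel, delivery_tag, _future):
    """Done-callback: ask the connection to send the ack from its own thread.

    The callback itself may run in a helper thread of the executor, so it
    must not touch the channel directly.
    """
    connection.add_callback_threadsafe(
        functools.partial(channel.basic_ack, delivery_tag=delivery_tag))


def consume_with_prefetch(connection, channel, pool, queue, prefetch_count):
    """Consume `queue`, keeping at most `prefetch_count` unacked messages."""
    # The broker stops delivering once this many messages are unacknowledged.
    channel.basic_qos(prefetch_count=prefetch_count)
    for method, _properties, body in channel.consume(queue=queue):
        future = pool.submit(process_message, body)
        future.add_done_callback(functools.partial(
            schedule_ack, connection, channel, method.delivery_tag))
```

A prefetch_count equal to the number of worker processes keeps every worker busy without letting a backlog of unacknowledged messages pile up in the main process.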
Architecture
Consider a system with DB records. Each record can be in a live or expired status; live records should be processed periodically using an external software module.
I have solved this using a classic producer - consumer architecture with Kombu and RabbitMQ. The producer fetches the records from the DB every few seconds, and the consumer handles them.
The problem
The number of live events varies greatly, and at peak hours the consumer can't handle the load and the queue gets clogged with thousands of items.
I would like to make the system adaptive, so that the producer sends new events to the consumer only when the queue is empty.
What I have tried
Searching the Kombu documentation / API
Inspecting the Queue object
Using the RabbitMQ REST API: http://<host>:<port>/api/queues/<vhost>/<queue_name>. It works, but it's yet another mechanism to maintain, and I would prefer an elegant solution within Kombu.
How do I check whether a RabbitMQ queue is empty using Python's Kombu?
You can call queue_declare() on the kombu Queue object.
According to the docs the function returns:
Returns a tuple containing 3 items:
the name of the queue (essential for automatically-named queues)
message count
consumer count
Therefore you can do:
name, msg_count, consumer_count = queue.queue_declare()
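To tie this back to the adaptive producer from the question, here is a minimal sketch. It assumes the Queue is already bound to a channel and passes passive=True so the call only inspects the queue instead of (re)declaring it. queue_is_empty and publish_if_idle are hypothetical helper names, and publish stands for whatever function the producer already uses to send one record.

```python
def queue_is_empty(bound_queue):
    """Return True when the broker reports no ready messages in the queue.

    Assumes `bound_queue` is a kombu Queue bound to a channel, for example
    Queue("records", channel=connection.channel()).
    """
    # passive=True only queries the queue; it fails if the queue is missing.
    _name, message_count, _consumer_count = bound_queue.queue_declare(passive=True)
    return message_count == 0


def publish_if_idle(bound_queue, publish, records):
    """Hypothetical producer step: send a batch only if the queue is drained.

    Returns the number of records that were published.
    """
    if not queue_is_empty(bound_queue):
        return 0
    for record in records:
        publish(record)
    return len(records)
```

Note that the message count reported by the broker covers messages waiting in the queue, not messages already delivered to a consumer and still unacknowledged, so a low consumer prefetch makes this check more meaningful.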
I am writing a consumer that needs to consume from two different queues:
1-> one for the actual messages (queue declared beforehand).
2-> one for command messages that control the behavior of the consumer (declared dynamically by the consumer and bound to an existing exchange with a routing key in a specific format; one is needed for each running consumer instance).
I am using SelectConnection to consume asynchronously.
self.channel.basic_qos(prefetch_count=self.prefetch_count)
log.info("Establishing channel with the Queue: " + self.commandQueue)
print("declaring command queue")
self.channel.queue_declare(queue=self.commandQueue,
                           durable=True,
                           exclusive=False,
                           auto_delete=True,
                           callback=self.on_command_queue_declared)
Either the queue is not being declared or the callback is not being called.
In addition, messages from the actual message queue have stopped being consumed since I added this block of code.
The pika logs do not show any errors, and the consumer app does not crash.
Does anybody know why this is happening, or is there a better way to do this?
Have you looked at the example here: http://pika.readthedocs.org/en/latest/examples/asynchronous_consumer_example.html ?
And some blocking examples:
http://pika.readthedocs.org/en/latest/examples/blocking_consume.html
http://pika.readthedocs.org/en/latest/examples/blocking_consumer_generator.html
Blocking and Select connection comparison: http://pika.readthedocs.org/en/latest/examples/comparing_publishing_sync_async.html
Blocking and Select connections in pika 0.10.0 pre-release are faster and there are a number of bug fixes in that version.
environment:
python, pika, RabbitMQ.
I have a queue that already holds about 100 messages.
When two consumer applications are started one after the other, all the preexisting messages are processed by the first consumer instead of being distributed between the two consumers that are up and waiting for messages.
However, any new messages put on the queue are distributed between both consumers.
The problem is that if a consumer takes a long time to process a message, the whole load stays on one consumer until the preexisting messages have been consumed.
But if Consumer1 is killed, the messages get delivered to Consumer2 (which is expected).
I am using SelectConnection,
prefetch_count=(tried both 0 and 1),
prefetch_size = 0,
no_ack = False,
Is there a way to configure it so that the preexisting messages on the queue are shared across multiple consumers even if the consumers are started at different times (for example, adding more consumers as the load increases)?
Any help is appreciated.
Thank you.
I was able to fix it by moving the basic_qos call, which sets the prefetch count to 1, into the on_channel_create callback method.
For some reason, setting the prefetch value to 1 just before basic_consume is not good enough. It must have something to do with pika's IO loop.
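A minimal sketch of that ordering, assuming the pika 1.x signatures for basic_qos and basic_consume (older releases use different argument names). One plausible reading, not confirmed anywhere in this thread, is that deliveries started before the prefetch limit took effect; waiting for the QoS confirmation before consuming rules that out. make_on_channel_open and on_message are names invented for this example.

```python
def make_on_channel_open(queue, on_message, prefetch_count=1):
    """Build a channel-open callback that applies QoS before consuming."""

    def on_channel_open(channel):
        def on_qos_ok(_frame):
            # Start consuming only after the broker confirmed the prefetch
            # limit, so no delivery can arrive under the old (unlimited) setting.
            channel.basic_consume(queue=queue, on_message_callback=on_message)

        channel.basic_qos(prefetch_count=prefetch_count, callback=on_qos_ok)

    return on_channel_open
```

With a SelectConnection this would typically be wired up as connection.channel(on_open_callback=make_on_channel_open("tasks", on_message)), where "tasks" is a placeholder queue name.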
I want to create a RabbitMQ receiver/consumer in Python and am not sure how to check for messages. I am trying to do this in my own loop, not using the callbacks in pika.
If I understand things correctly, in the Java client I can use basicGet() to check whether any messages are available without blocking. I don't mind blocking while a message is being fetched, but I don't want to block until a message arrives.
I haven't found any clear examples and haven't yet figured out the corresponding call in pika.
If you want to do it synchronously, then you will need to look at pika's BlockingConnection.
The BlockingConnection creates a layer on top of Pika’s asynchronous
core providing methods that will block until their expected response
has returned. Due to the asynchronous nature of the Basic.Deliver and
Basic.Return calls from RabbitMQ to your application, you are still
required to implement continuation-passing style asynchronous methods
if you’d like to receive messages from RabbitMQ using basic_consume or
if you want to be notified of a delivery failure when using
basic_publish.
More info and an example here
https://pika.readthedocs.org/en/0.9.12/connecting.html#blockingconnection
You can periodically check the queue size using the example in this answer: Get Queue Size in Pika (AMQP Python).
The queue processing loop can be written iteratively with the help of process_data_events():
import pika


# A stubborn callback that still wants to be in the code.
def mq_callback(ch, method, properties, body):
    print(" Received: %r" % body)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
queue_state = channel.queue_declare(queue="test")

# Configure a callback.
# This is the pika < 1.0 signature; pika >= 1.0 expects
# basic_consume(queue="test", on_message_callback=mq_callback).
channel.basic_consume(mq_callback, queue="test")

try:
    # My own loop here:
    while True:
        # Do other processing

        # Process message queue events, returning as soon as possible.
        # Issues mq_callback() when applicable.
        connection.process_data_events(time_limit=0)
finally:
    connection.close()
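If you would rather not register a callback at all, the closest counterpart to the Java client's basicGet is basic_get on the blocking channel. The sketch below assumes pika 1.0 or later, where basic_get takes auto_ack and returns (None, None, None) when the queue is empty. poll_once and handle are names made up for this example.

```python
def poll_once(channel, queue, handle):
    """Fetch at most one message without waiting for one to arrive.

    Returns True if a message was handled, False if the queue was empty.
    """
    method, _properties, body = channel.basic_get(queue=queue, auto_ack=False)
    if method is None:
        return False  # nothing waiting in the queue right now
    handle(body)
    # Acknowledge only after the handler returned without raising.
    channel.basic_ack(delivery_tag=method.delivery_tag)
    return True
```

Calling poll_once(channel, "test", print) once per iteration of your own loop keeps the loop in control. The trade-off is one broker round trip per poll, which is usually less efficient than basic_consume for busy queues.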
The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the subscriptions in JSON form through the RabbitMQ message broker.
I have tried several strategies, and the best so far is the following, which is still not fully working:
Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.
I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses on channel [B], and the consumer waits for requests on channel [A] and sends responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so only one task is executed on each consumer at a time.
What I need in the end:
the consumer [A] subscribes his tasks (around 5k each time) to the cluster
the broker dispatches N messages/requests for each consumer, where N is the number of concurrent tasks it can handle
when a single task is finished, the consumer replies to the broker/producer with the result
the producer receives the replies, updates the computation status and, in the end, prints some reports
Restrictions:
If another user submits work, all of his tasks will be queued after the previous user (I guess this is automatically true from the queue system, but I haven't thought about the implications on a threaded environment)
Tasks have an order to be submitted, but the order they are replied is not important
UPDATE
I have studied this a bit further, and my actual problem seems to be that I use a simple function as the callback for pika's SelectConnection.channel.basic_consume() function. My last (unimplemented) idea is to pass a function that starts a thread, instead of a regular one, so the callback would not block and the consumer could keep listening.
As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this depending on what your callback does.
If your callback is IO-bound (doing lots of networking or disk IO) you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one thread executes Python bytecode at any given time within a single Python process. If your callback does a lot of computation in Python code, these solutions will therefore likely not be much faster than what you already have.
Another option would be to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful when doing parallel work. You could implement this by either using a Queue, having the parent process being the consumer and farming out work to its children, or by simply starting up multiple processes which each consume on their own. I would suggest, unless your application is highly concurrent (1000s of workers), to simply start multiple workers, each of which consumes from their own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than simply losing the request.
A last option, if you control the producer and it is also written in Python, is to use a task library like Celery to abstract the task/queue workings for you. I have used Celery for several large projects and have found it to be very well written. It will also handle the multiple-consumer issues for you with the appropriate configuration.
Your setup sounds good to me. And you are right, you can simply set the callback to start a thread and chain that to a separate callback when the thread finishes to queue the response back over Channel B.
Basically, your consumers should have a queue of their own (of size N, the amount of parallelism they support). When a request comes in via Channel A, the consumer should store the request in the queue shared between the main thread running Pika and the worker threads in the thread pool. As soon as it is queued, pika should respond with an ACK, and a worker thread would wake up and start processing.
Once the worker is done with its work, it would put the result on a separate result queue and issue a callback to the main thread to send it back to the producer.
You should take care and make sure that the worker threads are not interfering with each other if they are using any shared resources, but that's a separate topic.
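The thread hand-off described above can be sketched with the standard library alone. This is only an outline under simplifying assumptions: the pika parts are left out, compute stands in for the real task, and run_workers feeds a fixed list of requests where the real consumer would enqueue from its message callback.

```python
import queue
import threading


def compute(request):
    """Hypothetical task; replace with the real computation."""
    return request * 2


def worker(requests, results):
    """Take requests until a None sentinel arrives, pushing results out."""
    while True:
        request = requests.get()
        if request is None:
            break
        results.put((request, compute(request)))


def run_workers(incoming, n_workers):
    """Process `incoming` with `n_workers` threads; return (request, result) pairs."""
    requests = queue.Queue(maxsize=n_workers)  # bounded: N tasks in flight
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for request in incoming:  # real consumer: done from the message callback
        requests.put(request)
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    # Real consumer: the main thread would publish these on channel B.
    return [results.get() for _ in range(results.qsize())]
```

For example, run_workers([1, 2, 3], n_workers=2) returns the three (request, result) pairs in whatever order the workers finished. Because pika channels are not thread-safe, only the main thread should publish the results.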
Being inexperienced in threading, I would set things up to run multiple consumer processes (their number basically being your prefetch count). Each would connect to the two queues and process jobs happily, unaware of each other's existence.