I'm working on a project that needs to control a sending queue from code. So I'm curious: has anybody created a queue in RabbitMQ from Python/Django code? :)
The usual Python clients should work from Django (but beware: you may need to block the request while you're running AMQP commands). Take a look at the RabbitMQ tutorials:
http://www.rabbitmq.com/getstarted.html
https://github.com/rabbitmq/rabbitmq-tutorials
There are at least three Python clients: python-amqplib, pika and puka.
Also, you may find www.celeryproject.org useful.
In AMQP, you don't create a queue. Instead, you declare a queue, and if the queue doesn't already exist, then it is created.
In some cases all you need to do is to declare the queue in the processes that consume messages. But if you want persistent and durable queues then it is best to declare them beforehand with a shell script, or in the message publisher. Even if the message publisher does not do anything with the queue, it can still declare it to ensure that messages from the exchange are never dropped.
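For example, with the pika client a declaration from Python looks roughly like this (a minimal sketch; the queue name 'orders' is just a placeholder):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declaring is idempotent: the queue is created if it does not exist,
# otherwise the declaration is checked against the existing queue.
channel.queue_declare(queue='orders', durable=True)

connection.close()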
I'm aware that Pika is not thread safe. I was trying to work around that by using a lock to guard access to the channel, but I still get this error:
pika.exceptions.ConnectionClosed: (505, 'UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead')
P.S. I cannot use a different channel.
What could I do? Thanks in advance for any help.
You need to redesign your application or choose a RabbitMQ library other than Pika. Locks do not make Pika thread safe; each thread needs to have a separate connection.
You have a couple of options, but none of them will be as simple as using a lock.
One would be to replace Pika with Kombu. Kombu is thread safe but the interface is rather different from Pika (simpler in my opinion but this is subjective).
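As a rough sketch of what a Kombu consumer looks like (the queue name, broker URL, and do_work() here are hypothetical):

from kombu import Connection, Queue

tasks = Queue('tasks', durable=True)

def handle(body, message):
    do_work(body)   # your business logic (hypothetical)
    message.ack()

with Connection('amqp://guest:guest@localhost//') as conn:
    with conn.Consumer(tasks, callbacks=[handle]):
        while True:
            conn.drain_events()  # blocks until a message arrives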
If you want to keep using Pika, then you need to redesign your Rabbit interface. I do not know why you "cannot" use a different channel, but one possible design, sketched below, is to have a single thread interface with Rabbit, and have that thread exchange data with worker threads via queues. The Rabbit thread reads data, hands each received message to a worker through one queue, receives answers from the workers on another queue, and then submits them to Rabbit as responses.
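A minimal sketch of that layout, assuming the modern Pika API, hypothetical queue names, a fixed pool of four workers, and a do_work() placeholder for the business logic:

import queue
import threading
import pika

jobs = queue.Queue()     # Rabbit thread -> workers
results = queue.Queue()  # workers -> Rabbit thread

def worker():
    while True:
        delivery_tag, body = jobs.get()
        results.put((delivery_tag, do_work(body)))  # do_work() is hypothetical

for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

def on_message(ch, method, properties, body):
    jobs.put((method.delivery_tag, body))

channel.basic_consume(queue='tasks', on_message_callback=on_message)

while True:
    # Drive the consumer briefly, then drain finished results; every Pika
    # call stays on this one thread, so thread safety is never an issue.
    connection.process_data_events(time_limit=0.1)
    while True:
        try:
            tag, result = results.get_nowait()
        except queue.Empty:
            break
        channel.basic_publish(exchange='', routing_key='responses', body=result)
        channel.basic_ack(delivery_tag=tag)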
You might also be able to untangle something in your communications protocol so that you actually can use a different channel, letting each thread interface with Rabbit independently through its own connection and channel. This is the method I generally use.
Yet another candidate would be to get rid of threads and start using async methods instead. Your application may or may not be suitable for this.
But there is no simple workaround, and you will eventually encounter weird behaviour or exceptions if you try to share Pika objects between threads.
I am writing a consumer that needs to consume from two different queues:
1 -> for the actual messages (the queue is declared beforehand).
2 -> for command messages that control the consumer's behaviour (declared dynamically by the consumer and bound to an existing exchange with a routing key in a specific format; one is needed for each running consumer instance).
I am using a SelectConnection to consume asynchronously.
self.channel.basic_qos(prefetch_count=self.prefetch_count)
log.info("Establishing channel with the Queue: " + self.commandQueue)
print "declaring command queue"
self.channel.queue_declare(queue=self.commandQueue,
                           durable=True,
                           exclusive=False,
                           auto_delete=True,
                           callback=self.on_command_queue_declared)
The queue is not being declared, or the callback is not getting called.
On the other hand, the messages from the actual message queue are no longer being consumed since I added this block of code.
The Pika logs do not show any errors, and the consumer app does not crash.
Does anybody know why this is happening, or is there a better way to do this?
Have you looked at the example here: http://pika.readthedocs.org/en/latest/examples/asynchronous_consumer_example.html ?
And some blocking examples:
http://pika.readthedocs.org/en/latest/examples/blocking_consume.html
http://pika.readthedocs.org/en/latest/examples/blocking_consumer_generator.html
Blocking and Select connection comparison: http://pika.readthedocs.org/en/latest/examples/comparing_publishing_sync_async.html
Blocking and Select connections in pika 0.10.0 pre-release are faster and there are a number of bug fixes in that version.
I have an SQS queue that is constantly being populated by a data consumer and I am now trying to create the service that will pull this data from SQS using Python's boto.
The way I designed it is that I will have 10-20 threads all trying to read messages from the SQS queue and then doing what they have to do on the data (business logic), before going back to the queue to get the next batch of data once they're done. If there's no data they will just wait until some data is available.
There are two areas of this design I'm not sure about:
Is it a matter of calling receive_message() with a long timeout value and, if nothing is returned within the 20 seconds (the maximum allowed), just retrying? Or is there a blocking method that returns only once data is available?
I noticed that once I receive a message, it is not deleted from the queue. Do I have to send another request after receiving a message in order to delete it? That seems like a bit of overkill.
Thanks
The long-polling capability of the receive_message() method is the most efficient way to poll SQS. If it returns without any messages, I would recommend a short delay before retrying, especially if you have multiple readers. You may even want to use an incremental delay, so that each subsequent empty read waits a bit longer, just so you don't end up getting throttled by AWS.
And yes, you do have to delete the message after you have read it, or it will reappear in the queue. This can actually be very useful in the case of a worker reading a message and then failing before it can fully process it: in that case, the message would be re-queued and read by another worker. You also want to make sure the invisibility timeout of the messages is set to be long enough that the worker has enough time to process the message before it automatically reappears on the queue. If necessary, your workers can adjust the timeout as they are processing if it is taking longer than expected.
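The question uses boto; here is a minimal sketch of the same loop written with boto3 (the queue URL and process() are hypothetical placeholders):

import time
import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'  # hypothetical

delay = 0
while True:
    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)  # long polling, 20s maximum
    messages = resp.get('Messages', [])
    if not messages:
        time.sleep(delay)
        delay = min(delay + 1, 30)  # incremental back-off on empty reads
        continue
    delay = 0
    for msg in messages:
        process(msg['Body'])  # your business logic (hypothetical)
        # Delete only after successful processing, so a crashed worker's
        # message becomes visible again and is retried by another reader.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg['ReceiptHandle'])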
If you want a simple way to set up a listener that includes automatic deletion of messages when they're finished being processed, and automatic pushing of exceptions to a specified queue, you can use the pySqsListener package.
You can set up a listener like this:
from sqs_listener import SqsListener

class MyListener(SqsListener):
    def handle_message(self, body, attributes, messages_attributes):
        run_my_function(body['param1'], body['param2'])

listener = MyListener('my-message-queue', 'my-error-queue')
listener.listen()
There is a flag to switch from short polling to long polling - it's all documented in the README file.
Disclaimer: I am the author of said package.
Another option is to set up a worker application using AWS Elastic Beanstalk, as described in this blog post.
Instead of long polling with boto3, your Flask application receives the message as a JSON object in an HTTP POST. The HTTP path and the type of message being sent are configurable in the AWS Elastic Beanstalk Configuration tab.
AWS Elastic Beanstalk has the added benefit of being able to dynamically scale the number of workers as a function of the size of your SQS queue, along with its deployment management benefits.
This is an example application that I found useful as a template.
I'm using the threading module to control threads that send data through sockets and so on, but I can't find a suitable way to pass data into a thread to work with. I've tried things such as overriding threading.Thread.run(), but can't seem to get it working. If anyone has any suggestions, I'd be happy to try anything :)
Thanks !
You are thinking about this backwards. Forget about the fact that it happens to be a thread that's sending the data through the sockets. The data doesn't need to get to the thread, it needs to get to the logic that sends data on the socket.
For example, you can have a queue that holds things that need to be sent through the socket. The socket write code pulls messages from the queue and sends them out the socket. The other code puts messages on this queue. The code that needs to send messages to the socket shouldn't know or care that there happens to be a thread that does the sending.
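A minimal sketch of this pattern (the host, port, and message framing are hypothetical; only the writer thread ever touches the socket):

import queue
import socket
import threading

outgoing = queue.Queue()

def send_message(data):
    outgoing.put(data)  # callers never touch the socket or the thread

def writer():
    sock = socket.create_connection(('example.com', 9000))  # hypothetical endpoint
    while True:
        data = outgoing.get()
        if data is None:   # sentinel: shut the writer down
            break
        sock.sendall(data)
    sock.close()

threading.Thread(target=writer, daemon=True).start()
send_message(b'hello')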
Use message queues for this. Python has the Queue module (queue in Python 3) for passing data between threads, but if you use a third-party library like 0MQ (http://www.zeromq.org) instead, you can split the threads into separate processes and it will work the same way.
Multiprocessing is easier to do than threading, but if you have to use threading, avoid locking and sharing data as much as you can. Instead use a prewritten module like Queue to limit the ways in which subtle bugs can arise.
The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the submissions in JSON form through the RabbitMQ message broker.
I have tried several strategies, and the best so far is the following, which is still not fully working:
Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.
I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses in channel [B], and the consumer waits for requests on channel [A] and send responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so I have only one task executed at each consumer at each time.
What I need in the end:
the producer submits its tasks (around 5k each time) to the cluster
the broker dispatches N messages/requests for each consumer, where N is the number of concurrent tasks it can handle
when a single task is finished, the consumer replies to the broker/producer with the result
the producer receives the replies, updates the computation status and, in the end, prints some reports
Restrictions:
If another user submits work, all of their tasks will be queued after the previous user's (I guess this follows automatically from the queue system, but I haven't thought about the implications in a threaded environment)
Tasks have an order in which they must be submitted, but the order in which they are answered is not important
UPDATE
I have studied a bit further, and my actual problem seems to be that I pass a simple function as the callback to Pika's SelectConnection.channel.basic_consume() function. My latest (unimplemented) idea is to pass a threading function instead of a regular one, so the callback would not block and the consumer could keep listening.
As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this depending on what your callback does.
If your callback is IO-bound (doing lots of networking or disk IO) you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one piece of python code is ever running in a single python process. This means that if you are doing lots of computation with python code, these solutions will likely not be much faster than what you already have.
Another option would be to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful when doing parallel work. You could implement this by either using a Queue, having the parent process being the consumer and farming out work to its children, or by simply starting up multiple processes which each consume on their own. I would suggest, unless your application is highly concurrent (1000s of workers), to simply start multiple workers, each of which consumes from their own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than simply losing the request.
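A sketch of the "multiple workers, each with its own connection" variant, using the modern Pika API (the queue name and handle() are hypothetical):

import multiprocessing
import pika

def consume(worker_id):
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=1)  # one task per worker at a time

    def on_message(ch, method, properties, body):
        handle(body)  # your business logic (hypothetical)
        # Ack only after the work is done, so a dead worker's message
        # is re-queued and picked up by another worker.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='tasks', on_message_callback=on_message)
    channel.start_consuming()

if __name__ == '__main__':
    for i in range(4):
        multiprocessing.Process(target=consume, args=(i,)).start()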
A last option, if you control the producer and it is also written in Python, is to use a task library like celery to abstract the task/queue workings for you. I have used celery for several large projects and have found it to be very well written. It will also handle the multiple consumer issues for you with the appropriate configuration.
Your setup sounds good to me. And you are right, you can simply set the callback to start a thread and chain that to a separate callback when the thread finishes to queue the response back over Channel B.
Basically, your consumers should have a queue of their own (of size N, the amount of parallelism they support). When a request comes in via Channel A, it should be stored in the queue shared between the main thread running Pika and the worker threads in the thread pool. As soon as it is queued, Pika should respond with an ACK, and a worker thread would wake up and start processing.
Once the worker is done with its work, it would queue the result back on a separate result queue and issue a callback to the main thread to send it back to the consumer.
You should take care and make sure that the worker threads are not interfering with each other if they are using any shared resources, but that's a separate topic.
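One possible wiring for this pattern, sketched with a recent Pika version (add_callback_threadsafe appeared in 0.12), hypothetical queue names, and a do_work() placeholder; here prefetch_count bounds the parallelism rather than an explicit thread pool, and the worker threads never touch Pika objects directly:

import functools
import threading
import pika

def worker(connection, channel, delivery_tag, body):
    result = do_work(body)  # your business logic (hypothetical)
    # Pika objects must only be used from the connection's thread, so
    # schedule the publish/ack back onto it instead of calling directly.
    cb = functools.partial(publish_result, channel, delivery_tag, result)
    connection.add_callback_threadsafe(cb)

def publish_result(channel, delivery_tag, result):
    channel.basic_publish(exchange='', routing_key='results', body=result)
    channel.basic_ack(delivery_tag=delivery_tag)

def on_message(channel, method, properties, body, connection):
    threading.Thread(target=worker,
                     args=(connection, channel, method.delivery_tag, body)).start()

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_qos(prefetch_count=4)  # N = number of concurrent tasks
channel.basic_consume(queue='tasks',
                      on_message_callback=functools.partial(on_message,
                                                            connection=connection))
channel.start_consuming()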
Being inexperienced in threading, my setup would run multiple consumer processes (the number of which basically being your prefetch count). Each would connect to the two queues and happily process jobs, unaware of each other's existence.