I found a rather interesting problem today. I have a queue with 2K messages in it.
The consumer: import rabbitpy
with rabbitpy.Connection('amqp://guest:guest#localhost:5672/%2f') as conn:
with conn.channel() as channel:
queue = rabbitpy.Queue(channel, 'example')
for message in queue.consume_messages():
message.ack()
takes 41 seconds to get these messages and ack them. (messages vary from 4kB to 52kB)
The publisher, however, took 15 seconds to publish them.
Upon profiling, I found that there is a call to sleep were we spend 86% of the time. This, to my application, is not acceptable. Could someone help me get rid of this sleep? (I'm ok if the CPU cycles are wasted or whatever till a message arrives.)
Zoomed in screenshot
Related
I have a large python script with a thread that listens to a serial port and puts new data to a queue whenever it's received. I've been trying to improve the performance of the script, as right now even when nothing is happening it's using ~ 12% of my Ryzen 3600 CPU. That seems very excessive.
Here's the serial listener thread:
def listen(self):
"""
Main listener
"""
while self.doListen:
# wait for a message
if self.bus.ser.in_waiting:
# Read rest of message
msg = self.bus.read(self.bus.ser.in_waiting)
# Put into queue
self.msgQueue.put_nowait(msg)
I profiled the script using yappi and found that the serial.in_waiting call seems to be hogging the majority of the cycles. See the KCachegrind screenshot below:
I tried the trick suggested in this question, doing a blocking read(1) call to wait for data. However read(1) just continuously returns empty data and never actually blocks (and yes, I've made sure my pyserial timeout is set to None)
Is there a more elegant and CPU-friendly way to wait for data on the serial bus? My messages are of variable length, so doing a blocking read(n) wouldn't work. They also don't end in newlines or any specific terminators, so readline() wouldn't work either.
Aaron's suggestion was great. A simple time.sleep(0.01) in my serial thread dramatically cut down on CPU usage. So far it looks like I'm not missing any messages either, which was my big fear with adding in sleeps.
The new listener thread:
def listen(self):
"""
Main listener
"""
while self.doListen:
# wait for a message
if self.bus.ser.in_waiting:
# Read rest of message
msg = self.bus.read(self.bus.ser.in_waiting)
# Put into queue
self.msgQueue.put_nowait(msg)
# give the CPU a break
time.sleep(0.01)
I'm struggling understanding a "weird" behavior of my simple script. Basically, it works as expected if time.sleep() is set as 60s but as soon as I put a value above 90 (90 is the limit apparently in my case), the loop doesn't work properly. I discovered this when I was trying to pause the script for 3 mins.
Here's my script
from gpiozero import CPUTemperature
import time
import paho.mqtt.client as mqtt #import the client1
import psutil
broker_address="192.168.1.17"
client = mqtt.Client("P1") #create new instance
client.connect(broker_address) #connect to broker
#time.sleep(60)
while True:
cpu = CPUTemperature()
print(cpu.temperature)
#a=cpu.temperature
#print(psutil.cpu_percent())
#print(psutil.virtual_memory()[2])
#print(a)
client.publish("test/message",cpu.temperature)
#client.publish("test/ram", psutil.virtual_memory()[2])
#client.publish("test/cpu", psutil.cpu_percent())
time.sleep(91)
In this case, with 91s it just prints the value of cpu.temperature every 91s, whereas with a value like 60s, besides printing, it also publishes the value via mqtt every cycle.
Am I doing something wrong here? Or for a longer sleep I need to change my code? I'm running this on a RaspberryPi.
Thanks in advance
EDIT:
I solved modifying the script, in particular how mqtt was handling the timing
here's the new script
mqttc=mqtt.Client("P1")
#mqttc.on_connect = onConnect
#mqttc.on_disconnect = onDisconnect
mqttc.connect("192.168.1.17", port=1883, keepalive=60)
mqttc.loop_start()
while True:
cpu = CPUTemperature()
print(cpu.temperature)
mqttc.publish("test/message",cpu.temperature)
time.sleep(300)
The MQTT client uses a network thread to handle a number of different aspects of the connection to the broker.
Firstly, it handles sending ping request to the broker in order to keep the connection alive. The default period for the keepalive period is 60 seconds. The connection will be dropped by the broker if it does not receive any messages in 1.5 times this value, which just happens to be 90 seconds.
Secondly, the thread handles any incoming messages that the client may have subscribed to.
Thirdly, if you try to publish a message that is bigger than the MTU of the network link, calling mqttc.publish() will only send the first packet and the loop is needed to send the rest of the payload.
There are 2 ways to run the network tasks.
As you have found, you can start a separate thread with the mqttc.loop_start()
The other option is to call mqttc.loop() within your own while loop
I am trying a basic hello world example with RabbitMQ in Python, and it is taking about 8 seconds to set up a basic blocking connection. This seems excessive, but this is my first experience with RabbitMQ, so my question is: is this normal? Can I reduce this time? Or should I look for another option? Here is my code:
import time
import pika
start = time.time()
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
end = time.time()
print "Elapsed time: %s" % (end-start)
channel = connection.channel()
channel.queue_declare(queue="hello")
channel.basic_publish(exchange="",
routing_key="hello",
body="Hello world!")
connection.close()
and my output is Elapsed time: 8.01042914391.
Thanks for the help!
[Edit] I have noticed that every time I run it, it takes almost exactly 8 seconds, to within .2%. I'm not sure if that means anything.
You need to configure channel configurations for inbound channel, outbound chanel, just like thread pool executor. Default values for these threads is 1 which might cause in delay under some load.
I have an application which fetches messages from a ZeroMQ publisher, using a PUB/SUB setup. The reader is slow sometimes so I set a HWM on both the sender and receiver. I expect that the receiver will fill the buffer and jump to catch up when it recovers from processing slowdowns. But the behavior that I observe is that it never drops! ZeroMQ seems to be ignoring the HWM. Am I doing something wrong?
Here's a minimal example:
publisher.py
import zmq
import time
ctx = zmq.Context()
sock = ctx.socket(zmq.PUB)
sock.setsockopt(zmq.SNDHWM, 1)
sock.bind("tcp://*:5556")
i = 0
while True:
sock.send(str(i))
print i
time.sleep(0.1)
i += 1
subscriber.py
import zmq
import time
ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.setsockopt(zmq.SUBSCRIBE, "")
sock.setsockopt(zmq.RCVHWM, 1)
sock.connect("tcp://localhost:5556")
while True:
print sock.recv()
time.sleep(0.5)
I believe there are a couple things at play here:
High Water Marks are not exact (see the last paragraph in the linked section) - typically this means the real queue size will be smaller than your listed number, I don't know how this will behave at 1.
Your PUB HWM will never drop messages... due to the way PUB sockets work, it will always immediately processes the message whether there is an available subscriber or not. So unless it actually takes ZMQ .1 seconds to process the message through the queue, your HWM will never come into play on the PUB side.
What should be happening is something like the following (I'm assuming an order of operations that would allow you to actually receive the first published message):
Start up subscriber.py & wait a suitable period to make sure it's completely spun up (basically immediately)
Start up publisher.py
PUB processes and sends the first message, SUB receives and processes the first message
PUB sleeps for .1 seconds and processes & sends the second message
SUB sleeps for .5 seconds, the socket receives the second message but sits in queue until the next call to sock.recv() processes it
PUB sleeps for .1 seconds and processes & sends the third message
SUB is still sleeping for another .3 seconds, so the third message should hit the queue behind the second message, which would make 2 messages in the queue, and the third one should drop due to the HWM
... etc etc etc.
I suggest the following changes to help troubleshoot the issue:
Remove the HWM on your publisher... it does nothing but add a variable we don't need to deal with in your test case, since we never expect it to change anything. If you need it for your production environment, add it back in and test it in a high volume scenario later.
Change the HWM on your subscriber to 50. It'll make the test take longer, but you won't be at the extreme edge case, and since the ZMQ documentation states that the HWM isn't exact, the extreme edge cases could cause unexpected behavior. Mind you, I believe your test (being small numbers) wouldn't do that, but I haven't looked at the code implementing the queues so I can't say with certainty, and it may be possible that your data is small enough that your effective HWM is actually larger.
Change your subscriber sleep time to 3 full seconds... in theory, if your queue holds up to exactly 50 messages, you'll saturate that within two loops (just like you do now), and then you'll have to wait 2.5 minutes to work through those messages to see if you start getting skips, which after the first 50 messages should start jumping large groups of numbers. But I'd wait at least 5-10 minutes. If you find that you start skipping after 100 or 200 messages, then you're being bitten by the smallness of your data.
This of course doesn't address what happens if you still don't skip any messages... If you do that and still experience the same issue, then we may need to dig more into how high water marks actually work, there may be something we're missing.
I met exactly the same problem, and my demo is nearly the same with yours, the subscriber or publisher won't drop any message after either zmq.RCVHWM or zmq.SNDHWM is set to 1.
I walk around after referring to the suicidal snail pattern for slow subscriber detection in Chap.5 of zguide. Hope it helps.
BTW: would you please let me know if you've solved the bug of zmq.HWM ?
I've gotten Celery tasks happening ok, using the default settings in the tutorials and rabbitmq running on ubuntu. All is fine when I schedule a task with no delay, but when I give them an eta, they get scheduled in the future as if my clock is off somewhere.
Here is some python code that is asking for tasks:
for index, to_address in enumerate(email_addresses):
# schedule one email every two seconds
delay = index * 2
log.info("MessageUsersFormView.process_action() scheduling task,"
"email to %s, countdown = %i" % (to_address, delay) )
tasks.send_email.apply_async(args=[to_address, subject, body],
countdown = delay)
So the first one should go out immediately, and then every two seconds. Looking at my celery console, the first one happens immediately, and then the others are scheduled two seconds apart, but starting tomorrow:
[2012-03-09 17:32:40,988: INFO/MainProcess] Got task from broker: stabil.tasks.send_email[24fafc0b-071b-490b-a808-29d47bbee435]
[2012-03-09 17:32:40,989: INFO/MainProcess] Got task from broker: stabil.tasks.send_email[3eb6c3ea-2c84-4368-babe-8a2ac0093836] eta:[2012-03-10 01:32:42.971072-08:00]
[2012-03-09 17:32:40,991: INFO/MainProcess] Got task from broker: stabil.tasks.send_email[a53110d6-b704-4d9c-904a-8d74b99a33af] eta:[2012-03-10 01:32:44.971779-08:00]
[2012-03-09 17:32:40,992: INFO/MainProcess] Got task from broker: stabil.tasks.send_email[2363329b-47e7-4edd-b38e-b09fed232003] eta:[2012-03-10 01:32:46.972422-08:00]
I'm totally new to both Celery and RabbitMQ so any tips on how to fix this or where to look for the cause would be great. This is on a VMWare virtual machine of Ubuntu, but I have the clock set correctly.
Thanks!
I think it is actually working as you expect. The time on the left (between the square brackets and before INFO/MainProcess) is presented in local time, but the eta time is shown as UTC time. For instance:
Take the ETA time presented in the second line of your console output:
2012-03-10 01:32:42.971072-08:00
Subtract 8 hours (-08:00 is the timezone offset) and you get:
2012-03-09 17:32:42.971072
Which is just 2 seconds after the sent time:
2012-03-09 17:32:40,989
I hope that makes sense. Dealing with times often gives me a headache.