Redis: number of channels degrading latency. How to prevent degradation? (Python)

pub.py
import redis
import datetime
import time
import json
import sys
import threading
import gevent
from gevent import monkey
monkey.patch_all()


def main(chan):
    redis_host = '10.235.13.29'
    r = redis.client.StrictRedis(host=redis_host, port=6379)
    while True:
        def getpkg():
            package = {'time': time.time(),
                       'signature': 'content'
                       }
            return package

        # test 2: complex data
        now = json.dumps(getpkg())
        # send it
        r.publish(chan, now)
        print 'Sending {0}'.format(now)
        print 'data type is %s' % type(now)
        time.sleep(1)


def zerg_rush(n):
    for x in range(n):
        t = threading.Thread(target=main, args=(x,))
        t.setDaemon(True)
        t.start()


if __name__ == '__main__':
    num_of_chan = 10
    zerg_rush(num_of_chan)
    cnt = 0
    stop_cnt = 21
    while True:
        print 'Waiting'
        cnt += 1
        if cnt == stop_cnt:
            sys.exit(0)
        time.sleep(30)
sub.py
import redis
import threading
import time
import json
import gevent
from gevent import monkey
monkey.patch_all()


def callback(ind):
    redis_host = '10.235.13.29'
    r = redis.client.StrictRedis(host=redis_host, port=6379)
    sub = r.pubsub()
    sub.subscribe(str(ind))
    start = False
    avg = 0
    tot = 0
    sum = 0
    while True:
        for m in sub.listen():
            if not start:
                start = True
                continue
            got_time = time.time()
            decoded = json.loads(m['data'])
            sent_time = float(decoded['time'])
            dur = got_time - sent_time
            tot += 1
            sum += dur
            avg = sum / tot
            print decoded  # 'Received: {0}'.format(m['data'])
            file_name = 'logs/sub_%s' % ind
            f = open(file_name, 'a')
            f.write('processing no. %s' % tot)
            f.write('it took %s' % dur)
            f.write('current avg: %s\n' % avg)
            f.close()


def zerg_rush(n):
    for x in range(n):
        t = threading.Thread(target=callback, args=(x,))
        t.setDaemon(True)
        t.start()


def main():
    num_of_chan = 10
    zerg_rush(num_of_chan)
    while True:
        print 'Waiting'
        time.sleep(30)


if __name__ == '__main__':
    main()
I am testing Redis pub/sub to replace the use of rsh to communicate with remote boxes.
One of the things I tested was whether the number of channels affects the latency of publish and pubsub.listen().
Test: one publisher and one subscriber per channel, with the publisher publishing every second. I incremented the number of channels and observed the latency (the duration from the moment the publisher publishes a message to the moment the subscriber gets the message via listen()).
num of chan    avg latency in seconds
10             0.004453
50             0.005246
100            0.0155
200            0.0221
300            0.0621
Note: tested on a RHEL 6.4 VM with 2 CPUs, 4 GB RAM, and 1 NIC.
What can I do to maintain low latency with a high number of channels?
Redis is single-threaded, so adding more CPUs won't help. Maybe more RAM? If so, how much more?
Is there anything I can do code-wise, or is the bottleneck in Redis itself?
Maybe the limitation comes from the way my test code is written, using threading?
EDIT:
Redis Cluster vs ZeroMQ in Pub/Sub, for horizontally scaled distributed systems
The accepted answer says: "You want to minimize latency, I guess. The number of channels is irrelevant. The key factors are the number of publishers and number of subscribers, message size, number of messages per second per publisher, number of messages received by each subscriber, roughly. ZeroMQ can do several million small messages per second from one node to another; your bottleneck will be the network long before it's the software. Most high-volume pubsub architectures therefore use something like PGM multicast, which ZeroMQ supports."
From my testing, I don't know if this is true (the claim that the number of channels is irrelevant).
For example, I ran another test:
1) One channel: 100 publishers publishing to one channel with 1 subscriber listening, each publisher publishing once per second. Latency was 0.00965 seconds.
2) The same test except with 1000 publishers. Latency was 0.00808 seconds.
Now compare that with my channel test:
300 channels with 1 publisher and 1 subscriber each resulted in 0.0621 seconds, and that is only 600 connections, fewer than in the tests above, yet the latency is significantly higher.
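Not part of the original post, but one way to test the threading hypothesis above would be a single subscriber process holding one Redis connection subscribed to every channel, instead of one thread and one connection per channel. A minimal sketch, reusing the host and message format from the scripts above:

import json
import time

import redis

redis_host = '10.235.13.29'   # host taken from the scripts above
num_of_chan = 300

r = redis.StrictRedis(host=redis_host, port=6379)
sub = r.pubsub()
sub.subscribe(*[str(ch) for ch in range(num_of_chan)])  # one connection, all channels

for m in sub.listen():
    if m['type'] != 'message':   # skip the subscribe confirmations
        continue
    decoded = json.loads(m['data'])
    latency = time.time() - float(decoded['time'])
    print('channel %s latency %.6f s' % (m['channel'], latency))

If the average latency stays low with this layout, the overhead is in the 300 subscriber threads and connections rather than in Redis itself.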

Related

Joining Twitch IRC through the Helix API (5000+ channels), connection reset error

To start off, I'm using an anonymous connection when joining the channels, which means there are no JOIN limits. I have tried different variations of sleeping. I started off just joining from a text file, but that had a lot of problems because it connected all the sockets before joining, so I couldn't see what caused the failures. This is the best version I have created so far; it's pretty scuffed, but I am just trying to understand what the issue is. If anyone has any insight on doing a big task like this, I would appreciate it a lot!
(The OAuth token and Helix headers are from a random alt account I made for testing. The example tries to join 10k channels but stops around 2k-3k max.)
import requests
import socket
import time
import threading
import random

connections_made = 0
sockets = []


def connect():
    global sockets
    global connections_made
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("CONNECTING TO IRC")
    sock.connect(('irc.chat.twitch.tv', 6667))
    sock.send(bytes('PASS oauth:' + '\r\n', 'utf-8'))
    sock.send(bytes('NICK justinfan' + str(random.randint(10000, 99999)) + '\r\n', 'utf-8'))
    sockets.append(sock)
    connections_made += 1
    print(f"socket: {len(sockets)}")


for i in range(2):
    connect()  # initial connections for .recv reading

helix_headers = {'client-id': 'q6batx0epp608isickayubi39itsckt', 'authorization': 'Bearer rk0ixn6169ar7y5xey9msvk1h8zrs8'}


def request(channels_to_join, cursor):
    request_amount = int(channels_to_join / 100)  # 100 requests = 10000 channels
    user_list = []
    sock_numb = 0
    total_chans_joined = 0
    count_every_request = 0
    for i in range(request_amount):
        time.sleep(1)
        # 3k channels with time.sleep(1), 1.5k channels with time.sleep(2), then connection reset error after 30 seconds (when bulk joining 100 channels and waiting for the next request)
        # waiting 30 seconds doesn't fix this either, stops at about 500 channels, so it lasted about 2.5 minutes?
        # waiting 60 seconds at 500 channels breaks
        if count_every_request == 1:  # for every 100 channels
            connect()
            count_every_request = 0
        r = requests.get("https://api.twitch.tv/helix/streams?first=100&after=" + cursor, headers=helix_headers)
        cursor = r.json()['pagination']['cursor']
        count_every_request += 1
        for everything in r.json()['data']:
            user_list.append(everything['user_login'])
            channel = everything['user_login']
            # join channel
            if sock_numb == connections_made:  # cycle through the available sockets and loop back
                sock_numb = 0
            print(f"JOINING #{channel} with socket: {sock_numb} total joined: {total_chans_joined}")
            sockets[sock_numb].send(bytes('JOIN #' + channel + '\r\n', 'utf-8'))
            total_chans_joined += 1
            sock_numb += 1


def loop():
    print("Looping")
    try:
        while True:
            time.sleep(0.1)
            for i in range(connections_made):
                data = sockets[i].recv(4096).decode("utf-8", errors='replace').strip()
                if data == "":
                    continue
                print(data)
                if "PING :tmi.twitch.tv" in data:
                    print("PONG")
                    sockets[i].send(bytes('PONG :tmi.twitch.tv' + '\r\n', 'utf-8'))
    except Exception as e:
        print(str(e) + " error in loop ")


thread_loop = threading.Thread(target=loop)
thread_loop.start()
request(channels_to_join=10000, cursor="eyJiIjp7IkN1cnNvciI6ImV5SnpJam80T0RrMU1TNDVNRFkwTWpnd09URTVNU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqbzFNakF6TGpJM056UTFPVEUzT1RReE1Td2laQ0k2Wm1Gc2MyVXNJblFpT25SeWRXVjkifX0")
The likely problem is that your bot can't keep up with the message send buffer.
You connect to many channels, but you are not processing the incoming chat messages in a timely fashion, so the "queue" of messages waiting to be sent from Twitch to you exceeds Twitch's buffer, and it disconnects you.
Or, as per the IRC rate limit guide, you are sending too many commands and getting disconnected from the server.
Large chat bots will often split groups of channels over multiple connections to solve this issue.
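A rough sketch of that idea (mine, not from the answer): cap the number of channels per socket and pace the JOINs on each connection. The caps and pauses below are assumptions to tune, not documented limits.

import socket
import time

CHANNELS_PER_SOCKET = 50   # assumed cap per connection
JOINS_PER_BATCH = 15       # assumed batch size before pausing
BATCH_PAUSE = 11           # seconds to wait between batches

def open_connection(nick):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('irc.chat.twitch.tv', 6667))
    sock.send(bytes('PASS oauth:' + '\r\n', 'utf-8'))
    sock.send(bytes('NICK ' + nick + '\r\n', 'utf-8'))
    return sock

def join_channels(all_channels):
    sockets = []
    for start in range(0, len(all_channels), CHANNELS_PER_SOCKET):
        chunk = all_channels[start:start + CHANNELS_PER_SOCKET]
        sock = open_connection('justinfan%d' % (10000 + start))
        sockets.append(sock)
        for i, channel in enumerate(chunk):
            sock.send(bytes('JOIN #' + channel + '\r\n', 'utf-8'))
            if (i + 1) % JOINS_PER_BATCH == 0:
                time.sleep(BATCH_PAUSE)  # pace the JOIN commands
    return sockets  # each socket still needs its own read loop for PING/PONG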

How to tell a beanstalkc receiver to wait X seconds before reserving a task?

I have 2 beanstalkc receivers watching the same tube "tubename".
I would like one beanstalkc receiver to have priority over the other. To achieve this, I would like to tell the lower-priority receiver to wait until tasks are X seconds old before reserving them.
I found "reserve-with-timeout", but I neither really understand it nor have I managed to make it work for my use case.
import json

import beanstalkc


class MyBeanstalkReceiver():
    def __init__(self, host=beanstalkc.DEFAULT_HOST, port=beanstalkc.DEFAULT_PORT,
                 tube="default", timeout=1):
        self.tube = tube
        self.host = host
        self.port = port
        self.timeout = timeout

    def run(self):
        while True:
            self.run_once()

    def run_once(self):
        job = self._get_task()
        try:
            body = job.body
            data = json.loads(body)
            self.job(data)
        except Exception as e:
            job.delete()

    def job(self, data):
        print(data)

    def beanstalk(self):
        beanstalk = beanstalkc.Connection(host=self.host, port=self.port)
        beanstalk.use(self.tube)
        beanstalk.watch(self.tube)
        return beanstalk

    def _get_task(self):
        return self.beanstalk().reserve(self.timeout)
And my 2 beanstalkc receivers:
# receiver 1
w = MyBeanstalkReceiver(hosts=["localhost:14711"], tube="tubename", timeout=1)
w.run()
# receiver 2
w = MyBeanstalkReceiver(hosts=["localhost:14711"], tube="tubename", timeout=10000)
w.run()
Between the 2 receivers, with timeouts of 1 and 10000, nothing changes when I send tasks over the tube: both end up handling the same quantity of tasks put into the tube "tubename".
Any idea how to make "receiver 1" take priority over "receiver 2"?
The timeout in reserve controls how long the client will wait, if no job is available, before returning without one.
You may be looking for put (with a delay), where the job is not released to any receiver until it has been in the queue for at least n seconds.
There is also a per-job priority. If the receiver can see several jobs at the same time, it will return the ones with a higher priority (i.e. closer to 0) before those with a lower priority (a larger number).
Beanstalkd does not differentiate between priorities of the clients or receivers.
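A minimal sketch of what that looks like on the producer side, assuming the standard beanstalkc put() arguments (priority, delay) and the tube and port from the question:

import json

import beanstalkc

conn = beanstalkc.Connection(host='localhost', port=14711)
conn.use('tubename')

payload = json.dumps({'task': 'example'})

# Visible to receivers immediately, urgent (lower number = higher priority).
conn.put(payload, priority=10)

# Held back for 30 seconds before any receiver can reserve it.
conn.put(payload, priority=1000, delay=30)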

How to run 2 different loops in 2 different threads?

I'm building a telemetry application using Azure IoT Hub, the Azure IoT SDK for Python, and a Raspberry Pi with temperature and humidity sensors.
Humidity + Temperature sensors => Raspberry Pi => Azure IoT Hub
In my first implementation, based on the Azure examples, I used one loop that collects data from the temperature sensor and the humidity sensor and sends both to Azure IoT Hub at the same time, every 60 seconds.
>>> 1 Loop every 60s = Collect data & send data of temperature and humidity
Now I would like to send them at different frequencies, i.e.:
One loop will collect the data of the temperature sensor and send it to Azure IoT Hub every 60 seconds,
whereas a second loop will collect the data of the humidity sensor and send it to Azure IoT Hub every 600 seconds.
>>> 1 Loop every 60s= Collect data & send data of temperature
>>> 2 Loop every 600s= Collect data & send data of humidity
I think the tool I need is multi-threading, but I don't understand which library or structure I should use in my case.
Here is the code provided by Azure, with one loop that handles temperature and humidity at the same time, reading the data and sending it to Azure every 60 seconds.
import random
import time
import sys

# Using the Python Device SDK for IoT Hub:
from iothub_client import IoTHubClient, IoTHubClientError, IoTHubTransportProvider, IoTHubClientResult
from iothub_client import IoTHubMessage, IoTHubMessageDispositionResult, IoTHubError, DeviceMethodReturnValue

# The device connection string to authenticate the device with your IoT hub.
CONNECTION_STRING = "{Your IoT hub device connection string}"

# Using the MQTT protocol.
PROTOCOL = IoTHubTransportProvider.MQTT
MESSAGE_TIMEOUT = 10000

# Define the JSON message to send to IoT Hub.
TEMPERATURE = 20.0
HUMIDITY = 60
MSG_TXT = "{\"temperature\": %.2f,\"humidity\": %.2f}"


def send_confirmation_callback(message, result, user_context):
    print ( "IoT Hub responded to message with status: %s" % (result) )


def iothub_client_init():
    # Create an IoT Hub client
    client = IoTHubClient(CONNECTION_STRING, PROTOCOL)
    return client


def iothub_client_telemetry_sample_run():
    try:
        client = iothub_client_init()
        print ( "IoT Hub device sending periodic messages, press Ctrl-C to exit" )
        # ******************LOOP*******************************
        while True:
            # Build the message with simulated telemetry values.
            temperature = TEMPERATURE + (random.random() * 15)
            humidity = HUMIDITY + (random.random() * 20)
            msg_txt_formatted = MSG_TXT % (temperature, humidity)
            message = IoTHubMessage(msg_txt_formatted)
            # Send the message.
            print( "Sending message: %s" % message.get_string() )
            client.send_event_async(message, send_confirmation_callback, None)
            time.sleep(60)
    except IoTHubError as iothub_error:
        print ( "Unexpected error %s from IoTHub" % iothub_error )
        return
    except KeyboardInterrupt:
        print ( "IoTHubClient sample stopped" )


if __name__ == '__main__':
    print ( "IoT Hub Quickstart #1 - Simulated device" )
    print ( "Press Ctrl-C to exit" )
    iothub_client_telemetry_sample_run()
I would like to use the same structure of functions, but with two loops that handle temperature and humidity, one every 60 s and one every 600 s.
while True:
    # Build the message with simulated telemetry values.
    temperature = TEMPERATURE + (random.random() * 15)
    msg_txt_formatted1 = MSG_TXT1 % (temperature)
    message1 = IoTHubMessage(msg_txt_formatted1)
    # Send the message.
    print( "Sending message: %s" % message1.get_string() )
    client.send_event_async(message1, send_confirmation_callback, None)
    time.sleep(60)

while True:
    # Build the message with simulated telemetry values.
    humidity = HUMIDITY + (random.random() * 20)
    msg_txt_formatted2 = MSG_TXT2 % (humidity)
    message2 = IoTHubMessage(msg_txt_formatted2)
    # Send the message.
    print( "Sending message: %s" % message2.get_string() )
    client.send_event_async(message2, send_confirmation_callback, None)
    time.sleep(600)
How can I do that? How do I call those loops with multi-threading or some other method?
It may be simpler to do something like
while True:
    loop_b()
    for _ in range(10):
        loop_a()
        time.sleep(60)
or even
while True:
    time.sleep(1)
    now = int(time.time())  # truncate to whole seconds so the modulo checks can match
    if now % 60 == 0:
        loop_a()
    if now % 600 == 0:
        loop_b()
But if you really want to use threads, then:
import threading


class LoopAThread(threading.Thread):
    def run(self):
        loop_a()


class LoopBThread(threading.Thread):
    def run(self):
        loop_b()

...

thread_a = LoopAThread()
thread_b = LoopBThread()
thread_a.start()
thread_b.start()
thread_a.join()
thread_b.join()
Here are two competing approaches to consider
Don't bother with threads at all. Just have one loop that sleeps for 60 seconds on each pass, like you have now. Keep track of the last time you sent humidity data; if 600 seconds have passed, send it, otherwise skip it and go back to sleep for 60 seconds. Something like this:
from datetime import datetime, timedelta

def iothub_client_telemetry_sample_run():
    last_humidity_run = None
    humidity_period = timedelta(seconds=600)
    client = iothub_client_init()
    while True:
        now = datetime.now()
        send_temperature_data(client)
        if not last_humidity_run or now - last_humidity_run >= humidity_period:
            send_humidity_data(client)
            last_humidity_run = now
        time.sleep(60)
Rename iothub_client_telemetry_sample_run to temperature_thread_func or something like it. Create a separate function that looks just like it for humidity. Spawn two threads from the main function of your program, and set them to daemon mode so they shut down when the user exits:
from threading import Thread

def temperature_thread_func():
    client = iothub_client_init()
    while True:
        send_temperature_data(client)
        time.sleep(60)

def humidity_thread_func():
    client = iothub_client_init()
    while True:
        send_humidity_data(client)
        time.sleep(600)

if __name__ == '__main__':
    temp_thread = Thread(target=temperature_thread_func)
    temp_thread.daemon = True
    temp_thread.start()  # start the daemon threads before blocking on input()
    humidity_thread = Thread(target=humidity_thread_func)
    humidity_thread.daemon = True
    humidity_thread.start()
    input('Polling for data. Press a key to exit')
Notes:
- If you decide to use threads, consider using an event to terminate them cleanly (see the sketch below).
- time.sleep is not a precise way to keep time. You might need a different timing mechanism if the samples need to be taken at precise moments.
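To illustrate the first note, here is a minimal sketch (my own, not from the answer) of using a threading.Event so the worker loop from the second approach can be stopped cleanly instead of being killed as a daemon. It reuses iothub_client_init and send_temperature_data from the answer above:

import threading

stop_event = threading.Event()

def temperature_thread_func(stop):
    client = iothub_client_init()
    while not stop.is_set():
        send_temperature_data(client)
        stop.wait(60)  # sleeps up to 60 s, but wakes early when stop is set

temp_thread = threading.Thread(target=temperature_thread_func, args=(stop_event,))
temp_thread.start()
try:
    input('Polling for data. Press Enter to exit')
finally:
    stop_event.set()   # ask the worker to finish its current pass
    temp_thread.join()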

Simpy queue simulation

I'm trying to simulate a queue with a limited buffer where no packet is dropped but instead kept waiting. Bear with me, since I'm just a student with basic coding skills.
Packets arrive with exponentially distributed inter-arrival times, and each has an exponentially distributed size with mean 1250 bytes. I managed to get the code working for packet arrival + processing time, but I couldn't make the packets 'depart', nor simulate the queue limit (so far it has an unlimited buffer). Is there anything I could do to simulate the packet departure and the queue limit?
code:
import random
import simpy

RANDOM_SEED = 42
NEW_CUSTOMERS = 100  # Total number of customers
INTERVAL_CUSTOMERS = 1  # Generate new customers roughly every x seconds
SIZE = 1250


def source(env, number, interval, port):
    """Source generates packets randomly"""
    for i in range(number):
        size = int(random.expovariate(0.0008))
        packet = Packet(env, '%d' % i, size, port, time_in_port=1)
        env.process(packet)
        t = random.expovariate(1 / interval)
        yield env.timeout(t)


def Packet(env, id, size, port, time_in_port):
    arrive = env.now
    yield Queue.buffer.put(size)
    print('packet%s %s arriving at %lf' % (id, size, arrive))
    with port.request() as req:
        yield req
        tip = random.expovariate(1 / time_in_port)
        yield env.timeout(tip)
        amount = size
        yield Queue.buffer.get(amount)
        print('packet%s %s finished processing at %lf' % (id, size, env.now))


class queue:  # THIS PART WON'T WORK
    def __init__(self, env):
        self.port = simpy.Resource(env, capacity=1)
        self.buffer = simpy.Container(env, init=0, capacity=12500)
        self.mon_proc = env.process(self.monitor_tank(env))

    def monitor_tank(self, env):
        while True:
            if self.buffer.level > 12500:
                print('Full at %d' % env.now)


random.seed(RANDOM_SEED)
env = simpy.Environment()
Queue = queue(env)
port = simpy.Resource(env, capacity=1)
env.process(source(env, NEW_CUSTOMERS, INTERVAL_CUSTOMERS, port))
env.run()
The queue class doesn't work (the program won't run at all). It only runs if I remove the queue class and just simulate packet arrival and processing time. I would appreciate any help to simulate the packet departure (using a sink) and the queue limit. Thanks.
I'm not familiar with the details, but your call to self.monitor_tank(env) in the queue constructor goes into a tight infinite loop: it isn't a generator, just an unending loop, so Python gets stuck at that point in the execution.
I think this piece of your code is an infinite loop and is blocking the rest of the program from running:
def monitor_tank(self, env):
    while True:
        if self.buffer.level > 12500:
            print('Full at %d' % env.now)
Try commenting this piece out, or adding an env.timeout so it "sleeps" for a bit on every pass of the loop.
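For example, something along these lines (an assumption on my part, not tested against the rest of the model): turn monitor_tank into a real SimPy process by yielding a timeout on every pass, so env.process() gets a generator and the loop no longer blocks:

def monitor_tank(self, env):
    while True:
        if self.buffer.level >= self.buffer.capacity:  # the level can never exceed the capacity, so check for "full" instead
            print('Full at %d' % env.now)
        yield env.timeout(1)  # hand control back to the simulation for one time unit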
Hello, I think the code below will solve your problem, or at least give you a direction. Since in your original code all the packages have the same size, I modelled the buffer in packages, but changing it to bytes is straightforward.
I used a buffer (container) and a server (resource).
;)
import simpy
import random


def arrival(env, buffer):
    # Arrival of the package
    while True:
        print('Package ARRIVED at %.1f \n\t Buffer: %i'
              % (env.now, buffer.level))
        yield buffer.put(1)  # Put the package in the buffer
        yield env.timeout(random.expovariate(1.0))  # time between arrivals
        env.process(processDeparture(env, buffer, server))


def processDeparture(env, buffer, server):
    # Processing and departure of the package
    while True:
        # request a server to process the package
        request = server.request()
        yield request
        yield buffer.get(1)  # GET a package from the buffer
        # Processing time of the package
        processingTime = 2
        print('Package begin processing at %.1f'
              % (env.now))
        yield env.timeout(processingTime)
        print('Package end processing at %.1f'
              % (env.now))
        # release the server
        yield server.release(request)


random.seed(150)
env = simpy.Environment()
buffer = simpy.Container(env, capacity=3, init=0)  # Create the buffer
server = simpy.Resource(env, capacity=1)  # Create the servers (resources)
env.process(arrival(env, buffer))
env.run(until=30)  # Execute the model

Profiling Serial communication using timeit

I need to communicate with an embedded system over RS232. For this I want to profile the time it takes to get a response to each command.
I've tested this code using two methods: datetime.now() and timeit()
Method #1
def resp_time(n, msg):
    """Given number of tries - n and bytearray list"""
    msg = bytearray(msg)
    cnt = 0
    timer = 0
    while cnt < n:
        time.sleep(INTERVAL)
        a = datetime.datetime.now()
        ser.flush()
        ser.write(msg)
        line = []
        for count in ser.read():
            line.append(count)
            if count == '\xFF':
                # print line
                break
        b = datetime.datetime.now()
        c = b - a
        # print c.total_seconds()*1000
        timer = timer + c.total_seconds() * 1000
        cnt = cnt + 1
    return timer / n


ser = serial.Serial(COMPORT, BAUDRATE, serial.EIGHTBITS, serial.PARITY_NONE, serial.STOPBITS_ONE, timeout=16)
if ser.isOpen():
    print "Serial port opened at: Baud:", COMPORT, BAUDRATE
    cmd = read_file()
    # returns a list of commands [msg1,msg2....]
    n = 100
    for index in cmd:
        timer = resp_time(n, index)
        print "Time in msecs over %d runs: %f " % (n, timer)
Method #2
def com_loop(msg):
    msg = bytearray(msg)
    time.sleep(INTERVAL)
    ser.flush()
    ser.write(msg)
    line = []
    for count in ser.read():
        line.append(count)
        if count == '\xFF':
            break


if __name__ == '__main__':
    import timeit
    ser = serial.Serial(COMPORT, BAUDRATE, serial.EIGHTBITS, serial.PARITY_NONE, serial.STOPBITS_ONE, timeout=16)
    if ser.isOpen():
        print "Serial port opened at: Baud:", COMPORT, BAUDRATE
        cmd = read_file()
        # returns a list of commands [msg1,msg2....]
        n = 100
        for index in cmd:
            t = timeit.timeit("com_loop(index)", "from __main__ import com_loop;index=%s;" % index, number=n)
            print t / 100
With datetime I get 2 ms to execute a command, and with timeit I get 200 ms for the same command.
I suspect I'm not calling timeit() properly; can someone point me in the right direction?
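Not from the original post or the answer below, but one variation worth trying: timeit also accepts a callable, which avoids having to rebuild index inside a setup string. A minimal sketch reusing com_loop and cmd from Method #2:

import timeit

n = 100
for index in cmd:
    t = timeit.timeit(lambda: com_loop(index), number=n)  # time the callable directly
    print "Average per call: %f s" % (t / n)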
I'd assume 200µs is closer to the truth, considering your comport will have something like 115200baud; assuming messages are 8 bytes long, transmitting one message would take about 9/115200 s ~= 10/100000 = 1/10,000 = 100µs on the serial line alone. Being faster than that will be pretty impossible.
Python is definitely not the language of choice for timing characterization at these scales. You will need to get a logic analyzer, or work very close to the serial controller (which I hope is directly attached to your PC's I/O controller and not some USB device, because that will introduce latencies of the same order of magnitude, at least). If you're talking about microseconds, the limiting factor in measurement is usually the random time it takes for your PC to react to an interrupt, for the OS to run the interrupt service routine, for the scheduler to resume your userland process, and then for Python to run with its levels and levels of indirection. You're basically measuring the size of single grains of sand by holding a banana next to them.
