I have to develop a UDP client in Python. The purpose of the UDP client is to receive packets on a port, process each one (which requires a map lookup), and then publish the processed data to a Kafka topic. More than 2,000 packets arrive per second.
I have tried the code shown below, but I am seeing packet loss.
import json
import socket

from kafka import KafkaProducer

import config  # assumed local module holding KAFKA_BOOTSTRAP_SERVER

producer = KafkaProducer(bootstrap_servers=config.KAFKA_BOOTSTRAP_SERVER,
                         value_serializer=lambda m: json.dumps(m).encode('ascii'),
                         security_protocol='SSL')

client_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_socket.settimeout(1.0)
addr = ("0.0.0.0", 5000)
client_socket.bind(addr)

while True:
    data, server = client_socket.recvfrom(1024)
    d_1 = some_logic()  # the map lookup / processing step
    producer.send("XYZ", d_1)
Please suggest an approach, with a small code snippet, to perform this task with no (or minimal) packet loss.
Thanks in advance.
Using this code:
sender.py
import socket
import tqdm # pip install
# example data from https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html
data = '\
[{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil\'s Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]},{"id":"0002","type":"donut","name":"Raised","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]},{"id":"0003","type":"donut","name":"Old Fashioned","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}]\
'.encode("ascii")
assert len(data) == 1011, len(data) # close to the 1000 you average in your case
sender_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender_socket.settimeout(1.0) # 1 second is laaarge
addr = ("127.0.0.1", 6410)
sender_socket.connect(addr)
progress_bar = tqdm.tqdm(unit_scale=True)
while True:
    bytes_sent = sender_socket.send(data)
    assert bytes_sent == 1011, bytes_sent
    progress_bar.update(1)
receiver.py
import json
import socket
import tqdm # pip install
receiver_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver_socket.settimeout(5.0)
addr = ("127.0.0.1", 6410)
receiver_socket.bind(addr)
progress_bar = tqdm.tqdm(unit_scale=True)
while True:
    data_bytes, from_address = receiver_socket.recvfrom(1024)
    data = json.loads(data_bytes)
    progress_bar.update(1)
(using tqdm for easy speed monitoring)
I get around ~80k it/s on my computer, which is far more than the ~2,000/s your case requires.
Try it yourself and see how much you get. Then add d_1 = some_logic() and measure again. Then add producer.send("XYZ", d_1) and measure again.
This will give you a pretty good picture of what is slowing you down. Then ask another question about that specific problem, ideally with a minimal reproducible example.
Edit:
Indeed, the sender saturates the receiver, so packets get dropped: the receiver's throughput is lower than the sender's (because of the processing time). So here is an alternative:
steady_sender.py
import socket
import time
import tqdm # pip install
# example data from https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html
data = '\
[{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil\'s Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]},{"id":"0002","type":"donut","name":"Raised","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]},{"id":"0003","type":"donut","name":"Old Fashioned","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}]\
'.encode("ascii")
assert len(data) == 1011, len(data) # close to the 1000 you average in your case
sender_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender_socket.settimeout(1.0) # 1 second is laaarge
addr = ("127.0.0.1", 6410)
sender_socket.connect(addr)
progress_bar = tqdm.tqdm(unit_scale=True)
while True:
    start_time = time.time()
    bytes_sent = sender_socket.send(data)
    assert bytes_sent == 1011, bytes_sent
    progress_bar.update(1)
    current_time = time.time()
    remaining_time = 0.001 - (current_time - start_time)  # until the next millisecond
    time.sleep(max(0.0, remaining_time))  # sleep() raises on negative values
It tries to send one packet every millisecond. It stays around ~900 packets/s for me, because the code is simplistic (going to sleep and waking up takes time too!).
This way, the receiver processes fast enough that no packet gets dropped (UDP simply discards what does not fit).
But here is another version, where the sender is bursty: it sends 1000 packets, then sleeps until the next second.
bursty_sender.py
import socket
import time
import tqdm # pip install
# example data from https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html
data = '\
[{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil\'s Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]},{"id":"0002","type":"donut","name":"Raised","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]},{"id":"0003","type":"donut","name":"Old Fashioned","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}]\
'.encode("ascii")
assert len(data) == 1011, len(data) # close to the 1000 you average in your case
sender_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender_socket.settimeout(1.0) # 1 second is laaarge
addr = ("127.0.0.1", 6410)
sender_socket.connect(addr)
progress_bar = tqdm.tqdm(unit_scale=True)
while True:
    start_time = time.time()
    bytes_sent = sender_socket.send(data)
    assert bytes_sent == 1011, bytes_sent
    progress_bar.update(1)
    if progress_bar.n % 1000 == 0:
        current_time = time.time()
        remaining_time = 1.0 - (current_time - start_time)  # until the next second
        time.sleep(max(0.0, remaining_time))
It sends on average ~990 packets per second (losing less time getting in and out of sleep). But the receiver only handles ~280 per second; the rest get dropped because the burst fills the receiver's buffer.
If I'm sending bursts at 400/s I process ~160/s.
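One mitigation not covered above, worth trying before restructuring the code, is to ask the kernel for a larger socket receive buffer with SO_RCVBUF, so bursts have somewhere to queue while the Python loop catches up. A sketch; note the OS may clamp the requested size (e.g. to net.core.rmem_max on Linux), so always read back the effective value:

```python
import socket

receiver_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Request a 4 MiB receive buffer. The kernel may clamp (or, on Linux,
# double) this value, so check what was actually granted.
receiver_socket.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
effective = receiver_socket.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("receive buffer:", effective, "bytes")
```

A bigger buffer only absorbs bursts; if the average processing rate stays below the average arrival rate, the buffer eventually fills no matter how large it is.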
You can monitor the drops using your OS's network-statistics tools; Python itself can't see them.
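On Linux, for example, one such source is /proc/net/snmp: its two "Udp:" rows pair counter names with values, including RcvbufErrors, which counts datagrams dropped because the socket's receive buffer was full. This is a Linux-specific sketch of parsing that format; the exact column set varies between kernels, hence the dict lookup (the sample values below are made up):

```python
def parse_udp_counters(snmp_text):
    """Turn the two 'Udp:' lines of /proc/net/snmp into a {name: value} dict."""
    udp_lines = [line for line in snmp_text.splitlines() if line.startswith("Udp:")]
    names = udp_lines[0].split()[1:]                      # header row
    values = [int(v) for v in udp_lines[1].split()[1:]]  # counter row
    return dict(zip(names, values))

# Shape of the file on a typical kernel (fabricated numbers):
sample = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 123456 2 17 98765 17 0\n"
)
counters = parse_udp_counters(sample)
print(counters["RcvbufErrors"])  # -> 17
```

In practice you would read open("/proc/net/snmp").read() before and after a test run and diff the counters to see how many datagrams were dropped during the run.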
If you don't want to drop packets, another solution is to use a queue with two threads: the first simply reads from the socket and adds each packet to the queue, and the other reads from the queue and processes. But then you have to ensure that the queue does not grow too large.
I'm able to handle bursts of 50 with my current system config, nearly 100, but not 150.
Here is an example with the queue :
queued_receiver.py
import json
import queue
import socket
import threading
import tqdm # pip install
messages_queue = queue.Queue(maxsize=-1) # infinite
received_packets_bar = tqdm.tqdm(position=0, desc="received", unit_scale=True)
queue_size_bar = tqdm.tqdm(position=1, desc="queue size", unit_scale=True)
processed_packets_bar = tqdm.tqdm(position=2, desc="processed", unit_scale=True)
def read_from_the_socket_into_the_queue():
    receiver_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver_socket.settimeout(5.0)
    addr = ("127.0.0.1", 6410)
    receiver_socket.bind(addr)
    while True:
        data_bytes, from_address = receiver_socket.recvfrom(1024)
        # no processing at all here! we want to ensure the packet gets read, so that we are not dropping
        messages_queue.put_nowait(data_bytes)
        queue_size_bar.update(1)
        received_packets_bar.update(1)

def read_from_the_queue_and_process():
    while True:
        data_bytes = messages_queue.get(block=True, timeout=None)  # block until a message is available
        data = json.loads(data_bytes)
        queue_size_bar.update(-1)
        processed_packets_bar.update(1)
        sum(range(10**5))  # simulated slow computation, adjust
socket_thread = threading.Thread(target=read_from_the_socket_into_the_queue)
process_thread = threading.Thread(target=read_from_the_queue_and_process)
socket_thread.start()
process_thread.start()
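The queue above is unbounded (maxsize=-1), so a sustained overload just moves the problem from the kernel's socket buffer into process memory. One possible policy, sketched below, is a bounded queue that drops the oldest message when full; the maxsize of 10_000 and the helper name are arbitrary choices for illustration:

```python
import queue

def enqueue_or_drop_oldest(q, item):
    """Put item on q; if q is full, discard the oldest entry to make room."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the oldest queued message
            except queue.Empty:
                pass  # a consumer emptied it meanwhile; retry the put

messages_queue = queue.Queue(maxsize=10_000)  # cap chosen to fit your memory budget
```

Whether dropping the oldest or the newest message is right depends entirely on the data; for live telemetry the oldest is usually the least valuable, but for anything transactional you should not drop at all.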
I'm learning socket programming and I'm running into a strange issue between my two programs, depending on which IP I run them through.
Server:
import socket
import time
import datetime
import filecmp
HOST = 'localhost'
PORT = 9100
n = 1
x = 0
average_list = []
print('I am ready for any client side request \n')
file_comparison = "send.txt"
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST,PORT))
s.listen(1)
conn, addr = s.accept()
while n <= 100:
    data = conn.recv(1024)
    file = 'receive1.txt'
    print('I am starting receiving file', file, 'for the', n, 'th time')
    a = datetime.datetime.now()
    f = open(file, 'wb')
    f.write(data)
    print('I am finishing receiving file', file, 'for the', n, 'th time')
    b = datetime.datetime.now()
    rawtime = b - a
    millidelta = rawtime * 1000
    average_list.append(millidelta)
    real_average = ((sum(average_list, datetime.timedelta(0, 0))) / n)
    print('The time used in milliseconds to receive', file, 'for the', n, 'th time', 'is:', millidelta, 'milliseconds')
    print('The average time to receive', file, 'in milliseconds is:', real_average)
    if filecmp.cmp(file, file_comparison, shallow=False):
        x = x + 1
    n = n + 1
    f.close()
conn.close()
s.close()
print('I am done \n')
print('Total errors: ',x,'out of',n-1 )
Client:
import socket
import datetime
import time
import filecmp
#initializing host, port, filename, total time and number of times to send the file
host = 'localhost'
port = 9100
fileName = "send.txt"
n = 1
average_list = []
file_to_send = open(fileName,'rb')
while n <= 100:
    data = file_to_send.read(1024)
    s = socket.socket()
    s.connect((host, port))
    s.sendall(data)
    # reading the next 1024 bytes
    print('I am connecting to server side:', host, '\n')
    print('I am sending file', fileName, 'for the', n, 'th time')
    a = datetime.datetime.now()
    print('I am finishing sending file', fileName, 'for the', n, 'th time')
    b = datetime.datetime.now()
    rawtime = b - a
    millidelta = rawtime * 1000
    average_list.append(millidelta)
    real_average = ((sum(average_list, datetime.timedelta(0, 0))) / n)
    print('The time used in milliseconds to send', fileName, 'for the', n, 'th time', 'is:', millidelta, 'milliseconds')
    print('The average time to send', fileName, 'in milliseconds is:', real_average)
    n = n + 1
file_to_send.close()
s.close()
print('I am done')
In its current form, my client-side code simply runs through the loop trying to send the data of a .txt file to a server that isn't receiving anything. If I change 'localhost' to my actual IP address, I instead get the server-side code cycling through its while loop, while the client side gives up after 2 iterations with:
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
with the error citing line 15, "s.connect((host, port))", as the cause. Ultimately I'm stuck, since switching between what I assumed were two correct values for the host gives drastically different results, with neither working as intended.
What I think the error is telling us, based on other times I have seen it, is that the port the socket is trying to connect to is still tied to another socket.
So my diagnosis is that s.close() is not inside the while loop, so the client keeps making a new socket each iteration and then tries to connect on the same port.
Edit: I got a chance to run it on my side and it works for me if I pull the whole making and binding of a socket out of the loop like this:
import socket
import datetime
import time
import filecmp
#initializing host, port, filename, total time and number of times to send the file
host = 'localhost'
port = 9100
fileName = "send.txt"
n = 1
average_list = []
file_to_send = open(fileName,'rb')
s=socket.socket()
s.connect((host,port))
while n <= 100:
    data = file_to_send.read(1024)
    s.sendall(data)
    # reading the next 1024 bytes
    print('I am connecting to server side:', host, '\n')
    print('I am sending file', fileName, 'for the', n, 'th time')
    a = datetime.datetime.now()
    print('I am finishing sending file', fileName, 'for the', n, 'th time')
    b = datetime.datetime.now()
    rawtime = b - a
    millidelta = rawtime * 1000
    average_list.append(millidelta)
    real_average = ((sum(average_list, datetime.timedelta(0, 0))) / n)
    print('The time used in milliseconds to send', fileName, 'for the', n, 'th time', 'is:', millidelta, 'milliseconds')
    print('The average time to send', fileName, 'in milliseconds is:', real_average)
    n = n + 1
s.close()
file_to_send.close()
This definitely works for me: it sends the file 100 times and it gets received 100 times. But I don't know whether your use case needs a hundred new sockets, rather than one socket sending 100 chunks that all get received successfully.
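If the use case really does need a fresh connection per send, the matching fix on the server side is to call accept() inside the loop too, once per client connect(). Here is a self-contained sketch of that pattern (loopback address, port 0 so the OS picks a free port, and only 5 connections to keep it short; the chunk payloads are made up):

```python
import socket
import threading

HOST = "127.0.0.1"

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, 0))          # port 0: let the OS pick a free port
server.listen(5)
port = server.getsockname()[1]

def client(n_connections):
    # mimic the original client: a fresh socket (and connect) per chunk
    for i in range(n_connections):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((HOST, port))
        s.sendall(b"chunk %d" % i)
        s.close()

t = threading.Thread(target=client, args=(5,))
t.start()

received = []
for _ in range(5):              # one accept() per client connection
    conn, addr = server.accept()
    received.append(conn.recv(1024))
    conn.close()

t.join()
server.close()
print(received)
```

Each connect() on the client must be paired with an accept() on the server; the original server accepted exactly once, which is why the second client connection was refused.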
I got this code for streaming a video from a client to a server:
Client:
import cv2, imutils
import mss
import numpy
from win32api import GetSystemMetrics
import pickle
import socket, struct
client_socket = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
host_ip = "IPADRESS"
port = 9999
client_socket.connect((host_ip,port))
with mss.mss() as sct:
    monitor = {"top": 0, "left": 0, "width": GetSystemMetrics(0), "height": GetSystemMetrics(1)}
    while True:
        img = numpy.array(sct.grab(monitor))
        frame = imutils.resize(img, width=1400)
        a = pickle.dumps(frame)
        message = struct.pack("Q", len(a)) + a
        client_socket.send(message)
Server:
import cv2, imutils
import numpy as np
import pickle, struct
import socket
import threading
server_socket = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
host_ip = "IP_ADRESS"
port = 9999
socket_address = (host_ip,port)
server_socket.bind(socket_address)
server_socket.listen()
print("Listening at",socket_address)
def show_client(addr, client_socket):
    try:
        print('CLIENT {} CONNECTED!'.format(addr))
        if client_socket:  # if a client socket exists
            data = b""
            payload_size = struct.calcsize("Q")
            while True:
                while len(data) < payload_size:
                    packet = client_socket.recv(4*1024)
                    if not packet:
                        break
                    data += packet
                packed_msg_size = data[:payload_size]
                data = data[payload_size:]
                msg_size = struct.unpack("Q", packed_msg_size)[0]
                while len(data) < msg_size:
                    data += client_socket.recv(4*1024)
                frame_data = data[:msg_size]
                data = data[msg_size:]
                frame = pickle.loads(frame_data)
                cv2.imshow("Screen", frame)
                key = cv2.waitKey(1) & 0xFF
                if key == ord('q'):
                    break
            client_socket.close()
    except Exception as e:
        print(e)
        print(f"CLIENT {addr} DISCONNECTED")

while True:
    client_socket, addr = server_socket.accept()
    thread = threading.Thread(target=show_client, args=(addr, client_socket))
    thread.start()
    print("TOTAL CLIENTS", threading.activeCount() - 1)
A lot of this code is from a YouTuber called "pyshine", and everything works just fine, but I don't understand what a specific part of this code is really doing.
These are the parts:
First of all in the client-code:
message = struct.pack("Q",len(a))+a
I know that it does something with the length of the pickled frame and that the pickle is appended to it, but not much more.
Second of all in the server-code:
data = b""
payload_size = struct.calcsize("Q")
while True:
    while len(data) < payload_size:
        packet = client_socket.recv(4*1024)
        if not packet:
            break
        data += packet
    packed_msg_size = data[:payload_size]
    data = data[payload_size:]
    msg_size = struct.unpack("Q", packed_msg_size)[0]
    while len(data) < msg_size:
        data += client_socket.recv(4*1024)
    frame_data = data[:msg_size]
By printing out some values I understood it a bit better, but the whole process of how it arrives at the final "frame_data" is still a mystery to me. So I would really appreciate it if someone could explain what is going on there.
A socket is a primitive object and it doesn't care what data you send. You can send two frames and the client may receive them as a single package, or as many small packages; the socket doesn't care where the first frame ends. To resolve this problem, the code first sends len(data) and then the data itself. Because it uses struct.pack("Q"), the length value always occupies exactly 8 bytes. This way the receiver knows how much data it has to read to have a complete frame: first it reads 8 bytes to get the length, and then it uses that value to read the rest. And that is what the second piece of code does: it repeats recv() until it has all the data. It also checks whether it already received bytes belonging to the next frame, and keeps that part (data[msg_size:]) to use with the next frame.
If you use the same rule on both sides (the sender first sends 8 bytes with the size and then the data; the receiver first reads 8 bytes with the size and then reads the data using that size), then you have defined a protocol, similar to other protocols: HTTP (HyperText Transfer Protocol), FTP (File Transfer Protocol), SMTP (Simple Mail Transfer Protocol), etc.
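To make the rule concrete, here is a small sketch of the same length-prefix protocol written as two helper functions (send_msg/recv_msg are names made up for this example), exercised over socket.socketpair() so it runs without any network:

```python
import socket
import struct

def send_msg(sock, payload):
    # 8-byte unsigned length ("Q") first, then the payload itself
    sock.sendall(struct.pack("Q", len(payload)) + payload)

def recv_exactly(sock, n):
    # keep calling recv() until exactly n bytes have arrived
    data = b""
    while len(data) < n:
        packet = sock.recv(n - len(data))  # never read past this message
        if not packet:
            raise ConnectionError("socket closed mid-message")
        data += packet
    return data

def recv_msg(sock):
    header = recv_exactly(sock, struct.calcsize("Q"))
    (msg_size,) = struct.unpack("Q", header)
    return recv_exactly(sock, msg_size)

left, right = socket.socketpair()
send_msg(left, b"frame-1")
send_msg(left, b"frame-2")  # both frames may sit in the buffer at once
first = recv_msg(right)
second = recv_msg(right)
left.close()
```

Note the recv(n - len(data)) trick: by never asking for more than the current message needs, the helper cannot accidentally consume bytes belonging to the next frame, which is the subtlety the buffering code in the question handles with data[msg_size:].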
I'm running into issues transferring data over TCP with a remote client and server written in Python. The server is located in a pretty remote region with relatively slow internet connection (<2Mb/sec). When the client is run on the LAN with the server the complete string is transferred (2350 bytes); however, when I run the client outside of the LAN sometimes the string is truncated (1485 bytes) and sometimes the full string comes through (2350 bytes). The size of the truncated string always seems to be 1485 bytes. The full size of the string is well below the set buffer size for the client and server.
I've copied abbreviated versions of the client and server code below, where I have tried to edit out all extraneous details:
Client
import socket
from time import sleep
class FTIRdataClient():
    def __init__(self, TCP_IP="xxx.xxx.xxx.xxx", TCP_Port=xxx, BufferSize=4096):
        #-----------------------------------
        # Configuration parameters of server
        #-----------------------------------
        self.TCP_IP = TCP_IP
        self.TCP_Port = int(TCP_Port)
        self.RECV_BUFFER = int(BufferSize)

    def writeTCP(self, message):
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.connect((self.TCP_IP, self.TCP_Port))
            sock.send(message)
            incomming = sock.recv(self.RECV_BUFFER)
            sock.close()
        except:
            print "Unable to connect to data server!!"
            incomming = False
        return incomming
if __name__ == "__main__":
    #----------------------------------
    # Initiate remote data client class
    #----------------------------------
    dataClass = FTIRdataClient(TCP_IP=dataServer_IP, TCP_Port=portNum, BufferSize=4096)

    #--------------------------------
    # Ask database for all parameters
    #--------------------------------
    allParms = dataClass.writeTCP("LISTALL")
Server
import os
import sys
import socket
import select
import smtplib
import datetime as dt
class FTIRdataServer(object):

    def __init__(self, ctlFvars):
        ...

    def runServer(self):
        self.server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server_socket.bind((self.TCP_IP, self.TCP_Port))
        #self.server_socket.setsockopt(socket.IPPROTO_TCP,socket.TCP_NODELAY,1)
        self.server_socket.listen(10)
        self.connection_list.append(self.server_socket)

        #-------------------------------------
        # Start loop to listen for connections
        #-------------------------------------
        while True:
            #--------------------
            # Get list of sockets
            #--------------------
            read_sockets, write_sockets, error_sockets = select.select(self.connection_list, [], [], 5)

            for sock in read_sockets:
                #-----------------------
                # Handle new connections
                #-----------------------
                if sock == self.server_socket:
                    #----------------------------------------------
                    # New connection received through server_socket
                    #----------------------------------------------
                    sockfd, addr = self.server_socket.accept()
                    self.connection_list.append(sockfd)
                    print "Client (%s, %s) connected" % addr
                #-------------------------------------
                # Handle incoming request from client
                #-------------------------------------
                else:
                    #------------------------
                    # Handle data from client
                    #------------------------
                    try:
                        data = sock.recv(self.RECV_BUFFER)
                        #------------------------------------------------
                        # Three types of call to server:
                        # 1) set   -- sets the value of a data parameter
                        # 2) get   -- gets the value of a data parameter
                        # 3) write -- write data to a file
                        #------------------------------------------------
                        splitVals = data.strip().split()
                        ...
                        elif splitVals[0].upper() == 'LISTALL':
                            msgLst = []
                            #----------------------------
                            # Create a string of all keys
                            # and values to send back
                            #----------------------------
                            for k in self.dataParams:
                                msgLst.append("{0:}={1:}".format(k, " ".join(self.dataParams[k])))
                            msg = ";".join(msgLst)
                            sock.sendall(msg)
                        ...
                        else:
                            pass
                    #---------------------------------------------------
                    # Remove client from socket list after disconnection
                    #---------------------------------------------------
                    except:
                        sock.close()
                        self.connection_list.remove(sock)
                        continue

        #-------------
        # Close server
        #-------------
        self.closeServer()

    def closeServer(self):
        ''' Close the TCP data server '''
        self.server_socket.close()
Your help is greatly appreciated!!!
For anyone who is interested, I found the solution to this problem. John Nielsen has a pretty good explanation here. Basically, a TCP stream only guarantees that bytes will not arrive out of order or be duplicated; it does not guarantee how many chunks the data will be delivered in. So one needs to keep reading (socket.recv) until all the data has arrived. The previous code worked on the LAN because the server's whole string arrived in one chunk. Over a remote connection the string was split into several chunks.
I modified the client to loop on socket.recv() until the socket is closed, and I modified the server to close the socket immediately after sending the data. There are several other ways to do this mentioned in the link above. The new code looks like:
Client
class FTIRdataClient(object):

    def __init__(self, TCP_IP="xxx.xxx.xx.xxx", TCP_Port=xxxx, BufferSize=4024):
        #-----------------------------------
        # Configuration parameters of server
        #-----------------------------------
        self.TCP_IP = TCP_IP
        self.TCP_Port = int(TCP_Port)
        self.RECV_BUFFER = int(BufferSize)

    def setParam(self, message):
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.connect((self.TCP_IP, self.TCP_Port))
            sock.sendall("set " + message)

            #-------------------------
            # Loop to receive all data
            #-------------------------
            incommingTotal = ""
            while True:
                incommingPart = sock.recv(self.RECV_BUFFER)
                if not incommingPart:
                    break
                incommingTotal += incommingPart
            sock.close()
        except:
            print "Unable to connect to data server!!"
            incommingTotal = False
        return incommingTotal
Server
class FTIRdataServer(object):

    def __init__(self, ctlFvars):
        ...

    def runServer(self):
        self.server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server_socket.bind((self.TCP_IP, self.TCP_Port))
        #self.server_socket.setsockopt(socket.IPPROTO_TCP,socket.TCP_NODELAY,1)
        self.server_socket.listen(10)
        self.connection_list.append(self.server_socket)

        #-------------------------------------
        # Start loop to listen for connections
        #-------------------------------------
        while True:
            #--------------------
            # Get list of sockets
            #--------------------
            read_sockets, write_sockets, error_sockets = select.select(self.connection_list, [], [], 5)

            for sock in read_sockets:
                #-----------------------
                # Handle new connections
                #-----------------------
                if sock == self.server_socket:
                    #----------------------------------------------
                    # New connection received through server_socket
                    #----------------------------------------------
                    sockfd, addr = self.server_socket.accept()
                    self.connection_list.append(sockfd)
                    print "Client (%s, %s) connected" % addr
                #-------------------------------------
                # Handle incoming request from client
                #-------------------------------------
                else:
                    #------------------------
                    # Handle data from client
                    #------------------------
                    try:
                        data = sock.recv(self.RECV_BUFFER)
                        ...
                        elif splitVals[0].upper() == 'LISTALL':
                            msgLst = []
                            #----------------------------
                            # Create a string of all keys
                            # and values to send back
                            #----------------------------
                            for k in self.dataParams:
                                msgLst.append("{0:}={1:}".format(k, " ".join(self.dataParams[k])))
                            msg = ";".join(msgLst)
                            sock.sendall(msg)
                        elif splitVals[0].upper() == 'LISTALLTS':  # List all time stamps
                            msgLst = []
                            #----------------------------
                            # Create a string of all keys
                            # and values to send back
                            #----------------------------
                            for k in self.dataParamTS:
                                msgLst.append("{0:}={1:}".format(k, self.dataParamTS[k]))
                            msg = ";".join(msgLst)
                            sock.sendall(msg)
                        ...
                        else:
                            pass
                        #------------------------
                        # Close socket connection
                        #------------------------
                        sock.close()
                        self.connection_list.remove(sock)
                    #------------------------------------------------------
                    # Remove client from socket list if client disconnects
                    #------------------------------------------------------
                    except:
                        sock.close()
                        self.connection_list.remove(sock)
                        continue

        #-------------
        # Close server
        #-------------
        self.closeServer()
Whatever. This is probably common knowledge and I'm just a little slow.
I am currently pulling stock data down from an api that I have access to. I am doing it in the following steps:
loop through a list of symbols/stocks one by one
create a socket connection and send the relevant message to the api
receive the data and separate it into lines until "!ENDMSG!" is received, at which point the data for that symbol is complete
convert the data (string) into a StringIO, then read it into a pandas dataframe and ultimately write the data to sql
Do the next symbol/stock
Relevant code snippet:
def readlines(sock, recv_buffer=4096, delim='\n'):
    buffer = ''
    while True:
        data = sock.recv(recv_buffer)
        buffer += data.decode('latin-1')
        while buffer.find(delim) != -1:
            line, buffer = buffer.split(delim, 1)
            yield line

def main():
    syms = ['MSFT', 'AAPL', 'GS', 'F']
    for sym in syms:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, port))
        data = ''
        message = sym  # + relevant api-specific commands
        sock.sendall(message.encode())
        for line in readlines(sock):
            if "!ENDMSG!" in line:
                break
            data += line + '\n'
        sock.close()

        data = io.StringIO(data)
        df = pd.read_csv(data)
        df.to_sql(...)
I would like to incorporate threading into this so that I don't have to do one stock at a time. However, what I'm not sure of is where/how to implement locks so that I don't risk assigning data for one stock to the wrong variables, etc.
This is what I have so far:
import io
import socket
import threading
from queue import Queue

import pandas as pd

q = Queue()
my_lock = threading.Lock()

def readlines(sock, recv_buffer=4096, delim='\n'):
    buffer = ''
    while True:
        data = sock.recv(recv_buffer)
        buffer += data.decode('latin-1')
        while buffer.find(delim) != -1:
            line, buffer = buffer.split(delim, 1)
            yield line

def get_symbol_data(sym, sock):
    with my_lock:
        data = ''
        message = sym  # + relevant api-specific commands
        sock.sendall(message.encode())
        for line in readlines(sock):
            if "!ENDMSG!" in line:
                break
            data += line + '\n'

        data = io.StringIO(data)
        df = pd.read_csv(data)
        df.to_sql(...)

def threader():
    while True:
        sym_tuple = q.get()
        sym = sym_tuple[0]
        sock = sym_tuple[1]
        get_symbol_data(sym, sock)
        q.task_done()

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))

    # create 4 threads
    for x in range(4):
        t = threading.Thread(target=threader)
        t.daemon = True
        t.start()

    syms = ['MSFT', 'AAPL', 'GS', 'F']
    for sym in syms:
        q.put((sym, sock))

    q.join()
    sock.close()
My attempt at incorporating threading simply hangs. No errors, nothing; it just hangs. Hopefully someone can point me in the right direction.
I'm also not even sure whether I'm using the lock in the right place.
By the way, if I do not use a lock, the program still hangs. Presumably it should still work, even if the data ends up jumbled because of the missing locks?
Here are my 2*[small unit of currency]:
What is the lock supposed to do? As written, each thread has to wait for the lock before receiving data, so nothing actually runs in parallel. That is not very efficient, since the network operations are precisely the part that could benefit most from being parallelized.
Create a socket in each thread. That way you don't need to synchronize access to the socket and can maybe get rid of the locks completely. Alternatively, use a socket pool.
I'm not sure how you are storing your data, but you might need synchronization between writers when you are updating the pandas data frame. You mention SQL; hopefully your database takes care of that for you. Another option is to have the API/socket readers report their data to a second type of thread (or just the main thread) that collects and writes the data to your storage.
All of the above assumes that the network operations are the reason you want to parallelize in the first place. Someone mentioned in the comments that you could reuse the socket for all symbols. I don't know how your API works, but it seems to me that this would require all symbols to be collected serially.
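Here is a sketch of the second suggestion: one socket per request, no lock around the network I/O, and a lock only around the shared results. The "!ENDMSG!" terminator and symbols come from the question; the fake_api_server, its reply text, and the results dict are stand-ins made up so the example runs standalone on the loopback:

```python
import socket
import threading
from queue import Queue

HOST = "127.0.0.1"

# --- stand-in for the real API server, just for this demo -----------------
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((HOST, 0))        # port 0: OS picks a free port
listener.listen(8)
port = listener.getsockname()[1]

def fake_api_server(n_requests):
    for _ in range(n_requests):
        conn, _ = listener.accept()
        sym = conn.recv(1024).decode()
        conn.sendall(("price data for %s\n!ENDMSG!\n" % sym).encode())
        conn.close()

# --- worker side: one socket per request, lock only around shared state ---
results = {}
results_lock = threading.Lock()  # protects the shared dict, not the sockets
q = Queue()

def worker():
    while True:
        sym = q.get()
        if sym is None:          # poison pill: time to stop
            break
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((HOST, port))
        sock.sendall(sym.encode())
        data = ""
        while "!ENDMSG!" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk.decode()
        sock.close()
        with results_lock:
            results[sym] = data.split("!ENDMSG!")[0]
        q.task_done()

syms = ["MSFT", "AAPL", "GS", "F"]
server_thread = threading.Thread(target=fake_api_server, args=(len(syms),))
server_thread.start()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for sym in syms:
    q.put(sym)
q.join()
for _ in threads:
    q.put(None)
for t in threads:
    t.join()
server_thread.join()
listener.close()
```

Because each worker owns its socket from connect() to close(), there is no way for one symbol's reply to bleed into another symbol's buffer, which is exactly the mix-up the original lock was trying to prevent.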