I am currently pulling stock data down from an api that I have access to. I am doing it in the following steps:
loop through a list of symbols/stocks one by one
create a socket connection and send the relevant message to the api
receive the data and separate it into lines until "!ENDMSG!" is received, at which point the data for that symbol is complete
convert the data (string) into a StringIO, then read it into a pandas dataframe and ultimately write the data to sql
Do the next symbol/stock
Relevant code snippet:
def readlines(sock, recv_buffer=4096, delim='\n'):
    buffer = ''
    while True:
        data = sock.recv(recv_buffer)
        buffer += str(data.decode('latin-1'))
        while buffer.find(delim) != -1:
            line, buffer = buffer.split('\n', 1)
            yield line
def main():
    syms = ['MSFT', 'AAPL', 'GS', 'F']
    for sym in syms:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, port))
        data = ''
        message = sym + #relevant api specific commands
        sock.sendall(message.encode())
        for line in readlines(sock):
            if "!ENDMSG!" in line:
                break
            data += line + '\n'
        sock.close()
        data = io.StringIO(data)
        df = pd.read_csv(data)
        df.to_sql(...)
I would like to incorporate threading into this so that I don't have to do one stock at a time. However, what I'm not sure of is where/how to implement locks so that I don't risk assigning data for the wrong stock to the wrong variables, etc.
This is what I have so far:
import threading
from queue import Queue

q = Queue()
my_lock = threading.Lock()

def readlines(sock, recv_buffer=4096, delim='\n'):
    buffer = ''
    while True:
        data = sock.recv(recv_buffer)
        buffer += str(data.decode('latin-1'))
        while buffer.find(delim) != -1:
            line, buffer = buffer.split('\n', 1)
            yield line

def get_symbol_data(sym, sock):
    with my_lock:
        data = ''
        message = sym + #relevant api specific commands
        sock.sendall(message.encode())
        for line in readlines(sock):
            if "!ENDMSG!" in line:
                break
            data += line + '\n'
        data = io.StringIO(data)
        df = pd.read_csv(data)
        df.to_sql(...)

def threader():
    while True:
        sym_tuple = q.get()
        sym = sym_tuple[0]
        sock = sym_tuple[1]
        get_symbol_data(sym, sock)
        q.task_done()

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))

    # create 4 threads
    for x in range(4):
        t = threading.Thread(target=threader)
        t.daemon = True
        t.start()

    syms = ['MSFT', 'AAPL', 'GS', 'F']
    for sym in syms:
        q.put((sym, sock))

    q.join()
    sock.close()
My attempt at incorporating threading simply hangs. No errors, nothing. It just hangs. Hopefully someone can point me in the right direction.
I'm also not even sure if I'm using the lock in the right place?
By the way, if I do not use a lock, the program still hangs. Presumably it should still work, even if the data is all jumbled up from not using locks?
Here are my two cents:
What is the lock supposed to do? Now each thread has to wait for the lock before receiving data. This is not very efficient since the network operation is probably the thing that could benefit the most from being parallelized.
Create a socket in each thread. This way, you don't need to synchronize access to the socket and maybe get rid of locks completely. Alternatively, use a socket pool.
I'm not sure how you are storing your data, but you might need synchronization between writers when you are updating the pandas data frame. You mention SQL - hopefully your database takes care of that for you. Another option is to have the API/socket readers report their data to a second type of thread (or just the main thread) that collects/writes the data to your storage.
All of the above assumes that the network operations are the reason why you want to parallelize in the first place. Someone mentioned in the comments that you could reuse the socket for all symbols. I don't know how your API works, but it seems to me that this would require all symbols to be collected serially.
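A rough sketch of what that could look like (untested; the host, port, and request format below are placeholders, and readlines() is the generator from the question): each fetch worker opens its own socket, and a single writer thread owns the pandas/SQL step, so no lock is needed around the network I/O.

import io
import socket
import threading
from queue import Queue

import pandas as pd

HOST, PORT = 'api.example.com', 9100   # placeholders for your API endpoint
work_q = Queue()      # symbols waiting to be fetched
result_q = Queue()    # (symbol, csv_text) pairs waiting to be written

def fetch_worker():
    # Each worker uses its own socket, so no lock is needed for the network I/O.
    while True:
        sym = work_q.get()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((HOST, PORT))
        sock.sendall((sym + '\n').encode())   # replace with your API's request format
        data = ''
        for line in readlines(sock):          # readlines() as defined in the question
            if "!ENDMSG!" in line:
                break
            data += line + '\n'
        sock.close()
        result_q.put((sym, data))
        work_q.task_done()

def writer():
    # A single thread does all the pandas/SQL work, so writes never race each other.
    while True:
        sym, data = result_q.get()
        df = pd.read_csv(io.StringIO(data))
        # df.to_sql(...)   # your existing write step
        result_q.task_done()

threading.Thread(target=writer, daemon=True).start()
for _ in range(4):
    threading.Thread(target=fetch_worker, daemon=True).start()
for sym in ['MSFT', 'AAPL', 'GS', 'F']:
    work_q.put(sym)
work_q.join()     # all symbols fetched
result_q.join()   # all results written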
I'm working on a program that receives a string from an Android app sent through WiFi, the program was originally written for Python 2.7, but after adding some additional functionalities I changed it to Python 3.7. However, after making that change, my data had an extra letter at the front and for the life of me I can't figure out why that is.
Here's a snippet of my code; it's a really simple if statement that checks which command was sent from the Android app and controls the Raspberry Pi (4) camera (v2) with it.
This part sets up the connection and waits to see which command I send.
isoCmd = ['auto','100','200','300','400','500','640','800']

HOST = ''
PORT = 21567
BUFSIZE = 1024
ADDR = (HOST,PORT)
brightness = 50
timelapse = 0

tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerSock.bind(ADDR)
tcpSerSock.listen(5)

while True:
    print ('Waiting for connection')
    tcpCliSock,addr = tcpSerSock.accept()
    try:
        while True:
            data = ''
            brightness = ' '
            data = tcpCliSock.recv(BUFSIZE)
            dataStr = str(data[1:])
            print ("Here's data ",dataStr)
            if not data:
                break
            if data in isoCmd:
                if data == "auto":
                    camera.iso = 0
                    print ('ISO: Auto')
                else:
                    camera.iso = int(data)
                    print ('ISO: '), data
When I start the program this is what I see:
Waiting for connection
#If I send command '300'
Here's data b'300'
Here's data b''
Waiting for connection
I'm not sure where the extra b'' is coming from. I have tested the code by adding the "b" at the beginning of each item in the array, which worked for any commands that I defined, but not for the commands that control the Pi camera since, well, there's no extra b at the beginning. (Did that make sense?) My point is, I know I'm able to send commands no problem, I'm just not sure how to get rid of the extra letter. If anyone could give me some advice that would be great. Thanks for helping.
Byte strings are represented with the b prefix.
Although you can see the string in the output when printing, they are inherently bytes.
To get a normal string out of them, the decode function can help:
data.decode("utf-8")
b'data' simply means the data inside the quotes has been received in bytes form. As mentioned in the other answers, you have to decode it with decode('utf-8') to get it in string form.
I have updated your program below to be compatible with Python 3.7+.
from socket import *

isoCmd = ['auto','100','200','300','400','500','640','800']

HOST = ''
PORT = 21567
BUFSIZE = 1024
ADDR = (HOST,PORT)
brightness = 50
timelapse = 0

tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerSock.bind(ADDR)
tcpSerSock.listen(5)

while True:
    print ('Waiting for connection')
    tcpCliSock,addr = tcpSerSock.accept()
    try:
        while True:
            data = ''
            brightness = ' '
            data = tcpCliSock.recv(BUFSIZE).decode('utf-8')
            print ("Here's data "+data)
            if not data:
                break
            if data in isoCmd:
                if data == "auto":
                    camera.iso = 0
                    print ('ISO: Auto')
                else:
                    camera.iso = int(data)
                    print ('ISO: '+ data)
    except Exception as e:
        print(e)
For some background: We currently have a piece of equipment in house which we use to measure the height of an object. It will scan the object, compare it with a reference image and return a pattern match percentage, and if that percentage is above some specified threshold, it will take the height of the object. We use Non-Procedural Ethernet to connect to the sensor through a python socket, and the data is sent by the sensor. The code below showcases how I connect to the sensor:
import socket
import time
import pandas as pd

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

try:
    s.connect(("192.168.1.20", 8500))
    s.settimeout(30)
    data = []
    print('Recording data...')
    while True:
        msg = s.recv(496)
        d = msg.decode('utf-8').split(',')
        data.append(d)
finally:
    s.close()
    out = []
    out.append({'height': float(data[i][0]),
                'reference_pctg': float(data[i][1].split('\r')[0])
                })
    csv = pd.DataFrame(data = out)
    csv.to_csv('./data/' + sheet + '.csv', index = False)
    print(csv)
Currently, the socket lasts 30 seconds and is timed out after that. The issue is, we cannot use the controller to close the connection when the data is done being sent. Is there any way to make the socket close when the sensor doesn't send any data for a specified time?
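One way to do this (a minimal sketch of my own, built only on the settimeout() call already used above; the 5-second idle window is an assumed value) is to treat the timeout as "no data for N seconds": each recv() raises socket.timeout once nothing has arrived for that long, so you can catch it and close instead of letting it propagate.

import socket

IDLE_SECONDS = 5   # assumed idle window; tune to how often the sensor reports

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.1.20", 8500))
s.settimeout(IDLE_SECONDS)   # recv() now raises socket.timeout after IDLE_SECONDS of silence

data = []
try:
    while True:
        msg = s.recv(496)
        if not msg:            # empty result: the peer closed the connection
            break
        data.append(msg.decode('utf-8').split(','))
except socket.timeout:
    # No data for IDLE_SECONDS: treat the measurement as finished and stop reading.
    pass
finally:
    s.close()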
What is the best approach to process a socket connection where I need var data to end with a line break \n?
I'm using the code below, but sometimes the TCP packets get chunked and it takes a long time to match data.endswith("\n").
I've also tried other approaches, like saving the last line if it doesn't end with \n and appending it to data on the next loop, but this also doesn't work because multiple packets get chunked and the 1st and 2nd parts don't match.
I've no control over the other end, it basically sends multiple lines that end in \r\n.
Any suggestion will be welcome, as I don't have much knowledge on socket connections.
def receive_bar_updates(s):
    global all_bars
    data = ''
    buffer_size = 4096
    while True:
        data += s.recv(buffer_size)
        if not data.endswith("\n"):
            continue
        lines = data.split("\n")
        lines = filter(None, lines)
        for line in lines:
            if line.startswith("BH") or line.startswith("BC"):
                symbol = str(line.split(",")[1])
                all_bars[symbol].append(line)
                y = Thread(target=proccess_bars, kwargs={'symbol': symbol})
                y.start()
        data = ""
Example of "normal" data:
line1\r\n
line2\r\n
line3\r\n
Example of chunked data:
line1\r\n
line2\r\n
lin
If you have a raw input that you want to process as lines, the io module is your friend because it will do the low-level assembling of packets into lines.
You could use:
import io

class SocketIO(io.RawIOBase):
    def __init__(self, sock):
        self.sock = sock
    def read(self, sz=-1):
        if sz == -1:
            sz = 0x7FFFFFFF
        return self.sock.recv(sz)
    def seekable(self):
        return False
It is more robust than endswith('\n') because if one packet contains an embedded newline ('ab\ncd'), the io module will correctly process it. Your code could become:
def receive_bar_updates(s):
    global all_bars
    data = ''
    buffer_size = 4096
    fd = SocketIO(s)  # fd can be used as an input file object
    for line in fd:
        if should_be_rejected_by_filter(line): continue  # do not know what filter does...
        if line.startswith("BH") or line.startswith("BC"):
            symbol = str(line.split(",")[1])
            all_bars[symbol].append(line)
            y = Thread(target=proccess_bars, kwargs={'symbol': symbol})
            y.start()
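A practical note on the sketch above (my own addition, so treat it as an assumption rather than part of the answer): iterating a bare RawIOBase object calls recv() in very small chunks while it hunts for the newline, and in Python 3 the lines come back as bytes, which won't match startswith("BH"). Wrapping the raw stream in io.BufferedReader and io.TextIOWrapper addresses both, provided the class implements readinto(), which is what BufferedReader actually calls.

import io

class BufferedSocketIO(io.RawIOBase):
    """Raw stream over a socket; readinto() lets io.BufferedReader drive it."""
    def __init__(self, sock):
        self.sock = sock
    def readinto(self, b):
        return self.sock.recv_into(b)
    def readable(self):
        return True

# s is assumed to be an already-connected socket, as in the question.
lines = io.TextIOWrapper(io.BufferedReader(BufferedSocketIO(s)),
                         encoding='utf-8', newline='')
for line in lines:
    line = line.rstrip('\r\n')
    if line.startswith("BH") or line.startswith("BC"):
        symbol = line.split(",")[1]
        # hand the line to your existing processing here
        print(symbol)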
Use socket.socket.makefile() to wrap the socket in a class that implements Text I/O. It handles buffering, converting between bytes and strings, and lets you iterate over lines. Remember to flush any writes.
Example:
#!/usr/bin/env python3
import socket, threading, time

def client(addr):
    with socket.create_connection(addr) as conn:
        conn.sendall(b'aaa')
        time.sleep(1)
        conn.sendall(b'bbb\n')
        time.sleep(1)
        conn.sendall(b'cccddd\n')
        time.sleep(1)
        conn.sendall(b'eeefff')
        time.sleep(1)
        conn.sendall(b'\n')
        conn.shutdown(socket.SHUT_WR)
        response = conn.recv(1024)
        print('client got %r' % (response,))

def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) as listen_socket:
        listen_socket.bind(('localhost', 0))
        listen_socket.listen(1)
        addr = listen_socket.getsockname()
        threading.Thread(target=client, args=(addr,)).start()
        conn, _addr = listen_socket.accept()
        conn_file = conn.makefile(mode='rw', encoding='utf-8')
        for request in conn_file:
            print('server got %r' % (request,))
        conn_file.write('response1\n')
        conn_file.flush()

if __name__ == '__main__':
    main()
$ ./example.py
server got 'aaabbb\n'
server got 'cccddd\n'
server got 'eeefff\n'
client got b'response1\n'
$
Are you accepting different connections? Or is it one stream of data, split up by \r\n's?
When accepting multiple connections, you'd wait for a connection with s.accept() and then process all of its data. Once you have the whole packet, process its data and wait for the next connection.
What you do then depends on what the structure of each packet would be.
(Example: https://wiki.python.org/moin/TcpCommunication)
If instead you are consuming a stream of data, you should probably process each 'line' you find in a separate thread, while you keep consuming on another.
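To make that concrete, here is a rough producer/consumer sketch of my own (not from the original post): one thread keeps receiving and carries any partial line over to the next recv(), complete lines go onto a Queue, and a worker thread picks them up.

import threading
from queue import Queue

line_q = Queue()

def reader(sock):
    # Producer: keep receiving; carry any trailing partial line into the next recv().
    buf = ''
    while True:
        chunk = sock.recv(4096).decode()
        if not chunk:       # empty result: the other side closed the connection
            break
        buf += chunk
        while '\r\n' in buf:
            line, buf = buf.split('\r\n', 1)
            line_q.put(line)

def worker():
    # Consumer: process complete lines (e.g. the BH/BC handling from the question).
    while True:
        line = line_q.get()
        # ... handle the line here ...
        line_q.task_done()

# Usage sketch: start the worker, then let the reader consume the connected socket.
# threading.Thread(target=worker, daemon=True).start()
# reader(connected_socket)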
Edit:
So, if I have your situation correct: one connection, with the data being a string broken up by \r\n and ending with a \n. The data, however, does not arrive the way you expect, so the loop spins forever waiting for a \n.
The socket interface, as I understand it, signals the end with an empty data result. So the last buffer might have ended with a \n, but recv then just keeps returning empty results while the code tries to find another \n.
Instead, try adding this:
if not data:
    break
Full code:
def receive_bar_updates(s):
    global all_bars
    data = ''
    buffer_size = 4096
    while True:
        data += s.recv(buffer_size)
        if not data:
            break
        if not data.endswith("\n"):
            continue
        lines = data.split("\n")
        lines = filter(None, lines)
        for line in lines:
            if line.startswith("BH") or line.startswith("BC"):
                symbol = str(line.split(",")[1])
                all_bars[symbol].append(line)
                y = Thread(target=proccess_bars, kwargs={'symbol': symbol})
                y.start()
        data = ""
Edit2: Oops, wrong code
I have not tested this code, but it should work:
def receive_bar_updates(s):
    global all_bars
    data = ''
    buf = ''
    buffer_size = 4096
    while True:
        if not "\r\n" in data:  # skip recv if we already have another line buffered.
            data += s.recv(buffer_size)
        if not "\r\n" in data:
            continue
        i = data.rfind("\r\n")
        data, buf = data[:i+2], data[i+2:]
        lines = data.split("\r\n")
        lines = filter(None, lines)
        for line in lines:
            if line.startswith("BH") or line.startswith("BC"):
                symbol = str(line.split(",")[1])
                all_bars[symbol].append(line)
                y = Thread(target=proccess_bars, kwargs={'symbol': symbol})
                y.start()
        data = buf
Edit: Forgot to mention, I only modified the code for receiving the data; I have no idea what the rest of the function (starting with lines = data.split("\n")) is for.
Edit 2: Now uses "\r\n" for linebreaks instead of "\n".
Edit 3: Fixed an issue.
You basically seem to want to read lines from the socket. Maybe you're better off not using low-level recv calls but just using sock.makefile() and treating the result as a regular file that you can read lines from: for line in sfile: ...
That leaves the delay/chunk issue. This is likely to be caused by Nagle's algorithm on the sending side. Try disabling that:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
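For the line-reading part, a minimal sketch of the makefile() idea (my own, assuming an already-connected socket s and the \r\n-terminated line format from the question):

# Text-mode, line-iterable view of the socket; newline='' keeps the \r\n intact.
f = s.makefile('r', encoding='utf-8', newline='')
for raw_line in f:
    line = raw_line.rstrip('\r\n')
    if line.startswith("BH") or line.startswith("BC"):
        symbol = line.split(",")[1]
        # hand the line to your existing processing here
        print(symbol)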
I'm working on a server, and all of the data is line based. I want to be able to raise an exception when a line exceeds a given length without reading any more data than I have to. For example, client X sends a line that's 16KB long even though the line-length limit is 1024 bytes. After reading more than 1024 bytes, I want to stop reading additional data, close the socket and raise an exception. I've looked through the docs and some of the source code, and I don't see a way to do this without rewriting the _readline method. Is there an easier way that I'm overlooking?
EDIT: Comments made me realize I need to add more information. I know I could write the logic to do this without much work, but I was hoping to use builtins to take advantage of efficient buffering with memoryview rather than implementing it myself again or going with the naive approach of reading chunks, joining and splitting as needed without a memoryview.
I don't really like accepting answers that don't really answer the question, so here's the approach I actually ended up taking, and I'll just mark it community wiki or unanswered later if no one has a better solution:
#!/usr/bin/env python3

class LineTooLong(Exception):
    """Raised when a single line exceeds the configured maximum length."""

class TheThing(object):
    def __init__(self, connection, maxlinelen=8192):
        self.connection = connection
        self.lines = self._iterlines()
        self.maxlinelen = maxlinelen

    def _iterlines(self):
        """
        Yield lines from class member socket object.
        """
        buffered = b''
        while True:
            received = self.connection.recv(4096)
            if not received:
                if buffered:
                    raise Exception("Unexpected EOF.")
                yield received
                continue
            elif buffered:
                received = buffered + received
                buffered = b''  # the carried-over bytes are now part of `received`
            if b'\n' in received:
                for line in received.splitlines(True):
                    if line.endswith(b'\n'):
                        if len(line) > self.maxlinelen:
                            raise LineTooLong("Line size: %i" % len(line))
                        yield line
                    else:
                        buffered = line
            else:
                buffered += received
                if len(buffered) > self.maxlinelen:
                    raise LineTooLong("Too much data in internal buffer.")

    def _readline(self):
        """
        Return next available line from member socket object.
        """
        return next(self.lines)
I haven't bothered comparing the code to be certain, but I'm doing fewer concatenations and splits, so I think mine may be more efficient.
I realize that your edit clarifies that what you want is a builtin approach to achieving your goal, and I am not aware of anything existing that gives you that fine-grained control over the readline approach. But I thought I might just include an example that takes a hand-coded approach with a generator and a split... just for fun.
Reference this other question/answer for a nice generator that reads lines:
https://stackoverflow.com/a/822788/496445
Based on that reader:
server.py
import socket

MAXLINE = 100

def linesplit(sock, maxline=0):
    buf = sock.recv(16)
    done = False
    while not done:
        # mid line check
        if maxline and len(buf) > maxline:
            yield buf, True
        if "\n" in buf:
            (line, buf) = buf.split("\n", 1)
            err = maxline and len(line) > maxline
            yield line+"\n", err
        else:
            more = sock.recv(16)
            if not more:
                done = True
            else:
                buf = buf+more
    if buf:
        err = maxline and len(buf) > maxline
        yield buf, err

HOST = ''
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
conn, addr = s.accept()
print 'Connected by', addr

for line, err in linesplit(conn, MAXLINE):
    if err:
        print "Error: Line greater than allowed length %d (got %d)" \
            % (MAXLINE, len(line))
        break
    else:
        print "Received data:", line.strip()

conn.close()
client.py
import socket
import time
import random

HOST = ''
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))

while True:
    val = 'x'*random.randint(1, 50)
    if random.random() > .5:
        val += "\n"
    s.sendall(val)
    time.sleep(.1)

s.close()
output
Connected by ('127.0.0.1', 57912)
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
...
Received data: xxxxxxxxxxx
Received data: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Error: Line greater than allowed length 100 (got 102)
The server reads over the data it receives and constantly checks the length of the line once it assembles one. If at any time the line exceeds the amount specified, it returns an error code. I threw this together kind of fast so I am sure the checks could be cleaned up a bit more, and the read buffer amount can be changed to address how quickly you want to detect the long lines before consuming too much data. In the output example above, I only got 2 more bytes than is allowed, and it stopped.
The client just sends random-length data, with a 50/50 chance of a newline.
I am trying to write a program that works as an intermediary (M).
I can only use telnet to connect:
A needs to connect to M, B connects to M.
A sends data to M on a socket, M needs to pass it to B
B sends data to M on another socket
I tried this by starting four threads with a shared list.
The problem is that it does not seem to write to the other socket, or even to accept writes.
Does anyone know a better way to implement this and pass the data through to the other socket?
My code:
import sys
import arduinoReadThread
import arduinoWriteThread
import socket

class ControllerClass(object):
    '''
    classdocs
    '''
    bolt = 0
    socketArray = list()

    def __init__(self):
        self.readAndParseArgv()
        self.createThreads()

    def readAndParseArgv(self):
        array = sys.argv
        print sys.argv
        if len(array) != 3:
            print "Too few arguments : ./script host:port host:port"
        else:
            for line in array:
                if ":" in line:
                    splitted = line.split(':')
                    HOST = splitted[0]
                    print HOST
                    PORT = int(splitted[1])
                    print PORT
                    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create an INET, STREAMing socket
                    s.bind((HOST, PORT))  # bind to that port
                    print "test"
                    s.listen(1)  # listen for user input and accept 1 connection at a time.
                    self.socketArray.append(s)

    def createThreads(self):
        print "Creating Threads"
        sharedArray1 = list()
        sharedArray2 = list()
        s1 = self.socketArray.pop()
        s2 = self.socketArray.pop()

        sT1 = arduinoWriteThread.writeThread().run(self.bolt,sharedArray1,s2)
        sT2 = arduinoReadThread.readThread().run(self.bolt,sharedArray1,s1)
        sT3 = arduinoReadThread.readThread().run(self.bolt,sharedArray2,s2)
        sT4 = arduinoWriteThread.writeThread().run(self.bolt,sharedArray2,s1)

        sT1.start()
        sT2.start()
        sT3.start()
        sT4.start()

x = ControllerClass()
x
Two threads:
Write Thread:
import threading

class writeThread(threading.Thread):
    def run(self, bolt, writeList, sockeToWriteTo):
        s = sockeToWriteTo
        while(bolt == 0):
            conn, addr = s.accept()
            if len(writeList) > 0:
                socket.send(writeList.pop(0))
Read Thread
import threading

class readThread(threading.Thread):
    def run(self, bolt, writeList, socketToReadFrom):
        s = socketToReadFrom
        while(bolt == 0):
            conn, addr = s.accept()
            f = conn.rcv()
            print f
            writeList.append(f)
You don't really need threads for this...
When a new connection is accepted, add it to a list. When receiving anything from one of the connections in the list, send it to all connections except the one you got the message from.
Use select to see which connections have sent data to you.
Edit
Example using select:
import select

# serversocket: One server socket listening on some port, has to be non-blocking
# all_sockets : List containing all connected client sockets
while True:
    readset = [serversocket]
    readset += all_sockets

    # Wait for sockets to be ready, with a 0.1 second timeout
    read_ready, _, _ = select.select(readset, [], [], 0.1)

    # If the listening socket can be read, it means it has a new connection
    if serversocket in read_ready:
        new_connection, client_addr = serversocket.accept()
        new_connection.setblocking(0)  # Make socket non-blocking
        all_sockets += [new_connection]
        read_ready.remove(serversocket)  # To not loop over it below

    for sock in read_ready:
        # Read data from socket
        data = sock.recv(2048)
        for s in all_sockets:
            # Do not send to self
            if s != sock:
                s.send(data)
Disclaimer: I have never really used the Python socket functions; the code above was made from reading the manual pages just now. The code is probably not optimal or very Pythonic either.