python TCP how many bytes to read - python

I want to send a filename from one machine to another. Do I need to find out exactly how big the string is and read exactly that many bytes? Right now I'm reading an arbitrary number of bytes (200) that I know is bigger than the string. Is this a problem? I want to send more commands in the future too; is reading more bytes than necessary going to mess that up?
send code:
filenameToSend = 'images/capture' + str(captureNumber) + '-' + str(pairNumber + 1) + '-' + 'IMAGEINPAIRNUMBER' + '-' + currentTime + '.jpg'
# send message to capture image pair on slave
sock.sendto(filenameToSend.encode('utf-8'), (slave1_TCP_IP, TCP_PORT))
receive code:
c, addr = sock.accept()
data = c.recv(200)
command = data.decode('utf-8')
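A note on why the fixed-size read becomes a problem: TCP is a byte stream, so a single recv(200) may return only part of one message, or the tail of one command together with the start of the next. One common way around this is to prefix every message with its length. The following is a minimal sketch of that idea, not a definitive protocol; the 4-byte header and the helper names send_message, recv_exactly and recv_message are assumptions made for illustration.

```python
import socket
import struct

# Assumed framing: a 4-byte unsigned length in network byte order, then the payload.
HEADER = struct.Struct("!I")

def send_message(sock, text):
    """Send one UTF-8 message preceded by its byte length."""
    payload = text.encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, count):
    """Keep calling recv until exactly `count` bytes have arrived."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read the length header, then read exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length).decode("utf-8")
```

With this framing the receiver never has to guess a buffer size, and several commands sent back to back are read one at a time with repeated calls to recv_message.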

Related

Sending Sparse Files

I've been messing around with sockets in Python and I'd like to be able to send a sparse image file from one machine to another. As expected, sending a sparse file over a python socket doesn't preserve the sparseness of the file. I'd like to do a sparse tar and send it that way, but I just can't figure it out.
The tarfile module says it supports reading sparse files with the GNU format which doesn't help me for creating them... but the python docs say the Pax format has "virtually no limits". I'm not sure if that means I can create an archive and preserve the sparse file or not using the pax format... I've been trying but I just have no idea how it might work.
If this solution isn't an option, is there any other way to send a sparse file over a socket? I hate to have to call 'tar -xSf' via a system command from my application...
Thanks,
Server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.bind((socket.gethostname(), 50001))
s.listen(1)
img = open('test.img', 'rb')
client, addr = s.accept()
l = img.read(8192)
while(l):
    client.sendall(l)  # sendall keeps writing until the whole chunk is sent
    l = img.read(8192)
img.close()
s.close()
Client
host = ''
port = 50001
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect((host, port))
img = open('./newimg.img', 'wb')
l = s.recv(8192)
while(l):
    img.write(l)
    l = s.recv(8192)
img.close()
s.close()
On the server, I make a new sparse file: truncate -s 1G test.img
a du -h shows: 0 test.img
I run my server and client. Here is a du -h on the transferred file: 1.0G newimg.img
As you can see, it expands the file and it is no longer sparse.
Holes in files are normally created when you write to the beginning of a file, seek towards the end and write there. If you read the file you get zeros wherever there is a hole. When you send the file, those literal zero bytes are read and sent. When the receiver then writes them, every byte is written out and the filesystem does not create any holes.
To mitigate that, you can first find the holes in the file, send where they are, and then send the rest of the file.
The following is not polished but should give you a starting point.
import json
import os
f = open(path, "rb")
fd = f.fileno()
end = os.fstat(fd).st_size
# collect the (start, end) offsets of every hole
holes = []
offset = os.lseek(fd, 0, os.SEEK_HOLE)
while offset < end:
    try:
        end_hole = os.lseek(fd, offset, os.SEEK_DATA)
    except OSError:
        end_hole = end  # no data after this hole, it runs to the end of the file
    holes.append((offset, end_hole))
    if end_hole >= end:
        break
    offset = os.lseek(fd, end_hole, os.SEEK_HOLE)
# [open socket and stuff]; sock below is the connected socket
# send the holes
sock.sendall(json.dumps(holes).encode("utf-8")) # encode appropriately
# send file, skipping over the holes
total = 0
for hole in holes + [(end, end)]: # the extra entry covers data after the last hole
    f.seek(total)
    while total < hole[0]:
        l = f.read(min(8192, hole[0] - total))
        if not l:
            break
        sock.sendall(l)
        total += len(l)
    total = hole[1]
Then on the client side:
still_json = True
a = []
l = s.recv(8192)
while(still_json):
    a.append(l)
    if check_json_end(l):
        still_json = False
    else:
        l = s.recv(8192)
holes, l = parse_json(a) # the last chunk can contain something that is not json
# I assume parse_json also returns those leftover bytes, which are already file data
fout = open(outfile, "wb")
total = 0 # current offset in the output file
for hole in holes:
    while total < hole[0]:
        if not l:
            l = s.recv(8192)
            if not l:
                raise EOFError("connection closed before the file was complete")
        chunk = l[:hole[0] - total] # only the part that belongs before the hole
        fout.write(chunk)
        total += len(chunk)
        l = l[len(chunk):]
    fout.seek(hole[1]) # skip the hole instead of writing zeros
    total = hole[1]
# whatever is left is the data after the last hole
l = l or s.recv(8192)
while l:
    fout.write(l)
    total += len(l)
    l = s.recv(8192)
fout.truncate(total) # keeps the full size when the file ends in a hole
fout.close()
There are probably lots of bugs in it and you should rethink each line, but the general principle should be alright. JSON is of course arbitrarily chosen, there are probably other protocols that are better in that case. You could also create your own.
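To make the hole-skipping principle easier to check in isolation, here is a minimal local sketch that copies a file while preserving its holes, with no socket involved. It assumes a platform where os.SEEK_DATA and os.SEEK_HOLE exist (Linux, for example); the helper names data_extents and copy_sparse are made up for this example. On a filesystem without hole support the whole file is simply reported as one data extent, so the copy is still correct, just not sparse.

```python
import errno
import os

def data_extents(fd, size):
    """Return a list of (start, end) byte ranges that actually contain data."""
    extents = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # only a hole remains up to the end of the file
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents

def copy_sparse(src_path, dst_path, chunk_size=65536):
    """Copy src to dst, writing only the data extents and seeking over holes."""
    # buffering=0 so that os.lseek and src.seek operate on the same raw position
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        for start, end in data_extents(src.fileno(), size):
            src.seek(start)
            dst.seek(start)
            remaining = end - start
            while remaining > 0:
                block = src.read(min(chunk_size, remaining))
                if not block:
                    break
                dst.write(block)
                remaining -= len(block)
        dst.truncate(size)  # restores the full length when the file ends in a hole
```

The same list of extents (or its complement, the list of holes) is what would be sent ahead of the data in the socket version.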

Python (pySerial): cannot trim/remove CR, LF or unexpected characters from string

I am using an Arduino to output Temp & Hum data from a sensor, this is being read by a PC running Python using pySerial. The data is reading in correctly but I would like to remove the CR/LF and unexpected characters. One idea I found on this site was to use lstrip or lreplace but they do not seem to work correctly. They will remove one instance of the character but even repeating the line or making a small loop has no effect.
This is what the program Prints (Bottom line is string after I've tried to cut out the unnecessary characters):
[b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC18.00\r\n']
[" b'Hum40.00\r\n'", " b'TempC18.00\r\n']"]
I am aiming for it to read:
[Hum40.00, TempC18.00]
I can hopefully fine tune the message later.
This is the code:
import serial as ser
import time
count = 0
msgArray = []
saveMsg = []
ser = ser.Serial('COM16', 9600, timeout=1, parity='N', stopbits=1, bytesize=8, xonxoff=0) # Setting up and opening COM port
ser.close()
ser.open()
def readSerial(): #reads a whole line from COM port
    serLine = ser.readline()
    return serLine
def sveMsgCut(): #saves the buffer as a message then cuts message
    cutMsg = saveMsg
    words = cutMsg.split(',')
    return words
while True: #main program
    dataSerial = readSerial()
    if count < 5: #reads COM port 5 times and passes along to buffer msgArray
        msgArray.append(dataSerial)
        count = count+1
    else:
        print(msgArray) #~display msgArray
        saveMsg = str(msgArray) #convert to string
        splitMsg = saveMsg.split(',') #splits string (csv)
        phrase = splitMsg[3:5] #cuts out excess either side of Temp & Hum/
        phraseString = str(phrase)
        phraseNew = phraseString.lstrip("/n") #an attempt to remove CR
        print(phraseNew) #~print adjusted string
        saveMsg = msgArray
        count = 0 #resets msgArray and counter
        msgArray = []
        time.sleep(5)
I am fairly new to programming, especially Python so it may be something simple that I've missed but have tried several different ideas and cannot remove the extra characters.
Not sure why rstrip/lstrip do not work for you.
This code runs as expected on my machine:
s = '\r\nHum40.00\r\n'
print(s.rstrip().lstrip())
The only difference I see is the "/n" parameter, so instead try:
phraseNew = phraseString.lstrip()
Decided to go about this another way. Placed the raw serial data into a list and extracted the temp and humidity readings then joined them together as a string:
else:
    print(msgArray) #Raw serial data
    msgString = str(msgArray) #convert serial data to str
    character = list(msgString) #str to list
    # the next two lines extract the temp & hum readings then convert
    # them to their respective strings
    humidity = ''.join(character[46:51])
    temperature = ''.join(character[65:70])
    print('H:' + (humidity) + ' T:' + (temperature))
The output now looks like this (Raw then processed data):
[b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC21.00\r\n']
H:40.00 T:21.00
Now onto the next stage.
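For reference, the stray characters come from calling str() on bytes objects: that produces the printable representation, so the b, the quotes and a literal backslash followed by r or n all become ordinary characters of the string, and strip no longer sees real CR/LF. Slicing by fixed character positions also breaks as soon as a reading changes width. A minimal sketch of an alternative, assuming the sensor sends plain ASCII text, is to decode each line before stripping it; the helper name clean_lines is made up for this example.

```python
def clean_lines(raw_lines):
    """Decode serial lines to text, strip CR/LF and drop empty lines."""
    cleaned = []
    for raw in raw_lines:
        # decode() turns bytes into real text; str(raw) would give "b'...'" instead
        text = raw.decode("ascii", errors="replace").strip()
        if text:
            cleaned.append(text)
    return cleaned

# The raw list shown in the question
raw = [b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC18.00\r\n']
readings = [line for line in clean_lines(raw) if line.startswith(("Hum", "Temp"))]
print(readings)
```

In the serial loop the same idea would be applied to each value returned by ser.readline() before it is appended to the list.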
One thought:
So what you have now is:
"b'Hum40.00\r\n'"
Now what I see here is that the ' symbol is now a part of the string. This means that ' is the last symbol instead of \r\n. I have found success with starting on the outside and working inwards. Let's first start by removing the b. To do this try:
x = str(ser.readline())
x = x.lstrip("b")
Now we should see this:
"'Hum40.00\r\n'"
Next remove the '
x = x.strip("'")
Now we see this:
"Hum40.00\r\n"
Now the tricky part here is deleting the \r\n. There is a post here: Can't delete "\r\n" from a string
that explains why:
x = x.rstrip("\r\n")
will not work.
Instead type:
x = x.rstrip("\\r\\n")
Final code:
x = str(ser.readline())
x = x.lstrip("b")
x = x.strip("'")
x = x.rstrip("\\r\\n")
print(x)
Should yield:
"Hum40.00"

Broken pipe during a subprocess stdin.write

I interact with a server that I use to tag sentences. This server is launched locally on port 2020.
For example, if I send Je mange des pâtes . on port 2020 through the client used below, the server answers Je_CL mange_V des_P pâtes_N ._., the result is always one line only, and always one line if my input is not empty.
I currently have to tag 9 568 files through this server. The first 9 483 files are tagged as expected. After that, the input stream seems closed / full / something else because I get an IOError, specifically a Broken Pipe error when I try to write on stdin.
When I skip the first 9 483 files, the remaining ones are tagged without any issue, including the one causing the first error.
My server doesn't produce any error log indicating something fishy happened... Do I handle something incorrectly? Is it normal that the pipe fails after some time?
log = codecs.open('stanford-tagger.log', 'w', 'utf-8')
p1 = Popen(["java",
            "-cp", JAR,
            "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
            "-client",
            "-port", "2020"],
           stdin=PIPE,
           stdout=PIPE,
           stderr=log)
fhi = codecs.open(SUMMARY, 'r', 'utf-8') # a descriptor of the files to tag
for i, line in enumerate(fhi, 1):
    if i % 500 == 0: # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        p1.stdin.write(' '.join(tokens).encode('utf-8') + '\n')
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    # Here I do something with result...
fhi.close()
In addition to my comments, I might suggest a few other changes...
for i, line in enumerate(fhi, 1):
    if i % 500 == 0: # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        s = ' '.join(tokens).encode('utf-8') + '\n'
        assert s.find('\n') == len(s) - 1 # Make sure there's only one newline in s
        p1.stdin.write(s)
        p1.stdin.flush() # Block until we're sure it's been sent
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    assert result # Make sure we got something back
    assert result.find('\n') == len(result) - 1 # Make sure there's only one newline in result
    # Here I do something with result...
fhi.close()
...but given there's also a client/server about which we know nothing, there are a lot of places where it could be going wrong.
Does it work if you dump all the queries into a single file, and then run it from the commandline with something like...
java .... < input > output
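The one-line-in, one-line-out exchange suggested above can be sketched in Python 3 as follows. This is only an illustration under stated assumptions: the real tagger client is replaced by a stand-in child process (a small Python script that upper-cases each line), and the names ECHO_CHILD and query_lines are invented for the example. The point is the pattern: reject inputs containing a newline, flush after every write, and treat an empty read as the child having gone away, so the failing input number is known.

```python
import subprocess
import sys

# Stand-in child for illustration only: answers each input line in upper case.
ECHO_CHILD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n",
]

def query_lines(lines, command=ECHO_CHILD):
    """Send each line to the child and collect exactly one reply line per input."""
    replies = []
    with subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          encoding="utf-8") as proc:
        for number, line in enumerate(lines, 1):
            if "\n" in line:
                raise ValueError("input %d contains a newline" % number)
            proc.stdin.write(line + "\n")
            proc.stdin.flush()  # make sure the child actually receives the line
            reply = proc.stdout.readline()
            if not reply:
                # An empty read means the child closed its output or exited.
                raise RuntimeError("child stopped answering at input %d (exit code %r)"
                                   % (number, proc.poll()))
            replies.append(reply.rstrip("\n"))
    return replies
```

Knowing the exact input at which the child stops answering, together with its exit code, narrows down whether the problem is in the input, the client process or the server behind it.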

Function append lines to .csv

It has been a while since I have written functions with for loops and writing to files, so bear with my ignorance.
This function is given an IP address to read from a text file; pings the IP, searches for the received packets and then appends it to a .csv
My question is: Is there a better or an easier way to write this?
def pingS (IPadd4):
    fTmp = "tmp"
    os.system ("ping " + IPadd4 + " -n 500 > tmp") # note the space before -n
    sName = siteNF #sys.argv[1]
    scrap = open(fTmp,"r")
    nF = open(sName,"a") # appends
    nF.write(IPadd4 + ",")
    for line in scrap:
        if line.startswith(" Packets"):
            arrT = line.split(" ")
            nF.write(arrT[10]+" \n")
    scrap.close()
    nF.close()
Note: If you need the full script I can supply that as well.
This in my opinion at least makes what is going on a bit more obvious. The len('Received = ') could obviously be replaced by a constant.
def pingS (IPadd4):
    fTmp = "tmp"
    os.system ("ping " + IPadd4 + " -n 500 > tmp") # note the space before -n
    sName = siteNF #sys.argv[1]
    scrap = open(fTmp,"r")
    nF = open(sName,"a") # appends
    ip_string = scrap.read()
    start = ip_string.find('Received = ') + len('Received = ')
    recvd = ip_string[start:ip_string.find(',', start)] # all digits up to the next comma
    nF.write(IPadd4 + ',' + recvd + '\n')
    scrap.close()
    nF.close()
You could also try looking at the Python csv module for writing to the csv. In this case it's pretty trivial though.
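As a rough illustration of the csv module suggestion, here is a minimal sketch that pulls the received count out of ping output with a regular expression and appends a row with csv.writer. It assumes the Windows-style summary line containing "Received = N"; the function names parse_received and append_result are hypothetical and not part of the original script.

```python
import csv
import re

# Assumes a summary line such as "Packets: Sent = 500, Received = 498, Lost = 2 ..."
RECEIVED_RE = re.compile(r"Received = (\d+)")

def parse_received(ping_output):
    """Return the received-packet count as an int, or None if it is not found."""
    match = RECEIVED_RE.search(ping_output)
    return int(match.group(1)) if match else None

def append_result(csv_path, ip_address, received):
    """Append one 'ip,received' row to the CSV file."""
    # newline="" lets the csv module control the line endings itself
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow([ip_address, received])
```

The ping output itself could come from the temporary file as in the question, or be captured directly with the subprocess module, which avoids the shell redirect.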
This may not be a direct answer, but you may get some performance increase from using StringIO. I have had some dramatic speedups in IO with this. I'm a bioinformatics guy, so I spend a lot of time shooting large text files out of my code.
http://www.skymind.com/~ocrow/python_string/
I use method 5. Didn't require many changes. There are some fancier methods in there, but they didn't appeal to me as much.

Decoding tcp packets using python

I am trying to decode data received over a TCP connection. The packets are small, no more than 100 bytes. However, when there are a lot of them I receive some of the packets joined together. Is there a way to prevent this? I am using Python.
I have tried to separate the packets; my source is below. The packets start with an STX byte and end with an ETX byte. The byte following the STX is the packet length (packet lengths less than 5 are invalid), and the checksum is in the last bytes before the ETX.
def decode(data):
    while True:
        start = data.find(STX)
        if start == -1: #no stx in message
            pkt = ''
            data = ''
            break
        #stx found , next byte is the length
        pktlen = ord(data[1])
        #check message ends in ETX (pktlen -1) or checksum invalid
        if pktlen < 5 or data[pktlen-1] != ETX or checksum_valid(data[start:pktlen]) == False:
            print "Invalid Pkt"
            data = data[start+1:]
            continue
        else:
            pkt = data[start:pktlen]
            data = data[pktlen:]
            break
    return data , pkt
I use it like this
#process reports
try:
    data = sock.recv(256)
except: continue
else:
    while data:
        data, pkt = decode(data)
        if pkt:
            process(pkt)
Also, if there are multiple packets in the data stream, is it best to return the packets as a collection of lists or just return the first packet?
I am not that familiar with Python, only C. Is this method OK? Any advice would be most appreciated. Thanks in advance.
Thanks
I would create a class that is responsible for decoding the packets from a stream, like this:
class PacketDecoder(object):
    STX = ...
    ETX = ...
    def __init__(self):
        self._stream = ''
    def feed(self, buffer):
        self._stream += buffer
    def decode(self):
        '''
        Yields packets from the current stream.
        '''
        while len(self._stream) > 2:
            end = self._stream.find(self.ETX)
            if end == -1:
                break
            packet_len = ord(self._stream[1])
            packet = self._stream[:end+1] # keep the ETX, as in the question's packets
            if packet_len >= 5 and checksum_valid(packet):
                yield packet
            self._stream = self._stream[end+1:]
And then use like this:
decoder = PacketDecoder()
while True:
    data = sock.recv(256)
    if not data:
        # handle lost connection...
        break
    decoder.feed(data)
    for packet in decoder.decode():
        process(packet)
TCP provides a data stream, not individual packets, at the interface level. If you want discrete packets, you can use UDP (and handle lost or out of order packets on your own), or put some data separator inline. It sounds like you are doing that already, with STX/ETX as your separators. However, as you note, you get multiple messages in one data chunk from your TCP stack.
Note that unless you are doing some other processing, data in the code you show does not necessarily contain an integral number of messages. That is, it is likely that the last STX will not have a matching ETX. The ETX will be in the next data chunk without an STX.
You should probably read individual messages from the TCP data stream and return them as they occur.
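The point about chunks not lining up with messages can be sketched as a small buffering decoder in Python 3. It keeps leftover bytes between recv calls and uses the length byte to cut packets, so an ETX value inside the payload does not end a packet early. Several things here are assumptions for illustration: STX and ETX are taken to be the ASCII control codes 0x02 and 0x03, the length byte counts the whole packet, the checksum is a single byte, and simple_checksum (sum of bytes modulo 256) merely stands in for the real algorithm, which the question does not show.

```python
STX = 0x02  # ASCII start-of-text, assumed value
ETX = 0x03  # ASCII end-of-text, assumed value

def simple_checksum(body):
    """Hypothetical checksum: sum of the bytes modulo 256."""
    return sum(body) % 256

def build_packet(payload):
    """Build STX, total length, payload, checksum, ETX (helper for the example)."""
    body = bytes([STX, len(payload) + 4]) + payload
    return body + bytes([simple_checksum(body), ETX])

class StreamDecoder:
    """Accumulates received chunks and returns every complete, valid packet."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        self._buf += chunk
        packets = []
        while True:
            start = self._buf.find(STX)
            if start == -1:
                self._buf.clear()       # no STX at all: nothing worth keeping
                break
            del self._buf[:start]       # drop noise before the STX
            if len(self._buf) < 2:
                break                   # length byte has not arrived yet
            length = self._buf[1]
            if length < 5:
                del self._buf[:1]       # invalid length: resync on the next STX
                continue
            if len(self._buf) < length:
                break                   # packet incomplete: wait for more data
            packet = bytes(self._buf[:length])
            if packet[-1] == ETX and simple_checksum(packet[:-2]) == packet[-2]:
                packets.append(packet)
                del self._buf[:length]
            else:
                del self._buf[:1]       # bad packet: resync on the next STX
        return packets
```

Each chunk returned by sock.recv would be passed to feed, and the returned list processed in order. A corrupted length byte can make the decoder wait for bytes that never come, so a real implementation would probably add a timeout or a maximum length check.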
Try scapy, a powerful interactive packet manipulation program.
Where does the data come from ? Instead of trying to decode it by hand, why not use the excellent Impacket package:
http://oss.coresecurity.com/projects/impacket.html
Nice and simple... :)
The trick is in the file object.
f=sock.makefile()
while True:
    STX = f.read(1)
    pktlen = f.read(1)
    wholePacket = STX + pktlen + f.read(ord(pktlen)-2)
    doSomethingWithPacket(wholePacket)
And that's it! (There is also no need to check checksums when using TCP.)
And here is a more "robust"(?) version (it uses STX and checksum):
f=sock.makefile()
while True:
    while f.read(1)!=STX:
        continue
    pktlen = f.read(1)
    wholePacket = STX + pktlen + f.read(ord(pktlen)-2)
    if checksum_valid(wholePacket):
        doSomethingWithPacket(wholePacket)
