Python 3: Speed when sending data (IRC protocol, DCC file transfer)

I've written a new IRC client to which I just added the DCC SEND part, so both users of the app can transfer files directly. Nothing fancy: I'm using the irc Python library to power the client and Django for the GUI. The great miniupnpc lib takes care of port forwarding. However, while the file is properly being sent and received, the speed is absolutely HORRENDOUS: approximately 20 KB/s. To test the server, I sent a file using HexChat: the upload speed was the maximal theoretical bandwidth speed (in other words, excellent). I tried looking for a buffer of some sort I may have missed. In the end, I must say I have absolutely no idea why my upload speed is so crappy and need some insight. Here is the relevant part of my upload script.
def on_dcc_connect(self, c, e):
    t = threading.Timer(0, upload_monitoring, [self, c])
    t.start()
    log("connection made with %s" % self.nickname).write()
    self.file = open(self.filename, 'rb')
    self.sendBlock()
def sendBlock(self):
    if self.position > 0:
        self.file.seek(self.position)
    block = self.file.read(1024)
    if block:
        self.dcc.send_bytes(block)
        self.bytesSent = self.bytesSent + len(block)
    else:
        # Nothing more to send, transfer complete.
        self.connection.quit()
def on_dccsend(self, c, e):
    if e.arguments[1].split(" ", 1)[0] == "RESUME":
        self.position = int(e.arguments[1].split(" ")[3])
        c.ctcp("DCC", self.nickname, "ACCEPT %s %d %d" % (
            os.path.basename(self.filename),
            self.eport,
            self.position))
def on_dccmsg(self, connection, event):
    data = event.arguments[0]
    bytesAcknowledged = struct.unpack("!Q", data)[0]
    if bytesAcknowledged < self.bytesSent:
        return
    elif bytesAcknowledged > self.bytesSent:
        self.connection.quit()
        return
    self.sendBlock()
The send_bytes(block) method is just the basic socket.send() method. When I increase the buffer passed to file.read(), I get a struct.error, because the client's block-reception acknowledgment (packed with struct.pack) is no longer read correctly by my send script: the data is not of byte length 8. Is it the file.read buffer that has to be changed? If so, why is the bytes-received acknowledgment not the same on the sender's side as on the downloader's side? If not, where should I look to improve the upload speed?

As I already suspected and as Bakuriu pointed out, the problem did indeed lie at the file.read(buffer) line. I finally found out why I had the struct.error: the bytes acknowledgment was properly sent back to the sender, but sometimes a few packets were joined together. That is, for every packet received, an acknowledgment is answered to the sender in the form of an 8-byte packed unsigned integer. Sometimes sock.recv() doesn't read the incoming data fast enough, and then, instead of having a bytes object of length 8, I have a bytes object of length 16, 24, 32, 40 or more. That's why I couldn't just unpack it with struct.unpack("!Q", data). Once I had that figured out, the solution was fairly easy to find:
def on_dccmsg(self, connection, event):
    data = event.arguments[0][-8:]
    bytesAcknowledged = struct.unpack("!Q", data)[0]
I just read the last 8 bytes from the data returned by sock.recv() instead of unpacking everything. Now it works like a charm and the upload speed is the maximal theoretical upload speed allowed by my bandwidth!!!
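For anyone who wants to be stricter about it, here is a minimal sketch of a variation (mine, not the original fix): it buffers incoming ack bytes and unpacks only complete 8-byte records, so nothing is misread even when several acknowledgments arrive coalesced in one recv(). The ackbuf attribute is hypothetical.

import struct

def on_dccmsg(self, connection, event):
    # Hypothetical ackbuf attribute: accumulate raw ack bytes across events.
    self.ackbuf = getattr(self, 'ackbuf', b'') + event.arguments[0]
    complete = len(self.ackbuf) // 8
    if not complete:
        return  # wait until a whole 8-byte record has arrived
    # The last complete record carries the highest acknowledged byte count.
    bytesAcknowledged = struct.unpack_from("!Q", self.ackbuf, (complete - 1) * 8)[0]
    self.ackbuf = self.ackbuf[complete * 8:]
    if bytesAcknowledged == self.bytesSent:
        self.sendBlock()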

Related

Why these Python send / receive socket functions work if invoked slowly, but fail if invoked quickly in a row?

I have a client and a server, where the server needs to send a number of text files to the client.
The send file function receives the socket and the path of the file to send:
CHUNKSIZE = 1_000_000

def send_file(sock, filepath):
    with open(filepath, 'rb') as f:
        sock.sendall(f'{os.path.getsize(filepath)}'.encode() + b'\r\n')
        # Send the file in chunks so large files can be handled.
        while True:
            data = f.read(CHUNKSIZE)
            if not data:
                break
            sock.send(data)
And the receive file function receives the client socket and the path where to save the incoming file:
CHUNKSIZE = 1_000_000

def receive_file(sock, filepath):
    with sock.makefile('rb') as file_socket:
        length = int(file_socket.readline())
        # Read the data in chunks so it can handle large files.
        with open(filepath, 'wb') as f:
            while length:
                chunk = min(length, CHUNKSIZE)
                data = file_socket.read(chunk)
                if not data:
                    break
                f.write(data)
                length -= len(data)
        if length != 0:
            print('Invalid download.')
        else:
            print('Done.')
It works by sending the file size as the first line, then sending the file data in chunks.
Both are invoked in loops in the client and the server, so that files are sent and saved one by one.
It works fine if I put a breakpoint and invoke these functions slowly. But if I let the program run uninterrupted, it fails when reading the size of the second file:
File "/home/stark/Work/test/networking.py", line 29, in receive_file
length = int(file_socket.readline())
ValueError: invalid literal for int() with base 10: b'00,1851,-34,-58,782,-11.91,13.87,-99.55,1730,-16,-32,545,-12.12,19.70,-99.55,1564,-8,-10,177,-12.53,24.90,-99.55,1564,-8,-5,88,-12.53,25.99,-99.55,1564,-8,-3,43,-12.53,26.54,-99.55,0,60,0\r\n'
Clearly a lot more data is being received by that length = int(file_socket.readline()) line.
My questions: why is that? Shouldn't that line read only the size given that it's always sent with a trailing \n?
How can I fix this so that multiple files can be sent in a row?
Thanks!
It seems like you're reusing the same connection, and because file_socket is buffered, you've actually recv'd more from the socket than you'd think during your read loop.
That is, the buffered reader consumes more data from the socket than your loop writes out, so the next time you attempt to readline() you end up reading the rest of the previous file, up to whatever newline it happens to contain, or part of the next length line.
This also means your initial problem is that you've actually skipped part of a file: the next line read is not the int you expected, hence the observed failure.
You can say:
with sock.makefile('rb', buffering=0) as file_socket:
instead, to force the file-like access to be unbuffered. Or you can handle the receiving, buffering, and parsing of the incoming bytes (understanding where one file ends and the next one begins) on your own, instead of using the file-like wrapper and readline().
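As a rough illustration of the first option (a sketch, assuming the rest of the asker's code stays the same), the unbuffered wrapper never reads ahead, so each call consumes exactly the bytes it returns:

CHUNKSIZE = 1_000_000

def receive_file(sock, filepath):
    # buffering=0 returns a raw, unbuffered SocketIO object: readline()
    # consumes exactly one line, and read() never hoards extra bytes
    # that would be lost when the wrapper is closed between files.
    with sock.makefile('rb', buffering=0) as file_socket:
        length = int(file_socket.readline())
        with open(filepath, 'wb') as f:
            while length:
                data = file_socket.read(min(length, CHUNKSIZE))
                if not data:
                    break
                f.write(data)
                length -= len(data)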
You have to understand that socket communication is based on TCP/IP; it does not matter if it's the same machine (you use the loopback interface in such cases) or different machines. You've got some IP addresses between which the connection is established. Going further, it involves your network adapter, i.e. it takes relatively long in comparison to accessing, e.g., RAM. Additionally, the adapter itself manages when to send particular data frames (the lower ISO/OSI layers). TCP does require ACKs, but a standard PC is not some industrial, real-time Ethernet setup.
So, in your code, you've got a while True loop without any sleep, and you don't check what sock.send returns. send() may accept fewer bytes than you passed it, and if you ignore its return value you silently drop the rest of that chunk and move on to the next one.
So, the first thing you should do is check whether sock.send indeed returned the number of bytes you asked it to send (or simply use sendall). Another thing I strongly recommend in such cases is to think up a small custom protocol (this is usually called the application layer in the context of the OSI/ISO stack). For example, you might have 4 frame types: START, FILESIZE, DATA, and END; assign each a unique ID and start every frame with that identifier. START would be empty, FILESIZE would contain a single uint16, DATA would contain {FILE NUMBER, LINE NUMBER, LINE_LENGTH, LINE}, and END would be empty. Then, once you've got an entire frame on the client, you can safely assemble the information you received.
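A minimal sketch of that framing idea (the frame IDs and header layout here are illustrative, not from the original post):

import struct

START, FILESIZE, DATA, END = 0, 1, 2, 3  # hypothetical frame type IDs

def send_frame(sock, frame_type, payload=b''):
    # 1-byte type + 4-byte big-endian payload length, then the payload.
    sock.sendall(struct.pack('!BI', frame_type, len(payload)) + payload)

def recv_exactly(sock, n):
    # recv may return fewer bytes than requested, so loop until we have n.
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-frame')
        buf += chunk
    return buf

def recv_frame(sock):
    frame_type, length = struct.unpack('!BI', recv_exactly(sock, 5))
    return frame_type, recv_exactly(sock, length)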

How does requests' stream=True option streams data one block at a time?

I'm using the following code to test how many seconds an HTTP connection can be kept alive:
start_time = time.time()
try:
    r = requests.get(BIG_FILE_URL, stream=True)
    total_length = r.headers['Content-length']
    for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
        time.sleep(1)
# ... except clause and more logic to report total time and percentage downloaded
To be sure Python doesn't just download everything at once but really creates a generator, I used tcpdump. It does send one packet per second (approximately), but I didn't find what makes the server send one block at a time, or how the requests library does that.
I've checked several SO questions and looked at the requests library documentation, but all the resources explain how to use the library to download large files, and none of them explain the internals of the stream=True option.
My question is: what in the TCP protocol, or in the HTTP request headers, makes the server send one block at a time and not the whole file at once?
EDIT + possible answer:
After working with Wireshark, I found out Python implements this using TCP's sliding window: the receiving side won't acknowledge more data while the next chunk hasn't been consumed.
That might cause some unexpected behavior, as the sliding window might be a lot bigger than the chunk, and the chunks in the code might not correspond to actual packets.
Example: if you set the chunk to 1000 bytes, a default sliding window of 64K (my default on Ubuntu 18) will cause 64 chunks' worth to be sent immediately. If the body size is less than 64K, the connection might close immediately. So this is not a good idea for keeping a connection alive.
This is not explained in the user documentation.
By going through the source code of requests, I found out that if we set stream=True in requests.get(...), then headers['Transfer-Encoding'] = 'chunked' is set in the HTTP headers, thus specifying chunked transfer encoding. In chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks", which are sent out independently of one another by the server.
Hope this answers the question.
This question caught my curiosity, so I decided to go down this research rabbit hole. Here are some of my (open to corrections!) findings:
Client-to-server communication is standardized by the Open Systems Interconnection (OSI) model.
The transfer of data is handled by layer 4, the Transport Layer. TCP/IP always breaks the data into packets; IP packet lengths max out at approximately 65.5 KB.
Now, what keeps Python from recombining all these packets into the original file before returning it?
The requests iter_content method has a nested generator which wraps a urllib3 generator method: urllib3.response.HTTPResponse(...).stream(...).
The chunk_size parameter seems to set a buffer for how much data is read from the open socket into memory before it's written to the file system.
Here's a copy of the iter_content method, which was helpful:
def iter_content(self, chunk_size=1, decode_unicode=False):
    """Iterates over the response data. When stream=True is set on the
    request, this avoids reading the content at once into memory for
    large responses. The chunk size is the number of bytes it should
    read into memory. This is not necessarily the length of each item
    returned as decoding can take place.

    chunk_size must be of type int or None. A value of None will
    function differently depending on the value of `stream`.
    stream=True will read data as it arrives in whatever size the
    chunks are received. If stream=False, data is returned as
    a single chunk.

    If decode_unicode is True, content will be decoded using the best
    available encoding based on the response.
    """

    def generate():
        # Special case for urllib3.
        if hasattr(self.raw, 'stream'):
            try:
                for chunk in self.raw.stream(chunk_size, decode_content=True):
                    yield chunk
            except ProtocolError as e:
                raise ChunkedEncodingError(e)
            except DecodeError as e:
                raise ContentDecodingError(e)
            except ReadTimeoutError as e:
                raise ConnectionError(e)
        else:
            # Standard file-like object.
            while True:
                chunk = self.raw.read(chunk_size)
                if not chunk:
                    break
                yield chunk

        self._content_consumed = True

    if self._content_consumed and isinstance(self._content, bool):
        raise StreamConsumedError()
    elif chunk_size is not None and not isinstance(chunk_size, int):
        raise TypeError("chunk_size must be an int, it is instead a %s." % type(chunk_size))

    # simulate reading small chunks of the content
    reused_chunks = iter_slices(self._content, chunk_size)

    stream_chunks = generate()

    chunks = reused_chunks if self._content_consumed else stream_chunks

    if decode_unicode:
        chunks = stream_decode_response_unicode(chunks, self)

    return chunks
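For reference, here is a minimal sketch of the usual way to consume a streamed response (BIG_FILE_URL, the chunk size, and the output filename are placeholders):

import requests

with requests.get(BIG_FILE_URL, stream=True) as r:
    r.raise_for_status()
    with open('download.bin', 'wb') as f:
        # Each iteration pulls more data off the socket on demand;
        # nothing beyond the OS/urllib3 buffers is read ahead of time.
        for chunk in r.iter_content(chunk_size=64 * 1024):
            f.write(chunk)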

Issues with sending large JSON file over socket

I'm at the point in my network learning where I have just realized things aren't perfect: network packets can get out of sync, and not everything is sent in one packet.
My program works for small data being sent over the socket, but not for a large file. I keep getting ValueError: Expecting ',' or Expecting ':' whenever I try to send a big file, and I think it's because my receiving function is not that good. I've read up a lot on how you have to implement your own protocol for reading information from a socket, but I still haven't found any information on how to actually implement that.
My program serializes some information on one computer and sends it as JSON to another computer. I heard somewhere that JSON automatically deals with packet boundaries or something like that, but I'm not too clear on what that meant.
In any case, this is my receiving function:
def recieve(node, s, timeout=2):
    s.setblocking(False)
    # array of all data
    all_data = []
    begin = time.time()
    while True:
        # If data was received, break after timeout
        if all_data and time.time() - begin > timeout:
            break
        # If no data received at all, wait some more
        elif time.time() - begin > timeout * 2:
            break
        try:
            data = ''
            while len(data) < BUFFER_SIZE:
                data += s.recv(BUFFER_SIZE - len(data))
            if data:
                all_data.append(data.decode('utf-8'))
                # Reset begin for timeout
                begin = time.time()
        except:
            pass
    return json.loads(''.join(all_data))
And I just use sendall(dumps((data1, data2, data3, data4, data5)).encode()) to send my Python information (which could be dictionaries, lists, etc.).
I'm not sure if my timeouts even make any sense; I'm a complete beginner to network programming and I'm not sure where to go from here. Should I use socket.makefile()? Do I still need to implement a network protocol when using JSON? How do I implement a network wrapper system if I need it?
Thanks so much in advance!
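One common way to frame JSON messages, sketched here with illustrative names, is a fixed 4-byte length prefix. JSON itself knows nothing about sockets or message boundaries, so some protocol like this is still needed:

import json
import struct

def recv_exactly(sock, n):
    # Loop until exactly n bytes have been read; recv may return less.
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed before message ended')
        buf += chunk
    return buf

def send_json(sock, obj):
    payload = json.dumps(obj).encode('utf-8')
    sock.sendall(struct.pack('!I', len(payload)) + payload)

def recv_json(sock):
    (length,) = struct.unpack('!I', recv_exactly(sock, 4))
    return json.loads(recv_exactly(sock, length).decode('utf-8'))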

Receive image in Python

The following code is for a Python server that can receive a string.
import socket

TCP_IP = '127.0.0.1'
TCP_PORT = 8001
BUFFER_SIZE = 1024

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((TCP_IP, TCP_PORT))
s.listen(1)

conn, addr = s.accept()
print 'Connection address:', addr
while 1:
    length = conn.recv(1027)
    data = conn.recv(int(length))
    import StringIO
    buff = StringIO.StringIO()
    buff.write(data)
    if not data: break
    print "received data:", data
    conn.send('Thanks')  # echo
    get_result(buff)
conn.close()
Can anyone help me to edit this code or create a similar one to be able to receive images instead of string?
First, your code actually can't receive a string. Sockets are byte streams, not message streams.
This line:
length = conn.recv(1027)
… will receive anywhere from 1 to 1027 bytes.
You need to loop around each recv and accumulate a buffer, like this:
def recvall(conn, length):
    buf = b''
    while len(buf) < length:
        data = conn.recv(length - len(buf))
        if not data:
            return data
        buf += data
    return buf
Now you can make it work like this:
while True:
    length = recvall(conn, 1027)
    if not length: break
    data = recvall(conn, int(length))
    if not data: break
    print "received data:", data
    conn.send('Thanks')  # echo
You can use StringIO or other techniques instead of concatenation for performance reasons, but I left that out because it's simpler and more concise this way, and understanding the code is more important than performance.
Meanwhile, it's worth pointing out that 1027 bytes is a ridiculously huge amount of space to use for a length prefix. Also, your sending code has to make sure to actually send 1027 bytes, no matter what. And your responses always have to be exactly 6 bytes long for this to work.
def send_string(conn, msg):
    conn.sendall(str(len(msg)).ljust(1027))
    conn.sendall(msg)
    response = recvall(conn, 6)
    return response
But at least now it is workable.
So, why did you think it worked?
TCP is a stream of bytes, not a stream of messages. There's no guarantee that a single send from one side will match up with the next recv on the other side. However, when you're running both sides on the same computer, sending relatively small buffers, and aren't loading the computer down too badly, they will often happen to match up 1-to-1. After all, each time you call recv, the other side has probably only had time to send one message, which is sitting in the OS's buffers all by itself, so the OS just gives you the whole thing. So, your code will appear to work in initial testing.
But if you send the message through a router to another computer, or if you wait long enough for the other side to make multiple send calls, or if your message is too big to fit into a single buffer, or if you just get unlucky, there could be 2-1/2 messages waiting in the buffer, and the OS will give you the whole 2-1/2 messages. And then your next recv will get the leftover 1/2 message.
So, how do you make this work for images? Well, it depends on what you mean by that.
You can read an image file into memory as a sequence of bytes, and call send_string on that sequence, and it will work fine. Then the other side can save that file, or interpret it as an image file and display it, or whatever it wants.
Alternatively, you can use something like PIL to parse and decompress an image file into a bitmap. Then, you encode the header data (width, height, pixel format, etc.) in some way (e.g., pickle it), send_string the header, then send_string the bitmap.
If the header has a fixed size (e.g., it's a simple structure that you can serialize with struct.pack), and contains enough information for the other side to figure out the length of the bitmap in bytes, you don't need to send_string each one; just use conn.sendall(serialized_header) then conn.sendall(bitmap).
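A sketch of that fixed-header idea (the field layout is illustrative; recvall is the helper defined above):

import struct

def send_bitmap(conn, width, height, bytes_per_pixel, bitmap):
    # Fixed 12-byte header: the receiver can compute the bitmap
    # length from it before reading a single pixel.
    conn.sendall(struct.pack('!III', width, height, bytes_per_pixel))
    conn.sendall(bitmap)

def recv_bitmap(conn):
    width, height, bpp = struct.unpack('!III', recvall(conn, 12))
    bitmap = recvall(conn, width * height * bpp)
    return width, height, bpp, bitmap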

TCP Socket file transfer

I'm trying to write a secure file transfer program using Python and AES, and I've got a problem I don't totally understand. I send my file in 1024-byte chunks, but the server side that receives the data crashes (I use AES CBC, therefore my data length must be a multiple of 16 bytes), and the error I get says that it is not.
I printed the length of the data sent by the client on the client side and the length of the data received on the server side, and it shows that the client is sending exactly 1024 bytes each time, like it's supposed to, but the server side shows that at some point a received packet is smaller than 1024 bytes (for example 743 bytes).
I tried to put a time.sleep(0.5) between each socket send on the client side and it seems to work. Is it possible that it is some kind of socket buffer failure on the server side? That too much data is being sent too fast by the client and that it somehow breaks the socket buffer on the server side, so the data is corrupted or vanishes, and the recv(1024) only receives a broken chunk? That's the only thing I could think of, but this may also be completely false. If anyone has an idea of why this is not working properly, it would be great ;)
Following my idea, I tried:
self.s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32768000)
print socket.SO_RCVBUF
I tried to set a 32 MB buffer on the server side, but on Windows XP the print shows 4098 and on Linux it shows only 8. I don't know how to interpret this; the only thing I know is that it doesn't seem to have a 32 MB buffer, so the code doesn't work.
Well, it's been a really long post. I hope some of you had the courage to read it all the way to here! I'm totally lost, so if anyone has any idea about this, please share it :D
Thanks to Faisal, my code is here:
Server side (count is my filesize / 1024):
while 1:
    txt = self.s.recv(1024)
    if txt == " ":
        break
    txt = self.cipher.decrypt(txt)
    if countbis == count:
        txt = txt.rstrip()
    tfile.write(txt)
    countbis += 1
Client side:
while 1:
    txt = tfile.read(1024)
    if not txt:
        self.s.send(" ")
        break
    txt += ' ' * (-len(txt) % 16)
    txt = self.cipher.encrypt(txt)
    self.s.send(txt)
Thanks in advance,
Nolhian
Welcome to network programming! You've just fallen into the same mistaken assumption that everyone makes the first time through: that client sends and server receives should be symmetric. Unfortunately, this is not the case. The OS allows reception to occur in arbitrarily sized chunks. It's fairly easy to work around, though: just buffer your data until the amount you've read equals the amount you wish to receive. Something along the lines of this will do the trick:
buff = ''
while len(buff) < 1024:
    buff += s.recv(1024 - len(buff))
TCP is a stream protocol, it doesn't conserve message boundaries, as you have just discovered.
As others have pointed out, you're probably processing an incomplete message. You need to either have fixed-size messages or have a delimiter (don't forget to escape your data!) so you know when a complete message has been received.
What TCP can guarantee is that all your data arrives, in the right order, at some point. (Unless something unexpected happens, in which case it won't arrive.) But it's very possible that the data you send will still arrive in chunks. Much of this is because of limited send and receive buffers. What you should do is continue making recv calls until you have enough data to process. You might also have to call send multiple times; use its return value to keep track of how much data has been sent/buffered so far, as in the sketch below.
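A minimal sketch of that bookkeeping (essentially what socket.sendall already does for you internally):

def send_all(sock, data):
    # send() may accept only part of the buffer; keep offering the rest.
    sent = 0
    while sent < len(data):
        sent += sock.send(data[sent:])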
When you do print socket.SO_RCVBUF, you actually print the symbolic SO_RCVBUF constant (except that Python doesn't really have constants): the value used to tell setsockopt what you want to change. To get the current value, you should instead call getsockopt.
Not related to TCP (as that has been answered already), but appending to a string repeatedly will be rather inefficient if you're expecting to receive a lot. It might be better to append to a list and then turn the list into a string when you've finished receiving, using ''.join(list); see the sketch below.
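A small sketch of that pattern (the function name is illustrative):

def recv_n(sock, n):
    chunks = []                 # collect pieces in a list...
    remaining = n
    while remaining:
        piece = sock.recv(remaining)
        if not piece:
            break
        chunks.append(piece)
        remaining -= len(piece)
    return b''.join(chunks)     # ...and join once at the end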
For many applications, the complexities of TCP are neatly abstracted by Python's asynchat module.
Here is a nice snippet of code that I wrote some time ago; it may not be the best, but it could be a good example of big file transfer over a local network: http://setahost.com/sending-files-in-local-network-with-python/
As mentioned above, TCP is a stream protocol. You can try the code below, where data is your original data; you can read it from a file or from user input.
Sender
import socket as s
import time as t  # needed for the timing line below

sock = s.socket(s.AF_INET, s.SOCK_STREAM)
sock.connect((addr, 5000))  # addr is the receiver's address
sock.sendall(data)
finish = t.time()
Receiver
import socket as s

sock = s.socket(s.AF_INET, s.SOCK_STREAM)
sock.setsockopt(s.SOL_SOCKET, s.SO_REUSEADDR, 1)
sock.bind(("", 5000))
sock.listen(1)
conn, _ = sock.accept()
pack = []
while True:
    piece = conn.recv(8192)
    if not piece:
        break
    pack.append(piece.decode())
