I am currently sending data over TCP/IP in my server using something like this:
for str in lst:
    data = str + "\n"
    self._conn.sendall(data)
Now suppose my list has the following two strings in it:
1-This is statement 1 in list
2-This is statement 2 in list
My client is receiving half of line 2 like this.
This is statement 1 in list
This is
I would like to send line 1 and then line 2 in the list individually. I understand that TCP/IP works this way: it sends whatever data is available to send. I think I could add a delay after calling self._conn.sendall(data), but I wanted to know what other options I have. I cannot make changes to the receiver of the data; I can only change the sender. So far my only option is adding a delay after each send.
TCP works with streams of data, not individual packets. It's like reading data from a file. The sender puts data in its send buffer, and TCP can decide for itself when to send it. The timing of the arrival at the receiving application depends on when the data was sent and on (often unpredictable) network conditions.
TCP deliveries can be made more predictable if you set the TCP_NODELAY flag on your socket (something like socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)). This causes TCP to send out data as soon as it arrives in its buffer. But still, there are no guarantees as to arrival times. This is why any time-based solution would break, at least in some cases.
The solution is to divide the data stream into chunks yourself. There are several ways of doing that. Here are a few:
Use fixed length messages - if all messages have a fixed length, the receiver just has to recv() the right number of bytes, process the message, then wait for the same number of bytes.
Send the length of the message before each message. If you want to send the string "blah", encode it as "0004blah" or something similar. The receiver will (always) read the first four bytes (which are 0004) to figure out the number of remaining bytes to read. It will then read the required number of bytes, process the message, and then wait for the next one. It's a robust solution that's also easy to implement.
Use a delimiter. Lines in text files are divided by newline characters (\n). Similarly, you can add a special delimiter byte (or bytes) between messages. For example, you can define that messages always end with a dollar sign ($). Then all the receiver has to do is read from the socket byte by byte until it receives a dollar sign. Of course if you take this approach, you have to make sure that the body of the messages doesn't contain the delimiter character.
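As a rough illustration of the length-prefix option above, here is a minimal sketch that uses the "0004blah" encoding from the example. The function names encode_message and decode_messages are made up for this sketch, and the 4-digit ASCII prefix (which caps a message at 9999 bytes) is an assumption, not a standard.

```python
def encode_message(text):
    """Prefix the UTF-8 payload with its length as 4 ASCII digits."""
    payload = text.encode("utf-8")
    if len(payload) > 9999:
        raise ValueError("payload too long for a 4-digit length prefix")
    return b"%04d" % len(payload) + payload


def decode_messages(buffer):
    """Split complete messages off the front of buffer.

    Returns (messages, leftover); leftover holds any incomplete tail
    that should be kept until more bytes arrive.
    """
    messages = []
    while len(buffer) >= 4:
        length = int(buffer[:4])
        if len(buffer) < 4 + length:
            break  # the body has not fully arrived yet
        messages.append(buffer[4:4 + length].decode("utf-8"))
        buffer = buffer[4 + length:]
    return messages, buffer
```

The receiver would append every recv() result to its leftover bytes and call decode_messages again, so it does not matter how TCP happens to split the stream.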
TCP is based on a stream, not individual messages. So you need to parse the end point of each message yourself. One idea in your case would be to read until you get a newline, then process the line. Note that you might read this:
This is statement 1 in list
This is
Then you need to check to see if you got a newline, process the line, then leave your buffer ready to receive the rest, like this:
This is
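The buffering described above could be sketched roughly like this. LineBuffer is a hypothetical helper name; the idea is simply to keep the unfinished tail between calls and hand back only complete lines.

```python
class LineBuffer:
    """Collects raw chunks and yields only complete newline-terminated lines."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        # Everything before the last newline is complete; the remainder
        # (possibly empty) stays buffered until the next chunk arrives.
        self._pending += chunk
        *lines, self._pending = self._pending.split(b"\n")
        return lines
```

Each value returned by recv() would be passed to feed(), and only the returned lines would be processed.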
TCP keeps a local send buffer, and by default small writes may be held back and coalesced rather than sent immediately. You can force the buffer to be flushed after every message, but when the other party receives these packets they get stored in another local buffer and your separation may disappear. TCP is a stream, so use it as a stream: add separator characters, and split the messages apart manually on the receiving side. If you want more control over message boundaries, use UDP packets.
I'm trying to modify a TCP payload by stripping out some bytes.
As long as the bytes are replaced with other bytes of the same length instead of being stripped out, modifying the packet works fine.
If the bytes are stripped out, Wireshark shows a [TCP Previous segment not captured] message in the dump.
I delete both checksums and the packet length of the modified packet so that Scapy recalculates all of them when sending the packet:
# Delete the old checksums
del packet_mod[IP].chksum
del packet_mod[TCP].chksum
# Delete the old packet length
del packet_mod[IP].len
The modification works if I also cut off len(stripped_bytes) at the end of the modified packet as well, as the re-sent TCP segment is added to the modified packet by the receiver.
E.g.: I strip out 20 bytes of the TCP payload. The modification then only works, if I also cut off an additional 20 bytes at the end of the payload.
What am I missing?
I don't understand what this part means:
E.g.: I strip out 20 bytes of the TCP payload. The modification then
only works, if I also cut off an additional 20 bytes at the end of the
payload.
Anyway, the thing you're missing is that each TCP segment carries
a TCP header field -- the "sequence number" field -- that indicates
the position of this segment's data content in the stream of bytes that
is being transferred through TCP.
The TCP receiver uses segment sequence numbers and segment lengths
(the segment length is computed from the lengths of the IP datagrams
that delivered the segment) to build a continuous byte stream from
the received traffic. For example, if the receiver has previously
collected all data up to sequence position 200 and the next incoming
segments look like this:
(segment 1) sequence=200 length=80 data='data data data ...'
(segment 2) sequence=280 length=60 data='more data ...'
(segment 3) sequence=340 length=70 data='even more data ...'
then the receiver knows that it has now collected all of the data
up to (but not including) position 410. Since there are no gaps,
this data is ready to be passed up to the application.
Note that the segment numbers (1), (2), (3) are not present in
the TCP header. Those numbers are only there so that this description
can refer to them.
Obviously, if segment (2) had been lost and the receiver had only
collected segments (1) and (3) then the receiver would know that
there was a gap in the received data. And it would know exactly
where that gap was: it's missing 60 bytes starting at position 280.
TCP promises to deliver a complete in-order stream of data, so until
that gap is filled in, the receiver is not allowed to deliver any
later bytes (like the 70 bytes at position 340 that it got in segment
3) to the application. If the missing bytes don't arrive very soon
then the receiver will tell the sender about the gap and the sender
will retransmit the missing data.
This is why removing bytes from a TCP segment causes problems. If
your program removed 20 bytes from segment (2) then the receiver
would see this:
(segment 1) sequence=200 length=80 data='data data data ...'
(segment 2) sequence=280 length=40 data='more data ...'
(segment 3) sequence=340 length=70 data='even more data ...'
and the receiver would conclude that it had discovered a gap of
20 bytes at position 320.
If Wireshark is observing this traffic then it will reach the same conclusion.
Neither the TCP receiver nor Wireshark knows that the cause of the
missing bytes is that segment (2) was edited. Wireshark's most
reasonable guess is that the missing bytes were in a segment that
somehow wasn't made available for inspection, and that's why it
shows the "Previous segment not captured" message. It says "previous"
because it doesn't discover that there's a gap until it examines
segment (3), the one after the gap.
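The gap arithmetic used in the examples above can be sketched in a few lines. This is only an illustration: find_gaps is a made-up name, segments are assumed to arrive in order without overlapping, and sequence-number wraparound is ignored.

```python
def find_gaps(next_expected, segments):
    """Return (gaps, next_expected) for a list of (sequence, length) segments.

    Each gap is a (start, length) pair describing missing bytes.
    Assumes in-order, non-overlapping segments and no wraparound.
    """
    gaps = []
    for sequence, length in segments:
        if sequence > next_expected:
            # Bytes between what we have and where this segment starts are missing.
            gaps.append((next_expected, sequence - next_expected))
        next_expected = max(next_expected, sequence + length)
    return gaps, next_expected
```

With the edited segment (2) from the example, find_gaps(200, [(200, 80), (280, 40), (340, 70)]) reports one gap of 20 bytes at position 320, which is the same conclusion the receiver and Wireshark reach.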
The receiver will handle this in the same way that it handles any
gap. It will tell the sender about the gap and wait for the missing
data to be retransmitted. If the receiver gets the missing data
then it will fill in the gap and then continue as usual. If you
keep intercepting the retransmission and removing the missing bytes
then the receiver will continue to report that it has a gap, and
the sender will eventually tire of retransmitting the missing data
and it will abandon the TCP connection.
This means that you can't simply remove data from an in-flight TCP segment
and expect TCP to not notice that data has gone missing. In principle you
could do it by deleting some data and then manipulating the sequence
number in all of the later segments from the sender and in all of the
acknowledgements sent by the receiver, but that's a much larger task.
References: The basic TCP sequence number mechanism is described
in RFC 793 and a longstanding
best practice for using sequence numbers to improve security was
described in RFC 1948 and
formalised as a standard in RFC 6528.
For a project I'm working on, I'm supposed to use XBee radio modules, which isn't super important, except that I have to read and write to their serial port in order to use them. I'm currently working with Python and ROS, so I'm attempting to send TransformStamped messages over the XBees.
My question is, unless I'm misunderstanding how Serial.read() and Serial.write() work, how can I tell how many bytes to read? I was planning on using Pickle to serialize the data into a string, and then sending that over the serial ports. Is there a better way that I've overlooked? Is there some sort of loop that would work to read data until the end of the pickled string is read?
The short answer is, serial.read() cannot tell you how many bytes to read. Either you have some prior knowledge as to how long the message is, or the data you send has some means of denoting the boundaries between messages.
Hint: knowing how long a message is is not enough; you also need to know whereabouts in the received byte stream a message has actually started. You don't know for sure that the bytes received are exactly aligned with the sent bytes: you may not have started the receiver before the transmitter, so they can be out of step.
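With any serialisation one has to ask: is it self-delimiting or not? Google Protocol Buffers are not. A pickle stream does end with a STOP opcode, so pickle.load() can find the end of one object on a clean stream, but that does not help a receiver that joins mid-stream or loses bytes. ASN.1 BER is, at least to some extent. So is XML.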
The point is that XBee modules are (assuming you're using the ones from Digi) just unreliable byte transports, so whatever you put through them has to be delimited in some way so that the receiving end knows when it has a complete message. Thus if you pickle or Google Protocol Buf your message, you need some other way of framing the serialised data so that the receiving end knows it has a complete message (i.e. it's seen the beginning and end). This can be as simple as some byte pattern (e.g. 0xffaaccee00112233) used to denote the end of one message and the beginning of the next, chosen so as to be unlikely to occur in the sent messages themselves. Your code at the receiving end would read and discard data until it saw that pattern, would then read subsequent data into a buffer until it saw that pattern again, and only then would it attempt to de-pickle / de-GPB the data back into an object.
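A minimal sketch of that receive-side logic, assuming the example pattern above is used as the marker. The names SYNC and extract_frames are invented for this sketch, and it does nothing about a payload that happens to contain the marker.

```python
SYNC = bytes.fromhex("ffaaccee00112233")  # example marker from the text above


def extract_frames(buffer):
    """Return (frames, leftover) for the bytes received so far.

    Bytes before the first marker are discarded. The leftover starts at
    the last marker seen, so it can be prepended to the next read.
    """
    start = buffer.find(SYNC)
    if start == -1:
        # No marker yet: keep only a tail that could be the start of one.
        return [], buffer[-(len(SYNC) - 1):]
    frames = []
    pos = start + len(SYNC)
    while True:
        end = buffer.find(SYNC, pos)
        if end == -1:
            break
        if end > pos:
            frames.append(buffer[pos:end])
        pos = end + len(SYNC)
    return frames, buffer[pos - len(SYNC):]
```

Each frame returned this way is a candidate for pickle.loads() or a protobuf parse; a checksum inside the frame would be a sensible addition on a lossy link.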
With ASN.1 BER, the data stream itself incorporates effectively the same thing, saving you the effort. It uses tags, values and length fields to tell its decoders about the incoming data, and if the incoming data makes no sense to the decoder in comparison to the original schema, incorrectly framed data is easily ignored.
This kind of problem also exists on TCP sockets, though at least with those delivery is more or less guaranteed (the first bytes you receive are the first bytes sent). A Digimesh connection does not quite reach the same level of guaranteed delivery as a TCP socket, so something else is needed (like a framing byte pattern) for the receiving application to know that it is synchronised with the sender.
I made a quick program that sends a file using sockets in python.
Server:
import socket, threading
#Create a socket object.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
#Bind the socket.
sock.bind( ("", 5050) )
#Start listening.
sock.listen()
#Accept client.
client, addr = sock.accept()
#Open a new file jpg file.
file = open("out.jpg", "wb")
#Receive all the bytes and write them into the file.
while True:
    received = client.recv(5)
    #Stop receiving.
    if received == b'':
        file.close()
        break
    #Write bytes into the file.
    file.write( received )
Client:
import socket, threading
#Create a socket object.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
#Connect to the server.
sock.connect(("192.168.1.3", 5050))
#Open a file for read.
file = open("cpp.jpg", "rb")
#Read first 5 bytes.
read = file.read(5)
#Keep sending bytes until reaching EOF.
while read != b'':
    #Send bytes.
    sock.send(read)
    #Read the next chunk (up to 1024 bytes) from the file.
    read = file.read(1024)
sock.close()
file.close()
From experience I learned that send can only send as many bytes as your network
speed is capable of sending. If you give, for example, sock.send(20 gb), you are going to lose bytes because most network connections can't send 20 gb at
once. You must send them part by part.
So my question is: How can I know the maximum amount of bytes that socket.send()
can send over the internet? How can I improve my program to send the file as quickly as possible depending on my internet speed?
send makes no guarantees that all the data is sent (it's not directly tied to network speed; there are multiple reasons it could send less than requested), just that it lets you know how much was sent. You could explicitly write loops to send until it's all really sent, per Dunno's answer.
Or you could just use sendall and avoid the hassle. sendall is basically the wrapper described in the other answer, but Python does all the heavy lifting for you.
If you don't care about slurping the whole file into memory, you could use this to replace your whole loop structure with just:
sock.sendall(file.read())
If you're on modern Python (3.5 or higher) on a UNIX-like OS, you could optimize a bit to avoid even reading the file data into Python using socket.sendfile (which should only lead to partial send on error):
sock.sendfile(file)
If Python doesn't support os.sendfile on your OS, this is effectively just a loop that reads and sends repeatedly, but on a system that supports it, this directly copies from file to socket in the kernel, without even handling file data in Python (which can improve throughput significantly by reducing system calls and eliminating some memory copies entirely).
Just send those bytes in a loop until all were sent, here's an example from the docs
def mysend(self, msg):
    totalsent = 0
    while totalsent < MSGLEN:
        sent = self.sock.send(msg[totalsent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        totalsent = totalsent + sent
In your case, MSGLEN would be the length of the chunk you pass in (at most 1024), and since you're not using a class, you don't need the self argument.
There are input/output buffers at all steps along the way between your source and destination. Once a buffer fills, nothing else will be accepted on to it until space has been made available.
As your application attempts to send data, it will fill up a buffer in the operating system that is cleared as the operating system is able to offload that data to the network device driver (which also has a buffer).
The network device driver interfaces with the actual network and understands how to know when it can send data and how receipt will be confirmed by the other side (if at all). As data is sent, that buffer is emptied, allowing the OS to push more data from its buffer. That, in turn, frees up room for your application to push more of its data to the OS.
There are a bunch of other things that factor into this process (timeouts and max hops are two I can think of offhand), but the general process is that you have to buffer the data at each step until it can be sent to the next step.
From experience I learned that send can only send as many bytes as
your network speed is capable of sending.
Since you are using a TCP Socket (i.e. SOCK_STREAM), speed-of-transmission issues are handled for you automatically. That is, once some bytes have been copied from your buffer (and into the socket's internal send-buffer) by the send() call, the TCP layer will make sure they make it to the receiving program, no matter how long it takes (well, within reason, anyway; the TCP layer will eventually give up on resending packets if it can't make any progress at all over the course of multiple minutes).
If you give for example: sock.send(20 gb) you are going to lose bytes
because most network connections can't send 20 gb at once. You must
send them part by part.
This is incorrect; you are not going to "lose bytes", as the TCP layer will automatically resend any lost packets when necessary. What might happen, however, is that send() might decide not to accept all of the bytes that you offered it. That's why it is absolutely necessary to check the return value of send() to see how many bytes send() actually accepted responsibility for -- you cannot simply assume that send() will always accept all the bytes you offered to it.
So my question is: How can I know the maximum amount of bytes that
socket.send() can send over the internet?
You can't. Instead, you have to look at the value returned by send() to know how many bytes send() has copied out of your buffer. That way, on your next call to send() you'll know what data to pass in (i.e. starting with the next byte after the last one that was sent in the previous call)
How can I improve my program to send the file as quickly as possible
depending on my internet speed?
Offer send() as many bytes as you can at once; that will give it the most flexibility to optimize what it's doing behind the scenes. Other than that, just call send() in a loop, using the return value of each send() call to determine what bytes to pass to send() the next time (e.g. if the first call returns 5, you know that send() read the first 5 bytes out of your buffer and will make sure they get to their destination, so your next call to send() should pass in a buffer starting at the 6th byte of your data stream... and so on). (Or if you don't want to deal with that logic yourself, you can call sendall() like @ShadowRanger suggested; sendall() is just a wrapper containing a loop around send() that does that logic for you. The only disadvantage is that, e.g., if you call sendall() on 20 gigabytes of data, it might be several hours before the sendall() call returns! Whether or not that would pose a problem for you depends on what else your program might want to accomplish, if anything, while sending the data.)
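For what it's worth, the loop described above might look something like this. It is only a sketch: send_all is a made-up name, and the memoryview is just one way to avoid copying the unsent tail on every iteration.

```python
import socket


def send_all(sock, data):
    """Call send() repeatedly until every byte of data has been accepted."""
    view = memoryview(data)
    total_sent = 0
    while total_sent < len(view):
        # send() returns how many bytes it accepted, which may be fewer
        # than offered; resume from the first byte it did not take.
        sent = sock.send(view[total_sent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        total_sent += sent
    return total_sent
```

This is essentially what sendall() does for you, so in practice it is mostly useful if you want to report progress or do other work between calls.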
That's really all there is to it for TCP.
If you were sending data using a UDP socket, on the other hand, things would be very different; in the UDP case, packets can simply be dropped, and it's up to the programmer to manage speed-of-transmission issues, packet resends, etc., explicitly. But with TCP all that is handled for you by the OS.
@Jeremy Friesner
So I can do something like that:
file = open(filename, "rb")
read = file.read(1024**3) #Read 1 gb.
totalsend = 0
#Send Loop
while totalsend < filesize:
    #Try to send all the bytes.
    send = sock.send(read)
    totalsend += send
    #If failed, then seek into the file the position
    #where the next read will also read the missing bytes.
    if send < 1024**3:
        file.seek(totalsend)
    read = file.read(1024**3) #Read 1 gb.
Is this correct?
Also, from this example I understood one more thing. The data you can send in every loop can't be bigger in size than your memory, because you are bringing bytes from the disk into memory. So theoretically, even if your network speed is infinite, you can't send all the bytes at once if the file is bigger than your memory.
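One way to keep memory use bounded, sketched under the assumption that a modest fixed chunk size is acceptable, is to read a chunk, hand it to sendall(), and repeat. The name send_file and the 64 KiB default are arbitrary choices for this sketch, not recommendations from the answers above.

```python
import io
import socket

CHUNK_SIZE = 64 * 1024  # assumed value; tune as needed


def send_file(sock, fileobj, chunk_size=CHUNK_SIZE):
    """Stream a binary file object over sock one chunk at a time."""
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of file
        # sendall() only returns once the whole chunk has been accepted,
        # so there is no need to seek back for partially sent data.
        sock.sendall(chunk)
        total += len(chunk)
    return total
```

Only one chunk is held in memory at a time, so the file size is not limited by RAM.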
I was wondering if there is a way I can tell python to wait until it gets a response from a server to continue running.
I am writing a turn based game. I make the first move and it sends the move to the server and then the server to the other computer. The problem comes here. As it is no longer my turn I want my game to wait until it gets a response from the server (wait until the other player makes a move). But my line:
data=self.sock.recv(1024)
hangs because (I think) it's not getting something immediately. So I want to know how I can make it wait for something to happen and then keep going.
Thanks in advance.
The socket programming howto is relevant to this question, specifically this part:
Now we come to the major stumbling block of sockets - send and recv operate on the
network buffers. They do not necessarily handle all the bytes you hand them (or expect
from them), because their major focus is handling the network buffers. In general, they
return when the associated network buffers have been filled (send) or emptied (recv).
They then tell you how many bytes they handled. It is your responsibility to call them
again until your message has been completely dealt with.
...
One complication to be aware of: if your conversational protocol allows multiple
messages to be sent back to back (without some kind of reply), and you pass recv an
arbitrary chunk size, you may end up reading the start of a following message. You’ll
need to put that aside and hold onto it, until it’s needed.
Prefixing the message with its length (say, as 5 numeric characters) gets more complex,
because (believe it or not), you may not get all 5 characters in one recv. In playing
around, you’ll get away with it; but in high network loads, your code will very quickly
break unless you use two recv loops - the first to determine the length, the second to
get the data part of the message. Nasty. This is also when you’ll discover that send
does not always manage to get rid of everything in one pass. And despite having read
this, you will eventually get bit by it!
The main takeaways from this are:
you'll need to establish either a FIXED message size, OR you'll need to send the size of the message at the beginning of the message
when calling socket.recv, pass number of bytes you actually want (and I'm guessing you don't actually want 1024 bytes). Then use LOOPs because you are not guaranteed to get all you want in a single call.
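That line, sock.recv(1024), blocks until at least one byte is available (it then returns up to 1024 bytes), the peer closes the connection, or the OS detects a socket error. It does not wait for a full 1024 bytes, so you need some way to know the message size -- this is why HTTP messages include the Content-Length.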
You can set a timeout with socket.settimeout to abort reading entirely if the expected number of bytes doesn't arrive before a timeout.
You can also explore Python's non-blocking sockets using setblocking(0).
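Putting the loop and the timeout together, a receive helper could look roughly like this. recv_exactly is a hypothetical name, and it assumes you already know how many bytes the message has (a fixed size or a length prefix).

```python
import socket


def recv_exactly(sock, count, timeout=None):
    """Block until exactly count bytes have been read from sock.

    Raises socket.timeout if a recv() call waits longer than timeout
    seconds, and ConnectionError if the peer closes early.
    """
    sock.settimeout(timeout)
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than asked for
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

For the turn-based game, the client would simply call this with no timeout and sit in recv() until the other player's move arrives.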
How do I get the following code to break up large files into smaller parts and send those parts, instead of sending the whole file? It fails to send large files (Tested with an ubuntu iso around 600mb)
...some code
# file transfer
with open(sendFile, "rb") as f:
    while 1:
        fileData = f.read()
        if fileData == "": break
        # send file
        s.sendall(EncodeAES(cipher, fileData))
    f.close()
...more code
I tried with f.read(1024), but that didn't work.
Finally, when splitting up the files, I would need to be able to put the parts together again.
I'm also encrypting the files using PyCrypto, if that has any impact on what I'm trying to do. I guess it would be smartest to encrypt the separate parts, instead of encrypting the whole file and then splitting that into parts.
Hope the above code is enough. If not, I'll update with more code.
I may be wrong, but I'm betting that your actual problem is not what you think it is, and it's the same reason your attempt to fix it by reading 1K at a time didn't help. Apologies if I'm wrong, and you already know this basic stuff.
You're trying to send your cipher text like this:
s.sendall(EncodeAES(cipher, fileData))
There is certainly no length information, no delimiter, etc. within this code. And you can't possibly be sending length data outside this function, because you don't know how long the ciphertext will be before getting to this code.
So, I'm guessing the other side is doing something like this:
data = s.recv(10*1024*1024)
with open(recvFile, "wb") as f:
f.write(DecodeAES(cipher, data))
Since the receiver has no way of knowing where the encrypted file ends and the next encrypted file (or other message) begins, all it can do is try to receive "everything" and then decrypt it. But that could be half the file, or the file plus 6-1/2 other messages, or the leftover part of some previous message plus half the file, etc. TCP sockets are just streams of bytes, not sequences of separate messages. If you want to send messages, you have to build a protocol on top of TCP.
I'm guessing the reason you think it only fails with large files is that you're testing on localhost, or on a simple LAN. In that case, for smallish sends, there's a 99% chance that you will recv exactly as much as you sent. But once you get too big for one of the buffers along the way, it goes from working 99% of the time to 0% of the time, so you assume the problem is that you just can't send big files.
And the reason you think that breaking it into chunks of 1024 bytes gives you gibberish is that it means you're doing a whole bunch of messages in quick succession, making it much less likely that the send and recv calls will match up one-to-one. (Or this one may be even simpler—e.g., you didn't match the changes on the two sides, so you're not decrypting the same way you're encrypting.)
Whenever you're trying to send any kind of messages (files, commands, whatever) over the network, you need a message-based protocol. But TCP/IP is a byte-stream-based protocol. So, how do you handle that? You build a message protocol on top of the stream protocol.
The easiest way to do that is to take a protocol that's already been designed for your purpose, and that already has Python libraries for the client and either Python libraries or a stock daemon that you can just use as-is for the server. Some obvious examples for sending a file are FTP, TFTP, SCP, or HTTP. Or you can use a general-purpose protocol like netstring, JSON-RPC, or HTTP.
If you want to learn to design and implement protocols yourself, there are two basic approaches.
First, you can start with Twisted, monocle, Tulip, or some other framework that's designed to do all the tedious and hard-to-get-right stuff so you only have to write the part you care about: turning bytes into messages and messages into bytes.
Or you can go bottom-up, and build your protocol handler out of basic socket calls (or asyncore or something else similarly low-level). Here's a simple example:
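import struct

def send_message(sock, msg):
    # Prefix each message with its length as a 4-byte big-endian unsigned int.
    length = len(msg)
    if length >= (1 << 32):
        raise ValueError('Sorry, {} is too big to fit in a 4GB message'.format(length))
    sock.sendall(struct.pack('!I', length))
    sock.sendall(msg)

def recv_bytes(sock, length):
    # Keep calling recv until exactly `length` bytes have arrived.
    buf = b''
    while len(buf) < length:
        received = sock.recv(length - len(buf))
        if not received:
            if not buf:
                return buf  # connection closed before any of these bytes arrived
            raise RuntimeError('Socket seems to have closed in mid-message')
        buf += received
    return buf

def recv_message(sock):
    length_buf = recv_bytes(sock, 4)
    if not length_buf:
        return None  # the peer closed the connection between messages
    length, = struct.unpack('!I', length_buf)
    msg_buf = recv_bytes(sock, length)
    if len(msg_buf) < length:
        raise RuntimeError('Socket seems to have closed in mid-message')
    return msg_buf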
Of course in real life, you don't want to do tiny little 4-byte reads, which means you need to save up a buffer across multiple calls to recv_bytes. More importantly, you usually want to turn the flow of control around, with a Protocol or Decoder object or callback or coroutine. You feed it with bytes, and it feeds something else with messages. (And likewise for the sending side, but that's always simpler.) By abstracting the protocol away from the socket, you can replace it with a completely different transport—a test driver (almost essential for debugging protocol handlers), a tunneling protocol, a socket tied to a select-style reactor (to handle multiple connections at the same time), etc.
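To make the "feed it bytes, it feeds something else messages" idea a bit more concrete, here is a minimal sketch of such a decoder for the same 4-byte length-prefixed format. The class name LengthPrefixedDecoder and the on_message callback are invented for this illustration; real frameworks have their own interfaces.

```python
import struct


class LengthPrefixedDecoder:
    """Turns arbitrary byte chunks into complete length-prefixed messages."""

    def __init__(self, on_message):
        self._buffer = bytearray()
        self._on_message = on_message  # called once per complete message

    def feed(self, data):
        # Accumulate whatever arrived, then peel off every complete message.
        self._buffer += data
        while len(self._buffer) >= 4:
            (length,) = struct.unpack_from("!I", self._buffer)
            if len(self._buffer) < 4 + length:
                break  # wait for the rest of the body
            self._on_message(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
```

Because the decoder never touches a socket, a test can feed it bytes one at a time, and the same object works behind a blocking recv loop or a select-style reactor.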