Python/Twisted - TCP packet fragmentation?

In Twisted, when implementing the dataReceived method, there don't seem to be any examples that deal with packets being fragmented. In every other language this is something you implement manually, so I was wondering whether Twisted already does this for you. If it does, do I still need to prefix my packets with a length header? If not, how should I implement it manually?

In the dataReceived method you get the data as a string (bytes on Python 3) of indeterminate length, meaning that it may be a whole message in your protocol or only part of the message that some 'client' sent to you. You will have to inspect the data to see whether it comprises a whole message in your protocol.
I'm currently using Twisted on one of my projects to implement a protocol, and decided to use the struct module to pack/unpack my data. The protocol I am implementing has a fixed header size, so I don't construct any messages until I've read at least HEADER_SIZE bytes. The total message size is declared in this header.
I guess you don't really need to define a message length as part of your protocol, but it helps. If you didn't define one, you would need a special delimiter that marks where a message begins/ends, similar to how the FIX protocol uses the SOH byte to delimit fields. Though FIX does also have a required field (BodyLength, tag 9) that tells you how long a message is (just not how many fields are in it).
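A minimal sketch of the fixed-header buffering described above follows. It is written as a plain class so it runs without Twisted; in a real application the same dataReceived method would live on a subclass of twisted.internet.protocol.Protocol. The header layout (a 1-byte message type plus a 4-byte big-endian payload length, struct format "!BI") and the names frame, HeaderFramedReceiver, and messageReceived are hypothetical, chosen only for illustration.

```python
import struct

# Hypothetical header layout (an assumption, not the poster's actual protocol):
# 1-byte message type followed by a 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")
HEADER_SIZE = HEADER.size  # 5 bytes; "!" means network order, no padding


def frame(msg_type, payload):
    """Build one wire message: fixed header followed by the payload."""
    return HEADER.pack(msg_type, len(payload)) + payload


class HeaderFramedReceiver:
    """Buffers an arbitrary byte stream and emits complete messages.

    In Twisted, dataReceived is called with whatever chunk arrived,
    which may hold part of a message, one message, or several.
    """

    def __init__(self):
        self._buffer = b""
        self.messages = []

    def dataReceived(self, data):
        self._buffer += data
        # Keep extracting while at least a full header is buffered.
        while len(self._buffer) >= HEADER_SIZE:
            msg_type, length = HEADER.unpack_from(self._buffer)
            end = HEADER_SIZE + length
            if len(self._buffer) < end:
                break  # header seen, payload incomplete: wait for more data
            payload = self._buffer[HEADER_SIZE:end]
            self._buffer = self._buffer[end:]
            self.messageReceived(msg_type, payload)

    def messageReceived(self, msg_type, payload):
        # Hypothetical hook: replace with real message handling.
        self.messages.append((msg_type, payload))
```

Because the loop continues while a full header is buffered, one call can deliver several messages, and a message split across many calls is held back until it is complete.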

When dealing with TCP, you should really forget all notion of 'packets'. TCP is a stream protocol: you stream data in and data streams out the other side. Once the data is sent, it may arrive in as many or as few blocks as it likes, as long as it all arrives in the right order. You'll have to do the delimiting yourself, as in other languages, with a length field, a message type field (if each type has a known size), a special delimiter character, etc.
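To illustrate that chunk boundaries don't matter once you frame the stream yourself, here is a sketch of newline-delimited framing fed with randomly sized chunks. DelimitedReceiver and random_chunks are hypothetical helper names, and the newline delimiter is an assumption; this scheme only works if the delimiter never appears inside a message.

```python
import random


class DelimitedReceiver:
    """Collects delimiter-terminated messages from arbitrarily sized chunks."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk; return the list of messages it completed."""
        self._buffer += chunk
        # Everything before the last delimiter is complete; the tail may be partial.
        *complete, self._buffer = self._buffer.split(self.delimiter)
        return complete


def random_chunks(data, rng):
    """Split data at random points, as a TCP stream is allowed to arrive."""
    i = 0
    while i < len(data):
        step = rng.randint(1, 7)
        yield data[i:i + step]
        i += step
```

However the stream is cut, the receiver yields the same messages in the same order, because it re-splits the whole buffer each time (so even a multi-byte delimiter split across two chunks is handled).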

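You can also use Twisted's LineReceiver protocol (from twisted.protocols.basic), which buffers the stream for you and calls lineReceived once for each complete, delimiter-terminated line. For length-prefixed messages, the same module provides Int32StringReceiver, which calls stringReceived once per complete message.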

Related

How to send UDP client messages in a smart way so the UDP server can parse them

I have a PyQt application (UDP client) that sends some parameters over UDP/IP to an application on a Raspberry Pi (UDP server).
This Qt application has several fields, like PID parameters, motor speed, sensor presets, and so on.
Currently, the UDP client builds a string by getting the values from each field in the Qt application and appending them with a separator character (','), always in the same sequence. For instance, "142.0, 10.0, 2.0, negative, positive".
The UDP server receives this message, splits it, and assigns each item of the list to the respective variable.
It works, but it is not smart: all parameters are sent even when only one of them has changed.
What would be a smarter way to send only specific parameters, without depending on a fixed sequence, or only the changed ones?
Maybe some protocol encapsulated in the UDP message?
If you really want to keep things simple and change the existing code as little as possible, you can include empty values for parameters whose values didn't change. E.g., if you have five parameters and they all changed, you'd send 142,10,2,negative,positive, but if only the first two changed you'd send 142,10,,,. But such ad-hoc schemes should, IMHO, be discouraged.
You could use JSON with very short member names, e.g. {"a":142,"b":10}. You'd have to keep a human-readable mapping between the short keys and their meaning, separate from the data. Since keys can use any Unicode character, you have quite a way to go before you run out of single-character keys. Also, Python supports JSON natively through the standard-library json module.
If you don't care much about packet length, you don't even need short member names: make your packets self-documenting by using meaningful names, such as {"velocity":142,"acceleration":10}.
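A sketch of the JSON approach with meaningful key names, sending only the parameters that changed. Each update is one JSON object per UDP datagram, which works because UDP preserves datagram boundaries. The function names changed_params, encode_update, and apply_update, and the parameter names, are hypothetical.

```python
import json


def changed_params(previous, current):
    """Return only the parameters whose value differs from the last one sent."""
    return {name: value for name, value in current.items()
            if previous.get(name) != value}


def encode_update(params):
    """Serialize an update as compact JSON bytes (one update per datagram)."""
    return json.dumps(params, separators=(",", ":")).encode("utf-8")


def apply_update(state, datagram):
    """Merge a received update into the server's state; key order is irrelevant.

    Unknown keys are ignored, so a buggy or newer client can't inject junk.
    """
    update = json.loads(datagram.decode("utf-8"))
    for name, value in update.items():
        if name in state:
            state[name] = value
    return state
```

On the client, the bytes would be sent with sock.sendto(encode_update(update), (host, port)) on a socket.SOCK_DGRAM socket. One design caveat: UDP can drop datagrams, so if only changes are sent, a lost datagram leaves the server out of date; periodically sending the full parameter set is one common mitigation.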

How to incrementally parse JSON from a websocket message in Python?

I have JSON messages (JSON hash table) represented as strings coming in through a websocket. Each read from the socket may return a string that does not end on a message boundary. What's the easiest way to parse the JSON messages in Python? How do I find where in the string a message terminates without writing a parser (or brace/paren matcher) myself?
Do other languages provide tools to make this easier?
Each read from the socket may return a string that does not end on a message boundary.
This is incorrect: WebSocket is a message-oriented protocol, as opposed to the traditional stream-oriented TCP socket protocol, where you need to worry about and handle message chunking.
It is built on top of TCP; however, it automatically pieces the individual fragments together into one complete message before delivering it to the application layer (your code).
So WebSockets are message-oriented like UDP, but without UDP's maximum length constraints and with TCP’s delivery guarantees and congestion control. It turns out that TCP’s stream orientation isn’t all that useful (think about how many protocols build some sort of “message” concept on top of TCP). In fact, SCTP (RFC 4960) provides many of the same benefits as messages-on-top-of-TCP but removes the TCP part to reduce the overhead. Unfortunately, SCTP has yet to gain widespread adoption.
Also, the official RFC (RFC 6455) describes WebSocket as a layer on top of TCP that, among other things:
layers a framing mechanism on top of TCP to get back to the IP packet mechanism that TCP is built on, but without length limits
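The answer above applies to WebSocket, where messages arrive whole. If the JSON were instead arriving over a plain TCP stream, one standard-library option (a sketch, not the only way) is json.JSONDecoder.raw_decode, which parses one JSON document from the start of a string and reports the index where it ended. extract_json_documents is a hypothetical helper name.

```python
import json

_decoder = json.JSONDecoder()


def extract_json_documents(buffer):
    """Return (complete_documents, unparsed_remainder) from a text buffer.

    Call again with remainder + next_chunk as more data arrives.
    """
    docs = []
    while True:
        buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
        if not buffer:
            break
        try:
            obj, end = _decoder.raw_decode(buffer)
        except json.JSONDecodeError:
            break  # incomplete document (or malformed data): wait for more input
        docs.append(obj)
        buffer = buffer[end:]
    return docs, buffer
```

Caveats: a top-level number at the end of the buffer (e.g. 12 that will become 123) would be accepted too early, which is fine for JSON objects like the ones in the question; and malformed input looks the same as incomplete input here, so a real reader should cap the buffer size.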

Python TCP socket for a lot of data

We (as project group) are currently stuck on the issue of how to handle live data to our server.
We are getting updates on data every second, and we would like to insert this into our database (security is currently not an issue, because it is a school project). The problem is that we tried Python's SocketServer and asyncio to create a TCP server to which the data can be sent.
We got this working with different libraries, etc. But we are stuck on the fact that if we keep an open connection with the client (in this case, hardware that sends data every second), we can't split the different JSON or XML messages. They all get concatenated together.
We know why: TCP only guarantees ordering, not message boundaries.
Any thoughts on how to handle this, so that every message sent is separated from the others?
Recreating the socket for every message wouldn't be the right option, if I recall correctly.
What you will have to do is ensure that there is a clear delimiter for each message. For example, the first 6 characters of every message could be the length of the message: whatever reads from the socket decodes the length, then reads that number of bytes and passes the data to whatever needs it. Another way, if there is a character/byte that never appears in the content, is to send it immediately before each message: for example, control-A (binary value 1) could be the lead-in character, and control-B (binary value 2) the lead-out. Again, the server looks for these bytes framing each message.
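A sketch of the lead-in/lead-out framing described above, using control-A (0x01) and control-B (0x02). FramedStream and wrap are hypothetical names. Bytes before a lead-in are discarded, which lets the reader resynchronize after garbage on the line.

```python
LEAD_IN = b"\x01"   # control-A: marks the start of a message
LEAD_OUT = b"\x02"  # control-B: marks the end of a message


def wrap(payload):
    """Frame one message; the payload must never contain the framing bytes."""
    if LEAD_IN in payload or LEAD_OUT in payload:
        raise ValueError("payload must not contain the framing bytes")
    return LEAD_IN + payload + LEAD_OUT


class FramedStream:
    """Extracts LEAD_IN ... LEAD_OUT framed messages from a byte stream."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add a chunk; return the list of complete message payloads."""
        self._buffer += chunk
        messages = []
        while True:
            start = self._buffer.find(LEAD_IN)
            if start == -1:
                self._buffer = b""  # no message start in sight: drop the noise
                break
            end = self._buffer.find(LEAD_OUT, start + 1)
            if end == -1:
                self._buffer = self._buffer[start:]  # rest is still in transit
                break
            # If a truncated message left an extra lead-in, start from the last one.
            start = self._buffer.rfind(LEAD_IN, start, end)
            messages.append(self._buffer[start + 1:end])
            self._buffer = self._buffer[end + 1:]
        return messages
```

As the answer notes, this only works when the framing bytes can never appear in the content; for arbitrary binary payloads a length prefix is safer.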
If you can't change the client side (the thing sending the data), then you are going to have to parse the input. You can't just add a delimiter to something that you don't control.
An alternative is to use a header that encodes the size of the message to be sent. Let's say you use a 4-byte header. The client first sends the server a header with the size of the message to come, then sends the message itself (up to about 4 GiB, the largest size a 4-byte unsigned integer can express). The server knows that it must first read 4 bytes (the header). It decodes the size n from the header, then reads exactly n bytes from the socket, looping if necessary, since a single recv call may return fewer bytes than requested. That way you are guaranteed to have read only your message. Using special delimiters is dangerous, as you MUST know all possible values that a client can send.
It really depends on the type of data you are receiving, the type of connection, latency, and so on. If you have a pause of 1 second between packets and your connection is consistent, you could probably get away with first reading the entire buffer once to clear it, then, as soon as data is available, reading it and clearing the buffer. Not a great approach, since it relies on timing that TCP does not guarantee, but it might work for what you need, with no parsing involved.
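A sketch of the 4-byte header approach on a blocking socket, assuming a big-endian unsigned length (struct format "!I"). The names send_msg, recv_exact, and recv_msg are hypothetical. The recv_exact loop is the important part: socket.recv(n) may return fewer than n bytes.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return less."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> bytes:
    """Read one complete message: header first, then exactly that many bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The payload can be JSON or XML as in the question; the framing doesn't care what is inside.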

Google Protocol Buffers for parsing text and binary protocol messages in a network trace (PCAP)

I want to parse application-layer protocols from a network trace using Google Protocol Buffers and replay the trace (I am using Python). I need suggestions for automatically generating the protocol message descriptions (.proto files) from a network trace.
So you want to reconstruct what .proto messages were being passed over the application-layer protocol?
This isn't as easy as it sounds. First, protobuf messages can't be sent raw over the wire, because the encoding doesn't mark where a message ends: the receiver needs to know how long each one is. They need to be encapsulated somehow, maybe in an HTTP POST or with a raw 4-byte size prepended. I don't know what it would be for your application, but you'll need to deal with that.
Second, you can't reconstruct the full .proto from the messages alone. You only get tag numbers and types, not names. In addition, you will lose information about submessages - submessages and plain strings are encoded identically (you could probably tell which is which by eyeballing them, but I don't think you could do it automatically). You also will never know about optional items that never got sent. But you could parse the buffer without the proto and get some reasonable data (ints, repeated strings, and such).
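To illustrate what can be recovered without the .proto, here is a minimal sketch of a raw wire-format decoder for wire types 0 (varint), 1 (64-bit), 2 (length-delimited), and 5 (32-bit); the deprecated group types are not handled. It returns only field numbers, wire types, and raw values. read_varint and decode_fields are hypothetical names.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


FIXED_SIZES = {1: 8, 5: 4}  # wire type 1 = 64-bit, wire type 5 = 32-bit


def decode_fields(buf):
    """Split an encoded message into (field_number, wire_type, raw_value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            # Strings, bytes, submessages and packed repeated fields all look alike.
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type in FIXED_SIZES:
            size = FIXED_SIZES[wire_type]
            value = buf[pos:pos + size]
            pos += size
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        if pos > len(buf):
            raise ValueError("truncated field %d" % field_number)
        fields.append((field_number, wire_type, value))
    return fields
```

A length-delimited value comes back as plain bytes; only trying to decode it again reveals whether it happens to parse as a submessage, which is exactly the ambiguity described above.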
Third, you need to reconstruct the application byte stream from the pcap log. I'm not sure how to do that, but I suspect there are tools that would do that for you.

Python - network buffer handling question

I want to design a game server in Python. The game will mostly just be passing small packets filled with ints, strings, and bytes stuffed into one message. As I'm using a different language to write the game, a normal packet would be sent like so:
Writebyte(buffer, 5); // Delimit type of message
Writestring(buffer, "Hello");
Sendmessage(buffer, socket);
As you can see, it writes the bytes to the buffer and sends the buffer. Is there any way to read something like this in Python? I am aware of the struct module, and I've used it to pack things, but I've never used it to read a message with mixed types stuck together. Thanks for the help.
I would recommend using Google Protocol Buffers. Protocol Buffers gives you a multi-language, fast, extensible message serialization framework. You can easily add fields later, parse messages in most popular languages, and embed message types within other message types. It will save you a lot of time as compared to coding your own serialization framework.
Check out http://twistedmatrix.com/ and http://construct.wikispaces.com/
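For the struct route the question asks about, here is a sketch of a sequential reader. It assumes (hypothetically; check your game library's documentation for the real layout) that integers are 4-byte little-endian and strings are UTF-8 terminated by a single NUL byte. BufferReader and its methods are illustrative names; struct.unpack_from reads at an offset without slicing the buffer.

```python
import struct


class BufferReader:
    """Reads mixed-type fields, in order, from one received message.

    Assumed layout (hypothetical): little-endian integers,
    NUL-terminated strings.
    """

    def __init__(self, data):
        self.data = data
        self.pos = 0  # offset of the next unread byte

    def _unpack(self, fmt):
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def read_byte(self):
        return self._unpack("<B")  # unsigned 8-bit

    def read_int(self):
        return self._unpack("<i")  # signed 32-bit

    def read_string(self, encoding="utf-8"):
        end = self.data.index(b"\x00", self.pos)  # ValueError if no terminator
        value = self.data[self.pos:end].decode(encoding)
        self.pos = end + 1  # skip the terminator
        return value
```

Usage mirrors the writer: reader = BufferReader(message), then msg_type = reader.read_byte() followed by reader.read_string(). Over TCP, each message would still need framing (for example a length prefix) before it is handed to such a reader.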
