Deserialize list of objects using protobuf - python

I'm building a C# server and Python client app with socket communication. The server sends a serialized list of objects to the client, but I have no idea how to deserialize a list in Python (and couldn't find anything on it either). Any help would be appreciated.

Alright, I found a solution, if anyone is interested. The trick is to create a new message type and add the original one as a repeated field. Here's how:
message TransactionPackets {
    repeated TransactionPacket packet = 1;
}

message TransactionPacket {
    required int32 trans_id = 1;
    required string user_id = 2;
    required int64 date = 3;
}
Now I can simply deserialize a whole list of objects by calling ParseFromString() on a TransactionPackets message.
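For example, a minimal sketch of the Python side (transaction_pb2 is assumed to be the module protoc generated from the .proto above; the actual module name depends on your file name):

from transaction_pb2 import TransactionPackets  # hypothetical generated module

packets = TransactionPackets()
packets.ParseFromString(data)   # data: the serialized bytes received from the C# server
for p in packets.packet:        # 'packet' is the repeated field declared above
    print(p.trans_id, p.user_id, p.date)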

Check this:
"The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer."
https://developers.google.com/protocol-buffers/docs/techniques
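A minimal sketch of that size-prefix technique on the sending side, in Python for illustration (assuming packets is the TransactionPackets message and sock a connected socket; the 4-byte big-endian header is just one convention, and the C# server would do the equivalent):

import struct

body = packets.SerializeToString()          # serialize the message
sock.sendall(struct.pack('!I', len(body)))  # write the size first...
sock.sendall(body)                          # ...then the message itself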

Related

Send and receive objects through WebRTC data channel

In my project, I am using WebRTC to connect two clients using the aiortc package.
I am using this example code and it works, but it seems I can't send non-string data over the data channel.
This is what I send in the data channel (modified from the code in the start function in the client.js file):
dc.onopen = function() {
    dataChannelLog.textContent += '- open\n';
    dcInterval = setInterval(function() {
        let message = new DataObject(/*All the parameters*/);
        dataChannelLog.textContent += '> ' + message + '\n';
        dc.send(message);
    }, 100);
};
Where DataObject is a class I created that contains the data I want to send.
The Python client receives [object Object] as a string. I expected it to send the bytes representing the object, which I could convert back into a normal class in Python.
I know that a workaround for this is to convert the object to a string format (like JSON), but I prefer not to do it because I am sending the objects very frequently (and every object contains a large array in it) and I am sure it will lead to performance issues.
So my question is, how can I send the object through the data channel without converting it to a string?
EDIT: If it helps, I can use an array instead of an object to represent my data. But again, it is still sent and received as a string.
You need some sort of serializer function to convert your Javascript object into a stream of bytes. Those bytes don't have to be readable as text. You can't just send a Javascript object.
The built-in, robust and secure serializer is, of course, JSON.stringify(). As you've pointed out, JSON is a verbose format.
To avoid using JSON, you'll need to create your own serializer in Javascript and deserializer in Python. Those will most likely be custom code for your particular object type. For best results, you'll copy the attributes of your object, one by one, into a Uint8Array, then send that.
You didn't tell us anything about your object, so it's hard to help you further.
If this were my project I'd get everything working with JSON and then do the custom serialization as a performance enhancement.
Thanks o-jones for the detailed answer.
In my case it was fairly simple, because I was able to represent all my data as an array.
The main issue I had was that I didn't know the send function has an "overload" that accepts a byte array…
After realizing that, I created a Float32Array in Javascript to hold my data and sent it.
On the Python side, I read this data and converted it to a float array using the struct.unpack function.
Something like this:
Javascript side:
dc.onopen = function() {
    dataChannelLog.textContent += '- open\n';
    dcInterval = setInterval(function() {
        let dataObj = new DataObject(/*All the parameters*/);
        let data = new Float32Array(3); // Build an array (in my case, an array of 3 floats)
        // Insert the data into the array
        data[0] = dataObj.value1;
        data[1] = dataObj.value2;
        data[2] = dataObj.value3;
        // Send the data
        dc.send(data);
    }, 100);
};
Python side:
import struct

def message_received(message: str | bytes) -> None:
    if isinstance(message, str):
        return  # Don't handle string messages
    # Read 3 floats from the bytes we received.
    data = struct.unpack('3f', message)
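If the number of floats varies between messages, you can derive the count from the payload size instead of hard-coding '3f'. A small sketch (the '<' little-endian format is an assumption: a Float32Array is written in the sender's native byte order, which is little-endian on typical hardware):

import struct

def floats_from_message(message: bytes) -> tuple:
    # Each float32 occupies 4 bytes, so the count follows from the length.
    count = len(message) // 4
    return struct.unpack(f'<{count}f', message)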

swift - Writing binary directly, like Python

In Python, you can write hex directly, like this: \x00\x01\x02. For a project of mine, I need to be able to do the same or similar in Swift 3; I'm basically communicating over a websocket, and I have to send byte strings with values such as 0x01 or 0x13, values that have no printable character equivalent.
The format of the commands would be represented in Python like this: \x25\x00\x00\x01\x00, where individual bytes are not only commands but parameters. For this reason I need the ability to append them like strings, without one byte affecting another.
I've been looking around, and I once found a Swift class called Byte(); however, for the life of me I can't figure out how to use it. Also, a lot of the tutorials I've seen are based on either receiving byte data or converting strings to binary, neither of which fits my case.
Thank you for your help.
Edit: On request, this is an example of what I'm trying to do:
In Python: (this works)
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 8080))  # the port must be an int
payload = b'\x25\x00\x00\x01\x00'
sock.send(payload)
resp = sock.recv(100)
print(resp)
In Swift: (I need to fill in the nil)
import UIKit
import Starscream

class ViewController: UIViewController, WebSocketDelegate {
    var socket: WebSocket = WebSocket(url: URL(string: "ws://localhost:8080")!)

    // ... viewDidLoad, didReceiveMemoryWarning, and
    // Starscream functions like websocketDidConnect

    @IBAction func sendMessage(_ sender: UIButton) {
        var payload = nil // I need to write the payload from the Python example above here
        socket.write(payload)
    }
}
From Starscream:
/**
 Write binary data to the websocket. This sends it as a binary frame.
 If you supply a non-nil completion block, I will perform it when the write completes.
 - parameter data: The data to write.
 - parameter completion: The (optional) completion handler.
 */
open func write(data: Data, completion: (() -> ())? = nil) {
    guard isConnected else { return }
    dequeueWrite(data, code: .binaryFrame, writeCompletion: completion)
}
Based on that, you have to:
let data = Data(bytes: [0x25, 0x00, 0x01, 0x02, 0x00, 0xFF, 0xAB]) // create data b'\x25\x00\x01\x02\x00\xff\xab'
socket.write(data: data)
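Be careful with the literals there: decimal 25 and hex 0x25 are different values. A quick Python check of the pitfall (illustrative only):

# Decimal 25 is the byte 0x19; to get the byte 0x25 you need the hex literal.
assert bytes([25]) == b'\x19'
assert bytes([0x25]) == b'\x25'  # which is ASCII '%'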

Possible encoding issues when serializing user defined type using MessagePack in Delphi?

I'm trying to serialize a record in Delphi using MessagePack and then send it to a Python server over TCP using ZeroMQ. This is what the server receives:
b'\xa8DataType\x01\xa4data\xbf{"major":1,"minor":0,"build":2}\x00'
I'm having trouble deserializing it on the server side. Any ideas why this is happening? Is it some kind of encoding issue? Thanks!
Update #1:
I use the MessagePack library QMsgPack recommended on www.msgpack.org. Here's some Delphi code. My user-defined records and an enum:
Version = Record
  major : Integer;
  minor : Integer;
  build : Integer;
end;

TDataType = (dtUnset, dtVersion, dtEntityDescription, dtEntityDescriptionVector,
             dtEntityState, dtEntityStateVector, dtCommandContinue, dtCommandComplete);

TPacket = Record
  DataType : TDataType;
  data : string;
end;
And the code to serialize the object:
begin
  dmVersion.major := 1;
  dmVersion.minor := 1;
  dmVersion.build := 1;
  lvMsg := TQMsgPack.Create;
  lvMsg.FromRecord(dmVersion);
  lvMsgString := lvMsg.ToString();
  packet.DataType := dtVersion;
  packet.data := lvMsgString;
  lvMsg.Clear;
  lvMsg.FromRecord(packet);
  lvbytes := lvMsg.Encode;
  ZeroMQ.zSendByteArray(skt, lvbytes);
I then try to deserialize the received byte array in the Python server, which looks like this:
b'\xa8DataType\x01\xa4data\xbf{"major":1,"minor":0,"build":2}\x00'
using umsgpack.unpackb() and then printing out the result like this:
packet_packed = command.recv()
# Unpack the packet
umsgpack.compatibility = True
packet = umsgpack.unpackb(packet_packed)
print(packet)
for item in packet:
    print(item)
and this is what I get printed out on the screen:
b'DataType'
68
97
116
97
84
121
112
101
I hope this helps! Thanks!
Update #2
Here is some server code on the Python side. VDS_PACKET_VERSION is a constant int set to 1.
# Make sure it's a version packet
if VDS_PACKET_VERSION == packet[0]:
    # Unpack the data portion of the packet
    version = umsgpack.unpackb(packet[1])
    roster = []
    if (VDS_VERSION_MAJOR == version[0]) and (VDS_VERSION_MINOR == version[1]) and (VDS_VERSION_BUILD == version[2]):
        dostuff()
With the current serialized object
b'\x82\xa8DataType\x01\xa4data\xbf{"major":1,"minor":1,"build":1}'
I get KeyError: 0 on packet[0]. Why is that?
The packed data appears to be invalid.
>>> packet = { "DataType": 1, "data": "{\"major\":1,\"minor\":0,\"build\":2}"}
>>> umsgpack.packb(packet)
b'\x82\xa4data\xbf{"major":1,"minor":0,"build":2}\xa8DataType\x01'
The first byte is \x82 which, as can be seen in the specification, is a two-entry fixmap.
Your packed data is missing that information, and launches straight into a fixstr. So, yes, there could be a mismatch between your Delphi-based packer and the Python-based unpacker. However, when I take your Delphi code, using the latest qmsgpack from the repo, it produces the following bytes:
82A8446174615479706501A464617461
BF7B226D616A6F72223A312C226D696E
6F72223A312C226275696C64223A317D
Let's convert that into a Python bytes object. It looks like this:
b'\x82\xa8DataType\x01\xa4data\xbf{"major":1,"minor":1,"build":1}'
Now, that's quite different from what you report. And umsgpack can unpack it just fine. Note that the first byte is \x82, a two-entry fixmap, just as expected. Yes, the entries are in a different order, but that's just fine. Order is not significant for a map.
So, I've been able to encode using qmsgpack in Delphi, and decode using umsgpack in Python. Which then suggests that this issue is really in the transmission. It looks to me as though there has been an off-by-one error. Instead of transmitting bytes 0 to N-1, bytes 1 to N have been transmitted. Note the spurious trailing zero in your received data.
In the comments you observe that the data field is being coded as JSON and passed as a string. But you'd rather have that data encoded using MessagePack. So here's what to do:
1. In the Delphi code, change the data field's type from string to TBytes. That's because we are going to put a byte array in there.
2. Populate the data field using Encode, like this: packet.data := lvMsg.Encode.
3. On the Python side, when you unpack data you'll find that it is an array of integers. Convert that to bytes and then unpack: umsgpack.unpackb(bytes(data)). A sketch of this follows.
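A minimal sketch of that final step (assuming umsgpack.compatibility = True as in the question, so string keys come back as bytes):

import umsgpack

umsgpack.compatibility = True
packet = umsgpack.unpackb(packet_packed)            # outer map: DataType and data
assert packet[b'DataType'] == 1                     # ordinal of dtVersion
version = umsgpack.unpackb(bytes(packet[b'data']))  # inner, MessagePack-encoded record
print(version[b'major'], version[b'minor'], version[b'build'])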

How to read JSON from socket in python? (Incremental parsing of JSON)

I have a socket opened and I'd like to read some JSON data from it. The problem is that the json module from the standard library can only parse from strings (load just reads the whole file and calls loads internally). It even looks like everything inside the module depends on the parameter being a string.
This is a real problem with sockets, since you can never read it all into a string, and you don't know how many bytes to read before you actually parse it.
So my questions are: Is there a (simple and elegant) workaround? Is there another json library that can parse data incrementally? Is it worth writing it myself?
Edit: It is the XBMC JSON-RPC API. There are no message envelopes, and I have no control over the format. Each message may be on a single line or on several lines.
I could write a simple parser that needs only a getc function in some form and feed it using s.recv(1), but this doesn't seem like a very Pythonic solution, and I'm a little lazy to do that :-)
Edit: given that you aren't defining the protocol, this isn't useful, but it might be useful in other contexts.
Assuming it's a stream (TCP) socket, you need to implement your own message framing mechanism (or use an existing higher level protocol that does so). One straightforward way is to define each message as a 32-bit integer length field, followed by that many bytes of data.
Sender: take the length of the JSON packet, pack it into 4 bytes with the struct module, send it on the socket, then send the JSON packet.
Receiver: Repeatedly read from the socket until you have at least 4 bytes of data, use struct.unpack to unpack the length. Read from the socket until you have at least that much data and that's your JSON packet; anything left over is the length for the next message.
If at some point you're going to want to send messages that consist of something other than JSON over the same socket, you may want to send a message type code between the length and the data payload; congratulations, you've invented yet another protocol.
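A minimal sketch of that framing in Python (the '!I' network byte order is a convention you're free to change, as long as both ends agree):

import json
import struct

def recv_exactly(sock, n):
    # recv() may return fewer bytes than requested, so loop until we have n.
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-message')
        buf += chunk
    return buf

def send_json(sock, obj):
    body = json.dumps(obj).encode('utf-8')
    sock.sendall(struct.pack('!I', len(body)))  # 32-bit length field first
    sock.sendall(body)                          # then the JSON payload

def recv_json(sock):
    (length,) = struct.unpack('!I', recv_exactly(sock, 4))
    return json.loads(recv_exactly(sock, length))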
Another, slightly more standard, method is DJB's Netstrings protocol; it's very similar to the system proposed above, but with text-encoded lengths instead of binary; it's directly supported by frameworks such as Twisted.
If you're getting the JSON from an HTTP stream, use the Content-Length header to get the length of the JSON data. For example:
import http.client
import json

h = http.client.HTTPConnection('graph.facebook.com')
h.request('GET', '/19292868552')
response = h.getresponse()
content_length = int(response.getheader('Content-Length', '0'))
# Read data until we've read Content-Length bytes or the socket is closed
data = b''
while len(data) < content_length or content_length == 0:
    s = response.read(content_length - len(data))
    if not s:
        break
    data += s
# We now have the full data -- decode it
j = json.loads(data)
print(j)
What you want(ed) is ijson, an incremental json parser.
It is available here: https://pypi.python.org/pypi/ijson/. The usage should be as simple as (copying from that page):
import ijson.backends.python as ijson

# items() takes a prefix selecting which part of the document to yield,
# e.g. 'item' for the elements of a top-level array.
for item in ijson.items(file_obj, prefix):
    # ...
(For those who prefer something self-contained, in the sense that it relies only on the standard library: I wrote a small wrapper around json yesterday, but only because I didn't know about ijson. It is probably much less efficient.)
EDIT: since I found out that (a Cythonized version of) my approach was in fact much more efficient than ijson, I have packaged it as an independent library; see there also for some rough benchmarks: http://pietrobattiston.it/jsaone
Do you have control over the JSON? Try writing each object as a single line. Then do a readline call on the socket as described here.
infile = sock.makefile()
while True:
    line = infile.readline()
    if not line:
        break
    # ...
    result = json.loads(line)
Skimming the XBMC JSON RPC docs, I think you want an existing JSON-RPC library - you could take a look at:
http://www.freenet.org.nz/dojo/pyjson/
If that's not suitable for whatever reason, it looks to me like each request and response is contained in a JSON object (rather than a loose JSON primitive that might be a string, array, or number), so the envelope you're looking for is the '{ ... }' that defines a JSON object.
I would, therefore, try something like (pseudocode):
while not dead:
    read from the socket and append it to a string buffer
    set a depth counter to zero
    walk each character in the string buffer:
        if you encounter a '{':
            increment depth
        if you encounter a '}':
            decrement depth
            if depth is zero:
                remove what you have read so far from the buffer
                pass that to json.loads()
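Here's a rough Python rendering of that pseudocode. A sketch only: it counts braces naively, so a '{' or '}' inside a JSON string value will confuse it, and it assumes chunks don't split multi-byte UTF-8 characters:

import json

def iter_json_objects(sock):
    buf = ''        # everything received but not yet consumed
    depth = 0       # current brace nesting depth
    obj_start = 0   # where the object being scanned begins
    scanned = 0     # how far into buf braces have been counted
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return
        buf += chunk.decode('utf-8')
        for i in range(scanned, len(buf)):
            if buf[i] == '{':
                depth += 1
            elif buf[i] == '}':
                depth -= 1
                if depth == 0:
                    # A complete top-level object: parse and emit it.
                    yield json.loads(buf[obj_start:i + 1])
                    obj_start = i + 1
        scanned = len(buf)
        # Drop completed objects from the front of the buffer.
        buf = buf[obj_start:]
        scanned -= obj_start
        obj_start = 0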
You may find JSON-RPC useful for this situation. It is a remote procedure call protocol that should allow you to call the methods exposed by the XBMC JSON-RPC. You can find the specification on Trac.
res = str(s.recv(4096), 'utf-8')  # Getting a response as a string
res_lines = res.splitlines()      # Split the string into a list of lines
last_line = res_lines[-1]         # Normally, the last one is the JSON data
pair = json.loads(last_line)
https://github.com/A1vinSmith/arbitrary-python/blob/master/sockets/loopHost.py

How do I dump the TCP client's buffer in order to accept more data?

I've got a simple TCP server and client. The client receives data:
received = sock.recv(1024)
It seems trivial, but I can't figure out how to receive data larger than the buffer. I tried chunking my data and sending it multiple times from the server (which worked for UDP), but it just told me that my pipe was broken.
Suggestions?
If you have no idea how much data is going to pour over the socket, and you simply want to read everything until the socket closes, then you need to put socket.recv() in a loop:
# Assumes a blocking socket.
while True:
    data = sock.recv(4096)
    if not data:
        break
    # Do something with `data` here.
Mike's answer is the one you're looking for, but that's not a situation you want to find yourself in. You should develop an over-the-wire protocol that uses a fixed-length field that describes how much data is going to be sent. It's a Type-Length-Value protocol, which you'll find again and again and again in network protocols. It future-proofs your protocol against unforeseen requirements and helps isolate network transmission problems from programmatic ones.
The sending side becomes something like:
sock.sendall(struct.pack("B", msg_type))    # send a one-byte msg type
sock.sendall(struct.pack("!H", len(data)))  # send a two-byte size field in network byte order
sock.sendall(data)
And the receiving side something like this, where recv_exact is a small helper that loops until exactly n bytes have arrived (a sketch of it follows the struct below):
msg_type = struct.unpack("B", recv_exact(sock, 1))[0]   # get the type of msg
data_len = struct.unpack("!H", recv_exact(sock, 2))[0]  # get the len of the msg
data = recv_exact(sock, data_len)                       # read the msg
if TYPE_FOO == msg_type:
    handleFoo(data)
elif TYPE_BAR == msg_type:
    handleBar(data)
else:
    raise UnknownTypeException(msg_type)
You end up with an over-the-wire message format that looks like:
struct {
    unsigned char  type;
    unsigned short length;
    void          *data;
}
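And the recv_exact helper used above, a minimal sketch (plain sockets have no read method, and recv may return fewer bytes than requested, so we loop):

def recv_exact(sock, n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed before %d bytes arrived' % n)
        buf += chunk
    return buf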
Keep in mind that:
Your operating system has its own idea of what its TCP/IP socket buffer size is.
The maximum TCP/IP packet size is generally 1500 bytes (the usual Ethernet MTU).
The pydoc for socket suggests that 4096 is a good buffer size.
With that said, it'd really be helpful to see the code around that one line. There are a few things that could play into this: whether you're using select or just polling, whether the socket is non-blocking, etc.
It also matters how you're sending the data and whether your remote end disconnects.
