How to read JSON from socket in python? (Incremental parsing of JSON) - python

I have a socket opened and I'd like to read some json data from it. The problem is that the json module from standard library can only parse from strings (load only reads the whole file and calls loads inside) It even looks that all the way inside the module it all depends on the parameter being string.
This is a real problem with sockets since you can never read it all to string and you don't know how many bytes to read before you actually parse it.
So my questions are: Is there a (simple and elegant) workaround? Is there another json library that can parse data incrementally? Is it worth writing it myself?
Edit: It is XBMC jsonrpc api. There are no message envelopes, and I have no control over the format. Each message may be on a single line or on several lines.
I could write some simple parser that needs only getc function in some form and feed it using s.recv(1), but this doesn't as a very pythonic solution and I'm a little lazy to do that :-)

Edit: given that you aren't defining the protocol, this isn't useful, but it might be useful in other contexts.
Assuming it's a stream (TCP) socket, you need to implement your own message framing mechanism (or use an existing higher level protocol that does so). One straightforward way is to define each message as a 32-bit integer length field, followed by that many bytes of data.
Sender: take the length of the JSON packet, pack it into 4 bytes with the struct module, send it on the socket, then send the JSON packet.
Receiver: Repeatedly read from the socket until you have at least 4 bytes of data, use struct.unpack to unpack the length. Read from the socket until you have at least that much data and that's your JSON packet; anything left over is the length for the next message.
If at some point you're going to want to send messages that consist of something other than JSON over the same socket, you may want to send a message type code between the length and the data payload; congratulations, you've invented yet another protocol.
Another, slightly more standard, method is DJB's Netstrings protocol; it's very similar to the system proposed above, but with text-encoded lengths instead of binary; it's directly supported by frameworks such as Twisted.

If you're getting the JSON from an HTTP stream, use the Content-Length header to get the length of the JSON data. For example:
import httplib
import json
h = httplib.HTTPConnection('graph.facebook.com')
h.request('GET', '/19292868552')
response = h.getresponse()
content_length = int(response.getheader('Content-Length','0'))
# Read data until we've read Content-Length bytes or the socket is closed
data = ''
while len(data) < content_length or content_length == 0:
s = response.read(content_length - len(data))
if not s:
break
data += s
# We now have the full data -- decode it
j = json.loads(data)
print j

What you want(ed) is ijson, an incremental json parser.
It is available here: https://pypi.python.org/pypi/ijson/ . The usage should be simple as (copying from that page):
import ijson.backends.python as ijson
for item in ijson.items(file_obj):
# ...
(for those who prefer something self-contained - in the sense that it relies only on the standard library: I wrote yesterday a small wrapper around json - but just because I didn't know about ijson. It is probably much less efficient.)
EDIT: since I found out that in fact (a cythonized version of) my approach was much more efficient than ijson, I have packaged it as an independent library - see here also for some rough benchmarks: http://pietrobattiston.it/jsaone

Do you have control over the json? Try writing each object as a single line. Then do a readline call on the socket as described here.
infile = sock.makefile()
while True:
line = infile.readline()
if not line: break
# ...
result = json.loads(line)

Skimming the XBMC JSON RPC docs, I think you want an existing JSON-RPC library - you could take a look at:
http://www.freenet.org.nz/dojo/pyjson/
If that's not suitable for whatever reason, it looks to me like each request and response is contained in a JSON object (rather than a loose JSON primitive that might be a string, array, or number), so the envelope you're looking for is the '{ ... }' that defines a JSON object.
I would, therefore, try something like (pseudocode):
while not dead:
read from the socket and append it to a string buffer
set a depth counter to zero
walk each character in the string buffer:
if you encounter a '{':
increment depth
if you encounter a '}':
decrement depth
if depth is zero:
remove what you have read so far from the buffer
pass that to json.loads()

You may find JSON-RPC useful for this situation. It is a remote procedure call protocol that should allow you to call the methods exposed by the XBMC JSON-RPC. You can find the specification on Trac.

res = str(s.recv(4096), 'utf-8') # Getting a response as string
res_lines = res.splitlines() # Split the string to an array
last_line = res_lines[-1] # Normally, the last one is the json data
pair = json.loads(last_line)
https://github.com/A1vinSmith/arbitrary-python/blob/master/sockets/loopHost.py

Related

python concatenate ctype struct and array

I am trying to implement a custom UDP protocol. This protocol has a header and data. The header has information about the structure of the data.
The header is defined using a ctypes struct and the data using 'h' array(signed short).
I am using python sockets.
I tried sending the header ans data using separate calls to "socket.sendto" like this:
s.sendto(header, addr)
s.sendto(data, addr)
But I am not able to receive this as a continuous stream. "socket.recvfrom" is only fetching the header. Maybe if I call "socket.recvfrom" again I will get the data as well. But thats not what I need. I need the full packet as a stream. I think concatenating the header and data in the server itself might fix this.
So I tried different combinations of the following to concatenate the two:
converting the header to "c_char_p" array.
Converting the data to bytearray
using numpy.concatenate
Standard array concatenation after converting the header to 'B' array.
Converting header to 'h' array. (This failed with "string length not a multiple of item size"
All the above failed for one reason or the other.
If I need to convert either header or data, I would prefer it to be the header as it is smaller. But if there is no way around, I am ok with converting the data.
Would appreciate any help.
Relevant code snippets:
sdata = array("h")
.
.
.
header=protocol.HEADER(1,50,1,50,1,50,1,50,1,10,1,12,1,1,1,1,1,18,19,10,35,60,85,24,25)
.
.
s.sendto(header+sdata, addr)
You can copy the header struct into a ctypes array of bytes:
>>> buf = (ctypes.c_char * ctypes.sizeof(header)).from_buffer_copy(header)
Now, in Python 2,
>>> buf.raw + sdata.tostring()
should give you what you're looking for.
In Python 3, it would be
>>> buf.raw + sdata.tobytes()

How to send a dict of namedtuples or just a namedTuple to a client (socket)?

Simply put, I have a dictionary (dictData[name]=namedTuple) of namedTuples (records) on a server.
Objective: I want to send the entire thing (dictData) or a single instance (dictData[key]) to the client via a SOCKET connection so it can be printed (shown on screen).
To send a single record I have tried to do the following:
response = dictData["John"]
print (response) #ensure it is the correct record
s.send(response)
However this generates the following error:
"TypeError: 'record' does not support the buffer interface"
I have tried to encode it and convert it but nothing I do seems to work. I am even open to converting it to a STRING and sending the string but I can't seem to find out how to convert a namedTuple to a string either.
And then, no clue where to start to send the entire dictionary to the client so they can print the entire set?
Any help would be much appreciated.
Sockets can only send and receive bytes, so you need to serialise your named tuples and dictionary to something else.
You could use JSON to produce a string representation of your tuples and dictionary, for example. The json library produces a (unicode) string when encoding, you'll need to encode that to UTF-8 (or similar) to produce bytes:
import json
# sending one tuple
response = json.dumps(dictData["John"])
s.send(response.encode('utf8'))
# sending all of the dictionary
response = json.dumps(dictData)
s.send(response.encode('utf8'))
This will not preserve the named tuple attribute names; the values are sent over as a JSON array instead (so an ordered list).
Another option is to use the pickle module; this would require the listener on the other side to also be coded in Python and to have the same record named tuple type importable from the exact same location, however.
When Pickle loads the data, the name of the full qualifying name of the namedtuple type is included, and you must be able to import that type on both ends of the socket. If you have a line in a module at the global level:
record = namedtuple('record', 'field1 field2 field3')
then from yourmodule import record is possible, but the exact same import should work on the other side too.
pickle.dumps() produces a bytes object which can be written to the socket without encoding.

Passing a record over a socket

I have basic socket communication set up between python and Delphi code (text only). Now I would like to send/receive a record of data on both sides. I have a Record "C compatible" and would like to pass records back and forth have it in a usable format in python.
I use conn.send("text") in python to send the text but how do I send/receive a buffer with python and access the record items sent in python?
Record
TPacketData = record
pID : Integer;
dataType : Integer;
size : Integer;
value : Double;
end;
I don't know much about python, but I have done a lot between Delphi, C++, C# and Java even with COBOL.Anyway, to send a record from Delphi to C first you need to pack the record at both ends,
in Deplhi
MyRecord = pack record
in C++
#pragma pack(1)
I don’t know in python but I guess there must be a similar one. Make sure that at both sides the sizeof(MyRecord) is the same length.Also, before sending the records, you should take care about byte ordering (you know, Little-Endian vs Big-Endian), use the Socket.htonl() and Socket.ntohl() in python and the equivalent in Deplhi which are in WinSock unit. Also a "double" in Delphi could not be the same as in python, in Delphi is 8 bytes check this as well, and change it to Single(4 bytes) or Extended (10 bytes) whichever matches.
If all that match then you could send/receive binary records in one shut, otherwise, I'm afraid, you have to send the individual fields one by one.
I know this answer is a bit late to the game, but may at least prove useful to other people finding this question in their search-results. Because you say the Delphi code sends and receives "C compatible data" it seems that for the sake of the answer about Python's handling it is irrelevant whether it is Delphi (or any other language) on the other end...
The python struct and socket modules have all the functionality for the basic usage you describe. To send the example record you would do something like the below. For simplicity and sanity I have presumed signed integers and doubles, and packed the data in "network order" (bigendian). This can easily be a one-liner but I have split it up for verbosity and reusability's sake:
import struct
t_packet_struc = '>iiid'
t_packet_data = struct.pack(t_packet_struc, pid, data_type, size, value)
mysocket.sendall(t_packet_data)
Of course the mentioned "presumptions" don't need to be made, given tweaks to the format string, data preparation, etc. See the struct inline help for a description of the possible format strings - which can even process things like Pascal-strings... By the way, the socket module allows packing and unpacking a couple of network-specific things which struct doesn't, like IP-address strings (to their bigendian int-blob form), and allows explicit functions for converting data bigendian-to-native and vice-versa. For completeness, here is how to unpack the data packed above, on the Python end:
t_packet_size = struct.calcsize(t_packet_struc)
t_packet_data = mysocket.recv(t_packet_size)
(pid, data_type, size, value) = struct.unpack(t_packet_struc,
t_packet_data)
I know this works in Python version 2.x, and suspect it should work without changes in Python version 3.x too. Beware of one big gotcha (because it is easy to not think about, and hard to troubleshoot after the fact): Aside from different endianness, you can also distinguish between packing things using "standard size and alignment" (portably) or using "native size and alignment" (much faster) depending on how you prefix - or don't prefix - your format string. These can often yield wildly different results than you intended, without giving you a clue as to why... (there be dragons).

TCP Sockets: Double messages

I'm having a problem with sockets in python.
I have a a TCP server and client that send each other data in a while 1 loop.
It packages up 2 shorts in the struct module (struct.pack("hh", mousex, mousey)). But sometimes when recving the data on the other computer, it seems like 2 messages have been glued together. Is this nagle's algorithm?
What exactly is going on here? Thanks in advance.
I agree with other posters, that "TCP just does that". TCP guarantees that your bytes arrive in the right order, but makes no guarantees about the sizes of the chunks they arrive in. I would add that TCP is also allowed to split a single send into multiple recv's, or even for example to split aabb, ccdd into aab, bcc, dd.
I put together this module for dealing with the relevant issues in python:
http://stromberg.dnsalias.org/~strombrg/bufsock.html
It's under an opensource license and is owned by UCI. It's been tested on CPython 2.x, CPython 3.x, Pypy and Jython.
HTH
To be sure I'd have to see actual code, but it sounds like you are expecting a send of n bytes to show up on the receiver as exactly n bytes all the time, every time.
TCP streams don't work that way. It's a "streaming" protocol, as opposed to a "datagram" (record-oriented) one like UDP or STCP or RDS.
For fixed-data-size protocols (or any where the next chunk size is predictable in advance), you can build your own "datagram-like receiver" on a stream socket by simply recv()ing in a loop until you get exactly n bytes:
def recv_n_bytes(socket, n):
"attempt to receive exactly n bytes; return what we got"
data = []
while True:
have = sum(len(x) for x in data)
if have >= n:
break
want = n - have
got = socket.recv(want)
if got == '':
break
return ''.join(data)
(untested; python 2.x code; not necessarily efficient; etc).
You may not assume that data will become available for reading from the local socket in the same size pieces it was provided for sending at the other source end. As you have seen, this might sometimes be usually true, but by no means reliably so. Rather, what TCP guarantees is that what goes in one end will eventually come out the other, in order without anything missing or if that cannot be achieved by means built into the protocol such as retries, then whole thing will break with an error.
Nagle is one possible cause, but not the only one.

Pack array of namedtuples in PYTHON

I need to send an array of namedtuples by a socket.
To create the array of namedtuples I use de following:
listaPeers=[]
for i in range(200):
ipPuerto=collections.namedtuple('ipPuerto', 'ip, puerto')
ipPuerto.ip="121.231.334.22"
ipPuerto.puerto="8988"
listaPeers.append(ipPuerto)
Now that is filled, i need to pack "listaPeers[200]"
How can i do it?
Something like?:
packedData = struct.pack('XXXX',listaPeers)
First of all you are using namedtuple incorrectly. It should look something like this:
# ipPuerto is a type
ipPuerto=collections.namedtuple('ipPuerto', 'ip, puerto')
# theTuple is a tuple object
theTuple = ipPuerto("121.231.334.22", "8988")
As for packing, it depends what you want to use on the other end. If the data will be read by Python, you can just use Pickle module.
import cPickle as Pickle
pickledTuple = Pickle.dumps(theTuple)
You can pickle whole array of them at once.
It is not that simple - yes, for integers and simple numbers, it s possible to pack straight from named tuples to data provided by the struct package.
However, you are holding your data as strings, not as numbers - it is a simple thing to convert to int in the case of the port - as it is a simple integer, but requires some juggling when it comes to the IP.
def ipv4_from_str(ip_str):
parts = ip_str.split(".")
result = 0
for part in parts:
result <<= 8
result += int(part)
return result
def ip_puerto_gen(list_of_ips):
for ip_puerto in list_of_ips:
yield(ipv4_from_str(ip_puerto.ip))
yield(int(ip_puerto.puerto))
def pack(list_of_ips):
return struct.pack(">" + "II" * len(list_of_ips),
*ip_puerto_gen(list_of_ips)
)
And you then use the "pack" function from here to pack your structure as you seem to want.
But first, attempt to the fact that you are creating your "listaPiers" incorrectly (your example code simply will fail with an IndexError) - use an empty list, and the append method on it to insert new named tuples with ip/port pairs as each element:
listaPiers = []
ipPuerto=collections.namedtuple('ipPuerto', 'ip, puerto')
for x in range(200):
new_element = ipPuerto("123.123.123.123", "8192")
listaPiers.append(new_element)
data = pack(listaPiers)
ISTR that pickle is considered insecure in server processes, if the server process is receiving pickled data from untrusted clients.
You might want to come up with some sort of separator character(s) for the records and fields (perhaps \0 and \001 or \376 and \377). Then putting together a message is kind of like a text file broken up into records and fields separated by spaces and newlines. Or for that matter, you could use spaces and newlines, if your normal data doesn't include these.
I find this module very valuable for framing data in socket-based protocols:
http://stromberg.dnsalias.org/~strombrg/bufsock.html
It lets you do things like "read up until the next null byte" or "read the next 10 characters" - without needing to worry about the complexities of IP aggregating or splitting packets.

Categories