I am struggling with an encoding issue and am still trying to figure out how Python 3 handles encoding. I am trying to upload a JSON object from Python 3 into an Azure Queue.
I make the json object
response = {"UserImageId": 636667744866847370, "OutputImageName": "car-1807177_with_blue-2467336_size_1020_u38fa38.png"}
queue_service.put_message(response_queue, json.dumps(response))
When it gets to the queue, I get the error
{"imgResponse":"The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters. ","log":null,"$return":""}
So I have to do something else, because apparently I need to base64 encode my string. So I try
queue_service.put_message(response_queue, base64.b64encode(json.dumps(response).encode('utf-8')))
and I get
TypeError: message should be of type str
from the Azure Storage Queue package. If I check the type of the above expression, it is of type bytes (which makes sense).
So my question is, how do I encode my json object into something that the queue service will understand. I would really like to be able to keep the _ and - and . characters in the image name.
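One way to satisfy both constraints (Base64 content, but of type str) is to decode the Base64 bytes back to ASCII text before handing them to the queue. This is a minimal sketch using only the standard library; the helper name to_base64_text is made up for illustration, and the final put_message call is left as a comment because it depends on the queue_service object from the question.

```python
import base64
import json

response = {
    "UserImageId": 636667744866847370,
    "OutputImageName": "car-1807177_with_blue-2467336_size_1020_u38fa38.png",
}

def to_base64_text(obj):
    """Serialize obj to JSON and return it as a Base64 str (not bytes)."""
    raw = json.dumps(obj).encode("utf-8")         # str -> bytes
    return base64.b64encode(raw).decode("ascii")  # bytes -> Base64 bytes -> str

message = to_base64_text(response)
# queue_service.put_message(response_queue, message)  # as in the question
```

The _, - and . characters survive because they are only hidden inside the Base64 text and reappear when the consumer decodes it.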
If anyone is looking to solve this problem using QueueClient rather than QueueService, here is what worked for me:
import json
from azure.storage.queue import QueueServiceClient, QueueClient, QueueMessage, TextBase64EncodePolicy
conn_string = '[YOUR_CONNECTION_STRING_HERE]'
queue_client = QueueClient.from_connection_string(
    conn_string,
    '[QUEUE_NAME_HERE]',
    message_encode_policy=TextBase64EncodePolicy()
)
queue_client.send_message(json.dumps({'a':'b'}))
This is what I had to do in my code to make it work with the older QueueService class:
import os
from azure.storage.queue import QueueService, QueueMessageFormat
queue_service = QueueService(account_name=os.getenv('storageAccount'), account_key=os.getenv('storageKey'))
# Base64-encode every outgoing text message automatically
queue_service.encode_function = QueueMessageFormat.text_base64encode
After that I could just put messages:
queue_service.put_message('bbbb', message) # 'bbbb' is a queue name
I am trying to read a json which includes a number of tweets, but I get the following error.
OverflowError: int too large to convert
The script filters multiple json files to get specific tweets, and it crashes when reaching to a specific json.
The line that creates the error is this one :
df_temp = pd.read_json(path_or_buf=json_path, lines=True)
Here is the error in the cmd
Just store the user id as a string and treat it as one (this is what you should do with this kind of id anyway). If you can't change the JSON input format, you can process each line as plain text before parsing it as JSON and add quotes around the id value, for instance with a regular expression.
I don't know which library you are using to parse the JSON, but if the parser gets as far as returning the value, an explicit conversion may also work: force Python to treat the value as a string with something like x = str(obj["id"]).
Note that Python will not do this implicitly: "" + some_int raises a TypeError, so the conversion has to be explicit.
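The regex idea can be sketched as follows, using only the standard library. The key name "id" and the helper names are assumptions made for illustration; adjust the pattern to whichever field actually overflows. This is a rough text-level rewrite, so it would also touch a literal "id": 123 sequence that happens to appear inside a string value.

```python
import json
import re

# Hypothetical key name: matches "id": followed by a bare integer.
_ID_PATTERN = re.compile(r'("id"\s*:\s*)(\d+)')

def quote_ids(line):
    """Wrap the bare integer after "id": in double quotes."""
    return _ID_PATTERN.sub(r'\1"\2"', line)

def load_lines(text):
    """Parse newline-delimited JSON after quoting the id values."""
    return [json.loads(quote_ids(line)) for line in text.splitlines() if line.strip()]

sample = '{"id": 123456789012345678901234567890, "text": "hi"}\n'
records = load_lines(sample)
```

The resulting list of dicts can then be handed to whatever builds the data frame, with the id column already holding strings.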
I'm having trouble parsing the body of a request using jsonlines. I'm using tornado as the server and this is happening inside a post() method.
My purpose in this is to parse the request's body into separate JSONs, then iterate over them with a jsonlines Reader, do some work on each one and then push them to a DB.
I solved this problem by dumping the utf-8 encoded body into a file and then used:
with jsonlines.open("temp.txt") as reader:
That works for me. I can iterate over the entire file with
for obj in reader:
I just feel like this is an unnecessary overhead that can be reduced if I can understand what's keeping me from just using this bit of code instead:
log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as reader:
    for obj in reader:
the exception I get is this:
jsonlines.jsonlines.InvalidLineError: line contains invalid json:
Expecting property name enclosed in double quotes: line 1 column 2
(char 1) (line 1)
I've tried searching for this error here and all I found were examples where people tried using incorrectly formatted jsons that have one quote instead of double quotes. That is not the case for me. I debugged the request and saw that the string that returns from the decode method indeed has double quotes for both properties and values.
here is a sample of the body of the request I send (this is what it looks like in Postman):
{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}}
{"type":"event","timestamp":"2018-03-25 09:19:51.061","event":"ScreenShown","params":{"name":"SettingsScreen"}}
{"type":"event","timestamp":"2018-03-25 09:19:53.580","event":"ButtonClicked","params":{"screen":"SettingsScreen","button":"MissionsButton"}}
{"type":"event","timestamp":"2018-03-25 09:19:53.615","event":"ScreenShown","params":{"name":"MissionsScreen"}}
You can reproduce the exception by using this simple bit of code in a post method and sending the lines I provided through Postman:
log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as currentlog:
    for obj in currentlog:
        print("obj")
As a sidenote: Postman sends the data as text, not JSON.
If you need any more information to answer this question, please let me know.
One thing I did notice is that the string that returns from the decode method starts and ends with one quote. I guess this is because of the double quotes in the JSONs themselves. Is it related in any way?
An example:
'{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}}'
Thanks for any help!
jsonlines.Reader accepts an iterable as its argument ("The first argument must be an iterable that yields JSON encoded strings"), not a single JSON-encoded string as in your example. After .decode("utf-8"), log is a string, which happens to support the iterable interface. So when the reader iterates over log under the hood, it gets the first item of the string, i.e. the character {, and tries to process it as a JSON line, which is obviously invalid. Try log = log.splitlines() before passing log to the Reader. (Don't use log.split(): it splits on any whitespace, and your timestamps contain spaces.)
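To see the difference without a running server, here is a small sketch with the standard json module standing in for jsonlines. The body value is a shortened, made-up version of the sample request. It shows that iterating the decoded string yields single characters, and that the text has to be split on line boundaries rather than on whitespace, because the timestamps contain spaces.

```python
import json

# Shortened stand-in for self.request.body
body = (
    b'{"type":"event","timestamp":"2018-03-25 09:19:50.999"}\n'
    b'{"type":"event","timestamp":"2018-03-25 09:19:51.061"}\n'
)
log = body.decode("utf-8")

first_item = next(iter(log))         # what a reader sees first: one character
whitespace_pieces = log.split()      # breaks each line at the timestamp's space
lines = log.splitlines()             # one complete JSON document per entry
events = [json.loads(line) for line in lines if line.strip()]
```

The same list of lines is what jsonlines.Reader expects as its first argument, so no temporary file is needed.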
Simply put, I have a dictionary (dictData[name]=namedTuple) of namedTuples (records) on a server.
Objective: I want to send the entire thing (dictData) or a single instance (dictData[key]) to the client via a SOCKET connection so it can be printed (shown on screen).
To send a single record I have tried to do the following:
response = dictData["John"]
print (response) #ensure it is the correct record
s.send(response)
However this generates the following error:
"TypeError: 'record' does not support the buffer interface"
I have tried to encode it and convert it but nothing I do seems to work. I am even open to converting it to a STRING and sending the string but I can't seem to find out how to convert a namedTuple to a string either.
And then, no clue where to start to send the entire dictionary to the client so they can print the entire set?
Any help would be much appreciated.
Sockets can only send and receive bytes, so you need to serialise your named tuples and dictionary to something else.
You could use JSON to produce a string representation of your tuples and dictionary, for example. The json library produces a (unicode) string when encoding, so you'll need to encode that to UTF-8 (or similar) to produce bytes:
import json
# sending one tuple
response = json.dumps(dictData["John"])
s.send(response.encode('utf8'))
# sending all of the dictionary
response = json.dumps(dictData)
s.send(response.encode('utf8'))
This will not preserve the named tuple attribute names; the values are sent over as a JSON array instead (so an ordered list).
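If the field names matter on the receiving side, one option is to convert each named tuple to a dict with _asdict() before serializing. A minimal sketch follows; the Record type, its fields, and the helper names are invented for illustration, and the socket call is left as a comment.

```python
import json
from collections import namedtuple

# Hypothetical record type standing in for the one in the question
Record = namedtuple("Record", "name age city")
dict_data = {"John": Record("John", 30, "Oslo")}

def encode_record(record):
    """Serialize one named tuple as a JSON object, keeping field names."""
    return json.dumps(record._asdict()).encode("utf-8")

def encode_all(records):
    """Serialize the whole dictionary of named tuples."""
    return json.dumps({key: rec._asdict() for key, rec in records.items()}).encode("utf-8")

def decode_record(payload):
    """Rebuild a Record from bytes produced by encode_record."""
    return Record(**json.loads(payload.decode("utf-8")))

payload = encode_record(dict_data["John"])
# s.sendall(payload)  # s is the connected socket from the question
```

The client can then print the decoded dict directly or rebuild the named tuple with decode_record.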
Another option is to use the pickle module; this would require the listener on the other side to also be coded in Python and to have the same record named tuple type importable from the exact same location, however.
The pickled data includes the fully qualified name of the namedtuple type, and pickle imports that type by name when loading, so you must be able to import it on both ends of the socket. If you have a line in a module at the global level:
record = namedtuple('record', 'field1 field2 field3')
then from yourmodule import record is possible, but the exact same import should work on the other side too.
pickle.dumps() produces a bytes object which can be written to the socket without encoding.
I have a socket open and I'd like to read some JSON data from it. The problem is that the json module from the standard library can only parse from strings (load only reads the whole file and calls loads inside). It even looks as if everything inside the module depends on the parameter being a string.
This is a real problem with sockets since you can never read it all to string and you don't know how many bytes to read before you actually parse it.
So my questions are: Is there a (simple and elegant) workaround? Is there another json library that can parse data incrementally? Is it worth writing it myself?
Edit: It is XBMC jsonrpc api. There are no message envelopes, and I have no control over the format. Each message may be on a single line or on several lines.
I could write a simple parser that needs only a getc function in some form and feed it using s.recv(1), but this doesn't seem like a very pythonic solution and I'm a little too lazy to do that :-)
Edit: given that you aren't defining the protocol, this isn't useful, but it might be useful in other contexts.
Assuming it's a stream (TCP) socket, you need to implement your own message framing mechanism (or use an existing higher level protocol that does so). One straightforward way is to define each message as a 32-bit integer length field, followed by that many bytes of data.
Sender: take the length of the JSON packet, pack it into 4 bytes with the struct module, send it on the socket, then send the JSON packet.
Receiver: Repeatedly read from the socket until you have at least 4 bytes of data, then use struct.unpack to unpack the length. Read from the socket until you have at least that much data and that's your JSON packet; anything left over is the start of the next message, beginning with its length field.
If at some point you're going to want to send messages that consist of something other than JSON over the same socket, you may want to send a message type code between the length and the data payload; congratulations, you've invented yet another protocol.
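The length-prefix scheme described above can be sketched as follows. The function names are made up for illustration, and the 4-byte unsigned big-endian header ("!I") is one reasonable choice rather than anything the protocol in the question prescribes.

```python
import json
import socket
import struct

_HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order

def send_message(sock, obj):
    """Send obj as JSON, preceded by its length in bytes."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(_HEADER.pack(len(payload)) + payload)

def _recv_exact(sock, count):
    """Keep calling recv until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read one length-prefixed JSON message and return the decoded object."""
    (length,) = _HEADER.unpack(_recv_exact(sock, _HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

Because the receiver asks for exactly the number of bytes it needs, nothing is ever read past the end of the current message, so no leftover buffer has to be carried between calls.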
Another, slightly more standard, method is DJB's Netstrings protocol; it's very similar to the system proposed above, but with text-encoded lengths instead of binary; it's directly supported by frameworks such as Twisted.
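For comparison, a netstring is the payload length in ASCII decimal, a colon, the payload bytes, and a trailing comma. The sketch below is a hand-rolled illustration of that format using only the standard library (the function names are invented here); it is not the Twisted implementation.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_netstrings(buffer: bytes):
    """Return (complete_messages, leftover_bytes) from a receive buffer."""
    messages = []
    while True:
        colon = buffer.find(b":")
        if colon == -1:
            break                      # length field not complete yet
        length = int(buffer[:colon].decode("ascii"))
        end = colon + 1 + length       # index of the trailing comma
        if len(buffer) < end + 1:
            break                      # payload not complete yet
        if buffer[end:end + 1] != b",":
            raise ValueError("malformed netstring: missing trailing comma")
        messages.append(buffer[colon + 1:end])
        buffer = buffer[end + 1:]
    return messages, buffer
```

The leftover bytes are kept and prepended to whatever the next recv call returns.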
If you're getting the JSON from an HTTP stream, use the Content-Length header to get the length of the JSON data. For example:
import http.client  # named httplib in Python 2
import json
h = http.client.HTTPConnection('graph.facebook.com')
h.request('GET', '/19292868552')
response = h.getresponse()
content_length = int(response.getheader('Content-Length', '0'))
# Read data until we've read Content-Length bytes or the socket is closed
data = b''
while len(data) < content_length or content_length == 0:
    # With no Content-Length header, read fixed-size chunks until EOF
    s = response.read(content_length - len(data) if content_length else 4096)
    if not s:
        break
    data += s
# We now have the full data -- decode it
j = json.loads(data.decode('utf-8'))
print(j)
What you want(ed) is ijson, an incremental json parser.
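It is available here: https://pypi.python.org/pypi/ijson/ . The usage should be as simple as (adapted from that page):
import ijson.backends.python as ijson
# The second argument is a prefix selecting what to yield; 'item' means each element of a top-level array
for item in ijson.items(file_obj, 'item'):
    ...  # handle each item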
(for those who prefer something self-contained - in the sense that it relies only on the standard library: I wrote yesterday a small wrapper around json - but just because I didn't know about ijson. It is probably much less efficient.)
EDIT: since I found out that in fact (a cythonized version of) my approach was much more efficient than ijson, I have packaged it as an independent library - see here also for some rough benchmarks: http://pietrobattiston.it/jsaone
Do you have control over the json? Try writing each object as a single line. Then do a readline call on the socket as described here.
infile = sock.makefile()
while True:
    line = infile.readline()
    if not line: break
    # ...
    result = json.loads(line)
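Both ends of the one-object-per-line approach might look like the sketch below. It assumes you control the sender, which is not the case for the XBMC API in the question, and the function names are made up for illustration. json.dumps without an indent argument never emits a raw newline, so each object stays on one line.

```python
import json
import socket

def send_line(sock, obj):
    """Send obj as a single line of JSON terminated by a newline."""
    sock.sendall(json.dumps(obj).encode("utf-8") + b"\n")

def read_lines(sock):
    """Yield one decoded object per received line until the peer closes."""
    with sock.makefile("r", encoding="utf-8") as infile:
        for line in infile:
            if line.strip():
                yield json.loads(line)
```

Newlines inside string values are escaped as \n by json.dumps, so they do not break the framing.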
Skimming the XBMC JSON RPC docs, I think you want an existing JSON-RPC library - you could take a look at:
http://www.freenet.org.nz/dojo/pyjson/
If that's not suitable for whatever reason, it looks to me like each request and response is contained in a JSON object (rather than a loose JSON primitive that might be a string, array, or number), so the envelope you're looking for is the '{ ... }' that defines a JSON object.
I would, therefore, try something like (pseudocode):
while not dead:
    read from the socket and append it to a string buffer
    set a depth counter to zero
    walk each character in the string buffer:
        if you encounter a '{':
            increment depth
        if you encounter a '}':
            decrement depth
            if depth is zero:
                remove what you have read so far from the buffer
                pass that to json.loads()
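A runnable version of the depth-counting idea could look like this. It adds one thing the pseudocode leaves out, namely skipping braces that occur inside JSON strings. The function name is invented for illustration, and the socket loop around it is omitted: call it on the accumulated text after each recv and keep the leftover for the next round.

```python
import json

def extract_objects(buffer):
    """Return (objects, leftover) for a text buffer of concatenated JSON objects."""
    objects = []
    depth = 0
    start = None        # index where the current top-level object began
    in_string = False
    escaped = False
    consumed = 0        # everything before this index has been parsed
    for i, ch in enumerate(buffer):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                objects.append(json.loads(buffer[start:i + 1]))
                consumed = i + 1
    return objects, buffer[consumed:]
```

Rescanning the leftover from the start on every call is simple but quadratic in the worst case, which should be acceptable for small RPC messages.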
You may find JSON-RPC useful for this situation. It is a remote procedure call protocol that should allow you to call the methods exposed by the XBMC JSON-RPC. You can find the specification on Trac.
res = str(s.recv(4096), 'utf-8') # Getting a response as string
res_lines = res.splitlines() # Split the string to an array
last_line = res_lines[-1] # Normally, the last one is the json data
pair = json.loads(last_line)
https://github.com/A1vinSmith/arbitrary-python/blob/master/sockets/loopHost.py
I'm using a SOAP based web service that expects an image element in the form of a 'ByteArray' described in their docs as being of type 'byte[]' - the client I am using is the Python based suds library.
The problem is that I am not exactly sure how to represent the ByteArray for this service. I presume that it should look something like the following list:
[71,73,70,56,57,97,1,0,1,0,128,0,0,255,255,255,0,0,0,33,249,4,0,0,0,0,0,44,0,0,0,0,1,0,1,0,0,2,2,68,1,0,59]
Now when I send this as part of the request, the service complains with the message: Base64 sequence length (105) not valid. Must be a multiple of 4. Does this mean that I would have to pad each member with zeroes to make them 4 long, i.e. [0071,0073,0070,...]?
I got it figured in the end - what the web service meant by a ByteArray (byte[]) looked something like:
/9j/4AAQSkZJRgABAgEAYABgAAD/7gAOQWRvYmUAZAAAAAAB...
... aha, base 64 (not anywhere in their docs, I hasten to add)...
so I managed to get it working by using this:
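import base64
with open(file_name, 'rb') as f:
    # b64encode returns bytes in Python 3; decode to get a str
    encoded_data = base64.b64encode(f.read()).decode('ascii')
strg = ''
# Rebuild the string in 40-character chunks
for i in range(len(encoded_data) // 40 + 1):
    strg += encoded_data[i*40:(i+1)*40]
# strg then contains data required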
I found the inspiration right here - thanks to Doug Hellman
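To connect this back to the list in the question: those integers are the raw bytes of a tiny GIF, and what the service wants is their Base64 text. A short sketch with the standard library follows; the variable names are made up for illustration.

```python
import base64

# The byte values from the question (a 1x1 GIF image)
gif_bytes = bytes([
    71, 73, 70, 56, 57, 97, 1, 0, 1, 0, 128, 0, 0, 255, 255, 255, 0, 0, 0,
    33, 249, 4, 0, 0, 0, 0, 0, 44, 0, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 68, 1,
    0, 59,
])

# Base64 text suitable for the SOAP element
encoded = base64.b64encode(gif_bytes).decode("ascii")
```

Base64 output is always padded to a multiple of 4 characters, which is the property the service's error message was checking for.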
Try a bytearray.