This is a simple question that is really only a footnote in something I am writing:
Is any valid JSON not also valid Python?
I know the converse is true, i.e. Python data structures and scalars allow a variety of constructs that are not JSON. But for the most part, JSON seems to be a subset of Python syntax for defining (some) data structures.
The obvious stuff is covered. Strings are strings. Ints are ints. JSON "numbers" are read as Python floats (although RFC 8259 does not mandate that interpretation vs. fixed point, for example). Dicts are dicts. Lists are lists.
But maybe something in some obscure corner violates the subset relationship. For example, is there anything in the encoding of Unicode outside the BMP that is directly incompatible? Or maybe within Unicode surrogate pairs?
Or maybe something with numbers where some large number of digits after the decimal would be technically valid JSON but not Python? (I don't think so, but just trying to think of scenarios).
The most obvious thing is that true, false and null don't exist in Python. They are called True, False and None.
In addition, \/ in strings is interpreted as / in JSON but as a literal \/ in Python:
>>> import json
>>> a = '"\/"'
>>> print(a)
"\/"
>>> print(eval(a))
\/
>>> print(json.loads(a))
/
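To make the true/false/null point concrete, a quick interactive check (Python 3 assumed) shows the json module accepting literals that Python itself rejects:
>>> import json
>>> json.loads('{"ok": true, "missing": null}')
{'ok': True, 'missing': None}
>>> eval('{"ok": true, "missing": null}')
Traceback (most recent call last):
  ...
NameError: name 'true' is not defined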
Yes, you are correct: every valid JSON document can be handled in Python. Python is a full programming language, while JSON is just a way of storing (serialising) data, and in general a language will support everything a JSON document can represent.
The representation may differ, of course; for example, true in JSON is True in Python.
Since JSON is a way of storing data that is commonly passed around in HTTP requests, it is always processed by some server-side language, which is expected to be able to handle it.
Related
I'm trying to insert a unix timestamp using REST to a webservice. When I convert the dictionary I get the value 1392249600000L, but I need this value to be an integer.
So I tried int(1392249600000L) and I get 1392249600000L, still a long value.
The reason I need this is because the JSON webservice only accepts timestamps with milliseconds in them, but when I pass the JSON value with the 'L' in it I get an invalid JSON primitive of value 1392249600000L error.
Can someone please help me resolve this? It seems like it should be so easy, but it's driving me crazy!
You should not be using Python representations when you are sending JSON data. Use the json module to represent integers instead:
>>> import json
>>> json.dumps(1392249600000L)
'1392249600000'
The L is only part of the string representation, there to make debugging easier by making it clear you have a long rather than an int. In any case, don't use Python string representations for network communications.
For example, if you have a list of Python values, the str() representation of that list will also use the repr() representations of the contents of the list, resulting in L postfixes for long integers. But json.dumps() handles such cases properly, and converts other types correctly as well (Python None to JSON null, Python True to JSON true, etc.):
>>> json.dumps([1392249600000L, True, None])
'[1392249600000, true, null]'
I'm trying to determine whether a function argument is a string, or some other iterable. Specifically, this is used in building URL parameters, in an attempt to emulate PHP's &param[]=val syntax for arrays - so duck typing doesn't really help here: I can iterate through a string and produce things like &param[]=v&param[]=a&param[]=l, but this is clearly not what we want. If the parameter value is a string (or a bytes? I still don't know what the point of a bytes actually is), it should produce &param=val, but if the parameter value is (for example) a list, each element should receive its own &param[]=val.
I've seen a lot of explanations about how to do this in 2.* involving isinstance(foo, basestring), but basestring doesn't exist in 3.*, and I've also read that isinstance(foo, str) will miss more complex strings (I think unicode?). So, what is the best way to do this without causing some types to be lost to unnecessary errors?
You've been seeing things that somewhat conflict based on Python 2 vs 3. In Python 3, isinstance(foo, str) is almost certainly what you want. bytes is for raw binary data, which you probably can't include in an argument string like that.
The Python 2 str type stored raw binary data, usually text in some specific encoding such as UTF-8 or Latin-1; the unicode type stored a more "abstract" representation of the characters that could then be encoded into any specific encoding. basestring was a common ancestor of both, so you could easily say "any kind of string".
In Python 3, str is the more "abstract" type, and bytes is for raw binary data (like a string in a specific encoding, or whatever other raw binary data you want to handle). You shouldn't use bytes for anything that would otherwise be a string, so there's no real reason to check for either str or bytes. If you absolutely need to, though, you can do something like isinstance(foo, (str, bytes)).
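As a minimal sketch of how that check might fit into the URL-parameter case from the question (build_params and the []-suffix convention are my own illustrative choices, not an established API):
from urllib.parse import quote

def build_params(params):
    # emulate PHP's &param[]=val convention: strings are scalars,
    # any other iterable fans out into one key[]=value pair per element
    parts = []
    for key, value in params.items():
        if isinstance(value, str):
            parts.append("%s=%s" % (quote(key), quote(value)))
        else:
            for item in value:
                parts.append("%s[]=%s" % (quote(key), quote(str(item))))
    return "&".join(parts)

print(build_params({"param": "val"}))            # param=val
print(build_params({"param": ["v", "a", "l"]}))  # param[]=v&param[]=a&param[]=l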
I receive strings after querying via urlopen in JSON format:
def get_clean_text(text):
    return text.translate(maketrans("!?,.;():", "        ")).lower().strip()

for track in json["tracks"]:
    print track["name"].lower()
    get_clean_text(track["name"].lower())
For the string "türlich, türlich (sicher, dicker)" I then get
File "main.py", line 23, in get_clean_text
return text.translate(maketrans("!?,.;():", " ")).lower().strip()
TypeError: character mapping must return integer, None or unicode
I want to format the string to be "türlich türlich sicher dicker".
The question is not a complete, self-contained example; I can't be sure whether it's Python 2 or 3, where maketrans came from, etc. There's a good chance I will guess wrong, which is why you should be sure to tag your questions appropriately and provide a short, self-contained, correct example. (That, and the fact that various other people, some of them probably smarter than me, likely ignored your question because it was ambiguous.)
Assuming you're using 2.x, and you've done a from string import * to get maketrans, and json["name"] is unicode rather than str/bytes, here's your problem:
There are two kinds of translation tables: old-style 8-bit tables (which are just an array of 256 characters) and new-style sparse tables (which are just a dict mapping one character's ordinal to another). The str.translate function can use either, but unicode.translate can only use the second (for reasons that should be obvious if you think about it for a bit).
The string.maketrans function makes old-style 8-bit translation tables. So you can't use it with unicode.translate.
You can always write your own "makeunitrans" function as a drop-in replacement, something like this:
def makeunitrans(frm, to):
    return {ord(f): ord(t) for (f, t) in zip(frm, to)}
But if you just want to map out certain characters, you could do something a bit more special purpose:
def makeunitrans(frm):
    return {ord(f): ord(' ') for f in frm}
However, from your final comment, I'm not sure translate is even what you want:
I want to format the string to be "türlich türlich sicher dicker"
If you get this right, you're actually going to get "türlich  türlich  sicher  dicker", with doubled internal spaces (and a trailing space before the strip), because you're mapping all those punctuation characters to spaces, not removing them.
With new-style translation tables you can map anything you want to None, which solves that problem. But you might want to step back and ask why you're using the translate method in the first place instead of, e.g., calling replace multiple times (people usually say "for performance", but you wouldn't be building the translation table in-line every time through if that were an issue) or using a trivial regular expression.
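For instance, a rough sketch of both alternatives, assuming Python 2 and a unicode input as in the question (the variable names are just for illustration):
# -*- coding: utf-8 -*-
import re

s = u"türlich, türlich (sicher, dicker)"

# new-style sparse table: map each punctuation ordinal to None, which
# deletes the character instead of replacing it with a space
delete_table = {ord(c): None for c in u"!?,.;():"}
print s.translate(delete_table).lower().strip()   # türlich türlich sicher dicker

# or skip translate and use a trivial regular expression
print re.sub(u"[!?,.;():]", u"", s).lower().strip()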
I have a dictionary of dictionaries in Python:
d = {"a11y_firesafety.html":{"lang:hi": {"div1": "http://a11y.in/a11y/idea/a11y_firesafety.html:hi"}, "lang:kn": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn}}}
I have this in a JSON file and I encoded it using json.dumps(). Now when I decode it using json.loads() in Python I get a result like this:
temp = {u'a11y_firesafety.html': {u'lang:hi': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi'}, u'lang:kn': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn'}}}
My problem is with the "u" (which marks a unicode string) in front of every item in my temp (dictionary of dictionaries). How do I get rid of that "u"?
Why do you care about the 'u' characters? They're just a visual indicator; unless you're actually using the result of str(temp) in your code, they have no effect on your code. For example:
>>> test = u"abcd"
>>> test == "abcd"
True
If they do matter for some reason, and you don't care about consequences like not being able to use this code in an international setting, then you could pass in a custom object_hook (see the json docs) to produce dictionaries with byte-string contents rather than unicode.
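A rough sketch of that object_hook approach on Python 2 might look like this (byteify is just an illustrative name, and this version only handles dicts of strings like the one in the question; lists of strings would need the same treatment):
import json

def byteify(d):
    # object_hook is called with every decoded JSON object (a dict);
    # re-encode unicode keys and values as UTF-8 byte strings
    result = {}
    for key, value in d.items():
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        result[key.encode('utf-8')] = value
    return result

with open("a11y.json") as f:          # hypothetical input file
    temp = json.load(f, object_hook=byteify)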
You could also use this:
import fileinput

fout = open("out.txt", 'a')
for line in fileinput.input("in.txt"):
    # strip the u prefix from quoted strings in the text representation
    cleaned = line.replace("u\"", "\"").replace("u\'", "\'")
    print >> fout, cleaned
Typical JSON responses from websites show up with both prefix representations, u' and u".
This snippet gets rid of both of them. It may not be required, since the prefix doesn't hinder any logical processing, as the previous answer mentioned.
There is no "unicode" encoding, since unicode is a different data type and I don't really see any reason unicode would be a problem, since you may always convert it to string doing e.g. foo.encode('utf-8').
However, if you really want to have string objects upfront you should probably create your own decoder class and use it while decoding JSON.
When you convert a list of user objects into json, and then convert it back to its original state, do you have to cast?
Are there any security issues of taking a javascript json object and converting it into a python list object?
json.dumps(somepython) gives you a valid JSON string representing the Python object somepython (which may perfectly well be a list), and json.loads(ajsonstring) goes the other way around, both without any security issue and with no "cast" needed. That's with Python 2.6 or better, using the json module in the standard library. If you're stuck with 2.5 (e.g., for use on Google App Engine), you can use the equivalent third-party module simplejson.
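A quick round trip just to illustrate (interactive session, Python 2.6+):
>>> import json
>>> users = [{"id": 1, "name": u"alice"}, {"id": 2, "name": u"bob"}]
>>> s = json.dumps(users, sort_keys=True)
>>> s
'[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]'
>>> json.loads(s) == users
True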
You will be responsible for writing the Python code to encode and decode your classes. How are you encoding them? That will have a large bearing on how you decode them. Python will not do either for you if you step beyond dicts, lists, unicode, strings, ints, floats, booleans, and None.
The canonical way to encode custom classes is to subclass json.JSONEncoder and provide a default method. The default method has the signature (self, obj) and returns obj encoded as JSON if it knows how to, and returns super(clsname, self).default(obj) if it does not.
If you encode your classes as dicts, then you can write a function that accepts one argument (a decoded dictionary) and returns the decoded object from that. Then pass this function to the constructor for json.JSONDecoder and use the decode method on that instance.
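A rough sketch of that whole pattern, with a made-up Point class purely for illustration:
import json

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        # encode Point instances as tagged dicts; defer to the base class
        # (which raises TypeError) for anything else
        if isinstance(obj, Point):
            return {"__point__": True, "x": obj.x, "y": obj.y}
        return super(PointEncoder, self).default(obj)

def point_hook(d):
    # called for every decoded JSON object; rebuild Point instances
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d

s = json.dumps([Point(1, 2), Point(3, 4)], cls=PointEncoder)
points = json.JSONDecoder(object_hook=point_hook).decode(s)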
All in all, JSON is not ideally suited to serializing complex classes. If you can capture the entire state of an instance in such a way that it can be passed to the __init__ method, then have at it; if not, you'll just hurt your head trying.