Parse an embedded object (JSON) into an ordered dictionary in Python - python

I am looking to parse some JSON into a dictionary but need to preserve order for one particular part of the dictionary.
I know that I can parse the entire JSON file into an ordered dictionary (ex. Can I get JSON to load into an OrderedDict?) but this is not quite what I'm looking for.
{
    "foo": "bar",
    "columns": {
        "col_1": [],
        "col_2": []
    }
}
In this example, I would want to parse the entire file in as a dictionary with the "columns" portion being an OrderedDict. Is it possible to get that granular with the JSON parsing tools while guaranteeing that order is preserved throughout? Thank you!

From the comments meanwhile, I gathered that a complete, nested OrderedDict is fine as well, but this could be a solution too, if you don't mind using some knowledge about the names of the columns:
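import json
from collections import OrderedDict
def hook(pairs):
    # object_pairs_hook receives a list of (key, value) tuples in document order.
    # (object_hook would receive an already-built dict, whose order is not
    # guaranteed before Python 3.7.)
    if any(key == "col_1" for key, _ in pairs):
        return OrderedDict(pairs)
    return dict(pairs)
result = json.loads("YOUR JSON STRING", object_pairs_hook=hook)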
Hope this helps!
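If the column names are not known in advance, another option is to parse everything as OrderedDict first and then downgrade every object except the one stored under a chosen key. The following is only a sketch: the helper name load_with_ordered_key and its ordered_key parameter are made up for illustration, and it assumes the ordered part always sits under a key called "columns".

```python
import json
from collections import OrderedDict


def load_with_ordered_key(text, ordered_key="columns"):
    """Parse JSON, keeping an OrderedDict only for objects stored under ordered_key."""
    # First pass: keep document order everywhere.
    parsed = json.loads(text, object_pairs_hook=OrderedDict)

    def convert(node, keep_order=False):
        if isinstance(node, OrderedDict):
            # Children are ordered only if their own key matches ordered_key.
            items = [(k, convert(v, keep_order=(k == ordered_key)))
                     for k, v in node.items()]
            return OrderedDict(items) if keep_order else dict(items)
        if isinstance(node, list):
            return [convert(item) for item in node]
        return node

    return convert(parsed)


sample = '{"foo": "bar", "columns": {"col_2": [], "col_1": []}}'
result = load_with_ordered_key(sample)
print(type(result).__name__, type(result["columns"]).__name__)
```

The second pass costs an extra walk over the data, which is usually negligible next to the parsing itself.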

Related

Python json.loads changes the order of the object

I've got a file that contains a JSON object. It's been loaded the following way:
with open('data.json', 'r') as input_file:
    input_data = input_file.read()
At this point input_data contains just a string, and now I proceed to parse it into JSON:
data_content = json.loads(input_data.decode('utf-8'))
data_content now holds the parsed representation of the string, which is what I need, but for some reason that is not clear to me, json.loads alters the original order of the keys. For instance, if my file contained something like:
{ "z_id": 312312,
"fname": "test",
"program": "none",
"org": null
}
After json.loads the order is altered to let's say something like:
{ "fname": "test",
"program": "none",
"z_id": 312312,
"org": None
}
Why is this happening? Is there a way to preserve the order? I'm using Python 2.7.
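Dictionaries (which is what JSON objects are parsed into) have no guaranteed order in Python 2.7, so the key order is lost when the object is parsed into a dict. (A plain dict only preserves insertion order from Python 3.7 onward.)
If the order is important for some reason, you can have json.loads use an OrderedDict instead, which behaves like a dict but remembers the order in which keys were inserted.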
from collections import OrderedDict
data_content = json.loads(input_data.decode('utf-8'), object_pairs_hook=OrderedDict)
This is not an issue with json.loads. Dictionaries in Python 2.7 do not preserve order, so the keys can come out in a different order; generally speaking, it doesn't matter, because you access elements by key, like "id".
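As a quick check of the object_pairs_hook approach, the sketch below parses the question's example from an inline string (an assumption made here so that it runs without data.json) and dumps it again. It is written for Python 3, where the input is already text and needs no .decode().

```python
import json
from collections import OrderedDict

# Inline stand-in for the contents of data.json (assumed for illustration).
input_data = '{"z_id": 312312, "fname": "test", "program": "none", "org": null}'

data_content = json.loads(input_data, object_pairs_hook=OrderedDict)
print(list(data_content.keys()))

# Dumping the OrderedDict writes the keys back in the same order.
round_trip = json.dumps(data_content)
print(round_trip)
```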

How to check the completeness of JSON data

I get data in JSON from an API and it may be that the received data is not complete (= some fields are missing). I am not sure either that the structure of the data follows JSON standards.
The solution for the second problem is simple: I will try to decode the JSON and handle ValueError and TypeError exceptions accordingly.
For the first problem, my solution would also be a try block:
d = {'a': 1}
try:
    d['a']
    d['b']
    d['x']['shouldbethere']
except KeyError:
    (...)
that is to list all the keys I need to have in the dict created from a successful JSON conversion.
This made me think that there may be a way to declare the expected keys (and possibly value types) and match the retrieved JSON against that declaration, with an unsuccessful match raising a specific exception?
The standard way to validate JSON structure is to use JSON Schema.
Basic characteristics (quoted from official webpage) are:
JSON Schema:
- describes your existing data format
- clear, human- and machine-readable documentation
- complete structural validation, useful for
  - automated testing
  - validating client-submitted data
There is no built-in package for validating a JSON object against a schema, but you can use jsonschema from PyPI.
Sample usage (paraphrased from official docs) may be:
import jsonschema
schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "name": {"type": "string"},
    },
}
jsonschema.validate({"name": "Eggs", "price": 34.99}, schema)
# No exception from line above - document is valid
jsonschema.validate({"name": "Eggs", "price": "Invalid"}, schema)
# ValidationError: 'Invalid' is not of type 'number'
JSON parsers aren't well suited to error correction, so if the data isn't valid JSON, I think it would be very difficult to apply any kind of auto-correction that lets you parse it. Your solution for invalid JSON is probably the most reasonable one.
A function to verify that a dict contains a particular set of keys is relatively easy to implement. I'm not aware of any JSON object methods to perform that test, but given a JSON object j you could check it as follows (it might also be sensible to check that it's a dict, since JSON objects can also be of other types):
def has_all_keys(j, keylist):
    return all(key in j for key in keylist)
Using this interactively suggests it might work (in this example I rely on the fact that iteration over a string yields the individual characters, but obviously you will need a real list of string key values).
>>> has_all_keys({}, "abc")
False
>>> has_all_keys({'a':1, 'b':1, 'c':1}, "abc")
True
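The question also asks about declaring expected keys and value types and getting a specific exception on a mismatch. Without a schema library, one rough way to do that is sketched below. The names MissingFieldError and check_fields, and the spec layout (key mapped to a type, or to a nested spec dict), are invented for this example and are not part of any library.

```python
class MissingFieldError(KeyError):
    """Hypothetical exception raised when an expected key is absent."""


def check_fields(data, spec, path=""):
    """Check that data has every key in spec, with values of the given types.

    spec maps each key either to a type or to a nested spec dict.
    """
    if not isinstance(data, dict):
        raise TypeError("%s should be an object" % (path or "top level"))
    for key, expected in spec.items():
        where = "%s.%s" % (path, key) if path else key
        if key not in data:
            raise MissingFieldError(where)
        if isinstance(expected, dict):
            # Nested object: recurse with the extended path.
            check_fields(data[key], expected, where)
        elif not isinstance(data[key], expected):
            raise TypeError("%s should be %r" % (where, expected))


spec = {"a": int, "x": {"shouldbethere": str}}
check_fields({"a": 1, "x": {"shouldbethere": "yes"}}, spec)  # passes silently
```

The exception carries the dotted path of the first missing key, which tends to be more useful in logs than a bare KeyError.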

python json dump, how to put a specific key first?

I want to dump this json to a file:
json.dumps(data)
This is the data:
{
    "list": {
        "one": {"id": "12", "desc": "its 12", "name": "pop"},
        "two": {"id": "13", "desc": "its 13", "name": "kindle"}
    }
}
I want id to be the first property after I dump it to file, but it is not. How can I fix this?
My guess is that it's because you're using a dictionary (hash map), which is unordered.
What you could do is:
from collections import OrderedDict
data = OrderedDict()
data['list'] = OrderedDict()
data['list']['one'] = OrderedDict()
data['list']['one']['id'] = '12'
data['list']['one']['desc'] = ...
data['list']['two'] = ...
This keeps the keys in the order in which they were inserted.
With a plain dict/hashmap in Python versions before 3.7, the iteration order is effectively unpredictable, because it depends on hashing, on which items have been inserted and removed, and on other implementation details.
So you need to either emit the keys in sorted order (for example with json.dumps(data, sort_keys=True)) or use an OrderedDict (see above), which is somewhat slower than a plain dict.
Many thanks go out to @MarcoNawijn for checking the json source. As far as the standard library is concerned, json.dumps writes keys in the order in which the mapping iterates over them unless sort_keys=True is passed, so an OrderedDict is serialized in insertion order.
Whether the parser on the other end of your JSON string honors that order is a separate matter (which I doubt): the JSON RFC describes an object as an unordered collection of name/value pairs, so consumers are not required to preserve it.
You shouldn't worry about the order in which JSON is saved; with a plain dict the order may change when dumping. It is worth looking at these too: JSON order mixed up
and
Is the order of elements in a JSON list maintained?
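To see what the standard json module does with an OrderedDict, here is a small sketch that builds the question's data with "id" inserted first and dumps it. The data is taken from the question, with "list" treated as an object, as the OrderedDict answer above does.

```python
import json
from collections import OrderedDict

# Insert "id" first so that it is also written first.
one = OrderedDict([("id", "12"), ("desc", "its 12"), ("name", "pop")])
two = OrderedDict([("id", "13"), ("desc", "its 13"), ("name", "kindle")])
data = OrderedDict([("list", OrderedDict([("one", one), ("two", two)]))])

text = json.dumps(data)
print(text)
```

Passing sort_keys=True would override this and sort the keys alphabetically, which would put "desc" before "id".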

Which API response format to choose if there are json and text?

I'm coding an API library in Python. I have always chosen JSON as the response format before, but this API provides both text and JSON formats. I'm not asking which one is easier or better; I want to know the advantages and disadvantages of each, as I have only worked with JSON before.
I thought about using the text format, since it's very easy to parse. My only concern was nested elements, but after checking the example I see that they're separated with underscores (_), for example:
name=VALUE
lastname=VALUE
age=VALUE
contact_email=VALUE
contact_phone=VALUE
contact_mobile=VALUE
same json response:
{
    "name": "VALUE",
    "lastname": "VALUE",
    "age": "VALUE",
    "contact": {
        "email": "VALUE",
        "phone": "VALUE",
        "mobile": "VALUE"
    }
}
So are there any advantages or disadvantages of using text over JSON, or the other way around?
IMHO the main advantages of JSON over text would be:
DRY - there are third-party libs for JSON processing which are quite performant
The parsed JSON ends up being a dict, whose values you can easily refer to by key
For the text variant, there is the .ini format, which could offer something along the lines of structure, although the format you described above is not really designed for nested fields and structures.
I'd also look at who's consuming your API. E.g. if it's a web app, then JSON is the accepted format. If your consumer is something else which is more comfortable with the text format...
JSON can be easier as you can use the json library.
The return you've shown there can be put into a dictionary with the line:
import json
json.loads(return_string)
Much easier to parse than the text!
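For comparison, parsing the text variant into the same nested shape takes a little hand-written code. The sketch below assumes that every underscore in a key marks one level of nesting, which is only a guess about this API's convention; a key that legitimately contains an underscore, or a key used both as a value and as a prefix, would break it. The function name parse_flat_text is made up for the example.

```python
def parse_flat_text(text, separator="_"):
    """Turn key=value lines into a nested dict, splitting keys on separator."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        parts = key.split(separator)
        target = result
        # Walk (and create) the intermediate dicts, e.g. "contact".
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return result


response_text = """name=VALUE
lastname=VALUE
age=VALUE
contact_email=VALUE
contact_phone=VALUE
contact_mobile=VALUE"""

parsed = parse_flat_text(response_text)
print(parsed["contact"]["email"])
```

Note that every value comes back as a string, whereas JSON can carry numbers, booleans, null and lists without extra conventions.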

Access a variable within a dictionary with unknown nested location

I have a JSON file and I want to query it using Python. However, I do not know the nested location of a variable beforehand. E.g. to query the JSON object below, loaded into Python and called 'data', I could do the following:
data['experiments']['initial_ns']['icdat']
However, this assumes that I know that the icdat variable is located below initial_ns, which is located under experiments. Unfortunately I do not have this information, and the JSON structure could also change in the future. Is there a simpler way to access variables within a JSON string without explicitly specifying the entire structure?
thanks!!!
{
    "experiments": [
        {
            "management": {
                "events": [
                    {
                        "date": "19122",
                        "timp": "TI3",
                        "eve": "tage"
                    }
                ]
            },
            "initial_ns": {
                "icpcr": "MZ",
                "icdat": "1922"
            },
            "observed": {
                "mdat": "19403",
                "time_series": [
                    {
                        "date": "198423",
                        "etac": "0"
                    }
                ],
                "adat": "190218"
            },
            "local_name": "lhi",
            "exname": "SE",
            "exp_dur": "1"
        }
    ]
}
Have a look at the jsonpath module. http://goessner.net/articles/JsonPath/. I think the search string $..icdat will match your needs.
"...without explicitly specifying the entire structure?"
Yes, there are many ways. Unfortunately you have not specified which answer you are looking for.
To be "unique in terms of the schema" (my terminology) means the following: if you have, for example, multiple Foo dictionaries with the key Foo.bar, then that key is still unique. What is not unique is if you have Foo objects with Foo.bar and Baz objects with Baz.bar: searching for {... bar:...} will return different kinds of objects.
If the key is unique in terms of the schema, you can search the entire tree. You can make this go faster by caching all key-value pairs in a dictionary for later use (the lookup is then O(1) "instant" amortized cost, since you needed to go through the entire data structure anyway to parse it!). This even works if you would like to return sets of objects: use a cache = collections.defaultdict(set) and, when you preprocess items to cache, do cache[key].add(value).
If the key is not unique in terms of the schema, you will want to make a reasonable guess about the path and provide some partial information, per Hans Then's answer using JsonPath: https://stackoverflow.com/a/12291240/711085 (alternatively, change the schema)
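The caching idea can be sketched roughly as follows. This version collects values in lists rather than sets, because values that are themselves dicts or lists are not hashable and could not be added to a set. The function name build_key_index is made up, and the sample data is a trimmed copy of the question's JSON.

```python
from collections import defaultdict


def build_key_index(node, index=None):
    """Walk nested dicts and lists once, recording every value seen under each key."""
    if index is None:
        index = defaultdict(list)
    if isinstance(node, dict):
        for key, value in node.items():
            index[key].append(value)
            build_key_index(value, index)
    elif isinstance(node, list):
        for item in node:
            build_key_index(item, index)
    return index


# Trimmed version of the question's data, for illustration only.
data = {
    "experiments": [
        {
            "management": {"events": [{"date": "19122"}]},
            "initial_ns": {"icpcr": "MZ", "icdat": "1922"},
            "observed": {"time_series": [{"date": "198423"}]},
        }
    ]
}

index = build_key_index(data)
print(index["icdat"])
```

After the single walk, each lookup such as index["icdat"] is an ordinary dictionary access, and a key that occurs in several places (like "date" here) simply maps to several values.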
No. You need to know the format, or you'll have to manually loop over everything in it.
You can write a function to recursively search nested containers for a given key, similar to getElementById() in an XML DOM parser.
def find_key(node, key):
    # Yield the value stored under `key` at this level, if there is one
    if isinstance(node, dict) and key in node:
        yield node[key]
    # Then recurse into nested dicts and lists
    if isinstance(node, (dict, list)):
        children = node.values() if isinstance(node, dict) else node
        for child in children:
            for item in find_key(child, key):
                yield item
>>> next(find_key(data, "icdat"))
'1922'
Since the same key may be found in multiple places in the document, this is actually written as a generator. You can iterate over the results to get all the values or, if you just want the first one (or know it's the only one), use next() around it as I've shown above. You could also convert it to a list() if desired.
