Sorting a JSON file by a certain key - python

I am coding in Python.
I have a carV.json file with content
{"CarValue": "59", "ID": "100043" ...}
{"CarValue": "59", "ID": "100013" ...}
...
How can I sort the file content into
{"CarValue": "59", "ID": "100013" ...}
{"CarValue": "59", "ID": "100043" ...}
...
using the "ID" key to sort?
I tried different methods to read the file and perform the sort, but always ended up getting errors like "no sort attribute" or "'unicode' object has no attribute 'sort'".

There are several steps:
Read the file using json.load()
Sort the list of objects using list.sort()
Use a key-function to specify the sort field.
Use operator.itemgetter() to extract the field of interest
Write the data with json.dump()
Here's some code to get you started:
import json, operator
s = '''\
[
{"CarValue": "59", "ID": "100043"},
{"CarValue": "59", "ID": "100013"}
]
'''
data = json.loads(s)
data.sort(key=operator.itemgetter('ID'))
print(json.dumps(data, indent=2))
This outputs:
[
  {
    "CarValue": "59",
    "ID": "100013"
  },
  {
    "CarValue": "59",
    "ID": "100043"
  }
]
For your application, open the input file and use json.load() instead of json.loads(). Likewise, open an output file and use json.dump() instead of json.dumps(). You can drop the indent parameter as well; it is only there to make the output look nicely formatted.
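Putting those steps together for files, a minimal sketch might look like the following. The function names and paths are hypothetical, and the second function assumes the file holds one JSON object per line (as the sample in the question suggests) rather than a single JSON array.

```python
import json
import operator


def sort_json_array_file(in_path, out_path, key="ID"):
    """Sort a file that holds one JSON array of objects."""
    with open(in_path) as f:
        records = json.load(f)
    records.sort(key=operator.itemgetter(key))
    with open(out_path, "w") as f:
        json.dump(records, f)


def sort_json_lines_file(in_path, out_path, key="ID"):
    """Sort a file that holds one JSON object per line (assumed layout)."""
    with open(in_path) as f:
        # Skip blank lines; every other line must be a complete JSON object.
        records = [json.loads(line) for line in f if line.strip()]
    records.sort(key=operator.itemgetter(key))
    with open(out_path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

Note that the IDs here are strings, so they are compared character by character. That matches numeric order only while all IDs have the same number of digits; passing key=lambda r: int(r["ID"]) to sort() is one way around that.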

Simple, and probably faster for large data: pandas.DataFrame.to_json
>>> import pandas as pd
>>> unsorted = pd.read_json("test.json")
>>> sorted_df = unsorted.sort_values("ID")
>>> sorted_df
   CarValue      ID
1        59  100013
0        59  100043
>>> sorted_df.to_json("sorted_test.json")

Related

Writing nested lists to json

I need to figure out how to structure my data in python such that when I call dumps and write to file I get the following structure of data as an example:
{
  "Prop_a": [
    {
      "car": "brown",
      "color": "yellow",
      "engine": [
        {
          "mod_a": "x1",
          "name": [
            {
              "diesel": "yes",
            }
...
As you can see, I have nested elements that need to be expanded. The end goal is to import the data into a database, and I need JSON- or CSV-formatted data to do it.
EDIT: To all: I can easily print a single level of the dict of lists to JSON. What I need assistance with is how to format the nested structure.
EDIT #2:
Since code is being requested...
my_dict = {}
for x in group:
    my_dict[x] = []
    for y in sub_group:
        my_dict[x].append(data_symbol_reference)
Produces an output like:
{
  "Prop_a" : [
    "car",
    "color",
  ],
...
I need help nesting the dict of lists within the list structure.
See the official Python 3.6 documentation: JSON encoder and decoder.
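One way to get the nested layout from the question is to build ordinary Python dicts and lists in the same shape and let json.dumps serialise them. This is only a sketch: the values are copied from the sample above, and the truncated parts of that sample are left out rather than guessed.

```python
import json

# Each JSON object is a dict and each JSON array is a list, so the
# nesting in Python mirrors the nesting wanted in the output.
my_dict = {
    "Prop_a": [
        {
            "car": "brown",
            "color": "yellow",
            "engine": [
                {
                    "mod_a": "x1",
                    "name": [
                        {"diesel": "yes"},
                    ],
                },
            ],
        },
    ],
}

text = json.dumps(my_dict, indent=2)
print(text)
```

When building this in a loop, the point is to append a dict (which may itself contain lists of dicts) to each list, rather than appending bare strings.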

Python - convert JSON object to dict

I am wondering how I can convert a json list to a dictionary using the two values of the JSON objects as the key/value pair.
The JSON looks like this:
"test": [
  {
    "name": "default",
    "range": "100-1000"
  },
  {
    "name": "bigger",
    "range": "1000-10000"
  }
]
I basically want the dictionary to use the name as the key and the range as the value. So the dictionary in this case would be {"default": "100-1000", "bigger": "1000-10000"}.
Is that possible?
You can first load the JSON string into a dictionary with json.loads. Next, you can use a dictionary comprehension to post-process it:
from json import loads
{ d['name'] : d['range'] for d in loads(json_string)['test'] }
We then obtain:
>>> { d['name'] : d['range'] for d in loads(json_string)['test'] }
{'bigger': '1000-10000', 'default': '100-1000'}
In case there are two sub-dictionaries with the same name, then the last one will be stored in the result.
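A self-contained sketch of the same idea follows. The json_string value is an assumption: it wraps the list from the question in an enclosing object and adds a made-up duplicate "default" entry to show which value survives.

```python
from json import loads

# Assumed input: the question's list wrapped in an object, plus a
# made-up duplicate "default" entry for illustration.
json_string = '''
{
  "test": [
    {"name": "default", "range": "100-1000"},
    {"name": "bigger", "range": "1000-10000"},
    {"name": "default", "range": "5-50"}
  ]
}
'''

# Later entries overwrite earlier ones that share the same name.
ranges = {d["name"]: d["range"] for d in loads(json_string)["test"]}
print(ranges)
```

Here the later "default" entry (range "5-50") replaces the earlier one, so the result has only two keys.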

For each loop with JSON object python

Alright, so I'm struggling a little bit with trying to parse my JSON object.
My aim is to grab a certain JSON key and return its value.
JSON File
{
  "files": {
    "resources": [
      {
        "name": "filename",
        "hash": "0x001"
      },
      {
        "name": "filename2",
        "hash": "0x002"
      }
    ]
  }
}
I've developed a function which allows me to parse the JSON code above
Function
def parsePatcher():
    url = '{0}/{1}'.format(downloadServer, patcherName)
    patch = urllib2.urlopen(url)
    data = json.loads(patch.read())
    patch.close()
    return data
Okay so now I would like to do a foreach statement which prints out each name and hash inside the "resources": [] object.
Foreach statement
for name, hash in patcher["files"]["resources"]:
    print name
    print hash
But it only prints out "name" and "hash" not "filename" and "0x001"
Am I doing something incorrect here?
By using name, hash as the for loop target, you are unpacking the dictionary:
>>> d = {"name": "filename", "hash": "0x001"}
>>> name, hash = d
>>> name
'name'
>>> hash
'hash'
This happens because iteration over a dictionary only produces the keys:
>>> list(d)
['name', 'hash']
and unpacking uses iteration to produce the values to be assigned to the target names.
That this worked at all is down to chance: on Python 3.3 and newer, hash randomisation is enabled by default, and on versions before 3.7 (where dictionaries do not guarantee insertion order) the order of those two keys could equally be reversed.
Just use one name to assign the dictionary to, and use subscription on that dictionary:
for resource in patcher["files"]["resources"]:
    print resource['name']
    print resource['hash']
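For readers on Python 3, the same loop can be sketched as a self-contained script. The patcher dict is typed inline here as an assumption, standing in for the download done by parsePatcher(), and the pairs list is only added for illustration.

```python
# Stand-in for the parsed JSON from the question (assumed, not downloaded).
patcher = {
    "files": {
        "resources": [
            {"name": "filename", "hash": "0x001"},
            {"name": "filename2", "hash": "0x002"},
        ]
    }
}

# Each element of "resources" is a dict, so bind it to one name and
# look the values up by key instead of unpacking the dict itself.
pairs = []
for resource in patcher["files"]["resources"]:
    pairs.append((resource["name"], resource["hash"]))
    print(resource["name"], resource["hash"])
```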
So what you intend to do is:
for dic in x["files"]["resources"]:
    print dic['name'],dic['hash']
You need to iterate over the dictionaries in the resources array.
The problem seems to be that you have a list of dictionaries. First get each element of the list, then ask that element (which is a dictionary) for the values of the keys name and hash.
EDIT: this is tested and works
mydict = {"files": {"resources": [{"name": "filename", "hash": "0x001"}, {"name": "filename2", "hash": "0x002"}]}}
for element in mydict["files"]["resources"]:
    for d in element:
        print d, element[d]
If you have multiple files, each with multiple resources inside, this generalized solution works:
for keys in patcher:
    for indices in patcher[keys].keys():
        print(patcher[keys][indices])
Output checked on my side:
>>> for keys in patcher:
...     for indices in patcher[keys].keys():
...         print(patcher[keys][indices])
...
[{'hash': '0x001', 'name': 'filename'}, {'hash': '0x002', 'name': 'filename2'}]

Python - Searching JSON

I have JSON output as follows:
{
  "service": [{
    "name": ["Production"],
    "id": ["256212"]
  }, {
    "name": ["Non-Production"],
    "id": ["256213"]
  }]
}
I wish to find all IDs where the pair contains "Non-Production" as a name.
I was thinking along the lines of running a loop to check, something like this:
data = json.load(urllib2.urlopen(URL))
for key, value in data.iteritems():
    if "Non-Production" in key[value]: print key[value]
However, I can't seem to get the name and ID from the "service" tree, it returns:
if "Non-Production" in key[value]: print key[value]
TypeError: string indices must be integers
Assumptions:
The JSON is in a fixed format that can't be changed
I do not have root access and am unable to install any additional packages
Essentially, the goal is to obtain a list of IDs of non-production "services" in the most efficient way.
Here you go:
data = {
    "service": [
        {"name": ["Production"],
         "id": ["256212"]
        },
        {"name": ["Non-Production"],
         "id": ["256213"]}
    ]
}
for item in data["service"]:
    if "Non-Production" in item["name"]:
        print(item["id"])
Whenever I see JSON I think about functional programming! Anyone else?
I think it is a better idea to use functions like concat or flat, filter, reduce, etc.
E.g., a one-liner:
[s.get('id', [0])[0] for s in filter(lambda srv: "Non-Production" in srv.get('name', []), data.get('service', []))]
EDIT:
I updated the code; even if data = {}, the result will be [], an empty ID list.
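Both approaches can be wrapped in a small helper that uses only the standard library, which fits the no-extra-packages constraint. This is a sketch: the function name ids_for_name is hypothetical, and it assumes the fixed format shown in the question, where "name" and "id" are both lists.

```python
def ids_for_name(data, wanted):
    """Return every id whose service entry lists `wanted` among its names."""
    ids = []
    for item in data.get("service", []):
        if wanted in item.get("name", []):
            # "id" is itself a list in this format, so extend rather than append.
            ids.extend(item.get("id", []))
    return ids


data = {
    "service": [
        {"name": ["Production"], "id": ["256212"]},
        {"name": ["Non-Production"], "id": ["256213"]},
    ]
}

non_production_ids = ids_for_name(data, "Non-Production")
print(non_production_ids)
```

Using .get() with empty-list defaults means a missing "service", "name", or "id" key yields an empty result instead of a KeyError.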

Python how to handle # in a dictionary

I've got some json from last.fm's api which I've serialised into a dictionary using simplejson. A quick example of the basic structure is below.
{
  "artist": "similar": {
    "artist": {
      "name": "Blah",
      "image": [{
        "#text": "URLHERE",
        "size": "small"
      }, {
        "#text": "URLHERE",
        "size": "medium"
      }, {
        "#text": "URLHERE",
        "size": "large"
      }]
    }
  }
}
Any ideas how I can access the image urls of various different sizes?
Thanks,
Jack
Python does not have any problem with # in strings used as dict keys.
>>> import json
>>> j = '{"#foo": 6}'
>>> print json.loads(j)
{u'#foo': 6}
>>> print json.loads(j)[u'#foo']
6
>>> print json.loads(j)['#foo']
6
There are, however, problems with the JSON you posted. First, it isn't valid (perhaps you're missing a couple of commas?). Second, you have a JSON object with the same key "image" three times, which cannot coexist and do anything useful.
In Javascript, these two syntaxes are equivalent:
o.foo
o['foo']
In Python they are not. The first gives you the foo attribute, the second gives you the foo key. (It's debatable whether this was a good idea or not.) In Python, you wouldn't be able to access #text as:
o.#text
because the hash will start a comment, and you'll have a syntax error.
But you want
o['#text']
in any case.
You can get what you want from the image list with a list comprehension. Something like
desired = [x for x in images if minSize < x['size'] < maxSize]
Here, images would be the list of dicts from the inner level of your data structure.
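Since the size values in the sample are labels such as "small" rather than numbers, another option is to index the URLs by label. The sketch below assumes images is the list of dicts found under the "image" key; the URL strings are made-up placeholders.

```python
# Assumed: `images` is the list found under the "image" key; the URL
# strings are made-up placeholders, not real last.fm values.
images = [
    {"#text": "URL_SMALL", "size": "small"},
    {"#text": "URL_MEDIUM", "size": "medium"},
    {"#text": "URL_LARGE", "size": "large"},
]

# "#text" is an ordinary string key, so plain subscription works.
url_by_size = {img["size"]: img["#text"] for img in images}

print(url_by_size["medium"])
```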
