Compare two separate JSONs - python

I have a JSON result from an intermediate stage, as follows:
a=[{
"ID": "1201",
"SubID": "S1201",
"Information": {
"Name": "Kim",
"Age": "41"
}
}, {
"ID": "1433",
"subID": "G1433",
"Information": {
"Name": "John",
"Age": "32"
}
}]
I have another JSON that needs to be compared with the one above:
c= [{
"ID": "1201",
"SubID": "S1201"
},
{
"ID": "3211",
"subID": "G3211"
}
]
Since one JSON object in my intermediate result (a) is also present in the other JSON (c), I want to retain only the objects that appear in both.
expected output:
[{
"ID": "1201",
"SubID": "S1201",
"Information": {
"Name": "Kim",
"Age": "41"
}
}]
I'm not sure how to approach this. Please guide me. Thanks.

ids = [e['ID'] for e in c]
repeated = [e for e in a if e['ID'] in ids]
print(repeated)
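A slightly more defensive variant of the snippet above, as a sketch: it keeps the IDs in a set for constant-time membership tests and skips items that have no "ID" at all. The helper name keep_common is made up for illustration, and matching on "ID" alone is an assumption; the sample data spells the second key both "SubID" and "subID", so it is not used for matching here.

```python
def keep_common(a, c, key="ID"):
    """Return the items of `a` whose `key` value also appears in `c`."""
    # Set membership is O(1) on average, versus O(n) for a list.
    wanted = {item[key] for item in c if key in item}
    return [item for item in a if key in item and item[key] in wanted]


a = [
    {"ID": "1201", "SubID": "S1201", "Information": {"Name": "Kim", "Age": "41"}},
    {"ID": "1433", "subID": "G1433", "Information": {"Name": "John", "Age": "32"}},
]
c = [
    {"ID": "1201", "SubID": "S1201"},
    {"ID": "3211", "subID": "G3211"},
]

repeated = keep_common(a, c)
print(repeated)
```

Only the record with ID "1201" survives, because it is the only ID present in both lists.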

Related

Referring to parts of large JSON files

I am currently trying to have python parse JSON similar to the one at https://petition.parliament.uk/petitions/560216.json.
My problem is that the data I need is nested in a lot of parts and I don't know how to tell python which part to take.
A simplified version of the data I need is below
{
"data": {
"attributes": {
"signatures_by_country": [
{
"name": "Afghanistan",
"code": "AF",
"signature_count": 1
},
{
"name": "Algeria",
"code": "DZ",
"signature_count": 2
}
]
}
}
}
I am trying to pull the "signature_count" part.
The code below collects the values you asked for into a list:
data = {
"data": {
"attributes": {
"signatures_by_country": [
{
"name": "Afghanistan",
"code": "AF",
"signature_count": 1
},
{
"name": "Algeria",
"code": "DZ",
"signature_count": 2
},
]
}
}
}
counts = [x['signature_count'] for x in data['data']['attributes']['signatures_by_country']]
print(counts)
Output:
[1, 2]
Counts by country:
counts = [{x['name']:x['signature_count']} for x in data['data']['attributes']['signatures_by_country']]
Output:
[{'Afghanistan': 1}, {'Algeria': 2}]
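If a single lookup table is more convenient than a list of one-item dictionaries, a dict comprehension gives name-to-count pairs directly. This is only a sketch on the simplified data from the question; the names countries, by_country and total are illustrative, and the chained .get calls are an assumption about how you might want to handle a response where the nested keys are missing.

```python
data = {
    "data": {
        "attributes": {
            "signatures_by_country": [
                {"name": "Afghanistan", "code": "AF", "signature_count": 1},
                {"name": "Algeria", "code": "DZ", "signature_count": 2},
            ]
        }
    }
}

# Walk down the nesting one key at a time; fall back to an empty list
# if any level is absent.
countries = (
    data.get("data", {}).get("attributes", {}).get("signatures_by_country", [])
)

# One dictionary mapping country name -> signature count.
by_country = {c["name"]: c["signature_count"] for c in countries}
total = sum(by_country.values())

print(by_country)
print(total)
```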

Parse specific data from JSON

I have a JSON file with lots of data, and I want to keep only specific data.
I thought of reading the file, extracting the data I want, and saving it as a new JSON file.
The JSON is like this:
{
"event": [
{
"date": "2019-01-01",
"location": "world",
"url": "www.com",
"comments": "null",
"country": "china",
"genre": "blues"
},
{
"date": "2000-01-01",
"location": "street x",
"url": "www.cn",
"comments": "null",
"country":"turkey",
"genre": "reds"
},
{...
and I want it to be like this (with just date and url from each event):
{
"event": [
{
"date": "2019-01-01",
"url": "www.com"
},
{
"date": "2000-01-01",
"url": "www.cn"
},
{...
I can open the JSON and read from it using
with open('xx.json') as f:
    data = json.load(f)
data2=data["events"]["date"]
But I still need to understand how to save the data I want in a new JSON file while keeping its structure.
You can use a list comprehension to loop over the events and build, for each one, a dictionary containing only the keys that you want.
data = { "event": [
{
"date": "2019-01-01",
"location": "world",
"url": "www.com",
"comments": None,
"country": "china",
"genre": "blues",
},
{
"date": "2000-01-01",
"location": "street x",
"url": "www.cn",
"comments": None,
"country" :"turkey",
"genre":"reds",
}
]}
# List comprehension
data["event"] = [{"date": x["date"], "url": x["url"]} for x in data["event"]]
Alternatively, you can map a function over the events list
keys_to_keep = ["date", "url"]
def subset_dict(d):
    return {x: d[x] for x in keys_to_keep}
data["event"] = list(map(subset_dict, data["event"]))
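Neither snippet above writes the result back to disk, which the question also asks about. A minimal sketch of the full round trip, assuming the input file has a top-level "event" list; the function name filter_events and the file names are placeholders, and a temporary directory stands in for the real files so that the example runs on its own.

```python
import json
import os
import tempfile

KEYS_TO_KEEP = ("date", "url")


def filter_events(src_path, dst_path, keys=KEYS_TO_KEEP):
    """Copy src_path to dst_path, keeping only `keys` in each event."""
    with open(src_path) as f:
        data = json.load(f)
    # Skip keys an event does not have instead of raising KeyError.
    data["event"] = [
        {k: event[k] for k in keys if k in event} for event in data["event"]
    ]
    with open(dst_path, "w") as f:
        json.dump(data, f, indent=2)
    return data


# Demo with a throwaway input file.
sample = {"event": [
    {"date": "2019-01-01", "location": "world", "url": "www.com", "genre": "blues"},
    {"date": "2000-01-01", "location": "street x", "url": "www.cn", "genre": "reds"},
]}
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "xx.json")
    dst = os.path.join(tmp, "filtered.json")
    with open(src, "w") as f:
        json.dump(sample, f)
    filter_events(src, dst)
    with open(dst) as f:
        result = json.load(f)

print(result)
```

Because the whole dictionary is dumped, not just the list, the outer "event" wrapper is preserved in the new file.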

Group and sort JSON array of dictionaries by repeatable keys in Python

I have a json that is a list of dictionaries that looks like this:
I am getting it from MySQL with pymysql
[{
"id": "123",
"name": "test",
"group": "test_group"
},
{
"id": "123",
"name": "test",
"group": "test2_group"
},
{
"id": "456",
"name": "test2",
"group": "test_group2"
},
{
"id": "456",
"name": "test2",
"group": "test_group3"
}]
I need to group it so that each "name" has just one dict, containing a list of all the groups under that name,
something like this:
[{
"id": "123",
"name": "test",
"group": ["test2_group", "test_group"]
},
{
"id": "456",
"name": "test2",
"group": ["test_group2", "test_group3"]
}]
I would like to get some help,
Thanks !
You can use itertools.groupby to group the data.
I can't guarantee that the solution below is the shortest way, but it should do the job.
from itertools import groupby
# Your input data
data = []
res = []
key_func = lambda k: k['id']
# groupby only merges adjacent items, so sort by the same key first
for k, g in groupby(sorted(data, key=key_func), key=key_func):
    obj = {'id': k, 'name': '', 'group': []}
    for group in g:
        if not obj['name']:
            obj['name'] = group['name']
        obj['group'].append(group['group'])
    res.append(obj)
print(res)
It should print the data in required format.
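For comparison, the same grouping can be done without sorting the rows by accumulating into a dictionary keyed on id. This is a sketch rather than a replacement for the answer above: group_by_id is a made-up name, it assumes rows with the same id share the same name, and it sorts each group list only because the expected output in the question appears sorted.

```python
def group_by_id(rows):
    """Merge rows that share an 'id' into one dict with a list of groups."""
    merged = {}
    for row in rows:
        # The first time an id is seen, create its entry with an empty list.
        entry = merged.setdefault(
            row["id"], {"id": row["id"], "name": row["name"], "group": []}
        )
        entry["group"].append(row["group"])
    for entry in merged.values():
        entry["group"].sort()
    # Dicts keep insertion order (Python 3.7+), so ids stay in first-seen order.
    return list(merged.values())


rows = [
    {"id": "123", "name": "test", "group": "test_group"},
    {"id": "123", "name": "test", "group": "test2_group"},
    {"id": "456", "name": "test2", "group": "test_group2"},
    {"id": "456", "name": "test2", "group": "test_group3"},
]

res = group_by_id(rows)
print(res)
```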

Extracting elements from json in python

I have the following json:
{
"request": {
"id": "123",
"url": "/aa/bb/cc",
"method": "GET",
"timestamp": "2018-08-09T08:41:38.432Z"
},
"response": {
"status": {
"code": 200,
"message": "OK"
},
"items": [
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
}
}
I need to loop over items and print each name. I've tried the following code which doesn't work.
response = requests.get(url)
data = json.loads(response.content)
for group in data['response']['items']:
    print data['response']['items'][group]['name']
When I replace group with 0, for example, I can access the first name:
data['response']['items'][0]['name']
However, I don't know in advance how many elements are in the array.
As Joel mentioned, in the for loop,
for group in data['response']['items']:
on each iteration group is assigned one element of data['response']['items'], so it is a single dictionary taken from this list, not an index:
[
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
So all you need to do inside the loop is
print(group['name'])
You can use the pandas module and call its read_json function.
import pandas as pd
df = pd.read_json('your_json_file.json')
for i in df.response['items']:
    print(i['name'])
# w1
# w2
# w3
You could try this:
for i in range(len(data['response']['items'])):
    print(data['response']['items'][i]['name'])
Output:
w1
w2
w3
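Indexing is not needed at all, since iterating over the list yields each item directly. A self-contained sketch, with the response body inlined as a string in place of the requests call from the question (the variable raw is a stand-in for response.content):

```python
import json

# Stand-in for response.content from the question.
raw = """
{
  "request": {"id": "123", "url": "/aa/bb/cc", "method": "GET"},
  "response": {
    "status": {"code": 200, "message": "OK"},
    "items": [
      {"id": "aaa", "name": "w1"},
      {"id": "bbb", "name": "w2"},
      {"id": "ccc", "name": "w3"}
    ]
  }
}
"""

data = json.loads(raw)

# Each `item` is one dictionary from the "items" list, whatever its length.
names = [item["name"] for item in data["response"]["items"]]
for name in names:
    print(name)
```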

Find a value in JSON using Python

I’ve previously succeeded in parsing data from a JSON file, but now I’m facing a problem with the function I want to achieve. I have a list of names, identification numbers and birthdate in a JSON. What I want to get in Python is to be able to let a user input a name and retrieve his identification number and the birthdate (if present).
This is my JSON example file:
[
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": null
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
To be clear, I want to input "V410Z8" and get his name and his birthdate.
I tried to write some code in Python but I only succeed in searching for “id_number” and not for what is inside “id_number” for example "V410Z8".
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
database = "example.json"
data = json.loads(open(database).read())
id_number = data[0]["id_number"]
print id_number
Thank you for your support, guys :)
You have to iterate over the list of dictionaries and search for the one with the given id_number. Once you find it you can print the rest of its data and break, assuming id_number is unique.
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
for i in data:
    if i['id_number'] == 'V410Z8':
        print(i['birthdate'])
        print(i['name'])
        break
If you have control over the data structure, a more efficient way would be to use the id_number as a key (again, assuming id_number is unique):
data = { "SA4784" : {"name": "Mark", "birthdate": None},
"V410Z8" : { "name": "Vincent", "birthdate": "15/02/1989"},
"CZ1094" : {"name": "Paul", "birthdate": "27/09/1994"}
}
Then all you need to do is try to access it directly:
try:
    print(data["V410Z8"]["name"])
except KeyError:
    print("ID doesn't exist")
Output:
Vincent
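If the data arrives as a list, as in the question, the keyed structure can be built once from it. A sketch, assuming id_number is unique (a later duplicate would silently overwrite an earlier one); records, by_id and lookup are illustrative names.

```python
records = [
    {"id_number": "SA4784", "name": "Mark", "birthdate": None},
    {"id_number": "V410Z8", "name": "Vincent", "birthdate": "15/02/1989"},
    {"id_number": "CZ1094", "name": "Paul", "birthdate": "27/09/1994"},
]

# Build the index once; each later lookup is a single dictionary access.
by_id = {rec["id_number"]: rec for rec in records}


def lookup(id_number):
    """Return (name, birthdate) for id_number, or None if it is unknown."""
    rec = by_id.get(id_number)
    if rec is None:
        return None
    return rec["name"], rec["birthdate"]


print(lookup("V410Z8"))
print(lookup("XXXXXX"))
```

Using .get instead of try/except is a matter of taste here; both avoid a crash on an unknown ID.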
Using lambda in Python
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
Using Lambda and filter
print(list(filter(lambda x:x["id_number"]=="CZ1094",data)))
Output
[{'id_number': 'CZ1094', 'name': 'Paul', 'birthdate': '27/09/1994'}]
You can use list comprehension:
Given
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
to get the list item(s) with id_number equal to "V410Z8" you may use:
result = [x for x in data if x["id_number"]=="V410Z8"]
result will contain:
[{'id_number': 'V410Z8', 'name': 'Vincent', 'birthdate': '15/02/1989'}]
In case the if condition is not satisfied, result will contain an empty list: []
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "14/02/1989"
},
{
"id_number": "CZ1093",
"name": "Paul",
"birthdate": "26/09/1994"
}
]
list(filter(lambda x: x["id_number"] == "CZ1093", data))
Output should be
[{
"id_number": "CZ1093",
"name": "Paul",
"birthdate": "26/09/1994"
}]
If you are only interested in the first match, or in a subset of the results, I'd suggest a generator function: it is lazy, so it stops scanning as soon as you stop asking for results, and it does not build an intermediate list in memory:
def gen_func(data, search_term):
    for i in data:
        if i['id_number'] == search_term:
            yield i
You can then run the following to retrieve results for CZ1094:
foo = gen_func(data, 'CZ1094')
next(foo)
{'id_number': 'CZ1094', 'name': 'Paul', 'birthdate': '27/09/1994'}
NB: next() raises StopIteration once the generator is exhausted, so you'll need to handle that.
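One way to avoid catching StopIteration is to give next() a default value, which it returns when the generator has nothing left. A sketch reusing gen_func from above; first_match is a hypothetical helper name.

```python
def gen_func(data, search_term):
    """Lazily yield every record whose id_number equals search_term."""
    for i in data:
        if i['id_number'] == search_term:
            yield i


def first_match(data, search_term):
    """Return the first matching record, or None if there is no match."""
    # The second argument to next() is returned instead of raising
    # StopIteration when the generator is exhausted.
    return next(gen_func(data, search_term), None)


data = [
    {"id_number": "SA4784", "name": "Mark", "birthdate": None},
    {"id_number": "V410Z8", "name": "Vincent", "birthdate": "15/02/1989"},
    {"id_number": "CZ1094", "name": "Paul", "birthdate": "27/09/1994"},
]

print(first_match(data, 'CZ1094'))
print(first_match(data, 'missing'))
```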
