How to group JSON by a nested key using Python?

Let's say we have a JSON object in Python:
myJson = [
{
"id": "123",
"name": "alex",
"meta": {
"city": "boston"
}
},
{
"id": "234",
"name": "mike",
"meta": {
"city": "seattle"
}
},
{
"id": "345",
"name": "jess",
"meta": {
"city": "boston"
}
}
]
What is the most efficient way to group this data by city, so that we end up with JSON like this:
myNewJson = [
{
"city": "boston",
"people": [ ... ... ]
},
{
"city": "seattle",
"people": [ ... ]
}
]
... where each person's record is included under the "people" key.
Thanks!

Try:
myJson = [
    {"id": "123", "name": "alex", "meta": {"city": "boston"}},
    {"id": "234", "name": "mike", "meta": {"city": "seattle"}},
    {"id": "345", "name": "jess", "meta": {"city": "boston"}},
]
out = {}
for d in myJson:
    out.setdefault(d["meta"]["city"], []).append(d["name"])
out = [{"city": k, "people": v} for k, v in out.items()]
print(out)
Prints (reformatted for readability):
[
{"city": "boston", "people": ["alex", "jess"]},
{"city": "seattle", "people": ["mike"]},
]
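As an alternative to setdefault (a swapped-in technique, not part of the answer above), itertools.groupby can produce the same shape while keeping each person's full record, which is closer to what the question asks for. This is only a sketch: groupby merges adjacent items with equal keys, so the list must be sorted by city first, and the output is then ordered alphabetically by city rather than by first appearance.

```python
from itertools import groupby

myJson = [
    {"id": "123", "name": "alex", "meta": {"city": "boston"}},
    {"id": "234", "name": "mike", "meta": {"city": "seattle"}},
    {"id": "345", "name": "jess", "meta": {"city": "boston"}},
]


def city_of(person):
    # Key function: read the nested city value.
    return person["meta"]["city"]


# groupby only merges *adjacent* items with equal keys, so sort by city first.
# sorted() is stable, so people keep their original relative order within a city.
myNewJson = [
    {"city": city, "people": list(people)}
    for city, people in groupby(sorted(myJson, key=city_of), key=city_of)
]

for group in myNewJson:
    print(group["city"], [p["name"] for p in group["people"]])
```

Sorting costs O(n log n), so for large inputs the dictionary approach is usually the cheaper choice; groupby is handy when the data is already sorted by the grouping key.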

Seems like a dictionary could work. Use city names as the keys, and a list as the value. Then at the end, go through the dictionary and convert it to a list.
myJson = [
{
"id": "123",
"name": "alex",
"meta": {
"city": "boston"
}
},
{
"id": "234",
"name": "mike",
"meta": {
"city": "seattle"
}
},
{
"id": "345",
"name": "jess",
"meta": {
"city": "boston"
}
}
]
d = dict()  # dictionary of {city: list of people}
for e in myJson:
    city = e['meta']['city']
    if city not in d:
        d[city] = list()
    d[city].append(e['name'])
# convert the dictionary to a list of dicts
result = list()
for key, val in d.items():
    result.append({'city': key, 'people': val})
print(result)
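The same dictionary idea can be written a little more compactly with collections.defaultdict, which creates the empty list on first access and removes the `if city not in d` check. This sketch also stores each whole person record instead of just the name, on the assumption that "the content of the people" means the full objects.

```python
from collections import defaultdict

myJson = [
    {"id": "123", "name": "alex", "meta": {"city": "boston"}},
    {"id": "234", "name": "mike", "meta": {"city": "seattle"}},
    {"id": "345", "name": "jess", "meta": {"city": "boston"}},
]

# A missing city key is created automatically with an empty list.
by_city = defaultdict(list)
for person in myJson:
    by_city[person["meta"]["city"]].append(person)

# Dicts keep insertion order (Python 3.7+), so cities appear in first-seen order.
result = [{"city": city, "people": people} for city, people in by_city.items()]
print([(g["city"], len(g["people"])) for g in result])
```

This is a single O(n) pass; if only names are needed, append `person["name"]` instead of `person`.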

Related

Getting all the keys from a JSON object?

Goal: create a script that takes a nested JSON object as input and outputs a CSV file with every key as a row.
Example:
{
"Document": {
"DocumentType": 945,
"Version": "V007",
"ClientCode": "WI",
"Shipment": [
{
"ShipmentHeader": {
"ShipmentID": 123456789,
"OrderChannel": "Shopify",
"CustomerNumber": 234234,
"VendorID": "2343SDF",
"ShipViaCode": "FEDX2D",
"AsnDate": "2018-01-27",
"AsnTime": "09:30:47-08:00",
"ShipmentDate": "2018-01-23",
"ShipmentTime": "09:30:47-08:00",
"MBOL": 12345678901234568,
"BOL": 12345678901234566,
"ShippingNumber": "1ZTESTTEST",
"LoadID": 321456987,
"ShipmentWeight": 10,
"ShipmentCost": 2.3,
"CartonsTotal": 2,
"CartonPackagingCode": "CTN25",
"OrdersTotal": 2
},
"References": [
{
"Reference": {
"ReferenceQualifier": "TST",
"ReferenceText": "Testing text"
}
}
],
"Addresses": {
"Address": [
{
"AddressLocationQualifier": "ST",
"LocationNumber": 23234234,
"Name": "John Smith",
"Address1": "123 Main St",
"Address2": "Suite 12",
"City": "Hometown",
"State": "WA",
"Zip": 92345,
"Country": "USA"
},
{
"AddressLocationQualifier": "BT",
"LocationNumber": 2342342,
"Name": "Jane Smith",
"Address1": "345 Second Ave",
"Address2": "Building 32",
"City": "Sometown",
"State": "CA",
"Zip": "23665-0987",
"Country": "USA"
}
]
},
"Orders": {
"Order": [
{
"OrderHeader": {
"PurchaseOrderNumber": 23456342,
"RetailerPurchaseOrderNumber": 234234234,
"RetailerOrderNumber": 23423423,
"CustomerOrderNumber": 234234234,
"Department": 3333,
"Division": 23423,
"OrderWeight": 10.23,
"CartonsTotal": 2,
"QTYOrdered": 12,
"QTYShipped": 23
},
"Cartons": {
"Carton": [
{
"SSCC18": 12345678901234567000,
"TrackingNumber": "1ZTESTTESTTEST",
"CartonContentsQty": 10,
"CartonWeight": 10.23,
"LineItems": {
"LineItem": [
{
"LineNumber": 1,
"ItemNumber": 1234567890,
"UPC": 9876543212,
"QTYOrdered": 34,
"QTYShipped": 32,
"QTYUOM": "EA",
"Description": "Shoes",
"Style": "Tall",
"Size": 9.5,
"Color": "Bllack",
"RetailerItemNumber": 2342333,
"OuterPack": 10
},
{
"LineNumber": 2,
"ItemNumber": 987654321,
"UPC": 7654324567,
"QTYOrdered": 12,
"QTYShipped": 23,
"QTYUOM": "EA",
"Description": "Sunglasses",
"Style": "Short",
"Size": 10,
"Color": "White",
"RetailerItemNumber": 565465456,
"OuterPack": 12
}
]
}
}
]
}
}
]
}
}
]
}
}
In the JSON object above, I want all the keys (nested ones included) in a list; duplicates can be removed with a set. If a nested key occurs several times in the actual JSON, it can appear multiple times in the CSV.
I think recursion is a natural fit for this kind of problem when the depth of nesting is unpredictable. Here is an example in Python of how you can use recursion to extract all keys. Cheers.
import json

row = ""

def extract_keys(data):
    global row
    if isinstance(data, dict):
        for key, value in data.items():
            row += key + "\n"
            extract_keys(value)
    elif isinstance(data, list):
        for element in data:
            extract_keys(element)

# MAIN
with open("input.json", "r") as rfile:
    dicts = json.load(rfile)
extract_keys(dicts)
with open("output.csv", "w") as wfile:
    wfile.write(row)
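A variant of the same recursion without the global string, sketched as a generator. It keeps every occurrence in `all_keys`, and uses `dict.fromkeys` to drop duplicates while preserving first-seen order (a plain set would lose the order). The small `doc` below is a stand-in for the full document, and the CSV is written to an in-memory buffer; both are assumptions for illustration.

```python
import csv
import io


def iter_keys(data):
    """Yield every dict key in a nested JSON structure, depth-first."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield key
            yield from iter_keys(value)
    elif isinstance(data, list):
        for element in data:
            yield from iter_keys(element)


# Stand-in for the document above; in practice load it with json.load(rfile).
doc = {"Document": {"Shipment": [{"ShipmentHeader": {"ShipmentID": 1}},
                                 {"ShipmentHeader": {"ShipmentID": 2}}]}}

all_keys = list(iter_keys(doc))               # repeated keys kept
unique_keys = list(dict.fromkeys(all_keys))   # duplicates dropped, order kept

buffer = io.StringIO()  # swap for open("output.csv", "w", newline="") to write a file
writer = csv.writer(buffer)
writer.writerows([key] for key in unique_keys)  # one key per CSV row
print(buffer.getvalue())
```

Using the csv module rather than joining strings means keys containing commas or quotes are escaped correctly.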

Merge JSON objects with the same key-value pairs

I get a JSON result from an API in the following format:
[{
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Name": "Kiran"
}
}, {
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Age": "24"
}
},
{
"Uid": "196f5865-e9fe-4847-86ae-97d0bf57b816",
"Id": "84909ecb-c92e-48a7-bcaa-d478bf3a9220",
"Details": {
"Name": "Shreyas"
}
}
]
Since Uid and Id are the same for multiple entries, can I combine those entries, merging their Details key-value pairs into one object? Something like this:
[{
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Name": "Kiran",
"Age": "24"
}
},
{
"Uid": "196f5865-e9fe-4847-86ae-97d0bf57b816",
"Id": "84909ecb-c92e-48a7-bcaa-d478bf3a9220",
"Details": {
"Name": "Shreyas"
}
}]
Please guide me on the approach to follow. Thanks!
What you need is the dictionary method update(). Here's an example:
A = [{
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Name": "Kiran"
}
}, {
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Age": "24"
}
},
{
"Uid": "196f5865-e9fe-4847-86ae-97d0bf57b816",
"Id": "84909ecb-c92e-48a7-bcaa-d478bf3a9220",
"Details": {
"Name": "Shreyas"
}
}
]
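B = []

def find(uid, id_):
    # Linear search: return the index in B of the entry with this Uid/Id, or -1.
    for i, d in enumerate(B):
        if d['Uid'] == uid and d['Id'] == id_:
            return i
    return -1

for d in A:
    if (i := find(d['Uid'], d['Id'])) < 0:  # the walrus operator needs Python 3.8+
        B.append(d)
    else:
        B[i]['Details'].update(d['Details'])
print(B)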
Prettified output:
[
{
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Name": "Kiran",
"Age": "24"
}
},
{
"Uid": "196f5865-e9fe-4847-86ae-97d0bf57b816",
"Id": "84909ecb-c92e-48a7-bcaa-d478bf3a9220",
"Details": {
"Name": "Shreyas"
}
}
]
Note:
This can be very slow if your API response contains a large number of dictionaries: find() scans B once for every element of A, so the loop is O(n²). For large inputs, a lookup keyed by (Uid, Id) avoids the repeated scan.
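A minimal sketch of one faster variant: keep the update() approach, but replace the linear find() with a dictionary that maps each (Uid, Id) pair to its position in the output list, so each lookup is O(1) on average. The names `merged` and `index` are illustrative; this sketch also copies each Details dict so that update() does not modify the original API data.

```python
A = [
    {"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58", "Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
     "Details": {"Name": "Kiran"}},
    {"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58", "Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
     "Details": {"Age": "24"}},
    {"Uid": "196f5865-e9fe-4847-86ae-97d0bf57b816", "Id": "84909ecb-c92e-48a7-bcaa-d478bf3a9220",
     "Details": {"Name": "Shreyas"}},
]

merged = []  # output list, in first-seen order
index = {}   # (Uid, Id) -> position in merged; replaces the linear find()

for d in A:
    key = (d["Uid"], d["Id"])
    if key not in index:
        index[key] = len(merged)
        # Shallow-copy the entry and its Details so update() leaves A untouched.
        merged.append({**d, "Details": dict(d["Details"])})
    else:
        merged[index[key]]["Details"].update(d["Details"])

print(merged)
```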
Iterate over the list and merge each item into an accumulator dict keyed by (Uid, Id):
from typing import Dict, List
l = [{
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Name": "Kiran"
}
}, {
"Uid": "40cc6103-1cf0-4735-b882-d14d32018e58",
"Id": "9e1a0057-4570-4a6e-8ff5-88b2facbaf4e",
"Details": {
"Age": "24"
}
},
{
"Uid": "196f5865-e9fe-4847-86ae-97d0bf57b816",
"Id": "84909ecb-c92e-48a7-bcaa-d478bf3a9220",
"Details": {
"Name": "Shreyas"
}
}
]
def mergeItem(it: Dict, acc: Dict) -> Dict:
    uid = it["Uid"]
    id_ = it["Id"]  # trailing underscore avoids shadowing the built-in id()
    if (uid, id_) in acc:
        acc[(uid, id_)] = {"Uid": uid, "Id": id_, "Details": {**acc[(uid, id_)]["Details"], **it["Details"]}}
    else:
        acc[(uid, id_)] = {"Uid": uid, "Id": id_, "Details": it["Details"]}
    return acc

def mergeList(a: List) -> Dict:
    acc = {}
    for v in a:
        acc = mergeItem(v, acc)
    return acc

print(list(mergeList(l).values()))
# [
#     {
#         'Uid': '40cc6103-1cf0-4735-b882-d14d32018e58',
#         'Id': '9e1a0057-4570-4a6e-8ff5-88b2facbaf4e',
#         'Details': {'Name': 'Kiran', 'Age': '24'}
#     },
#     {
#         'Uid': '196f5865-e9fe-4847-86ae-97d0bf57b816',
#         'Id': '84909ecb-c92e-48a7-bcaa-d478bf3a9220',
#         'Details': {'Name': 'Shreyas'}
#     }
# ]

How to compare multiple key values in the same JSON file using Python

I have a sample JSON file:
{
"client_info": [
{
"Id": "00201",
"Information": {
"Name": "John",
"Age": 12
},
"Address": [
{
"country": "USA",
"location": [
{
"ad1": "NY"
},
{
"ad1": "FL"
}
]
}
]
},
{
"Id": "00202",
"Information": {
"Name": "John",
"Age": 13
},
"Address": [
{
"country": "CA",
"location": [
{
"ad1": "NY"
},
{
"ad1": "FL"
}
]
}
]
},
{
"Id": "00203",
"Information": {
"Name": "John",
"Age": 13
},
"Address": [
{
"country": "CA",
"location": [
{
"ad1": "NY"
}
]
}
]
}
]
}
Here I need to compare Information.Name and the location ad1 values together for each entry. For example, ID 00201 (John, NY, FL) is equal to ID 00202, but ID 00203 is different because it has only "ad1": "NY". Basically, the ad1 values need to be compared as a set.
I can create the CSV file, but my problem is building the matched result set. I tried the code below, but was not able to populate the set correctly:
uniqueNameSet = set()
uniquelocationSet = set()
duplicate = 0
for i, client in enumerate(json_data["client_info"]):
    if client["Information"]['Name'] not in uniqueNameSet:
        uniqueNameSet.add(client["Information"]['Name'])
    else:
        for j in range(len(client["Address"][0]['location'])):
            if client["Address"][0]['location'][j]['ad1'] not in uniquelocationSet:
                uniquelocationSet.add(client["Address"][0]['location'][j]['ad1'])
            else:
                duplicate += 1
I want to generate a CSV for the matched data and remove those entries from the JSON file.
matched.csv:
id,Name,ad1
00201,John,"NY,FL"
00202,John,"NY,FL"
Updated JSON file:
{
"client_info": [
{
"Id": "00203",
"Information": {
"Name": "John",
"Age": 13
},
"Address": [
{
"country": "CA",
"location": [
{
"ad1": "NY"
}
]
}
]
}
]
}
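One possible approach, sketched under two assumptions: every Id is unique, and ad1 values should be collected from all Address entries (the attempt above only reads Address[0]). Each client is keyed by (Name, frozenset of its ad1 values); frozenset is hashable and order-insensitive, which matches "compare as a set". Groups with more than one member are the matches; the CSV goes to an in-memory buffer here for illustration.

```python
import csv
import io
from collections import defaultdict

json_data = {"client_info": [
    {"Id": "00201", "Information": {"Name": "John", "Age": 12},
     "Address": [{"country": "USA", "location": [{"ad1": "NY"}, {"ad1": "FL"}]}]},
    {"Id": "00202", "Information": {"Name": "John", "Age": 13},
     "Address": [{"country": "CA", "location": [{"ad1": "NY"}, {"ad1": "FL"}]}]},
    {"Id": "00203", "Information": {"Name": "John", "Age": 13},
     "Address": [{"country": "CA", "location": [{"ad1": "NY"}]}]},
]}


def ad1_values(client):
    # Collect ad1 from every location of every address, in document order.
    return [loc["ad1"] for addr in client["Address"] for loc in addr["location"]]


# Group clients by (Name, set of ad1 values).
groups = defaultdict(list)
for client in json_data["client_info"]:
    key = (client["Information"]["Name"], frozenset(ad1_values(client)))
    groups[key].append(client)

# A group with two or more members is a match.
matched = [c for members in groups.values() if len(members) > 1 for c in members]
matched_ids = {c["Id"] for c in matched}

buffer = io.StringIO()  # use open("matched.csv", "w", newline="") for a real file
writer = csv.writer(buffer)
writer.writerow(["id", "Name", "ad1"])
for c in matched:
    writer.writerow([c["Id"], c["Information"]["Name"], ",".join(ad1_values(c))])
print(buffer.getvalue())

# Keep only unmatched clients; this is what would be written back with json.dump.
json_data["client_info"] = [c for c in json_data["client_info"] if c["Id"] not in matched_ids]
```

The csv module quotes the "NY,FL" field automatically because it contains the delimiter.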

How to convert a multi-valued CSV to JSON

I have a CSV file with four columns of data, as below.
type,MetalType,Date,Acknowledge
Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
(I included a header in the example above, but my actual file has no header.)
How can I convert the CSV file above to JSON in the format below?
1st column: maps to "type", e.g. "type": "Metal"
2nd column: MetalType -> "values" -> "value": "abc123451"
3rd column: Date -> "values" -> "value": "2018-05-26"
4th column: Acknowledge -> "values" -> "value": "Success"
All remaining fields take default values.
Target format:
{
"entities": [
{
"id": "XXXXXXX",
"type": "Metal",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "abc123451"
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "2018-05-26"
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "Success"
}
]
}
}
}
}
]
}
Even though jww is right, I built something for you:
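I import the CSV using pandas:
import pandas as pd

# The file has no header row, so supply the column names explicitly.
df = pd.read_csv('data.csv', header=None, names=['type', 'MetalType', 'Date', 'Acknowledge'])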
Then I create a template for the dictionaries you want to add:
d_json = {"entities": []}
template = {
"id": "XXXXXXX",
"type": "",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
}
}
}
}
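Now you just need to fill in a fresh copy of the template for each row:
import copy

for i in range(len(df)):
    # deepcopy gives each row its own nested dicts; a plain `d = template` would
    # make every entry in d_json['entities'] the same object (holding the last row).
    d = copy.deepcopy(template)
    d['type'] = df['type'][i]
    d['data']['attributes']['MetalType']['values'][0]['value'] = df['MetalType'][i]
    d['data']['attributes']['Date']['values'][0]['value'] = df['Date'][i]
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = df['Acknowledge'][i]
    d_json['entities'].append(d)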
I know iterating over the DataFrame by index like this is a bit ugly; maybe someone knows a cleaner way.
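One possibly cleaner way, sketched with the standard-library csv module instead of pandas (a swapped-in technique, not the answer's method). Because the file has no header, the column names are passed via `fieldnames`, so the first line is treated as data. Each entity is built by a small helper, so no shared template needs copying. The inline `raw` string stands in for data.csv.

```python
import csv
import io
import json

# Sample rows without a header, as in the question; in practice use open("data.csv", newline="").
raw = """Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
"""

FIELDS = ["type", "MetalType", "Date", "Acknowledge"]


def attribute(value):
    # Wrap one value with the question's default source/locale.
    return {"values": [{"source": "XYZ", "locale": "Australia", "value": value}]}


def to_entity(row):
    # Build a brand-new dict per row, so entities never share nested objects.
    return {
        "id": "XXXXXXX",  # placeholder id, as in the question
        "type": row["type"],
        "data": {"attributes": {name: attribute(row[name]) for name in FIELDS[1:]}},
    }


reader = csv.DictReader(io.StringIO(raw), fieldnames=FIELDS)
d_json = {"entities": [to_entity(row) for row in reader]}
print(json.dumps(d_json["entities"][0], indent=2))
```

If you prefer to stay with pandas, `df.to_dict(orient="records")` yields one dict per row and can be fed to the same `to_entity` helper.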
Cheers!

Add an item to dictionaries that share the same value in a list of dictionaries

I am trying to add an id key to each inner dictionary, reusing the same uuid.uuid4() value when the 'node' values are equal and generating a new uuid.uuid4() for each distinct node.
Say two entries have the same 'node' value, e.g. node: 'Bangalore'; I want them to get the same ID, and every other distinct node to get a fresh ID.
This is the code I'm working on now:
import uuid
import json
node_list = [
{
"nodes": [
{
"node": "Kunal",
"label": "PERSON"
},
{
"node": "Bangalore",
"label": "LOC"
}
]
},
{
"nodes": [
{
"node": "John",
"label": "PERSON"
},
{
"node": "Bangalore",
"label": "LOC"
}
]
}
]
for outer_node_dict in node_list:
    for inner_dict in outer_node_dict["nodes"]:
        inner_dict['id'] = str(uuid.uuid4())  # Remember the key's value here and apply this statement somehow?
print(json.dumps(node_list, indent = True))
This is the response I want:
"[
{
"nodes": [
{
"node": "Kunal",
"label": "PERSON",
"id": "fbf094eb-8670-4c31-a641-4cf16c3596d1"
},
{
"node": "Bangalore",
"label": "LOC",
"id": "24867c2a-f66a-4370-8c5d-8af5b9a25675"
}
]
},
{
"nodes": [
{
"node": "John",
"label": "PERSON",
"id": "5eddc375-ed3e-4f6a-81dc-3966590e8f35"
},
{
"node": "Bangalore",
"label": "LOC",
"id": "24867c2a-f66a-4370-8c5d-8af5b9a25675"
}
]
}
]"
But currently it generates this:
"[
{
"nodes": [
{
"node": "Kunal",
"label": "PERSON",
"id": "3cce6e36-9d1c-4058-a11b-2bcd0da96c83"
},
{
"node": "Bangalore",
"label": "LOC",
"id": "4d860d3b-1835-4816-a372-050c1cc88fbb"
}
]
},
{
"nodes": [
{
"node": "John",
"label": "PERSON",
"id": "67fc9ba9-b591-44d4-a0ae-70503cda9dfe"
},
{
"node": "Bangalore",
"label": "LOC",
"id": "f83025a0-7d8e-4ec8-b4a0-0bced982825f"
}
]
}
]"
How can I remember each node's value and apply the same ID to it in the dictionary?
Looks like you want the UUID to be the same for the same "node" value. So, instead of generating a new one every time, store it in a dict:
node_uuids = defaultdict(lambda: uuid.uuid4())
and then, in your inner loop, instead of
inner_dict['id'] = str(uuid.uuid4())
you write
inner_dict['id'] = str(node_uuids[inner_dict['node']])  # str() because json.dumps cannot serialize UUID objects
A complete working example is as follows:
from collections import defaultdict
import uuid
import json
node_list = [
{
"nodes": [
{
"node": "Kunal",
"label": "PERSON"
},
{
"node": "Bangalore",
"label": "LOC"
}
]
},
{
"nodes": [
{
"node": "John",
"label": "PERSON"
},
{
"node": "Bangalore",
"label": "LOC"
}
]
}
]
node_uuids = defaultdict(lambda: uuid.uuid4())
for outer_node_dict in node_list:
    for inner_dict in outer_node_dict["nodes"]:
        inner_dict['id'] = str(node_uuids[inner_dict['node']])
print(json.dumps(node_list, indent = True))
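If the IDs should also stay the same across separate runs (the defaultdict version generates new random UUIDs each time the script runs), one alternative is name-based uuid.uuid5, which derives the UUID deterministically from a namespace UUID and the node name, so no lookup dict is needed. This is a swapped-in technique, not the answer's method; the namespace below, derived from the placeholder name "example.com", is a hypothetical choice you would replace with your own.

```python
import json
import uuid

# Hypothetical namespace for node IDs; any fixed UUID works, as long as it never changes.
NODE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

node_list = [
    {"nodes": [{"node": "Kunal", "label": "PERSON"}, {"node": "Bangalore", "label": "LOC"}]},
    {"nodes": [{"node": "John", "label": "PERSON"}, {"node": "Bangalore", "label": "LOC"}]},
]

for outer_node_dict in node_list:
    for inner_dict in outer_node_dict["nodes"]:
        # Same namespace + same name -> same UUID, in this run and every later run.
        inner_dict["id"] = str(uuid.uuid5(NODE_NAMESPACE, inner_dict["node"]))

print(json.dumps(node_list, indent=2))
```

Keep in mind that these IDs are predictable from the name, and that two nodes with the same name but different labels would share an ID unless the label is folded into the name (for example `f"{label}:{node}"`).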
