Join two JSONs based on ID element using Python

I have two JSONs. One has entries like this:
one.json
"data": [
{
"results": {
"counties": {
"32": 0,
"96": 0,
"12": 0
},
"cities": {
"total": 32,
"reporting": 0
}
},
"element_id": 999
},
The other has entries like this:
two.json
"data": [
{
"year": 2020,
"state": "Virginia",
"entries": [
{
"first_name": "Robert",
"last_name": "Smith",
"entry_id": 15723,
"pivot": {
"county_id": 32,
"element_id": 999
}
},
{
"first_name": "Benjamin",
"last_name": "Carter",
"entry_id": 15724,
"pivot": {
"county_id": 34,
"element_id": 999
}
}
],
"element_id": 999,
},
I want to join one.json to two.json based on element_id. Both files contain many element_ids, so part of the task is finding the right record to append to. Is there a way to use append to do this based on element_id without having to use a for loop? A joined version of the second JSON above would look like this:
joined.json
"data": [
{
"year": 2020,
"state": "Virginia",
"entries": [
{
"first_name": "Robert",
"last_name": "Smith",
"entry_id": 15723,
"pivot": {
"county_id": 32,
"element_id": 999
}
},
{
"first_name": "Benjamin",
"last_name": "Carter",
"entry_id": 15724,
"pivot": {
"county_id": 34,
"element_id": 999
}
}
],
"element_id": 999,
{
"results": {
"counties": {
"32": 0,
"96": 0,
"12": 0
},
"cities": {
"total": 32,
"reporting": 0
}
},
},
What I have so far:
for item in one:
    # this goes through one and saves each unique item in a couple variables
    temp_id = item["element_id"]
    temp_datachunk = item
    # then I try to find those variables in two and if I find them, I append
    for data in two:
        if data["element_id"] == temp_id:
            full_data = data.append(item)
            print(full_data)
Right now, my attempt dies at the append. I get AttributeError: 'dict' object has no attribute 'append'.

Something like this should work:
source = '''
[{
"results": {
"counties": {
"32": 0,
"96": 0,
"12": 0
},
"cities": {
"total": 32,
"reporting": 0
}
},
"element_id": 999
}]
'''
target = """
[{
"year": 2020,
"state": "Virginia",
"entries": [{
"first_name": "Robert",
"last_name": "Smith",
"entry_id": 15723,
"pivot": {
"county_id": 32,
"element_id": 999
}
},
{
"first_name": "Benjamin",
"last_name": "Carter",
"entry_id": 15724,
"pivot": {
"county_id": 34,
"element_id": 999
}
}
],
"element_id": 999
}]
"""
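import json
# Assumption: parse comes from the jsonpath-ng package (pip install jsonpath-ng)
from jsonpath_ng import parse
source_j = json.loads(source)
target_j = json.loads(target)
# '$..element_id' matches every element_id at any depth
jsonpath_expr = parse('$..element_id')
source_match = jsonpath_expr.find(source_j)
target_match = jsonpath_expr.find(target_j)
if source_match[0].value == target_match[0].value:
    # both are lists, so + concatenates them
    final = target_j + source_j
    print(final)
The output is a single list containing the target object followed by the source object.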
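The AttributeError in the question arises because each record is a dict, and dicts have no append method (only lists do). A minimal sketch of an alternative, assuming each file holds a list of records and each element_id appears at most once in one.json: index the first list by element_id, then attach the results to the matching record with plain key assignment. The function name join_on_element_id and the second sample record are hypothetical. One loop is still needed, but the dictionary lookup avoids the nested loop.

```python
import json

def join_on_element_id(one_records, two_records):
    """Return copies of two_records with the matching "results" attached."""
    # Index one.json's records by element_id for constant-time lookup.
    results_by_id = {rec["element_id"]: rec["results"] for rec in one_records}
    joined = []
    for rec in two_records:
        merged = dict(rec)  # shallow copy so the input record is not mutated
        match = results_by_id.get(rec["element_id"])
        if match is not None:
            merged["results"] = match
        joined.append(merged)
    return joined

# Trimmed-down sample data in the shape shown in the question.
one = [{"results": {"counties": {"32": 0}, "cities": {"total": 32, "reporting": 0}},
        "element_id": 999}]
two = [{"year": 2020, "state": "Virginia", "entries": [], "element_id": 999},
       {"year": 2020, "state": "Ohio", "entries": [], "element_id": 1000}]

joined = join_on_element_id(one, two)
print(json.dumps(joined, indent=4))
```

Records in two.json without a counterpart in one.json are kept unchanged, which behaves like a left join.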

Related

Python Dict append value if key value pair are same

I am new to Python dicts and have a question about appending values to a key. A sample dictionary is below. How can I combine the name values of entries whose ID and time are the same? Please see the expected result below. I tried append(), pop(), and update(), and couldn't get the expected result. I'd appreciate any help.
{
"Total": [
{
"ID": "ID_1000",
"time": 1000,
"name": {
"first_name": "John",
"last_name": "Brown"
}
},
{
"ID": "ID_5000",
"time": 5000,
"name": {
"first_name": "Jason",
"last_name": "Willams"
}
},
{
"ID": "ID_5000",
"time": 5000,
"name": {
"first_name": "Mary",
"last_name": "Jones"
}
},
{
"ID": "ID_1000",
"time": 1000,
"name": {
"first_name": "Michael",
"last_name": "Kol"
}
}
]
}
Below is the expected result.
{
"Total": [
{
"ID": "ID_1000",
"time": 1000,
"name": [
{
"first_name": "John",
"last_name": "Brown"
},
{
"first_name": "Michael",
"last_name": "Kol"
}
]
},
{
"ID": "ID_5000",
"time": 5000,
"name": [
{
"first_name": "Jason",
"last_name": "Willams"
},
{
"first_name": "Mary",
"last_name": "Jones"
}
]
}
]
}

One option is to use an intermediate dictionary with your IDs as keys.
Note that this code does not handle the time data, because it is not clear whether you want to add the values together or what else is expected.
import json
values = {
"Total": [
{
"ID": "ID_1000",
"time": 1000,
"name": {
"first_name": "John",
"last_name": "Brown"
}
},
{
"ID": "ID_5000",
"time": 5000,
"name": {
"first_name": "Jason",
"last_name": "Willams"
}
},
{
"ID": "ID_5000",
"time": 5000,
"name": {
"first_name": "Mary",
"last_name": "Jones"
}
},
{
"ID": "ID_1000",
"time": 1000,
"name": {
"first_name": "Michael",
"last_name": "Kol"
}
}
]
}
# first create a dictionary with the ids as keys
consolidated_names = {}
for total_value in values["Total"]:
    id = total_value["ID"]
    if id not in consolidated_names:
        consolidated_names[id] = [total_value["name"]]
    else:
        consolidated_names[id].append(total_value["name"])
# then create the structure that you want
processed_values = []
for id in consolidated_names:
    processed_values.append({"ID": id, "name": consolidated_names[id]})
print(json.dumps({"Total": processed_values}, indent=4))
Result:
{
"Total": [
{
"ID": "ID_1000",
"name": [
{
"first_name": "John",
"last_name": "Brown"
},
{
"first_name": "Michael",
"last_name": "Kol"
}
]
},
{
"ID": "ID_5000",
"name": [
{
"first_name": "Jason",
"last_name": "Willams"
},
{
"first_name": "Mary",
"last_name": "Jones"
}
]
}
]
}
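The expected result in the question keeps the time field, which the answer above drops. A possible variation, assuming entries should be combined only when both ID and time are equal: use the (ID, time) pair as the key of the intermediate dictionary. The function name group_names is hypothetical.

```python
import json

def group_names(records):
    """Group the "name" values under each distinct (ID, time) pair."""
    grouped = {}  # dicts preserve insertion order, so first-seen order is kept
    for rec in records:
        key = (rec["ID"], rec["time"])
        grouped.setdefault(key, []).append(rec["name"])
    return [{"ID": id_, "time": time, "name": names}
            for (id_, time), names in grouped.items()]

sample = {"Total": [
    {"ID": "ID_1000", "time": 1000,
     "name": {"first_name": "John", "last_name": "Brown"}},
    {"ID": "ID_5000", "time": 5000,
     "name": {"first_name": "Jason", "last_name": "Willams"}},
    {"ID": "ID_5000", "time": 5000,
     "name": {"first_name": "Mary", "last_name": "Jones"}},
    {"ID": "ID_1000", "time": 1000,
     "name": {"first_name": "Michael", "last_name": "Kol"}},
]}

result = {"Total": group_names(sample["Total"])}
print(json.dumps(result, indent=4))
```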

Getting all the Keys from JSON Object?

Goal: To create a script that takes a nested JSON object as input and outputs a CSV file with every key as a row.
Example:
{
"Document": {
"DocumentType": 945,
"Version": "V007",
"ClientCode": "WI",
"Shipment": [
{
"ShipmentHeader": {
"ShipmentID": 123456789,
"OrderChannel": "Shopify",
"CustomerNumber": 234234,
"VendorID": "2343SDF",
"ShipViaCode": "FEDX2D",
"AsnDate": "2018-01-27",
"AsnTime": "09:30:47-08:00",
"ShipmentDate": "2018-01-23",
"ShipmentTime": "09:30:47-08:00",
"MBOL": 12345678901234568,
"BOL": 12345678901234566,
"ShippingNumber": "1ZTESTTEST",
"LoadID": 321456987,
"ShipmentWeight": 10,
"ShipmentCost": 2.3,
"CartonsTotal": 2,
"CartonPackagingCode": "CTN25",
"OrdersTotal": 2
},
"References": [
{
"Reference": {
"ReferenceQualifier": "TST",
"ReferenceText": "Testing text"
}
}
],
"Addresses": {
"Address": [
{
"AddressLocationQualifier": "ST",
"LocationNumber": 23234234,
"Name": "John Smith",
"Address1": "123 Main St",
"Address2": "Suite 12",
"City": "Hometown",
"State": "WA",
"Zip": 92345,
"Country": "USA"
},
{
"AddressLocationQualifier": "BT",
"LocationNumber": 2342342,
"Name": "Jane Smith",
"Address1": "345 Second Ave",
"Address2": "Building 32",
"City": "Sometown",
"State": "CA",
"Zip": "23665-0987",
"Country": "USA"
}
]
},
"Orders": {
"Order": [
{
"OrderHeader": {
"PurchaseOrderNumber": 23456342,
"RetailerPurchaseOrderNumber": 234234234,
"RetailerOrderNumber": 23423423,
"CustomerOrderNumber": 234234234,
"Department": 3333,
"Division": 23423,
"OrderWeight": 10.23,
"CartonsTotal": 2,
"QTYOrdered": 12,
"QTYShipped": 23
},
"Cartons": {
"Carton": [
{
"SSCC18": 12345678901234567000,
"TrackingNumber": "1ZTESTTESTTEST",
"CartonContentsQty": 10,
"CartonWeight": 10.23,
"LineItems": {
"LineItem": [
{
"LineNumber": 1,
"ItemNumber": 1234567890,
"UPC": 9876543212,
"QTYOrdered": 34,
"QTYShipped": 32,
"QTYUOM": "EA",
"Description": "Shoes",
"Style": "Tall",
"Size": 9.5,
"Color": "Bllack",
"RetailerItemNumber": 2342333,
"OuterPack": 10
},
{
"LineNumber": 2,
"ItemNumber": 987654321,
"UPC": 7654324567,
"QTYOrdered": 12,
"QTYShipped": 23,
"QTYUOM": "EA",
"Description": "Sunglasses",
"Style": "Short",
"Size": 10,
"Color": "White",
"RetailerItemNumber": 565465456,
"OuterPack": 12
}
]
}
}
]
}
}
]
}
}
]
}
}
In the above JSON object, I want all the keys (nested ones included) in a list (duplicates can be removed by using a set). If a nested key occurs multiple times in the actual JSON, it may appear multiple times in the CSV.

I personally feel that recursion is a perfect fit for this type of problem when the depth of nesting you will encounter is unpredictable. Here I have written an example in Python of how you can use recursion to extract all keys. Cheers.
import json
row = ""
def extract_keys(data):
    global row
    if isinstance(data, dict):
        for key, value in data.items():
            row += key + "\n"
            extract_keys(value)
    elif isinstance(data, list):
        for element in data:
            extract_keys(element)
# MAIN
with open("input.json", "r") as rfile:
    dicts = json.load(rfile)
extract_keys(dicts)
with open("output.csv", "w") as wfile:
    wfile.write(row)
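The same recursion can be written without a global variable. The following is a sketch, not the answerer's code: a generator yields each key, and an optional flag drops duplicates while keeping first-seen order. The names iter_keys and write_keys and the small sample are hypothetical; an in-memory buffer stands in for output.csv.

```python
import csv
import io

def iter_keys(data):
    """Yield every key in a nested structure of dicts and lists, in document order."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield key
            yield from iter_keys(value)
    elif isinstance(data, list):
        for element in data:
            yield from iter_keys(element)

def write_keys(data, out, unique=False):
    """Write one key per CSV row; with unique=True, keep only first occurrences."""
    keys = iter_keys(data)
    if unique:
        keys = dict.fromkeys(keys)  # removes duplicates, preserves order
    writer = csv.writer(out)
    for key in keys:
        writer.writerow([key])

sample = {"Document": {"Version": "V007",
                       "Shipment": [{"ShipmentHeader": {"ShipmentID": 1}},
                                    {"ShipmentHeader": {"ShipmentID": 2}}]}}

all_keys = list(iter_keys(sample))
buffer = io.StringIO()
write_keys(sample, buffer, unique=True)
print(buffer.getvalue())
```

When writing to a real file, the csv module documentation recommends opening it with newline="".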

Access array in json file in python

Here is a question from an absolute beginner Python developer.
Here is the challenge I have :)
I am not able to access the "status" in this JSON file:
[{
"id": 0,
"sellerId": "HHH",
"lat": 90.293846,
"lon": 15.837098,
"evses": [{
"id": 0,
"status": 1,
"connectors": [{
"type": "Hyyyyp",
"maxKw": 22
}
]
}, {
"id": 2001,
"status": 2,
"connectors": [{
"type": "Hyyyyp",
"maxKw": 22
}
]
}, {
"id": 2002,
"status": 1,
"connectors": [{
"type": "Hyyyyp",
"maxKw": 22
}
]
}, {
"id": 2003,
"status": 1,
"connectors": [{
"type": "Hyyyp",
"maxKw": 22
}
]
}
]
}, {
"id": 10001,
"sellerId": 7705,
"lat": 12.59962,
"lon": 40.8767,
"evses": [{
"id": 10001,
"status": 1,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}, {
"id": 10002,
"status": 2,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}, {
"id": 10003,
"status": 2,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}, {
"id": 10004,
"status": 2,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}
]
}, {
For "id": 10001 there are 3 cases with "status": 2.
So, how do I print 3 for id 10001?
I guess I need one array for storing the ids themselves and another array for storing the number of "status": 2 entries for each id.
Here is my code:
First I print the id:
with open('sample.json') as f:
    data = json.load(f)
print(id['id'])
Then I think I need to access the evses array, so here is what I do:
print(data['evses'][0]['id']['status'])
But I get an error on this line.

Following clarification from OP, this would be my proposed solution:
import json
from collections import Counter
def get_status_2_for_id(filename):
    count = Counter()
    with open(filename) as jdata:
        for e in json.load(jdata):
            if (id_ := e.get('id')) is not None:
                for f in e.get('evses', []):
                    if f.get('status') == 2:
                        count[id_] += 1
    return count.items()
for id_, count in get_status_2_for_id('sample.json'):
    print(f'id={id_} count={count}')
Output:
id=0 count=1
id=10001 count=3

Let's say you take a single JSON record of your data which is below
record = {
"id": 10001,
"sellerId": 7705,
"lat": 12.59962,
"lon": 40.8767,
"evses": [{
"id": 10001,
"status": 1,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}, {
"id": 10002,
"status": 2,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}, {
"id": 10003,
"status": 2,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}, {
"id": 10004,
"status": 2,
"connectors": [{
"type": "Tyyyyp",
"maxKw": 22
}
]
}
]
}
From the data record above, if you want to count the number of occurrences of a particular status, you can do something like this:
status_2_count = [stp["status"] for stp in record["evses"]].count(2)
We just generate a list of all statuses in record["evses"] and count the occurrences of a particular status.
You can make this a function, and repeat it for the other records in the file.
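Wrapped in a function and applied to every record, that idea might look like the sketch below. The function name count_status is hypothetical, and the sample records are trimmed to the fields the count needs.

```python
def count_status(records, status):
    """Map each record's id to how many of its evses have the given status."""
    return {
        record["id"]: [evse["status"] for evse in record["evses"]].count(status)
        for record in records
    }

records = [
    {"id": 0, "evses": [{"id": 0, "status": 1}, {"id": 2001, "status": 2},
                        {"id": 2002, "status": 1}, {"id": 2003, "status": 1}]},
    {"id": 10001, "evses": [{"id": 10001, "status": 1}, {"id": 10002, "status": 2},
                            {"id": 10003, "status": 2}, {"id": 10004, "status": 2}]},
]

counts = count_status(records, 2)
print(counts)  # {0: 1, 10001: 3}
```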

You can try this if it's stored as a variable:
for status in json_data["evses"]:
    print('status = ', status['status'])
And this if it's stored in a file:
import json
status_pts = 1
with open('file.json') as json_file:
    data = json.loads(json_file.read())
ls = data[0]['evses']
for s in ls:
    if s['status'] == status_pts:
        print('id:', s['id'], "number of status =", status_pts)
Also, your json data wasn't closed off, the very last line has:
}, {
It needed:
}]

How to create an automatic mapping of possible JSON data options to be collected?

I've never heard of or found an option for what I'm looking for, but maybe someone knows a way:
To collect the data from a JSON response, I need to map it manually like this:
events = response['events']
for event in events:
    tournament_name = event['tournament']['name']
    tournament_slug = event['tournament']['slug']
    tournament_category_name = event['tournament']['category']['name']
    tournament_category_slug = event['tournament']['category']['slug']
    tournament_category_sport_name = event['tournament']['category']['sport']['name']
    tournament_category_sport_slug = event['tournament']['category']['sport']['slug']
    tournament_category_sport_id = event['tournament']['category']['sport']['id']
The complete model is this:
{
"events": [
{
"tournament": {
"name": "Serie A",
"slug": "serie-a",
"category": {
"name": "Italy",
"slug": "italy",
"sport": {
"name": "Football",
"slug": "football",
"id": 1
},
"id": 31,
"flag": "italy",
"alpha2": "IT"
},
"uniqueTournament": {
"name": "Serie A",
"slug": "serie-a",
"category": {
"name": "Italy",
"slug": "italy",
"sport": {
"name": "Football",
"slug": "football",
"id": 1
},
"id": 31,
"flag": "italy",
"alpha2": "IT"
},
"userCount": 586563,
"id": 23,
"hasEventPlayerStatistics": true
},
"priority": 254,
"id": 33
},
"roundInfo": {
"round": 24
},
"customId": "Kdbsfeb",
"status": {
"code": 7,
"description": "2nd half",
"type": "inprogress"
},
"winnerCode": 0,
"homeTeam": {
"name": "Bologna",
"slug": "bologna",
"shortName": "Bologna",
"gender": "M",
"userCount": 39429,
"nameCode": "BOL",
"national": false,
"type": 0,
"id": 2685,
"subTeams": [
],
"teamColors": {
"primary": "#003366",
"secondary": "#cc0000",
"text": "#cc0000"
}
},
"awayTeam": {
"name": "Empoli",
"slug": "empoli",
"shortName": "Empoli",
"gender": "M",
"userCount": 31469,
"nameCode": "EMP",
"national": false,
"type": 0,
"id": 2705,
"subTeams": [
],
"teamColors": {
"primary": "#0d5696",
"secondary": "#ffffff",
"text": "#ffffff"
}
},
"homeScore": {
"current": 0,
"display": 0,
"period1": 0
},
"awayScore": {
"current": 0,
"display": 0,
"period1": 0
},
"coverage": 1,
"time": {
"initial": 2700,
"max": 5400,
"extra": 540,
"currentPeriodStartTimestamp": 1644159735
},
"changes": {
"changes": [
"status.code",
"status.description",
"time.currentPeriodStart"
],
"changeTimestamp": 1644159743
},
"hasGlobalHighlights": false,
"hasEventPlayerStatistics": true,
"hasEventPlayerHeatMap": true,
"id": 9645399,
"statusTime": {
"prefix": "",
"initial": 2700,
"max": 5400,
"timestamp": 1644159735,
"extra": 540
},
"startTimestamp": 1644156000,
"slug": "empoli-bologna",
"lastPeriod": "period2",
"finalResultOnly": false
}
]
}
In my example I am collecting 7 values.
But there are 83 possible values to be collected.
If I want all the values that exist in this JSON, is there any way to generate this mapping automatically and print it, so I can copy it into the code?
Doing it manually takes too long and is very tiring.
The text printed in the terminal would be something like:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
...
...
...
And so on until delivering the 83 object paths with values to collect...
Then I could copy all the prints and paste into my Python file to retrieve the values or any other way to make the work easier.

If all elements in the events array share the same structure, this code works without errors. Note that it descends only into nested dicts, not into lists.
def get_prints(record: dict):
    # yield the list of keys leading to every non-dict value
    for key in record.keys():
        if isinstance(record[key], dict):
            for sub_print in get_prints(record[key]):
                yield [key] + sub_print
        else:
            yield [key]
class Automater:
    def __init__(self, name: str):
        """
        Params:
            name: name of the variable that holds the JSON data
        """
        self.name = name
    def get_print(self, *args):
        """
        Params:
            *args: the keys along one path in the JSON
        """
        return '_'.join(args) + ' = ' + self.name + ''.join([f"['{arg}']" for arg in args])
For example, this code:
dicts = {
    'tournament':{
        'name':"any name",
        'slug':'somthing else',
        'sport':{
            'name':'sport',
            'anotherdict':{
                'yes':True
            }
        }
    }
}
# 'event' is the variable name used in the generated lines
auto = Automater('event')
list_names = get_prints(dicts)
for name in list_names:
    print(auto.get_print(*name))
Gives this output:
tournament_name = event['tournament']['name']
tournament_slug = event['tournament']['slug']
tournament_sport_name = event['tournament']['sport']['name']
tournament_sport_anotherdict_yes = event['tournament']['sport']['anotherdict']['yes']
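The full model in the question also contains lists (for example changes.changes and subTeams), which the generator above treats as plain values. A possible extension, offered as a sketch with hypothetical names (iter_paths, assignment_line) and a trimmed sample event: walk lists as well and put the integer index into the generated path.

```python
def iter_paths(node, path=()):
    """Yield the key/index path to every leaf value in nested dicts and lists."""
    if isinstance(node, dict) and node:
        for key, value in node.items():
            yield from iter_paths(value, path + (key,))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            yield from iter_paths(value, path + (index,))
    else:
        # Scalars, empty dicts and empty lists all count as leaves.
        yield path

def assignment_line(path, root="event"):
    """Format one path as a line of the form: a_b = event['a']['b']"""
    variable = "_".join(str(part) for part in path)
    accessor = "".join(f"[{part!r}]" for part in path)
    return f"{variable} = {root}{accessor}"

event = {"tournament": {"name": "Serie A", "category": {"sport": {"id": 1}}},
         "changes": {"changes": ["status.code", "status.description"]},
         "homeTeam": {"subTeams": []}}

lines = [assignment_line(p) for p in iter_paths(event)]
print("\n".join(lines))
```

Paths through a list carry a fixed index, so the generated lines describe one concrete response; lists of varying length would still need a loop in the final code.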

How to compare multiple keys value in same JSON file using python

I have a sample JSON file:
"client_info": [
{
"Id": "00201",
"Information": {
"Name": "John",
"Age": 12
},
"Address": [
{
"country": "USA",
"location": [
{
"ad1": "NY"
},
{
"ad1": "FL"
}
]
}
]
},
{
"Id": "00202",
"Information": {
"Name": "John",
"Age": 13
},
"Address": [
{
"country": "CA",
"location": [
{
"ad1": "NY"
},
{
"ad1": "FL"
}
]
}
]
},
{
"Id": "00203",
"Information": {
"Name": "John",
"Age": 13
},
"Address": [
{
"country": "CA",
"location": [
{
"ad1": "NY"
}
]
}
]
}
]
Here I need to compare Information.Name and location.ad1 together for each entry. For example, ID 00201 (John; NY, FL) is equal to ID 00202, but ID 00203 is different because it has only "ad1": "NY". Basically, the ad1 values need to be compared as a set.
I can create the CSV file, but my problem is building the matched result set. I tried the code below, but was not able to populate the set correctly:
uniqueNameSet = set()
uniquelocationSet = set()
duplicate = 0
for i, client in enumerate(json_data["client_info"]):
    if client["Information"]['Name'] not in uniqueNameSet:
        uniqueNameSet.add(client["Information"]['Name'])
    else:
        for j in range(len(client["Address"][0]['location'])):
            if client["Address"][0]['location'][j]['ad1'] not in uniquelocationSet:
                uniquelocationSet.add(client["Address"][0]['location'][j]['ad1'])
            else:
                duplicate += 1
I want to generate a CSV for the matched data and remove those entries from the JSON file.
matched.csv
id Name ad1
00201 John NY,FL
00202 John NY,FL
updated Json file:
"client_info": [
{
"Id": "00203",
"Information": {
"Name": "John",
"Age": 13
},
"Address": [
{
"country": "CA",
"location": [
{
"ad1": "NY"
}
]
}
]
}
]
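One way to build the matched set described above, as a sketch rather than a definitive answer: group the clients by the pair (Name, frozenset of ad1 values), treat every group with more than one member as matched, and keep the rest in the JSON. The helper names ad1_values, split_matches and make_client are hypothetical, the sample is trimmed to the fields used, and an in-memory buffer stands in for matched.csv.

```python
import csv
import io
from collections import defaultdict

def ad1_values(client):
    """Collect every ad1 value of a client, in document order."""
    return [loc["ad1"] for addr in client["Address"] for loc in addr["location"]]

def split_matches(clients):
    """Split clients into (matched, unmatched) by Name plus the set of ad1 values."""
    groups = defaultdict(list)
    for client in clients:
        # frozenset makes the comparison order-independent and hashable
        key = (client["Information"]["Name"], frozenset(ad1_values(client)))
        groups[key].append(client)
    matched = [c for group in groups.values() if len(group) > 1 for c in group]
    unmatched = [c for group in groups.values() if len(group) == 1 for c in group]
    return matched, unmatched

def make_client(id_, name, ads):
    """Build a record shaped like the sample (helper for this demo only)."""
    return {"Id": id_, "Information": {"Name": name},
            "Address": [{"country": "USA", "location": [{"ad1": a} for a in ads]}]}

json_data = {"client_info": [
    make_client("00201", "John", ["NY", "FL"]),
    make_client("00202", "John", ["NY", "FL"]),
    make_client("00203", "John", ["NY"]),
]}

matched, unmatched = split_matches(json_data["client_info"])

# Write the matched rows as CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "Name", "ad1"])
for c in matched:
    writer.writerow([c["Id"], c["Information"]["Name"], ",".join(ad1_values(c))])

json_data["client_info"] = unmatched  # the updated JSON keeps only unmatched entries
print(buffer.getvalue())
```

Because the ad1 field itself contains a comma, the csv module quotes it in the output, so the file reads back as three columns.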
