How to merge 2 json files with a common key

How to merge 2 json files with a common key - python

I would like to merge 2 JSON files
tests.json with empty value, some of them - nested
{
"tests": [{
"id": 1,
"value": "",
"values": "info..."
}, {
"id": 41,
"title": "Debug test",
"value": "",
"values": [{
"id": 345,
"value": "",
"values": [ {
"id": 230,
"values": [{
"id": 234,
"value": ""
}, {
"id": 653,
"value": ""
}]
}]
}],
}, {...
values.json with the value
{
"values": [{
"id": 2,
"value": "passed"
}, {
"id": 41,
"value": "passed"
}, {
"id": 345,
"value": "passed"
}, {
"id": 230,
"value": "passed"
},{
"id": 234,
"value": "passed"
},{
"id": 653,
"value": "passed"
},{...
This code works fine, but I need to make it more compatible
import json
with open("tests.json") as fo:
data1 = json.load(fo)
with open("values.json") as fo:
data2 = json.load(fo)
for dest in data1['tests']:
if 'values' in dest:
for dest_1 in dest['values']:
if 'values' in dest_1:
for dest_2 in dest_1['values']:
if 'values' in dest_2:
for dest_3 in dest_2['values']:
for source in data2['values']:
if source['id'] == dest_3['id']:
dest_3['value'] = source['value']
for source in data2['values']:
if source['id'] == dest_2['id']:
dest_2['value'] = source['value']
for source in data2['values']:
if source['id'] == dest_1['id']:
dest_1['value'] = source['value']
for source in data2['values']:
if source['id'] == dest['id']:
dest['value'] = source['value']
with open("report.json", "w") as write_file:
json.dump(data1, write_file, indent=2)
As I understood I need to check recursively whether file1.json has 'values' parameter and empty 'value' parameter inside that block. Moreover I couldn't touch source tests.json but only create another one file to save all changes.

That's because you're updating the root json object and getting a
{
"tests": [...],
"values": [...]
}
While what you want is to update individual "tests" from individual "values" and get just
{
"tests": [...]
}
as a result.
Try looping through every object in both jsons.

Related

Python: Convert json with extra data error into CSV

I have a JSON in below format which I receive from a different team and not allowed to make any changes to it:
{
"content": [
{
"id": "5603bbaae412390b73f0c7f",
"name": "ABC",
"description": "Test",
"rsid": "pwcs",
"type": "project",
"owner": {
"id": 529932
},
"created": "2015-09-24T09:00:26Z"
},
{
"id": "56094673e4b0a7e17e310b83",
"name": "secores",
"description": "Panel",
"rsid": "pwce",
"type": "project",
"owner": {
"id": 520902
},
"created": "2015-09-28T13:53:55Z"
}
],
"totalPages": 9,
"totalElements": 8592,
"number": 0,
"numberOfElements": 1000,
"firstPage": true,
"lastPage": false,
"sort": null,
"size": 1000
}
{
"content": [
{
"id": "5bf2cc64d977553780706050",
"name": "Services Report",
"description": "",
"rsid": "pcie",
"type": "project",
"owner": {
"id": 518013
},
"created": "2018-11-19T14:44:52Z"
},
{
"id": "5bf2d56e40b39312e3e167d0",
"name": "Standard form",
"description": "",
"rsid": "wcu",
"type": "project",
"owner": {
"id": 521114
},
"created": "2018-11-19T15:23:26Z"
}
],
"totalPages": 9,
"totalElements": 8592,
"number": 1,
"numberOfElements": 1000,
"firstPage": false,
"lastPage": false,
"sort": null,
"size": 1000
}
{
"content": [
{
"id": "5d95e7d6187c6d6376fd1bad",
"name": "New Project",
"description": "",
"rsid": "pcinforrod",
"type": "project",
"owner": {
"id": 200904228
},
"created": "2019-10-03T12:21:42Z"
},
{
"id": "5d95fc6e56d2e82519629b96",
"name": "Demo - 10/03",
"description": "",
"rsid": "sitedev",
"type": "project",
"owner": {
"id": 20001494
},
"created": "2019-10-03T13:49:34Z"
}
],
"totalPages": 9,
"totalElements": 8592,
"number": 2,
"numberOfElements": 1000,
"firstPage": false,
"lastPage": false,
"sort": null,
"size": 1000
}
I am trying to convert it into CSV using below code:
import csv
import json
with open("C:\python\SampleJSON.json",'rb') as file:
data = json.load(file)
fname = "workspaceExcelDemo.csv"
with open(fname,"w", encoding="utf-8", newline='') as file:
csv_file = csv.writer(file)
csv_file.writerow(["id","name","rsid"])
for item in data["content"]:
csv_file.writerow([item['id'],item['name'],item['rsid']])
However I am getting below error message while executing the above piece of code:
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 35 column 1 (char 937)
How do I convert the above JSON into CSV without making any changes to the JSON file?

If I understand your question and the comments well you could use the json.dumps method:
import csv
import json
with open("C:\python\SampleJSON.json",'rb') as file:
data = [json.loads(line) for line in file]
"""
The json.dumps method converts a Python object to a JSON formatted string.
The json.loads method parses a JSON string into a native Python object.
Replacing the "=" character with an empty string.
"""
data = json.loads(json.dumps(data).replace("=", ""))
fname = "workspaceExcelDemo.csv"
with open(fname, "w", encoding="utf-8", newline='') as file:
csv_file = csv.writer(file)
csv_file.writerow(["id", "name", "rsid"])
for item in data[0]["content"]:
csv_file.writerow([item['id'], item['name'], item['rsid']])

Getting all the Keys from JSON Object?

Goal: To create a script that will take in nested JSON object as input and output a CSV file with all keys as rows in the CSV?
Example:
{
"Document": {
"DocumentType": 945,
"Version": "V007",
"ClientCode": "WI",
"Shipment": [
{
"ShipmentHeader": {
"ShipmentID": 123456789,
"OrderChannel": "Shopify",
"CustomerNumber": 234234,
"VendorID": "2343SDF",
"ShipViaCode": "FEDX2D",
"AsnDate": "2018-01-27",
"AsnTime": "09:30:47-08:00",
"ShipmentDate": "2018-01-23",
"ShipmentTime": "09:30:47-08:00",
"MBOL": 12345678901234568,
"BOL": 12345678901234566,
"ShippingNumber": "1ZTESTTEST",
"LoadID": 321456987,
"ShipmentWeight": 10,
"ShipmentCost": 2.3,
"CartonsTotal": 2,
"CartonPackagingCode": "CTN25",
"OrdersTotal": 2
},
"References": [
{
"Reference": {
"ReferenceQualifier": "TST",
"ReferenceText": "Testing text"
}
}
],
"Addresses": {
"Address": [
{
"AddressLocationQualifier": "ST",
"LocationNumber": 23234234,
"Name": "John Smith",
"Address1": "123 Main St",
"Address2": "Suite 12",
"City": "Hometown",
"State": "WA",
"Zip": 92345,
"Country": "USA"
},
{
"AddressLocationQualifier": "BT",
"LocationNumber": 2342342,
"Name": "Jane Smith",
"Address1": "345 Second Ave",
"Address2": "Building 32",
"City": "Sometown",
"State": "CA",
"Zip": "23665-0987",
"Country": "USA"
}
]
},
"Orders": {
"Order": [
{
"OrderHeader": {
"PurchaseOrderNumber": 23456342,
"RetailerPurchaseOrderNumber": 234234234,
"RetailerOrderNumber": 23423423,
"CustomerOrderNumber": 234234234,
"Department": 3333,
"Division": 23423,
"OrderWeight": 10.23,
"CartonsTotal": 2,
"QTYOrdered": 12,
"QTYShipped": 23
},
"Cartons": {
"Carton": [
{
"SSCC18": 12345678901234567000,
"TrackingNumber": "1ZTESTTESTTEST",
"CartonContentsQty": 10,
"CartonWeight": 10.23,
"LineItems": {
"LineItem": [
{
"LineNumber": 1,
"ItemNumber": 1234567890,
"UPC": 9876543212,
"QTYOrdered": 34,
"QTYShipped": 32,
"QTYUOM": "EA",
"Description": "Shoes",
"Style": "Tall",
"Size": 9.5,
"Color": "Bllack",
"RetailerItemNumber": 2342333,
"OuterPack": 10
},
{
"LineNumber": 2,
"ItemNumber": 987654321,
"UPC": 7654324567,
"QTYOrdered": 12,
"QTYShipped": 23,
"QTYUOM": "EA",
"Description": "Sunglasses",
"Style": "Short",
"Size": 10,
"Color": "White",
"RetailerItemNumber": 565465456,
"OuterPack": 12
}
]
}
}
]
}
}
]
}
}
]
}
}
In the above JSON Object, I want all the keys (nested included) in a List (Duplicates can be removed by using a set Data Structure). If Nested Key Occurs like in actual JSON they can be keys multiple times in the CSV !

I personally feel that recursion is a perfect application for this type of problem if the amount of nests you will encounter is unpredictable. Here I have written an example in Python of how you can utilise recursion to extract all keys. Cheers.
import json
row = ""
def extract_keys(data):
global row
if isinstance(data, dict):
for key, value in data.items():
row += key + "\n"
extract_keys(value)
elif isinstance(data, list):
for element in data:
extract_keys(element)
# MAIN
with open("input.json", "r") as rfile:
dicts = json.load(rfile)
extract_keys(dicts)
with open("output.csv", "w") as wfile:
wfile.write(row)

Flatten nested json to csv with nested column names

I have rather very weird requirement now. I have below json and somehow I have to convert it into flat csv.
[
{
"authorizationQualifier": "SDA",
"authorizationInformation": " ",
"securityQualifier": "ASD",
"securityInformation": " ",
"senderQualifier": "ASDAD",
"senderId": "FADA ",
"receiverQualifier": "ADSAS",
"receiverId": "ADAD ",
"date": "140101",
"time": "0730",
"standardsId": null,
"version": "00501",
"interchangeControlNumber": "123456789",
"acknowledgmentRequested": "0",
"testIndicator": "T",
"functionalGroups": [
{
"functionalIdentifierCode": "ADSAD",
"applicationSenderCode": "ASDAD",
"applicationReceiverCode": "ADSADS",
"date": "20140101",
"time": "07294900",
"groupControlNumber": "123456789",
"responsibleAgencyCode": "X",
"version": "005010X221A1",
"transactions": [
{
"name": "ASDADAD",
"transactionSetIdentifierCode": "adADS",
"transactionSetControlNumber": "123456789",
"implementationConventionReference": null,
"segments": [
{
"BPR03": "ad",
"BPR14": "QWQWDQ",
"BPR02": "1.57",
"BPR13": "23223",
"BPR01": "sad",
"BPR12": "56",
"BPR10": "32424",
"BPR09": "12313",
"BPR08": "DA",
"BPR07": "123456789",
"BPR06": "12313",
"BPR05": "ASDADSAD",
"BPR16": "21313",
"BPR04": "SDADSAS",
"BPR15": "11212",
"id": "aDSASD"
},
{
"TRN02": "2424",
"TRN03": "35435345",
"TRN01": "3435345",
"id": "FSDF"
},
{
"REF02": "fdsffs",
"REF01": "sfsfs",
"id": "fsfdsfd"
},
{
"DTM02": "2432424",
"id": "sfsfd",
"DTM01": "234243"
}
],
"loops": [
{
"id": "24324234234",
"segments": [
{
"N101": "sfsfsdf",
"N102": "sfsf",
"id": "dgfdgf"
},
{
"N301": "sfdssfdsfsf",
"N302": "effdssf",
"id": "fdssf"
},
{
"N401": "sdffssf",
"id": "sfds",
"N402": "sfdsf",
"N403": "23424"
},
{
"PER06": "Wsfsfdsfsf",
"PER05": "sfsf",
"PER04": "23424",
"PER03": "fdfbvcb",
"PER02": "Pedsdsf",
"PER01": "sfsfsf",
"id": "fdsdf"
}
]
},
{
"id": "2342",
"segments": [
{
"N101": "sdfsfds",
"N102": "vcbvcb",
"N103": "dsfsdfs",
"N104": "343443",
"id": "fdgfdg"
},
{
"N401": "dfsgdfg",
"id": "dfgdgdf",
"N402": "dgdgdg",
"N403": "234244"
},
{
"REF02": "23423342",
"REF01": "fsdfs",
"id": "sfdsfds"
}
]
}
]
}
]
}
]
}
]
The column header name corresponding to deeper key-value make take nested form, like functionalGroups[0].transactions[0].segments[0].BPR15.
I am able to do this in java using this github project (here you can find the output format I desire in the explanation) in one line:
flatJson = JSONFlattener.parseJson(new File("files/simple.json"), "UTF-8");
The output was:
date,securityQualifier,testIndicator,functionalGroups[1].functionalIdentifierCode,functionalGroups[1].date,functionalGroups[1].applicationReceiverCode, ...
140101,00,T,HP,20140101,ETIN,...
But I want to do this in python. I tried as suggested in this answer:
with open('data.json') as data_file:
data = json.load(data_file)
df = json_normalize(data, record_prefix=True)
with open('temp2.csv', "w", newline='\n') as csv_file:
csv_file.write(df.to_csv())
However, for column functionalGroups, it dumps json as a cell value.
I also tried as suggested in this answer:
with open('data.json') as f: # this ensures opening and closing file
a = json.loads(f.read())
df = pandas.DataFrame(a)
print(df.transpose())
But this also seem to do the same:
0
acknowledgmentRequested 0
authorizationInformation
authorizationQualifier SDA
date 140101
functionalGroups [{'functionalIdentifierCode': 'ADSAD', 'applic...
interchangeControlNumber 123456789
receiverId ADAD
receiverQualifier ADSAS
securityInformation
securityQualifier ASD
senderId FADA
senderQualifier ASDAD
standardsId None
testIndicator T
time 0730
version 00501
Is it possible to do what I desire in python?

Extracting elements from json in python

I have the following json:
{
"request": {
"id": "123",
"url": "/aa/bb/cc",
"method": "GET",
"timestamp": "2018-08-09T08:41:38.432Z"
},
"response": {
"status": {
"code": 200,
"message": "OK"
},
"items": [
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
}
}
I need to loop over items and print each name. I've tried the following code which doesn't work.
response = requests.get(url)
data = json.loads(response.content)
for group in data['response']['items']:
print data['response']['items'][group]['name']
When i replace group with 0 for example, I can access the first name:
data['response']['items'][0]['name']
However, I don't know in advanced how many elements are in the array.

As Joel mentioned, in the for loop,
for group in data['response']['items']:
you are assigning group the value from data['response']['items']. Hence group contains the value :
[
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
So all you need to do is
print group['name']

You can use Pandas module and call read_json function.
import pandas as pd
df = pd.read_json(your_json_file.json)
for i in df.response['items']:
print(i['name'])
# w1
# w2
# w3

You could try this:
for i in range (0,len(d['response']['items'])):
print(d['response']['items'][i]['name'])
Output:
w1
w2
w3

Unable to pull data from json using python

I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which im trying to parse using the json module in python. I am only intrested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what i have tried.
in the below script data is the json returned from a webpage
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately i think i must be missing something as the above results in an error that indexes must be integers not string.
if I try
healthstate= json['response'][0]
it errors saying index 0 is out of range.
Any help would be gratefully received.

json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']

You have to follow the data structure. It's best to interactively manipulate the data and check what every item is. If it's a list you'll have to index it positionally or iterate through it and check the values. If it's a dict you'll have to index it by it's keys. For example here is a function that get's the context and then iterates through it's attributes checking for a particular name.
def get_attribute(data, attribute):
for attrib in data['response']['context'][0]['attributes']:
if attrib['name'] == attribute:
return attrib['value']
return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'

json['reponse']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational status" I see in there can be read with the following:
json['response']['context'][0]['attributes'][0]['operational-status']

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to merge 2 json files with a common key - python

That's because you're updating the root json object and getting a { "tests": [...], "values": [...] } While what you want is to update individual "tests" from individual "values" and get just { "tests": [...] } as a result. Try looping through every object in both jsons.

Related

Python: Convert json with extra data error into CSV

Getting all the Keys from JSON Object?

Flatten nested json to csv with nested column names

Extracting elements from json in python

Unable to pull data from json using python

Categories

Resources