I would like to merge two JSON files.
tests.json, with empty "value" fields, some of them nested:
{
"tests": [{
"id": 1,
"value": "",
"values": "info..."
}, {
"id": 41,
"title": "Debug test",
"value": "",
"values": [{
"id": 345,
"value": "",
"values": [ {
"id": 230,
"values": [{
"id": 234,
"value": ""
}, {
"id": 653,
"value": ""
}]
}]
}],
}, {...
values.json, with the values:
{
"values": [{
"id": 2,
"value": "passed"
}, {
"id": 41,
"value": "passed"
}, {
"id": 345,
"value": "passed"
}, {
"id": 230,
"value": "passed"
},{
"id": 234,
"value": "passed"
},{
"id": 653,
"value": "passed"
},{...
This code works fine, but I need to make it more general, since it only handles a fixed nesting depth:
import json
with open("tests.json") as fo:
    data1 = json.load(fo)
with open("values.json") as fo:
    data2 = json.load(fo)
for dest in data1['tests']:
    if 'values' in dest:
        for dest_1 in dest['values']:
            if 'values' in dest_1:
                for dest_2 in dest_1['values']:
                    if 'values' in dest_2:
                        for dest_3 in dest_2['values']:
                            for source in data2['values']:
                                if source['id'] == dest_3['id']:
                                    dest_3['value'] = source['value']
                    for source in data2['values']:
                        if source['id'] == dest_2['id']:
                            dest_2['value'] = source['value']
            for source in data2['values']:
                if source['id'] == dest_1['id']:
                    dest_1['value'] = source['value']
    for source in data2['values']:
        if source['id'] == dest['id']:
            dest['value'] = source['value']
with open("report.json", "w") as write_file:
    json.dump(data1, write_file, indent=2)
As I understand it, I need to check recursively whether tests.json has a 'values' parameter and an empty 'value' parameter inside that block. Moreover, I can't modify the source tests.json; I can only create another file to save all the changes.
That's because you're updating the root json object and getting a
{
"tests": [...],
"values": [...]
}
While what you want is to update individual "tests" from individual "values" and get just
{
"tests": [...]
}
as a result.
Try looping through every object in both jsons.
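One way to loop through every object at any depth is a small recursive helper. The sketch below is only an illustration under stated assumptions, not the original poster's code: the names fill_values, merge_files and lookup are hypothetical, every entry in values.json is assumed to carry both "id" and "value", and a "values" field that is not a list (such as the string "info..." in the sample) is left alone.

```python
import json


def fill_values(node, lookup):
    """Recursively copy values from lookup into node, matched by "id"."""
    if isinstance(node, dict):
        # Overwrite "value" only when this id has an entry in the lookup.
        if node.get("id") in lookup:
            node["value"] = lookup[node["id"]]
        children = node.get("values")
        # Descend only into real lists; strings and other types are skipped.
        if isinstance(children, list):
            for child in children:
                fill_values(child, lookup)
    return node


def merge_files(tests_path, values_path, report_path):
    """Read both inputs and write the merged result to a new file."""
    with open(tests_path) as fo:
        tests = json.load(fo)
    with open(values_path) as fo:
        values = json.load(fo)
    # Build the id -> value table once instead of rescanning the list.
    lookup = {entry["id"]: entry["value"] for entry in values["values"]}
    for test in tests["tests"]:
        fill_values(test, lookup)
    with open(report_path, "w") as write_file:
        json.dump(tests, write_file, indent=2)
```

Calling merge_files("tests.json", "values.json", "report.json") would leave tests.json untouched, because only the in-memory copy is changed and the result goes to the report file.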
I have JSON in the format below, which I receive from a different team and am not allowed to change:
{
"content": [
{
"id": "5603bbaae412390b73f0c7f",
"name": "ABC",
"description": "Test",
"rsid": "pwcs",
"type": "project",
"owner": {
"id": 529932
},
"created": "2015-09-24T09:00:26Z"
},
{
"id": "56094673e4b0a7e17e310b83",
"name": "secores",
"description": "Panel",
"rsid": "pwce",
"type": "project",
"owner": {
"id": 520902
},
"created": "2015-09-28T13:53:55Z"
}
],
"totalPages": 9,
"totalElements": 8592,
"number": 0,
"numberOfElements": 1000,
"firstPage": true,
"lastPage": false,
"sort": null,
"size": 1000
}
{
"content": [
{
"id": "5bf2cc64d977553780706050",
"name": "Services Report",
"description": "",
"rsid": "pcie",
"type": "project",
"owner": {
"id": 518013
},
"created": "2018-11-19T14:44:52Z"
},
{
"id": "5bf2d56e40b39312e3e167d0",
"name": "Standard form",
"description": "",
"rsid": "wcu",
"type": "project",
"owner": {
"id": 521114
},
"created": "2018-11-19T15:23:26Z"
}
],
"totalPages": 9,
"totalElements": 8592,
"number": 1,
"numberOfElements": 1000,
"firstPage": false,
"lastPage": false,
"sort": null,
"size": 1000
}
{
"content": [
{
"id": "5d95e7d6187c6d6376fd1bad",
"name": "New Project",
"description": "",
"rsid": "pcinforrod",
"type": "project",
"owner": {
"id": 200904228
},
"created": "2019-10-03T12:21:42Z"
},
{
"id": "5d95fc6e56d2e82519629b96",
"name": "Demo - 10/03",
"description": "",
"rsid": "sitedev",
"type": "project",
"owner": {
"id": 20001494
},
"created": "2019-10-03T13:49:34Z"
}
],
"totalPages": 9,
"totalElements": 8592,
"number": 2,
"numberOfElements": 1000,
"firstPage": false,
"lastPage": false,
"sort": null,
"size": 1000
}
I am trying to convert it into CSV using the code below:
import csv
import json
with open(r"C:\python\SampleJSON.json",'rb') as file:
    data = json.load(file)
fname = "workspaceExcelDemo.csv"
with open(fname,"w", encoding="utf-8", newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerow(["id","name","rsid"])
    for item in data["content"]:
        csv_file.writerow([item['id'],item['name'],item['rsid']])
However, I get the error message below when I run this code:
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 35 column 1 (char 937)
How do I convert the above JSON into CSV without making any changes to the JSON file?
If I understand your question and the comments correctly, you could parse the file line by line with json.loads (this assumes each JSON object sits on a single line) and use json.dumps to clean up the data:
import csv
import json
with open(r"C:\python\SampleJSON.json", 'rb') as file:
    # One JSON object per line is assumed here.
    data = [json.loads(line) for line in file]
# The json.dumps method converts a Python object to a JSON formatted string.
# The json.loads method parses a JSON string into a native Python object.
# The round trip below replaces every "=" character with an empty string.
data = json.loads(json.dumps(data).replace("=", ""))
fname = "workspaceExcelDemo.csv"
with open(fname, "w", encoding="utf-8", newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerow(["id", "name", "rsid"])
    # Write the items of every parsed object, not only the first one.
    for page in data:
        for item in page["content"]:
            csv_file.writerow([item['id'], item['name'], item['rsid']])
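When the objects are pretty-printed across many lines, as in the sample in the question, reading line by line will not work. A possible alternative, sketched here under the assumption that the file is simply several JSON objects written back to back, is to decode them one at a time with json.JSONDecoder.raw_decode from the standard library. The helper name iter_json_objects is hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

With the whole file read into a string, each yielded object is one page, so the CSV loop can iterate over iter_json_objects(text) and then over page["content"] without the source file being modified.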
Goal: create a script that takes a nested JSON object as input and outputs a CSV file with all of its keys as rows.
Example:
{
"Document": {
"DocumentType": 945,
"Version": "V007",
"ClientCode": "WI",
"Shipment": [
{
"ShipmentHeader": {
"ShipmentID": 123456789,
"OrderChannel": "Shopify",
"CustomerNumber": 234234,
"VendorID": "2343SDF",
"ShipViaCode": "FEDX2D",
"AsnDate": "2018-01-27",
"AsnTime": "09:30:47-08:00",
"ShipmentDate": "2018-01-23",
"ShipmentTime": "09:30:47-08:00",
"MBOL": 12345678901234568,
"BOL": 12345678901234566,
"ShippingNumber": "1ZTESTTEST",
"LoadID": 321456987,
"ShipmentWeight": 10,
"ShipmentCost": 2.3,
"CartonsTotal": 2,
"CartonPackagingCode": "CTN25",
"OrdersTotal": 2
},
"References": [
{
"Reference": {
"ReferenceQualifier": "TST",
"ReferenceText": "Testing text"
}
}
],
"Addresses": {
"Address": [
{
"AddressLocationQualifier": "ST",
"LocationNumber": 23234234,
"Name": "John Smith",
"Address1": "123 Main St",
"Address2": "Suite 12",
"City": "Hometown",
"State": "WA",
"Zip": 92345,
"Country": "USA"
},
{
"AddressLocationQualifier": "BT",
"LocationNumber": 2342342,
"Name": "Jane Smith",
"Address1": "345 Second Ave",
"Address2": "Building 32",
"City": "Sometown",
"State": "CA",
"Zip": "23665-0987",
"Country": "USA"
}
]
},
"Orders": {
"Order": [
{
"OrderHeader": {
"PurchaseOrderNumber": 23456342,
"RetailerPurchaseOrderNumber": 234234234,
"RetailerOrderNumber": 23423423,
"CustomerOrderNumber": 234234234,
"Department": 3333,
"Division": 23423,
"OrderWeight": 10.23,
"CartonsTotal": 2,
"QTYOrdered": 12,
"QTYShipped": 23
},
"Cartons": {
"Carton": [
{
"SSCC18": 12345678901234567000,
"TrackingNumber": "1ZTESTTESTTEST",
"CartonContentsQty": 10,
"CartonWeight": 10.23,
"LineItems": {
"LineItem": [
{
"LineNumber": 1,
"ItemNumber": 1234567890,
"UPC": 9876543212,
"QTYOrdered": 34,
"QTYShipped": 32,
"QTYUOM": "EA",
"Description": "Shoes",
"Style": "Tall",
"Size": 9.5,
"Color": "Bllack",
"RetailerItemNumber": 2342333,
"OuterPack": 10
},
{
"LineNumber": 2,
"ItemNumber": 987654321,
"UPC": 7654324567,
"QTYOrdered": 12,
"QTYShipped": 23,
"QTYUOM": "EA",
"Description": "Sunglasses",
"Style": "Short",
"Size": 10,
"Color": "White",
"RetailerItemNumber": 565465456,
"OuterPack": 12
}
]
}
}
]
}
}
]
}
}
]
}
}
From the JSON object above, I want all the keys (nested ones included) in a list (duplicates can be removed by using a set). If a nested key occurs several times in the actual JSON, it may appear several times in the CSV.
I personally feel that recursion is a perfect fit for this type of problem when the depth of nesting you will encounter is unpredictable. Here is an example in Python of how you can use recursion to extract all keys. Cheers.
import json
row = ""
def extract_keys(data):
    # Append every key to the global string, one key per line.
    global row
    if isinstance(data, dict):
        for key, value in data.items():
            row += key + "\n"
            extract_keys(value)
    elif isinstance(data, list):
        for element in data:
            extract_keys(element)
# MAIN
with open("input.json", "r") as rfile:
    dicts = json.load(rfile)
extract_keys(dicts)
with open("output.csv", "w") as wfile:
    wfile.write(row)
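The same recursion can be written without a global variable by passing a list along and returning it. This is only a sketch: collect_keys and write_keys are hypothetical names, and the optional de-duplication uses dict.fromkeys so that the first-seen order of the keys is kept.

```python
import csv


def collect_keys(data, keys=None):
    """Return every key in data, in document order, duplicates included."""
    if keys is None:
        keys = []
    if isinstance(data, dict):
        for key, value in data.items():
            keys.append(key)
            collect_keys(value, keys)
    elif isinstance(data, list):
        for element in data:
            collect_keys(element, keys)
    return keys


def write_keys(keys, path, unique=False):
    """Write one key per CSV row, optionally dropping repeated keys."""
    if unique:
        # dict.fromkeys removes duplicates while preserving order.
        keys = list(dict.fromkeys(keys))
    with open(path, "w", newline="", encoding="utf-8") as wfile:
        writer = csv.writer(wfile)
        for key in keys:
            writer.writerow([key])
```

Using csv.writer rather than plain string concatenation means a key that happens to contain a comma or a quote is still written as a single cell.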
I have a rather unusual requirement. I have the JSON below and need to convert it into a flat CSV.
[
{
"authorizationQualifier": "SDA",
"authorizationInformation": " ",
"securityQualifier": "ASD",
"securityInformation": " ",
"senderQualifier": "ASDAD",
"senderId": "FADA ",
"receiverQualifier": "ADSAS",
"receiverId": "ADAD ",
"date": "140101",
"time": "0730",
"standardsId": null,
"version": "00501",
"interchangeControlNumber": "123456789",
"acknowledgmentRequested": "0",
"testIndicator": "T",
"functionalGroups": [
{
"functionalIdentifierCode": "ADSAD",
"applicationSenderCode": "ASDAD",
"applicationReceiverCode": "ADSADS",
"date": "20140101",
"time": "07294900",
"groupControlNumber": "123456789",
"responsibleAgencyCode": "X",
"version": "005010X221A1",
"transactions": [
{
"name": "ASDADAD",
"transactionSetIdentifierCode": "adADS",
"transactionSetControlNumber": "123456789",
"implementationConventionReference": null,
"segments": [
{
"BPR03": "ad",
"BPR14": "QWQWDQ",
"BPR02": "1.57",
"BPR13": "23223",
"BPR01": "sad",
"BPR12": "56",
"BPR10": "32424",
"BPR09": "12313",
"BPR08": "DA",
"BPR07": "123456789",
"BPR06": "12313",
"BPR05": "ASDADSAD",
"BPR16": "21313",
"BPR04": "SDADSAS",
"BPR15": "11212",
"id": "aDSASD"
},
{
"TRN02": "2424",
"TRN03": "35435345",
"TRN01": "3435345",
"id": "FSDF"
},
{
"REF02": "fdsffs",
"REF01": "sfsfs",
"id": "fsfdsfd"
},
{
"DTM02": "2432424",
"id": "sfsfd",
"DTM01": "234243"
}
],
"loops": [
{
"id": "24324234234",
"segments": [
{
"N101": "sfsfsdf",
"N102": "sfsf",
"id": "dgfdgf"
},
{
"N301": "sfdssfdsfsf",
"N302": "effdssf",
"id": "fdssf"
},
{
"N401": "sdffssf",
"id": "sfds",
"N402": "sfdsf",
"N403": "23424"
},
{
"PER06": "Wsfsfdsfsf",
"PER05": "sfsf",
"PER04": "23424",
"PER03": "fdfbvcb",
"PER02": "Pedsdsf",
"PER01": "sfsfsf",
"id": "fdsdf"
}
]
},
{
"id": "2342",
"segments": [
{
"N101": "sdfsfds",
"N102": "vcbvcb",
"N103": "dsfsdfs",
"N104": "343443",
"id": "fdgfdg"
},
{
"N401": "dfsgdfg",
"id": "dfgdgdf",
"N402": "dgdgdg",
"N403": "234244"
},
{
"REF02": "23423342",
"REF01": "fsdfs",
"id": "sfdsfds"
}
]
}
]
}
]
}
]
}
]
The column header name corresponding to a deeper key-value pair may take a nested form, like functionalGroups[0].transactions[0].segments[0].BPR15.
I am able to do this in Java using this GitHub project (where you can find the output format I desire in the explanation) in one line:
flatJson = JSONFlattener.parseJson(new File("files/simple.json"), "UTF-8");
The output was:
date,securityQualifier,testIndicator,functionalGroups[1].functionalIdentifierCode,functionalGroups[1].date,functionalGroups[1].applicationReceiverCode, ...
140101,00,T,HP,20140101,ETIN,...
But I want to do this in python. I tried as suggested in this answer:
with open('data.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data, record_prefix=True)
with open('temp2.csv', "w", newline='\n') as csv_file:
    csv_file.write(df.to_csv())
However, for column functionalGroups, it dumps json as a cell value.
I also tried as suggested in this answer:
with open('data.json') as f: # this ensures opening and closing file
    a = json.loads(f.read())
df = pandas.DataFrame(a)
print(df.transpose())
But this also seems to do the same:
0
acknowledgmentRequested 0
authorizationInformation
authorizationQualifier SDA
date 140101
functionalGroups [{'functionalIdentifierCode': 'ADSAD', 'applic...
interchangeControlNumber 123456789
receiverId ADAD
receiverQualifier ADSAS
securityInformation
securityQualifier ASD
senderId FADA
senderQualifier ASDAD
standardsId None
testIndicator T
time 0730
version 00501
Is it possible to do what I desire in python?
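A plain recursive flattener can produce headers in the form described above. The following is a minimal sketch, not a port of the Java project: the names flatten and write_flat_csv are hypothetical, list indices are zero-based as in the functionalGroups[0] example, and empty lists or dicts simply produce no column.

```python
import csv


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {path: scalar} pairs."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalars (strings, numbers, booleans, None) end the recursion.
        flat[prefix] = value
    return flat


def write_flat_csv(records, path):
    """Write one CSV row per top-level record."""
    rows = [flatten(record) for record in records]
    # Union of all column names, kept in first-seen order.
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Since the JSON in the question is a list with one object per record, passing the result of json.load straight to write_flat_csv would give one row per record; columns missing from a record are left empty by csv.DictWriter.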
I have the following json:
{
"request": {
"id": "123",
"url": "/aa/bb/cc",
"method": "GET",
"timestamp": "2018-08-09T08:41:38.432Z"
},
"response": {
"status": {
"code": 200,
"message": "OK"
},
"items": [
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
}
}
I need to loop over items and print each name. I've tried the following code which doesn't work.
response = requests.get(url)
data = json.loads(response.content)
for group in data['response']['items']:
    print(data['response']['items'][group]['name'])
When I replace group with 0, for example, I can access the first name:
data['response']['items'][0]['name']
However, I don't know in advance how many elements are in the array.
As Joel mentioned, in the for loop,
for group in data['response']['items']:
you are assigning group each element of data['response']['items'] in turn, not an index. The list being iterated over is:
[
{
"id": "aaa",
"name": "w1"
},
{
"id": "bbb",
"name": "w2"
},
{
"id": "ccc",
"name": "w3"
}
]
So inside the loop, all you need to do is
print(group['name'])
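As a self-contained illustration of that point, the sketch below uses a trimmed, hard-coded copy of the response from the question instead of a real HTTP request (an assumption made so that it runs on its own). It also shows enumerate for the case where the position is needed.

```python
import json

# Trimmed copy of the response shown in the question.
raw = """
{"response": {"items": [{"id": "aaa", "name": "w1"},
                        {"id": "bbb", "name": "w2"},
                        {"id": "ccc", "name": "w3"}]}}
"""
data = json.loads(raw)

# Each pass binds group to one dict from the list, not to an index.
names = []
for group in data["response"]["items"]:
    names.append(group["name"])
    print(group["name"])

# enumerate supplies the position when it is genuinely needed.
for index, group in enumerate(data["response"]["items"]):
    print(index, group["id"])
```

The loop works for any number of items, so the length of the array does not need to be known in advance.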
You can use the pandas module and call its read_json function.
import pandas as pd
df = pd.read_json("your_json_file.json")
for i in df.response['items']:
    print(i['name'])
# w1
# w2
# w3
You could try this, indexing by position:
for i in range(len(data['response']['items'])):
    print(data['response']['items'][i]['name'])
Output:
w1
w2
w3
I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which I'm trying to parse using the json module in Python. I am only interested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what I have tried.
In the script below, data is the JSON returned from a webpage.
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately, I think I must be missing something, as the above results in an error saying that indexes must be integers, not strings.
If I try
healthstate= json['response'][0]
it errors, saying index 0 is out of range.
Any help would be gratefully received.
json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']
You have to follow the data structure. It's best to manipulate the data interactively and check what every item is. If it's a list, you'll have to index it positionally or iterate through it and check the values. If it's a dict, you'll have to index it by its keys. For example, here is a function that gets the context and then iterates through its attributes, checking for a particular name.
def get_attribute(data, attribute):
    for attrib in data['response']['context'][0]['attributes']:
        if attrib['name'] == attribute:
            return attrib['value']
    return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'
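If several attributes are needed, it may be simpler to turn the attribute list into a dictionary once and then look names up directly. This is a sketch with a hypothetical helper name, attributes_by_name, and it assumes that attribute names are unique within a context.

```python
def attributes_by_name(data, context_index=0):
    """Map each attribute name to its value for one context entry."""
    context = data["response"]["context"][context_index]
    # Later duplicates would overwrite earlier ones; names are assumed unique.
    return {attr["name"]: attr["value"] for attr in context["attributes"]}
```

After attrs = attributes_by_name(data), the two values asked for in the question are attrs["operational-status"] and attrs["health-state"], and attrs.get(name) can be used when a name might be missing.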
json['response']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational-status" I see in there is the attribute at index 14 of the sample, and its value can be read with the following:
json['response']['context'][0]['attributes'][14]['value']