I want to re-format a JSON file so that certain objects (dictionaries) with specific keys appear on one line.
For example, any object with the key name should appear on one line:
{
    "this": "that",
    "parameters": [
        { "name": "param1", "type": "string" },
        { "name": "param2" },
        { "name": "param3", "default": "#someValue" }
    ]
}
The JSON file is generated and contains programming language data. Putting certain fields on one line makes it much easier to visually inspect/review.
I tried overriding Python's json.JSONEncoder to turn each matching dict into a string before writing, only to realize that the quotes " within the string get escaped again in the resulting JSON file, defeating my purpose.
I also looked at jq but couldn't figure out a way to do it. I found similar questions and solutions based on line length, but my requirement is simpler, and I don't want other short lines to be changed: only certain objects or fields.
This code recursively replaces each matching dict in the data with a unique string (a UUID) and records those replacements; then, in the indented JSON string, the unique strings are replaced with the desired single-line JSON.
replace returns a pair of:
A modified version of the input argument data
A list of pairs of JSON strings where, for each pair, the first value should be replaced with the second value in the final pretty-printed JSON.
import json
import uuid

def replace(o):
    if isinstance(o, dict):
        if "name" in o:
            replacement = uuid.uuid4().hex
            return replacement, [(f'"{replacement}"', json.dumps(o))]
        replacements = []
        result = {}
        for key, value in o.items():
            new_value, value_replacements = replace(value)
            result[key] = new_value
            replacements.extend(value_replacements)
        return result, replacements
    elif isinstance(o, list):
        replacements = []
        result = []
        for value in o:
            new_value, value_replacements = replace(value)
            result.append(new_value)
            replacements.extend(value_replacements)
        return result, replacements
    else:
        return o, []

def pretty(data):
    data, replacements = replace(data)
    result = json.dumps(data, indent=4)
    for old, new in replacements:
        result = result.replace(old, new)
    return result
print(pretty({
    "this": "that",
    "parameters": [
        {"name": "param1", "type": "string"},
        {"name": "param2"},
        {"name": "param3", "default": "#someValue"}
    ]
}))
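For reference, the demo call above should print something like the following (reconstructed by hand, assuming json.dumps's default separators, which put a space after each colon and comma in the one-line objects):

{
    "this": "that",
    "parameters": [
        {"name": "param1", "type": "string"},
        {"name": "param2"},
        {"name": "param3", "default": "#someValue"}
    ]
}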
I am new to Python. My requirement is to check whether a given key exists in a JSON document. The JSON will not be the same every time, so I am looking for a generic function to check whether the key exists. It works for simple JSON, but returns nothing when the JSON contains another object with a JSON array inside it, as shown below:
{
    "id": "1888741f-173a-4366-9fa0-a156d8734972",
    "type": "events",
    "version": "1.0.0",
    "count": 3,
    "payload": {
        "cmevents": [
            {
                "exit_code": "0dbc2745-a964-4ce3-b7a0-bd5295afc620",
                "sourceEventType": "test01",
                "sourceType": "test",
                "product": {
                    "productCode": "101"
                }
            },
            {
                "exit_code": "1dbc2745-a964-4ce3-b7a0-bd5295afc620",
                "sourceEventType": "test02",
                "sourceType": "test",
                "product": {
                    "productCode": "102"
                }
            },
            {
                "exit_code": "2dbc2745-a964-4ce3-b7a0-bd5295afc620",
                "sourceEventType": "test03",
                "sourceType": "test",
                "product": {
                    "productCode": "103"
                }
            }
        ]
    }
}
From the above JSON, I want to check that the key sourceEventType exists in all items of the cmevents list.
Following is the function I have used:
def finds(element, JSON, path, whole_path):
    if element in JSON:
        path = path + element + ' = ' + JSON[element].encode('utf-8')
        whole_path.append(path)
    for key in JSON:
        if isinstance(JSON[key], dict):
            finds(element, JSON[key], path + key + '.', whole_path)
To call the function:
whole_path = []
finds('sourceEventType', json, '', whole_path)
Can anyone please help me with the right solution?
A recursive function is probably an easier approach here.
Parse the json using json.loads(text)
Search the tree
import json

text = json.loads(".....")

def search_for_key(key, content) -> bool:
    # Search each item of the list;
    # if found, return True
    if isinstance(content, list):
        for elem in content:
            if search_for_key(key, elem):
                return True
    # Search each key of the dictionary for a match.
    # If the key isn't a match, try searching the value
    # of the key in the dictionary
    elif isinstance(content, dict):
        for key_in_json in content:
            if key_in_json == key:
                return True
            if search_for_key(key, content[key_in_json]):
                return True
    # If we get here it's a number or string; there is nowhere
    # else to go, so return False
    else:
        return False
    # Exhausted the list/dict without finding the key
    return False

print(search_for_key("some_key?", text))
It's definitely possible to make your path approach work, BUT you'd need to keep track of which paths you haven't explored yet using a stack/queue (and you get this for free with a recursive function).
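For the specific requirement of checking that sourceEventType exists in every item of cmevents, a direct check of the known structure may be simpler than a generic tree search. A minimal sketch, assuming the document above has been saved to a file (the name events.json is hypothetical):

import json

# Load the JSON document (hypothetical filename)
with open('events.json') as f:
    data = json.load(f)

# True only if every event in payload.cmevents has the key
all_present = all(
    'sourceEventType' in event
    for event in data['payload']['cmevents']
)
print(all_present)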
My code takes a JSON file, opens/parses it, and prints out desired values with the help of a CSV mapping file (which defines which keys to look for and what names to print their values under).
Some JSON files, however, have multiple values for a key: for example, a JSON file with the key "Affiliate" will have more key/value pairs inside it instead of just a single value.
How can I parse within a key like this one and print out the 'true' value rather than the 'false' ones? Currently my code prints out the entire array of key/value pairs within that target key.
Example JSON:
"Affiliate": [
{
"ov": true,
"value": "United States",
"lookupCode": "US"
},
{
"ov": false,
"value": "France",
"lookupCode": "FR"
}
]
My code:
import json
import csv

output_dict = {}

#maps csv and json information
def findValue(json_obj, target_key, output_key):
    for key in json_obj:
        if isinstance(json_obj[key], dict):
            findValue(json_obj[key], target_key, output_key)
        else:
            if target_key == key:
                output_dict[output_key] = json_obj[key]

#Opens and parses json file
file = open('source_data.json', 'r')
json_read = file.read()
obj = json.loads(json_read)

#Opens and parses csv file (mapping)
with open('inputoutput.csv') as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        findValue(obj, row[0], row[1])

#creates/writes into json file
with open("output.json", "w") as out:
    json.dump(output_dict, out, indent=4)
So you'll need to change the way the mapping CSV is structured, as you'll need variables to determine which criteria to meet and which value to return when the criteria are met...
Please note that with the logic implemented below, if there are 2 list items in Affiliate that both have the key ov set to true, only the last one will be added (dict keys are unique). You could put a return where I commented in the code, but then it would only use the first one, of course.
I've restructured the CSV as below:
inputoutput.csv
Affiliate,Cntr,ov,true,value
Sample1,Output1,,,
Sample2,Output2,criteria2,true,returnvalue
The JSON I used as the source data is this one:
source_data.json
{
    "Affiliate": [
        {
            "ov": true,
            "value": "United States",
            "lookupCode": "US"
        },
        {
            "ov": false,
            "value": "France",
            "lookupCode": "FR"
        }
    ],
    "Sample1": "im a value",
    "Sample2": [
        {
            "criteria2": false,
            "returnvalue": "i am not a return value"
        },
        {
            "criteria2": true,
            "returnvalue": "i am a return value"
        }
    ]
}
The actual code is below; note that I commented a bit on my choices.
main.py
import json
import csv

output_dict = {}

def str2bool(input: str) -> bool:
    """simple check to see if a str is a bool"""
    # shamelessly stolen from:
    # https://stackoverflow.com/a/715468/9267296
    return input.lower() in ("yes", "true", "t", "1")

def findValue(
    json_obj,
    target_key,
    output_key,
    criteria_key=None,
    criteria_value=False,
    return_key="",
):
    """maps csv and json information"""
    # ^^ use PEP standard for docstrings:
    # https://www.python.org/dev/peps/pep-0257/#id16
    # you need to global the output_dict to avoid weirdness
    # see https://www.w3schools.com/python/gloss_python_global_scope.asp
    global output_dict
    for key in json_obj:
        if isinstance(json_obj[key], dict):
            findValue(json_obj[key], target_key, output_key)
        # in this case I advise to use "elif" instead of the "else: if..."
        elif target_key == key:
            # so this is the actual logic change.
            if isinstance(json_obj[key], list):
                for item in json_obj[key]:
                    if (
                        criteria_key is not None
                        and criteria_key in item
                        and item[criteria_key] == criteria_value
                    ):
                        output_dict[output_key] = item[return_key]
                        # here you could put a return
            else:
                # this part doesn't change
                output_dict[output_key] = json_obj[key]
            # since we found the key and added it to the output_dict
            # you can return here to slightly speed up the total
            # execution time
            return

# Opens and parses json file
with open("source_data.json") as sourcefile:
    json_obj = json.load(sourcefile)

# Opens and parses csv file (mapping)
with open("inputoutput.csv") as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        # this check is to determine if you need to add criteria
        # row[2] would be the key to check
        # row[3] would be the value that the key needs to have
        # row[4] would be the key for which to return the value
        if row[2] != "":
            findValue(json_obj, row[0], row[1], row[2], str2bool(row[3]), row[4])
        else:
            findValue(json_obj, row[0], row[1])

# Creates/writes into json file
with open("output.json", "w") as out:
    json.dump(output_dict, out, indent=4)
Running the above code with the input files I provided results in the following file:
output.json
{
    "Cntr": "United States",
    "Output1": "im a value",
    "Output2": "i am a return value"
}
I know there are ways to optimize this, but I wanted to keep it close to the original. You might need to play with the exact way you add stuff to output_dict to get the exact output JSON you want...
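If you ever need to keep every matching list item instead of only the last one, a small variant (a sketch, not part of the code above) is to collect the matches in a list under the output key:

# inside findValue's list branch, instead of overwriting the key:
output_dict.setdefault(output_key, []).append(item[return_key])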
I am looking to create a Python script to convert a nested JSON file to a CSV file, with each innermost child having its own row that includes all of the parent fields as well.
My nested JSON looks like this:
(Note this is just a small excerpt; there are hundreds of date/value pairs)
{
    "test1": true,
    "test2": [
        {
            "name_id": 12345,
            "tags": [
                {
                    "prod_id": 54321,
                    "history": [
                        {
                            "date": "Feb-2-2019",
                            "value": 6
                        },
                        {
                            "date": "Feb-3-2019",
                            "value": 5
                        },
                        {
                            "date": "Feb-4-2019",
                            "value": 4
                        }
The goal is to write to a CSV where each row shows the values for the innermost field and all of its parents (e.g. date, value, prod_id, name_id, test1): basically creating a row for each date & value pair, with all of the parent field values included as well.
I started using this resource as a foundation, but it's still not exactly what I'm trying to accomplish:
How to Flatten Deeply Nested JSON Objects in Non-Recursive Elegant Python
I've tried tweaking this script but have not been able to come up with a solution. This seems like a relatively easy task, so maybe there's something I'm missing.
A lot of what you want to do is very data-format specific. Here's something using a function loosely derived from the "traditional recursive" solution shown in the linked resource you cited. It will work fine with this data, since it's not that deeply nested, and it's simpler than the iterative approach also illustrated there.
The flatten_json() function returns a list, with each value corresponding to keys in the JSON object passed to it.
Note this is Python 3 code.
from collections import OrderedDict
import csv
import json

def flatten_json(nested_json):
    """ Flatten values of JSON object dictionary. """
    name, out = [], []

    def flatten(obj):
        if isinstance(obj, dict):
            for key, value in obj.items():
                name.append(key)
                flatten(value)
        elif isinstance(obj, list):
            for index, item in enumerate(obj):
                name.append(str(index))
                flatten(item)
        else:
            out.append(obj)

    flatten(nested_json)
    return out

def grouper(iterable, n):
    """ Collect data in iterable into fixed-length chunks or blocks. """
    args = [iter(iterable)] * n
    return zip(*args)

if __name__ == '__main__':
    json_str = """
        {"test1": true,
         "test2": [
            {"name_id": 12345,
             "tags": [
                {"prod_id": 54321,
                 "history": [
                    {"date": "Feb-2-2019", "item": 6},
                    {"date": "Feb-3-2019", "item": 5},
                    {"date": "Feb-4-2019", "item": 4}
                 ]
                }
             ]
            }
         ]
        }
    """

    # Flatten the json object into a list of values.
    json_obj = json.loads(json_str, object_pairs_hook=OrderedDict)
    flattened = flatten_json(json_obj)
    print('flattened:', flattened)

    # Create row dictionaries for each (date, value) pair at the end of the
    # list of flattened values, with all of the preceding fields repeated in
    # each one.
    test1, name_id, prod_id = flattened[:3]
    rows = []
    for date, value in grouper(flattened[3:], 2):
        rows.append({'date': date, 'value': value,
                     'prod_id': prod_id, 'name_id': name_id, 'test1': test1})

    # Write rows to a csv file.
    filename = 'product_tests.csv'
    fieldnames = 'date', 'value', 'prod_id', 'name_id', 'test1'
    with open(filename, mode='w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames)
        writer.writeheader()  # Write csv file header row (optional).
        writer.writerows(rows)

    print('"{}" file written.'.format(filename))
Here's what it prints:
flattened: [True, 12345, 54321, 'Feb-2-2019', 6, 'Feb-3-2019', 5, 'Feb-4-2019', 4]
"product_tests.csv" file written.
And here's the contents of the product_tests.csv file produced:
date,value,prod_id,name_id,test1
Feb-2-2019,6,54321,12345,True
Feb-3-2019,5,54321,12345,True
Feb-4-2019,4,54321,12345,True
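As an aside, grouper() is what pairs up the trailing date/value sequence; a quick hand-run illustration of what it does:

>>> list(grouper(['Feb-2-2019', 6, 'Feb-3-2019', 5], 2))
[('Feb-2-2019', 6), ('Feb-3-2019', 5)]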
I've got a JSON file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the JSON data in each example appears to be very simplistic in nature. Likewise, the JSON example data in the Python docs is very simple and does not reflect what I'm trying to work with. Here is what the JSON looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["Summary"]:
        print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?
The example you posted is not valid JSON (no commas after object fields), so it's hard to dig in much. If it's straight from the web service, something's messed up. If you did fix it with proper commas, the "Summary" key is within the "Results" object, so you'd need to change your loop to:
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["Results"]["Summary"]:
        print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # pass the names through so non-default arguments survive recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
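A usage sketch, assuming the JSON has been repaired and loaded into json_obj as above:

fields = findfieldsiwant(json_obj)  # defaults search for "Summary" / "fieldIWant"
if fields:
    for value in fields:
        print(value)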
Input/Goal
My input data is an OrderedDict for which there can be a variable depth of nested OrderedDicts, so I have opted to parse this output recursively. The desired output is a CSV with a header.
Elaboration of Problem
My code below will work once I am able to correctly define field_name when traversing back up a branch after completing all of a branch's leaves (i.e., Type_1.Field_3.Data will incorrectly be named Type_1.Field_2.Field_3.Data).
Once the leaves on a branch have been exhausted, I want to remove the last .Field_x from the field_name so that a new (correct) one can be added for the following object.
Request for Help
Does anyone see where I can include this feature? Thanks,
...
Dependencies:
import re
Code Snippet:
def get_soql_fields(soql):
    soql_fields = re.search('(?<=select)(?s)(.*)(?=from)', soql)  # get fields
    soql_fields = re.sub(' ', '', soql_fields.group())  # remove extra spaces
    fields = re.split(',|\n|\r', soql_fields)  # split on commas and newlines
    fields = [field for field in fields if field != '']  # remove empty strings
    return fields

def parse_output(data, soql):
    fields = get_soql_fields(soql)
    header = fields
    master = [header]
    for record in data['records']:  # for each 'record' in response
        row = []
        for obj, value in record.iteritems():  # for each obj in record
            if isinstance(value, basestring):  # if query base object has desired fields
                if obj in fields:
                    row.append(value)
            elif isinstance(value, dict):  # traverse down into object
                path = obj
                row.append(_traverse_output(obj, value, fields, row, path))
        master.append(row)
    return master

def _traverse_output(obj, value, fields, row, path):
    for f, v in value.iteritems():  # for each item in obj
        if not isinstance(v, (dict, list, tuple)):
            field_name = '{path}.{name}'.format(path=path, name=f)  # TODO fix this to full field name
            print('FName: {0}'.format(field_name))
            if field_name in fields:
                print('match')
                row.append(v)
        elif isinstance(v, dict):  # it is a dict
            path += '.{obj}'.format(obj=f)
            _traverse_output(f, v, fields, row, path)
Example Salesforce SOQL:
select
Type_1.Field_1,
Type_1.Field_2.Data,
Type_1.Field_3,
Type_1.Field_4,
Type_1.Field_5.Data_1.Data,
Type_1.Field_6,
Type_2.Field_1,
Type_2.Field_2
from
Obj_1
limit
1
;
Example Salesforce Output:
{
    "records": [
        {
            "attributes": {
                "type": "Obj_1",
                "url": "<url>"
            },
            "Type_1": {
                "attributes": {
                    "type": "Type_1",
                    "url": "<url>"
                },
                "Field_1": "<stuff>",
                "Field_2": {
                    "attributes": {
                        "type": "Field_2",
                        "url": "<url>"
                    },
                    "Data": "<data>"
                },
                "Field_3": "<data>",
                "Field_4": "<data>",
                "Field_5": {
                    "attributes": {
                        "type": "Field_2",
                        "url": "<url>"
                    },
                    "Data_1": {
                        "attributes": {
                            "type": "Data_1",
                            "url": "<url>"
                        },
                        "Data": "<data>"
                    }
                },
                "Field_6": 1.0
            },
            "Type_2": {
                "attributes": {
                    "type": "Type_2",
                    "url": "<url>"
                },
                "Field_1": "<data>",
                "Field_2": "<data>"
            }
        }
    ]
}
I worked out a quick solution for this. I'll just note what I figured out, and append the code I wrote at the end.
Essentially your problem is that you keep modifying path in place, which isn't going to work. Instead, do something like:

new_path = path + '.{obj}'.format(obj=f)
_traverse_output(f, v, fields, row, new_path)
A note about this: it will NOT necessarily result in a row where the values are in the same order as the header (i.e., if Type_1.Field_1 is in position 0 of the header list, the value corresponding to it might not be in position 0 of the row).
The easy way to solve this (and to handle CSVs in general) is to use DictWriter from the csv module: pass an empty dictionary to your first call, and its keys will end up being the field names and its values their values.
Another way to solve the problem is to pre-populate your row list with None or empty strings, then use the list.index method to assign each value to the appropriate position.
I wrote an implementation of _traverse_output for each approach; they differ slightly from your code in that they take an element of the 'records' list.
Dictionary Example
def _traverse_output_with_dict(record, fields, row_values, field_name=''):
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row_values[new_field_name] = value
        else:
            _traverse_output_with_dict(value, fields, row_values, new_field_name)
List Example
def _traverse_output_with_list(record, fields, row, field_name=''):
    while len(row) < len(fields):
        row.append('')
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row[fields.index(new_field_name)] = value
        else:
            _traverse_output_with_list(value, fields, row, new_field_name)
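A usage sketch for the dictionary variant, assuming record is one element of data['records'] and soql is the query string from the question (Python 2, to match the iteritems/print style above):

import csv

fields = get_soql_fields(soql)  # dotted header names, e.g. 'Type_1.Field_1'
row_values = {}
_traverse_output_with_dict(record, fields, row_values)

# Write a header row plus the single data row; 'wb' is the
# Python 2 convention for csv files
with open('output.csv', 'wb') as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerow(row_values)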