Editing Json Program - python

My code takes the path to a JSON file, opens and parses it, and prints out the desired values with the help of a CSV mapping file (which specifies the keys to look for and the names to print their values under).
Some JSON files, however, have keys that hold multiple values. For example, the key "Affiliate" holds a list of objects with more key/value pairs inside them instead of a single value.
How can I parse within a key like this one and print only the value from the entry whose "ov" is true, rather than the false ones? Currently my code prints the entire array of key/value pairs within that target key.
Example JSON:
"Affiliate": [
    {
        "ov": true,
        "value": "United States",
        "lookupCode": "US"
    },
    {
        "ov": false,
        "value": "France",
        "lookupCode": "FR"
    }
]
My code:
import json
import csv

output_dict = {}

#maps csv and json information
def findValue(json_obj, target_key, output_key):
    for key in json_obj:
        if isinstance(json_obj[key], dict):
            findValue(json_obj[key], target_key, output_key)
        else:
            if target_key == key:
                output_dict[output_key] = json_obj[key]

#Opens and parses json file
file = open('source_data.json', 'r')
json_read = file.read()
obj = json.loads(json_read)

#Opens and parses csv file (mapping)
with open('inputoutput.csv') as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        findValue(obj, row[0], row[1])

#creates/writes into json file
with open("output.json", "w") as out:
    json.dump(output_dict, out, indent=4)

You'll need to change the way the mapping CSV is structured, because it needs extra columns that say which criterion to check and which value to return when that criterion is met.
Please note that with the logic implemented below, if two list items in Affiliate both have the key ov set to true, only the last one ends up in the output (dict keys are unique). You could put a return where I commented in the code, but then only the first one would be used.
I've restructured the CSV as below:
inputoutput.csv
Affiliate,Cntr,ov,true,value
Sample1,Output1,,,
Sample2,Output2,criteria2,true,returnvalue
The JSON I used as the source data is this one:
source_data.json
{
    "Affiliate": [
        {
            "ov": true,
            "value": "United States",
            "lookupCode": "US"
        },
        {
            "ov": false,
            "value": "France",
            "lookupCode": "FR"
        }
    ],
    "Sample1": "im a value",
    "Sample2": [
        {
            "criteria2": false,
            "returnvalue": "i am not a return value"
        },
        {
            "criteria2": true,
            "returnvalue": "i am a return value"
        }
    ]
}
The actual code is below, note that I commented a bit on my choices.
main.py
import json
import csv

output_dict = {}


def str2bool(input: str) -> bool:
    """simple check to see if a str is a bool"""
    # shamelessly stolen from:
    # https://stackoverflow.com/a/715468/9267296
    return input.lower() in ("yes", "true", "t", "1")
def findValue(
    json_obj,
    target_key,
    output_key,
    criteria_key=None,
    criteria_value=False,
    return_key="",
):
    """maps csv and json information"""
    # ^^ use PEP standard for docstrings:
    # https://www.python.org/dev/peps/pep-0257/#id16
    # declaring output_dict global is optional here (the dict is only
    # mutated, never rebound), but it makes the dependency explicit
    # see https://www.w3schools.com/python/gloss_python_global_scope.asp
    global output_dict
    for key in json_obj:
        if isinstance(json_obj[key], dict):
            # pass the criteria along so nested matches are filtered too
            findValue(
                json_obj[key], target_key, output_key,
                criteria_key, criteria_value, return_key,
            )
        # in this case I advise to use "elif" instead of the "else: if..."
        elif target_key == key:
            # so this is the actual logic change.
            if isinstance(json_obj[key], list):
                for item in json_obj[key]:
                    if (
                        criteria_key is not None
                        and criteria_key in item
                        and item[criteria_key] == criteria_value
                    ):
                        output_dict[output_key] = item[return_key]
                        # here you could put a return
            else:
                # this part doesn't change
                output_dict[output_key] = json_obj[key]
            # since we found the key and added in the output_dict
            # you can return here to slightly speed up the total
            # execution time
            return
# Opens and parses json file
with open("source_data.json") as sourcefile:
    json_obj = json.load(sourcefile)

# Opens and parses csv file (mapping)
with open("inputoutput.csv") as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        # this check is to determine if you need to add criteria
        # row[2] would be the key to check
        # row[3] would be the value that the key needs to have
        # row[4] would be the key for which to return the value
        if row[2] != "":
            findValue(json_obj, row[0], row[1], row[2], str2bool(row[3]), row[4])
        else:
            findValue(json_obj, row[0], row[1])

# Creates/writes into json file
with open("output.json", "w") as out:
    json.dump(output_dict, out, indent=4)
Running the above code with the input files I provided results in the following file:
output.json
{
"Cntr": "United States",
"Output1": "im a value",
"Output2": "i am a return value"
}
I know there are ways to optimize this, but I wanted to keep it close to the original. You might need to play with the exact way you add stuff to output_dict to get the exact output JSON you want...
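If several list items can satisfy the criteria and you want all of them rather than only the last one, one option is to collect the matches in a list. This is only a minimal sketch: collect_matches is a hypothetical helper that is not part of the code above, and the "Canada" entry is made-up sample data added to show two matches.

```python
def collect_matches(items, criteria_key, criteria_value, return_key):
    """Return item[return_key] for every dict in items whose
    criteria_key equals criteria_value (hypothetical helper)."""
    return [
        item[return_key]
        for item in items
        if isinstance(item, dict)
        and item.get(criteria_key) == criteria_value
        and return_key in item
    ]


# Sample data; the third entry is invented for illustration.
affiliate = [
    {"ov": True, "value": "United States", "lookupCode": "US"},
    {"ov": False, "value": "France", "lookupCode": "FR"},
    {"ov": True, "value": "Canada", "lookupCode": "CA"},
]

matches = collect_matches(affiliate, "ov", True, "value")
print(matches)  # ['United States', 'Canada']
```

Inside findValue you could then assign the resulting list to output_dict[output_key] instead of a single value, at the cost of the output JSON holding a list for that key.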

Related

Generic way to get the value of key in nested json using python

I am new to Python. My requirement is to check whether a given key exists in a JSON document. The JSON will not be the same every time, so I am looking for a generic function to check whether the key exists. It works for simple JSON, but returns nothing when the JSON contains another object with an array inside it, as shown below:
{
"id": "1888741f-173a-4366-9fa0-a156d8734972",
"type": "events",
"version": "1.0.0",
"count": 3,
"payload": {
"cmevents": [
{
"exit_code": "0dbc2745-a964-4ce3-b7a0-bd5295afc620",
"sourceEventType": "test01",
"sourceType": "test",
"product": {
"productCode":"101"
}
},
{
"exit_code": "1dbc2745-a964-4ce3-b7a0-bd5295afc620",
"sourceEventType": "test02",
"sourceType": "test",
"product": {
"productCode":"102"
}
},
{
"exit_code": "2dbc2745-a964-4ce3-b7a0-bd5295afc620",
"sourceEventType": "test03",
"sourceType": "test",
"product": {
"productCode":"103"
}
}
]
}
}
From the above JSON, I want to check whether the key sourceEventType exists in all items of the cmevents list.
This is the function I have used:
def finds(element, JSON, path, whole_path):
    if element in JSON:
        path = path + element + ' = ' + JSON[element].encode('utf-8')
        whole_path.append(path)

    for key in JSON:
        if isinstance(JSON[key], dict):
            finds(element, JSON[key], path + key + '.', whole_path)
To call the function:
whole_path = []
finds('sourceEventType', json, '', whole_path)
Can anyone please help me with the right solution?
A recursive function is probably an easier approach here:
1. Parse the JSON using json.loads(text).
2. Search the tree.
import json

text = json.loads(".....")


def search_for_key(key, content) -> bool:
    # Search each item of the list
    # if found return true
    if isinstance(content, list):
        for elem in content:
            if search_for_key(key, elem):
                return True
    # Search each key of the dictionary for match
    # If the key isn't a match try searching the value
    # of the key in the dictionary
    elif isinstance(content, dict):
        for key_in_json in content:
            if key_in_json == key:
                return True
            if search_for_key(key, content[key_in_json]):
                return True
    # If here it's a number or string, or nothing matched above.
    # There is nowhere else to go: return false
    return False


print(search_for_key("some_key?", text))
It's definitely possible to make your path approach work, BUT you're going to need to keep track of which paths you haven't explored yet using a stack/queue (and you get this for free with a recursive function).
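If you also want the paths, as in the question's whole_path idea, the recursion can carry the path along and descend into lists as well as dicts. The following is a rough sketch, not a drop-in replacement: find_paths is a hypothetical name, the dotted/indexed path format is an arbitrary choice, and the data is a trimmed version of the question's JSON.

```python
def find_paths(key, content, path=""):
    """Yield a path string for every occurrence of key in nested dicts/lists."""
    if isinstance(content, dict):
        for k, v in content.items():
            here = f"{path}.{k}" if path else k
            if k == key:
                yield here
            # Keep searching below this key as well.
            yield from find_paths(key, v, here)
    elif isinstance(content, list):
        for i, elem in enumerate(content):
            yield from find_paths(key, elem, f"{path}[{i}]")
    # Scalars (str, int, ...) have nothing to descend into.


data = {
    "type": "events",
    "payload": {
        "cmevents": [
            {"sourceEventType": "test01", "product": {"productCode": "101"}},
            {"sourceEventType": "test02", "product": {"productCode": "102"}},
        ]
    },
}

paths = list(find_paths("sourceEventType", data))
print(paths)
# ['payload.cmevents[0].sourceEventType', 'payload.cmevents[1].sourceEventType']

# Checking that *every* item of cmevents has the key is a separate test:
all_have_key = all("sourceEventType" in ev for ev in data["payload"]["cmevents"])
print(all_have_key)  # True
```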

Finding the JSON file containing a particular key value , among multiple JSON files

I have 5 JSON files named JSON1, JSON2, JSON3, JSON4 and JSON5. The general format of these JSON files is:
{
    "flower": {
        "price": {
            "type": "good",
            "value": 5282.0,
            "direction": "up"
        }
    },
    "furniture": {
        "price": {
            "type": "comfy",
            "value": 9074.0,
            "direction": "down"
        }
    }
}
Among all these JSON files, I need to find the one that contains a particular value given by the user. For example, if the user enters 9074 for "value", the program should search all the JSON files and output the name of the file that contains that value, along with the line on which it appears.
The approach I used was:
import json

with open('JSON1.json','r') as f1:
    item1 = json.load(f1)
with open('JSON2.json','r') as f1:
    item2 = json.load(f1)
with open('JSON3.json','r') as f1:
    item3 = json.load(f1)
with open('JSON4.json','r') as f1:
    item4 = json.load(f1)
with open('JSON5.json','r') as f1:
    item5 = json.load(f1)

# Input the value that user want to search
item = input("Enter the value:\n")
# function to search the json file (pseudocode, not valid Python)
def search_json(value):
    int i = 0;
    for keyvalue in item(i+1):
        i++;
        if value == keyval['value']
            return keyvalue[name of the json file]

if(search_json(item) !=None):
    print("this value is present in json log :",search_json(item))
The expected output is:
Enter the value: 9074
This value is present in json log JSON1 and is present at 12 line of code
But the above approach isn't efficient or correct. As I'm new to Python, I'm unsure of the right approach. I would be really grateful if someone could help.
Loop through the filenames.
import json

def find_value_in_json_files(value, files):
    for filename in files:
        # use a separate name for the file object so the filename
        # is still available to return
        with open(filename) as f:
            item = json.load(f)
        for k, v in item.items():
            price = v['price']
            if price.get('value', price.get('values')) == value:
                return filename, k
    # no file contained the value
    return None

print(find_value_in_json_files(9074, ['JSON1.json', 'JSON2.json', 'JSON3.json', 'JSON4.json', 'JSON5.json']))
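The function above reports the file and the top-level key, but not the line number the question asks about. json.load does not keep line information, so one possible workaround is to scan the file text line by line. The sketch below is an assumption-laden illustration: find_value_lines is a hypothetical helper, it assumes the files are pretty-printed with one key per line, and its regular expression only handles plain decimal numbers (no exponent notation). A temporary file stands in for JSON1.json.

```python
import json
import os
import re
import tempfile


def find_value_lines(value, filenames):
    """Return (filename, line_number) pairs for lines such as
    '"value": 9074.0' whose number equals value (1-based line numbers)."""
    pattern = re.compile(r'"value"\s*:\s*(-?\d+(?:\.\d+)?)')
    hits = []
    for filename in filenames:
        with open(filename) as f:
            for line_number, line in enumerate(f, start=1):
                match = pattern.search(line)
                if match and float(match.group(1)) == float(value):
                    hits.append((filename, line_number))
    return hits


# Demo data copied from the question's general format.
sample = {
    "flower": {"price": {"type": "good", "value": 5282.0, "direction": "up"}},
    "furniture": {"price": {"type": "comfy", "value": 9074.0, "direction": "down"}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "JSON1.json")
    with open(path, "w") as f:
        json.dump(sample, f, indent=4)
    hits = find_value_lines(9074, [path])

print([(os.path.basename(name), line) for name, line in hits])
# [('JSON1.json', 12)]
```

Remember that input() returns a string, so the user's entry has to be converted (for example with float()) before comparing it with the numbers in the files.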

Format some JSON object with certain fields on one-line?

I want to re-format a JSON file so that certain objects (dictionaries) with some specific keys are on one-line.
For example, any object with key name should appear in one line:
{
"this": "that",
"parameters": [
{ "name": "param1", "type": "string" },
{ "name": "param2" },
{ "name": "param3", "default": "#someValue" }
]
}
The JSON file is generated and contains programming language data. Putting certain objects on one line makes it much easier to visually inspect and review.
I tried to override python json.JSONEncoder to turn matching dict into a string before writing, only to realize quotes " within the string are escaped again in the result JSON file, defeating my purpose.
I also looked at jq but couldn't figure out a way to do it. I found similar questions and solutions based on line length, but my requirements are simpler, and I don't want other shorter lines to be changed. Only certain objects or fields.
This code recursively replaces all the appropriate dicts in the data with unique strings (UUIDs) and records those replacements, then in the indented JSON string the unique strings are replaced with the desired original single line JSON.
replace returns a pair of:
A modified version of the input argument data
A list of pairs of JSON strings where for each pair the first value should be replaced with the second value in the final pretty printed JSON.
import json
import uuid


def replace(o):
    if isinstance(o, dict):
        if "name" in o:
            replacement = uuid.uuid4().hex
            return replacement, [(f'"{replacement}"', json.dumps(o))]
        replacements = []
        result = {}
        for key, value in o.items():
            new_value, value_replacements = replace(value)
            result[key] = new_value
            replacements.extend(value_replacements)
        return result, replacements
    elif isinstance(o, list):
        replacements = []
        result = []
        for value in o:
            new_value, value_replacements = replace(value)
            result.append(new_value)
            replacements.extend(value_replacements)
        return result, replacements
    else:
        return o, []


def pretty(data):
    data, replacements = replace(data)
    result = json.dumps(data, indent=4)
    for old, new in replacements:
        result = result.replace(old, new)
    return result


print(pretty({
    "this": "that",
    "parameters": [
        {"name": "param1", "type": "string"},
        {"name": "param2"},
        {"name": "param3", "default": "#someValue"}
    ]
}))

Converting nested json to a csv, where each row includes innermost values and all parents values

I am looking to create a python script to be able to convert a nested json file to a csv file, with each inner most child having its own row, that includes all of the parent fields in the row as well.
My nested JSON looks like this:
(Note this is just a small excerpt, there are hundreds of date/value pairs)
{
"test1": true,
"test2": [
{
"name_id": 12345,
"tags": [
{
"prod_id": 54321,
"history": [
{
"date": "Feb-2-2019",
"value": 6
},
{
"date": "Feb-3-2019",
"value": 5
},
{
"date": "Feb-4-2019",
"value": 4
}
The goal is to write to a csv where each row shows the values for the inner most field and all of its parents. (e.g, date, value, prod_id, name_id, test1). Basically creating a row for each date & value, with all of the parent field values included as well.
I started using this resource as a foundation, but still not exactly what I'm trying to accomplish:
How to Flatten Deeply Nested JSON Objects in Non-Recursive Elegant Python
I've tried tweaking this script but have not been able to come up with a solution. This seems like a relatively easy task, so maybe there's something I'm missing.
A lot of what you want to do is very data-format specific. Here's something using a function loosely derived from the "traditional recursive" solution shown in the linked resource you cited. It works fine with this data, which isn't that deeply nested, and it is simpler than the iterative approach also illustrated there.
The flatten_json() function returns a list of the leaf values found in the JSON object passed to it, in the order they are encountered.
Note this is Python 3 code.
from collections import OrderedDict
import csv
import json


def flatten_json(nested_json):
    """ Flatten values of JSON object dictionary. """
    name, out = [], []

    def flatten(obj):
        if isinstance(obj, dict):
            for key, value in obj.items():
                name.append(key)
                flatten(value)
        elif isinstance(obj, list):
            for index, item in enumerate(obj):
                name.append(str(index))
                flatten(item)
        else:
            out.append(obj)

    flatten(nested_json)
    return out


def grouper(iterable, n):
    """ Collect data in iterable into fixed-length chunks or blocks. """
    args = [iter(iterable)] * n
    return zip(*args)


if __name__ == '__main__':
    json_str = """
      {"test1": true,
       "test2": [
         {"name_id": 12345,
          "tags": [
            {"prod_id": 54321,
             "history": [
               {"date": "Feb-2-2019", "value": 6},
               {"date": "Feb-3-2019", "value": 5},
               {"date": "Feb-4-2019", "value": 4}
             ]
            }
          ]
         }
       ]
      }
    """

    # Flatten the json object into a list of values.
    json_obj = json.loads(json_str, object_pairs_hook=OrderedDict)
    flattened = flatten_json(json_obj)
    print('flattened:', flattened)

    # Create row dictionaries for each (date, value) pair at the end of the list
    # flattened values with all of the preceding fields repeated in each one.
    test1, name_id, prod_id = flattened[:3]
    rows = []
    for date, value in grouper(flattened[3:], 2):
        rows.append({'date': date, 'value': value,
                     'prod_id': prod_id, 'name_id': name_id, 'test1': test1})

    # Write rows to a csv file.
    filename = 'product_tests.csv'
    fieldnames = 'date', 'value', 'prod_id', 'name_id', 'test1'
    with open(filename, mode='w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames)
        writer.writeheader()  # Write csv file header row (optional).
        writer.writerows(rows)

    print('"{}" file written.'.format(filename))
Here's what it prints:
flattened: [True, 12345, 54321, 'Feb-2-2019', 6, 'Feb-3-2019', 5, 'Feb-4-2019', 4]
"product_tests.csv" file written.
And here's the contents of the product_tests.csv file produced:
date,value,prod_id,name_id,test1
Feb-2-2019,6,54321,12345,True
Feb-3-2019,5,54321,12345,True
Feb-4-2019,4,54321,12345,True
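The code above relies on knowing that exactly three parent values come first. As a rough alternative that does not hard-code those positions, the recursion can carry the parent fields down and emit one row per innermost object. This is only a sketch under stated assumptions: iter_rows is a hypothetical helper, nested lists are assumed to contain objects (lists of plain scalars are ignored), and a child key with the same name as a parent key would overwrite it.

```python
def iter_rows(obj, parents=None):
    """Yield one flat dict per innermost object, merged with the scalar
    fields of every enclosing object."""
    row = dict(parents or {})
    children = []
    for key, value in obj.items():
        if isinstance(value, list):
            # Assumption: only dict items in a list are treated as children.
            children.extend(item for item in value if isinstance(item, dict))
        elif isinstance(value, dict):
            children.append(value)
        else:
            row[key] = value
    if not children:
        yield row
    for child in children:
        yield from iter_rows(child, row)


data = {
    "test1": True,
    "test2": [
        {
            "name_id": 12345,
            "tags": [
                {
                    "prod_id": 54321,
                    "history": [
                        {"date": "Feb-2-2019", "value": 6},
                        {"date": "Feb-3-2019", "value": 5},
                        {"date": "Feb-4-2019", "value": 4},
                    ],
                }
            ],
        }
    ],
}

rows = list(iter_rows(data))
for r in rows:
    print(r)
```

Each resulting dict can be passed to csv.DictWriter in the same way as the rows list in the answer above.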

how to get specific value in python dictionary?

I call an API via Python and it returns this as the response:
{
"cast": [
{
"character": "Power",
"name": "George"
},
{
"character": "Max",
"job": "Sound",
"name": "Jash"
},
{
"character": "Miranda North",
"job": "Writer",
"name": "Rebecca"
}
]
}
I am trying to get the name Rebecca because I need to find the Writer.
So I wrote:
for person in cast:  # cast is the variable that keeps the whole code above, NOT inside the dict
    if person["job"] == "Writer":
        writer = person["name"]
But it gives me:
KeyError at search/15
u'job'
How can I get the value?
FULL CODE:
writer = ""
for person in api['cast']:
    if person.get('job') == 'Writer':
        writer = person.get('name')
return render(request, 'home.html', {
    'writer': writer
})
home.html:
<p>{{writer}}</p>
That's because not all elements in the list have the job key.
Change to:
for person in cast:  # whole code above
    if person.get('job') == 'Writer':
        writer = person.get('name')
One liner to find one writer.
writer = next((person for person in api['cast'] if person.get('job') == 'Writer'), None)
One liner to find all writers.
writers = [person for person in api['cast'] if person.get('job') == 'Writer']
Syntax for dictionary get() method:
dict.get(key[, default])
Parameters
key: the key to look up in the dictionary.
default: the value to return if the key does not exist (None when omitted).
You can pass an explicit default to get() for the case where the key doesn't exist; since the default is already None, the comparison with 'Writer' is safe either way.
>>> for person in api['cast']:
...     if person.get('job', '') == 'Writer':
...         writer = person.get('name')
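To make the difference between indexing and get() concrete, here is a small self-contained sketch using the response from the question. The api variable name follows the question's full code; everything else is plain dict behaviour.

```python
api = {
    "cast": [
        {"character": "Power", "name": "George"},
        {"character": "Max", "job": "Sound", "name": "Jash"},
        {"character": "Miranda North", "job": "Writer", "name": "Rebecca"},
    ]
}

first = api["cast"][0]          # this entry has no "job" key
print(first.get("job"))         # None: missing key, no exception
print(first.get("job", "n/a"))  # n/a: explicit default

try:
    first["job"]                # indexing a missing key raises
except KeyError as exc:
    print("KeyError:", exc)     # KeyError: 'job'

# Name of the first writer, or None if there is none.
writer = next(
    (p["name"] for p in api["cast"] if p.get("job") == "Writer"),
    None,
)
print(writer)  # Rebecca
```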
person.get(u"job") == "Writer"
for person in cast["cast"]:
    # cast is the variable that keeps the whole code above, NOT inside the dict
    if person.get("job") == "Writer":
        writer = person["name"]
Try this.
cast["cast"] is the value of the key "cast", which in turn is a list of dicts; the loop goes through each dictionary as person.
