dir = {"sample": [
    { "key1":"data1" },
    { "key1":"data2" },
    { "key2":"data3" },
    { "key2":"data4" }
]
}
With my code:
import json

listKey1 = []
listKey2 = []
with open(dir) as json_file:
    data = json.load(json_file)
    for p in data['sample']:
        key1data = p['key1']
        print("key1: " + key1data)
        listKey1.append(key1data)
        key2data = p['key2']
        print("key2: " + key2data)
        listKey2.append(key2data)
I'm trying to store the data under key1 and key2 in listKey1 and listKey2, but I am getting the error:
KeyError: 'key1'
KeyError: 'key2'
As you can see in my file, both key1 and key2 are present.
Here is the modified code. Just check if the key exists before using it.
import json

listKey1 = []
listKey2 = []
with open(dir) as json_file:
    data = json.load(json_file)
    for p in data['sample']:
        if "key1" in p:  # check whether the key exists in the current element
            key1data = p['key1']
            print("key1: " + key1data)
            listKey1.append(key1data)
        if "key2" in p:  # check whether the key exists in the current element
            key2data = p['key2']
            print("key2: " + key2data)
            listKey2.append(key2data)
Here is the output I get:
key1: data1
key1: data2
key2: data3
key2: data4
Here is why the KeyError occurs: each element of the list has a different key depending on its index.
index [0]: { "key1":"data1" } - the key is "key1"
index [1]: { "key1":"data2" } - the key is "key1"
index [2]: { "key2":"data3" } - the key is "key2"
index [3]: { "key2":"data4" } - the key is "key2"
The issue is that, as you loop, each p has either key1 or key2, never both.
So for an element that has key1, p['key1'] works and its data is printed, but p['key2'] then raises a KeyError.
And for an element that has key2, p['key1'] raises a KeyError before p['key2'] is ever reached.
A good option is to use the get() method: if the key is present, it returns the value; otherwise it returns the default you pass (here an empty string, which is then printed and appended to the list).
Try the code below.
import json

listKey1 = []
listKey2 = []
with open(dir) as json_file:
    data = json.load(json_file)
    for p in data['sample']:
        key1data = p.get('key1', "")
        print("key1: " + key1data)
        listKey1.append(key1data)
        key2data = p.get('key2', "")
        print("key2: " + key2data)
        listKey2.append(key2data)
You can iterate over the elements, compare by the key and store the data accordingly:
dd = {
    "sample":
    [
        { "key1":"data1" },
        { "key1":"data2" },
        { "key2":"data3" },
        { "key2":"data4" }
    ]
}

key1data = []
key2data = []
for elem in dd['sample']:
    for key, val in elem.items():
        if key == "key1":
            key1data.append(elem.get("key1"))
        else:
            key2data.append(elem.get("key2"))

print(key1data)
print(key2data)
OUTPUT:
['data1', 'data2']
['data3', 'data4']
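If the key names are not known in advance, a collections.defaultdict can group the values by key without checking each name explicitly. A minimal sketch, assuming the file has the structure shown in the question (the data is inlined here instead of being read with json.load):

```python
from collections import defaultdict

# Same shape as the question's file; normally this comes from json.load(json_file)
data = {
    "sample": [
        {"key1": "data1"},
        {"key1": "data2"},
        {"key2": "data3"},
        {"key2": "data4"},
    ]
}

# Map every key to the list of values found under it, whatever the key is called
values_by_key = defaultdict(list)
for entry in data["sample"]:
    for key, value in entry.items():
        values_by_key[key].append(value)

listKey1 = values_by_key["key1"]
listKey2 = values_by_key["key2"]
print(listKey1)  # ['data1', 'data2']
print(listKey2)  # ['data3', 'data4']
```

Unlike the get() version, no empty strings end up in the lists for elements that lack a key.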
Related
I have tried to use the online Jsonify It tool, which can create nested JSON data from my data, but I can't seem to get it to work. I have also tried the Python code from other posts, but it does not seem to work either. If you know an easier method than using Python, that would be good.
Here is my .CSV data:
ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,0
0,Ladybridge High School,02/11/2019,Maths,05:30,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
0,Ladybridge High School,02/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
1,Loreto Sixth Form,02/11/2019,Maths,05:30,06:45
1,Loreto Sixth Form,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,02/11/2019,Economics,11:58,12:40
This is the nested JSON structure I would like:
{
"Timetable" : [ {
"Date" : {
"01-11-2019" : {
"Maths" : {
"Start" : "05:28",
"Finish" : "06:45"
},
"Economics" : {
"Start" : "11:58",
"Finish" : "12:40"
}
},
"02-11-2019" : {
"Maths" : {
"Start" : "05:30",
"Finish" : "06:45"
},
"Economics" : {
"Start" : "11:58",
"Finish" : "12:40"
}
}
},
"Name" : "Ladybridge High School"
}, {
"Date" : {
"01-11-2019" : {
"Maths" : {
"Start" : "05:28",
"Finish" : "06:45"
},
"Economics" : {
"Start" : "11:58",
"Finish" : "12:40"
}
},
"02-11-2019" : {
"Maths" : {
"Start" : "05:30",
"Finish" : "06:45"
},
"Economics" : {
"Start" : "11:58",
"Finish" : "12:40"
}
}
},
"Name" : "Loreto Sixth Form"
} ]
}
Something like this?
[EDIT]
I refactored it to handle arbitrary top-level keys for each entry in the timetable. I also made it first create a dict and then convert the dict to a list so that it can run in O(N) time, in case the input is very large.
import csv

timetable = {}
with open('data.csv') as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]

for row in csv_data:
    if not timetable.get(row["ID"]):
        timetable[row["ID"]] = {"ID": row["ID"], "Date": {}}
    for k in row.keys():
        # Date has to be handled as a special case
        if k == "Date":
            # setdefault keeps subjects already stored for this date;
            # assigning {} here would drop them
            timetable[row["ID"]]["Date"].setdefault(row["Date"], {})
            timetable[row["ID"]]["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
        # Ignore these keys because they are only for 'Date'
        elif k == "Start" or k == "Finish" or k == "Subject":
            continue
        # Use everything else
        else:
            timetable[row["ID"]][k] = row[k]

timetable = {"Timetable": [v for k, v in timetable.items()]}
An improvement to the above answer to nest the ID before the name and date:
import csv

timetable = {"Timetable": []}
print(timetable)
with open("C:/Users/kspv914/Downloads/data.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]

# Collect the unique school names
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)

for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})

for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            # setdefault keeps subjects already stored for this date
            entry["Date"].setdefault(row["Date"], {})
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
print(timetable)
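For comparison, here is a minimal sketch that builds the exact structure from the question in a single pass, grouping rows by ID with dict.setdefault. It assumes the CSV columns shown above, inlines the sample rows with io.StringIO instead of opening a file, and converts dates such as 01/11/2019 to 01-11-2019 to match the desired keys; build_timetable is a hypothetical helper name.

```python
import csv
import io
import json

# Sample rows copied from the question; in practice pass open('data.csv') instead
CSV_TEXT = """ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,0
0,Ladybridge High School,02/11/2019,Maths,05:30,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
0,Ladybridge High School,02/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
1,Loreto Sixth Form,02/11/2019,Maths,05:30,06:45
1,Loreto Sixth Form,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,02/11/2019,Economics,11:58,12:40
"""


def build_timetable(csv_file):
    """Group CSV rows by ID into the nested {"Timetable": [...]} structure."""
    schools = {}  # ID -> entry; dicts keep first-seen order (Python 3.7+)
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        entry = schools.setdefault(row["ID"], {"Date": {}, "Name": row["Name"]})
        # The desired output uses dashes in dates, e.g. "01-11-2019"
        date_key = row["Date"].replace("/", "-")
        subjects = entry["Date"].setdefault(date_key, {})
        subjects[row["Subject"]] = {"Start": row["Start"], "Finish": row["Finish"]}
    return {"Timetable": list(schools.values())}


timetable = build_timetable(io.StringIO(CSV_TEXT))
print(json.dumps(timetable, indent=2))
```

Because each school is looked up in a dict rather than by scanning a list, this stays linear in the number of rows.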
I have a JSON file as shown below:
**test.json**
{
"header1" :
{
"header1_body1":
{
"some_key":"some_value",
.......................
},
"header1_body2":
{
"some_key":"some_value",
.......................
}
},
"header2":
{
"header2_body1":
{
"some_key":"some_value",
.......................
},
"header2_body2":
{
"some_key":"some_value",
.......................
}
}
}
I would like to group the JSON content into lists as below:
header1 = ['header1_body1','header1_body2']
header2 = ['header2_body1','header2_body2']
There can be any number of headers (header1, header2, ... header n), so the lists have to be created dynamically, each containing that header's keys as shown above.
How can I achieve this?
What's the best way to approach it?
SOLUTION:
import json

with open('test.json') as json_data:
    d = json.load(json_data)
    for k, v in d.items():
        if k == "header1" or k == "header2":
            # Create module-level variables named header1, header2
            globals()[k] = list(d[k].keys())
Now header1 and header2 can be accessed as lists:
for i in header1:
    print(i)
Assuming you have read the JSON into a variable d (for example with json.load), you can iterate over the keys (sorted, if you like) and build each list from the keys of the current value:
for key in sorted(d.keys()):
    l = [x for x in sorted(d[key].keys())]  # using list comprehension
    print(key + ' = ' + str(l))
First, fix your JSON structure (the ....... placeholders are not valid JSON):
{
"header1" :
{
"header1_body1":
{
"some_key":"some_value"
},
"header1_body2":
{
"some_key":"some_value"
}
},
"header2":
{
"header2_body1":
{
"some_key":"some_value"
},
"header2_body2":
{
"some_key":"some_value"
}
}
}
Then load it and create the lists:
import json

with open('test.json') as f:
    dictdump = json.load(f)

header = []
for key, value in dictdump.items():
    header.append(list(value.keys()))
for header_num in range(0, len(header)):
    print("header{} : {}".format(header_num + 1, header[header_num]))
Gives:
header1 : ['header1_body1', 'header1_body2']
header2 : ['header2_body1', 'header2_body2']
Once you load your JSON, you can get the list you want for any key with something like the following (the headers variable below is a placeholder for your loaded JSON). You don't need to convert it to a list to use it as an iterable, but I wrapped it in list(...) to match the output in your question.
list(headers['header1'].keys())
If you need to actually store the list of keys for each of your "header" dicts in some sort of accessible format, then you could create another dictionary that contains the lists you want. For example:
import json
data = """{
"header1" : {
"header1_body1": {
"some_key":"some_value"
},
"header1_body2": {
"some_key":"some_value"
}
},
"header2": {
"header2_body1": {
"some_key":"some_value"
},
"header2_body2": {
"some_key":"some_value"
}
}
}"""
headers = json.loads(data)
# get the list of keys for a specific header
header = list(headers['header1'].keys())
print(header)
# ['header1_body1', 'header1_body2']
# if you really want to store them in another dict
results = {h[0]: list(h[1].keys()) for h in headers.items()}
print(results)
# OUTPUT
# {'header1': ['header1_body1', 'header1_body2'], 'header2': ['header2_body1', 'header2_body2']}
You can use recursion:
d = {'header1': {'header1_body1': {'some_key': 'some_value'}, 'header1_body2': {'some_key': 'some_value'}}, 'header2': {'header2_body1': {'some_key': 'some_value'}, 'header2_body2': {'some_key': 'some_value'}}}

def flatten(_d):
    for a, b in _d.items():
        yield a
        if isinstance(b, dict):
            yield from flatten(b)

new_results = {a: [i for i in flatten(b) if i.startswith(a)] for a, b in d.items()}
Output:
{'header1': ['header1_body1', 'header1_body2'], 'header2': ['header2_body1', 'header2_body2']}
Background
For some background, I'm trying to create a tool that converts worksheets into API calls using Python 3.5.
To convert the table cells to the schema needed for the API call, I've started down the path of using JavaScript-like syntax for the headers used in the spreadsheet, e.g.:
Worksheet Header (string)
dict.list[0].id
Python Dictionary
{
    "dict": {
        "list": [
            {"id": "my cell value"}
        ]
    }
}
It's also possible that the header schema could have nested arrays/dicts:
one.two[0].three[0].four.five[0].six
And I also need to append to the object after it has been created as I go through each header.
What I've tried
add_branch
Based on https://stackoverflow.com/a/47276490/2903486, I am able to set up nested dictionaries using values like one.two.three.four, and I'm able to append to the existing dictionary as I go through the rows, but I've been unable to add support for arrays:
def add_branch(tree, vector, value):
    key = vector[0]
    tree[key] = value \
        if len(vector) == 1 \
        else add_branch(tree[key] if key in tree else {},
                        vector[1:],
                        value)
    return tree

file = Worksheet(filePath, sheet).readRow()
rowList = []
for row in file:
    rowObj = {}
    for colName, rowValue in row.items():
        rowObj.update(add_branch(rowObj, colName.split("."), rowValue))
    rowList.append(rowObj)
return rowList
My own version of add_branch
import re, json

def branch(tree, vector, value):
    """
    Used to convert JS style notation (e.g dict.another.array[0].id) to a python object
    Originally based on https://stackoverflow.com/a/47276490/2903486
    """

    # Convert Boolean
    if isinstance(value, str):
        value = value.strip()

        if value.lower() in ['true', 'false']:
            value = True if value.lower() == "true" else False

    # Convert JSON
    try:
        value = json.loads(value)
    except:
        pass

    key = vector[0]
    arr = re.search(r'\[([0-9]+)\]', key)
    if arr:
        arr = arr.group(0)
        key = key.replace(arr, '')
        arr = arr.replace('[', '').replace(']', '')

        newArray = False
        if key not in tree:
            tree[key] = []
            tree[key].append(value \
                if len(vector) == 1 \
                else branch({} if key in tree else {},
                            vector[1:],
                            value))
        else:
            isInArray = False
            for x in tree[key]:
                if x.get(vector[1:][0], False):
                    isInArray = x[vector[1:][0]]

            if isInArray:
                tree[key].append(value \
                    if len(vector) == 1 \
                    else branch({} if key in tree else {},
                                vector[1:],
                                value))
            else:
                tree[key].append(value \
                    if len(vector) == 1 \
                    else branch({} if key in tree else {},
                                vector[1:],
                                value))

        if len(vector) == 1 and len(tree[key]) == 1:
            tree[key] = value.split(",")
    else:
        tree[key] = value \
            if len(vector) == 1 \
            else branch(tree[key] if key in tree else {},
                        vector[1:],
                        value)

    return tree
What still needs help
My branch solution actually works pretty well now, after adding a few things, but I'm wondering whether I'm doing something wrong or messy here, or whether there's a better way to handle editing nested arrays (my attempt starts in the if isInArray section of the code).
I'd expect these two headers to edit the last array, but instead I end up creating a duplicate dictionary on the first array:
file = [{
    "one.array[0].dict.arrOne[0]": "1,2,3",
    "one.array[0].dict.arrTwo[0]": "4,5,6"
}]
rowList = []
for row in file:
    rowObj = {}
    for colName, rowValue in row.items():
        rowObj.update(add_branch(rowObj, colName.split("."), rowValue))
    rowList.append(rowObj)
return rowList
Outputs:
[
{
"one": {
"array": [
{
"dict": {
"arrOne": [
"1",
"2",
"3"
]
}
},
{
"dict": {
"arrTwo": [
"4",
"5",
"6"
]
}
}
]
}
}
]
Instead of:
[
{
"one": {
"array": [
{
"dict": {
"arrOne": [
"1",
"2",
"3"
],
"arrTwo": [
"4",
"5",
"6"
]
}
}
]
}
}
]
I'm not sure whether this solution has any caveats, but it appears to work for the use cases I'm throwing at it:
import json, re

def build_job():
    def branch(tree, vector, value):
        # Originally based on https://stackoverflow.com/a/47276490/2903486

        # Convert Boolean
        if isinstance(value, str):
            value = value.strip()

            if value.lower() in ['true', 'false']:
                value = True if value.lower() == "true" else False

        # Convert JSON
        try:
            value = json.loads(value)
        except:
            pass

        key = vector[0]
        arr = re.search(r'\[([0-9]+)\]', key)

        if arr:
            # Get the index of the array, and remove it from the key name
            arr = arr.group(0)
            key = key.replace(arr, '')
            arr = int(arr.replace('[', '').replace(']', ''))

            if key not in tree:
                # If we don't have an array already, turn the dict from the previous
                # recursion into an array and append to it
                tree[key] = []
                tree[key].append(value \
                    if len(vector) == 1 \
                    else branch({} if key in tree else {},
                                vector[1:],
                                value))
            else:
                # Check to see if we are inside of an existing array here
                isInArray = False
                for i in range(len(tree[key])):
                    if tree[key][i].get(vector[1:][0], False):
                        isInArray = tree[key][i][vector[1:][0]]

                if isInArray and arr < len(tree[key]) \
                        and isinstance(tree[key][arr], list):
                    # Respond accordingly by appending or updating the value
                    tree[key][arr].append(value \
                        if len(vector) == 1 \
                        else branch(tree[key] if key in tree else {},
                                    vector[1:],
                                    value))
                else:
                    # Make sure we have an index to attach the requested array to
                    while arr >= len(tree[key]):
                        tree[key].append({})

                    # Update the existing array with a dict
                    tree[key][arr].update(value \
                        if len(vector) == 1 \
                        else branch(tree[key][arr] if key in tree else {},
                                    vector[1:],
                                    value))

            # Turn comma-delimited values into lists
            if len(vector) == 1 and len(tree[key]) == 1:
                tree[key] = value.split(",")
        else:
            # Add dictionaries together
            tree.update({key: value \
                if len(vector) == 1 \
                else branch(tree[key] if key in tree else {},
                            vector[1:],
                            value)})
        return tree

    file = [{
        "one.array[0].dict.dont-worry-about-me": "some value",
        "one.array[0].dict.arrOne[0]": "1,2,3",
        "one.array[0].dict.arrTwo[1]": "4,5,6",
        "one.array[1].x.y[0].z[0].id": "789"
    }]
    rowList = []
    for row in file:
        rowObj = {}
        for colName, rowValue in row.items():
            rowObj.update(branch(rowObj, colName.split("."), rowValue))
        rowList.append(rowObj)
    return rowList

print(json.dumps(build_job(), indent=4))
Result:
[
{
"one": {
"array": [
{
"dict": {
"dont-worry-about-me": "some value",
"arrOne": [
"1",
"2",
"3"
],
"arrTwo": [
"4",
"5",
"6"
]
}
},
{
"x": {
"y": [
{
"z": [
{
"id": 789
}
]
}
]
}
}
]
}
}
]
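One way to keep the array handling out of the recursion is to parse each header into a flat list of tokens first (names and integer indexes) and then walk down, creating dicts or lists only where they are missing, so that two headers sharing the prefix one.array[0] land in the same dict. This is a sketch under stated assumptions, not a drop-in replacement for branch: parse_path, set_path and cell_value are hypothetical names, a [n] suffix is treated as a plain list index, so the headers below drop the trailing [0] on arrOne/arrTwo, and the question's conventions (JSON literals decoded, comma-separated cells turned into lists) live in cell_value.

```python
import json
import re

# A path segment is a name optionally followed by [index] parts, e.g. "array[0]".
SEGMENT_RE = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def parse_path(path):
    """Split 'one.array[0].dict.id' into ['one', 'array', 0, 'dict', 'id']."""
    tokens = []
    for segment in path.split("."):
        match = SEGMENT_RE.match(segment)
        if not match:
            raise ValueError("bad path segment: %r" % segment)
        tokens.append(match.group(1))
        tokens.extend(int(i) for i in re.findall(r"\[(\d+)\]", match.group(2)))
    return tokens


def set_path(tree, path, value):
    """Store value at path inside tree, creating dicts and lists as needed."""
    tokens = parse_path(path)
    node = tree
    for token, next_token in zip(tokens, tokens[1:]):
        # The next token decides which container this level must hold.
        empty = [] if isinstance(next_token, int) else {}
        if isinstance(token, int):
            while len(node) <= token:  # pad the list so the index exists
                node.append(None)
            if node[token] is None:
                node[token] = empty
        else:
            node.setdefault(token, empty)  # reuse an existing container
        node = node[token]
    last = tokens[-1]
    if isinstance(last, int):
        while len(node) <= last:
            node.append(None)
    node[last] = value
    return tree


def cell_value(raw):
    """Hypothetical cell conversion: JSON literals (numbers, true/false) are
    decoded, comma-separated text becomes a list, anything else stays a string."""
    raw = raw.strip()
    try:
        return json.loads(raw)
    except ValueError:
        pass
    return raw.split(",") if "," in raw else raw


row = {
    "one.array[0].dict.dont-worry-about-me": "some value",
    "one.array[0].dict.arrOne": "1,2,3",
    "one.array[0].dict.arrTwo": "4,5,6",
    "one.array[1].x.y[0].z[0].id": "789",
}
result = {}
for column, raw in row.items():
    set_path(result, column, cell_value(raw))
print(json.dumps(result, indent=4))
```

Separating parsing from building means the "am I inside an existing array?" question reduces to "does this index already hold a container?", which set_path answers with a single check.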
I am looking to write a recursive function:
arguments: d, dictionary
result: list of dictionaries
def expand_dictionary(d):
    return []
The function should recursively walk through a dictionary and flatten nested objects by joining their keys with an _. In addition, it should expand nested lists into separate entries in the result list, each including its parent label.
Think of creating a relational model from a document.
Here is an example input and output:
original_object = {
"id" : 1,
"name" : {
"first" : "Alice",
"last" : "Sample"
},
"cities" : [
{
"id" : 55,
"name" : "New York"
},
{
"id" : 60,
"name" : "Chicago"
}
],
"teachers" : [
{
"id" : 2,
"name" : "Bob",
"classes" : [
{
"id" : 13,
"name" : "math"
},
{
"id" : 16,
"name" : "spanish"
}
]
}
]
}
expected_output = [
{
"id" : 1,
"name_first" : "Alice",
"name_last" : "Sample"
},
{
"_parent_object" : "cities",
"id" : 55,
"name" : "New York"
},
{
"_parent_object" : "cities",
"id" : 60,
"name" : "Chicago"
},
{
"_parent_object" :"teachers",
"id" : 2,
"name" : "Bob"
},
{
"_parent_object" :"teachers_classes",
"id" : 13,
"name" : "math"
},
{
"_parent_object" :"teachers_classes",
"id" : 16,
"name" : "spanish"
}
]
The code currently used for flattening is:
def flatten_dictionary(d):
    def expand(key, value):
        if isinstance(value, dict):
            return [(key + '_' + k, v) for k, v in flatten_dictionary(value).items()]
        else:
            # If value is null or empty array don't include it
            if value is None or value == [] or value == '':
                return []
            return [(key, value)]

    items = [item for k, v in d.items() for item in expand(k, v)]
    return dict(items)
This will do it (it reuses flatten_dictionary from the question):
def expand_dictionary(d, name=None, l=None):
    obj = {}
    if l is None:
        l = [obj]
    else:
        l.append(obj)

    prefix = (name + '_' if name else '')
    if prefix:
        obj['_parent_object'] = name

    for i, v in d.items():
        if isinstance(v, list):
            # Each list element becomes its own row, labelled with the parent path
            for x in v:
                expand_dictionary(x, prefix + i, l)
        elif isinstance(v, dict):
            obj.update(flatten_dictionary({i: v}))
        else:
            obj[i] = v
    return l
After working through it a bit, here is what I have come up with. It can probably be optimized significantly. Based on @paulo-scardine's comment, I added the parent's primary key to preserve the relational model. I would love to hear thoughts on optimization.
def expand_dictionary(original_object, object_name, objects=None):
    if objects is None:
        objects = []

    def flatten_dictionary(dictionary):
        def expand(key, value):
            if isinstance(value, dict):
                return [(key + '_' + k, v) for k, v in flatten_dictionary(value).items()]
            else:
                # If value is null or empty array don't include it
                if value is None or value == [] or value == '':
                    return []
                return [(key, value)]

        items = [item for k, v in dictionary.items() for item in expand(k, v)]
        return dict(items)

    original_object_root = flatten_dictionary(original_object)
    original_object_root['_meta_object_name'] = object_name

    # Nested dicts were already flattened above, so only lists need expanding
    for key, value in original_object_root.copy().items():
        if isinstance(value, list):
            original_object_root.pop(key)
            for nested_object in value:
                # Link each child row back to its parent's primary key
                nested_object['_meta_parent_foreign_key'] = original_object_root['id']
                nested_object['_meta_object_name'] = object_name + "_" + key
                expand_dictionary(nested_object, object_name + "_" + key, objects)

    objects.append(original_object_root)
    return objects
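If it helps, here is a minimal Python 3 sketch that does the flattening and the list expansion in one function without mutating the input; expand_rows is a hypothetical name. It also picks up lists that sit inside nested dicts (labelled with the joined path, e.g. meta_tags), which flatten_dictionary alone would leave behind as list values. It assumes list elements are dicts and, unlike the answer above, does not add a parent foreign key.

```python
def expand_rows(obj, label=None, rows=None):
    """Flatten obj into a list of row dicts (hypothetical helper).

    Nested dicts are flattened into '_'-joined keys; every element of a list
    (assumed to be a dict) becomes its own row tagged with _parent_object.
    None and '' values are skipped, as in flatten_dictionary.
    """
    if rows is None:
        rows = []
    row = {}
    if label is not None:
        row["_parent_object"] = label
    rows.append(row)
    children = []  # (joined key, list) pairs, expanded after this row is filled

    def walk(d, prefix):
        for key, value in d.items():
            name = prefix + key
            if isinstance(value, dict):
                walk(value, name + "_")
            elif isinstance(value, list):
                children.append((name, value))
            elif value is not None and value != "":
                row[name] = value

    walk(obj, "")
    for name, items in children:
        child_label = label + "_" + name if label else name
        for item in items:
            expand_rows(item, child_label, rows)
    return rows


original_object = {
    "id": 1,
    "name": {"first": "Alice", "last": "Sample"},
    "cities": [{"id": 55, "name": "New York"}, {"id": 60, "name": "Chicago"}],
    "teachers": [
        {
            "id": 2,
            "name": "Bob",
            "classes": [{"id": 13, "name": "math"}, {"id": 16, "name": "spanish"}],
        }
    ],
}

for row in expand_rows(original_object):
    print(row)
```

Run on the question's sample input, this prints the six rows of expected_output in the same order. Rows are appended parent-first, so each child row comes after the row it belongs to.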
I have this code:
for product_code in product_codes:
    product_categories = []
    product_belongs_to = []
    get_categories = """SELECT * FROM stock_groups_styles_map WHERE stock_groups_styles_map.style ='%s'""" % (product_code,)
    for category in sql_query(get_categories):
        if {product_code: category[1]} in product_categories:
            pass
        else:
            product_categories.append({product_code: category[1]})

    for category in product_categories:
        category_group = get_group(category.values()[0])
        if category_group:
            category_name = category_group.replace("-", " ").title()
            if category_name:
                if category_name == "Vests":
                    product_belongs_to.append(get_category_ids("Tanks"))
                else:
                    cat_value = get_category_ids(category_name)
                    if cat_value:
                        cat_id = get_category_ids(category_name)
                        product_belongs_to.append(cat_id[0])

    ccc_products = {
        '_id': ObjectId(),
        'collectionId': collectionId,
        'categoryIds': product_belongs_to,
        'visible': 'true',
    }

    products.save(ccc_products)
When I look at the MongoDB collection, I have:
{
"_id" : ObjectId("53aaa4e1d901f2430f25a6ba"),
"collectionId" : ObjectId("53aaa4d6d901f2430f25a604"),
"visible" : "true",
"categoryIds" : [
ObjectId("53aaa4d6d901f2430f25a5fc"),
ObjectId("53aaa4d3d901f2430f25a5f9")
]
}
This is correct, but if I only have one item in the product_belongs_to list, I get:
{
"_id" : ObjectId("53aaa4e1d901f2430f25a6bd"),
"collectionId" : ObjectId("53aaa4d6d901f2430f25a604"),
"visible" : "true",
"categoryIds" : [
[
ObjectId("53aaa4d6d901f2430f25a5fe")
]
]
}
Basically, "categoryIds" is an array containing an array.
The only way I've found to fix this is the following:
if len(product_belongs_to) == 1:
    product_belongs_to = product_belongs_to[0]
What am I missing?
Any advice is much appreciated.
I suspect that this line is the problematic one:
product_belongs_to.append(get_category_ids("Tanks"))
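get_category_ids returns a list, and you're appending that whole list to product_belongs_to as a single element (in the else branch you append cat_id[0] instead, which is why that path stays flat).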
You probably wanted to merge the results instead, so that they contain unique values:
product_belongs_to = list(set(product_belongs_to + get_category_ids("Tanks")))
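The difference between append() and extend() is the crux here. A minimal sketch, using plain strings in place of ObjectIds and a hypothetical stub for get_category_ids that, as the answer says, returns a list:

```python
def get_category_ids(name):
    # Hypothetical stub: like the question's helper, it returns a *list* of IDs.
    fake_ids = {"Tanks": ["tanks-id"], "Tees": ["tees-id"]}
    return fake_ids.get(name, [])


# append() stores the returned list as ONE element -> a nested list
nested = []
nested.append(get_category_ids("Tanks"))
print(nested)  # [['tanks-id']]

# extend() (or +=) adds the list's elements -> a flat list
flat = []
flat.extend(get_category_ids("Tanks"))
flat.extend(get_category_ids("Tees"))
flat.extend(get_category_ids("Tanks"))  # duplicate on purpose
print(flat)  # ['tanks-id', 'tees-id', 'tanks-id']

# dict.fromkeys() drops duplicates but, unlike set(), keeps first-seen order
unique = list(dict.fromkeys(flat))
print(unique)  # ['tanks-id', 'tees-id']
```

list(set(...)) also removes duplicates, but the resulting order is arbitrary; dict.fromkeys preserves insertion order on Python 3.7+, which keeps categoryIds stable between runs.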