Nested dictionary value from key path - python

I want to get a value from a nested dictionary using a key path. Here is the dict:
json = {
"app": {
"Garden": {
"Flowers": {
"Red flower": "Rose",
"White Flower": "Jasmine",
"Yellow Flower": "Marigold"
}
},
"Fruits": {
"Yellow fruit": "Mango",
"Green fruit": "Guava",
"White Flower": "groovy"
},
"Trees": {
"label": {
"Yellow fruit": "Pumpkin",
"White Flower": "Bogan"
}
}
}
}
The input parameter to the method is a dot-separated key path. For the key path "app.Garden.Flowers.White Flower" I need to print 'Jasmine'. My code so far:
import json

with open('data.json') as data_file:
    j = json.load(data_file)

def find(element, JSON):
    paths = element.split(".")
    # print JSON[paths[0]][paths[1]][paths[2]][paths[3]]
    for i in range(0, len(paths)):
        data = JSON[paths[i]]
        # data = data[paths[i+1]]
    print(data)

find('app.Garden.Flowers.White Flower', j)

This is an instance of a fold. You can either write it concisely like this:
from functools import reduce
import operator

def find(element, json):
    return reduce(operator.getitem, element.split('.'), json)
Or more Pythonically (because reduce() is frowned upon due to poor readability) like this:
def find(element, json):
    keys = element.split('.')
    rv = json
    for key in keys:
        rv = rv[key]
    return rv
j = {"app": {
"Garden": {
"Flowers": {
"Red flower": "Rose",
"White Flower": "Jasmine",
"Yellow Flower": "Marigold"
}
},
"Fruits": {
"Yellow fruit": "Mango",
"Green fruit": "Guava",
"White Flower": "groovy"
},
"Trees": {
"label": {
"Yellow fruit": "Pumpkin",
"White Flower": "Bogan"
}
}
}}
print(find('app.Garden.Flowers.White Flower', j))
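To see why this counts as a fold, it may help to unroll the reduce() call by hand. The sketch below uses a small made-up dict (not the data from the question) and checks that reduce(operator.getitem, keys, d) is the same as nesting one getitem call per key:

```python
from functools import reduce
import operator

# Small sample dict, made up for illustration
d = {"a": {"b": {"c": 42}}}
keys = "a.b.c".split(".")

# reduce() threads an accumulator through the keys:
# it starts with d, then sets acc = getitem(acc, key) for each key in turn
folded = reduce(operator.getitem, keys, d)

# The same computation written out explicitly, innermost call first
unrolled = operator.getitem(operator.getitem(operator.getitem(d, "a"), "b"), "c")

print(folded, unrolled)  # 42 42
```

The loop version above does exactly the same thing, with rv playing the role of the accumulator.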

I was in a similar situation and found the dpath module. Nice and easy.

I suggest using python-benedict, a Python dict subclass with full keypath support and many utility methods.
You just need to cast your existing dict:
from benedict import benedict

d = benedict(json)
# now your keys support dotted keypaths
print(d['app.Garden.Flowers.White Flower'])
Here are the library and the documentation:
https://github.com/fabiocaccamo/python-benedict
Note: I am the author of this project

Your code heavily depends on no dots ever occurring in the key names, which you might be able to control, but not necessarily.
I would go for a generic solution using a list of element names and then generate the list, e.g. by splitting a dotted string of key names:
class ExtendedDict(dict):
    """changes a normal dict into one where you can hand a list
    as first argument to .get() and it will do a recursive lookup
    result = x.get(['a', 'b', 'c'], default_val)
    """
    def multi_level_get(self, key, default=None):
        if not isinstance(key, list):
            # call dict.get directly: self.get is this method, so it would recurse forever
            return dict.get(self, key, default)
        # assume that the key is a list of recursively accessible dicts
        def get_one_level(key_list, level, d):
            if level >= len(key_list):
                if level > len(key_list):
                    raise IndexError
                return d[key_list[level-1]]
            return get_one_level(key_list, level+1, d[key_list[level-1]])

        try:
            return get_one_level(key, 1, self)
        except KeyError:
            return default

    get = multi_level_get  # if you delete this, you can still use multi_level_get
Once you have this class it is easy to just transform your dict and get "Jasmine":
json = {
"app": {
"Garden": {
"Flowers": {
"Red flower": "Rose",
"White Flower": "Jasmine",
"Yellow Flower": "Marigold"
}
},
"Fruits": {
"Yellow fruit": "Mango",
"Green fruit": "Guava",
"White Flower": "groovy"
},
"Trees": {
"label": {
"Yellow fruit": "Pumpkin",
"White Flower": "Bogan"
}
}
}
}
j = ExtendedDict(json)
print(j.get('app.Garden.Flowers.White Flower'.split('.')))
will get you:
Jasmine
Like with a normal get() from a dict, you get None if the key (list) you specified doesn't exist anywhere in the tree, and you can specify a second parameter as the return value instead of None.

Very close. You need to (as you had in your comment) walk down through the main JSON object one level at a time. You can accomplish that by storing the result of the outermost key/value lookup, then using that to get the next key/value, etc., until you run out of paths.
def find(element, JSON):
    paths = element.split(".")
    data = JSON
    for i in range(0, len(paths)):
        data = data[paths[i]]
    print(data)
You still need to watch out for KeyErrors though.
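One way to deal with those KeyErrors is to catch them and return a default, the way dict.get() does. This is a sketch: the default parameter, returning instead of printing, and also catching TypeError (raised when the path runs into a non-dict value such as a string) are my own additions, not part of the answer above:

```python
def find(element, JSON, default=None):
    """Follow a dotted key path through nested dicts; return default if the path is missing."""
    data = JSON
    try:
        for key in element.split("."):
            data = data[key]
    except (KeyError, TypeError):
        # KeyError: a key is missing at some level
        # TypeError: an intermediate value (e.g. a string) can't be indexed by a string key
        return default
    return data

j = {"app": {"Garden": {"Flowers": {"White Flower": "Jasmine"}}}}
print(find("app.Garden.Flowers.White Flower", j))        # Jasmine
print(find("app.Garden.Trees", j, default="not found"))  # not found
```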

One-liner:
from functools import reduce
a = {"foo" : { "bar" : "blah" }}
path = "foo.bar"
reduce(lambda acc,i: acc[i], path.split('.'), a)

Option 1: pyats library from Cisco [it's a C extension]
It's quick and super fast (measure it with timeit if required).
JavaScript-ish usage [bracket lookup, dotted lookup, combined lookup].
A dotted lookup of a missing key raises AttributeError; a bracket (ordinary Python dict) lookup raises KeyError.
pip install pyats pyats-datastructures pyats-utils
from pyats.datastructures import NestedAttrDict
item = {"specifications": {"os": {"value": "Android"}}}
path = "specifications.os.value"
x = NestedAttrDict(item)
print(x[path])  # prints Android
print(x['specifications'].os.value)  # prints Android
print(x['specifications']['os']['value'])  # prints Android
print(x['specifications'].os.value1)  # raises AttributeError
Option 2: pyats.utils chainget
Super fast (measure it with timeit if required).
from pyats.utils import utils
item = {"specifications": {"os": {"value": "Android"}}}
path = "specifications.os.value"
path1 = "specifications.os.value1"
print(utils.chainget(item, path))  # prints Android (string version)
print(utils.chainget(item, path.split('.')))  # prints Android (list version)
print(utils.chainget(item, path1))  # raises KeyError
Option 3: Python without an external library
Possibly faster than the lambda version, since dict.get is a built-in method (measure it with timeit if it matters).
No separate error handling is needed when the final key is missing; you simply get None. Note, however, that a missing intermediate key passes None to the next dict.get call, which raises TypeError.
Readable and concise; it can live in a utility/helper function in the project.
from functools import reduce

item = {"specifications": {"os": {"value": "Android"}}}
path1 = "specifications.os.value"
path2 = "specifications.os.value1"

def test1():
    print(reduce(dict.get, path1.split('.'), item))

def test2():
    print(reduce(dict.get, path2.split('.'), item))

test1()  # prints Android
test2()  # prints None
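With reduce(dict.get, ...), a missing intermediate key hands None to the next dict.get call, which raises TypeError instead of returning None. If you want None in that case too, one option is to guard each step with an isinstance check. A minimal sketch (the safe_get name is just for illustration):

```python
from functools import reduce

item = {"specifications": {"os": {"value": "Android"}}}

def safe_get(data, path):
    # Only call .get() while the current value is a dict;
    # once a key is missing, keep passing None along to the end
    return reduce(
        lambda acc, key: acc.get(key) if isinstance(acc, dict) else None,
        path.split("."),
        data,
    )

print(safe_get(item, "specifications.os.value"))      # Android
print(safe_get(item, "specifications.family.value"))  # None
```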

I wrote a function that also works with lists inside the dict:
d = {'test': [
    {'value1': 'val'},
    {'value1': 'val2'}]}

def find_element(keys: list, dictionary):
    # no keys left: the current object is the result
    if not keys:
        return dictionary
    rv = dictionary
    if isinstance(dictionary, dict):
        rv = find_element(keys[1:], rv[keys[0]])
    elif isinstance(dictionary, list):
        if keys[0].isnumeric():
            rv = find_element(keys[1:], dictionary[int(keys[0])])
        else:
            return rv
    return rv

val = find_element('test.1.value1'.split('.'), d)  # 'val2'

data = {
"data": {
"author_id": "1",
"text": "hi msg",
"attachments": {
"media_keys": [
"3_16"
]
},
"id": "2",
"edit_history_tweet_ids": [
"2"
]
},
"includes": {
"media": [
{
"media_key": "3_16",
"height": 500,
"type": "photo",
"width": 500,
"url": "https://pbs.twimg.com/media/xxxxxx.png"
}
],
"users": [
{
"id": "1",
"name": "name1",
"username": "username1"
}
]
}
}
def get_value_from_dict(dic_obj, keys: list, default):
    """
    get value from dict with key path.
    :param dic_obj: dict
    :param keys: list of keys (list indexes may be given as digit strings)
    :param default: default value
    :return: the value at the key path, or default if the path doesn't exist
    """
    if not dic_obj or not keys:
        return default
    pre_obj = dic_obj
    for key in keys:
        t = type(pre_obj)
        if t is dict:
            if key not in pre_obj:
                # missing key (including the last one): return the default
                return default
            pre_obj = pre_obj[key]
        elif (t is list or t is tuple) and str(key).isdigit() and len(pre_obj) > int(key):
            pre_obj = pre_obj[int(key)]
        else:
            return default
    return pre_obj

print('media_key:', get_value_from_dict(data, 'data.attachments.media_keys'.split('.'), None))
print('username:', get_value_from_dict(data, 'includes.users.0.username'.split('.'), None))
Output:
media_key: ['3_16']
username: username1

Related

python dict recursion returns empty list

I have this dictionary that I am trying to iterate through recursively. When I hit a matching node match I want to return that node which is a list.
Currently my code keeps returning an empty list. I have stepped through the code and I see my check condition being hit, but the recursion still returns an empty value. What am I doing wrong here? Thanks.
dictionary data:
{
"apiVersion": "v1",
"kind": "Deployment",
"metadata": {
"name": "cluster",
"namespace": "namespace",
},
"spec": {
"template": {
"metadata": {
"labels": {
"app": "flink",
"cluster": "repo_name-cluster",
"component": "jobmanager",
"track": "prod",
}
},
"spec": {
"containers": [
{
"name": "jobmanager",
"image": "IMAGE_TAG_",
"imagePullPolicy": "Always",
"args": ["jobmanager"],
"resources": {
"requests": {"cpu": "100.0", "memory": "100Gi"},
"limits": {"cpu": "100.0", "memory": "100Gi"},
},
"env": [
{
"name": "ADDRESS",
"value": "jobmanager-prod",
},
{"name": "HADOOP_USER_NAME", "value": "yarn"},
{"name": "JOB_MANAGER_MEMORY", "value": "1000m"},
{"name": "HADOOP_CONF_DIR", "value": "/etc/hadoop/conf"},
{
"name": "TRACK",
"valueFrom": {
"fieldRef": {
"fieldPath": "metadata.labels['track']"
}
},
},
],
}
]
},
},
},
}
code:
def iterdict(data, match):
    output = []
    if not isinstance(data, str):
        for k, v in data.items():
            print("key ", k)
            if isinstance(v, dict):
                iterdict(v, match)
            elif isinstance(v, list):
                if k.lower() == match.lower():
                    # print(v)
                    output += v
                    return output
                else:
                    for i in v:
                        iterdict(i, match)
    return output

test = iterdict(data, "env")
print(test)
expected return value:
[{'name': 'ADDRESS', 'value': 'jobmanager-prod'}, {'name': 'HADOOP_USER_NAME', 'value': 'yarn'}, {'name': 'JOB_MANAGER_MEMORY', 'value': '1000m'}, {'name': 'HADOOP_CONF_DIR', 'value': '/etc/hadoop/conf'}, {'name': 'TRACK', 'valueFrom': {...}}]
When you recurse to iterdict, you're simply throwing away the return value. Thus, since every value in the top level of your dictionary is either a string or a dict, you will end up just returning an empty list.
You probably want to append the recursive outputs:
output += iterdict(v, match)
and
output += iterdict(i, match)
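Put together, the list-based fix could look something like the sketch below. It keeps the question's structure, drops the debug print, and runs on a cut-down stand-in for the Deployment dict (the sample data is abbreviated, not the full original):

```python
def iterdict(data, match):
    output = []
    if not isinstance(data, str):
        for k, v in data.items():
            if isinstance(v, dict):
                # Keep what the recursive call found instead of discarding it
                output += iterdict(v, match)
            elif isinstance(v, list):
                if k.lower() == match.lower():
                    output += v
                    return output
                else:
                    for i in v:
                        output += iterdict(i, match)
    return output

# Abbreviated stand-in for the Deployment dict in the question
data = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "jobmanager",
         "args": ["jobmanager"],
         "env": [{"name": "HADOOP_USER_NAME", "value": "yarn"},
                 {"name": "JOB_MANAGER_MEMORY", "value": "1000m"}]}
    ]}}},
}

print(iterdict(data, "env"))
```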
However, this is potentially inefficient as you will build a lot of intermediate lists. A better strategy might be to make your function a generator; the name iterdict would suggest this anyway. To do so, get rid of your output variable and the return statements, and use yield instead:
yield from iterdict(v, match)
yield from v
yield from iterdict(i, match)
and then, at the top level, you can just iterate over your results:
for value in iterdict(data, "env"):
    ...
or, if you really need a list, collect the generator output into a list:
test = list(iterdict(data, "env"))
This will likely be faster (no intermediate lists) and more Pythonic.
You are not adding the results of the recursive calls to the output list.
You can either extend the output list with them or use the yield keyword to make use of generators in Python. Returning lists creates temporary lists at every level of the recursion, which costs memory and time; that's why generators are a good fit here.
def iterdict(data, match):
    if not isinstance(data, dict):
        # strings, numbers, etc. have nothing to search; a bare return ends the generator
        return
    for k, v in data.items():
        if isinstance(v, dict):
            yield from iterdict(v, match)
        elif isinstance(v, list):
            if k.lower() == match.lower():
                yield from v
            for i in v:
                yield from iterdict(i, match)

test = list(iterdict(data, "env"))
print(test)

Python parse JSON file

{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft": "Warning",
"Microsoft.Hosting.Lifetime": "Information",
"Microsoft.AspNetCore": "Warning",
"System.Net.Http.HttpClient.Default.ClientHandler": "Warning",
"System.Net.Http.HttpClient.Default.LogicalHandler": "Warning"
}
},
"AllowedHosts": "*",
"AutomaticTransferOptions": {
"DateOffsetForDirectoriesInDays": -1,
"DateOffsetForPortfoliosInDays": -3,
"Clause": {
"Item1": "1"
}
},
"Authentication": {
"ApiKeys": [
{
"Key": "AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6",
"OwnerName": "Cron job",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestManager"
}
]
},
{
"Key": "B11D4F27-483A-4234-8EC7-CA121712D5BE",
"OwnerName": "Test admin",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestAdmin"
},
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
},
{
"Key": "EBF98F2E-555E-4E66-9D77-5667E0AA1B54",
"OwnerName": "Test manager",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
}
],
"LDAP": {
"Domain": "domain.local",
"MachineAccountName": "Soft13",
"MachineAccountPassword": "vixuUEY7884*",
"EnableLdapClaimResolution": true
}
},
"Authorization": {
"Permissions": {
"Roles": [
{
"Role": "TestAdmin",
"Permissions": [
"transfers.create",
"bindings.create"
]
},
{
"Role": "TestManager",
"Permissions": [
"transfers.create"
]
}
]
}
}
}
I have the JSON above and need to parse it into output like this:
Logging__LogLevel__Default
Authentication__ApiKeys__0__Claims__0__Type
Everything mostly works, but my output always looks like this:
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Authorization__Permissions__Roles__1
Why does my code add incomplete strings like
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__1
And why doesn't it print every value from
Authorization__Permissions__Roles__0__Permissions__*
and from
Authorization__Permissions__Roles__1__Permissions__*
I have this code in Python 3:
def checkdepth(sub_key, variable):
    delmt = '__'
    for item in sub_key:
        try:
            if isinstance(sub_key[item], dict):
                sub_variable = variable + delmt + item
                checkdepth(sub_key[item], sub_variable)
        except TypeError:
            continue
        if isinstance(sub_key[item], list):
            sub_variable = variable + delmt + item
            for it in sub_key[item]:
                sub_variable = variable + delmt + item + delmt + str(sub_key[item].index(it))
                checkdepth(it, sub_variable)
            print(sub_variable)
        if isinstance(sub_key[item], int) or isinstance(sub_key[item], str):
            sub_variable = variable + delmt + item
            print(sub_variable)

for key in data:
    if type(data[key]) is str:
        print(key + '=' + str(data[key]))
    else:
        variable = key
        checkdepth(data[key], variable)
I know the problem is in the block where I process the list data type, but I don't know exactly where.
Use a recursive generator:
import json

with open('input.json') as f:
    data = json.load(f)

def strkeys(data):
    if isinstance(data, dict):
        for k, v in data.items():
            for item in strkeys(v):
                yield f'{k}__{item}' if item else k
    elif isinstance(data, list):
        for i, v in enumerate(data):
            for item in strkeys(v):
                yield f'{i}__{item}' if item else str(i)
    else:
        yield None  # termination condition, not a list or dict

for s in strkeys(data):
    print(s)
Output:
Logging__LogLevel__Default
Logging__LogLevel__Microsoft
Logging__LogLevel__Microsoft.Hosting.Lifetime
Logging__LogLevel__Microsoft.AspNetCore
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler
AllowedHosts
AutomaticTransferOptions__DateOffsetForDirectoriesInDays
AutomaticTransferOptions__DateOffsetForPortfoliosInDays
AutomaticTransferOptions__Clause__Item1
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__1__Key
Authentication__ApiKeys__1__OwnerName
Authentication__ApiKeys__1__Claims__0__Type
Authentication__ApiKeys__1__Claims__0__Value
Authentication__ApiKeys__1__Claims__1__Type
Authentication__ApiKeys__1__Claims__1__Value
Authentication__ApiKeys__2__Key
Authentication__ApiKeys__2__OwnerName
Authentication__ApiKeys__2__Claims__0__Type
Authentication__ApiKeys__2__Claims__0__Value
Authentication__LDAP__Domain
Authentication__LDAP__MachineAccountName
Authentication__LDAP__MachineAccountPassword
Authentication__LDAP__EnableLdapClaimResolution
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__0
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Using flatten_json, this can be converted to pandas, but it's not clear if that's what you want. Also, once it's converted, you can use df.iloc[0] to see why each column is there (i.e. you see the value for that key).
Note: pd.DataFrame expects a list of records, so I just wrapped your JSON above in [].
# https://github.com/amirziai/flatten (pip install flatten_json)
import pandas as pd
from flatten_json import flatten

dic = data  # your JSON from above, loaded into a dict
dic = [dic]  # put it in a list
dic_flattened = [flatten(d, '__') for d in dic]  # add your delimiter
df = pd.DataFrame(dic_flattened)
df.iloc[0]
Logging__LogLevel__Default Information
Logging__LogLevel__Microsoft Warning
Logging__LogLevel__Microsoft.Hosting.Lifetime Information
Logging__LogLevel__Microsoft.AspNetCore Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler Warning
AllowedHosts *
AutomaticTransferOptions__DateOffsetForDirectoriesInDays -1
AutomaticTransferOptions__DateOffsetForPortfoliosInDays -3
AutomaticTransferOptions__Clause__Item1 1
Authentication__ApiKeys__0__Key AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6
Authentication__ApiKeys__0__OwnerName Cron job
Authentication__ApiKeys__0__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__0__Claims__0__Value StressTestManager
Authentication__ApiKeys__1__Key B11D4F27-483A-4234-8EC7-CA121712D5BE
Authentication__ApiKeys__1__OwnerName Test admin
Authentication__ApiKeys__1__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__0__Value StressTestAdmin
Authentication__ApiKeys__1__Claims__1__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__1__Value TestManager
Authentication__ApiKeys__2__Key EBF98F2E-555E-4E66-9D77-5667E0AA1B54
Authentication__ApiKeys__2__OwnerName Test manager
Authentication__ApiKeys__2__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__2__Claims__0__Value TestManager
Authentication__LDAP__Domain domain.local
Authentication__LDAP__MachineAccountName Soft13
Authentication__LDAP__MachineAccountPassword vixuUEY7884*
Authentication__LDAP__EnableLdapClaimResolution true
Authorization__Permissions__Roles__0__Role TestAdmin
Authorization__Permissions__Roles__0__Permissions__0 transfers.create
Authorization__Permissions__Roles__0__Permissions__1 bindings.create
Authorization__Permissions__Roles__1__Role TestManager
Authorization__Permissions__Roles__1__Permissions__0 transfers.create
OK, I looked at your code and it's hard to follow. Your variable and function names don't make their purpose easy to understand. That's fine, because everyone has to learn best practices and all the little tips and tricks in Python. So hopefully I can help you out.
You have a recursive-ish function, which is definitely the best way to handle a situation like this. However, your code is part recursive and part not. If you go recursive to solve a problem, you have to go 100% recursive.
Also, the only time you should print in a recursive function is for debugging. Recursive functions should have an object that is passed down the function, gets appended to or altered, and is then passed back once it gets to the end of the recursion.
When you get a problem like this, think about which data you actually need or care about. In this problem we don't care about the values that are stored in the object, we just care about the keys. So we should write code that doesn't even bother looking at the value of something except to determine its type.
Here is some code I wrote up that should work for what you're wanting to do. Note that because the function is purely recursive, the code base is small. Also, my function uses a list that is passed around and added to, and at the end I return it so that we can use it for whatever we need. If you have questions, just comment on this question and I'll answer the best I can.
def convert_to_delimited_keys(obj, parent_key='', delimiter='__', keys_list=None):
    if keys_list is None:
        keys_list = []
    if isinstance(obj, dict):
        for k in obj:
            convert_to_delimited_keys(obj[k], delimiter.join((parent_key, str(k))), delimiter, keys_list)
    elif isinstance(obj, list):
        for i, _ in enumerate(obj):
            convert_to_delimited_keys(obj[i], delimiter.join((parent_key, str(i))), delimiter, keys_list)
    else:
        # Append to list, but remove the leading delimiter due to string.join
        keys_list.append(parent_key[len(delimiter):])
    return keys_list

for item in convert_to_delimited_keys(data):
    print(item)

Error in producing the output for the json for a specific id value

I have the following json tree.
json_tree ={
"Garden": {
"Seaside": {
"#loc": "127.0.0.1",
"#myID": "1.3.1",
"Shoreside": {
"#myID": "3",
"InfoList": {
"Notes": {
"#code": "0",
"#myID": "1"
},
"Count": {
"#myID": "2",
"#val": "0"
}
},
"state": "0",
"Tid": "3",
"Lakesshore": {
"#myID": "4",
"InfoList": {
"Notes": {
"#code": "0",
"#oid": "1"
},
"Count": {
"#myID": "2",
"#val": "0"
}
},
"state": "0",
"Tid": "4"
}
},
"state": "0",
"Tid": "2"
},
"Tid": "1",
"state": "0"
}
}
I have a method which takes in the "Tid" value and returns the output in the following format.
This is where the issue lies. I do not understand why, for the value Tid = 2, I get an "ERROR" stating that InfoList does not exist. For other Tid values, it works well. Can someone help me resolve this issue?
There is NO InfoList at "Tid": "2", but I am not sure how to update my logic to handle this.
def get_output(d, id):
    if isinstance(d, dict) and d.get('Tid') == id:
        yield {"Tid": d['Tid'], "Notes": d['InfoList']['Notes']['#code'], "status": d['state']}
    for i in getattr(d, "values", lambda: [])():
        yield from get_output(i, id)

# The id is from 2 and higher
key_list = list(get_output(json_tree, id))
# To create the json result
jsonify(key_list if not key_list else key_list[0])
For "Tid" values of 2 and higher the get_output method creates this output:
{
"Tid": "3",
"Notes": "2000",
"state": "2"
}
This part shown below works well. The issue is ONLY with the code shown above.
def get_output_id_1(d, id):
    if isinstance(d, dict) and d.get('Tid') == id:
        yield {"id": d['Tid'], "state": d['state']}
    for i in getattr(d, "values", lambda: [])():
        yield from get_output_id_1(i, id)
For "Tid" value of 1 and higher the get_output_id_1 method creates this output:
{
"Tid": "1",
"state": "1"
}
Any help is appreciated.
The problem is that you are using direct access (d['InfoList']) for a key that may or may not be in the dictionary. To get around this, use the dict.get method, which returns None, or a default value that you specify, when the key isn't present:
small_example = {
'Tid': '2',
'status': 'some status'
}
# there is no InfoList key here, so to get around that, I can use something like:
info_list = small_example.get('InfoList')
print(repr(info_list))
None
Now, you can specify a default return value for get if you need to chain things together, like with a nested dictionary call:
{
'Tid': small_example['Tid'],
'Notes': small_example.get('InfoList', {}).get('Notes', {}).get('#code'),
'status': small_example.get('state')
}
See how on the first two calls, I return an empty dictionary in case InfoList and/or Notes are missing, which supports the subsequent call to get. Without that, I would get an AttributeError:
small_example.get('InfoList').get('Notes')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'get'
So your yield statement should look like:
yield {
"Tid": d['Tid'],
"Notes": d.get('InfoList', {}).get('Notes', {}).get('#code'),
"status": d.get('state')
}
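Dropped into a generator like the one in the question, that could look roughly like the sketch below. It assumes the match is made on the "Tid" key (the tid parameter name is my own) and uses a trimmed copy of json_tree rather than the full tree:

```python
def get_output(d, tid):
    # Match on the "Tid" key (assumption: that is the id the caller passes in)
    if isinstance(d, dict) and d.get("Tid") == tid:
        yield {
            "Tid": d["Tid"],
            # Chained .get() calls with {} defaults give None instead of a KeyError
            "Notes": d.get("InfoList", {}).get("Notes", {}).get("#code"),
            "status": d.get("state"),
        }
    # Recurse into dict values; non-dicts have no .values, so fall back to an empty list
    for value in getattr(d, "values", lambda: [])():
        yield from get_output(value, tid)

# Trimmed copy of json_tree from the question
json_tree = {
    "Garden": {
        "Seaside": {
            "Shoreside": {
                "InfoList": {"Notes": {"#code": "0", "#myID": "1"}},
                "state": "0",
                "Tid": "3",
            },
            "state": "0",
            "Tid": "2",
        },
        "Tid": "1",
        "state": "0",
    }
}

print(list(get_output(json_tree, "2")))  # [{'Tid': '2', 'Notes': None, 'status': '0'}]
print(list(get_output(json_tree, "3")))  # [{'Tid': '3', 'Notes': '0', 'status': '0'}]
```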
Edit: What if you want a different default for Notes?
This gets a little tricky, especially if you want a data structure that doesn't support .get, such as str.
Your yield statement might have to be produced from a different function to make things a little more tidy:
# d is expected to be a dictionary
def create_yield(d):
    # I'm using direct access on `Tid` because I'm assuming it should
    # always be there, and if not it will raise a KeyError, you can
    # modify this to fit your use case
    container = {'Tid': d['Tid'],
                 'status': d.get('state')}
    notes = d.get('InfoList', {}).get('Notes')
    # if notes is a dict (not None), then we can get `#code` from it
    if notes is not None:
        container['Notes'] = notes.get('#code')
    # otherwise, don't set the `Notes` attribute
    return container

# later in your code at your yield statement
# you can call this function instead of manually building the dictionary
yield create_yield(d)

How to convert nested JSON data to CSV using python?

I have a file consisting of an array containing over 5000 objects. However, I am having trouble converting one particular part of my JSON file into the appropriate columns in CSV format.
Below is an example version of my data file:
{
"Result": {
"Example 1": {
"Type1": [
{
"Owner": "Name1 Example",
"Description": "Description1 Example",
"Email": "example1_email#email.com",
"Phone": "(123) 456-7890"
}
]
},
"Example 2": {
"Type1": [
{
"Owner": "Name2 Example",
"Description": "Description2 Example",
"Email": "example2_email#email.com",
"Phone": "(111) 222-3333"
}
]
}
}
}
Here is my current code:
import csv
import json

json_file = 'example.json'
with open(json_file, 'r') as json_data:
    x = json.load(json_data)

f = csv.writer(open("example.csv", "w"))
f.writerow(["Address","Type","Owner","Description","Email","Phone"])
for key in x["Result"]:
    type = "Type1"
    f.writerow([key,
                type,
                x["Result"][key]["Type1"]["Owner"],
                x["Result"][key]["Type1"]["Description"],
                x["Result"][key]["Type1"]["Email"],
                x["Result"][key]["Type1"]["Phone"]])
My problem is that I'm encountering this issue:
Traceback (most recent call last):
File "./convert.py", line 18, in <module>
x["Result"][key]["Type1"]["Owner"],
TypeError: list indices must be integers or slices, not str
When I try to replace the last key (such as "Owner") with an integer index, I receive this error: IndexError: list index out of range.
When I change only the f.writerow call to
f.writerow([key,
            type,
            x["Result"][key]["Type1"]])
I receive the results in a column, but it merges everything into one column, which makes sense. Picture of the output: https://imgur.com/a/JpDkaAT
I would like the results to be separated based on the label into individual columns instead of being merged into one. Could anyone assist?
Thank you!
Type1 in your data structure is a list, not a dict. So you need to iterate over it instead of referencing by key.
for key in x["Result"]:
    # key is now "Example 1" etc.
    type1 = x["Result"][key]["Type1"]
    # type1 is a list, not a dict
    for i in type1:
        # i is one dict from the list
        f.writerow([key,
                    "Type1",
                    i["Owner"],
                    i["Description"],
                    i["Email"],
                    i["Phone"]])
The inner for loop ensures that you're protected from the assumption that "Type1" only ever has one item in the list.
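If the type key isn't always "Type1", one option is to loop over every type key as well as every record, and let csv.DictWriter map each record's fields to columns by name. A sketch, assuming each type maps to a list of flat dicts as in the sample; it writes to an in-memory buffer (io.StringIO) so the output is easy to inspect:

```python
import csv
import io

x = {
    "Result": {
        "Example 1": {"Type1": [{"Owner": "Name1 Example", "Description": "Description1 Example",
                                 "Email": "example1_email#email.com", "Phone": "(123) 456-7890"}]},
        "Example 2": {"Type1": [{"Owner": "Name2 Example", "Description": "Description2 Example",
                                 "Email": "example2_email#email.com", "Phone": "(111) 222-3333"}]},
    }
}

fieldnames = ["Address", "Type", "Owner", "Description", "Email", "Phone"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()

for address, types in x["Result"].items():
    # types looks like {"Type1": [record, ...], ...}
    for type_name, records in types.items():
        for record in records:
            # Merge the record's fields with the two columns that come from the keys;
            # an unexpected extra key would raise ValueError (DictWriter's default extrasaction)
            writer.writerow({"Address": address, "Type": type_name, **record})

print(buf.getvalue())
```

To write a real file instead, open it with open("example.csv", "w", newline="") and pass that to DictWriter.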
It's definitely not the best example, but I'm too sleepy to optimize it.
import csv

def json_to_csv(obj, res):
    # Depth-first walk: appends the key of every nested dict/list and the value of every leaf to res
    for k, v in obj.items():
        if isinstance(v, dict):
            res.append(k)
            json_to_csv(v, res)
        elif isinstance(v, list):
            res.append(k)
            for el in v:
                json_to_csv(el, res)
        else:
            res.append(v)
obj = {
"Result": {
"Example 1": {
"Type1": [
{
"Owner": "Name1 Example",
"Description": "Description1 Example",
"Email": "example1_email#email.com",
"Phone": "(123) 456-7890"
}
]
},
"Example 2": {
"Type1": [
{
"Owner": "Name2 Example",
"Description": "Description2 Example",
"Email": "example2_email#email.com",
"Phone": "(111) 222-3333"
}
]
}
}
}
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Address","Type","Owner","Description","Email","Phone"])
    for k, v in obj["Result"].items():
        row = [k]
        json_to_csv(v, row)
        writer.writerow(row)
Figured it out!
I changed the f.writerow function to the following:
for key in x["Result"]:
    type = "Type1"
    f.writerow([key,
                type,
                x["Result"][key]["Type1"][0]["Owner"],
                x["Result"][key]["Type1"][0]["Email"]])
    ...
This allowed me to reference the keys within the object. Hopefully this helps someone down the line!

How to return a partial JSON response using python?

I have a json (or a python dictionary) and I would like to define a whitelist of fields.
{
"firstname": "user",
"last_name":"test",
"roles": ["admin", "country"],
"languages":{"first":"english", "second":"french"}
}
Specifying firstname, roles, languages and first should output:
{
"firstname": "user",
"roles": ["admin", "country"],
"languages":{"first":"english"}
}
It's easy to do for the first level, but how can I do it for the second, third level, etc.?
Maybe you can write a recursive function that checks the type of each dictionary element and its whitelist membership. If element N is a dictionary, call the function again on it, incrementing a level counter.
If you know how to do it for "first level", apply recursion to do it for "second, third, level, etc".
def dict_filter(d, keep):
    if isinstance(d, dict):
        # keep only whitelisted keys, filtering nested dicts recursively
        return {k: dict_filter(v, keep) for k, v in d.items() if k in keep}
    return d
orig = {
"firstname": "user",
"last_name":"test",
"roles": ["admin", "country"],
"languages":{"first":"english", "second":"french"}
}
keepers = ['firstname', 'roles', 'languages', 'first']
print(dict_filter(orig, keepers))
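One limitation of a flat keep list is that a name like "first" is kept at every level. If that matters, the whitelist can mirror the shape of the data instead: True keeps a value whole and a nested dict filters one level deeper. A sketch of that variant; the dict_filter_spec name and the spec format are my own, not from the question:

```python
def dict_filter_spec(d, spec):
    """Keep only the keys named in spec; a nested dict rule recurses, any other truthy rule keeps the value whole."""
    result = {}
    for key, rule in spec.items():
        if key not in d:
            continue  # skip whitelisted keys the data doesn't have
        if isinstance(rule, dict) and isinstance(d[key], dict):
            result[key] = dict_filter_spec(d[key], rule)
        elif rule:
            result[key] = d[key]
    return result

orig = {
    "firstname": "user",
    "last_name": "test",
    "roles": ["admin", "country"],
    "languages": {"first": "english", "second": "french"},
}
spec = {"firstname": True, "roles": True, "languages": {"first": True}}
print(dict_filter_spec(orig, spec))
# {'firstname': 'user', 'roles': ['admin', 'country'], 'languages': {'first': 'english'}}
```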
