search values in a list of nested dictionaries - python

I want to perform searches on values in a list of nested dictionaries and return another key:value pairs. These dictionaries are metafiles. Basically I want to search the ID of each dictionary, find all dictionaries that have the same ID, and return the file location (key:value) pairs.
metafile = [{'metadata':{'Title':'The Sun Also Rises', 'ID': 'BAY121-F1164EAB499'}, 'content': 'xyz', 'File_Path': 'file_location1'},
{'metadata':{'Title':'Of Mice and Men', 'ID': '499B0BAB#dfg'}, 'content': 'abc', 'File_Path': 'file_location2'},
{'metadata':{'Title':'The Sun Also Rises Review', 'ID': 'BAY121-F1164EAB499'}, 'content': 'ftw', 'File_Path': 'file_location3'}]
I created a loop to perform my search as follows. It returns an empty list though, how should I modify this so that the file paths are returned?
search_ID = 'BAY121-F1164EAB499'
path =[]
for a in metafile:
for val in a['metadata']['ID']:
if search_ID == val:
path.append(a['File_Path'])

you don't need an inner loop for this:
correct code
search_ID = 'BAY121-F1164EAB499'
path =[]
for a in metafile:
#a['metadata']['ID'] already gives you the value of ID
if search_ID == a['metadata']['ID']:
path.append(a['File_Path'])
output
['file_location1', 'file_location3']

You don't need to iterate through a['metadata']['ID'], instead just access them directly. So the modified code would be
metafile = [{'metadata':{'Title':'The Sun Also Rises', 'ID': 'BAY121-
F1164EAB499'}, 'content': 'xyz', 'File_Path': 'file_location1'},
{'metadata':{'Title':'Of Mice and Men', 'ID': '499B0BAB#dfg'}, 'content': 'abc',
'File_Path': 'file_location2'},
{'metadata':{'Title':'The Sun Also Rises Review', 'ID': 'BAY121-F1164EAB499'},
'content': 'ftw', 'File_Path': 'file_location3'}]
search_ID = 'BAY121-F1164EAB499'
path =[]
for a in metafile:
if a["metadata"]["ID"] == search_ID:
path.append(a['File_Path'])

Related

Python list of dict parsing

I have a list of dict which I need to parse into a dict {A:[b,c]}.
iam_profile_association = [{'AssociationId': 'iip-assoc-08c2998aabf8ad37e', 'InstanceId': 'i-078cf2f285a2bb4d3', 'IamInstanceProfile': {'Arn': 'arn:aws:iam:::instance-profile/STANDARD-SSM-INSTANCEPROFILE-us-east-1', 'Id': 'AIPAZTOKUMIFDJ4VGT2FP'}, 'State': 'associated'}, {'AssociationId': 'iip-assoc-0afc8368072fa1d7e', 'InstanceId': 'i-076d4c961d800ba18', 'IamInstanceProfile': {'Arn': 'arn:aws:iam:::instance-profile/STANDARD-SSM-INSTANCEPROFILE-us-east-1', 'Id': 'AIPAZTOKUMIFDJ4VGT2FP'}, 'State': 'associated'}, {'AssociationId': 'iip-assoc-0c46d1f23de061e98', 'InstanceId': 'i-0942731f00ebd33e1', 'IamInstanceProfile': {'Arn': 'arn:aws:iam:::instance-profile/STANDARD-SSM-INSTANCEPROFILE-us-east-1', 'Id': 'AIPAZTOKUMIFDJ4VGT2FP'}, 'State': 'associated'}]
iam_dict = {}
for iam_assoc in iam_profile_association:
iam_dict[iam_assoc['InstanceId']] = [iam_assoc['AssociationId'],iam_assoc['IamInstanceProfile']['Arn']]
print (iam_dict)
iam_dict just prints the first value :
{'i-078cf2f285a2bb4d3': ['iip-assoc-08c2998aabf8ad37e', 'arn:aws:iam::660239311370:instance-profile/STANDARD-SSM-INSTANCEPROFILE-us-east-1']}
Can someone help me in this? I would like to store all the values in this dict format.
Your print is inside the for loop, move it outside (tab it in)
iam_dict = {}
for iam_assoc in iam_profile_association:
iam_dict[iam_assoc['InstanceId']] = ...........
print (iam_dict)
It's tricky to work out more than that from what you are asking. But this would be a good start.

Problem when saving json data into python list

I'm trying to get two attributes at the time from my json data and add them as an item on my python list. However, when trying to add those two: ['emailTypeDesc']['createdDate'] it throws an error. Could someone help with this? thanks in advance!
json:
{
'readOnly': False,
'senderDetails': {'firstName': 'John', 'lastName': 'Doe', 'emailAddress': 'johndoe#gmail.com', 'emailAddressId': 123456, 'personalId': 123, 'companyName': 'ACME‘},
'clientDetails': {'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com', 'emailAddressId': 654321, 'personalId': 456, 'companyName': 'Lorem Ipsum‘}},
'notesSection': {},
'emailList': [{'requestId': 12345667, 'emailId': 9876543211, 'emailType': 3, 'emailTypeDesc': 'Email-In', 'emailTitle': 'SampleTitle 1', 'createdDate': '15-May-2020 11:15:52', 'fromMailList': [{'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com',}]},
{'requestId': 12345667, 'emailId': 14567775, 'emailType': 3, 'emailTypeDesc': 'Email-Out', 'emailTitle': 'SampleTitle 2', 'createdDate': '16-May-2020 16:15:52', 'fromMailList': [{'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com',}]},
{'requestId': 12345667, 'emailId': 12345, 'emailType': 3, 'emailTypeDesc': 'Email-In', 'emailTitle': 'SampleTitle 3', 'createdDate': '17-May-2020 20:15:52', 'fromMailList': [{'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com',}]
}
python:
final_list = []
data = json.loads(r.text)
myId = [(data['emailList'][0]['requestId'])]
for each_req in myId:
final_list.append(each_req)
myEmailList = [mails['emailTypeDesc']['createdDate'] for mails in data['emailList']]
for each_requ in myEmailList:
final_list.append(each_requ)
return final_list
This error comes up when I run the above code:
TypeError: string indices must be integers
Desired output for final_list:
[12345667, 'Email-In', '15-May-2020 11:15:52', 'Email-Out', '16-May-2020 16:15:52', 'Email-In', '17-May-2020 20:15:52']
My problem is definetely in this line:
myEmailList = [mails['emailTypeDesc']['createdDate'] for mails in data['emailList']]
because when I run this without the second attribute ['createdDate'] it would work, but I need both attributes on my final_list:
myEmailList = [mails['emailTypeDesc'] for mails in data['emailList']]
I think you're misunderstanding the syntax. mails['emailTypeDesc']['createdDate'] is looking for the key 'createdDate' inside the object mails['emailTypeDesc'], but in fact they are two items at the same level.
Since mails['emailTypeDesc'] is a string, not a dictionary, you get the error you have quoted. It seems that you want to add the two items mails['emailTypeDesc'] and mails['createdDate'] to your list. I'm not sure if you'd rather join these together into a single string or create a sub-list or something else. Here's a sublist option.
myEmailList = [[mails['emailTypeDesc'], mails['createdDate']] for mails in data['emailList']]
Strings in JSON must be in double quotes, not single.
Edit: As well as names.

how to use nested dictionary in python?

I am trying to write some code with the Hunter.io API to automate some of my b2b email scraping. It's been a long time since I've written any code and I could use some input. I have a CSV file of Urls, and I want to call a function on each URL that outputs a dictionary like this:
`{'domain': 'fromthebachrow.com', 'webmail': False, 'pattern': '{f}{last}', 'organization': None, 'emails': [{'value': 'fbach#fromthebachrow.com', 'type': 'personal', 'confidence': 91, 'sources': [{'domain': 'fromthebachrow.com', 'uri': 'http://fromthebachrow.com/contact', 'extracted_on': '2017-07-01'}], 'first_name': None, 'last_name': None, 'position': None, 'linkedin': None, 'twitter': None, 'phone_number': None}]}`
for each URL I call my function on. I want my code to return just the email address for each key labeled 'value'.
Value is a key that is contained in a list that itself is an element of the directory my function outputs. I am able to access the output dictionary to grab the list that is keyed to 'emails', but I don't know how to access the dictionary contained in the list. I want my code to return the value in that dictionary that is keyed with 'value', and I want it to do so for all of my urls.
from pyhunyrt import PyHunter
import csv
file=open('urls.csv')
reader=cvs.reader (file)
urls=list(reader)
hunter=PyHunter('API Key')
for item in urls:
output=hunter.domain_search(item)
output['emails'`
which returns a list that looks like this for each item:
[{
'value': 'fbach#fromthebachrow.com',
'type': 'personal',
'confidence': 91,
'sources': [{
'domain': 'fromthebachrow.com',
'uri': 'http://fromthebachrow.com/contact',
'extracted_on': '2017-07-01'
}],
'first_name': None,
'last_name': None,
'position': None,
'linkedin': None,
'twitter': None,
'phone_number': None
}]
How do I grab the first dictionary in that list and then access the email paired with 'value' so that my output is just an email address for each url I input initially?
To grab the first dict (or any item) in a list, use list[0], then to grab a value of a key value use ["value"]. To combine it, you should use list[0]["value"]

Python: retrieve arbitrary dictionary path and amend data?

Simple Python question, but I'm scratching my head over the answer!
I have an array of strings of arbitrary length called path, like this:
path = ['country', 'city', 'items']
I also have a dictionary, data, and a string, unwanted_property. I know that the dictionary is of arbitrary depth and is dictionaries all the way down, with the exception of the items property, which is always an array.
[CLARIFICATION: The point of this question is that I don't know what the contents of path will be. They could be anything. I also don't know what the dictionary will look like. I need to walk down the dictionary as far as the path indicates, and then delete the unwanted properties from there, without knowing in advance what the path looks like, or how long it will be.]
I want to retrieve the parts of the data object (if any) that matches the path, and then delete the unwanted_property from each.
So in the example above, I would like to retrieve:
data['country']['city']['items']
and then delete unwanted_property from each of the items in the array. I want to amend the original data, not a copy. (CLARIFICATION: By this I mean, I'd like to end up with the original dict, just minus the unwanted properties.)
How can I do this in code?
I've got this far:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
for p in path:
if p == 'items':
data = [i for i in data[p]]
else:
data = data[p]
if isinstance(data, list):
for d in data:
del d['unwanted_property']
else:
del data['unwanted_property']
The problem is that this doesn't amend the original data. It also relies on items always being the last string in the path, which may not always be the case.
CLARIFICATION: I mean that I'd like to end up with:
{
'country': {
'city': {
'items': [
{
'name': '114th Street'
},
{
'name': '8th Avenue'
},
]
}
}
}
Whereas what I have available in data is only [{'name': '114th Street'}, {'name': '8th Avenue'}].
I feel like I need something like XPath for the dictionary.
The problem you are overwriting the original data reference. Change your processing code to
temp = data
for p in path:
temp = temp[p]
if isinstance(temp, list):
for d in temp:
del d['unwanted_property']
else:
del temp['unwanted_property']
In this version, you set temp to point to the same object that data was referring to. temp is not a copy, so any changes you make to it will be visible in the original object. Then you step temp along itself, while data remains a reference to the root dictionary. When you find the path you are looking for, any changes made via temp will be visible in data.
I also removed the line data = [i for i in data[p]]. It creates an unnecessary copy of the list that you never need, since you are not modifying the references stored in the list, just the contents of the references.
The fact that path is not pre-determined (besides the fact that items is going to be a list) means that you may end up getting a KeyError in the first loop if the path does not exist in your dictionary. You can handle that gracefully be doing something more like:
try:
temp = data
for p in path:
temp = temp[p]
except KeyError:
print('Path {} not in data'.format(path))
else:
if isinstance(temp, list):
for d in temp:
del d['unwanted_property']
else:
del temp['unwanted_property']
The problem you are facing is that you are re-assigning the data variable to an undesired value. In the body of your for loop you are setting data to the next level down on the tree, for instance given your example data will have the following values (in order), up to when it leaves the for loop:
data == {'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}}
data == {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}
data == {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}
data == [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]
Then when you delete the items from your dictionaries at the end you are left with data being a list of those dictionaries as you have lost the higher parts of the structure. Thus if you make a backup reference for your data you can get the correct output, for example:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
data_ref = data
for p in path:
if p == 'items':
data = [i for i in data[p]]
else:
data = data[p]
if isinstance(data, list):
for d in data:
del d['unwanted_property']
else:
del data['unwanted_property']
data = data_ref
def delKey(your_dict,path):
if len(path) == 1:
for item in your_dict:
del item[path[0]]
return
delKey( your_dict[path[0]],path[1:])
data
{'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo'}, {'name': '8th Avenue', 'unwanted_property': 'foo'}]}}}
path
['country', 'city', 'items', 'unwanted_property']
delKey(data,path)
data
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}
You need to remove the key unwanted_property.
names_list = []
def remove_key_from_items(data):
for d in data:
if d != 'items':
remove_key_from_items(data[d])
else:
for item in data[d]:
unwanted_prop = item.pop('unwanted_property', None)
names_list.append(item)
This will remove the key. The second parameter None is returned if the key unwanted_property does not exist.
EDIT:
You can use pop even without the second parameter. It will raise KeyError if the key does not exist.
EDIT 2: Updated to recursively go into depth of data dict until it finds the items key, where it pops the unwanted_property as desired and append into the names_list list to get the desired output.
Using operator.itemgetter you can compose a function to return the final key's value.
import operator, functools
def compose(*functions):
'''returns a callable composed of the functions
compose(f, g, h, k) -> f(g(h(k())))
'''
def compose2(f, g):
return lambda x: f(g(x))
return functools.reduce(compose2, functions, lambda x: x)
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
Then use it like this:
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'
for thing in get_items(data):
del thing[unwanted_property]
Of course if the path contains non-existent keys it will throw a KeyError - you probably should account for that:
path = ['country', 'foo', 'items']
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
try:
for thing in get_items(data):
del thing[unwanted_property]
except KeyError as e:
print('missing key:', e)
You can try this:
path = ['country', 'city', 'items']
previous_data = data[path[0]]
previous_key = path[0]
for i in path:
previous_data = previous_data[i]
previous_key = i
if isinstance(previous_data, list):
for c, b in enumerate(previous_data):
if "unwanted_property" in b:
del previous_data[c]["unwanted_property"]
current_dict = {}
previous_data_dict = {}
for i, a in enumerate(path):
if i == 0:
current_dict[a] = data[a]
previous_data_dict = data[a]
else:
if a == previous_key:
current_dict[a] = previous_data
else:
current_dict[a] = previous_data_dict[a]
previous_data_dict = previous_data_dict[a]
data = current_dict
print(data)
Output:
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}, 'items': [{'name': '114th Street'}, {'name': '8th Avenue'}], 'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}

Add values to dictionary / tuple

I have a tuple which I convert to a dict. The outcome is e.g.:
{'test-rz-01.test.de': '10.60.1.100','test2.test.de': '10.60.1.10’}
I now need to add "static" content to each entry so that it looks like a list of dicts:
[{'name': 'test-rz-01.test.de', 'ipv4addr':'10.60.1.100', 'view':
'External', 'zone': 'test.de'}, {'name': 'test2.test.de', 'ipv4addr':
'10.60.1.10’, 'view': 'External', 'zone': 'test.de'}]
what would be the "best" way to accomplish this?
Starting with your dictionary
>>> d = {'test-rz-01.htwk-leipzigtest.de': '10.60.1.100', 'test2.test.de': '10.60.1.10'}
employ a list comprehension in which you construct the dicts.
>>> [{'name':domain, 'ipv4addr':ip, 'view': 'External', 'zone': 'test.de'}
... for domain, ip in d.items()]
output:
[{'ipv4addr': '10.60.1.10', 'name': 'test2.test.de', 'zone': 'test.de', 'view': 'External'}, {'ipv4addr': '10.60.1.100', 'name': 'test-rz-01.htwk-leipzigtest.de', 'zone': 'test.de', 'view': 'External'}]

Categories