I have this data
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
The ids are unique; it's impossible for multiple dictionaries to have the same id.
For example I want to get the item with the id "ieow83janx".
My current solution looks like this:
search_id = 'ieow83janx'
item = [x for x in data if x['id'] == search_id][0]
Do you think that's the best solution, or does anyone know an alternative?
Since the ids are unique, you can store the items in a dictionary to achieve O(1) lookup.
lookup = {ele['id']: ele for ele in data}
then you can do
user_info = lookup[user_id]
to retrieve it
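A minimal sketch of the same idea that also covers an id that is not present. The shortened records and the 'does_not_exist' id are made up for illustration; dict.get returns None instead of raising KeyError.

```python
data = [
    {'id': 'abcd738asdwe', 'name': 'John'},
    {'id': 'ieow83janx', 'name': 'Jane'},
]

# Building the index walks the list once: O(N).
# Every later lookup is O(1) on average.
lookup = {ele['id']: ele for ele in data}

found = lookup.get('ieow83janx')        # the matching dict
missing = lookup.get('does_not_exist')  # None instead of a KeyError
```

Note that the values in lookup are the same dict objects as in data, not copies, so changing one changes the other.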
If you are going to perform this kind of operation more than once on this particular object, I would recommend converting it into a dictionary with id as the key.
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
data_dict = {item['id']: item for item in data}
#=> {'ieow83janx': {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}, 'abcd738asdwe': {'mail': 'test#test.com', 'id': 'abcd738asdwe', 'name': 'John'}}
data_dict['ieow83janx']
#=> {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}
In this case, each lookup costs constant O(1) time on average, instead of O(N).
How about the built-in next function (docs):
>>> data = [
... {
... 'id': 'abcd738asdwe',
... 'name': 'John',
... 'mail': 'test#test.com',
... },
... {
... 'id': 'ieow83janx',
... 'name': 'Jane',
... 'mail': 'test#foobar.com',
... }
... ]
>>> search_id = 'ieow83janx'
>>> next(x for x in data if x['id'] == search_id)
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
EDIT:
It raises StopIteration if no match is found, which is a beautiful way to handle absence:
>>> search_id = 'does_not_exist'
>>> try:
...     next(x for x in data if x['id'] == search_id)
... except StopIteration:
...     print('Handled absence!')
...
Handled absence!
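If a try/except feels heavy, next also accepts a default as its second argument, which is returned when the generator is exhausted. A small sketch; the helper name find_by_id and the shortened records are assumptions made for this example.

```python
data = [
    {'id': 'abcd738asdwe', 'name': 'John'},
    {'id': 'ieow83janx', 'name': 'Jane'},
]

def find_by_id(items, search_id):
    # The extra parentheses are required when a generator expression
    # is not the only argument of the call.
    return next((x for x in items if x['id'] == search_id), None)

hit = find_by_id(data, 'ieow83janx')
miss = find_by_id(data, 'does_not_exist')  # None, no exception raised
```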
Without creating a new dictionary or writing several lines of code, you can simply use the built-in filter function to get the item lazily; it stops checking as soon as it finds the match.
next(filter(lambda d: d['id']==search_id, data))
should work just fine.
Would this not achieve your goal?
for i in data:
    if i.get('id') == 'ieow83janx':
        print(i)
(xenial)vash#localhost:~/python$ python3.7 split.py
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
Using comprehension:
[i for i in data if i.get('id') == 'ieow83janx']
if any(item['id']=='ieow83janx' for item in data):
    ...  # a match exists; any() returns only True or False, not the matching item
The any function returns True if at least one element of the iterable (the list of dictionaries in your case) satisfies the condition.
With a generator expression there is no need to build an intermediate list. Since the ids in the list of dictionaries are unique, any stops iterating as soon as the condition returns true; in other words, any short-circuits on the generator expression. A list comprehension builds the entire list in memory, whereas a generator expression produces elements on the fly, which is better for a large number of items because it uses less memory.
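The short-circuiting can be made visible by logging which items get checked. This is only an illustrative sketch: the helper matches, the two log lists, and the third record ('zzz999') are made up for the demonstration.

```python
data = [
    {'id': 'abcd738asdwe'},
    {'id': 'ieow83janx'},
    {'id': 'zzz999'},  # hypothetical extra record, placed after the match
]

visited_by_any = []
visited_by_list = []

def matches(item, search_id, log):
    log.append(item['id'])  # remember that this item was examined
    return item['id'] == search_id

# any() stops at the first True, so the third record is never examined.
exists = any(matches(item, 'ieow83janx', visited_by_any) for item in data)

# The list comprehension always walks the whole list.
found = [item for item in data if matches(item, 'ieow83janx', visited_by_list)]
```

Keep in mind that any only answers whether a match exists; to get the item itself, next over the same generator expression is the short-circuiting equivalent.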
I have a problem. I have a dict my_Dict. This is somewhat nested. However, I would like to 'clean up' the dict my_Dict, by this I mean that I would like to separate all nested ones and also generate a unique ID so that I can later find the corresponding object again.
For example, I have detail: {...}. This nested dict should later become an independent dict my_Detail_Dict, and in addition, detail should receive a unique ID within my_Dict. Unfortunately, the list that I return is empty. How can I pull out the nested keys and give them an ID?
my_Dict = {
'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
def nested_dict(my_Dict):
my_new_dict_list = []
for key in my_Dict.keys():
#print(f"Looking for {key}")
if isinstance(my_Dict[key], dict):
print(f"{key} is nested")
# Add id to nested stuff
my_Dict[key]["__id"] = 1
my_nested_Dict = my_Dict[key]
# Delete all nested from the key
del my_Dict[key]
# Add id to key, but not the nested stuff
my_Dict[key] = 1
my_new_dict_list.append(my_Dict[key])
my_new_dict_list.append(my_Dict)
return my_new_dict_list
nested_dict(my_Dict)
[OUT] []
# What I want
[my_Dict, my_Details_Dict, my_Data_Dict]
What I have
{'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]}}}
What I want
my_Dict = {'_key': '1',
'group': 'test',
'data': 18,
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': 22}
my_Data_Dict = {'__id': 18}
my_Detail_Dict = {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]}, '__id': 22}
The following code snippet will solve what you are trying to do:
my_Dict = {
'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
def nested_dict(my_Dict):
    # Initializing a dictionary that will store all the nested dictionaries
    my_new_dict = {}
    idx = 0
    for key in my_Dict.keys():
        # Checking which keys are nested i.e are dictionaries
        if isinstance(my_Dict[key], dict):
            # Generating ID
            idx += 1
            # Adding generated ID as another key
            my_Dict[key]["__id"] = idx
            # Adding nested key with the ID to the new dictionary
            my_new_dict[key] = my_Dict[key]
            # Replacing nested key value with the generated ID
            my_Dict[key] = idx
    # Returning new dictionary containing all nested dictionaries with ID
    return my_new_dict
result = nested_dict(my_Dict)
print(my_Dict)
# Iterating through dictionary to get all nested dictionaries
for item in result.items():
    print(item)
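If modifying the input in place is not wanted, a variant along the following lines could build new dictionaries instead. This is a sketch, not the answer's code: the name split_nested, the start_id parameter, and the shortened sample dict are assumptions.

```python
import itertools

def split_nested(source, start_id=1):
    """Return (flat_copy, extracted) without modifying source."""
    ids = itertools.count(start_id)
    flat = {}
    extracted = {}
    for key, value in source.items():
        if isinstance(value, dict):
            new_id = next(ids)
            # Copy the nested dict and tag the copy with its ID.
            extracted[key] = {**value, '__id': new_id}
            flat[key] = new_id
        else:
            flat[key] = value
    return flat, extracted

original = {'_key': '1', 'data': {}, 'detail': {'selector': {'number': '12312'}}}
flat, extracted = split_nested(original)
```

The copies are shallow, so deeper levels (such as 'selector' here) are still shared with the original.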
If I understand you correctly, you wish to automatically make each nested dictionary its own variable, and remove it from the main dictionary.
Finding the nested dictionaries and removing them from the main dictionary is not so difficult. However, automatically assigning them to a variable is not recommended for various reasons. Instead, what I would do is store all these dictionaries in a list, and then assign them manually to a variable.
# Prepare a list to store data in
individual_dicts = []
id_index = 1
for key in my_Dict.keys():
    # For each key, we get the current value
    value = my_Dict[key]
    # Determine if the current value is a dictionary. If so, then it's a nested dict
    if isinstance(value, dict):
        print(key + " is a nested dict")
        # Get the nested dictionary, and replace it with the ID
        dict_value = my_Dict[key]
        my_Dict[key] = id_index
        # Add the id to previously nested dictionary
        dict_value['__id'] = id_index
        id_index = id_index + 1 # increase for next nested dict
        individual_dicts.append(dict_value) # store it as a new dictionary
# Manually write out variable names, and assign the nested dictionaries to them.
# The list follows my_Dict's key order, so 'data' comes before 'detail'.
[my_Data_Dict, my_Details_Dict] = individual_dicts
I have an object in Python 3 of this format:
a = {
'events': [
{
'timestamp': 123,
'message': 'test'
},
{
'timestamp': 456,
'message': 'foo'
},
{
'timestamp': 789,
'message': 'testbar'
},
],
'first': 'abc',
'last': 'def'
}
I want to create a new object of the same format, but filtered by whether the message key's corresponding value contains a certain string, for example filtering by "test":
a = {
'events': [
{
'timestamp': 123,
'message': 'test'
},
{
'timestamp': 789,
'message': 'testbar'
},
],
'first': 'abc',
'last': 'def'
}
Can I use a nested comprehension for this? I know you can do nested list comprehensions like:
[[y*2 for y in x] for x in l]
But is there a neat way for a dict > list > dict situation?
One option would be to create a new copy of the input dict without events, and then set the filtered events as you require, like this:
copy = {k: v for k, v in a.items() if k != 'events'}
copy['events'] = [e for e in a['events'] if 'test' in e['message']]
Or if you don't mind overwriting the original input, simply do this:
a['events'] = [e for e in a['events'] if 'test' in e['message']]
I would go with a list comprehension with an if clause like the following:
[event for event in a["events"] if "test" in event["message"]]
Loop through the values of the "events" key and add them to the list if the value of their "message" key contains "test".
The result is a list of dictionaries that you can assign back to a["events"], or to a copy of a if you would like to preserve a["events"].
So, you can use multiple layers of comprehension, but that doesn't mean you should. I think for such an example you'd produce cleaner code by running it through a couple of for loops. Having said that, I think the following technically achieves the outcome you're asking for.
>>> pprint.pprint(a)
{'events': [{'message': 'test', 'timestamp': 123},
{'message': 'foo', 'timestamp': 456},
{'message': 'testbar', 'timestamp': 789}],
'first': 'abc',
'last': 'def'}
>>> aa = copy.deepcopy(a)
>>> aa['beta'] = aa['events']
>>> pprint.pprint({k:[item for item in v if 'test' in item['message']] if isinstance(v, list) else v for k, v in aa.items()})
{'beta': [{'message': 'test', 'timestamp': 123},
{'message': 'testbar', 'timestamp': 789}],
'events': [{'message': 'test', 'timestamp': 123},
{'message': 'testbar', 'timestamp': 789}],
'first': 'abc',
'last': 'def'}
>>> pprint.pprint({k:[item for item in v if 'test' in item['message']] if isinstance(v, list) else v for k, v in a.items()})
{'events': [{'message': 'test', 'timestamp': 123},
{'message': 'testbar', 'timestamp': 789}],
'first': 'abc',
'last': 'def'}
As said, this is something you can do; I would, however, on behalf of everyone who's had to read other people's code in their careers, respectfully request that you don't use this in production code. A couple of for loops might be more LOC, but would in most cases be much more readable and maintainable.
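For comparison, the "couple of for loops" version might look roughly like this. It is a sketch under assumptions: the function name filter_events is made up, and the key 'events' is hard-coded rather than detected with isinstance.

```python
a = {
    'events': [
        {'timestamp': 123, 'message': 'test'},
        {'timestamp': 456, 'message': 'foo'},
        {'timestamp': 789, 'message': 'testbar'},
    ],
    'first': 'abc',
    'last': 'def',
}

def filter_events(source, needle):
    """Return a new dict whose 'events' only keep messages containing needle."""
    result = {}
    for key, value in source.items():
        if key == 'events':
            kept = []
            for event in value:
                if needle in event['message']:
                    kept.append(event)
            result[key] = kept
        else:
            result[key] = value
    return result

filtered = filter_events(a, 'test')
```

The original a keeps all three events; only the new dict holds the filtered list.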
In Python 3, I need to get a JSON response from an API call
and parse it so that I get a dictionary that only contains the data I need.
The final dictionary I expect to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
the JSON response returns the following output:
{
'RuleGroups': [{
'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rules',
'Order': 1,
'Enabled': True,
'Rules': [{
'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rule',
'Description': 'Look for default severity text',
'Enabled': False,
'RuleMatchers': None,
'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
'SourceField': 'text',
'DestinationField': 'text',
'ReplaceNewVal': '',
'Type': 'extract',
'Order': 21520,
'KeepBlockedLogs': False
}],
'Type': 'user'
}, {
'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
'Name': 'auto_collector',
'Order': 4,
'Enabled': True,
'Rules': [{
'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
'Name': 'auto_collector',
'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
'Enabled': False,
'RuleMatchers': None,
'Rule': 'AUTODISABLED',
'SourceField': 'subsystemName',
'DestinationField': 'subsystemName',
'ReplaceNewVal': '',
'Type': 'block',
'Order': 1,
'KeepBlockedLogs': False
}],
'Type': 'user'
}]
}
I was able to create a dictionary that contains the name and the RuleGroupsID, like that:
response = requests.get(url,headers=headers)
output = response.json()
outputlist=(output["RuleGroups"])
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
# Create a dictionary of NAME + ID
ruleDic = {}
for key in groupRuleName:
    for value in groupRuleID:
        ruleDic[key] = value
        groupRuleID.remove(value)
        break
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
but when I tried to parse it as nested JSON things just didn't work.
In the end, I managed to create a function that returns this dictionary.
It breaks the JSON into three lists of the needed elements (Name, Id, and Rules from the first nesting level), and then builds another list from the nested JSON (everything listed under Rules) that keeps only the values of the "Id" key.
Finally, it creates the dictionary by calling zip on the lists created earlier.
def get_filtered_rules() -> dict:
    # outputlist is the list under "RuleGroups" from the API response
    groupRuleName = [li['Name'] for li in outputlist]
    groupRuleID = [li['Id'] for li in outputlist]
    ruleIDList = [li['Rules'] for li in outputlist]
    ruleIDListClean = []
    ruleClean = []
    for sublist in ruleIDList:
        try:
            lstRule = [item['Rule'] for item in sublist]
            ruleClean.append(lstRule)
            ruleContent=list(zip(groupRuleName, ruleClean))
            ruleContentDictionary = dict(ruleContent)
            lstID = [item['Id'] for item in sublist]
            ruleIDListClean.append(lstID)
            # Create a dictionary of NAME + ID + RuleID
            ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
        except Exception as e: print(e)
    return ruleDic
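The same name -> (group id, [rule ids]) mapping can probably be built in one pass with a dict comprehension, which avoids keeping several parallel lists in sync. A sketch only: the function name index_rule_groups is made up, the ids in sample are shortened placeholders, and treating a missing or None 'Rules' value as an empty list is an assumption about the API.

```python
def index_rule_groups(payload):
    """Map each rule group name to (group id, list of rule ids)."""
    return {
        group['Name']: (
            group['Id'],
            # "or []" covers a group whose 'Rules' is missing or None
            [rule['Id'] for rule in group.get('Rules') or []],
        )
        for group in payload.get('RuleGroups', [])
    }

# Placeholder data shaped like the response shown in the question
sample = {
    'RuleGroups': [
        {'Id': 'g1', 'Name': 'Severity Rules', 'Rules': [{'Id': 'r1'}]},
        {'Id': 'g2', 'Name': 'auto_collector', 'Rules': None},
    ]
}
index = index_rule_groups(sample)
```

With the real response this would be called as index_rule_groups(response.json()).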
I have a list of dictionaries, themselves with nested lists of dictionaries. All of the nest levels have a similar structure, thankfully. I desire to sort these nested lists of dictionaries. I grasp the technique to sort a list of dictionaries by value. I'm struggling with the recursion that will sort the inner lists.
def reorder(l, sort_by):
    # I have been trying to add a recursion here
    # so that the function calls itself for each
    # nested group of "children". So far, fail
    return sorted(l, key=lambda k: k[sort_by])
l = [
{ 'name': 'steve',
'children': [
{ 'name': 'sam',
'children': [
{'name': 'sally'},
{'name': 'sabrina'}
]
},
{'name': 'sydney'},
{'name': 'sal'}
]
},
{ 'name': 'fred',
'children': [
{'name': 'fritz'},
{'name': 'frank'}
]
}
]
print(reorder(l, 'name'))
def reorder(l, sort_by):
    l = sorted(l, key=lambda x: x[sort_by])
    for item in l:
        if "children" in item:
            item["children"] = reorder(item["children"], sort_by)
    return l
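One thing to be aware of: sorted returns a new list, but assigning to item["children"] still changes the original dictionaries. If the input should stay untouched, a copying variant could look like the sketch below; reorder_copy, the children_key parameter, and the shortened sample data are assumptions.

```python
def reorder_copy(items, sort_by, children_key='children'):
    """Return a sorted copy of items, sorting nested children recursively."""
    result = []
    for item in sorted(items, key=lambda x: x[sort_by]):
        new_item = dict(item)  # shallow copy, so the input dict is not modified
        if children_key in new_item:
            new_item[children_key] = reorder_copy(
                new_item[children_key], sort_by, children_key)
        result.append(new_item)
    return result

people = [
    {'name': 'steve', 'children': [{'name': 'sydney'}, {'name': 'sal'}]},
    {'name': 'fred', 'children': [{'name': 'fritz'}, {'name': 'frank'}]},
]
ordered = reorder_copy(people, 'name')
```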
Since you state "I grasp the technique to sort a list of dictionaries by value" I will post some code for recursively gathering data from another SO post I made, and leave it to you to implement your sorting technique. The code:
myjson = {
'transportation': 'car',
'address': {
'driveway': 'yes',
'home_address': {
'state': 'TX',
'city': 'Houston'}
},
'work_address': {
'state': 'TX',
'city': 'Sugarland',
'location': 'office-tower',
'salary': 30000}
}
def get_keys(some_dictionary, parent=None):
    for key, value in some_dictionary.items():
        if '{}.{}'.format(parent, key) not in my_list:
            my_list.append('{}.{}'.format(parent, key))
        if isinstance(value, dict):
            get_keys(value, parent='{}.{}'.format(parent, key))
        else:
            pass
my_list = []
get_keys(myjson, parent='myjson')
print(my_list)
This is intended to retrieve all keys recursively from the JSON data. It outputs:
['myjson.address',
'myjson.address.home_address',
'myjson.address.home_address.state',
'myjson.address.home_address.city',
'myjson.address.driveway',
'myjson.transportation',
'myjson.work_address',
'myjson.work_address.state',
'myjson.work_address.salary',
'myjson.work_address.location',
'myjson.work_address.city']
The main thing to note is that if isinstance(value, dict): results in get_keys() being called again, hence the recursive capabilities of it (but only for nested dictionaries in this case).
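The same recursion can be written without the module-level my_list by turning the function into a generator. A sketch only; the name iter_key_paths and the trimmed sample dict are made up for this example.

```python
def iter_key_paths(mapping, parent):
    """Yield 'parent.key' paths for every key, descending into nested dicts."""
    for key, value in mapping.items():
        path = '{}.{}'.format(parent, key)
        yield path
        if isinstance(value, dict):
            # Recurse: the nested dict's keys are reported under the current path.
            yield from iter_key_paths(value, path)

sample = {
    'transportation': 'car',
    'address': {'driveway': 'yes', 'home_address': {'state': 'TX'}},
}
paths = list(iter_key_paths(sample, 'myjson'))
```

On Python 3.7+ the paths come out in insertion order, with each parent listed before its children.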
Simple Python question, but I'm scratching my head over the answer!
I have an array of strings of arbitrary length called path, like this:
path = ['country', 'city', 'items']
I also have a dictionary, data, and a string, unwanted_property. I know that the dictionary is of arbitrary depth and is dictionaries all the way down, with the exception of the items property, which is always an array.
[CLARIFICATION: The point of this question is that I don't know what the contents of path will be. They could be anything. I also don't know what the dictionary will look like. I need to walk down the dictionary as far as the path indicates, and then delete the unwanted properties from there, without knowing in advance what the path looks like, or how long it will be.]
I want to retrieve the parts of the data object (if any) that matches the path, and then delete the unwanted_property from each.
So in the example above, I would like to retrieve:
data['country']['city']['items']
and then delete unwanted_property from each of the items in the array. I want to amend the original data, not a copy. (CLARIFICATION: By this I mean, I'd like to end up with the original dict, just minus the unwanted properties.)
How can I do this in code?
I've got this far:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
for p in path:
    if p == 'items':
        data = [i for i in data[p]]
    else:
        data = data[p]
if isinstance(data, list):
    for d in data:
        del d['unwanted_property']
else:
    del data['unwanted_property']
The problem is that this doesn't amend the original data. It also relies on items always being the last string in the path, which may not always be the case.
CLARIFICATION: I mean that I'd like to end up with:
{
'country': {
'city': {
'items': [
{
'name': '114th Street'
},
{
'name': '8th Avenue'
},
]
}
}
}
Whereas what I have available in data is only [{'name': '114th Street'}, {'name': '8th Avenue'}].
I feel like I need something like XPath for the dictionary.
The problem is that you are overwriting the original data reference. Change your processing code to
temp = data
for p in path:
    temp = temp[p]
if isinstance(temp, list):
    for d in temp:
        del d['unwanted_property']
else:
    del temp['unwanted_property']
In this version, you set temp to point to the same object that data was referring to. temp is not a copy, so any changes you make to it will be visible in the original object. Then you step temp along itself, while data remains a reference to the root dictionary. When you find the path you are looking for, any changes made via temp will be visible in data.
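The aliasing described here can be checked directly with the is operator. A small illustration with a one-item version of the data (the shortened data and the name same_object are assumptions for the demo):

```python
data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
]}}}

temp = data
for key in ['country', 'city', 'items']:
    temp = temp[key]  # rebinding temp never touches the name data

# temp is the very same list object that is stored inside data.
same_object = temp is data['country']['city']['items']

# So a change made through temp shows up when reading through data.
del temp[0]['unwanted_property']
```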
I also removed the line data = [i for i in data[p]]. It creates an unnecessary copy of the list that you never need, since you are not modifying the references stored in the list, just the contents of the references.
The fact that path is not pre-determined (besides the fact that items is going to be a list) means that you may end up getting a KeyError in the first loop if the path does not exist in your dictionary. You can handle that gracefully by doing something more like:
try:
    temp = data
    for p in path:
        temp = temp[p]
except KeyError:
    print('Path {} not in data'.format(path))
else:
    if isinstance(temp, list):
        for d in temp:
            del d['unwanted_property']
    else:
        del temp['unwanted_property']
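Wrapped into a reusable function, the approach might look like the following sketch. The name remove_property and its return value (a count of removed keys) are assumptions, as is the choice to silently skip items that do not have the property.

```python
def remove_property(data, path, unwanted):
    """Walk data along path and delete unwanted in place.

    Returns how many keys were removed (0 if the path does not exist).
    """
    node = data
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return 0  # the path does not exist in this structure
    targets = node if isinstance(node, list) else [node]
    removed = 0
    for target in targets:
        if isinstance(target, dict) and unwanted in target:
            del target[unwanted]
            removed += 1
    return removed

data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
    {'name': '8th Avenue', 'unwanted_property': 'foo'},
]}}}
count = remove_property(data, ['country', 'city', 'items'], 'unwanted_property')
count_missing = remove_property(data, ['country', 'foo'], 'unwanted_property')
```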
The problem you are facing is that you are re-assigning the data variable to an undesired value. In the body of your for loop you are setting data to the next level down the tree. For instance, given your example, data will have the following values (in order) up to the point where it leaves the for loop:
data == {'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}}
data == {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}
data == {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}
data == [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]
Then when you delete the items from your dictionaries at the end you are left with data being a list of those dictionaries as you have lost the higher parts of the structure. Thus if you make a backup reference for your data you can get the correct output, for example:
path = ['country', 'city', 'items']
data = {
'country': {
'city': {
'items': [
{
'name': '114th Street',
'unwanted_property': 'foo',
},
{
'name': '8th Avenue',
'unwanted_property': 'foo',
},
]
}
}
}
data_ref = data
for p in path:
    if p == 'items':
        data = [i for i in data[p]]
    else:
        data = data[p]
if isinstance(data, list):
    for d in data:
        del d['unwanted_property']
else:
    del data['unwanted_property']
data = data_ref
def delKey(your_dict,path):
    if len(path) == 1:
        for item in your_dict:
            del item[path[0]]
        return
    delKey( your_dict[path[0]],path[1:])
data
{'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo'}, {'name': '8th Avenue', 'unwanted_property': 'foo'}]}}}
path
['country', 'city', 'items', 'unwanted_property']
delKey(data,path)
data
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}
You need to remove the key unwanted_property.
names_list = []
def remove_key_from_items(data):
    for d in data:
        if d != 'items':
            remove_key_from_items(data[d])
        else:
            for item in data[d]:
                unwanted_prop = item.pop('unwanted_property', None)
                names_list.append(item)
This will remove the key. The second parameter None is returned if the key unwanted_property does not exist.
EDIT:
You can use pop even without the second parameter. It will raise KeyError if the key does not exist.
EDIT 2: Updated to recursively go into depth of data dict until it finds the items key, where it pops the unwanted_property as desired and append into the names_list list to get the desired output.
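The difference between the two forms of pop can be seen in a short example (the variable names first, second, and raised are only for the demonstration):

```python
item = {'name': '114th Street', 'unwanted_property': 'foo'}

first = item.pop('unwanted_property', None)   # key exists: returns 'foo'
second = item.pop('unwanted_property', None)  # key is gone: returns the default

try:
    item.pop('unwanted_property')  # no default given, so this raises
    raised = False
except KeyError:
    raised = True
```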
Using operator.itemgetter you can compose a function to return the final key's value.
import operator, functools
def compose(*functions):
    '''returns a callable composed of the functions
    compose(f, g, h, k) -> f(g(h(k())))
    '''
    def compose2(f, g):
        return lambda x: f(g(x))
    return functools.reduce(compose2, functions, lambda x: x)
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
Then use it like this:
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'
for thing in get_items(data):
    del thing[unwanted_property]
Of course if the path contains non-existent keys it will throw a KeyError - you probably should account for that:
path = ['country', 'foo', 'items']
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
try:
    for thing in get_items(data):
        del thing[unwanted_property]
except KeyError as e:
    print('missing key:', e)
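A shorter way to get the same walk, offered as an alternative sketch rather than as this answer's method, is to reduce over operator.getitem directly. The helper name get_by_path is made up, and pop with a default is used so that items lacking the property are skipped.

```python
import functools
import operator

def get_by_path(root, path):
    """Follow each key in path, starting from root, and return what is found."""
    return functools.reduce(operator.getitem, path, root)

data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
    {'name': '8th Avenue', 'unwanted_property': 'foo'},
]}}}

items = get_by_path(data, ['country', 'city', 'items'])
for thing in items:
    thing.pop('unwanted_property', None)

# A non-existent key still surfaces as KeyError, as with itemgetter.
try:
    get_by_path(data, ['country', 'foo', 'items'])
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]
```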
You can try this:
path = ['country', 'city', 'items']
previous_data = data[path[0]]
previous_key = path[0]
# previous_data already points at data[path[0]], so continue from the second key
for i in path[1:]:
    previous_data = previous_data[i]
    previous_key = i
    if isinstance(previous_data, list):
        for c, b in enumerate(previous_data):
            if "unwanted_property" in b:
                del previous_data[c]["unwanted_property"]
current_dict = {}
previous_data_dict = {}
for i, a in enumerate(path):
    if i == 0:
        current_dict[a] = data[a]
        previous_data_dict = data[a]
    else:
        if a == previous_key:
            current_dict[a] = previous_data
        else:
            current_dict[a] = previous_data_dict[a]
            previous_data_dict = previous_data_dict[a]
data = current_dict
print(data)
Output:
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}, 'items': [{'name': '114th Street'}, {'name': '8th Avenue'}], 'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}