I am trying to create a dictionary from a list recursively, and my code only works when there is a single item in the list. It fails for multiple items, and I suspect this is because the dictionary is recreated in each recursive call instead of being added to after the first call. How can I avoid this so that the whole list is converted to a dictionary?
Note: the list is a list of tuples containing two items.
def poncePlanner(restaurantChoices):
    if len(restaurantChoices) == 0:
        return {}
    else:
        name, resto = restaurantChoices[0][0], restaurantChoices[0][1]
        try:
            dic[name] = resto
            poncePlanner(restaurantChoices[1:])
            return dic
        except:
            dic = {name: resto}
            poncePlanner(restaurantChoices[1:])
            return dic
Intended input and output:
>>> restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
...                     ("Ramya", "Minero"), ("Jane", "Pancake Social")]
>>> poncePlanner(restaurantChoice)
{'Jane': 'Pancake Social',
'Ramya': 'Minero',
'Fareeda': 'Botiwala',
'Paige': 'Dancing Goats'}
You have the base case; now you need to define what to do when there is more than one item. Here you take the first tuple, make a dict from it, and merge in the result of recursing on the rest:
restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
                    ("Ramya", "Minero"), ("Jane", "Pancake Social")]

def poncePlanner(restaurantChoice):
    if not restaurantChoice:
        return {}
    head, *rest = restaurantChoice
    return {head[0]: head[1], **poncePlanner(rest)}
poncePlanner(restaurantChoice)
Returning:
{'Jane': 'Pancake Social',
'Ramya': 'Minero',
'Fareeda': 'Botiwala',
'Paige': 'Dancing Goats'}
Since restaurantChoices is already a list of (key, value) pairs, you can simply use the built-in function dict to create the dictionary:
def poncePlanner(restaurantChoices):
    return dict(restaurantChoices)
Without built-in functions, you can also use a simple for-loop to return the desired transformation:
def poncePlanner(restaurantChoices):
    result = {}
    for key, value in restaurantChoices:
        result[key] = value
    return result
If you really want recursion, I would do something like this, but it doesn't make sense because the lists are not nested:
def poncePlanner(restaurantChoices):
    def recursion(i, result):
        if i < len(restaurantChoices):
            key, value = restaurantChoices[i]
            result[key] = value
            recursion(i + 1, result)
        return result
    return recursion(0, {})
Each call of this recursive function does O(1) work, but note that the recursion itself still consumes O(n) stack space and is subject to Python's recursion limit, so a plain loop remains the more efficient option.
The main problem with the original code is that the dictionary dic is not passed to the deeper recursive calls, so the new contents are never added to the final dictionary: they are added to new dictionaries and then forgotten.
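For illustration, here is a minimal sketch of that fix: thread the dictionary through the recursive calls as an accumulator argument (a hypothetical variant, not the original code):

def poncePlanner(restaurantChoices, dic=None):
    # Create the accumulator once, then pass the same dict down every call.
    if dic is None:
        dic = {}
    if restaurantChoices:
        name, resto = restaurantChoices[0]
        dic[name] = resto
        poncePlanner(restaurantChoices[1:], dic)
    return dic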
I had to remove some fields from a dictionary; the keys for those fields are in a list. So I wrote this function:
def delete_keys_from_dict(dict_del, lst_keys):
    """
    Delete the keys present in lst_keys from the dictionary.
    Loops recursively over nested dictionaries.
    """
    dict_foo = dict_del.copy()  # iterate over a copy to avoid "RuntimeError: dictionary changed size during iteration"
    for field in dict_foo.keys():
        if field in lst_keys:
            del dict_del[field]
        if type(dict_foo[field]) == dict:
            delete_keys_from_dict(dict_del[field], lst_keys)
    return dict_del
This code works, but it's not very elegant and I'm sure that there is a better solution.
First off, I think your code works and is not inelegant. There's no immediate reason not to use the code you presented.
There are a few things that could be better though:
Comparing the type
Your code contains the line:
if type(dict_foo[field]) == dict:
That can be definitely improved. Generally (see also PEP8) you should use isinstance instead of comparing types:
if isinstance(dict_foo[field], dict)
However, that will also return True if dict_foo[field] is a subclass of dict. If you don't want that, you could instead use is rather than ==. That will be marginally (and probably unnoticeably) faster.
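A quick sketch of the difference, with a hypothetical subclass:

class MyDict(dict):
    pass

d = MyDict()
print(isinstance(d, dict))  # True  - isinstance accepts subclasses
print(type(d) is dict)      # False - exact-type check only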
If you also want to allow arbitrary dict-like objects you could go a step further and test if it's a collections.abc.MutableMapping. That will be True for dict and dict subclasses and for all mutable mappings that explicitly implement that interface without subclassing dict, for example UserDict:
>>> from collections.abc import MutableMapping  # Python 3.3+ (importing from plain `collections` was removed in 3.10)
>>> from collections import UserDict            # Python 3 (Python 2: from UserDict import UserDict)
>>> isinstance(UserDict(), MutableMapping)
True
>>> isinstance(UserDict(), dict)
False
Inplace modification and return value
Typically functions either modify a data structure in place or return a new (modified) data structure. Just to mention a few examples: list.append, dict.clear and dict.update all modify the data structure in place and return None. That makes it easier to keep track of what a function does. It's not a hard rule and there are always valid exceptions, but personally I don't think a function like this needs to be one: I would simply remove the return dict_del line and let it implicitly return None. YMMV.
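A tiny illustration of that convention with a built-in:

lst = [1, 2]
print(lst.append(3))  # None: append mutates in place and returns nothing
print(lst)            # [1, 2, 3]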
Removing the keys from the dictionary
You copied the dictionary to avoid problems when you remove key-value pairs during the iteration. However, as already mentioned by another answer you could just iterate over the keys that should be removed and try to delete them:
for key in keys_to_remove:
    try:
        del dictionary[key]
    except KeyError:
        pass
That has the additional advantage that you don't need to nest two loops (which could be slower, especially when the collection of keys to remove is large).
If you don't like empty except clauses you can also use: contextlib.suppress (requires Python 3.4+):
from contextlib import suppress

for key in keys_to_remove:
    with suppress(KeyError):
        del dictionary[key]
Variable names
There are a few variables I would rename because they are just not descriptive or even misleading:
delete_keys_from_dict should probably mention the subdict-handling, maybe delete_keys_from_dict_recursive.
dict_del sounds like a deleted dict. I tend to prefer names like dictionary or dct because the function name already describes what is done to the dictionary.
lst_keys, same there. I'd probably use just keys there. If you want to be more specific something like keys_sequence would make more sense because it accepts any sequence (you just have to be able to iterate over it multiple times), not just lists.
dict_foo, just no...
field isn't really appropriate either, it's a key.
Putting it all together:
As I said before I personally would modify the dictionary in-place and not return the dictionary again. Because of that I present two solutions, one that modifies it in-place but doesn't return anything and one that creates a new dictionary with the keys removed.
The version that modifies in-place (very much like Ned Batchelder's solution):
from collections.abc import MutableMapping
from contextlib import suppress

def delete_keys_from_dict(dictionary, keys):
    for key in keys:
        with suppress(KeyError):
            del dictionary[key]
    for value in dictionary.values():
        if isinstance(value, MutableMapping):
            delete_keys_from_dict(value, keys)
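For example, a quick usage sketch with hypothetical data:

d = {'a': 1, 'nested': {'a': 2, 'b': 3}}
delete_keys_from_dict(d, ['a'])
print(d)  # {'nested': {'b': 3}}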
And the solution that returns a new object:
from collections.abc import MutableMapping

def delete_keys_from_dict(dictionary, keys):
    keys_set = set(keys)  # just an optimization for the "if key in keys" lookup
    modified_dict = {}
    for key, value in dictionary.items():
        if key not in keys_set:
            if isinstance(value, MutableMapping):
                modified_dict[key] = delete_keys_from_dict(value, keys_set)
            else:
                modified_dict[key] = value  # or copy.deepcopy(value) if a copy is desired for non-dicts
    return modified_dict
However, it only makes copies of the dictionaries; the other values are not returned as copies. You could easily wrap those in copy.deepcopy (I put a comment in the appropriate place in the code) if you want that.
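A quick usage sketch of the copying version, with hypothetical data:

original = {'a': 1, 'nested': {'a': 2, 'b': 3}}
cleaned = delete_keys_from_dict(original, ['a'])
print(cleaned)   # {'nested': {'b': 3}}
print(original)  # unchanged: {'a': 1, 'nested': {'a': 2, 'b': 3}}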
def delete_keys_from_dict(dict_del, lst_keys):
    for k in lst_keys:
        try:
            del dict_del[k]
        except KeyError:
            pass
    for v in dict_del.values():
        if isinstance(v, dict):
            delete_keys_from_dict(v, lst_keys)
    return dict_del
Since the question requested an elegant way, I'll submit my general-purpose solution to wrangling nested structures. First, install the boltons utility package with pip install boltons, then:
from boltons.iterutils import remap
data = {'one': 'remains', 'this': 'goes', 'of': 'course'}
bad_keys = set(['this', 'is', 'a', 'list', 'of', 'keys'])
drop_keys = lambda path, key, value: key not in bad_keys
clean = remap(data, visit=drop_keys)
print(clean)
# Output:
# {'one': 'remains'}
In short, the remap utility is a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles and special containers.
This page has many more examples, including ones working with much larger objects from Github's API.
It's pure Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly this kind of case, so if you find a case it doesn't handle, you can bug me to fix it right here.
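As a quick sketch of how remap recurses into nested containers (hypothetical data, assuming boltons is installed; returning False from visit drops the item):

from boltons.iterutils import remap

nested = {'of': 1, 'keep': {'of': 2, 'inner': 'stays'}, 'items': [{'of': 3}]}
clean = remap(nested, visit=lambda path, key, value: key != 'of')
print(clean)  # {'keep': {'inner': 'stays'}, 'items': [{}]}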
def delete_keys_from_dict(d, to_delete):
    if isinstance(to_delete, str):
        to_delete = [to_delete]
    if isinstance(d, dict):
        for single_to_delete in set(to_delete):
            if single_to_delete in d:
                del d[single_to_delete]
        for k, v in d.items():
            delete_keys_from_dict(v, to_delete)
    elif isinstance(d, list):
        for i in d:
            delete_keys_from_dict(i, to_delete)

d = {'a': 10, 'b': [{'c': 10, 'd': 10, 'a': 10}, {'a': 10}], 'c': 1}
delete_keys_from_dict(d, ['a', 'c'])  # in-place deletion
print(d)
# {'b': [{'d': 10}, {}]}
This solution works for dict and list in a given nested dict. The input to_delete can be a list of str to be deleted or a single str.
Please note that if you remove the only key in a dict, you will end up with an empty dict.
I think the following is more elegant:
def delete_keys_from_dict(dict_del, lst_keys):
    if not isinstance(dict_del, dict):
        return dict_del
    return {
        key: value
        for key, value in (
            (key, delete_keys_from_dict(value, lst_keys))
            for key, value in dict_del.items()
        )
        if key not in lst_keys
    }
Example usage:
test_dict_in = {
    1: {1: {0: 2, 3: 4}},
    0: {2: 3},
    2: {5: {0: 4}, 6: {7: 8}},
}
test_dict_out = {
    1: {1: {3: 4}},
    2: {5: {}, 6: {7: 8}},
}
assert delete_keys_from_dict(test_dict_in, [0]) == test_dict_out
Since you already need to loop through every element in the dict, I'd stick with a single loop and just make sure to use a set for looking up the keys to delete:
def delete_keys_from_dict(dict_del, the_keys):
    """
    Delete the keys present in the_keys from the dictionary.
    Loops recursively over nested dictionaries.
    """
    # make sure the_keys is a set to get O(1) lookups
    if type(the_keys) is not set:
        the_keys = set(the_keys)
    # materialize the items so we can delete from dict_del while looping
    for k, v in list(dict_del.items()):
        if k in the_keys:
            del dict_del[k]
        if isinstance(v, dict):
            delete_keys_from_dict(v, the_keys)
    return dict_del
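A quick usage sketch with hypothetical data:

d = {'a': 1, 'b': {'a': 2, 'c': 3}}
delete_keys_from_dict(d, ['a'])
print(d)  # {'b': {'c': 3}}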
This works with dicts containing iterables (list, ...) that may in turn contain dicts. Python 3 only; for Python 2, unicode should also be excluded from the iteration. There may also be some iterables I'm not aware of that don't work (i.e. that would lead to infinite recursion).
from collections.abc import Iterable

def deep_omit(d, keys):
    if isinstance(d, dict):
        for k in keys:
            d.pop(k, None)
        for v in d.values():
            deep_omit(v, keys)
    elif isinstance(d, Iterable) and not isinstance(d, str):
        for e in d:
            deep_omit(e, keys)
    return d
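A quick usage sketch with hypothetical data:

data = {'a': 1, 'items': [{'a': 2, 'b': 3}, {'a': 4}]}
deep_omit(data, ['a'])
print(data)  # {'items': [{'b': 3}, {}]}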
Since nobody posted an iterative version that could be useful for someone:
def delete_key_from_dict(adict, key):
    stack = [adict]
    while stack:
        elem = stack.pop()
        if isinstance(elem, dict):
            if key in elem:
                del elem[key]
            for k in elem:
                stack.append(elem[k])
This version is probably what you would push to production. The recursive version is elegant and easy to write but it scales badly (by default Python uses a maximum recursion depth of 1000).
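If your data can also contain lists of dicts, here is a hedged sketch of the same stack-based idea extended to descend into lists (an assumption beyond the original, which only walks nested dicts):

def delete_key_from_dict(adict, key):
    stack = [adict]
    while stack:
        elem = stack.pop()
        if isinstance(elem, dict):
            elem.pop(key, None)       # drop the key if present
            stack.extend(elem.values())
        elif isinstance(elem, list):
            stack.extend(elem)        # descend into list elements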
If you have nested keys as well, here is an elegant solution based on John La Rooy's answer:
from boltons.iterutils import remap

def sof_solution():
    data = {"user": {"name": "test", "pwd": "******"}, "accounts": ["1", "2"]}
    sensitive = {"user.pwd", "accounts"}
    clean = remap(
        data,
        visit=lambda path, key, value: drop_keys(path, key, value, sensitive)
    )
    print(clean)

def drop_keys(path, key, value, sensitive):
    if len(path) > 0:
        nested_key = f"{'.'.join(path)}.{key}"
        return nested_key not in sensitive
    return key not in sensitive

sof_solution()  # prints {'user': {'name': 'test'}}
Using the awesome code from this post, with a small statement added:
def remove_fields(d, list_of_keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        # note: `if v` also drops values that become empty/falsy after cleaning
        return [v for v in (remove_fields(v, list_of_keys_to_remove) for v in d) if v]
    return {
        k: v
        for k, v in ((k, remove_fields(v, list_of_keys_to_remove)) for k, v in d.items())
        if k not in list_of_keys_to_remove
    }
I came here to search for a solution to remove keys from deeply nested Python3 dicts and all solutions seem to be somewhat complex.
Here's a one-liner for removing keys from nested or flat dicts:
nested_dict = {
    "foo": {
        "bar": {
            "foobar": {},
            "shmoobar": {},
        }
    }
}
# {'foo': {'bar': {'foobar': {}, 'shmoobar': {}}}}

nested_dict.get("foo", {}).get("bar", {}).pop("shmoobar", None)
# {'foo': {'bar': {'foobar': {}}}}
I used .get() so as not to get a KeyError, providing an empty dict as the default value along the chain. For the last element I use pop() with None as the default, again to avoid a KeyError.
Basically the title. I'm trying to store information about duplicate objects in a list of objects, but I'm having a hard time finding anything related to this. I've devised the following for now, but I'm not sure it's the best way to do what I want:
from dataclasses import dataclass

@dataclass
class People:
    name: str = None
    age: int = None

    # Functions to check for duplicates (based on names)
    def __eq__(self, other):
        return (self.name == other.name)

    def __hash__(self):
        return hash(('name', self.name))

objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]
duplicates, temp = [], {}

for (i, obj) in enumerate(objects):
    if obj.name not in temp:
        temp[obj.name] = {'count': 1, 'indices': [i]}
    else:
        temp[obj.name]['count'] += 1
        temp[obj.name]['indices'] += [i]

for t in temp:
    if temp[t]['count'] > 1:
        print(f"Found duplicates of {t}")
        for i in temp[t]['indices']:
            duplicates.append(objects[i])
Edit: The People class is just a simple example. I thought about making it a dict, but that would be more complicated than keeping track of a list of objects. I'm looking to build a new list of the duplicates by name only, while keeping every other attribute/value of the original objects.
Use collections.Counter.
from collections import Counter
...
counts = Counter(objects)
duplicates = [o for o, c in counts.items() if c > 1]
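With the question's data that looks roughly like this (a sketch; Counter groups by the class's name-based __eq__/__hash__, so you get one representative per duplicated name):

objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]
counts = Counter(objects)
duplicates = [o for o, c in counts.items() if c > 1]
print(duplicates)  # [People(name='General', age=12)]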
If you want lists of objects matching certain criteria (e.g. all those with the same name), that's not really the same thing as getting a list of duplicates, but it's also very simple:
from collections import defaultdict
...
people_by_name = defaultdict(list)

for p in objects:
    people_by_name[p.name].append(p)
If you want to narrow that dictionary to only lists with more than one element, you can use a comprehension very similar to the one you'd use with the Counter:
people_by_name = {k: v for k, v in people_by_name.items() if len(v) > 1}
I have a list that contains several strings and a dictionary with strings (that contain wildcards) as keys and integers as values.
For example like this:
list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*':'3', 'shirt*':'7', 'snowboard*':'1'}
I would like to go through list1 and see if there is a key in dict1 that (with the wildcard) matches the string from list1 and get the respective value from dict1. So in this case 3 for 'tomato*'.
Is there a way to iterate over list1, see if one of the dict1 keys (with wildcards) matches with this particular string and return the value from dict1?
I know I could iterate over dict1 and compare the keys with the elements in list1 this way. But in my case, the dict is very large and in addition, I have a lot of lists to go through. So it would take too much time to loop through the dictionary every time.
I thought about turning the keys into a list as well and getting wildcard matches with a list comprehension and fnmatch(), but the returned match wouldn't be able to find the value in the dict (because of the wildcard).
Here is a data structure, implemented using only the standard library, that can help you:
from collections import defaultdict

class Trie(defaultdict):
    def __init__(self, value=None):
        # a Trie is essentially a hash table of hash tables
        super().__init__(lambda: Trie(value))
        self.__value = value

    def __getitem__(self, key):
        node = self
        if len(key) > 1:
            # allows access like trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key:
                node = node[char]
            return node
        else:
            # actual getitem routine
            return defaultdict.__getitem__(self, key)

    def __setitem__(self, key, value):
        node = self
        if len(key) > 1:
            # allows access like trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key[:-1]:
                node = node[char]
            node[key[-1]] = value
        else:
            # actual setitem routine
            if type(value) is int:
                value = Trie(int(value))
            defaultdict.__setitem__(self, key, value)

    def __str__(self):
        return str(self.__value)
d = Trie()
d["ab"] = 3
print(d["abcde"])
# 3
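As a hedged sketch of how this could answer the question, assuming every pattern ends in a single trailing '*' that we strip before inserting:

dict1 = {'tomato*': '3', 'shirt*': '7', 'snowboard*': '1'}
list1 = ['i', 'like', 'tomatoes']

trie = Trie()
for pattern, value in dict1.items():
    trie[pattern.rstrip('*')] = int(value)

for word in list1:
    print(word, trie[word])  # prints None for words with no matching prefix
# i None
# like None
# tomatoes 3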
I need a dictionary data structure that store dictionaries as seen below:
custom = {1: {'a': np.zeros(10), 'b': np.zeros(100)},
2: {'c': np.zeros(20), 'd': np.zeros(200)}}
But the problem is that I iterate over this data structure many times in my code. Every time I iterate over it, I need the order of iteration to be respected because all the elements in this complex data structure are mapped to a 1D array (serialized if you will), and thus the order is important. I thought about writing a ordered dict of ordered dict for that matter, but I'm not sure this is the right solution as it seems I may be choosing the wrong data structure. What would be the most adequate solution for my case?
UPDATE
So this is what I came up with so far:
class Test(list):
    def __init__(self, *args, **kwargs):
        super(Test, self).__init__(*args, **kwargs)
        for k, v in args[0].items():
            self[k] = OrderedDict(v)
        self.d = -1
        self.iterator = iter(self[-1].keys())
        self.etype = next(self.iterator)
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self.idx += 1
            return self[self.d][self.etype][self.idx - 1]
        except IndexError:
            self.etype = next(self.iterator)
            self.idx = 0
            return self[self.d][self.etype][self.idx - 1]

    def __call__(self, d):
        self.d = -1 - d
        self.iterator = iter(self[self.d].keys())
        self.etype = next(self.iterator)
        self.idx = 0
        return self
def main(argv=()):
    tst = Test(elements)
    for el in tst:
        print(el)
    # loop over a lower dimension
    for el in tst(-2):
        print(el)
    print(tst)
    return 0

if __name__ == "__main__":
    sys.exit(main())
I can iterate as many times as I want in this ordered structure, and I implemented __call__ so I can iterate over the lower dimensions. I don't like the fact that if there isn't a lower dimension present in the list, it doesn't give me any errors. I also have the feeling that every time I call return self[self.d][self.etype][self.idx-1] is less efficient than the original iteration over the dictionary. Is this true? How can I improve this?
Here's another alternative that uses an OrderedDefaultdict to define the tree-like data structure you want. I'm reusing the definition of it from another answer of mine.
To make use of it, you have to ensure the entries are defined in the order you want to access them in later on.
from collections import OrderedDict

class OrderedDefaultdict(OrderedDict):
    def __init__(self, *args, **kwargs):
        if not args:
            self.default_factory = None
        else:
            if not (args[0] is None or callable(args[0])):
                raise TypeError('first argument must be callable or None')
            self.default_factory = args[0]
            args = args[1:]
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = default = self.default_factory()
        return default

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, self.iteritems()  # Python 2; use iter(self.items()) on Python 3
Tree = lambda: OrderedDefaultdict(Tree)
custom = Tree()
custom[1]['a'] = np.zeros(10)
custom[1]['b'] = np.zeros(100)
custom[2]['c'] = np.zeros(20)
custom[2]['d'] = np.zeros(200)
I'm not sure I understand your follow-on question. If the data structure is limited to two levels, you could use nested for loops to iterate over its elements in the order they were defined. For example:
for key1, subtree in custom.items():
    for key2, elem in subtree.items():
        print('custom[{!r}][{!r}]: {}'.format(key1, key2, elem))
(In Python 2 you'd want to use iteritems() instead of items().)
I think using OrderedDicts is the best way. They're built-in and relatively fast:
custom = OrderedDict([(1, OrderedDict([('a', np.zeros(10)),
                                       ('b', np.zeros(100))])),
                      (2, OrderedDict([('c', np.zeros(20)),
                                       ('d', np.zeros(200))]))])
If you want to make it easy to iterate over the contents of your data structure, you can always provide a utility function to do so:
def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            for row in v:
                yield row
Note that in Python 3.3+, which allows yield from <expression>, the last for loop can be eliminated:
def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            yield from v
With one of those you'll then be able to write something like:
for elem in iter_over_contents(custom):
    print(elem)
and hide the complexity.
While you could define your own class in an attempt to encapsulate this data structure and use something like the iter_over_contents() generator function as its __iter__() method, that approach would likely be slower and wouldn't allow expressions using two levels of indexing such as the following:
custom[1]['b']
which nested dictionaries (or OrderedDefaultdicts, as shown in my other answer) do allow.
Could you just use a list of dictionaries?
custom = [{'a': np.zeros(10), 'b': np.zeros(100)},
          {'c': np.zeros(20), 'd': np.zeros(200)}]
This could work if the outer dictionary is the only one you need in the right order. You could still access the inner dictionaries with custom[0] or custom[1] (careful, indexing now starts at 0).
If not all of the indices are used, you could do the following:
custom = [None] * maxLength # maximum dict size you expect
custom[1] = {'a': np.zeros(10), 'b': np.zeros(100)}
custom[2] = {'c': np.zeros(20), 'd': np.zeros(200)}
You can fix the iteration order of your keys by sorting them first:
for key in sorted(custom.keys()):
    print(key, custom[key])
If you want to reduce sorted()-calls, you may want to store the keys in an extra list which will then serve as your iteration order:
ordered_keys = sorted(custom.keys())

for key in ordered_keys:
    print(key, custom[key])
You should be ready to go for as many iterations over your data structure, as you need.