I am trying to create a dictionary that uses a pair of strings as its key, and I want a lookup to succeed whichever order the two strings are given in.
myDict[('A', 'B')] = 'something'
myDict[('B', 'A')] = 'something else'
print(myDict[('A', 'B')])
I want this piece of code to print 'something else'. Unfortunately, it seems that the ordering matters with tuples. What would be the best data structure to use as the key?
Use a frozenset
Instead of a tuple, which is ordered, you can use a frozenset, which is unordered but still hashable because it is immutable.
myDict = {}
myDict[frozenset(('A', 'B'))] = 'something'
myDict[frozenset(('B', 'A'))] = 'something else'
print(myDict[frozenset(('A', 'B'))])
Which will print:
something else
Unfortunately, this simplicity comes with a disadvantage: a frozenset is just an immutable set, so it cannot hold duplicate values. For example, the following comparison is True:
frozenset((1, 2)) == frozenset((1,2,2,1,1))
If collapsing duplicates doesn't bother you, feel free to use frozenset.
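To make the trade-off concrete, here is a small sketch of how duplicates collapse when a frozenset is used as the key. The key values and the name my_dict are made up for illustration.

```python
# frozenset discards duplicates, so keys that differ only in how
# often an element repeats end up being the same dictionary key.
my_dict = {}
my_dict[frozenset(('A', 'B'))] = 'pair'
my_dict[frozenset(('A', 'A'))] = 'double A'
my_dict[frozenset(('A',))] = 'single A'   # overwrites 'double A'

same_key = frozenset(('A', 'A')) == frozenset(('A',))
print(same_key)                          # True
print(my_dict[frozenset(('B', 'A'))])    # pair
print(my_dict[frozenset(('A', 'A'))])    # single A
print(len(my_dict))                      # 2
```

If ('A', 'A') and ('A',) must stay distinct keys, one of the alternatives in this answer is the safer choice.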
But if you need duplicates to be preserved, there are two alternatives:
First method is to use a Counter, and make it hashable by using frozenset again: (Note: everything in the tuple must be hashable)
from collections import Counter
myDict = {}
myDict[frozenset(Counter(('A', 'B')).items())] = 'something'
myDict[frozenset(Counter(('B', 'A')).items())] = 'something else'
print(myDict[frozenset(Counter(('A', 'B')).items())])
# something else
Second method is to use the built-in function sorted, and make it hashable by making it a tuple. This will sort the values before being used as a key: (Note: everything in the tuple must be sortable and hashable)
myDict = {}
myDict[tuple(sorted(('A', 'B')))] = 'something'
myDict[tuple(sorted(('B', 'A')))] = 'something else'
print(myDict[tuple(sorted(('A', 'B')))])
# something else
But if the tuple elements are neither all hashable, nor are they all sortable, unfortunately, you might be out of luck and need to create your own dict structure... D:
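For that last case, one possible fallback is a small mapping that compares keys with == only and scans its entries linearly. This is a rough sketch, not a drop-in dict replacement: the class name UnorderedLookup and its helper are hypothetical, and lookups are slow for large collections because nothing is hashed.

```python
class UnorderedLookup:
    """Maps unordered groups of items to values using only == (hypothetical)."""

    def __init__(self):
        self._entries = []  # list of (list_of_items, value) pairs

    @staticmethod
    def _same_items(a, b):
        # Multiset equality without hashing or sorting.
        remaining = list(b)
        for item in a:
            try:
                remaining.remove(item)  # list.remove finds a match with ==
            except ValueError:
                return False
        return not remaining

    def __setitem__(self, key, value):
        for i, (items, _) in enumerate(self._entries):
            if self._same_items(key, items):
                self._entries[i] = (items, value)  # overwrite existing entry
                return
        self._entries.append((list(key), value))

    def __getitem__(self, key):
        for items, value in self._entries:
            if self._same_items(key, items):
                return value
        raise KeyError(key)


lookup = UnorderedLookup()
lookup[(['a'], {'x': 1})] = 'something'       # lists and dicts are unhashable
lookup[({'x': 1}, ['a'])] = 'something else'  # same items, other order
print(lookup[(['a'], {'x': 1})])              # something else
```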
You can build your own structure:
class ReverseDict:
    def __init__(self):
        self.d = {}
    def __setitem__(self, k, v):
        self.d[k] = v
    def __getitem__(self, tup):
        return self.d[tup[::-1]]
myDict = ReverseDict()
myDict[('A', 'B')] = 'something'
myDict[('B', 'A')] = 'something else'
print(myDict[('A', 'B')])
Output:
something else
I think the point here is that the elements of the tuple point to the same dictionary element regardless of their order. This can be done by making the hash function commutative over the tuple key elements:
class UnorderedKeyDict(dict):
    def __init__(self, *arg):
        if arg:
            for k,v in arg[0].items():
                self[k] = v
    def _hash(self, tup):
        return sum([hash(ti) for ti in tup])
    def __setitem__(self, tup, value):
        super().__setitem__(self._hash(tup), value)
    def __getitem__(self, tup):
        return super().__getitem__(self._hash(tup))
mydict = UnorderedKeyDict({('a','b'):12,('b','c'):13})
mydict[('b','a')]
>> 12
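One caveat with summing hashes: two different keys whose hashes add up to the same number end up sharing one entry. For example, (1, 4) and (2, 3) collide because small integers hash to themselves. A possible variation, sketched below, keeps the same idea but stores a frozenset of the elements as the real key, so the dict's own equality check resolves collisions. The class name FrozenKeyDict is hypothetical, and like any frozenset key it treats ('a', 'a') the same as ('a',).

```python
class FrozenKeyDict(dict):
    """dict whose tuple keys are normalised to frozensets (hypothetical name)."""

    def __init__(self, mapping=None):
        super().__init__()
        for key, value in (mapping or {}).items():
            self[key] = value  # goes through the overridden __setitem__

    def __setitem__(self, key, value):
        super().__setitem__(frozenset(key), value)

    def __getitem__(self, key):
        return super().__getitem__(frozenset(key))

    def __contains__(self, key):
        return super().__contains__(frozenset(key))


# Summed hashes collide for these two distinct keys.
collision = hash(1) + hash(4) == hash(2) + hash(3)
print(collision)  # True

d = FrozenKeyDict({('a', 'b'): 12, ('b', 'c'): 13})
d[(1, 4)] = 'one-four'
d[(2, 3)] = 'two-three'
print(d[('b', 'a')])  # 12
print(d[(4, 1)])      # one-four
print(d[(3, 2)])      # two-three
```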
Related
Which methods need to be overridden if I want to change the default behaviour of a dictionary?
I am aware of some of them, such as __getitem__(), __missing__() and __iter__().
I am trying to implement a dictionary in which assigning a value to a key that already exists does not discard the old value but keeps it in a list, and in which removing the key, for example with pop(key), removes the oldest value.
What methods need to be modified to override the dict class to achieve this behaviour?
It is the __setitem__ method that you want to update. You want it to create a list whenever a new key is set in your dictionary and append to that list if the key exists. You can then extend the __getitem__ method as well to take the index of the item you want in a list. As for the pop method, you will also need to override dict.pop.
class ListDict(dict):
    def __setitem__(self, key, value):
        if key not in self:
            super().__setitem__(key, [])
        self[key].append(value)
    def __getitem__(self, item):
        if isinstance(item, tuple):
            item, pos = item
            return super().__getitem__(item)[pos]
        else:
            return super().__getitem__(item)
    def pop(self, k):
        v = self[k].pop(0)
        if not self[k]:
            super().__delitem__(k)
        return v
Example:
# Setting items
d = ListDict()
d['foo'] = 'bar'
d['foo'] = 'baz'
d # {'foo': ['bar', 'baz']}
# Getting items
d['foo', 0] # 'bar'
d['foo', 1] # 'baz'
d['foo', 0:2] # ['bar', 'baz']
# Popping a key
d.pop('foo')
d # {'foo': ['baz']}
d.pop('foo')
d # {}
As I understand it, the __iter__ method makes an object iterable, and if it yields pairs, calling dict on the object produces a dictionary. Now, suppose I want a class that produces a list of values when passed to list and a dictionary when passed to dict. How can I do that?
Let's say the class is this:
class Container(object):
    def __init__(self):
        self.pairs = [
            ('key1', 'val1'),
            ('key2', 'val2'),
        ]
    def __getitem__(self, idx):
        return self.pairs[idx][1]  # return value only
    def __iter__(self):
        for key, val in self.pairs:
            yield key, val
Now, if I use list or dict, I get:
data = Container()
list(data) # [('key1', 'val1'), ('key2', 'val2')]
dict(data) # {'key1': 'val1', 'key2': 'val2'}
What I want (with list) is:
list(data) # ['val1', 'val2']
without keys.
Is it possible? If yes, how?
This is a terrible idea! But you can use the inspect module to read the calling line and check whether the list constructor was called explicitly. Note that this only works when list(...) appears on the calling line itself, and it will probably produce unexpected results when the regex matches something it wasn't intended to match.
import inspect, re
class Container(object):
    def __init__(self):
        self.pairs = [
            ('key1', 'val1'),
            ('key2', 'val2'),
        ]
    def __getitem__(self, idx):
        return self.pairs[idx][1]  # return value only
    def __iter__(self):
        # Source text of the line that triggered the iteration,
        # e.g. "print(list(Container()))"
        line = inspect.stack()[1][4][0]
        # Raw string so that \( and \) reach the regex engine unchanged
        list_mode = re.match(r'^.*?list\(.*?\).*$', line) is not None
        for key, val in self.pairs:
            if list_mode:
                yield val
            else:
                yield key, val
print(list(Container()))  # ['val1', 'val2']
I think you have to do it manually with something like a list comprehension
[v for k, v in data]
This isn't exactly what you asked for, but I'm not sure that what you asked for is really necessary. Let me know if it is, though I'm not sure it's possible. In the meantime:
list_data = [val for key,val in list(data)]
Also what Blender said. That was really my first inclination, but I tried to stick more to what you were asking. Something like this:
def to_list(self):
    return [val for key,val in self.pairs]
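Another option relies on how the dict constructor treats its argument: when the object has a keys() method, dict() builds the dictionary from obj[key] for each key and does not use __iter__ at all, while list() only uses __iter__. The sketch below takes advantage of that. It assumes __getitem__ can look up by key, which differs from the positional __getitem__ in the question.

```python
class Container:
    def __init__(self):
        self.pairs = [
            ('key1', 'val1'),
            ('key2', 'val2'),
        ]

    def keys(self):
        # dict() looks for a keys() method and, if found, reads
        # self[key] for each key instead of iterating over the object.
        return [key for key, _ in self.pairs]

    def __getitem__(self, key):
        # Assumption: lookup is by key here, not by position.
        for k, val in self.pairs:
            if k == key:
                return val
        raise KeyError(key)

    def __iter__(self):
        # list() only uses __iter__, so it sees bare values.
        for _, val in self.pairs:
            yield val


data = Container()
print(list(data))  # ['val1', 'val2']
print(dict(data))  # {'key1': 'val1', 'key2': 'val2'}
```

The downside is that iterating over the object yields values, whereas real mappings yield keys, which may surprise other code that treats it as a mapping.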
I need a dictionary data structure that store dictionaries as seen below:
custom = {1: {'a': np.zeros(10), 'b': np.zeros(100)},
          2: {'c': np.zeros(20), 'd': np.zeros(200)}}
The problem is that I iterate over this data structure many times in my code. Every time I iterate over it, the order of iteration must be the same, because all the elements in this nested data structure are mapped to a 1D array (serialized, if you will), so the order is important. I thought about using an ordered dict of ordered dicts, but I'm not sure this is the right solution, as it seems I may be choosing the wrong data structure. What would be the most suitable solution for my case?
UPDATE
So this is what I came up with so far:
class Test(list):
    def __init__(self, *args, **kwargs):
        super(Test, self).__init__(*args, **kwargs)
        for k,v in args[0].items():
            self[k] = OrderedDict(v)
        self.d = -1
        self.iterator = iter(self[-1].keys())
        self.etype = next(self.iterator)
        self.idx = 0
    def __iter__(self):
        return self
    def __next__(self):
        try:
            self.idx += 1
            return self[self.d][self.etype][self.idx-1]
        except IndexError:
            self.etype = next(self.iterator)
            self.idx = 0
            return self[self.d][self.etype][self.idx-1]
    def __call__(self, d):
        self.d = -1 - d
        self.iterator = iter(self[self.d].keys())
        self.etype = next(self.iterator)
        self.idx = 0
        return self
def main(argv=()):
    tst = Test(elements)
    for el in tst:
        print(el)
    # loop over a lower dimension
    for el in tst(-2):
        print(el)
    print(tst)
    return 0
if __name__ == "__main__":
    sys.exit(main())
I can iterate over this ordered structure as many times as I want, and I implemented __call__ so that I can iterate over the lower dimensions. I don't like that it gives me no error when the requested lower dimension is not present in the list. I also suspect that evaluating self[self.d][self.etype][self.idx-1] on every step is less efficient than iterating over the dictionary directly. Is this true? How can I improve this?
Here's another alternative that uses an OrderedDefaultdict to define the tree-like data structure you want. I'm reusing the definition of it from another answer of mine.
To make use of it, you have to ensure the entries are defined in the order you want to access them in later on.
from collections import OrderedDict
import numpy as np
class OrderedDefaultdict(OrderedDict):
    def __init__(self, *args, **kwargs):
        if not args:
            self.default_factory = None
        else:
            if not (args[0] is None or callable(args[0])):
                raise TypeError('first argument must be callable or None')
            self.default_factory = args[0]
            args = args[1:]
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
    def __missing__ (self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = default = self.default_factory()
        return default
    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else ()
        # iter(self.items()) works on both Python 2 and Python 3
        return self.__class__, args, None, None, iter(self.items())
Tree = lambda: OrderedDefaultdict(Tree)
custom = Tree()
custom[1]['a'] = np.zeros(10)
custom[1]['b'] = np.zeros(100)
custom[2]['c'] = np.zeros(20)
custom[2]['d'] = np.zeros(200)
I'm not sure I understand your follow-on question. If the data structure is limited to two levels, you could use nested for loops to iterate over its elements in the order they were defined. For example:
for key1, subtree in custom.items():
    for key2, elem in subtree.items():
        print('custom[{!r}][{!r}]: {}'.format(key1, key2, elem))
(In Python 2 you'd want to use iteritems() instead of items().)
I think using OrderedDicts is the best way. They're built-in and relatively fast:
custom = OrderedDict([(1, OrderedDict([('a', np.zeros(10)),
                                       ('b', np.zeros(100))])),
                      (2, OrderedDict([('c', np.zeros(20)),
                                       ('d', np.zeros(200))]))])
If you want to make it easy to iterate over the contents of your data structure, you can always provide a utility function to do so:
def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            for row in v:
                yield row
Note that in Python 3.3+, which allows yield from <expression>, the last for loop can be eliminated:
def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            yield from v
With one of those you'll then be able to write something like:
for elem in iter_over_contents(custom):
    print(elem)
and hide the complexity.
While you could define your own class in an attempt to encapsulate this data structure and use something like the iter_over_contents() generator function as its __iter__() method, that approach would likely be slower and wouldn't allow expressions using two levels of indexing such as the following:
custom[1]['b']
which nested dictionaries (or OrderedDefaultdicts as shown in my other answer) do allow.
Could you just use a list of dictionaries?
custom = [{'a': np.zeros(10), 'b': np.zeros(100)},
{'c': np.zeros(20), 'd': np.zeros(200)}]
This could work if the outer dictionary is the only one you need in the right order. You could still access the inner dictionaries with custom[0] or custom[1] (careful, indexing now starts at 0).
If not all of the indices are used, you could do the following:
custom = [None] * maxLength # maximum dict size you expect
custom[1] = {'a': np.zeros(10), 'b': np.zeros(100)}
custom[2] = {'c': np.zeros(20), 'd': np.zeros(200)}
You can fix the order of your keys while iterating by sorting them first:
for key in sorted(custom.keys()):
    print(key, custom[key])
If you want to reduce the number of calls to sorted(), you can store the keys in an extra list, which then serves as your iteration order:
ordered_keys = sorted(custom.keys())
for key in ordered_keys:
    print(key, custom[key])
After that, you can iterate over your data structure as many times as you need, always in the same order.
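On Python 3.7 and later, a plain dict is guaranteed to iterate in insertion order, so sorting or an OrderedDict is only needed when the desired order differs from the order in which the entries were added. A minimal sketch under that assumption, with short lists standing in for the NumPy arrays and iter_flat as a made-up helper name:

```python
from itertools import chain

# Lists stand in for the NumPy arrays from the question.
custom = {
    1: {'a': [0.0] * 2, 'b': [1.0] * 3},
    2: {'c': [2.0] * 1, 'd': [3.0] * 2},
}

def iter_flat(nested):
    """Yield every element in insertion order, outer key first."""
    for inner in nested.values():
        yield from chain.from_iterable(inner.values())

flat = list(iter_flat(custom))
print(flat)  # [0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 3.0, 3.0]

# A second pass over the same structure gives the same order.
print(flat == list(iter_flat(custom)))  # True
```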
In this simplified example, I want to get the value of bar1 when I iterate over a dictionary of class instances, in order to avoid issues with a library that requires a list.
class classTest:
    def __init__(self, foo):
        self.bar1 = foo
    def __iter__(self):
        for k in self.keys():
            yield self[k].bar1
aDict = {}
aDict["foo"] = classTest("xx")
aDict["bar"] = classTest("yy")
for i in aDict:
    print(i)
The current output is
foo
bar
The output I am aiming for is
xx
yy
What am I missing to get this to work? Or is this even possible?
You're not iterating over the class instances, but over the dictionary. Also, your class has no __getitem__ method, so your __iter__ wouldn't work even if it were called.
To get your result you can do
for value in aDict.values():
    print(value.bar1)
You're printing the keys. Print the bar1 attribute of each value instead:
for k in aDict:
    print(aDict[k].bar1)
Or you can just iterate directly over the values:
for v in aDict.values():  # Python 2: aDict.itervalues()
    print(v.bar1)
The __iter__ on your classTest class isn't being used because you're not iterating over a classTest object. (Not that it makes any sense as it's written.)
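Since the stated goal is to hand a list to a library, a list comprehension over the dictionary's values may be all that is needed. A short sketch, with the class and variable names adapted from the question (the renamed identifiers are just for illustration):

```python
class ClassTest:
    def __init__(self, foo):
        self.bar1 = foo

a_dict = {}
a_dict["foo"] = ClassTest("xx")
a_dict["bar"] = ClassTest("yy")

# Build the plain list of bar1 values that the library expects.
bar_values = [obj.bar1 for obj in a_dict.values()]
print(bar_values)  # ['xx', 'yy'] on Python 3.7+, where dicts keep insertion order
```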
Basically, if I'm trying to access a dict value that I expect to be an iterable, is there an easy one-liner that accounts for the value not being present, aside from using something like defaultdict? There's this:
for el in (myDict.get('myIterable') or []):
    pass
Doesn't feel particularly pythonic though...
for item in a_dict.get("some_key", []):
    pass  # do whatever
This works if the value is guaranteed to be a list when present. If it might be something other than a list, you will need a different solution.
You can make a subclass of dict that provides a default value with the __missing__(self, key) method:
class EmptyIterableDict(dict):
    def __missing__(self, key):
        return []
Example usage:
test = EmptyIterableDict()
test['a'] = [3,2,1]
test['b'] = [2,1]
test['c'] = [1]
for v in test['a']:
    print(v)
# 3
# 2
# 1
for v in test['d']:
    print(v)
# (prints nothing, because 'd' is missing)
If you already have a vanilla dict that you want to iterate like that over, you can make a temporary copy:
original = {'a': [1], 'b': [2,3]}
temp = EmptyIterableDict(original)
for v in temp['d']:
    print(v)
An explicit, multi-line approach to this would be:
if 'my_iterable' in my_dict:
    for item in my_dict['my_iterable']:
        print(item)
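which could also be written as a one-line comprehension, as long as the membership test happens before the lookup (a trailing filter such as if 'my_iterable' in my_dict would be evaluated too late, after my_dict['my_iterable'] has already raised KeyError):
[print(item) for item in (my_dict['my_iterable'] if 'my_iterable' in my_dict else ())]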
This isn't a one-liner but it accounts for both possible failures.
try:
    for item in dictionary[key]:
        print(item)
except KeyError:
    pass  # Key wasn't present in the dictionary.
except TypeError:
    pass  # Key was present but corresponding item not iterable.