I need a dictionary data structure that stores dictionaries, as seen below:
custom = {1: {'a': np.zeros(10), 'b': np.zeros(100)},
2: {'c': np.zeros(20), 'd': np.zeros(200)}}
But the problem is that I iterate over this data structure many times in my code. Every time I iterate over it, the order of iteration must be respected, because all the elements in this complex data structure are mapped to a 1D array (serialized, if you will), so the order matters. I thought about writing an ordered dict of ordered dicts for this, but I'm not sure that's the right solution; it seems I may be choosing the wrong data structure. What would be the most adequate solution for my case?
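For illustration, here is a minimal sketch of the "ordered dict of ordered dicts" idea, showing that insertion order is stable across repeated iterations, so the mapping to a 1D array is reproducible. The serialize() helper is hypothetical, just to make the point concrete:

```python
from collections import OrderedDict
import numpy as np

# Same shape of data as in the question, built in a fixed insertion order.
custom = OrderedDict([
    (1, OrderedDict([('a', np.zeros(10)), ('b', np.zeros(100))])),
    (2, OrderedDict([('c', np.zeros(20)), ('d', np.zeros(200))])),
])

def serialize(nested):
    # Concatenate all leaf arrays in iteration order.
    return np.concatenate([arr for inner in nested.values()
                               for arr in inner.values()])

flat = serialize(custom)
assert flat.shape == (330,)                     # 10 + 100 + 20 + 200
assert np.array_equal(flat, serialize(custom))  # order is stable across runs
```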
UPDATE
So this is what I came up with so far:
import sys
from collections import OrderedDict

class Test(list):
    def __init__(self, *args, **kwargs):
        super(Test, self).__init__(*args, **kwargs)
        for k, v in args[0].items():
            self[k] = OrderedDict(v)
        self.d = -1
        self.iterator = iter(self[-1].keys())
        self.etype = next(self.iterator)
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self.idx += 1
            return self[self.d][self.etype][self.idx - 1]
        except IndexError:
            self.etype = next(self.iterator)
            self.idx = 0
            return self[self.d][self.etype][self.idx - 1]

    def __call__(self, d):
        self.d = -1 - d
        self.iterator = iter(self[self.d].keys())
        self.etype = next(self.iterator)
        self.idx = 0
        return self

def main(argv=()):
    tst = Test(elements)  # elements: the nested dict shown above
    for el in tst:
        print(el)
    # loop over a lower dimension
    for el in tst(-2):
        print(el)
    print(tst)
    return 0

if __name__ == "__main__":
    sys.exit(main())
I can iterate as many times as I want over this ordered structure, and I implemented __call__ so I can iterate over the lower dimensions. I don't like the fact that if there isn't a lower dimension present in the list, it doesn't give me any errors. I also have the feeling that each call to return self[self.d][self.etype][self.idx-1] is less efficient than the original iteration over the dictionary. Is this true? How can I improve this?
Here's another alternative that uses an OrderedDefaultdict to define the tree-like data structure you want. I'm reusing the definition of it from another answer of mine.
To make use of it, you have to ensure the entries are defined in the order you want to access them in later on.
from collections import OrderedDict

class OrderedDefaultdict(OrderedDict):
    def __init__(self, *args, **kwargs):
        if not args:
            self.default_factory = None
        else:
            if not (args[0] is None or callable(args[0])):
                raise TypeError('first argument must be callable or None')
            self.default_factory = args[0]
            args = args[1:]
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = default = self.default_factory()
        return default

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, iter(self.items())  # self.iteritems() in Python 2
Tree = lambda: OrderedDefaultdict(Tree)
custom = Tree()
custom[1]['a'] = np.zeros(10)
custom[1]['b'] = np.zeros(100)
custom[2]['c'] = np.zeros(20)
custom[2]['d'] = np.zeros(200)
I'm not sure I understand your follow-on question. If the data structure is limited to two levels, you could use nested for loops to iterate over its elements in the order they were defined. For example:
for key1, subtree in custom.items():
    for key2, elem in subtree.items():
        print('custom[{!r}][{!r}]: {}'.format(key1, key2, elem))
(In Python 2 you'd want to use iteritems() instead of items().)
I think using OrderedDicts is the best way. They're built-in and relatively fast:
custom = OrderedDict([(1, OrderedDict([('a', np.zeros(10)),
('b', np.zeros(100))])),
(2, OrderedDict([('c', np.zeros(20)),
('d', np.zeros(200))]))])
If you want to make it easy to iterate over the contents of your data structure, you can always provide a utility function to do so:
def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            for row in v:
                yield row
Note that in Python 3.3+, which allows yield from <expression>, the last for loop can be eliminated:
def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            yield from v
With one of those you'll then be able to write something like:
for elem in iter_over_contents(custom):
print(elem)
and hide the complexity.
While you could define your own class to encapsulate this data structure and use something like the iter_over_contents() generator function as its __iter__() method, that approach would likely be slower and wouldn't allow expressions using two levels of indexing, such as the following:
custom[1]['b']
which using nested dictionaries (or OrderedDefaultdicts as shown in my other answer) would.
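To make that tradeoff concrete, here is a minimal sketch of such a wrapper class (the class name and the use of plain lists as leaves are assumptions for brevity):

```python
from collections import OrderedDict

class Contents:
    """Hypothetical wrapper: __iter__ yields every row in order."""
    def __init__(self, data):
        self._data = data

    def __iter__(self):
        # Same traversal as iter_over_contents(), but as a method.
        for delem in self._data.values():
            for v in delem.values():
                yield from v

custom = OrderedDict([(1, OrderedDict([('a', [1, 2]), ('b', [3])]))])
wrapped = Contents(custom)
assert list(wrapped) == [1, 2, 3]
# wrapped[1]['b'] would raise TypeError (no __getitem__), which is the
# two-level indexing limitation described above.
```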
Could you just use a list of dictionaries?
custom = [{'a': np.zeros(10), 'b': np.zeros(100)},
{'c': np.zeros(20), 'd': np.zeros(200)}]
This could work if the outer dictionary is the only one you need in the right order. You could still access the inner dictionaries with custom[0] or custom[1] (careful, indexing now starts at 0).
If not all of the indices are used, you could do the following:
custom = [None] * maxLength # maximum dict size you expect
custom[1] = {'a': np.zeros(10), 'b': np.zeros(100)}
custom[2] = {'c': np.zeros(20), 'd': np.zeros(200)}
You can fix the iteration order of your keys by sorting them first:
for key in sorted(custom.keys()):
    print(key, custom[key])
If you want to reduce sorted()-calls, you may want to store the keys in an extra list which will then serve as your iteration order:
ordered_keys = sorted(custom.keys())
for key in ordered_keys:
    print(key, custom[key])
You should then be ready for as many iterations over your data structure as you need.
Related
I am trying to create a dictionary from a list recursively, and my code only works when there is one item in the list. It fails for multiple items, and I suspect this is because the dictionary is recreated in each instance of the recursion instead of being added to after the first instance. How can I avoid this so that the whole list is converted to a dictionary?
Note: the list is a list of tuples containing two items.
def poncePlanner(restaurantChoices):
    if len(restaurantChoices) == 0:
        return {}
    else:
        name, resto = restaurantChoices[0][0], restaurantChoices[0][1]
        try:
            dic[name] = resto
            poncePlanner(restaurantChoices[1:])
            return dic
        except:
            dic = {name: resto}
            poncePlanner(restaurantChoices[1:])
            return dic
Intended input and output:
>>> restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
("Ramya", "Minero"), ("Jane", "Pancake Social")]
>>> poncePlanner(restaurantChoice)
{'Jane': 'Pancake Social',
'Ramya': 'Minero',
'Fareeda': 'Botiwala',
'Paige': 'Dancing Goats'}
You have the base case; now you need to define what to do when there is more than one item. Here you take the first tuple, make a dict from it, and merge in the result of recursing on the rest:
restaurantChoice = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala"),
("Ramya", "Minero"), ("Jane", "Pancake Social")]
def poncePlanner(restaurantChoice):
    if not restaurantChoice:
        return {}
    head, *rest = restaurantChoice
    return {head[0]: head[1], **poncePlanner(rest)}
poncePlanner(restaurantChoice)
Returning:
{'Jane': 'Pancake Social',
'Ramya': 'Minero',
'Fareeda': 'Botiwala',
'Paige': 'Dancing Goats'}
Since restaurantChoices is already a list of (key, value) pairs, you can simply use the built-in function dict to create the dictionary:
def poncePlanner(restaurantChoices):
    return dict(restaurantChoices)
Without built-in functions, you can also use a simple for-loop to return the desired transformation:
def poncePlanner(restaurantChoices):
    result = {}
    for key, value in restaurantChoices:
        result[key] = value
    return result
If you really want recursion, I would do something like this, though it doesn't make much sense here because the list is not nested:
def poncePlanner(restaurantChoices):
    def recursion(i, result):
        if i < len(restaurantChoices):
            key, value = restaurantChoices[i]
            result[key] = value
            recursion(i + 1, result)
        return result
    return recursion(0, {})
Each call of this recursive function does O(1) work and passes the same result dict down, so the whole run is O(n). Note, though, that Python's recursion-depth limit still applies, so very long lists will raise a RecursionError.
The main problem with the original code is that the dictionary dic is not passed down to the deeper recursive calls, so the new contents never reach the final dictionary. (The contents are added to new dictionaries and then forgotten.)
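One minimal fix that keeps the recursive shape is to pass the dictionary down explicitly as an accumulator (a sketch, not the original author's code):

```python
def poncePlanner(restaurantChoices, dic=None):
    # Pass the accumulator down so every call adds to the same dict.
    if dic is None:
        dic = {}
    if not restaurantChoices:
        return dic
    name, resto = restaurantChoices[0]
    dic[name] = resto
    return poncePlanner(restaurantChoices[1:], dic)

choices = [("Paige", "Dancing Goats"), ("Fareeda", "Botiwala")]
assert poncePlanner(choices) == {"Paige": "Dancing Goats",
                                 "Fareeda": "Botiwala"}
```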
I want to create a data structure for storing various possible paths through a plane with polygons scattered across it. I decided on using nested, multi-level dictionaries to save the various possible paths splitting at fixed points.
A possible instance of such a dictionary would be:
path_dictionary = {starting_coordinates:{new_fixpoint1:{new_fixpoint1_1:...}, new_fixpoint2:{new_fixpoint2_1:...}}}
Now I want to keep building up that structure with new paths from the last fixpoints, so I have to edit the dictionary at various nesting levels. My plan is to provide a sorted keylist containing all the fixpoints of the given path, and to have a function that adds a new fixpoint under the last provided key.
To achieve this I would have to be able to access the dictionary with the keylist like this:
keylist = [starting_coordinates, new_fixpoint1, new_fixpoint1_1, new_fixpoint1_1_3, ...]
path_dictionary = {starting_coordinates:{new_fixpoint1:{new_fixpoint1_1:...}, new_fixpoint2:{new_fixpoint2_1:...}}}
path_dictionary[keylist[0]][keylist[1]][keylist[2]][...] = additional_fixpoint
Question: How can I write to a variable nesting/depth level in the multi-level dictionary when I have a keylist of some length?
Any help is very much appreciated.
I was playing around with the idea of using multiple indexes, and a defaultdict. And this came out:
from collections import defaultdict

class LayeredDict(defaultdict):
    def __getitem__(self, key):
        if isinstance(key, (tuple, list)):
            if len(key) == 1:
                return self[key[0]]
            return self[key[0]][key[1:]]
        return super(LayeredDict, self).__getitem__(key)

    def __setitem__(self, key, value):
        if isinstance(key, (tuple, list)):
            if len(key) == 1:
                self[key[0]] = value
            else:
                self[key[0]][key[1:]] = value
        else:
            super(LayeredDict, self).__setitem__(key, value)

    def __init__(self, *args, **kwargs):
        super(LayeredDict, self).__init__(*args, **kwargs)
        self.default_factory = type(self)  # override default
I haven't fully tested it, but it should allow you to create any level of nested dictionaries, and index them with a tuple.
>>> x = LayeredDict()
>>> x['abc'] = 'blah'
>>> x['abc']
'blah'
>>> x[0, 8, 2] = 1.2345
>>> x[0, 8, 1] = 8.9
>>> x[0, 8, 'xyz'] = 10.1
>>> x[0, 8].keys()
[1, 2, 'xyz']
>>> x['abc', 1] = 5
*** TypeError: 'str' object does not support item assignment
Unfortunately star-unpacking (x[*keylist]) isn't supported inside an index expression, but you can just pass a list or tuple in as the index.
>>> keylist = (0, 8, 2)
>>> x[*keylist]
*** SyntaxError: invalid syntax (<stdin>, line 1)
>>> x[keylist]
1.2345
Also, the isinstance(key, (tuple, list)) condition means a tuple can't be used as a key.
You can certainly write accessors for such a nested dictionary:
def get(d, l):
    return get(d[l[0]], l[1:]) if l else d

def set(d, l, v):
    while len(l) > 1:
        d = d[l.pop(0)]
    l, = l  # verify list length of 1
    d[l] = v
(Neither of these is efficient for long lists; faster versions would use a variable index rather than [1:] or pop(0).)
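A sketch of those faster variants, using an iterative walk instead of slicing or pop(0) (the set_ name here avoids shadowing the built-in set; these are illustrative, not drop-in replacements):

```python
def get(d, keys):
    # Walk down one level per key: no list copies per level.
    for k in keys:
        d = d[k]
    return d

def set_(d, keys, v):
    # Walk to the parent of the target, then assign the last key.
    for k in keys[:-1]:
        d = d[k]
    d[keys[-1]] = v

tree = {'a': {'b': {'c': 1}}}
assert get(tree, ['a', 'b', 'c']) == 1
set_(tree, ['a', 'b', 'c'], 2)
assert get(tree, ['a', 'b', 'c']) == 2
```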
As for other approaches, there’s not nearly enough here to go on for picking one.
I am trying to create a dictionary keyed by two strings, and I want the key to work the same regardless of the order of the strings.
myDict[('A', 'B')] = 'something'
myDict[('B', 'A')] = 'something else'
print(myDict[('A', 'B')])
I want this piece of code to print 'something else'. Unfortunately, it seems that the ordering matters with tuples. What would be the best data structure to use as the key?
Use a frozenset
Instead of a tuple, which is ordered, you can use a frozenset, which is unordered yet still hashable, since frozenset is immutable.
myDict = {}
myDict[frozenset(('A', 'B'))] = 'something'
myDict[frozenset(('B', 'A'))] = 'something else'
print(myDict[frozenset(('A', 'B'))])
Which will print:
something else
Unfortunately, this simplicity comes with a disadvantage: since frozenset is basically a “frozen” set, there will be no duplicate values in it. For example,
frozenset((1, 2)) == frozenset((1,2,2,1,1))
If the trimming down of values doesn’t bother you, feel free to use frozenset.
But if you’re 100% sure that you don’t want what was mentioned above to happen, there are however two alternates:
The first method is to use a Counter and make it hashable by using frozenset again (note: everything in the tuple must be hashable):
from collections import Counter
myDict = {}
myDict[frozenset(Counter(('A', 'B')).items())] = 'something'
myDict[frozenset(Counter(('B', 'A')).items())] = 'something else'
print(myDict[frozenset(Counter(('A', 'B')).items())])
# something else
The second method is to use the built-in function sorted and make the result hashable by turning it into a tuple, so the values are sorted before being used as a key (note: everything in the tuple must be sortable and hashable):
myDict = {}
myDict[tuple(sorted(('A', 'B')))] = 'something'
myDict[tuple(sorted(('B', 'A')))] = 'something else'
print(myDict[tuple(sorted(('A', 'B')))])
# something else
But if the tuple elements are neither all hashable, nor are they all sortable, unfortunately, you might be out of luck and need to create your own dict structure... D:
You can build your own structure:
class ReverseDict:
    def __init__(self):
        self.d = {}

    def __setitem__(self, k, v):
        self.d[k] = v

    def __getitem__(self, tup):
        return self.d[tup[::-1]]
myDict = ReverseDict()
myDict[('A', 'B')] = 'something'
myDict[('B', 'A')] = 'something else'
print(myDict[('A', 'B')])
Output:
something else
I think the point here is that both orderings of the tuple should point to the same dictionary element. This can be done by making the hash function commutative over the tuple's elements:
class UnorderedKeyDict(dict):
    def __init__(self, *arg):
        if arg:
            for k, v in arg[0].items():
                self[k] = v

    def _hash(self, tup):
        return sum([hash(ti) for ti in tup])

    def __setitem__(self, tup, value):
        super().__setitem__(self._hash(tup), value)

    def __getitem__(self, tup):
        return super().__getitem__(self._hash(tup))
mydict = UnorderedKeyDict({('a','b'):12,('b','c'):13})
mydict[('b','a')]
>> 12
I am creating an object that I want to be either a dict or a list, depending on some condition. A basic version of what I am testing is as follows:
class MyClass(dict):
    def __init__(self, key, val, allowed):
        self.d = {}
        if key in allowed:
            self.d[key] = val
            super(MyClass, self).__init__(self.d)
        else:
            # I am a list and not a dict
            pass

obj = MyClass("a", "b", "abc")
I am wondering if this is possible AND if there is a better way to achieve similar results that follows better coding practice.
(I do not want to do a pre-check to see if key is in allowed before object creation)
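For what it's worth, a common alternative to subclassing is a factory function that simply returns either a dict or a list depending on the condition (a sketch; make_container and its exact behavior are assumptions, not the asker's API):

```python
def make_container(key, val, allowed):
    # Return a dict when the key is allowed, otherwise fall back to a list.
    # The check on `allowed` happens inside the factory, so callers don't
    # need to pre-check before object creation.
    if key in allowed:
        return {key: val}
    return [key, val]

assert make_container("a", "b", "abc") == {"a": "b"}
assert make_container("z", "b", "abc") == ["z", "b"]
```

The resulting object is then a genuine dict or a genuine list, which avoids the awkwardness of a single class pretending to be both.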
I have a dict that has many elements, and I want to write a function that returns the elements in a given index range (treating the dict as an array):
get_range(dict, begin, end):
    return {a new dict for all the indexes between begin and end}
How can that be done?
EDIT: I am not asking about filtering by key... e.g.
{"a":"b", "c":"d", "e":"f"}
get_range(dict, 0, 1) returns {"a":"b", "c":"d"} (the first 2 elements)
I don't care about the sorting...
Actually, I am implementing server-side paging...
Edit: A dictionary is not ordered, so it is impossible for get_range to return the same slice after you have modified the dictionary. If you need deterministic results, replace your dict with a collections.OrderedDict.
Anyway, you could get a slice using itertools.islice:
import itertools

def get_range(dictionary, begin, end):
    return dict(itertools.islice(dictionary.iteritems(), begin, end + 1))
The previous answer that filters by key is kept below:
With #Douglas' algorithm, we could simplify it by using a generator expression:
def get_range(dictionary, begin, end):
    return dict((k, v) for k, v in dictionary.iteritems() if begin <= k <= end)
BTW, don't use dict as the variable name; as you can see here, dict is the dictionary constructor.
If you are using Python 3.x, you can use a dictionary comprehension directly:
def get_range(dictionary, begin, end):
    return {k: v for k, v in dictionary.items() if begin <= k <= end}
Straightforward implementation:
def get_range(d, begin, end):
    result = {}
    for (key, value) in d.iteritems():
        if key >= begin and key <= end:
            result[key] = value
    return result
One line:
def get_range2(d, begin, end):
    return dict([(k, v) for (k, v) in d.iteritems() if k >= begin and k <= end])
Resting assured that what you really want is an OrderedDict, you can also use enumerate:
#!/usr/bin/env python
def get_range(d, begin, end):
    return dict(e for i, e in enumerate(d.items()) if begin <= i <= end)

if __name__ == '__main__':
    print get_range({"a": "b", "c": "d", "e": "f"}, 0, 1)
output:
{'a': 'b', 'c': 'd'}
ps: I let you use 0, 1 as the range values, but you should use 0, 2 to mean "the first two elements" (and use begin <= i < end as the comparison).
As others have mentioned, dictionaries in Python are inherently unordered. However, at any given moment a list of their current keys or (key, value) pairs can be obtained using their keys() or items() methods.
A potential problem with using these lists is that not only their contents but also the order they are returned in will likely vary if the dictionary has been modified (or mutated) since the last time they were used. This means you generally can't store and reuse the list unless you update it every time the dictionary is changed, just in case you're going to need it.
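The staleness problem can be sketched in a few lines (illustrative only):

```python
d = {"a": 1, "c": 2, "e": 3}
snapshot = list(d.items())   # cached (key, value) list
del d["c"]                   # mutate the dict
# The cached list no longer reflects the dictionary's contents.
assert snapshot != list(d.items())
assert len(snapshot) == 3 and len(d) == 2
```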
To make this approach more manageable, you can combine a dictionary and an auxiliary list into a new derived class that takes care of the synchronization between the two and also provides a get_range() method that makes use of the list's current contents. Below is sample code showing how this could be done. It's based on ideas I got from the code in this ActiveState Python Recipe.
class dict_with_get_range(dict):
    def __init__(self, *args, **kwrds):
        dict.__init__(self, *args, **kwrds)
        self._list_ok = False

    def _rebuild_list(self):
        self._list = []
        for k, v in self.iteritems():
            self._list.append((k, v))
        self._list_ok = True

    def get_range(self, begin, end):
        if not self._list_ok:
            self._rebuild_list()
        return dict(self._list[i] for i in range(begin, end + 1))

def _wrapMutatorMethod(methodname):
    _method = getattr(dict, methodname)
    def wrapper(self, *args, **kwrds):
        # Reset 'list OK' flag, then delegate to the real mutator method
        self._list_ok = False
        return _method(self, *args, **kwrds)
    setattr(dict_with_get_range, methodname, wrapper)

for methodname in 'delitem setitem'.split():
    _wrapMutatorMethod('__%s__' % methodname)
for methodname in 'clear update setdefault pop popitem'.split():
    _wrapMutatorMethod(methodname)
del _wrapMutatorMethod  # no longer needed

dct = dict_with_get_range({"a": "b", "c": "d", "e": "f"})
print dct.get_range(0, 1)
# {'a': 'b', 'c': 'd'}
del dct["c"]
print dct.get_range(0, 1)
# {'a': 'b', 'e': 'f'}
The basic idea is to derive a new class from dict that also has an internal contents list for use by the new get_range() method it provides, which regular dictionary objects don't have. To minimize the need to update (or even create) this internal list, the class also keeps a flag indicating whether the list is up-to-date, and it only checks the flag and rebuilds the list when necessary.
To maintain the flag, each inherited dictionary method that potentially changes (or mutates) the dictionary's contents is "wrapped" with a helper function that resets the flag and then chains to the normal dictionary method to actually perform the operation. Installing the wrappers into the class is simply a matter of putting the method names into one of two lists and passing them one at a time to an auxiliary utility immediately after the class is created.
To maintain the flag, each inherited dictionary method which potentially changes (or mutates) the dictionary's contents is "wrapped" with helper function the resets the flag and then chains to the normal dictionary method to actually perform the operation. Installing them into the class is simply a matter of putting the names of the methods in one of two lists and then passing them one at time to an auxiliary utility immediately following the creation of the class.