I want to write a container class that acts like a dictionary (actually derives from dict). The keys for this structure will be dates.
When a key (i.e. date) is used to retrieve a value from the class, if the date does not exist then the next available date that precedes the key is used to return the value.
The following data should help explain the concept further:
Date (key) Value
2001/01/01 123
2001/01/02 42
2001/01/03 100
2001/01/04 314
2001/01/07 312
2001/01/09 321
If I try to fetch the value associated with the key (date) '2001/01/05', I should get the value stored under the key '2001/01/04', since that key occurs immediately before where '2001/01/05' would be if it existed in the dictionary.
In order to do this, I need to be able to do a search (ideally binary, rather than naively looping through every key in the dictionary). I have searched for bsearch dictionary key lookups in Python dictionaries - but have not found anything useful.
Anyway, I want to write a class that encapsulates this behavior.
This is what I have so far (not much):
class NearestNeighborDict(dict):
    """
    a dictionary which returns the value of the nearest
    preceding neighbor if the specified key is not found
    """
    def __init__(self, items={}):
        dict.__init__(self, items)

    def get_item(self, key):
        # returns the item stored with the key (if the key exists),
        # else returns the item stored with the nearest preceding key
        pass
You really don't want to subclass dict, because you can't really reuse any of its functionality. Rather, subclass the abstract base class collections.abc.Mapping (or MutableMapping if you also want to be able to modify an instance after creation), implement the indispensable special methods for the purpose, and you'll get other dict-like methods "for free" from the ABC.
The methods you need to code are __getitem__ (and __setitem__ and __delitem__ if you want mutability), __len__, __iter__, and __contains__.
The bisect module of the standard library gives you all you need to implement these efficiently on top of a sorted list. For example...:
import bisect
from collections.abc import Mapping

class MyDict(Mapping):
    def __init__(self, contents):
        "contents must be a sequence of key/value pairs"
        self._list = sorted(contents)
        # parallel list of keys for clean bisecting (avoids comparing
        # values when keys are equal)
        self._keys = [k for k, _ in self._list]
    def __iter__(self):
        return iter(self._keys)
    def __contains__(self, k):
        i = bisect.bisect_left(self._keys, k)
        return i < len(self._keys) and self._keys[i] == k
    def __len__(self):
        return len(self._list)
    def __getitem__(self, k):
        i = bisect.bisect_left(self._keys, k)
        if i >= len(self._keys):
            raise KeyError(k)
        return self._list[i][1]
You'll probably want to fiddle __getitem__ depending on what you want to return (or whether you want to raise) for various corner cases such as "k greater than all keys in self".
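For the question's nearest-preceding-key semantics, for example, __getitem__ could instead return the value for the largest key at or before k (a sketch, reusing the _keys list from above; it raises KeyError when k is smaller than every key):

def __getitem__(self, k):
    # bisect_right - 1 lands on the largest key <= k
    i = bisect.bisect_right(self._keys, k) - 1
    if i < 0:
        raise KeyError(k)
    return self._list[i][1]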
The sortedcontainers module provides a SortedDict type that maintains its keys in sorted order and supports bisecting on those keys. The module has pure-Python and fast-as-C implementations, 100% test coverage, and hours of stress testing behind it.
For example:
from sortedcontainers import SortedDict

sd = SortedDict(data)  # data is a sequence of (date, value) pairs
# Bisect for the insertion point of the desired key, then step back
# one slot to land on the nearest key at or before it.
index = sd.bisect_right('2001/01/05') - 1
# Look up the real key at that index (the keys view supports fast indexing).
key = sd.keys()[index]
# Retrieve the value associated with that key.
value = sd[key]
Because SortedDict supports fast indexing, it's easy to look ahead or behind your key as well. SortedDict is also a MutableMapping so it should work nicely in your type system.
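For instance, peeking at the neighbors of the matched position (an illustrative fragment continuing the example above, assuming both neighbors exist; peekitem(i) returns the (key, value) pair at index i):

prev_key, prev_value = sd.peekitem(index - 1)
next_key, next_value = sd.peekitem(index + 1)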
I'd extend a dict, and override the __getitem__ and __setitem__ method to store a sorted list of keys.
from bisect import bisect, insort

class NearestNeighborDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._keylist = []
    def __getitem__(self, x):
        if x in self:
            return dict.__getitem__(self, x)
        # bisect gives the insertion point; step back one slot to get
        # the nearest key that precedes x
        index = bisect(self._keylist, x) - 1
        if index < 0:
            raise KeyError('no key precedes %r' % (x,))
        return dict.__getitem__(self, self._keylist[index])
    def __setitem__(self, x, value):
        if x not in self:
            # binary insert of the key (not the value!) keeps the list sorted
            insort(self._keylist, x)
        dict.__setitem__(self, x, value)
It's true you're better off inheriting from MutableMapping, but the principle is the same, and the above code can be easily adapted.
Why not just maintain a sorted list from dict.keys() and search that? If you're subclassing dict you may even devise an opportunity to do a binary insert on that list when values are added.
Use the floor_key method on bintrees.RBTree: https://pypi.python.org/pypi/bintrees/2.0.1
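For example (a sketch; floor_key and floor_item return the largest key, or (key, value) pair, at or below the argument, and raise KeyError if none exists):

from bintrees import RBTree

tree = RBTree([('2001/01/01', 123), ('2001/01/04', 314), ('2001/01/07', 312)])
tree.floor_key('2001/01/05')   # -> '2001/01/04'
tree.floor_item('2001/01/05')  # -> ('2001/01/04', 314)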
I'm trying to implement a customized behavior of the dict data structure.
I want to override the __getitem__ and apply some sort of regex on the value before returning it to the user.
Snippet:
import os
import re
from typing import Union

class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.replace_with_group = replace_with_group
        self.value_group = value_group
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.update(dct)

    def __getitem__(self, key):
        value: Union[str, dict] = dict.__getitem__(self, key)
        if type(value) is str:
            match = self.regex_matcher.match(value)
            if match:
                return value.replace(match.group(self.replace_with_group),
                                     os.getenv(match.group(self.value_group)))
        return value  # I BELIEVE THE ISSUE IS HERE
This works perfectly for a single index level (i.e., dict[key]). However, when trying to multi-index it (i.e., dict[key1][key2]), the first index level returns an object of my class, but the other levels call the default __getitem__ of dict, which does not execute my customized behavior. How can I fix this?
An MCVE:
The aforementioned code applies a regular expression to the value and converts it to the corresponding environment variable's value if it is a string (i.e., the lowest level in the dict).
dictionary = {"KEY": "{ENVIRONMENT_VARIABLE}"}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
Let's set an environment variable called ENVIRONMENT_VARIABLE to 1:
import os
os.environ["ENVIRONMENT_VARIABLE"] = "1"
In this case, the code works perfectly fine
custom_dict["KEY"]
and the returned value will be:
'1'
However, if we have multi-level indexing
dictionary = {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
custom_dict["KEY"]["INDEX_KEY"]
This would return
{ENVIRONMENT_VARIABLE}
P.S. There are many similar questions, but they all (probably) address only top-level indexing.
The problem, as you say yourself, is in the last line of your code.
if type(value) is str:
...
else:
return value # I BELIEVE ISSUE IS HERE
This is returning a dict. But you want to return a RegexMatchingDict instead, which will know how to handle the second level of indexing. So instead of returning value if it is a dict, convert it to a RegexMatchingDict and return that. Then when __getitem__() is called to perform the second level of indexing, your version will be used and not the standard one.
Something like this:
return RegexMatchingDict(value, self.regex_str, self.value_group, self.replace_with_group)
This copies the other arguments from the first level since it is hard to see how the second level could be different.
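Putting it together, the whole __getitem__ might read like this (a sketch based on the question's class):

def __getitem__(self, key):
    value = dict.__getitem__(self, key)
    if type(value) is str:
        match = self.regex_matcher.match(value)
        if match:
            return value.replace(match.group(self.replace_with_group),
                                 os.getenv(match.group(self.value_group)))
    elif isinstance(value, dict):
        # Wrap nested plain dicts so the next indexing level also
        # goes through this method.
        return RegexMatchingDict(value, self.regex_str,
                                 self.value_group, self.replace_with_group)
    return value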
In your example, your second level dictionary is a normal dict and therefore doesn't use your custom __getitem__ method.
The code below shows what should be done to have an internal custom dict:
sec_level_dict = {"KEY": "{ENVIRONMENT_VARIABLE}"}
sec_level_custom_dict = RegexMatchingDict(sec_level_dict, r"((.*({(.+)}).*))", 4, 3)
dictionary = {"KEY": sec_level_custom_dict}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
print(custom_dict["KEY"]["KEY"])
If you want to automate this and transform all nested dicts into custom dicts, you can customize __setitem__ following this pattern:
class CustomDict(dict):
    def __init__(self, dct):
        super().__init__()
        for k, v in dct.items():
            self[k] = v

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        # placeholder for your custom behavior
        print("Dictionary:", self, "key:", key, "value:", value)
        return value

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            # nested dicts are converted to the custom class on insertion
            dict.__setitem__(self, key, self.__class__(value))
        else:
            dict.__setitem__(self, key, value)
a = CustomDict({'k': {'k': "This is my nested value"}})
print(a['k']['k'])
I have a situation where I need to create a dictionary that keeps track of the global order of its values. I haven't been able to find a good way for the class itself to have an incrementing counter that's also tracked by each value.
Here's what I've written in the meanwhile to get around this:
from collections import defaultdict

class NotMyDict(object):
    """defaultdict(list) that tracks order globally across the dict.

    Will function as a normal defaultdict(list) unless you set the
    'ordered' attribute to a non-false value, in which case each value
    is reported together with its global insertion index.
    """
    ordered = False

    def __init__(self):
        self._data = defaultdict(list)
        self._next_index = 0

    def __repr__(self):
        if self.ordered:
            return repr(self._data)
        else:
            temp = defaultdict(list)
            for key in self._data:
                for value in self._data[key]:
                    temp[key].append(value[0])
            return repr(temp)

    def __getitem__(self, key):
        if self.ordered:
            return self._data[key]
        else:
            return [val[0] for val in self._data[key]]

    def add_value_to_key(self, key, value):
        self._data[key].append((value, self._next_index))
        self._next_index += 1
So I can use this like a normal dictionary for pulling values. I could have instantiated a list if the key didn't exist, but defaultdict was simple and easy.
Here's an example of the use:
test = NotMyDict()
test.add_value_to_key('test', 'hi')
test.add_value_to_key('test', 'there')
test.add_value_to_key('test', 'buddy')
test['test']
Result:
['hi', 'there', 'buddy']
test.ordered = True
test['test']
Result:
[('hi', 0), ('there', 1), ('buddy', 2)]
Now - the example of use isn't super important, but the functionality that I can't seem to figure out is this: instead of using .add_value_to_key(), I want to be able to use the normal defaultdict(list) convention of
dict[key].append(value)
and still have it track the index. Do I need to pass global object locations with id() and reference those objects at a memory level, or is there a way I just don't understand to have a "class global" that's accessible by its members?
I had also tried to use nested classes, but the nested class didn't have access to the parent class's globals, so I'd have to (see the sketch after this question):
Make a list-like class that references the parent class attribute somehow (maybe with id() and a direct memory-location reference?)
Modify its append() method so that it also updates the parent class's global counter and tracks the value with this counter as a metadata field.
I really just can't seem to wrap my head around how to create this object/class in a way that lets me use the same functionality of a defaultdict(list), where I can index/append directly AND have it track the global insertion order of that new value.
dict[key].append(value)
Help would be appreciated - I sunk three hours into trying different solutions before I scrapped it and went with the "just use this method to append" for now.
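For what it's worth, the approach sketched in the list above can be made to work without any id() tricks: give the list-like class an ordinary reference to its owning dict and bump the shared counter in append(). A minimal sketch, with illustrative names:

class TrackingList(list):
    """List that reports appends back to its owning dict."""
    def __init__(self, owner):
        super().__init__()
        self._owner = owner
    def append(self, value):
        # Store the value together with the dict-wide insertion index.
        super().append((value, self._owner._next_index))
        self._owner._next_index += 1

class TrackingDict(dict):
    def __init__(self):
        super().__init__()
        self._next_index = 0
    def __missing__(self, key):
        # Called by dict.__getitem__ for absent keys, like a defaultdict.
        value = TrackingList(self)
        self[key] = value
        return value

d = TrackingDict()
d['test'].append('hi')
d['test'].append('there')
# d['test'] is now [('hi', 0), ('there', 1)]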
A Python list
f = [x0, x1, x2]
may be seen as an efficient representation of a mapping from [0, 1, ..., len(f) - 1] to the set of its elements. By "efficient" I mean that f[i] returns the element associated with i in O(1) time.
The inverse mapping may be defined as follows:
class Inverse:
def __init__(self, f):
self.f = f
def __getitem__(self, x):
return self.f.index(x)
This works, but Inverse(f)[x] takes O(n) time on average.
Alternatively, one may use a dict:
f_inv = {x: i for i, x in enumerate(f)}
This has O(1) average time complexity, but it requires the objects in the list to be hashable.
Is there a way to define an inverse mapping that provides equality-based lookups, in O(1) average time, with unhashable objects?
Edit: sample input and expected output:
>>> f = [x0, x1, x2]
>>> f_inv = Inverse(f) # this is to be defined
>>> f_inv[x0] # in O(1) time
0
>>> f_inv[x2] # in O(1) time
2
You can create an associated dictionary mapping the object ID's back to the list index.
The obvious disadvantage is that you will have to search the index for the identical object, not for an object that is merely equal.
On the upside, by creating a custom MutableSequence class using collections.abc, you can, with minimal code, write a class that keeps your data both as a sequence and as the reverse dictionary.
from collections.abc import MutableSequence
from threading import RLock

class MD(dict):
    # Maps id(obj) -> index. No need for a full MutableMapping
    # subclass, as the use is limited.
    def __getitem__(self, key):
        return super().__getitem__(id(key))
    def __setitem__(self, key, value):
        super().__setitem__(id(key), value)
    def __delitem__(self, key):
        super().__delitem__(id(key))

class Reversible(MutableSequence):
    def __init__(self, args):
        self.seq = list()
        self.reverse = MD()
        self.lock = RLock()
        for element in args:
            self.append(element)
    def __getitem__(self, index):
        return self.seq[index]
    def __setitem__(self, index, value):
        with self.lock:
            del self.reverse[self.seq[index]]
            self.seq[index] = value
            self.reverse[value] = index
    def __delitem__(self, index):
        if index < 0:
            index += len(self)
        with self.lock:
            del self.reverse[self.seq[index]]
            del self.seq[index]
            # Decrease the mapped index of every element that shifted left
            for obj in self.seq[index:]:
                self.reverse[obj] -= 1
    def __len__(self):
        return len(self.seq)
    def insert(self, index, value):
        if index < 0:
            index += len(self)
        with self.lock:
            # Increase the mapped index of every element that shifts right
            for obj in self.seq[index:]:
                self.reverse[obj] += 1
            self.seq.insert(index, value)
            self.reverse[value] = index
And voilà: just use this object in place of your list, and use the public attribute reverse to get the index of identity objects.
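For example (illustrative; the elements are plain lists, which are unhashable):

x0, x1, x2 = [0], [1], [2]
f = Reversible([x0, x1, x2])
f.reverse[x2]   # -> 2, found in O(1) time via id(x2)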
Note that you can make the MD class smarter by trying different strategies: use the objects themselves as keys when they are hashable, and only fall back to id() (or another custom key derived from the objects' attributes) when necessary. That would mitigate the requirement that lookups be made with the very same object.
So, for ordinary operations on the list, this class keeps the reverse dictionary synchronized. There is no support for slice indexing, though.
For more information, check the docs at https://docs.python.org/3/library/collections.abc.html
Unfortunately you're stuck with an algorithm limitation here. Fast lookup structures, like hash tables or binary trees, are efficient because they put objects in particular buckets or order them based on their values. This requires them to be hashable or comparable consistently for the entire time you are storing them in this structure, otherwise a lookup is very likely to fail.
If the objects you need are mutable (usually the reason they are not hashable) then any time an object you are tracking changes you need to update the data structure. The safest way to do this is to create immutable objects. If you need to change an object, then create a new one, remove the old one from the dictionary, and insert the new object as a key with the same value.
The operations here are still O(1) with respect to the size of the dictionary, you just need to consider whether the cost of copying objects on every change is worth it.
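A tiny illustration of that discipline (illustrative values; the tuples stand in for immutable objects):

d = {}
key = (1, 2)              # immutable snapshot of the tracked object
d[key] = 0                # maps the object back to its index
# To "change" the object, build a new one and re-key the dictionary:
new_key = (1, 2, 3)
d[new_key] = d.pop(key)   # still O(1) with respect to the dict size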
Is there a way to make a dictionary of functions that use set and get statements and then use them as set and get functions?
class thing(object):
    def __init__(self, thingy):
        self.thingy = thingy
    def __get__(self, instance, owner):
        return self.thingy
    def __set__(self, instance, value):
        self.thingy += value

theDict = {"bob": thing(5), "suzy": thing(2)}
theDict["bob"] = 10
The wanted result is that 10 goes into the set function and adds to the existing 5:
print(theDict["bob"])
>>> 15
The actual result is that the dictionary replaces the entry with the numeric value:
print(theDict["bob"])
>>> 10
The reason I can't just make a function like
theDict["bob"].add(10)
is that this builds on an existing and already really well-working function that uses the set and get. The case I'm working with is an edge case, and it wouldn't make sense to reprogram everything to make it work for this one case.
I need some means to store instances of this set/get thingy that is accessible but doesn't create some layer of depth that might break existing references.
Please don't ask for the actual code. It'd take pages of code to encapsulate the problem.
You could do it if you can (also) use a specialized version of the dictionary which is aware of your Thing class and handles it separately:
class Thing(object):
def __init__(self, thingy):
self._thingy = thingy
def _get_thingy(self):
return self._thingy
def _set_thingy(self, value):
self._thingy += value
thingy = property(_get_thingy, _set_thingy, None, "I'm a 'thingy' property.")
class ThingDict(dict):
def __getitem__(self, key):
if key in self and isinstance(dict.__getitem__(self, key), Thing):
return dict.__getitem__(self, key).thingy
else:
return dict.__getitem__(self, key)
def __setitem__(self, key, value):
if key in self and isinstance(dict.__getitem__(self, key), Thing):
dict.__getitem__(self, key).thingy = value
else:
dict.__setitem__(self, key, value)
theDict = ThingDict({"bob": Thing(5), "suzy": Thing(2), "don": 42})
print(theDict["bob"]) # --> 5
theDict["bob"] = 10
print(theDict["bob"]) # --> 15
# non-Thing value
print(theDict["don"]) # --> 42
theDict["don"] = 10
print(theDict["don"]) # --> 10
No, because to execute theDict["bob"] = 10, the Python runtime doesn't call any methods at all of the previous value of theDict["bob"]. It's not like when myObject.mydescriptor = 10 calls the descriptor setter.
Well, maybe it calls __del__ on the previous value if the refcount hits zero, but let's not go there!
If you want to do something like this then you need to change the way the dictionary works, not its contents. For example, you could subclass dict (with the usual warnings that you're Evil, Bad and Wrong to write a non-Liskov-substituting derived class). Or you could implement collections.abc.MutableMapping from scratch. But I don't think there's any way to hijack the normal operation of dict using a special value stored in it.
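To see the contrast, a minimal demonstration (illustrative only):

class Descriptor:
    def __set__(self, instance, value):
        print("descriptor setter called with", value)

class Holder:
    d = Descriptor()

obj = Holder()
obj.d = 10         # prints: descriptor setter called with 10

plain = {"d": Descriptor()}
plain["d"] = 10    # no method of the old value is invoked;
                   # the entry is simply replaced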
theDict["bob"] = 10 is just assign 10 to the key bob for theDict.
I think you should know about the magic methods __get__ and __set__ first. Go to: https://docs.python.org/2.7/howto/descriptor.html Using a class might be easier than dict.
I am dealing with classes whose attributes are sometimes lists whose elements can be dictionaries or further nested objects with attributes, etc. I would like to perform some slicing that, with my grasp of Python, is only doable with what feels profoundly un-Pythonic.
My minimal code looks like this:
class X(object):
def __init__(self):
self.a = []
x = X()
x.a.append({'key1':'v1'})
x.a.append({'key1':'v2'})
x.a.append({'key1':'v3'})
# this works as desired
x.a[0]['key1'] # 'v1'
I would like to perform an access to a key in the nested dictionary but make that call for all elements of the list containing that dictionary. The standard python way of doing this would be a list comprehension a la:
[v['key1'] for v in x.a]
However, my minimal example doesn't quite convey the full extent of nesting in my real-world scenario: The attribute list a in class X might contain objects, whose attributes are objects, whose attributes are dictionaries whose keys I want to select on while iterating over the outer list.
# I would like something like
useful_list = x.a[:]['key1'] # TypeError: list indices must be integers, not str
# or even better
cool_list = where(x.a[:]['key1'] == 'v2') # same TypeError
If I start writing list comprehensions for every interesting key, it quickly stops looking Pythonic. Is there a nice way of doing this, or do I have to code 'getter' methods for all conceivable pairings of lists and dictionary keys?
UPDATE:
I have been reading about overloading lists. Apparently one can mess with the __getitem__ method, which is used for indices in lists and keys in dicts. Maybe a custom class that iterates over list members. This is starting to sound contrived...
So, you want to create a hierarchical structure, with an operation which means different things for different types and is defined recursively.
Polymorphism to the rescue.
You could override __getitem__ instead of my get_items below, but in your case it might be better to define a non-builtin operation to avoid risking ambiguity. It's up to you really.
class ItemsInterface(object):
def get_items(self, key):
raise NotImplementedError
class DictItems(ItemsInterface, dict):
def __init__(self, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
def get_items(self, key):
res = self[key]
# apply recursively
try:
res = res.get_items(key)
except AttributeError:
pass
return res
class ListItems(ItemsInterface, list):
def __init__(self, *args, **kwargs):
list.__init__(self, *args, **kwargs)
def get_items(self, key):
return [ x.get_items(key) for x in self ]
x = ListItems()
x.append(DictItems({'key1':'v1'}))
x.append(DictItems({'key1':'v2'}))
x.append(DictItems({'key1':'v3'}))
y = DictItems({'key1':'v999'})
x.append(ListItems([ y ]))
x.get_items('key1')
=> ['v1', 'v2', 'v3', ['v999']]
Of course, this solution might not be exactly what you need (you didn't explain what should happen if the key is missing, etc.), but you can easily modify it to suit your needs.
This solution also supports ListItems as values of the DictItems; the get_items operation is applied recursively.