Dictionary `__getitem__` multi-subscripting overriding - python

I'm trying to implement a customized behavior of the dict data structure.
I want to override __getitem__ and apply some sort of regex to the value before returning it to the user.
Snippet:
import os
import re
from typing import Union
class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.replace_with_group = replace_with_group
        self.value_group = value_group
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.update(dct)
    def __getitem__(self, key):
        value: Union[str, dict] = dict.__getitem__(self, key)
        if type(value) is str:
            match = self.regex_matcher.match(value)
            if match:
                return value.replace(match.group(self.replace_with_group), os.getenv(match.group(self.value_group)))
        return value  # I BELIEVE ISSUE IS HERE
This works perfectly for a single index level (i.e., dict[key]). However, when I try to multi-index it (i.e., dict[key1][key2]), the first index level goes through my class, but the other levels call the default dict.__getitem__, which does not execute my customized behavior. How can I fix this?
An MCVE:
The aforementioned code applies a regular expression to the value and, if the value is a string (i.e., the lowest level in the dict), converts it to the corresponding environment variable's value:
dictionary = {"KEY": "{ENVIRONMENT_VARIABLE}"}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
Let's set an environment variable called ENVIRONMENT_VARIABLE to 1:
import os
os.environ["ENVIRONMENT_VARIABLE"] = "1"
In this case, the code works perfectly fine:
custom_dict["KEY"]
and the returned value will be:
"1"
However, if we had multi-level indexing:
dictionary = {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
custom_dict["KEY"]["INDEX_KEY"]
This would return
{ENVIRONMENT_VARIABLE}
P.S. There are many similar questions, but they all (probably) address only top-level indexing.

The problem, as you say yourself, is in the last line of your code.
if type(value) is str:
    ...
else:
    return value  # I BELIEVE ISSUE IS HERE
This is returning a dict. But you want to return a RegexMatchingDict instead, that will know how to handle the second level of indexing. So instead of returning value if it is a dict, convert it to a RegexMatchingDict and return that instead. Then when __getitem__() is called to perform the second level of indexing, you will get your version and not the standard one.
Something like this:
return RegexMatchingDict(value, self.regex_str, self.value_group, self.replace_with_group)
This copies the other arguments from the first level since it is hard to see how the second level could be different.
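Putting the pieces together, here is a minimal sketch of the whole class with that change applied. It assumes Python 3. The isinstance checks, the use of type(self) (so a subclass stays a subclass at every level) and leaving a string untouched when its environment variable is unset are my own choices, not part of the original code.

```python
import os
import re


class RegexMatchingDict(dict):
    """dict whose string values get {PLACEHOLDER} parts replaced by env vars."""

    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.replace_with_group = replace_with_group
        self.value_group = value_group
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.update(dct)

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, str):
            match = self.regex_matcher.match(value)
            if match:
                env_value = os.getenv(match.group(self.value_group))
                # Assumption: leave the string as-is if the variable is unset
                # (str.replace would raise TypeError when given None).
                if env_value is not None:
                    return value.replace(match.group(self.replace_with_group), env_value)
            return value
        if isinstance(value, dict):
            # Wrap the nested dict so the next [] also runs this __getitem__,
            # reusing the same regex settings as this level.
            return type(self)(value, self.regex_str, self.value_group, self.replace_with_group)
        return value


os.environ["ENVIRONMENT_VARIABLE"] = "1"
custom_dict = RegexMatchingDict(
    {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}, r"((.*({(.+)}).*))", 4, 3
)
print(custom_dict["KEY"]["INDEX_KEY"])  # prints 1
```

One consequence of wrapping on read: every access builds a fresh shallow copy, so something like custom_dict["KEY"]["NEW"] = "x" writes into that temporary copy and is lost. If writes through nested levels need to stick, converting nested dicts once when they are stored avoids this.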

In your example, your second level dictionary is a normal dict and therefore doesn't use your custom __getitem__ method.
The code below shows what should be done to have an internal custom dict:
sec_level_dict = {"KEY": "{ENVIRONMENT_VARIABLE}"}
sec_level_custom_dict = RegexMatchingDict(sec_level_dict, r"((.*({(.+)}).*))", 4, 3)
dictionary = {"KEY": sec_level_custom_dict}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
print(custom_dict["KEY"]["KEY"])
If you want to automate this and transform every nested dict into a custom dict, you can customize __setitem__ following this pattern:
class CustomDict(dict):
    def __init__(self, dct):
        super().__init__()
        for k, v in dct.items():
            self[k] = v
    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        print("Dictionary:", self, "key:", key, "value:", value)
        return value
    def __setitem__(self, key, value):
        if isinstance(value, dict):
            dict.__setitem__(self, key, self.__class__(value))
        else:
            dict.__setitem__(self, key, value)
a = CustomDict({'k': {'k': "This is my nested value"}})
print(a['k']['k'])
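Applying that __setitem__ pattern to the regex dict from the question might look like the sketch below (the class name EagerRegexDict is hypothetical). One detail worth knowing: the built-in dict.update() does not call an overridden __setitem__, so the sketch overrides update() as well; other built-ins such as setdefault() also bypass it and are left unhandled here. Because nested dicts are converted once and stored, writes through nested levels persist.

```python
import os
import re


class EagerRegexDict(dict):
    """Hypothetical variant: nested dicts are converted once, when stored."""

    def __init__(self, dct, regex, value_group, replace_with_group):
        super().__init__()
        self.regex_str = regex
        self.value_group = value_group
        self.replace_with_group = replace_with_group
        self.regex_matcher = re.compile(regex)
        # Route every initial item through __setitem__ so nested dicts convert.
        for k, v in dct.items():
            self[k] = v

    def __setitem__(self, key, value):
        if isinstance(value, dict) and not isinstance(value, EagerRegexDict):
            value = EagerRegexDict(value, self.regex_str, self.value_group,
                                   self.replace_with_group)
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # dict.update() would bypass the __setitem__ above, so redirect it.
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, str):
            match = self.regex_matcher.match(value)
            if match:
                env_value = os.getenv(match.group(self.value_group))
                if env_value is not None:  # assumption: leave unset vars as-is
                    return value.replace(match.group(self.replace_with_group), env_value)
        return value


os.environ["ENVIRONMENT_VARIABLE"] = "1"
eager = EagerRegexDict({"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}},
                       r"((.*({(.+)}).*))", 4, 3)
eager["KEY"]["OTHER"] = "kept"  # stored on the real nested object
print(eager["KEY"]["INDEX_KEY"], eager["KEY"]["OTHER"])  # prints: 1 kept
```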

Related

Python simple lazy loading

I'm trying to clean up some logic and remove duplicate values in some code and am looking for a way to introduce some very simple lazy-loading to handle settings variables. Something that would work like this:
FOO = {'foo': 1}
BAR = {'test': FOO['foo'] }
# ...complex logic here which ultimately updates the value of FOO['foo']...
FOO['foo'] = 2
print(BAR['test']) # Outputs 1 but would like to get 2
Update:
My question may not have been clear based on the initial responses. I'm looking to replace the value being set for test in BAR with a lazy-loaded substitute. I know a way to do this, but it seems unnecessarily complex for what it is; I'm wondering if there's a simpler approach.
Update #2:
Okay, here's a solution that works. Is there any built-in type that can do this out of the box?
FOO = {'foo': 1}
import types
class LazyDict(dict):
    def __getitem__(self, item):
        value = super().__getitem__(item)
        return value if not isinstance(value, types.LambdaType) else value()
BAR = LazyDict({ 'test': lambda: FOO['foo'] })
# ...complex logic here which ultimately updates the value of FOO['foo']...
FOO['foo'] = 2
print(BAR['test']) # Outputs 2
As I stated in the comment above, what you are seeking is some of the facilities of the reactive programming paradigm (not to be confused with the JavaScript library that borrows its name from it).
It is possible to instrument objects in Python to do so - I think the minimum setup here would be a specialized target mapping, and a special object type you set as the values in it, that would fetch the target value.
Python can do this more straightforwardly with direct attribute access (using dot notation: myinstance.value) than with the key-retrieval notation used by dictionaries (mydata['value']), because a class is already a template for a certain group of data, and class attributes can define mechanisms for accessing each instance's attribute values. That is called the "descriptor protocol", and it is built into the language model itself.
Nonetheless a minimalist Mapping based version can be implemented as such:
FOO = {'foo': 1}
from collections.abc import MutableMapping
class LazyValue:
    def __init__(self, source, key):
        self.source = source
        self.key = key
    def get(self):
        return self.source[self.key]
    def __repr__(self):
        return f"<LazyValue {self.get()!r}>"
class LazyDict(MutableMapping):
    def __init__(self, *args, **kw):
        self.data = dict(*args, **kw)
    def __getitem__(self, key):
        value = self.data[key]
        if isinstance(value, LazyValue):
            value = value.get()
        return value
    def __setitem__(self, key, value):
        self.data[key] = value
    def __delitem__(self, key):
        del self.data[key]
    def __iter__(self):
        return iter(self.data)
    def __len__(self):
        return len(self.data)
    def __repr__(self):
        # Show resolved values, not the LazyValue wrappers.
        return repr({key: value for key, value in self.items()})
BAR = LazyDict({'test': LazyValue(FOO, 'foo')})
# ...complex logic here which ultimately updates the value of FOO['foo']...
FOO['foo'] = 2
print(BAR['test']) # Outputs 2
The reason this much code is needed is that there are several ways to retrieve data from a dictionary or mapping (.values, .items, .get, .setdefault), and simply inheriting from dict and implementing __getitem__ would "leak" the special lazy object through any of the other methods. Going through this MutableMapping approach ensures a single point of reading the value, in the __getitem__ method, and the resulting instance can be used reliably anywhere a mapping is expected.
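To make that "leak" concrete, here is a small side-by-side comparison. It is a sketch that, purely for brevity, uses plain callables as the lazy values instead of the LazyValue class above; the class names are hypothetical. The dict subclass only intercepts [] access, while the MutableMapping version also routes .get() and .values() through __getitem__, because those mixin methods are implemented in terms of it.

```python
from collections.abc import MutableMapping


class LeakyLazyDict(dict):
    """dict subclass overriding only __getitem__ (as in Update #2)."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        return value() if callable(value) else value


class SafeLazyDict(MutableMapping):
    """Same idea built on MutableMapping: every read goes through __getitem__."""

    def __init__(self, *args, **kw):
        self.data = dict(*args, **kw)

    def __getitem__(self, key):
        value = self.data[key]
        return value() if callable(value) else value

    def __setitem__(self, key, value):
        self.data[key] = value

    def __delitem__(self, key):
        del self.data[key]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)


FOO = {"foo": 1}
leaky = LeakyLazyDict({"test": lambda: FOO["foo"]})
safe = SafeLazyDict({"test": lambda: FOO["foo"]})
FOO["foo"] = 2
print(leaky["test"], safe["test"])  # prints: 2 2
print(leaky.get("test"))            # the raw lambda leaks out
print(safe.get("test"))             # prints: 2
```

Using callable() as the marker is a shortcut with a cost: any function you genuinely want to store as a value would be called on read, which is one reason to prefer a dedicated wrapper class like LazyValue.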
However, notice that if you are using normal classes and instances rather than dictionaries, this can be much simpler: you can just use a plain Python property with a getter that fetches the value. The main factor to weigh is whether your referenced data keys are fixed and can be hard-coded when writing the source code, or whether they are dynamic, so that which keys work as lazy references is only known at runtime. In the latter case, the custom mapping approach above will usually be better. For the fixed-key case, the property version looks like this:
FOO = {'foo': 1}
class LazyStuff:
    def __init__(self, source):
        self.source = source
    @property
    def test(self):
        return self.source["foo"]
BAR = LazyStuff(FOO)
FOO["foo"] = 2
print(BAR.test)
Note that this way you have to hard-code the keys "foo" and "test" in the class body, but it is just plain code, with no need for the intermediary LazyValue class. Also, if you need this data as a dictionary, you could add an .as_dict method to LazyStuff that collects all the attributes at the moment it is called and returns a snapshot of those values as a dictionary.
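A minimal sketch of that .as_dict idea, assuming it should snapshot every property defined directly on the class. Note that vars(type(self)) only sees the class's own attributes, so properties inherited from a base class would not be collected.

```python
class LazyStuff:
    def __init__(self, source):
        self.source = source

    @property
    def test(self):
        return self.source["foo"]

    def as_dict(self):
        # Evaluate every property on this class right now; the result is a
        # plain dict, so it will not follow later changes to the source.
        return {
            name: getattr(self, name)
            for name, attr in vars(type(self)).items()
            if isinstance(attr, property)
        }


FOO = {"foo": 1}
BAR = LazyStuff(FOO)
FOO["foo"] = 2
print(BAR.as_dict())  # prints: {'test': 2}
```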
You can try using lambdas and calling the value on return. Like this:
FOO = {'foo': 1}
BAR = {'test': lambda: FOO['foo'] }
FOO['foo'] = 2
print(BAR['test']()) # Outputs 2
If you're only one level deep, you may wish to try ChainMap, e.g.:
>>> from collections import ChainMap
>>> defaults = {'foo': 42}
>>> myvalues = {}
>>> result = ChainMap(myvalues, defaults)
>>> result['foo']
42
>>> defaults['foo'] = 99
>>> result['foo']
99

What is the difference between ds.get() and ds.get_item() in pydicom

Does anyone know what the difference is in pydicom between the two methods FileDataset.get() and FileDataset.get_item()?
Thanks!
Neither of these is used often in user code. Dataset.get is the equivalent of Python's dict.get; it lets you ask for an item in the dictionary but returns a default if that item does not exist in the Dataset. The more usual way to get an item from a Dataset is to use dot notation, e.g.
dataset.PatientName
or to get the DataElement object via the tag number, e.g.
dataset[0x100010]
Dataset.get_item is a lower-level routine, primarily used when there is something wrong with some incoming data, and it needs to be corrected before the "raw data element" value is converted into python standard types (int, float, string types, etc).
When used with a keyword, Dataset.get() returns a value, not a DataElement instance. Dataset.get_item always returns either a DataElement instance, or a RawDataElement instance.
I imagine your answer is in the source for those two functions. It looks like get() handles keyword strings as well as tags as input.
def get(self, key, default=None):
    """Extend dict.get() to handle DICOM DataElement keywords.
    Parameters
    ----------
    key : str or pydicom.tag.Tag
        The element keyword or Tag or the class attribute name to get.
    default : obj or None
        If the DataElement or class attribute is not present, return
        `default` (default None).
    Returns
    -------
    value
        If `key` is the keyword for a DataElement in the Dataset then
        return the DataElement's value.
    pydicom.dataelem.DataElement
        If `key` is a tag for a DataElement in the Dataset then return the
        DataElement instance.
    value
        If `key` is a class attribute then return its value.
    """
    if isinstance(key, (str, compat.text_type)):
        try:
            return getattr(self, key)
        except AttributeError:
            return default
    else:
        # is not a string, try to make it into a tag and then hand it
        # off to the underlying dict
        if not isinstance(key, BaseTag):
            try:
                key = Tag(key)
            except Exception:
                raise TypeError("Dataset.get key must be a string or tag")
    try:
        return_val = self.__getitem__(key)
    except KeyError:
        return_val = default
    return return_val
def get_item(self, key):
    """Return the raw data element if possible.
    It will be raw if the user has never accessed the value, or set their
    own value. Note if the data element is a deferred-read element,
    then it is read and converted before being returned.
    Parameters
    ----------
    key
        The DICOM (group, element) tag in any form accepted by
        pydicom.tag.Tag such as [0x0010, 0x0010], (0x10, 0x10), 0x00100010,
        etc. May also be a slice made up of DICOM tags.
    Returns
    -------
    pydicom.dataelem.DataElement
    """
    if isinstance(key, slice):
        return self._dataset_slice(key)
    if isinstance(key, BaseTag):
        tag = key
    else:
        tag = Tag(key)
    data_elem = dict.__getitem__(self, tag)
    # If a deferred read, return using __getitem__ to read and convert it
    if isinstance(data_elem, tuple) and data_elem.value is None:
        return self[key]
    return data_elem

Use __get__, __set__ with dictionary item?

Is there a way to make a dictionary of functions that use set and get statements and then use them as set and get functions?
class thing(object):
    def __init__(self, thingy):
        self.thingy = thingy
    def __get__(self, instance, owner):
        return self.thingy
    def __set__(self, instance, value):
        self.thingy += value
theDict = {"bob":thing(5), "suzy":thing(2)}
theDict["bob"] = 10
The wanted result is that 10 goes into the set function and is added to the existing 5:
print(theDict["bob"])
>>> 15
The actual result is that the dictionary replaces the entry with the numeric value:
print(theDict["bob"])
>>> 10
Why can't I just make a function like this?
theDict["bob"].add(10)
Because it's building off an existing and already really well-working function that uses set and get. The case I'm working with is an edge case, and it wouldn't make sense to reprogram everything to make it work for this one case.
I need some means to store instances of this set/get thingy that is accessible but doesn't create some layer of depth that might break existing references.
Please don't ask for actual code. It'd take pages of code to encapsulate the problem.
You could do it if you can (also) use a specialized version of the dictionary which is aware of your Thing class and handles it separately:
class Thing(object):
    def __init__(self, thingy):
        self._thingy = thingy
    def _get_thingy(self):
        return self._thingy
    def _set_thingy(self, value):
        self._thingy += value
    thingy = property(_get_thingy, _set_thingy, None, "I'm a 'thingy' property.")
class ThingDict(dict):
    def __getitem__(self, key):
        if key in self and isinstance(dict.__getitem__(self, key), Thing):
            return dict.__getitem__(self, key).thingy
        else:
            return dict.__getitem__(self, key)
    def __setitem__(self, key, value):
        if key in self and isinstance(dict.__getitem__(self, key), Thing):
            dict.__getitem__(self, key).thingy = value
        else:
            dict.__setitem__(self, key, value)
theDict = ThingDict({"bob": Thing(5), "suzy": Thing(2), "don": 42})
print(theDict["bob"]) # --> 5
theDict["bob"] = 10
print(theDict["bob"]) # --> 15
# non-Thing value
print(theDict["don"]) # --> 42
theDict["don"] = 10
print(theDict["don"]) # --> 10
No, because to execute theDict["bob"] = 10, the Python runtime doesn't call any methods at all on the previous value of theDict["bob"]. This is unlike myObject.mydescriptor = 10, which does call the descriptor's setter.
Well, maybe it calls __del__ on the previous value if the refcount hits zero, but let's not go there!
If you want to do something like this, then you need to change the way the dictionary works, not its contents. For example, you could subclass dict (with the usual warnings that you're Evil, Bad and Wrong to write a non-Liskov-substituting derived class). Or you could implement a subclass of collections.abc.MutableMapping from scratch. But I don't think there's any way to hijack the normal operation of dict using a special value stored in it.
theDict["bob"] = 10 just assigns 10 to the key "bob" of theDict.
I think you should learn about the magic methods __get__ and __set__ first. See: https://docs.python.org/2.7/howto/descriptor.html Using a class might be easier than a dict.
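The difference between attribute assignment and item assignment can be seen side by side in this sketch (Python 3.6+ for __set_name__; Accumulator and Holder are hypothetical names). The same kind of descriptor adds to a running total when it lives on a class, and is simply replaced when it lives in a dict.

```python
class Accumulator:
    """Data descriptor: assignment adds to the stored total instead of replacing it."""

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.private_name, 0)

    def __set__(self, instance, value):
        setattr(instance, self.private_name, self.__get__(instance) + value)


class Holder:
    bob = Accumulator()  # descriptor stored as a class attribute


h = Holder()
h.bob = 5
h.bob = 10   # goes through Accumulator.__set__
print(h.bob)  # prints: 15

d = {"bob": Accumulator()}
d["bob"] = 10  # plain item assignment: the old value is never consulted
print(d["bob"])  # prints: 10
```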

Alternative to "assign to a function call" in Python

I'm trying to solve this newbie puzzle:
I've created this function:
def bucket_loop(htable, key):
    bucket = hashtable_get_bucket(htable, key)
    for entry in bucket:
        if entry[0] == key:
            return entry[1]
    return None
And I have to call it in two other functions (below) in order to change the value of the element entry[1] or to append a new element to this list (entry). But I can't do that by calling bucket_loop the way I did, because "you can't assign to function call" (assigning to a function call is illegal in Python). What is the alternative (most similar to the code I wrote) for doing this (bucket_loop(htable, key) = value and hashtable_get_bucket(htable, key).append([key, value]))?
def hashtable_update(htable, key, value):
    if bucket_loop(htable, key) != None:
        bucket_loop(htable, key) = value
    else:
        hashtable_get_bucket(htable, key).append([key, value])
def hashtable_lookup(htable, key):
    return bucket_loop(htable, key)
Thanks, in advance, for any help!
This is the rest of the code needed to make this script work:
def make_hashtable(size):
    table = []
    for unused in range(0, size):
        table.append([])
    return table
def hash_string(s, size):
    h = 0
    for c in s:
        h = h + ord(c)
    return h % size
def hashtable_get_bucket(htable, key):
    return htable[hash_string(key, len(htable))]
Similar question (but didn't help me): SyntaxError: "can't assign to function call"
In general, there are three things you can do:
Write “setter” functions (e.g., bucket_set)
Return mutable values (e.g., bucket_get(table, key).append(42) if the value is a list)
Use a class which overrides __getitem__ and __setitem__
For example, you could have a class like:
class Bucket(object):
    def __setitem__(self, key, value):
        ...  # implementation goes here
    def __getitem__(self, key):
        value = ...  # implementation goes here
        return value
Then use it like this:
>>> b = Bucket()
>>> b["foo"] = 42
>>> b["foo"]
42
>>>
This would be the most Pythonic way to do it.
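A minimal self-contained sketch of such a class, assuming separate chaining with one list of [key, value] entries per bucket, like the question's code. The name HashTable is hypothetical, and the built-in hash() stands in for the question's hash_string().

```python
class HashTable:
    """Hypothetical sketch: a bucket-list hash table with [] syntax."""

    def __init__(self, size=8):
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Built-in hash() stands in for the question's hash_string().
        return self._buckets[hash(key) % len(self._buckets)]

    def _entry(self, key):
        # Return the [key, value] list for key, or None if it is absent.
        for entry in self._bucket(key):
            if entry[0] == key:
                return entry
        return None

    def __getitem__(self, key):
        entry = self._entry(key)
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def __setitem__(self, key, value):
        entry = self._entry(key)
        if entry is None:
            self._bucket(key).append([key, value])
        else:
            entry[1] = value  # update the existing entry in place

    def __contains__(self, key):
        return self._entry(key) is not None


b = HashTable()
b["foo"] = 42
print(b["foo"])  # prints: 42
```

Here b["foo"] = 42 is legal because Python translates it into b.__setitem__("foo", 42); that method call is the "assignment to a function call" the question was looking for.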
One option that would require few changes would be adding a third argument to bucket_loop, optional, to use for assignment:
empty = object()  # An object that's guaranteed not to be in your htable
def bucket_loop(htable, key, value=empty):
    bucket = hashtable_get_bucket(htable, key)
    for entry in bucket:
        if entry[0] == key:
            if value is not empty:  # Reference (id) comparison
                entry[1] = value
            return entry[1]
    return None  # only after every entry has been checked (see the second pointer below)
However, a few pointers:
I agree with Ignacio Vazquez-Abrams and David Wolever, a class would be better;
Since a bucket can have more than one key/value pair, you shouldn't return None if the first entry doesn't match your key. Loop through all of them, and only return None at the end (you can also omit this statement, since the default behavior is to return None);
If your htable doesn't admit None as a value, you can use it instead of empty.
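Combining the "return mutable values" option from the first answer with these pointers, one possible helper returns the [key, value] entry itself, so callers assign to entry[1] instead of assigning to a function call. bucket_find is a hypothetical name; the other three helpers are copied from the question.

```python
def make_hashtable(size):
    table = []
    for unused in range(0, size):
        table.append([])
    return table


def hash_string(s, size):
    h = 0
    for c in s:
        h = h + ord(c)
    return h % size


def hashtable_get_bucket(htable, key):
    return htable[hash_string(key, len(htable))]


def bucket_find(htable, key):
    """Hypothetical helper: return the [key, value] entry itself, or None."""
    for entry in hashtable_get_bucket(htable, key):
        if entry[0] == key:
            return entry
    return None  # only after checking every entry in the bucket


def hashtable_update(htable, key, value):
    entry = bucket_find(htable, key)
    if entry is not None:
        entry[1] = value  # mutating the returned list updates the table
    else:
        hashtable_get_bucket(htable, key).append([key, value])


def hashtable_lookup(htable, key):
    entry = bucket_find(htable, key)
    return entry[1] if entry is not None else None


table = make_hashtable(4)
hashtable_update(table, "udacity", 23)
hashtable_update(table, "udacity", 42)
print(hashtable_lookup(table, "udacity"))  # prints: 42
```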
So you're basically cheating at Udacity, which is an online CS class / university? The funny part is that you couldn't even state the question properly. Next time, cheat thoroughly: paste the two functions you're supposed to simplify and ask someone to simplify them by creating a third function containing the overlapping code. It doesn't matter anyway, because if this is the one you need help with, you're likely not doing very well in the class.
You were also able to solve the problem without using most of these tools; it was an exercise in understanding how to identify and handle redundancies, NOT efficiency...
Real instructions:
Modify the code for both hashtable_update and hashtable_lookup to have the same behavior they have now, but using fewer lines of code in each procedure. You should define a new procedure, helper, to help with this. Your new version should have approximately the same running time as the original version, but neither hashtable_update or hashtable_lookup should include any for or while loop, and the block of each procedure should be no more than 6 lines of code
Seriously, cheating is lame.

Python dictionary - binary search for a key?

I want to write a container class that acts like a dictionary (it actually derives from dict). The keys for this structure will be dates.
When a key (i.e., a date) is used to retrieve a value from the class, if that date does not exist, then the nearest available date that precedes the key is used to return the value.
The following data should help explain the concept further:
Date (key) Value
2001/01/01 123
2001/01/02 42
2001/01/03 100
2001/01/04 314
2001/01/07 312
2001/01/09 321
If I try to fetch the value associated with key (date) '2001/01/05' I should get the value stored under the key 2001/01/04 since that key occurs before where the key '2001/01/05' would be if it existed in the dictionary.
In order to do this, I need to be able to do a search (ideally binary, rather than naively looping through every key in the dictionary). I have searched for bsearch dictionary key lookups in Python dictionaries - but have not found anything useful.
Anyway, I want to write a class that encapsulates this behavior.
This is what I have so far (not much):
class NearestNeighborDict(dict):
    """
    a dictionary which returns value of nearest neighbor
    if specified key not found
    """
    def __init__(self, items={}):
        dict.__init__(self, items)
    def get_item(self, key):
        # returns the item stored with the key (if key exists)
        # else it returns the item stored with the key
        raise NotImplementedError  # not written yet
You really don't want to subclass dict, because you can't really reuse any of its functionality. Rather, subclass the abstract base class collections.abc.Mapping (or MutableMapping if you also want to be able to modify an instance after creation), implement the special methods it requires, and you'll get the other dict-like methods "for free" from the ABC.
The methods you need to code are __getitem__ (plus __setitem__ and __delitem__ if you want mutability), __len__ and __iter__. Here you should also write __contains__: the inherited one is based on __getitem__, which in this class returns a neighbor's value for a missing key, so it would report missing keys as present.
The bisect module of the standard library gives you all you need to implement these efficiently on top of a sorted list. For example...:
import bisect
from collections.abc import Mapping
class MyDict(Mapping):
    def __init__(self, contents):
        "contents must be a sequence of key/value pairs"
        # Sort by key only and bisect on a parallel key list: bisecting on
        # (k, None) tuples raises TypeError in Python 3 when k is already a key.
        self._list = sorted(contents, key=lambda kv: kv[0])
        self._keys = [k for (k, _) in self._list]
    def __iter__(self):
        return iter(self._keys)
    def __contains__(self, k):
        i = bisect.bisect_left(self._keys, k)
        return i < len(self._keys) and self._keys[i] == k
    def __len__(self):
        return len(self._list)
    def __getitem__(self, k):
        i = bisect.bisect_left(self._keys, k)
        if i >= len(self._keys): raise KeyError(k)
        return self._list[i][1]
You'll probably want to fiddle __getitem__ depending on what you want to return (or whether you want to raise) for various corner cases such as "k greater than all keys in self".
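For the behavior the question actually asks for (fall back to the nearest earlier key), one option is bisect_right on a sorted key list, taking the entry just before the insertion point. FloorDict is a hypothetical name, and raising KeyError for a key that precedes every stored key is an assumption.

```python
import bisect
from collections.abc import Mapping


class FloorDict(Mapping):
    """Read-only mapping: a missing key falls back to the nearest smaller key."""

    def __init__(self, contents):
        pairs = sorted(contents, key=lambda kv: kv[0])
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def __getitem__(self, k):
        # bisect_right returns the insertion point after any equal key,
        # so index i - 1 holds the largest key <= k.
        i = bisect.bisect_right(self._keys, k)
        if i == 0:
            raise KeyError(k)  # k precedes every stored key
        return self._values[i - 1]

    def __contains__(self, k):
        # Exact-match membership, not the nearest-key lookup above.
        i = bisect.bisect_left(self._keys, k)
        return i < len(self._keys) and self._keys[i] == k

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


data = [("2001/01/01", 123), ("2001/01/02", 42), ("2001/01/03", 100),
        ("2001/01/04", 314), ("2001/01/07", 312), ("2001/01/09", 321)]
fd = FloorDict(data)
print(fd["2001/01/05"])  # prints: 314
```

Zero-padded "YYYY/MM/DD" strings sort in date order, so plain string comparison works here; datetime.date keys would work the same way.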
The sortedcontainers module provides a SortedDict type that maintains the keys in sorted order and supports bisecting on those keys. The module is a pure-Python, fast-as-C implementation with 100% test coverage and hours of stress testing.
For example:
from sortedcontainers import SortedDict
sd = SortedDict((date, value) for date, value in data)
# Bisect for the insertion point of the desired key; the entry just
# before it holds the nearest key that is <= '2001/01/05'.
index = sd.bisect_right('2001/01/05') - 1
# Look up the real key at that index (older releases spelled this sd.iloc[index]).
key = sd.keys()[index]
# Retrieve the value associated with that key.
value = sd[key]
Because SortedDict supports fast indexing, it's easy to look ahead or behind your key as well. SortedDict is also a MutableMapping so it should work nicely in your type system.
I'd extend dict and override the __getitem__ and __setitem__ methods to store a sorted list of keys.
from bisect import bisect
class NearestNeighborDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._keylist = []  # sorted copy of the keys
    def __getitem__(self, x):
        if x in self:
            return dict.__getitem__(self, x)
        # bisect() gives the insertion point; the key just before it is the nearest earlier one
        index = bisect(self._keylist, x)
        if index == 0:
            raise KeyError('No earlier date')
        return dict.__getitem__(self, self._keylist[index - 1])
    def __setitem__(self, x, value):
        if x not in self:
            index = bisect(self._keylist, x)
            self._keylist.insert(index, x)  # insert the key, not the value
        dict.__setitem__(self, x, value)
It's true you're better off inheriting from MutableMapping, but the principle is the same, and the above code can be easily adapted.
Why not just maintain a sorted list from dict.keys() and search that? If you're subclassing dict, you could even do a binary insert on that list when values are added.
Use the floor_key method on bintrees.RBTree: https://pypi.python.org/pypi/bintrees/2.0.1
