Pythonic slicing of nested attributes

I am dealing with classes whose attributes are sometimes lists whose elements can be dictionaries, or further nested objects with their own attributes, and so on. I would like to perform some slicing that, with my grasp of Python, is only doable in ways that feel profoundly un-Pythonic.
My minimal code looks like this:
class X(object):
    def __init__(self):
        self.a = []

x = X()
x.a.append({'key1': 'v1'})
x.a.append({'key1': 'v2'})
x.a.append({'key1': 'v3'})

# this works as desired
x.a[0]['key1']  # 'v1'
I would like to access a key in the nested dictionaries, but make that call for all elements of the list containing those dictionaries. The standard Python way of doing this would be a list comprehension, à la:
[v['key1'] for v in x.a]
However, my minimal example doesn't quite convey the full extent of nesting in my real-world scenario: The attribute list a in class X might contain objects, whose attributes are objects, whose attributes are dictionaries whose keys I want to select on while iterating over the outer list.
# I would like something like
useful_list = x.a[:]['key1'] # TypeError: list indices must be integers, not str
# or even better
cool_list = where(x.a[:]['key1'] == 'v2') # same TypeError
If I start writing list comprehensions for every interesting key, it quickly stops looking Pythonic. Is there a nice way of doing this, or do I have to code 'getter' methods for all conceivable pairings of lists and dictionary keys?
UPDATE:
I have been reading about overloading lists. Apparently one can override the __getitem__ method, which handles indices for lists and keys for dicts. Maybe a custom class that iterates over list members would work. This is starting to sound contrived...

So, you want to create a hierarchical structure, with an operation that means different things for different types and is defined recursively.
Polymorphism to the rescue.
You could override __getitem__ instead of my get_items below, but in your case it might be better to define a non-builtin operation to avoid risking ambiguity (a sketch of the __getitem__ variant appears after the example). It's up to you, really.
class ItemsInterface(object):
    def get_items(self, key):
        raise NotImplementedError

class DictItems(ItemsInterface, dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)

    def get_items(self, key):
        res = self[key]
        # apply recursively
        try:
            res = res.get_items(key)
        except AttributeError:
            pass
        return res

class ListItems(ItemsInterface, list):
    def __init__(self, *args, **kwargs):
        list.__init__(self, *args, **kwargs)

    def get_items(self, key):
        return [x.get_items(key) for x in self]
x = ListItems()
x.append(DictItems({'key1':'v1'}))
x.append(DictItems({'key1':'v2'}))
x.append(DictItems({'key1':'v3'}))
y = DictItems({'key1':'v999'})
x.append(ListItems([ y ]))
x.get_items('key1')
=> ['v1', 'v2', 'v3', ['v999']]
Of course, this solution might not be exactly what you need (you didn't explain what it should do if the key is missing, etc.), but you can easily modify it to suit your needs.
This solution also supports ListItems as values of the DictItems; the get_items operation is applied recursively.
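For completeness, here is a minimal sketch of the __getitem__ variant mentioned above (the class name BroadcastList is hypothetical). It also illustrates the ambiguity risk: once __getitem__ broadcasts over the members, x[0] no longer means "the first element".
class BroadcastList(list):
    def __getitem__(self, key):
        # broadcast the key lookup over all members
        return [item[key] for item in self]

x = BroadcastList([{'key1': 'v1'}, {'key1': 'v2'}, {'key1': 'v3'}])
x['key1']  # ['v1', 'v2', 'v3']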


Efficiently mapping unhashable objects to their index in a list

A Python list
f = [x0, x1, x2]
may be seen as an efficient representation of a mapping from [0, 1, ..., len(f) - 1] to the set of its elements. By "efficient" I mean that f[i] returns the element associated with i in O(1) time.
The inverse mapping may be defined as follows:
class Inverse:
    def __init__(self, f):
        self.f = f

    def __getitem__(self, x):
        return self.f.index(x)
This works, but Inverse(f)[x] takes O(n) time on average.
Alternatively, one may use a dict:
f_inv = {x: i for i, x in enumerate(f)}
This has O(1) average time complexity, but it requires the objects in the list to be hashable.
Is there a way to define an inverse mapping that provides equality-based lookups, in O(1) average time, with unhashable objects?
Edit: sample input and expected output:
>>> f = [x0, x1, x2]
>>> f_inv = Inverse(f) # this is to be defined
>>> f_inv[x0] # in O(1) time
0
>>> f_inv[x2] # in O(1) time
2
You can create an associated dictionary mapping the object IDs back to the list index.
The obvious disadvantage is that you will have to look up the index with the identical object, not an object that is merely equal.
On the upside, by creating a custom MutableSequence class using collections.abc, you can, with minimal code, write a class that keeps your data both as a sequence and as the reverse dictionary.
from collections.abc import MutableSequence
from threading import RLock

class MD(dict):
    # No need for a full MutableMapping subclass, as the use is limited.
    # Every access goes through id(), so unhashable objects can be used
    # as keys. (__setitem__ and __delitem__ are overridden as well, so
    # that in-place updates like self.reverse[obj] -= 1 stay id-keyed.)
    def __getitem__(self, key):
        return super().__getitem__(id(key))

    def __setitem__(self, key, value):
        super().__setitem__(id(key), value)

    def __delitem__(self, key):
        super().__delitem__(id(key))

class Reversible(MutableSequence):
    def __init__(self, args):
        self.seq = list()
        # note: this public attribute shadows MutableSequence's
        # reverse() mixin method
        self.reverse = MD()
        self.lock = RLock()
        for element in args:
            self.append(element)

    def __getitem__(self, index):
        return self.seq[index]

    def __setitem__(self, index, value):
        with self.lock:
            del self.reverse[self.seq[index]]
            self.seq[index] = value
            self.reverse[value] = index

    def __delitem__(self, index):
        if index < 0:
            index += len(self)
        with self.lock:
            # Decrease all mapped indexes past the removed element
            for obj in self.seq[index + 1:]:
                self.reverse[obj] -= 1
            del self.reverse[self.seq[index]]
            del self.seq[index]

    def __len__(self):
        return len(self.seq)

    def insert(self, index, value):
        if index < 0:
            index += len(self)
        with self.lock:
            # Increase all mapped indexes from the insertion point on
            for obj in self.seq[index:]:
                self.reverse[obj] += 1
            self.seq.insert(index, value)
            self.reverse[value] = index
And voilà: just use this object in place of your list, and use the public attribute reverse to get the index of identical objects.
Note that you can augment the "intelligence" of the MD class by trying different strategies, such as using the objects themselves as keys when they are hashable, and only resorting to id(), or some other custom key based on the objects' attributes, when needed. That way you could mitigate the need for lookups to use the very same object; a sketch follows below.
So, for ordinary operations on the list, this class keeps the reverse dictionary synchronized. There is no support for slice indexing, though.
For more information, check the docs at https://docs.python.org/3/library/collections.abc.html
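As a hedged sketch of that idea (the name SmartMD and the fallback policy are assumptions, not part of the code above):
class SmartMD(dict):
    @staticmethod
    def _key(obj):
        # use the object itself when hashable, so that equal objects
        # match; fall back to identity for unhashable objects
        try:
            hash(obj)
            return obj
        except TypeError:
            return id(obj)

    def __getitem__(self, key):
        return super().__getitem__(self._key(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._key(key), value)

    def __delitem__(self, key):
        super().__delitem__(self._key(key))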
Unfortunately you're stuck with an algorithm limitation here. Fast lookup structures, like hash tables or binary trees, are efficient because they put objects in particular buckets or order them based on their values. This requires them to be hashable or comparable consistently for the entire time you are storing them in this structure, otherwise a lookup is very likely to fail.
If the objects you need are mutable (usually the reason they are not hashable) then any time an object you are tracking changes you need to update the data structure. The safest way to do this is to create immutable objects. If you need to change an object, then create a new one, remove the old one from the dictionary, and insert the new object as a key with the same value.
The operations here are still O(1) with respect to the size of the dictionary, you just need to consider whether the cost of copying objects on every change is worth it.
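A minimal sketch of that replace-instead-of-mutate pattern (the Point class is purely illustrative):
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    # frozen makes instances immutable and hashable, so they can be dict keys
    x: int
    y: int

f = [Point(0, 0), Point(1, 1)]
f_inv = {p: i for i, p in enumerate(f)}

# to "change" an element, build a new object and update both structures
old, new = f[0], Point(0, 5)
f[0] = new
del f_inv[old]
f_inv[new] = 0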

What happens when I loop over a dict in Python

I know Python will simply iterate over the keys when I put a dict in for...in... syntax.
But what happens to the dict?
When we use help(dict), we cannot see a __next__() method in the method list. So if I want to make a derived class based on dict:
class MyDict(dict):
    def __init__(self, *args, **kwargs):
        super(MyDict, self).__init__(*args, **kwargs)
and return the value list with for...in...
d = MyDict({'a': 1, 'b': 2})
for value in d:
what should I do?
Naively, if all you want is for iteration over an instance of MyDict to yield the values instead of the keys, then define in MyDict:
def __iter__(self):
    return self.itervalues()
In Python 3:
def __iter__(self):
    return iter(self.values())
But beware! By doing this your class no longer implements the contract of collections.abc.MutableMapping, even though issubclass(MyDict, collections.abc.MutableMapping) is True. You might be better off not subclassing dict if this is the behaviour you want; instead, keep an attribute of type dict to hold the data, and implement only the functions and operators you need, as sketched below.
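A minimal sketch of that composition approach (the class name ValueIterable is hypothetical):
class ValueIterable:
    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __iter__(self):
        # iteration yields values by design; no mapping contract is claimed
        return iter(self._data.values())

d = ValueIterable({'a': 1, 'b': 2})
for value in d:
    print(value)  # 1, then 2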

Python - Recommended way to dynamically add methods within a class

I have a class where I want to initialize an attribute self.listN and an add_to_listN method for each element of a list, e.g. from attrs = ['list1', 'list2'] I want list1 and list2 to be initialized as empty lists and the methods add_to_list1 and add_to_list2 to be created. Each add_to_listN method should take two parameters, say value and unit, and append a tuple (value, unit) to the corresponding listN.
The class should therefore look like this in the end:
class Foo():
    def __init__(self):
        self.list1 = []
        self.list2 = []

    def add_to_list1(self, value, unit):
        self.list1.append((value, unit))

    def add_to_list2(self, value, unit):
        self.list2.append((value, unit))
Leaving aside all the checks and the rest of the class, I came up with this:
class Foo():
    def __init__(self):
        for attr in ['list1', 'list2']:
            setattr(self, attr, [])
            setattr(self, 'add_to_%s' % attr, self._simple_add(attr))

    def _simple_add(self, attr):
        def method(value, unit=None):
            getattr(self, attr).append((value, unit))
        return method
I also checked other solutions such as the ones suggested here and I would like to do it "right", so my questions are:
Are/Should these methods (be) actually classmethods or not?
Is there a cost in creating the methods in __init__, and in this case is there an alternative?
Where is the best place to run the for loop and add these methods? Within the class definition? Out of it?
Is the use of metaclasses recommended in this case?
Update
Although Benjamin Hodgson makes some good points, I'm not asking for a (perhaps better) alternative way to do this but for the best way to use the tools that I mentioned. I'm using a simplified example in order not to focus on the details.
To further clarify my questions: the add_to_listN methods are meant to be additional, not to replace setters/getters (so I still want to be able to do l1 = f.list1 and f.list1 = [] with f = Foo()).
You are making a design error. You could override __getattr__, parse the attribute name, and return a closure which does what you want (a sketch follows), but it's strange to dynamically generate methods, and strange code is bad code. There are often situations where you need to do it, but this is not one of them.
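For illustration only, a sketch of that (discouraged) __getattr__ approach, assuming every attribute of the form add_to_<name> should append to self.<name>:
class Foo:
    def __init__(self):
        self.list1 = []
        self.list2 = []

    def __getattr__(self, name):
        # only called for attributes not found the normal way
        if name.startswith('add_to_'):
            target = getattr(self, name[len('add_to_'):])
            def method(value, unit=None):
                target.append((value, unit))
            return method
        raise AttributeError(name)

f = Foo()
f.add_to_list1('v', 'm')  # f.list1 == [('v', 'm')]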
Instead of generating n methods which each do the same thing to one of n objects, why not just write one method which is parameterised by n? Something roughly like this:
class Foo:
    def __init__(self):
        self.lists = [
            [],
            []
        ]

    def add(self, row, value):
        self.lists[row].append(value)
Then foo.add1(x) becomes simply foo.add(1, x); foo.add2(x) becomes foo.add(2, x), and so on. There's one method, parameterised along the axis of variation, which serves all cases - rather than a litany of ad-hoc generated methods. It's much simpler.
Don't mix up the data in your system with the names of the data in your system.

Adding an element to a collection type without changing the collection type in Python

Let's say I have a class that would work with either a tuple, a list, a dictionary, or a set of another type of object.
Something like this:
class AbstractClass:
    """An example class"""
    def __init__(self, items=None):
        self.items = items

    def items(self):
        """Returns the items that this instance has"""
        return self.items
Now I want to add a method like this:
def add_item(self, item):
    """Adds item to items"""
    # code goes here
Now I'm stuck. I don't want to have to check if items is a list, tuple, and etc. and then do it on a case by case basis (as it simply seems unpythonic), but there doesn't seem to be one method that works universally. I would also want to try and preserve the type, so I don't want to convert items to a list (for example) and then use list's method of adding an item (either with items.append(item) or items + [item,]). Any suggestions?
The following is a limited list of examples for the expected behavior:
List
a = AbstractClass([1, 2])
a.add_item(4) # a.items now contains [1, 2, 4] in any order
Tuple
a = AbstractClass((1, 2))
a.add_item(4) # a.items now contains (1, 2, 4) in any order
Dictionary (note: this one is really quite optional, as I don't expect to be using this)
a = AbstractClass({0:1, 2:2})
a.add_item({3:4}) # a.items now should be {0:1, 2:2, 3:4}
Note: This is not meant to be used in practice, I just wanted to test the limits of python's dynamic nature
You could do this, and it will work for the standard sequence and set types (a dictionary needs separate handling, since iterating over one yields only its keys):
def add_item(self, item):
    # You could use append here, and it'd be faster
    temporary_list = list(self.items) + [item]
    self.items = type(self.items)(temporary_list)
This works because type(x) returns the class of x, and for the built-in collection types the class itself can be called with an iterable to build a new instance. So the last line is equivalent to one of:
# For lists
self.items = list(temporary_list)
# For sets
self.items = set(temporary_list)
# And so on...
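Dictionaries would need their own branch, since rebuilding one from a list of keys loses the values. A possible sketch, assuming the item added to a mapping is itself a mapping:
def add_item(self, item):
    # dicts have update() but not add(); sets have both update() and add()
    if hasattr(self.items, 'update') and not hasattr(self.items, 'add'):
        merged = dict(self.items)
        merged.update(item)
        self.items = type(self.items)(merged)
    else:
        self.items = type(self.items)(list(self.items) + [item])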
You can check for method names you think will work, if you don't want to use isinstance():
def __init__(self, items=None):
    self.items = items
    if hasattr(self.items, 'add'):
        self.add_item = self.items.add
    elif hasattr(self.items, 'append'):
        self.add_item = self.items.append
    else:
        raise Exception("no add method found")
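Usage would then look like this (note that a tuple has neither add nor append, so this approach raises for tuples):
a = AbstractClass({1, 2})
a.add_item(4)   # dispatches to set.add; a.items == {1, 2, 4}
b = AbstractClass([1, 2])
b.add_item(4)   # dispatches to list.append; b.items == [1, 2, 4]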

Python dictionary - binary search for a key?

I want to write a container class that acts like a dictionary (actually derives from dict). The keys for this structure will be dates.
When a key (i.e. date) is used to retrieve a value from the class and the date does not exist, the nearest available date that precedes the key is used to return the value.
The following data should help explain the concept further:
Date (key)    Value
2001/01/01    123
2001/01/02    42
2001/01/03    100
2001/01/04    314
2001/01/07    312
2001/01/09    321
If I try to fetch the value associated with key (date) '2001/01/05' I should get the value stored under the key 2001/01/04 since that key occurs before where the key '2001/01/05' would be if it existed in the dictionary.
In order to do this, I need to be able to do a search (ideally binary, rather than naively looping through every key in the dictionary). I have searched for bsearch dictionary key lookups in Python dictionaries - but have not found anything useful.
Anyway, I want to write a class that encapsulates this behavior.
This is what I have so far (not much):
#
class NearestNeighborDict(dict):
#
"""
#
a dictionary which returns value of nearest neighbor
if specified key not found
#
"""
def __init__(self, items={}):
dict.__init__(self, items)
def get_item(self, key):
# returns the item stored with the key (if key exists)
# else it returns the item stored with the key
You really don't want to subclass dict because you can't really reuse any of its functionality. Rather, subclass the abstract base class collections.Mapping (or MutableMapping if you want to also be able to modify an instance after creation), implement the indispensable special methods for the purpose, and you'll get other dict-like methods "for free" from the ABC.
The methods you need to code are __getitem__ (and __setitem__ and __delitem__ if you want mutability), __len__, __iter__, and __contains__.
The bisect module of the standard library gives you all you need to implement these efficiently on top of a sorted list. For example...:
import collections
import bisect

class MyDict(collections.Mapping):
    def __init__(self, contents):
        "contents must be a sequence of key/value pairs"
        self._list = sorted(contents)

    def __iter__(self):
        return (k for (k, _) in self._list)

    def __contains__(self, k):
        i = bisect.bisect_left(self._list, (k, None))
        return i < len(self._list) and self._list[i][0] == k

    def __len__(self):
        return len(self._list)

    def __getitem__(self, k):
        i = bisect.bisect_left(self._list, (k, None))
        if i >= len(self._list):
            raise KeyError(k)
        return self._list[i][1]
You'll probably want to fiddle with __getitem__, depending on what you want to return (or whether you want to raise) for various corner cases, such as "k greater than all keys in self". Note that as written a miss returns the value stored under the next key at or after k, while the question asks for the nearest preceding key; a sketch of that variant follows.
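A hedged sketch of a nearest-preceding-key __getitem__ (the key list could be precomputed in __init__ instead of rebuilt per lookup):
def __getitem__(self, k):
    # bisect over the keys only, to avoid comparing values
    keys = [key for (key, _) in self._list]
    i = bisect.bisect_right(keys, k)
    if i == 0:
        raise KeyError(k)  # k precedes every stored key
    return self._list[i - 1][1]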
The sortedcontainers module provides a SortedDict type that maintains the keys in sorted order and supports bisecting on those keys. The module is pure-Python with fast-as-C implementations, 100% test coverage, and hours of stress testing.
For example:
from sortedcontainers import SortedDict
sd = SortedDict((date, value) for date, value in data)
# Bisect for the insertion index of the desired key.
index = sd.bisect('2001/01/05')
# The key just before that index is the nearest preceding key.
key = sd.iloc[index - 1]
# Retrieve the value associated with that key.
value = sd[key]
Because SortedDict supports fast indexing, it's easy to look ahead or behind your key as well. SortedDict is also a MutableMapping so it should work nicely in your type system.
I'd extend dict and override the __getitem__ and __setitem__ methods to maintain a sorted list of keys.
from bisect import bisect

class NearestNeighborDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._keylist = []

    def __getitem__(self, x):
        if x in self:
            return dict.__getitem__(self, x)
        # fall back to the nearest preceding key
        index = bisect(self._keylist, x)
        if index == 0:
            raise KeyError('No preceding date')
        return dict.__getitem__(self, self._keylist[index - 1])

    def __setitem__(self, x, value):
        if x not in self:
            index = bisect(self._keylist, x)
            self._keylist.insert(index, x)
        dict.__setitem__(self, x, value)
It's true you're better off inheriting from MutableMapping, but the principle is the same, and the above code can be easily adapted.
Why not just maintain a sorted list from dict.keys() and search that? If you're subclassing dict you may even devise an opportunity to do a binary insert on that list when values are added.
Use the floor_key method on bintrees.RBTree: https://pypi.python.org/pypi/bintrees/2.0.1
