best way to create a set of named objects in Python - python

I find myself frequently creating sets of named objects where each object has a unique name. I implement these as dicts whose keys are derived from myObject.name. But this feels a bit clunky to keep the name in two places.
My typical approach looks like this:
class NamedObject(object):
    ITEMS = {}

    def __init__(self, name, ...other arguments...):
        self.name = name
        ...more initialization...

    @classmethod
    def create_named_object(cls, name, ...other arguments...):
        obj = cls(name, ...other arguments...)
        cls.ITEMS[name] = obj

    @classmethod
    def find_object_by_name(cls, name):
        return cls.ITEMS.get(name, None)

    @classmethod
    def filter_objects(cls, predicate):
        return [e for e in cls.ITEMS.values() if predicate(e)]
I know I could create a generalized class to handle this, but is there a more naturally Pythonic way to do this?

There is no more generalised support in the standard library, no, nor is there any more 'Pythonic' way to achieve this than using a dictionary.
What you are doing is providing a lookup table index, and indices generally require some duplication of data. You are trading memory for access speed. But indices are use-case specific: they are either trivially implemented with a mapping, or too specific to the application to be generalised to the point where adding them to the standard library would make sense.
At least in Python, the string value for the name is not actually duplicated; you just add more references to it, one from the instance and another from the ITEMS dictionary.
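For what it's worth, the generalised class the question mentions is straightforward to sketch yourself; the names below (NamedRegistry, Widget) are purely illustrative:
class NamedRegistry(object):
    """Illustrative sketch of a generalised named-object registry."""
    ITEMS = None  # each concrete subclass should define its own dict

    @classmethod
    def create(cls, name, *args, **kwargs):
        obj = cls(name, *args, **kwargs)
        cls.ITEMS[name] = obj
        return obj

    @classmethod
    def find(cls, name):
        return cls.ITEMS.get(name)

    @classmethod
    def filter(cls, predicate):
        return [obj for obj in cls.ITEMS.values() if predicate(obj)]

class Widget(NamedRegistry):
    ITEMS = {}

    def __init__(self, name, size=0):
        self.name = name
        self.size = size

Widget.create('small', size=1)
Widget.create('large', size=10)
print(Widget.find('large').size)                               # 10
print([w.name for w in Widget.filter(lambda w: w.size > 5)])   # ['large']
It is still just a dictionary underneath; the mixin only removes the repeated boilerplate from each class.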

Related

Sorting a list of inherited class instances by a class attribute dictionary Key

Is it possible to write a base-class method that sorts a list of class instances (stored as a static class variable) by a key in a dictionary that is an attribute of each instance, using sort or sorted, or does a more elaborate sorting routine need to be written?
I'm a Python noob. I've attempted to write the "my_sorter" method using sort and sorted, trying a lambda key function, itemgetter, and attrgetter, and I'm not sure whether I'm failing at the syntax needed to access these mixed nested structures or whether it's just not possible without writing a more elaborate routine that deliberately shifts entries around in the list.
Note that each child class has a static variable named “mindex” that identifies the “primary key” of its attribute dictionary (i.e. a unique value to sort by).
What would the my_sorter() method look like?
class Engine():
    storage = []

    @classmethod
    def my_sorter(cls):
        # sort cls.storage by cls.mindex
        pass

class Person(Engine):
    mindex = 'Name'

    def __init__(self, name):
        self.attributes = {
            'Name': name,
        }

class Meeting(Engine):
    mindex = 'Date'

    def __init__(self, date):
        self.attributes = {
            'Date': date,
        }
You don't show anywhere how your objects are ending up in the storage list, but assuming you have that working correctly (and you're not getting a mix of objects of different subclasses all in Engine.storage unexpectedly), the sorting should be pretty simple.
@classmethod
def my_sorter(cls):
    return sorted(cls.storage, key=lambda obj: obj.attributes[cls.mindex])
I'm a little skeptical though about whether your Person and Meeting classes should be subclasses of Engine. That doesn't seem like an IS-A relationship to me. Perhaps it would make more sense if I knew the whole project design.
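For completeness, here is a hypothetical usage sketch. It assumes my_sorter is implemented on Engine as above, that each subclass keeps its own storage list, and that __init__ appends the new instance to it (the question doesn't show how storage gets populated):
class Person(Engine):
    mindex = 'Name'
    storage = []  # assumption: each subclass keeps its own list

    def __init__(self, name):
        self.attributes = {'Name': name}
        Person.storage.append(self)  # assumption: instances register themselves

Person('Carol')
Person('Alice')
Person('Bob')
print([p.attributes['Name'] for p in Person.my_sorter()])
# ['Alice', 'Bob', 'Carol']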

Python heapify by some attribute, reheapify after attribute changes

I'm trying to use the heapq module in the Python 3.5 standard library to make a priority queue of objects of the same type. I'd like to be able to heapify based on an attribute of the objects, then change the value of some of those attributes, then re-heapify based on the new values. I'm wondering how I go about doing this.
import heapq

class multiNode:
    def __init__(self, keyValue):
        self.__key = keyValue

    def setKey(self, keyValue):
        self.__key = keyValue

    def getKey(self):
        return self.__key

queue = [multiNode(1), multiNode(2), multiNode(3)]
heapq.heapify(queue)  # want to heapify by whatever getKey returns for each node
queue[0].setKey(1000)
heapq.heapify(queue)  # re-heapify with those new values
There are a variety of ways of making your code work. For instance, you could make your items orderable by implementing some of the rich comparison operator methods (and perhaps use functools.total_ordering to implement the rest):
import functools

@functools.total_ordering
class multiNode:
    def __init__(self, keyValue):
        self.__key = keyValue

    def setKey(self, keyValue):
        self.__key = keyValue

    def getKey(self):
        return self.__key

    def __eq__(self, other):
        if not isinstance(other, multiNode):
            return NotImplemented
        return self.__key == other.__key

    def __lt__(self, other):
        if not isinstance(other, multiNode):
            return NotImplemented
        return self.__key < other.__key
This will make your code work, but it may not be very efficient to reheapify your queue every time you make a change to a node within it, especially if there are a lot of nodes in the queue. A better approach might be to write some extra logic around the queue so that you can invalidate a queue entry without removing it or violating the heap property. Then, when you have an item you need to update, you just invalidate its old entry and add a new one with the new priority.
Here's a quick and dirty implementation that uses a dictionary to map from a node instance to a [priority, node] list. When a node gets its priority updated, the dictionary is checked and the node part of the list is set to None. Invalidated entries are ignored when popping nodes off the front of the queue.
queue = []
queue_register = {}

def add_to_queue(node):
    item = [node.getKey(), node]
    heapq.heappush(queue, item)
    queue_register[node] = item

def update_key_in_queue(node, new_key):
    queue_register[node][1] = None  # invalidate old item
    node.setKey(new_key)
    add_to_queue(node)

def pop_from_queue():
    node = None
    while node is None:
        _, node = heapq.heappop(queue)  # keep popping items until we find one that's valid
    del queue_register[node]  # clean up our bookkeeping record
    return node
You may want to test this against reheapifying to see which is faster for your program's actual usage of the queue.
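A quick usage sketch of those functions, reusing the multiNode class from the question:
a, b, c = multiNode(1), multiNode(2), multiNode(3)
for node in (a, b, c):
    add_to_queue(node)

update_key_in_queue(a, 1000)      # a's old entry is invalidated and a new one is pushed
print(pop_from_queue().getKey())  # 2 -- b now comes out first
print(pop_from_queue().getKey())  # 3
print(pop_from_queue().getKey())  # 1000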
A few final notes about your multiNode class (unrelated to what you were asking about in your question):
There are a number of things you're doing in the class that are not very Pythonic. To start with, the most common naming convention for Python uses CapitalizedNames for classes, and lower_case_names_with_underscores for almost everything else (variables of all kinds, functions, modules).
Another issue is the use of double leading underscores for __key. Double leading (and not trailing) underscores invokes Python's name mangling system. This may seem like it's intended as a way to make variables private, but it is not really. It's more intended to help prevent accidental name collisions, such as when you're setting an attribute in a proxy object (that otherwise mimics the attributes of some other object) or in a mixin class (which may be inherited by other types with unknown attributes). If code outside your class really wants to access the mangled attribute __key in your multiNode class, it can still do so by using _multiNode__key. To hint that something is intended to be a private attribute, you should just use a single underscore: _key.
And that brings me to my final issue: key probably shouldn't be private at all. It is not very Pythonic to use getX and setX methods to modify a private instance variable. It's much more common to document that the attribute is part of the class's public API and let other code access it directly. If you later decide you need to do something fancy whenever the attribute is looked up or modified, you can use a property descriptor to automatically transform attribute access into calls to a getter and setter function. Other programming languages usually start with getters and setters rather than public attributes because they have no comparable way of changing the implementation behind an attribute API later on. So anyway, I'd make your class's __init__ just set self.key = keyValue and get rid of setKey and getKey completely!
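For illustration, here is a minimal sketch of that later switch from a plain attribute to a property; the validation rule shown is just an invented example:
class MultiNode(object):
    def __init__(self, key):
        self.key = key  # goes through the property setter below

    @property
    def key(self):
        return self._key

    @key.setter
    def key(self, value):
        if value is None:  # invented validation rule, purely illustrative
            raise ValueError("key must not be None")
        self._key = value
Callers keep writing node.key and node.key = 5; only the class internals change.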
A crude way of doing what you're looking for would be to use dicts and Python's built-in id() function. This would basically allow you to keep your heap as a heap of the ids of the objects you create, and then update those objects by accessing them in a dict where their ids are the keys. I tried this on my local machine and it seems to do what you're looking for:
import heapq

class multiNode:
    def __init__(self, keyValue):
        self.__key = keyValue

    def setKey(self, keyValue):
        self.__key = keyValue

    def getKey(self):
        return self.__key

first_node = multiNode(1)
second_node = multiNode(2)
third_node = multiNode(3)
# add more nodes here

q = [id(first_node), id(second_node), id(third_node)]
multiNode_dict = {
    id(first_node): first_node,
    id(second_node): second_node,
    id(third_node): third_node
}

heapq.heapify(q)
multiNode_dict[q[0]].setKey(1000)
heapq.heapify(q)
heapify() won't really do too much here because the id of the object is going to be the same until it's deleted. It is more useful if you're adding new objects to the heap and taking objects out.

Create objects from dictionary description

The Python MongoDB driver, PyMongo, returns results as dictionaries. I'm trying to figure out the best way to use such a dictionary in an object constructor.
Keep the dictionary as an attribute
self.dict = building_dict
Then each property of the building would be reachable through building.dict["property"].
A better attribute name could be used, maybe even a one-letter one, but this doesn't look very elegant.
Parse dictionary to create attributes
self.name = building_dict['name']
self.construction_date = building_dict['construction_date']
...
In my model, the dictionaries can be pretty big but this task can be automated in the constructor to perform actions/checks on the values before or after the assignment.
Edit: The use of getters/setters is independent of options 1. and 2. above.
In solution 2., I'd avoid name collision between attributes and their getters by prefixing all dictionary keys by an underscore before making them attributes.
As a side-issue, the dictionary may contain the description of embedded documents, so the constructor should go through the whole dictionary to seek embedded documents that have their specific class in the code and instantiate those classes right away.
Update
I'll most probably use an ODM such as MongoEngine for my project and it will deal with those issues.
Outside of this specific use case (link with MongoDB, existing ODMs,...), the question is still relevant so I'm leaving below the best answer I could come up with.
The simplest thing you can do is create a class dynamically. You can build one from your dict with type() like this:
building_dict = {'property': 4, 'name': 'my name'}  # example dict
my_item = type('MyClass', (), building_dict)  # dynamically creates class MyClass with the dict entries as attributes
You can access it afterwards like every other object:
print(my_item.property)
# 4
print(my_item.name)
# my name
My favorite solution so far stores elements as attributes and uses getters/setters:
class Building(object):
    def __init__(self, dictionary):
        # Check the values
        # ...
        # Find sub-dictionaries describing instances of another class
        # stored as embedded documents in the base, call their
        # constructor on those sub-dictionaries, then replace each
        # sub-dictionary with the corresponding class instance.
        # ...
        # Set attributes from dictionary
        for key in dictionary:
            setattr(self, '_' + key, dictionary[key])
        # Add default values if needed, etc.
        # ...

    # Usual getter/setter stuff
    @property
    def name(self):
        try:
            return self._name
        except AttributeError as e:
            # deal with missing name
            raise
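A quick usage sketch of the constructor above (the field names are just examples):
doc = {'name': 'Town Hall', 'construction_date': '1901-05-12'}  # example document
b = Building(doc)
print(b.name)  # 'Town Hall', returned via the name property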

Is the way my class inherits list class methods Pythonically correct?

A little example will help clarify my question:
I define two classes: Security and Universe, which I would like to behave as a list of Security objects.
Here is my example code:
class Security(object):
    def __init__(self, name):
        self.name = name

class Universe(object):
    def __init__(self, securities):
        self.securities = securities

s1 = Security('name1')
s2 = Security('name2')
u = Universe([s1, s2])
I would like my Universe Class to be able to use usual list features such as enumerate(), len(), __getitem__()... :
enumerate(u)
len(u)
u[0]
So I defined my Class as:
class Universe(list, object):
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        self.securities = securities
It seems to work, but is it the appropriate pythonic way to do it ?
[EDIT]
The above solution does not work as I wish when I subset the list:
>>> s1 = Security('name1')
>>> s2 = Security('name2')
>>> s3 = Security('name3')
>>> u = Universe([s1, s2, s3])
>>> sub_u = u[0:2]
>>> type(u)
<class '__main__.Universe'>
>>> type(sub_u)
<type 'list'>
I would like my variable sub_u to remain of type Universe.
You don't have to actually be a list to use those features. That's the whole point of duck typing. Anything that defines __getitem__(self, i) automatically handles x[i], for i in x, iter(x), enumerate(x), and various other things. If you also define __len__(self), then len(x), list(x), etc. work as well. Or you can define __iter__ instead of __getitem__. Or both. It depends on exactly how list-y you want to be.
The documentation on Python's special methods explains what each one is for, and organizes them pretty nicely.
For example:
class FakeList(object):
    def __getitem__(self, i):
        return -i

fl = FakeList()
print(fl[20])
for i, e in enumerate(fl):
    print(i)
    if e < -2:
        break
No list in sight.
If you actually have a real list and want to represent its data as your own, there are two ways to do that: delegation, and inheritance. Both work, and both are appropriate in different cases.
If your object really is a list plus some extra stuff, use inheritance. If you find yourself stepping on the base class's behavior, you may want to switch to delegation anyway, but at least start with inheritance. This is easy:
class Universe(list):  # don't add object also, just list
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        # don't also store `securities`--you already have `self`!
You may also want to override __new__, which allows you to get the iter(securities) into the list at creation time rather than initialization time, but this doesn't usually matter for a list. (It's more important for immutable types like str.)
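For illustration, here is roughly what overriding __new__ looks like for an immutable base such as str (Ticker is a made-up example, not something the Universe class needs):
class Ticker(str):
    def __new__(cls, symbol):
        # the value must be supplied at creation time; it can't be set in __init__
        return super(Ticker, cls).__new__(cls, symbol.upper())

print(Ticker('aapl'))  # AAPL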
If the fact that your object owns a list rather than being one is inherent in its design, use delegation.
The simplest way to delegate is explicitly. Define the exact same methods you'd define to fake being a list, and make them all just forward to the list you own:
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)

    def __getitem__(self, index):
        return self.securities[index]  # or self.securities.__getitem__(index) if you prefer
    # ... etc.
You can also do delegation through __getattr__:
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)

    # no __getitem__, __len__, etc.
    def __getattr__(self, name):
        if name in ('__getitem__', '__len__',
                    # and so on
                    ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))
Note that many of list's methods will return a new list. If you want them to return a new Universe instead, you need to wrap those methods. But keep in mind that some of those methods are binary operators—for example, should a + b return a Universe only if a is one, only if both are, or if either is?
Also, __getitem__ is a little tricky, because it can return either a list or a single object, and you only want to wrap the former in a Universe. You can do that by checking the return value with isinstance(ret, list), or by checking the index with isinstance(index, slice); which one is appropriate depends on whether you can have lists as elements of a Universe, and whether they should be treated as a list or as a Universe when extracted. Plus, if you're using inheritance, in Python 2 you also need to wrap the deprecated __getslice__ and friends, because list does support them (although __getslice__ always returns a sub-list, not an element, so it's pretty easy).
Once you decide those things, the implementations are easy, if a bit tedious. Here are examples for all three versions, using __getitem__ because it's tricky, and the one you asked about in a comment. I'll show a way to use generic helpers for wrapping, even though in this case you may only need it for one method, so it may be overkill.
Inheritance:
class Universe(list):  # don't add object also, just list
    @classmethod
    def _wrap_if_needed(cls, value):
        if isinstance(value, list):
            return cls(value)
        else:
            return value

    def __getitem__(self, index):
        ret = super(Universe, self).__getitem__(index)
        return self._wrap_if_needed(ret)
Explicit delegation:
class Universe(object):
    # same _wrap_if_needed
    def __getitem__(self, index):
        ret = self.securities.__getitem__(index)
        return self._wrap_if_needed(ret)
Dynamic delegation:
import functools

class Universe(object):
    # same _wrap_if_needed
    @classmethod
    def _wrap_func(cls, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return cls._wrap_if_needed(func(*args, **kwargs))
        return wrapper

    def __getattr__(self, name):
        if name in ('__getitem__',):
            return self._wrap_func(getattr(self.securities, name))
        elif name in ('__len__',
                      # and so on
                      ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))
As I said, this may be overkill in this case, especially for the __getattr__ version. If you just want to override one method, like __getitem__, and delegate everything else, you can always define __getitem__ explicitly, and let __getattr__ handle everything else.
If you find yourself doing this kind of wrapping a lot, you can write a function that generates wrapper classes, or a class decorator that lets you write skeleton wrappers and fills in the details, etc. Because the details depend on your use case (all those issues I mentioned above that can go one way or the other), there's no one-size-fits-all library that just magically does what you want, but there are a number of recipes on ActiveState that show more complete details—and there are even a few wrappers in the standard library source.
That is a reasonable way to do it, although you don't need to inherit from both list and object. list alone is enough. Also, if your class is a list, you don't need to store self.securities; it will be stored as the contents of the list.
However, depending on what you want to use your class for, you may find it easier to define a class that stores a list internally (as you were storing self.securities), and then define methods on your class that (sometimes) pass through to the methods of this stored list, instead of inheriting from list. The Python builtin types don't define a rigorous interface in terms of which methods depend on which other ones (e.g., whether append depends on insert), so you can run into confusing behavior if you try to do any nontrivial manipulations of the contents of your list-class.
Edit: As you discovered, any operation that returns a new list falls into this category. If you subclass list without overriding its methods, then when you call methods on your object (explicitly or implicitly), the underlying list methods will be called. These methods are hardcoded to return a plain Python list and do not check what the actual class of the object is, so they will return a plain Python list.

Lookup table for unhashable in Python

I need to create a mapping from objects of my own custom class (derived from dict) to objects of another custom class. As I see it there are two ways of doing this:
I can make the objects hashable. I'm not sure how I would do this. I know I can implement __hash__() but I'm unsure how to actually calculate the hash (which should be an integer).
Since my objects can be compared I can make a list [(myobj, myotherobj)] and then implement a lookup which finds the tuple where the first item in the tuple is the same as the lookup key. Implementing this is trivial (the number of objects is small) but I want to avoid reinventing the wheel if something like this already exists in the standard library.
It seems to me that wanting to look up unhashables would be a common problem, so I assume someone has already solved it. Any suggestions on how to implement __hash__() for a dict-like object, or is there some other standard way of making lookup tables of unhashables?
Mappings with mutable objects as keys are generally difficult. Is that really what you want? If you consider your objects to be immutable (there is no way to really enforce immutability in Python), or you know they will not be changed while they are used as keys in a mapping, you can implement your own hash function for them in several ways. For instance, if your object only has hashable data members, you can return the hash of a tuple of all data members as the object's hash.
If your object is dict-like, you can use the hash of a frozenset of all key-value pairs:
def __hash__(self):
    return hash(frozenset(self.iteritems()))
This only works if all values are hashable. In order to save recalculations of the hashes (which would be done on every lookup), you can cache the hash-value and just recalculate it if some dirty-flag is set.
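A minimal sketch of that caching idea; it assumes all mutation goes through __setitem__ (other mutating methods would need the same dirty-flag treatment) and is only meant to illustrate the pattern:
class CachedHashDict(dict):
    def __init__(self, *args, **kwargs):
        super(CachedHashDict, self).__init__(*args, **kwargs)
        self._dirty = True
        self._cached_hash = None

    def __setitem__(self, key, value):
        super(CachedHashDict, self).__setitem__(key, value)
        self._dirty = True  # mark the cached hash as stale

    def __hash__(self):
        if self._dirty:
            self._cached_hash = hash(frozenset(self.items()))
            self._dirty = False
        return self._cached_hash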
A simple solution seems to be to do lookup[id(myobj)] = myotherobj instead of lookup[myobj] = myotherobj. Any comments on this approach?
The following should work if you're not storing any additional unhashable objects in your custom class:
def __hash__(self):
    return hash(frozenset(self.items()))  # items() itself isn't hashable, so wrap it
Here is an implementation of a frozendict, taken from http://code.activestate.com/recipes/414283/:
class frozendict(dict):
    def _blocked_attribute(obj):
        raise AttributeError("A frozendict cannot be modified.")
    _blocked_attribute = property(_blocked_attribute)

    __delitem__ = __setitem__ = clear = _blocked_attribute
    pop = popitem = setdefault = update = _blocked_attribute

    def __new__(cls, *args):
        new = dict.__new__(cls)
        dict.__init__(new, *args)
        return new

    def __init__(self, *args):
        pass

    def __hash__(self):
        try:
            return self._cached_hash
        except AttributeError:
            h = self._cached_hash = hash(tuple(sorted(self.items())))
            return h

    def __repr__(self):
        return "frozendict(%s)" % dict.__repr__(self)
I would replace tuple(sorted(self.items())) by frozenset(self.iteritems()) as in Spacecowboy's answer. And consider adding __slots__ = ("_cached_hash",) to the class.
