I'm trying to create a custom hashing function for strings. I want to hash strings by their character frequencies, so that 'hi' and 'ih' yield the same hash. Can I override __hash__?
Or is creating a wrapper class that holds the string and overrides __hash__ and __eq__ the only way?
You want a derived type with different equality semantics. The usual approach is to define how equality works first, then build the hash method from the same derived structure, since it's necessary that the hash agree with equality. That might be:
import collections

class FrequencyString(str):

    @property
    def normalized(self):
        try:
            return self._normalized
        except AttributeError:
            self._normalized = normalized = ''.join(sorted(collections.Counter(self).elements()))
            return normalized

    def __eq__(self, other):
        return self.normalized == other.normalized

    def __hash__(self):
        return hash(self.normalized)
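A quick usage sketch (my addition, not part of the original answer): anagrams compare equal and hash identically, so they collapse to a single dict key:

a = FrequencyString('hi')
b = FrequencyString('ih')
assert a == b and hash(a) == hash(b)
counts = {a: 1}
counts[b] += 1           # same key, because hash and equality agree
print(counts)            # {'hi': 2}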
Your assumption is right: you cannot patch the built-in classes in Python. Although you can, of course, override what the name str() refers to, that won't affect string literals.
If you are writing code for pre-Python 2.2, look at the UserString class if you want to create your own: http://docs.python.org/2/library/userdict.html#module-UserString
Otherwise you can simply inherit from str or unicode.
In your case, overriding the __hash__ method makes anagrams land in the same hash bucket, but note that dict lookup also compares keys for equality, so you would have to override __eq__ (or __cmp__ in Python 2) as well for 'hi' and 'ih' to be treated as the same key.
You can inherit from str, but since those are immutable you have to subclass them in a slightly different way. Most likely you will want to create new ones from existing strings, so you must also override the __new__ method. You may also have to put in extra special methods to defeat the optimizations that Python does.
Here is an example of subclassing built-in str, the mapstr object that allows easy substitution placeholders in forms.
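The mapstr example isn't reproduced here, but as a minimal sketch of the __new__ pattern for an immutable subclass (the class and attribute names are my own invention):

class Tagged(str):
    def __new__(cls, value, tag=None):
        # str is immutable, so the value must be fixed in __new__, not __init__
        self = super().__new__(cls, value)
        self.tag = tag
        return self

t = Tagged('hello', tag='greeting')
print(t, t.tag)   # hello greeting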
Related
I’m building a class that extends the list data structure in Python, called a Partitional. I’m adding a few methods that I find myself using frequently when dividing a list into partitions.
The class is initialized with a (nullable) list, which exists as an attribute on the class.
class Partitional(list):
    """Extends the list data type. Adds methods for dividing a list into partition sets
    and returning data about those partition sets"""

    def __init__(self, source_list: list = []):
        super().__init__()
        self.source_list: list = source_list
        self.n: int = len(source_list)
    ...
I want to be able to reliably replace list instances with Partitional instances without violating Liskov substitution. So for list’s methods, I wrote methods on the Partitional class that operate on self.source_list, e.g.
...
def remove(self, matched_item):
    self.source_list.remove(matched_item)
    self.__init__(self.source_list)

def pop(self, *args):
    popped_item = self.source_list.pop(*args)
    self.__init__(self.source_list)
    return popped_item

def clear(self):
    self.source_list.clear()
    self.__init__(self.source_list)
...
(the __init__ call is there because the Partitional class builds some internal attributes based on self.source_list when it’s initialized, so these need to be rebuilt if source_list changes.)
And I also want Python’s built-in methods that take a list as an argument to work with a Partitional instance, so I set to work writing method overrides for those as well, e.g.
...
def __len__(self):
    return len(self.source_list)

def __enumerate__(self):
    return enumerate(self.source_list)
...
The relevant built-in methods are a finite set for any given Python version, but... is there not a simpler way to do this?
My question:
Is there a way to write a class such that, if an instance of that class is used as the argument for a function, the class provides an attribute to the function instead, by default?
That way I’d only need to override this default behaviour for a subset of built-in methods.
So for example, if a use case involving a list instance looks like this:
example_list: list = [1,2,3,4,5]
length = len(example_list)
we substitute a Partitional instance built from the same list:
example_list: list = [1,2,3,4,5]
example_partitional = Partitional(example_list)
length = len(example_partitional)
and what’s “actually” happening is this:
length = len(example_partitional.source_list)
i.e.
length = len([1,2,3,4,5])
Other notes:
In working on this, I’ve realized that there are two broad categories of Liskov substitution violation possible:
Inherent violation, where the structure of the child class will make it incompatible with any use case where the child class is used in place of the parent class, e.g. if you override some fundamental property or structure of the parent.
Context-dependent violation, where, for a given piece of software, you're fine so long as you never use the child class in a way that would violate Liskov substitution. E.g. you override a method on the parent class that changes how a built-in function acts when it takes an instance of the class as an argument, but you never use that built-in function with the class instance in your system. Or in any system that depends on your system. Or... (you can see how relying on this caveat is not foolproof)
What I’m looking to do is come up with a technique that will protect against both categories of violation, without having to worry about use cases and context.
I know that what I'm doing is probably not the best way to do it, but right now I can't think of another way.
What I basically have is this:
class foo:
    def __init__(self):
        self.bar = ['bundy']
Currently, I'm defining a lot of methods for my class to return the result of the method on the list, like this:
def __len__(self):
    return len(self.bar)
Of course, there are also other methods and attributes that have nothing to do with bar - I'm not reinventing list.
Is there an easier way to 'copy' the methods, so that I don't have to define them all one by one?
You have to define some methods one by one, like you are doing.
However, there is a base class in Python, other than list, that gives you a well-defined minimum set of methods you need to supply, and fills in the remaining methods based on that set.
These are the provided "abstract base classes" - what you want is to implement your object as a "MutableSequence" - then you only have to implement __getitem__, __setitem__, __delitem__, __len__ and insert to get the full list functionality.
In Python 3.x, just inherit your class from collections.abc.MutableSequence and implement those. (In Python 2.7 it is collections.MutableSequence instead.)
By doing this, you will get for free __contains__, __iter__, __reversed__, index, count, append, reverse, extend, pop, remove and __iadd__ methods.
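A minimal sketch of that approach, reusing the foo/bar names from the question (Python 3):

from collections.abc import MutableSequence

class Foo(MutableSequence):
    def __init__(self, initial=None):
        self.bar = list(initial) if initial is not None else ['bundy']

    # The five methods MutableSequence requires:
    def __getitem__(self, index):
        return self.bar[index]

    def __setitem__(self, index, value):
        self.bar[index] = value

    def __delitem__(self, index):
        del self.bar[index]

    def __len__(self):
        return len(self.bar)

    def insert(self, index, value):
        self.bar.insert(index, value)

f = Foo()
f.append('another')       # provided for free by MutableSequence
print(len(f), list(f))    # 2 ['bundy', 'another']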
I've created a class that wraps a tuple, and tuples don't support item mutation.
Should I leave __setitem__ and __delitem__ unimplemented, or implement those methods like e.g. below (and thus fall into a kind of Refused Bequest code smell)? Which approach is more Pythonic? Aren't custom exceptions better in such a case?
def __setitem__(self, key, value):
    """
    :raise: Always.
    :raises: TypeError
    """
    self.data_set[key] = value  # Raises TypeError from the underlying tuple.

def __delitem__(self, key):
    """
    :raise: Always.
    :raises: TypeError
    """
    raise TypeError("Item deletion is unsupported")  # Custom exception raised explicitly.
If your class is supposed to be a proper tuple subtype (according to the Liskov substitution principle), then it should behave the same way as a tuple with respect to set/del - which, as Guillaume mentions, is the default behaviour if you just define neither __setitem__ nor __delitem__. I don't see how that would fall into the "Refused Bequest" category.
If your class uses a tuple as part of its implementation but is NOT supposed to be a proper tuple subtype, then do whatever makes sense - but if you don't want to allow item assignment/deletion, then again the simplest thing is to not implement them.
Although this is a matter of taste, I think you should not implement them at all. A class that has __setitem__ and __delitem__ implements the mutable collection protocol (either implicitly, or even explicitly by using the collection abstract base classes). Your class just does not support this interface, that's it, and the user has neither reason nor right to assume it does.
Implement one or the other or both if they make sense for your custom class.
If you implement __setitem__() you will be able to use yourobject[yourindex] = yourvalue syntax in your code (with the semantic that you choose to implement).
If you implement __delitem__() you will be able to use the del yourobject[yourindex] syntax.
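As a hypothetical sketch of both hooks with semantics of your choosing (the Record class is my own illustration):

class Record:
    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):   # enables: record[key] = value
        self._data[key] = value

    def __delitem__(self, key):          # enables: del record[key]
        del self._data[key]

r = Record()
r['a'] = 1
del r['a']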
It makes no sense to explicitly implement a method just to raise an exception; Python will do it by default:
class Test(object):
    pass

test = Test()
test['foo'] = 'bar'  # will call Test.__setitem__(), which is not explicitly defined
will give TypeError: 'Test' object does not support item assignment
I have several classes where I want to add a single property to each class (its md5 hash value) and calculate that hash value when initializing objects of that class, but otherwise maintain everything else about the class. Is there any more elegant way to do that in Python than to create a subclass of each class where I want to change the initialization and add the property?
You can add properties and override __init__ dynamically:
def newinit(self, orig):
    orig(self)
    self._md5 = ...  # calculate the md5 here

_orig_init = A.__init__
A.__init__ = lambda self: newinit(self, _orig_init)
A.md5 = property(lambda self: self._md5)
However, this can get quite confusing, even once you use more descriptive names than I did above. So I don't really recommend it.
Cleaner would probably be to simply subclass, possibly using a mixin class if you need to do this for multiple classes. You could also consider creating the subclasses dynamically using type() to cut down on the boilerplate further, but clarity of code would be my first concern.
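To illustrate the mixin route, here is a hedged sketch (all names are hypothetical, and hashing repr(self) is just a placeholder for whatever bytes make sense for your classes):

import hashlib

class Md5Mixin:
    """Computes an md5 hash when the object is initialized."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumption: hash the repr; substitute your real data here.
        self._md5 = hashlib.md5(repr(self).encode()).hexdigest()

    @property
    def md5(self):
        return self._md5

class Original:
    def __repr__(self):
        return "Original()"

class HashedOriginal(Md5Mixin, Original):
    pass

print(HashedOriginal().md5)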
I want to do something like this:
class Dictable:
    def dict(self):
        raise NotImplementedError

class Foo(Dictable):
    def dict(self):
        return {'bar1': self.bar1, 'bar2': self.bar2}
Is there a more pythonic way to do this? For example, is it possible to overload the built-in conversion dict(...)? Note that I don't necessarily want to return all the member variables of Foo, I'd rather have each class decide what to return.
Thanks.
The Pythonic way depends on what you want to do. If your objects shouldn't be regarded as mappings in their own right, then a dict method is perfectly fine, but you shouldn't "overload" dict to handle dictables. Whether or not you need the base class depends on whether you want to do isinstance(x, Dictable); note that hasattr(x, "dict") would serve pretty much the same purpose.
If the classes are conceptually mappings of keys to values, then implementing the Mapping protocol seems appropriate. I.e., you'd implement
__getitem__
__iter__
__len__
and inherit from collections.Mapping to get the other methods. Then you get dict(Foo()) for free. Example:
from collections.abc import Mapping  # collections.Mapping on Python 2

class Foo(Mapping):
    def __getitem__(self, key):
        if key not in ("bar1", "bar2"):
            raise KeyError("{} not found".format(repr(key)))
        return getattr(self, key)

    def __iter__(self):
        yield "bar1"
        yield "bar2"

    def __len__(self):
        return 2
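A hypothetical usage note (bar1 and bar2 here are plain attributes set on the instance):

foo = Foo()
foo.bar1, foo.bar2 = 1, 2
print(dict(foo))   # {'bar1': 1, 'bar2': 2}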
Firstly, look at collections.abc, which supplies Python's abstract base classes (the equivalent of interfaces in static languages).
Then, decide if you want to write your own ABC or make use of an existing one; in this case, Mapping might be what you want.
Note that although the dict constructor (i.e. dict(my_object)) is not overridable, if it encounters an iterable object that yields a sequence of key-value pairs, it will construct a dict from that; i.e. (Python 2; for Python 3 replace iteritems with items):
def __iter__(self):
    return {'bar1': self.bar1, 'bar2': self.bar2}.iteritems()
However, if your classes are intended to behave like a dict you shouldn't do this as it's different from the expected behaviour of a Mapping instance, which is to iterate over keys, not key-value pairs. In particular it would cause for .. in to behave incorrectly.
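For completeness, a minimal Python 3 sketch of that pairs trick (my example; note the caveat above about Mapping iteration semantics):

class Foo:
    def __init__(self):
        self.bar1, self.bar2 = 1, 2

    def __iter__(self):
        # Yield key-value pairs so dict() can consume them.
        return iter({'bar1': self.bar1, 'bar2': self.bar2}.items())

print(dict(Foo()))   # {'bar1': 1, 'bar2': 2}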
Most of the answers here are about making your class behave like a dict, which isn't actually what you asked. If you want to express the idea, "I am a class that can be turned into a dict," I would simply define a bunch of classes and have them each implement .dict(). Python favors duck-typing (what an object can do) over what an object is. The ABC doesn't add much. Documentation serves the same purpose.
You can certainly overload dict(), but you almost never want to! Too many parts of the standard library depend on dict being available, and you will break most of its functionality. You can probably do something like this, though:
class Dictable:
    def dict(self):
        return self.__dict__
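Hypothetical usage of that fallback:

class Foo(Dictable):
    def __init__(self):
        self.bar1, self.bar2 = 1, 2

print(Foo().dict())   # {'bar1': 1, 'bar2': 2}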