Python typed collections

I am new to python (using python 3.6).
I have some class that represents amounts of some fictional coins.
So an instance could represent say 10 bluecoins or negative sums such as -20 redcoins and so on.
I can now hold several such CoinAmounts in a list.
e.g.
[CoinAmount(coin='blue',amount=-10), CoinAmount(coin='blue',amount=20),
CoinAmount(coin='red',amount=5), CoinAmount(coin='red',amount=-5),
CoinAmount(coin='green',amount=5)]
I want to be able to "compress" the above list by summing each type of coin, so that I will have:
[CoinAmount(coin='blue',amount=10), CoinAmount(coin='green',amount=5)]
or
[CoinAmount(coin='blue',amount=10), CoinAmount(coin='red',amount=0), CoinAmount(coin='green',amount=5)]
from which it is easy to derive the former...
My Q's are:
1) Would it make sense to have some sort of a ListOfCoinAmounts that subclasses list and adds a compress method, or should I use some CoinAmountUtils class that has a static method that works on a list and compresses it?
2) Is there a way to ensure that the list actually holds only CoinAmounts, or should this just be assumed and followed (or both, i.e. it can be done but shouldn't be)?
3) In a more general way what is the best practice "pythonic" way to handle a "List of something specific"?

Inheritance - when not used for typing - is mostly a very restricted form of composition / delegation, so inheriting from list is IMHO a bad design.
Having some CoinContainer class that delegates to a list is a much better design, in that 1/ it gives you full control of the API and 2/ it lets you change the implementation as you want (you may find out that a list is not the best container for your needs).
It will also be easier to implement, since you don't have to make sure you override all of the list methods and magic methods, only the ones you need (cf. point #1).
Wrt type-checking, it's usually not considered pythonic - it's the client code's responsibility to make sure it only passes compatible objects. If you really want some type-checking here, at least use an ABC and test against this ABC, not against a fixed type.
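To illustrate the delegation approach, here is a minimal sketch, assuming CoinAmount is a simple named tuple with the coin and amount fields from the question (the container's class and method names here are hypothetical):

from collections import namedtuple

CoinAmount = namedtuple('CoinAmount', ['coin', 'amount'])

class CoinContainer:
    """Holds CoinAmounts; storage is delegated to an internal list."""
    def __init__(self, coin_amounts=()):
        self._items = list(coin_amounts)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, coin_amount):
        self._items.append(coin_amount)

    def compress(self):
        """Return a new CoinContainer with one summed entry per coin."""
        totals = {}
        for item in self._items:
            totals[item.coin] = totals.get(item.coin, 0) + item.amount
        return CoinContainer(CoinAmount(coin, amount)
                             for coin, amount in totals.items())

Dropping zero totals (to get the shorter "compressed" form from the question) is then a one-line filter over totals.items().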

1) Subclassing list and having only CoinAmount-type elements in it is a good and cleaner method IMO.
2) Yes, that can be done. You can inherit from the Python list and override the append method to check for types (a sketch follows after point 3).
A good example here: Overriding append method after inheriting from a Python List
3) A good practice is indeed extending the list and putting your customizations.
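A minimal sketch of that approach (CoinAmount as defined in the question); note that append alone doesn't guard extend, insert, __setitem__ or slice assignment, which would need similar overrides for full enforcement:

from collections import namedtuple

CoinAmount = namedtuple('CoinAmount', ['coin', 'amount'])

class CoinAmountList(list):
    """A list that type-checks elements added via append."""
    def append(self, item):
        if not isinstance(item, CoinAmount):
            raise TypeError("expected CoinAmount, got %s" % type(item).__name__)
        super().append(item)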

Related

Executing transform function on parameter dict when creating new TransformDict

I've read about this cool new dictionary type, the TransformDict.
I want to use it in my project, by initializing a new transform dict with a regular dict:
tran_d = TransformDict(str.lower, {'A':1, 'B':2})
which succeeds but when I run this:
tran_d.keys()
I get:
['A', 'B']
How would you suggest to execute the transform function on the parameter (regular) dict when creating the new transform dict?
Just to be clear I want the following:
tran_d.keys() == ['a', 'b']
I already said it in the comments but it's important to realize that this is not what TransformDict is meant to do. Therefore you could subclass it with a custom implementation for keys:
class MyTransformDict(TransformDict):
    def keys(self):
        return map(self.transform_func, super().keys())
Depending on your Python version you probably need to use list() around the map (Python 3) or provide arguments for super: super(TransformDict, self) (Python 2). But it should illustrate the principle.
As @Rawing pointed out in the comments, there will be more methods that don't work as expected, e.g. __iter__, items and probably also __repr__.
Per the implementation I have seen, the transformation function can be achieved through a property named transform_func, so
list(map(tran_d.transform_func, tran_d.keys()))
should do.
I wouldn't bother using TransformDict. It has been proposed as PEP 455 and been rejected. This means it won't be a built-in feature, so you'd have to manually implement it on your own or use some library that does it.
The BDFL delegate's conclusions about the PEP can be found here. The stripped down version is:
It is less readable than converting keys before usage.
It breaks in strange ways that sometimes even emit wrong errors.
It introduces unneeded complexity, since using plain dicts avoids above problems.
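To illustrate the first point: converting keys before usage is a one-line dict comprehension, e.g.

raw = {'A': 1, 'B': 2}
normalized = {key.lower(): value for key, value in raw.items()}
assert normalized == {'a': 1, 'b': 2}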
In addition to @Ronan-Paixão's answer:
TransformDict was a hypergeneralization which sprang up out of wanting case-folding keys, but with no rigorous research into what real world users might need the generalization for --- meaning the user expectations of what it should do were not well thought through, as the original question illustrates.
A recommendation is to implement your own dictionary subclass, to fit your own use case, as other answers have suggested.
So rather than suggesting "do not use TransformDict" I would suggest, "build your own, but give your class a better, more descriptive name"; then you'll know what it does, will have it quarantined, and won't encourage bad stuff in the repos.
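As a sketch of that advice (the class name is made up, and this is deliberately incomplete: dict.__init__, update, setdefault and get bypass __setitem__ and would need the same treatment):

class CaseFoldingDict(dict):
    """A dict that lower-cases string keys on the way in and out."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

d = CaseFoldingDict()
d['A'] = 1
assert list(d.keys()) == ['a']    # keys are stored already folded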
A good reference in addition to the PEP 455 is Hettinger's presentation: http://il.pycon.org/2016/static/sessions/raymond-hettinger.pdf

Versioned list instead of immutable list?

Update:
As of CPython 3.6, dictionaries have a version (thank you pylang for showing this to me).
If they added the same version to list and made it public, all 3 asserts from my original post would pass! It would definitely meet my needs. Their implementation differs from what I envisioned, but I like it.
As it is, I don't feel I can use dictionary version:
It isn't public. Jake Vanderplas shows how to expose it in a post, but he cautions that it is "definitely not code you should use for any purpose beyond simply having fun." I agree with his reasons.
In all of my use cases, the data is conceptually arrays of elements each of which has the same structure. A list of tuples is a natural fit. Using a dictionary would make the code less natural and probably more cumbersome.
Does anyone know if there are plans to add version to list?
Are there plans to make it public?
If there are plans to add version to list and make it public, I would feel awkward putting forward an incompatible VersionedList now. I would just implement the bare minimum I need and get by.
Original post below
Turns out that many of the times I wanted an immutable list, a VersionedList would have worked almost as well (sometimes even better).
Has anyone implemented a versioned list?
Is there a better, more Pythonic, concept that meets my needs? (See motivation below.)
What I mean by a versioned list is:
A class that behaves like a list
Any change to an instance or elements in the instance results in instance.version() being updated. So, if alist is a normal list:
a = VersionedList(alist)
a_version = a.version()
change(a)
assert a_version != a.version()
reverse_last_change(a)
If a list were hashable, hash() would achieve the above and meet all the needs identified in the motivation below. We need to define version() in a way that doesn't have all of the same problems as hash().
If identical data in two lists is highly unlikely to ever happen except at initialization, we aren't going to have a reason to test for deep equality. From the data model docs (https://docs.python.org/3.5/reference/datamodel.html#object.hash): "The only required property is that objects which compare equal have the same hash value." If we don't impose this requirement on version(), it seems likely that version() won't have all of the same problems that make lists unhashable. So unlike hash, identical contents doesn't mean the same version:
#contents of 'a' are now identical to original, but...
assert a_version != a.version()
b = VersionedList(alist)
c = VersionedList(alist)
assert b.version() != c.version()
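A minimal sketch of these semantics (entirely hypothetical; it only catches mutations made through the wrapper itself, which is why the element-immutability discussion below matters):

import itertools

_instance_ids = itertools.count()

class VersionedList:
    """List wrapper whose version() changes on every mutation."""
    def __init__(self, iterable=()):
        self._items = list(iterable)
        # Seed with a per-instance id so two instances built from
        # identical data still report different versions.
        self._version = (next(_instance_ids), 0)

    def version(self):
        return self._version

    def _bump(self):
        self._version = (self._version[0], self._version[1] + 1)

    def append(self, item):
        self._items.append(item)
        self._bump()

    def __setitem__(self, index, value):
        self._items[index] = value
        self._bump()

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)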
For VersionedList, it would be good if any attempt to modify the result of __getitem__ automatically resulted in a copy instead of modifying the underlying implementation data. I think that the only other option would be to have __getitem__ always return a copy of the elements, and this would be very inefficient for all of the use cases I can think of. I think we need to restrict the elements to immutable objects (deeply immutable, for example: exclude tuples with list elements). I can think of 3 ways to achieve this:
Only allow elements that can't contain mutable elements (int, str, etc are fine, but exclude tuples). (This is far too limiting for my cases)
Add code to __init__, __setitem__, etc. to traverse inputs and deeply check for mutable sub-elements. (expensive; any way to avoid this?)
Also allow more complex elements, but require that they are deeply immutable. Perhaps require that they expose a deeply_immutable attribute. (This turns out to be easy for all the use cases I have)
Motivation:
If I am analyzing a dataset, I often have to perform multiple steps that return large datasets (note: since the dataset is ordered, it is best represented by a List not a set).
If at the end of several steps (ex: 5) it turns out that I need to perform different analysis (ex: back at step 4), I want to know that the dataset from step 3 hasn't accidentally been changed. That way I can start at step 4 instead of repeating steps 1-3.
I have functions (control-points, first-derivative, second-derivative, offset, outline, etc) that depend on and return array-valued objects (in the linear algebra sense). The base 'array' is knots.
control-points() depends on: knots, algorithm_enum
first-derivative() depends on: control-points(), knots
offset() depends on: first-derivative(), control-points(), knots, offset_distance
outline() depends on: offset(), end_type_enum
If offset_distance changes, I want to avoid having to recalculate first-derivative() and control-points(). To avoid recalculation, I need to know that nothing has accidentally changed the resultant 'arrays'.
If 'knots' changes, I need to recalculate everything and not depend on the previous resultant 'arrays'.
To achieve this, knots and all of the 'array-valued' objects could be VersionedList.
FYI: I had hoped to take advantage of an efficient class like numpy.ndarray. In most of my use cases, the elements logically have structure. Having to mentally keep track of multi-dimensions of indexes meant implementing and debugging the algorithms was many times more difficult with ndarray. An implementation based on lists of namedtuples of namedtuples turned out to be much more sustainable.
Private dicts in 3.6
In Python 3.6, dictionaries track a private version (PEP 509) and are compact (issue 27350), which lets them detect updates and preserve insertion order respectively. These features hold in the CPython 3.6 implementation. Although it takes some work, Jake VanderPlas demonstrates in his blog post how to expose this versioning feature from CPython within normal Python. We can use his approach to:
determine when a dictionary has been updated
preserve the order
Example
import numpy as np

d = {"a": np.array([1, 2, 3]),
     "c": np.array([1, 2, 3]),
     "b": np.array([8, 9, 10]),
    }

for i in range(3):
    print(d.get_version())              # monkey-patched method (see Details)
# 524938
# 524938
# 524938
Notice the version number does not change until the dictionary is updated, as shown below:
d.update({"c": np.array([10, 11, 12])})
d.get_version()
# 534448
In addition, the insertion order is preserved (the following was tested in restarted sessions of Python 3.5 and 3.6):
list(d.keys())
# ['a', 'c', 'b']
You may be able to take advantage of this new dictionary behavior, saving you from implementing a new datatype.
Details
For those interested, the latter get_version() is a monkey-patched method for any dictionary, implemented in Python 3.6 using the following modified code derived from Jake VanderPlas' blog post. This code was run prior to calling get_version().
import types
import ctypes
import sys

assert (3, 6) <= sys.version_info < (3, 7)  # valid only in Python 3.6

py_ssize_t = ctypes.c_ssize_t

# Emulate the PyObjectStruct from CPython
class PyObjectStruct(ctypes.Structure):
    _fields_ = [('ob_refcnt', py_ssize_t),
                ('ob_type', ctypes.c_void_p)]

# Create a DictStruct class to wrap existing dictionaries
class DictStruct(PyObjectStruct):
    _fields_ = [("ma_used", py_ssize_t),
                ("ma_version_tag", ctypes.c_uint64),
                ("ma_keys", ctypes.c_void_p),
                ("ma_values", ctypes.c_void_p),
                ]

    def __repr__(self):
        return (f"DictStruct(size={self.ma_used}, "
                f"refcount={self.ob_refcnt}, "
                f"version={self.ma_version_tag})")

    @classmethod
    def wrap(cls, obj):
        assert isinstance(obj, dict)
        return cls.from_address(id(obj))

assert object.__basicsize__ == ctypes.sizeof(PyObjectStruct)
assert dict.__basicsize__ == ctypes.sizeof(DictStruct)

# Code for monkey-patching existing dictionaries
class MappingProxyStruct(PyObjectStruct):
    _fields_ = [("mapping", ctypes.POINTER(DictStruct))]

    @classmethod
    def wrap(cls, D):
        assert isinstance(D, types.MappingProxyType)
        return cls.from_address(id(D))

assert types.MappingProxyType.__basicsize__ == ctypes.sizeof(MappingProxyStruct)

def mappingproxy_setitem(obj, key, val):
    """Set an item in a read-only mapping proxy"""
    proxy = MappingProxyStruct.wrap(obj)
    ctypes.pythonapi.PyDict_SetItem(proxy.mapping,
                                    ctypes.py_object(key),
                                    ctypes.py_object(val))

mappingproxy_setitem(dict.__dict__,
                     'get_version',
                     lambda self: DictStruct.wrap(self).ma_version_tag)

Duck typing test for "i-am-like-a-list"

USAGE CONTEXT ADDED AT END
I often want to operate on an abstract object like a list. e.g.
def list_ish(thing):
    for i in xrange(0, len(thing)):
        print thing[i]
Now this is appropriate if thing is a list, but it will fail if thing is a dict, for example. What is the pythonic way to ask "do you behave like a list?"
NOTE:
hasattr(thing, '__getitem__') and not hasattr(thing, 'keys')
this will work for all cases I can think of, but I don't like defining a duck type negatively, as I expect there could be cases that it does not catch.
Really what I want is to ask:
"hey, do you operate on integer indices in the way I expect a list to do?" e.g.
thing[i], thing[4:7] = [...], etc.
NOTE: I do not want to simply execute my operations inside of a large try/except, since they are destructive. It is not cool to try and fail here...
USAGE CONTEXT
-- A "point-lists" is a list-like-thing that contains dict-like-things as its elements.
-- A "matrix" is a list-like-thing that contains list-like-things
-- I have a library of functions that operate on point-lists and also in an analogous way on matrix like things.
-- for example, From the users point of view destructive operations like the "spreadsheet-like" operations "column-slice" can operate on both matrix objects and also on point-list objects in an analogous way -- the resulting thing is like the original one, but only has the specified columns.
-- since this particular operation is destructive it would not be cool to proceed as if an object were a matrix, only to find out part way thru the operation, it was really a point-list or none-of-the-above.
-- I want my 'is_matrix' and 'is_point_list' tests to be performant, since they sometimes occur inside inner loops. So I would be satisfied with a test which only investigated element zero for example.
-- I would prefer tests that do not involve construction of temporary objects, just to determine an object's type, but maybe that is not the python way.
In general I find the whole duck typing thing to be kinda messy and fraught with bugs and slowness, but maybe I don't yet think like a true Pythonista.
Happy to drink more kool-aid...
One thing you can do, that should work quickly on a normal list and fail on a normal dict, is taking a zero-length slice from the front:
try:
    thing[:0]
except TypeError:
    pass  # probably not list-like
else:
    pass  # probably list-like
The slice fails on dicts because slices are not hashable.
However, str and unicode also pass this test, and you mention that you are doing destructive edits. That means you probably also want to check for __delitem__ and __setitem__:
def supports_slices_and_editing(thing):
    if hasattr(thing, '__setitem__') and hasattr(thing, '__delitem__'):
        try:
            thing[:0]
            return True
        except TypeError:
            pass
    return False
I suggest you organize the requirements you have for your input, and the range of possible inputs you want your function to handle, more explicitly than you have so far in your question. If you really just wanted to handle lists and dicts, you'd be using isinstance, right? Maybe what your method does could only ever delete items, or only ever replace items, so you don't need to check for the other capability. Document these requirements for future reference.
When dealing with built-in types, you can use the Abstract Base Classes. In your case, you may want to test against collections.Sequence or collections.MutableSequence:
if isinstance(your_thing, collections.Sequence):
    ...  # access your_thing as a list
This is supported in all Python versions after (and including) 2.6.
If you are using your own classes to build your_thing, I'd recommend that you inherit from these abstract base classes as well (directly or indirectly). This way, you can ensure that the sequence interface is implemented correctly, and avoid all the typing mess.
And for third-party libraries, there's no simple way to check for a sequence interface, if the third-party classes didn't inherit from the built-in types or abstract classes. In this case you'll have to check for every interface that you're going to use, and only those you use. For example, your list_ish function used __len__ and __getitem__, so only check whether these two methods exist. A wrong behavior of __getitem__ (e.g. a dict) should raise an exception.
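For instance, a check tailored to the list_ish function above would test only the two methods it actually uses (a sketch):

def supports_list_ish(thing):
    """list_ish only needs len() and integer indexing."""
    return hasattr(thing, '__len__') and hasattr(thing, '__getitem__')

Note that a dict also has both attributes, so this check alone doesn't exclude mappings; per the answer above, wrong __getitem__ behavior would then surface as an exception.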
Perhaps there is no ideal pythonic answer here, so I am proposing a 'hack' solution, but I don't know enough about the class structure of Python to know if I am getting this right:
def is_list_like(thing):
    return hasattr(thing, '__setslice__')

def is_dict_like(thing):
    return hasattr(thing, 'keys')
My reduced goals here are to simply have performant tests that will:
(1) never call a dict-thing or a string-like-thing a list
(2) return the right answer for built-in Python types
(3) return the right answer if someone implements a "full" set of core methods for a list/dict
(4) be fast (ideally not allocating objects during the test)
EDIT: Incorporated ideas from @DanGetz

list vs UserList and dict vs UserDict

These days, which of the above is preferred and recommended (in both Python 2 and 3) for subclassing?
I read that UserList and UserDict have been introduced because in the past list and dict couldn't be subclassed, but since this isn't an issue anymore, is it encouraged to use them?
Depending on your use case, these days you'd either subclass list and dict directly, or you can subclass collections.MutableSequence and collections.MutableMapping; these options are there in addition to using the User* objects.
The User* objects have been moved to the collections module in Python 3, but any code that used those in the Python 2 stdlib has been replaced with the collections.abc abstract base classes. Even in Python 2, UserList and UserDict are augmented collections.* implementations, adding the methods list and dict provide beyond the basic interface.
The collections classes make it clearer what must be implemented for your subclass to be a complete implementation, and also let you implement smaller subsets (such as collections.Mapping, implementing a read-only mapping, or collections.Sequence for a tuple-like object).
The User* implementations should be used when you need to implement everything beyond the basic interface too; e.g. if you need to support addition, sorting, reversing and counting just like list does.
For anything else you are almost always better off using the collections abstract base classes as a basis; the built-in types are optimised for speed and are not that subclass-friendly. For example, you'll need to override just about every method on list where normally a new list is returned, to ensure your subclass is returned instead.
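A quick demonstration of that pitfall (a sketch; behavior as of CPython 3.x):

from collections import UserList

class MyList(list):
    pass

class MyUserList(UserList):
    pass

# Slicing a list subclass silently falls back to a plain list...
assert type(MyList([1, 2, 3])[:2]) is list
# ...while UserList operations return the subclass.
assert type(MyUserList([1, 2, 3])[:2]) is MyUserList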
Only if you need to build code that insists on using a real list or dict object (tested by using isinstance()) is subclassing the built-in types an option to consider. This is why collections.OrderedDict is a subclass of dict, for example.
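For comparison, a minimal sketch of the ABC route (class name hypothetical): implement the five abstract methods and collections.abc fills in append, extend, index, remove, and the rest:

from collections.abc import MutableSequence   # plain collections in Python 2

class MySequence(MutableSequence):
    """A sequence backed by a plain list; the ABC supplies the rest."""
    def __init__(self, iterable=()):
        self._items = list(iterable)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)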
No, they are not encouraged anymore. You should not use the UserDict class as it is deprecated. The docs say you can just subclass dict directly. The UserDict module is gone in Python 3.0.

Why isn't the 'len' function inherited by dictionaries and lists in Python

example:
a_list = [1, 2, 3]
a_list.len() # doesn't work
len(a_list) # works
Python being (very) object oriented, I don't understand why the 'len' function isn't inherited by the object.
Plus I keep trying the wrong solution, since it appears to be the logical one to me.
Guido's explanation is here:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
The short answer: 1) backwards compatibility and 2) there's not enough of a difference for it to really matter. For a more detailed explanation, read on.
The idiomatic Python approach to such operations is special methods which aren't intended to be called directly. For example, to make x + y work for your own class, you write a __add__ method. To make sure that int(spam) properly converts your custom class, write a __int__ method. To make sure that len(foo) does something sensible, write a __len__ method.
This is how things have always been with Python, and I think it makes a lot of sense for some things. In particular, this seems like a sensible way to implement operator overloading. As for the rest, different languages disagree; in Ruby you'd convert something to an integer by calling spam.to_i directly instead of saying int(spam).
You're right that Python is an extremely object-oriented language and that having to call an external function on an object to get its length seems odd. On the other hand, len(silly_walks) isn't any more onerous than silly_walks.len(), and Guido has said that he actually prefers it (http://mail.python.org/pipermail/python-3000/2006-November/004643.html).
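For example, implementing two special methods is all it takes for len() and + to work on a custom class (a sketch with made-up names):

class Path:
    def __init__(self, points):
        self.points = list(points)

    def __len__(self):                  # enables len(p)
        return len(self.points)

    def __add__(self, other):           # enables p + q
        return Path(self.points + other.points)

p = Path([1, 2]) + Path([3])
assert len(p) == 3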
It just isn't.
You can, however, do:
>>> [1,2,3].__len__()
3
Adding a __len__() method to a class is what makes the len() magic work.
This way fits in better with the rest of the language. The convention in python is that you add __foo__ special methods to objects to make them have certain capabilities (rather than e.g. deriving from a specific base class). For example, an object is
callable if it has a __call__ method
iterable if it has an __iter__ method,
supports access with [] if it has __getitem__ and __setitem__.
...
One of these special methods is __len__ which makes it have a length accessible with len().
Maybe you're looking for __len__. If that method exists, then len(a) calls it:
>>> class Spam:
...     def __len__(self): return 3
...
>>> s = Spam()
>>> len(s)
3
Well, there actually is a length method, it is just hidden:
>>> a_list = [1, 2, 3]
>>> a_list.__len__()
3
The len() built-in function appears to be simply a wrapper for a call to the hidden __len__() method of the object.
Not sure why they made the decision to implement things this way though.
There is some good info at the link below on why certain things are functions and others are methods. It does indeed cause some inconsistencies in the language.
http://mail.python.org/pipermail/python-dev/2008-January/076612.html
