list vs UserList and dict vs UserDict - python

These days, which of the above is preferred and recommended (in both Python 2 and 3) for subclassing?
I read that UserList and UserDict have been introduced because in the past list and dict couldn't be subclassed, but since this isn't an issue anymore, is it encouraged to use them?

Depending on your use case, these days you'd either subclass list and dict directly, or you can subclass collections.MutableSequence and collections.MutableMapping; these options exist in addition to using the User* objects.
The User* objects were moved to the collections module in Python 3, but any code that used them in the Python 2 stdlib has been replaced with the collections.abc abstract base classes. Even in Python 2, UserList and UserDict are augmented collections.* implementations, adding the methods list and dict provide beyond the basic interface.
The collections classes make it clearer what must be implemented for your subclass to be a complete implementation, and also let you implement smaller subsets (such as collections.Mapping, implementing a read-only mapping, or collections.Sequence for a tuple-like object).
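For instance, here is a minimal sketch of such a read-only mapping, using the Python 3 location of the ABC (collections.abc.Mapping); LowerCaseMapping is a hypothetical example class:
from collections.abc import Mapping

class LowerCaseMapping(Mapping):
    # Read-only mapping that lower-cases keys on lookup. Only the three
    # abstract methods are implemented; the ABC supplies get, keys,
    # items, values, __contains__, __eq__ and __ne__ for free.
    def __init__(self, data):
        self._data = {key.lower(): value for key, value in data.items()}
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

m = LowerCaseMapping({"Spam": 1, "Eggs": 2})
print(m["SPAM"])    # 1
print("eggs" in m)  # True, via the mixin __contains__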
The User* implementations should be used when you need to implement everything beyond the basic interface too; e.g. if you need to support addition, sorting, reversing and counting just like list does.
For anything else you are almost always better off using the collections abstract base classes as a basis; the built-in types are optimised for speed and are not that subclass-friendly. For example, you'll need to override just about every method on list where normally a new list is returned, to ensure your subclass is returned instead.
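A quick sketch of that pitfall:
class MyList(list):
    pass

m = MyList([1, 2, 3])
print(type(m + m))   # <class 'list'>, not MyList
print(type(m[:2]))   # <class 'list'>, not MyList
print(type(m * 2))   # <class 'list'>, not MyList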
Only if you need to build code that insists on using a list or dict object (tested by using isinstance()) is subclassing the built-in types an option to consider. This is why collections.OrderedDict is a subclass of dict, for example.

No, they are not encouraged anymore. You should not use the UserDict class as it is deprecated. The docs say you can just subclass dict directly. The UserDict module is gone in Python 3.0.

Related

Difference between collections.abc.Sequence and typing.Sequence [duplicate]

This question already has an answer here:
collections.Iterable vs typing.Iterable in type annotation and checking for Iterable
(1 answer)
I was reading an article about collections.abc and the typing module in the Python standard library and discovered that both provide the same features.
I tried both options using the code below and got the same results:
from collections.abc import Sequence
def average(sequence: Sequence):
    return sum(sequence) / len(sequence)
print(average([1, 2, 3, 4, 5]))  # result is 3.0
from typing import Sequence
def average(sequence: Sequence):
    return sum(sequence) / len(sequence)
print(average([1, 2, 3, 4, 5]))  # result is 3.0
Under what conditions is collections.abc the better option over typing? Are there benefits to using one over the other?
Good on you for using type annotations! As the documentation says, if you are on Python 3.9+, you should most likely never use typing.Sequence due to its deprecation. Since the introduction of generic alias types in 3.9, the collections.abc classes all support subscripting and should be recognized correctly by static type checkers of all flavors.
So the benefit of using collections.abc.T over typing.T is mainly that the latter is deprecated and should not be used.
As mentioned by jsbueno in his answer, annotations will never have runtime implications either way, unless of course they are explicitly picked up by a piece of code. They are just an essential part of good coding style. But your function would still work, i.e. your script would execute without error, even if you annotated your function with something absurd like def average(sequence: 4%3): ....
Proper annotations are still extremely valuable. Thus, I would recommend you get used to some of the best practices as soon as possible. (A more-or-less strict static type checker like mypy is very helpful for that.) For one thing, when you are using generic types like Sequence, you should always provide the appropriate type arguments. Those may be type variables, if your function is also generic, or they may be concrete types, but you should always include them.
In your case, assuming you expect the contents of your sequence to be something that can be added with the same type and divided by an integer, you might want to e.g. annotate it as Sequence[float]. (In the Python type system, float is considered a supertype of int, even though there is no nominal inheritance.)
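Applied to your function, that would look like this (a sketch; the subscripted collections.abc class requires Python 3.9+):
from collections.abc import Sequence

def average(sequence: Sequence[float]) -> float:
    return sum(sequence) / len(sequence)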
Another recommendation is to try and be as broad as possible in the parameter types. (This echoes the Python paradigm of dynamic typing.) The idea is that you just specify that the object you expect must be able to "quack", but you don't say it must be a duck.
In your example, since you are reliant on the argument being compatible with sum as well as with len, you should consider what types those functions expect. The len function is simple, since it basically just calls the __len__ method of the object you pass to it. The sum function is more nuanced, but in your case the relevant part is that it expects an iterable of elements that can be added (e.g. float).
If you take a look at the collections ABCs, you'll notice that Sequence actually offers much more than you need, being that it is a reversible collection. A Collection is the broadest built-in type that fulfills your requirements because it has __iter__ (from Iterable) and __len__ (from Sized). So you could do this instead:
from collections.abc import Collection

def average(numbers: Collection[float]) -> float:
    return sum(numbers) / len(numbers)
(By the way, the parameter name should not reflect its type.)
Lastly, if you wanted to go all out and be as broad as possible, you could define your own protocol that is even broader than Collection (by getting rid of the Container inheritance):
from collections.abc import Iterable, Sized
from typing import Protocol, TypeVar

T = TypeVar("T", covariant=True)

class SizedIterable(Sized, Iterable[T], Protocol[T]):
    ...

def average(numbers: SizedIterable[float]) -> float:
    return sum(numbers) / len(numbers)
This has the advantage of supporting very broad structural subtyping, but is most likely overkill.
(For the basics of Python typing, PEP 483 and PEP 484 are a must-read.)
Actually, in your code you need neither of those:
Typing with annotations, which is what you are doing with your imported Sequence class, is an optional feature, meant for (1) quick documentation and (2) checking of the code before it is run by static code analysers such as Mypy.
The fact is that some IDEs use the result of static checking by default in their recommended configurations, and they can make it look like code without annotations is "faulty": it is not - this is an optional feature.
As long as the object you pass into your function respects the parts of the Sequence interface it actually needs, it will work (here it needs __len__ and __getitem__ as is).
Just run your code without annotations and see it work:
def average(myvariable):
    return sum(myvariable) / len(myvariable)
That said, here is what is happening: list is "the sequence" par excellence in Python, and implements everything a sequence needs.
typing.Sequence is just an indicator for static-checker tools that the data marked with it should respect the Sequence protocol; it does nothing at runtime. You can't instantiate it. You can inherit from it (probably), but only to specialize other markers for typing, not for anything that will have an effect during actual program execution.
On the other hand, collections.abc.Sequence predates the optional typing recommendations in PEP 484: it works as a "virtual superclass" which can indicate, at runtime, everything that works as a sequence (through the use of isinstance) (*). AND it can be used as a solid base class to implement fully functional custom Sequence classes of your own: just inherit from collections.abc.Sequence and implement functional __getitem__ and __len__ methods as indicated in the docs here: https://docs.python.org/3/library/collections.abc.html (that is for read-only sequences; for mutable sequences, check collections.abc.MutableSequence, of course).
(*) For a custom sequence implementation that does not inherit from the ABC to be recognized as a proper Sequence, it has to be "registered" at runtime with a call to collections.abc.Sequence.register. However, AFAIK, most tools for static type checking do not recognize this and will flag it in their static analysis.
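For instance, a minimal sketch of such a read-only custom sequence (Squares is a hypothetical name for illustration; integer indices only, for brevity):
from collections.abc import Sequence

class Squares(Sequence):
    # Read-only sequence of the first n squares. Only __getitem__ and
    # __len__ are implemented; the ABC supplies __iter__, __contains__,
    # __reversed__, index and count.
    def __init__(self, n):
        self._n = n
    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index
    def __len__(self):
        return self._n

sq = Squares(5)
print(list(sq))                  # [0, 1, 4, 9, 16]
print(9 in sq)                   # True
print(isinstance(sq, Sequence))  # True, no register() call needed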

Python typed collections

I am new to Python (using Python 3.6).
I have some class that represents amounts of some fictional coins.
So an instance could represent say 10 bluecoins or negative sums such as -20 redcoins and so on.
I can now hold several such CoinAmounts in a list.
e.g.
[CoinAmount(coin='blue',amount=-10), CoinAmount(coin='blue',amount=20),
CoinAmount(coin='red',amount=5), CoinAmount(coin='red',amount=-5),
CoinAmount(coin='green',amount=5)]
I want to be able to "compress" the above list by summing each type of coin, so that I will have:
[CoinAmount(coin='blue',amount=10), CoinAmount(coin='green',amount=5)]
or
[CoinAmount(coin='blue',amount=10), CoinAmount(coin='red',amount=0), CoinAmount(coin='green',amount=5)]
from which it is easy to derive the former...
My Q's are:
1) Would it make sense to have some sort of a ListOfCoinAmounts that subclasses list and adds a compress method? Or should I use some CoinAmountUtils class that has a static method that works on a list and compresses it?
2) Is there a way to ensure that the list actually holds only CoinAmounts, or should this just be assumed and followed (or both - i.e. it can be done but shouldn't be)?
3) In a more general way what is the best practice "pythonic" way to handle a "List of something specific"?
Inheritance - when not used for typing - is mostly a very restricted form of composition/delegation, so inheriting from list is IMHO a bad design.
Having some CoinContainer class that delegates to a list is a much better design, in that 1) it gives you full control of the API and 2) it lets you change the implementation as you want (you may find out that a list is not the best container for your needs).
Also, it will be easier to implement, since you don't have to make sure you override all of the list methods and magic methods, only the ones you need (cf. point 1).
As for type-checking, it's usually not considered pythonic - it's the client code's responsibility to make sure it only passes compatible objects. If you really want some type-checking here, at least use an ABC and test against that ABC, not against a fixed type.
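A minimal sketch of that delegation design, assuming CoinAmount is a namedtuple as in the question:
from collections import namedtuple

CoinAmount = namedtuple("CoinAmount", ["coin", "amount"])

class CoinContainer:
    # Wraps a list instead of inheriting from it, exposing only the
    # operations the domain actually needs.
    def __init__(self, coin_amounts=()):
        self._items = list(coin_amounts)

    def add(self, coin_amount):
        self._items.append(coin_amount)

    def compress(self):
        # Sum the amounts per coin; dicts preserve first-seen order
        # (guaranteed on Python 3.7+).
        totals = {}
        for item in self._items:
            totals[item.coin] = totals.get(item.coin, 0) + item.amount
        return [CoinAmount(coin, amount) for coin, amount in totals.items()]

coins = CoinContainer([
    CoinAmount(coin='blue', amount=-10), CoinAmount(coin='blue', amount=20),
    CoinAmount(coin='red', amount=5), CoinAmount(coin='red', amount=-5),
    CoinAmount(coin='green', amount=5),
])
print(coins.compress())
# [CoinAmount(coin='blue', amount=10), CoinAmount(coin='red', amount=0),
#  CoinAmount(coin='green', amount=5)]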
1) Subclassing list and allowing only CoinAmount elements in it is a good and clean method IMO (see the sketch after this list).
2) Yes, that can be done. You can inherit from the Python list and override the append method to check for types.
A good example here: Overriding append method after inheriting from a Python List
3) A good practice is indeed extending the list and putting in your customizations.
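For illustration, a minimal sketch of the append override from point 2, assuming the CoinAmount class from the question; note that extend, insert, __setitem__ and the constructor would need the same treatment for full enforcement:
class CoinAmountList(list):
    def append(self, item):
        if not isinstance(item, CoinAmount):
            raise TypeError("CoinAmountList only accepts CoinAmount items")
        super().append(item)

amounts = CoinAmountList()
amounts.append(CoinAmount(coin='blue', amount=10))  # fine
amounts.append("not a coin")                        # raises TypeError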

How do I define a postfix function in Python?

I know that if you create your own object you can define your own methods on that object.
my_object_instance.mymethod()
I also know you can define infix functions with the infix package.
obj1 |func| obj2
What I want is the ability to define a function which accepts an existing type in postfix notation.
For example given a list l we may want to check if it is sorted. Defining a typical function might give us
if is_sorted(l): #dosomething
but it might be more idiomatic if one could write
if l.is_sorted(): #dosomething
Is this possible without creating a custom type?
The correct way is inheritance: creating a custom type by inheriting from list and adding the new functionality. Monkeypatching is not a strength of Python. But since you specifically asked:
Is this possible without creating a custom type?
What kindall mentioned stands: Python does not allow it. But since nothing in the implementation is truly read-only, you can approximate the result by hacking the class dict.
>>> def is_sorted(my_list):
...     return sorted(my_list) == my_list
...
>>> import gc
>>> gc.get_referents(list.__dict__)[0]['is_sorted'] = is_sorted
>>> [1,2,3].is_sorted()
True
>>> [1,3,2].is_sorted()
False
The new "method" will appear in vars(list), the name will be there in dir([]), and it will also be available/usable on instances which were created before the monkeypatch was applied.
This approach uses the garbage collector interface to obtain, via the class mappingproxy, a reference to the underlying dict. And garbage collection by reference counting is a CPython implementation detail. Suffice it to say, this is dangerous/fragile and you should not use it in any serious code.
If you like this kind of feature, you might enjoy Ruby as a programming language.
Python does not generally allow monkey-patching of built-in types because the common built-in types aren't written in Python (but rather C) and do not allow the class dictionary to be modified. You have to subclass them to add methods as you want to.
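For completeness, a minimal sketch of that subclassing route (SortableList is a hypothetical name):
class SortableList(list):
    def is_sorted(self):
        return sorted(self) == list(self)

l = SortableList([1, 3, 2])
if not l.is_sorted():
    print("not sorted")  # this branch runs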

OrderedDict comprehensions

Can I extend syntax in python for dict comprehensions for other dicts, like the OrderedDict in collections module or my own types which inherit from dict?
Just rebinding the dict name obviously doesn't work, the {key: value} comprehension syntax still gives you a plain old dict for comprehensions and literals.
>>> from collections import OrderedDict
>>> olddict, dict = dict, OrderedDict
>>> {i: i*i for i in range(3)}.__class__
<type 'dict'>
So, if it's possible, how would I go about doing that? It's OK if it only works in CPython. For syntax, I guess I would try it with an O{k: v} prefix, like we have on the r'various' u'string' b'objects'.
Note: of course we can use a generator expression instead, but I'm more interested in seeing how hackable Python is in terms of the grammar.
Sorry, not possible. Dict literals and dict comprehensions map to the built-in dict type, in a way that's hardcoded at the C level. That can't be overridden.
You can use this as an alternative, though:
OrderedDict((i, i * i) for i in range(3))
Addendum: as of Python 3.6, all Python dictionaries are ordered. As of 3.7, it's even part of the language spec. If you're using those versions of Python, no need for OrderedDict: the dict comprehension will Just Work (TM).
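A quick console check on Python 3.7+:
>>> {i: i * i for i in range(3)}
{0: 0, 1: 1, 2: 4}
>>> {i: i * i for i in range(3)}.__class__
<class 'dict'>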
There is no direct way to change Python's syntax from within the language. A dictionary comprehension (or plain display) is always going to create a dict, and there's nothing you can do about that. If you're using CPython, it's using special bytecodes that generate a dict directly, which ultimately call the PyDict API functions and/or the same underlying functions used by that API. If you're using PyPy, those bytecodes are instead implemented on top of an RPython dict object which in turn is implemented on top of a compiled-and-optimized Python dict. And so on.
There is an indirect way to do it, but you're not going to like it. If you read the docs on the import system, you'll see that it's the importer that searches for cached compiled code or calls the compiler, and the compiler that calls the parser, and so on. In Python 3.3+, almost everything in this chain either is written in pure Python, or has an alternate pure Python implementation, meaning you can fork the code and do your own thing. Which includes parsing source with your own PyParsing code that builds ASTs, or compiling a dict comprehension AST node into your own custom bytecode instead of the default, or post-processing the bytecode, or…
In many cases, an import hook is sufficient; if not, you can always write a custom finder and loader.
If you're not already using Python 3.3 or later, I'd strongly suggest migrating before playing with this stuff. In older versions, it's harder, and less well documented, and you'll ultimately be putting in 10x the effort to learn something that will be obsolete whenever you do migrate.
Anyway, if this approach sounds interesting to you, you might want to take a look at MacroPy. You could borrow some code from it—and, maybe more importantly, learn how some of these features (that have no good examples in the docs) are used.
Or, if you're willing to settle for something less cool, you can just use MacroPy to build an "odict comprehension macro" and use that. (Note that MacroPy currently only works in Python 2.7, not 3.x.) You can't quite get o{…}, but you can get, say, od[{…}], which isn't too bad. Download od.py, realmain.py, and main.py, and run python main.py to see it working. The key is this code, which takes a DictionaryComp AST, converts it to an equivalent GeneratorExpr on key-value Tuples, and wraps it in a Call to collections.OrderedDict:
def od(tree, **kw):
    pair = ast.Tuple(elts=[tree.key, tree.value])
    gx = ast.GeneratorExp(elt=pair, generators=tree.generators)
    odict = ast.Attribute(value=ast.Name(id='collections'),
                          attr='OrderedDict')
    call = ast.Call(func=odict, args=[gx], keywords=[])
    return call
A different alternative is, of course, to modify the Python interpreter.
I would suggest dropping the O{…} syntax idea for your first go, and just making normal dict comprehensions compile to odicts. The good news is, you don't really need to change the grammar (which is beyond hairy…), just any one of:
the bytecodes that dictcomps compile to,
the way the interpreter runs those bytecodes, or
the implementation of the PyDict type
The bad news, while all of those are a lot easier than changing the grammar, none of them can be done from an extension module. (Well, you can do the first one by doing basically the same thing you'd do from pure Python… and you can do any of them by hooking the .so/.dll/.dylib to patch in your own functions, but that's the exact same work as hacking on Python plus the extra work of hooking at runtime.)
If you want to hack on CPython source, the code you want is in Python/compile.c, Python/ceval.c, and Objects/dictobject.c, and the dev guide tells you how to find everything you need. But you might want to consider hacking on PyPy source instead, since it's mostly written in (a subset of) Python rather than C.
As a side note, your attempt wouldn't have worked even if everything were done at the Python language level. olddict, dict = dict, OrderedDict creates a binding named dict in your module's globals, which shadows the name in builtins, but doesn't replace it. You can replace things in builtins (well, Python doesn't guarantee this, but there are implementation/version-specific things-that-happen-to-work for every implementation/version I've tried…), but what you did isn't the way to do it.
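A sketch of what actually rebinding the name in builtins looks like (implementation-specific, and note it still does not affect {…} literals or comprehensions, which compile directly to the C-level dict type):
import builtins
from collections import OrderedDict

builtins.dict = OrderedDict                    # rebinds the builtin name
print(dict((i, i * i) for i in range(3)))      # OrderedDict([(0, 0), (1, 1), (2, 4)])
print({i: i * i for i in range(3)}.__class__)  # still <class 'dict'>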
Slightly modifying Max Noel's answer, you can use a list comprehension instead of a generator to create an OrderedDict in an ordered way (which of course is not possible using a dict comprehension).
>>> OrderedDict([(i, i * i) for i in range(5)])
OrderedDict([(0, 0),
             (1, 1),
             (2, 4),
             (3, 9),
             (4, 16)])

How to deepcopy shelve objects in Python

Is it possible to deepcopy a shelve object in Python? When I try to deepcopy it, I get the following error:
import shelve,copy
input = shelve.open("test.dict", writeback=True)
input.update({"key1": 1, "key2": 2})
newinput = copy.deepcopy(input)
>> object.__new__(DB) is not safe, use DB.__new__()
Does it mean shelves are not copyable?
Edit: Maybe it would be better if I elaborate my problem more: I am keeping a large dictionary as a shelve object, and I want to save the whole shelve object (= all key, val pairs generated so far) to a separate file while I keep adding new items to the original dict.
Probably I could first sync the shelve and copy the shelve file on disk explicitly, however I don't like that approach.
No, I don't think they are copyable (unless you monkey-patch the class or convert into a dict). Here's why:
copy.copy() and copy.deepcopy() call the __copy__() and __deepcopy__() methods for instances that are not of a "standard" type (atomic types, list, tuple and instance methods). If the class does not have those attributes, it falls back to __reduce_ex__ and __reduce__ (see copy.py in your sources).
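To illustrate the hook, a minimal sketch of a class that opts in by defining __deepcopy__ itself (Copyable is a hypothetical example class):
import copy

class Copyable:
    def __init__(self, data):
        self.data = data
    def __deepcopy__(self, memo):
        # copy.deepcopy() finds this hook and uses it instead of
        # falling back to __reduce_ex__.
        return Copyable(copy.deepcopy(self.data, memo))

original = Copyable({"key": [1, 2]})
clone = copy.deepcopy(original)
print(clone.data == original.data, clone.data is original.data)  # True False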
Unfortunately, the shelve object Shelf is based on UserDict.DictMixin, which does not define copy() (and neither does Shelf):
class DictMixin:
    # Mixin defining all dictionary methods for classes that already have
    # a minimum dictionary interface including getitem, setitem, delitem,
    # and keys. Without knowledge of the subclass constructor, the mixin
    # does not define __init__() or copy(). In addition to the four base
    # methods, progressively more efficiency comes with defining
    # __contains__(), __iter__(), and iteritems().
It may be a good idea to submit an issue to the shelve module bug tracker.
You could obtain a shallow copy by dict(input) and deepcopy that. Then maybe create another shelve on a new file and populate it via the update method.
newinput = shelve.open("newtest.dict")
newinput.update(copy.deepcopy(dict(input)))
