Can I extend the syntax in Python for dict comprehensions so they produce other dicts, like the OrderedDict in the collections module or my own types which inherit from dict?
Just rebinding the dict name obviously doesn't work; the {key: value} comprehension syntax still gives you a plain old dict for comprehensions and literals.
>>> from collections import OrderedDict
>>> olddict, dict = dict, OrderedDict
>>> {i: i*i for i in range(3)}.__class__
<type 'dict'>
So, if it's possible, how would I go about doing that? It's OK if it only works in CPython. For syntax, I guess I would try it with an O{k: v} prefix, like the ones we have on the r'various' u'string' b'objects'.
Note: of course we can use a generator expression instead, but I'm more interested in seeing how hackable Python is in terms of the grammar.
Sorry, not possible. Dict literals and dict comprehensions map to the built-in dict type, in a way that's hardcoded at the C level. That can't be overridden.
You can use this as an alternative, though:
OrderedDict((i, i * i) for i in range(3))
Addendum: as of Python 3.6, all Python dictionaries are ordered. As of 3.7, it's even part of the language spec. If you're using those versions of Python, no need for OrderedDict: the dict comprehension will Just Work (TM).
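For example, on 3.7+ a plain dict comprehension already preserves insertion order:
>>> {i: i*i for i in range(3)}
{0: 0, 1: 1, 2: 4}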
There is no direct way to change Python's syntax from within the language. A dictionary comprehension (or plain display) is always going to create a dict, and there's nothing you can do about that. If you're using CPython, it's using special bytecodes that generate a dict directly, which ultimately call the PyDict API functions and/or the same underlying functions used by that API. If you're using PyPy, those bytecodes are instead implemented on top of an RPython dict object which in turn is implemented on top of a compiled-and-optimized Python dict. And so on.
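If you're curious, you can watch CPython do this with the dis module; the exact opcode names vary across versions, but you should see map-building opcodes (e.g. BUILD_MAP and MAP_ADD) and no lookup of the name dict anywhere:
>>> import dis
>>> dis.dis(compile('{i: i*i for i in range(3)}', '<test>', 'eval'))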
There is an indirect way to do it, but you're not going to like it. If you read the docs on the import system, you'll see that it's the importer that searches for cached compiled code or calls the compiler, and the compiler that calls the parser, and so on. In Python 3.3+, almost everything in this chain either is written in pure Python, or has an alternate pure Python implementation, meaning you can fork the code and do your own thing. Which includes parsing source with your own PyParsing code that builds ASTs, or compiling a dict comprehension AST node into your own custom bytecode instead of the default, or post-processing the bytecode, or…
In many cases, an import hook is sufficient; if not, you can always write a custom finder and loader.
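For a sense of how little machinery that takes on 3.4+, here's a minimal sketch (the class names are mine, and the actual AST rewrite is left as a comment) of a finder/loader pair that re-parses every imported source module and lets you transform its AST before it's compiled:
import ast
import importlib.machinery
import sys

class TransformingLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *args, **kwargs):
        tree = ast.parse(data, path)
        # ... rewrite the tree here, e.g. replace ast.DictComp nodes ...
        return compile(tree, path, 'exec')

class TransformingFinder(importlib.machinery.PathFinder):
    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        spec = super().find_spec(fullname, path, target)
        if spec is not None and isinstance(spec.loader,
                                           importlib.machinery.SourceFileLoader):
            spec.loader = TransformingLoader(spec.loader.name, spec.loader.path)
        return spec

sys.meta_path.insert(0, TransformingFinder)  # runs before the default finders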
If you're not already using Python 3.3 or later, I'd strongly suggest migrating before playing with this stuff. In older versions, it's harder, and less well documented, and you'll ultimately be putting in 10x the effort to learn something that will be obsolete whenever you do migrate.
Anyway, if this approach sounds interesting to you, you might want to take a look at MacroPy. You could borrow some code from it—and, maybe more importantly, learn how some of these features (that have no good examples in the docs) are used.
Or, if you're willing to settle for something less cool, you can just use MacroPy to build an "odict comprehension macro" and use that. (Note that MacroPy currently only works in Python 2.7, not 3.x.) You can't quite get o{…}, but you can get, say, od[{…}], which isn't too bad. Download od.py, realmain.py, and main.py, and run python main.py to see it working. The key is this code, which takes a DictComp AST, converts it to an equivalent GeneratorExp on key-value Tuples, and wraps it in a Call to collections.OrderedDict:
def od(tree, **kw):
    pair = ast.Tuple(elts=[tree.key, tree.value])
    gx = ast.GeneratorExp(elt=pair, generators=tree.generators)
    odict = ast.Attribute(value=ast.Name(id='collections'),
                          attr='OrderedDict')
    call = ast.Call(func=odict, args=[gx], keywords=[])
    return call
A different alternative is, of course, to modify the Python interpreter.
I would suggest dropping the O{…} syntax idea for your first go, and just making normal dict comprehensions compile to odicts. The good news is, you don't really need to change the grammar (which is beyond hairy…), just any one of:
the bytecodes that dictcomps compile to,
the way the interpreter runs those bytecodes, or
the implementation of the PyDict type
The bad news is that, while all of those are a lot easier than changing the grammar, none of them can be done from an extension module. (Well, you can do the first one by doing basically the same thing you'd do from pure Python… and you can do any of them by hooking the .so/.dll/.dylib to patch in your own functions, but that's the exact same work as hacking on Python, plus the extra work of hooking at runtime.)
If you want to hack on CPython source, the code you want is in Python/compile.c, Python/ceval.c, and Objects/dictobject.c, and the dev guide tells you how to find everything you need. But you might want to consider hacking on PyPy source instead, since it's mostly written in (a subset of) Python rather than C.
As a side note, your attempt wouldn't have worked even if everything were done at the Python language level. olddict, dict = dict, OrderedDict creates a binding named dict in your module's globals, which shadows the name in builtins, but doesn't replace it. You can replace things in builtins (well, Python doesn't guarantee this, but there are implementation/version-specific things-that-happen-to-work for every implementation/version I've tried…), but what you did isn't the way to do it.
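To illustrate the difference (and to be clear, the builtins trick is an implementation-specific thing-that-happens-to-work, not guaranteed behavior):
from collections import OrderedDict
import builtins  # __builtin__ on Python 2

dict = OrderedDict           # merely shadows the builtin name in this module
builtins.dict = OrderedDict  # actually replaces it for name lookups everywhere
# Neither affects {...} literals or comprehensions, though: the compiled
# bytecode never looks up the name 'dict' at all.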
Slightly modifying the response of @Max Noel, you can use a list comprehension instead of a generator expression to create an OrderedDict in an ordered way (which of course is not possible using a dict comprehension).
>>> OrderedDict([(i, i * i) for i in range(5)])
OrderedDict([(0, 0),
(1, 1),
(2, 4),
(3, 9),
(4, 16)])
Looking at this question, I realised that it is kind of awkward to use multiprocessing's Pool.map if all you want is to run a list of functions in parallel:
from multiprocessing import Pool

def my_fun1(): return 1
def my_fun2(): return 2
def my_fun3(): return 3

with Pool(3) as p:
    # (note: Pool can't actually pickle a lambda, so in practice this needs a
    # named, module-level function - which is part of the awkwardness)
    one, two, three = p.map(lambda f: f(), [my_fun1, my_fun2, my_fun3])
I'm not saying it is exactly cryptic, but I think I expected some conventional name for this, even if only within functools or something, similarly to apply/call in JavaScript (yes, I know JavaScript didn't have lambdas at the time those functions were defined, and no, I'm not saying JavaScript is an exemplary programming language, just an example). In fact, I definitely think something like this should be present in operator, but (unless my eyes deceive me) it seems to be absent. I read that in the case of the identity function the resolution was to let people define their own trivial functions, and I understand it better in that case because there are a couple of different variations you may want, but this one feels like a missing bit to me.
EDIT: As pointed out in the comments, Python 2 used to have an apply function for this purpose.
First, let's look at the practical question.
For any Python from 2.3 on, you can trivially write not just your no-argument apply, but a perfect-forwarding apply, as a one-liner, as explained in the 2.x docs for apply:
The use of apply() is equivalent to function(*args, **keywords)
In other words:
def apply(function, *args, **keywords):
    return function(*args, **keywords)
… or, as an inline lambda:
lambda f, *a, **k: f(*a, **k)
Of course the C implementation was a bit faster, but this is almost never relevant.1
If you're going to be using this more than once, I think defining the function out-of-line and reusing it by name is probably clearer, but the lambda version is simple and obvious enough (even more so for your no-args use case) that I can't imagine anyone complaining about it.
Also, notice that this is actually more trivial than identity if you understand what you're doing, not less. With identity, it's ambiguous what you should return with multiple arguments (or keyword arguments), so you have to decide which behavior you want; with apply, there's only one obvious answer, and it's pretty much impossible to get wrong.
As for the history:
Python, like JavaScript, originally had no lambda. It's hard to dig up linkable docs for versions before 2.6, and hard to even find them before 2.3, but I think lambda was added in 1.5, and eventually reached the point where it could be used for perfect forwarding around 2.2. Before then, the docs recommended using apply for forwarding, but after that, the docs recommended using lambda in place of apply. In fact, there was no longer any recommended use of apply.
So in 2.3, the function was deprecated.2
During the Python-3000 discussions that led to 3.0, Guido suggested that all of the "functional programming" functions except maybe map and filter were unnecessary.3 Others made good cases for reduce and partial.4 But a big part of the case was that they're actually not trivial to write (in fully-general form), and easy to get wrong. That isn't true for apply. Also, people were able to find relevant uses of reduce and partial in real-world codebases, but the only uses of apply anyone could find were old pre-2.3 code. In fact, it was so rare that it wasn't even worth making the 2to3 tool transform calls to apply.
The final rationale for removing it was summarized in PEP 3100:
apply(): use f(*args, **kw) instead [2]
That footnote links to an essay by Guido called "Python Regrets", which is now a 404 link. The accompanying PowerPoint presentation is still available, however, or you can view an HTML flipbook of the presentation he wrote it for. But all it really says is the same one-liner, and IIRC, the only further discussion was "We already effectively got rid of it in 2.3."
1. In most idiomatic Python code that has to apply a function, the work inside that function is pretty heavy. In your case, of course, the overhead of calling the functions (pickling arguments and passing them over a pipe) is even heavier. The one case where it would matter is when you're doing "Haskell-style functional programming" instead of "Lisp-style"—that is, very few function definitions, and lots of functions made by transforming functions and composing the results. But that's already so slow (and stack-heavy) in Python that it's not a reasonable thing to do. (Flat use of decorators to apply a wrapper or three works great, but a potentially unbounded chain of wrappers will kill your performance.)
2. The formal deprecation mechanism didn't exist yet, so it was just moved to a "Non-essential Built-in Functions" section in the docs. But it was retroactively considered to be deprecated since 2.3, as you can see in the 2.7 docs.
3. Guido originally wanted to get rid of even them; the argument was that list comprehensions can do the same job better, as you can see in the "Regrets" flipbook. But promoting itertools.imap in place of map means it could be made lazy, like the new zip, and therefore better than comprehensions. I'm not sure why Guido didn't just make the same argument with generator expressions.
4. I'm not sure Guido himself was ever convinced for reduce, but the core devs as a whole were.
It sort of is in operator if you do one line of extra work:
>>> def foo():
...     print 'hi'
...
>>> from operator import methodcaller
>>> call = methodcaller('__call__')
>>> call(foo)
hi
Of course, call = lambda f: f() is only one line as well...
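For what it's worth, I believe Python 3.11 added exactly this to operator as operator.call, so on new-enough Pythons the extra line disappears:
from operator import call  # Python 3.11+

def foo():
    print('hi')

call(foo)  # prints: hi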
I've read about this cool new dictionary type, the TransformDict.
I want to use it in my project, by initializing a new transform dict with regular dict:
tran_d = TransformDict(str.lower, {'A':1, 'B':2})
which succeeds but when I run this:
tran_d.keys()
I get:
['A', 'B']
How would you suggest executing the transform function on the keys of the (regular) dict parameter when creating the new TransformDict?
Just to be clear I want the following:
tran_d.keys() == ['a', 'b']
I already said it in the comments but it's important to realize that this is not what TransformDict is meant to do. Therefore you could subclass it with a custom implementation for keys:
class MyTransformDict(TransformDict):
    def keys(self):
        return map(self.transform_func, super().keys())
Depending on your Python version you probably need to use list() around the map (Python 3) or provide arguments for super: super(TransformDict, self) (Python 2). But it should illustrate the principle.
As @Rawing pointed out in the comments, there will be more methods that don't work as expected, e.g. __iter__, items, and probably also __repr__.
Per the implementation I have seen, the transformation function can be achieved through a property named transform_func, so
list(map(tran_d.transform_func, tran_d.keys()))
should do.
I wouldn't bother using TransformDict. It was proposed as PEP 455 and rejected. This means it won't be a built-in feature, so you'd have to manually implement it on your own or use some library that does it.
The BDFL delegate's conclusions about the PEP can be found here. The stripped down version is:
It is less readable than converting keys before usage.
It breaks in strange ways that sometimes even emit wrong errors.
It introduces unneeded complexity, since using plain dicts avoids above problems.
In addition to @Ronan-Paixão's answer:
TransformDict was a hypergeneralization that sprang up out of wanting case-folding keys, but with no rigorous research into what real-world users might need the generalization for - meaning the user expectations of what it should do were not well thought through, as the original question illustrates.
A recommendation is to implement your own dictionary subclass, to fit your own use case, as other answers have suggested.
So rather than suggesting "do not use TransformDict" I would suggest, "build your own, but give your class a better more descriptive name", then you'll know what it does, will have it quarantined, and not encourage bad stuff in the repos.
A good reference in addition to the PEP 455 is Hettinger's presentation: http://il.pycon.org/2016/static/sessions/raymond-hettinger.pdf
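As a sketch of the "build your own" approach (the class name and the casefold choice are mine; note that a plain dict subclass like this doesn't route the constructor or update() through __setitem__, so extend those too if you need them):
class CaseFoldDict(dict):
    """dict that case-folds its string keys on the way in."""
    def __setitem__(self, key, value):
        super().__setitem__(key.casefold(), value)
    def __getitem__(self, key):
        return super().__getitem__(key.casefold())
    def __contains__(self, key):
        return super().__contains__(key.casefold())

d = CaseFoldDict()
d['A'] = 1
print(d['a'], 'B' in d)  # 1 False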
I was wondering what is the correct way to check a key:value pair of a dict. Let's say I have this dict:
dict_ = {
    'key1': 'val1',
    'key2': 'val2'
}
I can check a condition like this
if dict_['key1'] == 'val1':
but I feel like there is a more elegant way that takes advantage of the dict data structure.
What you're doing already does take advantage of the data structure, which is why it's "the one obvious way" to do what you want to do. (You can find examples like this all over the tutorial, the reference docs, and the stdlib implementation.)
However, I can see what you're thinking: the dict is in some sense a container of key-value pairs (even if it's only a collections.Container of keys…), so… shouldn't there be some way to just check whether a key-value pair exists?
Up to Python 2.6, there really isn't.* But in 3.0, the items() method returns a special set-like view of the key-value pairs. And 2.7 backported that functionality, under the name viewitems. So:
('key1', 'val1') in d.viewitems()
But I don't think that's really clearer or cleaner; "items" feels like a lower-level way to think of dictionaries than "mappings", which is what both your original code and smci's answer rely on.
It's also less concise, it doesn't work in 2.6 or earlier, many dict-like mapping objects don't support it,** and it's slightly slower on 2.7 to boot, but these are probably less important, and not what you asked about.
* Well, there is, but only by iterating over all of the items with iteritems, or using items to effectively do the same exhaustive search behind your back, neither of which is what you want.
** In fact, in 2.7, it's not actually possible to support it with a pure-Python class…
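(On Python 3, items() itself returns the set-like view, so the check is simply:
('key1', 'val1') in dict_.items()
and pure-Python mappings derived from collections.abc.Mapping support it too.)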
If you want to avoid throwing KeyError if dict doesn't even contain 'key1':
if dict_.get('key1') == 'val1':
(However, throwing an exception for missing key is perfectly fine Python idiom.)
Otherwise, @Cyber is correct that it's already fine! (What exactly is the problem?)
In Python 2 there is also a has_key method:
dict_.has_key('key1')
This returns True or False. (has_key was removed in Python 3; use the in operator instead: 'key1' in dict_.)
Alternatively, you can have your get function return a default value when the key is not present:
dict_.get('key3','Default Value')
Coding these days, which of the above is preferred and recommended (both in Python 2 and 3) for subclassing?
I read that UserList and UserDict have been introduced because in the past list and dict couldn't be subclassed, but since this isn't an issue anymore, is it encouraged to use them?
Depending on your use case, these days you'd either subclass list and dict directly, or you can subclass collections.MutableSequence and collections.MutableMapping (moved to collections.abc in Python 3.3+); these options are there in addition to using the User* objects.
The User* objects have been moved to the collections module in Python 3; but any code that used those in the Python 2 stdlib has been replaced with the collections.abc abstract base classes. Even in Python 2, UserList and UserDict are augmented collections.* implementations, adding methods list and dict provide beyond the basic interface.
The collections classes make it clearer what must be implemented for your subclass to be a complete implementation, and also let you implement smaller subsets (such as collections.Mapping, implementing a read-only mapping, or collections.Sequence for a tuple-like object).
The User* implementations should be used when you need to implement everything beyond the basic interface too; e.g. if you need to support addition, sorting, reversing and counting just like list does.
For anything else you are almost always better off using the collections abstract base classes as a basis; the built-in types are optimised for speed and are not that subclass-friendly. For example, you'll need to override just about every method on list where normally a new list is returned, to ensure your subclass is returned instead.
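A quick demonstration of that pitfall:
class MyList(list):
    pass

m = MyList([1, 2])
print(type(m + [3]))  # <class 'list'>: list.__add__ ignores the subclass
print(type(m[:1]))    # <class 'list'> again, for slicing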
Only if you need to interoperate with code that insists on having a real list or dict object (tested by using isinstance()) is subclassing the built-in types an option to consider. This is why collections.OrderedDict is a subclass of dict, for example.
No, they are not encouraged anymore. You should not use the UserDict class, as it is deprecated. The docs say you can just subclass dict directly. The UserDict module is gone in Python 3.0 (the class itself now lives in collections).
I would be interested in knowing what the StackOverflow community thinks are the important language features (idioms) of Python. Features that would define a programmer as Pythonic.
A Python (pythonic) idiom is a "code expression" that is natural to, or characteristic of, the Python language.
Plus, Which idioms should all Python programmers learn early on?
Thanks in advance
Related:
Code Like a Pythonista: Idiomatic Python
Python: Am I missing something?
Python is a language that can be described as: "rules you can fit in the palm of your hand with a huge bag of hooks".
Nearly everything in python follows the same simple standards. Everything is accessible, changeable, and tweakable. There are very few language level elements.
Take, for example, the len(data) built-in function. It works by simply checking for a data.__len__() method, then calling it and returning the value. That way, len() can work on any object that implements a __len__() method.
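For example, any class that defines __len__ immediately works with len() (a toy example; the class name is mine):
class Deck:
    def __init__(self):
        self.cards = list(range(52))
    def __len__(self):
        return len(self.cards)

print(len(Deck()))  # 52, via Deck.__len__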
Start by learning about the types and basic syntax:
Dynamic Strongly Typed Languages
bool, int, float, string, list, tuple, dict, set
statements, indenting, "everything is an object"
basic function definitions
Then move on to learning about how python works:
imports and modules (really simple)
the python path (sys.path)
the dir() function
__builtins__
Once you have an understanding of how to fit pieces together, go back and cover some of the more advanced language features:
iterators
overrides like __len__ (there are tons of these)
list comprehensions and generators
classes and objects (again, really simple once you know a couple rules)
python inheritance rules
And once you have a comfort level with these items (with a focus on what makes them pythonic), look at more specific items:
Threading in python (note the Global Interpreter Lock)
context managers
database access
file IO
sockets
etc...
And never forget The Zen of Python (by Tim Peters)
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
This page covers all the major python idioms: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
An important idiom in Python is docstrings.
Every object has a __doc__ attribute that can be used to get help on that object. You can set the __doc__ attribute on modules, classes, methods, and functions like this:
# this is m.py
""" module docstring """

class c:
    """class docstring"""
    def m(self):
        """method docstring"""
        pass

def f(a):
    """function f docstring"""
    return
Now, when you type help(m), help(m.f) etc. it will print the docstring as a help message.
Because it's just part of normal object introspection, this can be used by documentation-generating systems like epydoc, or for testing purposes by unittest.
It can also be put to more unconventional (i.e. non-idiomatic) uses such as grammars in Dparser.
Where it gets even more interesting to me is that, even though __doc__ is a read-only attribute on most objects, you can use them anywhere like this:
x = 5
""" pseudo docstring for x """
and documentation tools like epydoc can pick them up and format them properly (as opposed to a normal comment, which stays inside the code formatting).
Decorators get my vote. Where else can you write something like:
def trace(num_args=0):
    def wrapper(func):
        def new_f(*a, **k):
            print_args = ''
            if num_args > 0:
                print_args = ','.join(str(x) for x in a[0:num_args])
            print('entering %s(%s)' % (func.__name__, print_args))
            rc = func(*a, **k)
            if rc is not None:
                print('exiting %s(%s)=%s' % (func.__name__, print_args, str(rc)))
            else:
                print('exiting %s(%s)' % (func.__name__, print_args))
            return rc
        return new_f
    return wrapper

@trace(1)
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

factorial(5)
and get output like:
entering factorial(5)
entering factorial(4)
entering factorial(3)
entering factorial(2)
entering factorial(1)
entering factorial(0)
exiting factorial(0)=1
exiting factorial(1)=1
exiting factorial(2)=2
exiting factorial(3)=6
exiting factorial(4)=24
exiting factorial(5)=120
Everything connected to list usage.
Comprehensions, generators, etc.
Personally, I really like how Python's syntax defines code blocks by indentation, and not by the words "BEGIN" and "END" (as in Microsoft's Basic and Visual Basic - I don't like those) or by left and right braces (as in C, C++, Java, Perl - I like those).
This really surprised me because, although indentation has always been very important to me, I didn't make too much "noise" about it - I lived with it, and it was considered a skill to be able to read other people's "spaghetti" code. Furthermore, I never heard another programmer suggest making indentation part of a language. Until Python! I only wish I had realized this idea first.
To me, it is as if Python's syntax forces you to write good, readable code.
Okay, I'll get off my soap-box. ;-)
From a more advanced viewpoint: understand how dictionaries are used internally by Python. Classes, functions, modules, and references are all just entries in a dictionary. Once this is understood, it's easy to understand how to monkey patch and use the powerful __getattr__, __setattr__, and __call__ methods.
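A tiny illustration (toy class; the names are mine):
class Greeter:
    def hello(self):
        return 'hello'

print('hello' in Greeter.__dict__)      # True: methods live in the class's dict
Greeter.hello = lambda self: 'patched'  # monkey patching is just a dict update
print(Greeter().hello())                # patched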
Here's one that can help. What's the difference between:
[ foo(x) for x in range(0, 5) ][0]
and
( foo(x) for x in range(0, 5) ).next()
Answer:
In the second example, foo is called only once (and in Python 3 you'd spell it next(...) rather than .next()). This may be important if foo has a side effect, or if the iterable being used to construct the list is large.
Two things that struck me as especially Pythonic were dynamic typing and the various flavors of lists used in Python, particularly tuples.
Python's list obsession could be said to be LISP-y, but it's got its own unique flavor. A line like:
return HandEvaluator.StraightFlush, (PokerCard.longFaces[index + 4],
PokerCard.longSuits[flushSuit]), []
or even
return False, False, False
just looks like Python and nothing else. (Technically, you'd see the latter in Lua as well, but Lua is pretty Pythonic in general.)
Using string substitutions:
name = "Joe"
age = 12
print "My name is %s, I am %s" % (name, age)
When I'm not programming in python, that simple use is what I miss most.
Another thing you cannot start early enough is probably testing. Doctests especially are a great way of testing your code while explaining it at the same time.
doctests are simple text files containing an interactive interpreter session plus explanatory text, like this:
Let's instantiate our class::
>>> a = Something(text="yes")
>>> a.text
'yes'
Now call this method and check the results::
>>> a.canify()
>>> a.text
'yes, I can'
If, e.g., a.text returns something different, the test will fail.
doctests can be inside docstrings or standalone text files and are executed using the doctest module. Of course the better-known unit tests are also available.
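Running them is a one-liner (assuming the examples live in the current module's docstrings; standalone text files work via doctest.testfile or python -m doctest):
import doctest
doctest.testmod()  # collect and run all doctests in this module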
I think that online tutorials and books only talk about doing things, not about doing things in the best way. Along with Python syntax, I think that speed is important in some cases.
Python provides a way to benchmark functions, actually two!!
One way is to use the profile module, like so:
import profile

def foo(x, y, z):
    return x**y % z  # Just an example.

profile.run('foo(5, 6, 3)')
Another way to do this is to use the timeit module, like this:
import timeit

def foo(x, y, z):
    return x**y % z  # Can also be 'pow(x, y, z)', which is way faster.

timeit.timeit('foo(5, 6, 3)', 'from __main__ import *', number=100)
# timeit.timeit(testcode, setupcode, number=number_of_iterations)