I'm trying to understand what 'implicit' and 'explicit' really mean in the context of Python.
a = []
# my understanding is that this is implicit
if not a:
    print("list is empty")
# my understanding is that this is explicit
if len(a) == 0:
    print("list is empty")
I'm trying to follow the Zen of Python, but I'm curious whether it applies in this situation or whether I am overthinking it.
The two statements have very different semantics. Remember that Python is dynamically typed.
For the case where a = [], both not a and len(a) == 0 are equivalent. A valid alternative might be to check not len(a). In some cases, you may even want to check for both emptiness and listness by doing a == [].
But a can be anything, for example a = None. The check not a works fine and evaluates to True. But len(a) == 0 will not work at all: instead you will get TypeError: object of type 'NoneType' has no len(). Raising that error may be a perfectly valid outcome, but the two if statements do very different things and you have to pick the one you want.
(Almost) everything in Python has a truth value (via __bool__, falling back to __len__), but not everything has __len__. You have to decide which one to use based on the situation. Things to consider are:
Have you already verified whether a is a sequence?
Do you need to?
Do you mind if your if statement crashes on non-sequences?
Do you want to handle other falsy objects as if they were empty lists?
Remember that making the code look pretty takes second place to getting the job done correctly.
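A quick way to see the difference described above is to run both checks over a few values. This is a minimal sketch; the two helper names are made up for illustration.

```python
def empty_by_truthiness(a):
    # Works for any object: uses __bool__, then __len__, else the object is truthy
    return not a


def empty_by_len(a):
    # Only works for objects that define __len__; raises TypeError otherwise
    return len(a) == 0


for value in ([], [1], "", None, 0):
    try:
        by_len = empty_by_len(value)
    except TypeError as exc:
        by_len = f"TypeError: {exc}"
    print(repr(value), empty_by_truthiness(value), by_len)
```

Both checks agree for lists and strings. For None and 0 only the truthiness check returns an answer, and it treats them as "empty", which may or may not be what you want.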
Though this question is old, I'd like to offer a perspective.
In a dynamic language, my preference would be to always describe the expected type and purpose of a variable in its name, so readers understand what it is for. Then use your knowledge of the language to be succinct and increase readability where possible (in Python, an empty list evaluates to False). Thus the code:
lst_colours = []
if not lst_colours:
    print("list is empty")
To convey meaning even better, use a named variable for very specific checks:
lst_colours = []
b_is_list_empty = not lst_colours
if b_is_list_empty:
    print("list is empty")
Checking whether a list is empty is a common thing to do many times in a code base. So it is even better to put such checks in a helper function in a separate library file, isolating common checks and reducing code duplication.
def b_is_list_empty(lst):
    ...  # body left out in the original
lst_colours = []
if b_is_list_empty(lst_colours):
    print("list is empty")
Most importantly, add as much meaning as possible, and have an agreed company standard for how to tackle the simple things, like variable naming and implicit/explicit code choices.
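A minimal sketch of such a shared helper, assuming it lives in a module of common checks. The module layout is hypothetical, and the body below is one reasonable choice rather than necessarily what the author intended for the elided body:

```python
def b_is_list_empty(lst: list) -> bool:
    """Return True when lst contains no items.

    Relies on Python's truthiness rule that an empty list is falsy.
    """
    return not lst


lst_colours = []
if b_is_list_empty(lst_colours):
    print("list is empty")
```

Keeping the idiomatic implicit check inside one named helper lets call sites state their intent while the helper itself stays short.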
Try to think of:
if not a:
    ...
as shorthand for:
if len(a) == 0:
    ...
I don't think this is a good example of a gotcha with Python's Zen rule of "explicit" over "implicit". The first form is used mostly for readability. It's not that the second one is bad and the first one is good; the first one is just more idiomatic. If you understand the boolean nature of lists in Python, I think you will find the first more readable, and readability counts in Python.
While writing state machines to analyze different types of text data, independent of the language used (VBA to process .xls contents using arrays/dictionaries, or PHP/Python to build SQL insert queries out of .csv files), I often ran into the need for something like this:
boolean = False
while %sample statement%:
    x = 'many different things'
    if boolean == False:
        boolean = True
    else:
        %action that DOES depend on contents of x
         that need to do every BUT first time I get to it%
Every time I have to use a construction like this, I can't help feeling like a noob. Dear algorithmic gurus, can you assure me that this is the only way out and that there is nothing more elegant? Is there any way to specify that some statement should be "burnt after reading", so that some stupid boolean is not checked on every iteration of the loop?
The only things that come across as slightly "noob" about this style are:
Comparing a boolean variable to True or False. Just write if <var> or if not <var>. (I'll ignore the = vs == as a typo!)
Not giving the boolean variable a good name. I know that here boolean is just a placeholder name, but in general using a name like first_item_seen rather than something generic can make the code a lot more readable:
first_item_seen = False
while [...]:
    [...]
    if first_item_seen:
        [...]
    else:
        first_item_seen = True
Another suggestion that can work in some circumstances is to base the decision on another variable that naturally conveys the same state. For instance, it's relatively common to have a variable that contains None for the first iteration, but contains a value for later iterations (e.g. the result so far); using this can make the code slightly more efficient and often slightly clearer.
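As an illustration of that last suggestion, here is a small sketch (the task and function name are made up) where previous is None only on the first pass, so it doubles as the "first item seen" flag:

```python
def consecutive_differences(values):
    """Return the difference between each value and the one before it."""
    differences = []
    previous = None  # None means "no item seen yet"
    for value in values:
        if previous is not None:
            # Runs on every iteration except the first
            differences.append(value - previous)
        previous = value
    return differences


print(consecutive_differences([1, 4, 9, 16]))  # [3, 5, 7]
```

This assumes None never appears as a real value in the input; if it can, a dedicated sentinel object (for example _MISSING = object()) avoids the ambiguity.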
If I understand your problem correctly, I'd try something like
x = 'many different things'
while %sample statements%:
    x = 'many different things'
    action_that_depends_on_x()
It is almost equivalent; the only difference is that in your version the loop body might never be executed (so x is never computed, and computing it has no side effects), whereas in my version x is always computed at least once.
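When the loop walks over an iterable, the same idea (handle the first pass before the loop, so the loop body needs no flag) can be written with an explicit iterator. This is a sketch of that iterator-based variant; join_with_separator is a hypothetical example task, and in real code str.join already does exactly this job:

```python
_MISSING = object()  # sentinel that cannot collide with real items


def join_with_separator(parts, separator):
    """Join string parts, putting separator before every part except the first."""
    iterator = iter(parts)
    first = next(iterator, _MISSING)  # consume the first item outside the loop
    if first is _MISSING:  # empty input: nothing to join
        return ""
    pieces = [first]
    for part in iterator:  # remaining items only: no "first time?" check here
        pieces.append(separator)
        pieces.append(part)
    return "".join(pieces)


print(join_with_separator(["a", "b", "c"], ", "))  # a, b, c
```

If the first item should simply be skipped, itertools.islice(items, 1, None) gives the same "burnt after reading" effect without any flag.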
Often times I find that, when working with Pythonic sets, the Pythonic way seems to be absent.
For example, when doing something like Dijkstra or A*:
openSet, closedSet = set(nodes), set(nodes)
while openSet:
    walkSet, openSet = openSet, set()
    for node in walkSet:
        for dest in node.destinations():
            if dest.weight() < constraint:
                if dest not in closedSet:
                    closedSet.add(dest)
                    openSet.add(dest)
This is a somewhat contrived example; the focus is on the last three lines:
if value not in someSet:
    someSet.add(value)
    doAdditionalThings()
Given the Python way of working with, for example, accessing/using values of a dict, I would have expected to be able to do:
try:
    someSet.add(value)
except KeyError:
    continue  # well, that's ok then.
doAdditionalThings()
As a C++ programmer, my skin crawls a bit that I can't even do:
if someSet.add(value):
    # add wasn't blocked by the value already being present
    doAdditionalThings()
Is there a more Pythonic (and if possible more efficient) way to work with this sort of set-as-guard usage?
The add operation is not supposed to also tell you if the item was already in the set; it just makes sure it is in there after you add it. Or put another way, what you want is not "add an item and check if it worked"; you want to first check if the item is there, and if not, then do some special stuff. If all you wanted to do was add the item, you wouldn't do the check at all. There is nothing unpythonic about this pattern:
if item not in someSet:
    someSet.add(item)
    doStuff()
else:
    doOtherStuff()
It is true that the API could have been designed so that .add returned whether the item was already in there, but in my experience that's not a particularly common use case. Part of the point of sets is that you can freely add items without worrying about whether they were already in there (since adding an already-included item has no effect). Also, having .add return None is consistent with the general convention for Python builtin types that methods that mutate the object in place return None. It is really things like dict.setdefault (which gets an item, but first adds it if it isn't there) that are the unusual case.
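If you really want the C++-style "did the insert happen?" answer, a tiny helper can provide it. This is a hypothetical helper, not part of the set API; it compares the size before and after .add, so the set is only probed once:

```python
def add_if_absent(target: set, item) -> bool:
    """Add item to target and report whether it was newly added."""
    size_before = len(target)
    target.add(item)  # no effect if item is already present
    return len(target) != size_before


closed = {"a"}
for dest in ["a", "b", "b", "c"]:
    if add_if_absent(closed, dest):
        print("first visit:", dest)  # runs for "b" and "c" only
```

The membership-test pattern shown above remains the more common spelling; the helper only packages it for call sites that read better as a single condition.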
Recently, I saw some discussions online about how there is no good "switch / case" equivalent in Python. I realize that there are several ways to do something similar - some with lambda, some with dictionaries. There have been other StackOverflow discussions about the alternatives. There were even two PEPs (PEP 0275 and PEP 3103) discussing (and rejecting) the integration of switch / case into the language.
I came up with what I think is an elegant way to do switch / case.
It ends up looking like this:
from switch_case import switch, case  # note the import style
x = 42
switch(x)  # note the switch statement
if case(1):  # note the case statement
    print(1)
if case(2):
    print(2)
if case():  # note the case with no args
    print("Some number besides 1 or 2")
So, my questions are: Is this a worthwhile creation? Do you have any suggestions for making it better?
I put the include file on github, along with extensive examples. (I think the entire include file is about 50 executable lines, but I have 1500 lines of examples and documentation.) Did I over-engineer this thing, and waste a bunch of time, or will someone find this worthwhile?
Edit:
Trying to explain why this is different from other approaches:
1) Multiple paths are possible (executing two or more cases), which is harder with the dictionary method.
2) Can check comparisons other than "equals" (such as case(less_than(1000))).
3) More readable than the dictionary method, and possibly than the if/elif method.
4) Can track how many True cases there were.
5) Can limit how many True cases are permitted (i.e. execute the first 2 True cases of...).
6) Allows for a default case.
Here's a more elaborate example:
from switch_case import switch, case, between
x = 12
switch(x, limit=1)  # only execute the FIRST True case
if case(between(10, 100)):  # note the "between" case function
    print("%d has two digits." % x)
if case(*range(0, 100, 2)):  # note that this is an if, not an elif!
    print("%d is even." % x)  # doesn't get executed for 2-digit numbers,
                              # because limit is 1; previous case was True.
if case():
    print("Nothing interesting to say about %d" % x)
# Running this program produces this output:
# 12 has two digits.
Here's an example attempting to show how switch_case can be more clear and concise than conventional if/else:
# conventional if/elif/else:
if (status_code == 2 or status_code == 4 or (11 <= status_code < 20)
        or status_code == 32):
    [block of code]
elif status_code == 25 or status_code == 45:
    [block of code]
if status_code <= 100:
    [block can get executed in addition to above blocks]
# switch_case alternative (assumes import already)
switch(status_code)
if case(2, 4, between(11, 20), 32):  # significantly shorter!
    [block of code]
elif case(25, 45):
    [block of code]
if case(le(100)):
    [block can get executed in addition to above blocks]
The big savings is in long if statements where the same switch is repeated over and over. Not sure how frequent a use case that is, but there seem to be certain cases where this makes sense.
The example file on github has even more examples.
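To make the mechanics concrete, here is a hypothetical, minimal sketch of how a switch()/case() pair like this can work: switch() stores the value in module-level state and each case() call consults it. This is not the code from the GitHub module; the names mirror the question, and between() is assumed to be half-open, matching the 11 <= status_code < 20 comparison above. It also shows the hidden global state that the answers object to.

```python
# Hypothetical sketch: module-level state shared by switch() and case().
_state = {"value": None, "matched": 0, "limit": None}


def switch(value, limit=None):
    """Remember the value to test and reset the match counter."""
    _state.update(value=value, matched=0, limit=limit)


def between(low, high):
    """Predicate for low <= value < high (assumed half-open)."""
    return lambda value: low <= value < high


def case(*candidates):
    """True if the switched value equals, or satisfies, any candidate.

    Callables are treated as predicates; other candidates are compared with ==.
    With no candidates, acts as a default that fires only if nothing matched.
    """
    if _state["limit"] is not None and _state["matched"] >= _state["limit"]:
        return False
    value = _state["value"]
    if candidates:
        hit = any(c(value) if callable(c) else c == value for c in candidates)
    else:
        hit = _state["matched"] == 0
    if hit:
        _state["matched"] += 1
    return hit


x = 12
switch(x, limit=1)
if case(between(10, 100)):
    print("%d has two digits." % x)
if case(*range(0, 100, 2)):
    print("%d is even." % x)  # skipped: the limit of 1 was already reached
if case():
    print("Nothing interesting to say about %d" % x)
```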
So, my questions are: Is this a worthwhile creation?
No.
Do you have any suggestions for making it better?
Yes. Don't bother. What has it saved? Seriously? You have actually made the code more obscure by removing the variable x from each elif condition. Also, by replacing the obvious elif with if, you have created intentional confusion for all Python programmers, who will now think that the cases are independent.
This creates confusion.
The big savings is in long if statements where the same switch is repeated over and over. Not sure how frequent a use case that is, but there seem to be certain cases where this makes sense.
No. It's very rare, very contrived and very hard to read. Seeing the actual variable(s) involved is essential. Eliding the variable name makes things intentionally confusing. Now I have to go find the owning switch() function to interpret the case.
When there are two or more variables, this completely collapses.
There have been a plethora of discussions of this issue on Stack Overflow. You can use the site search to find some of the other discussions.
However, I fail to see how your solution is better than a basic dictionary:
def switch(x):
    return {
        1: 1,
        2: 2,
    }[x]
Admittedly, adding a default clause takes a little more than plain indexing with this method. However, your example seems to replicate a complex if/else statement anyway? I'm not sure I would include an external library for this.
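One common way to get a default out of the dictionary approach is to store functions as values and look them up with dict.get, which accepts a fallback. A sketch with made-up handler names:

```python
def on_one():
    return "got one"


def on_two():
    return "got two"


def on_default():
    return "something else"


HANDLERS = {1: on_one, 2: on_two}


def switch(x):
    # dict.get returns on_default when x is not a key; then call the handler
    return HANDLERS.get(x, on_default)()


print(switch(2))   # got two
print(switch(42))  # something else
```

Storing callables rather than results also means only the chosen branch does any work.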
IMHO, the main reason for the switch statement to exist is so it can be translated/compiled into a (very fast) jump table. How would your proposed implementation accomplish that goal? Python's dictionaries do it today, as other posters have shown.
Secondarily, I guess a switch statement might read more clearly than the alternatives in some languages, but in Python's case I think if/elif/else wins on clarity.
from pyswitch import Switch  # pyswitch can be found on PyPI
myswitch = Switch()
@myswitch.case(42)
def case42(value):
    print("I got 42!")
@myswitch.case(range(10))
def caseRange10(value):
    print("I got a number from 0-9, and it was %d!" % value)
@myswitch.caseIn('lo')
def caseLo(value):
    print("I got a string with 'lo' in it; it was '%s'" % value)
@myswitch.caseRegEx(r'\b([Pp]y\w*)\b')
def caseReExPy(matchOb):
    print(r"I got a string that matched the regex '\b[Pp]y\w*\b', and the match was '%s'" % matchOb.group(1))
@myswitch.default
def caseDefault(value):
    print("Hey, default handler here, with a value of %r." % value)
myswitch(5)  # prints: I got a number from 0-9, and it was 5!
myswitch('foobar')  # prints: Hey, default handler here, with a value of 'foobar'.
myswitch('The word is Python')  # prints: I got a string that matched the regex '\b[Pp]y\w*\b', and the match was 'Python'
You get the idea. Why? Yep, dispatch tables are the way to go in Python. I just got tired of writing them over and over, so I wrote a class and some decorators to handle it for me.
I have always just used dictionaries, if/elses, or lambdas for my switch-like statements. Reading through your code, though =)
Docs: Python Design and History FAQ, "Why isn't there a switch or case statement in Python?"
Update 2021: match-case introduced in Python 3.10
This hotly debated topic can now be closed.
In fact, Python 3.10, released in October 2021, introduces structural pattern matching, which brings a match-case construct to the language.
See this related answer for details.
Consider:
somedict = {'foo':[4], 'mer':[2, 9, 0]}
key = 'bar'
value = 5
We could safely append to a list stored in a dictionary in either of the following ways:
Being cautious, we always check whether the list exists before appending to it.
if not somedict.has_key(key):  # Python 2 only; Python 3 spells this: key not in somedict
    somedict[key] = []
somedict[key].append(value)
Being direct, we simply clean up if there is an exception.
try:
    somedict[key].append(value)
except KeyError:
    somedict[key] = [value]
In both cases, the result could be:
{'foo':[4], 'mer':[2, 9, 0], 'bar':[5]}
To restate my question: In simple instances like this, is it better (in terms of style, efficiency, & philosophy) to be cautious or direct?
What you'll find is that your option 1 "being cautious" is often remarkably slow. Also, it's subject to obscure errors because the test you tried to write to "avoid" the exception is incorrect.
What you'll find is that your option 2 "being direct" is often much faster. It's also more likely to be correct, as well as faster and easier for people to read.
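Why? Internally, Python often implements things like "contains" or "has_key" as an exception test. The generic collections.abc.Mapping mixin, for example, implements __contains__ with exactly this kind of try/except; a has_key written the same way looks like this: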
def has_key(self, some_key):
    try:
        self[some_key]
    except KeyError:
        return False
    return True
Since this is typically how a has_key type of method is implemented, there's no reason for your code to waste time doing this in addition to what Python will already do.
More fundamentally, there's a correctness issue. Many attempts to prevent or avoid an exception are incomplete or incorrect.
For example, trying to establish whether a string is potentially a floating-point number is fraught with numerous exceptions and special cases. About the only way to do it correctly is this:
try:
    x = float(some_string)
except ValueError:
    # not a floating-point value
    pass
Just do the algorithm without worrying about "preventing" or "avoiding" exceptions.
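Wrapped as a reusable predicate, that approach looks like the sketch below (is_float is a hypothetical helper name). The sample inputs show why hand-written validation is hard: float() also accepts exponents, surrounding whitespace, and strings such as "nan" and "-inf".

```python
def is_float(text: str) -> bool:
    """Return True if float() can parse text, letting float() define the rules."""
    try:
        float(text)
    except ValueError:
        return False
    return True


for sample in ["3.14", "1e5", " 2.5 ", "nan", "-inf", "1.2.3", ""]:
    print(repr(sample), is_float(sample))
```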
In the general case, EAFP ("easier to ask for forgiveness than permission") is preferred in Python. Of course, the rule of thumb "exceptions should be for exceptional cases" still holds (if you expect an exception to occur frequently, you probably should "look before you leap"), i.e. it depends. Efficiency-wise, it shouldn't make much of a difference in most cases; when it does, consider that try blocks without exceptions are cheap, while conditions are always checked.
Note that in some cases, including this example, you don't need either (at least not explicitly, by yourself): here, you should just use collections.defaultdict.
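A sketch of the defaultdict version of the question's example, using the same keys and values:

```python
from collections import defaultdict

# Missing keys are created on first access, with list() as the initial value
somedict = defaultdict(list, {'foo': [4], 'mer': [2, 9, 0]})
key = 'bar'
value = 5
somedict[key].append(value)  # no membership check and no try/except needed

print(dict(somedict))  # {'foo': [4], 'mer': [2, 9, 0], 'bar': [5]}
```

One caveat: merely reading somedict[k] for a missing key also inserts an empty list, so use k in somedict or .get() when you only want to look.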
You don't need a strong, compelling reason to use exceptions--they're not very expensive in Python. Here are some possible reasons to prefer one or the other, for your particular example:
The exception version requires a simpler API. Any container that supports item lookup and assignment (__getitem__ and __setitem__) will work. The non-exception version additionally requires that has_key be implemented.
The exception version may be slightly faster if the key usually exists, since it only requires a single dict lookup. The has_key version requires at least two--one for has_key and one for the actual lookup.
The non-exception version has a more consistent code path: it always puts the value in the list in the same place. By comparison, the exception version has a separate code path for each case.
Unless performance is particularly important (in which case you'd be benchmarking and profiling), none of these are very strong reasons; just use whichever seems more natural.
try is fast enough, except (if it happens) may not be. If the average length of those lists is going to be 1.1, use the check-first method. If it's going to be in the thousands, use try/except. If you are really worried, benchmark the alternatives.
Make sure you are benchmarking the best alternatives. d.has_key(k) is a slow old has-been; you don't need the attribute lookup and the function call. Use k in d instead. Also use else to save a wasted append on the first trip:
Instead of:
if not somedict.has_key(key):
    somedict[key] = []
somedict[key].append(value)
do this:
if key in somedict:
    somedict[key].append(value)
else:
    somedict[key] = [value]
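Following the advice to benchmark, here is one way to compare the two approaches with the standard-library timeit module. The workloads (mostly new keys versus mostly repeated keys) and function names are illustrative assumptions, and timings vary by machine and Python version, so no numbers are claimed here:

```python
import timeit


def group_with_try(keys):
    """Append each key to its list, creating the list on KeyError."""
    d = {}
    for k in keys:
        try:
            d[k].append(k)
        except KeyError:
            d[k] = [k]
    return d


def group_with_in(keys):
    """Append each key to its list, checking membership first."""
    d = {}
    for k in keys:
        if k in d:
            d[k].append(k)
        else:
            d[k] = [k]
    return d


# Mostly new keys (short lists, many exceptions) vs. mostly repeated keys
workloads = {
    "mostly new keys": list(range(10_000)),
    "mostly repeated": [i % 10 for i in range(10_000)],
}

for label, keys in workloads.items():
    for func in (group_with_try, group_with_in):
        seconds = timeit.timeit(lambda: func(keys), number=20)
        print(f"{label:16} {func.__name__:15} {seconds:.4f}s")
```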
You can use setdefault for this specific case:
somedict.setdefault(key, []).append(value)
See here: http://docs.python.org/library/stdtypes.html#mapping-types-dict
It depends. For example, if the key is a parameter of a function that will be used by another programmer, I would use the second approach, because I can't control the input, and the exception information is actually useful for a programmer. But if it's just a process inside a function and the key is just some input from a database, for example, the first approach is better, because if something goes wrong, showing the exception information may not be helpful at all. Use the exception approach if you want to do something with the exception information.
EAFP is a good habit to get into in Python.
One reason is that it avoids the race condition between the check and the action if someone wants to use your code in a multithreaded app.
The Zen of Python says "Explicit is better than implicit". Yet the "pythonic" way to check for emptiness is using implicit booleaness:
if not some_sequence:
    some_sequence.fill_sequence()
This will be true if some_sequence is an empty sequence, but also if it is None or 0.
Compare with a theoretical explicit emptiness check:
if some_sequence is Empty:
    some_sequence.fill_sequence()
With an unfavorably chosen variable name, using implicit booleaness to check for emptiness gets even more confusing:
if saved:
    mess_up()
Compare with:
if saved is not Empty:
    mess_up()
See also: "Python: What is the best way to check if a list is empty?". I find it ironic that the most voted answer claims that implicit is pythonic.
So is there a higher reason why there is no explicit emptiness check, like for example is Empty in Python?
Polymorphism in if foo: and if not foo: isn't a violation of "implicit vs explicit": it explicitly delegates to the object being checked the task of knowing whether it's true or false. What that means (and how best to check it) obviously does and must depend on the object's type, so the style guide mandates the delegation; having application-level code arrogantly assert that it knows better than the object would be the height of folly.
Moreover, X is Whatever always, invariably means that X is exactly the same object as Whatever. Making a totally unique exception for Empty or any other specific value of Whatever would be absurd; it's hard to imagine a more unPythonic approach. And "being exactly the same object" is obviously transitive, so you could no longer have distinct empty lists, empty sets, empty dicts... congratulations, you've just designed a completely unusable and useless language, where every empty container crazily "collapses" to a single empty container object (just imagine the fun when somebody tries to mutate an empty container...?!).
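To illustrate that delegation, each class decides its own truth value: defining __len__ is enough for if not obj: to mean "empty", while __bool__ lets a class define truthiness that has nothing to do with size (and __bool__ wins when both are defined). The classes below are made up for illustration:

```python
class Inventory:
    """A container whose truthiness comes from __len__."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Account:
    """An object whose truthiness is defined directly by __bool__."""

    def __init__(self, balance):
        self.balance = balance

    def __bool__(self):
        # Here "truthy" means "has money", which is not a length at all
        return self.balance > 0


if not Inventory():
    print("inventory is empty")  # falls back to __len__() == 0
if Account(10):
    print("account has funds")
```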
The reason why there is no is Empty is astoundingly simple once you understand what the is operator does.
From the Python manual:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.
That means some_sequence is Empty checks whether some_sequence is the same object as Empty. That cannot work the way you suggested.
Consider the following example:
>>> a = []
>>> b = {}
Now let's pretend there is this is Empty construct in python:
>>> a is Empty
True
>>> b is Empty
True
But since the is operator does identity check that means that a and b are identical to Empty. That in turn must mean that a and b are identical, but they are not:
>>> a is b
False
So to answer your question "why is there no is Empty in python?": because is does identity check.
In order to have the is Empty construct you must either hack the is operator to mean something else or create some magical Empty object which somehow detects empty collections and is then identical to them.
Rather than asking why there is no is Empty you should ask why there is no builtin function isempty() which calls the special method __isempty__().
So instead of using implicit booleaness:
if saved:
    mess_up()
we have explicit empty check:
if not isempty(saved):
    mess_up()
where the class of saved has an __isempty__() method implemented to some sane logic.
I find that far better than using implicit booleaness for an emptiness check.
Of course you can easily define your own isempty() function:
def isempty(collection):
    try:
        return collection.__isempty__()
    except AttributeError:
        # fall back to implicit booleaness but check for common pitfalls
        if collection is None:
            raise TypeError('None cannot be empty')
        if collection is False:
            raise TypeError('False cannot be empty')
        if collection == 0:
            raise TypeError('0 cannot be empty')
        # an empty collection is falsy, so "empty" means "not truthy"
        return not collection
and then define an __isempty__() method which returns a boolean for all your collection classes.
I agree that sometimes if foo: isn't explicit for me when I really want to tell the reader of the code that it's emptiness I'm testing. In those cases, I use if len(foo):. Explicit enough.
I 100% agree with Alex w.r.t is Empty being unpythonic.
Consider that Lisp has for many years used the empty list () or its symbol NIL as false, and T or anything other than NIL as true; generally, the computation of a truth value has already produced some useful result that need not be recomputed. Look also at the partition method of strings, whose middle result works very nicely as a while-loop condition under the "non-empty is true" convention.
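A sketch of that partition idiom (split_all is a made-up name, and str.split already does this job; the point is only the loop shape): the separator slot returned by str.partition is an empty string once no separator is left, so it can drive the loop directly.

```python
def split_all(text, separator):
    """Split text on separator, using partition's middle result as the loop test."""
    head, sep, rest = text.partition(separator)
    parts = [head]
    while sep:  # empty string (falsy) once no separator remains
        head, sep, rest = rest.partition(separator)
        parts.append(head)
    return parts


print(split_all("a,b,c", ","))  # ['a', 'b', 'c']
```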
I generally try to avoid using len in tight loops, as the repeated calls can add noticeable overhead there. It is often worthwhile to keep track of the length of a result in the program logic instead of recalculating it.
Personally, I would prefer Python to use () or [] as False instead of 0, but that is how it is. Then it would be more natural to use not [] as "not empty". But as things stand, () is not [] is True (two different empty containers are never the same object), so you could use:
emptyset = set([])
if myset == emptyset:
    ...
if you want to be explicit about the empty-set case (not myset is set([]), which would never be true).
I myself quite like if not myset, as my commenter does.
It now occurs to me that this may be the closest thing to an explicit not_empty:
if any(x in myset for x in myset): print("set is not empty")
and the corresponding is-empty check would be:
if not any(x in myset for x in myset): print("set is empty")
There is an explicit emptiness check for containers in Python. It is spelled not. What's implicit there? not gives True when the container is empty, and False when it is nonempty.
What exactly do you object to? A name? As others have told you, it's certainly better than is Empty. And it's not so ungrammatical: considering how things are usually named in Python, we might imagine a sequence called widgets, containing, surprisingly, some widgets. Then,
if not widgets:
can be read as "if there are no widgets...".
Or do you object to the length? Explicit doesn't mean verbose; those are two different concepts. Python does not spell addition as a method call, it has the + operator, which is completely explicit if you know the type you're applying it to. The same goes for not.