Embedded if statements - python

Suppose I have a function like the following:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
if k[:pos_qu]==selection[:pos_qu]
and (k[pos_qu+1:]==selection[pos_qu+1:] if pos_qu!=1)
and k[pos_qu] not in alphabet.values()]
I want to make the second condition, namely k[pos_qu+1:]==selection[pos_qu+1:], dependent on another if statement, if pos_qu!=1. I tried (as shown above) wrapping the two together in parentheses, but Python flags a syntax error at the parentheses.

If I understand your requirement correctly, you only want to check k[pos_qu+1:]==selection[pos_qu+1:] if the condition pos_qu!=1 is also met. You can rephrase that as the following condition:
pos_qu==1 or k[pos_qu+1:]==selection[pos_qu+1:]
Putting this into your comprehension:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
if k[:pos_qu]==selection[:pos_qu]
and (pos_qu==1 or k[pos_qu+1:]==selection[pos_qu+1:])
and k[pos_qu] not in alphabet.values()]
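A small self-contained check of that rewrite might look like the following. The sample values for dict_bigrams, alphabet and selection are invented for illustration (the question does not show them), and the wrapper function matching_bigrams is a hypothetical name; only the condition itself comes from the answer above.

```python
# Hypothetical sample data; the asker's real dict_bigrams, alphabet and
# selection are not shown in the question.
dict_bigrams = {"a?c": 3, "a?d": 1, "abc": 7, "?bc": 4}
alphabet = {1: "a", 2: "b", 3: "c", 4: "d"}


def matching_bigrams(selection, pos_qu):
    return [(k, v) for (k, v) in dict_bigrams.items()
            if k[:pos_qu] == selection[:pos_qu]
            # the tail is only compared when pos_qu != 1
            and (pos_qu == 1 or k[pos_qu + 1:] == selection[pos_qu + 1:])
            and k[pos_qu] not in alphabet.values()]


print(matching_bigrams("a?c", 1))  # tail check skipped: [('a?c', 3), ('a?d', 1)]
print(matching_bigrams("?bc", 0))  # tail check applied: [('?bc', 4)]
```

With pos_qu equal to 1, "a?d" is kept even though its tail differs from the selection, which is the behaviour the question asks for.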

Whenever you find yourself with a complex list comprehension, trying to figure out how to do something complicated and not knowing how, the answer is usually to break things up. Expression syntax is inherently more limited than full statement (or multi-statement suite) syntax in Python, to prevent you from writing things that you won't be able to read later. Usually, that's a good thing—and, even when it isn't, you're better off going along with it than trying to fight it.
In this case, you've got a trivial comprehension, except for the if clause, which you don't know how to write as an expression. So, I'd turn the condition into a separate function:
def isMyKindOfKey(k):
    … condition here
[(k,v) for (k,v) in dict_bigrams.items() if isMyKindOfKey(k)]
This lets you use full multi-statement syntax for the condition. It also lets you give the condition a name (hopefully something better than isMyKindOfKey); makes the parameters, local values captured by the closure, etc. more explicit; lets you test the function separately or reuse it; etc.
In cases where the loop itself is the non-trivial part (or there's just lots of nesting), it usually makes more sense to break up the entire comprehension into an explicit for loop and append, but I don't think that's necessary here.
It's worth noting that in this case—as in general—this doesn't magically solve your problem, it just gives you more flexibility in doing so. For example, you can use the same transformation from postfix if to infix or that F.J suggests, but you can also leave it as an if, e.g., like this:
def isMyKindOfKey(k):
    retval = k[:pos_qu]==selection[:pos_qu]
    if pos_qu!=1:
        retval = retval and (k[pos_qu+1:]==selection[pos_qu+1:])
    retval = retval and (k[pos_qu] not in alphabet.values())
    return retval
That probably isn't actually the way I'd write this, but you can see how this is a trivial way to transform what's in your head into code, which would be very hard to do in an expression.
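For comparison, here is one possible variant of the same predicate that uses early returns and takes its inputs as explicit parameters instead of capturing them. This is only a sketch: the name is_my_kind_of_key, the parameter letters, and the sample data are assumptions, not anything from the question.

```python
def is_my_kind_of_key(k, selection, pos_qu, letters):
    """Return True if key k matches selection around position pos_qu."""
    if k[:pos_qu] != selection[:pos_qu]:
        return False
    # The tail only matters when pos_qu != 1.
    if pos_qu != 1 and k[pos_qu + 1:] != selection[pos_qu + 1:]:
        return False
    return k[pos_qu] not in letters


# Hypothetical sample data, invented for this example.
dict_bigrams = {"a?c": 3, "a?d": 1, "abc": 7, "?bc": 4}
letters = set("abcd")

bigrams = [(k, v) for k, v in dict_bigrams.items()
           if is_my_kind_of_key(k, "a?c", 1, letters)]
print(bigrams)  # [('a?c', 3), ('a?d', 1)]
```

Because the predicate is an ordinary function, it can be tested on single keys without building the whole list.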

Just change the order:
bigrams=[(k,v) for (k,v) in dict_bigrams.items()
if k[:pos_qu]==selection[:pos_qu] #evaluated first
and pos_qu!=1 #if true continue and evaluate this next
and (k[pos_qu+1:]==selection[pos_qu+1:])] #if pos_qu != 1 lastly eval this
As the comment mentions, this is not a very pythonic list comprehension and would be much more readable as a standard for loop. Note also that, because the conditions are joined with and, nothing is selected at all when pos_qu is 1.

Related

Zen of Python 'Explicit is better than implicit'

I'm trying to understand what 'implicit' and 'explicit' really means in the context of Python.
a = []
# my understanding is that this is implicit
if not a:
    print("list is empty")
# my understanding is that this is explicit
if len(a) == 0:
    print("list is empty")
I'm trying to follow the Zen of Python rules, but I'm curious to know if this applies in this situation or if I am over-thinking it?
The two statements have very different semantics. Remember that Python is dynamically typed.
For the case where a = [], both not a and len(a) == 0 are equivalent. A valid alternative might be to check not len(a). In some cases, you may even want to check for both emptiness and listness by doing a == [].
But a can be anything. For example, a = None. The check not a is fine, and will return True. But len(a) == 0 will not be fine at all. Instead you will get TypeError: object of type 'NoneType' has no len(). This is a totally valid option, but the if statements do very different things and you have to pick which one you want.
(Almost) everything can be tested for truth in Python (via __bool__, falling back to __len__), but not everything has __len__. You have to decide which one to use based on the situation. Things to consider are:
Have you already verified whether a is a sequence?
Do you need to?
Do you mind if your if statement crashes on non-sequences?
Do you want to handle other falsy objects as if they were empty lists?
Remember that making the code look pretty takes second place to getting the job done correctly.
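To make the difference concrete, the following sketch runs both checks against a few values. The helper name describe is made up for this example; it simply reports what each check does, including the TypeError raised by len() on objects that have no length.

```python
def describe(a):
    """Return (result of `not a`, result of `len(a) == 0` or the error text)."""
    by_truth = not a
    try:
        by_len = len(a) == 0
    except TypeError as exc:
        by_len = f"TypeError: {exc}"
    return by_truth, by_len


for value in ([], [1], "", None, 0):
    print(repr(value), describe(value))
```

For [] and "" the two checks agree. For None and 0 the truth test says "empty" while len() raises, so the choice between them decides whether non-sequences are silently accepted or rejected loudly.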
Though this question is old, I'd like to offer a perspective.
In a dynamic language, my preference would be to always describe the expected type and purpose of a variable in its name, to make the intent easier to understand. Then use knowledge of the language to be succinct and increase readability where possible (in Python, an empty list is falsy). Thus the code:
lst_colours = []
if not lst_colours:
    print("list is empty")
To convey meaning even better, use a named variable for very specific checks:
lst_colours = []
b_is_list_empty = not lst_colours
if b_is_list_empty:
    print("list is empty")
Checking whether a list is empty is a common thing to do several times in a code base, so it is even better to put such checks in a separate helper function library. This isolates common checks and reduces code duplication.
def b_is_list_empty (lst):
    ......
lst_colours = []
if b_is_list_empty(lst_colours):
    print("list is empty")
Most importantly, add as much meaning as possible, and have an agreed company standard for how to tackle the simple things, like variable naming and implicit/explicit code choices.
Try to think of:
if not a:
    ...
as shorthand for:
if len(a) == 0:
    ...
I don't think this is a good example of a gotcha with Python's Zen rule of "explicit" over "implicit". The first form is used mostly for readability. It's not that the second one is bad and the first is good. It's just that the first one is more idiomatic. If you understand the boolean nature of lists in Python, I think you'll find the first more readable, and readability counts in Python.

Python: Else statement with 2 for loops

I am searching for the pythonic way to do the following:
I have a list of keys and a list of objects.
For any key, something should be done with the first object that fits to that key.
If no object fits any key, so that nothing has been done at all, something different should be done instead.
I implemented this as follows and it is working properly:
didSomething = False
for key in keys:
    for obj in objects:
        if <obj fits to key>:
            doSomething(obj, key)
            didSomething = True
            break
if not didSomething:
    doSomethingDifferent()
But normally, if there is only one for-loop, you don't need such a temporary boolean to check whether something has been done or not. You can use a for-else statement instead. But this does not work with 2 for-loops, does it?
I have the feeling that there should be some better way to do this, but I don't get it. Do you have any ideas or is there no improvement?
Thank you :)
This doesn't really fit into the for/else paradigm, because you don't want to break the outer loop. So just use a variable to track whether something was done, as in your original code.
Instead of the second loop, use a single expression that finds the first matching object. See Python: Find in list for ways to do this.
didSomething = False
for key in keys:
    found = next((obj for obj in objects if <obj fits to key>), None)
    if found is not None:
        doSomething(found, key)
        didSomething = True
if not didSomething:
    doSomethingDifferent()
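A runnable version of this idea could look like the sketch below. Everything specific in it is an assumption: fits is a hypothetical stand-in for the question's "<obj fits to key>" test, and the two actions are passed in as callables (do_something, do_something_different) so the example can be exercised on its own.

```python
def fits(obj, key):
    # Hypothetical stand-in for "<obj fits to key>": the object starts with the key.
    return obj.startswith(key)


def run(keys, objects, do_something, do_something_different):
    did_something = False
    for key in keys:
        # First object that fits this key, or None if there is none.
        found = next((obj for obj in objects if fits(obj, key)), None)
        if found is not None:
            do_something(found, key)
            did_something = True
    if not did_something:
        do_something_different()


calls = []
fruits = ["apple", "avocado", "banana"]
run(["a", "b"], fruits,
    lambda obj, key: calls.append((obj, key)),
    lambda: calls.append("fallback"))
print(calls)  # [('apple', 'a'), ('banana', 'b')]
```

Comparing against None rather than testing truthiness matters if the objects themselves can be falsy (0, an empty string, and so on); None is only safe as the default if it can never be a real object in the list.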
Whenever you find yourself needing to break out of a nested loop, it’s usually hard to think through the details, and when you finish figuring it out, the answer is usually just that it’s impossible (or at least only possible with an explicit flag variable or an exception or something else that obscures your logic).
There's an easy answer to that (which I'll include below in case anyone finding this question by search has that problem), but that's not actually your problem. What you want to check is not "did I complete the loop normally", because you always complete the loop normally. What you want to check is "did I do something (in this case, call doSomething) one or more times".
That isn't really about the outer loop, unlike breaking out of the outer loop (which obviously is), so there's no syntax for it. You need to keep track of whether you did something one or more times, and the way you're already doing that is probably the simplest way.
In some cases, you can rearrange things to flatten or invert the loop, so you end up doing one thing with all of the currently-outer values one time and breaking out of that loop, in which case it is about looping again. But if that twists your logic up so much that it's no longer clear what's going on, that's not going to be an improvement. For example:
fits = set()
for key in keys:
    for obj in objects:
        if <obj fits to key>:
            fits.add((obj, key))
for obj, key in fits:
    do_something(obj, key)
if not fits:
    do_something_else()
This can be simplified:
fits = {(obj, key) for key in keys for obj in objects if <obj fits to key>}
for obj, key in fits:
    do_something(obj, key)
if not fits:
    do_something_else()
But, either way, notice that the way I avoided storing a flag saying whether you ever found a fit was by storing a set of all of the fits you found. For some problems, that's an improvement. But if that set could be very large, it's a terrible idea. And if that set just conceptually doesn't mean anything in your problem, it might obscure the logic instead of simplifying it.
If your problem were breaking out of a nested loop (which it isn't, but, again, it might be for someone else who finds this question by search), there’s always an easy answer to that: just take the whole nest of loops and refactor it into a function. Then you can break out at any level by just using return. If you didn’t return anywhere, the code after the loops will get run, while if you did return, it won't, just like an else.
So:
def fits():
    for key in keys:
        for obj in objects:
            if <obj fits to key>:
                doSomething(obj, key)
                return
    doSomethingDifferent()
fits()
I’m not sure whether breaking out of both loops is what you want. If it is, this does exactly what you want. If not, it doesn’t, but then I’m not sure what semantics you were looking for with the else (that is, when it should get run), so I don’t know how to explain how to do that.
Once you’ve done this, you may find the abstraction generalizes to more than one use in your code, so you can turn the function into something that takes parameters instead of using closure or global variables, and that returns a value or raises instead of calling one of two functions, and so on. But sometimes, this trivial local function is all you need.
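As a rough sketch of that generalization, the function below takes its inputs as parameters and returns the first matching pair (or None) instead of calling anything itself. The name first_fit and the choice of str.startswith as the fit test are assumptions made up for this example.

```python
def first_fit(keys, objects, fits):
    """Return the first (obj, key) pair for which fits(obj, key) is true, else None."""
    for key in keys:
        for obj in objects:
            if fits(obj, key):
                return obj, key  # leaves both loops at once
    return None


fruits = ["apple", "banana", "blueberry"]
print(first_fit(["x", "b"], fruits, str.startswith))  # ('banana', 'b')
print(first_fit(["z"], fruits, str.startswith))       # None
```

The caller then decides what "do something" and "do something different" mean, based on whether the result is None.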
There's no real way to simplify your code. It is, however, kind of confusing the way it's written. I would actually make it more verbose to make sure it's read properly:
def fit_objects_to_keys(objects, keys):
    for key in keys:
        for obj in objects:
            if <obj fits to key>:
                yield obj, key
                break
none_fit = True
for obj, key in fit_objects_to_keys(objects, keys):
    doSomething(obj, key)
    none_fit = False
if none_fit:
    doSomethingDifferent()
You may be able to simplify it further if you explain what <obj fits to key> actually does.
I agree with the comment that your code is fine as it is - but if you must flatten multiple for-loops into one (so that you can use the 'else' feature, for example, or the number of for-loops is itself variable), this is actually possible:
import itertools
for key, obj in itertools.product(keys, objects):
    if <obj fits to key>:
        doSomething(obj, key)
        break
else:
    doSomethingDifferent()

Pythonic way to add to a set and care about if it worked?

Often times I find that, when working with Pythonic sets, the Pythonic way seems to be absent.
For example, doing something like a dijkstra or a*:
openSet, closedSet = set(nodes), set(nodes)
while openSet:
    walkSet, openSet = openSet, set()
    for node in walkSet:
        for dest in node.destinations():
            if dest.weight() < constraint:
                if dest not in closedSet:
                    closedSet.add(dest)
                    openSet.add(dest)
This is a weakly contrived example, the focus is the last three lines:
if not value in someSet:
    someSet.add(value)
    doAdditionalThings()
Given the Python way of working with, for example, accessing/using values of a dict, I would have expected to be able to do:
try:
    someSet.add(value)
except KeyError:
    continue # well, that's ok then.
doAdditionalThings()
As a C++ programmer, my skin crawls a bit that I can't even do:
if someSet.add(value):
    # add wasn't blocked by the value already being present
    doAdditionalThings()
Is there a more Pythonic (and if possible more efficient) way to work with this sort of set-as-guard usage?
The add operation is not supposed to also tell you if the item was already in the set; it just makes sure it is in there after you add it. Or put another way, what you want is not "add an item and check if it worked"; you want to first check if the item is there, and if not, then do some special stuff. If all you wanted to do was add the item, you wouldn't do the check at all. There is nothing unpythonic about this pattern:
if item not in someSet:
    someSet.add(item)
    doStuff()
else:
    doOtherStuff()
It is true that the API could have been designed so that .add returned whether the item was already in there, but in my experience that's not a particularly common use case. Part of the point of sets is that you can freely add items without worrying about whether they were already in there (since adding an already-included item has no effect). Also, having .add return None is consistent with the general convention for Python builtin types that methods that mutate their arguments return None. It is really things like dict.setdefault (which gets an item but first adds it if it isn't there) that are the unusual case.
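If the "add and tell me whether it was new" behaviour is wanted in many places, one option is to wrap the check-then-add pattern in a small helper. This is just a sketch; add_if_absent is a made-up name, not part of the set API.

```python
def add_if_absent(items, value):
    """Add value to the set; return True only if it was not already present."""
    if value in items:
        return False
    items.add(value)
    return True


seen = set()
results = [add_if_absent(seen, n) for n in (1, 2, 1, 3, 2)]
print(results)  # [True, True, False, True, False]
print(seen)     # {1, 2, 3}
```

With such a helper the guard from the question reads as `if add_if_absent(closedSet, dest): openSet.add(dest)`, at the cost of one extra function call per item.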

Python dictionary instead of switch/case

I've recently learned that python doesn't have the switch/case statement. I've been reading about using dictionaries in its stead, like this for example:
values = {
value1: do_some_stuff1,
value2: do_some_stuff2,
valueN: do_some_stuffN,
}
values.get(var, do_default_stuff)()
What I can't figure out is how to apply this to do a range test. So instead of doing some stuff if value1=4 say, doing some stuff if value1<4. So something like this (which I know doesn't work):
values = {
if value1 <val: do_some_stuff1,
if value2 >val: do_some_stuff2,
}
values.get(var, do_default_stuff)()
I've tried doing this with if/elif/else statements. It works fine but it seems to go considerably slower compared to the situation where I don't need the if statements at all (which is maybe something obvious and inevitable). So here's my code with the if/elif/else statement:
if sep_ang(val1,val2,X,Y)>=ROI :
    main.removeChild(source)
elif sep_ang(val1,val2,X,Y)<=5.0:
    integral=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
    index=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
    print name,val1,val2,sep_ang(val1,val2,X,Y),integral,index
    print >> reg,'fk5;point(',val1,val2,')# point=cross text={',name,'}'
else:
    spectrum[0].getElementsByTagName("parameter")[0].setAttribute("free","0") #Integral
    spectrum[0].getElementsByTagName("parameter")[1].setAttribute("free","0") #Index
    integral=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
    index=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
    print name,val1,val2,sep_ang(val1,val2,X,Y),integral,index
    print >> reg,'fk5;point(',val1,val2,')# point=cross text={',name,'}'
This takes close to 5 min to check about 1500 values of sep_ang. Whereas if I don't want to use setAttribute() to change values in my xml file based on the value of sep_ang, I use this simple if/else:
if sep_ang(val1,val2,X,Y)>=ROI :
    main.removeChild(source)
else:
    print name,val1,val2,ang_sep(val1,val2,X,Y);print >> reg,'fk5;point(',val1,val2,')# point
Which only takes ~30sec. Again I know it's likely that adding that elif statement and changing values of that attribute inevitably increases the execution time of my code by a great deal, I was just curious if there's a way around it.
Edit:
Is the benefit of using bisect as opposed to an if/elif statement in my situation that it can check values over some range quicker than using a bunch of elif statements?
It seems like I'll still need to use elif statements. Like this for example:
from bisect import bisect
range=[10,100]
options='abc'
def func(val):
    return options[bisect(range, val)]
if func(val)=='a':
    do stuff
elif func(val)=='b':
    do other stuff
else:
    do other other stuff
So then my elif statements are only checking against a single value.
Thanks much for the help, it's greatly appreciated.
A dictionary is the wrong structure for this. The bisect examples show an example of this sort of range test.
Whilst the dictionary approach works well for single values, if you want ranges, if ... else if ... else if is probably the simplest approach.
If you're looking for a single value, this is a good match for a dictionary, since that is what dictionaries are for, but if you're looking for a range it doesn't work. You could do it with a dict using something like:
values = {
lambda x: x < 4: foo,
lambda x: x > 4: bar
}
and then loop through all the key-value pairs in the dictionary, passing your value to each key function and calling the corresponding value as a function if the key function returns true.
However, this wouldn't give you any benefit over a number of if statements and would be harder to maintain and debug. So don't do it, and just use if instead.
In that case you would use an if/then/else. You cannot do this with a switch, either.
The idea of a switch statement is that you have a value V that you test for identity against N possible outcomes. You can do this with an if-construct - however that would take O(N) runtime on average. The switch gives you constant O(1) every time.
This is obviously not possible for ranges (since they are not easily hashable) and thus you use if-constructs for these cases.
Example
if value1 <val: do_some_stuff1()
elif value2 >val: do_some_stuff2()
Note that this is actually smaller than trying to use a dictionary.
dict is not for doing this (nor is switch!).
A couple posters have suggested a dict with containment functions, but this is not the solution you want at all. It is O(n) (like an if statement), it doesn't really work (because you could have overlapping conditions), is unpredictable (because you do not know what order you will do the loop), and is much less clear than the equivalent if statement. The if statement is probably the way you want to go if you have a short, static-length list of conditions to apply.
If you have tons of conditions or if they could change as a result of your program, you want a different data structure. You could implement a binary tree or keep a sorted list and use the bisect module to find a value associated with the given range.
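A minimal sketch of the sorted-list approach with the standard bisect module is shown below. The breakpoints 10 and 100 mirror the ones used in the question's edit; the handler functions low, middle and high and the dispatch wrapper are hypothetical names for illustration.

```python
from bisect import bisect


# Hypothetical handlers, one per range.
def low():
    return "low"


def middle():
    return "middle"


def high():
    return "high"


breakpoints = [10, 100]          # must be kept sorted
handlers = [low, middle, high]   # always one more handler than breakpoints


def dispatch(value):
    # bisect() returns the number of breakpoints <= value,
    # which is exactly the index of the matching handler.
    return handlers[bisect(breakpoints, value)]()


print(dispatch(5), dispatch(10), dispatch(99.9), dispatch(100))
# low middle middle high
```

The lookup is O(log n) in the number of ranges, and the ranges can be changed at run time by editing the two lists, as long as they stay in step. For two or three fixed ranges, a plain if/elif chain is still simpler.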
I don't know of any practicable solution. If you want to go with the guess-what-it-does approach, though, you could do something like this:
obscure_switch = {
    lambda x: 1<x<6 : some_function,
    ...
}
[action() for condition,action in obscure_switch.iteritems() if condition(var)]
Finally figured out what to do!
So instead of using a bunch of elif statements I did this:
from bisect import bisect
range=[10,100]
options='abc'
def func(val):
    choose=str(options[bisect(range,val)])
    exec choose+"()"
def a():
    do_stuff
def b():
    do_other_stuff
def c():
    do_other_other_stuff
Not only does it work but it goes almost as fast as my original 4 line code where I'm not changing any values of things!

Using 'try' vs. 'if' in Python

Is there a rationale for deciding whether to use try or if when testing whether a variable has a value?
For example, there is a function that either returns a list or doesn't return a value. I want to check the result before processing it. Which of the following would be preferable, and why?
result = function()
if result:
    for r in result:
        # process items
or
result = function()
try:
    for r in result:
        # process items
except TypeError:
    pass
Related discussion:
Checking for member existence in Python
You often hear that Python encourages EAFP style ("it's easier to ask for forgiveness than permission") over LBYL style ("look before you leap"). To me, it's a matter of efficiency and readability.
In your example (say that instead of returning a list or an empty string, the function were to return a list or None), if you expect that 99 % of the time result will actually contain something iterable, I'd use the try/except approach. It will be faster if exceptions really are exceptional. If result is None more than 50 % of the time, then using if is probably better.
To support this with a few measurements:
>>> import timeit
>>> timeit.timeit(setup="a=1;b=1", stmt="a/b") # no error checking
0.06379691968322732
>>> timeit.timeit(setup="a=1;b=1", stmt="try:\n a/b\nexcept ZeroDivisionError:\n pass")
0.0829463709378615
>>> timeit.timeit(setup="a=1;b=0", stmt="try:\n a/b\nexcept ZeroDivisionError:\n pass")
0.5070195056614466
>>> timeit.timeit(setup="a=1;b=1", stmt="if b!=0:\n a/b")
0.11940114974277094
>>> timeit.timeit(setup="a=1;b=0", stmt="if b!=0:\n a/b")
0.051202772912802175
So, whereas an if statement always costs you, it's nearly free to set up a try/except block. But when an Exception actually occurs, the cost is much higher.
Moral:
It's perfectly OK (and "pythonic") to use try/except for flow control,
but it makes sense most when Exceptions are actually exceptional.
From the Python docs:
EAFP
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
Your function should not return mixed types (i.e. list or empty string). It should return a list of values or just an empty list. Then you wouldn't need to test for anything, i.e. your code collapses to:
for r in function():
    # process items
Please ignore my solution if the code I provide is not obvious at first glance and you have to read the explanation after the code sample.
Can I assume that the "no value returned" means the return value is None? If yes, or if the "no value" is False boolean-wise, you can do the following, since your code essentially treats "no value" as "do not iterate":
for r in function() or ():
    # process items
If function() returns something that's not True, you iterate over the empty tuple, i.e. you don't run any iterations. This is essentially LBYL.
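A tiny runnable illustration of the `or ()` idiom follows. The function here is a hypothetical stand-in that returns a list or, by falling off the end, None; total is likewise an invented name for the example.

```python
def function(give_items):
    # Hypothetical stand-in: returns a list, or implicitly None.
    if give_items:
        return [1, 2, 3]


def total(give_items):
    acc = 0
    # None (or any other falsy result) is replaced by an empty tuple,
    # so the loop body simply never runs.
    for r in function(give_items) or ():
        acc += r
    return acc


print(total(True))   # 6
print(total(False))  # 0
```

Note that this treats every falsy return value the same way, so it is only appropriate when "no value" and "empty" really should behave identically.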
Generally, the impression I've gotten is that exceptions should be reserved for exceptional circumstances. If the result is expected never to be empty (but might be, if, for instance, a disk crashed, etc), the second approach makes sense. If, on the other hand, an empty result is perfectly reasonable under normal conditions, testing for it with an if statement makes more sense.
I had in mind the (more common) scenario:
# keep access counts for different files
file_counts={}
...
# got a filename somehow
if filename not in file_counts:
    file_counts[filename]=0
file_counts[filename]+=1
instead of the equivalent:
...
try:
    file_counts[filename]+=1
except KeyError:
    file_counts[filename]=1
Which of the following would be more preferable and why?
Look Before You Leap is preferable in this case. With the exception approach, a TypeError could occur anywhere in your loop body and it'd get caught and thrown away, which is not what you want and will make debugging tricky.
(I agree with Brandon Corfman though: returning None for ‘no items’ instead of an empty list is broken. It's an unpleasant habit of Java coders that should not be seen in Python. Or Java.)
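The swallowed-error problem is easy to reproduce. In the sketch below, process_all is a made-up function whose loop body happens to raise a TypeError of its own (adding 1 to a string); the except clause that was only meant to cover "result is None" hides it.

```python
def process_all(result):
    processed = []
    try:
        for r in result:
            processed.append(r + 1)  # raises TypeError when r is a str
    except TypeError:
        pass  # intended only for the case where result is not iterable
    return processed


print(process_all(None))           # [] as intended
print(process_all([1, "two", 3]))  # [2]: the bad item is hidden and 3 is never processed
```

The second call stops silently halfway through, which is exactly the kind of failure that is hard to track down later.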
Your second example is broken - the code will never throw a TypeError exception since you can iterate through both strings and lists. Iterating through an empty string or list is also valid - it will execute the body of the loop zero times.
bobince wisely points out that wrapping the second case can also catch TypeErrors in the loop, which is not what you want. If you do really want to use a try though, you can test if it's iterable before the loop
result = function()
try:
    it = iter(result)
except TypeError:
    pass
else:
    for r in it:
        #process items
As you can see, it's rather ugly. I don't suggest it, but it should be mentioned for completeness.
As far as performance is concerned, using a try block for code that normally doesn’t raise exceptions is faster than using an if statement every time. So, the decision depends on the probability of exceptional cases.
As a general rule of thumb, you should never use try/except or any exception handling stuff to control flow. Even though behind the scenes iteration is controlled via the raising of StopIteration exceptions, you still should prefer your first code snippet to the second.
