Is a nested if statement or the use of and in a single if statement more efficient?
Basically, is:
if condition1:
    if condition2:
        # do stuff
or
if condition1 and condition2:
    # do stuff
more efficient, or are they similar in performance? And for the sake of readability, which should I choose?
Short answer: no, there is no real difference.
The dis module disassembles Python code into the bytecode that is executed by the interpreter. For both functions below, the generated bytecode is essentially the same.
Generally, though, the and form is preferred because it reads better and avoids extra nesting.
Code:
>>> def test_inner_if(condition1=True, condition2=True):
...     if condition1:
...         if condition2:
...             pass
...
>>> def test_and_if(condition1=True, condition2=True):
...     if condition1 and condition2:
...         pass
...
>>> import dis
>>> dis.dis(test_inner_if)
2 0 LOAD_FAST 0 (condition1)
2 POP_JUMP_IF_FALSE 8
3 4 LOAD_FAST 1 (condition2)
6 POP_JUMP_IF_FALSE 8
4 >> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> dis.dis(test_and_if)
2 0 LOAD_FAST 0 (condition1)
2 POP_JUMP_IF_FALSE 8
4 LOAD_FAST 1 (condition2)
6 POP_JUMP_IF_FALSE 8
3 >> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
The 2nd one is generally preferred. In terms of run time they won't make any noticeable difference, but in terms of readability (good code is easily readable) the second one is preferred.
Just go for readability
If you're trying to do something simple like x > 3 and x < 30, just use one if.
If your conditions are function calls, multiple ifs will probably be easier to read and easier to debug.
The best piece of advice I have gotten in terms of optimization is,
"find the slowest part of your code and optimize the heck out of that"
The context of the if statements matters. For example, in one case they may sit deep inside three or more for loops, where going for good optimization (and commenting your logic) would be worthwhile. In another case you may be determining whether to throw an error at the beginning of a function; there, readability is critical.
Secondly, how you optimize is important too. The interpreter sees both ways as equivalent, which means it is best to go for readability. One easy way to check is to use something like this:
import time
s = time.clock()
#your code here
print(time.clock() - s) #shows how long the code segment took to run
This can be an interesting experiment whenever you have an optimization question.
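For instance, here is a minimal timeit sketch of the two styles from the question (timeit gives steadier numbers than the time-based snippet above; the function names and the iteration count are just illustrative):
import timeit

def nested(condition1=True, condition2=True):
    if condition1:
        if condition2:
            pass

def combined(condition1=True, condition2=True):
    if condition1 and condition2:
        pass

print(timeit.timeit(nested, number=1000000))
print(timeit.timeit(combined, number=1000000))
On any recent CPython the two timings should come out practically identical, which matches the bytecode comparison above.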
Related
In other words, what exists behind the two asterisks? Is it simply multiplying the number x times or something else?
As a follow-up question, is it better to write 2**3 or 2*2*2. I'm asking because I've heard that in C++ it's better to not use pow() for simple calculations, since it calls a function.
If you're interested in the internals, I'd disassemble the instruction to get the CPython bytecode it maps to. Using Python3:
>>> import dis
>>> def test():
...     return 2**3
...
>>> dis.dis(test)
2 0 LOAD_CONST 3 (8)
3 RETURN_VALUE
OK, so that seems to have done the calculation right on entry, and stored the result. You get exactly the same CPython bytecode for 2*2*2 (feel free to try it). So, for the expressions that evaluate to a constant, you get the same result and it doesn't matter.
What if you want the power of a variable?
Now you get two different bits of bytecode:
>>> def test(n):
...     return n ** 3
...
>>> dis.dis(test)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (3)
6 BINARY_POWER
7 RETURN_VALUE
vs.
>>> def test(n):
...     return n * 2 * 2
...
>>> dis.dis(test)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (2)
6 BINARY_MULTIPLY
7 LOAD_CONST 1 (2)
10 BINARY_MULTIPLY
11 RETURN_VALUE
Now the question is of course, is the BINARY_MULTIPLY quicker than the BINARY_POWER operation?
The best way to try that is to use timeit. I'll use the IPython %timeit magic. Here's the output for multiplication:
%timeit test(100)
The slowest run took 15.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 163 ns per loop
and for power
The slowest run took 5.44 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 473 ns per loop
You may wish to repeat this for representative inputs, but empirically it looks like the multiplication is quicker (but note the mentioned caveat about the variance in the output).
If you want further internals, I'd suggest digging into the CPython code.
While the second one is a little bit faster for numbers, the advantage is small compared to the first one's advantage: readability. If you are going for speed, and you are pressured to make such optimizations, Python is probably not the language you should use.
Note: for values other than plain numbers, the operators dispatch to special methods:
a ** b translates to
a.__pow__(b)
whereas a * a * a evaluates left to right, i.e. as
(a.__mul__(a)).__mul__(a)
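To see that dispatch in action, here is a small sketch with a hypothetical class (Loud is made up for illustration) that logs which special method gets called:
class Loud(object):
    def __init__(self, value):
        self.value = value
    def __mul__(self, other):
        print("__mul__ called")
        return Loud(self.value * other.value)
    def __pow__(self, exponent):
        print("__pow__ called")
        return Loud(self.value ** exponent)

a = Loud(2)
a ** 3      # prints "__pow__ called" once
a * a * a   # prints "__mul__ called" twice, left to right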
Test Code:
import time

s = time.time()
for x in xrange(1, 1000000):
    x**5
print "done in ", time.time() - s

s = time.time()
for x in xrange(1, 1000000):
    x*x*x*x*x
print "done in ", time.time() - s
For my machine it yields:
done in 0.975429058075
done in 0.260419845581
[Finished in 1.2s]
Frankly speaking, multiplication is a bit faster.
>>> import timeit
>>> timeit.timeit('[i*i*i*i for i in range(100)]', number=10000)
0.262529843304
>>> timeit.timeit('[i**4 for i in range(100)]', number=10000)
0.31143438383
But speed isn't the only thing to consider when choosing between the two options. For example, what is easier when computing 2 to the power 20: simply writing 2**20, or using a for loop that iterates 20 times and does a multiplication on each pass?
The ** operator will, internally, use an iterative function (it has the same semantics as the built-in pow() (Python docs), which likely means it just calls that function anyway).
Therefore, if you know the power and can hardcode it, using 2*2*2 would likely be a little faster than 2**3. This has a little to do with the function call, but I believe the main performance cost is that it uses a loop.
Note that it's still quite silly to replace more readable code with less readable code for something as simple as 2**3; the performance gain is minimal at best.
From the docs:
The power operator binds more tightly than unary operators on its left; it binds less tightly than unary operators on its right. The syntax is:
power ::= primary ["**" u_expr]
Thus, in an unparenthesized sequence of power and unary operators, the operators are evaluated from right to left (this does not constrain the evaluation order for the operands): -1**2 results in -1.
The power operator has the same semantics as the built-in pow() function, when called with two arguments: it yields its left argument raised to the power of its right argument.
This means that, in Python: 2**2**3 is evaluated as 2**(2**3) = 2**8 = 256.
In mathematics, stacked exponents are applied from the top down. If it were not done this way you would just get multiplication of exponents:
(((2**3)**4)**5) = 2**(3*4*5)
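A few quick interactive checks of those precedence rules:
>>> 2 ** 2 ** 3        # right-associative: 2 ** (2 ** 3)
256
>>> (2 ** 2) ** 3      # forcing left grouping gives a different result
64
>>> -1 ** 2            # unary minus on the left binds less tightly: -(1 ** 2)
-1
>>> (-1) ** 2
1
>>> 2 ** -1            # but unary minus on the right binds more tightly
0.5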
It might be a little faster just to do the multiplication, but much less readable.
I have found a few links saying that switch cases are faster in C++ than if/else because they can be optimized at compile time. I then found some suggestions that using a dictionary may be faster than an if statement. However, most of those conversations are about someone's particular work and just end up concluding that they should optimize other parts of the code first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why this is?
Say I have 100 unique numbers that are going to be streamed into a Python program constantly. I want to check which number it is, then execute something. So I could either do a ton of if/else, or I could put each number in a dictionary. For argument's sake, let's say it's a single thread.
Does someone understand the layer between python and the low level execution that can explain how this is working?
Thanks :)
However, most of those conversations are about someone's particular work and just end up concluding that they should optimize other parts of the code first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why this is?
Generally, you should only bother to optimize code if you really need to, i.e. if the program's performance is unusably slow.
If this is the case, you should use a profiler to determine which parts are actually causing the most problems. For Python, the cProfile module is pretty good for this.
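As a hedged illustration (slow_function is just a made-up placeholder for your own code), profiling a call with cProfile looks roughly like this:
import cProfile

def slow_function():
    # placeholder workload
    total = 0
    for i in range(100000):
        total += i * i
    return total

# sort the report by cumulative time spent in each function
cProfile.run('slow_function()', sort='cumulative')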
Does someone understand the layer between python and the low level
execution that can explain how this is working?
If you want to get an idea of how your code executes, take a look at the dis module.
A quick example...
import dis

# Here are the things we might want to do
def do_something_a():
    print 'I did a'

def do_something_b():
    print 'I did b'

def do_something_c():
    print 'I did c'

# Case 1
def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()

# Case 2
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}

def f2(x):
    FUNC_MAP[x]()

# Show how the functions execute
print 'Case 1'
dis.dis(f1)

print '\n\nCase 2'
dis.dis(f2)
...which outputs...
Case 1
18 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 22
19 12 LOAD_GLOBAL 0 (do_something_a)
15 CALL_FUNCTION 0
18 POP_TOP
19 JUMP_FORWARD 44 (to 66)
20 >> 22 LOAD_FAST 0 (x)
25 LOAD_CONST 2 (2)
28 COMPARE_OP 2 (==)
31 POP_JUMP_IF_FALSE 44
21 34 LOAD_GLOBAL 1 (do_something_b)
37 CALL_FUNCTION 0
40 POP_TOP
41 JUMP_FORWARD 22 (to 66)
22 >> 44 LOAD_FAST 0 (x)
47 LOAD_CONST 3 (3)
50 COMPARE_OP 2 (==)
53 POP_JUMP_IF_FALSE 66
23 56 LOAD_GLOBAL 2 (do_something_c)
59 CALL_FUNCTION 0
62 POP_TOP
63 JUMP_FORWARD 0 (to 66)
>> 66 LOAD_CONST 0 (None)
69 RETURN_VALUE
Case 2
29 0 LOAD_GLOBAL 0 (FUNC_MAP)
3 LOAD_FAST 0 (x)
6 BINARY_SUBSCR
7 CALL_FUNCTION 0
10 POP_TOP
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
...so it's pretty easy to see which function has to execute the most instructions.
As for which is actually faster, that's something you'd have to check by profiling the code.
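For a rough idea, here is a self-contained timeit sketch of the two approaches (with silent stand-in functions so printing doesn't dominate the measurement; the exact numbers will vary by machine and Python version):
import timeit

setup = """
def a(): return 'a'
def b(): return 'b'
def c(): return 'c'

def via_if(x):
    if x == 1:
        return a()
    elif x == 2:
        return b()
    elif x == 3:
        return c()

FUNC_MAP = {1: a, 2: b, 3: c}

def via_dict(x):
    return FUNC_MAP[x]()
"""

print(timeit.timeit('via_if(3)', setup=setup, number=1000000))
print(timeit.timeit('via_dict(3)', setup=setup, number=1000000))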
The if/elif/else structure compares the key it was given to a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside the if block. This can take a long time, because so many checks (n/2 on average, for n possible values) have to be made for every lookup.
The reason that a sequence of if statements is more difficult to optimize than a switch statement is that the condition checks (what's inside the parens in C++) might conceivably change the state of some variable that's involved in the next check, so you have to do them in order. The restrictions on switch statements remove that possibility, so the order doesn't matter (I think).
Python dictionaries are implemented as hash tables. The idea is this: if you could deal with arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers that is indexed just by casting whatever your lookup value is to an integer and using that as the index. Lookup would be virtually instantaneous.
You can't do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which generates some integer, depending on the lookup value), then % your result with the length of your array to get an index within the bounds of that array. That way, lookup takes as much time as is needed to call the hash function once, take the modulus, and jump to an index. If the amount of different possible lookup values is large enough, the overhead of the hash function becomes negligible compared to those n/2 condition checks.
(Actually, since many different lookup values will inevitably map to the same index, it's not quite that simple. You have to check for and resolve possible conflicts, which can be done in a number of ways. Still, the gist of it is as described above.)
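As a toy illustration of that idea (this is just the gist described above, not how CPython's dict is actually implemented, and it ignores collision handling):
def do_a(): return 'a'
def do_b(): return 'b'

table_size = 8
buckets = [None] * table_size

def put(key, func):
    # hash, take the modulus, store at that index
    buckets[hash(key) % table_size] = (key, func)

def get(key):
    entry = buckets[hash(key) % table_size]
    if entry is not None and entry[0] == key:
        return entry[1]
    raise KeyError(key)

put(42, do_a)
put(7, do_b)
print(get(42)())   # 'a' -- one hash, one modulus, one index; no chain of comparisons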
I wish to yield from a generator only as many items are required.
In the following code
a, b, c = itertools.count()
I receive this exception:
ValueError: too many values to unpack
I've seen several related questions; however, I have zero interest in the remaining items from the generator. I only wish to receive as many as I ask for, without providing that quantity in advance.
It seems to me that Python determines the number of items you want, but then proceeds to try to read and store more than that number (generating ValueError).
How can I yield only as many items as I require, without passing in how many items I want?
Update0
To aid in understanding the approximate behaviour I believe should be possible, here's a code sample:
def unpack_seq(ctx, items, seq):
    for name in items:
        setattr(ctx, name, seq.next())

import itertools, sys

unpack_seq(sys.modules[__name__], ["a", "b", "c"], itertools.count())
print a, b, c
If you can improve this code please do.
Alex Martelli's answer suggests to me the byte op UNPACK_SEQUENCE is responsible for the limitation. I don't see why this operation should require that generated sequences must also exactly match in length.
Note that Python 3 has different unpack syntaxes which probably invalidate technical discussion in this question.
You need a deep bytecode hack - what you request cannot be done at Python source code level, but (if you're willing to commit to a specific version and release of Python) may be doable by post-processing the bytecode after Python has compiled it. Consider, e.g.:
>>> def f(someit):
...     a, b, c = someit()
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (someit)
3 CALL_FUNCTION 0
6 UNPACK_SEQUENCE 3
9 STORE_FAST 1 (a)
12 STORE_FAST 2 (b)
15 STORE_FAST 3 (c)
18 LOAD_CONST 0 (None)
21 RETURN_VALUE
>>>
As you see, the UNPACK_SEQUENCE 3 bytecode occurs without the iterator someit being given the least indication of it (the iterator has already been called!) -- so you must prefix it in the bytecode with a "get exactly N items" operation, e.g.:
>>> def g(someit):
...     a, b, c = limit(someit(), 3)
...
>>> dis.dis(g)
2 0 LOAD_GLOBAL 0 (limit)
3 LOAD_FAST 0 (someit)
6 CALL_FUNCTION 0
9 LOAD_CONST 1 (3)
12 CALL_FUNCTION 2
15 UNPACK_SEQUENCE 3
18 STORE_FAST 1 (a)
21 STORE_FAST 2 (b)
24 STORE_FAST 3 (c)
27 LOAD_CONST 0 (None)
30 RETURN_VALUE
where limit is your own function, easily implemented (e.g. via itertools.islice). So, the original 2-bytecode "load fast; call function" sequence (just before the unpack-sequence bytecode) must become this kind of 5-bytecode sequence, with a load-global bytecode for limit before the original sequence, and the load-const; call-function sequence after it.
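For reference, one possible limit helper built on itertools.islice (just a sketch of what such a function could look like, not necessarily the exact implementation meant here):
from itertools import islice

def limit(iterable, n):
    # Return a tuple of exactly the first n items of iterable,
    # or raise if the iterable runs out early.
    items = tuple(islice(iterable, n))
    if len(items) != n:
        raise ValueError("iterable yielded only %d of %d items" % (len(items), n))
    return items

# usage, matching g() above:
# a, b, c = limit(itertools.count(), 3)   # -> 0, 1, 2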
You can of course implement this bytecode munging in a decorator.
Alternatively (for generality) you could work on altering the function's original source, e.g. via parsing and alteration of the AST, and recompiling to byte code (but that does require the source to be available at decoration time, of course).
Is this worth it for production use? Absolutely not, of course -- ridiculous amount of work for a tiny "syntax sugar" improvement. However, it can be an instructive project to pursue to gain mastery in bytecode hacking, AST hacking, and other black magic tricks that you will probably never need but are surely cool to know when you want to move beyond the level of mere language wizard to that of world-class guru -- indeed I suspect that those who are motivated to become gurus are typically people who can't fail to yield to the fascination of such "language internals' mechanics"... and those who actually make it to that exalted level are the subset wise enough to realize such endeavors are "just play", and pursue them as a spare-time activity (mental equivalent of weight lifting, much like, say, sudoku or crosswords;-) without letting them interfere with the important tasks (delivering value to users by deploying solid, clear, simple, well-performing, well-tested, well-documented code, most often without even the slightest hint of black magic to it;-).
You need to make sure that the number of items on both sides matches. One way is to use islice from the itertools module:
import itertools
from itertools import islice

a, b, c = islice(itertools.count(), 3)
Python does not work the way you desire. In any assignment, like
a, b, c = itertools.count()
the right-hand side is evaluated first, before the left-hand side.
The right-hand side can not know how many items are on the left-hand side unless you tell it.
Or use a list comprehension, since you know the desired number of items:
ic = itertools.count()
a,b,c = [ic.next() for i in range(3)]
or even simpler:
a,b,c = range(3)
I encounter the following small annoying dilemma over and over again in Python:
Option 1:
Cleaner, but slower(?) if called many times, since a_list gets re-created on each call of do_something():
def do_something():
    a_list = ["any", "think", "whatever"]
    # read something from a_list
Option 2:
Uglier, but more efficient (spares re-creating a_list every time):
a_list = ["any", "think", "whatever"]

def do_something():
    # read something from a_list
    pass
What do you think?
What's ugly about it?
Are the contents of the list always constants, as in your example? If so: recent versions of Python (since 2.4) will optimise that by evaluating the constant expression and keeping the result, but only if it's a tuple. So you could change it to a tuple. Or you could stop worrying about small things like that.
Here's a list of constants and a tuple of constants:
>>> def afunc():
...     a = ['foo', 'bar', 'zot']
...     b = ('oof', 'rab', 'toz')
...     return
...
>>> import dis; dis.dis(afunc)
2 0 LOAD_CONST 1 ('foo')
3 LOAD_CONST 2 ('bar')
6 LOAD_CONST 3 ('zot')
9 BUILD_LIST 3
12 STORE_FAST 0 (a)
3 15 LOAD_CONST 7 (('oof', 'rab', 'toz'))
18 STORE_FAST 1 (b)
4 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
>>>
Never create something more than once if you don't have to. This is a simple optimization that can be done on your part, and I personally do not find the second example ugly at all.
Some may argue not to worry about optimizing little things like this but I feel that something this simple to fix should be done immediately. I would hate to see your application create multiple copies of anything that it doesn't need to simply to preserve an arbitrary sense of "code beauty". :)
Option 3:
def do_something(a_list=("any", "think", "whatever")):
    # read something from a_list
    pass
Option 3 compared to Option 1:
Both are equally readable in my opinion (though some seem to think differently in the comments! :-) ). You could even write Option 3 like this
def do_something(
        a_list=("any", "think", "whatever")):
    # read something from a_list
    pass
which really minimizes the difference in terms of readability.
Unlike Option 1, however, Option 3 defines a_list only once -- at the time when do_something is defined. That's exactly what we want.
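A quick way to convince yourself of that (id() is only used here to show that the default tuple is the very same object on every call):
def do_something(a_list=("any", "think", "whatever")):
    return id(a_list)

print(do_something() == do_something())   # True: the default is built once, at def time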
Option 3 compared to Option 2:
Avoid global variables if possible. Option 3 allows you to do that.
Also, with Option 2, over time or if other people maintain this code, the definition of a_list could get separated from def do_something. This may not be a big deal, but I think it is somewhat undesirable.
If your a_list doesn't change, move it out of the function.
You have some data
You have a method associated with it
You don't want to keep the data globally just for the sake of optimising the speed of the method unless you have to.
I think this is what classes are for.
class Processor:
    def __init__(this):
        this.data = "any thing whatever".split()

    def fun(this, arg):
        # do stuff with arg and the list
        pass

inst = Processor()
inst.fun("skippy")
Also, if you someday want to separate out the data into a file, you can just modify the constructor to do so.
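For example, a hedged sketch of that idea, where "words.txt" is a hypothetical file name and self is used in place of this:
class Processor:
    def __init__(self, path=None):
        if path is None:
            self.data = "any thing whatever".split()
        else:
            # load the words from a file instead
            with open(path) as f:
                self.data = f.read().split()

    def fun(self, arg):
        return arg in self.data

inst = Processor()                # default in-code data
# inst = Processor("words.txt")   # hypothetical file-backed variant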
Well, it seems it comes down to whether or not you initialize the list in the function:
import time

def fun1():
    a = ['any', 'think', 'whatever']
    sum = 0
    for i in range(100):
        sum += i

def fun2():
    sum = 0
    for i in range(100):
        sum += i

def test_fun(fun, times):
    start = time.time()
    for i in range(times):
        fun()
    end = time.time()
    print "Function took %s" % (end - start)

# Test
print 'warming up'
test_fun(fun1, 100)
test_fun(fun2, 100)

print 'Testing fun1'
test_fun(fun1, 100000)
print 'Testing fun2'
test_fun(fun2, 100000)

print 'Again'
print 'Testing fun1'
test_fun(fun1, 100000)
print 'Testing fun2'
test_fun(fun2, 100000)
and the results:
>python test.py
warming up
Function took 0.000604152679443
Function took 0.000600814819336
Testing fun1
Function took 0.597407817841
Testing fun2
Function took 0.580779075623
Again
Testing fun1
Function took 0.595198154449
Testing fun2
Function took 0.580571889877
Looks like there is no difference.
I've worked on automated systems that process 100,000,000+ records a day, where a 1% performance improvement is huge.
I learned a big lesson working on that system: Faster is better, but only when you know when it's fast enough.
A 1% improvement would have been a huge reduction in total processing time, but it wasn't enough to affect when we would need our next hardware upgrade. My application was so fast that the amount of time I spent trying to milk that last 1% probably cost more than a new server would have.
In your case, you would have to call do_something tens of thousands of times before it made a significant difference in performance. In some cases that will matter, in others it won't.
If the list is never modified, why do you use lists at all?
Without knowing your actual requirements, I'd recommend to simply use some if-statements to get rid of the list and the "read something from list" part completely.
I've seen two ways to create an infinite loop in Python:
while 1:
    do_something()
while True:
    do_something()
Is there any difference between these? Is one more pythonic than the other?
Fundamentally it doesn't matter; such minutiae don't really affect whether something is 'pythonic' or not.
If you're interested in trivia however, there are some differences.
The builtin boolean type didn't exist till Python 2.3 so code that was intended to run on ancient versions tends to use the while 1: form. You'll see it in the standard library, for instance.
The True and False builtins are not reserved words prior to Python 3 so could be assigned to, changing their value. This helps with the case above because code could do True = 1 for backwards compatibility, but means that the name True needs to be looked up in the globals dictionary every time it is used.
Because of the above restriction, the bytecode the two versions compile to is different in Python 2 as there's an optimisation for constant integers that it can't use for True. Because Python can tell when compiling the 1 that it's always non-zero, it removes the conditional jump and doesn't load the constant at all:
>>> import dis
>>> def while_1():
...     while 1:
...         pass
...
>>> def while_true():
...     while True:
...         pass
...
>>> dis.dis(while_1)
2 0 SETUP_LOOP 5 (to 8)
3 >> 3 JUMP_ABSOLUTE 3
6 POP_TOP
7 POP_BLOCK
>> 8 LOAD_CONST 0 (None)
11 RETURN_VALUE
>>> dis.dis(while_true)
2 0 SETUP_LOOP 12 (to 15)
>> 3 LOAD_GLOBAL 0 (True)
6 JUMP_IF_FALSE 4 (to 13)
9 POP_TOP
3 10 JUMP_ABSOLUTE 3
>> 13 POP_TOP
14 POP_BLOCK
>> 15 LOAD_CONST 0 (None)
18 RETURN_VALUE
So, while True: is a little easier to read, and while 1: is a bit kinder to old versions of Python. As you're unlikely to need to run on Python 2.2 these days or need to worry about the bytecode count of your loops, the former is marginally preferable.
The most pythonic way will always be the most readable. Use while True:
It doesn't really matter. Neither is hard to read or understand, though personally I'd always use while True, which is a bit more explicit.
More generally, a whole lot of while/break loops people write in Python could be something else. Sometimes I see people write i = 0; while True: i += 1 ..., which can be replaced with for i in itertools.count(), and people writing while True: foo = fun(); if foo is None: break, when this can be written as for foo in iter(fun, None). That form requires some learning, but it has less boilerplate and less opportunity for silly mistakes.
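For anyone unfamiliar with the two-argument form of iter(), here is a small self-contained sketch (read_record is a made-up stand-in for "a function that returns None when it runs out of data"):
records = iter([10, 20, 30])

def read_record():
    # returns the next record, or None when exhausted
    return next(records, None)

for value in iter(read_record, None):
    print(value)   # 10, 20, 30 -- no explicit while/break needed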
Neither.
Both of them mean I have to scan the code looking for the break, instead of being able to see the stop condition right where it belongs.
I try to avoid this kind of thing wherever possible, and if it's not possible, let the code speak for itself like this:
while not found_answer:
    check_number += 1
    if check_number == 42:
        found_answer = True
Edit: It seems that the word "avoid" above wasn't clear enough. Using a basically infinite loop and leaving it from somewhere within the loop (using break) should usually be avoided altogether. Sometimes that isn't possible. In that case, I like to use something like the code above, which, however, still represents the same concept – the above code is nothing more than a compromise – but at least, I can show the purpose of the loop at the beginning – just like I wouldn't call a function do_something_with_args(*args).
IMO the second option is more obvious.
If you could get rid of the while and write more compact code, that might be more pythonic.
For example:
# Get the even numbers in the range 1..10
# Version 1
l = []
n = 1
while 1:
    if n % 2 == 0: l.append(n)
    n += 1
    if n > 10: break
print l
# Version 2
print [i for i in range(1, 11) if i % 2 == 0]
# Version 3
print range(2, 11, 2)
I think this is mostly a matter of style. Both should be easily understandable as an infinite loop.
However, personally I prefer the second option. That's because it just takes a mental micro-step less to understand, especially for programmers without C background.
The first one will also work in those early Python versions where True is not yet defined.
If you have an algorithm that is supposed to terminate in a finite time, I would recommend this, which is always safer than while True:
maxiter = 1000
for i in xrange(maxiter):
    # your code
    # on success:
    break
else:
    # the algorithm has not finished in maxiter steps! do something accordingly
    pass
I believe the second expression is more explicit, and thus more pythonic.
This is only a matter of style; any programming beginner will understand either option.
But the second option will only work if True wasn't assigned to False, which was possible until Python 3:
>>> True = False
>>> True
False
The better way is "while True" with a conditional break out of the loop.