What does the power operator (**) in Python translate into?

In other words, what exists behind the two asterisks? Is it simply multiplying the number x times or something else?
As a follow-up question, is it better to write 2**3 or 2*2*2. I'm asking because I've heard that in C++ it's better to not use pow() for simple calculations, since it calls a function.

If you're interested in the internals, disassemble the function to see the CPython bytecode it maps to. Using Python 3:
>>> import dis
>>> def test():
...     return 2**3
...
>>> dis.dis(test)
  2           0 LOAD_CONST               3 (8)
              3 RETURN_VALUE
OK, so that seems to have done the calculation at compile time and simply stored the result as a constant. You get exactly the same CPython bytecode for 2*2*2 (feel free to try it). So, for expressions that evaluate to a constant, you get the same result and it doesn't matter which you write.
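You can check the folding yourself; a minimal sketch (the function names are mine, and the exact constant index varies across CPython versions):

import dis

def cube_pow():
    return 2 ** 3

def cube_mul():
    return 2 * 2 * 2

# Both disassemble to a single LOAD_CONST of 8: CPython's peephole
# optimizer folds constant expressions at compile time.
dis.dis(cube_pow)
dis.dis(cube_mul)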
What if you want the power of a variable?
Now you get two different bits of bytecode:
>>> def test(n):
...     return n ** 3
...
>>> dis.dis(test)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (3)
              6 BINARY_POWER
              7 RETURN_VALUE
vs.
>>> def test(n):
...     return n * 2 * 2
...
>>> dis.dis(test)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 BINARY_MULTIPLY
              7 LOAD_CONST               1 (2)
             10 BINARY_MULTIPLY
             11 RETURN_VALUE
Now the question is, of course: is BINARY_MULTIPLY quicker than BINARY_POWER? (Note that n * 2 * 2 computes 4*n rather than n**3; the comparison here is about the opcodes, not the arithmetic. Writing n * n * n would produce the same BINARY_MULTIPLY sequence, just with LOAD_FAST in place of LOAD_CONST.)
The best way to try that is to use timeit. I'll use the IPython %timeit magic. Here's the output for multiplication:
%timeit test(100)
The slowest run took 15.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 163 ns per loop
and for power
The slowest run took 5.44 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 473 ns per loop
You may wish to repeat this for representative inputs, but empirically it looks like the multiplication is quicker (but note the mentioned caveat about the variance in the output).
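Outside IPython, the same comparison can be scripted with the timeit module; a small sketch (numbers will vary by machine and Python version, and n * n * n is used here so that both expressions really compute the cube):

from timeit import timeit

setup = "n = 100"
print(timeit("n ** 3", setup=setup, number=10**6))     # BINARY_POWER
print(timeit("n * n * n", setup=setup, number=10**6))  # two BINARY_MULTIPLYs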
If you want further internals, I'd suggest digging into the CPython code.

While the second one is a little bit faster for numbers, the advantage is very small compared to the first one's advantage: readability. If you are going for speed, and you are pressured to make such micro-optimizations, then Python is probably not the language you should use.
Note: for values other than numbers:
a ** b translates to
a.__pow__(b)
whereas a * a * a evaluates left to right, i.e. it is a call to
(a.__mul__(a)).__mul__(a)
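You can watch the different special methods being hit with a toy class (illustrative only; Traced is not a real library class):

class Traced:
    def __init__(self, value):
        self.value = value

    def __pow__(self, other):
        print("__pow__ called")
        return Traced(self.value ** other.value)

    def __mul__(self, other):
        print("__mul__ called")
        return Traced(self.value * other.value)

a = Traced(2)
a ** a     # prints "__pow__ called" once
a * a * a  # prints "__mul__ called" twice: (a * a) * a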
Test Code:
import time

s = time.time()
for x in xrange(1, 1000000):
    x**5
print "done in ", time.time() - s

s = time.time()
for x in xrange(1, 1000000):
    x*x*x*x*x
print "done in ", time.time() - s
For my machine it yields:
done in 0.975429058075
done in 0.260419845581
[Finished in 1.2s]
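Note that the test above is Python 2 (xrange and the print statement). A rough Python 3 equivalent using timeit might look like this (a sketch; absolute numbers will differ):

from timeit import timeit

# range() in Python 3 is lazy, like Python 2's xrange()
print("done in", timeit("for x in range(1, 1000000): x**5", number=1))
print("done in", timeit("for x in range(1, 1000000): x*x*x*x*x", number=1))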

Frankly, multiplication is a bit faster.
>>> timeit.timeit('[i*i*i*i for i in range(100)]', number=10000)
0.262529843304
>>> timeit.timeit('[i**4 for i in range(100)]', number=10000)
0.31143438383
But speed isn't the only thing to consider when choosing between the two options. For example, which is easier when computing 2 to the power 20: simply writing 2**20, or a loop that iterates 20 times and multiplies as it goes?
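The readability gap is obvious once you write the loop out; a sketch:

# One obvious expression:
print(2 ** 20)    # 1048576

# Versus a hand-rolled loop doing the same multiplications:
result = 1
for _ in range(20):
    result *= 2
print(result)     # 1048576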

The ** operator will, internally, use an iterative function (with the same semantics as the built-in pow() (Python docs), which likely means it just calls that function anyway).
Therefore, if you know the power and can hardcode it, using 2*2*2 would likely be a little faster than 2**3. This has a little to do with the function call, but I believe the main performance cost is that it uses a loop.
Note that it's still quite silly to replace more readable code with less readable code for something as simple as 2**3; the performance gain is minimal at best.

From the docs:
The power operator binds more tightly than unary operators on its left; it binds less tightly than unary operators on its right. The syntax is:
power ::= primary ["**" u_expr]
Thus, in an unparenthesized sequence of power and unary operators, the operators are evaluated from right to left (this does not constrain the evaluation order for the operands): -1**2 results in -1.
The power operator has the same semantics as the built-in pow() function, when called with two arguments: it yields its left argument raised to the power of its right argument.
This means that, in Python: 2**2**3 is evaluated as 2**(2**3) = 2**8 = 256.
In mathematics, stacked exponents are applied from the top down. If it were not done this way you would just get multiplication of exponents:
(((2**3)**4)**5) = 2**(3*4*5)
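A quick demonstration of the associativity rules from the quoted docs:

print(2 ** 2 ** 3)    # 256: parsed right to left as 2 ** (2 ** 3)
print((2 ** 2) ** 3)  # 64:  parentheses force the other grouping
print(-1 ** 2)        # -1:  ** binds tighter than the unary minus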
It might be a little faster just to do the multiplication, but much less readable.

Related

What optimisations are done so that this code completes quickly?

I was solving a problem I came across: what is the sum of the powers of 3 from 0 to 2009, mod 8?
I got an answer using pen and paper, and tried to verify it with some simple python
print(sum(3**k for k in range(2010)) % 8)
I was surprised by how quickly it returned an answer. My question is what optimisations or tricks are used by the interpreter to get the answer so quickly?
None, it's just not a lot of computation for a computer to do.
Your code is equivalent to:
>>> a = sum(3**k for k in range(2010))
>>> a % 8
4
a is a 959-digit number - it's just not a large task to ask of a computer.
Try sticking two zeros on the end of the 2010 and you will see it taking an appreciable amount of time.
The only optimization at work is that each instance of 3**k is evaluated using a number of multiplications proportional to the number of bits in k (it does not multiply 3 by itself k-1 times).
As already noted, if you boost 2010 to 20100 or 201000 or ..., it will take much longer, because 3**k becomes very large. However, in those cases you can speed it enormously again by rewriting it as, e.g.,
print(sum(pow(3, k, 8) for k in range(201000)) % 8)
Internally, pow(3, k, 8) still does a number of multiplications proportional to the number of bits in k, but doesn't need to retain any integers internally larger than about 8**2 (the square of the modulus).
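For illustration, here is a minimal sketch of that square-and-multiply idea with an optional modulus; this is not CPython's actual implementation, just the same algorithm written in Python:

def power(base, exp, mod=None):
    # O(log exp) multiplications: one square per bit of the exponent,
    # plus one extra multiply for each bit that is set.
    result = 1
    while exp:
        if exp & 1:
            result *= base
            if mod is not None:
                result %= mod
        base *= base
        if mod is not None:
            base %= mod
        exp >>= 1
    return result

assert power(3, 2009, 8) == pow(3, 2009, 8) == 3  # 3**2 = 9 = 1 (mod 8)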
No fancy optimizations are responsible for the fast response you observed. Computers are just a lot faster in absolute terms than you expected.

Exponentials in python: x**y vs math.pow(x, y)

Which one is more efficient: math.pow or the ** operator? When should I use one over the other?
So far I know that x**y can return an int, or a float if you use a decimal;
the function math.pow will always return a float:
import math
print( math.pow(10, 2) )
print( 10. ** 2 )
Using the power operator ** will be faster as it won’t have the overhead of a function call. You can see this if you disassemble the Python code:
>>> dis.dis('7. ** i')
  1           0 LOAD_CONST               0 (7.0)
              3 LOAD_NAME                0 (i)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis.dis('pow(7., i)')
  1           0 LOAD_NAME                0 (pow)
              3 LOAD_CONST               0 (7.0)
              6 LOAD_NAME                1 (i)
              9 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             12 RETURN_VALUE
>>> dis.dis('math.pow(7, i)')
  1           0 LOAD_NAME                0 (math)
              3 LOAD_ATTR                1 (pow)
              6 LOAD_CONST               0 (7)
              9 LOAD_NAME                2 (i)
             12 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             15 RETURN_VALUE
Note that I’m using a variable i as the exponent here because constant expressions like 7. ** 5 are actually evaluated at compile time.
Now, in practice, this difference does not matter that much, as you can see when timing it:
>>> from timeit import timeit
>>> timeit('7. ** i', setup='i = 5')
0.2894785532627111
>>> timeit('pow(7., i)', setup='i = 5')
0.41218495570683444
>>> timeit('math.pow(7, i)', setup='import math; i = 5')
0.5655053168791255
So, while pow and math.pow are about twice as slow, they are still fast enough not to care much. Unless you can actually identify exponentiation as a bottleneck, there is no reason to choose one method over the other if clarity suffers. This applies especially since pow offers an integrated modulo operation, for example.
Alfe asked a good question in the comments above:
timeit shows that math.pow is slower than ** in all cases. What is math.pow() good for anyway? Has anybody an idea where it can be of any advantage then?
The big difference between math.pow and both the built-in pow and the power operator ** is that math.pow always uses float semantics. So if you, for some reason, want to make sure you get a float back as a result, then math.pow will ensure this property.
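A quick illustration of that float guarantee:

import math

print(2 ** 3)          # 8    -- int in, int out
print(pow(2, 3))       # 8    -- same semantics as **
print(math.pow(2, 3))  # 8.0  -- arguments converted to float first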
Let’s think of an example: We have two numbers, i and j, and have no idea if they are floats or integers. But we want to have a float result of i^j. So what options do we have?
We can convert at least one of the arguments to a float and then do i ** j.
We can do i ** j and convert the result to a float (float exponentiation is used automatically when either i or j is a float, so the result is the same).
We can use math.pow.
So, let’s test this:
>>> timeit('float(i) ** j', setup='i, j = 7, 5')
0.7610865891750791
>>> timeit('i ** float(j)', setup='i, j = 7, 5')
0.7930400942188385
>>> timeit('float(i ** j)', setup='i, j = 7, 5')
0.8946636625872202
>>> timeit('math.pow(i, j)', setup='import math; i, j = 7, 5')
0.5699394063529439
As you can see, math.pow is actually faster! And if you think about it, the overhead from the function call is also gone now, because in all the other alternatives we have to call float().
In addition, it is worth noting that the behavior of ** and pow can be overridden by implementing the special __pow__ (and __rpow__) methods for custom types. So if you don't want that (for whatever reason), math.pow won't use those overrides.
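For example, with a toy class (purely illustrative) whose __pow__ is overridden:

import math

class Loud(int):
    def __pow__(self, other):
        return "overridden!"

x = Loud(2)
print(x ** 3)          # 'overridden!' -- the operator honors __pow__
print(pow(x, 3))       # 'overridden!' -- so does the built-in pow
print(math.pow(x, 3))  # 8.0 -- math.pow converts to float, no override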
The pow() function will allow you to add a third argument as a modulus.
For example: I was recently faced with a memory error when doing
2**23375247598357347582 % 23375247598357347583
Instead I did:
pow(2, 23375247598357347582, 23375247598357347583)
This returns in mere milliseconds instead of the massive amount of time and memory that the plain exponent takes. So, when dealing with large numbers and modular arithmetic, three-argument pow() is more efficient; when dealing with smaller numbers without a modulus, ** is more efficient.
Just for the record: the ** operator is equivalent to the two-argument version of the built-in pow function; the pow function additionally accepts an optional third argument (the modulus) if the first two arguments are integers.
So, if you intend to calculate remainders of powers, use the built-in function. math.pow will give you false results for arguments of quite reasonable size:
import math
base = 13
exp = 100
mod = 2
print math.pow(base, exp) % mod
print pow(base, exp, mod)
When I ran this, I got 0.0 in the first case, which obviously cannot be true, because 13 is odd (and therefore so are all of its integer powers). The math.pow version uses the limited precision of the IEEE-754 double format (52-bit mantissa, slightly less than 16 decimal digits), which causes the error here.
For the sake of fairness, we must say that math.pow can also be faster:
>>> import timeit
>>> min(timeit.repeat("pow(1.1, 9.9)", number=2000000, repeat=5))
0.3063715160001266
>>> min(timeit.repeat("math.pow(1.1, 9.9)", setup="import math", number=2000000, repeat=5))
0.2647279420000359
The math.pow function had (and still has) its strength in engineering applications, but for number theoretical applications, you should use the built-in pow function.
Some online examples
http://ideone.com/qaDWRd (wrong remainder with math.pow)
http://ideone.com/g7J9Un (lower performance with pow on int values)
http://ideone.com/KnEtXj (slightly lower performance with pow on float values)
Update (inevitable correction):
I removed the timing comparison of math.pow(2,100) and pow(2,100) since math.pow gives a wrong result whereas, for example, the comparison between pow(2,50) and math.pow(2,50) would have been fair (although not a realistic use of the math-module function). I added a better one and also the details that cause the limitation of math.pow.
** is indeed faster than math.pow(), but if you want a simple quadratic function like in your example, it is even faster to use a product.
10.*10.
will be faster than
10.**2
The difference is not big and not noticeable with one operation (using timeit), but with a large number of operations it can be significant.
Well, they are for different tasks, really.
Use pow (equivalent to x ** y with two arguments) when you want integer arithmetic.
And use math.pow if either argument is a float and you want float output.
For a discussion on the differences between pow and math.pow, see this question.
The ** operator (same as pow()) can be used to calculate very large integers:
>>> 2 ** 12345
164171010688258216356020741663906501410127235530735881272116103087925094171390144280159034536439457734870419127140401667195510331085657185332721089236401193044493457116299768844344303479235489462...
>>> math.pow(2, 12345)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: math range error
For small powers, like 2, I prefer to just multiply the base:
Use x*x instead of x**2 or pow(x, 2).
I haven't timed it, but I'd bet the multiplication is as fast as either the exponentiation operator or the pow function.
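If you do want to check that bet, a quick timeit sketch (results will vary by machine and version):

from timeit import timeit

setup = "x = 7.5"
print(timeit("x * x", setup=setup))      # plain multiplication
print(timeit("x ** 2", setup=setup))     # power operator
print(timeit("pow(x, 2)", setup=setup))  # built-in function call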

In practice, why is comparing integers better than comparing strings?

I did this test
import time

def test1():
    a = 100
    b = 200
    start = time.time()
    if a > b:
        c = a
    else:
        c = b
    end = time.time()
    print(end - start)

def test2():
    a = "amisetertzatzaz1111reaet"
    b = "avieatzfzatzr333333ts"
    start = time.time()
    if a > b:
        c = a
    else:
        c = b
    end = time.time()
    print(end - start)

def test3():
    a = "100"
    b = "200"
    start = time.time()
    if a > b:
        c = a
    else:
        c = b
    end = time.time()
    print(end - start)
And obtained these results:
1.9073486328125e-06 #test1()
9.5367431640625e-07 #test2()
1.9073486328125e-06 #test3()
Execution times are similar. It's true that using an integer instead of a string reduces the storage space, but what about the execution time?
Timing a single execution of a short piece of code doesn't tell you very much at all. In particular, if you look at the timing numbers from your test1 and test3, you'll see that the numbers are identical. That ought to be a warning sign that, in fact, all that you're seeing here is the resolution of the timer:
>>> 2.0 / 2 ** 20
1.9073486328125e-06
>>> 1.0 / 2 ** 20
9.5367431640625e-07
For better results, you need to run the code many times, and measure and subtract the timing overhead. Python has a built-in module timeit for doing exactly this. Let's time 100 million executions of each kind of comparison:
>>> from timeit import timeit
>>> timeit('100 > 200', number=10**8)
5.98881983757019
>>> timeit('"100" > "200"', number=10**8)
7.528342008590698
so you can see that the difference is not really all that much (string comparison is only about 25% slower in this case). So why is string comparison slower? Well, the way to find out is to look at the implementation of the comparison operation.
In Python 2.7, comparison is implemented by the do_cmp function in object.c. (You may want to have that source open to follow the rest of this analysis.) On line 817, you'll see that if the objects being compared are the same type and have a tp_compare function in their class structure, then that function is called. In the case of integer objects, this is what happens, the function being int_compare in intobject.c, which you'll see is very simple.
But strings don't have a tp_compare function, so do_cmp proceeds to call try_rich_to_3way_compare which then calls try_rich_compare_bool up to three times (trying the three comparison operators EQ, LT and GT in turn). This calls try_rich_compare which calls string_richcompare in stringobject.c.
So string comparison is slower because it has to use the complicated "rich comparison" infrastructure, whereas integer comparison is more direct. But even so, it doesn't make all that much difference.
Huh? Since the storage space is reduced, the number of bits that need to be compared is also reduced. Comparing bits is work, doing less work means it goes faster.

List lookup faster than tuple?

In the past, when I've needed array-like indexed lookups in a tight loop, I have usually used tuples, since they seem to be generally extremely performant (close to using just n variables). However, I decided to question that assumption today and came up with some surprising results:
In [102]: l = range(1000)
In [103]: t = tuple(range(1000))
In [107]: timeit(lambda : l[500], number = 10000000)
Out[107]: 2.465047836303711
In [108]: timeit(lambda : t[500], number = 10000000)
Out[108]: 2.8896381855010986
Tuple lookups appear to take 17% longer than list lookups! Repeated experimentation gave similar results. Disassembling each, I found them both to be:
In [101]: dis.dis(lambda : l[5])
1 0 LOAD_GLOBAL 0 (l)
3 LOAD_CONST 1 (5)
6 BINARY_SUBSCR
7 RETURN_VALUE
For reference, a typical run of 10,000,000 global variable lookups/returns takes 2.2s. Also, I ran it without the lambdas, y'know, just in case (note that number=100,000,000 rather than 10,000,000).
In [126]: timeit('t[500]', 't=range(1000)', number=100000000)
Out[126]: 6.972800970077515
In [127]: timeit('t[500]', 't=tuple(range(1000))', number=100000000)
Out[127]: 9.411366939544678
Here, the tuple lookup takes 35% longer. What's going on here? For very tight loops, this actually seems like a significant discrepancy. What could be causing it?
Note that for decomposition into variables (e.g. x, y = t), tuples are slightly faster (~6% less time in my few tests), and for construction from a fixed number of arguments, tuples are dramatically faster (~83% less time). Don't take these results as general rules; I just performed a few mini-tests that will be meaningless for most projects.
In [169]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13)
[GCC 4.0.1 (Apple Inc. build 5494)]
Tuples are primarily faster for constructing lists, not for accessing them.
Tuples should be slightly faster to access: they require one less indirection. However, I believe the main benefit is that they don't require a second allocation when constructing the list.
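One quick way to see the construction difference (a sketch; numbers are machine-dependent):

from timeit import timeit

# In CPython a tuple of constants is itself folded into a constant and
# reused, while the equivalent list literal is rebuilt on every pass.
print(timeit("(1, 2, 3, 4, 5)"))
print(timeit("[1, 2, 3, 4, 5]"))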
The reason lists are slightly faster for lookups is because the Python engine has a special optimization for it:
case BINARY_SUBSCR:
    w = POP();
    v = TOP();
    if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
        /* INLINE: list[int] */
        Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
            i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
            x = PyList_GET_ITEM(v, i);
            Py_INCREF(x);
        }
With this optimization commented out, tuples are very slightly faster than lists (by about 4%).
Note that adding a separate special-case optimization for tuples here isn't necessarily a good idea. Every special case like this in the main body of the VM loop increases the code size, which is bad for cache locality, and it means every other type of lookup requires an extra branch.
Contrary to this, I have completely different advice.
If the data is -- by the nature of the problem -- fixed in length, use a tuple.
Examples:
( r, g, b ) - three elements, fixed by the definition of the problem.
( latitude, longitude ) - two elements, fixed by the problem definition
If the data is -- by the nature of the problem -- variable, use a list.
Speed is not the issue.
Meaning should be the only consideration.

What's the difference between "while 1" and "while True"?

I've seen two ways to create an infinite loop in Python:
while 1:
    do_something()
while True:
    do_something()
Is there any difference between these? Is one more pythonic than the other?
Fundamentally it doesn't matter; such minutiae don't really affect whether something is 'pythonic' or not.
If you're interested in trivia however, there are some differences.
The builtin boolean type didn't exist till Python 2.3 so code that was intended to run on ancient versions tends to use the while 1: form. You'll see it in the standard library, for instance.
The True and False builtins are not reserved words prior to Python 3, so they could be assigned to, changing their values. This helps with the case above because code could do True = 1 for backwards compatibility, but it means that the name True needs to be looked up in the globals dictionary every time it is used.
Because of the above restriction, the bytecode the two versions compile to is different in Python 2, as there is an optimisation for constant integers that cannot be applied to True. Because Python can tell at compile time that 1 is always non-zero, it removes the conditional jump and doesn't load the constant at all:
>>> import dis
>>> def while_1():
...     while 1:
...         pass
...
>>> def while_true():
...     while True:
...         pass
...
>>> dis.dis(while_1)
  2           0 SETUP_LOOP               5 (to 8)
  3     >>    3 JUMP_ABSOLUTE            3
              6 POP_TOP
              7 POP_BLOCK
        >>    8 LOAD_CONST               0 (None)
             11 RETURN_VALUE
>>> dis.dis(while_true)
  2           0 SETUP_LOOP              12 (to 15)
        >>    3 LOAD_GLOBAL              0 (True)
              6 JUMP_IF_FALSE            4 (to 13)
              9 POP_TOP
  3          10 JUMP_ABSOLUTE            3
        >>   13 POP_TOP
             14 POP_BLOCK
        >>   15 LOAD_CONST               0 (None)
             18 RETURN_VALUE
So, while True: is a little easier to read, and while 1: is a bit kinder to old versions of Python. As you're unlikely to need to run on Python 2.2 these days or need to worry about the bytecode count of your loops, the former is marginally preferable.
The most pythonic way will always be the most readable. Use while True:
It doesn't really matter. Neither is hard to read or understand, though personally I'd always use while True, which is a bit more explicit.
More generally, a whole lot of while/break loops people write in Python could be something else. Sometimes I see people write i = 0; while True: i += 1; ..., which can be replaced with for i in itertools.count(), and people writing while True: foo = fun(); if foo is None: break, when this can be written as for foo in iter(fun, None). That requires some learning, but it has less boilerplate and less opportunity for silly mistakes.
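For example (a small sketch; the chunks iterator here is just a stand-in for something like functools.partial(f.read, 1024)):

import itertools
from functools import partial

# Instead of: i = 0; while True: i += 1; ...
for i in itertools.count(1):
    if i * i > 50:  # some stop condition
        break

# Instead of: while True: chunk = read(); if not chunk: break
# iter(callable, sentinel) calls the callable until it returns the sentinel.
chunks = iter([b"ab", b"cd", b""])
read_chunk = partial(next, chunks)
for chunk in iter(read_chunk, b""):
    print(chunk)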
Neither.
Both of them mean I have to scan the code looking for the break, instead of being able to see the stop condition right where it belongs.
I try to avoid this kind of thing wherever possible, and if it's not possible, let the code speak for itself like this:
while not found_answer:
    check_number += 1
    if check_number == 42:
        found_answer = True
Edit: It seems that the word "avoid" above wasn't clear enough. Using a basically infinite loop and leaving it from somewhere within the loop (using break) should usually be avoided altogether. Sometimes that isn't possible. In that case, I like to use something like the code above, which still represents the same concept; the above code is nothing more than a compromise. But at least I can show the purpose of the loop at the beginning, just like I wouldn't call a function do_something_with_args(*args).
IMO the second option is more obvious.
If you could get rid of the while and write more compact code, that might be more pythonic.
For example:
# Get the even numbers in the range 1..10
# Version 1
l = []
n = 1
while 1:
    if n % 2 == 0:
        l.append(n)
    n += 1
    if n > 10:
        break
print l

# Version 2
print [i for i in range(1, 11) if i % 2 == 0]

# Version 3
print range(2, 11, 2)
I think this is mostly a matter of style. Both should be easily understandable as an infinite loop.
However, personally I prefer the second option. That's because it just takes a mental micro-step less to understand, especially for programmers without C background.
The first one will also work in those early Python versions where True is not yet defined.
If you have an algorithm that is supposed to terminate in a finite time, I would recommend this, which is always safer than while True:

maxiter = 1000
for i in xrange(maxiter):
    # your code
    # on success:
    break
else:
    # the algorithm has not finished in maxiter steps; handle accordingly
    pass
I believe the second expression is more explicit, and thus more pythonic.
This is only a matter of style, any programming beginner will understand either option.
But the second option will only work if True wasn't assigned to False, which was possible until Python 3:
>>> True = False
>>> True
False
The better way is "while True" with a conditional break out of the loop.
