I ran this test:
import time

def test1():
    a = 100
    b = 200
    start = time.time()
    if a > b:
        c = a
    else:
        c = b
    end = time.time()
    print(end - start)

def test2():
    a = "amisetertzatzaz1111reaet"
    b = "avieatzfzatzr333333ts"
    start = time.time()
    if a > b:
        c = a
    else:
        c = b
    end = time.time()
    print(end - start)

def test3():
    a = "100"
    b = "200"
    start = time.time()
    if a > b:
        c = a
    else:
        c = b
    end = time.time()
    print(end - start)
and obtained these results:
1.9073486328125e-06  # test1()
9.5367431640625e-07  # test2()
1.9073486328125e-06  # test3()
The execution times are similar. It's true that using integers instead of strings reduces the storage space, but what about the execution time?
Timing a single execution of a short piece of code doesn't tell you very much at all. In particular, if you look at the timing numbers from your test1 and test3, you'll see that the numbers are identical. That ought to be a warning sign that, in fact, all that you're seeing here is the resolution of the timer:
>>> 2.0 / 2 ** 20
1.9073486328125e-06
>>> 1.0 / 2 ** 20
9.5367431640625e-07
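Incidentally, you can inspect the granularity of the clock you are timing with. Here's a quick sketch of mine (Python 3.3+); it also shows the resolution of time.perf_counter(), which is better suited to benchmarking:

import time

# Resolution of the clock behind time.time(), and of the
# higher-resolution performance counter.
print(time.get_clock_info('time').resolution)
print(time.get_clock_info('perf_counter').resolution)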
For better results, you need to run the code many times, and measure and subtract the timing overhead. Python has a built-in module timeit for doing exactly this. Let's time 100 million executions of each kind of comparison:
>>> from timeit import timeit
>>> timeit('100 > 200', number=10**8)
5.98881983757019
>>> timeit('"100" > "200"', number=10**8)
7.528342008590698
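As an aside, the measurements above don't subtract the timing overhead mentioned earlier; a rough sketch of that step would be to time an empty statement as a baseline and subtract it from both measurements:

from timeit import timeit

baseline = timeit('pass', number=10**8)                     # loop overhead alone
int_cmp = timeit('100 > 200', number=10**8) - baseline      # net integer comparison
str_cmp = timeit('"100" > "200"', number=10**8) - baseline  # net string comparison
print(int_cmp, str_cmp)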
So you can see that the difference is not really all that much (string comparison is only about 25% slower in this case). So why is string comparison slower? Well, the way to find out is to look at the implementation of the comparison operation.
In Python 2.7, comparison is implemented by the do_cmp function in object.c. (Please open this code in a new window to follow the rest of my analysis.) On line 817, you'll see that if the objects being compared are the same type and they have a tp_compare function in their class structure, then that function is called. In the case of integer objects, this is what happens: the function is int_compare in intobject.c, which you'll see is very simple.
But strings don't have a tp_compare function, so do_cmp proceeds to call try_rich_to_3way_compare which then calls try_rich_compare_bool up to three times (trying the three comparison operators EQ, LT and GT in turn). This calls try_rich_compare which calls string_richcompare in stringobject.c.
So string comparison is slower because it has to use the complicated "rich comparison" infrastructure, whereas integer comparison is more direct. But even so, it doesn't make all that much difference.
Huh? Since the storage space is reduced, the number of bits that need to be compared is also reduced. Comparing bits is work; doing less work means it goes faster.
Related
I wrote the following simple function that checks whether str1 is a permutation of str2:
def is_perm(str1, str2):
    return True if sorted(str1) == sorted(str2) else False
Assuming that sorted(str) has a time complexity of O(n*logn), we can expect a time complexity of O(2*n*logn)=O(n*logn). The following function is an attempt to achieve a better time complexity:
def is_perm2(str1, str2):
    dict1 = {}
    dict2 = {}
    for char in str1:
        if char in dict1:
            dict1[char] += 1
        else:
            dict1[char] = 1
    for char in str2:
        if char in dict2:
            dict2[char] += 1
        else:
            dict2[char] = 1
    if dict1 == dict2:
        return True
    else:
        return False
Each for loop iterates n times. Assuming that dictionary lookups and both dictionary updates have constant time complexity, I expect an overall complexity of O(2n) = O(n). However, timeit measurements show the following, contradictory results. Why is is_perm2 slower than is_perm after 10000000 executions, even though its time complexity looks better? Are my assumptions wrong?
import timeit
print(timeit.timeit('is_perm("helloworld","worldhello")', 'from __main__ import is_perm', number=10000000))
print(timeit.timeit('is_perm2("helloworld","worldhello")', 'from __main__ import is_perm2', number=10000000))
# output of first print-call: 12.4199592999993934 seconds
# output of second print-call: 37.13826630001131 seconds
There is no guarantee that an algorithm with a time complexity of O(nlogn) will be slower than one with a time complexity of O(n) for a given input. The O(n) algorithm could, for instance, have a large constant overhead, making it slower for input sizes below some threshold (say, 100000).
In your test the input size is 10 ("helloworld"), which doesn't tell us much. Repeating that test doesn't make a difference, even if repeated 10000000 times. The repetition only gives a more precise estimate of the average time spent on that particular input.
You would need to feed the algorithm with increasingly large inputs. If memory allows, that would eventually bring us to an input size for which the O(nlogn) algorithm takes more time than the O(n) algorithm.
In this case, I found that the input size had to be really large in comparison with available memory, and I only barely managed to find a case where the difference showed:
import random
import string
import timeit

def shuffled_string(s):  # parameter renamed to avoid shadowing the built-in str
    lst = list(s)
    random.shuffle(lst)
    return "".join(lst)

def random_string(size):
    return "".join(random.choices(string.printable, k=size))

str1 = random_string(10000000)
str2 = shuffled_string(str1)

print("start")
print(timeit.timeit(lambda: is_perm(str1, str2), number=5))
print(timeit.timeit(lambda: is_perm2(str1, str2), number=5))
After the initial setup of the strings (each 10 million characters long), the output on repl.it was:
54.72847577700304
51.07616817899543
The reason why the input has to be so large to see this happen, is that sorted is doing all the hard work in lower-level, compiled code (often C), while the second solution does all the looping and character reading in Python code (often interpreted). It is clear that the overhead of the second solution is huge in comparison with the first.
Improving the second solution
Although it's not your question, we could improve the implementation of the second algorithm by relying on collections.Counter (a dict subclass from the standard library, not a built-in):

from collections import Counter

def is_perm3(str1, str2):
    return Counter(str1) == Counter(str2)
With the same test setup as above, the timing for this implementation on repl.it is:
24.917681352002546
Assuming that dictionary lookup and both dictionary updates have constant time complexity,
A Python dictionary is a hash map, so strictly speaking, dictionary lookup and update cost O(n) in the worst case (when many keys collide). The average-case time complexity of is_perm2 is O(n), but its worst-case time complexity is O(n^2). If you want guaranteed O(n) time complexity, use a list (not a dictionary) to store the character frequencies: convert each character to its ASCII code and use that as an index into the list.
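For instance, here is a minimal sketch along those lines (is_perm4 is my own name; it assumes the input is plain ASCII):

def is_perm4(str1, str2):
    if len(str1) != len(str2):
        return False
    counts = [0] * 128           # one slot per ASCII code
    for ch in str1:
        counts[ord(ch)] += 1     # plain list indexing, no hashing
    for ch in str2:
        counts[ord(ch)] -= 1
    return all(c == 0 for c in counts)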
The difference in C++ is huge, but not in Python. I used similar code in C++, and the result is very different: integer comparison is 20 to 30 times faster than string comparison.
Here is my example code:
import random, time

rand_nums = []
rand_strs = []
total_num = 1000000

for i in range(total_num):
    randint = random.randint(0, total_num * 10)
    randstr = str(randint)
    rand_nums.append(randint)
    rand_strs.append(randstr)

start = time.time()
for i in range(total_num - 1):
    b = rand_nums[i + 1] > rand_nums[i]
end = time.time()
print("integer compare:", end - start)  # 0.14269232749938965 seconds

start = time.time()
for i in range(total_num - 1):
    b = rand_strs[i + 1] > rand_strs[i]
end = time.time()
print("string compare:", end - start)  # 0.15730643272399902 seconds
I can't explain why it's so slow in C++, but in Python, the reason is simple from your test code: random strings usually differ in the first byte, so the comparison time for those cases should be pretty much the same.
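You can see that short-circuiting directly. A minimal sketch (the names are mine): time strings that differ at the first character against strings that differ only at the last character; the second comparison should be noticeably slower, since it has to scan the entire common prefix:

from timeit import timeit

early_a = "a" + "x" * 999    # differs from early_b at index 0
early_b = "b" + "x" * 999
late_a = "x" * 999 + "a"     # differs from late_b only at index 999
late_b = "x" * 999 + "b"

print(timeit(lambda: early_a < early_b, number=10**6))
print(timeit(lambda: late_a < late_b, number=10**6))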
Also, note that much of your overhead is in the loop control and list accesses. You'd get a much more accurate measure if you remove those factors by zipping the lists:

for s1, s2 in zip(rand_strs, rand_strs[1:]):
    b = s1 > s2
The difference in C++ is huge, but not in Python.
The time spent in the comparison is minimal compared to the rest of the loop in Python. The actual comparison operation is implemented in the interpreter's C code, while the loop itself executes through the bytecode interpreter.
As a test, you can run this code that performs all the same operations as the string comparison loop, except without the comparison:
start = time.time()
for i in range(total_num - 1):
    b = rand_strs[i + 1], rand_strs[i]
end = time.time()
print("no compare:", end - start)
The times are pretty close to each other, though for me string comparison is always the slowest of the three loops:
integer compare: 1.2947499752044678
string compare: 1.3821675777435303
no compare: 1.3093421459197998
I'm working with very large numbers such as: 632382 to the power of 518061.
When I try calculating it directly using Python (632382**518061), it takes a really long time.
However, when I compare 2 very large numbers:
>>> 632382**518061 > 519432**525806
True
Python does it very quickly.
I assumed that in order to compare both numbers, Python would calculate them beforehand. But since the comparison is much faster than its actual calculation, Python is doing something different.
How is Python able to perform the comparison much faster (apparently without calculating the exact values)?
What takes so long is printing the values.
If I enter
>>> x = 632382**518061
in an interactive Python session, it takes about a second.
If I then enter
>>> x
it takes at least half a minute (I aborted it before it generated any output).¹
Evaluating and printing the result of the expression 632382**518061 > 519432**525806 does not require printing the two large numbers, therefore it takes less time.
It still takes longer than evaluating the two numbers (without printing), as expected:
>>> from timeit import timeit
>>> timeit('632382**518061', number=1)
1.312588474999984
>>> timeit('519432**525806', number=1)
1.281405287000041
>>> timeit('632382**518061 > 519432**525806', number=1)
2.685868804999984
¹ After all, the decimal representation of x has 3005262 digits, which we can calculate much more quickly than with len(str(x)) by using logarithms:
>>> from math import log10, ceil
>>> ceil(518061 * log10(632382))
3005262
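The same trick lets you compare the two powers without ever building the big integers, since log10(a**b) == b*log10(a). This is a sketch of mine rather than part of the answer above, and beware that floating-point precision can give a wrong answer when the two magnitudes are extremely close:

from math import log10

# Compare 632382**518061 with 519432**525806 by comparing logarithms.
print(518061 * log10(632382) > 525806 * log10(519432))  # True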
So the question regarding the speed of for loops vs while loops has been asked many times before. The for loop is supposed to be faster.
However, when I tested it in Python 3.5.1 the results were as follows:
>>> timeit.timeit('for i in range(10000): True', number=10000)
12.697646026868842
>>> timeit.timeit('while i<10000: True; i+=1', setup='i=0', number=10000)
0.0032265179766799434
The while loop runs >3000 times faster than the for loop! I've also tried pre-generating a list for the for loop:
>>> timeit.timeit('for i in lis: True', setup='lis = [x for x in range(10000)]', number=10000)
3.638794646750142
>>> timeit.timeit('while i<10000: True; i+=1', setup='i=0', number=10000)
0.0032454974941904524
That made the for loop about 3 times faster, but the difference is still three orders of magnitude.
Why does this happen?
You are creating 10k range() objects. These take some time to materialise. You then have to create iterator objects for those 10k objects too (for the for loop to iterate over the values). Next, the for loop uses the iterator protocol by calling the __next__ method on the resulting iterator. Those latter two steps also apply to the for loop over a list.
But most of all, you are cheating on the while loop test. The while loop only has to run once, because you never reset i back to 0 (thanks to Jim Fasarakis Hilliard pointing that out). You are in effect running a while loop through a total of 19999 comparisons; the first test runs 10k comparisons, the remaining 9999 tests run one comparison. And that comparison is fast:
>>> import timeit
>>> timeit.timeit('while i<10000: True; i+=1',setup='i=0', number=10000)
0.0008302750065922737
>>> (
... timeit.timeit('while i<10000: True; i+=1', setup='i=0', number=1) +
... timeit.timeit('10000 < 10000', number=9999)
... )
0.0008467709994874895
See how close those numbers are?
My machine is a little faster, so let's create a baseline to compare against; this is using 3.6.1 on a MacBook Pro (Retina, 15-inch, Mid 2015) running OS X 10.12.5. And let's also fix the while loop to set i = 0 in the test, not in the setup (which is run just once):
>>> import timeit
>>> timeit.timeit('for i in range(10000): pass', number=10000)
1.9789885189966299
>>> timeit.timeit('i=0\nwhile i<10000: True; i+=1', number=10000)
5.172155902953818
Oops, so a correctly running while loop is actually slower here; there goes your premise (and mine!).
I used pass to avoid having to answer questions about how fast referencing that object is (it's fast, but beside the point). My timings are going to be about 6x faster than on your machine.
If you wanted to explore why the iteration is faster, you could time the various components of the for loop in Python, starting with creating the range() object:
>>> timeit.timeit('range(10000)', number=10000)
0.0036197409499436617
So creating 10000 range() objects takes more time than running a single while loop that iterates 10k times. range() objects are more expensive to create than integers.
This does involve a global name lookup, which is slower; you could make it faster by using setup='_range = range' and then timing _range(10000). This shaves off about a third of the timing.
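For instance, here is a sketch of that variant (absolute timings will vary by machine):

import timeit

# Global lookup of range on every call vs. a pre-bound local name.
print(timeit.timeit('range(10000)', number=10000))
print(timeit.timeit('_range(10000)', setup='_range = range', number=10000))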
Next, create an iterator for this; here I'll use a local name for the iter() function, as the for loop doesn't have to do a hash-table lookup and just reaches for the C function instead. Hard-coded references to a memory location in a binary are a lot faster, of course:
>>> timeit.timeit('_iter(r)', setup='_iter = iter; r = range(10000)', number=10000)
0.0009729859884828329
Fairly fast; it takes about the same amount of time as your single while loop iterating 10k times. So creating iterator objects is cheap, and the C implementation is faster still. But we haven't iterated yet.
Last, we call __next__ on the iterator object, 10k times. This is again done in C code, with cached references to internal C implementations, but with a functools.partial() object we can at least attempt to get a ball-park figure:
>>> timeit.timeit('n()', setup='from functools import partial; i = iter(range(10000)); n = partial(i.__next__)', number=10000) * 10000
7.759470026940107
Boy, 10k times 10k calls to iter(range(10000)).__next__ takes almost 4x more time than the for loop managed; this goes to show how efficient the actual C implementation really is.
However, it does illustrate that looping in C code is a lot faster, and this is why the while loop is actually slower when executed correctly; summing integers and making boolean comparisons in bytecode takes more time than iterating over range() in C code (where the CPU does the incrementing and comparisons directly in CPU registers):
>>> (
... timeit.timeit('9999 + 1', number=10000 ** 2) +
... timeit.timeit('9999 < 10000', number=10000 ** 2)
... )
3.695550534990616
It is those operations that make the while loop about 3 seconds slower.
TL;DR: you didn't actually test the while loop correctly. I should have noticed this earlier too.
You are timing things incorrectly: setup is only executed once, and then the value of i is 10000 for all subsequent runs. See the documentation on timeit:
Time number executions of the main statement. This executes the setup statement once, and then returns the time it takes to execute the main statement a number of times, measured in seconds as a float.
You can verify this by printing i on each repetition:
>>> timeit('print(i)\nwhile i<10000: True; i+=1',setup='i=0', number=5)
0
10000
10000
10000
10000
As a result, all subsequent runs merely perform a single comparison (the loop condition is immediately false) and finish early.
Time correctly and see how the for loop is actually faster:
>>> timeit('i=0\nwhile i<10000: True; i+=1', number=10000)
8.416439056396484
>>> timeit('for i in range(10000): True', number=10000)
5.589155912399292
In other words, what happens behind the two asterisks? Is it simply multiplying the number x times, or something else?
As a follow-up question: is it better to write 2**3 or 2*2*2? I'm asking because I've heard that in C++ it's better not to use pow() for simple calculations, since it calls a function.
If you're interested in the internals, I'd disassemble the instruction to get the CPython bytecode it maps to. Using Python3:
>>> import dis
>>> def test():
...     return 2**3
...
>>> dis.dis(test)
  2           0 LOAD_CONST               3 (8)
              3 RETURN_VALUE
OK, so the calculation was done at compile time (constant folding), and the result 8 was stored as a constant. You get exactly the same CPython bytecode for 2*2*2 (feel free to try it). So, for expressions that evaluate to a constant, you get the same result and it doesn't matter.
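If you do want to try it, here is a small snippet of mine; the peephole optimizer folds the constant expression at compile time, so the disassembly shows the same two instructions as above:

import dis

def test_mul():
    return 2 * 2 * 2

# Expect: LOAD_CONST (8) followed by RETURN_VALUE, just as for 2**3.
dis.dis(test_mul)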
What if you want the power of a variable?
Now you get two different bits of bytecode:
>>> def test(n):
...     return n ** 3
...
>>> dis.dis(test)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (3)
              6 BINARY_POWER
              7 RETURN_VALUE
vs.
>>> def test(n):
...     return n * 2 * 2
...
>>> dis.dis(test)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 BINARY_MULTIPLY
              7 LOAD_CONST               1 (2)
             10 BINARY_MULTIPLY
             11 RETURN_VALUE
Now the question is of course, is the BINARY_MULTIPLY quicker than the BINARY_POWER operation?
The best way to try that is to use timeit. I'll use the IPython %timeit magic. Here's the output for multiplication:
%timeit test(100)
The slowest run took 15.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 163 ns per loop
and for power
The slowest run took 5.44 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 473 ns per loop
You may wish to repeat this for representative inputs, but empirically it looks like the multiplication is quicker (but note the mentioned caveat about the variance in the output).
If you want further internals, I'd suggest digging into the CPython code.
While the second form is a little bit faster for numbers, the advantage is very small compared to the advantage of the first form: readability. If you are racing against time and are pressured to make such micro-optimizations, then Python is probably not the language you should use.
Note: for objects in general,
a ** b translates to
a.__pow__(b)
whereas a * a * a, which groups left to right, is a call to
a.__mul__(a).__mul__(a)
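You can watch those special methods being invoked with a small tracing class (my own illustration, not part of the answer above):

class Traced:
    # Logs which special methods Python calls for ** and *.
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        print("__mul__ called")
        return Traced(self.value * other.value)

    def __pow__(self, other):
        print("__pow__ called")
        return Traced(self.value ** other.value)

a = Traced(2)
b = a ** a     # prints "__pow__ called" once
c = a * a * a  # prints "__mul__ called" twice, grouping left to right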
Test Code:
import time

s = time.time()
for x in range(1, 1000000):
    x**5
print("done in", time.time() - s)

s = time.time()
for x in range(1, 1000000):
    x*x*x*x*x
print("done in", time.time() - s)
On my machine it yields:
done in 0.975429058075
done in 0.260419845581
[Finished in 1.2s]
Frankly, multiplication is a bit faster:

>>> timeit.timeit('[i*i*i*i for i in range(100)]', number=10000)
0.262529843304
>>> timeit.timeit('[i**4 for i in range(100)]', number=10000)
0.31143438383
But speed isn't the only thing to consider when choosing between the two options. For example, which is easier when computing 2 to the power of 20: simply writing 2**20, or using a loop that iterates 20 times and does a multiplication on each pass?
The ** operator internally uses an iterative function (it has the same semantics as the built-in pow() (Python docs), which likely means it just calls that function anyway).
Therefore, if you know the power and can hardcode it, using 2*2*2 would likely be a little faster than 2**3. This has a little to do with the function call, but I believe the main performance cost is that it uses a loop.
Note that it's still quite silly to replace more readable code with less readable code for something as simple as 2**3; the performance gain is minimal at best.
From the docs:
The power operator binds more tightly than unary operators on its left; it binds less tightly than unary operators on its right. The syntax is:
power ::= primary ["**" u_expr]
Thus, in an unparenthesized sequence of power and unary operators, the operators are evaluated from right to left (this does not constrain the evaluation order for the operands): -1**2 results in -1.
The power operator has the same semantics as the built-in pow() function, when called with two arguments: it yields its left argument raised to the power of its right argument.
This means that, in Python: 2**2**3 is evaluated as 2**(2**3) = 2**8 = 256.
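A quick REPL check makes the associativity concrete:

>>> 2 ** 2 ** 3      # right-associative: 2 ** (2 ** 3)
256
>>> (2 ** 2) ** 3    # forced left grouping
64
>>> -1 ** 2          # unary minus binds less tightly: -(1 ** 2)
-1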
In mathematics, stacked exponents are applied from the top down. If it were not done this way you would just get multiplication of exponents:
(((2**3)**4)**5) = 2**(3*4*5)
It might be a little faster just to do the multiplication, but much less readable.