Why are // calculations faster than / calculations (or not)? - Python

I was just experimenting with some code and found something that makes no sense to me:
>>> import timeit
>>> timeit.timeit("524288000/1024/1024")
0.05489620000000173
>>> timeit.timeit("524288000//1024//1024")
0.030612500000017917
>>>
Using // in calculations is faster than using /.
But when I repeated it, these were the results:
>>> timeit.timeit("524288000//1024//1024")
0.02494899999999234
>>> timeit.timeit("524288000/1024/1024")
0.02480830000001788
And now / is faster than //, which makes no sense to me.
Why is this?
edit:
These are the results of the experiment with the number of repetitions set to 10000:
avg for /: 0.0261193088
avg for //: 0.025788395899999896

When you time a statement, the measured value is the difference between the time when it finished and the time when it started, but a lot happens under the hood besides the code you're timing. Reading up on operating systems will help you understand where that noise comes from.
To do this kind of experiment you should repeat the measurement thousands of times to smooth out the variation.
Try the code below, but if you want to do real experiments, change the loop value to something greater:
import timeit

loops = 100

oneSlashAvg = 0
for i in range(loops):
    oneSlashAvg += timeit.timeit("524288000/1024/1024")
print(oneSlashAvg/loops)

doubleSlashAvg = 0
for i in range(loops):
    doubleSlashAvg += timeit.timeit("524288000//1024//1024")
print(doubleSlashAvg/loops)
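If you don't want to write the averaging loop yourself, timeit.repeat can do the repetition for you; taking the minimum of the runs is usually more robust than the mean, because the slower runs mostly reflect interference from other processes. A small sketch (the repeat and number values are arbitrary):

import timeit

# Run each measurement 5 times (1,000,000 executions each) and keep the best run.
best_true_div = min(timeit.repeat("524288000/1024/1024", repeat=5, number=1000000))
best_floor_div = min(timeit.repeat("524288000//1024//1024", repeat=5, number=1000000))
print(best_true_div, best_floor_div)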

Related

Numerical comparison in Python

I'm working with very large numbers such as: 632382 to the power of 518061.
When I try calculating it directly using Python (632382**518061), it takes a really long time.
However, when I compare 2 very large numbers:
>>> 632382**518061 > 519432**525806
True
Python does it very quickly.
I assumed that in order to compare the two numbers, Python would have to calculate them first. But since the comparison is much faster than the calculation itself, Python must be doing something different.
How is Python able to perform the comparison much faster (apparently without calculating the exact values)?
What takes so long is printing the values.
If I enter
>>> x = 632382**518061
in an interactive Python session, it takes about a second.
If I then enter
>>> x
it takes at least half a minute (I aborted it before it generated any output).[1]
Evaluating and printing the result of the expression 632382**518061 > 519432**525806 does not require printing the two large numbers, therefore it takes less time.
It still takes longer than evaluating the two numbers (without printing), as expected:
>>> from timeit import timeit
>>> timeit('632382**518061', number=1)
1.312588474999984
>>> timeit('519432**525806', number=1)
1.281405287000041
>>> timeit('632382**518061 > 519432**525806', number=1)
2.685868804999984
[1] After all, the decimal representation of x has 3005262 digits, which we can calculate much more quickly than with len(str(x)) by using logarithms:
>>> from math import log10, ceil
>>> ceil(518061 * log10(632382))
3005262
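To check the claim that the decimal conversion is what really takes the time, you can time str(x) directly. A rough sketch (note: Python 3.11+ refuses to convert very large ints to decimal strings unless the limit is lifted first; older versions have no such limit):

import sys
from timeit import timeit

# Lift the int-to-str digit limit on Python 3.11+; 0 disables it entirely.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

print(timeit('632382**518061', number=1))                      # evaluating the power: the ~1 s seen above
print(timeit('str(x)', setup='x = 632382**518061', number=1))  # the decimal conversion dominates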

Oddity calculating runtime with timeit in Python?

I want to calculate the runtime of two different algorithms in the same program. When I wrote a program timing each one individually, I obtained very different results, so to test this new program I had Python time the same algorithm twice. When I did this (in the program below), I found that the runtimes of the same algorithm were in fact different! What am I missing, and how do I fix this so I can compare algorithms?
import timeit

def calc1(x):
    return x*x+x+1

def calc2(x):
    return x*x+x+1

def main():
    x = int(input("Input a number to be tested: "))
    start1 = timeit.default_timer()
    result1 = calc1(x)
    end1 = timeit.default_timer()
    start2 = timeit.default_timer()
    result2 = calc2(x)
    end2 = timeit.default_timer()
    print("Result of calculation 1 was {0}; time to compute was {1} seconds.".format(result1, end1-start1))
    print("Result of calculation 2 was {0}; time to compute was {1} seconds.".format(result2, end2-start2))

main()
I think you're being bitten by Windows power management on the one hand, and on the other, your testing method is flawed.
In the first case, I also got bitten by the fact that Windows, by default, throttles CPU throughput to save power. Right now there is a dramatic difference in your calculated runtimes; it can be reduced considerably just by inserting a nonsense calculation like something = 5**1000000 immediately after x = int(input("Input a number to be tested: ")) to ramp the CPU up. I hate Windows 10, so I don't know how to change this off the top of my head, but switching "Power Options" to "High Performance" and removing the CPU throttling should close this gap considerably.
The second issue is that you only run one test cycle. You cannot get stable results from this. Instead, you need multiple iterations. For example, with a million iterations you would see some similarity between the numbers:
import timeit

setup = 'def calc(x):\n    return x*x+x+1'
# Time a call to the function (not just its definition); 10 is an arbitrary test value.
exec1 = timeit.timeit(stmt='calc(10)', setup=setup, number=1000000)
exec2 = timeit.timeit(stmt='calc(10)', setup=setup, number=1000000)
print("Execution of first loop: {}".format(exec1))
print("Execution of second loop: {}".format(exec2))
Depending on your IDE (e.g. Canopy or Spyder) there may be cleaner ways of running timeit, such as reusing your existing definitions of calc1 and calc2, as shown below.
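One IDE-independent way to do that is to pass globals=globals() so timeit can call the calc1 and calc2 you already defined, and to use timeit.repeat with min() to reduce noise. A sketch assuming Python 3.5+ (which added the globals argument); the argument value 10 is arbitrary:

import timeit

def calc1(x):
    return x*x+x+1

def calc2(x):
    return x*x+x+1

# Reuse the existing function objects via globals(); repeat each measurement
# 5 times and keep the fastest (least-disturbed) run.
t1 = min(timeit.repeat('calc1(10)', globals=globals(), repeat=5, number=1000000))
t2 = min(timeit.repeat('calc2(10)', globals=globals(), repeat=5, number=1000000))
print(t1, t2)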

While loop >1000 times faster than for loop?

So the question regarding the speed of for loops vs while loops has been asked many times before. The for loop is supposed to be faster.
However, when I tested it in Python 3.5.1 the results were as follows:
timeit.timeit('for i in range(10000): True', number=10000)
>>> 12.697646026868842
timeit.timeit('while i<10000: True; i+=1',setup='i=0', number=10000)
>>> 0.0032265179766799434
The while loop runs >3000 times faster than the for loop! I've also tried pre-generating a list for the for loop:
timeit.timeit('for i in lis: True',setup='lis = [x for x in range(10000)]', number=10000)
>>> 3.638794646750142
timeit.timeit('while i<10000: True; i+=1',setup='i=0', number=10000)
>>> 0.0032454974941904524
Which made the for loop 3 times faster, but the difference is still 3 orders of magnitude.
Why does this happen?
You are creating 10k range() objects. These take some time to materialise. You then have to create iterator objects for those 10k objects too (for the for loop to iterate over the values). Next, the for loop uses the iterator protocol by calling the __next__ method on the resulting iterator. Those latter two steps also apply to the for loop over a list.
But most of all, you are cheating on the while loop test. The while loop only has to run once, because you never reset i back to 0 (thanks to Jim Fasarakis Hilliard pointing that out). You are in effect running a while loop through a total of 19999 comparisons; the first test runs 10k comparisons, the remaining 9999 tests run one comparison. And that comparison is fast:
>>> import timeit
>>> timeit.timeit('while i<10000: True; i+=1',setup='i=0', number=10000)
0.0008302750065922737
>>> (
... timeit.timeit('while i<10000: True; i+=1', setup='i=0', number=1) +
... timeit.timeit('10000 < 10000', number=9999)
... )
0.0008467709994874895
See how close those numbers are?
My machine is a little faster, so let's create a baseline to compare against; this is using 3.6.1 on a MacBook Pro (Retina, 15-inch, Mid 2015) running OS X 10.12.5. And let's also fix the while loop to set i = 0 in the test, not the setup (which is run just once):
>>> import timeit
>>> timeit.timeit('for i in range(10000): pass', number=10000)
1.9789885189966299
>>> timeit.timeit('i=0\nwhile i<10000: True; i+=1', number=10000)
5.172155902953818
Oops, so a correctly running while loop is actually slower here; there goes your premise (and mine!).
I used pass to avoid having to answer questions about how fast referencing that object is (it's fast, but beside the point). My timings are going to be about 6x faster than on your machine.
If you wanted to explore why the iteration is faster, you could time the various components of the for loop in Python, starting with creating the range() object:
>>> timeit.timeit('range(10000)', number=10000)
0.0036197409499436617
So creating 10000 range() objects takes more time than running a single while loop that iterates 10k times. range() objects are more expensive to create than integers.
This does involve a global name lookup, which is slower; you could make it faster by using setup='_range = range' and then timing _range(10000), which shaves off about a third of the timings.
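For example, measuring the two lookups side by side would look like this (a sketch; the actual numbers depend on your machine):

import timeit

# range is looked up as a global name on every call here ...
t_global = timeit.timeit('range(10000)', number=10000)
# ... while a name bound in setup becomes a local of the generated timing
# function, which is a cheaper lookup.
t_local = timeit.timeit('_range(10000)', setup='_range = range', number=10000)
print(t_global, t_local)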
Next, create an iterator for this; here I'll use a local name for the iter() function, as the for loop doesn't have to do a hash-table lookup and just reaches for the C function instead. Hard-coded references to a memory location in a binary are a lot faster, of course:
>>> timeit.timeit('_iter(r)', setup='_iter = iter; r = range(10000)', number=10000)
0.0009729859884828329
Fairly fast; it takes about the same amount of time as your single while loop iterating 10k times. So creating the iterator object is cheap, and the C implementation is faster still. We haven't even iterated yet.
Last, we call __next__ on the iterator object, 10k times. This is again done in C code, with cached references to internal C implementations, but with a functools.partial() object we can at least attempt to get a ball-park figure:
>>> timeit.timeit('n()', setup='from functools import partial; i = iter(range(10000)); n = partial(i.__next__)', number=10000) * 10000
7.759470026940107
Boy, 10k times 10k calls to iter(range(10000)).__next__ takes almost 4x more time than the for loop managed; this goes to show how efficient the actual C implementation really is.
However, it does illustrate that looping in C code is a lot faster, and this is why the while loop is actually slower when executed correctly; summing integers and making boolean comparisons in bytecode takes more time than iterating over range() in C code (where the CPU does the incrementing and comparisons directly in CPU registers):
>>> (
... timeit.timeit('9999 + 1', number=10000 ** 2) +
... timeit.timeit('9999 < 10000', number=10000 ** 2)
... )
3.695550534990616
It is those operations that make the while loop about 3 seconds slower.
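If you want to see those extra operations, the dis module makes the difference in bytecode visible (a small sketch; the exact opcodes vary between Python versions):

import dis

# The for loop: advancing the range iterator is a single FOR_ITER opcode,
# handled entirely in C.
dis.dis(compile('for i in range(10000): pass', '<for>', 'exec'))

# The while loop: the comparison, the increment and the jump are all separate
# bytecode instructions the interpreter executes on every iteration.
dis.dis(compile('i = 0\nwhile i < 10000: i += 1', '<while>', 'exec'))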
TLDR: You didn't actually test a while loop correctly. I should have noticed this earlier too.
You are timing things incorrectly: setup is only executed once, and after that the value of i is 10000 for all subsequent runs. See the documentation on timeit:
Time number executions of the main statement. This executes the setup statement once, and then returns the time it takes to execute the main statement a number of times, measured in seconds as a float.
Additionally verify it by printing i for each repetition:
>>> timeit('print(i)\nwhile i<10000: True; i+=1',setup='i=0', number=5)
0
10000
10000
10000
10000
As a result, all subsequent runs merely perform one comparison (which is False, since i is already 10000) and finish early.
Time correctly and see how the for loop is actually faster:
>>> timeit('i=0\nwhile i<10000: True; i+=1', number=10000)
8.416439056396484
>>> timeit('for i in range(10000): True', number=10000)
5.589155912399292

Karatsuba multiplication in Python : execution time

I have recently learned Karatsuba multiplication. In order to fully understand the concept, I attempted to write the code in Python and compared its running time against classical multiplication. Although the results are equal, the execution time of my Karatsuba function is still the highest, which I suspect is because of the recursive calls. What's wrong with my approach? Some help would definitely allow me to understand more about algorithm design.
Best
JP
print('Karatsuba multiplication in Python')
x=raw_input("first_number=")
y=raw_input("second_number=")
print('------------------------')
x=int(x)
y=int(y)

import math
import time

def karatsuba(x,y):
    x=str(x)
    y=str(y)
    len_x=len(x)
    len_y=len(y)
    if(int(len_x)==1 or int(len_y)==1):
        return int(x)*int(y)
    else:
        B=10
        exp1=int(math.ceil(len_x/2.0))
        exp2=int(math.ceil(len_y/2.0))
        if(exp1<exp2):
            exp=exp1
        else:
            exp=exp2
        m1=len_x-exp
        m2=len_y-exp
        a=karatsuba(int(x[0:m1]),int(y[0:m2]))
        c=karatsuba(int(x[m1:len_x]),int(y[m2:len_y]))
        b=karatsuba(int(x[0:m1])+int(x[m1:len_x]),int(y[0:m2])+int(y[m2:len_y]))-a-c
        results=a*math.pow(10,2*exp) + b*math.pow(10,exp) + c
        return int(results)

start_time=time.time()
ctrl = x*y
tpt=time.time() - start_time
print x,'*',y,'=',ctrl
print("--- %s seconds ---" % tpt)

start_time=time.time()
output=karatsuba(x,y)
tpt=time.time() - start_time
print 'karatsuba(',x,',',y,')=',output
print("--- %s seconds ---" % tpt)
Karatsuba multiplication has bigger overhead than classical binary multiplication
The complexity is better, but because of the overhead Karatsuba only becomes faster for big enough numbers. The better Karatsuba is coded, the smaller the threshold operand size.
I see in your code that you convert the number to a string to get the digit count
That is a very slow operation, especially for big numbers; use logarithms (the binary-to-decimal digit ratio is constant) and the binary bit count instead. Look here for ideas on how to code Karatsuba faster (the code is in C++).
Usage of pow
Another slowdown; use a table of powers of 10 instead.
What are you comparing it to? (originally asked by Padraic Cunningham)
Karatsuba is faster because it does operations on lower bit-count variables. I do not code in Python at all, so I may be missing something (like arbitrary-precision ints), but I do not see anywhere that you lower the data type as the bit count drops, so you will always be slower. It would also be nice to add the slow multiplication you are comparing the times against, e.g. binary or radix multiplication (add what you use). If you just use the * operator on top of some bigint library, it is possible you are comparing Karatsuba with Karatsuba, or even with Schönhage-Strassen.
Time measurement
How do you measure time? The measured times should be bigger than 10 ms; if not, loop the computation N times and measure the whole thing to avoid accuracy problems. Also keep in mind the scheduling granularity of the OS; look here if you have no idea what I am writing about.
Your algorithm should fall back to plain multiplication when the numbers are small, e.g. when the digit count is < 10:
if int(len_x) < 10 or int(len_y) < 10:
karatsuba1 is your original code; karatsuba uses the if int(len_x) < 10 or int(len_y) < 10 base case:
In [17]: %timeit karatsuba1(999,999)
100000 loops, best of 3: 13.3 µs per loop
In [18]: %timeit karatsuba(999,999)
1000000 loops, best of 3: 1.77 µs per loop
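Pulling the suggestions above together (fall back to plain * below a size threshold, use bit counts instead of str() for sizing, and use shifts instead of math.pow), a minimal integer-only sketch might look like this; the names and the threshold value are mine, not from the question:

def karatsuba_int(x, y, threshold=1 << 64):
    """Karatsuba for non-negative ints, split on bit counts (no strings, no floats)."""
    if x < threshold or y < threshold:
        return x * y                          # small operands: multiply directly
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    xh, xl = x >> m, x & mask                 # x = xh*2**m + xl
    yh, yl = y >> m, y & mask                 # y = yh*2**m + yl
    a = karatsuba_int(xh, yh, threshold)
    c = karatsuba_int(xl, yl, threshold)
    b = karatsuba_int(xh + xl, yh + yl, threshold) - a - c
    return (a << (2 * m)) + (b << m) + c      # a*2**(2m) + b*2**m + c

if __name__ == '__main__':
    import random
    p = random.getrandbits(4000)
    q = random.getrandbits(4000)
    assert karatsuba_int(p, q) == p * q       # sanity check against the builtin *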

Unexpected performance curve from CPython merge sort

I have implemented a naive merge sort algorithm in Python. The algorithm and test code are below:
import time
import random
import matplotlib.pyplot as plt
import math
from collections import deque

def sort(unsorted):
    if len(unsorted) <= 1:
        return unsorted
    to_merge = deque(deque([elem]) for elem in unsorted)
    while len(to_merge) > 1:
        left = to_merge.popleft()
        right = to_merge.popleft()
        to_merge.append(merge(left, right))
    return to_merge.pop()

def merge(left, right):
    result = deque()
    while left or right:
        if left and right:
            elem = left.popleft() if left[0] > right[0] else right.popleft()
        elif not left and right:
            elem = right.popleft()
        elif not right and left:
            elem = left.popleft()
        result.append(elem)
    return result

LOOP_COUNT = 100
START_N = 1
END_N = 1000

def test(fun, test_data):
    start = time.clock()
    for _ in xrange(LOOP_COUNT):
        fun(test_data)
    return time.clock() - start

def run_test():
    timings, elem_nums = [], []
    test_data = random.sample(xrange(100000), END_N)
    for i in xrange(START_N, END_N):
        loop_test_data = test_data[:i]
        elapsed = test(sort, loop_test_data)
        timings.append(elapsed)
        elem_nums.append(len(loop_test_data))
        print "%f s --- %d elems" % (elapsed, len(loop_test_data))
    plt.plot(elem_nums, timings)
    plt.show()

run_test()
As far as I can see everything is OK and I should get a nice N*log(N) curve as a result. But the picture differs a bit:
Things I've tried to investigate the issue:
PyPy. The curve is ok.
Disabled the GC using the gc module. Wrong guess. Debug output showed that it doesn't even run until the end of the test.
Memory profiling using meliae - nothing special or suspicious.
I had another implementation (a recursive one using the same merge function), and it behaves in a similar way. The more full test cycles I run, the more "jumps" there are in the curve.
So how can this behaviour be explained and - hopefully - fixed?
UPD: changed lists to collections.deque
UPD2: added the full test code
UPD3: I use Python 2.7.1 on Ubuntu 11.04, on a quad-core 2 GHz notebook. I tried to turn off most of the other processes: the number of spikes went down, but at least one of them was still there.
You are simply picking up the impact of other processes on your machine.
You run your sort function 100 times for input size 1 and record the total time spent on this. Then you run it 100 times for input size 2, and record the total time spent. You continue doing so until you reach input size 1000.
Let's say once in a while your OS (or you yourself) start doing something CPU-intensive. Let's say this "spike" lasts as long as it takes you to run your sort function 5000 times. This means that the execution times would look slow for 5000 / 100 = 50 consecutive input sizes. A while later, another spike happens, and another range of input sizes look slow. This is precisely what you see in your chart.
I can think of one way to avoid this problem. Run your sort function just once for each input size: 1, 2, 3, ..., 1000. Repeat this process 100 times, using the same 1000 inputs (it's important, see explanation at the end). Now take the minimum time spent for each input size as your final data point for the chart.
That way, your spikes should affect each input size only a few times out of the 100 runs; and since you're taking the minimum, they will likely have no impact on the final chart at all.
If your spikes are really really long and frequent, you of course might want to increase the number of repetitions beyond the current 100 per input size.
Looking at your spikes, I notice the execution slows down exactly 3 times during a spike. I'm guessing the OS gives your python process one slot out of three during high load. Whether my guess is correct or not, the approach I recommend should resolve the issue.
EDIT:
I realized that I didn't clarify one point in my proposed solution to your problem.
Should you use the same input in each of your 100 runs for the given input size? Or should you use 100 different (random) inputs?
Since I recommended taking the minimum of the execution times, the inputs should be the same (otherwise you'll get incorrect output, as you'll be measuring the best-case algorithm complexity instead of the average complexity!).
But when you take the same inputs, you create some noise in your chart since some inputs are simply faster than others.
So a better solution is to resolve the system load problem, without creating the problem of only one input per input size (this is obviously pseudocode):
seed = 'choose whatever you like'
repeats = 4
inputs_per_size = 25
runtimes = defaultdict(lambda : float('inf'))
for r in range(repeats):
random.seed(seed)
for i in range(inputs_per_size):
for n in range(1000):
input = generate_random_input(size = n)
execution_time = get_execution_time(input)
if runtimes[(n, i)] > execution_time:
runtimes[(n,i)] = execution_time
for n in range(1000):
runtimes[n] = sum(runtimes[(n,i)] for i in range(inputs_per_size))/inputs_per_size
Now you can use runtimes[n] to build your plot.
Of course, depending on how noisy your system is, you might change (repeats, inputs_per_size) from (4, 25) to, say, (10, 10), or even (25, 4).
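A concrete sketch of the "take the minimum over repeats" part using timeit (it assumes the sort() function from the question is defined in the same module; Python 3 syntax; the sizes and repeat counts are arbitrary):

import random
import timeit

def best_time(data, repeats=5, number=10):
    # Time `number` sorts per run, repeat `repeats` times, keep the fastest run.
    timer = timeit.Timer(lambda: sort(list(data)))
    return min(timer.repeat(repeat=repeats, number=number)) / number

random.seed(0)
test_data = random.sample(range(100000), 1000)
sizes = range(1, 1001, 10)
timings = [best_time(test_data[:n]) for n in sizes]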
I can reproduce the spikes using your code:
You should choose an appropriate timing function (time.time() vs. time.clock() -- from timeit import default_timer picks the right one for your platform), the number of repetitions in a test (how long each test takes), and the number of tests to choose the minimal time from. That gives you better precision and less external influence on the results. Read the note from the timeit.Timer.repeat() docs:
It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
The timeit module can choose appropriate parameters for you:
$ python -mtimeit -s 'from m import testdata, sort; a = testdata[:500]' 'sort(a)'
Here's the timeit-based performance curve:
The figure shows that sort() behaviour is consistent with O(n*log(n)):
|------------------------------+-------------------|
| Fitting polynomial           | Function          |
|------------------------------+-------------------|
| 1.00 log2(N) + 1.25e-015     | N                 |
| 2.00 log2(N) + 5.31e-018     | N*N               |
| 1.19 log2(N) + 1.116         | N*log2(N)         |
| 1.37 log2(N) + 2.232         | N*log2(N)*log2(N) |
To generate the figure I've used make-figures.py:
$ python make-figures.py --nsublists 1 --maxn=0x100000 -s vkazanov.msort -s vkazanov.msort_builtin
where:
# adapt sorting functions for make-figures.py
def msort(lists):
    assert len(lists) == 1
    return sort(lists[0])  # `sort()` from the question

def msort_builtin(lists):
    assert len(lists) == 1
    return sorted(lists[0])  # builtin
Input lists are described here (note: the input is sorted, so the builtin sorted() function shows the expected O(N) performance).
