Performance of searching a number in an array - python

I was doing some tests about finding a number in a number array with python. With the following code,
from time import time
search = 9999999
numbers = []
for i in range(100000000):
    numbers.append(i)
start_time = time()
is_in = search in numbers
end_time = time()
print(is_in, end_time - start_time)
I got the output as follows:
True 0.10372281074523926
However, the amount of time that actually passes (nearly 4 seconds) seems much longer than the reported output. In addition, when I change the search value to 0, it outputs the following:
True 0.0
But still, the program takes nearly 4-5 seconds to terminate (measured by human instinct). I wonder what the reason behind this is. Why does it not finish after the 0.1 seconds measured, and why does searching for 0 result in 0.0 seconds?

How long do you think it takes to build your numbers list, especially when doing so in the most inefficient way? Well, let's check it - but let's check it the right way: using timeit:
>>> def foo():
...     l = []
...     for i in range(100000000): l.append(i)
...     return l
...
>>> import timeit
>>> timeit.timeit("foo()", "from __main__ import foo", number=1)
6.561729616951197
So on this desktop (which is a rather decent machine), just creating this list already takes 6.5 seconds.
Now let's test the linear search:
>>> def search(i, num):
...     return i in num
...
>>> numbers = foo()
>>> timeit.timeit("search(9999999, numbers)", "from __main__ import search, numbers", number=1)
0.06766342208720744
So we need 6.5 seconds to build the list, and 0.067 seconds to do a linear search. Note that in both cases we only executed the code under test a single time (the number=1 argument to timeit), which is not really accurate due to OS process scheduling. For a more accurate reading you want to repeat the operation thousands of times or more (the default value is actually 1000000!) so you get a reasonably representative average value.
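For instance, to average the linear search over many runs (a quick sketch, assuming search and numbers are defined as above):
import timeit
# Average over 100 runs for a steadier figure; 100 is chosen so the
# total stays around a few seconds with a ~0.067 s search.
total = timeit.timeit("search(9999999, numbers)", "from __main__ import search, numbers", number=100)
print(total / 100)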
Now just for the fun let's rewrite foo():
>>> def foo():
...     return list(range(100000000))
...
>>> timeit.timeit("foo()", "from __main__ import foo", number=1)
2.594872738001868
That's still long, but it's about 2.5 times faster. If you wonder why: this way the runtime can allocate the required memory for the full list right from the start, instead of having to grow it again and again and again.
And for a much more efficient (and constant-time!) search:
>>> numset = set(numbers)
>>> timeit.timeit("search(9999999, numset)", "from __main__ import search, numset", number=1)
3.505963832139969e-06
Wait!!! 3.5 something seconds??? But no - notice the e-06 at the end: it's actually 0.000003505963832139969 seconds, so almost 20000 times faster than the linear search.
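One caveat worth adding (not covered above): the set itself has to be built first, a one-off O(n) cost, so it only pays off when you search more than once. A minimal sketch, assuming numbers from above:
import timeit
# Time the one-off cost of converting the list into a set.
print(timeit.timeit("set(numbers)", "from __main__ import numbers", number=1))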

Related

Why is \r in Python print function faster but less smooth?

Say I have some simple code such as the following:
import time
tic = time.perf_counter()
for x in range(1000):
    print(str(x))
toc = time.perf_counter()
print(toc - tic)
It prints 1000 things, and then prints the time it took to do it. When I run this, I get:
0
1
2
3
4
5
6
… skipped for brevity …
995
996
997
998
999
2.0521691879999997
A little more than two seconds. Not bad.
Now say I wanted to only have a single number showing at a time. I could do:
import time
tic = time.perf_counter()
for x in range(1000):
    print('\r' + str(x), end='')
toc = time.perf_counter()
print()
print(toc - tic)
I get (at the very end)
999
0.46631713999999996
This seems weird, because in the second script, you’re printing more things (some \r’s and x) than the first script (just x). So why would it be faster?
But in the second script, the output is very un-smooth. It looks like the program is counting by 110s. Why doesn’t it just start spitting out numbers rapidly like in the first one?
By the way, apparently sys.stdout.write() is faster than print(), and with the former, both the first script and the second script are about the same speed, but the second script is still not smooth.
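For reference, the sys.stdout.write() variant mentioned here would look something like this (a sketch; exact timings depend on your terminal):
import sys
import time
tic = time.perf_counter()
for x in range(1000):
    # \r returns the cursor to the start of the line so each
    # number overwrites the previous one.
    sys.stdout.write('\r' + str(x))
sys.stdout.flush()
toc = time.perf_counter()
print()
print(toc - tic)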

Zero return in measuring time of function

import time

def find(a):
    count = 0
    for item in a:
        count = count + 1
        if item == 2:
            return count

a = [7,4,5,10,3,5,88,5,5,5,5,5,5,5,5,5,5,55,
     5,5,5,5,5,5,5,5,5,5,5,5,55,5,5,5,5,5,
     5,5,5,5,5,2,5,5,5,55,5,55,5,5,5,6]
print(len(a))
sTime = time.time()
print(find(a))
eTime = time.time()
ave = eTime - sTime
print(ave)
I want to measure the execution time of this function.
My print(ave) prints 0; why?
To accurately time code execution you should use the timeit module rather than time. timeit makes it easy to repeat a block of code many times for timing, avoiding the near-zero results that are the cause of your question.
import timeit
s = """
def find(a):
    count = 0
    for item in a:
        count = count + 1
        if item == 2:
            return count

a = [7,4,5,10,3,5,88,5,5,5,5,5,5,5,5,5,5,55,5,5,5,5,5,5,5,5,5,5,5,5,55,5,5,5,5,5,5,5,5,5,5,2,5,5,5,55,5,55,5,5,5,6]
find(a)
"""
print(timeit.timeit(stmt=s, number=100000))
This will measure the amount of time it takes to run the code in multiline string s 100,000 times. Note that I replaced print(find(a)) with just find(a) to avoid having the result printed 100,000 times.
Running many times is advantageous for several reasons:
In general, code runs very quickly. Summing many quick runs yields a number that is actually meaningful and useful.
Run time depends on many variable, uncontrollable factors (such as other processes using computing power). Running many times helps average these out.
If you are using timeit to compare two methodologies to see which is faster, multiple runs make it easier to reach a conclusive result (see the sketch below).
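For instance, a sketch (assuming the string s from above) that reports both the minimum and the average of several repetitions:
import timeit
# repeat() returns one total per run; the minimum is the
# least-disturbed measurement and is the usual figure to compare.
runs = timeit.repeat(stmt=s, repeat=5, number=100000)
print(min(runs), sum(runs) / len(runs))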
I'm not sure either; I get a time of about 1.4E-5.
Try putting the call into a loop to measure more iterations:
for i in range(10000):
    result = find(a)
print(result)
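Combining the loop with a higher-resolution timer then gives a usable per-call figure; a minimal sketch, assuming find and a as defined in the question:
import time
k = 10000
start = time.perf_counter()  # finer resolution than time.time()
for i in range(k):
    result = find(a)
elapsed = time.perf_counter() - start
print(result, elapsed / k)  # average seconds per call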

Timing python function

(Python 2.7.8, Windows)
There are so many questions already about this particular subject, but I cannot seem to get any of them working.
So, what I'm trying to accomplish is timing how long a function takes to execute.
I have functions.py and main.py in following fashion:
#functions.py
def function(list):
    does something
    return list
...
#main.py
import functions
...stuff...
while:
    list = gets list from file
    functions.function(list) <--- this needs to get timed
Now I tried time.time() at the start and end points first, but it's not accurate enough (the difference tends to be 0.0), and after some googling it seems that this isn't the way to go anyway. Apparently what I should use(?) is the timeit module. However, I cannot understand how to get the function into it.
Any help?
As you mentioned, there's a Python module made for this very task, timeit. Its syntax, while a little idiosyncratic, is quite easy to understand:
timeit.timeit(stmt='pass', setup='pass', timer=<default timer>, number=1000000)
stmt is the function call to be measured, in your case: functions.function(list)
setup is the code you need to create the context necessary for stmt to execute, in your case: import functions; list = gets list from file
number is how many times timeit will run stmt to find its average execution time. You might want to change the number, since calling your function a million times might take a while.
tl;dr:
timeit.timeit(stmt='functions.function(list)', setup='import functions; list = gets list from file', number=100)
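As a side note (beyond the answer above): timeit.timeit also accepts a callable directly, which avoids embedding code in strings. A sketch, where load_list_from_file is a hypothetical stand-in for "gets list from file":
import timeit
import functions
my_list = load_list_from_file()  # hypothetical placeholder
avg = timeit.timeit(lambda: functions.function(my_list), number=100) / 100
print(avg)  # average seconds per call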
See this demo using time.time:
>>> import time
>>> def check(n):
...     start = time.time()
...     for x in range(n):
...         pass
...     stop = time.time()
...     return stop - start
...
>>> check(1000)
0.0001239776611328125
>>> check(10000)
0.0012159347534179688
>>> check(100)
1.71661376953125e-05
The above function returns how much time, in seconds, the for loop takes for n iterations.
So the pattern is:
start = time.time()
# your stuff
stop = time.time()
time_taken = stop - start
Or, using timeit.default_timer(), which picks the most precise clock available for your platform:
start = timeit.default_timer()
my_function()
elapsed = timeit.default_timer() - start

Using time.time() to time a function often returns 0 seconds

I have to time the implementation I did of an algorithm in one of my classes, and I am using the time.time() function to do so. After implementing it, I have to run that algorithm on a number of data files which contains small and bigger data sets in order to formally analyse its complexity.
Unfortunately, on the small data sets I get a runtime of 0 seconds, even though the same function reports the bigger data sets with apparent precision down to 0.000000000000000001 seconds, and I cannot believe the small data sets really take less time than that.
My question is: Is there a problem using this function (and if so, is there another function I can use that has a better precision)? Or am I doing something wrong?
Here is my code if ever you need it:
import sys, time
import random
from utility import parseSystemArguments, printResults
...
def main(ville):
    start = time.time()
    solution = dynamique(ville)  # Algorithm implementation
    end = time.time()
    return (end - start, solution)

if __name__ == "__main__":
    sys.argv.insert(1, "-a")
    sys.argv.insert(2, "3")
    (algoNumber, ville, printList) = parseSystemArguments()
    (algoTime, solution) = main(ville)
    printResults(algoTime, solution, printList)
The printResults function:
def printResults(time, solution, printList=True):
    print("Temps d'execution = " + str(time) + "s")
    if printList:
        print(solution)
The solution to my problem was to use the timeit module instead of the time module.
import timeit
...
def main(ville):
    start = timeit.default_timer()
    solution = dynamique(ville)
    end = timeit.default_timer()
    return (end - start, solution)
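(Worth noting: on Python 3.3+, timeit.default_timer is simply time.perf_counter, so both names refer to the same high-resolution clock.)
import time, timeit
# On Python 3.3+ these are the same function object.
print(timeit.default_timer is time.perf_counter)  # expected: True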
Don't confuse the resolution of the system time with the resolution of a floating point number. The time resolution on a computer is only as frequent as the system clock is updated. How often the system clock is updated varies from machine to machine, so to ensure that you will see a difference with time, you will need to make sure it executes for a millisecond or more. Try putting it into a loop like this:
start = time.time()
k = 100000
for i in range(k):
    solution = dynamique(ville)
end = time.time()
return ((end - start) / k, solution)
In the final tally, you then need to divide by the number of loop iterations to know how long your code actually runs once through. You may need to increase k to get a good measure of the execution time, or you may need to decrease it if your computer is running in the loop for a very long time.
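To make the resolution point concrete, here is a quick sketch (not from the answer) that estimates the smallest step each clock can report:
import time

def min_tick(timer, samples=1000):
    # Sample the timer repeatedly and keep the smallest nonzero
    # difference between consecutive readings - a rough estimate
    # of the clock's effective resolution.
    smallest = float('inf')
    for _ in range(samples):
        t0 = timer()
        t1 = timer()
        while t1 == t0:
            t1 = timer()
        smallest = min(smallest, t1 - t0)
    return smallest

print(min_tick(time.time))          # can be coarse, platform-dependent
print(min_tick(time.perf_counter))  # typically much finer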

timeit.timer giving far different results when using it in my own function vs. calling it from the command line

I made a little function using timeit just so I could be lazy and do less typing, which isn't panning out as planned.
The (relevant) code:
def timing(function, retries=10000, formatSeconds=3, repeat=10):
    """Test how long a function takes to run. Defaults are set to run
    10 times of 10000 tries each. Will display time as 1 of 4 types.
    0 = Seconds, 1 = milliseconds, 2 = microseconds and 3 = nanoseconds.
    Pass in parameters as: (function, retries=10000, formatSeconds=3, repeat=10)"""
    t = timeit.Timer(lambda: function)
    result = t.repeat(repeat=repeat, number=retries)
    rlist = [i / retries for i in result]
It runs fine but it keeps returning:
timeprofile.timing(find_boundaries(numpy.asarray(Image.open(
r'D:\Python\image\image4.jpg')),79))
10 runs of 10000 cycles each:
Best time: 137.94764 Worst: 158.16651 Avg: 143.25466 nanosecs/pass
Now, if I do from the interpreter:
import timeit
from timeit import Timer
t = timeit.Timer(lambda: (find_boundaries(numpy.asarray(Image.open(r'D:\Python\image\image4.jpg')),79)))
result = t.repeat(repeat=5,number=100)
result = [i/100 for i in result]
I end up with [0.007723014775432375, 0.007615270149786965, 0.0075242365377505395,
0.007420834966038683, 0.0074086862470653615], or about 8 milliseconds.
And if I run the profiler on the script, it also gives approximately the same result of about 8 milliseconds.
I'm not really sure what the problem is, although I reckon it has something to do with how it's calling the function. When I check the data in the debugger it shows the function as a dictionary with a len of 53, and each key contains 1 to 15 tuples with a pair of 2-3 digit numbers in each.
So, if anyone knows why it's doing that and would like to explain it to me, and how to fix it, that'd be great!
Yes, there is a difference. When you run:
timeprofile.timing(find_boundaries(numpy.asarray(Image.open(
r'D:\Python\image\image4.jpg')),79))
You are not passing in a function reference. You are calling the function and instead are passing in the result of that call. You are timing staticresult instead of somefunction(with, arguments).
Move out the lambda:
timeprofile.timing(lambda: (find_boundaries(numpy.asarray(Image.open(
r'D:\Python\image\image4.jpg')),79)))
This means you need to remove it from your timing function, and instead pass the function straight to the Timer() class:
t = timeit.Timer(function)
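Putting it together, a corrected sketch of the helper (the formatSeconds display logic from the original is left out):
import timeit

def timing(function, retries=10000, repeat=10):
    # `function` must be a zero-argument callable; wrap the real
    # call in a lambda at the call site, not inside this helper.
    t = timeit.Timer(function)
    results = t.repeat(repeat=repeat, number=retries)
    return [total / retries for total in results]

# Usage, with the lambda moved out as described above:
# timing(lambda: find_boundaries(numpy.asarray(
#     Image.open(r'D:\Python\image\image4.jpg')), 79))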
