Evaluating linear search execution speed - python

When I was learning to code, someone told me that the "break" statement was not elegant from an algorithmic perspective and should not be used. However, I have compared the execution speed of two different versions of the linear search algorithm, and the version with a for loop and break is always faster.
Any opinions?
import numpy as np
import random
import time

n = 100000
x = np.arange(0, n)
random.shuffle(x)
k = 30  # the number to search for

#--- OPTION 1: LINEAR SEARCH USING WHILE LOOP
start_time = time.time()
i = 0
while (x[i] != k) and (i < n-1):
    i += 1
print(i)
print("V1 --- %s seconds ---" % (time.time() - start_time))

#--- OPTION 2: LINEAR SEARCH USING FOR LOOP w/ BREAK
start_time = time.time()
for i in range(0, n):
    if x[i] == k:
        break
print(i)
print("V2 --- %s seconds ---" % (time.time() - start_time))

Don't listen to "someone". Everyone uses break statements, and there is absolutely nothing wrong with using them. They can make your code both easier to understand and simpler: a win-win as far as I'm concerned.
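For what it's worth, here is a sketch of an even more idiomatic break-based search, assuming the same x and k as in the question; it avoids manual indexing by iterating over the values directly:
for i, value in enumerate(x):
    if value == k:
        break  # stop scanning as soon as the target is found
print(i)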

Related

python program that terminates after one second

I want a piece of code that stops after exactly one second.
Note that using time.sleep() does not do the job precisely.
Here is what I have so far (it doesn't give me a precise result):
import time
import sys
start_time = time.time()
time.sleep(.99483-(time.time() - start_time))
print(time.time() - start_time)
sys.exit(0)
Also note that the final time should include the execution of the last line, which is sys.exit(0).
I appreciate any help or advice you can spare.
import time
import sys
start_time = time.time()
while time.time() - start_time < 1:
    pass
print(time.time() - start_time)
Try this and let me know if it meets your requirement.
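If you need finer resolution than time.time() typically offers, the same busy-wait can be written with time.perf_counter(), which has been available as a high-resolution clock since Python 3.3; this is just a sketch of that variant:
import time
import sys

start = time.perf_counter()
while time.perf_counter() - start < 1:
    pass  # busy-wait until one second has elapsed
print(time.perf_counter() - start)
sys.exit(0)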

is there a better function than 'computeSVD()' that uses mapreduce, in terms of execution time?

I used the function computeSVD() on a large matrix, and its execution time is very long compared to a function that doesn't use MapReduce, even though MapReduce normally makes the execution time better.
I compared these two functions:
start_time = time.time()
number_of_documents = 200
L, S, R = np.linalg.svd(X)   # does not use MapReduce
exemple_three = time.time() - start_time
print("---Exemple three : %s seconds ---" % (exemple_three))
output:
---Exemple three : 5.322664976119995 seconds ---
and the second one computeSVD()
start_time = time.time()
number_of_documents = 200
svd = mat.computeSVD(5, computeU=True)   # uses MapReduce
exemple_two = time.time() - start_time
print("---Exemple one : %s seconds ---" % (exemple_two))
output:
---Exemple one : 252.04261994361877 seconds ---
My goal is to find a similar function that uses MapReduce.

How to efficiently perform addition over large loops in python

I am trying to perform addition efficiently in Python over large loops. I am looping over a range of 100000000.
from datetime import datetime

start_time = datetime.now()
sum = 0
for i in range(100000000):
    sum += i
end_time = datetime.now()
print('--- %s seconds ---{}'.format(end_time - start_time))
print(sum)
The output from the above code is
--- %s seconds ---0:00:16.662666
4999999950000000
When I try to do it in C, it takes 0.43 seconds.
From what I read, Python allocates a new object every time you perform addition on a variable. I read some articles and learned how to handle string concatenation in these situations by avoiding the '+' sign, but I can't find anything on how to do this with integers.
Consider using the sum() function if you can process the range as a whole: it loops entirely in C code, which is much faster, and it also avoids the creation of new Python objects in your loop.
sum(range(100000000))
On my computer, your code takes 07.189210 seconds, while the above statement takes 02.751251 seconds, making it roughly 2.6 times faster.
Edit: as suggested by mtrw, numpy.sum() can speed up processing even more.
Here is a comparison of three methods: your original way, using sum(range(100000000)) as suggested by Alex Metsai, and using the NumPy numerical library's sum and range functions:
from datetime import datetime
import numpy as np

def orig():
    start_time = datetime.now()
    sum = 0
    for i in range(100000000):
        sum += i
    end_time = datetime.now()
    print('--- %s seconds ---{}'.format(end_time - start_time))
    print(sum)

def pyway():
    start_time = datetime.now()
    mysum = sum(range(100000000))
    end_time = datetime.now()
    print('--- %s seconds ---{}'.format(end_time - start_time))
    print(mysum)

def npway():
    start_time = datetime.now()
    sum = np.sum(np.arange(100000000))
    end_time = datetime.now()
    print('--- %s seconds ---{}'.format(end_time - start_time))
    print(sum)
On my computer, I get:
>>> orig()
--- %s seconds ---0:00:09.504018
4999999950000000
>>> pyway()
--- %s seconds ---0:00:02.382020
4999999950000000
>>> npway()
--- %s seconds ---0:00:00.683411
4999999950000000
NumPy is the fastest, if you can use it in your application.
But, as suggested by Ethan in a comment, it's worth pointing out that calculating the answer directly is by far the fastest:
def mathway():
    start_time = datetime.now()
    mysum = 99999999*(99999999+1)/2
    end_time = datetime.now()
    print('--- %s seconds ---{}'.format(end_time - start_time))
    print(mysum)
>>> mathway()
--- %s seconds ---0:00:00.000013
4999999950000000.0
I assume your actual problem is not so easily solved by pencil and paper :)
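One small aside on the closed-form version: in Python 3 the / operator always returns a float, which is why mathway() prints 4999999950000000.0. If the exact integer matters, floor division keeps it an int; a tiny sketch:
def mathway_int():
    # same closed-form sum, but // keeps the result an exact integer
    return 99999999 * (99999999 + 1) // 2

print(mathway_int())  # 4999999950000000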

What is the best/most efficient way to output value every x seconds during a loop

I have always been curious about this as the simple way is definitely not efficient. How would you efficiently go about outputting a value every x seconds?
Here is an example of what I mean:
import time

num = 50000000

startTime = time.time()
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
print time.time() - startTime
#output time: 24 seconds

startTime = time.time()
newTime = time.time()
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
    if time.time() - newTime > 0.5:
        newTime = time.time()
        print i
print time.time() - startTime
#output time: 32 seconds
A whole 1/3rd faster when not outputting the progress every half a second.
I know this is because it requires an extra calculation every loop, but the same applies with other similar checks you may want to do - how would you go about implementing something like this without seriously affecting the execution time?
Well, you know that you're doing many iterations per second, so you really don't need to make the time.time() call on every iteration. You can use a modulo operator to only actually check if you need to output something every N iterations of the loop.
startTime = time.time()
newTime = time.time()
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
    if i % 50 == 0:  # Only check every 50th iteration
        if time.time() - newTime > 0.5:
            newTime = time.time()
            print i, newTime
print time.time() - startTime
# 45 seconds (the original version took 42 on my system)
Checking only every 50 iterations reduces my run time from 56 seconds to 43 (the original with no printing took 42, and Tom Page's solution took 50 seconds), and the iterations complete quickly enough that it's still outputting exactly every 0.5 seconds according to time.time():
0 1409083225.39
605000 1409083225.89
1201450 1409083226.39
1821150 1409083226.89
2439250 1409083227.39
3054400 1409083227.89
3644100 1409083228.39
4254350 1409083228.89
4831600 1409083229.39
5433450 1409083229.89
6034850 1409083230.39
6644400 1409083230.89
7252650 1409083231.39
7840100 1409083231.89
8438300 1409083232.39
9061200 1409083232.89
9667350 1409083233.39
...
You might save a few clock cycles by keeping track of the next time that a print is due:
nexttime = time.time() + 0.5
Then your condition is a simple comparison:
if time.time() >= nexttime
as opposed to a subtraction followed by a comparison:
if time.time() - newTime > 0.5
You only have to do an addition after each message, as opposed to doing a subtraction on every iteration.
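Putting those pieces together, a rough sketch of the full loop (reusing num from the question, and Python 2 print syntax to match the surrounding code) might look like this:
import time

num = 50000000
nexttime = time.time() + 0.5  # next moment a progress print is due
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
    if time.time() >= nexttime:
        print i
        nexttime = time.time() + 0.5  # schedule the next print half a second out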
I tried it with a sideband thread doing the printing. It added 5 seconds to the execution time on Python 2.x but virtually no extra time on Python 3.x; Python 2.x threads have a lot of overhead. Here's my example with timings included as comments:
import time
import threading

def showit(event):
    global i  # could pass in a mutable object instead
    while not event.is_set():
        event.wait(.5)
        print 'value is', i

num = 50000000

startTime = time.time()
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
print time.time() - startTime
#output time: 23 seconds

event = threading.Event()
showit_thread = threading.Thread(target=showit, args=(event,))
showit_thread.start()

startTime = time.time()
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
event.set()
time.sleep(.1)
print time.time() - startTime
#output time: 28 seconds
If you want to wait a specified period of time before doing something, just use the time.sleep() method.
import time

for i in range(100):
    print(i)
    time.sleep(0.5)
This will wait half a second before printing the next value of i.
If you don't care about Windows, signal.setitimer will be simpler than using a background thread, and on many *nix platforms a whole lot more efficient.
Here's an example:
import signal
import time

num = 50000000
startTime = time.time()

def ontimer(sig, frame):
    global i
    print(i)

signal.signal(signal.SIGVTALRM, ontimer)
signal.setitimer(signal.ITIMER_VIRTUAL, 0.5, 0.5)
j = 0
for i in range(num):
    j = (((j+10)**0.5)**2)**0.5
signal.setitimer(signal.ITIMER_VIRTUAL, 0)
print(time.time() - startTime)
This is about as close to free as you're going to get performance-wise.
In some use cases, a virtual timer isn't sufficiently accurate, so you need to change that to ITIMER_REAL and change the signal to SIGALRM. That's a little more expensive, but still pretty cheap, and still dead simple.
On some (older) *nix platforms, alarm may be more efficient than setitimer, but unfortunately alarm only takes integral seconds, so you can't use it to fire twice per second.
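For the real-time variant described above, only the signal name and timer type change; a minimal sketch of those two lines, reusing the ontimer handler from the code above:
signal.signal(signal.SIGALRM, ontimer)           # wall-clock signal instead of SIGVTALRM
signal.setitimer(signal.ITIMER_REAL, 0.5, 0.5)   # fire every 0.5 s of real time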
Timings from my MacBook Pro:
no output: 15.02s
SIGVTALRM: 15.03s
SIGALRM: 15.44s
thread: 19.9s
checking time.time(): 22.3s
(I didn't test with either dano's optimization or Tom Page's; obviously those will reduce the 22.3, but they're not going to get it down to 15.44…)
Part of the problem here is that you're using time.time.
On my MacBook Pro, time.time takes more than 1/3rd as long as all of the work you're doing:
In [2]: %timeit time.time()
10000000 loops, best of 3: 105 ns per loop
In [3]: %timeit (((j+10)**0.5)**2)**0.5
1000000 loops, best of 3: 268 ns per loop
And that 105ns is fast for time—e.g., an older Windows box with no better hardware timer than ACPI can take 100x longer.
On top of that, time.time is not guaranteed to have enough precision to do what you want anyway:
Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second.
Even on platforms where it has better precision than 1 second, it may have a lower accuracy; e.g., it may only be updated once per scheduler tick.
And time isn't even guaranteed to be monotonic; on some platforms, if the system time changes, time may go down.
Calling it less often will solve the first problem, but not the others.
So, what can you do?
Unfortunately, there's no built-in answer, at least not with Python 2.7. The best solution is different on different platforms—probably GetTickCount64 on Windows, clock_gettime with the appropriate clock ID on most modern *nixes, gettimeofday on most other *nixes. These are relatively easy to use via ctypes if you don't want to distribute a C extension… but someone really should wrap it all up in a module and post it on PyPI, and unfortunately I couldn't find one…
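(As a side note for readers not tied to Python 2.7: Python 3.3+ ships time.monotonic() and time.perf_counter(), which wrap exactly these platform clocks for you. A minimal sketch, assuming Python 3, that reports which underlying clock each one uses along with its resolution and monotonicity:)
import time

for name in ('time', 'monotonic', 'perf_counter'):
    info = time.get_clock_info(name)
    # e.g. implementation might be clock_gettime(CLOCK_MONOTONIC) on Linux
    print(name, info.implementation, 'resolution:', info.resolution,
          'monotonic:', info.monotonic)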

In what situation do we need to use `multiprocessing.Pool.imap_unordered`?

The ordering of results from the returned iterator of imap_unordered is arbitrary, and it doesn't seem to run faster than imap (which I checked with the following code), so why would one use this method?
from multiprocessing import Pool
import time

def square(i):
    time.sleep(0.01)
    return i ** 2

p = Pool(4)
nums = range(50)

start = time.time()
print 'Using imap'
for i in p.imap(square, nums):
    pass
print 'Time elapsed: %s' % (time.time() - start)

start = time.time()
print 'Using imap_unordered'
for i in p.imap_unordered(square, nums):
    pass
print 'Time elapsed: %s' % (time.time() - start)
Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by too much.
What it may do, however, is make the interval between values being available in your iteration more even. That is, if you have operations that can take very different amounts of time (rather than the consistent 0.01 seconds you were using in your example), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated values. The regular imap will delay yielding the faster ones until after the slower ones ahead of them have been computed (but this does not delay the worker processes moving on to more calculations, just the time for you to see them).
Try making your work function sleep for i*0.1 seconds, shuffling your input list, and printing i in your loops. You'll be able to see the difference between the two imap versions. Here's my version (the main function and the if __name__ == '__main__' boilerplate are required to run correctly on Windows):
from multiprocessing import Pool
import time
import random

def work(i):
    time.sleep(0.1*i)
    return i

def main():
    p = Pool(4)
    nums = range(50)
    random.shuffle(nums)

    start = time.time()
    print 'Using imap'
    for i in p.imap(work, nums):
        print i
    print 'Time elapsed: %s' % (time.time() - start)

    start = time.time()
    print 'Using imap_unordered'
    for i in p.imap_unordered(work, nums):
        print i
    print 'Time elapsed: %s' % (time.time() - start)

if __name__ == "__main__":
    main()
The imap version will have long pauses while values like 49 are being handled (taking 4.9 seconds), then it will fly over a bunch of other values (which were calculated by the other processes while we were waiting for 49 to be processed). In contrast, the imap_unordered loop will usually not pause nearly as long at one time. It will have more frequent, but shorter pauses, and its output will tend to be smoother.
imap_unordered also seems to use less memory over time than imap. At least that's what I experienced with an iterator over millions of items.
