Higher Order Functions vs loops - running time & memory efficiency? - python

Does using Higher Order Functions & Lambdas make running time & memory efficiency better or worse?
For example, to multiply all the numbers in a list:
nums = [1,2,3,4,5]
prod = 1
for n in nums:
    prod *= n
vs
prod2 = reduce(lambda x,y:x*y , nums)
Does the HOF version have any advantage over the loop version, other than needing fewer lines of code and using a functional approach?
EDIT:
I am not able to add this as an answer as I don't have the required reputation.
I tried to profile the loop and HOF approaches using timeit, as suggested by @DSM.
import timeit

def test1():
    s = """
nums = [a for a in range(1,1001)]
prod = 1
for n in nums:
    prod *= n
"""
    t = timeit.Timer(stmt=s)
    return t.repeat(repeat=10, number=100)

def test2():
    s = """
nums = [a for a in range(1,1001)]
prod2 = reduce(lambda x, y: x*y, nums)
"""
    t = timeit.Timer(stmt=s)
    return t.repeat(repeat=10, number=100)
And this is my result:
Loop:
[0.08340786340144211, 0.07211491653462579, 0.07162720686361926, 0.06593182661083438, 0.06399049758613146, 0.06605228229559557, 0.06419744588664211, 0.0671893658461038, 0.06477527090075941, 0.06418023793167627]
test1 average: 0.0644778902685
HOF:
[0.0759414223099324, 0.07616920129277016, 0.07570730355421262, 0.07604965128984942, 0.07547092059389193, 0.07544737286604364, 0.075532959799953, 0.0755039779810629, 0.07567424616704144, 0.07542563650187661]
test2 average: 0.0754917512762
On average, the loop approach seems to be slightly faster than the HOF approach.
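(A follow-up sketch, not part of the original post: on Python 3, reduce has to be imported from functools, and part of the HOF overhead above is the per-element Python-level lambda call. Swapping the lambda for operator.mul, or using math.prod (available since Python 3.8), keeps the one-liner while moving the inner work into C. Exact timings will of course vary by machine.)

import timeit

# setup shared by all three variants; nums is the same 1..1000 list as above
setup = "from functools import reduce; import operator, math; nums = list(range(1, 1001))"

print(timeit.timeit("reduce(lambda x, y: x * y, nums)", setup=setup, number=100))
print(timeit.timeit("reduce(operator.mul, nums)", setup=setup, number=100))  # no lambda call per element
print(timeit.timeit("math.prod(nums)", setup=setup, number=100))             # Python 3.8+, loop runs in C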

Higher-order functions can be very fast.
For example, map(ord, somebigstring) is much faster than the equivalent list comprehension [ord(c) for c in somebigstring]. The former wins for three reasons:
map() pre-sizes the result list to the length of somebigstring. In contrast, the list comprehension must make many calls to realloc() as it grows.
map() only has to do one lookup for ord, first checking globals, then checking and finding it in builtins. The list comprehension has to repeat this work on every iteration.
The inner loop for map runs at C speed. The loop body for the list comprehension is a series of pure Python steps that each need to be dispatched or handled by the eval-loop.
Here are some timings to confirm the prediction:
>>> from timeit import Timer
>>> print min(Timer('map(ord, s)', 's="x"*10000').repeat(7, 1000))
0.808364152908
>>> print min(Timer('[ord(c) for c in s]', 's="x"*10000').repeat(7, 1000))
1.2946639061
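(If you re-run this comparison on Python 3, note that map() now returns a lazy iterator, so it has to be wrapped in list() for an apples-to-apples comparison, and print is a function. A minimal sketch:)

from timeit import Timer

print(min(Timer('list(map(ord, s))', 's = "x" * 10000').repeat(7, 1000)))
print(min(Timer('[ord(c) for c in s]', 's = "x" * 10000').repeat(7, 1000)))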

From my experience, loops can be very fast provided they are not nested too deeply and do not involve complex higher math operations. For simple operations and a single layer of loops they can be as fast as any other approach, maybe faster, so long as only integers are used as the loop index; it really depends on what you are doing.
Also, the higher-order function may well perform just as many iterations as the loop version, and might even be a little slower; you would have to time them both to be sure.

Related

Python nested for loop faster than single for loop

Why is the nested for loop faster than the single for loop?
from time import time

start = time()
k = 0
m = 0
for i in range(1000):
    for j in range(1000):
        for l in range(100):
            m += 1
#for i in range(100000000):
#    k += 1
print int(time() - start)
For the single for loop I get a time of 14 seconds and for the nested for loop of 10 seconds
The relevant context is explained in this topic.
In short, range(100000000) builds a huge list in Python 2, whereas with the nested loops you only build lists with a total of 1000 + 1000 + 100 = 2100 elements. In Python 3, range is smarter and lazy like xrange in Python 2.
Here are some timings for the following code. Absolute runtime depends on the system, but comparing the values with each other is valuable.
import timeit

runs = 100

code = '''k = 0
for i in range(1000):
    for j in range(1000):
        for l in range(100):
            k += 1'''
print(timeit.timeit(stmt=code, number=runs))

code = '''k = 0
for i in range(100000000):
    k += 1'''
print(timeit.timeit(stmt=code, number=runs))
Outputs:
CPython 2.7 - range
264.650791883
372.886064053
Interpretation: building huge lists takes time.
CPython 2.7 - range exchanged with xrange
231.975350142
221.832423925
Interpretation: almost equal, as expected. (Nested for loops should have slightly
larger overhead than a single for loop.)
CPython 3.6 - range
365.20924194483086
437.26447860104963
Interpretation: Interesting! I did not expect this. Anyone?
It is because you are using Python 2. range generates a list of numbers and has to allocate that list. In the nested loops you allocate 1000 + 1000 + 100 = 2100 elements in total, while in the single loop the list has 100000000 elements, which is much bigger.
In Python 2 it is better to use xrange(), a generator-style object that yields the numbers instead of building and allocating a list of them.
Additionally, for further information you can read this question, which is related to this one but covers Python 3.
In Python 2, range creates a list containing all of the numbers. Try swapping range with xrange and you should see them take comparable time, or the single-loop approach may even work a bit faster.
During the nested loops Python has to allocate 1000 + 1000 + 100 = 2100 values for the counters, whereas in the single loop it has to allocate 100M. This is what takes the extra time.
I have tested this in Python 3.6 and the behaviour is similar, so I would say it is very likely a memory-allocation issue.
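(To make the Python 3 behaviour concrete, a small sketch showing that a range object occupies a small, constant amount of memory regardless of its length, unlike a materialized list:)

import sys

r = range(100000000)
print(sys.getsizeof(r))                   # a small, constant size; no elements are materialized
print(sys.getsizeof(list(range(2100))))   # a real list; its size grows with the element count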

Efficiency of Python "in" keyword for sorted list

If I have a list that is already sorted and use the in keyword, for example:
a = [1,2,5,6,8,9,10]
print 8 in a
I think this should do a sequential search, but can I make it faster by doing a binary search?
Is there a pythonic way to search in a sorted list?
The standard library has the bisect module which supports searching in sorted sequences.
However, for small lists, I would bet that the C implementation behind the in operator would beat out bisect. You'd have to measure with a bunch of common cases to determine the real break-even point on your target hardware...
It's worth noting that if you can get away with an unordered iterable (i.e. a set), then you can do the lookup in O(1) time on average (using the in operator), compared to bisection on a sequence which is O(logN) and the in operator on a sequence which is O(N). And, with a set you also avoid the cost of sorting it in the first place :-).
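(A rough way to see all three options side by side is to time a worst-case membership test: linear in on a list, bisect on the same sorted list, and in on a set. This is only a sketch; the break-even points depend on list size and hardware.)

import timeit

setup = """
import bisect
a = list(range(10000))   # sorted list
s = set(a)
"""
print(timeit.timeit("9999 in a", setup=setup, number=10000))                               # linear scan, O(n)
print(timeit.timeit("a[bisect.bisect_left(a, 9999)] == 9999", setup=setup, number=10000))  # binary search, O(log n)
print(timeit.timeit("9999 in s", setup=setup, number=10000))                               # hash lookup, O(1) average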
There is a binary search for Python in the standard library, in module bisect. It does not support in/contains as is, but you can write a small function to handle it:
from bisect import bisect_left

def contains(a, x):
    """Return True if sorted sequence `a` contains `x`."""
    i = bisect_left(a, x)
    return i != len(a) and a[i] == x
Then
>>> contains([1,2,3], 3)
True
>>> contains([1,2,3], 4)
False
This is quite fast in practice: although bisect has a pure-Python implementation, CPython has shipped an optional C acceleration for the module since Python 2.4. Even so, for small lists you may well find a sequential in faster.
It is hard to time the exact break-even point in CPython. Because the code is written in C, if you check for a value that is greater than or less than every value in the sequence, the CPU's branch prediction will play tricks on you, and you get:
In [2]: a = list(range(100))
In [3]: %timeit contains(a, 101)
The slowest run took 8.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 370 ns per loop
Here, the best of 3 is not representative of the true running time of the algorithm.
But tweaking tests, I've reached the conclusion that bisecting might be faster than in for lists having as few as 30 elements.
However, if you're doing very many in operations you ought to use a set: you can convert the list into a set once (it does not even need to be sorted), and the in operation will be asymptotically faster than any binary search could ever be:
>>> a = [10, 6, 8, 1, 2, 5, 9]
>>> a_set = set(a)
>>> 10 in a_set
True
On the other hand, sorting a list has greater time-complexity than building a set, so most of the time a set would be the way to go.
I would go with this pure one-liner (providing bisect is imported):
a and a[bisect.bisect_right(a, x) - 1] == x
Stress test:
from bisect import bisect_right
from random import randrange

def contains(a, x):
    return a and a[bisect_right(a, x) - 1] == x

for _ in range(10000):
    a = sorted(randrange(10) for _ in range(10))
    x = randrange(-5, 15)
    assert (x in a) == contains(a, x), f"Error for {x} in {a}"
... doesn't print anything.
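(If you want the in syntax itself to use binary search, one option is a thin wrapper class that implements __contains__ with bisect. This is only a sketch; the class name is illustrative, not a standard one.)

import bisect

class SortedContains:
    """Illustrative wrapper: makes `x in obj` use binary search on a sorted list."""
    def __init__(self, sorted_items):
        self._items = sorted_items            # assumed to be already sorted
    def __contains__(self, x):
        i = bisect.bisect_left(self._items, x)
        return i != len(self._items) and self._items[i] == x

a = SortedContains([1, 2, 5, 6, 8, 9, 10])
print(8 in a)    # True, found via bisect rather than a linear scan
print(7 in a)    # False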

Why is direct indexing of an array significantly faster than iteration?

Just some Python code for an example:
from timeit import default_timer as timer  # assuming timer refers to timeit's default_timer

nums = [1,2,3]

start = timer()
for i in range(len(nums)):
    print(nums[i])
end = timer()
print((end-start)) #computed to 0.0697546862831

start = timer()
print(nums[0])
print(nums[1])
print(nums[2])
end = timer()
print((end-start)) #computed to 0.0167170338524
I can grasp that some extra time will be taken in the loop because the value of i must be incremented a few times, but the difference between the running times of these two different methods seems a lot bigger than I expected. Is there something else happening underneath the hood that I'm not considering?
Short answer: it isn't, unless the loop is very small. The for loop has a small overhead, but the way you're doing it is inefficient. By using range(len(nums)) you're effectively creating another list and iterating through that, then doing the same index lookups anyway. Try this:
for i in nums:
    print(i)
Results for me were as expected:
>>> import timeit
>>> timeit.timeit('nums[0];nums[1];nums[2]', setup='nums = [1,2,3]')
0.10711812973022461
>>> timeit.timeit('for i in nums:pass', setup='nums = [1,2,3]')
0.13474011421203613
>>> timeit.timeit('for i in range(len(nums)):pass', setup='nums = [1,2,3]')
0.42371487617492676
With a bigger list the advantage of the loop becomes apparent, because the incremental cost of accessing an element by index outweighs the one-off cost of the loop:
>>> timeit.timeit('for i in nums:pass', setup='nums = range(0,100)')
1.541944980621338
>>> timeit.timeit(';'.join('nums[%s]' % i for i in range(0,100)), setup='nums = range(0,100)')
2.5244338512420654
In Python 3, which puts a greater emphasis on iterators over indexable lists, the difference is even greater:
>>> timeit.timeit('for i in nums:pass', setup='nums = range(0,100)')
1.6542046590038808
>>> timeit.timeit(';'.join('nums[%s]' % i for i in range(0,100)), setup='nums = range(0,100)')
10.331634456000756
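(Side note, not from the answers above: if you actually need the index as well as the value, enumerate() keeps the fast direct iteration without falling back to range(len(...)).)

nums = [1, 2, 3]
for i, n in enumerate(nums):
    print(i, n)   # both index and value, no range(len(nums)) needed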
With such a small array you're probably measuring noise first, and then the overhead of calling range(). Note that range not only has to increment a variable a few times, it also creates an object that holds its state (the current value) because it's a generator. The function call and object creation are two things you don't pay for in the second example and for very short iterations they will probably dwarf three array accesses.
Essentially your second snippet does loop unrolling, which is a viable and frequent technique of speeding up performance-critical code.
The for loop has a cost in any case, and the one you wrote is especially costly. Here are four versions, using timeit to measure the time:
from timeit import timeit

NUMS = [1, 2, 3]

def one():
    for i in range(len(NUMS)):
        NUMS[i]

def one_no_access():
    for i in range(len(NUMS)):
        i

def two():
    NUMS[0]
    NUMS[1]
    NUMS[2]

def three():
    for i in NUMS:
        i

for func in (one, one_no_access, two, three):
    print(func.__name__ + ':', timeit(func))
Here are the resulting times:
one: 1.0467438200000743
one_no_access: 0.8853238560000136
two: 0.3143197629999577
three: 0.3478466749998006
one_no_access shows the cost of the expression range(len(NUMS)).
Since lists in Python are stored contiguously in memory, random access to an element is O(1), which explains why two is the quickest.

Why is my quicksort so slow in python?

I tried to write a quicksort in Python (to learn algorithms), but I found it to be about 10x slower than the native sort. Here's the result:
16384 numbers:
native: 5.556 ms
quicksort: 96.412 ms
65536 numbers:
native: 27.190 ms
quicksort: 436.110 ms
262144 numbers:
native: 151.820 ms
quicksort: 1975.943 ms
1048576 numbers:
native: 792.091 ms
quicksort: 9097.085 ms
4194304 numbers:
native: 3979.032 ms
quicksort: 39106.887 ms
Does it mean that there's something wrong with my implementation?
Or that's OK because the native sort uses a lot of low-level optimization?
Nevertheless, it feels unacceptable for sorting 1 million numbers to take nearly 10 seconds, even though I wrote it just for learning rather than for practical application. And my computer is quite fast.
Here's my code:
import random

def quicksort(lst):
    quicksortinner(lst, 0, len(lst) - 1)

def quicksortinner(lst, start, end):
    if start >= end:
        return
    j = partition(lst, start, end)
    quicksortinner(lst, start, j - 1)
    quicksortinner(lst, j + 1, end)

def partition(lst, start, end):
    pivotindex = random.randrange(start, end + 1)
    swap(lst, pivotindex, end)
    pivot = lst[end]
    i, j = start, end - 1
    while True:
        while lst[i] <= pivot and i <= end - 1:
            i += 1
        while lst[j] >= pivot and j >= start:
            j -= 1
        if i >= j:
            break
        swap(lst, i, j)
    swap(lst, i, end)
    return i

def swap(lst, a, b):
    if a == b:
        return
    lst[a], lst[b] = lst[b], lst[a]
In partition, i scans right and j scans left (the approach from Algorithms). Earlier I tried the variant where both scan right (maybe more common), and there was not much difference.
The native sort is written in C. Your quicksort is written in pure Python. A speed difference of 10x is expected. If you run your code using PyPy, you should get closer to native speed (PyPy uses a tracing JIT to achieve high performance). Likewise, Cython would give a nice speed boost as well (Cython is a Python-to-C compiler).
A way to tell if your algorithm is even in the same ballpark is to count the number of comparisons used by both sort algorithms. In finely tuned code, the comparison costs dominate the running time. Here's a tool for counting comparisons:
class CountCmps(float):
    def __lt__(self, other):
        global cnt
        cnt += 1
        return float.__lt__(self, other)
>>> from random import random
>>> data = [CountCmps(random()) for i in range(10000)]
>>> cnt = 0
>>> data.sort()
>>> cnt
119883
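(To apply the same idea to the quicksort in the question, a sketch, assuming the quicksort code above and the random module are already in scope: its partition uses <= and >= rather than <, so those operators need counting hooks too.)

import random

class CountAllCmps(float):
    # Counts <, <=, and >=, since the question's partition uses all of them.
    def _tick(self):
        global cnt
        cnt += 1
    def __lt__(self, other):
        self._tick()
        return float.__lt__(self, other)
    def __le__(self, other):
        self._tick()
        return float.__le__(self, other)
    def __ge__(self, other):
        self._tick()
        return float.__ge__(self, other)

cnt = 0
data = [CountAllCmps(random.random()) for i in range(10000)]
quicksort(data)   # the quicksort from the question, assumed to be defined
print(cnt)        # compare against the count reported for list.sort() above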
One other factor is that your call to random.randrange() has many pure-Python steps and does more work than you might expect, so it is a non-trivial component of the total run time. Because random pivot selection can be slow, consider using a median-of-three technique for selecting the pivot.
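(A minimal sketch of what median-of-three pivot selection could look like; the function name is illustrative, and it would slot in where pivotindex is currently chosen with random.randrange().)

def median_of_three_index(lst, start, end):
    # Pick the index of whichever of lst[start], lst[mid], lst[end] holds the
    # middle value; this avoids the cost of random.randrange() and guards
    # against already-sorted input.
    mid = (start + end) // 2
    a, b, c = lst[start], lst[mid], lst[end]
    if a <= b:
        if b <= c:
            return mid                      # a <= b <= c
        return end if a <= c else start     # a <= c < b  or  c < a <= b
    if a <= c:
        return start                        # b < a <= c
    return end if b <= c else mid           # b <= c < a  or  c < b < a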
Also, the call to the swap() function isn't fast in CPython. Inlining that code should give you a speed boost.
As you can see, there is a lot more to optimizing Python than just selecting a good algorithm. Hope this answer gets you further to your goal :-)
You will gain a small speed-up by moving to iteration instead of recursion, although the large part of this is probably due to the native code being very fast.
I illustrate this with reference to MergeSort. Apologies for not using QuickSort - they work with about the same speed but MergeSort takes a little less time to wrap your head around, and the iterative version is more easy to demonstrate.
Essentially, MergeSort sorts a list by breaking it in half, sorting the two halves separately (using itself, of course!), and merging the results; sorted lists can be merged in O(n) time, so this gives overall O(n log n) performance.
Here is a simple recursive MergeSort algorithm:
def mergeSort(theList):
    if len(theList) == 1:
        return theList
    theLength = int(len(theList)/2)
    return mergeSorted( mergeSort(theList[0:theLength]), mergeSort(theList[theLength:]) )

def mergeSorted(theList1, theList2):
    sortedList = []
    counter1 = 0
    counter2 = 0
    while True:
        if counter1 == len(theList1):
            return sortedList + theList2[counter2:]
        if counter2 == len(theList2):
            return sortedList + theList1[counter1:]
        if theList1[counter1] < theList2[counter2]:
            sortedList.append(theList1[counter1])
            counter1 += 1
        else:
            sortedList.append(theList2[counter2])
            counter2 += 1
Exactly as you found, this is beaten into the ground by the in-built sorting algorithm:
import timeit
setup = """from __main__ import mergeSortList
import random
theList = [random.random() for x in xrange(1000)]"""
timeit.timeit('theSortedList1 = sorted(theList)', setup=setup, number=1000)
#0.33633776246006164
timeit.timeit('theSortedList1 = mergeSort(theList)', setup=setup, number=1000)
#8.415547955717784
However, a bit of a time boost can be had by eliminating the recursive function calls in the mergeSort function (this also avoids the dangers of hitting recursion limits). This is done by starting at the base elements, and combining them pairwise, a bottom-up approach instead of a top-down approach. For example:
def mergeSortIterative(theList):
    theNewList = map(lambda x: [x], theList)
    theLength = 1
    while theLength < len(theList):
        theNewNewList = []
        pairs = zip(theNewList[::2], theNewList[1::2])
        for pair in pairs:
            theNewNewList.append( mergeSorted( pair[0], pair[1] ) )
        if len(pairs) * 2 < len(theNewList):
            theNewNewList.append(theNewList[-1])
        theLength *= 2
        theNewList = theNewNewList
    return theNewNewList[0]
Now the growing sorted lists are stored at each iteration, reducing the memory requirements and eliminating the recursive function calls. Running this gives about a 15% speed boost in my running time, and this was a quickly-thrown-together version:
setup = """from __main__ import mergeSortIterative
import random
theList = [random.random() for x in xrange(1000)]"""
timeit.timeit('theSortedList1 = mergeSortIterative(theList)', setup=setup, number=1000)
#7.1798827493580575
So I'm still nowhere near the in-built version, but a little bit better than I was doing before.
A recipe for iterative QuickSort can be found here.
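(In the same spirit, a minimal sketch of an iterative quicksort that replaces the recursion with an explicit stack of index ranges, reusing the partition() function from the question's code; this avoids recursion-limit issues, though the partitioning cost itself is unchanged.)

def quicksort_iterative(lst):
    # Explicit stack of (start, end) ranges instead of recursive calls.
    stack = [(0, len(lst) - 1)]
    while stack:
        start, end = stack.pop()
        if start >= end:
            continue
        j = partition(lst, start, end)   # partition() from the question's code
        stack.append((start, j - 1))
        stack.append((j + 1, end))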

Why this list comprehension is faster than equivalent generator expression?

I'm using Python 3.3.1 64-bit on Windows and this code snippet:
len ([None for n in range (1, 1000000) if n%3 == 1])
executes in 136ms, compared to this one:
sum (1 for n in range (1, 1000000) if n%3 == 1)
which executes in 146ms. Shouldn't a generator expression be faster or the same speed as the list comprehension in this case?
I quote from Guido van Rossum's From List Comprehensions to Generator Expressions:
...both list comprehensions and generator expressions in Python 3 are
actually faster than they were in Python 2! (And there is no longer a
speed difference between the two.)
EDIT:
I measured the time with timeit. I know that it is not very accurate, but I care only about relative speeds here, and I'm getting a consistently shorter time for the list-comprehension version when I test with different numbers of iterations.
I believe the difference here is entirely in the cost of 1000000 additions. Testing with 64-bit Python.org 3.3.0 on Mac OS X:
In [698]: %timeit len ([None for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 127 ms per loop
In [699]: %timeit sum (1 for n in range (1, 1000000) if n%3 == 1)
10 loops, best of 3: 138 ms per loop
In [700]: %timeit sum ([1 for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 139 ms per loop
So, it's not that the comprehension is faster than the genexp; they both take about the same time. But calling len on a list is instant, while summing 1M numbers adds another 7% to the total time.
Throwing a few different numbers at it, this seems to hold up unless the list is very tiny (in which case it does seem to get faster), or large enough that memory allocation starts to become a significant factor (which it isn't yet, at 333K).
Borrowed from this answer, there are two things to consider:
1. A Python list is indexable, and fetching its length takes only O(1) time, so the speed of calling len() on a list does not depend on its size. A generator, however, has no length to fetch: len() on a generator raises a TypeError, and counting its items (for example with sum(1 for _ in gen)) means consuming everything it generates, which is O(n).
2. See the linked answer above. A list comprehension runs as a tight loop, whereas a generator has to store a reference to its iterator internally and call next(iter) for every item it generates, adding another layer of overhead. At a small scale the performance difference between list comprehensions and generators can safely be ignored, but at a larger scale you have to take it into account.
