Speedups in Looping Structures - python

I notice some interesting behavior when it comes to building lists in different ways. .append takes longer than list-comprehensions, which take longer than map, as shown in the experiments below:
import time

def square(x):
    return x**2

def appendtime(times=10**6):
    answer = []
    start = time.clock()
    for i in range(times):
        answer.append(square(i))
    end = time.clock()
    return end - start

def comptime(times=10**6):
    start = time.clock()
    answer = [square(i) for i in range(times)]
    end = time.clock()
    return end - start

def maptime(times=10**6):
    start = time.clock()
    answer = map(square, range(times))
    end = time.clock()
    return end - start

for func in [appendtime, comptime, maptime]:
    print("%s: %s" % (func.__name__, func()))
Python 2.7:
appendtime: 0.42632
comptime: 0.312877
maptime: 0.232474
Python 3.3.3:
appendtime: 0.614167
comptime: 0.5506650000000001
maptime: 0.57115
Now, I am very aware that range in python 2.7 builds a list, so I get why there is a disparity between the times of the corresponding functions in python 2.7 and 3.3. What I am more concerned about is the relative time differences between append, list-comprehension and map.
At first, I considered that this might be because map and list comprehensions may afford the interpreter knowledge of the eventual size of the resultant list, which would allow the interpreter to malloc a sufficiently large C array under the hood to store the list. By that logic, list-comprehensions and map should take pretty much the same amount of time.
However, the timing data shows that in python 2.7, listcomps are ~1.36x as fast as append, and map is ~1.34x as fast as listcomps.
More curious is that in python 3.3, listcomps are ~1.12x as fast as append, and map is actually slower than listcomps.
Clearly, map and listcomps don't "play by the same rules"; clearly, map takes advantage of something that listcomps don't.
Could anybody shed some light on the reason behind the difference in these timing values?

First, in python3.x, map returns an iterator, NOT a list, so it does essentially no work until you actually consume its values. To make it a fair timing, in python3.x you'd need list(map(...)).
Second, .append will be slower because each time through the loop, the interpreter needs to look up the list, then it needs to look up the append method on the list. This additional .append lookup does not need to happen with the list-comp or map.
Finally, with the list-comprehension, I believe the function square needs to be looked up at every turn of your loop. With map, it is only looked up when you call map which is why if you're calling a function in your list-comprehension, map will typically be faster. Note that a list-comprehension usually beats out map with a lambda function though.
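As a hedged sketch of both points (written for Python 3, where time.clock is gone, so time.perf_counter is used here; exact numbers will vary by machine), hoisting the .append lookup and forcing the lazy map look like this:
import time

def square(x):
    return x ** 2

def appendtime_hoisted(times=10**6):
    answer = []
    append = answer.append            # hoist the attribute lookup out of the loop
    start = time.perf_counter()
    for i in range(times):
        append(square(i))
    return time.perf_counter() - start

def maptime_fair(times=10**6):
    start = time.perf_counter()
    answer = list(map(square, range(times)))   # force the lazy map so real work is timed
    return time.perf_counter() - start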

Related

Is it better for performance to use min() multiple times or to store it in a variable?

I have a small Python program that uses min(list) multiple times on the same unchanged list. This got me wondering: if I use the result of functions like min(), len(), etc. multiple times on an unchanged list, is it better for me to store those results in variables? Does it affect memory usage / performance at all?
If I had to guess I'd say that if a function like min() gets called many times it'd be better for performance to store it in a variable, but I am not sure of this since I don't really know how python gets the value or if python automatically stores this value somewhere as long as the list isn't changed.
Speed
It is almost always cheaper to store the result and re-use it rather than re-call the function multiple times.
Python does not cache (store and later remember) results from functions like min(), len(), etc.
Here is a quick speed test, run with the standard timeit module in an interactive session:
timeit.timeit("c = min(x) + min(x)", "x = [1, 2, 3]")
0.24990593400000005
timeit.timeit("a = min(x); b = a + a", "x = [1, 2, 3]")
0.1296667110000005
The second is almost twice as fast, because storing a variable is much cheaper than re-calling the min function.
Memory use
If the result is a single number, as with min() or len(), then memory use is negligible.
If the result is something substantial (e.g. a large table of values), then you can remove it when you're done with it using del:
large_object = expensive_function()
do_something(large_object)
do_something_else(large_object)
del large_object
Also, large objects will automatically be deleted from memory when they fall out of scope (e.g. when a function returns) or when garbage collection rounds happen at regular intervals. For this reason, del is only necessary in certain circumstances like when dealing with circular references to an object.
min() is very fast compared to many other operations, such as I/O, so the efficiency improvement could be small for short lists and only a few repeated calls. However, if you cache the result of min(), you can realize some time savings. See the code below for examples of the time you can actually save. As you can see, you need multiple iterations of the loops that contain min() calls to get any substantial time savings.
import timeit

lst = range(2)

def test_min():
    x = [min(lst) for i in range(10)]

def test_cached_min():
    min_lst = min(lst)
    x = [min_lst for i in range(10)]

print(timeit.timeit("test_min()", globals=locals(), number=1000))
print(timeit.timeit("test_cached_min()", globals=locals(), number=1000))

# lst = range(2):
# 0.0027015960000000006
# 0.0010772920000000005

# lst = range(2000):
# 0.5262554810000001
# 0.05257684900000004
Functions like min or max definitely have to traverse the list each time (giving them a complexity of O(n)). So yes, especially if your list is larger, it's a better idea to store the result in a variable rather than performing the calculation again.
More details about performance are in another question:
Time Complexity of Python List Operations
The table shows that:
the function len (to get the length) has complexity O(1) (very fast, because the length is already stored on the list object)
the function min (to get the minimum) is O(n) (depends on the size of the list, so it is computed each time).
This means that:
len does not need to be stored for reuse
min should be stored for reuse (especially for large lists), as the short sketch below illustrates.
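A quick, hedged sketch of that difference (exact numbers will vary by machine): len() stays essentially constant while min() grows with the list size.
import timeit

for n in (10**3, 10**6):
    data = list(range(n))
    t_len = timeit.timeit(lambda: len(data), number=10000)   # roughly constant, O(1)
    t_min = timeit.timeit(lambda: min(data), number=10000)   # grows with n, O(n)
    print(n, t_len, t_min)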
If you are only using it 1-5 times, it doesn't really matter. But if you are going to call it more often than that (and really even fewer times too), it is best to just save it as a variable. It will take next to no memory, and very little time to store it and to pull it back out.

speed up function based on list comprehension

I'm trying to get the 15 most relevant items for each user, but every function I tried took an eternity (more than 6 hours; I shut it down after that...).
I have 418 unique users, 3718 unique items.
The U2tfifd dict also has 418 entries, and there are 32645 words in tfidf_feature_names.
The shape of my interactions_full_df is (40733, 3).
I tried:
def index_tfidf_users(user_id):
    return [users for users in U2tfifd[user_id].flatten().tolist()]

def get_relevant_items(user_id):
    return sorted(zip(tfidf_feature_names, index_tfidf_users(user_id)), key=lambda x: -x[1])[:15]

def get_tfidf_token(user_id):
    return [words for words, values in get_relevant_items(user_id)]

and then:
interactions_full_df["tags"] = interactions_full_df["user_id"].apply(lambda x: get_tfidf_token(x))
or
def get_tfidf_token(user_id):
    tags = []
    v = sorted(zip(tfidf_feature_names, U2tfifd[user_id].flatten().tolist()), key=lambda x: -x[1])[:15]
    for words, values in v:
        tags.append(words)
    return tags
or
def get_tfidf_token(user_id):
    v = sorted(zip(tfidf_feature_names, U2tfifd[user_id].flatten().tolist()), key=lambda x: -x[1])[:15]
    tags = [words for words in v]
    return tags
U2tfifd is a dict with keys = user_id, values = an array
There are several things going on which could cause poor performance in your code. The impact of each of these will depend on things like your Python version (2.x or 3.x), your RAM speed, and whatnot. You'll need to experiment and benchmark the various potential improvements yourself.
1. TFIDF Sparsity (~10x speedup depending on sparsity)
One glaring potential problem is that TFIDF naturally returns sparse data (e.g. a paragraph doesn't use anywhere near as many unique words as an entire book), and working with dense structures like numpy arrays is a strange choice when the data is probably zero almost everywhere.
If you'll be doing this same analysis in the future, it might be helpful to make/use a version of TFIDF with sparse array outputs so that when you extract your tokens you can skip over the zero values. This would likely have the secondary benefit of the entire sparse array for each user fitting in the cache and preventing costly RAM access in your sorts and other operations.
It might be worth sparsifying your data anyway. On my potato, a quick benchmark on data which should be similar to yours indicates that the process can be done in ~30s. The process replaces much of the work you're doing with a highly optimized routine coded in C and wrapped for use in Python. The only real cost is the second pass through the non-zero entries, but unless that pass is pretty efficient to begin with you should be better off working with sparse data.
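As a hedged illustration (it assumes scipy is available; the small array here is just a stand-in for one user's mostly-zero TFIDF row), a sparse row lets you visit only the non-zero entries:
import numpy as np
from scipy import sparse

dense = np.array([0.0, 0.3, 0.0, 0.7, 0.0])   # stand-in for a mostly-zero TFIDF row
row = sparse.csr_matrix(dense)                # stores only the non-zero entries

for idx, val in zip(row.indices, row.data):   # visits just the two non-zero cells
    print(idx, val)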
2. Duplicated Efforts and Memoization (~100x speedup)
If U2tfifd has 418 entries and interactions_full_df has 40733 rows then at least 40315 (or 99.0%) of your calls to get_tfidf_token() are wasted since you've already computed the answer. There are tons of memoization decorators out there, but you don't need anything very complicated for your use case.
def memoize(f):
    _cache = {}
    def _f(arg):
        if arg not in _cache:
            _cache[arg] = f(arg)
        return _cache[arg]
    return _f

@memoize
def get_tfidf_token(user_id):
    ...
Breaking this down, the function memoize() returns another function. The behavior of that function is to check a local cache for the expected return value before computing it and storing it if necessary.
The syntax @memoize... is short for something like the following.
def uncached_get_tfidf_token(user_id):
    ...

get_tfidf_token = memoize(uncached_get_tfidf_token)
The @ symbol is used to signify that we want the modified, or decorated, version of get_tfidf_token() instead of the original. Depending on your application, it might be beneficial to chain decorators together.
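If you'd rather not write your own decorator, the standard library offers one with the same effect (a hedged sketch; it assumes user_id is hashable, which strings and ints are):
from functools import lru_cache

@lru_cache(maxsize=None)          # unbounded cache keyed on the argument
def get_tfidf_token(user_id):
    ...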
3. Vectorized Operations (varying speedup, benchmarking necessary)
Python doesn't really have a notion of primitive types like other languages, and even integers take 24 bytes in memory on my machine. Lists aren't usually packed, so you can incur costly cache misses as you're plowing through them. No matter how little work the CPU is doing for sorting and whatnot, clobbering a whole new chunk of memory to turn your array into a list and only using that brand new, expensive memory once is going to incur a performance hit.
Many of the things you are trying to do have fast (SIMD vectorized, parallelized, memory-efficient, packed memory, and other fun optimizations) numpy equivalents AND avoid unnecessary array copies and type conversions. It seems you're already using numpy anyway, so you won't have any extra imports or dependencies.
As one example, zip() creates another list in memory in Python 2.x and still does unnecessary work in Python 3.x when you really only care about the indices of tfidf_feature_names. To compute those indices, you can use something like the following, which avoids an unnecessary list creation and uses an optimized routine with slightly better asymptotic complexity as an added bonus.
def get_tfidf_token(user_id):
    temp = U2tfifd[user_id].flatten()
    ind = np.argpartition(temp, len(temp) - 15)[-15:]
    return tfidf_feature_names[ind]                   # works if tfidf_feature_names is a numpy array
    # return [tfidf_feature_names[i] for i in ind]    # always works
Depending on the shape of U2tfifd[user_id], you could avoid the costly .flatten() computation by passing an axis argument to np.argsort() and flattening the 15 obtained indices instead.
4. Bonus
The sorted() function supports a reverse argument so that you can avoid extra computations like throwing a negative on every value. Simply use
sorted(..., reverse=True)
Even better, since you really don't care about the sort itself but just the 15 largest values you can get away with
sorted(...)[-15:]
to index the largest 15 instead of reversing the sort and taking the smallest 15. That doesn't really matter if you're using a better function for the application like np.argpartition(), but it could be helpful in the future.
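For example, on some hypothetical (word, score) pairs, both spellings pick out the same top entries:
pairs = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.1)]

top2_desc = sorted(pairs, key=lambda p: p[1], reverse=True)[:2]   # [('b', 0.9), ('c', 0.5)]
top2_asc = sorted(pairs, key=lambda p: p[1])[-2:]                 # [('c', 0.5), ('b', 0.9)]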
You can also avoid some function calls by replacing .apply(lambda x : get_tfidf_token(x)) with .apply(get_tfidf_token) since get_tfidf_token is already a function which has the intended behavior. You don't really need the extra lambda.
As far as I can see though, most additional gains are fairly nitpicky and system-dependent. You can make most things faster with Cython or straight C with enough time for example, but you already have reasonably fast routines which do what you want out of the box. The extra engineering effort probably isn't worth any potential gains.

python generator is too slow to use it. why should I use it? and when?

Recently I got a question about which one is the fastest among an iterator, a list comprehension, iter(list comprehension) and a generator, and then made the simple code below.
n = 1000000
iter_a = iter(range(n))
list_comp_a = [i for i in range(n)]
iter_list_comp_a = iter([i for i in range(n)])
gene_a = (i for i in range(n))

import time
import numpy as np

for xs in [iter_a, list_comp_a, iter_list_comp_a, gene_a]:
    start = time.time()
    np.sum(xs)
    end = time.time()
    print((end-start)*100)
the result is below.
0.04439353942871094 # iterator
9.257078170776367 # list_comprehension
0.006318092346191406 # iterator of list_comprehension
7.491207122802734 # generator
The generator is so much slower than the other things, and I don't know when it is useful.
Generators do not store all elements in memory in one go. They yield one at a time, and this behavior makes them memory efficient. Thus you can use them when memory is a constraint.
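A quick, hedged illustration with sys.getsizeof (the exact byte counts depend on your Python build):
import sys

squares_list = [x * x for x in range(1000000)]   # materialises every element up front
squares_gen = (x * x for x in range(1000000))    # produces values one at a time

print(sys.getsizeof(squares_list))   # several megabytes
print(sys.getsizeof(squares_gen))    # ~100 bytes, regardless of n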
As a preamble : your whole benchmark is just plain wrong - the "list_comp_a" test doesn't test the construction time of a list using a list comprehension (nor does "iter_list_comp_a" fwiw), and the tests using iter() are mostly irrelevant - iter(iterable) is just a shortcut for iterable.__iter__() and is only of any use if you want to manipulate the iterator itself, which is practically quite rare.
If you hope to get some meaningful results, what you want to benchmark is the execution of a list comprehension, a generator expression and a generator function. To test their execution, the simplest way is to wrap all three cases in functions: one executing a list comprehension and the other two building lists from, respectively, a generator expression and a generator built from a generator function. In all cases I used xrange as the real source so we only benchmark the effective differences. Also, we use timeit.timeit to do the benchmark, as it's more reliable than manually messing with time.time() and is the standard canonical pythonic way to benchmark small code snippets.
import timeit

# py2 / py3 compat
try:
    xrange
except NameError:
    xrange = range

n = 1000

def test_list_comp():
    return [x for x in xrange(n)]

def test_genexp():
    return list(x for x in xrange(n))

def mygen(n):
    for x in xrange(n):
        yield x

def test_genfunc():
    return list(mygen(n))

for fname in "test_list_comp", "test_genexp", "test_genfunc":
    result = timeit.timeit("fun()", "from __main__ import {} as fun".format(fname), number=10000)
    print("{} : {}".format(fname, result))
Here (py 2.7.x on a 5+ years old standard desktop) I get the following results:
test_list_comp : 0.254354953766
test_genexp : 0.401108026505
test_genfunc : 0.403750896454
As you can see, list comprehensions are faster, and generator expressions and generator functions are mostly equivalent with a very slight (but constant if you repeat the test) advantage to generator expressions.
Now to answer your main question "why and when would you use generators", the answer is threefold: 1/ memory use, 2/ infinite iterations and 3/ coroutines.
First point: memory use. Actually, you don't need generators here, only lazy iteration, which can be obtained by writing your own iterable / iterator - like, for example, the builtin file type does - in a way that avoids loading everything in memory and only generates values on the fly. Here generator expressions and functions (and the underlying generator class) are a generic way to implement lazy iteration without writing your own iterable / iterator (just like the builtin property class is a generic way to use custom descriptors without writing your own descriptor class).
Second point: infinite iteration. Here we have something that you can't get from sequence types (lists, tuples, sets, dicts, strings etc.), which are, by definition, finite. An example is the itertools.cycle iterator:
Return elements from the iterable until it is exhausted.
Then repeat the sequence indefinitely.
Note that here again this ability comes not from generator functions or expressions but from the iterable/iterator protocol. There are obviously fewer use cases for infinite iteration than for memory use optimisations, but it's still a handy feature when you need it.
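A small sketch with itertools (islice is used here only to take a finite slice of the endless streams):
import itertools

# cycle repeats a finite sequence forever; islice takes just the first few values.
spinner = itertools.cycle("|/-\\")
print(list(itertools.islice(spinner, 6)))    # ['|', '/', '-', '\\', '|', '/']

# count is an endless arithmetic sequence; no list could represent it.
evens = itertools.count(start=0, step=2)
print(next(evens), next(evens), next(evens))  # 0 2 4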
And finally the third point: coroutines. Well, this is a rather complex concept specially the first time you read about it, so I'll let someone else do the introduction : https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
Here you have something that only generators can offer, not a handy shortcut for iterables/iterators.
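For a flavour of it, here is a minimal, hedged sketch of a generator used as a coroutine, one that both receives values via .send() and produces running totals:
def running_total():
    total = 0
    while True:
        value = yield total   # pause here; resume when .send() delivers a value
        total += value

acc = running_total()
next(acc)              # prime the coroutine up to the first yield
print(acc.send(10))    # 10
print(acc.send(5))     # 15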
I think I asked a wrong question, maybe.
In the original code, it was not correct because np.sum doesn't work well on an iterator.
np.sum(iterator) doesn't return the correct answer, so I changed my code as below.
n = 10000
iter_a = iter(range(n))
list_comp_a = [i for i in range(n)]
iter_list_comp_a = iter([i for i in range(n)])
gene_a = (i for i in range(n))

import time
import numpy as np
import timeit

for xs in [iter_a, list_comp_a, iter_list_comp_a, gene_a]:
    start = time.time()
    sum(xs)
    end = time.time()
    print("type: {}, performance: {}".format(type(xs), (end-start)*100))
And then the performance is like below: the performance of the list is best, and the iterator is not good.
type: <class 'range_iterator'>, performance: 0.021791458129882812
type: <class 'list'>, performance: 0.013279914855957031
type: <class 'list_iterator'>, performance: 0.02429485321044922
type: <class 'generator'>, performance: 0.13570785522460938
And like @Kishor Pawar already mentioned, the list is better for performance, but when memory size is not enough, the sum of a list with too high an n makes the computer slower, while the sum of an iterator with too high an n may take a lot of time to compute but doesn't make the computer slow.
Thx for all.
When I have to compute a lot of data, a generator is better.
but,

Python: For loop Vs. Map

I am currently in the process of optimising the translation part of my software, which translates co-ordinates x amount of times. My current translation code is in the translate function and the supposedly optimised portion in the translate_map function.
I read here that the map function should be used instead of for loops where possible because the loop is performed in C.
When I run a test case below, the map function actually runs slower than a standard for loop. Why does the map perform slower than the conventional for loop? How could I optimise the translate function to run faster?
import time

def translate(atom_list):
    for i in atom_list:
        i[1] += 1
        i[2] += 1
        i[3] += 1

atoms = [[1,1,1,1]]*1000
start = time.time()
for x in xrange(10000):
    translate(atoms)
print time.time() - start

atoms = [[1,1,1,1]]*1000
start = time.time()

def translate_map(atom_list):
    atom_list[1] += 1
    atom_list[2] += 1
    atom_list[3] += 1

for x in xrange(10000):
    map(translate_map, atoms)
print time.time() - start
output:
2.92705798149
4.14674210548
I suspect most of the overhead you're seeing with your map implementation comes from function call overhead. The translate function does all its work within a single loop, so there's just a single function call for the whole process. The implementation with map makes a separate function call for every item in the list.
A second source of overhead (though I suspect it is small compared to the function calls) is that map creates a list with the return values from the function. Since translate_map doesn't have a return statement, this will be all None values. Note that in Python 3, map returns a lazy iterator, so your map version won't work at all unless you iterate over the results from the map call. The explicit loop is much clearer though, so I'd stick with that (if you don't go for numpy).
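To make the Python 3 caveat concrete, a hedged sketch reusing the question's translate_map and atoms:
# Python 3: map() is lazy, so the in-place updates only happen once it is consumed.
list(map(translate_map, atoms))    # forces evaluation (builds a throwaway list of Nones)

# The plain loop is clearer and avoids the throwaway list:
for atom in atoms:
    translate_map(atom)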
Oh, yes, numpy would make this much easier (and almost certainly faster too):
def translate(arr):  # arr should be a numpy array
    arr += 1
That's it! No loops needed (at the Python level).
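Hypothetical usage, assuming the atom data is stored as an (n_atoms, 4) numpy array with columns [id, x, y, z]; only the coordinate columns are translated:
import numpy as np

atoms = np.ones((1000, 4))
translate(atoms[:, 1:])      # the slice is a view, so atoms itself is updated in place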

When is not a good time to use python generators?

This is rather the inverse of What can you use Python generator functions for?: python generators, generator expressions, and the itertools module are some of my favorite features of python these days. They're especially useful when setting up chains of operations to perform on a big pile of data--I often use them when processing DSV files.
So when is it not a good time to use a generator, or a generator expression, or an itertools function?
When should I prefer zip() over itertools.izip(), or
range() over xrange(), or
[x for x in foo] over (x for x in foo)?
Obviously, we eventually need to "resolve" a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn't what I'm asking.
We use generators so that we're not assigning new lists into memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/cpu trade-off?
I'm especially interested if anyone has done some profiling on this, in light of the eye-opening discussion of list comprehension performance vs. map() and filter().
Use a list instead of a generator when:
1) You need to access the data multiple times (i.e. cache the results instead of recomputing them):
for i in outer:           # used once, okay to be a generator or return a list
    for j in inner:       # used multiple times, reusing a list is better
        ...
2) You need random access (or any access other than forward sequential order):
for i in reversed(data): ... # generators aren't reversible
s[i], s[j] = s[j], s[i] # generators aren't indexable
3) You need to join strings (which requires two passes over the data):
s = ''.join(data) # lists are faster than generators in this use case
4) You are using PyPy which sometimes can't optimize generator code as much as it can with normal function calls and list manipulations.
In general, don't use a generator when you need list operations, like len(), reversed(), and so on.
There may also be times when you don't want lazy evaluation (e.g. to do all the calculation up front so you can release a resource). In that case, a list expression might be better.
Profile, Profile, Profile.
Profiling your code is the only way to know if what you're doing has any effect at all.
Most usages of xrange, generators, etc are over static size, small datasets. It's only when you get to large datasets that it really makes a difference. range() vs. xrange() is mostly just a matter of making the code look a tiny little bit more ugly, and not losing anything, and maybe gaining something.
Profile, Profile, Profile.
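As a minimal, hedged sketch of what that profiling can look like with the standard library (the function name here is purely illustrative):
import cProfile

def build_squares(n):
    return [x * x for x in range(n)]

cProfile.run("build_squares(10**6)")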
You should never favor zip over izip, range over xrange, or list comprehensions over generator comprehensions. In Python 3.0 range has xrange-like semantics and zip has izip-like semantics.
For those times you need an actual list, list(frob(x) for x in foo) is actually clearer.
As you mention, "This especially makes sense for large datasets", I think this answers your question.
If you're not hitting any walls, performance-wise, you can still stick to lists and standard functions. Then, when you run into problems with performance, make the switch.
As mentioned by @u0b34a0f6ae in the comments, however, using generators at the start can make it easier for you to scale to larger datasets.
Regarding performance: if using psyco, lists can be quite a bit faster than generators. In the example below, lists are almost 50% faster when using psyco.full()
import psyco
import time
import cStringIO

def time_func(func):
    """The amount of time it requires func to run"""
    start = time.clock()
    func()
    return time.clock() - start

def fizzbuzz(num):
    """That algorithm we all know and love"""
    if not num % 3 and not num % 5:
        return "%d fizz buzz" % num
    elif not num % 3:
        return "%d fizz" % num
    elif not num % 5:
        return "%d buzz" % num
    return None

def with_list(num):
    """Try getting fizzbuzz with a list comprehension and range"""
    out = cStringIO.StringIO()
    for fibby in [fizzbuzz(x) for x in range(1, num) if fizzbuzz(x)]:
        print >> out, fibby
    return out.getvalue()

def with_genx(num):
    """Try getting fizzbuzz with generator expression and xrange"""
    out = cStringIO.StringIO()
    for fibby in (fizzbuzz(x) for x in xrange(1, num) if fizzbuzz(x)):
        print >> out, fibby
    return out.getvalue()

def main():
    """
    Test speed of generator expressions versus list comprehensions,
    with and without psyco.
    """
    # our variables
    nums = [10000, 100000]
    funcs = [with_list, with_genx]

    # try without psyco 1st
    print "without psyco"
    for num in nums:
        print " number:", num
        for func in funcs:
            print func.__name__, time_func(lambda: func(num)), "seconds"
        print

    # now with psyco
    print "with psyco"
    psyco.full()
    for num in nums:
        print " number:", num
        for func in funcs:
            print func.__name__, time_func(lambda: func(num)), "seconds"
        print

if __name__ == "__main__":
    main()
Results:
without psyco
number: 10000
with_list 0.0519102208309 seconds
with_genx 0.0535933367509 seconds
number: 100000
with_list 0.542204280744 seconds
with_genx 0.557837353115 seconds
with psyco
number: 10000
with_list 0.0286369007033 seconds
with_genx 0.0513424889137 seconds
number: 100000
with_list 0.335414877839 seconds
with_genx 0.580363490491 seconds
You should prefer list comprehensions if you need to keep the values around for something else later and the size of your set is not too large.
For example:
you are creating a list that you will loop over several times later in your program.
To some extent you can think of generators as a replacement for iteration (loops) vs. list comprehensions as a type of data structure initialization. If you want to keep the data structure then use list comprehensions.
As far as performance is concerned, I can't think of any times that you would want to use a list over a generator.
I've never found a situation where generators would hinder what you're trying to do. There are, however, plenty of instances where using generators would not help you any more than not using them.
For example:
sorted(xrange(5))
Does not offer any improvement over:
sorted(range(5))
A generator builds an enumerable sequence of values. Enumerables are useful when an iterative process can use the values on demand. It takes time to build your generator, so if the list is millions of records in size, it may be more useful to use SQL Server to process the data in SQL.
