I am currently in the process of optimising the translation part of my software, which translates co-ordinates x amount of times. My current translation code is in the translate function and the supposedly optimised portion in the translate_map function.
I read here that the map function should be used instead of for loops where possible because the loop is performed in C.
When I run a test case below, the map function actually runs slower than a standard for loop. Why does the map perform slower than the conventional for loop? How could I optimise the translate function to run faster?
import time

def translate(atom_list):
    for i in atom_list:
        i[1]+=1
        i[2]+=1
        i[3]+=1

atoms = [[1,1,1,1]]*1000
start = time.time()
for x in xrange(10000):
    translate(atoms)
print time.time() - start

atoms = [[1,1,1,1]]*1000
start = time.time()

def translate_map(atom_list):
    atom_list[1]+=1
    atom_list[2]+=1
    atom_list[3]+=1

for x in xrange(10000):
    map(translate_map,atoms)
print time.time() - start
output:
2.92705798149
4.14674210548
I suspect most of the overhead you're seeing with your map implementation comes from function call overhead. The translate function does all its work within a single loop, so there's just a single function call for the whole process. The implementation with map makes a separate function call for every item in the list.
A second source of overhead (though I suspect it is small compared to the function calls) is that map creates a list with the return values from the function. Since translate_map doesn't have a return statement, this will be all None values. Note that in Python 3, map returns a lazy iterator rather than a list, so your map version won't do anything at all unless you iterate over the results of the map call. The explicit loop is much clearer though, so I'd stick with that (if you don't go for numpy).
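To illustrate the Python 3 point, here is a minimal sketch. The bump helper and the sample data are made up for illustration; the point is only that calling map by itself runs nothing until the result is consumed. Note also that the inner lists are built independently here, because [[1,1,1,1]]*1000 repeats a reference to one shared list.

```python
def bump(atom):
    # Mutates one [label, x, y, z] record in place and returns None.
    atom[1] += 1
    atom[2] += 1
    atom[3] += 1

# Independent inner lists (a list multiplied by n would share one inner list).
atoms = [[0, 1, 1, 1] for _ in range(3)]

lazy = map(bump, atoms)            # Python 3: bump has not been called yet
untouched = [a[:] for a in atoms]  # snapshot taken before consuming the map

results = list(lazy)               # consuming the iterator runs bump per item

print(untouched)
print(atoms)
print(results)
```

If the return values are not needed, a plain for loop expresses the same thing without building a throwaway list of None values.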
Oh, yes, numpy would make this much easier (and almost certainly faster too):
def translate(arr): # arr should be a numpy array
    arr += 1
That's it! No loops needed (at the Python level).
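One detail worth checking against the original loop: it only increments indices 1 to 3 and leaves index 0 alone, whereas arr += 1 shifts every column. A rough sketch that keeps the first column untouched, assuming the atoms are stored as an (n, 4) float array (that layout is my assumption, not something stated in the question):

```python
import numpy as np

def translate(arr):
    # Shift columns 1..3 of every row by one in a single vectorised
    # operation; column 0 is left as it is.
    arr[:, 1:] += 1

atoms = np.ones((1000, 4))
for _ in range(10):
    translate(atoms)

print(atoms[0])
```

The slice arr[:, 1:] is a view, so the in-place addition modifies the original array without copying it.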
I'm working on a piece of code for a game that calculates the distances between all the objects on the screen using their in-game coordinate positions. Originally I was going to use basic Python and lists to do this, but since the number of distances that will need to be calculated grows quadratically with the number of objects, I thought it might be faster to do it with numpy.
I'm not very familiar with numpy, and I've been experimenting on basic bits of code with it. I wrote a little bit of code to time how long it takes for the same function to complete a calculation in numpy and in regular Python, and numpy seems to consistently take a good bit more time than the regular python.
The function is very simple. It starts with 1.1 and then increments 200,000 times, adding 0.1 to the last value and then finding the square root of the new value. It's not what I'll actually be doing in the game code, which will involve finding total distance vectors from position coordinates; it's just a quick test I threw together. I already read here that the initialization of arrays takes more time in NumPy, so I moved the initializations of both the numpy and python arrays outside their functions, but Python is still faster than numpy.
Here is the bit of code:
#!/usr/bin/python3
import numpy
from timeit import timeit
#from time import process_time as timer
import math

thing = numpy.array([1.1,0.0], dtype='float')
thing2 = [1.1,0.0]

def NPFunc():
    for x in range(1,200000):
        thing[0] += 0.1
        thing[1] = numpy.sqrt(thing[0])
    print(thing)
    return None

def PyFunc():
    for x in range(1,200000):
        thing2[0] += 0.1
        thing2[1] = math.sqrt(thing2[0])
    print(thing2)
    return None

print(timeit(NPFunc, number=1))
print(timeit(PyFunc, number=1))
It gives this result, which indicates normal Python is 3x faster:
[ 20000.99999999 141.42489173]
0.2917748889885843
[20000.99999998944, 141.42489172698504]
0.10341173503547907
Am I doing something wrong, or is this calculation just so simple that it isn't a good test for numpy?
Am I doing something wrong, or is this calculation just so simple that it isn't a good test for NumPy?
It's not really that the calculation is simple, but that you're not taking any advantage of NumPy.
The main benefit of NumPy is vectorization: you can apply an operation to every element of an array in one go, and whatever looping is needed happens inside some tightly-optimized C (or Fortran or C++ or whatever) loop inside NumPy, rather than in a slow generic Python iteration.
But you're only accessing a single value, so there's no looping to be done in C.
On top of that, because the values in an array are stored as "native" values, NumPy functions don't need to unbox them, pulling the raw C double out of a Python float, and then re-box them in a new Python float, the way any Python math functions have to.
But you're not doing that either. In fact, you're doubling that work: you're pulling the value out of the array as a float (boxing it), then passing it to a function (which has to unbox it, and then rebox it to return a result), then storing it back in an array (unboxing it again).
And meanwhile, because np.sqrt is designed to work on arrays, it has to first check the type of what you're passing it and decide whether it needs to loop over an array or unbox and rebox a single value or whatever, while math.sqrt just takes a single value. When you call np.sqrt on an array of 200000 elements, the added cost of that type switch is negligible, but when you're doing it every time through the inner loop, that's a different story.
So, it's not an unfair test.
You've demonstrated that using NumPy to pull out values one at a time, act on them one at a time, and store them back in the array one at a time is slower than just not using NumPy.
But, if you compare it to actually taking advantage of NumPy—e.g., by creating an array of 200000 floats and then calling np.sqrt on that array vs. looping over it and calling math.sqrt on each one—you'll demonstrate that using NumPy the way it was intended is faster than not using it.
You're comparing it wrong. Time a single np.sqrt call on a whole array instead:
import numpy as np
from timeit import timeit

a_list = np.arange(0,20000,0.1)
timeit(lambda:np.sqrt(a_list),number=1)
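For a side-by-side version of that comparison, something like the following should do. It is only a sketch: the array bounds, the function names and the repeat count are arbitrary choices of mine, and the actual timings will depend on the machine.

```python
import math
from timeit import timeit

import numpy as np

values = np.arange(1.1, 2001.1, 0.1)   # roughly 20,000 floats
as_list = values.tolist()              # same data as plain Python floats

def numpy_vectorised():
    # One call; the loop over elements runs inside NumPy's compiled code.
    return np.sqrt(values)

def python_loop():
    # One math.sqrt call per element, driven by the Python interpreter.
    return [math.sqrt(v) for v in as_list]

print("numpy :", timeit(numpy_vectorised, number=20))
print("python:", timeit(python_loop, number=20))
```

Both functions compute the same values, so any difference in the printed times comes from where the loop runs, not from the arithmetic.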
Recently I was asked which is the fastest among an iterator, a list comprehension, iter(list comprehension) and a generator,
so I wrote the simple code below.
n = 1000000
iter_a = iter(range(n))
list_comp_a = [i for i in range(n)]
iter_list_comp_a = iter([i for i in range(n)])
gene_a = (i for i in range(n))

import time
import numpy as np

for xs in [iter_a, list_comp_a, iter_list_comp_a, gene_a]:
    start = time.time()
    np.sum(xs)
    end = time.time()
    print((end-start)*100)
The result is below.
0.04439353942871094 # iterator
9.257078170776367 # list_comprehension
0.006318092346191406 # iterator of list_comprehension
7.491207122802734 # generator
The generator is much slower than the others,
and I don't know when it is useful.
Generators do not store all their elements in memory at once. They yield one element at a time, and this behavior makes them memory efficient. Thus you can use them when memory is a constraint.
As a preamble: your whole benchmark is just plain wrong. The "list_comp_a" test doesn't test the construction time of a list using a list comprehension (nor does "iter_list_comp_a", for what it's worth), and the tests using iter() are mostly irrelevant: iter(iterable) is just a shortcut for iterable.__iter__() and is only of any use if you want to manipulate the iterator itself, which is quite rare in practice.
If you hope to get some meaningful results, what you want to benchmark is the execution of a list comprehension, a generator expression and a generator function. To test their execution, the simplest way is to wrap all three cases in functions, one executing a list comprehension and the other two building lists from, respectively, a generator expression and a generator built from a generator function. In all cases I used xrange as the real source so we only benchmark the effective differences. Also, we use timeit.timeit to do the benchmark, as it's more reliable than manually messing with time.time() and is the standard way to benchmark small code snippets.
import timeit

# py2 / py3 compat
try:
    xrange
except NameError:
    xrange = range

n = 1000

def test_list_comp():
    return [x for x in xrange(n)]

def test_genexp():
    return list(x for x in xrange(n))

def mygen(n):
    for x in xrange(n):
        yield x

def test_genfunc():
    return list(mygen(n))

for fname in "test_list_comp", "test_genexp", "test_genfunc":
    result = timeit.timeit("fun()", "from __main__ import {} as fun".format(fname), number=10000)
    print("{} : {}".format(fname, result))
Here (py 2.7.x on a 5+ years old standard desktop) I get the following results:
test_list_comp : 0.254354953766
test_genexp : 0.401108026505
test_genfunc : 0.403750896454
As you can see, list comprehensions are faster, and generator expressions and generator functions are mostly equivalent with a very slight (but constant if you repeat the test) advantage to generator expressions.
Now to answer your main question "why and when would you use generators", the answer is threefold: 1/ memory use, 2/ infinite iterations and 3/ coroutines.
First point: memory use. Actually, you don't need generators here, only lazy iteration, which can be obtained by writing your own iterable / iterator (as the builtin file type does, for example) so as to avoid loading everything in memory and to generate values only on the fly. Here generator expressions and functions (and the underlying generator class) are a generic way to implement lazy iteration without writing your own iterable / iterator (just like the builtin property class is a generic way to use custom descriptors without writing your own descriptor class).
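As a rough illustration of the memory point, the sketch below compares the size of a list object with that of an equivalent generator expression. The variable names and the value of n are arbitrary, and sys.getsizeof only counts the container itself, not the integers it refers to, so the real gap is even larger than what it reports.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # every value exists at once
squares_gen = (i * i for i in range(n))    # values are produced on demand

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small, and does not depend on n

# The generator can still feed any consumer that only needs one pass.
total = sum(squares_gen)
print(total)
```

The trade-off is that the generator is exhausted after that single pass, while the list can be iterated over again.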
Second point: infinite iteration. Here we have something that you can't get from container types (lists, tuples, sets, dicts, strings etc.), which are, by definition, finite. An example is the itertools.cycle iterator:
Return elements from the iterable until it is exhausted.
Then repeat the sequence indefinitely.
Note that here again this ability comes not from generator functions or expressions but from the iterable/iterator protocol. There are obviously fewer use cases for infinite iteration than for memory use optimisations, but it's still a handy feature when you need it.
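A small sketch of infinite iteration, using one hand-written generator function (naturals is a made-up name) and itertools.cycle. In both cases itertools.islice is what keeps the example from running forever:

```python
from itertools import cycle, islice

def naturals():
    # Never terminates by itself: the consumer decides when to stop.
    n = 0
    while True:
        yield n
        n += 1

first_five = list(islice(naturals(), 5))
colours = list(islice(cycle(["red", "green"]), 5))

print(first_five)
print(colours)
```

Calling list() directly on either iterator would never return, which is why a bounded consumer such as islice or a loop with a break is needed.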
And finally the third point: coroutines. Well, this is a rather complex concept, especially the first time you read about it, so I'll let someone else do the introduction: https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
Here you have something that only generators can offer, not a handy shortcut for iterables/iterators.
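For a taste of what that means in practice, here is a minimal generator-based coroutine. It is my own toy example rather than one taken from the linked article: values are pushed in with send(), and each yield hands the running average back to the caller.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Pause here, hand back the current average, and wait for send().
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)           # prime the coroutine: run up to the first yield
a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(30)
print(a1, a2, a3)
```

The generator keeps its local state (total and count) between calls, which is what a plain iterator protocol does not give you for free.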
I think I may have asked the wrong question.
The original code was not correct, because np.sum doesn't work properly on an iterator.
np.sum(iterator) doesn't return the correct answer, so I changed my code as below.
n = 10000
iter_a = iter(range(n))
list_comp_a = [i for i in range(n)]
iter_list_comp_a = iter([i for i in range(n)])
gene_a = (i for i in range(n))

import time
import numpy as np
import timeit

for xs in [iter_a, list_comp_a, iter_list_comp_a, gene_a]:
    start = time.time()
    sum(xs)
    end = time.time()
    print("type: {}, performance: {}".format(type(xs), (end-start)*100))
The performance is shown below. The list performs best and the generator performs worst.
type: <class 'range_iterator'>, performance: 0.021791458129882812
type: <class 'list'>, performance: 0.013279914855957031
type: <class 'list_iterator'>, performance: 0.02429485321044922
type: <class 'generator'>, performance: 0.13570785522460938
As Kishor Pawar already mentioned, the list performs better, but when there is not enough memory, summing a list with a very large n slows the whole computer down. Summing an iterator with a very large n may take a long time to compute, but it did not slow the computer down.
Thanks to all.
When I have to process a very large amount of data, a generator is better.
but,
I am trying to understand Python generators. Many tutorials say that they are much faster than, for example, iterating through a list, so I gave it a try and wrote some simple code. I didn't expect the time difference to be that big. Can someone explain why? Or maybe I am doing something wrong here.
import timeit

def f(limit):
    for i in range(limit):
        if(i / 7.0) % 1 == 0:
            yield i

def f1(limit):
    l = []
    for i in range(limit):
        if(i / 7.0) % 1 == 0:
            l.append(i)
    return l

t = timeit.Timer(stmt="f(50)", setup="from __main__ import f")
print t.timeit()
t1 = timeit.Timer(stmt="f1(50)", setup="from __main__ import f1")
print t1.timeit()
Results:
t = 0.565694382945
t1 =11.9298217371
You are not comparing f and f1 fairly.
Your first test is only measuring how long it takes Python to construct a generator object. It never iterates over this object though, which means the code inside f is never actually executed (generators only execute their code when they are iterated over).
Your second test however measures how long it takes to call f1. Meaning, it counts how long it takes the function to construct the list l, run the for-loop to completion, call list.append numerous times, and then return the result. Obviously, this will be much slower than just producing a generator object.
For a fair comparison, exhaust the generator f by converting it into a list:
t = timeit.Timer(stmt="list(f(50))", setup="from __main__ import f")
This will cause it to be iterated over entirely, which means the code inside f will now be executed.
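You can see this for yourself with a small experiment. In the sketch below the calls list is only there to record when the body of f starts running, and I wrote the divisibility test as i % 7 == 0, which selects the same integers as the original (i / 7.0) % 1 == 0:

```python
calls = []

def f(limit):
    calls.append("started")      # runs only once iteration begins
    for i in range(limit):
        if i % 7 == 0:
            yield i

gen = f(50)             # builds the generator object, runs none of the body
before = list(calls)

values = list(gen)      # iterating is what executes the body
after = list(calls)

print(before)
print(values)
print(after)
```

The first print shows an empty list, which is the whole reason the f(50) timing looked so good.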
You're timing how long it takes to create a generator object. Creating one doesn't actually execute any code, so you're essentially timing an elaborate way to do nothing.
After you fix that, you'll find that generators are usually slightly slower when run to completion. Their advantage is that they don't need to store all elements in memory at once, and can stop halfway through. For example, when you have a sequence of boolean values and want to check whether any of them are true, with lists you'd first compute all the values and create a list of them, before checking for truth, while with generators you can:
Create the first boolean
Check if it's true, and if so, stop creating booleans
Else, create the second boolean
Check if that one's true, and if so, stop creating booleans
And so on.
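The steps above are what any() does when it is handed a generator expression. A minimal sketch, in which the is_big helper and the checked list are made up purely to record which items were actually examined:

```python
checked = []

def is_big(x):
    checked.append(x)    # record every value that gets tested
    return x > 2

data = [1, 2, 3, 4, 5]

# Generator expression: any() stops pulling values at the first True.
found_lazy = any(is_big(x) for x in data)
lazy_checked = list(checked)

checked.clear()

# List comprehension: every value is tested before any() even starts.
found_eager = any([is_big(x) for x in data])
eager_checked = list(checked)

print(found_lazy, lazy_checked)
print(found_eager, eager_checked)
```

Both calls return the same answer; the difference is only in how much work was done to get it.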
https://wiki.python.org/moin/Generators has some good information under the section improved performance. Although creating a generator can take a bit of time, it offers a number of advantages.
Uses less memory. By creating the values one by one, the whole list is never in memory.
Less time to begin. Making a whole list takes time, while a generator can be used as soon as it creates the first value.
Generators don't have a set ending point.
Here's a good tutorial on creating generators and iterators http://sahandsaba.com/python-iterators-generators.html. Check it out!
Python is a kind of "script" programming language.
In this situation:
def dic_test():
    a={}
    a[0]=[0,0,0]
    for i in range(10000000):
        a[0][0]+=1
        a[0][1]+=1
        a[0][2]+=1
    print(a)

def no_dic_test():
    a={}
    a[0]=[0,0,0]
    target=a[0]
    for i in range(10000000):
        target[0]+=1
        target[1]+=1
        target[2]+=1
    print(a)
Will no_dic_test() be faster than dic_test()?
I thought yes, because Python is dynamic: each statement is evaluated separately.
I used profile to benchmark. The first function was slower than the second one, but the difference was slight.
First function: 5 function calls in 26.113 seconds
Second function: 5 function calls in 23.835 seconds
That is an extreme case. In my own case, with something like 10k keys and 10k operations, using the dictionary directly was faster, which surprised me.
Finally, is there something like a "static compiler" (as in C) or cache optimisation for dictionaries in Python? Or are Python hash tables just so fast that the problem never shows up?
Thanks!
It's pretty obvious that the second function is doing far less work on each loop.
The first function will have to do a dict lookup and local store for each loop, whereas the second function does this once.
There are runtimes like PyPy that spot hot loops and JIT compile them for added performance, but the CPython runtime doesn't do this kind of optimisation yet.
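If you want to measure this without the profiler's own overhead, timeit is a better tool. The sketch below is a scaled-down variant of the two functions from the question (the n parameter, the return values and the repeat count are my additions) and makes no claim about what numbers you will see:

```python
from timeit import timeit

def dic_test(n=100_000):
    a = {0: [0, 0, 0]}
    for _ in range(n):
        a[0][0] += 1     # dict lookup on every statement
        a[0][1] += 1
        a[0][2] += 1
    return a

def no_dic_test(n=100_000):
    a = {0: [0, 0, 0]}
    target = a[0]        # dict lookup hoisted out of the loop
    for _ in range(n):
        target[0] += 1
        target[1] += 1
        target[2] += 1
    return a

print("dict lookup in loop:", timeit(dic_test, number=5))
print("hoisted local      :", timeit(no_dic_test, number=5))
```

Both versions produce the same dictionary, so hoisting the lookup is a safe change whenever a[0] is not rebound inside the loop.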
I notice some interesting behavior when it comes to building lists in different ways. .append takes longer than list-comprehensions, which take longer than map, as shown in the experiments below:
import time

# Note: time.clock exists in Python 2.7 and 3.3 (used for the timings
# below) but was removed in Python 3.8.
def square(x): return x**2

def appendtime(times=10**6):
    answer = []
    start = time.clock()
    for i in range(times):
        answer.append(square(i))
    end = time.clock()
    return end-start

def comptime(times=10**6):
    start = time.clock()
    answer = [square(i) for i in range(times)]
    end = time.clock()
    return end-start

def maptime(times=10**6):
    start = time.clock()
    answer = map(square, range(times))
    end = time.clock()
    return end-start

for func in [appendtime, comptime, maptime]:
    print("%s: %s" %(func.__name__, func()))
Python 2.7:
appendtime: 0.42632
comptime: 0.312877
maptime: 0.232474
Python 3.3.3:
appendtime: 0.614167
comptime: 0.5506650000000001
maptime: 0.57115
Now, I am very aware that range in python 2.7 builds a list, so I get why there is a disparity between the times of the corresponding functions in python 2.7 and 3.3. What I am more concerned about is the relative time differences between append, list-comprehension and map.
At first, I considered that this might be because map and list comprehensions may afford the interpreter knowledge of the eventual size of the resultant list, which would allow the interpreter to malloc a sufficiently large C array under the hood to store the list. By that logic, list-comprehensions and map should take pretty much the same amount of time.
However, the timing data shows that in python 2.7, listcomps are ~1.36x as fast as append, and map is ~1.34x as fast as listcomps.
More curious is that in python 3.3, listcomps are ~1.12x as fast as append, and map is actually slower than listcomps.
Clearly, map and listcomps don't "play by the same rules"; clearly, map takes advantage of something that listcomps don't.
Could anybody shed some light on the reason behind the difference in these timing values?
First, in python3.x, map returns a lazy iterator, NOT a list, so that explains the 50kx speedup there. To make it a fair timing, in python3.x you'd need list(map(...)).
Second, .append will be slower because each time through the loop, the interpreter needs to look up the list, then it needs to look up the append function on the list. This additional .append lookup does not need to happen with the list-comp or map.
Finally, with the list-comprehension, I believe the function square needs to be looked up at every turn of your loop. With map, it is only looked up when you call map which is why if you're calling a function in your list-comprehension, map will typically be faster. Note that a list-comprehension usually beats out map with a lambda function though.
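Putting those three points together, a fairer Python 3 comparison might look like the sketch below. The function names, the input size and the repeat count are arbitrary choices of mine. The append version binds answer.append to a local name once, and the map version is wrapped in list() so that it actually builds the result.

```python
from timeit import timeit

def square(x):
    return x ** 2

def with_append(n):
    answer = []
    append = answer.append      # look the bound method up once
    for i in range(n):
        append(square(i))
    return answer

def with_comprehension(n):
    return [square(i) for i in range(n)]

def with_map(n):
    # list() forces the lazy map object to produce every value.
    return list(map(square, range(n)))

for func in (with_append, with_comprehension, with_map):
    print(func.__name__, timeit(lambda: func(10_000), number=20))
```

All three return the same list, so whatever differences remain in the timings come from the looping mechanism rather than from the work done per element.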