Accelerating for loops with list methods in Python

I have for loops in Python that iterate nearly 2.5 million times, and it takes a long time to get a result. In JS I can make this happen in about 1 second, but Python takes 6 seconds on my computer. I must use Python in this case. Here is the code:
for i in xrange(k, p, 2):
    arx = process[i]
    ary = process[i+1]
    for j in xrange(7, -1, -1):
        nx = arx + dirf[j]
        ny = ary + dirg[j]
        ind = ny*w + nx
        if data[ind] == e[j]:
            process[c] = nx
            c = c + 1
            process[c] = ny
            c = c + 1
            matrix[ind] = 1
Here are some of the lists from the code:
process = [None] * (11000*4) — its items are replaced with integers after the assignment.
dirf = [-1,0,1,-1,1,-1,0,1]
dirg = [1,1,1,0,0,-1,-1,-1]
e = [9, 8, 7, 6, 4, 3, 2, 1]
The data list consists of the 'r' (red) values of the pixels of an RGBA image:
data = imgobj.getdata(0)
How can I speed this up? What am I doing wrong? Are there other approaches to for loops? Thanks.

Here are a few suggestions for improving your code:
That inner xrange is being used a lot: what if you built it as a regular list once and referenced that list instead, something like this:
inner = range(7, -1, -1)  # build the actual list once

for i in xrange(k, p, 2):  # the outer loop
    # ...
    for j in inner:  # reference the prebuilt list instead of creating a new xrange each time
        # ...
Evidence:
n = range(7, -1, -1)

def one():
    z = 0
    for k in xrange(100):
        for i in n:
            z += 1

def two():
    z = 0
    for k in xrange(100):
        for i in xrange(7, -1, -1):
            z += 1

if __name__ == '__main__':
    import timeit
    print("one:")
    print(timeit.timeit("one()", number=1000000, setup="from __main__ import one"))
    print("two:")
    print(timeit.timeit("two()", number=1000000, setup="from __main__ import two"))
"result"
one:
37.798637867
two:
63.5098838806
If my test is comparable, it would appear that referencing a prebuilt inner list instead of generating a new xrange each time really does speed things up.
[edit] Referencing a local variable is faster than accessing a global, so if that is correct, place the list definition as close to the loop as possible without having it rebuilt on every iteration.
You are also writing to process twice per match. If both writes are not needed, just keep one of them.
As you mentioned in the comments, you're working with images. I am not sure if the following is relevant, but perhaps you could use OpenCV, which has a Python API to C code; that might speed it up. As others have mentioned, numpy and your own Cython extensions will speed this up considerably.
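For example, a minimal numpy sketch of just the inner 8-direction check (this is only an illustration, not the original code: it assumes data and matrix are flat numpy arrays, e.g. data = np.asarray(imgobj.getdata(0)), and it leaves the bookkeeping of process and c to the caller):

import numpy as np

dirf = np.array([-1, 0, 1, -1, 1, -1, 0, 1])
dirg = np.array([1, 1, 1, 0, 0, -1, -1, -1])
e = np.array([9, 8, 7, 6, 4, 3, 2, 1])

def check_neighbours(arx, ary, data, matrix, w):
    # check all 8 directions at once instead of looping over j
    nx = arx + dirf            # x coordinates of the 8 neighbours
    ny = ary + dirg            # y coordinates of the 8 neighbours
    ind = ny * w + nx          # flat indices into the image data
    hit = data[ind] == e       # boolean mask of neighbours that match e[j]
    matrix[ind[hit]] = 1       # mark the matches
    return nx[hit], ny[hit]    # coordinates to append to process

The outer loop over process would still be plain Python, but the eightfold inner loop collapses into a handful of array operations.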

Related

Speed of different data structures in Python

I have a piece of code I need to run; however, it is taking very long. I estimate it will take a minimum of 2 hours and at most 100 hours to run. As you can see, I would like to speed this up.
My code is as follows:
# B1
file = open("/home/1015097/Downloads/B1/tour5.txt", "r").readlines()
# This is a very large file. You can find it in http://codemasters.eng.unimelb.edu.au/bit_problems/B1.zip
# It is named tour5.txt; you can also find the question I am trying to solve there.
racers = [int(file[0].split()[0]) / float(x) for x in file[0].split()[1::]]
print("hi")

def av(race):
    race = race.split()
    j = 0
    while j != len([float(race[0]) / float(x) for x in race[1::]]):
        racers[j] = [float(race[0]) / float(x) for x in race[1::]][j] + racers[j]
        j += 1
        print(j)
    print("yay...")

for i in range(1, len(file)):
    print("yay")
    av(file[i])
print("yaaay")

a = min(racers)
del(racers[racers.index(min(racers))])
b = min(racers)
c = b - a
h = int(c)
c -= h
m = int(c * 60)
c -= m / 60
s = round(c * 60 * 60)
print(str(h) + "h" + str(m) + "m" + str(s) + "s")
We are currently coming first in Australia in the Code Bits contest and would not like to drop our perfect score. The random print statements are there so that we can tell whether the code is actually running; they are essentially checkpoints. The number that is printed out is the racer number; there are at least 3000 racers, we do not know exactly how many.
I would start with changing:
while j != len([float(race[0]) / float(x) for x in race[1::]]):
    racers[j] = [float(race[0]) / float(x) for x in race[1::]][j] + racers[j]
into
while j != len(race) - 1:
    racers[j] += float(race[0]) / float(race[j + 1])
Avoid loops like the plague. Vectorize everything. Use numpy, etc. If you have to, even look into Cython. But most importantly, vectorize.
The av() function is probably where your code spends most of its time (by the way, it would be a good idea to profile the code at various points, figure out the most taxing part, and focus on vectorizing that). Also, try to minimize the number of initializations: if you can, create an object once and reuse it.
Below is how I would change up the function.
import numpy as np

racers = np.array(racers)

def av(race, racers):
    race = race.split()
    race_float = np.array([float(race[0]) / float(x) for x in race[1::]])
    racers += race_float
    return racers
Also, please refrain from:
Using print for debugging. There is a built-in logging module; use it (a minimal sketch follows below).
Using globals. Just pass them into functions as arguments and return the new object instead of modifying a global object directly.
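As a minimal sketch of the logging suggestion (the message text and variables are just placeholders):

import logging

# configure once, near the top of the script
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

# then, instead of the print("yay") checkpoints:
log.info("processed race %d of %d", i, len(file) - 1)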
I think you should be looking at numpy arrays instead of lists; that way you can avoid the loops and get close to C speed. This problem maps easily onto that. And by the way, why not store everything as float64 or float32 so there is no conversion of data types? The code example below is not fully worked out; it is schoolwork and I should not do it for you:
import numpy as np

racer = np.array(racer)    # will work for 1-d lists and for 2-d lists with rows of the same length
racer_time = racer / time  # dividing a vector by a scalar is easy
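A slightly fuller sketch along the same lines, assuming every line of the file is "distance time_1 ... time_n" with the same number of entries per line (that assumption comes from the original code, not from having seen the file):

import numpy as np

# one row per race: [distance, time_1, ..., time_n]
rows = np.loadtxt("/home/1015097/Downloads/B1/tour5.txt")

# speed of each racer in each race, then summed over all races;
# broadcasting divides the (races, 1) distance column by the (races, racers) time block
speeds = rows[:, :1] / rows[:, 1:]
racers = speeds.sum(axis=0)   # replaces the whole av() loop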

Python: Fast mapping and lookup between two lists

I'm currently working on a high-performance Python 2.7 project using lists tens of thousands of elements in size. Obviously, every operation must be performed as fast as possible.
So, I have two lists: one of them is a list of unique arbitrary numbers, let's call it A, and the other one is a linear list starting with 1, with the same length as the first list, named B, which represents indices in A (starting with 1).
Something like enumerate, starting with 1.
For example:
A = [500, 300, 400, 200, 100] # The order here is arbitrary, they can be any integers, but every integer can only exist once
B = [ 1, 2, 3, 4, 5] # This is fixed, starting from 1, with exactly as many elements as A
If I have an element of B (called e_B) and want the corresponding element in A, I can simply do correspond_e_A = A[e_B - 1]. No problem.
But now I have a huge list of random, non-unique integers, and I want to know the indices of the integers that are in A, and what the corresponding elements in B are.
I think I have a reasonable solution for the first question:
indices_of_existing = numpy.nonzero(numpy.in1d(random_list, A))[0]
What is great about this approach is that there is no need to map() single operations, numpy's in1d just returns a list like [True, True, False, True, ...]. Using nonzero() I can get the indices of the elements in random_list that exist in A. Perfect, I think.
But for the second question, I'm stumped.
I tried something like:
corresponding_e_B = map(lambda x: numpy.where(A == x)[0][0] + 1, random_list)
This correctly gives me the indices, but it's not optimal: firstly I need a map(), secondly I need a lambda, and finally numpy.where() does not stop after the item has been found once (remember, A has only unique elements), meaning that it scales horribly with huge datasets like mine.
I took a look at bisect, but it seems bisect only works with single requests, not with lists, meaning that I'd still have to use map() and build my list elementwise (that's slow, isn't it?)
Since I'm quite new to Python, I was hoping anyone here might have an idea? Maybe a library I don't know yet?
I think you would be well advised to use a hash table for your lookups instead of numpy.in1d, which uses an O(n log n) merge sort as a preprocessing step.
>>> A = [500, 300, 400, 200, 100]
>>> index = { k:i for i,k in enumerate(A, 1) }
>>> random_list = [200, 100, 50]
>>> [i for i,x in enumerate(random_list) if x in index]
[0, 1]
>>> map(index.get, random_list)
[4, 5, None]
>>> filter(None, map(index.get, random_list))
[4, 5]
This is Python 2; in Python 3, map and filter return generator-like objects, so you would need a list around the filter if you want the result as a list.
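For example, in Python 3 the filter step from the session above would become (a sketch reusing the same names):

corresponding_e_B = list(filter(None, map(index.get, random_list)))
# [4, 5]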
I have tried to use builtin functions as much as possible to push the computational burden to the C side (assuming you use CPython). All the names are resolved upfront, so it should be pretty fast.
In general, for maximum performance, you might want to consider using PyPy, a great alternative Python implementation with JIT compilation.
A benchmark to compare multiple approaches is never a bad idea:
import sys
is_pypy = '__pypy__' in sys.builtin_module_names

import timeit
import random
if not is_pypy:
    import numpy
import bisect

n = 10000
m = 10000
q = 100

A = set()
while len(A) < n: A.add(random.randint(0, 2*n))
A = list(A)

queries = set()
while len(queries) < m: queries.add(random.randint(0, 2*n))
queries = list(queries)

# these two solve question one (find indices of queries that exist in A)
if not is_pypy:
    def fun11():
        for _ in range(q):
            numpy.nonzero(numpy.in1d(queries, A))[0]

def fun12():
    index = set(A)
    for _ in range(q):
        [i for i, x in enumerate(queries) if x in index]

# these three solve question two (find according entries of B)
def fun21():
    index = { k: i for i, k in enumerate(A, 1) }
    for _ in range(q):
        [index[i] for i in queries if i in index]

def fun22():
    index = { k: i for i, k in enumerate(A, 1) }
    for _ in range(q):
        list(filter(None, map(index.get, queries)))

def findit(keys, values, key):
    i = bisect.bisect_left(keys, key)  # leftmost position, so keys[i] == key when the key is present
    if i == len(keys) or keys[i] != key:
        return None
    return values[i]

def fun23():
    keys, values = zip(*sorted((k, i) for i, k in enumerate(A, 1)))
    for _ in range(q):
        list(filter(None, [findit(keys, values, x) for x in queries]))

if not is_pypy:
    # note this does not filter out nonexisting elements
    def fun24():
        I = numpy.argsort(A)
        ss = numpy.searchsorted
        maxi = len(I)
        for _ in range(q):
            a = ss(A, queries, sorter=I)
            I[a[a < maxi]]

tests = ("fun12", "fun21", "fun22", "fun23")
if not is_pypy: tests = ("fun11",) + tests + ("fun24",)

if is_pypy:
    # warmup
    for f in tests:
        timeit.timeit("%s()" % f, setup="from __main__ import %s" % f, number=20)

# actual timing
for f in tests:
    print("%s: %.3f" % (f, timeit.timeit("%s()" % f, setup="from __main__ import %s" % f, number=3)))
Results:
$ python2 -V
Python 2.7.6
$ python3 -V
Python 3.3.3
$ pypy -V
Python 2.7.3 (87aa9de10f9ca71da9ab4a3d53e0ba176b67d086, Dec 04 2013, 12:50:47)
[PyPy 2.2.1 with GCC 4.8.2]
$ python2 test.py
fun11: 1.016
fun12: 0.349
fun21: 0.302
fun22: 0.276
fun23: 2.432
fun24: 0.897
$ python3 test.py
fun11: 0.973
fun12: 0.382
fun21: 0.423
fun22: 0.341
fun23: 3.650
fun24: 0.894
$ pypy ~/tmp/test.py
fun12: 0.087
fun21: 0.073
fun22: 0.128
fun23: 1.131
You can tweak n (the size of A), m (the size of random_list) and q (the number of queries) to your scenario. To my surprise, my attempt to be clever and use built-in functions instead of list comprehensions has not paid off, since fun22 is not a lot faster than fun21 (only ~10% in Python 2 and ~25% in Python 3, and it is actually almost 75% slower in PyPy). A case of premature optimization here. I guess the difference is due to the fact that fun22 builds up an unnecessary temporary list per query in Python 2. We also see that binary search is pretty bad (look at fun23).
import numpy as np

def numpy_optimized(index, values):
    I = np.argsort(values)
    Q = np.searchsorted(values, index, sorter=I)
    return I[Q]
This calculates the same thing as the OP, but with the indices in matching order to the values queried, which I imagine is an improvement in functionality. It is up to twice as fast as the OP's solution on my machine, which puts it slightly ahead of the non-PyPy solutions, if I interpret your benchmarks correctly.
Or, in case we cannot assume every entry of index is present in values and would like missing queries to be silently dropped:
def numpy_optimized_filtered(index, values):
    I = np.argsort(values)
    Q = np.searchsorted(values, index, sorter=I)
    Z = I[Q]
    return Z[values[Z] == index]
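A quick usage sketch with the arrays from the question (A plays the role of values; adding 1 turns the 0-based positions into the 1-based entries of B). Note that, as written, a query value larger than everything in values can index out of bounds, so the sketch sticks to values that are present in A:

import numpy as np

A = np.array([500, 300, 400, 200, 100])
random_list = np.array([200, 100, 400])

print(numpy_optimized(random_list, A) + 1)           # [4 5 3]
print(numpy_optimized_filtered(random_list, A) + 1)  # [4 5 3]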

python printing a generator list after vectorization?

I am new to vectorization and generators. So far I have created the following function:
import numpy as np

def ismember(a, b):
    for i in a:
        if len(np.where(b == i)[0]) == 0:
            lv_var = 0
        else:
            lv_var = np.int(np.where(b == i)[0])
        yield lv_var

vect = np.vectorize(ismember)

A = np.array(xrange(700000))
B = np.array(xrange(700000))

lv_result = vect(A, B)
When I try to cast lv_result as a list or loop through the resulting numpy array, I get a list of generator objects. I need to somehow get the actual results. How do I print the actual results from this function? .next() on the generator doesn't seem to do the job.
Could someone tell me what I am doing wrong, or how I could reconfigure the code to achieve the end goal?
---------------------------------------------------
OK, so I understand the vectorize part now (thanks Viet Nguyen for the example).
I was also able to print the generator object results. The code has been modified; please see below.
For the generator part:
What I am trying to do is mimic a MATLAB function called ismember (the one called as [Lia,Locb] = ismember(A,B)); I am only trying to get the Locb part.
From MATLAB: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
One of the main problems is that I need to be able to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and stepping through its values doesn't seem to perform any better.
To print the generator, I have created the function f() below.
import numpy as np

def ismember(a, b):
    for i in a:
        index = np.where(b == i)[0]
        if len(index) == 0:
            yield 0
        else:
            yield index

def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = gen_obj.next()
    return my_array

A = np.arange(700000)
B = np.arange(700000)

gen_obj = ismember(A, B)

f(A, gen_obj)

print 'done'
Note: if we were to try the above code with smaller arrays, let's say:
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])
The result will be the array [4 0 0 4 3].
Just like MATLAB's function, the goal is to get the lowest index in B for each value in A that is a member of B; the output array, Locb, contains 0 wherever A is not a member of B.
Numpy's intersection function doesn't help me achieve the goal. Also, the returned array needs to stay the same size as array A.
So far this process takes forever (for arrays of 700k elements). Unfortunately I haven't been able to find the best solution yet. Any input on how I could reconfigure the code to achieve the end goal with the best performance will be much appreciated.
Optimization Problem solved in:
python-run-generator-using-multiple-cores-for-optimization
I believe you've misunderstood the inputs to a numpy.vectorize function. The "vectorized" function operates on the arrays on a per-element basis (see numpy.vectorize reference). Your function ismember seems to presume that the inputs a and b are arrays. Instead, think of the function as something you would use with built-in map().
>>> import numpy as np
>>> def mask(a, b):
...     return 1 if a == b else 0
...
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([1, 3, 4, 5])
>>> maskv = np.vectorize(mask)
>>> maskv(a, b)
array([1, 0, 0, 0])
Also, if I'm understanding your intention correctly, NumPy comes with an intersection function.
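For the Locb part specifically, here is a dictionary-based sketch (not from the original answer, just an illustration of the hash-lookup idea used in the earlier thread; it returns 0 for values of A that are not in B, like the OP's generator does):

import numpy as np

def ismember_locb(A, B):
    # for each value in A: the lowest 0-based index of that value in B, or 0 if absent
    first_index = {}
    for i, v in enumerate(B):
        first_index.setdefault(v, i)   # keep only the first (lowest) index
    return np.array([first_index.get(v, 0) for v in A])

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
print(ismember_locb(A, B))   # [4 0 0 4 3]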

Long argument lists and performance

This is surely not a Python-specific question, but I am looking for a Python-specific answer, if there is one. It is about putting code blocks with a large number of variables into functions (or something alike). Assume this code:
#!/usr/bin/env python

# many variables: built-in types, custom-made objects, you name it.
# Let n be a 'substantial' number, say 47.
x1 = v1
x2 = v2
...
xn = vn

# several layers of flow control, for brevity only 2 loops
for i1 in range(ri1):
    for i2 in range(ri2):
        y1 = f1(i1, i2)
        y2 = f2(i1, i2)

        # Now, several lines of work
        do_some_work
        # involving HEAVY usage and FREQUENT (say several 10**3 times)
        # access to all of x1,...,xn (and maybe y1, y2).
        # One of the main points is that slowing down access to x1,...,xn
        # will turn into a severe bottleneck for the performance of the code.

# now other things happen. These may or may not involve modification
# of x1,...,xn

# some place later in the code, again several layers of flow control,
# not necessarily identical to the first
for j1 in range(rj1):
    y1 = g1(j1)
    y2 = g2(j1)

    # Now, again
    do_some_work  # <---- this is EXACTLY THE SAME code block as above

# a.s.o.
Obviously I would like to put do_some_work into something like a function (or maybe something better?).
What would be the most performant way to do this in Python,
without function calls with a confusingly large number of arguments,
without performance-lossy indirection to access x1,...,xn (say, by wrapping them into another list, class, or the like),
and without using x1,...,xn as globals in a function do_some_work(...)?
I have to admit that I always find myself returning to globals.
A simple and dirty (probably not optimal) benchmark:
import timeit

def test_no_func():
    (x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19) = range(20)
    for i1 in xrange(100):
        for i2 in xrange(100):
            for i3 in xrange(100):
                results = [x0+x1+x2+x3+x4+x5+x6 for _ in xrange(100)]
                results.extend(x7+x8+x9+x10+x11+x12+x13+x14+x15 for _ in xrange(100))
                results.extend(x16+x17+x18+x19+x0 for _ in xrange(500))
    for j1 in xrange(100):
        for j2 in xrange(100):
            for i3 in xrange(100):
                results = [x0+x1+x2+x3+x4+x5+x6 for _ in xrange(100)]
                results.extend(x7+x8+x9+x10+x11+x12+x13+x14+x15 for _ in xrange(100))
                results.extend(x16+x17+x18+x19+x0 for _ in xrange(500))

def your_func(x_vars):
    # if the number is not too big you can simply unpack.
    # 150 would be a bit too much for unpacking...
    (x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19) = x_vars
    results = [x0+x1+x2+x3+x4+x5+x6 for _ in xrange(100)]
    results.extend(x7+x8+x9+x10+x11+x12+x13+x14+x15 for _ in xrange(100))
    results.extend(x16+x17+x18+x19+x0 for _ in xrange(500))
    return results

def test_func():
    (x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19) = range(20)
    for i1 in xrange(100):
        for i2 in xrange(100):
            for i3 in xrange(100):
                results = your_func(val for key, val in locals().copy().iteritems() if key.startswith('x'))
    for j1 in xrange(100):
        for j2 in xrange(100):
            for i3 in xrange(100):
                results = your_func(val for key, val in locals().copy().iteritems() if key.startswith('x'))

print timeit.timeit('test_no_func()', 'from __main__ import test_no_func', number=1)
print timeit.timeit('test_func()', 'from __main__ import test_func, your_func', number=1)
Result:
214.810357094
227.490054131
which is about 5% slower when passing the arguments. But you probably can't do much better than this when introducing 1 million function calls...
Global variables are significantly slower than local variables.
Also, it's almost always a bad idea to use lots of different variable names. Better use a single data structure, for example a dictionary:
d = {"x1": "foo", "x2": "bar", "y1": "baz"}
etc.
Then you can pass d to your functions (which is very fast since just the address of the dict will be passed, not the entire dictionary), and access its contents from there.
if d["x2"] = "eggs":
d["x1"] = "spam"
I recommend using the Python cProfile module. Just run your script this way:
python -m cProfile your_script.py
in different modes (with and without the function wrapper) and see how fast it works. I don't think accessing the variables is the bottleneck; usually, loops and repeated operations are.
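If you only want to profile one block instead of the whole script, an in-script variant looks roughly like this (a sketch; do_some_work() stands for whatever function you end up wrapping the block in):

import cProfile
import pstats

cProfile.run('do_some_work()', 'profile.out')   # profile a single call and dump the stats to a file
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)   # the 10 most expensive calls by cumulative time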
Secondly, I suggest thinking about abstracting the function, since you use i1, i2, etc.:
those many variables might be better kept in a list or a dictionary, and
the nested loops can be abstracted with itertools:
from itertools import product

equal_sums = 0
for l in product(range(10), repeat=6):  # instead of 6 nested loops over range(10)
    if sum(l[:3]) == sum(l[3:]):
        equal_sums += 1

Python append performance

I'm having some performance problems with 'append' in Python.
I'm writing an algorithm that checks if there are two overlapping circles in a (large) set of circles.
I start by putting the extreme points of the circles (x_i-R_i & x_i+R_i) in a list and then sorting the list.
class Circle:
    def __init__(self, middle, radius):
        self.m = middle
        self.r = radius
In between I generate N random circles and put them in the 'circles' list.
"""
Makes a list with all the extreme points of the circles.
Format = [Extreme, left/right ~ 0/1 extreme, index]
Seperate function for performance reason, python handles local variables faster.
Garbage collect is temporarily disabled since a bug in Python makes list.append run in O(n) time instead of O(1)
"""
def makeList():
"""gc.disable()"""
list = []
append = list.append
for circle in circles:
append([circle.m[0]-circle.r, 0, circles.index(circle)])
append([circle.m[0] + circle.r, 1, circles.index(circle)])
"""gc.enable()"""
return list
When running this with 50k circles it takes over 75 seconds to generate the list. As you can see in the comments, I disabled garbage collection, put the code in a separate function, and used
append = list.append
append(foo)
instead of just
list.append(foo)
I disabled gc since, after some searching, it seems there's a bug in Python causing append to run in O(n) instead of O(1) time.
So is this the fastest way, or is there a way to make it run faster?
Any help is greatly appreciated.
Instead of
for circle in circles:
    ... circles.index(circle) ...
use
for i, circle in enumerate(circles):
    ... i ...
This could decrease your O(n^2) to O(n).
Your whole makeList could be written as:
sum([[[circle.m[0]-circle.r, 0, i], [circle.m[0]+circle.r, 1, i]] for i, circle in enumerate(circles)], [])
Your performance problem is not in the append() method, but in your use of circles.index(), which makes the whole thing O(n^2).
A further (comparatively minor) improvement is to use a list comprehension instead of list.append():
mylist = [[circle.m[0] - circle.r, 0, i]
          for i, circle in enumerate(circles)]
mylist += [[circle.m[0] + circle.r, 1, i]
           for i, circle in enumerate(circles)]
Note that this will give the data in a different order (which should not matter as you are planning to sort it anyway).
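If the original interleaved order (the left extreme of each circle immediately followed by its right extreme) ever did matter, a single comprehension can preserve it; this is just a sketch using the same names:

mylist = [[circle.m[0] + sign * circle.r, side, i]
          for i, circle in enumerate(circles)
          for sign, side in ((-1, 0), (1, 1))]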
I've just tried several approaches to improve "append" speed; one of them will definitely be helpful for you:
Using Python
Using list(map(lambda ...)) - known to be a bit faster than for+append
Using Cython
Using Numba - jit
CODE CONTENT: take the numbers from 0 to 9999999, square them, and put them into a new list using append.
Using Python
import timeit

st1 = timeit.default_timer()

def f1():
    a = range(0, 10000000)
    result = []
    append = result.append
    for i in a:
        append(i**2)
    return result

f1()

st2 = timeit.default_timer()
print("RUN TIME : {0}".format(st2 - st1))
RUN TIME : 3.7 s
Using list(map(lambda ...))
import timeit
st1 = timeit.default_timer()
result = list(map(lambda x : x**2 , range(0,10000000) ))
st2 = timeit.default_timer()
print("RUN TIME : {0}".format(st2-st1))
RUN TIME : 3.6 s
Using Cython
the code in a .pyx file:
def f1():
    cdef double i
    a = range(0, 10000000)
    result = []
    append = result.append
    for i in a:
        append(i**2)
    return result
and I compiled it and ran it from a .py file.
In the .py file:
import timeit
from c1 import *
st1 = timeit.default_timer()
f1()
st2 = timeit.default_timer()
print("RUN TIME : {0}".format(st2-st1))
RUN TIME : 1.6 s
Using Numba - jit
import timeit
from numba import jit

st1 = timeit.default_timer()

@jit(nopython=True, cache=True)
def f1():
    a = range(0, 10000000)
    result = []
    append = result.append
    for i in a:
        append(i**2)
    return result

f1()

st2 = timeit.default_timer()
print("RUN TIME : {0}".format(st2 - st1))
RUN TIME : 0.57 s
CONCLUSION:
As mentioned above, changing the simple append form boosts the speed a bit, and using Cython is much faster than plain Python. However, it turned out that Numba is the best choice in terms of speed improvement for 'for+append'!
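For comparison, the plain list comprehension form of the same loop, which is usually the simplest thing to try before reaching for Cython or Numba (not timed above):

result = [i**2 for i in range(0, 10000000)]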
Try using deque from the collections package to append large amounts of data without performance degrading, and then convert the deque back to a DataFrame using a list comprehension.
Sample Case:
from collections import deque
import pandas as pd

d = deque()
for row in rows:   # 'rows', 'value_x' and 'value_y' are placeholders for your own data
    d.append([value_x, value_y])

df = pd.DataFrame({'column_x': [item[0] for item in d],
                   'column_y': [item[1] for item in d]})
This is a real time-saver.
If performance were an issue, I would avoid using append. Instead, preallocate an array and then fill it up. I would also avoid using index to find position within the list "circles". Here's a rewrite. It's not compact, but I'll bet it's fast because of the unrolled loop.
def makeList():
    """gc.disable()"""
    mylist = 6 * len(circles) * [None]
    for i in range(len(circles)):
        j = 6 * i
        mylist[j] = circles[i].m[0] - circles[i].r
        mylist[j+1] = 0
        mylist[j+2] = i
        mylist[j+3] = circles[i].m[0] + circles[i].r
        mylist[j+4] = 1
        mylist[j+5] = i
    return mylist
