Speeding up summation for loop in Python

I have the following bottleneck and am wondering if anyone can suggest ways to speed it up.
I have three lists x, y, z of length N, and I apply the following summation.
def abs_val_diff(x1, x2, x3, y1, y2, y3):
    """ Find the absolute value of the difference between x and y """
    return py.sqrt((x1 - y1) ** 2.0 + (x2 - y2) ** 2.0 + (x3 - y3) ** 2.0)

R = 0.1
sumV = 0.0
for i in xrange(N):
    for j in xrange(i + 1, N):
        if R > abs_val_diff(x[i], y[i], z[i],
                            x[j], y[j], z[j]):
            sumV += 1.0
I have tried using numpy arrays, but either I am doing something wrong or there is a reduction in speed of about a factor of 2.
Any ideas would be highly appreciated.

I believe you can utilize numpy a little more efficiently by doing something like the following. Make a small modification to your function to use numpy.sqrt:
import numpy as np

def abs_val_diff(x1, x2, x3, y1, y2, y3):
    """ Find the absolute value of the difference between x and y """
    return np.sqrt((x1 - y1) ** 2.0 + (x2 - y2) ** 2.0 + (x3 - y3) ** 2.0)
Then call with the full arrays:
res = abs_val_diff(x[:-1],y[:-1],z[:-1],x[1:],y[1:],z[1:])
Then, because you're adding 1 for each match, you can simply take the length of the array resulting from a query against the result:
sumV = len(res[R>res])
This lets numpy handle the iteration. Hopefully that works for you
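Note that indexing with x[:-1] and x[1:] compares only consecutive points, whereas the original double loop visits every pair i < j. For the full all-pairs count, a vectorized sketch along these lines should work (assuming scipy is acceptable here; scipy.spatial.distance.pdist computes exactly the i < j distances):

import numpy as np
from scipy.spatial.distance import pdist

# Stack the three coordinate lists into an (N, 3) array of points
pts = np.column_stack((x, y, z))
# pdist returns the distance for every pair i < j, matching the double loop
dists = pdist(pts)
sumV = float(np.count_nonzero(R > dists))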

Is there any reason you actually need to take the square root in your function? If all you do with the result is compare it against a limit, why not just square both sides of the comparison?
def abs_val_diff_squared(x1, x2, x3, y1, y2, y3):
    """ Find the square of the absolute value of the difference between x and y """
    return (x1 - y1) ** 2.0 + (x2 - y2) ** 2.0 + (x3 - y3) ** 2.0

R = 0.1
R_squared = R * R
sumV = 0.0
for i in xrange(N):
    for j in xrange(i + 1, N):
        if R_squared > abs_val_diff_squared(x[i], y[i], z[i],
                                            x[j], y[j], z[j]):
            sumV += 1.0
I also feel there ought to be much bigger savings gained from sorting the data into something like an octree, so you only have to look at nearby points rather than comparing everything against everything, but that's outside my knowledge.
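A sketch of that spatial-index idea using scipy's k-d tree (this assumes scipy is available; note also that query_pairs counts distances <= R, where the loop used a strict <):

import numpy as np
from scipy.spatial import cKDTree

pts = np.column_stack((x, y, z))
tree = cKDTree(pts)
# query_pairs returns each unordered pair (i, j), i < j, with distance <= R,
# examining only nearby points instead of all N*(N-1)/2 combinations
sumV = float(len(tree.query_pairs(R)))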

It turns out that long, ugly list comprehensions are generally faster than explicit loops in Python because they can be compiled to more efficient bytecode. I'm not sure if it'll help for you, but try something like this:
sumV = sum(1.0 for i in xrange(N) for j in xrange(i + 1, N) if R > abs_val_diff(x[i], y[i], z[i], x[j], y[j], z[j]))
Yes, it looks absolutely atrocious, but there you go. More info can be found here and here.

Related

Find steady-state of a set of differential equations

Let us assume I have a set of differential equations to be integrated with scipy odeint. Now my goal is to find the steady-state (I chose initial conditions such that this state exists). At the moment I have implemented something like
cond = True
while cond:
    x = integrate(interval = [0,t], steps = 200)
    if var(x[-22:]) < maxvar:
        cond = False
        return mean(x)
    else:
        t *= 2
Do you have a more efficient approach?
If you are using odeint, then you already have your differential equations written as a function f(x, t) (or possibly f(x, t, *args)). If your system is autonomous (i.e. f does not actually depend on t), you can find an equilibrium by solving f(x, 0) == 0 for x. You can use, for example, scipy.optimize.fsolve to solve for the equilibrium.
The following is an example. It uses the "Coupled Spring Mass System" example from the scipy cookbook. scipy.optimize.fsolve is used to find the equilibrium solution x1 = 0.5, y1 = 0, x2 = 1.5, y2 = 0.
from scipy.optimize import fsolve

def vectorfield(w, t, p):
    """
    Defines the differential equations for the coupled spring-mass system.

    Arguments:
        w : vector of the state variables:
            w = [x1, y1, x2, y2]
        t : time
        p : vector of the parameters:
            p = [m1, m2, k1, k2, L1, L2, b1, b2]
    """
    x1, y1, x2, y2 = w
    m1, m2, k1, k2, L1, L2, b1, b2 = p
    # Create f = (x1', y1', x2', y2'):
    f = [y1,
         (-b1 * y1 - k1 * (x1 - L1) + k2 * (x2 - x1 - L2)) / m1,
         y2,
         (-b2 * y2 - k2 * (x2 - x1 - L2)) / m2]
    return f

if __name__ == "__main__":
    # Parameter values
    # Masses:
    m1 = 1.0
    m2 = 1.5
    # Spring constants
    k1 = 8.0
    k2 = 40.0
    # Natural lengths
    L1 = 0.5
    L2 = 1.0
    # Friction coefficients
    b1 = 0.8
    b2 = 0.5

    # Pack up the parameters and initial conditions:
    p = [m1, m2, k1, k2, L1, L2, b1, b2]

    # Initial guess to pass to fsolve. The second and fourth components
    # are the velocities of the masses, and we know they will be 0 at
    # equilibrium. For the positions x1 and x2, we'll try 1 for both.
    # A better guess could be obtained by solving the ODEs for some time
    # interval, and using the last point of that solution.
    w0 = [1.0, 0, 1.0, 0]

    # Find the equilibrium
    eq = fsolve(vectorfield, w0, args=(0, p))
    print "Equilibrium: x1 = {0:.1f} y1 = {1:.1f} x2 = {2:.1f} y2 = {3:.1f}".format(*eq)
The output is:
Equilibrium: x1 = 0.5 y1 = 0.0 x2 = 1.5 y2 = 0.0
Referring to your comments, I don't see any better way. In this case, where you know approximately where the system will settle (this is fairly easy to predict in some systems, like a pendulum or a charging capacitor) and that it will settle, the fastest way I know is to check whether

abs(p[0] * x[i] + p[1] * x[i-1] + ... + p[n] * x[i-n] - mean(x[i-n:i])) < epsilon

The difficulty is determining the size of epsilon and the parameters p[0:n] to tune this detection.
Clearly you are already using this window method:
epsilon = maxvar
p[0:n] = 1
n = 22
and have optimized it by removing the parameters for the filter, and simply using the variance.
For many systems (where settling looks like first-order differential equations) the filter parameters turn out to look like this:
p[0] = n/2
p[n] = n/2
p[1:n-1] = 0
meaning you can make things go faster if you replace the calculation of the state variance with this simple test:

if abs(x[-22] - x[-1]) > epsilon:
This will not detect correctly if small disturbances are still present; since we don't know your system, it's difficult to say more.
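As a rough sketch, plugging that cheap test into the loop from the question might look like this (integrate, mean, and the window length 22 are carried over from the question; epsilon is an illustrative tolerance):

cond = True
while cond:
    x = integrate(interval = [0,t], steps = 200)
    # settled when the ends of the last window of samples agree
    if abs(x[-22] - x[-1]) < epsilon:
        cond = False
        return mean(x)
    else:
        t *= 2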

Calculating the sine, cosine and angle between 3 points

I've written a function to calculate the cosine, sine and degrees of the angle between three points of which I have the x and y coordinates - point 1 (x1, y1), point 2 (x2, y2) and point 3 (x3, y3). I've written the function and been trying to test it out, but I'm not completely confident in how accurate it is. Does anyone know if I've made a mistake in my calculations?
import math

def path_angle_degree(x1, y1, x2, y2, x3, y3):
    u = (x2 - x1, y2 - y1)
    v = (x3 - x2, y3 - y2)
    norm_u = math.sqrt(u[0] * u[0] + u[1] * u[1])
    norm_v = math.sqrt(v[0] * v[0] + v[1] * v[1])
    # this conditional is to check there has been movement between the points
    if norm_u < 0.001 or norm_v < 0.001:
        return (None, None, None)
    prod_n = norm_u * norm_v
    dot_uv = u[0] * v[0] + u[1] * v[1]
    cos_uv = dot_uv / prod_n
    # fixes floating point rounding
    if cos_uv > 1.0 or cos_uv < -1.0:
        cos_uv = round(cos_uv)
    radians = math.acos(cos_uv)
    sin_uv = math.sin(radians)
    degree = math.degrees(radians)
    return (cos_uv, sin_uv, degree)
An example of this function being called on a straight path would be:
print(path_angle_degree(6,6,7,6,8,6))
Thanks so much!
In addition to being expensive, sin_uv is unstable and doesn't cope with numerous edge cases. Use the cross product instead (you only need the z-component).
Also, you'll find it simpler and cheaper to normalize u and v before computing the products.
Once you have cos_uv and sin_uv, use atan2 to get the angle.
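A sketch of that approach (a hypothetical rewrite, not the poster's code; note that atan2 returns a signed angle, unlike acos):

import math

def path_angle_degree_v2(x1, y1, x2, y2, x3, y3):
    ux, uy = x2 - x1, y2 - y1
    vx, vy = x3 - x2, y3 - y2
    norm_u = math.hypot(ux, uy)
    norm_v = math.hypot(vx, vy)
    # check there has been movement between the points
    if norm_u < 0.001 or norm_v < 0.001:
        return (None, None, None)
    # normalize first, so the products below already land in [-1, 1]
    ux, uy = ux / norm_u, uy / norm_u
    vx, vy = vx / norm_v, vy / norm_v
    cos_uv = ux * vx + uy * vy   # dot product
    sin_uv = ux * vy - uy * vx   # z-component of the cross product
    radians = math.atan2(sin_uv, cos_uv)
    return (cos_uv, sin_uv, math.degrees(radians))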
You might want to look at the arctan2() function.

Explanation of this code: Determining if a point is on which side of a line

Source: http://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
"The general equation of a line given two points in it, (x1,y2) and (x2,y2), is A + Bx + Cy = 0 where A, B, C can be written in terms of the two points. Defining a vector V = (A, B, C), any point (x,y) belongs to the line if V'x = 0, where x = (1,x,y). Points for which the dot product is positive fall on one side of the line, negatives fall on the other."
I don't quite understand how it works. Also, this line in particular:
self.V = np.array([xB*yA-xA*yB, yB-yA, xA-xB])
Why is B, the coefficient of x, determined by yB - yA?
For what it's worth, I'm learning Linear Algebra, so I'm quite familiar with the mathematical concept (I realize it is meant to be a normal), but how it is done escapes me.
Why is B determined by yB - yA (or rather y2 - y1, for our example)?
The text assumes the line equation to be: A + B*x + C*y = 0
Let's say we have two points from that line, P1(x1, y1) and P2(x2, y2)
Using the two-point form of the line equation, you will get y - y1 = [(y2 - y1)/(x2 - x1)] * (x - x1) (based on P1 and P2)
The same equation can be developed into [(x2 - x1)*y1 + (y1 - y2)*x1] + (y2 - y1) * x + (x1 - x2) * y = 0
Looking at A + B*x + C*y = 0, you see that:
A is [(x2 - x1)*y1 + (y1 - y2)*x1] = x2*y1 - y2*x1
B, the coefficient of x, is (y2 - y1)
C, the coefficient of y, is (x1 - x2)
Hence the vector np.array([A, B, C]) appears in the source code as np.array([xB*yA - xA*yB, yB - yA, xA - xB])
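As a minimal sketch of how the dot product then classifies a point (the points A and B here are illustrative, not from the original post):

import numpy as np

xA, yA = 0.0, 0.0
xB, yB = 1.0, 1.0
V = np.array([xB*yA - xA*yB, yB - yA, xA - xB])  # [A, B, C]

def side(px, py):
    # positive on one side of the line AB, negative on the other, 0 on it
    return np.dot(V, np.array([1.0, px, py]))

print(side(0.5, 0.1))   # 0.4  -> one side of y = x
print(side(0.5, 0.9))   # -0.4 -> the other side
print(side(0.3, 0.3))   # 0.0  -> on the line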

Any way to speed up this Python code?

I've written some Python code to do some image processing work, but it takes a huge amount of time to run. I've spent the last few hours trying to optimize it, but I think I've reached the end of my abilities.
Looking at the outputs from the profiler, the function below is taking a large proportion of the overall time of my code. Is there any way that it can be speeded up?
def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    xnew = x - x0
    ynew = y - y0
    ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1
    return ellipse
To give the context, it is called with x and y as the output from np.meshgrid with a fairly large grid size, and all of the other parameters as simple integer values.
Although that function seems to be taking a lot of the time, there are probably ways that the rest of the code can be speeded up too. I've put the rest of the code at this gist.
Any ideas would be gratefully received. I've tried using numba and autojiting the main functions, but that doesn't help much.
Let's try to optimize make_ellipse in conjunction with its caller.
First, notice that a and b are the same over many calls. Since make_ellipse squares them each time, just have the caller do that instead.
Second, notice that np.cos(np.arctan(theta)) is 1 / np.sqrt(1 + theta**2) which seems slightly faster on my system. A similar trick can be used to compute the sine, either from theta or from cos(theta) (or vice versa).
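A quick numeric check of that identity (the value of t is illustrative):

import numpy as np

t = 0.7
print(np.cos(np.arctan(t)))    # 0.8192319...
print(1 / np.sqrt(1 + t**2))   # 0.8192319..., the same value without the trig calls
# the corresponding sine trick: sin(arctan(t)) == t / sqrt(1 + t**2)
print(np.sin(np.arctan(t)), t / np.sqrt(1 + t**2))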
Third, and less concretely, think about short-circuiting some of the final ellipse formula evaluations. For example, wherever (xnew * c + ynew * s)**2/a2 is greater than 1, the ellipse value must be False. If this happens often, you can "mask" out the second half of the (expensive) calculation of the ellipse at those locations. I haven't planned this thoroughly, but see numpy.ma for some possible leads.
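A rough sketch of that short-circuiting idea with a plain boolean mask (the signature is hypothetical, with the caller passing in precomputed c, s, a2, b2 as suggested above):

import numpy as np

def make_ellipse_masked(x, x0, y, y0, c, s, a2, b2):
    xnew = x - x0
    ynew = y - y0
    first = (xnew * c + ynew * s)**2 / a2
    ellipse = np.zeros(x.shape, dtype=bool)
    maybe = first <= 1.0   # only these points can still be inside the ellipse
    # evaluate the expensive second term only where it can still matter
    ellipse[maybe] = first[maybe] + (xnew[maybe] * s - ynew[maybe] * c)**2 / b2 <= 1.0
    return ellipse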
It won't speed up things for all cases, but if your ellipses don't take up the whole image, you should limit your search for points inside the ellipse to its bounding rectangle. I am lazy with the math, so I googled it and reused @JohnZwinck's neat cosine-of-an-arctangent trick to come up with this function:
def ellipse_bounding_box(x0, y0, theta, a, b):
    x_tan_t = -b * np.tan(theta) / a
    if np.isinf(x_tan_t):
        x_cos_t = 0
        x_sin_t = np.sign(x_tan_t)
    else:
        x_cos_t = 1 / np.sqrt(1 + x_tan_t * x_tan_t)
        x_sin_t = x_tan_t * x_cos_t
    x = x0 + a * x_cos_t * np.cos(theta) - b * x_sin_t * np.sin(theta)

    y_tan_t = b / np.tan(theta) / a
    if np.isinf(y_tan_t):
        y_cos_t = 0
        y_sin_t = np.sign(y_tan_t)
    else:
        y_cos_t = 1 / np.sqrt(1 + y_tan_t * y_tan_t)
        y_sin_t = y_tan_t * y_cos_t
    y = y0 + b * y_sin_t * np.cos(theta) + a * y_cos_t * np.sin(theta)

    return np.sort([-x, x]), np.sort([-y, y])
You can now modify your original function to something like this:
def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    x_box, y_box = ellipse_bounding_box(x0, y0, theta, a, b)
    indices = ((x >= x_box[0]) & (x <= x_box[1]) &
               (y >= y_box[0]) & (y <= y_box[1]))
    xnew = x[indices] - x0
    ynew = y[indices] - y0
    ellipse = np.zeros_like(x, dtype=np.bool)
    ellipse[indices] = ((xnew * c + ynew * s)**2/a2 +
                        (xnew * s - ynew * c)**2/b2 <= 1)
    return ellipse
Since everything but x and y are integers, you can try to minimize the number of array computations. I imagine most of the time is spent in this statement:
ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1
A simple rewriting like so should reduce the number of array operations:
a = float(a)
b = float(b)
ellipse = (xnew * (c/a) + ynew * (s/a))**2 + (xnew * (s/b) - ynew * (c/b))**2 <= 1
What was 12 array operations is now 10 (plus 4 scalar ops). I'm not sure if numba's jit would have tried this. It might just do all the broadcasting first, then jit the resulting operations. In this case, reordering so common operations are done at once should help.
Going further, you can rewrite this again as
ellipse = ((xnew + ynew * (s/c)) * (c/a))**2 + ((xnew * (s/c) - ynew) * (c/b))**2 <= 1
Or
t = numpy.tan(theta)
ellipse = ((xnew + ynew * t) * (b/a))**2 + (xnew * t - ynew)**2 <= (b/c)**2
Replacing one more array operation with a scalar, and eliminating other scalar ops to get 9 array operations and 2 scalar ops.
As always, be aware of what the range of inputs are to avoid rounding errors.
Unfortunately there's no good way to do a running sum and bail early if either of the two addends is greater than the right-hand side of the comparison. That would be an obvious speed-up, but one you'd need Cython (or C/C++) to code.
You can speed it up considerably by using Cython. There is very good documentation on how to do this.

Karatsuba algorithm too much recursion

I am trying to implement the Karatsuba multiplication algorithm in C++, but right now I am just trying to get it to work in Python.
Here is my code:
def mult(x, y, b, m):
    if max(x, y) < b:
        return x * y
    bm = pow(b, m)
    x0 = x / bm
    x1 = x % bm
    y0 = y / bm
    y1 = y % bm
    z2 = mult(x1, y1, b, m)
    z0 = mult(x0, y0, b, m)
    z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
    return mult(z2, bm ** 2, b, m) + mult(z1, bm, b, m) + z0
What I don't get is: how should z2, z1, and z0 be created? Is using the mult function recursively correct? If so, I'm messing up somewhere because the recursion isn't stopping.
Can someone point out where the error is?
NB: the response below directly addresses the OP's question about excessive recursion, but it does not attempt to provide a correct Karatsuba algorithm. The other responses are far more informative in this regard.
Try this version:
def mult(x, y, b, m):
    bm = pow(b, m)
    if min(x, y) <= bm:
        return x * y
    # NOTE the following 4 lines
    x0 = x % bm
    x1 = x / bm
    y0 = y % bm
    y1 = y / bm
    z0 = mult(x0, y0, b, m)
    z2 = mult(x1, y1, b, m)
    z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
    retval = mult(mult(z2, bm, b, m) + z1, bm, b, m) + z0
    assert retval == x * y, "%d * %d == %d != %d" % (x, y, x * y, retval)
    return retval
The most serious problem with your version is that your calculations of x0 and x1, and of y0 and y1 are flipped. Also, the algorithm's derivation does not hold if x1 and y1 are 0, because in this case, a factorization step becomes invalid. Therefore, you must avoid this possibility by ensuring that both x and y are greater than b**m.
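A quick check of the corrected version (the operands are illustrative; both are greater than b**m = 100, as required):

>>> mult(1234, 5678, 10, 2)
7006652
>>> 1234 * 5678
7006652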
EDIT: fixed a typo in the code; added clarifications
EDIT2:
To be clearer, commenting directly on your original version:
def mult(x, y, b, m):
    # The termination condition will never be true when the recursive
    # call is either
    #     mult(z2, bm ** 2, b, m)
    # or  mult(z1, bm, b, m)
    #
    # Since every recursive call leads to one of the above, you have an
    # infinite recursion condition.
    if max(x, y) < b:
        return x * y
    bm = pow(b, m)
    # Even without the recursion problem, the next four lines are wrong
    x0 = x / bm  # RHS should be x % bm
    x1 = x % bm  # RHS should be x / bm
    y0 = y / bm  # RHS should be y % bm
    y1 = y % bm  # RHS should be y / bm
    z2 = mult(x1, y1, b, m)
    z0 = mult(x0, y0, b, m)
    z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
    return mult(z2, bm ** 2, b, m) + mult(z1, bm, b, m) + z0
Usually big numbers are stored as arrays of integers, where each integer represents one digit. This approach allows multiplying any number by a power of the base with a simple left shift of the array.
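For instance, with digits stored least-significant first, multiplying by the base is just prepending a zero (an illustrative one-liner):

twelve = [2, 1]             # 12 in base 10, least significant digit first
one_twenty = [0] + twelve   # [0, 2, 1] == 120: a left shift, no multiplication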
Here is my list-based implementation (may contain bugs):
import operator

def normalize(l, b):
    over = 0
    for i, x in enumerate(l):
        over, l[i] = divmod(x + over, b)
    if over:
        l.append(over)
    return l

def sum_lists(x, y, b):
    l = min(len(x), len(y))
    res = map(operator.add, x[:l], y[:l])
    if len(x) > l:
        res.extend(x[l:])
    else:
        res.extend(y[l:])
    return normalize(res, b)

def sub_lists(x, y, b):
    res = map(operator.sub, x[:len(y)], y)
    res.extend(x[len(y):])
    return normalize(res, b)

def lshift(x, n):
    if len(x) > 1 or (len(x) == 1 and x[0] != 0):
        return [0 for i in range(n)] + x
    else:
        return x

def mult_lists(x, y, b):
    if min(len(x), len(y)) == 0:
        return [0]
    m = max(len(x), len(y))
    if m == 1:
        return normalize([x[0] * y[0]], b)
    else:
        m >>= 1
    x0, x1 = x[:m], x[m:]
    y0, y1 = y[:m], y[m:]
    z0 = mult_lists(x0, y0, b)
    z1 = mult_lists(x1, y1, b)
    z2 = mult_lists(sum_lists(x0, x1, b), sum_lists(y0, y1, b), b)
    t1 = lshift(sub_lists(z2, sum_lists(z1, z0, b), b), m)
    t2 = lshift(z1, m * 2)
    return sum_lists(sum_lists(z0, t1, b), t2, b)
sum_lists and sub_lists return an unnormalized result, in which a single digit can be greater than the base value. The normalize function solves this problem.
All functions expect to get a list of digits in reverse order. For example, 12 in base 10 should be written as [2,1]. Let's take the square of 987654321.
» a = [1,2,3,4,5,6,7,8,9]
» res = mult_lists(a,a,10)
» res.reverse()
» res
[9, 7, 5, 4, 6, 1, 0, 5, 7, 7, 8, 9, 9, 7, 1, 0, 4, 1]
The goal of the Karatsuba multiplication is to improve on the divide-and-conquer multiplication algorithm by making 3 recursive calls instead of four. Therefore, the only lines in your script that should contain a recursive call to the multiplication are those assigning z0, z1 and z2. Anything else will give you a worse complexity. You can't use pow to compute bm when you haven't defined multiplication (and a fortiori exponentiation) yet, either.
For that, the algorithm crucially uses the fact that it is using a positional notation system. If you have a representation x of a number in base b, then x*bm is simply obtained by shifting the digits of that representation m times to the left. That shifting operation is essentially "free" with any positional notation system. That also means that if you want to implement it, you have to reproduce this positional notation and the "free" shift. Either you choose to compute in base b=2 and use Python's bit operators (or the bit operators of a given decimal, hex, ... base if your test platform has them), or you decide to implement, for educational purposes, something that works for an arbitrary b, and you reproduce this positional arithmetic with something like strings, arrays, or lists.
You have a solution with lists already. I like to work with strings in Python, since int(s, base) will give you the integer corresponding to the string s seen as a number representation in base base: it makes tests easy. I have posted a heavily commented string-based implementation as a gist here, including string-to-number and number-to-string primitives for good measure.
You can test it by providing padded strings with the base and their (equal) length as arguments to mult:
In [169]: mult("987654321","987654321",10,9)
Out[169]: '975461057789971041'
If you don't want to figure out the padding or count string lengths, a padding function can do it for you:
In [170]: padding("987654321","2")
Out[170]: ('987654321', '000000002', 9)
And of course it works with b>10:
In [171]: mult('987654321', '000000002', 16, 9)
Out[171]: '130eca8642'
(Check with Wolfram Alpha.)
I believe that the idea behind the technique is that the zi terms are computed using the recursive algorithm, but the results are not combined together that way. Since the net result that you want is

z0 * B^(2m) + z1 * B^m + z2

assuming that you choose a suitable value of B (say, 2), you can compute B^m without doing any multiplications. For example, when using B = 2, you can compute B^m using bit shifts rather than multiplications. This means that the last step can be done without doing any multiplications at all.
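For instance (a minimal illustration; the values are arbitrary):

m = 5
Bm = 1 << m     # 2**5 == 32, computed as a shift with no multiplication
x = 1234
print(x << m)   # x * 2**m, again with no multiplication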
One more thing - I noticed that you've picked a fixed value of m for the whole algorithm. Typically, you would implement this algorithm by having m always be about half the number of digits of x and y when they are written in base B, so that B^m splits them roughly in half. If you're using powers of two, this would be done by picking m = ceil(log2(x) / 2).
Hope this helps!
In Python 2.7: Save this file as Karatsuba.py
def karatsuba(x, y):
    """Karatsuba multiplication algorithm.

    Return the product of two numbers in an efficient manner
    #author Shashank
    date: 23-09-2018

    Parameters
    ----------
    x : int
        First Number
    y : int
        Second Number

    Returns
    -------
    prod : int
        The product of two numbers

    Examples
    --------
    >>> import Karatsuba
    >>> a = 1234567899876543211234567899876543211234567899876543211234567890
    >>> b = 9876543211234567899876543211234567899876543211234567899876543210
    >>> Karatsuba.karatsuba(a, b)
    12193263210333790590595945731931108068998628253528425547401310676055479323014784354458161844612101832860844366209419311263526900
    """
    if len(str(x)) == 1 or len(str(y)) == 1:
        return x * y
    else:
        n = max(len(str(x)), len(str(y)))
        m = n / 2
        a = x / 10**m
        b = x % 10**m
        c = y / 10**m
        d = y % 10**m
        ac = karatsuba(a, c)                            # step 1
        bd = karatsuba(b, d)                            # step 2
        ad_plus_bc = karatsuba(a + b, c + d) - ac - bd  # step 3
        prod = ac * 10**(2*m) + bd + ad_plus_bc * 10**m # step 4
        return prod
