Vectorize Forward Euler method for system of differential equations - python

I am numerically solving for x(t) for a system of first-order differential equations. The system is:
dx/dt = y
dy/dt = -x - a*y*(x^2 + y^2 - 1)
I have implemented the Forward Euler method to solve this problem as follows:
def forward_euler():
    h = 0.01
    num_steps = 10000
    x = np.zeros([num_steps + 1, 2])  # steps, number of solutions
    y = np.zeros([num_steps + 1, 2])
    a = 1.
    x[0, 0] = 10.  # initial condition, 1st solution
    y[0, 0] = 5.
    x[0, 1] = 0.   # initial condition, 2nd solution
    y[0, 1] = 0.0000000001
    for step in xrange(num_steps):
        x[step + 1] = x[step] + h * y[step]
        y[step + 1] = y[step] + h * (-x[step] - a * y[step] * (x[step] ** 2 + y[step] ** 2 - 1))
    return x, y
Now I would like to vectorize the code further and keep x and y in the same array. I have come up with the following solution:
def forward_euler_vector():
    num_steps = 10000
    h = 0.01
    x = np.zeros([num_steps + 1, 2, 2])  # steps, variables, number of solutions
    a = 1.
    x[0, 0, 0] = 10.  # initial conditions, 1st solution
    x[0, 1, 0] = 5.
    x[0, 0, 1] = 0.   # initial conditions, 2nd solution
    x[0, 1, 1] = 0.0000000001
    def f(x):
        return np.array([x[1],
                         -x[0] - a * x[1] * (x[0] ** 2 + x[1] ** 2 - 1)])
    for step in xrange(num_steps):
        x[step + 1] = x[step] + h * f(x[step])
    return x
The question: forward_euler_vector() works, but is this the best way to vectorize it? I am asking because the vectorized version runs about 20 ms slower on my laptop:
In [27]: %timeit forward_euler()
1 loops, best of 3: 301 ms per loop
In [65]: %timeit forward_euler_vector()
1 loops, best of 3: 320 ms per loop

There is always the trivial autojit solution:
def forward_euler(initial_x, initial_y, num_steps, h):
    x = np.zeros([num_steps + 1, 2])  # steps, number of solutions
    y = np.zeros([num_steps + 1, 2])
    a = 1.
    x[0, 0] = initial_x[0]  # initial condition, 1st solution
    y[0, 0] = initial_y[0]
    x[0, 1] = initial_x[1]  # initial condition, 2nd solution
    y[0, 1] = initial_y[1]
    for step in xrange(int(num_steps)):
        x[step + 1] = x[step] + h * y[step]
        y[step + 1] = y[step] + h * (-x[step] - a * y[step] * (x[step] ** 2 + y[step] ** 2 - 1))
    return x, y
Timings:
from numba import autojit
jit_forward_euler = autojit(forward_euler)
%timeit forward_euler([10,0], [5,0.0000000001], 1E4, 0.01)
1 loops, best of 3: 385 ms per loop
%timeit jit_forward_euler([10,0], [5,0.0000000001], 1E4, 0.01)
100 loops, best of 3: 3.51 ms per loop

@Ophion's comment explains very well what's going on. The call to array() within f(x) introduces some overhead, which kills the benefit of the vectorized expression h * f(x[step]).
And as he says, you may be interested in having a look at scipy.integrate for a nice set of numerical integrators.
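For reference, a minimal sketch of what that would look like with scipy.integrate.odeint, assuming scipy is available (the function name rhs is mine):

```python
import numpy as np
from scipy.integrate import odeint

a = 1.0

def rhs(state, t):
    # right-hand side of the system: dx/dt = y, dy/dt = -x - a*y*(x^2 + y^2 - 1)
    x, y = state
    return [y, -x - a * y * (x ** 2 + y ** 2 - 1)]

t = np.linspace(0.0, 100.0, 10001)  # same time span as 10000 Euler steps of h = 0.01
sol = odeint(rhs, [10.0, 5.0], t)   # sol[:, 0] is x(t), sol[:, 1] is y(t)
```

odeint chooses its own internal step sizes and controls the local error, so it is usually both faster and more accurate than a hand-rolled fixed-step Euler loop.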
To solve the problem at hand of vectorising your code, you want to avoid recreating the array every time you call f. You would like to initialize the array once, and return it, modified, at every call. This is similar to what a static variable is in C/C++.
You can achieve this with a mutable default argument, which is evaluated once, when the function f(x) is defined, and persists between calls. Since it has to be mutable, you encapsulate it in a list of a single element:
def f(x, static_tmp=[np.empty((2, 2))]):
    static_tmp[0][0] = x[1]
    static_tmp[0][1] = -x[0] - a * x[1] * (x[0] ** 2 + x[1] ** 2 - 1)
    return static_tmp[0]
With this modification to your code, the overhead of array creation disappears, and on my machine I gain a small improvement:
%timeit forward_euler() #258ms
%timeit forward_euler_vector() #248ms
This means that the gain of optimizing matrix multiplication with numpy is quite small, at least on the problem at hand.
You may also want to get rid of the function f entirely, doing its operations within the for loop to remove the call overhead. However, this default-argument trick can also be applied with scipy's more general time integrators, where you must provide a function f.
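For illustration, here is a sketch of that inlined variant (function name is mine; same setup and results as forward_euler_vector):

```python
import numpy as np

def forward_euler_inline():
    # same setup as forward_euler_vector, but the right-hand side is
    # computed directly in the loop body, so no temporary array is built
    h, num_steps, a = 0.01, 10000, 1.0
    x = np.zeros([num_steps + 1, 2, 2])  # steps, variables, number of solutions
    x[0, 0] = [10.0, 0.0]                # x initial conditions, both solutions
    x[0, 1] = [5.0, 0.0000000001]        # y initial conditions, both solutions
    for step in range(num_steps):
        xs, ys = x[step, 0], x[step, 1]
        x[step + 1, 0] = xs + h * ys
        x[step + 1, 1] = ys + h * (-xs - a * ys * (xs ** 2 + ys ** 2 - 1))
    return x
```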
EDIT: as pointed out by Jaime, another way to go is to treat static_tmp as an attribute of the function f, creating it after the function is defined but before it is called:
def f(x):
    f.static_tmp[0] = x[1]
    f.static_tmp[1] = -x[0] - a * x[1] * (x[0] ** 2 + x[1] ** 2 - 1)
    return f.static_tmp
f.static_tmp = np.empty((2, 2))


vectorizing a "leaky integrator" in numpy

I need a leaky integrator -- an IIR filter -- that implements:
y[i] = x[i] + y[i-1] * leakiness
The following code works. However, my x vectors are long and this is in an inner loop. So my questions:
For efficiency, is there a way to vectorize this in numpy?
If not numpy, would it be advantageous to use one of the scipy.signal filter algorithms?
The iterative code follows. state is simply the value of the previous y[i-1] that gets carried forward over successive calls:
import numpy as np

def leaky_integrator(x, state, leakiness):
    y = np.zeros(len(x), dtype=np.float32)
    for i in range(len(x)):
        if i == 0:
            y[i] = x[i] + state * leakiness
        else:
            y[i] = x[i] + y[i-1] * leakiness
    return y, y[-1]
>>> leakiness = 0.5
>>> a1 = [1, 0, 0, 0]
>>> state = 0
>>> print("a1=", a1, "state=", state)
a1= [1, 0, 0, 0] state= 0
>>> a2, state = leaky_integrator(a1, state, leakiness)
>>> print("a2=", a2, "state=", state)
a2= [1. 0.5 0.25 0.125] state= 0.125
>>> a3, state = leaky_integrator(a2, state, leakiness)
>>> print("a3=", a3, "state=", state)
a3= [1.0625 1.03125 0.765625 0.5078125] state= 0.5078125
I can see two options:
The simplest (and the suggested) solution is to extend the dependency list and use numba.
Use matrix multiplication by rethinking the problem in term of matrix operations.
In fact if x=[a, b, c], s=state, l=leakiness
then
y = [a + s*l, b + (a + s*l)*l, c + (b + (a + s*l)*l)*l]
  = [a + s*l, b + a*l + s*l**2, c + b*l + a*l**2 + s*l**3]
  = [[1, 0, 0], [l, 1, 0], [l**2, l, 1]] @ x + s * [l, l**2, l**3]
However, you would need to generate a matrix of size x.size**2, and you may get an out-of-memory error even for moderate sizes (for a 1M-element array the matrix alone takes about 7 TiB, which I don't think is doable).
Going back to the numba implementation, it can be enough to add a @jit(nopython=True) decorator to the function you already implemented.
Doing so in my machine with a random array of size 1M:
%timeit leaky_integrator(a1, s, l)
2.07 s ± 99.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit leaky_integrator_jitted(a1, s, l)
7.66 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
(Remark on numba performance here)
from numba import jit

@jit(nopython=True)
def leaky_integrator_jitted(x, state, leakiness):
    y = np.zeros(len(x), dtype=np.float32)
    for i in range(len(x)):
        if i == 0:
            y[i] = x[i] + state * leakiness
        else:
            y[i] = x[i] + y[i-1] * leakiness
    return y, y[-1]
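On the second part of the question: the recurrence y[i] = x[i] + y[i-1] * leakiness is exactly a first-order IIR filter, so scipy.signal.lfilter can evaluate it with no Python-level loop. A sketch (function name is mine; seeding the initial state via zi is the only subtle part):

```python
import numpy as np
from scipy.signal import lfilter

def leaky_integrator_sp(x, state, leakiness):
    # y[i] = x[i] + leakiness * y[i-1], seeded so that y[0] = x[0] + state * leakiness
    b = [1.0]              # numerator coefficients
    a = [1.0, -leakiness]  # denominator: y[i] - leakiness * y[i-1] = x[i]
    y, _ = lfilter(b, a, np.asarray(x, dtype=float), zi=[state * leakiness])
    return y, y[-1]

y, state = leaky_integrator_sp([1, 0, 0, 0], 0.0, 0.5)
# reproduces the a2 = [1. 0.5 0.25 0.125], state = 0.125 result from the question
```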

scipy optimize one iteration at a time

I want to control the objective of my optimization as a function of the number of iterations. In my real problem, I have a complicated regularization term that I want to control using the iteration number.
Is it possible to call a scipy optimizer one iteration at a time, or at least to be able to access the iteration number in the objective function?
Here is an example showing my best attempt so far:
from scipy.optimize import fmin_slsqp
from scipy.optimize import minimize as mini
import numpy as np
# define objective function
# x is the design input
# iteration is the iteration number
# the idea is that I want to control a regularization term using the iteration number
def objective(x, iteration):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2 + 10 * np.sum(x ** 2) / iteration
x = np.ones(2) * 5
for ii in range(20):
    x = fmin_slsqp(objective, x, iter=1, args=(ii,), iprint=0)
    if ii == 5: print('at iteration 5, I expect to get ~ [0, 0], but I get', x)
truex = mini(objective, np.ones(2) * 5, args=(200,)).x
print('the final result is ', x, 'instead of the correct answer, which is close to [1, 1] (', truex, ')')
output:
at iteration 5, I expect to get ~ [0, 0], but I get [5. 5.]
the final result is [5. 5.] instead of the correct answer, [1, 1] ([0.88613989 0.78485145])
No, I don't think scipy offers this option.
Interestingly, pytorch does. See this example of optimizing one iteration at a time:
import numpy as np
# define the rosenbrock function
a = 1
b = 5
def f(x):
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

# create a stochastic rosenbrock function
def f_rand(x):
    return f(x) * np.random.uniform(0.5, 1.5)

x = np.array([0.1, 0.1])
x0 = x.copy()

import torch
learning_rate = 0.1  # step size for Adam
x_tensor = torch.tensor(x0, requires_grad=True)
optimizer = torch.optim.Adam([x_tensor], lr=learning_rate)

def closure():
    optimizer.zero_grad()
    loss = f_rand(x_tensor)
    loss.backward()
    return loss

# optimize one iteration at a time
for ii in range(200):
    optimizer.step(closure)
print('optimal solution found: ', x_tensor, f(x_tensor))
If you really need to use scipy, you can make a class to count iterations, though you should be careful when mixing this with an algorithm that is approximating the inverse hessian matrix.
from scipy.optimize import fmin_slsqp
from scipy.optimize import minimize as mini
import numpy as np
# define objective function
# x is the design input
# iteration is the iteration number
# the idea is that I want to control a regularization term using the iteration number
def objective(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2 + 10 * np.sum(x ** 2)

class myclass:
    def __init__(self):
        self.iteration = 0
    def call(self, x):
        self.iteration += 1
        return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2 + 10 * np.sum(x ** 2) / self.iteration

x = np.ones(2) * 5
obj = myclass()
x = fmin_slsqp(obj.call, x, iprint=0)
truex = mini(objective, np.ones(2) * 5).x
print('the final result is ', x, ', which is not the correct answer, and is not close to [1, 1] (', truex, ')')
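Alternatively, scipy.optimize.minimize accepts a callback that is invoked once per outer iteration, which gives another way to make the iteration count available to the objective. A sketch (the state dict is my own bookkeeping):

```python
from scipy.optimize import minimize
import numpy as np

state = {'iteration': 1}

def objective(x):
    # the regularization term shrinks as the iteration count grows
    return ((1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
            + 10 * np.sum(x ** 2) / state['iteration'])

def count_iterations(xk):
    # called by scipy once per outer iteration with the current iterate xk
    state['iteration'] += 1

res = minimize(objective, np.ones(2) * 5, callback=count_iterations)
```

Note that the objective is still evaluated several times within a single iteration (line searches, finite-difference gradients) with the same count, so this only approximates true per-iteration control.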

Double dot product with broadcasting in numpy

I have the following operation :
import numpy as np
x = np.random.rand(3,5,5)
w = np.random.rand(5,5)
y = np.zeros((3,5,5))
for i in range(3):
    y[i] = np.dot(w.T, np.dot(x[i], w))
This corresponds to the pseudo-expression y[m,i,j] = sum( w[k,i] * x[m,k,l] * w[l,j], axes=[k,l] ), or equivalently, simply the dot product of w.T, x, w broadcast over the first dimension of x.
How can I implement it with numpy's broadcasting rules?
Thanks in advance.
Here's one vectorized approach with np.tensordot, which should beat broadcasting + summation any day -
# Take care of "np.dot(x[i],w)" term
x_w = np.tensordot(x,w,axes=((2),(0)))
# Perform "np.dot(w.T,np.dot(x[i],w))" : "np.dot(w.T,x_w)"
y_out = np.tensordot(x_w,w,axes=((1),(0))).swapaxes(1,2)
Alternatively, all of the mess being taken care of with one np.einsum call, but could be slower -
y_out = np.einsum('ab,cae,eg->cbg',w,x,w)
Runtime test -
In [114]: def tensordot_app(x, w):
...: x_w = np.tensordot(x,w,axes=((2),(0)))
...: return np.tensordot(x_w,w,axes=((1),(0))).swapaxes(1,2)
...:
...: def einsum_app(x, w):
...: return np.einsum('ab,cae,eg->cbg',w,x,w)
...:
In [115]: x = np.random.rand(30,50,50)
...: w = np.random.rand(50,50)
...:
In [116]: %timeit tensordot_app(x, w)
1000 loops, best of 3: 477 µs per loop
In [117]: %timeit einsum_app(x, w)
1 loop, best of 3: 219 ms per loop
Giving the broadcasting a chance
The sum-notation was -
y[m,i,j] = sum( w[k,i] * x[m,k,l] * w[l,j], axes=[k,l] )
Thus, the three terms would be stacked for broadcasting, like so -
w : [ N x k x i x N x N]
x : [ m x k x N x l x N]
w : [ N x N x N x l x j]
, where N represents new-axis being appended to facilitate broadcasting along those dims.
The terms with new axes being added with None/np.newaxis would then look like this -
w : w[None, :, :, None, None]
x : x[:, :, None, :, None]
w : w[None, None, None, :, :]
Thus, the broadcasted product would be -
p = w[None,:,:,None,None]*x[:,:,None,:,None]*w[None,None,None,:,:]
Finally, the output would be sum-reduction to lose (k,l), i.e. axes =(1,3) -
y = p.sum((1,3))
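For completeness: on Python 3.5+ with NumPy 1.10+, the @ operator (np.matmul) broadcasts over leading dimensions, so the whole loop collapses to one expression. A sketch, checked against the original loop:

```python
import numpy as np

x = np.random.rand(3, 5, 5)
w = np.random.rand(5, 5)

# matmul treats the first axis of x as a batch dimension,
# so this computes w.T.dot(x[i]).dot(w) for every i at once
y = w.T @ x @ w

# reference loop from the question
y_ref = np.zeros((3, 5, 5))
for i in range(3):
    y_ref[i] = np.dot(w.T, np.dot(x[i], w))
assert np.allclose(y, y_ref)
```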

How can I vectorize and speed up this large array calculation?

I'm currently trying to calculate the sum of all sums of subsquares in a 10,000 x 10,000 array of values. As an example, if my array is:
1 1 1
2 2 2
3 3 3
I want the result to be :
1+1+1+2+2+2+3+3+3 [sum of squares of size 1]
+(1+1+2+2)+(1+1+2+2)+(2+2+3+3)+(2+2+3+3) [sum of squares of size 2]
+(1+1+1+2+2+2+3+3+3) [sum of squares of size 3]
________________________________________
68
So, as a first try I wrote a very simple python code to do that. As it was in O(k^2 * n^2) (n being the size of the big array and k the size of the subsquares we are getting), the processing was awfully long. I wrote another algorithm in O(n^2) to speed it up:
def getSum(tab, size):
    n = len(tab)
    tmp = numpy.zeros((n, n))
    for i in xrange(0, n):
        sum = 0
        for j in xrange(0, size):
            sum += tab[j][i]
        tmp[0][i] = sum
        for j in xrange(1, n - size + 1):
            sum += (tab[j + size - 1][i] - tab[j - 1][i])
            tmp[j][i] = sum
    finalsum = 0
    for i in xrange(0, n - size + 1):
        sum = 0
        for j in xrange(0, size):
            sum += tmp[i][j]
        finalsum += sum
        for j in xrange(1, n - size + 1):
            finalsum += (tmp[i][j + size - 1] - tmp[i][j - 1])
    return finalsum
So this code works fine. Given an array and a subsquare size, it returns the sum of the values over all subsquares of that size; I then iterate over the sizes to cover them all.
The problem is that this is again way too long for big arrays (over 20 days for a 10,000 x 10,000 array). I googled it and learned that I could vectorize the iterations over arrays with numpy. However, I couldn't figure out how to do that in my case...
If someone can help me speed my algorithm up, or point me to good documentation on the subject, I'll be glad!
Thank you!
Following the excellent idea of @Divakar, I would suggest using integral images to speed up the convolutions. If the matrix is very big, you have to convolve it several times (once for each kernel size). Several convolutions (or evaluations of sums inside a square) can be computed very efficiently using integral images (aka summed area tables).
Once an integral image M is computed, the sum of all values inside a region (x0, y0) - (x1, y1) can be computed with just 4 arithmetic operations, regardless of the size of the window:
M[x1, y1] - M[x1, y0] - M[x0, y1] + M[x0, y0]
This is very easy to vectorize in numpy. An integral image can be calculated with cumsum. Following the example:
tab = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
M = tab.cumsum(0).cumsum(1) # Create integral images
M = np.pad(M, ((1,0), (1,0)), mode='constant') # pad it with a row and column of zeros
M is padded with a row and a column of zeros to handle sums that start at the first row or column (where x0 = 0 or y0 = 0).
Then, given a window size W, the sum of EVERY window of size W can be computed efficiently and fully vectorized with numpy as:
all_sums = M[W:, W:] - M[:-W, W:] - M[W:, :-W] + M[:-W, :-W]
Note that the vectorized operation above calculates the sum of every W x W window in the matrix at once. The sum of all those windows is then calculated as
total = all_sums.sum()
Note that, unlike with convolutions, for N different sizes the integral image has to be computed only once. Thus, the code can be written very efficiently as:
def get_all_sums(A):
    M = A.cumsum(0).cumsum(1)
    M = np.pad(M, ((1,0), (1,0)), mode='constant')
    total = 0
    for W in range(1, A.shape[0] + 1):
        tmp = M[W:, W:] + M[:-W, :-W] - M[:-W, W:] - M[W:, :-W]
        total += tmp.sum()
    return total
The output for the example:
>>> get_all_sums(tab)
68
Some timings comparing convolutions to integral images with different-size matrices. getAllSums refers to Divakar's convolution-based method, while get_all_sums is the integral-image-based method described above:
>>> R1 = np.random.randn(10, 10)
>>> R2 = np.random.randn(100, 100)
1) With R1 10x10 matrix:
>>> %time getAllSums(R1)
CPU times: user 353 µs, sys: 9 µs, total: 362 µs
Wall time: 335 µs
2393.5912717342017
>>> %time get_all_sums(R1)
CPU times: user 243 µs, sys: 0 ns, total: 243 µs
Wall time: 248 µs
2393.5912717342012
2) With R2 100x100 matrix:
>>> %time getAllSums(R2)
CPU times: user 698 ms, sys: 0 ns, total: 698 ms
Wall time: 701 ms
176299803.29826894
>>> %time get_all_sums(R2)
CPU times: user 2.51 ms, sys: 0 ns, total: 2.51 ms
Wall time: 2.47 ms
176299803.29826882
Note that using integral images is 300 times faster than convolutions for large enough matrices.
Those sliding summations are best calculated as 2D convolutions, which can be computed efficiently with scipy's convolve2d. Thus, for a specific size, you could get the summations like so -
from scipy.signal import convolve2d

def getSum(tab, size):
    # Define kernel and perform convolution to get sliding windowed summations
    kernel = np.ones((size, size), dtype=tab.dtype)
    return convolve2d(tab, kernel, mode='valid').sum()
To get summations across all sizes, I think the best way, in terms of both memory and performance, is to loop over all possible sizes. Thus, to get the final summation, you would have -
def getAllSums(tab):
    finalSum = 0
    for i in range(tab.shape[0]):
        finalSum += getSum(tab, i + 1)
    return finalSum
Sample run -
In [51]: tab
Out[51]:
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
In [52]: getSum(tab,1) # sum of squares of size 1
Out[52]: 18
In [53]: getSum(tab,2) # sum of squares of size 2
Out[53]: 32
In [54]: getSum(tab,3) # sum of squares of size 3
Out[54]: 18
In [55]: getAllSums(tab) # sum of squares of all sizes
Out[55]: 68
Based on the idea of counting how many times each number is included, I came up with this simple code:
def get_sum(matrix, n):
    ret = 0
    for i in range(n):
        for j in range(n):
            for k in range(1, n + 1):
                # k is the square size; count is the number of times matrix[i][j] is counted
                count = min(k, n - k + 1, i + 1, n - i) * min(k, n - k + 1, j + 1, n - j)
                ret += count * matrix[i][j]
    return ret

a = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print get_sum(a, 3)  # 68
Divakar's solution is fantastic; however, I think mine could be more efficient, at least in asymptotic time complexity (O(n^3) compared with Divakar's O(n^3 log n)).
I get a O(n^2) solution now...
Basically, we can get that:
def get_sum2(matrix, n):
    ret = 0
    for i in range(n):
        for j in range(n):
            x = min(i + 1, n - i)
            y = min(j + 1, n - j)
            half = (n + 1) / 2
            # k < half
            for k in range(1, half + 1):
                count = min(k, x) * min(k, y)
                ret += count * matrix[i][j]
            # k >= half
            for k in range(half + 1, n + 1):
                count = min(n + 1 - k, x) * min(n + 1 - k, y)
                ret += count * matrix[i][j]
    return ret
You can see sum(min(k, x) * min(k, y)) can be calculated in O(1) when 1 <= k <= n/2
So we came to that O(n^2) code:
def get_square_sum(n):
    return n * (n + 1) * (2 * n + 1) / 6

def get_linear_sum(a, b):
    return (b - a + 1) * (a + b) / 2

def get_count(x, y, k_end):
    # k <= min(x, y): count is k*k
    sum1 = get_square_sum(min(x, y))
    # min(x, y) < k <= max(x, y): count is k * min(x, y)
    sum2 = get_linear_sum(min(x, y) + 1, max(x, y)) * min(x, y)
    # k > max(x, y): count is x * y
    sum3 = x * y * (k_end - max(x, y))
    return sum1 + sum2 + sum3

def get_sum3(matrix, n):
    ret = 0
    for i in range(n):
        for j in range(n):
            x = min(i + 1, n - i)
            y = min(j + 1, n - j)
            half = n / 2
            # k < half
            ret += get_count(x, y, half) * matrix[i][j]
            # k >= half
            ret += get_count(x, y, half + half % 2) * matrix[i][j]
    return ret
Test:
a = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
n = 1000
b = [[1] * n] * n
print get_sum3(a, 3)  # 68
print get_sum3(b, n)  # 33500333666800
You can rewrite my O(n^2) Python code in C, and I believe it will yield a very efficient solution...
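For what it's worth, the per-cell counting idea can also be fully vectorized in numpy instead of being ported to C: build the matrix C[i, k] = min(k, n-k+1, i+1, n-i) once, and the per-cell weights fall out as the product C @ C.T. A sketch with my own names, verified against the 68 example:

```python
import numpy as np

def get_sum_vectorized(matrix):
    a = np.asarray(matrix)
    n = a.shape[0]
    i = np.arange(n)[:, None]          # cell row (or column) index, as a column
    k = np.arange(1, n + 1)[None, :]   # subsquare size, as a row
    # C[i, k-1]: how many k x k subsquares cover row (or column) i
    C = np.minimum(np.minimum(k, n - k + 1), np.minimum(i + 1, n - i))
    weights = C @ C.T                  # multiplicity of cell (i, j), summed over all k
    return (weights * a).sum()

print(get_sum_vectorized([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # 68
```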

How to vectorize finding max value in numpy array with if statement?

My Setup: Python 2.7.4.1, Numpy MKL 1.7.1, Windows 7 x64, WinPython
Context:
I tried to implement the Sequential Minimal Optimization algorithm for solving SVM. I use the maximal violating pair approach.
The problem:
In the working set selection procedure I want to find the maximum value of the gradient, and its index, over the elements that satisfy a condition: y[i]*alpha[i] < 0 (for y[i] = -1) or y[i]*alpha[i] < C (for y[i] = 1):
#y - array of -1 and 1
y=np.array([-1,1,1,1,-1,1])
#alpha- array of floats in range [0,C]
alpha=np.array([0.4,0.1,1.33,0,0.9,0])
#grad - array of floats
grad=np.array([-1,-1,-0.2,-0.4,0.4,0.2])
GMaxI=float('-inf')
GMax_idx=-1
n=alpha.shape[0] #usually n=100000
C=4
B=[0,0,C]
for i in xrange(0, n):
    yi = y[i]  # -1 or 1
    alpha_i = alpha[i]
    if (yi * alpha_i < B[yi + 1]):  # B[-1+1]=0, B[1+1]=C
        if (-yi * grad[i] >= GMaxI):
            GMaxI = -yi * grad[i]
            GMax_idx = i
This procedure is called many times (~50,000), and the profiler shows that it is the bottleneck.
Is it possible to vectorize this code?
Edit 1:
Add some small exemplary data
Edit 2:
I have checked the solutions proposed by hwlau, larsmans and Mr E. Only the solution proposed by Mr E is correct. Below is sample code with all three answers:
import numpy as np
y=np.array([ -1, -1, -1, -1, -1, -1, -1, -1])
alpha=np.array([0, 0.9, 0.4, 0.1, 1.33, 0, 0.9, 0])
grad=np.array([-3, -0.5, -1, -1, -0.2, -4, -0.4, -0.3])
C=4
B=np.array([0,0,C])
#hwlau - wrong index and value
filter = (y*alpha < C*0.5*(y+1)).astype('float')
GMax_idx = (filter*(-y*grad)).argmax()
GMax = -y[GMax_idx]*grad[GMax_idx]
print GMax_idx,GMax
#larsmans - wrong index
neg_y_grad = (-y * grad)[y * alpha < B[y + 1]]
GMaxI = np.max(neg_y_grad)
GMax_ind = np.argmax(neg_y_grad)
print GMax_ind,GMaxI
#Mr E - correct result
BY = np.take(B, y+1)
valid_mask = (y * alpha < BY)
values = -y * grad
values[~valid_mask] = np.min(values) - 1.0
GMaxI = values.max()
GMax_idx = values.argmax()
print GMax_idx,GMaxI
Output (GMax_idx, GMaxI)
0 -3.0
3 -0.2
4 -0.2
Conclusions
After checking all solutions, the fastest one (2x-6x) is the one proposed by @ali_m. However, it requires installing some extra packages: numba and all its prerequisites.
I had some trouble using numba with class methods, so I created global functions autojitted with numba; my solution looks something like this:
from numba import autojit

@autojit
def FindMaxMinGrad(A, B, alpha, grad, y):
    '''
    Finds i,j indices with the maximal violating pair scheme
    A,B - 3-element arrays containing bounds, A=[-C,0,0], B=[0,0,C]
    alpha - array-like, alpha coefficients
    grad - array-like, gradient
    y - array-like, labels
    '''
    GMaxI = -100000
    GMaxJ = -100000
    GMax_idx = -1
    GMin_idx = -1
    for i in range(0, alpha.shape[0]):
        if (y[i] * alpha[i] < B[y[i] + 1]):
            if (-y[i] * grad[i] > GMaxI):
                GMaxI = -y[i] * grad[i]
                GMax_idx = i
        if (y[i] * alpha[i] > A[y[i] + 1]):
            if (y[i] * grad[i] > GMaxJ):
                GMaxJ = y[i] * grad[i]
                GMin_idx = i
    return (GMaxI, GMaxJ, GMax_idx, GMin_idx)

class SVM(object):
    def working_set(self, ....):
        FindMaxMinGrad(.....)
You can probably do quite a lot better than plain vectorization if you use numba to JIT-compile your original code that used nested loops.
import numpy as np
from numba import autojit

@autojit
def jit_max_grad(y, alpha, grad, B):
    maxgrad = -np.inf
    maxind = -1
    for ii in xrange(alpha.shape[0]):
        if (y[ii] * alpha[ii] < B[y[ii] + 1]):
            g = -y[ii] * grad[ii]
            if g >= maxgrad:
                maxgrad = g
                maxind = ii
    return maxind, maxgrad
For comparison, here's Mr E's vectorized version:
def mr_e_max_grad(y, alpha, grad, B):
    BY = np.take(B, y+1)
    valid_mask = (y * alpha < BY)
    values = -y * grad
    values[~valid_mask] = np.min(values) - 1.0
    GMaxI = values.max()
    GMax_idx = values.argmax()
    return GMax_idx, GMaxI
Timing:
y = np.array([ -1, -1, -1, -1, -1, -1, -1, -1])
alpha = np.array([0, 0.9, 0.4, 0.1, 1.33, 0, 0.9, 0])
grad = np.array([-3, -0.5, -1, -1, -0.2, -4, -0.4, -0.3])
C = 4
B = np.array([0,0,C])
%timeit mr_e_max_grad(y, alpha, grad, B)
# 100000 loops, best of 3: 19.1 µs per loop
%timeit jit_max_grad(y, alpha, grad, B)
# 1000000 loops, best of 3: 1.07 µs per loop
Update: if you want to see what the timings look like on bigger arrays, it's easy to define a function that generates semi-realistic fake data based on your description in the question:
def make_fake(n, C=4):
    y = np.random.choice((-1, 1), n)
    alpha = np.random.rand(n) * C
    grad = np.random.randn(n)
    B = np.array([0,0,C])
    return y, alpha, grad, B
%%timeit y, alpha, grad, B = make_fake(100000, 4)
mr_e_max_grad(y, alpha, grad, B)
# 1000 loops, best of 3: 1.83 ms per loop
%%timeit y, alpha, grad, B = make_fake(100000, 4)
jit_max_grad(y, alpha, grad, B)
# 1000 loops, best of 3: 471 µs per loop
I think this is a fully vectorized version:
import numpy as np
# y - array of -1 and 1
y = np.array([-1, 1, 1, 1, -1, 1])
# alpha - array of floats in range [0, C]
alpha = np.array([0.4, 0.1, 1.33, 0, 0.9, 0])
# grad - array of floats
grad = np.array([-1, -1, -0.2, -0.4, 0.4, 0.2])
C = 4
B = np.array([0, 0, C])
BY = np.take(B, y + 1)
valid_mask = (y * alpha < BY)
values = -y * grad
values[~valid_mask] = np.min(values) - 1.0
GMaxI = values.max()
GMax_idx = values.argmax()
Here you go:
y=np.array([-1,1,1,1,-1,1])
alpha=np.array([0.4,0.1,1.33,0,0.9,0])
grad=np.array([-1,-1,-0.2,-0.4,0.4,0.2])
C=4
filter = (y*alpha < C*0.5*(y+1)).astype('float')
GMax_idx = (filter*(-y*grad)).argmax()
GMax = -y[GMax_idx]*grad[GMax_idx]
No benchmark tried, but it is pure numerical and vectorized so it should be fast.
If you change B from a list to a NumPy array, you can at least vectorize the yi * alpha_i < B[yi+1] test and push the loop inwards:
GMaxI = float('-inf')
GMax_idx = -1
for i in np.where(y * alpha < B[y + 1])[0]:
    if -y[i] * grad[i] >= GMaxI:
        GMaxI = -y[i] * grad[i]
        GMax_idx = i
That should save a bit of time. Next up, you can vectorize -y[i] * grad[i]:
GMaxI = float('-inf')
GMax_idx = -1
neg_y_grad = -y * grad
for i in np.where(y * alpha < B[y + 1])[0]:
    if neg_y_grad[i] >= GMaxI:
        GMaxI = neg_y_grad[i]
        GMax_idx = i
Finally, we can vectorize away the entire loop by using max and argmax on -y * grad, filtered by y * alpha < B[y + 1]:
neg_y_grad = (-y * grad)
GMaxI = np.max(neg_y_grad[y * alpha < B[y + 1]])
GMax_idx = np.where(neg_y_grad == GMaxI)[0][0]
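If you want a fully vectorized version that also reports the correct original index (the pitfall in the filtered versions above), you can argmax over the filtered values and map back through np.where. A sketch (function name mine), tested on the data from Edit 2:

```python
import numpy as np

def max_grad_masked(y, alpha, grad, B):
    valid = np.where(y * alpha < B[y + 1])[0]  # original indices passing the constraint
    neg_y_grad = -y[valid] * grad[valid]
    best = neg_y_grad.argmax()                 # position within the filtered array
    return valid[best], neg_y_grad[best]       # map back to the original index

y = np.array([-1, -1, -1, -1, -1, -1, -1, -1])
alpha = np.array([0, 0.9, 0.4, 0.1, 1.33, 0, 0.9, 0])
grad = np.array([-3, -0.5, -1, -1, -0.2, -4, -0.4, -0.3])
B = np.array([0, 0, 4])
idx, val = max_grad_masked(y, alpha, grad, B)
# index 4, value -0.2: matches the accepted check above
```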
