I have an array a of length N and need to implement the following operation:
b[n] = \sum_{i=0}^{n} p^{n-i} a[i],  for n = 0, ..., N-1
with p in [0, 1]. This equation is a lossy sum, where the first indices in the sum are weighted by a greater loss (p^{n-i}) than the last ones. The last index (i=n) is always weighted by 1. If p = 1, then the operation is a simple cumsum:
b = np.cumsum(a)
If p != 1, I can implement this operation in a CPU-inefficient way:
b = np.empty(np.shape(a))
# I'm using the (-1,-1,-1) idiom for reversed ranges
p_vec = np.power(p, np.arange(N-1, 0-1, -1))
# p_vec[0] = p^{N-1}, p_vec[-1] = 1
for n in range(N):
    b[n] = np.sum(a[:n+1]*p_vec[-(n+1):])
Or in a memory-inefficient but vectorized way (IMO it is CPU-inefficient too, since a lot of work is wasted):
a_idx = np.reshape(np.arange(N+1), (1, N+1)) - np.reshape(np.arange(N-1, 0-1, -1), (N, 1))
a_idx = np.maximum(0, a_idx)
# For N=4, a_idx looks like this:
# [[0, 0, 0, 0, 1],
# [0, 0, 0, 1, 2],
# [0, 0, 1, 2, 3],
# [0, 1, 2, 3, 4]]
a_ext = np.concatenate(([0], a,), axis=0) # len(a_ext) = N + 1
p_vec = np.power(p, np.arange(N, 0-1, -1)) # len(p_vec) = N + 1
b = np.dot(a_ext[a_idx], p_vec)
Is there a better way to achieve this 'lossy' cumsum?
What you want is an IIR filter; you can use scipy.signal.lfilter(). Here is the code:
Your code:
import numpy as np
N = 10
p = 0.8
np.random.seed(0)  # seed 0 reproduces the output shown below
x = np.random.randn(N)
y = np.empty_like(x)
p_vec = np.power(p, np.arange(N-1, 0-1, -1))
for n in range(N):
    y[n] = np.sum(x[:n+1]*p_vec[-(n+1):])
the output:
array([1.76405235, 1.81139909, 2.42785725, 4.183179 , 5.21410119,
3.19400307, 3.50529088, 2.65287549, 2.01908154, 2.02586374])
By using lfilter():
from scipy import signal
y = signal.lfilter([1], [1, -p], x)
the output:
array([1.76405235, 1.81139909, 2.42785725, 4.183179 , 5.21410119,
3.19400307, 3.50529088, 2.65287549, 2.01908154, 2.02586374])
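The reason the filter matches the loop is that every output satisfies y[n] = p*y[n-1] + x[n], which is the first-order recurrence that lfilter([1], [1, -p], x) evaluates. Here is a minimal NumPy-only sketch of that recurrence, checked against the direct weighted sum. The names lossy_cumsum and lossy_cumsum_direct are made up for this illustration, they are not part of NumPy or SciPy.

```python
import numpy as np

def lossy_cumsum(a, p):
    """Compute b[n] = sum_{i<=n} p**(n-i) * a[i] via b[n] = p*b[n-1] + a[n]."""
    a = np.asarray(a, dtype=float)
    b = np.empty_like(a)
    acc = 0.0
    for n, value in enumerate(a):
        acc = p * acc + value   # decay the running sum, then add the new sample
        b[n] = acc
    return b

def lossy_cumsum_direct(a, p):
    """Same result from the definition, using an explicit N x N weight matrix."""
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a))
    # exponent n - i, clamped at 0 so the unused upper triangle stays finite
    exponents = np.maximum(n[:, None] - n[None, :], 0)
    weights = np.tril(float(p) ** exponents)   # zero out the terms with i > n
    return weights @ a

a = np.arange(1.0, 6.0)
print(lossy_cumsum(a, 0.5))
print(lossy_cumsum_direct(a, 0.5))
```

The recurrence needs O(N) time and memory, whereas the weight matrix needs O(N^2), which is the waste the question is worried about.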
I have recently hit a performance roadblock. I know how to do the interpolation from the origin cell to every other cell manually, by looping over each row and column of the 2D array.
However, when I process a 2D array of shape, say, (3000, 3000), the linear spacing and the interpolation slow to a crawl.
I am looking for a way to optimize this loop. I am aware of vectorization and broadcasting, but I am not sure how to apply them in this situation.
I will explain it with code and figures.
import numpy as np
from scipy.ndimage import map_coordinates
m = np.array([
origin_row = 3
origin_col = 3
m_max = np.zeros(m.shape)
m_dist = np.zeros(m.shape)
rows, cols = m.shape
for col in range(cols):
    for row in range(rows):
        # Get spacing linear interpolation
        x_plot = np.linspace(col, origin_col, 5)
        y_plot = np.linspace(row, origin_row, 5)
        # grab the interpolated line
        interpolated_line = map_coordinates(m,
                                            np.vstack((y_plot, x_plot)),
                                            order=1, mode='nearest')
        m_max[row][col] = max(interpolated_line)
        m_dist[row][col] = np.argmax(interpolated_line)
As you can see, this is very brute force. I have managed to broadcast all the code around this part, but I am stuck on this part.
Here is an illustration of what I am trying to achieve. I will go through the first iteration:
1.) the input array
2.) the first loop from 0,0 to origin (3,3)
3.) this will return [10 9 9 8 0] and the max will be 10 and the index will be 0
5.) here is the output for the sample array I used
Here is an update of the performance based on the accepted answer.
To speed up the code, you could first create x_plot and y_plot outside of the loops instead of recreating them on every iteration:
#this would be outside of the loops
num = 5
lin_col = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
lin_row = np.array([np.linspace(i, origin_row, num) for i in range(rows)])
Then you could access them in each loop with x_plot = lin_col[col] and y_plot = lin_row[row].
Second, you can avoid both loops by calling map_coordinates on all the (row, col) pairs at once instead of on one vstack per pair. To do so, you can create all the combinations of x_plot and y_plot with np.tile and np.ravel, such as:
arr_vs = np.vstack(( np.tile( lin_row, cols).ravel(),
np.tile( lin_col.ravel(), rows)))
Note that ravel is not applied at the same place in the two calls; this is what produces all the combinations. Now you can use map_coordinates with this arr_vs and reshape the result with the number of rows, cols and num to get each interpolated_line in the last axis of a 3D array:
arr_map = map_coordinates(m, arr_vs, order=1, mode='nearest').reshape(rows,cols,num)
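To see that the tile/ravel layout really lines up with the reshape, here is a small check that uses NumPy only. The grid size and origin are arbitrary values picked for the illustration, not the ones from the question.

```python
import numpy as np

rows, cols, num = 3, 4, 5        # small hypothetical grid
origin_row, origin_col = 1, 2    # hypothetical origin cell

lin_col = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
lin_row = np.array([np.linspace(i, origin_row, num) for i in range(rows)])

# same construction as above: row coordinates first, column coordinates second
arr_vs = np.vstack((np.tile(lin_row, cols).ravel(),
                    np.tile(lin_col.ravel(), rows)))

# after the reshape, entry [row, col] holds the sample points of the line
# that goes from (row, col) to the origin
row_coords = arr_vs[0].reshape(rows, cols, num)
col_coords = arr_vs[1].reshape(rows, cols, num)

print(row_coords[2, 3])   # same values as np.linspace(2, origin_row, num)
print(col_coords[2, 3])   # same values as np.linspace(3, origin_col, num)
```

Since map_coordinates returns one value per column of arr_vs, reshaping its output to (rows, cols, num) follows the same ordering.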
Finally, you can use np.max and np.argmax on the last axis of arr_map to get the results m_max and m_dist. So all the code would be:
import numpy as np
from scipy.ndimage import map_coordinates
m = np.array([
origin_row = 3
origin_col = 3
rows, cols = m.shape
num = 5
lin_col = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
lin_row = np.array([np.linspace(i, origin_row, num) for i in range(rows)])
arr_vs = np.vstack(( np.tile( lin_row, cols).ravel(),
np.tile( lin_col.ravel(), rows)))
arr_map = map_coordinates(m, arr_vs, order=1, mode='nearest').reshape(rows,cols,num)
m_max = np.max( arr_map, axis=-1)
m_dist = np.argmax( arr_map, axis=-1)
print(m_max)
print(m_dist)
and you get, as expected:
array([[10, 10, 10, 10, 10, 10],
[ 9, 9, 10, 10, 9, 9],
[ 9, 9, 9, 10, 8, 9],
[ 9, 8, 8, 0, 8, 9],
[ 8, 8, 7, 8, 8, 9],
[ 7, 7, 8, 8, 8, 8]])
array([[0, 0, 0, 0, 0, 0],
[0, 0, 2, 0, 0, 0],
[0, 2, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 0],
[1, 1, 2, 1, 2, 1]])
EDIT: lin_col and lin_row are related, so you can compute them faster:
if cols >= rows:
    arr = np.arange(cols)[:,None]
    lin_col = arr + (origin_col-arr)/(num-1.)*np.arange(num)
    lin_row = lin_col[:rows] + np.linspace(0, origin_row - origin_col, num)[None,:]
else:
    arr = np.arange(rows)[:,None]
    lin_row = arr + (origin_row-arr)/(num-1.)*np.arange(num)
    lin_col = lin_row[:cols] + np.linspace(0, origin_col - origin_row, num)[None,:]
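A quick way to convince yourself that the broadcast expressions reproduce the per-index np.linspace calls is to compare the two on a small case. The sizes and origin below are arbitrary values chosen for the check, and only the cols >= rows branch is exercised.

```python
import numpy as np

rows, cols, num = 4, 6, 5        # hypothetical grid with cols >= rows
origin_row, origin_col = 3, 2    # hypothetical origin cell

# reference: one np.linspace call per row / column index
ref_col = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
ref_row = np.array([np.linspace(i, origin_row, num) for i in range(rows)])

# broadcast version, cols >= rows branch
arr = np.arange(cols)[:, None]
lin_col = arr + (origin_col - arr) / (num - 1.) * np.arange(num)
lin_row = lin_col[:rows] + np.linspace(0, origin_row - origin_col, num)[None, :]

print(np.allclose(lin_col, ref_col), np.allclose(lin_row, ref_row))
```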
Here is a sort-of-vectorized approach. It is not very optimized and there may be one or two off-by-one index errors, but it may give you ideas.
Two examples: a monochrome 384x512 test pattern and a "real" 3-channel 768x1024 image. Both are uint8.
This takes half a minute on my machine.
Larger images would require more RAM than I have (8GB), or would have to be broken down into smaller chunks.
And the code:
import numpy as np
def rays(img, ctr):
    M, N, *d = img.shape
    aidx = 2*(slice(None),) + (img.ndim-2)*(None,)
    m, n = ctr
    out = np.empty_like(img)
    offsI = np.empty(img.shape, np.uint16)
    offsJ = np.empty(img.shape, np.uint16)
    img4, out4, I4, J4 = ((x[m:, n:], x[m:, n::-1], x[m::-1, n:], x[m::-1, n::-1]) for x in (img, out, offsI, offsJ))
    for i, o, y, x in zip(img4, out4, I4, J4):
        for _ in range(2):
            M, N, *d = i.shape
            widths = np.arange(1, M+1, dtype=np.uint16).clip(None, N)
            I = np.arange(M, dtype=np.uint16).repeat(widths)
            J = np.ones_like(I)
            J[0] = 0
            J[widths[:-1].cumsum()] -= widths[:-1]
            J = J.cumsum(dtype=np.uint16)
            ii = np.arange(1, 2*M-1, dtype=np.uint16) // 2
            II = ii.clip(None, I[:, None])
            jj = np.arange(2*M-2, dtype=np.uint32) // 2 * 2 + 1
            jj[0] = 0
            JJ = ((1 + jj) * J[:, None] // (2*(I+1))[:, None]).astype(np.uint16).clip(None, J[:, None])
            idx = i[II, JJ].argmax(axis=1)
            II, JJ = (np.take_along_axis(ZZ[aidx] , idx[:, None], 1)[:, 0] for ZZ in (II, JJ))
            y[I, J], x[I, J] = II, JJ
            SH = II, JJ, *np.ogrid[tuple(map(slice, img.shape))][2:]
            o[I, J] = i[SH]
            i, o = i.swapaxes(0, 1), o.swapaxes(0, 1)
            y, x = x.swapaxes(0, 1), y.swapaxes(0, 1)
    return out, offsI, offsJ
from scipy.misc import face
f = face()
fr, *fidx = rays(f, (200, 400))
s = np.uint8((np.arange(384)[:, None] % 41 < 2)&(np.arange(512) % 41 < 2))
s = 255*s + 128*s[::-1, ::-1] + 64*s[::-1] + 32*s[:, ::-1]
sr, *sidx = rays(s, (200, 400))
import Image
I have a task to check whether a matrix is a rotation matrix. I wrote the following code:
import numpy as np
def isRotationMatrix(R):
    # some code here
    # return True or False
    pass

R = np.array([
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
])
print(isRotationMatrix(R)) # Should be True
R = np.array([
    [-1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
print(isRotationMatrix(R)) # Should be False
I don't know how to implement the function isRotationMatrix.
My naive implementation, which only works for a 3x3 matrix:
def isRotationMatrix(R_3x3):
    should_be_norm_one = np.allclose(np.linalg.norm(R_3x3, axis=0), np.ones(shape=3))
    x = R_3x3[:, 0].ravel()
    y = R_3x3[:, 1].ravel()
    z = R_3x3[:, 2].ravel()
    should_be_perpendicular = \
        np.allclose(np.cross(x, y), z) \
        and np.allclose(np.cross(y, z), x) \
        and np.allclose(np.cross(z, x), y)
    return should_be_perpendicular and should_be_norm_one
I am using this definition of rotation matrix. A rotation matrix should satisfy the conditions M (M^T) = (M^T) M = I and det(M) = 1. Here M^T denotes transpose of M, I denotes identity matrix and det(M) represents determinant of matrix M.
You can use the following Python code to check if the matrix is a rotation matrix.
import numpy as np
''' I have chosen `M` as an example. Feel free to put in your own matrix.'''
M = np.array([[0,-1,0],[1,0,0],[0,0,1]])
def isRotationMatrix(M):
    tag = False
    I = np.identity(M.shape[0])
    # compare with a tolerance: exact == is fragile for floating-point results
    if np.allclose(np.matmul(M, M.T), I) and np.isclose(np.linalg.det(M), 1): tag = True
    return tag
if(isRotationMatrix(M)): print('M is a rotation matrix.')
else: print('M is not a rotation matrix.')
A rotation matrix is an orthonormal matrix and its determinant should be 1.
My implementation:
import numpy as np
def isRotationMatrix(R):
    # square matrix test
    if R.ndim != 2 or R.shape[0] != R.shape[1]:
        return False
    # np.float was removed from recent NumPy releases; the builtin float is equivalent
    should_be_identity = np.allclose(R.dot(R.T), np.identity(R.shape[0], float))
    should_be_one = np.allclose(np.linalg.det(R), 1)
    return should_be_identity and should_be_one
if __name__ == '__main__':
    R = np.array([
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0],
    ])
    print(isRotationMatrix(R)) # True
    R = np.array([
        [-1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ])
    print(isRotationMatrix(R)) # False (determinant is -1, this is a reflection)
    print(isRotationMatrix(np.zeros((3, 2)))) # False
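Tolerance matters for this kind of check: a rotation built from sines and cosines usually gives R R^T equal to the identity only up to rounding error, so a tolerant comparison is the safer choice. Below is a minimal sketch along the same lines as the code above. rotation_z and is_rotation_matrix are hypothetical helper names, and the default atol is an assumption you may want to tune for your data.

```python
import numpy as np

def rotation_z(theta):
    """Rotation by theta radians about the z axis (helper for the demo)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def is_rotation_matrix(R, atol=1e-8):
    """True if R is square, orthonormal and has determinant +1, within atol."""
    R = np.asarray(R, dtype=float)
    if R.ndim != 2 or R.shape[0] != R.shape[1]:
        return False
    identity = np.identity(R.shape[0])
    orthonormal = np.allclose(R @ R.T, identity, atol=atol)
    proper = np.isclose(np.linalg.det(R), 1.0, atol=atol)
    return bool(orthonormal and proper)

print(is_rotation_matrix(rotation_z(0.3)))               # a genuine rotation
print(is_rotation_matrix(np.diag([-1.0, 1.0, 1.0])))     # reflection, det = -1
print(is_rotation_matrix(2.0 * np.identity(3)))          # scaling, not orthonormal
```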
I'm completing an assignment but am a little stuck on parts 5 and 6. The task is to use the FFT to multiply two binary numbers. I was wondering if someone could help out.
import numpy as np
# The binary numbers and their product
a_bin = 0b100100100100
b_bin = 0b111000111000
c_bin = a_bin * b_bin
print('The product of a and b is', c_bin)
# (i) The coefficients of the polynomials A and B
Acoeff = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
Bcoeff = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
# (ii) the value representations of A and B
# (np.fft is a module; the transform functions are np.fft.fft and np.fft.ifft)
Aval = np.fft.fft(Acoeff, 32)
Bval = np.fft.fft(Bcoeff, 32)
# (iii) The value representation of C
Cval = []
for i in range(len(Aval)):
    Cval.append(Aval[i] * Bval[i])
# (iv) The coefficients of the polynomial C
Ccoeff = np.fft.ifft(Cval)
# we'll get rid of the imaginary parts, which are just numerical errors
for i, c in enumerate(Ccoeff):
    Ccoeff[i] = int(round(c.real))
# (v) calculate the product by evaluating the polynomial at 2, i.e., C(2)
# (You may need to take the real part at the end if there is a small imaginary component)
prod = 0
print('Using the FFT the product of a and b is', int(round(prod.real)))
# (vi) write code to calculate the binary digits of c directly from the coefficients of C, Ccoeff.
# hint: You can use (q,r) = divmod(x, 2) to find the quotient and remainder of x when divided by 2
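One possible way to fill in parts (v) and (vi), offered as a sketch rather than the assignment's reference solution. It assumes the coefficients are stored least-significant bit first, as in Acoeff and Bcoeff above. Part (v) evaluates C(2) as a sum of c_i * 2^i. Part (vi) walks through the coefficients and propagates carries with divmod, as the hint suggests. The names digits, carry and binary_string are my own.

```python
import numpy as np

a_bin = 0b100100100100
b_bin = 0b111000111000

# coefficients, least-significant bit first
Acoeff = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
Bcoeff = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

# pointwise product of the value representations, then back to coefficients
Cval = np.fft.fft(Acoeff, 32) * np.fft.fft(Bcoeff, 32)
Ccoeff = [int(round(c.real)) for c in np.fft.ifft(Cval)]

# (v) evaluate the polynomial at 2: C(2) = sum_i c_i * 2**i
prod = sum(c * 2**i for i, c in enumerate(Ccoeff))
print('Using the FFT the product of a and b is', prod)

# (vi) binary digits directly from the coefficients: each coefficient may
# exceed 1, so keep the remainder as the bit and carry the quotient upward
digits = []
carry = 0
for c in Ccoeff:
    carry, bit = divmod(c + carry, 2)
    digits.append(bit)
while carry:                      # flush any carry left after the last coefficient
    carry, bit = divmod(carry, 2)
    digits.append(bit)

# digits are least-significant first; reverse and drop leading zeros for display
binary_string = ''.join(str(d) for d in reversed(digits)).lstrip('0') or '0'
print('Binary digits of c:', binary_string)
```

The FFT length of 32 is enough here because the product polynomial has at most 12 + 12 - 1 = 23 coefficients, so nothing wraps around.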
I have some complex assignment logic in a simulation that I would like to optimize for performance. The current logic is implemented as a set of nested for loops over a variety of numpy arrays. I would like to vectorize this assignment logic, but I haven't been able to figure out whether this is possible.
import numpy as np
def reverse_enumerate(l):
    return zip(range(len(l)-1, -1, -1), reversed(l))
materials = np.array([[1, 0, 1, 1],
                      [1, 1, 0, 0],
                      [0, 1, 1, 1],
                      [1, 0, 0, 1]])
vectors = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])
prices = np.array([10, 20, 30, 40])
demands = np.array([1, 1, 1, 1])
supply_by_vector = np.zeros(len(vectors)).astype(int)
#go through each material and assign it to the first vector that the material covers
for m_indx, material in enumerate(materials):
    #find the first vector where the material covers the SKU
    for v_indx, vector in enumerate(vectors):
        if (vector <= material).all():
            supply_by_vector[v_indx] = supply_by_vector[v_indx] + 1
original_supply_by_vector = np.copy(supply_by_vector)
profit_by_vector = np.zeros(len(vectors))
remaining_ask_by_sku = np.copy(demands)
#calculate profit by assigning material from vectors to SKUs to satisfy demand
#go through vectors in reverse order (so lowest priority vectors are used up first)
profit = 0.0
for v_indx, vector in reverse_enumerate(vectors):
    for sku_indx, price in enumerate(prices):
        available = supply_by_vector[v_indx]
        if available == 0:
            break  # this vector has no supply left
        ask = remaining_ask_by_sku[sku_indx]
        if ask <= 0:
            continue  # demand for this SKU is already satisfied
        if vector[sku_indx]:
            assign = ask if available > ask else available
            remaining_ask_by_sku[sku_indx] = remaining_ask_by_sku[sku_indx] - assign
            supply_by_vector[v_indx] = supply_by_vector[v_indx] - assign
            profit_by_vector[v_indx] = profit_by_vector[v_indx] + assign*price
            profit = profit + assign * price
print('total profit:', profit)
print('unfulfilled demand:', remaining_ask_by_sku)
print('original supply:', original_supply_by_vector)
total profit: 80.0
unfulfilled demand: [0 1 0 0]
original supply: [1 2]
It seems there is a dependency between iterations within the innermost loop of the second group of nested loops, and that seemed difficult, if not impossible, to vectorize. So this post is a partial solution that vectorizes the first group of two nested loops instead, which were -
supply_by_vector = np.zeros(len(vectors)).astype(int)
for m_indx, material in enumerate(materials):
    #find the first vector where the material covers the SKU
    for v_indx, vector in enumerate(vectors):
        if (vector <= material).all():
            supply_by_vector[v_indx] = supply_by_vector[v_indx] + 1
That entire section could be replaced by one line of vectorized code, like so -
supply_by_vector = ((vectors[:,None] <= materials).all(2)).sum(1)
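Here is a short check of that one-liner against the sample data from the question, split into steps so the broadcasting is easier to follow. Note that it counts a material for every vector it covers. If the intent is to count each material only for the first vector it covers (as the comment in the loop suggests), the variant at the end is one way to do that. That variant is my own assumption about the intent, and supply_first is a made-up name.

```python
import numpy as np

materials = np.array([[1, 0, 1, 1],
                      [1, 1, 0, 0],
                      [0, 1, 1, 1],
                      [1, 0, 0, 1]])
vectors = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])

# vectors[:, None] has shape (2, 1, 4) and broadcasts against materials (4, 4),
# so covers[v, m] is True when material m covers every SKU required by vector v
covers = (vectors[:, None] <= materials).all(2)

# count every (vector, material) match, as in the one-liner
supply_by_vector = covers.sum(1)
print(supply_by_vector)

# variant: count each material only for the first vector it covers
first_vector = covers.argmax(0)      # first True per material (0 when there is none)
has_match = covers.any(0)            # mask out materials that cover no vector
supply_first = np.bincount(first_vector[has_match], minlength=len(vectors))
print(supply_first)
```

For this sample data no material covers more than one vector, so both counts agree with the "original supply" printed in the question.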