I need to multiply two 3D arrays in an unusual way.
If it helps, I can 'afford' to permute them (reorder/reshape their axes) as needed, since they are pretty small (less than (1_000, 200, 200) of np.complex128).
At the moment, I have the following inefficient triple-nested for loop:
import numpy as np

result = np.zeros((640, 39, 20))
a = np.random.rand(640, 640, 20)
b = np.random.rand(39, 640, 20)

for j in range(640):
    for m in range(39):
        for l in range(20):
            result[j, m, l] = (a[j, :, l] * b[m, :, l]).sum()
How can I make the above as efficient as possible using numpy's magic?
I know I could use numba and hope to beat numpy with compiled code and parallel=True, but first I want to see whether numpy alone suffices for my task.
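For what it's worth, the contraction the loop performs seems to reduce to a single einsum call; a minimal sketch, reusing a, b and result from the snippet above:

# result[j, m, l] = sum_i a[j, i, l] * b[m, i, l], i.e. one contraction over the shared axis i
result_vec = np.einsum('jil,mil->jml', a, b)
assert np.allclose(result_vec, result)
print(result_vec.shape)  # (640, 39, 20)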
EDIT: Does this also work for a more complicated loop structure like the one below?

for l in range(20):
    for m in range(-l, l + 1, 1):
        for j in range(640):
            result[j, m, l] = (a[j, :, l] * b[m, :, l]).sum()
After @hpaulj's comment, I now understand that the above is not possible.
Thank you!
I have the following multiplication routine:
import numpy as np

a = np.random.rand(3, 3)
b = np.random.rand(3, 50, 50)
res = np.zeros((3, 50, 50))

for i in range(50):
    for j in range(50):
        res[:, i, j] = a @ b[:, i, j]
What is the einsum equivalent expression?
Best regards
Might want to brush up on Einstein summation notation:
res = np.einsum('ij, jkl -> ikl', a, b)
In this case, np.tensordot is also useful:
np.tensordot(a, b, 1).shape
Out[]: (3, 50, 50)
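Both forms can be checked against the loop from the question; a quick self-contained sketch with the same shapes:

import numpy as np

a = np.random.rand(3, 3)
b = np.random.rand(3, 50, 50)

res_loop = np.zeros((3, 50, 50))
for i in range(50):
    for j in range(50):
        res_loop[:, i, j] = a @ b[:, i, j]

assert np.allclose(np.einsum('ij, jkl -> ikl', a, b), res_loop)
assert np.allclose(np.tensordot(a, b, 1), res_loop)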
I have a 2D numpy array arr of shape (m,n) with nonnegative values. I would like to find a pair (k,l) such that
the difference between sum(arr[:k, :]) and sum(arr[k:, :]) is minimal
similarly, the difference between sum(arr[:, :l]) and sum(arr[:, l:]) is minimal
If you can come up with an algorithm only for k, the rest is actually easy. We simply transpose the matrix to find l.
A note for the skeptical: We may assume that sum(arr[:k, :]) and sum(arr[:,:l]) are strictly increasing functions of k and l, respectively.
This works:
sum_to_k = np.pad(np.cumsum(arr.sum(axis=1)), (1, 0))  # sum_to_k[k] == arr[:k, :].sum()
sum_to_l = np.pad(np.cumsum(arr.sum(axis=0)), (1, 0))  # sum_to_l[l] == arr[:, :l].sum()
k = np.argmin(np.abs(sum_to_k - (sum_to_k[-1] - sum_to_k)))
l = np.argmin(np.abs(sum_to_l - (sum_to_l[-1] - sum_to_l)))
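If in doubt, the cumulative-sum trick can be verified against a brute-force search; a small sketch with made-up test data:

import numpy as np

arr = np.random.rand(50, 60)  # made-up test data

sum_to_k = np.pad(np.cumsum(arr.sum(axis=1)), (1, 0))
k = np.argmin(np.abs(sum_to_k - (sum_to_k[-1] - sum_to_k)))

# brute force over every possible split row
brute_k = min(range(arr.shape[0] + 1),
              key=lambda kk: abs(arr[:kk, :].sum() - arr[kk:, :].sum()))
assert k == brute_k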
SQDIFF is defined as in the OpenCV documentation (I believe they omit the channels in the formula).
In plain NumPy, this should be:
import numpy as np
import cv2 as cv

A = np.arange(27, dtype=np.float32)
A = A.reshape(3, 3, 3)  # the "image"
B = np.ones([2, 2, 3], dtype=np.float32)  # the window (template)
rw, rh = A.shape[0] - B.shape[0] + 1, A.shape[1] - B.shape[1] + 1  # end result size
result = np.zeros([rw, rh])

for i in range(rw):
    for j in range(rh):
        w = A[i:i + B.shape[0], j:j + B.shape[1]]
        res = B - w
        result[i, j] = np.sum(res ** 2)

cv_result = cv.matchTemplate(A, B, cv.TM_SQDIFF)  # this result is the same as the simple for loops
assert np.allclose(cv_result, result)
This is a comparatively slow solution. I have read about sliding_window_view but cannot get it right.
# This will fail with these large arrays but is OK for smaller ones
A = np.random.rand(1028, 1232, 3).astype(np.float32)
B = np.random.rand(248, 249, 3).astype(np.float32)
locations = np.lib.stride_tricks.sliding_window_view(A, B.shape)
sqdiff = np.sum((B - locations) ** 2, axis=(-1, -2, -3, -4))  # this will fail with normal-sized images

The above fails with a MemoryError even though the result itself easily fits in memory. How can I reproduce the cv2.matchTemplate result in a faster, vectorized way like this?
As a last resort, you may perform the computation in tiles instead of computing it "all at once".
np.lib.stride_tricks.sliding_window_view returns a view of the data, so it doesn't consume a lot of RAM.
The expression B - locations, however, cannot be a view; it requires RAM for storing an array of shape (781, 984, 1, 248, 249, 3) of float32 elements.
The total RAM for storing B - locations is 781*984*1*248*249*3*4 = 569,479,908,096 bytes (about 530 GiB).
To avoid storing all of B - locations in RAM at once, we may compute sqdiff in tiles, where each "tile" computation requires far less RAM.
A simple tiling scheme uses every row as a tile: loop over the rows of sqdiff and compute the output row by row.
Example:
sqdiff = np.zeros((locations.shape[0], locations.shape[1]), np.float32)  # allocate an array for storing the result

# Compute sqdiff row by row instead of all at once.
for i in range(sqdiff.shape[0]):
    sqdiff[i, :] = np.sum((B - locations[i, :, :, :, :, :]) ** 2, axis=(-1, -2, -3, -4))
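For a rough sense of why this helps, each row tile only materializes a float32 temporary of shape (984, 1, 248, 249, 3); a back-of-the-envelope check:

per_row_bytes = 984 * 1 * 248 * 249 * 3 * 4  # float32 temporary for one row tile
print(per_row_bytes)  # 729167616, i.e. roughly 0.7 GB instead of ~530 GB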
Executable code sample:
import numpy as np
import cv2

A = np.random.rand(1028, 1232, 3).astype(np.float32)
B = np.random.rand(248, 249, 3).astype(np.float32)

locations = np.lib.stride_tricks.sliding_window_view(A, B.shape)

cv_result = cv2.matchTemplate(A, B, cv2.TM_SQDIFF)  # this result is the same as the simple for loops

#sqdiff = np.sum((B - locations) ** 2, axis=(-1, -2, -3, -4))  # this will fail with normal-sized images

sqdiff = np.zeros((locations.shape[0], locations.shape[1]), np.float32)  # allocate an array for storing the result

# Compute sqdiff row by row instead of all at once.
for i in range(sqdiff.shape[0]):
    sqdiff[i, :] = np.sum((B - locations[i, :, :, :, :, :]) ** 2, axis=(-1, -2, -3, -4))

assert np.allclose(cv_result, sqdiff)
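If the per-row Python loop ever becomes the bottleneck, the same idea extends to wider tiles; a sketch reusing the arrays above (tile_rows is an arbitrary knob, not something prescribed by the algorithm):

tile_rows = 8  # arbitrary tile height; tune to the available RAM
for i0 in range(0, sqdiff.shape[0], tile_rows):
    i1 = min(i0 + tile_rows, sqdiff.shape[0])
    sqdiff[i0:i1, :] = np.sum((B - locations[i0:i1]) ** 2, axis=(-1, -2, -3, -4))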
I know the solution is a bit disappointing... But it is the only generic solution I could find.
The squared difference map

SQDIFF[k, l] = sum_{m, n} (T[m, n] - I[k + m, l + n])^2

is equivalent to

SQDIFF[k, l] = (T^2 ⋆ 1_[k, l]) - 2 (T ⋆ I)[k, l] + (1_[m, n] ⋆ I^2)[k, l]

where the 'star' operation is a cross-correlation, 1_[m, n] is a window the size of the template, and 1_[k, l] is a window the size of the image.
You can compute the cross-correlation terms using 'scipy.signal.correlate' and find the matches by looking for local minima in the square difference map.
You might want to do some non-minimum suppression too.
This solution will require orders of magnitude less memory to store.
For more help, please post a reproducible example with an image and template that are valid for the algorithm. Using noise will result in meaningless outputs.
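A minimal sketch of that decomposition, assuming float32 inputs shaped like the question's (the sqdiff_map name is made up, and FFT-based correlation can differ from the brute-force sum by a small floating-point error):

import numpy as np
from scipy.signal import correlate

def sqdiff_map(image, template):
    # constant term: the sum of squared template values
    t_sq = np.sum(template ** 2)
    # cross-correlation term T-star-I, summed over all template dims (including channels)
    cross = correlate(image, template, mode='valid', method='fft')
    # windowed sum of squared image values: correlate I^2 with a box of ones
    ones = np.ones_like(template)
    i_sq = correlate(image ** 2, ones, mode='valid', method='fft')
    return (t_sq - 2.0 * cross + i_sq)[:, :, 0]  # drop the size-1 channel axis

A = np.random.rand(256, 256, 3).astype(np.float32)
B = A[40:72, 50:83].copy()  # a 32 x 33 crop of A, used as the template
d = sqdiff_map(A, B)
print(np.unravel_index(np.argmin(d), d.shape))  # (40, 50), up to floating-point noise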
I have a function evaluation with many parameters, and I want to vectorize the evaluation. Something like this:
import numpy as np

I = 100
J = 34
K = 6
i, j, k = np.arange(I), np.arange(J), np.arange(K)
i, j, k = np.meshgrid(i, j, k)
f = myfun(i, j, k)  # myfun is the function being vectorized
This is excellent; however, I now also have a parameter to pass to myfun that I generate with some other function and that is invariant over some of the indices above, thus:
p = my_param_gen()
and let's say
p.shape
will output
(100, 6)
This corresponds to p being invariant over the index J. Now, I would like to expand the shape of p to
(100, 34, 6)
in a meshgrid-like fashion, so that the new dimension is filled with constant copies of the existing values. What is the best way to do this? The approach should also work when adding many new dimensions. I have seen numpy.expand_dims, but it does not do this by itself.
Your default meshgrid call (with indexing='xy') produces arrays of shape:
In [116]: i.shape
Out[116]: (34, 100, 6)
If p.shape is (100, 6), then p will broadcast with i, j, k without further change; the p[None, :, :] expansion happens automatically.
If you'd used i, j, k = np.meshgrid(i, j, k, indexing='ij'),
In [121]: i.shape
Out[121]: (100, 34, 6)
and p[:, None, :] would be needed for broadcasting (equivalently, np.expand_dims(p, 1)).
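A small sketch of both cases, using the shapes from the question (the values are just random placeholders):

import numpy as np

p = np.random.rand(100, 6)

# indexing='ij' case: i.shape is (100, 34, 6), so p needs an explicit middle axis
i, j, k = np.meshgrid(np.arange(100), np.arange(34), np.arange(6), indexing='ij')
p_full = np.broadcast_to(p[:, None, :], i.shape)  # (100, 34, 6), constant along axis 1
assert np.all(p_full == p[:, None, :])

# default indexing='xy' case: i.shape is (34, 100, 6) and p broadcasts as-is
i, j, k = np.meshgrid(np.arange(100), np.arange(34), np.arange(6))
print((p * i).shape)  # (34, 100, 6)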
Is it possible to simplify this:
import numpy as np

a = np.random.random_sample((40, 3))
data_base = np.random.random_sample((20, 3))
mean = np.random.random_sample((40,))

data = []
for s in data_base:
    data.append(mean + np.dot(a, s))
data should end up with shape (20, 40). I was wondering if I could use broadcasting instead of the loop. I was not able to get it to work with np.add and some [:, None]; I am certainly not using it correctly.
Your data ends up as a (20, 40) array:
In [385]: len(data)
Out[385]: 20
In [386]: data = np.array(data)
In [387]: data.shape
Out[387]: (20, 40)
The straightforward application of dot produces the same thing:
In [388]: M2=mean+np.dot(data_base, a.T)
In [389]: np.allclose(M2,data)
Out[389]: True
The matmul operator also works with these arrays (no need to expand and squeeze):
M3 = data_base @ a.T + mean
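For completeness, an einsum spelling of the same product (a sketch reusing the arrays defined above):

M4 = np.einsum('ik,jk->ij', data_base, a) + mean
assert np.allclose(M4, M2)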