numpy einsum: nested dot products

I have two n-by-k-by-3 arrays a and b, e.g.,
import numpy as np
a = np.array([
[
[1, 2, 3],
[3, 4, 5]
],
[
[4, 2, 4],
[1, 4, 5]
]
])
b = np.array([
[
[3, 1, 5],
[0, 2, 3]
],
[
[2, 4, 5],
[1, 2, 4]
]
])
and I would like to compute the dot product of all pairs of "triplets", i.e.,
np.sum(a*b, axis=2)
A better way to do that is perhaps einsum, but I can't seem to get the indices straight.
Any hints here?

You are losing the third axis of the two 3D input arrays in that sum-reduction, while keeping the first two axes aligned. With np.einsum, both input subscript strings are therefore identical, and the third index is left out of the output string to signal that we are summing along that axis for both inputs. Thus, the solution would be:
np.einsum('ijk,ijk->ij',a,b)
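As a quick sanity check, here is a minimal sketch that runs both expressions on the arrays from the question and confirms that they agree. The names via_sum and via_einsum are only illustrative.

```python
import numpy as np

a = np.array([[[1, 2, 3], [3, 4, 5]],
              [[4, 2, 4], [1, 4, 5]]])
b = np.array([[[3, 1, 5], [0, 2, 3]],
              [[2, 4, 5], [1, 2, 4]]])

# Element-wise product followed by a sum over the last axis.
via_sum = np.sum(a * b, axis=2)

# Same reduction: index k appears in both inputs but not in the output.
via_einsum = np.einsum('ijk,ijk->ij', a, b)

print(via_einsum.tolist())                  # [[20, 23], [36, 29]]
print(np.array_equal(via_sum, via_einsum))  # True
```

Each output entry [i, j] is the dot product of a[i, j] and b[i, j], e.g. 1*3 + 2*1 + 3*5 = 20 for the first pair.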

Related

Multiplication with two 2D matrix to output 3D matrix

I am wondering whether there is a good way to calculate this type of multiplication.
It simply multiplies x[i] by x element-wise for each i, resulting in an array of shape [2, 2, 3].
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> output
array([[[ 0, 1, 4],
[ 0, 4, 10]],
[[ 0, 4, 10],
[ 9, 16, 25]]])
I tried the code below and am looking for a faster version using numpy.
np.array([
np.multiply(x[i], x)
for i in range(x.shape[0])
])

There are two straightforward ways to do this: the first uses broadcasting, the second uses einsum. I'd recommend using timeit to compare the speed of the versions on the application you have in mind:
out_broadcast = x[:, None, :] * x
out_einsum = np.einsum('ij,kj->ikj',x,x)
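The comparison suggested above could be sketched roughly as follows. This is only an illustration on the small array from the question. The loop count of 10000 is an arbitrary choice, and the timings will vary with array size and machine, so no numbers are quoted.

```python
import timeit
import numpy as np

x = np.array([[0, 1, 2],
              [3, 4, 5]])

# Reference result from the question's list comprehension.
out_loop = np.array([np.multiply(x[i], x) for i in range(x.shape[0])])

# (2, 1, 3) * (2, 3) broadcasts to (2, 2, 3): out[i, k, j] = x[i, j] * x[k, j]
out_broadcast = x[:, None, :] * x
out_einsum = np.einsum('ij,kj->ikj', x, x)

print(np.array_equal(out_loop, out_broadcast))  # True
print(np.array_equal(out_loop, out_einsum))     # True

# Time each variant; timeit accepts a zero-argument callable.
for label, stmt in [("broadcast", lambda: x[:, None, :] * x),
                    ("einsum", lambda: np.einsum('ij,kj->ikj', x, x))]:
    seconds = timeit.timeit(stmt, number=10_000)
    print(f"{label}: {seconds:.4f} s for 10000 calls")
```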

Numpy linalg.norm with ufunc.reduceat functionality

Solution: @QuangHoang's first comment, namely np.linalg.norm(arr,axis=1).
I would like to apply Numpy's linalg.norm function column wise to sub-arrays of a 3D array by using ranges (or indices?), similar in functionality to what ufunc.reduceat does.
Given the following array:
import numpy as np
In []: arr = np.array([[0,1,2,3], [2,2,3,4], [3,2,5,6],
[1,7,1,9], [1,4,8,6], [2,3,5,8],
[2,5,7,3], [2,3,4,6], [2,5,3,2]]).reshape(3,3,4)
Out []: array([[[0, 1, 2, 3],
[2, 2, 3, 4],
[3, 2, 5, 6]],
[[1, 7, 1, 9],
[1, 4, 8, 6],
[2, 3, 5, 8]],
[[2, 5, 7, 3],
[2, 3, 4, 6],
[2, 5, 3, 2]]])
I would like to apply linalg.norm column-wise to the three sub-arrays separately, i.e. for the first column it would be linalg.norm([0, 2, 3]), linalg.norm([1, 1, 2]) and linalg.norm([2, 2, 2]), for the second linalg.norm([1, 2, 2]), linalg.norm([7, 4, 3]) and linalg.norm([5, 3, 5]), etc., resulting in a 2D array with shape (3,4) containing the results of the linalg.norm calls.
Doing this with a 2D array is straightforward by specifying the axis:
import numpy.linalg as npla
In []: npla.norm(np.array([[0,1,2,3], [2,2,3,4], [3,2,5,6]]), axis=0)
Out []: array([3.60555128, 3. , 6.164414 , 7.81024968])
But I don't understand how to do that for each sub-array separately. I believe that reduceat with a ufunc like add allows setting indices and ranges. Would something similar be possible here, but with linalg.norm?
Edit 1:
I followed @hpaulj's advice to look at the code used for add.reduce. With a better understanding of the method I was able to search more precisely, and I found np.apply_along_axis, which is exactly what I was looking for:
In []: np.apply_along_axis(npla.norm, 1, arr)
Out []: array([[ 3.60555128, 3. , 6.164414 , 7.81024968],
[ 2.44948974, 8.60232527, 9.48683298, 13.45362405],
[ 3.46410162, 7.68114575, 8.60232527, 7. ]])
However, this method is very slow. Is there a way to use linalg.norm in a vectorized manner instead?
Edit 2:
@QuangHoang's first comment is actually the correct answer I was looking for. I misunderstood the method, which is why I misunderstood their comment. Specifying the axis in the linalg.norm call is what is required here:
np.linalg.norm(arr,axis=1)
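To make the equivalence concrete, here is a small sketch that compares the slow np.apply_along_axis call with the axis argument on the array from the question. The einsum line is an extra cross-check added for illustration and is not part of the original solution.

```python
import numpy as np

arr = np.array([[0, 1, 2, 3], [2, 2, 3, 4], [3, 2, 5, 6],
                [1, 7, 1, 9], [1, 4, 8, 6], [2, 3, 5, 8],
                [2, 5, 7, 3], [2, 3, 4, 6], [2, 5, 3, 2]]).reshape(3, 3, 4)

# Calls norm once per 1-D slice from Python, hence slow on large inputs.
slow = np.apply_along_axis(np.linalg.norm, 1, arr)

# Single vectorized reduction over axis 1 (the rows of each sub-array).
fast = np.linalg.norm(arr, axis=1)

# Cross-check: square root of the sum of squares over axis 1.
manual = np.sqrt(np.einsum('ijk,ijk->ik', arr, arr))

print(fast.shape)                 # (3, 4)
print(np.allclose(slow, fast))    # True
print(np.allclose(manual, fast))  # True
```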

Remove entire sub array from multi-dimensional array if any element in array is duplicate

I have a multi-dimensional array in Python where an integer may be repeated across the vectors in the array. For example:
array = [[1,2,3,4],
[2,9,12,4],
[5,6,7,8],
[6,8,12,13]]
I would like to completely remove the vectors that contain any element that has appeared previously. In this case, vector [2,9,12,4] and vector [6,8,12,13] should be removed because they have an element (2 and 6 respectively) that has appeared in a previous vector within that array. Note that [6,8,12,13] contains several elements that have appeared previously, so the code should be able to work with these scenarios as well.
The resulting array should end up being:
array = [[1,2,3,4],
[5,6,7,8]]
I thought I could achieve this with np.unique(array, axis=0), but I couldn't find a function that handles this particular kind of uniqueness.
Any thoughts are appreciated.

You can work with an array of the numbers in sorted order, each paired with the index of the row it comes from (first column: row index, second column: number). It looks like this:
number_info = array([[ 0, 1],
[ 0, 2],
[ 1, 2],
[ 0, 3],
[ 0, 4],
[ 1, 4],
[ 2, 5],
[ 2, 6],
[ 3, 6],
[ 2, 7],
[ 2, 8],
[ 3, 8],
[ 1, 9],
[ 1, 12],
[ 3, 12],
[ 3, 13]])
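It indicates that rows remove_idx = [2, 5, 8, 11, 14] of this array need to be removed, and they point to rows rows_idx = [1, 1, 3, 3, 3] of the original array. Now, the code:
import numpy as np
array = np.asarray(array)  # the question's data is a plain list
# pair every number with the index of the row it comes from
flat_idx = np.repeat(np.arange(array.shape[0]), array.shape[1])
number_info = np.transpose([flat_idx, array.ravel()])
# a stable sort keeps equal numbers ordered by row index, which the row diff below relies on
number_info = number_info[np.argsort(number_info[:,1], kind='stable')]
# a repeated number whose row index increased was already seen in an earlier row
remove_idx = np.where((np.diff(number_info[:,1])==0) &
                      (np.diff(number_info[:,0])>0))[0] + 1
remove_rows = number_info[remove_idx, 0]
output = np.delete(array, remove_rows, axis=0)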
Output:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])

Here's a quick way to do it with a list comprehension and set intersections:
>>> array = [[1,2,3,4],
... [2,9,12,4],
... [5,6,7,8],
... [6,8,12,13]]
>>> [v for i, v in enumerate(array) if not any(set(a) & set(v) for a in array[:i])]
[[1, 2, 3, 4], [5, 6, 7, 8]]
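The list comprehension above rebuilds the sets of all earlier rows for every row. A possible variation, sketched here as an assumption rather than as part of either answer, keeps one running set of everything seen so far. The function name drop_rows_with_seen_elements is hypothetical. Like the comprehension, it compares each row against all earlier rows, including ones that were themselves dropped.

```python
def drop_rows_with_seen_elements(rows):
    """Keep only rows that share no element with any earlier row."""
    seen = set()   # every element encountered so far, kept or not
    kept = []
    for row in rows:
        if seen.isdisjoint(row):
            kept.append(row)
        seen.update(row)  # dropped rows still count as "previous"
    return kept


array = [[1, 2, 3, 4],
         [2, 9, 12, 4],
         [5, 6, 7, 8],
         [6, 8, 12, 13]]

result = drop_rows_with_seen_elements(array)
print(result)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```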

How to apply a function on jagged Numpy arrays (unequal row lengths) without using np.apply_along_axis()?

I'm trying to speed up a process, and I think this might be possible using numpy's apply_along_axis. The problem is that not all my rows have the same length.
When I do:
a = np.array([[1, 2, 3],
[2, 3, 4],
[4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
This works fine. But I would like to do something similar to (please note that the first row has 4 elements and the rest have 3):
a = np.array([[1, 2, 3, 4],
[2, 3, 4],
[4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
But this fails because:
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I've looked around and the only 'solution' I've found is to add zeros to make all the arrays the same length, which would probably defeat the purpose of performance improvement.
Is there any way to use np.apply_along_axis on a non-regularly shaped numpy array?

You can transform your initial array of iterable-objects to ndarray by padding them with zeros in a vectorized manner:
import numpy as np
a = np.array([[1, 2, 3, 4],
[2, 3, 4],
[4, 5, 6]], dtype=object) # recent NumPy versions require dtype=object for ragged input
max_len = len(max(a, key = lambda x: len(x))) # max length of iterable-objects contained in array
cust_func = np.vectorize(pyfunc=lambda x: np.pad(array=x,
pad_width=(0,max_len),
mode='constant',
constant_values=(0,0))[:max_len], otypes=[list])
a_pad = np.stack(cust_func(a))
output:
array([[1, 2, 3, 4],
[2, 3, 4, 0],
[4, 5, 6, 0]])
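If the goal is only a per-row reduction such as the sum in the question, padding can be avoided altogether. The sketch below swaps in a different technique from the answer above: it flattens the rows and calls np.add.reduceat with the start offset of each row. It assumes every row is non-empty, since reduceat does not return 0 for an empty segment. The names rows, flat, starts and row_sums are illustrative.

```python
import numpy as np

rows = [[1, 2, 3, 4],
        [2, 3, 4],
        [4, 5, 6]]

# One flat 1-D array holding all values back to back.
flat = np.concatenate(rows)

# Start offset of each row inside `flat`: [0, 4, 7].
lengths = np.array([len(r) for r in rows])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))

# Sum each segment flat[starts[i]:starts[i+1]] in one vectorized call.
row_sums = np.add.reduceat(flat, starts)
print(row_sums.tolist())  # [10, 9, 15]
```

Other ufuncs with a reduceat method, such as np.maximum or np.multiply, can be used the same way.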

It depends.
Do you know the sizes of the vectors beforehand, or are you appending to a list?
See e.g. http://stackoverflow.com/a/58085045/7919597
You could, for example, pad the arrays:
import numpy as np
a1 = [1, 2, 3, 4]
a2 = [2, 3, 4, np.nan] # pad with nan
a3 = [4, 5, 6, np.nan] # pad with nan
b = np.stack([a1, a2, a3], axis=0)
print(b)
# you can apply the normal numpy operations on
# arrays with nan, they usually just result in a nan
# in a resulting array
c = np.diff(b, axis=-1)
print(c)
Afterwards you can apply a moving window on each row over the columns.
Have a look at https://stackoverflow.com/a/22621523/7919597 which is only 1d, but can give you an idea of how it could work.
It is possible to use a 2d array with only one row as kernel (shape e.g. (1, 3)) with scipy.signal.convolve2d and use the idea above.
This is a workaround to get a "row-wise 1D convolution":
from scipy import signal
krnl = np.array([[0, 1, 0]])
d = signal.convolve2d(c, krnl, mode='same')
print(d)
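For the row sums asked about in the question, NaN padding can be combined with NumPy's NaN-aware reductions. The following is a minimal sketch under that assumption. Note that NaN padding forces a float dtype.

```python
import numpy as np

b = np.array([[1, 2, 3, 4],
              [2, 3, 4, np.nan],   # padded with nan
              [4, 5, 6, np.nan]])  # padded with nan

# A plain sum propagates NaN for the padded rows.
plain = np.sum(b, axis=1)

# nansum treats NaN as zero, so the padding is ignored: 10, 9, 15.
row_sums = np.nansum(b, axis=1)

# nanmean divides by the number of non-NaN entries: 2.5, 3, 5.
row_means = np.nanmean(b, axis=1)

print(plain)
print(row_sums)
print(row_means)
```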

Explain this 4D numpy array indexing intuitively

x = np.random.randn(4, 3, 3, 2)
print(x[1,1])
output:
[[ 1.68158825 -0.03701415]
[ 1.0907524 -1.94530359]
[ 0.25659178 0.00475093]]
I am a Python newbie. I can't really understand 4-D array indexing like the above. What does x[1,1] mean?
For example, for a vector
a = [2, 3, 8, 9], a[0] = 2 and a[3] = 9.
I get this, but I don't know what x[1,1] refers to.
Please explain in detail. Thank you.

A 2D array is a matrix: an array of arrays.
A 4D array is basically a matrix of matrices.
Specifying one index gives you an array of matrices:
>>> x[1]
array([[[-0.37387191, -0.19582887],
[-2.88810217, -0.8249608 ],
[-0.46763329, 1.18628611]],
[[-1.52766397, -0.2922034 ],
[ 0.27643125, -0.87816021],
[-0.49936658, 0.84011388]],
[[ 0.41885001, 0.16037164],
[ 1.21510322, 0.01923682],
[ 0.96039904, -0.22761806]]])
Specifying two indices gives you a matrix:
>>> x[1, 1]
array([[-1.52766397, -0.2922034 ],
[ 0.27643125, -0.87816021],
[-0.49936658, 0.84011388]])
Specifying three indices gives you an array:
>>> x[1, 1, 1]
array([ 0.27643125, -0.87816021])
Specifying four indices gives you a single element:
>>> x[1, 1, 1, 1]
-0.87816021212791107
x[1,1] gives you the small matrix that was saved in the 2nd column of the 2nd row of the large matrix.
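The same progression can be seen from the shapes alone. Here is a small sketch that uses np.arange instead of random values, so the result is reproducible. Each extra index removes the leading axis of what is left.

```python
import numpy as np

# Same shape as in the question, filled with 0..71 for reproducibility.
x = np.arange(4 * 3 * 3 * 2).reshape(4, 3, 3, 2)

print(x.shape)           # (4, 3, 3, 2)
print(x[1].shape)        # (3, 3, 2)  three 3x2 matrices
print(x[1, 1].shape)     # (3, 2)     one matrix
print(x[1, 1, 1].shape)  # (2,)       one row of that matrix
print(x[1, 1, 1, 1])     # 27         a single element
```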

A 4d numpy array is an array nested 4 layers deep, so at the top level it would look like this:
[ # 1st level Array (Outer)
[ # 2nd level Array
[[1, 2], [3, 4]], # 3rd level arrays, containing 2 4th level arrays
[[5, 6], [7, 8]]
],
[ # 2nd Level array
[[9, 10], [11, 12]],
[[13, 14], [15, 16]]
]
]
x[1,1] gives the same result as x[1][1]. Let's unpack this one expression at a time. Indices start at 0, so the first expression x[1] selects the second element of the outer array, which is the following object from the earlier array:
[
[[9, 10], [11, 12]],
[[13, 14], [15, 16]]
]
The next expression now looks like this:
[
[[9, 10], [11, 12]],
[[13, 14], [15, 16]]
][1]
So evaluating that (again selecting the second element) gives us the following result:
[[13, 14], [15, 16]]
As you can see selecting an element in a 4d array gives us a 3d array, selecting an element from a 3d array gives a 2d array and selecting an element from a 2d array gives us a 1d array.
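To check the nested-list picture against NumPy itself, the sketch below builds the same 2x2x2x2 array holding the values 1 to 16 and indexes it. Building it with np.arange and reshape is a convenience chosen here and is not taken from the answer.

```python
import numpy as np

# Values 1..16 laid out exactly like the nested lists shown above.
x = np.arange(1, 17).reshape(2, 2, 2, 2)

print(x[1].tolist())     # [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]
print(x[1, 1].tolist())  # [[13, 14], [15, 16]]

# Tuple indexing and chained indexing select the same sub-array.
print(np.array_equal(x[1, 1], x[1][1]))  # True

# Each index removes one dimension.
print(x.ndim, x[1].ndim, x[1, 1].ndim, x[1, 1, 1].ndim)  # 4 3 2 1
```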
