I am trying to figure out a fully vectorised way to compute the covariance matrix for a 2D numpy array for a given base kernel function. For example, if the input is X = [[a,b],[c,d]], then for a kernel function k(x_1,x_2) the covariance matrix will be
K=[[k(a,a),k(a,b),k(a,c),k(a,d)],
[k(b,a),k(b,b),k(b,c),k(b,d)],
[k(c,a),k(c,b),k(c,c),k(c,d)],
[k(d,a),k(d,b),k(d,c),k(d,d)]].
How do I go about doing this? I am confused about how to repeat the values and then apply the function, and about what the most efficient way of doing this might be.
You can use np.meshgrid to get two matrices with values for the first and second parameter to the k function.
In [8]: X = np.arange(4).reshape(2,2)
In [9]: np.meshgrid(X, X)
Out[9]:
[array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]]),
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])]
You can then just pass these matrices to the k function:
In [10]: k = lambda x1, x2: (x1-x2)**2
In [11]: X1, X2 = np.meshgrid(X, X)
In [12]: k(X1, X2)
Out[12]:
array([[0, 1, 4, 9],
[1, 0, 1, 4],
[4, 1, 0, 1],
[9, 4, 1, 0]])
Here's another way
k(X.reshape(-1, 1), X.reshape(1, -1))
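A minimal sketch of the broadcasting variant, wrapped in a hypothetical helper `kernel_matrix` (the name is an assumption, not something from the answers above). It flattens X first, so entry [i, j] is k(x_i, x_j). With its default indexing, np.meshgrid(X, X) varies the first argument along the columns, so k(X1, X2) is the transpose of this result for a non-symmetric k; for a symmetric kernel such as the squared difference, the two agree.

```python
import numpy as np

def kernel_matrix(X, k):
    """Hypothetical helper: pairwise k over all elements of X via broadcasting."""
    flat = np.ravel(X)                       # shape (n,)
    return k(flat[:, None], flat[None, :])   # (n, 1) against (1, n) -> (n, n)

X = np.arange(4).reshape(2, 2)
K = kernel_matrix(X, lambda x1, x2: (x1 - x2) ** 2)

# Same values as the meshgrid version for this symmetric kernel.
X1, X2 = np.meshgrid(X, X)
assert np.array_equal(K, (X1 - X2) ** 2)
```

This assumes k is written with elementwise NumPy operations; a kernel that only accepts scalars would need np.vectorize or a rewrite.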
I have a 3D numpy array data where dimensions a and b represent the resolution of an image and c is the image/frame number. I want to call np.histogram on each pixel (a and b combination) across the c dimension, with an output array of dimension (a, b, BINS). I've accomplished this task with a nested loop, but how can I vectorize this operation?
hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]
I am confident that the solution is trivial, nonetheless all help is appreciated :)
np.histogram computes over the flattened array.
However, you could use np.apply_along_axis.
np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
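As a quick sanity check, here is a sketch that compares the one-liner with the nested loop from the question. The array sizes, BINS value, and seed are illustrative assumptions. As far as I know, np.apply_along_axis still iterates over the 1-D slices in Python, so it is mainly a convenience and not necessarily a speed-up.

```python
import numpy as np

a, b, c, BINS = 2, 3, 4, 5          # illustrative sizes, not from the question
rng = np.random.default_rng(0)
data = rng.integers(10, size=(a, b, c))

# Reference: the nested loop from the question.
hists = np.zeros((a, b, BINS), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# One-liner: apply the 1-D histogram along the last axis.
out = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)
```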
This is an interesting problem.
Make a Minimal Working Example (MWE)
This should be a standard habit when asking questions on SO.
a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]
data
>>> array([[[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0]],
[[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]]])
hists
>>> array([[[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1]],
[[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]]])
Make it as simple as possible (but still working)
You can eliminate one loop and simplify it:
new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)
for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]
new_hists
>>> array([[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1],
[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]])
new_data
>>> array([[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0],
[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]])
Can you find similar problems and use the key points of their solutions?
In general, you can't vectorise something that is being done in a loop like this:
for row in array:
    some_operation(row)
The exception is when you can call another vectorised operation on the flattened array and then move the result back to the initial shape:
arr = array.ravel()
another_operation(arr)
out = arr.reshape(array.shape)
It looks like you're fortunate with np.histogram, because I'm pretty sure similar things have been done before.
Final solution
new_data = data.reshape(a*b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c+1), dtype=int)
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))
>>> array([[[2, 1, 0, 0, 1],
[1, 1, 0, 1, 1],
[2, 0, 1, 0, 1]],
[[2, 0, 1, 0, 1],
[2, 0, 0, 1, 1],
[1, 0, 0, 1, 2]]])
Note that it adds an extra bin to each histogram and puts the max values in it, but I hope that's not hard to fix if you need to.
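One possible way to drop the extra bin, sketched on the MWE data: clip the bin index to c - 1 so that each row's maximum lands in the last bin, which is where np.histogram puts it. This assumes integer data, M > m in every row, and c bins as in the MWE; it is not a general replacement for np.histogram.

```python
import numpy as np

a, b, c = 2, 3, 4
data = np.array([[[6, 4, 3, 3], [7, 3, 8, 0], [1, 5, 8, 0]],
                 [[5, 5, 7, 8], [3, 2, 7, 8], [6, 8, 8, 0]]])

new_data = data.reshape(a * b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
# Assumes M > m in every row; a constant row would divide by zero here.
bins = c * (new_data - m[:, None]) // (M - m)[:, None]
bins = np.minimum(bins, c - 1)          # fold the max values into the last bin

out = np.zeros((a * b, c), dtype=int)
rows = np.repeat(np.arange(a * b), c)   # row index for every element
np.add.at(out, (rows, bins.ravel()), 1)
out = out.reshape(a, b, c)
```

On this data the result matches the hists array produced by the loop in the MWE.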
I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.
A = np.array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
If I fixed the required column number to 2, I want something like
np.array([[1,3],
[0,4],
[1,4],
[2,3]])
I am looking for a non-loop NumPy-based solution. I tried np.random.choice, but with replace=False I get the error
ValueError: Cannot take a larger sample than population when
'replace=False'
Here's one vectorized approach inspired by this post -
def random_unique_indexes_per_row(A, N=2):
    m,n = A.shape
    return np.random.rand(m,n).argsort(1)[:,:N]
Sample run -
In [146]: A
Out[146]:
array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]:
array([[4, 0],
[0, 1],
[3, 2],
[2, 0]])
In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]:
array([[2, 0, 1],
[3, 4, 2],
[3, 2, 1],
[4, 3, 0]])
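Here is a sketch of the same argsort idea with a seedable generator and a uniqueness check. The `rng` parameter and the check are additions for illustration, not part of the answer above. Sorting random keys gives a random permutation of the column indexes per row, so the first N entries cannot repeat.

```python
import numpy as np

def random_unique_indexes_per_row(A, N=2, rng=None):
    """Sketch: N distinct column indexes per row via argsort of random keys."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    return rng.random((m, n)).argsort(axis=1)[:, :N]

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])
idx = random_unique_indexes_per_row(A, N=2, rng=np.random.default_rng(0))

# Each row of idx should contain no repeated column index.
sorted_idx = np.sort(idx, axis=1)
rows_are_unique = bool((np.diff(sorted_idx, axis=1) > 0).all())
```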
Like this?
B = np.random.randint(5, size=(len(A), 2))
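You can use np.random.choice() as follows (note that this samples with replacement, so the indices within a row are not guaranteed to be unique):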
def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
Demo:
In [34]: x, y = A.shape
In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]:
array([[0, 2],
[0, 1],
[0, 1],
[3, 1]])
As an experimental approach, here is a way that will give unique rows of indices in 99% of cases (the indices within a row can still repeat, as the outputs show):
In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...:
In [61]: random_ind(A, 2)
Out[61]:
array([[0, 1],
[1, 0],
[1, 1],
[1, 4]])
In [62]: random_ind(A, 2)
Out[62]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 1]])
In [64]: random_ind(A, 3)
Out[64]:
array([[0, 0, 0],
[1, 1, 2],
[0, 4, 1],
[2, 3, 1]])
In [65]: random_ind(A, 4)
Out[65]:
array([[0, 4, 0, 3],
[1, 0, 1, 4],
[0, 4, 1, 2],
[3, 0, 1, 0]])
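Note that the hard-coded 4 in return ind[index][:4] is the number of rows of A. If there are fewer than 4 unique rows, the slice does not raise an error; it simply returns fewer rows, so in that case you can check the shape of the result and call the function again until you get the desired result.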
Is there an elegant/quick way to reproduce this without the for loops? I'm looking to have a 3D matrix of values, and a 2D matrix that gives the indices for which to copy the 3rd dimension's values while creating a new 3D matrix of the same shape. Here is an implementation with a lot of loops.
np.random.seed(0)
x = np.random.randint(5, size=(2, 3, 4))
y = np.random.randint(x.shape[1], size=(3, 4))
z = np.zeros((2, 3, 4))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]
This puzzled me for a bit, until I realized you aren't using all of y. y is (3,4), but you are indexing over (2,3):
In [28]: x[np.arange(2)[:,None], y[:2,:3],:]
Out[28]:
array([[[4, 0, 0, 4],
[4, 0, 3, 3],
[3, 1, 3, 2]],
[[3, 0, 3, 0],
[2, 1, 0, 1],
[1, 0, 1, 4]]])
We could use all of y with:
In [32]: x[np.arange(2)[:,None,None],y,np.arange(4)]
Out[32]:
array([[[4, 0, 3, 2],
[4, 0, 3, 2],
[3, 0, 0, 3]],
[[3, 1, 1, 4],
[3, 1, 1, 4],
[1, 1, 3, 1]]])
The 3 indexes broadcast to (2,3,4), but the selection is different from your z.
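To confirm that the first expression above reproduces the loop, here is a small check under the question's own setup. The names `i_idx` and `z_vec` are introduced only for this sketch.

```python
import numpy as np

np.random.seed(0)
x = np.random.randint(5, size=(2, 3, 4))
y = np.random.randint(x.shape[1], size=(3, 4))

# Reference: the loop from the question.
z = np.zeros((2, 3, 4))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]

# Vectorised: pair each i with the row indices y[i, :3], keep the last axis whole.
i_idx = np.arange(x.shape[0])[:, None]              # shape (2, 1)
z_vec = x[i_idx, y[:x.shape[0], :x.shape[1]], :]    # shape (2, 3, 4)
```

The (2, 1) and (2, 3) index arrays broadcast to (2, 3), and the trailing slice keeps all 4 values along the last axis.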
I have two arrays
x=array([[0, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[2, 2, 2, 2, 2]])
I want to subselect elements in each row by the length in array y
y = array([3, 2, 4])
My target is z:
z = array([[0, 0, 0],
[1, 0,],
[2, 2, 2, 2]])
How could I do that with numpy functions instead of list/loop?
Thank you so much for your help.
A NumPy array is optimized for homogeneous data with fixed dimensions. I like to think of it as a matrix: it does not make sense to have a matrix with a different number of elements in each row.
That said, depending on how you want to use the processed array, you can simply make a list of arrays:
z = [array([0, 0, 0]),
array([1, 0,]),
array([2, 2, 2, 2])]
Still, you will need to do that manually:
x = array([[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [2, 2, 2, 2, 2]])
y = array([3, 2, 4])
z = [x_item[:y_item] for x_item, y_item in zip(x, y)]
The list comprehension iterates over the x and y combined with zip() to create the new slice of the original array.
Something like this also works:
z = [x[i,:e] for i,e in enumerate(y)]
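If avoiding the Python-level loop matters, one possible sketch builds a boolean mask by broadcasting and then splits the selected values back into rows. The result is still a list of arrays, since a ragged result cannot be a regular 2-D array. The names `mask` and `flat` are introduced here for illustration.

```python
import numpy as np

x = np.array([[0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [2, 2, 2, 2, 2]])
y = np.array([3, 2, 4])

mask = np.arange(x.shape[1]) < y[:, None]   # True for the first y[i] columns of row i
flat = x[mask]                              # kept values, row by row, as one 1-D array
z = np.split(flat, np.cumsum(y)[:-1])       # cut back into one array per row
```

If the downstream code can work with the mask or the flat array directly, the final np.split step may not be needed at all.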
Let's say I have a 3-D array:
[[[0,1,2],
[0,1,2],
[0,1,2]],
[[3,4,5],
[3,4,5],
[3,4,5]]]
And I want to rearrange this by the columns:
[[0,1,2,3,4,5],
[0,1,2,3,4,5],
[0,1,2,3,4,5]]
What would be an elegant python numpy code for doing this for essentially a 3-D np.array of arbitrary shape and depth?
Could there be a fast method that bypasses the for loop? All the approaches I tried were terribly ad hoc and brute-force; they were basically too slow to be useful...
Thanks!!
Swap axes and reshape -
a.swapaxes(0,1).reshape(a.shape[1],-1)
Sample run -
In [115]: a
Out[115]:
array([[[0, 1, 2],
[0, 1, 2],
[0, 1, 2]],
[[3, 4, 5],
[3, 4, 5],
[3, 4, 5]]])
In [116]: a.swapaxes(0,1).reshape(a.shape[1],-1)
Out[116]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
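For reference, here is a self-contained sketch of the swap-and-reshape approach. It also shows an equivalent form for the 3-D case that concatenates the blocks side by side; `alt` is a name introduced only for this comparison.

```python
import numpy as np

a = np.array([[[0, 1, 2]] * 3,
              [[3, 4, 5]] * 3])              # shape (2, 3, 3)

# Move the block axis next to the column axis, then merge the two.
out = a.swapaxes(0, 1).reshape(a.shape[1], -1)

# Equivalent for a 3-D input: join the blocks along the last axis.
alt = np.concatenate(a, axis=1)
```

The swapaxes version generally has to copy the data during the reshape, because the swapped array is no longer contiguous in memory.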
Using einops:
einops.rearrange(a, 'x y z -> y (x z) ')
I would also recommend giving meaningful names to the axes (instead of x y z) depending on the context (e.g. time, height, etc.). This will make it easy to understand what the code does.
In : einops.rearrange(a, 'x y z -> y (x z) ')
Out:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])