Is there an elegant/quick way to reproduce this without the for loops? I have a 3D matrix of values and a 2D matrix that gives the indices from which to copy the 3rd dimension's values while creating a new 3D matrix of the same shape. Here is an implementation with a lot of loops.
import numpy as np

np.random.seed(0)
x = np.random.randint(5, size=(2, 3, 4))
y = np.random.randint(x.shape[1], size=(3, 4))
z = np.zeros((2, 3, 4))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]
This puzzled me for a bit, until I realized you aren't using all of y. y is (3,4), but you are indexing over (2,3):
In [28]: x[np.arange(2)[:,None], y[:2,:3],:]
Out[28]:
array([[[4, 0, 0, 4],
[4, 0, 3, 3],
[3, 1, 3, 2]],
[[3, 0, 3, 0],
[2, 1, 0, 1],
[1, 0, 1, 4]]])
We could use all of y with:
In [32]: x[np.arange(2)[:,None,None],y,np.arange(4)]
Out[32]:
array([[[4, 0, 3, 2],
[4, 0, 3, 2],
[3, 0, 0, 3]],
[[3, 1, 1, 4],
[3, 1, 1, 4],
[1, 1, 3, 1]]])
The 3 indexes broadcast to (2,3,4), but the selection is different from your z.
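To check the first indexing expression against the loops, here is a minimal sketch that reuses the question's setup. The names z_loop, idx, z_fancy and z_take are only illustrative, and np.take_along_axis is shown as one possible alternative spelling, not as the approach the answer uses.

```python
import numpy as np

np.random.seed(0)
x = np.random.randint(5, size=(2, 3, 4))
y = np.random.randint(x.shape[1], size=(3, 4))

# Reference result from the nested loops in the question.
z_loop = np.zeros(x.shape, dtype=x.dtype)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z_loop[i, j, :] = x[i, y[i, j], :]

# Only the top-left (2, 3) corner of y is ever read by the loops.
idx = y[:x.shape[0], :x.shape[1]]

# Advanced indexing: a (2, 1) row index broadcasts against the (2, 3) idx,
# and the last axis is copied whole.
z_fancy = x[np.arange(x.shape[0])[:, None], idx]

# Same selection with take_along_axis; the trailing length-1 axis of the
# index array broadcasts over axis 2 of x.
z_take = np.take_along_axis(x, idx[:, :, None], axis=1)

print(np.array_equal(z_loop, z_fancy), np.array_equal(z_loop, z_take))
```

Both vectorized forms read only y[:2, :3], which is exactly what the loops do.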
It's easy to understand the concept of transpose for a 2-D array, but I really cannot understand how the transpose of higher-dimensional arrays works.
For example
c = np.indices([4,5]).T.reshape(20,1,2)
d = np.indices([4,5]).reshape(20,1,2)
np.all(c==d) # output is False
Why are the outputs of c and d inconsistent?
In [143]: c = np.indices([4,5])
In [144]: c
Out[144]:
array([[[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]])
In [145]: c.shape
Out[145]: (2, 4, 5)
In [146]: c.T.shape
Out[146]: (5, 4, 2)
Look at one 2d array from the size 2 dimension:
In [150]: c[0,:,:]
Out[150]:
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]])
In [151]: c.T[:,:,0]
Out[151]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
The 2nd is the usual 2d transpose, a (5,4) array.
MATLAB doesn't do transpose on 3d arrays, at least it doesn't call it such. It may have a way of making such a change. numpy, using a general shape/strides multidimensional implementation, easily generalizes the 2d transpose to 1d, 3d, or more.
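As a small sketch of what that generalization means for the array in the question: .T reverses the order of all axes, and it does so by reversing the strides rather than by moving data. The variable names below (t, pairs, and the boolean flags) are only illustrative.

```python
import numpy as np

c = np.indices([4, 5])          # shape (2, 4, 5)

# .T reverses the axis order; it is the same as transpose(2, 1, 0).
t = c.T
same_as_explicit = np.array_equal(t, c.transpose(2, 1, 0))

# No data is moved: the transpose is a view whose strides are reversed.
strides_reversed = t.strides == c.strides[::-1]

# Element-wise, t[k, j, i] is c[i, j, k].
elementwise_ok = all(
    t[k, j, i] == c[i, j, k]
    for i in range(2) for j in range(4) for k in range(5)
)

# Moving only the size-2 axis to the end keeps the (row, col) grid order,
# so the reshape lists the index pairs in row-major order.
pairs = c.transpose(1, 2, 0).reshape(20, 1, 2)

print(t.shape, same_as_explicit, strides_reversed, elementwise_ok)
```

Because reshape reads elements in the order given by the array's current axes, c.T.reshape(20,1,2) and c.reshape(20,1,2) walk the same data in different orders, which is why the comparison in the question is False.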
I have a 3D numpy array data where dimensions a and b represent the resolution of an image and c is the image/frame number. I want to call np.histogram on each pixel (a and b combination) across the c dimension, with an output array of dimension (a, b, BINS). I've accomplished this task with a nested loop, but how can I vectorize this operation?
hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]
I am confident that the solution is trivial, nonetheless all help is appreciated :)
np.histogram computes over the flattened array.
However, you could use np.apply_along_axis.
np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
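A quick way to convince yourself that this one-liner matches the nested loops is sketched below. The sizes and the random data are made up for illustration. Note that np.apply_along_axis still calls the function once per pixel in Python, so it is a convenience rather than a speed-up.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, BINS = 3, 4, 50, 5          # hypothetical sizes
data = rng.random((a, b, c))

# Reference: the nested loops from the question.
hists_loop = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists_loop[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# apply_along_axis calls the function on every 1-D slice along axis 2.
hists = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

print(hists.shape, np.array_equal(hists, hists_loop))
```

As in the loop version, each pixel gets its own bin edges (from that pixel's min and max) unless you pass a common range= to np.histogram.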
This is an interesting problem.
Make a Minimal Working Example (MWE)
This should be a basic habit when asking questions on SO.
a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]
data
>>> array([[[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0]],
[[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]]])
hists
>>> array([[[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1]],
[[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]]])
Make it as simple as possible (but still working)
You can eliminate one loop and simplify it:
new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)
for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]
new_hists
>>> array([[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1],
[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]])
new_data
>>> array([[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0],
[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]])
Can you find similar problems and use the key points of their solutions?
In general, you can't vectorise something that is done in a loop like this:
for row in array:
    some_operation(row)
The exception is when you can call another vectorised operation on the flattened array and then move the result back to the initial shape:
arr = array.ravel()
another_operation(arr)
out = arr.reshape(array.shape)
It looks like you're fortunate with np.histogram, because I'm pretty sure similar things have been done before.
Final solution
new_data = data.reshape(a*b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c+1), dtype=int)
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))
>>> array([[[2, 1, 0, 0, 1],
[1, 1, 0, 1, 1],
[2, 0, 1, 0, 1]],
[[2, 0, 1, 0, 1],
[2, 0, 0, 1, 1],
[1, 0, 0, 1, 2]]])
Note that this adds an extra bin to each histogram and puts the max values in it, but I hope that's not hard to fix if you need to.
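One possible way to fold that extra bin back in is to clip the bin index, so that each row's maximum falls into the last bin, which is how np.histogram treats its closed last bin. This is only a sketch under two assumptions that the answer does not state: the data are integers, and every row has max > min (otherwise the division fails). The names n_bins, flat, rows and hists are illustrative.

```python
import numpy as np

# The example data from the MWE above.
data = np.array([[[6, 4, 3, 3], [7, 3, 8, 0], [1, 5, 8, 0]],
                 [[5, 5, 7, 8], [3, 2, 7, 8], [6, 8, 8, 0]]])
a, b, c = data.shape
n_bins = c   # the MWE uses bins=c

flat = data.reshape(a * b, c)
m, M = flat.min(axis=1), flat.max(axis=1)

# Integer bin index per value; the row maximum lands on index n_bins.
bins = n_bins * (flat - m[:, None]) // (M - m)[:, None]
# Fold that index back into the last bin.
bins = np.minimum(bins, n_bins - 1)

out = np.zeros((a * b, n_bins), dtype=int)
rows = np.repeat(np.arange(a * b), c)
# add.at accumulates correctly even when a (row, bin) pair repeats.
np.add.at(out, (rows, bins.ravel()), 1)
hists = out.reshape(a, b, n_bins)
print(hists)
```

For this data the result agrees with the hists array produced by the loop in the MWE.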
I am trying to figure out a fully vectorised way to compute the co-variance matrix for a 2D numpy array for a given base kernel function. For example if the input is X = [[a,b],[c,d]] for a kernel function k(x_1,x_2) the covariance matrix will be
K=[[k(a,a),k(a,b),k(a,c),k(a,d)],
[k(b,a),k(b,b),k(b,c),k(b,d)],
[k(c,a),k(c,b),k(c,c),k(c,d)],
[k(d,a),k(d,b),k(d,c),k(d,d)]].
How do I go about doing this? I am confused as to how to repeat the values and then apply the function, and what the most efficient way of doing this might be.
You can use np.meshgrid to get two matrices with values for the first and second parameter to the k function.
In [8]: X = np.arange(4).reshape(2,2)
In [9]: np.meshgrid(X, X)
Out[9]:
[array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]]),
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])]
You can then just pass these matrices to the k function:
In [10]: k = lambda x1, x2: (x1-x2)**2
In [11]: X1, X2 = np.meshgrid(X, X)
In [12]: k(X1, X2)
Out[12]:
array([[0, 1, 4, 9],
[1, 0, 1, 4],
[4, 1, 0, 1],
[9, 4, 1, 0]])
Here's another way, using broadcasting:
k(X.reshape(-1, 1), X.reshape(1, -1))
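A short sketch comparing the two approaches, using the same squared-difference kernel purely as an example. One detail worth knowing: with the default indexing="xy", np.meshgrid(X, X) puts the second argument along the rows, so k(X1, X2)[i, j] is k(x_j, x_i). That makes no difference for a symmetric kernel, but for a non-symmetric one you would want indexing="ij" or the broadcasting form.

```python
import numpy as np

def k(x1, x2):
    # Example kernel only: squared difference, as in the answer above.
    return (x1 - x2) ** 2

X = np.arange(4).reshape(2, 2)
flat = X.ravel()

# A (4, 1) column against a (1, 4) row broadcasts to the full (4, 4) grid,
# so entry [i, j] is k(flat[i], flat[j]).
K_broadcast = k(flat[:, None], flat[None, :])

# indexing="ij" gives the same argument order as the broadcasting form.
X1, X2 = np.meshgrid(flat, flat, indexing="ij")
K_meshgrid = k(X1, X2)

print(np.array_equal(K_broadcast, K_meshgrid))
```

The broadcasting form avoids materializing the two full (4, 4) input grids, which can matter for large inputs.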
I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.
A = np.array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
If I fixed the required column number to 2, I want something like
np.array([[1,3],
[0,4],
[1,4],
[2,3]])
I am looking for a non-loop NumPy-based solution. I tried with choice, but with replace=False I get an error:
ValueError: Cannot take a larger sample than population when 'replace=False'
Here's one vectorized approach inspired by this post -
def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    return np.random.rand(m, n).argsort(1)[:, :N]
Sample run -
In [146]: A
Out[146]:
array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]:
array([[4, 0],
[0, 1],
[3, 2],
[2, 0]])
In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]:
array([[2, 0, 1],
[3, 4, 2],
[3, 2, 1],
[4, 3, 0]])
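The same idea can be written with NumPy's newer Generator API, which makes the result reproducible from a seed. This is a sketch, not the answer's code: the rng parameter is an addition made here for illustration, and np.take_along_axis is just one way to use the resulting indices.

```python
import numpy as np

def random_unique_indexes_per_row(A, N=2, rng=None):
    # rng is a hypothetical extra parameter, added here for reproducibility.
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    # Sorting i.i.d. random keys gives an independent permutation of
    # 0..n-1 in every row; the first N entries are N distinct columns.
    return rng.random((m, n)).argsort(axis=1)[:, :N]

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])

idx = random_unique_indexes_per_row(A, N=3, rng=np.random.default_rng(0))

# Use the indices to pull the selected values out of each row.
picked = np.take_along_axis(A, idx, axis=1)
print(idx)
print(picked)
```

Each row of idx is the start of a full permutation of the column indices, so no index can repeat within a row.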
Like this?
B = np.random.randint(5, size=(len(A), 2))
You can use random.choice() as follows (note that this samples with replacement, so an index can repeat within a row):
def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
Demo:
In [34]: x, y = A.shape
In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]:
array([[0, 2],
[0, 1],
[0, 1],
[3, 1]])
As an experimental approach, here is a way that will give unique indices 99% of the time:
In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...:
In [61]: random_ind(A, 2)
Out[61]:
array([[0, 1],
[1, 0],
[1, 1],
[1, 4]])
In [62]: random_ind(A, 2)
Out[62]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 1]])
In [64]: random_ind(A, 3)
Out[64]:
array([[0, 0, 0],
[1, 1, 2],
[0, 4, 1],
[2, 3, 1]])
In [65]: random_ind(A, 4)
Out[65]:
array([[0, 4, 0, 3],
[1, 0, 1, 4],
[0, 4, 1, 2],
[3, 0, 1, 0]])
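If fewer than 4 unique rows are found, ind[index][:4] returns fewer than 4 rows (the slice itself does not raise an error); in that case you can call the function again until you get the desired result. Note that this makes the rows distinct from one another; as the outputs above show, an index can still repeat within a row.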
Dask (http://dask.pydata.org/en/latest/array-api.html) is a flexible parallel computing library for analytics. In contrast to NumPy, it scales to big data, and it has many similar methods. How can I achieve the same effect as numpy.tile on a dask array?
Using dask.array.concatenate() could be a possible workaround.
Demo in NumPy:
In [374]: x = numpy.arange(4).reshape((2, 2))
In [375]: x
Out[375]:
array([[0, 1],
[2, 3]])
In [376]: n = 3
In [377]: numpy.tile(x, n)
Out[377]:
array([[0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3]])
In [378]: numpy.concatenate([x for i in range(n)], axis=1)
Out[378]:
array([[0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3]])