I have a 3D numpy array data where dimensions a and b represent the resolution of an image and c is the image/frame number. I want to call np.histogram on each pixel (a and b combination) across the c dimension, with an output array of dimension (a, b, BINS). I've accomplished this task with a nested loop, but how can I vectorize this operation?
hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]
I am confident that the solution is trivial, nonetheless all help is appreciated :)
np.histogram computes over the flattened array.
However, you could use np.apply_along_axis.
np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
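For completeness, here is a small self-contained sketch (with made-up shapes and a made-up BINS value) showing that this reproduces the nested loop from the question. Note that, just like the loop, each pixel gets its own bin edges because no explicit range is passed, and apply_along_axis still loops in Python under the hood, so it is a convenience rather than true vectorization:
import numpy as np

a, b, c, BINS = 4, 5, 100, 10
data = np.random.rand(a, b, c)

# nested loop from the question
hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# same per-pixel result via apply_along_axis
hists2 = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

assert np.array_equal(hists, hists2)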
This is an interesting problem.
Make a Minimal Working Example (MWE)
Making one should be a standard habit when asking questions on SO.
a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]
data
>>> array([[[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0]],
[[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]]])
hists
>>> array([[[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1]],
[[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]]])
Make it as simple as possible (but still working)
You can eliminate one loop and simplify it:
new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)
for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]
new_hists
>>> array([[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1],
[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]])
new_data
>>> array([[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0],
[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]])
Can you find similar problems and reuse the key points of their solutions?
In general, you can't vectorise something like what is being done in this loop:
for row in array:
some_operation(row)
Except for the cases where you can call another vectorised operation on the flattened array and then reshape the result back to the initial shape:
arr = array.ravel()
another_operation(arr)
out = arr.reshape(array.shape)
It looks like you're in luck with np.histogram, because I'm pretty sure similar things have been done before.
Final solution
new_data = data.reshape(a*b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c+1), dtype=int)
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))
>>> array([[[2, 1, 0, 0, 1],
[1, 1, 0, 1, 1],
[2, 0, 1, 0, 1]],
[[2, 0, 1, 0, 1],
[2, 0, 0, 1, 1],
[1, 0, 0, 1, 2]]])
Note that it adds an extra bin to each histogram and puts the maximum values in it, but I hope that's not hard to fix if you need to.
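If you do need output with exactly c bins, one possible fix (a sketch, reusing new_data, m, M, a, b and c from the snippet above and assuming, as before, that every row has M > m) is to clip the computed bin index so each row's maximum lands in the last bin, which is also how np.histogram treats the right edge:
# clip the bin index so the row maximum goes into bin c-1 instead of an extra bin
bins = np.minimum(c * (new_data - m[:, None]) // (M - m)[:, None], c - 1)
out = np.zeros((a*b, c), dtype=int)
np.add.at(out, (np.repeat(np.arange(a*b), c), bins.ravel()), 1)
out.reshape((a, b, -1))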
Say I have a tensor and index:
x = torch.tensor([1,2,3,4,5])
idx = torch.tensor([0,2,4])
If I want to select all elements not in the index, I can manually define a Boolean mask like so:
mask = torch.ones_like(x, dtype=torch.bool)
mask[idx] = False
x[mask]
Is there a more elegant way of doing this?
I.e., a syntax where I can directly pass the indices, as opposed to creating a mask, e.g. something like:
x[~idx]
I couldn't find a satisfactory solution for finding the complement of a multi-dimensional tensor of indices, so I finally implemented my own. It works on CUDA and benefits from fast parallel computation.
def complement_idx(idx, dim):
    """
    Compute the complement: set(range(dim)) - set(idx).
    idx is a multi-dimensional tensor; the complement is taken over its trailing dimension,
    and all other dimensions are treated as batch dimensions.
    Args:
        idx: input index, shape: [N, *, K]
        dim: the max index for the complement
    """
    a = torch.arange(dim, device=idx.device)
    ndim = idx.ndim
    dims = idx.shape
    n_idx = dims[-1]
    dims = dims[:-1] + (-1, )
    for i in range(1, ndim):
        a = a.unsqueeze(0)
    a = a.expand(*dims)
    masked = torch.scatter(a, -1, idx, 0)
    compl, _ = torch.sort(masked, dim=-1, descending=False)
    compl = compl.permute(-1, *tuple(range(ndim - 1)))
    compl = compl[n_idx:].permute(*(tuple(range(1, ndim)) + (0,)))
    return compl
Example:
>>> import torch
>>> a = torch.rand(3, 4, 5)
>>> a
tensor([[[0.7849, 0.7404, 0.4112, 0.9873, 0.2937],
[0.2113, 0.9923, 0.6895, 0.1360, 0.2952],
[0.9644, 0.9577, 0.2021, 0.6050, 0.7143],
[0.0239, 0.7297, 0.3731, 0.8403, 0.5984]],
[[0.9089, 0.0945, 0.9573, 0.9475, 0.6485],
[0.7132, 0.4858, 0.0155, 0.3899, 0.8407],
[0.2327, 0.8023, 0.6278, 0.0653, 0.2215],
[0.9597, 0.5524, 0.2327, 0.1864, 0.1028]],
[[0.2334, 0.9821, 0.4420, 0.1389, 0.2663],
[0.6905, 0.2956, 0.8669, 0.6926, 0.9757],
[0.8897, 0.4707, 0.5909, 0.6522, 0.9137],
[0.6240, 0.1081, 0.6404, 0.1050, 0.6413]]])
>>> b, c = torch.topk(a, 2, dim=-1)
>>> b
tensor([[[0.9873, 0.7849],
[0.9923, 0.6895],
[0.9644, 0.9577],
[0.8403, 0.7297]],
[[0.9573, 0.9475],
[0.8407, 0.7132],
[0.8023, 0.6278],
[0.9597, 0.5524]],
[[0.9821, 0.4420],
[0.9757, 0.8669],
[0.9137, 0.8897],
[0.6413, 0.6404]]])
>>> c
tensor([[[3, 0],
[1, 2],
[0, 1],
[3, 1]],
[[2, 3],
[4, 0],
[1, 2],
[0, 1]],
[[1, 2],
[4, 2],
[4, 0],
[4, 2]]])
>>> compl = complement_idx(c, 5)
>>> compl
tensor([[[1, 2, 4],
[0, 3, 4],
[2, 3, 4],
[0, 2, 4]],
[[0, 1, 4],
[1, 2, 3],
[0, 3, 4],
[2, 3, 4]],
[[0, 3, 4],
[0, 1, 3],
[1, 2, 3],
[0, 1, 3]]])
>>> al = torch.cat([c, compl], dim=-1)
>>> al
tensor([[[3, 0, 1, 2, 4],
[1, 2, 0, 3, 4],
[0, 1, 2, 3, 4],
[3, 1, 0, 2, 4]],
[[2, 3, 0, 1, 4],
[4, 0, 1, 2, 3],
[1, 2, 0, 3, 4],
[0, 1, 2, 3, 4]],
[[1, 2, 0, 3, 4],
[4, 2, 0, 1, 3],
[4, 0, 1, 2, 3],
[4, 2, 0, 1, 3]]])
>>> al, _ = al.sort(dim=-1)
>>> al
tensor([[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]])
You may want to try the single-line expression:
x[np.setdiff1d(range(len(x)), idx)]
Though it doesn't seem very elegant either :).
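If you are on a reasonably recent PyTorch (torch.isin was added around version 1.10, if I recall correctly), you can also stay entirely in torch; a minimal sketch for the 1-D case:
import torch

x = torch.tensor([1, 2, 3, 4, 5])
idx = torch.tensor([0, 2, 4])

# keep the positions whose index is NOT in idx
keep = ~torch.isin(torch.arange(x.numel(), device=x.device), idx)
print(x[keep])  # tensor([2, 4])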
I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.
A = np.array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
If I fix the required number of columns to 2, I want something like
np.array([[1,3],
[0,4],
[1,4],
[2,3]])
I am looking for a non-loop, NumPy-based solution. I tried np.random.choice, but with replace=False I get this error:
ValueError: Cannot take a larger sample than population when
'replace=False'
Here's one vectorized approach inspired by this post -
def random_unique_indexes_per_row(A, N=2):
    m,n = A.shape
    return np.random.rand(m,n).argsort(1)[:,:N]
Sample run -
In [146]: A
Out[146]:
array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]:
array([[4, 0],
[0, 1],
[3, 2],
[2, 0]])
In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]:
array([[2, 0, 1],
[3, 4, 2],
[3, 2, 1],
[4, 3, 0]])
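On newer NumPy (1.20+, I believe), the same trick can also be written with the Generator API, which shuffles each row of column indices independently; a sketch (the function name is mine):
import numpy as np

def random_unique_indexes_per_row_rng(A, N=2, seed=None):
    m, n = A.shape
    rng = np.random.default_rng(seed)
    # shuffle each row of [0, 1, ..., n-1] independently, then keep the first N
    return rng.permuted(np.tile(np.arange(n), (m, 1)), axis=1)[:, :N]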
Like this?
B = np.random.randint(5, size=(len(A), 2))
You can use np.random.choice() as follows:
def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
Demo:
In [34]: x, y = A.shape
In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]:
array([[0, 2],
[0, 1],
[0, 1],
[3, 1]])
As an experimental approach, here is a way that 99% of the time will give unique indices:
In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...:
In [61]: random_ind(A, 2)
Out[61]:
array([[0, 1],
[1, 0],
[1, 1],
[1, 4]])
In [62]: random_ind(A, 2)
Out[62]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 1]])
In [64]: random_ind(A, 3)
Out[64]:
array([[0, 0, 0],
[1, 1, 2],
[0, 4, 1],
[2, 3, 1]])
In [65]: random_ind(A, 4)
Out[65]:
array([[0, 4, 0, 3],
[1, 0, 1, 4],
[0, 4, 1, 2],
[3, 0, 1, 0]])
This function will raise an IndexError at the line return ind[index][:4] if there are not 4 unique rows; in that case you can re-run the function to make sure you get the desired result.
I have a 2D array filled with some values (column 0) and zeros (the rest of the columns). I would like to do pretty much the same as I do in MS Excel, but using numpy: fill the rest of the columns with values calculated from the first column. Here is an MWE:
import numpy as np
a = np.zeros(20, dtype=np.int8).reshape(4,5)
b = [1, 2, 3, 4]
b = np.array(b)
a[:, 0] = b
# don't change the first column
for column in a[:, 1:]:
    a[:, column] = column[0]+1
The expected output:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]], dtype=int8)
The resulting output:
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]], dtype=int8)
Any help would be appreciated.
Looping is slow and there is no need to loop to produce the array that you want:
>>> a = np.ones(20, dtype=np.int8).reshape(4,5)
>>> a[:, 0] = b
>>> a
array([[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1]], dtype=int8)
>>> np.cumsum(a, axis=1)
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
What went wrong
Let's start, as in the question, with this array:
>>> a
array([[1, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[3, 0, 0, 0, 0],
[4, 0, 0, 0, 0]], dtype=int8)
Now, using the code from the question, let's do the loop and see what column actually is:
>>> for column in a[:, 1:]:
... print(column)
...
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
As you can see, column is not the index of a column but an array of actual values (each iteration in fact yields a row of the slice a[:, 1:]). Consequently, the following does not do what you would hope:
a[:, column] = column[0]+1
Another method
If we want to loop (so that we can do something more complex), here is another approach to generating the desired array:
>>> b = np.array([1, 2, 3, 4])
>>> np.column_stack([b+i for i in range(5)])
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
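The same array can also be produced with plain broadcasting, without any Python-level loop (a small aside, not strictly needed for the cumsum approach above):
>>> b[:, None] + np.arange(5)
array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])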
Your usage of column is a little ambiguous: in for column in a[:, 1:] it iterates over arrays of values (the rows of the slice), yet in the body it is treated as an index into the columns. You can try this instead:
for column in range(1, a.shape[1]):
    a[:, column] = a[:, column-1]+1
a
#array([[1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]], dtype=int8)
Suppose I have the following numpy arrays:
>>a
array([[0, 0, 2],
[2, 0, 1],
[2, 2, 1]])
>>b
array([[2, 2, 0],
[2, 0, 2],
[1, 1, 2]])
that I then stack along a third (depth) dimension
c=np.dstack((a,b))
resulting in:
>>c
array([[[0, 2],
[0, 2],
[2, 0]],
[[2, 2],
[0, 0],
[1, 2]],
[[2, 1],
[2, 1],
[1, 2]]])
From this I wish to check, for each entry along the 3rd dimension of c, which combination is present in that subarray, and then number it accordingly with the index of the list match. I've tried the following, but it is not working. The algorithm is simple enough with double for-loops, but because c is very large, it is prohibitively slow.
classes=[(0,0),(2,1),(2,2)]
out=np.select( [h==c for h in classes], range(len(classes)), default=-1)
My desired output would be
out = [[-1,-1,-1],
[3, 1,-1],
[2, 2,-1]]
How about this:
(np.array([np.array(h)[...,:] == c for h in classes]).all(axis = -1) *
(2 + np.arange(len(classes)))[:, None, None]).max(axis=0) - 1
It returns what you actually need:
array([[-1, -1, -1],
[ 3, 1, -1],
[ 2, 2, -1]])
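If you prefer something easier to read, here is an equivalent sketch using any/argmax/where (my own rewrite of the same idea, reusing c and classes from the question):
# boolean match per class, shape (n_classes, 3, 3)
matches = np.array([(c == np.array(h)).all(axis=-1) for h in classes])
# 1-based index of the matching class, or -1 where nothing matches
out = np.where(matches.any(axis=0), matches.argmax(axis=0) + 1, -1)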
You can test the a and b arrays separately like this:
clsa = (0,2,2)
clsb = (0,1,2)
np.select([(ca == a) & (cb == b) for ca, cb in zip(clsa, clsb)], range(3), default=-1)
which gets your desired result (except that it returns 0, 1, 2 instead of 1, 2, 3).
Here is another way to get what you want, thought I would post it in case it's useful to anyone.
import numpy as np
a = np.array([[0, 0, 2],
[2, 0, 1],
[2, 2, 1]])
b = np.array([[2, 2, 0],
[2, 0, 2],
[1, 1, 2]])
classes=[(0,0),(2,1),(2,2)]
c = np.empty(a.shape, dtype=[('a', a.dtype), ('b', b.dtype)])
c['a'] = a
c['b'] = b
classes = np.array(classes, dtype=c.dtype)
classes.sort()
out = classes.searchsorted(c)
out = np.where(c == classes[out], out+1, -1)
print(out)
#array([[-1, -1, -1],
#       [ 3,  1, -1],
#       [ 2,  2, -1]])