I have a 3D numpy array data where dimensions a and b represent the resolution of an image and c is the image/frame number. I want to call np.histogram on each pixel (a and b combination) across the c dimension, with an output array of dimension (a, b, BINS). I've accomplished this task with a nested loop, but how can I vectorize this operation?
hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]
I am confident that the solution is trivial, nonetheless all help is appreciated :)
np.histogram computes over the flattened array.
However, you could use np.apply_along_axis.
np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
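As a quick sanity check, here is a minimal sketch comparing np.apply_along_axis with the nested loop from the question. The sizes (2, 3, 50), BINS = 5, and the random seed are assumptions chosen only for illustration. Keep in mind that apply_along_axis still calls np.histogram once per pixel from Python, so it tidies the code more than it speeds it up.

```python
import numpy as np

# Assumed sizes and seed, chosen only for illustration.
a, b, c, BINS = 2, 3, 50, 5
rng = np.random.default_rng(0)
data = rng.random((a, b, c))

# Reference result: the nested loop from the question.
loop_hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        loop_hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# One histogram per pixel, computed along the last axis (axis 2).
hists = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

print(hists.shape)  # (2, 3, 5)
print(np.array_equal(hists, loop_hists))  # True
```

Each pixel still gets its own bin edges (from its own min and max), exactly as in the loop version.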
This is an interesting problem.
Make a Minimal Working Example (MWE)
This should be a basic habit when asking questions on SO.
a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]
>>> array([[[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0]],
[[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]]])
>>> array([[[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1]],
[[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]]])
Make it as simple as possible (but still working)
You can eliminate one loop and simplify it:
new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)
for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]
>>> array([[2, 1, 0, 1],
[1, 1, 0, 2],
[2, 0, 1, 1],
[2, 0, 1, 1],
[2, 0, 0, 2],
[1, 0, 0, 3]])
>>> array([[6, 4, 3, 3],
[7, 3, 8, 0],
[1, 5, 8, 0],
[5, 5, 7, 8],
[3, 2, 7, 8],
[6, 8, 8, 0]])
Can you find similar problems and use the key points of their solutions?
In general, you can't vectorise something that is being done in a loop like this:
for row in array:
The exception is when you can call another vectorised operation on the flattened array and then reshape the result back to the initial shape:
arr = array.ravel()
out = arr.reshape(array.shape)
It looks like you're fortunate with np.histogram, because I'm pretty sure similar things have been done before.
Final solution
new_data = data.reshape(a*b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c+1), dtype=int)
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))
>>> array([[[2, 1, 0, 0, 1],
[1, 1, 0, 1, 1],
[2, 0, 1, 0, 1]],
[[2, 0, 1, 0, 1],
[2, 0, 0, 1, 1],
[1, 0, 0, 1, 2]]])
Note that this adds an extra bin to each histogram and puts the maximum values in it, but I hope that's not hard to fix if you need to.
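One possible way to fold the extra bin back into the last one is to clip the bin index, sketched below. This is an illustration under assumptions, not a drop-in replacement for np.histogram: it assumes integer data (so the floor division is exact) and that every pixel has at least two distinct values. The names flat, span and n_bins are introduced here for the example; the data is the sample array from the MWE.

```python
import numpy as np

# Example data copied from the MWE above (a=2, b=3, c=4).
data = np.array([[[6, 4, 3, 3], [7, 3, 8, 0], [1, 5, 8, 0]],
                 [[5, 5, 7, 8], [3, 2, 7, 8], [6, 8, 8, 0]]])
a, b, c = data.shape
n_bins = c  # the MWE uses as many bins as frames

flat = data.reshape(a * b, c)
m = flat.min(axis=1, keepdims=True)
span = flat.max(axis=1, keepdims=True) - m
# Assumption: span > 0 for every pixel. The guard below only avoids a
# division by zero; constant pixels then land in bin 0, which is NOT
# what np.histogram does for constant input.
span = np.where(span == 0, 1, span)

# Integer bin index in 0..n_bins; the per-pixel maximum lands on n_bins,
# so clip it into the last real bin.
bins = np.minimum(n_bins * (flat - m) // span, n_bins - 1)

out = np.zeros((a * b, n_bins), dtype=int)
rows = np.repeat(np.arange(a * b), c)
np.add.at(out, (rows, bins.ravel()), 1)  # unbuffered add handles repeated indices
out = out.reshape(a, b, n_bins)
print(out)
```

For this sample the result matches the loop-based histograms shown in the MWE.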
Given array:
A = array([[[1, 2, 3, 1],
[4, 5, 6, 2],
[7, 8, 9, 3]]])
I obtain the following array in the forward pass by downsampling with a step of k (skipping k-1 elements between samples):
k = 3
B = A[...,::k]
array([[[1, 1],
[4, 2],
[7, 3]]])
In the backward pass I want to be able to come back to my original shape, with an output of:
array([[[1, 0, 0, 1],
[4, 0, 0, 2],
[7, 0, 0, 3]]])
You can initialize the output with numpy.zeros and fill it with strided indexing:
shape = list(B.shape)
shape[-1] = k*(shape[-1]-1)+1
# [1, 3, 4]
A2 = np.zeros(shape, dtype=B.dtype)
A2[..., ::k] = B
array([[[1, 0, 0, 1],
[4, 0, 0, 2],
[7, 0, 0, 3]]])
Alternatively, if A is still available:
A2 = np.zeros_like(A)
A2[..., ::k] = B
# or directly
# A2[..., ::k] = A[..., ::k]
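The two variants can be wrapped in one helper, sketched below. The function name upsample_zeros and its length parameter are hypothetical, introduced only for this example. The optional length matters because several original sizes give the same B: with k = 3, a last axis of length 4, 5 or 6 all downsample to two entries, so B alone only determines the smallest possible size.

```python
import numpy as np

def upsample_zeros(B, k, length=None):
    """Place B's last-axis entries every k positions in a zero array.

    `length` (hypothetical parameter) is the last-axis size to restore.
    If omitted, the smallest size consistent with B and k is used.
    """
    n = B.shape[-1]
    min_length = k * (n - 1) + 1
    if length is None:
        length = min_length
    if not min_length <= length <= k * n:
        raise ValueError("length is inconsistent with B and k")
    out = np.zeros(B.shape[:-1] + (length,), dtype=B.dtype)
    out[..., ::k] = B
    return out

A = np.array([[[1, 2, 3, 1],
               [4, 5, 6, 2],
               [7, 8, 9, 3]]])
k = 3
B = A[..., ::k]          # forward pass
A2 = upsample_zeros(B, k)  # backward pass, shape (1, 3, 4)
print(A2)
```

Passing length=A.shape[-1] restores the original size when it is larger than the minimum.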
Suppose I have a Tensor like
a = torch.tensor([[3, 1, 5, 0, 4, 2],
[2, 1, 3, 4, 5, 0],
[0, 4, 5, 1, 2, 3],
[3, 1, 4, 5, 0, 2],
[3, 5, 4, 2, 0, 1],
[5, 3, 0, 4, 1, 2]])
and I want to reorganize the rows of the tensor by applying the transformation a[c] where
c = torch.tensor([0,2,4,1,3,5])
to get
b = torch.tensor([[3, 1, 5, 0, 4, 2],
[0, 4, 5, 1, 2, 3],
[3, 5, 4, 2, 0, 1],
[2, 1, 3, 4, 5, 0],
[3, 1, 4, 5, 0, 2],
[5, 3, 0, 4, 1, 2]])
To do this, I want to generate the tensor c so that the transformation works regardless of the size of tensor a and the step size (which I have taken to be 2 in this example for simplicity). Can anyone tell me how to generate such a tensor for the general case without using an explicit for loop in PyTorch?
You can use torch.index_select, so:
b = torch.index_select(a, 0, c)
The explanation in the official docs is pretty clear.
I also came up with another solution, which reorganizes the rows of tensor a into tensor b without generating the index tensor c:
step = 2
b = a.view(-1,step,a.size(-1)).transpose(0,1).reshape(-1,a.size(-1))
Thinking for a little longer, I came up with the below solution for generation of the indices
step = 2
idx = torch.arange(0,a.size(0),step)
# idx = tensor([0, 2, 4])
idx = idx.repeat(int(a.size(0)/idx.size(0)))
# idx = tensor([0, 2, 4, 0, 2, 4])
incr = torch.arange(0,step)
# incr = tensor([0, 1])
incr = incr.repeat_interleave(int(a.size(0)/incr.size(0)))
# incr = tensor([0, 0, 0, 1, 1, 1])
c = incr + idx
# c = tensor([0, 2, 4, 1, 3, 5])
After this, the tensor c can be used to get the tensor b by using
b = a[c.long()]
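The same indices can probably be produced more compactly by reshaping and transposing an arange, as in the sketch below. The helper name strided_row_order is hypothetical, and the sketch assumes the number of rows is divisible by step.

```python
import torch

def strided_row_order(n, step):
    """Return [0, step, 2*step, ..., 1, step+1, ...] for n rows.

    Hypothetical helper; assumes n is divisible by step.
    """
    # Rows of the (n // step, step) grid are consecutive indices;
    # transposing and flattening reads them off column by column.
    return torch.arange(n).view(-1, step).t().reshape(-1)

a = torch.tensor([[3, 1, 5, 0, 4, 2],
                  [2, 1, 3, 4, 5, 0],
                  [0, 4, 5, 1, 2, 3],
                  [3, 1, 4, 5, 0, 2],
                  [3, 5, 4, 2, 0, 1],
                  [5, 3, 0, 4, 1, 2]])
c = strided_row_order(a.size(0), 2)
b = a[c]
print(c)  # tensor([0, 2, 4, 1, 3, 5])
```

torch.arange already returns an integer (long) tensor, so no cast is needed before indexing.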
Is there an elegant/quick way to reproduce this without the for loops? I have a 3D matrix of values and a 2D matrix that gives the indices for which to copy the 3rd dimension's values while creating a new 3D matrix of the same shape. Here is an implementation with a lot of loops.
x = np.random.randint(5, size=(2, 3, 4))
y = np.random.randint(x.shape[1], size=(3, 4))
z = np.zeros((2, 3, 4))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]
This puzzled me for a bit, until I realized you aren't using all of y. y is (3,4), but you are indexing over (2,3):
In [28]: x[np.arange(2)[:,None], y[:2,:3],:]
array([[[4, 0, 0, 4],
[4, 0, 3, 3],
[3, 1, 3, 2]],
[[3, 0, 3, 0],
[2, 1, 0, 1],
[1, 0, 1, 4]]])
We could use all of y with:
In [32]: x[np.arange(2)[:,None,None],y,np.arange(4)]
array([[[4, 0, 3, 2],
[4, 0, 3, 2],
[3, 0, 0, 3]],
[[3, 1, 1, 4],
[3, 1, 1, 4],
[1, 1, 3, 1]]])
The three index arrays broadcast to (2,3,4), but the selection is different from your z.
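To confirm that the first indexing expression reproduces the loop, here is a small self-contained check. The random seed is an assumption chosen for reproducibility, and the names rows and cols are introduced only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
x = rng.integers(5, size=(2, 3, 4))
y = rng.integers(x.shape[1], size=(3, 4))

# Loop version from the question.
z = np.zeros((2, 3, 4), dtype=x.dtype)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]

# Vectorized version: only the (2, 3) corner of y is ever used.
rows = np.arange(x.shape[0])[:, None]   # shape (2, 1)
cols = y[:x.shape[0], :x.shape[1]]      # shape (2, 3)
z_vec = x[rows, cols, :]                # shape (2, 3, 4)

print(np.array_equal(z, z_vec))  # True
```

The two index arrays broadcast to (2, 3), and the trailing slice keeps the whole last axis.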
In numpy, I would like to be able to input n for rows and m for columns and end up with an array that looks like:
So that would be a 3x4. Each column is just a copy of the previous one, and the row value increases by one each time. As another example:
the input would be 4, then 6, and the output would be an array with 4 rows and 6 columns, where the row value increases by one each time. Thanks for your time.
So many possibilities...
In [51]: n = 4
In [52]: m = 6
In [53]: np.tile(np.arange(n), (m, 1)).T
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [54]: np.repeat(np.arange(n).reshape(-1,1), m, axis=1)
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [55]: np.outer(np.arange(n), np.ones(m, dtype=int))
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Here's one more. The neat trick here is that the values are not duplicated--only memory for the single sequence [0, 1, 2, ..., n-1] is allocated.
In [67]: from numpy.lib.stride_tricks import as_strided
In [68]: seq = np.arange(n)
In [69]: rep = as_strided(seq, shape=(n,m), strides=(seq.strides[0],0))
In [70]: rep
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Be careful with the as_strided function. If you don't get the arguments right, you can crash Python.
To see that seq has not been copied, change seq in place, and then check rep:
In [71]: seq[1] = 99
In [72]: rep
array([[ 0, 0, 0, 0, 0, 0],
[99, 99, 99, 99, 99, 99],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3]])
import numpy as np
def foo(n, m):
    return np.array([np.arange(n)] * m).T
Natively (no Python lists):
rows, columns = 4, 6
numpy.arange(rows).reshape(-1, 1).repeat(columns, axis=1)
#>>> array([[0, 0, 0, 0, 0, 0],
#>>> [1, 1, 1, 1, 1, 1],
#>>> [2, 2, 2, 2, 2, 2],
#>>> [3, 3, 3, 3, 3, 3]])
You can easily do this using built-in Python functions. The program counts from 0 to 3, converts each number to a string, and repeats the string 6 times.
print([6*str(n) for n in range(0,4)])
Here is the output.
ks-MacBook-Pro:~ kyle$ pbpaste | python
['000000', '111111', '222222', '333333']
One more for fun:
np.zeros((n, m), dtype=int) + np.arange(n, dtype=int)[:,None]
As has been mentioned, there are many ways to do this.
Here's what I'd do:
import numpy as np
def makearray(m, n):
    A = np.empty((m,n))
    A.T[:] = np.arange(m)
    return A
Here's an amusing alternative that will work if you aren't going to be changing the contents of the array.
It should save some memory.
Be careful, though: because this doesn't allocate a full array, multiple entries will point to the same memory address.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def makearray(m, n):
    A = np.arange(m)
    return as_strided(A, strides=(A.strides[0],0), shape=(m,n))
In either case, as I have written them, a 3x4 array can be created with makearray(3, 4).
Using count from the built-in module itertools:
>>> from itertools import count
>>> rows = 4
>>> columns = 6
>>> cnt = count()
>>> [[next(cnt)]*columns for i in range(rows)]
[[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]
You can simply use a list comprehension:
>>> nc=5
>>> nr=4
>>> [[k]*nc for k in range(nr)]
[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]
Several other possibilities use an (n,1) array:
a = np.arange(n)[:,None] (or np.arange(n).reshape(-1,1))
If it is combined with an (m,) array, just leave it as (n,1) and let broadcasting expand it for you.
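A short sketch of both ideas follows. np.broadcast_to gives the repeated (n, m) view without copying and, unlike as_strided, returns a read-only array. The names col, rep and total are introduced here purely for illustration.

```python
import numpy as np

n, m = 4, 6
col = np.arange(n)[:, None]          # shape (n, 1)

# Explicit expansion: a read-only view, no data is duplicated.
rep = np.broadcast_to(col, (n, m))
print(rep)

# Implicit expansion: combine the column with an (m,) array directly.
total = col + np.arange(m)           # shape (n, m), total[i, j] = i + j
print(total.shape)  # (4, 6)
```

If a writable array is needed, calling rep.copy() materializes the full (n, m) block.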
Suppose I have the following numpy arrays:
array([[0, 0, 2],
[2, 0, 1],
[2, 2, 1]])
array([[2, 2, 0],
[2, 0, 2],
[1, 1, 2]])
that I then vertically stack
resulting in:
array([[[0, 2],
[0, 2],
[2, 0]],
[[2, 2],
[0, 0],
[1, 2]],
[[2, 1],
[2, 1],
[1, 2]]])
From this I wish to check, for each pair along the third dimension of c, which combination in classes it matches, and then number it accordingly with the index of the matching list entry. I've tried the following, but it is not working. The algorithm is simple enough with double for-loops, but because c is very large, it is prohibitively slow.
out=np.select( [h==c for h in classes], range(len(classes)), default=-1)
My desired output would be
out = [[-1,-1,-1],
[3, 1,-1],
[2, 2,-1]]
How about this:
(np.array([np.array(h)[...,:] == c for h in classes]).all(axis = -1) *
(2 + np.arange(len(classes)))[:, None, None]).max(axis=0) - 1
It returns what you actually need:
array([[-1, -1, -1],
[ 3, 1, -1],
[ 2, 2, -1]])
You can test the a and b arrays separately like this:
clsa = (0,2,2)
clsb = (0,1,2)
np.select([(ca==a) & (cb==b) for ca,cb in zip(clsa, clsb)], range(3), default=-1)
which gets your desired result (except returns 0,1,2 instead of 1,2,3).
Here is another way to get what you want, thought I would post it in case it's useful to anyone.
import numpy as np
a = np.array([[0, 0, 2],
[2, 0, 1],
[2, 2, 1]])
b = np.array([[2, 2, 0],
[2, 0, 2],
[1, 1, 2]])
c = np.empty(a.shape, dtype=[('a', a.dtype), ('b', b.dtype)])
c['a'] = a
c['b'] = b
classes = np.array(classes, dtype=c.dtype)
out = classes.searchsorted(c)
out = np.where(c == classes[out], out+1, -1)
print(out)
#array([[-1, -1, -1],
#       [ 3,  1, -1],
#       [ 2,  2, -1]])
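For completeness, here is a self-contained broadcasting sketch of the same lookup. Two things in it are assumptions: the classes list is inferred from the clsa and clsb tuples in an earlier answer, since the question's own definition is not shown, and np.dstack is used because it reproduces the stacked array c printed in the question. The intermediate array has shape (rows, cols, number of classes), so this suits a modest number of classes.

```python
import numpy as np

a = np.array([[0, 0, 2],
              [2, 0, 1],
              [2, 2, 1]])
b = np.array([[2, 2, 0],
              [2, 0, 2],
              [1, 1, 2]])
c = np.dstack((a, b))  # shape (3, 3, 2); c[i, j] == [a[i, j], b[i, j]]

# Assumed class list, inferred from clsa/clsb in an earlier answer.
classes = np.array([(0, 0), (2, 1), (2, 2)])

# Compare every pixel pair with every class: result shape (3, 3, n_classes).
matches = (c[:, :, None, :] == classes[None, None, :, :]).all(axis=-1)

# 1-based index of the matching class, or -1 where nothing matches.
out = np.where(matches.any(axis=-1), matches.argmax(axis=-1) + 1, -1)
print(out)
```

With these inputs the result equals the desired output from the question.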