I want to define a 3d numpy array with shape = [3, 3, 5] and also with values as a range starting with 11 and step = 3 in column-wise manner. I mean:
B[:,:,0] = [[ 11, 20, 29],
[ 14, 23, 32],
[17, 26, 35]]
B[:,:,1] = [[ 38, ...],
[ 41, ...],
[ 44, ...]]
...
I am new to numpy and I suspect this can be done with np.arange or np.mgrid, but I don't know how.
How can this be done with minimal lines of code?
You can calculate the end of the range by multiplying the total number of elements (the product of the shape) by the step and adding the start. Then it's just a reshape and a transpose to move the axes into place:
import numpy as np
start = 11
step = 3
shape = [5, 3, 3] # reversed target shape; the transpose below turns it into (3, 3, 5)
end = np.prod(shape) * step + start
B = np.arange(start, end, step).reshape(shape).transpose(2, 1, 0)
B[:, :, 0]
# array([[11, 20, 29],
# [14, 23, 32],
# [17, 26, 35]])
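As a possible alternative, the same column-wise fill can probably be obtained without the transpose by reshaping in Fortran (column-major) order, where the first axis varies fastest. This is only a sketch; the variable names are taken from the answer above, and `n` is introduced here for illustration.

```python
import numpy as np

start, step = 11, 3
shape = (3, 3, 5)

# Total number of elements in the target array.
n = int(np.prod(shape))

# order="F" fills the array so that the first index changes fastest,
# i.e. down each column first, then across columns, then across slices.
B = (start + step * np.arange(n)).reshape(shape, order="F")

print(B[:, :, 0])
# [[11 20 29]
#  [14 23 32]
#  [17 26 35]]
print(B[:, 0, 1])  # [38 41 44]
```

Whether this reads better than reshape plus transpose is a matter of taste; both produce the same values.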
Having an array A with the shape (2,6, 60), is it possible to index it based on a binary array B of shape (6,)?
The 6 and 60 is quite arbitrary, they are simply the 2D data I wish to access.
The underlying thing I am trying to do is to calculate two variants of the 2D data (in this case, (6,60)) and then efficiently select the ones with the lowest total sum - that is where the binary (6,) array comes from.
Example: For B = [1,0,1,0,1,0] what I wish to receive is equal to stacking
A[1,0,:]
A[0,1,:]
A[1,2,:]
A[0,3,:]
A[1,4,:]
A[0,5,:]
but I would like to do it by direct indexing and not a for-loop.
I have tried A[B], A[:,B,:], A[B,:,:] and A[:,:,B], but none of them gives the desired (6,60) matrix.
import numpy as np
A = np.array([[4, 4, 4, 4, 4, 4], [1, 1, 1, 1, 1, 1]])
A = np.atleast_3d(A)
A = np.tile(A, (1,1,60))
B = np.array([1, 0, 1, 0, 1, 0])
A[B]
The expected result is a (6,60) array containing the elements from A as described above; what I receive instead has shape (2,6,60) or (6,6,60).
Thank you in advance,
Linus
You can generate a range of the indices you want to iterate over, in your case from 0 to 5:
count = A.shape[1]
indices = np.arange(count) # np.arange(6) for your particular case
>>> indices
array([0, 1, 2, 3, 4, 5])
And then you can use that to do your advanced indexing:
result_array = A[B[indices], indices, :]
If you always use the full range from 0 to length - 1 (i.e. 0 to 5 in your case) of the second axis of A in increasing order, you can simplify that to:
result_array = A[B, indices, :]
# or the ugly result_array = A[B, np.arange(A.shape[1]), :]
Or even this if it's always 6:
result_array = A[B, np.arange(6), :]
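To check that this indexing really matches the stacking described in the question, here is a small sketch that compares it against an explicit loop. The array construction follows the question; the names `result`, `expected` and `B_min` are introduced only for this illustration, and the last line assumes that "lowest total sum" means the sum over the last axis.

```python
import numpy as np

# Same construction as in the question: A[0] is all 4s, A[1] is all 1s.
A = np.array([[4, 4, 4, 4, 4, 4], [1, 1, 1, 1, 1, 1]])
A = np.tile(np.atleast_3d(A), (1, 1, 60))    # shape (2, 6, 60)
B = np.array([1, 0, 1, 0, 1, 0])

# Pair each row index j with its selector B[j].
result = A[B, np.arange(A.shape[1]), :]      # shape (6, 60)

# Reference result built with the explicit loop the question wants to avoid.
expected = np.stack([A[B[j], j, :] for j in range(A.shape[1])])
assert np.array_equal(result, expected)

print(result.shape)    # (6, 60)
print(result[:, 0])    # [1 4 1 4 1 4]

# Assumption: the selector comes from picking, per row, the variant
# whose sum along the last axis is smallest.
B_min = A.sum(axis=2).argmin(axis=0)
print(B_min)           # [1 1 1 1 1 1]
```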
An alternative solution uses np.take_along_axis (available from NumPy version 1.15):
import numpy as np
x = np.arange(2*6*6).reshape((2,6,6))
m = np.zeros(6, int)
m[0] = 1
#example: [1, 0, 0, 0, 0, 0]
np.take_along_axis(x, m[None, :, None], 0) #add dimensions to mask to match array dimensions
array([[[36, 37, 38, 39, 40, 41],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
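Note that the result above keeps a leading axis of length 1. A minimal sketch, reusing `x` and `m` from this answer, of how that axis might be dropped and how the result compares with plain advanced indexing (`picked` and `flat` are names introduced here):

```python
import numpy as np

x = np.arange(2 * 6 * 6).reshape((2, 6, 6))
m = np.zeros(6, dtype=int)
m[0] = 1                      # selector: [1, 0, 0, 0, 0, 0]

# The index array is broadcast against x along the non-selected axes.
picked = np.take_along_axis(x, m[None, :, None], axis=0)
print(picked.shape)           # (1, 6, 6)

# Drop the leading length-1 axis to get the 2D result.
flat = picked[0]
print(flat.shape)             # (6, 6)

# Same values as the advanced-indexing approach.
assert np.array_equal(flat, x[m, np.arange(6), :])
```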
Suppose I have a numpy array img, with img.shape == (468,832,3). What does img[::2, ::2] do? It reduces the shape to (234,416,3). Can you please explain the logic?
Let's read the NumPy indexing documentation together.
(Just read the bold part first)
The basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step (k ≠ 0). This selects the m elements (in the corresponding dimension) with index values i, i + k, ..., i + (m - 1) k where m = q + (r ≠ 0) and q and r are the quotient and remainder obtained by dividing j - i by k: j - i = q k + r, so that i + (m - 1) k < j.
...
Assume n is the number of elements in the dimension being sliced.
Then, if i is not given it defaults to 0 for k > 0 and n - 1 for k < 0. If j is not given it defaults to n for k > 0 and -n-1 for k < 0. If k is not given it defaults to 1. Note that :: is the same as : and means select all indices along this axis.
Now looking at your part.
[::2, ::2] is translated to [0:468:2, 0:832:2] because you did not specify the first two values (i and j in the documentation), so they take their defaults. You only specify k here; recall the i:j:k notation above. With a step size of 2, you select every other element along each of those two axes.
Because you did not specify anything for the 3rd dimension, all of its elements are selected.
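A short sketch of the translation described above, using an all-zero stand-in for the image (the names `small`, `explicit` and `odd` are made up for this example). It also shows what happens when an axis has odd length, and that basic slicing returns a view rather than a copy.

```python
import numpy as np

# Stand-in for the image from the question; the pixel values do not matter here.
img = np.zeros((468, 832, 3), dtype=np.uint8)

small = img[::2, ::2]                  # defaults filled in by NumPy
explicit = img[0:468:2, 0:832:2, :]    # the same slice written out in full

print(small.shape)      # (234, 416, 3)
print(explicit.shape)   # (234, 416, 3)

# Basic slicing does not copy the data; the result shares memory with img.
print(np.shares_memory(img, small))    # True

# With odd lengths the result is rounded up: indices 0, 2, 4 and 0, 2, 4, 6.
odd = np.zeros((5, 7, 3))[::2, ::2]
print(odd.shape)        # (3, 4, 3)
```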
It takes every alternate row, and then every alternate column, from an array of shape (n, m, ...), returning an array of shape (ceil(n / 2), ceil(m / 2), ...). For even n and m this is simply (n // 2, m // 2, ...).
Here's an example of slicing with a 2D array -
>>> a = np.arange(16).reshape(4, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> a[::2, ::2]
array([[ 0, 2],
[ 8, 10]])
And, here's another example with a 3D array -
>>> a = np.arange(27).reshape(3, 3, 3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
>>> a[::2, ::2] # same as a[::2, ::2, :]
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[18, 19, 20],
[24, 25, 26]]])
Well, we have the RGB image as a 3D array of shape:
img.shape=(468,832,3)
Now, what does img[::2, ::2] do?
We're just downsampling the image, i.e. halving its height and width by taking only every other pixel from the original image. We do this by using a step size of 2, which means every second pixel is skipped. This should be clear from the example below.
Let's take a simple grayscale image for easier understanding.
In [13]: arr
Out[13]:
array([[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55],
[60, 61, 62, 63, 64, 65]])
In [14]: arr.shape
Out[14]: (6, 6)
In [15]: arr[::2, ::2]
Out[15]:
array([[10, 12, 14],
[30, 32, 34],
[50, 52, 54]])
In [16]: arr[::2, ::2].shape
Out[16]: (3, 3)
Notice which pixels are in the sliced version. Also, observe how the array shape changes after slicing (i.e. each dimension is reduced by half).
Now, this downsampling happens for all three channels in the image since there's no slicing happening along the third axis. Thus, the shape is reduced only along the first two axes in your example.
(468, 832, 3)
. . |
. . |
(234, 416, 3)
In python/numpy, how can I subset a multidimensional array where another one, of the same shape, is maximum along some axis (e.g. the first one)?
Suppose I have two 3*2*4 arrays, a and b. I want to obtain a 2*4 array containing the values of b at the locations where a has its maximal values along the first axis.
import numpy as np
np.random.seed(7)
a = np.random.rand(3*2*4).reshape((3,2,4))
b = np.random.rand(3*2*4).reshape((3,2,4))
print(a)
#[[[ 0.07630829 0.77991879 0.43840923 0.72346518]
# [ 0.97798951 0.53849587 0.50112046 0.07205113]]
#
# [[ 0.26843898 0.4998825 0.67923 0.80373904]
# [ 0.38094113 0.06593635 0.2881456 0.90959353]]
#
# [[ 0.21338535 0.45212396 0.93120602 0.02489923]
# [ 0.60054892 0.9501295 0.23030288 0.54848992]]]
print(a.argmax(axis=0)) #(I would like b at these locations along axis0)
#[[1 0 2 1]
# [0 2 0 1]]
I can do this really ugly manual subsetting:
index = list(zip(a.argmax(axis=0).flatten(),
[0]*a.shape[2]+[1]*a.shape[2], # a.shape[2] = 4 here
list(range(a.shape[2]))+list(range(a.shape[2]))))
# [(1, 0, 0), (0, 0, 1), (2, 0, 2), (1, 0, 3),
# (0, 1, 0), (2, 1, 1), (0, 1, 2), (1, 1, 3)]
Which would allow me to obtain my desired result:
b_where_a_is_max_along0 = np.array([b[i] for i in index]).reshape(2,4)
# For verification:
print(a.max(axis=0) == np.array([a[i] for i in index]).reshape(2,4))
#[[ True True True True]
# [ True True True True]]
What is the smart, numpy way to achieve this? Thanks :)
Use advanced-indexing -
m,n = a.shape[1:]
b_out = b[a.argmax(0),np.arange(m)[:,None],np.arange(n)]
Sample run -
Setup input array a and get its argmax along first axis -
In [185]: a = np.random.randint(11,99,(3,2,4))
In [186]: idx = a.argmax(0)
In [187]: idx
Out[187]:
array([[0, 2, 1, 2],
[0, 1, 2, 0]])
In [188]: a
Out[188]:
array([[[49*, 58, 13, 69], # * are the max positions
[94*, 28, 55, 86*]],
[[34, 17, 57*, 50],
[48, 73*, 22, 80]],
[[19, 89*, 42, 71*],
[24, 12, 66*, 82]]])
Verify results with b -
In [193]: b
Out[193]:
array([[[18*, 72, 35, 51], # Mark * at the same positions in b
[74*, 57, 50, 84*]], # and verify
[[58, 92, 53*, 65],
[51, 95*, 43, 94]],
[[85, 23*, 13, 17*],
[17, 64, 35*, 91]]])
In [194]: b[a.argmax(0),np.arange(2)[:,None],np.arange(4)]
Out[194]:
array([[18, 23, 53, 17],
[74, 95, 35, 84]])
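For a self-checking version of the same idea, the sketch below compares the advanced-indexing result with np.take_along_axis and confirms that applying the index to a itself reproduces a.max(axis=0). The random data, the seed and the names `rng`, `idx` and `alt` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, for reproducibility only
a = rng.random((3, 2, 4))
b = rng.random((3, 2, 4))

idx = a.argmax(axis=0)             # shape (2, 4): position of the max along axis 0
m, n = a.shape[1:]

# Advanced indexing: idx supplies axis 0, the two aranges supply axes 1 and 2.
b_out = b[idx, np.arange(m)[:, None], np.arange(n)]

# Equivalent formulation with take_along_axis (needs a leading axis on idx).
alt = np.take_along_axis(b, idx[None, :, :], axis=0)[0]
assert np.array_equal(b_out, alt)

# Sanity check: the same index applied to a gives its maximum along axis 0.
assert np.array_equal(a[idx, np.arange(m)[:, None], np.arange(n)], a.max(axis=0))
print(b_out.shape)                 # (2, 4)
```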
You could use ogrid
>>> x = np.random.random((2,3,4))
>>> x
array([[[ 0.87412737, 0.11069105, 0.86951092, 0.74895912],
[ 0.48237622, 0.67502597, 0.11935148, 0.44133397],
[ 0.65169681, 0.21843482, 0.52877862, 0.72662927]],
[[ 0.48979028, 0.97103611, 0.36459645, 0.80723839],
[ 0.90467511, 0.79118429, 0.31371856, 0.99443492],
[ 0.96329039, 0.59534491, 0.15071331, 0.52409446]]])
>>> y = np.argmax(x, axis=1)
>>> y
array([[0, 1, 0, 0],
[2, 0, 0, 1]])
>>> i, j = np.ogrid[:2,:4]
>>> x[i ,y, j]
array([[ 0.87412737, 0.67502597, 0.86951092, 0.74895912],
[ 0.96329039, 0.97103611, 0.36459645, 0.99443492]])
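The example above takes the argmax along axis 1. Adapted to the question, where the maximum is taken along axis 0, a sketch might look like the following. The integer test data is arbitrary and only there to make the result easy to verify.

```python
import numpy as np

# Arbitrary integer data with the shapes from the question.
a = np.arange(24).reshape(3, 2, 4) % 7
b = np.arange(24).reshape(3, 2, 4)

# ogrid returns open (broadcastable) index grids for the remaining two axes.
i, j = np.ogrid[:2, :4]
print(i.shape, j.shape)            # (2, 1) (1, 4)

idx = a.argmax(axis=0)             # shape (2, 4)
picked = b[idx, i, j]              # values of b where a is maximal along axis 0

# The same index applied to a must give its maximum along axis 0.
assert np.array_equal(a[idx, i, j], a.max(axis=0))
print(picked.shape)                # (2, 4)
```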
I create two matrices
import numpy as np
arrA = np.zeros((9000,3))
arrB = np.zeros((9000,6))
I want to concatenate pieces of those matrices.
But when I try to do:
arrC = np.hstack((arrA, arrB[:,1]))
I get an error:
ValueError: all the input arrays must have same number of dimensions
I guess it's because np.shape(arrB[:,1]) is equal to (9000,) instead of (9000,1), but I cannot figure out how to resolve it.
Could you please comment on this issue?
You could preserve dimensions by passing a list of indices, not an index:
>>> arrB[:,1].shape
(9000,)
>>> arrB[:,[1]].shape
(9000, 1)
>>> out = np.hstack([arrA, arrB[:,[1]]])
>>> out.shape
(9000, 4)
This is easier to see visually.
Assume:
>>> arrA=np.arange(9000*3).reshape(9000,3)
>>> arrA
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[26991, 26992, 26993],
[26994, 26995, 26996],
[26997, 26998, 26999]])
>>> arrB=np.arange(9000*6).reshape(9000,6)
>>> arrB
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17],
...,
[53982, 53983, 53984, 53985, 53986, 53987],
[53988, 53989, 53990, 53991, 53992, 53993],
[53994, 53995, 53996, 53997, 53998, 53999]])
If you index a single column of arrB with an integer, you get a 1-D array, which looks more like a row than a column:
>>> arrB[:,1]
array([ 1, 7, 13, ..., 53983, 53989, 53995])
What you need is a 2-D column, with the same number of rows as arrA, to append to arrA:
>>> arrB[:,[1]]
array([[ 1],
[ 7],
[ 13],
...,
[53983],
[53989],
[53995]])
Then hstack works as expected:
>>> arrC=np.hstack((arrA, arrB[:,[1]]))
>>> arrC
array([[ 0, 1, 2, 1],
[ 3, 4, 5, 7],
[ 6, 7, 8, 13],
...,
[26991, 26992, 26993, 53983],
[26994, 26995, 26996, 53989],
[26997, 26998, 26999, 53995]])
An alternate form is to specify -1 in one dimension and the number of rows or cols desired as the other in .reshape():
>>> arrB[:,1].reshape(-1,1) # one col
array([[ 1],
[ 7],
[ 13],
...,
[53983],
[53989],
[53995]])
>>> arrB[:,1].reshape(-1,6) # 6 cols
array([[ 1, 7, 13, 19, 25, 31],
[ 37, 43, 49, 55, 61, 67],
[ 73, 79, 85, 91, 97, 103],
...,
[53893, 53899, 53905, 53911, 53917, 53923],
[53929, 53935, 53941, 53947, 53953, 53959],
[53965, 53971, 53977, 53983, 53989, 53995]])
>>> arrB[:,1].reshape(2,-1) # 2 rows
array([[ 1, 7, 13, ..., 26983, 26989, 26995],
[27001, 27007, 27013, ..., 53983, 53989, 53995]])
I would try something like this:
np.vstack((arrA.transpose(), arrB[:,1])).transpose()
There are several ways of making your selection from arrB a (9000,1) array:
np.hstack((arrA,arrB[:,[1]]))
np.hstack((arrA,arrB[:,1][:,None]))
np.hstack((arrA,arrB[:,1].reshape(9000,1)))
np.hstack((arrA,arrB[:,1].reshape(-1,1)))
The first uses indexing with a list (or array), the second adds a new axis (None is the same as np.newaxis), and the last two use reshape. These are all basic numpy array manipulation tasks.
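To confirm that these variants agree, here is a small sketch that builds each one and compares shapes and values. It also includes np.column_stack, which is not mentioned in the answers above but accepts 1-D inputs and treats them as columns. The arange-filled arrB and the names `col` and `variants` are made up for this example.

```python
import numpy as np

arrA = np.zeros((9000, 3))
# Distinct values in arrB so that a wrong column would be noticed.
arrB = np.arange(9000 * 6, dtype=float).reshape(9000, 6)

col = arrB[:, 1]                   # 1-D, shape (9000,)

variants = [
    np.hstack((arrA, arrB[:, [1]])),         # index with a list
    np.hstack((arrA, col[:, None])),         # add a new axis
    np.hstack((arrA, col.reshape(-1, 1))),   # reshape to one column
    np.column_stack((arrA, col)),            # 1-D inputs become columns
]

for v in variants:
    assert v.shape == (9000, 4)
    assert np.array_equal(v, variants[0])

print(variants[0][:2])
# [[0. 0. 0. 1.]
#  [0. 0. 0. 7.]]
```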