I create two matrices
import numpy as np
arrA = np.zeros((9000,3))
arrB = np.zeros((9000,6))
I want to concatenate pieces of those matrices.
But when I try to do:
arrC = np.hstack((arrA, arrB[:,1]))
I get an error:
ValueError: all the input arrays must have same number of dimensions
I guess it's because np.shape(arrB[:,1]) is equal to (9000,) instead of (9000,1), but I cannot figure out how to resolve it.
Could you please comment on this issue?
You could preserve dimensions by passing a list of indices, not an index:
>>> arrB[:,1].shape
(9000,)
>>> arrB[:,[1]].shape
(9000, 1)
>>> out = np.hstack([arrA, arrB[:,[1]]])
>>> out.shape
(9000, 4)
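To see the mismatch directly, here is a minimal sketch that compares the two indexing forms and catches the error from the question. The array names follow the question; the smaller row count is only an assumption to keep the example short.

```python
import numpy as np

arrA = np.zeros((5, 3))
arrB = np.zeros((5, 6))

col_1d = arrB[:, 1]      # integer index drops the axis -> shape (5,)
col_2d = arrB[:, [1]]    # list index keeps the axis    -> shape (5, 1)

try:
    np.hstack((arrA, col_1d))
    stacked_1d = True
except ValueError:
    # hstack refuses to join a 2-D array with a 1-D one
    stacked_1d = False

out = np.hstack((arrA, col_2d))
print(col_1d.ndim, col_2d.ndim, stacked_1d, out.shape)
```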
This is easier to see visually.
Assume:
>>> arrA=np.arange(9000*3).reshape(9000,3)
>>> arrA
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[26991, 26992, 26993],
[26994, 26995, 26996],
[26997, 26998, 26999]])
>>> arrB=np.arange(9000*6).reshape(9000,6)
>>> arrB
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17],
...,
[53982, 53983, 53984, 53985, 53986, 53987],
[53988, 53989, 53990, 53991, 53992, 53993],
[53994, 53995, 53996, 53997, 53998, 53999]])
If you index arrB with a single integer, you get a 1-D array that looks more like a row:
>>> arrB[:,1]
array([ 1, 7, 13, ..., 53983, 53989, 53995])
What you need is a 2-D column with the same number of rows as arrA:
>>> arrB[:,[1]]
array([[ 1],
[ 7],
[ 13],
...,
[53983],
[53989],
[53995]])
Then hstack works as expected:
>>> arrC=np.hstack((arrA, arrB[:,[1]]))
>>> arrC
array([[ 0, 1, 2, 1],
[ 3, 4, 5, 7],
[ 6, 7, 8, 13],
...,
[26991, 26992, 26993, 53983],
[26994, 26995, 26996, 53989],
[26997, 26998, 26999, 53995]])
An alternate form is to specify -1 in one dimension and the number of rows or cols desired as the other in .reshape():
>>> arrB[:,1].reshape(-1,1) # one col
array([[ 1],
[ 7],
[ 13],
...,
[53983],
[53989],
[53995]])
>>> arrB[:,1].reshape(-1,6) # 6 cols
array([[ 1, 7, 13, 19, 25, 31],
[ 37, 43, 49, 55, 61, 67],
[ 73, 79, 85, 91, 97, 103],
...,
[53893, 53899, 53905, 53911, 53917, 53923],
[53929, 53935, 53941, 53947, 53953, 53959],
[53965, 53971, 53977, 53983, 53989, 53995]])
>>> arrB[:,1].reshape(2,-1) # 2 rows
array([[ 1, 7, 13, ..., 26983, 26989, 26995],
[27001, 27007, 27013, ..., 53983, 53989, 53995]])
There is more on array shaping and stacking here
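One more option, not mentioned in the answer above, is a slice of width one. The sketch below assumes smaller arrays than the question uses; it shows that `arrB[:, 1:2]` stays two-dimensional and, being a basic slice, is a view rather than a copy.

```python
import numpy as np

arrA = np.arange(5 * 3).reshape(5, 3)
arrB = np.arange(5 * 6).reshape(5, 6)

col = arrB[:, 1:2]                    # a slice of width 1 stays 2-D: shape (5, 1)
arrC = np.hstack((arrA, col))

print(col.shape, arrC.shape)
print(np.shares_memory(col, arrB))    # basic slicing returns a view of arrB
```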
I would try something like this:
np.vstack((arrA.transpose(), arrB[:,1])).transpose()
There are several ways of making your selection from arrB a (9000,1) array:
np.hstack((arrA,arrB[:,[1]]))
np.hstack((arrA,arrB[:,1][:,None]))
np.hstack((arrA,arrB[:,1].reshape(9000,1)))
np.hstack((arrA,arrB[:,1].reshape(-1,1)))
The first indexes with a list (or array), the second adds a new axis with None (i.e. np.newaxis), and the last two use reshape. These are all basic numpy array manipulation tasks.
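A quick way to convince yourself that these forms are interchangeable is to compare their results. The sketch below uses small stand-in arrays and also adds `np.column_stack`, which is not in the answer above but accepts the 1-D column directly.

```python
import numpy as np

arrA = np.arange(6 * 3).reshape(6, 3)
arrB = np.arange(6 * 6).reshape(6, 6)

candidates = [
    np.hstack((arrA, arrB[:, [1]])),
    np.hstack((arrA, arrB[:, 1][:, None])),
    np.hstack((arrA, arrB[:, 1].reshape(-1, 1))),
    np.column_stack((arrA, arrB[:, 1])),   # column_stack turns 1-D inputs into columns
]
all_equal = all(np.array_equal(candidates[0], c) for c in candidates[1:])
print(all_equal)
```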
Suppose there is a ndarray A = np.random.random([3, 5, 4]), and I have another index ndarray of size 3 x 4, whose entry is the index I want to select from the 1st axis (the axis of dimension being 5). How can I achieve it using pythonic code?
Example:
A = [[[0.95220166 0.49801865 0.83217126 0.33361628]
[0.31751156 0.85899736 0.81965214 0.62465746]
[0.69251917 0.83201231 0.6089141 0.36589825]
[0.96674647 0.6056233 0.45515703 0.90552863]
[0.94524208 0.42422369 0.91633385 0.53177495]]
[[0.02883774 0.18012477 0.64642352 0.21295456]
[0.88475705 0.76020851 0.6888415 0.47958142]
[0.17306953 0.94981064 0.91468365 0.37297622]
[0.75924232 0.27537972 0.68803293 0.0904176 ]
[0.14596762 0.70103752 0.06090593 0.07920207]]
[[0.11092702 0.58002663 0.13553706 0.89662211]
[0.09146413 0.86212582 0.65908978 0.2995175 ]
[0.29025485 0.60788672 0.98595003 0.06762369]
[0.56136928 0.09623415 0.20178919 0.46531331]
[0.28628325 0.28215312 0.39670151 0.68243605]]]
Indices
= [[3 1 2 1]
[3 2 0 4]
[3 3 1 2]]
Result_I_want
= [[0.96674647, 0.85899736, 0.6089141, 0.62465746]
[0.75924232, 0.94981064, 0.64642352, 0.07920207]
[0.56136928, 0.09623415, 0.65908978, 0.06762369]]
In [148]: A = np.arange(3*5*4).reshape([3, 5, 4])
In [151]: B = np.array([[3, 1, 2, 1],
...: [3, 2, 0, 4],
...: [3, 3, 1, 2]])
In [152]: B.shape
Out[152]: (3, 4)
In [153]: A.shape
Out[153]: (3, 5, 4)
Apply B to the middle dimension, and use arrays with shape (3,1) and (4,) for the other two. Together they broadcast to select a (3,4) array of elements.
In [154]: A[np.arange(3)[:,None],B,np.arange(4)]
Out[154]:
array([[12, 5, 10, 7],
[32, 29, 22, 39],
[52, 53, 46, 51]])
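The same broadcasting idea can be written without hard-coding the sizes. This is only a sketch: it builds the two helper index arrays with `np.ogrid` (my choice, not part of the answer above) and checks the result against plain loops.

```python
import numpy as np

A = np.arange(3 * 5 * 4).reshape(3, 5, 4)
B = np.array([[3, 1, 2, 1],
              [3, 2, 0, 4],
              [3, 3, 1, 2]])

# Open grids of shape (3, 1) and (1, 4) that broadcast against B's (3, 4)
i, k = np.ogrid[:A.shape[0], :A.shape[2]]
picked = A[i, B, k]

# Reference result from explicit loops
expected = np.empty_like(B)
for r in range(B.shape[0]):
    for c in range(B.shape[1]):
        expected[r, c] = A[r, B[r, c], c]

print(np.array_equal(picked, expected))
```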
Try np.take_along_axis:
A = np.arange(3*5*4).reshape([3, 5, 4])
# B is the same as your sample data
np.take_along_axis(A, B[:,None,:], axis=1).reshape(B.shape)
Output:
array([[12, 5, 10, 7],
[32, 29, 22, 39],
[52, 53, 46, 51]])
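If you need this for an arbitrary axis, the `take_along_axis` call can be wrapped in a small helper. The function name `pick_along_axis` is hypothetical, and the sketch assumes `idx` has the shape of `arr` with `axis` removed.

```python
import numpy as np

def pick_along_axis(arr, idx, axis):
    """Select one element along `axis` for every position of the remaining axes."""
    expanded = np.expand_dims(idx, axis)                 # re-insert the indexed axis with length 1
    picked = np.take_along_axis(arr, expanded, axis=axis)
    return np.squeeze(picked, axis=axis)                 # drop the length-1 axis again

A = np.arange(3 * 5 * 4).reshape(3, 5, 4)
B = np.array([[3, 1, 2, 1],
              [3, 2, 0, 4],
              [3, 3, 1, 2]])
result = pick_along_axis(A, B, axis=1)
print(result.shape)
```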
Let's say I have an array like the one below:
x = np.arange(30).reshape((10,3))
x
Out[52]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]])
How do I add a fourth column to each row, such that this column is an exponential function of the row index, so that I end up with something like this:
array([[ 0, 1, 2, 2.718281828],
[ 3, 4, 5, 7.389056099],
[ 6, 7, 8, 20.08553692],
[ 9, 10, 11, 54.59815003 ],
[12, 13, 14, 148.4131591],
[15, 16, 17, 403.4287935],
[18, 19, 20, 1096.633158 ],
[21, 22, 23, 2980.957987],
[24, 25, 26, 8103.083928],
[27, 28, 29, 22026.46579]])
Computing the exponential is easy:
ex = np.exp(np.arange(x.shape[0]) + 1)
What you want to do with it is a whole different story. Numpy doesn't allow heterogeneous arrays, unlike say pandas. So with the simple answer, your result will be float64 (x is most likely int64 or int32):
x = np.concatenate((x, ex[:, None]), axis=1)
An alternative is using structured arrays, which will let you preserve the input types:
d = [('', x.dtype)] * x.shape[1] + [('', ex.dtype)]
out = np.empty(ex.shape, dtype=d)
Bulk assignment is a bit tricky, but can be done with a view obtained from the raw ndarray constructor:
view = np.ndarray(buffer=out, dtype=x.dtype, shape=x.shape, strides=(out.dtype.itemsize, x.dtype.itemsize))
view[...] = x
np.ndarray(buffer=out, dtype=ex.dtype, shape=ex.shape, strides=(out.dtype.itemsize,), offset=x.strides[0])[:] = ex
A simpler approach would be to use recarray, as @PaulPanzer suggests (np.rec is the public alias for np.core.records):
out = np.rec.fromarrays([*x.T, ex])
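For comparison, here is a minimal sketch of filling a structured array field by field, which avoids the stride arithmetic at the cost of a short loop over columns. The field names 'a', 'b', 'c' and 'e' are made up for this example.

```python
import numpy as np

x = np.arange(30).reshape(10, 3)
ex = np.exp(np.arange(x.shape[0]) + 1)

# Field names 'a', 'b', 'c', 'e' are hypothetical, chosen only for illustration
names = ('a', 'b', 'c')
dt = np.dtype([(n, x.dtype) for n in names] + [('e', ex.dtype)])
out = np.empty(x.shape[0], dtype=dt)

for name, column in zip(names, x.T):
    out[name] = column       # integer columns keep their integer dtype
out['e'] = ex                # the exponential column stays float64

print(out.dtype)
print(out[:2])
```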
Try this:
import numpy as np
a = np.arange(30).reshape((10,3))
b = np.zeros((a.shape[0], a.shape[1] + 1))
b[:, :-1] = a
b[:, -1] = np.exp(np.arange(len(b)) + 1)  # exponents 1..10, matching the desired output
To create a single array of powers of e starting at one, you can use
powers = np.power(np.e, np.arange(10) + 1)
This takes the number e and raises it to the powers given by the array np.arange(10) + 1, i.e. the numbers [1...10].
You can then add this as an additional column by first reshaping it and then adding it using np.hstack.
powers = powers.reshape(-1, 1)
x = np.hstack((x, powers))
You can construct such a column with:
>>> np.exp(np.arange(1, 11))
array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
8.10308393e+03, 2.20264658e+04])
So we can first obtain the number of rows, and then use np.hstack:
rows = x.shape[0]
result = np.hstack((x, np.exp(np.arange(1, rows+1)).reshape(-1, 1)))
We then obtain:
>>> np.hstack((x, np.exp(np.arange(1, 11)).reshape(-1, 1)))
array([[0.00000000e+00, 1.00000000e+00, 2.00000000e+00, 2.71828183e+00],
[3.00000000e+00, 4.00000000e+00, 5.00000000e+00, 7.38905610e+00],
[6.00000000e+00, 7.00000000e+00, 8.00000000e+00, 2.00855369e+01],
[9.00000000e+00, 1.00000000e+01, 1.10000000e+01, 5.45981500e+01],
[1.20000000e+01, 1.30000000e+01, 1.40000000e+01, 1.48413159e+02],
[1.50000000e+01, 1.60000000e+01, 1.70000000e+01, 4.03428793e+02],
[1.80000000e+01, 1.90000000e+01, 2.00000000e+01, 1.09663316e+03],
[2.10000000e+01, 2.20000000e+01, 2.30000000e+01, 2.98095799e+03],
[2.40000000e+01, 2.50000000e+01, 2.60000000e+01, 8.10308393e+03],
[2.70000000e+01, 2.80000000e+01, 2.90000000e+01, 2.20264658e+04]])
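The scientific notation in that output is only a display setting. As a sketch, the same result can be built with `np.column_stack` (which accepts the 1-D column without a reshape) and printed in fixed-point form with `np.printoptions`; neither call appears in the answer above.

```python
import numpy as np

x = np.arange(30).reshape(10, 3)
rows = x.shape[0]

result = np.column_stack((x, np.exp(np.arange(1, rows + 1))))

# Print in fixed-point notation instead of scientific notation
with np.printoptions(suppress=True, precision=4):
    print(result)
```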
I have a 6x6 matrix: e.g. matrix A
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
I also have a 3x3x3 matrix: e.g. matrix B
array([[[ 1, 7, 2],
[ 5, 9, 3],
[ 2, 8, 6]],
[[ 3, 4, 6],
[ 6, 8, 9],
[ 4, 2, 8]],
[[ 6, 4, 7],
[ 8, 7, 8],
[ 4, 4, 7]]])
Finally, I have a 3x4x4 matrix C, (4 rows, 4 columns, 3 dimensions), that's empty (filled with 0s)
I want to multiply each "3rd dimension" of B (i.e. [1,:,:],[2,:,:],[3,:,:]) with A. However, for each dimension I want to multiply B in "windows", sliding by 1 each time across A till I cannot go further, at which point I move back to the beginning, slide 1 unit down and again sliding across one-by-one multiplying B with A, till the end, then move down and repeat till you don't go over the border. The results being stored in the respective "3rd dimension" of matrix C. So my result would be a [3x4x4] matrix.
Ex. (multiplication is dot product giving a scalar value, np.sum((np.multiply(x,y)))), so...
imagining B "overtop" of A, starting in the top-left corner, I multiply that 3x3 part of A with B's [1x3x3] part, storing the result in C...
referring to 1st unit (located in 1st row and 1st column) in the 1st dimension of C...
C[1,0,0] = 340. because [[0,1,2],[6,7,8],[12,13,14]] dot product [[1,7,2],[5,9,3],[2,8,6]]
sliding B matrix over by 1 on A, and storing my 2nd result in C...
C[1,0,1] = 383. because [[1,2,3],[7,8,9],[13,14,15]] dot product [[1,7,2],[5,9,3],[2,8,6]]
Then repeat this procedure of sliding across and down and across and ..., for B[2,:,:] and B[3,:,:] over A again, storing in C[2,:,:] and C[3,:,:] respectively.
What is a good way to do this?
I think you're asking about 2D cross-correlation with three different kernels, rather than straightforward matrix multiplication.
The following piece of code is not the most efficient way to do this, but does this give you the answer you are looking for? I'm using scipy.signal.correlate2d to achieve 2D correlation here...
>>> from scipy.signal import correlate2d
>>> C = np.dstack([correlate2d(A, B[:, :, i], 'valid') for i in range(B.shape[2])])
>>> C.shape
(4, 4, 3)
>>> C
array([[[ 333, 316, 464],
[ 372, 369, 520],
[ 411, 422, 576],
[ 450, 475, 632]],
[[ 567, 634, 800],
[ 606, 687, 856],
[ 645, 740, 912],
[ 684, 793, 968]],
[[ 801, 952, 1136],
[ 840, 1005, 1192],
[ 879, 1058, 1248],
[ 918, 1111, 1304]],
[[1035, 1270, 1472],
[1074, 1323, 1528],
[1113, 1376, 1584],
[1152, 1429, 1640]]])
Here's a more "fun" way of doing this which doesn't use scipy, but using stride_tricks instead. I'm not sure if it's more efficient:
>>> import numpy.lib.stride_tricks as st
>>> s, t = A.strides
>>> i, j = A.shape
>>> k, l, m = B.shape
>>> D = st.as_strided(A, shape=(i-k+1, j-l+1, k, l), strides=(s, t, s, t))
>>> E = np.einsum('ijkl,klm->ijm', D, B)
>>> (E == C).all()
True
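Note that both snippets above take the kernels along the last axis of B (B[:, :, i]), which is why the first entry is 333 rather than the 340 from the question. The following sketch instead takes each kernel as B[m, :, :] and returns a (3, 4, 4) result. It assumes NumPy 1.20 or later for `sliding_window_view`, and the einsum subscripts are my own choice.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.arange(36).reshape(6, 6)
B = np.array([[[1, 7, 2], [5, 9, 3], [2, 8, 6]],
              [[3, 4, 6], [6, 8, 9], [4, 2, 8]],
              [[6, 4, 7], [8, 7, 8], [4, 4, 7]]])

# windows[i, j] is the 3x3 patch of A whose top-left corner is (i, j)
windows = sliding_window_view(A, B.shape[1:])      # shape (4, 4, 3, 3)

# For each kernel m, sum the elementwise product with every window
C = np.einsum('ijkl,mkl->mij', windows, B)         # shape (3, 4, 4)

print(C.shape, C[0, 0, 0], C[0, 0, 1])
```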
Suppose I have a numpy array img, with img.shape == (468,832,3). What does img[::2, ::2] do? It reduces the shape to (234,416,3). Can you please explain the logic?
Let's read documentation together (Source).
(Just read the bold part first)
The basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step (k ≠ 0). This selects the m elements (in the corresponding dimension) with index values i, i + k, ..., i + (m - 1) k where m = q + (r ≠ 0) and q and r are the quotient and remainder obtained by dividing j - i by k: j - i = q k + r, so that i + (m - 1) k < j.
...
Assume n is the number of elements in the dimension being sliced. Then, if i is not given it defaults to 0 for k > 0 and n - 1 for k < 0. If j is not given it defaults to n for k > 0 and -n-1 for k < 0. If k is not given it defaults to 1. Note that :: is the same as : and means select all indices along this axis.
Now looking at your part.
[::2, ::2] will be translated to [0:468:2, 0:832:2] because you do not specify the start and stop (i and j in the documentation), so they take their defaults. (You only specify k here. Recall the i:j:k notation above.) You select elements on these axes with a step size of 2, which means you select every other element along the axes specified.
Because you did not specify for the 3rd dimension, all will be selected.
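The default-filling described above can be checked with Python's built-in `slice.indices` method, which is not mentioned in the quoted documentation but resolves an omitted start and stop for a given axis length. The all-zero image is a placeholder standing in for the real one.

```python
import numpy as np

img = np.zeros((468, 832, 3), dtype=np.uint8)   # placeholder image, all zeros

# slice.indices(n) resolves the omitted start/stop for an axis of length n
rows = slice(None, None, 2).indices(img.shape[0])
cols = slice(None, None, 2).indices(img.shape[1])
print(rows, cols)

small = img[::2, ::2]
explicit = img[0:468:2, 0:832:2, :]
print(small.shape, explicit.shape)
```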
It selects every other row, and then every other column, from an array, returning an array of size (ceil(n / 2), ceil(m / 2), ...) for an input of size (n, m, ...).
Here's an example of slicing with a 2D array -
>>> a = np.arange(16).reshape(4, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> a[::2, ::2]
array([[ 0, 2],
[ 8, 10]])
And, here's another example with a 3D array -
>>> a = np.arange(27).reshape(3, 3, 3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
>>> a[::2, ::2] # same as a[::2, ::2, :]
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[18, 19, 20],
[24, 25, 26]]])
Well, we have the RGB image as a 3D array of shape:
img.shape=(468,832,3)
Now, what does img[::2, ::2] do?
We're just downsampling the image: each of the first two dimensions is halved by keeping only every other pixel, which a step size of 2 does by skipping one pixel each time. This should be clear from the example below.
Let's take a simple grayscale image for easier understanding.
In [13]: arr
Out[13]:
array([[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55],
[60, 61, 62, 63, 64, 65]])
In [14]: arr.shape
Out[14]: (6, 6)
In [15]: arr[::2, ::2]
Out[15]:
array([[10, 12, 14],
[30, 32, 34],
[50, 52, 54]])
In [16]: arr[::2, ::2].shape
Out[16]: (3, 3)
Notice which pixels are in the sliced version. Also, observe how the array shape changes after slicing (i.e. it is reduced by half).
Now, this downsampling happens for all three channels in the image since there's no slicing happening in the third axis. Thus, you will get the shape reduced only for the first two axes in your example.
(468, 832, 3)
. . |
. . |
(234, 416, 3)
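A small sketch of the shape rule, including an odd-sized case where "half" is rounded up. The helper name `halved` and the (5, 7, 3) test shape are assumptions for illustration.

```python
import numpy as np

def halved(n):
    """Length of an axis of size n after slicing with step 2."""
    return (n + 1) // 2      # same as ceil(n / 2)

for shape in [(468, 832, 3), (5, 7, 3)]:
    img = np.zeros(shape, dtype=np.uint8)
    small = img[::2, ::2]
    # Only the first two axes shrink; the channel axis is untouched
    assert small.shape == (halved(shape[0]), halved(shape[1]), shape[2])

# Basic slicing returns a view, so no pixel data is copied
print(np.shares_memory(img, small))
```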
Say I have a 2-D numpy array A of size 20 x 10.
I also have an array of length 20, del_ind.
I want to delete an element from each row of A according to del_ind, to get a resultant array of size 20 x 9.
How can I do this?
I looked into np.delete with a specified axis = 1, but this only deletes element from the same position for each row.
Thanks for the help
You will probably have to build a new array.
Fortunately you can avoid python loops for this task, using fancy indexing:
h, w = 20, 10
A = np.arange(h*w).reshape(h, w)
del_ind = np.random.randint(0, w, size=h)
mask = np.ones((h,w), dtype=bool)
mask[range(h), del_ind] = False
A_ = A[mask].reshape(h, w-1)
Demo with a smaller dataset:
>>> h, w = 5, 4
>>> %paste
A = np.arange(h*w).reshape(h, w)
del_ind = np.random.randint(0, w, size=h)
mask = np.ones((h,w), dtype=bool)
mask[range(h), del_ind] = False
A_ = A[mask].reshape(h, w-1)
## -- End pasted text --
>>> A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
>>> del_ind
array([2, 2, 1, 1, 0])
>>> A_
array([[ 0, 1, 3],
[ 4, 5, 7],
[ 8, 10, 11],
[12, 14, 15],
[17, 18, 19]])
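The masking steps can be packaged as a small function. This is a sketch under the assumption that exactly one element is removed per row; the function name `delete_per_row` is hypothetical, and the data is the demo data from above.

```python
import numpy as np

def delete_per_row(A, del_ind):
    """Remove column del_ind[r] from row r of a 2-D array."""
    h, w = A.shape
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), del_ind] = False      # one False per row
    return A[mask].reshape(h, w - 1)         # boolean indexing flattens in row order

A = np.arange(20).reshape(5, 4)
del_ind = np.array([2, 2, 1, 1, 0])
print(delete_per_row(A, del_ind))
```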
NumPy arrays have a fixed size, so elements cannot be removed in place; it's mainly intended for statically sized matrices. For that reason, I'd recommend doing this by copying the intended elements to a new array.
Assuming that it's sufficient to delete one column from every row:
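def remove_indices(arr, indices):
    # Allocate the result with the input dtype (np.empty defaults to float64)
    result = np.empty((arr.shape[0], arr.shape[1] - 1), dtype=arr.dtype)
    for i, (delete_index, row) in enumerate(zip(indices, arr)):
        # np.delete returns a copy of the row without the chosen element
        result[i] = np.delete(row, delete_index)
    return result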
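If the Python-level loop becomes a bottleneck, one loop-free variant is to compute, for every row, which columns survive and gather them with `np.take_along_axis`. This is a sketch, not the approach from the answers above; the function name is hypothetical and it assumes one valid index per row.

```python
import numpy as np

def delete_per_row_shift(A, del_ind):
    """Remove column del_ind[r] from row r by gathering the remaining columns."""
    h, w = A.shape
    keep = np.arange(w - 1)                          # candidate columns 0 .. w-2
    # Columns at or beyond the deleted position are shifted right by one
    cols = keep + (keep >= np.asarray(del_ind)[:, None])
    return np.take_along_axis(A, cols, axis=1)

rng = np.random.default_rng(0)
A = np.arange(20 * 10).reshape(20, 10)
del_ind = rng.integers(0, A.shape[1], size=A.shape[0])
A_ = delete_per_row_shift(A, del_ind)
print(A_.shape)
```

The result keeps the dtype of A and has one column fewer per row.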