I have two NumPy arrays, a and b:
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
b = np.array([1,2,3])
array([1, 2, 3])
I would like to subtract from each row of a the corresponding element of b (i.e. subtract the first element of b from the first row of a, and so on)
so that c is
array([[0, 1, 2],
[2, 3, 4],
[4, 5, 6]])
Is there a python command to do this?
Yes, the - operator.
In addition you need to make b into a column vector so that broadcasting can do the rest for you:
a - b[:, np.newaxis]
# array([[0, 1, 2],
# [2, 3, 4],
# [4, 5, 6]])
Yes. You just need to make b a column vector first:
a - b[:, np.newaxis]
Reshape b into a column vector, then subtract:
a - b.reshape(3, 1)
b itself isn't altered; the reshape method call returns the column vector:
array([[1],
[2],
[3]])
This gives the subtraction the "shape" you wanted. A slightly more general reshape operation would be:
b.reshape(b.size, 1)
This takes however many elements b has and molds them into an N x 1 vector.
Update: A quick benchmark shows kazemakase's answer, using b[:, np.newaxis] as the reshaping strategy, to be ~7% faster. For small vectors, those few extra fractions of a µs won't matter. But for large vectors or inner loops, prefer his approach. It's a less-general reshape, but more performant for this use.
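As a minimal sketch of the broadcasting described in these answers (the names col, c and c_reshape are illustrative and not from the original posts), both spellings of the column vector give the same result:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([1, 2, 3])

# Shape (3,) -> (3, 1): each element of b now lines up with one row of a.
col = b[:, np.newaxis]

# Broadcasting stretches the (3, 1) column across the 3 columns of a.
c = a - col

# reshape(-1, 1) lets NumPy infer the row count, so it works for any length.
c_reshape = a - b.reshape(-1, 1)

print(col.shape)
print(c)
```

Without the extra axis, a - b would align b with the columns of a instead, which is the other broadcasting direction.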
In some special cases, arrays can be concatenated without explicitly calling the concatenate function. For example, given a 2D array A, the following code will yield an identical array B:
B = np.array([A[ii,:] for ii in range(A.shape[0])])
I know this method works, but do not quite understand the underlying mechanism. Can anyone demystify the code above a little bit?
A[ii,:] is ii-th row of array A.
The list comprehension [A[ii,:] for ii in range(A.shape[0])] basically makes a list of rows in A (A.shape[0] is number of rows in A).
Finally, np.array builds B from that list of A's rows, so B has the same contents as A itself.
By now you should be familiar with making an array from a list of lists:
In [178]: np.array([[1,2],[3,4]])
Out[178]:
array([[1, 2],
[3, 4]])
but that works just as well if it's a list of arrays:
In [179]: np.array([np.array([1,2]),np.array([3,4])])
Out[179]:
array([[1, 2],
[3, 4]])
stack also does this, by adding a dimension to the arrays and calling concatenate (read its code):
In [180]: np.stack([np.array([1,2]),np.array([3,4])])
Out[180]:
array([[1, 2],
[3, 4]])
concatenate joins the arrays - on an existing axis:
In [181]: np.concatenate([np.array([1,2]),np.array([3,4])])
Out[181]: array([1, 2, 3, 4])
stack adds a dimension first, as in:
In [182]: np.concatenate([np.array([[1,2]]),np.array([[3,4]])])
Out[182]:
array([[1, 2],
[3, 4]])
np.array and concatenate aren't identical, but there's a lot of overlap in their functionality.
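The overlap can be checked directly. The following is a small sketch, with A chosen arbitrarily and the B_* names made up for illustration, that rebuilds A from its rows in three ways:

```python
import numpy as np

A = np.arange(6).reshape(3, 2)

# Iterating over the first axis yields 1-D rows of shape (2,).
rows = [A[ii, :] for ii in range(A.shape[0])]

B_array = np.array(rows)   # builds a new 2-D array from the list of rows
B_stack = np.stack(rows)   # adds a new leading axis, then joins along it

# concatenate needs the axis to exist already, so give each row shape (1, 2).
B_concat = np.concatenate([r[np.newaxis, :] for r in rows], axis=0)

print(np.array_equal(A, B_array))
print(np.array_equal(A, B_stack))
print(np.array_equal(A, B_concat))
```

In each case the result is a new array with the same values as A, not A itself.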
For example, I have two numpy arrays,
A = np.array(
[[0,1],
[2,3],
[4,5]])
B = np.array(
[[1],
[0],
[1]], dtype='int')
and I want to extract one element from each row of A, and that element is indexed by B, so I want the following results:
C = np.array(
[[1],
[2],
[5]])
I tried A[:, B.ravel()], but that broadcasts B, which is not what I want. I also looked into np.take, but it does not seem to be the right solution to my problem.
However, I could use np.choose by transposing A,
np.choose(B.ravel(), A.T)
but is there a better solution?
You can use NumPy's purely integer array indexing -
A[np.arange(A.shape[0]),B.ravel()]
Sample run -
In [57]: A
Out[57]:
array([[0, 1],
[2, 3],
[4, 5]])
In [58]: B
Out[58]:
array([[1],
[0],
[1]])
In [59]: A[np.arange(A.shape[0]),B.ravel()]
Out[59]: array([1, 2, 5])
Please note that if B is a 1D array or a list of such column indices, you could simply skip the flattening operation with .ravel().
Sample run -
In [186]: A
Out[186]:
array([[0, 1],
[2, 3],
[4, 5]])
In [187]: B
Out[187]: [1, 0, 1]
In [188]: A[np.arange(A.shape[0]),B]
Out[188]: array([1, 2, 5])
C = np.array([A[i][j] for i,j in enumerate(B)])
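For comparison, here is a hedged sketch that puts the integer-indexing approach next to np.take_along_axis, which is not mentioned in the answers and is assumed here to be available (NumPy 1.15 or later). The name picked is illustrative.

```python
import numpy as np

A = np.array([[0, 1], [2, 3], [4, 5]])
B = np.array([[1], [0], [1]])

# Pair row index i with column index B[i]: picks A[0, 1], A[1, 0], A[2, 1].
picked = A[np.arange(A.shape[0]), B.ravel()]

# take_along_axis keeps B's 2-D shape, so the result is a column like C.
C = np.take_along_axis(A, B, axis=1)

print(picked)
print(C)
```

The first form returns a 1-D array; the second returns the (3, 1) column asked for in the question.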
I am sorry that the title of my question may sound vague; I do not know the exact name of this operation.
Given a tensor A (N×M×M) and a one-dimensional array b (N), I would like to get another tensor B (N×M×M) such that each M×M item B[i] is A[i] multiplied by b[i].
A possible but ugly solution is to first flatten (reshape) A, i.e. convert A into a 2D array, then apply a dot operation, and finally reshape back.
Is there any standard/simple operation in numpy to achieve this?
For example,
A = np.ones(12).reshape(3, 2, 2)
b = np.array([2, 3, 4])
The expected B is
[[[2, 2],
[2, 2]],
[[3, 3],
[3, 3]],
[[4, 4],
[4, 4]]]
What you are looking for is broadcasting. In short, reshape your array b so that it has length 1 in some dimensions, which gives you more control over what happens. The total number of elements in b remains unchanged, but you choose how the array behaves during the arithmetic operation:
A*b.reshape((3,1,1))
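A minimal sketch of the same idea without hard-coding N, using the arrays from the question (the B_* names are illustrative; the einsum line is an alternative spelling, not something the answer proposes):

```python
import numpy as np

A = np.ones(12).reshape(3, 2, 2)
b = np.array([2, 3, 4])

# Shape (3,) -> (3, 1, 1): b[i] scales the whole 2x2 block A[i].
B = A * b[:, np.newaxis, np.newaxis]

# reshape(-1, 1, 1) infers N from the length of b.
B_reshape = A * b.reshape(-1, 1, 1)

# einsum spelling of the same elementwise scaling.
B_einsum = np.einsum('ijk,i->ijk', A, b)

print(B)
```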
I have an array like
[1,3,4,5,6,2,1,,,,]
Now I want to change it to
[[1,3,4],[3,4,5],[4,5,6],[5,6,2],,,,]
How can I achieve this using NumPy? Is there a function to do so? Using a loop is not an option.
The np.lib.stride_tricks.as_strided function will do that.
The strides are (4, 4) for 32-bit integers and (8, 8) for 64-bit integers, so the code below derives them from the array itself instead of hard-coding them. The shape parameter determines the output array's dimensions.
> import numpy as np
> A = np.array([1,2,3,4,5,6]) # use an ndarray so its strides can be read
> n = 3 # output matrix has 3 columns
> m = len(A) - (n-1) # calculate number of output matrix rows using input array length
> strides_param = A.strides * 2 # (itemsize, itemsize): step one element along both axes
> np.lib.stride_tricks.as_strided(A, shape=(m,n), strides=strides_param)
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
Why make it complicated? You can use a list comprehension instead of a library. (Do you consider this a loop?)
x = [1,3,4,5,6,2,1]
y = [x[n:n+3] for n in range(len(x)-2)]
Result is:
[[1, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 2], [6, 2, 1]]
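As a hedged alternative to hand-written strides, newer NumPy versions provide sliding_window_view. This sketch assumes NumPy 1.20 or later; the name windows is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([1, 3, 4, 5, 6, 2, 1])

# Each row is a length-3 window starting one element after the previous row.
windows = sliding_window_view(x, 3)

print(windows)
```

The result is a read-only view onto x, so call .copy() on it if you need to modify the windows independently.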
Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array (a "row vector"). If you just want to loop over it, that's fine, but if you want to hstack it with another array of dimension 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector, so that you can use it in a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
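The difference comes down to the shapes. Here is a small sketch (variable names are illustrative) that also shows a slice, test[:, 0:1], as another way to keep the column axis:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]       # integer index drops the axis: shape (3,)
col_2d = test[:, [0]]     # list index keeps the axis: shape (3, 1), a copy
col_slice = test[:, 0:1]  # slice also keeps the axis: shape (3, 1), a view

# Both 2-D forms can be stacked next to the original array.
stacked = np.hstack((test, col_slice))

print(col_1d.shape, col_2d.shape, col_slice.shape)
print(stacked)
```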
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in column 1 (the second column) of the array
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it as a "row vector" (an array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect the arr), the number of byte-steps for traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes (with a 4-byte integer dtype: one full row of 2 elements)
arr_col1_copy.strides[0] # 4 bytes (one element)
see strides and this answer.
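To make the view/copy distinction concrete, here is a minimal sketch (names are illustrative). It expresses the strides in elements rather than bytes, so the numbers do not depend on the platform's default integer size:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

view = arr[:, 1]          # basic slicing returns a view
copy = arr[:, 1].copy()   # explicit copy owns its own buffer

view[0] = 99              # writes through to arr
copy[1] = -1              # leaves arr untouched

# Stride of each 1-D array, measured in elements instead of bytes.
view_step = view.strides[0] // arr.itemsize   # skips a whole row of arr
copy_step = copy.strides[0] // arr.itemsize   # contiguous

print(arr)
print(view_step, copy_step)
```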
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of column 1, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different stride lengths mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, it is not always true for the reason that making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create the A_col1_copy). However if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are ok with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
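The layout claims can be checked on a small stand-in array without timing anything. This is a sketch under assumed sizes (100 x 50 rather than 10000 x 10000), with illustrative variable names:

```python
import numpy as np

A = np.zeros((100, 50), dtype=np.int32)   # small stand-in for the big array
F = np.asfortranarray(A)                  # same values, column-major layout

# Step along a column, in elements: C order skips a full row, F order does not.
c_col_step = A[:, 1].strides[0] // A.itemsize
f_col_step = F[:, 1].strides[0] // F.itemsize

# Transposing only swaps shape and strides; no data is moved.
same_strides = A.T[1, :].strides == A[:, 1].strides
shares = np.shares_memory(A.T, A)

print(c_col_step, f_col_step, same_strides, shares)
```

A column step of 1 element means the column is contiguous in memory, which is consistent with the faster sum reported above for the copy and for the column-major array.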
To get several independent columns, just use:
> test[:,[0,2]]
You will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is a 2-dimensional array, and you can slice it to access whichever columns you want:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b