I have 2 numpy arrays:
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
b = np.array([2, 1, 2])
I want to use b as starting indices into the columns of a and set all the values of a from those column indexes onwards to 0 like this:
np.array([[1, 2, 3],
[4, 0, 6],
[0, 0, 0]])
i.e., set elements of column 1 from position 2 onwards to 0, set elements of column 2 from position 1 onwards to 0, and set elements of column 3 from position 2 onwards to 0.
When I try this:
a[:, b:] = 0
I get
TypeError: only integer scalar arrays can be converted to a scalar index
Is there a way to slice using an array of indices without a for loop?
Edit: updated the example to show the indices can be arbitrary
You can use boolean array indexing. First, create a mask of the positions you want to keep, then invert it and assign the replacement value (0 in your case) to everything else. Note that the row numbers come from a.shape[0]; using a.shape[1] only works when a is square.
mask = b > np.arange(a.shape[0])[:, None]
a[~mask] = 0
output:
array([[1, 2, 3],
[4, 0, 6],
[0, 0, 0]])
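To check that the masking idea carries over to non-square arrays, here is a minimal sketch. The helper name zero_from_row and the 4x3 test array are my own assumptions for illustration, not part of the question.

```python
import numpy as np

def zero_from_row(a, start_rows):
    """Return a copy of `a` with column j zeroed from row start_rows[j] downwards."""
    a = np.array(a)                      # copy, so the caller's array is untouched
    start_rows = np.asarray(start_rows)
    # Row numbers as a column vector, shape (n_rows, 1)
    rows = np.arange(a.shape[0])[:, None]
    # Broadcasts to (n_rows, n_cols): True where the row is at or past the start row
    a[rows >= start_rows] = 0
    return a

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
result = zero_from_row(a, [2, 1, 4])     # a start of 4 leaves the last column alone
print(result)
```

A start index equal to the number of rows simply leaves that column unchanged, because no row number reaches it.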
I think the issue is in a[:,b:]; here b: means little if b is not a scalar. For example, 5: means from index 5 (the 6th element) onwards, but [1,2,3]: has no meaning as a slice bound.
The valid form is a[:,b]. Setting a[:,b] = 0 will set all columns listed in b to 0. Here is the run:
In [2]: import numpy as np
In [3]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
...:
...: b = np.array([2, 1, 2])
...:
In [4]: a
Out[4]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [5]: b
Out[5]: array([2, 1, 2])
In [6]: b.dtype
Out[6]: dtype('int64')
In [7]: a[:, b:] = 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-6e5050513225> in <module>
----> 1 a[:, b:] = 0
TypeError: only integer scalar arrays can be converted to a scalar index
In [8]: a[:, b] = 0
In [9]: a
Out[9]:
array([[1, 0, 0],
[4, 0, 0],
[7, 0, 0]])
But that's not what you want.
To get what you want, you need to specify the row indices and column indices of every element to change, e.g. (1,1), (2,0), (2,1), (2,2).
In [11]: a[[1,2,2,2], [1, 0, 1,2]] = 0
In [12]: a
Out[12]:
array([[1, 2, 3],
[4, 0, 6],
[0, 0, 0]])
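The row and column lists above do not have to be typed by hand. As a rough sketch (assuming b holds one start row per column, as in the question), they can be derived from a broadcast comparison with np.nonzero:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([2, 1, 2])

# True where the row number is at or past the start row of that column
mask = np.arange(a.shape[0])[:, None] >= b
# Paired coordinates of the True cells, in row-major order
rows, cols = np.nonzero(mask)
print(rows, cols)

a[rows, cols] = 0
print(a)
```

This ends up equivalent to assigning through the boolean mask directly; the explicit coordinates are only useful if you need them for something else.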
I have a 2D array:
[[1,2,0,0],
[4,0,9,4],
[0,0,1,0],
[4,6,9,0]]
is there an efficient way (without using loops) to replace every first 0 in the array, with a 1:
[[1,2,1,0],
[4,1,9,4],
[1,0,1,0],
[4,6,9,1]]
?
Thanks a lot !
Here is a one-liner inspired by the accepted answer of this question:
a = np.array([
[1, 2, 0, 0],
[4, 0, 9, 4],
[0, 0, 1, 0],
[4, 6, 9, 0]
])
a[range(len(a)), np.argmax(a == 0, axis=1)] = 1
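One caveat with the argmax one-liner: for a row that contains no zero, argmax returns 0, so the first element of that row would be overwritten. A possible guard, sketched here on a made-up array whose middle row has no zero:

```python
import numpy as np

a = np.array([[1, 2, 0, 0],
              [4, 5, 9, 4],   # no zero in this row
              [0, 0, 1, 0]])

is_zero = (a == 0)
has_zero = is_zero.any(axis=1)        # rows that contain at least one zero
first_zero = is_zero.argmax(axis=1)   # column of the first zero (0 if there is none)

# Only touch rows that really contain a zero
rows = np.flatnonzero(has_zero)
a[rows, first_zero[rows]] = 1
print(a)
```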
So, you can use np.where to get the indices of the rows and columns where the array is 0:
In [45]: arr = np.array(
...: [[1,2,0,0],
...: [4,0,9,4],
...: [0,0,1,0],
...: [4,6,9,0]]
...: )
In [46]: r, c = np.where(arr == 0)
Then, use np.unique to get the unique row indices. With return_index=True it also returns the position of the first occurrence of each row in r, which corresponds to the first 0 in that row; use those positions to extract the corresponding column values:
In [47]: uniq_val, uniq_idx = np.unique(r, return_index=True)
In [48]: arr[uniq_val, c[uniq_idx]] = 1
In [49]: arr
Out[49]:
array([[1, 2, 1, 0],
[4, 1, 9, 4],
[1, 0, 1, 0],
[4, 6, 9, 1]])
If performance is really an issue, you could just write a numba function; I suspect this would be very amenable to numba.
I have a matrix mat like below;
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
and a list s = [1, 2, 5].
I have to subtract along axis=1. I did it as follows and it works:
mat - s = array([[ 0, 0, -2],
[ 3, 3, 1],
[ 6, 6, 4]])
However, if I subtract along axis=0;
ie,
mat - s[:,None]
I get errors.
TypeError: list indices must be integers or slices, not tuple
Here's a little hack:
s = np.array([1,2,5])
(mat.T - s).T
Output:
array([[0, 1, 2],
[2, 3, 4],
[2, 3, 4]])
Edit: calling .T on s does not change anything when s is 1d, so it is not needed. The two transposes on mat are still required.
You were on the right track with the use of [:,None], but your definition of s was wrong.
In [128]: mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [129]: s = [1, 2, 5]
In [130]: mat - s
Out[130]:
array([[ 0, 0, -2],
[ 3, 3, 1],
[ 6, 6, 4]])
In this subtraction, s has automatically been 'promoted' to numpy array.
The [:, None] indexing does not work with a list:
In [131]: s[:,None]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-131-bafcfb7b67c1> in <module>
----> 1 s[:,None]
TypeError: list indices must be integers or slices, not tuple
The tuple in this error is the comma expression: s[:, None] is the same as s[(slice(None), None)]. The Python interpreter passes a tuple to the s.__getitem__ method. numpy arrays handle tuples (multidimensional indexing), lists don't.
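To see the tuple for yourself, here is a small sketch with a throwaway class (KeyProbe is a hypothetical name of mine, not something from numpy) that just returns whatever key it is indexed with:

```python
class KeyProbe:
    """Hypothetical helper: hands back the key that indexing passes in."""
    def __getitem__(self, key):
        return key

probe = KeyProbe()
key = probe[:, None]
print(key)            # a tuple holding a slice object and None

# A plain list receives the same tuple and rejects it
s = [1, 2, 5]
try:
    s[:, None]
except TypeError as exc:
    message = str(exc)
    print(message)
```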
If we start with an array, then we can apply the reshape, and perform the desired subtraction:
In [132]: sa = np.array(s)
In [133]: sa
Out[133]: array([1, 2, 5])
In [134]: sa[:,None]
Out[134]:
array([[1],
[2],
[5]])
In [135]: mat - sa[:,None]
Out[135]:
array([[0, 1, 2],
[2, 3, 4],
[2, 3, 4]])
sa is 1d, so transpose doesn't change anything:
In [136]: sa.T
Out[136]: array([1, 2, 5])
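Putting both directions side by side, a short sketch of how the shapes line up (the names per_column and per_row are just labels I chose):

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
s = [1, 2, 5]

sa = np.asarray(s)            # shape (3,)
# Shape (3,) lines up with the last axis: s[j] is subtracted from column j
per_column = mat - sa
# Shape (3, 1) lines up with the first axis: s[i] is subtracted from row i
per_row = mat - sa[:, None]

print(per_column)
print(per_row)
```

The double-transpose version, (mat.T - sa).T, gives the same result as per_row for a 2d mat.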
I'm working with 2D numpy arrays which exhibit variable sizes, in terms of the number of rows and columns. I'd like to pad this array with zeros both before the start of the first row and at the end of the last row, but I'd like the start/end of the zeros to be offset in a different way for each column of data.
So the original 2D array:
1 2 3
4 5 6
7 8 9
A Normal example of padding:
0 0 0
0 0 0
1 2 3
4 5 6
7 8 9
0 0 0
Modified Padding with offsets (what I'm trying to do):
0 0 0
1 0 0
4 0 3
7 2 6
0 5 9
0 8 0
Does numpy possess any functions which can replicate the last example in an extendable manner for variable numbers of rows/columns, while avoiding for loops and other computationally slow approaches?
Here's a vectorized one with broadcasting and boolean-indexing -
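def create_padded_array(a, row_start, n_rows):
    # Output row numbers as a column vector, shape (n_rows, 1)
    r = np.arange(n_rows)[:,None]
    row_start = np.asarray(row_start)
    # True where an output cell lies inside its column's window of a.shape[0] rows
    mask = (r >= row_start) & (r < row_start+a.shape[0])
    out = np.zeros(mask.shape, dtype=a.dtype)
    # Fill column by column: the transposed mask order matches a's column-major ravel
    out.T[mask.T] = a.ravel('F')
    return out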
Sample run -
In [184]: a
Out[184]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [185]: create_padded_array(a, row_start=[1,3,2], n_rows=6)
Out[185]:
array([[0, 0, 0],
[1, 0, 0],
[4, 0, 3],
[7, 2, 6],
[0, 5, 9],
[0, 8, 0]])
Sorry for the trouble, but I think I found the answer that I was looking for.
I can use numpy.pad to create an arbitrary number of filler zeros at the end of my original array. There is also a function called numpy.roll which can then be used to shift all array elements along a given axis by a set number of positions down the column.
After a quick test, it looks like this is extendable for an arbitrary number of matrix elements and allows a unique offset along each column.
Thanks to everyone for their responses to this question!
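For anyone who wants to see the pad-then-roll idea spelled out, here is a rough sketch. The offsets and n_rows values are assumptions chosen to reproduce the example in the question. Note that np.roll applies one shift to a whole axis, so this version still loops over the columns (not over the elements).

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
offsets = [1, 3, 2]   # assumed: the row where each column's data should start
n_rows = 6            # assumed: total number of rows in the result

# Append zero rows at the bottom so every column has room to shift down
padded = np.pad(a, ((0, n_rows - a.shape[0]), (0, 0)), mode='constant')

# Shift each column down by its own offset
for j, shift in enumerate(offsets):
    padded[:, j] = np.roll(padded[:, j], shift)

print(padded)
```

np.roll wraps around, so an offset larger than n_rows - a.shape[0] would push data back to the top of the column; that case is not handled here.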
To my knowledge there is no such numpy function with those exact specific requirements, however what you can do is have your array:
In [10]: arr = np.array([(1,2,3),(4,5,6),(7,8,9)])
In [11]: arr
Out[11]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Then pad it:
In [12]: arr = np.pad(arr, ((2,1),(0,0)), 'constant', constant_values=(0))
In [13]: arr
Out[13]:
array([[0, 0, 0],
[0, 0, 0],
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[0, 0, 0]])
Then you can randomize with shuffle (which I assume is what you want to do). Note that np.random.shuffle only shuffles whole rows; if that is satisfactory for your needs, then:
In [14]: np.random.shuffle(arr)
In [15]: arr
Out[15]:
array([[7, 8, 9],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[1, 2, 3]])
If this is not satisfactory you can do this:
First create a 1D array:
In [16]: arr = np.arange(1,10)
In [17]: arr
Out[17]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Then pad your array with zeros:
In [18]: arr = np.pad(arr, (6,3), 'constant', constant_values = (0))
In [19]: arr
Out[19]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0])
Then you shuffle the array:
In [20]: np.random.shuffle(arr)
In [21]: arr
Out[21]: array([4, 0, 0, 5, 0, 0, 3, 0, 0, 0, 8, 0, 7, 2, 1, 6, 0, 9])
Finally you reshape to the desired format:
In [22]: np.reshape(arr,[6,3])
Out[22]:
array([[4, 0, 0],
[5, 0, 0],
[3, 0, 0],
[0, 8, 0],
[7, 2, 1],
[6, 0, 9]])
Although this may seem lengthy, it should be much quicker on large data sets than for loops or other Python control structures. If by offsets you mean controlling the amount of randomness, you can shuffle only a portion of the 1D array and then combine it with the rest of the data, so that only the part you want shuffled is shuffled.
(If what you mean by offsets is different from my assumption above please clarify in a comment)
for example, I have the numpy arrays like this
a =
array([[1, 2, 3],
[4, 3, 2]])
and index like this to select the max values
max_idx =
array([[0, 2],
[1, 0]])
How can I access these positions at the same time, in order to modify them?
Something like "a[max_idx] = 0", giving the following:
array([[1, 2, 0],
[0, 3, 2]])
Simply use subscripted-indexing -
a[max_idx[:,0],max_idx[:,1]] = 0
If you are working with higher dimensional arrays and don't want to type out slices of max_idx for each axis, you can use linear-indexing to assign zeros, like so (this relies on a.ravel() returning a view, which is the case when a is contiguous) -
a.ravel()[np.ravel_multi_index(max_idx.T,a.shape)] = 0
Sample run -
In [28]: a
Out[28]:
array([[1, 2, 3],
[4, 3, 2]])
In [29]: max_idx
Out[29]:
array([[0, 2],
[1, 0]])
In [30]: a[max_idx[:,0],max_idx[:,1]] = 0
In [31]: a
Out[31]:
array([[1, 2, 0],
[0, 3, 2]])
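Another way to avoid typing one slice per axis is to turn the columns of the index array into a tuple. This is only a sketch; the 3d array c and its index array idx are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 3, 2]])
max_idx = np.array([[0, 2],
                    [1, 0]])

# Each row of max_idx.T holds the indices for one axis
a[tuple(max_idx.T)] = 0
print(a)

# The same expression works unchanged in three dimensions
c = np.arange(8).reshape(2, 2, 2)
idx = np.array([[0, 1, 1],
                [1, 0, 0]])
c[tuple(idx.T)] = -1
print(c)
```

Unlike the ravel-based version, this does not depend on the array being contiguous.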
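NumPy supports advanced (integer array) indexing like this:
a[b[:, 0], b[:, 1]] = 0
The code above should fit your requirement.
If b has more than two columns (that is, a has more than two dimensions), a more general way is:
a[tuple(np.split(b, b.shape[1], axis=1))] = 0
np.split splits the ndarray into its columns; wrapping the result in a tuple makes NumPy treat each column as the index for one axis.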
I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This gives you each index, in the format [row, column], where the condition was met.
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of where(M == 0) would look something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are and the second matrix tells you the columns where the 0s are.
In [4]: M
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new on what's been already suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
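If M is a plain ndarray, the list comprehension can probably be replaced by a single argmax along axis 0. This is only a sketch and it assumes you want the first zero in each column; the found array is an extra check I added for columns that contain no zero at all.

```python
import numpy as np

M = np.array([[0, 1, 0],
              [4, 2, 4],
              [3, 4, 1],
              [1, 3, 2],
              [2, 0, 3]])

is_zero = (M == 0)
idx = is_zero.argmax(axis=0)    # row of the first zero in each column
found = is_zero.any(axis=0)     # False for a column without any zero

print(idx)
print(found)
```

For a column with no zero, argmax returns 0, so found is what tells such a column apart from one whose zero really is in row 0.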