I am trying to extract several values at once from an array but I can't seem to find a way to do it in a one-liner in Numpy.
Simply put, considering an array:
a = numpy.arange(10)
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
I would like to be able to extract, say, 2 values, skip the next 2, extract the 2 following values etc. This would result in:
array([0, 1, 4, 5, 8, 9])
This is an example but I am ideally looking for a way to extract x values and skip y others.
I thought this could be done with slicing, doing something like:
a[:2:2]
but it only returns 0, which is the expected behavior.
I know I could obtain the expected result by combining several slicing operations (similarly to Numpy Array Slicing) but I was wondering if I was not missing some numpy feature.
If you want to avoid creating copies and allocating new memory, you could use a window_view of two elements:
win = np.lib.stride_tricks.sliding_window_view(a, 2)
array([[0, 1],
[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9]])
And then only take every 4th window view:
win[::4].ravel()
array([0, 1, 4, 5, 8, 9])
Or directly go with the more dangerous as_strided, but heed the warnings in the documentation:
np.lib.stride_tricks.as_strided(a, shape=(3,2), strides=(32,8))
You can use a modulo operator:
x = 2 # keep
y = 2 # skip
out = a[np.arange(a.shape[0])%(x+y)<x]
Output: array([0, 1, 4, 5, 8, 9])
Output with x = 2 ; y = 3:
array([0, 1, 5, 6])
Related
I am trying to efficiently update some elements of a numpy array A, using another array b to indicate the indexes of the elements of A to be updated. However b can contain duplicates which are ignored whereas I would like to be taken into account. I would like to avoid for looping b. To illustrate it:
>>> A = np.arange(10).reshape(2,5)
>>> A[0, np.array([1,1,1,2])] += 1
>>> A
array([[0, 2, 3, 3, 4],
[5, 6, 7, 8, 9]])
whereas I would like the output to be:
array([[0, 3, 3, 3, 4],
[5, 6, 7, 8, 9]])
Any ideas?
To correctly handle the duplicate indices, you'll need to use np.add.at instead of +=. Therefore to update the first row of A, the simplest way would probably be to do the following:
>>> np.add.at(A[0], [1,1,1,2], 1)
>>> A
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
The documents for the ufunc.at method can be found here.
One approach is to use numpy.histogram to find out how many values there are at each index, then add the result to A:
A[0, :] += np.histogram(np.array([1,1,1,2]), bins=np.arange(A.shape[1]+1))[0]
The [::n] indexing option in numpy provides a very useful way to index every nth item in a list. However, is it possible to use this feature to extract multiple values, e.g. every other pair of values?
For example:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
And I want to extract every other pair of values i.e. I want to return
a[0, 1, 4, 5, 8, 9,]
Of course the index could be built using loops or something, but I wonder if there's a faster way to use ::-style indexing in numpy but also specifying the width of the pattern to take every nth iteration of.
Thanks
With length of array being a multiple of the window size -
In [29]: W = 2 # window-size
In [30]: a.reshape(-1,W)[::2].ravel()
Out[30]: array([0, 1, 4, 5, 8, 9])
Explanation with breaking-down-the-steps -
# Reshape to split into W-sized groups
In [43]: a.reshape(-1,W)
Out[43]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
# Use stepsize to select every other pair starting from the first one
In [44]: a.reshape(-1,W)[::2]
Out[44]:
array([[0, 1],
[4, 5],
[8, 9]])
# Flatten for desired output
In [45]: a.reshape(-1,W)[::2].ravel()
Out[45]: array([0, 1, 4, 5, 8, 9])
If you are okay with 2D output, skip the last step as that still be a view into the input and virtually free on runtime. Let's verify the view-part -
In [47]: np.shares_memory(a,a.reshape(-1,W)[::2])
Out[47]: True
For generic case of not necessarily a multiple, we can use a masking based one -
In [64]: a[(np.arange(len(a))%(2*W))<W]
Out[64]: array([0, 1, 4, 5, 8, 9])
You can do that reshaping the array into a nx3 matrix, then slice up the first two elements for each row and finally flatten up the reshaped array:
a.reshape((-1,3))[:,:2].flatten()
resulting in:
array([ 0, 1, 3, 4, 6, 7, 9, 10])
I have a NumPy array, for example:
>>> import numpy as np
>>> x = np.random.randint(0, 10, size=(5, 5))
>>> x
array([[4, 7, 3, 7, 6],
[7, 9, 5, 7, 8],
[3, 1, 6, 3, 2],
[9, 2, 3, 8, 4],
[0, 9, 9, 0, 4]])
Is there a way to get a view (or copy) that contains indices 1:3 of the first row, indices 2:4 of the second row and indices 3:5 of the forth row?
So, in the above example, I wish to get:
>>> # What to write here?
array([[7, 3],
[5, 7],
[8, 4]])
Obviously, I would like a general method that would work efficiently also for multi-dimensional large arrays (and not only for the toy example above).
Try:
>>> np.array([x[0, 1:3], x[1, 2:4], x[3, 3:5]])
array([[7, 3],
[5, 7],
[8, 4]])
You can use numpy.lib.stride_tricks.as_strided as long as the offsets between rows are uniform:
# How far to step along the rows
offset = 1
# How wide the chunk of each row is
width = 2
view = np.lib.stride_tricks.as_strided(x, shape=(x.shape[0], width), strides=(x.strides[0] + offset * x.strides[1],) + x.strides[1:])
The result is guaranteed to be a view into the original data, not a copy.
Since as_strided is ridiculously powerful, be very careful how you use it. For example, make absolutely sure that the view does not go out of bounds in the last few rows.
If you can avoid it, try not to assign anything into a view returned by as_strided. Assignment just increases the dangers of unpredictable behavior and crashing a thousandfold if you don't know exactly what you're doing.
I guess something like this :D
In:
import numpy as np
x = np.random.randint(0, 10, size=(5, 5))
Out:
array([[7, 3, 3, 1, 9],
[6, 1, 3, 8, 7],
[0, 2, 2, 8, 4],
[8, 8, 1, 8, 8],
[1, 2, 4, 3, 4]])
In:
list_of_indicies = [[0,1,3], [1,2,4], [3,3,5]] #[row, start, stop]
def func(array, row, start, stop):
return array[row, start:stop]
for i in range(len(list_of_indicies)):
print(func(x,list_of_indicies[i][0],list_of_indicies[i][1], list_of_indicies[i][2]))
Out:
[3 3]
[3 8]
[3 4]
So u can modify it for your needs. Good luck!
I would extract diagonal vectors and stack them together, like this:
def diag_slice(x, start, end):
n_rows = min(*x.shape)-end+1
columns = [x.diagonal(i)[:n_rows, None] for i in range(start, end)]
return np.hstack(columns)
In [37]: diag_slice(x, 1, 3)
Out[37]:
array([[7, 3],
[5, 7],
[3, 2]])
For the general case it will be hard to beat a row by row list comprehension:
In [28]: idx = np.array([[0,1,3],[1,2,4],[4,3,5]])
In [29]: [x[i,j:k] for i,j,k in idx]
Out[29]: [array([7, 8]), array([2, 0]), array([9, 2])]
If the resulting arrays are all the same size, they can be combined into one 2d array:
In [30]: np.array(_)
Out[30]:
array([[7, 8],
[2, 0],
[9, 2]])
Another approach is to concatenate the indices before. I won't get into the details, but create something like this:
In [27]: x[[0,0,1,1,3,3],[1,2,2,3,3,4]]
Out[27]: array([7, 8, 2, 0, 3, 8])
Selecting from different rows complicates this 2nd approach. Conceptually the first is simpler. Past experience suggests the speed is about the same.
For uniform length slices, something like the as_strided trick may be faster, but it requires more understanding.
Some masking based approaches have also been suggested. But the details are more complicated, so I'll leave those to people like #Divakar who have specialized in them.
Someone has already pointed out the as_strided tricks, and yes, you should really use it with caution.
Here is a broadcast / fancy index approach which is less efficient than as_strided but still works pretty well IMO
window_size, step_size = 2, 1
# index within window
index = np.arange(2)
# offset
offset = np.arange(1, 4, step_size)
# for your case it's [0, 1, 3], I'm not sure how to generalize it without further information
fancy_row = np.array([0, 1, 3]).reshape(-1, 1)
# array([[1, 2],
# [2, 3],
# [3, 4]])
fancy_col = offset.reshape(-1, 1) + index
x[fancy_row, fancy_col]
I am trying to efficiently update some elements of a numpy array A, using another array b to indicate the indexes of the elements of A to be updated. However b can contain duplicates which are ignored whereas I would like to be taken into account. I would like to avoid for looping b. To illustrate it:
>>> A = np.arange(10).reshape(2,5)
>>> A[0, np.array([1,1,1,2])] += 1
>>> A
array([[0, 2, 3, 3, 4],
[5, 6, 7, 8, 9]])
whereas I would like the output to be:
array([[0, 3, 3, 3, 4],
[5, 6, 7, 8, 9]])
Any ideas?
To correctly handle the duplicate indices, you'll need to use np.add.at instead of +=. Therefore to update the first row of A, the simplest way would probably be to do the following:
>>> np.add.at(A[0], [1,1,1,2], 1)
>>> A
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
The documents for the ufunc.at method can be found here.
One approach is to use numpy.histogram to find out how many values there are at each index, then add the result to A:
A[0, :] += np.histogram(np.array([1,1,1,2]), bins=np.arange(A.shape[1]+1))[0]
To permute a 1D array A I know that you can run the following code:
import numpy as np
A = np.random.permutation(A)
I have a 2D array and want to apply exactly the same permutation for every row of the array. Is there any way you can specify the numpy to do that for you?
Generate random permutations for the number of columns in A and index into the columns of A, like so -
A[:,np.random.permutation(A.shape[1])]
Sample run -
In [100]: A
Out[100]:
array([[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]])
In [101]: A[:,np.random.permutation(A.shape[1])]
Out[101]:
array([[7, 5, 7, 4, 3],
[3, 5, 2, 0, 2],
[8, 4, 3, 8, 1]])
Actually you do not need to do this, from the documentation:
If x is a multi-dimensional array, it is only shuffled along its first
index.
So, taking Divakar's array:
a = np.array([
[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]
])
you can just do: np.random.permutation(a) and get something like:
array([[2, 5, 2, 0, 3],
[3, 5, 7, 4, 7],
[1, 4, 3, 8, 8]])
P.S. if you need to perform column permutations - just do np.random.permutation(a.T).T. Similar things apply to multi-dim arrays.
It depends what you mean on every row.
If you want to permute all values (regardless of row and column), reshape your array to 1d, permute, reshape back to 2d.
If you want to permutate each row but not shuffle the elements among the different columns you need to loop trough the one axis and call permutation.
for i in range(len(A)):
A[i] = np.random.permutation(A[i])
It can probably done shorter somehow but that is how it can be done.