slice a 3d numpy array using a 2d numpy array - python

Is it possible to slice a 3D array using a 2D array? I'm assuming it can be done, but would it require that you specify the axis?
If I have 3 arrays, such that:
A = [[1,2,3,4,5],
     [1,3,5,7,9],
     [5,4,3,2,1]]  # shape (3,5)
B1 = [[1],
      [2],
      [3]]  # shape (3,1)
B2 = [[4],
      [3],
      [4]]  # shape (3,1)
Is it possible to slice A using B1 and B2, like:
Out = A[B1:B2]
so that it would return me:
Out = [[2,3,4,5],
       [5,7],
       [2,1]]
or would this not work if the slices created arrays in Out of different lengths?

NumPy is optimized for homogeneous arrays of numbers with fixed dimensions, so it does not support varying row or column sizes.
However, you can achieve what you want by using a list of arrays:
Out = [A[i, B1[i]:B2[i]+1] for i in range(len(B1))]
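A minimal runnable version of that, as a sketch assuming A, B1 and B2 are NumPy arrays as in the question:
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [1, 3, 5, 7, 9],
              [5, 4, 3, 2, 1]])
B1 = np.array([[1], [2], [3]])
B2 = np.array([[4], [3], [4]])

# one 1D slice per row; B1[i, 0] / B2[i, 0] pull out plain integers
Out = [A[i, B1[i, 0]:B2[i, 0] + 1] for i in range(len(B1))]
print(Out)  # [array([2, 3, 4, 5]), array([5, 7]), array([2, 1])]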

Here's one approach using vectorization -
n_range = np.arange(A.shape[1])
elems = A[(n_range >= B1) & (n_range <= B2)]
idx = (B2 - B1 + 1).ravel().cumsum()
out = np.split(elems, idx)[:-1]
The trick is to use broadcasting to create a mask of the elements to be selected for the output. Then we split the array of those elements at the specified positions to get a list of arrays.
Sample input, output -
In [37]: A
Out[37]:
array([[1, 2, 3, 4, 5],
       [1, 3, 5, 7, 9],
       [5, 4, 3, 2, 1]])
In [38]: B1
Out[38]:
array([[1],
       [2],
       [3]])
In [39]: B2
Out[39]:
array([[4],
       [3],
       [4]])
In [40]: out
Out[40]: [array([2, 3, 4, 5]), array([5, 7]), array([2, 1])]
# Please note that the output is a list of arrays
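Putting the whole approach into one self-contained sketch (assuming the same A, B1, B2 as above, with B1 and B2 as (3,1) arrays so they broadcast against the column range):
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [1, 3, 5, 7, 9],
              [5, 4, 3, 2, 1]])
B1 = np.array([[1], [2], [3]])
B2 = np.array([[4], [3], [4]])

n_range = np.arange(A.shape[1])           # column indices, shape (5,)
mask = (n_range >= B1) & (n_range <= B2)  # broadcasts to shape (3,5)
elems = A[mask]                           # selected elements in row-major order
idx = (B2 - B1 + 1).ravel().cumsum()      # split point after each row's run
out = np.split(elems, idx)[:-1]           # drop the trailing empty array
print(out)  # [array([2, 3, 4, 5]), array([5, 7]), array([2, 1])]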

Your desired result has a different number of terms in each row - that's a strong indicator that a fully vectorized solution is not possible. It is not doing the same thing for each row or each column.
Secondly, n:m translates to slice(n,m). slice only takes integers, not lists or arrays.
The obvious solution is some sort of iteration over rows:
In [474]: A = np.array([[1,2,3,4,5],
                        [1,3,5,7,9],
                        [5,4,3,2,1]])  # shape (3,5)
In [475]: B1=[1,2,3] # no point in making these 2d
In [476]: B2=[5,4,5] # corrected values
In [477]: [a[b1:b2] for a,b1,b2 in zip(A,B1,B2)]
Out[477]: [array([2, 3, 4, 5]), array([5, 7]), array([2, 1])]
This solution works just as well if A is a nested list:
In [479]: [a[b1:b2] for a,b1,b2 in zip(A.tolist(),B1,B2)]
Out[479]: [[2, 3, 4, 5], [5, 7], [2, 1]]
The two lists could also be converted to an array of 1d indices, which could then be used to select values from A.ravel(). That would produce a 1d array, e.g.
array([2, 3, 4, 5, 5, 7, 2, 1])
which in theory could be np.split - but recent experience with other questions indicates that this doesn't save much time.
If the length of the row selections were all the same we can get a 2d array. Iterative version taking 2 elements per row:
In [482]: np.array([a[b1:b1+2] for a,b1 in zip(A,B1)])
Out[482]:
array([[2, 3],
       [5, 7],
       [2, 1]])
I've discussed in earlier SO questions how to produce this sort of result with one indexing operation.
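A sketch of that one-operation indexing, using broadcast row and column indices (assuming the fixed length-2 windows above):
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [1, 3, 5, 7, 9],
              [5, 4, 3, 2, 1]])
B1 = np.array([1, 2, 3])

rows = np.arange(A.shape[0])[:, None]  # (3,1) row indices
cols = B1[:, None] + np.arange(2)      # (3,2) per-row column windows
print(A[rows, cols])
# [[2 3]
#  [5 7]
#  [2 1]]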
On what slice accepts:
In [486]: slice([1,2],[3,4]).indices(10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-486-0c3514e61cf6> in <module>()
----> 1 slice([1,2],[3,4]).indices(10)
TypeError: slice indices must be integers or None or have an __index__ method
'vectorized' ravel indexing
In [505]: B=np.array([B1,B2])
In [506]: bb=A.shape[1]*np.arange(3)+B
In [508]: ri = np.r_[tuple([slice(i,j) for i,j in bb.T])]
# or np.concatenate([np.arange(i,j) for i,j in bb.T])
In [509]: ri
Out[509]: array([ 1, 2, 3, 4, 7, 8, 13, 14])
In [510]: A.ravel()[ri]
Out[510]: array([2, 3, 4, 5, 5, 7, 2, 1])
It still has an iteration: generating the slices that go into np.r_ (which expands them into a single indexing array).

Related

How to apply a function on jagged Numpy arrays (unequal row lengths) without using np.apply_along_axis()?

I'm trying to speed up a process; I think this might be possible using numpy's apply_along_axis. The problem is that not all my rows have the same length.
When I do:
a = np.array([[1, 2, 3],
              [2, 3, 4],
              [4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
This works fine. But I would like to do something similar to (please note that the first row has 4 elements and the rest have 3):
a = np.array([[1, 2, 3, 4],
              [2, 3, 4],
              [4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
But this fails because:
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I've looked around and the only 'solution' I've found is to add zeros to make all the arrays the same length, which would probably defeat the purpose of performance improvement.
Is there any way to use numpy.apply_along_axis on a non-regular shaped numpy array?
You can transform your initial array of iterable-objects to ndarray by padding them with zeros in a vectorized manner:
import numpy as np
a = np.array([[1, 2, 3, 4],
              [2, 3, 4],
              [4, 5, 6]])
# note: recent NumPy versions require dtype=object when building a ragged array like this
max_len = len(max(a, key=lambda x: len(x)))  # max length of the iterables contained in the array
cust_func = np.vectorize(pyfunc=lambda x: np.pad(array=x,
                                                 pad_width=(0, max_len),
                                                 mode='constant',
                                                 constant_values=(0, 0))[:max_len],
                         otypes=[list])
a_pad = np.stack(cust_func(a))
output:
array([[1, 2, 3, 4],
       [2, 3, 4, 0],
       [4, 5, 6, 0]])
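With the rows padded to a common length, the apply_along_axis call from the question works again (a quick check; note the zero padding contributes to the sum):
b = np.apply_along_axis(sum, 1, a_pad)
print(b)  # [10  9 15]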
It depends.
Do you know the size of the vectors beforehand, or are you appending to a list?
See e.g. http://stackoverflow.com/a/58085045/7919597
You could, for example, pad the arrays:
import numpy as np
a1 = [1, 2, 3, 4]
a2 = [2, 3, 4, np.nan] # pad with nan
a3 = [4, 5, 6, np.nan] # pad with nan
b = np.stack([a1, a2, a3], axis=0)
print(b)
# you can apply the normal numpy operations on
# arrays with nan, they usually just result in a nan
# in a resulting array
c = np.diff(b, axis=-1)
print(c)
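As an aside, nan-aware reductions such as np.nansum skip the padded entries, which is often why nan is preferred over zero padding (continuing from b above):
print(np.nansum(b, axis=1))  # [10.  9. 15.]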
Afterwards you can apply a moving window on each row over the columns.
Have a look at https://stackoverflow.com/a/22621523/7919597 which is only 1d, but can give you an idea of how it could work.
It is possible to use a 2d array with only one row as a kernel (shape e.g. (1, 3)) with scipy.signal.convolve2d and use the idea above.
This is a workaround to get a "row-wise 1D convolution":
from scipy import signal
krnl = np.array([[0, 1, 0]])
d = signal.convolve2d(c, krnl, mode='same')
print(d)

What's the difference between shape (150,) and shape (150,1)?

What's the difference between shape (150,) and shape (150,1)?
I think they are the same; I mean, they both represent a column vector.
Both have the same values, but one is a 1D vector and the other is a 2D matrix holding that vector. Here's an example:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([[1], [2], [3], [4], [5]])
print(x.shape)
print(y.shape)
And the output is:
(5,)
(5, 1)
Although they both occupy the same space and positions in memory, regarding:
"I think they are the same, I mean they both represent a column vector."
No, they are not, and certainly not according to NumPy (ndarrays).
The main difference is that
shape (150,) => is a 1D array, whereas
shape (150,1) => is a 2D array
Questions like this seem to come from two misconceptions:
not realizing that (5,) is a 1-element tuple, and
expecting MATLAB-like matrices.
Make an array with the handy arange function:
In [424]: x = np.arange(5)
In [425]: x.shape
Out[425]: (5,) # 1 element tuple
In [426]: x.ndim
Out[426]: 1
numpy does not automatically make matrices, 2d arrays. It does not follow MATLAB in that regard.
We can reshape that array, adding a 2nd dimension. The result is a view (sooner or later you need to learn what that means):
In [427]: y = x.reshape(5,1)
In [428]: y.shape
Out[428]: (5, 1)
In [429]: y.ndim
Out[429]: 2
The display of these 2 arrays is very different. Same numbers, but the layout and number of brackets is very different, reflecting the respective shapes:
In [430]: x
Out[430]: array([0, 1, 2, 3, 4])
In [431]: y
Out[431]:
array([[0],
       [1],
       [2],
       [3],
       [4]])
The shape difference may seem academic - until you try to do math with the arrays:
In [432]: x+x
Out[432]: array([0, 2, 4, 6, 8]) # element wise sum
In [433]: x+y
Out[433]:
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
How did that end up producing a (5,5) array? Broadcasting a (5,) array with a (5,1) array!
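A quick way to move between the two shapes, as a minimal sketch:
import numpy as np

x = np.arange(5)           # shape (5,)
col = x[:, np.newaxis]     # shape (5, 1): the same data viewed as a column
flat = col.ravel()         # back to shape (5,)
print(col.shape, flat.shape)  # (5, 1) (5,)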

Numpy.where used with list of values

I have a 2D array and a 1D array. For each value in the 1D array, I am looking to find the two rows of the 2D array that contain it, as follows:
import numpy as np
A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])
results = []
for elem in B:
    results.append(np.where(A == elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values) I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you need to know whether each row of A contains ANY element of array B, without interest in which particular element of B it is, the following script can be used:
input:
np.isin(A,B).sum(axis=1)>0
output:
array([ True, False, True, True, True, True])
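For instance, np.where turns that boolean mask into row indices (a sketch assuming A and B from the question; .any(axis=1) is equivalent to .sum(axis=1) > 0):
rows = np.where(np.isin(A, B).any(axis=1))[0]
print(rows)  # [0 2 3 4 5]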
Since you're dealing with a 2D array*, you can use broadcasting to compare B with a raveled version of A. This gives you the respective indices in raveled form. Then you can reverse the operation and get the corresponding indices in the original array using np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
# the first array above holds the expected row indices
* From documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
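If you want the per-value row lists the question asked for, here is a hedged sketch building on the same broadcast comparison (the grouping step is an addition, not part of the original answer; A and B as in the question):
import numpy as np

matches = B[:, None] == A.ravel()           # shape (len(B), A.size)
rows, _ = np.unravel_index(np.where(matches)[1], A.shape)
counts = np.count_nonzero(matches, axis=1)  # matches per element of B
groups = np.split(rows, counts.cumsum()[:-1])
print(groups)  # [array([0, 5]), array([0, 3]), array([2, 4]), array([0, 2])]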
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])
for i in combinations(A, 2):
    if np.all(np.isin(B, np.hstack(i))):
        print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
note: this solution does NOT require the rows to be consecutive. Please let me know if that is required.

Numpy slice to indices

Let's say I have a 3x3 matrix. The 1D indices of this matrix are:
0 1 2
3 4 5
6 7 8
Is there a function that receives a slice and returns the 1D indices, whatever the dimension? Something like:
m = np.ones((3, 3))
id1 = some_function(m, (1, :)) # [3, 4, 5]
id2 = some_function(m, (:, 1)) # [1, 4, 7]
# Use the indices together
m[id1 + id2] = whatever
m[~(id1 + id2)] = whatever_else
I don't want to code it because I'm sure it exists somewhere in numpy! For those who wonder why I want that, it's because I want to merge several slices together, use not (~) on the indices, etc.
ravel_multi_index returns the 1d equivalent of an n-d indexing tuple:
In [208]: np.ravel_multi_index(([1],[0,1,2]),(3,3))
Out[208]: array([3, 4, 5], dtype=int32)
In [209]: np.ravel_multi_index(([0,1,2],[1]),(3,3))
Out[209]: array([1, 4, 7], dtype=int32)
For more complex indexing we may need to use ix_ to get index broadcasting right:
In [214]: np.ravel_multi_index((np.ix_([0,1,2],[1,2])),(3,3))
Out[214]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
Now we just need to turn [1,:] into that tuple. Something in indexing_tricks should do that.
In [222]: np.ravel_multi_index((np.ix_(np.r_[0:3],[1,2])),(3,3))
Out[222]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
In [223]: np.ravel_multi_index((np.ix_([1],np.r_[0:3])),(3,3))
Out[223]: array([[3, 4, 5]], dtype=int32)
In a more general case we'd want to use m.shape instead of (3,3).
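Putting the pieces together, a sketch of the some_function the question asks for (the helper name and the per-axis index interface are assumptions, built on ravel_multi_index and ix_ as shown above):
import numpy as np

def slice_to_indices(m, *index):
    # hypothetical helper: map per-axis indices/slices to raveled 1D indices
    axes = []
    for s, size in zip(index, m.shape):
        if isinstance(s, slice):
            axes.append(np.arange(*s.indices(size)))  # resolve slice against axis length
        else:
            axes.append(np.atleast_1d(s))
    return np.ravel_multi_index(np.ix_(*axes), m.shape).ravel()

m = np.ones((3, 3))
id1 = slice_to_indices(m, 1, slice(None))  # row 1 -> [3 4 5]
id2 = slice_to_indices(m, slice(None), 1)  # column 1 -> [1 4 7]
print(id1, id2)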
~ works on boolean masks, not indices. So to 'delete' the [1] element from an array, we can do:
In [225]: mask = np.ones((3,),bool)
In [226]: mask[1] = False # index to delete
In [227]: np.arange(3)[mask]
Out[227]: array([0, 2])
This is essentially what np.delete does.
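For comparison, np.delete gives the same result directly:
import numpy as np
print(np.delete(np.arange(3), 1))  # [0 2]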

numpy.searchsorted with 2D array

I have this numpy array where the values in each row will always be sorted and monotonically increasing:
a = np.array([[1, 2, 3, 4, 8],
              [2, 5, 6, 7, 8],
              [5, 7, 11, 12, 13]])
and I want to search for the following values (which are NOT sorted or monotonic) for each row:
b = np.array([4.5, 2.3, 11.6])
so that I get an answer of:
[4, 1, 3]
However, searchsorted does not support this (it feels like it needs an axis keyword).
Is there an EFFICIENT way I can do this for a very large array? Obviously with a for loop I can index the array a and b like this:
for i in range(len(a)):
    print(a[i].searchsorted(b[i]))
but this is slow when a is large.
Is there a way to do this in numpy that is more efficient?
You can searchsorted on the ravel/flattened array:
In [11]: np.searchsorted(a.ravel(), b)
Out[11]: array([3, 6])
You can then use divmod on the result (which gets the row and column):
In [12]: divmod(np.searchsorted(a.ravel(), b), a.shape[1])
Out[12]: (array([0, 1]), array([3, 1]))
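Note the ravel trick only gives correct answers when the flattened array happens to be globally sorted. A more general hedged sketch: shift each row by an offset large enough to keep the raveled array sorted, then make a single searchsorted call (the offset construction is an assumption, not part of the original answer):
import numpy as np

a = np.array([[1, 2, 3, 4, 8],
              [2, 5, 6, 7, 8],
              [5, 7, 11, 12, 13]])
b = np.array([4.5, 2.3, 11.6])

m, n = a.shape
offset = np.arange(m) * (a.max() - a.min() + 1)   # per-row shift keeps global order
flat = np.searchsorted((a + offset[:, None]).ravel(), b + offset)
per_row = flat - np.arange(m) * n                 # undo the raveling per row
print(per_row)  # [4 1 3]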
