strange behaviour of numpy array_split - python

I do not understand the behaviour of numpy.array_split with a subset of indices. When I take an array of a given length, select a subset of its indices with a boolean mask, and call array_split on the selection, I get different behaviour depending on whether the number of selected elements is odd or even. Here is an example:
import numpy as np
a = np.ones(2750001) # fake array
t = np.arange(a.size) # fake time basis
indA = ((t>= 5e5) & (t<= 1e6)) # first selection: 500001 elements (odd)
indB = ((t>=5e5+1) & (t<= 1e6)) # second selection: 500000 elements (even)
# now perform array_split
print(np.shape(np.array_split(a[indA],10)))
# (10,)
print(np.shape(np.array_split(a[indB],10)))
# (10, 50000)
Now we get different results: for the even number of elements np.shape gives (10, 50000), whereas for the odd number it gives (10,) (presumably the 10 separate sub-arrays). I'm a bit surprised and would like to understand the reason. I know that array_split can also be used when the number of sections does not divide the array evenly, but I would like some insight, because I need to use this inside a loop where I do not know in advance whether the number of elements will be even or odd.

I think the surprising behavior has more to do with np.shape than with np.array_split:
In [58]: np.shape([(1,2),(3,4)])
Out[58]: (2, 2)
In [59]: np.shape([(1,2),(3,4,5)])
Out[59]: (2,)
np.shape(a) reports the shape of the array np.asarray(a):
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
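So, when np.array_split returns a list of arrays of unequal length, np.asarray(a) is a 1-dimensional array of object dtype (in NumPy 1.24 and later, building such a ragged array without an explicit dtype=object raises a ValueError instead, so np.shape on the list fails rather than returning (10,)):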
In [61]: np.asarray([(1,2),(3,4,5)])
Out[61]: array([(1, 2), (3, 4, 5)], dtype=object)
When array_split returns a list of arrays of equal length, then np.asarray(a) returns a 2-dimensional array:
In [62]: np.asarray([(1,2),(3,4)])
Out[62]:
array([[1, 2],
       [3, 4]])
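Because np.array_split always returns a plain Python list of 1-D arrays, a loop that does not know the length in advance can inspect the pieces directly instead of calling np.shape on the list. A minimal sketch reusing the question's fake data (the names piecesA, sizesA and stackedB are only illustrative):

```python
import numpy as np

a = np.ones(2750001)                 # fake array, as in the question
t = np.arange(a.size)                # fake time basis
indA = (t >= 5e5) & (t <= 1e6)       # 500001 selected elements (odd)
indB = (t >= 5e5 + 1) & (t <= 1e6)   # 500000 selected elements (even)

# array_split returns a plain Python list of 1-D arrays in both cases
piecesA = np.array_split(a[indA], 10)
piecesB = np.array_split(a[indB], 10)

# Inspect the pieces directly instead of calling np.shape on the list
sizesA = [p.size for p in piecesA]
sizesB = [p.size for p in piecesB]
print(len(piecesA), sizesA)  # 10 pieces: the first has 50001 elements, the rest 50000
print(len(piecesB), sizesB)  # 10 pieces of 50000 elements each

# Stack into a 2-D array only when all pieces have the same length
if len(set(sizesB)) == 1:
    stackedB = np.stack(piecesB)
    print(stackedB.shape)    # (10, 50000)
```

With 500001 elements the single leftover element goes to the first piece, since array_split hands out the remainder to the leading sections.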

Related

numpy sum of each array in a list of arrays of different size

Given a list of numpy arrays, each of a different length, as obtained by doing lst = np.array_split(arr, indices), how do I get the sum of every array in the list? (I know how to do it with a list comprehension, but I was hoping there was a pure-numpy way to do it.)
I thought that this would work:
np.apply_along_axis(lambda arr: arr.sum(), axis=0, arr=lst)
But it doesn't, instead it gives me this error which I don't understand:
ValueError: operands could not be broadcast together with shapes (0,) (12,)
NB: It's an array of sympy objects.
There's a faster way that avoids np.array_split and uses np.add.reduceat. Here indices holds the length of each piece, so we build an ascending array of start positions with np.append([0], np.cumsum(indices)[:-1]). The leading zero marks the start of the first piece, and the last cumulative sum is dropped because it equals the length of the original array when the pieces cover all of it (if they don't, keep it by omitting the [:-1], and the remaining elements are summed as one extra piece). Then we call reduceat on the np.add ufunc:
import numpy as np
arr = np.arange(1, 11)
indices = np.array([2, 4, 4])
# this should split like this
# [1 2 | 3 4 5 6 | 7 8 9 10]
np.add.reduceat(arr, np.append([0], np.cumsum(indices)[:-1]))
# array([ 3, 18, 34])
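Note that in the question indices are split positions (what np.array_split expects), while in the answer above they are piece lengths. If you already have split positions, each segment simply starts at 0 or at one of those positions. A sketch (split_points is a hypothetical name), checked against the list-comprehension version; it assumes the split positions are strictly increasing and smaller than len(arr), because reduceat returns arr[i] instead of 0 for an empty segment:

```python
import numpy as np

arr = np.arange(1, 11)
split_points = np.array([2, 6])  # hypothetical split positions, as np.array_split expects

# Reference: list comprehension over the pieces
pieces = np.array_split(arr, split_points)  # [1 2], [3 4 5 6], [7 8 9 10]
loop_sums = np.array([p.sum() for p in pieces])

# Vectorized: each segment starts at 0 or at one of the split positions
starts = np.concatenate(([0], split_points))
reduce_sums = np.add.reduceat(arr, starts)
print(reduce_sums)  # [ 3 18 34]

# Object arrays (such as arrays of sympy objects) use the elements' own +
obj_sums = np.add.reduceat(arr.astype(object), starts)
print(obj_sums)
```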

How to find a specific value in a numpy array?

I have a NumPy array of tuples, like np.array([(0, 1), (2, 5), ...]).
Now I want to search for the index of a certain value, but I only know the left element of the tuple. The approach I found for getting the index of a value (when you know both elements) is the following:
x = np.array(list(map(lambda x: x== (2, 5), groups)))
print(np.where(x))
But how can I search if I only know the first element, i.e. (2, _), but not the second?
As stated in https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where, when np.where is called with only a condition it is preferred to use np.nonzero directly. I would also recommend reading up on NumPy's boolean masking. The answer to your question is this:
import numpy as np
groups = np.array([(0, 1), (2, 5), (3, 2), (4, 6)])  # sample data, as in the question
i = 0  # column (position within the tuple) to search
j = 2  # value we're looking for
mask = groups[:, i] == j  # boolean mask: True where column i equals j
indices = np.nonzero(mask)  # row indices where groups[row] == (2, _)
print(indices)
In NumPy it's generally preferred to avoid Python-level loops like the one above (list(map(...)) visits the elements one by one). Instead, use vectorized operations, so try to avoid the list(map()) construction you used here.
It might just be easier than you think.
Example:
# Create a sample array.
a = np.array([(0, 1), (2, 5), (3, 2), (4, 6)])
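# Compare column 0 of every row with 2; argmax returns the index of the first True.
# Caveat: argmax also returns 0 when nothing matches, so check the mask with .any() if that can happen.
(a[:,0] == 2).argmax()
# returns 1
# Index the array with the returned value to verify.
a[1]
# returns array([2, 5])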
Another way to look at the array is shown below, and will help put the concept of rows/columns into perspective for the mentioned array:
>>> a
array([[0, 1],
       [2, 5],
       [3, 2],
       [4, 6]])
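To get every matching row rather than only the first, and to detect the no-match case that argmax hides, here is a short sketch; the extra sample row (2, 7) is added only for illustration:

```python
import numpy as np

# Sample data; the extra row (2, 7) is added only for illustration
groups = np.array([(0, 1), (2, 5), (3, 2), (4, 6), (2, 7)])

mask = groups[:, 0] == 2      # True where the first element equals 2
rows = np.flatnonzero(mask)   # every matching row index
print(rows)                   # [1 4]
print(groups[rows])           # the matching rows themselves

# argmax alone cannot tell "row 0 matches" from "nothing matches"
mask_none = groups[:, 0] == 99
print(mask_none.argmax(), mask_none.any())  # 0 False
```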

How do I in a sense "juggle" elements in ndarrays?

What I mean is this: imagine you have an ndarray a with shape (2,3,4). I want to define another ndarray b with shape (3,2,4) such that
b[i][j][k] = a[j][i][k]
Matrix operations only apply to the last 2 index positions. If there were a way to make matrix operations act on any 2 chosen index positions, everything could be solved.
Thank you
Along the lines of your thinking, you can use numpy.einsum() to achieve what you want.
In [21]: arr = np.random.randn(2,3,4)
In [22]: arr.shape
Out[22]: (2, 3, 4)
# swap first two dimensions
In [23]: rolled = np.einsum('ijk->jik', arr)
In [24]: rolled.shape
Out[24]: (3, 2, 4)
But pay attention to what you do with the resulting array, because here einsum returns a view of the original array. If you modify rolled, the original arr is affected as well.
Use numpy.rollaxis (the current NumPy documentation recommends numpy.moveaxis instead, e.g. numpy.moveaxis(a, 1, 0)):
numpy.rollaxis(a, 1)
What you are looking for is probably np.transpose(...) (the transpose of a 2D matrix is in fact a special case of this):
b = a.transpose((1, 0, 2))
Here we specify that the first index of the new matrix (b) is the second (1) index of the old matrix (a); that the second index of the new matrix is the first (0) index of the old matrix; and the third index of the new matrix is the third index (2) of the old matrix.
This means that if a has a.shape = (m, n, p), then b.shape = (n, m, p).
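The answers above can be checked against each other. A small sketch, using np.arange data instead of random numbers, confirms b[i][j][k] == a[j][i][k] for transpose, swapaxes, moveaxis and einsum, and shows one way to get the asker's "matrix operations on any two axes" by moving the remaining axis out of the way (the matrix M is an arbitrary example):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Four equivalent ways to swap the first two axes
b1 = a.transpose((1, 0, 2))
b2 = np.swapaxes(a, 0, 1)
b3 = np.moveaxis(a, 1, 0)
b4 = np.einsum('ijk->jik', a)

# Check the defining relation b[i][j][k] == a[j][i][k] element by element
for i in range(3):
    for j in range(2):
        for k in range(4):
            assert b1[i, j, k] == a[j, i, k]

print(b1.shape)                                          # (3, 2, 4)
print(all(np.array_equal(b1, b) for b in (b2, b3, b4)))  # True
print(np.shares_memory(a, b1))                           # True: transpose returns a view

# "Matrix operations on chosen axes": move the other axis to the front,
# then @ (matmul) acts on the last two axes, here axes 0 and 2 of a
M = np.ones((4, 5))
out = np.moveaxis(a, 1, 0) @ M
print(out.shape)                                         # (3, 2, 5)
```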

Index of multidimensional array

I have a problem using multi-dimensional vectors as indices for multi-dimensional arrays. Say C.ndim == idx.shape[0]; then I want C[idx] to give me a single element. Let me explain with a simple example:
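import numpy as np
A = np.arange(0, 10)
B = 10 + A
C = np.array([A.T, B.T])  # .T is a no-op on 1-D arrays
C = C.T  # shape (10, 2): column 0 is A, column 1 is B
idx = np.array([3, 1])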
Now, C[3] gives me row 3 and C[1] gives me row 1, so C[idx] gives me a vstack of both rows. However, I need C[3,1]. How can I achieve that, given the arrays C and idx?
Edit:
An answer suggested tuple(idx). This works perfectly for a single idx. But:
Let's take it to the next level: say INDICES is an array in which I have vertically stacked several arrays shaped like idx. tuple(INDICES) gives me a tuple of rows rather than one index array per axis, so C[tuple(INDICES)] won't work. Is there a clean way of doing this, or will I need to iterate over the rows?
If you convert idx to a tuple, it will be interpreted as basic rather than advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
       [7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])
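The tuple(idx.T) pattern generalizes to any number of dimensions: each column of the stacked index array supplies the indices for one axis. A sketch with the question's C plus an arbitrary 3-D array D (D and pts are illustrative names); np.ravel_multi_index is shown as an alternative that first converts each index tuple to a flat position:

```python
import numpy as np

A = np.arange(0, 10)
C = np.array([A, 10 + A]).T   # shape (10, 2), so C[i, 1] == 10 + i

INDICES = np.array([[3, 1],
                    [7, 0]])  # one (row, col) pair per row
print(C[tuple(INDICES.T)])    # [13  7]

# Same pattern for a 3-D array: one column of pts per axis
D = np.arange(24).reshape(2, 3, 4)
pts = np.array([[0, 1, 2],
                [1, 2, 3]])
print(D[tuple(pts.T)])        # [ 6 23]

# Alternative: convert each index tuple to a flat position first
flat = np.ravel_multi_index(tuple(pts.T), D.shape)
print(D.ravel()[flat])        # [ 6 23]
```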

How can I check whether a numpy array is empty or not?

I used the following code, but it fails if the array contains a zero:
if not self.Definition.all():
Is this the solution?
if self.Definition == array([]):
You can always look at the .size attribute. It is an integer, and it is zero (0) when there are no elements in the array:
import numpy as np
a = np.array([])
if a.size == 0:
    pass  # do something when `a` is empty
https://numpy.org/devdocs/user/quickstart.html (2020.04.08)
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
(...) NumPy’s array class is called ndarray. (...) The more important attributes of an ndarray object are:
ndarray.ndim
the number of axes (dimensions) of the array.
ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
One caveat, though: note that np.array(None).size returns 1! This is because a.size is equivalent to np.prod(a.shape), np.array(None).shape is (), and an empty product is 1.
>>> import numpy as np
>>> np.array(None).size
1
>>> np.array(None).shape
()
>>> np.prod(())
1.0
Therefore, I use the following to test if a numpy array has elements:
>>> def elements(array):
...     return array.ndim and array.size
>>> elements(np.array(None))
0
>>> elements(np.array([]))
0
>>> elements(np.zeros((2,3,4)))
24
Why would we want to check if an array is empty? Arrays don't grow or shrink the way lists do. Starting with an 'empty' array and growing it with np.append is a frequent novice error.
Testing a list with if alist: relies on its boolean value:
In [102]: bool([])
Out[102]: False
In [103]: bool([1])
Out[103]: True
But trying to do the same with an array produces (in version 1.18):
In [104]: bool(np.array([]))
/usr/local/bin/ipython3:1: DeprecationWarning: The truth value
of an empty array is ambiguous. Returning False, but in
future this will result in an error. Use `array.size > 0` to
check that an array is not empty.
#!/usr/bin/python3
Out[104]: False
In [105]: bool(np.array([1]))
Out[105]: True
and bool(np.array([1, 2])) produces the infamous "truth value of an array with more than one element is ambiguous" ValueError.
Edit:
The accepted answer suggests size:
In [11]: x = np.array([])
In [12]: x.size
Out[12]: 0
But I (and most others) check the shape more often than the size:
In [13]: x.shape
Out[13]: (0,)
Another point in its favor is that it maps onto an empty list:
In [14]: x.tolist()
Out[14]: []
But there are other arrays with size 0 that aren't 'empty' in that last sense:
In [15]: x = np.array([[]])
In [16]: x.size
Out[16]: 0
In [17]: x.shape
Out[17]: (1, 0)
In [18]: x.tolist()
Out[18]: [[]]
In [19]: bool(x.tolist())
Out[19]: True
np.array([[],[]]) is also size 0, but shape (2,0) and len 2.
While the concept of an empty list is well defined, an empty array is not. One empty list is equal to another; the same can't be said for size-0 arrays.
The answer really depends on:
what do you mean by 'empty'?
what are you really testing for?
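To see how the candidates discussed above (size, shape, len) disagree, here is a short sketch over the arrays mentioned in these answers:

```python
import numpy as np

# The arrays discussed in the answers above
cases = {
    "np.array([])": np.array([]),
    "np.array([[]])": np.array([[]]),
    "np.array([[], []])": np.array([[], []]),
    "np.array(None)": np.array(None),
    "np.zeros((2, 3, 4))": np.zeros((2, 3, 4)),
}

for name, x in cases.items():
    # len() is undefined for 0-d arrays, so guard on ndim
    length = len(x) if x.ndim else None
    print(f"{name}: size={x.size}, shape={x.shape}, len={length}")
```

Only size == 0 means "no elements at all"; shape and len show that three different size-0 arrays still have different structures, while the 0-d np.array(None) has one element and no length.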
