remove empty numpy array - python

I have a numpy array:
array([], shape=(0, 4), dtype=float64)
How can I remove this empty array from a multidimensional array?
I tried
import numpy as np
if array == []:
    np.delete(array)
But, the multidimensional array still has this empty array.
EDIT:
The input is
new_array = [array([], shape=(0, 4), dtype=float64),
array([[-0.97, 0.99, -0.98, -0.93 ],
[-0.97, -0.99, 0.59, -0.93 ],
[-0.97, 0.99, -0.98, -0.93 ],
[ 0.70 , 1, 0.60, 0.65]]), array([[-0.82, 1, 0.61, -0.63],
[ 0.92, -1, 0.77, 0.88],
[ 0.92, -1, 0.77, 0.88],
[ 0.65, -1, 0.73, 0.85]]), array([], shape=(0, 4), dtype=float64)]
The expected output after removing the empty arrays is:
new_array = [array([[-0.97, 0.99, -0.98, -0.93 ],
[-0.97, -0.99, 0.59, -0.93 ],
[-0.97, 0.99, -0.98, -0.93 ],
[ 0.70 , 1, 0.60, 0.65]]),
array([[-0.82, 1, 0.61, -0.63],
[ 0.92, -1, 0.77, 0.88],
[ 0.92, -1, 0.77, 0.88],
[ 0.65, -1, 0.73, 0.85]])]

new_array, as printed, looks like a list of arrays. And even if it were an array, it would be a 1d array of dtype=object.
==[] is not the way to check for an empty array:
In [10]: x=np.zeros((0,4),float)
In [11]: x
Out[11]: array([], shape=(0, 4), dtype=float64)
In [12]: x==[]
Out[12]: False
In [14]: 0 in x.shape # check if there's a 0 in the shape
Out[14]: True
Check the syntax for np.delete. It requires an array, an index and an axis, and returns another array. It does not operate in place.
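For example, a minimal sketch of the correct call (made-up values):
import numpy as np
a = np.arange(12).reshape(3, 4)
b = np.delete(a, 1, axis=0)   # removes row 1 and returns a NEW (2, 4) array
# a is unchanged; to keep the result you must assign it, e.g. a = np.delete(a, 1, axis=0)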
If new_array is a list, a list comprehension would do a nice job of removing the [] arrays:
In [33]: alist=[x, np.ones((2,3)), np.zeros((1,4)),x]
In [34]: alist
Out[34]:
[array([], shape=(0, 4), dtype=float64), array([[ 1., 1., 1.],
[ 1., 1., 1.]]), array([[ 0., 0., 0., 0.]]), array([], shape=(0, 4), dtype=float64)]
In [35]: [y for y in alist if 0 not in y.shape]
Out[35]:
[array([[ 1., 1., 1.],
[ 1., 1., 1.]]), array([[ 0., 0., 0., 0.]])]
It would also work if new_array was a 1d array:
new_array=np.array(alist)
newer_array = np.array([y for y in new_array if 0 not in y.shape])
To use np.delete with new_array, you have to specify which elements:
In [47]: np.delete(new_array,[0,3])
Out[47]:
array([array([[ 1., 1., 1.],
[ 1., 1., 1.]]),
array([[ 0., 0., 0., 0.]])], dtype=object)
To find [0,3] you could use np.where:
np.delete(new_array,np.where([y.size==0 for y in new_array]))
Better yet, skip the delete and where and go with a boolean mask:
new_array[np.array([y.size>0 for y in new_array])]
I don't think there's a way of identifying these 'empty' arrays without a list comprehension, since you have to check the shape or size property, not the element's data. Also there's a limit as to what kinds of math you can do across elements of an object array. It's more like a list than a 2d array.
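As a rough sketch of that limit (made-up shapes):
import numpy as np
obj = np.array([np.ones((2, 3)), np.zeros((1, 4))], dtype=object)
obj + 1         # works: the + is applied to each element array in turn
# np.mean(obj)  # raises ValueError: shapes (2,3) and (1,4) cannot be broadcast together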

I initially had an array of shape (3,11,11), and after multiprocessing with pool.map my array was turned into a list like this:
[array([], shape=(0, 11, 11), dtype=float64),
array([[[ 0.35318114, 0.36152024, 0.35572945, 0.34495254, 0.34169853,
0.36553977, 0.34266126, 0.3492261 , 0.3339431 , 0.34759375,
0.33490712],...
If I converted this list to an array, the shape was (3,), so I used:
myarray = np.vstack(mylist)
and this returned my first 3d array with the original shape (3,11,11).

Delete takes the multidimensional array as a parameter. Then you need to specify the subarray to delete and the axis it's on. See http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html
np.delete(new_array,<obj indicating subarray to delete (perhaps an array of integers in your case)>, 0)
Also, note that the deletion is not in-place.
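For example, if new_array were a 1d object array, a quick sketch (shapes made up) would be:
import numpy as np
new_array = np.array([np.zeros((0, 4)), np.ones((2, 4)), np.zeros((0, 4))], dtype=object)
empties = [i for i, a in enumerate(new_array) if a.size == 0]   # -> [0, 2]
new_array = np.delete(new_array, empties, 0)   # keep the return value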

Element wise mean of numpy arrays of different sizes

So there is a csv file I'm reading, where I'm focusing on col3. The rows there hold lists of different lengths; the column was initially read as type str, but that was fixed using pd.eval:
df = pd.read_csv('datafile.csv', converters={'col3': pd.eval})
row e.g. [0, 100, -200, 300, -150...]
There are many rows of different sizes and I want to calculate the element wise average, where I have followed this solution.
I first ran into the NumPy VisibleDeprecationWarning, which I fixed using this.
But for the last step of the solution using np.nanmean I'm running into a new error which is
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My code looks like this so far:
import pandas as pd
import numpy as np
import itertools
df = pd.read_csv('datafile.csv', converters={'col3': pd.eval})
datafile = df[(df['col1'] == 'Red') & (df['col2'] == Name) & ((df['col4'] == 'EX') | (df['col5'] == 'EX'))]
np.warnings.filterwarnings('ignore', category=np.VisibleDeprecationWarning)
ar = np.array(list(itertools.zip_longest(df['col3'], fillvalue=np.nan)))
print(ar)
np.nanmean(ar,axis=1)
the arrays print like this
And the error is pointing towards the last line
The error, as far as I can see, is pointing towards the arrays being of type object, but I'm not sure how to fix it.
Make a ragged array:
In [23]: arr = np.array([np.arange(5), np.ones(5),np.zeros(3)],object)
In [24]: arr
Out[24]:
array([array([0, 1, 2, 3, 4]), array([1., 1., 1., 1., 1.]),
array([0., 0., 0.])], dtype=object)
Note the shape and dtype.
Try to use mean on it:
In [25]: np.mean(arr)
Traceback (most recent call last):
Input In [25] in <cell line: 1>
np.mean(arr)
File <__array_function__ internals>:180 in mean
File /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432 in mean
return _methods._mean(a, axis=axis, dtype=dtype,
File /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:180 in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
ValueError: operands could not be broadcast together with shapes (5,) (3,)
Applying mean to each element array works:
In [26]: [np.mean(a) for a in arr]
Out[26]: [2.0, 1.0, 0.0]
Trying to use zip_longest:
In [27]: import itertools
In [28]: list(itertools.zip_longest(arr))
Out[28]:
[(array([0, 1, 2, 3, 4]),),
(array([1., 1., 1., 1., 1.]),),
(array([0., 0., 0.]),)]
No change. We can use it by unpacking the arr - but it has padded the arrays in the wrong way:
In [29]: list(itertools.zip_longest(*arr))
Out[29]: [(0, 1.0, 0.0), (1, 1.0, 0.0), (2, 1.0, 0.0), (3, 1.0, None), (4, 1.0, None)]
zip_longest can be used to pad lists, but it takes more thought than this.
If we make an array from that list:
In [35]: np.array(list(itertools.zip_longest(*arr,fillvalue=np.nan)))
Out[35]:
array([[ 0., 1., 0.],
[ 1., 1., 0.],
[ 2., 1., 0.],
[ 3., 1., nan],
[ 4., 1., nan]])
and transpose it, we can take the nanmean:
In [39]: np.array(list(itertools.zip_longest(*arr,fillvalue=np.nan))).T
Out[39]:
array([[ 0., 1., 2., 3., 4.],
[ 1., 1., 1., 1., 1.],
[ 0., 0., 0., nan, nan]])
In [40]: np.nanmean(_, axis=1)
Out[40]: array([2., 1., 0.])
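Putting the pieces together for the element-wise mean (a sketch; the small lists below stand in for the question's df['col3']):
import itertools
import numpy as np

rows = [[0, 100, -200], [10, 20], [5]]   # stand-in for df['col3']
padded = np.array(list(itertools.zip_longest(*rows, fillvalue=np.nan)))
# padded has shape (3, 3): one row per position, one column per original list
np.nanmean(padded, axis=1)               # -> array([   5.,   60., -200.])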

Numpy n-tuple array with dtype float

I need an expression that will grant me an 8-tuple float array. Currently, I have the 8-tuple array via:
E = np.zeros((n,m), dtype='8i') #8-tuple
However, when I assign at indices i,j via:
E[i,j][0] = 1000.2 #etc.
I get back a tuple array with dtype int:
[1000 0 0 0 0 0 0 0]
It appears I need a way of using the dtype within my zeros command to both set the n-tuple and the float value. Does anyone know how this is done?
If an array is integer dtype, then assigned values will be truncated:
In [169]: x=np.array([0,1,2])
In [170]: x
Out[170]: array([0, 1, 2])
In [173]: x[0] = 1.234
In [174]: x
Out[174]: array([1, 1, 2])
The array has to have a float dtype to hold float values.
Simply changing the i (integer) to f (float) produces a float array:
In [166]: E = np.zeros((2,3), dtype='8f')
In [167]: E.shape
Out[167]: (2, 3, 8)
In [168]: E.dtype
Out[168]: dtype('float32')
This '8f' dtype is not common. The string actually translates to:
In [175]: np.dtype('8f')
Out[175]: dtype(('<f4', (8,)))
But when used in np.zeros that 8 is treated as a dimension. Usually we specify all dimensions in the shape, as #FHTMitchell notes:
In [176]: E1 = np.zeros((2,3,8), dtype=np.float32)
In [177]: E1.shape
Out[177]: (2, 3, 8)
In [178]: E1.dtype
Out[178]: dtype('float32')
Your use of 'n-tuple' is unclear. While shape is a tuple, numeric arrays don't use tuple notation. That is reserved for structured arrays.
In [180]: np.zeros((3,), dtype='f,f,f,f')
Out[180]:
array([(0., 0., 0., 0.), (0., 0., 0., 0.), (0., 0., 0., 0.)],
dtype=[('f0', '<f4'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<f4')])
In [181]: _.shape
Out[181]: (3,)
This is a 1d array with 3 elements. The dtype shows 4 fields. Each element, or record, is displayed as a tuple.
But fields are indexed by name, not number:
In [182]: Out[180]['f1']
Out[182]: array([0., 0., 0.], dtype=float32)
It is also possible to put 'arrays' within fields:
In [183]: np.zeros((3,), dtype=[('f0','f',(4,))])
Out[183]:
array([([0., 0., 0., 0.],), ([0., 0., 0., 0.],), ([0., 0., 0., 0.],)],
dtype=[('f0', '<f4', (4,))])
In [184]: _['f0']
Out[184]:
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]], dtype=float32)
Initially I thought the 8f notation would produce this sort of array. But apparently I have to either use the full notation with field name, or make a comma separated string:
In [185]: np.zeros((3,), dtype='4f,i')
Out[185]:
array([([0., 0., 0., 0.], 0), ([0., 0., 0., 0.], 0),
([0., 0., 0., 0.], 0)], dtype=[('f0', '<f4', (4,)), ('f1', '<i4')])
dtype notation can be confusing, https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
Unless you are intentionally trying to create a structured array, it is best to stay away from the '8f' notation.
In [189]: np.array([0,1,2,3],dtype='4i')
TypeError: object of type 'int' has no len()
In [190]: np.array([[0,1,2,3]],dtype='4i')
TypeError: object of type 'int' has no len()
In [191]: np.array([(0,1,2,3)],dtype='4i') # requires [(...)]
Out[191]: array([[0, 1, 2, 3]], dtype=int32)
Without the 4, I can simply write:
In [193]: np.array([[0,1,2,3]], dtype='i')
Out[193]: array([[0, 1, 2, 3]], dtype=int32)
In [194]: np.array([0,1,2,3], dtype='i')
Out[194]: array([0, 1, 2, 3], dtype=int32)
In [195]: np.array([[0,1,2,3]])
Out[195]: array([[0, 1, 2, 3]])
Try:
E = np.zeros((n,m), dtype='8f') #8-tuple
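A quick check (small illustrative shape) that float assignment now sticks:
import numpy as np
E = np.zeros((2, 3), dtype='8f')   # shape becomes (2, 3, 8), dtype float32
E[0, 0][0] = 1000.2
E[0, 0]   # -> array([1000.2, 0., 0., 0., 0., 0., 0., 0.], dtype=float32) - no truncation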

Understanding axes in NumPy

I was going through NumPy documentation, and am not able to understand one point. It mentions, for the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
How does the first dimension (axis) have a length of 2?
Edit:
The reason for my confusion is the below statement in the documentation.
The coordinates of a point in 3D space [1, 2, 1] is an array of rank
1, because it has one axis. That axis has a length of 3.
In the original 2D ndarray, I assumed that the number of lists identifies the rank/dimension, and I wrongly assumed that the length of each list denotes the length of each dimension (in that order). So, as per my understanding, the first dimension should have a length of 3, since the length of the first list is 3.
In numpy, axis ordering follows zyx convention, instead of the usual (and maybe more intuitive) xyz.
Visually, it means that for a 2D array where the horizontal axis is x and the vertical axis is y:
x -->
y 0 1 2
| 0 [[1., 0., 0.],
V 1 [0., 1., 2.]]
The shape of this array is (2, 3) because it is ordered (y, x), with the first axis y of length 2.
And verifying this with slicing:
import numpy as np
a = np.array([[1, 0, 0], [0, 1, 2]], dtype=float)
>>> a
Out[]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
>>> a[0, :] # Slice index 0 of first axis
Out[]: array([ 1., 0., 0.]) # Get values along second axis `x` of length 3
>>> a[:, 2] # Slice index 2 of second axis
Out[]: array([ 0., 2.]) # Get values along first axis `y` of length 2
You may be confusing the other sentence with the picture example below. Think of it like this: Rank = number of lists in the list(array) and the term length in your question can be thought of length = the number of 'things' in the list(array)
I think they are trying to describe to you the definition of shape which is in this case (2,3)
in that post I think the key sentence is here:
In NumPy dimensions are called axes. The number of axes is rank.
If you print the numpy array
print(np.array([[1., 0., 0.], [0., 1., 2.]]))
You'll get the following output
#col1 col2 col3
[[ 1. 0. 0.] # row 1
[ 0. 1. 2.]] # row 2
Think of it as a 2 by 3 matrix... 2 rows, 3 columns. It is a 2d array because it is a list of lists. ([[ at the start is a hint its 2d)).
The 2d numpy array
np.array([[1., 0., 0., 6.], [0., 1., 2., 7.], [3., 4., 5., 8.]])
would print as
#col1 col2 col3 col4
[[ 1.  0.  0.  6.]   # row 1
 [ 0.  1.  2.  7.]   # row 2
 [ 3.  4.  5.  8.]]  # row 3
This is a 3 by 4 2d array (3 rows, 4 columns)
The first dimension's length is just the len of the array:
In [11]: a = np.array([[ 1., 0., 0.], [ 0., 1., 2.]])
In [12]: a
Out[12]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
In [13]: len(a) # "length of first dimension"
Out[13]: 2
The second is the length of each "row":
In [14]: [len(aa) for aa in a] # 3 is "length of second dimension"
Out[14]: [3, 3]
Many numpy functions take axis as an argument, for example you can sum over an axis:
In [15]: a.sum(axis=0)
Out[15]: array([ 1., 1., 2.])
In [16]: a.sum(axis=1)
Out[16]: array([ 1., 3.])
The thing to note is that you can have higher dimensional arrays:
In [21]: b = np.array([[[1., 0., 0.], [ 0., 1., 2.]]])
In [22]: b
Out[22]:
array([[[ 1., 0., 0.],
[ 0., 1., 2.]]])
In [23]: b.sum(axis=2)
Out[23]: array([[ 1., 3.]])
Keep the following points in mind when considering Numpy axes:
Each sub-level of a list (or array) represents an axis. For example:
import numpy as np
a = np.array([1,2]) # 1 axis
b = np.array([[1,2],[3,4]]) # 2 axes
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # 3 axes
Axis labels correspond to the level of the sub-list they represent, starting with axis 0 for the outer most list.
To illustrate this, consider the following array of different shape, each with 24 elements:
# 1D Array
a0 = np.array(
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
)
a0.shape # (24,) - here, the length along the 0-axis is 24
# 2D Array
a01 = np.array(
    [
        [1.1, 1.2, 1.3, 1.4],
        [2.1, 2.2, 2.3, 2.4],
        [3.1, 3.2, 3.3, 3.4],
        [4.1, 4.2, 4.3, 4.4],
        [5.1, 5.2, 5.3, 5.4],
        [6.1, 6.2, 6.3, 6.4]
    ]
)
a01.shape # (6, 4) - now, the length along the 0-axis is 6
# 3D Array (the 'i.j.k' strings are placeholders marking position along each axis)
a012 = np.array(
    [
        [
            ['1.1.1', '1.1.2'],
            ['1.2.1', '1.2.2'],
            ['1.3.1', '1.3.2']
        ],
        [
            ['2.1.1', '2.1.2'],
            ['2.2.1', '2.2.2'],
            ['2.3.1', '2.3.2']
        ],
        [
            ['3.1.1', '3.1.2'],
            ['3.2.1', '3.2.2'],
            ['3.3.1', '3.3.2']
        ],
        [
            ['4.1.1', '4.1.2'],
            ['4.2.1', '4.2.2'],
            ['4.3.1', '4.3.2']
        ]
    ]
)
a012.shape # (4, 3, 2) - and finally, the length along the 0-axis is 4
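To tie the axis numbers back to the shape, a quick numeric check (using arange in place of the placeholder strings):
import numpy as np
a = np.arange(24).reshape(4, 3, 2)   # same (4, 3, 2) shape as a012
a.shape[0]            # 4 - the length along axis 0, the outermost level
a.sum(axis=0).shape   # (3, 2) - reducing along an axis removes it from the shape
a.sum(axis=2).shape   # (4, 3)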

Numpy indexing set 1 to max value and zeros to all others

I think I've misunderstood something with indexing in numpy.
I have a 3D-numpy array of shape (dim_x, dim_y, dim_z) and I want to find the maximum along the third axis (dim_z), and set its value to 1 and all the others to zero.
The problem is that I end up with several 1 in the same row, even if values are different.
Here is the code :
>>> test = np.random.rand(2,3,2)
>>> test
array([[[ 0.13110146, 0.07138861],
[ 0.84444158, 0.35296986],
[ 0.97414498, 0.63728852]],
[[ 0.61301975, 0.02313646],
[ 0.14251848, 0.91090492],
[ 0.14217992, 0.41549218]]])
>>> result = np.zeros_like(test)
>>> result[:test.shape[0], np.arange(test.shape[1]), np.argmax(test, axis=2)]=1
>>> result
array([[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]],
[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]]])
I was expecting to end with :
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Probably I'm missing something here. From what I've understood, 0:dim_x, np.arange(dim_y) returns dim_x of dim_y tuples and np.argmax(test, axis=dim_z) has the shape (dim_x, dim_y) so if the indexing is of the form [x, y, z] a couple [x, y] is not supposed to appear twice.
Could someone explain to me where I'm wrong? Thanks in advance.
What we are looking for
We get the argmax indices along the last axis -
idx = np.argmax(test, axis=2)
For the given sample data, we have idx :
array([[0, 0, 0],
[0, 1, 1]])
Now, idx covers the first and second axes, while getting those argmax indices.
To assign the corresponding ones in the output, we need to create range arrays for the first two axes covering the lengths along those and aligned according to the shape of idx. Now, idx is a 2D array of shape (m,n), where m = test.shape[0] and n = test.shape[1].
Thus, the range arrays for assignment into first two axes of output must be -
X = np.arange(test.shape[0])[:,None]
Y = np.arange(test.shape[1])
Notice that the first range array is extended to 2D so that it aligns against the rows of idx, while Y aligns against the cols of idx -
In [239]: X
Out[239]:
array([[0],
[1]])
In [240]: Y
Out[240]: array([0, 1, 2])
Schematically put -
idx :
        Y array
      ----------->
      x x x      |
      x x x      |  X array
                 v
The fault in original code
Your code was -
result[:test.shape[0], np.arange(test.shape[1]), ..
This is essentially :
result[:, np.arange(test.shape[1]), ...
So, you are selecting all elements along the first axis, instead of only the ones that correspond to the idx indices. In the process, you were selecting many more elements than required for assignment, and hence seeing many more 1s than expected in the result array.
The correction
Thus, the only correction needed was indexing into the first axis with the range array and a working solution would be -
result[np.arange(test.shape[0])[:,None], np.arange(test.shape[1]), ...
The alternative(s)
Alternatively, using the range arrays created earlier with X and Y -
result[X,Y,idx] = 1
Another way to get X,Y would be with np.mgrid -
m,n = test.shape[:2]
X,Y = np.ogrid[:m,:n]
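Collected into a runnable sketch (same approach, fresh random input):
import numpy as np

test = np.random.rand(2, 3, 2)
idx = np.argmax(test, axis=2)   # shape (2, 3)
m, n = test.shape[:2]
X, Y = np.ogrid[:m, :n]         # shapes (2, 1) and (1, 3), broadcast against idx
result = np.zeros_like(test)
result[X, Y, idx] = 1           # exactly one 1 along the last axis per (x, y) pair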
I think there's a problem with mixing basic (slice) and advanced indexing. It's easier to see when selecting values from an array than with this assignment; but it can result in transposed axes. For a problem like this it is better to use advanced indexing all around, as provided by ix_:
In [24]: test = np.random.rand(2,3,2)
In [25]: idx=np.argmax(test,axis=2)
In [26]: idx
Out[26]:
array([[1, 0, 1],
[0, 1, 1]], dtype=int32)
with basic and advanced:
In [31]: res1 = np.zeros_like(test)
In [32]: res1[:, np.arange(test.shape[1]), idx]=1
In [33]: res1
Out[33]:
array([[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]],
[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]]])
with advanced:
In [35]: I,J = np.ix_(range(test.shape[0]), range(test.shape[1]))
In [36]: I
Out[36]:
array([[0],
[1]])
In [37]: J
Out[37]: array([[0, 1, 2]])
In [38]: res2 = np.zeros_like(test)
In [40]: res2[I, J , idx]=1
In [41]: res2
Out[41]:
array([[[ 0., 1.],
[ 1., 0.],
[ 0., 1.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
On further thought, the use of the slice for the 1st dimension is just wrong, if the goal is to set or find the 6 argmax values:
In [54]: test
Out[54]:
array([[[ 0.15288242, 0.36013289],
[ 0.90794601, 0.15265616],
[ 0.34014976, 0.53804266]],
[[ 0.97979479, 0.15898605],
[ 0.04933804, 0.89804999],
[ 0.10199319, 0.76170911]]])
In [55]: test[I, J, idx]
Out[55]:
array([[ 0.36013289, 0.90794601, 0.53804266],
[ 0.97979479, 0.89804999, 0.76170911]])
In [56]: test[:, J, idx]
Out[56]:
array([[[ 0.36013289, 0.90794601, 0.53804266],
[ 0.15288242, 0.15265616, 0.53804266]],
[[ 0.15898605, 0.04933804, 0.76170911],
[ 0.97979479, 0.89804999, 0.76170911]]])
With the slice it selects a (2,2,3) set of values from test (or res), not the intended (2,3). There are 2 extra rows.
Here is an easier way to do it:
>>> test == test.max(axis=2, keepdims=1)
array([[[ True, False],
[ True, False],
[ True, False]],
[[ True, False],
[False, True],
[False, True]]], dtype=bool)
...and if you really want that as floating-point 1.0 and 0.0, then convert it:
>>> (test==test.max(axis=2, keepdims=1)).astype(float)
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Here is a way to do it with only one winner per row-column combo (i.e. no ties, as discussed in comments):
rowmesh, colmesh = np.meshgrid(range(test.shape[0]), range(test.shape[1]), indexing='ij')
maxloc = np.argmax(test, axis=2)
flatind = np.ravel_multi_index( [rowmesh, colmesh, maxloc ], test.shape )
result = np.zeros_like(test)
result.flat[flatind] = 1
UPDATE after reading hpaulj's answer:
rowmesh, colmesh = np.ix_(range(test.shape[0]), range(test.shape[1]))
is a more-efficient, more numpythonic, alternative to my meshgrid call (the rest of the code stays the same)
The issue of why your approach fails is hard to explain, but here's one place where intuition could start: your slicing approach says "all rows, times all columns, times a certain sequence of layers". How many elements is that slice in total? By contrast, how many elements do you actually want to set to 1? It can be instructive to look at the values you get when you view the corresponding test values of the slice you're trying to assign to:
>>> test[:, :, maxloc].shape
(2, 3, 2, 3) # oops! it's because maxloc itself is 2x3
>>> test[:, :, maxloc]
array([[[[ 0.13110146, 0.13110146, 0.13110146],
[ 0.13110146, 0.07138861, 0.07138861]],
[[ 0.84444158, 0.84444158, 0.84444158],
[ 0.84444158, 0.35296986, 0.35296986]],
[[ 0.97414498, 0.97414498, 0.97414498],
[ 0.97414498, 0.63728852, 0.63728852]]],
[[[ 0.61301975, 0.61301975, 0.61301975],
[ 0.61301975, 0.02313646, 0.02313646]],
[[ 0.14251848, 0.14251848, 0.14251848],
[ 0.14251848, 0.91090492, 0.91090492]],
[[ 0.14217992, 0.14217992, 0.14217992],
[ 0.14217992, 0.41549218, 0.41549218]]]]) # note the repetition, because in maxloc you're repeatedly asking for layer 0 sometimes, and sometimes repeatedly for layer 1

Split NumPy array according to values in the array (a condition)

I have an array:
arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4)...(35,1,22),(35,1,23)]
I want to split my array according to the third value in each tuple. I want each third value of 1 to be the start of a new array. The results should be:
[(1,1,1), (1,1,2),...(1,1,35)][(1,2,1), (1,2,2),...(1,2,46)]
and so on. I know numpy.split should do the trick but I'm lost as to how to write the condition for the split.
Here's a quick idea, working with a 1d array. It can be easily extended to work with your 2d array:
In [385]: x=np.arange(10)
In [386]: I=np.where(x%3==0)
In [387]: I
Out[387]: (array([0, 3, 6, 9]),)
In [389]: np.split(x,I[0])
Out[389]:
[array([], dtype=float64),
array([0, 1, 2]),
array([3, 4, 5]),
array([6, 7, 8]),
array([9])]
The key is to use where to find the indices where you want split to act.
For a 2d arr
First make a sample 2d array, with something interesting in the 3rd column:
In [390]: arr=np.ones((10,3))
In [391]: arr[:,2]=np.arange(10)
In [392]: arr
Out[392]:
array([[ 1., 1., 0.],
[ 1., 1., 1.],
...
[ 1., 1., 9.]])
Then use the same where and boolean test to find indices to split on:
In [393]: I=np.where(arr[:,2]%3==0)
In [395]: np.split(arr,I[0])
Out[395]:
[array([], dtype=float64),
array([[ 1., 1., 0.],
[ 1., 1., 1.],
[ 1., 1., 2.]]),
array([[ 1., 1., 3.],
[ 1., 1., 4.],
[ 1., 1., 5.]]),
array([[ 1., 1., 6.],
[ 1., 1., 7.],
[ 1., 1., 8.]]),
array([[ 1., 1., 9.]])]
I cannot think of any numpy functions or tricks to do this. A simple solution using a for loop would be:
In [48]: arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4),(1,2,1),(1,2,2),(1,2,3),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5)]
In [49]: result = []
In [50]: for i in arr:
   ....:     if i[2] == 1:
   ....:         tempres = []
   ....:         result.append(tempres)
   ....:     tempres.append(i)
   ....:
In [51]: result
Out[51]:
[[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)],
[(1, 2, 1), (1, 2, 2), (1, 2, 3)],
[(1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4), (1, 3, 5)]]
From looking at the documentation it seems like specifying the index of where to split on will work best. For your specific example the following works if arr is already a 2-dimensional numpy array:
np.split(arr, np.where(arr[:,2] == 1)[0])
arr[:,2] returns an array of the 3rd entry in each row. The colon says to take every row and the 2 says to take the 3rd column, which is the 3rd component.
We then use np.where to return all the places where the 3rd coordinate is a 1. We have to do np.where()[0] to get at the array of locations directly.
We then plug in the indices we've found where the 3rd coordinate is 1 to np.split which splits at the desired locations.
Note that because the first entry has a 1 in the 3rd coordinate it will split before the first entry. This gives us one extra "split" array which is empty.
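If that empty leading piece is unwanted, it can be filtered out afterwards (a small sketch):
import numpy as np

arr = np.array([(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2)])
pieces = np.split(arr, np.where(arr[:, 2] == 1)[0])
pieces = [p for p in pieces if p.size]   # drop the empty array produced by the split at index 0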
