Error while merging split part of numpy array - python

import numpy as np
test = [(0,1,2),(9,0,1),(0,1,3),(0,1,8)]
test=np.array(test)
test = np.array_split(test, 4)
t_0 = test[:0]
t_1 = test[0]
new_test= t_0+test[1:]
print(new_test)
It gives me the following result: [array([[9, 0, 1]]), array([[0, 1, 3]]), array([[0, 1, 8]])]
Whereas I am aiming for [(9,0,1),(0,1,3),(0,1,8)] if I select the first set from test.

You are mixing up lists and arrays. Pay attention to what you get at each stage:
Start with a list of tuples:
In [126]: test = [(0,1,2),(9,0,1),(0,1,3),(0,1,8)]
Make a 2d array:
In [127]: arr = np.array(test)
In [128]: arr
Out[128]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 3],
[0, 1, 8]])
Split into a list - one 'row' per element, but each is a 2d array. Question, will the split number always be this size?
In [129]: alist = np.array_split(arr, arr.shape[0])
In [130]: alist
Out[130]:
[array([[0, 1, 2]]),
array([[9, 0, 1]]),
array([[0, 1, 3]]),
array([[0, 1, 8]])]
Sublists:
In [131]: alist[:0]
Out[131]: []
In [132]: alist[1:]
Out[132]: [array([[9, 0, 1]]), array([[0, 1, 3]]), array([[0, 1, 8]])]
List join:
In [133]: alist[:0] + alist[1:]
Out[133]: [array([[9, 0, 1]]), array([[0, 1, 3]]), array([[0, 1, 8]])]
Looks like what you want is a list of tuples, like what you started with:
In [134]: test[:0] + test[1:]
Out[134]: [(9, 0, 1), (0, 1, 3), (0, 1, 8)]
You could recreate a 2d array, by applying concatenate to the joined lists of arrays:
In [135]: np.concatenate(alist[:0] + alist[1:])
Out[135]:
array([[9, 0, 1],
[0, 1, 3],
[0, 1, 8]])
In [136]: np.concatenate(alist[:1] + alist[2:])
Out[136]:
array([[0, 1, 2],
[0, 1, 3],
[0, 1, 8]])
In [137]: np.concatenate(alist[:2] + alist[3:])
Out[137]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
But note that you could just as easily get any of these arrays with indexing:
In [138]: arr[[0,1,3],:]
Out[138]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
And with the r_ helper you could construct the indices from ranges:
In [139]: np.r_[:2, 3:4]
Out[139]: array([0, 1, 3])
In [140]: arr[np.r_[:2, 3:4],:]
Out[140]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
You could also do the join after indexing:
In [141]: np.concatenate([arr[:2,:], arr[3:,:]], axis=0)
Out[141]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
+ is defined for lists as a join operator. For arrays it is element-wise addition. concatenate (along with the various stack variants) is the array join function.
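To make the difference between the two meanings of + concrete, here is a minimal sketch using the same data as the question. The np.delete call near the end is an assumed alternative that the answer above does not use; it is included only as another way to drop a row.

```python
import numpy as np

arr = np.array([(0, 1, 2), (9, 0, 1), (0, 1, 3), (0, 1, 8)])
alist = np.array_split(arr, arr.shape[0])   # list of four (1, 3) arrays

# On lists, + joins: the result is a longer list of arrays.
joined_list = alist[:0] + alist[1:]

# On arrays, + adds element-wise.
added = arr[1:2] + arr[2:3]

# concatenate is the array join; it turns the list back into one 2d array.
joined_arr = np.concatenate(joined_list, axis=0)

# Assumed alternative (not from the answer): np.delete returns a copy
# of arr without row 0.
dropped = np.delete(arr, 0, axis=0)

# Back to a list of tuples, which is what the question asked for.
as_tuples = [tuple(row) for row in joined_arr.tolist()]
print(as_tuples)
```

If the goal is only to drop one row, indexing or np.delete avoids the split and re-join entirely.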


How does the transpose of high-dimensional arrays work?

The concept of a transpose is easy to understand for a 2-D array, but I really cannot understand how the transpose of higher-dimensional arrays works.
For example
c = np.indices([4,5]).T.reshape(20,1,2)
d = np.indices([4,5]).reshape(20,1,2)
np.all(c==d) # output is False
Why are c and d not equal?
In [143]: c = np.indices([4,5])
In [144]: c
Out[144]:
array([[[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]])
In [145]: c.shape
Out[145]: (2, 4, 5)
In [146]: c.T.shape
Out[146]: (5, 4, 2)
Look at one 2d array from the size 2 dimension:
In [150]: c[0,:,:]
Out[150]:
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]])
In [151]: c.T[:,:,0]
Out[151]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
The 2nd is the usual 2d transpose, a (5,4) array.
MATLAB doesn't do transpose on 3d arrays, at least it doesn't call it such. It may have a way of making such a change. numpy, using a general shape/strides multidimensional implementation, easily generalizes the 2d transpose to 1d, 3d, or more dimensions.
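As a rough illustration of what the generalized transpose does, the sketch below checks three things on the same array: that .T reverses the axis order, that it does so by reversing the strides rather than moving data, and that reshape therefore reads the elements in a different order. The variable names t, x and y are mine, not from the question.

```python
import numpy as np

c = np.indices([4, 5])            # shape (2, 4, 5)

# .T with no arguments reverses the order of the axes.
t = c.T                           # shape (5, 4, 2)
same_as_explicit = np.array_equal(t, c.transpose(2, 1, 0))

# Element by element, t[k, j, i] is c[i, j, k].
elementwise_ok = all(
    t[k, j, i] == c[i, j, k]
    for i in range(2) for j in range(4) for k in range(5)
)

# The transpose is a view on the same buffer with the strides reversed.
strides_reversed = t.strides == c.strides[::-1]

# reshape walks the elements in C order, so the transposed and the
# untransposed array yield different sequences of values.
x = c.T.reshape(20, 1, 2)
y = c.reshape(20, 1, 2)
print(np.array_equal(x, y))
```

Each row of x is a (row index, column index) pair, while y simply cuts the flattened c into consecutive pairs, which is why the two differ.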

Sampling unique column indexes for each row of a numpy array

I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.
A = np.array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
If I fixed the required column number to 2, I want something like
np.array([[1,3],
[0,4],
[1,4],
[2,3]])
I am looking for a non-loop NumPy-based solution. I tried choice, but with replace=False I get the error
ValueError: Cannot take a larger sample than population when 'replace=False'
Here's one vectorized approach inspired by this post -
def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    return np.random.rand(m, n).argsort(1)[:, :N]
Sample run -
In [146]: A
Out[146]:
array([[3, 5, 2, 3, 3],
[1, 3, 3, 4, 5],
[3, 5, 4, 2, 1],
[1, 2, 3, 5, 3]])
In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]:
array([[4, 0],
[0, 1],
[3, 2],
[2, 0]])
In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]:
array([[2, 0, 1],
[3, 4, 2],
[3, 2, 1],
[4, 3, 0]])
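The argsort trick works because sorting a row of independent random keys yields a random permutation of the column indexes, so the first N entries are always distinct. Below is a minimal sketch of the same idea using the newer Generator API; the rng parameter and the N > n check are my additions for reproducibility and safety, not part of the answer above.

```python
import numpy as np

def random_unique_indexes_per_row(A, N=2, rng=None):
    # rng is a hypothetical extra parameter so results can be reproduced.
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    if N > n:
        raise ValueError("cannot pick more unique columns than exist")
    # argsort of random keys gives an independent permutation of 0..n-1
    # in each row; the first N entries are N distinct column indexes.
    return rng.random((m, n)).argsort(axis=1)[:, :N]

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])

idx = random_unique_indexes_per_row(A, N=3, rng=np.random.default_rng(0))

# Every row should contain 3 distinct values drawn from range(A.shape[1]).
rows_unique = all(len(set(row)) == 3 for row in idx.tolist())

# The sampled values themselves, gathered row by row.
picked = np.take_along_axis(A, idx, axis=1)
```

Sorting all n columns costs more than strictly necessary when N is much smaller than n, but it keeps the whole operation loop-free.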
Like this?
B = np.random.randint(5, size=(len(A), 2))
You can use np.random.choice() as follows (note that this samples with replacement, so indices within a row may repeat):
def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))
Demo:
In [34]: x, y = A.shape
In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]:
array([[0, 2],
[0, 1],
[0, 1],
[3, 1]])
As an experimental approach, here is a way that will usually give distinct rows of indices (indices within a single row can still repeat, as the output shows):
In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...:
In [61]: random_ind(A, 2)
Out[61]:
array([[0, 1],
[1, 0],
[1, 1],
[1, 4]])
In [62]: random_ind(A, 2)
Out[62]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 1]])
In [64]: random_ind(A, 3)
Out[64]:
array([[0, 0, 0],
[1, 1, 2],
[0, 4, 1],
[2, 3, 1]])
In [65]: random_ind(A, 4)
Out[65]:
array([[0, 4, 0, 3],
[1, 0, 1, 4],
[0, 4, 1, 2],
[3, 0, 1, 0]])
Slicing does not raise an IndexError, so if there are fewer than 4 unique rows, the line return ind[index][:4] simply returns fewer than 4 rows. In that case you can call the function again until you get the desired result.

Deleting row in numpy array based on condition

I have a 2D numpy array of shape [6,3] and I want to remove the rows whose third element is 0.
array([[0, 2, 1], #Input
[0, 1, 1],
[1, 1, 0],
[1, 0, 2],
[0, 2, 0],
[2, 1, 2]])
array([[0, 2, 1], #Output
[0, 1, 1],
[1, 0, 2],
[2, 1, 2]])
My code is positives = gt_boxes[np.where(gt_boxes[range(gt_boxes.shape[0]),2] != 0)]
It works but is there a simplified method to this?
You can use boolean indexing.
In [413]: x[x[:, -1] != 0]
Out[413]:
array([[0, 2, 1],
[0, 1, 1],
[1, 0, 2],
[2, 1, 2]])
x[:, -1] will retrieve the last column
x[:, -1] != 0 returns a boolean mask
Use the mask to index into the original array
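The three steps above can be spelled out on the array from the question. This is only a sketch; the names last_col, mask, kept and original are mine.

```python
import numpy as np

x = np.array([[0, 2, 1],
              [0, 1, 1],
              [1, 1, 0],
              [1, 0, 2],
              [0, 2, 0],
              [2, 1, 2]])

last_col = x[:, -1]        # step 1: the last column as a 1d array
mask = last_col != 0       # step 2: True for the rows to keep
kept = x[mask]             # step 3: boolean indexing returns a new array

# The np.where version from the question selects the same rows.
original = x[np.where(x[range(x.shape[0]), 2] != 0)]
```

Note that kept is a copy, so x itself is left unchanged.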

np.choose not giving desired result after broadcasting

I would like to pick the nth elements from suitCounts, with n specified in maxsuit. I broadcast the maxsuit array so I do get a result, but not the desired one. Any suggestions about what I'm doing conceptually wrong are appreciated. I don't understand the result of np.choose(self.maxsuit[:,:,None]-1, self.suitCounts), which is not what I'm looking for.
>>> self.maxsuit
Out[38]:
array([[3, 3],
[1, 1],
[1, 1]], dtype=int64)
>>> self.maxsuit[:,:,None]-1
Out[33]:
array([[[2],
[2]],
[[0],
[0]],
[[0],
[0]]], dtype=int64)
>>> self.suitCounts
Out[34]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
>>> np.choose(self.maxsuit[:,:,None]-1, self.suitCounts)
Out[35]:
array([[[2, 2, 0, 0],
[1, 1, 1, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]]])
The desired result would be:
[[3,3],[4,3],[2,1]]
You could use advanced-indexing for a broadcasted way to index into the array, like so -
In [415]: val # Data array
Out[415]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
In [416]: idx # Indexing array
Out[416]:
array([[3, 3],
[1, 1],
[1, 1]])
In [417]: m,n = val.shape[:2]
In [418]: val[np.arange(m)[:,None],np.arange(n),idx-1]
Out[418]:
array([[3, 3],
[4, 3],
[2, 1]])
A bit cleaner way with np.ogrid to use open range arrays -
In [424]: d0,d1 = np.ogrid[:m,:n]
In [425]: val[d0,d1,idx-1]
Out[425]:
array([[3, 3],
[4, 3],
[2, 1]])
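Another option, which the answer above does not mention, is np.take_along_axis, which picks one element along a given axis for each position of an index array. The sketch below assumes the same val and idx as above and cross-checks the result against the advanced-indexing form.

```python
import numpy as np

val = np.array([[[2, 1, 3, 0], [1, 0, 3, 0]],
                [[4, 1, 2, 0], [3, 0, 3, 0]],
                [[2, 2, 0, 0], [1, 1, 1, 0]]])
idx = np.array([[3, 3], [1, 1], [1, 1]])

# The index array must have as many dimensions as val, so add a trailing
# axis of length 1, gather along the last axis, then drop that axis again.
picked = np.take_along_axis(val, (idx - 1)[..., None], axis=2)[..., 0]

# Cross-check against the explicit advanced-indexing form.
m, n = val.shape[:2]
expected = val[np.arange(m)[:, None], np.arange(n), idx - 1]
print(picked)
```

Both forms read val[i, j, idx[i, j] - 1] for every (i, j); which one reads better is largely a matter of taste.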
This is the best I can do with choose:
In [23]: np.choose([[1,2,0],[1,2,0]], suitcounts[:,:,:3])
Out[23]:
array([[4, 2, 3],
[3, 1, 3]])
choose prefers that we use a list of arrays, rather than single one. It's supposed to prevent misuse. So the problem could be written as:
In [24]: np.choose([[1,2,0],[1,2,0]], [suitcounts[0,:,:3], suitcounts[1,:,:3], suitcounts[2,:,:3]])
Out[24]:
array([[4, 2, 3],
[3, 1, 3]])
The idea is to select items from the 3 subarrays, based on an index array like:
In [25]: np.array([[1,2,0],[1,2,0]])
Out[25]:
array([[1, 2, 0],
[1, 2, 0]])
The output will match the indexing array in shape. The choice arrays have to match in shape as well, hence my use of [...,:3].
Values for the first column are selected from suitcounts[1,:,:3], for the 2nd column from suitcounts[2...] etc.
choose is limited to 32 choices; this is a limitation imposed by the broadcasting mechanism.
Speaking of broadcasting, I could simplify the expression:
In [26]: np.choose([1,2,0], suitcounts[:,:,:3])
Out[26]:
array([[4, 2, 3],
[3, 1, 3]])
This broadcasts [1,2,0] to match the 2x3 shape of the subarrays.
I could get the target order by reordering the columns:
In [27]: np.choose([0,1,2], suitcounts[:,:,[2,0,1]])
Out[27]:
array([[3, 4, 2],
[3, 3, 1]])
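It may help to see the rule choose follows written out as plain loops: for each output position, the index array says which choice array to read from at that same position. The sketch below assumes the suitcounts data from the question; the names choices, manual and whole_blocks are mine.

```python
import numpy as np

suitcounts = np.array([[[2, 1, 3, 0], [1, 0, 3, 0]],
                       [[4, 1, 2, 0], [3, 0, 3, 0]],
                       [[2, 2, 0, 0], [1, 1, 1, 0]]])

choices = [suitcounts[k, :, :3] for k in range(3)]   # three (2, 3) arrays
index = np.array([[1, 2, 0], [1, 2, 0]])

result = np.choose(index, choices)

# The same rule as explicit loops:
# result[i, j] = choices[index[i, j]][i, j]
manual = np.empty_like(result)
for i in range(index.shape[0]):
    for j in range(index.shape[1]):
        manual[i, j] = choices[index[i, j]][i, j]

# The call from the question: a (3, 2, 1) index broadcast against (2, 4)
# choices gives a (3, 2, 4) result in which each index value selects a
# whole (2, 4) block rather than a single element.
question_index = np.array([[3, 3], [1, 1], [1, 1]])[:, :, None] - 1
whole_blocks = np.choose(question_index, suitcounts)
```

This is why the original call returned suitCounts[2] followed by two copies of suitCounts[0]: the index picks which array to read from, not a position along the last axis.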

How do I set cell values in `np.array()` based on condition?

I have a numpy array and a list of valid values in that array:
import numpy as np
arr = np.array([[1,2,0], [2,2,0], [4,1,0], [4,1,0], [3,2,0], ... ])
valid = [1,4]
Is there a nice pythonic way to set all array values that are not in the list of valid values to zero, and do it in-place? After this operation, the array should look like this:
[[1,0,0], [0,0,0], [4,1,0], [4,1,0], [0,0,0], ... ]
The following creates a copy of the array in memory, which is bad for large arrays:
arr = np.vectorize(lambda x: x if x in valid else 0)(arr)
It bugs me that, for now, I loop over each array element and set it to zero if it is not in the valid list.
Edit: I found an answer suggesting there is no in-place function to achieve this. Also stop changing my whitespaces. It's easier to see the changes in arr with them.
You can use np.place for an in-situ update -
np.place(arr,~np.in1d(arr,valid),0)
Sample run -
In [66]: arr
Out[66]:
array([[1, 2, 0],
[2, 2, 0],
[4, 1, 0],
[4, 1, 0],
[3, 2, 0]])
In [67]: np.place(arr,~np.in1d(arr,valid),0)
In [68]: arr
Out[68]:
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
Along the same lines, np.put could also be used -
np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
Sample run -
In [70]: arr
Out[70]:
array([[1, 2, 0],
[2, 2, 0],
[4, 1, 0],
[4, 1, 0],
[3, 2, 0]])
In [71]: np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
In [72]: arr
Out[72]:
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
Indexing with booleans would work too:
>>> arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
>>> arr[~np.in1d(arr, valid).reshape(arr.shape)] = 0
>>> arr
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
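The reshape in the boolean-indexing version is needed because np.in1d flattens its result. A possible variant, not used in the answers above, is np.isin, which keeps the shape of its first argument. The sketch below also checks that the assignment writes into the existing buffer; the names mask, buffer_before and same_buffer are mine.

```python
import numpy as np

arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
valid = [1, 4]
buffer_before = arr.ctypes.data        # address of the underlying data

# np.isin returns a mask with the same shape as arr, so no reshape is needed.
mask = ~np.isin(arr, valid)
arr[mask] = 0                          # masked assignment writes into arr

# The array still lives in the same memory, so no copy of arr was made.
same_buffer = arr.ctypes.data == buffer_before
print(arr)
```

The boolean mask is still a temporary array of the same shape, at one byte per element, so the update is in-place for arr but not entirely allocation-free.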
