Select rows in a Numpy 2D array with a boolean vector - python

I have a matrix and a boolean vector:
>>> from numpy import *
>>> a = arange(20).reshape(4,5)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
>>> b = asarray([1, 1, 0, 1]).reshape(-1,1)
array([[1],
       [1],
       [0],
       [1]])
Now I want to select all the rows in this matrix where the corresponding entry in the vector is equal to zero.
>>> a[b == 0]
array([10])
How can I make it so this returns this particular row?
[10, 11, 12, 13, 14]

The shape of b is somewhat strange, but if you can craft it as a nicer index it's a simple selection:
idx = b.reshape(a.shape[0])
print(a[idx == 0, :])
# [[10 11 12 13 14]]
You can read this as, "select all the rows where the index is 0, and for each row selected take all the columns". Your expected answer should really be a list of lists, since you are asking for all of the rows that match a criterion.

Nine years later, I just wanted to add another answer to this question in the case where b is actually a boolean vector.
Square bracket indexing of a NumPy array with a scalar index gives the corresponding row, so for example a[2] gives the third row of a. Multiple rows can be selected (potentially with repeats) using a vector of indices.
Similarly, logical vectors that have the same length as the number of rows act as "masks", for example:
a = np.arange(20).reshape(4,5)
b = np.array( [True, True, False, True] )
a[b] # 3x5 matrix formed with the first, second, and last row of a
To answer the OP specifically, the only thing to do from there is to negate the vector b:
a[ np.logical_not(b) ]
Lastly, if b is defined as in the OP, with ones and zeros and a column shape, one would simply do: np.logical_not(b.ravel().astype(bool)).
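Putting those pieces together, here is a minimal sketch with the arrays from the question (the ~ operator is equivalent to np.logical_not for boolean arrays):
import numpy as np

a = np.arange(20).reshape(4, 5)
b = np.asarray([1, 1, 0, 1]).reshape(-1, 1)

mask = b.ravel().astype(bool)   # flatten the 0/1 column into a boolean vector
print(a[~mask])                 # rows where b == 0 -> [[10 11 12 13 14]]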

Related

Delete array of values from numpy array

This post is an extension of this question.
I would like to delete multiple elements from a numpy array that have certain values. That is, for
import numpy as np
a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
How do I delete one instance of each of the values [1, 5, 8], so that the output is [1, 2, 6, 8, 8, 9]? All I have found in the documentation for removing values from an array is np.setdiff1d, but this removes all instances of each number. How can this be updated?
Use an outer comparison and argmax to remove only one instance of each value. For large arrays this will be memory intensive, since the created mask has a.shape * r.shape elements.
r = np.array([1, 5, 8])
m = (a == r[:, None]).argmax(1)
np.delete(a, m)
array([1, 2, 6, 8, 8, 9])
This does assume that each value in r appears in a at least once, otherwise the value at index 0 will get deleted since argmax will not find a match, and will return 0.
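If some values in r might be missing from a, a small variant of the same outer-comparison idea can skip them (a sketch, not part of the original answer):
hits = (a == r[:, None])           # shape (len(r), len(a)) comparison matrix
present = hits.any(axis=1)         # which values of r actually occur in a
m = hits[present].argmax(axis=1)   # first occurrence index for each present value
np.delete(a, m)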
delNums = [np.where(a == x)[0][0] for x in [1,5,8]]
a = np.delete(a, delNums)
Here, delNums contains the indices of the first occurrence of each of the values 1, 5, 8, and np.delete() removes the values at those indices.
OUTPUT:
[1 2 6 8 8 9]

Python: Access saved points from 2d array in 3d numpy array

I have a 2D numpy array (shape (y, x) = (601, 1200)) and a 3D numpy array (shape (z, y, x) = (137, 601, 1200)).
The 2D array stores, at each (y, x) point, the z index that I now want to look up in the 3D array and save into a new 2D array.
I tried something like this without success.
levels = array2d.reshape(-1)
y = np.arange(601)
x = np.arange(1200)
newArray2d=oldArray3d[levels,y,x]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (721200,) (601,) (1200,)
I don't want to try something with loops, so is there any faster method?
This is the data you have:
x_len = 12 # In your case, 1200
y_len = 6 # In your case, 601
z_len = 3 # In your case, 137
import numpy as np
my2d = np.random.randint(0,z_len,(y_len,x_len))
my3d = np.random.randint(0,5,(z_len,y_len,x_len))
This is one way to build your new 2d array:
yindices,xindices = np.indices(my2d.shape)
new2d = my3d[my2d[:], yindices, xindices]
Notes:
We're using Integer Advanced Indexing.
This means we index the 3d array my3d with 3 integer index arrays.
For more explanation on how integer array indexing works, please refer to my answer on this other question
In your attempt, there was no need to flatten your 2D array with reshape(-1), since the shape of the integer index arrays that we pass will (after any broadcasting) become the shape of the resulting 2D array.
Also, in your attempt, the second and third index arrays need to have opposite orientations. That is, they must be of shape (y_len, 1) and (1, x_len). Notice the different positions of the 1. This ensures that these two index arrays will be broadcast together, as shown in the sketch below.
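To make that note about orientations concrete, here is a minimal sketch of the same selection built with explicitly shaped index arrays instead of np.indices (it assumes the my2d, my3d, y_len, x_len variables defined above):
yidx = np.arange(y_len)[:, None]   # shape (y_len, 1)
xidx = np.arange(x_len)[None, :]   # shape (1, x_len)
new2d = my3d[my2d, yidx, xidx]     # index arrays broadcast to (y_len, x_len)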
There's some vagueness in your question, but I think you want advanced indexing like this:
In [2]: arr = np.arange(24).reshape(4,3,2)
In [3]: levels = np.random.randint(0,4,(3,2))
In [4]: levels
Out[4]:
array([[1, 2],
       [3, 1],
       [0, 2]])
In [5]: arr
Out[5]:
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])
In [6]: arr[levels, np.arange(3)[:,None], np.arange(2)]
Out[6]:
array([[ 6, 13],
       [20,  9],
       [ 4, 17]])
levels is (3,2). I created the other two indexing arrays so they broadcast with it: (3,1) and (2,). The result is a (3,2) array of values from arr, selected by their combined indices.

Deleting rows based on value found in specific column

I am attempting to write code that searches a numpy array for rows where the value in the fifth column is not 50. If it is not, I wish to remove that row.
This is what I have so far:
for rows in range(len(b)):
    if b[:,4].any() != 50:
        b = np.delete(b, b[rows])
However, I keep getting the following error:
too many indices for array
Let's run the calculation with some diagnostic prints. Note where the error occurs. That's important! (We shouldn't just keep trying things without isolating the problem!)
In [2]: b=np.array([[0,1,2],[1,2,3],[2,1,2]])
In [3]: for row in range(len(b)):
   ...:     print(row)
   ...:     if b[:,2].any() != 2:
   ...:         print(b[row])
   ...:         b = np.delete(b, b[row])
   ...:
0
[0 1 2]
1
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-04dc188d9a2b> in <module>()
1 for row in range(len(b)):
2 print(row)
----> 3 if b[:,2].any() !=2:
4 print(b[row])
5 b = np.delete(b, b[row])
IndexError: too many indices for array
So the error occurs on the 2nd iteration (row 1). Something is wrong with the b after the delete. What is the new value of b?
In [4]: b
Out[4]: array([1, 2, 3, 2, 1, 2])
b is a 1d array, not the 2d we started with. That explains the error, right? Something must be wrong with the use of delete. Maybe we need to check its documentation????
Look at the axis parameter:
axis : int, optional
The axis along which to delete the subarray defined by `obj`.
If `axis` is None, `obj` is applied to the flattened array.
We didn't specify an axis, so the delete was applied to the flattened array, and the result was flattened to 1d.
But even if I specify an axis I get an error (I won't get into that), which prompts me to look more carefully at the if condition:
In [10]: b[:,2]
Out[10]: array([2, 3, 2])
In [11]: b[:,2].any()
Out[11]: True
In [12]: b[:,2]!=2
Out[12]: array([False, True, False])
Applying any to the column doesn't make sense - it just checks whether any values in the column are nonzero. Instead we want to test the column against the target, getting a boolean array that matches the column in size.
We can use that boolean directly as a row selection mask:
In [13]: b[_,:]
Out[13]: array([[1, 2, 3]])
No need to iterate.
Another problem with your iteration: you iterate on range(3), i.e. [0, 1, 2], but inside the loop you try to remove a row from b, changing the size of b. That's going to cause problems when you try to index b[row] by number, right? When iterating, in Python or numpy, be careful about modifying the object that you are iterating over.
Sorry to be long-winded about this, but it looks like you need some basic debugging guidance.
Here's a basic list approach:
In [15]: [row for row in b if row[2]!=2]
Out[15]: [array([1, 2, 3])]
I'm iterating on the rows, not their indices, and for each row checking the column value, and keeping that row if the check is True. We could do that with np.delete, but a list comprehension is clearer (and faster).
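If you need a 2D array back rather than a Python list of rows, two equivalent options (a sketch, assuming b is still the original 2D array):
kept = np.array([row for row in b if row[2] != 2])   # stack the surviving rows
# or, fully vectorized with the boolean mask shown above:
kept = b[b[:, 2] != 2]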
It would be better to provide b and the desired output, but if I understand it correctly, you could use:
import numpy as np
b = np.array([[50,  2,  3,  4,  5,  6],
              [ 4, 50,  6,  7,  8,  9],
              [ 1,  1,  1,  1, 50,  9]])
array([[50,  2,  3,  4,  5,  6],
       [ 4, 50,  6,  7,  8,  9],
       [ 1,  1,  1,  1, 50,  9]])
Then you can check which rows contain 50 in the 5th column using
b[:, 4] == 50
array([False, False, True])
and feed this Boolean array back to b to select the desired rows:
b[b[:, 4] == 50]
which leaves you with one row in this case
array([[ 1, 1, 1, 1, 50, 9]])
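To actually drop the unwanted rows, reassign the filtered result back to b (a small sketch of the same masking idea):
b = b[b[:, 4] == 50]   # keep only the rows whose fifth column equals 50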

Align numpy array according to another array

I have a numpy array a containing arbitrary integers, and another array b (it is always a subset of a, but the order of the numbers in b differs from a). I want to align the elements of b in the order they appear in a.
a = np.array([4, 2, 6, 5, 8, 7, 10, 12])
b = np.array([10, 6, 2, 12])
I want b to be aligned as [2, 6, 10, 12]. How can I do this efficiently in numpy?
Approach #1 : One approach with np.in1d, assuming no duplicates in a -
a[np.in1d(a,b)]
A better sample case, with the elements of a shuffled so that the common elements are not in sorted order, to present a more varied case -
In [103]: a
Out[103]: array([ 4, 12, 6, 5, 8, 7, 10, 2])
In [104]: b
Out[104]: array([10, 6, 2, 12])
In [105]: a[np.in1d(a,b)]
Out[105]: array([12, 6, 10, 2])
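As a side note (not part of the original answer): in NumPy 1.13 and later, np.isin is the recommended replacement for np.in1d and can be used the same way here:
a[np.isin(a, b)]   # same result as a[np.in1d(a, b)] for 1D a and b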
Approach #2 : One approach with np.searchsorted -
sidx = a.argsort()
out = a[np.sort(sidx[np.searchsorted(a,b,sorter=sidx)])]
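A quick check of Approach #2 with the sample arrays from the question (a sketch; sidx sorts a, searchsorted locates each element of b in the sorted view, and sorting the recovered positions restores their order in a):
import numpy as np

a = np.array([4, 2, 6, 5, 8, 7, 10, 12])
b = np.array([10, 6, 2, 12])

sidx = a.argsort()
out = a[np.sort(sidx[np.searchsorted(a, b, sorter=sidx)])]
print(out)   # [ 2  6 10 12]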

Change a 1D NumPy array from (implicit) row major to column major order

I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:
import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])
I want to get a 1D array in column-major order (i.e. b = [1,5,2,6,3,7,4,8] in the example above).
Normally, I would just do the following:
mat = np.reshape(a, (-1,4))
b = mat.flatten('F')
Unfortunately, the length of my input array is not an exact multiple of the row length I want (i.e. a = [1,2,3,4,5,6,7]), so I can't call reshape. I want to keep that extra data, though, which might be quite a lot since my rows are pretty long. Is there any straightforward way to do this in NumPy?
The simplest way I can think of is not to try and use reshape with methods such as ravel('F'), but just to concatenate sliced views of your array.
For example:
>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])
This works for any length of array and any number of columns:
>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
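Wrapped up as a small helper for reuse (a sketch; the to_column_major name is my own, not from the answer):
import numpy as np

def to_column_major(a, cols):
    """Return the elements of 1D array a in column-major order for the given number of columns."""
    return np.concatenate([a[i::cols] for i in range(cols)])

print(to_column_major(np.array([1, 2, 3, 4, 5, 6, 7]), 4))   # [1 5 2 6 3 7 4]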
Alternatively, use as_strided to reshape. The fact that the array a is too small to fit the (2, 4) shape doesn't matter: you'll just get junk (i.e. whatever's in memory) in the last place:
>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[        1,         2,         3,         4],
       [        5,         6,         7, 168430121]])
>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])
In the general case, given an array b and a desired number of columns cols you can do this:
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # reshape to min 2d array needed to hold array b
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
This unravels the "good" part of the array (those columns not containing junk values) and the bad part (except for the junk values which lie in the bottom row) and concatenates the two unraveled arrays. For example:
>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Use a placeholder value to represent null so that the array length becomes a multiple of the row length you want to split on. If casting to float is acceptable, you can use NaN to represent the added null elements. Then reshape to 2D, transpose, and reshape back to 1D. Finally, drop the nulls.
import numpy as np
a = np.array([1,2,3,4,5,6,7])                # input
b = np.concatenate((a, [np.nan]))            # pad with one NaN to make the length 8 = 2x4
c = b.reshape(2,4).transpose().reshape(8,)   # reshape to 2x4, transpose, flatten back to 1D
d = c[~np.isnan(c)]                          # remove the NaN padding
print(d)
[ 1.  5.  2.  6.  3.  7.  4.]
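If the cast to float is not acceptable, the same pad-transpose-drop idea works with an integer sentinel instead of NaN (a sketch, assuming -1 never occurs in the data):
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
cols = 4
pad = (-len(a)) % cols                            # how many sentinel values are needed
padded = np.pad(a, (0, pad), constant_values=-1)  # pad to a multiple of cols
out = padded.reshape(-1, cols).T.ravel()          # reshape, transpose, flatten
out = out[out != -1]                              # drop the sentinels
print(out)                                        # [1 5 2 6 3 7 4]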
