How to remove rows while iterating in numpy - python

How to remove rows while iterating in numpy, as Java does:
Iterator < Message > itMsg = messages.iterator();
while (itMsg.hasNext()) {
Message m = itMsg.next();
if (m != null) {
itMsg.remove();
continue;
}
}
Here is my pseudo code. Remove the rows whose entries are all 0 or all 1 while iterating.
#! /usr/bin/env python
import numpy as np
M = np.array(
[
[0, 1 ,0 ,0],
[0, 0, 1, 0],
[0, 0, 0, 0], #remove this row whose entries are all 0
[1, 1, 1, 1] #remove this row whose entries are all 1
])
it = np.nditer(M, order="K", op_flags=['readwrite'])
while not it.finished:
    row = it.next()  # how to get a row?
    sumRow = np.sum(row)
    if sumRow == 4 or sumRow == 0:  # remove rows whose entries are all 0 and 1 as well
        # M = np.delete(M, row, axis=0)
        it.remove_axis(i)  # how to get i?

Writing good numpy code requires you to think in a vectorized fashion. Not every problem has a good vectorization, but for those that do, you can write clean and fast code pretty easily. In this case, we can decide on what rows we want to remove/keep and then use that to index into your array:
>>> M
array([[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 0],
[1, 1, 1, 1]])
>>> M[~((M == 0).all(1) | (M == 1).all(1))]
array([[0, 1, 0, 0],
[0, 0, 1, 0]])
Step by step, we can compare M to something to make a boolean array:
>>> M == 0
array([[ True, False, True, True],
[ True, True, False, True],
[ True, True, True, True],
[False, False, False, False]], dtype=bool)
We can use all to see if a row or column is all true:
>>> (M == 0).all(1)
array([False, False, True, False], dtype=bool)
We can use | to do an or operation:
>>> (M == 0).all(1) | (M == 1).all(1)
array([False, False, True, True], dtype=bool)
We can use this to select rows:
>>> M[(M == 0).all(1) | (M == 1).all(1)]
array([[0, 0, 0, 0],
[1, 1, 1, 1]])
But since these are the rows we want to throw away, we can use ~ (NOT) to flip False and True:
>>> M[~((M == 0).all(1) | (M == 1).all(1))]
array([[0, 1, 0, 0],
[0, 0, 1, 0]])
If instead we wanted to keep columns which weren't all 1 or all 0, we simply need to change what axis we're working on:
>>> M
array([[1, 1, 0, 1],
[1, 0, 1, 1],
[1, 0, 0, 1],
[1, 1, 1, 1]])
>>> M[:, ~((M == 0).all(axis=0) | (M == 1).all(axis=0))]
array([[1, 0],
[0, 1],
[0, 0],
[1, 1]])
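The mask-and-index steps above can be collected into one small helper. This is only a minimal sketch: the name drop_constant_rows is hypothetical (it is not a NumPy function), and it assumes a 2-D array.

```python
import numpy as np

def drop_constant_rows(M):
    """Return a copy of M without the rows that are all 0 or all 1.

    Hypothetical helper name, used for illustration only.
    """
    M = np.asarray(M)
    all_zero = (M == 0).all(axis=1)  # True for rows that contain only 0
    all_one = (M == 1).all(axis=1)   # True for rows that contain only 1
    return M[~(all_zero | all_one)]  # boolean indexing keeps the other rows

M = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
kept = drop_constant_rows(M)
print(kept)
```

Because boolean indexing returns a new array, M itself is left unchanged; nothing is deleted "while iterating".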

Related

How to efficiently filter/create mask of numpy.array based on list of tuples

I am trying to create a mask of a numpy.array based on a list of tuples. Here is my solution, which produces the expected result:
import numpy as np
filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
[[0, 0, 0], [1, 1, 0], [1, 1, 1]],
[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
[[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])
mask = np.array([], dtype=bool)
for f_val in filter_vals:
    if mask.size == 0:
        mask = (data == f_val).all(-1)
    else:
        mask = mask | (data == f_val).all(-1)
Output/mask:
array([[False, True, False],
[False, True, True],
[ True, False, False]])
The problem is that it gets slower as the data array grows and the number of tuples in filter_vals increases.
Is there a better solution? I tried np.isin(data, filter_vals), but it does not give the result I need.
A classical approach using broadcasting would be:
*A, B = data.shape
(data.reshape((-1,B)) == np.array(filter_vals)[:,None]).all(-1).any(0).reshape(A)
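This will, however, be memory-expensive, because the comparison materializes an intermediate boolean array with one entry per combination of filter tuple, data row and element. So applicability really depends on your use case.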
output:
array([[False, True, False],
[False, True, True],
[ True, False, False]])
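For comparison, here is a self-contained sketch that puts a broadcasting version next to the question's loop and checks that they agree. The function names mask_broadcast and mask_loop are hypothetical; the broadcasting variant inserts a new axis rather than reshaping, which is an equivalent formulation and not the exact expression from the answer.

```python
import numpy as np

filter_vals = [(1, 1, 0), (0, 0, 1), (0, 1, 0)]
data = np.array([
    [[0, 0, 0], [1, 1, 0], [1, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1], [1, 0, 1]],
])

def mask_loop(data, filter_vals):
    """Reference version: OR together one comparison per filter tuple."""
    mask = np.zeros(data.shape[:-1], dtype=bool)
    for f_val in filter_vals:
        mask |= (data == f_val).all(-1)
    return mask

def mask_broadcast(data, filter_vals):
    """Compare every length-B row against every filter tuple at once."""
    filt = np.asarray(filter_vals)           # shape (F, B)
    # data[..., None, :] has shape (..., 1, B); comparing it with (F, B)
    # broadcasts to (..., F, B), which is the memory-hungry intermediate.
    matches = (data[..., None, :] == filt).all(-1)  # shape (..., F)
    return matches.any(-1)                   # True if any tuple matched

mask = mask_broadcast(data, filter_vals)
print(mask)
```

The loop keeps only one data-sized boolean array alive at a time, so it may still be the better choice when the number of filter tuples is large.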

masked_scatter but rowwise?

Assuming a mask as follows:
mask = torch.tensor([
[True, True, False, True, False],
[True, False, True, True, True ],
])
I would like to number the True values with sequential values in each row separately. I don't care what's in the False spots, so 0 for simplicity. Thus the desired result is
tensor([[0, 1, 0, 2, 0], # 0 1 _ 2 _
[0, 0, 1, 2, 3]]) # 0 _ 1 2 3
I hoped this would work:
replacements = torch.arange(mask.size(1)).expand(mask.size())
target = torch.zeros(mask.size(), dtype=int)
target.masked_scatter(mask, replacements)
Unfortunately, masked_scatter ignores the shape of replacements, so this code results in:
tensor([[0, 1, 0, 2, 0], # 0 1 _ 2 _
[3, 0, 4, 0, 1]]) # 3 _ 4 0 1
What would I need to do instead?
I would try something with torch.cumsum: (torch.cumsum(mask, dim=1) - 1) * mask
The complete example
import torch
mask = torch.tensor([
[True, True, False, True, False],
[True, False, True, True, True ],
])
result = (torch.cumsum(mask, dim=1) - 1) * mask
print(result)
That would print:
tensor([[0, 1, 0, 2, 0],
[0, 0, 1, 2, 3]])
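To see why the cumulative sum works, the same idea can be traced step by step. The sketch below uses NumPy rather than PyTorch (an assumption made so that it runs without torch installed); np.cumsum with axis=1 plays the role of torch.cumsum with dim=1.

```python
import numpy as np

mask = np.array([
    [True, True, False, True, False],
    [True, False, True, True, True],
])

# Running count of True values along each row (True counts as 1).
running = np.cumsum(mask, axis=1)

# Subtract 1 so the first True in each row gets index 0.
zero_based = running - 1

# Multiplying by the mask zeroes out the positions that were False.
result = zero_based * mask
print(result)
```

Each row is numbered independently because the cumulative sum restarts for every row along axis 1.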

Changing the array value above the first non-zero element in the column

I'm looking for a vectorized way to change the array values above the first non-zero element in each column.
for x in range(array.shape[1]):
    for y in range(array.shape[0]):
        if array[y,x]>0:
            break
        else:
            array[y,x]=255
As you wrote about an array (not a DataFrame), I assume that you have
a Numpy array and want to use Numpy methods.
To do your task, run the following code:
np.where(np.cumsum(np.not_equal(array, 0), axis=0), array, 255)
Example and explanation of steps:
The source array:
array([[0, 1, 0],
[0, 0, 1],
[1, 1, 0],
[1, 0, 0]])
np.not_equal(array, 0) computes a boolean array with True for
elements != 0:
array([[False, True, False],
[False, False, True],
[ True, True, False],
[ True, False, False]])
np.cumsum(..., axis=0) computes cumulative sum (True counted as 1)
along axis 0 (in columns):
array([[0, 1, 0],
[0, 1, 1],
[1, 2, 1],
[2, 2, 1]], dtype=int32)
The above array is a mask used in where. For masked values (where
the corresponding element of the mask is True (actually, != 0)),
take values from corresponding elements of array, otherwise take 255:
np.where(..., array, 255)
The result (for my array) is:
array([[255, 1, 255],
[255, 0, 1],
[ 1, 1, 0],
[ 1, 0, 0]])
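As a check, the vectorized expression can be compared against the loop from the question. This is a sketch under stated assumptions: both function names are hypothetical, and the array is assumed to hold non-negative values, so that the loop's `> 0` test and the `!= 0` test in np.not_equal agree.

```python
import numpy as np

def fill_above_first_nonzero_loop(array, fill=255):
    """Loop version from the question, applied to a copy."""
    out = array.copy()
    for x in range(out.shape[1]):
        for y in range(out.shape[0]):
            if out[y, x] > 0:
                break          # first non-zero element found in this column
            out[y, x] = fill   # still above it, so overwrite
    return out

def fill_above_first_nonzero(array, fill=255):
    """Vectorized version: the cumulative count of non-zeros is 0 only
    above the first non-zero element of each column."""
    seen_nonzero = np.cumsum(np.not_equal(array, 0), axis=0)
    return np.where(seen_nonzero, array, fill)

array = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
])
result = fill_above_first_nonzero(array)
print(result)
```

Zeros that appear below the first non-zero element are left alone, which is the difference from a plain `array == 0` mask.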
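Use masking (note that this replaces every zero in the array, not only the zeros above the first non-zero element in each column):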
array[array == 0] = 255

How to quickly determine if a matrix is a permutation matrix

How can I quickly determine whether a square logical matrix is a permutation matrix? For instance, the matrix defined below is not a permutation matrix, since its 3rd row has two entries equal to 1.
PS: A permutation matrix is a square binary matrix that has exactly one entry 1 in each row and each column and 0s elsewhere.
I define a logical matrix like
numpy.array([(0,1,0,0), (0,0,1,0), (0,1,1,0), (1,0,0,1)])
Here is my source code:
#!/usr/bin/env python
import numpy as np
### two test cases
M1 = np.array([
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 1, 1, 0),
    (1, 0, 0, 1)])
M2 = np.array([
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (1, 0, 0, 0),
    (0, 0, 0, 1)])
### function
def is_perm_matrix(M):
    # every row must sum to exactly 1
    for sumRow in np.sum(M, axis=1):
        if sumRow != 1:
            return False
    # every column must sum to exactly 1
    for sumCol in np.sum(M, axis=0):
        if sumCol != 1:
            return False
    return True
### print the result
print(is_perm_matrix(M1))  # False
print(is_perm_matrix(M2))  # True
Is there any better implementation?
What about this:
def is_permuation_matrix(x):
    x = np.asanyarray(x)
    return (x.ndim == 2 and x.shape[0] == x.shape[1] and
            (x.sum(axis=0) == 1).all() and
            (x.sum(axis=1) == 1).all() and
            ((x == 1) | (x == 0)).all())
Quick test:
In [37]: is_permuation_matrix(np.eye(3))
Out[37]: True
In [38]: is_permuation_matrix([[0,1],[2,0]])
Out[38]: False
In [39]: is_permuation_matrix([[0,1],[1,0]])
Out[39]: True
In [41]: is_permuation_matrix([[0,1,0],[0,0,1],[1,0,0]])
Out[41]: True
In [42]: is_permuation_matrix([[0,1,0],[0,0,1],[1,0,1]])
Out[42]: False
In [43]: is_permuation_matrix([[0,1,0],[0,0,1]])
Out[43]: False
Here's a simple non-numpy solution that assumes that the matrix is a list of lists and that it only contains integers 0 or 1. It also functions correctly if the matrix contains Booleans.
def is_perm_matrix(m):
    # Check rows
    if all(sum(row) == 1 for row in m):
        # Check columns
        return all(sum(col) == 1 for col in zip(*m))
    return False
m1 = [
[0, 1, 0],
[1, 0, 0],
[0, 0, 1],
]
m2 = [
[0, 1, 0],
[1, 0, 0],
[0, 1, 1],
]
m3 = [
[0, 1, 0],
[1, 0, 0],
[1, 0, 0],
]
m4 = [
[True, False, False],
[False, True, False],
[True, False, False],
]
print(is_perm_matrix(m1))
print(is_perm_matrix(m2))
print(is_perm_matrix(m3))
print(is_perm_matrix(m4))
output
True
False
False
False
One method is to call np.sum with an axis parameter. For a permutation matrix, both the row sums and the column sums should be arrays of all ones; if either is not, the matrix is not a permutation matrix:
In [56]:
a = np.array([[0,1,0,0],[0,0,1,0],[0,1,1,0],[1,0,0,1]])
a
Out[56]:
array([[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 1, 1, 0],
[1, 0, 0, 1]])
In [57]:
np.all(np.sum(a,axis=0) == np.ones((1,4)), axis=1)
Out[57]:
array([False], dtype=bool)
In [58]:
np.all(np.sum(a,axis=1) == np.ones((1,4)), axis=1)
Out[58]:
array([False], dtype=bool)
In [60]:
np.sum(a, axis=1) == np.ones([1,4])
Out[60]:
array([[ True, True, False, False]], dtype=bool)
In [59]:
np.sum(a, axis=0) == np.ones([1,4])
Out[59]:
array([[ True, False, False, True]], dtype=bool)
In [61]:
np.sum(a,axis=0)
Out[61]:
array([1, 2, 2, 1])
In [62]:
np.sum(a,axis=1)
Out[62]:
array([1, 1, 2, 2])
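The row and column sums shown in this session can be reduced to a single boolean. The sketch below is one possible way to do that; the name sums_are_all_one is hypothetical, and the check assumes the entries are already known to be only 0 or 1, since it does not validate the values themselves.

```python
import numpy as np

def sums_are_all_one(a):
    """True if every row sum and every column sum of a equals 1.

    Hypothetical helper; assumes a 2-D array whose entries are 0 or 1.
    """
    a = np.asarray(a)
    rows_ok = bool(np.all(a.sum(axis=1) == 1))  # one 1 per row
    cols_ok = bool(np.all(a.sum(axis=0) == 1))  # one 1 per column
    return rows_ok and cols_ok

a = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]])
print(sums_are_all_one(a))                     # the matrix from the session
print(sums_are_all_one(np.eye(4, dtype=int)))  # identity matrix
```

Comparing the sums with the scalar 1 relies on broadcasting, so there is no need to build a np.ones array of matching shape.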

Instantiate a matrix with x zeros and the rest ones

I would like to be able to quickly instantiate a matrix where the first few (variable number of) cells in a row are 0, and the rest are ones.
Imagine we want a 4x3 matrix (4 rows, 3 columns).
I have instantiated the matrix first as all ones:
ones = np.ones([4,3])
Then imagine we have an array that specifies how many leading zeros each row has:
arr = np.array([2,1,3,0]) # first row has 2 zeroes, second row 1 zero, etc
Required result:
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[1, 1, 1]])
Obviously this can be done in the opposite way as well, but I'd consider the approach where 1 is a default value, and zeros would be replaced.
What would be the best way to avoid some silly loop?
Here's one way. n is the number of columns in the result. The number of rows is determined by len(arr).
In [29]: n = 5
In [30]: arr = np.array([1, 2, 3, 0, 3])
In [31]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[31]:
array([[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
There are two parts to the explanation of how this works. First, how to create a row with m zeros and n-m ones? For that, we use np.arange to create a row with values [0, 1, ..., n-1]:
In [35]: n
Out[35]: 5
In [36]: np.arange(n)
Out[36]: array([0, 1, 2, 3, 4])
Next, compare that array to m:
In [37]: m = 2
In [38]: np.arange(n) >= m
Out[38]: array([False, False, True, True, True], dtype=bool)
That gives an array of boolean values; the first m values are False and the rest are True. By casting those values to integers, we get an array of 0s and 1s:
In [39]: (np.arange(n) >= m).astype(int)
Out[39]: array([0, 0, 1, 1, 1])
To perform this over an array of m values (your arr), we use broadcasting; this is the second key idea of the explanation.
Note what arr[:, np.newaxis] gives:
In [40]: arr
Out[40]: array([1, 2, 3, 0, 3])
In [41]: arr[:, np.newaxis]
Out[41]:
array([[1],
[2],
[3],
[0],
[3]])
That is, arr[:, np.newaxis] reshapes arr into a 2-d array with shape (5, 1). (arr.reshape(-1, 1) could have been used instead.) Now when we compare this to np.arange(n) (a 1-d array with length n), broadcasting kicks in:
In [42]: np.arange(n) >= arr[:, np.newaxis]
Out[42]:
array([[False, True, True, True, True],
[False, False, True, True, True],
[False, False, False, True, True],
[ True, True, True, True, True],
[False, False, False, True, True]], dtype=bool)
As @RogerFan points out in his comment, this is basically an outer product of the arguments, using the >= operation.
A final cast to type int gives the desired result:
In [43]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[43]:
array([[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
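Applied to the data from the question (n = 3 columns and arr = [2, 1, 3, 0]), the broadcasting comparison can be sketched as a small function. The name leading_zeros_matrix is hypothetical and used only for illustration.

```python
import numpy as np

def leading_zeros_matrix(arr, n):
    """Build a len(arr) x n matrix whose i-th row starts with arr[i] zeros
    followed by ones. Hypothetical helper name."""
    arr = np.asarray(arr)
    # Column indices (shape (n,)) compared against a column vector of
    # zero counts (shape (len(arr), 1)) broadcast to (len(arr), n).
    return (np.arange(n) >= arr[:, np.newaxis]).astype(int)

result = leading_zeros_matrix([2, 1, 3, 0], 3)
print(result)
```

A row whose zero count is at least n comes out as all zeros, as the third row does here.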
Not as concise as I wanted (I was experimenting with mask_indices), but this will also do the work:
>>> n = 3
>>> zeros = [2, 1, 3, 0]
>>> numpy.array([[0] * zeros[i] + [1]*(n - zeros[i]) for i in range(len(zeros))])
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[1, 1, 1]])
It works very simply: for each row, it repeats the one-element lists [0] and [1] the required number of times and concatenates them, building the array row by row.
