How to check that a matrix contains a zero column? - python

I have a large matrix, I'd like to check that it has a column of all zeros somewhere in it. How to do that in numpy?

Here's one way:
In [19]: a
Out[19]:
array([[9, 4, 0, 0, 7, 2, 0, 4, 0, 1, 2],
[0, 2, 0, 0, 0, 7, 6, 0, 6, 2, 0],
[6, 8, 0, 4, 0, 6, 2, 0, 8, 0, 3],
[5, 4, 0, 0, 0, 0, 0, 0, 0, 3, 8]])
In [20]: (~a.any(axis=0)).any()
Out[20]: True
If you later decide that you need the column index:
In [26]: numpy.where(~a.any(axis=0))[0]
Out[26]: array([2])

Create an equals 0 mask (mat == 0), and run all on it along an axis.
(mat == 0).all(axis=0).any()

Related

Set numpy matrix elements to zero if varying row index is exceeded

I have a quite large m times n numpy matrix M filled with non-zero values and an array x of length m, where each entry indicates the row index, after which the matrix elements should be set to zero. So for example, if n=5 and x[i]=3, then the i-th row of the matrix be set to [M_i1, M_i2, M_i3, 0, 0].
If all entries of x had the same value k, I could simply use slicing with something like M[:,k:]=0, but I could not figure out an efficient way to this with different values for each row without looping over all rows and use slicing for each row.
I thougt about creating a matrix that looks like [[1]*x[1] + [0]*(n-x[1]),...,[1]*x[m] + [0]*(n-x[m])] and use it for boolean indexing but also don't know how to create this without looping.
The non-vectorized solution looks like this:
for i in range(m):
if x[i] < n:
M[i,x[i]:] = 0
with example input
M = np.array([[1,2,3],[4,5,6]])
m, n = 2, 3
x = np.array([1,2])
and output
array([[1, 0, 0],
[4, 5, 0]])
Does anyone have a vectorized solution for this problem?
Thank you very much!
You can use multi-dimensional boolean indexing:
M[x[:,None]<=np.arange(M.shape[1])] = 0
example:
M = [[7, 8, 4, 2, 3, 9, 1, 8, 4, 3],
[2, 1, 6, 1, 5, 2, 2, 2, 9, 2],
[6, 1, 6, 8, 4, 3, 6, 9, 2, 6],
[5, 4, 0, 8, 3, 0, 0, 1, 8, 7],
[8, 7, 8, 8, 9, 2, 0, 8, 0, 2]]
x = [4, 4, 0, 6, 2]
output:
[[7, 8, 4, 2, 0, 0, 0, 0, 0, 0],
[2, 1, 6, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 4, 0, 8, 3, 0, 0, 0, 0, 0],
[8, 7, 0, 0, 0, 0, 0, 0, 0, 0]]
This looks like a mask-smearing exercise. At each row, you want to smear starting with the element at np.minimum(x[row], n):
mask = np.zeros(M.shape, bool)
mask[np.flatnonzero(x < n), x[x < n]] = True
M[np.cumsum(mask, axis=1, dtype=bool)] = 0

Cumulative count of duplicate values by certain axis in Numpy array

Let's say I have this numpy array:
array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
And going row by row, I would like to see, by column, the cumulative count of the number that appears. So for this array, the result would be:
array([[0, 0, 0, 0, 0, 0], # (*0)
[0, 0, 0, 0, 0, 0], # (*1)
[0, 0, 0, 1, 1, 1], # (*2)
[0, 0, 0, 2, 0, 0], # (*3)
[0, 1, 0, 1, 1, 0]] # (*4)
(*0): first time each value appears
(*1): all values are different from the previous one (in the column)
(*2): For the last 3 columns, a 1 appears because there is already 1 value repetition.
(*3): For the 4th column, a 2 appears because it's the 3rd time that a 8 appears.
(*4): In the 4th column, a 1 appears because it's the 2nd time that a 9 appears in that column. Similarly, for the second and second to last column.
Any idea how to perform this?
Thanks!
Maybe there is a faster way using numpy ufuncs, however here is a solution using standard python:
from collections import defaultdict
import numpy as np
a = np.array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
# define function
def get_count(array):
count = []
for row in array.T:
occurences = defaultdict(int)
rowcount = []
for n in row:
occurences[n] += 1
rowcount.append(occurences[n] - 1)
count.append(rowcount)
return np.array(count).T
Output:
>>> get_count(a)
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 2, 0, 0],
[0, 1, 0, 1, 1, 0]])

Delete rows in ndarray where sum of multiple indexes is 0

So I have a very large two-dimensional numpy array such as:
array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
I would like to quickly remove each row of the array where np.sum(row[2:5]) == 0
The only way I can think to do this is with for loops, but that takes very long when there are millions of rows. Additionally, this needs to be constrained to Python 2.7
Boolean expressions can be used as an index. You can use them to mask the array.
inputarray = array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
mask = numpy.sum(inputarray[:,2:5], axis=1) != 0
result = inputarray[mask,:]
What this is doing:
inputarray[:, 2:5] selects all the columns you want to sum over
axis=1 means we're doing the sum on the columns
We want to keep the rows where the sum is not zero
The mask is used as a row index and selects the rows where the boolean expression is True
Another solution would be to use numpy.apply_along_axis to calculate the sums and cast it as a bool, and use that for your index:
my_arr = np.array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],])
my_arr[np.apply_along_axis(lambda x: bool(sum(x[2:5])), 1, my_arr)]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3]])
We just cast the sum too a bool since any number that's not 0 is going to be True.
>>> a
array([[2, 4, 0, 0, 0, 5, 9, 0],
[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[0, 7, 0, 0, 0, 6, 4, 4],
[6, 5, 6, 0, 0, 1, 9, 5]])
You are interested in columns 2 through five
>>> a[:,2:5]
array([[0, 0, 0],
[0, 1, 0],
[4, 3, 2],
[0, 0, 0],
[6, 0, 0]])
>>> b = a[:,2:5]
You want to find the sum of those columns in each row
>>> sum_ = b.sum(1)
>>> sum_
array([0, 1, 9, 0, 6])
These are the rows that meet your criteria
>>> sum_ != 0
array([False, True, True, False, True], dtype=bool)
>>> keep = sum_ != 0
Use boolean indexing to select those rows
>>> a[keep, :]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[6, 5, 6, 0, 0, 1, 9, 5]])
>>>

Modify every two largest elements of matrix rows and columns

In python, I have a matrix and I want to find the two largest elements in every row and every column and change their values to 1 (seperately, I mean get two matrices where one of them modified the rows and the other modified the cols).
The main goal is to get a corresponding matrix with zeros everywhere except those ones I've put in the 2 largest element of each row and column (using np.where(mat == 1, 1, 0).
I'm trying to use the np.argpartition in order to do so but without success.
Please help.
See image below.
Here's an approach with np.argpartition -
idx_row = np.argpartition(-a,2,axis=1)[:,:2]
out_row = np.zeros(a.shape,dtype=int)
out_row[np.arange(idx_row.shape[0])[:,None],idx_row] = 1
idx_col = np.argpartition(-a,2,axis=0)[:2]
out_col = np.zeros(a.shape,dtype=int)
out_col[idx_col,np.arange(idx_col.shape[1])] = 1
Sample input, output -
In [40]: a
Out[40]:
array([[ 3, 7, 1, -5, 14, 2, 8],
[ 5, 8, 1, 4, -3, 3, 10],
[11, 3, 5, 1, 9, 2, 5],
[ 6, 4, 12, 6, 1, 15, 4],
[ 8, 2, 0, 1, -2, 3, 5]])
In [41]: out_row
Out[41]:
array([[0, 0, 0, 0, 1, 0, 1],
[0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 1]])
In [42]: out_col
Out[42]:
array([[0, 1, 0, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0]])
Alternatively, if you are into compact codes, we can skip the initialization and use broadcasting to get the outputs from idx_row and idx_col directly, like so -
out_row = (idx_row[...,None] == np.arange(a.shape[1])).any(1).astype(int)
out_col = (idx_col[...,None] == np.arange(a.shape[0])).any(0).astype(int).T

How to look at only the 3rd value in all lists in a list

I have a list of lists and I want to be able to refer to the 1st, 2nd, 3rd, etc. column in a list of lists. Here is my code for the list:
matrix = [
[0, 0, 0, 5, 0, 0, 0, 0, 6],
[8, 0, 0, 0, 4, 7, 5, 0, 3],
[0, 5, 0, 0, 0, 3, 0, 0, 0],
[0, 7, 0, 8, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[9, 0, 0, 0, 0, 4, 0, 2, 0],
[0, 0, 0, 9, 0, 0, 0, 1, 0],
[7, 0, 8, 3, 2, 0, 0, 0, 5],
[3, 0, 0, 0, 0, 8, 0, 0, 0],
]
I want to be able to say something like:
matrix = [
[0, 0, 0, 5, 0, 0, 0, 0, 6],
[8, 0, 0, 0, 4, 7, 5, 0, 3],
[0, 5, 0, 0, 0, 3, 0, 0, 0],
[0, 7, 0, 8, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[9, 0, 0, 0, 0, 4, 0, 2, 0],
[0, 0, 0, 9, 0, 0, 0, 1, 0],
[7, 0, 8, 3, 2, 0, 0, 0, 5],
[3, 0, 0, 0, 0, 8, 0, 0, 0],
]
if (The fourth column in this matrix does not have any 1's in it):
(then do something)
I want to know what the python syntax would be for the stuff in parenthesis.
The standard way to perform what you asked is to do a list comprehension
if (The fourth column in this matrix does not have any 1's in it):
translates in:
>>>if not any([1 == row[3] for row in matrix])
However, depending on how often you need to perform this operation, how big is your matrix, etc... you might wish to look into numpy as it is easier (and remarkably faster) to address columns. An example:
>>> import numpy as np
>>> matrix = np.random.randint(0, 10, (5, 5))
>>> matrix
array([[3, 0, 9, 9, 3],
[5, 7, 7, 7, 6],
[5, 4, 6, 2, 2],
[1, 3, 5, 0, 5],
[3, 9, 7, 8, 6]])
>>> matrix[..., 3] #fourth column
array([9, 7, 2, 0, 8])
Try this:
if all(row[3] != 1 for row in matrix):
# do something
The row[3] part takes a look at the fourth element of a row, the for row in matrix part looks at all the rows in the matrix - this produces a list with all the fourth elements in all the rows, that is, the whole fourth column. Now if it is true for all the elements in the fourth column that they're different from one, then the condition is satisfied and you can do what you need inside the if.
A more traditional approach would be:
found_one = False
for i in xrange(len(matrix)):
if matrix[i][3] == 1:
found_one = True
break
if found_one:
# do something
Here I'm iterating over all the rows (i index) of the fourth column (3 index), and checking if an element is equal to one: if matrix[i][3] == 1:. Notice that the for cycle goes from the 0 index up to the "height" of the matrix minus one, that's what the xrange(len(matrix)) part says.
if 1 in [row[3] for row in matrix]:

Categories