I want to check how many columns of a numpy array/matrix have only positive values.
I took my matrix and printed A>0 and got True and False and then I tried any and all functions but didn't succeed.
In [55]: a = np.array([[13, 21, 12],
                       [21, -1,  6],
                       [ 1, 10,  2],
                       [41,  1,  4]])
The output should be 2.
I saved the matrix A in B and tried writing:
B.all(axis=1).any()>0
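This function counts the number of columns whose elements are all greater than 0:
def count(mat):
    counter = 0
    tmp = mat > 0           # boolean matrix: True where the value is positive
    for col in tmp.T:       # rows of the transpose are the original columns
        if all(col):
            counter += 1
    return counter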
How does this function work?
First it assigns to tmp a boolean matrix indicating whether each value of the original matrix is greater than 0. It then iterates over the transpose of that matrix and checks whether all the values in each row of the transpose are True, meaning the values of the corresponding column are all greater than 0.
The transpose is needed because iterating over a 2D numpy array yields its rows: when you create a numpy array you pass the rows to the function. After transposing, iteration yields the columns of the original matrix.
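The same count can also be written without an explicit loop. This is a minimal sketch using the array from the question; the names all_positive and n_positive_cols are only illustrative. Note that axis=0 reduces over the rows, giving one result per column, whereas the axis=1 used in the question's attempt tests each row instead.

```python
import numpy as np

a = np.array([[13, 21, 12],
              [21, -1,  6],
              [ 1, 10,  2],
              [41,  1,  4]])

# One boolean per column: True if every entry in that column is > 0
all_positive = (a > 0).all(axis=0)

# True counts as 1, so summing gives the number of all-positive columns
n_positive_cols = int(all_positive.sum())
print(n_positive_cols)  # 2
```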
Related
I have a list of numbers
a = [1, 2, 3, 4, 5]
and an existing array
b = [[np.nan, 10, np.nan],
[11, 12, 13],
[np.nan, 14, np.nan]]
How can I place the numbers from list a into the elements of array b that contain a number, so that I get
c = [[np.nan, 1, np.nan],
[2, 3, 4],
[np.nan, 5, np.nan]]
Maybe it can be done with loops but I want to avoid it because the length of the list and the dimension of the array will change. However, the length of the list will always match the number of the elements that are not an np.nan in the array.
Here is an approach to solve it without using loops.
First, we flatten the array b into a 1D array and replace the non-NaN values with the contents of a. Then we reshape the result back to the original shape.
b = np.asarray(b, dtype=float)  # make sure b is an array, not a nested list
flat_b = b.flatten()
flat_b[~np.isnan(flat_b)] = a
c = flat_b.reshape(b.shape)
You can use np.isnan to create a boolean mask, then use it for indexing [1].
b = np.asarray(b, dtype=float)  # boolean indexing needs an array, not a nested list
m = np.isnan(b)
b[~m] = a
print(b)
[[nan  1. nan]
 [ 2.  3.  4.]
 [nan  5. nan]]
1. NumPy's Boolean Indexing
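If you would rather not modify b in place, the mask approach can be wrapped in a small helper. This is only a sketch: fill_non_nan is a hypothetical name, and the length check is an added assumption based on the question's statement that the list always matches the number of non-NaN elements. Boolean-mask assignment fills the selected positions in row-major order, which is what gives the layout shown in c.

```python
import numpy as np

def fill_non_nan(b, values):
    """Return a copy of b with its non-NaN entries replaced by values, in row-major order."""
    out = np.array(b, dtype=float)           # np.array copies, so b is left untouched
    mask = ~np.isnan(out)                    # True where b holds a number
    values = np.asarray(values, dtype=float)
    if values.size != mask.sum():
        raise ValueError("number of values must match number of non-NaN entries")
    out[mask] = values                       # assigned in row-major order
    return out

a = [1, 2, 3, 4, 5]
b = [[np.nan, 10, np.nan],
     [11, 12, 13],
     [np.nan, 14, np.nan]]

c = fill_non_nan(b, a)
print(c)
```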
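c = b  # same object as b, so b is modified in place
current = 0
for i in range(len(c)):
    for j in range(len(c[i])):
        # NaN never compares equal to anything, so test with np.isnan
        if not np.isnan(c[i][j]) and current < len(a):
            c[i][j] = a[current]
            current += 1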
While this may look long and complicated, it has only O(n) complexity, where n is the number of elements in the array. It simply iterates through the 2D array and replaces each non-NaN value with the next value from a.
I'm pretty new to NumPy and I'm looking for a way to get the index of the current column I'm iterating over in a matrix.
import numpy as np

# sum of elements in each column
def p_b(mtrx):
    b = []
    for c in mtrx.T:
        summ = 0
        for i in c:
            summ += i
        b.append(summ)
    return b

# return modified matrix where each element is equal to itself divided by
# the sum of the current column in the original matrix
def a_div_b(mtrx):
    for c in mtrx:
        for i in c:
            # change i to be i/p_b(mtrx)[index_of_a_current_column]
            pass  # placeholder: this is the part I don't know how to write
    return mtrx
For the input ([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) the result would be
([[1/12, 2/12, 3/12], [4/15, 5/15, 6/15], [7/18, 8/18, 9/18]]).
Any ideas about how I can achieve that?
You don't need those functions and loops to do that; they will not be efficient. When using numpy, prefer vectorized operations whenever possible (in most cases it is). numpy's broadcasting rules let you perform arithmetic between arrays of different shapes, as long as the shapes are compatible, so you can use vectorization, which is much more efficient than Python loops.
In your case, say that your array arr is:
arr = np.arange(1, 10)
arr.shape = (3, 3)
#arr is:
>>> arr
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
you can achieve the desired result with:
res = (arr.T / arr.sum(axis=0)).T
>>> res
array([[0.08333333, 0.16666667, 0.25 ],
[0.26666667, 0.33333333, 0.4 ],
[0.38888889, 0.44444444, 0.5 ]])
numpy's sum lets you sum an array along a given axis via the axis parameter. axis=0 is the first axis (the rows), so summing along it collapses the rows and gives one sum per column, which is what you want here.
.T gives the transposed matrix. You need to transpose so that broadcasting divides row j of arr by the j-th column sum, and then transpose back.
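To make the broadcasting step more concrete, here is a small sketch (the variable names are only illustrative) showing that the double transpose is equivalent to reshaping the column sums into a column vector with np.newaxis. For contrast, it also shows plain arr / col_sums, which divides each column by its own sum and is not what the question's expected output asks for.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
col_sums = arr.sum(axis=0)                    # one sum per column: [12, 15, 18]

# Version from the answer: transpose, divide, transpose back
res_transpose = (arr.T / col_sums).T

# Same result: turn the sums into a (3, 1) column so row j is divided by col_sums[j]
res_newaxis = arr / col_sums[:, np.newaxis]

# Different operation: each column divided by its own sum (columns then sum to 1)
per_column = arr / col_sums

print(np.allclose(res_transpose, res_newaxis))  # True
```

Both forms rely on the matrix being square, since row j is paired with column sum j.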
Hi, I have a huge array and I want to find, for each index position, the proportion of times the highest number of an inner array occurs at that position. So this input:
np.array([
[ [5, 2, 2], [2, 5, 10] ]
])
would return:
array([ 0.5, 0, 0.5 ])
Also if the highest number is a duplicate such as [12, 25, 25] then I don't want to ignore it but include it in the count for the denominator when calculating the proportions.
I want to input a two dimensional array with inner arrays of size 3 and want to find out the distribution of max values throughout the array.
Find the maximum of each inner array (along the last axis) with numpy.amax, keeping that axis so the result broadcasts against the original array.
Create a boolean array where array == maximum.
Count all the True values of the boolean array along axis zero with numpy.sum.
Find the proportion by dividing the count(s) by the number of rows in the original array, array.shape[0].
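The steps above can be sketched as follows. This is a sketch under two assumptions: the maximum is taken per inner array (that is what reproduces the expected output in the question), and a tied maximum is counted at every position where it occurs while the denominator stays the number of inner arrays. The function name max_position_proportions is hypothetical.

```python
import numpy as np

def max_position_proportions(arr):
    """Proportion of inner arrays whose maximum sits at each index position."""
    arr = np.asarray(arr)
    rows = arr.reshape(-1, arr.shape[-1])            # treat every inner array as one row
    row_max = np.amax(rows, axis=1, keepdims=True)   # maximum of each row, shape (n, 1)
    is_max = rows == row_max                         # True wherever a row attains its maximum
    counts = np.sum(is_max, axis=0)                  # how often each position holds a maximum
    return counts / rows.shape[0]                    # divide by the number of rows

example = np.array([[[5, 2, 2], [2, 5, 10]]])
proportions = max_position_proportions(example)
print(proportions.tolist())  # [0.5, 0.0, 0.5]
```

With a tie such as [12, 25, 25], both index 1 and index 2 are counted, so the proportions can add up to more than 1. If that is not the behaviour you want, divide by is_max.sum() instead.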
I have a numpy array with dim (157,1944).
I want to get indices of columns that have a Nonzero element in any row.
example: [[0,0,3,4], [0,0,1,1]] ----> [2,3]
If you look at each row, there is a nonzero element in columns [2, 3].
So if I have
[[0,1,3,4], [0,0,1,1]]
I should get [1,2,3] because column index 0 has no Nonzero elements in any row.
Not sure if your question is completely defined. However, say we start with
import numpy as np
a = np.array([[0,0,3,4], [0,0,1,1]])
then
>>> np.nonzero(np.all(a != 0, axis=0))[0]
array([2, 3])
are the indices of the columns for which none of the rows are zero (every entry is nonzero), and
>>> np.nonzero(np.any(a != 0, axis=0))[0]
array([2, 3])
are the indices of the columns for which not all of the rows are zero (it happens to be the same for the example you gave).
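The two calls differ on the second example from the question, where column 1 is nonzero in only one row. A short sketch (the variable names are illustrative):

```python
import numpy as np

a2 = np.array([[0, 1, 3, 4],
               [0, 0, 1, 1]])

# Columns where every row is nonzero
cols_all = np.nonzero(np.all(a2 != 0, axis=0))[0]

# Columns where at least one row is nonzero
cols_any = np.nonzero(np.any(a2 != 0, axis=0))[0]

print(cols_all.tolist())  # [2, 3]
print(cols_any.tolist())  # [1, 2, 3]
```

Since the question expects [1, 2, 3] for this input, the np.any version appears to be the one being asked for.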
I am looking for the first column containing a nonzero element in a sparse matrix (scipy.sparse.csc_matrix). Actually, the first column starting with the i-th one to contain a nonzero element.
This is part of a certain type of linear equation solver. For dense matrices I had the following: (Relevant line is pcol = ...)
import numpy
D = numpy.matrix([[1,0,0],[2,0,0],[3,0,1]])
i = 1
pcol = i + numpy.argmax(numpy.any(D[:,i:], axis=0))
if pcol != i:
    # Pivot columns i, pcol
    D[:,[i,pcol]] = D[:,[pcol,i]]
print(D)
# Result should be numpy.matrix([[1,0,0],[2,0,0],[3,1,0]])
The above should swap columns 1 and 2. If we set i = 0 instead, D is unchanged since column 0 already contains nonzero entries.
What is an efficient way to do this for scipy.sparse matrices? Are there analogues for the numpy.any() and numpy.argmax() functions?
With a csc matrix it is easy to find the nonzero columns.
In [302]: arr=sparse.csc_matrix([[0,0,1,2],[0,0,0,2]])
In [303]: arr.toarray()
Out[303]:
array([[0, 0, 1, 2],
[0, 0, 0, 2]])
In [304]: arr.indptr
Out[304]: array([0, 0, 0, 1, 3])
In [305]: np.diff(arr.indptr)
Out[305]: array([0, 0, 1, 2])
The last line shows how many nonzero terms there are in each column.
np.nonzero(np.diff(arr.indptr))[0][0] is the index of the first nonzero value in that diff, that is, the first column that contains a nonzero entry (2 here).
Do the same on a csr matrix to find the first nonzero row.
I can elaborate on indptr if you want.
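To get the first nonzero column starting from the i-th one, as asked in the question, the same diff can be sliced. This is a minimal sketch: first_nonzero_col is a hypothetical helper, and it assumes the matrix has no explicitly stored zeros, since indptr counts stored entries (call eliminate_zeros() first if it might).

```python
import numpy as np
from scipy import sparse

def first_nonzero_col(mat, start=0):
    """Index of the first column >= start with a stored entry, or None if there is none."""
    mat = sparse.csc_matrix(mat)          # make sure we are working with CSC storage
    counts = np.diff(mat.indptr)          # number of stored entries in each column
    hits = np.flatnonzero(counts[start:]) # offsets (relative to start) of non-empty columns
    return start + int(hits[0]) if hits.size else None

D = sparse.csc_matrix([[1, 0, 0], [2, 0, 0], [3, 0, 1]])
print(first_nonzero_col(D, 1))  # 2
print(first_nonzero_col(D, 0))  # 0
```

This plays the role of i + numpy.argmax(numpy.any(D[:,i:], axis=0)) in the dense version, except that it returns None rather than i when no column from i onward has a nonzero entry.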