Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensional array in Python, using the numpy library. I want to walk through each column efficiently and count, for every column, how many elements are different from 0.
Suppose I have the following matrix.
M = array([[1,2], [3,4]])
For example, the following code walks through each row efficiently (it is not what I intend to do, of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while curr_col < numb_colonnes:
    result = 0
    while curr_row < numb_rows:
        # if the element is different from 0, count it
        if M[curr_row][curr_col] != 0:
            result += 1
        curr_row += 1
    # ... use the result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!

In the code you showed us, you treat numpy arrays as lists and, as far as you can see, it works! But arrays are not lists, and while you can treat them as such, it wouldn't make much sense to use arrays, or even numpy, that way.
To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
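As a minimal sketch of that idea, the element-wise squaring from the nested loops above collapses to a single vectorized statement (reusing the small M from the question):
import numpy as np

M = np.array([[1, 2], [3, 4]])
M = M ** 2   # same result as the nested loops, computed element-wise
print(M)     # [[ 1  4]
             #  [ 9 16]]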
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np
a = np.array(((3, 4), (5, 6)))
print(np.sum(a))          # 18
print(np.sum(a, axis=0))  # [ 8 10]
print(np.sum(a, axis=1))  # [ 7 11]
Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... But if you write a logical test on an array, you obtain an array of booleans. For example, say we want to test which elements of a are even:
print(a % 2 == 0)
# [[False  True]
#  [False  True]]
False counts as zero and True counts as one, at least when we sum them...
print(np.sum(a % 2 == 0))  # 2
or, if you want to sum over a column, i.e., the index that changes is the 0-th:
print(np.sum(a % 2 == 0, axis=0))  # [0 2]
or sum across a row:
print(np.sum(a % 2 == 0, axis=1))  # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
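Putting it together for the matrix from the question, here is a minimal sketch (the commented alternative uses np.count_nonzero with an axis argument, available in newer numpy versions):
import numpy as np

M = np.array([[1, 2], [3, 4]])   # the example matrix from the question
by_col = np.sum(M != 0, axis=0)  # number of non-zero elements in each column
total = np.sum(by_col)           # grand total of non-zero elements

print(by_col)  # [2 2]
print(total)   # 4

# equivalent on numpy >= 1.12:
# by_col = np.count_nonzero(M, axis=0)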

Related

2 different specified elements from 2 numpy arrays

I have two numpy arrays with 0s and 1s in them. How can I find the indices with 1 in the first array and 0 in the second?
I tried np.logical_and,
but got the error message ('builtin_function_or_method' object is not subscriptable).
Use np.where(arr1==1) and np.where(arr2==0)
import numpy as np
array1 = np.array([0,0,0,1,1,0,1])
array2 = np.array([0,1,0,0,1,0,1])
ones = np.where(array1 == 1)
zeroes = np.where(array2 == 0)
print("array 1 has 1 at",ones)
print("array 2 has 0 at",zeroes)
returns:
array 1 has 1 at (array([3, 4, 6]),)
array 2 has 0 at (array([0, 2, 3, 5]),)
I'm not sure if there's some built-in numpy function that will do this for you, since it's a fairly specific problem. EDIT: there is, see the bottom of this answer.
Nonetheless, if one were to exist, it would have to be a linear time algorithm if you're passing in a bare numpy array, so writing your own isn't difficult.
If I have any numpy array (or python array) myarray, and I want a collection of indices where some object myobject appears, we can do this in one line using a list comprehension:
indices = [i for i in range(len(myarray)) if myarray[i] == myobject]
So what's going on here?
A list comprehension works in the following format:
[<output> for <input> in <iterable> if <condition>]
In our case, <input> and <output> are the indices of myarray, and the <condition> block checks if the value at the current index is equal to that of our desired value.
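As a quick illustration (reusing array1 and array2 from the answer above, which have the same length), a sketch of that comprehension applied to this question would be:
import numpy as np

array1 = np.array([0, 0, 0, 1, 1, 0, 1])
array2 = np.array([0, 1, 0, 0, 1, 0, 1])

# indices where array1 is 1 and array2 is 0
indices = [i for i in range(len(array1)) if array1[i] == 1 and array2[i] == 0]
print(indices)  # [3]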
Edit: as White_Sirilo helpfully pointed out, numpy.where does the same thing, I stand corrected
Let's say your arrays are called j and k. The following code returns all indices where j[index] = 1 and k[index] = 0 if both arrays are 1-dimensional. It also works if j and k are different sizes.
idx_1 = np.where(j == 1)[0]
idx_2 = np.where(k == 0)[0]
final_indices = np.intersect1d(idx_1, idx_2, return_indices=False)
If your array is 2-dimensional, you can use the above code in a function and then go row-by-row. There are almost definitely better ways to do this, but this works in a pinch.
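If the two arrays happen to be the same length, a single boolean mask is another possible shortcut (a sketch, not part of the answer above):
import numpy as np

j = np.array([0, 0, 0, 1, 1, 0, 1])
k = np.array([0, 1, 0, 0, 1, 0, 1])

# True where j is 1 and k is 0; np.flatnonzero returns the indices of the True entries
final_indices = np.flatnonzero((j == 1) & (k == 0))
print(final_indices)  # [3]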
Two numpy arrays, array1 and array2, are given in the problem. Just use
one_index=np.where(array1==1)
and
zero_index=np.where(array2==0)

How to identify / find a 1D numpy array pattern in a 2D numpy array?

I have a 2D numpy array, and am trying to find the entries where it is equal to a 1D array, but the dimensions of these two arrays prohibit broadcasting. Specifically, my 2D array is about 300x400, and I want to see where it's equal to a 2-element row vector [1, -1].
I am trying to find the location of pixels in an image that are on the border of segmentations. This is denoted in this mask by a 1 being adjacent to the foreground and -1 to the background. So I need to find locations where [1,-1] occurs in the rows of the mask, let's say a.
I have tried a == [1,-1] but this just performs object-level equality and returns False.
I guess I could do this with
for i in range(a.shape[0]):
    for j in range(a.shape[1] - 1):
        if a[i, j] == 1:
            if a[i, j+1] == -1:
                print(i)
but is there not some cute way to do this with a numpy method or something? I hate loops
arr = np.array([[1,1,-1],[1,-1,-1]])
arr_idx = (arr==1)[:,:-1] & (arr==-1)[:,1:]
Gives
>>> arr_idx
array([[False,  True],
       [ True, False]])
Which is an index for things that meet your criteria. Note that this is shaped with one fewer column than your input matrix (for obvious reasons).
You can add a column on one side or the other to change the indexing to either side of the pair you're looking for.
arr_idx = np.concatenate((np.zeros(shape=(2, 1), dtype=bool), arr_idx), axis=1)
>>> arr_idx
array([[False, False,  True],
       [False,  True, False]])
Sticking a new column on the left gives the index for the -1 component of the pair.
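If you also need the (row, column) coordinates of the matches rather than a boolean mask, a possible follow-up (not part of the original answer) is np.argwhere:
import numpy as np

arr = np.array([[1, 1, -1],
                [1, -1, -1]])
arr_idx = (arr == 1)[:, :-1] & (arr == -1)[:, 1:]

# row/column coordinates of the left element (the 1) of each [1, -1] pair
coords = np.argwhere(arr_idx)
print(coords)  # [[0 1]
               #  [1 0]]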
You can use a 2d cross-correlation for this.
There is a function in scipy for this: signal.correlate2d()
import numpy as np
from scipy import signal

arr = np.array([[1, 1, -1],
                [1, -1, -1]])

# the pattern you are looking for; it has to be 2d
krn = np.array([[1, -1]])

res = signal.correlate2d(arr, krn, mode='same')
print(res)
The result is
[[ 0  2 -1]
 [ 2  0 -1]]
The higher the value, the better the match. In your case, the 2s indicate the positions where your pattern is found.
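Continuing from the snippet above, and assuming the mask only contains -1, 0 and 1 (so a correlation value of 2 can only come from an exact [1, -1] match), you could then extract the match coordinates like this:
# a perfect [1, -1] match yields a correlation value of 2 here
matches = np.argwhere(res == 2)
print(matches)  # [[0 1]
                #  [1 0]]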
Check for a 1, starting at the first column (drop the last column, since no -1 can come after it):
tmp1 = a[:, 0:-1]
t1 = tmp1 == 1
t1 is True wherever this shifted view of a is 1.
Check for a -1, from the second column to the last:
tmp2 = a[:, 1:]
t2 = tmp2 == -1
t1 and t2 contain True and False values, shifted according to your goal.
t1*t2 will give you True wherever your condition is fulfilled. Summing over the rows then gives a number above zero at every row index i where the pattern occurs:
res = np.sum(t1 * t2, axis=1)
desired_Indices = np.where(res > 0)
... everybody hates loops ;)
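For completeness, a self-contained version of this approach (a sketch that reuses the small example array from the first answer):
import numpy as np

a = np.array([[1, 1, -1],
              [1, -1, -1]])

t1 = a[:, :-1] == 1    # True where a 1 sits (last column dropped)
t2 = a[:, 1:] == -1    # True where a -1 immediately follows
res = np.sum(t1 * t2, axis=1)
print(np.where(res > 0))  # (array([0, 1]),) -> both rows contain the pattern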

finding the occurrence of vector v (1,k) inside a matrix M (m,k)

I want to find the number of occurrences of a vector v in a matrix M.
What I have is a matrix of size (60K, 10),
and I initialised a test vector v of shape (1, 10):
tester = np.zeros((1, 10))
Now I want to check how many times that vector appears as an entire row of the matrix.
I did it iteratively and it works, but since the matrix is very large this hurts performance, and I'm trying to find a more elegant and faster way.
I would appreciate some help.
Thanks.
you can do the following:
temp = np.where((prediction == tester).all(axis=1))
len(temp[0])
When np.where() is given only a condition (no x and y values), it returns the indices where the condition is True; in your case, the indices of the rows that fully match tester.
So using this will surely lower your running time, and to me it is much more elegant than looping through the matrix.
you can check np.where api:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
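A small usage sketch with made-up data (prediction and tester here are just illustrative stand-ins for your arrays):
import numpy as np

# hypothetical data: 5 rows of length 3; tester is an all-zero row
prediction = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]])
tester = np.zeros((1, 3))

temp = np.where((prediction == tester).all(axis=1))
print(temp)          # (array([0, 2]),)
print(len(temp[0]))  # 2 -> tester appears twice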
Just compare and use all, so each row will result in a True value only if all its elements compare equal to the reference array. Then, you can simply sum the result, since int(True) == 1.
Example:
np.random.seed(0)
data = np.random.randint(0, 2, size=(50, 3))
to_match = np.random.randint(0, 2, size=(1, 3))
print(to_match)
print((data == to_match).all(axis=1).sum())
Output:
[[0 0 0]]
4
...which means that there are 4 instances of [0, 0, 0] in data.

function to get number of columns in a NumPy array that returns 1 if it is a 1D array

I have defined operations on 3xN NumPy arrays, and I want to loop over each column of the array.
I tried:
for i in range(nparray.shape[1]):
However, if nparray.ndim == 1, this fails.
Is there a clean way to ascertain the number of columns of a NumPy array, for example, to get 1 if it is a 1D array (like MATLAB's size operation does)?
Otherwise, I have implemented:
if nparray.ndim == 1:
    num_points = 1
else:
    num_points = nparray.shape[1]
for i in range(num_points):
If you're just looking for something less verbose, you could do this:
num_points = np.atleast_2d(nparray.T).shape[0]
That will, of course, make a new temporary array just to take its shape, which is a little silly… but it'll be pretty cheap, because it's just a view of the same memory.
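As a quick sanity check of that one-liner (a sketch with throwaway arrays; a 1-D array is treated as a single column, matching the question's convention):
import numpy as np

print(np.atleast_2d(np.ones(3).T).shape[0])       # 1 -> a 1-D array counts as one column
print(np.atleast_2d(np.ones((3, 4)).T).shape[0])  # 4 -> a 3x4 array has 4 columns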
However, I think your explicit code is more readable, except that I might do it with a try:
try:
    num_points = nparray.shape[1]
except IndexError:
    num_points = 1
If you're doing this repeatedly, whatever you do, you should wrap it in a function. For example:
def num_points(arr, axis):
    try:
        return arr.shape[axis]
    except IndexError:
        return 1
Then all you have to write is:
for i in range(num_points(nparray, 1)):
And of course it means you can change things everywhere by just editing one place, e.g.,:
def num_points(arr, axis):
    return arr[:, ..., np.newaxis].shape[1]
If you want to keep the one-liner, how about using conditional expressions:
for i in range(nparray.shape[1] if nparray.ndim > 1 else 1):
    pass
By default, iterating over a np.array iterates over its rows. If you have to iterate over columns, just iterate over the transposed array:
>>> a2 = np.array(range(12)).reshape((3, 4))
>>> for col in a2.T:
...     print(col)
[0 4 8]
[1 5 9]
[ 2  6 10]
[ 3  7 11]
What is the intended behavior for an array like array([1,2,3]): should it be treated as having one column or three columns? Since you mentioned that the arrays are all 3xN, it should presumably be treated as having just one column:
>>> a1 = np.array(range(3))
>>> for col in a1.reshape((3, -1)).T:
...     print(col)
[0 1 2]
So, a general solution: for col in your_array.reshape((3,-1)).T: #do something
You can also use the len function, but be careful about what it measures:
for i in range(len(nparray)):
    ...
len returns the length of the first axis. If nparray is a one-dimensional vector, that is the number of elements:
nparray = numpy.ones(10)
print(len(nparray))
Out: 10
If nparray is a matrix, len returns the number of rows (the first dimension), not the number of columns:
nparray = numpy.ones((10, 5))
print(len(nparray))
Out: 10
So for the 3xN arrays in the question, len gives 3; you would need nparray.shape[1] (or len(nparray.T)) for the number of columns. If you have a list of numpy arrays with different sizes, just use len inside a loop over your list.

element-wise operations on a sparse matrix

If you have a sparse matrix X:
>>> print(type(X))
<class 'scipy.sparse.csr.csr_matrix'>
...How can you sum the squares of each element in each row, and save them into a list? For example:
>>> print(X.todense())
[[0 2 0 2]
 [0 2 0 1]]
How can you turn that into a list of sum of squares of each row:
[[0²+2²+0²+2²]
[0²+2²+0²+1²]]
or:
[8, 5]
First of all, the csr matrix has a .sum method (relying on the dot product) which works well, so what you need is the squaring. The simplest solution is to create a copy of the sparse matrix, square its data and then sum it:
squared_X = X.copy()
# now square the data in squared_X
squared_X.data **= 2
# and sum each row:
squared_sum = squared_X.sum(1)
# and delete the squared_X:
del squared_X
If you really must save the space, I guess you could just square .data in place and then restore it, something along the lines of:
X.sum_duplicates()  # make sure duplicates are merged; not sure this is needed with normal usage
old_data = X.data.copy()
X.data **= 2
squared_sum = X.sum(1)
X.data = old_data
EDIT: There is actually another nice way, as the csr matrix has a .multiply method for elementwise multiplication:
squared_sum = X.multiply(X).sum(1)
Addition:
Element-wise operations are thus easily done by accessing csr.data, which stores the values of all nonzero elements. NOTE: I guess .sum_duplicates() may be necessary; I am not sure what kind of operations would make it necessary.
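As a quick check of the .multiply approach on the matrix from the question (a sketch; the final conversion just flattens the result into a plain list):
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[0, 2, 0, 2],
                         [0, 2, 0, 1]]))

squared_sum = X.multiply(X).sum(1)               # column vector of row-wise sums of squares
print(np.asarray(squared_sum).ravel().tolist())  # [8, 5]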
