How to calculate a sum of mismatching elements in two NumPy arrays


I'm currently trying to implement a perceptron, and I have two NumPy arrays, each of dimensions 1x200. I would like to check every element of the two arrays against each other and get back the sum of the elements that don't match. I tried doing something like this:
b = (x_A > 0).astype(int)
b[b == 0] = -1
Now I want to compare this array with the other one. My question is therefore: is there a way to avoid for-loops and still get what I want (the sum of the elements that don't match)?

You should just be able to do this directly - assuming that your arrays are of the same dimensions. For numpy arrays a and b:
np.sum(a != b)
a != b gives an array of Booleans (True when they are not equal element-wise and False when they are). Sum will give you the count of all elements that are not equal.
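As a minimal sketch of the above, assuming two made-up label arrays `a` and `b` with values in {-1, 1} (these are illustrative, not the asker's data), the mismatch count could be checked like this. `np.count_nonzero` on the Boolean array should give the same number as `np.sum`.

```python
import numpy as np

# Hypothetical label vectors with values in {-1, 1}
a = np.array([1, -1, 1, 1, -1])
b = np.array([1, 1, 1, -1, -1])

# a != b is a Boolean array; True counts as 1 when summed
mismatches = np.sum(a != b)

# Equivalent count without relying on Boolean-to-integer summation
same_count = np.count_nonzero(a != b)

print(mismatches)  # 2
print(same_count)  # 2
```

The same expression works unchanged for 1x200 arrays, as long as both have the same shape.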

Related

Python: general sum over numpy rows

I want to sum all the rows of a matrix: if I have an n x 2 matrix, the result should be a 1 x 2 vector with all rows summed. I can do something like that with np.sum( arg, axis=1 ) but I get an error if I supply a vector as argument. Is there any more general sum function which doesn't throw an error when a vector is supplied? Note: This was never a problem in MATLAB.
Background: I wrote a function which calculates some stuff and sums over all rows of the matrix. Depending on the number of inputs, the matrix has a different number of rows and the number of rows is >= 1

According to numpy.sum documentation, you cannot specify axis=1 for vectors as you would get a numpy AxisError saying axis 1 is out of bounds for array of dimension 1.
A possible workaround could be, for example, writing a dedicated function that checks the size before performing the sum. Please find below a possible implementation:
import numpy as np

M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])

def sum_over_columns(input_arr):
    # Sum along axis 1 only when the input has more than one dimension
    if len(input_arr.shape) > 1:
        return input_arr.sum(axis=1)
    return input_arr.sum()

print(sum_over_columns(M))
print(sum_over_columns(v))
In a more pythonic way (not necessarily more readable):
def oneliner_sum(input_arr):
    return input_arr.sum(axis=(1 if len(input_arr.shape) > 1 else None))

You can do
np.sum(np.atleast_2d(x), axis=1)
This will first convert vectors to singleton-dimensional 2D matrices if necessary.
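A short sketch of how this behaves, reusing the `M` and `v` values from the previous answer. Note that for a vector the result is a length-1 array rather than a scalar. The `axis=0` line is an assumption about what the question may actually need (summing the rows of an n x 2 matrix into 2 values).

```python
import numpy as np

M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])

# np.atleast_2d leaves M unchanged and turns v into an array of shape (1, 2)
row_sums_M = np.sum(np.atleast_2d(M), axis=1)
row_sums_v = np.sum(np.atleast_2d(v), axis=1)

print(row_sums_M)  # [5 5]
print(row_sums_v)  # [5]

# Assumption: if the goal is to add the rows together, axis=0 is the axis to use;
# a single vector then comes back unchanged.
col_sums_v = np.sum(np.atleast_2d(v), axis=0)
print(col_sums_v)  # [1 4]
```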

How to return a numpy array with values where, the common indices values for 2 arrays are both greater than 0

I want the first array to keep its values only where the values at the same index in both arrays are greater than zero, and to be zero everywhere else. I'm not really sure how to frame the question. Hopefully the expected output provides better insight.
I tried playing around with np.where, but I can't seem to make it work when 2 arrays are provided.
a = np.array([0,2,1,0,4])
b = np.array([1,1,3,4,0])
# Expected Output
a = ([0,2,1,0,0])

The zip function, which takes elements of two arrays side by side, is useful here. You don't necessarily need an np/numpy function.
import numpy as np
a = np.array([0,2,1,0,4])
b = np.array([1,1,3,4,0])
c = np.array([x if x * y > 0 else 0 for x,y in zip(a, b)])
print(c)
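For comparison, here is a sketch of a vectorized alternative using `np.where` with an explicit condition on both arrays. It differs from the `x * y > 0` test in one respect: that product is also positive when both values are negative, whereas the condition below requires both to be greater than zero, which is what the question asks for.

```python
import numpy as np

a = np.array([0, 2, 1, 0, 4])
b = np.array([1, 1, 3, 4, 0])

# Keep a's value where both a and b are positive, otherwise use 0
c = np.where((a > 0) & (b > 0), a, 0)

print(c)  # [0 2 1 0 0]
```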

Speed up double for loop in numpy

I currently have the following double loop in my Python code:
for i in range(a):
    for j in range(b):
        A[:,i]*=B[j][:,C[i,j]]
(A is a float matrix. B is a list of float matrices. C is a matrix of integers. By matrices I mean m x n np.arrays.
To be precise, the sizes are: A: mxa B: b matrices of size mxl (with l different for each matrix) C: axb. Here m is very large, a is very large, b is small, the l's are even smaller than b
)
I tried to speed it up by doing
for j in range(b):
    A[:,:]*=B[j][:,C[:,j]]
but surprisingly to me this performed worse.
More precisely, this did improve performance for small values of m and a (the "large" numbers), but from m=7000, a=700 onwards the first approach is roughly twice as fast.
Is there anything else I can do?
Maybe I could parallelize? But I don't really know how.
(I am not committed to either Python 2 or 3)

Here's a vectorized approach assuming B as a list of arrays that are of the same shape -
# Convert B to a 3D array
B_arr = np.asarray(B)
# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis.
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]),:,C].prod(1).T
For cases with smaller a, we could run a loop that iterates through the length of a instead. Also, for more performance, it might be a better idea to store those elements into a separate 2D array instead and perform the elementwise multiplication only once after we get out of the loop.
Thus, we would have an alternative implementation like so -
range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:,i] = B_arr[range_arr,:,C[i,:]].prod(0)
A *= out
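The sketch below checks the first vectorized expression against the original double loop on small random inputs. The sizes and the random data are made up for illustration, and, as the answer assumes, every matrix in `B` has the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)
m, a, b, l = 6, 5, 3, 4  # small, made-up sizes

A = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]  # assumption: all matrices share one shape
C = rng.integers(0, l, size=(a, b))

# Reference result: the double loop from the question
A_loop = A.copy()
for i in range(a):
    for j in range(b):
        A_loop[:, i] *= B[j][:, C[i, j]]

# Vectorized version: indexing gives shape (a, b, m), the product over
# axis 1 reduces it to (a, m), and the transpose gives (m, a)
B_arr = np.asarray(B)
A_vec = A * B_arr[np.arange(b), :, C].prod(1).T

print(np.allclose(A_loop, A_vec))
```

Whether this is faster than the loop for large m and a would need to be measured, since the intermediate array of shape (a, b, m) can be large.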

Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensions array in Python, using numpy library. I want to walk through each column efficiently and check each time if elements are different from 0 to count their number in every column.
Suppose I have the following matrix.
M = array([[1,2], [3,4]])
The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while (curr_col < numb_colonnes):
    result = 0
    while (curr_row < numb_rows):
        # If different from 0
        if (M[curr_row][curr_col] != 0):
            result += 1
        curr_row += 1
    .... using result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!

In the code you showed us, you treat numpy's arrays as lists and, as you can see, it works! But arrays are not lists, and if you only ever treat them as lists there is little point in using arrays, or even numpy.
To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np
a = np.array(((3,4),(5,6)))
print(np.sum(a))          # 18
print(np.sum(a, axis=0))  # [ 8 10]
print(np.sum(a, axis=1))  # [ 7 11]
Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... but
if you write a logical test on an array, you obtain an array of booleans. For example, to test which elements of a are even:
print(a%2==0)
# [[False  True]
#  [False  True]]
False is zero and True is one, at least when we sum it...
print(np.sum(a%2==0))  # 2
or, if you want to sum down each column, i.e., the index that changes is the 0-th
print(np.sum(a%2==0, axis=0))  # [0 2]
or sum across each row
print(np.sum(a%2==0, axis=1))  # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
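As a small sketch of the same idea, `np.count_nonzero` with an `axis` argument counts the non-zero entries per column directly. The matrix below is a made-up example, not the asker's data.

```python
import numpy as np

# Hypothetical matrix with some zero entries
M = np.array([[1, 0, 3],
              [0, 0, 4]])

# Number of non-zero elements in each column
by_col = np.count_nonzero(M, axis=0)

# Number of non-zero elements in the whole array
total = np.count_nonzero(M)

print(by_col)  # [1 0 2]
print(total)   # 3
```

This should agree with `np.sum(M != 0, axis=0)` from the summary above.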

fast way to get the indices of a lower triangular matrix as 1 dimensional list in python

Given the number of rows (or columns) , n, of a square matrix, I am trying to get the index pairs of the lower triangular matrix in a 1 dimensional list. So far I thought of the following solution:
def getLowerTriangularIndices(n):
    inds = []
    for i in range(1, n):
        for j in range(i):
            inds.append((i, j))
    return inds
Considering the two for loops, it would be far better to have a more efficient way of calculating this maybe using numpy. Does anyone have a suggestion?

Numpy has a method for that...
import numpy as np
# create your matrix. If it's not yet a numpy array, make it one
ar = np.array(matrix)
indices = np.tril_indices_from(ar)
This returns a tuple of two arrays. If you want to have them as lists, you could do
indices = [list(x) for x in np.tril_indices_from(ar)]
You actually do not need to have an array to get the indices, there is also np.tril_indices, which takes the shape as arguments.
So your function would read:
def getLowerTriangularIndices(n):
    return [list(x) for x in np.tril_indices(n)]
or if you want a list of tuples instead:
def getLowerTriangularIndices(n):
    return list(zip(*np.tril_indices(n)))
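One detail worth checking: the loop in the question skips the diagonal (it starts at `i = 1` and uses `j < i`), while `np.tril_indices(n)` includes it by default. A sketch that matches the loop's output passes the offset `k=-1`. The function name `lower_triangular_pairs` is only illustrative.

```python
import numpy as np

def lower_triangular_pairs(n):
    # k=-1 excludes the main diagonal, giving the strictly lower triangle
    rows, cols = np.tril_indices(n, k=-1)
    # Convert to plain Python ints so the result is a list of (i, j) tuples
    return list(zip(rows.tolist(), cols.tolist()))

print(lower_triangular_pairs(3))  # [(1, 0), (2, 0), (2, 1)]
```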
