Python: general sum over numpy rows

I want to sum over all the rows of a matrix: if I have an n x 2 matrix, the result should be a 1 x 2 vector with all rows summed. I can do something like that with np.sum(arg, axis=1), but I get an error if I supply a vector as the argument. Is there a more general sum function that doesn't throw an error when a vector is supplied? Note: This was never a problem in MATLAB.
Background: I wrote a function that calculates some values and sums over all rows of the matrix. Depending on the number of inputs, the matrix has a different number of rows, but always at least one.

According to the numpy.sum documentation, you cannot specify axis=1 for a vector: you would get a numpy AxisError saying that axis 1 is out of bounds for an array of dimension 1.
A possible workaround is to write a dedicated function that checks the number of dimensions before performing the sum. One possible implementation:
import numpy as np
M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])
def sum_over_columns(input_arr):
    if len(input_arr.shape) > 1:
        return input_arr.sum(axis=1)
    return input_arr.sum()
print(sum_over_columns(M))
print(sum_over_columns(v))
In a more pythonic way (not necessarily more readable):
def oneliner_sum(input_arr):
    return input_arr.sum(axis=(1 if len(input_arr.shape) > 1 else None))

You can do
np.sum(np.atleast_2d(x), axis=1)
This first converts a vector of shape (n,) into a 2D array of shape (1, n) if necessary, so that axis 1 always exists.
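As a minimal sketch of this approach (the helper name row_sums is only illustrative), the same call handles both the matrix and the vector from the example above. Note that for a vector the result is a length-1 array rather than a scalar:

```python
import numpy as np

def row_sums(x):
    # Promote 1-D input of shape (n,) to shape (1, n), then sum along axis 1.
    return np.sum(np.atleast_2d(x), axis=1)

M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])

print(row_sums(M))  # [5 5]
print(row_sums(v))  # [5]
```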

Related

comparing numpy arrays element-wise setting an element-wise result

This has possibly been asked before, but I have a hard time finding a corresponding solution because I can't find the right keywords to search for.
One huge advantage of numpy arrays that I like is that they are transparent for many operations.
However, in my code I have a function with a conditional statement of the following form (minimal working example):
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 3])
def func(x, y):
    if x > y:
        z = 1
    else:
        z = 2
    return z
func(arr1, arr2) obviously results in an error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I do understand what the problem here is and why it can't work like this.
What I would like is for x > y to be evaluated for each element, and for an array z to be returned holding the result of each comparison. (The arrays of course need to be of equal length, but that's not a problem here.)
I know that I could do this by changing func such that it iterates over the elements, but since I'm trying to improve my numpy skills: is there a way to do this without explicit iteration?
arr1 > arr2 does exactly what you'd hope it does: it compares the two arrays element by element and builds a boolean array with the results. That result can be used to index into either of the two arrays, should you need to. The equivalent of your function, however, can be written with np.where:
res = np.where(arr1 > arr2, 1, 2)
or equivalently (but slightly less efficiently), using the boolean array directly:
res = np.ones(arr1.shape)
res[arr1 <= arr2] = 2 # note that I have to invert the condition to select the cells where you want the 2
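A short sketch with the arrays from the question shows what each piece returns. One small difference worth knowing: np.where with integer arguments returns an integer array, while np.ones yields floats unless a dtype is passed (the dtype=int and the mask variable here are additions for illustration, not part of the answer above).

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 3])

mask = arr1 > arr2               # element-wise comparison
print(mask)                      # [False  True False]
print(arr1[mask])                # [2]  (boolean mask used as an index)

res = np.where(mask, 1, 2)       # 1 where mask is True, 2 elsewhere
print(res)                       # [2 1 2]

res2 = np.ones(arr1.shape, dtype=int)
res2[~mask] = 2                  # ~mask is the same as arr1 <= arr2
print(res2)                      # [2 1 2]
```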

Multiply each row of a matrix with its conjugate transpose in numpy

I have a numpy.ndarray variable A of size MxN. I wish to take each row and multiply it by its conjugate transpose. For the first row we compute:
np.matmul(np.expand_dims(A[0,:],axis=1),np.expand_dims(A[0,:].conj(),axis=0))
This gives an NxN result. I want the final result of the whole operation to be of size MxNxN.
I can do this with a simple loop that iterates over the rows of A and concatenates the results. I wish to avoid the for loop to get a faster run time with SIMD operations. Is there a way to do this in a single line of code with broadcasting?
Otherwise, can I do something else and somehow reshape the results to fit my requirement?
The following code does the same as your code snippet but without a for loop. It uses np.repeat twice, however, so you will need to benchmark both versions to compare their memory and time performance.
import numpy as np
m, n = A.shape
x, y = A.conj().repeat(n, axis=0), A.reshape([-1, 1]).repeat(n, axis=1)
B = (x * y).reshape([m, n, n])
How it works
x repeats each row of the conjugate of A n consecutive times (its shape is m*n by n).
y flattens A into a single column and then repeats that column n times along the column axis (its shape is also m*n by n).
x and y are multiplied element-wise and the result is reshaped into an array of shape m by n by n stored in B.
A list comprehension could do the trick:
result = np.array([np.matmul(np.expand_dims(A[i,:],axis=1), np.expand_dims(A[i,:].conj(),axis=0)) for i in range(A.shape[0])])
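Another option, sketched here as an alternative to the answers above rather than a benchmarked recommendation, is to let broadcasting build the outer products directly, or to spell out the same index pattern with np.einsum. The example array A is made up for illustration.

```python
import numpy as np

# Hypothetical example input of shape (M, N) = (2, 3)
A = np.array([[1 + 2j, 3 - 1j, 2j],
              [0.5j, 4 + 0j, 1 - 1j]])

# A[:, :, None] has shape (M, N, 1); A.conj()[:, None, :] has shape (M, 1, N).
# Broadcasting gives B[i, j, k] = A[i, j] * conj(A[i, k]), shape (M, N, N).
B = A[:, :, None] * A.conj()[:, None, :]

# The same index pattern written with einsum
B_einsum = np.einsum('ij,ik->ijk', A, A.conj())

# Reference: the explicit per-row outer product (np.outer does not conjugate)
B_loop = np.array([np.outer(row, row.conj()) for row in A])

print(B.shape)  # (2, 3, 3)
print(np.allclose(B, B_einsum) and np.allclose(B, B_loop))  # True
```

Neither variant materialises the repeated intermediate arrays, but whether that is faster for a given M and N is something to measure rather than assume.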

Performing mathematical operations with arrays of arbitrary length?

I don't understand this question. Actually, just this part:
"Given two vectors of length n that are represented with one-dimensional arrays"
I am supposed to use two vectors, but I don't know what values they have.
For example,
a vector could be a = [1,2,3],
but I don't know exactly what they are. What do they contain?
Maybe it is a = [3,4,5].
You don't need numpy to do something as simple as this.
Instead just translate the formula into Python code:
import math
a = [1, 2, 3]
b = [3, 4, 5]
n = len(a)
# Compute Euclidean distance between vectors "a" and "b".
# First sum the squares of the difference of each component of vectors.
distance = 0
for i in range(n):
    difference = a[i] - b[i]
    distance += difference * difference
# The answer is square root of those summed differences.
distance = math.sqrt(distance)
print(distance) # -> 3.4641016151377544
Your task is to write code that computes the value if the vectors a and b are given. Your job is not to write down a number.
You could start with this:
distance = 0
for value in a:
    [your code]
print(distance)
You could use numpy. Your so-called vectors would then correspond to numpy arrays.
import numpy as np
np.sqrt(np.sum(np.power(a-b,2)))
If a and b are plain lists, you need to convert them to arrays beforehand:
a, b = np.array(a),np.array(b)
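Put together, a self-contained version of the numpy approach might look like this. np.linalg.norm is shown as an equivalent shortcut; it is not part of the answer above.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 4, 5])

# Square root of the summed squared differences
distance = np.sqrt(np.sum(np.power(a - b, 2)))

# Equivalent: the 2-norm of the difference vector
distance_norm = np.linalg.norm(a - b)

print(distance)
print(np.isclose(distance, distance_norm))  # True
```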

Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensional array in Python, using the numpy library. I want to walk through each column efficiently and count how many elements in each column are different from 0.
Suppose I have the following matrix.
M = np.array([[1,2], [3,4]])
The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while curr_col < numb_colonnes:
    result = 0
    while curr_row < numb_rows:
        # If different from 0
        if M[curr_row][curr_col] != 0:
            result += 1
        curr_row += 1
    # .... using result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!
In the code you showed us, you treat numpy arrays as lists and, as far as you can see, it works! But arrays are not lists, and if you only ever treat them as lists there is little point in using arrays, or even numpy.
To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np
a = np.array(((3,4),(5,6)))
print(np.sum(a)) # 18
print(np.sum(a, axis=0)) # [ 8 10]
print(np.sum(a, axis=1)) # [ 7 11]
Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... but
if you write a logical test on an array, you obtain an array of booleans, e.g., we want to test which elements of a are even
print(a%2==0)
# [[False  True]
#  [False  True]]
False is zero and True is one, at least when we sum it...
print(np.sum(a%2==0)) # 2
or, if you want to sum down each column, i.e., the index that changes is the 0-th
print(np.sum(a%2==0, axis=0)) # [0 2]
or sum across each row
print(np.sum(a%2==0, axis=1)) # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
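If only the counts are needed, np.count_nonzero should do the same job in one call (as far as I know, its axis argument requires numpy 1.12 or later). A small sketch with a made-up matrix:

```python
import numpy as np

# Hypothetical example matrix
M = np.array([[1, 0, 3],
              [0, 0, 4],
              [5, 0, 6]])

by_col = np.count_nonzero(M, axis=0)   # non-zero entries per column
print(by_col)                          # [2 0 3]
print(np.sum(M != 0, axis=0))          # [2 0 3], same result via sum
print(by_col.sum())                    # 5, the grand total
```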

fast way to get the indices of a lower triangular matrix as 1 dimensional list in python

Given the number of rows (or columns), n, of a square matrix, I am trying to get the index pairs of the lower triangular matrix in a one-dimensional list. So far I have come up with the following solution:
def getLowerTriangularIndices(n):
    inds = []
    for i in range(1, n):
        for j in range(i):
            inds.append((i, j))
    return inds
Considering the two for loops, it would be far better to have a more efficient way of calculating this maybe using numpy. Does anyone have a suggestion?
Numpy has a method for that...
import numpy as np
# create your matrix. If it's not yet a numpy array, make it one
ar = np.array(matrix)
indices = np.tril_indices_from(ar)
This returns a tuple of two arrays. If you want to have them as lists, you could do
indices = [list(x) for x in np.tril_indices_from(ar)]
You actually do not need an array to get the indices: there is also np.tril_indices, which takes the matrix size n as its argument.
So your function would read:
def getLowerTriangularIndices(n):
    return [list(x) for x in np.tril_indices(n)]
or if you want a list of tuples instead:
def getLowerTriangularIndices(n):
    return list(zip(*np.tril_indices(n)))
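One caveat: the loop in the question skips the diagonal (it only yields pairs with j < i), whereas np.tril_indices(n) includes it. Passing the offset k=-1 should reproduce the original output exactly. A minimal sketch, with snake_case names chosen for illustration and tolist() added only to get plain Python ints:

```python
import numpy as np

def get_lower_triangular_indices(n):
    # k=-1 selects the entries strictly below the main diagonal
    rows, cols = np.tril_indices(n, k=-1)
    return list(zip(rows.tolist(), cols.tolist()))

def get_lower_triangular_indices_loop(n):
    # The double loop from the question, kept for comparison
    return [(i, j) for i in range(1, n) for j in range(i)]

print(get_lower_triangular_indices(3))  # [(1, 0), (2, 0), (2, 1)]
```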
