pythonic way to apply function to object multiple times - python

I want to repeatedly sum over varying dimensions of a numpy ndarray eg.
#what I've got
sumOverDims = [6, 4, 2, 1]
ndarray = any n-dimensional numpy array
#what I want to do
ndarray.sum(6).sum(4).sum(2).sum(1)
how can I do this without an ugly loop?

Numpy's sum accepts a tuple for the axis argument:
ndarray.sum(axis=(1,2,4,6))
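As a quick check that the tuple form matches the chained calls, here is a minimal sketch. The 7-dimensional array `arr` is made up for illustration; any array with at least seven axes would do.

```python
import numpy as np

# Hypothetical 7-dimensional array, so that axes 1, 2, 4 and 6 all exist.
arr = np.arange(2 ** 7).reshape((2,) * 7)

# Chained sums, highest axis first so the remaining axis numbers do not shift.
chained = arr.sum(6).sum(4).sum(2).sum(1)

# One call with a tuple of axes.
at_once = arr.sum(axis=(1, 2, 4, 6))

print(chained.shape)                      # (2, 2, 2)
print(np.array_equal(chained, at_once))   # True
```

The tuple form refers to the axes of the original array, so the order inside the tuple does not matter.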

In general, a thing like
X.f(e0).f(e1).f(e2).…
can be rephrased as
reduce(lambda a, b: a.f(b), [ e0, e1, e2, … ], X)
or (if you dislike lambdas):
def f_(a, b): return a.f(b)
reduce(f_, [ e0, e1, e2, … ], X)
But I'm a bit in doubt if this really makes it more readable and effectively clearer (and thus more Pythonic) than using an iterative loop:
result = X
for e in [ e0, e1, e2, … ]:
    result = result.f(e)
return result
I guess it boils down to a matter of taste and what you are more used to.
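To make the comparison concrete, here is a small sketch of both forms on a built-in type. The choice of `frozenset.union` as the method `f` and the sample values are assumptions made only for illustration.

```python
from functools import reduce

start = frozenset({1})
extras = [{2, 3}, {3, 4}, {5}]

# Functional form: fold the method call over the arguments.
folded = reduce(lambda acc, e: acc.union(e), extras, start)

# Loop form: same result, one step per line.
looped = start
for e in extras:
    looped = looped.union(e)

print(sorted(folded))    # [1, 2, 3, 4, 5]
print(folded == looped)  # True
```

Note that in Python 3 `reduce` has to be imported from `functools`.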

You could use reduce (in Python 3, import it with from functools import reduce). An example with only two dimensions:
>>> A = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
>>> reduce(numpy.sum, [1], A)
array([ 6, 15, 24])
>>> reduce(numpy.sum, [1, 0], A)
45
The arguments to reduce are numpy's sum function, a list of the dimensions to sum over, and the numpy array A as the initial value. On each step the function receives the (partially summed) numpy array and the next dimension to sum over.
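One caveat with the step-by-step approach: every sum removes an axis, so the axis numbers of the remaining dimensions can shift. The sketch below works around this by summing the highest axis first. The helper name `sum_over` is hypothetical, and it assumes non-negative axis numbers.

```python
from functools import reduce
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

def sum_over(arr, axes):
    """Sum over each axis in turn, highest axis first (hypothetical helper)."""
    return reduce(np.sum, sorted(axes, reverse=True), arr)

print(sum_over(A, [0, 1]))   # 45

# Ascending order fails: after summing axis 0 the result is 1-D,
# so axis 1 no longer exists.
try:
    reduce(np.sum, [0, 1], A)
except (ValueError, IndexError) as exc:
    print("ascending order failed:", type(exc).__name__)
```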


PyTorch - Efficient way to apply different functions to different 'row/column' of a tensor

Let's say I have a 2-d tensor:
x = torch.Tensor([[1, 2], [3, 4]])
Is there an efficient way to apply one function to the first 'row' [1, 2] and apply a second different function to the second row [3, 4]? (Doesn't have to be a row, could be across any dimension)
At the moment, I use the following code: Say I have my two functions, f and g, for example,
def f(z):
    return 2 * z
def g(z):
    return 0.5 * z
Then, to apply them to separate rows I would do:
torch.cat([f(x[0]).unsqueeze(0), g(x[1]).unsqueeze(0)], dim = 0)
which gives the desired tensor [[2, 4], [1.5, 2]].
Obviously, in this 2-d example this solution is fine, but it seems a bit clunky. Is there a better way of doing this? Particularly in higher dimensions or when there are a large number of elements in the chosen dimension
A handy tip is to slice instead of selecting to avoid the unsqueeze step. Indeed, notice how x[:1] keeps the indexed dimension compared to x[0].
This way you can perform the desired operation in a slightly shorter form:
>>> torch.vstack((f(x[:1]), g(x[1:])))
Using vstack also means you do not have to pass dim=0, as torch.cat requires.
Alternatively, you can use a helper function that will apply both f and g:
>>> fn = lambda a,b: (f(a), g(b))
And split the tensor inline with torch.Tensor.split:
>>> torch.vstack(fn(*x.split(1)))

Optimize testing all combinations of rows from multiple NumPy arrays

I have three NumPy arrays of ints, same number of columns, arbitrary number of rows each. I am interested in all instances where a row of the first one plus a row of the second one gives a row of the third one ([3, 1, 4] + [1, 5, 9] = [4, 6, 13]).
Here is a pseudo-code:
for i, j in rows(array1), rows(array2):
    if i + j is in rows(array3):
        somehow store the rows this occurred at (eg. (1,2,5) if 1st row of
        array1 + 2nd row of array2 give 5th row of array3)
I will need to run this for very big matrices so I have two questions:
(1) I can write the above using nested loops but is there a quicker way, perhaps list comprehensions or itertools?
(2) What is the fastest/most memory-efficient way to store the triples? Later I will need to create a heatmap using two as coordinates and the first one as the corresponding value eg. point (2,5) has value 1 in the pseudo-code example.
Would be very grateful for any tips - I know this sounds quite simple but it needs to run fast and I have very little experience with optimization.
edit: My ugly code was requested in comments
import numpy as np
#random arrays
A = np.array([[-1,0],[0,-1],[4,1], [-1,2]])
B = np.array([[1,2],[0,3],[3,1]])
C = np.array([[0,2],[2,3]])
#triples stored as numbers with 2 coordinates in a otherwise-zero matrix
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal((A[i,] + B[j,]), C[k,]):
                output_matrix[j, k] = i+1
print(output_matrix)
We can leverage broadcasting to perform all the summations and comparisons in a vectorized manner, then use np.where on the result to get the indices of the matches, and finally index and assign:
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
mask = ((A[:,None,None,:] + B[None,:,None,:]) == C).all(-1)
I,J,K = np.where(mask)
output_matrix[J,K] = I+1
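To see what the broadcasting produces, here is the same computation on the sample arrays from the question, with the intermediate shapes spelled out. The variable name `sums` is introduced here only for readability.

```python
import numpy as np

A = np.array([[-1, 0], [0, -1], [4, 1], [-1, 2]])
B = np.array([[1, 2], [0, 3], [3, 1]])
C = np.array([[0, 2], [2, 3]])

# sums[i, j, 0] is A[i] + B[j]; shape (len(A), len(B), 1, ncols)
sums = A[:, None, None, :] + B[None, :, None, :]
# mask[i, j, k] is True where A[i] + B[j] equals C[k] in every column
mask = (sums == C).all(-1)
print(mask.shape)                 # (4, 3, 2)

I, J, K = np.where(mask)
print(list(zip(I.tolist(), J.tolist(), K.tolist())))
# [(0, 0, 0), (1, 1, 0), (3, 2, 1)]

output_matrix = np.zeros((len(B), len(C)), dtype=int)
output_matrix[J, K] = I + 1
print(output_matrix.tolist())     # [[1, 0], [2, 0], [0, 4]]
```

The comparison builds a boolean array with len(A) * len(B) * len(C) * ncols elements, so for very large inputs memory may become the limiting factor.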
(1) Improvements
You can use sets for the final result in the third matrix, as a + b = c must hold identically. This already replaces one nested loop with a constant-time lookup. I will show you an example of how to do this below, but we first ought to introduce some notation.
For a set-based approach to work, we need a hashable type. Lists will thus not work, but a tuple will: it is an ordered, immutable structure. There is, however, a problem: tuple addition is defined as appending, that is,
(0, 1) + (1, 0) = (0, 1, 1, 0).
This will not do for our use-case: we need element-wise addition. As such, we subclass the built-in tuple as follows,
class AdditionTuple(tuple):
    def __add__(self, other):
        """
        Element-wise addition.
        """
        if len(self) != len(other):
            raise ValueError("Undefined behaviour!")
        return AdditionTuple(self[idx] + other[idx]
                             for idx in range(len(self)))
Where we override the default behaviour of __add__. Now that we have a data-type amenable to our problem, let's prepare the data.
You give us,
A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]
To work with. I say,
from types import SimpleNamespace
A = [AdditionTuple(item) for item in A]
B = [AdditionTuple(item) for item in B]
C = {tuple(item): SimpleNamespace(idx=idx, values=[])
     for idx, item in enumerate(C)}
That is, we modify A and B to use our new data-type, and turn C into a dictionary which supports (amortised) O(1) look-up times.
We can now do the following, eliminating one loop altogether,
from itertools import product
for a, b in product(enumerate(A), enumerate(B)):
    idx_a, a_i = a
    idx_b, b_j = b
    if a_i + b_j in C:  # a_i + b_j == c_k, identically
        C[a_i + b_j].values.append((idx_a, idx_b))
Then,
>>> print(C)
{(2, 3): namespace(idx=1, values=[(3, 2)]), (0, 2): namespace(idx=0, values=[(0, 0), (1, 1)])}
Where for each value in C, you get the index of that value (as idx), and a list of tuples of (idx_a, idx_b) whose elements of A and B together sum to the value at idx in C.
Let us briefly analyse the complexity of this algorithm. Redefining the lists A, B, and C as above is linear in the length of the lists. Iterating over A and B is of course in O(|A| * |B|), and the nested condition computes the element-wise addition of the tuples: this is linear in the length of the tuples themselves, which we shall denote k. The whole algorithm then runs in O(k * |A| * |B|).
This is a substantial improvement over your current O(k * |A| * |B| * |C|) algorithm.
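The same lookup idea can also be sketched without a tuple subclass, by summing rows with zip and collecting (i, j, k) triples directly. This is a simplified variant, not the code above; the names `index_of_c` and `triples` are made up for illustration, and it assumes the rows of C are unique.

```python
from itertools import product

A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]

# Map each row of C (as a hashable tuple) to its index.
# Assumes the rows of C are unique; duplicates would keep only the last index.
index_of_c = {tuple(row): k for k, row in enumerate(C)}

triples = []
for (i, a), (j, b) in product(enumerate(A), enumerate(B)):
    total = tuple(x + y for x, y in zip(a, b))   # element-wise sum
    k = index_of_c.get(total)
    if k is not None:
        triples.append((i, j, k))

print(triples)   # [(0, 0, 0), (1, 1, 0), (3, 2, 1)]
```

The indices here are zero-based, so the triple (0, 0, 0) means the first row of A plus the first row of B equals the first row of C.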
(2) Matrix plotting
Use a dok_matrix, a sparse SciPy matrix representation. Then you can use any heatmap-plotting library you like on the matrix, e.g. Seaborn's heatmap.

Unpacking a list in python using .T?

I'm using scipy's method integrate.odeint to solve a second order LDE. The method requires that the equation be put in the form of a system of two first-order equations in two unknowns. The method
odeint(system_matrix,initial_conditions_matrix,time_values)
outputs the solution vector at each point of time in time_values. The solution vector is actually of the form [u,u'], where u is the variable I am interested in. So I want to plot only u. I found online one way of accomplishing this is to use
u,u'=odeint(system_matrix,initial_conditions_matrix,time_values).T
but I don't understand why this works and what does the .T at the end mean?
odeint(system_matrix,initial_conditions_matrix,time_values) is a matrix of 2 columns.
Unpacking iterates over the rows of an array, so to get the columns you first transpose with .T: the two columns become two rows, which then unpack into two variables.
BTW I doubt that u' is a valid variable name. I would do:
u,_ = odeint(system_matrix,initial_conditions_matrix,time_values).T
since second value is of no interest to you.
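A small sketch of the effect, using a hand-written array as a stand-in for the odeint result (the values are made up, and `du` is just an illustrative name for the derivative column):

```python
import numpy as np

# Stand-in for the odeint result: one row per time step, columns [u, u'].
sol = np.array([[0.0, 1.0],
                [0.1, 0.9],
                [0.2, 0.7]])

# Unpacking iterates over the first axis, so sol itself would need
# three targets (one per row). The transpose has two rows instead.
u, du = sol.T

print(sol.shape, sol.T.shape)   # (3, 2) (2, 3)
print(u.tolist())               # [0.0, 0.1, 0.2]
print(du.tolist())              # [1.0, 0.9, 0.7]
```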
The example I have in mind is:
>>> sol = odeint(pend, y0, t, args=(b, c))
The solution is an array with shape (101, 2). The first column is theta(t), and the second is omega(t). The following code plots both components.
>>>
>>> import matplotlib.pyplot as plt
>>> plt.plot(t, sol[:, 0], 'b', label='theta(t)')
>>> plt.plot(t, sol[:, 1], 'g', label='omega(t)')
sol[:,0] selects the first column of sol
Unpacking is usually used with a function that returns a tuple, for example:
def foo():
    ...
    return [1,2,3],{3:3}
x, y = foo()
should end up with x being a list, y a dictionary.
But it works with any iterable, provided the number of terms matches. For example, a 2-row array can be unpacked into 2 arrays.
In [1]: x, y = np.arange(6).reshape(2,3)
In [4]: x,y
Out[4]: (array([0, 1, 2]), array([3, 4, 5]))
If I'd created a (3,2) array I would have needed x,y,z= ..., or .T.
Because we can index columns and rows, unpacking isn't used a lot in numpy. Usually we have too many rows to unpack. But it works just as basic Python intended to.
As a matter of curiosity, transpose works on a tuple
In [6]: np.transpose((x,y))
Out[6]:
array([[0, 3],
[1, 4],
[2, 5]])
This is actually used in np.argwhere, which turns the tuple of indices produced by np.where into an array with the same number of columns as dimensions.

Function that acts on all elements of numpy array?

I wonder if you can define a function to act on all elements of a 1-D numpy array simultaneously, so that you don't have to loop over the array. Similar to the way you can, for example, square all elements of an array without looping. An example of what I'm after is to replace this code:
import numpy as np
A = np.array([ [1,4,2], [5,1,8], [2,9,5], [3,6,6] ])
B = []
for i in A:
    B.append( i[0] + i[1] - i[2] )
B = np.array(B)
print(B)
Output:
>>> array([3, -2, 6, 3])
With something like:
A = np.array([ [1,4,2], [5,1,8], [2,9,5], [3,6,6] ])
def F(Z):
    return Z[0] + Z[1] - Z[2]
print(F(A))
So that the output is something like:
>>> array( [ [3] , [-2], [6], [3] ] )
I know the 2nd code won't produce what I'm after, but I'm just trying to give an idea of what I'm talking about. Thanks!
EDIT:
I used the function above just as a simple example. The real function I'd like to use is something like this:
from numpy import linalg as LA
def F(Z):
    # Z is an array of matrices
    return LA.eigh(Z)[0]
So I have an array of 3x3 matrices, and I'd like an output array of their eigenvalues. And I'm wondering if it's possible to do this in some numpythonic way, so as not to have to loop over the array.
Try:
np.apply_along_axis(F, 1, A)
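For reference, here is a sketch of how that call behaves on the example, next to two alternatives that avoid calling a Python function per row. The column arithmetic and the use of `np.linalg.eigvalsh` on a stack of matrices are additions for comparison, not part of the answer above; the sample matrices are made up and assumed to be symmetric.

```python
import numpy as np

A = np.array([[1, 4, 2], [5, 1, 8], [2, 9, 5], [3, 6, 6]])

def F(Z):
    return Z[0] + Z[1] - Z[2]

# apply_along_axis calls F once per row (axis 1).
print(np.apply_along_axis(F, 1, A).tolist())      # [3, -2, 6, 3]

# Plain column arithmetic gives the same result in one vectorized expression.
print((A[:, 0] + A[:, 1] - A[:, 2]).tolist())     # [3, -2, 6, 3]

# For the eigenvalue case: a stack of symmetric 3x3 matrices, shape (2, 3, 3).
stack = np.array([np.diag([3.0, 1.0, 2.0]),
                  np.diag([6.0, 5.0, 4.0])])
eigenvalues = np.linalg.eigvalsh(stack)           # one row of eigenvalues per matrix
print(eigenvalues.shape)                          # (2, 3)
```

The linear algebra routines in numpy.linalg accept stacked matrices along the leading axes, so no explicit loop is needed for the eigenvalue case.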

Calculating new entries in array based on entries from another array in python

I have a question about how to "call" a specific cell in an array while looping over another array.
Assume, there is an array a:
a = [[a1 a2 a3],[b1 b2 b3]]
and an array b:
b = [[c1 c2] , [d1 d2]]
Now, I want to recalculate the values in array b, by using the information from array a. In detail, each value of array b has to be recalculated by multiplication with the integral of the gauss-function between the borders given in array a. but for the sake of simplicity, lets forget about the integral, and assume a simple calculation is necessary in the form of:
c1 = c1 * (a2-a1) ; c2 = c2 * (a3 - a2) and so on,
with indices it might look like:
b[i,j] = b[i,j] * (a[i, j+1] - a[i,j])
Can anybody tell me how to solve this problem?
Thank you very much and best regards,
Marc
You can use zip function within a nested list comprehension :
>>> [[k*(v[1]-v[0]) for k,v in zip(v,zip(s,s[1:]))] for s,v in zip(a,b)]
zip(s,s[1:]) will give you the desired pairs of elements, for example :
>>> s =[4, 5, 6]
>>> list(zip(s,s[1:]))
[(4, 5), (5, 6)]
Demo :
>>> b =[[7, 8], [6, 0]]
>>> a = [[1,5,3],[4 ,0 ,6]]
>>> [[k*(v[1]-v[0]) for k,v in zip(v,zip(s,s[1:]))] for s,v in zip(a,b)]
[[28, -16], [-24, 0]]
you can also do this really cleanly with numpy:
import numpy as np
a, b = np.array(a), np.array(b)
np.diff(a) * b
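Run on the demo values from the previous answer, this gives the same result as the list comprehension. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 5, 3], [4, 0, 6]])
b = np.array([[7, 8], [6, 0]])

# np.diff works along the last axis by default: a[:, 1:] - a[:, :-1]
widths = np.diff(a)
print(widths.tolist())        # [[4, -2], [-4, 6]]
print((widths * b).tolist())  # [[28, -16], [-24, 0]]
```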
First I would split your a table in a table of lower bound and one of upper bound to work with aligned tables and improve readability :
lowerBounds = a[...,:-1]
upperBounds = a[...,1:]
Define the Gauss function you provided :
def f(x, gs_wdth = 1., mean=0.):
    return 1./(numpy.sqrt(2*numpy.pi)*gs_wdth) * numpy.exp(-(x-mean)**2/(2*gs_wdth**2))
Then, use a nditer (see Iterating Over Arrays) to efficiently iterate over the arrays :
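import numpy
import scipy.integrate
# b must be a float array so the in-place multiplication can store the result
it = numpy.nditer([b, lowerBounds, upperBounds],
                  op_flags=[['readwrite'], ['readonly'], ['readonly']])
for _b, _lb, _ub in it:
    multiplier = scipy.integrate.quad(f, _lb, _ub)[0]
    _b[...] *= multiplier
print(b)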
This does the job required in your post, and should be computationally efficient. Note that b is modified "in-place" : original values are lost but there is no memory overshoot during calculation.
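If the Gauss function is a normalised Gaussian, its integral between two bounds also has a closed form in terms of the error function, which avoids numerical integration per cell. The sketch below is an alternative technique, not the nditer approach above; the helper name `gauss_integral` and the sample bounds are assumptions for illustration.

```python
import math
import numpy as np

def gauss_integral(lower, upper, gs_wdth=1.0, mean=0.0):
    """Integral of a normalised Gaussian between lower and upper (element-wise)."""
    erf = np.vectorize(math.erf)          # apply math.erf to every element
    scale = gs_wdth * math.sqrt(2.0)
    return 0.5 * (erf((upper - mean) / scale) - erf((lower - mean) / scale))

# Hypothetical bounds (array a) and values (array b)
a = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]])
b = np.ones((2, 2))

# Multiply each cell of b by the integral between its neighbouring bounds in a
b = b * gauss_integral(a[..., :-1], a[..., 1:])
print(b.round(4).tolist())   # [[0.3413, 0.3413], [0.4772, 0.4772]]
```

Unlike the in-place version, this creates a new array for b rather than overwriting the original.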
