I have multiple arrays of the same dimension, or rather a matrix say
data.shape
# (n, m)
I want to interpolate the m-axis and leave the n-axis. Ideally I would get a function which I can call by with an x-array of length n.
interpolated(x)
x.shape
# (n,)
I tried
from scipy import interpolate
interpolated = interpolate.interp1d(x=x_points, y=data)
interpolated(x).shape
# (n, n)
but this evaluates every array at the given point. Is there a better way to do it than ugly loops like
interpolated = array(interpolate.interp1d(x=x_points, y=array_) for
array_ in data)
array(func_(xi) for func_, xi in zip(interpolated, x))
Your (n,m)-shaped data is, as you said, is a collection of n datasets, each of length m. You're trying to pass this an n-length x array, and expect to obtain an n-length result. That is, you're querying the n independent datasets at n unrelated points.
This makes me believe that you need to use n independent interpolators. There is no real benefit in trying to get away with a single call to an interpolation routine. Interpolation routines as far as I know assume that the target of the interpolation is a single object. Either a multivariate function, or a function that has an array-shaped value; in either case you can query the function one (optionally higher-dimensional) point at a time. For instance, multilinear interpolation works across rows of the input, so there's (again, as far as I know) no way to "interpolate linearly along an axis". In your case, there is absolutely no relationship between the rows of your data, and there's no relationship between query points, so it's also semantically motivated to use n independent interpolators for your problem.
As for convenience, you can shove all those interpolating functions into a single function for ease of use:
interpolated = [interpolate.interp1d(x=x_points, y=array_) for
array_ in data]
def common_interpolator(x):
'''interpolate n separate datasets at n separate input points'''
return array([fun(xx) for fun,xx in zip(interpolated,x)])
This will allow you to use a single call to common_interpolator with an input array_like of length n.
But since you mentioned it in comments, you can actually make use of np.vectorize if you want to add multiple sets if query points to this function. Here's a complete example with three trivial dummy functions:
import numpy as np
# three scalar (well, or vectorized) functions:
funs = [lambda x,i=i: x+i for i in range(3)]
# define a wrapper for calling them together
def allfuns(xs):
'''bundled call to functions: n-length input to n-length output'''
return np.array([fun(x) for fun,x in zip(funs,xs)])
# define a vectorized version of the wrapper, (...,n) to (...,n)-shape
allfuns_vector = np.vectorize(allfuns,signature='(n)->(n)')
# print some examples
x = np.arange(3)
print([fun(xx) for fun,xx in zip(funs,x)])
# [0, 2, 4]
print(allfuns(x))
# [0 2 4]
print(allfuns_vector(x))
# [0 2 4]
print(allfuns_vector([x,x+10]))
#[[ 0 2 4]
# [10 12 14]]
As you can see, all of the above work the same way for a 1d input array. But we can pass a (k,n)-shaped array to the vectorized version and it will perform the interpolation row-wise, that is each [:,n] slice will be fed to the original interpolator bundle. As far as I know np.vectorize is essentially a wrapper for a for loop, but at least it makes calling your functions more convenient.
Related
I have a function foo(x,y) that takes two scalars (or lists of scalars) and returns a scalar output (or list of scalars computed pairwise from the input). I want to be able to evaluate this function over 2 orthogonal arrays such that the output is a matrix ij of foo(x[i], y[j]).
I have a for-loop version that solves this problem as below:
import numpy as np
x = np.range(50) # Could be linspaces, whatever the axis in the vector space is
y = np.range(50)
mat = np.zeros(len(x), len(y)) # To hold the result for plotting
for i in range(len(x)):
for j in range(len(y)):
mat[i][j] = foo(x[i], y[j])
where my result is stored in mat. However, this is dreadfully slow, and looks to me as if it could easily be vectorized. I'm not aware of how Python solves this problem however, as this doesn't appear to be something like zip or map. Is there another such function or concept (beyond trivially making extremely long arrays of the same array rotated by a value and passing them that way) that could vectorize this successfully? Or is the nature of the foo function limiting the ability to vectorize this?
In this case, itertools.product is the tool you want. It generates an iterable sequence of elements from the Cartesian product of N inputs, which you can use to discretely map a vector space. You can then evaluate foo on these. This isn't vectorization per se, but does reduce the nested for loop.
See docs at https://docs.python.org/3/library/itertools.html#itertools.product
Equation f(x,a,b) below requires an iterative solution, for which I am using one of the scipy optimisation methods ('brentq') which is essentially calculating the value of x for which f(x,a,b)=0.
However, I need to use array inputs for 'a' and 'b' and the arrays are very large e.g. could be as high as 1-100 million.
What is the most efficient/fastest way to do this with scipy/numpy? At present I am resorting to for loops as per below, but this becomes slow with my actual underlying equations (not shown). Note that each row in array is independent of others.
import numpy as np
from scipy import optimize
# function to solve (simplified)
def f(x,a,b): return (a/x)**0.25 * (x**0.5) - b*x
# array size
N = 10000000
# example input arrays from which 'a' and 'b' are taken (in reality values come from other complex functions)
A = np.linspace(1,500,N)
B = np.linspace(0.1,1,N)
# solution using brentq
results = [optimize.brentq(f, 1e10, 1000, args=(a,b)) for a,b in zip(A,B)]
results
I have to to multiple operations on sub-arrays like matrix inversions or building determinants. Since for-loops are not very fast in Python I wonder what is the best way to do this.
import numpy as np
n = 8
a = np.random.rand(3,3,n)
b = np.empty(n)
c = np.zeros_like(a)
for i in range(n):
b[i] = np.linalg.det(a[:,:,i])
c[:,:,i] = np.linalg.inv(a[:,:,i])
Those numpy.linalg functions would accept n-dim arrays as long as the last two axes are the ones that form the 2D slices along which functions are intended to be operated upon. Hence, to solve our cases, permute axes to bring-up the axis of iteration as the first one, perform the required operation and if needed push-back that axis back to it's original place.
Hence, we could get those outputs, like so -
b = np.linalg.det(np.moveaxis(a,2,0))
c = np.moveaxis(np.linalg.inv(np.moveaxis(a,2,0)),0,2)
I am trying to get rid of the for loop and instead do an array-matrix multiplication to decrease the processing time when the weights array is very large:
import numpy as np
sequence = [np.random.random(10), np.random.random(10), np.random.random(10)]
weights = np.array([[0.1,0.3,0.6],[0.5,0.2,0.3],[0.1,0.8,0.1]])
Cov_matrix = np.matrix(np.cov(sequence))
results = []
for w in weights:
result = np.matrix(w)*Cov_matrix*np.matrix(w).T
results.append(result.A)
Where:
Cov_matrix is a 3x3 matrix
weights is an array of n lenght with n 1x3 matrices in it.
Is there a way to multiply/map weights to Cov_matrix and bypass the for loop? I am not very familiar with all the numpy functions.
I'd like to reiterate what's already been said in another answer: the np.matrix class has much more disadvantages than advantages these days, and I suggest moving to the use of the np.array class alone. Matrix multiplication of arrays can be easily written using the # operator, so the notation is in most cases as elegant as for the matrix class (and arrays don't have several restrictions that matrices do).
With that out of the way, what you need can be done in terms of a call to np.einsum. We need to contract certain indices of three matrices while keeping one index alone in two matrices. That is, we want to perform w_{ij} * Cov_{jk} * w.T_{ki} with a summation over j, k, giving us an array with i indices. The following call to einsum will do:
res = np.einsum('ij,jk,ik->i', weights, Cov_matrix, weights)
Note that the above will give you a single 1d array, whereas you originally had a list of arrays with shape (1,1). I suspect the above result will even make more sense. Also, note that I omitted the transpose in the second weights argument, and this is why the corresponding summation indices appear as ik rather than ki. This should be marginally faster.
To prove that the above gives the same result:
In [8]: results # original
Out[8]: [array([[0.02803215]]), array([[0.02280609]]), array([[0.0318784]])]
In [9]: res # einsum
Out[9]: array([0.02803215, 0.02280609, 0.0318784 ])
The same can be achieved by working with the weights as a matrix and then looking at the diagonal elements of the result. Namely:
np.diag(weights.dot(Cov_matrix).dot(weights.transpose()))
which gives:
array([0.03553664, 0.02394509, 0.03765553])
This does more calculations than necessary (calculates off-diagonals) so maybe someone will suggest a more efficient method.
Note: I'd suggest slowly moving away from np.matrix and instead work with np.array. It takes a bit of getting used to not being able to do A*b but will pay dividends in the long run. Here is a related discussion.
I'm trying to create a naive numerical integration function to illustrate the benefits of Monte Carlo integration in high dimensions. I want something like this:
def quad_int(f, mins, maxs, numPoints=100):
'''
Use the naive (Riemann sum) method to numerically integrate f on a box
defined by the mins and maxs.
INPUTS:
f - A function handle. Should accept a 1-D NumPy array
as input.
mins - A 1-D NumPy array of the minimum bounds on integration.
maxs - A 1-D NumPy array of the maximum bounds on integration.
numPoints - An integer specifying the number of points to sample in
the Riemann-sum method. Defaults to 100.
'''
n = len(mins)
# Create a grid of evenly spaced points to evaluate f on
# Evaluate f at each point in the grid; sum all these values up
dV = np.prod((maxs-mins/numPoints))
# Multiply the sum by dV to get the approximate integral
I know my dV is going to cause problems with numerical stability, but right now what I'm having trouble with is creating the domain. If the number of dimensions was fixed, it would be easy enough to just use np.meshgrid like this:
# We don't want the last value since we are using left-hand Riemann sums
x = np.linspace(mins[0],maxs[0],numPoints)[:-1]
y = np.linspace(mins[1],maxs[1],numPoints)[:-1]
z = np.linspace(mins[2],maxs[2],numPoints)[:-1]
X, Y, Z = np.meshgrid(x,y,z)
tot = 0
for index, x in np.ndenumerate(X):
tot += f(x, Y[index], Z[index])
Is there an analogue to np.meshgrid that can do this for arbitrary dimensions, maybe accept a tuple of arrays? Or is there some other way to do Riemann sums in higher dimensions? I've thought about doing it recursively but can't figure out how that would work.
You could use a list comprehension to generate all of the linspaces, and then pass that list to meshgrid with a * (to convert the list to a tuple of arguments).
XXX = np.meshgrid(*[np.linspace(i,j,numPoints)[:-1] for i,j in zip(mins,maxs)])
XXX is now a list of n arrays (each n dimensional).
I'm using straight forward Python list and argument operations.
np.lib.index_tricks has other index and grid generation functions and classes that might be of use. It's worth reading just to see how things can be done.
A neat trick used in various numpy functions when indexing arrays of unknown dimension, is to construct as list of the desired indices. It can include slice(None) where you'd normally see :. Then convert it to a tuple and use it.
In [606]: index=[2,3]
In [607]: [slice(None)]+index
Out[607]: [slice(None, None, None), 2, 3]
In [609]: Y[tuple([slice(None)]+index)]
Out[609]: array([ 0. , 0.5, 1. , 1.5])
In [611]: Y[:,2,3]
Out[611]: array([ 0. , 0.5, 1. , 1.5])
They use a list where they need to change elements. Converting to a tuple isn't always needed
index=[slice(None)]*3
index[1]=0
Y[index] # same as Y[:,0,:]