Reducing redundancy for calculating large number of integrals numerically - python

I need to calculate the following integral on a 2D-grid (x,y positions):
I(r) = integral from 0 to r of sqrt(1 - exp(-y^2)) dy
with r = sqrt(x^2 + y^2) and the 2D-grid centered at x=y=0.
The implementation is straightforward:
import numpy as np
from scipy import integrate
def integralFunction(x):
    def squareSaturation(y):
        return np.sqrt(1-np.exp(-y**2))
    return integrate.quad(squareSaturation,0,x)[0]
#vectorize function to apply function with integrals on np-array
integralFunctionVec = np.vectorize(integralFunction)
xmax = ymax = 5
Nx = Ny = 1024
X, Y = np.linspace(-xmax, xmax, Nx), np.linspace(-ymax, ymax, Ny)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2+Y**2)
Z = integralFunctionVec(R)
However, I'm currently working on a 1024x1024 grid and the calculation takes ~1.5 minutes. Now there is some redundancy in those calculations that I want to reduce to speed up the calculation. Namely:
As the grid is centered around r = 0, many values for r on the grid are the same. Due to symmetry only ~1/8 of all values are unique (for a square grid). One idea was to calculate the integral only for the unique values (found via np.unique) and then save them in a look-up table (hashmap?). Or I could cache the function values so that only new values are calculated (via @lru_cache). But does that actually work when I vectorize the function afterwards?
As the integral goes from 0 to r, the integral is often calculating integrals over intervals it has already calculated. E.g. if you calculate from 0 to 1 and afterwards from 0 to 2, only the interval from 1 to 2 is "new". But what would be the best way to utilize that? And would that even be a real performance boost using scipy.integrate.quad?
Do you have any feedback or other ideas to optimize this calculation?

You can use Numba to speed up the computation of quad. Here is an example:
import numpy as np
import numba as nb
from scipy import integrate, LowLevelCallable
# Compile the integrand to a C callback so that quad can call it without Python overhead
@nb.cfunc('float64(float64)')
def numbaSquareSaturation(y):
    return np.sqrt(1-np.exp(-y**2))
squareSaturation = LowLevelCallable(numbaSquareSaturation.ctypes)
def integralFunction(x):
    return integrate.quad(squareSaturation,0,x)[0]
integralFunctionVec = np.vectorize(integralFunction)
xmax = ymax = 5
Nx = Ny = 1024
X, Y = np.linspace(-xmax, xmax, Nx), np.linspace(-ymax, ymax, Ny)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2+Y**2)
Z = integralFunctionVec(R)
This is about 25 times faster on my machine. The code is still suboptimal since the calls to squareSaturation introduce a big overhead, but it seems SciPy does not provide a way to vectorize quad efficiently for your case. Note that using nb.cfunc together with scipy.LowLevelCallable significantly speeds up the execution, as pointed out by @max9111.
> As the grid is centered around r = 0, many values for r on the grid are the same. Due to symmetry only ~1/8 of all values are unique (for a square grid). One idea was to calculate the integral only for the unique values (found via np.unique) and then save them in a look-up table (hashmap?). Or I could cache the function values so that only new values are calculated (via @lru_cache). But does that actually work when I vectorize the function afterwards?
I do not expect this approach to be significantly faster, although not recomputing the values is indeed a good idea. Note that hashmaps are pretty slow, and so is np.unique. I suggest you just select one quarter of the input array R, something like R[0:R.shape[0]//2, 0:R.shape[1]//2]. Be careful if the shape is odd.
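A minimal sketch of the quarter-grid idea, assuming the grid is symmetric about x=y=0 as in the question. The helper name fill_by_symmetry is hypothetical, and the half-sizes are rounded up so that odd shapes keep their centre row and column:

```python
import numpy as np
from scipy import integrate

def square_saturation(y):
    return np.sqrt(1.0 - np.exp(-y**2))

def integral_to(r):
    # Integral of square_saturation from 0 to r
    return integrate.quad(square_saturation, 0.0, r)[0]

integral_to_vec = np.vectorize(integral_to)

def fill_by_symmetry(R, func):
    """Evaluate func on one quadrant of R and mirror it (hypothetical helper)."""
    ny, nx = R.shape
    hy, hx = (ny + 1) // 2, (nx + 1) // 2   # round up so odd shapes keep the centre line
    quarter = func(R[:hy, :hx])
    Z = np.empty(R.shape, dtype=float)
    Z[:hy, :hx] = quarter
    Z[:hy, nx - hx:] = quarter[:, ::-1]          # mirror left half onto right half
    Z[ny - hy:, :] = Z[:hy, :][::-1, :].copy()   # mirror top half onto bottom half
    return Z

axis = np.linspace(-5, 5, 64)
X, Y = np.meshgrid(axis, axis)
R = np.sqrt(X**2 + Y**2)
Z = fill_by_symmetry(R, integral_to_vec)
```

This only cuts the number of quad calls to roughly a quarter; it does not exploit the remaining diagonal symmetry.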
> As the integral goes from 0 to r, the integral is often calculating integrals over intervals it has already calculated. E.g. if you calculate from 0 to 1 and afterwards from 0 to 2, only the interval from 1 to 2 is "new". But what would be the best way to utilize that? And would that even be a real performance boost using scipy.integrate.quad?
This could help since the domain of each integral is smaller and the function should be smoother over it. This means SciPy should be faster to compute it. Even if it does not do that automatically, you can reduce the precision requested for the computed sub-intervals using the optional parameters of quad (for example epsabs and epsrel).
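One way the sub-interval idea could look, as a sketch rather than a tuned implementation: sort the unique radii, integrate only between consecutive radii, and accumulate with a cumulative sum. The function name cumulative_radial_integral is an assumption made for illustration. Note that it still relies on np.unique, which the answer above warns is not cheap.

```python
import numpy as np
from scipy import integrate

def square_saturation(y):
    return np.sqrt(1.0 - np.exp(-y**2))

def cumulative_radial_integral(R):
    """Integrate from 0 to every radius in R by chaining short sub-intervals."""
    r_sorted, inverse = np.unique(R, return_inverse=True)
    edges = np.concatenate(([0.0], r_sorted))
    # Each quad call covers only the "new" piece [r_(k-1), r_k]
    pieces = [integrate.quad(square_saturation, a, b)[0]
              for a, b in zip(edges[:-1], edges[1:])]
    table = np.cumsum(pieces)                 # table[k] = integral from 0 to r_sorted[k]
    return table[inverse].reshape(R.shape)    # map the table back onto the grid

axis = np.linspace(-5, 5, 32)
X, Y = np.meshgrid(axis, axis)
R = np.sqrt(X**2 + Y**2)
Z = cumulative_radial_integral(R)
```

The number of quad calls is the same as the number of unique radii, but each call covers a much shorter interval.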

Related

How better perform Pearson R from 2 arrays of dimensions (m, n) and (n), returning an array of (m) size? [Python, NumPy, SciPy]

I'm trying to improve a simple algorithm to obtain the Pearson correlation coefficient from two arrays, X(m, n) and Y(n), returning another array R of dimension (m).
In this case, I want to know how each row of X behaves with respect to the values of Y. A sample (working) code is presented below:
import numpy as np
from scipy.stats import pearsonr
np.random.seed(1)
m, n = 10, 5
x = 100*np.random.rand(m, n)
y = 2 + 2*x.mean(0)
r = np.empty(m)
for i in range(m):
    r[i] = pearsonr(x[i], y)[0]
For this particular case, I get: r = array([0.95272843, -0.69134753, 0.36419159, 0.27467137, 0.76887201, 0.08823868, -0.72608421, -0.01224453, 0.58375626, 0.87442889])
For small values of m (near 10k) this runs pretty fast, but I'm starting to work with m ~ 30k, and so this is taking much longer than I expected. I'm aware I could implement multiprocessing/multi-threading but I believe there's a (better) pythonic way of doing this.
I tried to use pearsonr(x, np.ones((m, n))*y), but it returns only (nan, nan).
pearsonr only supports 1D arrays internally. Moreover, it computes the p-value, which is not used here, so it would be more efficient not to compute it. Additionally, the code recomputes the statistics of the y vector on every iteration and does not make efficient use of vectorized NumPy operations. This is why the computation is a bit slow. You can check this in the code here.
One way to compute this is by writing your own custom implementation based on the one of Scipy:
def multi_pearsonr(x, y):
    xmean = x.mean(axis=1)
    ymean = y.mean()
    xm = x - xmean[:,None]
    ym = y - ymean
    normxm = np.linalg.norm(xm, axis=1)
    normym = np.linalg.norm(ym)
    return np.clip(np.dot(xm/normxm[:,None], ym/normym), -1.0, 1.0)
It is 450 times faster on my machine for m = 10_000.
Note that I did not keep the checks of the SciPy code, but it may be a good idea to keep them if your input is not guaranteed to be statistically safe (i.e. well formed for the computation of the Pearson test).
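A quick sanity check of the vectorized version against the original loop might look like the following. It is a sketch that restates the function in compact form and uses NumPy's default_rng, so the random numbers differ from those in the question:

```python
import numpy as np
from scipy.stats import pearsonr

def multi_pearsonr(x, y):
    # Centre each row of x and the vector y, then take normalised dot products
    xm = x - x.mean(axis=1, keepdims=True)
    ym = y - y.mean()
    r = xm @ ym / (np.linalg.norm(xm, axis=1) * np.linalg.norm(ym))
    return np.clip(r, -1.0, 1.0)

rng = np.random.default_rng(1)
m, n = 10, 5
x = 100 * rng.random((m, n))
y = 2 + 2 * x.mean(axis=0)

r_fast = multi_pearsonr(x, y)
r_loop = np.array([pearsonr(row, y)[0] for row in x])
print(np.allclose(r_fast, r_loop))
```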

Why is scipy.stats.gaussian_kde() slower than seaborn.kdeplot() for the same data?

In Python 3.7, I have a NumPy array with shape=(2, 34900). This array is a list of coordinates where index 0 represents the x axis and index 1 represents the y axis.
When I use seaborn.kdeplot() to visualize the distribution of this data, I get the result in about 5-15 seconds on a 7th-generation i5.
But when I try to run the following piece of code:
#Find the kernel for
k = scipy.stats.kde.gaussian_kde(data, bw_method=.3)
#Define the grid
xi, yi = np.mgrid[0:1:2000*1j, 0:1:2000*1j]
#apply the function
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
which finds the gaussian kernel for this data and applies it to a grid I defined, it takes much more time. I wasn't able to run the full array but when running on a slice with the size of 140, it takes about 40 seconds to complete.
The 140 sized slice does make an interesting result which I was able to visualize using plt.pcolormesh().
My question is what I am missing here. If I understood what is happening correctly, I'm using scipy.stats.kde.gaussian_kde() to create an estimate of a function defined by the data. Then I'm applying the function to a 2D space and getting its Z component as a result. Then I'm plotting the Z component. But how can this process be so different from seaborn.kdeplot() that the code takes so much longer?
Scipy's implementation just goes through each point doing this:
for i in range(self.n):
    diff = self.dataset[:, i, newaxis] - points
    tdiff = dot(self.inv_cov, diff)
    energy = sum(diff*tdiff,axis=0) / 2.0
    result = result + exp(-energy)
Seaborn has in general two ways to calculate the bivariate kde. If available, it uses statsmodels, if not, it falls back to scipy.
The scipy code is similar to what is shown in the question. It uses scipy.stats.gaussian_kde. The statsmodels code uses statsmodels.nonparametric.api.KDEMultivariate.
However, for a fair comparison we would need to take the same grid size for both methods. The standard gridsize for seaborn is 100 points.
import numpy as np; np.random.seed(42)
import seaborn.distributions as sd
N = 34900
x = np.random.randn(N)
y = np.random.randn(N)
bw="scott"
gridsize=100
cut=3
clip = [(-np.inf, np.inf), (-np.inf, np.inf)]
f = lambda x,y : sd._statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
g = lambda x,y : sd._scipy_bivariate_kde(x, y, bw, gridsize, cut, clip)
If we time those two functions,
# statsmodels
%timeit f(x,y) # 1 loop, best of 3: 16.4 s per loop
# scipy
%timeit g(x,y) # 1 loop, best of 3: 8.67 s per loop
Scipy is hence twice as fast as statsmodels (the seaborn default). The reason why the code in the question takes so long is that instead of a grid of size 100, a grid of size 2000 is used.
Seeing those results, one would actually be tempted to use scipy instead of statsmodels. Unfortunately, seaborn does not offer an option to choose which one to use. One hence needs to manually set the respective flag.
import seaborn as sns
import seaborn.distributions as sd
sd._has_statsmodels = False
# plot kdeplot with scipy.stats.kde.gaussian_kde
sns.kdeplot(x,y)
It seems that seaborn just takes a sample of my data. Since the sample is smaller, it is able to finish in a short time. SciPy, on the other hand, uses every single point in its processing, so it takes much longer with the size of dataset I'm using.
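Since the SciPy loop shown earlier visits every data point for every evaluation point, the cost grows with (number of data points) x (number of grid points). A 2000x2000 grid is therefore about 400 times more work than a 100x100 one. A minimal sketch of evaluating gaussian_kde on a 100x100 grid follows; the data here is a small synthetic stand-in, not the array from the question:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
data = rng.normal(size=(2, 2000))   # synthetic stand-in for the (2, 34900) array

kde = gaussian_kde(data, bw_method=0.3)

gridsize = 100                      # same order as the grid seaborn is said to use
xi, yi = np.mgrid[-3:3:gridsize * 1j, -3:3:gridsize * 1j]
zi = kde(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)
```

The resulting zi can be passed to plt.pcolormesh(xi, yi, zi) as in the question.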

Fast 3D interpolation of atmospheric data in Numpy/Scipy

I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
import numpy as np
from scipy import interpolate
def interpLevel(grid,value,data,interp='linear'):
    """
    Interpolate 3d data to a common z coordinate.
    Can be used to calculate the wind/pv/whatsoever values for a common
    potential temperature / pressure level.
    grid : numpy.ndarray
        The grid. For example the potential temperature values for the whole 3d
        grid.
    value : float
        The common value in the grid, to which the data shall be interpolated.
        For example, 350.0
    data : numpy.ndarray
        The data which shall be interpolated. For example, the PV values for
        the whole 3d grid.
    interp : str
        This indicates which kind of interpolation will be done. It is directly
        passed on to scipy.interpolate.interp1d().
    returns : numpy.ndarray
        A 2d array containing the *data* values at *value*.
    """
    ret = np.zeros_like(data[0,:,:])
    # range replaces the Python 2 xrange of the original post
    for yIdx in range(grid.shape[1]):
        for xIdx in range(grid.shape[2]):
            # check if we need to flip the column
            if grid[0,yIdx,xIdx] > grid[-1,yIdx,xIdx]:
                ind = -1
            else:
                ind = 1
            f = interpolate.interp1d(grid[::ind,yIdx,xIdx],
                                     data[::ind,yIdx,xIdx],
                                     kind=interp)
            ret[yIdx,xIdx] = f(value)
    return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however if your grid is rectilinear (such that each vertical column has the same set of z-coordinates) then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course.) For a grid your size, this will be extremely fast (as in tiny fractions of a second).
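A sketch of the same linear interpolation with the range checks filled in. It assumes a rectilinear grid whose shared vertical coordinate zs is sorted in ascending order; the function name interp_to_level and the sample values are illustrative only:

```python
import numpy as np

def interp_to_level(zs, data, value):
    """Linearly interpolate data of shape (nz, ny, nx) to z == value.

    zs holds the vertical coordinate shared by all columns, in ascending order.
    """
    zs = np.asarray(zs, dtype=float)
    if not (zs[0] <= value <= zs[-1]):
        raise ValueError("value lies outside the vertical coordinate range")
    # Clip so that idx - 1 and idx are always valid neighbouring levels
    idx = int(np.clip(np.searchsorted(zs, value), 1, len(zs) - 1))
    lower, upper = zs[idx - 1], zs[idx]
    s = (value - lower) / (upper - lower)
    return (1 - s) * data[idx - 1] + s * data[idx]

zs = np.array([300.0, 500.0, 700.0, 850.0, 925.0, 1000.0])
field = np.arange(12.0).reshape(3, 4)
data = 2.0 * zs[:, None, None] + field     # varies linearly with z in every column
result = interp_to_level(zs, data, 600.0)
```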
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data be uniformly spaced. It will still produce smooth results, but it won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights, as follows:
from scipy.interpolate import interp1d
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx], weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
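Put together, the weight trick could be exercised as in the sketch below, with random stand-in data and a made-up set of vertical levels. Interpolating the identity matrix yields one weight per level, and contracting those weights with the data along the vertical axis gives the whole plane at once:

```python
import numpy as np
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)
zs = np.array([300.0, 400.0, 500.0, 700.0, 850.0, 925.0, 1000.0])  # shared, non-uniform levels
data = rng.random((zs.size, 4, 5))                                  # (nz, ny, nx) stand-in cube

# Computed once per set of vertical coordinates
weight_func = interp1d(zs, np.identity(zs.size), kind='cubic')

weights = weight_func(620.0)                        # shape (nz,)
plane = np.tensordot(weights, data, axes=(0, 0))    # shape (ny, nx)

# Reference: let interp1d interpolate every column directly
reference = interp1d(zs, data, axis=0, kind='cubic')(620.0)
print(np.allclose(plane, reference))
```

np.tensordot is used here instead of np.inner(data.T, weights).T only to avoid the two transposes; both contract over the vertical axis.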
This is quite an old question, but the best way to do this nowadays is to use MetPy's interpolate_1d function:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
There is a new implementation of Numba accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)

Python 4D linear interpolation on a rectangular grid

I need to interpolate temperature data linearly in 4 dimensions (latitude, longitude, altitude and time).
The number of points is fairly high (360x720x50x8) and I need a fast method of computing the temperature at any point in space and time within the data bounds.
I have tried using scipy.interpolate.LinearNDInterpolator but using Qhull for triangulation is inefficient on a rectangular grid and takes hours to complete.
By reading this SciPy ticket, the solution seemed to be implementing a new nd interpolator using the standard interp1d to calculate a higher number of data points, and then use a "nearest neighbor" approach with the new dataset.
This, however, takes a long time again (minutes).
Is there a quick way of interpolating data on a rectangular grid in 4 dimensions without it taking minutes to accomplish?
I thought of using interp1d 4 times without calculating a higher density of points, but leaving it for the user to call with the coordinates, but I can't get my head around how to do this.
Otherwise would writing my own 4D interpolator specific to my needs be an option here?
Here's the code I've been using to test this:
Using scipy.interpolate.LinearNDInterpolator:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
coords = np.zeros((len(lats),len(lons),len(alts),len(time),4))
coords[...,0] = lats.reshape((len(lats),1,1,1))
coords[...,1] = lons.reshape((1,len(lons),1,1))
coords[...,2] = alts.reshape((1,1,len(alts),1))
coords[...,3] = time.reshape((1,1,1,len(time)))
coords = coords.reshape((data.size,4))
interpolatedData = LinearNDInterpolator(coords,data)
Using scipy.interpolate.interp1d:
import numpy as np
from scipy.interpolate import interp1d
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
interpolatedData = np.array([None, None, None, None])
interpolatedData[0] = interp1d(lats,data,axis=0)
interpolatedData[1] = interp1d(lons,data,axis=1)
interpolatedData[2] = interp1d(alts,data,axis=2)
interpolatedData[3] = interp1d(time,data,axis=3)
Thank you very much for your help!
In the same ticket you have linked, there is an example implementation of what they call tensor product interpolation, showing the proper way to nest recursive calls to interp1d. This is equivalent to quadrilinear interpolation if you choose the default kind='linear' parameter for your interp1d's.
While this may be good enough, this is not linear interpolation, and there will be higher order terms in the interpolation function, as this image from the wikipedia entry on bilinear interpolation shows:
This may very well be good enough for what you are after, but there are applications where a triangulated, truly piecewise linear, interpolation is preferred. If you really need this, there is an easy way of working around the slowness of qhull.
Once LinearNDInterpolator has been setup, there are two steps to coming up with an interpolated value for a given point:
1. figure out inside which triangle (4D hypertetrahedron in your case) the point is, and
2. interpolate using the barycentric coordinates of the point relative to the vertices as weights.
You probably do not want to mess with barycentric coordinates, so better leave that to LinearNDInterpolator. But you do know some things about the triangulation. Mostly that, because you have a regular grid, within each hypercube the triangulation is going to be the same. So to interpolate a single value, you could first determine in which subcube your point is, build a LinearNDInterpolator with the 16 vertices of that cube, and use it to interpolate your value:
from itertools import product
def interpolator(coords, data, point):
    dims = len(point)
    indices = []
    sub_coords = []
    for j in range(dims):
        # locate the grid cell that contains point[j] along axis j
        idx = np.digitize([point[j]], coords[j])[0]
        indices += [[idx - 1, idx]]
        sub_coords += [coords[j][indices[-1]]]
    # all 2**dims corners of the enclosing hypercube
    indices = np.array([j for j in product(*indices)])
    sub_coords = np.array([j for j in product(*sub_coords)])
    # index with a tuple of arrays; recent NumPy rejects a plain list here
    sub_data = data[tuple(np.swapaxes(indices, 0, 1))]
    li = LinearNDInterpolator(sub_coords, sub_data)
    return li([point])[0]
>>> point = np.array([12.3,-4.2, 500.5, 2.5])
>>> interpolator((lats, lons, alts, time), data, point)
0.386082399091
This cannot work on vectorized data, since that would require storing a LinearNDInterpolator for every possible subcube, and even though it probably would be faster than triangulating the whole thing, it would still be very slow.
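For the tensor-product (multilinear) variant, SciPy releases newer than this thread provide scipy.interpolate.RegularGridInterpolator, which works directly on rectangular grids and accepts many query points at once. This is a different technique from the triangulation-based LinearNDInterpolator discussed above. The sketch below uses a coarser latitude/longitude spacing than the question to keep memory small, and fills the grid with a made-up linear field so the result is easy to check:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Coarser than the question's 0.5 degree spacing, purely to keep the example light
lats = np.arange(-90, 90.5, 5.0)
lons = np.arange(-180, 180, 5.0)
alts = np.arange(1, 1000, 21.717)
time = np.arange(8)

# Made-up field that is linear in every coordinate
LA, LO, AL, TI = np.meshgrid(lats, lons, alts, time, indexing='ij')
data = 0.5 * LA - 0.1 * LO + 0.01 * AL + 2.0 * TI

interp = RegularGridInterpolator((lats, lons, alts, time), data, method='linear')

points = np.array([[12.3, -4.2, 500.5, 2.5],
                   [-45.0, 100.1, 10.0, 6.9]])
values = interp(points)   # one interpolated value per row of points
```

No triangulation is built, so construction is essentially free and the cost per query depends only on locating the enclosing cell.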
scipy.ndimage.map_coordinates is a nice fast interpolator for uniform grids (all boxes the same size). See multivariate-spline-interpolation-in-python-scipy on SO for a clear description.
For non-uniform rectangular grids, a simple wrapper, Intergrid, maps / scales non-uniform to uniform grids, then does map_coordinates. On a 4d test case like yours it takes about 1 μsec per query:
Intergrid: 1000000 points in a (361, 720, 47, 8) grid took 652 msec
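The map-then-interpolate idea can be sketched in a few lines. This is not Intergrid's actual code, only an illustration of the principle: np.interp converts each physical coordinate into a fractional index along its axis, and map_coordinates with order=1 then interpolates linearly in index space. The function name grid_interp and the 2D test data are assumptions:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def grid_interp(axes, data, points):
    """Multilinear interpolation on a rectangular, possibly non-uniform grid."""
    points = np.atleast_2d(points)
    # Fractional index of every query point along every axis
    frac_idx = np.array([np.interp(points[:, d], ax, np.arange(len(ax)))
                         for d, ax in enumerate(axes)])
    return map_coordinates(data, frac_idx, order=1, mode='nearest')

x = np.array([0.0, 0.5, 2.0, 3.0])        # non-uniform axis
y = np.array([-1.0, 0.0, 4.0])            # non-uniform axis
data = x[:, None] * y[None, :]            # bilinear test field f(x, y) = x * y
pts = np.array([[1.0, 2.0], [2.5, -0.5]])
vals = grid_interp((x, y), data, pts)
```

Points outside the grid are clamped to the nearest edge by np.interp in this sketch.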
For very similar things I use Scientific.Functions.Interpolation.InterpolatingFunction.
import numpy as np
from Scientific.Functions.Interpolation import InterpolatingFunction
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
axes = (lats, lons, alts, time)
f = InterpolatingFunction(axes, data)
You can now leave it to the user to call the InterpolatingFunction with coordinates:
>>> f(0,0,10,3)
0.7085675631375401
InterpolatingFunction has nice additional features, such as integration and slicing.
However, I do not know for sure whether the interpolation is linear. You would have to look in the module source to find out.
I cannot open this address or find enough information about this package.

Spline representation with scipy.interpolate: Poor interpolation for low-amplitude, rapidly oscillating functions

I need to (numerically) calculate the first and second derivatives of a function. To do so, I've attempted to use both splrep and UnivariateSpline to create splines that interpolate the function, and then take the derivatives of the splines.
However, it seems that there's an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which are (rapidly) oscillating.
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
import scipy
from scipy import interpolate
import numpy
from numpy import linspace
import math
from math import sin, pi
k = linspace(0, 6.*pi, num=10000) #interval (0,6*pi) in 10'000 steps
y=[]
A = 1.e0 # Amplitude of sine function
for i in range(len(k)):
    y.append(A*sin(k[i]))
tck =interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M=tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The 2nd graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Never mind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example...).
I just realized that you're evaluating your points at the original input points when you're using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try putting s=0 in instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6.*np.pi, num=100) #interval (0,6*pi) in 100 steps
A = 1.e-4 # Amplitude of sine function
y = A*np.sin(x)
fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo',label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
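Since the original goal was to obtain derivatives, here is a small sketch of that step with s=0, so the spline passes through the data. It assumes a low-amplitude sine sampled at 1000 points; the sample count and amplitude are chosen for illustration only:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

A = 1.0e-2                               # low amplitude, as in the problematic case
x = np.linspace(0, 6 * np.pi, 1000)
y = A * np.sin(x)

# s=0 turns off smoothing, so the quintic spline interpolates the samples
spline = UnivariateSpline(x, y, k=5, s=0)

d1 = spline.derivative(1)(x)             # should follow  A*cos(x)
d2 = spline.derivative(2)(x)             # should follow -A*sin(x)

print(np.max(np.abs(d1 - A * np.cos(x))))
print(np.max(np.abs(d2 + A * np.sin(x))))
```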
The problem is in choosing suitable values for the s parameter. Its values depend on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.
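To see how the suggested scaling behaves, one could fit the same sine at two amplitudes with s = 0.05 * len(y) * np.var(y) and compare the misfit relative to the amplitude. This is only a sketch of the suggestion above; the sample count and the factor 0.05 are taken as illustrative values:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 6 * np.pi, 1000)

for A in (1.0, 1.0e-2):
    y = A * np.sin(x)
    s = 0.05 * len(y) * np.var(y)        # smoothing factor scaled with the data
    spline = UnivariateSpline(x, y, k=5, s=s)
    # RMS misfit divided by the amplitude, so the two cases are comparable
    rel_rms = np.sqrt(np.mean((spline(x) - y) ** 2)) / A
    print(f"A={A:g}  s={s:.3g}  relative RMS misfit={rel_rms:.3f}")
```

If the scaling works as described, the relative misfit printed for both amplitudes should be of similar size, whereas a fixed s=2 smooths the low-amplitude signal far more strongly.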
