Shifting a curve to another curve horizontally - python

Does anyone have an idea how to fit a curve to another curve, simply by shifting it to the right? For example, in this plot, I want to shift the orange curve to the right (no vertical shift!) so that the curves overlap each other. Can anyone help me do this?
Data of the curves:
y1 = [1.2324, 1.4397, 1.5141, 1.7329, 1.9082, 2.2884, 2.166, 2.8175, 3.1014, 2.8893, 3.673, 4.3875, 4.9817, 5.6906, 6.3667, 7.2854, 8.2703, 9.3432, 10.591, 11.963, 13.579, 15.36, 17.306, 19.508, 21.976, 24.666, 27.692, 31.026, 34.724, 38.702]
y2 = [1.6231, 1.6974, 1.8145, 2.4805, 2.5643, 2.6176, 2.9332, 3.4379, 4.0154, 4.2258, 4.6837, 5.9837, 6.4408, 7.2903, 8.2283, 9.4134, 10.537, 11.947, 13.344, 15.202, 17.073, 19.211, 21.598, 24.216, 27.06, 30.31, 33.933, 37.882, 42.201, 46.978]
x = [0.1, 0.127, 0.161, 0.204, 0.259, 0.329, 0.418, 0.53, 0.672, 0.853, 1.08, 1.37, 1.74, 2.21, 2.81, 3.56, 4.52, 5.74, 7.28, 9.24, 11.7, 14.9, 18.9, 24.0, 30.4, 38.6, 48.9, 62.1, 78.8, 100.0]

This question has a few problems that needed some fiddling to resolve. This is not the ideal solution, I'm sure, but it produced a value close enough to what was expected from the manual method (around 1.5 to 1.6).
The first roadblock is that when you shift the x values, you don't get matching y values, so calculating the residual can be tricky. I brute-forced my way through this by creating a big new x array with 1000 points, then interpolating the two original y arrays onto this new x array (this will come up later). Therefore, when calculating the residual between the two curves, the x values will be slightly off, but not by much.
import numpy as np

reference_y = y1
to_shift_y = y2
expanded_x = np.logspace(np.log10(x[0]), np.log10(x[-1]), num=1000)
expanded_y_reference = np.interp(expanded_x, x, reference_y)
expanded_y_to_shift = np.interp(expanded_x, x, to_shift_y)
Then, when you shift x by some constant, you'll get two regions where there won't be equivalent x values.
original x: -------------------------------xxxx
shifted x: xxxx-------------------------------
I created a shifted x array, expanded_x_shifted = expanded_x * hor_shift, with the shift parameter hor_shift set to some value greater than 1. Then, I found the indices where the original and shifted arrays stop overlapping.
start = np.argmax(expanded_x >= expanded_x_shifted[0])
end = np.argmin(expanded_x_shifted <= expanded_x[-1])
Since these arrays are [False, False, True, True ...] and [True, True, ..., True, False, False], argmax and argmin will return the first instance where you have a different value.
Now, we have to slice our original and shifted x arrays so they have the same size, and values in common, and the same with the expanded y arrays. Pardon the long names, it's just so I don't get confused.
expanded_x_original_in_common_with_shifted = expanded_x[start:]
expanded_x_shifted_in_common_with_original = expanded_x_shifted[:end]
sliced_expanded_y_reference = expanded_y_reference[start:]
sliced_expanded_y_to_shift = expanded_y_to_shift[:end]
And last, and most importantly, we can calculate a distance between the two curves, assuming the x values are aligned.
residual = ((sliced_expanded_y_reference - sliced_expanded_y_to_shift) ** 2).sum()
By minimizing this, we can get the ideal shift.
We can compare our curves. Here, I used two values for the shift, 1.3 and 1.56, to illustrate good and bad shift values (these were found by testing different values). The vertical lines show the region in common.
Now, we can transform this process into a function and use some minimization method to find the ideal shift value. Here's what I got.
from lmfit import Parameters, minimize

par = Parameters()
# If the shift parameter is 1, you get an error
par.add('shift', value=1.1, min=1)

def min_function(par, x, reference_y, to_shift_y):
    hor_shift = par['shift'].value
    # print(hor_shift)  # <- in case you want to follow the process
    expanded_x = np.logspace(np.log10(x[0]), np.log10(x[-1]), num=1000)
    expanded_x_shifted = expanded_x * hor_shift
    start = np.argmax(expanded_x >= expanded_x_shifted[0])
    end = np.argmin(expanded_x_shifted <= expanded_x[-1])
    expanded_x_original_in_common_with_shifted = expanded_x[start:]
    expanded_x_shifted_in_common_with_original = expanded_x_shifted[:end]
    expanded_y_reference = np.interp(expanded_x, x, reference_y)
    expanded_y_to_shift = np.interp(expanded_x, x, to_shift_y)
    sliced_expanded_y_reference = expanded_y_reference[start:]
    sliced_expanded_y_to_shift = expanded_y_to_shift[:end]
    residual = ((sliced_expanded_y_reference - sliced_expanded_y_to_shift) ** 2).sum()
    return residual

minimize(min_function, par, method='nelder', args=(x, reference_y, to_shift_y))
This results in an ideal shift parameter of 1.555, confirming the initial guess. Note that you have to change the residual expression to (sliced_expanded_y_reference - sliced_expanded_y_to_shift) if you want your chi-squared to match the one in the graphs.
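If you prefer to avoid the lmfit dependency, here is a hedged sketch of the same idea with scipy.optimize.minimize_scalar; the residual function simply repeats the logic of min_function, the upper bound of 3 is an arbitrary assumption, and the same slicing caveats apply.
from scipy.optimize import minimize_scalar
import numpy as np

def residual_for_shift(hor_shift, x, reference_y, to_shift_y):
    expanded_x = np.logspace(np.log10(x[0]), np.log10(x[-1]), num=1000)
    expanded_x_shifted = expanded_x * hor_shift
    start = np.argmax(expanded_x >= expanded_x_shifted[0])
    end = np.argmin(expanded_x_shifted <= expanded_x[-1])
    y_reference = np.interp(expanded_x, x, reference_y)[start:]
    y_shifted = np.interp(expanded_x, x, to_shift_y)[:end]
    return ((y_reference - y_shifted) ** 2).sum()

result = minimize_scalar(residual_for_shift, bounds=(1.01, 3.0), method='bounded',
                         args=(x, reference_y, to_shift_y))
print(result.x)  # should land near the 1.555 value reported above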

The notation is changed in order to make the equations below clearer:
y(x)=y1(x)
z(x)=y2(x)
A translation by c on the logarithmic x scale is equivalent to an expansion by a factor b on the linear x scale, because log(x) + c = log(b x) with c = log(b).
The inverse function x = f(y) has to approximately coincide with b x = f(z). So we consider the sum of residuals [f(y) - x]^2 + [f(z) - b x]^2, which leads to the regression calculation below. The function f(y) is approximated by a polynomial of degree m.
With the given data, the shape of the curve x = f(y) is rather smooth. This suggests that a low degree m might be sufficient.
For example, with m = 2 the result is:
The black curve is the blue curve translated horizontally by c = 0.180 on the logarithmic scale.
Of course one can use a polynomial of higher degree. For example, with m = 3 we get b = 1.536 and c = 0.186.
This numerical example is a favourable case because the curve x = f(y) has a simple shape. In the case of more complicated shapes, a larger value of m would probably be necessary, with the risk of an unreliable regression calculation.
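For readers who want to try this numerically, here is a minimal sketch of a simplified two-step variant (not the exact joint regression described above): first approximate the inverse function x = f(y) with a low-degree polynomial fitted on the first curve, then estimate the expansion factor b from f(z) ≈ b x by ordinary least squares, using y1, y2 and x from the question. The degree m = 3 is an assumption.
import numpy as np

m = 3                                   # polynomial degree (assumption)
coeffs = np.polyfit(y1, x, m)           # approximate the inverse function f(y)
f_of_z = np.polyval(coeffs, y2)         # f applied to the second curve
b = np.sum(f_of_z * np.asarray(x)) / np.sum(np.asarray(x) ** 2)
c = np.log10(b)                         # translation on the log10(x) axis
print(b, c)  # roughly comparable to the values above, though not identical to the joint fit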

Related

How to approximate function for geometrically growing sequence?

I have the following code to create x0:
x0 = []
for i in range(0, N):
    if i == 0:
        a = 0.4
    else:
        a = round(0.4 + 0.3*2**(i-1), 1)
    print(i, a)
    x0.append(a)
which gives me a growing sequence: [0.4, 0.7, 1.0, 1.6, 2.8, 5.2, 10.0, 19.6, 38.8, 77.2, ...], and I want to find a function that fits these points. I don't want to use polynomials because N can be different; I need to find a single parametric function. The projection needs to be very general.
My approach is using the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fun(x, a, b):
    return np.cos(x**(2/3) * a + b)

# Make fit #
y0 = [0]*len(x0)
p, c = curve_fit(f=fun, xdata=np.array(x0), ydata=y0)
x = np.linspace(0, max(x0), 10000)
plt.plot(x, fun(x, *p))
plt.scatter(x0, y0)
That function's oscillations seem to be too wide for the starting points, while it fits the last ones quite well. I also tried to lower the initial oscillations by multiplying the function by x, but the period is still too wide at the beginning. Is it possible to find a good oscillating function that goes through (almost) all of those points? I don't know how to set the parameter inside x**(...), because placing a variable there causes the fit to estimate it as close to 1, which is not what I need. Can I set the power in sin(x**b) that way? If not, what family of functions should I try?
Below is the plot for the function multiplied by b*x. The oscillations at the first points should be much denser.
Thanks to the suggestions, I've found the best fit, and I think it can't get much better.
The solution is:
def fun(x, a, b, c):
    return np.cos(np.pi*(np.log2((x-a)/b) + c))
and the fit method looks like
p, c = curve_fit(f=fun, xdata=np.array(x0), ydata=y0, bounds=([0, -np.inf, -np.inf], [x0[0], np.inf, np.inf]))
It's important to set bounds for a to avoid convergence failures or "Residuals are not finite in the initial point" issues. In the end, each point gets its own crossing despite the wild behavior of the function close to 0 in the domain. The parameters end up pretty close to 0 or 1, not tending to infinity.
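For reference, here is a self-contained sketch assembling the pieces above; N = 10, the starting point p0, and the small positive lower bound on b (my own tweak to keep (x - a)/b positive) are assumptions not specified in the original post.
import numpy as np
from scipy.optimize import curve_fit

N = 10
x0 = np.array([0.4 if i == 0 else round(0.4 + 0.3 * 2**(i - 1), 1) for i in range(N)])
y0 = np.zeros(N)

def fun(x, a, b, c):
    return np.cos(np.pi * (np.log2((x - a) / b) + c))

# p0 and the lower bound on b are assumptions added here for robustness
p, cov = curve_fit(f=fun, xdata=x0, ydata=y0, p0=[0.2, 1.0, 0.0],
                   bounds=([0, 1e-6, -np.inf], [x0[0], np.inf, np.inf]))
print(p)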

How to compare two 3D curves in Python?

I have a huge array with coordinates describing a 3D curve, ~20000 points. I am trying to use fewer points by ignoring some, say taking 1 out of every 2 points. When I do this and plot the reduced number of points, the shape looks the same. However, I would like to compare the two curves properly, similar to a chi-squared test, to see how much the reduced plot differs from the original.
Is there an easy, built-in way of doing this, or does anyone have any ideas on how to approach the problem?
The general question of "line simplification" seems to be an entire field of research. I recommend having a look, for instance, at the Ramer–Douglas–Peucker algorithm. There are several Python modules I could find: rdp and simplification (which also implements the Visvalingam-Whyatt algorithm).
Anyway, I am trying something for evaluating the difference between two polylines, using interpolation. Any curves can be compared, even without common points.
The first idea is to compute the distance along the path for both polylines. They are used as landmarks to go from one given point on the first curve to a relatively close point on the other curve.
Then, the points of the first curve can be interpolated on the other curve. These two datasets can now be compared, point by point.
On the graph, the black curve is the interpolation of xy2 on the curve xy1. So the distances between the black squares and the orange circles can be computed, and averaged.
This gives an average distance measure, but nothing to compare against and decide if the applied reduction is good enough...
import numpy as np
from scipy.interpolate import interp1d

def normed_distance_along_path(polyline):
    polyline = np.asarray(polyline)
    distance = np.cumsum(np.sqrt(np.sum(np.diff(polyline, axis=1)**2, axis=0)))
    return np.insert(distance, 0, 0)/distance[-1]

def average_distance_between_polylines(xy1, xy2):
    s1 = normed_distance_along_path(xy1)
    s2 = normed_distance_along_path(xy2)
    interpol_xy1 = interp1d(s1, xy1)
    xy1_on_2 = interpol_xy1(s2)
    node_to_node_distance = np.sqrt(np.sum((xy1_on_2 - xy2)**2, axis=0))
    return node_to_node_distance.mean()  # or use the max

# Two example polylines:
xy1 = [0, 1, 8, 2, 1.7], [1, 0, 6, 7, 1.9]  # it should work in 3D too
xy2 = [.1, .6, 4, 8.3, 2.1, 2.2, 2], [.8, .1, 2, 6.4, 6.7, 4.4, 2.3]
average_distance_between_polylines(xy1, xy2)  # 0.45004578069119189
If you subsample the original curve, a simple way to assess the approximation error is by computing the maximum distance between the original curve and the line segments between the resampled vertices. The maximum distance occurs at the original vertices and it suffices to evaluate at these points only.
By the way, this provides a simple way to perform the subsampling by setting a maximum tolerance and decimating until the tolerance is exceeded.
You can also think of computing the average distance, but this probably involves nasty integrals and might give less visually pleasing results.
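If it helps, here is a rough sketch of that idea (my own illustration, assuming the reduced curve keeps every step-th vertex of the original): for each original vertex, compute its distance to the segment joining the two kept vertices that surround it, and take the maximum.
import numpy as np

def point_segment_distance(p, a, b):
    # distance from point p to the segment a-b
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def subsampling_error(points, step):
    kept = points[::step]
    worst = 0.0
    for i, p in enumerate(points):
        seg = min(i // step, len(kept) - 2)  # segment of kept vertices around p
        worst = max(worst, point_segment_distance(p, kept[seg], kept[seg + 1]))
    return worst

# hypothetical 3D curve with ~2000 points
points = np.cumsum(np.random.randn(2001, 3), axis=0)
print(subsampling_error(points, 2))  # max deviation when keeping every 2nd point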

What do extra results of numpy.polyfit mean?

When creating a line of best fit with numpy's polyfit, you can specify the parameter full to be True. This returns 4 extra values, apart from the coefficients. What do these values mean and what do they tell me about how well the function fits my data?
https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
What I'm doing is:
bestFit = np.polyfit(x_data, y_data, deg=1, full=True)
and I get the result:
(array([0.00062008, 0.00328837]), array([0.00323329]), 2,
 array([1.30236506, 0.55122159]), 1.1102230246251565e-15)
The documentation says that the four extra pieces of information are: residuals, rank, singular_values, and rcond.
Edit:
I am looking for a further explanation of how rcond and singular_values describe goodness of fit.
Thank you!
how rcond and singular_values describe goodness of fit
Short answer: they don't.
They do not describe how well the polynomial fits the data; this is what residuals are for. They describe how numerically robust was the computation of that polynomial.
rcond
The value of rcond is not really about quality of fit, it describes the process by which the fit was obtained, namely a least-squares solution of a linear system. Most of the time the user of polyfit does not provide this parameter, so a suitable value is picked by polyfit itself. This value is then returned to the user for their information.
rcond is used for truncation in ill-conditioned matrices. The least squares solver does two things:
Finds x that minimizes the norm of residuals Ax-b
If multiple x achieve this minimum, returns x with the smallest norm among those.
The second clause occurs when some changes of x do not affect the right-hand side at all. But since floating point computations are imperfect, usually what happens is that some changes of x affect the right hand side very little. And this is where rcond is used to decide when "very little" should be considered as "zero up to noise".
For example, consider the system
x1 = 1
x1 + 0.0000000001 * x2 = 2
This one can be solved exactly: x1 = 1 and x2 = 10000000000. But... that tiny coefficient (which, in reality, came out of some matrix manipulations) has some numeric error in it; for all we know it could be negative, or zero. Should we let it have such a huge influence on the solution?
So, in such a situation the matrix (specifically its singular values) gets truncated at level rcond. This leaves
x1 = 1
x1 = 2
for which the least-squares solution is x1 = 1.5, x2 = 0. Note that this solution is robust: no huge numbers from tiny fluctuations of coefficients.
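To see this truncation in action, here is a small illustration of my own using np.linalg.lstsq (which is what polyfit calls under the hood); the numbers in the comments are approximate:
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1e-10]])
b = np.array([1.0, 2.0])

# tiny rcond: the small singular value is kept, and x2 blows up
x_exact, *_ = np.linalg.lstsq(A, b, rcond=1e-15)
# larger rcond: the small singular value is truncated, giving the robust solution
x_robust, *_ = np.linalg.lstsq(A, b, rcond=1e-8)

print(x_exact)   # approximately [1.0, 1.0e10]
print(x_robust)  # approximately [1.5, 0.0]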
Singular values
When one solves a linear system Ax = b in the least squares sense, the singular values of A determine how numerically tricky this is. Specifically, large disparity between largest and smallest singular values is problematic: such systems are ill-conditioned. An example is
0.835*x1 + 0.667*x2 = 0.168
0.333*x1 + 0.266*x2 = 0.067
The exact solution is (1, -1). But if the right hand side is changed from 0.067 to 0.066, the solution is (-666, 834) -- totally different. The problem is that the singular values of A are (roughly) 1 and 1e-6; this magnifies any changes on the right by the factor of 1e6.
Unfortunately, polynomial fit often results in ill-conditioned matrices. For example, fitting a polynomial of degree 24 to 25 equally spaced data points is unadvisable.
import numpy as np
x = np.arange(25)
np.polyfit(x, x, 24, full=True)
The singular values are
array([4.68696731e+00, 1.55044718e+00, 7.17264545e-01, 3.14298605e-01,
       1.16528492e-01, 3.84141241e-02, 1.15530672e-02, 3.20120674e-03,
       8.20608411e-04, 1.94870760e-04, 4.28461687e-05, 8.70404409e-06,
       1.62785983e-06, 2.78844775e-07, 4.34463936e-08, 6.10212689e-09,
       7.63709211e-10, 8.39231664e-11, 7.94539407e-12, 6.32326226e-13,
       4.09332903e-14, 2.05501534e-15, 7.55397827e-17, 4.81104905e-18,
       8.98275758e-20]),
which, with the default value of rcond (5.55e-15 here), gets four of them truncated to 0.
The difference in magnitude between smallest and largest singular values indicates that perturbing the y-values by numbers of size 1e-15 can result in changes of about 1 to the coefficients. (Not every perturbation will do that, just some that happen to align with a singular vector for a small singular value).
Rank
The effective rank is just the number of singular values above the rcond threshold. In the above example it's 21. This means that even though the fit is for 25 points, and we get a polynomial with 25 coefficients, there are only 21 degrees of freedom in the solution.
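As a quick sanity check (a small sketch reproducing the fit above), the returned rank can be recomputed from the singular values and rcond:
import numpy as np

x = np.arange(25)
coeffs, residuals, rank, sv, rcond = np.polyfit(x, x, 24, full=True)
print(rank)                           # 21, as discussed above
print(np.sum(sv > rcond * sv.max()))  # the same count, recomputed by hand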

compare two time series (simulation results)

I want to do unit testing of simulation models and for that, I run a simulation once and store the results (a time series) as a reference in a csv file (see an example here). Now when I change my model, I run the simulation again, store the new results as a csv file as well, and then I compare the results.
The results are usually not 100% identical, an example plot is shown below:
The reference results are plotted in black and the new results are plotted in green.
The difference of the two is plotted in the second plot, in blue.
As can be seen, at a step the difference can become arbitrarily high, while everywhere else the difference is almost zero.
Therefore, I would prefer to use a different algorithm for comparison than just subtracting the two, but I can only describe my idea graphically:
When plotting the reference line twice, first in a light color with a high line width and then again in a dark color and a small line width, then it will look like it has a pink tube around the centerline.
Note that during a step that tube will not only be in the direction of the ordinate axis, but also in the direction of the abscissa.
When doing my comparison, I want to know whether the green line stays within the pink tube.
Now comes my question: I do not want to compare the two time series using a graph, but using a python script. There must be something like this already, but I cannot find it because I am missing the right vocabulary, I believe. Any ideas? Is something like that in numpy, scipy, or similar? Or would I have to write the comparison myself?
Additional question: When the script says the two series are not sufficiently similar, I would like to plot it as described above (using matplotlib), but the line width has to be defined somehow in other units than what I usually use to define line width.
I would assume here that your problem can be simplified by assuming that your function has to be close to another function (e.g. the center of the tube) with the very same support points, while a certain number of discontinuities are allowed.
Then, I would implement a different discretization of the function compared to the typical one used for the L^2 norm (see, for example, some references here).
Basically, in the continuous case, the L^2 norm relaxes the constraint of the two functions being close everywhere, and allows them to differ on a finite number of points, called singularities.
This works because there are infinitely many points over which to calculate the integral, and a finite number of points will not make a difference there.
However, since there are no continuous functions here, but only their discretizations, the naive approach will not work, because any singularity will potentially contribute significantly to the final integral value.
Therefore, what you could do is to perform a point by point check whether the two functions are close (within some tolerance) and allow at most num_exceptions points to be off.
import numpy as np

def is_close_except(arr1, arr2, num_exceptions=0.01, **kwargs):
    # if float, calculate as percentage of number of points
    if isinstance(num_exceptions, float):
        num_exceptions = int(len(arr1) * num_exceptions)
    num = len(arr1) - np.sum(np.isclose(arr1, arr2, **kwargs))
    return num <= num_exceptions
By contrast the standard L^2 norm discretization would lead to something like this integrated (and normalized) metric:
import numpy as np

def is_close_l2(arr1, arr2, **kwargs):
    norm1 = np.sum(arr1 ** 2)
    norm2 = np.sum(arr2 ** 2)
    norm = np.sum((arr1 - arr2) ** 2)
    return np.isclose(2 * norm / (norm1 + norm2), 0.0, **kwargs)
This, however, will fail for arbitrarily large peaks, unless you set such a large tolerance that basically anything counts as "being close".
Note that kwargs is used if you want to pass additional tolerance constraints, such as atol and rtol, on to np.isclose().
As a test, you could run:
import numpy as np
import numpy.random
np.random.seed(0)
num = 1000
snr = 100
n_peaks = 5
x = np.linspace(-10, 10, num)
# generate ground truth
y = np.sin(x)
# distributed noise
y2 = y + np.random.random(num) / snr
# distributed noise + peaks
y3 = y + np.random.random(num) / snr
peak_positions = [np.random.randint(num) for _ in range(n_peaks)]
for i in peak_positions:
    y3[i] += np.random.random() * snr
# for distributed noise, both work with a 1/snr tolerance
is_close_l2(y, y2, atol=1/snr)
# output: True
is_close_except(y, y2, atol=1/snr)
# output: True
# for peak noise, since n_peaks < num_exceptions, this works
is_close_except(y, y3, atol=1/snr)
# output: True
# and if you allow 0 exceptions, then it fails, as expected
is_close_except(y, y3, num_exceptions=0, atol=1/snr)
# output: False
# for peak noise, this fails because the contribution from the peaks
# in the integral is much larger than the contribution from the rest
is_close_l2(y, y3, atol=1/snr)
# output: False
There are other approaches to this problem involving higher mathematics (e.g. Fourier or Wavelet transforms), but I would stick to the simplest.
EDIT (updated):
However, the working assumption may not hold, or you may not like it, for example because the two functions have different sampling or are described by non-injective relations.
In that case, you can follow the center of the tube using (x, y) data, calculate the Euclidean distance of each point from the target (the tube center), and check that this distance is point-wise smaller than the maximum allowed (the tube size):
import numpy as np

# assume it is something with shape (N, 2) meaning (x, y)
target = ...
# assume it is something with shape (M, 2) meaning again (x, y)
trajectory = ...

# calculate the minimum distance between each point
# of the trajectory and the target
def is_close_trajectory(trajectory, target, max_dist):
    dist = np.zeros(trajectory.shape[0])
    for i in range(len(dist)):
        dist[i] = np.min(np.sqrt(
            (target[:, 0] - trajectory[i, 0]) ** 2 +
            (target[:, 1] - trajectory[i, 1]) ** 2))
    return np.all(dist < max_dist)

# same as above but faster and more memory-hungry
def is_close_trajectory2(trajectory, target, max_dist):
    dist = np.min(np.sqrt(
        (target[:, np.newaxis, 0] - trajectory[np.newaxis, :, 0]) ** 2 +
        (target[:, np.newaxis, 1] - trajectory[np.newaxis, :, 1]) ** 2),
        axis=0)
    return np.all(dist < max_dist)
The price of this flexibility is that this will be a significantly slower or memory-hungry function.
Assuming you have your list of results in the form we discussed in the comments already loaded:
from random import randint
import numpy
l1 = [(i,randint(0,99)) for i in range(10)]
l2 = [(i,randint(0,99)) for i in range(10)]
# I generate some random lists e.g:
# [(0, 46), (1, 33), (2, 85), (3, 63), (4, 63), (5, 76), (6, 85), (7, 83), (8, 25), (9, 72)]
# where the first element is the time and the second a value
print(l1)
# Then I just evaluate for each time step the difference between the values
differences = [abs(x[0][1]-x[1][1]) for x in zip(l1,l2)]
print(differences)
# And I can just print the maximum difference and its index:
print(max(differences))
print(differences.index(max(differences)))
And with this data, if you define that your "tube" is for example 10 wide, you can just check if the maximum value that you find is greater than your threshold, in order to decide if those functions are similar enough or not, as in the snippet below.
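For instance, with a hypothetical tube width of 10:
threshold = 10  # hypothetical tube width
print(max(differences) <= threshold)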
You will have to remove outliers from your dataset first if you need to skip a random spike.
You could also try the following:
from tslearn.metrics import dtw
print(dtw(arr1,arr2)*100/<lengthOfArray>)
Bit late to the game, but I encountered the same conundrum recently, and this seems to be the only question on the site discussing this particular problem.
A basic solution is to use time and amplitude tolerance values to create a 'bounding box' style envelope (similar to your pink tube) around the data.
I'm sure there are more elegant ways to do this, but a very crudely coded brute force example would be something like the following using pandas:
import pandas as pd
data = pd.DataFrame()
data['benchmark'] = [0.1, 0.2, 0.3] # or whatever you pull from your expected value data set
data['under_test'] = [0.2, 0.3, 0.1] # or whatever you pull from your simulation results data set
sample_rate = 20 # or whatever the data sample rate is
st = int(0.05 * sample_rate) # shift tolerance adjusted to time series sample rate
# best to make it an integer so we can use standard
# series shift functions and whatnot
at = 0.05 # amplitude tolerance
bounding = pd.DataFrame()
# if we didn't care about time shifts, the following two would be sufficient
# (i.e. if the data didn't have severe discontinuities between samples)
bounding['top'] = data[['benchmark']] + at
bounding['bottom'] = data[['benchmark']] - at
# if you want to be able to tolerate large discontinuities
# the bounds can be widened along the time axis to accommodate for large jumps
bounding['bottomleft'] = data[['benchmark']].shift(-st) - at
bounding['topleft'] = data[['benchmark']].shift(-st) + at
bounding['topright'] = data[['benchmark']].shift(st) + at
bounding['bottomright'] = data[['benchmark']].shift(st) - at
# minimums and maximums give us a rough (but hopefully good enough) envelope
# these can be plotted as a parametric replacement of the 'pink tube' of line width
data['min'] = bounding.min(1)
data['max'] = bounding.max(1)
# see if the test data falls inside the envelope
data['pass/fail'] = data['under_test'].between(data['min'], data['max'])
# You now have a machine-readable column of booleans
# indicating which data points are outside the envelope

Fast 3D interpolation of atmospheric data in Numpy/Scipy

I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
def interpLevel(grid, value, data, interp='linear'):
    """
    Interpolate 3d data to a common z coordinate.

    Can be used to calculate the wind/pv/whatsoever values for a common
    potential temperature / pressure level.

    grid : numpy.ndarray
        The grid. For example the potential temperature values for the whole 3d
        grid.
    value : float
        The common value in the grid, to which the data shall be interpolated.
        For example, 350.0
    data : numpy.ndarray
        The data which shall be interpolated. For example, the PV values for
        the whole 3d grid.
    interp : str
        This indicates which kind of interpolation will be done. It is directly
        passed on to scipy.interpolate.interp1d().

    returns : numpy.ndarray
        A 2d array containing the *data* values at *value*.
    """
    ret = np.zeros_like(data[0, :, :])
    for yIdx in xrange(grid.shape[1]):
        for xIdx in xrange(grid.shape[2]):
            # check if we need to flip the column
            if grid[0, yIdx, xIdx] > grid[-1, yIdx, xIdx]:
                ind = -1
            else:
                ind = 1
            f = interpolate.interp1d(grid[::ind, yIdx, xIdx],
                                     data[::ind, yIdx, xIdx],
                                     kind=interp)
            ret[yIdx, xIdx] = f(value)
    return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however if your grid is rectilinear (such that each vertical column has the same set of z-coordinates) then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course.) For a grid of your size, this will be extremely fast (as in tiny fractions of a second).
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data is uniformly spaced. It will still produce smooth results, but it won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights, doing something like the following:
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx], weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
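For completeness, here is a small self-contained sketch of that weight trick; the array names, shapes and the target level are assumptions, and every column of the grid is taken to share the same vertical coordinates:
import numpy as np
from scipy.interpolate import interp1d

NZ, NY, NX = 30, 5, 4                       # reduced sizes for the sketch
zs = np.linspace(100.0, 1000.0, NZ)         # shared vertical coordinates
data = np.random.rand(NZ, NY, NX)           # hypothetical data cube

weight_func = interp1d(zs, np.identity(NZ), 'cubic')
weights = weight_func(500.0)                # weights for the target level
one_point = np.dot(data[:, 2, 3], weights)  # single interpolated value at (y_idx=2, x_idx=3)
plane = np.inner(data.T, weights).T         # whole (NY, NX) interpolated plane
print(one_point, plane.shape)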
This is quite an old question, but the best way to do this nowadays is to use MetPy's interpolate_1d function:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
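A minimal sketch of what that might look like for the question's use case (the array names, shapes and the target temperature are assumptions, and depending on your MetPy version you may need to attach units to the arrays):
import numpy as np
from metpy.interpolate import interpolate_1d

nz, ny, nx = 30, 50, 50                       # reduced sizes for the sketch
temperature = (np.linspace(200.0, 320.0, nz)[:, None, None]
               + np.random.rand(nz, ny, nx))  # hypothetical temperature cube, monotonic along axis 0
rel_humidity = np.random.rand(nz, ny, nx)     # hypothetical relative humidity cube

target = np.array([273.15])                   # interpolate RH onto the 273.15 K surface
rh_on_temp = interpolate_1d(target, temperature, rel_humidity, axis=0)
print(rh_on_temp.shape)                       # expected: (1, ny, nx)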
There is a new implementation of Numba accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)
