Hi, I have two sets of data taken from two separate import files, both of which are being imported into Python and placed in two separate lists as follows:
list 1 is of the form:
(node, x coordinate, y coordinate, z coordinate)
example list 1: [[1,0,0,0],[2,1,0,0],[3,0,1,0],[4,1,1,0],[5,0,0,1],[6,1,0,1],[7,0,1,1],[8,1,1,1]]
list 2 is in the form:
(x coordinate, y coordinate, z coordinate, temperature)
example list 2: [[0,0,0,100],[1,0,0,90],[0,1,0,85],[1,1,0,110],[0,0,1,115],[1,0,1,118],[0,1,1,100],[1,1,1,96]]
From these two lists I need to use the coordinates to create a third list which contains a node value and its corresponding temperature. This would be a simple dictionary lookup if all the x, y, and z coordinates matched up, but with the data I am working with this will not always be the case.
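For completeness, a minimal sketch of that simple dictionary version, for the case where every coordinate does match exactly (using the example lists above):

import list1 and list2 as plain Python lists:

list1 = [[1,0,0,0],[2,1,0,0],[3,0,1,0],[4,1,1,0],
         [5,0,0,1],[6,1,0,1],[7,0,1,1],[8,1,1,1]]
list2 = [[0,0,0,100],[1,0,0,90],[0,1,0,85],[1,1,0,110],
         [0,0,1,115],[1,0,1,118],[0,1,1,100],[1,1,1,96]]

# Build a coordinate -> temperature lookup, then pair each node with its temperature.
temps = {(x, y, z): t for x, y, z, t in list2}
output = [[node, temps[(x, y, z)]] for node, x, y, z in list1]
print(output)  # [[1, 100], [2, 90], ..., [8, 96]]

This fails with a KeyError as soon as a node has no exact coordinate match, which is exactly the problem below.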
For example, suppose I add a new entry, node number 9, at the end of list 1:
new entry at end of list 1 [9, 0.5, 0.9, 0.25]
Now I find myself with a node number that has no corresponding temperature. At this point an interpolation needs to be performed on list 2 to give me the temperature for this node. Through basic 3D interpolation calculations I have worked out that this temperature will be 97.9, so my final output list would look like this:
Output list:
(node, temperature)
Output list: [[1,100],[2,90],[3,85],[4,110],[5,115],[6,118],[7,100],[8,96],[9,97.9]]
I am reasonably new to Python, so I am struggling to find a solution to this interpolation problem; I have been researching how to do this for a number of weeks now and have still not been able to find a solution.
Any help would be greatly appreciated,
Thanks
There are quite a few interpolation routines in scipy, but above 2 dimensions most of them only offer linear and nearest-neighbour interpolation, which might not be sufficient for your use.
All of the interpolation routines are listed on the interpolation page of the scipy docs. Straight away you can ignore the univariate and the 1-D and 2-D spline sections: you want the multivariate section.
There are 9 functions here, split into those for unstructured and structured data:
Unstructured data:
griddata(points, values, xi[, method, ...]) Interpolate unstructured D-dimensional data.
LinearNDInterpolator(points, values[, ...]) Piecewise linear interpolant in N dimensions.
NearestNDInterpolator(points, values) Nearest-neighbour interpolation in N dimensions.
CloughTocher2DInterpolator(points, values[, tol]) Piecewise cubic, C1 smooth, curvature-minimizing interpolant in 2D.
Rbf(*args) A class for radial basis function approximation/interpolation of n-dimensional scattered data.
interp2d(x, y, z[, kind, copy, ...]) Interpolate over a 2-D grid.
For data on a grid:
interpn(points, values, xi[, method, ...]) Multidimensional interpolation on regular grids.
RegularGridInterpolator(points, values[, ...]) Interpolation on a regular grid in arbitrary dimensions
RectBivariateSpline(x, y, z[, bbox, kx, ky, s]) Bivariate spline approximation over a rectangular mesh.
plus an additional one in the see also section, though we'll ignore that.
You should read how each of them works; it might help you understand the problem a little better.
The way these functions work is that you pass them data, i.e. the x, y, z coordinates and the corresponding values at those points, and they return a function which lets you evaluate the interpolant at any location.
I would recommend the Rbf function here, though, as from what I can see it's the only n-D option which does not limit you to linear or nearest-neighbour interpolation.
For example, you have two lists:
node_locations = [(node, x_coord, y_coord, z_coord), ...]
temp_data = [(x0, y0, z0, temp0), (x1, y1, z1, temp1), ...]
xs, ys, zs, temps = zip(*temp_data)  # This will unpack your data into columns, rather than rows.
from scipy.interpolate import Rbf
rbfi = Rbf(xs, ys, zs, temps)
# I don't know how you want your output data, so I'm just dumping it in a dictionary.
node_data = {}
for node, x, y, z in node_locations:
    node_data[node] = rbfi(x, y, z)
Try something like that.
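For instance, a fully runnable version using the example lists from the question might look like this (Rbf reproduces the known temperatures at the original eight points and interpolates for node 9):

from scipy.interpolate import Rbf

node_locations = [(1,0,0,0),(2,1,0,0),(3,0,1,0),(4,1,1,0),
                  (5,0,0,1),(6,1,0,1),(7,0,1,1),(8,1,1,1),
                  (9,0.5,0.9,0.25)]
temp_data = [(0,0,0,100),(1,0,0,90),(0,1,0,85),(1,1,0,110),
             (0,0,1,115),(1,0,1,118),(0,1,1,100),(1,1,1,96)]

xs, ys, zs, temps = zip(*temp_data)   # unpack into columns
rbfi = Rbf(xs, ys, zs, temps)         # build the radial basis interpolant

node_data = {node: float(rbfi(x, y, z)) for node, x, y, z in node_locations}
print(node_data)

Note that Rbf's value for node 9 need not match the 97.9 you computed by trilinear interpolation, since Rbf is a different interpolant.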
For scientific computing, I wouldn't use lists but numpy arrays instead.
So in your case:
import numpy as np
nodes = np.array(example_list_1)
temperatures = np.array(example_list_2)
With this you can then go on to use scipy's interpolation functions, like for example:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata
from scipy.interpolate import griddata
interpolated = griddata(temperatures[:, :-1],
                        temperatures[:, -1],
                        nodes[:, 1:])
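Putting that together with the example data from the question, a minimal end-to-end sketch might look like this (with the default linear method, points outside the convex hull of the temperature data come back as nan):

import numpy as np
from scipy.interpolate import griddata

nodes = np.array([[1,0,0,0],[2,1,0,0],[3,0,1,0],[4,1,1,0],
                  [5,0,0,1],[6,1,0,1],[7,0,1,1],[8,1,1,1],
                  [9,0.5,0.9,0.25]], dtype=float)
temperatures = np.array([[0,0,0,100],[1,0,0,90],[0,1,0,85],[1,1,0,110],
                         [0,0,1,115],[1,0,1,118],[0,1,1,100],[1,1,1,96]],
                        dtype=float)

interpolated = griddata(temperatures[:, :-1],   # (x, y, z) columns
                        temperatures[:, -1],    # temperature column
                        nodes[:, 1:])           # node coordinates

# Pair each node id with its interpolated temperature.
output = np.column_stack((nodes[:, 0], interpolated))
print(output)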
I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
import numpy as np
from scipy import interpolate

def interpLevel(grid, value, data, interp='linear'):
    """
    Interpolate 3d data to a common z coordinate.

    Can be used to calculate the wind/pv/whatsoever values for a common
    potential temperature / pressure level.

    grid : numpy.ndarray
        The grid. For example the potential temperature values for the whole
        3d grid.
    value : float
        The common value in the grid, to which the data shall be interpolated.
        For example, 350.0
    data : numpy.ndarray
        The data which shall be interpolated. For example, the PV values for
        the whole 3d grid.
    interp : str
        The kind of interpolation to perform. It is passed directly on to
        scipy.interpolate.interp1d().

    returns : numpy.ndarray
        A 2d array containing the *data* values at *value*.
    """
    ret = np.zeros_like(data[0, :, :])
    for yIdx in range(grid.shape[1]):
        for xIdx in range(grid.shape[2]):
            # flip the column if the vertical coordinate is descending,
            # since interp1d needs ascending x values
            if grid[0, yIdx, xIdx] > grid[-1, yIdx, xIdx]:
                ind = -1
            else:
                ind = 1
            f = interpolate.interp1d(grid[::ind, yIdx, xIdx],
                                     data[::ind, yIdx, xIdx],
                                     kind=interp)
            ret[yIdx, xIdx] = f(value)
    return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however, if your grid is rectilinear (such that each vertical column has the same set of z-coordinates), then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course.) For a grid your size, this will be extremely fast (as in tiny fractions of a second).
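For example, a hedged sketch of the same idea wrapped in a function, with the out-of-range check added:

import numpy as np

def interp_to_level(grid, data, value):
    """Linearly interpolate data to the level where grid == value, assuming
    every column shares the ascending z-coordinates grid[:, 0, 0]."""
    zs = grid[:, 0, 0]
    if not (zs[0] <= value <= zs[-1]):
        raise ValueError("value %r outside grid range [%r, %r]"
                         % (value, zs[0], zs[-1]))
    # clip so that value == zs[0] still gives a valid bracketing pair
    idx = np.clip(zs.searchsorted(value), 1, len(zs) - 1)
    lower, upper = zs[idx - 1], zs[idx]
    s = (value - lower) / (upper - lower)
    return (1 - s) * data[idx - 1, :, :] + s * data[idx, :, :]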
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data is uniformly spaced. It will still produce smooth results, but they won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights by doing something like the following:
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx], weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
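A minimal self-contained sketch of the whole trick, with made-up grid and data (the non-uniform zs and the random field are placeholders):

import numpy as np
from scipy.interpolate import interp1d

NZ, NY, NX = 30, 321, 321
zs = np.linspace(0.0, 1.0, NZ) ** 2        # non-uniform but shared vertical levels
grid = zs[:, None, None] * np.ones((NZ, NY, NX))
data = np.random.rand(NZ, NY, NX)          # placeholder field to interpolate

weight_func = interp1d(zs, np.identity(NZ), 'cubic')   # build once per grid

value = 0.5
weights = weight_func(value)               # length-NZ weight vector
result = np.inner(data.T, weights).T       # whole interpolated plane, shape (NY, NX)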
This is quite an old question but the best way to do this nowadays is to use MetPy's interpolate_1d function:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
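A minimal usage sketch; check the linked docs for the exact signature and return type, and note that the pressure levels and humidity values here are made up:

import numpy as np
from metpy.interpolate import interpolate_1d

pressure = np.array([300., 500., 700., 850., 1000.])  # hypothetical levels (hPa)
rh = np.array([10., 35., 60., 80., 95.])              # hypothetical humidities (%)
targets = np.array([600., 900.])                      # levels to interpolate to

print(interpolate_1d(targets, pressure, rh))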
There is a newer implementation of Numba-accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)
I am having problems trying to find the FWHM of some data. I initially tried to fit a curve using interpolate.interp1d. With this I was able to create a function that, given an x value, returns an interpolated y value. The issue is that I need the inverse of this functionality. In other words, I want to switch my independent and dependent variables. When I try to switch them, I get errors because the independent data has to be sorted. If I sort the data, I will lose the indices, and therefore lose the shape of my graph.
I tried:
x = np.linspace(0, line.shape[0], line.shape[0])
self.x_curve = interpolate.interp1d(x, y, 'linear')
where y is my data.
To get the inverse, I tried:
self.x_curve = interpolate.interp1d(sorted(y), x, 'linear')
but the values are off.
I then moved on and tried to use UnivariateSpline and get the roots to find the FWHM (from this question here: Finding the full width half maximum of a peak), but the roots() method keeps giving me an empty list [].
This is what I used:
x_curve = interpolate.UnivariateSpline(x, y)
r = x_curve.roots()
print(r)
Here is an image of the data (with the UnivariateSpline):
Any ideas? Thanks.
Using UnivariateSpline.roots() to get FWHM will only work if you shift the data so that its value is 0 at FWHM.
Seeing that the background of the data is noisy, I'd first estimate the baseline. For example:
y_baseline = y[(x < 200) | (x > 350)].mean()
(adjust the limits for x as you see fit). Then shift the data so that the middle of the baseline and the peak is at 0. Seeing that your data has a minimum and not a maximum as in the example, I'm using y.min():
y_shifted = y - (y.min()+y_baseline)/2.0
Now fit a spline to this shifted data and roots() should be able to find the roots, the difference of which is the FWHM.
x_curve = interpolate.UnivariateSpline(x, y_shifted, s=0)
x_curve.roots()
Increase the s parameter if you want to estimate the FWHM from smoothed data.
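Putting it all together on synthetic data with a dip (the inverted-Gaussian signal, the noise level, and the 200/350 limits are all made up for illustration):

import numpy as np
from scipy import interpolate

# Synthetic data: a noisy baseline at 100 with a dip (inverted Gaussian).
x = np.arange(500, dtype=float)
rng = np.random.default_rng(0)
y = (100.0 - 40.0 * np.exp(-0.5 * ((x - 275.0) / 20.0) ** 2)
     + rng.normal(0.0, 0.5, x.size))

y_baseline = y[(x < 200) | (x > 350)].mean()     # baseline away from the dip
y_shifted = y - (y.min() + y_baseline) / 2.0     # half maximum moved to 0

x_curve = interpolate.UnivariateSpline(x, y_shifted, s=0)
r = x_curve.roots()
fwhm = r.max() - r.min()   # the crossings bracket the dip
print(fwhm)                # ~ 2*sqrt(2*ln 2)*20 = 47.1 for this synthetic dip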
How can I find the peak curvature of a spline fitted using scipy? (Actually, peak second differential would be enough)
I have calculated the tck values as follows, using my 1d xs and ys vectors:
tck = splrep(xs, ys, s=0)
I know I can evaluate the second differential at any x of my choice:
ddy = splev([x], tck, 2)
So I could loop over many values of x, calculate the curvature and take the maximum. But I would prefer to interpret the values in tck to get the coefficients of the individual cubic functions, and thus calculate the peak curvature directly. However, tck appears rather opaque - how can I extract the cubic function coefficients from it?
Just use the der keyword argument of the splev function:
ddy = splev(X, tck, der=2)
and preferably don't loop over many values of x; instead make an N×1 array X containing every value you want to evaluate, so you get back an array of values instead of individual values you would have to collect into a sequence anyway.
Also, it is extremely advisable to PLOT your results as a way to debug them. If the plots make sense, things are most likely working as you expect (and if not, they surely are NOT).
EDIT: in case the interpolation using X gives just an approximate value and you want the TRUE maximum, you can use parabolic interpolation of the three points that define the maximum (the local interpolated maximum and its neighbors), considering the spline is locally smooth:
def parabolic_interpolation(p1, p2, p3):
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    denom = (x1-x2)*(x1-x3)*(x2-x3)
    a = (x3*(y2-y1)+x2*(y1-y3)+x1*(y3-y2))/denom
    b = (x3*x3*(y1-y2)+x2*x2*(y3-y1)+x1*x1*(y2-y3))/denom
    c = (x2*x3*(x2-x3)*y1+x3*x1*(x3-x1)*y2+x1*x2*(x1-x2)*y3)/denom
    xv = -b/(2*a)
    yv = c - b**2/(4*a)
    return (xv, yv)  # coordinates of the vertex
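A hypothetical usage sketch, refining the grid maximum of the second derivative with the parabolic_interpolation function above (the sine data stands in for the xs, ys, and tck from the question):

import numpy as np
from scipy.interpolate import splrep, splev

xs = np.linspace(0.0, 10.0, 100)     # placeholder data
ys = np.sin(xs)
tck = splrep(xs, ys, s=0)

X = np.linspace(xs[0], xs[-1], 1000)
ddy = splev(X, tck, der=2)
i = int(np.argmax(ddy))              # discrete maximum (assumed interior here)
xv, yv = parabolic_interpolation((X[i-1], ddy[i-1]),
                                 (X[i],   ddy[i]),
                                 (X[i+1], ddy[i+1]))
print(xv, yv)                        # refined location and value of the maximum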
Hope this helps!
I need to (numerically) calculate the first and second derivative of a function, for which I've attempted to use both splrep and UnivariateSpline to create splines that interpolate the function so I can take the derivatives.
However, it seems that there's an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which are (rapidly) oscillating.
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
import scipy
from scipy import interpolate
import numpy
from numpy import linspace
import math
from math import sin, pi
k = linspace(0, 6.*pi, num=10000)  # interval (0, 6*pi) in 10,000 steps
y = []
A = 1.e0  # Amplitude of sine function
for i in range(len(k)):
    y.append(A*sin(k[i]))
tck = interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M = tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The second graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Never mind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example...).
I just realized that you're evaluating the spline at the original input points while using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try putting s=0 in instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6.*np.pi, num=100)  # interval (0, 6*pi) in 100 steps
A = 1.e-4 # Amplitude of sine function
y = A*np.sin(x)
fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo', label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
The problem is in choosing suitable values for the s parameter. Its values depend on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s of course also depend on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2, where std is the standard deviation associated with the "noise" you want to smooth over.
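As a rough sketch of that recipe (the noise level sigma is an assumed input here; in practice you would estimate it from your data):

import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 6*np.pi, 100)
sigma = 0.01                               # assumed noise standard deviation
y = 1e-2*np.sin(x) + np.random.normal(0, sigma, x.size)

m = len(y)
s_lo = (m - np.sqrt(2*m)) * sigma**2       # lower end of the docs' range
s_hi = (m + np.sqrt(2*m)) * sigma**2       # upper end of the docs' range
spline = UnivariateSpline(x, y, s=0.5*(s_lo + s_hi))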