I have some data in the form (x, y, z, V), where x, y, z are distances and V is the moisture. I read a lot on StackOverflow about interpolation with Python, like this and this valuable post, but all of them were about regular grids of x, y, z, i.e. every value of x contributes equally with every point of y and every point of z. My points, on the other hand, come from a 3D finite element grid (as below), where the grid is not regular.
The two mentioned posts 1 and 2 define each of x, y, z as a separate numpy array, then use something like cartcoord = zip(x, y) followed by scipy.interpolate.LinearNDInterpolator(cartcoord, z) (in a 3D example). I cannot do the same, since my 3D grid is not regular, so not every point contributes to the others; when I repeated these approaches I found many null values and got many errors.
Here are 10 sample points in the form [x, y, z, V]:
data = [[27.827, 18.530, -30.417, 0.205] , [24.002, 17.759, -24.782, 0.197] ,
[22.145, 13.687, -33.282, 0.204] , [17.627, 18.224, -25.197, 0.197] ,
[29.018, 18.841, -38.761, 0.212] , [24.834, 20.538, -33.012, 0.208] ,
[26.232, 22.327, -27.735, 0.204] , [23.017, 23.037, -29.230, 0.205] ,
[28.761, 21.565, -31.586, 0.211] , [26.263, 23.686, -32.766, 0.215]]
I want to get the interpolated value V at the point (25, 20, -30).
How can I get it?
I found the answer, and am posting it for the benefit of StackOverflow readers.
The method is as follows:
1- Imports:
import numpy as np
from scipy.interpolate import griddata
from scipy.interpolate import LinearNDInterpolator
2- Prepare the data as follows:
# put the available x,y,z data as a numpy array
points = np.array([
[ 27.827, 18.53 , -30.417], [ 24.002, 17.759, -24.782],
[ 22.145, 13.687, -33.282], [ 17.627, 18.224, -25.197],
[ 29.018, 18.841, -38.761], [ 24.834, 20.538, -33.012],
[ 26.232, 22.327, -27.735], [ 23.017, 23.037, -29.23 ],
[ 28.761, 21.565, -31.586], [ 26.263, 23.686, -32.766]])
# and put the moisture corresponding data values in a separate array:
values = np.array([0.205, 0.197, 0.204, 0.197, 0.212,
0.208, 0.204, 0.205, 0.211, 0.215])
# Finally, put the desired point/points you want to interpolate over
request = np.array([[25, 20, -30], [27, 20, -32]])
3- Write the final line of code to get the interpolated values
Method 1, using griddata
print(griddata(points, values, request))
# OUTPUT: array([ 0.20448536, 0.20782028])
Method 2, using LinearNDInterpolator
# First, define an interpolator function
linInter = LinearNDInterpolator(points, values)
# Then, apply the function to one or more points
print(linInter(np.array([[25, 20, -30]])))
print(linInter(request))
# OUTPUT: [0.20448536 0.20782028]
# I think you may use it with python map or pandas.apply as well
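For example, a minimal pandas sketch (the DataFrame and column names here are hypothetical, made up for illustration):
import pandas as pd
df = pd.DataFrame(request, columns=['x', 'y', 'z'])  # hypothetical frame of query points
df['V'] = df.apply(lambda row: float(linInter([row['x'], row['y'], row['z']])), axis=1)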
Hope this benefits everyone.
Best regards
I have some 2D data-arrays which I need to interpolate. SciPy offers some solutions to this, as nicely explained for example here. My data very roughly looks like a 2D parabola (as a function of x1 and x2), with the opening pointing downwards, as illustrated in Fig. 1 for a horizontal cut along x2=0. As you can see, there are no negative values (the datapoints are all exactly 0 there).
I wanted to perform cubic interpolation since I require smooth data. This gives a problem at the edges, resulting in "wiggling" or "overshooting" of the fit/interpolation, as illustrated in Fig. 2. Negative values are, however, not allowed in the subsequent post-processing of the interpolated data (overshooting to positive values where the data should be zero also needs to be suppressed).
I thought that a clever "solution" would be to simply set the values which are 0 (note that those are all exactly the same) to NaN, so that they are ignored by the interpolation. But SciPy's griddata with the cubic method does not work with NaNs. The linear method can handle this, but I need cubic.
My question is: am I missing something, or doing something wrong, that results in griddata not working properly with NaNs and the cubic method?
An example code is as follows:
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate as interp
def some_funct( x1, x2 ):
    result = -x1**2 - 2.*x2**2 + 60.
    result[result < .0] = .0
    return result
# define original (sparse) grid
N_x1, N_x2 = 20, 20
x1_old = np.linspace( -9, 10, N_x1 )
x2_old = np.linspace( -9, 10, N_x2 )
X1_old, X2_old = np.meshgrid( x1_old, x2_old )
# datapoints to be interpolated
z_old = some_funct( X1_old, X2_old )
# set 0 datapoints to nan
z_old[z_old==0] = np.nan
# grid for interpolation
x1_new = np.linspace( -9, 10, 10*N_x1 )
x2_new = np.linspace( -9, 10, 10*N_x2 )
X1_new, X2_new = np.meshgrid( x1_new, x2_new )
# perform interpolation
z_new = interp.griddata( np.array([X1_old.ravel(),X2_old.ravel()]).T, z_old.ravel(),
(X1_new, X2_new),
method='cubic', fill_value=.0 # only works for 'linear'
)
# plot horizontal cut along x2=0
fig1 = plt.figure( figsize=(8,6) )
ax1 = fig1.add_subplot( 1,1,1 )
x2_old_0_id = (np.abs(x2_old - .0)).argmin()
x2_new_0_id = (np.abs(x2_new - .0)).argmin()
ax1.plot( x1_old, z_old[ x2_old_0_id , : ], marker='x', linestyle='None', label='data' )
ax1.plot( x1_new, z_new[ x2_new_0_id , : ], label='interpolation' )
ax1.legend()
ax1.set_xlabel( 'x1' )
ax1.set_ylabel( 'z' )
plt.show()
Any hints are greatly appreciated!
Update: Forgot to include the versions I am using:
numpy: 1.15.1
scipy: 1.1.0
For monotone cubic interpolation, which does not overshoot, use pchip (PchipInterpolator) or Akima1DInterpolator.
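For example, a minimal 1D sketch along the question's x2=0 cut (my own illustration; the variable names are made up):
import numpy as np
from scipy.interpolate import PchipInterpolator
x = np.linspace(-9, 10, 20)
y = np.clip(-x**2 + 60., 0., None)  # same shape as the cut along x2=0
pchip_f = PchipInterpolator(x, y)
x_fine = np.linspace(-9, 10, 200)
y_fine = pchip_f(x_fine)  # stays >= 0: no overshoot at the flat zero edges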
I'm attempting to plot a 3D chart using matplotlib.pyplot.contourf() with the following program:
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack
# calculates Fast Fourier transforms for each value in the 1D array "Altitude"
# and stacks them vertically to form a 2D array of fft values called "Fourier"
Fourier = np.array([])
for i in range(len(Altitude)):
    Ne_fft = Ne_lowpass[i,:]/np.average(Ne_lowpass[i,:])
    Ne_fft = Ne_fft - Ne_fft.mean()
    W = scipy.fftpack.fftfreq(10*Ne_fft.size, d=(Time[-1]-Time[0])/len(Ne_fft))
    P = 1/abs(W)
    FFT = abs(scipy.fftpack.fft(Ne_fft, n=10*len(Ne_fft)))
    FFT = FFT**2
    if len(Fourier) == 0:
        Fourier = FFT
    else:
        Fourier = np.vstack((Fourier,FFT))
# plots the 2D contourf plot of "Fourier", with respect to "Altitude" and period "P"
plt.figure(5)
C = plt.contourf(P,Altitude,Fourier,100,cmap='jet')
plt.xscale('log')
plt.xlim([1,P[np.argmax(P)+1]])
plt.ylim([59,687])
plt.ylabel("Altitude")
plt.xlabel("Period")
plt.title("Power spectrum of Ne")
cbar = plt.colorbar(C)
cbar.set_label("Power", fontsize = 16)
For the most part it is working fine; however, in some places useless white space is plotted. The plot produced can be found here (sorry, I don't have enough reputation points to attach images directly).
The purpose of this program is to calculate a series of Fast Fourier Transforms across 1 axis of a 2 dimensional numpy array, and stack them up to display a contour plot depicting which periodicities are most prominent in the data.
I checked the parts of the plotted quantity that appear white, and finite values are still present, although much smaller than the noticeable quantities elsewhere in the plot:
print(Fourier[100:,14000:])
[[ 2.41147887e-03 1.50783490e-02 4.82620482e-02 ..., 1.49769976e+03
5.88859945e+02 1.31930217e+02]
[ 2.12684922e-03 1.44076962e-02 4.65881565e-02 ..., 1.54719976e+03
6.14086374e+02 1.38727145e+02]
[ 1.84414615e-03 1.38162140e-02 4.51940720e-02 ..., 1.56478339e+03
6.23619105e+02 1.41367042e+02]
...,
[ 3.51539440e-03 3.20182148e-03 2.38117665e-03 ..., 2.43824864e+03
1.18676851e+03 3.13067945e+02]
[ 3.51256439e-03 3.19924000e-03 2.37923875e-03 ..., 2.43805298e+03
1.18667139e+03 3.13042038e+02]
[ 3.50985146e-03 3.19677302e-03 2.37741084e-03 ..., 2.43790243e+03
1.18659640e+03 3.13021994e+02]]
print(np.isfinite(Fourier).all())
True
print(np.isnan(Fourier).any())
False
Is the white space present because the values are so small compared to the rest of the plot? I'm not sure at all how to fix this.
You can fix this problem by adding the option extend='both'.
Example:
C = plt.contourf(P,Altitude, Fourier,100, cmap='jet', extend='both')
Ref: https://matplotlib.org/examples/pylab_examples/contourf_demo.html
In the line plt.contourf(P,Altitude,Fourier,100,cmap='jet') you are taking 100 automatically chosen levels for the contour plot. "Automatic" in this case does not guarantee that those levels include all data.
If you want to make sure all the data is included, you may define your own levels to use:
plt.contourf(x, y, Z, np.linspace(Z.min(), Z.max(), 100))
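Applied to the variables from the question, that would look something like:
levels = np.linspace(Fourier.min(), Fourier.max(), 100)
C = plt.contourf(P, Altitude, Fourier, levels, cmap='jet')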
Note: I asked this question before but it was closed as a duplicate; however, I, along with several others, believe it was unduly closed, and I explain why in an edit in my original post. So I would like to re-ask this question here.
Does anyone know of a Python library that can interpolate between two lines? For example, given the two solid lines below, I would like to produce the dashed line in the middle. In other words, I'd like to get the centreline. The input is just two numpy arrays of coordinates with size N x 2 and M x 2 respectively.
Furthermore, I'd like to know if someone has written a function for this in some optimized Python library, although optimization isn't strictly necessary.
Here is an example of two lines that I might have. You can assume they do not overlap with each other, and that a given x/y can have multiple y/x coordinates.
array([[ 1233.87375018, 1230.07095987],
[ 1237.63559365, 1253.90749041],
[ 1240.87500801, 1264.43925132],
[ 1245.30875975, 1274.63795396],
[ 1256.1449357 , 1294.48254424],
[ 1264.33600095, 1304.47893299],
[ 1273.38192911, 1313.71468591],
[ 1283.12411536, 1322.35942538],
[ 1293.2559388 , 1330.55873344],
[ 1309.4817002 , 1342.53074698],
[ 1325.7074616 , 1354.50276051],
[ 1341.93322301, 1366.47477405],
[ 1358.15898441, 1378.44678759],
[ 1394.38474581, 1390.41880113]])
array([[ 1152.27115094, 1281.52899302],
[ 1155.53345506, 1295.30515742],
[ 1163.56506781, 1318.41642169],
[ 1168.03497425, 1330.03181319],
[ 1173.26135672, 1341.30559949],
[ 1184.07110925, 1356.54121651],
[ 1194.88086178, 1371.77683353],
[ 1202.58908737, 1381.41765447],
[ 1210.72465255, 1390.65097106],
[ 1227.81309742, 1403.2904646 ],
[ 1244.90154229, 1415.92995815],
[ 1261.98998716, 1428.56945169],
[ 1275.89219696, 1438.21626352],
[ 1289.79440676, 1447.86307535],
[ 1303.69661656, 1457.50988719],
[ 1323.80994319, 1470.41028655],
[ 1343.92326983, 1488.31068591],
[ 1354.31738934, 1499.33260989],
[ 1374.48879779, 1516.93734053],
[ 1394.66020624, 1534.54207116]])
Visualizing this we have:
So my attempt at this has been to use the skeletonize function in the skimage.morphology library, by first rasterizing the coordinates into a filled polygon. However, I get branching at the ends, like this:
First of all, pardon the overkill; I had fun with your question. If the description is too long, feel free to skip to the bottom, I defined a function that does everything I describe.
Your problem would be relatively straightforward if your arrays were the same length. In that case, all you would have to do is find the average between the corresponding x values in each array, and the corresponding y values in each array.
So what we can do is create arrays of the same length, that are more or less good estimates of your original arrays. We can do this by fitting a polynomial to the arrays you have. As noted in comments and other answers, the midline of your original arrays is not specifically defined, so a good estimate should fulfill your needs.
Note: In all of these examples, I've gone ahead and named the two arrays that you posted a1 and a2.
Step one: Create new arrays that estimate your old lines
Looking at the data you posted:
These aren't particularly complicated functions, it looks like a 3rd degree polynomial would fit them pretty well. We can create those using numpy:
import numpy as np
# Find the range of x values in a1
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
# Create an evenly spaced array that ranges from the minimum to the maximum
# I used 100 elements, but you can use more or fewer.
# This will be used as your new x coordinates
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
# Fit a 3rd degree polynomial to your data
a1_coefs = np.polyfit(a1[:,0],a1[:,1], 3)
# Get your new y coordinates from the coefficients of the above polynomial
new_a1_y = np.polyval(a1_coefs, new_a1_x)
# Repeat for array 2:
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
a2_coefs = np.polyfit(a2[:,0],a2[:,1], 3)
new_a2_y = np.polyval(a2_coefs, new_a2_x)
The result:
That's not so bad! If you have more complicated functions, you'll have to fit a higher degree polynomial, or find some other adequate function to fit to your data.
Now, you've got two sets of arrays of the same length (I chose a length of 100, you can do more or less depending on how smooth you want your midpoint line to be). These sets represent the x and y coordinates of the estimates of your original arrays. In the example above, I named these new_a1_x, new_a1_y, new_a2_x and new_a2_y.
Step two: calculate the average between each x and each y in your new arrays
Then, we want to find the average x and average y value for each of our estimate arrays. Just use np.mean:
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
midx and midy now represent the midpoint between our 2 estimate arrays. Now, just plot your original (not estimate) arrays, alongside your midpoint array:
import matplotlib.pyplot as plt
plt.plot(a1[:,0], a1[:,1], c='black')
plt.plot(a2[:,0], a2[:,1], c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
And voilĂ :
This method still works with more complex, noisy data (but you have to fit the function thoughtfully):
As a function:
I've put the above code in a function, so you can use it easily. It returns an array of your estimated midpoints, in the format you had your original arrays in.
The arguments: a1 and a2 are your 2 input arrays, poly_deg is the degree polynomial you want to fit, n_points is the number of points you want in your midpoint array, and plot is a boolean, whether you want to plot it or not.
import matplotlib.pyplot as plt
import numpy as np
def interpolate(a1, a2, poly_deg=3, n_points=100, plot=True):
    min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
    new_a1_x = np.linspace(min_a1_x, max_a1_x, n_points)
    a1_coefs = np.polyfit(a1[:,0], a1[:,1], poly_deg)
    new_a1_y = np.polyval(a1_coefs, new_a1_x)
    min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
    new_a2_x = np.linspace(min_a2_x, max_a2_x, n_points)
    a2_coefs = np.polyfit(a2[:,0], a2[:,1], poly_deg)
    new_a2_y = np.polyval(a2_coefs, new_a2_x)
    midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(n_points)]
    midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(n_points)]
    if plot:
        plt.plot(a1[:,0], a1[:,1], c='black')
        plt.plot(a2[:,0], a2[:,1], c='black')
        plt.plot(midx, midy, '--', c='black')
        plt.show()
    return np.array([[x, y] for x, y in zip(midx, midy)])
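Usage, with a1 and a2 as the arrays posted in the question:
midpoints = interpolate(a1, a2, poly_deg=3, n_points=100, plot=True)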
[EDIT]:
I was thinking back on this question, and I overlooked a simpler way to do this, by "densifying" both arrays to the same number of points using np.interp. This method follows the same basic idea as the line-fitting method above, but instead of approximating lines using polyfit / polyval, it just densifies:
min_a1_x, max_a1_x = min(a1[:,0]), max(a1[:,0])
min_a2_x, max_a2_x = min(a2[:,0]), max(a2[:,0])
new_a1_x = np.linspace(min_a1_x, max_a1_x, 100)
new_a2_x = np.linspace(min_a2_x, max_a2_x, 100)
new_a1_y = np.interp(new_a1_x, a1[:,0], a1[:,1])
new_a2_y = np.interp(new_a2_x, a2[:,0], a2[:,1])
midx = [np.mean([new_a1_x[i], new_a2_x[i]]) for i in range(100)]
midy = [np.mean([new_a1_y[i], new_a2_y[i]]) for i in range(100)]
plt.plot(a1[:,0], a1[:,1],c='black')
plt.plot(a2[:,0], a2[:,1],c='black')
plt.plot(midx, midy, '--', c='black')
plt.show()
The "line between two lines" is not so well defined. You can obtain a decent though simple solution by triangulating between the two curves (you can triangulate by progressing from vertex to vertex, choosing the diagonals that produce the less skewed triangle).
Then the interpolated curve joins the middles of the sides.
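A minimal sketch of that idea (my own illustration, not a library routine), assuming a and b are the two N x 2 and M x 2 coordinate arrays: walk along both polylines, advance on whichever side currently gives the shorter diagonal, and collect the midpoints of the connecting edges:
import numpy as np
def centerline_by_triangulation(a, b):
    i, j = 0, 0
    mids = [(a[0] + b[0]) / 2.0]
    while i < len(a) - 1 or j < len(b) - 1:
        if i == len(a) - 1:        # a exhausted, advance on b
            j += 1
        elif j == len(b) - 1:      # b exhausted, advance on a
            i += 1
        elif np.linalg.norm(a[i+1] - b[j]) < np.linalg.norm(a[i] - b[j+1]):
            i += 1                 # shorter diagonal -> less skewed triangle
        else:
            j += 1
        mids.append((a[i] + b[j]) / 2.0)
    return np.array(mids)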
I work with rivers, so this is a common problem. One of my solutions is exactly like the one you showed in your question--i.e. skeletonize the blob. You see that the boundaries have problems, so what I've done that seems to work well is to simply mirror the boundaries. For this approach to work, the blob must not intersect the corners of the image.
You can find my implementation in RivGraph; this particular algorithm is in rivers/river_utils.py called "mask_to_centerline".
Here's an example output showing how the ends of the centerline extend to the desired edge of the object:
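A rough sketch of the mirroring idea (my own approximation; RivGraph's implementation is more careful), assuming mask is a binary blob that touches the left and right image edges:
import numpy as np
from skimage.morphology import skeletonize
pad = 20
mirrored = np.pad(mask, ((0, 0), (pad, pad)), mode='reflect')  # mirror left/right edges
skel = skeletonize(mirrored)
skel = skel[:, pad:-pad]  # crop back, pushing the end branching outside the frame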
sacuL's solution almost worked for me, but I needed to aggregate more than just two curves.
Here is my generalization of sacuL's solution:
import matplotlib.pyplot as plt
import numpy as np

def interp(*axis_list):
    min_max_xs = [(min(axis[:,0]), max(axis[:,0])) for axis in axis_list]
    new_axis_xs = [np.linspace(min_x, max_x, 100) for min_x, max_x in min_max_xs]
    new_axis_ys = [np.interp(new_x_axis, axis[:,0], axis[:,1]) for axis, new_x_axis in zip(axis_list, new_axis_xs)]
    midx = [np.mean([new_axis_xs[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    midy = [np.mean([new_axis_ys[axis_idx][i] for axis_idx in range(len(axis_list))]) for i in range(100)]
    for axis in axis_list:
        plt.plot(axis[:,0], axis[:,1], c='black')
    plt.plot(midx, midy, '--', c='black')
    plt.show()
If we now run an example:
a1 = np.array([[x, x**2+5*(x%4)] for x in range(10)])
a2 = np.array([[x-0.5, x**2+6*(x%3)] for x in range(10)])
a3 = np.array([[x+0.2, x**2+7*(x%2)] for x in range(10)])
interp(a1, a2, a3)
we get the plot:
I'm trying to build a program to map a 2D coordinate (latitude, longitude) to a float value. I have about 1 million rows of training data, like:
(41.140359, -8.612964) -> 65
... -> ...
I think this is a regression problem, except all of the regression examples I've found use only one dimension, so I'm not sure.
What algorithm (or category of algorithms) should I use in this instance?
Before trying to find a function, plot your data in Excel or with a Python plot; you may then see the kind of function you are looking for.
In addition, Excel has a regression computation module.
It is a regression problem, and you can freely use e.g. linear regression to solve it. The examples are often one-dimensional because that is easy to understand, but they work for an arbitrary number of dimensions.
You can try to use linear regression first.
Let's give an example using numpy.linalg.lstsq:
>>> import numpy as np
>>> x = np.random.rand(10, 2)
>>> x
array([[ 0.7920302 , 0.05650698],
[ 0.76380636, 0.07123805],
[ 0.18650694, 0.89150851],
[ 0.22730377, 0.83013102],
[ 0.72369719, 0.07772721],
[ 0.26277287, 0.44253368],
[ 0.44421399, 0.98533921],
[ 0.91476656, 0.27183732],
[ 0.74745802, 0.08840694],
[ 0.60000819, 0.67162258]])
>>> y = np.random.rand(10)
>>> y
array([ 0.53341968, 0.63964031, 0.46097061, 0.68602146, 0.20041928,
0.42642768, 0.34039486, 0.93539655, 0.29946688, 0.57526445])
>>> m, c = np.linalg.lstsq(x, y)[0]
>>> print(m, c)
0.605269341974 0.370359070752
See documentation for more information about plotting and what those values represent.
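If you'd rather not assemble the matrices yourself, here is a minimal sketch with scikit-learn (assuming it is installed; the sample rows are made up):
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[41.140359, -8.612964],
              [41.150000, -8.600000]])  # ... your ~1M (latitude, longitude) rows
y = np.array([65.0, 63.0])              # corresponding target values
model = LinearRegression().fit(X, y)
print(model.predict([[41.14, -8.61]]))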
I would like to get data from a single contour of evenly spaced 2D data (an image-like data).
Based on the example found in a similar question: How can I get the (x,y) values of the line that is ploted by a contour plot (matplotlib)?
>>> import matplotlib.pyplot as plt
>>> x = [1,2,3,4]
>>> y = [1,2,3,4]
>>> m = [[15,14,13,12],[14,12,10,8],[13,10,7,4],[12,8,4,0]]
>>> cs = plt.contour(x,y,m, [9.5])
>>> cs.collections[0].get_paths()
The result of this call into cs.collections[0].get_paths() is:
[Path([[ 4. 1.625 ]
[ 3.25 2. ]
[ 3. 2.16666667]
[ 2.16666667 3. ]
[ 2. 3.25 ]
[ 1.625 4. ]], None)]
Based on the plots, this result makes sense and appears to be a collection of (y, x) pairs for the contour line.
Other than manually looping over this return value, extracting the coordinates and assembling arrays for the line, are there better ways to get data back from a matplotlib.path object? Are there pitfalls to be aware of when extracting data from a matplotlib.path?
Alternatively, are there alternatives within matplotlib or better yet numpy/scipy to do a similar thing? Ideal thing would be to get a high resolution vector of (x,y) pairs describing the line, which could be used for further analysis, as in general my datasets are not a small or simple as the example above.
For a given path, you can get the points like this:
p = cs.collections[0].get_paths()[0]
v = p.vertices
x = v[:,0]
y = v[:,1]
from: http://matplotlib.org/api/path_api.html#module-matplotlib.path
Users of Path objects should not access the vertices and codes arrays
directly. Instead, they should use iter_segments() to get the
vertex/code pairs. This is important, since many Path objects, as an
optimization, do not store a codes at all, but have a default one
provided for them by iter_segments().
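So, going by the docs, a minimal sketch of the recommended route for the path p above would be:
for verts, code in p.iter_segments():
    print(verts, code)  # each vertex (or run of vertices) with its path code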
Otherwise, I'm not really sure what your question is. zip is a sometimes-useful built-in function when working with coordinates.
The vertices of all paths can be returned as a numpy array of float64 simply via:
cs.allsegs[i][j] # for element j, in level i
where cs is defined as in the original question as:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 3, 4]
m = [[15, 14, 13, 12], [14, 12, 10, 8], [13, 10, 7, 4], [12, 8, 4, 0]]
cs = plt.contour(x, y, m, [9.5])
More detailed:
Going through the collections and extracting the paths and vertices is not the most straightforward or fastest thing to do. The returned ContourSet object actually has attributes for the segments via cs.allsegs, which returns a nested list of shape [level][element][vertex_coord]:
num_levels = len(cs.allsegs)
num_element = len(cs.allsegs[0]) # in level 0
num_vertices = len(cs.allsegs[0][0]) # of element 0, in level 0
num_coord = len(cs.allsegs[0][0][0]) # of vertex 0, in element 0, in level 0
See reference:
https://matplotlib.org/stable/api/contour_api.html
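For the example above, pulling out the x and y coordinates of the single contour line then reduces to:
seg = cs.allsegs[0][0]  # first element of the first (and only) level
x, y = seg[:, 0], seg[:, 1]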
I am facing a similar problem, and stumbled over this matplotlib list discussion.
Basically, it is possible to strip away the plotting and call the underlying functions directly; not super convenient, but possible. The solution is also not pixel precise, as there is probably some interpolation going on in the underlying code. Note that this relies on the private matplotlib._cntr module, which was removed in later matplotlib versions.
import matplotlib.pyplot as plt
import matplotlib._cntr as cntr
import scipy as sp
data = sp.zeros((6,6))
data[2:4,2:4] = 1
plt.imshow(data,interpolation='none')
level=0.5
X,Y = sp.meshgrid(sp.arange(data.shape[0]),sp.arange(data.shape[1]))
c = cntr.Cntr(X, Y, data.T)
nlist = c.trace(level, level, 0)
segs = nlist[:len(nlist)//2]
for seg in segs:
    plt.plot(seg[:,0], seg[:,1], color='white')
plt.show()