I have a list of (x,y) values that are not uniformly spaced. Here is the archive used in this question.
I am able to interpolate between the values but what I get are not equispaced interpolating points. Here's what I do:
import numpy as np
import matplotlib.pyplot as plt

x_data = [0.613,0.615,0.615,...]
y_data = [5.919,5.349,5.413,...]
# Interpolate values for x and y.
t = np.linspace(0, 1, len(x_data))
t2 = np.linspace(0, 1, 100)
# One-dimensional linear interpolation.
x2 = np.interp(t2, t, x_data)
y2 = np.interp(t2, t, y_data)
# Plot x,y data.
plt.scatter(x_data, y_data, marker='o', color='k', s=40, lw=0.)
# Plot interpolated points.
plt.scatter(x2, y2, marker='o', color='r', s=10, lw=0.5)
Which results in:
As can be seen, the red dots are closer together in sections of the graph where the distribution of original points is denser.
I need a way to generate the interpolated points equispaced in x, y according to a given step value (say 0.1).
As askewchan correctly points out, by "equispaced in x, y" I mean that two consecutive interpolated points on the curve should be separated by the same (euclidean, straight-line) distance.
I tried unutbu's answer and it works well for smooth curves, but it seems to break for less smooth ones:
This happens because the code calculates the distance between points as a straight euclidean line instead of along the curve, and I need the distance along the curve to be the same between consecutive points. Can this issue be worked around somehow?
Convert your xy-data to a parametrized curve, i.e. calculate all distances between consecutive points and generate the coordinates along the curve by cumulative summing. Then interpolate the x- and y-coordinates independently with respect to the new coordinate.
import numpy as np
from matplotlib import pyplot as plt
data = '''0.615 5.349
0.615 5.413
0.617 6.674
0.617 6.616
0.63 7.418
0.642 7.809
0.648 8.04
0.673 8.789
0.695 9.45
0.712 9.825
0.734 10.265
0.748 10.516
0.764 10.782
0.775 10.979
0.783 11.1
0.808 11.479
0.849 11.951
0.899 12.295
0.951 12.537
0.972 12.675
1.038 12.937
1.098 13.173
1.162 13.464
1.228 13.789
1.294 14.126
1.363 14.518
1.441 14.969
1.545 15.538
1.64 16.071
1.765 16.7
1.904 17.484
2.027 18.36
2.123 19.235
2.149 19.655
2.172 20.096
2.198 20.528
2.221 20.945
2.265 21.352
2.312 21.76
2.365 22.228
2.401 22.836
2.477 23.804'''
# parse the two whitespace-separated columns into a float array
data = np.array([line.split() for line in data.split('\n')], dtype=float)
x, y = data.T
# cumulative arclength along the piecewise linear curve: the new parameter u
xd = np.diff(x)
yd = np.diff(y)
dist = np.sqrt(xd**2 + yd**2)
u = np.cumsum(dist)
u = np.hstack([[0], u])
# sample x(u) and y(u) at equally spaced values of u
t = np.linspace(0, u.max(), 10)
xn = np.interp(t, u, x)
yn = np.interp(t, u, y)
f = plt.figure()
ax = f.add_subplot(111)
ax.set_aspect('equal')
ax.plot(x,y,'o', alpha=0.3)
ax.plot(xn,yn,'ro', markersize=8)
ax.set_xlim(0,5)
Let's first consider a simple case. Suppose your data looked like the blue line below.
If you wanted to select equidistant points that were r distance apart,
then there would be some critical value for r where the cusp at (1,2) is the first equidistant point.
If you wanted points that were greater than this critical distance apart, then
the first equidistant point would jump from (1,2) to some place very different --
depicted by the intersection of the green arc with the blue line. The change is not gradual.
This toy case suggests that a tiny change in the parameter r can have a radical, discontinuous effect on the solution.
It also suggests that you must know the location of the ith equidistant point
before you can determine the location of the (i+1)-th equidistant point.
So it appears an iterative solution is required:
import numpy as np
import matplotlib.pyplot as plt
import math
x, y = np.genfromtxt('data', unpack=True, skip_header=1)
# find lots of points on the piecewise linear curve defined by x and y
M = 1000
t = np.linspace(0, len(x) - 1, M)  # np.arange(len(x)) runs from 0 to len(x)-1
x = np.interp(t, np.arange(len(x)), x)
y = np.interp(t, np.arange(len(y)), y)
tol = 1.5
i, idx = 0, [0]
while i < len(x) - 1:
    total_dist = 0
    for j in range(i + 1, len(x)):
        total_dist += math.sqrt((x[j] - x[j-1])**2 + (y[j] - y[j-1])**2)
        if total_dist > tol:
            idx.append(j)
            break
    else:
        break  # the remaining arclength is shorter than tol; stop
    i = j  # continue measuring from the point just found
xn = x[idx]
yn = y[idx]
fig, ax = plt.subplots()
ax.plot(x, y, '-')
ax.scatter(xn, yn, s=50)
ax.set_aspect('equal')
plt.show()
Note: I set the aspect ratio to 'equal' to make it more apparent that the points are equidistant.
The following script will interpolate points with an equal step in x of (x_max - x_min) / len(x) = 0.04438:
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
data = np.loadtxt('data.txt')
x = data[:,0]
y = data[:,1]
f = interp1d(x, y)
x_new = np.linspace(np.min(x), np.max(x), x.shape[0])
y_new = f(x_new)
plt.plot(x,y,'o', x_new, y_new, '*r')
plt.show()
Expanding on the answer by @Christian K., here's how to do this for higher-dimensional data with scipy.interpolate.interpn. Let's say we want to resample to 10 equally spaced points:
import numpy as np
import scipy.interpolate

# Assuming that 'data' is rows x dims (where dims is the dimensionality),
# e.g. data = np.random.rand(20, 3) for 20 points in 3-D
diffs = data[1:, :] - data[:-1, :]
dist = np.linalg.norm(diffs, axis=1)
u = np.cumsum(dist)
u = np.hstack([[0], u])
t = np.linspace(0, u[-1], 10)
resampled = scipy.interpolate.interpn((u,), data, t)
It IS possible to generate equidistant points along the curve, but you need to define more precisely what you want for a real answer. Sorry, the code I've written for this task is in MATLAB; I can describe the general ideas, though. There are three possibilities.
First, are the points to be truly equidistant from their neighbors in terms of simple euclidean distance? Achieving that involves finding where a circle of fixed radius, centered at the current point, next intersects the curve, and then stepping along the curve from there; see the sketch below.
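A minimal hedged sketch of that first idea on a piecewise linear curve (the helper below is illustrative, not taken from interparc): for each segment, solve a quadratic for where the circle of radius r around the current point crosses it.
def next_equidistant_point(cx, cy, r, x, y, start):
    # Hypothetical helper: first intersection, on or after segment `start`,
    # of the polyline (x, y) with the circle of radius r centered at (cx, cy).
    # Returns ((px, py), segment_index) or None.
    for i in range(start, len(x) - 1):
        vx, vy = x[i+1] - x[i], y[i+1] - y[i]   # segment direction
        fx, fy = x[i] - cx, y[i] - cy           # segment start relative to the center
        a = vx*vx + vy*vy
        b = 2*(fx*vx + fy*vy)
        c = fx*fx + fy*fy - r*r
        disc = b*b - 4*a*c                      # discriminant of |p(s) - c|^2 = r^2
        if a == 0 or disc < 0:
            continue                            # degenerate segment or no crossing
        for s in sorted([(-b - disc**0.5)/(2*a), (-b + disc**0.5)/(2*a)]):
            if 0 <= s <= 1:
                return (x[i] + s*vx, y[i] + s*vy), i
    return None
print(next_equidistant_point(0.0, 0.0, 0.5, [0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 0))
# ((0.3535..., 0.3535...), 0)
A full implementation would also resume the scan from the freshly found point within segment i, which is the part interparc handles more carefully.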
Next, if you intend distance to mean distance along the curve itself, and the curve is piecewise linear, the problem is again easy to solve: just step along the curve, since distance on a line segment is easy to measure.
Finally, if you intend for the curve to be a cubic spline, this is again not incredibly difficult, but it is a bit more work. Here the trick is to:
Compute the piecewise linear arclength from point to point along the curve. Call it t.
Generate a pair of cubic splines, x(t), y(t).
Differentiate x and y as functions of t. Since these are cubic segments, this is easy. The derivative functions will be piecewise quadratic.
Use an ode solver to move along the curve, integrating the differential arclength function. In MATLAB, ODE45 worked nicely.
Thus, one integrates
sqrt((x')^2 + (y')^2)
Again, in MATLAB, ODE45 can be set to identify the locations where the integrated arclength crosses certain specified values.
If your MATLAB skills are up to the task, you can look at the code in interparc for more explanation. It is reasonably well commented code.
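Since this question is about Python, here is a hedged SciPy sketch of the cubic-spline variant described above; for simplicity it replaces the ODE solver with a dense trapezoidal integration of the same arclength integrand, and the quarter-circle data is only illustrative:
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.interpolate import CubicSpline
# toy data: a quarter circle, so equal spacing is easy to verify
theta = np.linspace(0, np.pi/2, 15)
x, y = np.cos(theta), np.sin(theta)
# 1. piecewise linear (chord-length) arclength parameter t
t = np.concatenate([[0], np.cumsum(np.hypot(np.diff(x), np.diff(y)))])
# 2. cubic splines x(t), y(t) and their derivatives
sx, sy = CubicSpline(t, x), CubicSpline(t, y)
dsx, dsy = sx.derivative(), sy.derivative()
# 3./4. integrate sqrt(x'(t)^2 + y'(t)^2) on a fine grid, then invert the
# arclength function s(t) to find parameters at equally spaced arclengths
tt = np.linspace(t[0], t[-1], 2000)
s = cumulative_trapezoid(np.hypot(dsx(tt), dsy(tt)), tt, initial=0)
t_eq = np.interp(np.linspace(0, s[-1], 10), s, tt)
xn, yn = sx(t_eq), sy(t_eq)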
I'm looking for an algorithm that smoothly interpolates points as they come in live.
For example, say I start with an array of 10 (x,y) pairs. I'm currently using scipy and a gaussian window to generate a smooth curve. However, what I can't figure out is how to update the smoothed curve in response to an 11th point generated at some future point (without completely redoing the smoothing for all 11 points).
What I'm looking for is an algorithm that follows the previous smooth curve up to the 10th (x,y) pair and also smoothly interpolates between the 10th and 11th pair (in a way that's similar to redoing the entire algorithm - so no sharp edges). Is there something out there that does what I'm looking for?
I think you could make use of a Cubic Spline. Given a list of n points (x_1, y_1)..(x_n, y_n), the algorithm finds a cubic polynomial p_k between (x_k, y_k) and (x_{k+1}, y_{k+1}) with the following constraints:
polynomials p_k and p_{k+1} both pass through the point (x_{k+1}, y_{k+1});
polynomials p_k and p_{k+1} have the same first derivative at (x_{k+1}, y_{k+1});
polynomials p_k and p_{k+1} have the same second derivative at (x_{k+1}, y_{k+1}).
Also, there are some boundary conditions, defined for the first and the last polynomial. I have used natural boundary conditions, which force the second derivative to zero at the ends of the curve.
The steps that you could apply are:
Interpolate the first 10 points with a cubic spline.
Store the first derivative of the spline at the 10th point in a variable d.
Run the cubic spline for points 10 and 11, enforcing that the first derivative at point 10 is d and the second derivative at point 11 is zero.
From there, you can repeat the same steps for the remaining points.
This code will generate an interpolation through all the points:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import CubicSpline
height=4
n = 20
x = np.arange(n)
xs = np.arange(-0.1,n+0.1,0.1)
y = np.random.uniform(low=0, high=height, size=n)
plt.plot(x, y, 'o', label='data')
cs = CubicSpline(x, y)
plt.plot(xs, cs(xs), color='orange')
plt.ylim([0, height+1])
Now, this code will interpolate the first 10 points, followed by another interpolation between points 10 and 11:
k = 10
delta = 0.001
plt.plot(x, y, 'o', label='data')
# spline through the first k points
xs = np.arange(x[0], x[k-1] + delta, delta)
cs = CubicSpline(x[0:k], y[0:k])
plt.plot(xs, cs(xs), color='red')
# first derivative of that spline at the k-th point
d = cs(x[k-1], 1)
# spline between points k and k+1: clamp the first derivative to d on the
# left, natural (zero second derivative) condition on the right
xs2 = np.arange(x[k-1], x[k] + delta, delta)
cs2 = CubicSpline(x[k-1:k+1], y[k-1:k+1], bc_type=((1, d), 'natural'))
plt.plot(xs2, cs2(xs2), color='blue')
plt.ylim([0, height + 1])
I have two numpy arrays, one is an array of x values and the other an array of y values and together they give me the empirical cdf. E.g.:
plt.plot(xvalues, yvalues)
plt.show()
I assume the data needs to be smoothed somehow in order to give a smooth pdf.
I would like to plot the pdf. How can I do that?
The raw data is at: http://dpaste.com/1HVK5DR .
There are two main problems: your data is quite noisy, and it is not equally spaced: the points at the low end are sampled quite densely, while the points at the high end are sampled quite sparsely. This can cause numerical issues.
So first I suggest resampling the data using a linear interpolation to get equally spaced samples. (Note that all the snippets appended to each other form the content of one Python file.)
import matplotlib.pyplot as plt
import numpy as np
from data import xvalues, yvalues #load data from file
print("#datapoints: {}".format(len(xvalues)))
#don't use every point if your computer is not very fast
xv = np.array(xvalues)[::5]
yv = np.array(yvalues)[::5]
#interpolate to have evenly spaced data
xi = np.linspace(xv.min(), xv.max(), 400)
yi = np.interp(xi, xv, yv)
Then, to smooth the data, I suggest performing an RBF regression (= using an "RBF network"). The idea is fitting a curve of the form
c(t) = sum a(i) * phi(t - x(i)) #(not part of the program)
where phi is some radial basis function. (In theory we could use any functions.) To have a very smooth result I choose a very smooth, bell-shaped kernel, namely the inverse quadratic phi(r) = 1/(1 + (r/sigma)^2) used in the code below (a gaussian exp(-r^2/sigma^2) would behave similarly), where sigma is yet to be determined. The x(i) are just some nodes that we can define. If we have a smooth function, we just need a few nodes. The number of nodes also determines how much computation needs to be done. The a(i) are the coefficients we can optimize to get the best fit. In this case I just use a least squares approach.
Note that IF we can write a function in the form above, it is very easy to compute the derivative; it is just
c'(t) = sum a(i) * phi'(t - x(i)) #(not part of the program)
where phi' is the derivative of phi.
Regarding sigma: it is usually a good idea to choose it as a multiple of the step between the nodes we chose. The greater we choose sigma, the smoother the resulting function becomes.
#set up rbf network
rbf_nodes = xv[::50][None, :]#use a subset of the x-values as rbf nodes
print("#rbfs: {}".format(rbf_nodes.shape[1]))
#estimate width of kernels:
sigma = 20 #greater = smoother, this is the primary parameter to play with
sigma *= np.max(np.abs(rbf_nodes[0,1:]-rbf_nodes[0,:-1]))
# kernel & derivative
rbf = lambda r:1/(1+(r/sigma)**2)
Drbf = lambda r: -2*r*sigma**2/(sigma**2 + r**2)**2
#compute coefficients of rbf network
r = np.abs(xi[:, None]-rbf_nodes)
A = rbf(r)
coeffs = np.linalg.lstsq(A, yi, rcond=None)[0]
print(coeffs)
#evaluate rbf network
N = 1000
xe = np.linspace(xi.min(), xi.max(), N)
Ae = rbf(xe[:, None] - rbf_nodes)
ye = Ae @ coeffs
#evaluate the derivative of the network on the same grid
xd = np.linspace(xi.min(), xi.max(), N)
Bd = Drbf(xd[:, None] - rbf_nodes)
yd = Bd @ coeffs
fig,ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(xv, yv, '-')
ax.plot(xi, yi, '-')
ax.plot(xe, ye, ':')
ax2.plot(xd, yd, '-')
fig.savefig('graph.png')
print('done')
You need the derivative to go from CDF to PDF
PDF(x) = d CDF(x)/ dx
With NumPy, you could use gradient
pdf = np.gradient(yvalues, xvalues)
plt.plot(xvalues, pdf)
plt.show()
or manual differential
pdf = np.diff(yvalues)/np.diff(xvalues)
l = np.asarray(xvalues[:-1])
r = np.asarray(xvalues[1:])
plt.plot((l+r)/2.0, pdf) # points in the middle of interval
plt.show()
Both produce something like the plot below.
I am using matplotlib.pyplot to interpolate my data and create contours.
Following this answer/example (about how to calculate area within a contour), I am able to get the vertices of a contour line.
Is there a way to use that information, i.e., the vertices of a line, to count how many points fall between two given contours? These points will be different from the data used for deriving the contours.
Usually, you do not want to reverse engineer your plot to obtain some data. Instead you can interpolate the array that is later used for plotting the contours and find out which of the points lie in regions of certain values.
The following would find all points between the levels of -0.8 and -0.4, print them and show them in red on the plot.
import numpy as np; np.random.seed(1)
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy.interpolate import Rbf
X, Y = np.meshgrid(np.arange(-3.0, 3.0, 0.1), np.arange(-2.4, 1.0, 0.1))
Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
Z = 10.0 * (Z2 - Z1)
points = np.random.randn(15,2)/1.2
levels = [-1.2, -0.8,-0.4,-0.2]
# interpolate points
f = Rbf(X.flatten(), Y.flatten(), Z.flatten())
zi = f(points[:,0], points[:,1])
# add interpolated points to array with columns x,y,z
points3d = np.zeros((points.shape[0],3))
points3d[:,:2] = points
points3d[:,2] = zi
# masking condition for points between levels
filt = (zi>levels[1]) & (zi <levels[2])
# print points between the second and third level
print(points3d[filt,:])
### plotting
fig, ax = plt.subplots()
CS = ax.contour(X, Y, Z, levels=levels)
ax.clabel(CS, inline=1, fontsize=10)
#plot points between the second and third level in red:
ax.scatter(points[:,0], points[:,1], c=filt.astype(float), cmap="bwr" )
plt.show()
I am not sure I understand which points you want to check, but if you have the line vertices (two points) and want to check whether a third point falls between them, you can take a simple (not efficient) approach and calculate the area of the triangle formed by the three. If the area is 0, the point lies on the same line; you can then compare the distances between the points to see whether it sits between the two vertices or outside, on the extended line.
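For concreteness, a small sketch of that test (the function name and tolerance are illustrative):
import numpy as np

def point_on_segment(a, b, p, tol=1e-9):
    # True if p lies on the segment a-b: the triangle (a, b, p) has zero
    # area (collinear) and |ap| + |pb| equals |ab| (p is in between).
    a, b, p = np.asarray(a, float), np.asarray(b, float), np.asarray(p, float)
    area2 = abs((b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]))  # twice the area
    if area2 > tol:
        return False  # non-zero triangle area: p is off the line
    # collinear: compare distances to decide between vs. outside (extended line)
    return np.isclose(np.linalg.norm(p-a) + np.linalg.norm(b-p),
                      np.linalg.norm(b-a))

print(point_on_segment((0, 0), (2, 2), (1, 1)))  # True
print(point_on_segment((0, 0), (2, 2), (3, 3)))  # False: collinear but outside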
Hope this helps!
Ampere's law of magnetic fields can help here - although it can be computationally expensive. This law says that the path integral of a magnetic field along a closed loop is proportional to the current inside the loop.
Suppose you have a contour C and a point (x0, y0). Imagine an infinite wire located at (x0, y0), perpendicular to the page, carrying some current going into the page. Using Ampere's law we can show that the magnetic field produced by the wire at a point (x, y) is inversely proportional to the distance from (x0, y0) to (x, y) and tangential to the circle centered at (x0, y0) that passes through (x, y). Therefore, if the wire is located outside the contour, the path integral along the contour is zero.
import numpy as np
import pylab as plt
# generating a mesh and values on it
delta = 0.1
x = np.arange(-3.1*2, 3.1*2, delta)
y = np.arange(-3.1*2, 3.1*2, delta)
X, Y = np.meshgrid(x, y)
Z = np.sqrt(X**2 + Y**2)
# generating the contours with some levels
levels = [1.0]
plt.figure(figsize=(10,10))
cs = plt.contour(X,Y,Z,levels=levels)
# finding vertices on a particular level
contours = cs.collections[0]
# in this example the shape of vertices_level is (161, 2)
vertices_level = contours.get_paths()[0].vertices
# converting points into two lists; one per dimension. This step can be optimized
lX, lY = list(zip(*vertices_level))
# computing Ampere's Law rhs
def AmpereLaw(x0, y0, lX, lY):
    S = 0
    for ii in range(len(lX) - 1):
        dx = lX[ii+1] - lX[ii]
        dy = lY[ii+1] - lY[ii]
        ds = (1/((lX[ii]-x0)**2 + (lY[ii]-y0)**2))*(-(lY[ii]-y0)*dx + (lX[ii]-x0)*dy)
        if -1000 < ds < 1000:  # to avoid very large numbers when the denominator is small
            S = S + ds
    return S
# we know point (0,0) is inside the contour
AmpereLaw(0,0,lX,lY)
# result: -6.271376740062852
# we know point (-2,0) is outside the contour
AmpereLaw(-2,0,lX,lY)
# result: 0.00013279920934375876
You can use this result to find points inside one contour but outside of the other.
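For example, a hedged usage sketch reusing AmpereLaw and the mesh above with two levels; the cutoff 1.0 is an arbitrary threshold between roughly 0 (outside) and roughly 2*pi (inside):
cs2 = plt.contour(X, Y, Z, levels=[1.0, 2.0])
inner = cs2.collections[0].get_paths()[0].vertices  # circle of radius 1
outer = cs2.collections[1].get_paths()[0].vertices  # circle of radius 2
x0, y0 = 1.5, 0.0  # a point lying between the two contours
in_inner = abs(AmpereLaw(x0, y0, inner[:, 0], inner[:, 1])) > 1.0
in_outer = abs(AmpereLaw(x0, y0, outer[:, 0], outer[:, 1])) > 1.0
print(in_outer and not in_inner)  # True: between the contours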
I want to make a streamplot in Basemap module, but I get a blank sphere. Please help me resolve this problem. I use matplotlib 1.3 and ordinary streamplot is working fine.
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
map = Basemap(projection='ortho',lat_0=45,lon_0=-100,resolution='l')
# draw lat/lon grid lines every 30 degrees.
map.drawmeridians(np.arange(0,360,30))
map.drawparallels(np.arange(-90,90,30))
# prepare grids
lons = np.linspace(0, 2*np.pi, 100)
lats = np.linspace(-np.pi/2, np.pi/2, 100)
lons, lats = np.meshgrid(lons, lats)
# parameters for vector field
beta = 0.0
alpha = 1.0
u = -np.cos(lats)*(beta - alpha*np.cos(2.0*lons))
v = alpha*(1.0 - np.cos(lats)**2)*np.sin(2.0*lons)
speed = np.sqrt(u*u + v*v)
# compute native map projection coordinates of lat/lon grid.
x, y = map(lons*180./np.pi, lats*180./np.pi)
# contour data over the map.
cs = map.streamplot(x, y, u, v, latlon = True, color = speed, cmap=plt.cm.autumn, linewidth=0.5)
plt.show()
I can't exactly tell you what's wrong, but from the matplotlib.streamplot manual:
matplotlib.pyplot.streamplot(x, y, u, v, density=1, linewidth=None,
    color=None, cmap=None, norm=None, arrowsize=1, arrowstyle=u'-|>',
    minlength=0.1, transform=None, zorder=1, hold=None)
Draws streamlines of a vector flow.
x, y : 1d arrays
an evenly spaced grid.
u, v : 2d arrays
x and y-velocities. Number of rows should match length of y, and the number of columns should match x.
Additionally from matplotlib.basemap.streamplot you can read that
If latlon keyword is set to True, x,y are intrepreted as longitude and latitude in degrees.
Which corresponds to the fact that x and y should be 1D arrays (lat, lon). However in your example x and y are
>>> np.shape(x)
(100, 100)
>>> np.shape(y)
(100, 100)
Then again, you call map() "to compute native map projection coordinates of the lat/lon grid", which is coincidentally the same name as your Basemap instance, map. So which of the two do you want to call? Both will return a value (or, better said, both will return an error).
Additionally, check the values you have in your u array. They are on the order of 1e-17, while other values are easily on the order of 1e+30. IIRC, streamlines are computed by solving a differential equation in which the values you send are used as parameters at the coordinates you send. It's not hard to imagine that while calculating something with these numbers a floating point round-off occurs and you suddenly start getting NaN or 0 values.
Try to scale your example better, or, if you want to pursue the solution to the end, you can try to use np.seterr to get a more detailed idea of where it fails.
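For instance, one line is enough to turn silent floating-point warnings into exceptions, so the traceback shows exactly where things go wrong:
import numpy as np
np.seterr(all='raise')  # overflow/underflow/invalid/divide now raise FloatingPointError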
Sorry I couldn't have been of a bigger help.
Are there any algorithms that will return the equation of a straight line from a set of 3D data points? I can find plenty of sources which will give the equation of a line from 2D data sets, but none in 3D.
Thanks.
If you are trying to predict one value from the other two, then you should use lstsq with the a argument as your independent variables (plus a column of 1's to estimate an intercept) and b as your dependent variable.
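A minimal sketch of that first option (the synthetic data and names here are illustrative):
import numpy as np

# synthetic data: z depends linearly on x and y, plus noise
rng = np.random.default_rng(0)
x, y = rng.random(50), rng.random(50)
z = 2.0*x - 1.0*y + 0.5 + rng.normal(scale=0.05, size=50)

# independent variables plus a column of ones to estimate the intercept
A = np.column_stack([x, y, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)
print(coef)  # approximately [2.0, -1.0, 0.5]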
If, on the other hand, you just want to get the best fitting line to the data, i.e. the line which, if you projected the data onto it, would minimize the squared distance between the real point and its projection, then what you want is the first principal component.
One way to define it is the line whose direction vector is the eigenvector of the covariance matrix corresponding to the largest eigenvalue, that passes through the mean of your data. That said, eig(cov(data)) is a really bad way to calculate it, since it does a lot of needless computation and copying and is potentially less accurate than using svd. See below:
import numpy as np
# Generate some data that lies along a line
x = np.mgrid[-2:5:120j]
y = np.mgrid[1:9:120j]
z = np.mgrid[-5:3:120j]
data = np.concatenate((x[:, np.newaxis],
y[:, np.newaxis],
z[:, np.newaxis]),
axis=1)
# Perturb with some Gaussian noise
data += np.random.normal(size=data.shape) * 0.4
# Calculate the mean of the points, i.e. the 'center' of the cloud
datamean = data.mean(axis=0)
# Do an SVD on the mean-centered data.
uu, dd, vv = np.linalg.svd(data - datamean)
# Now vv[0] contains the first principal component, i.e. the direction
# vector of the 'best fit' line in the least squares sense.
# Now generate some points along this best fit line, for plotting.
# I use -7, 7 since the spread of the data is roughly 14
# and we want it to have mean 0 (like the points we did
# the svd on). Also, it's a straight line, so we only need 2 points.
linepts = vv[0] * np.mgrid[-7:7:2j][:, np.newaxis]
# shift by the mean to get the line in the right place
linepts += datamean
# Verify that everything looks right.
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d as m3d
ax = m3d.Axes3D(plt.figure())
ax.scatter3D(*data.T)
ax.plot3D(*linepts.T)
plt.show()
Here's what it looks like:
If your data is fairly well behaved then it should be sufficient to minimize the least-squares sum of the component distances. Then you can find the linear regression of z on x, and again of z on y.
Following the documentation example:
import numpy as np
pts = np.add.accumulate(np.random.random((10,3)))
x,y,z = pts.T
# this will find the slope and intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z, rcond=None)[0]
# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z, rcond=None)[0]
# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
x = (z - c_xz)/m_xz
y = (z - c_yz)/m_yz
return x,y
#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = Axes3D(fig)
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()
If you want to minimize the actual orthogonal distances from the line to the points in 3-space (which I'm not sure is even referred to as linear regression), then I would build a function that computes the RSS and use a scipy.optimize minimization function to solve it.
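A hedged sketch of that approach (the parametrization and data are illustrative): describe the line by a point p and a direction d, and minimize the sum of squared orthogonal distances with scipy.optimize.minimize.
import numpy as np
from scipy.optimize import minimize

# illustrative noisy 3-D point cloud scattered along a line
rng = np.random.default_rng(0)
t = np.linspace(-2, 2, 100)
data = np.outer(t, [1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=(100, 3))

def rss(params):
    p, d = params[:3], params[3:]
    d = d / np.linalg.norm(d)                # unit direction of the line
    diff = data - p
    ortho = diff - np.outer(diff @ d, d)     # components orthogonal to the line
    return np.sum(ortho**2)

x0 = np.concatenate([data.mean(axis=0), [1.0, 0.0, 0.0]])
res = minimize(rss, x0)
print(res.x[:3], res.x[3:] / np.linalg.norm(res.x[3:]))  # point on line, direction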