How can a spline be created if only the points and the coefficients are known? I'm using scipy.interpolate.BSpline here, but am open to other standard packages as well. So basically I want to be able to give someone just those short arrays of coefficients for them to be able to recreate the fit to the data. See the failed red-dashed curve below.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import BSpline, LSQUnivariateSpline
x = np.linspace(0, 10, 50) # x-data
y = np.exp(-(x-5)**2/4) # y-data
# define the knot positions
t = [1, 2, 4, 5, 6, 8, 9]
# get spline fit
s1 = LSQUnivariateSpline(x, y, t)
x2 = np.linspace(0, 10, 200) # new x-grid
y2 = s1(x2) # evaluate spline on that new grid
# FAILED: try to construct BSpline using the knots and coefficients
k = s1.get_knots()
c = s1.get_coeffs()
s2 = BSpline(t,c,2)
# plotting
plt.plot(x, y, label='original')
plt.plot(t, s1(t),'o', label='knots')
plt.plot(x2, y2, '--', label='spline 1')
plt.plot(x2, s2(x2), 'r:', label='spline 2')
plt.legend()
The fine print under get_knots says:
Internally, the knot vector contains 2*k additional boundary knots.
That means that, to get a usable knot array from get_knots, one should add k copies of the left boundary knot at the beginning of the array, and k copies of the right boundary knot at the end. Here k is the degree of the spline, which is usually 3 (you asked for LSQUnivariateSpline of default degree, so that's 3). So:
kn = s1.get_knots()
kn = 3*[kn[0]] + list(kn) + 3*[kn[-1]]
c = s1.get_coeffs()
s2 = BSpline(kn, c, 3) # not "2" as in your sample; we are working with a cubic spline
Now, the spline s2 is the same as s1.
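You can check this numerically, for instance:
print(np.allclose(s1(x2), s2(x2)))  # True: identical on the dense grid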
Equivalently, kn = 4*[x[0]] + t + 4*[x[-1]] would work: your t list contains only the interior knots, so x[0] and x[-1] are added, each appearing k+1 = 4 times in total.
The mathematical reason for the repetition is that B-splines need some room to get built, due to their inductive definition which requires (k-1)-degree splines to exist around every interval in which we define the kth degree spline.
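Concretely, the Cox-de Boor recursion makes this explicit:

B[i,k](u) = (u - t[i]) / (t[i+k] - t[i]) * B[i,k-1](u) + (t[i+k+1] - u) / (t[i+k+1] - t[i+1]) * B[i+1,k-1](u)

so a degree-k basis function reaches knots up to k+1 positions away, which is exactly the room the repeated boundary knots provide.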
Here is a slightly more compact way of doing it if you don't care too much about the details of the knot positions. The tk tuple (knots, coefficients, degree) is what you are looking for. Once tk is in hand, the spline can be reproduced with the y = splev(x, tk, der=0) line.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splrep,splev
### Input data
x_arr = np.linspace(0, 10, 50) # x-data
y_arr = np.exp(-(x_arr-5)**2/4) # y-data
### Degree of the spline
order = 3
### Make the spline
tk = splrep(x_arr, y_arr, k=order) # Returns the knots and coefficients
### Evaluate the spline using the knots and coefficients on the domain x
x = np.linspace(0, 10, 1000) # new x-grid
y = splev(x, tk, der=0)
### Plot
f,ax=plt.subplots()
ax.scatter(x_arr, y_arr, label='original')
ax.plot(x,y,label='Spline')
ax.legend(fontsize=15)
plt.show()
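For completeness, the (t, c, k) tuple returned by splrep can also be unpacked straight into BSpline, so either interface reproduces the same curve:
from scipy.interpolate import BSpline
t, c, k = tk                   # knots, coefficients, degree
s = BSpline(t, c, k)
print(np.allclose(s(x), y))    # True: matches the splev evaluation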
Say we have a 2D grid that is projected onto a 3D surface, resulting in a 3D numpy array, like in the image below. What is the most efficient way to calculate a surface normal for each point of this grid?
I can give you an example with simulated data:
I showed your way with three points: from three points you can always build two difference vectors and take their cross product to get a perpendicular vector. The order of the points only affects the sign (direction) of the normal.
I took the liberty of also adding the PCA approach using predefined sklearn functions. You could create your own PCA (a good exercise to understand what happens under the hood), but the predefined one works fine. The benefit of this approach is that it is easy to increase the number of neighbors while still being able to calculate the normal vector. It is also possible to select the neighbors within a range instead of the N nearest neighbors (see the sketch after the code below).
If you need more explanation about how the code works, please let me know.
from functools import partial
import numpy as np
from sklearn.neighbors import KDTree
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Grab some test data.
X, Y, Z = axes3d.get_test_data(0.25)
X, Y, Z = map(lambda x: x.flatten(), [X, Y, Z])
plt.plot(X, Y, Z, '.')
plt.show(block=False)
data = np.array([X, Y, Z]).T
tree = KDTree(data, metric='minkowski') # minkowski with p=2 (Euclidean)
# Get indices and distances:
dist, ind = tree.query(data, k=3) #k=3 points including itself
def calc_cross(p1, p2, p3):
v1 = p2 - p1
v2 = p3 - p1
v3 = np.cross(v1, v2)
return v3 / np.linalg.norm(v3)
def PCA_unit_vector(array, pca=PCA(n_components=3)):
pca.fit(array)
eigenvalues = pca.explained_variance_
return pca.components_[ np.argmin(eigenvalues) ]
combinations = data[ind]
normals = list(map(lambda x: calc_cross(*x), combinations))
# lazy with map
normals2 = list(map(PCA_unit_vector, combinations))
## NEW ##
def calc_angle_with_xy(vectors):
'''
Assuming unit vectors!
'''
l = np.sum(vectors[:,:2]**2, axis=1) ** 0.5
return np.arctan2(vectors[:, 2], l)
dist, ind = tree.query(data, k=5) # now k=5 points including itself
combinations = data[ind]
# map with functools
pca = PCA(n_components=3)
normals3 = list(map(partial(PCA_unit_vector, pca=pca), combinations))
print( combinations[10] )
print(normals3[10])
n = np.array(normals3)
n[calc_angle_with_xy(n) < 0] *= -1
def set_axes_equal(ax):
'''Make axes of 3D plot have equal scale so that spheres appear as spheres,
cubes as cubes, etc.. This is one possible solution to Matplotlib's
ax.set_aspect('equal') and ax.axis('equal') not working for 3D.
Input
ax: a matplotlib axis, e.g., as output from plt.gca().
FROM: https://stackoverflow.com/questions/13685386/matplotlib-equal-unit-length-with-equal-aspect-ratio-z-axis-is-not-equal-to
'''
x_limits = ax.get_xlim3d()
y_limits = ax.get_ylim3d()
z_limits = ax.get_zlim3d()
x_range = abs(x_limits[1] - x_limits[0])
x_middle = np.mean(x_limits)
y_range = abs(y_limits[1] - y_limits[0])
y_middle = np.mean(y_limits)
z_range = abs(z_limits[1] - z_limits[0])
z_middle = np.mean(z_limits)
# The plot bounding box is a sphere in the sense of the infinity
# norm, hence I call half the max range the plot radius.
plot_radius = 0.5*max([x_range, y_range, z_range])
ax.set_xlim3d([x_middle - plot_radius, x_middle + plot_radius])
ax.set_ylim3d([y_middle - plot_radius, y_middle + plot_radius])
ax.set_zlim3d([z_middle - plot_radius, z_middle + plot_radius])
u, v, w = n.T
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# ax.set_aspect('equal')
# Make the grid
ax.quiver(X, Y, Z, u, v, w, length=10, normalize=True)
set_axes_equal(ax)
plt.show()
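As mentioned above, the neighborhood does not have to be the k nearest points. A sketch of the range-based variant using the same tree (the radius value here is an assumption; tune it to your grid spacing):
# query all neighbors within a fixed radius instead of k nearest
ind_r = tree.query_radius(data, r=5.0)
# PCA with n_components=3 needs at least 3 points, so skip sparse neighborhoods
normals_r = [PCA_unit_vector(data[idx]) for idx in ind_r if len(idx) >= 3]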
The surface normal for a point cloud is not well defined. One way to define it is via the surface normals of a mesh reconstructed by triangulation (which can introduce artefacts depending on your specific input). A relatively simple and fast solution is to use VTK for that, more specifically vtkSurfaceReconstructionFilter and vtkPolyDataNormals. Depending on your needs, it might be useful to apply other filters as well.
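A rough sketch of that VTK pipeline (assuming points is an (N, 3) NumPy array; note that vtkSurfaceReconstructionFilter produces an implicit distance volume, so a contour filter extracts the zero-level surface before the normals are computed):
import vtk

pts = vtk.vtkPoints()
for p in points:
    pts.InsertNextPoint(p[0], p[1], p[2])
poly = vtk.vtkPolyData()
poly.SetPoints(pts)
# Reconstruct an implicit surface from the point cloud
recon = vtk.vtkSurfaceReconstructionFilter()
recon.SetInputData(poly)
# Extract the zero-level isosurface as a mesh
contour = vtk.vtkContourFilter()
contour.SetInputConnection(recon.GetOutputPort())
contour.SetValue(0, 0.0)
# Compute per-point normals on the reconstructed mesh
normals = vtk.vtkPolyDataNormals()
normals.SetInputConnection(contour.GetOutputPort())
normals.ComputePointNormalsOn()
normals.Update()
mesh_with_normals = normals.GetOutput()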
Goal
I would like to compute the 3D volume integral of a numeric scalar field.
Code
For this post, I will use an example whose integral can be computed exactly. I have therefore chosen the following function:
f(x, y, z) = x*y*z
whose integral over the unit cube [0, 1]^3 is exactly 1/8.
In Python, I define the function, and a set of points in 3D, and then generate the discrete values at these points:
import numpy as np
# Make data.
def function(x, y, z):
return x**y**z
N = 5
grid = np.meshgrid(
np.linspace(0, 1, N),
np.linspace(0, 1, N),
np.linspace(0, 1, N)
)
points = np.vstack(list(map(np.ravel, grid))).T
x = points[:, 0]
y = points[:, 1]
z = points[:, 2]
values = [function(points[i, 0], points[i, 1], points[i, 2])
for i in range(len(points))]
Question
How can I find the integral, if I don't know the underlying function, i.e. if I only have the coordinates (x, y, z) and the values?
A nice way to go about this would be using scipy's tplquad integration. However, to use that, we need a function, not a point cloud.
An easy way around that is to use an interpolator to get a function approximating our point cloud; we can for example use scipy's RegularGridInterpolator if the data is on a regular grid:
import numpy as np
from scipy import integrate
from scipy.interpolate import RegularGridInterpolator
# Make data.
def function(x,y,z):
return x*y*z
N = 5
xmin, xmax = 0, 1
ymin, ymax = 0, 1
zmin, zmax = 0, 1
x = np.linspace(xmin, xmax, N)
y = np.linspace(ymin, ymax, N)
z = np.linspace(zmin, zmax, N)
values = function(*np.meshgrid(x,y,z, indexing='ij'))
# Interpolate:
function_interpolated = RegularGridInterpolator((x, y, z), values)
# tplquad integrates func(z, y, x)
f = lambda z, y, x: function_interpolated([x, y, z])[0]  # interpolator expects (x, y, z) order
result, error = integrate.tplquad(f, xmin, xmax, lambda _: ymin, lambda _: ymax, lambda *_: zmin, lambda *_: zmax)
In the example above, we get result = 0.12499999999999999, close enough to the exact value of 1/8.
The easiest way to achieve what you are looking for is probably scipy's nquad integration function. Here is your example:
from scipy import integrate
# Make data.
def func(x,y,z):
return x**y**z
ranges = [[0,1], [0,1], [0,1]]
result, error = integrate.nquad(func, ranges)
Are you aware that the function you created is different from the one that you show in the image? The one you created is an exponential (x**y**z), while the one you are showing is just a multiplication. If you want to represent the function in the image, use
def func(x,y,z):
return x*y*z
Hope this answers your question, otherwise just write a comment!
Edit:
Misread your post. If you only have the results, and they are not regularly spaced, you would have to figure out some form of interpolation (e.g. linear) and a lookup table. If you do not know how to create that, let me know. The rest of the answer above can still be used if you define func to return interpolated values from your original data.
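A minimal sketch of that idea for scattered data, using scipy's LinearNDInterpolator (assuming points and values as defined in the question; this only works reliably inside the convex hull of the data):
import numpy as np
from scipy import integrate
from scipy.interpolate import LinearNDInterpolator

interp = LinearNDInterpolator(points, values)

def func(x, y, z):
    # interpolated lookup instead of the analytic function
    return interp([[x, y, z]]).item()

ranges = [[0, 1], [0, 1], [0, 1]]
result, error = integrate.nquad(func, ranges)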
The first answer explains the principal approach to handling this nicely. I just wanted to illustrate an alternative way by showing the power of the sklearn package and machine-learning regression.
Doing the meshgrid in 3D gives a very large numpy array,
import numpy as np
N = 5
xmin, xmax = 0, 1
ymin, ymax = 0, 1
zmin, zmax = 0, 1
x = np.linspace(xmin, xmax, N)
y = np.linspace(ymin, ymax, N)
z = np.linspace(zmin, zmax, N)
grid = np.array(np.meshgrid(x,y,z, indexing='ij'))
grid.shape   # (3, 5, 5, 5): 3*5*5*5 = 375 numbers
This is visually not very intuitive with 375 numbers, and there are different possible indexing conventions ('ij' or 'xy'). Using regression we can get the same result with few input points (15-20).
# building random combinations from (x,y,z)
X = np.random.choice(x, 20)[:,None]
Y = np.random.choice(y, 20)[:,None]
Z = np.random.choice(z, 20)[:,None]
xyz = np.concatenate((X,Y,Z), axis = 1)
data = np.multiply.reduce(xyz, axis = 1)
So the input (grid) is just a 2D numpy array,
xyz.shape
(20, 3)
With the corresponding data,
data.shape
(20,)
Now the regression function and integration,
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from scipy import integrate
pipe = Pipeline([('polynomial', PolynomialFeatures(degree=3)), ('model', LinearRegression())])
pipe.fit(xyz, data)
def func(x, y, z):
    return pipe.predict([[x, y, z]])[0]  # nquad expects a scalar
ranges = [[0,1], [0,1], [0,1]]
result, error = integrate.nquad(func, ranges)
print(result)
0.1257
This approach is useful with a limited number of points.
Based on your requirements, it sounds like the most appropriate technique would be Monte Carlo integration:
import numpy as np
import scipy.spatial

# Step 0 - start with some empirical data
observed_points = np.random.uniform(0, 1, size=(10000, 3))
unknown_fn = lambda x: np.prod(x) # just used to generate fake values
observed_values = np.apply_along_axis(unknown_fn, 1, observed_points)
K = 1000000
# Step 1 - assume that f(x,y,z) can be approximated by an interpolation
# of the data we have (you could get really fancy with the
# selection of interpolation method - we'll stick with straight lines here)
from scipy.interpolate import LinearNDInterpolator
f_interpolate = LinearNDInterpolator(observed_points, observed_values)
# Step 2 randomly sample from within convex hull of observed data
# Step 2a - Uniformly sample from bounding 3D-box of data
lower_bounds = observed_points.min(axis=0)
upper_bounds = observed_points.max(axis=0)
sampled_points = np.random.uniform(lower_bounds, upper_bounds,size=(K, 3))
# Step 2b - Reject points outside of convex hull...
# Luckily, we get a np.nan from LinearNDInterpolator in this case
sampled_values = f_interpolate(sampled_points)
rejected_idxs = np.argwhere(np.isnan(sampled_values))
# Step 2c - Remember accepted values of estimated f(x_i, y_i, z_i)
final_sampled_values = np.delete(sampled_values, rejected_idxs, axis=0)
# Step 3 - Calculate estimate of volume of observed data domain
# Since we sampled uniformly from the convex hull of data domain,
# each point was selected with P(x,y,z)= 1 / Volume of convex hull
volume = scipy.spatial.ConvexHull(observed_points).volume
# Step 4 - Multiply estimated volume of domain by average sampled value
I_hat = volume * final_sampled_values.mean()
print(I_hat)
For a derivation of why this works see this: https://cs.dartmouth.edu/wjarosz/publications/dissertation/appendixA.pdf
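In short, the estimator is I ≈ V * mean(f(p_i)) over the accepted samples: points drawn uniformly from the hull have density 1/V, so the hull volume times the average sampled value gives an unbiased estimate of the integral.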
So I am trying to plot the nullclines of a system of ODEs, but I can't seem to plot them in the correct way. When I plot them, I manage to plot them against time (t vs x and t vs y) but not as x vs y. I'm not really sure how to explain it, and I think it would be better to just show it. I am trying to replicate this. The equations and parameters are given; however, this was done in a program called XPP (I'll post these at the bottom), and there are some parameters that I don't understand.
My entire code is:
import numpy as np
from scipy import integrate
import matplotlib.pyplot as plt
# define system in terms of a Numpy array
def Sys(X, t=0):
# here X[0] = x and X[1] = y
# protein concentration is represented by y, and mRNA concentration by x
return np.array([ (k1*S*Kd**p)/(Kd**p + X[1]**p) - kdx*X[0], ksy*X[0] - (k2*ET*X[1])/(Km + X[1])])
#variables
k1=.1
S=1
Kd=1
kdx=.1
p=2
ksy=1
k2=1
ET=1
Km=1
# generate 100 linearly spaced time points
t = np.linspace(0, 50,100)
# initial values
Sys0 = np.array([1, 0])
#Solves the ODE
X, infodict = integrate.odeint(Sys, Sys0, t, full_output = 1, mxstep = 50000)
#assigns appropriate equations to x and y
x,y = X.T
# plots the graph
fig = plt.figure(figsize=(15,5))
fig.subplots_adjust(wspace = 0.5, hspace = 0.3)
ax1 = fig.add_subplot(1,2,1)
ax1.plot(x, color="blue")
ax1.plot(y, color = 'red')
ax1.set_xlabel("Protein concentration")
ax1.set_ylabel("mRNA concentration")
ax1.set_title("Phase space")
ax1.grid()
The given equations and parameters are:
model for a simple negative feedback loop
protein (y) inhibits the synthesis of its mRNA (x)
dx/dt = k1*S*Kd^p / (Kd^p + y^p) - kdx*x
dy/dt = ksy*x - k2*ET*y / (Km + y)
p k1=0.1, S=1, Kd=1, kdx=0.1, p=2
p ksy=1, k2=1, ET=1, Km=1
# XP=y, YP=x, TOTAL=100, METH=stiff, XLO=0, XHI=4, YLO=0, YHI=1.05 (I don't exactly understand what is going on here)
Again, this uses a program called XPP or WINPP.
Any help with this would be appreciated. The original paper I am trying to replicate this from is: Design principles of biochemical oscillators by Bela Novak and John J. Tyson.
Let's say I have a path in the 2D plane given by a parametrization, for example the Archimedean spiral:
x(φ) = a*φ*cos(φ), y(φ) = a*φ*sin(φ)
I'm looking for a way to discretize this with a numpy array.
The problem is that if I use
a = 1
phi = np.arange(0, 10*np.pi, 0.1)
x = a*phi*np.cos(phi)
y = a*phi*np.sin(phi)
plt.plot(x,y, "ro")
I get a nice curve, but the points are not equally spaced: for
growing φ the distance between two consecutive points gets larger.
I'm looking for a nice and, if possible, fast way to do this.
It might be possible to get the exact analytical formula for your simple spiral, but I am not in the mood to do that and this might not be possible in a more general case. Instead, here is a numerical solution:
import matplotlib.pyplot as plt
import numpy as np
a = 1
phi = np.arange(0, 10*np.pi, 0.1)
x = a*phi*np.cos(phi)
y = a*phi*np.sin(phi)
dr = (np.diff(x)**2 + np.diff(y)**2)**.5 # segment lengths
r = np.zeros_like(x)
r[1:] = np.cumsum(dr) # integrate path
r_int = np.linspace(0, r.max(), 200) # regular spaced path
x_int = np.interp(r_int, r, x) # interpolate
y_int = np.interp(r_int, r, y)
plt.subplot(1,2,1)
plt.plot(x, y, 'o-')
plt.title('Original')
plt.axis([-32,32,-32,32])
plt.subplot(1,2,2)
plt.plot(x_int, y_int, 'o-')
plt.title('Interpolated')
plt.axis([-32,32,-32,32])
plt.show()
It calculates the lengths of all the individual segments, integrates the total path with cumsum and finally interpolates to get a regularly spaced path. You might have to play with your step size in phi: if it is too large, you will see that the spiral is not a smooth curve but instead built from straight line segments. Result:
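For the record, this particular spiral does admit a closed-form arc length, s(φ) = (a/2)*(φ*sqrt(1+φ^2) + asinh(φ)), so equal spacing can also be obtained by inverting that formula numerically. A sketch (assuming scipy is available):
import numpy as np
from scipy.optimize import brentq

a = 1
phi_max = 10*np.pi
arclen = lambda p: 0.5*a*(p*np.hypot(1.0, p) + np.arcsinh(p))  # s(phi)
s_targets = np.linspace(0, arclen(phi_max), 200)               # equal arc-length steps
phis = np.array([brentq(lambda p: arclen(p) - s, 0, phi_max) for s in s_targets])
x_eq = a*phis*np.cos(phis)
y_eq = a*phis*np.sin(phis)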
I have some one-dimensional data and fit it with a spline. Then I want to find the inflection points (ignoring saddle points) in it. Now I am searching for the extrema of its first derivative by using scipy.signal.argrelmin (and argrelmax) on a lot of values generated by splev.
import scipy.interpolate
import scipy.optimize
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt
import operator
y = [-1, 5, 6, 4, 2, 5, 8, 5, 1]
x = np.arange(0, len(y))
tck = scipy.interpolate.splrep(x, y, s=0)
print('roots', scipy.interpolate.sproot(tck))
# output:
# [0.11381478]
xnew = np.arange(0, len(y), 0.01)
ynew = scipy.interpolate.splev(xnew, tck, der=0)
ynew_deriv = scipy.interpolate.splev(xnew, tck, der=1)
min_idxs = scipy.signal.argrelmin(ynew_deriv)
max_idxs = scipy.signal.argrelmax(ynew_deriv)
mins = list(zip(xnew[min_idxs].tolist(), ynew_deriv[min_idxs].tolist()))
maxs = list(zip(xnew[max_idxs].tolist(), ynew_deriv[max_idxs].tolist()))
inflection_points = sorted(mins + maxs, key=operator.itemgetter(0))
print('inflection_points', inflection_points)
# output:
# [(3.13, -2.9822449358974357),
# (5.03, 4.3817785256410255),
# (7.13, -4.867132628205128)]
plt.plot(x, y, 'o',
         xnew, ynew, '-',
         xnew, ynew_deriv, '-')
plt.legend(['data', 'Cubic Spline', '1st deriv'])
plt.show()
But this feels terribly wrong. I guess there is a possibility to find what I am looking for without generating so many values. Something like sproot, but applicable to the second derivative, perhaps?
The derivative of a B-spline is also a B-spline. You can therefore first fit a spline to your data, then use the derivative formula to construct the coefficients of the derivative spline, and finally use the spline root finding to get the roots of the derivative spline. These are then the maxima/minima of the original curve.
Here is code to do it: https://gist.github.com/pv/5504366
The relevant computation of the coefficients is:
t, c, k = scipys_spline_representation
# Compute the denominator in the differentiation formula.
dt = t[k+1:-1] - t[1:-k-1]
# Compute the new coefficients
d = (c[1:-1-k] - c[:-2-k]) * k / dt
# Adjust knots
t2 = t[1:-1]
# Pad coefficient array to same size as knots (FITPACK convention)
d = np.r_[d, [0]*k]
# Done, a new spline
new_spline_repr = t2, d, k-1
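For the question's actual goal (roots of the second derivative), note that FITPACK's root finding only supports cubic splines, so one option is to fit a quintic spline and differentiate twice. A sketch with a recent SciPy, reusing x and y from the question:
from scipy.interpolate import UnivariateSpline

spl = UnivariateSpline(x, y, k=5, s=0)   # quintic interpolating spline
infl_x = spl.derivative(2).roots()       # 2nd derivative is cubic, so roots() works
print(infl_x)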