Python: Keep points in spline interpolation

I wrote some code that performs a spline interpolation:
import numpy as np
import scipy as sp
import scipy.interpolate  # makes sp.interpolate available
x1 = [ 0., 13.99576991, 27.99153981, 41.98730972, 55.98307963, 69.97884954, 83.97461944, 97.97038935, 111.9661593, 125.9619292, 139.9576991, 153.953469 ]
y1 = [ 1., 0.88675318, 0.67899118, 0.50012243, 0.35737022, 0.27081293, 0.18486778, 0.11043095, 0.08582272, 0.04946131, 0.04285015, 0.02901567]
x = np.array(x1)
y = np.array(y1)
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
new_x = np.linspace(x.min(), x.max(), new_length)
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
But the generated dataset (new_x, new_y) no longer contains the original points; only the first and last values are kept. I would like to keep the original points.

Right, linspace won't generate any of the values in x except the ones you pass to it (x.min() and x.max()).
I don't have a great snappy answer, but here is one way to do it:
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
interpolated_x = np.linspace(x.min(), x.max(), new_length - len(x) + 2)
new_x = np.sort(np.append(interpolated_x, x[1:-1])) # include the original points
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
This code uses:
np.linspace to create as many extra points as we need
np.append to combine the array of extra points with the original points from x
np.sort to put the combined array in order
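If one of the evenly spaced samples happens to coincide with an original point, appending and sorting leaves a duplicate in new_x. A minimal sketch of a variant, assuming a result with at most new_length points is acceptable: np.union1d merges both arrays, sorts them, and drops duplicates in one step. The variable name extra_x is introduced here for illustration only.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.array([0., 13.99576991, 27.99153981, 41.98730972, 55.98307963,
              69.97884954, 83.97461944, 97.97038935, 111.9661593,
              125.9619292, 139.9576991, 153.953469])
y = np.array([1., 0.88675318, 0.67899118, 0.50012243, 0.35737022,
              0.27081293, 0.18486778, 0.11043095, 0.08582272,
              0.04946131, 0.04285015, 0.02901567])

new_length = 50
# Evenly spaced samples; the two end points are shared with x
extra_x = np.linspace(x.min(), x.max(), new_length - len(x) + 2)
# Sorted union without duplicates, so every original x is present exactly once
new_x = np.union1d(extra_x, x)
new_y = interp1d(x, y, kind='cubic')(new_x)

# Positions of the original points inside the new grid
idx = np.searchsorted(new_x, x)
print(np.allclose(new_y[idx], y))
```

Since an interpolating spline passes through its nodes, new_y at the positions idx reproduces the original y values up to rounding.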

If your data is uniform, this is another way:
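import numpy as np
def interpolate_add(x_add, num):
    """Insert num evenly spaced points between each pair of neighbours in x_add."""
    x_add_ls = []
    for i in range(x_add.shape[0] - 1):
        # num + 2 samples per interval; drop the last one so it is not duplicated
        x_add_ls += np.linspace(x_add[i], x_add[i + 1], num + 2)[:-1].tolist()
    x_add_ls.append(x_add[-1])  # use the argument, not the global x
    return np.array(x_add_ls)
x = np.linspace(1, 5, 5)
print(x)
print(interpolate_add(x, 1))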
it prints:
[1. 2. 3. 4. 5.]
[1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. ]
In the code, interpolate_add takes two parameters:
x_add is your data (a numpy.array of shape (N,)).
num is the number of points to insert between each pair of adjacent original points.
For example, if your data is array([1, 3]) and num is 1, the result is array([1., 2., 3.]).

Related

Keep receiving Too many indices for array for interpolation

interp - Program to interpolate data using Lagrange
I am not able to complete the for-loop in the coding sequence below. I don't see anything wrong with it, since I choose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values.
import numpy as np
def intrpf(xi, x, y):
    """Function to interpolate between data points
    using Lagrange polynomial (quadratic)
    Inputs
      x    Vector of x coordinates of data points (3 values)
      y    Vector of y coordinates of data points (3 values)
      xi   The x value where interpolation is computed
    Output
      yi   The interpolation polynomial evaluated at xi
    """
    #* Calculate yi = p(xi) using Lagrange polynomial
    yi = ( (xi-x[1])*(xi-x[2])/((x[0]-x[1])*(x[0]-x[2])) * y[0]
         + (xi-x[0])*(xi-x[2])/((x[1]-x[0])*(x[1]-x[2])) * y[1]
         + (xi-x[0])*(xi-x[1])/((x[2]-x[0])*(x[2]-x[1])) * y[2] )
    return yi
#* Initialize the data points to be fit by quadratic
x = np.empty(3)
y = np.empty(3)
print ('Enter data points as x,y pairs (e.g., [1, 2]')
for i in range(3):
    temp = np.array(input('Enter data point: '))
    x[i] = temp[0]
    y[i] = temp[1]
#* Establish the range of interpolation (from x_min to x_max)
xr = np.array(input('Enter range of x values as [x_min, x_max]: '))
I'm getting stuck on this part, where it seems properly set up, but "Too many indices for array" appears on xi[i] within the for loop.
#* Find yi for the desired interpolation values xi using
#  the function intrpf
nplot = 100   # Number of points for interpolation curve
xi = np.empty(nplot)
yi = np.empty(nplot)
for i in range(nplot):
    xi[i] = xr[0] + (xr[1]-xr[0])* i/float(nplot)
    yi[i] = intrpf(xi[i], x, y)   # Use intrpf function to interpolate
From the docs of np.array:
Parameters:
object: array_like
An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
This means np.array expects something like a list that it can convert, whereas input returns a string. So what Python ends up trying to do here is something like
np.array('[1, 2]')
While it might be tempting to do something like
np.array(eval(input()))
you should never do this because it is unsafe as it allows the user to execute any kind of code in your program. If you really need that kind of input I would suggest something like
np.array(list(map(int, input('Enter data point: ')
.replace('[','')
.replace(']','')
.split(','))))
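If the bracketed input style should be kept, ast.literal_eval from the standard library is a safer alternative to eval: it only accepts Python literals (numbers, lists, tuples and so on) and raises an error for anything else. A minimal sketch; the helper name parse_pair is hypothetical, and it takes the string as an argument so it can be tried without typing at a prompt.

```python
import ast
import numpy as np

def parse_pair(text):
    """Parse a string such as '[1, 2]' or '1, 2' into a float array of length 2."""
    value = ast.literal_eval(text.strip())   # literals only, no code execution
    arr = np.asarray(value, dtype=float)
    if arr.shape != (2,):
        raise ValueError("expected exactly two numbers, e.g. [1, 2]")
    return arr

print(parse_pair("[1, 2]"))
print(parse_pair("3.5, 4"))
```

In the original script this would be used as temp = parse_pair(input('Enter data point: ')).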
The error occurs with your data input lines:
Enter data points as x,y pairs (e.g., [1, 2]
Enter data point: [1,2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-6-8d648ad8c9e4> in <module>
22 for i in range(3):
23 temp = np.array(input('Enter data point: '))
---> 24 x[i] = temp[0]
25 y[i] = temp[1]
26
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
The code doesn't even get to " I choose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values." part.
When asking for help, give full and accurate information about the error.
If I change the input lines to:
...: x = np.empty(3)
...: y = np.empty(3)
...: print ('Enter data points as x,y pairs')
...: for i in range(3):
...: temp = input('Enter data point: ').split()
...: x[i] = temp[0]
...: y[i] = temp[1]
...:
...: #* Establish the range of interpolation (from x_min to x_max)
...: xr = np.array(input('Enter range of x values as x_min, x_max: ').split(),float)
Enter data points as x,y pairs
Enter data point: 1 2
Enter data point: 3 4
Enter data point: 5 6
Enter range of x values as x_min, x_max: 0 4
In [9]: x
Out[9]: array([1., 3., 5.])
In [10]: y
Out[10]: array([2., 4., 6.])
In [11]: xr
Out[11]: array([0., 4.])
Getting array values via user input is not ideal, but this at least works. input (in Py3) does not evaluate the inputs; it just returns a string. I split it (on default space), and then assign the values to an array. x is defined as a float array, so the x[i]=temp[0] takes care of converting the string to float. Similarly the xr line makes a float array from the string inputs. This input style is not very robust; I could easily raise an error with wrong input.
===
The rest of the code runs with this input:
In [12]: nplot = 100 # Number of points for interpolation curve
...: xi = np.empty(nplot)
...: yi = np.empty(nplot)
...: for i in range(nplot) :
...: xi[i] = xr[0] + (xr[1]-xr[0])* i/float(nplot)
...: yi[i] = intrpf(xi[i], x, y) # Use intrpf function to interpolate
...:
In [13]: xi
Out[13]:
array([0. , 0.04, 0.08, 0.12, 0.16, 0.2 , 0.24, 0.28, 0.32, 0.36, 0.4 ,
0.44, 0.48, 0.52, 0.56, 0.6 , 0.64, 0.68, 0.72, 0.76, 0.8 , 0.84,
...
3.52, 3.56, 3.6 , 3.64, 3.68, 3.72, 3.76, 3.8 , 3.84, 3.88, 3.92,
3.96])
In [14]: yi
Out[14]:
array([1. , 1.04, 1.08, 1.12, 1.16, 1.2 , 1.24, 1.28, 1.32, 1.36, 1.4 ,
1.44, 1.48, 1.52, 1.56, 1.6 , 1.64, 1.68, 1.72, 1.76, 1.8 , 1.84,
....
4.52, 4.56, 4.6 , 4.64, 4.68, 4.72, 4.76, 4.8 , 4.84, 4.88, 4.92,
4.96])
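Because intrpf uses only elementwise arithmetic on xi, the explicit loop is not strictly needed. The following sketch assumes the same three data points and range as in the session above; np.linspace with endpoint=False reproduces the grid xr[0] + (xr[1]-xr[0])*i/nplot used in the loop.

```python
import numpy as np

def intrpf(xi, x, y):
    """Quadratic Lagrange polynomial through three points, evaluated at xi."""
    return ((xi - x[1]) * (xi - x[2]) / ((x[0] - x[1]) * (x[0] - x[2])) * y[0]
            + (xi - x[0]) * (xi - x[2]) / ((x[1] - x[0]) * (x[1] - x[2])) * y[1]
            + (xi - x[0]) * (xi - x[1]) / ((x[2] - x[0]) * (x[2] - x[1])) * y[2])

x = np.array([1.0, 3.0, 5.0])    # same data points as entered above
y = np.array([2.0, 4.0, 6.0])
xr = np.array([0.0, 4.0])        # interpolation range

nplot = 100
# nplot points starting at xr[0], with the right end point excluded
xi = np.linspace(xr[0], xr[1], nplot, endpoint=False)
yi = intrpf(xi, x, y)            # evaluates all points in one call

print(xi[:3], yi[:3])
```

The three sample points lie on the line y = x + 1, so the interpolated values follow that line, which matches the yi output shown above.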

The components of numpy.gradient of a symmetric function are different

The gradient of a symmetric function should have same derivatives in all dimensions.
numpy.gradient is providing different components.
Here is a MWE.
import numpy as np
x = (-1,0,1)
y = (-1,0,1)
X,Y = np.meshgrid(x,y)
f = 1/(X*X + Y*Y +1.0)
print(f)
>> [[0.33333333 0.5 0.33333333]
[0.5 1. 0.5 ]
[0.33333333 0.5 0.33333333]]
This has same values in both dimensions.
But np.gradient(f) gives
[array([[ 0.16666667, 0.5 , 0.16666667],
[ 0. , 0. , 0. ],
[-0.16666667, -0.5 , -0.16666667]]),
array([[ 0.16666667, 0. , -0.16666667],
[ 0.5 , 0. , -0.5 ],
[ 0.16666667, 0. , -0.16666667]])]
The two components of the gradient are different.
Why is that?
What am I missing in the interpretation of the output?
Let's walk through this step by step. So first, as correctly mentioned by meowgoesthedog
numpy calculates derivatives in a direction.
Numpy's way of calculating gradients
It's important to note that np.gradient uses central differences at interior points, meaning (for simplicity we look at just one direction, with unit spacing):
grad_f[i] = (f[i+1] - f[i])/2 + (f[i] - f[i-1])/2 = (f[i+1] - f[i-1])/2
At the boundaries numpy uses one-sided differences instead:
grad_f[min] = f[min+1] - f[min]
grad_f[max] = f[max] - f[max-1]
In your case the boundaries are at indices 0 and 2.
2D case
If you use more than one dimension, we need to take the direction of the derivative into account. np.gradient calculates the derivatives along every axis. Let's reproduce your results:
Let's move alongside the columns, so we calculate with row vectors
f[1,:] - f[0,:]
Output
array([0.16666667, 0.5 , 0.16666667])
which is exactly the first row of the first element of your gradient.
The second row is calculated with central differences, therefore:
(f[2,:]-f[1,:])/2 + (f[1,:]-f[0,:])/2
Output
array([0., 0., 0.])
The third row:
f[2,:] - f[1,:]
Output
array([-0.16666667, -0.5 , -0.16666667])
For the other direction, just exchange the : and the numbers, and keep in mind that you are now calculating column vectors. For a symmetric function like yours, this leads directly to the transposed derivative.
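The transpose relationship can be checked directly. A small sketch using the same 3x3 grid as in the question; the names d_axis0 and d_axis1 are chosen here for illustration.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])
X, Y = np.meshgrid(x, x)
f = 1 / (X * X + Y * Y + 1.0)

# np.gradient returns one array per axis: differences along axis 0, then axis 1
d_axis0, d_axis1 = np.gradient(f)

# f equals its own transpose, so the two components are transposes of each other
print(np.allclose(f, f.T))
print(np.allclose(d_axis0, d_axis1.T))
```

So the two components carry the same numbers; they are simply laid out along different axes.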
3D case
x_ = (-1,0,4)
y_ = (-3,0,1)
z_ = (-1,0,12)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
f = 1/(x**2 + y**2 + z**2 + 1)
np.gradient(f)[1]
Output
array([[[ *2.50000000e-01, 4.09090909e-01, 3.97702165e-04*],
[ 8.33333333e-02, 1.21212121e-01, 1.75554093e-04],
[-8.33333333e-02, -1.66666667e-01, -4.65939801e-05]],
[[ **4.09090909e-01, 9.00000000e-01, 4.03045231e-04**],
[ 1.21212121e-01, 2.00000000e-01, 1.77904287e-04],
[-1.66666667e-01, -5.00000000e-01, -4.72366556e-05]],
[[ ***1.85185185e-02, 2.03619910e-02, 3.28827183e-04***],
[ 7.79727096e-03, 8.54700855e-03, 1.45243282e-04],
[-2.92397661e-03, -3.26797386e-03, -3.83406181e-05]]])
The gradient which is given here is calculated along rows (0 would be along matrices, 1 along rows, 2 along columns).
This can be calculated by
(f[:,1,:] - f[:,0,:])
Output
array([[*2.50000000e-01, 4.09090909e-01, 3.97702165e-04*],
[**4.09090909e-01, 9.00000000e-01, 4.03045231e-04**],
[***1.85185185e-02, 2.03619910e-02, 3.28827183e-04***]])
I added the asterisks so that it becomes clear where to find the corresponding row vectors. Since we calculated the gradient in direction 1, we have to look for row vectors.
If one wants to reproduce the whole gradient, this is done by
np.stack(((f[:,1,:] - f[:,0,:]), (f[:,2,:] - f[:,0,:])/2, (f[:,2,:] - f[:,1,:])), axis=1)
n-dim case
We can generalize what we have learned here to calculate gradients of arbitrary arrays along a given axis.
def grad_along_axis(f, ax):
    f_grad_ind = []
    for i in range(f.shape[ax]):
        if i == 0:
            f_grad_ind.append(np.take(f, i+1, ax) - np.take(f, i, ax))
        elif i == f.shape[ax] - 1:
            f_grad_ind.append(np.take(f, i, ax) - np.take(f, i-1, ax))
        else:
            f_grad_ind.append((np.take(f, i+1, ax) - np.take(f, i-1, ax))/2)
    f_grad = np.stack(f_grad_ind, axis=ax)
    return f_grad
where
np.take(f, i, ax) = f[:,...,i,...,:]
and i is at index ax.
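The same rule can be written without a Python loop and compared against np.gradient. This is only a sketch under the assumptions of unit spacing and at least two samples along the chosen axis; the name grad_along_axis_vec and the random test array are introduced here for illustration.

```python
import numpy as np

def grad_along_axis_vec(f, ax):
    """One-sided differences at both ends, central differences in between."""
    g = np.moveaxis(np.asarray(f, dtype=float), ax, 0)   # bring axis ax to the front
    out = np.empty_like(g)
    out[0] = g[1] - g[0]                 # forward difference at the first index
    out[-1] = g[-1] - g[-2]              # backward difference at the last index
    out[1:-1] = (g[2:] - g[:-2]) / 2     # central differences at interior indices
    return np.moveaxis(out, 0, ax)       # restore the original axis order

rng = np.random.default_rng(0)           # arbitrary seed, for repeatability only
f = rng.random((3, 4, 5))
for ax in range(f.ndim):
    print(ax, np.allclose(grad_along_axis_vec(f, ax), np.gradient(f, axis=ax)))
```

With its default settings (unit spacing, first-order edges), np.gradient applies exactly these formulas, so the comparison holds for every axis.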
Usually gradients and Jacobians are operators on functions.
If you need the gradient of f = 1/(X*X + Y*Y +1.0), then you have to compute it symbolically, or estimate it with numerical methods that use that function.
I do not know what a gradient of a constant 3d array is. numpy.gradient is a one dimensional concept.
Python has the sympy package that can automatically compute jacobians symbolically.
If by second order derivative of a scalar 3d field you mean a laplacian then you can estimate that with a standard 4 point stencil.

Getting coefficients of a cubic spline from scipy.interpolate.splrep

I am doing a cubic spline interpolation using scipy.interpolate.splrep as following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 10)
y = np.sin(x)
tck = scipy.interpolate.splrep(x, y, task=0, s=0)
F = scipy.interpolate.PPoly.from_spline(tck)
I print the breakpoints F.x and the coefficients F.c:
print(F.x)
array([ 0. , 0. , 0. , 0. ,
2.22222222, 3.33333333, 4.44444444, 5.55555556,
6.66666667, 7.77777778, 10. , 10. ,
10. , 10. ])
print(F.c)
array([[ -1.82100357e-02, -1.82100357e-02, -1.82100357e-02,
-1.82100357e-02, 1.72952212e-01, 1.26008293e-01,
-4.93704109e-02, -1.71230879e-01, -1.08680287e-01,
1.00658224e-01, 1.00658224e-01, 1.00658224e-01,
1.00658224e-01],
[ -3.43151441e-01, -3.43151441e-01, -3.43151441e-01,
-3.43151441e-01, -4.64551679e-01, 1.11955696e-01,
5.31983340e-01, 3.67415303e-01, -2.03354294e-01,
-5.65621916e-01, 1.05432909e-01, 1.05432909e-01,
1.05432909e-01],
[ 1.21033389e+00, 1.21033389e+00, 1.21033389e+00,
1.21033389e+00, -5.84561936e-01, -9.76335250e-01,
-2.60847433e-01, 7.38484392e-01, 9.20774403e-01,
6.63563923e-02, -9.56285846e-01, -9.56285846e-01,
-9.56285846e-01],
[ -4.94881722e-18, -4.94881722e-18, -4.94881722e-18,
-4.94881722e-18, 7.95220057e-01, -1.90567963e-01,
-9.64317117e-01, -6.65101515e-01, 3.74151231e-01,
9.97097891e-01, -5.44021111e-01, -5.44021111e-01,
-5.44021111e-01]])
So I had supplied the x array as :
array([ 0. , 1.11111111, 2.22222222, 3.33333333,
4.44444444, 5.55555556, 6.66666667, 7.77777778,
8.88888889, 10. ])
Q.1: F.x (the knots) is not the same as the original x array and contains duplicate values (possibly to force the first derivative to zero?). Also, some values in x (1.11111111, 8.88888889) are missing from F.x. Any ideas?
Q.2: The shape of F.c is (4, 13). I understand that the 4 comes from the fact that it is a cubic spline fit, but I do not know how to select the coefficients for each of the 9 sections that I want (from x = 0 to x = 1.11111, x = 1.11111 to x = 2.22222, and so on). Any help with extracting the coefficients for the different segments would be appreciated.
If you want to have the knots in specific locations along the curves you need to use the argument task=-1 of splrep and give an array of interior knots as the t argument.
The knots in t must satisfy the following condition:
If provided, knots t must satisfy the Schoenberg-Whitney conditions, i.e., there must be a subset of data points x[j] such that t[j] < x[j] < t[j+k+1], for j=0, 1,...,n-k-2.
See the scipy.interpolate.splrep documentation for details.
Then you should get F.c of the following size (4, <length of t> + 2*(k+1)-1) corresponding to the consecutive intervals along the curve (k+1 knots are added at either end of the curve by splrep).
Try the following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 20)
y = np.sin(x)
t = np.linspace(0, 10, 10)
tck = scipy.interpolate.splrep(x, y, t=t[1:-1])
F = scipy.interpolate.PPoly.from_spline(tck)
print(F.x)
print(F.c)
# Accessing coeffs of nth segment: index = k + n - 1
# Eg. for second segment:
print(F.c[:,4])
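If the goal is simply one set of cubic coefficients per interval between consecutive data points, a different scipy class may be more convenient than splrep: scipy.interpolate.CubicSpline stores its coefficients as an array of shape (4, n-1), one column per data interval. This is a swapped-in technique, not the method from the question; the sketch below uses the same 10 sample points and the default not-a-knot boundary condition.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 10, 10)
y = np.sin(x)

cs = CubicSpline(x, y)       # default bc_type='not-a-knot'
print(cs.c.shape)            # one column of 4 coefficients per interval

# On the interval [x[i], x[i+1]] the spline is
#   a*(t - x[i])**3 + b*(t - x[i])**2 + c*(t - x[i]) + d
i = 1                        # second segment
a, b, c, d = cs.c[:, i]
t = 0.5 * (x[i] + x[i + 1])  # midpoint of that segment
dx = t - x[i]
manual = a * dx**3 + b * dx**2 + c * dx + d
print(np.isclose(manual, cs(t)))
```

The last row cs.c[3] holds the constant terms, which are the data values y at the left end of each interval.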

Get solution to overdetermined linear homogeneous system numpy

I'm trying to find the solution to an overdetermined linear homogeneous system (Ax = 0) using numpy in order to get the linear least-squares solution for a linear regression.
This is the code I am using to generate the linear regression:
N = 100
x_data = np.linspace(0, N-1, N)
m = +5
n = -5
y_model = m*x_data + n
y_noise = y_model + np.random.normal(0, +5, N)
I want to recover m and n from y_noise. In other words, I want to solve the homogeneous system (Ax = 0) where "x = (m, n, 1)" and "A = (x_data | 1 | -y_noise)". So I convert the non-homogeneous system (Ax = y) into a homogeneous one (Ax = 0) using this code:
A = np.array(np.vstack((x_data, np.ones(N), -y_noise)).T)
I know I could solve the non-homogeneous system using np.linalg.lstsq((x_data | 1), y_noise), but I want to get the solution for the homogeneous system. My problem with this function is that it only returns the trivial solution (x = 0):
x = np.linalg.lstsq(A, np.zeros(N))[0]  # => array([ 0., 0., 0.])
I was thinking about using eigenvectors to get the solution but it seems not to work:
A_T_A = np.dot(A.T, A)
eigen_values, eigen_vectors = np.linalg.eig(A_T_A)
# eigenvectors
[[ -2.03500000e-01 4.89890000e+00 5.31170000e+00]
[ -3.10000000e-03 1.02230000e+00 -2.64330000e+01]
[ 1.00000000e+00 1.00000000e+00 1.00000000e+00]]
# eigenvectors normalized
[[ -0.98365497700 -4.744666220 1.0] # (m1, n1, 1)
[ 0.00304878118 0.210130914 1.0] # (m2, n2, 1)
[ 25.7752417000 -5.132910010 1.0]] # (m3, n3, 1)
None of them fits the model parameters (m=+5, n=-5).
How can I find (m, n) correctly? Thanks!
I have already found how to fix it. The problem was how I was interpreting the output of the np.linalg.eig function; the approach using eigenvectors is right. That said, @Stelios is right when he says that np.linalg.lstsq returns the trivial solution (x = 0) because the matrix A has full column rank.
I was assuming the output of np.linalg.eig was:
[[m1 n1 1]
[m2 n2 1]
[m3 n3 1]]
But it is not, the correct format is:
[[m1 m2 m3]
[n1 n2 n3]
[ 1 1 1]]
So if we want to get the solution which best fits the model parameters (m, n), we have to choose the eigenvector with the smallest eigenvalue and rescale it so that its last component is 1:
A_T_A = np.dot(A.T, A)
eigen_values, eigen_vectors = np.linalg.eig(A_T_A)
# eigenvectors
[[ 1.96409304e-01 9.48763118e-01 -2.47531678e-01]
[ 2.94608003e-04 2.52391765e-01 9.67625088e-01]
[ -9.80521952e-01 1.90123494e-01 -4.92925776e-02]]
# MIN eigenvector
eigen_vector_min = eigen_vectors[:, np.argmin(eigen_values)]
[-0.24753168 0.96762509 -0.04929258]
# MIN eigenvector normalized
[ 5.02168258 -19.63023915 1. ] # [m, n, 1]
Finally we get m = 5.02 and n = -19.6. The slope is a pretty good approximation of the true value (m = 5); the intercept is noticeably further from n = -5.
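The eigenvector of A.T @ A with the smallest eigenvalue is the same vector as the right singular vector of A with the smallest singular value, so np.linalg.svd can be used directly on A. A minimal sketch under assumptions: the random seed is arbitrary, and the ordinary least-squares fit is included only for comparison. Note that the homogeneous formulation minimises ||A v|| subject to ||v|| = 1, which is a different criterion from ordinary least squares, so the two estimates need not agree (the intercept in particular).

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for repeatability only
N = 100
x_data = np.linspace(0, N - 1, N)
y_noise = 5 * x_data - 5 + rng.normal(0, 5, N)

# Homogeneous system A @ v = 0 with v proportional to (m, n, 1)
A = np.column_stack((x_data, np.ones(N), -y_noise))

# Rows of vt are right singular vectors, ordered by decreasing singular value
_, s, vt = np.linalg.svd(A, full_matrices=False)
v = vt[-1]                               # unit vector minimising ||A @ v||
m_h, n_h = v[:2] / v[2]                  # rescale so the last component is 1

# Ordinary least squares on the non-homogeneous form, for comparison
(m_ols, n_ols), *_ = np.linalg.lstsq(A[:, :2], y_noise, rcond=None)

print("homogeneous:", m_h, n_h)
print("least squares:", m_ols, n_ols)
```

Since A.T @ A is symmetric, np.linalg.eigh would also be a natural choice for the eigenvector route; it returns eigenvalues in ascending order, so the wanted eigenvector is the first column.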

Setting arbitrary axis value for a contour plot of form (x,y,f(x,y))?

So I have a data set that is in the matrix form:
x1, Y1, VALUE1
x2, Y1, VALUE2
x3, Y1, VALUE3
x1, Y2, VALUE4
x2, Y2, VALUE5
x3, Y2, VALUE6
and so on. I get my contours properly except my x and y axes go from say 1, 2, 3...N. This is fine because it is representing pixels so isn't incorrect, but I would like to change the axes values from pixels to the actual units. I can't seem to find a way to instruct contour to allow me to add this.
bsquare = np.reshape(value, (xlength, ylength))
blue = plt.contour(bsquare, colors='b')
plt.show()
where xlength and ylength are the number of points along each axis.
plt.contour can be given arrays X, Y, Z: it takes Z as the contour values and uses X and Y on their respective axes. Here is a script that first makes some data to play with, then gets it into an array of the form you describe:
import matplotlib.pyplot as plt
import numpy as np
# Make some test data
nx = 2
ny = 3
x = np.linspace(0, 3, nx)
y = np.linspace(50, 55, ny)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) + Y
# Now get it into the form you describe
data = [[[x[i], y[j], Z[j, i]] for i in range(nx)] for j in range(ny)]
data = np.array(data)
print(data)
>>>
[[[ 0. 50. 50. ]
[ 3. 50. 50.14112001]]
[[ 0. 52.5 52.5 ]
[ 3. 52.5 52.64112001]]
[[ 0. 55. 55. ]
[ 3. 55. 55.14112001]]]
Note that I am using a numpy.array, not just a normal list; this is important in the next step. Let's split up that data, as I presume you have done, into the x and y values and the values themselves:
# Now extract the data
x_values = data[:, :, 0]
y_values = data[:, :, 1]
values = data[:, :, 2]
Now all of these are 2D arrays of shape (ny, nx), that is, grids like your bsquare. You can check this by printing values.shape and changing the integers nx, ny. Now I will plot three things:
Firstly, as you have done, simply contour plot the values; this labels the axes with array indices.
Secondly, I plot using the coordinate arrays to give the correct scalings.
Finally, I plot the original data set to show that it properly recovers the data.
You will need to compare the axis values with where the fake data was created:
fig, axes = plt.subplots(ncols=3, figsize=(10, 2))
axes[0].contour(values)
axes[1].contour(x_values, y_values, values)
axes[2].contour(X, Y, Z)
How you implement this will largely depend on how you have imported your data. If you can simply turn it into a numpy.array() then I think this will solve your issue.
Hopefully I understood your problem correctly.
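For data that arrives as a flat table of (x, y, value) rows with x varying fastest, as in the question, the reshape step might look like the sketch below. The coordinate values and the table itself are made up for illustration, and the Agg backend line is only there so the sketch runs without a display; drop it for interactive use.

```python
import matplotlib
matplotlib.use("Agg")                    # headless backend, assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical physical coordinates of the pixel grid
x_axis = np.array([0.0, 1.5, 3.0])
y_axis = np.array([50.0, 52.5, 55.0, 57.5])
nx, ny = len(x_axis), len(y_axis)

# Build a flat (x, y, value) table with x varying fastest, like the question's data
X, Y = np.meshgrid(x_axis, y_axis)
table = np.column_stack([X.ravel(), Y.ravel(), (np.sin(X) + Y).ravel()])

# Reshape each column to (ny, nx): rows follow y, columns follow x
x_grid = table[:, 0].reshape(ny, nx)
y_grid = table[:, 1].reshape(ny, nx)
values = table[:, 2].reshape(ny, nx)

fig, ax = plt.subplots()
ax.contour(x_grid, y_grid, values, colors='b')   # axes now use the real coordinates
ax.set_xlabel("x (physical units)")
ax.set_ylabel("y (physical units)")
```

If the table had y varying fastest instead, the reshape would be to (nx, ny) followed by a transpose.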
