I want to calculate the coefficients of a spline interpolation with scipy.
In MATLAB:
x=[0:3];
y=[0,1,4,0];
spl=spline(x,y);
disp(spl.coefs);
and it will return:
ans =
-1.5000 5.5000 -3.0000 0
-1.5000 1.0000 3.5000 1.0000
-1.5000 -3.5000 1.0000 4.0000
But I can't do that with interpolate.splrep in scipy. Can you tell me how to calculate them?
I'm not sure there is any way to get exactly those coefficients from scipy. What scipy.interpolate.splrep gives you is the knots and coefficients of a B-spline. What Matlab's spline gives you appears to be the piecewise polynomial coefficients describing the cubic equations connecting the points you pass in, which leads me to believe that the Matlab spline is a control-point based spline such as a Hermite or Catmull-Rom instead of a B-spline.
However, scipy.interpolate.interpolate.spltopp does provide a way to get the piecewise polynomial coefficients of a B-spline. Unfortunately, it doesn't seem to work very well.
>>> import scipy.interpolate
>>> x = [0, 1, 2, 3]
>>> y = [0, 1, 4, 0]
>>> tck = scipy.interpolate.splrep(x, y)
>>> tck
Out:
(array([ 0., 0., 0., 0., 3., 3., 3., 3.]),
array([ 3.19142761e-16, -3.00000000e+00, 1.05000000e+01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]),
3)
>>> pp = scipy.interpolate.interpolate.spltopp(tck[0][1:-1], tck[1], tck[2])
>>> pp.coeffs.T
Out:
array([[ -4.54540394e-322, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ -4.54540394e-322, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ -4.54540394e-322, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000]])
Note that there is one set of coefficients per knot, not one for each of the original points passed in. Also, multiplying the coefficients by the b-spline basis matrix doesn't seem to be very helpful.
>>> bsbm = array([[-1, 3, -3, 1], [ 3, -6, 3, 0], [-3, 0, 3, 0],
[ 1, 4, 1, 0]]) * 1.0/6
Out:
array([[-0.16666667, 0.5 , -0.5 , 0.16666667],
[ 0.5 , -1. , 0.5 , 0. ],
[-0.5 , 0. , 0.5 , 0. ],
[ 0.16666667, 0.66666667, 0.16666667, 0. ]])
>>> dot(pp.coeffs.T, bsbm)
Out:
array([[ 7.41098469e-323, -2.27270197e-322, 2.27270197e-322,
-7.41098469e-323],
[ 7.41098469e-323, -2.27270197e-322, 2.27270197e-322,
-7.41098469e-323],
[ 7.41098469e-323, -2.27270197e-322, 2.27270197e-322,
-7.41098469e-323],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000]])
The FORTRAN Piecewise Polynomial Package, PPPack, has a command bsplpp that converts from B-spline to piecewise polynomial form, which may serve your needs. Unfortunately, there isn't a Python wrapper for PPPack at this time.
If you have scipy version >= 0.18.0 installed, you can use the CubicSpline class from scipy.interpolate for cubic spline interpolation.
You can check the scipy version by running the following commands in Python:
#!/usr/bin/env python3
import scipy
print(scipy.__version__)
If your scipy version is >= 0.18.0, you can run the following example code for cubic spline interpolation:
#!/usr/bin/env python3
import numpy as np
from scipy.interpolate import CubicSpline
# calculate 5 natural cubic spline polynomials for 6 points
# (x,y) = (0,12) (1,14) (2,22) (3,39) (4,58) (5,77)
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([12,14,22,39,58,77])
# calculate natural cubic spline polynomials
cs = CubicSpline(x,y,bc_type='natural')
# show values of interpolation function at x=1.25
print('S(1.25) = ', cs(1.25))
## Additional - find polynomial coefficients for different x regions
# if you want to print polynomial coefficients in form
# S0(0<=x<=1) = a0 + b0(x-x0) + c0(x-x0)^2 + d0(x-x0)^3
# S1(1< x<=2) = a1 + b1(x-x1) + c1(x-x1)^2 + d1(x-x1)^3
# ...
# S4(4< x<=5) = a4 + b4(x-x4) + c4(x-x4)^2 + d4(x-x4)^3
# x0 = 0; x1 = 1; x4 = 4; (start of x region interval)
# show values of a0, b0, c0, d0, a1, b1, c1, d1 ...
cs.c
# Polynomial coefficients for 0 <= x <= 1
a0 = cs.c.item(3,0)
b0 = cs.c.item(2,0)
c0 = cs.c.item(1,0)
d0 = cs.c.item(0,0)
# Polynomial coefficients for 1 < x <= 2
a1 = cs.c.item(3,1)
b1 = cs.c.item(2,1)
c1 = cs.c.item(1,1)
d1 = cs.c.item(0,1)
# ...
# Polynomial coefficients for 4 < x <= 5
a4 = cs.c.item(3,4)
b4 = cs.c.item(2,4)
c4 = cs.c.item(1,4)
d4 = cs.c.item(0,4)
# Print polynomial equations for different x regions
print('S0(0<=x<=1) = ', a0, ' + ', b0, '(x-0) + ', c0, '(x-0)^2 + ', d0, '(x-0)^3')
print('S1(1< x<=2) = ', a1, ' + ', b1, '(x-1) + ', c1, '(x-1)^2 + ', d1, '(x-1)^3')
print('...')
print('S4(4< x<=5) = ', a4, ' + ', b4, '(x-4) + ', c4, '(x-4)^2 + ', d4, '(x-4)^3')
# So we can calculate S(1.25) by using equation S1(1< x<=2)
print('S(1.25) = ', a1 + b1*0.25 + c1*(0.25**2) + d1*(0.25**3))
# Cubic spline interpolation calculus example
# https://www.youtube.com/watch?v=gT7F3TWihvk
Here is how I could get results similar to MATLAB:
>>> from scipy.interpolate import PPoly, splrep
>>> x = [0, 1, 2, 3]
>>> y = [0, 1, 4, 0]
>>> tck = splrep(x, y)
>>> tck
Out: (array([ 0., 0., 0., 0., 3., 3., 3., 3.]),
array([ 3.19142761e-16, -3.00000000e+00, 1.05000000e+01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]),
3)
>>> pp = PPoly.from_spline(tck)
>>> pp.c.T
Out: array([[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, -8.00000000e+00, -1.05000000e+01,
0.00000000e+00],
[ -1.50000000e+00, -8.00000000e+00, -1.05000000e+01,
0.00000000e+00],
[ -1.50000000e+00, -8.00000000e+00, -1.05000000e+01,
0.00000000e+00]])
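On SciPy 0.18 or later, a shorter route that may be worth trying is CubicSpline with its default bc_type='not-a-knot', which is the same end condition MATLAB's spline uses when x and y have the same length. The sketch below assumes that setup. The c attribute stores one column per interval, highest power first, so its transpose should line up with spl.coefs:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 4, 0])

# The default boundary condition is 'not-a-knot'.
cs = CubicSpline(x, y)

# cs.c has shape (4, n_intervals); row 0 holds the cubic terms.
# Transposing gives one row of [cubic, quadratic, linear, constant]
# per interval, expressed in powers of (x - x_i).
coefs = cs.c.T
print(np.round(coefs, 4))
```

Each row describes the polynomial on one interval, relative to that interval's left endpoint.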
The docs on scipy.interpolate.splrep say that you can get the coefficients:
Returns:
tck : tuple
(t,c,k) a tuple containing the vector of knots, the B-spline coefficients, and the degree of the spline.
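Note that these are B-spline coefficients, not per-interval polynomial coefficients, so the tuple is mostly useful as input to the other spline routines. As a minimal sketch (s=0 is passed explicitly here to force interpolation), splev evaluates the spline described by tck:

```python
import numpy as np
from scipy.interpolate import splrep, splev

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 0.0])

# t: knot vector, c: B-spline coefficients, k: degree
tck = splrep(x, y, s=0)
t, c, k = tck

# Evaluate the spline at the data points and at one point in between.
y_at_nodes = splev(x, tck)
y_mid = splev(1.5, tck)
print(k, y_at_nodes, y_mid)
```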
I am receiving the right answer when I compute the Vandermonde
coefficients of this matrix. However, the output matrix is reversed.
It should be [6,-39,55,27] instead of [27,55,-39,6].
My output for my Vandermonde Matrix is flipped and the final solution
c, is flipped.
import numpy as np
from numpy import linalg as LA
x = np.array([[4],[2],[0],[-1]])
f = np.array([[7],[29],[27],[-73]])
def main():
    A_matrix = VandermondeMatrix(x)
    print(A_matrix)
    c = LA.solve(A_matrix,f) #coefficients of Vandermonde Polynomial
    print(c)
def VandermondeMatrix(x):
    n = len(x)
    A = np.zeros((n, n))
    exponent = np.array(range(0,n))
    for j in range(n):
        A[j, :] = x[j]**exponent
    return A
if __name__ == "__main__":
    main()
Build the exponent range in descending order from the start; then you don't have to flip anything afterwards, which also saves the extra step:
def VandermondeMatrix(x):
    n = len(x)
    A = np.zeros((n, n))
    exponent = np.array(range(n-1,-1,-1))
    for j in range(n):
        A[j, :] = x[j]**exponent
    return A
Out:
#A_matrix:
[[64. 16. 4. 1.]
[ 8. 4. 2. 1.]
[ 0. 0. 0. 1.]
[-1. 1. -1. 1.]]
#c:
[[ 6.]
[-39.]
[ 55.]
[ 27.]]
How about np.flip(c)? See the NumPy documentation for np.flip.
You could do
print(c[::-1])
which will reverse the order of c.
From How can I flip the order of a 1d numpy array?
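np.vander builds the Vandermonde matrix directly, and it has a parameter that controls the column order: increasing. It is False by default, which gives decreasing powers; increasing=True reverses them.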
Example from the documentation:
x = np.array([1, 2, 3, 5])
np.vander(x)
array([[ 1, 1, 1, 1],
[ 8, 4, 2, 1],
[ 27, 9, 3, 1],
[125, 25, 5, 1]])
np.vander(x, increasing=True)
array([[ 1, 1, 1, 1],
[ 1, 2, 4, 8],
[ 1, 3, 9, 27],
[ 1, 5, 25, 125]])
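Applied to the data from the question, a minimal sketch could look like this (1-D arrays are used here instead of the question's column vectors, purely to keep the output flat):

```python
import numpy as np

x = np.array([4.0, 2.0, 0.0, -1.0])
f = np.array([7.0, 29.0, 27.0, -73.0])

# Default column order: x**3, x**2, x**1, x**0 (decreasing powers).
A = np.vander(x)
c = np.linalg.solve(A, f)

# increasing=True gives x**0, x**1, x**2, x**3, so the
# coefficients come out in the opposite order.
A_inc = np.vander(x, increasing=True)
c_inc = np.linalg.solve(A_inc, f)

print(c)
print(c_inc)
```

With decreasing powers the result is already in the order np.polyval expects, so np.polyval(c, x) should reproduce f.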
In [3]: def VandermondeMatrix(x):
   ...:     n = len(x)
   ...:     A = np.zeros((n, n))
   ...:     exponent = np.array(range(0,n))
   ...:     for j in range(n):
   ...:         A[j, :] = x[j]**exponent
   ...:     return A
   ...:
In [4]: x = np.array([[4],[2],[0],[-1]])
In [5]: VandermondeMatrix(x)
Out[5]:
array([[ 1., 4., 16., 64.],
[ 1., 2., 4., 8.],
[ 1., 0., 0., 0.],
[ 1., -1., 1., -1.]])
In [6]: f = np.array([[7],[29],[27],[-73]])
In [7]: np.linalg.solve(_5,f)
Out[7]:
array([[ 27.],
[ 55.],
[-39.],
[ 6.]])
The result is a (4,1) array; reverse rows with:
In [9]: _7[::-1]
Out[9]:
array([[ 6.],
[-39.],
[ 55.],
[ 27.]])
Negative-stride indexing, [::-1], is also used to reverse Python lists and strings:
In [10]: ['a','b','c'][::-1]
Out[10]: ['c', 'b', 'a']
I have noticed a difference between how MATLAB and NumPy calculate the eigenvalues and eigenvectors of a matrix: MATLAB returns real values, while NumPy returns complex eigenvalues and eigenvectors. For example:
for matrix:
A=
1 -3 3
3 -5 3
6 -6 4
Numpy:
w, v = np.linalg.eig(A)
w
array([ 4. +0.00000000e+00j, -2. +1.10465796e-15j, -2. -1.10465796e-15j])
v
array([[-0.40824829+0.j , 0.24400118-0.40702229j,
0.24400118+0.40702229j],
[-0.40824829+0.j , -0.41621909-0.40702229j,
-0.41621909+0.40702229j],
[-0.81649658+0.j , -0.66022027+0.j , -0.66022027-0.j ]])
Matlab:
[E, D] = eig(A)
E
-0.4082 -0.8103 0.1933
-0.4082 -0.3185 -0.5904
-0.8165 0.4918 -0.7836
D
4.0000 0 0
0 -2.0000 0
0 0 -2.0000
Is there a way of getting real eigenvalues in Python, as in MATLAB?
To get NumPy to return a diagonal array of real eigenvalues when the complex part is small, you could use
In [116]: np.real_if_close(np.diag(w))
Out[116]:
array([[ 4., 0., 0.],
[ 0., -2., 0.],
[ 0., 0., -2.]])
According to the Matlab docs,
[E, D] = eig(A) returns E and D which satisfy A*E = E*D:
I don't have Matlab, so I'll use Octave to check the result you posted:
octave:1> A = [[1, -3, 3],
[3, -5, 3],
[6, -6, 4]]
octave:6> E = [[ -0.4082, -0.8103, 0.1933],
[ -0.4082, -0.3185, -0.5904],
[ -0.8165, 0.4918, -0.7836]]
octave:25> D = [[4.0000, 0, 0],
[0, -2.0000, 0],
[0, 0, -2.0000]]
octave:29> abs(A*E - E*D)
ans =
3.0000e-04 0.0000e+00 3.0000e-04
3.0000e-04 2.2204e-16 3.0000e-04
0.0000e+00 4.4409e-16 6.0000e-04
The magnitude of the errors is mainly due to the values reported by Matlab being
displayed to a lower precision than the actual values Matlab holds in memory.
In NumPy, w, v = np.linalg.eig(A) returns w and v which satisfy
np.dot(A, v) = np.dot(v, np.diag(w)):
In [113]: w, v = np.linalg.eig(A)
In [135]: np.set_printoptions(formatter={'complex_kind': '{:+15.5f}'.format})
In [136]: v
Out[136]:
array([[-0.40825+0.00000j, +0.24400-0.40702j, +0.24400+0.40702j],
[-0.40825+0.00000j, -0.41622-0.40702j, -0.41622+0.40702j],
[-0.81650+0.00000j, -0.66022+0.00000j, -0.66022-0.00000j]])
In [116]: np.real_if_close(np.diag(w))
Out[116]:
array([[ 4., 0., 0.],
[ 0., -2., 0.],
[ 0., 0., -2.]])
In [112]: np.abs((np.dot(A, v) - np.dot(v, np.diag(w))))
Out[112]:
array([[4.44089210e-16, 3.72380123e-16, 3.72380123e-16],
[2.22044605e-16, 4.00296604e-16, 4.00296604e-16],
[8.88178420e-16, 1.36245817e-15, 1.36245817e-15]])
In [162]: np.abs((np.dot(A, v) - np.dot(v, np.diag(w)))).max()
Out[162]: 1.3624581677742195e-15
In [109]: np.isclose(np.dot(A, v), np.dot(v, np.diag(w))).all()
Out[109]: True
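Because the eigenvalue -2 is repeated, any basis of its eigenspace is a valid set of eigenvectors, which is why the two libraries can disagree and still both be right. One possible way to get a real basis for that eigenspace, sketched below, is scipy.linalg.null_space applied to A + 2I. The explicit rcond value is an arbitrary choice for this example, and the resulting columns will not necessarily match the ones MATLAB prints:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, -3.0, 3.0],
              [3.0, -5.0, 3.0],
              [6.0, -6.0, 4.0]])

lam = -2.0  # the repeated eigenvalue

# Real, orthonormal basis of the eigenspace {v : (A - lam*I) v = 0}.
basis = null_space(A - lam * np.eye(3), rcond=1e-10)
print(basis.shape)

# Each column should satisfy A v = lam v up to rounding error.
residual = np.abs(A @ basis - lam * basis).max()
print(residual)
```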
I have an array n×m, where n = 217000 and m = 3 (some data from telescope).
I need to calculate the distances between 2 points in 3D (according to my x, y, z coordinates in columns).
When I try to use sklearn tools the result is:
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
What tool can I use in this situation, and what is the maximum possible size for these tools?
What tool can I use in this situation...?
You could implement the Euclidean distance function on your own using the approach suggested by @Saksow. Assuming that a and b are one-dimensional NumPy arrays, you could also use any of the methods proposed in this thread:
import numpy as np
np.linalg.norm(a-b)
np.sqrt(np.sum((a-b)**2))
np.sqrt(np.dot(a-b, a-b))
If you wish to compute in one go the pairwise distance (not necessarily the euclidean distance) between all the points in your array, the module scipy.spatial.distance is your friend.
Demo:
In [79]: from scipy.spatial.distance import squareform, pdist
In [80]: arr = np.asarray([[0, 0, 0],
...: [1, 0, 0],
...: [0, 2, 0],
...: [0, 0, 3]], dtype='float')
...:
In [81]: squareform(pdist(arr, 'euclidean'))
Out[81]:
array([[ 0. , 1. , 2. , 3. ],
[ 1. , 0. , 2.23606798, 3.16227766],
[ 2. , 2.23606798, 0. , 3.60555128],
[ 3. , 3.16227766, 3.60555128, 0. ]])
In [82]: squareform(pdist(arr, 'cityblock'))
Out[82]:
array([[ 0., 1., 2., 3.],
[ 1., 0., 3., 4.],
[ 2., 3., 0., 5.],
[ 3., 4., 5., 0.]])
Notice that the number of points in the mock data array used in this toy example is n = 4 and the resulting pairwise distance array has n² = 16 elements.
...and what is the maximum possible size for these tools?
If you try to apply the approach above using your data (n = 217000) you get an error:
In [105]: data = np.random.random(size=(217000, 3))
In [106]: squareform(pdist(data, 'euclidean'))
Traceback (most recent call last):
File "<ipython-input-106-fd273331a6fe>", line 1, in <module>
squareform(pdist(data, 'euclidean'))
File "C:\Users\CPU 2353\Anaconda2\lib\site-packages\scipy\spatial\distance.py", line 1220, in pdist
dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
MemoryError
The issue is that you are running out of RAM. To perform such a computation you would need more than 350 GB! The required amount of memory results from multiplying the number of elements of the distance matrix (217000²) by the number of bytes of each element of that matrix (8), and dividing this product by the appropriate factor (1024³) to express the result in gigabytes:
In [107]: round(data.shape[0]**2 * data.dtype.itemsize / 1024.**3, 1)
Out[107]: 350.8
So the maximum allowed size for your data is determined by the amount of available RAM (take a look at this thread for further details).
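If what you need is a reduction over the distances (a maximum, a per-point minimum, a histogram) rather than the full matrix, one common workaround is to process the points in blocks so that only a slice of the matrix is in memory at any time. A minimal sketch using scipy.spatial.distance.cdist follows; the function name chunked_max_distance and the chunk size are illustrative choices, and small random data stands in for the real array:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def chunked_max_distance(points, chunk_size=1000):
    """Largest pairwise Euclidean distance, computed block by block."""
    best = 0.0
    for start in range(0, len(points), chunk_size):
        # Distances from one block of rows to all points:
        # at most chunk_size * len(points) values in memory.
        block = cdist(points[start:start + chunk_size], points)
        best = max(best, block.max())
    return best

rng = np.random.default_rng(0)
points = rng.random((500, 3))

largest = chunked_max_distance(points, chunk_size=128)

# On data this small the result can be checked against pdist.
print(np.isclose(largest, pdist(points).max()))
```

The peak memory is then governed by chunk_size times the number of points instead of the square of the number of points.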
Using only Python and Euclidean distance formula for 3 dimensions:
import math
distance = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
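On Python 3.8 or later the standard library also offers math.dist, which applies the same formula to two points given as sequences. A small sketch with made-up coordinates:

```python
import math

# Two example points (x, y, z); the values are arbitrary.
p = (0.0, 0.0, 0.0)
q = (1.0, 2.0, 2.0)

# Explicit formula, written for any number of dimensions.
d_formula = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Same computation via the standard library.
d_builtin = math.dist(p, q)

print(d_formula, d_builtin)
```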
Could someone care to explain the meshgrid method? I cannot wrap my mind around it. The example is from the [SciPy][1] site:
import numpy as np
nx, ny = (3, 2)
x = np.linspace(0, 1, nx)
print ("x =", x)
y = np.linspace(0, 1, ny)
print ("y =", y)
xv, yv = np.meshgrid(x, y)
print ("xv_1 =", xv)
print ("yv_1 =", yv)
xv, yv = np.meshgrid(x, y, sparse=True) # make sparse output arrays
print ("xv_2 =", xv)
print ("yv_2 =", yv)
Printout is :
x = [ 0. 0.5 1. ]
y = [ 0. 1.]
xv_1 = [[ 0. 0.5 1. ]
[ 0. 0.5 1. ]]
yv_1 = [[ 0. 0. 0.]
[ 1. 1. 1.]]
xv_2 = [[ 0. 0.5 1. ]]
yv_2 = [[ 0.]
[ 1.]]
Why are arrays xv_1 and yv_1 formed like this ? Ty :)
[1]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html#numpy.meshgrid
In [214]: nx, ny = (3, 2)
In [215]: x = np.linspace(0, 1, nx)
In [216]: x
Out[216]: array([ 0. , 0.5, 1. ])
In [217]: y = np.linspace(0, 1, ny)
In [218]: y
Out[218]: array([ 0., 1.])
Using unpacking to better see the 2 arrays produced by meshgrid:
In [225]: X,Y = np.meshgrid(x, y)
In [226]: X
Out[226]:
array([[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ]])
In [227]: Y
Out[227]:
array([[ 0., 0., 0.],
[ 1., 1., 1.]])
And now the sparse version. Notice that X1 looks like one row of X (but 2-D), and Y1 like one column of Y.
In [228]: X1,Y1 = np.meshgrid(x, y, sparse=True)
In [229]: X1
Out[229]: array([[ 0. , 0.5, 1. ]])
In [230]: Y1
Out[230]:
array([[ 0.],
[ 1.]])
When used in calculations like plus and times, both forms behave the same. That's because of numpy's broadcasting.
In [231]: X+Y
Out[231]:
array([[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ]])
In [232]: X1+Y1
Out[232]:
array([[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ]])
The shapes might also help:
In [235]: X.shape, Y.shape
Out[235]: ((2, 3), (2, 3))
In [236]: X1.shape, Y1.shape
Out[236]: ((1, 3), (2, 1))
X and Y hold more values than most uses actually need, but there usually isn't much of a penalty for using them instead of the sparse versions.
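A typical use is evaluating a function of two variables at every grid point. The sketch below uses an arbitrary function, z = x**2 + y, and checks that the dense and sparse grids give the same result through broadcasting:

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(0, 1, 2)

X, Y = np.meshgrid(x, y)                  # both have shape (2, 3)
Xs, Ys = np.meshgrid(x, y, sparse=True)   # shapes (1, 3) and (2, 1)

# Element-wise on the dense grids, broadcast on the sparse ones.
Z_dense = X**2 + Y
Z_sparse = Xs**2 + Ys

print(Z_dense)
print(np.array_equal(Z_dense, Z_sparse))
```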
Your linearly spaced vectors x and y, defined by linspace, use 3 and 2 points respectively.
These vectors are then used by the meshgrid function to create a 2D grid of points, 3 x 2 in size, with one output array for the x coordinates and one for the y coordinates.
Each cell of the output arrays holds the x (or y) coordinate of the corresponding grid point.
Conceptually, the arrays are built as follows:
# dummy
def meshgrid_custom(x,y):
    xv = np.zeros((len(x),len(y)))
    yv = np.zeros((len(x),len(y)))
    for i,ix in enumerate(x):
        for j,jy in enumerate(y):
            xv[i,j] = ix
            yv[i,j] = jy
    return xv.T, yv.T
So, for example the point at the location (1,1) has the coordinates:
x = xv_1[1,1] = 0.5
y = yv_1[1,1] = 1.0
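The transpose at the end of that loop corresponds to meshgrid's default indexing='xy', where rows follow y and columns follow x. A short sketch comparing it with indexing='ij', where the first axis follows x instead:

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(0, 1, 2)

# Default 'xy' indexing: shape is (len(y), len(x)).
xv_xy, yv_xy = np.meshgrid(x, y)

# 'ij' indexing: shape is (len(x), len(y)).
xv_ij, yv_ij = np.meshgrid(x, y, indexing='ij')

print(xv_xy.shape, xv_ij.shape)

# The two conventions differ only by a transpose.
print(np.array_equal(xv_xy, xv_ij.T))

# Coordinates of the grid point stored at row 1, column 1.
print(xv_xy[1, 1], yv_xy[1, 1])
```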
Many questions have already been asked along the same lines.
I also read the official documentation (http://www.scipy.org/scipylib/faq.html#what-is-the-difference-between-matrices-and-arrays) regarding the differences, but I am still struggling to understand the philosophical difference between numpy arrays and matrices.
More precisely, I am looking for the reason behind the results below.
#using array
>>> A = np.array([[ 1, -1, 2],
[ 0, 1, -1],
[ 0, 0, 1]])
>>> b = np.array([5,-1,3])
>>> x = np.linalg.solve(A,b)
>>> x
array([ 1., 2., 3.])
#using matrix
>>> A=np.mat(A)
>>> b=np.mat(b)
>>> A
matrix([[ 1, -1, 2],
[ 0, 1, -1],
[ 0, 0, 1]])
>>> b
matrix([[ 5, -1, 3]])
>>> x = np.linalg.solve(A,b)
>>> x
matrix([[ 5., -1., 3.],
[ 10., -2., 6.],
[ 5., -1., 3.]])
Why does the linear system represented with arrays yield the correct solution, while the matrix representation yields another matrix as the solution?
Honestly, I also don't understand why the result is a matrix in the second case.
Sorry if this question has already been answered and I failed to notice, and pardon me if my understanding of numpy arrays and matrices is wrong.
You have a transpose issue...when you go to matrix land, column-vectors and row-vectors are no longer interchangeable:
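import numpy as np
A = np.array([[ 1, -1, 2],
[ 0, 1, -1],
[ 0, 0, 1]])
b = np.array([5,-1,3])
x = np.linalg.solve(A, b)
print('arrays:')
print(x)
A = np.matrix(A)
b = np.matrix(b)  # b now has shape (1, 3): a row vector, not a column vector
# older NumPy returns the 3x3 result shown below for this call;
# recent versions may raise an error for the shape mismatch instead
x = np.linalg.solve(A, b)
print('matrix, wrong set up:')
print(x)
b = b.T  # shape (3, 1): a proper column vector
x = np.linalg.solve(A, b)
print('matrix, right set up:')
print(x)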
yields:
arrays:
[ 1. 2. 3.]
matrix, wrong set up:
[[ 5. -1. 3.]
[ 10. -2. 6.]
[ 5. -1. 3.]]
matrix, right set up:
[[ 1.]
[ 2.]
[ 3.]]
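The same distinction can be made with plain arrays, without np.matrix. In this sketch, a 1-D b is treated as a single right-hand-side vector, while b[:, None] turns it into an explicit (3, 1) column:

```python
import numpy as np

A = np.array([[1, -1, 2],
              [0, 1, -1],
              [0, 0, 1]])
b = np.array([5, -1, 3])

# 1-D right-hand side: the solution is 1-D as well.
x_vec = np.linalg.solve(A, b)

# Explicit column vector of shape (3, 1): the solution is a column too.
x_col = np.linalg.solve(A, b[:, None])

print(x_vec, x_col.shape)
print(np.allclose(A @ x_col, b[:, None]))
```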