I am doing cubic spline interpolation using scipy.interpolate.splrep as follows:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 10)
y = np.sin(x)
tck = scipy.interpolate.splrep(x, y, task=0, s=0)
F = scipy.interpolate.PPoly.from_spline(tck)
I print the knots (F.x) and coefficients (F.c):
print F.x
array([ 0. , 0. , 0. , 0. ,
2.22222222, 3.33333333, 4.44444444, 5.55555556,
6.66666667, 7.77777778, 10. , 10. ,
10. , 10. ])
print F.c
array([[ -1.82100357e-02, -1.82100357e-02, -1.82100357e-02,
-1.82100357e-02, 1.72952212e-01, 1.26008293e-01,
-4.93704109e-02, -1.71230879e-01, -1.08680287e-01,
1.00658224e-01, 1.00658224e-01, 1.00658224e-01,
1.00658224e-01],
[ -3.43151441e-01, -3.43151441e-01, -3.43151441e-01,
-3.43151441e-01, -4.64551679e-01, 1.11955696e-01,
5.31983340e-01, 3.67415303e-01, -2.03354294e-01,
-5.65621916e-01, 1.05432909e-01, 1.05432909e-01,
1.05432909e-01],
[ 1.21033389e+00, 1.21033389e+00, 1.21033389e+00,
1.21033389e+00, -5.84561936e-01, -9.76335250e-01,
-2.60847433e-01, 7.38484392e-01, 9.20774403e-01,
6.63563923e-02, -9.56285846e-01, -9.56285846e-01,
-9.56285846e-01],
[ -4.94881722e-18, -4.94881722e-18, -4.94881722e-18,
-4.94881722e-18, 7.95220057e-01, -1.90567963e-01,
-9.64317117e-01, -6.65101515e-01, 3.74151231e-01,
9.97097891e-01, -5.44021111e-01, -5.44021111e-01,
-5.44021111e-01]])
So the x array I had supplied was:
array([ 0. , 1.11111111, 2.22222222, 3.33333333,
4.44444444, 5.55555556, 6.66666667, 7.77777778,
8.88888889, 10. ])
Q.1: The knots F.x are not the same as the original x array and have duplicate values (possibly to force the first derivative to zero?). Also, some values of x (1.11111111, 8.88888889) are missing from F.x. Any ideas?
Q.2: The shape of F.c is (4, 13). I understand that the 4 comes from the fact that it is a cubic spline fit. But I do not know how to select the coefficients for each of the 9 sections that I want (from x = 0 to x = 1.11111, x = 1.11111 to x = 2.22222, and so on). Any help in extracting the coefficients for the different segments would be appreciated.
If you want to have the knots at specific locations along the curve, you need to use the argument task=-1 of splrep and give an array of interior knots as the t argument.
The knots in t must satisfy the following condition:
If provided, knots t must satisfy the Schoenberg-Whitney conditions, i.e., there must be a subset of data points x[j] such that t[j] < x[j] < t[j+k+1], for j=0, 1,...,n-k-2.
See the splrep documentation.
Then you should get F.c with shape (4, <length of t> + 2*(k+1) - 1), with columns corresponding to the consecutive intervals along the curve (k+1 knots are added at either end of the curve by splrep).
Try the following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 20)
y = np.sin(x)
t = np.linspace(0, 10, 10)
tck = scipy.interpolate.splrep(x, y, t=t[1:-1])
F = scipy.interpolate.PPoly.from_spline(tck)
print(F.x)
print(F.c)
# Accessing coeffs of nth segment: index = k + n - 1
# Eg. for second segment:
print(F.c[:,4])
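To come back to Q.2 for the original fit in the question: the first and last k columns of F.c belong to the zero-length intervals created by the repeated boundary knots, so the real segments can be sliced out directly. A minimal sketch (my own illustration, assuming the cubic case k=3 and the F from the question):
k = 3                            # cubic spline
breakpoints = F.x[k:-k]          # the actual segment boundaries
segment_coeffs = F.c[:, k:-k]    # one column per real segment
# segment_coeffs[:, n] holds the coefficients (highest power first) of the
# local polynomial on [breakpoints[n], breakpoints[n+1]], expanded around
# breakpoints[n]
Note that this interpolating fit has only 7 real segments, not 9: with s=0 FITPACK omits the second and second-to-last data points from the knot vector, which is why 1.11111111 and 8.88888889 do not appear in F.x.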
The gradient of a symmetric function should have the same derivatives in all dimensions.
numpy.gradient is providing different components.
Here is a MWE.
import numpy as np
x = (-1,0,1)
y = (-1,0,1)
X,Y = np.meshgrid(x,y)
f = 1/(X*X + Y*Y +1.0)
print(f)
>> [[0.33333333 0.5 0.33333333]
[0.5 1. 0.5 ]
[0.33333333 0.5 0.33333333]]
This has the same values in both dimensions.
But np.gradient(f) gives
[array([[ 0.16666667, 0.5 , 0.16666667],
[ 0. , 0. , 0. ],
[-0.16666667, -0.5 , -0.16666667]]),
array([[ 0.16666667, 0. , -0.16666667],
[ 0.5 , 0. , -0.5 ],
[ 0.16666667, 0. , -0.16666667]])]
Both components of the gradient are different.
Why?
What am I missing in the interpretation of the output?
Let's walk through this step by step. First, as meowgoesthedog correctly mentioned, numpy calculates derivatives in a direction.
Numpy's way of calculating gradients
It's important to note that np.gradient uses central differences, meaning (for simplicity we look at just one direction):
grad_f[i] = (f[i+1] - f[i])/2 + (f[i] - f[i-1])/2 = (f[i+1] - f[i-1])/2
At the boundary numpy calculates (take the min as example)
grad_f[min] = f[min+1] - f[min]
grad_f[max] = f[max] - f[max-1]
In your case the boundary indices are 0 and 2.
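A quick one-dimensional check of these formulas (my own sketch, reusing one row of your f):
import numpy as np
f = np.array([0.33333333, 0.5, 0.33333333])
print(np.gradient(f))
# [ 0.16666667  0.         -0.16666667]
# i.e. f[1]-f[0] at the lower edge, (f[2]-f[0])/2 inside, f[2]-f[1] at the upper edge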
2D case
If you use more than one dimension, we need to take the direction of the derivative into account. np.gradient calculates the derivatives along all axes. Let's reproduce your results:
Let's move along axis 0 (down the columns), so we calculate with row vectors
f[1,:] - f[0,:]
Output
array([0.16666667, 0.5 , 0.16666667])
which is exactly the first row of the first element of your gradient.
The middle row is calculated with central differences, therefore:
(f[2,:]-f[1,:])/2 + (f[1,:]-f[0,:])/2
Output
array([0., 0., 0.])
The third row:
f[2,:] - f[1,:]
Output
array([-0.16666667, -0.5 , -0.16666667])
For the other direction just exchange the : and the numbers, and keep in mind that you are now calculating column vectors. For a symmetric function like yours, this leads directly to the transposed derivative.
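Stacking the three rows reproduces the first component of the gradient exactly; a small verification sketch (my own addition):
import numpy as np
x = y = (-1, 0, 1)
X, Y = np.meshgrid(x, y)
f = 1/(X*X + Y*Y + 1.0)
g0 = np.stack((f[1,:] - f[0,:],        # forward difference at the lower edge
               (f[2,:] - f[0,:])/2,    # central difference in the interior
               f[2,:] - f[1,:]))       # backward difference at the upper edge
print(np.allclose(g0, np.gradient(f)[0]))  # True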
3D case
x_ = (-1,0,4)
y_ = (-3,0,1)
z_ = (-1,0,12)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
f = 1/(x**2 + y**2 + z**2 + 1)
np.gradient(f)[1]
Output
array([[[ *2.50000000e-01, 4.09090909e-01, 3.97702165e-04*],
[ 8.33333333e-02, 1.21212121e-01, 1.75554093e-04],
[-8.33333333e-02, -1.66666667e-01, -4.65939801e-05]],
[[ **4.09090909e-01, 9.00000000e-01, 4.03045231e-04**],
[ 1.21212121e-01, 2.00000000e-01, 1.77904287e-04],
[-1.66666667e-01, -5.00000000e-01, -4.72366556e-05]],
[[ ***1.85185185e-02, 2.03619910e-02, 3.28827183e-04***],
[ 7.79727096e-03, 8.54700855e-03, 1.45243282e-04],
[-2.92397661e-03, -3.26797386e-03, -3.83406181e-05]]])
The gradient given here is calculated along rows (axis 0 would be across the matrices, axis 1 along rows, axis 2 along columns).
This can be calculated by
(f[:,1,:] - f[:,0,:])
Output
array([[*2.50000000e-01, 4.09090909e-01, 3.97702165e-04*],
[**4.09090909e-01, 9.00000000e-01, 4.03045231e-04**],
[***1.85185185e-02, 2.03619910e-02, 3.28827183e-04***]])
I added the asterisks so that it becomes clear where to find the corresponding row vectors. Since we calculated the gradient along direction 1, we have to look for row vectors.
If one wants to reproduce the whole gradient, this is done by
np.stack(((f[:,1,:] - f[:,0,:]), (f[:,2,:] - f[:,0,:])/2, (f[:,2,:] - f[:,1,:])), axis=1)
n-dim case
We can generalize what we learned here to calculate the gradient of an array along an arbitrary axis.
def grad_along_axis(f, ax):
    f_grad_ind = []
    for i in range(f.shape[ax]):
        if i == 0:
            f_grad_ind.append(np.take(f, i+1, ax) - np.take(f, i, ax))
        elif i == f.shape[ax] - 1:
            f_grad_ind.append(np.take(f, i, ax) - np.take(f, i-1, ax))
        else:
            f_grad_ind.append((np.take(f, i+1, ax) - np.take(f, i-1, ax))/2)
    f_grad = np.stack(f_grad_ind, axis=ax)
    return f_grad
where
np.take(f, i, ax) = f[:,...,i,...,:]
and i is at index ax.
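A quick usage check (my own addition), comparing against np.gradient on the 3D example from above:
import numpy as np
x_, y_, z_ = (-1, 0, 4), (-3, 0, 1), (-1, 0, 12)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
f = 1/(x**2 + y**2 + z**2 + 1)
print(np.allclose(grad_along_axis(f, 1), np.gradient(f)[1]))  # True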
Usually gradients and jacobians are operators on functions.
If you need the gradient of f = 1/(X*X + Y*Y + 1.0) then you have to compute it symbolically, or estimate it with numerical methods that use that function.
I do not know what the gradient of a constant 3d array would be; numpy.gradient just applies one-dimensional finite differences along each axis.
Python has the sympy package that can automatically compute jacobians symbolically.
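For instance, a minimal sympy sketch (my own illustration):
import sympy
X, Y = sympy.symbols('X Y')
f = 1/(X*X + Y*Y + 1)
grad = [sympy.diff(f, v) for v in (X, Y)]
print(grad)  # [-2*X/(X**2 + Y**2 + 1)**2, -2*Y/(X**2 + Y**2 + 1)**2]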
If by second-order derivative of a scalar 3d field you mean a Laplacian, then you can estimate that with a standard finite-difference stencil (5-point in 2D, 7-point in 3D).
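As a sketch of what such an estimate could look like (my own illustration, the 2D five-point version):
import numpy as np

def laplacian_2d(f, h=1.0):
    # five-point stencil on the interior: (N + S + E + W - 4*C) / h**2
    lap = np.zeros_like(f)
    lap[1:-1, 1:-1] = (f[2:, 1:-1] + f[:-2, 1:-1] +
                       f[1:-1, 2:] + f[1:-1, :-2] -
                       4.0 * f[1:-1, 1:-1]) / h**2
    return lap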
I have a distributional analysis algorithm for words. It generates observation vectors for each target word, and from this table I use stats.spearmanr() to calculate the distances (rescaled from [-1, 1] to [0, 1]), producing a distance matrix (Y). Then I use hierarchy.average() to obtain the clustering (Z). Finally, a dendrogram is generated and plotted.
The problem I have is this: the dendrogram scale varies with the number of target words. I was assuming that its distance axis would span the [0, 1] range (as obtained (and rescaled) from spearmanr()), as described above. But it is [0, 0.5] for, say, 50 words, [0, 1] for 150, and [0, 2] for 1000.
Why is that so (that the distance scale has values bigger than the ones in Y)?
I'd appreciate any ideas on this issue, because I can't seem to find any hint in the documentation or on the web (which makes me worried that I'm asking the wrong questions...). And I'd need a fixed scale, or at least a way to know which one the dendrogram is using, for cut-level specification purposes. Thanks in advance for any help.
The simplified code:
# coding: utf-8
# Statistics and visualization
import numpy as np
import scipy, random
import scipy.stats
# Clustering and dendrogram visualization
import scipy.cluster.hierarchy as hac
import matplotlib.pyplot as plt

def remap(x, in_min, in_max, out_min, out_max):
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min

random.seed('7622')
sizes = [50, 250, 500, 1000, 2000]
for n in sizes:
    # Generate observation matrix
    X = []
    for i in range(n):
        vet = []
        for j in range(300):
            # Generate random observations
            vet.append(random.randint(0, 50))
        X.append(vet)
    # X is a matrix where rows are variables (target words) and
    # columns are observations (contexts of occurrence)
    Y = scipy.stats.spearmanr(X, axis=1)
    # Y rescaling
    for i in range(len(Y[0])):
        Y[0][i] = [ remap(v, -1, 1, 0, 1) for v in Y[0][i] ]
    print 'Y [', np.matrix(Y[0]).min(), ',', np.matrix(Y[0]).max(), ']'
    # Clustering
    Z = hac.average(Y[0])
    print 'n=', n, \
        'Z [', min([ el[2] for el in Z ]), ',', max([ el[2] for el in Z ]), ']'
[UPDATE] Results of the above code:
Y [ 0.401120498124 , 1.0 ]
n= 50 Z [ 0.634408300876 , 0.77633631869 ]
Y [ 0.379375733574 , 1.0 ]
n= 250 Z [ 0.775241869849 , 0.969704246048 ]
Y [ 0.37559031365 , 1.0 ]
n= 500 Z [ 0.935671154717 , 1.16505319575 ]
Y [ 0.370600337649 , 1.0 ]
n= 1000 Z [ 1.19646327361 , 1.47897594053 ]
Y [ 0.359010408057 , 1.0 ]
n= 2000 Z [ 1.56890165007 , 1.96898566034 ]
I have been trying to make a plot smoother, as is done here, but my Xs are datetime objects that are not compatible with linspace.
I convert the Xs to matplotlib dates:
Xnew = matplotlib.dates.date2num(X)
X_smooth = np.linspace(Xnew.min(), Xnew.max(), 10)
Y_smooth = spline(Xnew, Y, X_smooth)
But then I get an empty plot, as my Y_smooth is
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
for some unknown reason.
How can I make this work?
EDIT
Here's what I get when I print the variables; I see nothing abnormal:
X : [datetime.date(2016, 7, 31), datetime.date(2016, 7, 30), datetime.date(2016, 7, 29)]
X new: [ 736176. 736175. 736174.]
X new max: 736176.0
X new min: 736174.0
XSMOOTH [ 736174. 736174.22222222 736174.44444444 736174.66666667
736174.88888889 736175.11111111 736175.33333333 736175.55555556
736175.77777778 736176. ]
Y [711.74, 730.0, 698.0]
YSMOOTH [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Your X values are reversed: scipy.interpolate.spline requires the independent variable to be monotonically increasing. This method is also deprecated; use interp1d instead (see below).
>>> from scipy.interpolate import spline
>>> import numpy as np
>>> X = [736176.0, 736175.0, 736174.0] # <-- your original X is decreasing
>>> Y = [711.74, 730.0, 698.0]
>>> Xsmooth = np.linspace(736174.0, 736176.0, 10)
>>> spline(X, Y, Xsmooth)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Reverse X and Y first and it works:
>>> spline(
... list(reversed(X)), # <-- reverse order of X so also
... list(reversed(Y)), # <-- reverse order of Y to match
... Xsmooth
... )
array([ 698. , 262.18297973, 159.33767533, 293.62017489,
569.18656683, 890.19293934, 1160.79538066, 1285.149979 ,
1167.41282274, 711.74 ])
Note that many spline interpolation methods require X to be monotonically increasing:
UnivariateSpline
x : (N,) array_like - 1-D array of independent input data. Must be increasing.
InterpolatedUnivariateSpline
x : (N,) array_like - Input dimension of data points – must be increasing
The default order of scipy.interpolate.spline is cubic. Because there are only 3 data points, there are large differences between a cubic spline (order=3) and a quadratic spline (order=2). The plot below shows the difference between splines of different orders (note: 100 points were used to smooth the fitted curve).
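Since the original comparison image is not reproduced here, a sketch of how such a plot could be generated (my own illustration, using interp1d kinds rather than the deprecated spline):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

X = [736174.0, 736175.0, 736176.0]
Y = [698.0, 730.0, 711.74]
Xs = np.linspace(min(X), max(X), 100)  # 100 points for a smoother curve
for kind in ('linear', 'quadratic'):   # 'cubic' needs at least 4 points
    plt.plot(Xs, interp1d(X, Y, kind=kind)(Xs), label=kind)
plt.plot(X, Y, 'o', label='data')
plt.legend()
plt.show()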
The documentation for scipy.interpolate.spline is vague and suggests it may not be supported. For example, it is not listed on the scipy.interpolate main page or in the interpolation tutorial. The source for spline shows that it actually calls spleval and splmake, which are listed under Additional Tools as:
Functions existing for backward compatibility (should not be used in new code).
I would follow cricket_007's suggestion and use interp1d. It is the currently suggested method, it is very well documented with detailed examples in both the tutorial and the API, and by default it allows the independent variable to be unsorted (any order; see the assume_sorted argument in the API).
>>> from scipy.interpolate import interp1d
>>> f = interp1d(X, Y, kind='quadratic')
>>> f(Xsmooth)
array([ 711.74 , 720.14123457, 726.06049383, 729.49777778,
730.45308642, 728.92641975, 724.91777778, 718.4271605 ,
709.4545679 , 698. ])
It will also raise an error if there are too few data points for the requested spline order:
>>> f = interp1d(X, Y, kind='cubic')
ValueError: x and y arrays must have at least 4 entries
I wrote code that performs a spline interpolation:
import numpy as np
import scipy as sp
import scipy.interpolate

x1 = [ 0., 13.99576991, 27.99153981, 41.98730972, 55.98307963, 69.97884954, 83.97461944, 97.97038935, 111.9661593, 125.9619292, 139.9576991, 153.953469 ]
y1 = [ 1., 0.88675318, 0.67899118, 0.50012243, 0.35737022, 0.27081293, 0.18486778, 0.11043095, 0.08582272, 0.04946131, 0.04285015, 0.02901567]
x = np.array(x1)
y = np.array(y1)
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
new_x = np.linspace(x.min(), x.max(), new_length)
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
But in the generated dataset (new_x, new_y) the original points are eliminated; only the first and the last values are kept. I would like to keep the original points.
Right, linspace won't generate any of the values in x except the ones you pass to it (x.min() and x.max()).
I don't have a great snappy answer, but here is one way to do it:
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
interpolated_x = np.linspace(x.min(), x.max(), new_length - len(x) + 2)
new_x = np.sort(np.append(interpolated_x, x[1:-1])) # include the original points
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
This code uses:
np.linspace to create as many extra points as we need
np.append to combine the array of extra points with the original points from x
np.sort to put the combined array in order
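A quick sanity check (my own addition, assuming the arrays from the snippet above): every original x value should now appear in new_x.
print(np.all(np.isin(x, new_x)))  # True: the original points are kept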
If your data is uniform, this is another way:
import numpy as np

def interpolate_add(x_add, num):
    # insert num evenly spaced points between each pair of adjacent values
    x_add_ls = []
    for i in range(x_add.shape[0] - 1):
        x_add_ls += (np.linspace(x_add[i], x_add[i+1], num + 2)[0:-1]).tolist()
    x_add_ls.append(x_add[-1])
    return np.array(x_add_ls)

x = np.linspace(1, 5, 5)
print(x)
print(interpolate_add(x, 1))
it prints:
[1. 2. 3. 4. 5.]
[1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. ]
In the code, interpolate_add has two parameters:
x_add is your data (a numpy array of shape (N,)).
num is the number of points to insert between two adjacent original points.
For example, if your data is array([1, 3]) and num is 1, the result is array([1, 2, 3]).
Using some experimental data, I cannot for the life of me work out how to use splrep to create a B-spline. The data are here: http://ubuntuone.com/4ZFyFCEgyGsAjWNkxMBKWD
Here is an excerpt:
#Depth Temperature
1 14.7036
-0.02 14.6842
-1.01 14.7317
-2.01 14.3844
-3 14.847
-4.05 14.9585
-5.03 15.9707
-5.99 16.0166
-7.05 16.0147
and here's a plot of it with depth on y and temperature on x:
Here is my code:
import numpy as np
from scipy.interpolate import splrep, splev
tdata = np.genfromtxt('t-data.txt',
                      skip_header=1, delimiter='\t')
depth = tdata[:, 0]
temp = tdata[:, 1]
# Find the B-spline representation of 1-D curve:
tck = splrep(depth, temp)
### fails here with "Error on input data" returned. ###
I know I am doing something bleedingly stupid, but I just can't see it.
You just need to have your values sorted from smallest to largest :). It shouldn't be a problem for you @a different ben, but beware, readers from the future: depth[indices] will throw a TypeError if depth is a list instead of a numpy array (see the short illustration at the end of this answer)!
>>> indices = np.argsort(depth)
>>> depth = depth[indices]
>>> temp = temp[indices]
>>> splrep(depth, temp)
(array([-7.05, -7.05, -7.05, -7.05, -5.03, -4.05, -3. , -2.01, -1.01,
1. , 1. , 1. , 1. ]), array([ 16.0147 , 15.54473241, 16.90606794, 14.55343229,
15.12525673, 14.0717599 , 15.19657895, 14.40437622,
14.7036 , 0. , 0. , 0. , 0. ]), 3)
Hat tip to @FerdinandBeyer for suggesting argsort instead of my ugly "zip the values, sort the zip, re-assign the values" method.
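A minimal illustration of the TypeError caveat above (my own addition):
>>> import numpy as np
>>> depth_list = [1.0, -0.02, -1.01]
>>> indices = np.argsort(depth_list)
>>> depth_list[indices]
TypeError: only integer scalar arrays can be converted to a scalar index
>>> np.asarray(depth_list)[indices]  # convert to an ndarray first
array([-1.01, -0.02,  1.  ])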