I have been trying to smooth my plot like it is done here, but my Xs are datetime objects that are not compatible with linspace.
I convert the Xs to matplotlib dates:
import numpy as np
import matplotlib.dates
from scipy.interpolate import spline

Xnew = matplotlib.dates.date2num(X)
X_smooth = np.linspace(Xnew.min(), Xnew.max(), 10)
Y_smooth = spline(Xnew, Y, X_smooth)
But then I get an empty plot, as my Y_smooth is
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
for some unknown reason.
How can I make this work?
EDIT
Here's what I get when I print the variables; I see nothing abnormal:
X : [datetime.date(2016, 7, 31), datetime.date(2016, 7, 30), datetime.date(2016, 7, 29)]
X new: [ 736176. 736175. 736174.]
X new max: 736176.0
X new min: 736174.0
XSMOOTH [ 736174. 736174.22222222 736174.44444444 736174.66666667
736174.88888889 736175.11111111 736175.33333333 736175.55555556
736175.77777778 736176. ]
Y [711.74, 730.0, 698.0]
YSMOOTH [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Your X values are reversed: scipy.interpolate.spline requires the independent variable to be monotonically increasing. Note also that this method is deprecated; use interp1d instead (see below).
>>> from scipy.interpolate import spline
>>> import numpy as np
>>> X = [736176.0, 736175.0, 736174.0] # <-- your original X is decreasing
>>> Y = [711.74, 730.0, 698.0]
>>> Xsmooth = np.linspace(736174.0, 736176.0, 10)
>>> spline(X, Y, Xsmooth)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Reverse X and Y first and it works:
>>> spline(
... list(reversed(X)), # <-- reverse order of X so also
... list(reversed(Y)), # <-- reverse order of Y to match
... Xsmooth
... )
array([ 698. , 262.18297973, 159.33767533, 293.62017489,
569.18656683, 890.19293934, 1160.79538066, 1285.149979 ,
1167.41282274, 711.74 ])
Note that many spline interpolation methods require X to be monotonically increasing:
UnivariateSpline
x : (N,) array_like - 1-D array of independent input data. Must be increasing.
InterpolatedUnivariateSpline
x : (N,) array_like - Input dimension of data points – must be increasing
The default order of scipy.interpolate.spline is cubic. Because there are only 3 data points, there are large differences between a cubic spline (order=3) and a quadratic spline (order=2). The plot below shows the difference between splines of different orders; note that 100 points were used to make the fitted curve smoother.
The documentation for scipy.interpolate.spline is vague and suggests it may not be supported. For example, it is not listed on the scipy.interpolate main page or in the interpolation tutorial. The source for spline shows that it actually calls spleval and splmake, which are listed under Additional Tools as:
Functions existing for backward compatibility (should not be used in new code).
I would follow cricket_007's suggestion and use interp1d. It is the currently suggested method, it is very well documented with detailed examples in both the tutorial and API, and it allows the independent variable to be unsorted (any order) by default (see assume_sorted argument in API).
>>> from scipy.interpolate import interp1d
>>> f = interp1d(X, Y, kind='quadratic')
>>> f(Xsmooth)
array([ 711.74 , 720.14123457, 726.06049383, 729.49777778,
730.45308642, 728.92641975, 724.91777778, 718.4271605 ,
709.4545679 , 698. ])
Also, it will raise an error if the data are rank deficient (too few points for the requested spline order):
>>> f = interp1d(X, Y, kind='cubic')
ValueError: x and y arrays must have at least 4 entries
Related
While using scipy's splrep function to fit a cubic B-spline to the data points given below, the output comes out as an array of zeros and it reports an error with the input data. I have checked the conditions written in the docs and the input seems sane accordingly.
import scipy.interpolate as intp

knot = [70.0]
X = [65., 67.5, 70., 72.5]
Y = [70.9277775, 50.40025663, 42.45372799, 57.39316434]
Weight = [0.13514246, 0.33885943, 0.87606185, 0.31531958]
SplineOutput = intp.splrep(X, Y, task=-1, t=knot, full_output=1, w=Weight)
SplineOutput
>>>((array([65. , 65. , 65. , 65. , 70. , 72.5, 72.5, 72.5, 72.5]), array([0., 0., 0., 0., 0., 0., 0., 0., 0.]), 3), 0.0, 10, 'Error on input data')
Any help about the source of this error and its cure would be appreciated. Thanks in advance!
From the documentation, under Notes
If provided, knots t must satisfy the Schoenberg-Whitney conditions, i.e., there must be a subset of data points x[j] such that t[j] < x[j] < t[j+k+1], for j=0, 1,...,n-k-2.
This effectively means that if k is 3 (the default), n must be at least 5. In your case n is 4, hence the error. Either provide an additional entry to x, y and w, or decrease k. If you opt for the latter, keep the following in mind:
k : int, optional
The degree of the spline fit. It is recommended to use cubic splines. Even values of k should be avoided especially with small s values. 1 <= k <= 5
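As a sketch of the second option (lowering k), the same call with k=2 satisfies the Schoenberg-Whitney conditions for these four points and the single interior knot; this is an illustrative example, not part of the original question:

```python
import numpy as np
import scipy.interpolate as intp

knot = [70.0]
X = [65., 67.5, 70., 72.5]
Y = [70.9277775, 50.40025663, 42.45372799, 57.39316434]
Weight = [0.13514246, 0.33885943, 0.87606185, 0.31531958]

# With k=2 the weighted least-squares fit with a fixed knot at 70.0 succeeds:
# four data points are enough for the four resulting B-spline coefficients
tck, fp, ier, msg = intp.splrep(X, Y, task=-1, t=knot, k=2, full_output=1, w=Weight)
```

With full_output=1, ier <= 0 indicates success, while ier = 10 is the "Error on input data" case from the question.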
I need the inverse Fourier transform of a complex array. ifft should return a real array, but it returns another complex array.
In MATLAB, a = ifft(fft(a)) returns a, but in Python it does not work like that.
a = np.arange(6)
m = ifft(fft(a))
m # Google says m should = a, but m is complex
Output :
array([0.+0.00000000e+00j, 1.+3.70074342e-16j, 2.+0.00000000e+00j,
3.-5.68396583e-17j, 4.+0.00000000e+00j, 5.-3.13234683e-16j])
The imaginary part is the result of floating-point rounding error. If it is very small, it can simply be dropped.
NumPy has a built-in function, real_if_close, to do so:
>>> np.real_if_close(np.fft.ifft(np.fft.fft(a)))
array([0., 1., 2., 3., 4., 5.])
You can read about floating system limitations here:
https://docs.python.org/3.8/tutorial/floatingpoint.html
If the imaginary part is close to zero, you can discard it:
import numpy as np
arr = np.array(
[
0.0 + 0.00000000e00j,
1.0 + 3.70074342e-16j,
2.0 + 0.00000000e00j,
3.0 - 5.68396583e-17j,
4.0 + 0.00000000e00j,
5.0 - 3.13234683e-16j,
]
)
if all(np.isclose(arr.imag, 0)):
    arr = arr.real
# [ 0. 1. 2. 3. 4. 5.]
(that's what real_if_close does in one line as in R2RT's answer).
You can test like this:
import numpy as np
from numpy import fft
a = np.arange(6)
print(a)
f = np.fft.fft(a)
print(f)
m = np.fft.ifft(f)
print(m)
[0 1 2 3 4 5]
[15.+0.j -3.+5.19615242j -3.+1.73205081j -3.+0.j
-3.-1.73205081j -3.-5.19615242j]
[0.+0.j 1.+0.j 2.+0.j 3.+0.j 4.+0.j 5.+0.j]
To get the real part only you can use:
print(m.real) # [0. 1. 2. 3. 4. 5.]
You are mistaken that "ifft should return a real array". If you want a real-valued output (i.e. you have the FFT of real data and now want to perform the inverse) you should use irfft.
See this example from the docs:
>>> np.fft.ifft([1, -1j, -1, 1j])
array([ 0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j]) #Output is complex which is correct
>>> np.fft.irfft([1, -1j, -1])
array([ 0., 1., 0., 0.]) #Output is real valued
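Applied to the original round-trip question: if the time-domain signal is known to be real, pairing rfft with irfft avoids the tiny imaginary residue entirely. A sketch using the same a = np.arange(6) example:

```python
import numpy as np

a = np.arange(6)

# rfft keeps only the non-redundant half of the spectrum of a real signal;
# irfft inverts it and returns a real-valued array by construction
f = np.fft.rfft(a)
m = np.fft.irfft(f)  # dtype is float, no +0j terms

# compare with the complex round trip from the question
m_complex = np.fft.ifft(np.fft.fft(a))
```

Note that irfft assumes an even output length by default; pass irfft(f, n=len(a)) explicitly when the signal length is odd.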
I am doing a cubic spline interpolation using scipy.interpolate.splrep as following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 10)
y = np.sin(x)
tck = scipy.interpolate.splrep(x, y, task=0, s=0)
F = scipy.interpolate.PPoly.from_spline(tck)
I print t and c:
print F.x
array([ 0. , 0. , 0. , 0. ,
2.22222222, 3.33333333, 4.44444444, 5.55555556,
6.66666667, 7.77777778, 10. , 10. ,
10. , 10. ])
print F.c
array([[ -1.82100357e-02, -1.82100357e-02, -1.82100357e-02,
-1.82100357e-02, 1.72952212e-01, 1.26008293e-01,
-4.93704109e-02, -1.71230879e-01, -1.08680287e-01,
1.00658224e-01, 1.00658224e-01, 1.00658224e-01,
1.00658224e-01],
[ -3.43151441e-01, -3.43151441e-01, -3.43151441e-01,
-3.43151441e-01, -4.64551679e-01, 1.11955696e-01,
5.31983340e-01, 3.67415303e-01, -2.03354294e-01,
-5.65621916e-01, 1.05432909e-01, 1.05432909e-01,
1.05432909e-01],
[ 1.21033389e+00, 1.21033389e+00, 1.21033389e+00,
1.21033389e+00, -5.84561936e-01, -9.76335250e-01,
-2.60847433e-01, 7.38484392e-01, 9.20774403e-01,
6.63563923e-02, -9.56285846e-01, -9.56285846e-01,
-9.56285846e-01],
[ -4.94881722e-18, -4.94881722e-18, -4.94881722e-18,
-4.94881722e-18, 7.95220057e-01, -1.90567963e-01,
-9.64317117e-01, -6.65101515e-01, 3.74151231e-01,
9.97097891e-01, -5.44021111e-01, -5.44021111e-01,
-5.44021111e-01]])
So I had supplied the x array as :
array([ 0. , 1.11111111, 2.22222222, 3.33333333,
4.44444444, 5.55555556, 6.66666667, 7.77777778,
8.88888889, 10. ])
Q.1: The F.x (knots) are not the same as original x array and has duplicate values (possibly to force first derivative to zero?). Also some values in x (1.11111111, 8.88888889) are missing in F.x. Any ideas?
Q.2 The shape of F.c is (4, 13). I understand that 4 comes from the fact that it is a cubic spline fit. But I do not know how to select the coefficients for each of the 9 sections that I want (from x = 0 to x = 1.11111, x = 1.11111 to x = 2.22222, and so on). Any help in extracting the coefficients for the different segments would be appreciated.
If you want to have the knots in specific locations along the curves you need to use the argument task=-1 of splrep and give an array of interior knots as the t argument.
The knots in t must satisfy the following condition:
If provided, knots t must satisfy the Schoenberg-Whitney conditions, i.e., there must be a subset of data points x[j] such that t[j] < x[j] < t[j+k+1], for j=0, 1,...,n-k-2.
See the documentation here.
Then you should get F.c of the following size (4, <length of t> + 2*(k+1)-1) corresponding to the consecutive intervals along the curve (k+1 knots are added at either end of the curve by splrep).
Try the following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 20)
y = np.sin(x)
t = np.linspace(0, 10, 10)
tck = scipy.interpolate.splrep(x, y, t=t[1:-1])
F = scipy.interpolate.PPoly.from_spline(tck)
print(F.x)
print(F.c)
# Accessing coeffs of nth segment: index = k + n - 1
# Eg. for second segment:
print(F.c[:,4])
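To double-check which column of F.c governs a given point, you can locate the interval with searchsorted and evaluate the local cubic by hand; PPoly stores each piece as sum over m of c[m, i] * (x - x[i])**(k - m). A hedged sketch built on the same fit as above:

```python
import numpy as np
import scipy.interpolate

x = np.linspace(0, 10, 20)
y = np.sin(x)
t = np.linspace(0, 10, 10)
tck = scipy.interpolate.splrep(x, y, t=t[1:-1])
F = scipy.interpolate.PPoly.from_spline(tck)

x0 = 3.7
# index of the breakpoint interval containing x0 (side='right' skips the
# zero-width intervals created by the repeated boundary knots)
i = np.searchsorted(F.x, x0, side='right') - 1
coeffs = F.c[:, i]  # highest power first

# evaluate the local cubic manually and compare with PPoly's own evaluation
val = sum(c * (x0 - F.x[i]) ** (3 - m) for m, c in enumerate(coeffs))
```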
I have a list of vectors (each vectors only contain 0 or 1) :
In [3]: allLabelPredict
Out[3]: array([[ 0., 0., 0., ..., 0., 0., 1.],
[ 0., 0., 0., ..., 0., 0., 1.],
[ 0., 0., 0., ..., 0., 0., 1.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 1.],
[ 0., 0., 0., ..., 0., 0., 1.]])
In [4]: allLabelPredict.shape
Out[4]: (5000, 190)
As you can see, I have 190 different vectors; each vector is the output of one classifier. Now I want to select some of these outputs based on the proximity of each vector to my original label:
In [7]: myLabel
Out[7]: array([ 0., 0., 0., ..., 1., 1., 1.])
In [8]: myLabel.shape
Out[8]: (5000,)
For this purpose I've defined two different criteria for each vector; Zero Hamming Distance and One Hamming Distance.
"One Hamming Distance": hamming distance between the sub-array of myLabel which are equal to "1" and sub-array of each vector (I have created sub-array of each vector by selecting value from each vector based on indices of "myLabel" where the value is '1')
"zero Hamming Distance": hamming distance between the sub-array of myLabel which are equal to "0" and sub-array of each vector (I have created sub-array of each vector by selecting value from each vector based on indices of "myLabel" where the value is '0')
To make it clearer, I will give you a small example:
MyLabel [1,1,1,1,0,0,0,0]
V1 [1,1,0,1,0,0,1,1]
sub-array1 [1,1,0,1]
sub-array0 [0,0,1,1]
"zero Hamming Distance": hamming(sub-array0, MyLabel[4:])
"one Hamming Distance": hamming(sub-array1, MyLabel[:4])
Now I want to select some vectors from allLabelPredict based on "One Hamming Distance" and "Zero Hamming Distance". I want to select those vectors for which both criteria are the lowest amongst all vectors.
If that request is not possible, how can I sort so that the vectors are ordered first by "One Hamming Distance" and then by "Zero Hamming Distance"?
OK, so first I'd split up the entire allLabelPredict into two subarrays based on the values in myLabel:
import numpy as np
allLabelPredict = np.random.randint(0, 2, (5000, 190))
myLabel = np.random.randint(0, 2, 5000)
sub0 = allLabelPredict[myLabel==0]
sub1 = allLabelPredict[myLabel==1]
ham0 = np.abs(sub0 - 0).mean(0)
ham1 = np.abs(sub1 - 1).mean(0)
hamtot = np.abs(allLabelPredict - myLabel[:, None]).mean(0) # if they're not split
This is the same as scipy.spatial.distance.hamming, but that can only be applied to one vector at a time:
>>> np.allclose(scipy.spatial.distance.hamming(allLabelPredict[:,0], myLabel),
... np.abs(allLabelPredict[:,0] - myLabel).mean(0))
True
Now, the indices in either ham array will be the indices in the second axis of the allLabelPredict array. If you want to sort your vectors by hamming distance:
sortby0 = allLabelPredict[:, ham0.argsort()]
sortby1 = allLabelPredict[:, ham1.argsort()]
Or if you want the lowest zero (or one) hamming, you would look at
best0 = allLabelPredict[:, ham0.argmin()]
best1 = allLabelPredict[:, ham1.argmin()]
Or if you want the lowest one hamming with zero hamming near 0.1, you could say something like
hamscore = (ham0 - 0.1)**2 + ham1**2
best = allLabelPredict[:, hamscore.argmin()]
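If you want the tie-breaking behaviour (sort by one-hamming, break ties by zero-hamming) in one shot, np.lexsort gives a lexicographic sort; note that lexsort treats its last key as the primary key. A sketch reusing the same ham0/ham1 construction:

```python
import numpy as np

rng = np.random.default_rng(0)
allLabelPredict = rng.integers(0, 2, (5000, 190)).astype(float)
myLabel = rng.integers(0, 2, 5000).astype(float)

# per-column hamming distances, as above
ham0 = np.abs(allLabelPredict[myLabel == 0] - 0).mean(0)
ham1 = np.abs(allLabelPredict[myLabel == 1] - 1).mean(0)

# lexsort sorts by the LAST key first, so this orders primarily by ham1
# and breaks ties with ham0
order = np.lexsort((ham0, ham1))
sortedVectors = allLabelPredict[:, order]
```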
The crux of the answer is this: use sorted(allLabelPredict, key=<criteria>).
It lets you sort the list based on the criteria you define as a function and pass to the key argument.
To do this, first let's convert your 190 vectors into pair of (0-H Dist, 1-H Dist). Then you'll have something like this:
(0.10, 0.15)
(0.12, 0.09)
(0.25, 0.03)
(0.14, 0.16)
(0.14, 0.11)
...
Next, we need to clarify what you meant by "both criteria for this vector be the lowest amongst others". In the above case, should we choose (0.25, 0.03)? Or is it (0.10, 0.15)? How about (0.14, 0.11)? Fortunately you already said that in this case, we need to prioritize 1-H Dist first. So we will choose (0.25, 0.03), is this correct? From your comments on @askewchan's answer it seems that you want the sort criteria to be flexible.
If that's so, then your first criterion that "both criteria for this vector be the lowest amongst others" is actually part of your second criterion, which is "sort based on One Hamming Distance, then by Zero Hamming Distance", since after the sorting the vector with lowest distance on both scores will be at the top anyway.
Hence we just need to sort based on 1-H Dist and then by 0-H Dist when the 1-H Dist score is the same. This sort criteria can be changed flexibly, as long as you already have the pair of scores.
Here is a sample code:
import numpy as np
from scipy.spatial.distance import hamming
def sort_criteria(pair_of_scores):
    score0, score1 = pair_of_scores
    return (score1, score0)  # Sort by 1-H, then by 0-H
    # The following would sort by Euclidean distance instead:
    # return score0**2 + score1**2
    # The following would select the vectors with score0 == 0.5 first, then sort by score1:
    # return (score1, score0) if np.abs(score0 - 0.5) < 1e-7 else (1 + score1, score0)

def main():
    allLabelPredict = np.asarray(np.random.randint(0, 2, (5, 10)), dtype=np.float64)
    myLabel = np.asarray(np.random.randint(0, 2, 10), dtype=np.float64)
    print(allLabelPredict)
    print(myLabel)
    allSub0 = allLabelPredict[:, myLabel == 0]
    allSub1 = allLabelPredict[:, myLabel == 1]
    # newer scipy requires explicit vectors as the second argument to hamming
    all_scores = [(hamming(sub0, np.zeros_like(sub0)), hamming(sub1, np.ones_like(sub1)))
                  for sub0, sub1 in zip(allSub0, allSub1)]
    print(all_scores)  # The (0-H, 1-H) score pairs
    all_scores = sorted(all_scores, key=sort_criteria)  # The sorting
    # all_scores = [pair for pair in all_scores if pair[0] == 0.5]  # For filtering
    print(all_scores)

if __name__ == '__main__':
    main()
Result:
[[ 1. 0. 0. 0. 0. 1. 1. 0. 1. 1.]
[ 1. 0. 0. 0. 1. 0. 1. 0. 0. 1.]
[ 0. 1. 1. 0. 1. 1. 1. 1. 1. 0.]
[ 0. 0. 1. 1. 1. 1. 1. 0. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 0. 0. 0. 0.]]
[ 1. 1. 1. 1. 1. 0. 1. 1. 0. 1.]
[(1.0, 0.625), (0.0, 0.5), (1.0, 0.375), (1.0, 0.375), (0.5, 0.375)]
[(0.5, 0.375), (1.0, 0.375), (1.0, 0.375), (0.0, 0.5), (1.0, 0.625)]
You just need to change the sort_criteria function to change your criteria.
If you sort first based on one criterion, then the other, the first entry in that sort will be the only one that could simultaneously minimize both criteria.
You can do that operation with numpy using argsort. This requires you to make a numpy array that has keys (a structured array). I will assume that you have arrays called zeroHamming and oneHamming.
# make an array of the distances with keys
# these must be input as pairs, not as columns
hammingDistances = np.array([(one, zero) for one, zero in zip(oneHamming, zeroHamming)],
                            dtype=[("one", "float"), ("zero", "float")])
# to see how the keys work, try:
print(hammingDistances['zero'])
# do a sort by oneHamming, then by zeroHamming
sortedIndsOneFirst = np.argsort(hammingDistances, order=['one', 'zero'])
# do a sort by zeroHamming, then by oneHamming
sortedIndsZeroFirst = np.argsort(hammingDistances, order=['zero', 'one'])
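A small worked example of the structured-array sort, with made-up distances just to show the ordering (vectors 1 and 2 tie on "one", and vector 2 wins because its "zero" distance is smaller):

```python
import numpy as np

# hypothetical distances for four vectors
oneHamming = [0.4, 0.1, 0.1, 0.3]
zeroHamming = [0.2, 0.5, 0.1, 0.3]

hammingDistances = np.array([(one, zero) for one, zero in zip(oneHamming, zeroHamming)],
                            dtype=[("one", "float"), ("zero", "float")])

# primary key 'one', ties broken by 'zero'
sortedIndsOneFirst = np.argsort(hammingDistances, order=['one', 'zero'])
```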
It's easier to work with as1 = allLabelPredict.T, because then as1[0] will be your first vector, as1[1] your second, etc. Then your hamming distance function is simply:
def ham(a1, b1): return sum(map(abs, a1-b1))
So, if you want the vectors that match your criterion, you can use a list comprehension:
vects = numpy.array([a for a in as1 if ham(a, myLabel) < 2])
where, myLabel is the vector you want to compare with.
Using some experimental data, I cannot for the life of me work out how to use splrep to create a B-spline. The data are here: http://ubuntuone.com/4ZFyFCEgyGsAjWNkxMBKWD
Here is an excerpt:
#Depth Temperature
1 14.7036
-0.02 14.6842
-1.01 14.7317
-2.01 14.3844
-3 14.847
-4.05 14.9585
-5.03 15.9707
-5.99 16.0166
-7.05 16.0147
and here's a plot of it with depth on y and temperature on x:
Here is my code:
import numpy as np
from scipy.interpolate import splrep, splev
tdata = np.genfromtxt('t-data.txt',
skip_header=1, delimiter='\t')
depth = tdata[:, 0]
temp = tdata[:, 1]
# Find the B-spline representation of 1-D curve:
tck = splrep(depth, temp)
### fails here with "Error on input data" returned. ###
I know I am doing something bleedingly stupid, but I just can't see it.
You just need to have your values sorted from smallest to largest :). It shouldn't be a problem for you, @a different ben, but beware, readers from the future: depth[indices] will throw a TypeError if depth is a list instead of a numpy array!
>>> indices = np.argsort(depth)
>>> depth = depth[indices]
>>> temp = temp[indices]
>>> splrep(depth, temp)
(array([-7.05, -7.05, -7.05, -7.05, -5.03, -4.05, -3. , -2.01, -1.01,
1. , 1. , 1. , 1. ]), array([ 16.0147 , 15.54473241, 16.90606794, 14.55343229,
15.12525673, 14.0717599 , 15.19657895, 14.40437622,
14.7036 , 0. , 0. , 0. , 0. ]), 3)
Hat tip to @FerdinandBeyer for the suggestion of argsort instead of my ugly "zip the values, sort the zip, re-assign the values" method.
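Putting the fix together with the excerpted data (the depth/temperature values are hand-copied from the excerpt in the question, so treat this as a sketch rather than the full dataset):

```python
import numpy as np
from scipy.interpolate import splrep, splev

# first few rows of t-data.txt from the question
depth = np.array([1, -0.02, -1.01, -2.01, -3, -4.05, -5.03, -5.99, -7.05])
temp = np.array([14.7036, 14.6842, 14.7317, 14.3844, 14.847,
                 14.9585, 15.9707, 16.0166, 16.0147])

# sort both arrays so depth increases, as splrep requires
indices = np.argsort(depth)
depth = depth[indices]
temp = temp[indices]

tck = splrep(depth, temp)  # interpolating cubic spline (s=0 when no weights given)
smooth_depth = np.linspace(depth.min(), depth.max(), 50)
smooth_temp = splev(smooth_depth, tck)
```

Since the default is an interpolating spline, the fitted curve passes through every data point; plot smooth_temp against smooth_depth to see the smoothed profile.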