What is the equivalent interpolation from MATLAB in Python?

I want to rewrite some code from MATLAB in Python. In MATLAB I have the following:
interp1(TMP.time_hor, TMP.lane_hor, TMP.travel_time, 'next') % matlab
Which interpolation is meant by 'next'? The default is usually linear. Is there a NumPy equivalent? For example:
numpy.interp(x, xp, fp, left=None, right=None, period=None) # python
which interpolates linearly...

Which interpolation is meant by 'next'? The default is usually linear. Is there a NumPy equivalent?
The interpolation method 'next' interpolates to the next data point in the data set (see: https://www.mathworks.com/help/matlab/ref/interp1.html).
Looking at NumPy's documentation (https://numpy.org/doc/stable/reference/generated/numpy.interp.html), it appears as though they use linear interpolation, so if you want the same output, you just need to specify this in your MATLAB command, like this:
interp1(TMP.time_hor, TMP.lane_hor, TMP.travel_time, 'linear')
That being said, 'linear' is the default interpolation method for interp1, so you can also simply leave that argument out and use the command:
interp1(TMP.time_hor, TMP.lane_hor, TMP.travel_time)
I hope this helps!
Edit: I just realized I had your question backwards: you want to interpolate using the 'next' method in Python instead. Here's how I'd do it:
import numpy as np
from scipy import interpolate  # import the submodule explicitly

# Generate sample data
x = np.linspace(0, 1, 10)
y = np.exp(x)

# Create an interpolation function using the 'next' method
f = interpolate.interp1d(x, y, kind='next')

# Create the resampled range
x_resampled = np.linspace(x[0], x[-1], 100)

# Here's your interpolated data
y_interpolated = f(x_resampled)
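If you want to stay NumPy-only rather than pulling in SciPy, 'next' interpolation can also be sketched with np.searchsorted. Note that interp_next is a hypothetical helper, not a NumPy function, and the sketch assumes xp is sorted ascending with all query points inside its range:

```python
import numpy as np

# Hypothetical helper: 'next' interpolation via np.searchsorted.
# Each query point takes the y-value of the next data point at or
# after it. Assumes xp is sorted ascending and queries lie in range.
def interp_next(x, xp, fp):
    idx = np.searchsorted(xp, x, side='left')
    idx = np.clip(idx, 0, len(fp) - 1)
    return np.asarray(fp)[idx]

xp = np.array([0.0, 1.0, 2.0, 3.0])
fp = np.array([10.0, 20.0, 30.0, 40.0])
print(interp_next(np.array([0.5, 1.0, 2.2]), xp, fp))  # [20. 20. 40.]
```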


How do I substitute an array into a SymPy equation so I can plot the equation?

I have used dsolve to generate a result in the form of an equation in the variable result.
I want to evaluate this equation for the range of values in the x_val array so I can plot it.
I cannot seem to find a way to apply the function to the x values.
import sympy as sy
import numpy as np
t,m,k,c = sy.symbols('t m k c')
x = sy.Function('x')
diffeq = sy.Eq(x(t).diff(t, t) - x(t), sy.cos(t))
result = sy.dsolve(diffeq, ics={x(0): 0, sy.diff(x(t), t).subs(t,0): 0})
print(result)
x_val=np.linspace(0,10,100)
relent95's answer does the job in this case. However, evalf evaluates a symbolic expression into a SymPy Float, which is still a symbolic object. This kind of evaluation is slow: if you need to evaluate a symbolic expression over many points, you have two options (in this case).
Use sympy's plot command:
from sympy import plot
plot(result.rhs, (t, 0, 10))
Convert your symbolic expression to a numerical function with lambdify and evaluate it over a NumPy array. In contrast to the previous approach, here you have full control over the appearance of your plot:
import matplotlib.pyplot as plt
from sympy import lambdify
f = lambdify(t, result.rhs)
ts = np.linspace(0, 10, 100)
plt.figure()
plt.plot(ts, f(ts))
plt.show()
You can use evalf() on the right-hand side of the returned equality, like this:
...
import matplotlib.pyplot as plt
...
ts = np.linspace(0, 10, 100)
xs = [result.rhs.evalf(subs={t: ti}) for ti in ts]  # ti avoids shadowing the symbol t
plt.plot(ts, xs)
plt.show()

`ValueError: A value in x_new is above the interpolation range.` - what other reasons than not ascending values?

I receive this error from SciPy's interp1d function. Normally, this error would be generated if x was not monotonically increasing.
import numpy as np
import scipy.interpolate as spi

def refine(coarsex, coarsey, step):
    finex = np.arange(min(coarsex), max(coarsex) + step, step)
    intfunc = spi.interp1d(coarsex, coarsey, axis=0)
    finey = intfunc(finex)
    return finex, finey

for num, tfile in enumerate(files):
    tfile = tfile.dropna(how='any')
    x = np.array(tfile['col1'])
    y = np.array(tfile['col2'])
    finex, finey = refine(x, y, 0.01)
The code is correct, because it successfully worked on 6 data files and threw the error for the 7th. So there must be something wrong with the data. But as far as I can tell, the data increase all the way down.
I am sorry for not providing an example, because I am not able to reproduce the error on an example.
There are two things that could help me:
1. Some brainstorming: if the data are indeed monotonically increasing, what else could produce this error? Another hint, regarding the decimals, could be in this question, but I think my solution (the min and max of x) is robust enough to avoid it. Or isn't it?
2. Is it possible (and how?) to return the value of x_new and its index when the ValueError: A value in x_new is above the interpolation range. is thrown, so that I could actually see where in the file the problem is?
UPDATE
So the problem is that, for some reason, max(finex) is larger than max(coarsex) (one ends in .x39 and the other in .x4). I hoped rounding the original values to 2 significant digits would solve the problem, but it didn't: fewer digits are displayed, but the computation still uses the undisplayed ones. What can I do about it?
If you are running SciPy v0.17.0 or newer, then you can pass fill_value='extrapolate' to spi.interp1d, and it will extrapolate to accommodate those values of yours that lie outside the interpolation range. So define your interpolation function like so:
intfunc = spi.interp1d(coarsex, coarsey,axis=0, fill_value="extrapolate")
Be forewarned, however!
Depending on what your data look like and the type of interpolation you are performing, the extrapolated values can be erroneous. This is especially true if you have noisy or non-monotonic data. In your case you might be OK, because your x_new value is only slightly beyond your interpolation range.
Here's a simple demonstration of how this feature can work nicely but also give erroneous results.
import scipy.interpolate as spi
import numpy as np
x = np.linspace(0,1,100)
y = x + np.random.randint(-1,1,100)/100
x_new = np.linspace(0,1.1,100)
intfunc = spi.interp1d(x,y,fill_value="extrapolate")
y_interp = intfunc(x_new)
import matplotlib.pyplot as plt
plt.plot(x_new,y_interp,'r', label='interp/extrap')
plt.plot(x,y, 'b--', label='data')
plt.legend()
plt.show()
So the interpolated portion (in red) worked well, but the extrapolated portion clearly fails to follow the otherwise linear trend in this data because of the noise. So have some understanding of your data and proceed with caution.
A quick test of your finex calculation shows that it can (always?) get into the extrapolation region.
In [124]: coarsex=np.random.rand(100)
In [125]: max(coarsex)
Out[125]: 0.97393109991816473
In [126]: step=.01; finex=np.arange(min(coarsex), max(coarsex)+step, step); (max(finex), max(coarsex))
Out[126]: (0.98273730602114795, 0.97393109991816473)
In [127]: step=.001; finex=np.arange(min(coarsex), max(coarsex)+step, step); (max(finex), max(coarsex))
Out[127]: (0.97473730602114794, 0.97393109991816473)
Again, this is a quick test, and it may be missing some critical step or value.
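One way to sidestep np.arange's floating-point overshoot shown above is to build the fine grid with np.linspace, which includes the endpoint exactly. A minimal sketch with made-up data:

```python
import numpy as np

# Made-up coarse grid; np.linspace includes the endpoint exactly,
# so the fine grid can never overshoot max(coarsex).
coarsex = np.array([0.0, 0.3, 0.7, 1.0])
step = 0.01
n = int(round((coarsex.max() - coarsex.min()) / step)) + 1
finex = np.linspace(coarsex.min(), coarsex.max(), n)
print(finex.max() <= coarsex.max())  # True: no point beyond the data range
```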

Scipy's RectBivariateSpline returns wrong value?

Trying to interpolate data from a regular input grid, and came across this in the documentation for scipy.interpolate.interp2d:
See also: RectBivariateSpline - much faster 2D interpolation if your input data is on a grid
So I tried using scipy.interpolate.RectBivariateSpline instead of interp2d. Docs for both functions seem very similar, so I expected this to produce similar results:
import numpy as np
from scipy.interpolate import RectBivariateSpline, interp2d
from .constants import data
x_coords = y_coords = np.arange(data.shape[0]) # Square array
interp_fun = interp2d(x_coords, y_coords, data)
bivar_fun = RectBivariateSpline(x_coords, y_coords, data)
data[250, 60] # 76.1451873779
interp_fun(60, 250) # 76.14518738
bivar_fun(60, 250, grid=False) # 345.24444
Am I calling this wrong? I have no idea why the interpolation based on RectBivariateSpline is so far off.
I did suspect that maybe RectBivariateSpline operates on a Cartesian grid, so I inverted the y-axis of the input data, but still no luck.
Right, just before submitting this I thought I should try calling bivar_fun(y, x) instead of bivar_fun(x, y), and things suddenly work:
data[250, 60] # 76.1451873779
interp_fun(60, 250) # 76.14518738
bivar_fun(250, 60, grid=False) # [ 76.14518738]
Still not quite sure why, because the first arguments to interp_fun and bivar_fun should be the same:
RectBivariateSpline.__call__(x, y, mth=None, dx=0, dy=0, grid=True)
RectBivariateSpline Docs
interp2d.__call__(x, y, dx=0, dy=0)
Interp2d Docs
There's also a related issue on Github: https://github.com/scipy/scipy/issues/3164
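The likely explanation is the axis convention: per the SciPy docs, interp2d(x, y, z) expects z.shape == (len(y), len(x)), while RectBivariateSpline(x, y, z) expects z.shape == (len(x), len(y)), so its first evaluation argument indexes the first axis of the array. A minimal sketch of the RectBivariateSpline convention, with made-up data (kx=ky=1 only because the grid is tiny):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Made-up 3x4 grid of values.
data = np.arange(12, dtype=float).reshape(3, 4)
rows = np.arange(3)  # coordinates along the FIRST axis of data
cols = np.arange(4)  # coordinates along the SECOND axis of data

spline = RectBivariateSpline(rows, cols, data, kx=1, ky=1)
# Evaluation is spline(row, col), matching data[row, col]:
print(float(spline(2, 1, grid=False)))  # 9.0 == data[2, 1]
```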

How to make user defined functions for binned_statistic

I am using the SciPy stats package to take statistics along an axis, but I am having trouble computing the percentile statistic using binned_statistic. In the generalized code below, I am trying to take the 10th percentile of a dataset with x, y values within a series of x bins, and it fails.
I can of course use the built-in options, like 'median', and even the NumPy standard deviation using np.std. However, I cannot figure out how to use np.percentile, because it requires 2 arguments (e.g. np.percentile(y, 10)), and then it gives me a ValueError: statistic not understood error.
import numpy as np
import scipy.stats as scist
y_median = scist.binned_statistic(x,y,statistic='median',bins=20,range=[(0,5)])[0]
y_std = scist.binned_statistic(x,y,statistic=np.std,bins=20,range=[(0,5)])[0]
y_10 = scist.binned_statistic(x,y,statistic=np.percentile(10),bins=20,range=[(0,5)])[0]
print y_median
print y_std
print y_10
I am at a loss and have even played around with user-defined functions like this, but with no luck:
def percentile10():
    return np.percentile(y, 10)
Any help is greatly appreciated.
Thanks.
The problem with the function you defined is that it takes no arguments at all! It needs to take a y argument that corresponds to your sample, like this:
def percentile10(y):
    return np.percentile(y, 10)
You could also use a lambda function for brevity:
scist.binned_statistic(x, y, statistic=lambda y: np.percentile(y, 10), bins=20, range=[(0, 5)])[0]
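Putting that together, here is a self-contained sketch of the lambda approach; the data are synthetic and purely for illustration:

```python
import numpy as np
import scipy.stats as scist

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 1000)
y = x + rng.normal(0, 0.1, 1000)

# Pass a one-argument callable; binned_statistic calls it on each
# bin's sample of y values.
y_10 = scist.binned_statistic(
    x, y,
    statistic=lambda v: np.percentile(v, 10),
    bins=20, range=[(0, 5)],
)[0]
print(y_10.shape)  # (20,)
```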

Why does InterpolatedUnivariateSpline return nan values

I have some data, y vs x, which I would like to interpolate at a finer resolution xx using a cubic spline.
Here is my dataset:
import numpy as np
print np.version.version   # 1.9.2
import scipy
print scipy.version.version   # 0.15.1
x = np.array([0.5372973, 0.5382103, 0.5392305, 0.5402197, 0.5412042, 0.54221, 0.543209,
0.5442277, 0.5442277, 0.5452125, 0.546217, 0.5472153, 0.5482086,
0.5492241, 0.5502117, 0.5512249, 0.5522136, 0.5532056, 0.5532056,
0.5542281, 0.5552039, 0.5562125, 0.5567836])
y = np.array([0.01, 0.03108, 0.08981, 0.18362, 0.32167, 0.50941, 0.72415, 0.90698,
0.9071, 0.97955, 0.99802, 1., 0.97863, 0.9323, 0.85344, 0.72936,
0.56413, 0.36997, 0.36957, 0.17623, 0.05922, 0.0163, 0.01, ])
xx = np.array([0.5372981, 0.5374106, 0.5375231, 0.5376356, 0.5377481, 0.5378606,
0.5379731, 0.5380856, 0.5381981, 0.5383106, 0.5384231, 0.5385356,
0.5386481, 0.5387606, 0.5388731, 0.5389856, 0.5390981, 0.5392106,
0.5393231, 0.5394356, 0.5395481, 0.5396606, 0.5397731, 0.5398856,
0.5399981, 0.5401106, 0.5402231, 0.5403356, 0.5404481, 0.5405606,
0.5406731, 0.5407856, 0.5408981, 0.5410106, 0.5411231, 0.5412356,
0.5413481, 0.5414606, 0.5415731, 0.5416856, 0.5417981, 0.5419106,
0.5420231, 0.5421356, 0.5422481, 0.5423606, 0.5424731, 0.5425856,
0.5426981, 0.5428106, 0.5429231, 0.5430356, 0.5431481, 0.5432606,
0.5433731, 0.5434856, 0.5435981, 0.5437106, 0.5438231, 0.5439356,
0.5440481, 0.5441606, 0.5442731, 0.5443856, 0.5444981, 0.5446106,
0.5447231, 0.5448356, 0.5449481, 0.5450606, 0.5451731, 0.5452856,
0.5453981, 0.5455106, 0.5456231, 0.5457356, 0.5458481, 0.5459606,
0.5460731, 0.5461856, 0.5462981, 0.5464106, 0.5465231, 0.5466356,
0.5467481, 0.5468606, 0.5469731, 0.5470856, 0.5471981, 0.5473106,
0.5474231, 0.5475356, 0.5476481, 0.5477606, 0.5478731, 0.5479856,
0.5480981, 0.5482106, 0.5483231, 0.5484356, 0.5485481, 0.5486606,
0.5487731, 0.5488856, 0.5489981, 0.5491106, 0.5492231, 0.5493356,
0.5494481, 0.5495606, 0.5496731, 0.5497856, 0.5498981, 0.5500106,
0.5501231, 0.5502356, 0.5503481, 0.5504606, 0.5505731, 0.5506856,
0.5507981, 0.5509106, 0.5510231, 0.5511356, 0.5512481, 0.5513606,
0.5514731, 0.5515856, 0.5516981, 0.5518106, 0.5519231, 0.5520356,
0.5521481, 0.5522606, 0.5523731, 0.5524856, 0.5525981, 0.5527106,
0.5528231, 0.5529356, 0.5530481, 0.5531606, 0.5532731, 0.5533856,
0.5534981, 0.5536106, 0.5537231, 0.5538356, 0.5539481, 0.5540606,
0.5541731, 0.5542856, 0.5543981, 0.5545106, 0.5546231, 0.5547356,
0.5548481, 0.5549606, 0.5550731, 0.5551856, 0.5552981, 0.5554106,
0.5555231, 0.5556356, 0.5557481, 0.5558606, 0.5559731, 0.5560856,
0.5561981, 0.5563106, 0.5564231, 0.5565356, 0.5566481, 0.5567606])
I am trying to fit using SciPy's InterpolatedUnivariateSpline method, with a 3rd-order spline (k=3), extrapolated as zeros (ext='zeros'):
import scipy.interpolate as interp
yspline = interp.InterpolatedUnivariateSpline(x,y, k=3, ext='zeros')
yvals = yspline(xx)
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'ko', label='Values')
ax.plot(xx, yvals, 'b-.', lw=2, label='Spline')
plt.xlim([min(x), max(x)])
However, my spline returns NaN values :(
Is there a reason? I am pretty sure my x values are all increasing, so I am stumped as to why this is happening. I have many other datasets that I fit using this method, and it only fails on this specific set of data.
Any help is greatly appreciated.
Thank you for reading.
EDIT!
The solution was that I have duplicate x values, with differing y values!
For this interpolation, you should rather use scipy.interpolate.interp1d with the argument kind='cubic' (see a related SO question).
I have yet to find a use case where InterpolatedUnivariateSpline can be used in practice (or maybe I just don't understand its purpose). With your code, the interpolation works but shows extremely strong oscillations, making it unusable, which is typically the result I was getting with this interpolation method in the past. With a lower-order spline (e.g. k=1) it works better, but then you lose the advantage of cubic interpolation.
I've also encountered the problem of InterpolatedUnivariateSpline returning NaN values. But in my case the reason was not duplicates in the x array; it was that the values in x were decreasing, while the docs state that the values "must be increasing".
So, in such a case, instead of the original x and y, one must supply them reversed: x[::-1] and y[::-1].
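Following up on the edit above (duplicate x values with differing y values), one hedged sketch, with made-up data, of dropping duplicates before fitting, keeping the first occurrence of each x:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# Made-up data with a duplicated x value at 0.5.
x = np.array([0.0, 0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.0, 0.25, 0.26, 1.0, 2.25, 4.0])

# Keep only strictly increasing x (first occurrence of each duplicate),
# since the spline requires strictly increasing abscissae.
keep = np.concatenate(([True], np.diff(x) > 0))
spline = InterpolatedUnivariateSpline(x[keep], y[keep], k=3)
print(np.isnan(spline(np.linspace(0, 2, 50))).any())  # False
```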
