Excel-like Interpolation in Python - python

Plotting my data in excel as a scatter plot with smooth line and markers produces the type of figure I'm expecting. Image of Excel plots:
However when trying to plot the data in matplotlib I'm running into some issues with interpolation. I'm using the interpolation package from SciPy, I've tried a range of different interpolation methods including spline interpolation and BarycentricInterpolator as suggested previously. These plots are obviously very different to the excel produced plots however:
I've tried different smoothing and k values for spline interpolation, while the curve changes the root problem still exists.
How would I be able to produce a fitted curve similar to the excel-produced plots?
Thanks

The problem is that you interpolate the data on a linear scale but expect the outcome to look smooth on a logarithmic scale.
The idea would therefore be perform the interpolation on a log scale already by transforming the data to its logarithm first and then perform the interpolation. You can then transform it back to linear scale such that you can plot it on a log scale again.
from scipy.interpolate import interp1d, Akima1DInterpolator
import numpy as np
import matplotlib.pyplot as plt
x = np.array([0.02,0.2,2,20,200])
y = np.array([700,850,680,410, 700])
plt.plot(x,y, marker="o", ls="")
sx=np.log10(x)
xi_ = np.linspace(sx.min(),sx.max(), num=201)
xi = 10**(xi_)
f = interp1d(sx,y, kind="cubic")
yi = f(xi_)
plt.plot(xi,yi, label="cubic spline")
f2 = Akima1DInterpolator(sx, y)
yi2 = f2(xi_)
plt.plot(xi,yi2, label="Akima")
plt.gca().set_xscale("log")
plt.legend()
plt.show()

Related

Smooth Contourf plot completely filled

I have the data with (X,Y,Z) values. I tried to make a density plot with Z values for intensity. However the plot I get is not smooth and and has polytope i.e not completely filled.
The following is the code with the Data
but I want to obtain smooth and completely filled plot
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
import xlrd
location = "~/Desktop/Data.xlsx"
data = xlrd.open_workbook(location)
sheet = data.sheet_by_index(0)
sample=2000
x=np.array(sheet.col_values(0))[0:sample]
y=np.array(sheet.col_values(1))[0:sample]
z=np.hamming(9000)[0:sample]
print z
def plot_contour(x,y,z,resolution = 500,contour_method='cubic'):
resolution = str(resolution)+'j'
X,Y = np.mgrid[min(x):max(x):complex(resolution), min(y):max(y):complex(resolution)]
points = [[a,b] for a,b in zip(x,y)]
Z = griddata(points, z, (X, Y), method=contour_method)
return X,Y,Z
X,Y,Z = plot_contour(x,y,z,resolution = 500,contour_method='linear')
plt.style.context("seaborn-deep")
plt.contourf(X,Y,Z)
plt.colorbar()
plt.show()
This is the output:
This is what I want to achieve using contourplotf:
plt.contourf() is not the main problem here, it's just working with the data it has. The problem is the linear interpolation in scipy.interpolate.griddata().
I recommend not using griddata, but instead using one of the following methods:
scipy.interpolate.Rbf() — this is what you were using before (see my previous answer).
verde — an awesome gridding package.
sklearn.gaussian_process — or some other prediction model.
All of these methods will fill in the grid. If you plot the result with plt.imshow() you'll get the type of plot you show in your question — that is not a plt.contourf() plot.
Here's a demo notebook showing all of these approaches (including griddata).

Is there a way in lmfit to only show the curve of the fit?

So I wrote some code with the help of lmfit to fit a Gaussian curve on some histogram data. While the curve itself is fine, when I try to plot the results in matplotlib, it displays the fit along with the data points. In reality, I want to plot histogram bars with the curve fit. How do you do this? Or alternatively, is there a way in lmfit to only show the fit curve and then add the histogram plot and combine them together?
Relevant part of my code:
counts, bin_edges = np.histogram(some_array, bins=1000)
bin_widths = np.diff(bin_edges)
x = bin_edges[:-1] + (bin_widths / 2)
y = counts
mod = GaussianModel()
pars = mod.guess(y, x=x)
final_fit = mod.fit(y, pars, x=x)
final_fit.plot_fit()
plt.show()
Here's the graphed result:
Gaussian curve
lmfit's builtin plotting routines are minimal wrappers around matplotlib, intended to give reasonable default plots for many cases. They don't make histograms.
But the arrays are readily available and using matplotlib to make a histogram is easy. I think all you need is:
import matplotlib.pyplot as plt
plt.hist(some_array, bins=1000, rwidth=0.5, label='binned data')
plt.plot(x, final_fit.best_fit, label='best fit')
plt.legend()
plt.show()

Matplotlib Interpolation

I have a Data Frame df with two columns 'Egy' and 'fx' that I plot in this way:
plot_1 = df_data.plot(x="Egy", y="fx", color="red", ax=ax1, linewidth=0.85)
plot_1.set_xscale('log')
plt.show()
But then I want to smooth this curve using spline like this:
from scipy.interpolate import spline
import numpy as np
x_new = np.linspace(df_data['Egy'].min(), df_data['Egy'].max(),500)
f = spline(df_data['Egy'], df_data['fx'],x_new)
plot_1 = ax1.plot(x_new, f, color="black", linewidth=0.85)
plot_1.set_xscale('log')
plt.show()
And the plot I get is this (forget about the scatter blue points).
There are a lot of "peaks" in the smooth curve, mainly at lower x. How Can I smooth this curve properly?
When I consider the "busybear" suggestion of use np.logspace instead of np.linspace I get the following picture, which is not very satisfactory either.
You have your x values linearly scaled with np.linspace but your plot is log scaled. You could try np.geomspace for your x values or plot without the log scale.
Using spline will only work well for functions that are already smooth. What you need is to regularize the data and then interpolate afterwards. This will help to smooth out the bumps. Regularization is an advanced topic, and it would not be appropriate to discuss it in detail here.
Update: for regularization using machine learning, you might look into the scikit library for Python.

Interpolating Data Using SciPy

I have two arrays of data that correspond to x and y values, that I would like to interpolate with a cubic spline.
I have tried to do this, but my interpolated function doesn't pass through my data points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
re = np.array([0.2,2,20,200,2000,20000],dtype = float)
cd = np.array([103,13.0,2.72,0.800,0.401,0.433],dtype = float)
plt.yscale('log')
plt.xscale('log')
plt.xlabel( "Reynold's number" )
plt.ylabel( "Drag coefficient" )
plt.plot(re,cd,'x', label='Data')
x = np.linspace(0.2,20000,200000)
f = interp1d(re,cd,kind='cubic')
plt.plot(x,f(x))
plt.legend()
plt.show()
What I end up with looks like this;
Which is clearly an awful representation of my function. What am I missing here?
Thank you.
You can get the result you probably expect (smooth spline on the log axes) by doing this:
f = interp1d(np.log(re),np.log(cd), kind='cubic')
plt.plot(x,np.exp(f(np.log(x))))
This will build the interpolation in the log space and plot it correctly. Plot your data on a linear scale to see how the cubic has to flip to get the tail on the left hand side.
The main thing you are missing is the log scaling on your axes. The spline shown is not an unreasonable result given your input data. Try drawing the plot with plt.xscale('linear') instead of plt.xscale('log'). Perhaps a cubic spline is not the best interpolation technique, at least on the raw data. A better option may be to interpolate on the log of the data insead.

Python equivalent for MATLAB's normplot?

Is there a python equivalent function similar to normplot from MATLAB?
Perhaps in matplotlib?
MATLAB syntax:
x = normrnd(10,1,25,1);
normplot(x)
Gives:
I have tried using matplotlib & numpy module to determine the probability/percentile of the values in array but the output plot y-axis scales are linear as compared to the plot from MATLAB.
import numpy as np
import matplotlib.pyplot as plt
data =[-11.83,-8.53,-2.86,-6.49,-7.53,-9.74,-9.44,-3.58,-6.68,-13.26,-4.52]
plot_percentiles = range(0, 110, 10)
x = np.percentile(data, plot_percentiles)
plt.plot(x, plot_percentiles, 'ro-')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.show()
Gives:
Else, how could the scales be adjusted as in the first plot?
Thanks.
A late answer, but I just came across the same problem and found a solution, that is worth sharing. I guess.
As joris pointed out the probplot function is an equivalent to normplot, but the resulting distribution is in form of the cumulative density function. Scipy.stats also offers a function, to convert these values.
cdf -> percentile
stats.'distribution function'.cdf(cdf_value)
percentile -> cdf
stats.'distribution function'.ppf(percentile_value)
for example:
stats.norm.ppf(percentile)
To get an equivalent y-axis, like normplot, you can replace the cdf-ticks:
from scipy import stats
import matplotlib.pyplot as plt
nsample=500
#create list of random variables
x=stats.t.rvs(100, size=nsample)
# Calculate quantiles and least-square-fit curve
(quantiles, values), (slope, intercept, r) = stats.probplot(x, dist='norm')
#plot results
plt.plot(values, quantiles,'ob')
plt.plot(quantiles * slope + intercept, quantiles, 'r')
#define ticks
ticks_perc=[1, 5, 10, 20, 50, 80, 90, 95, 99]
#transfrom them from precentile to cumulative density
ticks_quan=[stats.norm.ppf(i/100.) for i in ticks_perc]
#assign new ticks
plt.yticks(ticks_quan,ticks_perc)
#show plot
plt.grid()
plt.show()
The result:
I'm fairly certain matplotlib doesn't provide anything like this.
It's possible to do, of course, but you'll have to either rescale your data and change your y axis ticks/labels to match, or, if you're planning on doing this often, perhaps code a new scale that can be applied to matplotlib axes, like in this example: http://matplotlib.sourceforge.net/examples/api/custom_scale_example.html.
Maybe you can use the probplot function of scipy (scipy.stats), this seems to me an equivalent for MATLABs normplot:
Calculate quantiles for a probability
plot of sample data against a
specified theoretical distribution.
probplot optionally calculates a
best-fit line for the data and plots
the results using Matplotlib or a
given plot function.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
But is does not solve your problem of the different y-axis scale.
Using matplotlib.semilogy will get closer to the matlab output.

Categories