I have a DataFrame df_data with two columns, 'Egy' and 'fx', that I plot like this:
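import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
plot_1 = df_data.plot(x="Egy", y="fx", color="red", ax=ax1, linewidth=0.85)
plot_1.set_xscale('log')  # DataFrame.plot returns the Axes, so this works
plt.show()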
But then I want to smooth this curve with a spline, like this:
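from scipy.interpolate import make_interp_spline  # scipy.interpolate.spline has been removed from SciPy
import numpy as np
x_new = np.linspace(df_data['Egy'].min(), df_data['Egy'].max(), 500)
# cubic interpolating spline through every data point ('Egy' must be strictly increasing)
f = make_interp_spline(df_data['Egy'], df_data['fx'], k=3)(x_new)
ax1.plot(x_new, f, color="black", linewidth=0.85)
ax1.set_xscale('log')  # ax1.plot returns a list of lines, so set the scale on the Axes
plt.show()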
This is the plot I get (ignore the blue scatter points).
There are a lot of "peaks" in the smoothed curve, mainly at lower x. How can I smooth this curve properly?
When I follow busybear's suggestion to use np.logspace instead of np.linspace, I get the following picture, which is not very satisfactory either.
You have your x values linearly scaled with np.linspace but your plot is log scaled. You could try np.geomspace for your x values or plot without the log scale.
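The sketch below shows the difference on a hypothetical x range of 0.01 to 100. np.linspace puts almost all of its 500 points at the high end of a log axis, while np.geomspace spaces them evenly on it. Note also that np.logspace takes exponents rather than endpoints: np.logspace(a, b) spans 10**a to 10**b, not a to b.

```python
import numpy as np

x_min, x_max = 0.01, 100.0  # hypothetical data range spanning four decades

lin = np.linspace(x_min, x_max, 500)   # evenly spaced on a linear axis
geo = np.geomspace(x_min, x_max, 500)  # evenly spaced on a log axis

# np.logspace expects the base-10 exponents of the endpoints, not the endpoints
logs = np.logspace(np.log10(x_min), np.log10(x_max), 500)
assert np.allclose(geo, logs)

# Points in the lower half of the log axis (x < 1, i.e. the first two decades)
print((lin < 1).sum(), (geo < 1).sum())  # → 5 250
```

With linspace, only 5 of the 500 points fall in the left half of the log-scaled plot. That left half is exactly where the interpolated curve looks worst.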
Using a spline will only work well for functions that are already smooth, because an interpolating spline passes through every data point, noise included. What you need is to regularize the data first and then interpolate; this will help smooth out the bumps. Regularization is an advanced topic, and it would not be appropriate to discuss it in detail here.
Update: for regularization using machine learning, you might look into the scikit-learn library for Python.
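One simple form of regularization is a smoothing spline. Instead of passing through every point, it balances fidelity against smoothness. The sketch below swaps in this technique; it is not necessarily what this answer had in mind. It applies scipy.interpolate.UnivariateSpline to log10(x), so the smoothing happens on the same scale as the log-scaled plot. The data is synthetic. The smoothing factor s is set to n times the assumed noise variance, a common starting point that will likely need tuning on real data.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smooth_on_log_axis(x, y, s, num=500):
    """Fit a cubic smoothing spline to y as a function of log10(x)."""
    lx = np.log10(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    order = np.argsort(lx)  # UnivariateSpline needs increasing x
    spl = UnivariateSpline(lx[order], y[order], k=3, s=s)
    lx_new = np.linspace(lx.min(), lx.max(), num)  # even spacing on the log axis
    return 10 ** lx_new, spl(lx_new)


# Synthetic noisy data standing in for df_data['Egy'] and df_data['fx']
rng = np.random.default_rng(0)
x = np.geomspace(0.01, 100, 60)
noise_sd = 0.1
y = np.sin(np.log10(x)) + rng.normal(scale=noise_sd, size=x.size)

# s bounds the sum of squared residuals; n * sigma**2 is a common starting point
x_smooth, y_smooth = smooth_on_log_axis(x, y, s=x.size * noise_sd**2)
```

The result can be drawn with ax1.plot(x_smooth, y_smooth) followed by ax1.set_xscale('log'). A larger s gives a smoother but less faithful curve.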
So I wrote some code with the help of lmfit to fit a Gaussian curve to some histogram data. While the curve itself is fine, when I try to plot the results in matplotlib, it displays the fit along with the data points. What I actually want is to plot histogram bars together with the fitted curve. How do you do this? Alternatively, is there a way in lmfit to show only the fit curve, so that I can then add the histogram plot and combine them?
Relevant part of my code:
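import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel
counts, bin_edges = np.histogram(some_array, bins=1000)
bin_widths = np.diff(bin_edges)
x = bin_edges[:-1] + (bin_widths / 2)  # bin centres
y = counts
mod = GaussianModel()
pars = mod.guess(y, x=x)  # initial parameter estimates from the data
final_fit = mod.fit(y, pars, x=x)
final_fit.plot_fit()
plt.show()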
Here's the graphed result: [image: Gaussian curve]
lmfit's built-in plotting routines are minimal wrappers around matplotlib, intended to give reasonable default plots for many cases. They don't make histograms.
But the arrays are readily available and using matplotlib to make a histogram is easy. I think all you need is:
import matplotlib.pyplot as plt
plt.hist(some_array, bins=1000, rwidth=0.5, label='binned data')
plt.plot(x, final_fit.best_fit, label='best fit')
plt.legend()
plt.show()
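Since np.histogram has already been called, another option is to draw the bars directly from counts and bin_edges with ax.bar. That guarantees the bars and the curve use identical bins. The sketch below uses synthetic data. best_fit is a hypothetical stand-in for final_fit.best_fit: a Gaussian built from the sample mean and standard deviation, used only so the example runs without lmfit.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for some_array
rng = np.random.default_rng(1)
some_array = rng.normal(loc=5.0, scale=2.0, size=10_000)

counts, bin_edges = np.histogram(some_array, bins=100)
bin_widths = np.diff(bin_edges)
x = bin_edges[:-1] + bin_widths / 2  # bin centres, as in the question

# Hypothetical stand-in for final_fit.best_fit: a Gaussian scaled to the counts
mu, sigma = some_array.mean(), some_array.std()
best_fit = (counts.sum() * bin_widths
            * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi)))

fig, ax = plt.subplots()
ax.bar(x, counts, width=bin_widths, alpha=0.5, label="binned data")  # one bar per bin
ax.plot(x, best_fit, color="k", label="best fit")
ax.legend()
plt.show()
```

With the real fit, replacing best_fit with final_fit.best_fit should give the same layout.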
I'm facing a silly problem while plotting a graph from a regression function calculated using scikit-learn. After constructing the function, I need to plot a graph that shows X and Y from the original data and the dots calculated by my function. The problem is that my function is not a straight line: although the model is linear, it uses a Fourier series to give my curve the right shape, and when I try to plot the lines using:
ax.plot(df['GDPercapita'], modelp1.predict(df1), color='k')
I got a graph like this: [image: Plot]
But the true graph is supposed to be a line following these black points: [image: Dots to be connected]
I'm generating the graph using the following code:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp1.predict(df1), color='k')  # changing this to ax.plot gives the first picture
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show(block=True)
Does anyone have an idea about what to do?
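One likely cause, assuming df is not sorted by GDPercapita: ax.plot joins points in the order they appear. Unsorted x values therefore produce lines zig-zagging back and forth, and sorting by x before plotting should give a single curve. The sketch below uses synthetic data, with pred standing in for modelp1.predict(df1).

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, unsorted data standing in for df['GDPercapita'] and the predictions
rng = np.random.default_rng(2)
gdp = rng.uniform(500, 50_000, size=150)
pred = 50 + 8 * np.log10(gdp)  # hypothetical smooth prediction

order = np.argsort(gdp)  # indices that sort the x values in increasing order

fig, ax = plt.subplots()
ax.scatter(gdp, pred, color="k")
ax.plot(gdp[order], pred[order], color="k")  # connect the points left to right
plt.show()
```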
POST DISCUSSION EDIT:
Ok, so first things first:
The data can be downloaded at: http://www.est.ufmg.br/~marcosop/est171-ML/dados/worldDevelopmentIndicators.csv
I had to generate new data using a Fourier expansion of the normalized GDPercapita values in order to run an exhaustive optimization over the regression function used to predict LifeExpectancy. I found that the number of parameters p that gives the best regression function is p=22.
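This sketch illustrates why a linear model can still produce a curve. The model is linear in its coefficients, but the features are nonlinear functions of GDPercapita. The question does not give the exact Fourier basis, so the cosine/sine basis below is a hypothetical choice. The data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fourier_features(z, p):
    """Hypothetical Fourier basis: p columns cos(2*pi*k*z), sin(2*pi*k*z), k = 1, 2, ...

    z is assumed to be already normalized to [0, 1].
    """
    z = np.asarray(z, dtype=float)
    cols = []
    for j in range(p):
        k = j // 2 + 1  # frequency increases every two columns
        cols.append(np.cos(2 * np.pi * k * z) if j % 2 == 0 else np.sin(2 * np.pi * k * z))
    return np.column_stack(cols)


# Synthetic data standing in for GDPercapita / LifeExpectancy
rng = np.random.default_rng(3)
gdp = rng.uniform(500, 50_000, size=200)
life = 50 + 8 * np.log10(gdp) + rng.normal(scale=2, size=gdp.size)

z = (gdp - gdp.min()) / (gdp.max() - gdp.min())  # min-max normalization to [0, 1]
X = fourier_features(z, p=22)

model = LinearRegression().fit(X, life)  # linear in the 22 coefficients
pred = model.predict(X)                  # a nonlinear curve in terms of GDPercapita
```

For new data, the same min and max from the training set would need to be reused when normalizing.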
Now I have to fit a polynomial function to the prediction points of the regression function with p=22, to show how the best regression function compares with a polynomial of degree 22.
To generate the predictions, I use the following code:
from sklearn import linear_model
import matplotlib.pyplot as plt
modelp22 = linear_model.LinearRegression()
modelp22.fit(xp22,y_train)
df22 = df[p22]
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp22.predict(df22),color='k')
ax.set_xlabel('GDPercapita')
ax.set_ylabel('LifeExpectancy')
plt.show(block=True)
Now I need to use the prediction points to create a polynomial function and plot a graph with the original data (first scatter), the prediction points (second scatter), and the polynomial function (a curve) to show how they relate visually.
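Here is a sketch of the three-layer plot. It assumes gdp, life, and pred are arrays, for example df['GDPercapita'], df['LifeExpectancy'], and modelp22.predict(df22); the values below are synthetic. numpy.polynomial.Polynomial.fit rescales x to [-1, 1] internally, which keeps a degree-22 fit better conditioned than fitting powers of raw GDP values. Its linspace method returns points already sorted by x, so the curve is drawn left to right.

```python
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

# Synthetic stand-ins for the original data and the regression predictions
rng = np.random.default_rng(4)
gdp = rng.uniform(500, 50_000, size=200)
pred = 50 + 8 * np.log10(gdp)  # hypothetical smooth predictions
life = pred + rng.normal(scale=2, size=gdp.size)

poly = Polynomial.fit(gdp, pred, deg=22)  # least-squares polynomial through the predictions
x_curve, y_curve = poly.linspace(500)     # 500 evenly spaced points over the data range

fig, ax = plt.subplots()
ax.scatter(gdp, life, edgecolors=(0, 0, 0), label="original data")
ax.scatter(gdp, pred, color="k", s=10, label="predictions")
ax.plot(x_curve, y_curve, color="r", label="degree-22 polynomial")
ax.set_xlabel("GDPercapita")
ax.set_ylabel("LifeExpectancy")
ax.legend()
plt.show()
```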
Plotting my data in Excel as a scatter plot with smooth lines and markers produces the type of figure I'm expecting. Image of Excel plots:
However, when I try to plot the data in matplotlib, I run into some issues with interpolation. I'm using the interpolation package from SciPy and have tried a range of different interpolation methods, including spline interpolation and BarycentricInterpolator, as suggested previously. These plots, however, are obviously very different from the Excel-produced plots:
I've tried different smoothing and k values for spline interpolation; the curve changes, but the root problem remains.
How can I produce a fitted curve similar to the Excel-produced plots?
Thanks
The problem is that you interpolate the data on a linear scale but expect the outcome to look smooth on a logarithmic scale.
The idea is therefore to perform the interpolation on a log scale: first transform the x data to its logarithm, then interpolate. You can then transform the interpolated x values back to linear scale so that you can plot them on a log scale again.
from scipy.interpolate import interp1d, Akima1DInterpolator
import numpy as np
import matplotlib.pyplot as plt
x = np.array([0.02,0.2,2,20,200])
y = np.array([700,850,680,410, 700])
plt.plot(x,y, marker="o", ls="")  # original data points only
sx=np.log10(x)  # interpolate in log10(x) space
xi_ = np.linspace(sx.min(),sx.max(), num=201)  # evenly spaced on the log axis
xi = 10**(xi_)  # back to linear x values for plotting
f = interp1d(sx,y, kind="cubic")
yi = f(xi_)
plt.plot(xi,yi, label="cubic spline")
f2 = Akima1DInterpolator(sx, y)  # Akima interpolation overshoots less than a cubic spline
yi2 = f2(xi_)
plt.plot(xi,yi2, label="Akima")
plt.gca().set_xscale("log")
plt.legend()
plt.show()
I am new here, although I have been reading answers to questions for quite a while. I have a problem: I have a seismic hazard curve that looks roughly as follows:
I need to plot it like a histogram. That is what a hazard curve looks like; I would need to plot the median as a histogram.
I have tried to use plt.hist as follows:
n, bins, patches = plt.hist(x, 50, facecolor='green', alpha=0.75)
where x is my frequency data array:
x = [1.00E-02, 1.00E-03, 1.00E-04, 1.00E-05, 1.00E-06, 1.00E-07, 1.00E-08, 1.00E-09, 1.00E-10]
but it gives me back an empty image. I think this is because plt.hist is used to plot probability density functions (and mine is not a probability density function), but I am not sure if I am right. Can someone give me some pointers on how to do this?
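plt.hist does not use your values as bar heights. It counts how many of the values fall into each bin. With these nine frequencies spread over 50 linear bins, seven of them land in the first bin, which is likely why the result looks nothing like the curve. If the frequencies are already tabulated, drawing them with ax.bar on a log y-axis is closer to what you describe. The question does not give the x-axis values, so the intensity-measure levels below are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

freq = np.array([1.00E-02, 1.00E-03, 1.00E-04, 1.00E-05, 1.00E-06,
                 1.00E-07, 1.00E-08, 1.00E-09, 1.00E-10])

# What plt.hist(freq, 50) actually computes: the number of values per bin
counts, edges = np.histogram(freq, bins=50)
print(counts[0], counts.sum())  # → 7 9

# Hypothetical intensity-measure levels (e.g. ground-motion values); not from the question
im = np.linspace(0.1, 0.9, freq.size)

fig, ax = plt.subplots()
ax.bar(im, freq, width=0.09, color="green", alpha=0.75)  # bar heights are the frequencies
ax.set_yscale("log")  # the frequencies span eight orders of magnitude
ax.set_xlabel("intensity measure (hypothetical)")
ax.set_ylabel("annual frequency")
plt.show()
```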
Is there a Python function equivalent to normplot from MATLAB?
Perhaps in matplotlib?
MATLAB syntax:
x = normrnd(10,1,25,1);
normplot(x)
Gives:
I have tried using matplotlib and numpy to determine the probability/percentile of the values in the array, but the y-axis of the resulting plot is linear, unlike the plot from MATLAB.
import numpy as np
import matplotlib.pyplot as plt
data =[-11.83,-8.53,-2.86,-6.49,-7.53,-9.74,-9.44,-3.58,-6.68,-13.26,-4.52]
plot_percentiles = range(0, 110, 10)
x = np.percentile(data, plot_percentiles)
plt.plot(x, plot_percentiles, 'ro-')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.show()
Gives:
Otherwise, how could the y-axis scale be adjusted to match the first plot?
Thanks.
A late answer, but I just came across the same problem and found a solution that I think is worth sharing.
As joris pointed out, the probplot function is an equivalent of normplot, but its output is on the scale of the theoretical quantiles of the distribution rather than probabilities. scipy.stats also offers functions to convert between the two:
quantile -> probability (percentile / 100)
stats.'distribution function'.cdf(quantile_value)
probability -> quantile
stats.'distribution function'.ppf(probability_value)
For example, for the normal distribution:
stats.norm.ppf(percentile / 100.)
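Here is a quick check of the two conversions for the standard normal distribution: ppf turns a probability into a quantile, and cdf turns it back.

```python
from scipy import stats

p = 0.975                   # the 97.5th percentile, as a probability
z = stats.norm.ppf(p)       # probability -> quantile, about 1.96
p_back = stats.norm.cdf(z)  # quantile -> probability, back to 0.975
print(round(z, 4), round(p_back, 4))  # → 1.96 0.975
```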
To get a y-axis like normplot's, you can replace the quantile ticks with percentile labels:
from scipy import stats
import matplotlib.pyplot as plt
nsample=500
#create list of random variables
x=stats.t.rvs(100, size=nsample)
# Calculate quantiles and least-square-fit curve
(quantiles, values), (slope, intercept, r) = stats.probplot(x, dist='norm')
#plot results
plt.plot(values, quantiles,'ob')
plt.plot(quantiles * slope + intercept, quantiles, 'r')
#define ticks
ticks_perc=[1, 5, 10, 20, 50, 80, 90, 95, 99]
# transform them from percentiles to standard-normal quantiles
ticks_quan=[stats.norm.ppf(i/100.) for i in ticks_perc]
#assign new ticks
plt.yticks(ticks_quan,ticks_perc)
#show plot
plt.grid()
plt.show()
The result:
I'm fairly certain matplotlib doesn't provide anything like this.
It's possible to do, of course, but you'll have to either rescale your data and change your y axis ticks/labels to match, or, if you're planning on doing this often, perhaps code a new scale that can be applied to matplotlib axes, like in this example: http://matplotlib.sourceforge.net/examples/api/custom_scale_example.html.
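On current matplotlib versions (3.1 and later), a full custom scale class may not be necessary. set_yscale('function', functions=(forward, inverse)) applies an arbitrary transform to the axis. The sketch below uses the normal ppf as the forward transform, so normally distributed data plots roughly as a straight line, with probabilities on the y-axis. The plotting positions (i - 0.5)/n are one common convention and an assumption here; MATLAB may use a different one.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.sort([-11.83, -8.53, -2.86, -6.49, -7.53, -9.74,
                -9.44, -3.58, -6.68, -13.26, -4.52])
n = data.size
prob = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (assumed convention)


def forward(p):
    # probability -> standard-normal quantile; clip to avoid +/-inf at 0 and 1
    return stats.norm.ppf(np.clip(p, 1e-9, 1 - 1e-9))


def inverse(z):
    # standard-normal quantile -> probability
    return stats.norm.cdf(z)


fig, ax = plt.subplots()
ax.plot(data, prob, "ro")
ax.set_yscale("function", functions=(forward, inverse))
ticks = [0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
ax.set_yticks(ticks)
ax.set_yticklabels([f"{t:g}" for t in ticks])
ax.set_ylim(0.01, 0.99)
ax.grid(True)
ax.set_xlabel("Value")
ax.set_ylabel("Probability")
plt.show()
```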
Maybe you can use the probplot function of scipy (scipy.stats); it seems to me to be an equivalent of MATLAB's normplot:
Calculate quantiles for a probability plot of sample data against a specified theoretical distribution. probplot optionally calculates a best-fit line for the data and plots the results using Matplotlib or a given plot function.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
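For completeness, probplot can draw the plot itself when given a plot argument, and an Axes object works for this. Here is a minimal sketch using the question's data:

```python
import matplotlib.pyplot as plt
from scipy import stats

data = [-11.83, -8.53, -2.86, -6.49, -7.53, -9.74, -9.44, -3.58, -6.68, -13.26, -4.52]

fig, ax = plt.subplots()
# Plots the ordered data against theoretical normal quantiles, plus a least-squares line
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm", plot=ax)
plt.show()
```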
But it does not solve your problem of the different y-axis scale.
Using matplotlib's semilogy will get closer to the MATLAB output.