Plotting a curve from dots of a scatter plot - python

I'm facing a silly problem while plotting a graph from a regression function calculated using scikit-learn. After fitting the function I need to plot a graph that shows X and Y from the original data together with the points predicted by my function. The problem is that my function is not a straight line: although the model is linear, it uses a Fourier series to give the curve its shape, and when I try to plot the line using:
ax.plot(df['GDPercapita'], modelp1.predict(df1), color='k')
I get a graph like this:
Plot
But the true graph is supposed to be a line following those black points:
Dots to be connected
I'm generating the graph using the following code:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp1.predict(df1), color='k')  # replacing this scatter with the ax.plot call shown earlier gives the first picture
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show(block=True)
Does anyone have an idea about what to do?
POST DISCUSSION EDIT:
Ok, so first things first:
The data can be downloaded at: http://www.est.ufmg.br/~marcosop/est171-ML/dados/worldDevelopmentIndicators.csv
I had to generate new data using a Fourier expansion, with normalized values of GDPercapita, in order to run an exhaustive optimization algorithm for the regression function used to predict LifeExpectancy, and found that the number p of parameters that generates the best regression function is p=22.
Now I have to generate a polynomial function using the prediction points of the regression function with p=22, to show how the best regression function compares to a polynomial of degree 22.
To generate the prediction I use the following code:
from sklearn import linear_model
modelp22 = linear_model.LinearRegression()
modelp22.fit(xp22, y_train)  # xp22: the Fourier-expanded training features with p=22
df22 = df[p22]               # the columns of df corresponding to the p=22 expansion
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp22.predict(df22),color='k')
ax.set_xlabel('GDPercapita')
ax.set_ylabel('LifeExpectancy')
plt.show(block=True)
Now I need to use the prediction points to create a polynomial function and plot a graph with: the original data (first scatter), the prediction points (second scatter) and the polynomial function (a curve) to show their visual relation. A sketch of this step is given below.
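A hedged sketch of that final step, reusing the names above (df, df22, modelp22) and assuming the predictions come back as a 1-D array: sort the points by GDPercapita, fit a degree-22 polynomial with np.polyfit, and draw the resulting curve over both scatters.

import numpy as np
import matplotlib.pyplot as plt

x = df['GDPercapita'].values
y_pred = modelp22.predict(df22)

# sort by x so the fitted curve is drawn left to right, not zig-zagging
order = np.argsort(x)
x_sorted, y_sorted = x[order], y_pred[order]

# fit a degree-22 polynomial to the prediction points
# (high-degree polyfit can be numerically unstable; np.polyfit may warn)
poly = np.poly1d(np.polyfit(x_sorted, y_sorted, deg=22))

x_dense = np.linspace(x_sorted.min(), x_sorted.max(), 500)

fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))  # original data
ax.scatter(x, y_pred, color='k')                                           # prediction points
ax.plot(x_dense, poly(x_dense), color='r')                                 # polynomial curve
ax.set_xlabel('GDPercapita')
ax.set_ylabel('LifeExpectancy')
plt.show(block=True)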

Related

Matplotlib Interpolation

I have a DataFrame df_data with two columns, 'Egy' and 'fx', that I plot in this way:
plot_1 = df_data.plot(x="Egy", y="fx", color="red", ax=ax1, linewidth=0.85)
plot_1.set_xscale('log')
plt.show()
But then I want to smooth this curve using spline like this:
from scipy.interpolate import spline
import numpy as np
x_new = np.linspace(df_data['Egy'].min(), df_data['Egy'].max(),500)
f = spline(df_data['Egy'], df_data['fx'],x_new)
ax1.plot(x_new, f, color="black", linewidth=0.85)
ax1.set_xscale('log')
plt.show()
And the plot I get is this (forget about the scatter blue points).
There are a lot of "peaks" in the smooth curve, mainly at lower x. How Can I smooth this curve properly?
When I take busybear's suggestion of using np.logspace instead of np.linspace, I get the following picture, which is not very satisfactory either.
Your x values are linearly spaced with np.linspace, but your plot is log scaled. You could try np.geomspace for your x values, or plot without the log scale.
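A minimal sketch of that suggestion, keeping the (now-deprecated) spline call from the question:

import numpy as np
# geometrically spaced points are evenly spaced on a log axis
x_new = np.geomspace(df_data['Egy'].min(), df_data['Egy'].max(), 500)
f = spline(df_data['Egy'], df_data['fx'], x_new)
ax1.plot(x_new, f, color="black", linewidth=0.85)
ax1.set_xscale('log')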
Using spline will only work well for functions that are already smooth. What you need is to regularize the data and then interpolate afterwards. This will help to smooth out the bumps. Regularization is an advanced topic, and it would not be appropriate to discuss it in detail here.
Update: for regularization using machine learning, you might look into the scikit-learn library for Python.
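As one concrete option (a smoothing spline from SciPy rather than scikit-learn, substituted here as a simple regularizer; assuming 'Egy' is positive and sorted in increasing order):

from scipy.interpolate import UnivariateSpline
import numpy as np

sx = np.log10(df_data['Egy'])
# s sets the regularization strength: larger s gives a smoother curve
# (len(sx) is just a starting point; tune to taste)
spl = UnivariateSpline(sx, df_data['fx'], s=len(sx))
x_new = np.geomspace(df_data['Egy'].min(), df_data['Egy'].max(), 500)
ax1.plot(x_new, spl(np.log10(x_new)), color="black", linewidth=0.85)
ax1.set_xscale('log')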

How do I plot for Multiple Linear Regression Model using matplotlib

I am trying to fit a multiple linear regression model:
Y = c + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5 + a6*X6
Had my model had only 3 variables, I would have used a 3D plot.
How can I plot this? I basically want to see what the best fit line looks like. Or should I plot multiple scatter plots and look at the effect of each individual variable, i.e. Y = a1*X1 with all the others at zero, and see the best fit line for that?
What is the best approach for these models? I know it is not possible to visualize higher dimensions; I just want to know the best approach. I am desperate to see the best fit line.
I found this post more helpful and followed it:
https://stats.stackexchange.com/questions/73320/how-to-visualize-a-fitted-multiple-regression-model
Based on its suggestions, I am currently plotting scatter plots of the dependent variable vs. the 1st independent variable, then vs. the 2nd independent variable, and so on. I may not be able to see the best fit line for the complete model, but I can at least see how it depends on each individual variable.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

train_copy = train[['OverallQual', 'AllSF', 'GrLivArea', 'GarageCars']]
train_copy = pd.get_dummies(train_copy)
train_copy = train_copy.fillna(0)

linear_regr_test = LinearRegression()
fig, axes = plt.subplots(1, len(train_copy.columns.values), sharey=True,
                         constrained_layout=True, figsize=(30, 15))
for i, e in enumerate(train_copy.columns):
    # fit a one-variable regression for each column and plot its line
    linear_regr_test.fit(train_copy[e].values[:, np.newaxis], y.values)
    axes[i].set_title("Best fit line")
    axes[i].set_xlabel(str(e))
    axes[i].set_ylabel('SalePrice')
    axes[i].scatter(train_copy[e].values[:, np.newaxis], y, color='g')
    axes[i].plot(train_copy[e].values[:, np.newaxis],
                 linear_regr_test.predict(train_copy[e].values[:, np.newaxis]),
                 color='k')
You can use Seaborn's regplot function with the predicted and actual data for comparison. It is not the same as plotting a best fit line, but it shows you how well the model works.
import seaborn as sns
sns.regplot(x=y_test, y=y_predict, ci=None, color="b")
You could try to visualize how well your model is performing by comparing actual and predicted values.
Assuming that our actual values are stored in Y, and the predicted ones in Y_, we could plot and compare both.
import seaborn as sns
ax1 = sns.distplot(Y, hist=False, color="r", label="Actual Value")
sns.distplot(Y_, hist=False, color="b", label="Fitted Values" , ax=ax1)
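Note that distplot is deprecated in recent seaborn releases; the same overlay can be drawn with kdeplot:

import seaborn as sns
ax1 = sns.kdeplot(Y, color="r", label="Actual Value")
sns.kdeplot(Y_, color="b", label="Fitted Values", ax=ax1)
ax1.legend()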

Excel-like Interpolation in Python

Plotting my data in Excel as a scatter plot with a smooth line and markers produces the type of figure I'm expecting. Image of Excel plots:
However, when trying to plot the data in matplotlib I'm running into some issues with interpolation. I'm using the interpolation package from SciPy, and I've tried a range of different interpolation methods, including spline interpolation and BarycentricInterpolator as suggested previously. These plots are obviously very different from the Excel-produced plots, however:
I've tried different smoothing and k values for the spline interpolation; while the curve changes, the root problem still exists.
How would I be able to produce a fitted curve similar to the excel-produced plots?
Thanks
The problem is that you interpolate the data on a linear scale but expect the outcome to look smooth on a logarithmic scale.
The idea is therefore to perform the interpolation on a log scale already, by transforming the data to its logarithm first and then interpolating. You can then transform the result back to linear scale so that it can be plotted on a log scale again.
from scipy.interpolate import interp1d, Akima1DInterpolator
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0.02, 0.2, 2, 20, 200])
y = np.array([700, 850, 680, 410, 700])
plt.plot(x, y, marker="o", ls="")

sx = np.log10(x)                               # interpolate in log-x space
xi_ = np.linspace(sx.min(), sx.max(), num=201)
xi = 10**(xi_)                                 # back to linear scale for plotting

f = interp1d(sx, y, kind="cubic")
yi = f(xi_)
plt.plot(xi, yi, label="cubic spline")

f2 = Akima1DInterpolator(sx, y)
yi2 = f2(xi_)
plt.plot(xi, yi2, label="Akima")

plt.gca().set_xscale("log")
plt.legend()
plt.show()

Python plot connecting lines between dots issue

I'm trying to fit a sine function to some points using curve_fit. When plotting both the fit and the points, I get something I have no idea how to explain, so I had better post some images.
Using the following code line yields:
plt.plot(phase_min, sinusoidal_function(phase_min, *popt), '.', lw=3)
Using the line style '-', I get:
How can I just have a damn line connecting each adjacent dot, not all the dots in between?
Thanks!
When you plot a line in matplotlib, it automatically connects the points in the same order as they are provided. See the example below:
import matplotlib.pyplot as plt
plt.plot([1,3,2], [1,2,3])
Your problem is that phase_min is not sorted, and matplotlib connects your data points in the order they are given. Actually, since you already have the fitted function, you don't need the original data to plot it: you can define the points at which to draw the line yourself, as below. This way you can use as many points as you want, so the line will be smoother than one drawn through your original data points.
import numpy as np
x = np.arange(0, 1, 0.001)         # dense, already-sorted x values
y = sinusoidal_function(x, *popt)  # evaluate the fitted function on them
plt.plot(x, y)
You could use np.argsort to sort the points before plotting:
order = np.argsort(x)
xsorted = x[order]
ysorted = y[order]
plt.plot(xsorted, ysorted)
where x and y are the coordinates of your orange dots.

plotting curve decision boundary in python using matplotlib

I am new to machine learning with Python. I've managed to draw a straight decision boundary for logistic regression using matplotlib. However, I am having some difficulty plotting a curved boundary in order to illustrate overfitting on a sample dataset.
I am trying to build a logistic regression model using regularization and use regularization to control overfitting my data set.
I am aware of the sklearn library; however, I prefer writing the code myself.
The test data sample I am working on is given below:
x=np.matrix('2,300;4,600;7,300;5,500;5,400;6,400;3,400;4,500;1,200;3,400;7,700;3,550;2.5,650')
y=np.matrix('0;1;1;1;0;1;0;0;0;0;1;1;0')
The decision boundary I am expecting is given in the graph below:
Any help would be appreciated.
I could plot a straight decision boundary using the code below:
# plot of x 2D
plt.figure()
pos=np.where(y==1)
neg=np.where(y==0)
plt.plot(X[pos[0],0], X[pos[0],1], 'ro')
plt.plot(X[neg[0],0], X[neg[0],1], 'bo')
plt.xlim([min(X[:,0]),max(X[:,0])])
plt.ylim([min(X[:,1]),max(X[:,1])])
plt.show()
# plot of the decision boundary
plt.figure()
pos=np.where(y==1)
neg=np.where(y==0)
plt.plot(x[pos[0],1], x[pos[0],2], 'ro')
plt.plot(x[neg[0],1], x[neg[0],2], 'bo')
plt.xlim([x[:, 1].min()-2 , x[:, 1].max()+2])
plt.ylim([x[:, 2].min()-2 , x[:, 2].max()+2])
plot_x = [min(x[:,1])-2, max(x[:,1])+2] # extend the decision line a bit past the data
plot_y = (-1/theta_NM[2])*(theta_NM[1]*plot_x +theta_NM[0])
plt.plot(plot_x, plot_y)
And my decision boundary looks like this:
In an ideal scenario the above decision boundary is good, but I would like to plot a curved decision boundary that fits my training data very well (and would overfit my test data), something similar to what is shown in the first plot.
This can be done by gridding the parameter space, setting each grid point to the value of the closest data point, and then running a contour plot on this grid.
But there are numerous variations, such as setting each grid point to a distance-weighted average, or smoothing the final contour, etc.; a sketch of the distance-weighted variant follows the example below.
Here's an example for finding the initial contour:
import numpy as np
import matplotlib.pyplot as plt
# get the data as numpy arrays
xys = np.array(np.matrix('2,300;4,600;7,300;5,500;5,400;6,400;3,400;4,500;1,200;3,400;7,700;3,550;2.5,650'))
vals = np.array(np.matrix('0;1;1;1;0;1;0;0;0;0;1;1;0'))[:,0]
N = len(vals)
# some basic spatial stuff
xs = np.linspace(min(xys[:,0])-2, max(xys[:,0])+1, 10)
ys = np.linspace(min(xys[:,1])-100, max(xys[:,1])+100, 10)
xr = max(xys[:,0]) - min(xys[:,0]) # ranges so distances can weight x and y equally
yr = max(xys[:,1]) - min(xys[:,1])
X, Y = np.meshgrid(xs, ys) # meshgrid for contour and distance calcs
# set each grid point to the value of the closest data point:
Z = np.zeros((len(ys), len(xs), N))  # same shape as the meshgrid arrays
for n in range(N):
    Z[:,:,n] = ((X-xys[n,0])/xr)**2 + ((Y-xys[n,1])/yr)**2  # squared distance to each data point
z = np.argmin(Z, axis=2)  # which data point is closest to each grid point
v = vals[z]               # set the grid value to that data point's value
# do the contour plot (use only the level 0.5 since values are 0 and 1)
plt.contour(X, Y, v, cmap=plt.cm.gray, levels=[.5]) # contour the data point values
# now plot the data points
pos=np.where(vals==1)
neg=np.where(vals==0)
plt.plot(xys[pos,0], xys[pos,1], 'ro')
plt.plot(xys[neg,0], xys[neg,1], 'bo')
plt.show()
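As a sketch of the distance-weighted variation mentioned above, the Z array of squared distances from the example can be reused directly (the small constant avoids division by zero):

# inverse-distance weighting: a soft version of the nearest-point rule
w = 1.0 / (Z + 1e-9)
v_smooth = (w * vals).sum(axis=2) / w.sum(axis=2)
plt.contour(X, Y, v_smooth, cmap=plt.cm.gray, levels=[.5])
plt.show()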
