How to draw a line between two points in a dataframe using matplotlib? - python

I'm trying to compare my predicted output and the test data using matplotlib. As I'm new to Python, I haven't been able to find out how to connect each pair of entries with a line, like in this photo.
I was able to write code like this, which compares the Y values across entries, but I'm unable to connect each entry of the test data to the predicted output with a line:
X_1 = range(len(Y_test))
plt.figure(figsize=(5,5))
plt.scatter(X_1, output, label='Y_output',alpha=0.3)
plt.scatter(X_1, Y_test, label='Y_test',alpha=0.3)
plt.title("Scatter Plot")
plt.legend()
plt.xlabel("entries")
plt.ylabel("Y value")
plt.show()
graph we are getting

Try something like this in addition to your code:
plt.plot(np.stack((X_1, X_1)), np.stack((output, Y_test)), color="black")
In fact, to reproduce the plot you want, you need different x values for output and for Y_test (for example, two distinct ranges X_1 and X_2).
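A self-contained sketch of that idea, with small made-up arrays standing in for output and Y_test: stacking the x values and the two y arrays gives plt.plot one (x, y) pair of endpoints per column, so it draws one connecting segment per entry.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for the asker's predicted output and test data
output = np.array([2.5, 5.5, 2.2, 6.4])
Y_test = np.array([3.0, 5.0, 2.0, 7.0])
X_1 = np.arange(len(Y_test))

plt.figure(figsize=(5, 5))
plt.scatter(X_1, output, label='Y_output', alpha=0.3)
plt.scatter(X_1, Y_test, label='Y_test', alpha=0.3)

# Each column of the stacked (2, N) arrays is one pair of endpoints,
# so this draws one vertical segment per entry
plt.plot(np.stack((X_1, X_1)), np.stack((output, Y_test)), color='black')

plt.legend()
plt.xlabel('entries')
plt.ylabel('Y value')
# plt.show() to display
```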


Plot colours in custom function (matplotlib)

I am attempting to write a function that can plot a best fit curve and its original data points. I would ideally like to run the function for 4-5 data sets and have them all appear on the same figure. The function I have at the moment does this well for plotting the best fit curve, but when I add in the individual data points they show up as a different colour to the best fit curve.
I would like them both to be the same colour so that when I run the function 4-5 times it is not too messy with 10 or so different colours. Ideally I would like the output to be like this
My code:
def plot(k, w, lab):
    popt, pcov = cf(linfunc, np.log(k), np.log(w))
    yfit = linfunc(np.log(k), *popt)
    plt.plot(np.log(k), yfit, '-', label=lab)
    plt.plot(np.log(k), np.log(w), 'o')
    plt.legend()

plot(k2ml, w2ml, '2ml')
Additionally, is there a way that I could make my function take any input for the parameter "lab" and have it automatically converted to a string so it can be used in the legend?
So what you want is to plot a line and its fit in the same colour.
To achieve this, you can plot the first line, get its colour, and then apply that colour to the fit line.
Here is a small code snippet doing that:
# Plot first line and get list of plotted lines
lines = plt.plot([0,1,2,3,4], [5,6,7,8,9])
# Get color of first (and only) line
line_color = lines[0].get_color()
# Plot Your fit with same color parameter
plt.plot([0,1,2,3,4], [0,1,2,3,4], color=line_color)
As for label, I would just convert it into string with str(lab).
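Putting both fixes together, the asker's function might look like the sketch below. The data arrays are made up, cf is assumed to be scipy.optimize.curve_fit, and linfunc is assumed to be a straight line, since neither is shown in the question:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit as cf

def linfunc(x, a, b):
    # Straight line in log-log space, as implied by the question
    return a * x + b

def plot(k, w, lab):
    popt, pcov = cf(linfunc, np.log(k), np.log(w))
    yfit = linfunc(np.log(k), *popt)
    # Plot the fit first and remember its colour
    line, = plt.plot(np.log(k), yfit, '-', label=str(lab))
    # Reuse that colour for the raw data points
    plt.plot(np.log(k), np.log(w), 'o', color=line.get_color())
    plt.legend()

# Made-up data standing in for k2ml and w2ml
k = np.array([1.0, 2.0, 4.0, 8.0])
w = np.array([2.0, 4.1, 7.9, 16.3])
plot(k, w, '2ml')
```

Calling the function several times then keeps each data set and its fit in one shared colour, so the figure stays readable.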

How do I plot a Multiple Linear Regression Model using matplotlib

I am trying to fit a multiple linear regression model:
Y = c + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5 + a6*X6
Had my model only 3 variables, I would have used a 3D plot.
How can I plot this? I basically want to see what the best fit line looks like, or should I plot multiple scatter plots and look at the effect of each individual variable (Y = a1*X1 when all others are zero) with its best fit line?
What is the best approach for these models? I know it is not possible to visualize higher dimensions; I just want to know the best approach. I am desperate to see the best fit line.
I found this post which is more helpful and followed
https://stats.stackexchange.com/questions/73320/how-to-visualize-a-fitted-multiple-regression-model.
Based on the suggestions there, I am currently just plotting scatter plots of the dependent variable vs. the 1st independent variable, then vs. the 2nd independent variable, and so on. I may not be able to see the best fit line for the complete model, but I can at least see how it depends on each individual variable.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

train_copy = train[['OverallQual', 'AllSF', 'GrLivArea', 'GarageCars']]
train_copy = pd.get_dummies(train_copy)
train_copy = train_copy.fillna(0)
linear_regr_test = LinearRegression()
fig, axes = plt.subplots(1, len(train_copy.columns.values), sharey=True,
                         constrained_layout=True, figsize=(30, 15))
for i, e in enumerate(train_copy.columns):
    linear_regr_test.fit(train_copy[e].values[:, np.newaxis], y.values)
    axes[i].set_title("Best fit line")
    axes[i].set_xlabel(str(e))
    axes[i].set_ylabel('SalePrice')
    axes[i].scatter(train_copy[e].values[:, np.newaxis], y, color='g')
    axes[i].plot(train_copy[e].values[:, np.newaxis],
                 linear_regr_test.predict(train_copy[e].values[:, np.newaxis]),
                 color='k')
You can use Seaborn's regplot function, and use the predicted and actual data for comparison. It is not the same as plotting a best fit line, but it shows you how well the model works.
sns.regplot(x=y_test, y=y_predict, ci=None, color="b")
You could try to visualize how well your model is performing by comparing actual and predicted values.
Assuming that our actual values are stored in Y, and the predicted ones in Y_, we could plot and compare both.
import seaborn as sns
ax1 = sns.distplot(Y, hist=False, color="r", label="Actual Value")
sns.distplot(Y_, hist=False, color="b", label="Fitted Values" , ax=ax1)

Python matplotlib: legend gives wrong result for scatter

I'm trying to visualize the Fashion-MNIST dataset with different dimensionality reduction techniques, and I also want to attach a legend to the resulting picture using so-called real_labels, which give the real name of each label. For Fashion-MNIST the real labels are:
real_labels = ['t-shirt','trouser','pullover','dress','coat','sandal','shirt','sneaker','bag','ankle boot']
I'm doing the plotting part inside the following function:
def Draw_datasamples_to_figure(X_scaled, labels, axis):
    y = ['${}$'.format(i) for i in labels]
    num_cls = len(list(set(labels)))
    for (X_plot, Y_plot, y1, label1) in zip(X_scaled[:, 0], X_scaled[:, 1], y, labels):
        axis.scatter(X_plot, Y_plot, color=cm.gnuplot(int(label1) / num_cls),
                     label=y1, marker=y1, s=60)
where X_scaled holds the x and y coordinates, labels are integer numbers (0-9) carrying the class information, and axis tells which subplot window the picture will be drawn in.
The legend is drawn with following command:
ax3.legend(real_labels, loc='center left', bbox_to_anchor=(1, 0.5))
Everything seems to work pretty well until the legend is drawn. As you can see from the picture below, instead of the numbers going from 0 to 9, the numbers chosen for the legend are arbitrary.
I know the problem is probably in the scatter part and I should implement it another way, but I hope there is still something simple I'm missing that can fix my implementation. I also don't want to use a hand-made legend where markers and names are hard-coded, because I have other datasets with different classes and real label names. Thanks in advance!
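One common way out, without hand-coding marker names, is to build one proxy handle per class (in class order) and pass those to legend, so the entries line up with real_labels instead of whichever ten scatter artists matplotlib picks first. A sketch with random stand-in data for the reduced dataset:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib import cm

real_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
               'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
num_cls = len(real_labels)

# Random stand-ins for the 2-D embedding and its class labels
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(50, 2))
labels = rng.integers(0, num_cls, size=50)

fig, axis = plt.subplots()
for (X_plot, Y_plot, label1) in zip(X_scaled[:, 0], X_scaled[:, 1], labels):
    marker = '${}$'.format(label1)
    axis.scatter(X_plot, Y_plot, color=cm.gnuplot(int(label1) / num_cls),
                 marker=marker, s=60)

# One proxy handle per class, in class order, so legend entries
# match real_labels regardless of plotting order
handles = [Line2D([], [], linestyle='', marker='${}$'.format(i),
                  color=cm.gnuplot(i / num_cls)) for i in range(num_cls)]
axis.legend(handles, real_labels, loc='center left', bbox_to_anchor=(1, 0.5))
```

Because the handles are generated from range(num_cls), the same pattern works for other datasets: only real_labels changes.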

Plotting a curve from dots of a scatter plot

I'm facing a silly problem while plotting a graph from a regression function calculated using scikit-learn. After constructing the function I need to plot a graph that shows X and Y from the original data together with the points calculated from my function. The problem is that my function is not a straight line: despite the model being linear, it uses a Fourier series to give the right shape to my curve, and when I try to plot the line using:
ax.plot(df['GDPercapita'], modelp1.predict(df1), color='k')
I got a Graph like this:
Plot
But the true graph is supposed to be a line following those black points:
Dots to be connected
I'm generating the graph using the follow code:
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp1.predict(df1),color='k') #this line is changed to get the first pic.
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show(block=True)
Does anyone have an idea about what to do?
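A likely cause of the zig-zag: ax.plot connects points in the order they appear in the data, and df['GDPercapita'] is not sorted. Sorting both arrays by x before plotting gives a single smooth curve. A tiny sketch with made-up values standing in for the x data and the model's predictions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for df['GDPercapita'] and modelp1.predict(df1)
x = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
y_pred = x ** 2  # hypothetical fitted values

# ax.plot joins points in the given order, so unsorted x produces
# the zig-zag; reorder both arrays by x before drawing the line
order = np.argsort(x)
fig, ax = plt.subplots()
ax.plot(x[order], y_pred[order], color='k')
```

With real data the same idea applies: compute the predictions, then index both the x column and the prediction array with np.argsort of the x values before calling ax.plot.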
POST DISCUSSION EDIT:
Ok, so first things first:
The data can be download at: http://www.est.ufmg.br/~marcosop/est171-ML/dados/worldDevelopmentIndicators.csv
I had to generate new data using a Fourier expansion, with normalized values of GDPercapita, in order to run an exhaustive optimization algorithm for the regression function used to predict LifeExpectancy, and found that the number of p parameters that generates the best regression function is p=22.
Now I have to generate a polynomial function using the prediction points of the regression function with p=22, to show how the best regression function compares to the polynomial function of degree 22.
To generate the prediction I use the following code:
from sklearn import linear_model
modelp22 = linear_model.LinearRegression()
modelp22.fit(xp22,y_train)
df22 = df[p22]
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp22.predict(df22),color='k')
ax.set_xlabel('GDPercapita')
ax.set_ylabel('LifeExpectancy')
plt.show(block=True)
Now I need to use the prediction points to create a polynomial function and plot a graph with the original data (first scatter), the prediction points (second scatter), and the polynomial function (a curve) to show their visual relation.

How to plot Power Iteration Clustering model using matplotlib

So I implemented Power Iteration Clustering using Spark's built-in implementation with the dataset I have. I got the model using
model = PowerIterationClustering.train(similarities, 2, 10)
When I do
model.assignments.collect()
I've all the values.
Now I want to plot a scatter plot of this model using Matplotlib.
But I'm not able to understand how to do it.
I gathered that x and y in the code below correspond to id and cluster in the model:
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
But I'm not able to understand how to use it. What should area and colors be?
You first need to parse the Assignment tuples, then collect. The output will be
(<id int>, <cluster int>)
instead of
Assignment(id=..., cluster=...)
You can do this with:
model.assignments.map(lambda asm: (asm[0], asm[1])).collect()
You can then extract the x and y from the resulting list of tuples.
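For the plotting step, once the assignments are collected as plain (id, cluster) tuples, one option is to colour each point by its cluster and keep a fixed marker size; the area and colors arguments from the generic scatter example are not required. A sketch using a hard-coded stand-in for the collected list:

```python
import matplotlib.pyplot as plt

# Hypothetical result of
# model.assignments.map(lambda asm: (asm[0], asm[1])).collect()
assignments = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]

ids = [a[0] for a in assignments]
clusters = [a[1] for a in assignments]

# Colour each point by its cluster id; a fixed size is enough here
plt.scatter(ids, clusters, c=clusters, s=60, alpha=0.5, cmap='viridis')
plt.xlabel('id')
plt.ylabel('cluster')
```

If the original feature vectors are available, plotting those (instead of id vs. cluster) with c=clusters gives a more informative view of the cluster structure.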
