Python sklearn polynomial regression

I've been stuck on this issue for two days now. I have some data points that I put in a scatter plot, and the result looks fine.
But now I also want to add a regression line, so I had a look at this example from sklearn and changed the code to this:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
degrees = [3, 4, 5]
X = combined[['WPI score']]
y = combined[['CPI score']]
plt.figure(figsize=(14, 5))
for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([("polynomial_features", polynomial_features),
                         ("linear_regression", linear_regression)])
    pipeline.fit(X, y)

    # Evaluate the models using cross-validation
    scores = cross_val_score(pipeline, X, y, scoring="neg_mean_squared_error", cv=10)

    X_test = X  # np.linspace(0, 1, len(combined))
    plt.plot(X, pipeline.predict(X_test), label="Model")
    plt.scatter(X, y, label="CPI-WPI")
    plt.xlabel("X")
    plt.ylabel("y")
    plt.legend(loc="best")
    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(degrees[i], -scores.mean(), scores.std()))

plt.savefig(pic_path + 'multi.png', bbox_inches='tight')
plt.show()
which gives the following output:
Note that X and y are both DataFrames of shape (151, 1). I can post the contents of X and y too, if necessary.
What I want is a nice smooth line, but I can't figure out how to get one.
[Edit]
The question here is: how do I get a single smooth, curved polynomial line instead of multiple lines in a seemingly random pattern?
[Edit 2]
The problem is that when I use linspace like this:
X_test = np.linspace(1, 4, 151)
X_test = X_test[:, np.newaxis]
I get an even more random pattern:

The trick was to change the code as follows:
X_test = np.linspace(min(X['GPI score']), max(X['GPI score']), X.shape[0])
X_test = X_test[:, np.newaxis]
plt.plot(X_test, pipeline.predict(X_test), label="Model")
which yields the following result (a much nicer, single smooth line):
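For reference, here is a minimal, self-contained sketch of the same fix with synthetic stand-in data (the real combined DataFrame is not shown here): the important part is that the points passed to plt.plot are evenly spaced, sorted, and span the feature's observed range.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the WPI/CPI data
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(1, 4, 151))[:, np.newaxis]
y = 0.5 * X.ravel() ** 3 - 2 * X.ravel() + rng.normal(scale=1.0, size=151)

pipeline = Pipeline([("polynomial_features", PolynomialFeatures(degree=3, include_bias=False)),
                     ("linear_regression", LinearRegression())])
pipeline.fit(X, y)

# Evenly spaced, sorted test points over the observed range give one smooth curve
X_test = np.linspace(X.min(), X.max(), 200)[:, np.newaxis]

plt.scatter(X, y, label="data")
plt.plot(X_test, pipeline.predict(X_test), color="red", label="Model")
plt.legend(loc="best")
plt.show()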

Related

How to fit more complex functions with sklearn?

I used sklearn in Python to fit polynomial functions:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_reg_model = LinearRegression()
poly_features = poly.fit_transform(xvalues.reshape(-1, 1))
poly_reg_model.fit(poly_features, y_values)
final_predicted = poly_reg_model.predict(poly_features)
...
Instead of only using x^n terms, I want to include a (1-x^2)^(1/2) term in the fit function.
Is this possible with sklearn?
I tried to define a feature that includes more complex terms, but I failed to achieve this.
I'm not sure whether it is possible within scikit-learn; after all, a polynomial fit is, mathematically speaking, constrained to polynomial terms. If you want to fit a formula with some unknown parameters, you can use scipy.optimize.curve_fit. First, let us generate some dummy data with noise:
import numpy as np
from matplotlib import pyplot as plt
def f(x):
    return (1 - x**2)**(1/2)
xvalues = np.linspace(-1, 1, 30)
yvalues = [f(x) + np.random.randint(-10, 10)/100 for x in xvalues]
Then, we set up our function to be optimized:
from scipy.optimize import curve_fit
def f_opt(x, a, b):
    return (a - x**2)**(1/b)
popt, pcov = curve_fit(f_opt, xvalues, yvalues)
You can, of course, modify this function to make it more flexible. Finally, we plot the results:
plt.scatter(xvalues, yvalues, label='data')
plt.plot(xvalues, f_opt(xvalues, *popt), 'r-', label='fit')
plt.legend()
Now you can use f_opt(new_x, *popt) to predict new points (alternatively, you can print the values and hard-code them). popt holds the fitted parameters of f_opt, i.e. everything except x; for more details check the documentation I've linked.
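If you do want to stay inside scikit-learn, one alternative (not part of the answer above) is to build the design matrix yourself and keep LinearRegression. This only works when the unknown coefficients enter linearly, i.e. as weights in front of fixed basis functions such as sqrt(1 - x^2); it cannot fit a parameter like the exponent b above. A sketch with hypothetical data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy data on (-1, 1); replace with your own xvalues / y_values
xvalues = np.linspace(-0.99, 0.99, 30)
y_values = np.sqrt(1 - xvalues**2) + np.random.normal(scale=0.05, size=xvalues.size)

# Custom basis: [x, x^2, sqrt(1 - x^2)] instead of plain polynomial features
features = np.column_stack([xvalues, xvalues**2, np.sqrt(1 - xvalues**2)])

model = LinearRegression()
model.fit(features, y_values)
predicted = model.predict(features)

curve_fit remains the better tool when the parameters appear non-linearly in the formula.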

Polynomial Regression plot not showing correctly

I run this code for polynomial regression using sklearn, but my plot is not what I was expecting. As you can see, I'm not getting a smooth line; it's jumping from one point to another. From my understanding I have to sort X, but when I do that, all I get is an empty plot with a single straight line.
import operator
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.formula.api as smf
df = pd.read_csv(r'D:\Mall_Customers.csv', usecols=['Age', 'Annual Income (k$)', 'Spending Score (1-100)'])
x = StandardScaler().fit_transform(df)
kmeans = KMeans(n_clusters=3, max_iter=100)
y_kmeans= kmeans.fit_predict(x)
mydict = {i: np.where(kmeans.labels_ == i)[0] for i in range(kmeans.n_clusters)}
dictlist = []
for key, value in mydict.items():
    temp = [key, value]
    dictlist.append(temp)
df0 = df[df.index.isin(mydict[0].tolist())]
X = df0[['Age', 'Annual Income (k$)']]
Y = df0['Spending Score (1-100)']
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
model = LinearRegression()
model.fit(X_poly, Y)
y_poly_pred = model.predict(X_poly)
r2 = r2_score(Y,y_poly_pred)
print(r2)
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression(fit_intercept = False))
model.fit(X,Y)
plt.scatter(X.iloc[:, 1], Y, color='red')
plt.plot(X, Y, color='blue')
plt.xlabel('Age. Annual income')
plt.ylabel('Spending Score')
plt.show()
TL;DR: the data is not linearly dependent.
The graph got so messy because you plotted X (the training data) against Y (the actual target values) instead of the model's predictions, and because the data itself is noisy and not really linearly dependent.
I suggest you:
split the data into train and test sets, and after you train the model, check the error on the test set; maybe create two plots, one with the model's predictions for the test data and one with the actual values for the test data (see the sketch after the snippet below),
and change the plot code to this:
Y_pred = model.predict(X)      # predictions from the fitted pipeline
plt.scatter(X.iloc[:, 1], Y)   # scatter one feature column against the target
plt.plot(X.iloc[:, 1], Y_pred, color='red')
plt.show()
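A minimal sketch of that suggestion, assuming X and Y are the Age/Income and Spending Score frames built in the question (the split ratio and plot layout are only illustrative):

from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Hold out part of the data to check the error on unseen points
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

print("Test MSE:", mean_squared_error(Y_test, Y_pred))

# One plot for the model's predictions, one for the actual test values
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X_test.iloc[:, 1], Y_pred, color='red')
ax1.set_title('Predicted (test set)')
ax2.scatter(X_test.iloc[:, 1], Y_test, color='blue')
ax2.set_title('Actual (test set)')
plt.show()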

Linear regression plot is really bad

observations that are different from each other, so I ran the regression again, but only for one cluster. It also came out wrong. What exactly is wrong here? I should also point out that I am still new to this (linear regression etc.), so my understanding of it all is still limited. How can I fix this plot? If possible, please also explain why it is wrong.
Code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

np.random.seed(0)
# ... clustering and X / Y construction as in the previous question ...
kmeans.cluster_centers_
kmeans.labels_
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
plt.scatter(X.iloc[:, 1], Y)
plt.show()
You're performing multiple linear regression, since you have 2 input features ('Age', 'Annual Income (k$)') that try to predict the output feature ('Spending Score (1-100)'). You need to plot this data in 3D, in order to properly visualize the regression.
Even though I can't test your code without the data, something like this should work (after training the model):
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], Y)
ax.plot(X.iloc[:, 0], X.iloc[:, 1], y_pred, color='red')  # y_pred: predictions from the trained model
ax.set_xlabel('Age')
ax.set_ylabel('Annual Income')
ax.set_zlabel('Spending Score')
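If the raw 3-D point cloud is still hard to read, another option (not part of the original answer) is to predict on a grid and draw the fitted surface. A sketch, assuming a fitted pipeline called model like the one built in the previous question:

import numpy as np
import matplotlib.pyplot as plt

# Grid covering the observed range of both features
age_grid, income_grid = np.meshgrid(
    np.linspace(X.iloc[:, 0].min(), X.iloc[:, 0].max(), 30),
    np.linspace(X.iloc[:, 1].min(), X.iloc[:, 1].max(), 30))
grid_points = np.column_stack([age_grid.ravel(), income_grid.ravel()])
z_pred = model.predict(grid_points).reshape(age_grid.shape)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], Y, color='red')
ax.plot_surface(age_grid, income_grid, z_pred, alpha=0.4)
ax.set_xlabel('Age')
ax.set_ylabel('Annual Income')
ax.set_zlabel('Spending Score')
plt.show()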

Why is this array compared to two values instead of one?

I'm having trouble understanding what this bit of code means. Why is y.values compared to a tuple with two values, when the shape of the array is a single row (650,)?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
df = pd.read_csv('readonly/mushrooms.csv')
df2 = pd.get_dummies(df)
df3 = df2.sample(frac=0.08)
X = df3.iloc[:,2:]
y = df3.iloc[:,1]
pca = PCA(n_components=2).fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(pca, y, random_state=0)
plt.figure(dpi=120)
plt.scatter(pca[y.values==0,0], pca[y.values==0,1], alpha=0.5, label='Edible', s=2) # < --- why is y.values compared to two values instead of one???
plt.scatter(pca[y.values==1,0], pca[y.values==1,1], alpha=0.5, label='Poisonous', s=2)
plt.legend()
plt.title('Mushroom Data Set\nFirst Two Principal Components')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.gca().set_aspect('equal')
Actually, y.values is not being compared to two values. In pca[y.values==0, 0], the expression y.values == 0 builds a boolean mask that selects rows, and the 0 after the comma is a column index that picks the first principal component of those rows.
Please see the fit_transform documentation: it outputs a 2-dimensional array, which is why two indices (rows, then columns) are used.
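A tiny sketch with made-up numbers showing the same indexing pattern (boolean row mask plus column index):

import numpy as np

pca = np.array([[0.1, 0.2],
                [0.3, 0.4],
                [0.5, 0.6]])   # shape (3, 2): one row per sample, two components
labels = np.array([0, 1, 0])   # stand-in for y.values

mask = labels == 0             # boolean mask over the rows, shape (3,)
print(pca[mask, 0])            # first component of the rows where labels == 0 -> [0.1 0.5]
print(pca[mask, 1])            # second component of those same rows -> [0.2 0.6]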

R abline() equivalent in Python

I am trying to plot a Linear Regression onto a scatterplot in Python.
In R I would simply do the following:
Run OLS Linear Regression
fit_1 <- lm(medv ~ lstat)
plot(medv ~ lstat)
abline(fit_1, col = "red")
I have been looking at different solutions in Python, but I can't seem to get any of them to work.
My script is:
Plot Data
Boston.plot(kind='scatter', x='medv', y='lstat', color = "black")
plt.show()
Run Linear Regression
fit_1 = sm.ols(formula='medv ~ lstat', data= Boston).fit()
Show Summary
fit_1.summary()
Plot Regression Line
Insert code here
It can be done quite simply. In the code below, I use sklearn to fit the model and predict the values.
import pandas as pd
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt
model = LinearRegression()
model.fit(X,y)
predictions = model.predict(X)
plt.plot(X,y,'o')
# change here
plt.plot(X, predictions, '-')
plt.show()
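For the data in the question, X and y would presumably be built from the columns in the formula medv ~ lstat (these names come from the question, not from this answer):

X = Boston[['lstat']]   # predictor as a two-dimensional frame, as sklearn expects
y = Boston['medv']      # response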
Try this:
plt.plot(Boston.lstat, fit_1.fittedvalues, 'r')
I saw this on Statology, and it helped me a lot:
import numpy as np
import matplotlib.pyplot as plt

def abline(slope, intercept):
    """Plot a line across the current axes from a slope and intercept, like R's abline()."""
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    y_vals = intercept + slope * x_vals
    plt.plot(x_vals, y_vals, '--')
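With the statsmodels formula fit from the question, the helper could then be used roughly like this (a hedged sketch: it assumes fit_1 came from smf.ols('medv ~ lstat', ...), whose params Series is indexed by 'Intercept' and 'lstat', and it puts lstat on the x-axis so the line matches the fit):

Boston.plot(kind='scatter', x='lstat', y='medv', color='black')
abline(fit_1.params['lstat'], fit_1.params['Intercept'])  # slope, then intercept
plt.show()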
