Polynomial regression in Python using sklearn, numpy and matplotlib

I'm trying to make a small program that will plot a graph with a best-fit line and predict the COST value based on an inputted SIZE value.
I always get this error, and I don't know what it means:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
No handles with labels found to put in legend.
This is the graph that I get (red), and I think the curve should look like the green curve that I have drawn.
And finally, the program makes the prediction only when I close the graph.
What am I doing wrong?
This is the code:
import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt
size=[[1],[2],[3],[4],[5],[7],[9],[10],[11],[13]]
cost=[[10],[22],[35],[48],[60],[80],[92],[111],[118],[133]]
def predict(size,cost,x):
    dates=np.reshape(size,(len(size),1))
    svr_poly=SVR(kernel="poly",C=1e3, degree=2)
    svr_poly.fit(size,cost)
    plt.scatter(size,cost, color="blue")
    plt.plot(cost, svr_poly.predict(cost), color="red")
    plt.xlabel("Size")
    plt.ylabel("Cost")
    plt.title("prediction")
    plt.legend()
    plt.show()
predictedcost=predict(size,cost,7)
print(predictedcost)

I found the answer to this problem myself. If you are interested, check it out:
import numpy as np
import matplotlib.pyplot as plt
X = np.array([1,2,3,5,6,7,4,7,8,9,5,10,11,7,6,6,10,11,11,12,13,13,14])
Y = np.array([2,3,5,8,11,14,9,19,15,19,15,16,14,7,13,13,14,13,23,25,26,27,33])
# fit a second-degree polynomial and unpack its coefficients
koeficienti_polinom = np.polyfit(X, Y, 2)
a = koeficienti_polinom[0]
b = koeficienti_polinom[1]
c = koeficienti_polinom[2]
# evaluate the fitted curve on a dense grid for plotting
xval = np.linspace(np.min(X), np.max(X))
regression = a * xval**2 + b*xval + c
# predict y for a user-supplied x value
predX = float(input("Enter: "))
predY = a * predX**2 + b*predX + c
plt.scatter(X, Y, s=20, color="blue")
plt.scatter(predX, predY, color="red")
plt.plot(xval, regression, color="black", linewidth=1)
print("Quadratic prediction: ", round(predY, 2))
plt.show()  # show the plot after printing, so the prediction is not blocked by the figure window

Related

How to find the 'peak' of a polynomial regression line in Matplotlib

Is it possible to find the peak (vertex?) values (x,y) of a polynomial regression line that was computed using Matplotlib?
I've included my basic setup below (of course with fuller data sets), as well as a screenshot of the regression line in question.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
degree=6
setX={'Fixation Duration': {0:1,1:2,2:3}}
setY={'Fixation Occurrences': {0:1,1:2,2:3}}
X_gall=pd.DataFrame.from_dict(setX)
Y_gall=pd.DataFrame.from_dict(setY)
X_seqGall = np.linspace(X_gall.min(),X_gall.max(),300).reshape(-1,1)
polyregGall=make_pipeline(PolynomialFeatures(degree),LinearRegression())
polyregGall.fit(X_gall,Y_gall)
plt.scatter(X_gall,Y_gall, c="#1E4174", s=100.0, alpha=0.4)
plt.plot(X_seqGall,polyregGall.predict(X_seqGall),color="#1E4174", linewidth=4)
plt.show()
I would like to find the x, y values along the red arrows.
You can find the maximum from the underlying plot data.
First, let's change your plotting commands to explicitly define the axes:
fig, ax = plt.subplots(figsize=(6,4))
_ = ax.scatter(X_gall,Y_gall, c="#1E4174", s=100.0, alpha=0.4)
poly = ax.plot(X_seqGall,polyregGall.predict(X_seqGall),color="#1E4174", linewidth=4)
plt.show()
Now you can access the line data:
lines = poly[0].axes.lines
for line in lines:
    max_y = np.max(line.get_ydata())
    print(f"Maximum y is: {max_y}")
    x_of_max_y = line.get_xdata()[np.argmax(line.get_ydata())]
    print(f"x value of maximum y is: {x_of_max_y}")
Output:
Maximum y is: 3.1515605364361114
x value of maximum y is: 2.8127090301003346
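Alternatively, since the predictions for X_seqGall are computed anyway, you can take the argmax from the model output directly instead of going back through the plot objects; a minimal sketch reusing the names from the question:
y_pred = polyregGall.predict(X_seqGall).ravel()  # predictions on the dense grid
idx = np.argmax(y_pred)  # index of the peak
print(f"Maximum y is: {y_pred[idx]} at x = {X_seqGall[idx, 0]}")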

I am not able to display a graph in matplotlib

I'm trying to plot a logistic differential equation, and I'm pretty sure the equation is written correctly, but my graph doesn't display anything.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
def eq(con,x):
    return con*x*(1-x)
xList = np.linspace(0,4, num=1000)
con = 2.6
x= .4
for num in range(len(xList)-1):
    plt.plot(xList[num], eq(con,x))
    x=eq(con,x)
plt.xlabel('Time')
plt.ylabel('Population')
plt.title("Logistic Differential Equation")
plt.show()
You get nothing in your plot because you are plotting single points, one call at a time.
In plt you need an x array and a y array (of the same length) in order to draw a line.
If you want to do exactly what you are doing, I suggest the following:
import matplotlib.pyplot as plt  # just plt is sufficient
import numpy as np
def eq(con,x):
    return con*x*(1-x)
xList = np.linspace(0,4, num=1000)
con = 2.6
x = .4
y = np.zeros(len(xList))  # initialize an array with the same length as xList
for num in range(len(xList)):  # fill every element (not len(xList)-1), so the last y is not left at zero
    y[num] = eq(con,x)
    x = eq(con,x)
plt.figure()  # a good habit is always to use figures in plt
plt.plot(xList, y)  # two arrays of the same length
plt.xlabel('Time')
plt.ylabel('Population')
plt.title("Logistic Differential Equation")
plt.show()  # now you should get something here
I hope that this helps you ^^

Is there a simple method to smooth a curve without taking into account future values and without a time shift?

I have a Unix time series (x) with an associated signal value (y) which is generated every minute, dropping the first value and appending a new one. I am trying to smooth the resulting curve without losing time accuracy, with a specific emphasis on the final value of the smoothed curve, which will be written to a database. I would like to be able to adjust the smoothing to a considerable degree.
I have studied (as a mathematical layman, more or less) all the options I could find and master. I came across Savitzky-Golay, which looked perfect until I realized it works well on past data but fails to produce a reliable final value if no future data is available for smoothing. I have tried many other methods which produced results but could not be adjusted like Savgol.
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import column
from math import pi
from scipy.signal import savgol_filter
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.interpolate import splrep, splev
from scipy.ndimage import gaussian_filter1d
from scipy.signal import lfilter
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
df_sim = pd.read_csv("/home/20190905_Signal_Smooth_Test.csv")
#sklearn Polynomial*****************************************
poly = PolynomialFeatures(degree=4)
X = df_sim.iloc[:, 0:1].values
print(X)
y = df_sim.iloc[:, 1].values
print(y)
X_poly = poly.fit_transform(X)
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.fit_transform(X)), color='red')
plt.title('Polynomial Regression')
plt.xlabel('Time')
plt.ylabel('Signal')
plt.show()
#scipy interpolate********************************************
bspl = splrep(df_sim['timestamp'], df_sim['signal'], s=5)
bspl_y = splev(df_sim['timestamp'], bspl)
df_sim['signal_spline'] = bspl_y
#scipy gaussian filter****************************************
smooth = gaussian_filter1d(df_sim['signal'], 3)
df_sim['signal_gauss'] = smooth
#scipy lfilter************************************************
n = 5 # the larger n is, the smoother curve will be
b = [1.0 / n] * n
a = 1
histo_filter = lfilter(b, a, df_sim['signal'])
df_sim['signal_lfilter'] = histo_filter
print(df_sim)
#scipy UnivariateSpline**************************************
s = UnivariateSpline(df_sim['timestamp'], df_sim['signal'], s=5)
xs = df_sim['timestamp']
ys = s(xs)
df_sim['signal_univariante'] = ys
#scipy savgol filter****************************************
sg = savgol_filter(df_sim['signal'], 11, 3)
df_sim['signal_savgol'] = sg
df_sim['date'] = pd.to_datetime(df_sim['timestamp'], unit='s')
#plotting it all********************************************
print(df_sim)
w = 60000
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
           title="Various Signals y vs Timestamp x")
p.xaxis.major_label_orientation = pi / 4
p.grid.grid_line_alpha = 0.9
p.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p.line(x=df_sim['date'], y=df_sim['signal_spline'], color='blue')
p.line(x=df_sim['date'], y=df_sim['signal_gauss'], color='red')
p.line(x=df_sim['date'], y=df_sim['signal_lfilter'], color='magenta')
p.line(x=df_sim['date'], y=df_sim['signal_univariante'], color='yellow')
p1 = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
            title="Savgol vs Signal")
p1.xaxis.major_label_orientation = pi / 4
p1.grid.grid_line_alpha = 0.9
p1.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p1.line(x=df_sim['date'], y=df_sim['signal_savgol'], color='blue')
output_file("signal.html", title="Signal Test")
show(column(p, p1)) # open a browser
I expect a result that is similar to Savitzky-Golay but with valid final smoothed values for the data series. None of the other methods offer the same flexibility to adjust the degree of smoothing. Most other methods shift the curve to the right. I can provide the csv file for testing.
This really depends on why you are smoothing the data. Every smoothing method has side effects, such as letting some 'noise' through more than others. Research 'phase response of filtering'.
A common technique to avoid the problem of missing data at the end of a symmetric filter is simply to forecast your data a few points ahead and use that. For example, if you are using a 5-term moving average filter, you will be missing 2 data points when you go to calculate your end value.
To forecast these two points, you could use the auto_arima() function from the pmdarima module, or look at the fbprophet module (which I find quite good for this kind of situation).
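A minimal sketch of that idea, assuming the df_sim frame from the question and a 5-term centered moving average (the auto_arima settings here are illustrative, not prescriptive):
import numpy as np
import pandas as pd
import pmdarima as pm
signal = df_sim['signal'].values
# fit an ARIMA model on the observed signal and forecast the 2 missing future points
model = pm.auto_arima(signal, suppress_warnings=True)
forecast = model.predict(n_periods=2)
# extend the series, smooth it, then keep only the original time range
extended = np.concatenate([signal, forecast])
smoothed = pd.Series(extended).rolling(window=5, center=True).mean()[:len(signal)]
# the last two smoothed values now rest on forecasts instead of unavailable future data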

How do I use scipy and matplotlib to fit a reverse sigmoid function

I have some data points which I was successfully able to graph, but now I would like to fit a curve to the data. I looked into other Stack Overflow answers and found a few related questions, but I can't seem to implement them. I know the function is a reverse sigmoid.
I would like to use this Hill equation: https://imgur.com/rYqEASm
So far I have tried to use the curve_fit() function from the scipy package to find the parameters, but my code always breaks.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = np.array([1, 1.90, 7.70, 30.10, 120.40, 481.60, 1925.00, 7700.00])
y = np.array([4118.47, 4305.79, 4337.47, 4838.11, 2660.76, 1365.05, 79.21, -16.40])
def fit_hill(t,b,s,i,h):
    return b + ((t-b)/(1 + (((x * s)/i)**-h)))
plt.plot(x,y, 'o')
plt.xscale('log')
plt.show()
params = curve_fit(fit_hill, x, y)
[t,b,s,i,h] = params[0]
fit_hill should have 6 parameters: the independent variable x plus the five fit parameters, i.e. fit_hill(x,t,b,s,i,h)
(see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html).
You should also try to give an initial guess for the parameters.
For example, in your model, when x=0 the value is t, so you can use the value at x=0 as an estimate for t.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = np.array([1, 1.90, 7.70, 30.10, 120.40, 481.60, 1925.00])
y = np.array([4118.47, 4305.79, 4337.47, 4838.11, 2660.76, 1365.05, 79.21])
def fit_hill(x,t,b,s,i,h):
    return b + ((t-b)/(1 + (((x * s)/i)**-h)))
plt.plot(x,y, 'o')
popt,pcov = curve_fit(fit_hill, x, y,(4118,200,1,1900,-2))
plt.plot(x,fit_hill(x,*popt),'+')
plt.xscale('log')
plt.show()
Have you plotted your model to check whether it is suitable for your data?
s and i, which appear only as the ratio s/i, could be replaced with a single variable in your model.
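A minimal sketch of that simplification, collapsing the s/i ratio into a single midpoint parameter (called k here, a name not in the original):
import numpy as np
from scipy.optimize import curve_fit
x = np.array([1, 1.90, 7.70, 30.10, 120.40, 481.60, 1925.00])
y = np.array([4118.47, 4305.79, 4337.47, 4838.11, 2660.76, 1365.05, 79.21])
def fit_hill(x, t, b, k, h):
    # k plays the role of i/s: the x value at the curve's midpoint
    return b + (t - b) / (1 + (x / k) ** -h)
popt, pcov = curve_fit(fit_hill, x, y, p0=(4118, 200, 1900, -2))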

resampled time using scipy.signal.resample

I have a signal that is not sampled equidistantly; for further processing it needs to be. I thought that scipy.signal.resample would do it, but I do not understand its behavior.
The signal is in y, corresponding time in x.
The resampled signal is expected in yy, with the corresponding times in xx. Does anyone know what I am doing wrong, or how to achieve what I need?
This code does not work; xx is not time:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
x = np.array([0,1,2,3,4,5,6,6.5,7,7.5,8,8.5,9])
y = np.cos(-x**2/4.0)
num=50
z=signal.resample(y, num, x, axis=0, window=None)
yy=z[0]
xx=z[1]
plt.plot(x,y)
plt.plot(xx,yy)
plt.show()
Even when you give the x coordinates (which correspond to the t argument), resample assumes that the sampling is uniform.
Consider using one of the univariate interpolators in scipy.interpolate.
For example, this script:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.array([0,1,2,3,4,5,6,6.5,7,7.5,8,8.5,9])
y = np.cos(-x**2/4.0)
f = interpolate.interp1d(x, y)
num = 50
xx = np.linspace(x[0], x[-1], num)
yy = f(xx)
plt.plot(x,y, 'bo-')
plt.plot(xx,yy, 'g.-')
plt.show()
generates this plot:
Check the docstring of interp1d for options to control the interpolation, and also check out the other interpolation classes.
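For instance, a small variation on the script above gives a smoother result by switching interp1d to cubic interpolation:
f = interpolate.interp1d(x, y, kind='cubic')  # cubic instead of the default linear
yy = f(xx)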
