I used scikit-learn in Python to fit polynomial functions:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_reg_model = LinearRegression()
poly_features = poly.fit_transform(xvalues.reshape(-1, 1))
poly_reg_model.fit(poly_features, y_values)
final_predicted = poly_reg_model.predict(poly_features)
...
Instead of only using x^n terms, I want to include a (1-x^2)^(1/2) term in the fit function.
Is this possible with sklearn?
I tried to define a feature that includes more complex terms, but I failed to achieve this.
I don't know whether this is possible within scikit-learn - after all, from a mathematical standpoint a polynomial fit is constrained to polynomial terms. If you want to fit a formula with some unknown parameters, you can use scipy.optimize.curve_fit. First, let us generate some dummy data with noise:
import numpy as np
from matplotlib import pyplot as plt
def f(x):
    return (1-x**2)**(1/2)
xvalues = np.linspace(-1, 1, 30)
yvalues = [f(x) + np.random.randint(-10, 10)/100 for x in xvalues]
Then, we set up our function to be optimized:
from scipy.optimize import curve_fit
def f_opt(x, a, b):
    return (a-x**2)**(1/b)
popt, pcov = curve_fit(f_opt, xvalues, yvalues)
You can of course modify this function to be more flexible. Finally, we plot the results:
plt.scatter(xvalues, yvalues, label='data')
plt.plot(xvalues, f_opt(xvalues, *popt), 'r-', label='fit')
plt.legend()
Now you can use f_opt(new_x, *popt) to predict new points (alternatively, you can print the fitted values and hard-code them). popt holds the parameters that you specified in f_opt, except x - for more details, check the scipy.optimize.curve_fit documentation.
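For example, a minimal prediction sketch (new_x here is just a hypothetical array of query points):
new_x = np.linspace(-1, 1, 100)   # hypothetical query points
new_y = f_opt(new_x, *popt)       # evaluate the fitted model at the new points
print(popt)                       # the fitted values of a and b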
Related
I have a set of values for x and y, and I'm looking for a way to find the value of a parameter in a function.
I have a function y = Ax^(4/3).
I was thinking about using curve_fit, but I'm not sure if this is the right way.
I tried to linearize the function and find the slope with polyfit, but the slope changes radically if I remove some points.
Edit: I tried curve_fit and something strange is happening: it gives me A = 0.55, but this value doesn't fit at all. However, 0.055 seems to work.
Here's my code.
def func(A,x):
    return A*x**(4/3)
popt,pcov = curve_fit(func,x[:18], y[:18])
Any help will be appreciated.
Here is an example of how to fit data to your model. Note that curve_fit expects the model function's first argument to be the independent variable x, followed by the free parameters, whereas your func takes A first.
Import relevant libraries:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
Define x values:
x = np.linspace(0, 10)
Define a function representing your model:
def f(x: np.ndarray, a: float) -> np.ndarray:
    return a * x ** (4/3)
Let's sample data from the above model and add noise:
y = f(x, a=16) * np.random.uniform(1, 2, len(x))
Perform the curve fitting:
popt, pcov = curve_fit(f, x, y)
Plot the results:
plt.scatter(x, y)
plt.plot(x, f(x, *popt), c="r")
plt.show()
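If you want to inspect the result numerically, popt holds the fitted value of a and pcov its estimated covariance:
print(popt)   # fitted value of a
print(pcov)   # estimated covariance of the fit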
I have a Unix time series (x) with an associated signal value (y) which is generated every minute, dropping the first value and appending a new one. I am trying to smooth the resulting curve without losing time accuracy, with a specific emphasis on the final value of the smoothed curve, which will be written to a database. I would like to be able to adjust the smoothing to a considerable degree.
I have studied (as a mathematical layman, more or less) all the options I could find and master. I came across Savitzky-Golay, which looked perfect until I realized it works well on past data but fails to produce a reliable final value if no future data is available for smoothing. I have tried many other methods which produced results but could not be adjusted like Savgol.
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import column
from math import pi
from scipy.signal import savgol_filter
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.interpolate import splrep, splev
from scipy.ndimage import gaussian_filter1d
from scipy.signal import lfilter
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
df_sim = pd.read_csv("/home/20190905_Signal_Smooth_Test.csv")
#sklearn Polynomial*****************************************
poly = PolynomialFeatures(degree=4)
X = df_sim.iloc[:, 0:1].values
print(X)
y = df_sim.iloc[:, 1].values
print(y)
X_poly = poly.fit_transform(X)
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.fit_transform(X)), color='red')
plt.title('Polynomial Regression')
plt.xlabel('Time')
plt.ylabel('Signal')
plt.show()
#scipy interpolate********************************************
bspl = splrep(df_sim['timestamp'], df_sim['signal'], s=5)
bspl_y = splev(df_sim['timestamp'], bspl)
df_sim['signal_spline'] = bspl_y
#scipy gaussian filter****************************************
smooth = gaussian_filter1d(df_sim['signal'], 3)
df_sim['signal_gauss'] = smooth
#scipy lfilter************************************************
n = 5  # the larger n is, the smoother the curve will be
b = [1.0 / n] * n
a = 1
histo_filter = lfilter(b, a, df_sim['signal'])
df_sim['signal_lfilter'] = histo_filter
print(df_sim)
#scipy UnivariateSpline**************************************
s = UnivariateSpline(df_sim['timestamp'], df_sim['signal'], s=5)
xs = df_sim['timestamp']
ys = s(xs)
df_sim['signal_univariante'] = ys
#scipy savgol filter****************************************
sg = savgol_filter(df_sim['signal'], 11, 3)
df_sim['signal_savgol'] = sg
df_sim['date'] = pd.to_datetime(df_sim['timestamp'], unit='s')
#plotting it all********************************************
print(df_sim)
w = 60000
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
title=f"Various Signals y vs Timestamp x")
p.xaxis.major_label_orientation = pi / 4
p.grid.grid_line_alpha = 0.9
p.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p.line(x=df_sim['date'], y=df_sim['signal_spline'], color='blue')
p.line(x=df_sim['date'], y=df_sim['signal_gauss'], color='red')
p.line(x=df_sim['date'], y=df_sim['signal_lfilter'], color='magenta')
p.line(x=df_sim['date'], y=df_sim['signal_univariante'], color='yellow')
p1 = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, plot_height=250,
title=f"Savgol vs Signal")
p1.xaxis.major_label_orientation = pi / 4
p1.grid.grid_line_alpha = 0.9
p1.line(x=df_sim['date'], y=df_sim['signal'], color='green')
p1.line(x=df_sim['date'], y=df_sim['signal_savgol'], color='blue')
output_file("signal.html", title="Signal Test")
show(column(p, p1)) # open a browser
I expect a result similar to Savitzky-Golay but with valid final smoothed values for the data series. None of the other methods offer the same flexibility to adjust the degree of smoothing, and most of them shift the curve to the right. I can provide the csv file for testing.
This really depends on why you are smoothing the data. Every smoothing method has side effects, such as letting some 'noise' through more than others. Research the 'phase response' of filters.
A common technique to avoid the problem of missing data at the end of a symmetric filter is to just forecast your data a few points ahead and use that. For example, if you are using a 5-term moving average filter you will be missing 2 data points when you go to calculate your end value.
To forecast these two points, you could use the auto_arima() function from the pmdarima module, or look at the fbprophet module (which I find quite good for this kind of situation).
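A minimal sketch of that idea, assuming pmdarima is installed and that y holds the raw signal as a 1-D numpy array (the variable names here are illustrative, not from the original code):
import numpy as np
import pmdarima as pm
from scipy.signal import savgol_filter

model = pm.auto_arima(y, seasonal=False, suppress_warnings=True)   # fit an ARIMA model to the series
forecast = model.predict(n_periods=2)                              # forecast the 2 points a 5-term filter is missing
extended = np.concatenate([y, forecast])                           # append the forecasts to the series
smoothed = savgol_filter(extended, 11, 3)[:len(y)]                 # smooth, then drop the forecasted tail
final_value = smoothed[-1]                                         # last smoothed value, e.g. for the database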
I have some data points which I was able to graph successfully, but now I would like to fit a curve to the data. I looked into other Stack Overflow answers and found a few related questions, but I can't seem to implement them. I know the function is a reverse sigmoid.
I would like to use this hill equation: https://imgur.com/rYqEASm
So far I have tried to use the curve_fit() function from the scipy package to find the parameters, but my code always breaks.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = np.array([1, 1.90, 7.70, 30.10, 120.40, 481.60, 1925.00, 7700.00])
y = np.array([4118.47, 4305.79, 4337.47, 4838.11, 2660.76, 1365.05, 79.21, -16.40])
def fit_hill(t,b,s,i,h):
    return b + ((t-b)/(1 + (((x * s)/i)**-h)))
plt.plot(x,y, 'o')
plt.xscale('log')
plt.show()
params = curve_fit(fit_hill, x, y)
[t,b,s,i,h] = params[0]
fit_hill should have 6 parameters, the first of which must be the independent variable x: fit_hill(x, t, b, s, i, h)
(see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html).
You should also try to give an initial guess for the parameters.
For example, in your model the value at x=0 is t, so you can use the value of your data at x=0 as an estimate for t.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = np.array([1, 1.90, 7.70, 30.10, 120.40, 481.60, 1925.00])
y = np.array([4118.47, 4305.79, 4337.47, 4838.11, 2660.76, 1365.05, 79.21])
def fit_hill(x,t,b,s,i,h):
    return b + ((t-b)/(1 + (((x * s)/i)**-h)))
plt.plot(x,y, 'o')
popt, pcov = curve_fit(fit_hill, x, y, p0=(4118, 200, 1, 1900, -2))
plt.plot(x,fit_hill(x,*popt),'+')
plt.xscale('log')
plt.show()
Have you plotted your model to check whether it is suitable for your data?
Also, s and i appear only through the ratio s/i, so they could be replaced by a single variable in your model, as sketched below.
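For illustration only, a reparametrised sketch where that ratio is collapsed into a single constant k (a hypothetical name, equal to i/s in the original formulation):
def fit_hill2(x, t, b, k, h):
    # k replaces i/s, so (x*s)/i becomes x/k
    return b + (t - b) / (1 + (x / k)**-h)

popt2, pcov2 = curve_fit(fit_hill2, x, y, p0=(4118, 200, 1900, -2))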
I'm trying to make a small program that will plot a graph with a best-fit line and predict the COST value based on an inputted SIZE value.
I always get these warnings, and I do not know what they mean:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
No handles with labels found to put in legend.
This is the graph that I get (red); I think the curve should look like the green curve that I have drawn.
And finally, the program makes the prediction only after I close the graph window.
What am I doing wrong?
This is the code:
import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt
size=[[1],[2],[3],[4],[5],[7],[9],[10],[11],[13]]
cost=[[10],[22],[35],[48],[60],[80],[92],[111],[118],[133]]
def predict(size,cost,x):
    dates=np.reshape(size,(len(size),1))
    svr_poly=SVR(kernel="poly",C=1e3, degree=2)
    svr_poly.fit(size,cost)
    plt.scatter(size,cost, color="blue")
    plt.plot(cost, svr_poly.predict(cost), color="red")
    plt.xlabel("Size")
    plt.ylabel("Cost")
    plt.title("prediction")
    plt.legend()
    plt.show()
predictedcost=predict(size,cost,7)
print(predictedcost)
I found an answer to this problem myself; if you are interested, here it is:
import numpy as np
import matplotlib.pyplot as plt
import math
X = np.array([1,2,3,5,6,7,4,7,8,9,5,10,11,7,6,6,10,11,11,12,13,13,14])
Y=np.array([2,3,5,8,11,14,9,19,15,19,15,16,14,7,13,13,14,13,23,25,26,27,33])
koeficienti_polinom = np.polyfit(X, Y, 2)
a=koeficienti_polinom[0]
b=koeficienti_polinom[1]
c=koeficienti_polinom[2]
xval=np.linspace(np.min(X), np.max(X))
regression=a * xval**2 + b*xval + c
predX = float(input("Enter: "))
predY = a * predX**2 + b*predX + c
plt.scatter(X,Y, s=20, color="blue" )
plt.scatter(predX, predY, color="red")
plt.plot(xval, regression, color="black", linewidth=1)
print("Quadratic prediction: ", round(predY, 2))
plt.show()
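As a side note, np.polyval evaluates the fitted polynomial directly, so the coefficients do not have to be unpacked by hand:
regression = np.polyval(koeficienti_polinom, xval)   # same curve as a*xval**2 + b*xval + c
predY = np.polyval(koeficienti_polinom, predX)       # prediction at the input point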
My question pertains to Bayesian inference and how to numerically calculate the model evidence, given some data, a prior, and a posterior distribution.
Given conjugate priors, the Wikipedia article specifies the model evidence as the integral of the likelihood times the prior over the parameters, roughly
p(Y | m) = ∫∫ p(Y | X, β, σ, m) p(β, σ | m) dβ dσ
where σ and β are parameters, m is the model, Y is the data and X is the prior.
Given the setup below, how do I calculate model evidence? I need something that returns one scalar number.
Below is a minimal working example that generates some data (draws from a normal) and assumes a prior (a normal) and a likelihood function (a Gaussian). Notice how both the PDF of the data and the prior integrate to (approximately) one, while the likelihood function can take values over 1.
I am mainly confused as to how to "integrate out" the parameters from the model, and thus take model complexity into consideration. I can see how this could be done analytically if you can write down the log-likelihood function, but I can't really see how this results in one scalar number.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import scipy
import seaborn as sns
sns.set(style="white", palette="muted", color_codes=True)
%matplotlib inline
mu = 0
variance = 1
sigma = np.sqrt(variance)
data = np.random.normal(mu, sigma, 100)  # np.random.normal expects the standard deviation, not the variance
x = np.linspace(-5,5,100)
density = scipy.stats.kde.gaussian_kde(data)
data_pdf = density(x)
prior_pdf = scipy.stats.norm.pdf(x, mu, sigma)
likelihood = np.exp(-np.power(x - mu, 2.) / (2 * np.power(sigma, 2.)))
I1=scipy.integrate.trapz(data_pdf,x)
I2=scipy.integrate.trapz(prior_pdf,x)
I3=scipy.integrate.trapz(likelihood,x)
fig1 = plt.figure(figsize=(7.5,5))
ax1 = fig1.add_subplot(3,1,1)
sns.despine(right=True)
ax1.plot(x,data_pdf,'k')
ax1.legend([r'$PDF(Data)$'],loc='upper left')
ax2 = fig1.add_subplot(3,1,2)
sns.despine(right=True)
ax2.plot(x,prior_pdf,'b')
ax2.legend([r'$Prior$'],loc='upper left')
ax3 = fig1.add_subplot(3,1,3)
sns.despine(right=True)
ax3.plot(x,likelihood,'r')
ax3.legend([r'$Likelihood$'],loc='upper left')
plt.tight_layout()
print(I1,I2,I3)
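As an aside (not part of the original post), here is a rough numerical sketch of the "integrating out" step, assuming for simplicity that the mean mu is the only unknown parameter and sigma is known: the evidence p(data | m) = ∫ p(data | mu, m) p(mu | m) dmu can be approximated on a grid and does collapse to a single scalar.
from scipy import integrate, stats

mu_grid = np.linspace(-5, 5, 1000)                    # grid over the unknown mean
prior_mu = stats.norm.pdf(mu_grid, 0, sigma)          # prior p(mu | m)
# log-likelihood of the whole data set for each candidate mu
log_lik = np.array([stats.norm.logpdf(data, m, sigma).sum() for m in mu_grid])
evidence = integrate.trapz(np.exp(log_lik) * prior_mu, mu_grid)
print(evidence)                                       # one scalar number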