I am learning data science and machine learning. I have written code for gradient descent optimization of the linear regression cost function without using any built-in Python library. To confirm that my code is correct and to verify the results, I have also implemented the same thing using a built-in Python library (scikit-learn's SGDRegressor).
The coefficient and intercept values obtained from my code do not match the values obtained from the built-in module. Could you please point out the error in my gradient descent optimization of linear regression?
My method:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()
sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
#generating column of ones
X0 = np.ones(len(Data)).reshape(-1,1)
#print(X0.shape)
X = Data.drop(['Y'], axis = 1).values
X_new = np.concatenate((X0,X), axis = 1)
#print(X_new)
#print(X_new.shape)
Y = Data.loc[:,['Y']].values
#print(Y)
#print(Y.shape)
# initial theta
theta =np.random.randint(low=0, high=1, size= X_new.shape[1]).reshape(-1,1)
#print(theta.shape)
J_history = []
theta_history = [list(theta.flatten())]
#gradient descent implementation
iterations = 1000
alpha = 0.01
m = len(Y)
for iter in range(1,iterations):
    H = X_new.dot(theta)
    loss = (H-Y)
    J = loss/(2*m)
    J_history.append(J)
    G = X_new.T.dot(loss)/m
    theta_new = theta - alpha*G
    theta_history.append(list(theta_new.flatten()))
    theta = theta_new
# collecting costs (J) and coefficients (theta_0,theta_1)
theta_history.pop()
J_history = [i[0] for i in J_history]
params = pd.DataFrame()
params['J']=J_history
for i in range(len(theta_history[0])):
    params['theta_'+str(i)]=[k[i] for k in theta_history]
idx = params[params['J']==min(params['J'])].index
values = params.iloc[idx[0]][1:params.shape[1]].tolist()
print('intercept: {}, coeff: {}'.format(values[0],values[1]))
Using the built-in library:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()
sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
model = SGDRegressor(loss = 'squared_loss', learning_rate = 'constant', eta0 = 0.01, max_iter= 1000)
model.fit(Data['X'].values.reshape(-1,1), Data['Y'].values.reshape(-1,1))
print('coeff: {}, intercept: {}'.format(model.coef_, model.intercept_))
First of all, I appreciate your effort to understand and implement the SGD algorithm by yourself.
Now, back to your code. There are some minor errors that need to be corrected:
Your Js are not scalars but numpy arrays, yet the way you use them implies that they are assumed to be scalars; hence the error raised when your code is executed.
After running your chain, you must take the theta that has the lowest error, and this error is actually J^2, not J, since J may be negative as well.
The scikit-learn SGDRegressor that you are using is, as its name suggests, stochastic by definition, and given the small size of your dataset you need to run it many times and average its estimates if you want to get something reliable from it.
Your learning rate of 0.01 seems to be a little too large.
When those changes are made, your code gives results "comparable" to those of SGDRegressor.
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()
sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
#generating column of ones
X0 = np.ones(len(Data)).reshape(-1,1)
#print(X0.shape)
X = Data.drop(['Y'], axis = 1).values
X_new = np.concatenate((X0,X), axis = 1)
#print(X_new)
#print(X_new.shape)
Y = Data.loc[:,['Y']].values
#print(Y)
#print(Y.shape)
# initial theta
theta =np.random.randint(low=0, high=1, size= X_new.shape[1]).reshape(-1,1)
#print(theta.shape)
J_history = []
theta_history = [list(theta.flatten())]
#gradient descent implementation
iterations = 2000
alpha = 0.001
m = len(Y)
for iter in range(1,iterations):
    H = X_new.dot(theta)
    loss = (H-Y)
    J = loss/(2*m)
    J_history.append(J[0]**2)
    G = X_new.T.dot(loss)/m
    theta_new = theta - alpha*G
    theta_history.append(list(theta_new.flatten()))
    theta = theta_new
theta_history.pop()
J_history = [i[0] for i in J_history]
# collecting costs (J) and coefficients (theta_0,theta_1)
params = pd.DataFrame()
params['J']=J_history
for i in range(len(theta_history[0])):
    params['theta_'+str(i)]=[k[i] for k in theta_history]
idx = params[params['J']== params['J'].min()].index
values = params.iloc[idx[0]][1:params.shape[1]].tolist()
print('intercept: {}, coeff: {}'.format(values[0],values[1]))
#> intercept: 0.654041555750147, coeff: 1.2625626277290982
Now let's look at the scikit-learn model:
from sklearn.linear_model import SGDRegressor
intercepts = []
coefs = []
for _ in range(500):
    model = SGDRegressor(loss = 'squared_loss', learning_rate = 'constant', eta0 = 0.01, max_iter= 1000)
    model.fit(Data['X'].values.reshape(-1,1), Data['Y'].values.reshape(-1))
    intercepts.append(model.intercept_)
    coefs.append(model.coef_)
intercept = np.concatenate(intercepts).mean()
coef = np.vstack(coefs).mean(0)
print('intercept: {}, coeff: {}'.format( intercept, coef))
#> intercept: 0.6912403374422401, coeff: [1.24932246]
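As an extra sanity check (my addition, not part of the original answer), both estimates can be compared against the exact closed-form least-squares solution, reusing X_new and Y from the code above:
# exact ordinary least squares via numpy's lstsq as a reference point
theta_ols, *_ = np.linalg.lstsq(X_new, Y, rcond=None)
print('OLS intercept: {}, OLS coeff: {}'.format(theta_ols[0][0], theta_ols[1][0]))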
I know how I can get Normal Q-Q plots in Python but how can I get quantile residual Q-Q plots?
I tried to do the three steps written here (Chapter 20.2.6.1):
First I tried to adapt this solution for use with smf.glm (I need to use smf because I have a huge dataframe with hundreds of variables I need to pass):
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
# generate some data to check
nobs = 1000
n, p = 50, 0.25
dist0 = stats.nbinom(n, p)
y = dist0.rvs(size=nobs)
x = np.ones(nobs)
df_test = pd.DataFrame({'y': y, 'x': x})
loglike_method = 'nb2' # or use 'nb1'
#res = sm.NegativeBinomial(y, x, loglike_method=loglike_method).fit(start_params=[0.1, 0.1])
res = smf.glm(formula="y ~ x", data=df_test, family=sm.families.NegativeBinomial()).fit(start_params=[0.1, 0.1])
print(dist0.mean())
print(res.params)
mu = res.predict() # use this for mean if not constant
mu = mu.mean()
#mu = np.exp(res.params[0]) # shortcut, we just regress on a constant
alpha = res.params[0]
if loglike_method == 'nb1':
    Q = 1
elif loglike_method == 'nb2':
    Q = 0
size = 1. / alpha * mu**Q
prob = size / (size + mu)
print('data generating parameters: n={}, p={}'.format(n, p))
print('estimated params: size={}, prob={}'.format(size, prob))
#estimated distribution
dist_est = stats.nbinom(size, prob)
But the estimated parameters are totally off.
The next step would be to call stats.nbinom.cdf with those parameters to simulate values ...
Is this the right way?
And how can I get the correct values for size and prob from my fitted model?
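For reference (this is not part of the original question), the randomized quantile residual recipe of Dunn and Smyth can be sketched as follows, assuming the per-observation size and prob values have been recovered correctly from the fit; the function name randomized_quantile_residuals is hypothetical:
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

def randomized_quantile_residuals(y, size, prob, seed=None):
    # Dunn & Smyth (1996): for a discrete response, draw U ~ Uniform(F(y-1), F(y))
    # under the fitted distribution and transform with the standard normal quantile
    rng = np.random.default_rng(seed)
    lower = stats.nbinom.cdf(y - 1, size, prob)
    upper = stats.nbinom.cdf(y, size, prob)
    u = rng.uniform(lower, upper)
    return stats.norm.ppf(u)

# hypothetical usage with the size and prob computed above
r = randomized_quantile_residuals(y, size, prob)
stats.probplot(r, dist='norm', plot=plt)
plt.show()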
I'm trying to use GEKKO to fit to a certain dataset, using the FOPDT Optimization Method to estimate k, tau and theta.
I saw the example using odeint on https://apmonitor.com/pdc/index.php/Main/FirstOrderOptimization and tried to do the same thing with GEKKO, but I can't use the value of theta in the equation.
I saw this question ("where should the delay call be placed inside a gekko code?") and the docs at https://apmonitor.com/wiki/index.php/Apps/TimeDelay, but in this case I wanted to estimate the value of theta rather than use the initial guess value. I tried to use Gekko's delay, but I get an error that it only works if the delay is an int value (and not a Gekko FV).
I also tried to use time directly in the equation, but I can't figure out how to place x(t-theta) in there, since I can't do that syntax with gekko variables.
import pandas as pd
import numpy as np
from gekko import GEKKO
import plotly.express as px
data = pd.read_csv('data.csv',sep=',',header=0,index_col=0)
xm1 = data['x']
ym1 = data['y']
xm = xm1.to_numpy()
ym = ym1.to_numpy()
xm_r = len(xm)
tm = np.linspace(0,xm_r-1,xm_r)
m = GEKKO()
m.options.IMODE=5
m.time = tm
k = m.FV()
k.STATUS=1
tau = m.FV()
tau.STATUS=1
theta = m.FV()
theta.STATUS=1
x = m.Param(value=xm)
y = m.CV()
y.FSTATUS = 1
yObj = m.Param(value=ym)
xtheta = m.Var()
m.delay(x,xtheta,theta)
m.Equation(y.dt()==(-y + k * xtheta)/tau)
m.Minimize((y-yObj)**2)
m.options.EV_TYPE=2
m.solve(disp=True)
Here are some strategies for implementing a variable time delay in a model, such as when an optimizer adjusts the time delay in a First Order Plus Dead Time (FOPDT) model.
Create a cubic spline (continuous approximation) of the relationship between time t and the input u. This allows a fractional time delay that is not restricted to an integer multiple of the sample interval.
Create time as a variable with derivative equal to 1.
Define tc with the equation tc==time-theta to get the time-shifted value. This looks up the spline value uc that corresponds to this tc value.
You can also fit the FOPDT model to data with Excel or other tools.
from gekko import GEKKO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# load data
url = 'http://apmonitor.com/do/uploads/Main/tclab_siso_data.txt'
data = pd.read_csv(url)
t = data['time'].values
u = data['voltage'].values
y = data['temperature'].values
m = GEKKO(remote=False)
m.time = t; time = m.Var(0); m.Equation(time.dt()==1)
K = m.FV(lb=0,ub=1); K.STATUS=1
tau = m.FV(lb=1,ub=300); tau.STATUS=1
theta = m.FV(lb=2,ub=30); theta.STATUS=1
# create cubic spline with t versus u
uc = m.Var(u); tc = m.Var(t); m.Equation(tc==time-theta)
m.cspline(tc,uc,t,u,bound_x=False)
ym = m.Param(y)
yp = m.Var(y); m.Equation(tau*yp.dt()+(yp-y[0])==K*(uc-u[0]))
m.Minimize((yp-ym)**2)
m.options.IMODE=5
m.solve()
print('K: ', K.value[0])
print('tau: ', tau.value[0])
print('theta: ', theta.value[0])
plt.figure()
plt.subplot(2,1,1)
plt.plot(t,u)
plt.legend([r'$V_1$ (mV)'])
plt.ylabel('MV Voltage (mV)')
plt.subplot(2,1,2)
plt.plot(t,y)
plt.plot(t,yp)
plt.legend([r'$T_{1meas}$',r'$T_{1pred}$'])
plt.ylabel('CV Temp (degF)')
plt.xlabel('Time')
plt.savefig('sysid.png')
plt.show()
K: 0.25489655932
tau: 229.06377617
theta: 2.0
Another way to approach this is to estimate a higher-order ARX model and then determine the statistical significance of the beta terms. Here is an example of using the Gekko sysid function.
from gekko import GEKKO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# load data and parse into columns
url = 'http://apmonitor.com/do/uploads/Main/tclab_siso_data.txt'
data = pd.read_csv(url)
t = data['time']
u = data['voltage']
y = data['temperature']
# generate time-series model
m = GEKKO()
# system identification
na = 5 # output coefficients
nb = 5 # input coefficients
yp,p,K = m.sysid(t,u,y,na,nb,pred='meas')
print('alpha: ', p['a'])
print('beta: ', p['b'])
print('gamma: ', p['c'])
plt.figure()
plt.subplot(2,1,1)
plt.plot(t,u)
plt.legend([r'$V_1$ (mV)'])
plt.ylabel('MV Voltage (mV)')
plt.subplot(2,1,2)
plt.plot(t,y)
plt.plot(t,yp)
plt.legend([r'$T_{1meas}$',r'$T_{1pred}$'])
plt.ylabel('CV Temp (degF)')
plt.xlabel('Time')
plt.savefig('sysid.png')
plt.show()
With results:
alpha: [[0.525143 ]
[0.19284469]
[0.08177381]
[0.06152181]
[0.12918898]]
beta: [[[-8.51804876e-05]
[ 5.88425202e-04]
[ 1.99205676e-03]
[-2.81456773e-03]
[ 2.38110003e-03]]]
gamma: [0.75189199]
The first two beta terms are nearly zero but they can also be left in the model for a higher-order representation of the system.
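To see whether the small beta terms actually matter, one rough check (a sketch of my own, not from the original answer) is to refit with fewer input coefficients and compare the prediction error of the two models:
# refit with a reduced number of input coefficients (nb=3) and compare fit quality
m5 = GEKKO(); yp5, p5, _ = m5.sysid(t, u, y, 5, 5, pred='meas')
m3 = GEKKO(); yp3, p3, _ = m3.sysid(t, u, y, 5, 3, pred='meas')
sse5 = float(np.sum((np.asarray(yp5).flatten() - np.asarray(y).flatten())**2))
sse3 = float(np.sum((np.asarray(yp3).flatten() - np.asarray(y).flatten())**2))
print('SSE with nb=5:', sse5)
print('SSE with nb=3:', sse3)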
I'm quite new to probabilistic programming and pymc3...
Currently, I want to implement the Kennedy-O’Hagan framework in pymc3.
The setup, following the paper of Kennedy and O'Hagan, is as follows:
We have n observations zi of the form
zi = f(xi, theta) + g(xi) + ei,
where the xi are known inputs, theta is a vector of unknown calibration parameters, and the ei are iid error terms. We also have m model evaluations yj of the form
yj = f(x'j, thetaj),
where both the x'j (different from the xi above) and the thetaj are known. Therefore, the data consist of all zi and yj. In the paper, Kennedy and O'Hagan model f and g using Gaussian processes:
f ~ GP{m1 (.,.), Sigma1[(.,.),(.,.)] }
g ~ GP{m2 (.), Sigma2[(.),(.)] }
Among other things, the goal is to get posterior samples for the unknown calibration parameters theta.
What I've done so far:
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
from multiprocessing import freeze_support
import sys
import theano
import theano.tensor as tt
from mpl_toolkits.mplot3d import Axes3D
import pyDOE
from scipy.stats.distributions import uniform
def physical_system(x):
    return 0.65 * x / (1 + x / 5)

def observation(x):
    return physical_system(x[:]) + np.random.normal(0,0.01,len(x))

def computational_system(input):
    return input[:,0]*input[:,1]
if __name__ == "__main__":
    freeze_support()
    # observations with noise
    x_obs = np.linspace(0,4,10)
    y_real = physical_system(x_obs[:])
    y_obs = observation(x_obs[:])
    # computation model
    N = 60
    design = pyDOE.lhs(2, samples=N, criterion='center')
    left = [-0.2,-0.2]; right = [4.2,1.2]
    for i in range(2):
        design[:,i] = uniform(loc=left[i],scale=right[i]-left[i]).ppf(design[:,i])
    x_comp = design[:,0][:,None]; t_comp = design[:,1][:,None]
    input_comp = np.hstack((x_comp,t_comp))
    y_comp = computational_system(input_comp)
    x_obs_shared = theano.shared(x_obs[:, None])
    with pm.Model() as model:
        noise = pm.HalfCauchy('noise',beta=5)
        ls_1 = pm.Gamma('ls_1', alpha=1, beta=1, shape=2)
        cov = pm.gp.cov.ExpQuad(2,ls=ls_1)
        f = pm.gp.Marginal(cov_func=cov)
        # train the gp f with data from computer model:
        f_0 = f.marginal_likelihood('f_0', X=input_comp, y=y_comp, noise=noise)
        trace = pm.sample(500, pm.Metropolis(), chains=4)
        burned_trace = trace[300:]
Up to this point, everything is fine: my GP f is trained on the data from the computer model.
Now I want to test whether I can fit this trained GP to my observed data:
#gp f is now trained to data from computer model
#now I want to fit this trained gp to observed data and find posterior for theta
with model:
    sd = pm.Gamma('eta', alpha=1, beta=1)
    theta = pm.Normal('theta', mu=0, sd=sd)
    sigma = pm.Gamma('sigma', alpha=1, beta=1)
    input_1 = tt.concatenate([x_obs_shared, tt.tile(theta, len(x_obs[:,None]), ndim=2).T], axis=1)
    f_1 = f.conditional('f_1', Xnew=input_1, shape=(10,))
    y_ = pm.Normal('y_', mu=f_1, sd=sigma, observed=y_obs)
    step = pm.Metropolis()
    trace_ = pm.sample(30000, step, start=pm.find_MAP(), chains=4)
Is this formulation correct? I get very unstable results...
The full formulation according to KOH should be something like this:
with pm.Model() as model:
    theta = pm.Normal('theta', mu=0, sd=10)
    noise = pm.HalfCauchy('noise',beta=5)
    ls_1 = pm.Gamma('ls_1', alpha=1, beta=1, shape=2)
    cov = pm.gp.cov.ExpQuad(2,ls=ls_1)
    gp1 = pm.gp.Marginal(cov_func=cov)
    gp2 = pm.gp.Marginal(cov_func=cov)
    gp = gp1 + gp2
    input_1 = tt.concatenate([x_obs_shared, tt.tile(theta, len(x_obs), ndim=2).T], axis=1)
    f_0 = gp1.marginal_likelihood('f_0', X=input_comp, y=y_comp, noise=noise)
    f_1 = gp1.marginal_likelihood('f_1', X=input_1, y=y_obs, noise=noise)
    f = gp.marginal_likelihood('f', X=input_1, y=y_obs, noise=noise)
Could somebody give me some advice on how to formulate the KOH framework properly with pymc3? I am desperate... I would appreciate any help. Thank you!
You might have found the solution already, but if not, this is a good reference: Guidelines for the Bayesian calibration of building energy models.
scipy.optimize.fmin_tnc is not working; it gives a dimension mismatch error.
I am optimizing the logistic regression cost function, and I have also computed the gradient. I have vectorized my implementation. Can anyone kindly help?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt
data = pd.read_csv('ex2data1.txt',header = None)
X = data.iloc[:,:-1]
y = data.iloc[:,2]
data.head()
mask = (y==1)
adm = plt.scatter(X[mask][0].values,X[mask][1].values)
not_adm = plt.scatter(X[~mask][0].values,X[~mask][1].values)
plt.xlabel('exam 1 score')
plt.ylabel('exam 2 score')
plt.legend((adm,not_adm),('admitted','not admitted'))
plt.show()
m,n = X.shape
theta = np.zeros((n+1,1))
X = np.hstack((np.ones((m,1)),X))
y = y[:,np.newaxis]
#only data reading and preparation so far.
def sigmoid(z):
    return 1/(1+np.exp(-z))

def computeCost(X,y,theta):
    h = sigmoid(np.dot(X,theta))
    J = (-1/m)*(np.dot(y.T,np.log(h))+np.dot((1-y).T,np.log(1-h)))
    return J

def gradient(X,y,theta):
    h = sigmoid(np.dot(X,theta))
    beta = h-y
    return np.dot(X.T,beta)*(1/m)
temp = opt.fmin_tnc(func = computeCost,x0 = theta.flatten(),fprime =gradient, args = (X, y.flatten()))
This gives a dimension mismatch error. I have checked the computeCost and gradient functions and they work properly. X is 100x3, y is 100x1, theta is 3x1.
Here is a link to an image of the error: https://i.stack.imgur.com/0hlKA.jpg
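For reference (not part of the original post), scipy.optimize.fmin_tnc calls func(x0, *args) and fprime(x0, *args) with the flattened parameter vector as the first argument, so the cost and gradient must take theta first. A minimal sketch of that calling convention, using the hypothetical names computeCost2 and gradient2:
def computeCost2(theta, X, y):
    # theta arrives flattened from the optimizer
    theta = theta.reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    h = sigmoid(np.dot(X, theta))
    return float((-1/m)*(np.dot(y.T, np.log(h)) + np.dot((1-y).T, np.log(1-h))))

def gradient2(theta, X, y):
    # returns a flat gradient vector, as fmin_tnc expects
    theta = theta.reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    h = sigmoid(np.dot(X, theta))
    return (np.dot(X.T, h - y)/m).flatten()

result = opt.fmin_tnc(func=computeCost2, x0=theta.flatten(), fprime=gradient2, args=(X, np.asarray(y).flatten()))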
I am using SciPy's odrpack to fit a linear function to some data that has uncertainties in both the x and y dimensions. Each data point has its own uncertainty, and it is asymmetric.
I can fit a function using symmetric uncertainties, but this is not a true representation of my data.
How can I perform the fit with this in mind?
This is my code so far. It receives the input data as command-line arguments, and the uncertainties I'm using are just random numbers at the moment. (Also, two fits are performed, one for the positive data points and another for the negative ones; the reasons are unrelated to this question.)
import sys
import numpy as np
import scipy.odr.odrpack as odrpack
def f(B, x):
    return B[0]*x + B[1]
xdata = sys.argv[1].split(',')
xdata = [float(i) for i in xdata]
xdata = np.array(xdata)
#find indices of +/- data
zero_ind = np.where(xdata >= 0)[0][0]
x_p = xdata[zero_ind:]
x_m = xdata[:zero_ind+1]
ydata = sys.argv[2].split(',')
ydata = [float(i) for i in ydata]
ydata = np.array(ydata)
y_p = ydata[zero_ind:]
y_m = ydata[:zero_ind+1]
sx_m = np.random.random(len(x_m))
sx_p = np.random.random(len(x_p))
sy_m = np.random.random(len(y_m))
sy_p = np.random.random(len(y_p))
linear = odrpack.Model(f)
data_p = odrpack.RealData(x_p, y_p, sx=sx_p, sy=sy_p)
odr_p = odrpack.ODR(data_p, linear, beta0=[1.,2.])
out_p = odr_p.run()
data_m = odrpack.RealData(x_m, y_m, sx=sx_m, sy=sy_m)
odr_m = odrpack.ODR(data_m, linear, beta0=[1.,2.])
out_m = odr_m.run()
Thanks!
I will just give you a solution with random data, since I could not import your data.
import numpy as np
import scipy.odr.odrpack as odrpack
np.random.seed(1)
N = 10
x = np.linspace(0,5,N)*(-1)
y = 2*x - 1 + np.random.random(N)
sx = np.random.random(N)
sy = np.random.random(N)
def f(B, x):
    return B[0]*x + B[1]
linear = odrpack.Model(f)
# mydata = odrpack.Data(x, y, wd=1./np.power(sx,2), we=1./np.power(sy,2))
mydata = odrpack.RealData(x, y, sx=sx, sy=sy)
myodr = odrpack.ODR(mydata, linear, beta0=[1., 2.])
myoutput = myodr.run()
myoutput.pprint()
Then we get:
Beta: [ 1.92743947 -0.94409236]
Beta Std Error: [ 0.03117086 0.11273067]
Beta Covariance: [[ 0.02047196 0.06690713]
[ 0.06690713 0.26776027]]
Residual Variance: 0.04746112419196648
Inverse Condition #: 0.10277763521624257
Reason(s) for Halting:
Sum of squares convergence
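As a small usage note (my addition): with f(B, x) = B[0]*x + B[1], the fitted parameters can also be read directly from the output object instead of being parsed from pprint:
# myoutput.beta holds the fitted parameter vector [slope, intercept]
slope, intercept = myoutput.beta
print('slope: {}, intercept: {}'.format(slope, intercept))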