Why isn't my gradient descent algorithm working? - python

I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop, until I get -inf or an "overflow encountered in square" error.
import numpy as np
x = np.array([2,3,4,5])
y = np.array([5,7,9,5])
m = np.random.randn()
b = np.random.randn()
error = 0
lr = 0.0001
for q in range(1000):
    for i in range(len(x)):
        ypred = m*x[i] + b
        error += (ypred - y[i]) ** 2
        m = m - (x * error) * lr
        b = b - (lr * error)
print(b, m)
I expected my algorithm to return the best m and b values for my data (x and y) but it didn't work. What is going wrong?

import numpy as np
x = np.array([2,3,4,5])
y = 0.3*x+0.6
m = np.random.randn()
b = np.random.randn()
lr = 0.001
for q in range(100000):
    ypred = m*x + b
    error = (1./(2*len(x))) * np.sum(np.square(ypred - y))  # eq 1
    m = m - lr * np.sum((ypred - y)*x)/len(x)  # eq 2 and eq 4
    b = b - lr * np.sum(ypred - y)/len(x)      # eq 3 and eq 5
print(m, b)
Output:
0.30007724168011807 0.5997039817571881
Math behind it
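The eq 1 - eq 5 markers in the comments presumably refer to the mean-squared-error cost and its gradient-descent updates; a reconstruction (the original figure is not shown here):

J(m, b) = \frac{1}{2n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2, \qquad \hat{y}_i = m x_i + b  \tag{1}
\frac{\partial J}{\partial m} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)x_i  \tag{2}
\qquad \frac{\partial J}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)  \tag{3}
m \leftarrow m - \mathrm{lr}\cdot\frac{\partial J}{\partial m}  \tag{4}
\qquad b \leftarrow b - \mathrm{lr}\cdot\frac{\partial J}{\partial b}  \tag{5}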
Use numpy vectorized operations to avoid loops.

I think you implemented the formula incorrectly:
use the summation of x * (ypred - y) (the gradient), not the raw accumulated squared error,
and divide by the length of x.
See the code below:
import numpy as np
x = np.array([2,3,4,5])
y = np.array([5,7,9,11])
m = np.random.randn()
b = np.random.randn()
error = 0
lr = 0.1
print(b, m)
for q in range(1000):
    ypred = []
    for i in range(len(x)):
        temp = m*x[i] + b
        ypred.append(temp)
        error += temp - y[i]  # running error (not used in the updates below)
    m = m - np.sum(x * (ypred-y)) * lr/len(x)
    b = b - np.sum(lr * (ypred-y))/len(x)
print(b,m)
Output:
-1.198074371762264 0.058595039571115955 # initial weights
0.9997389097653074 2.0000681277214487 # Final weights

Related

Runge Kutta 4th order Python

I am trying to solve this equation using the Runge-Kutta 4th order method,
applying d2Q/dt2 = F(y, x, v) and dQ/dt = u, with Q = y in my program.
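The equation itself is not reproduced here; judging from the code (and from the corrected version further down, which uses y/(L*C)), it is presumably the driven nonlinear RLC equation

L\,\frac{d^2Q}{dt^2} + R_0\,\frac{dQ}{dt} + R_1\left(\frac{dQ}{dt}\right)^{3} + \frac{Q}{C} = V_0,

with u = dQ/dt and Q = y.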
I try to run the code but i get this error:
Traceback (most recent call last):
File "C:\Users\Egw\Desktop\Analysh\Askhsh1\asdasda.py", line 28, in <module>
k1 = F(y, u, x) #(x, v, t)
File "C:\Users\Egw\Desktop\Analysh\Askhsh1\asdasda.py", line 13, in F
return ((Vo/L -(R0/L)*u -(R1/L)*u**3 - y*(1/L*C)))
OverflowError: (34, 'Result too large')
I tried using the decimal library but I still couldn't make it work properly. I might not have used it correctly, though.
My code is this one:
import numpy as np
from math import pi
from numpy import arange
from matplotlib.pyplot import plot, show
#parameters
R0 = 200
R1 = 250
L = 15
h = 0.002
Vo=1000
C=4.2*10**(-6)
t=0.93
def F(y, u, x):
    return ((Vo/L -(R0/L)*u -(R1/L)*u**3 - y*(1/L*C)))
xpoints = arange(0,t,h)
ypoints = []
upoints = []
y = 0.0
u = Vo/L
for x in xpoints:
    ypoints.append(y)
    upoints.append(u)
    m1 = u
    k1 = F(y, u, x)  # (x, v, t)
    m2 = h*(u + 0.5*k1)
    k2 = (h*F(y+0.5*m1, u+0.5*k1, x+0.5*h))
    m3 = h*(u + 0.5*k2)
    k3 = h*F(y+0.5*m2, u+0.5*k2, x+0.5*h)
    m4 = h*(u + k3)
    k4 = h*F(y+m3, u+k3, x+h)
    y += (m1 + 2*m2 + 2*m3 + m4)/6
    u += (k1 + 2*k2 + 2*k3 + k4)/6
plot(xpoints, upoints)
show()
plot(xpoints, ypoints)
show()
I expected to get the plots of u and y against t.
Turns out I messed up the equations I was using for Runge-Kutta.
The correct code is the following:
import numpy as np
from math import pi
from numpy import arange
from matplotlib.pyplot import plot, show
#parameters
R0 = 200
R1 = 250
L = 15
h = 0.002
Vo=1000
C=4.2*10**(-6)
t0=0
# dz/dx
def G(x, y, z):
    return Vo/L - (R0/L)*z - (R1/L)*z**3 - y/(L*C)
# dy/dx
def F(x, y, z):
    return z
t = np.arange(t0, 0.93, h)
x = np.zeros(len(t))
y = np.zeros(len(t))
z = np.zeros(len(t))
y[0] = 0.0
z[0] = 0
for i in range(1, len(t)):
    k0 = h*F(x[i-1], y[i-1], z[i-1])
    l0 = h*G(x[i-1], y[i-1], z[i-1])
    k1 = h*F(x[i-1]+h*0.5, y[i-1]+k0*0.5, z[i-1]+l0*0.5)
    l1 = h*G(x[i-1]+h*0.5, y[i-1]+k0*0.5, z[i-1]+l0*0.5)
    k2 = h*F(x[i-1]+h*0.5, y[i-1]+k1*0.5, z[i-1]+l1*0.5)
    l2 = h*G(x[i-1]+h*0.5, y[i-1]+k1*0.5, z[i-1]+l1*0.5)
    k3 = h*F(x[i-1]+h, y[i-1]+k2, z[i-1]+l2)
    l3 = h*G(x[i-1]+h, y[i-1]+k2, z[i-1]+l2)
    y[i] = y[i-1] + (k0 + 2*k1 + 2*k2 + k3)/6
    z[i] = z[i-1] + (l0 + 2*l1 + 2*l2 + l3)/6
Q=y
I=z
plot(t, Q)
show()
plot(t, I)
show()
If I may draw your attention to these 4 lines
m1 = u
k1 = F(y, u, x) #(x, v, t)
m2 = h*(u + 0.5*k1)
k2 = (h*F(y+0.5*m1, u+0.5*k1, x+0.5*h))
You should note a fundamental structural difference between the first pair of lines and the second pair:
you need to multiply by the step size h in the first pair as well.
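A minimal sketch of how those stages would look with the missing factor of h added, keeping the question's variable names (m* for the Q/y stages, k* for the u stages); this only illustrates the point above and is not the full corrected program:

m1 = h*u
k1 = h*F(y, u, x)
m2 = h*(u + 0.5*k1)
k2 = h*F(y + 0.5*m1, u + 0.5*k1, x + 0.5*h)
m3 = h*(u + 0.5*k2)
k3 = h*F(y + 0.5*m2, u + 0.5*k2, x + 0.5*h)
m4 = h*(u + k3)
k4 = h*F(y + m3, u + k3, x + h)
y += (m1 + 2*m2 + 2*m3 + m4)/6
u += (k1 + 2*k2 + 2*k3 + k4)/6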
The next problem is the step size and the cubic term. The cubic term contributes a term of size 3*(R1/L)*u^2 ~ 50*u^2 to the Lipschitz constant. In the original IVP as posed in the question, with u = Vo/L ~ 70, this term is of size about 2.5e+5. To keep just that term inside the stability region of the method, the step size would have to be smaller than about 1e-5.
With the corrected initial condition u = 0 at the start, the velocity u stays below 0.001, so the cubic term no longer determines stability; it is now governed by the last term, which contributes a Lipschitz constant of 1/sqrt(L*C) ~ 125. The step size required for stability is then about 0.02, and with h = 0.002 one can expect quantitatively useful results.
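A quick numeric check of those estimates, using only the parameter values from the question (just to illustrate the magnitudes quoted above):

import numpy as np
R0, R1, L, C, Vo = 200, 250, 15, 4.2e-6, 1000
u0 = Vo / L                # ~66.7, the question's starting value for u
print(3*(R1/L)*u0**2)      # on the order of 1e5: the cubic term's contribution to the Lipschitz constant
print(1/np.sqrt(L*C))      # ~126: contribution of the Q/(L*C) term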
You can use the decimal library for more precision (it handles more digits), but it's a bit annoying that every value has to be of the same class (decimal.Decimal).
For example:
import numpy as np
from math import pi
from numpy import arange
from matplotlib.pyplot import plot, show
# Import decimal.Decimal as D
import decimal
from decimal import Decimal as D
# Precision
decimal.getcontext().prec = 10_000_000
#parameters
# Every value should be D class (decimal.Decimal class)
R0 = D(200)
R1 = D(250)
L = D(15)
h = D(0.002)
Vo = D(1000)
C = D(4.2*10**(-6))
t = D(0.93)
def F(y, u, x):
    # Decomposed to work with D
    a = D(Vo/L)
    b = D(-(R0/L)*u)
    c = D(-(R1/L)*u**D(3))
    d = D(-y*(D(1)/L*C))
    return ((a + b + c + d))
xpoints = arange(0,t,h)
ypoints = []
upoints = []
y = D(0.0)
u = D(Vo/L)
for x in xpoints:
    ypoints.append(y)
    upoints.append(u)
    m1 = u
    k1 = F(y, u, x)  # (x, v, t)
    m2 = (h*(u + D(0.5)*k1))
    k2 = (h*F(y+D(0.5)*m1, u+D(0.5)*k1, x+D(0.5)*h))
    m3 = h*(u + D(0.5)*k2)
    k3 = h*F(y+D(0.5)*m2, u+D(0.5)*k2, x+D(0.5)*h)
    m4 = h*(u + k3)
    k4 = h*F(y+m3, u+k3, x+h)
    y += (m1 + D(2)*m2 + D(2)*m3 + m4)/D(6)
    u += (k1 + D(2)*k2 + D(2)*k3 + k4)/D(6)
plot(xpoints, upoints)
show()
plot(xpoints, ypoints)
show()
But even with ten million digits of precision I still get an overflow error. Check the components of the formula: their values are far too large. You can increase the precision to handle them, but you'll notice that computing them takes a long time.
Here is an implementation of the problem using scipy.integrate.odeint and scipy.integrate.solve_ivp.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint, solve_ivp
# Input data initial conditions
ti = 0.0
tf = 0.5
N = 100000
h = (tf-ti)/N
# Initial conditions
u0 = 0.0
Q0 = 0.0
t_span = np.linspace(ti,tf,N)
r0 = np.array([Q0,u0])
# Parameters
R0 = 200
R1 = 250
L = 15
C = 4.2*10**(-6)
V0 = 1000
# Systems of First Order Equations
# This function is used with odeint, as specified in the documentation for scipy.integrate.odeint
def f(r, t, R0, R1, L, C, V0):
    Q, u = r
    ode1 = u
    ode2 = -((R0/L)*u)-((R1/L)*u**3)-((1/(L*C))*Q)+(V0/L)
    return np.array([ode1, ode2])
# This function is used in our 4Order Runge-Kutta implementation and in scipy.integrate.solve_ivp
def F(t, r, R0, R1, L, C, V0):
    Q, u = r
    ode1 = u
    ode2 = -((R0/L)*u)-((R1/L)*u**3)-((1/(L*C))*Q)+(V0/L)
    return np.array([ode1, ode2])
# Solution with odeint and solve_ivp
sol_1 = odeint(f,r0,t_span,args=(R0,R1,L,C,V0))
sol_2 = solve_ivp(fun=F,t_span=(ti,tf), y0=r0, method='LSODA',args=(R0,R1,L,C,V0))
Q_odeint, u_odeint = sol_1[:,0], sol_1[:,1]
Q_solve_ivp, u_solve_ivp = sol_2.y[0,:], sol_2.y[1,:]
# Figures
plt.figure(figsize=[30.0,10.0])
plt.subplot(3,1,1)
plt.grid(color = 'red',linestyle='--',linewidth=0.4)
plt.plot(t_span,Q_odeint,'r',t_span,u_odeint,'b')
plt.xlabel('t(s)')
plt.ylabel('Q(t), u(t)')
plt.subplot(3,1,2)
plt.plot(sol_2.t,Q_solve_ivp,'g',sol_2.t,u_solve_ivp,'y')
plt.grid(color = 'yellow',linestyle='--',linewidth=0.4)
plt.xlabel('t(s)')
plt.ylabel('Q(t), u(t)')
plt.subplot(3,1,3)
plt.plot(Q_solve_ivp,u_solve_ivp,'green')
plt.grid(color = 'yellow',linestyle='--',linewidth=0.4)
plt.xlabel('Q(t)')
plt.ylabel('u(t)')
plt.show()
Runge-Kutta 4th order
# Code development of Runge-Kutta 4 Order
# Parameters
R0 = 200
R1 = 250
L = 15
C = 4.2*10**(-6)
V0 = 1000
# Input data initial conditions #
ti = 0.0
tf = 0.5
N = 100000
h = (tf-ti)/N
# Initial conditions
u0 = 0.0
Q0 = 0.0
# First order ordinary differential equations
def f1(t, Q, u):
    return u
def f2(t, Q, u):
    return -((R0/L)*u)-((R1/L)*u**3)-((1/(L*C))*Q)+(V0/L)
t = np.zeros(N); Q = np.zeros(N); u = np.zeros(N)
t[0] = ti
Q[0] = Q0
u[0] = u0
for i in range(0, N-1, 1):
    k1 = h*f1(t[i], Q[i], u[i])
    l1 = h*f2(t[i], Q[i], u[i])
    k2 = h*f1(t[i]+(h/2), Q[i]+(k1/2), u[i]+(l1/2))
    l2 = h*f2(t[i]+(h/2), Q[i]+(k1/2), u[i]+(l1/2))
    k3 = h*f1(t[i]+(h/2), Q[i]+(k2/2), u[i]+(l2/2))
    l3 = h*f2(t[i]+(h/2), Q[i]+(k2/2), u[i]+(l2/2))
    k4 = h*f1(t[i]+h, Q[i]+k3, u[i]+l3)
    l4 = h*f2(t[i]+h, Q[i]+k3, u[i]+l3)
    Q[i+1] = Q[i] + ((k1+2*k2+2*k3+k4)/6)
    u[i+1] = u[i] + ((l1+2*l2+2*l3+l4)/6)
    t[i+1] = t[i] + h
plt.figure(figsize=[20.0,10.0])
plt.subplot(1,2,1)
plt.plot(sol_2.t, Q_solve_ivp, 'r', t, Q_odeint, 'y', t, Q, 'b')  # solve_ivp uses its own time grid sol_2.t
plt.grid(color = 'yellow',linestyle='--',linewidth=0.4)
plt.xlabel('t(s)')
plt.ylabel(r'$Q(t)_{Odeint}$, $Q(t)_{RK4}$')
plt.subplot(1,2,2)
plt.plot(sol_2.t, Q_solve_ivp, 'g', t, Q_odeint, 'y', t, Q, 'b')
plt.grid(color = 'yellow',linestyle='--',linewidth=0.4)
plt.xlabel('t(s)')
plt.ylabel(r'$Q(t)_{solve_ivp}$, $Q(t)_{RK4}$')

Linear regression outputs "inf" value

I'm trying to learn linear regression and gave this problem a try. The adjusted b (bias) and m (linear coefficient) are being output as "inf" or "-inf"; what should I do?
Sorry if the problem in the code is obvious, I'm new at this.
from matplotlib import pyplot as plt
import random
x = [1,2,3,3,4,4,3,2,1,2,5,4]
y = [1,2,2,1,3,4,1,1,2,3,4,5]
b = random.random()
m = random.random()
learning_rate = 0.3
iterations = 1000
for i in range(iterations):
    for k in range(len(x)):
        X = m * x[k] + b
        derivative_error = 2 * (X - y[k])
        dX_dm = x[k]
        dX_db = 1
        m += derivative_error * dX_dm * learning_rate
        b += derivative_error * learning_rate
If I get it right, you are trying to use gradient descent to solve the linear regression
model. Here are the problems with your approach:
First:
The gradient update is incorrect. Instead of
X = m * x[k] + b
derivative_error = 2 * (X - y[k])
dX_dm = x[k]
dX_db = 1
m += derivative_error * dX_dm * learning_rate
b += derivative_error * learning_rate
it should take the derivative of the error with respect to m and b, and then step in the negative gradient direction (your code adds the gradient instead of subtracting it).
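In symbols, with E the mean squared error over the n data points, the gradients used in the corrected code below are

\frac{\partial E}{\partial m} = \frac{1}{n}\sum_{k}\bigl(-2\,(y_k - m x_k - b)\,x_k\bigr), \qquad
\frac{\partial E}{\partial b} = \frac{1}{n}\sum_{k}\bigl(-2\,(y_k - m x_k - b)\bigr),

and each update moves against the gradient: m <- m - learning_rate * dE/dm and b <- b - learning_rate * dE/db.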
Second:
You shouldn't update the parameters after every single data point x[k], which is what the inner for-loop of your code does:
for k in range(len(x)):
X = m * x[k] + b
derivative_error = 2 * (X - y[k])
dX_dm = x[k]
dX_db = 1
m += derivative_error * dX_dm * learning_rate
b += derivative_error * learning_rate
Instead, accumulate the gradients over all x and average them, then use the averaged gradient to update your m and b.
Third:
Your learning_rate of 0.3 is probably too large: each update 'overshoots' the optimum, and the values of m and b blow up to very large numbers, all the way to inf.
That said, the following is my solution, with an error function to check the
error you get at every iteration.
def error(x, y, m, b):
    error = 0
    for k in range(len(x)):
        error = error + ((x[k] * m + b - y[k]) ** 2)
    return error

from matplotlib import pyplot as plt
import random

x = [1,2,3,3,4,4,3,2,1,2,5,4]
y = [1,2,2,1,3,4,1,1,2,3,4,5]
b = random.random()
m = random.random()
learning_rate = 0.01
iterations = 100

for i in range(iterations):
    print(error(x, y, m, b))
    d_m = 0
    d_b = 0
    for k in range(len(x)):
        # Calculate the derivative w.r.t. m and accumulate it
        derivative_error_m = -2*(y[k] - m*x[k] - b)*x[k]
        d_m = d_m + derivative_error_m
        # Calculate the derivative w.r.t. b and accumulate it
        derivative_error_b = -2*(y[k] - m*x[k] - b)
        d_b = d_b + derivative_error_b
    # Average the derivatives of the error.
    d_m = d_m / len(x)
    d_b = d_b / len(x)
    # Update parameters in the negative direction of the gradient.
    m = m - d_m * learning_rate
    b = b - d_b * learning_rate
After running the code for iterations = 10, you get:
15.443121587504484
14.019097680461613
13.123926121402514
12.561191094860135
12.207425702911078
11.985018705759003
11.8451837105445
11.757253610772613
11.70195107555181
11.66715838203049
where errors are shrinking at every update.
Besides, for a simple model like linear regression there is also a nice closed-form solution that gives you the optimum immediately, without iterative methods such as gradient descent.
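For completeness, a minimal sketch of that closed-form fit on the same data; np.polyfit is not part of the original post and is used here purely as an illustration:

import numpy as np

x = np.array([1,2,3,3,4,4,3,2,1,2,5,4], dtype=float)
y = np.array([1,2,2,1,3,4,1,1,2,3,4,5], dtype=float)

# A degree-1 polynomial fit returns (slope, intercept) for y = m*x + b
m_closed, b_closed = np.polyfit(x, y, deg=1)
print(m_closed, b_closed)

# The equivalent textbook least-squares formulas
m_formula = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b_formula = y.mean() - m_formula * x.mean()
print(m_formula, b_formula)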

General minimal residual method with right-preconditioner of SSOR

I am trying to implement the GMRES algorithm with a right-preconditioner P for solving the linear system Ax = b. The code runs without error; however, the result is imprecise and the residual I get is very large. For plain GMRES (without the preconditioning matrix, i.e. with P removed from the algorithm), the error I get is around 1e-12 and it converges for the same matrix.
import numpy as np
from scipy import sparse
import matplotlib.pyplot as plt
from scipy.linalg import norm as norm
import scipy.sparse as sp
from scipy.sparse import diags
"""The program is to split the matrix into D-diagonal; L: strictly lower matrix; U strictly upper matrix
satisfying: A = D - L - U """
def splitMat(A):
    n, m = A.shape
    if (n == m):
        diagval = np.diag(A)
        D = diags(diagval, 0).toarray()
        L = (-1)*np.tril(A, -1)
        U = (-1)*np.triu(A, 1)
    else:
        print("A needs to be a square matrix")
    return (L, D, U)
"""Preconditioned Matrix for symmetric successive over-relaxation (SSOR): """
def P_SSOR(A, w):
    ## Split up matrix A:
    L, D, U = splitMat(A)
    Comp1 = (D - w*U)
    Comp2 = (D - w*L)
    Comp1inv = np.linalg.inv(Comp1)
    Comp2inv = np.linalg.inv(Comp2)
    P = w*(2-w)*np.matmul(Comp1inv, np.matmul(D, Comp2inv))
    return P
"""GMRES_SSOR using right preconditioning P:
A - matrix of linear system Ax = b
x0 - initial guess
tol - tolerance
maxit - maximum iteration """
def myGMRES_SSOR(A, x0, b, tol, maxit):
    matrixSize = A.shape[0]
    e = np.zeros((maxit+1, 1))
    rr = 1
    rstart = 2
    X = x0
    w = 1.9  ## in ssor
    P = P_SSOR(A, w)  ### preconditioned matrix
    ### Starting the GMRES ####
    for rs in range(0, rstart+1):
        ### first check the residual:
        if rr < tol:
            break
        else:
            r0 = (b - A.dot(x0))
            rho = norm(r0)
            e[0] = rho
            H = np.zeros((maxit+1, maxit))
            Qcol = np.zeros((matrixSize, maxit+1))
            Qcol[:, 0:1] = r0/rho
            for k in range(1, maxit+1):
                ### Arnoldi procedure ##
                Qcol[:, k] = np.matmul(np.matmul(A, P), Qcol[:, k-1])  ### This step applies P here:
                for j in range(0, k):
                    H[j, k-1] = np.dot(np.transpose(Qcol[:, k]), Qcol[:, j])
                    Qcol[:, k] = Qcol[:, k] - (np.dot(H[j, k-1], Qcol[:, j]))
                H[k, k-1] = norm(Qcol[:, k])
                Qcol[:, k] = Qcol[:, k]/H[k, k-1]
                ### QR decomposition step ###
                n = k
                Q = np.zeros((n+1, n))
                R = np.zeros((n, n))
                R[0, 0] = norm(H[0:n+2, 0])
                Q[:, 0] = H[0:n+1, 0] / R[0, 0]
                for j in range(0, n+1):
                    t = H[0:n+1, j-1]
                    for i in range(0, j-1):
                        R[i, j-1] = np.dot(Q[:, i], t)
                        t = t - np.dot(R[i, j-1], Q[:, i])
                    R[j-1, j-1] = norm(t)
                    Q[:, j-1] = t / R[j-1, j-1]
                g = np.dot(np.transpose(Q), e[0:k+1])
                Y = np.dot(np.linalg.inv(R), g)
                Res = e[0:n] - np.dot(H[0:n, 0:n], Y[0:n])
                rr = norm(Res)
                #### second check on the residual ###
                if rr < tol:
                    break
            #### Updating the solution with the preconditioned matrix ####
            X = X + np.matmul(np.matmul(P, Qcol[:, 0:k]), Y)  ### This step applies P here:
    return X
######
A = np.random.rand(100,100)
x = np.random.rand(100,1)
b = np.matmul(A,x)
x0 = np.zeros((100,1))
maxit = 100
tol = 0.00001
x = myGMRES_SSOR(A,x0,b,tol,maxit)
res = b - np.matmul(A,x)
print(norm(res))
print("Solution with gmres\n", np.matmul(A,x))
print("---------------------------------------")
print("b matrix:", b)
I hope someone can help me figure this out!
I'm not sure where you got your "symmetric successive over-relaxation" (SSOR) code from, but it appears to be wrong. You also seem to be assuming that A is a symmetric matrix, but in your random test case it is not.
Following SSOR's Wikipedia entry, I replaced your P_SSOR function with
def P_SSOR(A, w):
    L, D, U = splitMat(A)
    # matrix products (per the Wikipedia formula), not element-wise products
    P = 2/(2-w) * (1/w*D + L) @ np.linalg.inv(D) @ (1/w*D + L).T
    return P
and your test matrix with
A = np.random.rand(100,100)
A = A + A.T
and your code works up to a 12 digit residual error.
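Putting the two changes together, a minimal test driver might look like the sketch below; it assumes the question's splitMat and myGMRES_SSOR are kept as-is, with only P_SSOR and the test matrix replaced:

import numpy as np
from scipy.linalg import norm

A = np.random.rand(100, 100)
A = A + A.T                        # symmetric test matrix, as SSOR assumes
x_true = np.random.rand(100, 1)
b = np.matmul(A, x_true)
x0 = np.zeros((100, 1))

x = myGMRES_SSOR(A, x0, b, tol=1e-5, maxit=100)
print(norm(b - np.matmul(A, x)))   # residual; the answer reports ~12 digits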

Gradient Descent returns NaN values for slope and error

I'm new to machine learning and am trying to implement gradient descent. The code I have looks like this and has been resulting in NaN values for all parameters:
from numpy import array, genfromtxt  # needed for array() and genfromtxt() below

def compute_error_for_line_given_points(b, m, points):
    totalError = 0  # sum of square error formula
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (m*x + b)) ** 2
    return totalError / float(len(points))

def step_gradient(b_current, m_current, points, learning_rate):
    # gradient descent
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - (m_current * x + b_current))
        m_gradient += -(2/N) * x * (y - (m_current * x + b_current))
    new_b = b_current - (learning_rate * b_gradient)
    new_m = m_current - (learning_rate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, array(points), learning_rate)
    return [b, m]

def run():
    # Step 1: Collect the data
    points = genfromtxt("C:/Users/mishruti/Downloads/For Linear Regression.csv", delimiter=",")
    # Step 2: Define our hyperparameters
    learning_rate = 0.0000001  # how fast the data converge
    # y = mx + b (slope formula)
    initial_b = 0  # initial y-intercept guess
    initial_m = 0  # initial slope guess
    num_iterations = 4
    print("Starting gradient descent at b = {0}, m = {1}, error = {2}".format(initial_b, initial_m, compute_error_for_line_given_points(initial_b, initial_m, points)))
    print("Running...")
    [b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, m = {2}, error = {3}".format(num_iterations, b, m, compute_error_for_line_given_points(b, m, points)))

# main function
if __name__ == "__main__":
    run()
A sample from my data set is attached. Can someone please help me figure this out? Thanks!

Converting Numpy Lstsq residual value to R^2

I am performing a least squares regression as below (univariate). I would like to express the significance of the result in terms of R^2. NumPy returns the unscaled residual; what would be a sensible way of normalizing this?
field_clean,back_clean = rid_zeros(backscatter,field_data)
num_vals = len(field_clean)
x = field_clean[:,row:row+1]
y = 10*log10(back_clean)
A = hstack([x, ones((num_vals,1))])
soln = lstsq(A, y )
m, c = soln [0]
residues = soln [1]
print(residues)
See http://en.wikipedia.org/wiki/Coefficient_of_determination
Your R2 value =
1 - residual / sum((y - y.mean())**2)
which is equivalent to
1 - residual / (n * y.var())
As an example:
import numpy as np
# Make some data...
n = 10
x = np.arange(n)
y = 3 * x + 5 + np.random.random(n)
# Note that polyfit is an easier way to do this...
# It would just be "model, resid = np.polyfit(x,y,1,full=True)[:2]"
A = np.vstack((x, np.ones(n))).T
model, resid = np.linalg.lstsq(A, y)[:2]
r2 = 1 - resid / (y.size * y.var())
print(r2)
