Hey, I am trying to understand this algorithm for a linear hypothesis. I can't figure out whether my implementation is correct or not. I think it is not, but I can't work out what I am missing.
theta0 = 1
theta1 = 1
alpha = 0.01
# le is assumed to be the number of training examples, i.e. le = len(x)
for i in range(0, le * 10):
    for j in range(0, le):
        temp0 = theta0 - alpha * (theta1 * x[j] + theta0 - y[j])
        temp1 = theta1 - alpha * (theta1 * x[j] + theta0 - y[j]) * x[j]
        theta0 = temp0
        theta1 = temp1
print("Values of slope and y intercept derived using gradient descent ", theta1, theta0)
It gives me the correct answer to the 4th decimal place, but when I compare it with other programs on the net I get confused.
Thanks in advance!
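For comparison, here is a minimal vectorized batch-gradient-descent sketch of the same linear hypothesis. It is illustrative only: the data below is made up, and the update averages the gradient over all samples instead of updating per sample as the loop above does.
import numpy as np

# Illustrative data (assumption): noisy points from y = 2x + 3.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 3 + rng.normal(0, 0.01, size=x.shape)

theta0, theta1 = 1.0, 1.0
alpha = 0.01
for _ in range(50000):
    error = theta1 * x + theta0 - y           # residuals for every sample
    theta0 -= alpha * error.mean()            # batch gradient w.r.t. the intercept
    theta1 -= alpha * (error * x).mean()      # batch gradient w.r.t. the slope
print("Values of slope and y intercept derived using gradient descent ", theta1, theta0)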
Implementation of the Gradient Descent algorithm:
import numpy as np
cur_x = 1 # Initial value
gamma = 1e-2 # step size multiplier
precision = 1e-10
prev_step_size = cur_x
# test function
def foo_func(x):
    y = (np.sin(x) + x**2)**2
    return y
# Iteration loop until a certain error measure
# is smaller than a maximal error
while (prev_step_size > precision):
    prev_x = cur_x
    cur_x += -gamma * foo_func(prev_x)
    prev_step_size = abs(cur_x - prev_x)
print("The local minimum occurs at %f" % cur_x)
This is more of a computational physics problem, and I've asked it on Physics Stack Exchange, but got no answers there. It is, I suppose, a mix of the disciplines here and there (and maybe even Mathematics Stack Exchange), so finding the right place to post is a task in and of itself, apparently...
I'm attempting to use Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised wrt its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (image below).
Does this behaviour seem reasonable? I have searched around and not seen questions/figures that are similar. I've tested another person's code from GitHub and it exhibits the same behaviour, which makes me feel a bit better. But I still think the central peak should just decrease in height and increase in width. The likelihood of my getting a physics-based explanation here is relatively low, I'd assume, but a computation-based explanation of errors I may have made is more likely.
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to a GIF of the time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Define function for norm.
def normf(dxc, uc, ic):
    return sum(dxc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
    return sum(dxc * xc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
    return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))
# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
    return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))
# Time t: t0 <= t <= tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to a certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num = N + 1, endpoint = True)
decp = str(dt)[::-1].find('.')
# Initialise array for filling with norm values at each time step.
norm = np.zeros(len(t))
# Initialise array for expectation value of position.
xexp = np.zeros(len(t))
# Initialise array for expectation value of squared position.
xexps = np.zeros(len(t))
# Initialise array for alternate standard deviation.
sda = np.zeros(len(t))
# Position x: -a <= x <= a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num = M + 1, endpoint = True)
# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * var))
u = np.zeros([len(t), len(x)], dtype = 'complex_')
u[0, :] = u0
# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))
# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))
# Tridiagonal matrices Al and Ar. Al to be solved using the Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range(0, M):
    Al[i + 1, i] = alpha
    Al[i, i] = 1 - (2 * alpha)
    Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha
Ar = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range(0, M):
    Ar[i + 1, i] = beta
    Ar[i, i] = 1 - (2 * beta)
    Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2*beta, 1 - beta
# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype = 'complex_')
for n in range(0, NT - 1):
    if n == 0:
        cp[n] = c[n] / b[n]
    else:
        cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype = 'complex_')
dp = np.zeros(NT, dtype = 'complex_')
# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
    # BC's.
    u[i, 0], u[i, M] = 0, 0
    # Find RHS.
    d = np.dot(Ar, u[i, :])
    for n in range(0, NT):
        if n == 0:
            dp[n] = d[n] / b[n]
        else:
            dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * cp[n - 1]))
    nc = NT - 1
    while nc > -1:
        if nc == NT - 1:
            u[i + 1, nc] = dp[nc]
            nc -= 1
        else:
            u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
            nc -= 1
    norm[i] = normf(dx, u, i)
    xexp[i] = xexpf(dx, x, u, i)
    xexps[i] = xexpsf(dx, x, u, i)
    sda[i] = sdaf(xexp, xexps, i)
# Fill in final norm value.
norm[N] = normf(dx, u, N)
# Fill in final position expectation value.
xexp[N] = xexpf(dx, x, u, N)
# Fill in final squared position expectation value.
xexps[N] = xexpsf(dx, x, u, N)
# Fill in final standard deviation value.
sda[N] = sdaf(xexp, xexps, N)
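A quick sanity check that can be appended to the script above (an addition, reusing var, t and sda): the coefficients above correspond to a free particle with hbar = m = 1, and a free Gaussian packet has a known analytic width, so the numerically computed standard deviation can be compared against it. The initial |u|^2 has variance var / 2, and the analytic width is sigma(t)^2 = var / 2 + t^2 / (2 * var).
# Analytic free-particle width (assumes hbar = m = 1 and zero potential).
sigma_analytic = np.sqrt(var / 2 + np.square(t) / (2 * var))
print("max |sda - analytic width| =", np.max(np.abs(sda - sigma_analytic)))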
I need to implement gradient descent on my own. My task is to create an arbitrary function, add noise to it, and then find the coefficient values of that function. So first, I created a function and generated some noisy values:
# Preprocessing input data
import random
import numpy as np
import matplotlib.pyplot as plt

function = lambda x: x ** 2 + x + 1
X = []
Y = []
for i in range(-100, 100):
    X.append(i)
    Y.append(function(i) + random.randrange(-10, 10))
Then I normalized the values:
X = np.asarray(X)
Y = np.asarray(Y)
maxVal = np.max(np.hstack((X, Y)))
X = X / maxVal
Y = Y / maxVal
And this is my code for gradient descent, using the derivatives to find the coefficients:
# w1, w2, b (initial coefficients), L (learning rate), epochs and n = len(X)
# are assumed to be defined earlier.
w1Arr = []
w2Arr = []
bArr = []
lossArr = []
for i in range(epochs):
    Y_pred = w1 * np.square(X) + w2 * X + b
    D_w1 = (-2 / n) * sum(np.square(X) * (Y - Y_pred))  # Derivative for w1
    D_w2 = (-2 / n) * sum(X * (Y - Y_pred))             # Derivative for w2
    D_b = (-2 / n) * sum(Y - Y_pred)                    # Derivative for b
    w1 = w1 - L * D_w1  # Update w1
    w2 = w2 - L * D_w2  # Update w2
    b = b - L * D_b     # Update b
    loss = sum((Y - Y_pred) * (Y - Y_pred))  # MSE
    w1Arr.append(w1)
    w2Arr.append(w2)
    bArr.append(b)
    lossArr.append(loss)
When I try to plot the results:
# Making predictions
Y_pred = w1 * np.square(X) + w2 * X + b
# print(Y_pred)
plt.scatter(X, Y, label='data')
plt.plot(X, Y_pred, label='predicted')
plt.legend()
plt.show()
I see that the coefficients are pretty much the same, and the plot just looks like a straight line:
I'm pretty much stuck and don't know what is wrong with my code or how to fix it. I looked online for solutions but couldn't find any. Any help would be appreciated.
Found out the problem!
The normalization you applied messed up the relation between x and y; in particular, you skewed the domain with respect to the codomain:
maxVal = np.max(np.hstack((X,Y)))
X = X/maxVal
Y = Y/maxVal
Just remove the normalization and you will find that you can learn the coefficients. If you really want, you can normalize both axes, but the two scaling factors have to be proportional:
X = X / np.max(X)
Y = Y / np.max(Y)
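As a separate sanity check (not part of the answer above), the coefficients that gradient descent should converge toward can be obtained directly with a least-squares fit on the un-normalized X and Y arrays:
import numpy as np
# Quadratic least-squares fit; expected to be close to [1, 1, 1]
# (descending powers of x) for x**2 + x + 1 plus the added noise.
coeffs = np.polyfit(X, Y, deg=2)
print(coeffs)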
I'm trying to learn linear regression and gave this problem a try. The results for the adjusted b (bias) and m (linear coefficient) are being output as "inf" or "-inf"; what should I do?
Sorry if the problem in the code is obvious, I'm new at this.
from matplotlib import pyplot as plt
import random
x = [1,2,3,3,4,4,3,2,1,2,5,4]
y = [1,2,2,1,3,4,1,1,2,3,4,5]
b = random.random()
m = random.random()
learning_rate = 0.3
iterations = 1000
for i in range(iterations):
    for k in range(len(x)):
        X = m * x[k] + b
        derivative_error = 2 * (X - y[k])
        dX_dm = x[k]
        dX_db = 1
        m += derivative_error * dX_dm * learning_rate
        b += derivative_error * learning_rate
If I understand it right, you are trying to use gradient descent to solve the linear regression model. Here are the problems with your approach:
First:
The derivative is incorrect. Instead of
X = m * x[k] + b
derivative_error = 2 * (X - y[k])
dX_dm = x[k]
dX_db = 1
m += derivative_error * dX_dm * learning_rate
b += derivative_error * learning_rate
you should take the derivative of the error with respect to m and b.
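Concretely, for a single point the squared error is E = (y[k] - (m * x[k] + b)) ** 2, so dE/dm = -2 * (y[k] - m * x[k] - b) * x[k] and dE/db = -2 * (y[k] - m * x[k] - b), which is what the corrected code further down uses.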
Second:
You shouldn't update the parameters every time you see a data point x[k], as you are doing in the inner for-loop of your code:
for k in range(len(x)):
    X = m * x[k] + b
    derivative_error = 2 * (X - y[k])
    dX_dm = x[k]
    dX_db = 1
    m += derivative_error * dX_dm * learning_rate
    b += derivative_error * learning_rate
Instead, accumulate the errors over all x and average them, then use the averaged gradient to update your m and b.
Third:
Perhaps your learning_rate of 0.3 is too large, so that each update 'overshoots' the optimum point, and the values of m and b blow up all the way to inf.
That said, the following is my solution, with an error function to check the error you get at every iteration.
def error(x, y, m, b):
    error = 0
    for k in range(len(x)):
        error = error + ((x[k] * m + b - y[k]) ** 2)
    return error
from matplotlib import pyplot as plt
import random
x = [1,2,3,3,4,4,3,2,1,2,5,4]
y = [1,2,2,1,3,4,1,1,2,3,4,5]
b = random.random()
m = random.random()
learning_rate = 0.01
iterations = 100
for i in range(iterations):
    print(error(x, y, m, b))
    d_m = 0
    d_b = 0
    for k in range(len(x)):
        # Calculate the derivative w.r.t. m and accumulate it
        derivative_error_m = -2*(y[k] - m*x[k] - b)*x[k]
        d_m = d_m + derivative_error_m
        # Calculate the derivative w.r.t. b and accumulate it
        derivative_error_b = -2*(y[k] - m*x[k] - b)
        d_b = d_b + derivative_error_b
    # Average the accumulated derivatives.
    d_m = d_m / len(x)
    d_b = d_b / len(x)
    # Update parameters in the negative direction of the gradient.
    m = m - d_m * learning_rate
    b = b - d_b * learning_rate
After running the code for iterations = 10, you get:
15.443121587504484
14.019097680461613
13.123926121402514
12.561191094860135
12.207425702911078
11.985018705759003
11.8451837105445
11.757253610772613
11.70195107555181
11.66715838203049
where errors are shrinking at every update.
Besides, you should also notice that for a simple model like linear regression there is a nice closed-form solution, which gets you the optimum immediately without iterative methods such as gradient descent.
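For reference, a minimal sketch of that closed-form least-squares solution for the same data (this uses NumPy, which the answer above does not):
import numpy as np

x = np.array([1, 2, 3, 3, 4, 4, 3, 2, 1, 2, 5, 4], dtype=float)
y = np.array([1, 2, 2, 1, 3, 4, 1, 1, 2, 3, 4, 5], dtype=float)

# Solve the normal equations for [m, b] via least squares.
A = np.column_stack([x, np.ones_like(x)])   # design matrix with a column of ones
m_opt, b_opt = np.linalg.lstsq(A, y, rcond=None)[0]
print("closed-form m, b:", m_opt, b_opt)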
While implementing the gradient descent algorithm for linear regression, the prediction my algorithm makes and the resulting regression line come out wrong. Could anyone please have a look at my implementation and help me out? Also, please advise how to choose the "learning rate" and "number of iterations" for a specific regression problem.
theta0 = 0 #first parameter
theta1 = 0 #second parameter
alpha = 0.001 #learning rate (denoted by alpha)
num_of_iterations = 100 #total number of iterations performed by Gradient Descent
m = float(len(X)) #total number of training examples
for i in range(num_of_iterations):
    y_predicted = theta0 + theta1 * X
    derivative_theta0 = (1/m) * sum(y_predicted - Y)
    derivative_theta1 = (1/m) * sum(X * (y_predicted - Y))
    temp0 = theta0 - alpha * derivative_theta0
    temp1 = theta1 - alpha * derivative_theta1
    theta0 = temp0
    theta1 = temp1
print(theta0, theta1)
y_predicted = theta0 + theta1 * X
plt.scatter(X,Y)
plt.plot(X, y_predicted, color = 'red')
plt.show()
Resulting regression line about which I need some help
Your learning rate is too high; I got it working by reducing the learning rate to alpha = 0.0001.
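A common way to judge whether alpha and num_of_iterations are reasonable (a sketch, reusing X, Y, m, alpha, num_of_iterations and plt from the question) is to record the mean squared error at every iteration and check that it decreases and flattens out; if it grows or oscillates, alpha is too high:
theta0, theta1 = 0, 0          # restart from the initial parameters
costs = []
for i in range(num_of_iterations):
    y_predicted = theta0 + theta1 * X
    costs.append((1/m) * sum((y_predicted - Y) ** 2))   # MSE for this iteration
    theta0 = theta0 - alpha * (1/m) * sum(y_predicted - Y)
    theta1 = theta1 - alpha * (1/m) * sum(X * (y_predicted - Y))
plt.plot(costs)
plt.xlabel('iteration')
plt.ylabel('MSE')
plt.show()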
For my physics degree, I have to take some Python lessons. I'm an absolute beginner and, as such, I can't understand the other answers. The code is meant to plot an object's trajectory with air resistance. I would really appreciate a quick fix; I think it has something to do with the time variable being too small, but increasing it doesn't help.
import matplotlib.pyplot as plt
import numpy as np
import math # need math module for trigonometric functions
g = 9.81 #gravitational constant
dt = 1e-3 #integration time step (delta t)
v0 = 40 # initial speed at t = 0
angle = math.pi/4 #math.pi = 3.14, launch angle in radians
time = np.arange(0, 10, dt) #time axis
vx0 = math.cos(angle)*v0 # starting velocity along x axis
vy0 = math.sin(angle)*v0 # starting velocity along y axis
xa = vx0*time # compute x coordinates
ya = -0.5*g*time**2 + vy0*time # compute y coordinates
def traj_fric(angle, v0): # function for trajectory
    vx0 = math.cos(angle) * v0 # for some launch angle and starting velocity
    vy0 = math.sin(angle) * v0 # compute x and y component of starting velocity
    x = np.zeros(len(time)) # initialise x and y arrays
    y = np.zeros(len(time))
    x[0], y[0] = 0, 0 # projectile starts at (0, 0)
    x[1], y[1] = x[0] + vx0 * dt, y[0] + vy0 * dt # second elements of x and y
                                                  # are determined by initial velocity
    i = 1
    while y[i] >= 0: # conditional loop continues until projectile hits ground
        gamma = 0.005 # constant of friction
        height = 100 # height at which air friction disappears
        f = 0.5 * gamma * (height - y[i]) * dt
        x[i + 1] = (2 * x[i] - x[i - 1] + f * x[i - 1])/1 + f # numerical integration to find x[i + 1]
        y[i + 1] = (2 * y[i] - y[i - 1] + f * y[i - 1] - g * dt ** 2)/ 1 + f # and y[i + 1]
        i = i + 1 # increment i for next loop
    x = x[0:i+1] # truncate x and y arrays
    y = y[0:i+1]
    return x, y, (dt*i), x[i] # return x, y, flight time, range of projectile
x, y, duration, distance = traj_fric(angle, v0)
fig1 = plt.figure()
plt.plot(xa, ya) # plot y versus x
plt.xlabel ("x")
plt.ylabel ("y")
plt.ylim(0, max(ya)+max(ya)*0.2)
plt.xlim(0, distance+distance*0.1)
plt.show()
print "Distance:" ,distance
print "Duration:" ,duration
n = 5
angles = np.linspace(0, math.pi/2, n)
maxrange = np.zeros(n)
for i in range(n):
    x, y, duration, maxrange[i] = traj_fric(angles[i], v0)
angles = angles/2/math.pi*360 # convert rad to degrees
print "Optimum angle:", angles[np.where(maxrange==np.max(maxrange))]
The error is:
File "C:/Python27/Lib/site-packages/xy/projectile_fric.py", line 43, in traj_fric
x[i + 1] = (2 * x[i] - x[i - 1] + f * x[i - 1])/1 + f # numerical integration to find x[i + 1]
IndexError: index 10000 is out of bounds for axis 0 with size 10000
This is pretty straightforward. When you have a size of 10000, element index 10000 is out of bounds because indexing begins with 0, not 1. Therefore, the 10,000th element is index 9999, and anything larger than that is out of bounds.
Mason Wheeler's answer told you what Python was telling you. The problem occurs in this loop:
while y[i] >= 0: # conditional loop continues until projectile hits ground
    gamma = 0.005 # constant of friction
    height = 100 # height at which air friction disappears
    f = 0.5 * gamma * (height - y[i]) * dt
    x[i + 1] = (2 * x[i] - x[i - 1] + f * x[i - 1])/1 + f # numerical integration to find x[i + 1]
    y[i + 1] = (2 * y[i] - y[i - 1] + f * y[i - 1] - g * dt ** 2)/ 1 + f # and y[i + 1]
    i = i + 1 # increment i for next loop
The simple fix is to change the loop to something like (I don't know Python syntax, so bear with me):
while (y[i] >= 0) and (i < len(time) - 1):
That will stop the sim when you run out of array, but it will (potentially) also stop the sim with the projectile hanging in mid-air.
What you have here is a very simple ballistic projectile simulation, modeling atmospheric friction as a linear function of altitude. Qualitatively, what is happening is that your projectile is not hitting the ground in the time you allowed, so you are attempting to overrun your tracking arrays. This is caused by a failure to allow sufficient time of flight. Observe that the greatest possible time of flight occurs when atmospheric friction is zero, and it is then trivial to compute a closed-form upper bound for the time of flight. Use that upper bound as your time span, and you will allocate sufficient array space to simulate the projectile all the way to impact.
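A minimal sketch of that upper bound, assuming the flat-ground launch of the script above and that drag can only shorten the flight (which is the answer's argument):
import numpy as np

g = 9.81
v0 = 40
dt = 1e-3

# Drag-free time of flight from ground level is 2 * v0 * sin(angle) / g.
# Taking the worst case over all launch angles (angle = pi/2) gives a bound
# that holds for every angle swept later in the script.
t_bound = 2 * v0 / g
time = np.arange(0, t_bound + dt, dt)   # size the time axis from the bound
print("upper bound on flight time: %.2f s, %d samples" % (t_bound, len(time)))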
import random
import re
import cv2
import numpy as np

def data_to_array(total):
    random.shuffle(total)
    X = np.zeros((len(total), 224, 224, 3)).astype('float')
    y = []
    for i, img_path in enumerate(total):
        # Read and resize the image for this path (the original hard-coded a
        # single file path here, which loads the same image every iteration).
        img = cv2.imread(img_path)
        img = cv2.resize(img, (224, 224))
        X[i] = img - 1
        # Label 0 if the path looks like a COVID image, 1 otherwise
        # (assumption: the label should come from the current file's path).
        if len(re.findall('covid', img_path.lower())) > 0:
            y.append(0)
        else:
            y.append(1)
    y = np.array(y)
    return X, y

X_train, y_train = data_to_array(total_train)
X_test, y_test = data_to_array(total_val)