How to find 2 parameters with gradient descent method in Python?

How to find 2 parameters with gradient descent method in Python? - python

I have a few lines of code which doesn't converge. If anyone has an idea why, I would greatly appreciate. The original equation is written in def f(x,y,b,m) and I need to find parameters b,m.
np.random.seed(42)
x = np.random.normal(0, 5, 100)
y = 50 + 2 * x + np.random.normal(0, 2, len(x))
def f(x, y, b, m):
return (1/len(x))*np.sum((y - (b + m*x))**2) # it is supposed to be a sum operator
def dfb(x, y, b, m): # partial derivative with respect to b
return b - m*np.mean(x)+np.mean(y)
def dfm(x, y, b, m): # partial derivative with respect to m
return np.sum(x*y - b*x - m*x**2)
b0 = np.mean(y)
m0 = 0
alpha = 0.0001
beta = 0.0001
epsilon = 0.01
while True:
b = b0 - alpha * dfb(x, y, b0, m0)
m = m0 - alpha * dfm(x, y, b0, m0)
if np.sum(np.abs(m-m0)) <= epsilon and np.sum(np.abs(b-b0)) <= epsilon:
break
else:
m0 = m
b0 = b
print(m, f(x, y, b, m))

Both derivatives got some signs mixed up:
def dfb(x, y, b, m): # partial derivative with respect to b
# return b - m*np.mean(x)+np.mean(y)
# ^-------------^------ these are incorrect
return b + m*np.mean(x) - np.mean(y)
def dfm(x, y, b, m): # partial derivative with respect to m
# v------ this should be negative
return -np.sum(x*y - b*x - m*x**2)
In fact, these derivatives are still missing some constants:
dfb should be multiplied by 2
dfm should be multiplied by 2/len(x)
I imagine that's not too bad because the gradient is scaled by alpha anyway, but it could make the speed of convergence worse.
If you do use the correct derivatives, your code will converge after one iteration:
def dfb(x, y, b, m): # partial derivative with respect to b
return 2 * (b + m * np.mean(x) - np.mean(y))
def dfm(x, y, b, m): # partial derivative with respect to m
# Used `mean` here since (2/len(x)) * np.sum(...)
# is the same as 2 * np.mean(...)
return -2 * np.mean(x * y - b * x - m * x**2)

Related

Numpy - vectorize the bivariate poisson pmf equation

I'm trying to write a function to evaluate the probability mass function for the bivariate poisson distribution.
This is easy when all of the parameters (x, y, theta1, theta2, theta0) are scalars, but tricky to scale up without loops to allow these parameters to be vectors. I need it to scale such that, for:
theta0 being a scalar - the "correlation parameter" in the equation
theta1 and theta2 having length l
x, y both having length n
the output array would have shape (l, n, n). For example, a slice [j, :, :] from the output array would look like:
The first part (the constant, before the summation) I think i've figured out:
import numpy as np
from scipy.special import factorial
def constant(theta1, theta2, theta0, x, y):
exponential_part = np.exp(-(theta1 + theta2 + theta0)).reshape(-1, 1, 1)
x = np.tile(x, (len(x), 1)).transpose()
y = np.tile(y, (len(y), 1))
double_factorial = (np.power(np.array(theta1).reshape(-1, 1, 1), x)/factorial(x)) * \
(np.power(np.array(theta2).reshape(-1, 1, 1), y)/factorial(y))
return exponential_part * double_factorial
But I'm struggling with the summation part. How can I vectorize a summation where the limits depend on variable arrays?

I think I have this figured out, based on the approach that #w-m suggests: calculate every possible summation term which could appear, based on the maximum x or y value which appears, and use a mask to get rid of the ones you don't want. Assuming you have your x and y terms go from 0 to N, in consecutive order, this is calculating up to three times more terms than are actually required, but this is offset by getting to use vectorization.
Reference implementation
I wrote this by first writing a pure-Python reference implementation, which just implements your problem using loops. With 4 nested loops, it's not exactly fast, but it's handy to have while testing the numpy version.
import numpy as np
from scipy.special import factorial, comb
import operator as op
from functools import reduce
def choose(n, r):
# https://stackoverflow.com/a/4941932/530160
r = min(r, n-r)
numer = reduce(op.mul, range(n, n-r, -1), 1)
denom = reduce(op.mul, range(1, r+1), 1)
return numer // denom # or / in Python 2
def reference_impl_constant(s_theta1, s_theta2, s_theta0, s_x, s_y):
# Cast to float to prevent overflow
s_theta1 = float(s_theta1)
s_theta2 = float(s_theta2)
s_theta0 = float(s_theta0)
s_x = float(s_x)
s_y = float(s_y)
term1 = np.exp(-(s_theta1 + s_theta2 + s_theta0))
term2 = (s_theta1 ** s_x / factorial(s_x))
term3 = (s_theta2 ** s_y / factorial(s_y))
assert term1 >= 0
assert term2 >= 0
assert term3 >= 0
return term1 * term2 * term3
def reference_impl_constant_loop(theta1, theta2, theta0, x, y):
theta_len = theta1.shape[0]
xy_len = x.shape[0]
constant_array = np.zeros((theta_len, xy_len, xy_len))
for i in range(theta_len):
for j in range(xy_len):
for k in range(xy_len):
s_theta1 = theta1[i]
s_theta2 = theta2[i]
s_theta0 = theta0
s_x = x[j]
s_y = y[k]
constant_term = reference_impl_constant(s_theta1, s_theta2, s_theta0, s_x, s_y)
assert constant_term >= 0
constant_array[i, j, k] = constant_term
return constant_array
def reference_impl_summation(s_theta1, s_theta2, s_theta0, s_x, s_y):
sum_ = 0
for i in range(min(s_x, s_y) + 1):
sum_ += choose(s_x, i) * choose(s_y, i) * factorial(i) * ((s_theta0/s_theta1/s_theta2) ** i)
assert sum_ >= 0
return sum_
def reference_impl_summation_loop(theta1, theta2, theta0, x, y):
theta_len = theta1.shape[0]
xy_len = x.shape[0]
summation_array = np.zeros((theta_len, xy_len, xy_len))
for i in range(theta_len):
for j in range(xy_len):
for k in range(xy_len):
s_theta1 = theta1[i]
s_theta2 = theta2[i]
s_theta0 = theta0
s_x = x[j]
s_y = y[k]
summation_term = reference_impl_summation(s_theta1, s_theta2, s_theta0, s_x, s_y)
assert summation_term >= 0
summation_array[i, j, k] = summation_term
return summation_array
def reference_impl(theta1, theta2, theta0, x, y):
# all array inputs must be 1D
assert len(theta1.shape) == 1
assert len(theta2.shape) == 1
assert len(x.shape) == 1
assert len(y.shape) == 1
# theta vectors must have same length
theta_len = theta1.shape[0]
assert theta2.shape[0] == theta_len
# x and y must have same length
xy_len = x.shape[0]
assert y.shape[0] == xy_len
# theta0 is scalar
assert isinstance(theta0, (int, float))
constant_array = np.zeros((theta_len, xy_len, xy_len))
output = np.zeros((theta_len, xy_len, xy_len))
constant_array = reference_impl_constant_loop(theta1, theta2, theta0, x, y)
summation_array = reference_impl_summation_loop(theta1, theta2, theta0, x, y)
output = constant_array * summation_array
return output
Numpy implementation
I split the implementation of this across two functions.
The fast_constant() function calculates everything to the left of the summation symbol. The fast_summation() function calculates everything inside the summation symbol.
import numpy as np
from scipy.special import factorial, comb
def fast_summation(theta1, theta2, theta0, x, y):
x = np.tile(x, (len(x), 1)).transpose()
y = np.tile(y, (len(y), 1))
sum_limit = np.minimum(x, y)
max_sum_limit = np.max(sum_limit)
i = np.arange(max_sum_limit + 1).reshape(-1, 1, 1)
summation_mask = (i <= sum_limit)
theta_ratio = (theta0 / (theta1 * theta2)).reshape(-1, 1, 1, 1)
theta_to_power = np.power(theta_ratio, i)
terms = comb(x, i) * comb(y, i) * factorial(i) * theta_to_power
# mask out terms which aren't part of sum
terms *= summation_mask
# axis 0 is theta
# axis 1 is i
# axis 2 & 3 are x and y
# so sum across axis 1
terms = terms.sum(axis=1)
return terms
def fast_constant(theta1, theta2, theta0, x, y):
theta1 = theta1.astype('float64')
theta2 = theta2.astype('float64')
exponential_part = np.exp(-(theta1 + theta2 + theta0)).reshape(-1, 1, 1)
# x and y must be 1D
assert len(x.shape) == 1
assert len(y.shape) == 1
# x and y must have same shape
assert x.shape == y.shape
x_len, y_len = x.shape[0], y.shape[0]
x = x.reshape((x_len, 1))
y = y.reshape((1, y_len))
double_factorial = (np.power(np.array(theta1).reshape(-1, 1, 1), x)/factorial(x)) * \
(np.power(np.array(theta2).reshape(-1, 1, 1), y)/factorial(y))
return exponential_part * double_factorial
def fast_impl(theta1, theta2, theta0, x, y):
return fast_summation(theta1, theta2, theta0, x, y) * fast_constant(theta1, theta2, theta0, x, y)
Benchmarking
Assuming that X and Y range from 0 to 20, and that theta is centered somewhere inside that range, I get the result that the numpy version is roughly 280 times faster than the pure python reference.
Numerical stability
I'm unsure how numerically stable this is. For example, when I center theta at 100, I get a floating-point overflow. Typically, when computing an expression which has lots of choose and factorial expressions inside it, you'll use some mathematical equivalent which results in smaller intermediate sums. In this case I have so little understanding of the math that I don't know how you'd do that.

Learning parameters with gradient descent

I just started a ML course and I'm trying to run gradient descent in python. The below functions work fine, but as I move on to the bigger chunk where I do the actual learning, I just can't get the expected output and learn the right parameters, as you can tell from this decision boundary I plotted afterwards. And I'm trying to figure out why.
plotting the decision boundary
def sigmoid(z):
sigma = 1/(1+np.exp(-z))
return sigma
def compute_cost(X, y, w, b):
y_hat = sigmoid((X * np.expand_dims(w, axis=0)).sum(axis=1) + b)
total_cost = (-y * np.log(y_hat) - (1-y) * np.log(1-y_hat)).mean()
return total_cost
def compute_gradient(X, y, w, b):
z = w * X + b
yhat = sigmoid(z)
y1 = np.expand_dims(y, axis=1)
error = yhat - y1
db = error.mean()
dw_j1 = (X * error)
dw_j = np.mean(dw_j1,axis=0)
return dw_j, db
Before building this gradient descent function, I tested all the above with my training data & they all work and output the correct numbers. Really appreciate it if you can spot my mistakes.
Learning parameters with gradient descent
def gradient_descent(X, y, w, b, alpha, num_iters):
m = len(X)
J_history = []
wb_history = []
for i in range(num_iters):
cost = compute_cost(X, y, w, b)
dw_j, db = compute_gradient(X, y, w, b)
w = w - alpha * dw_j
b = b - alpha * db
wb_history.append((w,b))
J_history.append(cost)
if i % math.ceil(num_iters/10) == 0 or i == (num_iters-1):
print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}")
return w, b, J_history, wb_history
np.random.seed(1)
initial_w = 0.01 * (np.random.rand(2) - 0.5)
initial_b = -8
iterations = 10000
alpha = 0.001
w, b, J_history, _ = gradient_descent(X_train ,y_train, initial_w, initial_b, alpha, iterations)

Programming a PDE in Python with Brownian path

I'm currently trying to simulate a PDE including a Brownian path (one of the terms includes, that when going one timestep 'dt' further the change is weighted by a normal distributed variable with mean 0 and variance dt).
For this I used Fast Fourier Transform to get a system of ODEs which I can solve much more easily (at least that's what I thought). This lead me to the following code.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
#Defining some parameters a, b, c, which are included in the PDE
a = 10
b = 1.5
c = 20
#Creating the mesh
L = 100
N = 100
dx = L/N
x = np.arange(0, L, dx)
dt=0.01
t = np.linspace(0, 1, 100)
#Frequency for the Fourier Transformation
kappa= 2*np.pi*np.fft.fftfreq(N, d=dx)
#Initial condition for function u and its Fast Fourier Transformation
u0= np.zeros_like(x)
u0[int((L/4-L/10)/dx):int((L/4+L/10)/dx)]=2.5
u0[int((3*L/4-L/10)/dx):int((3*L/4+L/10)/dx)]=2.5
u0hat = np.fft.fft(u0)
u0hat_ri= np.concatenate((u0hat.real, u0hat.imag))
#Define the function describing the Transformation from the PDE to the system of ODEs
def func(uhat_ri, t, kappa, a, b, c):
uhat = uhat_ri[:N] +(1j)*uhat_ri[N:]
#Define the weighted change by the Brownian path B
mean = [0]*len(uhat)
diag = [0.1] * len(uhat)
cov = np.diag(diag)
B = np.random.multivariate_normal(mean, cov)
d_uhat = -a**2 * (np.power(kappa, 2))* uhat-c*(1j)*kappa*uhat + b* (1j) * kappa * uhat * B
d_uhat_ri = np.concatenate((d_uhat.real, d_uhat.imag))
return d_uhat_ri
#Solve the ODE with odeint
uhat_ri = odeint(func, u0hat_ri, t, args=(kappa, a, b, c))
uhat = uhat_ri[:, :N] + (1j) * uhat_ri[:, N:]
u = np.zeros_like(uhat)
#Inverse Transform the Solution
for k in range(len(t)):
u[:, k] = np.fft.ifft(uhat[k, :])
u = u.real
This program works if I exclude the Brownian path B in func
def func(uhat_ri, t, kappa, a, b, c):
uhat = uhat_ri[:N] +(1j)*uhat_ri[N:]
d_uhat = -a**2 * (np.power(kappa, 2))* uhat-c*(1j)*kappa*uhat + b* (1j) * kappa * uhat
d_uhat_ri = np.concatenate((d_uhat.real, d_uhat.imag))
return d_uhat_ri
But it takes a long time to execute when including B and also it tells me:
C:\Users\leo_h\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\integrate\odepack.py:247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
warnings.warn(warning_msg, ODEintWarning)
EDIT/ANSWER:
I solved the problem, by putting the change in the Brownian path out of func. I guess it was just too much for odeint to cope with the function (or it generated a new Brownian path for each t?)
mean = [0]*len(u0hat)
diag = [2] * len(u0hat)
cov = np.diag(diag)
B = np.random.multivariate_normal(mean, cov)
def func(uhat_ri, t, kappa, a, b, c, B):
uhat = uhat_ri[:N] +(1j)*uhat_ri[N:]
d_uhat = np.zeros_like(uhat)
d_uhat = -a**2 * (np.power(kappa, 2)) * uhat-c * (1j) * kappa * uhat + b * B * (1j) * kappa * uhat
d_uhat_ri = np.concatenate((d_uhat.real, d_uhat.imag))
return d_uhat_ri
uhat_ri = odeint(func, u0hat_ri, t, args=(kappa, a, b, c, B))

scipy.optimize.fmin_cg function need two callable function f and fprime, how could I extract two function from a function that return both two values?

def linear_regression(theta, X, y, lamb):
# X(12,1+1) theta(2,1) y(12,1)
m = X.shape[0]
ones = np.ones([m, 1])
X = np.hstack([ones, X])
h = X.dot(theta)
# cost function
J = 1 / 2 / m * np.sum(np.power(h - y, 2)) + lamb / 2 / m * np.sum(np.power(theta[1:], 2))
# gradient X(12,2) X.T(2,12) (h-y)(12,1) sum_error(2,1)\
sum_error = 1 / m * X.T.dot(h - y)
temp = theta
temp[0] = 0
gradient = sum_error + lamb / m * temp
return J, gradient
def f(theta, X, y, lamb):
J, gradient = linear_regression(theta, X, y, lamb)
return J
def fprime(theta, X, y, lamb):
J, gradient = linear_regression(theta, X, y, lamb)
return gradient
J, gradient = linear_regression(theta,X,y,1)
# theta need to be a vector not matrix
result = opt.fmin_cg(f, theta, fprime=fprime, args=(X,y,1))
print(result[0])
Explain:
opt.fmin_cg(f, theta, fprime=fprime, args=(X,y,1)) need a callable function f and fprime
f, fprime is the return value of linear_regression(theta, X, y, lamb)
It is easy to compute the cost function and gradient in the same function
Question:
Is there an easy way to extract two callable function from linear_regression(theta, X, y, lamb)
calling J, gradient = linear_regression(theta,X,y,1) and pass to opt.fmin_cg(J, theta, fprime=gradient , args=(X,y,1)) is not work

Explicit Euler method doesn't behave how I expect

I have implemented the following explicit euler method in python:
def explicit_euler(df, x0, h, N):
"""Solves an ODE IVP using the Explicit Euler method.
Keyword arguments:
df - The derivative of the system you wish to solve.
x0 - The initial value of the system you wish to solve.
h - The step size.
N - The number off steps.
"""
x = np.zeros(N)
x[0] = x0
for i in range(0, N-1):
x[i+1] = x[i] + h * df(x[i])
return x
Following the article on wikipedia I can plot the function and verify that I get the same plot: . I believe that here the method I have written is working correctly.
Next I tried to use it to solve the last system given on this page and instead of the plot shown there I obtain this:
I am not sure why my plot doesn't match the one shown on the webpage. The explicit euler method seems to work fine when I use it to solve systems where the slope doesn't change, but for an oscillating function it never seems to mimic it at all. Not even showing the expected error gain as indicated on the linked webpage. I am not sure what is wrong with the method I have implemented.
Here is the code used for plotting and the derivative:
def g(t):
return -0.5 * np.exp(t * 0.5) * np.sin(5 * t) + 5 * np.exp(t * 0.5)
* np.cos(5 * t)
h = 0.001
x0 = 0
tn = 4
N = int(tn / h)
x = ee.explicit_euler(f, x0, h, N)
t = np.arange(0, tn, h)
fig = plt.figure()
plt.plot(t, x, label="Explicit Euler")
plt.plot(t, (np.exp(0.5 * t) * np.sin(5 * t)), label="Analytical
solution")
#plt.plot(t, np.exp(0.5 * t), label="Analytical solution")
plt.xlabel('Timesteps t')
plt.ylabel('x(t)=e^(0.5*t) * sin(5*t)')
plt.legend()
plt.grid()
plt.show()
Edit:
As requested here is the current equation I am applying the method to:
y'-y=-0.5*e^(t/2)*sin(5t)+5e^(t/2)*cos(5t)
Where y(0)=0.
I would like to make clear however that this behaviour doesn't occur just for this equation but all equations where the slope has a change in sign, or oscillating behaviour.
Edit 2:
Ok thanks. Yes the code below does indeed work. But I have one further question. In the simple example I had for the exponential function, I had defined a method:
def f(x):
return x
for the system f'(x)=x. This gave the output of my first graph which looks correct. I then defined another function:
def k(x):
return cos(x)
for the system f'(x)=cos(x), this does not give expected output. But when I change the function definition to
def k(t, x):
return cos(t)
I get the expected output. If I change my function
def f(t, x):
return t
I get an incorrect output. Am I always actually evaluating the function at a time step and is it just by chance for the system x'=x that at each time step the value is just the value of x?
I had understood that the Euler method used the value of the previously calculated value in order to get the next value. But if I run code for my function k(x)=cos(x), I get output pictured below, which must be incorrect. This now uses the updated code you provided.
def k(t, x):
return np.cos(x)
h = 0.1 # Step size
x0 = (0, 0) # Initial point of iteration
tn = 10 # Time step to iterate to
N = int(tn / h) # Number of steps
x = ee.explicit_euler(k, x0, h, N)
t = np.arange(0, tn, h)

The problem is that you have incorrectly raised the function g, you want to solve the equation:
From where we observe that:
y' = y -0.5*e^(t/2)*sin(5t)+5e^(t/2)*cos(5t)
Then we define the function f(t, y) = y -0.5*e^(t/2)*sin(5t)+5e^(t/2)*cos(5t) as:
def f(t, y):
return y -0.5 * np.exp(t * 0.5) * np.sin(5 * t) + 5 * np.exp(t * 0.5) * np.cos(5 * t)
The initial point of iteration is f0=(t(0), y(0)):
f0 = (0, 0)
Then from Euler's equations:
def explicit_euler(df, x0, h, N):
"""Solves an ODE IVP using the Explicit Euler method.
Keyword arguments:
df - The derivative of the system you wish to solve.
x0 - The initial value of the system you wish to solve.
h - The step size.
N - The number off steps.
"""
x = np.zeros(N)
t, x[0] = x0
for i in range(0, N-1):
x[i+1] = x[i] + h * df(t ,x[i])
t += h
return x
Complete Code:
def explicit_euler(df, x0, h, N):
"""Solves an ODE IVP using the Explicit Euler method.
Keyword arguments:
df - The derivative of the system you wish to solve.
x0 - The initial value of the system you wish to solve.
h - The step size.
N - The number off steps.
"""
x = np.zeros(N)
t, x[0] = x0
for i in range(0, N-1):
x[i+1] = x[i] + h * df(t ,x[i])
t += h
return x
def df(t, y):
return -0.5 * np.exp(t * 0.5) * np.sin(5 * t) + 5 * np.exp(t * 0.5) * np.cos(5 * t) + y
h = 0.001
f0 = (0, 0)
tn = 4
N = int(tn / h)
x = explicit_euler(df, f0, h, N)
t = np.arange(0, tn, h)
fig = plt.figure()
plt.plot(t, x, label="Explicit Euler")
plt.plot(t, (np.exp(0.5 * t) * np.sin(5 * t)), label="Analytical solution")
#plt.plot(t, np.exp(0.5 * t), label="Analytical solution")
plt.xlabel('Timesteps t')
plt.ylabel('x(t)=e^(0.5*t) * sin(5*t)')
plt.legend()
plt.grid()
plt.show()
Screenshot:
Dump y' and what is on the right side is what you should place in the df function.
We will modify the variables to maintain the same standard for the variables, and will y be the dependent variable, and t the independent variable.
Equation 2: In this case the equation f'(x)=cos(x) will be rewritten to:
y'=cos(t)
Then:
def df(t, y):
return np.cos(t)
In conclusion, if we have an equation of the following form:
y' = f(t, y)
Then:
def df(t, y):
return f(t, y)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to find 2 parameters with gradient descent method in Python? - python

Related

Numpy - vectorize the bivariate poisson pmf equation

Learning parameters with gradient descent

Programming a PDE in Python with Brownian path

scipy.optimize.fmin_cg function need two callable function f and fprime, how could I extract two function from a function that return both two values?

Explicit Euler method doesn't behave how I expect

Categories

Resources