Plotting tangent of cost function for gradient descent - linear regression - python

OK, so this is maybe more of a math question than a programming one, but something is really bugging me.
Suppose I manually perform gradient descent for a simple univariate linear regression, as follows:
# add biases to data
X_ = np.concatenate(
    [np.ones(X_scaled.shape[0]).reshape(-1, 1), X_scaled], axis=1)
X_copy = X_.copy()
m = X_.shape[0]  # number of samples
history = []
thetas = initial_theta
costs = []
grads = []
for step in range(200):
    hypothesis = np.dot(X_copy, thetas)
    # cost
    J = (1 / m) * np.sum(np.square(hypothesis - y))
    # derivative
    d = np.dot(hypothesis - y, X_copy) / m
    # store
    history.append(thetas)
    costs.append(J)
    grads.append(d)
    # update
    thetas = thetas - d * 0.1
The final thetas I get are approximately the same as the ones I get with scikit-learn, so far so good.
Now I want to plot the tangent line to the cost function for a given value of one of the theta params.
I do this:
fig = plt.figure()
s = 4  # which gradient descent iteration to pick
i = 2  # just a basic increment factor to plot the tangent line
# plot cost as a function of the first param
plt.plot([params[0] for params in history], costs, "-")
# pick a tangent point
tangent_point_x, tangent_point_y = history[s][0], costs[s]
plt.plot(tangent_point_x, tangent_point_y, "ro")
# plot tangent
slope = grads[s][0]
new_point1_x = history[s - i][0]
new_point1_y = tangent_point_y + slope * (new_point1_x - tangent_point_x)
new_point2_x = history[s + i][0]
new_point2_y = tangent_point_y + slope * (new_point2_x - tangent_point_x)
plt.plot((new_point1_x, new_point2_x), (new_point1_y, new_point2_y), "-")
plt.plot(new_point1_x, new_point1_y, "bo")
plt.plot(new_point2_x, new_point2_y, "go")
Here is the resulting plot. What am I doing wrong?
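Two things worth checking here. First, grads[s][0] is the partial derivative of the cost with respect to the first parameter, so it is the slope of the cost's cross-section along that parameter (all other parameters held fixed at iteration s), not the slope of the cost along the descent trajectory that the plot above traces. Second, d = np.dot(hypothesis - y, X_copy) / m is the gradient of (1/(2m))·sum((h − y)²), so it is half the slope of the J = (1/m)·sum((h − y)²) curve being computed above. A minimal cross-section sketch, reusing the names from the snippets above (the ±1.0 sweep width is an arbitrary choice):
theta_s = history[s].copy()
sweep = np.linspace(theta_s[0] - 1.0, theta_s[0] + 1.0, 100)
cross_section = []
for t0 in sweep:
    th = theta_s.copy()
    th[0] = t0  # vary only the first parameter
    h = np.dot(X_, th)
    cross_section.append((1 / m) * np.sum(np.square(h - y)))
plt.plot(sweep, cross_section, "-")
slope = grads[s][0]
# tangent at (theta_s[0], costs[s]) with the stored slope
plt.plot(sweep, costs[s] + slope * (sweep - theta_s[0]), "--")
plt.show()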

Related

Function seems to append arguments even when arguments are defined explicitly

I am working with a Markov Chain Monte Carlo algorithm (Metropolis-Hastings Algorithm) to find the best fit for experimental data using model data. I have a function called evaluation that takes in two arguments, theta and phi. I am using this function to calculate both experimental and model data for the trajectory of a particle. Note: I am creating my own experimental data using the function to see if my program works before I use actual experimental data.
Here is the code:
def evaluation(theta, phi):  ### For creating model/experimental data
    velocity_x[0] = v0*np.sin(theta)*np.cos(phi)  ### Initial values for velocities
    velocity_y[0] = v0*np.sin(theta)*np.sin(phi)
    velocity_z[0] = v0*np.cos(theta)
    for i in range(len(actual_y) - 1):  ### Loop over experimental/model trajectory
        velocity = np.array([velocity_x[i], velocity_y[i], velocity_z[i]])
        cross_product = np.cross(velocity, Bz)
        ### Calculate subsequent velocities for model/experimental
        velocity_x[i+1] = velocity_x[i]  #+ const*cross_product[0]*dt / gamma_2
        velocity_y[i+1] = velocity_y[i]  #+ const*cross_product[1]*dt / gamma_2
        velocity_z[i+1] = velocity_z[i]  #+ const*cross_product[2]*dt / gamma_2
        xmodel[i+1] = xmodel[i] + velocity_x[i]*dt  #+ 0.5*const*cross_product[0]*dt / gamma_2
        ymodel[i+1] = ymodel[i] + velocity_y[i]*dt  #+ 0.5*const*cross_product[1]*dt / gamma_2
        zmodel[i+1] = zmodel[i] + velocity_z[i]*dt  #+ 0.5*const*cross_product[2]*dt / gamma_2
    return xmodel, ymodel, zmodel  ### Returns x,y,z model data
def calculate_error(actualx, modelx, actualy, modely, actualz, modelz, sigma=400):
    chi_squared = np.zeros(len(actual_x))
    for i in range(len(actual_x)):
        for j in range(len(actual_x)):
            chi_squared[i] = (actualx[i] - modelx[j])**2 + (actualy[i] - modely[j])**2 + (actualz[i] - modelz[j])**2
    return min(chi_squared)
thetas = [1.37] ### In radians; initial guess for thetas and phis
phis = [0.187]
chi = [] ### These lists store the values after MC calculations
num_sample = 1000 ### Number of samples
theta_step_size = 0.01
phi_step_size = 0.01
### x,y,and z model data with initial guess for thetas and phis
x_rand = evaluation(thetas,phis)[0]
y_rand = evaluation(thetas,phis)[1]
z_rand = evaluation(thetas,phis)[2]
error = calculate_error(x_exp_data,x_rand,y_exp_data,y_rand,z_exp_data,z_rand) ### Error
chi.append(error) ### error
for i in range(num_sample):  ### Begin Monte Carlo loop
    theta0 = thetas[-1]
    phi0 = phis[-1]
    theta1 = theta0 + np.random.normal()*theta_step_size  ### Take random step
    phi1 = phi0 + np.random.normal()*phi_step_size
    x_exp_data = evaluation(1.5705, 0)[0]  ### Experimental data should stay constant with fixed arguments
    y_exp_data = evaluation(1.5705, 0)[1]
    z_exp_data = evaluation(1.5705, 0)[2]
    x_rand = evaluation(theta1, phi1)[0]
    y_rand = evaluation(theta1, phi1)[1]
    z_rand = evaluation(theta1, phi1)[2]  ### Evaluate x,y,z model data with random thetas and phis
    error_1 = calculate_error(x_exp_data, x_rand, y_exp_data, y_rand, z_exp_data, z_rand)
    #print('x:', x_rand[0:5], 'X-Exp:', x_exp_data[0:5])
    P = np.exp(-error_1 + error)  ### Acceptance probability
    r = np.random.uniform()  ### Uniform random number (all values equally likely)
    print('Exp X:', x_exp_data, 'X Rand:', x_rand)
    if r < P:  ### Accept the theta and phi values of the current iteration
        thetas.append(theta1)
        phis.append(phi1)
        chi.append(error_1)
        #print('Error 1:', error_1, 'Error:', error, 'Phi:', phi1, 'Theta:', theta1, 'i:', i)
        error = error_1
The problem I am having is that x_exp_data, y_exp_data, and z_exp_data don't seem to stay constant inside the Monte Carlo loop, even though the arguments passed to evaluation stay constant: they appear to pick up the theta1 and phi1 values on each iteration, so the error comes out as zero for every iteration. This should not be the case, since the experimental data uses theta, phi = 1.5705, 0 while the model data uses 1.37 and 0.187 for theta and phi, respectively, and changes with each random step. I am not sure why x_exp_data, y_exp_data, and z_exp_data also follow the new random-step values when their arguments are clearly fixed. I have also tried defining the experimental data outside of the Monte Carlo loop, but this didn't change how the code behaves. Any help or suggestions would be appreciated.
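One detail worth noting, from the code alone: evaluation fills module-level arrays (velocity_x, xmodel, ...) in place and returns references to those same arrays, so each later call overwrites the previously returned "experimental" data. A minimal sketch of that aliasing, and the usual fix of returning a copy:
import numpy as np

shared = np.zeros(3)  # stands in for a global array like xmodel

def fill(value):
    shared[:] = value  # mutates the shared global array in place
    return shared      # returns a reference to it, not a copy

exp_data = fill(1.0)    # "constant" experimental data
model_data = fill(2.0)  # later call reuses the same storage
print(exp_data)         # [2. 2. 2.] -- the earlier result changed too

def fill_copy(value):
    shared[:] = value
    return shared.copy()  # decouples each returned result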

Gekko Python adjusting defined piecewise linear function during solving

I am trying to solve a dynamic optimization problem using gekko. The goal is to minimize a form of energy consumption represented by VSP over a set distance under speed constraints. I define piecewise linear functions for the speed constraint and for the slope of the road at different distances:
min_velocity = 0
max_velocity = 10
max_decel = -1
max_accel = 1
distances = np.linspace(0,20,21)
goal_dist = 200
trip_time = 100
# set up PWL functions
distances = np.linspace(0,200,10)
speed_limits = np.ones(10)*5
speed_limits[5:]=7
slope = np.zeros(10)
slope[3:5]=1; slope[7:9]=-1
model = GEKKO(remote=False)
model.time = [i for i in range(trip_time)]
x = model.Var(value=0.0, lb=0)
v = model.Var(value=0.0, lb = min_velocity, ub = max_velocity)
v_max = model.Var()
slope_var = model.Var()
a = model.MV(value=0, lb=max_decel ,ub=max_accel)
a.STATUS = 1
#define vehicle movement
model.Equation(x.dt()==v)
model.Equation(v.dt()==a)
#aggregated velocity constraint
model.pwl(x, v_max, distances, speed_limits)
model.Equation(v<=v_max)
#slope is modeled as a piecewise linear function
model.pwl(x, slope_var, distances, slope)
#End state constraints
p = np.zeros_like(model.time); p[-1]=1
final = model.Param(p)
model.Minimize(1e4*final*(v**2))# vehicle must be fully stopped
model.Minimize(1e4*final*((x-goal_dist)**2))# vehicle must arrive at destination
#VSPI Objective function
obj = model.Intermediate(v * (1.1 * a + 9.81 * slope_var + 0.132) + 0.0003002*pow(v, 3))
#VSPI Objective function
model.Obj(obj)
# solve
model.options.IMODE = 6
model.options.REDUCE = 3
model.options.MAX_ITER=1000
model.solve(disp=False)
plt.plot(x.value, v_max.value, 'b-', label = r'$vmaxvals$')
plt.plot(x.value , v.value,'g-',label=r'$vopt$')
plt.plot(x.value, a.value, 'c-', label=r'$accel$')
plt.plot(x.value, slope_var.value, 'r-', label=r'$slope$')
plt.plot([i*20 for i in range(10)], slope, 'mx', label=r'$orig_slope$')
plt.plot([i*20 for i in range(10)], speed_limits, 'kx', label=r'$orig_spd_limit$')
plt.legend(loc='best')
plt.xlabel('Distance Covered')
plt.show()
print(model.options.APPSTATUS)
Unfortunately, however, the values of slope_var and v_max get adjusted in the process of solving the problem. I am sure this behavior is intended, but in this case, is there a way to fix these PWL functions in place, similar to a Parameter?
If I use a cspline object to approximate the speed limits and slope, the values don't change, since it is pre-built as far as I understand. However, the accuracy of a cubic spline is limited with only a few data points and few changes in slope, which is why I would like to model them with a piecewise linear function.
The pwl function does give a linear interpolation, but it relies on a Mathematical Program with Complementarity Constraints (MPCC), which is challenging to solve and has many local minima at saddle points. You mentioned that you don't want to use the cspline function, but it may be your best option. There are some slight errors at the transition points, but they can be reduced by adding additional points during the transitions or by increasing the resolution.
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as plt
min_velocity = 0
max_velocity = 10
max_decel = -1
max_accel = 1
distances = np.linspace(0,20,21)
goal_dist = 200
trip_time = 100
# set up PWL functions
distances = np.linspace(0,200,10)
speed_limits = np.ones(10)*5
speed_limits[5:]=7
slope = np.zeros(10)
slope[3:5]=1; slope[7:9]=-1
model = GEKKO(remote=False)
model.time = [i for i in range(trip_time)]
x = model.Var(value=0.0, lb=0)
v = model.Var(value=0.0, lb = min_velocity, ub = max_velocity)
v_max = model.Var()
slope_var = model.Var()
a = model.MV(value=0, lb=max_decel ,ub=max_accel)
a.STATUS = 1
#define vehicle movement
model.Equation(x.dt()==v)
model.Equation(v.dt()==a)
#aggregated velocity constraint
model.cspline(x,v_max,distances,speed_limits,True)
#model.pwl(x, v_max, distances, speed_limits)
model.Equation(v<=v_max)
#slope is modeled as a piecewise linear function
#model.pwl(x, slope_var, distances, slope)
model.cspline(x,slope_var,distances,slope,True)
#End state constraints
p = np.zeros_like(model.time); p[-1]=1
final = model.Param(p)
model.Minimize(1e4*final*(v**2))# vehicle must be fully stopped
model.Minimize(1e4*final*((x-goal_dist)**2))# vehicle must arrive at destination
#VSPI Objective function
obj = model.Intermediate(v * (1.1 * a + 9.81 * slope_var + 0.132) + 0.0003002*pow(v, 3))
#VSPI Objective function
model.Obj(obj)
# solve
model.options.IMODE = 6
model.options.REDUCE = 3
model.options.MAX_ITER=1000
model.solve(disp=False)
plt.plot(x.value, v_max.value, 'b-', label = 'vmaxvals')
plt.plot(x.value , v.value,'g-',label='vopt')
plt.plot(x.value, a.value, 'c-', label='accel')
plt.plot(x.value, slope_var.value, 'r-', label='slope')
plt.plot(distances, slope, 'mx', label='orig_slope')
plt.plot(distances, speed_limits, 'kx', label='orig_spd_limit')
plt.legend(loc='best')
plt.xlabel('Distance Covered')
plt.show()
print(model.options.APPSTATUS)
There was an error in the plotting with:
plt.plot([i*20 for i in range(10)], slope, 'mx', label=r'$orig_slope$')
plt.plot([i*20 for i in range(10)], speed_limits, 'kx', label=r'$orig_spd_limit$')
because [i*20 for i in range(10)] runs from 0 to 180, while distances spans 0 to 200. Use this instead:
plt.plot(distances, slope, 'mx', label='orig_slope')
plt.plot(distances, speed_limits, 'kx', label='orig_spd_limit')
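If more accuracy is needed at the transitions, one way to apply the "increase the resolution" suggestion above is to densify the piecewise-linear data with np.interp before handing it to cspline. A minimal sketch, replacing the two cspline calls above (the 100-point grid is an arbitrary choice):
# Sample the original PWL data onto a denser grid so the cubic spline
# tracks the sharp transitions more closely.
dense_d = np.linspace(0, 200, 100)
dense_limits = np.interp(dense_d, distances, speed_limits)
dense_slope = np.interp(dense_d, distances, slope)
model.cspline(x, v_max, dense_d, dense_limits, True)
model.cspline(x, slope_var, dense_d, dense_slope, True)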

scipy curve_fit returns initial parameter estimates

I am trying to use the scipy curve_fit function to fit a Gaussian function to my data to estimate a theoretical power spectral density. While doing so, curve_fit always returns the initial parameters (p0=[1,1,1]), telling me that the fitting didn't work.
I don't know where the issue is. I am using python 3.9 (spyder 5.1.5) from the anaconda distribution on windows 11.
here a Wetransfer link to the data file
https://wetransfer.com/downloads/6097ebe81ee0c29ee95a497128c1c2e420220704110130/86bf2d
Here is my code below. Can someone tell me what the issue is and how I can solve it?
In the picture of the plot, the blue curve is my experimental PSD and the orange one is the result of the fit.
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.constants as cst
File = np.loadtxt('test5.dat')
X = File[:, 1]
Y = File[:, 2]
f_sample = 50000
time = []
for i in range(1, len(X)+1):
    t = i*(1/f_sample)
    time = np.append(time, t)
N = X.shape[0] # number of observation
N1=int(N/2)
delta_t = time[2] - time[1]
T_mes = N * delta_t
freq = np.arange(1/T_mes, (N+1)/T_mes, 1/T_mes)
freq=freq[0:N1]
fNyq = f_sample/2 # Nyquist frequency
nb = 350
freq_block = []
# discrete fourier transform
X_ft = delta_t*np.fft.fft(X, n=N)
X_ft=X_ft[0:N1]
plt.figure()
plt.plot(time, X)
plt.xlabel('t [s]')
plt.ylabel('x [micro m]')
# Experimental power spectrum on both raw and blocked data
PSD_X_exp = (np.abs(X_ft)**2/T_mes)
PSD_X_exp_b = []
STD_PSD_X_exp_b = []
for i in range(0, N1+2, nb):
    freq_b = np.array(freq[i:i+nb])  # i-nb:i
    psd_b = np.array(PSD_X_exp[i:i+nb])
    freq_block = np.append(freq_block, (1/nb)*np.sum(freq_b))
    PSD_X_exp_b = np.append(PSD_X_exp_b, (1/nb)*np.sum(psd_b))
    STD_PSD_X_exp_b = np.append(STD_PSD_X_exp_b, PSD_X_exp_b/np.sqrt(nb))
plt.figure()
plt.loglog(freq, PSD_X_exp)
plt.legend(['Raw Experimental PSD'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.figure()
plt.loglog(freq_block, PSD_X_exp_b)
plt.legend(['Experimental PSD after blocking'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
kB = cst.k # Boltzmann constant [m^2kg/s^2K]
T = 273.15 + 25 # Temperature [K]
r = (2.8 / 2) * 1e-6 # Particle radius [m]
v = 0.00002414 * 10 ** (247.8 / (-140 + T)) # Water viscosity [Pa*s]
gamma = np.pi * 6 * r * v # [m*Pa*s]
Do = kB*T/gamma # expected value for D
f3db_o = 50000 # expected value for f3db
fc_o = 300 # expected value pour fc
n = np.arange(-10,11)
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo = []
    for i in range(0, len(x)):
        # print(i)
        psd_theo = np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)**2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
        PSD_theo = np.append(PSD_theo, psd_theo)
    return PSD_theo
popt, pcov = curve_fit(theo_spectrum_lorentzian_filter, freq_block, PSD_X_exp_b, p0=[1, 1, 1], sigma=STD_PSD_X_exp_b, absolute_sigma=True, check_finite=True,bounds=(0.1, 10), method='trf', jac=None)
D_, fc_, f3db_ = popt
D1 = D_*Do
fc1 = fc_*fc_o
f3db1 = f3db_*f3db_o
print('Diffusion constant D = ', D1, ' Corner frequency fc= ',fc1, 'f3db(diode,eff)= ', f3db1)
I believe I've successfully fitted your data. Here's the approach I took.
First, I plotted your model (with the parameters set to [1, 1, 1]) and the data you had. I noticed your data was significantly lower than the model, so I started fiddling with the parameters to push the model upwards. I did that by multiplying the first parameter by increasingly large values and ended up with 1E13 as a ballpark value. Note that I have no idea if this is physically plausible for your model. Then I jury-rigged your fitting function to multiply D_ by 1E13 and ran your code. I got this fit:
So I believe it's a problem of 1) inappropriate starting values and 2) inappropriate bounds. In your position, I would revise this model and check if there are any problems with units and so on.
Here's what I used to try to fit your model:
plt.figure()
plt.loglog(freq_block[:170], PSD_X_exp_b[:170], label='Exp')
plt.loglog(freq_block[:170],
theo_spectrum_lorentzian_filter(
freq_block[:170],
1E13*popt[0], popt[1], popt[2]),
label='model'
)
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.legend()
I limited the data to point 170 because there were some weird backwards values that made me uncomfortable. I would recheck them if I were you.
Here's the model code I used. I didn't change the curve_fit call (except to limit the data to [:170]).
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo = []
    D_ = 1E13*D_  # I only changed here
    for i in range(0, len(x)):
        psd_theo = np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)**2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
        PSD_theo = np.append(PSD_theo, psd_theo)
    return PSD_theo
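Equivalently, instead of editing the model function, the rescaling can be folded into the starting values and bounds of the fit itself. A sketch with the original (unmodified) model function, where the 1E13 factor is only the ballpark found above, not a physically derived value:
# Same fit, with the scale expressed through p0 and bounds instead of
# hard-coding it inside the model.
popt, pcov = curve_fit(
    theo_spectrum_lorentzian_filter,
    freq_block[:170], PSD_X_exp_b[:170],
    p0=[1e13, 1, 1],
    sigma=STD_PSD_X_exp_b[:170],
    absolute_sigma=True,
    bounds=([1e12, 0.1, 0.1], [1e14, 10, 10]),
    method='trf',
)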

Gradient Descent algorithm for linear regression does not optimize the y-intercept parameter

I'm following Andrew Ng's Coursera course on Machine Learning, and I tried to implement the Gradient Descent algorithm in Python. I'm having trouble with the y-intercept parameter because it doesn't seem to go to the best value. Here's my code:
# IMPORTS
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Acquiring Data
# Source: https://github.com/mattnedrich/GradientDescentExample
data = pd.read_csv('data.csv')
def cost_function(a, b, x_values, y_values):
    '''
    Calculates the square mean error for a given dataset
    with (x,y) pairs and the model y' = a + bx
    a: y-intercept for the model
    b: slope of the curve
    x_values, y_values: points (x,y) of the dataset
    '''
    data_len = len(x_values)
    total_error = sum([((a + b * x_values[i]) - y_values[i])**2
                       for i in range(data_len)])
    return total_error / (2 * float(data_len))
def a_gradient(a, b, x_values, y_values):
    '''
    Partial derivative of the cost_function with respect to 'a'
    a, b: values for 'a' and 'b'
    x_values, y_values: points (x,y) of the dataset
    '''
    data_len = len(x_values)
    a_gradient = sum([((a + b * x_values[i]) - y_values[i])
                      for i in range(data_len)])
    return a_gradient / float(data_len)
def b_gradient(a, b, x_values, y_values):
    '''
    Partial derivative of the cost_function with respect to 'b'
    a, b: values for 'a' and 'b'
    x_values, y_values: points (x,y) of the dataset
    '''
    data_len = len(x_values)
    b_gradient = sum([(((a + b * x_values[i]) - y_values[i]) * x_values[i])
                      for i in range(data_len)])
    return b_gradient / float(data_len)
def gradient_descent_step(a_current, b_current, x_values, y_values, alpha):
    '''
    Takes a step in the direction of the minimum of the cost_function using
    the 'a' and 'b' gradients. Returns new values for 'a' and 'b'.
    a_current, b_current: the current values for 'a' and 'b'
    x_values, y_values: points (x,y) of the dataset
    '''
    new_a = a_current - alpha * a_gradient(a_current, b_current, x_values, y_values)
    new_b = b_current - alpha * b_gradient(a_current, b_current, x_values, y_values)
    return (new_a, new_b)
def run_gradient_descent(a, b, x_values, y_values, alpha, precision, plot=False, verbose=False):
    '''
    Runs the gradient_descent_step function and updates (a,b) until
    the value of the cost function varies less than 'precision'.
    a, b: initial values for the point a and b in the cost_function
    x_values, y_values: points (x,y) of the dataset
    alpha: learning rate for the algorithm
    precision: value for the algorithm to stop calculation
    '''
    iterations = 0
    delta_cost = cost_function(a, b, x_values, y_values)
    error_list = [delta_cost]
    iteration_list = [0]
    # The loop runs until delta_cost reaches the precision defined.
    # When the variation in the cost_function is small it means that
    # the function is near its minimum and the parameters 'a' and 'b'
    # are a good guess for modeling the dataset.
    while delta_cost > precision:
        iterations += 1
        iteration_list.append(iterations)
        # Calculates the initial error with current a,b values
        prev_cost = cost_function(a, b, x_values, y_values)
        # Calculates new values for a and b
        a, b = gradient_descent_step(a, b, x_values, y_values, alpha)
        # Updates the value of the error
        actual_cost = cost_function(a, b, x_values, y_values)
        error_list.append(actual_cost)
        # Calculates the difference between previous and actual error values
        delta_cost = prev_cost - actual_cost
    # Plot the error in each iteration to see how it decreases
    # and some information about our final results
    if plot:
        plt.plot(iteration_list, error_list, '-')
        plt.title('Error Minimization')
        plt.xlabel('Iteration', fontsize=12)
        plt.ylabel('Error', fontsize=12)
        plt.show()
    if verbose:
        print('Iterations = ' + str(iterations))
        print('Cost Function Value = ' + str(cost_function(a, b, x_values, y_values)))
        print('a = ' + str(a) + ' and b = ' + str(b))
    return (actual_cost, a, b)
When I run the algorithm with:
run_gradient_descent(0, 0, data['x'], data['y'], 0.0001, 0.01)
I get (a = 0.0496688656535 and b = 1.47825808018)
But the best value for 'a' is around 7.9 (tried other resources for linear regression).
Also, if I change the initial guess for the parameter 'a', the algorithm simply tries to adjust the parameter 'b'.
For example, if I set a = 200 and b = 0
run_gradient_descent(200, 0, data['x'], data['y'], 0.0001, 0.01)
I get (a = 199.933763331 and b = -2.44824996193)
I couldn't find anything wrong with the code, and I realized that the problem is the initial guess for the a parameter. See my own answer below, where I define a helper function to get a search range for the initial guess of a.
Gradient descent is not guaranteed to find the global optimum; your chances of finding it depend on your starting value. To get the true values of the parameters, I first solved the least-squares problem, which guarantees the global minimum.
data = pd.read_csv('data.csv', header=None)
x,y = data[0],data[1]
from scipy.stats import linregress
linregress(x,y)
This results in the following statistics:
LinregressResult(slope=1.32243102275536, intercept=7.9910209822703848, rvalue=0.77372849988782377, pvalue=3.855655536990139e-21, stderr=0.109377979589804)
Thus b = 1.32243102275536 and a = 7.9910209822703848. Given this, using your code I solved the problem a couple of times using randomized starting values a and b:
a,b = np.random.rand()*10,np.random.rand()*10
print("Initial values of parameters: ")
print("a=%f\tb=%f" % (a,b))
run_gradient_descent(a, b,x,y,1e-4,1e-2)
Here is the solution that I got:
Initial values of parameters:
a=6.100305 b=2.606448
Iterations = 21
Cost Function Value = 55.2093808263
a = 6.07601889437 and b = 1.36310312751
Therefore, it seems like the reason you cannot get close to the minimum is your choice of initial parameter values. You will see it yourself if you put the a and b obtained from least squares into your gradient descent algorithm: it will iterate only once and stay where it is.
At some point delta_cost drops below precision and the loop stops there, treating the current point as the optimum. If you decrease precision and run it long enough, you might be able to find the global optimum.
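As a quick illustration of that last point, a sketch reusing the run_gradient_descent defined above (the 1e-7 precision is an arbitrary, much tighter choice):
# Same zero starting point, but a far tighter stopping precision, so the
# intercept keeps moving instead of stalling at the first small cost change.
cost, a, b = run_gradient_descent(0, 0, data['x'], data['y'],
                                  alpha=0.0001, precision=1e-7)
print(a, b)  # a should creep toward ~7.99 given enough iterations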
The complete code for my Gradient Descent implementation can be found in my GitHub repository:
Gradient Descent for Linear Regression
Thinking about what @relay said, that the gradient descent algorithm is not guaranteed to find the global minimum, I tried to come up with a helper function to limit guesses for the parameter a to a certain search range, as follows:
def search_range(x, y, plot=False):
    '''
    Given a dataset with points (x, y) searches for a best guess for
    initial values of 'a'.
    '''
    data_lenght = len(x)  # Total size of the dataset
    q_lenght = int(data_lenght / 4)  # Size of a quartile of the dataset
    # Finding the max and min value for y in the first quartile
    min_Q1 = (x[0], y[0])
    max_Q1 = (x[0], y[0])
    for i in range(q_lenght):
        temp_point = (x[i], y[i])
        if temp_point[1] < min_Q1[1]:
            min_Q1 = temp_point
        if temp_point[1] > max_Q1[1]:
            max_Q1 = temp_point
    # Finding the max and min value for y in the 4th quartile
    min_Q4 = (x[data_lenght - 1], y[data_lenght - 1])
    max_Q4 = (x[data_lenght - 1], y[data_lenght - 1])
    for i in range(data_lenght - 1, data_lenght - q_lenght, -1):
        temp_point = (x[i], y[i])
        if temp_point[1] < min_Q4[1]:
            min_Q4 = temp_point
        if temp_point[1] > max_Q4[1]:
            max_Q4 = temp_point
    mean_Q4 = (((min_Q4[0] + max_Q4[0]) / 2), ((min_Q4[1] + max_Q4[1]) / 2))
    # Finding max_y and min_y given the points found above.
    # Two lines need to be defined, L1 and L2.
    # L1 will pass through min_Q1 and mean_Q4
    # L2 will pass through max_Q1 and mean_Q4
    # Calculating slope for L1 and L2 given m = Delta(y) / Delta(x)
    slope_L1 = (min_Q1[1] - mean_Q4[1]) / (min_Q1[0] - mean_Q4[0])
    slope_L2 = (max_Q1[1] - mean_Q4[1]) / (max_Q1[0] - mean_Q4[0])
    # Calculating y-intercepts for L1 and L2 given the line equation y = mx + b.
    # Float numbers are converted to int because they will be used as a range for iteration
    y_L1 = int(min_Q1[1] - min_Q1[0] * slope_L1)
    y_L2 = int(max_Q1[1] - max_Q1[0] * slope_L2)
    # Plotting L1 and L2 (using the x, y passed in, not the global dataframe)
    if plot:
        L1 = [(y_L1 + slope_L1 * xi) for xi in x]
        L2 = [(y_L2 + slope_L2 * xi) for xi in x]
        plt.plot(x, y, '.')
        plt.plot(x, L1, '-', color='r')
        plt.plot(x, L2, '-', color='r')
        plt.title('Scatterplot of Sample Data')
        plt.xlabel('x', fontsize=12)
        plt.ylabel('y', fontsize=12)
        plt.show()
    return y_L1, y_L2
The idea is to run the gradient descent with guesses for a within the range given by the search_range() function and keep the minimum possible value of the cost_function(). The new way to run the gradient descent becomes:
def run_search_gradient_descent(x_values, y_values, alpha, precision, verbose=False):
    '''
    Runs the gradient_descent_step function and updates (a,b) until
    the value of the cost function varies less than 'precision'.
    x_values, y_values: points (x,y) of the dataset
    alpha: learning rate for the algorithm
    precision: value for the algorithm to stop calculation
    '''
    from math import inf
    a1, a2 = search_range(x_values, y_values)
    best_guess = [inf, 0, 0]
    for a in range(a1, a2):
        cost, linear_coef, slope = run_gradient_descent(a, 0, x_values, y_values, alpha, precision)
        # Saving the value of the cost_function and the parameters (a,b)
        if cost < best_guess[0]:
            best_guess = [cost, linear_coef, slope]
    if verbose:
        print('Cost Function = ' + str(best_guess[0]))
        print('a = ' + str(best_guess[1]) + ' and b = ' + str(best_guess[2]))
    return (best_guess[0], best_guess[1], best_guess[2])
Running the code
run_search_gradient_descent(data['x'], data['y'], 0.0001, 0.001, verbose=True)
I've got:
Cost Function = 55.1294483959
a = 8.02595996606 and b = 1.3209768383
For comparison, using the linear regression from scipy.stats it returned
a = 7.99102098227 and b = 1.32243102276

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

I'm working through my MATLAB code for the Andrew Ng Coursera course and turning it into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc. After some googling, I found a couple of options; both return the same results, but they do not match Andrew Ng's expected results. Others seem to get this to work correctly, so I'm wondering why my specific code does not return the desired result when using the scipy.optimize functions, even though the cost and gradient pieces earlier in the code check out.
The data I'm using can be found at the link below;
ex2data1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contains the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2])  #100 x 2
y = np.array(data.iloc[:, 2])  #100 x 1
y.shape = (len(y), 1)
#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]
#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the
#the problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()
def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g
def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. to the parameters.
    '''
    m = len(y)  #number of training examples
    h = sigmoid(X.dot(theta))  #logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    #h is 100x1, y is 100x1; these end up as 2 vectors we subtract from each other,
    #then we sum the values by rows
    #cost function for logistic regression
    return J
def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):  #number of rows in theta
        XT = X[:, i]
        XT.shape = (len(X), 1)
        grad[i] = (1/m) * np.sum((h-y)*XT)  #updating each row of the gradient
    return grad
#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m
#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))
#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))
#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')
#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]]);
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
Result = op.minimize(fun = costFunction,
x0 = initial_theta,
args = (X, y),
method = 'TNC',
jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})
theta = Result.x
theta
test = np.array([[1, 45, 85]])
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')
This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:
Minimization of scalar function of one or more variables.
In general, the optimization problems are of the form:
minimize f(x) subject to
g_i(x) >= 0, i = 1,...,m
h_j(x) = 0, j = 1,...,p
where x is a vector of one or more variables.
What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:
theta = theta.reshape(-1, 1)
This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
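A minimal standalone illustration (not from the original post) of why the 1-d theta matters here: subtracting a 1-d array from a 2-d column array silently broadcasts to a full matrix, which corrupts sums like np.sum((h-y)*XT):
import numpy as np

col = np.zeros((100, 1))  # 2-d column vector, like y or h above
vec = np.zeros(100)       # 1-d array, like the theta scipy passes in

print((col - col).shape)  # (100, 1)   -- as expected
print((vec - col).shape)  # (100, 100) -- silent broadcast, not an error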
I have had similar issues with Scipy dealing with the same problem as you. As senderle points out, the interface is not the easiest to deal with, especially combined with the numpy array interface... Here is my implementation, which works as expected.
Defining the cost and gradient functions
Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3,1) within the function. The gradient function then returns grad.ravel(), which has shape (3,) again. This is important, as doing otherwise caused error messages with various optimization methods in Scipy.optimize.
Note that different methods have different behaviours, but returning .ravel() seems to fix most issues...
import pandas as pd
import numpy as np
import scipy.optimize as opt
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta, X, y):
    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1-h))
    return J

def Gradient(theta, X, y):
    #Initializing variables
    m = len(y)
    theta = theta[:, np.newaxis]
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m)*(X.T @ (h - y))
    return grad.ravel()  #<-- This is the trick
Initializing variables and parameters
Note that initial_theta.shape returns (3,)
data1 = pd.read_csv('ex2data1.txt', header=None)  # assuming the same file as above
X = data1.iloc[:, 0:2].values
m, n = X.shape
X = np.concatenate((np.ones(m)[:, np.newaxis], X), 1)
y = data1.iloc[:, -1].values[:, np.newaxis]
initial_theta = np.zeros((n+1))
Calling Scipy.optimize
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
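For completeness, a short sketch of reading the result back out of the returned OptimizeResult (standard scipy.optimize fields, not from the original answer):
theta = model.x         # fitted parameters, shape (3,)
final_cost = model.fun  # cost at the optimum
print(model.success, theta, final_cost)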
Any comments from more knowledgeable people are welcome; this Scipy interface is a mystery to me. Thanks!
