Python: Intersection of two equations

I have the following equations:
sqrt((x0 - x)^2 + (y0 - y)^2) - sqrt((x1 - x)^2 + (y1 - y)^2) = c1
sqrt((x3 - x)^2 + (y3 - y)^2) - sqrt((x4 - x)^2 + (y4 - y)^2) = c2
And I would like to find their intersection. I tried fsolve, rearranging each equation into the form f(x) = 0, and it worked for small numbers. However, I am working with huge numbers, and solving the system involves a lot of calculations; in particular it reaches a square root of a difference, and with huge numbers precision is lost. The left operand ends up smaller than the right one, and I get a math domain error from taking the square root of a negative number.
I have been trying to solve this issue in different ways:
Trying to use higher-precision floats. I tried numpy.float128, but fsolve won't accept it.
Searching for a library that can solve a system of non-linear equations, but no luck so far.
Any help/guidance/tips would be appreciated!
Thanks!

Taking all the advice, I ended up using code like the following for the system:
0 = x + y - 8
0 = sqrt((-6 - x)^2 + (4 - y)^2) - sqrt((1 - x)^2 + y^2) - 5
from math import sqrt
import numpy as np
from scipy.optimize import fsolve

def f(x):
    y = np.zeros(2)
    y[0] = x[1] + x[0] - 8
    y[1] = sqrt((-6 - x[0]) ** 2 + (4 - x[1]) ** 2) - sqrt((1 - x[0]) ** 2 + x[1] ** 2) - 5
    return y

x0 = np.array([0, 0])
solution = fsolve(f, x0)
print("(x, y) = (" + str(solution[0]) + ", " + str(solution[1]) + ")")
Note: the line x0 = np.array([0, 0]) is the seed (initial guess) that fsolve uses to search for a solution. It is important that the seed is close to the actual solution in order to converge to it.
The example provided works :)
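Since the original difficulty was precision loss with huge numbers, one more option worth noting (just a sketch, not tested against your data) is to do the same root finding in arbitrary precision with mpmath's findroot instead of fsolve; mp.dps controls the number of significant digits. Shown here on the small example system above; for your real data you would plug in your own equations, raise mp.dps as needed, and pick a seed closer to the solution if the iteration does not converge:
from mpmath import mp, mpf, sqrt, findroot

mp.dps = 50  # work with 50 significant digits instead of ~16 for float64

def f1(x, y):
    return x + y - 8

def f2(x, y):
    return sqrt((-6 - x)**2 + (4 - y)**2) - sqrt((1 - x)**2 + y**2) - 5

# findroot takes a list of residual functions and a starting seed (x, y)
sol = findroot([f1, f2], (mpf(0), mpf(0)))
print(sol[0], sol[1])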

You might find some use in SymPy, which is a symbolic algebra library for Python.
From its home page:
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
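For this particular problem, the most relevant piece is probably SymPy's numerical solver nsolve, which can work at arbitrary precision (it uses mpmath internally). Below is a minimal sketch on the small example system above; the prec value and the (0, 0) starting point are just illustrative choices, and a seed closer to the expected solution may be needed if it fails to converge:
import sympy as sp

x, y = sp.symbols('x y', real=True)

eq1 = x + y - 8
eq2 = sp.sqrt((-6 - x)**2 + (4 - y)**2) - sp.sqrt((1 - x)**2 + y**2) - 5

# nsolve needs a starting point close to the solution, like fsolve's seed
solution = sp.nsolve((eq1, eq2), (x, y), (0, 0), prec=50)
print(solution)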

As you have a non-linear equation, you need some kind of optimizer to solve it. You can probably use something like scipy.optimize (https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html). However, since I have no experience with that scipy function, I can only offer you a solution using the gradient descent method of the tensorflow library. You can find a short guide here: https://learningtensorflow.com/lesson7/ (check out the gradient descent chapter). Analogous to the method described there, you could do something like this:
import numpy as np
import tensorflow as tf

# These arrays are pseudo code, fill in your values for x0, x1, y0, y1, ...
x_array = [x0, x1, x3, x4]
y_array = [y0, y1, y3, y4]
c_array = [c1, c2]

# Tensorflow model starts here
x = tf.placeholder("float")
y = tf.placeholder("float")
z = tf.placeholder("float")

# the array [0.0, 0.0] holds the initial guesses for the "correct" x and y that solve the equations
# (floats, so the dtype matches the float constants below)
xy_array = tf.Variable([0.0, 0.0], name="xy_array")

x0 = tf.constant(x_array[0], name="x0")
x1 = tf.constant(x_array[1], name="x1")
x3 = tf.constant(x_array[2], name="x3")
x4 = tf.constant(x_array[3], name="x4")
y0 = tf.constant(y_array[0], name="y0")
y1 = tf.constant(y_array[1], name="y1")
y3 = tf.constant(y_array[2], name="y3")
y4 = tf.constant(y_array[3], name="y4")
c1 = tf.constant(c_array[0], name="c1")
c2 = tf.constant(c_array[1], name="c2")

# I took your first line and subtracted c1 from it, same for the second line, and introduced d_1 and d_2
d_1 = tf.sqrt(tf.square(x0 - xy_array[0]) + tf.square(y0 - xy_array[1])) - tf.sqrt(tf.square(x1 - xy_array[0]) + tf.square(y1 - xy_array[1])) - c1
d_2 = tf.sqrt(tf.square(x3 - xy_array[0]) + tf.square(y3 - xy_array[1])) - tf.sqrt(tf.square(x4 - xy_array[0]) + tf.square(y4 - xy_array[1])) - c2

# this z_model should actually be zero in the end, in that case there is an intersection
z_model = d_1 - d_2
error = tf.square(z - z_model)

# you can try different values for the "learning rate", here 0.01
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)
model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    # here you are creating a "training set" of size 1000, you can also make it bigger if you like
    for i in range(1000):
        x_value = np.random.rand()
        y_value = np.random.rand()
        d1_value = np.sqrt(np.square(x_array[0] - x_value) + np.square(y_array[0] - y_value)) - np.sqrt(np.square(x_array[1] - x_value) + np.square(y_array[1] - y_value)) - c_array[0]
        d2_value = np.sqrt(np.square(x_array[2] - x_value) + np.square(y_array[2] - y_value)) - np.sqrt(np.square(x_array[3] - x_value) + np.square(y_array[3] - y_value)) - c_array[1]
        z_value = d1_value - d2_value
        session.run(train_op, feed_dict={x: x_value, y: y_value, z: z_value})
    xy_value = session.run(xy_array)
    print("Predicted (x, y): ({a:.3f}, {b:.3f})".format(a=xy_value[0], b=xy_value[1]))
But be aware: this code will probably run for a while, which is why I haven't tested it.
Also, I am currently not sure what will happen if there is no intersection. You would probably get the coordinates of the closest approach of the two functions.
Tensorflow can be somewhat difficult if you haven't used it yet, but it is worth learning, as you can also use it for any deep learning application (the actual purpose of this library).

Related

Lagrange Multipliers /w sympy

I am currently trying to find the maximum radius of a circle I can fit between existing circles around it, i.e. I'm trying to find not only the maximum radius but also the best-suited centre point for it along a specific given straight line.
In order to find that maximum I'm trying to implement a generalized Lagrange multipliers solution using sympy.
If n is the number of constraints I have, then I was able to:
Create a generator of n symbols.
Take the required gradient of the Lagrange function (over x, y and the n multipliers).
Build the required inequalities (from the constraints) to obtain the list of equalities and inequalities that need to be solved.
The code:
from sympy import S
from sympy import *
import sympy as smp

# Lagrange Multipliers
def sympy_distfun(cx, cy, radius):
    x, y = smp.symbols('x y', real=True)
    return sqrt((x - cx)**2 + (y - cy)**2) - radius

def sympy_circlefun(cx, cy, radius):
    x, y = smp.symbols('x y', real=True)
    return (x - cx)**2 + (y - cy)**2 - radius**2

def sympy_linefun(slope, b):
    x, y = smp.symbols('x y', real=True)
    return slope*x + b - y

def lagrange_multiplier(objective, constraints):
    x, y = smp.symbols('x y', real=True)
    a = list(smp.symbols('a0:%d' % len(constraints), real=True))
    cons = [constraints[i]*a[i] for i in range(len(a))]
    L = objective + (-1)*sum(cons)
    gradL = [smp.diff(L, var) for var in [x, y] + a]
    constraints = [(con) >= 0 for con in constraints]
    eqs = gradL + constraints
    vars = a + [x, y]
    solution = smp.solve(eqs[0], vars)
    #solution = smp.solveset(eqs, vars)
    print(solution)

line = sympy_linefun(0.66666, -4.3333)
dist = sympy_distfun(11, 3, 4)
circlefunc1 = sympy_circlefun(11, 3, 4)
circlefunc2 = sympy_circlefun(0, 0, 3)
lagrange_multiplier(dist, [line, circlefunc1, circlefunc2])
But, when using smp.solveset(eqs,vars) I encounter the error message:
ValueError: [-0.66666*a0 - a1*(2*x - 22) - 2*a2*x + (x - 11)/sqrt((x - 11)**2 + (y - 3)**2), a0 - a1*(2*y - 6) - 2*a2*y + (y - 3)/sqrt((x - 11)**2 + (y - 3)**2), -0.66666*x + y + 4.3333, -(x - 11)**2 - (y - 3)**2 + 16, -x**2 - y**2 + 9, 0.66666*x - y - 4.3333 >= 0, (x - 11)**2 + (y - 3)**2 - 16 >= 0, x**2 + y**2 - 9 >= 0] is not a valid SymPy expression
When using solution = smp.solve(eqs[0], vars) to try and solve just one equation, it sends sympy into a CPU-crushing frenzy and it obviously fails to complete the calculation. I made sure to declare all variables as real, so I fail to see why it takes so long to solve.
I would like to understand what I'm missing when it comes to handling multiple inequalities with sympy, and if there is a more optimized, faster way to solve the Lagrange multiplier problem I'd love to give it a try.

Multiple unmatched matrices in backpropagation through time

I am going to implement binary addition with a Recurrent Neural Network (RNN) as an exercise. I ran into an issue implementing it in Python, so I decided to share my problem here to get ideas for fixing it.
As can be seen in my notebook code (Backpropagation through time (BPTT) section),
there is a chain rule like the one below for updating the input weight matrix:
My problem is this part:
I've tried to implement this part in my Python code / notebook code (class input_layer, backward method), but the mismatched dimensions raise an error.
In my sample code, W_hidden is 16x16, whereas the result of delta pre_hidden is 1x2. This is what causes the error. If you run the code, you can see the error.
I spent a lot of time checking my chain rule as well as my code. I believe my chain rule is right, so the only thing that can cause this error is my code.
As far as I know, multiplying matrices with mismatched dimensions is impossible. If my chain rule is correct, how can it be implemented in Python?
Any idea?
Thanks in advance.
You need to apply dimension balancing on the gradients. Taken from Stanford's cs231n course, it comes down to two simple modifications:
Given h = x·W, with x of shape (1, n), W of shape (n, m), and an upstream gradient delta = dL/dh of shape (1, m), we will have:
dL/dW = x.T · delta (shape (n, m)), and dL/dx = delta · W.T (shape (1, n))
Here is the code I used to ensure the gradient calculation is correct. You should be able to update your code accordingly.
import torch
torch.random.manual_seed(0)
x_1, x_2 = torch.zeros(size=(1, 8)).normal_(0, 0.01), torch.zeros(size=(1, 8)).normal_(0, 0.01)
y = torch.zeros(size=(1, 8)).normal_(0, 0.01)
h_0 = torch.zeros(size=(1, 16)).normal_(0, 0.01)
weight_ih = torch.zeros(size=(8, 16)).normal_(mean=0, std=0.01).requires_grad_(True)
weight_hh = torch.zeros(size=(16, 16)).normal_(mean=0, std=0.01).requires_grad_(True)
weight_ho = torch.zeros(size=(16, 8)).normal_(mean=0, std=0.01).requires_grad_(True)
h_1 = x_1.mm(weight_ih) + h_0.mm(weight_hh)
h_2 = x_2.mm(weight_ih) + h_1.mm(weight_hh)
g_2 = h_2.sigmoid()
j_2 = g_2.mm(weight_ho)
y_predicted = j_2.sigmoid()
loss = 0.5 * (y - y_predicted).pow(2).sum()
loss.backward()
delta_1 = -1 * (y - y_predicted) * y_predicted * (1 - y_predicted)
delta_2 = delta_1.mm(weight_ho.t()) * (g_2 * (1 - g_2))
delta_3 = delta_2.mm(weight_hh.t())
# 16 x 8
weight_ho_grad = g_2.t() * delta_1
# 16 x 16
weight_hh_grad = h_1.t() * delta_2 + (h_0.t() * delta_3)
# 8 x 16
weight_ih_grad = x_2.t() * delta_2 + x_1.t() * delta_3
atol = 1e-10
assert torch.allclose(weight_ho.grad, weight_ho_grad, atol=atol)
assert torch.allclose(weight_hh.grad, weight_hh_grad, atol=atol)
assert torch.allclose(weight_ih.grad, weight_ih_grad, atol=atol)

Gradient Descent basic algorithm overshooting and doesn't converge in python

So I'm new to learning ML and I am using gradient descent as the first algorithm I would like to get good at and learn well. I wrote my first code and have looked online for the issue I'm facing, but due to a lack of concrete knowledge I'm having a hard time understanding how to diagnose it. My gradient begins by approaching the correct answer, but once the error has been cut by a factor of 8, the algorithm loses its way: the b value begins to go negative and the m value goes past the target value. I'm sorry if I worded this oddly; hopefully the code will help.
I am learning this from multiple sources on YouTube and Google. I have been following Siraj Raval's Math of Intelligence playlist on YouTube; I understood how the underlying algorithm works, but I decided to take my own approach and it seems not to be working too well. I'm struggling to read online resources, as I'm inexperienced with what every algorithm means and how it is implemented in Python. I know this issue has something to do with training and testing, but I don't know where to apply it.
def gradient_updater(error, mcurr, bcurr):
    for i in x:
        # gets the predicted y-value
        ypred = (mcurr * i) + bcurr
        # uses partial derivative formula to get new m and b
        new_m = -(2/N) * sum(x*(y - ypred))
        new_b = -(2/N) * sum(y - ypred)
    # applies the new b and m value
    mcurr = mcurr - (learning_rate * new_m)
    bcurr = bcurr - (learning_rate * new_b)
    return mcurr, bcurr

def run(iterations, initial_m, initial_b):
    current_m = initial_m
    current_b = initial_b
    for i in range(iterations):
        error = get_error(current_m, current_b)
        current_m, current_b = gradient_updater(error, current_m, current_b)
        print(current_m, current_b, error)
I expected the m and b values to converge to a specific value, this didn't occur and the values kept increasing in opposite direction.
If I am understanding your code correctly, I think your problem is that you're taking the partial derivative to get your new slope and intercept on just one point. I'm not sure exactly what some of the variables within gradient_updater are, so I will try to provide an example that better explains the concept:
I'm not sure we are calculating the optimization in the same way, so in my code, b0 is the 'b' in y = mx + b and b1 is the 'm' in that same equation. The following code computes running totals b0_temp and b1_temp that will be divided by the batch size to give a new b0 and b1 to fit your graph.
for i in range(len(X)):
    ERROR = ERROR + (b1*X[i] + b0 - Y[i])**2
    b1_temp = b1_temp + (1/2)*((1/len(X))*(b1*X[i] + b0 - Y[i])**2)**(-1/2) * (2/len(X))*(b1*X[i] + b0 - Y[i])*X[i]
    b0_temp = b0_temp + (1/2)*((1/len(X))*(b1*X[i] + b0 - Y[i])**2)**(-1/2) * (2/len(X))*(b1*X[i] + b0 - Y[i])
I run through this for every value within my dataset, where X[i] and Y[i] represent an individual datapoint.
Next, I adjust the slope that is currently fitting the graph:
b1_temp = b1_temp / batch_size
b0_temp = b0_temp / batch_size
b0 = b0 - learning_rate * b0_temp
b1 = b1 - learning_rate * b1_temp
b1_temp = 0
b0_temp = 0
Where batch_size can just be taken as len(X). I run through this for some number of epochs (i.e. a for loop over some number of iterations, 100 should work), and the line of best fit will adjust accordingly over time. The overall concept behind it is to decrease the distance between each point and the line until it is at a minimum.
Hope I was able to explain this better and provide you with a basic code base to adjust yours from!
Here's where I think the error in your code lies: the calculation of the gradient. I believe your cost function is similar to the one used in https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html. To compute the gradient, you need to aggregate the contributions from all the partial derivative terms. In your implementation, however, you iterate over x without accumulating anything, so your new_m and new_b are effectively computed only for the final element of x (items marked 1 and 2 below).
Your implementation:
def gradient_updater(error, mcurr, bcurr):
    for i in x:
        # gets the predicted y-value
        ypred = (mcurr * i) + bcurr
        # uses partial derivative formula to get new m and b
        new_m = -(2/N) * sum(x*(y - ypred))  #-- 1 --
        new_b = -(2/N) * sum(y - ypred)      #-- 2 --
    # applies the new b and m value <-- Indent this block to place it inside the for loop
    mcurr = mcurr - (learning_rate * new_m)
    bcurr = bcurr - (learning_rate * new_b)
    return mcurr, bcurr
That said, I think your implementation should come closer to the mathematical formula if you just update mcurr and bcurr in every iteration (See inline comment). The other thing to do is to divide both sum(x*(y - ypred)) and sum(y - ypred) by N as well, in computing new_m and new_b.
Note
Since I do not know what your actual cost function is, I just want to point out that you are also using a constant y value in your code. It is more likely an array of different values that should be accessed as Y[i] and X[i] respectively.
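For concreteness, here is a minimal, self-contained sketch (not the original poster's code) of a vectorized version of that update, assuming x and y are NumPy arrays of the same length and learning_rate is a small constant; since new_m and new_b already sum over the whole array, the inner loop can be dropped entirely and the update applied once per call:
import numpy as np

def gradient_updater(mcurr, bcurr, x, y, learning_rate=0.01):
    # x and y are full NumPy arrays, so one vectorized pass gives the
    # whole-batch gradient; no per-point loop is needed here
    N = len(x)
    ypred = mcurr * x + bcurr
    grad_m = -(2 / N) * np.sum(x * (y - ypred))
    grad_b = -(2 / N) * np.sum(y - ypred)
    return mcurr - learning_rate * grad_m, bcurr - learning_rate * grad_b

# toy usage: fit y = 3x + 2
x = np.linspace(0, 10, 50)
y = 3 * x + 2
m, b = 0.0, 0.0
for _ in range(5000):
    m, b = gradient_updater(m, b, x, y)
print(m, b)  # should approach 3 and 2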

How to solve a 9-equations system of non linear DE with python?

I'm desperately trying to solve (and display the graph of) a system of nine nonlinear differential equations which model the path of a boomerang. The system is the following:
All the letters on the left side are variables, the others are either constants or known functions depending on v_G and w_z.
I have tried scipy.odeint with no conclusive results (I had this issue but the workaround did not work).
I am beginning to think that the problem is linked to the fact that these equations are nonlinear, or that a function in a denominator might cause a singularity that the scipy solver is simply unable to handle. However, I am not familiar with that sort of mathematics.
What possibilities python-wise do I have to solve this set of equations?
EDIT: Sorry if I was not clear enough. Since it models the path of a boomerang, my goal is not to solve this system analytically (i.e. I don't care about the mathematical expression of each function), but rather to get the values of each function over a specific time range (say, from t1 = 0 s to t2 = 15 s with an interval of 0.01 s between values) in order to display the graph of each function and the path of the boomerang's centre of mass (X, Y, Z are its coordinates).
Here is the code I tried :
import scipy.integrate as spi
import numpy as np
#Constants
I3 = 10**-3
lamb = 1
L = 5*10**-1
mu = I3
m = 0.1
Cz = 0.5
rho = 1.2
S = 0.03*0.4
Kz = 1/2*rho*S*Cz
g = 9.81
#Initial conditions
omega0 = 20*np.pi
V0 = 25
Psi0 = 0
theta0 = np.pi/2
phi0 = 0
psi0 = -np.pi/9
X0 = 0
Y0 = 0
Z0 = 1.8
INPUT = (omega0, V0, Psi0, theta0, phi0, psi0, X0, Y0, Z0) #initial conditions
def diff_eqs(t, INP):
    '''The main set of equations'''
    Y = np.zeros((9))
    Y[0] = (1/I3) * (Kz*L*(INP[1]**2+(L*INP[0])**2))
    Y[1] = -(lamb/m)*INP[1]
    Y[2] = -(1/(m * INP[1])) * ( Kz*L*(INP[1]**2+(L*INP[0])**2) + m*g) + (mu/I3)/INP[0]
    Y[3] = (1/(I3*INP[0]))*(-mu*INP[0]*np.sin(INP[6]))
    Y[4] = (1/(I3*INP[0]*np.sin(INP[3]))) * (mu*INP[0]*np.cos(INP[5]))
    Y[5] = -np.cos(INP[3])*Y[4]
    Y[6] = INP[1]*(-np.cos(INP[5])*np.cos(INP[4]) + np.sin(INP[5])*np.sin(INP[4])*np.cos(INP[3]))
    Y[7] = INP[1]*(-np.cos(INP[5])*np.sin(INP[4]) - np.sin(INP[5])*np.cos(INP[4])*np.cos(INP[3]))
    Y[8] = INP[1]*(-np.sin(INP[5])*np.sin(INP[3]))
    return Y  # For odeint
t_start = 0.0
t_end = 20
t_step = 0.01
t_range = np.arange(t_start, t_end, t_step)
RES = spi.odeint(diff_eqs, INPUT, t_range)
However, I keep getting the same problem as shown here, and especially the error message:
Excess work done on this call (perhaps wrong Dfun type)
I am not quite sure what it means, but it looks like the solver has trouble solving the system. In any case, when I try to display the 3D path from the X, Y, Z coordinates, I just get 3 or 4 points where there should be something like 2000.
So my questions are:
- Am I doing something wrong in my code?
- If not, is there another, maybe more sophisticated, tool to solve this system?
- If not, is it even possible to get what I want from this system of ODEs?
Thanks in advance
There are several issues:
- If I copy the code, it does not run.
- The workaround you mention does not work with odeint; the given solution uses ode.
- The scipy reference for odeint says: "For new code, use scipy.integrate.solve_ivp to solve a differential equation."
- The call RES = spi.odeint(diff_eqs, INPUT, t_range) should be consistent with the function head def diff_eqs(t, INP). Mind the order: RES = spi.odeint(diff_eqs, t_range, INPUT).
There are some issues with the mathematical formulas too:
- Have a look at the 3rd formula in your picture. It has no tendency term, it starts with a zero - what does that mean?
- It's hard to check whether you have translated the formulas correctly into code, since the code does not follow the formulas strictly.
Below I tried a solution with scipy solve_ivp. In case A I'm able to run a pendulum, but in case B no meaningful solution for the boomerang can be found. So check the maths; I guess there is some error in the mathematical expressions.
For the graphics use pandas to plot all variables together (see code below).
import scipy.integrate as spi
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def diff_eqs_boomerang(t, Y):
    INP = Y
    dY = np.zeros((9))
    dY[0] = (1/I3) * (Kz*L*(INP[1]**2+(L*INP[0])**2))
    dY[1] = -(lamb/m)*INP[1]
    dY[2] = -(1/(m * INP[1])) * ( Kz*L*(INP[1]**2+(L*INP[0])**2) + m*g) + (mu/I3)/INP[0]
    dY[3] = (1/(I3*INP[0]))*(-mu*INP[0]*np.sin(INP[6]))
    dY[4] = (1/(I3*INP[0]*np.sin(INP[3]))) * (mu*INP[0]*np.cos(INP[5]))
    dY[5] = -np.cos(INP[3])*INP[4]
    dY[6] = INP[1]*(-np.cos(INP[5])*np.cos(INP[4]) + np.sin(INP[5])*np.sin(INP[4])*np.cos(INP[3]))
    dY[7] = INP[1]*(-np.cos(INP[5])*np.sin(INP[4]) - np.sin(INP[5])*np.cos(INP[4])*np.cos(INP[3]))
    dY[8] = INP[1]*(-np.sin(INP[5])*np.sin(INP[3]))
    return dY

def diff_eqs_pendulum(t, Y):
    dY = np.zeros((3))
    dY[0] = Y[1]
    dY[1] = -Y[0]
    dY[2] = Y[0]*Y[1]
    return dY

t_start, t_end = 0.0, 12.0

case = 'A'
if case == 'A':  # pendulum
    Y = np.array([0.1, 1.0, 0.0])
    Yres = spi.solve_ivp(diff_eqs_pendulum, [t_start, t_end], Y, method='RK45', max_step=0.01)
if case == 'B':  # boomerang
    Y = np.array([omega0, V0, Psi0, theta0, phi0, psi0, X0, Y0, Z0])
    print('Y initial:'); print(Y); print()
    Yres = spi.solve_ivp(diff_eqs_boomerang, [t_start, t_end], Y, method='RK45', max_step=0.01)

#---- graphics ---------------------
yy = pd.DataFrame(Yres.y).T
tt = np.linspace(t_start, t_end, yy.shape[0])
with plt.style.context('fivethirtyeight'):
    plt.figure(1, figsize=(20, 5))
    plt.plot(tt, yy, lw=8, alpha=0.5)
    plt.grid(axis='y')
    for j in range(3):
        plt.fill_between(tt, yy[j], 0, alpha=0.2, label='y['+str(j)+']')
    plt.legend(prop={'size': 20})

Python Neural Network Backpropagation

I'm learning about neural networks, specifically looking at MLPs with a back-propagation implementation. I'm trying to implement my own network in python and I thought I'd look at some other libraries before I started. After some searching I found Neil Schemenauer's python implementation bpnn.py. (http://arctrix.com/nas/python/bpnn.py)
Having worked through the code and read the first part of Christopher M. Bishop's book 'Neural Networks for Pattern Recognition', I found an issue in the backPropagate function:
# calculate error terms for output
output_deltas = [0.0] * self.no
for k in range(self.no):
    error = targets[k]-self.ao[k]
    output_deltas[k] = dsigmoid(self.ao[k]) * error
The line of code that calculates the error is different in Bishop's book. On page 145, equation 4.41, he defines the output unit error as:
d_k = y_k - t_k
where y_k are the outputs and t_k are the targets. (I'm using _ to represent a subscript.)
So my question is should this line of code:
error = targets[k]-self.ao[k]
In fact be:
error = self.ao[k] - targets[k]
I'm most likely completely wrong, but could someone help clear up my confusion please? Thanks.
It all depends on the error measure you use. To give just a few examples of error measures (for brevity, I'll use ys to mean a vector of n outputs and ts to mean a vector of n targets):
mean squared error (MSE):
sum((y - t) ** 2 for (y, t) in zip(ys, ts)) / n
mean absolute error (MAE):
sum(abs(y - t) for (y, t) in zip(ys, ts)) / n
mean logistic error (MLE):
sum(-log(y) * t - log(1 - y) * (1 - t) for (y, t) in zip(ys, ts)) / n
Which one you use depends entirely on the context. MSE and MAE can be used for when the target outputs can take any values, and MLE gives very good results when your target outputs are either 0 or 1 and when y is in the open range (0, 1).
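To make the difference concrete, here is a small worked example with arbitrary illustrative values:
from math import log

n = 3
ts = [1, 0, 1]            # targets
ys = [0.9, 0.2, 0.6]      # network outputs

mse = sum((y - t) ** 2 for (y, t) in zip(ys, ts)) / n          # ~0.070
mae = sum(abs(y - t) for (y, t) in zip(ys, ts)) / n            # ~0.233
mle = sum(-log(y) * t - log(1 - y) * (1 - t)
          for (y, t) in zip(ys, ts)) / n                       # ~0.280
print(mse, mae, mle)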
With that said, I haven't seen the errors y - t or t - y used before (I'm not very experienced in machine learning myself). As far as I can see, the source code you provided doesn't square the difference or use the absolute value; are you sure the book doesn't either? The way I see it, y - t or t - y can't be very good error measures, and here's why:
n = 2 # We only have two output neurons
ts = [ 0, 1 ] # Our target outputs
ys = [ 0.999, 0.001 ] # Our sigmoid outputs
# Notice that your outputs are the exact opposite of what you want them to be.
# Yet, if you use (y - t) or (t - y) to measure your error for each neuron and
# then sum up to get the total error of the network, you get 0.
t_minus_y = (0 - 0.999) + (1 - 0.001)
y_minus_t = (0.999 - 0) + (0.001 - 1)
Edit: Per alfa's comment, in the book, y - t is actually the derivative of MSE. In that case, t - y is incorrect. Note, however, that the actual derivative of MSE is 2 * (y - t) / n, not simply y - t.
If you don't divide by n (so you actually have a summed squared error (SSE), not a mean squared error), then the derivative would be 2 * (y - t). Furthermore, if you use SSE / 2 as your error measure, then the 1 / 2 and the 2 in the derivative cancel out and you are left with y - t.
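As a quick side check (not part of the original answer), you can verify these derivatives with sympy:
import sympy as sp

y, t, n = sp.symbols('y t n')

print(sp.diff((y - t)**2 / n, y))   # equals 2*(y - t)/n, the per-term MSE derivative
print(sp.diff((y - t)**2, y))       # equals 2*(y - t), the per-term SSE derivative
print(sp.diff((y - t)**2 / 2, y))   # equals y - t, the per-term SSE/2 derivative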
You have to backpropagate the derivative of
0.5*(y-t)^2 or 0.5*(t-y)^2 with respect to y
which is always
y-t = (y-t)(+1) = (t-y)(-1)
You can study this implementation of MLP from Padasip library.
And the documentation is here
In actual code, we often calculate the NEGATIVE grad (of the loss with regard to w) and use w += eta*grad to update the weight; written that way it is formally a gradient ascent step on the negated gradient.
In some textbooks, the POSITIVE grad is calculated and w -= eta*grad is used to update the weight.
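Both conventions give exactly the same update; a tiny numeric illustration with arbitrary values:
eta = 0.1
w = 1.0
grad = 0.5                     # dL/dw, the "positive" gradient

w_textbook = w - eta * grad    # textbook convention: w -= eta * grad
w_code = w + eta * (-grad)     # code convention: w += eta * neg_grad

assert w_textbook == w_code    # both give 0.95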
