Neural network - Matrix problems in Python

I'm playing around with a simple neural network, specifically this tutorial: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/ , and I'm running into problems during backpropagation. My matrix shapes won't match when backpropagating to the initial input weights.
I suspect it's a simple linear algebra problem. However, I wonder whether the programming language used in the tutorial is confusing me, or whether the problem occurred much earlier.
If anybody has any idea what I might be doing wrong, please let me know!
My Code
import numpy as np

inputM = np.matrix([
    [0, 1],
    [1, 0],
    [1, 1],
    [0, 0]
])
outputM = np.matrix([
    [0],
    [0],
    [1],
    [1]
])

neurons = 3
mu, sigma = 0, 0.1  # mean and standard deviation
weights = np.random.normal(mu, sigma, len(inputM.T) * neurons)
weightsMatrix = np.matrix(weights).reshape(3, 2)
weights = np.matrix(weights)

# Forward
inputHidden = inputM * weightsMatrix.T  # (4, 2) * (2, 3) -> (4, 3)
hiddenLayerLog = 1 / (1 + np.exp(inputHidden))
hiddenWeights = np.random.normal(mu, sigma, neurons)[np.newaxis, :]
sumOfHiddenLayer = np.sum(hiddenWeights + hiddenLayerLog, axis=1)  # (4, 1)
predictedOutput = 1 / (1 + np.exp(sumOfHiddenLayer))
residual = outputM - predictedOutput
# NB: with np.matrix, * is matrix multiplication, so this is (4, 1) * (1, 4) -> (4, 4)
logDerivative = 1 / (1 + np.exp(sumOfHiddenLayer)) * (1 - 1 / (1 + np.exp(sumOfHiddenLayer))).T
deltaOutputSum = logDerivative * residual

# Backward
deltaWeights = deltaOutputSum / hiddenLayerLog
newHiddenWeights = hiddenWeights - deltaWeights
deltaHiddenSum = deltaOutputSum / hiddenWeights
deltaHiddenSum = deltaHiddenSum.T * (1 / (1 + np.exp(inputHidden))) * (1 - 1 / (1 + np.exp(inputHidden))).T
newInputWeights = np.array(deltaHiddenSum) / np.array(inputM)  # <- (3, 4) vs (4, 2): shapes don't broadcast
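For comparison, here is a minimal, shape-consistent sketch of the same 2-3-1 network. It is my own rewrite under a few assumptions, not the tutorial's code: the sigmoid uses exp(-x), each layer gets a proper weight matrix, and the backward pass uses transposed matrix products rather than element-wise division (the division in the tutorial appears to come from its scalar worked example and doesn't generalise to matrices). With this arrangement every shape lines up:

import numpy as np

X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])  # (4, 2)
y = np.array([[0], [0], [1], [1]])              # (4, 1)

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (2, 3))  # input -> hidden
W2 = rng.normal(0, 0.1, (3, 1))  # hidden -> output

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # note the minus sign

lr = 0.5
for _ in range(10000):
    # Forward pass
    hidden = sigmoid(X @ W1)       # (4, 3)
    output = sigmoid(hidden @ W2)  # (4, 1)
    # Backward pass: transposed matrix products, no element-wise division
    delta_out = (y - output) * output * (1 - output)           # (4, 1)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)  # (4, 3)
    W2 += lr * hidden.T @ delta_out  # (3, 4) @ (4, 1) -> (3, 1)
    W1 += lr * X.T @ delta_hidden    # (2, 4) @ (4, 3) -> (2, 3)

print(output)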

Related

Error in implementation of Crank-Nicolson method applied to 1D TDSE?

This is more of a computational physics problem, and I've asked it on Physics Stack Exchange, but got no answers there. It is, I suppose, a mix of the disciplines here and there (and maybe even Mathematics Stack Exchange), so finding the right place to post is a task in and of itself, apparently...
I'm attempting to use the Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised with respect to its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (see the GIF linked below).
Does this behaviour seem reasonable? I have searched around and not seen questions or figures that are similar. I've tested another person's code from GitHub and it exhibits the same behaviour, which makes me feel a bit better. But I still think the central peak should simply decrease in height and increase in width. The likelihood of getting a physics-based explanation here is relatively low, I'd assume, but a computational explanation of errors I may have made is more likely.
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to a GIF of the time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Define function for norm.
def normf(dxc, uc, ic):
    return sum(dxc * np.square(np.abs(uc[ic, :])))

# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
    return sum(dxc * xc * np.square(np.abs(uc[ic, :])))

# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
    return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))

# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
    return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))

# Time t: t0 =< t =< tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num=N + 1, endpoint=True)
decp = str(dt)[::-1].find('.')

# Initialise arrays for the norm, the expectation values of position and
# squared position, and the alternate standard deviation at each time step.
norm = np.zeros(len(t))
xexp = np.zeros(len(t))
xexps = np.zeros(len(t))
sda = np.zeros(len(t))

# Position x: -a =< x =< a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num=M + 1, endpoint=True)

# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * var))
u = np.zeros([len(t), len(x)], dtype='complex_')
u[0, :] = u0
# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))

# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))

# Tridiagonal matrices Al and Ar. Al to be solved using Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype='complex_')
for i in range(0, M):
    Al[i + 1, i] = alpha
    Al[i, i] = 1 - (2 * alpha)
    Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha

Ar = np.zeros([len(x), len(x)], dtype='complex_')
for i in range(0, M):
    Ar[i + 1, i] = beta
    Ar[i, i] = 1 - (2 * beta)
    Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2 * beta, 1 - beta

# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype='complex_')
for n in range(0, NT - 1):
    if n == 0:
        cp[n] = c[n] / b[n]
    else:
        cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype='complex_')
dp = np.zeros(NT, dtype='complex_')

# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
    # BC's.
    u[i, 0], u[i, M] = 0, 0
    # Find RHS.
    d = np.dot(Ar, u[i, :])
    # Forward sweep of the Thomas algorithm.
    for n in range(0, NT):
        if n == 0:
            dp[n] = d[n] / b[n]
        else:
            dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * cp[n - 1]))
    # Back substitution.
    nc = NT - 1
    while nc > -1:
        if nc == NT - 1:
            u[i + 1, nc] = dp[nc]
        else:
            u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
        nc -= 1
    norm[i] = normf(dx, u, i)
    xexp[i] = xexpf(dx, x, u, i)
    xexps[i] = xexpsf(dx, x, u, i)
    sda[i] = sdaf(xexp, xexps, i)

# Fill in final norm, expectation values, and standard deviation.
norm[N] = normf(dx, u, N)
xexp[N] = xexpf(dx, x, u, N)
xexps[N] = xexpsf(dx, x, u, N)
sda[N] = sdaf(xexp, xexps, N)
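A cheap sanity check, sketched below using the definitions above: the Crank-Nicolson propagator (a Cayley form) is unitary for a Hermitian Hamiltonian, so norm should stay at 1 to machine precision, and the Thomas solve should agree with a dense solver. Any norm drift points at the matrices or boundary handling; in particular the corner elements look asymmetric (Ar[M, M] = 1 - 2*beta but Al[M, M] = 1 - alpha), which may be worth double-checking.

# Hypothetical sanity checks, reusing Al, Ar, u, and norm from above.
# 1) Cross-check one Thomas step against a dense solver.
rhs = np.dot(Ar, u[0, :])
dense = np.linalg.solve(Al, rhs)
print("max |Thomas - dense|:", np.max(np.abs(u[1, :] - dense)))
# 2) CN should conserve probability; drift hints at a BC or matrix bug.
print("max norm drift:", np.max(np.abs(norm - 1)))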

Using Adam to find the minimum of the Rosenbrock function using Pytorch

I am comparing the Adam algorithm to SGD with momentum. I realised that the convergence rate of Adam is much worse than that of SGD with momentum when applied to the Rosenbrock function. This finding is in contrast to this visualisation. You can read the underlying code here.
To ensure that I did not have an implementation error, I compared the results of my algorithm to the PyTorch implementation. PyTorch and my implementation return the same result.
Therefore either PyTorch and my implementation are both incorrect, or the implementation in the link is incorrect. If you check out the code from the link above, you will find that the bias-correction step is missing. After adapting my code in the same way, the results did not improve significantly.
So my question is: why does it work in the linked scenario but not in my/PyTorch implementation, even though all three should return the same result?
import numpy as np
import torch

# Rosenbrock function
class Rosenbrock:
    a_f = 1.
    b_f = 2.
    # The minimum is at (a_f, a_f**2)

class Adam_para:
    beta1 = 0.9  # 0.7 # modified because of github: https://gist.github.com/EmilienDupont/f97a3902f4f3a98f350500a3a00371db
    beta2 = 0.999
    eps = 1e-8
    lr = 2e-2
    iterations = 100

def f(x, y):
    return (Rosenbrock.a_f - x) ** 2 + Rosenbrock.b_f * (y - x ** 2) ** 2

def grad_f(x, y):
    grad_x = -1. * 2 * (Rosenbrock.a_f - x) + Rosenbrock.b_f * (-2 * x) * 2 * (y - x ** 2)
    grad_y = Rosenbrock.b_f * 1. * 2 * (y - x ** 2)
    return np.array([grad_x, grad_y])

def adam_inner(p: np.ndarray, t, exp_avg, exp_avg_sqr, lr):
    # inner loop of adam algorithm
    # p: current point
    # exp_avg: first moment estimate
    # exp_avg_sqr: second moment estimate
    # lr: learning rate
    # the following values are taken from the Adam paper
    beta1 = Adam_para.beta1
    beta2 = Adam_para.beta2
    eps = Adam_para.eps
    t = t + 1
    g = grad_f(*p)
    exp_avg = beta1 * exp_avg + (1 - beta1) * g
    exp_avg_sqr = beta2 * exp_avg_sqr + (1 - beta2) * np.square(g)
    bias_corr_1 = 1 - beta1 ** t
    bias_corr_2 = 1 - beta2 ** t
    exp_avg_hat = exp_avg / bias_corr_1
    exp_avg_sqr_hat = exp_avg_sqr / bias_corr_2
    denom = np.sqrt(exp_avg_sqr_hat) + eps
    p = p - lr * exp_avg_hat / denom
    return {'p': p, 'first_mom': exp_avg, 'second_mom': exp_avg_sqr}

def adam(p, it, lr=0.001):
    # it: number of iterations
    # m: first moment estimate
    # v: second moment estimate
    m = 0
    v = 0
    p_list = [p]
    for i in range(it):
        tmp = adam_inner(p_list[-1], i, m, v, lr)
        p_list.append(tmp['p'])
        m = tmp['first_mom']
        v = tmp['second_mom']
    return np.asarray(p_list)

x0 = np.array([3., 3.])
t = adam(x0, Adam_para.iterations, Adam_para.lr)

x0_torch = torch.tensor(x0, requires_grad=True)
optimizer = torch.optim.Adam([x0_torch], lr=Adam_para.lr, betas=(Adam_para.beta1, Adam_para.beta2))
for i in range(Adam_para.iterations):
    optimizer.zero_grad()
    f_torch = f(x0_torch[0], x0_torch[1])
    f_torch.backward()
    optimizer.step()

print("pytorch result:", x0_torch)
print("my result:", t[-1])

Gradient descent with polynomial features implementation issue

I am trying to implement gradient descent after transforming some random data using sklearn's PolynomialFeatures transformer. My code works when not using polynomial features, but gives extremely large coefficients after the transformation.
Is there an issue with my code (below)?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

l = 20
np.random.seed(0)
X = 2 - 3 * np.random.normal(0, 1, l)
y = X - 2 * (X ** 2) + 0.5 * (X ** 3) + np.random.normal(-3, 3, l)
plt.scatter(X, y, s=10)
plt.show()

X = X.reshape(-1, 1)
m, n = X.shape
p = PolynomialFeatures(degree=2)
xbp = p.fit_transform(X)

# initiate coefs
theta = np.ones((xbp.shape[1], 1))
for _ in range(500):
    err = xbp.dot(theta) - y.reshape(-1, 1)
    gradients = 2 / m * xbp.T.dot(err)
    theta -= 0.01 * gradients

print(theta)
print()
print(LinearRegression().fit(xbp, y).coef_)
output
[[ 802.60118234]
[ 360.65857329]
[12234.00939771]]
[ 0. 8.48492679 -1.62853134]
Thanks
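A guess, sketched below rather than verified: the plain-X version works because a single feature tolerates the 0.01 step size, but after the transform the x^2 column has a far larger scale, so the same step size makes the iteration oscillate and blow up, hence the huge coefficients. Standardising the polynomial columns (or shrinking the learning rate) usually fixes it; note that the fitted coefficients then live in the scaled feature space, so they won't match LinearRegression().coef_ directly.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

np.random.seed(0)
X = (2 - 3 * np.random.normal(0, 1, 20)).reshape(-1, 1)
y = (X - 2 * X**2 + 0.5 * X**3).ravel() + np.random.normal(-3, 3, 20)

xbp = PolynomialFeatures(degree=2).fit_transform(X)
xbp[:, 1:] = StandardScaler().fit_transform(xbp[:, 1:])  # scale all but the bias column

m = xbp.shape[0]
theta = np.ones((xbp.shape[1], 1))
for _ in range(5000):
    err = xbp.dot(theta) - y.reshape(-1, 1)
    theta -= 0.01 * (2 / m) * xbp.T.dot(err)
print(theta.ravel())  # coefficients in the *scaled* feature space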

Simple Neural Network from scratch using NumPy

I added a learning rate and momentum to a neural network implementation from scratch that I found at: https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6
However, I have a few questions about my implementation:
Is it correct? Any suggested improvements? It appears to output adequate results generally, but outside advice is very much appreciated.
With a learning rate < 0.5 or momentum > 0.9, the network tends to get stuck in a local optimum where the loss is ~1. I assume this is because the step size isn't big enough to escape it, but is there a way to overcome this? Or is this inherent to the nature of the data being solved and unavoidable?
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sig = 1 / (1 + np.exp(-x))
    return sig * (1 - sig)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.random.rand(4, 1)
        self.y = y
        self.output = np.zeros(self.y.shape)
        self.v_dw1 = 0
        self.v_dw2 = 0
        self.alpha = 0.5
        self.beta = 0.5

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self, alpha, beta):
        # application of the chain rule to find the derivative of the loss
        # function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2 * (self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T, (np.dot(2 * (self.y - self.output) *
                            sigmoid_derivative(self.output), self.weights2.T) *
                            sigmoid_derivative(self.layer1)))
        # adding effect of momentum
        self.v_dw1 = (beta * self.v_dw1) + ((1 - beta) * d_weights1)
        self.v_dw2 = (beta * self.v_dw2) + ((1 - beta) * d_weights2)
        # update the weights with the derivative (slope) of the loss function
        self.weights1 = self.weights1 + (self.v_dw1 * alpha)
        self.weights2 = self.weights2 + (self.v_dw2 * alpha)

if __name__ == "__main__":
    X = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])
    y = np.array([[0], [1], [1], [0]])
    nn = NeuralNetwork(X, y)
    total_loss = []
    for i in range(10000):
        nn.feedforward()
        nn.backprop(nn.alpha, nn.beta)
        total_loss.append(sum((nn.y - nn.output) ** 2))
    iteration_num = list(range(10000))
    plt.plot(iteration_num, total_loss)
    plt.show()
    print(nn.output)
First thing: in your sigmoid_derivative(x), the input to this function is already the output of a sigmoid, but you take the sigmoid again and then compute the derivative. That is one problem; it should be:
return x * (1 - x)
Second problem: you are not using any bias. How do you know your decision boundary would cross the origin in the problem's hypothesis space? You need to add a bias term.
And last thing: I think your derivatives are not correct. You can refer to Andrew Ng's Deep Learning course 1, week 2 on coursera.org for a list of the general formulas for computing backpropagation in neural networks, to make sure you are doing it right.
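Putting the first two points together, a minimal sketch (my own variant, not code from the question or the linked article) of the simplified derivative plus a bias on the output layer, added here by appending a constant-1 column to the hidden activations:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is assumed to already be a sigmoid *output*, so no second sigmoid
    return s * (1 - s)

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1 = rng.random((3, 4))
W2 = rng.random((5, 1))  # 4 hidden units + 1 bias input

for _ in range(10000):
    layer1 = sigmoid(X @ W1)
    layer1b = np.hstack([layer1, np.ones((len(X), 1))])  # bias column for the output layer
    out = sigmoid(layer1b @ W2)
    d_out = 2 * (y - out) * sigmoid_derivative(out)
    d_hidden = (d_out @ W2[:4].T) * sigmoid_derivative(layer1)
    W2 += 0.5 * layer1b.T @ d_out
    W1 += 0.5 * X.T @ d_hidden

print(out.round(3))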

Smoothing signal - convolution

I have an iterative model in Python which generates a signal using a function that contains a derivative. As the model iterates, the signal becomes noisy - I suspect it may be an issue with computing the numerical derivative. I've tried to smooth this by applying a low-pass filter, convolving the noisy signal with a Gaussian kernel. I use the code snippet:
nw = 256
std = 40
window = gaussian(nw, std, sym=True)
filtered = convolve(current, window, mode='same') / np.sum(window)
where current is my signal, and gaussian and convolve have been imported from scipy. This seems to give a slight improvement, and the first 2 or 3 iterations appear very smooth. However, after that the signal becomes extremely noisy again, despite the fact that the low-pass filter is positioned inside the iterative loop.
Can anyone suggest where I might be going wrong or how I could better tackle this problem? Thanks.
EDIT: As suggested, I've included the code I'm using below. At 5 iterations the noise in the signal is clearly apparent.
import numpy as np
from scipy import special
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from scipy.signal import convolve
from scipy.signal import gaussian

# Constants
B = 426400E-9  # tesla
R = 71723E3
Rkm = R / 1000.
Omega = 1.75e-4  # 8.913E-4 # rads/s
period = (2. * np.pi / Omega) / 3600.  # Gets period in hours
Bj = 2.0 * B
mdot = 1000.
sigmapstar = 0.05

# Create rhoe array
rho0 = 5. * R
rho1 = 100. * R
rhoe = np.linspace(rho0, rho1, 200000)

# Define flux function and z component of equatorial field strength
Fe = B * R**3 / rhoe
Bze = B * (R / rhoe)**3

def derivs(u, rhoe, p):
    """Computes the derivative"""
    wOmegaJ = u
    Bj, sigmapstar, mdot, B, R = p
    # Compute the derivative of w/omegaJ wrt rhoe (**Fe and Bjz have been subbed)
    dwOmegaJ = (((8.0 * np.pi * sigmapstar * B**2 * (R**6)) / (mdot * rhoe**5))
                * (1.0 - wOmegaJ) - (2. * wOmegaJ / rhoe))
    return dwOmegaJ

its = 5  # number of iterations to perform
i = 0
# Loop to iterate
while i < its:
    # Define the initial condition of rigid corotation
    wOmegaJ_0 = 1
    params = [Bj, sigmapstar, mdot, B, R]
    init = wOmegaJ_0
    # Compute numerical solution to Hill eqn
    u = odeint(derivs, init, rhoe, args=(params,))
    wOmega = u[:, 0]
    # Calculate I_rho
    i_rho = 8. * np.pi * sigmapstar * Omega * Fe * (1. - wOmega)
    dx = rhoe[1] - rhoe[0]
    differential = np.gradient(i_rho, dx)
    jpara = 1. * differential / (4 * np.pi * rhoe * Bze)
    jpari = 2. * B * jpara
    # Remove infinity and NaN values
    jpari[~np.isfinite(jpari)] = 0.0
    # Convolve to smooth curve
    nw = 256
    std = 40
    window = gaussian(nw, std, sym=True)
    filtered = convolve(jpari, window, mode='same') / np.sum(window)
    jpari = filtered
    # Pedersen conductivity as a function of jpari
    sigmapstar0 = 0.05
    jstar = 0.01e-6
    jstarstar = 0.25e-6
    s1 = 0.1e6  # (Am^-2)^-1
    s2 = 9.9e6  # (Am^-2)^-1
    n = 8.
    # Calculate new sigmapstar. Realistic conductivity
    sigmapstarNew = sigmapstar0 + 0.5 * (s1 + s2 / (1 + (jpari / jstarstar)**n)**(1. / n)) * (np.sqrt(jpari**2 + jstar**2) + jpari)
    diff = np.abs(sigmapstar - sigmapstarNew) / sigmapstar * 100
    diff = max(diff)
    sigmapstar = 0.5 * sigmapstar + 0.5 * sigmapstarNew  # Weighted averaging
    i += 1
    print(diff)

# Plot jpari
ax = plt.subplot(111)
ax.plot(rhoe / R, jpari * 1e6)
ax.axhline(0, ls=':')
ax.set_xlabel(r'$\rho_e / R_{UCD}$')
ax.set_ylabel(r'$j_{\parallel i} $ / $ \mu$ A m$^{-2}$')
ax.set_xlim([0, 80])
ax.set_ylim(-0.01, 0.01)
plt.locator_params(nbins=5)
plt.draw()
plt.show()
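One thing worth trying, sketched below with the variables defined above (the window length and polynomial order are illustrative and need tuning): smooth i_rho before np.gradient, rather than only smoothing jpari afterwards. Differentiation amplifies high-frequency noise, so filtering the quantity being differentiated damps the iteration-to-iteration feedback more effectively, and scipy.signal.savgol_filter can smooth and differentiate in a single pass:

from scipy.signal import savgol_filter

# Inside the loop, replace the raw derivative of i_rho:
i_rho_smooth = savgol_filter(i_rho, window_length=257, polyorder=3)
differential = np.gradient(i_rho_smooth, dx)

# Or compute a smoothed first derivative in one step:
differential = savgol_filter(i_rho, window_length=257, polyorder=3,
                             deriv=1, delta=dx)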
