Python: Neural network strange and consistent output

I'm trying to create my first neural network, for a "virtual creatures" system in which each creature has two sensors on top of its head.
The input the network gets is the distance from each of the two sensors to the closest food source. From the output I'm supposed to decide whether the creature turns left or right, and by what angle. The problem is that I always get the same result (it always goes right, or always left). My calculation is as follows:
def forward(self, X):
    # np.insert(..., 1) appends a constant 1, so the last row of each
    # weight matrix acts as a bias.
    self.z1 = np.dot(np.insert(X, len(X), 1), self.W1)
    self.a1 = self.sigmoid(self.z1)
    self.z2 = np.dot(np.insert(self.a1, len(self.a1), 1), self.W2)
    results = self.sigmoid(self.z2)
    return results
I calculate the angle by:
left_d = distance(sensors[0], food_pos)
right_d = distance(sensors[1], food_pos)
max_dis = sqrt(WIDTH ** 2 + HEIGHT ** 2)
output = self.NN.forward(np.array([float(left_d) / max_dis, float(right_d) / max_dis]))
if output[0] > output[1]:
    self.angle += (output[0] - 0.5) * np.pi
else:
    self.angle -= (output[1] - 0.5) * np.pi
I've also tried the network with random inputs, and I found that for a given NN the output is always (X, Y) with X > Y, or always with X < Y; it never varies between the two within the same NN.
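(A quick way to reproduce that check, sketched with illustrative names; net stands for whatever NN instance is being tested:)

import numpy as np
# Count how often the second output wins over 1000 random inputs.
wins = sum(np.argmax(net.forward(np.random.rand(2))) for _ in range(1000))
print(wins, "out of 1000 random inputs gave output[1] > output[0]")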
Here is some data from a run:
X - [ 0.78958477 0.69948212]
Z1 - [ 1.61766664 1.56767388 1.82580234]
A1 - [ 0.99890179 0.99999513 0.96766178]
Z2 - [ 1.45907443 0.92895941]
R - [ 0.9937656 0.80099741]
X - [ 0.14044444 0.60987121]
Z1 - [ 0.97104647 1.00091401 1.23983745]
A1 - [ 0.82547683 0.84196448 0.94573119]
Z2 - [ 1.28368194 0.85941254]
R - [ 0.95906503 0.75745915]
As you can see, R keeps the same ordering, here and across roughly a million measurements.
Here is my weight initialization:
self.W1 = np.random.rand(self.inputLayerSize + 1, self.hiddenLayerSize)
self.W2 = np.random.rand(self.hiddenLayerSize + 1, self.outputLayerSize)
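(A possible culprit, offered as a hypothesis rather than a diagnosis: np.random.rand draws from [0, 1), so every weight starts positive, and the GA below only ever injects uniform(0, 1) values, so weights stay positive forever. With positive inputs and sigmoid activations, A1 ends up near (1, 1, 1), as in the run data above, and Z2 is then roughly the column sums of W2, so whichever output column has the larger sum wins for almost every input. A minimal sketch of a zero-centered alternative:)

# Sketch: zero-centered weights in [-1, 1) instead of all-positive [0, 1).
self.W1 = np.random.uniform(-1.0, 1.0, (self.inputLayerSize + 1, self.hiddenLayerSize))
self.W2 = np.random.uniform(-1.0, 1.0, (self.hiddenLayerSize + 1, self.outputLayerSize))

(The GA's mutation draw would need the same range, e.g. uniform(-1, 1), or crossover will pull weights back into [0, 1].)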
Note: I'm using a genetic algorithm rather than backpropagation.
Here is the GA part:
def merge_guppies(screen, g1, g2):
    W11, W12 = g1.NN.get_W()
    W21, W22 = g2.NN.get_W()
    W1 = [[0] * len(W11[0])] * len(W11)
    W2 = [[0] * len(W12[0])] * len(W12)
    for k in xrange(len(W11)):
        for j in xrange(len(W11[k])):
            if uniform(0, 1) > 0.9:
                W1[k][j] = uniform(0, 1)
            elif uniform(0, 1) > 0.45:
                W1[k][j] = W11[k][j]
            else:
                W1[k][j] = W21[k][j]
    for k in xrange(len(W12)):
        for j in xrange(len(W12[k])):
            if uniform(0, 1) > 0.9:
                W2[k][j] = uniform(0, 1)
            elif uniform(0, 1) > 0.45:
                W2[k][j] = W12[k][j]
            else:
                W2[k][j] = W22[k][j]
    W1 = np.array(W1)
    W2 = np.array(W2)
    g = Guppy(screen)
    g.NN.set_W(W1, W2)
    return g
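(Separate from the main symptom, one Python pitfall in the crossover above: [[0] * n] * m repeats a single row object m times, so every W1[k][j] = ... assignment writes through to all rows, and each finished column holds whatever the last k wrote. A quick demonstration, plus the usual fix:)

rows = [[0] * 3] * 2                 # both entries are the *same* list object
rows[0][1] = 5
print(rows)                          # [[0, 5, 0], [0, 5, 0]]

safe = [[0] * 3 for _ in range(2)]   # independent rows
safe[0][1] = 5
print(safe)                          # [[0, 5, 0], [0, 0, 0]]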
I doubt the problem is here, considering it also occurred in other tests I've made with my NN (just giving it random inputs), but who knows... I've been searching for an answer for a couple of days now and I'm pretty lost.
Any help would be appreciated. Any idea where I've made a mistake?

Related

Error in implementation of Crank-Nicolson method applied to 1D TDSE?

This is more of a computational physics problem, and I've asked it on Physics Stack Exchange, but got no answers there. It is, I suppose, a mix of the disciplines here and there (and maybe even Mathematics Stack Exchange), so finding the right place to post is a task in and of itself, apparently...
I'm attempting to use the Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised with respect to its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (image below).
Does this behaviour seem reasonable? I have searched around and not seen questions or figures that are similar. I've tested another person's code from GitHub and it exhibits the same behaviour, which makes me feel a bit better. But I still think the central peak should simply decrease in height and increase in width. The likelihood of getting a physics-based explanation here is relatively low, I'd assume, but a computational explanation of errors I may have made is more likely.
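(A hedged sanity check rather than a definitive answer: with ħ = m = 1, which matches the coefficients in the script below, a free Gaussian packet does spread monotonically, but only in |ψ|². Its measured width grows as sda(t) = sda(0) · sqrt(1 + t²/σ0⁴) for initial width parameter σ0, while the packet also picks up a quadratic phase in x, so dips and oscillations in Re(ψ) alone are expected even when |ψ|² behaves. A two-line comparison, reusing t, sda and sd from the script below:)

# Sketch: compare the measured width sda against the analytic free-packet width.
# Assumes hbar = m = 1 and sigma0 = sd as set in the script below.
analytic = sda[0] * np.sqrt(1 + (t / sd ** 2) ** 2)
print('max width deviation:', np.max(np.abs(sda - analytic)))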
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to a GIF of the time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Define function for norm.
def normf(dxc, uc, ic):
    return sum(dxc * np.square(np.abs(uc[ic, :])))

# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
    return sum(dxc * xc * np.square(np.abs(uc[ic, :])))

# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
    return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))

# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
    return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))

# Time t: t0 =< t =< tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num = N + 1, endpoint = True)
decp = str(dt)[::-1].find('.')

# Initialise array for filling with norm values at each time step.
norm = np.zeros(len(t))
# Initialise array for expectation value of position.
xexp = np.zeros(len(t))
# Initialise array for expectation value of squared position.
xexps = np.zeros(len(t))
# Initialise array for alternate standard deviation.
sda = np.zeros(len(t))

# Position x: -a =< x =< a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num = M + 1, endpoint = True)

# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * var))
u = np.zeros([len(t), len(x)], dtype = 'complex_')
u[0, :] = u0
# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))

# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))

# Tridiagonal matrices Al and Ar. Al to be solved using Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range(0, M):
    Al[i + 1, i] = alpha
    Al[i, i] = 1 - (2 * alpha)
    Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha

Ar = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range(0, M):
    Ar[i + 1, i] = beta
    Ar[i, i] = 1 - (2 * beta)
    Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2 * beta, 1 - beta

# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype = 'complex_')
for n in range(0, NT - 1):
    if n == 0:
        cp[n] = c[n] / b[n]
    else:
        cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype = 'complex_')
dp = np.zeros(NT, dtype = 'complex_')

# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
    # BC's.
    u[i, 0], u[i, M] = 0, 0
    # Find RHS.
    d = np.dot(Ar, u[i, :])
    for n in range(0, NT):
        if n == 0:
            dp[n] = d[n] / b[n]
        else:
            dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * cp[n - 1]))
    nc = NT - 1
    while nc > -1:
        if nc == NT - 1:
            u[i + 1, nc] = dp[nc]
            nc -= 1
        else:
            u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
            nc -= 1
    norm[i] = normf(dx, u, i)
    xexp[i] = xexpf(dx, x, u, i)
    xexps[i] = xexpsf(dx, x, u, i)
    sda[i] = sdaf(xexp, xexps, i)

# Fill in final norm value.
norm[N] = normf(dx, u, N)
# Fill in final position expectation value.
xexp[N] = xexpf(dx, x, u, N)
# Fill in final squared position expectation value.
xexps[N] = xexpsf(dx, x, u, N)
# Fill in final standard deviation value.
sda[N] = sdaf(xexp, xexps, N)
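(Side note, not part of the original script: Crank-Nicolson is unitary for the TDSE, so the norm array above should stay essentially flat at 1; if it drifts, that points at the boundary/corner handling rather than the scheme itself. A quick check, reusing t and norm from above:)

# Sanity check (sketch): norm conservation under Crank-Nicolson.
print('max norm drift:', np.max(np.abs(norm - 1)))
plt.plot(t, norm)
plt.xlabel('t')
plt.ylabel('norm')
plt.show()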

Faulty image with Bilinear Interpolation

I am implementing bilinear interpolation to resize images. The function for bilinear interpolation and resizing is as follows:
import math
import numpy as np

def bl_resize(original_img, new_h, new_w):
    old_h, old_w, c = original_img.shape
    resized = np.ones((new_h, new_w, c))
    w_scale_factor = (old_w - 1) / (new_w - 1) if new_h != 0 else 0
    h_scale_factor = (old_h - 1) / (new_h - 1) if new_w != 0 else 0
    for i in range(new_h):
        for j in range(new_w):
            for k in range(c):
                x = i * h_scale_factor
                y = j * w_scale_factor
                x_floor = math.floor(x)
                x_ceil = min(old_h - 1, math.ceil(x))
                y_floor = math.floor(y)
                y_ceil = min(old_w - 1, math.ceil(y))
                if (x_ceil == x_floor) and (y_ceil == y_floor):
                    q = original_img[int(x), int(y), k]
                else:
                    v1 = original_img[x_floor, y_floor, k]
                    v2 = original_img[x_ceil, y_floor, k]
                    v3 = original_img[x_floor, y_ceil, k]
                    v4 = original_img[x_ceil, y_ceil, k]
                    q1 = v1 * (x_ceil - x) + v2 * (x - x_floor)
                    q2 = v3 * (x_ceil - x) + v4 * (x - x_floor)
                    q = q1 * (y_ceil - y) + q2 * (y - y_floor)
                resized[i, j, k] = q
    return resized.astype(np.uint8)
I am using x_ceil = min(old_h - 1, math.ceil(x)) and y_ceil = min(old_w - 1, math.ceil(y)) to avoid accessing indices larger than the dimensions of the original image array. Without this I would get an index-out-of-range error at the last index in both dimensions.
The resized image produced by this code contains a black grid. Here are some output images: the first is a shrunken version of the original image and the second an enlarged one.
EDIT: I have identified what exactly causes the problem, but I don't understand why it is a problem. Changing the scale factor for both dimensions from old/new to (old - 1)/(new - 1) led to grid-free results. I want to understand how the scale factor values can create the problem.
Well, after doing some debugging I figured out the reason. The black grid comes from zero values being incorrectly assigned to pixels where either x or y takes an integer value: there x_ceil == x_floor (or y_ceil == y_floor), so both interpolation weights along that axis, (x_ceil - x) and (x - x_floor), are zero, and q collapses to 0. Whenever exactly one of x and y lands on a source pixel, this paints a black row or column, and with the old/new scale factor that happens at many interior positions, hence the grid.
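(A hedged fix sketch rather than the author's method: clamp the floor index so the ceil is always floor + 1; the two weights along each axis then always sum to 1 and q can no longer collapse to 0, regardless of which scale factor is used. The helper name bl_sample and its signature are illustrative, and it assumes old_h, old_w >= 2.)

import math

def bl_sample(original_img, x, y, k, old_h, old_w):
    # Clamp the floor so floor + 1 is always a valid index; the two
    # interpolation weights along each axis then always sum to 1, so q
    # can no longer collapse to 0 when x or y hits a source pixel exactly.
    x0 = min(int(math.floor(x)), old_h - 2)
    x1 = x0 + 1
    y0 = min(int(math.floor(y)), old_w - 2)
    y1 = y0 + 1
    v1 = original_img[x0, y0, k]
    v2 = original_img[x1, y0, k]
    v3 = original_img[x0, y1, k]
    v4 = original_img[x1, y1, k]
    q1 = v1 * (x1 - x) + v2 * (x - x0)
    q2 = v3 * (x1 - x) + v4 * (x - x0)
    return q1 * (y1 - y) + q2 * (y - y0)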
I have documented everything here: https://meghal-darji.medium.com/implementing-bilinear-interpolation-for-image-resizing-357cbb2c2722#f91e-235aaa8634b8

SSD (single shot detector) default box implementation

I can't understand SSD's default box implementation. The original paper's formula is below:
w_k = s_k * √(a_k),  h_k = s_k / √(a_k)
But many SSD implementations seem to differ from the formula above. For example, ssd.pytorch:
mean = []
for k, f in enumerate(self.feature_maps):
    for i, j in product(range(f), repeat=2):
        f_k = self.image_size / self.steps[k]
        # unit center x,y
        cx = (j + 0.5) / f_k
        cy = (i + 0.5) / f_k
        # aspect_ratio: 1
        # rel size: min_size
        s_k = self.min_sizes[k] / self.image_size
        mean += [cx, cy, s_k, s_k]
        # aspect_ratio: 1
        # rel size: sqrt(s_k * s_(k+1))
        s_k_prime = sqrt(s_k * (self.max_sizes[k] / self.image_size))
        mean += [cx, cy, s_k_prime, s_k_prime]
        # rest of aspect ratios
        for ar in self.aspect_ratios[k]:
            mean += [cx, cy, s_k * sqrt(ar), s_k / sqrt(ar)]
            mean += [cx, cy, s_k / sqrt(ar), s_k * sqrt(ar)]
# back to torch land
output = torch.Tensor(mean).view(-1, 4)
And here is ssd_keras:
# define prior boxes shapes
box_widths = []
box_heights = []
for ar in self.aspect_ratios:
    if ar == 1 and len(box_widths) == 0:
        box_widths.append(self.min_size)
        box_heights.append(self.min_size)
    elif ar == 1 and len(box_widths) > 0:
        box_widths.append(np.sqrt(self.min_size * self.max_size))
        box_heights.append(np.sqrt(self.min_size * self.max_size))
    elif ar != 1:
        box_widths.append(self.min_size * np.sqrt(ar))
        box_heights.append(self.min_size / np.sqrt(ar))
box_widths = 0.5 * np.array(box_widths)
box_heights = 0.5 * np.array(box_heights)
Question
What is s_k = self.min_sizes[k]/self.image_size? What is self.min_size * self.max_size? I can't find either formula in the original paper.
I found the answer in a GitHub issue.
UPDATE:
min_sizes/img_size and max_sizes/img_size correspond to s_k and s_(k+1) respectively. Also, conv4_3 uses s_k = 0.1 instead of equation (4), so not every feature map can follow equation (4). I think that is why all the scales are precomputed beforehand as min_sizes and max_sizes.
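(As a concrete check, a sketch using the SSD300 numbers commonly shipped with ssd.pytorch; treat the size lists as assumptions pulled from that config rather than from the paper:)

from math import sqrt

# Assumed SSD300 config values (ssd.pytorch defaults).
image_size = 300
min_sizes = [30, 60, 111, 162, 213, 264]
max_sizes = [60, 111, 162, 213, 264, 315]

for k, (mn, mx) in enumerate(zip(min_sizes, max_sizes)):
    s_k = mn / image_size                    # the paper's s_k
    s_k_prime = sqrt(s_k * mx / image_size)  # extra ar == 1 box: sqrt(s_k * s_(k+1))
    print(k, round(s_k, 2), round(s_k_prime, 2))
# Prints s_k = 0.1, 0.2, 0.37, 0.54, 0.71, 0.88: conv4_3 gets the special 0.1,
# and the rest roughly follow the paper's equation (4) with s_min = 0.2, s_max = 0.9.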

Updating weight in previous layers in Backpropagation

I am trying to create a simple neural network and am stuck at updating the first-layer weights in a two-layer network. I believe the update I am applying to w2 is correct, from what I learned of the backpropagation algorithm. I am not including biases for now. But how to update the first-layer weights is where I am stuck.
import numpy as np
np.random.seed(10)

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1.0 - x)

def cost_function(output, y):
    return (output - y) ** 2

x = 2
y = 4
w1 = np.random.rand()
w2 = np.random.rand()
h = sigmoid(w1 * x)
o = sigmoid(h * w2)
cost_function_output = cost_function(o, y)
prev_w2 = w2
w2 -= 0.5 * 2 * cost_function_output * h * sigmoid_derivative(o)  # 0.5 being learning rate
w1 -= 0  # What do you update this to?
print(cost_function_output)
I'm not able to comment on your question, so writing here.
Firstly, your sigmoid_derivative function is wrong.
The derivative of sigmoid(x*y) w.r.t. x is sigmoid(x*y) * (1 - sigmoid(x*y)) * y.
We need dW1 and dW2 (these are dJ/dW1 and dJ/dW2, the partial derivatives, respectively).
J = (o - y)^2, therefore dJ/do = 2*(o - y)
Now, dW2:
dJ/dW2 = dJ/do * do/dW2 (chain rule)
dJ/dW2 = (2*(o - y)) * (o*(1 - o)*h)
W2 -= learning_rate*dW2
Now, for dW1:
dJ/dh = dJ/do * do/dh = (2*(o - y)) * (o*(1 - o)*W2)
dJ/dW1 = dJ/dh * dh/dW1 = ((2*(o - y)) * (o*(1 - o)*W2)) * (h*(1 - h)*x)
W1 -= learning_rate*dW1
PS: Try drawing a computational graph; finding the derivatives becomes a lot easier. (If you don't know about computational graphs, read up on them online.)
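(Pulling these derivatives back into the asker's script, a minimal sketch of the corrected update loop; the loop count and learning rate are illustrative, and note that y = 4 lies outside sigmoid's (0, 1) range, so the cost can only bottom out at (1 - y)^2:)

import numpy as np
np.random.seed(10)

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

x, y = 2, 4
w1 = np.random.rand()
w2 = np.random.rand()
learning_rate = 0.5

for _ in range(100):
    h = sigmoid(w1 * x)                                # hidden activation
    o = sigmoid(h * w2)                                # output
    dJ_do = 2 * (o - y)                                # dJ/do for J = (o - y)^2
    dW2 = dJ_do * o * (1 - o) * h                      # dJ/dW2
    dW1 = dJ_do * o * (1 - o) * w2 * h * (1 - h) * x   # dJ/dW1 via the chain rule
    w2 -= learning_rate * dW2
    w1 -= learning_rate * dW1

print((sigmoid(sigmoid(w1 * x) * w2) - y) ** 2)  # final cost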

Error in Backpropagation: Neural Network predicts same class

I am writing neural network code from scratch using NumPy. But even after training my network for many epochs, the prediction for each class looks random, and it stays the same irrespective of the input.
I have checked my understanding against Andrew Ng's Coursera ML course and a towardsdatascience.com post. I think I'm making some very basic conceptual mistake that I cannot figure out.
Here is my code:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid(y):
    return y * (1 - y)

class NeuralNetwork:
    def __init__(self, shape):
        self.n_layers = len(shape)
        self.shape = shape
        self.weight = []
        self.bias = []
        i = 0
        while i < self.n_layers - 1:
            self.weight.append(np.random.normal(loc=0.0, scale=0.5,
                                                size=(self.shape[i + 1], self.shape[i])))
            self.bias.append(np.random.normal(loc=0.0, scale=0.3,
                                              size=(self.shape[i + 1], 1)))
            i += 1

    def predict(self, X):
        z = self.weight[0] @ X + self.bias[0]
        a = sigmoid(z)
        i = 1
        while i < self.n_layers - 1:
            z = self.weight[i] @ a + self.bias[i]
            a = sigmoid(z)
            i += 1
        return a

    def predictVerbose(self, X):
        layers = [X]
        z = self.weight[0] @ X + self.bias[0]
        a = sigmoid(z)
        layers.append(a)
        i = 1
        while i < self.n_layers - 1:
            z = self.weight[i] @ a + self.bias[i]
            a = sigmoid(z)
            layers.append(a)
            i += 1
        return layers

    def gradOne(self, X, y):
        layers = self.predictVerbose(X)
        h = layers[-1]
        delta_b = [(h - y) * dsigmoid(h)]
        delta_w = [delta_b[0] @ layers[-2].T]
        i = 1
        while i < self.n_layers - 1:
            buff = delta_b[-1]
            delta_b.append((self.weight[-i].T @ buff) * dsigmoid(layers[-(i + 1)]))
            delta_w.append(delta_b[-1] @ layers[-(i + 2)].T)
            i += 1
        return delta_b[::-1], delta_w[::-1]

    def grad(self, data, l_reg=0):
        # data: x1, x2, x3, ..., xm, y=(0, 1, 2,...)
        m = len(data)
        delta_b = []
        delta_w = []
        i = 0
        while i < self.n_layers - 1:
            delta_b.append(np.zeros((self.shape[i + 1], 1)))
            delta_w.append(np.zeros((self.shape[i + 1], self.shape[i])))
            i += 1
        for row in data:
            X = np.array(row[:-1])[np.newaxis].T
            y = np.zeros((self.shape[-1], 1))
            # print(row)
            y[row[-1], 0] = 1
            buff1, buff2 = self.gradOne(X, y)
            i = 0
            while i < len(delta_b):
                delta_b[i] += buff1[i] / m
                delta_w[i] += buff2[i] / m
                i += 1
        return delta_b, delta_w

    def train(self, data, batch_size, epoch, alpha, l_reg=0):
        m = len(data)
        for i in range(epoch):
            j = 0
            while j < m:
                delta_b, delta_w = self.grad(data[i: (i + batch_size + 1)])
                i = 0
                while i < len(self.weight):
                    self.weight[i] -= alpha * delta_w[i]
                    self.bias[i] -= alpha * delta_b[i]
                    i += 1
                j += batch_size

if __name__ == "__main__":
    x = NeuralNetwork([2, 2, 2])
    # for y in x.gradOne(np.array([[1], [2], [3]]), np.array([[0], [1]])):
    #     print(y.shape)
    data = [
        [1, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
        [0, 1, 1]
    ]
    x.train(data, 4, 1000, 0.1)
    print(x.predict(np.array([[1], [0]])))
    print(x.predict(np.array([[1], [1]])))
Please point out where I am going wrong.
Unfortunately I don't have enough reputation to comment on your post, but here's a link to a NumPy-only neural network that I've made (tested on blob data from sklearn and on MNIST).
https://github.com/jaymody/backpropagation/blob/master/old/NeuralNetwork.py
Are you still interested in this problem? As I understand it, you are trying to build an XOR perceptron with direct and inverse outputs?
It looks like:
1. You need to change the expression
delta_b, delta_w = self.grad(data[i: (i + batch_size + 1)])
to
delta_b, delta_w = self.grad(data[::])
in the train function.
2. Some of the random values used to initialise the synaptic and bias weights need many more training cycles at alpha = 0.1. Try playing with alpha (I set it to 2) and with the number of epochs (I tried up to 20000).
Also, your code does not work with 1-layer networks. I tried to train 1-layer AND and OR perceptrons and got very strange results (or maybe they just require many more cycles). But in the 2-layer case it works fine, as in the sketch below.
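(For completeness, a sketch of how the suggested settings would plug into the asker's __main__ block, assuming the slice change from point 1 has been applied; the concrete numbers are the answer's, not verified here:)

x = NeuralNetwork([2, 2, 2])
# Full-batch gradient as suggested above, larger alpha, many more epochs.
x.train(data, batch_size=4, epoch=20000, alpha=2)
print(x.predict(np.array([[1], [0]])))  # XOR = 1, so the one-hot target is [[0], [1]]
print(x.predict(np.array([[1], [1]])))  # XOR = 0, so the one-hot target is [[1], [0]]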
