I've tried to make a Python script able to recognize handwritten digits, using this data set: http://deeplearning.net/data/mnist/mnist.pkl.gz.
More information about the problem and about the algorithm I'm trying to implement can be found at this link: http://neuralnetworksanddeeplearning.com/chap1.html
I've implemented a classification algorithm using one perceptron per digit.
import cPickle, gzip
import numpy as np

# Load the MNIST dataset (Python 2, hence cPickle)
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

def activation(x):
    if x > 0:
        return 1
    return 0

bias = 0.5
learningRate = 0.01

images = train_set[0]
targets = train_set[1]

weights = np.random.uniform(0, 1, (10, 784))

# Train one perceptron per digit
for nr in range(0, 10):
    for i in range(0, 49999):
        x = images[i]
        t = targets[i]
        z = np.dot(weights[nr], x) + bias
        output = activation(z)
        weights[nr] = weights[nr] + (t - output) * x * learningRate
        bias = bias + (t - output) * learningRate

# Evaluate on the test set: predict the digit whose perceptron scores highest
images = test_set[0]
targets = test_set[1]
OK = 0
for i in range(0, 10000):
    vec = []
    for j in range(0, 10):
        vec.append(np.dot(weights[j], images[i]))
    if np.argmax(vec) == targets[i]:
        OK = OK + 1
print("The network recognized " + str(OK) + '/' + "10000")
It usually recognizes about 10% of the digits, which means my algorithm is doing nothing; it performs the same as random guessing.
Even though I know this problem is popular and I could easily find another solution on the web, I'm still asking you to help me identify the mistakes in my code.
Maybe I've initialized the values of learningRate, bias, and weights wrongly.
Thanks to @Kevinj22 and the others, I was able to solve this problem in the end.
import cPickle, gzip
import numpy as np

# Load the MNIST dataset (Python 2, hence cPickle)
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

def activation(x):
    if x > 0:
        return 1
    return 0

learningRate = 0.01

images = train_set[0]
targets = train_set[1]

weights = np.random.uniform(0, 1, (10, 784))

for nr in range(0, 10):
    for i in range(0, 50000):
        x = images[i]
        t = targets[i]
        z = np.dot(weights[nr], x)
        output = activation(z)
        # One-vs-all: perceptron nr should output 1 only for digit nr
        if nr == t:
            target = 1
        else:
            target = 0
        adjust = np.multiply((target - output) * learningRate, x)
        weights[nr] = np.add(weights[nr], adjust)

# Evaluate on the test set
images = test_set[0]
targets = test_set[1]
OK = 0
for i in range(0, 10000):
    vec = []
    for j in range(0, 10):
        vec.append(np.dot(weights[j], images[i]))
    if np.argmax(vec) == targets[i]:
        OK = OK + 1
print("The network recognized " + str(OK) + '/' + "10000")
Here is my updated code. I didn't introduce loss computation in my first attempt. I also got rid of the bias because I didn't find it useful in my implementation.
I ran this piece of code 10 times, with an average accuracy of 88%.
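For comparison, here is a hypothetical vectorized variant of the same one-vs-all update (my own sketch, not part of the accepted fix): it updates all 10 perceptrons per training sample instead of looping one digit at a time, which should behave equivalently up to update order.

import cPickle, gzip
import numpy as np

# Load MNIST as in the original script
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
images, targets = train_set

weights = np.random.uniform(0, 1, (10, 784))
learningRate = 0.01

for i in range(50000):
    x = images[i]                                # shape (784,)
    outputs = (weights.dot(x) > 0).astype(int)   # step activation, all 10 units at once
    onehot = np.zeros(10)
    onehot[int(targets[i])] = 1                  # binary one-vs-all target for this digit
    weights += learningRate * np.outer(onehot - outputs, x)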
Below is the code of my attempt at a neural network with 2 inputs and 3 outputs. While training gives good results, when I try to input the numbers the results are way off. After making some small changes, I observed that even though both functions return the output, which should be the same, the results were again different. The only explanation I can think of is that there is a bug.
The functions in question are "train" and "result".
Here is the code:
from numpy import dot, exp, random, array

class Network:
    def __init__(self):
        self.w = random.random((2, 3))

    def sigmoid(self, x, derivate=False):
        if derivate:
            return x * (1 - x)
        return 1 / (1 + exp(-x))

    def train(self):
        trainingInput = array([[0, 0], [0, 1], [1, 0], [1, 1]])
        trainingOutput = array([[0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
        n = 0
        while n < 10000:
            exOutput = self.sigmoid(dot(trainingInput, self.w) - 0.1)
            error = trainingOutput - exOutput
            self.w += dot(trainingInput.T, error * self.sigmoid(exOutput, True))
            n += 1
        return exOutput

    def result(self):
        trainingInput = array([[0, 0], [0, 1], [1, 0], [1, 1]])
        exOutput = self.sigmoid(dot(trainingInput, self.w) - 0.1)
        return exOutput

network = Network()
o = network.result()
output = network.train()
print(o)
print(output)
You should first train and then check the results. If you check before training, the two results will obviously be different. You can simply recalculate the results once more after training; hopefully, this will solve your bug.
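A minimal sketch of that call order, using the Network class above:

network = Network()
output = network.train()   # train first
o = network.result()       # then evaluate with the trained weights
print(o)                   # should now essentially match the final training output
print(output)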
I'm trying to use just NumPy to get a simple, relatively accurate digit-reading neural net. My code runs and reads the MNIST digit data correctly, but it ends up giving the same result for every image: each digit is predicted as unlikely to fall in any of the 10 digit classes.
I think my error has to be somewhat basic. Is there a huge issue with not having thresholds? Are my data types messed up? Anything to point me in the right direction would be hugely appreciated; I've been staring at this and tweaking things for hours.
Here is a link to my code on GitHub: https://github.com/popuguy/ai-tests/blob/master/npmnistnn.py
And here's a paste:
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

np.set_printoptions(precision=3)
np.set_printoptions(suppress=True)

def display_mnist(img, label):
    '''Visually display the 28x28 unformatted array
    '''
    basic_array = img
    plt.imshow(basic_array.reshape((28,28)), cmap=cm.Greys)
    plt.suptitle('Image is of a ' + label)
    plt.show()

hidden_layer_1_num_nodes = 500
hidden_layer_2_num_nodes = 500
hidden_layer_3_num_nodes = 500
output_layer_num_nodes = 10

batch_size = 100
dimension = 28
full_iterations = 10

def convert_digit_to_onehot(digit):
    return [0] * digit + [1] + [0] * (9 - digit)

images = mnist.train.images
# images = np.add(images, 0.1)
labels = mnist.train.labels

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def slope_from_sigmoid(x):
    return x * (1 - x)

# Weight matrices initialized in [-1, 1)
syn1 = 2 * np.random.random((dimension**2, hidden_layer_1_num_nodes)) - 1
syn2 = 2 * np.random.random((hidden_layer_1_num_nodes, hidden_layer_2_num_nodes)) - 1
syn3 = 2 * np.random.random((hidden_layer_2_num_nodes, hidden_layer_3_num_nodes)) - 1
syn4 = 2 * np.random.random((hidden_layer_3_num_nodes, output_layer_num_nodes)) - 1

testing = False
test_n = 3

for iter in range(full_iterations):
    print('Epic epoch bro, we\'re at #' + str(iter+1))
    for section in range(0, len(images), batch_size):
        if testing:
            print('Syn before', syn1)
        training_images = images[section:section+batch_size]
        training_labels = labels[section:section+batch_size]

        # Forward pass
        l0 = training_images
        l1 = sigmoid(np.dot(l0, syn1))
        l2 = sigmoid(np.dot(l1, syn2))
        l3 = sigmoid(np.dot(l2, syn3))
        l4 = sigmoid(np.dot(l3, syn4))

        # Backward pass
        l4_err = training_labels - l4
        l4_delta = l4_err * slope_from_sigmoid(l4)
        l3_err = np.dot(l4_delta, syn4.T)
        l3_delta = l3_err * slope_from_sigmoid(l3)
        l2_err = np.dot(l3_delta, syn3.T)
        l2_delta = l2_err * slope_from_sigmoid(l2)
        l1_err = np.dot(l2_delta, syn2.T)
        l1_delta = l1_err * slope_from_sigmoid(l1)

        # Weight updates
        syn4_update = np.dot(l3.T, l4_delta)
        syn4 += syn4_update
        syn3_update = np.dot(l2.T, l3_delta)
        syn3 += syn3_update
        syn2_update = np.dot(l1.T, l2_delta)
        syn2 += syn2_update
        syn1_update = np.dot(l0.T, l1_delta)
        syn1 += syn1_update

        if testing:
            print('Syn after', syn1)
            print('Due to syn1 update', syn1_update)
            print('Number non-zero elems', len(syn1_update.nonzero()))
            print('Which were', syn1_update.nonzero())
            print('From the l1_delta', l1_delta)
            print(l0[0:test_n])
            print("----------")
            print(l1[0:test_n])
            print("----------")
            print(l2[0:test_n])
            print("----------")
            print(l3[0:test_n])
            print("----------")
            print(l4[0:test_n])
            print("----------")
            print(training_labels[0:test_n])
            a = input()
            if len(a) > 0 and a[0] == 's':
                testing = False

# Accuracy on the last batch
correct = 0
total = 0
l4list = l4.tolist()
training_labelslist = training_labels.tolist()
print('Num things', len(l4list))
for i in range(len(l4list)):
    print(["{0:0.2f}".format(a) for a in l4list[i]])
    # print(l4list[i])
    # display_mnist(l0[i], str(l4list[i].index(max(l4list[i]))))
    if l4list[i].index(max(l4list[i])) == training_labelslist[i].index(max(training_labelslist[i])):
        correct += 1
    total += 1
print('Final round', 100*(correct/total), 'percent correct')
Hyperparameters in this instance were simply improperly tuned: bringing the number of nodes per hidden layer down to 15 and scaling the weight updates by a learning rate of 0.1 yields a significant performance increase.
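A minimal sketch of those changes (my reading of this answer; the original script applies its updates with an implicit learning rate of 1). This is a drop-in fragment for the script above, not a standalone program:

# Assumed tuning: smaller hidden layers, explicit learning rate
hidden_layer_1_num_nodes = 15
hidden_layer_2_num_nodes = 15
hidden_layer_3_num_nodes = 15
learning_rate = 0.1

# ...and inside the batch loop, scale each weight update:
syn4 += learning_rate * np.dot(l3.T, l4_delta)
syn3 += learning_rate * np.dot(l2.T, l3_delta)
syn2 += learning_rate * np.dot(l1.T, l2_delta)
syn1 += learning_rate * np.dot(l0.T, l1_delta)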
I tried to implement the k-nearest-neighbour classification algorithm for images in Python. Here is my code:
from collections import Counter  # the snippet uses Counter and np, so these imports are assumed
import numpy as np

def classifyImage(self, RGBAValsForOneImage, kVal):
    redMaster = RGBAValsForOneImage[0]
    greenMaster = RGBAValsForOneImage[1]
    blueMaster = RGBAValsForOneImage[2]
    alphaMaster = RGBAValsForOneImage[3]

    L2DistanceDictionary = {}
    kLabels = []

    for i in range(self.nImagesTrain):
        print(str("comparing to image nr " + str(i)))
        L2Norm = 0
        # label = self.LabelsTraining[i]
        for j in range(self.nPixel):
            redGreenBlueAlpha = self.RGBAPixelValuesTraining[i, j, :]
            redCompare = redGreenBlueAlpha[0]
            greenCompare = redGreenBlueAlpha[1]
            blueCompare = redGreenBlueAlpha[2]
            alphaCompare = redGreenBlueAlpha[3]
            L2Norm += np.sqrt((redCompare - redMaster) ** 2 + (greenCompare - greenMaster) ** 2 + (blueCompare - blueMaster) ** 2 + (alphaCompare - alphaMaster) ** 2)[0]
        L2Norm *= 100
        L2NormInt = int(L2Norm)
        # Group training-image indices by their (integerised) distance
        alreadyThere = L2DistanceDictionary.get(L2NormInt, [])
        alreadyThere.append(i)
        L2DistanceDictionary[L2NormInt] = alreadyThere

    # Walk the distances in ascending order and collect the k nearest labels
    theSortedKeys = sorted(L2DistanceDictionary.keys())
    howManyUntilNow = 0
    for i in range(0, len(L2DistanceDictionary)):
        thekey = theSortedKeys[i]
        thevalue = L2DistanceDictionary[thekey]
        howMany = len(thevalue)
        for z in range(0, howMany):
            if howManyUntilNow < kVal:
                kLabels.append(self.LabelsTraining[thevalue[z]])
                howManyUntilNow += 1  # was missing: without it, every label is collected
            else:
                break

    c = Counter(kLabels)
    winLabel, count = c.most_common(1)[0]
    return winLabel
The basic idea of the classifyImage function is to compare the RGBA values of the pixels of the image to be classified against the RGBA values of every image in the training dataset, and return the most common tag among the tags of the k nearest neighbours.
The problem with the code is that it is incredibly slow. Are there any ways to improve its efficiency? I almost never code in Python, so there might be easy ways to improve this code.
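For illustration, a minimal vectorized sketch of the same idea (my own, under the assumption that the training pixels are a (nImagesTrain, nPixel, 4) NumPy float array and the query image is (nPixel, 4)); moving the per-pixel loop into NumPy is usually the main speedup:

import numpy as np
from collections import Counter

def classify_image_vectorized(train_pixels, train_labels, query_pixels, k):
    # Same distance as the loop above: per-pixel Euclidean distance in RGBA
    # space, summed over all pixels of each training image.
    diffs = train_pixels - query_pixels[None, :, :]        # (n, nPixel, 4)
    dists = np.sqrt((diffs ** 2).sum(axis=2)).sum(axis=1)  # (n,)
    nearest = np.argsort(dists)[:k]                        # indices of the k closest
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]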
I am following Andrew Ng's Coursera course on machine learning. I am trying to build a 3-layer neural net for digit recognition in Python (784 input, 25 hidden, 10 output units). However, I am unable to get correct predictions on the training data (accuracy < 5% at 100 iterations, and not increasing with iteration).
J (the cost function) seems to be going down (see photo 1), and I have done gradient checking (before minimizing), which seems to match to around 1e-11 (see photo 2).
I have compared theta1 and theta2 after 100 iterations to my working Matlab code (see code snippet 1 for the Octave output and code snippet 2 for the Python output). theta1 is reasonably similar, but theta2 is very different -- see code snippet 2. (I know they should differ because of the different optimisation routines. However, firstly, I have placed the same initial thetas into both codes. Secondly, my reasoning is that they should start to converge, or at least get close, after 100 iterations.)
The only error I see is:
-c:32: RuntimeWarning: overflow encountered in exp
when running the sigmoid during optimisation. However, I was told that this is not critical and that it is normal to encounter this warning while optimising. Furthermore, because it is a sigmoid, any time the input is large it will tend towards 1 anyway.
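(For reference, a sketch of my own, not from the original code: the warning comes from np.exp(-lz) overflowing when lz is a large negative number, where the sigmoid saturates to 0 rather than 1. A numerically stable sigmoid avoids the warning entirely:)

import numpy as np

def sigmoid_stable(lz):
    # Expects an array input. Evaluate exp only where its argument is
    # negative, so it never overflows; both branches are the same sigmoid.
    lz = np.asarray(lz, dtype=float)
    out = np.empty_like(lz)
    pos = lz >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-lz[pos]))
    ez = np.exp(lz[~pos])   # lz < 0 here, so exp stays below 1
    out[~pos] = ez / (1.0 + ez)
    return out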
I have also attached my code in snippet 3. I have cut out all the other non-essential bits (like gradient checking) to make it as short as possible.
I would appreciate any help with this, as I cannot even find where it is going wrong, let alone how to fix it. Thank you.
Photos:
[Photo 1: J (cost function) decreasing to 1.8 after 12 iterations]
[Photo 2: gradient checking before optimizing; the two sets of values look very similar]
Code snippet 1 (Octave output):

Initializing Neural Network Parameters ...
initial1
[-0.0100100 -0.0771400 -0.1113800 -0.0230100  0.0547800 -0.0505500 -0.0731200
 -0.0988700  0.0128000 -0.0855400 -0.1002500 -0.1137200 -0.0669300 -0.0999900
  0.0084500 -0.0363200 -0.0588600 -0.0431100 -0.1133700 -0.0326300  0.0282800
  0.0052400 -0.1134600 -0.0617700  0.0267600]
initial2
[ 0.0273700  0.1026000 -0.0502100 -0.0699100  0.0190600  0.1004000  0.0784600
 -0.0075900 -0.0362100  0.0286200]
Doing fminunc
Training Neural Network...
Iteration 100 | Cost: 6.219605e-01
theta1
[-0.0099719 -0.0768462 -0.1109559 -0.0229224  0.0545714 -0.0503575 -0.0728415
 -0.0984935  0.0127513 -0.0852143 -0.0998682 -0.1132869 -0.0666751 -0.0996092
  0.0084178 -0.0361817 -0.0586359 -0.0429458 -0.1129383 -0.0325057  0.0281723
  0.0052200 -0.1130279 -0.0615348  0.0266581]
theta2
[ 1.124918  1.603780 -1.266390 -0.848874  0.037956 -1.360841  2.145562
 -1.448657 -1.262285 -1.357635]

Code snippet 2 (Python output):
theta1_initial
[-0.01001 -0.07714 -0.11138 -0.02301 0.05478 -0.05055 -0.07312 -0.09887
0.0128 -0.08554 -0.10025 -0.11372 -0.06693 -0.09999 0.00845 -0.03632
-0.05886 -0.04311 -0.11337 -0.03263 0.02828 0.00524 -0.11346 -0.06177
0.02676]
theta2_initial
[ 0.02737 0.1026 -0.05021 -0.06991 0.01906 0.1004 0.07846 -0.00759
-0.03621 0.02862]
Doing fminunc
-c:32: RuntimeWarning: overflow encountered in exp
theta1
[-0.00997202 -0.07680716 -0.11086841 -0.02292044 0.05455335 -0.05034252
-0.07280686 -0.09842603 0.01275117 -0.08516515 -0.0997987 -0.11319546
-0.06664666 -0.09954009 0.00841804 -0.03617494 -0.05861458 -0.04293555
-0.1128474 -0.0325006 0.02816879 0.00522031 -0.1129369 -0.06151103
0.02665508]
theta2
[ 0.27954826 -0.08007496 -0.36449273 -0.22988024 0.06849659 -0.47803973
1.09023041 -0.25570559 -0.24537494 -0.40341995]
#-----------------BEGIN HEADERS-----------------
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import csv
import scipy.optimize  # import the submodule explicitly; plain "import scipy" does not expose scipy.optimize
#-----------------END HEADERS-----------------

#-----------------BEGIN FUNCTION 1-----------------
def randinitialize(L_in, L_out):
    w = np.zeros((L_out, 1 + L_in))
    epsilon_init = 0.12
    w = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init
    return w
#-----------------END FUNCTION 1-----------------

#-----------------BEGIN FUNCTION 2-----------------
def sigmoid(lz):
    g = 1.0/(1.0+np.exp(-lz))
    return g
#-----------------END FUNCTION 2-----------------

#-----------------BEGIN FUNCTION 3-----------------
def sigmoidgradient(lz):
    g = np.multiply(sigmoid(lz), (1-sigmoid(lz)))
    return g
#-----------------END FUNCTION 3-----------------

#-----------------BEGIN FUNCTION 4-----------------
def nncostfunction(ltheta_ravel, linput_layer_size, lhidden_layer_size, lnum_labels, lx, ly, llambda_reg):
    ltheta1 = np.array(np.reshape(ltheta_ravel[:lhidden_layer_size * (linput_layer_size + 1)], (lhidden_layer_size, (linput_layer_size + 1))))
    ltheta2 = np.array(np.reshape(ltheta_ravel[lhidden_layer_size * (linput_layer_size + 1):], (lnum_labels, (lhidden_layer_size + 1))))
    ltheta1_grad = np.zeros((np.shape(ltheta1)))
    ltheta2_grad = np.zeros((np.shape(ltheta2)))
    y_matrix = []
    lm = np.shape(lx)[0]

    # Build the one-hot target matrix
    eye_matrix = np.eye(lnum_labels)
    for i in range(len(ly)):
        y_matrix.append(eye_matrix[int(ly[i])-1, :])  # the minus one as python is zero based
    y_matrix = np.array(y_matrix)

    # Forward propagation (note: z2 here is already sigmoid-activated)
    a1 = np.hstack((np.ones((lm, 1)), lx)).astype(float)
    z2 = sigmoid(ltheta1.dot(a1.T))
    a2 = (np.concatenate((np.ones((np.shape(z2)[1], 1)), z2.T), axis=1)).astype(float)
    a3 = sigmoid(ltheta2.dot(a2.T))
    h = a3

    # Cost: unregularized, then add regularization (excluding bias columns)
    J_unreg = (1/float(lm))*np.sum(
        -np.multiply(y_matrix, np.log(h.T))
        - np.multiply((1-y_matrix), np.log(1-h.T)),
        axis=None)
    J = J_unreg + (llambda_reg/(2*float(lm)))*(
        np.sum(np.multiply(ltheta1[:, 1:], ltheta1[:, 1:]), axis=None)
        + np.sum(np.multiply(ltheta2[:, 1:], ltheta2[:, 1:]), axis=None))

    # Backpropagation
    delta3 = a3.T - y_matrix
    delta2 = np.multiply((delta3.dot(ltheta2[:, 1:])), (sigmoidgradient(ltheta1.dot(a1.T))).T)
    cdelta2 = ((a2.T).dot(delta3)).T
    cdelta1 = ((a1.T).dot(delta2)).T
    ltheta1_grad = (1/float(lm))*cdelta1
    ltheta2_grad = (1/float(lm))*cdelta2

    # Regularize the gradients, zeroing the bias column
    theta1_hold = ltheta1
    theta2_hold = ltheta2
    theta1_hold[:, 0] = 0
    theta2_hold[:, 0] = 0
    ltheta1_grad = ltheta1_grad + (llambda_reg/float(lm))*theta1_hold
    ltheta2_grad = ltheta2_grad + (llambda_reg/float(lm))*theta2_hold

    thetagrad_ravel = np.concatenate((np.ravel(ltheta1_grad), np.ravel(ltheta2_grad)))
    return (J, thetagrad_ravel)
#-----------------END FUNCTION 4-----------------

#-----------------BEGIN FUNCTION 5-----------------
def predict(ltheta1, ltheta2, x):
    m, n = np.shape(x)
    p = np.zeros(m)
    h1 = sigmoid((np.hstack((np.ones((m, 1)), x.astype(float)))).dot(ltheta1.T))
    h2 = sigmoid((np.hstack((np.ones((m, 1)), h1))).dot(ltheta2.T))
    for i in range(0, np.shape(h2)[0]):
        p[i] = np.argmax(h2[i, :])
    return p
#-----------------END FUNCTION 5-----------------

## Setup the parameters you will use for this exercise
input_layer_size = 784  # 28x28 Input Images of Digits
hidden_layer_size = 25  # 25 hidden units
num_labels = 10         # 10 labels, from 0 to 9

data = []

# Reading in data, split into X and y, rewrite label 0 to 10 (for easy comparison to course)
with open('train.csv', 'rb') as csvfile:
    has_header = csv.Sniffer().has_header(csvfile.read(1024))
    csvfile.seek(0)  # rewind
    data_csv = csv.reader(csvfile, delimiter=',')
    if has_header:
        next(data_csv)
    for row in data_csv:
        data.append(row)

data = np.array(data)
x = data[:, 1:]
y = data[:, 0]
y = y.astype(int)
for i in range(len(y)):
    if y[i] == 0:
        y[i] = 10

# Set basic parameters
m, n = np.shape(x)
lambda_reg = 1.0

# Randomly initalize weights for Theta_initial
#theta1_initial = np.genfromtxt('tt1.csv', delimiter=',')
#theta2_initial = np.genfromtxt('tt2.csv', delimiter=',')
theta1_initial = randinitialize(input_layer_size, hidden_layer_size)
theta2_initial = randinitialize(hidden_layer_size, num_labels)
theta_initial_ravel = np.concatenate((np.ravel(theta1_initial), np.ravel(theta2_initial)))

# Doing optimize
fmin = scipy.optimize.minimize(fun=nncostfunction, x0=theta_initial_ravel, args=(input_layer_size, hidden_layer_size, num_labels, x, y, lambda_reg), method='L-BFGS-B', jac=True, options={'maxiter': 10, 'disp': True})

theta1 = np.array(np.reshape(fmin.x[:hidden_layer_size * (input_layer_size + 1)], (hidden_layer_size, (input_layer_size + 1))))
theta2 = np.array(np.reshape(fmin.x[hidden_layer_size * (input_layer_size + 1):], (num_labels, (hidden_layer_size + 1))))

p = predict(theta1, theta2, x)
for i in range(len(y)):
    if y[i] == 10:
        y[i] = 0

correct = [1 if a == b else 0 for (a, b) in zip(p, y)]
accuracy = (sum(map(int, correct)) / float(len(correct)))
print 'accuracy = {0}%'.format(accuracy * 100)
I think I have fixed the problem: it seems I messed up the index. It should be:
y_matrix.append(eye_matrix[int(ly[i]),:])
instead of:
y_matrix.append(eye_matrix[int(ly[i])-1,:])
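For what it's worth, the corrected indexing lines up with NumPy's zero-based labels, assuming the 0 -> 10 relabelling is also dropped (otherwise int(ly[i]) == 10 would be out of bounds for np.eye(10)). A minimal sketch with example labels of my own:

import numpy as np

y = np.array([5, 0, 4])   # example digit labels, already 0..9
eye_matrix = np.eye(10)
y_matrix = eye_matrix[y]  # row i is the one-hot encoding of y[i]; no -1 offset needed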
I'm trying to make an XOR gate using a 2-perceptron network, but for some reason the network is not learning: when I plot the error over the learning rounds, it settles at a fixed level and oscillates in that region.
I have not added any bias to the network at the moment.
import numpy as np

def S(x):
    return 1/(1+np.exp(-x))

win = np.random.randn(2,2)
wout = np.random.randn(2,1)
eta = 0.15
# win = [[1,1], [2,2]]
# wout = [[1],[2]]

obj = [[0,0],[1,0],[0,1],[1,1]]
target = [0,1,1,0]

epoch = int(10000)
emajor = ""

for r in range(0,epoch):
    for xy in range(len(target)):
        tar = target[xy]
        fdata = obj[xy]
        fdata = S(np.dot(1,fdata))
        hnw = np.dot(fdata,win)
        hnw = S(np.dot(fdata,win))
        out = np.dot(hnw,wout)
        out = S(out)

        diff = tar-out
        E = 0.5 * np.power(diff,2)
        emajor += str(E[0]) + ",\n"

        delta_out = (out-tar)*(out*(1-out))
        nindelta_out = delta_out * eta
        wout_change = np.dot(nindelta_out[0], hnw)
        for x in range(len(wout_change)):
            change = wout_change[x]
            wout[x] -= change

        delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
        nindelta_in = eta * delta_in
        for x in range(len(nindelta_in)):
            midway = np.dot(nindelta_in[x][0], fdata)
            for y in range(len(win)):
                win[y][x] -= midway[y]

f = open('xor.csv','w')
f.write(emajor)  # python will convert \n to os.linesep
f.close()  # you can omit in most cases as the destructor will call it
This is the error plotted against the number of learning rounds. Is this correct? The red line is how I expected the error to change.
Is there anything wrong in my code? I can't seem to figure out what's causing the error. Any help is much appreciated.
Thanks in advance
Here is a one-hidden-layer network with backpropagation which can be customized to run experiments with relu, sigmoid, and other activations. After several experiments it was concluded that with relu the network performed better and reached convergence sooner, while with sigmoid the loss value fluctuated. This happens because "the gradient of sigmoids becomes increasingly small as the absolute value of x increases".
import numpy as np
import matplotlib.pyplot as plt
from operator import xor

class neuralNetwork():
    def __init__(self):
        # Define hyperparameters
        self.noOfInputLayers = 2
        self.noOfOutputLayers = 1
        self.noOfHiddenLayerNeurons = 2

        # Define weights
        self.W1 = np.random.rand(self.noOfInputLayers, self.noOfHiddenLayerNeurons)
        self.W2 = np.random.rand(self.noOfHiddenLayerNeurons, self.noOfOutputLayers)

    def relu(self, z):
        return np.maximum(0, z)

    def sigmoid(self, z):
        return 1/(1+np.exp(-z))

    def forward(self, X):
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2, self.W2)
        yHat = self.relu(self.z3)
        return yHat

    def costFunction(self, X, y):
        # Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*sum((y-self.yHat)**2)
        return J

    def costFunctionPrime(self, X, y):
        # Compute derivative with respect to W1 and W2
        delta3 = np.multiply(-(y-self.yHat), self.sigmoid(self.z3))
        djw2 = np.dot(self.a2.T, delta3)
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoid(self.z2)
        djw1 = np.dot(X.T, delta2)
        return djw1, djw2

if __name__ == "__main__":

    EPOCHS = 6000
    SCALAR = 0.01

    nn = neuralNetwork()
    COST_LIST = []

    inputs = [np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]

    for epoch in xrange(1, EPOCHS):
        cost = 0
        for i in inputs:
            X = i  # inputs
            y = xor(X[0][0], X[0][1])
            cost += nn.costFunction(X, y)[0]
            djw1, djw2 = nn.costFunctionPrime(X, y)
            nn.W1 = nn.W1 - SCALAR*djw1
            nn.W2 = nn.W2 - SCALAR*djw2
        COST_LIST.append(cost)

    plt.plot(np.arange(1, EPOCHS), COST_LIST)
    plt.ylim(0, 1)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title(str('Epochs: '+str(EPOCHS)+', Scalar: '+str(SCALAR)))
    plt.show()

    inputs = [np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]
    print "X\ty\ty_hat"
    for inp in inputs:
        print (inp[0][0], inp[0][1]), "\t", xor(inp[0][0], inp[0][1]), "\t", round(nn.forward(inp)[0][0], 4)
End Result:
X y y_hat
(0, 0) 0 0.0
(0, 1) 1 0.9997
(1, 0) 1 0.9997
(1, 1) 0 0.0005
The weights obtained after training were:
nn.w1
[ [-0.81781753 0.71323677]
[ 0.48803631 -0.71286155] ]
nn.w2
[ [ 2.04849235]
[ 1.40170791] ]
I found the following YouTube series extremely helpful for understanding neural nets: Neural networks demystified.
There is only so much I know and can explain in this answer. If you want an even better understanding of neural nets, I would suggest you go through the following link: cs231n: Modelling one neuron.
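To illustrate the quoted point about sigmoid gradients (a small sketch of my own): the derivative of the sigmoid, s(x)*(1-s(x)), peaks at 0.25 and shrinks quickly as |x| grows, while the relu derivative stays 1 for all positive inputs.

import numpy as np

x = np.array([0.0, 2.0, 5.0, 10.0])
s = 1 / (1 + np.exp(-x))
print(s * (1 - s))             # [0.25, 0.105, 0.0066, 0.000045] -- vanishing
print((x > 0).astype(float))   # [0., 1., 1., 1.] -- relu gradient where x > 0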
The error calculated in each epoch should be the sum total of all the sum-squared errors (i.e., the error for every target).
import numpy as np

def S(x):
    return 1/(1+np.exp(-x))

win = np.random.randn(2,2)
wout = np.random.randn(2,1)
eta = 0.15
# win = [[1,1], [2,2]]
# wout = [[1],[2]]

obj = [[0,0],[1,0],[0,1],[1,1]]
target = [0,1,1,0]

epoch = int(10000)
emajor = ""

for r in range(0,epoch):
    # ***** initialize final error *****
    finalError = 0
    for xy in range(len(target)):
        tar = target[xy]
        fdata = obj[xy]
        fdata = S(np.dot(1,fdata))
        hnw = np.dot(fdata,win)
        hnw = S(np.dot(fdata,win))
        out = np.dot(hnw,wout)
        out = S(out)

        diff = tar-out
        E = 0.5 * np.power(diff,2)
        # ***** sum all errors *****
        finalError += E

        delta_out = (out-tar)*(out*(1-out))
        nindelta_out = delta_out * eta
        wout_change = np.dot(nindelta_out[0], hnw)
        for x in range(len(wout_change)):
            change = wout_change[x]
            wout[x] -= change

        delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
        nindelta_in = eta * delta_in
        for x in range(len(nindelta_in)):
            midway = np.dot(nindelta_in[x][0], fdata)
            for y in range(len(win)):
                win[y][x] -= midway[y]

    # ***** Save final error *****
    emajor += str(finalError[0]) + ",\n"

f = open('xor.csv','w')
f.write(emajor)  # python will convert \n to os.linesep
f.close()  # you can omit in most cases as the destructor will call it
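As a small companion sketch (my own, assuming xor.csv was written by the loop above, one error per line with a trailing comma), the per-epoch error can be plotted to check that it actually decreases:

import matplotlib.pyplot as plt

# Each line looks like "0.123,", so take the part before the comma
errors = [float(line.split(',')[0]) for line in open('xor.csv') if line.strip()]
plt.plot(errors)
plt.xlabel('Epoch')
plt.ylabel('Summed squared error')
plt.show()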