I am trying to understand why this sample neural network written with NumPy does not learn non-linear data. Even a simple NN is supposed to be able to learn non-linear data, right?
I want my NN to learn that if the input is 1 the output is 0, if the input is greater than 1 and less than 4 the output is 1, and if the input is 4 or greater the output is 0.
I have tried many sample NN codes with NumPy from Google, and I keep running into this problem.
The code below does not learn, but it learns well with input [2,2,0,0] and desired output [1,1,0,0].
import numpy as np

# sigmoid function
def nonlin(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset
X = np.array([[1],
              [2],
              [3],
              [4]])

# output dataset
y = np.array([[0,1,1,0]]).T

# seed random numbers to make the calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((1,1)) - 1

for iter in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))

    # how much did we miss?
    l1_error = y - l1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1, True)

    # update weights
    syn0 += np.dot(l0.T, l1_delta)

print("Output After Training:")
print(l1)
Your model is essentially a linear model. You need to add at least one hidden layer if you want to fit non-linear data.
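For illustration, here is a minimal sketch of the same NumPy network with one hidden layer added. The hidden size, explicit bias terms, learning rate, and iteration count are assumptions chosen to make the four points learnable, not part of the original code:

import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

X = np.array([[1], [2], [3], [4]], dtype=float)
y = np.array([[0, 1, 1, 0]]).T

np.random.seed(1)
syn0 = 2*np.random.random((1, 4)) - 1   # input -> hidden (4 units, illustrative)
syn1 = 2*np.random.random((4, 1)) - 1   # hidden -> output
b0 = np.zeros((1, 4))                   # hidden-layer bias
b1 = np.zeros((1, 1))                   # output bias
alpha = 0.5                             # illustrative learning rate

for _ in range(60000):
    l0 = X
    l1 = nonlin(np.dot(l0, syn0) + b0)  # the hidden layer supplies the non-linearity
    l2 = nonlin(np.dot(l1, syn1) + b1)

    l2_error = y - l2
    l2_delta = l2_error * nonlin(l2, True)

    # backpropagate the output delta through the hidden layer
    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin(l1, True)

    syn1 += alpha * l1.T.dot(l2_delta)
    b1 += alpha * l2_delta.sum(axis=0, keepdims=True)
    syn0 += alpha * l0.T.dot(l1_delta)
    b0 += alpha * l1_delta.sum(axis=0, keepdims=True)

print(l2)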
As already said, you have built a simple linear logistic regression model.
The sigmoid in your NN is only used to turn the weighted input into a prediction; it does not give the network a non-linearity it can train on, so the decision boundary stays linear.
A good start at learning neural networks is this: http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/
Hi, I'm new to neural networks with TensorFlow. I've taken a small fraction of the Places365 dataset. I want to make a neural network to classify between 10 places.
For that, I've tried to make a small copy of a VGG network. The problem I have is that at the output of the softmax function I get a one-hot encoded array. Looking for problems in my code, I've realised that the output of the ReLU functions is either 0 or a big number (around 10000).
I don't know where I'm going wrong. Here is my code:
def variables(shape):
    return tf.Variable(2*tf.random_uniform(shape, seed=1) - 1)

def layerConv(x, filter):
    return tf.nn.conv2d(x, filter, strides=[1, 1, 1, 1], padding='SAME')

def maxpool(x):
    return tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
weights0 = variables([3,3,1,16])
l0 = tf.nn.relu(layerConv(input,weights0))
l0 = maxpool(l0)
weights1 = variables([3,3,16,32])
l1 = tf.nn.relu(layerConv(l0,weights1))
l1 = maxpool(l1)
weights2 = variables([3,3,32,64])
l2 = tf.nn.relu(layerConv(l1,weights2))
l2 = maxpool(l2)
l3 = tf.reshape(l2,[-1,64*32*32])
syn0 = variables([64*32*32,1024])
bias0 = variables([1024])
l4 = tf.nn.relu(tf.matmul(l3,syn0) + bias0)
l4 = tf.layers.dropout(inputs=l4, rate=0.4)
syn1 = variables([1024,10])
bias1 = variables([10])
output_pred = tf.nn.softmax(tf.matmul(l4,syn1) + bias1)
error = tf.square(tf.subtract(output_pred,output),name='error')
loss = tf.reduce_sum(error, name='cost')
#TRAINING
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)
The input to the neural network is a normalized grayscale image of 256*256 pixels.
The learning rate is 0.1 and the batch size is 32.
Thank you in advance!
Essentially, what ReLU is:
def relu(vector):
    vector[vector < 0] = 0
    return vector
and softmax:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
The output of softmax being a one-hot encoded array means there is a problem, and that could be many things.
You can try reducing the learning_rate for starters; try 1e-4 or 1e-3 and check. If that doesn't work, try adding some regularization. I am also skeptical about your weight initialization.
Regularization: This is a form of regression that constrains/regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. - Regularization in ML
Link to : Build a multilayer neural network with L2 regularization in tensorflow
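As a rough sketch of the idea from that link, continuing from the question's own snippet (the penalty coefficient beta below is an illustrative assumption, and the convolutional kernels weights0..weights2 could be added to the penalty the same way):

beta = 1e-3  # illustrative penalty strength, to be tuned
l2_penalty = tf.nn.l2_loss(syn0) + tf.nn.l2_loss(syn1)  # dense-layer weights
loss = tf.reduce_sum(error, name='cost') + beta * l2_penalty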
The problem you have is your weight initialization. NNs pose highly complicated, non-convex optimization problems; therefore, a good init is paramount to getting any good results. If you use ReLUs, you should use the initialization proposed by He et al. (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf?spm=5176.100239.blogcont55892.28.pm8zm1&file=He_Delving_Deep_into_ICCV_2015_paper.pdf).
In essence, the weights of your network should be initialized with i.i.d. Gaussian-distributed values with mean 0 and standard deviation as follows:
stddev = sqrt(2 / Nr_input_neurons)
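A hedged sketch of what that could look like with the question's TF1-style code; he_variables and its fan_in argument are hypothetical helpers, not part of the original post:

import numpy as np
import tensorflow as tf

def he_variables(shape, fan_in):
    # He init: zero-mean Gaussian with stddev = sqrt(2 / fan_in)
    stddev = np.sqrt(2.0 / fan_in)
    return tf.Variable(tf.random_normal(shape, mean=0.0, stddev=stddev))

# fan_in of a conv kernel [k, k, in_ch, out_ch] is k * k * in_ch
weights0 = he_variables([3, 3, 1, 16], fan_in=3 * 3 * 1)
# fan_in of a dense layer [n_in, n_out] is n_in
syn0 = he_variables([64 * 32 * 32, 1024], fan_in=64 * 32 * 32)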
I'm learning logistic regression in Python and have managed to fit a model to my existing data (stock market data), and the predictions produce a nice result.
But I do not know how to turn that into a predictive model I can apply to future data. That is, is there a y = ax + b equation I can use on future samples? How do I use the 'model'? How does one use the prediction for subsequent data? Or am I off track here: is logistic regression not applied in this manner?
When you train the logistic regression, you learn the parameters a and b in y = ax + b. So, after training, a and b are known and can be used to evaluate y = ax + b on new data.
I don't know which Python packages you used to train your model or how many classes you have, but if it were, say, numpy with 2 classes, the prediction function could look like this:
import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z.

    Arguments:
    z: a scalar or numpy array

    Return:
    s: sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic
    regression parameters (w, b).

    Arguments:
    w: weights
    b: bias
    X: data to predict

    Returns:
    Y_pred: a numpy array (vector) containing all predictions (0/1)
            for the examples in X
    '''
    m = X.shape[1]  # number of instances in X
    Y_pred = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Apply the same activation function which you applied during
    # training, in this case a sigmoid
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions Y_pred[0,i]
        if A[0, i] > 0.5:
            Y_pred[0, i] = 1
        else:
            Y_pred[0, i] = 0

    return Y_pred
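For illustration, a hypothetical call to this function; the values of w and b below are placeholders standing in for whatever your training produced:

# placeholder parameters; in practice these come from training
w = np.array([[0.5], [-1.2]])
b = 0.3

# two new samples with 2 features each, laid out as columns
# (shape (n_features, n_instances)), as predict() expects
X_new = np.array([[1.0, 0.2],
                  [0.5, 2.0]])

print(predict(w, b, X_new))  # [[1. 0.]]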
I am trying to implement XOR in a neural network with the topology of 2 inputs, 1 element in the hidden layer, and 1 output. But the learning rate is really bad (0.5). I think it is because I am missing a connection between the inputs AND the outputs, but I am not really sure how to do it. I have already made the bias connection so that the learning is better. I am only using NumPy.
import numpy as np

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    return output*(1-output)

a = 0.1
X = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])
np.random.seed(1)
y = np.array([[0],
              [1],
              [1],
              [0]])

bias = np.ones(4)
X = np.c_[bias, X]

synapse_0 = 2*np.random.random((3,1)) - 1
synapse_1 = 2*np.random.random((1,1)) - 1

for j in range(600000):
    layer_0 = X
    layer_1 = sigmoid(np.dot(layer_0, synapse_0))
    layer_2 = sigmoid(np.dot(layer_1, synapse_1))

    layer_2_error = layer_2 - y
    if (j % 10000) == 0:
        print("Error after "+str(j)+" iterations:" + str(np.mean(np.abs(layer_2_error))))

    layer_2_delta = layer_2_error*sigmoid_output_to_derivative(layer_2)
    layer_1_error = layer_2_delta.dot(synapse_1.T)
    layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)

    synapse_1 -= a * (layer_1.T.dot(layer_2_delta))
    synapse_0 -= a * (layer_0.T.dot(layer_1_delta))
You need to be careful with statements like
the learning rate is bad
Usually, the learning rate is the step size that gradient descent takes in the direction of the negative gradient, so I'm not sure what you mean by a bad learning rate.
I'm also not sure whether I understand your code correctly, but the forward step of a neural net is basically a matrix multiplication of the weight matrix for the hidden layer with the input vector. This will (if you set everything up correctly) result in a matrix whose size equals that of your hidden layer. Now, you can simply add the bias before applying your logistic function elementwise to this matrix.
h_i = f(h_i+bias_in)
Afterwards you can do the same thing for the hidden layer times the output weights and apply its activation to get the outputs.
o_j = f(o_j+bias_h)
The backward step is to calculate the deltas at the output and hidden layers, including another elementwise operation with your function
sigmoid_output_to_derivative(output)
and to update both weight matrices using the gradients (here the learning rate is needed to define the step size). The gradients are simply the value of a corresponding node times its delta.
Note: The deltas are differently calculated for output and hidden nodes.
I'd advise you to keep separate variables for the biases, because modern approaches usually update them by summing up the deltas of their connected nodes times a (possibly different) learning rate and subtracting that product from the specific bias. A minimal sketch of that recipe is shown below.
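In this sketch, the hidden size (4 units instead of the question's 1), learning rate, and iteration count are illustrative choices, not a prescription:

import numpy as np

def sigmoid(x):
    return 1/(1 + np.exp(-x))

X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

np.random.seed(1)
W0 = 2*np.random.random((2, 4)) - 1   # input -> hidden (4 units)
W1 = 2*np.random.random((4, 1)) - 1   # hidden -> output
b0 = np.zeros((1, 4))                 # separate bias variables
b1 = np.zeros((1, 1))
lr = 0.5                              # illustrative step size

for j in range(20000):
    # forward: matrix multiply, add bias, apply the logistic elementwise
    h = sigmoid(np.dot(X, W0) + b0)
    o = sigmoid(np.dot(h, W1) + b1)

    # backward: deltas are computed differently for output and hidden nodes
    o_delta = (o - y) * o * (1 - o)
    h_delta = np.dot(o_delta, W1.T) * h * (1 - h)

    # gradient = node value times delta; biases use the summed deltas
    W1 -= lr * np.dot(h.T, o_delta)
    b1 -= lr * o_delta.sum(axis=0, keepdims=True)
    W0 -= lr * np.dot(X.T, h_delta)
    b0 -= lr * h_delta.sum(axis=0, keepdims=True)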
Take a look at the following tutorial (it uses numpy):
http://peterroelants.github.io/posts/neural_network_implementation_part04/
I am new to neural networks, and I am using an example neural network I found online to attempt to approximate the sphere function (the sum of the squares of a set of numbers) using backpropagation.
The initial code is:
from numpy import exp, array, dot, random

class NeuralNetwork():
    def __init__(self):
        # Seed the random number generator, so it generates the same numbers
        # every time the program is run.
        # random.seed(1)

        # Model a single neuron, with 2 input connections and 1 output connection.
        # We assign random weights to a 2 x 1 matrix, with values in the range
        # -1 to 1 and mean 0.
        self.synaptic_weights = 2 * random.random((2, 1)) - 1

    # The Sigmoid function, which describes an S shaped curve.
    # We pass the weighted sum of the inputs through this function to
    # normalise them between 0 and 1.
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))

    # The derivative of the sigmoid function.
    # This is the gradient of the sigmoid curve.
    # It indicates how confident we are about the existing weight.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # Train the network through a process of trial and error,
    # adjusting the synaptic weights each time.
    def train(self, training_set_inputs, training_set_outputs, number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            # Pass the training set through our neural network (a single neuron).
            output = self.think(training_set_inputs)

            # Calculate the error (difference between the desired output
            # and the predicted output).
            error = training_set_outputs - output

            # Multiply the error by the input and again by the gradient of
            # the sigmoid curve. This means less confident weights are
            # adjusted more, and inputs which are zero do not cause changes
            # to the weights.
            adjustment = dot(training_set_inputs.T, error * self.__sigmoid_derivative(output))

            # Adjust the weights.
            self.synaptic_weights += adjustment

    # The neural network thinks.
    def think(self, inputs):
        # Pass inputs through our neural network (our single neuron).
        return self.__sigmoid(dot(inputs, self.synaptic_weights))

if __name__ == "__main__":
    # Initialise a single-neuron neural network.
    neural_network = NeuralNetwork()

    print("Random starting synaptic weights: ")
    print(neural_network.synaptic_weights)

    # The training set: 3 examples, each consisting of 2 input values
    # and 1 output value.
    training_set_inputs = array([[0, 1], [1, 0], [0, 0]])
    training_set_outputs = array([[1, 1, 0]]).T

    # Train the neural network using the training set.
    # Do it 10,000 times and make small adjustments each time.
    neural_network.train(training_set_inputs, training_set_outputs, 10000)

    print("New synaptic weights after training: ")
    print(neural_network.synaptic_weights)

    # Test the neural network with a new situation.
    print("Considering new situation [1, 1] -> ?: ")
    print(neural_network.think(array([1, 1])))
My aim is to input training data (sphere function inputs and outputs) into the neural network to train it and meaningfully adjust the weights. After continuous training, the weights should reach a point where reasonably accurate results are produced for the training inputs.
I imagine an example of some training sets for the sphere function would be something like:
training_set_inputs = array([[2, 1], [3, 2], [4, 6], [8, 3]])
training_set_outputs = array([[5, 13, 52, 73]]).T
The example I found online can successfully approximate the XOR operation, but when given sphere function inputs it only gives me an output of 1 when tested on a new example (for example, [6, 7], which should ideally return an approximation around 85).
From what I have read about neural networks, I suspect this is because I need to normalize the inputs, but I am not entirely sure how to do this. Any help on this, or something to point me in the right direction, would be appreciated a lot. Thank you.
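(For reference, a minimal min-max scaling sketch of the kind of normalization meant here; scaling the targets into (0, 1) for a sigmoid output is an added assumption, and a single sigmoid neuron remains a poor fit for an unbounded quadratic target in any case.)

import numpy as np

training_set_inputs = np.array([[2, 1], [3, 2], [4, 6], [8, 3]], dtype=float)
training_set_outputs = np.array([[5, 13, 52, 73]], dtype=float).T

# Scale inputs to [0, 1] per feature, remembering the ranges so that
# new samples (e.g. [6, 7]) can be scaled the same way at test time.
x_min = training_set_inputs.min(axis=0)
x_max = training_set_inputs.max(axis=0)
inputs_scaled = (training_set_inputs - x_min) / (x_max - x_min)

# A sigmoid output can only produce values in (0, 1), so the targets
# must be scaled too, and predictions scaled back afterwards.
y_max = training_set_outputs.max()
outputs_scaled = training_set_outputs / y_max

new_sample_scaled = (np.array([6, 7]) - x_min) / (x_max - x_min)
# prediction = neural_network.think(new_sample_scaled) * y_max  # undo the scaling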
This is an example of using an Elman recurrent neural network from the Neurolab Python library:
import neurolab as nl
import numpy as np
# Create train samples
i1 = np.sin(np.arange(0, 20))
i2 = np.sin(np.arange(0, 20)) * 2
t1 = np.ones([1, 20])
t2 = np.ones([1, 20]) * 2
input = np.array([i1, i2, i1, i2]).reshape(20 * 4, 1)
target = np.array([t1, t2, t1, t2]).reshape(20 * 4, 1)
# Create network with 2 layers
net = nl.net.newelm([[-2, 2]], [10, 1], [nl.trans.TanSig(), nl.trans.PureLin()])
# Set initialized functions and init
net.layers[0].initf = nl.init.InitRand([-0.1, 0.1], 'wb')
net.layers[1].initf= nl.init.InitRand([-0.1, 0.1], 'wb')
net.init()
# Train network
error = net.train(input, target, epochs=500, show=100, goal=0.01)
# Simulate network
output = net.sim(input)
# Plot result
import pylab as pl
pl.subplot(211)
pl.plot(error)
pl.xlabel('Epoch number')
pl.ylabel('Train error (default MSE)')
pl.subplot(212)
pl.plot(target.reshape(80))
pl.plot(output.reshape(80))
pl.legend(['train target', 'net output'])
pl.show()
In this example, it merges two length-20 inputs (and likewise two length-20 outputs) into single arrays, and then it trains the network with these merged arrays.
First of all, this doesn't seem to match the schema that I got from here:
My main question is:
I have to train the network with arbitrary length of inputs and outputs like these:
Arbitrary length inputs to fixed length outputs
Fixed length inputs to arbitrary length outputs
Arbitrary length inputs to arbitrary length outputs
At this point, this will come to your mind: "Your answer is long short-term memory (LSTM) networks."
And I know that, but Neurolab is easy to use because of its good features. In particular, it is exceptionally Pythonic. So I'm insisting on using the Neurolab library for my problem. But if you can suggest another library like Neurolab with better LSTM functionality, I will accept it.
Finally, how can I rearrange this example for arbitrary lengths of inputs and outputs?
I don't have the best understanding of RNNs and LSTMs, so please be explanatory.
After a long time, looking at this question of mine today, I can see it was a question from a person with a lack of understanding of neural networks.
Matrix multiplication is the basic math at the heart of neural networks. You cannot simply change the shape of the input matrix, because that changes the shape of the product and breaks consistency across the dataset.
Neural networks are always trained with a fixed length of input and output. Here is a very simple neural network implementation that uses nothing but numpy's dot product to feed forward:
import numpy as np

# sigmoid function
def nonlin(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])

# output dataset
y = np.array([[0,0,1,1]]).T

# seed random numbers to make the calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for iter in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))

    # how much did we miss?
    l1_error = y - l1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1, True)

    # update weights
    syn0 += np.dot(l0.T, l1_delta)

print("Output After Training:")
print(l1)
credit: http://iamtrask.github.io/2015/07/12/basic-python-network/