I am trying to build a neural network from scratch, using only numpy. I have the following code and functions. However, the output after training does not match the expected output that I have (using XOR as an example). I think one of my functions is incorrect, but I cannot figure out the mistake. The output I get is, for example: [[0.73105858], [0.53336314], [0.79343002], [0.5786911]], which is not close to the expected output [0, 0, 0, 1].
I don't see any issues with your code, but here are some things you should keep in mind:
Your neural network is trained for 2 iterations, with a learning rate of 0.01. This means that your network is only updated 2 times, with a small rate of improvement, resulting in an undertrained neural network. Also, you're always using a tensor of size 4*4 as input, meaning that the neural network is only updated with the average over all samples, hence the result that just looks like an average.
For improvement, my suggestion would be to increase the number of iterations and also the number of samples per iteration, making sure that each iteration performs more than one update. Still, I believe you won't get 100% accurate results, since you are only using one linear layer for XOR, which can't be solved with just one linear layer. You could consider adding another (hidden) layer for better results.
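For example, a minimal two-layer (one hidden layer) numpy version of the idea could look like the sketch below. I'm not using your exact function or variable names, so treat it as an illustration of the fixes (more iterations, a larger learning rate, an extra layer) rather than a drop-in replacement:

```python
import numpy as np

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 0.5                       # much larger than 0.01

for _ in range(10000):         # far more than 2 iterations
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass (squared-error loss, sigmoid derivative out * (1 - out))
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))            # should approach [[0], [1], [1], [0]]
```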
I've coded a simple neural network for XOR in Python. While there is loads of information online about how to program this, there isn't much on how to feed the data through it. I've tested the change in weights after one cycle for inputs [1,1] to compare my results with my lecture slides, and it's 100% the same, so I believe the code works. I can train the network for that same input, but when I change the input (and corresponding target) every cycle, the error doesn't go down.
Should I change the weights after every cycle, or should I run through all the possible inputs first, get an average error, and then change the weights? (But the weight changes depend on the output, so which output would I use then?)
I can share my code, if needed, but I'm pretty certain it's correct.
Could you please give me some advice? Thank you in advance.
So, you're saying you implemented a neural network on your own?
Well, in that case, basically each neuron in the input layer must be assigned a feature of a certain row; then just iterate through each layer and each neuron in that layer and calculate as instructed.
I'm sure you are familiar with the back-propagation algorithm, so you'll know when to stop.
Once you're done with that row, do the same for the next row: assign each feature to each of the input neurons and start the iterations again.
Once you're done with all records, that's an epoch.
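In rough Python pseudocode (forward() and update_weights() are just placeholders for your own functions), that loop looks something like this:

```python
# pseudocode: forward() and update_weights() stand in for your own functions
for epoch in range(num_epochs):
    for features, target in zip(inputs, targets):  # one row (record) at a time
        output = forward(features)                 # assign the row's features to the input neurons
        update_weights(output, target)             # back-propagate and update immediately
    # one full pass over all records = one epoch
```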
I hope that answers your question.
Also, I would recommend that you try out Keras; it's easy to use and a good tool to be experienced with.
I'm implementing a residual CNN (a modified, smaller version of Xception) in a low-latency environment. I've done a lot of manual tuning to minimize the run time of my network (reducing the number of filters, removing layers, etc.).
But now I want to try allowing my network to make its classification prediction (the final FCNN layer) on the residual connection after each residual block.
Basic logic:
attempt a final prediction with the residual connection as input
if this FCNN layer predicts a certain class with a probability > a set threshold:
    return the FCNN output as if it were the normal final layer
else:
    do the next residual block as normal and try the previous conditional again, unless we are already at the final block
My hope is that this will allow my network to learn to solve easier problems with less computation, while still letting it run the additional layers if it is unsure of the classification.
So my basic question is: in PyTorch, what's the best way to implement this conditional so that my network can decide at run time whether to do more processing or not?
Currently I've tested returning the intermediate x's after the blocks in the forward function, but I don't know how best to set up the conditional to choose which x to return.
Also, a side note: I believe I may end up needing another CNN layer between the residual and the FCNN to serve as a function converting the internal representation used for processing into a representation the FCNN understands for classification.
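To make this concrete, something along these lines is roughly the structure I'm picturing (the module names such as self.blocks and self.exit_heads are just placeholders, and I'm assuming a batch size of 1 at inference so the confidence check is a single scalar):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, blocks, exit_heads, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)          # residual blocks
        self.exit_heads = nn.ModuleList(exit_heads)  # one small classifier head per block
        self.threshold = threshold

    def forward(self, x):
        logits = None
        for block, head in zip(self.blocks, self.exit_heads):
            x = block(x)
            logits = head(x)                 # attempt a prediction on this block's output
            probs = F.softmax(logits, dim=1)
            # at inference time only, stop early if the prediction is confident enough
            if not self.training and probs.max() > self.threshold:
                return logits
        return logits                        # otherwise, the final block's prediction

# usage sketch (blocks, heads and the input are placeholders):
# net = EarlyExitNet(blocks=[...], exit_heads=[...], threshold=0.9)
# net.eval()
# with torch.no_grad():
#     prediction = net(single_image_batch)
```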
It has already been done and presented at ICLR 2018.
It appears as if in ResNets the first few bottlenecks learn representations (and therefore cannot be skipped), while the remaining bottlenecks refine the features and can therefore be skipped at a moderate loss of accuracy (Stanisław Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio, "Residual Connections Encourage Iterative Inference", ICLR 2018).
This idea was taken to the extreme, with weights shared across bottlenecks, in Sam Leroux, Pavlo Molchanov, Pieter Simoens, Bart Dhoedt, Thomas Breuel, Jan Kautz, "IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification", ICLR 2018.
For several days now, I have been trying to build a simple sine-wave sequence generator using an LSTM, without any glimpse of success so far.
I started from the time sequence prediction example.
All I wanted to do differently is:
Use a different optimizer (e.g. RMSprop) instead of LBFGS
Try different signals (more sine-wave components)
This is the link to my code. "experiment.py" is the main file
What I do is:
I generate artificial time-series data (sine waves)
I cut those time-series data into small sequences
The input to my model is a sequence of time 0...T, and the output is a sequence of time 1...T+1
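Roughly, the data preparation looks like this (a simplified sketch with made-up lengths, not my exact code):

```python
import numpy as np

T = 50                                                # length of each input sequence (made-up value)
t = np.arange(0, 1000)
signal = np.sin(0.02 * t) + 0.5 * np.sin(0.05 * t)    # sine wave with more than one component

# cut the long series into overlapping windows of length T + 1
windows = np.stack([signal[i:i + T + 1] for i in range(len(signal) - T - 1)])
inputs = windows[:, :-1]    # time 0 ... T
targets = windows[:, 1:]    # time 1 ... T+1
```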
What happens is:
The training and validation losses go down smoothly
The test loss is very low
However, when I try to generate arbitrary-length sequences, starting from a seed (a random sequence from the test data), everything goes wrong. The output always flattens out.
I simply don't see what the problem is. I have been playing with this for a week now, with no progress in sight.
I would be very grateful for any help.
Thank you
This is normal behaviour and happens because your network is too confident in the quality of the input and doesn't learn to rely enough on the past (on its internal state), relying solely on the input. When you apply the network to its own output in the generation setting, the input to the network is not as reliable as it was in the training or validation case, where it got the true input.
I have two possible solutions for you:
The first is the simplest but least intuitive one: add a little bit of Gaussian noise to your input. This will force the network to rely more on its hidden state.
The second is the most obvious solution: during training, feed it not the true input but its generated output, with a certain probability p. Start training with p = 0 and gradually increase it so that it learns to generate longer and longer sequences independently. This is called scheduled sampling, and you can read more about it here: https://arxiv.org/abs/1506.03099.
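A rough sketch of what one such training step could look like in PyTorch follows; the names (model, criterion, optimizer) are placeholders, and it assumes the model takes one time step plus a hidden state and returns the next-step prediction plus the new hidden state:

```python
import random
import torch

def train_step(model, sequence, optimizer, criterion, p):
    # one scheduled-sampling training step over a single sequence (placeholder interfaces)
    hidden = None
    prev_pred = None
    loss = 0.0
    for t in range(len(sequence) - 1):
        if prev_pred is not None and random.random() < p:
            inp = prev_pred.detach()          # feed the model's own previous output
        else:
            inp = sequence[t]                 # feed the true input (teacher forcing)
        pred, hidden = model(inp, hidden)     # assumed interface: (input, hidden) -> (pred, hidden)
        loss = loss + criterion(pred, sequence[t + 1])
        prev_pred = pred
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# start training with p = 0 and gradually increase it towards 1 over the epochs
```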
Background: I'm writing, in Python, a three-layer neural network using mini-batch stochastic gradient descent, specifically designed to distinguish between the three classes of iris plants in the famous iris data set. The input layer has four neurons, one for each feature in the data. The hidden layer has 3 neurons (though the code allows the number of hidden-layer neurons to vary) and the output layer has three neurons (one for each species). All neurons use sigmoid activation functions.
Problem: The loss (mean-squared error) generally decreases over time, but the accuracy (usually below 55.55%, or even 33.33%) is stagnant. I've tried experimenting with different epoch numbers and learning rates, but nothing worked. Interestingly, more often than not, the outputs of the algorithm remain fixed no matter what the input values are. I'm fairly certain of my math, since the loss seems to be decreasing as the number of epochs increases.
To replicate the problem: just run the Python code and observe the LEARNING_RESULTS.txt file. (Make sure the iris.txt file in the repo is in the same directory.)
Question: How can I improve performance for this neural network?
Link to GitHub repo: https://github.com/kwonkyo/neural-networks
Thanks!
UPDATE: Problem solved. I was adding a constant value (the scalar sum over all entries of the mini-batch gradient matrices) to the weight and bias matrices, instead of the element-wise sum of the mini-batch gradient matrices. Updated code has been pushed to GitHub.
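In other words, the update needs to combine the per-example gradient matrices element-wise rather than as a single number. A simplified numpy sketch of the difference (placeholder shapes, not my actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                                   # placeholder weight matrix
weight_grads = [rng.normal(size=(4, 3)) for _ in range(10)]   # per-example gradients in one mini-batch
learning_rate = 0.1

# Wrong (what I had): collapsing the gradients to one scalar and adding it to every entry of W
# W -= learning_rate * sum(g.sum() for g in weight_grads)

# Right: combine the gradient matrices element-wise, then update the weight matrix
W -= learning_rate * np.sum(weight_grads, axis=0) / len(weight_grads)
```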
I have a neural network with one input, three hidden neurons and one output. I have 720 input and corresponding target values, 540 for training, 180 for testing.
When I train my network using the logistic sigmoid or tan sigmoid function, I get the same outputs while testing, i.e. I get the same number for all 180 output values. When I use a linear activation function, I get NaN, because apparently the values get too high.
Is there any activation function to use in such a case? Or any improvements to be done? I can update the question with details and code if required.
Neural nets are not stable when fed input data on arbitrary scales (such as between approximately 0 and 1000 in your case). If your output units are tanh, they can't even predict values outside the range -1 to 1 (or 0 to 1 for logistic units)!
You should try recentering/scaling the data (making it have mean zero and unit variance - this is called standard scaling in the data science community). Since it is a lossless transformation, you can revert back to your original scale once you've trained the net and predicted on the data.
Additionally, a linear output unit is probably the best choice, as it makes no assumptions about the output space, and I've found tanh units to do much better on recurrent neural networks with low-dimensional input/hidden/output layers.
Newmu is right that the scaling is probably the issue here; you need to scale your inputs to lie in the valid range. (Standardization to zero mean and unit variance, as they suggest, isn't a great choice, though, since it means about a third of your data will lie outside [-1, 1]....) I don't know about pybrain, but in scikit-learn you'd want sklearn.preprocessing.MinMaxScaler.
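For example (the array names and shapes here are placeholders for your data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.linspace(0, 1000, 720).reshape(-1, 1)    # placeholder inputs on a 0..1000 scale
y = np.random.rand(720, 1) * 300                # placeholder targets

x_scaler = MinMaxScaler(feature_range=(0, 1)).fit(X)
y_scaler = MinMaxScaler(feature_range=(0, 1)).fit(y)

X_scaled = x_scaler.transform(X)                # train the network on these
y_scaled = y_scaler.transform(y)

# after training, map the network's predictions back to the original target scale:
# predictions = y_scaler.inverse_transform(net_output_scaled)
```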
But, also, in the comments you said your dataset looks like this:
where the horizontal axis is the inputs and the vertical axis is the targets. So, when you see an input of 200, you have one training example saying the target is 80 and one saying it's 320; what do you want it to say then? An "optimal" neural network (which may be hard to achieve) would predict 200 or so.
You may need to think about how to reframe your learning problem to be a more-consistent function from inputs to targets.