For several days now, I am trying to build a simple sine-wave sequence generation using LSTM, without any glimpse of success so far.
I started from the time sequence prediction example
All what I wanted to do differently is:
Use different optimizers (e.g RMSprob) than LBFGS
Try different signals (more sine-wave components)
This is the link to my code. "experiment.py" is the main file
What I do is:
I generate artificial time-series data (sine waves)
I cut those time-series data into small sequences
The input to my model is a sequence of time 0...T, and the output is a sequence of time 1...T+1
What happens is:
The training and the validation losses goes down smoothly
The test loss is very low
However, when I try to generate arbitrary-length sequences, starting from a seed (a random sequence from the test data), everything goes wrong. The output always flats out
I simply don't see what the problem is. I am playing with this for a week now, with no progress in sight.
I would be very grateful for any help.
Thank you
This is normal behaviour and happens because your network is too confident of the quality of the input and doesn't learn to rely on the past (on it's internal state) enough, relying soley on the input. When you apply the network to its own output in the generation setting, the input to the network is not as reliable as it was in the training or validation case where it got the true input.
I have two possible solutions for you:
The first is the simplest but less intuitive one: Add a little bit of Gaussian noise to your input. This will force the network to rely more on its hidden state.
The second, is the most obvious solution: during training, feed it not the true input but its generated output with a certain probability p. Start out training with p=0 and gradually increase it so that it learns to general longer and longer sequences, independently. This is called schedualed sampling, and you can read more about it here: https://arxiv.org/abs/1506.03099 .
Related
I am trying to build a Neural Network from scratch, using only numpy. I have the following code and functions. However, the output after the training is not matching the expected output that i have (using XOR as an example). I think one of my functions is not correct but cannot figure out the mistake. The output I get is, for example: [[0.73105858], [0.53336314],[0.79343002],[0.5786911 ]], which is not close to the expected output [0,0,0,1]
I don't so any issues with your code, but here are some thing you should have in mind:
Your neural network is trained for 2 iterations, with a learning rate of 0.01. This means that your network is only updated 2 times with a small rate of improvement resulting in an undertrained neural network. Also, your always using a tensor of the size 4*4 for input, meaning that the neural network is only updated for the average of all samples, hence the result that just seems like an average.
For improvement, my suggestion would be to increase the number of iterations and also increase the number of samples for each iterations, also making sure that each iteration has more than one update. Still, i believe that you won't get 100% accurate results, since you are only using one linear layer for XOR, which can't be solved with just one linear system. You could consider adding another layer for better results.
I'm new on Neural Networks and I am doing a project that has to define a NN and train it. I've defined a NN of 2 hidden layers with 17 inputs and 17 output. The NN has 21 inputs and 3 outputs.
I have a data set of labels of 10 million, and a dataset of samples of another 10 million. My first issue is about the size of the validation set and the training set. I'm using PyTorch and batches, and of what I've read, the batches shouldn't be larger. But I don't know how many approximately should be the size of the sets.
I've tried with larger and small numbers, but I cannot find a correlation that shows me if I'm right choosing a large set o small set in one of them (apart from the time that requires to process a very large set).
My second issue is about the Training and Validation loss, which I've read that can tell me if I'm overfitting or underfitting depending on if it is bigger or smaller. The perfect should be the same value for both, and it also depends on the epochs. But I am not able to tune the network parameters like batch size, learning rate or choosing how much data should I use in the training and validation. If 80% of the set (8 million), it takes hours to finish it, and I'm afraid that if I choose a smaller dataset, it won't learn.
If anything is badly explained, please feel free to ask me for more information. As I said, the data is given, and I only have to define the network and train it with PyTorch.
Thanks!
For your first question about batch size, there is no fix rule for what value should it have. You have to try and see which one works best. When your NN starts performing badly don't go above or below that value for batch size. There is no hard rule here to follow.
For your second question, first of all, having training and validation loss same doesn't mean your NN is performing nicely, it is just an indication that its performance will be good enough on a test set if the above is the case, but it largely depends on many other things like your train and test set distribution.
And with NN you need to try as many things you can try. Try different parameter values, train and validation split size, etc. You cannot just assume that it won't work.
I had implemented a CNN with 3 Convolutional layers with Maxpooling and dropout after each layer
I had noticed that when I trained the model for the first time it gave me 88% as testing accuracy but after retraining it for the second time successively, with the same training dataset it gave me 92% as testing accuracy.
I could not understand this behavior, is it possible that the model had overfitting in the second training process?
Thank you in advance for any help!
It is quite possible if you have not provided the seed number set.seed( ) in the R language or tf.random.set_seed(any_no.) in python
Well I am no expert when it comes to machine learning but I do know the math behind it. What you are doing when you train a neural network you basicly find the local minima to the loss function. What this means is that the end result will heavily depend on the initial guess of all of the internal varaibles.
Usually the variables are randomized as a initial estimation and you could therefore reach quite different results from running the training process multiple times.
That being said, from when I studied the subject I was told that you usually reach similar regardless of the initial guess of the parameters. However it is hard to say if 0.88 and 0.92 would be considered similar or not.
Hope this gives a somewhat possible answer to your question.
As mentioned in another answer, you could remove the randomization, both in the parameter initialization of the parameters and the randomization of the data used for each epoch of training by introducing a seed. This would insure that when you run it twice, everything will get "randomized" in the exact same order. In tensorflow this is done using for example tf.random.set_seed(1), the number 1 can be changed to any number to get a new seed.
I've coded a simple neural network for XOR in python. While there is loads of information online about how to program this, there isn't much on how to feed the data through it. I've tested the change in weights after one cycle for inputs [1,1] to compare my results with my lecture slides and it's 100% the same, so I believe the code works. I can train the network for that same input, but when I change the input (and corresponding target) every cycle the error doesn't go down.
Should I allow changing the weights and inputs after every cycle or should I run through all the possible inputs first, get an average error and then change the weights? (But changing weights are dependent on the output, so what output would I use then)
I can share my code, if needed, but I'm pretty certain it's correct.
Please give me some advice? Thank you in advance.
So, you're saying you implemented a neural network on your own ?
well in this case, basically each neuron on the input layer must be assigned with a feature of a certain row, than just iterate through each layer and each neuron in that layer and calculate as instructed.
I'm sure you are familiar with the back-propagation algorithm so you'll know when to stop.
once you're done with that row, do it again to the next row, assign each feature to each of the input neurons and start the iterations again.
once youre done with all records, thats an Epoch.
I hope that answers your question.
also, I would recommend you to try out Keras, its easy to use and a good tool to be experienced in.
I'm trying to build a multilabel-classifier to predict the probabilities of some input data being either 0 or 1. I'm using a neural network and Tensorflow + Keras (maybe a CNN later).
The problem is the following:
The data is highly skewed. There are a lot more negative examples than positive maybe 90:10. So my neural network nearly always outputs very low probabilities for positive examples. Using binary numbers it would predict 0 in most of the cases.
The performance is > 95% for nearly all classes, but this is due to the fact that it nearly always predicts zero...
Therefore the number of false negatives is very high.
Some suggestions how to fix this?
Here are the ideas I considered so far:
Punishing false negatives more with a customized loss function (my first attempt failed). Similar to class weighting positive examples inside a class more than negative ones. This is similar to class weights but within a class.
How would you implement this in Keras?
Oversampling positive examples by cloning them and then overfitting the neural network such that positive and negative examples are balanced.
Thanks in advance!
You're on the right track.
Usually, you would either balance your data set before training, i.e. reducing the over-represented class or generate artificial (augmented) data for the under-represented class to boost its occurrence.
Reduce over-represented class
This one is simpler, you would just randomly pick as many samples as there are in the under-represented class, discard the rest and train with the new subset. The disadvantage of course is that you're losing some learning potential, depending on how complex (how many features) your task has.
Augment data
Depending on the kind of data you're working with, you can "augment" data. That just means that you take existing samples from your data and slightly modify them and use them as additional samples. This works very well with image data, sound data. You could flip/rotate, scale, add-noise, in-/decrease brightness, scale, crop etc.
The important thing here is that you stay within bounds of what could happen in the real world. If for example you want to recognize a "70mph speed limit" sign, well, flipping it doesn't make sense, you will never encounter an actual flipped 70mph sign. If you want to recognize a flower, flipping or rotating it is permissible. Same for sound, changing volume / frequency slighty won't matter much. But reversing the audio track changes its "meaning" and you won't have to recognize backwards spoken words in the real world.
Now if you have to augment tabular data like sales data, metadata, etc... that's much trickier as you have to be careful not to implicitly feed your own assumptions into the model.
I think your two suggestions are already quite good.
You can also simply undersample the negativ class, of course.
def balance_occurences(dataframe, zielspalte=target_name, faktor=1):
least_frequent_observation=dataframe[zielspalte].value_counts().idxmin()
bottleneck=len(dataframe[dataframe[zielspalte]==least_frequent_observation])
balanced_indices=dataframe.index[dataframe[zielspalte]==least_frequent_observation].tolist()
for value in (set(dataframe[zielspalte])-{least_frequent_observation}):
full_list=dataframe.index[dataframe[zielspalte]==value].tolist()
selection=np.random.choice(a=full_list,size=bottleneck*faktor, replace=False)
balanced_indices=np.append(balanced_indices,selection)
df_balanced=dataframe[dataframe.index.isin(balanced_indices)]
return df_balanced
Your loss function could look into the recall of the positive class combined with some other measurement.