I'm trying to predict a time-series signal from a number of other time-series signals. For that purpose I'm using an LSTM network. The input signals are normalized, and so are the output signals. I'm using MSE loss and implementing the model in TensorFlow.
The network gives fairly good predictions, but the output is very noisy. I want to make it smoother, as if a low-pass filter (LPF) had been applied to the LSTM output.
The ideal solution for me is some parameter I can change that will filter more or fewer frequencies from the LSTM output.
How can I do that? I was thinking about constraining the loss function somehow.
Thanks
I have tried adding fully connected layers after the LSTM, batch normalization, and both single- and multi-layer LSTM networks.
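One way the loss-constraint idea could look (a hedged sketch, not a confirmed fix; the smooth_weight value and the assumption that the model outputs a full sequence along axis 1 are mine) is an MSE loss with an added penalty on consecutive predicted time steps, so that smooth_weight acts as the tunable "how much to filter" parameter:

import tensorflow as tf

# Hedged sketch: MSE plus a smoothness penalty on the predictions.
# Larger smooth_weight suppresses more high-frequency content.
def make_smooth_mse(smooth_weight=0.1):
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        # first difference along the time axis (assumed to be axis 1)
        diff = y_pred[:, 1:] - y_pred[:, :-1]
        smoothness = tf.reduce_mean(tf.square(diff))
        return mse + smooth_weight * smoothness
    return loss

# usage: model.compile(optimizer='adam', loss=make_smooth_mse(0.1))

Raising smooth_weight should smooth the prediction at the cost of some fidelity to the target.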
Related
When training neural networks for classification in TensorFlow/Keras or PyTorch, is it possible to put constraints on the weights in the output layer such that they are chosen from a specific finite feasible set?
For example, let's say W is the weight matrix in the output layer. Is it possible to constrain W such that the optimal W is selected from the set S = {W_1, W_2, ..., W_n}, where each W_i is a given feasible value for W? That is, I would supply the values W_1, ..., W_n to the model.
If this is not possible in TensorFlow or PyTorch, is there any other way to achieve this?
Thanks!
The answer you might be looking for is something similar to setting the weights of a layer using set_weights in TensorFlow.
If that doesn't help your cause, then you can check TensorFlow's Custom Layers.
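For illustration, a minimal sketch of picking one weight matrix from a given set and assigning it with set_weights could look like the following; the toy model, layer name, and candidate matrices are all made up for the example.

import numpy as np
import tensorflow as tf

# Toy model whose output-layer kernel will be overwritten; shapes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax', name='output'),
])

# Hypothetical feasible set S = {W_1, ..., W_n} for the output-layer kernel.
candidates = [np.random.randn(8, 3).astype('float32') for _ in range(5)]

output_layer = model.get_layer('output')
_, bias = output_layer.get_weights()            # keep the existing bias
# e.g. pick the candidate with the best validation loss (selection not shown)
output_layer.set_weights([candidates[0], bias])

The selection of the "optimal" W_i (e.g. by evaluating each candidate on a validation set) still has to be done outside the optimizer, since gradient descent cannot search over a discrete set directly.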
I have designed a convolutional neural network (tf.keras) which has a few parallel convolutional units with different kernel sizes. The outputs of those convolutional layers are fed into further convolutional units, also in parallel, and all of the outputs are then concatenated and flattened. After that I added a fully connected layer connected to the final softmax layer for multi-class classification. I trained it and had good results on the validation set.
However, when I removed the fully connected layer, the accuracy was higher than before.
Can someone please explain how this happens? It would be very helpful.
Thank you for your valuable time.
When you remove a layer, your model will have less chance of over-fitting the training set. Consequently, by making the network shallower, you make your model more robust to unknown examples and the validation accuracy increases.
Since your training accuracy is also increasing, this can be an indication of one of the following:
Exploding or vanishing gradients. You can try solving this problem using careful weight initialization, proper regularization, adding shortcuts, or gradient clipping.
You are not training for enough epochs to learn a deeper network. You can try a few more epochs.
You do not have enough data to train a deeper network.
Eliminating the dense layer reduces the tendency to overfit, so your validation loss should improve as long as your model's training accuracy remains high. Alternatively, you can add a dropout layer. You can also reduce overfitting by using regularizers; see the Keras regularizers documentation.
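As a rough illustration (not the asker's exact architecture), a dense head with a dropout layer and an L2 kernel regularizer might look like this; the layer sizes and the 10-class output are assumptions.

import tensorflow as tf

# Sketch of a regularized classification head.
head = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.5),   # drop 50% of activations during training
    tf.keras.layers.Dense(10, activation='softmax'),
])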
I am working on a classification task which uses byte sequences as samples. A byte sequence can be normalized as input to a neural network by applying x/255 to each byte x. In this way, I trained a simple MLP and the accuracy is about 80%. Then I trained an autoencoder using 'mse' loss on the whole dataset to see whether it works well for the task. I froze the weights of the encoder's layers and added a softmax dense layer to it for classification. I retrained the new model (training only the last layer), and to my surprise the result was much worse than the MLP, merely 60% accuracy.
Can't the autoencoder learn good features from all the data? Why is the result so bad?
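For context, a minimal sketch of the setup described above could look like this; the 256-dimensional input, layer sizes, and 10 classes are illustrative assumptions, not the asker's actual values.

import tensorflow as tf

# Encoder trained as part of an autoencoder (training not shown), then frozen.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(256,)),
    tf.keras.layers.Dense(32, activation='relu'),
])
# ... train `encoder` inside an autoencoder with 'mse' loss, then:
encoder.trainable = False                     # freeze the encoder weights

classifier = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(10, activation='softmax'),
])
classifier.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
# classifier.fit(...) now updates only the final softmax layer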
Possible actions to take:
Check the reconstruction error of the autoencoder: can it actually reproduce its input?
Visualize the autoencoder results (dimensionality reduction): is most of the variance explained with fewer dimensions?
A more complex model does not necessarily outperform a simpler one. Did you plot the validation MSE versus epoch? Does it reach a minimum after a number of steps?
Do you have enough epochs?
What is the number of units in your autoencoder? It may be too small or too large, depending on the behaviour of your data and its volume.
Did you compare with other dimensionality reduction methods such as PCA or NMF?
Last but not least, is an autoencoder really the best way to engineer features for this task?
"Why the result is so bad?" This is not actually a surprise. You've trained one model to be good at compressing the information. The transformations it learns at each layer do not need to be good for any other type of task at all. In fact, it could be throwing away a lot of information that is perfectly helpful for whatever auxiliary classification task you have, but which is not needed for a task purely of compressing and reconstructing the sequence.
Instead of training a separate autoencoder, you might have better luck just adding sparsity penalty terms on the MLP layers to the loss function, or using some other type of regularization such as dropout. Finally, you could consider more advanced network architectures, like ResNet/ODE layers or Inception layers, modified for a 1D sequence.
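A rough sketch of that suggestion, with the input dimension and class count assumed for the example: an MLP whose hidden layer carries an L1 activity penalty (a simple sparsity term added to the loss) plus dropout.

import tensorflow as tf

# Sketch only: the 256-dim input and 10 classes are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(256,),
                          activity_regularizer=tf.keras.regularizers.l1(1e-5)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])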
Say we train a multilayer NN in TensorFlow for a regression task (i.e. a multi-input, multi-output case). Then we get new instances, apply the trained model, and of course obtain the corresponding outputs. Is there a way to backpropagate the outputs and reconstruct the inputs in TensorFlow in an easy/efficient manner? What I am thinking of is using the difference between the original and the reconstructed inputs of the new instances as a QC measure, i.e. if the reconstructed inputs are not close enough to the originals then we have a problem. I hope I am making myself clear.
No, unfortunately you cannot take a trained model and recover the corresponding input. The reason is that there are infinitely many possible inputs for each output.
Furthermore, backpropagation is not about passing an output backwards through the network. It is the process of determining to what extent each parameter in the model contributes to the loss function. It will not give you the inputs to the hidden layers, only the extent to which the weights affected the final decision.
I'm trying to model a technical process (a number of nonlinear equations) with artificial neural networks. The function has a number of inputs and a number of outputs (e.g. 50 inputs, 150 outputs - all floats).
I have tried the python library ffnet (wrapper for a fortran library) with great success. The errors for a certain dataset are well below 0.2%.
It is using a fully connected graph and these additional parameters.
Basic assumptions and limitations:
Network has feed-forward architecture.
Input units have identity activation function, all other units have sigmoid activation function.
Provided data are automatically normalized, both input and output, with a linear mapping to the range (0.15, 0.85). Each input and output is treated separately (i.e. linear map is unique for each input and output).
Function minimized during training is a sum of squared errors of each output for each training pattern.
I am using one input layer, one hidden layer (size: 2/3 of the input size plus the output size), and an output layer. I'm using the SciPy conjugate gradient optimizer.
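For reference, here is a minimal sketch of how such a per-feature linear mapping to (0.15, 0.85) could be implemented; this is my reading of the ffnet description, not its actual code, and the function names and data shapes are made up.

import numpy as np

# Per-column linear map to (0.15, 0.85); names are illustrative.
def fit_linear_map(x, lo=0.15, hi=0.85):
    """Return (scale, offset) such that scale * x + offset spans [lo, hi] per column."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    scale = (hi - lo) / (x_max - x_min)
    offset = lo - scale * x_min
    return scale, offset

def apply_linear_map(x, scale, offset):
    return scale * x + offset

# Example with 50 input features (sample count is arbitrary):
X = np.random.rand(1000, 50) * 10.0
scale, offset = fit_linear_map(X)
X_norm = apply_linear_map(X, scale, offset)   # each column now lies in [0.15, 0.85]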
The downside of ffnet is the long training time and the lack of GPU support. Therefore I want to switch to a different framework and have chosen Keras with TensorFlow as the backend.
I have tried to model the previous configuration:
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization

# Example sizes taken from the question: e.g. 50 inputs, 150 outputs,
# hidden layer size = 2/3 of the input size plus the output size.
n_in, n_out = 50, 150
n_hidden = (2 * n_in) // 3 + n_out

model = Sequential()
model.add(Dense(n_hidden, input_dim=n_in))   # linear projection of the inputs
model.add(BatchNormalization())
model.add(Dense(n_hidden))
model.add(Activation('sigmoid'))             # sigmoid hidden units, as in ffnet
model.add(Dense(n_out))
model.add(Activation('sigmoid'))             # sigmoid output units
model.summary()

model.compile(loss='mean_squared_error',
              optimizer='Adamax',
              metrics=['accuracy'])
However, the results are far worse: the error is up to 0.5% even after a few thousand (!) epochs of training, whereas the ffnet training stopped automatically at 292 epochs. Furthermore, the differences between the network response and the validation targets are not centered around 0 but are mostly negative.
I have tried all optimizers and different loss functions. I have also skipped the BatchNormalization and normalized the data manually in the same way that ffnet does it. Nothing helps.
Does anyone have a suggestion for obtaining better results with Keras?
I understand you are trying to re-train the same architecture from scratch with a different library. The first fundamental issue to keep in mind here is that neural nets are not necessarily reproducible when the weights are initialized randomly.
For example, here is the default constructor parameter for Dense in Keras:
init='glorot_uniform'
But even before trying to evaluate the convergence of the Keras optimization, I would recommend porting the weights for which you got good results from ffnet into your Keras model. You can do so either with the kwarg Dense(..., weights=) of each layer, or globally at the end with model.set_weights(...).
Using the same weights should yield essentially the same result in both libraries, barring floating-point rounding differences. Until the ported weights reproduce the ffnet results, I believe that working on the optimization is unlikely to help.
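To make this concrete, here is a minimal, self-contained sketch of both options, assuming an ffnet-style single sigmoid hidden layer rather than the asker's exact Keras model; the kernel/bias arrays are placeholders for the values extracted from the trained ffnet network (that extraction is not shown).

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

# Example sizes consistent with the question (50 inputs, 150 outputs, 183 hidden units).
n_in, n_hidden, n_out = 50, 183, 150
kernel_h, bias_h = np.zeros((n_in, n_hidden)), np.zeros(n_hidden)   # placeholders
kernel_o, bias_o = np.zeros((n_hidden, n_out)), np.zeros(n_out)     # placeholders

# Option 1: hand the weights to each layer when building the model.
ported = Sequential()
ported.add(Dense(n_hidden, input_dim=n_in, weights=[kernel_h, bias_h]))
ported.add(Activation('sigmoid'))
ported.add(Dense(n_out, weights=[kernel_o, bias_o]))
ported.add(Activation('sigmoid'))

# Option 2: assign them afterwards, layer by layer.
ported.layers[0].set_weights([kernel_h, bias_h])
ported.layers[2].set_weights([kernel_o, bias_o])

If the ported model then reproduces the ffnet predictions on a validation batch, the remaining gap is in the training procedure rather than in the architecture.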