Multiple real inputs and multiple real outputs in a neural network - python

How can I train a perceptron where there are multiple input and output nodes and both are real-valued?
I'm doing this because I want to train a neural network to predict the MFCCs given some data points (from the signal).
Here is some example data: http://pastebin.com/dtHGUeax
I won't paste the data here because the file is "big".
I am using nolearn at the moment, because later I will add more layers for deep learning.
from lasagne import layers
from lasagne.updates import nesterov_momentum
import lasagne.nonlinearities
from nolearn.lasagne import NeuralNet

net = NeuralNet(
    layers=[('input', layers.InputLayer),
            ('output', layers.DenseLayer),
            ],
    # Layer parameters
    input_shape=(None, 256),
    output_nonlinearity=lasagne.nonlinearities.softmax,
    output_num_units=13,
    # Optimization
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=True,
    max_epochs=500,
    verbose=1,
)
The error rate I get with this approach is very high.

MFCC extraction from a power spectrum is a non-linear operation; you cannot reproduce it with a single layer. If you want to reproduce it with multiple layers, you need to consider the MFCC algorithm itself.
MFCC extraction can be represented with the following neural network:
Layer 1: a dense matrix of size 256x40 with a logarithm nonlinearity
Layer 2: a dense matrix of size 40x13 without a nonlinearity (the same as a linear or identity nonlinearity in Lasagne)
If you reproduce this network with nolearn it will be able to learn it properly; however, the logarithm nonlinearity is not yet implemented in Lasagne, so you will have to implement it yourself. Another solution would be to replace the logarithm nonlinearity with tanh or with a couple of standard non-linear layers.
So to reproduce MFCC extraction you need either 2 layers with a logarithm nonlinearity, or 3-4 layers with, say, a softmax nonlinearity, and the last layer must be configured with a linear nonlinearity.
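For reference, here is a minimal sketch of that two-layer setup with nolearn; the log_nonlinearity helper below is a hand-rolled assumption (Lasagne does not ship one), and tanh could be substituted as discussed above:

import theano.tensor as T
import lasagne.nonlinearities
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

def log_nonlinearity(x):
    # Hypothetical helper: clamp away from zero, then take the logarithm.
    return T.log(T.maximum(x, 1e-6))

net = NeuralNet(
    layers=[('input', layers.InputLayer),
            ('hidden', layers.DenseLayer),   # 256 -> 40, log nonlinearity
            ('output', layers.DenseLayer),   # 40 -> 13, linear output
            ],
    input_shape=(None, 256),
    hidden_num_units=40,
    hidden_nonlinearity=log_nonlinearity,    # or lasagne.nonlinearities.tanh
    output_num_units=13,
    output_nonlinearity=lasagne.nonlinearities.linear,
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=True,
    max_epochs=500,
    verbose=1,
)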

Related

Using MLP for Feature Extraction and Dimension Reduction

I'm trying to build a model that uses an MLP for feature extraction and dimension reduction. The model should transform the data from 204 dimensions to 80 dimensions. The proposed model is as follows:
A 512-dimension dense layer taking the original data (204 dimensions) as input
A 256-dimension dense layer taking the 512-dimension output as input
An 80-dimension dense layer taking the 256-dimension output as input
The proposed training is 1 epoch, and the output of the MLP is regarded as the input to further models (such as LR, SVM, etc.)
My question is: when training the MLP, what loss function should I set? Is MSE loss OK, or should I use other loss functions? Thanks!
What would you be training this MLP on? (What would the target 80-dimensional "Y" be?)
MLPs are used to learn features at the same time as the model. For example, if you wanted an MLP that does linear regression and learns a set of 80-dimensional features, you could create something like this:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(layers.Dense(80, input_dim=512, activation=MY_ACTIVATION))  # 80-d feature layer; MY_ACTIVATION is a placeholder
model.add(layers.Dense(1))  # linear output for the regression target Y
model.compile(loss="mean_squared_error")
In the last layer, the network will learn to find the "best" weights and biases to capture Y as a function of the 80 features extracted. These features are in turn a function of X - a function the network learns by adjusting for how well these features are able to capture Y (this is backpropagation).
So creating an MLP just to learn features doesn't make sense without a problem statement for what these features are supposed to do.
As such I would recommend using something like Principal Component Analysis or Singular Value Decomposition. These project the data onto the k-dimensional space that captures the most variance (information) in the data.
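A minimal sketch of that alternative with scikit-learn, assuming your data sits in an array X of shape (n_samples, 204) (the random X here is just a stand-in):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 204)                 # stand-in for your 204-dimensional data

pca = PCA(n_components=80)                    # keep the 80 directions of highest variance
X_reduced = pca.fit_transform(X)              # shape (1000, 80); feed this to LR, SVM, etc.
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained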

How will I be able to choose the right parameters and structure of a neural network during the training phase?

I'm trying to train a neural network with supervised learning, whose input x_train is a list of 100 lists each containing 2000 columns, and whose target output y_train is also a list of 100 lists, but each containing 20 columns.
Here is the neural network that I created:
import tensorflow as tf

dnnmodel = tf.keras.models.Sequential()
dnnmodel.add(tf.keras.layers.Dense(40, input_dim=len(id2word), activation='relu'))
dnnmodel.add(tf.keras.layers.Dense(20, activation='relu'))
dnnmodel.compile(loss=tf.keras.losses.MeanSquaredLogarithmicError(), optimizer='adam', metrics=['accuracy'])
During the training phase I cannot choose the right number of neurons, layers, and activation and loss functions, since the accuracy and loss values are not at all reasonable. Can someone help me please?
Here is the display after the execution:
There is no correct method or formula to decide the right number of layers or neurons or any other functions you use in your model. It all comes down to experimentation and what works best for your data and the problem you are trying to solve.
Here are some tips:
Sigmoid, tanh: these activations are generally not used in hidden layers because their gradients are very small, so the model can take a long time to converge.
ReLU, ELU, leaky ReLU: these activations can be used in hidden layers because they have a steeper slope, so training is faster. ReLU is the most commonly used.
Layers: the more layers you add, the deeper your neural network becomes. Deeper neural networks can learn more complex features about your data but are prone to overfitting; they also suffer from problems like vanishing or exploding gradients. Fewer layers mean fewer parameters to learn and a tendency to underfit.
Loss function: the loss function depends on the problem you are trying to solve.
For classification:
If y_label is one-hot encoded, go for categorical_crossentropy
If y_label is integer-encoded (discrete class indices), go for sparse_categorical_crossentropy
For regression problems:
Use RMSE or MSE (see the sketch just below)
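A minimal sketch of how those choices look as tf.keras compile calls (the tiny model here is hypothetical, only to show the loss argument):

import tensorflow as tf

model = tf.keras.models.Sequential([tf.keras.layers.Dense(20)])  # hypothetical model

# Classification, one-hot labels:
# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Classification, integer labels:
# model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Regression (as in this question):
model.compile(loss='mse', optimizer='adam', metrics=[tf.keras.metrics.RootMeanSquaredError()])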
Coming to the training logs: your model is training, as you can see the loss at each epoch is lower than at the previous one. You should train your model for more epochs in order to see improvements in your accuracy.

How to make a neural network generalize better?

I designed a neural network model with a large number of outputs predicted by a softmax function. However, I want to categorize all the outputs into 5 outputs without modifying the architecture of the other layers. The model performs well in the first case, but when I decrease the number of outputs it loses accuracy and generalizes badly. My question is: is there a method to make my model perform well even if there are only 5 outputs? For example: adding a dropout layer before the output layer, using another activation function, etc.
If it is a plain neural network, then definitely use the ReLU activation function in the hidden layers and add a dropout layer after each hidden layer. You can also normalize your data before feeding it to the network.
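A minimal sketch of that suggestion in Keras, assuming 5 output classes; the input dimension, layer sizes, and dropout rate are illustrative assumptions, not taken from the question:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(100,)),                # assumed input dimension
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),                       # dropout after each hidden layer
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(5, activation='softmax'),     # the 5 merged output categories
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])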

Using softmax in hidden layer and relu in output layer for CNN regression

I am creating a CNN to predict the distributed strain applied to an optical fiber from the measured light spectrum (2D), which is ideally a Lorentzian curve. The label is a 1D array where only the strained section is non-zero (the label looks like a square wave).
My CNN has 10 alternating convolution and pooling layers, all activated by ReLU. This is followed by 3 fully-connected hidden layers with softmax activation, then an output layer activated by ReLU. Usually, CNNs and other neural networks use ReLU for hidden layers and softmax for the output layer (in the case of classification problems). But in this case, I use softmax first to determine the positions along the optical fiber at which strain is applied (i.e. where the label is non-zero), and then ReLU in the output for regression. My CNN is able to predict the labels rather accurately, but I cannot find any publication where softmax is used in hidden layers followed by ReLU in the output layer, nor an explanation of why this approach is not recommended (or not mathematically sound), other than what I found on Quora/Stack Overflow. I would really appreciate it if anyone could enlighten me on this matter, as I am pretty new to deep learning and wish to learn from this. Thank you in advance!
If you look at the way a layer l sees the input from the previous layer l-1, it is assuming that the dimensions of the feature vector are linearly independent.
If the model is building some kind of confidence using a set of neurons, then the neurons had better be linearly independent; otherwise it is simply exaggerating the value of a single neuron.
If you apply softmax in hidden layers, you are essentially combining multiple neurons and tampering with their independence. Also, one of the reasons ReLU is preferred is that it gives you better gradients than activations like sigmoid do. Finally, if your goal is to add normalization to your layers, you would be better off using an explicit batch normalization layer.
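A minimal sketch of that last point in Keras: ReLU hidden layers with explicit batch normalization rather than softmax, and a ReLU output so the predicted strain profile stays non-negative (the input and layer sizes are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(512,)),             # assumed flattened spectrum features
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),            # explicit normalization instead of softmax
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(64, activation='relu'),    # assumed 64 fiber positions; ReLU keeps strain >= 0
])
model.compile(loss='mse', optimizer='adam')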

How to represent the coefficients of a multi layer perception neural network?

For learning purposes, I'm trying to code from scratch a simple multi-layer perceptron (MLP) neural network, with:
2500 inputs in the input layer,
100 neurons in each of hidden layers #1 and #2,
and 10 outputs in the output layer
and backpropagation, without using TensorFlow or similar ready-to-use tools.
Each neuron in hidden layer #1 has to be connected to the 2500 inputs and therefore needs to store 2500 coefficients. The same applies to all neurons of all layers.
Question: which data structure is usually used to store all the coefficients from the neurons of layer n-1 to specific neurons of layer n?
Is there a single data structure (for example in NumPy) that can store all these coefficients for the whole MLP?
Is a tensor (n-dimensional array) mandatory for such things?
Neural networks are mostly just a series of matrix multiplications and non-linear transformations, hence n-dimensional arrays are the natural storage method. Depending on the application you could use a sparse matrix, which stores the coefficients together with their indices. But in general the storage is just matrices.
A good peek under the hood of libraries like TensorFlow is to look at, or implement, a neural network in NumPy.
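A minimal sketch of that storage scheme in NumPy for the 2500-100-100-10 network described above: one 2D weight matrix and one bias vector per layer, held in plain Python lists (the initialization scale and the ReLU choice are assumptions for illustration):

import numpy as np

layer_sizes = [2500, 100, 100, 10]    # input, hidden #1, hidden #2, output

# One weight matrix (fan_in x fan_out) and one bias vector per layer.
weights = [np.random.randn(n_in, n_out) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    # x has shape (batch, 2500); each layer is a matrix multiply plus bias.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)    # ReLU hidden layers (an assumption)
    return x @ weights[-1] + biases[-1]   # linear output layer

y = forward(np.random.rand(4, 2500))      # y has shape (4, 10)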
