Is there any rule for choosing the number of layers in a convolutional neural network, or is it just trial and error? For example, if we have an input image of size 256 x 256, how do we design a good model: how many conv layers, max-pooling layers, and fully connected layers, and which filter sizes and learning rate? Also, how can we know the optimal number of epochs for training on the dataset? Any ideas?
There has been a good discussion of this topic on stats.stackexchange:
How to choose the number of hidden layers and nodes in a feedforward neural network?
Otherwise, there is no straight and easy answer to this. I can only suggest diving into the scientific literature:
https://scholar.google.com/scholar?q=optimal+size+of+neural+network&btnG=&hl=en&as_sdt=0%2C5&as_vis=1
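As a practical starting point (not a rule), many people begin with a small stack of conv/pool blocks and a dense head, and let early stopping decide the number of epochs instead of guessing it. Below is a minimal tf.keras sketch for a 256 x 256 input; the layer counts, filter sizes, and learning rate are illustrative defaults to tune, not values derived from your data:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # placeholder for your number of classes

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Rather than fixing an "optimal" number of epochs up front, set a generous
# limit and stop when validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])
```

From there, depth, filter counts, and learning rate are usually refined by searching over a validation set, which is why the literature above matters more than any fixed rule.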
Related
I am training a 3-layer convolutional encoder and reconstructing with a 3-layer convolutional decoder.
The issue is that both my training loss and validation loss keep increasing.
Is the network unable to learn more generic features because it is too small?
Or could there be another reason?
6k images with a constant learning rate (saffron: validation, blue: training)
Since the first learning curve shows the model error shooting up at some point, I tried a dynamic learning rate that decreases over epochs, and more images (10k).
10k images with a decreasing learning rate
So my question is: am I over-simplifying my network? Should I add more layers/neurons?
After adding a layer and increasing the number of neurons per layer, I think I get a better learning curve;
sharing the result.
With more layers and neurons
Conclusion: my hypothesis was right. My network was over-simplified, and the deeper network gives a better learning curve.
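For anyone reproducing this, here is a minimal sketch of what "add more layers/filters and decay the learning rate" can look like for a convolutional autoencoder in tf.keras; the input size, filter counts, and decay schedule below are placeholders, not the poster's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(128, 128, 1))  # hypothetical image size

# Encoder: deeper/wider than a minimal 3-layer version.
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D()(x)

# Decoder mirrors the encoder.
x = layers.Conv2D(128, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D()(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D()(x)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = models.Model(inp, out)

# Learning rate that decays over training instead of staying constant.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule), loss="mse")
```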
I have designed a convolutional neural network (tf.keras) which has a few parallel convolutional units with different kernel sizes. The output of each of those convolutional layers is fed into another set of parallel convolutional units, and all of the outputs are then concatenated. Next the result is flattened. After that I added a fully connected layer and connected it to the final softmax layer for multi-class classification. I trained it and had good results on the validation set.
However, when I removed the fully connected layer, the accuracy was higher than before.
Could someone please explain how this happens? It would be very helpful.
Thank you for your valuable time.
Parameters as follows.
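For readers trying to picture the setup, here is a hypothetical sketch of this kind of parallel-branch architecture in the tf.keras functional API. The kernel sizes, filter counts, and input shape below are illustrative placeholders, not the actual parameters from the question:

```python
from tensorflow.keras import layers, models

num_classes = 5  # placeholder
inp = layers.Input(shape=(64, 64, 3))  # placeholder input shape

# First stage: parallel convolutions with different kernel sizes.
branches = []
for k in (3, 5, 7):
    b = layers.Conv2D(16, k, padding="same", activation="relu")(inp)
    # Second stage: each branch feeds another convolutional unit.
    b = layers.Conv2D(32, k, padding="same", activation="relu")(b)
    branches.append(b)

# Concatenate all branch outputs, then flatten.
x = layers.Concatenate()(branches)
x = layers.Flatten()(x)

# Optional fully connected layer (the one removed in the experiment above).
x = layers.Dense(128, activation="relu")(x)

out = layers.Dense(num_classes, activation="softmax")(x)
model = models.Model(inp, out)
```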
When you remove a layer, your model has less chance of overfitting the training set. Consequently, by making the network shallower, you make your model more robust to unseen examples, and the validation accuracy increases.
Since your training accuracy also increased, it can be an indication of one of the following:
Exploding or vanishing gradients. You can try solving this problem using careful weight initialization, proper regularization, adding shortcuts, or gradient clipping.
You are not training for enough epochs to learn the deeper network. You can try a few more epochs.
You do not have enough data to train a deeper network.
Eliminating the dense layer reduces the tendency to overfit. Therefore your validation loss should improve as long as your model's training accuracy remains high. Alternatively, you can keep the dense layer and add a dropout layer. You can also reduce overfitting by using regularizers; documentation for that is here.
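If you prefer to keep the dense layer, a minimal sketch of the dropout/regularizer options in tf.keras (the architecture, dropout rate, and L2 factor are placeholders to tune, not a recommendation):

```python
from tensorflow.keras import layers, models, regularizers

num_classes = 5  # placeholder

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),   # placeholder input shape
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    # Keep the dense layer, but constrain it:
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # placeholder L2 strength
    layers.Dropout(0.5),               # placeholder dropout rate
    layers.Dense(num_classes, activation="softmax"),
])
```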
I am creating a CNN to predict the distributed strain applied to an optical fiber from the measured light spectrum (2D), which is ideally a Lorentzian curve. The label is a 1D array where only the strained section is non-zero (the label looks like a square wave).
My CNN has 10 alternating convolution and pooling layers, all activated by ReLU. This is then followed by 3 fully connected hidden layers with softmax activation, then an output layer activated by ReLU. Usually, CNNs and other neural networks use ReLU for the hidden layers and softmax for the output layer (in the case of classification problems). But in this case, I use softmax to first determine the positions of the optical fiber for which strain is applied (i.e. non-zero) and then use ReLU in the output for regression. My CNN is able to predict the labels rather accurately, but I cannot find any supporting publication where softmax is used in hidden layers followed by ReLU in the output layer, nor an explanation of why this approach is not recommended (or even not mathematically sound), other than what I found on Quora/Stack Overflow. I would really appreciate it if anyone could enlighten me on this matter, as I am pretty new to deep learning and wish to learn from this. Thank you in advance!
If you look at the way a layer l sees the input from a previous layer l-1, it is assuming that the dimensions of the feature vector are linearly independent.
If the model is building some kind of confidence using a set of neurons, then the neurons had better be linearly independent; otherwise it is simply exaggerating the value of one neuron.
If you apply softmax in hidden layers, you are essentially coupling multiple neurons and tampering with their independence. Also, one of the reasons ReLU is preferred is that it gives you better gradients than activations such as sigmoid. And if your goal is to add normalization to your layers, you would be better off using an explicit batch normalization layer.
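As an illustration of that last point, a minimal sketch of using batch normalization in the hidden layers instead of softmax, keeping softmax only at the output (the layer sizes are placeholders):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(256,)),            # placeholder input dimension
    layers.Dense(128),
    layers.BatchNormalization(),           # normalize activations explicitly
    layers.ReLU(),                         # keep ReLU for healthy gradients
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.Dense(10, activation="softmax") # softmax only at the output
])
```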
I have had an overview of how neural networks work and have come up with some interconnected questions to which I am not able to find an answer.
Consider a one-hidden-layer feedforward neural network, where the function for each of the hidden-layer neurons has the same form:
a1 = relu(w1*x1 + w2*x2), a2 = relu(w3*x1 + w4*x2), ...
How do we make the model learn different values of weights?
I do understand the point of manually established connections between neurons. As shown in the picture "Manually established connections between neurons", that way we define the possible functions of functions (e.g., house size and number of bedrooms taken together might represent the family size the house could accommodate). But the fully connected network doesn't make sense to me.
I get the point that a fully connected neural network should somehow automatically determine which functions of functions make sense, but how does it do that?
Not being able to answer this question, I also don't understand why increasing the number of neurons should increase the accuracy of the model's predictions.
How do we make the model learn different values of weights?
By initializing the parameters randomly before training starts. In a fully connected neural network, if all parameters started from the same value, we would get the same update step on every parameter - that is where your confusion is coming from. Initialization, either random or more sophisticated (e.g. Glorot), solves this; a small demonstration is sketched below.
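A tiny NumPy sketch of the symmetry problem (the data and layer sizes are made up): with identical initial weights, both hidden units receive identical gradients and can never diverge; random or Glorot-style initialization breaks that tie.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))          # toy inputs
y = rng.normal(size=(8, 1))          # toy targets

def grad_hidden(W1, w2):
    """Gradient of a 1-hidden-layer ReLU network's MSE loss w.r.t. W1."""
    h = np.maximum(X @ W1, 0.0)      # hidden activations, shape (8, 2)
    err = h @ w2 - y                 # output error
    dh = (err @ w2.T) * (h > 0)      # backprop through ReLU
    return X.T @ dh / len(X)         # shape (2, 2): one column per hidden unit

# Symmetric init: both hidden units start identical -> identical gradient columns.
W1_sym = np.full((2, 2), 0.5)
w2_sym = np.full((2, 1), 0.5)
g = grad_hidden(W1_sym, w2_sym)
print(np.allclose(g[:, 0], g[:, 1]))   # True: the units stay identical forever

# Random (Glorot-like) init breaks the symmetry.
W1_rnd = rng.normal(scale=np.sqrt(2 / (2 + 2)), size=(2, 2))
w2_rnd = rng.normal(scale=np.sqrt(2 / (2 + 1)), size=(2, 1))
g = grad_hidden(W1_rnd, w2_rnd)
print(np.allclose(g[:, 0], g[:, 1]))   # False: units receive different updates
```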
Why should increasing the number of neurons increase the accuracy of the model prediction?
This is only partially true: increasing the number of neurons should improve your training accuracy (it is a different game for your validation and test performance). By adding units, your model is able to store additional information or incorporate outliers, and hence improve the accuracy of its predictions on the training set. Think of a 2D problem (predicting house price per sqm as a function of the sqm of a property). With two parameters you can fit a line, with three a curve, and so on; the more parameters, the more complex your curve can get and the closer it can pass through each of your training points.
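A quick NumPy illustration of that capacity argument (the house-price data here is synthetic): higher-degree polynomials, i.e. more parameters, drive the training error toward zero, even though they may generalize worse.

```python
import numpy as np

rng = np.random.default_rng(1)
sqm = rng.uniform(30, 200, size=15)                    # synthetic property sizes
price = 3000 + 10 * sqm + rng.normal(0, 300, size=15)  # synthetic prices per sqm

for degree in (1, 2, 8):                 # 2, 3, and 9 parameters respectively
    coeffs = np.polyfit(sqm, price, degree)
    pred = np.polyval(coeffs, sqm)
    train_mse = np.mean((pred - price) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.1f}")
# The training MSE shrinks as the degree (number of parameters) grows.
```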
A great next step for a deep dive: Karpathy's lectures on computer vision at Stanford.
I am currently working on some project related to machine learning.
I extracted some features from the object.
So I trained and tested those features with NB, SVM, and other classification algorithms and got results of about 70 to 80%.
When I train on the same features with a neural network using nolearn.dbn and then test it, I get only about 25% correctly classified. I had 2 hidden layers.
I still don't understand what is going wrong with the neural network.
I hope to have some help.
Thanks
Try increasing the number of hidden units and the learning rate. The power of neural networks comes from the hidden layers. Depending on the size of your dataset, the number of hidden units can go up to a few thousand. Also, please elaborate on the kind and number of features you're using. If the feature set is small, you're better off using SVMs and random forests instead of neural networks.
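Since I can't verify the exact nolearn.dbn call signature from memory, here is a comparable sketch using scikit-learn's MLPClassifier, swapped in purely to show the two knobs suggested above; the layer sizes and learning rate are placeholders:

```python
from sklearn.neural_network import MLPClassifier

# X_train, y_train, X_test, y_test are assumed to hold your extracted features/labels.
clf = MLPClassifier(
    hidden_layer_sizes=(512, 512),  # more hidden units per layer (placeholder)
    learning_rate_init=0.01,        # larger initial learning rate (placeholder)
    max_iter=500,
    random_state=0,
)
# clf.fit(X_train, y_train)
# print(clf.score(X_test, y_test))
```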