I recently completed a "pretty good" TF2/Keras model for image recognition using a number of layers, SGD optimization, and a MobileNetV2 pre-trained model as the starting point.
I could tweak this forever: adding/removing layers, different optimization algorithms, learning rate, momentum, various dataset augmentations, etc. And I haven't even considered starting with other pre-trained models. I changed the optimizer from SGD to Adam (which should be better, right?) and it was slightly less accurate.
So, how do I converge on a better pre-trained model, parameters, values? Is it just trial-and-error? It takes about 45 min to train my model (10 epochs), which seems like forever when I'm tweaking so many variables.
I think I could write a Python framework to plug in various training attributes and then just let it run for a couple of days.
I don't know if this is a suitable SO question or not.
This problem is called hyperparameter tuning (or optimization). You can do this manually or with a search technique such as grid search over all your parameters.
There are also more advanced techniques that use Bayesian optimization to automate this process.
One common and established tool for doing hyperparameter optimization in the ML community is called hyperopt.
https://github.com/hyperopt/hyperopt
Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
Also, since you tagged Keras in the question, there is a tool called AutoKeras that also searches for hyperparameters: https://autokeras.com/
Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.
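To make the grid search idea concrete, here is a minimal sketch in plain Python. It does not use hyperopt or AutoKeras. The function train_and_evaluate is a hypothetical stand-in for a real training run (it returns a toy score so the example finishes instantly), and the search space values are assumptions chosen only for illustration.

```python
import itertools
import random


def train_and_evaluate(learning_rate, momentum, optimizer):
    """Hypothetical stand-in for a real training run.

    In practice this would build the model, train it, and return a
    validation score. Here it is a toy formula so the sketch runs fast.
    """
    score = 1.0 - abs(learning_rate - 0.01) * 10 - abs(momentum - 0.9)
    if optimizer == "adam":
        score -= 0.05  # arbitrary toy penalty, not a claim about Adam
    return score


# Assumed search space, for illustration only.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "momentum": [0.0, 0.5, 0.9],
    "optimizer": ["sgd", "adam"],
}


def grid_search(space, objective):
    """Try every combination and return (best_score, best_params)."""
    names = list(space)
    best = None
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def random_search(space, objective, n_trials, seed=0):
    """Sample n_trials random combinations; cheaper than the full grid."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


best_score, best_params = grid_search(search_space, train_and_evaluate)
print(best_params, round(best_score, 3))
```

With 45 minutes per run, the full grid above would already cost 18 runs, which is why random search or a Bayesian tool is usually the more practical choice once the space grows.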
Related
R's package 'forecast' has a function nnetar, which uses feed-forward neural networks with a single hidden layer to predict in time series.
Now I am using Python to do a similar analysis. I want to use a neural network that does not need to be as complex as deep learning. Maybe 2 layers and a couple of nodes are good enough for my case.
So, does Python have a simple neural network model that can be used for time series like nnetar? If not, how should I deal with this problem?
Any NN model that uses 1 or more hidden layers is a multi-layer perceptron model, and in that case it is trivial to extend it to N layers. So any library that you pick will support it. My guess is that you are avoiding a complex library like PyTorch/TensorFlow because of its size.
TensorFlow does have TF-Lite, which can work on smaller IoT devices.
Sklearn does have MLPRegressor, which can train NNs if that is more to your liking.
You can always write your own model. There are plenty of examples for this that use numpy and are plenty fast for CPU computation. (A single-hidden-layer NN, I am guessing, will be more memory bound than computation bound.)
Use another ML algorithm. Single-hidden-layer NNs often do not perform as well as other, simpler algorithms.
If there are other reasons for not using a standard library like TensorFlow/PyTorch, then you should mention them.
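Whichever of the options above you pick, the series first has to be turned into a supervised problem, since nnetar feeds lagged values of the series into the network. A minimal sketch of that step in plain Python follows. The helper name make_lagged and the toy series are assumptions made for illustration.

```python
def make_lagged(series, n_lags):
    """Turn a 1-D series into (X, y).

    Each row of X holds the n_lags previous values, and y holds the
    value that immediately follows them.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data
X, y = make_lagged(series, n_lags=2)
print(X[0], y[0])
```

The resulting X and y could then be passed to something like sklearn's MLPRegressor with a single entry in hidden_layer_sizes, which gives one hidden layer as in nnetar.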
I have a basic question. I want to know whether it is possible to use Keras (e.g. the functional API) to specify a neural network model, and then use the Keras optimization routines to train the network outside of the Keras training process. In other words, use Keras purely to specify a neural network, pull the weights out and put them into a loss function (outside the Keras APIs if necessary) then use one of the built-in optimization routines purely to minimize the loss function over a single batch of data (no multiple epochs etc. or anything beyond minimization of the loss function with regard to one set of data at this point).
My reason for wanting to do this is that I would like to use a dynamic optimization process that changes from batch iteration to batch iteration, which seems difficult to implement entirely within the Keras APIs.
I ask this question because many deep learning frameworks, such as Caffe, support a model refinement function. For example, in Caffe, we can use a snapshot to initialize the neural network parameters and then continue training, as the following command shows:
./caffe train -solver solver_file.prototxt -snapshot snap_file.solverstate
In order to further train the model, I can play with the following tricks:
use smaller learning rate
change the optimisation method, for example from stochastic gradient descent to the Adam algorithm
Any other tricks I can play with?
ps: I understand that reducing the loss function value of the training samples does not mean that we can get a better model.
The question is way too broad, I think. However, this is a common practice, especially in case of a small training set. I would rank possible methods like this:
smaller learning rate
more/different data augmentation
add noise to the training set (related to data augmentation, indeed)
fine-tune on a subset of the training set.
The very last one is indeed a very powerful method to finalize a model that performs poorly on some corner cases. You can then make a 'difficult' training subset in order to bias the model towards it. I personally use it very often.
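One way to build such a 'difficult' subset is to rank training samples by their current loss and keep the worst ones. This selection rule is an assumption on my part (the subset could just as well be picked by hand). The helper name select_hard_examples and the loss values are hypothetical.

```python
def select_hard_examples(losses, fraction=0.2):
    """Return the indices of the highest-loss samples, in ascending order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(losses) * fraction))
    # Rank sample indices from highest loss to lowest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-sample losses from an already trained model.
per_sample_loss = [0.05, 1.30, 0.10, 0.90, 0.02, 0.07, 2.10, 0.04, 0.03, 0.06]
hard_idx = select_hard_examples(per_sample_loss, fraction=0.3)
print(hard_idx)
```

The returned indices would then define the subset used for a few extra epochs with a small learning rate.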
I am currently using the TensorFlow Object Detection API for my human detection app.
I tried filtering in the API itself, which worked, but I am still not content with it because it's slow. So I'm wondering if I could remove the other categories in the model itself to also make it faster.
If that is not possible, can you please give me other suggestions to make the API faster, since I will be using two cameras? Thanks in advance, and pardon my English :)
Your question addresses several topics around using pretrained neural network models.
Theoretical methods
In general, you can always neutralize categories by removing the corresponding neurons in the softmax layer and computing a new softmax layer only with the relevant rows of the matrix.
This method will surely work (maybe that is what you meant by filtering) but will not reduce the network computation time by much, since most of the FLOPs (multiplications and additions) will remain.
Similar to decision trees, pruning is possible but may reduce performance. I will explain what pruning means below, but note that the accuracy on your categories may hold up, since you are not just trimming the network, you are also predicting fewer categories.
Transfer the learning to your problem. See Stanford's course in computer vision here. The approach I have most often seen work well is keeping the convolution layers as-is and preparing a medium-size dataset of the objects you'd like to detect.
I will add more theoretical methods if you request, but the above are the most common and accurate I know.
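The first method above (a softmax over only the relevant rows of the final weight matrix) can be sketched in plain Python as follows. The weights, biases, feature vector, and class indices are all made-up values for illustration, and a real detection head is more involved than a single dense layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def restricted_class_probs(W, b, features, keep):
    """Class probabilities using only the rows of W listed in keep.

    Rows for the ignored classes are never multiplied, so their logits
    are not computed at all.
    """
    logits = [sum(w * x for w, x in zip(W[i], features)) + b[i] for i in keep]
    return dict(zip(keep, softmax(logits)))


# Hypothetical final layer: 4 classes, 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]
features = [2.0, 0.0]
keep = [0, 2]  # indices of the classes we still care about

probs = restricted_class_probs(W, b, features, keep)
print(probs)
```

As noted above, this only trims the last layer, so the saving is small compared with the cost of the convolutional layers before it.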
Practical methods
Make sure you are serving your TensorFlow model, and not just running inference from Python code. This could significantly improve performance.
You can export the parameters of the network and load them in a faster framework such as CNTK or Caffe. These frameworks work in C++/C# and can run inference much faster. Make sure you load the weights correctly, since some frameworks use a different order of tensor dimensions when saving/loading (little/big endian-like issues).
If your application performs inference on several images, you can distribute the computation across several GPUs. This can also be done in TensorFlow, see Using GPUs.
Pruning a neural network
Maybe this is the most interesting method of adapting big networks for simple tasks. You can see a beginner's guide here.
Pruning means that you remove parameters from your network, specifically whole nodes in a decision tree or whole neurons in a neural network. To do that in object detection, you can do as follows (simplest way):
Randomly prune neurons from the fully connected layers.
Train one more epoch (or more) with low learning rate, only on objects you'd like to detect.
(optional) Perform the above several times for validation and choose best network.
The above procedure is the most basic one, but you can find plenty of papers that suggest algorithms to do so, for example "Automated Pruning for Deep Neural Network Compression" and "An iterative pruning algorithm for feedforward neural networks".
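The random pruning step from the basic procedure above can be sketched in plain Python like this. The layer layout (one hidden fully connected layer followed by an output layer), the function name prune_hidden_neurons, and all weight values are assumptions for illustration. The fine-tuning step that should follow is not shown.

```python
import random


def prune_hidden_neurons(W1, b1, W2, fraction, seed=0):
    """Randomly drop a fraction of hidden neurons from a dense layer.

    W1: hidden x inputs weights, b1: hidden biases,
    W2: outputs x hidden weights of the following layer.
    Removing neuron j deletes row j of W1, b1[j], and column j of W2.
    """
    n_hidden = len(W1)
    n_drop = int(n_hidden * fraction)
    rng = random.Random(seed)
    dropped = set(rng.sample(range(n_hidden), n_drop))
    kept = [j for j in range(n_hidden) if j not in dropped]
    new_W1 = [W1[j] for j in kept]
    new_b1 = [b1[j] for j in kept]
    new_W2 = [[row[j] for j in kept] for row in W2]
    return new_W1, new_b1, new_W2, kept


# Hypothetical layer: 3 inputs, 4 hidden neurons, 2 outputs.
W1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
b1 = [0.0, 0.1, 0.2, 0.3]
W2 = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

pW1, pb1, pW2, kept = prune_hidden_neurons(W1, b1, W2, fraction=0.5)
print(kept, len(pW1), len(pW2[0]))
```

After pruning, the smaller layers would be loaded back into the model and trained for one or more epochs with a low learning rate, as described in the steps above.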
I am using TensorFlow to train a model that has 1 output for 4 inputs. It is a regression problem.
I found that when I use RandomForest to train the model, it quickly converges and also runs well on the test data. But when I use a simple neural network for the same problem, the loss (Random square error) does not converge. It gets stuck on a particular value.
I tried increasing/decreasing number of hidden layers, increasing/decreasing learning rate. I also tried multiple optimizers and tried to train the model on both normalized and non-normalized data.
I am new to this field but the literature that I have read so far vehemently asserts that the neural network should marginally and categorically work better than the random forest.
What could be the reason behind non-convergence of the model in this case?
If your model is not converging, the optimizer may be stuck in a local minimum of your loss function.
I don't know what optimizer you are using, but try increasing the momentum or even the learning rate slightly.
Another commonly used strategy is learning rate decay, which reduces the learning rate by a factor every several epochs. This can help you avoid getting stuck in a local minimum early in training, while still achieving maximum accuracy towards the end of training.
Otherwise you could try an adaptive optimizer (Adam, Adagrad, Adadelta, etc.), which adapts the learning rate for each parameter during training, so there is less to tune by hand.
This is a very good post comparing different optimization techniques.
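The learning rate decay mentioned above can be as simple as the following step schedule. The initial rate, drop factor, and interval are assumed values for illustration, not recommendations.

```python
def step_decay(initial_lr, epoch, drop_factor=0.5, epochs_per_drop=10):
    """Learning rate at a given epoch.

    The rate is multiplied by drop_factor once every epochs_per_drop epochs.
    """
    return initial_lr * drop_factor ** (epoch // epochs_per_drop)


# Rates at epochs 0, 5, 10, 15, 20, 25 for an assumed initial rate of 0.1.
schedule = [step_decay(0.1, epoch) for epoch in range(0, 30, 5)]
print(schedule)
```

In Keras, a function of this shape could be wrapped in a lambda and handed to the LearningRateScheduler callback, which calls it with the epoch index and the current rate.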
Deep neural networks need a significant amount of data to perform adequately. Be sure you have lots of training data, or your model will overfit.
A useful rule when starting to train models is not to begin with the more complex methods. Start instead with, for example, a linear model, which you will be able to understand and debug more easily.
In case you continue with the current methods, some ideas:
Check the initial weight values (init them with a normal distribution)
As a previous poster said, diminish the learning rate
Do some additional checking on the data: check for NaN values and outliers, since the current models could be more sensitive to noise. Remember, garbage in, garbage out.
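The NaN and outlier checks from the last point can be sketched with the standard library alone. The 1.5 x IQR cutoff is one common convention that I am assuming here, not something the data dictates, and the sample values are made up.

```python
import math
import statistics


def find_nan_rows(rows):
    """Indices of rows that contain at least one NaN."""
    return [i for i, row in enumerate(rows) if any(math.isnan(v) for v in row)]


def find_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical inputs: rows of 4 features, and one target column.
rows = [[1.0, 2.0, 3.0, 4.0], [1.0, float("nan"), 3.0, 4.0]]
targets = [1.0, 1.2, 0.9, 1.1, 1.0, 50.0, 0.95, 1.05]

print(find_nan_rows(rows))
print(find_outliers(targets))
```

Rows flagged this way are worth inspecting by hand before deciding whether to drop or fix them.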