I am using the following neural net classifier in Python:

    from sknn.mlp import Classifier, Layer

    nn = Classifier(
        layers=[
            Layer("Tanh", units=n_feat // 8),
            Layer("Sigmoid", units=n_feat // 16),
            Layer("Softmax", units=n_targets)],
        n_iter=50,
        n_stable=10,
        batch_size=25,
        learning_rate=0.002,
        learning_rule="momentum",
        valid_size=0.1,
        verbose=1)
which is working just fine. My question is: how should I proceed if I need, for example, 100, 200, or 500 hidden layers? Do I have to specify each layer manually, or is there a better way to build an MLP in Python?
You could build the list of layers with a loop, I suppose, but there's a bigger issue here. A standard MLP with hundreds of layers is likely to be extremely expensive to train, both in computational speed and in memory usage. MLPs typically have only one or two hidden layers, or occasionally a few more. For problems that can truly benefit from more hidden layers, it becomes important to incorporate some of the lessons learned in the field of deep learning. For example, for object classification on images, using all fully-connected layers is incredibly inefficient, because you're interested in identifying spatially local patterns, and interactions between spatially distant pixels or regions are largely noise. (This is a perfect case for a deep convolutional neural net.)
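That said, if you just want to generate the layer list programmatically, here is a minimal sketch using the same sknn API as in your snippet (n_feat and n_targets are assumed to be defined as in your code; the depth and width choices are arbitrary):

    from sknn.mlp import Classifier, Layer

    n_hidden = 100  # however many hidden layers you want
    hidden = [Layer("Tanh", units=n_feat // 8) for _ in range(n_hidden)]
    layers = hidden + [Layer("Softmax", units=n_targets)]

    nn = Classifier(
        layers=layers,
        n_iter=50,
        n_stable=10,
        batch_size=25,
        learning_rate=0.002,
        learning_rule="momentum",
        valid_size=0.1,
        verbose=1)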
Although some very deep networks have been created, it's worth pointing out that even Google's very powerful Inception-v3 model is only 42 layers deep. Anyway, if you're interested in building deep models, I'd recommend reading this Deep Learning book. From what I've read of it, it seems to be a very good introduction. Hope that helps!
I'm building a Sequential NN model in Keras for binary classification. The training data has about 600,000 rows and 2,000 features, so every epoch and every layer is very time-consuming. I believe many of the features are not relevant to the model and can be dropped altogether to make the model thinner, so it would be faster to work with.
I ran a simple model with one hidden layer of 200 neurons. How can I tell which of the features (which are actually the nodes in the input layer) are meaningless, so I can drop them from the data set and re-run the model without them?
There is a very big topic in machine learning called feature selection. That said, neural networks are considered to automatically choose the best features for the problem, to an extent, by weighting some of them more and others less. Neural networks also need a lot of experience to be tuned correctly. Since you have a lot of data and features, I would definitely suggest increasing the number of layers in the network and using L1 regularisation, in order to get sparse weights and exclude most of the features. Also, this advice is only indicative, since I do not know anything about your dataset or your network architecture. Finally, I would suggest studying more about the basics of machine learning and then continuing with neural networks, before practicing on real data.
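As a rough illustration of the L1 idea in a Keras Sequential model (a sketch only, assuming a TF2-era Keras install; the layer sizes follow your description and the 1e-4 penalty is an arbitrary starting point you would need to tune):

    import numpy as np
    from tensorflow.keras import Sequential, regularizers
    from tensorflow.keras.layers import Dense

    model = Sequential([
        # L1 penalty pushes many input-to-hidden weights towards exactly zero
        Dense(200, activation="relu", input_shape=(2000,),
              kernel_regularizer=regularizers.l1(1e-4)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # After training: features whose outgoing weights are all ~0 are candidates to drop
    w = model.layers[0].get_weights()[0]          # shape (2000, 200)
    importance = np.abs(w).sum(axis=1)            # one score per input feature

Features with a near-zero importance score contribute almost nothing to the hidden layer, so they are the first candidates to remove before retraining.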
I am looking to train a large model (ResNet or VGG) for face identification.
Is it a valid strategy to train on a few faces (1-3) to validate the model?
In other words, if a model learns one face well, is that evidence that the model is good for the task?
The point here is that I don't want to spend a week of expensive GPU time only to find out that my model is no good, my data has errors, or my TF code has a bug.
Short answer: no, because deep learning works well on huge amounts of data.
Long answer: no. The problem is that training on only one face could overfit your model to that specific face, without it learning features that are not present in your examples. For example, the model might learn to detect your face thanks to one specific, very simple pattern in that face (that's called overfitting).
To give a deliberately simple example: your model might learn to detect that face only because there is a mole on the right cheek, and it learns to identify the mole rather than the face.
To make your model perform well in the general case, you need a huge amount of data, so that it becomes capable of learning different kinds of patterns.
Suggestion:
Because training a deep neural network is a time-consuming task, usually one does not train a single neural network at a time; rather, many networks are trained in parallel with different hyperparameters (layers, nodes, activation functions, learning rate, etc.).
Edit because of the discussion below:
If your dataset is small, it is nearly impossible to get good performance in the general case, because the neural network will learn the easiest pattern, which is usually not the most general or best one.
By adding data, you force the neural network to extract good patterns that work in the general case.
It's a tradeoff, but training on a small dataset will usually not produce a classifier that is good in the general case.
Edit 2: rephrasing everything to make it clearer. Good performance on a small dataset doesn't tell you whether your model, when trained on the whole dataset, will be a good model. That's why you train on the majority of your dataset and test/validate on a smaller part of it.
For face recognition, a Siamese network or a triplet loss is usually used. This is an approach for one-shot learning, which means it can perform really well given only a few examples per class (a person's face here), but you still need to train it on many examples (many different people's faces). See for example:
https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d
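For a sense of what the triplet loss itself looks like, here is a minimal TensorFlow sketch (the 0.2 margin and the use of squared L2 distances are conventional choices for illustration, not something prescribed here):

    import tensorflow as tf

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # anchor/positive/negative: batches of embedding vectors, shape (batch, dim)
        pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
        neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
        # Push each anchor closer to its positive than to its negative by `margin`
        return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))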
You wouldn't train your model from scratch anyway; you'd use a pretrained model and fine-tune it for your task.
You could also have a look at pretrained face recognition models, like FaceNet, for better results:
https://github.com/davidsandberg/facenet
R's 'forecast' package has a function nnetar, which uses a feed-forward neural network with a single hidden layer for time series prediction.
Now I am using Python to do a similar analysis. I want to use a neural network that does not need to be as complex as deep learning; maybe two layers and a couple of nodes are good enough for my case.
So, does Python have a simple neural network model that can be used on time series like nnetar? If not, how should I deal with this problem?
Any NN model that uses one or more hidden layers is a multi-layer perceptron, and in that case it is trivial to extend it to N layers, so any library you pick will support it. My guess is that you are not picking a complex library like PyTorch/TensorFlow because of its size.
TensorFlow does have TF-Lite, which can work for smaller IoT devices.
Scikit-learn has MLPRegressor, which can train NNs if that is more to your liking (see the sketch after these suggestions).
You can always write your own model. There are plenty of examples of this that use NumPy and are plenty fast for CPU computation. (A single-hidden-layer NN, I am guessing, will be more memory-bound than compute-bound.)
Use another ML algorithm. Single-hidden-layer NNs will often not perform nearly as well as other, simpler algorithms.
If there are other reasons for not using a standard library like TensorFlow/PyTorch, then you should mention them.
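To illustrate the MLPRegressor route: nnetar essentially fits a small feed-forward net on lagged values of the series, and you can reproduce that pattern roughly like this (the lag count, hidden-layer size, and toy series are arbitrary choices for the sketch):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_lagged(series, n_lags=12):
        # Turn a 1-D series into (X, y) pairs of lagged windows
        X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
        y = series[n_lags:]
        return X, y

    series = np.sin(np.linspace(0, 20, 300))      # toy series standing in for your data
    X, y = make_lagged(series, n_lags=12)

    # One small hidden layer, roughly in the spirit of nnetar's single-hidden-layer net
    model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
    model.fit(X, y)
    next_value = model.predict(series[-12:].reshape(1, -1))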
I am currently using the TensorFlow Object Detection API for my human detection app.
I tried filtering in the API itself, which worked, but I am still not content with it because it's slow. So I'm wondering if I could remove the other categories in the model itself to make it faster as well.
If that is not possible, can you please give me other suggestions to make the API faster, since I will be using two cameras? Thanks in advance, and also pardon my English :)
Your question addresses several topics around using pretrained neural network models.
Theoretical methods
In general, you can always neutralize categories by removing the corresponding neurons from the softmax layer and computing a new softmax only over the relevant rows of the matrix.
This method will surely work (maybe that is what you meant by filtering) but will not reduce the network's computation time by much, since most of the FLOPs (multiplications and additions) remain.
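As a toy illustration of that first idea (plain NumPy, not the Object Detection API's actual post-processing code):

    import numpy as np

    def softmax(z):
        z = z - z.max()              # for numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = np.array([2.1, 0.3, -1.0, 4.2, 0.7])   # toy pre-softmax scores for 5 classes
    keep = [0, 3]                                    # e.g. only the "person"-related classes

    probs_all = softmax(logits)          # original distribution over every class
    probs_kept = softmax(logits[keep])   # re-normalised over the classes you keep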
As with decision trees, pruning is possible but may reduce performance. I explain below what pruning means; note that the accuracy on your categories may be preserved, since you are not just trimming the network, you are also predicting fewer categories.
Transfer the learning to your problem. See Stanford's computer vision course here. Most of the times I've seen this work well, it was done by keeping the convolutional layers as-is and preparing a medium-size dataset of the objects you'd like to detect.
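As a generic illustration of that pattern (a plain classification sketch with an arbitrarily chosen MobileNetV2 backbone, not the Object Detection API itself):

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import MobileNetV2

    # Keep the convolutional layers as-is (frozen), train only a new head
    base = MobileNetV2(weights="imagenet", include_top=False,
                       pooling="avg", input_shape=(224, 224, 3))
    base.trainable = False

    model = models.Sequential([
        base,
        layers.Dense(2, activation="softmax"),   # e.g. person / not-person
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])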
I can add more theoretical methods if you ask, but the above are the most common and most accurate ones I know of.
Practical methods
Make sure you are serving your TensorFlow model (e.g. with TensorFlow Serving) rather than just running inference from a plain Python script. This can significantly improve performance.
You can export the network's parameters and load them in a faster framework such as CNTK or Caffe. These frameworks work in C++/C# and can run inference much faster. Make sure you load the weights correctly: some frameworks use a different tensor dimension ordering when saving/loading (little/big-endian-like issues).
If your application performs inference on several images, you can distribute the computation across several GPUs. This can also be done in TensorFlow; see Using GPUs.
Pruning a neural network
Maybe this is the most interesting method of adapting big networks for simple tasks. You can see a beginner's guide here.
Pruning means removing parameters from your network, specifically whole nodes/neurons in a decision tree/neural network (respectively). To do that for object detection, the simplest procedure is as follows (a code sketch follows at the end of this section):
Randomly prune neurons from the fully connected layers.
Train one more epoch (or more) with low learning rate, only on objects you'd like to detect.
(optional) Perform the above several times for validation and choose the best network.
The above procedure is the most basic one, but you can find plenty of papers that suggest algorithms to do so. For example
Automated Pruning for Deep Neural Network Compression and An iterative pruning algorithm for feedforward neural networks.
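As a concrete (and deliberately naive) sketch of step 1, assuming a Keras model with named fully-connected layers; real pruning tooling keeps a mask so the zeroed weights stay zero during fine-tuning, which this sketch does not do:

    import numpy as np

    def prune_dense_layer(layer, drop_fraction=0.2, seed=0):
        # Zero out a random subset of a Dense layer's units (columns of its kernel)
        kernel, bias = layer.get_weights()
        rng = np.random.default_rng(seed)
        n_units = kernel.shape[1]
        drop = rng.choice(n_units, size=int(drop_fraction * n_units), replace=False)
        kernel[:, drop] = 0.0
        bias[drop] = 0.0
        layer.set_weights([kernel, bias])

    # Hypothetical usage, assuming a layer named "fc1" in your model:
    # prune_dense_layer(model.get_layer("fc1"), drop_fraction=0.2)
    # ...then fine-tune for an epoch or two with a low learning rate on the target classes.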
Is there a way to learn unsupervised features from a set of images, similar to word2vec or doc2vec, where a neural network is trained and, given a new document, we get its features?
I'm expecting something similar to this example, which shows that a learned NN model can be loaded and used to predict features for new images.
Any simple example of how to run a CNN over images and get their features back would help!
Suppose, in this example:
if I want to get CNN features for all of X_train and X_test ... is there any way?
Also, if we can get weights per layer per image, we can stack them and use them as features. In that case, is there a way to get those?
Considering those features as vectors would make using them for an unsupervised task easier.
If I understood your question correctly, this task is quite common in the deep learning field. In the case of images, what I consider best is a convolutional autoencoder. You may read about this architecture e.g. here:
http://people.idsia.ch/~ciresan/data/icann2011.pdf
A previous version of Keras supported this architecture as one of its core layers, though as of version 1.0 I noticed that it disappeared from the documentation. But it's still quite easy to build it from scratch :)
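For reference, a minimal convolutional autoencoder in present-day Keras (sized for 28x28 grayscale images such as MNIST; the layer widths are arbitrary):

    from tensorflow.keras import layers, models

    inp = layers.Input(shape=(28, 28, 1))
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2)(x)                 # 7x7x8 bottleneck = learned features

    x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

    autoencoder = models.Model(inp, decoded)
    encoder = models.Model(inp, encoded)                # encoder.predict(X) gives the features
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

Train it to reconstruct the images themselves (fit(X, X)), then use the encoder's output as the unsupervised feature vector.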
In non-image cases there are also other approaches, e.g. Restricted Boltzmann Machines.
UPDATE :
When it comes to which activations are best for obtaining new features from a neural network, in my personal experience it depends on the size of the net you use. If the last layer of your network is wide (has a lot of nodes), it might be enough to take only the last layer (because of the number of parameters, considering previous layers as well may harm learning performance). But if (as in the case of some MNIST networks) your last layer is not sufficient for this task, you may also try using the activations of previous layers, or even the activity of the whole net. To be honest, I'm not expecting much improvement in that case, but you may try. I think you should use both approaches: start by taking only the last layer's activations, and then check how your new classifier behaves when you add activations from previous layers.
What I would also strongly advise is getting some insight into what sort of features the network is learning, using t-SNE embeddings of its activations. In many cases I have found this useful, e.g. for checking whether the size of a layer is sufficient. Using t-SNE you can check whether the features obtained from the last layer are good discriminators of your classes. It can also give you good insights about your data and what the network is really learning (along with amazing visualizations :) )
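To make both points concrete, a small sketch of taking the penultimate layer's activations of an already trained Keras model and embedding them with t-SNE (here `model` and the image batch `X` are assumed to exist already, and the choice of layers[-2] as the feature layer is just an example):

    from sklearn.manifold import TSNE
    from tensorflow.keras import models

    # `model` is assumed to be a trained Keras classifier and `X` a batch of images
    feature_extractor = models.Model(inputs=model.input,
                                     outputs=model.layers[-2].output)
    features = feature_extractor.predict(X)             # one feature vector per image

    # 2-D t-SNE embedding of the activations, e.g. for a scatter plot coloured by class
    embedding = TSNE(n_components=2, random_state=0).fit_transform(features)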