Evolve a Multi-Layer Perceptron using genetic algorithms - python

I want to evolve a neural network using a genetic algorithm in order to approximate mathematical functions (linear, cubic, sine, tanh, etc). The requirement is that the NN should be evolved in terms of topology, weights and activation function of the neurons. The evaluation function is one that will calculate the error between the output of the NN and the function that should be approximated (mean squared error).
My main concern is that I want to be able to have absolute control over the representation/encoding/genome as I want to evolve layers of neurons, their weights and their activation functions at the same time, so I have to come up with a chromosome that incorporates all of these things.
What tools would you recommend? I'm looking at DEAP for evolving the neural network, and I was thinking of Pylearn2 for implementing the best configuration the GA produces.

The state of the art for neural nets is Torch7, which is written in Lua. This is what Facebook AI and Google DeepMind use (as well as my lab). But I guess Lua is not the best language for implementing the sampling algorithms.
The alternative state-of-the-art framework, called Theano, is written in Python, but I have a strong preference for Torch. Either way, both of them might be too much for your task.
An easier alternative could be Autograd, an automatic differentiation library for NumPy (https://github.com/HIPS/autograd), which lets you train your neural net while keeping absolute control over it. There are several examples in its documentation.
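To give an idea of what that looks like, here is a minimal sketch of fitting a one-hidden-layer net to sin(x) with Autograd; the layer size, learning rate and iteration count are illustrative choices, not anything Autograd prescribes.

```python
import autograd.numpy as np            # drop-in NumPy wrapper that records operations
import autograd.numpy.random as npr
from autograd import grad

rs = npr.RandomState(0)

def init_params(n_hidden=16):
    # one hidden layer: input (1) -> hidden -> output (1)
    return [rs.randn(1, n_hidden) * 0.1, np.zeros(n_hidden),
            rs.randn(n_hidden, 1) * 0.1, np.zeros(1)]

def predict(params, x):
    w1, b1, w2, b2 = params
    h = np.tanh(np.dot(x, w1) + b1)
    return np.dot(h, w2) + b2

def mse(params, x, y):
    return np.mean((predict(params, x) - y) ** 2)

x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)

params = init_params()
grad_mse = grad(mse)                   # Autograd derives the gradient function for us
for step in range(2000):
    params = [p - 0.05 * g for p, g in zip(params, grad_mse(params, x, y))]
```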
Finally, genetic algorithms belong to the broader family of Monte Carlo / stochastic search methods, and there are many resources on implementing such things in Python (e.g. https://codereview.stackexchange.com/questions/41004/organising-code-for-a-genetic-algorithm-in-python, or GitHub).
DEAP sounds like a good and popular choice; PyEvolve is outdated.
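For a flavour of what the encoding could look like in DEAP, here is a minimal sketch. It assumes a fixed one-hidden-layer topology and packs the activation choice plus all weights and biases into one flat genome; genes controlling topology (e.g. hidden-layer sizes) could be appended following the same pattern. Every hyperparameter and operator choice below is illustrative.

```python
import random
import numpy as np
from deap import algorithms, base, creator, tools

N_HIDDEN = 8
ACTIVATIONS = [np.tanh,
               lambda z: np.maximum(z, 0.0),            # ReLU
               lambda z: 1.0 / (1.0 + np.exp(-z))]      # sigmoid
# genome = [activation gene, w1 (8), b1 (8), w2 (8), b2 (1)]
GENOME_LEN = 1 + 3 * N_HIDDEN + 1

def decode(ind):
    act = ACTIVATIONS[int(abs(ind[0]) * len(ACTIVATIONS)) % len(ACTIVATIONS)]
    w1, b1, w2, b2 = np.split(np.array(ind[1:]),
                              [N_HIDDEN, 2 * N_HIDDEN, 3 * N_HIDDEN])
    return act, w1, b1, w2, b2[0]

def evaluate(ind):
    # mean squared error against the target function (here: sine)
    act, w1, b1, w2, b2 = decode(ind)
    x = np.linspace(-3, 3, 100)
    pred = act(np.outer(x, w1) + b1) @ w2 + b2
    return (np.mean((pred - np.sin(x)) ** 2),)          # DEAP fitnesses are tuples

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("gene", random.uniform, -1, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.gene, GENOME_LEN)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxBlend, alpha=0.5)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.3, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=100)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.3,
                             ngen=50, verbose=False)
best = tools.selBest(pop, 1)[0]
```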
Edit: I forgot to mention Keras (http://keras.io/), which might actually be another good alternative, and is Theano-based.

You might also look at Blocks. It's a library built on Theano that manages the computation graph via annotations of variables. It allows richer models (i.e., more complex, more recurrence, more of those weird neural-stack-like structures you daydream about) than libraries like Keras, Pylearn2 and Lasagne, although those libraries might be better if you only want to evolve variations of the classical MLP architecture.


What's the difference between scikit-learn and tensorflow? Is it possible to use them together?

I cannot get a satisfying answer to this question. As I understand it, TensorFlow is a library for numerical computations, often used in deep learning applications, and Scikit-learn is a framework for general machine learning.
But what is the exact difference between them, what is the purpose and function of TensorFlow? Can I use them together, and does it make any sense?
Your understanding is pretty much spot on, albeit very, very basic. TensorFlow is more of a low-level library. Basically, we can think of TensorFlow as the Lego bricks (similar to NumPy and SciPy) that we can use to implement machine learning algorithms, whereas Scikit-learn comes with off-the-shelf algorithms, e.g., algorithms for classification such as SVMs, random forests, logistic regression, and many, many more. TensorFlow really shines if we want to implement deep learning algorithms, since it allows us to take advantage of GPUs for more efficient training. It lets you build machine learning models (and other computations) from a set of simple operators, like "add", "matmul", "concat", etc.
Makes sense so far?
Scikit-Learn is a higher-level library that includes implementations of several machine learning algorithms, so you can define a model object in a single line or a few lines of code, then use it to fit a set of points or predict a value.
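To make the contrast concrete, here is an illustrative side-by-side (not from either answer above): the same logistic regression as a scikit-learn one-liner versus hand-assembled low-level TensorFlow ops.

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

X = np.random.randn(100, 3).astype("float32")
y = (X[:, 0] > 0).astype("float32")

# Scikit-learn: an off-the-shelf algorithm, fitted in one line
clf = LogisticRegression().fit(X, y)

# TensorFlow: the same model built from low-level bricks ("matmul", "add", ...)
w = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(0.0)
with tf.GradientTape() as tape:
    logits = tf.matmul(X, w) + b
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=y[:, None], logits=logits))
grads = tape.gradient(loss, [w, b])    # and you write the training loop yourself
```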
TensorFlow is mainly used for deep learning, while Scikit-Learn is used for more general machine learning.
Here is a link that shows you how to do Regression and Classification using TensorFlow. I would highly suggest downloading the data sets and running the code yourself.
https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
Of course, you can do many different kinds of regression and classification using Scikit-Learn without TensorFlow. I would suggest reading through the Scikit-Learn documentation when you have a chance.
https://scikit-learn.org/stable/user_guide.html
It's going to take a while to get through everything, but if you make it to the end, you will have learned a ton! Finally, you can get the 2,600+ page user guide for Scikit-Learn from the link below.
https://scikit-learn.org/stable/_downloads/scikit-learn-docs.pdf
TensorFlow is a library for constructing neural networks. Scikit-learn contains ready-to-use algorithms. TF can work with a variety of data types: tabular, text, images, audio. Scikit-learn is intended for tabular data.
Yes, you can use both packages. But if you only need a classic multi-layer perceptron, then the MLPClassifier and MLPRegressor available in scikit-learn are a very good choice. I ran a comparison of an MLP implemented in TF vs. scikit-learn: there were no significant differences in accuracy, and the scikit-learn MLP ran about 2 times faster than TF on CPU. You can read the details of the comparison in my blog post.
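For context, roughly equivalent MLP definitions in the two libraries look like this (layer sizes and settings are illustrative, not the ones from the blog post):

```python
import tensorflow as tf
from sklearn.neural_network import MLPClassifier

# Scikit-learn: one estimator object, trained with .fit(X, y)
sk_mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)

# TensorFlow/Keras: the same architecture, assembled layer by layer
tf_mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
tf_mlp.compile(optimizer="adam", loss="binary_crossentropy")
```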
Both are 3rd party machine learning modules, and both are good at it.
Tensorflow is the more popular of the two.
Tensorflow is typically used more in Deep Learning and Neural Networks.
Scikit-learn is for more general machine learning.
And although I don't think I've come across anyone using both simultaneously, no one is saying you can't.

Does Python have a model similar to nnetar in R's forecast package?

R's 'forecast' package has a function, nnetar, which uses a feed-forward neural network with a single hidden layer to forecast time series.
Now I am using Python to do a similar analysis. I want to use a neural network that does not need to be as complex as deep learning; maybe 2 layers and a couple of nodes are good enough for my case.
So, does Python have a simple neural network model that can be used on time series like nnetar? If not, how should I deal with this problem?
Any NN model with one or more hidden layers is a multi-layer perceptron, and it is trivial to extend that to N layers, so any library you pick will support it. My guess is that you are avoiding a complex library like PyTorch/TensorFlow because of its size.
TensorFlow does have TF-Lite, which can work on smaller IoT devices.
Scikit-learn has MLPRegressor, which can train NNs if that is more to your liking; a minimal time-series sketch follows at the end of this answer.
You can always write your own model. There are plenty of examples of this that use NumPy and are plenty fast for CPU computation (a single-hidden-layer NN will, I am guessing, be more memory-bound than compute-bound).
Or use another ML algorithm: single-hidden-layer NNs often will not outperform other, simpler algorithms.
If there are other reasons for not using a standard library like TensorFlow/PyTorch, then you should mention them.
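As promised above, a minimal nnetar-like sketch using scikit-learn's MLPRegressor: build lagged features from the series, fit a small single-hidden-layer net, then forecast one step at a time. The lag count, hidden size and synthetic series are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

N_LAGS = 4

def make_lags(series, n_lags=N_LAGS):
    # row j holds series[j .. j+n_lags-1]; the target is series[j+n_lags]
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

series = np.sin(np.linspace(0, 20, 300)) + 0.1 * np.random.randn(300)
X, y = make_lags(series)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(X, y)

# iterated one-step-ahead forecasting, feeding predictions back in
window = list(series[-N_LAGS:])
forecast = []
for _ in range(10):
    nxt = model.predict([window])[0]
    forecast.append(nxt)
    window = window[1:] + [nxt]
```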

Is it possible to remove categories in a pretrained tensorflow model?

I am currently using the TensorFlow Object Detection API for my human detection app.
I tried filtering in the API itself, which worked, but I am still not satisfied with it because it's slow. So I'm wondering if I could remove the other categories from the model itself to make it faster as well.
If that is not possible, can you please give me other suggestions to make the API faster, since I will be using two cameras? Thanks in advance, and pardon my English :)
Your question addresses several topics related to using pretrained neural network models.
Theoretical methods
In general, you can always neutralize categories by removing the corresponding neurons from the softmax layer and computing a new softmax only over the relevant rows of the weight matrix.
This method will surely work (maybe that is what you meant by filtering), but it will not accelerate the network's computation time by much, since most of the FLOPs (multiplications and additions) will remain.
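As a toy NumPy sketch of that idea (the shapes, class count and kept indices below are made up, not taken from the Object Detection API):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 128))   # stand-in for the penultimate layer's output
W = rng.normal(size=(128, 90))         # stand-in final-layer weights, one column per class
b = np.zeros(90)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

keep = [0, 2, 7]                       # indices of the classes you retain (illustrative)
probs = softmax(features @ W[:, keep] + b[keep])   # softmax over the kept classes only
```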
As with decision trees, pruning is possible but may reduce performance. I explain pruning below; note that the accuracy on your categories may be preserved, since you are not just trimming, you are also predicting fewer categories.
Transfer learning: adapt the network to your problem. See Stanford's computer vision course here. What I have most often seen work well is keeping the convolutional layers as-is and preparing a medium-size dataset of the objects you'd like to detect.
I can add more theoretical methods if you ask, but the above are the most common and effective ones I know of.
Practical methods
Make sure you are serving your TensorFlow model rather than just running inference from a Python script. This can significantly accelerate performance.
You can export the parameters of the network and load them into a faster framework such as CNTK or Caffe. These frameworks are implemented in C++/C# and can run inference much faster. Make sure you load the weights correctly, as some frameworks use a different tensor-dimension order when saving/loading (little/big-endian-like issues).
If your application performs inference on several images, you can distribute the computation across several GPUs. This can also be done in TensorFlow; see Using GPUs.
Pruning a neural network
Maybe this is the most interesting method of adapting big networks for simple tasks. You can see a beginner's guide here.
Pruning means removing parameters from your network: specifically, whole nodes in a decision tree or whole neurons in a neural network. To do this for object detection, the simplest way is as follows:
Randomly prune neurons from the fully connected layers.
Train one more epoch (or more) with a low learning rate, only on the objects you'd like to detect.
(Optional) Perform the above several times for validation and choose the best network.
The above procedure is the most basic one, but you can find plenty of papers that suggest algorithms for doing this, for example Automated Pruning for Deep Neural Network Compression and An iterative pruning algorithm for feedforward neural networks.
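A toy NumPy illustration of step 1 (randomly pruning whole neurons from a fully connected layer); in a real pipeline you would apply this to the graph's variables and then fine-tune as in step 2. The shapes and pruning fraction are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))        # fully connected layer with 128 neurons (columns)

prune_fraction = 0.3
n_prune = int(prune_fraction * W.shape[1])
victims = rng.choice(W.shape[1], size=n_prune, replace=False)

W_pruned = np.delete(W, victims, axis=1)   # drop whole neurons
# note: the next layer's weight matrix must drop the matching input rows,
# and the network is then fine-tuned for an epoch or two at a low learning rate
```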

Can I use SQP(Sequential quadratic programming) in scipy for neural network regression optimization?

As the title says: after training and testing my neural network model in Python,
can I use the SQP function in scipy to optimize my neural network regression problem?
For example, I am using temperature, humidity and wind speed as three input features, predicting the energy usage in some area.
So I use a neural network to model the relationship between these inputs and the output. Now I want to find the point of lowest energy usage and the input features that produce it (i.e. what temperature, humidity and wind speed). This is just an example, so it may sound unrealistic.
As far as I know, not many people use scipy for neural network optimization, but within my constraints, scipy is the most suitable optimization tool I have right now (p.s. I can't use cvxopt).
Can someone give me some advice? I would really appreciate it!
Sure, that's possible, but your question is too broad to answer completely, as all the details are missing.
But: SLSQP is not the right tool!
There is a reason NN training is dominated by first-order methods like SGD and all its variants:
Fast gradient computation that is easy to do in mini-batch mode (not paying for the full gradient; less memory).
A very different convergence theory for stochastic gradient descent, which is usually much better for large-scale problems.
In general: fast iteration speed (e.g. time per epoch), while possibly needing more epochs (for full convergence).
NN training is unconstrained continuous optimization, whereas:
SLSQP is a very general optimizer, able to handle constraints, and you will pay for that (in performance and robustness).
L-BFGS is actually the only such tool I have seen used for this (and it is also available in scipy):
It is a bound-constrained optimizer (no general constraints, unlike SLSQP).
It approximates the inverse Hessian, so memory usage is greatly reduced compared to BFGS and also SLSQP.
Both methods are full-batch methods (as opposed to the online/mini-batch nature of SGD).
They also use line searches or something similar, which results in fewer hyper-parameters to tune: no learning rates!
I think you should stick to SGD and its variants.
If you want to go for the second-order approach: learn from sklearn's implementation, which uses L-BFGS.
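Concretely, scikit-learn exposes L-BFGS as a solver flag on its MLP estimators, which is a reasonable starting point for a small, full-batch problem like yours (the data below is a synthetic stand-in for your features):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # stand-ins for temperature, humidity, wind speed
y = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=500)  # stand-in energy usage

model = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs", max_iter=1000)
model.fit(X, y)
```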

Support Vector Regression with High Dimensional Output using python's libsvm

I would like to ask if anyone has an idea or an example of how to do support vector regression in Python with a high-dimensional output (more than one dimension) using a Python binding of libsvm? I checked the examples, and they all assume the output is one-dimensional.
libsvm might not be the best tool for this task.
The problem you describe is called multivariate regression, and SVMs are not necessarily the best choice for regression problems in general.
You could try something like group lasso (http://www.di.ens.fr/~fbach/grouplasso/index.htm - matlab) or sparse group lasso (http://spams-devel.gforge.inria.fr/ - seems to have a python interface), which solve the multivariate regression problem with different types of regularization.
Support Vector Machines, as a mathematical framework, are formulated in terms of a single prediction variable. Hence most libraries implementing them will reflect this by using a single target variable in their API.
What you could do is train a single SVM model for each target dimension in your data (a minimal sketch follows below).
On the plus side, you can train them in parallel on a cluster, as each model is independent of the others.
On the minus side, the sub-models will share nothing and won't benefit from what each individually discovers in the structure of the input data, and they may need a lot of memory to store, as they have no shared intermediate representation.
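For what it's worth, scikit-learn (whose SVR is itself backed by libsvm) ships exactly this one-model-per-target strategy as a ready-made wrapper; a minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(size=(200, 3))          # three target dimensions

# one independent SVR per target column, fitted in parallel
model = MultiOutputRegressor(SVR(kernel="rbf"), n_jobs=-1).fit(X, Y)
predictions = model.predict(X)         # shape (200, 3)
```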
Variants of SVMs can probably be devised in a multi-task learning setting to learn some common kernel-based intermediate representation suitable for predicting multi-dimensional targets; however, this is not implemented in libsvm AFAIK. Google "multi-task learning SVM" if you want to learn more.
Alternatively, multi-layer perceptrons (a kind of feed-forward neural network) can naturally deal with multi-dimensional outcomes, and hence should be better at sharing intermediate representations of the data across targets, especially if they are deep enough, with the first layers pre-trained in an unsupervised manner using an autoencoder objective.
You might want to have a look at http://deeplearning.net/tutorial/ for a nice introduction to various neural network architectures and practical tools and examples to implement them efficiently.
