Reinforcement learning (Q-learning) evaluation - Python

I am new to reinforcement learning, and I am currently working on a small Q-learning project, but I am a little confused.
1- What is the testing phase of a Q-learning model, and how do we make a prediction (try it on a single, unseen data point) with it? At this point I have created the needed functions for choosing an action, getting the reward, etc., and I was able to run 10,000 episodes, but I believe this is the training phase (a sketch of the train/evaluate distinction appears below).
2- What metrics do we use to say that our model has learned and performed well or not? Something like accuracy in a classification scheme, for example.
Thank you.
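
For the first question, a minimal sketch of the train/evaluate split for tabular Q-learning may help. The environment here is hypothetical (it only exposes reset() and step(action) returning next_state, reward, done), and the episode count is an arbitrary assumption:

import numpy as np

def evaluate(env, Q, n_episodes=100):
    """Testing phase: act greedily on the learned Q-table, with no
    exploration and no learning updates."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        done, total = False, 0.0
        while not done:
            action = int(np.argmax(Q[state]))  # the model's "prediction" for this state
            state, reward, done = env.step(action)
            total += reward
        returns.append(total)
    # A common success metric: average return per episode under the greedy policy.
    return float(np.mean(returns))

For the second question, the usual stand-in for accuracy is exactly that average (possibly discounted) return per episode, tracked during training and measured again on held-out evaluation episodes.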

Related

How to confirm convergence of LSTM network?

I am using an LSTM for time-series prediction with Keras. I am using 3 LSTM layers with dropout=0.3, hence my training loss is higher than my validation loss. To monitor convergence, I plot training loss and validation loss together (plot not shown here).
After researching the topic, I have seen multiple answers, but I have found several contradictory arguments in various places on the internet, which makes me a little confused. I am listing some of them below:
1) An article by Jason Brownlee suggests that the validation and training curves should meet for convergence, and that if they don't, I might be under-fitting the data:
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
2) However, the following answer here suggests that my model has simply converged:
How do we analyse a loss vs epochs graph?
Hence, I am just a bit confused about the whole concept in general. Any help will be appreciated.
Convergence implies you have something to converge to. For a learning system to converge, you would need to know the right model beforehand. Then you would train your model until it was the same as the right model. At that point you could say the model converged! ... but the whole point of machine learning is that we don't know the right model to begin with.
So when do you stop training? In practice, you stop when the model works well enough to do what you want it to do. This might be when validation error drops below a certain threshold. It might just be when you can't afford any more computing power. It's really up to you.
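
A minimal sketch of the "stop when validation error stops improving" criterion, using Keras's EarlyStopping callback; the toy data, layer sizes, and patience value are assumptions, not taken from the question:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping

# Toy data standing in for the real time series (shapes are made up).
X = np.random.rand(200, 20, 1).astype("float32")  # 200 windows of 20 steps
y = np.random.rand(200, 1).astype("float32")

model = Sequential([
    LSTM(32, return_sequences=True, input_shape=(20, 1)),
    Dropout(0.3),
    LSTM(32),
    Dropout(0.3),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop once validation loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=10,
                           restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=500,
          callbacks=[early_stop], verbose=0)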

Applying "reinforcement learning" on a supervised learning model

Is it possible to use "reinforcement learning" or a feedback loop on a supervised model?
I have worked on a machine learning problem using a supervised learning model, more precisely a linear regression model, but I would like to improve the results by creating a feedback loop on the outputs of the prediction, i.e., telling the algorithm whether it made mistakes on some examples.
As far as I know, this is basically how reinforcement learning works: the model learns from positive and negative feedback.
I found out that we can implement supervised learning and reinforcement learning algorithms using PyBrain, but I couldn't find a way to relate the two.
Most (or maybe all) iterative supervised learning methods already use a feedback loop on the outputs of the prediction. In fact, this feedback is very informative, since it provides the exact amount of error on each sample. Think, for example, of stochastic gradient descent, where you compute the error of each sample to update the model parameters.
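
To make this concrete, here is a minimal sketch of that per-sample feedback loop for linear regression; the data, learning rate, and epoch count are all assumptions:

import numpy as np

# Synthetic data with known true weights [1.0, -2.0, 0.5].
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.01
for epoch in range(20):
    for xi, yi in zip(X, y):
        error = xi @ w - yi   # the exact signed error on this sample...
        w -= lr * error * xi  # ...fed back to update the parameters
print(w)  # should approach [1.0, -2.0, 0.5]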
In reinforcement learning, the feedback signal (i.e., the reward) is much more limited than in supervised learning. Therefore, in the typical setup of adjusting some model parameters, if you have a set of input-output pairs (i.e., a training data set), it probably makes no sense to apply reinforcement learning.
If you are thinking of a more specific case/problem, you should be more specific in your question.
Reinforcement Learning has been used to tune hyper-parameters and/or select optimal Supervised Learning Models. There's also a paper on it: "Learning to optimize with Reinforcement Learning".
Reading Pablo's answer, you may want to read up on "backpropagation". It may be what you are looking for.

What kind of tricks can we play to further refine a trained neural network model so that it has a lower objective function value?

I ask this question because many deep learning frameworks, such as Caffe, support model refinement. For example, in Caffe we can use a snapshot to initialize the neural network parameters and then continue training, as the following command shows:
./caffe train -solver solver_file.prototxt -snapshot snap_file.solverstate
In order to further train the model, these are the tricks I can play with:
use a smaller learning rate
change the optimisation method, for example from stochastic gradient descent to the Adam algorithm
Any other tricks I can play with?
P.S.: I understand that reducing the loss function value on the training samples does not mean that we get a better model.
The question is way too broad, I think. However, this is common practice, especially in the case of a small training set. I would rank the possible methods like this:
smaller learning rate
more/different data augmentation
add noise to the training set (related to data augmentation, indeed)
fine-tune on a subset of the training set (sketched below).
The last one is indeed a very powerful method to finalize a model that performs poorly on some corner cases. You can then build a 'difficult' training subset in order to bias the model towards it. I personally use it very often.
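
A minimal sketch of that last trick combined with a smaller learning rate, in Keras; the model, data, error threshold, and learning rates are all assumptions:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Toy regression data standing in for the real training set.
X = np.random.rand(1000, 8).astype("float32")
y = np.random.rand(1000, 1).astype("float32")

model = Sequential([Dense(16, activation="relu", input_shape=(8,)), Dense(1)])
model.compile(optimizer=SGD(learning_rate=0.1), loss="mse")
model.fit(X, y, epochs=10, verbose=0)  # initial training

# Pick the worst-predicted samples as the 'difficult' subset...
errors = np.abs(model.predict(X) - y).ravel()
hard = errors > np.percentile(errors, 90)

# ...then continue training on them with a 10x smaller learning rate.
model.compile(optimizer=SGD(learning_rate=0.01), loss="mse")
model.fit(X[hard], y[hard], epochs=5, verbose=0)

Note that recompiling resets the optimizer state; that is often acceptable for this kind of fine-tuning, but the learning rate can also be lowered in place instead.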

TensorFlow RandomForest vs Deep learning

I am using TensorFlow to train a model which has 1 output for 4 inputs. The problem is one of regression.
I found that when I use RandomForest to train the model, it quickly converges and also runs well on the test data. But when I use a simple neural network for the same problem, the loss (root mean squared error) does not converge; it gets stuck at a particular value.
I tried increasing/decreasing the number of hidden layers and increasing/decreasing the learning rate. I also tried multiple optimizers, and I tried training the model on both normalized and non-normalized data.
I am new to this field, but the literature that I have read so far strongly asserts that the neural network should work categorically better than the random forest.
What could be the reason behind non-convergence of the model in this case?
If your model is not converging, it may mean that the optimizer is stuck in a local minimum of your loss function.
I don't know which optimizer you are using, but try increasing the momentum or even the learning rate slightly.
Another strategy often employed is learning rate decay, which reduces your learning rate by a factor every several epochs. This can also help you avoid getting stuck in a local minimum early in the training phase, while still achieving maximum accuracy towards the end of training.
Otherwise, you could try an adaptive optimizer (Adam, Adagrad, Adadelta, etc.) that takes care of some of the hyperparameter selection for you (a decay example is sketched below).
This is a very good post comparing different optimization techniques.
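
To illustrate the decay strategy, a minimal sketch in tf.keras; the initial rate, decay interval, and decay factor are arbitrary assumptions:

import tensorflow as tf

# Cut the learning rate by a constant factor every fixed number of steps.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=1000,   # decay every 1000 training steps
    decay_rate=0.9,     # multiply the rate by 0.9 each time
    staircase=True,
)
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)
# An adaptive alternative, as suggested above:
# optimizer = tf.keras.optimizers.Adam()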
Deep neural networks need a significant amount of data to perform adequately. Be sure you have lots of training data, or your model will overfit.
A useful rule when beginning to train models is not to start with the more complex methods; begin instead with, for example, a linear model, which you will be able to understand and debug more easily.
In case you continue with the current methods, some ideas:
Check the initial weight values (initialize them from a normal distribution)
As a previous poster said, reduce the learning rate
Do some additional checking on the data: check for NaNs and outliers, as the current models could be more sensitive to noise. Remember: garbage in, garbage out.
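
A quick sanity check along those lines; X stands in for the real training matrix, and the 3-standard-deviation outlier threshold is an arbitrary assumption:

import numpy as np

X = np.random.rand(1000, 4)  # placeholder for the real data

print("any NaN:", np.isnan(X).any())
print("any inf:", np.isinf(X).any())

# Flag rows more than 3 standard deviations from the column mean.
z = (X - X.mean(axis=0)) / X.std(axis=0)
print("outlier rows:", np.where((np.abs(z) > 3).any(axis=1))[0])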

How to do Incremental Learning with Theano?

I have a model which is trained on a large training corpus. I also have a feedback loop which provides me with feedback from the users. The model is built on top of Theano and Python.
How can I incorporate this feedback into my model? Right now I am thinking about two approaches:
Add the mini-batch to the training corpus and train again. This is straightforward, but it will take a lot of time to train.
Use the saved state of the trained model and train just on the mini-batch. This looks promising, but right now I am stuck on how to do it with Theano (see the sketch below).
Can someone help me with the second case?
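
For the second approach, a minimal sketch: keep the parameters in theano.shared variables, pickle their values after the initial training, then restore them and run ordinary update steps on the feedback mini-batch. The tiny linear model, learning rate, and file name here are all assumptions:

import pickle
import numpy as np
import theano
import theano.tensor as T

# Hypothetical single-layer model; the point is only that the parameters
# live in theano.shared variables, which can be saved and restored.
W = theano.shared(np.zeros((4, 1), dtype=theano.config.floatX), name="W")
b = theano.shared(np.zeros(1, dtype=theano.config.floatX), name="b")

x = T.matrix("x")
y = T.matrix("y")
loss = T.mean((T.dot(x, W) + b - y) ** 2)

lr = 0.01  # assumed learning rate
updates = [(p, p - lr * T.grad(loss, p)) for p in (W, b)]
train_step = theano.function([x, y], loss, updates=updates)

# After the initial large-corpus training, save the parameter values...
with open("params.pkl", "wb") as f:
    pickle.dump([W.get_value(), b.get_value()], f)

# ...and later, restore them and continue training on the feedback batch.
with open("params.pkl", "rb") as f:
    w_val, b_val = pickle.load(f)
W.set_value(w_val)
b.set_value(b_val)

x_batch = np.random.rand(8, 4).astype(theano.config.floatX)  # stand-in feedback data
y_batch = np.random.rand(8, 1).astype(theano.config.floatX)
print(train_step(x_batch, y_batch))  # one incremental update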
