Training in Keras with external evaluation function - python

Let me first describe the setup:
We have an autonomous agent in Unity whose decisions are based on the perceived environment (the level) and some pre-defined parameters for value mapping. Our aim is to pre-train the agent's parameters with a DNN. The idea is to define an error metric that evaluates the agent's performance in a Unity simulation (run the level and, for example, measure the deviation from the optimal trajectory, which serves as the ground truth). Given a level as input, the network should learn to output the parameters; the simulation is then run, and the resulting error is passed back to the network, much like an accuracy value, so the network can train on that error/performance signal.
Is there any way to perform the evaluation (the comparison to the ground truth) during training, outside of Keras? Usually one passes X data to the network, trains, and compares the output to the ground truth Y. That works fine for predictions, but I don't want to predict something; I want to measure the deviation from the ground truth inside the simulation.
I know there is Unity ML-Agents, but as far as I can tell, the 'brain' controls the agent at runtime, i.e. it is updated every frame and steers the movement. What I want is to run the whole simulation and only then update the parameters/weights of the network.
Best wishes.

After some talks at my university:
the setup won't work this way since I need to split the process.
I need the parameters of working agents in order to train the network on the level description alone (e.g. a matrix, like a video game description language). To obtain the parametrized agents from the actual level and the ground-truth data (e.g. deviation from the trajectory), one needs to use deep reinforcement learning with a score function, so Unity ML-Agents might be useful there. Afterwards, I can use the resulting parameter settings and the corresponding level data to train a network that yields the desired parameters from the level description alone. A sketch of that second stage is shown below.
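
To illustrate the second stage, here is a minimal Keras sketch. The dimensions, layer sizes and the arrays levels/params are placeholder assumptions; in practice the parameter settings would come from the reinforcement-learning stage.
import numpy as np
from tensorflow import keras

level_dim, n_params = 256, 8          # placeholder sizes for level description and agent parameters

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(level_dim,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_params)      # regress the agent parameters
])
model.compile(optimizer="adam", loss="mse")

# levels/params stand in for the data produced by the RL stage:
# one flattened level description per row, one parameter vector per row
levels = np.random.rand(1000, level_dim)
params = np.random.rand(1000, n_params)
model.fit(levels, params, epochs=50, validation_split=0.2)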

Isolation Forest for Intrusion Detection System

I'm currently trying to use scikit-learn to create a simple anomaly-detection snippet.
The program receives a .csv file, which it then parses into a pandas DataFrame.
The Dataframe has 8 columns: 'Src IP'; 'Dst IP'; 'sPort'; 'dPort'; 'Protocol'; 'Load'; 'Packets'; 'TCP Flag'.
I fit the data into an IsolationForest like so:
from sklearn.ensemble import IsolationForest

iForest = IsolationForest(n_estimators=128, max_samples='auto', max_features=1,
                          behaviour='new', contamination='auto', random_state=None,
                          n_jobs=-1, verbose=0, bootstrap=True)
usecols = ["Src IP", "Dst IP", "sPort", "dPort", "Protocol", "Load", "Packets", "TCP Flags"]
iForest.fit(data[usecols])
And then I get the outliers/anomalies from the iForest:
pred = iForest.predict(data[usecols])
data['anomaly'] = pred
outliers = data.loc[data['anomaly'] == -1]
It all works well; however, my question is:
How can I use Isolation Forest to detect anomalies on the network while being independent of the 'contamination' parameter?
In an IDS, a low false-positive rate is crucial. In my case I am effectively deciding which entries are 'contaminated' by choosing a percentage.
My goal is to have Isolation Forest set the contamination factor automatically: knowing that x.csv is 100% free of contamination, find the percentage of contamination in y.csv.
This should become part of a hybrid IDS that uses both signature analysis and behaviour to detect intrusions based on flow data (NetFlow).
TL;DR: IsolationForest should be fitted on a clean .csv (no contamination) and then detect anomalies on a new set of data (another .csv or piped data). How is that possible using scikit-learn?
If you have a training set that contains only normal data, the contamination parameter loses its meaning (scikit-learn only accepts values in (0, 0.5], so you cannot set it to exactly 0); in that case, ignore the built-in threshold used by predict and work with the raw anomaly scores from decision_function instead. To choose an appropriate threshold for anomalies, use a validation set and plot the histogram of anomaly scores. Without labeled data this can only be done heuristically:
For maximum true positives (at the cost of more false positives), set the threshold based on a budget of the resources you have for looking into positives. You can compute this from the inbound data rate, the expected number of positives from the histogram statistics, and a cost (time/money) per evaluation.
For minimizing False Positives, set the threshold to be a bit outside the existing scores. The assumption is then that the training/validation contains practically no anomalies, and anything new and different is anomalous. Sometimes this is called novelty detection.
If it is possible to determine by looking at the data whether something was a true or false anomaly, I would recommend doing that for the 10-100 items with the highest anomaly scores. This is usually very fast compared to labeling all the data and can help estimate the false-positive rate.
When you put this model into production, your protocol for acting on anomalies should ensure that cases are evaluated and scored as anomaly/not. Then this is your future labeled validation/test data, which you can use to adjust the thresholds.
If you do have anomaly/not-anomaly labels in the validation/test set (but not in training), you can use them to optimize the threshold, maximizing or minimizing the desired metric with a hyper-parameter search.
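
As a rough sketch of the heuristic thresholding described above (iForest is assumed to be fitted on clean data; clean_val and new_data are hypothetical DataFrames with the same columns as above):
import matplotlib.pyplot as plt

# decision_function scores are higher for normal points, lower for anomalous ones
val_scores = iForest.decision_function(clean_val[usecols])
plt.hist(val_scores, bins=50)
plt.xlabel("anomaly score")
plt.show()

# novelty-detection style: flag only points scoring clearly below anything seen on clean data
margin = 0.05                                  # tune by looking at the histogram
threshold = val_scores.min() - margin

new_scores = iForest.decision_function(new_data[usecols])
outliers = new_data[new_scores < threshold]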

How to learn a hidden model variable (task variance) in tensorflow that has no ground truth data

I am working with a Faster R-CNN type of system in which automated focal loss, as described in https://arxiv.org/pdf/1904.09048.pdf, was recently implemented.
In the linked paper, section 3.4 (Regression) states:
We assume that the labels are distributed around the actual correct ground truth by a Gaussian distribution with a variance of σ^2.
and
However, to correctly compute the cumulative distribution function the variance σ^2 of the task needs to be estimated. [...] training the variable σ^2 like a weight of the network.
I do not have data for the task variance σ^2.
I do not fully understand how it can be learned without having data for it.
Should I simply make the variable trainable and assume that the optimizer knows what to do?
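
For what it's worth, here is a minimal TF1-style sketch of the mechanism (not the paper's exact loss): because σ^2 appears in the loss, the optimizer produces a gradient for it just like for any network weight, so no labels for σ^2 are needed. The Gaussian negative log-likelihood used here is only an illustrative assumption, as are the placeholder tensors.
import tensorflow as tf

pred   = tf.placeholder(tf.float32, [None])   # stand-in for the network's regression output
target = tf.placeholder(tf.float32, [None])   # stand-in for the regression ground truth

log_var = tf.Variable(0.0, trainable=True, name="task_log_variance")  # learn log(sigma^2) so it stays positive
var = tf.exp(log_var)

# Gaussian negative log-likelihood: the fit term is traded off against the learned variance,
# so the optimizer also receives a gradient for log_var
loss = tf.reduce_mean(0.5 * tf.square(pred - target) / var + 0.5 * tf.log(var))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)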

Neural network regression with multi-value (probabilistic) functions

I'm a bit of a beginner in the art of machine learning. Here is a rather conceptual question I've been wondering about:
Suppose I have a function X -> Y, say y = x^2. Then, after generating enough (x, y) data, I can train a neural network to perform regression on the function and obtain approximately x^2 for any input x. This is basically what the Universal Approximation Theorem suggests.
Now, my question is: what if I want the inverse relation, Y -> X? In this case X is a multi-valued function of Y; for instance, for y > 0, x = ±sqrt(y). I can swap X and Y as input/output data to train the network, but for any given y there should be a 50/50 chance of getting x = +sqrt(y) or x = -sqrt(y). Of course, if the network is trained with mean squared error, it doesn't know this is a multi-valued function; it just follows SGD on the loss and outputs x = 0, the average of the two branches, for any given y.
Therefore, I wonder whether there is any way a neural network can model a multi-valued function. My guesses would be:
(1) The network could output a collection of, say, the top 2 possible values for X and be trained with cross-entropy. The problem is that if X is a vector or even a matrix (like a bitmap image) instead of a scalar, we don't know how many solutions x a given y has (it could well be an infinite number, i.e. a continuous range), so a 'list' of possible values and probabilities won't work; ideally the network should output values randomly and continuously distributed across the possible X solutions.
(2) Perhaps this falls into the realm of probabilistic neural networks (PNNs)? Do PNNs model functions whose output follows a given probability distribution (continuous or discrete) over vectors? If so, is it possible to implement a PNN with popular frameworks like TensorFlow + Keras?
(Also, note that this is different from a 'multivariate' function, where X and Y may be multi-component vectors, which a traditional network can easily be trained on. The actual problem here is that the output should be a probability distribution over vectors, which a simple feed-forward network doesn't capture, since it has no inherent randomness.)
Thank you for your kind help!
[Image: forward function y = x^2, which can easily be modeled by a regression network]
[Image: inverse relation x = ±sqrt(y); the network cannot capture the two-valued function and outputs the average value x = 0 for any y]
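
A minimal Keras sketch reproducing the collapse-to-the-mean behaviour described above (layer sizes and data ranges are arbitrary assumptions):
import numpy as np
from tensorflow import keras

x = np.random.uniform(-2, 2, size=(10000, 1))
y = x ** 2                                     # forward function y = x^2

# try to learn the inverse mapping y -> x with plain MSE regression
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(y, x, epochs=20, verbose=0)

# predictions land near 0 for any y: the average of +sqrt(y) and -sqrt(y)
print(model.predict(np.array([[1.0], [4.0]])))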
Try to read the following paper:
https://onlinelibrary.wiley.com/doi/abs/10.1002/ecjc.1028
Mifflin's algorithm (or its more general version, SLQP-GS) mentioned in this paper is available here, and the corresponding paper with a description is here.

Implement simple PPO Agent in TensorFlow

I read this good article about the Proximal Policy Optimization algorithm, and now I want to update my vanilla policy gradient (VPG) agent to a PPO agent to learn more about it. However, I'm still not sure how to implement this in real code, especially since I'm using a simple discrete action space.
What I do with my VPG agent is: if there are 3 actions, the network outputs 3 values (out), on which I apply softmax (p) and use the result as a distribution from which to choose one of the actions. For training, I take the states, actions and advantages and use this loss function:
loss = -tf.reduce_sum(advantages * tf.log(ch_action_p_values))
How can I extend this to use PPO with discrete actions? All of the implementations I found work with continuous action spaces. I'm not sure whether I have to change my loss function to the first one used in the article, and I'm not even sure for which probabilities I have to calculate the KL divergence. Are prob_s_a_* and D_KL single values for the whole batch, or one value per sample? How can I calculate them in TF for my agent?
You should be able to do it with discrete actions as well without any problem (I never tried it, though). The probabilities prob_s_a_* you are talking about are the probabilities of drawing the sampled actions under the current policy (one value per sample).
PPO does not use D_KL (the KL divergence), since in their experiments it performed worse (they just clip the probability ratio).
So you only need to add a placeholder for the old log probs and clip the probability ratio, computed as the exp of the new log probs (tf.log(ch_action_p_values)) minus the old ones.
Here is an example (e_clip is the clipping value; in the paper they use 0.2):
vanilla_loss = -tf.reduce_sum(advantages * tf.log(ch_action_p_values))

old_log_probs = tf.placeholder(...)                 # log probs of the sampled actions under the old policy (one per sample)
log_probs = tf.log(ch_action_p_values)              # log probs under the current policy
prob_ratio = tf.exp(log_probs - old_log_probs)      # pi_new(a|s) / pi_old(a|s)
clip_prob = tf.clip_by_value(prob_ratio, 1. - e_clip, 1. + e_clip)
ppo_loss = -tf.reduce_mean(tf.minimum(tf.multiply(prob_ratio, advantages), tf.multiply(clip_prob, advantages)))
Besides the usual advantages and ch_action_p_values, you need to feed the loss with old_log_probs, computed as the log probabilities of the sampled actions under the policy as it was when those actions were collected (i.e. evaluated before running the updates).
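
As a self-contained sketch of how this could look end-to-end for a discrete action space (network size, e_clip, the optimizer and the placeholder names are assumptions, TF1-style graph):
import tensorflow as tf

n_states, n_actions, e_clip = 4, 3, 0.2

states        = tf.placeholder(tf.float32, [None, n_states])
actions       = tf.placeholder(tf.int32,   [None])
advantages    = tf.placeholder(tf.float32, [None])
old_log_probs = tf.placeholder(tf.float32, [None])

hidden = tf.layers.dense(states, 32, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, n_actions)
p = tf.nn.softmax(logits)

# probability of each sampled action under the current policy
idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
ch_action_p_values = tf.gather_nd(p, idx)
log_probs = tf.log(ch_action_p_values)

prob_ratio = tf.exp(log_probs - old_log_probs)
clip_prob = tf.clip_by_value(prob_ratio, 1. - e_clip, 1. + e_clip)
ppo_loss = -tf.reduce_mean(tf.minimum(prob_ratio * advantages, clip_prob * advantages))
train_op = tf.train.AdamOptimizer(1e-3).minimize(ppo_loss)
In use, you would evaluate log_probs once with sess.run on the sampled states/actions and feed that result as old_log_probs for the following update steps on the same batch.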

TensorFlow neural network as an API?

I am in the process of writing an AI for the game 2048. At the moment I can pull the game state from the browser and send moves to the game, but I don't know how to integrate that with TensorFlow. The nature of the project isn't conducive to pre-collected training data, so I was wondering whether it's possible to pass in the state of the game, have the network produce a move, run the move, repeat until the game is over, and then do the training?
This is certainly possible and fairly straightforward. You'll have to set up the model you want to use; I will assume that's been built.
From the perspective of interacting with a TensorFlow model, you just need to marshal your data into numpy arrays and pass them in via the feed_dict argument of sess.run.
To pass an input to tensorflow and get a result you would run something like this:
result = sess.run([logits], feed_dict={x:input_data})
This would perform a forward pass producing the output of the model without making any update. Now you'll take the results and use them to take the next step in the game.
Now that you have the result of your action (e.g. labels) you can perform an update step:
sess.run([update_op], feed_dict={x:input_data, y:labels})
It's as simple as that. Notice that your model will have an optimizer defined (update_op in this example), but if you don't ask TensorFlow to compute it (as in the first code sample), no updates will occur. TensorFlow is all about the dependency graph: the optimizer depends on the output logits, but computing the logits does not depend on the optimizer.
Presumably you'll initialize this model randomly, so the first results will be randomly generated, but each step after that will benefit from the previous updates being applied.
If you're using a reinforcement-learning model, the reward is only produced at some indeterminate time in the future, and how you run the update would differ a little from this example, but the general nature of the problem remains the same.
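
To make the loop concrete, here is a rough TF1-style sketch. The game-interface functions get_game_state, send_move, game_over and the label derivation derive_labels are hypothetical stand-ins for the asker's browser interface, and the network shape is an assumption.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 16])        # 4x4 board, flattened
y = tf.placeholder(tf.int32,   [None])            # target move indices

hidden = tf.layers.dense(x, 64, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 4)               # up / down / left / right
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
update_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    while not game_over():
        input_data = np.reshape(get_game_state(), (1, 16))
        result = sess.run(logits, feed_dict={x: input_data})   # forward pass only, no update
        send_move(int(np.argmax(result)))
        labels = derive_labels(result)            # however the "correct" move is decided
        sess.run(update_op, feed_dict={x: input_data, y: labels})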
