I notice there are two APIs in TensorFlow concerned with dropout: one is tf.nn.dropout, the other is tf.layers.dropout. I just wonder what the purpose of tf.nn.dropout is.
According to https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf, there should be a parameter to distinguish between the training and testing stages. I see that tf.layers.dropout provides the proper behavior, so why is there another function, tf.nn.dropout? Does anyone have any idea? Thanks.
tf.layers.dropout uses the tf.nn.dropout function internally.
tf.layers.dropout might be useful if you just want a higher-level abstraction and do not want to control every facet of the dropout; tf.nn.dropout is the lower-level building block it wraps.
Look at the API docs:
1) https://www.tensorflow.org/api_docs/python/tf/layers/dropout
2) https://www.tensorflow.org/api_docs/python/tf/nn/dropout
tf.layers.dropout is a wrapper around tf.nn.dropout, and there's a slight difference in terms: tf.layers.dropout takes a rate of dropout, while tf.nn.dropout takes the probability of keeping the inputs. A direct relation holds between them: keep_prob = 1 - rate.
Also, there's an extra argument training in tf.layers.dropout, which controls whether to return the output in training mode (apply dropout) or in inference mode (return the input untouched).
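A minimal sketch of the difference (assuming TensorFlow 1.x, where both APIs exist; the placeholder shapes are just for illustration):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 128])
    is_training = tf.placeholder(tf.bool, [])

    # tf.layers.dropout: `rate` is the fraction of units to DROP, and
    # `training` switches between applying dropout and passing inputs through.
    y_layers = tf.layers.dropout(x, rate=0.4, training=is_training)

    # tf.nn.dropout: `keep_prob` is the probability of KEEPING a unit,
    # i.e. keep_prob = 1 - rate. The train/test switch is up to you,
    # e.g. feed keep_prob=1.0 at inference time.
    keep_prob = tf.placeholder(tf.float32, [])
    y_nn = tf.nn.dropout(x, keep_prob=keep_prob)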
I'm learning to use PyTorch. If anyone is familiar with PyTorch, could they tell me whether all models can be expressed as nn.Sequential?
I'm asking because some framework features only accept as input a model defined as nn.Sequential.
Yes, but only in the sense that you could wrap a highly complex model as a single step in nn.Sequential. If you want an example of a model that breaks sequential behavior, look up ResNet and its ilk: these require data to skip across "layers". However, even those can be implemented with nn.Sequential by creating special nn.Module classes to handle the residual functionality and stacking those blocks together sequentially, as in the sketch below.
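A minimal sketch of that idea (the layer sizes are arbitrary, just for illustration):

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Wraps a skip connection so the whole block acts as one sequential step."""
        def __init__(self, dim):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(dim, dim),
                nn.ReLU(),
                nn.Linear(dim, dim),
            )

        def forward(self, x):
            # the residual connection: the output depends on the block's own input
            return x + self.body(x)

    model = nn.Sequential(
        nn.Linear(16, 32),
        ResidualBlock(32),
        ResidualBlock(32),
        nn.Linear(32, 10),
    )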
I have a model built by roughly following the tutorial for tf.estimator.BoostedTreesClassifier in the docs. I then exported it using the tf.Estimator.export_saved_model method, as described in the SavedModels from Estimators section of the SavedModel docs. This loads into TensorFlow Serving and answers gRPC and REST requests.
I'd now like to include the explanation factors along with any predictions, or, less ideally, expose them as a second signature on the exported model. tf.estimator._BoostedTreesBase.experimental_predict_with_explanations already implements an appropriate algorithm, as described in the Local Interpretability section of the docs.
I thought it would be possible to 'extend' the existing estimator in a way that would let me expose this method as another served signature. I've thought of several approaches, but only tried the first two so far:
I've Tried
Change which signatures export_saved_model exports
This didn't go very far. The exposed signatures are a little dynamic, but seem to be limited to the train, predict or eval options defined by tensorflow_core.python.saved_model.model_utils.mode_keys.KerasModeKeys.
Just use an eval_savedmodel?
I briefly thought Eval might be what I was looking for and followed some of the getting-started guide for TensorFlow Model Analysis. The further I go down this path, the more it seems that the main difference with an Eval model is how the data is loaded, and that isn't what I want to change.
Subclass the estimator
There are extra caveats with exporting subclassed models. On top of that, an Estimator isn't a Model; it's a model with extra metadata around inputs, outputs, and configuration, so I am not clear whether a subclassed Estimator would even be exportable in the same way a Keras Model is.
I abandoned this subclassing approach without writing much code.
Pull the BoostedTrees Model out of the Estimator
I am not savvy enough to arrange a BoostedTrees model myself using the low-level primitives. The code in the Estimator that sets it up looks fairly complex. It would be nice to leverage that work, but it seems that the Estimator deals in model_fns, which change depending on the train/predict/eval mode, and it isn't clear what their relationship to a Keras Model is.
I wrote a little code for this, but also gave up on it quickly.
What Next?
Given the above dead ends, which angle should I be pursuing further?
Both the low-level export API and the low-level model-building API look like they could get me closer to a solution. The gap between setting up an Estimator and re-creating one using either API seems fairly wide.
Is it possible I could continue using the existing Estimator, but use the low-level export API to create something with an "interpret" signature that calls through to experimental_predict_with_explanations? Or even "predict and interpret" in a single step? Which tutorial will put me on that path?
Tensorflow has some docs for subclassing (tf) Keras Model and Layer.
However, it is unclear which to use for "modules" or "blocks" (e.g. several layers collectively).
Since it is technically several layers, I feel that subclassing Layer would be misleading, and while subclassing Model works, I am unsure whether there are any penalties for doing so.
e.g.
x = inputs
a = self.dense_1(x) # <--- self.dense_1 = tf.keras.layers.Dense(...)
b = self.dense_2(a)
c = self.add([x, b])
which is appropriate to use?
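For concreteness, here is a fuller sketch of the block I mean as a subclass (the names, sizes, and choice of base class are just for illustration):

    import tensorflow as tf

    class ResidualDense(tf.keras.Model):  # or should this be tf.keras.layers.Layer?
        def __init__(self, units):
            super(ResidualDense, self).__init__()
            self.dense_1 = tf.keras.layers.Dense(units, activation='relu')
            self.dense_2 = tf.keras.layers.Dense(units)
            self.add = tf.keras.layers.Add()

        def call(self, inputs):
            x = inputs
            a = self.dense_1(x)
            b = self.dense_2(a)
            return self.add([x, b])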
(Please note that this answer is old; Keras has since changed to allow, and regularly use, subclassing.)
Initially, there is no need to subclass anything with Keras. Unless you have a particular reason for it (something other than building, training, or predicting), you don't subclass for Keras.
Building a Keras model:
Whether you use Sequential (the model is ready; just add layers) or Model (create a graph with layers and finally call Model(inputs, outputs)), you don't need to subclass.
The moment you create an instance of Sequential or Model, you have a fully defined model, ready to use in all situations.
This model can even be used as part of other models; its layers can be easily accessed to get intermediate outputs and create new branches in your graph.
So, I don't see any reason at all to subclass Model, unless you are using some additional framework that requires it (but I don't think so). This seems to come from PyTorch users (because that kind of model building is typical for PyTorch: create a subclass of Module, add layers, and write a call method). But PyTorch doesn't offer the same ease as Keras for getting intermediate results.
The main advantage of using Keras is exactly this: you can easily access layers from blocks and models and instantly start branching from that point without needing to rebuild any call methods or add any extra code. So, when you subclass Model, you just defeat the purpose of Keras, making it all more difficult. A small illustration of this branching is sketched below.
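A minimal sketch of that branching (the layer names and sizes are made up for the example):

    from keras.layers import Input, Dense
    from keras.models import Model

    inputs = Input(shape=(64,))
    hidden = Dense(32, activation='relu', name='hidden')(inputs)
    outputs = Dense(10, activation='softmax')(hidden)
    model = Model(inputs, outputs)

    # Branch from an intermediate layer without rebuilding any call method:
    feature_extractor = Model(inputs, model.get_layer('hidden').output)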
The docs you mentioned say:
Model subclassing is particularly useful when eager execution is enabled since the forward pass can be written imperatively.
I don't really understand what "imperatively" means, and I don't see how it would be easier than just building a model with regular layers.
Another quote from the docs:
Key Point: Use the right API for the job. While model subclassing offers flexibility, it comes at a cost of greater complexity and more opportunities for user errors. If possible, prefer the functional API.
Well... it's always possible.
Subclassing layers
Here, there may be good reasons to do so. And these reasons are:
You want a layer that performs custom calculations that are not available with regular layers
This layer must have persistent weights.
If you don't need both of the things above, you don't need to subclass a layer. If you just want custom calculations without weights, a Lambda layer is enough, as sketched below.
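A short sketch of both cases (the layer and weight names are invented for illustration):

    import keras.backend as K
    from keras.layers import Lambda, Layer

    # Custom calculation WITHOUT weights: a Lambda layer is enough.
    l2_normalize = Lambda(lambda x: K.l2_normalize(x, axis=-1))

    # Custom calculation WITH persistent weights: subclass Layer.
    class ScaleLayer(Layer):
        def build(self, input_shape):
            self.alpha = self.add_weight(name='alpha', shape=(1,),
                                         initializer='ones', trainable=True)
            super(ScaleLayer, self).build(input_shape)

        def call(self, inputs):
            return inputs * self.alpha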
I am new to TensorFlow. I have managed to build a graph that uses LSTMs to train a basic model with a BasicLSTMCell, based on the TensorFlow tutorial.
But I need to make it faster. I have seen a comparison here and, since I do not have an Nvidia GPU, LSTMBlockFusedCell seems to be the best option. I had a look at the documentation and noticed that the signatures of the __init__() and __call__() functions are different. Specifically, I am worried about the cell_clip parameter in __init__() and the sequence_length parameter in __call__(). What is more, the inputs tensor is of shape [time_len, batch_size, input_size]; isn't that different from that of the basic cell ([batch_size, time_len, input_size])? I do not want to use peepholes, so I will leave use_peephole as False (the default).
Could someone explain whether there are any other differences (apart from the performance improvement) between BasicLSTMCell and LSTMBlockFusedCell, and how to properly set the parameters mentioned above to achieve the same result as the original?
The documentation for LSTMBlockCell says it should be drop-in compatible with LSTMCell, so the same arguments should have the same meaning.
Whether the input tensor is batch-major or time-major is unrelated to the cell and related instead to the dynamic_rnn / static_rnn you're using. If you need to feed batch-major data to the fused cell, you can transpose, as in the sketch below.
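A minimal sketch (assuming TensorFlow 1.x with tf.contrib available; the shapes and names are illustrative):

    import tensorflow as tf

    num_units, max_time, input_size = 128, 50, 32

    # batch-major inputs, as typically used with tf.nn.dynamic_rnn:
    inputs = tf.placeholder(tf.float32, [None, max_time, input_size])
    seq_len = tf.placeholder(tf.int32, [None])

    # the fused cell expects time-major inputs: [time_len, batch_size, input_size]
    inputs_tm = tf.transpose(inputs, [1, 0, 2])

    cell = tf.contrib.rnn.LSTMBlockFusedCell(num_units)
    outputs_tm, final_state = cell(inputs_tm, dtype=tf.float32,
                                   sequence_length=seq_len)

    # back to batch-major if the rest of the graph expects it
    outputs = tf.transpose(outputs_tm, [1, 0, 2])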
I've been trying to build a model using 'Deep Q-Learning' where I have a large number of actions (2908). After some limited success using standard DQN (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), I decided to do some more research, because I figured the action space was too large for effective exploration.
I then discovered this paper: https://arxiv.org/pdf/1512.07679.pdf, where they use an actor-critic model and policy gradients, which then led me to https://arxiv.org/pdf/1602.01783.pdf, where they use policy gradients to get much better results than DQN overall.
I've found a few sites where policy gradients have been implemented in Keras, https://yanpanlau.github.io/2016/10/11/Torcs-Keras.html and https://oshearesearch.com/index.php/2016/06/14/kerlym-a-deep-reinforcement-learning-toolbox-in-keras/, but I'm confused about how they are implemented. In the former (and when I read the papers), it seems that instead of providing an input-output pair for the actor network, you provide the gradients for all the weights and then use the network to update them, whereas in the latter they just calculate an input-output pair.
Have I just confused myself? Am I just supposed to be training the network by providing an input-output pair and using the standard 'fit', or do I have to do something special? If it's the latter, how do I do it with the Theano backend? (The examples above use TensorFlow.)
TL;DR
Learn how to implement custom loss functions and gradients using Keras.backend. You will need them for more advanced algorithms, and it's actually much easier once you get the hang of it.
One CartPole example using keras.backend is https://gist.github.com/kkweon/c8d1caabaf7b43317bc8825c226045d2 (it uses the TensorFlow backend, but the Theano version should be very similar if not the same).
Problem
When playing,
the agent needs a policy, which is basically a function that maps a state to a probability for each action. The agent then chooses an action according to its policy.
i.e., policy = f(state)
When training,
Policy Gradient does not have a conventional loss function. Instead, it tries to maximize the expected return of rewards. To do that, we need to compute the gradients of log(action_prob) * advantage, where:
advantage is a function of rewards.
advantage = f(rewards)
action_prob is a function of states and action_taken. For example, we need to know which action we took so that we can update the parameters to increase/decrease the probability of that action.
action_prob = sum(policy * action_onehot) = f(states, action_taken)
I'm assuming something like this
policy = [0.1, 0.9]
action_onehot = action_taken = [0, 1]
then action_prob = sum(policy * action_onehot) = 0.9
Summary
We need two functions:
update function: f(state, action_taken, reward)
choose action function: f(state)
You already know that it's not as easy to implement as typical classification problems, where you can just model.compile(...) -> model.fit(X, y).
However, in order to fully utilize Keras, you should be comfortable defining custom loss functions and gradients. This is basically the same approach the author of the former one took.
You should read more of the documentation for the Keras functional API and keras.backend.
Plus, there are many, many kinds of policy gradients.
The former one is called DDPG, which is actually quite different from regular policy gradients.
The latter one is a traditional REINFORCE policy gradient (pg.py), based on Karpathy's policy gradient example. It's very simple, though: for example, it assumes only one action. That's why it could be implemented using model.fit(...) instead. A REINFORCE-style update built with keras.backend is sketched below.
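A minimal sketch of the two functions from the summary above, built with keras.backend (assuming Keras 2.1+ with the TensorFlow backend; the network sizes are arbitrary):

    import keras.backend as K
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.optimizers import Adam

    state_dim, n_actions = 4, 2  # hypothetical sizes

    # "choose action" function: state -> action probabilities
    states = Input(shape=(state_dim,))
    hidden = Dense(16, activation='relu')(states)
    probs = Dense(n_actions, activation='softmax')(hidden)
    model = Model(states, probs)

    # "update" function: maximize log(action_prob) * advantage
    action_onehot = K.placeholder(shape=(None, n_actions))
    advantage = K.placeholder(shape=(None,))

    action_prob = K.sum(model.output * action_onehot, axis=1)
    loss = -K.mean(K.log(action_prob + 1e-10) * advantage)

    updates = Adam().get_updates(loss=loss, params=model.trainable_weights)
    train_fn = K.function(inputs=[model.input, action_onehot, advantage],
                          outputs=[loss], updates=updates)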
References
Schulman, "Policy Gradient Methods", http://rll.berkeley.edu/deeprlcourse/docs/lec2.pdf
The seemingly conflicting implementations you encountered are both valid. They are two equivalent ways to implement policy gradients.
In the vanilla implementation, you calculate the policy gradients from the rewards yourself and directly update the weights in the direction of those gradients. This requires you to do the steps described by Mo K.
The second option is arguably a more convenient implementation for autodiff frameworks like Keras/TensorFlow. The idea is to implement an input-output (state-action) mapping as in supervised learning, but with a loss function whose gradient is identical to the policy gradient. For a softmax policy, this simply means predicting the 'true action' and multiplying the (cross-entropy) loss with the observed returns/advantage. Aleksis Pirinen has some useful notes about this [1].
The modified loss function for option 2 in Keras looks like this:
import keras.backend as K

def policy_gradient_loss(Returns):
    def modified_crossentropy(action, action_probs):
        # weight the per-sample cross-entropy by the observed returns
        cost = K.categorical_crossentropy(action, action_probs,
                                          from_logits=False, axis=1) * Returns
        return K.mean(cost)
    return modified_crossentropy
where action is the true action of the episode (y) and action_probs is the predicted probability (ŷ). This is based on another Stack Overflow question [2].
References
[1] https://aleksispi.github.io/assets/pg_autodiff.pdf
[2] Make a custom loss function in keras