I'm learning to build a customized sequence-to-sequence model with Keras, and have been reading code that other people wrote, for example here. I got confused by the constants in the call method. There is the Keras "Note on passing external constants to RNNs", but I'm having trouble understanding what the constants actually do to the model.
I did go through the attention model and the pointer network papers, but maybe I've missed something.
Any reference to the modeling details would be appreciated! Thanks in advance.
Okay, just as a reference in case someone else stumbles across this question: I went through the code in the recurrent.py file. I think get_constants is fetching the dropout mask and the recurrent dropout mask, then concatenating them with the [h, c] states (the order of these four elements is required by the LSTM step method). Beyond that, the original LSTM cell doesn't care anymore, but you can add your own 'constants' (constant in the sense that they won't be learned) to pass from one timestep to the next. All constants are appended to the returned [h, c] states implicitly. In Keon's example, the fifth position of the returned state is the input sequence, and it can be referenced in every timestep by calling states[-1].
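For anyone landing here with a newer Keras version: below is a minimal sketch of my own (toy cell and made-up sizes, not Keon's code) using the RNN + cell API that the Keras note on external constants describes. The constant reaches the cell's call at every timestep but is never trained:

import tensorflow as tf
from tensorflow import keras

class ContextCell(keras.layers.Layer):
    """Toy cell that mixes a fixed 'context' constant into every timestep."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units

    def build(self, input_shape):
        # When constants are used, Keras passes [step_input_shape, constants_shape]
        if isinstance(input_shape, list):
            input_shape = input_shape[0]
        self.w = self.add_weight(shape=(input_shape[-1], self.units), name="w")
        self.u = self.add_weight(shape=(self.units, self.units), name="u")
        self.built = True

    def call(self, inputs, states, constants):
        context = constants[0]  # not learned, identical at every timestep
        h = tf.tanh(tf.matmul(inputs, self.w)
                    + tf.matmul(states[0], self.u)
                    + context)
        return h, [h]

x = keras.Input(shape=(10, 8))      # (timesteps, features)
ctx = keras.Input(shape=(16,))      # the "constant", e.g. an encoded input
y = keras.layers.RNN(ContextCell(16))(x, constants=ctx)
model = keras.Model([x, ctx], y)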
So I've been studying a bit about LSTMs and how to implement them using the Keras API, but I'm confused about what the arguments return_sequences and return_state do exactly. While reading through some books and articles, I see that the cell state and the hidden state are passed as input to the next cell to compute its states. I've also seen examples of how these constructor arguments end up producing different output shapes.
If I understood it correctly, when both are set to True, then when the model is used to predict on a new batch of data it returns not only the output but also the entire sequence produced by the model, as well as the cell states for the respective terms in the output sequence. At first I thought these arguments allowed you to use the entire sequence of hidden states and cell states as input for the next cell, but that would mess up the architecture and overall schema of the network and it wouldn't really be an LSTM anymore, not to mention the dimensionality problems that would arise.
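To make it concrete, here is the small experiment I ran while reading (the sizes are just toy numbers I picked):

import numpy as np
from tensorflow import keras

inp = keras.Input(shape=(5, 3))                     # 5 timesteps, 3 features
outs = keras.layers.LSTM(4, return_sequences=True,
                         return_state=True)(inp)
model = keras.Model(inp, outs)

seq, h, c = model.predict(np.zeros((1, 5, 3)))
print(seq.shape)   # (1, 5, 4)  hidden state at every timestep
print(h.shape)     # (1, 4)     final hidden state (equals seq[:, -1])
print(c.shape)     # (1, 4)     final cell state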
Sorry for the long post, but I tried to make my line of reasoning as clear as possible, hoping someone might shed some light on this topic and tell me whether my understanding is actually correct or not.
I just read through and ran the code here: https://tomaxent.com/2017/04/26/LSTM-by-Example-using-Tensorflow-Text-Generate/
(this guy rips off the following medium.com article, but I can't access medium.com from my work computer): https://medium.com/towards-data-science/lstm-by-example-using-tensorflow-feb0c1968537
From my previous reading, it is my understanding that to train RNNs, we have to 'unwrap' them into feed forward networks (FFN) for a certain number of steps (along with an extra input for the "x at time t"), and set it so that all the weights in the FFN that correspond to a single weight in the RNN are equal.
I'm looking at the code and I don't see any 'unwrapping' step, or even a variable indicating the number of steps for which we want to unwrap.
Is there another way to train an RNN? Am I just missing the line in the code where that variable is defined?
If I am not mistaken, there is no explicit 'unwrapping' step in the code. We generally "unroll" an RNN on paper in order to understand how it works through each time step; in the TensorFlow implementation, the loop over time steps (and the weight sharing between them) is handled by the RNN API itself, e.g. tf.nn.dynamic_rnn iterates over the time dimension of the input while applying the same cell, so the number of steps is simply the length of that dimension.
Now, coming to the TensorFlow implementation, I found this repo: MuhammedBuyukkinaci/TensorFlow-Text-Generator to be very useful, and it might clear up most of your doubts.
Other Useful Links:
Tensorflow-RNN
Basic_Rnn_Cell
Static_RNN Cell
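As a rough illustration (TF 1.x style, with toy shapes of my own choosing): the "number of steps" is just the time dimension of the input tensor, and tf.nn.dynamic_rnn iterates over it internally, reusing the same cell, and therefore the same weights, at every step:

import tensorflow as tf  # TF 1.x API, as in the linked examples

# batch of 2 sequences, 5 timesteps, 8 features per step (made-up sizes)
inputs = tf.placeholder(tf.float32, [2, 5, 8])
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=16)

# No explicit unrolling in user code: dynamic_rnn loops over the
# 5 timesteps of `inputs`, applying the same cell each time.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
print(outputs.shape)   # (2, 5, 16) -- one output per timestep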
TF has tf.variable_scope(), which allows users to access a tf.Variable() anywhere in the code. Basically, every variable in TF is a global variable.
Is there a similar way to access class objects like tf.nn.rnn_cell.LSTMCell() or tf.layers.Dense()? To be more specific, can I create a new class object, say lstm_cell_2 (used for prediction), that uses the same weights and biases as lstm_cell_1 (used during training)?
I am building an RNN to do language modeling. What I am doing right now is to return lstm_cell_1 and then pass it on to the prediction function. This works, but I eventually want to use separate tf.Graph() and tf.Session() objects for training, inference and prediction. Hence the problem of sharing TensorFlow objects.
Also, my lstm_cell is an instance of tf.nn.rnn_cell.MultiRNNCell() which doesn't take a name argument.
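For concreteness, the kind of sharing I mean looks roughly like this (the scope name and sizes below are made up):

import tensorflow as tf  # TF 1.x

def build_lstm(inputs):
    # AUTO_REUSE: the first call creates the variables, later calls
    # pick up the same weights instead of creating new ones.
    with tf.variable_scope("lm_rnn", reuse=tf.AUTO_REUSE):
        cell = tf.nn.rnn_cell.MultiRNNCell(
            [tf.nn.rnn_cell.LSTMCell(128) for _ in range(2)])
        outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    return outputs, state

train_out, _ = build_lstm(tf.placeholder(tf.float32, [32, 20, 64]))  # training
pred_out, _ = build_lstm(tf.placeholder(tf.float32, [1, 1, 64]))     # prediction, same weights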
Thanks
You can share the final state, which is one of the outputs of, for example, tf.nn.dynamic_rnn: you get outputs, cell_final_state, and that state can be shared. You probably know about cell_final_state.c and cell_final_state.h; you can set that final state as the initial state of another run. Let me know whether that answers your question.
Just noticed it's such an old question. Hopefully this helps future readers.
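In code, the idea is roughly this (a sketch with made-up shapes, using tf.nn.dynamic_rnn and an LSTMStateTuple):

import tensorflow as tf  # TF 1.x

cell = tf.nn.rnn_cell.LSTMCell(64)
inputs = tf.placeholder(tf.float32, [8, 10, 32])   # batch, time, features

# Placeholders for the c and h parts of a state carried over from an earlier run.
c_in = tf.placeholder(tf.float32, [8, 64])
h_in = tf.placeholder(tf.float32, [8, 64])
carried_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=carried_state)
# final_state.c and final_state.h can be fetched with session.run and fed
# back into c_in / h_in when processing the next chunk of the sequence.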
I am going through a couple of TensorFlow examples that use LSTM cells, and I am trying to understand the purpose of the initial_state variable that is used in one implementation but not in the other, for some unknown reason.
For example, the PTB example uses it as:
self._initial_state = cell.zero_state(config.batch_size, data_type())
state = self._initial_state
where it represents hidden state transitions and is used to keep the hidden state intact during batch training. This variable should naturally be zeroed between epochs. And yet some recurrent Bi-LSTM models don't use initial_state at all, which makes you think that either TensorFlow somehow handles it behind the scenes or it isn't necessary at all, hence the confusion. So, why do some recurrent models use it and others don't? In Torch, for example, the same mechanism is as simple as:
local params, grad_params = model:getParameters()
-- start training loop
while epoch < max_epoch do
  for mini_batch in training_data do
    (...)
    grad_params:zero()
  end
end
The hidden state is handled by the framework, with no need for all that really clunky bookkeeping, or am I missing something here? Can you please explain how it works in TensorFlow?
As I understand it, this appears to be a setup specific to the TensorFlow PTB model, which is supposed to run not only with a single LSTM cell but with several stacked ones (who would even try to train it with more than 2 cells, I wonder). For that it needs to keep track of the c and h tensors of each cell across batches, and thus the _initial_state variable. It is also supposed to run in parallel over several GPUs, continue if interrupted, etc. And that is why the PTB example code looks ugly and overengineered to a newcomer.
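Stripped of the multi-GPU and checkpointing machinery, the state handling in the PTB code boils down to roughly this pattern (simplified, with made-up sizes and random data standing in for the real batch loop):

import numpy as np
import tensorflow as tf  # TF 1.x, in the spirit of the PTB example

batch_size, num_steps, features = 20, 35, 128
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, features])

cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(200) for _ in range(2)])
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         initial_state=initial_state)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = sess.run(initial_state)            # zeroed at the start of an epoch
    for _ in range(5):                         # stand-in for the batch loop
        batch = np.random.rand(batch_size, num_steps, features).astype(np.float32)
        feed = {inputs: batch}
        # carry the (c, h) of every stacked cell over from the previous batch
        for i, (c, h) in enumerate(initial_state):
            feed[c] = state[i].c
            feed[h] = state[i].h
        state, _ = sess.run([final_state, outputs], feed)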
I have a stateful LSTM model. I need to call my own K.function() a few times on the same data so I can measure the uncertainty of the prediction. I have assumed the state of the model won't actually be updated, since that would only happen through the update ops that Keras would normally pass to K.function() via the updates= parameter. Is that a correct assumption?
Correct: passing updates=None or updates=[] to K.function will evaluate only the outputs, so you are fine as long as the ops you pass to K.function() don't make any changes themselves.
This can be seen in the tensorflow backend source code: the call will evaluate just self.outputs, because self.updates_op is going to be an empty op.
I haven't worked with other backends, but I've looked into cntk_backend.py and theano_backend.py sources: they do the same.
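As a quick illustration of that assumption (a sketch with arbitrary sizes, written against the multi-backend Keras API discussed above):

import numpy as np
import keras
from keras import backend as K

# Tiny stateful model just for illustration; sizes are arbitrary.
model = keras.models.Sequential([
    keras.layers.LSTM(8, stateful=True, batch_input_shape=(1, 5, 3)),
    keras.layers.Dense(1),
])

x = np.random.rand(1, 5, 3).astype("float32")

# No `updates` passed: only the outputs are evaluated, so the stateful
# LSTM's internal state stays untouched across repeated calls.
predict_fn = K.function(model.inputs, model.outputs)
repeated = [predict_fn([x])[0] for _ in range(10)]  # e.g. for uncertainty estimates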