The difference between RNN Decoder and RNN - python

We are only using the RNN decoder (without an encoder) for text generation. How is an RNN decoder different from a pure RNN?
RNN Decoder in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder
Pure RNN in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn
Thanks for your time

An RNN decoder is part of a sequence-to-sequence mapping model: in combination with an encoder, it takes a sequence as input and generates another sequence as output. In many-to-many sequence learning setups, the input is a sequence of vectors and the output is another sequence of vectors.
Eg: speech-to-text, language translation, etc.
In general, we use a plain RNN for a one-to-one sequence learning setup.
Eg: next-character prediction.
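As a rough illustration (a conceptual sketch, not the contrib API itself), the practical difference at generation time is where the inputs come from: dynamic_rnn consumes an input sequence that is fully known in advance, while a decoder feeds its own previous output back in as the next input. The sketch below assumes a generic cell with the signature output, state = cell(x, state) and an illustrative project function that turns an output back into the next input.

def run_rnn(cell, inputs, state):
    # plain RNN / dynamic_rnn style: the whole input sequence is given up front
    outputs = []
    for x in inputs:
        out, state = cell(x, state)
        outputs.append(out)
    return outputs

def run_decoder(cell, start_token, state, steps, project):
    # decoder style (inference): the next input is derived from the previous output
    outputs, x = [], start_token
    for _ in range(steps):
        out, state = cell(x, state)
        outputs.append(out)
        x = project(out)  # e.g. embed the argmax token and feed it back in
    return outputs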

Related

What are the inputs to the transformer encoder and decoder in BERT?

I was reading the BERT paper and was not clear regarding the inputs to the transformer encoder and decoder.
For learning masked language model (Cloze task), the paper says that 15% of the tokens are masked and the network is trained to predict the masked tokens. Since this is the case, what are the inputs to the transformer encoder and decoder?
Is the input to the transformer encoder this input representation (see image above)? If so, what is the decoder input?
Further, how is the output loss computed? Is it a softmax over only the masked locations? And is the same linear layer used for all masked tokens?
Ah, but you see, BERT does not include a Transformer decoder.
It is only the encoder part, with a classifier added on top.
For masked word prediction, the classifier acts as a decoder of sorts, trying to reconstruct the true identities of the masked words.
Non-masked tokens are not included in the classification task and do not affect the loss.
BERT is also trained on predicting whether the second sentence of a pair actually follows the first or not.
I do not remember how the two losses are weighted.
I hope this draws a clearer picture.
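To make the loss part concrete, here is a hedged PyTorch sketch of a masked-LM loss: it assumes an encoder_output of shape (batch, seq_len, hidden), a single vocabulary-sized linear layer shared across positions, and a boolean masked_positions tensor. The names and sizes are illustrative, not taken from the BERT codebase.

import torch
import torch.nn as nn

vocab_size, hidden = 30522, 768
mlm_head = nn.Linear(hidden, vocab_size)          # the same linear layer for every masked token

def masked_lm_loss(encoder_output, labels, masked_positions):
    logits = mlm_head(encoder_output)             # (batch, seq_len, vocab_size)
    # softmax / cross-entropy is computed only at the masked positions
    masked_logits = logits[masked_positions]      # (num_masked, vocab_size)
    masked_labels = labels[masked_positions]      # (num_masked,)
    return nn.functional.cross_entropy(masked_logits, masked_labels)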

Text classification with torchnlp

I'm trying to build a neural network using pytorch-nlp (https://pytorchnlp.readthedocs.io/en/latest/).
My intent is to build a network like this:
Embedding layer (uses pytorch standard layer and from_pretrained method)
Encoder with LSTM (also uses standard nn.LSTM)
Attention mechanism (uses torchnlp.nn.Attention)
Decoder with LSTM (same as the encoder)
Linear layer standard
I'm encountering a major problem with the dimensions of the input sentences (each word is a vector), and most importantly with the attention layer: I don't know how to declare it, because I need the exact dimensions of the encoder output, but the sequences have varying dimensions (sentences have different numbers of words).
I've tried to look at torch.nn.utils.rnn.pad_packed_sequence and torch.nn.utils.rnn.pack_padded_sequence, since they're supported by LSTM, but I cannot find the solution.
Can anyone help me?
EDIT
I thought about padding all sequences to a specific dimension, but I don't want to truncate longer sequences because I want to keep all the information.
You are on the right track with padding all sequences to a specific dimension. You will have to pick a dimension that is larger than "most" of your sentences, but you will need to cut off some sentences. This blog article should help.
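A minimal sketch of that approach, assuming a list of variable-length tensors sentences (each of shape (num_words, embed_dim)); the fixed length and layer sizes are illustrative, not taken from the original post.

import torch

embed_dim, hidden, max_len = 100, 128, 20
lstm = torch.nn.LSTM(embed_dim, hidden, batch_first=True)

def pad_or_truncate(sent, max_len):
    # cut off sentences longer than max_len, zero-pad shorter ones
    if sent.size(0) >= max_len:
        return sent[:max_len]
    pad = torch.zeros(max_len - sent.size(0), sent.size(1))
    return torch.cat([sent, pad], dim=0)

sentences = [torch.randn(5, embed_dim), torch.randn(9, embed_dim)]    # two sentences of different lengths
batch = torch.stack([pad_or_truncate(s, max_len) for s in sentences])
encoder_out, _ = lstm(batch)   # (batch, max_len, hidden): fixed size, so the attention layer can be declared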

Meaningful State Representations with Autoencoders & Embedding Layers

As I understand it, embedding layers are simply lookup matrices whose weights are learned as part of the optimisation problem.
Suppose, for this example, my dataset contains a single categorical variable. For instance, I would like to autoencode a sentence of words to itself, to learn the sentence representation.
# example model (sequence autoencoder; layer sizes illustrative)
max_len = 20
inputs = tf.keras.layers.Input(shape=(max_len,))                         # token IDs
embed = tf.keras.layers.Embedding(input_dim=99, output_dim=64)(inputs)
encoder = tf.keras.layers.LSTM(64)(embed)                                # sentence representation
repeated = tf.keras.layers.RepeatVector(max_len)(encoder)                # feed it to the decoder at every step
decoder = tf.keras.layers.LSTM(64, return_sequences=True)(repeated)
model = tf.keras.models.Model(inputs, decoder)
The loss will minimise the difference between the embed and decoder outputs.
However, since the embeddings are learned as part of the same optimisation, I think I will end up learning trivial representations, e.g. the embedding matrix maps every word to a vector of ones and the decoder always outputs ones (or even zeros), giving me 100% accuracy in training.
What I would like to do is to learn a meaningful representation of categorical variables.

Use Glove vectors without Embedding layers in LSTM

I want to use GloVe vectors in language modeling. The problem is that if I use an Embedding layer in the model, I can't predict the output vector and match it back to a word. What I mean is: I want to give the GloVe vector representation of my sentences as the input, get vectors out of the LSTM layer, and match them against the GloVe vectors. I want to use the GloVe vectors without an Embedding layer. Can someone propose a method to do it? I am using Keras and Python 3.
What I want is to use the embedding layer as one model (model1), return the output vector, and give it to another LSTM model (model2) as the input, where it gives the index of the word vector.
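A minimal Keras sketch of that setup, assuming the sentences have already been converted to GloVe vectors and padded to a fixed length; the dimensions, layer sizes and loss are illustrative assumptions, not taken from the question.

import tensorflow as tf

max_len, glove_dim = 20, 100
inputs = tf.keras.layers.Input(shape=(max_len, glove_dim))           # GloVe vectors fed directly, no Embedding layer
lstm_out = tf.keras.layers.LSTM(128, return_sequences=True)(inputs)
outputs = tf.keras.layers.Dense(glove_dim)(lstm_out)                 # predict a vector at each step
model = tf.keras.models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
# at prediction time, match each predicted vector to its nearest GloVe vector to recover the word index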

Using trainable word embedding layer with LSTM and dynamic RNN: AdamOptimizer expected float_ref instead of float

I'm using an RNN on sequences of word embeddings to classify sentences. At first I was feeding pre-trained word embeddings and everything worked fine. I made the embeddings matrix a tf.placeholder with dimension (Vocab size, Embedding size) and fed some pre-trained embeddings from GloVe. I also use tf.nn.embedding_lookup to translate my inputs (which are sequences of word IDs) into sequences of embeddings.
Then I wanted to allow the model to train the embeddings as well, so I made the embedding matrix a tf.Variable instead of a placeholder. Now TensorFlow gives me this error -- apparently the AdamOptimizer can't handle the embedding lookup. Any idea what's up or how to fix this?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 0 of node
Adam/update_embeddings/AssignSub was passed float from _recv_embeddings_0:0
incompatible with expected float_ref.
You cannot feed a value to a variable and optimize it at the same time. Instead, you must first run a tf.assign on that variable to initialize it to the fed value, and then run the optimizer. Or, more simply, you can just pass the GloVe vectors as the initializer of the variable and run tf.global_variables_initializer.
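A minimal TF1-style sketch of the initializer approach, assuming a NumPy array glove_matrix of shape (vocab_size, embed_dim); the variable and placeholder names are illustrative.

import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 10000, 100
glove_matrix = np.random.randn(vocab_size, embed_dim).astype(np.float32)   # stand-in for the real GloVe matrix

# trainable variable initialized from GloVe, instead of feeding a placeholder
embeddings = tf.Variable(glove_matrix, name="embeddings", trainable=True)

word_ids = tf.placeholder(tf.int32, shape=[None, None])          # (batch, time) word IDs
inputs = tf.nn.embedding_lookup(embeddings, word_ids)            # (batch, time, embed_dim)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())                  # loads the GloVe values into the variable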
