I'm trying to build a neural network using pytorch-nlp (https://pytorchnlp.readthedocs.io/en/latest/).
My intent is to build a network like this:
Embedding layer (uses the standard PyTorch layer and the from_pretrained method)
Encoder with LSTM (also uses the standard nn.LSTM)
Attention mechanism (uses torchnlp.nn.Attention)
Decoder with LSTM (same structure as the encoder)
Standard linear layer
I'm encountering a major problem with the dimensions of the input sentences (each word is a vector), but most importantly with the attention layer: I don't know how to declare it, because I would need the exact dimensions of the encoder output, yet the sequences have varying dimensions (sentences have different numbers of words).
I've tried looking at torch.nn.utils.rnn.pad_packed_sequence and torch.nn.utils.rnn.pack_padded_sequence, since they're supported by LSTM, but I cannot find the solution.
Can anyone help me?
EDIT
I thought about padding all sequences to a specific dimension, but I don't want to truncate longer sequences because I want to keep all the information.
You are on the right track with padding all sequences to a specific dimension. You will have to pick a dimension that is larger than "most" of your sentences, but you will need to cut off some sentences. This blog article should help.
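For what it's worth, here is a minimal sketch of that workflow (all dimensions are made up for illustration): pad the batch, pack it so the LSTM ignores the padding, then unpack it; the attention layer can then be declared from the hidden size, which does not depend on sentence length:
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

seqs = [torch.randn(n, 50) for n in (7, 5, 3)]        # 3 sentences, 50-dim word vectors
lengths = torch.tensor([len(s) for s in seqs])

padded = pad_sequence(seqs, batch_first=True)         # shape (3, 7, 50)
packed = pack_padded_sequence(padded, lengths, batch_first=True)

lstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True)
out, _ = lstm(packed)
out, _ = pad_packed_sequence(out, batch_first=True)   # shape (3, 7, 64), zero-padded

# torchnlp.nn.Attention is declared with the hidden size (here 64),
# not the sequence length, so varying sentence lengths are fine.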
Now I want to extract image features to compute similarity between images. We can easily get features using a pre-trained VGG19 model in TensorFlow. But the VGG19 model has many layers, and I don't know which layer I should use to get features. Which layer's output is appropriate for this problem?
# I think this is the correct way to extract features
model = tf.keras.applications.VGG19(include_top=True,
                                    weights='imagenet')
input = model.input
output = model.layers[-2].output  # fc2, the last layer before the classifier
extract_model = tf.keras.Model(input, output)
My intuition is that the closer a layer is to the final output, the more powerful its features. But some tutorials say 'use include_top=False to extract features' (e.g. Image Captioning with Attention TensorFlow).
So I don't know which layer I should use. Please help me here in this thread.
include_top=False may be used because the last 3 layers (for that specific model) are fully connected layers, which are not typically good feature vectors. If a model directly outputs a feature vector, then you don't need to drop any layers.
Most people use the last layer for transfer learning, but it may depend on your application. For example, Gatys et al. show that the first few layers of VGG are sensitive to the style of the image and later layers are sensitive to the content.
I would probably try all of them in a hyperparameter search and see which gives the best performance. If by image similarity you mean the similarity of objects contained inside, I would probably start with the last layer.
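If it helps, here is a minimal sketch of that kind of search (the layer names are from the stock Keras VGG19; the particular choice of candidate layers is my assumption and should be tuned):
import tensorflow as tf

base = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
candidates = ['block3_conv4', 'block4_conv4', 'block5_conv4']

# one feature extractor per candidate layer
extractors = {name: tf.keras.Model(base.input, base.get_layer(name).output)
              for name in candidates}

def cosine_similarity(a, b):
    a, b = tf.reshape(a, [-1]), tf.reshape(b, [-1])
    return tf.reduce_sum(a * b) / (tf.norm(a) * tf.norm(b))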
Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the shape of the filter to be some other shape, say a rectangle, circle, or triangle (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited and has few examples. A Python implementation of a convolutional layer, for example by extending tf.keras.layers.Layer, would probably help as well, but it seems that the built-in convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights in which I can force certain entries to be identically zero and make some weights appear in multiple places; then I should be able to build a convolutional layer (and other layers) by hand. How would I do this, and how would I include these variables in training?
Edit2: Let me add some more clarifications. Take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding, so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights at the correct locations in this matrix (so some entries are zero, and some entries are equal, i.e. all 25 weights show up in many locations). I then multiply the input by this matrix to get the output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like, so the standard conv2d is not good enough)?
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However, I don't know if this approach will suffer from performance issues.
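Roughly, a dense version of what I have in mind would look like this (an untested sketch; building the matrix with Python loops is purely for illustration, and I suspect it is exactly the kind of inefficiency I am asking about):
import tensorflow as tf

H = W = 10   # input (and, with 'same' padding, output) spatial size
K = 5        # kernel size
kernel = tf.Variable(tf.random.normal([K, K]))   # the 25 trainable weights

def conv_as_matrix(kernel):
    # Scatter each kernel weight into every (output pixel, input pixel)
    # position it touches; untouched entries stay zero.
    rows, cols, vals = [], [], []
    pad = K // 2
    for oy in range(H):
        for ox in range(W):
            for ky in range(K):
                for kx in range(K):
                    iy, ix = oy + ky - pad, ox + kx - pad
                    if 0 <= iy < H and 0 <= ix < W:
                        rows.append(oy * W + ox)
                        cols.append(iy * W + ix)
                        vals.append(kernel[ky, kx])
    indices = tf.stack([tf.constant(rows, tf.int64),
                        tf.constant(cols, tf.int64)], axis=1)
    return tf.scatter_nd(indices, tf.stack(vals), [H * W, H * W])

x = tf.random.normal([H * W, 1])                 # flattened 10x10 input
y = conv_as_matrix(kernel) @ x                   # gradients flow back to kernel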
Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
# run one training step
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# fetch the first conv layer's filters and re-apply the mask
conv1_filter = g.get_tensor_by_name('conv1:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
EDIT: if you're not familiar with TensorFlow, you might get confused by the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this, you can access the first layer's filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
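In TF2/Keras the same trick can be done without session plumbing, via a kernel constraint, which Keras applies to the weights after every update (my adaptation of the idea above, not the original code):
import tensorflow as tf

class MaskConstraint(tf.keras.constraints.Constraint):
    """Multiplies the kernel by a fixed binary mask after each update."""
    def __init__(self, mask):
        self.mask = tf.cast(mask, tf.float32)
    def __call__(self, w):
        return w * self.mask

# Example: a cross-shaped 3x3 filter (corners forced to zero).
mask = tf.reshape([[0., 1., 0.],
                   [1., 1., 1.],
                   [0., 1., 0.]], [3, 3, 1, 1])   # broadcasts over channels
conv = tf.keras.layers.Conv2D(8, 3, padding='same',
                              kernel_constraint=MaskConstraint(mask))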
Does anybody know a way to access the outputs of the intermediate layers from BERT's hosted models on Tensorflow Hub?
The model is hosted here. I have explored the meta graph and found the only signatures available are "tokens", "tokenization_info", and "mlm". The first two are illustrated in the examples on GitHub, and the masked language model signature doesn't help much. Some models, like Inception, allow you to access all of the intermediate layers, but not this one.
Right now, all I can think of to do is:
Run [i.values() for i in tf.get_default_graph().get_operations()] to get the names of the tensors, find the ones I want (out of thousands), and then
call tf.get_default_graph().get_tensor_by_name(name_of_the_tensor) to access the values, stitch them together, and connect them to my downstream layers.
Anybody know a cleaner solution with TensorFlow?
BERT is one of the latest achievements in the transformer language model area. Unlike its predecessors, BERT achieves a bidirectional architecture using MLM (Masked Language Modeling). This provides better contextualized word/sentence embeddings for a variety of NLP tasks. For general usage, BERT provides SOTA embeddings with its last layer, but for research purposes it is also advised to consider intermediate layers for text representations. The picture below shows the effect of different intermediate-layer choices on various use-cases.
As can be seen from the picture, it is mostly advised to sum the last four layers. Summing the last four layers gives a smaller embedding dimension than concatenating them, and results in only a 0.2% performance difference. Intermediate layers can be extracted with scripts provided in BERT's original GitHub repository, but wiring these scripts into a downstream NLP task requires custom Keras layers. Instead, TensorFlow-Hub provides one-line BERT as a Keras layer. BERT TensorFlow-Hub solutions are updated on a regular basis. The first two versions only provided sentence (pooled_output) or word (sequence_output) embeddings. With v3, BERT now provides intermediate layer information. The link to BERT v3 is provided below.
BERT-LARGE v3 TF-HUB
In the given page, section named "Advanced topics" states the following information.
The intermediate activations of all L=24 Transformer blocks (hidden layers) are returned as a Python list: outputs["encoder_outputs"][i] is a Tensor of shape [batch_size, seq_length, 1024] with the outputs of the i-th Transformer block, for 0 <= i < L. The last value of the list is equal to sequence_output.
In v3, one can obtain intermediate layer information via encoder_outputs. The intermediate layers are returned as a Python list so that you can carry out concatenation, summation, or other operations on them. Another extension in v3 is that BERT TensorFlow-Hub now provides a pre-processor. BERT takes three inputs: input_word_ids, input_mask, and input_type_ids. The pre-processor takes a string as input and returns the required inputs for BERT.
I haven't had a chance to test this approach, but if it is not very crucial I recommend using last-layer information. BERT is a very powerful (and GPU-dependent) text representation model compared to old lookup-table approaches. A common issue researchers face with BERT is OOM problems on a single GPU; to solve this, use TF2 with memory growth enabled. I will try to test BERT Hub v3 and give more feedback.
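A minimal sketch of the v3 workflow described above (the exact tfhub.dev handles are my assumption; verify the model versions yourself):
import tensorflow as tf
import tensorflow_hub as hub

preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3")

inputs = preprocess(tf.constant(["hello world"]))  # string in, BERT inputs out
outputs = encoder(inputs)

# outputs["encoder_outputs"] is a list of L=24 tensors, each of shape
# [batch_size, seq_length, 1024]; here we sum the last four, as advised above.
summed = tf.add_n(outputs["encoder_outputs"][-4:])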
As I understand it, embedding layers are simply lookup matrices whose weights are learned as part of the optimisation problem.
Suppose, for this example, my dataset contains a single categorical variable, and I would like to auto-encode a sentence of words to itself, to learn a sentence representation.
# example model
inputs = tf.keras.layers.Input(shape=(None,))   # variable-length token ids
embed = tf.keras.layers.Embedding(input_dim=99, output_dim=32)(inputs)
encoder = tf.keras.layers.LSTM(32, return_sequences=True)(embed)
decoder = tf.keras.layers.LSTM(32, return_sequences=True)(encoder)
model = tf.keras.models.Model(inputs, decoder)
The loss will minimise the difference between the embed and decoder outputs. However, since the embeddings are learned as part of the same optimisation problem, I think I will end up with a trivial representation: for example, the embedding matrix maps every word to a vector of all ones (or all zeros), and the decoder always outputs ones, giving me 100% accuracy in training.
What I would like to do is to learn a meaningful representation of categorical variables.
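For concreteness, one common way to rule out that trivial solution is to make the decoder predict the original token ids rather than the embeddings, so an all-ones embedding can no longer score 100% (a hedged sketch, with made-up sizes):
import tensorflow as tf

vocab_size, embed_dim, units = 99, 32, 64

inputs = tf.keras.layers.Input(shape=(None,), dtype='int32')
embed = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)
encoded = tf.keras.layers.LSTM(units, return_sequences=True)(embed)
decoded = tf.keras.layers.LSTM(units, return_sequences=True)(encoded)
logits = tf.keras.layers.Dense(vocab_size)(decoded)   # predict token ids

model = tf.keras.Model(inputs, logits)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(token_ids, token_ids, ...)   # targets are the input ids themselves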
I want to create a Neural Network in Keras for converting handwriting into computer letters.
My first step is to convert a sentence into an array. My array has the shape (1, number of letters, 27). Now I want to input it into my deep neural network and train it.
But how do I input it properly if its dimensions don't fit those my network expects? And how do I get my predict function to give me an output array of shape (1, number of letters, 27)?
It seems like you are attempting Handwriting Recognition or, similarly, Optical Character Recognition (OCR). This is quite a broad field, and there are many ways to proceed. Still, one approach I suggest is the following:
It is commonly known that Neural Networks have fixed-size inputs: if you build one to take, say, inputs of shape (28,28,1), then the model will expect that shape as its input. Therefore, having a dimension in your samples that depends on the number of letters in a sentence (something variable) is not recommended, as you will not be able to train a model that way.
Training such a model becomes possible if you design it to predict one character at a time, instead of a whole sentence that can have different lengths, and then group the predicted characters. The steps to achieve this could be:
Obtain training samples for the characters you wish to recognize (like the MNIST database for example), and design and train your model to predict one character at a time.
Take the image with the writing to classify and pass a Sliding Window over it that matches your expected input size (say a 28x28 window). Then classify each of those windows as a character (a rough sketch of this step follows the list). Instead of a Sliding Window, you could also try isolating your desired features somehow and classifying just those 28x28 segments.
Group the predicted characters somehow so you get words (probably grouping those separated by empty spaces) or do whatever you want with the predictions.
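As a rough sketch of the Sliding Window step (assuming a char_model already trained on 28x28 crops; the window and stride values are placeholders you would tune):
import numpy as np

def classify_line(image, char_model, window=28, stride=14):
    """Slide a window across a grayscale line image of shape (28, width)
    and classify each crop with a pre-trained character model."""
    predictions = []
    for x in range(0, image.shape[1] - window + 1, stride):
        crop = image[:, x:x + window]
        probs = char_model.predict(crop[np.newaxis, ..., np.newaxis], verbose=0)
        predictions.append(int(np.argmax(probs)))
    return predictions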
You can also try searching for tutorials or guides on Handwriting Recognition, like this one, which I have found quite useful. Hope this helps you get on track; good luck.