RNN with inconsistent (repeated) padding (using PyTorch's pack_padded_sequence)

Following the example from PyTorch docs I am trying to solve a problem where the padding is inconsistent rather than at the end of the tensor for each batch (in other words, no pun intended, I have a left-censored and right-censored problem across my batches):
import torch
from torch.nn.utils.rnn import pack_padded_sequence
# Data structure example from docs
seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
# Data structure of my problem
inconsistent_seq = torch.tensor([[1,2,0], [0,3,0], [0,5,6]])
lens = ...?
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
How can I solve the problem of masking these padded 0’s when running them through an LSTM using (preferably) PyTorch functionality?

I "solved" this by essentially reindexing my data and padding left-censored data with 0's (which makes sense for my problem). I also injected an extra tensor into the input dimension to track this padding. I then masked the right-censored data using the pack_padded_sequence method from the PyTorch library. I found a good source here:
https://www.kdnuggets.com/2018/06/taming-lstms-variable-sized-mini-batches-pytorch.html
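A minimal sketch of that workaround, under assumptions that are mine rather than stated above: 0 never occurs as a real value, the left-padding flag is appended as a second input feature, and the LSTM sizes are made up for illustration. Right padding is dropped by pack_padded_sequence, while left padding stays in the sequence and is only flagged.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

inconsistent_seq = torch.tensor([[1, 2, 0], [0, 3, 0], [0, 5, 6]])
is_pad = inconsistent_seq == 0  # assumption: 0 is never a real value

# Length of each row = 1-based position of its last real value,
# so trailing (right) padding is excluded. Rows that are all padding
# would get length 0, which pack_padded_sequence rejects.
positions = torch.arange(1, inconsistent_seq.size(1) + 1)
lens = (positions * (~is_pad)).max(dim=1).values

# Flag zeros that come before the first real value (left padding).
seen_real_value = (~is_pad).cumsum(dim=1) > 0
left_pad = (~seen_real_value).float()

# Input features: the value itself plus the left-padding indicator.
features = torch.stack([inconsistent_seq.float(), left_pad], dim=-1)  # (batch, time, 2)

packed = pack_padded_sequence(features, lens, batch_first=True, enforce_sorted=False)
lstm = torch.nn.LSTM(input_size=2, hidden_size=4, batch_first=True)  # sizes are illustrative
packed_out, _ = lstm(packed)
padded_out, out_lens = pad_packed_sequence(packed_out, batch_first=True)
print(lens, padded_out.shape)
```

Whether the network learns to ignore the flagged positions is something to check empirically; the flag only gives it the information to do so.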


What is the difference between an Embedding Layer with a bias immediately afterwards and a Linear Layer in PyTorch

I am reading the "Deep Learning for Coders with fastai & PyTorch" book. I'm still a bit confused as to what the Embedding module does. It seems like a short and simple network, except I can't seem to wrap my head around what Embedding does differently than Linear without a bias. I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix. It does this to in effect select a piece of data? Please point out where I am wrong. Here is one of the simple networks shown in the book.
class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)
    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        return (users * movies).sum(dim=1)
Embedding
[...] what Embedding does differently than Linear without a bias.
Essentially everything. torch.nn.Embedding is a lookup table; it works the same as torch.Tensor but with a few twists (like possibility to use sparse embedding or default value at specified index).
For example:
import torch
embedding = torch.nn.Embedding(3, 4)
print(embedding.weight)
print(embedding(torch.tensor([1])))
Would output:
Parameter containing:
tensor([[ 0.1420, -0.1886, 0.6524, 0.3079],
[ 0.2620, 0.4661, 0.7936, -1.6946],
[ 0.0931, 0.3512, 0.3210, -0.5828]], requires_grad=True)
tensor([[ 0.2620, 0.4661, 0.7936, -1.6946]], grad_fn=<EmbeddingBackward>)
So we took the row at index 1 of the embedding (the second row, since indexing is 0-based). It does nothing more than that.
Where is it used?
Usually when we want to encode some meaning (like word2vec) for each row (e.g. words being close semantically are close in euclidean space) and possibly train them.
Linear
torch.nn.Linear (without bias) also holds a torch.Tensor (weight), but it performs an operation on it (and the input), which is essentially:
output = input.matmul(weight.t())
every time you call the layer (see source code and functional definition of this layer).
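A quick check of that formula; the layer sizes and the random input are made up for illustration:

```python
import torch

linear = torch.nn.Linear(in_features=4, out_features=3, bias=False)
x = torch.randn(2, 4)

# Same computation written out by hand: input times the transposed weight.
manual = x.matmul(linear.weight.t())

same = torch.allclose(linear(x), manual, atol=1e-6)
print(same, manual.shape)
```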
Code snippet
The layer in your code snippet does this:
creates two lookup tables in __init__
the layer is called with input of shape (batch_size, 2):
first column contains indices of user embeddings
second column contains indices of movie embeddings
these embeddings are multiplied and summed returning (batch_size,) (so it's different from nn.Linear which would return (batch_size, out_features) and perform dot product instead of element-wise multiplication followed by summation like here)
This is probably used to train both representations (of users and movies) for some recommender-like system.
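The steps above can be sketched in plain PyTorch (the book's version relies on fastai's Module, so the explicit super().__init__() call and the tiny sizes here are my assumptions). The last lines recompute each score as an ordinary per-row dot product to show that "multiply element-wise, then sum" is the same thing:

```python
import torch
from torch import nn

class DotProduct(nn.Module):
    def __init__(self, n_users, n_movies, n_factors):
        super().__init__()  # needed with plain nn.Module
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:, 0])    # (batch_size, n_factors)
        movies = self.movie_factors(x[:, 1])  # (batch_size, n_factors)
        return (users * movies).sum(dim=1)    # (batch_size,)

model = DotProduct(n_users=5, n_movies=7, n_factors=3)
batch = torch.tensor([[0, 2], [4, 6], [1, 1]])  # columns: user index, movie index
scores = model(batch)

# Same result, one (user, movie) pair at a time.
manual = torch.stack([
    model.user_factors.weight[u] @ model.movie_factors.weight[m]
    for u, m in batch.tolist()
])
print(scores.shape, torch.allclose(scores, manual, atol=1e-6))
```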
Other stuff
I know it does some faster computational version of a dot product
where one of the matrices is a one-hot encoded matrix and the other is
the embedding matrix.
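No, it doesn't compute a dot product at all: torch.nn.Embedding indexes rows of its weight matrix directly. The result is the same as multiplying a one-hot encoded matrix by the embedding matrix, but no such multiplication is performed. Gradients can optionally be sparse (sparse=True), but whether that brings a performance boost depends on the algorithms used (and whether those support sparsity).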
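For what it's worth, the lookup and the one-hot product give the same numbers; the difference is only in how they are computed. A small check, with sizes and indices chosen for illustration:

```python
import torch
import torch.nn.functional as F

embedding = torch.nn.Embedding(3, 4)
idx = torch.tensor([1, 0, 1])

# Direct row lookup, which is what nn.Embedding does.
looked_up = embedding(idx)

# The one-hot formulation from the question, written out explicitly.
one_hot = F.one_hot(idx, num_classes=3).float()  # (3, 3)
via_matmul = one_hot @ embedding.weight          # (3, 4)

equal = torch.allclose(looked_up, via_matmul)
print(equal)
```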

TensorFlow Federated: How can I write an Input Spec for a model with more than one input

I'm trying to make an image captioning model using the federated learning library provided by tensorflow, but I'm stuck at this error
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1.
this is my input_spec:
input_spec=collections.OrderedDict(x=(tf.TensorSpec(shape=(2048,), dtype=tf.float32), tf.TensorSpec(shape=(34,), dtype=tf.int32)), y=tf.TensorSpec(shape=(None), dtype=tf.int32))
The model takes image features as the first input and a list of vocabulary as a second input, but I can't express this in the input_spec variable. I tried expressing it as a list of lists but it still didn't work. What can I try next?
Great question! It looks to me like this error is coming out of TensorFlow proper--indicating that you probably have the correct nested structure, but the leaves may be off. Your input spec looks like it "should work" from TFF's perspective, so it seems it is probably slightly mismatched with the data you have.
The first thing I would try--if you have an example tf.data.Dataset which will be passed in to your client computation, you can simply read input_spec directly off this dataset as the element_spec attribute. This would look something like:
# ds = example dataset
input_spec = ds.element_spec
This is the easiest path. If you have something like "lists of lists of numpy arrays", there is still a way for you to pull this information off the data itself--the following code snippet should get you there:
# data = list of list of numpy arrays
input_spec = tf.nest.map_structure(lambda x: tf.TensorSpec(x.shape, x.dtype), data)
Finally, if you have a list of lists of tf.Tensors, TensorFlow provides a similar function:
# tensor_structure = list of lists of tensors
tf.nest.map_structure(tf.TensorSpec.from_tensor, tensor_structure)
In short, I would recommend not specifying input_spec by hand, but rather letting the data tell you what its input spec should be.
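A small sketch of the first option, using zero-filled stand-in arrays with the shapes from the question (the array names, the number of examples, and the batch size are hypothetical). One possible cause of the min_ndim=2 message, and this is a guess on my part, is a missing batch dimension; reading the spec off a batched dataset makes that dimension explicit:

```python
import collections
import numpy as np
import tensorflow as tf

# Stand-in data: 8 examples of image features, caption tokens, and labels.
image_features = np.zeros((8, 2048), dtype=np.float32)
caption_tokens = np.zeros((8, 34), dtype=np.int32)
labels = np.zeros((8,), dtype=np.int32)

ds = tf.data.Dataset.from_tensor_slices(
    collections.OrderedDict(x=(image_features, caption_tokens), y=labels)
).batch(4)

# The spec now carries a leading batch dimension on every leaf.
input_spec = ds.element_spec
print(input_spec)
```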

Initialize TensorFlow CNN model with Numpy weight matrices

I am working on manually converting a pretrained matconvnet model to a tensorflow model. I have pulled the weights/biases from the matconvnet model mat file using scipy.io and obtained numpy matrices for the weights and biases.
Code snippets where data is a dictionary returned from scipy.io:
for i in data['net2']['layers']:
    if i.type == 'conv':
        model.append({'weights': i.weights[0], 'bias': i.weights[1], 'stride': i.stride, 'padding': i.pad, 'momentum': i.momentum, 'lr': i.learningRate, 'weight_decay': i.weightDecay})
...
weights = {
    'wc1': tf.Variable(model[0]['weights']),
    'wc2': tf.Variable(model[2]['weights']),
    'wc3': tf.Variable(model[4]['weights']),
    'wc4': tf.Variable(model[6]['weights'])
}
...
Where model[0]['weights'] is, for example, the 4x4x60 numpy matrix pulled from the matconvnet model for that layer. And this is how I define the placeholder for the 9x9 inputs.
X = tf.placeholder(tf.float32, [None, 9, 9]) #also tried with [None, 81] with a tf.reshape, [None, 9, 9, 1]
Current Issue: I cannot get ranks to match up. I consistently get a ValueError:
ValueError: Shape must be rank 4 but is rank 3 for 'Conv2D' (op: 'Conv2D') with input shapes: [?,9,9], [4,4,60]
Summary
Is it possible to explicitly define a tensorflow model's weights from numpy arrays?
Why is the rank for my weight matrices 4? Should my numpy array be something more like [?, 4, 4, 60], and can I make it that way?
Unsuccessfully Attempted:
Rotating numpy matrices: I know that matlab and python have different indexing (0-based vs 1-based, and row major vs column major). Even though I believe I have converted everything appropriately, I have still experimented with functions like np.rot90() and with changing the 4x4x60 array to 60x4x4.
Using tf.reshape: I have attempted to use tf.reshape on the weights after wrapping them with a tf.Variable wrapper, but I get Variable has no attribute 'reshape'
NOTE:
Please note, I am aware that there are a number of scripts to go from matconvnet to caffe, and caffe to tensorflow (as described here, for example, https://github.com/vlfeat/matconvnet/issues/1021). My question is related to tensorflow weight initialization options:
https://github.com/zoharby/matconvnet/blob/master/utils/convert_matconvnet_caffe.m
https://github.com/ethereon/caffe-tensorflow
I got over this hurdle with tf.reshape(...) (instead of calling weights['wc1'].reshape(...) ). I am still not certain about the performance yet, or if this is a horribly naive endeavor.
UPDATE: After further testing, this approach appears to work, at least functionally (I have created a TensorFlow CNN model that runs and produces predictions that appear consistent with the MatConvNet model). I make no claims about accuracy differences between the two.
I am sharing my code. In my case, it was a very small network, so if you are attempting to use this code for your own matconvnet to tensorflow project, you will likely need many more modifications: https://github.com/melissadale/MatConv2TensorFlow
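For anyone hitting the same rank error: conv2d expects its filter as a rank-4 tensor laid out as [height, width, in_channels, out_channels], and its input as [batch, height, width, channels]. Below is a minimal sketch of the reshape, written against TensorFlow 2 eager mode (the question uses TF1 placeholders) and assuming the 60 in 4x4x60 is the number of output filters over a single input channel. That interpretation is a guess; the random array is only a stand-in for model[0]['weights'].

```python
import numpy as np
import tensorflow as tf

# Stand-ins for the arrays loaded with scipy.io (hypothetical values).
np_weights = np.random.rand(4, 4, 60).astype(np.float32)
np_bias = np.zeros(60, dtype=np.float32)

# Insert the missing in_channels axis: [4, 4, 60] -> [4, 4, 1, 60].
wc1 = tf.Variable(np_weights.reshape(4, 4, 1, 60))
bc1 = tf.Variable(np_bias)

# Inputs need a channel axis too: [batch, 9, 9] -> [batch, 9, 9, 1].
x = tf.zeros([2, 9, 9])
x4d = tf.reshape(x, [-1, 9, 9, 1])

out = tf.nn.conv2d(x4d, wc1, strides=[1, 1, 1, 1], padding="VALID") + bc1
print(out.shape)
```

The stride and padding here are placeholders; the real values would come from the 'stride' and 'padding' entries pulled from the matconvnet layers.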

How to update a sub-tensor inside a tensor in tensorflow?

I'm working with MNIST and I have a tensor of gradients with size [?,28,28,1] and I want to zero out a few of the [28,28,1] sub-tensors inside it, how should I accomplish this?
I know the indices (as a list) where I need to zero out the sub-tensors. I tried something like the code below, but tf.scatter_update can only change variables, not tensors. I also tried stacking the required sub-tensors of zeros and ones, but couldn't build the required result.
dy_dx, = tf.gradients(loss, x_adv)
zeroes = tf.zeros(dy_dx[0].get_shape(), tf.float32)
dy_dx = tf.scatter_update(dy_dx, indices, zeroes)
Thanks!
I'd suggest creating a TensorFlow constant with zeros at the locations you want to zero out and ones everywhere else. Then you could create an op that uses tf.multiply to do elementwise multiplication of the constant and dy_dx. Depending on the structure of your graph, you might need to feed the result to dy_dx in your next call to session.run; you can replace any Tensor with feed data, including variables and constants.
Incidentally, if you just want to apply dropout to the input layer, you can use tf.layers.dropout.
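A rough sketch of the mask idea, written against TensorFlow 2 eager mode and assuming a known batch size of 5 with hypothetical indices [1, 3]; a tensor of ones stands in for the real gradients. The mask has shape [batch, 1, 1, 1] so that broadcasting zeroes whole [28, 28, 1] sub-tensors:

```python
import numpy as np
import tensorflow as tf

batch_size = 5
indices = [1, 3]  # hypothetical: sub-tensors to zero out

# Stand-in for the gradient tensor from tf.gradients(loss, x_adv).
dy_dx = tf.ones([batch_size, 28, 28, 1])

# Ones everywhere, zeros at the chosen batch positions.
mask_np = np.ones((batch_size, 1, 1, 1), dtype=np.float32)
mask_np[indices] = 0.0
mask = tf.constant(mask_np)

# Broadcasting applies each per-example scalar to its whole sub-tensor.
masked = tf.multiply(dy_dx, mask)
print(tf.reduce_sum(masked, axis=[1, 2, 3]).numpy())
```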

Select which tensor to use in middle of TensorFlow graph

In Tensorflow, how would I go about selecting between a python list of Tensors in the middle of my graph as an input to the rest of the graph?
Basically, I have a python list of Tensors that are candidates to be used as inputs in the rest of the graph. I want to select one of them without adding extra dependencies that require all of the Tensors in the list to be computed (I think that would happen if I used tf.cond). How can I select one of them? I can't do it at the python level because I choose the tensor based on a value computed from a placeholder. So for example:
x = tf.placeholder(tf.int32, shape=(num_steps, None))
y = tf.placeholder(tf.int32, shape=(None,))
lengths = tf.placeholder(tf.int32, shape=(None,))
# Pretend there is a bunch of lines of code here
output_index = max_sequence_length = tf.reduce_max(lengths)
final_output = potential_outputs[output_index] # won't work, output_index is Tensor
# Pretend the rest of the model uses final_output
More info if you want it:
I am unrolling an RNN and I want to only unroll to the maximum length of the sequence. When this is less than the number of unrolling steps, there is a lot of wasted computation. dynamic_rnn and static_rnn do not meet my needs, so I am trying to come up with my own custom method of unrolling the graph.
To index in tensorflow use tf.slice.
It should be noted that, based on the code you provided, I don't think you are indexing the outputs correctly with tf.reduce_max, since it returns the actual maximum value along a given axis rather than the position of that maximum (but I'm not sure how your network works). You may be looking for tf.argmax, which returns the index of the maximum value. The issue with this, however, is that TensorFlow does not have a gradient defined for tf.argmax, so that function cannot be a learned part of your network.
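A minimal sketch of the tf.slice route, written against TensorFlow 2 eager mode with made-up candidate tensors and lengths. Note the caveat: stacking the list makes every candidate an input to the slice, so this does not avoid computing the unused ones, which the question was hoping to do.

```python
import tensorflow as tf

# Hypothetical candidates: four tensors of shape (2, 3), filled with 0., 1., 2., 3.
potential_outputs = [tf.fill([2, 3], float(i)) for i in range(4)]
lengths = tf.constant([1, 3, 2], dtype=tf.int32)

# Scalar int32 tensor, as in the question.
output_index = tf.reduce_max(lengths)

# Stack the python list into one tensor so it can be indexed by a tensor.
stacked = tf.stack(potential_outputs)  # (4, 2, 3)

# Take one entry along the first axis, keep the other axes whole.
begin = tf.stack([output_index, 0, 0])
sliced = tf.slice(stacked, begin, [1, -1, -1])  # (1, 2, 3)
final_output = tf.squeeze(sliced, axis=0)       # (2, 3)
print(final_output.numpy())
```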
