Initialize TensorFlow CNN model with Numpy weight matrices

I am working on manually converting a pretrained matconvnet model to a tensorflow model. I have pulled the weights/biases from the matconvnet model mat file using scipy.io and obtained numpy matrices for the weights and biases.
Code snippets where data is a dictionary returned from scipy.io:
for i in data['net2']['layers']:
    if i.type == 'conv':
        model.append({'weights': i.weights[0], 'bias': i.weights[1], 'stride': i.stride, 'padding': i.pad, 'momentum': i.momentum, 'lr': i.learningRate, 'weight_decay': i.weightDecay})
...
weights = {
'wc1': tf.Variable(model[0]['weights']),
'wc2': tf.Variable(model[2]['weights']),
'wc3': tf.Variable(model[4]['weights']),
'wc4': tf.Variable(model[6]['weights'])
}
...
Here model[0]['weights'] is, for example, the 4x4x60 numpy array pulled from the matconvnet model for the first layer. This is how I define the placeholder for the 9x9 inputs:
X = tf.placeholder(tf.float32, [None, 9, 9]) #also tried with [None, 81] with a tf.reshape, [None, 9, 9, 1]
Current Issue: I cannot get the ranks to match up. I consistently get a ValueError:
ValueError: Shape must be rank 4 but is rank 3 for 'Conv2D' (op: 'Conv2D') with input shapes: [?,9,9], [4,4,60]
Summary
Is it possible to explicitly define a tensorflow model's weights from numpy arrays?
Why is the rank for my weight matrices 4? Should my numpy array be something more like [?, 4, 4, 60], and can I make it that way?
Unsuccessfully Attempted:
Rotating numpy matrices: I know that matlab and python have different indexing (0-based vs 1-based, and row-major vs column-major). Even though I believe I have converted everything appropriately, I have still experimented with functions like np.rot90(), changing the 4x4x60 array to 60x4x4.
Using tf.reshape: I have attempted to use tf.reshape on the weights after wrapping them with a tf.Variable wrapper, but I get Variable has no attribute 'reshape'
NOTE:
Please note, I am aware that there are a number of scripts to go from matconvnet to caffe, and caffe to tensorflow (as described here, for example, https://github.com/vlfeat/matconvnet/issues/1021). My question is related to tensorflow weight initialization options:
https://github.com/zoharby/matconvnet/blob/master/utils/convert_matconvnet_caffe.m
https://github.com/ethereon/caffe-tensorflow

I got over this hurdle with tf.reshape(...) (instead of calling weights['wc1'].reshape(...) ). I am still not certain about the performance yet, or if this is a horribly naive endeavor.
UPDATE: After further testing, this approach appears to work, at least functionally: I have created a TensorFlow CNN model that runs and produces predictions that appear consistent with the MatConvNet model. I make no claims about how the accuracies of the two compare.
I am sharing my code. In my case it was a very small network, and if you attempt to use this code for your own matconvnet-to-tensorflow project you will likely need many more modifications: https://github.com/melissadale/MatConv2TensorFlow
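The rank mismatch in the error above arises because tf.nn.conv2d expects a rank-4 input laid out as [batch, height, width, in_channels] and a rank-4 filter laid out as [filter_height, filter_width, in_channels, out_channels]. Below is a minimal NumPy-only sketch of the reshaping, assuming the 4x4x60 array holds 60 single-channel 4x4 filters. Random data stands in for the real MatConvNet weights, and the variable names are illustrative:

```python
import numpy as np

# Stand-in for model[0]['weights'] (assumed layout: height, width, out_channels).
w = np.random.rand(4, 4, 60).astype(np.float32)

# Insert the missing in_channels axis: (4, 4, 60) -> (4, 4, 1, 60).
w4 = np.expand_dims(w, axis=2)

# Stand-in for a batch of five 9x9 single-channel inputs.
x = np.random.rand(5, 9, 9).astype(np.float32)

# Append the channel axis: (5, 9, 9) -> (5, 9, 9, 1).
x4 = x[..., np.newaxis]

print(w4.shape, x4.shape)
```

Arrays shaped this way can be wrapped in tf.Variable (weights) or fed to a [None, 9, 9, 1] placeholder (inputs), so that both operands of the convolution are rank 4.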

Related

RNN with inconsistent (repeated) padding (using Pytorch's Pack_padded_sequence)

Following the example from PyTorch docs I am trying to solve a problem where the padding is inconsistent rather than at the end of the tensor for each batch (in other words, no pun intended, I have a left-censored and right-censored problem across my batches):
# Data structure example from docs
seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
# Data structure of my problem
inconsistent_seq = torch.tensor([[1,2,0], [0,3,0], [0,5,6]])
lens = ...?
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
How can I solve the problem of masking these padded 0’s when running them through an LSTM using (preferably) PyTorch functionality?

I "solved" this by essentially reindexing my data and padding left-censored data with 0's (makes sense for my problem). I also injected an extra tensor into the input dimension to track this padding. I then masked the right-censored data using the pack_padded_sequence method from the PyTorch library. Found a good source here:
https://www.kdnuggets.com/2018/06/taming-lstms-variable-sized-mini-batches-pytorch.html
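One possible reading of this approach, sketched with NumPy only: leading zeros are kept as ordinary time steps and flagged in an extra input channel, while trailing zeros are excluded through per-row lengths. The names PAD, left_pad and features are illustrative, and the sketch assumes 0 never occurs as a genuine observation and that every row has at least one observation:

```python
import numpy as np

PAD = 0  # assumption: 0 marks padding and never occurs as a real observation

seq = np.array([[1, 2, 0],
                [0, 3, 0],
                [0, 5, 6]])
n_steps = seq.shape[1]
observed = seq != PAD

# Index of the first real observation in each row.
first = observed.argmax(axis=1)

# Length up to and including the last real observation; leading pads are
# kept as ordinary time steps, so only trailing padding is excluded.
lens = n_steps - observed[:, ::-1].argmax(axis=1)

# Extra input channel: 1.0 on left-censored (leading pad) steps, else 0.0.
left_pad = (np.arange(n_steps)[None, :] < first[:, None]).astype(np.float32)

# Shape (batch, time, 2): the value and its left-padding indicator.
features = np.stack([seq.astype(np.float32), left_pad], axis=-1)

print(lens)
```

After conversion to tensors, features and lens could be passed to pack_padded_sequence(features, lens, batch_first=True, enforce_sorted=False), so the LSTM skips the trailing padding and sees the left-padding indicator as a second input feature.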

What is the difference between an Embedding Layer with a bias immediately afterwards and a Linear Layer in PyTorch

I am reading the "Deep Learning for Coders with fastai & PyTorch" book. I'm still a bit confused as to what the Embedding module does. It seems like a short and simple network, except I can't seem to wrap my head around what Embedding does differently than Linear without a bias. I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix. It does this to in effect select a piece of data? Please point out where I am wrong. Here is one of the simple networks shown in the book.
class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        return (users * movies).sum(dim=1)

Embedding
[...] what Embedding does differently than Linear without a bias.
Essentially everything. torch.nn.Embedding is a lookup table; it works the same as torch.Tensor but with a few twists (like possibility to use sparse embedding or default value at specified index).
For example:
import torch
embedding = torch.nn.Embedding(3, 4)
print(embedding.weight)
print(embedding(torch.tensor([1])))
Would output:
Parameter containing:
tensor([[ 0.1420, -0.1886, 0.6524, 0.3079],
[ 0.2620, 0.4661, 0.7936, -1.6946],
[ 0.0931, 0.3512, 0.3210, -0.5828]], requires_grad=True)
tensor([[ 0.2620, 0.4661, 0.7936, -1.6946]], grad_fn=<EmbeddingBackward>)
So we took the row at index 1 (the second row) of the embedding. It does nothing more than that.
Where is it used?
Usually when we want to encode some meaning (like word2vec) for each row (e.g. words being close semantically are close in euclidean space) and possibly train them.
Linear
torch.nn.Linear (without bias) also holds a torch.Tensor (weight), but it performs an operation on it (and the input), which is essentially:
output = input.matmul(weight.t())
every time you call the layer (see source code and functional definition of this layer).
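The contrast between the two layers can be sketched with NumPy stand-ins (table plays the role of embedding.weight, and the sizes are arbitrary). Row selection and a one-hot matrix product give the same values, but only the second performs any arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)
table = rng.normal(size=(3, 4))   # plays the role of embedding.weight
idx = np.array([1, 2])            # integer indices, as passed to an Embedding

# Embedding-style access: plain row selection, no arithmetic.
looked_up = table[idx]

# Linear-style access: one-hot encode the indices, then matrix-multiply.
one_hot = np.eye(3)[idx]          # shape (2, 3)
multiplied = one_hot @ table      # shape (2, 4)

print(np.allclose(looked_up, multiplied))
```

Because torch.nn.Linear computes input.matmul(weight.t()), the equivalent Linear weight would be the transpose of this table. The lookup avoids both building the one-hot matrix and the multiplication.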
Code snippet
The layer in your code snippet does this:
creates two lookup tables in __init__
the layer is called with input of shape (batch_size, 2):
first column contains indices of user embeddings
second column contains indices of movie embeddings
these embeddings are multiplied and summed returning (batch_size,) (so it's different from nn.Linear which would return (batch_size, out_features) and perform dot product instead of element-wise multiplication followed by summation like here)
This is probably used to train both representations (of users and movies) for some recommender-like system.
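The forward pass described above can be traced shape by shape with NumPy stand-ins for the two lookup tables. The sizes and index pairs are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_factors = 5, 7, 3      # illustrative sizes
user_factors = rng.normal(size=(n_users, n_factors))
movie_factors = rng.normal(size=(n_movies, n_factors))

# Batch of (user index, movie index) pairs, shape (batch_size, 2).
x = np.array([[0, 6], [4, 2], [1, 1], [3, 0]])

users = user_factors[x[:, 0]]               # (batch_size, n_factors)
movies = movie_factors[x[:, 1]]             # (batch_size, n_factors)
scores = (users * movies).sum(axis=1)       # (batch_size,)

print(scores.shape)
```

Each entry of scores is the dot product of one user vector with one movie vector, which is why the result has one number per row of the batch rather than an out_features axis.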
Other stuff
I know it does some faster computational version of a dot product
where one of the matrices is a one-hot encoded matrix and the other is
the embedding matrix.
No, it doesn't. torch.nn.Embedding can be one-hot encoded and might also be sparse, but depending on the algorithms (and whether they support sparsity) there may or may not be a performance boost.

TensorFlow Federated: How can I write an Input Spec for a model with more than one input

I'm trying to make an image captioning model using the federated learning library provided by tensorflow, but I'm stuck at this error
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1.
this is my input_spec:
input_spec=collections.OrderedDict(x=(tf.TensorSpec(shape=(2048,), dtype=tf.float32), tf.TensorSpec(shape=(34,), dtype=tf.int32)), y=tf.TensorSpec(shape=(None), dtype=tf.int32))
The model takes image features as the first input and a list of vocabulary as a second input, but I can't express this in the input_spec variable. I tried expressing it as a list of lists but it still didn't work. What can I try next?

Great question! It looks to me like this error is coming out of TensorFlow proper, indicating that you probably have the correct nested structure, but the leaves may be off. Your input spec looks like it "should work" from TFF's perspective, so it is probably slightly mismatched with the data you have.
The first thing I would try--if you have an example tf.data.Dataset which will be passed in to your client computation, you can simply read input_spec directly off this dataset as the element_spec attribute. This would look something like:
# ds = example dataset
input_spec = ds.element_spec
This is the easiest path. If you have something like "lists of lists of numpy arrays", there is still a way for you to pull this information off the data itself--the following code snippet should get you there:
# data = list of list of numpy arrays
input_spec = tf.nest.map_structure(lambda x: tf.TensorSpec(x.shape, x.dtype), data)
Finally, if you have a list of lists of tf.Tensors, TensorFlow provides a similar function:
# tensor_structure = list of lists of tensors
tf.nest.map_structure(tf.TensorSpec.from_tensor, tensor_structure)
In short, I would recommend not specifying input_spec by hand, but rather letting the data tell you what its input spec should be.

Multiple issues with axes while implementing a Seq2Seq with attention in CNTK

I'm trying to implement a Seq2Seq model with attention in CNTK, something very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages, which I don't understand. There are many questions here, which are probably interconnected and all stem from some single thing I don't understand.
Note (in case it's important). My input data comes from MinibatchSourceFromData, created from NumPy arrays that fit in RAM, I don't store it in a CTF.
ins = C.sequence.input_variable(input_dim, name="in", sequence_axis=inAxis)
y = C.sequence.input_variable(label_dim, name="y", sequence_axis=outAxis)
Thus, the shapes are [#, *](input_dim) and [#, *](label_dim).
Question 1: When I run the CNTK 204 Tutorial and dump its graph into a .dot file using cntk.logging.plot, I see that its input shapes are [#](-2,). How is this possible?
Where did the sequence axis (*) disappear?
How can a dimension be negative?
Question 2: In the same tutorial, we have attention_axis = -3. I don't understand this. In my model there are 2 dynamic axes and 1 static, so the "third to last" axis would be #, the batch axis. But attention definitely shouldn't be computed over the batch axis.
I hoped that looking at the actual axes in the tutorial code would help me understand this, but the [#](-2,) issue above made this even more confusing.
Setting attention_axis to -2 gives the following error:
RuntimeError: Times: The left operand 'Placeholder('stab_result', [#, outAxis], [128])'
rank (1) must be >= #axes (2) being reduced over.
during creation of the training-time model:
def train_model(m):
    #C.Function
    def model(ins: InputSequence[Tensor[input_dim]],
              labels: OutputSequence[Tensor[label_dim]]):
        past_labels = Delay(initial_state=C.Constant(seq_start_encoding))(labels)
        return m(ins, past_labels) #<<<<<<<<<<<<<< HERE
    return model
where stab_result is a Stabilizer right before the final Dense layer in the decoder. I can see in the dot-file that there are spurious trailing dimensions of size 1 that appear in the middle of the AttentionModel implementation.
Setting attention_axis to -1 gives the following error:
RuntimeError: Binary elementwise operation ElementTimes: Left operand 'Output('Block346442_Output_0', [#, outAxis], [64])'
shape '[64]' is not compatible with right operand
'Output('attention_weights', [#, outAxis], [200])' shape '[200]'.
where 64 is my attention_dim and 200 is my attention_span. As I understand, the elementwise * inside the attention model definitely shouldn't be conflating these two together, therefore -1 is definitely not the right axis here.
Question 3: Is my understanding above correct? What should be the right axis and why is it causing one of the two exceptions above?
Thanks for the explanations!

First, some good news: A couple of things have been fixed in the AttentionModel in the latest master (will be generally available with CNTK 2.2 in a few days):
You don't need to specify an attention_span or an attention_axis. If you don't specify them and leave them at their default values, the attention is computed over the whole sequence. In fact these arguments have been deprecated.
If you do the above, the 204 notebook runs 2x faster, so the 204 notebook does not use these arguments anymore.
A bug has been fixed in the AttentionModel and it now faithfully implements the Bahdanau et al. paper.
Regarding your questions:
The dimension is not negative. We use certain negative numbers in various places to mean certain things: -1 is a dimension that will be inferred once based on the first minibatch, -2 is I think the shape of a placeholder, and -3 is a dimension that will be inferred with each minibatch (such as when you feed variable sized images to convolutions). I think if you print the graph after the first minibatch, you should see all shapes are concrete.
attention_axis is an implementation detail that should have been hidden. Basically attention_axis=-3 will create a shape of (1, 1, 200), attention_axis=-4 will create a shape of (1, 1, 1, 200) and so on. In general anything more than -3 is not guaranteed to work and anything less than -3 just adds more 1s without any clear benefit. The good news of course is that you can just ignore this argument in the latest master.
TL;DR: If you are in master (or starting with CNTK 2.2 in a few days) replace AttentionModel(attention_dim, attention_span=200, attention_axis=-3) with
AttentionModel(attention_dim). It is faster and does not contain confusing arguments. Starting from CNTK 2.2 the original API is deprecated.

Select which tensor to use in middle of TensorFlow graph

In Tensorflow, how would I go about selecting between a python list of Tensors in the middle of my graph as an input to the rest of the graph?
Basically, I have a python list of Tensors that are candidates to be used as inputs in the rest of the graph. I want to select one of them without adding extra dependencies that require all of the Tensors in the list to be computed (I think that would happen if I used tf.cond). How can I select one of them? I can't do it at the python level because I choose the tensor based on a value computed from a placeholder. So for example:
x = tf.placeholder(tf.int32, shape=(num_steps, None))
y = tf.placeholder(tf.int32, shape=(None,))
lengths = tf.placeholder(tf.int32, shape=(None,))
# Pretend there is a bunch of lines of code here
output_index = max_sequence_length = tf.reduce_max(lengths)
final_output = potential_outputs[output_index] # won't work, output_index is Tensor
# Pretend the rest of the model uses final_output
More info if you want it:
I am unrolling an RNN and I want to only unroll to the maximum length of the sequence. When this is less than the number of unrolling steps, there is a lot of wasted computation. Dynamic_rnn and static_rnn do not meet my needs, so I am trying to come up with my own custom method of unrolling the graph.

To index in TensorFlow, use tf.slice.
Note that, based on the code you provided, I don't think you are indexing the outputs correctly with tf.reduce_max, since it returns the actual maximum value across a given axis, which may not be an integer (but I'm not sure how your network works). You may be looking for tf.argmax, which returns the index of the maximum value. The issue with this, however, is that TensorFlow does not have a gradient defined for tf.argmax, so that function cannot be a learned part of your network.
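The difference between the two reductions can be seen in a small NumPy stand-in, where np.max and np.argmax play the roles of tf.reduce_max and tf.argmax and the lengths array is made up for illustration:

```python
import numpy as np

lengths = np.array([3, 5, 2])

longest = lengths.max()       # the largest value itself
where = lengths.argmax()      # the position at which that value occurs

print(longest, where)
```

Which of the two is the right index into potential_outputs depends on whether that list is ordered by unrolling step (the value is wanted) or by batch element (the position is wanted).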
