I am solving a detection problem with a ConvNet. However, in my case the label for each image is a matrix of dimension [3 x 5]. I use Caffe for this work; I read the images using the Data layer and the labels using the HDF5 layer.
The HDF5 layer reads the [3x5] label matrix as a [1x15] vector.
So I used a Reshape layer to turn the vector back into a matrix before computing the L2 loss. However, I realized that the Reshape layer lays the data out as H x W while my label matrix is [W x H], i.e., [w=3, h=5],
so the reshape comes out wrong. Is there a way to reshape the [1x15] label vector in the right order, i.e., [3x5] and not [5x3]?
Another workaround I thought of is to flatten the output from the convolutional layer into [1 x 15] and then compute the loss against my [1 x 15] label.
I am illustrating the problem with figures for better understanding, since my English is poor.
Example of my input label matrix (note: the images are just enlarged for illustration)
Result of Caffe Reshape Layer
Any suggestions on whether I am doing this right?
Either way of computing the loss is just fine. In fact, computing in the 1x15 shape will save you the time of converting. The loss computation is still pixel-by-pixel; the logical organization doesn't matter.
Using the same idea, it doesn't really matter whether you compute 3x5 or 5x3; all that matters is that your convolutional output and your label properly match each other.
If you want the display (graph, picture, etc.) to match, perhaps you can just switch the x and y designations before you plot the output.
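To see concretely why either shape works, here is a small numpy sketch (the random values are just for illustration): the Euclidean loss computed on the flat [1x15] vectors equals the loss computed after reshaping to 3x5 or to 5x3, as long as the prediction and the label are reshaped the same way.

import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=(1, 15))    # flattened network output
label = rng.normal(size=(1, 15))   # flattened [3 x 5] label as read by the HDF5 layer

# Euclidean (L2) loss on the flat vectors...
loss_flat = 0.5 * np.sum((pred - label) ** 2)

# ...equals the loss after reshaping, whichever order you pick,
# provided the prediction and the label are reshaped the same way.
loss_3x5 = 0.5 * np.sum((pred.reshape(3, 5) - label.reshape(3, 5)) ** 2)
loss_5x3 = 0.5 * np.sum((pred.reshape(5, 3) - label.reshape(5, 3)) ** 2)

assert np.isclose(loss_flat, loss_3x5) and np.isclose(loss_flat, loss_5x3)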
Related
I'm implementing my own neural network from scratch (educational purposes, I know there are faster and better libraries for this) and for that I'm trying to calculate the derivative of a fully connected layer. I know the following:
and assuming I have a way to calculate the derivative of f using f.derivative(<some_matrix>), how can I use numpy to efficiently calculate the derivative of f(XW) with respect to W as seen in the picture?
I want to be able to calculate the derivative for N different inputs at the same time (giving me a 4-d tensor, 1 dimension for the N samples, and 3 dimensions for the derivative in the image).
Note: f.derivative takes in a matrix of N inputs with d features each, and returns the derivative of each of the input points.
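Since the picture with the exact derivative layout isn't reproduced here, the following is only a sketch under stated assumptions: f is applied elementwise, f_derivative stands in for your f.derivative (returning the elementwise derivative with the same shape as its input), and the 3 derivative dimensions are ordered as (output j, weight row i, weight column k).

import numpy as np

def fc_derivative_wrt_W(X, W, f_derivative):
    # X: (N, d) batch of inputs, W: (d, m) weight matrix.
    # Returns D with shape (N, m, d, m), where
    #   D[n, j, i, k] = d f(X[n] @ W)[j] / d W[i, k]
    #                 = f'((X[n] @ W)[j]) * X[n, i] * delta(j, k)
    Z = X @ W                   # (N, m) pre-activations
    Fp = f_derivative(Z)        # (N, m) elementwise derivative of f
    eye = np.eye(W.shape[1])    # the delta(j, k) factor
    # einsum builds the whole 4-d tensor in one vectorized call
    return np.einsum('nj,ni,jk->njik', Fp, X, eye)

If your picture orders the three derivative axes differently, permute the output letters of the einsum subscript accordingly.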
I have a 4d tensor output from an object detector that makes per-pixel, per-class box predictions, i.e. shape H x W x C x 6, where the innermost 6-wide dimension holds the box parameters for that class. Now, when computing the loss, I want to update only the predictions for the ground-truth class. To do this, I'd like to have a tensor of shape H x W whose elements are the ground-truth class indices. This tensor would then be used to extract only the relevant class from the input, giving an output tensor of shape H x W x 6. I know this should be possible using gather or gather_nd, but I can't get the parameters right to produce the desired output. Plus, I'm confused about the purpose of the batch_dims parameter of gather_nd, which may be relevant to solving this. Any suggestions on how I can properly use these, or some other tf function, to achieve this result?
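Here is a sketch of two options, assuming a TF version recent enough that tf.gather supports batch_dims; the names and sizes are made up for illustration. With batch_dims=2, the leading H and W axes of the predictions and the index tensor are treated as shared batch axes, so for every pixel the slice for that pixel's ground-truth class is picked out.

import tensorflow as tf

H, W, C = 4, 5, 3                                                # made-up sizes
preds = tf.random.normal([H, W, C, 6])                           # per-pixel, per-class box params
gt_class = tf.random.uniform([H, W], maxval=C, dtype=tf.int32)   # ground-truth class per pixel

# Option 1: tf.gather with batch_dims. The first two axes of preds and
# gt_class are matched up, then the class axis (axis=2) is indexed.
selected = tf.gather(preds, gt_class, axis=2, batch_dims=2)      # shape (H, W, 6)

# Option 2: tf.gather_nd with explicit (h, w, class) index triples.
hh, ww = tf.meshgrid(tf.range(H), tf.range(W), indexing='ij')
idx = tf.stack([hh, ww, gt_class], axis=-1)                      # shape (H, W, 3)
selected_nd = tf.gather_nd(preds, idx)                           # shape (H, W, 6)

If there is also a leading batch axis, batch_dims is what tells gather/gather_nd how many leading axes to treat as batch axes rather than as axes to index into.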
In a standard ANN, for fully connected layers we use the following formula: tf.matmul(X, weight) + bias. This is clear to me, as we use matrix multiplication to connect the input with the hidden layer.
But in the GloVe implementation (https://nlp.stanford.edu/projects/glove/) we use the following formula for multiplying the embeddings: tf.matmul(W, tf.transpose(U)). What confuses me is the tf.transpose(U) part.
Why do we use tf.matmul(W, tf.transpose(U)) instead of tf.matmul(W, U)?
It has to do with the choice of column vs row orientation for the vectors.
Note that weight is the second parameter here:
tf.matmul(X, weight)
But the first parameter, W, here:
tf.matmul(W, tf.transpose(U))
So what you are seeing is a practical application of the matrix transpose identity: (XW)^T = W^T X^T.
To bring it back to your example, let's assume 10 inputs and 20 outputs.
The first approach uses row vectors. A single input X would be a 1x10 matrix, called a row vector because it has a single row. To match, the weight matrix needs to be 10x20 to produce an output of size 20.
But in the second approach the multiplication is reversed. That is a hint that everything is using column vectors, so named because they have a single column: when the multiplication is reversed, everything gets a transpose.
That's why the transpose is there. The way the GloVe authors have written their notation, with the multiplication reversed, the weight matrix W must already be transposed to 20x10 instead of 10x20. And they must be expecting a 20x1 column vector for the output.
So if the input vector U is naturally a 1x10 row vector, it also has to be transposed, to a 10x1 column vector, to fit in with everything else.
Basically you should pick row vectors or column vectors, all the time, and then the order of multiplications and the transposition of the weights is determined for you.
Personally I think that column vectors, as used by GloVe, are awkward and unnatural compared to row vectors. It's better to have the multiplication ordering follow the data flow ordering.
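A quick numpy sketch of the equivalence, using the made-up sizes above (10 inputs, 20 outputs): the row-vector and column-vector conventions produce the same numbers, just with transposed layouts.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1, 10))          # row-vector input
weight = rng.normal(size=(10, 20))    # maps 10 features to 20 outputs

row_out = X @ weight                  # row-vector convention: 1x20 output

W = weight.T                          # column-vector convention: weights come pre-transposed (20x10)
U = X                                 # U starts out as a 1x10 row vector...
col_out = W @ U.T                     # ...so it is transposed to 10x1, giving a 20x1 output

assert np.allclose(row_out.T, col_out)   # same result, transposed layout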
What I am trying to do is have a weight matrix for my neural network that grows in size (i.e. a neuron is added to it each iteration). However, I do not want to call tf.Variable again, as this would waste memory by copying the values from the previous matrix rather than expanding the matrix itself.
I have seen that people use tf.assign with validate_shape set to False; however, this does not update the variable's shape correctly, which I believed was a bug, but the TensorFlow GitHub did not seem to agree (I don't understand why from their reply).
Below is a simplified example of the problem. x is the matrix that I want to expand so that it can be added to z. If anyone knows a solution to what I am trying to achieve here I would be very grateful =)
import tensorflow as tf
import numpy as np
# Initialise some variables
sess = tf.Session()
x = tf.Variable(tf.truncated_normal([2, 4], stddev = 0.04))
z = tf.Variable(tf.truncated_normal([3, 4], stddev = 0.04))
sess.run(tf.variables_initializer([x, z]))
# Enlarge the matrix by assigning it a new set of values
sess.run(tf.assign(x, tf.concat((x, tf.cast(tf.truncated_normal([1, 4], stddev = 0.04), tf.float32)), 0), validate_shape=False))
# Print the shapes of the matrices; notice that x's actual shape is different from the
# shape TensorFlow has recorded for it
print(x.get_shape())
print(x.eval(session=sess).shape)
print(z.get_shape())
print(z.eval(session=sess).shape)
# Add two matrices with equal shapes
print(tf.add(x, z).eval(session=sess))
Note: I realize that if I initialized z with shape (2, 4) and then expanded it with tf.assign (as I do with x), the above example would work. But due to another constraint, I cannot control the original shape of z.
Tensors in TensorFlow are immutable, so you can't resize them easily.
You can try padding with zeros and then accessing parts of the matrix with tf.gather(), as shown here: How to select rows from a 3-D Tensor in TensorFlow?
This effectively gives you the "submatrix" within the larger padded matrix. It does not, however, seem to be an easy or elegant solution.
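A rough sketch of that padding idea, assuming you can put an upper bound on how large the matrix will ever get (the names and sizes here are made up): pre-allocate the padded variable, write new rows into it with tf.scatter_update, and take the live sub-matrix with tf.gather when you need it.

import tensorflow as tf

sess = tf.Session()

# Pre-allocate the largest size you expect and track how many rows are live.
max_rows = 10
x_padded = tf.Variable(tf.zeros([max_rows, 4]))
n_live = tf.Variable(2)                       # currently using the first 2 rows

sess.run(tf.global_variables_initializer())

# "Grow" by writing into the next unused row instead of resizing the variable.
new_row = tf.truncated_normal([1, 4], stddev=0.04)
write = tf.scatter_update(x_padded, tf.reshape(n_live, [1]), new_row)
with tf.control_dependencies([write]):
    grow = tf.assign_add(n_live, 1)
sess.run(grow)

# Work only with the live sub-matrix; its shape follows n_live.
x_live = tf.gather(x_padded, tf.range(n_live))
print(sess.run(x_live).shape)                 # (3, 4)

The downside, as noted, is that you have to guess max_rows up front and pay for the padding in memory.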
I'm using scikit to perform text classification and I'm trying to understand where the points lie with respect to my hyperplane to decide how to proceed. But I can't seem to plot the data that comes from the CountVectorizer() function. I used the following function: pl.scatter(X[:, 0], X[:, 1]) and it gives me the error: ValueError: setting an array element with a sequence.
Any idea how to fix this?
If X is a sparse matrix, you probably need X = X.todense() to get access to the data in the correct format. You probably want to check X.shape before doing this, though: if X is very large (but very sparse), it may consume a lot of memory when "densified".
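For example, a minimal sketch (the toy corpus here is made up just to keep the snippet self-contained; plt stands in for your pl):

from sklearn.feature_extraction.text import CountVectorizer
import matplotlib.pyplot as plt

docs = ["the cat sat", "the dog sat", "the cat ran"]   # toy corpus for illustration
X = CountVectorizer().fit_transform(docs)              # scipy sparse matrix

print(X.shape)              # make sure a dense copy will fit in memory
X_dense = X.toarray()       # like todense(), but returns a plain ndarray

plt.scatter(X_dense[:, 0], X_dense[:, 1])
plt.show()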