I have a 4D tensor output from an object detector that makes per-pixel, per-class box predictions, i.e. shape H x W x C x 6, where the innermost dimension of size 6 holds the box parameters for that class. When computing the loss, I want to update only the predictions for the ground-truth class. To do this, I'd like to have a tensor of shape H x W whose elements are the ground-truth class indices. This tensor would then be used to extract only the relevant class from the input, producing a tensor of shape H x W x 6. I know this should be possible using gather or gather_nd, but I can't get the parameters right to produce the desired output. I'm also confused about the purpose of the batch_dims parameter of gather_nd, which may be relevant to solving this. Any suggestions on how to properly use these, or some other tf function, to achieve this result?
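For reference, a minimal sketch of one parameterization of tf.gather that appears to produce the desired H x W x 6 output (preds and class_idx are placeholder names; batch_dims=2 treats the leading H and W axes as batch dimensions, so each pixel gathers a single class slice along the class axis):
import tensorflow as tf
H, W, C = 4, 4, 3  # small sizes for illustration
preds = tf.random.normal((H, W, C, 6))  # per-pixel, per-class box parameters
class_idx = tf.random.uniform((H, W), maxval=C, dtype=tf.int32)  # ground-truth class per pixel
# With batch_dims=2, the (H, W) axes of preds and class_idx are matched
# pairwise, and each pixel picks one class along axis 2
selected = tf.gather(preds, class_idx, axis=2, batch_dims=2)
print(selected.shape)  # (4, 4, 6)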
I am trying to perform some transformations on a batch of images of size [128,128,3] with a batch_size of 20. I have managed to do part of the transformations on a single image, but I'm not sure how to extend them to the entire batch.
x is the batched image tensor of shape [20,128,128,3]
grads is also a tensor of shape [20,128,128,3]
I want to find the maximum value of grads per image, i.e. I will get 20 max values. Let us call the max value for image_i max_grads_i.
add_max_grads is a tensor of shape [20,128,128,3]
I want to update add_max_grads so that, for image_i, the position in grads where max_grads_i occurs is updated in the corresponding add_max_grads, and this update is applied across all 3 channels.
For instance, for image i (shape [128,128,3]), if max_grads_i was found at grads[50][60][1], I want to update add_max_grads at [50][60][0], [50][60][1], and [50][60][2] with the same constant value.
Eventually, this needs to be extended for all the images in the batch to create add_max_grads which has a shape [20,128,128,3].
How can I achieve this?
Currently I am using tf.math.reduce_max(grads, (1,2,3)) to find the max value in the grads tensor for each image in the batch, which returns a tensor of shape [20,], but I'm stuck on the rest.
I think this does what you're asking, if I've understood it right.
This is a self-contained example.
import tensorflow as tf
#tshape = (20,128,128,3) # The real shape needed
tshape = (2,4,4,3) # A small shape, to allow print output to be readable
# A sample x vector, set to zeros here so we can see what gets updated easily
x = tf.zeros(shape=tshape, dtype=tf.float32)
# grads, same shape as x, let's use a random matrix here
grads = tf.random.normal(shape=tshape)
print(f"grads: \n{grads}")
# ---- Now calculate the indices we will use in tf.tensor_scatter_nd_update
# 4-D mask tensor (m, x, y, c) with 1 in max locations (by m), zeros otherwise
mask4d = tf.cast(grads == tf.reduce_max(grads, axis=(1,2,3), keepdims=True), dtype=tf.int32)
# 3D mask tensor (m, x, y) with 1 in max pixel locations (by m) across channel
mask3d = tf.reduce_max(mask4d, axis=3)
# indices of maximum values, each item gives m, x, y
# (compare against 0 to get the boolean condition tf.where expects)
indices = tf.where(mask3d > 0)
print(f"\nindices\n{indices}")
# ---- Now calculate the updates we will use in tf.tensor_scatter_nd_update. This has shape (m, c)
newval = 999
updates = tf.constant(newval, shape=(tshape[0], tshape[3]), dtype=tf.float32)
# ---- Now modify x by scattering newval through it. Each entry in indices indicates a slice x[m, x, y, :]
x_updated = tf.tensor_scatter_nd_update(x, indices, updates)
print(f"\nx_updated\n{x_updated}")
What I am trying to do is take a numpy array representing 3D image data and calculate the hessian matrix for every voxel. My input is a matrix of shape (Z,X,Y) and I can easily take a slice along z and retrieve a single original image.
gx, gy, gz = np.gradient(imgs)
gxx, gxy, gxz = np.gradient(gx)
gyx, gyy, gyz = np.gradient(gy)
gzx, gzy, gzz = np.gradient(gz)
And I can access the hessian for an individual voxel as follows:
x = 100
y = 100
z = 63
H = [[gxx[z][x][y], gxy[z][x][y], gxz[z][x][y]],
     [gyx[z][x][y], gyy[z][x][y], gyz[z][x][y]],
     [gzx[z][x][y], gzy[z][x][y], gzz[z][x][y]]]
But this is cumbersome and I can't easily slice the data.
I have tried using reshape as follows
H = H.reshape(Z, X, Y, 3, 3)
But when I test this by retrieving the hessian for a specific voxel, the value returned from the reshaped array is completely different from the one in the original arrays.
I think I could use zip somehow but I have only been able to find that for making lists of tuples.
Bonus: If there's a faster way to accomplish this, please let me know; I essentially need to calculate the three eigenvalues of the hessian matrix for every voxel in the 3D data set. Calculating the hessian values is really fast, but finding the eigenvalues for a single 2D image slice takes about 20 seconds. Are there any GPU- or TensorFlow-accelerated libraries for image processing?
We can use a list comprehension to get the hessians -
H_all = np.array([np.gradient(i) for i in np.gradient(imgs)]).transpose(2,3,4,0,1)
Just to give it a bit of explanation: [np.gradient(i) for i in np.gradient(imgs)] loops through the two levels of outputs from the np.gradient calls, resulting in a (3 x 3) shaped tensor at the outer two axes. We need these two as the last two axes in the final output, so we push them to the end with the transpose.
Thus, H_all holds all the hessians and hence we can extract our specific hessian given x,y,z, like so -
x = 100
y = 100
z = 63
H = H_all[z,x,y]
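For the bonus question: since H_all stacks the 3 x 3 hessians along its last two axes, np.linalg.eigvalsh can compute the eigenvalues for every voxel in a single vectorized call (it broadcasts over the leading axes). This is a sketch under the assumption that the hessians are symmetric, which numerical hessians are only approximately, hence the explicit symmetrization -
# Symmetrize, then get all three eigenvalues per voxel in one vectorized call
H_sym = (H_all + H_all.transpose(0, 1, 2, 4, 3)) / 2
eigvals = np.linalg.eigvalsh(H_sym)  # shape (Z, X, Y, 3)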
What I am trying to do is have a weight matrix for my neural network that grows in size (i.e. a neuron is added to it each iteration). However, I do not want to call tf.Variable again, as this wastes memory by copying the values from the previous matrix rather than expanding the matrix itself.
I have seen that people use tf.assign with validate_shape set to False; however, this does not correctly update the variable's recorded shape, which I believed was a bug, but the tensorflow GitHub did not seem to agree (I don't understand why from their reply).
Below is a simplified example of the problem. x is the matrix that I want to expand so that it can be added to z. If anyone knows a solution to what I am trying to achieve here I would be very grateful =)
import tensorflow as tf
import numpy as np
# Initialise some variables
sess = tf.Session()
x = tf.Variable(tf.truncated_normal([2, 4], stddev = 0.04))
z = tf.Variable(tf.truncated_normal([3, 4], stddev = 0.04))
sess.run(tf.variables_initializer([x, z]))
# Enlarge the matrix by assigning it a new set of values
sess.run(tf.assign(x, tf.concat((x, tf.cast(tf.truncated_normal([1, 4], stddev = 0.04), tf.float32)), 0), validate_shape=False))
# Print the shapes of the matrices; notice that x's actual shape differs from
# the shape tensorflow has recorded for it
print(x.get_shape())
print(x.eval(session=sess).shape)
print(z.get_shape())
print(z.eval(session=sess).shape)
# Add two matrices with equal shapes
print(tf.add(x, z).eval(session=sess))
Note: I realize that if I initialized z to the shape (2, 4) and then expanded it with tf.assign (as I do with x) the above example will work. But due to another constraint, I cannot control the original shape of z.
Tensors in tensorflow are immutable, so you can't easily grow a variable in place.
You can instead pad with 0's up front and then access part of the matrix with tf.gather(), as shown in How to select rows from a 3-D Tensor in TensorFlow?, to effect the "submatrix" within the larger padded matrix; a rough sketch of this idea follows. It is not, however, an especially easy or elegant solution.
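A rough sketch of the padding idea, matching the question's TF1-style session code (max_rows and n_rows are made-up names, and an upper bound on the final size is assumed to be known in advance):
import tensorflow as tf
sess = tf.Session()
max_rows = 10  # assumed upper bound on how large x can grow
x_full = tf.Variable(tf.truncated_normal([max_rows, 4], stddev=0.04))
n_rows = tf.Variable(2, dtype=tf.int32)  # current logical number of rows
sess.run(tf.variables_initializer([x_full, n_rows]))
# The "active" submatrix is a view into the padded variable
x_active = tf.gather(x_full, tf.range(n_rows))
# "Growing" the matrix is now just incrementing the row counter
sess.run(tf.assign_add(n_rows, 1))
print(sess.run(tf.shape(x_active)))  # [3 4]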
I am solving a detection problem using a ConvNet. However, in my case the label for each image is a matrix of dimension [3 x 5]. I use Caffe for this work. I read the images using the Data layer, while I read the labels using the HDF5 layer.
The HDF5 layer reads the [3x5] label matrix as a [1x15] vector.
So I used a Reshape layer to turn the vector back into a matrix before computing the L2 loss. However, I realized that the Reshape layer arranges the data as H x W while my label matrix is [W x H], i.e., [w=3, h=5], hence the reshape is incorrect. I wonder if there is a way to reshape the [1x15] label vector in the right order, i.e., [3x5] and not [5x3].
Another way I thought I could work around this is by flattening the output from the convolutional layer into [1 x 15] and then computing the loss against my [1 x 15] label.
I am showing the problem using Figures for better understanding because of my poor English.
Example of my input matrix label (note: the images are just enlarged for illustration)
Result of Caffe Reshape Layer
Any suggestion if I am doing it right?
Either way of computing the loss is just fine. In fact, computing it in the 1x15 shape will save you the time of converting. The loss computation is still pixel-by-pixel; the logical organization doesn't matter.
Using the same idea, it doesn't really matter whether you compute 3x5 or 5x3; all that matters is that your convolutional output and your label properly match each other.
If you want the display (graph, picture, etc.) to match, perhaps you can just switch the x and y designations before you plot the output.
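A quick NumPy check of this claim, if it helps to see it concretely (toy random data; as long as prediction and label are reshaped the same way, the elementwise L2 loss is identical):
import numpy as np
pred = np.random.random(15)
label = np.random.random(15)
loss_flat = np.sum((pred - label) ** 2)
loss_3x5 = np.sum((pred.reshape(3, 5) - label.reshape(3, 5)) ** 2)
loss_5x3 = np.sum((pred.reshape(5, 3) - label.reshape(5, 3)) ** 2)
print(np.allclose(loss_flat, loss_3x5), np.allclose(loss_flat, loss_5x3))  # True True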
I have a tensor in the shape (n_samples, n_steps, n_features). I want to decompose this into a tensor of shape (n_samples, n_components).
I need a method of decomposition that has a .fit(...) so that I can apply the same decomposition to a new batch of samples. I have been looking at Tucker decomposition and PARAFAC decomposition, but neither has that crucial .fit(...) and .transform(...) functionality. (Or at least I think they don't?)
I could use PCA and train it on a representative sample and then call .transform(...) on the remaining samples, but I would rather have some sort of tensor decomposition that can handle all of the samples at once, so as to get a better idea of the differences between each sample.
This is what I mean by "tensor":
In fact tensors are merely a generalisation of scalars and vectors; a scalar is a zero rank tensor, and a vector is a first rank tensor. The rank (or order) of a tensor is defined by the number of directions (and hence the dimensionality of the array) required to describe it.
If you have any questions, please ask, I'll try to clarify my problem if needed.
EDIT: The best solution would be some type of kernel, but I have yet to find a kernel that can deal with n-rank tensors and not just 2D data.
You can do this using the development (master) version of TensorLy. Specifically, you can use the new partial_tucker function (it is not yet updated in the documentation...).
Note that the following solution preserves the structure of the tensor, i.e. a tensor of shape (n_samples, n_steps, n_features) is decomposed into a (smaller) tensor of shape (n_samples, n_components_1, n_components_2).
Code
Short answer: this is a very basic class that does what you want (and it would work on tensors of arbitrary order).
import tensorly as tl
from tensorly.decomposition._tucker import partial_tucker

class TensorPCA:
    def __init__(self, ranks, modes):
        self.ranks = ranks
        self.modes = modes

    def fit(self, tensor):
        self.core, self.factors = partial_tucker(tensor, modes=self.modes, ranks=self.ranks)
        return self

    def transform(self, tensor):
        return tl.tenalg.multi_mode_dot(tensor, self.factors, modes=self.modes, transpose=True)
Usage
Given an input tensor, you can use the previous class by first instantiating it with the desired ranks (size of the core tensor) and modes on which to perform the decomposition (in your 3D case, 1 and 2 since indexing starts at zero):
tpca = TensorPCA(ranks=[4, 5], modes=[1, 2])
tpca.fit(tensor)
Given a new tensor, say new_tensor, you can project it using the transform method:
tpca.transform(new_tensor)
Explanation
Let's go through the code with an example: first let's import the necessary bits:
import numpy as np
import tensorly as tl
from tensorly.decomposition._tucker import partial_tucker
We then generate a random tensor:
tensor = np.random.random((10, 11, 12))
The next step is to decompose it along its second and third dimensions, or modes (as the first dimension corresponds to the samples):
core, factors = partial_tucker(tensor, modes=[1, 2], ranks=[4, 5])
The core corresponds to the transformed input tensor while factors is a list of two projection matrices, one for the second mode and one for the third mode. Given a new tensor, you can project it to the same subspace (the transform method) by projecting each of its last two dimensions:
tl.tenalg.multi_mode_dot(tensor, factors, modes=[1, 2], transpose=True)
The transposition here is equivalent to an inverse since the factors are orthogonal.
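If you want to convince yourself of this, each factor returned by partial_tucker should have orthonormal columns, which you can check on the factors computed above:
for f in factors:
    print(np.allclose(f.T @ f, np.eye(f.shape[1])))  # True for orthonormal columns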
Finally, a note on the terminology: in general, even though it is sometimes done, it is probably best not to use order and rank of a tensor interchangeably. The order of a tensor is simply its number of dimensions, while the rank of a tensor is usually a much more complicated notion, which you could think of as a generalization of the notion of matrix rank.