How to enlarge a matrix in tensorflow without duplicating values? - python

What I am trying to do is have a weight matrix for my neural network which grows in size (i.e. a neuron is added to it each iteration). However, I do not want to use tf.Variable again as this will waste memory by copying the values in the previous matrix not expanding the matrix itself.
I have seen that people use tf.assign with validate_shape set to False, however, this does not change the shape of the variable correctly which I believed was a bug but the tensorflow GitHub did not seem to agree (I don't understand why from their reply).
Below is a simplified example of the problem. x is the matrix that I want to expand so that it can be added to z. If anyone knows a solution to what I am trying to achieve here I would be very grateful =)
import tensorflow as tf
import numpy as np
# Initialise some variables
sess = tf.Session()
x = tf.Variable(tf.truncated_normal([2, 4], stddev = 0.04))
z = tf.Variable(tf.truncated_normal([3, 4], stddev = 0.04))
sess.run(tf.variables_initializer([x, z]))
# Enlarge the matrix by assigning it a new set of values
sess.run(tf.assign(x, tf.concat((x, tf.cast(tf.truncated_normal([1, 4], stddev = 0.04), tf.float32)), 0), validate_shape=False))
# Print shapes of matrices, notice that x's actual shape is different for the
# shape tensorflow has recorded for it
print(x.get_shape())
print(x.eval(session=sess).shape)
print(z.get_shape())
print(z.eval(session=sess).shape)
# Add two matrices with equal shapes
print(tf.add(x, z).eval(session=sess))
Note: I realize that if I initialized z to the shape (2, 4) and then expanded it with tf.assign (as I do with x) the above example will work. But due to another constraint, I cannot control the original shape of z.

Tensors in tensorflow are immutable, so you can't re-scale them easily.
You can attempt to pad with 0's and then access parts of the matrix with tf.gather() as shown here How to select rows from a 3-D Tensor in TensorFlow?
to effect the "submatrix" within the larger padded matrix. This however does not seem to be an easy or elegant solution.

Related

Traceback while trying to implement 2D Convolutional layer in python: [duplicate]

It has been firmly established that my_tensor.detach().numpy() is the correct way to get a numpy array from a torch tensor.
I'm trying to get a better understanding of why.
In the accepted answer to the question just linked, Blupon states that:
You need to convert your tensor to another tensor that isn't requiring a gradient in addition to its actual value definition.
In the first discussion he links to, albanD states:
This is expected behavior because moving to numpy will break the graph and so no gradient will be computed.
If you don’t actually need gradients, then you can explicitly .detach() the Tensor that requires grad to get a tensor with the same content that does not require grad. This other Tensor can then be converted to a numpy array.
In the second discussion he links to, apaszke writes:
Variable's can’t be transformed to numpy, because they’re wrappers around tensors that save the operation history, and numpy doesn’t have such objects. You can retrieve a tensor held by the Variable, using the .data attribute. Then, this should work: var.data.numpy().
I have studied the internal workings of PyTorch's autodifferentiation library, and I'm still confused by these answers. Why does it break the graph to to move to numpy? Is it because any operations on the numpy array will not be tracked in the autodiff graph?
What is a Variable? How does it relate to a tensor?
I feel that a thorough high-quality Stack-Overflow answer that explains the reason for this to new users of PyTorch who don't yet understand autodifferentiation is called for here.
In particular, I think it would be helpful to illustrate the graph through a figure and show how the disconnection occurs in this example:
import torch
tensor1 = torch.tensor([1.0,2.0],requires_grad=True)
print(tensor1)
print(type(tensor1))
tensor1 = tensor1.numpy()
print(tensor1)
print(type(tensor1))
I think the most crucial point to understand here is the difference between a torch.tensor and np.ndarray:
While both objects are used to store n-dimensional matrices (aka "Tensors"), torch.tensors has an additional "layer" - which is storing the computational graph leading to the associated n-dimensional matrix.
So, if you are only interested in efficient and easy way to perform mathematical operations on matrices np.ndarray or torch.tensor can be used interchangeably.
However, torch.tensors are designed to be used in the context of gradient descent optimization, and therefore they hold not only a tensor with numeric values, but (and more importantly) the computational graph leading to these values. This computational graph is then used (using the chain rule of derivatives) to compute the derivative of the loss function w.r.t each of the independent variables used to compute the loss.
As mentioned before, np.ndarray object does not have this extra "computational graph" layer and therefore, when converting a torch.tensor to np.ndarray you must explicitly remove the computational graph of the tensor using the detach() command.
Computational Graph
From your comments it seems like this concept is a bit vague. I'll try and illustrate it with a simple example.
Consider a simple function of two (vector) variables, x and w:
x = torch.rand(4, requires_grad=True)
w = torch.rand(4, requires_grad=True)
y = x # w # inner-product of x and w
z = y ** 2 # square the inner product
If we are only interested in the value of z, we need not worry about any graphs, we simply moving forward from the inputs, x and w, to compute y and then z.
However, what would happen if we do not care so much about the value of z, but rather want to ask the question "what is w that minimizes z for a given x"?
To answer that question, we need to compute the derivative of z w.r.t w.
How can we do that?
Using the chain rule we know that dz/dw = dz/dy * dy/dw. That is, to compute the gradient of z w.r.t w we need to move backward from z back to w computing the gradient of the operation at each step as we trace back our steps from z to w. This "path" we trace back is the computational graph of z and it tells us how to compute the derivative of z w.r.t the inputs leading to z:
z.backward() # ask pytorch to trace back the computation of z
We can now inspect the gradient of z w.r.t w:
w.grad # the resulting gradient of z w.r.t w
tensor([0.8010, 1.9746, 1.5904, 1.0408])
Note that this is exactly equals to
2*y*x
tensor([0.8010, 1.9746, 1.5904, 1.0408], grad_fn=<MulBackward0>)
since dz/dy = 2*y and dy/dw = x.
Each tensor along the path stores its "contribution" to the computation:
z
tensor(1.4061, grad_fn=<PowBackward0>)
And
y
tensor(1.1858, grad_fn=<DotBackward>)
As you can see, y and z stores not only the "forward" value of <x, w> or y**2 but also the computational graph -- the grad_fn that is needed to compute the derivatives (using the chain rule) when tracing back the gradients from z (output) to w (inputs).
These grad_fn are essential components to torch.tensors and without them one cannot compute derivatives of complicated functions. However, np.ndarrays do not have this capability at all and they do not have this information.
please see this answer for more information on tracing back the derivative using backwrd() function.
Since both np.ndarray and torch.tensor has a common "layer" storing an n-d array of numbers, pytorch uses the same storage to save memory:
numpy() → numpy.ndarray
Returns self tensor as a NumPy ndarray. This tensor and the returned ndarray share the same underlying storage. Changes to self tensor will be reflected in the ndarray and vice versa.
The other direction works in the same way as well:
torch.from_numpy(ndarray) → Tensor
Creates a Tensor from a numpy.ndarray.
The returned tensor and ndarray share the same memory. Modifications to the tensor will be reflected in the ndarray and vice versa.
Thus, when creating an np.array from torch.tensor or vice versa, both object reference the same underlying storage in memory. Since np.ndarray does not store/represent the computational graph associated with the array, this graph should be explicitly removed using detach() when sharing both numpy and torch wish to reference the same tensor.
Note, that if you wish, for some reason, to use pytorch only for mathematical operations without back-propagation, you can use with torch.no_grad() context manager, in which case computational graphs are not created and torch.tensors and np.ndarrays can be used interchangeably.
with torch.no_grad():
x_t = torch.rand(3,4)
y_np = np.ones((4, 2), dtype=np.float32)
x_t # torch.from_numpy(y_np) # dot product in torch
np.dot(x_t.numpy(), y_np) # the same dot product in numpy
I asked, Why does it break the graph to to move to numpy? Is it because any operations on the numpy array will not be tracked in the autodiff graph?
Yes, the new tensor will not be connected to the old tensor through a grad_fn, and so any operations on the new tensor will not carry gradients back to the old tensor.
Writing my_tensor.detach().numpy() is simply saying, "I'm going to do some non-tracked computations based on the value of this tensor in a numpy array."
The Dive into Deep Learning (d2l) textbook has a nice section describing the detach() method, although it doesn't talk about why a detach makes sense before converting to a numpy array.
Thanks to jodag for helping to answer this question. As he said, Variables are obsolete, so we can ignore that comment.
I think the best answer I can find so far is in jodag's doc link:
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history, and to prevent future computation from being tracked.
and in albanD's remarks that I quoted in the question:
If you don’t actually need gradients, then you can explicitly .detach() the Tensor that requires grad to get a tensor with the same content that does not require grad. This other Tensor can then be converted to a numpy array.
In other words, the detach method means "I don't want gradients," and it is impossible to track gradients through numpy operations (after all, that is what PyTorch tensors are for!)
This is a little showcase of a tensor -> numpy array connection:
import torch
tensor = torch.rand(2)
numpy_array = tensor.numpy()
print('Before edit:')
print(tensor)
print(numpy_array)
tensor[0] = 10
print()
print('After edit:')
print('Tensor:', tensor)
print('Numpy array:', numpy_array)
Output:
Before edit:
Tensor: tensor([0.1286, 0.4899])
Numpy array: [0.1285522 0.48987144]
After edit:
Tensor: tensor([10.0000, 0.4899])
Numpy array: [10. 0.48987144]
The value of the first element is shared by the tensor and the numpy array. Changing it to 10 in the tensor changed it in the numpy array as well.

Fast sequential lists for tensorflow?

I have an array A of matrices (or a 3-dim tensor) and I want to do the following:
Denote each matrix with a number, so A is [1,2,3,4,...,], and let's say that we have a window of length 3, I want to pass as input to a TensorFlow graph the 4-dim array [[1,2,3],[2,3,4],[3,4,5],....]. What's the most efficient way of doing this? (It's a bit like a convolution with a constant kernel, but without summing over the resulting matrices).
At the moment this is what I'm doing:
input_NN = [data[t, t + window] for t in range(my_range)]
and then I pass it to a TF placeholder.
Shall I think of a better way of doing it in numpy and pass the result to a placeholder or is there a fast way of doing this in TensorFlow by passing A directly?

Saving confusion matrix

Is there any possibility to save the confusion matrix which is generated by sklearn.metrics?
I would like to save multiple results of different classification algorithms in an array or maybe a pandas data frame so I can show which algorithm works best.
print('Neural net: \n',confusion_matrix(Y_test, Y_pred), sep=' ')
How could I save the generated confusion matrix within a loop? (I am training over a set of 200 different target variables)
array[i] = confusion_matrix(Y_test,Y_pred)
I run into some definition problems here [array is not defined whereas in the non [i] - version it runs smoothly]
Additionally, I am normalizing the confusion matrix. How could I print out the average result of the confusion matrix after the whole loop? (average of the 200 different confusion matrices)
I am not that fluent with python yet.
First getting to array not defined problem.
In python list is declared as :
array=[]
Since size of list is not given during declaration, no space is allocated. Hence we can't assign values the place which is not allocated.
array[i]=some value, but no space is allocated for array
So if you know the required size of array,fill zeroes during declaration and the use array this way or use array.append() method inside the loop.
Now for saving confusion matrix:
Since confusion matrix returns 2-D array and you need to save multiple such arrays, use 3-D array for saving the value.
import numpy as np
matrix_result=np.zeroes((200,len(y_pred),len(y_pred)))
for i in range(200):
matrix_result[i]=confusion_matrix(X_pred,y_pred)
For averaging
matrix_result_average=matrix_result.mean(axis=0)
I'm not sure what you mean by training over a set of target variables (please elaborate), but here is a start at averaging over confusion matrices, using numpy.
First an empty result matrix is created, which is three-dimensional and the size of 200 stacked confusion matrices. These are then filled in one-by-one in the for-loop. Finally the resulting matrix is averaged along the dimension of the targets, resulting in the average confusion matrix.
import numpy as np
N = len(Y_pred)
result = np.zeros((len(targets), N, N))
for i, target in enumerate(targets):
result[i] = confusion_matrix(Y_test, Y_pred) # do someting with target?
print(result.mean(axis=0))

Caffe Multi-Label Matrix Input

I am solving a detection problem using ConvNet. However in my case the labels are matrix of dimension [3 x 5] for each image. I use Caffe for this work. I read the images using the Datalayer while I read the labels using HDF5Layer.
The HDF5Layer reads the [3x5] label matrix as [1x15] dimensional vector.
So I used Reshape Layer with reshape the vector into matrix before computing the L2-loss. However I realized that the reshape layer formats the data in H x W while my label matrix is [W x H] i.e., [w=3, h=5]
hence the reshape is incorrect. I wonder is there a way to reshape the [1x15] label vector in right order i.e., [3x5] and not [5x3]
Another way I thought I can work around is by flatten the output form Convolutional layer into [1 x 15] and then compute the loss using my [1 x 15] label.
I am showing the problem using Figures for better understanding because of my poor English.
Eample of my input matrix label (Note the images are just enlarged for illustration)
Result of Caffe Reshape Layer
Any suggestion if I am doing it right?
Either way of computing the loss is just fine. In fact, computing in the 1x15 shape will save you the time of converting. The loss computation is still pixel-by pixel; the logical organization doesn't matter.
Using the same idea, it doesn't really matter whether you compute 3x5 or 5x3; all that matters is that your convolutional output and your label properly match each other.
If you want the display (graph, picture, etc.) to match, perhaps you can just switch the x and y designations before you plot the output.

Decomposing 3rd Order Tensor in Python

I have a tensor in the shape (n_samples, n_steps, n_features). I want to decompose this into a tensor of shape (n_samples, n_components).
I need a method of decomposition that has a .fit(...) so that I can apply the same decomposition to a new batch of samples. I have been looking at Tucker Decomposition and PARAFAC Decomposition, but neither have that crucial .fit(...) and .transform(...) functionality. (Or at least I think they don't?)
I could use PCA and train it on a representative sample and then call .transform(...) on the remaining samples, but I would rather have some sort of tensor decomposition that can handle all of the samples at once, so as to get a better idea of the differences between each sample.
This is what I mean by "tensor":
In fact tensors are merely a generalisation of scalars and vectors; a scalar is a zero rank tensor, and a vector is a first rank tensor. The rank (or order) of a tensor is defined by the number of directions (and hence the dimensionality of the array) required to describe it.
If you have any questions, please ask, I'll try to clarify my problem if needed.
EDIT: The best solution would be some type of kernel but I have yet to find a kernel that can deal with n-rank Tensors and not just 2D data
You can do this using the development (master) version of TensorLy. Specifically, you can use the new partial_tucker function (it is not yet updated in the documentation...).
Note that the following solution preserves the structure of the tensor, i.e. a tensor of shape (n_samples, n_steps, n_features) is decomposed into a (smaller) tensor of shape (n_samples, n_components_1, n_components_2).
Code
Short answer: this is a very basic class that does what you want (and it would work on tensors of arbitrary order).
import tensorly as tl
from tensorly.decomposition._tucker import partial_tucker
class TensorPCA:
def __init__(self, ranks, modes):
self.ranks = ranks
self.modes = modes
def fit(self, tensor):
self.core, self.factors = partial_tucker(tensor, modes=self.modes, ranks=self.ranks)
return self
def transform(self, tensor):
return tl.tenalg.multi_mode_dot(tensor, self.factors, modes=self.modes, transpose=True)
Usage
Given an input tensor, you can use the previous class by first instantiating it with the desired ranks (size of the core tensor) and modes on which to perform the decomposition (in your 3D case, 1 and 2 since indexing starts at zero):
tpca = TensorPCA(ranks=[4, 5], modes=[1, 2])
tpca.fit(tensor)
Given a new tensor originally called new_tensor, you can project it using the transform method:
tpca.transform(new_tensor)
Explanation
Let's go through the code with an example: first let's import the necessary bits:
import numpy as np
import tensorly as tl
from tensorly.decomposition._tucker import partial_tucker
We then generate a random tensor:
tensor = np.random.random((10, 11, 12))
The next step is to decompose it along its second and third dimensions, or modes (as the first dimension corresponds to the samples):
core, factors = partial_tucker(tensor, modes=[1, 2], ranks=[4, 5])
The core corresponds to the transformed input tensor while factors is a list of two projection matrices, one for the second mode and one for the third mode. Given a new tensor, you can project it to the same subspace (the transform method) by projecting each of its last two dimensions:
tl.tenalg.multi_mode_dot(tensor, factors, modes=[1, 2], transpose=True)
The transposition here is equivalent to an inverse since the factors are orthogonal.
Finally, a note on the terminology: in general, even though it is sometimes done, it is probably best to not use interchangeably order and rank of a tensor. The order of a tensor is simply its number of dimensions while the rank of a tensor is usually a much more complicated notion which you could think of as a generalization of the notion of matrix rank.

Categories