How does tf.reshape() work internally? - python

I'm trying to understand how tf.reshape works. Let's have an example:
embeddings = tf.placeholder(tf.float32, shape=[N0,N1])
M_2D = tf.placeholder(tf.float32, shape=[N0,None])
M_3D = tf.reshape(M_2D, [-1,N0,1])
weighted_embeddings = tf.multiply(embeddings, M_3D)
Here I have a 2D tensor M_2D whose columns represent coefficients for the N0 embeddings of dimension N1. I want to create a 3D tensor where each column of M_2D is placed in the first dimension of M_3D, and the columns are kept in the same order. My final goal is to create a 3D tensor of 2D embeddings, each weighted by a column of M_2D.
How can I be sure that reshape actually places each column in the new dimension of M_3D? Is it possible that it places the rows instead? Is there a clear explanation somewhere in the tensorflow documentation of the internal workings of tf.reshape, particularly when -1 is provided?

A tensor before and after tf.reshape has the same flattened order.
In the tensorflow runtime, a Tensor consists of raw data (a byte array), a shape, and a dtype; tf.reshape only changes the shape, leaving the raw data and dtype untouched. A -1 (or None) in the shape passed to tf.reshape means that this value can be inferred from the others.
For example,
# a tensor with 6 elements, with shape [3,2]
a = tf.constant([[1,2], [3,4], [5,6]])
# reshape tensor to [2, 3, 1], 2 is calculated by 6/3/1
b = tf.reshape(a, [-1, 3, 1])
In this example, a and b have the same flattened order, namely [1,2,3,4,5,6]; a has shape [3,2] and value [[1,2], [3,4], [5,6]], while b has shape [2,3,1] and value [[[1],[2],[3]],[[4],[5],[6]]].
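To answer the original question directly: because the flattened order is row-major (C order), tf.reshape reads elements row by row, so it will not place the columns of M_2D in the new leading dimension on its own; you would have to transpose first. A minimal sketch of the check, using a hypothetical 2x3 example (written for TF 2.x eager mode, though the reshape semantics are the same in TF 1.x):
import tensorflow as tf

M = tf.constant([[1, 2, 3],
                 [4, 5, 6]])  # 2 rows, 3 columns
# Row-major flattening gives [1,2,3,4,5,6], so this yields
# slices [1,2], [3,4], [5,6] -- NOT the columns of M.
print(tf.reshape(M, [3, 2, 1]))
# Transposing first yields the columns [1,4], [2,5], [3,6]:
print(tf.reshape(tf.transpose(M), [3, 2, 1]))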

Related

What is the difference between numpy array with dimension of size 1 and without that dimension

I might be a bit confused, but I wonder what the difference is between, say, x with shape (2,3) and y with shape (2,3,1) (the same array, but with an extra dimension of size 1).
Are they the same, or is there a difference between them?
Let's take a lower-dimensional example:
# shape (2,)
a = np.array([0,1])
# shape (2,1)
b = np.array([[3],[4]])
You can consider a to be a single row with 2 columns (actually a 1D vector), and b to be 2 rows with one column.
Let's try to add them:
a + a
# addition along a single dimension
# array([0, 2])
b + b
# again, matching dimensions
# array([[6],
#        [8]])
a + b
# different dimensions, one of them of common size:
# the addition is broadcast to produce a (2,2) shape
# array([[3, 5],
#        [4, 6]])
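To make the distinction concrete, here is a small follow-up sketch (same a and b as above) showing how to convert between the two shapes and how the trailing size-1 axis changes broadcasting:
import numpy as np

a = np.array([0, 1])      # shape (2,)
b = np.array([[3], [4]])  # shape (2, 1)

print(a[:, np.newaxis].shape)   # (2, 1) -- add a trailing size-1 axis
print(b.squeeze().shape)        # (2,)   -- drop the size-1 axis
print((a + b).shape)            # (2, 2) via broadcasting, as above
print((a + b.squeeze()).shape)  # (2,)   plain elementwise addition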

Multiply a 3d tensor with a 2d matrix using torch.matmul

I have two tensors in PyTorch. z is a 3d tensor of shape (n_samples, n_features, n_views), in which n_samples is the number of samples in the dataset, n_features is the number of features for each sample, and n_views is the number of different views that describe the same (n_samples, n_features) feature matrix, but with other values.
I have another 2d tensor b, of shape (n_samples, n_views), whose purpose is to rescale all the features of the samples across the different views. In other words, it encapsulates the importance of the features of each view for the same sample.
For example:
import torch
z = torch.Tensor([[[2, 3], [1, 1], [4, 5]],
                  [[2, 2], [1, 2], [7, 7]],
                  [[2, 3], [1, 1], [4, 5]],
                  [[2, 3], [1, 1], [4, 5]]])
b = torch.Tensor([[1, 0],
                  [0, 1],
                  [0.2, 0.8],
                  [0.5, 0.5]])
print(z.shape, b.shape)
>>>torch.Size([4, 3, 2]) torch.Size([4, 2])
I want to obtain a third tensor r of shape (n_samples, n_features) as a result of operations between z and b.
One possible solution is:
b = b.unsqueeze(1)
r = z * b
r = torch.sum(r, dim=-1)
print(r, r.shape)
>>>tensor([[2.0000, 1.0000, 4.0000],
           [2.0000, 2.0000, 7.0000],
           [2.8000, 1.0000, 4.8000],
           [2.5000, 1.0000, 4.5000]]) torch.Size([4, 3])
Is it possible to achieve that same result using torch.matmul()? I've tried many times to permute the dimensions of the two tensors, but to no avail.
Yes, that's possible. If you have multiple batch dimensions in both operands, you can use broadcasting. In that case the last two dimensions of each operand are interpreted as a matrix size. (I recommend looking it up in the documentation.)
So you need an additional dimension for your vectors b, to make them an n x 1 "matrix" (column vector):
# original implementation
b1 = b.unsqueeze(1)
r1 = z * b1
r1 = torch.sum(r1, dim=-1)
print(r1.shape)
# using torch.matmul
r2 = torch.matmul(z, b.unsqueeze(2))[...,0]
print(r2.shape)
print((r1-r2).abs().sum()) # should be zero if we do the same operation
Alternatively, torch.einsum also makes this very straightforward.
# using torch.einsum
r3 = torch.einsum('ijk,ik->ij', z, b)
print((r1-r3).abs().sum()) # should be zero if we do the same operation
einsum is a very powerful operation that can do a lot of things: you can permute tensor dimensions, sum along them, or perform scalar products, all with or without broadcasting. It is derived from the Einstein summation convention mostly used in physics. The rough idea is that you give every dimension of your operands a name, and then, using these names, define what the output should look like. I think it is best to read the documentation. In our case we have a 4 x 3 x 2 tensor as well as a 4 x 2 tensor. So let's call the dimensions of the first tensor ijk. Here i and k should be considered the same as the dimensions of the second tensor, so that one can be described as ik. Finally, the output should clearly be ij (it must be a 4 x 3 tensor). From this "signature" ijk,ik->ij it is clear that the dimensions i and j are preserved, and the dimension k must be "summed/multiplied" away (scalar product).
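For completeness, the same result can also be written with torch.bmm, the explicitly batched matrix multiply (a sketch assuming the same z and b as above):
# using torch.bmm: (4,3,2) @ (4,2,1) -> (4,3,1), then drop the last axis
r4 = torch.bmm(z, b.unsqueeze(2)).squeeze(2)
print((r1 - r4).abs().sum())  # should be zero if we do the same operation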

Why (X.shape[0], -1) is used as parameters while using reshape function on a matrix X?

While doing the deeplearning.ai course, at one point I needed to use numpy.reshape(). However, I was instructed in the course notebook to do it in a specific way.
The purpose was to convert a 4-dimensional array into a 2-dimensional array.
Instructions:
For convenience, you should now reshape images of shape (num_px, num_px, 3) into a numpy array of shape (num_px * num_px * 3, 1). After this, our training (and test) dataset is a numpy array where each column represents a flattened image. There should be m_train (respectively m_test) columns.
Exercise: Reshape the training and test data sets so that images of size (num_px, num_px, 3) are flattened into single vectors of shape (num_px * num_px * 3, 1).
A trick, when you want to flatten a matrix X of shape (a,b,c,d) to a matrix X_flatten of shape (b*c*d, a), is to use:
X_flatten = X.reshape(X.shape[0], -1).T
(X.T is the transpose of X)
I am unable to understand why the parameters are given in this way.
Also, while playing with the code, changing '-1' to any other negative integer didn't change the output.
I am assuming you are working with the MNIST dataset, so you have n images of size m*m*3; let's assume n to be 100 and m to be 8. So you have 100 RGB images (3 channels) of size 8*8, making your data shape (100, 8, 8, 3). Now you would like to flatten each of the 100 images, so you could either loop through the dataset and flatten it image by image, or you could reshape it.
You decide to reshape it via:
X.reshape(X.shape[0], -1).T
Let's unpack this a bit more. X.shape[0] gives you 100: the shape attribute returns the tuple (100, 8, 8, 3), since that is the shape of your dataset, and you access its 0th element. So you get
X.reshape(100, -1).T
What this does is reshape the array while making sure that you still have 100 images; the -1 states that you do not care what the remaining dimension is, so numpy infers it from the original shape. Previously you had a 4-D array of shape (100, 8, 8, 3), but now you want a 2-D array in which 100 is dimension 0, so numpy infers that it has to flatten the rest, and thus (100, 8*8*3), i.e. (100, 192), is the output shape.
After that you just transpose it, giving the final shape (192, 100).
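A quick way to convince yourself, as a sketch using the made-up sizes from above (100 images of 8x8x3):
import numpy as np

X = np.random.rand(100, 8, 8, 3)         # hypothetical dataset
X_flatten = X.reshape(X.shape[0], -1).T
print(X_flatten.shape)                   # (192, 100): 8*8*3 = 192 values per column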
Also, this is what the numpy documentation states:
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

Numpy Append Matrix to Tensor

I am trying to build a list of matrices using numpy, but when I try to append a matrix to an empty tensor, I get the error:
ValueError: all the input arrays must have same number of dimensions
Concatenate and append both seem to fail. I tried calling:
tensor = np.concatenate((tensor, matrix), axis=0)
and
tensor = np.append(tensor, matrix, axis=0)
but I get the same error either way.
The tensor starts with a size of [0, h, w], and the matrix is of size [h, w]. The matrix has the correct shape along the dimensions I am not appending along, but it won't seem to attach.
It seems matrix represents the incoming arrays, while you accumulate those into tensor. So, to solve it, add a new leading axis to matrix with None/np.newaxis and then concatenate with tensor -
np.concatenate((tensor, matrix[None]),axis=0)
If you are accumulating, store it back into tensor.
Or use np.vstack((tensor, matrix[None])).
Sample run -
In [16]: h,w = 3,4
    ...: a = np.random.rand(0,h,w)
    ...: b = np.random.rand(h,w)

In [17]: np.concatenate((a, b[None]),axis=0).shape
Out[17]: (1, 3, 4)
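As a side note (not part of the original answer): if you are accumulating many matrices in a loop, repeated concatenation copies the whole array on every step, so a common alternative is to collect the matrices in a Python list and stack them once at the end -
In [18]: mats = [np.random.rand(h, w) for _ in range(5)]

In [19]: np.stack(mats, axis=0).shape
Out[19]: (5, 3, 4)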

Proper usage of `tf.scatter_nd` in tensorflow-r1.2

Given indices with shape [batch_size, sequence_len], updates with shape [batch_size, sequence_len, sampled_size], and to_shape = [batch_size, sequence_len, vocab_size], where vocab_size >> sampled_size, I'd like to use tf.scatter_nd to map the updates to a huge tensor with shape to_shape, such that to_shape[bs, indices[bs, sz]] = updates[bs, sz]. That is, I'd like to map the updates to to_shape row by row. Please note that sequence_len and sampled_size are scalar tensors, while the others are fixed. I tried the following:
new_tensor = tf.scatter_nd(tf.expand_dims(indices, axis=2), updates, to_shape)
But I got an error:
ValueError: The inner 2 dimension of output.shape=[?,?,?] must match the inner 1 dimension of updates.shape=[80,50,?]: Shapes must be equal rank, but are 2 and 1 for .... with input shapes: [80, 50, 1], [80, 50,?], [3]
Could you please tell me how to use scatter_nd properly? Thanks in advance!
So assuming you have:
A tensor updates with shape [batch_size, sequence_len, sampled_size].
A tensor indices with shape [batch_size, sequence_len, sampled_size].
Then you do:
import tensorflow as tf
# Create updates and indices...
# Create additional indices
i1, i2 = tf.meshgrid(tf.range(batch_size),
                     tf.range(sequence_len), indexing="ij")
i1 = tf.tile(i1[:, :, tf.newaxis], [1, 1, sampled_size])
i2 = tf.tile(i2[:, :, tf.newaxis], [1, 1, sampled_size])
# Create final indices
idx = tf.stack([i1, i2, indices], axis=-1)
# Output shape
to_shape = [batch_size, sequence_len, vocab_size]
# Get scattered tensor
output = tf.scatter_nd(idx, updates, to_shape)
tf.scatter_nd takes an indices tensor, an updates tensor and some shape. updates is the original tensor, and the shape is just the desired output shape, so [batch_size, sequence_len, vocab_size]. Now, indices is more complicated. Since your output has 3 dimensions (rank 3), for each of the elements in updates you need 3 indices to determine where in the output each element is going to be placed. So the shape of the indices parameter should be the same as that of updates, with an additional dimension of size 3. In this case, we want the first two dimensions to stay the same, but we still have to specify all 3 indices. So we use tf.meshgrid to generate the indices that we need and tile them along the third dimension (the first and second index are the same for every element vector in the last dimension of updates). Finally, we stack these generated indices with the given mapping indices, and we have our full 3-dimensional indices.
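To see the index layout concretely, here is a tiny sketch with hypothetical sizes (batch_size=2, sequence_len=2, sampled_size=1, vocab_size=4); it is written in eager style, but in r1.2 the same graph would be run in a session:
import tensorflow as tf

indices = tf.constant([[[1], [3]], [[0], [2]]])          # [2, 2, 1]
updates = tf.constant([[[10.], [20.]], [[30.], [40.]]])  # [2, 2, 1]
i1, i2 = tf.meshgrid(tf.range(2), tf.range(2), indexing="ij")
i1 = tf.tile(i1[:, :, tf.newaxis], [1, 1, 1])  # no-op tile since sampled_size=1
i2 = tf.tile(i2[:, :, tf.newaxis], [1, 1, 1])
# idx has shape [2, 2, 1, 3]: one full 3-D output coordinate per update element
idx = tf.stack([i1, i2, indices], axis=-1)
out = tf.scatter_nd(idx, updates, [2, 2, 4])
# out[0] == [[0, 10, 0, 0], [0, 0, 0, 20]]
# out[1] == [[30, 0, 0, 0], [0, 0, 40, 0]]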
I think you might be looking for this.
def permute_batched_tensor(batched_x, batched_perm_ids):
    indices = tf.tile(tf.expand_dims(batched_perm_ids, 2), [1, 1, batched_x.shape[2]])
    # Create additional indices
    i1, i2 = tf.meshgrid(tf.range(batched_x.shape[0]),
                         tf.range(batched_x.shape[2]), indexing="ij")
    i1 = tf.tile(i1[:, tf.newaxis, :], [1, batched_x.shape[1], 1])
    i2 = tf.tile(i2[:, tf.newaxis, :], [1, batched_x.shape[1], 1])
    # Create final indices
    idx = tf.stack([i1, indices, i2], axis=-1)
    temp = tf.scatter_nd(idx, batched_x, batched_x.shape)
    return temp
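A hypothetical usage example (eager style); note that this scatters row s of each batch element to position batched_perm_ids[b, s] -
batched_x = tf.constant([[[1., 2.], [3., 4.], [5., 6.]]])  # shape [1, 3, 2]
perm_ids = tf.constant([[2, 0, 1]])                        # shape [1, 3]
print(permute_batched_tensor(batched_x, perm_ids))
# [[[3., 4.], [5., 6.], [1., 2.]]] -- row 0 went to slot 2, row 1 to slot 0, row 2 to slot 1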
