PyTorch Linear layer input dimension mismatch - python

I'm getting this error when passing the input data to the Linear (fully connected) layer in PyTorch:
matrices expected, got 4D, 2D tensors
I fully understand the problem since the input data has a shape (N,C,H,W) (from a Convolutional+MaxPool layer) where:
N: Data Samples
C: Channels of the data
H,W: Height and Width
Nevertheless I was expecting PyTorch to do the "reshaping" of the data from:
[N, D1, ..., Dn] --> [N, D] where D = D1*D2*...*Dn
I tried to reshape Variable.data, but I've read that this approach is not recommended, since the gradients will retain the previous shape, and that in general you should not mutate the shape of Variable.data.
I am pretty sure there is a simple solution that goes along with the framework, but I haven't found it.
Is there a good solution for this?
PS: The fully connected layer has C * H * W as its input size.

After reading some examples I found the solution. Here is how you do it without messing up the forward/backward pass flow:
_, C, H, W = x.size()
x = x.view(-1, C * H * W)

A more general solution (it works regardless of how many dimensions x has) is to take the product of all dimension sizes except the first one (the batch size):
import numpy as np

n_features = np.prod(x.size()[1:])
x = x.view(-1, n_features)
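If you prefer not to pull in NumPy, the same product can be computed with PyTorch alone (a small variant of the snippet above, not from the original answer):
n_features = x[0].numel()   # number of elements per sample = product of all non-batch dims
x = x.view(-1, n_features)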

It is also common to save the batch size explicitly and let view infer the other dimension when flattening:
batch_size = x.shape[0]
...
x = x.view(batch_size, -1)
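Putting the pieces together, the flatten typically sits between the last conv/pool layer and the first Linear layer. Here is a minimal sketch (the layer sizes are made up for illustration; torch.flatten(x, 1) or nn.Flatten() would do the same job as the view):
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # a 3x32x32 input becomes 16x16x16 after conv + pool, so C * H * W = 16 * 16 * 16
        self.fc = nn.Linear(16 * 16 * 16, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(x.size(0), -1)   # flatten everything except the batch dimension
        return self.fc(x)

net = SmallNet()
out = net(torch.randn(4, 3, 32, 32))   # -> shape [4, 10]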

Related

How to do argmax in a group in PyTorch?

Is there any way to implement max pooling according to the norm of sub-vectors in a group in PyTorch? Specifically, this is what I want to implement:
Input:
x: a 2-D float tensor, shape #Nodes * dim
cluster: a 1-D long tensor, shape #Nodes
Output:
y, a 2-D float tensor, and:
y[i]=x[k] where k=argmax_{cluster[k]=i}(torch.norm(x[k],p=2)).
I tried torch.scatter with reduce="max", but this only works for dim=1 and x[i]>0.
Can someone help me to solve the problem?
I don't think there's any built-in function to do what you want. Basically this would be some form of scatter_reduce on the norm of x, but instead of selecting the max norm you want to select the row corresponding to the max norm.
A straightforward implementation may look something like this:
"""
input
    x: float tensor of size [NODES, DIMS]
    cluster: long tensor of size [NODES]
output
    y: float tensor of size [cluster.max() + 1, DIMS]
"""
num_clusters = cluster.max().item() + 1
DIMS = x.shape[1]
y = torch.zeros((num_clusters, DIMS), dtype=x.dtype, device=x.device)
for cluster_id in torch.unique(cluster):
    x_cluster = x[cluster == cluster_id]
    # pick the row of this cluster with the largest L2 norm
    y[cluster_id] = x_cluster[torch.argmax(torch.norm(x_cluster, dim=1), dim=0)]
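For instance, on a tiny made-up input (the values below are only for illustration), the loop picks the largest-norm row of each cluster:
import torch

x = torch.tensor([[3., 0.], [0., 1.], [0., 5.], [2., 2.]])   # NODES = 4, DIMS = 2
cluster = torch.tensor([0, 0, 1, 1])

num_clusters = cluster.max().item() + 1
y = torch.zeros((num_clusters, x.shape[1]), dtype=x.dtype, device=x.device)
for cluster_id in torch.unique(cluster):
    x_cluster = x[cluster == cluster_id]
    y[cluster_id] = x_cluster[torch.argmax(torch.norm(x_cluster, dim=1), dim=0)]

print(y)
# tensor([[3., 0.],   <- node 0 has the largest norm in cluster 0
#         [0., 5.]])  <- node 2 has the largest norm in cluster 1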
This should work just fine as long as cluster.max() is relatively small. If there are many clusters, though, this approach has to unnecessarily create a mask over cluster for every unique cluster id. To avoid this you can make use of argsort. The best I could come up with in pure Python is the following:
num_clusters = cluster.max().item() + 1
DIMS = x.shape[1]
x_norm = torch.norm(x, dim=1)
cluster_sortidx = torch.argsort(cluster)
cluster_ids, cluster_counts = torch.unique_consecutive(cluster[cluster_sortidx], return_counts=True)
end_indices = torch.cumsum(cluster_counts, dim=0).cpu().tolist()
start_indices = [0] + end_indices[:-1]
y = torch.zeros((num_clusters, DIMS), dtype=x.dtype, device=x.device)
for cluster_id, a, b in zip(cluster_ids, start_indices, end_indices):
    indices = cluster_sortidx[a:b]
    # among the nodes of this cluster, take the one with the largest norm
    y[cluster_id] = x[indices[torch.argmax(x_norm[indices], dim=0)]]
For example, in random tests with NODES = 60000, DIMS = 512 and cluster.max() = 6000, the first version takes about 620 ms while the second version takes about 78 ms.
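If you want to avoid the Python loop entirely, one possibility (my own sketch, not part of the original answer; it relies on stable sorting, available in recent PyTorch versions) is to lexsort the nodes by (cluster, norm) and take the last node of each cluster run. Note that on ties this picks the last max-norm node rather than the first:
import torch

NODES, DIMS = 1000, 32
x = torch.randn(NODES, DIMS)
cluster = torch.randint(0, 50, (NODES,))

x_norm = torch.norm(x, dim=1)
norm_order = torch.argsort(x_norm)                             # ascending by norm
_, by_cluster = torch.sort(cluster[norm_order], stable=True)   # stable sort groups clusters,
order = norm_order[by_cluster]                                 # keeping ascending norm inside each

ids, counts = torch.unique_consecutive(cluster[order], return_counts=True)
last_in_run = torch.cumsum(counts, dim=0) - 1                  # position of the max-norm node per cluster

y = torch.zeros((cluster.max().item() + 1, DIMS), dtype=x.dtype, device=x.device)
y[ids] = x[order[last_in_run]]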

`torch.gather` without unbroadcasting

I have some batched input x of shape [batch, time, feature], and some batched indices i of shape [batch, new_time] which I want to gather into the time dim of x. As output of this operation I want a tensor y of shape [batch, new_time, feature] with values like this:
y[b, t', f] = x[b, i[b, t'], f]
In Tensorflow, I can accomplish this by using the batch_dims: int argument of tf.gather: y = tf.gather(x, i, axis=1, batch_dims=1).
In PyTorch, I can think of some functions which do similar things:
torch.gather of course, but this does not have an argument similar to Tensorflow's batch_dims. The output of torch.gather will always have the same shape as the indices. So I would need to unbroadcast the feature dim into i before passing it to torch.gather.
torch.index_select, but here, the indices must be one-dimensional. So to make it work I would need to unbroadcast x to add a "batch * new_time" dim, and then after torch.index_select reshape the output.
torch.nn.functional.embedding. Here, the embedding matrices would correspond to x. But this embedding function does not support batched weights, so I run into the same issue as for torch.index_select (looking at the code, F.embedding uses torch.index_select under the hood).
Is it possible to accomplish such a gather operation without relying on unbroadcasting, which is inefficient for large dims?
This is actually the most frequent case: the input and index tensors don't have matching numbers of dimensions. You can still use torch.gather, though, since you can rewrite your expression:
y[b, t, f] = x[b, i[b, t], f]
as:
y[b, t, f] = x[b, i[b, t, f], f]
which ensures all three tensors have an equal number of dimensions. This reveals a third dimension on i, which we can create essentially for free (expand does not copy memory) by unsqueezing a trailing dimension on i and expanding it over the feature dimension. You can do so with i[..., None].expand(-1, -1, x.size(-1)).
Here is a minimal example:
>>> b, t, new_t, f = 2, 3, 4, 5
>>> x = torch.rand(b, t, f)
>>> i = torch.randint(0, t, (b, new_t))
>>> y = x.gather(1, i[..., None].expand(b, new_t, f))
>>> y.shape
torch.Size([2, 4, 5])
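As a quick sanity check (this comparison is my addition, not part of the original answer), the gathered result matches what a per-batch indexing loop would produce:
import torch

b, t, new_t, f = 2, 3, 4, 5
x = torch.rand(b, t, f)
i = torch.randint(0, t, (b, new_t))

y = x.gather(1, i[..., None].expand(b, new_t, f))

# reference: index the time dimension separately for every batch element
y_loop = torch.stack([x[n, i[n]] for n in range(b)])
assert torch.equal(y, y_loop)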

Python: [PyTorch] Selectively add along axes without using a loop

Let's say I have a tensor x with dimensions [batch, channels, H, W].
Then I have another tensor b that holds bias values for each channel, with dims [channels,].
I want y = x + b (per sample)
Is there a nice way to broadcast this over H and W for each channel, for each sample in the batch, without using a loop?
If I'm convolving, I know I can use the bias field of the convolution function to achieve this, but I'm just wondering whether it can be achieved with primitive ops only (no explicit looping).
y = x + b[None, :, None, None] (basically, insert singleton axes so b lines up with x's axis layout and let broadcasting do the rest)
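A minimal shape check of that one-liner (the sizes below are made up for illustration; b.view(1, -1, 1, 1) is an equivalent spelling):
import torch

x = torch.randn(8, 3, 32, 32)     # [batch, channels, H, W]
b = torch.randn(3)                # [channels]

y = x + b[None, :, None, None]    # b is broadcast over batch, H and W
# equivalently: y = x + b.view(1, -1, 1, 1)

assert y.shape == x.shape
assert torch.allclose(y[0, 1], x[0, 1] + b[1])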

What is the meaning of the 4 return values from net.input_info[input_blob].input_data.shape?

First, I tried reading the documentation and searching online, but I still don't understand what the return value actually means when loading OpenVINO models.
core = IECore()
net = core.read_network(model=model, weights=weights)
input_blob = next(iter(net.input_info))
out_blob = next(iter(net.outputs))
net = core.load_network(network=net, num_requests=2, device_name=self.deviceName)
n, c, h, w = net.input_info[input_blob].input_data.shape
What are n, c, h and w actually? I see some sample code that never even uses n and c. Thank you.
NCHW and NHWC are data layouts for activations (images): N is the batch size, C the number of channels, and H and W the spatial height and width.
Activations consist of channels and a spatial domain (1D, 2D or 3D).
The spatial domain together with the channels forms an image.
During the training phase, images are typically grouped together in batches.
Even if there is only one image, we still assume there is a batch with batch size equal to 1. Hence, the overall dimensionality of activations is 4D (N, C, H, W) or 5D (N, C, D, H, W).
If you want to know in depth what these values are, you'll need to understand this concept.
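In practice these four values are typically used to resize and reorder an input image before inference. A rough sketch of that preprocessing (the cv2-based steps and the example shape values are my assumptions, not part of the answer):
import cv2
import numpy as np

n, c, h, w = 1, 3, 224, 224           # example values returned by input_data.shape

image = cv2.imread("input.jpg")       # HWC, BGR, arbitrary size
image = cv2.resize(image, (w, h))     # resize to the network's expected spatial size
image = image.transpose(2, 0, 1)      # HWC -> CHW
image = np.expand_dims(image, 0)      # add the batch dimension -> NCHW
assert image.shape == (n, c, h, w)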

Applying Kullback-Leibler (aka KL divergence) element-wise in PyTorch

I have two tensors named x_t and x_k with the following shapes: NxHxW and KxNxHxW respectively, where K is the number of autoencoders used to reconstruct x_t (if you have no idea what this is, assume they're K different nets aiming to predict x_t; this probably has nothing to do with the question anyway), N is the batch size, H the matrix height, and W the matrix width.
I'm trying to apply the Kullback-Leibler divergence to both tensors (after broadcasting x_t along the K dimension to match x_k) using PyTorch's nn.functional.kl_div method.
However, it does not seem to be working as I expected. I'm looking to calculate the kl_div between each observation in x_t and x_k, resulting in a tensor of size KxN (i.e., the kl_div of each observation for each of the K autoencoders).
The actual output is a single value if I use a reduction argument, and a tensor of the same size (i.e., KxNxHxW) if I do not use it.
Has anyone tried something similar?
Reproducible example:
import torch
import torch.nn.functional as F
x_t = torch.randn(10, 5, 5)     # N x H x W
x_k = torch.randn(3, 10, 5, 5)  # K x N x H x W
x_broadcasted = x_t.expand_as(x_k)
loss = F.kl_div(x_t, x_k, reduction="none")  # or "batchmean", or there are many options
It's unclear to me what exactly constitutes a probability distribution in your model. With reduction='none', kl_div, given log(x_n) and y_n, computes kl_div = y_n * (log(y_n) - log(x_n)), which is the summand of the actual Kullback-Leibler divergence. Summation (or, in other words, taking the expectation) is up to you. If your point is that H and W are the two dimensions over which you want to take the expectation, it's as simple as:
loss = F.kl_div(x_t, x_k, reduction="none").sum(dim=(-1, -2))
which is of shape [K, N]. If your network's output is to be interpreted differently, you need to specify more clearly which are the event dimensions and which are the sample dimensions of your distribution.
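As a concrete sketch of the "event dimensions" point: if each HxW map is meant to be a distribution over the H*W locations, you would normalize first and then sum over that event dimension. The softmax normalization below is my assumption about the model, not something stated in the question:
import torch
import torch.nn.functional as F

K, N, H, W = 3, 10, 5, 5
x_t = torch.randn(N, H, W)
x_k = torch.randn(K, N, H, W)

log_p = F.log_softmax(x_t.view(N, H * W), dim=-1)   # log-probabilities, shape [N, H*W]
q = F.softmax(x_k.view(K, N, H * W), dim=-1)        # probabilities, shape [K, N, H*W]

# kl_div(input, target) expects input as log-probabilities and target as probabilities
loss = F.kl_div(log_p.expand_as(q), q, reduction="none").sum(dim=-1)   # shape [K, N]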
