How to do argmax in group in PyTorch?

Is there any way to implement max pooling according to the norm of sub-vectors in a group in PyTorch? Specifically, this is what I want to implement:
Input:
x: a 2-D float tensor, shape #Nodes * dim
cluster: a 1-D long tensor, shape #Nodes
Output:
y, a 2-D float tensor, and:
y[i] = x[k] where k = argmax_{k : cluster[k] = i} torch.norm(x[k], p=2).
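For a concrete (hypothetical) instance of this spec:
import torch

x = torch.tensor([[1., 0.], [0., 3.], [2., 2.]])
cluster = torch.tensor([0, 1, 1])
# norms are 1, 3 and sqrt(8): cluster 0 keeps row 0, cluster 1 keeps row 1
# so y == [[1., 0.], [0., 3.]]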
I tried torch.scatter with reduce="max", but this only works for dim=1 and x[i]>0.
Can someone help me to solve the problem?

I don't think there's any built-in function to do what you want. Basically this would be some form of scatter_reduce on the norm of x, but instead of selecting the max norm you want to select the row corresponding to the max norm.
A straightforward implementation may look something like this:
"""
input
x: float tensor of size [NODES, DIMS]
cluster: long tensor of size [NODES]
output
float tensor of size [cluster.max()+1, DIMS]
"""
num_clusters = cluster.max().item() + 1
DIMS = x.shape[1]
y = torch.zeros((num_clusters, DIMS), dtype=x.dtype, device=x.device)
for cluster_id in torch.unique(cluster):
    x_cluster = x[cluster == cluster_id]
    # keep the row with the largest L2 norm within this cluster
    y[cluster_id] = x_cluster[torch.argmax(torch.norm(x_cluster, dim=1), dim=0)]
This should work just fine if cluster.max() is relatively small. If there are many clusters, though, this approach has to unnecessarily create masks over cluster for every unique cluster id. To avoid this you can make use of argsort. The best I could come up with in pure Python was the following:
num_clusters = cluster.max().item() + 1
DIMS = x.shape[1]
x_norm = torch.norm(x, dim=1)
cluster_sortidx = torch.argsort(cluster)
cluster_ids, cluster_counts = torch.unique_consecutive(cluster[cluster_sortidx], return_counts=True)
end_indices = torch.cumsum(cluster_counts, dim=0).cpu().tolist()
start_indices = [0] + end_indices[:-1]
y = torch.zeros((num_clusters, DIMS), dtype=x.dtype, device=x.device)
for cluster_id, a, b in zip(cluster_ids, start_indices, end_indices):
    # rows of x in this cluster, located via the sorted order
    indices = cluster_sortidx[a:b]
    y[cluster_id] = x[indices[torch.argmax(x_norm[indices], dim=0)]]
For example, in random tests with NODES = 60000, DIMS = 512, cluster.max() = 6000, the first version takes about 620 ms while the second version takes about 78 ms.
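For completeness, the row selection can also be done fully vectorized, with no Python loop, by reducing the norms per cluster and then matching each row against its cluster's maximum. This is only a sketch: it assumes PyTorch >= 1.12 (for scatter_reduce), ties may be broken arbitrarily, and the helper name group_argmax_by_norm is just for illustration.
import torch

def group_argmax_by_norm(x, cluster):
    # x: float tensor [NODES, DIMS], cluster: long tensor [NODES]
    num_clusters = cluster.max().item() + 1
    x_norm = torch.norm(x, dim=1)
    # per-cluster maximum norm via scatter_reduce with "amax"
    max_norm = torch.full((num_clusters,), float("-inf"), dtype=x_norm.dtype, device=x.device)
    max_norm = max_norm.scatter_reduce(0, cluster, x_norm, reduce="amax")
    # rows whose norm equals their cluster's maximum; on ties, one of them is kept
    is_max = x_norm == max_norm[cluster]
    y = torch.zeros((num_clusters, x.shape[1]), dtype=x.dtype, device=x.device)
    y[cluster[is_max]] = x[is_max]
    return y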

Related

Can I apply tf.math.bincount on a rank 3 tensor without a for loop?

I would like to get the occurrence counts of the numbers in a 3-dimensional tensor (placed in a tensor at the corresponding indexes, like tf.math.bincount does).
For a 2-D tensor, you can simply do this:
T = tf.cast(tf.round(25 * tf.random.uniform((5, 8))), tf.int32)  # bincount needs integer input
bincounts = tf.cast(tf.math.bincount(T, axis=-1), tf.float32)
But on a 3-D tensor, the only way I found is looping over the third dimension, like this:
third_dim = 10
T = tf.cast(tf.round(25 * tf.random.uniform((5, 8, third_dim))), tf.int32)
bincounts = []
for i in range(third_dim):
    bincounts.append(tf.math.bincount(T[:, :, i], axis=-1))
bincounts = tf.stack(bincounts, -1)
Does anyone know if there is a way to apply such a function directly on all the dimensions?
I found a way:
Apply bincount on the reshaped tensor, and then reshape back to the shape you want:
third_dim = 10
T = tf.cast(tf.round(25 * tf.random.uniform((5, 8, third_dim))), tf.int32)
T2 = tf.reshape(T, (5 * 8, third_dim))
bincounts2 = tf.math.bincount(T2, axis=-1)
bincounts = tf.reshape(bincounts2, [5, 8, bincounts2.shape[-1]])
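As a side note, tf.math.bincount requires integer input (hence the casts above), and the number of bins it returns depends on the largest value present in the tensor. If you need a fixed bin count across calls, the minlength argument pins it; a minimal sketch, where minlength=26 is an assumption matching values drawn from 0..25:
bincounts2 = tf.math.bincount(T2, axis=-1, minlength=26)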

Use of tf.while_loop with 1-D tensors as input to produce 2-D tensors

I need to run, in parallel on GPUs, a function that computes the rows of a matrix independently. I was using map_fn, but to have parallel computation enabled with eager execution, as far as I understand, I have to use the while_loop function.
Unfortunately I don't find it very intuitive how to use this function, so I'm kindly asking you how to convert map_fn to while_loop in my code. Here is a simplified version of the code:
*some 1-D float tensors*
def compute_row(ithStep):
    *operations on the 1-D tensors that return a 1-D tensor with fixed length*
    return values
image = tf.map_fn(compute_row, tf.range(0, nRows))
The version with while_loop I wrote, following the example in the documentation and other questions here on Stack Overflow, is:
*some 1-D float tensors*
def compute_row(i):
    *operations on the 1-D tensors that return a 1-D tensor with fixed length*
    return values
def condition(i):
    return tf.less(i, nRows)
i = tf.constant(0)
image = tf.while_loop(condition, compute_row, [i])
But in this case what I obtain is:
ValueError: The two structures don't have the same nested structure.
First structure: type=list str=[TensorSpec(shape=(), dtype=tf.int32, name=None)]
Second structure: type=list ... *a long list of tensors*
Where is the mistake? Thanks in advance. If needed I can provide a simplified runnable code.
EDIT: adding the runnable code below
import numpy
import tensorflow as tf
from matplotlib import pyplot
#Defining the data which normally are loaded from file:
#1- matrix of x position-time values, with weights, in sparse format
matrix = numpy.random.randint(2, size = 100).astype(float).reshape(10,10)
x = numpy.nonzero(matrix)[0]
times = numpy.nonzero(matrix)[1]
weights = numpy.random.rand(x.size)
#2- array of y positions
nStepsY = 5
y = numpy.arange(1,nStepsY+1)
#3- the size of the final matrix
nRows = nStepsY
nColumns = 80
# Building the TF tensors
x = tf.constant(x, dtype = tf.float32)
times = tf.constant(times, dtype = tf.float32)
weights = tf.constant(weights, dtype = tf.float32)
y = tf.constant(y, dtype = tf.float32)
# the function to iterate
def compute_row(i):
    yTimed = tf.multiply(y[i], times)
    positions = tf.round((x - yTimed) + 50)
    positions = tf.cast(positions, dtype=tf.int32)
    values = tf.math.unsorted_segment_sum(weights, positions, nColumns)
    return values
image = tf.map_fn(compute_row, tf.range(0, nRows), dtype=tf.float32)
%matplotlib inline
pyplot.imshow(image, aspect = 10)
pyplot.colorbar(shrink = 0.75,aspect = 10)
The output image is: [image omitted]
To construct a while loop, you need to define two functions:
the conditional function: when this function returns false, the loop stops
the loop body function, which performs the wanted operations. In your case, because you want to build a Tensor, you can see it as an accumulation function: the function takes the Tensor as an argument and appends a new row at the end.
Knowing that, we can define the two functions:
First, the loop body. Let's reuse the compute_row function to compute the value of the new row based on the value of i, and append the new row to our accumulator using tf.concat. We make sure that the shapes are compatible for the concatenation by adding one dimension to the new row. We also increment the counter i by 1.
def loop_body(i, accumulator):
    new_row = compute_row(i)
    accumulator = tf.concat([accumulator, new_row[tf.newaxis, :]], axis=0)
    return i + 1, accumulator
Next, the condition: in this case, we just need to check that the counter i is still less than the number of rows wanted.
def cond(i, accumulator):
    return tf.less(i, nRows)
Note that the two functions, loop_body and cond, must have the same signature (that explains why cond takes a second, unused argument).
Now, we can put that together in the while_loop call:
i0 = tf.constant(0)  # we initialize the counter i at 0
# we initialize the accumulator with an empty Tensor whose first dimension is 0
accumulator = tf.zeros((0, nColumns))
final_i, image = tf.while_loop(cond, loop_body, loop_vars=[i0, accumulator])
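One caveat: because the accumulator's leading dimension grows at each iteration, running this inside a tf.function (graph mode) may require passing shape_invariants so TensorFlow accepts the changing shape. A sketch of what that could look like:
final_i, image = tf.while_loop(
    cond, loop_body,
    loop_vars=[i0, accumulator],
    shape_invariants=[i0.get_shape(), tf.TensorShape([None, nColumns])],
)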
To make sure that it reproduces the same values as the map_fn version, we can compare the two results:
>>> image_map = tf.map_fn(compute_row, tf.range(0, nRows), dtype=tf.float32)
>>> tf.reduce_all(tf.equal(image, image_map))
<tf.Tensor: shape=(), dtype=bool, numpy=True>

How to index multiple values in place of a single value, but maintain shape as though it was a single value?

I have a problem with some NumPy code I'm writing for which I cannot find a solution. To give some background, I want to implement latency in a neural network. The network has an input array x of size [time, trials, neurons], and I'd like to assign a certain temporal latency to each neuron.
The simplest case is where there is no latency, and I can feed my network input information like so:
import numpy as np

def f(x):
    """ Dummy function so that the code runs """
    return np.mean(x)

# Set up initial state
time, trials, neurons = (100, 256, 16)
x = np.random.rand(time, trials, neurons)

# Iterate through time
for t in range(time):
    # Index into the state in time
    x_ = x[t, :, :]
    y = f(x_)
    # Assert shape of indexed array
    assert x_.shape == (trials, neurons)
In this case, when I index into a particular time, the shape of the array x becomes [trials, neurons] since I've indexed to a particular time point.
Now, I know I can add a fixed latency L, an integer, by indexing with x[t-L,:,:], and the resulting shape is again [trials, neurons]. The result is basically identical to the above code.
To make things tricky, regretfully, my project calls for a different latency for each neuron. So, instead of having L be some integer, I'd like it to be an array of latency values. Specifically, I'd like to make L = np.random.randint(a, b, size=neurons), so each element of L is some integer in [a, b), i.e. at least a and strictly less than b.
My goal is to have an idiomatic code phrase that performs the same way as the integer L case. I know that I can easily do a for loop over neurons to achieve an inefficient version of this, as shown:
import numpy as np

def f(x):
    """ Dummy function so that the code runs """
    return np.mean(x)

# Set up initial state
time, trials, neurons = (100, 256, 16)
a, b = (8, 12)
x = np.random.rand(time, trials, neurons)
L = np.random.randint(a, b, size=neurons)

# Iterate through time
for t in range(time):
    ### This is what I want to optimize ###
    #######################################
    # Index into the state in time, with
    # a different latency for each neuron
    x_ = []
    for n in range(neurons):
        x_.append(x[t - L[n], :, n])
    x_ = np.stack(x_, axis=1)
    #######################################
    # Use the latency-indexed array
    y = f(x_)
    # Assert shape of indexed array
    assert x_.shape == (trials, neurons)
Thus, my question is how to achieve all of this indexing finagling with hopefully just a few lines of a Numpy-native solution. I've tried abusing advanced indexing in this regard, but to no avail, and I'm hoping for some help on this matter. Cheers!
If I understand you correctly, you just need advanced (integer-array) indexing:
import numpy as np

time, trials, neurons = (100, 256, 16)
a, b = (8, 12)
x = np.random.rand(time, trials, neurons)
L = np.random.randint(a, b, size=neurons)

# let's say the time t=50
x1 = []
for n in range(neurons):
    x1.append(x[50 - L[n], :, n])
x1 = np.stack(x1, axis=1)

# use two index arrays to select the intersection; note the transpose needed
x2 = x[50 - L, :, range(neurons)].T
print(np.all(x1 == x2))
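Applied back to the time loop from the question, the whole inner block collapses to one line. A sketch using the question's setup; note that for small t the index t - L goes negative and NumPy wraps around to the end of the time axis, which is also what the original loop does:
import numpy as np

time, trials, neurons = (100, 256, 16)
a, b = (8, 12)
x = np.random.rand(time, trials, neurons)
L = np.random.randint(a, b, size=neurons)
cols = np.arange(neurons)

for t in range(time):
    x_ = x[t - L, :, cols].T  # advanced indexing over the time and neuron axes
    assert x_.shape == (trials, neurons)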

Use tf.gather to extract tensors row-wise based on another tensor (first dimension)

I have two tensors, A with shape [B,3000,3] and C with shape [B,4000]. I want to use tf.gather() with each row of tensor C as indices and the corresponding row of tensor A as params, to get a result of size [B,4000,3].
Here is an example to make this more understandable: Say I have tensors as
A = [[1,2,3],[4,5,6],[7,8,9]],
C = [0,2,1,2,1],
result = [[1,2,3],[7,8,9],[4,5,6],[7,8,9],[4,5,6]],
by using tf.gather(A, C). This works fine when applied to tensors with fewer than 3 dimensions.
But when it is the case as the description as the beginning, by applying tf.gather(A,C,axis=1), the shape of result tensor is
[B,B,4000,3]
It seems that tf.gather() simply treated every element of tensor C as indices to gather elements of tensor A. The only solution I can think of is a for loop, but that would drastically reduce performance, using tf.gather(A[i,...], C[i,...]) to obtain the correct size of tensor
[B,4000,3]
Thus, is there any function that is able to do this task similarly?
You need to use tf.gather_nd:
import tensorflow as tf
A = ... # B x 3000 x 3
C = ... # B x 4000
s = tf.shape(C)
B, cols = s[0], s[1]
# Make indices for first dimension
idx = tf.tile(tf.expand_dims(tf.range(B, dtype=C.dtype), 1), [1, cols])
# Complete index for gather_nd
gather_idx = tf.stack([idx, C], axis=-1)
# Gather result
result = tf.gather_nd(A, gather_idx)
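As a side note, newer TensorFlow versions (roughly 1.14 and later, if I remember correctly) expose a batch_dims argument on tf.gather that does this directly, without building explicit indices:
result = tf.gather(A, C, batch_dims=1)  # shape [B, 4000, 3]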

TensorFlow, how to index so that (batch_size x num_labels)+(batch_size) -> (batch_size)

Suppose I have a matrix (2-D tensor) X of shape (batch_size x num_labels), in which the label scores for each sample are stored. Now I want to extract the true labels' scores, where the true labels are stored in another 1-D tensor y of shape (batch_size).
What can I do ?
I know that in Theano or NumPy it can be done with a single expression: X[y].
But in TensorFlow, what is the most convenient or least costly way to achieve that?
X = tf.get_variable("X",[batch_size,num_labels])
y = tf.placeholder(tf.int32,[batch_size])
Note that 0 <= y[i] <= num_labels - 1. The output z should be a 1-D tensor where z[i] = X[i][y[i]].
I understand that X is a matrix containing probabilities for each class and batch instance, and that you want to get the probability of the true label. I propose one solution, though it may not be the optimal one:
# Create mask for values
increasing = tf.range(start=0, limit=tf.shape(X)[0], delta=1)
# Concatenate batch index and true label
# Note that in Tensorflow < 1.0.0 you must call tf.pack
mask = tf.stack([increasing, y], axis=1)
# Extract values
masked = tf.gather_nd(params=X, indices=mask)
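For what it's worth, there are shorter alternatives on newer TensorFlow versions; both of the following are sketches assuming a reasonably recent TF:
# gather with batch_dims: z[i] = X[i, y[i]]
z = tf.gather(X, y, batch_dims=1)
# or a one-hot mask, which avoids gather_nd entirely
z = tf.reduce_sum(X * tf.one_hot(y, num_labels), axis=1)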
Hope it helps.
