Parallel indexing in Keras or Theano - python

The 2D problem
For each datapoint, I have an index matrix which I want to use to gather vectors from a 2D lookup matrix.
For a single datapoint, Theano and Keras allow easy indexing.
import keras.backend as K
result = K.gather(reference, indices)
E.g.:
result = K.gather(reference, indices)
#let:
indices.shape = (100, 5)
reference.shape = (101, 68)
#where:
max(indices) < reference.shape[0]
#then:
result.shape = (100, 5, 68)
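In NumPy terms, the 2D case is plain integer-array indexing along the first axis, and it produces the shapes listed above. This is a minimal sketch rather than Keras code; the random arrays are stand-ins for the real lookup table and index matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data with the shapes from the question (values are arbitrary).
reference = rng.normal(size=(101, 68))
indices = rng.integers(0, reference.shape[0], size=(100, 5))

# Integer-array indexing on the first axis: each index picks one 68-dim row.
result = reference[indices]

print(result.shape)
```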
The 3D problem
However, I need to repeat this process for each datapoint in a batch, i.e. I want to parallelise the lookup.
I have a 3D tensor of indices that I want to convert into a 4D tensor of gathered vectors.
E.g.
#let:
indices.shape = (batch_n, 100, 5)
reference.shape = (batch_n, 101, 68)
#desired result
result.shape = (batch_n, 100, 5, 68)
More formally, I am looking for an operation such that:
result[i,j,k,:] = reference[i, indices[i,j,k], :]
or
result[i,j,k,l] = reference[i, indices[i,j,k], l]
I implemented a Theano solution using scan. It is actually quite straightforward:
import theano
import theano.tensor as T
def parallel_gather(references, indices):
    result, _ = theano.scan(fn=lambda reference, indices: reference[indices], outputs_info=None, sequences=[references, indices])
    return result
Rewriting this for the Keras backend seems troublesome, given that K.rnn is the Keras alternative to scan. It does not seem to support iterating over a list of tensors, and it has some odd requirements.
I also wonder whether this is the fastest option; perhaps some clever reshaping could solve the problem as well.
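The batched lookup and the "clever reshaping" idea can be sketched in NumPy. The two helper names below are hypothetical. The second one flattens the batch and row axes and shifts each batch's indices by its row offset, so a single plain gather suffices; the assumption is that the same reshape-plus-offset trick carries over to a backend gather, which is not tested here.

```python
import numpy as np

def parallel_gather_fancy(references, indices):
    """result[i, j, k, :] = references[i, indices[i, j, k], :] via broadcasting."""
    batch_idx = np.arange(references.shape[0])[:, None, None]
    return references[batch_idx, indices]

def parallel_gather_flat(references, indices):
    """Same result using one flat gather: reshape, then offset the indices."""
    batch_n, n_rows, dim = references.shape
    flat_reference = references.reshape(batch_n * n_rows, dim)
    # Row r of batch i lives at flat position i * n_rows + r.
    offsets = (np.arange(batch_n) * n_rows)[:, None, None]
    return flat_reference[indices + offsets]

rng = np.random.default_rng(0)
batch_n = 4
references = rng.normal(size=(batch_n, 101, 68))
indices = rng.integers(0, 101, size=(batch_n, 100, 5))

# Reference implementation: one 2D gather per datapoint, like the scan version.
looped = np.stack([references[i][indices[i]] for i in range(batch_n)])

result_fancy = parallel_gather_fancy(references, indices)
result_flat = parallel_gather_flat(references, indices)
print(result_fancy.shape, np.array_equal(result_fancy, result_flat))
```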

Related

How to do a scalar product along the right axes with numpy and vectorize the process

I have a numpy array 'test' of dimension (100, 100, 16, 16), which gives me a different 16x16 array for each point on a 100x100 grid.
I also have some eigenvalues and vectors where vals has the dimension (100, 100, 16) and vecs (100, 100, 16, 16) where vecs[x, y, :, i] would be the ith eigenvector of the matrix at the point (x, y) corresponding to the ith eigenvalue vals[x, y, i].
Now I want to take the first eigenvector of the array at ALL points on the grid, do a matrix product with the test matrix and then do a scalar product of the resulting vector with all the other eigenvectors of the array at all points on the grid and sum them.
The resulting array should have the dimension (100, 100). After this I would like to take the 2nd eigenvector of the array, matrix multiply it with test, and then take the scalar product of the result with all the eigenvectors that are not the 2nd, and so on, so that in the end I have 16 (100, 100) arrays, or rather one (100, 100, 16) array. So far I have only succeeded with a lot of for loops, which I would like to avoid, but using tensordot gives me the wrong dimension and I don't see how to pick the axis along which np.dot is vectorized.
I heard that einsum might be suitable to this task, but everything that doesn't rely on the python loops is fine by me.
import numpy as np
from numpy import linalg as la
test = np.arange(16*16*100*100).reshape((100, 100, 16, 16))
vals, vecs = la.eig(test + 1)
np.tensordot(vecs, test, axes=[2, 3]).shape
# (10, 10, 16, 10, 10, 16) when run on a 10x10 grid
EDIT: Ok, so I used np.einsum to get a correct intermediate result.
np.einsum('ijkl, ijkm -> ijlm', vecs, test)
But in the next step I want to do the scalar product only with all the other entries of vecs. Can I maybe implement some inverse Kronecker delta in this einsum formalism? Or should I switch back to the usual numpy now?
Ok, I played around, and with np.einsum I found a way to do what is described above. A nice feature of einsum is that if you repeat an index that occurs in both inputs in the 'output' (right of the '->') as well, you get element-wise multiplication along that axis and contraction along the others (something that you don't have in handwritten tensor algebra notation).
result = np.einsum('ijkl, ijlm -> ijkm', np.einsum('ijkl, ijkm -> ijlm', vecs, test), vecs)
This nearly does the trick. Now only the diagonal terms have to be taken out. We could do this by just subtracting the diagonal terms like this:
result = result - result * np.eye(np.shape(test)[-1])[None, None, ...]
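As a sanity check, the two chained einsum calls can be compared against an explicit loop on a small grid. This is a sketch under assumptions: the grid is 3x4 with 5x5 matrices instead of 100x100 with 16x16, and vecs is random real data standing in for actual eigenvectors. Per grid point, the chained einsum computes V.T @ T @ V, and the last step zeroes its diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
grid_x, grid_y, n = 3, 4, 5

# Small random stand-ins for `test` and `vecs` (not real eigenvectors).
test = rng.normal(size=(grid_x, grid_y, n, n))
vecs = rng.normal(size=(grid_x, grid_y, n, n))

# Same expressions as above: first V^T T, then (V^T T) V, per grid point.
intermediate = np.einsum('ijkl,ijkm->ijlm', vecs, test)
result = np.einsum('ijkl,ijlm->ijkm', intermediate, vecs)

# Remove the diagonal terms, i.e. the product of each vector with itself.
off_diagonal = result - result * np.eye(n)[None, None, ...]

# Explicit loop over the grid for comparison.
expected = np.empty_like(result)
for i in range(grid_x):
    for j in range(grid_y):
        V, T = vecs[i, j], test[i, j]
        expected[i, j] = V.T @ T @ V

print(np.allclose(result, expected))
```

Summing off_diagonal over one of its last two axes would then give an array of shape (grid_x, grid_y, n); which axis is the right one depends on the convention wanted, so that step is left out here.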

Tensorflow - indexing according to batch position

I'm working on Mask R-CNN and I have a problem with indexing the masks according to labels.
Here's what I want to achieve: I have a tensor (?,28,28,c), where ? is unknown batch_size, "28x28" are 2d coordinates and c stands for different labels, then I have a list of indices (basically my label predictions) (?,) of int32. Now I want to extract the masks for a given label according to batch index -> make it a (?,28,28,1) tensor.
I tried self.masks_sigmoids = tf.gather(self.final_conv, self.label_predictions, axis=3), but the shape remained the same.
I also looked at tf.gather_nd here http://www.riptutorial.com/tensorflow/example/29069/how-to-use-tf-gather-nd, and I guess this is the right path, but I don't know how to incorporate that I want the indices according to batch index (in numpy (b_i,:,:,c_i))
I also get a feeling that my question is somewhat similar to Batched 4D tensor Tensorflow indexing, though my problem seems less complicated. However, that question is old given how quickly tensorflow develops, so I'm asking for a possibly better, clearer solution. EDIT: Even a dirty solution might be beneficial, as I didn't understand the question in the linked SO post (I already wrote a comment asking to clarify it), so I don't get much from the only answer. It might be beneficial for the community as well, because this question is simpler, which means it would demonstrate the solution more clearly.
Solution 1: more generic
You can look at the answer here, it's basically the same problem as yours, with different dimensions.
The solution described there is to create a [?, 28, 28, 4]-shaped tensor indices where indices[i, x, y, :] = [i, x, y, self.label_predictions[i]], and then use tf.gather_nd:
self.masks_sigmoids = tf.gather_nd(self.final_conv, indices=indices)
Building the indices is not very elegant, as shown in this answer (with one more dimension for you), but easy in itself.
Solution 2: A bit more elegant and adapted to your problem
This solution is very similar to the first one, but avoids creating the [x, y] part of indices. The idea is to use the slicing capabilities of gather_nd to avoid writing [x, y] in indices for each (i, x, y), by transposing the data before gathering it. I'll put the whole code here, including how to create indices and how to test:
import numpy as np
import tensorflow as tf
N_CHANNELS = 5
pl=tf.placeholder(dtype=tf.int32, shape=(None, 28, 28, N_CHANNELS))
# Indices we'll use. batch_size = 4 here.
label_predictions = tf.constant([0, 2, 0, 3])
# Indices of shape [?, 2], with indices[i] = [i, self.label_predictions[i]],
# which is easy to do with tf.range() and tf.stack()
indices = tf.stack([tf.range(tf.size(label_predictions)), label_predictions], axis=-1)
# [[0, 0], [1, 2], [2, 0], [3, 3]]
transposed = tf.transpose(pl, perm=[0, 3, 1, 2])
gathered = tf.gather_nd(transposed, indices) # Should be of shape (4, 28, 28)
result = tf.expand_dims(gathered, -1)
initial_value = np.arange(4*28*28*N_CHANNELS).reshape((4, 28, 28, N_CHANNELS))
sess = tf.InteractiveSession()
res = sess.run(result, feed_dict={pl: initial_value})
# print(res)
print("checking validity")
for i in range(4):
    for x in range(28):
        print(x)
        for y in range(28):
            assert res[i, x, y, 0] == initial_value[i, x, y, indices[i, 1].eval()]
print("All assertions passed")
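For comparison, the same per-batch channel selection can be written in NumPy, which may help when checking the TensorFlow output. This is a sketch, not part of the original answer; final_conv and label_predictions below are small arrays standing in for the tensors in the question.

```python
import numpy as np

batch_size, n_channels = 4, 5
final_conv = np.arange(batch_size * 28 * 28 * n_channels).reshape(
    (batch_size, 28, 28, n_channels))
label_predictions = np.array([0, 2, 0, 3])

# Option 1: pair batch index i with channel label_predictions[i].
# The two index arrays broadcast together, giving shape (4, 28, 28).
masks_fancy = final_conv[np.arange(batch_size), :, :, label_predictions]
masks_fancy = masks_fancy[..., None]  # restore the trailing axis

# Option 2: take_along_axis keeps the channel axis with length 1.
masks_take = np.take_along_axis(
    final_conv, label_predictions[:, None, None, None], axis=3)

print(masks_fancy.shape, np.array_equal(masks_fancy, masks_take))
```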

Slicing tensors in tensorflow using argmax

I want to make a dynamic loss function in tensorflow. I want to calculate the energy of a signal's FFT, more specifically only a window of size 3 around the most dominant peak. I am unable to implement this in TF, as it throws a lot of errors such as Stride and InvalidArgumentError (see above for traceback): Expected begin, end, and strides to be 1D equal size tensors, but got shapes [1,64], [1,64], and [1] instead.
My code is this:
self.spec = tf.fft(self.signal)
self.spec_mag = tf.complex_abs(self.spec[:,1:33])
self.argm = tf.cast(tf.argmax(self.spec_mag, 1), dtype=tf.int32)
self.frac = tf.reduce_sum(self.spec_mag[self.argm-1:self.argm+2], 1)
Since I am computing batchwise of 64 and dimension of data as 64 too, the shape of self.signal is (64,64). I wish to calculate only the AC components of the FFT. As the signal is real valued, only half the spectrum would do the job. Hence, the shape of self.spec_mag is (64,32).
The max in this fft is located at self.argm which has a shape (64,1).
Now I want to calculate the energy of 3 elements around the max peak via: self.spec_mag[self.argm-1:self.argm+2].
However, when I run the code and try to obtain the value of self.frac, I get multiple errors.
It seems like you were missing an index when accessing argm. Here is the fixed version for a signal of shape (1, 64):
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()  # the .eval() calls below need a default session
x = np.random.rand(1, 64)
xt = tf.constant(value=x, dtype=tf.complex64)
signal = xt
print('signal', signal.shape)
print('signal', signal.eval())
spec = tf.fft(signal)
print('spec', spec.shape)
print('spec', spec.eval())
spec_mag = tf.abs(spec[:,1:33])
print('spec_mag', spec_mag.shape)
print('spec_mag', spec_mag.eval())
argm = tf.cast(tf.argmax(spec_mag, 1), dtype=tf.int32)
print('argm', argm.shape)
print('argm', argm.eval())
frac = tf.reduce_sum(spec_mag[0][(argm[0]-1):(argm[0]+2)], 0)
print('frac', frac.shape)
print('frac', frac.eval())
And here is the expanded version for shape (batch, m, n):
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()  # the .eval() calls below need a default session
x = np.random.rand(1, 1, 64)
xt = tf.constant(value=x, dtype=tf.complex64)
signal = xt
print('signal', signal.shape)
print('signal', signal.eval())
spec = tf.fft(signal)
print('spec', spec.shape)
print('spec', spec.eval())
spec_mag = tf.abs(spec[:, :, 1:33])
print('spec_mag', spec_mag.shape)
print('spec_mag', spec_mag.eval())
argm = tf.cast(tf.argmax(spec_mag, 2), dtype=tf.int32)
print('argm', argm.shape)
print('argm', argm.eval())
frac = tf.reduce_sum(spec_mag[0][0][(argm[0][0]-1):(argm[0][0]+2)], 0)
print('frac', frac.shape)
print('frac', frac.eval())
You may need to adjust the function names, since I edited this code with a newer version of tensorflow.
Tensorflow indexing uses tf.Tensor.__getitem__:
This operation extracts the specified region from the tensor. The notation is similar to NumPy with the restriction that currently only support basic indexing. That means that using a tensor as input is not currently allowed
So using tf.slice and tf.strided_slice is out of the question as well.
Whereas in tf.gather indices defines slices into the first dimension of Tensor, in tf.gather_nd, indices defines slices into the first N dimensions of the Tensor, where N = indices.shape[-1]
Since you wanted the 3 values around the max, I manually extract the first, second and third element using a list comprehension, followed by a tf.stack:
import tensorflow as tf
signal = tf.placeholder(shape=(64, 64), dtype=tf.complex64)
spec = tf.fft(signal)
spec_mag = tf.abs(spec[:,1:33])
argm = tf.cast(tf.argmax(spec_mag, 1), dtype=tf.int32)
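# Gather from spec_mag (not spec): argm indexes into spec_mag's 32 columns.
frac = tf.stack([tf.gather_nd(spec_mag, tf.transpose(tf.stack(
    [tf.range(64), argm+i]))) for i in [-1, 0, 1]])  # shape (3, 64)
frac = tf.reduce_sum(frac, 0)  # sum the 3 window values per row -> shape (64,)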
This will fail for the corner case where argm is the first or last element in the row, but it should be easy to resolve.
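One possible way to handle that corner case, sketched in NumPy rather than TensorFlow: clip the column indices into range and mask out the neighbours that fell outside the row. The function name peak_window_sum is hypothetical, and treating out-of-range neighbours as contributing zero is an assumption about the desired behaviour.

```python
import numpy as np

def peak_window_sum(spec_mag, half_width=1):
    """Sum the values in a window around each row's maximum.

    Neighbours that fall outside the row are ignored (assumed behaviour).
    """
    n_rows, n_cols = spec_mag.shape
    argm = spec_mag.argmax(axis=1)
    offsets = np.arange(-half_width, half_width + 1)
    cols = argm[:, None] + offsets[None, :]           # shape (n_rows, window)
    valid = (cols >= 0) & (cols < n_cols)             # in-range neighbours
    cols = np.clip(cols, 0, n_cols - 1)               # make indexing safe
    window = spec_mag[np.arange(n_rows)[:, None], cols]
    return (window * valid).sum(axis=1)

# Small example: peak in the middle, at the first column, at the last column.
demo = np.array([[1., 5., 2., 0.],
                 [9., 1., 0., 0.],
                 [0., 0., 3., 7.]])
demo_sums = peak_window_sum(demo)
print(demo_sums)  # [ 8. 10. 10.]

# The setup from the question: batch of 64 real signals of length 64.
rng = np.random.default_rng(0)
signal = rng.normal(size=(64, 64))
spec_mag = np.abs(np.fft.fft(signal, axis=1))[:, 1:33]
frac = peak_window_sum(spec_mag)
print(frac.shape)
```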

Broadcasting np.dot vs tf.matmul for tensor-matrix multiplication (Shape must be rank 2 but is rank 3 error)

Let's say I have the following tensors:
X = np.zeros((3,201, 340))
Y = np.zeros((340, 28))
Making a dot product of X, Y is successful with numpy, and yields a tensor of shape (3, 201, 28).
However with tensorflow I get the following error: Shape must be rank 2 but is rank 3 error ...
Minimal code example:
import numpy as np
import tensorflow as tf
X = np.zeros((3,201, 340))
Y = np.zeros((340, 28))
print(np.dot(X,Y).shape) # successful (3, 201, 28)
tf.matmul(X, Y) # erroneous
Any idea how to achieve the same result with tensorflow?
Since you are working with tensors, it would be better (for performance) to use tensordot here rather than np.dot. NumPy allows np.dot to work on tensors, at the cost of lower performance, and it seems tensorflow simply doesn't allow it.
So, for NumPy, we would use np.tensordot -
np.tensordot(X, Y, axes=((2,),(0,)))
For tensorflow, it would be with tf.tensordot -
tf.tensordot(X, Y, axes=((2,),(0,)))
Related post to understand tensordot.
Tensorflow doesn't allow for multiplication of matrices with different ranks as numpy does.
To cope with this, you can reshape the matrix. This essentially casts a matrix of, say, rank 3 to one with rank 2 by "stacking the matrices" one on top of the other.
You can use this:
tf.reshape(tf.matmul(tf.reshape(Aijk,[i*j,k]),Bkl),[i,j,l])
where i, j and k are the dimensions of matrix one and k and l are the dimensions of matrix 2.
Taken from here.
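The equivalence of the approaches above can be checked in NumPy. This sketch uses random data with the shapes from the question and compares np.dot, np.tensordot and the reshape-multiply-reshape route; it only demonstrates the arithmetic, not TensorFlow's behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 201, 340))
Y = rng.normal(size=(340, 28))

# Contract the last axis of X with the first axis of Y, three ways.
via_dot = np.dot(X, Y)
via_tensordot = np.tensordot(X, Y, axes=((2,), (0,)))

# Reshape trick: stack the 3 matrices into one (3*201, 340) matrix,
# do an ordinary rank-2 product, then restore the leading axes.
i, j, k = X.shape
l = Y.shape[1]
via_reshape = (X.reshape(i * j, k) @ Y).reshape(i, j, l)

print(via_dot.shape)
print(np.allclose(via_dot, via_tensordot), np.allclose(via_dot, via_reshape))
```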

Tensorflow: Elementwise-inversion of multiple matrices of different shape

I have a set of differently-shaped matrices M = (M_1, M_2, ... M_K). For efficiency purposes, I can store all of M into a single tensor of size K x max(M_k.shape[0]) x max(M_k.shape[1]). This works fine for doing things like batch matrix multiplication and elementwise additions. But what if I want to do elementwise divisions where the zero elements are ignored?
The best version of this I've come up with is:
import numpy as np
import tensorflow as tf
M = tf.constant(np.array([[1.,2.,0],[3.,4.,5.],[6.,0,0]]), tf.float32)
Minv = tf.select(tf.equal(M, 0), tf.zeros_like(M), tf.inv(M))
Is this the fastest way? Does tf.select still get accelerated well via a GPU?
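For reference, the same zero-preserving reciprocal can be written in NumPy, which gives a quick way to check the expected values. This is only a sketch of the operation and says nothing about GPU performance. Note also that, as far as I know, tf.select and tf.inv were renamed to tf.where and tf.reciprocal in TensorFlow 1.0.

```python
import numpy as np

M = np.array([[1., 2., 0.],
              [3., 4., 5.],
              [6., 0., 0.]], dtype=np.float32)

# Compute 1/M only where M is non-zero; other entries keep the 0 from `out`.
Minv = np.divide(1.0, M, out=np.zeros_like(M), where=(M != 0))

print(Minv)
```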
