how to add list of arrays (tensors) - python

I am defining a simple conv2d function to calculate the cross-correlation between an input and a kernel (both 2D tensors) as below:
import torch

def conv2D(X, K):
    h = K.shape[0]
    w = K.shape[1]
    ĥ = X.shape[0] - h + 1
    ŵ = X.shape[1] - w + 1
    Y = torch.zeros((ĥ, ŵ))
    for i in range(ĥ):
        for j in range(ŵ):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y
When X and K are rank-3 tensors, I calculate the conv2d for each channel and then add the results together as below:
def conv2D_multiple(X, K):
    cross = []
    result = 0
    for x, k in zip(X, K):
        cross.append(conv2D(x, k))
    for t in cross:
        result += t
    return result
To test my function:
X_2 = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32)
K_2 = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]], dtype=torch.float32)
conv2D_multiple(X_2, K_2)
The result is:
tensor([[ 56., 72.],
[104., 120.]])
The result is as expected. However, I believe the second for loop inside conv2D_multiple(X, K) is redundant. My question is how to sum the tensors (arrays) in the list element-wise so that I can omit the second for loop.

Since your conv2D operates on one slice at a time, you can allocate a 3D tensor up front and have the first for loop write each per-channel result into its own slice. You can then sum over the slice dimension with PyTorch's built-in torch.sum to get the same result. For convenience, I'll make the slice dimension dim=0. In other words, replace cross, which starts out as an empty list, with a 3D Torch tensor that holds the intermediate results, then collapse it along the slice dimension by summing. This works because your initial implementation already stored the intermediate results as a list of 2D tensors of identical shape.
This requires that you define the correct dimensions for the 3D tensor before looping:
def conv2D_multiple(X, K):
    h = K.shape[1]
    w = K.shape[2]
    ĥ = X.shape[1] - h + 1
    ŵ = X.shape[2] - w + 1
    c = X.shape[0]
    cross = torch.zeros((c, ĥ, ŵ), dtype=torch.float32)
    for i, (x, k) in enumerate(zip(X, K)):
        cross[i] = conv2D(x, k)
    result = cross.sum(dim=0)
    return result
Notice that for each slice you're iterating over between the input and kernel, instead of appending to a new list we directly place this into a slice in the intermediate tensor. Once you store these results, sum along the slice axis to finally compress it into what you expect. Running the new function above with your example inputs generates the same result.
If this isn't a desired result for you, another way is to simply take the list of tensors you created, build the intermediate tensor out of that by stacking them all together using torch.stack and sum. By default it stacks along the first axis (dim=0):
def conv2D_multiple(X, K):
    cross = []
    for x, k in zip(X, K):
        cross.append(conv2D(x, k))
    # Stack the 2D results into one 3D tensor, then sum over the stacking axis.
    cross = torch.stack(cross)
    result = cross.sum(dim=0)
    return result
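If the goal is only to drop the second loop, Python's built-in sum also works on tensors, because it starts from 0 and applies + repeatedly. The sketch below shows that variant and, as a sanity check, compares it with torch.nn.functional.conv2d, which computes the same multi-channel cross-correlation once a batch dimension and an output-channel dimension are added. The name conv2D_multiple_sum is made up for this illustration.

```python
import torch
import torch.nn.functional as F


def conv2D(X, K):
    # Single-channel cross-correlation, as defined in the question.
    h, w = K.shape
    out_h = X.shape[0] - h + 1
    out_w = X.shape[1] - w + 1
    Y = torch.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y


def conv2D_multiple_sum(X, K):
    # Hypothetical name: built-in sum adds the per-channel results element-wise.
    return sum(conv2D(x, k) for x, k in zip(X, K))


X_2 = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32)
K_2 = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]], dtype=torch.float32)

result = conv2D_multiple_sum(X_2, K_2)

# Reference: add a batch dim to X and an output-channel dim to K.
reference = F.conv2d(X_2[None], K_2[None])[0, 0]
print(result)
print(torch.allclose(result, reference))
```

The generator expression avoids building the intermediate list at all, at the cost of a Python-level addition per channel.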

Related

PyTorch's torch.as_strided with negative strides for making a Toeplitz matrix

I am writing a jury-rigged PyTorch version of scipy.linalg.toeplitz, which currently has the following form:
def toeplitz_torch(c, r=None):
    c = torch.tensor(c).ravel()
    if r is None:
        r = torch.conj(c)
    else:
        r = torch.tensor(r).ravel()
    # Flip c left to right.
    idx = [i for i in range(c.size(0)-1, -1, -1)]
    idx = torch.LongTensor(idx)
    c = c.index_select(0, idx)
    vals = torch.cat((c, r[1:]))
    out_shp = len(c), len(r)
    n = vals.stride(0)
    return torch.as_strided(vals[len(c)-1:], size=out_shp, stride=(-n, n)).copy()
But torch.as_strided currently does not support negative strides. My function, therefore, throws the error:
RuntimeError: as_strided: Negative strides are not supported at the moment, got strides: [-1, 1].
My (perhaps incorrect) understanding of as_strided is that it inserts the values of the first argument into a new array whose size is specified by the second argument and it does so by linearly indexing those values in the original array and placing them at subscript-indexed strides given by the final argument.
Both the NumPy and PyTorch documentation concerning as_strided have scary warnings about using the function with "extreme care" and I don't understand this function fully, so I'd like to ask:
Is my understanding of as_strided correct?
Is there a simple way to rewrite this so negative strides work?
Will I be able to pass a gradient w.r.t c (or r) through toeplitz_torch?
> 1. Is my understanding of as_strided correct?
The strides are the interface through which your tensor accesses the underlying contiguous data buffer. torch.as_strided does not insert values and makes no copies of them; the strides define the artificial layout of what we refer to as a multi-dimensional array (in NumPy) or a tensor (in PyTorch).
As Andreas K. puts it in another answer:
Strides are the number of bytes to jump over in the memory in order to get from one item to the next item along each direction/dimension of the array. In other words, it's the byte-separation between consecutive items for each dimension.
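Please feel free to read the answers over there if you have some trouble with strides. Note that NumPy measures strides in bytes, whereas PyTorch's Tensor.stride() counts elements, which is why n equals 1 in your function. Here we will take your example and look at how it is implemented with as_strided.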
The example given by Scipy for linalg.toeplitz is the following:
>>> toeplitz([1,2,3], [1,4,5,6])
array([[1, 4, 5, 6],
[2, 1, 4, 5],
[3, 2, 1, 4]])
To do so they first construct the list of values (what we can refer to as the underlying values, not actually underlying data): vals which is constructed as [3 2 1 4 5 6], i.e. the Toeplitz column and row flattened.
Now notice the arguments passed to np.lib.stride_tricks.as_strided:
values: vals[len(c)-1:] notice the slice: the tensors show up smaller, yet the underlying values remain, and they correspond to those of vals. Go ahead and compare the two with storage_offset: it's just an offset of 2, the values are still there! How this works is that it essentially shifts the indices such that index=0 will refer to value 1, index=1 to 4, etc...
shape: given by the column/row inputs, here (3, 4). This is the shape of the resulting object.
strides: this is the most important piece: (-n, n), in this case (-1, 1)
The most intuitive thing to do with strides is to describe a mapping between the multi-dimensional space: (i, j) ∈ [0,3[ x [0,4[ and the flattened 1D space: k ∈ [0, 3*4[. Since the strides are equal to (-n, n) = (-1, 1), the mapping is -n*i + n*j = -1*i + 1*j = j-i. Mathematically you can describe your matrix as M[i, j] = F[j-i] where F is the flattened values vector [3 2 1 4 5 6].
For instance, let's try with i=1 and j=2. If you look at the Toeplitz matrix above, M[1, 2] = 4. Indeed, indexing relative to the offset (where index 0 holds the value 1), F[k] = F[j-i] = F[1] = 4.
If you look closely you will see the trick behind negative strides: they let you reference negative indices. For instance, if you take j=0 and i=2, then you see k=-2. Remember how vals was given an offset of 2 by slicing vals[len(c)-1:]. If you look at its own underlying data storage it's still [3 2 1 4 5 6], but with an offset. The mapping for vals (in this case i: 1D -> k: 1D) would be M'[i] = F'[k] = F'[i+2] because of the offset. This means M'[-2] = F'[0] = 3.
In the above I defined M' as vals[len(c)-1:], which is basically equivalent to the following tensor:
>>> torch.as_strided(vals, size=(len(vals)-2,), stride=(1,), storage_offset=2)
tensor([1, 4, 5, 6])
Similarly, I defined F' as the flattened vector of underlying values: [3 2 1 4 5 6].
The usage of strides is indeed a very clever way to define a Toeplitz matrix!
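To make the index mapping concrete, here is a small pure-Python sketch that emulates what a strided view with strides (-1, 1) and a storage offset of 2 would read from the flat buffer. The helper strided_view is a made-up name for illustration; it is not how NumPy or PyTorch implement as_strided internally.

```python
def strided_view(storage, size, stride, offset):
    # Element (i, j) of the view lives at storage[offset + i*stride[0] + j*stride[1]].
    rows, cols = size
    return [[storage[offset + i * stride[0] + j * stride[1]] for j in range(cols)]
            for i in range(rows)]


storage = [3, 2, 1, 4, 5, 6]   # flipped column followed by the rest of the row
M = strided_view(storage, size=(3, 4), stride=(-1, 1), offset=2)
for row in M:
    print(row)
```

Each step down a row moves one position to the left in the buffer, which is exactly what the negative first stride expresses; thanks to the offset, the computed position never drops below 0.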
> 2. Is there a simple way to rewrite this so negative strides work?
The issue is, negative strides are not implemented in PyTorch... I don't believe there is a way around it with torch.as_strided, otherwise it would be rather easy to extend the current implementation and provide support for that feature.
There are however alternative ways to solve the problem. It is entirely possible to construct a Toeplitz matrix in PyTorch, but that won't be with torch.as_strided.
We will do the mapping ourselves: for each element of M indexed by (i, j), we will find out the corresponding index k which is simply j-i. This can be done with ease, first by gathering all (i, j) pairs from M:
>>> i, j = torch.ones(3, 4).nonzero().T
(tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
tensor([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]))
Now we essentially have k:
>>> j-i
tensor([ 0, 1, 2, 3, -1, 0, 1, 2, -2, -1, 0, 1])
We just need to construct a flattened tensor of all possible values from the row r and column c inputs. Negative indexed values (the content of c) are put last and flipped:
>>> values = torch.cat((r, c[1:].flip(0)))
tensor([1, 4, 5, 6, 3, 2])
Finally index values with k and reshape:
>>> values[j-i].reshape(3, 4)
tensor([[1, 4, 5, 6],
[2, 1, 4, 5],
[3, 2, 1, 4]])
To sum it up, my proposed implementation would be:
def toeplitz(c, r):
    vals = torch.cat((r, c[1:].flip(0)))
    shape = len(c), len(r)
    i, j = torch.ones(*shape).nonzero().T
    return vals[j-i].reshape(*shape)
> 3. Will I be able to pass a gradient w.r.t c (or r) through toeplitz_torch?
That's an interesting question. The as_strided version fails on the negative strides before any gradient could flow, so there is nothing to backpropagate through in the first place. The method above, however, uses only differentiable built-ins, so the backward pass comes free of charge.
Notice the grad_fn on the output:
>>> toeplitz(torch.tensor([1.,2.,3.], requires_grad=True),
torch.tensor([1.,4.,5.,6.], requires_grad=True))
tensor([[1., 4., 5., 6.],
[2., 1., 4., 5.],
[3., 2., 1., 4.]], grad_fn=<ViewBackward>)
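As a quick check of the gradient claim, the sketch below builds the same matrix with an index grid from torch.arange (a swapped-in alternative to the ones().nonzero() trick, not the code above) and backpropagates a sum through it. The name toeplitz_arange is an assumption made for this example.

```python
import torch


def toeplitz_arange(c, r):
    # Hypothetical variant: k = j - i computed by broadcasting two aranges.
    vals = torch.cat((r, c[1:].flip(0)))
    i = torch.arange(len(c)).unsqueeze(1)   # column vector of row indices
    j = torch.arange(len(r)).unsqueeze(0)   # row vector of column indices
    return vals[j - i]                      # negative k wraps to the flipped column part


c = torch.tensor([1., 2., 3.], requires_grad=True)
r = torch.tensor([1., 4., 5., 6.], requires_grad=True)

T = toeplitz_arange(c, r)
T.sum().backward()

print(T)
print(c.grad)  # how many times each entry of c appears in T
print(r.grad)  # how many times each entry of r appears in T
```

With a plain sum as the loss, each gradient entry simply counts how often that input value appears in the matrix. In this implementation the diagonal is taken from r[0], so c[0] receives a zero gradient.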
This was a quick draft (that did take a little while to write down), I will make some edits. If you have some questions or remarks, don't hesitate to comment! I would be interested in seeing other answers as I am not an expert with strides, this is just my take on the problem.

Efficiently multiplying tensor elements in Keras

I have a Tensor in Keras with the following shape
x = (None, 14, 14, 32)
This is the weight of the convolution layer from my network.
I need to multiply elements of the tensor with each other i.e. self multiplication and then sum all the values together.
Let us consider a simpler example, if I have the following tensor
x = Tensor([1,2,3],[4,5,6])
Then I need to compute x*x and the output should be
1*4 + 1*5 + 1*6 + 2*4 + 2*5 + 2*6 + 3*4 + 3*5 + 3*6
As a naive implementation, I tried the following
flattened_unpacked = tf.unstack(tf.reshape(tf.gather(x,0), [-1]))
list1 = []
list2 = []
for elem in flattened_unpacked:
list1.append(elem)
list2.append(elem)
res = [i * j for j in list1 for i in list2]
sum_res = sum(res)
But it quickly ran out of memory on Google Colab. Is there an efficient way to perform this multiplication?
You can use broadcast_to to make your array size compatible for matrix multiplication. See the documentation here.
Looking at the example given, you are trying to do a matrix multiplication of these two matrices.
Matrix 1: shape 3x3
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
Matrix 2: shape 3x1
array([[4],
[5],
[6]])
And calculate the sum of the resultant 3 x 1 matrix.
This can be achieved by:
Step 1: Matrix multiplication:
x = tf.matmul(tf.broadcast_to(tf.constant([[1], [2], [3]]), shape = (3,3)), tf.constant([[4], [5], [6]]))
Output:
array([[15],
[30],
[45]])
Step 2: Adding the elements
tf.reduce_sum(x)
Output:
90
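For the example as written, the sum of all pairwise products factorizes: the sum over i and j of a_i * b_j equals (sum of a) * (sum of b), so no pairwise products need to be materialized. The plain-Python sketch below checks this on the two rows from the question; carrying the same idea over to a Keras tensor (for instance with a reduce-sum) rests on the assumption that this double sum is what the question intends.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# Naive version: visit every pairwise product, len(a) * len(b) multiplications.
pairwise = sum(x * y for x in a for y in b)

# Factorized version: two sums and one multiplication.
factorized = sum(a) * sum(b)

print(pairwise, factorized)  # both are 90
```

For the fully flattened self-multiplication in the naive implementation, the same identity reduces the result to the square of the sum of all elements.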

Numba-compatible implementation of np.tile?

I'm working on some code for dehazing images, based on this paper, and I started with an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important since I'll have to run this on 8K images).
I'm pretty convinced my last significant performance bottleneck is in performing the box filter step (I've already shaved off almost a minute per image, but this last slow step is ~30s/image), and I'm close to getting it to run as nopython in Numba:
import numpy as np
from numba import jit, njit, prange

@njit  # Row dependencies means can't be parallel
def yCumSum(a):
    """
    Numba based computation of y-direction
    cumulative sum. Can't be parallel!
    """
    out = np.empty_like(a)
    out[0, :] = a[0, :]
    for i in prange(1, a.shape[0]):
        out[i, :] = a[i, :] + out[i - 1, :]
    return out

@njit(parallel=True)
def xCumSum(a):
    """
    Numba-based parallel computation
    of X-direction cumulative sum
    """
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i, :] = np.cumsum(a[i, :])
    return out

@jit
def _boxFilter(m, r, gpu=hasGPU):
    if gpu:
        m = cp.asnumpy(m)
    out = __boxfilter__(m, r)
    if gpu:
        return cp.asarray(out)
    return out

@jit(fastmath=True)
def __boxfilter__(m, r):
    """
    Fast box filtering implementation, O(1) time.
    Parameters
    ----------
    m: a 2-D matrix data normalized to [0.0, 1.0]
    r: radius of the window considered
    Return
    -----------
    The filtered matrix m'.
    """
    # H: height, W: width
    H, W = m.shape
    # the output matrix m'
    mp = np.empty(m.shape)
    # cumulative sum over y axis
    ySum = yCumSum(m)  # np.cumsum(m, axis=0)
    # copy the accumulated values of the windows in y
    mp[0:r+1, :] = ySum[r:(2*r)+1, :]
    # differences in y axis
    mp[r+1:H-r, :] = ySum[(2*r)+1:, :] - ySum[:H-(2*r)-1, :]
    mp[(-r):, :] = np.tile(ySum[-1, :], (r, 1)) - ySum[H-(2*r)-1:H-r-1, :]
    # cumulative sum over x axis
    xSum = xCumSum(mp)  # np.cumsum(mp, axis=1)
    # copy the accumulated values of the windows in x
    mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
    # difference over x axis
    mp[:, r+1:W-r] = xSum[:, (2*r)+1:] - xSum[:, :W-(2*r)-1]
    mp[:, -r:] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
    return mp
There's plenty to do around the edges, but if I can get the tile operation as a nopython call, I can nopython the whole boxfilter step and get a big performance boost. I'm not super inclined to do something really really specific as I'd love to reuse this code elsewhere, but I wouldn't particularly object to it being limited to a 2D scope. For whatever reason I'm just staring at this and not really sure where to start.
np.tile is a bit too complicated to reimplement in full, but unless I'm misreading, it looks like you only need to take a vector and repeat it r times along a different axis.
A Numba-compatible way to do this is to write
y = x.repeat(r).reshape((-1, r))
Then x will be repeated r times along the second dimension, so that y[i, j] == x[i].
Example:
In [2]: x = np.arange(5)
In [3]: x.repeat(3).reshape((-1, 3))
Out[3]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
If you want x to be repeated along the first dimension instead, just take the transpose y.T.
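The sketch below checks, in plain NumPy, that the repeat/reshape pattern reproduces both np.tile calls from the question, and that ordinary broadcasting gives the same differences without building the tiled array at all. Whether the broadcasting form compiles inside your nopython function depends on your Numba version, so treat that part as something to try rather than a guarantee. The array contents and the radius r are made-up test values.

```python
import numpy as np

r = 2
ySum = np.arange(12, dtype=np.float64).reshape(4, 3)   # made-up stand-in data
last_row = ySum[-1, :]
last_col = ySum[:, -1]

# np.tile(v, (r, 1)): r stacked copies of a row vector.
rows_tiled = last_row.repeat(r).reshape((-1, r)).T
# np.tile(v[:, None], (1, r)): r side-by-side copies of a column vector.
cols_tiled = last_col.repeat(r).reshape((-1, r))

print(np.array_equal(rows_tiled, np.tile(last_row, (r, 1))))
print(np.array_equal(cols_tiled, np.tile(last_col[:, None], (1, r))))

# Broadcasting: subtracting a block from a 1-D row needs no tiling at all.
block = ySum[0:r, :]
print(np.array_equal(last_row - block, np.tile(last_row, (r, 1)) - block))
```

Note that the transposed result is a non-contiguous view, which is fine for the subtraction here but may matter if it is passed to code that expects a C-contiguous array.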

Tensorflow - pick values from indices, what is the operation called?

An example
Suppose I have a tensor values with shape (2,2,2)
values = [[[0, 1],[2, 3]],[[4, 5],[6, 7]]]
And a tensor indicies with shape (2,2) which describes which values are to be selected in the innermost dimension
indicies = [[1,0],[0,0]]
Then the result will be a (2,2) matrix with these values
result = [[1,2],[4,6]]
What is this operation called in tensorflow and how to do it?
General
Note that the above shape (2,2,2) is only an example, it can be any dimension. Some conditions for this operation:
ndim(values) -1 = ndim(indicies)
values.shape[:-1] == indicies.shape == result.shape
indicies.max() <= values.shape[-1] - 1
I think you can emulate this with tf.gather_nd. You will just have to convert "your" indices to a representation that is suitable for tf.gather_nd. The following example here is tied to your specific example, i.e. input tensors of shape (2, 2, 2) but I think this gives you an idea how you could write the conversion for input tensors with arbitrary shape, although I am not sure how easy it would be to implement this (haven't thought about it too long). Also, I'm not claiming that this is the easiest possible solution.
import tensorflow as tf
import numpy as np

values = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
values_tf = tf.constant(values)
indices = np.array([[1, 0], [0, 0]])

# Build one [k, l, index] triple per output element.
converted_idx = []
for k in range(values.shape[0]):
    outer = []
    for l in range(values.shape[1]):
        inds = [k, l, indices[k][l]]
        outer.append(inds)
    converted_idx.append(outer)

# TensorFlow 1.x graph/session API
with tf.Session() as sess:
    result = tf.gather_nd(values_tf, converted_idx)
    print(sess.run(result))
This prints
[[1 2]
[4 6]]
Edit: To handle arbitrary shapes here is a recursive solution that should work (only tested on your example):
def convert_idx(last_dim_vals, ori_indices, access_to_ori, depth):
    if depth == len(last_dim_vals.shape) - 1:
        inds = access_to_ori + [ori_indices[tuple(access_to_ori)]]
        return inds
    outer = []
    for k in range(ori_indices.shape[depth]):
        inds = convert_idx(last_dim_vals, ori_indices, access_to_ori + [k], depth + 1)
        outer.append(inds)
    return outer
You can use this together with the original code I posted like so:
...
converted_idx = convert_idx(values, indices, [], 0)
with tf.Session() as sess:
    result = tf.gather_nd(values_tf, converted_idx)
    print(sess.run(result))
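The nested loops (or the recursion) can also be replaced by building the index grid in one step. The sketch below does this in NumPy for arbitrary leading shapes and checks it against np.take_along_axis, which performs the same pick-along-the-last-axis operation. It is a NumPy model of the index conversion, not TensorFlow code; the resulting converted_idx has the same [k, l, index] layout as the list built by the loops above.

```python
import numpy as np

values = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
indices = np.array([[1, 0], [0, 0]])

# np.indices gives one coordinate array per leading axis; append the chosen
# last-axis index and move the coordinate axis to the end: shape (2, 2, 3).
grid = np.indices(indices.shape)
converted_idx = np.moveaxis(np.concatenate([grid, indices[None]]), 0, -1)

# Emulate gather_nd: each length-3 entry of converted_idx addresses one element.
picked = values[tuple(np.moveaxis(converted_idx, -1, 0))]

# Cross-check with NumPy's dedicated function for this operation.
reference = np.take_along_axis(values, indices[..., None], axis=-1)[..., 0]

print(converted_idx.tolist())
print(picked)
```

Because np.indices adapts to indices.shape, the same three lines cover any number of leading dimensions without recursion.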

TensorFlow getting elements of every row for specific columns

If A is a TensorFlow variable like so
A = tf.Variable([[1, 2], [3, 4]])
and index is another variable
index = tf.Variable([0, 1])
I want to use this index to select columns in each row. In this case, item 0 from first row and item 1 from second row.
If A was a Numpy array then to get the columns of corresponding rows mentioned in index we can do
x = A[np.arange(A.shape[0]), index]
and the result would be
[1, 4]
What is the TensorFlow equivalent operation/operations for this? I know TensorFlow doesn't support many indexing operations. What would be the work around if it cannot be done directly?
You can extend your column indices with row indices and then use gather_nd:
import tensorflow as tf
A = tf.constant([[1, 2], [3, 4]])
indices = tf.constant([1, 0])
# prepare row indices
row_indices = tf.range(tf.shape(indices)[0])
# zip row indices with column indices
full_indices = tf.stack([row_indices, indices], axis=1)
# retrieve values by indices
S = tf.gather_nd(A, full_indices)
session = tf.InteractiveSession()
session.run(S)
You can use the one-hot method: create a one-hot array and use it as a boolean mask to select the entries you'd like.
A = tf.Variable([[1, 2], [3, 4]])
index = tf.Variable([0, 1])
one_hot_mask = tf.one_hot(index, A.shape[1], on_value = True, off_value = False, dtype = tf.bool)
output = tf.boolean_mask(A, one_hot_mask)
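Both approaches so far can be modelled in NumPy to see why they agree: pairing each row number with its column index addresses one element per row, and a one-hot boolean mask selects the same elements because masking walks the array in row-major order. This is a NumPy illustration only, and the variable names are chosen for the example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
index = np.array([0, 1])

# Index-pair approach: [[0, 0], [1, 1]] holds the (row, column) of each pick.
full_indices = np.stack([np.arange(A.shape[0]), index], axis=1)
by_pairs = A[full_indices[:, 0], full_indices[:, 1]]

# One-hot mask approach: True exactly at the chosen column of each row.
one_hot_mask = np.arange(A.shape[1])[None, :] == index[:, None]
by_mask = A[one_hot_mask]

print(full_indices.tolist())
print(by_pairs, by_mask)
```

The mask variant only returns one value per row when each row of the mask contains exactly one True, which the one-hot construction guarantees for in-range indices.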
After dabbling around for quite a while, I found two functions that could be useful.
One is tf.gather_nd(), which is useful if you can produce a tensor of the form [[0, 0], [1, 1]], so that you can do
index = tf.constant([[0, 0], [1, 1]])
tf.gather_nd(A, index)
If you are unable to produce a tensor of the form [[0, 0], [1, 1]] for some reason (I couldn't, as the number of rows in my case depended on a placeholder), then the workaround I found is to use tf.py_func(). Here is example code showing how this can be done:
import tensorflow as tf
import numpy as np
def index_along_every_row(array, index):
    N, _ = array.shape
    return array[np.arange(N), index]
a = tf.Variable([[1, 2], [3, 4]], dtype=tf.int32)
index = tf.Variable([0, 1], dtype=tf.int32)
a_slice_op = tf.py_func(index_along_every_row, [a, index], [tf.int32])[0]
session = tf.InteractiveSession()
a.initializer.run()
index.initializer.run()
a_slice = a_slice_op.eval()
a_slice will be a NumPy array [1, 4].
We can do the same using this combination of map_fn and gather_nd.
def get_element(a, indices):
    """
    Outputs (ith element of indices) from (ith row of a)
    """
    return tf.map_fn(lambda x: tf.gather_nd(x[0], x[1]),
                     (a, indices),
                     dtype=tf.float32)
Here's an example usage.
A = tf.constant(np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]], dtype=np.float32))
idx = tf.constant(np.array([[2], [1], [0]]))
elems = get_element(A, idx)
with tf.Session() as sess:
    e = sess.run(elems)
    print(e)
I don't know if this will be much slower than other answers.
It has the advantage that you don't need to specify the number of rows of A in advance, as long as a and indices have the same number of rows at runtime.
Note that the output of the above will be rank 1. If you'd prefer it to have rank 2, replace gather_nd with gather.
I couldn't get the accepted answer to work in Tensorflow 2 when I incorporated it into a loss function. Something about GradientTape didn't like it. My solution is an altered version of the accepted answer:
def get_rows(arr):
    N, _ = arr.shape
    return N

num_rows = tf.py_function(get_rows, [arr], [tf.int32])[0]
rng = tf.range(0, num_rows)
ind = tf.stack([rng, ind], axis=1)
tf.gather_nd(arr, ind)
