Simultaneous Batch and Channel slicing in PyTorch - python

In PyTorch I have an RGB tensor imgA of batch size 256. I want to retain the green channel for first 128 batches and red channel for remaining 128 batches, something like below:
imgA[:128,2,:,:] = imgA[:128,1,:,:]
imgA[128:,2,:,:] = imgA[128:,0,:,:]
imgA = imgA[:,2,:,:].unsqueeze(1)
or the same can be achieved with
imgA = torch.cat((imgA[:128,1,:,:].unsqueeze(1),imgA[128:,0,:,:].unsqueeze(1)),dim=0)
But as I have multiple such images (imgA, imgB, imgC, etc.), what is the fastest way of achieving the above goal?

A slicing-based solution can be achieved using torch.gather and repeat_interleave:
select = torch.tensor([1, 0], device=imgA.device)
imgA = imgA.gather(dim=1, index=select.repeat_interleave(128, dim=0).view(256, 1, 1, 1).expand(-1, -1, *imgA.shape[-2:]))
You can also do that using matrix multiplication and repeat_interleave:
# select c=1 for first half and c=0 for second
select = torch.tensor([[0, 1],[1, 0],[0, 0]], dtype=imgA.dtype, device=imgA.device)
imgA = torch.einsum('cb,bchw->bhw',select.repeat_interleave(128, dim=1), imgA).unsqueeze(dim=1)
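Since the question mentions several tensors (imgA, imgB, imgC, ...), one possible sketch is to build the per-sample channel index once and reuse it with plain advanced indexing. The helper name select_channels and the 8x8 image size are assumptions made for illustration, not part of the original answer, and no claim is made here about which variant is fastest.

```python
import torch

def select_channels(img, channel_idx):
    """Pick one channel per batch element (hypothetical helper).

    img:         tensor of shape (B, C, H, W)
    channel_idx: long tensor of shape (B,), the channel to keep per sample
    """
    batch_idx = torch.arange(img.shape[0], device=img.device)
    # Advanced indexing pairs batch_idx[i] with channel_idx[i] -> (B, H, W)
    return img[batch_idx, channel_idx].unsqueeze(1)  # (B, 1, H, W)

batch = 256
# channel 1 (green) for the first half, channel 0 (red) for the second half
channel_idx = torch.tensor([1, 0]).repeat_interleave(batch // 2)

# Assumed example data; the real tensors come from the caller
imgA = torch.rand(batch, 3, 8, 8)
imgB = torch.rand(batch, 3, 8, 8)

# The same index tensor is reused for every image batch
outA, outB = (select_channels(x, channel_idx) for x in (imgA, imgB))
```

The index tensor has to live on the same device as the images if they are on a GPU.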

Related

Convert / "inflate" unaligned pixel data (bgr4) to a byte-aligned numpy array

I have an image in an esoteric format (BGR4) that I would like to load into numpy. In BGR4 individual pixels are byte aligned (thank god) and are comprised of 3 components (B, G, and R) encoded in a single byte. They are ordered like this: b0000BGGR.
Here is an example image with size (1, 2), aka. 2 pixels:
img_bytes = b"\x0F\x09" # this is how it looks in memory
img = np.array([[1, 3, 1], [1, 0, 1]], dtype=np.uint8) # this is my desired result
Since there are a lot of pixels in each image, what is the most performant way to inflate such an array?
I have the same question for BGR8 (ordered: bBBBGGGRR), but I assume the approach is similar, and I will cross that bridge when I get there :)
Here is a numpy implementation that follows the suggestion @MichaelButscher made in the comments:
img_bytes = b"\x0f\x09" # this is how it looks in memory
# b0000BGGR
b = 0b00001000
g = 0b00000110
r = 0b00000001
template = np.array([b, g, r], dtype=np.uint8)[:,None]
shifts = np.array([3, 1, 0], dtype=np.uint8)[:,None]
arr = np.frombuffer(img_bytes, dtype=np.uint8)
res = (arr & template) >> shifts
print(res.T)
[[1 3 1]
[1 0 1]]
You may want to tune transpose order for better performance.
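The same mask-and-shift idea can be sketched for the BGR8 layout mentioned in the question (bBBBGGGRR). The function name inflate_packed and the sample byte 0xAE are assumptions chosen for illustration; only the bit layouts come from the question.

```python
import numpy as np

def inflate_packed(img_bytes, masks, shifts, shape):
    """Unpack one-byte-per-pixel data into an array of shape (*shape, n_components).

    masks/shifts describe where each component sits inside the byte
    (hypothetical helper, not from the original answer).
    """
    arr = np.frombuffer(img_bytes, dtype=np.uint8)
    masks = np.array(masks, dtype=np.uint8)
    shifts = np.array(shifts, dtype=np.uint8)
    # Broadcast (N, 1) against (n_components,) -> (N, n_components)
    res = (arr[:, None] & masks) >> shifts
    return res.reshape(*shape, len(masks))

# BGR4: b0000BGGR
bgr4 = inflate_packed(b"\x0f\x09", [0b1000, 0b0110, 0b0001], [3, 1, 0], (1, 2))

# BGR8: bBBBGGGRR (sample byte 0xAE = 0b101_011_10)
bgr8 = inflate_packed(b"\xae", [0b11100000, 0b00011100, 0b00000011], [5, 2, 0], (1, 1))

print(bgr4.tolist())  # [[[1, 3, 1], [1, 0, 1]]]
print(bgr8.tolist())  # [[[5, 3, 2]]]
```

Putting the component axis last avoids the final transpose; whether that is faster for a given image size would need to be measured.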

Conv2D produces weird output

I'm trying to use a Laplace Filter via TensorFlow tf.nn.conv2d on my image. But the output is super weird and I don't have a clue what I did wrong.
I load my picture via:
file = tf.io.read_file("corgi.jpg")
uint_image = tf.io.decode_jpeg(file, 1)
image = tf.cast(uint_image,tf.float32)
kernel = tf.constant(np.array([[1, 1, 1],
[1, -8, 1],
[1, 1, 1]]), dtype=tf.float32)
convoluted_image = self.convoluteTest(image, kernel)
rs_convoluted_image = tf.reshape(convoluted_image,
[tf.shape(image)[0] - tf.shape(kernel)[0] + 1,
tf.shape(image)[1] - tf.shape(kernel)[0] + 1, 1])
casted_image = tf.cast(rs_convoluted_image, tf.uint8)
encoded = tf.io.encode_jpeg(casted_image)
tf.io.write_file("corgi-tensor-laplace.jpg", encoded)
But the image parameter can't be passed directly to the tf.nn.conv2d function, since the image tensor is required to be a 4D tensor.
This function here reshapes and applies my laplace filter:
def convoluteTest(image_tensor, kernel_tensor):
    shape = tf.shape(image_tensor)
    reshaped_image_tensor = tf.reshape(image_tensor, [1, shape[0].numpy(), shape[1].numpy(), 1])
    reshaped_kernel_tensor = tf.reshape(kernel_tensor,
                                        [tf.shape(kernel_tensor)[0].numpy(), tf.shape(kernel_tensor)[0].numpy(), 1,
                                         1])
    convoluted = tf.nn.conv2d(reshaped_image_tensor, reshaped_kernel_tensor, strides=[1, 1, 1, 1], padding='VALID')
    return convoluted
Original Picture:
Failed laplace:
Update:
Greyish output:
What did I do wrong? I can't wrap my head around this...
I believe the problem is casted_image = tf.cast(rs_convoluted_image, tf.uint8) truncates data outside of [0, 255] to pure black or pure white (0 and 255).
I think you are missing a normalization step back to the [0, 255] range before casting to uint8.
Try
normalized_convolved = (rs_convoluted_image - tf.reduce_min(rs_convoluted_image)) / (tf.reduce_max(rs_convoluted_image) - tf.reduce_min(rs_convoluted_image))
normalized_convolved = normalized_convolved * 255
casted_image = tf.cast(normalized_convolved, tf.uint8)
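To see what the normalization does numerically, here is a minimal sketch of the same min-max arithmetic in NumPy rather than TensorFlow. The helper name to_uint8, the sample values, the rounding step, and the guard for a constant image are assumptions added for illustration.

```python
import numpy as np

def to_uint8(x):
    """Min-max scale an array to [0, 255] and cast to uint8 (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float32)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant input: avoid dividing by zero (assumed behaviour)
        return np.zeros(x.shape, dtype=np.uint8)
    scaled = (x - lo) / (hi - lo) * 255.0
    # Rounding is a choice made here; a plain cast would truncate instead
    return np.round(scaled).astype(np.uint8)

# A Laplace response can be far outside [0, 255], e.g. between -8*255 and 8*255
laplace_out = np.array([[-2040.0, 0.0], [510.0, 2040.0]])
img8 = to_uint8(laplace_out)
print(img8.min(), img8.max())  # 0 255
```

Note that min-max scaling maps a zero response to mid-grey, which would explain a greyish result.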

How can I select a row from a SparseTensor in TensorFlow?

Say, if I have two SparseTensors as following:
[[1, 0, 0, 0],
[2, 0, 0, 0],
[1, 2, 0, 0]]
and
[[1.0, 0, 0, 0],
[1.0, 0, 0, 0],
[0.3, 0.7, 0, 0]]
and I want to extract the first two rows out of them. I need both indices and values of non-zeros entries as SparseTensors so that I can pass the result to tf.nn.embedding_lookup_sparse. How can I do this?
My application is:
I want to use word embeddings, which is quite straightforward in TensorFlow. But now I want to use sparse embeddings, i.e. common words have their own embeddings, while the embedding of a rare word is a sparse linear combination of the embeddings of common words.
So I need two cookbooks to indicate how sparse embeddings are composed. In the aforementioned example, the cookbook says: for the first word, its embedding consists of its own embedding with weight 1.0. Things are similar for the second word. For the last word, it says: the embedding of this word is a linear combination of the embeddings of the first two words, with weights 0.3 and 0.7 respectively.
I need to extract a row, then feed the indices and weights to tf.nn.embedding_lookup_sparse to obtain the final embeddings. How can I do that in TensorFlow?
Or I need to work around it, i.e.: preprocess my data and deal with the cookbook out of TensorFlow?
I checked in with one of the engineers here who knows more about this area, and here's what he passed on:
I am not sure if we have an efficient implementation of this, but here is a not-so-optimal implementation using dynamic_partition and gather ops.
def sparse_slice(indices, values, needed_row_ids):
    num_rows = tf.shape(indices)[0]
    partitions = tf.cast(tf.equal(indices[:,0], needed_row_ids), tf.int32)
    rows_to_gather = tf.dynamic_partition(tf.range(num_rows), partitions, 2)[1]
    slice_indices = tf.gather(indices, rows_to_gather)
    slice_values = tf.gather(values, rows_to_gather)
    return slice_indices, slice_values

with tf.Session().as_default():
    indices = tf.constant([[0,0], [1, 0], [2, 0], [2, 1]])
    values = tf.constant([1.0, 1.0, 0.3, 0.7], dtype=tf.float32)
    needed_row_ids = tf.constant([1])
    slice_indices, slice_values = sparse_slice(indices, values, needed_row_ids)
    print(slice_indices.eval(), slice_values.eval())
Update:
The engineer sent on an example to help with multiple rows too, thanks for pointing that out!
def sparse_slice(indices, values, needed_row_ids):
    needed_row_ids = tf.reshape(needed_row_ids, [1, -1])
    num_rows = tf.shape(indices)[0]
    # Compare every row id against every needed id, then reduce per entry
    partitions = tf.cast(tf.reduce_any(tf.equal(tf.reshape(indices[:,0], [-1, 1]), needed_row_ids), 1), tf.int32)
    rows_to_gather = tf.dynamic_partition(tf.range(num_rows), partitions, 2)[1]
    slice_indices = tf.gather(indices, rows_to_gather)
    slice_values = tf.gather(values, rows_to_gather)
    return slice_indices, slice_values

with tf.Session().as_default():
    indices = tf.constant([[0,0], [1, 0], [2, 0], [2, 1]])
    values = tf.constant([1.0, 1.0, 0.3, 0.7], dtype=tf.float32)
    needed_row_ids = tf.constant([0, 2])
    # Same call and print as in the single-row example above
    slice_indices, slice_values = sparse_slice(indices, values, needed_row_ids)
    print(slice_indices.eval(), slice_values.eval())
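The row-selection logic of sparse_slice can be sketched outside TensorFlow as well. The version below uses NumPy's np.isin on the same example data; the function name sparse_slice_np is an assumption for illustration, and the result would still need to be wrapped back into a SparseTensor by the caller.

```python
import numpy as np

def sparse_slice_np(indices, values, needed_row_ids):
    """Keep the COO entries whose row id is in needed_row_ids (hypothetical helper)."""
    keep = np.isin(indices[:, 0], needed_row_ids)  # boolean mask, one flag per entry
    return indices[keep], values[keep]

indices = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])
values = np.array([1.0, 1.0, 0.3, 0.7], dtype=np.float32)

slice_indices, slice_values = sparse_slice_np(indices, values, np.array([0, 2]))
print(slice_indices.tolist())  # [[0, 0], [2, 0], [2, 1]]
print(slice_values)
```

As in the TensorFlow version, the original row numbers are kept, so row 2 stays row 2 in the output.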
Let sp be the name of your 2d SparseTensor. You can first create an indicator tensor for the rows of your SparseTensor that you want to extract, namely
mask = tf.concat([tf.constant([True, True]), tf.fill([sp.dense_shape[0] - 2],
False)], axis=0)
Next use tf.gather to propagate this to the sparse indices:
mask_sp = tf.gather(mask, sp.indices[:, 0])
Finally,
values = tf.boolean_mask(sp.values, mask_sp)
indices = tf.boolean_mask(sp.indices, mask_sp)
dense_shape = [2, sp.dense_shape[1]]  # two rows are kept (rows 0 and 1)
output_sp = tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)
Shouldn't it behave more like this:
This version will keep the order and frequency of the indices in selected_indices and, therefore, makes it possible to e.g. select the same row multiple times:
import tensorflow as tf
tf.enable_eager_execution()
def sparse_gather(indices, values, selected_indices, axis=0):
    """
    indices: [[idx_ax0, idx_ax1, idx_ax2, ..., idx_axk], ... []]
    values: [value1, ..., valuen]
    """
    mask = tf.equal(indices[:, axis][tf.newaxis, :], selected_indices[:, tf.newaxis])
    to_select = tf.where(mask)[:, 1]
    return tf.gather(indices, to_select, axis=0), tf.gather(values, to_select, axis=0)

indices = tf.constant([[1, 0], [2, 0], [3, 0], [7, 0]])
values = tf.constant([1.0, 2.0, 3.0, 7.0], dtype=tf.float32)
needed_row_ids = tf.constant([7, 3, 2, 2, 3, 7])
slice_indices, slice_values = sparse_gather(indices, values, needed_row_ids)
print(slice_indices, slice_values)
I tried the answer by "Pete Warden", which only worked for small data. Given a SparseTensor A with m nonzero elements from which we want to take n rows, tf.equal needs m*n space, which is not acceptable for my task.
My suggestion is to use scipy.sparse instead of TensorFlow.
In detail:
Take the indices and values out of TensorFlow and build a scipy.sparse matrix in COO format.
If you need to take out rows, convert to CSR format; if you need to take out columns, convert to CSC.
Slice the matrix, e.g. A[m, :] for rows or A[:, m] for columns.
Convert the result back to COO.
Convert it back to a TensorFlow SparseTensor.
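The steps above can be sketched as follows, assuming the indices and values have already been pulled out of TensorFlow as NumPy arrays. The example data and variable names are assumptions for illustration; the final conversion back to a SparseTensor is left to the caller.

```python
import numpy as np
from scipy import sparse

# Assumed example data: COO indices/values of a 3x4 sparse matrix
indices = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])
values = np.array([1.0, 1.0, 0.3, 0.7], dtype=np.float32)
dense_shape = (3, 4)

# 1. Build a COO matrix from the raw indices and values
coo = sparse.coo_matrix((values, (indices[:, 0], indices[:, 1])), shape=dense_shape)

# 2. Convert to CSR for row slicing (CSC would be the choice for columns)
rows = [0, 2]
picked = coo.tocsr()[rows, :]

# 3. Back to COO to read off indices, values and shape again
picked = picked.tocoo()
new_indices = np.stack([picked.row, picked.col], axis=1)
new_values = picked.data
new_shape = picked.shape  # (2, 4)
```

Unlike the TensorFlow snippets earlier in this thread, the rows are renumbered here: original row 2 becomes row 1 of the sliced matrix.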

Tensorflow - pick values from indices, what is the operation called?

An example
Suppose I have a tensor values with shape (2,2,2)
values = [[[0, 1],[2, 3]],[[4, 5],[6, 7]]]
And a tensor indicies with shape (2,2) which describes what values to be selected in the innermost dimension
indicies = [[1,0],[0,0]]
Then the result will be a (2,2) matrix with these values
result = [[1,2],[4,6]]
What is this operation called in tensorflow and how to do it?
General
Note that the above shape (2,2,2) is only an example, it can be any dimension. Some conditions for this operation:
ndim(values) - 1 == ndim(indicies)
values.shape[:-1] == indicies.shape == result.shape
indicies.max() < values.shape[-1]
I think you can emulate this with tf.gather_nd. You will just have to convert "your" indices to a representation that is suitable for tf.gather_nd. The following example here is tied to your specific example, i.e. input tensors of shape (2, 2, 2) but I think this gives you an idea how you could write the conversion for input tensors with arbitrary shape, although I am not sure how easy it would be to implement this (haven't thought about it too long). Also, I'm not claiming that this is the easiest possible solution.
import tensorflow as tf
import numpy as np
values = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
values_tf = tf.constant(values)
indices = np.array([[1, 0], [0, 0]])
converted_idx = []
for k in range(values.shape[0]):
    outer = []
    for l in range(values.shape[1]):
        inds = [k, l, indices[k][l]]
        outer.append(inds)
        print(inds)
    converted_idx.append(outer)
with tf.Session() as sess:
    result = tf.gather_nd(values_tf, converted_idx)
    print(sess.run(result))
This prints
[[1 2]
[4 6]]
Edit: To handle arbitrary shapes here is a recursive solution that should work (only tested on your example):
def convert_idx(last_dim_vals, ori_indices, access_to_ori, depth):
    if depth == len(last_dim_vals.shape) - 1:
        inds = access_to_ori + [ori_indices[tuple(access_to_ori)]]
        return inds
    outer = []
    for k in range(ori_indices.shape[depth]):
        inds = convert_idx(last_dim_vals, ori_indices, access_to_ori + [k], depth + 1)
        outer.append(inds)
    return outer
You can use this together with the original code I posted like so:
...
converted_idx = convert_idx(values, indices, [], 0)
with tf.Session() as sess:
    result = tf.gather_nd(values_tf, converted_idx)
    print(sess.run(result))
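The index conversion can also be sketched without Python loops or recursion, using NumPy to build the coordinate grid for arbitrary shapes. This is an illustrative alternative, not the answer's method; it is shown in NumPy only, on the assumption that the resulting converted_idx array has the same layout that the answer passes to tf.gather_nd.

```python
import numpy as np

values = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
indices = np.array([[1, 0], [0, 0]])

# grid[k] holds the coordinate along axis k for every position in `indices`
grid = np.indices(indices.shape)

# Append the chosen innermost index to each coordinate -> shape (2, 2, 3)
converted_idx = np.stack(list(grid) + [indices], axis=-1)

# Look the values up with those full coordinates
result = values[tuple(np.moveaxis(converted_idx, -1, 0))]

# In pure NumPy the same selection is a single take_along_axis call
same = np.take_along_axis(values, indices[..., None], axis=-1)[..., 0]

print(result.tolist())  # [[1, 2], [4, 6]]
```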

How to perform stencil computations element-wise on a matrix in Theano?

I have the following blurring kernel I need to apply to every pixel in an RGB image
[ 0.0625 0.25 0.375 0.25 0.0625 ]
So, the pseudo-code looks something like this in Numpy
for i in range(rows):
    for j in range(cols):
        for k in range(3):
            final[i][j][k] = image[i-2][j][k]*0.0625 + \
                             image[i-1][j][k]*0.25 + \
                             image[i][j][k]*0.375 + \
                             image[i+1][j][k]*0.25 + \
                             image[i+2][j][k]*0.0625
I've tried searching for a question similar to this but never found these sort of data accesses in the computation.
How do I perform the above function for a Theano tensor matrix?
You can use the conv2d function for this task; see its reference documentation, and the convolution tutorial may also help. Notes for this solution:
Because your kernel is symmetric, you can ignore the filter_flip parameter.
conv2d takes 4D input and kernel shapes, so you need to reshape both first.
conv2d sums over every channel (I think your 'k' variable indexes the RGB channels, right?), so you should separate the channels first.
This is my example code, I use simpler kernel here:
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d
# original image
img = [[[1, 2, 3, 4], #R channel
[1, 1, 1, 1], #
[2, 2, 2, 2]], #
[[1, 1, 1, 1], #G channel
[2, 2, 2, 2], #
[1, 2, 3, 4]], #
[[1, 1, 1, 1], #B channel
[1, 2, 3, 4], #
[2, 2, 2, 2],]]#
# separate and reshape each channel to 4D
R = np.asarray([[img[0]]], dtype='float32')
G = np.asarray([[img[1]]], dtype='float32')
B = np.asarray([[img[2]]], dtype='float32')
# 4D kernel from the original : [1,0,1]
kernel = np.asarray([[[[1],[0],[1]]]], dtype='float32')
# theano convolution
t_img = T.ftensor4("t_img")
t_kernel = T.ftensor4("t_kernel")
result = conv2d(
input = t_img,
filters=t_kernel,
filter_shape=(1,1,1,3),
border_mode = 'half')
f = theano.function([t_img,t_kernel],result)
# compute each channel
R = f(R,kernel)
G = f(G,kernel)
B = f(B,kernel)
# reshape again
img = np.asarray([R,G,B])
img = np.reshape(img,(3,3,4))
print(img)
If you have anything to discuss about the code, please comment. Hope it helps.
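For checking the result of the convolution, the stencil from the question can be sketched in plain NumPy with one vectorized pass per kernel tap. The function name blur_rows and the zero padding at the borders are assumptions (the question's pseudo-code does not say how borders are handled); the kernel weights come from the question's loop.

```python
import numpy as np

kernel = np.array([0.0625, 0.25, 0.375, 0.25, 0.0625])

def blur_rows(image, kernel):
    """Apply a 1-D kernel along axis 0 of a (rows, cols, channels) image.

    Pixels outside the image are treated as zero (an assumption).
    """
    half = len(kernel) // 2
    padded = np.pad(image, ((half, half), (0, 0), (0, 0)), mode="constant")
    rows = image.shape[0]
    out = np.zeros(image.shape, dtype=np.float64)
    # Tap `offset` contributes image[i + offset - half] to output row i
    for offset, weight in enumerate(kernel):
        out += weight * padded[offset:offset + rows]
    return out

image = np.ones((7, 3, 3))
final = blur_rows(image, kernel)
print(final[3, 0, 0])  # 1.0 in the interior, since the kernel sums to 1
```

Each channel is filtered independently here, so no channel separation is needed.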
