I have two tensors, each holding a batch of N images at the same resolution. I would like to convolve the first image in tensor 1 with the first image of tensor 2, the second image of tensor 1 with the second image of tensor 2, and so on. I want the output to be a tensor with N images of the same size.
I looked into using tf.nn.conv2d, but it seems this op takes a batch of N images and convolves all of them with a single filter.
I looked into examples like "What does tf.nn.conv2d do in tensorflow?", but they do not cover multiple images with multiple filters.
You can manage to do something like that using tf.nn.separable_conv2d, using the batch dimension as the separable channels and the actual input channels as the batch dimension. I am not sure it is going to perform very well, though, as it involves several transpositions (which are not free in TensorFlow) and a convolution over a large number of channels, which is not really the optimized use case. Here is how it could work:
import tensorflow as tf
import numpy as np
import scipy.signal
# Expects imgs with shape (B, H, W, C) and filters with shape (B, H, W, 1)
def batch_conv(imgs, filters, strides, padding, rate=None):
    imgs = tf.convert_to_tensor(imgs)
    filters = tf.convert_to_tensor(filters)
    b = tf.shape(imgs)[0]
    imgs_t = tf.transpose(imgs, [3, 1, 2, 0])
    filters_t = tf.transpose(filters, [1, 2, 0, 3])
    strides = [strides[3], strides[1], strides[2], strides[0]]
    # "do-nothing" pointwise filter
    pointwise = tf.eye(b, batch_shape=[1, 1])
    conv = tf.nn.separable_conv2d(imgs_t, filters_t, pointwise, strides, padding, rate)
    return tf.transpose(conv, [3, 1, 2, 0])
# Slow, loop-based version using SciPy's correlate to check result
def batch_conv_np(imgs, filters, padding):
    return np.stack(
        [np.stack([scipy.signal.correlate2d(img[..., i], filter[..., 0], padding.lower())
                   for i in range(img.shape[-1])], axis=-1)
         for img, filter in zip(imgs, filters)], axis=0)
# Make random input
np.random.seed(0)
imgs = np.random.rand(5, 20, 30, 3).astype(np.float32)
filters = np.random.rand(5, 20, 30, 1).astype(np.float32)
padding = 'SAME'
# Test
res_np = batch_conv_np(imgs, filters, padding)
with tf.Graph().as_default(), tf.Session() as sess:
    res_tf = batch_conv(imgs, filters, [1, 1, 1, 1], padding)
    res_tf_val = sess.run(res_tf)
    print(np.allclose(res_np, res_tf_val))
# True
I'm new to TensorFlow and I'm trying to feed some data with tf.data.Dataset. I'm using the Cityscapes dataset with 8 different classes. Here is my code:
import os
import cv2
import numpy as np
import tensorflow as tf
H = 256
W = 256
id2cat = np.array([0,0,0,0,0,0,0, 1,1,1,1, 2,2,2,2,2,2, 3,3,3,3, 4,4, 5, 6,6, 7,7,7,7,7,7,7,7,7])
def readImage(x):
    x = cv2.imread(x, cv2.IMREAD_COLOR)
    x = cv2.resize(x, (W, H))
    x = x / 255.0
    x = x.astype(np.float32)
    return x
def readMask(path):
    mask = cv2.imread(path, 0)
    mask = cv2.resize(mask, (W, H))
    mask = id2cat[mask]
    return mask.astype(np.int32)
def preprocess(x, y):
    def f(x, y):
        image = readImage(x)
        mask = readMask(y)
        return image, mask
    image, mask = tf.numpy_function(f, [x, y], [tf.float32, tf.int32])
    mask = tf.one_hot(mask, 3, dtype=tf.int32)
    image.set_shape([H, W, 3])
    mask.set_shape([H, W, 3])
    return image, mask
def tf_dataset(x, y, batch=8):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(preprocess)
    dataset = dataset.batch(batch)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(2)
    return dataset
def loadCityscape():
    trainPath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'datasets\\Cityscape\\train')
    imagesPath = os.path.join(trainPath, 'images')
    maskPath = os.path.join(trainPath, 'masks')
    images = []
    masks = []
    print('Loading images and masks for Cityscape dataset...')
    for image in os.listdir(imagesPath):
        images.append(readImage(os.path.join(imagesPath, image)))
    for mask in os.listdir(maskPath):
        if 'label' in mask:
            masks.append(readMask(os.path.join(maskPath, mask)))
    print('Loaded {} images\n'.format(len(images)))
    return images, masks
images, masks = loadCityscape()
dataset = tf_dataset(images, masks, batch=8)
print(dataset)
That last print(dataset) shows:
<PrefetchDataset shapes: ((None, 256, 256, 3), (None, 256, 256, 3)), types: (tf.float32, tf.int32)>
Why am I obtaining (None, 256, 256, 3) instead of (8, 256, 256, 3)? I also have some doubts about how to iterate over this dataset.
Thanks a lot.
TensorFlow is a graph-based mathematical framework that abstracts away the complex vector and matrix operations you face, particularly in machine learning.
What the developers thought is that it would be inconvenient to have to specify, every single time, how many input samples you will pass to your model for training, so they decided to abstract that away for you.
You are not interested in whether your model is fed one single sample or thousands, as long as the output dimensions match the input dimensions (and any internal operation matches in dimensions as well!).
So the None size is a placeholder for a possibly changing shape, which is usually the batch size of the input.
We need a placeholder because (None, 2) is a different shape from just (2,): in the first case we know we will be dealing with 2 dimensions.
Even though the None dimension is unknown when you "compile" your model, it is evaluated only when strictly needed, in other words when you run it. This way your model is just as happy to run on a batch of 64 samples as on 128.
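As a small illustration (my own sketch, assuming TF 2.x with tf.keras, none of this is in the original answer): the same model, defined once with an unspecified batch dimension, accepts any batch size at run time:
import numpy as np
import tensorflow as tf

# The batch dimension is left unspecified; only the per-sample shape is fixed.
inp = tf.keras.Input(shape=(256, 256, 3))
out = tf.keras.layers.GlobalAveragePooling2D()(inp)
out = tf.keras.layers.Dense(8)(out)
model = tf.keras.Model(inp, out)

print(model.output_shape)                                     # (None, 8)
print(model(np.zeros((64, 256, 256, 3), np.float32)).shape)   # (64, 8)
print(model(np.zeros((128, 256, 256, 3), np.float32)).shape)  # (128, 8)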
For the rest, a (non-scalar) Tensor behaves like a normal NumPy array:
tensor1 = tf.constant([ 0, 1, 2, 3]) # shape (4, )
tensor2 = tf.constant([ [0], [1], [2], [3]]) # shape (4, 1)
for x in tensor1:
    print(x) # 0, 1, 2, 3
for x in tensor2:
    print(x) # Tensor([0]), Tensor([1]), Tensor([2]), Tensor([3])
The only difference is that it can be allocated into the memory of any supported device (CPU / CUDA GPU).
Iterating through the dataset is just like slicing it into pieces of (usually) constant size, where that constant is your batch size, which fills that empty None dimension.
This line of code is responsible for slicing your dataset into "sub-tensors" ("sub-arrays") made up of its samples:
dataset = dataset.batch(N)
# iterating over it:
for batch in dataset: # I'm taking N samples here
    ...
Your "runtime" shape will be (N, 256, 256, 3), but if you will try to take an element from the dataset it could still have None in the shape... That's because we can't guarantee, for example, that the dimension of the dataset is exactly divisible by the batch size, so some trailing samples of a variable shape could still be possible. You will hardly get rid off that None dimension, but in some custom methods of your model you could achieve that.
If you are still uncomfortable with tensors, there is the tensor.numpy() method, which gives you back a NumPy array at the cost of copying it (usually to your CPU). It is not available at every step of the process.
There are many ways to define a dataset in TensorFlow; I suggest reading how they think you should build an input pipeline, because it will make your life easier once you understand how much TensorFlow lifts your code to higher levels of abstraction.
I am trying to filter a TensorFlow tensor of shape (N_batch, N_data), where N_batch is the batch size (e.g. 32), and N_data is the size of the (noisy) timeseries array. I have a Gaussian kernel (taken from here), which is one-dimensional. I then want to use tensorflow.nn.conv1d to convolve this kernel with my signal.
I have been trying for most of the morning to get the dimensions of the input signal and the kernel right, but obviously with no success. From what I gathered from the interwebs, the dimensions of both the input signal and the kernel need to be aligned in some finicky way, and I just can't figure out which way that is. The TensorFlow error messages aren't particularly meaningful either (Shape must be rank 4 but is rank 3 for 'conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1000], [1,81]). Below I've included a little piece of code to reproduce the situation:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Based on: https://stackoverflow.com/a/52012658/1510542
# Credits to #zephyrus
def gaussian_kernel(size, mean, std):
    d = tf.distributions.Normal(tf.cast(mean, tf.float32), tf.cast(std, tf.float32))
    vals = d.prob(tf.range(start=-size, limit=size+1, dtype=tf.float32))
    kernel = vals # Some reshaping is required here
    return kernel / tf.reduce_sum(kernel)
def gaussian_filter(input, sigma):
    size = int(4*sigma + 0.5)
    x = input # Some reshaping is required here
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    conv = tf.nn.conv1d(x, kernel, stride=1, padding="SAME")
    return conv
def run_filter():
    tf.reset_default_graph()
    # Define size of data, batch sizes
    N_batch = 32
    N_data = 1000
    noise = 0.2 * (np.random.rand(N_batch, N_data) - 0.5)
    x = np.linspace(0, 2*np.pi, N_data)
    y = np.tile(np.sin(x), N_batch).reshape(N_batch, N_data)
    y_noisy = y + noise
    input = tf.placeholder(tf.float32, shape=[None, N_data])
    smooth_input = gaussian_filter(input, sigma=10)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        y_smooth = smooth_input.eval(feed_dict={input: y_noisy})
    plt.plot(y_noisy[0])
    plt.plot(y_smooth[0])
    plt.show()
if __name__ == "__main__":
    run_filter()
Any ideas?
You need to add channel dimensions to your input/kernel, since TF convolutions are generally used for multi-channel inputs/outputs. As you are working with simple 1-channel input/output this amounts to just adding some size-1 "dummy" axes.
Since by default convolution expects channels to come last, your placeholder should have shape [None, N_data, 1] and your input should be modified accordingly:
y_noisy = y + noise
y_noisy = y_noisy[:, :, np.newaxis]
Similarly, you need to add input and output channel dimensions to your filter:
kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
kernel = kernel[:, tf.newaxis, tf.newaxis]
That is, the filter is expected to have shape [width, in_channels, out_channels].
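Putting the two fragments together, here is a minimal sketch of the corrected functions (my own assembly, assuming the TF 1.x API used in the question and an input placeholder of shape [None, N_data, 1]):
def gaussian_kernel(size, mean, std):
    d = tf.distributions.Normal(tf.cast(mean, tf.float32), tf.cast(std, tf.float32))
    vals = d.prob(tf.range(start=-size, limit=size + 1, dtype=tf.float32))
    kernel = vals / tf.reduce_sum(vals)
    # Add the in_channels and out_channels axes: [width] -> [width, 1, 1].
    return kernel[:, tf.newaxis, tf.newaxis]

def gaussian_filter(signal, sigma):
    # `signal` is expected to have shape [batch, N_data, 1] (channels last).
    size = int(4 * sigma + 0.5)
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    return tf.nn.conv1d(signal, kernel, stride=1, padding="SAME")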
I am trying to implement a deep neural network with TensorFlow, but I already have a problem at the very first steps.
When I type the following using theano.tensor.nnet.conv2d, I get the expected result:
import theano.tensor as T
import theano
import numpy as np
# Theano expects input of shape (batch_size, channels, height, width)
# and filters of shape (out_channel, in_channel, height, width)
x = T.tensor4()
w = T.tensor4()
c = T.nnet.conv2d(x, w, filter_flip=False)
f = theano.function([x, w], [c], allow_input_downcast=True)
base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]).T
i = base[np.newaxis, np.newaxis, :, :]
print f(i, i) # -> results in 3 as expected because np.sum(i*i) = 3
However, when I do the presumingly same thing in tf.nn.conv2d, my result is different:
import tensorflow as tf
import numpy as np
# TF expects input of (batch_size, height, width, channels)
# and filters of shape (height, width, in_channel, out_channel)
x = tf.placeholder(tf.float32, shape=(1, 4, 3, 1), name="input")
w = tf.placeholder(tf.float32, shape=(4, 3, 1, 1), name="weights")
c = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')
with tf.Session() as sess:
    base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]).T
    i = base[np.newaxis, :, :, np.newaxis]
    weights = base[:, :, np.newaxis, np.newaxis]
    res = sess.run(c, feed_dict={x: i, w: weights})
    print res # -> results in -5.31794233e+37
The layout of the convolution operation in tensorflow is a little different from theano, which is why the input looks slightly different.
However, since the strides (subsample) in Theano default to (1, 1) and a valid convolution is the default, too, this should be the exact same computation.
Furthermore, tensorflow does not flip the kernel (implements cross-correlation).
Do you have any idea why this is not giving the same result?
Thanks in advance,
Roman
Okay, I found a solution, even though it is not really one because I do not understand it myself.
First, it seems that for the task that I was trying to solve, Theano and Tensorflow use different convolutions.
The task at hand is a "1.5D convolution", which means sliding a kernel in only one direction over the input (here DNA sequences).
In Theano, I solved this with a conv2d operation whose kernel had the same number of rows as the input, and it was working fine.
However, Tensorflow (probably correctly) wants me to use conv1d for that, interpreting the rows as channels.
So, the following should work but didn't in the beginning:
import tensorflow as tf
import numpy as np
# TF expects input of (batch_size, height, width, channels)
# and filters of shape (height, width, in_channel, out_channel)
x = tf.placeholder(tf.float32, shape=(1, 4, 3, 1), name="input")
w = tf.placeholder(tf.float32, shape=(4, 3, 1, 1), name="weights")
x_star = tf.reshape(x, [1, 4, 3])
w_star = tf.reshape(w, [4, 3, 1])
c = tf.nn.conv1d(x_star, w_star, stride=1, padding='VALID')
with tf.Session() as sess:
    base = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]).T
    i = base[np.newaxis, :, :, np.newaxis]
    weights = base[:, :, np.newaxis, np.newaxis]
    res = sess.run(c, feed_dict={x: i, w: weights})
    print res # -> produces 3 after updating tensorflow
This code produced NaN until I updated TensorFlow to version 1.0.1; since then, it produces the expected output.
To summarize, my problem was partly solved by using a 1D convolution instead of a 2D convolution, but it still required updating the framework. As for the second part, I have no idea what caused the wrong behavior in the first place.
EDIT: The code I posted in my original question is now working fine, too. So the different behavior came only from an old (maybe corrupt) version of TF.
I would like to implement the spatial pyramid pooling layer as introduced in this paper.
Following the setting in the paper, the key point is to use a varying kernel size and stride in the max-pooling layer, namely:
kernel_size = ceil(a/n)
stride_size = floor(a/n)
where a is the spatial size of the input tensor, and n is the pyramid level, i.e. the number of spatial bins of the pooling output.
I tried to implement this layer with TensorFlow:
import numpy as np
import tensorflow as tf
def spp_layer(input_, name='SPP_layer'):
    """
    4 level SPP layer.
    spatial bins: [6_6, 3_3, 2_2, 1_1]
    Parameters
    ----------
    input_ : tensor
    name : str
    Returns
    -------
    tensor
    """
    shape = input_.get_shape().as_list()
    with tf.variable_scope(name):
        spp_6_6_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/6).astype(np.int32),
                                             np.ceil(shape[2]/6).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//6, shape[2]//6, 1],
                                      padding='SAME')
        print('SPP layer level 6:', spp_6_6_pool.get_shape().as_list())
        spp_3_3_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/3).astype(np.int32),
                                             np.ceil(shape[2]/3).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//3, shape[2]//3, 1],
                                      padding='SAME')
        print('SPP layer level 3:', spp_3_3_pool.get_shape().as_list())
        spp_2_2_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/2).astype(np.int32),
                                             np.ceil(shape[2]/2).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//2, shape[2]//2, 1],
                                      padding='SAME')
        print('SPP layer level 2:', spp_2_2_pool.get_shape().as_list())
        spp_1_1_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/1).astype(np.int32),
                                             np.ceil(shape[2]/1).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//1, shape[2]//1, 1],
                                      padding='SAME')
        print('SPP layer level 1:', spp_1_1_pool.get_shape().as_list())
        spp_6_6_pool_flat = tf.reshape(spp_6_6_pool, [shape[0], -1])
        spp_3_3_pool_flat = tf.reshape(spp_3_3_pool, [shape[0], -1])
        spp_2_2_pool_flat = tf.reshape(spp_2_2_pool, [shape[0], -1])
        spp_1_1_pool_flat = tf.reshape(spp_1_1_pool, [shape[0], -1])
        spp_pool = tf.concat(1, [spp_6_6_pool_flat,
                                 spp_3_3_pool_flat,
                                 spp_2_2_pool_flat,
                                 spp_1_1_pool_flat])
    return spp_pool
But it cannot guarantee the same pooling output length when the input sizes are different.
How can I solve this problem?
I believe the authors of the paper are wrong; the formulas should be:
stride_size = floor(a/n)
kernel_size = floor(a/n) + (a mod n)
Notice that both formulas give the same result for n < 4.
You can prove this result by doing the Euclidean division of a by n.
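A quick numeric check (my own sketch, not part of the original answer): with 'VALID' pooling the output size is floor((a - ksize) / stride) + 1, and since a - ksize = floor(a/n) * (n - 1) with the formulas above, the output is always exactly n:
for a in range(6, 60):          # input spatial sizes
    for n in (1, 2, 3, 6):      # pyramid levels
        stride = a // n                   # floor(a / n)
        ksize = stride + a % n            # floor(a / n) + (a mod n)
        out = (a - ksize) // stride + 1   # 'VALID' pooling output size
        assert out == n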
I modified the code I found at https://github.com/tensorflow/tensorflow/issues/6011 and here it is:
def spp_layer(input_, levels=(6, 3, 2, 1), name='SPP_layer'):
    shape = input_.get_shape().as_list()
    with tf.variable_scope(name):
        pyramid = []
        for n in levels:
            stride_1 = np.floor(float(shape[1] / n)).astype(np.int32)
            stride_2 = np.floor(float(shape[2] / n)).astype(np.int32)
            ksize_1 = stride_1 + (shape[1] % n)
            ksize_2 = stride_2 + (shape[2] % n)
            pool = tf.nn.max_pool(input_,
                                  ksize=[1, ksize_1, ksize_2, 1],
                                  strides=[1, stride_1, stride_2, 1],
                                  padding='VALID')
            # print("Pool Level {}: shape {}".format(n, pool.get_shape().as_list()))
            pyramid.append(tf.reshape(pool, [shape[0], -1]))
        spp_pool = tf.concat(1, pyramid)
    return spp_pool
Yes, the output size right now is not constant, and looking at your code it seems that your individual pooling operations will have output sizes that alternate between two numbers. The reason is that the output size, at least for 'SAME', is calculated by the formula
out_height = ceil(float(in_height) / float(strides[1]))
If for the stride we use what is essentially the floor of in_height/n, then the output will fluctuate between n and n+1. What you need to do to ensure a constant size is to use the ceil operation for your stride values instead. The altered code for spp_6_6_pool would be
ksize=[1, np.ceil(shape[1]/6).astype(np.int32), np.ceil(shape[2]/6).astype(np.int32), 1]
spp_6_6_pool = tf.nn.max_pool(input_, ksize=ksize, strides=ksize, padding='SAME')
I defined ksize outside of the call to tf.nn.max_pool() for clarity. So, if you use your ksize for your strides too, it should work out. If you round up, then mathematically, as long as your input dimensions are at least double your largest pyramid size n, your output size should be constant with 'SAME' padding!
Somewhat related to your question, in your first max pooling operation your ksize parameter is
ksize=[1, np.ceil(shape[1]/6).astype(np.int32), np.ceil(shape[1]/6).astype(np.int32), 1]
For the third element of ksize, you did shape[1]/6 instead of shape[2]/6. I assumed that was a typo, so I changed it in the above code.
I'm aware that in the paper the stride is taken to be the floor of a/n and not the ceil, but as of now, using the native pooling operations of TensorFlow, there is no way to make that work as desired. 'VALID' pooling will not give you anything near what you want.
Well... if you're really willing to put the time into it, you can take the input size modulo your largest pyramid dimension, in this case 6, and handle all six of these cases independently. I can't find a good justification for that, though. TensorFlow pads differently than other libraries such as, say, Caffe, so inherently there will be differences. The above solution will get you what the paper is aiming for: a pyramid of pooling layers where disjoint regions of the image are max-pooled with differing levels of granularity.
EDIT: Actually, if you use tf.pad() to manually pad the input yourself and create a new input for each max pooling operation such that the new inputs have height and width a neat multiple of n, then it would work out with the code you already have.
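For completeness, a minimal sketch of that tf.pad() idea (my own illustration, assuming NHWC inputs and the TF 1.x API used above; note that zero padding can change the max in border bins if activations are negative):
def pad_to_multiple(x, n):
    # Pad height and width up to the next multiple of n (bottom/right padding).
    shape = x.get_shape().as_list()
    pad_h = (n - shape[1] % n) % n
    pad_w = (n - shape[2] % n) % n
    return tf.pad(x, [[0, 0], [0, pad_h], [0, pad_w], [0, 0]])

def spp_level(x, n):
    # After padding, ksize == stride == padded_size / n gives exactly n bins.
    x = pad_to_multiple(x, n)
    h, w = x.get_shape().as_list()[1:3]
    ksize = [1, h // n, w // n, 1]
    return tf.nn.max_pool(x, ksize=ksize, strides=ksize, padding='VALID')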
I would like to train a network with two different shapes of input tensor, where each epoch uses one of the two. Here is a small piece of code:
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
    imgs1 = tf.placeholder(tf.float32, [4, 224, 224, 3], name = 'input_imgs1')
    imgs2 = tf.placeholder(tf.float32, [4, 180, 180, 3], name = 'input_imgs2')
    epoch_num_tf = tf.placeholder(tf.int32, [], name = 'input_epoch_num')
    imgs = tf.cond(tf.equal(tf.mod(epoch_num_tf, 2), 0),
                   lambda: tf.Print(imgs2, [imgs2.get_shape()], message='(even number) input epoch number is '),
                   lambda: tf.Print(imgs1, [imgs1.get_shape()], message='(odd number) input epoch number is'))
    print(imgs.get_shape())
    for epoch in range(10):
        epoch_num = np.array(epoch).astype(np.int32)
        imgs1_input = np.ones([4, 224, 224, 3], dtype = np.float32)
        imgs2_input = np.ones([4, 180, 180, 3], dtype = np.float32)
        output = sess.run(imgs, feed_dict = {epoch_num_tf: epoch_num,
                                             imgs1: imgs1_input,
                                             imgs2: imgs2_input})
When I execute it, the output of imgs.get_shape() is (4, ?, ?, 3),
i.e. imgs.get_shape()[1] = None and imgs.get_shape()[2] = None.
But in later code I want to use the value of imgs.get_shape() to define the kernel size (ksize) and strides (strides) of tf.nn.max_pool(), e.g. ksize=[1, imgs.get_shape()[1]/6, imgs.get_shape()[2]/6, 1].
I think ksize and strides cannot accept tf.Tensor values.
How can I solve this problem? Or how can I set the shape of imgs conditionally?
When you call print(imgs.get_shape()), you are getting the static shape of the tensor imgs. Since dimensions 1 and 2 of imgs vary dynamically with the value of epoch_num_tf, the static shape in those dimensions is unknown, which TensorFlow represents as None.
If you want to use the dynamic shape of imgs in subsequent code, you should use the tf.shape() operator to get the shape as a tf.Tensor. For example, instead of imgs.get_shape()[2], you can use tf.shape(imgs)[2].
Unfortunately, the ksize and strides arguments of tf.nn.max_pool() do not accept tf.Tensor values. (I think this is a historical limitation, because these were configured as "attrs" rather than "inputs" of the corresponding kernel. Please open a GitHub issue if you'd like to request this feature!) One possible workaround would be to use another tf.cond():
imgs = ...
# Could also use `tf.equal(tf.mod(epoch_num_tf, 2), 0)` as the predicate.
pool_output = tf.cond(tf.equal(tf.shape(imgs)[2], 180),
                      lambda: tf.nn.max_pool(imgs, ksize=[1, 180/6, 180/6, 1], ...),
                      lambda: tf.nn.max_pool(imgs, ksize=[1, 224/6, 224/6, 1], ...))