I am trying to filter a TensorFlow tensor of shape (N_batch, N_data), where N_batch is the batch size (e.g. 32), and N_data is the size of the (noisy) timeseries array. I have a Gaussian kernel (taken from here), which is one-dimensional. I then want to use tensorflow.nn.conv1d to convolve this kernel with my signal.
I have been trying for most of the morning to get the dimensions of the input signal and the kernel right, but obviously with no success. From what I gathered from the interwebs, the dimensions of both the input signal and the kernel need to be aligned in some finicky way, and I just can't figure out which way that is. The TensorFlow error messages aren't particularly meaningful either (Shape must be rank 4 but is rank 3 for 'conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1000], [1,81]). Below I've included a little piece of code to reproduce the situation:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Based on: https://stackoverflow.com/a/52012658/1510542
# Credits to @zephyrus
def gaussian_kernel(size, mean, std):
    d = tf.distributions.Normal(tf.cast(mean, tf.float32), tf.cast(std, tf.float32))
    vals = d.prob(tf.range(start=-size, limit=size+1, dtype=tf.float32))
    kernel = vals  # Some reshaping is required here
    return kernel / tf.reduce_sum(kernel)

def gaussian_filter(input, sigma):
    size = int(4*sigma + 0.5)
    x = input  # Some reshaping is required here
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    conv = tf.nn.conv1d(x, kernel, stride=1, padding="SAME")
    return conv
def run_filter():
    tf.reset_default_graph()

    # Define size of data, batch sizes
    N_batch = 32
    N_data = 1000

    noise = 0.2 * (np.random.rand(N_batch, N_data) - 0.5)
    x = np.linspace(0, 2*np.pi, N_data)
    y = np.tile(np.sin(x), N_batch).reshape(N_batch, N_data)
    y_noisy = y + noise

    input = tf.placeholder(tf.float32, shape=[None, N_data])
    smooth_input = gaussian_filter(input, sigma=10)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        y_smooth = smooth_input.eval(feed_dict={input: y_noisy})

    plt.plot(y_noisy[0])
    plt.plot(y_smooth[0])
    plt.show()

if __name__ == "__main__":
    run_filter()
Any ideas?
You need to add channel dimensions to your input/kernel, since TF convolutions are generally used for multi-channel inputs/outputs. As you are working with simple 1-channel input/output this amounts to just adding some size-1 "dummy" axes.
Since by default convolution expects channels to come last, your placeholder should have shape [None, N_data, 1] and your input should be modified like:
y_noisy = y + noise
y_noisy = y_noisy[:, :, np.newaxis]
Similarly, you need to add input and output channel dimensions to your filter:
kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
kernel = kernel[:, tf.newaxis, tf.newaxis]
That is, the filter is expected to have shape [width, in_channels, out_channels].
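Putting the two changes together, a minimal sketch (my own assembly of the fixes above, reusing the question's functions; here the dummy input axis is added inside the graph rather than on the placeholder) could look like:
def gaussian_filter(input, sigma):
    size = int(4*sigma + 0.5)
    x = input[:, :, tf.newaxis]                 # (batch, N_data) -> (batch, N_data, 1)
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    kernel = kernel[:, tf.newaxis, tf.newaxis]  # (width,) -> (width, in_channels=1, out_channels=1)
    conv = tf.nn.conv1d(x, kernel, stride=1, padding="SAME")
    return conv[:, :, 0]                        # drop the dummy channel axis again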
I'm new to TensorFlow and I'm trying to feed some data with tensorflow.Dataset. I'm using the Cityscape dataset with 8 different classes. Here is my code:
import os
import cv2
import numpy as np
import tensorflow as tf
H = 256
W = 256
id2cat = np.array([0,0,0,0,0,0,0, 1,1,1,1, 2,2,2,2,2,2, 3,3,3,3, 4,4, 5, 6,6, 7,7,7,7,7,7,7,7,7])
def readImage(x):
    x = cv2.imread(x, cv2.IMREAD_COLOR)
    x = cv2.resize(x, (W, H))
    x = x / 255.0
    x = x.astype(np.float32)
    return x

def readMask(path):
    mask = cv2.imread(path, 0)
    mask = cv2.resize(mask, (W, H))
    mask = id2cat[mask]
    return mask.astype(np.int32)

def preprocess(x, y):
    def f(x, y):
        image = readImage(x)
        mask = readMask(y)
        return image, mask

    image, mask = tf.numpy_function(f, [x, y], [tf.float32, tf.int32])
    mask = tf.one_hot(mask, 3, dtype=tf.int32)
    image.set_shape([H, W, 3])
    mask.set_shape([H, W, 3])
    return image, mask

def tf_dataset(x, y, batch=8):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(preprocess)
    dataset = dataset.batch(batch)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(2)
    return dataset

def loadCityscape():
    trainPath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'datasets\\Cityscape\\train')
    imagesPath = os.path.join(trainPath, 'images')
    maskPath = os.path.join(trainPath, 'masks')

    images = []
    masks = []

    print('Loading images and masks for Cityscape dataset...')
    for image in os.listdir(imagesPath):
        images.append(readImage(os.path.join(imagesPath, image)))
    for mask in os.listdir(maskPath):
        if 'label' in mask:
            masks.append(readMask(os.path.join(maskPath, mask)))
    print('Loaded {} images\n'.format(len(images)))

    return images, masks
images, masks = loadCityscape()
dataset = tf_dataset(images, masks, batch=8)
print(dataset)
That last print(dataset) shows:
<PrefetchDataset shapes: ((None, 256, 256, 3), (None, 256, 256, 3)), types: (tf.float32, tf.int32)>
Why am I obtaining (None, 256, 256, 3) instead of (8, 256, 256, 3)? I also have some doubts about how to iterate over this dataset.
Thanks a lot.
TensorFlow is a graph-based mathematical framework that abstracts away all of those complex vector and matrix operations you face, particularly in machine learning.
What the developers thought is that it would be inconvenient to specify, every single time, how many input vectors you need to pass to your model for training, so they decided to abstract it for you.
You are not interested in whether your model is fed one single sample or thousands, as long as the output matches the input dimension (and any internal operation matches in dimensions too!).
So the None size is a placeholder for a possibly changing shape, which is usually the batch size of the input.
We need a placeholder because (None, 2) is a different shape from just (2,): in the first case we know we are dealing with 2 dimensions.
Even though the None dimension is unknown when you "compile" your model, it is evaluated only when strictly needed, in other words when you run it. This way your model will be happy to run on a batch of 64 samples just as well as on 128.
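For instance, a minimal Keras sketch (my illustration, with a made-up layer size) shows the same model accepting different batch sizes:
import tensorflow as tf

x = tf.keras.Input(shape=(2,))           # the batch dimension is left as None
y = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(x, y)

print(model(tf.zeros((64, 2))).shape)    # (64, 1)
print(model(tf.zeros((128, 2))).shape)   # (128, 1)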
For the rest a (non-scalar) Tensor behaves like a normal numpy array:
tensor1 = tf.constant([0, 1, 2, 3])          # shape (4,)
tensor2 = tf.constant([[0], [1], [2], [3]])  # shape (4, 1)

for x in tensor1:
    print(x)  # 0, 1, 2, 3

for x in tensor2:
    print(x)  # Tensor([0]), Tensor([1]), Tensor([2]), Tensor([3])
The only difference is that it can be allocated into any supported device memory (CPU / Cuda GPU).
Iterating through the dataset is just like slicing it at (usually) constant sizes, where that constant is your batch size, which fills that empty None dimension.
This line of code is responsible for slicing your dataset into "sub-tensors" ("sub-arrays") made up of its samples:
dataset = dataset.batch(N)

# iterating over it:
for batch in dataset:  # I'm taking N samples here
    ...
Your "runtime" shape will be (N, 256, 256, 3), but if you will try to take an element from the dataset it could still have None in the shape... That's because we can't guarantee, for example, that the dimension of the dataset is exactly divisible by the batch size, so some trailing samples of a variable shape could still be possible. You will hardly get rid off that None dimension, but in some custom methods of your model you could achieve that.
If you are still uncomfortable with tensors, there is the tensor.numpy() method that gives you back a numpy array, but at the cost of copying it (usually to your CPU). It is not available at every step of the process.
There are many ways to define a dataset in TensorFlow; I suggest reading how the developers think you should build an input pipeline, because it will make your life easier once you understand how much TensorFlow lifts your code to higher levels of abstraction.
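For example, to peek at one batch of the pipeline from the question (a sketch assuming eager execution and the dataset built above):
for images, masks in dataset.take(1):
    print(images.shape, masks.shape)  # (8, 256, 256, 3) (8, 256, 256, 3) for a full batch
    first_image = images[0].numpy()   # copies the tensor back into a numpy array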
I am trying to feed a pixel vector to a convolutional neural network (CNN), where the pixel vector comes from image data like the CIFAR-10 dataset. Before feeding the pixel vector to the CNN, I need to expand it with a Maclaurin series. The point is, I figured out how to expand a tensor with one dim, but I am not able to get it right for a tensor with dim > 2. Can anyone give me ideas on how to apply the Maclaurin series of a one-dim tensor to a tensor with more than one dim? Is there any heuristic approach to implement this in either TensorFlow or Keras? Any possible thoughts?
Maclaurin series on CNN: (formula shown as an image in the original post)
I figured out a way of expanding a tensor with 1 dim using the Maclaurin series. Here is what the scratch implementation looks like:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda

def cnn_taylor(input_dim, approx_order=2):
    x = Input((input_dim,))

    def pwr(x, approx_order):
        x = x[..., None]
        x = tf.tile(x, multiples=[1, 1, approx_order + 1])
        pw = tf.range(0, approx_order + 1, dtype=tf.float32)
        x_p = tf.pow(x, pw)
        x_p = x_p[..., None]
        return x_p

    x_p = Lambda(lambda x: pwr(x, approx_order))(x)
    h = Dense(1, use_bias=False)(x_p)

    def cumu_sum(h):
        h = tf.squeeze(h, axis=-1)
        s = tf.cumsum(h, axis=-1)
        s = s[..., None]
        return s

    S = Lambda(cumu_sum)(h)
So the above implementation is a sketch of an attempt to expand a CNN with a Taylor expansion using a 1-dim tensor. I am wondering how to do the same thing for a tensor with a multi-dim array (i.e., dim=3).
If I want to expand a CNN with an approximation order of 2 with a Taylor expansion, where the input is a pixel vector from an RGB image, how would I accomplish this easily in TensorFlow? Any thoughts? Thanks
If I understand correctly, each x in the provided computational graph is just a scalar (one channel of a pixel). In this case, in order to apply the transformation to each pixel, you could:
Flatten the 4D (b, h, w, c) input coming from the convolutional layer into a tensor of shape (b, h*w*c).
Apply the transformation to the resulting tensor.
Undo the reshaping to get a 4D tensor of shape (b, h, w, c) back, for which the "Taylor expansion" has been applied element-wise.
This could be achieved as follows:
shape_cnn = h.shape # Shape=(bs, h, w, c)
flat_dim = h.shape[1] * h.shape[2] * h.shape[3]
h = tf.reshape(h, (-1, flat_dim))
taylor_model = taylor_expansion_network(input_dim=flat_dim, max_pow=approx_order)
h = taylor_model(h)
h = tf.reshape(h, (-1, shape_cnn[1], shape_cnn[2], shape_cnn[3]))
NOTE: I am borrowing the function taylor_expansion_network from this answer.
UPDATE: I still don't clearly understand the end goal, but perhaps this update brings us closer to the desired output. I modified the taylor_expansion_network to apply the first part of the pipeline to RGB images of shape (width, height, nb_channels=3), returning a tensor of shape (width, height, nb_channels=3, max_pow+1):
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def taylor_expansion_network_2(width, height, nb_channels=3, max_pow=2):
    input_dim = width * height * nb_channels

    x = Input((width, height, nb_channels,))
    h = tf.reshape(x, (-1, input_dim))

    # Raise input x_i to power p_i for each i in [0, max_pow].
    def raise_power(x, max_pow):
        x_ = x[..., None]  # Shape=(batch_size, input_dim, 1)
        x_ = tf.tile(x_, multiples=[1, 1, max_pow + 1])  # Shape=(batch_size, input_dim, max_pow+1)
        pows = tf.range(0, max_pow + 1, dtype=tf.float32)  # Shape=(max_pow+1,)
        x_p = tf.pow(x_, pows)  # Shape=(batch_size, input_dim, max_pow+1)
        return x_p

    h = raise_power(h, max_pow)

    # Compute s_i for each i in [0, max_pow]
    h = tf.cumsum(h, axis=-1)  # Shape=(batch_size, input_dim, max_pow+1)

    # Get the input format back
    h = tf.reshape(h, (-1, width, height, nb_channels, max_pow + 1))  # Shape=(batch_size, w, h, nb_channels, max_pow+1)

    # Return Taylor expansion model
    model = Model(inputs=x, outputs=h)
    model.summary()
    return model
In this modified model, the last step of the pipeline, namely the sum of w_i * s_i for each i, is not applied. Now, you can use the resulting tensor of shape (width, height, nb_channels=3, max_pow+1) in any way you want.
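As a quick sanity check of the shapes (a hypothetical usage sketch with made-up sizes, not part of the original answer):
import numpy as np

model = taylor_expansion_network_2(width=32, height=32, nb_channels=3, max_pow=2)
batch = np.random.rand(4, 32, 32, 3).astype('float32')
print(model(batch).shape)  # (4, 32, 32, 3, 3), i.e. (batch, w, h, channels, max_pow+1)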
I am trying to apply a convolutional layer to a picture of shape [256, 256, 3].
I get an error when I use the tensor of the image directly:
conv1 = conv2d(input, W_conv1) + b_conv1  # <=== error
error message:
ValueError: Shape must be rank 4 but is rank 3 for 'Conv2D' (op: 'Conv2D')
with input shapes: [256,256,3], [3,3,3,1].
but when I reshape, the conv2d function works normally:
x_image = tf.reshape(input,[-1,256,256,3])
conv1 = conv2d(x_image,W_conv1) +b_conv1
If I must reshape the tensor, what is the best value to reshape to in my case, and why?
import tensorflow as tf
import numpy as np
from PIL import Image
def img_to_tensor(img):
    return tf.convert_to_tensor(img, np.float32)

def weight_generater(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_generater(shape):
    return tf.Variable(tf.constant(.1, shape=shape))

def conv2d(x, W):
    return tf.nn.conv2d(x, W, [1,1,1,1], 'SAME')

def pool_max_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,1,1,1], padding='SAME')
#read image
img = Image.open("img.tif")
sess = tf.InteractiveSession()
# convert image to tensor
input = img_to_tensor(img).eval()
#print(input)
# get img dimension
img_dimension = tf.shape(input).eval()
print(img_dimension)
height,width,channel=img_dimension
filter_size = 3
feature_map = 32
x = tf.placeholder(tf.float32,shape=[height*width*channel])
y = tf.placeholder(tf.float32,shape=21)
# generate weight [kernel size, kernel size, channel, number of filters]
W_conv1 = weight_generater([filter_size,filter_size,channel,1])
# for each filter, W has its specific bias
b_conv1 = bias_generater([feature_map])
""" I must reshape the picture
x_image = tf.reshape(input,[-1,256,256,3])
"""
conv1 = conv2d(input,W_conv1) +b_conv1 #<=== error
h_conv1 = tf.nn.relu(conv1)
h_pool1 = pool_max_2x2(h_conv1)
layer1_dimension = tf.shape(h_pool1).eval()
print(layer1_dimension)
The first dimension is the batch size. If you are feeding 1 image at a time you can simply make the first dimension 1; that doesn't change your data at all, it just changes the indexing to 4D:
x_image = tf.reshape(input, [1, 256, 256, 3])
If you reshape it with a -1 in the first dimension, what you are saying is that you will feed in a 4D batch of images (shaped [batch_size, height, width, color_channels]), and you are allowing the batch size to be dynamic (which is common to do).
You could also use
im = tf.expand_dims(input, axis=0)
to insert a dimension of 1 into the tensor's shape. im will be a rank 4 tensor. This way you do not have to specify the dimensions of the image.
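A quick check that the two approaches produce the same shape (a sketch with made-up values):
import tensorflow as tf
import numpy as np

img = np.zeros((256, 256, 3), dtype=np.float32)
print(tf.reshape(img, [1, 256, 256, 3]).shape)  # (1, 256, 256, 3)
print(tf.expand_dims(img, axis=0).shape)        # (1, 256, 256, 3)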
I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know the B, C, and D in advance (they're a result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
It seems that in TensorFlow there's no easy way to do this and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])])
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints out absolute values, like [10, 20, 30, 40]), after the reshaping get_shape() becomes [10, None, None]. Again, still assume the initial shape isn't known, so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.dynamic_rnn, which expects a fully-defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
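One workaround sketch (my assumption, not part of the original answer): if the last two dimensions are in fact static at graph-building time, you can mix static and dynamic shapes so the feature dimension stays defined for dynamic_rnn:
static_shape = x.get_shape().as_list()  # e.g. [None, None, 30, 40]
dynamic_shape = tf.shape(x)
x = tf.reshape(x, [dynamic_shape[0], dynamic_shape[1], static_shape[2] * static_shape[3]])
# x.get_shape() is now [None, None, 1200], which shape inference can work with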
I tried a simple piece of code according to your requirements. Since you are trying to reshape a CNN output, the shape of X is the same as the output of a CNN in TensorFlow.
HEIGHT = 100
WIDTH = 200
N_CHANELS = 3
N_HIDDEN = 64
X = tf.placeholder(tf.float32, shape=[None,HEIGHT,WIDTH,N_CHANELS],name='input') # output of CNN
shape = X.get_shape().as_list()  # get the shape of each dimension: shape[0] = BATCH_SIZE, shape[1] = HEIGHT, shape[2] = WIDTH, shape[3] = N_CHANELS
input = tf.reshape(X, [-1, shape[1] , shape[2] * shape[3]])
print(input.shape) # prints (?, 100, 600)
#Input for tf.nn.dynamic_rnn should be in the shape of [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE]
#Therefore, according to the reshape N_TIMESTEPS = 100 and INPUT_SIZE= 600
#create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, input, dtype=tf.float32)
Hope this helps.
I found a solution to this by using .get_shape().
Assuming 'x' is a 4-D Tensor.
This will only work with the Reshape layer. As you were making changes to the architecture of the model, this should work:
x = tf.keras.layers.Reshape((x.get_shape()[1], x.get_shape()[2] * x.get_shape()[3]))(x)
Hope this works!
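Note that tf.keras.layers.Reshape takes a target shape that excludes the batch dimension and is then called on the tensor. A hypothetical concrete case:
# for x with static shape (None, 10, 20, 30):
x = tf.keras.layers.Reshape((10, 20 * 30))(x)  # -> shape (None, 10, 600)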
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape
class VisualAttention(Layer):
    def __init__(self, channels_out, key_is_value=True):
        super(VisualAttention, self).__init__()
        self.channels_out = channels_out
        self.key_is_value = key_is_value

        self.flatten_images = None  # see build method
        self.unflatten_images = None  # see build method

        self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
        self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')

        self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)

    def build(self, input_shape):
        b, h, w, c = input_shape
        self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
        self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))

    def call(self, x, training=True):
        x = self.flatten_images(x)
        q = self.query_conv(x)
        v = self.value_conv(x)
        inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
        output = self.attention_layer(inputs=inputs, training=training)
        return self.unflatten_images(output)
# test
import numpy as np
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)
I'm currently trying to understand how TensorFlow's depthwise convolution works. As far as I've understood, each channel in the input image is convolved with its own set of filters, and then the results are concatenated. I'm going to stick with the parameter depth_multiplier=1 for the sake of simplicity in the remainder, so n_inputchannels == n_outputchannels.
So in theory, I could split up the depthwise convolution into N individual, regular Conv2Ds, correct? Why, then, does the following code produce different results? Is this a precision issue? I'm following the documentation for the ordering [filter_height, filter_width, in_channels, 1] for the depthwise convolution filters, [filter_height, filter_width, in_channels, out_channels] for the regular convolutions, and NHWC data format.
import tensorflow as tf
import numpy as np
import random
width = 128
height = 128
channels = 32
kernel_width = 3
kernel_height = 3
with tf.Session() as sess:
    _input = np.float32(np.random.rand(1, height, width, channels))
    _weights = np.float32(np.random.rand(kernel_height, kernel_width, channels, 1))

    _input_ph = tf.placeholder(tf.float32, shape=(1, height, width, channels))
    _weights_pc = tf.placeholder(tf.float32, shape=(kernel_height, kernel_width, channels, 1))

    feed = {_input_ph: _input, _weights_pc: _weights}

    result = tf.nn.depthwise_conv2d(_input_ph, _weights_pc, [1,1,1,1], 'SAME')

    individual_results = []
    for i in range(channels):
        individual_results.append(tf.nn.conv2d(tf.expand_dims(_input_ph[:,:,:,i], axis=3), tf.expand_dims(_weights_pc[:,:,i,:], axis=3), [1,1,1,1], 'SAME'))

    depth_result = sess.run(result, feed_dict=feed)
    concat_result = sess.run(tf.concat(individual_results, axis=3), feed_dict=feed)

    channel_diff = 0.0
    for i in range(channels):
        channel_diff += np.sum(depth_result[:,:,:,i] - concat_result[:,:,:,i])
    print(channel_diff)
Here I first compute the normal tf.nn.depthwise_conv2d, then slice the input and weights accordingly and run individual tf.nn.conv2ds. For these parameters I get a difference of about 1e-5, but it tends to grow when I increase the number of channels.
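Note that summing signed differences lets positive and negative errors cancel; a stricter per-element comparison (a sketch using the arrays computed above) would be:
max_abs_diff = np.max(np.abs(depth_result - concat_result))
print(max_abs_diff)  # worst-case per-element deviation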
I would be really glad if someone could explain to me what's going on :)
Thanks!