I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know B, C, and D in advance (they're the result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
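For reference, here is a NumPy sketch of what flatten(ndim) computes, assuming Theano's documented semantics: the first ndim - 1 axes are kept and all remaining axes are merged into one. The helper name flatten_ndim is hypothetical and NumPy stands in for any framework.

```python
import numpy as np

def flatten_ndim(x, ndim):
    """Keep the first ndim - 1 axes of x and merge the remaining axes into one
    (Theano-style flatten(ndim)); -1 lets reshape infer the merged size."""
    if not 1 <= ndim <= x.ndim:
        raise ValueError("ndim must be between 1 and x.ndim")
    return x.reshape(x.shape[:ndim - 1] + (-1,))

x = np.zeros((10, 20, 30, 40))
print(flatten_ndim(x, 3).shape)  # (10, 20, 1200)
```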
It seems that in TensorFlow there's no easy way to do this, and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])])
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints out absolute values, like [10, 20, 30, 40]), after the reshaping get_shape() becomes [10, None, None]. Again, still assume the initial shape isn't known, so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.dynamic_rnn, which expects a fully-defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
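The reasoning above can be sketched in plain Python: a merged dimension has a static size only if every dimension being merged is statically known. This is not TensorFlow's implementation, just a hypothetical helper (flattened_static_shape) that mimics how static shapes propagate, with None meaning "unknown at graph-construction time". It also shows why computing the product from tf.shape() (a runtime tensor) loses the static size even when it could have been known.

```python
from functools import reduce
from typing import List, Optional, Sequence

def flattened_static_shape(shape: Sequence[Optional[int]], ndim: int) -> List[Optional[int]]:
    """Static shape after merging every axis from ndim - 1 onward.
    None marks a dimension that is unknown while the graph is being built."""
    head = list(shape[:ndim - 1])
    tail = shape[ndim - 1:]
    if any(d is None for d in tail):
        merged = None  # the product of an unknown size is itself unknown
    else:
        merged = reduce(lambda a, b: a * b, tail, 1)
    return head + [merged]

print(flattened_static_shape([10, 20, 30, 40], 3))      # [10, 20, 1200]
print(flattened_static_shape([None, None, None, 64], 3))  # [None, None, None]
```

Per the answer above, tf.dynamic_rnn needs the last entry (the feature depth) to be an int, which only happens in the first case.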
I tried some simple code based on your requirements. Since you are trying to reshape a CNN output, X below has the same shape as a CNN output in TensorFlow, and its static dimensions are read with get_shape() so the reshaped feature size is known when the graph is built.
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)
HEIGHT = 100
WIDTH = 200
N_CHANNELS = 3
N_HIDDEN = 64
X = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, N_CHANNELS], name='input')  # output of CNN
shape = X.get_shape().as_list()  # static shape of each dimension: shape[0] = BATCH_SIZE (None), shape[1] = HEIGHT, shape[2] = WIDTH, shape[3] = N_CHANNELS
rnn_input = tf.reshape(X, [-1, shape[1], shape[2] * shape[3]])
print(rnn_input.shape)  # prints (?, 100, 600)
# Input for tf.nn.dynamic_rnn should have the shape [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE]
# Therefore, according to the reshape, N_TIMESTEPS = 100 and INPUT_SIZE = 600
# create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, rnn_input, dtype=tf.float32)
Hope this helps.
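I found a solution to this by using .get_shape().
Assuming 'x' is a 4-D tensor:
This only works with the Keras Reshape layer, and only when dimensions 1 to 3 of x are known statically. Since you were making changes to the architecture of the model anyway, this should work. Note that Reshape takes the target shape without the batch dimension, and the layer is then called on x:
shape = x.get_shape().as_list()
x = tf.keras.layers.Reshape((shape[1], shape[2] * shape[3]))(x)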
Hope this works!
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape
class VisualAttention(Layer):
    def __init__(self, channels_out, key_is_value=True):
        super(VisualAttention, self).__init__()
        self.channels_out = channels_out
        self.key_is_value = key_is_value
        self.flatten_images = None  # see build method
        self.unflatten_images = None  # see build method
        self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
        self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)
    def build(self, input_shape):
        # input_shape is known here, so h, w and c can be used as plain values
        b, h, w, c = input_shape
        self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
        self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))
    def call(self, x, training=True):
        x = self.flatten_images(x)
        q = self.query_conv(x)
        v = self.value_conv(x)
        inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
        output = self.attention_layer(inputs=inputs, training=training)
        return self.unflatten_images(output)
# test
import numpy as np
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)
I am trying to create a simple linear regression neural net for use with batches of images. The input dimensions are [BatchSize, 3, Width, Height] with the second dimension representing the RGB channels of the input image.
Here is my (broken) attempt at that regression model:
import torch
import torch.nn.functional as F
class LinearNet(torch.nn.Module):
    def __init__(self, Chn, W, H, nHidden):
        """
        Input: A [BatchSize x Channels x Width x Height] set of images
        Output: A fitted regression model with weights dimension : [Width x Height]
        """
        super(LinearNet, self).__init__()
        self.Chn = Chn
        self.W = W
        self.H = H
        self.hidden = torch.nn.Linear(Chn*W*H, nHidden)  # hidden layer
        self.predict = torch.nn.Linear(nHidden, Chn*W*H)  # output layer
    def forward(self, x):
        torch.reshape(x, (-1, self.Chn*self.W*self.H))  # FAILS here
        # x = x.resize(-1,self.Chn*self.W*self.H)
        x = F.relu(self.hidden(x))  # activation function for hidden layer
        x = self.predict(x)  # linear output
        x = x.resize(-1, self.Chn, self.W, self.H)
        return x
When sending in a batch of images with dimensions [128 x 3 x 96 x 128] this fails on the indicated line:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (36864x128 and 36864x256)
How should the matrix dimensions be properly manipulated to use these pytorch functions?
Update: Based on a (since deleted) comment, I have updated the code to use torch.reshape.
Solution 1: As a possible solution, you can get the batch size from the input x with x.shape[0] and use it in reshape:
import torch
batch = torch.zeros([128, 3, 96, 128], dtype=torch.float32)
# -1 will compute last dimension automatically
batch_upd = torch.reshape(batch, (batch.shape[0], -1))
print(batch_upd.shape)
The output of this code is:
torch.Size([128, 36864])
Solution 2:
As another possible solution, you can use flatten:
batch_upd = batch.flatten(start_dim=1)
which gives the same output.
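Why the original forward fails: torch.reshape returns a new tensor rather than modifying x in place, and in the question's code that return value is never assigned, so x reaches self.hidden still 4-D. A Linear layer multiplies along the last axis, which is where the "36864x128" in the error comes from (128 * 3 * 96 = 36864 rows of width 128). The sketch below uses NumPy as a stand-in for PyTorch, assuming the same reshape semantics (np.reshape also returns a new array).

```python
import numpy as np

# NumPy stand-in for the [128 x 3 x 96 x 128] batch from the question
batch = np.zeros((128, 3, 96, 128), dtype=np.float32)

np.reshape(batch, (-1, 3 * 96 * 128))  # result discarded: batch itself is unchanged
print(batch.shape)                       # (128, 3, 96, 128)

# A Linear layer works on the LAST axis, so the leading axes are merged into rows
mat1_rows = int(np.prod(batch.shape[:-1]))
print((mat1_rows, batch.shape[-1]))      # (36864, 128), the "mat1" in the error

flat = batch.reshape(batch.shape[0], -1)  # assign the result instead
print(flat.shape)                         # (128, 36864)
```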
As to your next problem, consider going through the modified forward code:
def forward(self, x):
    x = x.flatten(1)  # shape: [B, C, W, H] -> [B, C*W*H]
    x = F.relu(self.hidden(x))  # activation function for hidden layer
    x = self.predict(x)  # linear output
    x = x.reshape((-1, self.Chn, self.W, self.H))  # shape: [B, C*W*H] -> [B, C, W, H]
    return x
Here is the successful usage example:
ln = LinearNet(3, 96, 128, 256)
batch = torch.zeros((128, 3, 96, 128))
res = ln(batch)
print(res.shape) # torch.Size([128, 3, 96, 128])
I am trying to filter a TensorFlow tensor of shape (N_batch, N_data), where N_batch is the batch size (e.g. 32), and N_data is the size of the (noisy) timeseries array. I have a Gaussian kernel (taken from here), which is one-dimensional. I then want to use tensorflow.nn.conv1d to convolve this kernel with my signal.
I have been trying for most of the morning to get the dimensions of the input signal and the kernel right, but obviously with no success. From what I gathered from the interwebs, the dimensions of both the input signal and the kernel need to be aligned in some finicky way, and I just can't figure out which way that is. The TensorFlow error messages aren't particularly meaningful either (Shape must be rank 4 but is rank 3 for 'conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1000], [1,81]). Below I've included a little piece of code to reproduce the situation:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Based on: https://stackoverflow.com/a/52012658/1510542
# Credits to #zephyrus
def gaussian_kernel(size, mean, std):
    d = tf.distributions.Normal(tf.cast(mean, tf.float32), tf.cast(std, tf.float32))
    vals = d.prob(tf.range(start=-size, limit=size+1, dtype=tf.float32))
    kernel = vals  # Some reshaping is required here
    return kernel / tf.reduce_sum(kernel)
def gaussian_filter(input, sigma):
    size = int(4*sigma + 0.5)
    x = input  # Some reshaping is required here
    kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
    conv = tf.nn.conv1d(x, kernel, stride=1, padding="SAME")
    return conv
def run_filter():
    tf.reset_default_graph()
    # Define size of data, batch sizes
    N_batch = 32
    N_data = 1000
    noise = 0.2 * (np.random.rand(N_batch, N_data) - 0.5)
    x = np.linspace(0, 2*np.pi, N_data)
    y = np.tile(np.sin(x), N_batch).reshape(N_batch, N_data)
    y_noisy = y + noise
    input = tf.placeholder(tf.float32, shape=[None, N_data])
    smooth_input = gaussian_filter(input, sigma=10)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        y_smooth = smooth_input.eval(feed_dict={input: y_noisy})
    plt.plot(y_noisy[0])
    plt.plot(y_smooth[0])
    plt.show()
if __name__ == "__main__":
    run_filter()
Any ideas?
You need to add channel dimensions to your input/kernel, since TF convolutions are generally used for multi-channel inputs/outputs. As you are working with simple 1-channel input/output this amounts to just adding some size-1 "dummy" axes.
Since convolution expects channels to come last by default, your placeholder should have shape [None, N_data, 1], and your input should be modified like this:
y_noisy = y + noise
y_noisy = y_noisy[:, :, np.newaxis]
Similarly, you need to add input and output channel dimensions to your filter:
kernel = gaussian_kernel(size=size, mean=0.0, std=sigma)
kernel = kernel[:, tf.newaxis, tf.newaxis]
That is, the filter is expected to have shape [width, in_channels, out_channels].
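To make the shapes concrete, here is a NumPy sketch of the same smoothing with the dummy channel axes added (NumPy stands in for TensorFlow; the unnormalised Gaussian is equivalent to the tf.distributions version once divided by its sum). np.convolve flips the kernel while TF's conv1d does not, which makes no difference here because the Gaussian is symmetric; mode="same" keeps the output length equal to N_data, like padding="SAME".

```python
import numpy as np

def gaussian_kernel(size, std):
    # Discrete Gaussian on [-size, size], normalised to sum to 1
    t = np.arange(-size, size + 1, dtype=np.float64)
    vals = np.exp(-0.5 * (t / std) ** 2)
    return vals / vals.sum()

N_batch, N_data, sigma = 32, 1000, 10
size = int(4 * sigma + 0.5)

signal = np.random.rand(N_batch, N_data)                          # [batch, width]
x = signal[:, :, np.newaxis]                                      # [batch, width, in_channels=1]
kernel = gaussian_kernel(size, sigma)[:, np.newaxis, np.newaxis]  # [width, in_channels, out_channels]

# 'SAME'-length convolution, one batch row (and the single channel) at a time
smooth = np.stack(
    [np.convolve(x[b, :, 0], kernel[:, 0, 0], mode="same") for b in range(N_batch)]
)[:, :, np.newaxis]                                               # [batch, width, out_channels=1]

print(x.shape, kernel.shape, smooth.shape)  # (32, 1000, 1) (81, 1, 1) (32, 1000, 1)
```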
I am trying to develop a seq2seq model from a low-level perspective (creating all the needed tensors myself). I am trying to feed the model a sequence of vectors as a two-dimensional tensor; however, I can't iterate over one dimension of the tensor to extract the vectors one by one. Does anyone know what I could do to feed a batch of vectors and later get them one by one?
This is my code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
gru_layer1 = GRU(input_array, input_dim, hidden_dim) #This is a class created by myself
for i in range(input_array.shape[-1]):
    word = input_array[:,i]
    previous_state = gru_encoder.h_t
    gru_layer1.forward_pass(previous_state,word)
And this is the error that I get
TypeError: Expected binary or unicode string, got <tf.Tensor 'input_7:0' shape=(10, ?) dtype=float64>
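TensorFlow does deferred execution: while the graph is being built, a placeholder is only a symbolic node with no concrete values, so you can't convert it with np.asarray or loop over its unknown dimension in Python.
You usually can't know in advance how long a sequence will be (words in a sentence, audio samples, etc.). The common approach is to cap it at some reasonably large length and then pad the shorter sequences with an empty token.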
Once you do this you can select the data for a time slice with the slice operator:
data = tf.placeholder(tf.float32, shape=(batch_size, max_size, number_of_inputs))
....
for i in range(max_size):
    time_data = data[:, i, :]
    DoStuff(time_data)
Also look up tf.transpose for swapping the batch and time indices. It can help with performance in certain cases.
Alternatively consider something like tf.nn.static_rnn or tf.nn.dynamic_rnn to do the boilerplate stuff for you.
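The padding and slicing approach can be sketched with NumPy standing in for the fed data. The helper pad_batch and the toy sizes are hypothetical; the point is that once every sequence is padded to max_size, the time axis has a fixed Python length, so data[:, i, :] can be taken in an ordinary loop, and a transpose gives the time-major layout.

```python
import numpy as np

def pad_batch(sequences, max_size, pad_value=0.0):
    """Stack variable-length [len_i, n_inputs] arrays into [batch, max_size, n_inputs].
    Longer sequences are truncated; shorter ones are padded with pad_value."""
    n_inputs = sequences[0].shape[1]
    out = np.full((len(sequences), max_size, n_inputs), pad_value, dtype=np.float64)
    lengths = []
    for i, seq in enumerate(sequences):
        n = min(len(seq), max_size)
        out[i, :n] = seq[:n]
        lengths.append(n)
    return out, np.array(lengths)

rng = np.random.default_rng(0)
sentences = [rng.normal(size=(n, 10)) for n in (3, 7, 5)]  # three "sentences" of 10-dim vectors
data, lengths = pad_batch(sentences, max_size=8)

for i in range(data.shape[1]):
    time_data = data[:, i, :]  # [batch, n_inputs] slice for time step i
    assert time_data.shape == (3, 10)

time_major = np.transpose(data, (1, 0, 2))  # [max_size, batch, n_inputs]
print(data.shape, time_major.shape, lengths)  # (3, 8, 10) (8, 3, 10) [3 7 5]
```

Keeping the true lengths alongside the padded batch lets later steps ignore the padding.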
Finally I found an approach that solves my problem: using tf.scan() instead of a Python loop, which doesn't require the second dimension of the input tensor to have a defined size. Since tf.scan() unpacks its input along the first dimension, you have to prepare (here, transpose) the input tensor beforehand so that it is iterated the way you want. In my case this is the code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
x_t = tf.transpose(input_array, [1, 0], name='x_t')
h_0 = tf.convert_to_tensor(h_0, dtype=tf.float64)
h_t_transposed = tf.scan(forward_pass, x_t, h_0, name='h_t_transposed')
h_t = tf.transpose(h_t_transposed, [1, 0], name='h_t')
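The scan pattern can be illustrated without TensorFlow. The plain-Python scan below is a stand-in that assumes tf.scan's documented behaviour (iterate over axis 0 of elems, call fn(previous_result, element), stack the results); forward_pass here is a hypothetical vanilla-RNN step, not the GRU from the question, and the sizes are made up. It shows why the [embedding_dim, time] input is transposed first and the stacked states transposed back.

```python
import numpy as np

def scan(fn, elems, initializer):
    """Plain-Python stand-in for the tf.scan pattern: iterate over the FIRST axis
    of elems, threading the previous result through fn(prev, elem), and stack
    every intermediate result."""
    acc = initializer
    outputs = []
    for elem in elems:
        acc = fn(acc, elem)
        outputs.append(acc)
    return np.stack(outputs)

embedding_dim, hidden_dim, n_words = 10, 5, 7
rng = np.random.default_rng(0)
W = rng.normal(size=(hidden_dim, hidden_dim))     # hypothetical recurrent weights
U = rng.normal(size=(hidden_dim, embedding_dim))  # hypothetical input weights

def forward_pass(h_prev, x_t):
    # hypothetical vanilla-RNN step standing in for the GRU update
    return np.tanh(W @ h_prev + U @ x_t)

input_array = rng.normal(size=(embedding_dim, n_words))  # [embedding_dim, time]
x_t = input_array.T                                      # [time, embedding_dim]
h_0 = np.zeros(hidden_dim)
h_t_transposed = scan(forward_pass, x_t, h_0)            # [time, hidden_dim]
h_t = h_t_transposed.T                                   # [hidden_dim, time]
print(h_t.shape)  # (5, 7)
```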
Since the Keras wrapper does not support attention model yet, I'd like to refer to the following custom attention.
https://github.com/datalogue/keras-attention/blob/master/models/custom_recurrents.py
But the problem is, when I run the code above, it returns the following error:
ImportError: cannot import name '_time_distributed_dense'
It looks like _time_distributed_dense is no longer available in Keras versions after 2.0.0.
The only part that uses the _time_distributed_dense function is the one below:
def call(self, x):
    # store the whole sequence so we can "attend" to it at each timestep
    self.x_seq = x
    # apply a dense layer over the time dimension of the sequence
    # do it here because it doesn't depend on any previous steps
    # therefore we can save computation time:
    self._uxpb = _time_distributed_dense(self.x_seq, self.U_a, b=self.b_a,
                                         input_dim=self.input_dim,
                                         timesteps=self.timesteps,
                                         output_dim=self.units)
    return super(AttentionDecoder, self).call(x)
How should I change the _time_distributed_dense(self ... ) part?
I'm just reposting An Chen's answer from the GitHub issue (the page or his answer might be deleted in the future):
from keras import backend as K  # standalone Keras 2.x backend, as used by the linked repository
def _time_distributed_dense(x, w, b=None, dropout=None,
                            input_dim=None, output_dim=None,
                            timesteps=None, training=None):
    """Apply `y . w + b` for every temporal slice y of x.
    # Arguments
        x: input tensor.
        w: weight matrix.
        b: optional bias vector.
        dropout: whether to apply dropout (same dropout mask
            for every temporal slice of the input).
        input_dim: integer; optional dimensionality of the input.
        output_dim: integer; optional dimensionality of the output.
        timesteps: integer; optional number of timesteps.
        training: training phase tensor or boolean.
    # Returns
        Output tensor.
    """
    if not input_dim:
        input_dim = K.shape(x)[2]
    if not timesteps:
        timesteps = K.shape(x)[1]
    if not output_dim:
        output_dim = K.shape(w)[1]
    if dropout is not None and 0. < dropout < 1.:
        # apply the same dropout pattern at every timestep
        ones = K.ones_like(K.reshape(x[:, 0, :], (-1, input_dim)))
        dropout_matrix = K.dropout(ones, dropout)
        expanded_dropout_matrix = K.repeat(dropout_matrix, timesteps)
        x = K.in_train_phase(x * expanded_dropout_matrix, x, training=training)
    # collapse time dimension and batch dimension together
    x = K.reshape(x, (-1, input_dim))
    x = K.dot(x, w)
    if b is not None:
        x = K.bias_add(x, b)
    # reshape to 3D tensor
    if K.backend() == 'tensorflow':
        x = K.reshape(x, K.stack([-1, timesteps, output_dim]))
        x.set_shape([None, None, output_dim])
    else:
        x = K.reshape(x, (-1, timesteps, output_dim))
    return x
You can just add this function to your own Python code.
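The core trick of that helper (merge the batch and time axes, do one matrix product, split them again) can be checked with a small NumPy sketch. This is a stand-in that omits the dropout branch; time_distributed_dense and the toy sizes are illustrative only. The comparison with a per-timestep loop shows that the single reshaped product gives the same result as applying the dense layer to each time slice.

```python
import numpy as np

def time_distributed_dense(x, w, b=None):
    """Apply y . w + b to every time slice y of x (no dropout):
    merge batch and time axes, do one matrix product, then split them again."""
    batch, timesteps, input_dim = x.shape
    output_dim = w.shape[1]
    flat = x.reshape(-1, input_dim)  # [batch * timesteps, input_dim]
    out = flat @ w                   # [batch * timesteps, output_dim]
    if b is not None:
        out = out + b
    return out.reshape(batch, timesteps, output_dim)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 3))  # [batch, timesteps, input_dim]
w = rng.normal(size=(3, 2))     # [input_dim, output_dim]
b = rng.normal(size=(2,))
y = time_distributed_dense(x, w, b)

# Same result as applying the dense layer step by step
loop = np.stack([x[:, t, :] @ w + b for t in range(x.shape[1])], axis=1)
print(y.shape, np.allclose(y, loop))  # (4, 6, 2) True
```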
I'm currently trying to understand how TensorFlow's depthwise convolution works. As far as I've understood, each channel in the input image is convolved with its own set of filters, and then the results are concatenated. For simplicity, I'll stick with depth_multiplier=1 for the rest of this question, so n_inputchannels == n_outputchannels.
So in theory, I could split up the depthwise convolution into N individual, regular Conv2Ds, correct? Why, then, does the following code produce different results? Is this a precision issue? I'm following the documentation for the ordering [filter_height, filter_width, in_channels, 1] for the depthwise convolution filters, [filter_height, filter_width, in_channels, out_channels] for the regular convolutions, and the NHWC data format.
import tensorflow as tf
import numpy as np
import random
width = 128
height = 128
channels = 32
kernel_width = 3
kernel_height = 3
with tf.Session() as sess:
    _input = np.float32(np.random.rand(1, height, width, channels))
    _weights = np.float32(np.random.rand(kernel_height, kernel_width, channels, 1))
    _input_ph = tf.placeholder(tf.float32, shape=(1, height, width, channels))
    _weights_pc = tf.placeholder(tf.float32, shape=(kernel_height, kernel_width, channels, 1))
    feed = { _input_ph: _input, _weights_pc : _weights }
    result = tf.nn.depthwise_conv2d(_input_ph, _weights_pc, [1,1,1,1], 'SAME')
    individual_results = []
    for i in range(channels):
        individual_results.append(tf.nn.conv2d(tf.expand_dims(_input_ph[:,:,:,i],axis=3), tf.expand_dims(_weights_pc[:,:,i,:],axis=3), [1,1,1,1], 'SAME'))
    depth_result = sess.run(result, feed_dict=feed)
    concat_result = sess.run(tf.concat(individual_results, axis=3), feed_dict=feed)
    channel_diff = 0.0
    for i in range(channels):
        channel_diff += np.sum(depth_result[:,:,:,i]-concat_result[:,:,:,i])
    print(channel_diff)
Here I first compute the regular tf.nn.depthwise_conv2d, then slice the input and weights accordingly and run the individual tf.nn.conv2d ops. For these parameters I get a difference of about 1e-5, but it tends to grow when I increase the number of channels.
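As a sanity check of the decomposition itself (not of TensorFlow's kernels), here is a NumPy sketch that implements depthwise and regular 2-D cross-correlation with TF-style 'SAME' padding for stride 1, then compares the depthwise result with the per-channel conv2d results in float64. The functions and small sizes are illustrative assumptions. It measures the largest absolute difference rather than a signed sum, since positive and negative errors can cancel in a signed sum. It only checks the math; it cannot show whether the float32 gap in TF is pure rounding.

```python
import numpy as np

def same_pad(x, kh, kw):
    # 'SAME' padding for stride 1: any extra row/column goes at the end
    ph, pw = (kh - 1) // 2, (kw - 1) // 2
    return np.pad(x, ((0, 0), (ph, kh - 1 - ph), (pw, kw - 1 - pw), (0, 0)))

def depthwise_conv2d(x, w):
    """x: [N, H, W, C], w: [kh, kw, C, 1] -> [N, H, W, C] (cross-correlation)."""
    kh, kw = w.shape[:2]
    n_h, n_w = x.shape[1:3]
    p = same_pad(x, kh, kw)
    out = np.zeros(x.shape, dtype=x.dtype)
    for dy in range(kh):
        for dx in range(kw):
            out += p[:, dy:dy + n_h, dx:dx + n_w, :] * w[dy, dx, :, 0]
    return out

def conv2d(x, w):
    """x: [N, H, W, Cin], w: [kh, kw, Cin, Cout] -> [N, H, W, Cout] (cross-correlation)."""
    kh, kw, _, cout = w.shape
    n, n_h, n_w, _ = x.shape
    p = same_pad(x, kh, kw)
    out = np.zeros((n, n_h, n_w, cout), dtype=x.dtype)
    for dy in range(kh):
        for dx in range(kw):
            # contract the input-channel axis against w[dy, dx] of shape [Cin, Cout]
            out += p[:, dy:dy + n_h, dx:dx + n_w, :] @ w[dy, dx]
    return out

rng = np.random.default_rng(0)
height = width = 16
channels, k = 8, 3
x = rng.random((1, height, width, channels))
w = rng.random((k, k, channels, 1))

depth_result = depthwise_conv2d(x, w)
per_channel = [conv2d(x[..., i:i + 1], w[:, :, i:i + 1, :]) for i in range(channels)]
concat_result = np.concatenate(per_channel, axis=3)

print(np.max(np.abs(depth_result - concat_result)))
```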
I would be really glad if someone could explain to me what's going on :)
Thanks!