How to use tf.nn.raw_rnn function in Tensorflow? - python

I am trying to implement an LSTM-based network where, after the hidden state computation, we also apply a linear + sigmoid transformation at each time step. I have found the official documentation and a nice article describing the tf.nn.raw_rnn function as suitable for this task; however, I struggle to understand why it does not work in my particular case.
Input description
So, let our input to the LSTM be a minibatch of shape [num_steps x batch_size x size], concretely [5, 32, 100]. Let the LSTM have 200 hidden units. Then the output of the LSTM is a [5, 32, 200] tensor which we can later use for loss computation. I assume the input [5, 32, 100] tensor is first unstacked into an array of [32, 100] tensors and then stacked back when we use tf.nn.dynamic_rnn with time_major=True in Tensorflow:
tf.nn.dynamic_rnn(LSTM)
LSTM t=0 LSTM t=1 LSTM t=2 LSTM t=3 LSTM t=4
[5, 32, 100] --> [[32, 100], [32, 100], [32, 100], [32, 100], [32, 100]] --> [5, 32, 200]
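For reference, a minimal dynamic_rnn sketch for these shapes (the placeholder and cell below are illustrative only):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [5, 32, 100])   # time-major: [num_steps, batch_size, size]
cell = tf.contrib.rnn.LSTMCell(200)                  # 200 hidden units
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, time_major=True, dtype=tf.float32)
# outputs has shape [5, 32, 200]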
Hidden state model
In addition, after each LSTM cell I need to perform a linear + sigmoid transformation to squash each [32, 200] tensor into, for example, [32, 1]. tf.nn.dynamic_rnn won't work for that since it only accepts cells; we need to use the tf.nn.raw_rnn API. So, here is my try:
def _get_raw_rnn_graph(self, inputs):
    time = tf.constant(0, dtype=tf.int32)
    _inputs_ta = tf.TensorArray(dtype=tf.float32, size=5)
    # our [5, 32, 100] tensor becomes [[32, 100], [32, 100], ...]
    _inputs_ta = _inputs_ta.unstack(inputs)
    # create simple LSTM cell
    cell = tf.contrib.rnn.LSTMCell(config.hidden_size)
    # create loop_fn for raw_rnn
    def loop_fn(time, cell_output, cell_state, loop_state):
        emit_output = cell_output  # == None if time = 0
        if cell_output is None:  # time = 0
            next_cell_state = cell.zero_state(32, tf.float32)
            self._initial_state = next_cell_state
        else:
            next_cell_state = cell_state
        elements_finished = (time >= 32)
        finished = tf.reduce_all(elements_finished)
        next_input = tf.cond(finished,
                             lambda: tf.zeros([32, config.input_size], dtype=tf.float32),
                             lambda: _inputs_ta.read(time))
        # apply linear + sig transform here
        next_input = self._linear_transform(next_input, activation=tf.sigmoid)
        next_loop_state = None
        return (elements_finished, next_input, next_cell_state, emit_output, next_loop_state)
    outputs_ta, final_state, _ = tf.nn.raw_rnn(cell, loop_fn)
    outputs = outputs_ta.stack()
    return outputs, final_state
This unfortunately does not work. loop_fn is called only two times instead of num_steps times as I expected, and its output is Tensor("Train/Model/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, 32, 200), dtype=float32), not [5, 32, 1] as we intended. What am I missing here?
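For reference, a hedged sketch of how the loop above could be set up to emit [5, 32, 1]. Two observations about the code as posted: loop_fn is a graph-construction callback, so Python only invokes it twice (once with cell_output=None for setup and once to build the while-loop body); the runtime step count is instead governed by elements_finished, which here compares time against the batch size (32) rather than num_steps (5). Also, the emitted tensor is the raw cell_output (200 units), because the linear + sigmoid is applied to next_input rather than returned as emit_output. The constants below (num_steps=5, batch_size=32, input_size=100, 200 hidden units) are taken from the shapes described above; the projection variables are illustrative, not from the original code, and this is an untested sketch rather than a verified fix.

import tensorflow as tf

num_steps, batch_size, input_size, hidden_size = 5, 32, 100, 200

inputs = tf.placeholder(tf.float32, [num_steps, batch_size, input_size])   # time-major input
cell = tf.contrib.rnn.LSTMCell(hidden_size)
inputs_ta = tf.TensorArray(dtype=tf.float32, size=num_steps).unstack(inputs)

# projection variables created once, outside loop_fn, and only used inside it
W = tf.get_variable("proj_w", [hidden_size, 1])
b = tf.get_variable("proj_b", [1])

def loop_fn(time, cell_output, cell_state, loop_state):
    if cell_output is None:                 # time == 0: set up the initial state
        next_cell_state = cell.zero_state(batch_size, tf.float32)
        emit_output = tf.zeros([1])         # template for the per-element emit shape
    else:
        next_cell_state = cell_state
        # emit the squashed value instead of the raw 200-unit cell output
        emit_output = tf.sigmoid(tf.matmul(cell_output, W) + b)   # [32, 1]
    # finish after num_steps (5), not after batch_size (32)
    elements_finished = time >= tf.fill([batch_size], num_steps)
    finished = tf.reduce_all(elements_finished)
    next_input = tf.cond(finished,
                         lambda: tf.zeros([batch_size, input_size], dtype=tf.float32),
                         lambda: inputs_ta.read(time))
    return elements_finished, next_input, next_cell_state, emit_output, None

outputs_ta, final_state, _ = tf.nn.raw_rnn(cell, loop_fn)
outputs = outputs_ta.stack()                # expected shape [5, 32, 1]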

Related

how to change torch.scatter_add to tensorflow function

I need to port some PyTorch code to TensorFlow. The PyTorch code is from NADST:
encoded_context = ft['encoded_context2']
encoded_in_domainslots = ft['encoded_in_domainslots2']
self.pointer_attn(ft['out_states'], encoded_context, encoded_context, context_mask)
pointer_attn = self.pointer_attn.attn.squeeze(1)
p_vocab = F.softmax(vocab_attn, dim = -1)
context_index = context.unsqueeze(1).expand_as(pointer_attn)
p_context_ptr = torch.zeros(p_vocab.size()).cuda()
p_context_ptr.scatter_add_(2, context_index, pointer_attn)
I want to change the line p_context_ptr.scatter_add_(2, context_index, pointer_attn) to its TensorFlow equivalent. I tried the tf.compat.v1.tensor_scatter_nd_add() function, but it does not perform the same operation as torch's scatter_add_().
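For context, a small sketch of what torch.Tensor.scatter_add_ does along dim=2 (illustrative values only, not from the NADST code):

import torch

# scatter_add_ along dim=2 means: self[i][j][index[i][j][k]] += src[i][j][k]
target = torch.zeros(1, 1, 5)
index = torch.tensor([[[0, 0, 3]]])         # shape (1, 1, 3)
src = torch.tensor([[[1.0, 2.0, 5.0]]])     # shape (1, 1, 3)
target.scatter_add_(2, index, src)
print(target)   # tensor([[[3., 0., 0., 5., 0.]]]) -- index 0 accumulates 1 + 2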
I have been working on this until now but have not found a solution. My code so far looks like this:
def get_scatter_add(tensor, indices, updates):
    if indices.shape.rank > 2:
        tensor = tf.compat.v1.reshape(tensor, shape=[-1, tensor.shape[-1]])
        indices = tf.compat.v1.reshape(indices, shape=[-1, indices.shape[-1]])
        updates = tf.compat.v1.reshape(updates, shape=[-1, updates.shape[-1]])
    one_hot_index = tf.compat.v1.one_hot(indices=indices, depth=tensor.shape[-1])
    tile_update = tf.compat.v1.expand_dims(updates, axis=-1)
    updates = tf.compat.v1.to_float(one_hot_index) * tf.compat.v1.to_float(tile_update)
    indices = tf.compat.v1.expand_dims(indices, axis=-1)
    update = tensor.shape[indices.shape[-1]:]
    res = indices.shape[:-1] + update
    scatter = tf.compat.v1.tensor_scatter_nd_add(tensor, indices, updates)
    return scatter
However, this runs out of memory. My variable shapes are tensor.shape -> [1100, 19200], indices.shape -> [1100, 900], updates.shape -> [1100, 900].
How can I solve this problem?
Thank you for your reply, and have a nice day!
I found a solution myself.
TensorFlow's tensor_scatter_nd_add uses scatter_nd semantics, where the last index dimension indexes into the leading dimensions of the target tensor, so in general it is not the same operation as torch's scatter_add_(). There is one case, however, where the two behave identically. This case:
import tensorflow as tf
indices = tf.constant([[4], [3], [1], [7]])
updates = tf.constant([9, 10, 11, 12])
tensor = tf.ones([8], dtype=tf.int32)
updated = tf.tensor_scatter_nd_add(tensor, indices, updates)
print(updated)
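# updated is [1, 12, 1, 11, 10, 1, 1, 13]: the ones tensor plus the scattered updates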
It behaves this way only when the target tensor is rank 1 and the indices are rank 2, as above. So I reshape my tensors to match that case:
tensor.shape -> reshape to [-1]
updates.shape -> reshape to [-1]
indices.shape -> reshape to [-1, 1]
This matches the case above, but the indices also need an offset: because the last dimension of the original tensor is the vocabulary size (this is a pointer generator for the DST task), the indices of the next batch row must be shifted by vocab_size, the row after that by vocab_size*2, and so on.
With that offset, the function performs the same operation as Torch's scatter_add_.
Example:
tensor = [35, 32, vocab_size], indices = [35, 32, 900], update = [35, 32, 900]
Torch case:
tensor.scatter_add_(2, indices, update)
Tensorflow case:
tensor = my_tensorflow_scatter_add(tensor, indices, update)
These two calls perform the same operation for the shapes above.
my_tensorflow_scatter_add function:
def my_tensorflow_scatter_add(tensor, indices, updates):
    original_tensor = tensor
    # offset each row's indices by its row number times the vocab size
    indices = tf.compat.v1.reshape(indices, shape=[-1, tf.shape(indices)[-1]])
    indices_add = tf.compat.v1.expand_dims(tf.range(0, tf.shape(indices)[0], 1) * (tf.shape(tensor)[-1]), axis=-1)
    indices += indices_add
    # flatten everything to the rank-1 target / rank-2 indices case
    tensor = tf.compat.v1.reshape(tensor, shape=[-1])
    indices = tf.compat.v1.reshape(indices, shape=[-1, 1])
    updates = tf.compat.v1.reshape(updates, shape=[-1])
    # same as Torch scatter_add_
    scatter = tf.compat.v1.tensor_scatter_nd_add(tensor, indices, updates)
    scatter = tf.compat.v1.reshape(scatter, shape=[tf.shape(original_tensor)[0], tf.shape(original_tensor)[1], -1])
    return scatter
This solved my problem.
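A small numeric check of my_tensorflow_scatter_add with tiny illustrative shapes (it assumes TF 2.x eager execution, or wrapping in a session under TF 1.x, so the result can be printed directly):

import tensorflow as tf

tensor = tf.zeros([1, 2, 5])                      # (seq_len=1, batch=2, vocab=5)
indices = tf.constant([[[0, 0, 3], [1, 4, 4]]])   # (1, 2, 3)
updates = tf.ones([1, 2, 3])

print(my_tensorflow_scatter_add(tensor, indices, updates))
# row 0: index 0 twice, index 3 once  -> [2., 0., 0., 1., 0.]
# row 1: index 1 once,  index 4 twice -> [0., 1., 0., 0., 2.]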
An alternative solution without flattening all the tensors, assuming the tensor shapes tensor = [35, 32, vocab_size], indices = [35, 32, 900], updates = [35, 32, 900] (based on Proper usage of `tf.scatter_nd` in tensorflow-r1.2):
def scatter_add(tensor, indices, updates):
    """
    Args:
        tensor: (seq_len, batch_size, vocab_size)
        indices: (seq_len, batch_size, dim)
        updates: (seq_len, batch_size, dim)
    Returns:
        (seq_len, batch_size, vocab_size)
    """
    seq_len, batch_size, dim = indices.shape
    # Create additional indices
    i1, i2 = tf.meshgrid(tf.range(seq_len),
                         tf.range(batch_size), indexing="ij")
    i1 = tf.tile(i1[:, :, tf.newaxis], [1, 1, dim])
    i2 = tf.tile(i2[:, :, tf.newaxis], [1, 1, dim])
    # Create final indices
    idx = tf.stack([i1, i2, indices], axis=-1)
    # Get scatter-added tensor
    scatter = tf.tensor_scatter_nd_add(tensor, idx, updates)
    return scatter
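A quick usage sketch with the same toy inputs as the check above, to show the two approaches agree (illustrative shapes only; assumes eager execution):

import tensorflow as tf

tensor = tf.zeros([1, 2, 5])                      # (seq_len=1, batch=2, vocab=5)
indices = tf.constant([[[0, 0, 3], [1, 4, 4]]])   # (1, 2, 3)
updates = tf.ones([1, 2, 3])

print(scatter_add(tensor, indices, updates))
# [[[2. 0. 0. 1. 0.]
#   [0. 1. 0. 0. 2.]]]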

what should be the shape of target data in LSTM for predicting number in the sequence

I am working with the Keras functional API on an LSTM, where I give an input sequence of 3 features and predict the next value in it. For example:
input
[10, 20, 30]
target
[40]
Given below is the input data:
[[[10, 20, 30],
[20, 30, 40],
[30, 40, 50],
[40, 50, 60],
[50, 60, 70],
[60, 70, 80],
[70, 80, 90]]]
And the target data:
[ 40, 50, 60, 70, 80, 90, 100]
Here is my code for creating the input and target arrays:
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers.recurrent import LSTM
from numpy import array

def own_split_sequence():
    X, y = list(), list()
    for i in range(10, 100, 10):
        if i + 30 > 100:
            break
        seq_x = [i, i+10, i+20]
        seq_y = i + 30
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
Reshape the input according to (samples, timesteps, features):
x1 = X.reshape((1,7,3))
Model code:
visible = Input(shape=(7,3))
hidden1 = LSTM(10)(visible)
output = Dense(1,activation='relu')(hidden1)
model = Model(inputs=visible,outputs=output)
model.compile(optimizer='adam', loss='mse')
model.fit(x1, y, epochs=150, verbose=2)
And here the fit method gives an error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 7 target samples.
The error mostly has to do with the input and target shapes. You are right in quoting that the input has to be reshaped to (samples, timesteps, features), but your code doesn't reflect that. You have 7 samples, each having 3 timesteps, with one feature per timestep:
x1 = X.reshape((7,3,1))
Also, the target has to be reshaped to match the same pattern; the timestep dimension is squeezed out in the target.
y = y.reshape((7,1))
x1 = X.reshape((7,3,1))
visible = Input(shape=(3,1))
hidden1 = LSTM(10,activation='relu')(visible)
output = Dense(1,activation='relu')(hidden1)
model = Model(inputs=visible,outputs=output)
This worked for me.
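Putting the answer together, a minimal sketch (reusing the own_split_sequence helper and imports from the question; the 7/3/1 shapes follow from the data above):

X, y = own_split_sequence()
x1 = X.reshape((7, 3, 1))    # (samples, timesteps, features)
y2 = y.reshape((7, 1))       # one target per sample

visible = Input(shape=(3, 1))
hidden1 = LSTM(10, activation='relu')(visible)
output = Dense(1, activation='relu')(hidden1)
model = Model(inputs=visible, outputs=output)
model.compile(optimizer='adam', loss='mse')
model.fit(x1, y2, epochs=150, verbose=2)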

logits and labels must be broadcastable: logits_size=[32,1] labels_size=[16,1]

I'm trying to learn tensorflow and I'm getting the following error:
logits and labels must be broadcastable: logits_size=[32,1] labels_size=[16,1]
The code runs fine when I use this as input:
self.input = np.ones((500, 784))
self.y = np.ones((500, 1))
However, when I add an extra dimension, the error is thrown:
self.input = np.ones((500, 2, 784))
self.y = np.ones((500, 1))
The code to build the graph:
self.x = tf.placeholder(tf.float32, shape=[None] + self.config.state_size)
self.y = tf.placeholder(tf.float32, shape=[None, 1])

# network architecture
d1 = tf.layers.dense(self.x, 512, activation=tf.nn.relu, name="dense1")
d2 = tf.layers.dense(d1, 1, name="dense2")

with tf.name_scope("loss"):
    self.cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=self.y, logits=d2))
    self.train_step = tf.train.AdamOptimizer(self.config.learning_rate).minimize(self.cross_entropy,
                                                                                 global_step=self.global_step_tensor)

correct_prediction = tf.equal(tf.argmax(d2, 1), tf.argmax(self.y, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Could someone explain why this is happening and how I can fix it?
logits is the name typically given to the output of the network; these are your predictions. A logits size of [32, 1] tells me you have 32 output rows with one value each, while your labels are sized [16, 1], which is to say you are providing only 16 labels. The number of labels you provide is in conflict with the number of network outputs; they should be the same.
I'm not quite clear what you're doing with the extra dimension in the input, but I guess you must be accidentally doubling the samples in some way. Perhaps the [500, 2, 784] shape is being reshaped to [1000, 784] automatically somewhere along the way, which then no longer matches the 500 labels. Also, since you appear to be working with MNIST-like data (784 features), your self.y should be shaped [500, 10], not [500, 1]: the labels need to be in one-hot encoding format. E.g. a single label of shape [1, 10] for the digit 3 would be [[0,0,0,1,0,0,0,0,0,0]], not the digit representation [3] you seem to have in your sanity test here.
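As an illustration of that last point, a hedged sketch of preparing the labels and inputs (the variable names and dummy data are only examples, not from the question's code):

import numpy as np

# 500 integer digit labels in [0, 9] (dummy data for illustration)
digits = np.random.randint(0, 10, size=500)

# one-hot encode them so the labels match 10-way logits: shape (500, 10)
labels_onehot = np.eye(10, dtype=np.float32)[digits]

# keep the inputs 2-D: (samples, features), with no extra middle dimension
inputs = np.ones((500, 784), dtype=np.float32)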

Tensorflow: CNN training converges at a vector of zeros

I'm a beginner in deep learning and have taken a few courses on Udacity. Recently I have been trying to build a deep network that detects hand joints in input depth images, and it doesn't seem to be working well. (My dataset is the ICVL Hand Posture Dataset.)
The network structure is as follows:
① A batch of input images, 240x320;
② An 8-channel convolutional layer with a 5x5 kernel;
③ A max pooling layer, ksize = stride = 2;
④ A fully-connected layer, weight.shape = [38400, 1024];
⑤ A fully-connected layer, weight.shape = [1024, 48].
After several epochs of training, the output of the last layer converges to a (0, 0, ..., 0) vector. I chose mean squared error as the loss function; its value stays above 40000 and doesn't seem to decrease.
The network structure is already too simple to simplify further, yet the problem remains. Could anyone offer any suggestions?
My main code is posted below:
image = tf.placeholder(tf.float32, [None, 240, 320, 1])
annotations = tf.placeholder(tf.float32, [None, 48])
W_convolution_layer1 = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
b_convolution_layer1 = tf.Variable(tf.constant(0.1, shape=[8]))
h_convolution_layer1 = tf.nn.relu(
    tf.nn.conv2d(image, W_convolution_layer1, [1, 1, 1, 1], 'SAME') + b_convolution_layer1)
h_pooling_layer1 = tf.nn.max_pool(h_convolution_layer1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
W_fully_connected_layer1 = tf.Variable(tf.truncated_normal([120 * 160 * 8, 1024], stddev=0.1))
b_fully_connected_layer1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_pooling_flat = tf.reshape(h_pooling_layer1, [-1, 120 * 160 * 8])
h_fully_connected_layer1 = tf.nn.relu(
    tf.matmul(h_pooling_flat, W_fully_connected_layer1) + b_fully_connected_layer1)
W_fully_connected_layer2 = tf.Variable(tf.truncated_normal([1024, 48], stddev=0.1))
b_fully_connected_layer2 = tf.Variable(tf.constant(0.1, shape=[48]))
detection = tf.nn.relu(
    tf.matmul(h_fully_connected_layer1, W_fully_connected_layer2) + b_fully_connected_layer2)
mean_squared_error = tf.reduce_sum(tf.losses.mean_squared_error(annotations, detection))
training = tf.train.AdamOptimizer(1e-4).minimize(mean_squared_error)

# This data loader reads images and annotations and converts them into batches of numbers.
loader = ICVLDataLoader('../data/')
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for i in range(1000):
        # batch_images: a list with shape = [BATCH_SIZE, 240, 320, 1]
        # batch_annotations: a list with shape = [BATCH_SIZE, 48]
        [batch_images, batch_annotations] = loader.get_batch(100).to_1d_list()
        [x_, t_, l_, p_] = session.run([x_image, training, mean_squared_error, detection],
                                       feed_dict={images: batch_images, annotations: batch_annotations})
And it runs like this.
The main issue is likely the relu activation in the output layer. You should remove it, i.e. let detection simply be the result of the matrix multiplication. If you want to force the outputs to be positive, consider something like the exponential function instead.
While relu is a popular hidden activation, I see one major problem with using it as an output activation: as is well known, relu maps negative inputs to 0, and, crucially, the gradients there are also 0. In the output layer this means your network cannot learn from its mistakes whenever it produces an output < 0 (which is likely to happen with random initialization). This can heavily impair the overall learning process.
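Concretely, a sketch of the suggested change to the last layer (variable names taken from the code in the question):

# linear output layer: no relu, so gradients also flow when pre-activations are negative
detection = tf.matmul(h_fully_connected_layer1, W_fully_connected_layer2) + b_fully_connected_layer2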

Tensorflow: Recurrent Neural Network Batch Training

I am trying to implement an RNN in Tensorflow. I am writing my own functions instead of using RNN cells, to practice.
The problem is sequence tagging. The input size is [32, 48, 900], where 32 is the batch size, 48 is the number of time steps and 900 is the vocab size (one-hot encoded vectors). The output is [32, 48, 145], where the first two dimensions are the same as the input, but the last dimension is the output vocabulary size (one-hot). Basically this is an NLP tagging problem.
I am getting following error:
InvalidArgumentError (see above for traceback): logits and labels must
be same size: logits_size=[48,145] labels_size=[1536,145]
The actual labels_size is [32, 48, 145], but it merges the first two dimensions without my control (FYI 32*48 = 1536).
If I run my RNN with batch size 1, it works fine as expected. I could not figure out how to solve the issue. The problem occurs in the last line of the code.
I pasted the related part of the code:
inputs = tf.placeholder(shape=[None, self.seq_length, self.vocab_size], dtype=tf.float32, name="inputs")
targets = tf.placeholder(shape=[None, self.seq_length, self.output_vocab_size], dtype=tf.float32, name="targets")
init_state = tf.placeholder(shape=[1, self.hidden_size], dtype=tf.float32, name="state")

initializer = tf.random_normal_initializer(stddev=0.1)

with tf.variable_scope("RNN") as scope:
    hs_t = init_state
    ys = []
    for t, xs_t in enumerate(tf.split(inputs[0], self.seq_length, axis=0)):
        if t > 0: scope.reuse_variables()
        Wxh = tf.get_variable("Wxh", [self.vocab_size, self.hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [self.hidden_size, self.hidden_size], initializer=initializer)
        Why = tf.get_variable("Why", [self.hidden_size, self.output_vocab_size], initializer=initializer)
        bh = tf.get_variable("bh", [self.hidden_size], initializer=initializer)
        by = tf.get_variable("by", [self.output_vocab_size], initializer=initializer)

        hs_t = tf.tanh(tf.matmul(xs_t, Wxh) + tf.matmul(hs_t, Whh) + bh)
        ys_t = tf.matmul(hs_t, Why) + by
        ys.append(ys_t)

hprev = hs_t
output_softmax = tf.nn.softmax(ys)  # Get softmax for sampling
#outputs = tf.concat(ys, axis=0)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=ys))
The problem likely lies in the size of ys: ys should have size [32, 48, 145], but here it only has size [48, 145] because the graph is built from inputs[0] alone. If the batch size is 1, the target size is [1, 48, 145], which matches [48, 145] after the dimensionality reduction, which is why that case works.
To solve the problem, you can add a loop over the batch dimension instead of hard-coding inputs[0], such as:
for i in range(inputs.get_shape()[0]):
    for t, xs_t in enumerate(tf.split(inputs[i], self.seq_length, axis=0)):
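Alternatively, a hedged sketch of a batched variant (not part of the answer above; it assumes init_state is given a batch dimension): instead of a Python loop over the batch, unstack the input along the time axis so each step sees a [batch, vocab] matrix and the ordinary matmul processes all 32 examples at once:

init_state = tf.placeholder(shape=[None, self.hidden_size], dtype=tf.float32, name="state")

with tf.variable_scope("RNN", reuse=tf.AUTO_REUSE):
    hs_t = init_state
    ys = []
    for xs_t in tf.unstack(inputs, num=self.seq_length, axis=1):   # one [batch, vocab] slice per step
        Wxh = tf.get_variable("Wxh", [self.vocab_size, self.hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [self.hidden_size, self.hidden_size], initializer=initializer)
        Why = tf.get_variable("Why", [self.hidden_size, self.output_vocab_size], initializer=initializer)
        bh = tf.get_variable("bh", [self.hidden_size], initializer=initializer)
        by = tf.get_variable("by", [self.output_vocab_size], initializer=initializer)
        hs_t = tf.tanh(tf.matmul(xs_t, Wxh) + tf.matmul(hs_t, Whh) + bh)   # [batch, hidden]
        ys.append(tf.matmul(hs_t, Why) + by)                               # [batch, output_vocab]

logits = tf.stack(ys, axis=1)   # [batch, seq_length, output_vocab_size]
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=logits))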
