Iterate over a tensor dimension in Tensorflow - python

I am trying to develop a seq2seq model from a low-level perspective (creating all the required tensors myself). I want to feed the model a sequence of vectors as a two-dimensional tensor, but I can't iterate over one dimension of the tensor to extract the vectors one at a time. Does anyone know how I can feed a batch of vectors and later get them one by one?
This is my code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
gru_layer1 = GRU(input_array, input_dim, hidden_dim) #This is a class created by myself
for i in range(input_array.shape[-1]):
    word = input_array[:,i]
    previous_state = gru_encoder.h_t
    gru_layer1.forward_pass(previous_state,word)
And this is the error that I get:
TypeError: Expected binary or unicode string, got <tf.Tensor 'input_7:0' shape=(10, ?) dtype=float64>

TensorFlow uses deferred execution: the graph is built first and only run later, so a placeholder has no concrete value while your Python loop is executing.
You usually can't know in advance how long the sequence will be (words in a sentence, audio samples, etc.). The common approach is to cap it at some reasonably large value and pad the shorter sequences with an empty token.
Once you do this, you can select the data for a time slice with the slice operator:
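# tf.placeholder requires a dtype; float32 is assumed here
data = tf.placeholder(tf.float32, shape=(batch_size, max_size, number_of_inputs))
....
for i in range(max_size):
    time_data = data[:, i, :]  # shape (batch_size, number_of_inputs)
    DoStuff(time_data)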
Also look up tf.transpose for swapping the batch and time indices. It can help with performance in certain cases.
Alternatively consider something like tf.nn.static_rnn or tf.nn.dynamic_rnn to do the boilerplate stuff for you.
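The pad-then-slice idea above can be sketched without TensorFlow. This is a minimal plain-Python illustration, not TensorFlow API: the helper names pad_sequences and time_slice and the pad value 0.0 are assumptions made for the example.

```python
def pad_sequences(sequences, max_size, pad_value=0.0):
    """Pad (or truncate) each sequence of feature vectors to max_size steps."""
    if not sequences or not sequences[0]:
        raise ValueError("need at least one non-empty sequence")
    n_features = len(sequences[0][0])
    padded = []
    for seq in sequences:
        steps = [list(vec) for vec in seq[:max_size]]
        # Append "empty" vectors until the sequence has max_size steps.
        steps += [[pad_value] * n_features for _ in range(max_size - len(steps))]
        padded.append(steps)
    return padded


def time_slice(batch, i):
    """Plain-Python counterpart of data[:, i, :]: step i of every sequence."""
    return [seq[i] for seq in batch]


batch = pad_sequences(
    [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # 3 time steps
     [[7.0, 8.0]]],                          # 1 time step, padded to 3
    max_size=3,
)
for i in range(3):
    print(i, time_slice(batch, i))
```

Because every sequence now has the same length, the loop bound is an ordinary Python integer, which is what makes the graph-building loop in the answer possible.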

I finally found an approach that solves my problem. It works by using tf.scan() instead of a loop, which doesn't require the input tensor to have a defined size in the second dimension. Consequently, you have to prepare the input tensor beforehand so that tf.scan() iterates over it the way you want. In my case this is the code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
# Use the placeholder directly: np.asarray() cannot convert a symbolic tensor.
# tf.scan iterates over the first dimension, so put time first: [time, embedding_dim]
x_t = tf.transpose(input_sentence, [1, 0], name='x_t')
h_0 = tf.convert_to_tensor(h_0, dtype=tf.float64)
h_t_transposed = tf.scan(forward_pass, x_t, h_0, name='h_t_transposed')
h_t = tf.transpose(h_t_transposed, [1, 0], name='h_t')
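To make the role of tf.scan concrete, here is a plain-Python sketch of its semantics: it walks along the first axis of the input, calls fn(previous_state, element) at each step, and keeps every intermediate state. The scan function below is a reference re-implementation for illustration only, and this forward_pass is a hypothetical stand-in (a running average), not the GRU step from the answer.

```python
def scan(fn, elems, initializer):
    """Apply fn(previous_state, element) along the first axis, keeping every state."""
    states = []
    state = initializer
    for elem in elems:
        state = fn(state, elem)
        states.append(state)
    return states


def forward_pass(h_prev, x_t):
    # Hypothetical stand-in for a recurrent step: average the state with the input.
    return [0.5 * h + 0.5 * x for h, x in zip(h_prev, x_t)]


x_t = [[2.0, 4.0], [6.0, 8.0], [0.0, 0.0]]  # shape [time, features]
h_0 = [0.0, 0.0]
h_t = scan(forward_pass, x_t, h_0)
print(h_t)  # one state per time step
```

The number of iterations comes from the data itself, which is why the time dimension can stay undefined when the graph is built.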

Related

How to create and execute a basic LSTM network in TensorFlow?

I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example, as N x 5 arrays) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.
How can I do it in TensorFlow?
ADDED
So far, I have got the following code working:
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
timesteps = 18
num_input = 5
X = tf.placeholder("float", [None, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
However, there are many open questions:
Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
Why do we split the data by time steps (using unstack)?
How should the "outputs" and "states" be interpreted?

Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between them.
For example:
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
num_input = 5
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
x_val = np.random.normal(size = (12,16,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
dynamic_rnn requires all sequences in one batch to have the same length. If you need sequences of different lengths within one batch, pad the batch data and pass each sequence's true length through the sequence_length parameter.
Why do we split the data by time steps (using unstack)?
Only static_rnn needs the data to be split with unstack; this comes from their different input requirements. static_rnn takes a Python list of length timesteps whose elements are 2D tensors of shape [batch_size, features]. dynamic_rnn takes a single 3D tensor of shape [timesteps, batch_size, features] if time_major is True, or [batch_size, timesteps, features] if it is False (the default).
How should the "outputs" and "states" be interpreted?
For LSTMCell, states is an LSTMStateTuple (c, h) of two tensors, each of shape [batch_size, num_units]: one holds the cell state C and the other the hidden state h.
Similarly, for GRUCell states is a single tensor of shape [batch_size, num_units].
outputs holds the output of every time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. When every sequence runs for the full number of time steps, you can easily conclude that
states[1] == outputs[:, -1, :], i.e. states.h equals the output of the last time step.
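The relationship between outputs, states and sequence_length can be illustrated with a toy recurrence in plain Python. This is only a sketch of the documented dynamic_rnn behaviour (outputs are zeroed past a sequence's length and the state is carried through unchanged); run_rnn and step are hypothetical names, and the cell simply accumulates its inputs.

```python
def run_rnn(step, batch, sequence_length, h_0):
    """Return (outputs, final_states) for a padded batch of shape [batch, time, features]."""
    outputs, final_states = [], []
    for seq, length in zip(batch, sequence_length):
        h = list(h_0)
        seq_outputs = []
        for t, x in enumerate(seq):
            if t < length:
                h = step(h, x)
                seq_outputs.append(list(h))
            else:
                # Past the true length: emit zeros and leave the state untouched.
                seq_outputs.append([0.0] * len(h))
        outputs.append(seq_outputs)
        final_states.append(h)
    return outputs, final_states


def step(h, x):
    # Hypothetical toy cell: the state accumulates the inputs.
    return [hi + xi for hi, xi in zip(h, x)]


batch = [[[1.0], [2.0], [3.0]],
         [[4.0], [0.0], [0.0]]]  # second sequence is padding after step 0
outputs, states = run_rnn(step, batch, sequence_length=[3, 1], h_0=[0.0])
print(outputs)
print(states)
```

For the full-length sequence the final state matches the last output; for the padded one it matches the output at the last valid step instead, which is why the equality above only holds when every sequence runs for all time steps.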

Keras backend function: InvalidArgumentError

I can't get keras.backend.function to work properly. I'm trying to follow this post:
How to calculate prediction uncertainty using Keras?
In this post they create a function f:
f = K.function([model.layers[0].input],[model.layers[-1].output]) #(I actually simplified the function a little bit).
In my neural network I have 3 inputs. When I try to compute f([[3], [23], [0.0]]) I get this error:
InvalidArgumentError: You must feed a value for placeholder tensor 'input_3' with dtype float and shape [?,1]
[[{{node input_3}} = Placeholder[dtype=DT_FLOAT, shape=[?,1], _device="/job:localhost/replica:0/task:0/device:CPU:0"]
Now I know using [[3], [23], [0.0]] as an input in my model doesn't give me an error during the testing phase. Can anyone tell me where I'm going wrong?
This is what my model looks like if it matters:
home_in = Input(shape=(1,))
away_in = Input(shape=(1,))
time_in = Input(shape = (1,))
embed_home = Embedding(input_dim = in_dim, output_dim = out_dim, input_length = 1)
embed_away = Embedding(input_dim = in_dim, output_dim = out_dim, input_length = 1)
embedding_home = Flatten()(embed_home(home_in))
embedding_away = Flatten()(embed_away(away_in))
keras.backend.set_learning_phase(1) #this will keep dropout on during the testing phase
model_layers = Dense(units=2)\
(Dropout(0.3)\
(Dense(units=64, activation = "relu")\
(Dropout(0.3)\
(Dense(units=64, activation = "relu")\
(Dropout(0.3)\
(Dense(units=64, activation = "relu")\
(concatenate([embedding_home, embedding_away, time_in]))))))))
model = Model(inputs=[home_in, away_in, time_in], outputs=model_layers)
The function you have defined uses only one of the input layers (i.e. model.layers[0].input) as its input. It must use all of the inputs so that the model can be run. The model has inputs and outputs attributes that you can use to include all the inputs and outputs with less verbosity:
f = K.function(model.inputs, model.outputs)
Update: The shape of all the input arrays must be (num_samples, 1). Therefore, you need to pass a list of lists (e.g. [[3]]) instead of a list (e.g. [3]):
outs = f([[[3]], [[23]], [[0.0]]])
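As a small illustration of the expected nesting, the sketch below builds the argument list in plain Python: one entry per model input, each shaped (num_samples, 1). The helpers as_column and build_feed are hypothetical names introduced for this example, not part of Keras.

```python
def as_column(values):
    """Turn a flat list of samples into nested lists of shape (num_samples, 1)."""
    return [[v] for v in values]


def build_feed(home_ids, away_ids, times):
    # One entry per model input, in the same order as model.inputs.
    feed = [as_column(home_ids), as_column(away_ids), as_column(times)]
    n_samples = len(feed[0])
    if any(len(column) != n_samples for column in feed):
        raise ValueError("all inputs need the same number of samples")
    return feed


print(build_feed([3], [23], [0.0]))          # one sample
print(build_feed([3, 7], [23, 4], [0.0, 1.5]))  # two samples
```

The first call produces the same structure as the argument passed to f above; the second shows how the outer list stays at three entries while each entry grows with the number of samples.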

Flattening two last dimensions of a tensor in TensorFlow

I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know the B, C, and D in advance (they're a result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
It seems that in TensorFlow there's no easy way to do this and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])])
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints out absolute values, like [10, 20, 30, 40]), after the reshaping get_shape() becomes [10, None, None]. Again, still assume the initial shape isn't known, so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.nn.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.nn.dynamic_rnn, which expects a fully defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
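The static-shape reasoning above can be written down as a few lines of plain Python. This is a sketch of the inference rule only, with None standing for an unknown dimension; flatten_last_two is a hypothetical helper, not a TensorFlow function.

```python
def flatten_last_two(static_shape):
    """Static shape after merging the last two dimensions (None means unknown)."""
    *leading, c, d = static_shape
    # The product is only known if both factors are known.
    merged = None if c is None or d is None else c * d
    return leading + [merged]


print(flatten_last_two([10, 20, 30, 40]))
print(flatten_last_two([None, 100, 200, 3]))
print(flatten_last_two([None, None, None, 64]))
```

Only the second case gives an RNN something to work with: the batch size may be unknown, but the merged feature size is a concrete number. In the third case the feature size is None, which is the situation the error message complains about.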
I wrote a simple example based on your requirements. Since you are trying to reshape a CNN output, X has the same shape as the output of a CNN in TensorFlow.
import tensorflow as tf
HEIGHT = 100
WIDTH = 200
N_CHANNELS = 3
N_HIDDEN = 64
X = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, N_CHANNELS], name='input')  # output of CNN
# Static shape of each dimension: shape[0] = BATCH_SIZE (None), shape[1] = HEIGHT, shape[2] = WIDTH, shape[3] = N_CHANNELS
shape = X.get_shape().as_list()
# Named rnn_input to avoid shadowing the built-in input()
rnn_input = tf.reshape(X, [-1, shape[1], shape[2] * shape[3]])
print(rnn_input.shape)  # prints (?, 100, 600)
# Input for tf.nn.dynamic_rnn should have the shape [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE],
# so after the reshape N_TIMESTEPS = 100 and INPUT_SIZE = 600.
# Create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, rnn_input, dtype=tf.float32)
Hope this helps.
I found a solution to this by using .get_shape().
Assume x is a 4-D tensor whose last three dimensions are statically known.
This only works with the Keras Reshape layer, whose target shape excludes the batch dimension. Since you are making changes to the architecture of the model anyway, this should work.
shape = x.get_shape().as_list()
x = tf.keras.layers.Reshape((shape[1], shape[2] * shape[3]))(x)
Hope this works!
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape
class VisualAttention(Layer):
    def __init__(self, channels_out, key_is_value=True):
        super(VisualAttention, self).__init__()
        self.channels_out = channels_out
        self.key_is_value = key_is_value
        self.flatten_images = None # see build method
        self.unflatten_images = None # see build method
        self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
        self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)
    def build(self, input_shape):
        b, h, w, c = input_shape
        self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
        self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))
    def call(self, x, training=True):
        x = self.flatten_images(x)
        q = self.query_conv(x)
        v = self.value_conv(x)
        inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
        output = self.attention_layer(inputs=inputs, training=training)
        return self.unflatten_images(output)
# test
import numpy as np
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)

Getting TensorFlow to work with distributed binary representation

I know how to create an rnn in TensorFlow with a one_hot vector:
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
init_state = tf.zeros([batch_size, state_size])
x_one_hot = tf.one_hot(x, num_classes)
rnn_inputs = tf.unstack(x_one_hot, axis=1)
But I am not really sure what to do when my input vector has multiple 1s, e.g. it could be 11011 as one input per time step, so: [[11011],[00111],...]
Is there an issue if I would just feed this vector like I would have my one-hot representation? How should I formulate the above then? I feel like I shouldn't use the tf.one_hot function... Not sure how the shape of rnn_inputs (200 x 5 x 2) can be created without one_hot.
(using TF 1.0)

Tensorflow 170 times slower than Theano for RNN implementation

I am trying to implement a RNN in Tensorflow (0.11), based on this paper.
They have a Theano implementation here, that I am comparing my implementation to. When I try to run their Theano implementation, it finishes 10 epochs in about 1 hour. My Tensorflow implementation needs about 17 hours just to finish 1 epoch. I am wondering if anyone could look at my code and tell me if there are some obvious problems that are slowing it down.
The purpose of the RNN is to predict the next item a user is going to click on, given his previous clicks. The items are represented by unique IDs that are given as input to the RNN as a 1-HOT vector.
So the RNN is built like this:
[INPUT (1-HOT representation, size 37803)] -> [GRU layer (size 100)] -> [FeedForward layer]
and the output from the FF layer is a vector with the same size as the input vector, where high values indicate that the item corresponding to that index is very likely to be the next one clicked.
num_hidden = 100
x = tf.placeholder(tf.float32, [None, max_length, n_items], name="InputX")
y = tf.placeholder(tf.float32, [None, max_length, n_items], name="TargetY")
session_length = tf.placeholder(tf.int32, [None], name="SeqLenOfInput")
output, state = rnn.dynamic_rnn(
    rnn_cell.GRUCell(num_hidden),
    x,
    dtype=tf.float32,
    sequence_length=session_length
)
layer = {'weights':tf.Variable(tf.random_normal([num_hidden, n_items])),
'biases':tf.Variable(tf.random_normal([n_items]))}
output = tf.reshape(output, [-1, num_hidden])
prediction = tf.matmul(output, layer['weights'])
y_flat = tf.reshape(y, [-1, n_items])
final_output = tf.nn.softmax_cross_entropy_with_logits(prediction,y_flat)
cost = tf.reduce_sum(final_output)
optimizer = tf.train.AdamOptimizer().minimize(cost)
Both implementations are tested on the same hardware. Both implementations utilize the GPU.
EDIT:
The Theano model has the same structure. (1-HOT input -> GRU layer with 100 units -> FeedForward)
I tested the Theano version with the same parameters as I used in my model (using cross entropy for the loss, batch size=200, adam optimizer, with the same learning rate, no dropout in either model) but the speed difference is still the same.
EDIT (2016-12-07):
Using file queues to queue batches instead of using feed_dict helped a lot.
I still need to do other optimizations to make it faster. Anyway, here is how I used file queues to make it faster.
# Create filename_queue
filename_queue = tf.train.string_input_producer(train_files, shuffle=True)
min_after_dequeue = 1024
capacity = min_after_dequeue + 3*batch_size
examples_queue = tf.RandomShuffleQueue(
    capacity=capacity,
    min_after_dequeue=min_after_dequeue,
    dtypes=[tf.string])
# Create multiple readers to populate the queue of examples
enqueue_ops = []
for i in range(n_readers):
    reader = tf.TextLineReader()
    _key, value = reader.read(filename_queue)
    enqueue_ops.append(examples_queue.enqueue([value]))
tf.train.queue_runner.add_queue_runner(
    tf.train.queue_runner.QueueRunner(examples_queue, enqueue_ops))
example_string = examples_queue.dequeue()
# Default values, and type of the columns, first is sequence_length
# +1 since first field is sequence length
record_defaults = [[0]]*(max_sequence_length+1)
enqueue_examples = []
for thread_id in range(n_preprocess_threads):
    # Decode one line dequeued from the shuffled examples queue
    example = tf.decode_csv(example_string, record_defaults=record_defaults)
    # Split the row into input/target values
    sequence_length = example[0]
    features = example[1:-1]
    targets = example[2:]
    enqueue_examples.append([sequence_length, features, targets])
# Batch together examples
session_length, x_unparsed, y_unparsed = tf.train.batch_join(
    enqueue_examples,
    batch_size=batch_size,
    capacity=2*n_preprocess_threads*batch_size)
# Parse the examples in a batch
x = tf.one_hot(x_unparsed, depth=n_classes)
y = tf.one_hot(y_unparsed, depth=n_classes)
# From here on, x, y and session_length can be used in the model
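The row-splitting step in the queue code is easy to misread, so here is a plain-Python sketch of what it does to a single CSV row. It assumes, as the comments in the post suggest, that the first field is the sequence length and the rest are item IDs; split_row and one_hot are hypothetical helpers, and the example row and depth are made up.

```python
def split_row(row):
    """Split one row [length, id_1, ..., id_n] into inputs and next-item targets."""
    sequence_length = row[0]
    features = row[1:-1]  # every click except the last one
    targets = row[2:]     # the same clicks shifted by one step
    return sequence_length, features, targets


def one_hot(index, depth):
    """Dense one-hot vector, the plain-Python counterpart of tf.one_hot for one index."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


length, features, targets = split_row([3, 5, 2, 7, 1])
print(features)  # inputs at each step
print(targets)   # item clicked next at each step
print(one_hot(features[0], depth=8))
```

Each target is the item that follows the corresponding input, which matches the stated goal of predicting the next click. Keeping the IDs as integers until after batching means the large one-hot vectors are only built inside the graph.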
