Getting TensorFlow to work with distributed binary representation - python

I know how to create an RNN in TensorFlow with a one_hot vector:
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
init_state = tf.zeros([batch_size, state_size])
x_one_hot = tf.one_hot(x, num_classes)
rnn_inputs = tf.unstack(x_one_hot, axis=1)
But I am not sure what to do when my input vector has multiple 1s, e.g. 11011 as the input at a single time step, so the sequence looks like [[11011],[00111],...].
Is there a problem if I just feed this vector the same way I would feed my one-hot representation? How should I formulate the code above in that case? I feel like I shouldn't use the tf.one_hot function, but I am not sure how the shape of rnn_inputs (200 x 5 x 2) can be created without one_hot.
(using TF 1.0)
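A minimal sketch of the multi-hot case, in plain Python so it runs without TensorFlow. It assumes each time step is already a 5-bit feature vector, so no tf.one_hot call is needed: the bits are used directly as float features, and the placeholder would be a float32 tensor of shape [batch_size, num_steps, 5]. The toy data and the helper names (bits_to_floats, unstack_time) are hypothetical.

```python
# Hypothetical toy data: 2 sequences (batch), 3 time steps, 5 bits per step.
batch_bits = [
    ["11011", "00111", "10000"],
    ["01010", "11111", "00001"],
]

def bits_to_floats(bits):
    """Turn a string such as '11011' into [1.0, 1.0, 0.0, 1.0, 1.0]."""
    return [float(ch) for ch in bits]

# Nested list of shape [batch_size, num_steps, num_features].
x_multi_hot = [[bits_to_floats(step) for step in seq] for seq in batch_bits]

def unstack_time(batch):
    """Mimic tf.unstack(batch, axis=1): one [batch_size, num_features] slice per step."""
    num_steps = len(batch[0])
    return [[seq[t] for seq in batch] for t in range(num_steps)]

rnn_inputs = unstack_time(x_multi_hot)
# Prints: num_steps, batch_size, num_features
print(len(rnn_inputs), len(rnn_inputs[0]), len(rnn_inputs[0][0]))
```

With this layout, rnn_inputs is a list with one entry per time step, which is the same kind of list that tf.unstack(x_one_hot, axis=1) produces in the one-hot version.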

Related

How to create and execute a basic LSTM network in TensorFlow?

I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example, as N x 5 arrays) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.
How can I do it in TensorFlow?
ADDED
So far, I have got the following code working:
import numpy as np
import tensorflow as tf
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
timesteps = 18
num_input = 5
X = tf.placeholder("float", [None, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
However, there are many open questions:
Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
Why do we split data by time-steps (using unstack)?
How to interpret the "outputs" and "states"?
Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between static_rnn and dynamic_rnn.
For example:
import numpy as np
import tensorflow as tf
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
num_input = 5
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
x_val = np.random.normal(size = (12,16,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
dynamic_rnn requires all sequences within one batch to have the same length. If you need sequences of different lengths in one batch, pad the batch data to a common length and pass the true length of each sequence through the sequence_length parameter.
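A minimal sketch of the padding step, in plain Python so it runs without TensorFlow. The helper name pad_batch and the toy data are hypothetical; in the TF 1.x code above, the padded batch would be fed to X and the list of lengths passed as the sequence_length argument of tf.nn.dynamic_rnn.

```python
def pad_batch(sequences, num_features):
    """Pad every sequence with zero vectors up to the longest length in the batch.

    Returns the padded batch ([batch_size, max_len, num_features]) and the
    original length of each sequence.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = []
    for seq in sequences:
        steps = [list(step) for step in seq]
        # Append all-zero time steps until the sequence reaches max_len.
        steps += [[0.0] * num_features for _ in range(max_len - len(seq))]
        padded.append(steps)
    return padded, lengths

# Hypothetical batch: three sequences of lengths 3, 1 and 2, two features each.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0]],
    [[9.0, 1.0], [2.0, 3.0]],
]
padded, lengths = pad_batch(batch, num_features=2)
```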
Why do we split data by time-steps (using unstack)?
Only static_rnn needs the data to be split with unstack, because the two functions have different input requirements. static_rnn takes a list of 2D tensors, one per time step and each of shape [batch_size, features], which corresponds to an overall layout of [timesteps, batch_size, features]. dynamic_rnn takes a single 3D tensor of shape [timesteps, batch_size, features] if time_major is True, or [batch_size, timesteps, features] if time_major is False.
How to interpret the "outputs" and "states"?
For LSTMCell, states is a pair of tensors (c, h), so its overall shape is [2, batch_size, num_units]: one [batch_size, num_units] tensor represents C and the other represents h. You can see pictures below.
In the same way, for GRUCell the shape of states is [batch_size, num_units].
outputs represents the output of each time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. You can easily conclude that, for every batch index b,
states[1][b, :] == outputs[b, -1, :].
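The relation between outputs and states can be illustrated with a tiny hand-written LSTM in plain Python. This is only a sketch under simplifying assumptions: a single unit, scalar inputs, no batch dimension, and made-up weights. It is not TensorFlow's implementation, but it shows why the last output equals the h part of the final state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, c, h, w):
    """One step of a single-unit LSTM; w maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h + b
    i = sigmoid(pre("i"))    # input gate
    f = sigmoid(pre("f"))    # forget gate
    o = sigmoid(pre("o"))    # output gate
    g = math.tanh(pre("g"))  # candidate cell value
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return c_new, h_new

def run_lstm(xs, w):
    c, h = 0.0, 0.0
    outputs = []
    for x in xs:
        c, h = lstm_step(x, c, h, w)
        outputs.append(h)  # the per-step output is the hidden state h
    return outputs, (c, h)

# Hypothetical weights, chosen only for illustration.
weights = {
    "i": (0.5, 0.1, 0.0),
    "f": (0.3, -0.2, 1.0),
    "o": (-0.4, 0.2, 0.0),
    "g": (0.8, 0.5, 0.0),
}
outputs, (c_final, h_final) = run_lstm([1.0, -0.5, 0.25], weights)
print(outputs[-1] == h_final)
```

outputs collects h at every step, while the returned state keeps only the last (c, h) pair, so c is available only for the final step.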

Iterate over a tensor dimension in Tensorflow

I am trying to develop a seq2seq model from a low-level perspective (creating all the needed tensors myself). I am trying to feed the model a sequence of vectors as a two-dimensional tensor; however, I can't iterate over one dimension of the tensor to extract it vector by vector. Does anyone know how I could feed a batch of vectors and later get them one by one?
This is my code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
input_array = np.asarray(input_sentence)
output_array = np.asarray(output_sentence)
gru_layer1 = GRU(input_array, input_dim, hidden_dim) #This is a class created by myself
for i in range(input_array.shape[-1]):
    word = input_array[:,i]
    previous_state = gru_encoder.h_t
    gru_layer1.forward_pass(previous_state,word)
And this is the error that I get
TypeError: Expected binary or unicode string, got <tf.Tensor 'input_7:0' shape=(10, ?) dtype=float64>
TensorFlow uses deferred execution, so a placeholder holds no values while the graph is being built and cannot be turned into a NumPy array with np.asarray.
You usually can't know how big the vector will be (words in a sentence, audio samples, etc.). The common approach is to cap it at some reasonably large value and then pad the shorter sequences with an empty token.
Once you do this you can select the data for a time slice with the slice operator:
data = tf.placeholder(tf.float32, shape=(batch_size, max_size, number_of_inputs))
....
for i in range(max_size):
    time_data = data[:, i, :]
    DoStuff(time_data)
Also look up tf.transpose for swapping the batch and time indices. It can help with performance in certain cases.
Alternatively consider something like tf.nn.static_rnn or tf.nn.dynamic_rnn to do the boilerplate stuff for you.
Finally, I found an approach that solves my problem. It worked using tf.scan() instead of a loop, which doesn't require the input tensor to have a defined size in the second dimension. Consequently, you have to prepare the input tensor beforehand so that tf.scan() parses it the way you want. In my case this is the code:
batch_size = 100
hidden_dim = 5
input_dim = embedding_dim
time_size = 5
input_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='input')
output_sentence = tf.placeholder(dtype=tf.float64, shape=[embedding_dim,None], name='output')
# Transpose the placeholder directly; np.asarray cannot convert a symbolic tensor.
x_t = tf.transpose(input_sentence, [1, 0], name='x_t')
h_0 = tf.convert_to_tensor(h_0, dtype=tf.float64)
h_t_transposed = tf.scan(forward_pass, x_t, h_0, name='h_t_transposed')
h_t = tf.transpose(h_t_transposed, [1, 0], name='h_t')
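What tf.scan does with forward_pass can be sketched in plain Python. This is only an illustration of the semantics: the scan helper below is hand-written, and forward_pass is a hypothetical stand-in for the real GRU step. The function is called as fn(previous_result, current_element), starting from the initializer, and every intermediate result is kept.

```python
def scan(fn, elems, initializer):
    """Apply fn(accumulator, elem) along the first dimension and keep every result."""
    acc = initializer
    results = []
    for elem in elems:
        acc = fn(acc, elem)
        results.append(acc)
    return results

def forward_pass(h_prev, x_t):
    # Hypothetical stand-in for a GRU step: average the old state and the new input.
    return [0.5 * h + 0.5 * x for h, x in zip(h_prev, x_t)]

# Time-major input: 3 time steps, 2 features each (like x_t after the transpose).
x_t = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
h_0 = [0.0, 0.0]
h_t = scan(forward_pass, x_t, h_0)
print(h_t)  # [[0.5, 1.0], [1.75, 2.5], [3.375, 4.25]]
```

Because the loop runs over the first dimension, the time axis has to come first, which is why the input is transposed before the scan.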

Tensorflow, declare a vector depending on another tensor

I'm new to TensorFlow.
Using TensorFlow, I want to define a vector that depends on the output of my neural net, in order to compute the desired cost function:
# Build the neural network
X = tf.placeholder(tf.float32, shape=[None, n_inputs], name='X')
hidden = fully_connected(X, n_hidden, activation_fn=tf.nn.elu, weights_initializer=initializer)
logits = fully_connected(hidden, n_outputs, activation_fn=None, weights_initializer=initializer)
outputs = tf.nn.softmax(logits)
# Select a random action based on the probability
action = tf.multinomial(tf.log(outputs), num_samples=1)
# Define the target if the action chosen was correct and the cost function
y = np.zeros(n_outputs)
y[int(tf.to_float(action))] = 1.0
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
To define y, I need the value of action (between 0 and 9) so that my vector y is [0,0,0,1,0 ...] with the 1 at the index "action".
But action is a tensor and not an integer, so I can't do that!
The code above crashes because I can't apply int to a Tensor object...
What should I do?
Many thanks
tf.one_hot() is the function you are looking for.
You will have to do as follows:
action_indices = tf.cast(action, tf.int32)
y = tf.one_hot(action_indices, depth=n_outputs)
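For reference, here is a plain-Python sketch of what a one-hot encoding with a given depth produces. The one_hot helper is hand-written for illustration and only mirrors the idea behind tf.one_hot; it is not the library function. Note that tf.multinomial with num_samples=1 returns shape [batch_size, 1], so the encoded result in the graph carries that extra dimension.

```python
def one_hot(indices, depth):
    """Return one row per index, with 1.0 at that index and 0.0 elsewhere."""
    return [[1.0 if j == idx else 0.0 for j in range(depth)] for idx in indices]

# Hypothetical sampled actions for a batch of two, with 5 possible actions.
y = one_hot([3, 0], depth=5)
print(y)  # [[0.0, 0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0]]
```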

How to calculate logits matrix in Tensorflow?

I have an LSTM model that gets one 88-dimensional vector per step as input. Each element of the vector can belong to one of the classes {0, 1, 2}. The output is one-hot encoded, which means that at each step the output is a matrix of size 3x88. I would like to calculate the cross-entropy loss. This is my model:
x = tf.placeholder(tf.float32, (None, None, INPUT_SIZE))
y = tf.placeholder(tf.float32, (None, None, None, OUTPUT_SIZE))
def LSTM(x_):
    cell = tf.contrib.rnn.LSTMCell(RNN_HIDDEN, state_is_tuple=True)
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.5)
    cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
    batch_size = tf.shape(x_)[0]
    initial_state = cell.zero_state(batch_size, tf.float32)
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell,
                                                x_,
                                                initial_state=initial_state,
                                                time_major=False)
    final_projection = lambda lx: layers.linear(lx, num_outputs=OUTPUT_SIZE,
                                                activation_fn=None)
    predicted_outputs = tf.map_fn(final_projection, rnn_outputs)
    return predicted_outputs
Sample inputs and outputs to my network are here. In this sample, the inputs have a batch size of 1, 3 time steps, and a data dimension of 88. The outputs are the same data, just transformed into one-hot vectors, so the batch size is 1 (1st dimension), there are 3 time steps (2nd dimension), there are 3 classes (3rd dimension) and the data dimension is 88.
I do not know what to do with rnn_outputs and what to do to make predicted_outputs of appropriate shape so that I can call softmax_cross_entropy_with_logits(logits=pred, labels=batch_y_oh).
The code as it is now gives me the following error:
InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[3,88] labels_size=[9,88]
Is it even possible to calculate the cross entropy like this, by feeding it directly to TF's function, or do I have to write my own function? Basically, the loss would be the sum of 88 cross entropies (I am thinking of iterating over the columns and calling softmax_cross_entropy_with_logits() for every column).
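The idea of summing one cross entropy per column can be sketched in plain Python. This is an illustration under assumptions, not the TensorFlow function: the data is laid out with the class scores last, i.e. one list of 3 scores per position (the 3x88 matrix transposed), and only 4 positions are used instead of 88. The helper names are hypothetical.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross entropy between one-hot labels and softmax(logits) for a single position."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum) for y, z in zip(labels, logits))

def step_loss(logits_per_position, labels_per_position):
    """Sum the per-position cross entropies for one time step."""
    return sum(
        softmax_cross_entropy(lg, lb)
        for lg, lb in zip(logits_per_position, labels_per_position)
    )

# Hypothetical step with 4 positions and 3 classes; uniform logits for simplicity.
logits = [[0.0, 0.0, 0.0] for _ in range(4)]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
total = step_loss(logits, labels)
print(total)  # 4 * log(3), since every position is predicted uniformly
```

The key point is that the softmax runs over the class axis, so in the graph the logits and labels need the same shape with the 3 classes on the axis the softmax uses (the last one by default).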

Dynamic shape and Indexing with Tensorflow

I am adapting this MLP tutorial for Tensorflow.
However, I can't figure out how to properly and efficiently compute the negative log-likelihood. When I try to build the following graph, an error occurs:
x = tf.placeholder(tf.float32, shape=(None, 784), name='x')
y = tf.placeholder(tf.int32, shape=(None,), name='y')
# [...] Define the tf.Variables W_hidden, W_out b_hidden, b_out
hidden_act = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)
p_y_given_x = tf.nn.softmax(tf.matmul(hidden_act, W_out) + b_out)
ind_i = tf.range(tf.shape(p_y_given_x)[0])
ind_j = y
nLL = -tf.reduce_mean(tf.log(p_y_given_x[ind_i, ind_j]))
ValueError: Shape must be rank 1 but is rank 2 for 'strided_slice_2' (op: 'StridedSlice') with input shapes: [?,10], [2,?], [2,?], [2]
Is it possible to index a tensor with dynamic shape in the graph?
What am I missing?
P.S. When I evaluate (sess.run(...)) the p_y_given_x, ind_i and ind_j tensors individually, I can then easily index the resulting ndarray p_y_given_x with the two index vectors. The problem is that I don't know how to do that symbolically in TF.
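The indexing that the question is after can be written out in plain Python as a sketch. It pairs each row index with its label and picks one probability per row. In the graph, one possible equivalent (an assumption about how to adapt the code, not part of the original post) is to build the same pairs with tf.stack([ind_i, ind_j], axis=1) and pass them to tf.gather_nd. The toy values are hypothetical.

```python
import math

# Hypothetical softmax outputs for 3 examples and 3 classes, plus their labels.
p_y_given_x = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.25, 0.25, 0.5],
]
y = [0, 1, 2]

# One (row, column) pair per example, like stacking ind_i and ind_j side by side.
index_pairs = [[i, j] for i, j in zip(range(len(y)), y)]

# Pick the probability of the correct class for each example.
picked = [p_y_given_x[i][j] for i, j in index_pairs]

# Negative log-likelihood: mean of -log(p) over the batch.
nll = -sum(math.log(p) for p in picked) / len(picked)
print(picked)  # [0.7, 0.8, 0.5]
```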
