How to calculate logits matrix in Tensorflow? - python

I have an LSTM model that takes one 88-dimensional vector per time step as input. Each element of the vector belongs to one of the classes {0, 1, 2}. The output is one-hot encoded, so at each step the output is a matrix of size 3x88. I would like to calculate the cross-entropy loss. This is my model:
x = tf.placeholder(tf.float32, (None, None, INPUT_SIZE))
y = tf.placeholder(tf.float32, (None, None, None, OUTPUT_SIZE))
def LSTM(x_):
    cell = tf.contrib.rnn.LSTMCell(RNN_HIDDEN, state_is_tuple=True)
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.5)
    cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
    batch_size = tf.shape(x_)[0]
    initial_state = cell.zero_state(batch_size, tf.float32)
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell,
                                                x_,
                                                initial_state=initial_state,
                                                time_major=False)
    final_projection = lambda lx: layers.linear(lx, num_outputs=OUTPUT_SIZE,
                                                activation_fn=None)
    predicted_outputs = tf.map_fn(final_projection, rnn_outputs)
    return predicted_outputs
Sample inputs and outputs to my network are here. In this sample, the input batch size is 1, there are 3 time steps, and the data dimension is 88. The outputs are the same data transformed into one-hot vectors, so the batch size is 1 (1st dimension), there are 3 time steps (2nd dimension), 3 classes (3rd dimension), and the data dimension is 88 (4th dimension).
I do not know what to do with rnn_outputs, or how to give predicted_outputs the appropriate shape so that I can call softmax_cross_entropy_with_logits(logits=pred, labels=batch_y_oh).
The code as it is now gives me the following error:
InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[3,88] labels_size=[9,88]
Is it even possible to calculate the cross-entropy like this, by feeding the tensors directly to TF's function, or do I have to write my own function? Basically, the loss would be the sum of 88 cross-entropies (I am thinking of iterating over the columns and calling softmax_cross_entropy_with_logits() for every column).
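To make the "sum of 88 cross-entropies" idea concrete, here is a minimal NumPy sketch, not TensorFlow code. It assumes the logits have already been brought to the same (batch, steps, 3, 88) layout as the one-hot labels (for example by projecting to 3*88 values per step and reshaping, which is an assumption and not something the model above does). The helper name softmax_cross_entropy and the random data are made up for illustration. The softmax is taken along the class axis, so no loop over columns is needed.

```python
import numpy as np

def softmax_cross_entropy(logits, labels, class_axis):
    """Cross-entropy between one-hot labels and softmax(logits) along class_axis."""
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=class_axis, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=class_axis, keepdims=True))
    # Summing over the class axis leaves one loss value per remaining position.
    return -(labels * log_probs).sum(axis=class_axis)

rng = np.random.default_rng(0)
batch, steps, num_classes, dim = 1, 3, 3, 88

# Hypothetical logits and integer class ids, only to exercise the shapes.
logits = rng.normal(size=(batch, steps, num_classes, dim))
class_ids = rng.integers(0, num_classes, size=(batch, steps, dim))
# One-hot encode along axis 2 so the labels have shape (batch, steps, 3, 88).
labels = np.moveaxis(np.eye(num_classes)[class_ids], -1, 2)

per_element = softmax_cross_entropy(logits, labels, class_axis=2)  # (1, 3, 88)
per_step = per_element.sum(axis=-1)  # sum of the 88 cross-entropies at each step
loss = per_step.mean()               # average over batch and time

print(per_element.shape, per_step.shape)
```

In TensorFlow 1.x, softmax_cross_entropy_with_logits accepts a dim argument that selects the class dimension, which should play the same role as class_axis here; check it against the version you are using.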

Related

Keras Neural Net Loss Function

I've encountered a problem while writing a Siamese net. The net takes as input 2 vectors which represent 2 pieces of text. The vectors are padded, and their length differs between batches (in batch 1 the vector length is 32, in batch 2 it is 64, and so on).
# model definition
def create_model(vocab_size=512, d_model=128):
    def normalize(x):
        norm = tf.norm(x, axis=-1, keepdims=True)
        return tf.divide(x, norm)
    component = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, d_model),
        tf.keras.layers.LSTM(d_model),
        tf.keras.layers.Lambda(lambda x: tf.reduce_mean(x, axis=1)),
        tf.keras.layers.Lambda(normalize),
    ])
    # due to the variability in text, input shape differs with respect to batch
    inputs = [tf.keras.Input(shape=(None,)) for _ in range(2)]
    outputs = tf.tuple([component(ins) for ins in inputs])
    return tf.keras.Model(inputs=inputs, outputs=outputs)
# loss function
class MyLoss(tf.keras.losses.Loss):
    def __init__(self):
        super().__init__(name='TripletLoss')
    def call(self, y_true, y_pred):
        # >>> HERE IS THE PROBLEM, y_pred has a different shape than I'd expect,
        # its shape is (batch_size,) instead of (2, batch_size)
        l, r = y_pred
        # compute and return loss
        return loss
When calling Model#fit(loss=MyLoss(), ...), the parameter passed to MyLoss#call is a projection onto the first coordinate of the model prediction, i.e. model.predict(z) returns [x, y], where x and y are vectors with length equal to the batch size. I expected the y_pred passed to Loss#call to have exactly that value, [x, y], but it equals the first vector of the list, x. Furthermore, I looked at the call stack and saw that before y_pred is passed to MyLoss#call it has the expected value ([x, y]), which changes to x in the body of Keras' Loss.__call__.
I tried to reshape the input, but other problems arose.

Unable to make predictions due to incompatible matrix shape/size

I am trying to build a model that predicts insurance cost for an individual. This is the code for it:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import pandas as pd
from LSR import ListSearchReplace as LSR
csv = pd.read_csv("main.csv")
partialInputs = csv[["age", "bmi", "children"]]
smoker, sex = list(csv["smoker"]), list(csv["sex"])
L1 = LSR(smoker)
L1.replace("yes", 1, True)
L1.replace("no", 0, True)
L2 = LSR(sex)
L2.replace("female", 1, True)
L2.replace("male", 0, True)
pdReadySmoker = pd.DataFrame({"smoker": smoker})
pdReadySex = pd.DataFrame({"sex": sex})
SmokerAndSex = pd.merge(pdReadySmoker, pdReadySex, how="outer", left_index=True, right_index=True)
INPUTS = pd.merge(partialInputs, SmokerAndSex, how="outer", left_index=True, right_index=True)
TARGETS = csv["charges"]
INPUTS = torch.from_numpy(np.array(INPUTS, dtype='float32'))
TARGETS = torch.from_numpy(np.array(TARGETS, dtype='float32'))
print(INPUTS.shape, TARGETS.shape)
loss_fn = F.mse_loss
model = nn.Linear(5, 3) # <-- changing this, changes the error message.
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
trainDataset = TensorDataset(INPUTS, TARGETS)
BATCH_SIZE = 5
trainDataloader = DataLoader(trainDataset, BATCH_SIZE, shuffle=True)
def fit(numEpochs, model, loss_fn, opt, trainDataloader):
    for epoch in range(numEpochs):
        for inputBatch, targetBatch in trainDataloader:
            preds = model(inputBatch)
            loss = loss_fn(preds, targetBatch)
            loss.backward()
            opt.step()
            opt.zero_grad()
        e = epoch + 1
        if e % 10 == 0:
            print(f"Epoch: {e/numEpochs}, loss: {loss.item():.4f}")
fit(100, model, loss_fn, opt, trainDataloader)  # <-- error
Error produced:
<ipython-input-7-b7028a3d94fd>:5: UserWarning: Using a target size (torch.Size([5])) that is different to the input size (torch.Size([5, 3])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
loss = loss_fn(preds, targetBatch)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-20-d8f5bcdc847d> in <module>
----> 1 fit(100, model, loss_fn, opt, trainDataloader)
<ipython-input-7-b7028a3d94fd> in fit(numEpochs, model, loss_fn, opt, trainDataloader)
3 for inputBatch, targetBatch in trainDataloader:
4 preds = model(inputBatch)
----> 5 loss = loss_fn(preds, targetBatch)
6 loss.backward()
7
D:\coding\machine-learning\env-ml\lib\site-packages\torch\nn\functional.py in mse_loss(input, target, size_average, reduce, reduction)
2657 reduction = _Reduction.legacy_get_string(size_average, reduce)
2658
-> 2659 expanded_input, expanded_target = torch.broadcast_tensors(input, target)
2660 return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
2661
D:\coding\machine-learning\env-ml\lib\site-packages\torch\functional.py in broadcast_tensors(*tensors)
69 if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors):
70 return handle_torch_function(broadcast_tensors, tensors, *tensors)
---> 71 return _VF.broadcast_tensors(tensors) # type: ignore
72
73
RuntimeError: The size of tensor a (3) must match the size of tensor b (5) at non-singleton dimension 1
I've tried changing the dimensions of the model; these are a few of the changes I made and the associated errors:
model = nn.Linear(5, 1338)
Error:
RuntimeError: The size of tensor a (1338) must match the size of tensor b (5) at non-singleton dimension 1
model = nn.Linear(1338, 1338)
Error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x5 and 1338x1338)
Sometimes this error makes me change the matrix to the correct shape, but that brings back the previous error about the non-singleton dimension.
This should be quite straightforward, since you only have a single layer. It is a matter of getting the shapes right.
You are feeding an nn.Linear layer an input with shape input_shape. This type of layer takes two arguments: in_features, the number of features in the input vector, and out_features, the number of features in the resulting vector. Since you are using F.mse_loss, your target tensor needs to have the same shape as your prediction.
Bear in mind the first dimension is the batch dimension. In summary, your input tensor has shape (batch, input_size), your dense layer is defined as nn.Linear(input_size, output_size) and your target tensor has shape (batch, output_size).
Coming back to your case, your TARGETS tensor is of shape (1338,), so you mean either:
a single prediction with 1338 components, which would match nn.Linear(?, 1338); the target shape would then be (1, 1338) (a single element in the batch). This can be fixed with TARGETS = TARGETS.unsqueeze(0).
or 1338 predictions of one element each, which would match nn.Linear(?, 1); the appropriate target shape would then be (1338, 1). This can be fixed with TARGETS = TARGETS.unsqueeze(-1) (adds an axis after the last dimension).
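The two options differ only in where the new axis goes. A small sketch of the resulting shapes, using NumPy's expand_dims as a stand-in for torch's unsqueeze (the zero-filled array is a placeholder for the charges column, not real data):

```python
import numpy as np

# Placeholder for the 1338 target values; the real ones come from csv["charges"].
targets = np.zeros(1338, dtype=np.float32)

as_one_sample = np.expand_dims(targets, 0)   # analogous to TARGETS.unsqueeze(0)
as_column = np.expand_dims(targets, -1)      # analogous to TARGETS.unsqueeze(-1)

print(targets.shape)        # (1338,)
print(as_one_sample.shape)  # (1, 1338)
print(as_column.shape)      # (1338, 1)
```

For a dataset with one charge per row, the (1338, 1) form is the one that lines up with a (batch, 1) prediction.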
Your input dimension is 5, and you predict a scalar value (target) for each input.
Therefore, your linear model should be of size:
model = nn.Linear(5, 1) # from 5-dim inputs to 1-dim output
I think setting the batch size to 5 (the same as the input dimension) is confusing you. Try changing the batch size and you will see that it does not affect the dimensions of the model.
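A rough NumPy sketch of why the batch size does not enter the model's dimensions. The weights array below only imitates what nn.Linear(5, 1) computes and the data is random; none of it comes from the question's CSV. It also shows the broadcasting trap the warning in the question refers to: a (batch, 1) prediction minus a (batch,) target silently becomes (batch, batch).

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(5, 1))  # stand-in for the nn.Linear(5, 1) weight matrix
bias = np.zeros(1)

def predict(inputs):
    # (batch, 5) @ (5, 1) -> (batch, 1), whatever the batch size is.
    return inputs @ weights + bias

for batch_size in (5, 8, 32):
    inputs = rng.normal(size=(batch_size, 5))
    targets = rng.normal(size=(batch_size,))
    preds = predict(inputs)

    # Wrong: (batch, 1) - (batch,) broadcasts to (batch, batch).
    broadcast_shape = (preds - targets).shape
    # Right: give the targets a trailing axis so both are (batch, 1).
    mse = np.mean((preds - targets[:, None]) ** 2)

    print(batch_size, preds.shape, broadcast_shape)
```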

How to create and execute a basic LSTM network in TensorFlow?

I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example as N x 5 arrays) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.
How can I do it in TensorFlow?
ADDED
So far, I got the following code working:
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
timesteps = 18
num_input = 5
X = tf.placeholder("float", [None, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
However, there are many open questions:
Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
Why do we split data by time-steps (using unstack)?
How to interpret the "outputs" and "states"?
Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between them.
For example:
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
num_input = 5
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
x_val = np.random.normal(size = (12,16,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
dynamic_rnn requires all sequences in one batch to have the same length, but if you need arbitrary lengths within a batch, you can pad the batch data and pass the true length of each sequence through the sequence_length parameter.
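The padding step can be sketched in NumPy as follows. The helper name pad_batch and the three sequence lengths are made up for illustration; the padded array is what would be fed for X and the lengths are what would be passed as sequence_length.

```python
import numpy as np

def pad_batch(sequences, num_input):
    """Zero-pad variable-length sequences to a common length and record true lengths."""
    lengths = np.array([len(seq) for seq in sequences], dtype=np.int32)
    padded = np.zeros((len(sequences), lengths.max(), num_input), dtype=np.float32)
    for i, seq in enumerate(sequences):
        # Copy the real steps; the remaining steps stay zero.
        padded[i, :len(seq)] = seq
    return padded, lengths

rng = np.random.default_rng(0)
# Three hypothetical sequences with 18, 16 and 7 time steps of 5 features each.
sequences = [rng.normal(size=(n, 5)) for n in (18, 16, 7)]

x_val, seq_len = pad_batch(sequences, num_input=5)
print(x_val.shape)       # (3, 18, 5)
print(seq_len.tolist())  # [18, 16, 7]
```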
Why do we split data by time-steps (using unstack)?
Only static_rnn needs the data to be split with unstack; this comes from the two functions' different input requirements. The input of static_rnn is a list of length timesteps whose elements are 2D tensors of shape [batch_size, features]. The input of dynamic_rnn is a single tensor of shape [timesteps, batch_size, features] or [batch_size, timesteps, features], depending on whether time_major is True or False.
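To see the two input layouts side by side, here is a NumPy illustration of the shapes only (it does not call TensorFlow; the list comprehension mimics what tf.unstack(X, timesteps, 1) produces):

```python
import numpy as np

batch_size, timesteps, features = 12, 18, 5
x_val = np.arange(batch_size * timesteps * features, dtype=np.float32)
x_val = x_val.reshape(batch_size, timesteps, features)

# static_rnn style: a list with one (batch_size, features) array per time step.
steps = [x_val[:, t, :] for t in range(timesteps)]

# dynamic_rnn style with time_major=True: a single (timesteps, batch_size, features) array.
time_major = np.transpose(x_val, (1, 0, 2))

print(len(steps), steps[0].shape)  # 18 (12, 5)
print(time_major.shape)            # (18, 12, 5)
```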
How to interpret the "outputs" and "states"?
For LSTMCell, states is a pair of tensors, which can be viewed as having shape [2, batch_size, num_units]: one [batch_size, num_units] tensor represents C and the other represents h.
In the same way, for GRUCell the shape of states is [batch_size, num_units].
outputs represents the output of each time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. You can easily conclude that
state[1, batch_size, : ] == outputs[ batch_size, -1, : ].
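The relation between outputs and states can be checked with a hand-written LSTM forward pass. This is a minimal NumPy sketch with random weights; it is not TensorFlow's LSTMCell (no forget bias, different initialization), and the names run_lstm and kernel are made up for illustration. It only shows that the final h is the last slice of outputs, while c is a separate tensor.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_lstm(x, num_units, rng):
    """Run one LSTM layer over x of shape (batch_size, timesteps, features)."""
    batch_size, timesteps, features = x.shape
    # A single weight matrix produces all four gate pre-activations at once.
    kernel = rng.normal(scale=0.1, size=(features + num_units, 4 * num_units))
    bias = np.zeros(4 * num_units)
    c = np.zeros((batch_size, num_units))  # cell state C
    h = np.zeros((batch_size, num_units))  # hidden state h
    outputs = []
    for t in range(timesteps):
        gates = np.concatenate([x[:, t, :], h], axis=1) @ kernel + bias
        i, j, f, o = np.split(gates, 4, axis=1)  # input, candidate, forget, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)  # the per-step output is the hidden state
    return np.stack(outputs, axis=1), (c, h)

rng = np.random.default_rng(0)
x_val = rng.normal(size=(12, 18, 5))
outputs, (c, h) = run_lstm(x_val, num_units=4, rng=rng)

print(outputs.shape, c.shape, h.shape)       # (12, 18, 4) (12, 4) (12, 4)
print(np.array_equal(h, outputs[:, -1, :]))  # True
```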

Video classification using many to many LSTM in TensorFlow

I have to build a binary classifier to predict whether the input video contains an action or not.
The input to the model will be of shape: [batch, frames, height, width, channel]
Here, batch is the number of videos, frames is the number of images in each video (it's fixed for every video), height is the number of rows in each image, width is the number of columns in each image, and channel is the RGB colors.
I found in Andrej Karpathy's blog that a many-to-many recurrent neural network is best for this application: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Thus, I need to implement this in TensorFlow:
I learned how to implement an LSTM using this tutorial: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py#L52
But it implements a many-to-one LSTM, predicting the output and reducing the loss using only the last tensor: outputs[-1]
I want to predict the output using several tensors (let's say 4) and reduce the loss using them.
Here's my implementation:
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
# Training Parameters
batch = 5 # number of examples
frames = time_step_size = 20
height = 60
width = 80
channel = 3
lstm_size = 240
num_classes = 2
# Creating random data
input_x = np.random.normal(size=[batch, frames, height, width, channel])
input_y = np.zeros((batch, num_classes))
B = np.ones(batch)
input_y[:,1] = B
X = tf.placeholder("float", [None, frames, height, width, channel], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')
with tf.name_scope('Model'):
    XR = tf.reshape(X, [-1, height*width*channel]) # shape=(?, 14400)
    X_split3 = tf.split(XR, time_step_size, 0) # 20 tensors of shape=(?, 14400)
    lstm = rnn.BasicLSTMCell(lstm_size, forget_bias=1.0, state_is_tuple=True)
    outputs, _states = rnn.static_rnn(lstm, X_split3, dtype=tf.float32) # 20 tensors of shape=(?, 240)
    logits = tf.layers.dense(outputs[-1], num_classes, name='logits') # shape=(?, 2)
    prediction = tf.nn.softmax(logits)
# Define loss and optimizer
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
    train_op = optimizer.minimize(loss_op)
# Evaluate model (with test logits, for dropout to be disabled)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    logits_output = sess.run(logits, feed_dict={X: input_x})
    print(logits_output.shape) # shape=(5, 2)
    sess.run(train_op, feed_dict={X: input_x, Y: input_y})
    loss, acc = sess.run([loss_op, accuracy], feed_dict={X: input_x, Y: input_y})
    print("Loss: ", loss) # loss: 1.46626135e-05
    print("Accuracy: ", acc) # Accuracy: 1.0
Problems:
1. I need help implementing a many-to-many LSTM that predicts an output after every few frames (let's say 4 predictions in total), but I am only using the last tensor, outputs[-1], to reduce the loss. There are 20 tensors, one for each frame (time_step_size). If I transform every 5th tensor (outputs[4], outputs[9], outputs[14], outputs[-1]), I will get 4 logits. How do I reduce the loss on all four of them?
2. Another problem is that I have to implement a binary classifier, but I only have videos of the action I want to identify. So input_y is a one-hot representation of the labels in which the 1st column is always 0 and the 2nd column is always 1 (the action I have to identify), and I don't have any example video for which the 1st column's value is 1. Do you think it will work?
3. Why, in the above implementation, is the accuracy 1 after only one iteration?
Thanks
For 1., Dense accepts any number of batch dimensions, so you should be able to transform the outputs of all the steps into logits in one go (then likewise operate on the whole batch until you have a loss for each step, and aggregate, e.g. by taking the mean).
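A NumPy sketch of that idea, under a few assumptions: the 20 per-step LSTM outputs are stacked into one (batch, frames, lstm_size) array (random numbers stand in for them here), the same dense kernel is applied to every selected step, and each video's label is reused at every selected step. The variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, frames, lstm_size, num_classes = 5, 20, 240, 2

# Stand-in for the stacked LSTM outputs; the real ones come from static_rnn.
outputs = rng.normal(size=(batch, frames, lstm_size))
kernel = rng.normal(scale=0.01, size=(lstm_size, num_classes))
bias = np.zeros(num_classes)

# Keep every 5th step: indices 4, 9, 14, 19 -> shape (5, 4, 240).
selected = outputs[:, 4::5, :]
# One matrix product gives the logits of all selected steps: shape (5, 4, 2).
logits = selected @ kernel + bias

# One-hot video labels, repeated for each selected step (an assumption).
labels = np.zeros((batch, num_classes))
labels[:, 1] = 1.0
labels_per_step = np.repeat(labels[:, None, :], logits.shape[1], axis=1)

# Softmax cross-entropy along the class axis, one value per (video, step).
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
loss_per_step = -(labels_per_step * log_probs).sum(axis=-1)  # (5, 4)

# Aggregate over the four steps and the batch by taking the mean.
loss = loss_per_step.mean()
print(logits.shape, loss_per_step.shape)
```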
For 2. and 3., it seems like you need to find some negative examples. There's a literature on "positive and unlabeled (PU)" learning and "one-class classification" which may help.

Predicting the next word using the LSTM ptb model tensorflow example

I am trying to use the tensorflow LSTM model to make next word predictions.
As described in this related question (which has no accepted answer), the example contains pseudocode to extract next-word probabilities:
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)
    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)
I am confused about how to interpret the probabilities vector. I modified the __init__ function of the PTBModel in ptb_word_lm.py to store the probabilities and logits:
class PTBModel(object):
    """The PTB model."""
    def __init__(self, is_training, config):
        # General definition of LSTM (unrolled)
        # identical to tensorflow example ...
        # omitted for brevity ...
        # computing the logits (also from example code)
        logits = tf.nn.xw_plus_b(output,
                                 tf.get_variable("softmax_w", [size, vocab_size]),
                                 tf.get_variable("softmax_b", [vocab_size]))
        loss = seq2seq.sequence_loss_by_example([logits],
                                                [tf.reshape(self._targets, [-1])],
                                                [tf.ones([batch_size * num_steps])],
                                                vocab_size)
        self._cost = cost = tf.reduce_sum(loss) / batch_size
        self._final_state = states[-1]
        # my addition: storing the probabilities and logits
        self.probabilities = tf.nn.softmax(logits)
        self.logits = logits
        # more model definition ...
Then printed some info about them in the run_epoch function:
def run_epoch(session, m, data, eval_op, verbose=True):
    """Runs the model on the given data."""
    # first part of function unchanged from example
    for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                      m.num_steps)):
        # evaluate probability and logit tensors too:
        cost, state, probs, logits, _ = session.run([m.cost, m.final_state, m.probabilities, m.logits, eval_op],
                                                    {m.input_data: x,
                                                     m.targets: y,
                                                     m.initial_state: state})
        costs += cost
        iters += m.num_steps
        if verbose and step % (epoch_size // 10) == 10:
            print("%.3f perplexity: %.3f speed: %.0f wps, n_iters: %s" %
                  (step * 1.0 / epoch_size, np.exp(costs / iters),
                   iters * m.batch_size / (time.time() - start_time), iters))
            chosen_word = np.argmax(probs, 1)
            print("Probabilities shape: %s, Logits shape: %s" %
                  (probs.shape, logits.shape) )
            print(chosen_word)
            print("Batch size: %s, Num steps: %s" % (m.batch_size, m.num_steps))
    return np.exp(costs / iters)
This produces output like this:
0.000 perplexity: 741.577 speed: 230 wps, n_iters: 220
(20, 10000) (20, 10000)
[ 14 1 6 589 1 5 0 87 6 5 3 5 2 2 2 2 6 2 6 1]
Batch size: 1, Num steps: 20
I was expecting the probs vector to be an array of probabilities, with one for each word in the vocabulary (eg with shape (1, vocab_size)), meaning that I could get the predicted word using np.argmax(probs, 1) as suggested in the other question.
However, the first dimension of the vector is actually equal to the number of steps in the unrolled LSTM (20 if the small config settings are used), which I'm not sure what to do with. To access the predicted word, do I just need to use the last value (because it's the output of the final step)? Or is there something else that I'm missing?
I tried to understand how the predictions are made and evaluated by looking at the implementation of seq2seq.sequence_loss_by_example, which must perform this evaluation, but this ends up calling gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which doesn't seem to be included in the github repo, so I'm not sure where else to look.
I'm quite new to both tensorflow and LSTMs, so any help is appreciated!
The output tensor contains the concatenation of the LSTM cell outputs for each timestep (see its definition here). Therefore you can find the prediction for the next word by taking chosen_word[-1] (or chosen_word[sequence_length - 1] if the sequence has been padded to match the unrolled LSTM).
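As a small illustration with made-up numbers, here is how the indexing works on a (num_steps, vocab_size) probability array like the one printed in the question. The random logits and the value of sequence_length are hypothetical; only the shapes match the question.

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps, vocab_size = 20, 10000

# Random logits standing in for the model's output with batch_size = 1.
logits = rng.normal(size=(num_steps, vocab_size))
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# One predicted word id per time step: shape (20,).
chosen_word = np.argmax(probs, 1)

# Prediction for the word following the whole unrolled sequence.
next_word_id = int(chosen_word[-1])

# If only the first 12 steps were real input (hypothetical length), use that step instead.
sequence_length = 12
next_word_id_padded = int(chosen_word[sequence_length - 1])

print(chosen_word.shape)
```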
The tf.nn.sparse_softmax_cross_entropy_with_logits() op is documented in the public API under a different name. For technical reasons, it calls a generated wrapper function that does not appear in the GitHub repository. The implementation of the op is in C++, here.
I am implementing a seq2seq model too.
So let me try to explain with my understanding:
The output of your LSTM model is a list (of length num_steps) of 2D tensors of size [batch_size, size].
The code line:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
will produce a new output, which is a 2D tensor of size [batch_size x num_steps, size].
In your case, batch_size = 1 and num_steps = 20, so the output shape is [20, size].
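The row ordering after the concat and reshape can be traced with a tiny NumPy example. The sizes are deliberately small and the entries are labelled 10*b + t (batch b, step t) so each row is easy to identify; np.concatenate(outputs, axis=1) is used as the counterpart of tf.concat(1, outputs).

```python
import numpy as np

batch_size, num_steps, size = 2, 3, 4

# outputs[t] has shape (batch_size, size); the value 10*b + t marks batch b, step t.
outputs = [
    np.array([[10 * b + t] * size for b in range(batch_size)])
    for t in range(num_steps)
]

# NumPy counterpart of tf.reshape(tf.concat(1, outputs), [-1, size]).
output = np.concatenate(outputs, axis=1).reshape(-1, size)

print(output.shape)           # (6, 4)
print(output[:, 0].tolist())  # [0, 1, 2, 10, 11, 12]
```

So row b * num_steps + t of the reshaped tensor holds the output of batch element b at step t; with batch_size = 1 the rows are simply the time steps in order.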
Code line:
logits = tf.nn.xw_plus_b(output, tf.get_variable("softmax_w", [size, vocab_size]), tf.get_variable("softmax_b", [vocab_size]))
<=> output[batch_size x num_steps, size] x softmax_w[size, vocab_size] will output logits of size [batch_size x num_steps, vocab_size].
In your case, the logits have size [20, vocab_size]
--> the probs tensor has the same size as the logits, [20, vocab_size].
Code line:
chosen_word = np.argmax(probs, 1)
will output a chosen_word array of shape [20], where each value is the index of the predicted next word for the current word.
Code line:
loss = seq2seq.sequence_loss_by_example([logits], [tf.reshape(self._targets, [-1])], [tf.ones([batch_size * num_steps])])
computes the softmax cross-entropy loss for a batch of batch_size sequences.
