When I attempt to use the softmax cross entropy function, I get a ValueError saying
ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2).
The thing is that my layers are built in such a way that my logits should output only 1 value.
The shape of my logits is (5, 1) but I have no idea why there is a 5. The X for each instance is a 5x7 matrix
X = tf.placeholder(shape=(5, 7), name='inputs', dtype=tf.float32)
y = tf.placeholder(shape=(1, 1), name='outputs', dtype=tf.int32)
hidden1 = tf.layers.dense(X, 150)
hidden2 = tf.layers.dense(hidden1, 50)
logits = tf.layers.dense(hidden2, 1)
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
Edit
Check the comments in the code below, and try it.
X = tf.placeholder(shape=(1, 5, 7), name='inputs', dtype=tf.float32)
y = tf.placeholder(shape=(1,), name='outputs', dtype=tf.int32)
flattened = tf.layers.flatten(X)           # shape (1, 35)
hidden1 = tf.layers.dense(flattened, 150)  # shape (1, 150)
hidden2 = tf.layers.dense(hidden1, 50)     # shape (1, 50)
logits = tf.layers.dense(hidden2, 1)       # shape (1, 1)
with tf.name_scope("loss"):
    # expects logits of shape (1, 1) against labels of shape (1,)
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
Original
Let's think through what's going on here.
You create an X placeholder with the shape (5,7) (presumably (batch_size, data_size)).
You feed it into a hidden layer, which transforms the shape from (batch_size, data_size) to (batch_size, units) (units here is 150)
Likewise for the next two layers with hidden2 and logits, resulting in logits having shape (batch_size, 1), which is (5, 1) in this case
You're computing cross entropy between the labels and logits. The requirement on shapes here is that logits have shape (batch_size, num_classes), where each value is the weight for a particular class, and that labels have shape (batch_size), where each value is the class number for that particular sample. So this is where things go wrong for you: your y has shape (1, 1), and TF is expecting a tensor of shape (5).
My guess is that you're trying to feed X forward directly as the data of a single sample (i.e. a (5, 7)-shaped matrix). If that's the case, you should give X the shape (1, 5, 7) to signify to TensorFlow that X represents only one piece of data.
The thing is that my layers are built in such a way that my logits should output only 1 value.
That's not true. When X is an a x b tensor and you call tf.layers.dense(X, c), you multiply X by a b x c matrix (and a bias of size c is also added), so the output has size a x c.
In your case, since the first dimension of X is 5, it stays 5 all the way through to the logits, so your logits will be of size 5. You are definitely doing something wrong, but it is difficult to say what without more information.
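To make the shape rule concrete, here is a minimal sketch (TF 1.x assumed, shapes in the comments) showing that a (5, 7) input keeps its leading 5 through every dense layer:
import tensorflow as tf

X = tf.placeholder(dtype=tf.float32, shape=(5, 7))
hidden1 = tf.layers.dense(X, 150)       # shape (5, 150)
hidden2 = tf.layers.dense(hidden1, 50)  # shape (5, 50)
logits = tf.layers.dense(hidden2, 1)    # shape (5, 1) -- the leading 5 never goes away
print(logits.shape)                     # (5, 1)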
Related
I want to implement a GRU for sentiment analysis and this is what I have so far:
epochs = 20
batch_size = 25
embedding_size = 50
layers = 1
max_label = 2 # only valid target labels are 0 and 1
# one word is fed in at any time instance
embedding_matrix = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embeddings = tf.nn.embedding_lookup(embedding_matrix, x)
# number of neurons in a single GRU cell is equal to the embedding size
cell = tf.contrib.rnn.GRUCell(embedding_size)
cell = tf.contrib.rnn.DropoutWrapper(cell=cell, output_keep_prob=0.75)
# encoding fed into softmax prediction layer
output, states = tf.nn.dynamic_rnn(cell, embeddings, dtype=tf.float32)
logits = tf.layers.dense(output, max_label, activation=None)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
loss = tf.reduce_mean(cross_entropy)
# prediction is accurate if predicted label is equal to the actual label
prediction = tf.equal(tf.argmax(logits, 1), tf.cast(y, tf.int64))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
But I get this error:
ValueError: Rank mismatch: Rank of labels (received 1) should equal rank of logits minus 1 (received 3).
I tried changing the value of max_label to 2 and 1 and tried to make logits=logits-1 but the error is still there. How can I fix it?
NB: This is without knowing the size of your label vector y nor which line the error message is coming from. Please include this important information in future questions.
According to the tensorflow 1.3 docs, the usage of nn.sparse_softmax_cross_entropy_with_logits is:
"... to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported."
From my experience with RNNs in TensorFlow, the size convention of your model will be [Batch, Length of Sequence, Feature Size], so your output will be 3-dimensional. According to the quote attached above, your labels should then be of size [Batch, Length of Sequence], with each value being the index of the true label (so if position (X, Y) is class 30, then y[X, Y] = 30 in your code). This is different (from my memory) from the non-sparse version, which takes one-hot (or multi-hot) encodings.
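As a sketch of one possible fix (assuming you want a single sentiment label per sequence, i.e. y of shape [batch_size]), you could feed only the last time step's output into the dense layer so the logits come out 2-dimensional; the names are taken from your code:
output, states = tf.nn.dynamic_rnn(cell, embeddings, dtype=tf.float32)  # output: [batch, time, embedding_size]
last_output = output[:, -1, :]                                          # [batch_size, embedding_size]
logits = tf.layers.dense(last_output, max_label, activation=None)      # [batch_size, max_label]
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
loss = tf.reduce_mean(cross_entropy)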
When using a SimpleRNN or LSTM for classical sentiment analysis algorithms (applied here to sentences of length <= 250 words/tokens):
model = Sequential()
model.add(Embedding(5000, 32, input_length=250)) # Output shape: (None, 250, 32)
model.add(SimpleRNN(100)) # Output shape: (None, 100)
model.add(Dense(1, activation='sigmoid')) # Output shape: (None, 1)
where is it specified which axis of the input of the RNN is used as the "temporal" axis?
To be more precise, after the Embedding layer, a given input sentence, e.g. "the cat sat on the mat", is encoded into a matrix x of shape (250, 32), where 250 is the max length (in words) of the input text, and 32 the dimension of the embedding. Then, where in Keras is it specified if this will be used:
h[t] = activation( W_h * x[:, t] + U_h * h[t-1] + b_h )
or this:
h[t] = activation( W_h * x[t, :] + U_h * h[t-1] + b_h )
(In both cases, y[t] = activation( W_y * h[t] + b_y ))
TL;DR: if an input for a RNN Keras layer is of size, say, (250, 32), which axis does it use as the temporal axis by default? Where is this detailed in the Keras or Tensorflow documentation?
PS: how do you explain the number of parameters (given by model.summary()), which is 13,300? W_h has 100x32 coefs, U_h has 100x100 coefs, and b_h has 100x1 coefs, i.e. that already makes 13,300! There are no coefs left for W_y and b_y! How can this be explained?
Temporal axis: it's always dim 1, unless time_major=True, in which case it's dim 0 (inputs then shaped (timesteps, batch, channels)); the Embedding layer outputs a 3D tensor. This can be seen here, where step_input_shape is the shape of the input fed to the RNN cell at each step of the recurrent loop. For your case, timesteps=250, and the SimpleRNN cell "sees" a tensor shaped (batch_size, 32) at each step.
# of params: you can see how the figure's derived by inspecting each layer's .build() code: Embedding, SimpleRNN, Dense, or likewise calling .weights on each layer. For your case, w/ l = model.layers[1]:
l.weights[0].shape == (32, 100) --> 3200 params (kernel)
l.weights[1].shape == (100, 100) --> 10000 params (recurrent_kernel)
l.weights[2].shape == (100,) --> 100 params (bias) (sum: 13,300)
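A quick way to verify those numbers yourself (a sketch that rebuilds the same model as in the question and prints the SimpleRNN weights):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

model = Sequential()
model.add(Embedding(5000, 32, input_length=250))
model.add(SimpleRNN(100))
model.add(Dense(1, activation='sigmoid'))

for w in model.layers[1].weights:  # the SimpleRNN layer
    print(w.name, w.shape)         # kernel (32, 100), recurrent_kernel (100, 100), bias (100,)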
Computation logic: there is no W_y or b_y; the "y" is actually the hidden state h, for all recurrent layers - what you cite is likely from generic RNN formulae. Regarding "in both cases, y[t] = ..." - this is false; to see what's actually happening, inspect the .call() code.
P.S. I recommend defining the full batch_shape of the model for debugging, as it eliminates the ambiguous None shapes
SimpleRNN formula vs. code: as requested; note the h in source code is misleading, and is typically z in formulae ("pre-activation").
return_sequences=True -> outputs for all timesteps are returned: (batch_size, timesteps, channels)
return_sequences=False -> only the last timestep's output is returned: (batch_size, channels). See here
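A small sketch of the two cases (batch_size=8 chosen arbitrarily for illustration):
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

x = tf.zeros((8, 250, 32))  # (batch_size, timesteps, channels)
print(SimpleRNN(100, return_sequences=True)(x).shape)   # (8, 250, 100)
print(SimpleRNN(100, return_sequences=False)(x).shape)  # (8, 100)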
I'm building an autoencoder based on an RNN. After the FC layer, I have to reshape my output to [batch_size, sequence_length, embedding_dimension]. However, the sequence length (timestep) for my decoder is not known in advance. What I wish for is something that works as follows.
outputs = tf.reshape(outputs, [batch_size, None, word_dimension])
Or is there any other way for me to get the sequence length from the input data, which has shape [batch_size, sequence_length, embedding_dimension]?
You can use -1 for the dimension in your reshape operation that you want to be calculated automatically.
For example, here:
x = tf.zeros((100 * 10 * 12,))
reshaped = tf.reshape(x, [100, -1, 12])
reshaped will have shape (100, 10, 12)
Or is there any other way for me to get the sequence length from the input data, which has shape [batch_size, sequence_length, embedding_dimension]?
You can use the tf.shape operation to find the shape of a tensor at runtime, so if you want the sequence_length of a tensor x with shape [batch_size, sequence_length, embedding_dimension], you just need to call tf.shape(x)[1].
For my example above, calling:
tf.shape(reshaped)[1]
would give an int32 tensor with shape () and value 10
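Putting both together for the original problem (a sketch; here inputs is a stand-in name for your decoder's input tensor of shape [batch_size, sequence_length, embedding_dimension], and outputs, batch_size and word_dimension come from your code):
seq_len = tf.shape(inputs)[1]                                         # dynamic sequence_length
outputs = tf.reshape(outputs, [batch_size, seq_len, word_dimension])
# or, letting TensorFlow infer the middle dimension directly:
outputs = tf.reshape(outputs, [batch_size, -1, word_dimension])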
I need to build a custom categorical cross-entropy loss function where I compare y_true with Q*y_pred instead of just y_pred. Q is a matrix.
The problem is that the batch size is not equal to 1, so there is a problem with the dimensions.
How can I build a categorical cross-entropy loss function that works with batch_size=200?
For example, here is a custom categorical cross-entropy loss function that works correctly, but only for batch_size = 1.
I have 3 classes, so the shape of y_pred is (batch_size, 3, 1) and the shape of Q is (3, 3).
I also tried passing a multidimensional numpy array with shape (batch_size, 3, 3), but it did not work.
Q = np.matrix([[0, 0.7, 0.2], [0, 0, 0.8], [1, 0.3, 0]])
def alpha_loss(y_true, y_pred):
    return K.categorical_crossentropy(y_true, K.dot(tf.convert_to_tensor(Q, dtype=tf.float32), K.reshape(y_pred, (3, 1))))
Since you are using the TensorFlow backend, this may work:
Q = np.matrix([[0, 0.7, 0.2], [0, 0, 0.8], [1, 0.3, 0]])
def alpha_loss(y_true, y_pred):
    # Edit: from the comments below it appears that y_pred has shape (batch_size, 3),
    # so reshape it to (batch_size, 3, 1)
    y_pred = tf.expand_dims(y_pred, axis=-1)
    q_tf = tf.convert_to_tensor(Q, dtype=tf.float32)
    # Change the shape of Q from (3, 3) to (batch_size, 3, 3)
    q_expanded = tf.tile(tf.expand_dims(q_tf, axis=0), multiples=[tf.shape(y_pred)[0], 1, 1])
    # Matrix multiplication of Q and y_pred gives a tensor of shape (batch_size, 3, 1)
    qy_pred = tf.matmul(q_expanded, y_pred)
    return K.categorical_crossentropy(y_true, qy_pred)
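A quick shape check of the pieces above (a sketch with a dummy batch of 200 and Q as defined earlier):
y_pred = tf.zeros((200, 3))                       # stand-in for the model output
y_pred = tf.expand_dims(y_pred, axis=-1)          # (200, 3, 1)
q_tf = tf.convert_to_tensor(Q, dtype=tf.float32)  # (3, 3)
q_expanded = tf.tile(tf.expand_dims(q_tf, axis=0), multiples=[tf.shape(y_pred)[0], 1, 1])  # (200, 3, 3)
qy_pred = tf.matmul(q_expanded, y_pred)           # (200, 3, 1)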
I am trying to train an RNN in batches.
The input size is
(10, 70, 3075),
where 10 is the batch size, 70 the time dimension, and 3075 the frequency dimension.
There are three outputs whose size is
(10, 70, 1025)
each, basically 10 spectrograms of size (70, 1025).
I would like to train this RNN as a regression; its structure is
input_img = Input(shape=(70, 3075))
x = Bidirectional(LSTM(n_hid,return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(input_img)
x = Dropout(0.2)(x)
x = Bidirectional(LSTM(n_hid, dropout=0.5, recurrent_dropout=0.2))(x)
x = Dropout(0.2)(x)
o0 = Dense(1025, activation='sigmoid')(x)
o1 = Dense(1025, activation='sigmoid')(x)
o2 = Dense(1025, activation='sigmoid')(x)
The problem is that the output dense layers cannot take three dimensions into account; they want something like (None, 1025), which I don't know how to provide unless I concatenate along the time dimension.
The following error occurs:
ValueError: Cannot feed value of shape (10, 70, 1025) for Tensor u'dense_2_target:0', which has shape '(?, ?)'
Would the batch_shape option in the input layer be useful here? I have actually tried it, but I got the same error.
In this instance the second RNN is collapsing the sequence to a single vector because, by default, return_sequences=False. To make the model return sequences and run the Dense layer over each timestep separately, just add return_sequences=True to the second RNN as well:
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(x)
The Dense layers automatically apply to the last dimension so no need to reshape afterwards.
Alternatively, to get the right output shape, you can use the Reshape layer:
o0 = Dense(70 * 1025, activation='sigmoid')(x)
o0 = Reshape((70, 1025))(o0)
This will output (batch_dim, 70, 1025). You can do exactly the same for the other two outputs.
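For completeness, here is a sketch of how the first suggestion (return_sequences=True on both RNNs) fits into a full model; n_hid is a placeholder value you would replace with your own:
from keras.models import Model
from keras.layers import Input, Bidirectional, LSTM, Dropout, Dense

n_hid = 64  # hypothetical hidden size

input_img = Input(shape=(70, 3075))
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(input_img)
x = Dropout(0.2)(x)
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(x)
x = Dropout(0.2)(x)
o0 = Dense(1025, activation='sigmoid')(x)  # (None, 70, 1025)
o1 = Dense(1025, activation='sigmoid')(x)
o2 = Dense(1025, activation='sigmoid')(x)
model = Model(inputs=input_img, outputs=[o0, o1, o2])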