I'm trying to train a model that takes n values as input and outputs n values. The problem is that n can range from 1 to 700, so I built a network with 700 inputs and 700 outputs. The unused inputs and outputs are set to zero.
When training the model, I don't care whether the extra outputs are accurate, so I tried to define my own loss function as follows:
def mse_truncate(y_true, y_pred):
    def fn(x):
        return tf.cond(x < 0.01, lambda: 0.0, lambda: 1.0)
    # Ignore the squared error if y_true[i] is near zero
    sgn = tf.map_fn(fn, y_true)
    return K.mean(sgn * K.square(y_true - y_pred), axis=-1)
This function works in the console.
But when I compile the model, I get an error:
model.compile(optimizer='sgd',loss=mse_truncate, metrics=['accuracy'])
ValueError: Shape must be rank 0 but is rank 1 for 'loss_5/dense_2_loss/map/while/cond/Switch' (op: 'Switch') with input shapes: [?], [?].
Can someone tell me what's wrong here?
Or are there better ways to handle the variable length input and output?
Note:
More on the problem: the input is a sequence (length <= 700), and the output is the distance between the first element and each element in the sequence.
You could use tf.where and tf.gather_nd to take only the values you care about into consideration (tf.where returns one row of coordinates per match, which is the index format tf.gather_nd expects), e.g.:
indices = tf.where(tf.greater(y_true, 0.01))  # or `tf.less`, `tf.equal` etc.
loss = K.mean(K.square(tf.gather_nd(y_true, indices) - tf.gather_nd(y_pred, indices)))
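The selection in the answer above can also be written as a multiplicative mask. The sketch below uses NumPy as a stand-in for the tensor operations, only to make the arithmetic visible; the name masked_mse, the threshold argument, and the sample values are illustrative assumptions, not part of the original answer. In Keras the same mask could presumably be built from K.greater and K.cast.

```python
import numpy as np

def masked_mse(y_true, y_pred, threshold=0.01):
    """Mean squared error over entries where y_true exceeds `threshold`.

    The name and the `threshold` default are illustrative assumptions.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # 1.0 for real targets, 0.0 for zero padding
    mask = (y_true > threshold).astype(float)
    squared_error = mask * (y_true - y_pred) ** 2
    # Divide by the number of kept entries; guard against an all-padding batch.
    return squared_error.sum() / max(mask.sum(), 1.0)

# Two sequences padded to length 4; predictions in the padding are deliberately bad.
y_true = np.array([[1.0, 2.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0, 0.0]])
y_pred = np.array([[1.5, 2.0, 9.0, 9.0],
                   [2.0, 7.0, 7.0, 7.0]])
loss = masked_mse(y_true, y_pred)
```

Dividing by the number of kept entries matches averaging over the gathered values, whereas averaging the masked errors over all 700 positions would scale the loss down for short sequences. Note that a value threshold also masks genuine targets that happen to be near zero, so a mask built from the known sequence length may be a safer choice.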
Related
I have defined a loss function like this:
def my_loss(y_recon, y_real, brain_hidden, brain_real):
    loss = torch.mean((y_recon - y_real)**2 + (brain_hidden - brain_real)**2)
    return loss
The shape of y_recon (and y_real) is batch_size*300, and the shape of brain_hidden (and brain_real) is batch_size*64.
I need to minimize both of these terms. However, written this way, I get the error
The size of tensor a (300) must match the size of tensor b (64) at non-singleton dimension 1
How can I update the loss function to avoid this error?
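One possible reading of the error: the two squared-difference tensors have shapes (batch_size, 300) and (batch_size, 64), which cannot be added element-wise. Reducing each term to a scalar before adding avoids the mismatch. The sketch below uses NumPy, whose broadcasting rules PyTorch follows for this case, purely to illustrate the shapes; the name combined_loss, the equal weighting of the two terms, and batch_size = 4 are assumptions. In PyTorch the same structure would apply torch.mean to each term separately.

```python
import numpy as np

def combined_loss(y_recon, y_real, brain_hidden, brain_real):
    """Sum of two mean squared errors, each reduced to a scalar first."""
    recon_term = np.mean((y_recon - y_real) ** 2)           # over (batch, 300)
    brain_term = np.mean((brain_hidden - brain_real) ** 2)  # over (batch, 64)
    return recon_term + brain_term  # equal weighting is an assumption

rng = np.random.default_rng(0)
batch_size = 4  # illustrative value
y_recon, y_real = rng.normal(size=(2, batch_size, 300))
brain_hidden, brain_real = rng.normal(size=(2, batch_size, 64))

loss = combined_loss(y_recon, y_real, brain_hidden, brain_real)

# Adding the unreduced terms reproduces the shape mismatch from the question.
try:
    (y_recon - y_real) ** 2 + (brain_hidden - brain_real) ** 2
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

If one term should matter more than the other, a scalar weight on either term is a natural extension.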
I want to write a custom loss function for a Keras network.
In this function, the result depends not only on y_actual and y_pred, but also on another value that I extract from a database using the value of y_actual.
I wrote the following function, in which I want to use the label of y_actual, retrieved from the database, in the loss calculation.
def custom_loss(y_actual, y_pred):
    dataset_whole = pd.read_sql("select * from Records", con=db)
    dataset_features_only = dataset_whole.drop(['label'], axis=1)
    dataset_whole_np = dataset_whole.values
    dataset_whole_tf = tf.constant(dataset_whole_np)
    dataset_features_np = dataset_features_only.values
    dataset_features_tf = tf.constant(dataset_features_np)
    index = tf.where(tf.equal(dataset_features_tf, y_actual))
    row = dataset_whole_tf[index,:]
    label = row['label']
    return ((y_actual-y_pred)-tf.Variable(label,tf.float64))**
I see this error message in the row=dataset_whole_tf[index,:] line:
ValueError: Shapes must be equal rank, but are 2 and 0 From merging shape 0 with other shapes. for 'loss/dense_7_loss/strided_slice/stack_1' (op: 'Pack') with input shapes: [?,2], [].
I am new to machine learning, Python and TensorFlow. I am used to coding in C++ or C#, and it is difficult for me to use tf.backend.
I am trying to write a custom loss function for an LSTM network that tries to predict whether the next element of a time series will be positive or negative. My code runs nicely with the binary_crossentropy loss function. I now want to improve my network with a loss function that adds the value of the next time series element if the predicted probability is greater than 0.5 and subtracts it if the probability is less than or equal to 0.5.
I tried something like this:
def customLossFunction(y_true, y_pred):
    temp = 0.0
    for i in range(0, len(y_true)):
        if(y_pred[i] > 0):
            temp += y_true[i]
        else:
            temp -= y_true[i]
    return temp
Obviously, the dimensions are wrong, but since I cannot step into my function while debugging, it is very hard to get a grasp of the dimensions here.
Can you please tell me if I can use an element-by-element function? If yes, how? And if not, could you help me with tf.backend?
Thanks a lot
Among the Keras backend functions, there is greater, which you can use:
import keras.backend as K
def customLossFunction(yTrue, yPred):
    greater = K.greater(yPred, 0.5)
    greater = K.cast(greater, K.floatx())  # has zeros and ones
    multiply = (2*greater) - 1  # has -1 and 1
    modifiedTrue = multiply * yTrue
    # here, it's important to know which dimension you want to sum
    return K.sum(modifiedTrue, axis=?)  # replace ? with the axis to sum over
The axis parameter should be used according to what you want to sum.
axis=0 -> batch or sample dimension (number of sequences)
axis=1 -> time steps dimension (if you're using return_sequences = True until the end)
axis=2 -> predictions for each step
Now, if you have only a 2D target:
axis=0 -> batch or sample dimension (number of sequences)
axis=1 -> predictions for each sequence
If you simply want to sum everything for every sequence, just omit the axis parameter.
Important note about this function:
Since it contains only values from yTrue, it cannot backpropagate to change the weights. This will lead to a "none values not supported" error or something very similar.
Although yPred (the one that is connected to the model's weights) is used in the function, it is used only to produce a true/false condition, which is not differentiable.
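The mask arithmetic in the answer above can be traced with plain NumPy arrays standing in for the tensors. This is only a sketch of the forward computation; the sample values are made up, and NumPy says nothing about the gradient issue described above.

```python
import numpy as np

# Made-up values: predicted probabilities and the next time-series values.
y_pred = np.array([[0.9, 0.2, 0.6],
                   [0.4, 0.5, 0.8]])
y_true = np.array([[1.0, -2.0, 3.0],
                   [4.0, 5.0, -6.0]])

greater = (y_pred > 0.5).astype(float)  # 1.0 where prob > 0.5, else 0.0
multiply = 2 * greater - 1              # +1.0 where prob > 0.5, else -1.0
modified_true = multiply * y_true       # adds or subtracts each true value

per_sample = modified_true.sum(axis=1)  # one value per sequence
per_column = modified_true.sum(axis=0)  # summed across the batch
total = modified_true.sum()             # no axis: sum everything
```

Here a probability of exactly 0.5 falls on the "subtract" side, matching the "less than or equal to 0.5" rule in the question.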
I'm working in Keras with TensorFlow under the hood. I have a deep neural model (predictive autoencoder). I'm doing something somewhat similar to this: https://arxiv.org/abs/1612.00796 -- I'm trying to understand influence of variables in a given layer on the output.
For this I need to find the second derivative (Hessian) of the loss (L) with respect to the output of a particular layer (s), i.e. d^2 L / ds^2.
Diagonal entries would be sufficient. L is a scalar, s is 1 by n.
What I tried first:
dLds = tf.gradients(L, s) # works fine to get first order derivatives
d2Lds2 = tf.gradients(dLds, s) # throws an error
TypeError: Second-order gradient for while loops not supported.
I also tried:
d2Lds2 = tf.hessians(L, s)
ValueError: Computing hessians is currently only supported for one-dimensional tensors. Element number 0 of `xs` has 2 dimensions.
I cannot change the shape of s because it is part of the neural network (the LSTM's state). The first dimension (batch_size) is already set to 1, and I don't think I can get rid of it.
I cannot reshape s because that breaks the flow of the gradients, e.g.:
tf.gradients(L, tf.reduce_sum(s, axis=0))
gives:
[None]
Any ideas on what I can do in this situation?
This is not supported at the moment. See this report.
I'm trying to apply the concept of distillation, basically to train a new smaller network to do the same as the original one but with less computation.
I have the softmax outputs for every sample instead of the logits.
My question is, how is the categorical cross entropy loss function implemented?
Does it take the maximum value of the original labels and multiply it by the corresponding predicted value at the same index, or does it sum over all the logits (one-hot encoding), as the formula says:
As an answer to "Do you happen to know what the epsilon and tf.clip_by_value is doing?",
it ensures that output != 0, because tf.log(0) evaluates to -inf, which would make the loss infinite or NaN.
(I don't have points to comment but thought I'd contribute)
I see that you used the tensorflow tag, so I guess this is the backend you are using?
def categorical_crossentropy(output, target, from_logits=False):
    """Categorical crossentropy between an output tensor and a target tensor.
    # Arguments
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        target: A tensor of the same shape as `output`.
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.
    # Returns
        Output tensor.
    """
This code comes from the Keras source code. Looking directly at the code should answer all your questions :) If you need more info, just ask!
EDIT:
Here is the code that interests you:
    # Note: tf.nn.softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # scale preds so that the class probas of each sample sum to 1
        output /= tf.reduce_sum(output,
                                reduction_indices=len(output.get_shape()) - 1,
                                keep_dims=True)
        # manual computation of crossentropy
        epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
        output = tf.clip_by_value(output, epsilon, 1. - epsilon)
        return - tf.reduce_sum(target * tf.log(output),
                               reduction_indices=len(output.get_shape()) - 1)
If you look at the return, they sum it... :)
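To make the summation concrete, here is a NumPy sketch that mirrors the three steps in the excerpt (renormalise, clip, sum over the class axis). It is not the Keras implementation; the name crossentropy_sketch, the argument order, the EPSILON value of 1e-7, and the sample distributions are assumptions for illustration.

```python
import numpy as np

EPSILON = 1e-7  # assumed value, standing in for Keras's _EPSILON

def crossentropy_sketch(target, output, epsilon=EPSILON):
    """Per-sample cross-entropy between target and predicted distributions."""
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    # Scale predictions so the class probabilities of each sample sum to 1.
    output = output / output.sum(axis=-1, keepdims=True)
    # Clip away exact 0 and 1 so that log() stays finite.
    output = np.clip(output, epsilon, 1.0 - epsilon)
    # Sum over all classes, not just the most likely one.
    return -np.sum(target * np.log(output), axis=-1)

pred = np.array([[0.7, 0.2, 0.1]])
one_hot = np.array([[1.0, 0.0, 0.0]])
soft = np.array([[0.6, 0.3, 0.1]])  # e.g. a teacher network's softmax output

hard_loss = crossentropy_sketch(one_hot, pred)  # only the true class contributes
soft_loss = crossentropy_sketch(soft, pred)     # every class contributes
```

With a one-hot target, the sum collapses to the negative log of the predicted probability for the true class, which is why it can look like only the maximum label is used. With soft targets, as in distillation, every class contributes to the sum.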