Creating BLEU loss method on tensorflow gives "No gradient provided" - python

I need to build a custom loss method based on BLEU. I'm passing my LabelEncoder in the constructor to reverse labels and predictions and calculate the bleu distance.
Here is my Loss class
class CIMCodeSuccessiveLoss(Loss):
    def __init__(self, labelEncoder: LabelEncoder):
        super().__init__()
        self.le = labelEncoder
    def bleu_score(self, true_label, pred_label):
        cim_true_label = self.le.inverse_transform(true_label.numpy())
        cim_pred_label = self.le.inverse_transform(pred_label.numpy())
        bleu_scores = [sentence_bleu(list(one_true_label),
                                     list(one_pred_label),
                                     weights=(0.5, 0.25, 0.125, 0.125)) for one_true_label, one_pred_label in
                       zip(cim_true_label, cim_pred_label)]
        return np.float32(bleu_scores)
    def call(self, y_true, y_pred):
        labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
        bleu = tf.py_function(self.bleu_score, (tf.reshape(y_true, [-1]), labeled_y_pred), tf.float32)
        return tf.reduce_sum(tf.square(1 - bleu))
The bleu_score method calculates the correct scores and returns a NumPy array.
When I try to return the squared sum, I get this error:
raise ValueError(f"No gradients provided for any variable: {variable}.
I'm also providing the model:
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorize_layer(inputs)
x = Embedding(vocab_size, embedding_dim, name="embedding")(x)
x = LSTM(units=32, name="lstm")(x)
outputs = Dense(classes_number, name="classification")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="first_cim_classifier")
model.summary()
# we add early stopping for our model.
early_stopping = EarlyStopping(monitor='loss', patience=2)
model.compile(
loss=CIMCodeSuccessiveLoss(le),
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy", "crossentropy"],
run_eagerly=True)
trained_model = model.fit(np.array(x_train), np.array(y_train), batch_size=64, epochs=10,
validation_data=(np.array(x_val), np.array(y_val)),
callbacks=[early_stopping])
Any help is appreciated. Thanks in advance.

Your loss function calls tf.argmax(y_pred, axis=-1). argmax is not differentiable, so automatic differentiation cannot compute gradients through it. You have to remove this call; for example (depending on your data), you can change the output layer to a softmax and one-hot encode the labels.
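A small NumPy check, not part of the original answer, may help illustrate the point: a tiny change in the logits leaves the argmax output unchanged, so its slope is zero, while a softmax cross-entropy loss reacts to the same change. The helper names and the sample logits are assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, one_hot):
    # mean categorical cross-entropy between softmax(logits) and one-hot labels
    return -np.mean(np.sum(one_hot * np.log(softmax(logits)), axis=-1))

logits = np.array([[2.0, 0.5, -1.0]])
one_hot = np.array([[0.0, 1.0, 0.0]])
eps = 1e-4
bumped = logits + np.array([[0.0, eps, 0.0]])

# argmax is piecewise constant: the small bump does not change it (slope 0)
argmax_slope = (np.argmax(bumped, axis=-1) - np.argmax(logits, axis=-1)) / eps
# cross-entropy changes smoothly under the same bump (non-zero slope)
ce_slope = (cross_entropy(bumped, one_hot) - cross_entropy(logits, one_hot)) / eps
print(argmax_slope, ce_slope)
```

The finite-difference slope of the cross-entropy should be close to the analytic value softmax(logits)[1] - 1, which is what an optimizer would use; the argmax path gives it nothing to work with.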

The issue is that the argmax function is not differentiable, which is a problem when it is part of a loss function:
labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
One way to work around this is to use a differentiable approximation of argmax, similar to the smooth maximum function:
$S_\beta(x) = \frac{\sum_i x_i e^{\beta x_i}}{\sum_j e^{\beta x_j}}$
As β approaches infinity, this approaches the true maximum. For your purposes, β=10 or β=100 should be enough.
In Tensorflow, this could be accomplished as follows:
def differentiable_argmax_approx(x, beta=10, axis=-1):
    # softmax-weighted mean of the 0-based positions along `axis`
    positions = tf.cumsum(tf.ones_like(x), axis=axis) - 1
    return tf.reduce_sum(positions * tf.nn.softmax(beta * x, axis=axis), axis=axis)
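Then change the original line as follows (keep in mind that the cast to tf.int32 is itself not differentiable, so gradients only flow through parts of the loss that consume the float value):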
labeled_y_pred = tf.cast(differentiable_argmax_approx(y_pred, axis=-1), tf.int32)
We can verify the functionality with a simple test case:
beta = 10
x = np.array([1, 2, 3, 10, 4, 5], dtype=np.float64)  # np.float was removed in NumPy 1.24
y = differentiable_argmax_approx(x, beta)
assert np.isclose(x.argmax(), y)
One caveat to this approach: if the maximum value is not unique along the axis that we're applying the function to, the result will be the arithmetic mean of the indices. Providing another test case to illustrate:
beta = 10
x = np.array([1, 2, 10, 3, 10], dtype=np.float64)
y = differentiable_argmax_approx(x, beta)
assert np.isclose(y, 3)
The result is 3 here, because we have two occurrences of the maximum value (10): one at index 2, and the other at index 4. In contrast, the regular argmax function returns the first index of the maximum argument.
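The behaviour described above, including the tie case and the effect of β, can be reproduced without TensorFlow. This is a minimal NumPy sketch; the name soft_argmax and the sample arrays are assumptions for illustration, not part of the answer's code.

```python
import numpy as np

def soft_argmax(x, beta=10.0, axis=-1):
    # softmax weights; subtracting the max keeps exp() from overflowing
    z = beta * (x - np.max(x, axis=axis, keepdims=True))
    w = np.exp(z)
    w /= np.sum(w, axis=axis, keepdims=True)
    # 0-based positions along `axis`, shaped to broadcast against x
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    idx = np.arange(x.shape[axis], dtype=np.float64).reshape(shape)
    return np.sum(idx * w, axis=axis)

unique_max = np.array([1.0, 2.0, 3.0, 10.0, 4.0, 5.0])
tied_max = np.array([1.0, 2.0, 10.0, 3.0, 10.0])
close_values = np.array([3.0, 1.0, 2.9])  # true argmax is 0, runner-up is close

print(soft_argmax(unique_max))               # close to 3, the true argmax
print(soft_argmax(tied_max))                 # close to 3, the mean of indices 2 and 4
print(soft_argmax(close_values, beta=1.0))   # small beta: pulled far away from 0
print(soft_argmax(close_values, beta=100.0)) # large beta: close to 0 again
```

When the top values are close together, a small β blends their indices, so β has to be chosen relative to the typical gap between the largest entries.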
Another improvement would be to move more of the computation into TensorFlow functions. To start, instead of using sklearn's LabelEncoder to apply a mapping in the loss function, you could use a tf.lookup.StaticHashTable to accomplish the same objective with the TensorFlow API. To convert from a LabelEncoder to a tf.lookup.StaticHashTable, you can use the following function:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: int = -1) -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.classes_),
            tf.convert_to_tensor(le.transform(le.classes_))),
        default_value=default_value)
    return static_hash_table
Or, for your purposes, since you're applying the inverse mapping (to go from integers -> string), you may want to swap the key and the values:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: str = "") -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.transform(le.classes_)),
            tf.convert_to_tensor(le.classes_)),
        default_value=default_value)
    return static_hash_table
and, in the initializer:
def __init__(self, labelEncoder: LabelEncoder):
    super().__init__()
    self.table = convert_label_encoder_to_static_hash_table(labelEncoder)
By operating on tf.Tensor objects, you can utilize tf.map_fn instead of using a for-loop and converting to a numpy array/lists - your loss function would become:
def bleu_score(self, true_label, pred_label):
    # the table keys are typically int64, so cast the labels before the lookup
    cim_true_label = self.table[tf.cast(true_label, tf.int64)]
    cim_pred_label = self.table[tf.cast(pred_label, tf.int64)]
    # sentence_bleu is plain Python, so the strings are decoded eagerly
    bleu_scores = tf.map_fn(
        lambda x: tf.constant(sentence_bleu([x[0].numpy().decode()], [x[1].numpy().decode()],
                                            weights=(0.5, 0.25, 0.125, 0.125)), tf.float32),
        elems=(cim_true_label, cim_pred_label),
        fn_output_signature=tf.float32)
    return bleu_scores
This moves the label lookup into TensorFlow. Note, however, that sentence_bleu is still a native Python function, so the score computation still has to run eagerly (or inside tf.py_function), and BLEU itself provides no gradient with respect to the model outputs.

Related

Keras Neural Net Loss Function

I've encountered a problem while writing a Siamese net. The net takes as input two vectors that represent two pieces of text. The vectors are padded, and their length differs between batches (batch 1: length 32, batch 2: length 64, and so on).
# model definition
def create_model(vocab_size=512, d_model=128):
    def normalize(x):
        norm = tf.norm(x, axis=-1, keepdims=True)
        return tf.divide(x, norm)
    component = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, d_model),
        tf.keras.layers.LSTM(d_model),
        tf.keras.layers.Lambda(lambda x: tf.reduce_mean(x, axis=1)),
        tf.keras.layers.Lambda(normalize),
    ])
    # due to the variability in text, input shape differs with respect to batch
    inputs = [tf.keras.Input(shape=(None,)) for _ in range(2)]
    outputs = tf.tuple([component(ins) for ins in inputs])
    return tf.keras.Model(inputs=inputs, outputs=outputs)
# loss function
class MyLoss(tf.keras.losses.Loss):
    def __init__(self):
        super().__init__(name='TripletLoss')
    def call(self, y_true, y_pred):
        # >>> HERE IS THE PROBLEM, y_pred has a different shape than I'd expect,
        # its shape is (batch_size,) instead of (2, batch_size)
        l, r = y_pred
        # compute and return loss
        return loss
When I compile with loss=MyLoss() and call Model#fit, the parameter passed to MyLoss#call is a projection onto the first coordinate of the model prediction. That is, model.predict(z) returns [x, y], where x and y are vectors with length equal to the batch size. I expected y_pred in Loss#call to have exactly that value, [x, y], but it equals the first vector of the list, x. Furthermore, looking at the call stack, I saw that y_pred still has the expected value ([x, y]) before it is passed to MyLoss#call, and that it changes to x inside the body of Keras' Loss.__call__.
I tried to reshape the input, but other problems arose.

Tensorflow AutoGraph Polynomial Model With Multiple Outputs

I have a tensorflow model whose outputs correspond to coefficients of multiple polynomials. Note that my model actually has another set of outputs (multi-output), but I've mocked this below just by returning the input in addition to the polynomial coefficients.
I'm having a lot of trouble during the training of the model, related to tensor shapes. I've verified that the model is able to predict on sample inputs, and that the loss function works on sample outputs. But during training, it immediately throws an error (see below).
For every input, the model takes in a fixed embedding-size input, and outputs coefficients for 2 polynomials of degree 2. For example, the output on a single input can look like:
[array([[[1, 2, 3],
[ 4, 5, 6]]]),
[...]]
corresponding to polynomials [1*x^2+2*x+3, 4*x^2+5*x+6]. Note that I've hidden the second output.
I noticed that tf.math.polyval requires a list of coefficients, making it wonky with AutoGrad. So, I implemented my own version of Horner's algorithm with pure tensors.
import numpy as np
import tensorflow as tf
import logging
import tensorflow.keras as K
@tf.function
def tensor_polyval(coeffs, x):
    """
    Calculates polynomial scalars from tensor of polynomial coefficients
    Tensorflow tf.math.polyval requires a list coeff, which isn't compatible with autograd
    # Inputs:
    - coeffs (NxD Tensor): each row of coeffs corresponds to r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D-1]
    - x: Scalar!
    # Output:
    - r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D-1] for row in coeffs
    """
    p = coeffs[:, 0]
    for i in range(1, coeffs.shape[1]):
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(p, tf.TensorShape([None]))])
        c = coeffs[:, i]
        p = tf.add(c, tf.multiply(x, p))
    return p
@tf.function
def coeffs_to_poly(coeffs, n):
    # Converts a NxD array of coefficients to N evaluated polynomials at x=n
    return tensor_polyval(coeffs, tf.convert_to_tensor(n))
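For reference, the Horner recurrence that tensor_polyval applies row by row can be checked in plain NumPy. This is a sketch with a hypothetical helper name (horner_rows), using the two example polynomials from the question evaluated at x=5.

```python
import numpy as np

def horner_rows(coeffs, x):
    # coeffs: (N, D) array, each row holds coefficients from highest to lowest degree
    p = coeffs[:, 0].copy()
    for i in range(1, coeffs.shape[1]):
        # p <- p * x + next coefficient, applied to all N rows at once
        p = coeffs[:, i] + x * p
    return p

coeffs = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
values = horner_rows(coeffs, 5.0)
# 1*25 + 2*5 + 3 = 38 and 4*25 + 5*5 + 6 = 131
print(values)
```

Note that the helper expects a 2-D (N, D) array; passing a single row of shape (D,) fails on the column slice, which is the same kind of shape mismatch reported in the traceback below.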
Now here's a super-simplified example of my model, loss function and training routine:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim,))
    x = K.layers.Reshape((embedDim,))(input)
    aCoeffs = K.layers.Dense((polyDim+1)*terms, activation='tanh')(x)
    aCoeffs = K.layers.Reshape((terms, polyDim+1))(aCoeffs)
    model = K.Model(inputs=input, outputs=[aCoeffs, input])
    return model
def get_random_batch(batch, embedDim, dtype='float64'):
    x = np.random.randn(batch, embedDim).astype(dtype)
    y = np.array([1. for i in range(batch)]).astype(dtype)
    return [x,
            y]
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
                                                      tf.constant(5, dtype=dataType)),
                           y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
embedDim=8
polyDim=2
terms=2
dataType = 'float64'
tf.keras.backend.set_floatx(dataType)
model = model_init(embedDim, polyDim, terms)
XTrain, yTrain = get_random_batch(batch=128,
embedDim=embedDim)
# Init Model
LR = 0.001
loss = test_loss
epochs = 5
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=loss)
hist = model.fit(XTrain,
yTrain,
batch_size=4,
epochs=epochs,
max_queue_size=10, workers=2, use_multiprocessing=True)
The error I get is related to the tensor_polyval function:
<ipython-input-15-f96bd099fe08>:3 test_loss *
an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
<ipython-input-5-7205207d12fd>:23 coeffs_to_poly *
return tensor_polyval(coeffs, tf.convert_to_tensor(n))
<ipython-input-5-7205207d12fd>:13 tensor_polyval *
p = coeffs[:, 0]
...
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_DOUBLE, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2](coeffs, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [3], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.
What's frustrating is that I'm perfectly able to predict with the model on sample inputs and also calculate a sample loss:
test_loss(yTrain[0:5],
model.predict(XTrain[0:5]),
dtype=dataType)
which runs just fine.
In the test_loss function, I'm only referring to the first output, via y_p[0]. It calculates the value of the polynomials at n=5 and then outputs an average over everything (again, this is just mocked code). As I understand it, y_p[1] would refer to the second output (in this case, a copy of the input). I would expect tf.vectorized_map to operate across all outputs of the model batch, but it seems to be slicing one extra dimension.
I noticed that the code does train if I remove the extra output (input) from the model, making it single-output, and change y_p[0] to y_p in test_loss. I have no idea why it breaks when adding the extra output, as my understanding is that tf.vectorized_map acts separately on each element of the list y_pred.
If we need the single loss function to receive multiple outputs altogether, perhaps we can concatenate them together to form one output.
In this case:
Changes to the model structure, here we pack the outputs:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim, ))
    x = K.layers.Reshape((embedDim, ))(input)
    aCoeffs = K.layers.Dense((polyDim + 1) * terms, activation='tanh')(x)
    # pack the two outputs, add flatten layers if their shapes are not batch*K
    outputs = K.layers.Concatenate()([aCoeffs, input])
    model = K.Model(inputs=input, outputs=outputs)
    model.summary()
    return model
Changes to the loss function, here we unpack the outputs:
# the loss function needs to know these
polyDim = 2
terms = 2
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for flattened outputs."""
    # unpack multiple outputs
    offset = (polyDim + 1) * terms
    aCoeffs = tf.reshape(y_pred[:, :offset], [-1, terms, polyDim + 1])
    inputs = y_pred[:, offset:]
    print(aCoeffs, inputs)
    # do something with the two unpacked outputs, like below
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        aCoeffs)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
Notice that the loss function relies on knowing the original shapes of the outputs in order to restore them. Consider subclassing tf.keras.losses.Loss.
P.S. For anyone who simply needs different losses for the multiple outputs:
Define loss functions for the two outputs.
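The pack and unpack steps are plain slicing and reshaping, so the round trip can be sanity-checked in NumPy before wiring it into the model. The sizes and variable names below are assumptions chosen to match the defaults in the question.

```python
import numpy as np

batch, terms, poly_dim, embed_dim = 4, 2, 2, 8
rng = np.random.default_rng(0)
a_coeffs = rng.normal(size=(batch, terms, poly_dim + 1))
inputs = rng.normal(size=(batch, embed_dim))

# pack: flatten the coefficients per sample and append the second output
packed = np.concatenate([a_coeffs.reshape(batch, -1), inputs], axis=-1)

# unpack: the first (poly_dim + 1) * terms columns are the coefficients
offset = (poly_dim + 1) * terms
a_restored = packed[:, :offset].reshape(-1, terms, poly_dim + 1)
inputs_restored = packed[:, offset:]
print(packed.shape, a_restored.shape, inputs_restored.shape)
```

If the restored arrays match the originals, the same offset and reshape arguments should be safe to use inside the loss.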
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 1
    (Only changed y_p[0] to y_p)"""
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
@tf.function
def dummy_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 2 i.e. the input, for debugging
    Better use 0 instead of 1.2345"""
    return tf.constant(1.2345, dataType)
Change to model.compile:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=[test_loss, dummy_loss])

expand tensor in tensorflow

In TensorFlow, I want to expand a tensor using the Taylor series of sin(x) with a given number of approximation terms. I tried this on a grayscale image of shape (32,32) and it works fine. Now I am having trouble doing the same for an RGB image of shape (32,32,3): it doesn't give me the correct array. Can anyone show me a possible way of doing this in TensorFlow?
my attempt:
Here is the Taylor expansion of sin(x) at x=0 with three terms: x - x**3/6 + x**5/120.
from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
x= X_train[1,:,:,1]
k= 3
func = 'sin(x)'
new_x = np.zeros((x.shape[0], x.shape[1]*k))
new_x = new_x.astype('float32')
nn = 0
for i in range(x.shape[1]):
    col_d = x[:,i].ravel()
    new_x[:,nn] = col_d
    if n_terms > 0:
        for j in range(1,k):
            if func == 'cos(x)':
                new_x[:,nn+j] = new_x[:,nn+j-1]
I think I could do this more efficiently with TensorFlow, but it is not obvious to me how. Can anyone suggest a way to make this work?
update:
For a 2-dim array, col_d = x[:,i].ravel() is a flattened pixel vector. Similarly, we could reshape a 3-dim array to 2 dims with x.transpose(0,1,2).reshape(x.shape[1],-1), so inside the for loop it could be x[:,i].transpose(0,1,2).reshape(x.shape[1],-1), but this is still not correct. I think TensorFlow might have a better way of doing this.
goal:
In the Taylor series of sin(x), x is a tensor. If we want only 2 or 3 approximation terms of the series for each tensor, I want to concatenate them in a new tensor. How can this be done efficiently in TensorFlow?
new_x = np.zeros((x.shape[0], x.shape[1]*n_terms))
This line makes little sense: why allocate space for 96 elements for 3 Taylor expansion terms?
(new_x[:, 3:] == 0.0).all()  # check: evaluates to True
For a pixelwise Taylor expansion with n terms:
import math
def sin_exp_step(x, i):
    # i-th term of the series of sin(x) at 0: (-1)^i * x^(2i+1) / (2i+1)!
    c1 = 2 * i + 1
    c2 = (-1) ** i / math.factorial(c1)
    t = c2 * (x ** c1)
    return t
# validate at 45 degrees; five terms are needed to get within 1e-8
n_terms = 5
x = 45.0
x = (np.pi / 180.0) * x
y = np.sin(x)
approx_y = 0
for i in range(n_terms):
    approx_y += sin_exp_step(x, i)
abs(approx_y - y) < 1e-8
# cast to float so that x ** c1 does not overflow integer pixel values
x = X_train[1,:,:,:].astype('float32')
n_terms = 3
func = 'sin(x)'
new_x = np.zeros((*x.shape, n_terms))
for i in range(0, n_terms):
    if func == 'sin(x)': # sin(x)
        new_x[..., i] += sin_exp_step(x, i)
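The same idea can be written without the explicit loop over a preallocated array. The sketch below stacks the terms along a new last axis; the function name sin_taylor_terms and the random stand-in image (values scaled to [0, 1]) are assumptions for illustration.

```python
import math
import numpy as np

def sin_taylor_terms(x, n_terms):
    # term i is (-1)^i * x^(2i+1) / (2i+1)!, stacked along a new last axis
    terms = [(-1) ** i * x ** (2 * i + 1) / math.factorial(2 * i + 1)
             for i in range(n_terms)]
    return np.stack(terms, axis=-1)

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32, 3))   # stand-in for a scaled RGB image
expanded = sin_taylor_terms(image, n_terms=3)      # shape (32, 32, 3, 3)
max_err = np.abs(expanded.sum(axis=-1) - np.sin(image)).max()
print(expanded.shape, max_err)
```

Summing over the last axis recovers the truncated series, so the error against np.sin gives a quick check that the terms are in the right order and have the right signs. The approximation is only good for small inputs, which is why the stand-in image is scaled to [0, 1] rather than [0, 255].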
Numerical series approximations like this are usually avoided because they are computationally expensive (for example, the factorial) and less stable. Gradient-based optimization is usually the better choice; higher-order methods such as BFGS and L-BFGS approximate the Hessian matrix (second-order derivatives), while optimizers such as Adam and SGD are usually sufficient at much lower computational cost. Using a neural network, we might be able to find a much better expansion.
Tensorflow solution for n-terms expansion
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Input, LocallyConnected2D
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = tf.constant(x_train, dtype=tf.float32)
x_test = tf.constant(x_test, dtype=tf.float32)
def expansion_approx_of(func):
    def reconstruction_loss(y_true, y_pred):
        loss = (y_pred - func(y_true)) ** 2
        loss = 0.5 * K.mean(loss)
        return loss
    return reconstruction_loss
class Expansion2D(LocallyConnected2D): # n-terms expansion layer
    def __init__(self, i_shape, n_terms, kernel_size=(1, 1), *args, **kwargs):
        if len(i_shape) != 3:
            raise ValueError('...')
        self.i_shape = i_shape
        self.n_terms = n_terms
        filters = self.n_terms * self.i_shape[-1]
        super(Expansion2D, self).__init__(filters=filters, kernel_size=kernel_size,
                                          use_bias=False, *args, **kwargs)
    def call(self, inputs):
        shape = (-1, self.i_shape[0], self.i_shape[1], self.i_shape[-1], self.n_terms)
        out = super().call(inputs)
        expansion = tf.reshape(out, shape)
        out = tf.math.reduce_sum(expansion, axis=-1)
        return out, expansion
inputs = Input(shape=(32, 32, 3))
# expansion: might be a taylor expansion or something better.
out, expansion = Expansion2D(i_shape=(32, 32, 3), n_terms=3)(inputs)
model = Model(inputs, [out, expansion])
opt = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999)
loss = expansion_approx_of(K.sin)
model.compile(optimizer=opt, loss=[loss])
model.summary()
model.fit(x_train, x_train, batch_size=1563, epochs=100)
x_pred, x_exp = model.predict_on_batch(x_test[:32])
print((x_exp[0].sum(axis=-1) == x_pred[0]).all())
err = abs(x_pred - np.sin(x_test[0])).mean()
print(err)
Put three expansion terms into a tensor at axis=1
x = tf.ones([8, 32, 32, 3], tf.float32) * 0.5 # example batchsize=8, imageshape=[32, 32, 3]
x = tf.stack([x, - (1/6) * tf.math.pow(x, 3), (1/120) * tf.math.pow(x, 5)], axis=1) # expansion of three terms of sin(x), [8, 3, 32, 32, 3]
If you use the tf.keras Functional API or Sequential API, you can wrap this in a custom Keras layer. The relevant ops are tf.math.pow and tf.stack.
Edit: In the first version of this answer I recommended tf.keras.layers.Lambda, but it might not work with tf.math.pow or tf.stack (I haven't tried), so a custom Keras layer is the safer choice.
I think you can do this for a 1D tensor as:
def expend_func(x):
    p1 = x
    p2 = x - ((x**3)/6)
    new_x = K.concatenate([p1, p2], axis=1)
    return new_x
Note that x is your 1D tensor and new_x holds two terms. If you need new_x with three terms, modify expend_func accordingly. For a 2D tensor you should use tf.stack(), which is not the most elegant way but might help.

Class Weight not supported for 3+ dimensional targets - Python Tensorflow [duplicate]

Here's the code I'm working with (pulled from Kaggle mostly):
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
...
outputs = Conv2D(4, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='dice', metrics=[mean_iou])
results = model.fit(X_train, Y_train, validation_split=0.1, batch_size=8, epochs=30, class_weight=class_weights)
I have 4 classes that are very imbalanced. Class A equals 70%, class B = 15%, class C = 10%, and class D = 5%. However, I care most about class D. So I did the following type of calculations: D_weight = A/D = 70/5 = 14 and so on for the weight for class B and A. (if there are better methods to select these weights, then feel free)
In the last line, I'm trying to properly set class_weights and I'm doing it as so: class_weights = {0: 1.0, 1: 6, 2: 7, 3: 14}.
However, when I do this, I get the following error.
class_weight not supported for 3+ dimensional targets.
Is it possible that I add a dense layer after the last layer and just use it as a dummy layer so I can pass the class_weights and then only use the output of the last conv2d layer to do the prediction?
If this is not possible, how would I modify the loss function (I'm aware of this post, however, just passing in the weights in to the loss function won't cut it, because the loss function is called separately for each class) ? Currently, I'm using the following loss function:
def dice_coef(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
def bce_dice_loss(y_true, y_pred):
    return 0.5 * binary_crossentropy(y_true, y_pred) - dice_coef(y_true, y_pred)
But I don't see any way in which I can input class weights. If someone wants the full working code see this post. But remember to change the final conv2d layer's num classes to 4 instead of 1.
You can always apply the weights yourself.
The originalLossFunc below you can import from keras.losses.
The weightsList is your list with the weights ordered by class.
def weightedLoss(originalLossFunc, weightsList):
    def lossFunc(true, pred):
        axis = -1 #if channels last
        #axis= 1 #if channels first
        #argmax returns the index of the element with the greatest value
        #done in the class axis, it returns the class index
        classSelectors = K.argmax(true, axis=axis)
        #if your loss is sparse, use only true as classSelectors
        #considering weights are ordered by class, for each class
        #true(1) if the class index is equal to the weight index
        classSelectors = [K.equal(i, classSelectors) for i in range(len(weightsList))]
        #casting boolean to float for calculations
        #each tensor in the list contains 1 where ground true class is equal to its index
        #if you sum all these, you will get a tensor full of ones.
        classSelectors = [K.cast(x, K.floatx()) for x in classSelectors]
        #for each of the selections above, multiply their respective weight
        weights = [sel * w for sel,w in zip(classSelectors, weightsList)]
        #sums all the selections
        #result is a tensor with the respective weight for each element in predictions
        weightMultiplier = weights[0]
        for i in range(1, len(weights)):
            weightMultiplier = weightMultiplier + weights[i]
        #make sure your originalLossFunc only collapses the class axis
        #you need the other axes intact to multiply the weights tensor
        loss = originalLossFunc(true,pred)
        loss = loss * weightMultiplier
        return loss
    return lossFunc
For using this in compile:
model.compile(loss= weightedLoss(keras.losses.categorical_crossentropy, weights),
optimizer=..., ...)
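To see what the weight multiplier looks like for a small target, the selection logic can be mirrored in NumPy. This is only a sketch: the helper name class_weight_map, the 2x2 label map, and the constant placeholder loss are assumptions, and the weights are the ones from the question.

```python
import numpy as np

def class_weight_map(y_true_one_hot, weights_list):
    # pick the weight of the ground-truth class at every position
    class_idx = np.argmax(y_true_one_hot, axis=-1)
    return np.asarray(weights_list, dtype=np.float64)[class_idx]

weights_list = [1.0, 6.0, 7.0, 14.0]
labels = np.array([[0, 3], [1, 2]])              # a 2x2 "image" of class ids
y_true = np.eye(4)[labels]                       # one-hot, shape (2, 2, 4)
per_pixel_loss = np.full(labels.shape, 0.5)      # placeholder loss values
weighted = per_pixel_loss * class_weight_map(y_true, weights_list)
print(weighted)
```

Each position ends up scaled by the weight of its true class, which is the same effect the list of selectors and the running sum achieve in the Keras version.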
Changing the class balance directly on the input data
You can change the balance of the input samples too.
For instance, if you have 5 samples from class 1 and 10 samples from class 2, pass the samples for class 1 twice in the input arrays.
Using the sample_weight argument.
Instead of working "by class", you can also work "by sample".
Create an array of weights for each sample in your input array: len(x_train) == len(weights)
And fit passing this array to the sample_weight argument.
(If it's fit_generator, the generator will have to return the weights along with the train/true pairs: return/yield inputs, targets, weights)
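For simple integer class labels, a per-sample weight array can be built from a class-weight dictionary as in the sketch below. The label array is hypothetical, and the dictionary reuses the weights from the question.

```python
import numpy as np

class_weights = {0: 1.0, 1: 6.0, 2: 7.0, 3: 14.0}
y_labels = np.array([0, 0, 3, 1, 2, 0])          # hypothetical integer labels

# one weight per sample, looked up from the class of that sample
sample_weights = np.array([class_weights[int(c)] for c in y_labels])
print(sample_weights)
```

The resulting array has the same length as the label array and would be passed to fit through the sample_weight argument.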

Custom loss function implementation

I'm trying to implement a new loss function of my own.
When I tried to debug it (or print inside it), I noticed that it is called only once, when the model is created.
How can I find out what y_pred and y_true contain (shapes, data, etc.) if I cannot step into this function while fitting the model?
I wrote this loss function:
def my_loss(y_true, y_pred):
    # run over the sequence, jump by 3
    # calc the label
    # if the label incorrect punish
    y_pred = K.reshape(y_pred, (1, 88, 3))
    y_pred = K.argmax(y_pred, axis=1)
    zero_count = K.sum(K.clip(y_pred, 0, 0))
    one_count = K.sum(K.clip(y_pred, 1, 1))
    two_count = K.sum(K.clip(y_pred, 2, 2))
    zero_punish = 1 - zero_count / K.count_params(y_true)
    one_punish = 1- one_count/ K.count_params(y_true)
    two_punish = 1- two_count/ K.count_params(y_true)
    false_arr = K.not_equal(y_true, y_pred)
    mask0 = K.equal(y_true, K.zeros_like(y_pred))
    mask0_miss = K.dot(false_arr, mask0) * zero_punish
    mask1 = K.equal(y_true, K.ones_like(y_pred))
    mask1_miss = K.dot(false_arr, mask1) * one_punish
    mask2 = K.equal(y_true, K.zeros_like(y_pred)+2)
    mask2_miss = K.dot(false_arr, mask2) * two_punish
    return K.sum(mask0_miss) + K.sum(mask1_miss) + K.sum(mask2_miss)
It fails on:
theano.gof.fg.MissingInputError: A variable that is an input to the graph was
neither provided as an input to the function nor given a value. A chain of
variables leading from this input to an output is [/dense_1_target, Shape.0].
This chain may not be unique
Backtrace when the variable is created:
How can I fix it?
You have to understand that Theano is a symbolic language. For example, when we define the following loss function in Keras:
def myLossFn(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
Theano is just making a symbolic rule in a computational graph, which would be executed when it gets values i.e. when you train the model with some mini-batches.
As far as your question on how to debug your model goes, you can use theano.function for that. Now, you want to know if your loss calculation is correct. You do the following.
You can implement the python/numpy version of your loss function. Pass two random vectors to your numpy-loss-function and get a number. To verify if theano gives nearly identical result, define something as follows.
import numpy as np
import theano
from theano import tensor as T
from keras import backend as K
# float64 vectors, matching the 1-D arrays created below
Y_true = T.dvector('Y_true')
Y_pred = T.dvector('Y_pred')
out = K.mean(K.abs(Y_pred - Y_true), axis=-1)
f = theano.function([Y_true, Y_pred], out)
# creating some values
y_true = np.random.random((10,))
y_pred = np.random.random((10,))
numpy_loss_result = np.mean(np.abs(y_true-y_pred))
theano_loss_result = f(y_true, y_pred)
# check if both are close enough
print(numpy_loss_result - theano_loss_result)  # magnitude should be less than 1e-5
Basically, theano.function is a way to put values and evaluate those symbolic expressions. I hope this helps.
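The NumPy side of that comparison could look like the sketch below. The helper name numpy_mae and the sample values are assumptions; it mirrors the mean absolute error over the last axis used in myLossFn, with inputs small enough to verify by hand.

```python
import numpy as np

def numpy_mae(y_true, y_pred):
    # mean absolute error over the last axis, mirroring K.mean(K.abs(...), axis=-1)
    return np.mean(np.abs(y_pred - y_true), axis=-1)

y_true = np.array([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]])
y_pred = np.array([[0.5, 1.0, 1.0], [1.0, 0.0, 4.0]])
# row 0: (0.5 + 0 + 1) / 3 = 0.5, row 1: (0 + 1 + 3) / 3 = 4/3
print(numpy_mae(y_true, y_pred))
```

Feeding the same arrays to the compiled symbolic function and comparing against these values is a quick way to confirm that the backend expression computes what you intended.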
