I'm trying to implement a custom loss function for my neural network. If tensors were NumPy arrays instead, it would look like this:
def custom_loss(y_true, y_pred):
    activated = y_pred[y_true > 1]
    return np.abs(activated.mean() - activated.std()) / activated.std()
The y's have shape (batch_size, 1); that is, there is one scalar output per input row.
Note: this post (Converting Tensor to np.array using K.eval() in Keras returns InvalidArgumentError) pointed me in an initial direction.
Edit:
Here is a reproducible setup to which I'm trying to apply the custom loss function:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
X = np.random.normal(0, 1, (256, 5))
Y = np.random.normal(0, 1, (256, 1))
model = keras.Sequential([
    layers.Dense(1),
])
model.compile(optimizer='adam', loss=custom_loss)
model.fit(X, Y)
If custom_loss is defined as above, the .fit() call on the last line throws AttributeError: 'Tensor' object has no attribute 'mean'.
It's a simple fix. You can write your custom loss with TensorFlow ops as follows:
def custom_loss(y_true, y_pred):
    activated = y_pred[y_true > 1]
    return tf.math.abs(tf.reduce_mean(activated) -
                       tf.math.reduce_std(activated)) / tf.math.reduce_std(activated)
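Or, if you want to use tf.boolean_mask(tensor, mask), the mask has to be a boolean tensor whose shape matches the leading dimensions of tensor. Note that tf.where(y_true > 1) is not a mask: it returns the integer indices of the True entries as a 2D tensor of shape (N, 2), so it should not be passed as the mask. The simplest option in your case is to flatten both tensors to 1D and pass the boolean comparison itself:
def custom_loss(y_true, y_pred):
    # flatten (batch_size, 1) -> (batch_size,) so tensor and mask are both 1D
    mask = tf.reshape(y_true > 1, [-1])
    activated = tf.boolean_mask(tf.reshape(y_pred, [-1]), mask)
    std = tf.math.reduce_std(activated)
    return tf.math.abs(tf.reduce_mean(activated) - std) / std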
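A quick way to check that a TensorFlow version matches the NumPy definition from the question is to evaluate both on the same data. This is a minimal sketch: the random data, the seed and the names numpy_loss and tf_loss are made up for illustration. Both np.std and tf.math.reduce_std compute the population standard deviation (ddof=0), so the two values should agree up to float32 rounding. If no target in a batch exceeds 1, activated is empty and the loss becomes NaN, so keep batches large enough.

```python
import numpy as np
import tensorflow as tf

def numpy_loss(y_true, y_pred):
    # reference implementation from the question (NumPy arrays only)
    activated = y_pred[y_true > 1]
    return np.abs(activated.mean() - activated.std()) / activated.std()

def tf_loss(y_true, y_pred):
    # same computation with TensorFlow ops, so it works on tensors and is differentiable
    activated = tf.boolean_mask(y_pred, y_true > 1)
    std = tf.math.reduce_std(activated)
    return tf.math.abs(tf.reduce_mean(activated) - std) / std

rng = np.random.default_rng(0)
y_true = rng.normal(0, 2, (256, 1)).astype(np.float32)  # wide spread so some targets exceed 1
y_pred = rng.normal(0, 1, (256, 1)).astype(np.float32)

np_value = float(numpy_loss(y_true, y_pred))
tf_value = float(tf_loss(tf.constant(y_true), tf.constant(y_pred)))
print(np_value, tf_value)
```

With the reproducible setup from the question, the TensorFlow version can then be passed directly, e.g. model.compile(optimizer='adam', loss=tf_loss).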
Have you tried writing it in TensorFlow and run into gradient problems, or are you asking how to write it in TensorFlow at all? Don't worry, I won't give you a classic toxic SO response!
I would try something like this (not tested, but it seems to be on the right track):
def custom_loss(y_true, y_pred):
    # y_true > 1 is a boolean mask with the same shape as y_pred
    activated = tf.boolean_mask(y_pred, y_true > 1)
    return tf.math.abs(tf.reduce_mean(activated) - tf.math.reduce_std(activated)) / tf.math.reduce_std(activated)
You may need to play around with the dimensions, since all of those functions accept an axis argument that specifies which dimensions to work on.
Also, you will lose the loss function when you save the model unless you subclass tf.keras.losses.Loss. That may be more detail than you are looking for, but if you have problems saving and loading the model, let me know.
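A minimal sketch of the subclassing approach, assuming tf.keras; the class name MeanStdRatioLoss and the threshold parameter are hypothetical. Implementing get_config lets Keras serialize the loss's settings. When loading, the class still has to be supplied, for example via custom_objects={"MeanStdRatioLoss": MeanStdRatioLoss} in tf.keras.models.load_model.

```python
import tensorflow as tf

class MeanStdRatioLoss(tf.keras.losses.Loss):
    """|mean - std| / std of the predictions whose target exceeds a threshold."""

    def __init__(self, threshold=1.0, name="mean_std_ratio_loss", **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold

    def call(self, y_true, y_pred):
        # select predictions whose target is above the threshold (returns a 1D tensor)
        activated = tf.boolean_mask(y_pred, y_true > self.threshold)
        std = tf.math.reduce_std(activated)
        return tf.math.abs(tf.reduce_mean(activated) - std) / std

    def get_config(self):
        # include threshold so the loss can be rebuilt with from_config
        config = super().get_config()
        config.update({"threshold": self.threshold})
        return config

loss_fn = MeanStdRatioLoss()
y_true = tf.constant([[2.0], [3.0], [0.0]])
y_pred = tf.constant([[1.0], [3.0], [5.0]])
value = float(loss_fn(y_true, y_pred))  # targets > 1 select [1, 3]: mean 2, std 1
restored = MeanStdRatioLoss.from_config(loss_fn.get_config())
```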
Related
I am trying to write a custom loss function for triplet loss (using Keras), which takes 3 arguments: anchor, positive and negative. The triplets are generated using a GRU layer, and the arguments for model.fit are provided through data generators.
The problem I am facing occurs during training:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array.
This error may indicate that you're trying to pass a symbolic value to a NumPy
call, which is not supported. Or, you may be trying to pass Keras symbolic
inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically
converting the API call to a lambda layer in the Functional Model.
Implementation of the loss function:
def batch_hard_triplet_loss(self, anchor_embeddings, pos_embeddings, neg_embeddings, margin):
    def loss(y_true, y_pred):
        # print(anchor_embeddings)
        # print(pos_embeddings)
        # print(neg_embeddings)
        # distance between the anchor and the positive
        pos_dist = K.sum(K.square(anchor_embeddings - pos_embeddings), axis=-1)
        max_pos_dist = K.max(pos_dist)
        # distance between the anchor and the negative (hardest negative = smallest distance)
        neg_dist = K.sum(K.square(anchor_embeddings - neg_embeddings), axis=-1)
        min_neg_dist = K.min(neg_dist)
        # compute loss
        basic_loss = max_pos_dist - min_neg_dist + margin
        tr_loss = K.maximum(basic_loss, 0.0)
        return tr_loss
    # return triplet_loss
    return loss
Could this be because Keras expects an array as the returned loss, while I am returning a scalar value?
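A scalar return value is generally accepted by Keras, so that is probably not the issue. A more likely culprit is that the loss closes over symbolic tensors (anchor_embeddings, pos_embeddings, neg_embeddings) instead of working only on y_true and y_pred. One common workaround, not taken from this thread, is to have the model output the three embeddings concatenated along the last axis and split them inside the loss. The sketch below assumes that layout; make_batch_hard_triplet_loss and embed_dim are hypothetical names.

```python
import tensorflow as tf

def make_batch_hard_triplet_loss(embed_dim, margin=0.2):
    """Batch-hard triplet loss computed only from y_pred (assumed layout below)."""
    def loss(y_true, y_pred):
        # assumed layout: y_pred = concat([anchor, positive, negative], axis=-1)
        anchor = y_pred[:, :embed_dim]
        positive = y_pred[:, embed_dim:2 * embed_dim]
        negative = y_pred[:, 2 * embed_dim:]
        pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
        neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
        # hardest positive (largest distance) vs hardest negative (smallest distance)
        basic_loss = tf.reduce_max(pos_dist) - tf.reduce_min(neg_dist) + margin
        return tf.maximum(basic_loss, 0.0)  # y_true is ignored
    return loss

loss_fn = make_batch_hard_triplet_loss(embed_dim=1, margin=0.2)
y_pred = tf.constant([[0.0, 1.0, 3.0], [0.0, 2.0, 1.0]])
print(float(loss_fn(None, y_pred)))
```

With this layout, the model's final layer would be something like tf.keras.layers.Concatenate()([anchor_emb, pos_emb, neg_emb]), and the generator can yield a dummy y_true.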
After the introduction of TensorFlow 2.0, the SciPy interface (tf.contrib.opt.ScipyOptimizerInterface) was removed. However, I would still like to use the SciPy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a Keras Sequential model). For the optimizer to work, it requires as input a function fun(x0), with x0 being an array of shape (n,). Therefore, the first step is to "flatten" the weight matrices to obtain a vector with the required shape. To this end, I modified the code provided by https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/, which provides a function factory meant to create such a function fun(x0). However, the code does not seem to work: the loss does not decrease. I would be really grateful if someone could help me work this out.
Here is the code I am using:
func = function_factory(model, loss_function, x_u_train, u_train)
# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)
# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')
def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)
def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.
    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss_f [in]: a function with signature loss_value = loss_f(x_u_train, u_train, model).
        x_u_train [in]: the input part of training data.
        u_train [in]: the output part of training data.
    Returns:
        A function that has a signature of:
            loss_value = f(model_parameters).
    """
    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)
    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare required information first
    count = 0
    idx = []  # stitch indices
    part = []  # partition indices
    for i, shape in enumerate(shapes):
        n = np.prod(shape)
        idx.append(tf.reshape(tf.range(count, count+n, dtype=tf.int32), shape))
        part.extend([i]*n)
        count += n
    part = tf.constant(part)
    def assign_new_model_parameters(params_1d):
        """A function updating the model's parameters with a 1D tf.Tensor.
        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))
    # now create a function that will be returned by this factory
    def f(params_1d):
        """
        This function is created by function_factory.
        Args:
            params_1d [in]: a 1D tf.Tensor.
        Returns:
            A scalar loss.
        """
        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)
        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)
        return loss_value
    # store this information as members so we can use it outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters
    return f
Here, model is a tf.keras.Sequential object.
Thank you in advance for any help!
When moving from TF1 to TF2 I ran into the same question, and after a bit of experimenting I found the solution below, which shows how to connect a function decorated with tf.function to a SciPy optimizer. The important changes compared to the question are:
As mentioned by Ives, SciPy's L-BFGS needs both the function value and the gradient, so you need to provide a function that returns both and then set jac=True.
SciPy's L-BFGS is a Fortran routine that expects the interface to provide np.float64 arrays, while TensorFlow's tf.function uses tf.float32, so inputs and outputs have to be cast.
Below is an example of how this can be done for a toy problem.
import tensorflow as tf
import numpy as np
import scipy.optimize as sopt
def model(x):
    return tf.reduce_sum(tf.square(x-tf.constant(2, dtype=tf.float32)))
@tf.function
def val_and_grad(x):
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = model(x)
    grad = tape.gradient(loss, x)
    return loss, grad
def func(x):
    # cast SciPy's float64 input to float32, and the outputs back to float64
    return [vv.numpy().astype(np.float64) for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]
resdd = sopt.minimize(fun=func, x0=np.ones(5),
                      jac=True, method='L-BFGS-B')
print("info:\n",resdd)
This displays:
info:
fun: 7.105427357601002e-14
hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
-2.38418579e-07])
message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 3
nit: 2
status: 0
success: True
x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])
Benchmark
To compare speed, I use the L-BFGS optimizer on a style-transfer problem (see here for the network). Note that for this problem the network parameters are fixed and the input signal is adapted. Since the optimized parameters (the input signal) are 1D, the function factory is not needed.
I compare four implementations:
TF1.12: TF1 with ScipyOptimizerInterface
TF2.0 (E): the approach above without tf.function decorators
TF2.0 (G): the approach above with tf.function decorators
TF2.0/TFP: the L-BFGS minimizer from tensorflow_probability
For this comparison, the optimization is stopped after 300 iterations (the problem generally requires 3000 iterations to converge).
Results
Method       Runtime (300 it)   Final loss
TF1.12       240 s              0.045 (baseline)
TF2.0 (E)    299 s              0.045
TF2.0 (G)    233 s              0.045
TF2.0/TFP    226 s              0.053
The TF2.0 eager mode (TF2.0 (E)) works correctly but is about 25% slower than the TF1.12 baseline. TF2.0 (G) with tf.function works fine and is marginally faster than TF1.12, which is good to know.
The optimizer from tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0 (G) with SciPy's L-BFGS, but does not achieve the same error reduction. In fact, the loss does not decrease monotonically over time, which seems a bad sign. Comparing the two L-BFGS implementations (SciPy and tensorflow_probability, TFP), it is clear that the Fortran code in SciPy is significantly more complex.
So either the simplified algorithm in TFP is hurting here, or the fact that TFP performs all calculations in float32 is a problem.
Here is a simple solution using a library I wrote (autograd_minimize), building on Roebel's answer:
import numpy as np
import tensorflow as tf
from autograd_minimize import minimize
def rosen_tf(x):
return tf.reduce_sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)
res = minimize(rosen_tf, np.array([0.,0.]))
print(res.x)
>>> array([0.99999912, 0.99999824])
It also works with keras models as shown with this naive example of linear regression:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from autograd_minimize.tf_wrapper import tf_function_factory
from autograd_minimize import minimize
import tensorflow as tf
#### Prepares data
X = np.random.random((200, 2))
y = X[:,:1]*2+X[:,1:]*0.4-1
#### Creates model
model = keras.Sequential([keras.Input(shape=(2,)),
                          layers.Dense(1)])
# Transforms model into a function of its parameter
func, params = tf_function_factory(model, tf.keras.losses.MSE, X, y)
# Minimization
res = minimize(func, params, method='L-BFGS-B')
print(res.x)
>>> [array([[2.0000016 ],
[0.40000062]]), array([-1.00000164])]
I guess SciPy does not know how to calculate gradients of TensorFlow objects. Try using the original function factory (i.e., the one that also returns the gradients along with the loss), and set jac=True in scipy.optimize.minimize.
I tested the Python code from the original Gist, replacing tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It worked with the BFGS method:
results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')
jac=True tells SciPy that func returns the gradients as well as the loss.
For L-BFGS-B, however, it's tricky. After some effort I finally made it work: I had to comment out the @tf.function lines and let func return grads.numpy() instead of the raw TF tensor. I guess that's because the underlying implementation of L-BFGS-B is a Fortran function, so there might be an issue converting data from tf.Tensor to NumPy array to Fortran array. Forcing func to return the ndarray version of the gradients resolves the problem, but then it's not possible to use @tf.function.
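Putting these pieces together (loss and gradient returned as NumPy float64 values, jac=True), here is a minimal sketch for a Keras model. It flattens the weights with tf.split and tf.concat instead of the tf.dynamic_stitch/tf.dynamic_partition bookkeeping from the question, which is a substitution for brevity; make_value_and_grad is a hypothetical name, and the linear-regression data mirrors the autograd_minimize example.

```python
import numpy as np
import scipy.optimize
import tensorflow as tf

def make_value_and_grad(model, x, y):
    """Wrap a Keras model as f(params_1d) -> (loss, grad) for scipy.optimize.minimize(jac=True)."""
    variables = model.trainable_variables
    shapes = [tuple(v.shape) for v in variables]
    sizes = [int(np.prod(s)) for s in shapes]

    def assign(params_1d):
        # split SciPy's flat float64 vector and copy each piece into its variable as float32
        pieces = tf.split(np.asarray(params_1d, dtype=np.float32), sizes)
        for v, piece, shape in zip(variables, pieces, shapes):
            v.assign(tf.reshape(piece, shape))

    def f(params_1d):
        assign(params_1d)
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))  # MSE, as in the question
        grads = tape.gradient(loss, variables)
        flat_grad = tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)
        # hand NumPy float64 values back to the Fortran L-BFGS-B routine
        return float(loss.numpy()), flat_grad.numpy().astype(np.float64)

    return f

np.random.seed(0)
tf.random.set_seed(0)
x = np.random.random((200, 2)).astype(np.float32)
y = (2.0 * x[:, :1] + 0.4 * x[:, 1:] - 1.0).astype(np.float32)
model = tf.keras.Sequential([tf.keras.Input(shape=(2,)), tf.keras.layers.Dense(1)])

f = make_value_and_grad(model, x, y)
x0 = np.concatenate([v.numpy().ravel() for v in model.trainable_variables]).astype(np.float64)
res = scipy.optimize.minimize(f, x0, jac=True, method="L-BFGS-B")
print(res.fun, res.x)  # res.x is [w1, w2, bias] (kernel flattened first)
```

Because the NumPy conversion happens outside any traced function, the GradientTape part could in principle be moved into a separate @tf.function for speed, as in the toy example with val_and_grad.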
(Similar question: Is there a tf.keras.optimizers implementation for L-BFGS?)
While this is nowhere near as established as tf.contrib, here is an implementation of L-BFGS (and of any other scipy.optimize.minimize solver) for your consideration, in case it fits your use case:
https://pypi.org/project/kormos/
https://github.com/mbhynes/kormos
The package provides models that extend keras.Model and keras.Sequential, which can be compiled with .compile(..., optimizer="L-BFGS-B") to use L-BFGS in TF2, or with any of the other standard optimizers (because switching between stochastic and deterministic optimization should be easy!):
kormos.models.BatchOptimizedModel
kormos.models.BatchOptimizedSequentialModel
I would like to implement a custom loss function shown in this paper with Keras.
My loss is not going down, and I have the feeling that this is because of the implementation of the loss: it doesn't use Keras' backend for everything, but rather a combination of some K functions, simple operations and NumPy:
def l1_matrix_norm(M):
    return K.cast(K.max(K.sum(K.abs(M), axis=0)), 'float32')
def reconstruction_loss(patch_size, mask, center_weight=0.9):
    mask = mask.reshape(1, *mask.shape).astype('float32')
    mask_inv = 1 - mask
    def loss(y_true, y_pred):
        diff = y_true - y_pred
        center_part = mask * diff
        center_part_normed = l1_matrix_norm(center_part)
        surr_part = mask_inv * diff
        surr_part_normed = l1_matrix_norm(surr_part)
        num_pixels = np.prod(patch_size).astype('float32')
        numerator = center_weight * center_part_normed + (1 - center_weight) * surr_part_normed
        return numerator / num_pixels
    return loss
Is it necessary to use Keras functions? If so, for which types of operations do I need them? (I saw some code where simple operations such as addition don't use K.)
Also, if I have to use a Keras backend function, can I use TensorFlow's functions instead?
NN training depends on being able to compute the derivatives of all functions in the graph, including the loss function. Keras backend functions and TensorFlow functions are annotated so that TensorFlow (or another backend) automatically knows how to compute their gradients. That is not the case for NumPy functions. Plain arithmetic operators such as + and * are fine, because on tensors they are overloaded to call the corresponding TensorFlow ops. It is possible to use non-TF functions if you know how to compute their gradients manually (see tf.custom_gradient). In general, I would recommend sticking with backend functions where possible and falling back to TensorFlow functions when necessary.
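As an illustration of the tf.custom_gradient route, here is a sketch that computes the forward pass in NumPy (through tf.numpy_function) and supplies the derivative by hand. The function np_square is hypothetical and deliberately trivial; in practice this is only worth doing for an operation that has no TensorFlow equivalent.

```python
import numpy as np
import tensorflow as tf

@tf.custom_gradient
def np_square(x):
    # forward pass in NumPy: TensorFlow cannot differentiate through this call on its own
    y = tf.numpy_function(lambda a: np.square(a), [x], tf.float32)
    y.set_shape(x.shape)

    def grad(upstream):
        # hand-written derivative: d(x^2)/dx = 2x
        return upstream * 2.0 * x

    return y, grad

x = tf.constant([1.0, -3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(np_square(x))
g = tape.gradient(loss, x)
print(g.numpy())
```

Without the @tf.custom_gradient decorator, tape.gradient would return None here, because the NumPy call is opaque to TensorFlow.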
I know there are many questions treating custom loss functions in Keras but I've been unable to answer this even after 3 hours of googling.
Here is a very simplified example of my problem. I realize this example is pointless, but I provide it for simplicity; I obviously need to implement something more complicated.
import tensorflow as tf
from keras.backend import binary_crossentropy
from keras.backend import mean
def custom_loss(y_true, y_pred):
    zeros = tf.zeros_like(y_true)
    index_of_zeros = tf.where(tf.equal(zeros, y_true))
    ones = tf.ones_like(y_true)
    index_of_ones = tf.where(tf.equal(ones, y_true))
    zero = tf.gather(y_pred, index_of_zeros)
    one = tf.gather(y_pred, index_of_ones)
    loss_0 = binary_crossentropy(tf.zeros_like(zero), zero)
    loss_1 = binary_crossentropy(tf.ones_like(one), one)
    return mean(tf.concat([loss_0, loss_1], axis=0))
I do not understand why training the network with the above loss function on a two-class dataset does not yield the same result as training with the built-in binary cross-entropy loss function.
Thank you!
EDIT: I edited the code snippet to include the mean as per comments below. I still get the same behavior however.
I finally figured it out. The tf.where function behaves very differently when the shape is "unknown".
To fix the snippet above, simply insert the following lines at the start of the function body:
y_pred = tf.reshape(y_pred, [-1])
y_true = tf.reshape(y_true, [-1])
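Putting the fix together, here is a sketch of the corrected loss evaluated next to Keras' built-in binary cross-entropy. It uses tf.keras.losses.binary_crossentropy in place of keras.backend.binary_crossentropy (a substitution, since the backend module differs between Keras versions), and the sample labels and predictions are made up. After flattening, tf.where returns indices of shape (N, 1), so tf.gather picks out single elements as intended.

```python
import numpy as np
import tensorflow as tf

def custom_loss(y_true, y_pred):
    # flatten to 1D so tf.where yields (N, 1) indices and tf.gather returns shape (N, 1)
    y_pred = tf.reshape(y_pred, [-1])
    y_true = tf.reshape(y_true, [-1])
    zero = tf.gather(y_pred, tf.where(tf.equal(y_true, 0.0)))
    one = tf.gather(y_pred, tf.where(tf.equal(y_true, 1.0)))
    loss_0 = tf.keras.losses.binary_crossentropy(tf.zeros_like(zero), zero)
    loss_1 = tf.keras.losses.binary_crossentropy(tf.ones_like(one), one)
    return tf.reduce_mean(tf.concat([loss_0, loss_1], axis=0))

y_true = tf.constant([[0.0], [1.0], [1.0], [0.0]])
y_pred = tf.constant([[0.1], [0.8], [0.6], [0.3]])
custom = float(custom_loss(y_true, y_pred))
builtin = float(tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred)))
print(custom, builtin)
```

The per-sample losses come out in a different order (all zeros first, then all ones), but the mean is the same as with the built-in loss.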
I'm trying to get the activation values of each layer in this baseline autoencoder built with Keras, since I want to add a sparsity penalty to the loss function based on the Kullback-Leibler (KL) divergence, as shown here, p. 14.
In this scenario, I'm going to calculate the KL divergence for each layer and then add all of them to the main loss function, e.g. MSE.
I therefore wrote a script in Jupyter that does this, but every time I try to compile it I get ZeroDivisionError: integer division or modulo by zero.
This is the code:
import numpy as np
from keras.layers import Conv2D, Activation
from keras.models import Sequential
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32')
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
model = Sequential()
model.add(Conv2D(filters=16,kernel_size=(4,4),padding='same',
                 name='encoder',input_shape=(128,128,1)))
model.add(Activation('relu'))
# get the average activation
A = K.mean(x=model.output)
# calculate the value for the KL divergence
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, A)],axis=0)
# decoder
model.add(Conv2D(filters=1,kernel_size=(4,4),padding='same', name='encoder'))
model.add(Activation('relu'))
B = K.mean(x=model.output)
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, B)],axis=0)
This seems to be the cause:
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in _normalize_axis(axis, ndim)
989 else:
990 if axis is not None and axis < 0:
991 axis %= ndim <----------
992 return axis
993
So there might be something wrong with the mean calculation. If I print the value, I get
Tensor("Mean_10:0", shape=(), dtype=float32)
That is quite strange, because the weights and biases are initialised to non-zero values. So there might also be something wrong with the way I'm getting the activation values.
I really don't know how to fix it; I'm not a very skilled programmer.
Could anyone help me in understanding where I'm wrong?
First, you shouldn't be doing calculations outside layers. The model must keep track of all calculations.
If you need a specific calculation to be done in the middle of the model, you should use a Lambda layer.
If you need a specific output to be used in the loss function, you should split your model so that it exposes that output, and do the calculations inside a custom loss function.
Here, I used a Lambda layer to calculate the mean, and a customLoss to calculate the Kullback-Leibler divergence.
import numpy as np
from keras.layers import *
from keras.models import Model
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32') #you'll probably not need this anymore, since losses will be treated individually in each output.
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
inp = Input((128,128,1))
lay = Convolution2D(filters=16,kernel_size=(4,4),padding='same', name='encoder',activation='relu')(inp)
#apply the mean using a lambda layer:
intermediateOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(lay)
# decoder
finalOut = Convolution2D(filters=1,kernel_size=(4,4),padding='same', name='decoder',activation='relu')(lay)
#but from that, let's also calculate a mean output for loss:
meanFinalOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(finalOut)
#Now, you have to create a model taking one input and those three outputs:
splitModel = Model(inp,[intermediateOut,meanFinalOut,finalOut])
And finally, compile your model with your custom loss function (we will define that later). But since I don't know if you're actually using the final output (not mean) for training, I'll suggest creating one model for training and another for predicting:
trainingModel = Model(inp,[intermediateOut,meanFinalOut])
trainingModel.compile(...,loss=customLoss)
predictingModel = Model(inp,finalOut)
#you don't need to compile the predicting model since you're only training the trainingModel
#both will share the same weights, you train one, and predict in the other
Our custom loss function should then compute the Kullback-Leibler divergence:
def customLoss(p,mean):
    return #your own kullback expression (I don't know how it works, but maybe keras' one can be used with single values?)
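One common choice for this penalty in sparse autoencoders is the KL divergence between two Bernoulli distributions with means p (the target sparsity) and p_hat (the observed mean activation): KL(p || p_hat) = p*log(p/p_hat) + (1 - p)*log((1 - p)/(1 - p_hat)). The sketch below assumes that formula, which may differ in detail from the referenced notes; kl_sparsity_loss is a hypothetical name. Because Keras calls loss(y_true, y_pred), p would be passed as the target array (for example np.full((n, 1), 0.05)) and the mean activation arrives as y_pred. The clipping keeps both values inside (0, 1); with ReLU activations the mean can exceed 1, which is one reason sparse autoencoders often use sigmoid activations instead.

```python
import tensorflow as tf

def kl_sparsity_loss(y_true, y_pred, eps=1e-7):
    # y_true: target sparsity p; y_pred: observed mean activation p_hat
    p = tf.clip_by_value(y_true, eps, 1.0 - eps)
    p_hat = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    # KL divergence between Bernoulli(p) and Bernoulli(p_hat)
    kl = p * tf.math.log(p / p_hat) + (1.0 - p) * tf.math.log((1.0 - p) / (1.0 - p_hat))
    return tf.reduce_mean(kl)

print(float(kl_sparsity_loss(tf.constant([[0.05]]), tf.constant([[0.5]]))))
```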
Alternatively, if you want a single loss function to be called instead of two:
summedMeans = Add()([intermediateOut,meanFinalOut])
trainingModel = Model(inp, summedMeans)