I am very new to neural networks and TensorFlow. Recently I have been reading up on Keras implementations of the variational autoencoder, and I found two versions of the loss function:
version1:
def vae_loss(x, x_decoded_mean):
    recon_loss = original_dim * objectives.mse(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return recon_loss + kl_loss
version2:
def vae_loss(x, x_decoded_mean):
    recon_loss = objectives.mse(x, x_decoded_mean)
    kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return recon_loss + kl_loss
If my understanding is correct, version 1 is a summed loss and version 2 is a mean loss across all samples in the same batch. So does the scale of the loss affect the learning result? I tried testing them out, and it largely affects the scale of my latent variables. Why is this, and which form of the loss function is correct?
Update to my question:
if I also multiply the KL loss by original_dim,
def vae_loss(x, x_decoded_mean):
    xent_loss = original_dim * objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1) * original_dim
    return xent_loss + kl_loss
the latent distribution looks like below:
[image: latent distribution]
and the decoded output looks like this:
[image: decoded output]
It looks like the encoder output does not contain any information. I am using the MNIST dataset, and the example from https://github.com/vvkv/Variational-Auto-Encoders/blob/master/Variational%2BAuto%2BEncoders.ipynb
Summing versus averaging the loss for each example in a batch simply scales all loss terms proportionally. An equivalent change would be adjusting the learning rate. The important thing is that your typical loss magnitude multiplied by your learning rate does not lead to unstable learning.
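As a rough illustration (my own sketch, not part of the original answer): with plain SGD, multiplying the loss by a constant d multiplies every gradient by d, so it has the same effect as multiplying the learning rate by d. Adaptive optimizers such as Adam partly normalise this away, but the stability argument is the same.

import tensorflow as tf

d = 784.0  # e.g. original_dim (assumed value for the sketch)

def mean_mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

def summed_mse(y_true, y_pred):
    # same loss, just scaled by the constant factor d
    return d * mean_mse(y_true, y_pred)

# With plain SGD these two setups take identical update steps:
# model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3), loss=summed_mse)
# model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3 * d), loss=mean_mse)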
Related
I'm trying to build a custom loss function in Keras v2.4.3 (as explained in this answer):
def vae_loss(x: tf.Tensor, x_decoded_mean: tf.Tensor,
             original_dim=original_dim):
    z_mean = encoder.get_layer('mean').output
    z_log_var = encoder.get_layer('log-var').output
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(
        1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    vae_loss = K.mean(xent_loss + kl_loss)
    return vae_loss
But I think it's behaving very differently than expected (perhaps because of my Keras version?). I'm getting this error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
And I think that's because encoder.get_layer('mean').output is returning a KerasTensor object instead of a tf.Tensor object (as the other answer indicates).
What am I doing wrong here? How can I access the output of a given layer from inside a custom loss function?
I think the simplest way is to use model.add_loss(); this functionality enables you to pass multiple inputs to your custom loss.
To provide a working example, I build a simple VAE and add the VAE loss with model.add_loss().
The full model structure is shown below:
def sampling(args):
    z_mean, z_log_sigma = args
    batch_size = tf.shape(z_mean)[0]
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., stddev=1.)
    return z_mean + K.exp(0.5 * z_log_sigma) * epsilon

def vae_loss(x, x_decoded_mean, z_log_var, z_mean):
    xent_loss = original_dim * K.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
    vae_loss = K.mean(xent_loss + kl_loss)
    return vae_loss
def get_model():

    ### encoder ###
    inp = Input(shape=(n_features,))
    enc = Dense(64)(inp)
    z = Dense(32, activation="relu")(enc)
    z_mean = Dense(latent_dim)(z)
    z_log_var = Dense(latent_dim)(z)
    encoder = Model(inp, [z_mean, z_log_var])

    ### decoder ###
    inp_z = Input(shape=(latent_dim,))
    dec = Dense(64)(inp_z)
    out = Dense(n_features)(dec)
    decoder = Model(inp_z, out)

    ### encoder + decoder ###
    z_mean, z_log_var = encoder(inp)
    z = Lambda(sampling)([z_mean, z_log_var])
    pred = decoder(z)

    vae = Model(inp, pred)
    vae.add_loss(vae_loss(inp, pred, z_log_var, z_mean))  # <======= add_loss
    vae.compile(loss=None, optimizer='adam')

    return vae, encoder, decoder
The running notebook is available here: https://colab.research.google.com/drive/18day9KMEbH8FeYNJlCum0xMLOtf1bXn8?usp=sharing
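For completeness, here is a hedged usage sketch of my own (the imports and setup from the linked notebook are assumed to be in place): because the loss is attached with add_loss() and compile() is called with loss=None, fit() needs only the inputs and no targets.

import numpy as np

# assumed globals read by the functions above; in the notebook they come from the data
n_features, latent_dim, original_dim = 20, 2, 20
X_train = np.random.rand(256, n_features).astype('float32')  # dummy data in [0, 1]

vae, encoder, decoder = get_model()
vae.fit(X_train, epochs=5, batch_size=32, validation_split=0.1)  # no explicit targets

z_mean_pred, z_log_var_pred = encoder.predict(X_train)  # latent statistics
X_rec = decoder.predict(z_mean_pred)                    # reconstructions from the latent means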
I am working on a dummy example with generated heartbeats, and want to first use a VAE to encode the heartbeats and afterwards train a simple classifier.
The problem is that when I increase beta above 0.01, the reconstructions become nonsense (see the first image).
And when beta is low I get a normal autoencoder output with no disentanglement (second image).
I believe the problem may be in my KL divergence or VAE loss function, but I can't seem to find it.
In my encoder I do the reparameterization like this:
enc = self.encoder(x,batch_size, x_lenghts)
mu = self.enc2mean(enc)
logv = self.enc2logv(enc)
std = torch.exp(0.5*logv)
z = torch.randn([batch_size,1, self.encoder_hidden_sizes[-1] * (int(self.bidirectional)+1)]).to(self.device)
z = z * std + mu
And I define the VAE loss as:
def VAE_loss(x, reconstruction, mu, logvar, batch_size, latent_dim, beta=0):
    mse = F.mse_loss(x, reconstruction)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    KLD /= (batch_size * latent_dim)
    return mse + beta * KLD
Full standalone code to reproduce the results is here.
Any insights are appreciated!
I've read this blog post by Keras on their VAE implementation, where the VAE loss is defined this way:
def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma), axis=-1)
    return xent_loss + kl_loss
I also looked at the Keras documentation, where the VAE loss function is defined as below. In this implementation, the reconstruction_loss is multiplied by original_dim, which I don't see in the first implementation!
if args.mse:
    reconstruction_loss = mse(inputs, outputs)
else:
    reconstruction_loss = binary_crossentropy(inputs,
                                              outputs)

reconstruction_loss *= original_dim
kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
Can somebody please explain why? Thank you!
first_one: CE + mean(kl, axis=-1) = CE + sum(kl, axis=-1) / latent_dim
second_one: d * CE + sum(kl, axis=-1), where d = original_dim
So, up to the different normalisation of the KL term (mean over the latent dimensions versus a plain sum), first_one ≈ second_one / d: the overall scale changes, and the second version also weights the reconstruction term more heavily relative to the KL term.
Also note that the second one returns the mean loss over all the samples, while the first one returns a vector with one loss per sample.
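A quick numeric check of this relationship (my own sketch with made-up numbers, not part of the answer):

import numpy as np

d, latent_dim, batch = 784, 2, 8            # assumed sizes for the check
ce = np.random.rand(batch)                  # per-sample cross-entropy, already averaged over pixels
kl = np.random.rand(batch, latent_dim)      # per-sample, per-latent-dimension KL terms

recon_first, kl_first = ce, kl.mean(axis=-1)
recon_second, kl_second = d * ce, kl.sum(axis=-1)

print(np.allclose(recon_second, d * recon_first))      # True: reconstruction terms differ exactly by d
print(np.allclose(kl_second, latent_dim * kl_first))   # True: KL terms differ by latent_dim, not d
print(np.allclose(recon_first + kl_first,
                  (recon_second + kl_second) / d))     # False in general (only True if latent_dim == d)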
In VAE, the reconstruction loss function can be expressed as:
reconstruction_loss = -log p(x|z)
If the decoder output distribution is assumed to be Gaussian with fixed variance, this boils down to a sum-of-squared-errors (MSE-like) loss, since:
reconstruction_loss = -log p(x|z)
                    = -log ∏_i N(x(i); x_out(i), sigma²)
                    = -∑_i log N(x(i); x_out(i), sigma²)
                    ∝ ∑_i (x(i) - x_out(i))²
In contrast, the equation for the MSE loss is:
L(x, x_out) = MSE = 1/m ∑ (x(i) - x_out(i))²
where m is the output dimensionality; for example, for MNIST m = width × height × channels = 28 × 28 × 1 = 784.
Thus,
reconstruction_loss = mse(inputs, outputs)
should be multiplied by m (i.e. the original dimension, original_dim) to be equal to the reconstruction loss in the original VAE formulation.
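A minimal numeric check of that statement (my own sketch; the shapes are just for illustration):

import numpy as np

m = 784                                  # e.g. flattened MNIST image
x = np.random.rand(5, m)                 # 5 "true" samples
x_out = np.random.rand(5, m)             # 5 reconstructions

mse_per_sample = np.mean((x - x_out) ** 2, axis=-1)   # what Keras' mse returns per sample
sse_per_sample = np.sum((x - x_out) ** 2, axis=-1)    # the -log p(x|z) term, up to constants

print(np.allclose(m * mse_per_sample, sse_per_sample))  # True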
I have only one output for my model, but I would like to combine two different loss functions:
def get_model():
    # create the model here
    model = Model(inputs=image, outputs=output)
    alpha = 0.2
    model.compile(loss=[mse, gse],
                  loss_weights=[1 - alpha, alpha],
                  ...)
but it complains that I need to have two outputs because I defined two losses:
ValueError: When passing a list as loss, it should have one entry per model outputs.
The model has 1 outputs, but you passed loss=[<function mse at 0x0000024D7E1FB378>, <function gse at 0x0000024D7E1FB510>]
Can I possibly write my final loss function without having to create another loss function (because that would restrict me from changing the alpha outside the loss function)?
How do I do something like (1-alpha)*mse + alpha*gse?
Update:
Both my loss functions have the same signature as any built-in Keras loss function: they take y_true and y_pred and return a tensor of losses (which can be reduced to a scalar using K.mean()). But I believe how these loss functions are defined shouldn't affect the answer, as long as they return valid losses.
def gse(y_true, y_pred):
    # some tensor operation on y_pred and y_true
    return K.mean(K.square(y_pred - y_true), axis=-1)
Specify a custom function for the loss:
model = Model(inputs=image, outputs=output)
alpha = 0.2
model.compile(
    loss=lambda y_true, y_pred: (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred),
    ...)
Or, if you don't want an ugly lambda, make it into an actual function:
def my_loss(y_true, y_pred):
    return (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred)

model = Model(inputs=image, outputs=output)
alpha = 0.2
model.compile(loss=my_loss, ...)
EDIT:
If your alpha is not some global constant, you can have a "loss function factory":
def make_my_loss(alpha):
    def my_loss(y_true, y_pred):
        return (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred)
    return my_loss

model = Model(inputs=image, outputs=output)
alpha = 0.2
my_loss = make_my_loss(alpha)
model.compile(loss=my_loss, ...)
Yes, define your own custom loss function and pass that to the loss argument upon compiling:
def custom_loss(y_true, y_pred):
    return (1 - alpha) * K.mean(K.square(y_true - y_pred)) + alpha * gse(y_true, y_pred)
(Not sure what you mean by gse.) It can be helpful to have a look at how the vanilla losses are implemented in Keras: https://github.com/keras-team/keras/blob/master/keras/losses.py
The loss function should be a single function; you are giving your model a list of two functions.
Try:
def mse(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

model.compile(loss=lambda y_true, y_pred: (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred),
              ...)
While this answer doesn't particularly address the original question, I thought of writing it because the same error occurs when trying to load a Keras model that has a custom loss using keras.models.load_model, and it hasn't been properly answered anywhere. Specifically, following the VAE example code in the Keras GitHub repository, this error occurs when loading the VAE model after it has been saved with model.save.
The solution is to save only the weights using vae.save_weights('file.h5') instead of saving the full model. However, you have to build and compile the model again before loading the weights using vae.load_weights('file.h5').
Following is an example implementation.
# imports assumed by this snippet
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras.losses import mse
from keras import backend as K


class VAE():

    def __init__(self, no_features=64):
        # (added so the snippet runs standalone; set this to your input dimensionality)
        self.no_features = no_features

    def build_model(self):  # latent_dim and intermediate_dim can be passed as arguments

        def sampling(args):
            """Reparameterization trick by sampling from an isotropic unit Gaussian.

            # Arguments
                args (tensor): mean and log of variance of Q(z|X)

            # Returns
                z (tensor): sampled latent vector
            """
            z_mean, z_log_var = args
            batch = K.shape(z_mean)[0]
            dim = K.int_shape(z_mean)[1]
            # by default, random_normal has mean = 0 and std = 1.0
            epsilon = K.random_normal(shape=(batch, dim))
            return z_mean + K.exp(0.5 * z_log_var) * epsilon

        # original_dim = self.no_features
        # intermediate_dim = 256
        latent_dim = 8

        inputs = Input(shape=(self.no_features,))
        x = Dense(256, activation='relu')(inputs)
        x = Dense(128, activation='relu')(x)
        x = Dense(64, activation='relu')(x)
        z_mean = Dense(latent_dim, name='z_mean')(x)
        z_log_var = Dense(latent_dim, name='z_log_var')(x)

        # use reparameterization trick to push the sampling out as input
        # note that "output_shape" isn't necessary with the TensorFlow backend
        z = Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])

        # instantiate encoder model
        encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')

        # build decoder model
        latent_inputs = Input(shape=(latent_dim,), name='z_sampling')
        x = Dense(32, activation='relu')(latent_inputs)
        x = Dense(48, activation='relu')(x)
        x = Dense(64, activation='relu')(x)
        outputs = Dense(self.no_features, activation='linear')(x)

        # instantiate decoder model
        decoder = Model(latent_inputs, outputs, name='decoder')

        # instantiate VAE model
        outputs = decoder(encoder(inputs)[2])
        VAE = Model(inputs, outputs, name='vae_mlp')

        reconstruction_loss = mse(inputs, outputs)
        reconstruction_loss *= self.no_features
        kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
        kl_loss = K.sum(kl_loss, axis=-1)
        kl_loss *= -0.5
        vae_loss = K.mean(reconstruction_loss + kl_loss)

        VAE.add_loss(vae_loss)
        VAE.compile(optimizer='adam')

        return VAE
Now,
vae_cls = VAE()
vae = vae_cls.build_model()
# vae.fit()
vae.save_weights('file.h5')
Load model and predict (if in a different script, you need to import the VAE class),
vae_cls = VAE()
vae = vae_cls.build_model()
vae.load_weights('file.h5')
# vae.predict()
Finally, the difference: [ref]
Keras model.save saves:
Model weights
Model architecture
Model compilation details (loss function(s) and metrics)
Model optimizer and regularizer states
Keras model.save_weights saves only the model weights. Keras model.to_json() saves the model architecture.
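As a small illustration of that difference (my own generic example, not the VAE above): the architecture can be stored with to_json() and the weights with save_weights(), and the rebuilt model then has to be compiled again.

from tensorflow import keras

# a throwaway model just to illustrate to_json() / save_weights()
model = keras.Sequential([
    keras.layers.Dense(4, activation='relu', input_shape=(8,)),
    keras.layers.Dense(1),
])

json_config = model.to_json()        # architecture only
model.save_weights('weights.h5')     # weights only (no optimizer state, no loss)

rebuilt = keras.models.model_from_json(json_config)
rebuilt.load_weights('weights.h5')
# rebuilt still needs compile() before training or evaluating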
Hope this helps someone experimenting with variational autoencoders.
Combine MAE and RMSE together:
import tensorflow as tf
from tensorflow import keras
def loss_fn_mae_rmse(y_true, y_pred, alpha=0.8):
    mae = keras.losses.MeanAbsoluteError()
    mse = keras.losses.MeanSquaredError()
    return alpha * mae(y_true, y_pred) + (1 - alpha) * tf.sqrt(mse(y_true, y_pred))
model = keras.Model(inputs=..., outputs=...)
opt = keras.optimizers.Adam(learning_rate=1e-4)
model.compile(optimizer=opt, loss=loss_fn_mae_rmse, metrics=['mae'])
Also, if you want to load this model after it has been trained and saved to disk:
model = keras.models.load_model('path/to/model.h5', custom_objects={'loss_fn_mae_rmse': loss_fn_mae_rmse})
I'm a beginner with neural networks and am currently using a neural network to analyze protein sequence alignment data. The data is converted to one-hot encoded binary data. I'm using Keras to build my variational autoencoder. My model has 4 Dense layers for both the encoder and the decoder. The input is normalized with BatchNormalization, and a Dropout layer is also added inside the model. The optimizer is Adam. The problem is that I get a very large and still increasing loss, which soon reaches nan. I tried learning rates of 0.1, 0.2 and 0.8 and used decay to reduce the learning rate, but it doesn't help. I've been stuck on this model for a week... almost collapsing. Thanks for any help!
x = Input(shape=(original_dim,))
dr = Dropout(0.7)
h1z = BatchNormalization(mode = 0)(x)
h1 = Dense(intermediate_dim, activation='elu')(h1z)
h2 = dr(Dense(intermediate_dim2, activation='elu')(h1))
h3 = Dense(intermediate_dim3, activation = 'elu')(h2)
h4 = Dense(intermediate_dim4, activation = 'elu')(h3)
z_mean = Dense(latent_dim)(h4)
z_log_var = Dense(latent_dim)(h4)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0., stddev=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])
decoder_h4 = Dense(intermediate_dim4, activation = 'elu')
decoder_h3 = Dense(intermediate_dim3, activation='elu')
decoder_h2 = Dense(intermediate_dim2, activation='elu')
decoder_h1 = Dense(intermediate_dim, activation='elu')
drd = Dropout(0.7)
decoder_mean = Dense(original_dim, activation='sigmoid')
h_decoded4 = BatchNormalization(mode = 0)(decoder_h4(z))
h_decoded3 = decoder_h3(h_decoded4)
h_decoded2 = drd(decoder_h2(h_decoded3))
h_decoded1 = decoder_h1(h_decoded2)
x_decoded_mean = decoder_mean(h_decoded1)
def vae_loss(x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return K.mean(xent_loss + kl_loss)
vae = Model(x, x_decoded_mean)
vae.compile(optimizer=Adam(lr=0, decay=0), loss=vae_loss,
            metrics=["categorical_accuracy", "top_k_categorical_accuracy"])