I know there are many questions about custom loss functions in Keras, but I've been unable to answer this one even after 3 hours of googling.
Here is a very simplified example of my problem. I realize this example is pointless, but I provide it for simplicity; I obviously need to implement something more complicated.
import tensorflow as tf
from keras.backend import binary_crossentropy
from keras.backend import mean

def custom_loss(y_true, y_pred):
    zeros = tf.zeros_like(y_true)
    index_of_zeros = tf.where(tf.equal(zeros, y_true))
    ones = tf.ones_like(y_true)
    index_of_ones = tf.where(tf.equal(ones, y_true))
    zero = tf.gather(y_pred, index_of_zeros)
    one = tf.gather(y_pred, index_of_ones)
    loss_0 = binary_crossentropy(tf.zeros_like(zero), zero)
    loss_1 = binary_crossentropy(tf.ones_like(one), one)
    return mean(tf.concat([loss_0, loss_1], axis=0))
I do not understand why training the network with the above loss function on a two-class dataset does not yield the same result as training with the built-in binary-crossentropy loss function.
Thank you!
EDIT: I edited the code snippet to include the mean as per the comments below. However, I still get the same behavior.
I finally figured it out. The tf.where function behaves very differently when the shape is "unknown".
To fix the snippet above, simply insert the following two lines at the top of the function body:
    y_pred = tf.reshape(y_pred, [-1])
    y_true = tf.reshape(y_true, [-1])
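For reference, here is a minimal sketch of the complete corrected function (the same logic as in the question, with the two reshape lines added):

def custom_loss(y_true, y_pred):
    # flatten both tensors so their shape is fully known before tf.where
    y_pred = tf.reshape(y_pred, [-1])
    y_true = tf.reshape(y_true, [-1])
    index_of_zeros = tf.where(tf.equal(tf.zeros_like(y_true), y_true))
    index_of_ones = tf.where(tf.equal(tf.ones_like(y_true), y_true))
    zero = tf.gather(y_pred, index_of_zeros)
    one = tf.gather(y_pred, index_of_ones)
    loss_0 = binary_crossentropy(tf.zeros_like(zero), zero)
    loss_1 = binary_crossentropy(tf.ones_like(one), one)
    return mean(tf.concat([loss_0, loss_1], axis=0))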
I am building a model with 3 classes: [0,1,2]
After training, the .predict function returns an array of per-class probabilities instead of class labels.
I was checking the Keras documentation but could not figure out what I did wrong.
.predict_classes is not working anymore, and I did not have this problem with previous classifiers. I have already tried different activation functions (relu, sigmoid, etc.).
If I understand correctly, the number in Dense(3, ...) defines the number of classes.
outputs1=Dense(3,activation='softmax')(att_out)
model1=Model(inputs1,outputs1)
model1.summary()
model1.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=['accuracy'])
model1.fit(x=text_pad,y=train_y,batch_size=batch_size,epochs=epochs,verbose=1,shuffle=True)
y_pred = model1.predict(test_text_matrix)
Output example:
[[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]
[0.34014237 0.33570153 0.32415614]]
Output I want:
[1,2,0,0,0,1,2,0]
Thank you for any ideas.
You did not do anything wrong; predict has always returned the raw output of the model, and for a classifier this has always been the per-class probabilities.
predict_classes is only available for Sequential models, not for Functional ones (and it has since been removed from recent TensorFlow versions entirely).
But there is an easy solution: you just need to take the argmax over the last dimension and you will get class indices:
import numpy as np

y_probs = model1.predict(test_text_matrix)
y_pred = np.argmax(y_probs, axis=-1)
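For example, applying this to the first two rows of the output above (a small standalone demonstration):

import numpy as np

y_probs = np.array([[0.34014237, 0.33570153, 0.32415614],
                    [0.34014237, 0.33570153, 0.32415614]])
print(np.argmax(y_probs, axis=-1))  # [0 0], since column 0 holds the largest probability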
I'm trying to implement a custom loss function for my neural network, which would look like this if the tensors were numpy arrays instead:
def custom_loss(y_true, y_pred):
    activated = y_pred[y_true > 1]
    return np.abs(activated.mean() - activated.std()) / activated.std()
The y's have a shape of (batch_size, 1); that is to say, it's a scalar output for each input row.
Note: this post (Converting Tensor to np.array using K.eval() in Keras returns InvalidArgumentError) gave me an initial direction to walk in.
Edit:
This is a reproducible setup on which I'm trying to apply the custom loss function:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
X = np.random.normal(0, 1, (256, 5))
Y = np.random.normal(0, 1, (256, 1))
model = keras.Sequential([
    layers.Dense(1),
])
model.compile(optimizer='adam', loss=custom_loss)
model.fit(X, Y)
The .fit() on the last line throws the error AttributeError: 'Tensor' object has no attribute 'mean' if I define custom_loss as stated above in my question.
It's a simple catch. You can write your custom loss as follows:
def custom_loss(y_true, y_pred):
    activated = y_pred[y_true > 1]
    return tf.math.abs(tf.reduce_mean(activated) -
                       tf.math.reduce_std(activated)) / tf.math.reduce_std(activated)
Or, if you want to use tf.boolean_mask(tensor, mask, ...), you need to ensure that the mask condition is 1-D, i.e. of shape (None,). Note that tf.boolean_mask expects a boolean mask, not indices: tf.where(y_true > 1) would produce a 2-D tensor of indices, so in your case the boolean condition itself is what needs to be reshaped:
def custom_loss(y_true, y_pred):
    # reshape the (batch_size, 1) boolean condition to a 1-D mask
    activated = tf.boolean_mask(y_pred, tf.reshape(y_true > 1, [-1]))
    return tf.math.abs(tf.reduce_mean(activated) -
                       tf.math.reduce_std(activated)) / tf.math.reduce_std(activated)
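As a quick sanity check, either variant can be called eagerly on made-up tensors (the values here are illustrative, not from the question):

import numpy as np
import tensorflow as tf

y_true = tf.constant([[0.5], [2.0], [3.0], [0.1]])
y_pred = tf.constant([[0.2], [1.5], [2.5], [0.7]])
print(custom_loss(y_true, y_pred).numpy())  # ~3.0

# the same computation in numpy for comparison:
activated = np.array([1.5, 2.5])  # entries of y_pred where y_true > 1
print(abs(activated.mean() - activated.std()) / activated.std())  # 3.0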
Have you tried writing it in TensorFlow and had gradient problems? Or are you just asking how to do it in TensorFlow? -- Don't worry, I won't give you a classic toxic SO response!
I would try something like this (not tested, but it seems along the right track):
def custom_loss(y_true, y_pred):
    activated = tf.boolean_mask(y_pred, tf.reshape(y_true > 1, [-1]))
    return tf.math.abs(tf.reduce_mean(activated) - tf.math.reduce_std(activated)) / tf.math.reduce_std(activated)
You may need to play around with dimensions in there, since all of those functions allow for specifying the dimensions to work with.
Also, you will lose the loss function when you save the model unless you subclass the generic loss class (tf.keras.losses.Loss). That may be more detail than you are looking for, but if you have problems saving and loading the model, let me know.
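For completeness, a minimal sketch of that subclassing approach (the class name is illustrative, and it reuses the masking logic from above):

import tensorflow as tf

class ActivatedLoss(tf.keras.losses.Loss):
    # a serializable wrapper around the custom computation
    def call(self, y_true, y_pred):
        activated = tf.boolean_mask(y_pred, tf.reshape(y_true > 1, [-1]))
        return tf.math.abs(tf.reduce_mean(activated) -
                           tf.math.reduce_std(activated)) / tf.math.reduce_std(activated)

model.compile(optimizer='adam', loss=ActivatedLoss())
# when loading, pass the class back in via custom_objects:
# keras.models.load_model(path, custom_objects={'ActivatedLoss': ActivatedLoss})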
Hello, I need a custom regularization term to add to my (binary cross-entropy) loss function. Can somebody help me with the TensorFlow syntax to implement this?
I have simplified everything as much as possible so it is easier to help me.
The model takes a dataset of 10000 18x18 binary configurations as input and outputs 16x16 configurations. The neural network consists of only 2 convolutional layers.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
EPOCHS = 10
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[18,18,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[17,17,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000,18,18,1),target.reshape(10000,16,16,1),batch_size = 1000, epochs=EPOCHS, verbose=1)
output = model(initial).numpy().reshape(10000,16,16)
Now I have written a function which I'd like to use as an additional regularization term. It takes the true values and the prediction. Basically, it multiplies every point of both with its 'right' neighbor, then takes the difference. I assumed that the true and prediction tensors are 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
    order = list(range(1, 4))
    order.append(0)
    deviation = (true * true[:, order]) - (prediction * prediction[:, order])
    deviation = abs(deviation) ** 2
    return 0.2 * deviation
I would really appreciate some help with adding something like this function as a regularization term to my loss, to help the neural network train toward this 'right neighbor' interaction. I'm really struggling with TensorFlow's customization facilities.
Thank you, much appreciated.
It is quite simple. You need to specify a custom loss in which you add your regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
    order = list(range(1, 4))
    order.append(0)
    deviation = (true * true[:, order]) - (prediction * prediction[:, order])
    deviation = abs(deviation) ** 2
    return 0.2 * deviation

def my_custom_loss(y_true, y_pred):
    return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)
model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by the Keras docs:
Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.
So be sure to return an array of losses (EDIT: as I can see now, it is also possible to return a simple scalar; it doesn't matter if you use, for example, a reduce function). Basically, y_true and y_pred have the batch size as their first dimension.
Details here: https://keras.io/api/losses/
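One detail worth noting: since y_true and y_pred actually arrive with shape (batch, 16, 16, 1), the term above returns a full tensor rather than one value per sample. Here is a sketch of how the term could be reduced per sample, assuming the 'right neighbor' runs cyclically along the width axis (the axis choice is my assumption, not from the original answer):

import tensorflow as tf

def regularization_term(true, prediction):
    # pair every point with its 'right' neighbor via a cyclic shift
    true_shift = tf.roll(true, shift=-1, axis=2)
    pred_shift = tf.roll(prediction, shift=-1, axis=2)
    deviation = tf.abs(true * true_shift - prediction * pred_shift) ** 2
    # reduce to one loss value per sample: shape (batch,)
    return 0.2 * tf.reduce_mean(deviation, axis=[1, 2, 3])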
After writing a simple neural network with numpy, I wanted to compare it numerically with a PyTorch implementation. Running on its own, my neural network implementation converges, so it seems to have no errors.
I have also checked that the forward pass matches PyTorch, so the basic setup is correct.
But something different happens in the backward pass, because the weights after one backpropagation step are different.
I don't want to post the full code here because it is spread over several .py files, and most of it is irrelevant to the question. I just want to know whether PyTorch does "basic" gradient descent or something different.
I am looking at the simplest example, the fully-connected weights of the last layer, because if those are different, everything further will also be different:
self.weight += self.learning_rate * hidden_layer.T.dot(output_delta)
where
output_delta = self.expected - self.output
Here self.expected is the expected value and self.output is the forward-pass result. There is no activation or anything further here.
The PyTorch part is:
optimizer = torch.optim.SGD(nn.parameters() , lr = 1.0)
criterion = torch.nn.MSELoss(reduction='sum')
output = nn.forward(x_train)
loss = criterion(output, y_train)
loss.backward()
optimizer.step()
optimizer.zero_grad()
So is it possible that with the SGD optimizer and MSELoss it uses some different delta or backpropagation function, not the basic one mentioned above? If so, I'd like to know how to numerically check my numpy solution against PyTorch.
I just want to know whether PyTorch does "basic" gradient descent or something different.
If you set torch.optim.SGD, this means stochastic gradient descent.
There are different implementations of gradient descent (GD), but the one used in PyTorch is applied to mini-batches.
There are GD implementations that optimize the parameters only after the full epoch. As you may guess, they are very "slow", which may be fine on supercomputers. There are GD implementations that update after every single sample; as you may guess, their weakness is "huge" gradient fluctuations.
These are all relative terms, hence the quotation marks.
Note that you are using a very big learning rate (lr = 1.0), which suggests you haven't normalized your data first, but this is a skill you will pick up over time.
So is it possible that with the SGD optimizer and MSELoss it uses some different delta or backpropagation function, not the basic one mentioned above?
It uses exactly what you told it to use.
Here is an example in PyTorch and in plain Python showing that the gradient computation (used in backpropagation) works as expected:
import torch

x = torch.tensor([5.], requires_grad=True)
print(x)  # tensor([5.], requires_grad=True)
y = 3*x**2
y.backward()
print(x.grad)  # tensor([30.])
How would you get this value of 30 in plain Python?
def y(x):
    return 3*x**2

x = 5
e = 0.01  # eta, the step size
g = (y(x+e) - y(x)) / e
print(g)  # 30.0299
As expected, we get ~30; it would be even more accurate with a smaller eta.
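To check the numpy update against PyTorch directly, one option is to run a single SGD step on the same tiny layer in both frameworks (a sketch with made-up data). Note that MSELoss(reduction='sum') has gradient 2 * (output - target), whereas the delta rule from the question omits that factor of 2, so the two updates differ by exactly that factor:

import numpy as np
import torch

x_np = np.array([[1.0, 2.0]])    # one sample, two features
w_np = np.array([[0.1], [0.2]])  # weights of the last layer
y_np = np.array([[1.0]])         # expected value
lr = 1.0

# one update following the rule from the question
delta = y_np - x_np.dot(w_np)
w_numpy = w_np + lr * x_np.T.dot(delta)

# the same update in PyTorch
w = torch.tensor(w_np, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)
loss = torch.nn.MSELoss(reduction='sum')(torch.tensor(x_np) @ w, torch.tensor(y_np))
loss.backward()
optimizer.step()

print(w_numpy.ravel())             # [0.6 1.2]
print(w.detach().numpy().ravel())  # [1.1 2.2] -- twice the numpy step, from the factor of 2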
I cannot find anywhere how exactly backpropagation is done in Keras. Let me explain:
Let's say I have this network:
input = Input(shape=(X,X,Y))
x = Conv2D(32,(3,3),padding="same")(input)
x = Conv2D(64,(3,3),padding="same")(x)
x = Conv2D(128,(3,3),padding="same")(x)
x = Conv2D(64,(3,3),padding="same")(x)
x = Flatten()(x)
x = Dense(1024)(x)
Output = Dense(6)(x)
model = Model(input,Output)
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
model.fit(trainingData,trainingLabels)
The output of the last layer is compared to trainingLabels, the mean squared error is computed, and backpropagation happens based on the value of the mean squared error.
However, what if I wanted to do something more? For example, I want to try every permutation of the output vector, and the one that results in the minimal mean squared error should be treated as the output, so that backpropagation happens based on the permutation with the least error.
Is something like this possible in Keras? If so, how can I achieve it?
The loss argument of the model.compile method accepts a Python function. You could compute the minimum over the set of permutations in a custom function:
def custom_loss(y_true, y_predicted):
    ''' code here '''
and then pass it to model.compile:
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
See the Keras losses documentation (https://keras.io/api/losses/) for reference.
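As a concrete sketch of what such a function could look like (my illustration, not from the original answer; it assumes the 6-unit output from the question and simply brute-forces all permutations of the predicted vector):

import itertools
import tensorflow as tf

def permutation_min_mse(y_true, y_pred):
    # evaluate the MSE for every permutation of the predicted vector
    # and keep the smallest one per sample (6! = 720 candidates)
    losses = []
    for perm in itertools.permutations(range(6)):
        permuted = tf.gather(y_pred, list(perm), axis=-1)
        losses.append(tf.reduce_mean(tf.square(y_true - permuted), axis=-1))
    return tf.reduce_min(tf.stack(losses, axis=-1), axis=-1)

Gradients then flow only through the winning permutation; with 720 candidates this brute-force approach is only practical for small output sizes.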