Specify multiple loss functions for model compilation in Keras - python

I want to specify two loss functions: one for the object class, which is cross-entropy, and the other for the bounding box, which is mean squared error. How do I specify, in model.compile, each output with its corresponding loss function?
model = Sequential()
model.add(Dense(128, activation='relu'))
out_last_dense = model.add(Dense(128, activation='relu'))
object_type = model.add(Dense(1, activation='softmax'))(out_last_dense)
object_coordinates = model.add(Dense(4, activation='softmax'))(out_last_dense)
# Here is the problem: I want to specify a loss function for the object type and for the coordinates.
model.compile(loss= keras.losses.categorical_crossentropy,
optimizer= 'sgd', metrics=['accuracy'])

First of all, you can't use the Sequential API here since your model has two output layers (i.e. the code you have written is incorrect and would raise an error). Instead, you must use the Keras Functional API:
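from keras.layers import Input, Dense
from keras.models import Model
inp = Input(shape=...)
x = Dense(128, activation='relu')(inp)
x = Dense(128, activation='relu')(x)
object_type = Dense(1, activation='sigmoid', name='type')(x)
object_coordinates = Dense(4, activation='linear', name='coord')(x)
# Build the two-output model that is compiled below
model = Model(inp, [object_type, object_coordinates])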
Now you can specify a loss function (as well as a metric) for each output layer, keyed by the layer names given above, using a dictionary:
model.compile(loss={'type': 'binary_crossentropy', 'coord': 'mse'},
optimizer='sgd', metrics={'type': 'accuracy', 'coord': 'mae'})
Further, note that you were using softmax as the activation function, and I have changed it to sigmoid and linear above. That's because: 1) using softmax on a layer with one unit does not make sense, since the single output would always be 1 (if there are more than 2 classes, then you should use softmax with one unit per class), and 2) the other layer predicts coordinates, so softmax is not suitable at all (unless the problem formulation lets you do so).
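To make the per-output dictionary more concrete, here is a minimal pure-Python sketch of the two losses being computed independently, one per head, and then added up. The helper names and the sample numbers are made up for illustration and are not Keras code; as far as I know, Keras minimizes the sum of the per-output losses with equal weights unless loss_weights is given.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of scalar predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Mean squared error over a batch of coordinate vectors."""
    squared = [(t - p) ** 2
               for row_t, row_p in zip(y_true, y_pred)
               for t, p in zip(row_t, row_p)]
    return sum(squared) / len(squared)

# Hypothetical batch of two samples (values invented for the sketch).
type_true, type_pred = [1, 0], [0.9, 0.2]
coord_true = [[0.1, 0.2, 0.5, 0.6], [0.3, 0.3, 0.7, 0.8]]
coord_pred = [[0.1, 0.25, 0.5, 0.55], [0.35, 0.3, 0.65, 0.8]]

type_loss = binary_crossentropy(type_true, type_pred)    # loss of the 'type' head
coord_loss = mean_squared_error(coord_true, coord_pred)  # loss of the 'coord' head
total_loss = type_loss + coord_loss                      # equal weights assumed
print(type_loss, coord_loss, total_loss)
```

Each head only ever sees its own targets and its own loss; the optimizer then works on the combined value.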

Related

Argmax in a Keras multiclassifier ANN

I am trying to code a 5-class classifier ANN, and this code returns an error:
classifier = Sequential()
classifier.add(Dense(units=10, input_dim=14, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=5, kernel_initializer='uniform', activation='softmax'))
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
RD_Model = classifier.fit(X_train,y_train, batch_size=10 , epochs=10, verbose=1)
File "c:\Program Files\Python310\lib\site-packages\keras\backend.py", line 5119, in categorical_crossentropy
target.shape.assert_is_compatible_with(output.shape)
ValueError: Shapes (None, 1) and (None, 5) are incompatible
I figured this is caused by having a probability matrix instead of an actual output, so I have been trying to apply an argmax, but I haven't figured out a way to do it.
Can someone help me out?
Have you tried applying:
tf.keras.backend.argmax()
You can define a lambda layer using the following:
from keras.layers import Lambda
from keras import backend as K
def argmax_layer(input):
    return K.argmax(input, axis=-1)
Keras provides two paradigms for defining a model topology.
The code you are using uses the Sequential API. You might have to revert to the Functional API.
input_layer = Input(shape=(14,))
layer_1 = Dense(10, activation="relu")(input_layer)
layer_2 = Dense(6, activation="relu")(layer_1)
layer_3 = argmax_layer()(layer_2 )
output_layer= Dense(5, activation="linear")(layer_3 )
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam',
loss='categorical_crossentropy', metrics=['accuracy'])
Another option would be to instantiate an inherited class of a Keras Layer.
https://www.tutorialspoint.com/keras/keras_customized_layer.htm
As Dr. Snoopy mentioned, it was indeed a one-hot encoding problem. I had not encoded the labels, which is why my model was not working.
So I just one-hot encoded them:
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_Y = encoder.transform(y_train)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
And it worked after using dummy_y. Thank you for your help.
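For readers who want to see what the encoding step does to the label shape, here is a small pure-Python sketch. The one_hot helper is a hypothetical stand-in for to_categorical, and the labels are invented; it only illustrates why targets with one column per sample do not match a 5-unit softmax output, while targets with 5 columns do.

```python
def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows (stand-in for to_categorical)."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        rows.append(row)
    return rows

labels = [0, 3, 4, 1]         # hypothetical integer targets, one value per sample
encoded = one_hot(labels, 5)  # one row of 5 values per sample

print(len(encoded), len(encoded[0]))
for row in encoded:
    print(row)
```

If the labels are already integers from 0 to 4, switching the loss to sparse_categorical_crossentropy should also work and avoids the encoding step.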

softmax and sigmoid are giving same results in multiclass classification

I am building an LSTM model. I tested my model using the softmax and sigmoid activation functions. According to the documentation, sigmoid is used for binary classification and softmax is used for multiclass classification. But in my case, both are giving the same results. Why is that?
Here is my code:
embedding_vecor_length = 128
max_length = 700
model = Sequential()
model.add(Embedding(len(tokenizer.word_index)+1, embedding_vecor_length, input_length=max_length))
model.add(Conv1D(filters=32, kernel_size=5, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=16, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Bidirectional(LSTM(64)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Here are the predicted results:
[[2.72062905e-02 1.47979835e-03 4.44446778e-04 1.60833297e-05
4.15672457e-06 3.20438482e-02 9.38653767e-01 1.41544719e-04
5.55426550e-06 4.47654566e-06]
[2.31099591e-01 1.71699154e-03 1.32052042e-02 4.70457249e-04
8.86382014e-02 2.65704724e-03 6.54215395e-01 7.50611164e-03
4.89178114e-04 1.89376965e-06]
[1.24909900e-01 8.73659015e-01 9.71468398e-06 1.66079029e-04
1.05203628e-06 4.14116839e-05 3.97000113e-05 6.98190925e-05
1.10231712e-03 9.84829512e-07]
The sigmoid allows you to have high probability for all of your classes, some of them, or none of them. Example: classifying diseases in a chest x-ray image. The image might contain pneumonia, emphysema, and/or cancer, or none of those findings.
The softmax enforces that the sum of the probabilities of your output classes are equal to one, so in order to increase the probability of a particular class, your model must correspondingly decrease the probability of at least one of the other classes. Example: classifying images from the MNIST data set of handwritten digits. A single picture of a digit has only one true identity - the picture cannot be a 7 and an 8 at the same time.
So in your case, if the model is good, the predictions will not differ a lot between sigmoid and softmax. The difference is that softmax forces the predictions to sum to 1, while sigmoid does not.
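One way to see why the predicted class can come out the same is that both functions preserve the ordering of the pre-activation values, so the largest logit wins under either one. The following is a minimal sketch with made-up logits; it applies both activations to the same values, which is a simplification, since two separately trained models would not produce identical logits.

```python
import math

def sigmoid(logits):
    """Element-wise logistic function; each output is independent of the others."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    """Normalized exponentials; the outputs always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

logits = [2.0, -1.0, 0.5]  # hypothetical pre-activation outputs for 3 classes
sig, soft = sigmoid(logits), softmax(logits)

print(argmax(sig), argmax(soft))  # same winning class for both activations
print(sum(sig), sum(soft))        # only the softmax outputs sum to 1
```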

Multi-task learning in Keras (integrate two tasks into one multi-task model)

I work with Keras, and I have two problems I want to solve (one classification and one regression) with the same input but different outputs.
All the data will be used for classification and also for regression; the only difference is in the output layer.
I created a single model for each one, as in the following example for classification:
model = Sequential()
model.add(Dense(300, activation='relu', input_dim=377))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(56, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1))#
It works well, and the same model with a change in the last layer works well for the regression problem.
My question is how to integrate the two tasks into one multi-task learning neural network that takes one input and outputs both tasks.
I have searched a lot, but I haven't found the solution I want.
Note: I work with data in CSV file format.
Any help will be appreciated.
This is an example with the Keras functional API:
inp = Input((377,))
x = Dense(300, activation='relu')(inp)
x = Dense(256, activation='relu')(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(56, activation='relu')(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)
out_reg = Dense(1, name='reg')(x)
out_class = Dense(1, activation='sigmoid', name='class')(x) # assuming a binary classification problem
model = Model(inp, [out_reg, out_class])
model.compile('adam', loss={'reg':'mse', 'class':'binary_crossentropy'},
loss_weights={'reg':0.5, 'class':0.5})
I used the structure you reported for both classification and regression. The only difference is in the output: two dense layers, one for regression and the other for classification (I assumed a binary classifier).
I have also applied a different loss for regression and for classification. You can balance them differently by changing loss_weights.
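As a rough illustration of what balancing means here, the sketch below computes a weighted sum of per-output losses keyed by output name. The combined_loss helper and the loss values are hypothetical and not part of Keras; the idea is only that a loss on a larger scale (often the regression MSE) can dominate the total unless its weight is reduced.

```python
def combined_loss(losses, loss_weights):
    """Weighted sum of per-output losses, keyed by output name."""
    return sum(loss_weights[name] * value for name, value in losses.items())

# Hypothetical per-output loss values for one batch.
losses = {'reg': 4.0, 'class': 0.5}

equal = combined_loss(losses, {'reg': 0.5, 'class': 0.5})
rebalanced = combined_loss(losses, {'reg': 0.1, 'class': 0.9})

# Share of the total that comes from the regression head under each weighting.
reg_share_equal = 0.5 * losses['reg'] / equal
reg_share_rebalanced = 0.1 * losses['reg'] / rebalanced
print(equal, rebalanced)
print(reg_share_equal, reg_share_rebalanced)
```

With equal weights the regression term makes up most of the total in this example; with the second weighting the two terms contribute roughly equally.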

How to include transformation in Keras input model?

I'm pretty new to Keras and Tensorflow in general, so perhaps this is a stupid question...
What I want to achieve is the following:
I have a set of words, let's say: cat, dog, cow,..
Those words should be encoded based on a given alphabet: the vector has a 1 at the position of each character that occurs in the word, and a 0 elsewhere.
For cat e.g. something like 1,0,1,0,0,0,0,0,....,1,0,0,...0.
I use the Keras Tokenizer for that:
tk = Tokenizer(char_level=True, oov_token='UNK')
alphabet="abcdefghijklmnopqrstuvwxyzöäü0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}"
char_dict = {}
for i, char in enumerate(alphabet):
    char_dict[char] = i + 1
# Use char_dict to replace the tk.word_index
tk.word_index = char_dict
# Add 'UNK' to the vocabulary
tk.word_index[tk.oov_token] = max(char_dict.values()) + 1
x_train = tk.texts_to_matrix(x_train)
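To make the encoding explicit, here is a minimal pure-Python sketch of the same idea without the Tokenizer. It assumes a simplified alphabet (a to z only), keeps index 0 unused to mirror char_dict above, and reserves the last index for unknown characters; encode_word is a hypothetical helper and is not meant to reproduce texts_to_matrix exactly.

```python
import string

alphabet = string.ascii_lowercase  # simplified alphabet for this sketch
char_dict = {ch: i + 1 for i, ch in enumerate(alphabet)}  # index 0 stays unused
unk_index = len(alphabet) + 1      # slot for characters outside the alphabet

def encode_word(word):
    """Return a binary vector with a 1 at the index of every character present."""
    vec = [0] * (len(alphabet) + 2)
    for ch in word.lower():
        vec[char_dict.get(ch, unk_index)] = 1
    return vec

vec = encode_word("cat")
print([i for i, v in enumerate(vec) if v])  # indices that are set: [1, 3, 20]
print(encode_word("act") == vec)            # True: character order is not kept
```

Note that this kind of vector records which characters occur but not their order, so "cat" and "act" encode identically and the original string cannot be recovered from the vector alone.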
Those vectors are passed into the Keras model for prediction. Now, I want the transformation to happen in the Keras model. So, the user should provide "cat" to the model and not a numeric vector like above. And the model should also return "cat".
How can I achieve that? I saw that there is a Lambda layer in Keras, is this the correct approach?
Thanks in advance.
Edit for clarification:
The model at the moment looks like this:
model = Sequential()
model.add(Dense(128, input_shape=(len(alphabet)+1,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
But what I want to achieve is to have an input layer that gets strings as inputs and converts the strings to the format the actual first layer can read. Something like this:
model = Sequential()
**model.add(transformation_layer)**
model.add(Dense(128, input_shape=(len(alphabet)+1,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
Edit 2
This is what I tried, but I get the following error when running the "model.fit" function:
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: iterating over tf.Tensor is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
def transform_layer(x):
    return tk.texts_to_matrix(x)
print('Building model...')
transform_layer = Lambda(transform_layer)
model = Sequential()
model.add(transform_layer)
model.add(Dense(128, input_shape=(len(alphabet)+1,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
history = model.fit(np.array(['test','test2']), np.array(['blub','blub2']),
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_split=0.1)

Usage of sigmoid activation function in Keras

I have a big dataset composed of 18260 input fields with 4 outputs. I am using Keras and TensorFlow to build a neural network that can detect the possible output.
However, I tried many solutions, but the accuracy does not get above 55% unless I use the sigmoid activation function in all model layers except the first one, as below:
def baseline_model(optimizer='adam', init='random_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(40, input_dim=18260, activation="relu", kernel_initializer=init))
    model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(10, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(4, activation="sigmoid", kernel_initializer=init))
    model.summary()
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
Is using sigmoid for activation correct in all layers? The accuracy is reaching 99.9% when using sigmoid as shown above. So I was wondering if there is something wrong in the model implementation.
The sigmoid might work, but I suggest using relu as the activation for the hidden layers. The real problem is that your output layer's activation is sigmoid, but it should be softmax (because you are using the sparse_categorical_crossentropy loss).
model.add(Dense(4, activation="softmax", kernel_initializer=init))
Edit after discussion on comments
Your outputs are integers for class labels. The sigmoid (logistic) function outputs values in the range (0,1). The output of softmax is also in the range (0,1), but softmax adds another constraint on the outputs: their sum must be 1. Therefore the outputs of softmax can be interpreted as the probability of the input belonging to each class.
For example:
import numpy as np
def sigmoid(x):
    return 1.0/(1 + np.exp(-x))
def softmax(a):
    # subtract the max for numerical stability
    return np.exp(a-max(a))/np.sum(np.exp(a-max(a)))
a = np.array([0.6, 10, -5, 4, 7])
print(sigmoid(a))
# [0.64565631, 0.9999546 , 0.00669285, 0.98201379, 0.99908895]
print(softmax(a))
# [7.86089760e-05, 9.50255231e-01, 2.90685280e-07, 2.35544722e-03,
#  4.73104222e-02]
print(sum(softmax(a)))
# 1.0
You have to use one or the other activation, as activations are what bring non-linearity into the model. If the model doesn't have any activation, it basically behaves like a single-layer network. Read more about 'Why to use activations here'. You can check various activations here.
That said, it seems like your model is overfitting when using sigmoid, so try techniques to counter that, such as creating train/dev/test sets, reducing the complexity of the model, adding dropout, etc.
Neural networks require non-linearity at each layer to work. Without non-linear activation no matter how many layers you have, you could write the same thing with only one layer.
Linear functions are limited in complexity and if "g" and "f" are linear functions g(f(x)) could be written as z(x) where z is also a linear function. It is pointless to stack them without adding non-linearity.
And that's why we use non-linear activation functions. sigmoid(g(f(x))) cannot be written as a linear function.
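The collapse of stacked linear layers can be checked numerically. Below is a minimal sketch using scalar "layers" of the form a*x + b; the coefficients and the make_linear helper are arbitrary choices for illustration. It shows that g(f(x)) equals a single linear function z(x), while sigmoid(g(f(x))) does not behave like a linear function (for a linear function, the value at the midpoint of two inputs is the average of the values at the two inputs).

```python
import math

def make_linear(a, b):
    """Return the scalar linear function x -> a*x + b."""
    return lambda x: a * x + b

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

f = make_linear(2.0, 1.0)    # first "layer":  f(x) = 2x + 1
g = make_linear(-3.0, 4.0)   # second "layer": g(y) = -3y + 4
# Collapsed single layer: g(f(x)) = (-3*2)x + (-3*1 + 4) = -6x + 1
z = make_linear(-3.0 * 2.0, -3.0 * 1.0 + 4.0)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
stacked = [g(f(x)) for x in xs]
collapsed = [z(x) for x in xs]
print(stacked == collapsed)  # the two-layer stack is just one linear function

# Midpoint test for linearity on the inputs -1, 0, 1.
h = lambda x: sigmoid(g(f(x)))
midpoint_gap_linear = g(f(0.0)) - (g(f(-1.0)) + g(f(1.0))) / 2
midpoint_gap_sigmoid = h(0.0) - (h(-1.0) + h(1.0)) / 2
print(midpoint_gap_linear, midpoint_gap_sigmoid)
```

The gap is zero for the purely linear stack and clearly non-zero once the sigmoid is applied.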
