TensorFlow Keras cross-entropy loss with linear activation on output - python

In PyTorch the cross entropy loss function works something like
CrossEntropyLoss(x, y) = H(one_hot(y), softmax(x))
so you can have a linear output layer. Is there a way to do that with tf.keras.Sequential?
I have written this little CNN for MNIST:
model = tf.keras.Sequential()
model.add(tfkl.Input(shape=(28, 28, 1)))
model.add(tfkl.Conv2D(32, (5, 5), padding="valid", activation=tf.nn.relu))
model.add(tfkl.MaxPool2D((2, 2)))
model.add(tfkl.Conv2D(64, (5, 5), padding="valid", activation=tf.nn.relu))
model.add(tfkl.MaxPool2D((2, 2)))
model.add(tfkl.Flatten())
model.add(tfkl.Dense(1024, activation=tf.nn.relu))
model.add(tfkl.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
model.fit(x_train, y_train, epochs=1)
and I would like to have
model.add(tfkl.Dense(10))
as the last layer.
I am trying to implement the ADef algorithm, but the entries of the gradient w.r.t. the input seem to be too small; I suspect that with a linear output they would be the right size.
I know there is tf.nn.softmax_cross_entropy_with_logits but I don't know how to use it in this context.
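For reference, the input gradient the ADef step needs can be computed directly with tf.GradientTape; this is a minimal sketch assuming model ends in a linear Dense(10) and x_train, y_train are the MNIST arrays from the question:
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x = tf.convert_to_tensor(x_train[:1], dtype=tf.float32)  # one image
y = tf.convert_to_tensor(y_train[:1])                    # its integer label

with tf.GradientTape() as tape:
    tape.watch(x)                      # x is a plain tensor, so watch it explicitly
    logits = model(x, training=False)  # raw logits from the linear output layer
    loss = loss_fn(y, logits)

grad = tape.gradient(loss, x)          # same shape as x: (1, 28, 28, 1)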
Edit:
Changing
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
to
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
has done the trick.

Thank you @Moe1234. For the benefit of the community, providing the solution here:
The issue was resolved after changing
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
to
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
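Putting it together, the model can end in a linear layer and train on logits; class probabilities are recovered with an explicit softmax at inference time. A sketch of the relevant pieces (tfkl is tf.keras.layers as in the question):
model.add(tfkl.Dense(10))  # linear output: the model now emits logits

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

# At inference, apply a softmax to the logits to get class probabilities
probs = tf.nn.softmax(model(x_test[:5]), axis=-1)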

Related

Why is my multiclass neural model not training (accuracy and loss staying same)?

I am learning neural networks. I get 98% accuracy with classical ML methods on this data, so I think I have made a coding error: the neural network model is not learning.
Things I tried:
Changing X and y to float64 or float32
Normalizing the data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as the optimizer instead of "adam"
Changing the y label to another label
There are 9 features in X_train and 8 different classes in y_train.
(Screenshots of X_train and y_train omitted.)
Code:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(keras.layers.Flatten())
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Fitting:
I tried these lines, changing the target label each time. None of them helped train the model: some give "nan" loss, some go slightly up and down, but all of them stay below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)
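For comparison, a setup matching the data described above (9 features, 8 integer classes) would end in an 8-unit softmax layer trained with sparse_categorical_crossentropy. This is a sketch of such a configuration, not the original poster's code:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(9,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(8, activation='softmax'))  # one unit per class
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # expects integer labels 0..7
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
Note there is no Flatten and the softmax layer sits at the end; a Dense(1) output cannot represent 8 classes.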

How to get weights from keras model?

I'm trying to build a two-layer neural network for the MNIST dataset and I want to get the weights from my model.
I found a similar question here on SO and tried this:
model.get_weights()
But it returned 11 values when I checked len(model.get_weights()). Isn't it supposed to return 3 weight arrays? I have even disabled the bias.
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(512, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.add(BatchNormalization())
model.add(Dropout(0.1))
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
result = model.fit(x_train, y_train, validation_split=0.25, epochs=10,
                   batch_size=128, verbose=1)
Eleven is actually the expected count here: each BatchNormalization layer holds 4 weight arrays (gamma, beta, moving mean, moving variance), so the 3 Dense kernels (bias disabled) plus 2 x 4 BatchNorm arrays give 3 + 8 = 11 entries in model.get_weights().
To get the weights of a particular layer, you can retrieve that layer by its name and call get_weights on it (as shubham-panchal said in their comment).
For example:
model.get_layer('dense').get_weights()
or
model.get_layer('dense_2').get_weights()
You could also go through the layers of your model and retrieve each layer's name and weights:
{layer.name: layer.get_weights() for layer in model.layers}
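To see where the 11 arrays come from, you can print each layer's name together with the shapes of its weights (a small sketch using the model above):
for layer in model.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])
# Each Dense layer reports a single kernel; each BatchNormalization
# layer reports four arrays (gamma, beta, moving mean, moving variance).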

Keras Model for Molecular Activity

I am experimenting with the Merck Molecular Activity Challenge and I have created the train and test datasets.
The shape of the data is the following:
x_train.shape=(1452, 4306)
y_train.shape=(1452, 1)
x_test.shape=(363, 4306)
y_test.shape=(363, 1)
I have used the Dense layer for defining the model as follows:
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=(4306,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(25, activation="relu"))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1))
# Compile the model
model.compile(
    loss='categorical_crossentropy',
    optimizer="adam",
)
model.summary()
# Train the model
model.fit(
    x_train,
    y_train,
    batch_size=300,
    epochs=900,
    validation_data=(x_test, y_test),
    shuffle=True
)
While trying the above code, the following error occurred:
ValueError: Input 0 is incompatible with layer flatten_23: expected min_ndim=3, found ndim=2
How can I resolve this error?
Just remove the flatten layer:
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=(4306,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(25, activation="relu"))
model.add(Dropout(0.25))
model.add(Dense(1))
Each sample flowing through these Dense layers is already 1-D (ignoring the batch dimension), so there is nothing to flatten: the tensor reaching the Flatten layer has shape (batch, 25), i.e. ndim=2, while Flatten expects min_ndim=3, which is exactly what the error says.
EDIT -- for regression:
Categorical crossentropy is not an appropriate loss function for regression; use mean squared error, which is the standard choice for regression tasks:
model.compile(
    loss='mse',
    optimizer="adam",
)
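Putting both fixes together (no Flatten layer, mse loss), the corrected model would look something like this sketch; the mae metric is an optional addition for easier interpretation:
model = Sequential()
model.add(Dense(100, activation="relu", input_shape=(4306,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(25, activation="relu"))
model.add(Dropout(0.25))
model.add(Dense(1))  # linear output for a scalar regression target
model.compile(loss='mse', optimizer='adam', metrics=['mae'])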

Keras TensorBoard visualize Conv Kernels

I am using Keras with TensorFlow as the backend, and I want to use the TensorBoard callback to visualize my conv layer kernels.
But I can only see the first conv layer's kernels in TensorBoard, along with my Dense layers at the end.
For the other conv layers I can only see the bias values, not the kernels.
Here is my sample code for the Keras model.
tb = TensorBoard(
    log_dir=log_dir,
    histogram_freq=epochs,
    write_images=True)
# Define the DNN
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=3, input_shape=(width, height, depth), name="conv1"))
model.add(Activation("relu"))
model.add(Conv2D(filters=16, kernel_size=3, name="conv2"))
model.add(Activation("relu"))
model.add(MaxPool2D())
model.add(Conv2D(filters=32, kernel_size=3, name="conv3"))
model.add(Activation("relu"))
model.add(Conv2D(filters=32, kernel_size=3, name="conv4"))
model.add(Activation("relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(128))
model.add(Activation("relu"))
model.add(Dense(num_classes, name="features"))
model.add(Activation("softmax"))
# Print the DNN layers
model.summary()
# Train the DNN
lr = 1e-3
optimizer = Adam(lr=lr)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
model.fit(x_train, y_train, verbose=1,
          batch_size=batch_size, epochs=epochs,
          validation_data=(x_test, y_test),
          callbacks=[tb])
And this is what I see in TensorBoard (I minimized the kernels of my first conv layer):
(TensorBoard screenshot omitted.)
What am I missing to visualize all my kernels?
This is the expected (but not documented) behaviour of the TensorBoard callback. See the answer on this related issue on the TensorBoard GitHub page:
The TensorBoard Keras callback calls tf.summary.image without
overriding the default for max_outputs, so there’s no way to visualize
more than the first 3 kernels via the callback at this time.
You need to log the kernels yourself with your own call to tf.summary.image.
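One way to do that is a small custom callback that writes every Conv2D kernel as images at the end of each epoch. This is a sketch, not code from the original answer: ConvKernelLogger is a made-up name, and the kernels are averaged over input channels and min-max scaled so tf.summary.image can render them as grayscale images.
import tensorflow as tf

class ConvKernelLogger(tf.keras.callbacks.Callback):
    """Hypothetical helper: logs all Conv2D kernels to TensorBoard as images."""
    def __init__(self, log_dir):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)

    def on_epoch_end(self, epoch, logs=None):
        with self.writer.as_default():
            for layer in self.model.layers:
                if isinstance(layer, tf.keras.layers.Conv2D):
                    k = layer.kernel                       # (kh, kw, in_ch, out_ch)
                    imgs = tf.transpose(k, [3, 0, 1, 2])   # one image per filter
                    imgs = tf.reduce_mean(imgs, axis=-1, keepdims=True)  # grayscale
                    lo, hi = tf.reduce_min(imgs), tf.reduce_max(imgs)
                    imgs = (imgs - lo) / (hi - lo + 1e-8)  # scale into [0, 1]
                    tf.summary.image(layer.name + "/kernels", imgs,
                                     step=epoch, max_outputs=imgs.shape[0])
            self.writer.flush()
Pass it alongside (or instead of) the TensorBoard callback, e.g. callbacks=[tb, ConvKernelLogger(log_dir)].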

Why does a binary Keras CNN always predict 1?

I want to build a binary classifier using a Keras CNN.
I have about 6000 rows of input data, which look like this:
>> print(X_train[0])
[[[-1.06405307 -1.06685851 -1.05989663 -1.06273152]
[-1.06295958 -1.06655996 -1.05969803 -1.06382503]
[-1.06415248 -1.06735609 -1.05999593 -1.06302975]
[-1.06295958 -1.06755513 -1.05949944 -1.06362621]
[-1.06355603 -1.06636092 -1.05959873 -1.06173742]
[-1.0619655 -1.06655996 -1.06039312 -1.06412326]
[-1.06415248 -1.06725658 -1.05940014 -1.06322857]
[-1.06345662 -1.06377347 -1.05890365 -1.06034568]
[-1.06027557 -1.06019084 -1.05592469 -1.05537518]
[-1.05550398 -1.06038988 -1.05225064 -1.05676692]]]
>>> print(y_train[0])
[1]
And then I've build a CNN by this way:
model = Sequential()
model.add(Convolution1D(input_shape=(10, 4),
                        nb_filter=16,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Convolution1D(nb_filter=8,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(64))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(1))
model.add(Activation('softmax'))
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9, patience=30, min_lr=0.000001, verbose=0)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    nb_epoch=100,
                    batch_size=128,
                    verbose=0,
                    validation_data=(X_test, y_test),
                    callbacks=[reduce_lr],
                    shuffle=True)
y_pred = model.predict(X_test)
But it returns the following:
>> print(confusion_matrix(y_test, y_pred))
[[ 0 362]
[ 0 608]]
Why are all the predictions ones? Why does the CNN perform so badly?
Here are the loss and accuracy charts:
(Charts omitted.)
It always predicts one because of the output layer of your network: a Dense layer with a single neuron and a softmax activation. Softmax normalizes each output by the sum of the exponentials of all outputs, and since there is only one output, the only possible value is 1.0.
For a binary classifier you can either use a sigmoid activation with the binary_crossentropy loss, or put two output units in the last layer, keep the softmax, and change the loss to categorical_crossentropy.
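As a sketch of the first option (replacing the last two layers of the question's model, not the original poster's code):
model.add(Dense(1))
model.add(Activation('sigmoid'))   # outputs a probability in (0, 1)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Threshold the probabilities to get hard class labels for the confusion matrix
y_pred = (model.predict(X_test) > 0.5).astype(int)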
