Keras loss is not decreasing - python

I tried to train my NN on the Breast Cancer Wisconsin dataset
(I set the "id" column as the index and converted the "diagnosis" column to 0 and 1 with sklearn.preprocessing.LabelEncoder), but my NN is not reducing the loss.
I tried other optimizers and losses, but it isn't working.
That's my NN:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, InputLayer
import tensorflow.nn as tfnn
model = Sequential()
model.add(Dense(30, activation = tfnn.relu, input_dim = 30))
model.add(BatchNormalization(axis = 1))
model.add(Dense(60, activation = tfnn.relu))
model.add(BatchNormalization(axis = 1))
model.add(Dense(1, activation = tfnn.softmax))
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(data, target, epochs = 6)
And my output:
Epoch 1/6
569/569 [==============================] - 2s 3ms/sample - loss: 10.0025 - acc: 0.3726
Epoch 2/6
569/569 [==============================] - 0s 172us/sample - loss: 10.0025 - acc: 0.3726
Epoch 3/6
569/569 [==============================] - 0s 176us/sample - loss: 10.0025 - acc: 0.3726
Epoch 4/6
569/569 [==============================] - 0s 167us/sample - loss: 10.0025 - acc: 0.3726
Epoch 5/6
569/569 [==============================] - 0s 163us/sample - loss: 10.0025 - acc: 0.3726
Epoch 6/6
569/569 [==============================] - 0s 169us/sample - loss: 10.0025 - acc: 0.3726
It seems that the NN stops learning after a few iterations (look at the epoch times: the first epoch takes 2s while the others take 0s, and the processing speed is reported in ms/sample for the first epoch but in us/sample for the rest).
Thank you for your time!

Softmax outputs sum to 1.
You can't use softmax with 1 unit: its output will always be 1.
Use 'sigmoid'.
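A quick way to see this (an illustrative snippet, not part of the original answer):
import tensorflow as tf
# softmax over the last axis of a (batch, 1) tensor: each row must sum to 1,
# and with a single unit that means every output is exactly 1.0
print(tf.nn.softmax([[2.0], [-3.0], [10.0]]).numpy())  # [[1.], [1.], [1.]]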
Also be careful with 'relu'. It may (by bad luck) fall into an "all-zeros" region and stop evolving.
Ideally, the batch normalization should come before the relu (this way you guarantee that there will always be some positive numbers):
from tensorflow.keras.layers import Activation  # needed for the Activation layers below (not imported in the question)
model = Sequential()
model.add(Dense(30, input_dim = 30))
model.add(BatchNormalization(axis = 1))
model.add(Activation(tfnn.relu))
model.add(Dense(60))
model.add(BatchNormalization(axis = 1))
model.add(Activation(tfnn.relu))
model.add(Dense(1, activation = tfnn.sigmoid))

Since you have a binary classification task with a single-unit final layer, you should not use tfnn.softmax as an activation for this layer. Use tfnn.sigmoid instead, i.e.
model.add(Dense(1, activation = tfnn.sigmoid)) # last layer
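For completeness, a sketch of the question's model with only that final activation changed (everything else kept as in the question, assuming the same data and target arrays):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
import tensorflow.nn as tfnn
model = Sequential()
model.add(Dense(30, activation = tfnn.relu, input_dim = 30))
model.add(BatchNormalization(axis = 1))
model.add(Dense(60, activation = tfnn.relu))
model.add(BatchNormalization(axis = 1))
model.add(Dense(1, activation = tfnn.sigmoid))  # sigmoid instead of softmax for the single-unit binary output
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(data, target, epochs = 6)  # data/target as defined in the question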

how to define input_shape in keras model properly

I'm new to tensorflow.
Following is the dataset I am working on:
abalone_train = pd.read_csv(
"https://storage.googleapis.com/download.tensorflow.org/data/abalone_train.csv",
names=["Length", "Diameter", "Height", "Whole weight", "Shucked weight",
"Viscera weight", "Shell weight", "Age"])
abalone_train.head()
abalone_cols = abalone_train.columns
y_train = abalone_train[abalone_cols[-1]]
x_train = abalone_train[abalone_cols[:-1]]
I tried 2 iterations of model:
1st iteration:
model = tf.keras.models.Sequential([
tf.keras.layers.InputLayer(input_shape = (None,7)),
tf.keras.layers.Dense(20, activation='relu'),
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(2, activation = 'relu'),
tf.keras.layers.Dense(1, activation = 'relu'),
]
)
model.compile(optimizer = 'sgd', loss = 'mean_squared_error')
x_train_np = np.array(x_train)
y_train_np = np.array(y_train)
modelcheck = model.fit(x_train_np, y_train_np, epochs = 5)
2nd iteration:
Similar to 1st one, but I only changed the input_shape:
model = tf.keras.models.Sequential([
tf.keras.layers.InputLayer(input_shape = (7,)),
tf.keras.layers.Dense(20, activation='relu'),
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(2, activation = 'relu'),
tf.keras.layers.Dense(1, activation = 'relu'),
]
)
model.compile(optimizer = 'sgd', loss = 'mean_squared_error')
x_train_np = np.array(x_train)
y_train_np = np.array(y_train)
modelcheck = model.fit(x_train_np, y_train_np, epochs = 5)
It looks like in the first iteration I get a constant loss of about 108.2 across all epochs:
104/104 [==============================] - 1s 4ms/step - loss: 108.2235
Epoch 2/5
104/104 [==============================] - 0s 4ms/step - loss: 108.2235
Epoch 3/5
104/104 [==============================] - 0s 4ms/step - loss: 108.2235
Epoch 4/5
104/104 [==============================] - 0s 4ms/step - loss: 108.2235
Epoch 5/5
104/104 [==============================] - 0s 4ms/step - loss: 108.2235
In the 2nd one, the code is working fine and I am getting a loss as follows:
Epoch 1/5
104/104 [==============================] - 1s 5ms/step - loss: 13.9729
Epoch 2/5
104/104 [==============================] - 0s 4ms/step - loss: 8.0497
Epoch 3/5
104/104 [==============================] - 0s 4ms/step - loss: 7.4067
Epoch 4/5
104/104 [==============================] - 0s 4ms/step - loss: 6.9215
Epoch 5/5
104/104 [==============================] - 0s 5ms/step - loss: 6.5436
I don't seem to understand how keras is treating these two iterations differently. From what I have read, even if I put 'None' at the beginning, it should not matter as it is the 'batch_size'.
Am I missing something here?! Any guidance would be really helpful!
In the input layer you don't define the batch size. You just define the shape of the input, excluding the batch size. Keras automatically adds the None value in the front of the shape of each layer, which is later replaced by the batch size.
So in the 1st iteration, you have an incorrect input shape: input_shape=(None, 7) tells Keras that each individual sample has shape (None, 7). Your data is made up of rows of 7 elements, i.e. each sample is a vector of 7 features, so the correct input shape is (7,).
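To make the difference concrete, here is a small illustrative sketch (not from the original answer) comparing the shapes Keras builds for the two input_shape values:
import tensorflow as tf
# input_shape excludes the batch dimension; Keras prepends None for the batch itself
m1 = tf.keras.Sequential([tf.keras.layers.InputLayer(input_shape=(None, 7)),
                          tf.keras.layers.Dense(20)])
m2 = tf.keras.Sequential([tf.keras.layers.InputLayer(input_shape=(7,)),
                          tf.keras.layers.Dense(20)])
print(m1.input_shape)  # (None, None, 7) -> each sample is itself 2-D, not a row of 7 values
print(m2.input_shape)  # (None, 7)       -> each sample is a vector of 7 features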

Keras weighted_metrics does not include sample weights in calculation [closed]

I am training a CNN model with a 2D tensor of shape (400,22) as both input and output. I am using categorical_crossentropy both as loss and metric. However the loss/metrics values are very different.
My model is somewhat like this:
1. Using sample weights, and passing metrics with metrics= in model.compile.
# Imports
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.regularizers import *
from tensorflow.keras import *
import numpy as np
# Build the model
X_input = Input(shape=(400,22))
X = Conv1D(filters=32, kernel_size=2, activation='elu',
kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4),
padding='same')(X_input)
X = Dropout(0.2)(X)
X = Conv1D(filters=32, kernel_size=2, activation='elu',
kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4),
padding='same')(X)
X = Dropout(0.2)(X)
y = Conv1D(filters=22, kernel_size=1, activation='softmax',
kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4),
padding='same')(X)
model = Model(X_input, y, name='mymodel')
# Compile and train the model (with metrics=[])
model.compile(optimizer=Adam(1e-3),
loss=tf.keras.losses.categorical_crossentropy,
metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(20,400,22)
ytrain = np.random.rand(20,400,22)
np.random.seed(0)
sample_weight = np.random.choice([0.01, 0.1, 1], size=20)
history = model.fit(x=Xtrain, y=ytrain, sample_weight=sample_weight, epochs=4)
Epoch 1/4
1/1 [==============================] - 0s 824us/step - loss: 10.2952 - categorical_crossentropy: 34.9296
Epoch 2/4
1/1 [==============================] - 0s 785us/step - loss: 10.2538 - categorical_crossentropy: 34.7858
Epoch 3/4
1/1 [==============================] - 0s 772us/step - loss: 10.2181 - categorical_crossentropy: 34.6719
Epoch 4/4
1/1 [==============================] - 0s 766us/step - loss: 10.1903 - categorical_crossentropy: 34.5797
From the results, it is evident that Keras is not using the sample weights in the calculation of the metric, which is why the metric is larger than the loss. If we change the sample weights to ones, we get the following:
2. Sample weights = ones, passing metrics with metrics= in model.compile.
# Compile and train the model
model.compile(optimizer=Adam(1e-3),
loss=tf.keras.losses.categorical_crossentropy,
metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(20,400,22)
ytrain = np.random.rand(20,400,22)
np.random.seed(0)
sample_weight = np.ones((20,))
history = model.fit(x=Xtrain, y=ytrain, sample_weight=sample_weight, epochs=4)
Epoch 1/4
1/1 [==============================] - 0s 789us/step - loss: 35.2659 - categorical_crossentropy: 35.2573
Epoch 2/4
1/1 [==============================] - 0s 792us/step - loss: 35.0647 - categorical_crossentropy: 35.0562
Epoch 3/4
1/1 [==============================] - 0s 778us/step - loss: 34.9301 - categorical_crossentropy: 34.9216
Epoch 4/4
1/1 [==============================] - 0s 736us/step - loss: 34.8076 - categorical_crossentropy: 34.7991
Now the metrics and loss are quite close with sample weights of ones. I understand that the loss is slightly larger than metrics due to the effects of dropout, regularization, and the fact that the metric is computed at the end of each epoch, whereas the loss is the average over the batches in the training.
How can I get the metrics to include the sample weights?
3. UPDATED: using sample weights, and passing metrics with weighted_metrics= in model.compile.
It was suggested that I use weighted_metrics=[...] instead of metrics=[...] in model.compile. However, Keras still does not seem to include the sample weights in the evaluation of the metrics.
# Compile and train the model
model.compile(optimizer=Adam(1e-3),
loss=tf.keras.losses.categorical_crossentropy,
weighted_metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(20,400,22)
ytrain = np.random.rand(20,400,22)
np.random.seed(0)
sample_weight = np.random.choice([0.01, 0.1, 1], size=20)
history = model.fit(x=Xtrain, y=ytrain, sample_weight=sample_weight, epochs=4)
Epoch 1/4
1/1 [==============================] - 0s 764us/step - loss: 10.2581 - categorical_crossentropy: 34.9224
Epoch 2/4
1/1 [==============================] - 0s 739us/step - loss: 10.2251 - categorical_crossentropy: 34.8100
Epoch 3/4
1/1 [==============================] - 0s 755us/step - loss: 10.1854 - categorical_crossentropy: 34.6747
Epoch 4/4
1/1 [==============================] - 0s 746us/step - loss: 10.1631 - categorical_crossentropy: 34.5990
What can be done to ensure that the sample weights are evaluated in the metrics?
Keras does not automatically include sample weights in the evaluation of metrics. That's why there is a huge difference between the loss and the metrics.
If you'll like to include sample weights when evaluating metrics, pass them as weighted_metrics rather than metrics.
model.compile(optimizer=Adam(1e-3),
loss=tf.keras.losses.categorical_crossentropy,
weighted_metrics=[tf.keras.losses.categorical_crossentropy])
First of all, categorical cross-entropy is usually not used as a metric. Secondly, you are doing some kind of seq2seq task; I hope you designed the model with that intention.
Finally, in your setup, sample_weight only affects the loss; it has no effect on the metrics or on validation. There are other small bugs in your code too. Here is the fixed working code:
ref: TF 2.3.0 training keras model using tf dataset with sample weights does not apply to metrics (why sample_weight only works on loss)
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
from tensorflow.keras import *
import numpy as np
X_input = Input(shape=(400,22))
X = Conv1D(filters=32, kernel_size=2, activation='elu', kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4), padding='same')(X_input)
X = Dropout(0.2)(X)
X = Conv1D(filters=32, kernel_size=2, activation='elu', kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4), padding='same')(X)
X = Dropout(0.2)(X)
y = Conv1D(filters=22, kernel_size=1, activation='softmax', kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4), padding='same')(X)
model = Model(X_input, y, name='mymodel')
model.compile(optimizer=Adam(1e-3), loss=tf.keras.losses.categorical_crossentropy,
metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(10,400,22)
ytrain = np.random.rand(10,400,22)
history = model.fit(Xtrain, ytrain, sample_weight=np.ones(10), epochs=10)
Epoch 1/10
1/1 [==============================] - 1s 719ms/step - loss: 35.4521 - categorical_crossentropy: 35.4437
Epoch 2/10
1/1 [==============================] - 0s 20ms/step - loss: 35.5138 - categorical_crossentropy: 35.5054
Epoch 3/10
1/1 [==============================] - 0s 19ms/step - loss: 35.5984 - categorical_crossentropy: 35.5900
Epoch 4/10
1/1 [==============================] - 0s 19ms/step - loss: 35.6617 - categorical_crossentropy: 35.6533
Epoch 5/10
1/1 [==============================] - 0s 19ms/step - loss: 35.7807 - categorical_crossentropy: 35.7723
Epoch 6/10
1/1 [==============================] - 0s 19ms/step - loss: 35.9045 - categorical_crossentropy: 35.8961
Epoch 7/10
1/1 [==============================] - 0s 18ms/step - loss: 36.0590 - categorical_crossentropy: 36.0505
Epoch 8/10
1/1 [==============================] - 0s 19ms/step - loss: 36.2040 - categorical_crossentropy: 36.1956
Epoch 9/10
1/1 [==============================] - 0s 18ms/step - loss: 36.4169 - categorical_crossentropy: 36.4084
Epoch 10/10
1/1 [==============================] - 0s 32ms/step - loss: 36.6622 - categorical_crossentropy: 36.6538
Here, if you use no sample_weight, or a weight of 1 for each sample, you will get very similar values for the loss and the categorical cross-entropy metric.
Use weighted_metrics according to the docs.
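One rough way to sanity-check what a sample-weighted metric should look like is to compute it by hand (an illustrative sketch using the model, Xtrain, ytrain, and sample_weight from the question; the exact reduction Keras applies internally may differ slightly):
import numpy as np
import tensorflow as tf
ypred = model.predict(Xtrain)                                                # (num_samples, 400, 22)
per_step = tf.keras.losses.categorical_crossentropy(ytrain, ypred).numpy()  # (num_samples, 400): crossentropy per time step
per_sample = per_step.mean(axis=-1)                                          # (num_samples,): mean crossentropy per sample
print(per_sample.mean())                                                     # unweighted mean, close to the plain metric
print((per_sample * sample_weight).sum() / sample_weight.sum())              # weighted mean, what a weighted metric should track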

Using Keras to give a score for combination of words

I have a file with a set of "words" for example:
1a 9( 9j = 2453
3a 4( 6j 0s = 2309
1 7( 8ll = 4934
It looks like random data but it isn't; it has a score for each set of "words". My file consists of about 1 million lines and there are definitely patterns in it. There are about 3600 unique individual words.
The end column contains a score for that particular arrangement of words.
I have encoded each line to ints and padded them with 0's and put them in a file called words.txt
an example of that file would be:
475,12,2495,2934,105,0,0,0,9384 (last column being the output score)
Now I have this code:
When I run it, its loss/accuracy is very bad; the loss is something like 70000000.
What am I doing wrong?
from numpy import loadtxt
from itertools import islice
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = loadtxt('words.txt', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=10000)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
My goal is to predict the score of random combinations of words that I generate.
Log of fit:
:\AI>python main.py
Using TensorFlow backend.
2021-02-14 08:52:48.350476: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
C:\Users\fordy\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/150
1047711/1047711 [==============================] - 7s 7us/step - loss: 72945595445.1503 - accuracy: 0.2351
Epoch 2/150
1047711/1047711 [==============================] - 3s 3us/step - loss: 72940365091.2725 - accuracy: 0.0016
Epoch 3/150
1047711/1047711 [==============================] - 3s 3us/step - loss: 72922327250.8712 - accuracy: 0.0016
Epoch 4/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72883151430.7776 - accuracy: 0.0030
Epoch 5/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72815216732.1170 - accuracy: 0.0041
Epoch 6/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72711719248.6156 - accuracy: 0.0012
Epoch 7/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72566884174.8089 - accuracy: 1.5271e-05
Your model is too small. Try adding an embedding layer and an LSTM:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(tf.keras.layers.Embedding(3600, 12, input_length=8)) # <= adjust vocab size to your encoding
model.add(tf.keras.layers.LSTM(8))
# model.add(tf.keras.layers.Dense(12, input_dim=8, activation='relu'))
# model.add(tf.keras.layers.Dense(8, activation='relu'))
model.add(Dense(1))
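A minimal sketch of how this suggested model might be compiled and trained, reusing the X and y arrays and the settings from the question (accuracy is omitted here because the score is a continuous target):
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=150, batch_size=10000)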

How to control if input features contribute exclusively to one neuron in subsequent layer of a Tensorflow neural network?

I'm trying to make the most basic of basic neural networks to get familiar with functional API in Tensorflow 2.x.
Basically what I'm trying to do is the following with my simplified iris dataset (i.e. setosa or not)
Use the 4 features as input
Dense layer of 3
Sigmoid activation function
Dense layer of 2 (one for each class)
Softmax activation
Binary cross entropy / log-loss as my loss function
However, I can't figure out how to control one key aspect of the model. That is, how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer? Also, how can I allow a feature to contribute to more than one neuron?
This isn't clear to me from the documentation.
# Load data
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
X, y = load_iris(return_X_y=True, as_frame=True)
X = X.astype("float32")
X.index = X.index.map(lambda i: "iris_{}".format(i))
X.columns = X.columns.map(lambda j: j.split(" (")[0].replace(" ","_"))
y.index = X.index
y = y.map(lambda i:iris.target_names[i])
y_simplified = y.map(lambda i: {True:1, False:0}[i == "setosa"])
y_simplified = pd.get_dummies(y_simplified, columns=["setosa", "not_setosa"])
# Train test split
from sklearn.model_selection import train_test_split
seed=0
X_train,X_test, y_train,y_test= train_test_split(X,y_simplified, test_size=0.3, random_state=seed)
# Simple neural network
import tensorflow as tf
tf.random.set_seed(seed)
# Input[4 features] -> Dense layer of 3 neurons -> Activation function -> Dense layer of 2 (one per class) -> Softmax
inputs = tf.keras.Input(shape=(4))
x = tf.keras.layers.Dense(3)(inputs)
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="simple_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"] )
model.summary()
history = model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2)
test_scores = model.evaluate(X_test, y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Results:
Model: "simple_binary_iris"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_44 (InputLayer) [(None, 4)] 0
_________________________________________________________________
dense_96 (Dense) (None, 3) 15
_________________________________________________________________
activation_70 (Activation) (None, 3) 0
_________________________________________________________________
dense_97 (Dense) (None, 2) 8
_________________________________________________________________
activation_71 (Activation) (None, 2) 0
=================================================================
Total params: 23
Trainable params: 23
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2/2 [==============================] - 0s 40ms/step - loss: 0.6344 - accuracy: 0.6667 - val_loss: 0.6107 - val_accuracy: 0.7143
Epoch 2/10
2/2 [==============================] - 0s 6ms/step - loss: 0.6302 - accuracy: 0.6667 - val_loss: 0.6083 - val_accuracy: 0.7143
Epoch 3/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6278 - accuracy: 0.6667 - val_loss: 0.6056 - val_accuracy: 0.7143
Epoch 4/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6257 - accuracy: 0.6667 - val_loss: 0.6038 - val_accuracy: 0.7143
Epoch 5/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6239 - accuracy: 0.6667 - val_loss: 0.6014 - val_accuracy: 0.7143
Epoch 6/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6223 - accuracy: 0.6667 - val_loss: 0.6002 - val_accuracy: 0.7143
Epoch 7/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6209 - accuracy: 0.6667 - val_loss: 0.5989 - val_accuracy: 0.7143
Epoch 8/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6195 - accuracy: 0.6667 - val_loss: 0.5967 - val_accuracy: 0.7143
Epoch 9/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6179 - accuracy: 0.6667 - val_loss: 0.5953 - val_accuracy: 0.7143
Epoch 10/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6166 - accuracy: 0.6667 - val_loss: 0.5935 - val_accuracy: 0.7143
2/2 [==============================] - 0s 607us/step - loss: 0.6261 - accuracy: 0.6444
Test loss: 0.6261375546455383
Test accuracy: 0.644444465637207
how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer?
Have one input layer per feature and feed each input layer to a separate dense layer. Later you can concatenate the output of all the dense layers and proceed.
NOTE: One neuron can take any size input (in this case the input size is 1, as you want one feature to be used by the neuron) and the output size is always 1. A Dense layer with n units will have n neurons and so will have an output size of n.
Working Sample
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Model architecutre
x1 = tf.keras.Input(shape=(1,))
x2 = tf.keras.Input(shape=(1,))
x3 = tf.keras.Input(shape=(1,))
x4 = tf.keras.Input(shape=(1,))
x1_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x1)
x2_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x2)
x3_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x3)
x4_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x4)
merged = tf.keras.layers.concatenate([x1_, x2_, x3_, x4_])
merged = tf.keras.layers.Dense(16, activation=tf.nn.relu)(merged)
outputs = tf.keras.layers.Dense(3, activation=tf.nn.softmax)(merged)
model = tf.keras.Model(inputs=[x1,x2,x3,x4], outputs=outputs)
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"] )
# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
X_train,X_test, y_train,y_test= train_test_split(X,y, test_size=0.3)
# Fit the model
model.fit([X_train[:,0],X_train[:,1],X_train[:,2],X_train[:,3]], y_train, batch_size=64, epochs=100, validation_split=0.25)
# Evaluate the model
test_scores = model.evaluate([X_test[:,0],X_test[:,1],X_test[:,2],X_test[:,3]], y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Output:
Epoch 1/100
2/2 [==============================] - 0s 75ms/step - loss: 1.6446 - accuracy: 0.4359 - val_loss: 1.6809 - val_accuracy: 0.5185
Epoch 2/100
2/2 [==============================] - 0s 10ms/step - loss: 1.4151 - accuracy: 0.6154 - val_loss: 1.4886 - val_accuracy: 0.5556
Epoch 3/100
2/2 [==============================] - 0s 9ms/step - loss: 1.2725 - accuracy: 0.6795 - val_loss: 1.3813 - val_accuracy: 0.5556
Epoch 4/100
2/2 [==============================] - 0s 9ms/step - loss: 1.1829 - accuracy: 0.6795 - val_loss: 1.2779 - val_accuracy: 0.5926
Epoch 5/100
2/2 [==============================] - 0s 10ms/step - loss: 1.0994 - accuracy: 0.6795 - val_loss: 1.1846 - val_accuracy: 0.5926
Epoch 6/100
.................. [ Truncated ]
Epoch 100/100
2/2 [==============================] - 0s 2ms/step - loss: 0.4049 - accuracy: 0.9333
Test loss: 0.40491223335266113
Test accuracy: 0.9333333373069763
Pictorial representation of the above model architecture
Dense layers in Keras/TF are fully connected layers. For example, when you use a Dense layer as follows
inputs = tf.keras.Input(shape=(4))
x = tf.keras.layers.Dense(3)(inputs)
all 4 input neurons are connected to all 3 output neurons.
There isn't any predefined layer in Keras/TF to specify how to connect input and output neurons. However, Keras/TF is very flexible in that it allows you to define your custom layers easily.
Borrowing the idea from this answer, you could define a CustomConnected layer as follows:
class CustomConnected(tf.keras.layers.Dense):
def __init__(self, units, connections, **kwargs):
self.connections = connections
super(CustomConnected, self).__init__(units, **kwargs)
def call(self, inputs):
self.kernel = self.kernel * self.connections
return super(CustomConnected, self).call(inputs)
Using this layer, you can then specify the connections between two layers through the connections argument. For example:
inputs = tf.keras.Input(shape=(4))
connections = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
x = CustomConnected(3, connections)(inputs)
Here, the 1st, 2nd, and 3rd input neurons are connected to the 1st, 2nd, and 3rd output neurons, respectively. Additionally, the 4th input neuron is connected to the 3rd output neuron.
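The custom layer then slots into the question's model in place of its first Dense layer (an illustrative continuation of the snippet above, mirroring the architecture from the question):
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="simple_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"])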
UPDATE: As discussed in the comments section, an adaptive approach (e.g. by using only the maximum weight for each output neuron) is also possible but not recommended. You could implement this via the following layer:
class CustomSparse(tf.keras.layers.Dense):
def __init__(self, units, **kwargs):
super(CustomSparse, self).__init__(units, **kwargs)
def call(self, inputs):
nb_in, nb_out = self.kernel.shape
argmax = tf.argmax(self.kernel, axis=0) # Shape=(nb_out,)
argmax_onehot = tf.transpose(tf.one_hot(argmax, depth=nb_in)) # Shape=(nb_in, nb_out)
kernel_max = self.kernel * argmax_onehot
# tf.print(kernel_max) # Uncomment this line to print the weights
out = tf.matmul(inputs, kernel_max)
if self.bias is not None:
out += self.bias
if self.activation is not None:
out = self.activation(out)
return out
The main issue of this approach is that you cannot propagate gradients through the argmax operation required to select the maximum weight. As a result, the network will only "switch input neurons" when the selected weight is no longer the maximum weight.

Keras Concatenated Model Doesn't learn

I'm trying to build a model that can predict emotions using 7 concatenated models.
Each of the 7 models represents a part of the face: mouth, left_eye, right_eye, etc.
The problem is that the model doesn't learn at all: from the 2nd epoch to the last one (100) I get 15% accuracy, with no change in accuracy or loss across the epochs.
I think maybe the problem is in my concatenated model or in my fit function (the train and label data).
There are 7 emotions: sad, angry, happy, etc.
Here are my model, my compile and train code, and my datasets.
Model
from keras.layers import Conv2D, MaxPooling2D, Input, concatenate
from keras.models import Sequential, Model
from keras.layers.core import Dense, Dropout, Flatten
def build_all_faceparts_model(input_shape,batch_shape,num_classes):
input1=Input(input_shape)
input2=Input(input_shape)
input3=Input(input_shape)
input4=Input(input_shape)
input5=Input(input_shape)
input6=Input(input_shape)
input7=Input(input_shape)
# Create the model for right eye
right_eye=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input1, batch_input_shape = batch_shape) (input1)
right_eye=MaxPooling2D(pool_size=(2, 2))(right_eye)
right_eye=Dropout(0.25)(right_eye)
right_eye=Flatten()(right_eye)
# Create the model for leftt eye
left_eye=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input2, batch_input_shape = batch_shape) (input2)
left_eye=MaxPooling2D(pool_size=(2, 2))(left_eye)
left_eye=Dropout(0.25)(left_eye)
left_eye=Flatten()(left_eye)
# Create the model for right eyebrow
right_eyebrow=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input3, batch_input_shape = batch_shape) (input3)
right_eyebrow=MaxPooling2D(pool_size=(2, 2))(right_eyebrow)
right_eyebrow=Dropout(0.25)(right_eyebrow)
right_eyebrow=Flatten()(right_eyebrow)
# Create the model for leftt eye
left_eyebrow=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input4, batch_input_shape = batch_shape) (input4)
left_eyebrow=MaxPooling2D(pool_size=(2, 2))(left_eyebrow)
left_eyebrow=Dropout(0.25)(left_eyebrow)
left_eyebrow=Flatten()(left_eyebrow)
# Create the model for mouth
mouth=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input5, batch_input_shape = batch_shape) (input5)
mouth=MaxPooling2D(pool_size=(2, 2))(mouth)
mouth=Dropout(0.25)(mouth)
mouth=Flatten()(mouth)
# Create the model for nose
nose=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input6, batch_input_shape = batch_shape) (input6)
nose=MaxPooling2D(pool_size=(2, 2))(nose)
nose=Dropout(0.25)(nose)
nose=Flatten()(nose)
# Create the model for jaw
jaw=Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input7, batch_input_shape = batch_shape) (input7)
jaw=MaxPooling2D(pool_size=(2, 2))(jaw)
jaw=Dropout(0.25)(jaw)
jaw=Flatten()(jaw)
concatenated = concatenate([right_eye, left_eye, right_eyebrow, left_eyebrow, mouth, nose, jaw],axis = -1)
out = Dense(num_classes, activation='softmax')(concatenated)
model = Model([input1,input2,input3,input4,input5,input6,input7], out)
return model
Train and test datasets (here X_train_all is a list of datasets, unlike y_train_all):
X_train_all=[X_train_mouth,X_train_right_eyebrow,X_train_left_eyebrow,X_train_right_eye,X_train_left_eye,X_train_nose,X_train_jaw]
X_test_all=[X_test_mouth,X_test_right_eyebrow,X_test_left_eyebrow,X_test_right_eye,X_test_left_eye,X_test_nose,X_test_jaw]
y_train_all=y_train_mouth+y_train_right_eyebrow+y_train_left_eyebrow+y_train_right_eye+y_train_left_eye+y_train_nose+y_train_jaw
y_test_all=y_test_mouth+y_test_right_eyebrow+y_test_left_eyebrow+y_test_right_eye+y_test_left_eye+y_test_nose+y_test_jaw
compile
from keras.optimizers import Adam
input_shape =X_train_mouth[0].shape
batch_shape = X_train_mouth[0].shape
model_all_faceparts=build_all_faceparts_model(input_shape,batch_shape,7)
#Compile Model
model_all_faceparts.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-3),metrics=["accuracy"])
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=3)
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0, patience=15, mode='auto')
checkpointer = ModelCheckpoint(current_dir+'/weights_jaffe.hd5', monitor='val_loss', verbose=1, save_best_only=True)
Train
history=model_all_faceparts.fit(
X_train_all, y_train_all, batch_size=7, epochs=100, verbose=1,callbacks=[lr_reducer, checkpointer, early_stopper])
output
Epoch 1/100
181/181 [==============================] - 19s 107ms/step - loss: 94.6603 - acc: 0.1271
Epoch 2/100
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:1109: RuntimeWarning: Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,acc,lr
(self.monitor, ','.join(list(logs.keys()))), RuntimeWarning
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:434: RuntimeWarning: Can save best model only with val_loss available, skipping.
'skipping.' % (self.monitor), RuntimeWarning)
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:569: RuntimeWarning: Early stopping conditioned on metric `val_acc` which is not available. Available metrics are: loss,acc,lr
(self.monitor, ','.join(list(logs.keys()))), RuntimeWarning
181/181 [==============================] - 15s 81ms/step - loss: 95.9962 - acc: 0.1492
Epoch 3/100
181/181 [==============================] - 15s 81ms/step - loss: 95.9962 - acc: 0.1492
Epoch 4/100
181/181 [==============================] - 15s 83ms/step - loss: 95.9962 - acc: 0.1492
Epoch 5/100
181/181 [==============================] - 15s 84ms/step - loss: 95.9962 - acc: 0.1492
Epoch 6/100
181/181 [==============================] - 15s 85ms/step - loss: 95.9962 - acc: 0.1492
Epoch 7/100
181/181 [==============================] - 16s 86ms/step - loss: 95.9962 - acc: 0.1492
Epoch 8/100
181/181 [==============================] - 16s 87ms/step - loss: 95.9962 - acc: 0.1492
Epoch 9/100
181/181 [==============================] - 16s 86ms/step - loss: 95.9962 - acc: 0.1492
Epoch 10/100
(I completely forgot about this post.)
The problem was in the model itself; I just changed the model (added some layers) and everything was fine, ending up at 93% accuracy!
PS: thanks to the TensorFlow support guy who reminded me to post an answer.
