I am building a custom Keras layer that is essentially the softmax function with a trainable base parameter. While the layer works on its own, when placed inside a sequential model, model.summary() reports its output shape as None and model.fit() raises a presumably related exception:
ValueError: as_list() is not defined on an unknown TensorShape.
In other custom layers (including, obviously, the Linear example from the Keras docs) the output shape can be determined after .build() is called. Looking at the source code of model.summary(), as well as keras.layers.Layer, there is a @property Layer.output_shape that fails to determine the output shape automatically.
I then tried overriding the property and manually returning the input_shape argument passed to my layer's .build() method after saving it (softmax does not change the shape of its input), but this didn't work either: if I call super().output_shape before returning my value, model.summary() shows the shape as ?, and if I don't, the value may look correct, but in both cases I get the exact same error during .fit().
Is there something special about the code inside call() that prevents Keras from understanding the shape of the output?
Alternatively, is there a piece of documentation I have missed?
My layer:
class B_Softmax(keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super(B_Softmax, self).__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)
        self._out_shape = None

    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value = self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )
        self._out_shape = input_shape

    def call(self, inputs):
        # This is an implementation of Softmax for batched inputs
        # where the factor b is added to the exponents
        nominators = tf.math.exp(self.b * inputs)
        denominator = tf.reduce_sum(nominators, axis=1)
        denominator = tf.squeeze(denominator)
        denominator = tf.expand_dims(denominator, -1)
        s = tf.divide(nominators, denominator)
        return s

    @property
    def output_shape(self):        # If I comment out this function, summary prints 'None'
        super().output_shape       # If I leave this line, summary prints '?'
        return self._out_shape     # If the above line is commented out, summary prints '10' (correctly)
                                   # but the same error is triggered in all three cases
The layer works on its own:
>>> A = tf.constant([[1,2,3], [7,5,6]], dtype="float32")
>>> layer = B_Softmax(1.0)
>>> layer(A)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.08991686, 0.24461554, 0.6654676 ],
[0.6654677 , 0.08991687, 0.24461551]], dtype=float32)>
But when I try to include it inside a model, the summary doesn't look right:
input_dim = 5
model = keras.Sequential([
    Dense(32, activation='relu', input_shape=(input_dim,)),
    Dense(num_classes, activation="softmax"),
    B_Softmax(1.0)
])
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_10 (Dense) (None, 32) 192
dense_11 (Dense) (None, 10) 330
b__softmax_18 (B_Softmax) None  <-------------------- "None", "?", or "10" (in a hacky way) may be printed
=================================================================
Total params: 523
Trainable params: 523
Non-trainable params: 0
And training fails:
batch_size = 128
epochs = 15
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
ValueError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step **
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 894, in train_step
return self.compute_metrics(x, y, y_pred, sample_weight)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 987, in compute_metrics
self.compiled_metrics.update_state(y, y_pred, sample_weight)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 480, in update_state
self.build(y_pred, y_true)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 398, in build
y_pred)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 526, in _get_metric_objects
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 526, in <listcomp>
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 548, in _get_metric_object
y_p_rank = len(y_p.shape.as_list())
ValueError: as_list() is not defined on an unknown TensorShape.
This doesn't directly solve the issue, but rather side-steps it: instead of using squeeze and expand_dims, the former of which TensorFlow seems unable to track the shape through, we use keepdims=True in the summation to keep the axes aligned correctly for the softmax denominator.
def call(self, inputs):
    # This is an implementation of Softmax for batched inputs
    # where the factor b is added to the exponents
    nominators = tf.math.exp(self.b * inputs)
    denominator = tf.reduce_sum(nominators, axis=1, keepdims=True)
    s = tf.divide(nominators, denominator)
    return s
Arguably, it would be much preferable to make use of the built-in softmax:
def call(self, inputs):
    return tf.nn.softmax(self.b * inputs)
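With either of these versions the problematic squeeze is gone, so the output of call() keeps a fully defined static shape, and model.summary() and model.fit() should behave as expected.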
You could implement the compute_output_shape method in your Layer subclass:
def compute_output_shape(self, input_shape):
    return [(None, out_shape)]
Where out_shape contains the dimensionality of the output, or you can replace the whole tuple to have any output shape you want.
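For illustration, here is a minimal sketch of how that could look inside the B_Softmax layer from the question (an assumption, combining it with the built-in softmax from the answer above; since softmax preserves the shape of its input, returning input_shape unchanged is enough):

class B_Softmax(tf.keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super().__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)

    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value=self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )

    def call(self, inputs):
        return tf.nn.softmax(self.b * inputs)

    def compute_output_shape(self, input_shape):
        # softmax does not change the shape of its input
        return input_shape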
I found no problem with the code itself; I think the input parameters are what matter here. A few remarks on the changes I made:
I use a tf.data dataset with model.fit(), and I use it without a validation split because I create a sample dataset with a single record.
I changed the model input from Input(5,) to Input(1, 5) so that it matches the shape the dataset creates (that is why an extra dimension of 1 appears in the summary below).
The batch size, number of classes, loss function and optimizer are adjusted to the output dimensions of the network through the num_classes parameter.
Sample: the loss function does not decide whether the model works, the input and output shapes do; parts of the original code that are not used have been removed or commented out.
import tensorflow as tf
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Class / Definition
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class B_Softmax(tf.keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super(B_Softmax, self).__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)
        self._out_shape = None

    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value = self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )
        self._out_shape = input_shape

    def call(self, inputs):
        # This is an implementation of Softmax for batched inputs
        # where the factor b is added to the exponents
        nominators = tf.math.exp(self.b * inputs)
        denominator = tf.reduce_sum(nominators, axis=1)
        denominator = tf.squeeze(denominator)
        denominator = tf.expand_dims(denominator, -1)
        s = tf.divide(nominators, denominator)
        return s

    # @property
    # def output_shape(self):        # If I comment out this function, summary prints 'None'
    #     super().output_shape       # If I leave this line, summary prints '?'
    #     return self._out_shape     # If the above line is commented out, summary prints '10' (correctly)
    #                                # but the same error is triggered in all three cases
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
A = tf.constant([[1,2,3], [7,5,6]], dtype="float32")
batch_size = 128
epochs = 15
input_dim = 5
num_classes = 1
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Dataset
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
start = 3
limit = 16
delta = 3
sample = tf.range( start, limit, delta )
sample = tf.cast( sample, dtype=tf.float32 )
sample = tf.constant( sample, shape=( 1, 1, 1, 5 ) )
dataset = tf.data.Dataset.from_tensor_slices(( sample, tf.constant( [0], shape=( 1, 1, 1, 1 ), dtype=tf.int64)))
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
layer = B_Softmax(1.0)
print( layer(A) )
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(1, input_dim)),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
    B_Softmax(1.0)
])
model.summary()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Working
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model.fit(dataset, batch_size=batch_size, epochs=epochs, validation_data=dataset)
Output:
tf.Tensor(
[[0.09007736 0.24477491 0.6651477 ]
[0.66514784 0.09007736 0.24477486]], shape=(2, 3), dtype=float32)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1, 32) 192
dense_1 (Dense) (None, 1, 1) 33
b__softmax_1 (B_Softmax) None 1
=================================================================
Total params: 226
Trainable params: 226
Non-trainable params: 0
_________________________________________________________________
Epoch 1/15
1/1 [==============================] - 5s 5s/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 3/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 4/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 5/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 6/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 7/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 8/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 9/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 10/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 11/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 12/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 13/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 14/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 15/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
I have this code in which I am trying to fit a neural network model that has just three layers: the input layer, a hidden layer and, at the end, the output layer, which must have just one neuron for the single output. The problem is that when fitting I always obtain the same values for the accuracy (null) and the loss (it remains constant), and I've tried changing the optimizer from 'sgd' to 'adam' but still nothing works as it should. What would you recommend?
Layer (type) Output Shape Param #
=================================================================
data_in (InputLayer) [(None, 4, 256)] 0
dense (Dense) (None, 4, 124) 31868
dense_1 (Dense) (None, 4, 1) 125
=================================================================
Total params: 31,993
Trainable params: 31,993
Non-trainable params: 0
_________________________________________________________________
Epoch 1/20
20/20 [==============================] - 8s 350ms/step - loss: 0.3170 - accuracy: 1.7361e-05
Epoch 2/20
20/20 [==============================] - 7s 348ms/step - loss: 0.2009 - accuracy: 6.7817e-08
Epoch 3/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0513 - accuracy: 0.0000e+00
Epoch 4/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0437 - accuracy: 0.0000e+00
Epoch 5/20
20/20 [==============================] - 7s 346ms/step - loss: 0.0430 - accuracy: 0.0000e+00
Epoch 6/20
20/20 [==============================] - 7s 346ms/step - loss: 0.0428 - accuracy: 0.0000e+00
Epoch 7/20
20/20 [==============================] - 7s 345ms/step - loss: 0.0428 - accuracy: 0.0000e+00
Epoch 8/20
20/20 [==============================] - 7s 345ms/step - loss: 0.0430 - accuracy: 0.0000e+00
Epoch 9/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0429 - accuracy: 0.0000e+00
Epoch 10/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0429 - accuracy: 0.0000e+00
Epoch 11/20
20/20 [==============================] - 7s 346ms/step - loss: 0.0429 - accuracy: 0.0000e+00
Epoch 12/20
20/20 [==============================] - 7s 344ms/step - loss: 0.0428 - accuracy: 0.0000e+00
Epoch 13/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0428 - accuracy: 0.0000e+00
Epoch 14/20
20/20 [==============================] - 7s 345ms/step - loss: 0.0433 - accuracy: 0.0000e+00
Epoch 15/20
20/20 [==============================] - 7s 345ms/step - loss: 0.0430 - accuracy: 0.0000e+00
Epoch 16/20
20/20 [==============================] - 7s 347ms/step - loss: 0.0432 - accuracy: 0.0000e+00
Epoch 17/20
20/20 [==============================] - 7s 346ms/step - loss: 0.0429 - accuracy: 0.0000e+00
Epoch 18/20
20/20 [==============================] - 7s 347ms/step - loss: 0.0430 - accuracy: 0.0000e+00
Epoch 19/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0428 - accuracy: 0.0000e+00
Epoch 20/20
20/20 [==============================] - 7s 348ms/step - loss: 0.0428 - accuracy: 0.0000e+00
*TEST*
1800/1800 [==============================] - 3s 2ms/step - loss: 0.0449 - accuracy: 0.0000e+00
accuracy: 0%
My input_shape is (4, 256) and my array of training data has shape (57600, 4, 256), meaning I have 57600 samples of shape (4, 256). I also have my training labels array (the values I should obtain with the data), which has shape (57600,). Finally, the library I am using is TensorFlow.
My code is the following:
import numpy as np
from keras.layers import Input, Dense, concatenate, Conv2D, MaxPooling2D, Flatten
from keras.models import Model, Sequential
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler
div_n = 240
# DIVIDING THE DATA WE WANT TO CLASSIFY AND ITS LABELS - the scaled data is being used
data = np.array([self_mbyy_scaled,
                 self_mvyy_scaled,
                 self_mtpr_scaled,
                 self_mrho_scaled])
labels = self_iout_scaled
print(np.shape(data))
print(np.shape(labels))
#TRAINING SET AND DATA SET
tr_data = []
tr_labels = []
#Here I'm dividing the whole data in half for the nx, nz dimensions. The first half is the training set and the second half is the test set
for j in range(div_n):
    for k in range(div_n):
        tr_data.append([data[0][j,:,k],
                        data[1][j,:,k],
                        data[2][j,:,k],
                        data[3][j,:,k]]) # It puts the magnetic field, velocity, temperature and density values in one row for 240x240=57600 columns
        tr_labels.append(labels[j,k]) # the values from the column of targets
tr_data = np.array(tr_data)
tr_data = tr_data.reshape(div_n*div_n, len(data), self_ny, 1)
tr_labels = np.array(tr_labels)
print('\n training data shape')
print(np.shape(tr_data))
print('\n training labels shape')
print(np.shape(tr_labels))
te_data = []
te_labels = []
for j in range(div_n):
    for k in range(div_n):
        te_data.append([data[0][div_n+j,:,div_n+k],
                        data[1][div_n+j,:,div_n+k],
                        data[2][div_n+j,:,div_n+k],
                        data[3][div_n+j,:,div_n+k]]) # It puts the magnetic field, velocity, temperature and density values in one row for 240x240=57600 columns
        te_labels.append(labels[div_n+j,div_n+k]) # the values from the column of targets
te_data = np.array(te_data)
te_data = te_data.reshape(div_n*div_n, len(data), self_ny, 1)
te_labels = np.array(te_labels)
print('\n test data shape')
print(np.shape(te_data))
print('\n test labels shape')
print(np.shape(te_labels))
print('\n')
#NEURAL NETWORK MODEL
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(4, 256, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1))
model.summary()
model.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss=keras.losses.MeanSquaredError(),
    metrics=['accuracy'],
)
model.fit(
    tr_data, tr_labels,
    epochs=6,
    validation_data=ds_valid,
)
Since your data seems to have a sequence dimension, i.e. (57600, 4, 256) --> (samples, timesteps, features), I would recommend using Conv1D layers instead of Conv2D. Here is a simple working example:
import tensorflow as tf
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv1D(128, 2, activation='relu', input_shape=(4, 256)))
model.add(tf.keras.layers.Conv1D(64, 2, activation='relu'))
model.add(tf.keras.layers.Conv1D(32, 2, activation='relu'))
model.add(tf.keras.layers.GlobalMaxPool1D())
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=['mse'],
)
samples = 50
x = tf.random.normal((samples, 4, 256))
y = tf.random.normal((samples,))
model.fit(x, y, batch_size=10, epochs=6)
And note that you usually do not use the accuracy metric for the tf.keras.losses.MeanSquaredError loss function.
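For instance, a sketch of the compile call with a regression-style metric (mean absolute error) reported instead of accuracy:

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.MeanSquaredError(),
    # MAE is a more informative metric than accuracy for a continuous target
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)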
I am training a CNN model with a 2D tensor of shape (400,22) as both input and output. I am using categorical_crossentropy both as loss and metric. However the loss/metrics values are very different.
My model is somewhat like this:
1. Using sample weights, and passing metrics with metrics= in model.compile.
# Imports
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.regularizers import *
from tensorflow.keras import *
import numpy as np
# Build the model
X_input = Input(shape=(400,22))
X = Conv1D(filters=32, kernel_size=2, activation='elu',
           kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4),
           padding='same')(X_input)
X = Dropout(0.2)(X)
X = Conv1D(filters=32, kernel_size=2, activation='elu',
           kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4),
           padding='same')(X)
X = Dropout(0.2)(X)
y = Conv1D(filters=22, kernel_size=1, activation='softmax',
           kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4),
           padding='same')(X)
model = Model(X_input, y, name='mymodel')
# Compile and train the model (with metrics=[])
model.compile(optimizer=Adam(1e-3),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(20,400,22)
ytrain = np.random.rand(20,400,22)
np.random.seed(0)
sample_weight = np.random.choice([0.01, 0.1, 1], size=20)
history = model.fit(x=Xtrain, y=ytrain, sample_weight=sample_weight, epochs=4)
Epoch 1/4
1/1 [==============================] - 0s 824us/step - loss: 10.2952 - categorical_crossentropy: 34.9296
Epoch 2/4
1/1 [==============================] - 0s 785us/step - loss: 10.2538 - categorical_crossentropy: 34.7858
Epoch 3/4
1/1 [==============================] - 0s 772us/step - loss: 10.2181 - categorical_crossentropy: 34.6719
Epoch 4/4
1/1 [==============================] - 0s 766us/step - loss: 10.1903 - categorical_crossentropy: 34.5797
From the results, it is evident that Keras is not using sample weights in the calculation of metrics, hence it is larger than the loss. If we change the sample weights to ones, we get the following:
2. Sample weights = ones, passing metrics with metrics= in model.compile.
# Compile and train the model
model.compile(optimizer=Adam(1e-3),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(20,400,22)
ytrain = np.random.rand(20,400,22)
np.random.seed(0)
sample_weight = np.ones((20,))
history = model.fit(x=Xtrain, y=ytrain, sample_weight=sample_weight, epochs=4)
Epoch 1/4
1/1 [==============================] - 0s 789us/step - loss: 35.2659 - categorical_crossentropy: 35.2573
Epoch 2/4
1/1 [==============================] - 0s 792us/step - loss: 35.0647 - categorical_crossentropy: 35.0562
Epoch 3/4
1/1 [==============================] - 0s 778us/step - loss: 34.9301 - categorical_crossentropy: 34.9216
Epoch 4/4
1/1 [==============================] - 0s 736us/step - loss: 34.8076 - categorical_crossentropy: 34.7991
Now the metrics and loss are quite close with sample weights of ones. I understand that the loss is slightly larger than metrics due to the effects of dropout, regularization, and the fact that the metric is computed at the end of each epoch, whereas the loss is the average over the batches in the training.
How can I get the metrics to include the sample weights?
3. UPDATED: using sample weights, and passing metrics with weighted_metrics= in model.compile.
It was suggested that I used weighted_metrics=[...] instead of metrics=[...] in model.compile. However, Keras still does not include the sample weights in the evaluation of the metrics.
# Compile and train the model
model.compile(optimizer=Adam(1e-3),
              loss=tf.keras.losses.categorical_crossentropy,
              weighted_metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(20,400,22)
ytrain = np.random.rand(20,400,22)
np.random.seed(0)
sample_weight = np.random.choice([0.01, 0.1, 1], size=20)
history = model.fit(x=Xtrain, y=ytrain, sample_weight=sample_weight, epochs=4)
Epoch 1/4
1/1 [==============================] - 0s 764us/step - loss: 10.2581 - categorical_crossentropy: 34.9224
Epoch 2/4
1/1 [==============================] - 0s 739us/step - loss: 10.2251 - categorical_crossentropy: 34.8100
Epoch 3/4
1/1 [==============================] - 0s 755us/step - loss: 10.1854 - categorical_crossentropy: 34.6747
Epoch 4/4
1/1 [==============================] - 0s 746us/step - loss: 10.1631 - categorical_crossentropy: 34.5990
What can be done to ensure that the sample weights are evaluated in the metrics?
Keras does not automatically include sample weights in the evaluation of metrics. That's why there is a huge difference between the loss and the metrics.
If you'll like to include sample weights when evaluating metrics, pass them as weighted_metrics rather than metrics.
model.compile(optimizer=Adam(1e-3),
              loss=tf.keras.losses.categorical_crossentropy,
              weighted_metrics=[tf.keras.losses.categorical_crossentropy])
First of all, categorical cross-entropy is usually not used as a metric. Secondly, you are doing some kind of seq2seq task; I hope you designed the model with that intention.
Finally, in your setup, using sample_weight only works on the loss; it has no effect on the metrics or validation. There are other small bugs in your code too. Here is the fixed working code:
ref: TF 2.3.0 training keras model using tf dataset with sample weights does not apply to metrics (why sample_weight only works on loss)
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.regularizers import *
from tensorflow.keras import *
import numpy as np
X_input = Input(shape=(400,22))
X = Conv1D(filters=32, kernel_size=2, activation='elu', kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4), padding='same')(X_input)
X = Dropout(0.2)(X)
X = Conv1D(filters=32, kernel_size=2, activation='elu', kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4), padding='same')(X)
X = Dropout(0.2)(X)
y = Conv1D(filters=22, kernel_size=1, activation='softmax', kernel_regularizer=L2(1e-4), bias_regularizer=L2(1e-4), padding='same')(X)
model = Model(X_input, y, name='mymodel')
model.compile(optimizer=Adam(1e-3), loss=tf.keras.losses.categorical_crossentropy,
              metrics=[tf.keras.losses.categorical_crossentropy])
Xtrain = np.random.rand(10,400,22)
ytrain = np.random.rand(10,400,22)
history = model.fit(Xtrain, ytrain, sample_weight=np.ones(10), epochs=10)
Epoch 1/10
1/1 [==============================] - 1s 719ms/step - loss: 35.4521 - categorical_crossentropy: 35.4437
Epoch 2/10
1/1 [==============================] - 0s 20ms/step - loss: 35.5138 - categorical_crossentropy: 35.5054
Epoch 3/10
1/1 [==============================] - 0s 19ms/step - loss: 35.5984 - categorical_crossentropy: 35.5900
Epoch 4/10
1/1 [==============================] - 0s 19ms/step - loss: 35.6617 - categorical_crossentropy: 35.6533
Epoch 5/10
1/1 [==============================] - 0s 19ms/step - loss: 35.7807 - categorical_crossentropy: 35.7723
Epoch 6/10
1/1 [==============================] - 0s 19ms/step - loss: 35.9045 - categorical_crossentropy: 35.8961
Epoch 7/10
1/1 [==============================] - 0s 18ms/step - loss: 36.0590 - categorical_crossentropy: 36.0505
Epoch 8/10
1/1 [==============================] - 0s 19ms/step - loss: 36.2040 - categorical_crossentropy: 36.1956
Epoch 9/10
1/1 [==============================] - 0s 18ms/step - loss: 36.4169 - categorical_crossentropy: 36.4084
Epoch 10/10
1/1 [==============================] - 0s 32ms/step - loss: 36.6622 - categorical_crossentropy: 36.6538
Here, if you use no sample_weight, or a weight of 1 for each sample, you will get similar values for the loss and the categorical cross-entropy metric. To have the sample weights applied to the metric as well, use weighted_metrics, as described in the docs.
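Concretely, a sketch of the weighted_metrics variant of the compile call above (the rest of the code stays the same):

model.compile(optimizer=Adam(1e-3), loss=tf.keras.losses.categorical_crossentropy,
              # the sample weights passed to fit() are now also applied to the metric
              weighted_metrics=[tf.keras.losses.categorical_crossentropy])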
I am trying to build an image classifier that differentiates images into pumps, Turbines, and PCB classes. I am using transfer learning from Inception V3.
Below is my code to initialize InceptionV3
import os
from tensorflow.keras import layers
from tensorflow.keras import Model
!wget --no-check-certificate \
https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
-O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
from tensorflow.keras.applications.inception_v3 import InceptionV3
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
pre_trained_model = InceptionV3(input_shape = (150, 150, 3),
                                include_top = False,
                                weights = None)
pre_trained_model.load_weights(local_weights_file)
for layer in pre_trained_model.layers:
    layer.trainable = False
# pre_trained_model.summary()
last_layer = pre_trained_model.get_layer('mixed7')
print('last layer output shape: ', last_layer.output_shape)
last_output = last_layer.output
Next I connect my DNN to the pre-trained model:
from tensorflow.keras.optimizers import RMSprop
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
x = layers.Dense (3, activation='softmax')(x)
model = Model( pre_trained_model.input, x)
model.compile(optimizer = RMSprop(lr=0.0001),
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
I feed in my images using ImageDataGenerator and train the model as below:
history = model.fit(
    train_generator,
    validation_data = validation_generator,
    steps_per_epoch = 100,
    epochs = 20,
    validation_steps = 50,
    verbose = 2)
However, the validation accuracy is not printed/generated after the first epoch:
Epoch 1/20
/usr/local/lib/python3.6/dist-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
/usr/local/lib/python3.6/dist-packages/PIL/Image.py:932: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
"Palette images with Transparency expressed in bytes should be "
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 50 batches). You may need to use the repeat() function when building your dataset.
100/100 - 43s - loss: 0.1186 - accuracy: 0.9620 - val_loss: 11.7513 - val_accuracy: 0.3267
Epoch 2/20
100/100 - 41s - loss: 0.1299 - accuracy: 0.9630
Epoch 3/20
100/100 - 39s - loss: 0.0688 - accuracy: 0.9840
Epoch 4/20
100/100 - 39s - loss: 0.0826 - accuracy: 0.9785
Epoch 5/20
100/100 - 39s - loss: 0.0909 - accuracy: 0.9810
Epoch 6/20
100/100 - 39s - loss: 0.0523 - accuracy: 0.9845
Epoch 7/20
100/100 - 38s - loss: 0.0976 - accuracy: 0.9835
Epoch 8/20
100/100 - 39s - loss: 0.0802 - accuracy: 0.9795
Epoch 9/20
100/100 - 39s - loss: 0.0612 - accuracy: 0.9860
Epoch 10/20
100/100 - 40s - loss: 0.0729 - accuracy: 0.9825
Epoch 11/20
100/100 - 39s - loss: 0.0601 - accuracy: 0.9870
Epoch 12/20
100/100 - 39s - loss: 0.0976 - accuracy: 0.9840
Epoch 13/20
100/100 - 39s - loss: 0.0591 - accuracy: 0.9815
Epoch 14/20
I don't understand what is stopping the validation accuracy from being printed/generated. I get an error if I plot a graph of accuracy vs validation accuracy, with the message:
ValueError: x and y must have same first dimension, but have shapes (20,) and (1,)
What am I missing here?
It finally worked; I'm posting my changes here in case anybody faces issues like these.
I changed the weights parameter in InceptionV3 from None to 'imagenet' and calculated my steps per epoch and validation steps as follows:
steps_per_epoch = np.ceil(no_of_training_images/batch_size)
validation_steps = np.ceil(no_of_validation_images/batch_size)
As the warning says: "Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 50 batches). You may need to use the repeat() function when building your dataset."
To make sure that you have at least steps_per_epoch * epochs batches, set steps_per_epoch to:
steps_per_epoch = X_train.shape[0]//batch_size
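Putting this together with the generators from the question, a sketch like the following should also cover the validation data (this assumes train_generator and validation_generator come from ImageDataGenerator.flow_from_directory, whose iterators expose samples and batch_size attributes):

import numpy as np

steps_per_epoch = int(np.ceil(train_generator.samples / train_generator.batch_size))
validation_steps = int(np.ceil(validation_generator.samples / validation_generator.batch_size))

history = model.fit(
    train_generator,
    validation_data = validation_generator,
    steps_per_epoch = steps_per_epoch,
    epochs = 20,
    validation_steps = validation_steps,
    verbose = 2)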
I'm trying to make the most basic of basic neural networks to get familiar with functional API in Tensorflow 2.x.
Basically what I'm trying to do is the following with my simplified iris dataset (i.e. setosa or not)
Use the 4 features as input
Dense layer of 3
Sigmoid activation function
Dense layer of 2 (one for each class)
Softmax activation
Binary cross entropy / log-loss as my loss function
However, I can't figure out how to control one key aspect of the model. That is, how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer? Also, how can I allow a feature to contribute to more than one neuron?
This isn't clear to me from the documentation.
# Load data
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
X, y = load_iris(return_X_y=True, as_frame=True)
X = X.astype("float32")
X.index = X.index.map(lambda i: "iris_{}".format(i))
X.columns = X.columns.map(lambda j: j.split(" (")[0].replace(" ","_"))
y.index = X.index
y = y.map(lambda i:iris.target_names[i])
y_simplified = y.map(lambda i: {True:1, False:0}[i == "setosa"])
y_simplified = pd.get_dummies(y_simplified, columns=["setosa", "not_setosa"])
# Train/test split
from sklearn.model_selection import train_test_split
seed=0
X_train,X_test, y_train,y_test= train_test_split(X,y_simplified, test_size=0.3, random_state=seed)
# Simple neural network
import tensorflow as tf
tf.random.set_seed(seed)
# Input[4 features] -> Dense layer of 3 neurons -> Activation function -> Dense layer of 2 (one per class) -> Softmax
inputs = tf.keras.Input(shape=(4))
x = tf.keras.layers.Dense(3)(inputs)
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="simple_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"] )
model.summary()
history = model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2)
test_scores = model.evaluate(X_test, y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Results:
Model: "simple_binary_iris"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_44 (InputLayer) [(None, 4)] 0
_________________________________________________________________
dense_96 (Dense) (None, 3) 15
_________________________________________________________________
activation_70 (Activation) (None, 3) 0
_________________________________________________________________
dense_97 (Dense) (None, 2) 8
_________________________________________________________________
activation_71 (Activation) (None, 2) 0
=================================================================
Total params: 23
Trainable params: 23
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2/2 [==============================] - 0s 40ms/step - loss: 0.6344 - accuracy: 0.6667 - val_loss: 0.6107 - val_accuracy: 0.7143
Epoch 2/10
2/2 [==============================] - 0s 6ms/step - loss: 0.6302 - accuracy: 0.6667 - val_loss: 0.6083 - val_accuracy: 0.7143
Epoch 3/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6278 - accuracy: 0.6667 - val_loss: 0.6056 - val_accuracy: 0.7143
Epoch 4/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6257 - accuracy: 0.6667 - val_loss: 0.6038 - val_accuracy: 0.7143
Epoch 5/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6239 - accuracy: 0.6667 - val_loss: 0.6014 - val_accuracy: 0.7143
Epoch 6/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6223 - accuracy: 0.6667 - val_loss: 0.6002 - val_accuracy: 0.7143
Epoch 7/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6209 - accuracy: 0.6667 - val_loss: 0.5989 - val_accuracy: 0.7143
Epoch 8/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6195 - accuracy: 0.6667 - val_loss: 0.5967 - val_accuracy: 0.7143
Epoch 9/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6179 - accuracy: 0.6667 - val_loss: 0.5953 - val_accuracy: 0.7143
Epoch 10/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6166 - accuracy: 0.6667 - val_loss: 0.5935 - val_accuracy: 0.7143
2/2 [==============================] - 0s 607us/step - loss: 0.6261 - accuracy: 0.6444
Test loss: 0.6261375546455383
Test accuracy: 0.644444465637207
how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer?
Have one input layer per feature and feed each input layer to a separate dense layer. Later you can concatenate the output of all the dense layers and proceed.
NOTE: One neuron can take an input of any size (in this case the input size is 1, as you want one feature to be used by the neuron) and its output size is always 1. A Dense layer with n units will have n neurons and thus an output size of n.
Working Sample
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Model architecutre
x1 = tf.keras.Input(shape=(1,))
x2 = tf.keras.Input(shape=(1,))
x3 = tf.keras.Input(shape=(1,))
x4 = tf.keras.Input(shape=(1,))
x1_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x1)
x2_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x2)
x3_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x3)
x4_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x4)
merged = tf.keras.layers.concatenate([x1_, x2_, x3_, x4_])
merged = tf.keras.layers.Dense(16, activation=tf.nn.relu)(merged)
outputs = tf.keras.layers.Dense(3, activation=tf.nn.softmax)(merged)
model = tf.keras.Model(inputs=[x1,x2,x3,x4], outputs=outputs)
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"] )
# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
X_train,X_test, y_train,y_test= train_test_split(X,y, test_size=0.3)
# Fit the model
model.fit([X_train[:,0],X_train[:,1],X_train[:,2],X_train[:,3]], y_train, batch_size=64, epochs=100, validation_split=0.25)
# Evaluate the model
test_scores = model.evaluate([X_test[:,0],X_test[:,1],X_test[:,2],X_test[:,3]], y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Output:
Epoch 1/100
2/2 [==============================] - 0s 75ms/step - loss: 1.6446 - accuracy: 0.4359 - val_loss: 1.6809 - val_accuracy: 0.5185
Epoch 2/100
2/2 [==============================] - 0s 10ms/step - loss: 1.4151 - accuracy: 0.6154 - val_loss: 1.4886 - val_accuracy: 0.5556
Epoch 3/100
2/2 [==============================] - 0s 9ms/step - loss: 1.2725 - accuracy: 0.6795 - val_loss: 1.3813 - val_accuracy: 0.5556
Epoch 4/100
2/2 [==============================] - 0s 9ms/step - loss: 1.1829 - accuracy: 0.6795 - val_loss: 1.2779 - val_accuracy: 0.5926
Epoch 5/100
2/2 [==============================] - 0s 10ms/step - loss: 1.0994 - accuracy: 0.6795 - val_loss: 1.1846 - val_accuracy: 0.5926
Epoch 6/100
.................. [ Truncated ]
Epoch 100/100
2/2 [==============================] - 0s 2ms/step - loss: 0.4049 - accuracy: 0.9333
Test loss: 0.40491223335266113
Test accuracy: 0.9333333373069763
Pictorial representation of the above model architecture
Dense layers in Keras/TF are fully connected layers. For example, when you use a Dense layer as follows
inputs = tf.keras.Input(shape=(4))
x = tf.keras.layers.Dense(3)(inputs)
all 4 input neurons are connected to all 3 output neurons.
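A quick check of this (a small sketch, not part of the original answer): the layer builds a full (4, 3) weight matrix plus 3 biases, which is exactly the 15 parameters reported for the first Dense layer in the summary above.

import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
dense = tf.keras.layers.Dense(3)
dense(inputs)                 # builds the layer
print(dense.kernel.shape)     # (4, 3): every input neuron connects to every output neuron
print(dense.bias.shape)       # (3,)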
There isn't any predefined layer in Keras/TF to specify how to connect input and output neurons. However, Keras/TF is very flexible in that it allows you to define your custom layers easily.
Borrowing the idea from this answer, you could define a CustomConnected layer as follows:
class CustomConnected(tf.keras.layers.Dense):
    def __init__(self, units, connections, **kwargs):
        self.connections = connections
        super(CustomConnected, self).__init__(units, **kwargs)

    def call(self, inputs):
        self.kernel = self.kernel * self.connections
        return super(CustomConnected, self).call(inputs)
Using this layer, you can then specify the connections between two layers through the connections argument. For example:
inputs = tf.keras.Input(shape=(4))
connections = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
x = CustomConnected(3, connections)(inputs)
Here, the 1st, 2nd, and 3rd input neurons are connected to the 1st, 2nd, and 3rd output neurons, respectively. Additionally, the 4th input neuron is connected to the 3rd output neuron.
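As a rough end-to-end sketch (hypothetical, reusing the iris setup from the question, so X_train and y_train are assumed to be the ones defined there; the mask is created as float32 so it matches the kernel dtype):

import numpy as np
import tensorflow as tf

# 4 input features -> 3 neurons; a 1 keeps the connection, a 0 removes it
connections = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 1]], dtype="float32")

inputs = tf.keras.Input(shape=(4,))
x = CustomConnected(3, connections)(inputs)        # partially connected layer
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs, name="masked_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2)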
UPDATE: As discussed in the comments section, an adaptive approach (e.g. by using only the maximum weight for each output neuron) is also possible but not recommended. You could implement this via the following layer:
class CustomSparse(tf.keras.layers.Dense):
    def __init__(self, units, **kwargs):
        super(CustomSparse, self).__init__(units, **kwargs)

    def call(self, inputs):
        nb_in, nb_out = self.kernel.shape
        argmax = tf.argmax(self.kernel, axis=0)  # Shape=(nb_out,)
        argmax_onehot = tf.transpose(tf.one_hot(argmax, depth=nb_in))  # Shape=(nb_in, nb_out)
        kernel_max = self.kernel * argmax_onehot
        # tf.print(kernel_max)  # Uncomment this line to print the weights
        out = tf.matmul(inputs, kernel_max)

        if self.bias is not None:
            out += self.bias
        if self.activation is not None:
            out = self.activation(out)
        return out
The main issue of this approach is that you cannot propagate gradients through the argmax operation required to select the maximum weight. As a result, the network will only "switch input neurons" when the selected weight is no longer the maximum weight.
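A small illustration of this point (an assumption-based sketch, not from the original answer): when the loss is backpropagated through kernel * argmax_onehot, gradients reach only the entries that are currently the maximum of their column, because argmax and one_hot contribute no gradient of their own.

import tensorflow as tf

kernel = tf.Variable([[0.2, 0.9],
                      [0.8, 0.1],
                      [0.5, 0.3]])           # shape (nb_in=3, nb_out=2)
x = tf.constant([[1.0, 2.0, 3.0]])           # a single input sample

with tf.GradientTape() as tape:
    argmax = tf.argmax(kernel, axis=0)                   # index of the max weight per output
    mask = tf.transpose(tf.one_hot(argmax, depth=3))     # (3, 2) one-hot mask
    out = tf.matmul(x, kernel * mask)
    loss = tf.reduce_sum(out)

print(tape.gradient(loss, kernel))   # non-zero only at the selected (max) positions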