Bert prediction shape not equal to num_samples - python

I am trying to do text classification using BERT. The model training code below works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf
# recommended learning rate for Adam 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26
# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we will not overfit the model
number_of_epochs = 1
# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=nlabels,
# optimizer Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)
# labels are integer class ids rather than one-hot vectors, so we use sparse categorical cross-entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(..., epochs=number_of_epochs)
I am getting the output using the following
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue is that the length of pred_labels_idx is not the same as the cardinality of ds_te_encoded:
len(pred_labels_idx)  # 426820
# cardinality of ds_te_encoded: <tf.Tensor: shape=(), dtype=int64, numpy=21341>
I am not sure why this is happening.

Since ds_te_encoded is a batched tf.data.Dataset and you call cardinality(...) on it, the cardinality in your case is the number of batches (rounded up), not the number of samples. I assume you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
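The arithmetic behind this can be checked without TensorFlow. The following is a minimal sketch in plain Python; the helper name num_batches is hypothetical, and the batch size of 20 is the assumption made in the answer, not a value confirmed by the question.

```python
import math

def num_batches(num_samples, batch_size, drop_remainder=False):
    """Number of batches a batched dataset yields for num_samples samples."""
    if drop_remainder:
        # an incomplete final batch is discarded
        return num_samples // batch_size
    # an incomplete final batch still counts as one batch
    return math.ceil(num_samples / batch_size)

num_samples = 426820  # len(pred_labels_idx): one prediction per sample
batch_size = 20       # assumed, since 426820 / 20 = 21341

print(num_batches(num_samples, batch_size))
```

To compare like with like, count samples rather than batches, for example by summing the batch sizes while iterating over the dataset.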


Overfitted model performs poorly on training data

What could cause accuracy to be >90% while the model predicts a single class in 100% of cases in a multiclass classification problem? I would expect an overfitted model with high training accuracy to predict well on the training data.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(scale=1./255., offset=0.0),
    mobilenet_custom,
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
Where mobilenet_custom is a tf.keras.applications.MobileNetV2 model with random weights and input is modified to have 7 channels.
I am trying to classify frames of a clip into three classes. The training data set is kind of balanced:
Total count: 15849
Labels[ A ]: 3906
Labels[ B ]: 5955
Labels[ O ]: 5988
Batch Shape X: (32, 224, 224, 7)
Batch Shape y: (32, 3)
The training accuracy is >90% after 2 epochs (the val_accuracy is around 35%).
However, I also record a confusion matrix at the end of each epoch from a Callback subclass, using the following function to collect the data:
def _collect_validation_data(self, arg_datagen):
    predicts = []
    truths = []
    print("\nCollecting data for epoch evaluation...")
    batch_num_per_epoch = len(arg_datagen)
    for batch_i in range(batch_num_per_epoch):
        X_batch, y_batch = arg_datagen[batch_i]
        y_pred = self.model.predict(X_batch, verbose=0)
        y_true = y_batch
        # convert softmax outputs and one-hot labels to class indices
        predicts.extend(np.argmax(y_pred, axis=1))
        truths.extend(np.argmax(y_true, axis=1))
        print("Batch: {}/{}".format(batch_i + 1, batch_num_per_epoch), end='\r')
    return truths, predicts
Every time, the confusion matrices look like the following:
When I saved the y_pred and y_true values that are passed to the accuracy computation, they confirmed that the accuracy is calculated correctly.
The problem was batch normalization. When I tried a model without batch normalization layers, such as VGG16, the results on the training data were as expected: the values of the confusion matrix built from predictions on the training data were mostly on the diagonal (when the training accuracy was high).
A possible solution is described in the question "fit() works as expected but then during evaluate() model performs at chance".
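One common way batch normalization produces this symptom is that the layer behaves differently in training and inference. During fit() it normalizes with the statistics of the current batch, while predict() uses moving averages that may not have converged yet. The sketch below is a simplified stand-in written in plain Python (no learned scale or shift, no moving-average update); the function name and the numbers are hypothetical and only illustrate the two modes.

```python
import statistics

def batch_norm(values, moving_mean, moving_var, training, eps=1e-3):
    """Normalize a list of activations in training or inference mode."""
    if training:
        # training mode: use the statistics of the current batch
        mean = statistics.fmean(values)
        var = statistics.pvariance(values)
    else:
        # inference mode: use the stored moving statistics
        mean, var = moving_mean, moving_var
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

batch = [10.0, 12.0, 14.0, 16.0]
# assumption: moving statistics are still close to their initial values
moving_mean, moving_var = 0.0, 1.0

train_out = batch_norm(batch, moving_mean, moving_var, training=True)
infer_out = batch_norm(batch, moving_mean, moving_var, training=False)

print(train_out)  # centered around zero
print(infer_out)  # all large and positive
```

With the same input, the downstream layers see very different activations in the two modes, which is consistent with a high accuracy reported by fit() and a single predicted class from predict().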

Selecting validation metric for `categorical_crossentropy` in Keras

I am looking at these two questions and documentation:
Whats the output for Keras categorical_accuracy metrics?
Categorical crossentropy need to use categorical_accuracy or accuracy as the metrics in keras?
For classification of X-ray images (15 classes) I do:
# Compile a model
model1.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
metrics = ['accuracy'])
# Fit the model
history1 = model1.fit_generator(train_generator, epochs = 10,
steps_per_epoch = 10, verbose = 1, validation_data = valid_generator)
My model works and I have an output:
But I am not sure how to add validation accuracy here to compare results and avoid over/underfitting.
I hope the following can help you:
The use of "categorical_crossentropy" tells me that your labels are a one hot encoding over different classes.
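Let's say you have 15 classes: the correct prediction is a vector with 14 zeros and a one at the corresponding index. If accuracy were computed element by element over that vector, it would be very high, because the model correctly predicts zero almost everywhere; it would easily be at least 13/15 ≈ 0.87. Note that Keras resolves the string "accuracy" to categorical accuracy when the loss is categorical_crossentropy, so the element-wise case only arises if you explicitly choose a metric such as binary_accuracy.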
A more suitable metric would be "categorical_accuracy", which gives you 1 if the model predicts the correct index and 0 otherwise.
If you have a validation "categorical_accuracy" better than 1/15 ≈ 0.067 (assuming your classes are balanced), your model is better than random.
You can find a list of metrics in the Keras metrics documentation.
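The difference between an element-wise accuracy and categorical accuracy can be seen on a toy example. This is a minimal sketch in plain Python, not the Keras implementation; both function names and the data are made up for illustration.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted argmax equals the true argmax."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def elementwise_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual vector entries that match after thresholding."""
    total = matches = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            matches += int(t == (1 if p > threshold else 0))
            total += 1
    return matches / total

num_classes = 15
# three samples whose true classes are 0, 1 and 2 (one-hot encoded)
y_true = [[1 if j == i else 0 for j in range(num_classes)] for i in range(3)]
# a model that always predicts class 0 with full confidence
y_pred = [[1.0 if j == 0 else 0.0 for j in range(num_classes)] for _ in range(3)]

cat_acc = categorical_accuracy(y_true, y_pred)
elem_acc = elementwise_accuracy(y_true, y_pred)
print(cat_acc)   # only the first sample is classified correctly
print(elem_acc)  # most entries are matching zeros, so this is much higher
```

Each wrong sample still has 13 of its 15 entries matching, which is where the 13/15 lower bound comes from.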

LSTM Model not having any variance during evaluation

I have a question regarding the evaluation of an LSTM model. I have trained an LSTM model and saved it. Now I want to load it with load_model and evaluate it on the validation dataset. Since neural networks are stochastic, I run the evaluation several times and compute the mean and the variance of the metrics I am interested in.
Now I am shocked that after the first run all consecutive runs have the same performance on every metric. I don't think that is right, but I don't know where the error occurs.
So my question is:
what is my mistake in setting up the validation of my model?
and how can I fix that?
Here are the code snippets that should explain what I am doing:
Compile and fit the Model
def compile_and_fit(hparams, model_path):
    window = WindowGenerator(input_width=hparams[HP_WINDOW_SIZE],
                             label_width=hparams[HP_WINDOW_SIZE], shift=1,
                             label_columns=['q_MARI'], batch_size=hparams[HP_BATCH_SIZE])
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_1"),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="Dropout_1"),
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_2"),
    learning_rate = hparams[HP_LEARNING_RATE]
    history = model.fit(...,
                        callbacks=get_callbacks(model_path))
    _, a, _, _, _, _ = model.evaluate(window.val)
    return a, model, history
Train and save it
a, model, history = compile_and_fit( hparams = hparams, MAX_EPOCHS = MAX_EPOCHS, model_path = run_path)
Load and evaluate it
model = tf.keras.models.load_model(os.path.join(hparam_path, model_name),
custom_objects={"max_error": max_error, "median_absolute_error": median_absolute_error, "rev_metric": rev_metric, "nse_metric": nse_metric})
model.compile(loss=tf.losses.MeanSquaredError(), optimizer="adam", metrics=get_metrics())
metric_values = np.empty(shape = (nr_runs, len(metrics)), dtype=float)
for j in range(nr_runs):
    window = WindowGenerator(input_width=hparam_vals[i], label_width=hparam_vals[i], shift=1,
    metric_values[j] = np.array(model.evaluate(window.val))
means = metric_values.mean(axis=0)
varis = metric_values.var(axis=0)
print(f'means: {means}, varis: {varis}')
The results I am getting
For setting up the Training I follow those two guides:
An LSTM is not stochastic at evaluation time. Evaluation results should be the same for the same model and the same data.
There are two separate steps. When you train the model, randomness (weight initialization, shuffling, dropout) influences which model you end up with. Once you have saved that model, however, its predictions are the same every time you load and evaluate it.
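The distinction between the two steps can be sketched with a toy stand-in in plain Python. The train and evaluate functions below are hypothetical and replace the real LSTM with a tiny linear model, so only the source of the variance is illustrated.

```python
import random
import statistics

def train(seed):
    """Stand-in for training: the seed decides which weights we end up with."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(3)]

def evaluate(weights, data):
    """Deterministic evaluation: mean squared error of a linear model."""
    errors = []
    for features, target in data:
        prediction = sum(w * x for w, x in zip(weights, features))
        errors.append((prediction - target) ** 2)
    return statistics.fmean(errors)

data = [([1.0, 2.0, 3.0], 1.0), ([0.5, -1.0, 2.0], 0.0)]

# Re-evaluating one saved model: every run returns the same score.
saved_weights = train(seed=0)
same_model_scores = [evaluate(saved_weights, data) for _ in range(5)]
print(statistics.pvariance(same_model_scores))  # zero variance

# Retraining with different seeds: the scores differ between runs.
retrained_scores = [evaluate(train(seed), data) for seed in range(5)]
print(statistics.pvariance(retrained_scores))
```

To estimate the variance of a metric, the training step has to be repeated (for example with different seeds), not the evaluation of one saved model.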

Tensorflow neural network doesn’t learn

I built a neural network for a university project. The goal is to find out if sensor data (temperature, humidity and light) can predict if the sunrise happened during a given time frame. So, it is a binary classification.
The problem is that the network does not learn. The accuracy converges towards about 0.8 and does not change after about 5 epochs. Same with the loss, which sits at about 0.4921 after a few epochs. I tried several things like changing the activation function or the number of hidden layers, but nothing worked.
I also created a dataset with an equal number of "sunrise = 1" and "sunrise = 0" data points. The accuracy ended up at exactly 0.5. Therefore I think that there is something wrong with the network setup itself.
Do you have any idea what could be wrong?
Here is my code:
def build_network():
    inputs = keras.Input(shape=(4, 25), name="input")
    hidden = layers.Dense(1000, activation="sigmoid", name="dense1")(inputs)
    hidden = layers.Dense(1000, activation="sigmoid", name="dense2")(hidden)
    hidden = layers.Flatten()(hidden)
    hidden = layers.Dense(500, activation="sigmoid", name="dense3")(hidden)
    hidden = layers.Dense(500, activation="sigmoid", name="dense4")(hidden)
    hidden = layers.Dense(10, activation="sigmoid", name="dense5")(hidden)
    output = layers.Dense(1, activation="sigmoid", name="output")(hidden)
    model = keras.Model(inputs=inputs, outputs=output, name="sunrise_model")
    return model
def train_model():
    training_files = r'data/training'
    test_files = r'data/test'
    print('reading files...')
    train_x, train_y = load_data(training_files)
    test_x, test_y = load_data(test_files)
    print("training network")
    # compile model
    model = build_network()
    # Train / fit
    model.fit(train_x, train_y, batch_size=100, epochs=200)
    # evaluate
    test_scores = model.evaluate(test_x, test_y, verbose=2)
    print("Test loss:", test_scores[0])
    print("Test accuracy:", test_scores[1])
Here is the output: loss: 0.4921 - accuracy: 0.8225
Test loss: 0.4921109309196472,
Test accuracy: 0.8225
And here is an example of the data:
I would use ReLU instead of sigmoid as the activation function in the hidden layers. What learning rate did you use? Try a smaller one. I find I get the best results with a variable learning rate, and the Keras callback ReduceLROnPlateau makes this easy to do. I also recommend using the Keras callback ModelCheckpoint to save the model with the lowest validation loss, then using that model to make predictions on the test set.
I also think your model has too many parameters and will overfit. Add dropout layers to the model to help reduce this problem, or, as a good alternative, reduce the model complexity: take out one of the layers with 1000 nodes and one of the layers with 500 nodes and see what results you get. I also prefer the Adamax optimizer with its default values.
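The idea behind a reduce-on-plateau schedule can be sketched in a few lines of plain Python. This is a hand-rolled illustration, not the Keras ReduceLROnPlateau implementation; the class name, default values, and validation losses are all hypothetical.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` once the monitored loss has
    failed to improve for `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        if val_loss < self.best:
            # improvement: remember it and reset the counter
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # plateau reached: lower the learning rate, never below min_lr
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

scheduler = ReduceOnPlateau(lr=1e-3)
val_losses = [0.70, 0.55, 0.50, 0.50, 0.51, 0.49, 0.49, 0.49]
lr_history = [scheduler.step(loss) for loss in val_losses]
print(lr_history)
```

In this run the learning rate is halved after the fifth epoch and again after the eighth, each time following two epochs without improvement.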

Regression with LSTM - python and Keras

I am trying to use an LSTM network in Keras to make predictions of time series data one step into the future. The data has 5 dimensions, and I am trying to use the previous 3 periods of readings to predict a value in the next period. I have normalised the data and removed all NaNs, and this is the code I am using to train the network:
length = len(OUT)
train_x = IN[:int(0.9 * length)]
validation_x = IN[int(0.9 * length):]
train_y = OUT[:int(0.9 * length)]
validation_y = OUT[int(0.9 * length):]
# Define Network & callback:
train_x = train_x.reshape(train_x.shape[0],3, 5)
validation_x = validation_x.reshape(validation_x.shape[0],3, 5)
model = Sequential()
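# input_shape is (timesteps, features) and must match the reshape above, i.e. (3, 5)
model.add(LSTM(units=128, return_sequences=True, input_shape=(train_x.shape[1], train_x.shape[2])))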
model.compile(optimizer='adam', loss='mean_squared_error')
train_y = np.asarray(train_y)
validation_y = np.asarray(validation_y)
history = model.fit(train_x, train_y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data=(validation_x, validation_y))
# Score model
score = model.evaluate(validation_x, validation_y, verbose=0)
print('Test loss:', score)
# Save model
model.save("models/new_model")
I am attempting to roughly follow the steps outlined here-
However, no matter how I change the number of dimensions used to train the network or the length of the time period, the model's predictions are always either 1 or 0, even though the target data in the array OUT is continuous on [0, 1].
I think there may be something wrong with how I am setting up the .Sequential() function, but I cannot see what to adjust. I am relatively new to this so any help would be greatly appreciated.
You are probably using a prediction function that is not the standard one. Maybe you are using predict_classes?
The standard, well-documented one is model.predict, which returns the raw continuous outputs.
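Why a class-prediction helper would produce only 0 and 1 can be shown with a small sketch. It assumes the helper thresholds a single continuous output at 0.5; the function name to_class_labels and the sample values are hypothetical.

```python
def to_class_labels(outputs, threshold=0.5):
    """Collapse continuous outputs to 0/1 labels by thresholding."""
    return [1 if value > threshold else 0 for value in outputs]

# hypothetical continuous outputs for targets that lie in [0, 1]
raw_predictions = [0.12, 0.47, 0.51, 0.93]

print(raw_predictions)                   # continuous values, suitable for regression
print(to_class_labels(raw_predictions))  # [0, 0, 1, 1]
```

For a regression target, the continuous values are the ones to compare against OUT; the thresholded labels discard exactly the information the model was trained to predict.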
