I am working on sign language detection using VGG16 pre-trained model with grayscale images. When I am trying to run the model.fit command, I am getting the following error.
CLARIFICATION
I already have images as RGB form but I want to use them as grayscale to check if they would work with grayscale. The reason being, with color images, I am not getting the accuracy which I am expecting. It is having test accuracy of max 40% only and getting overfitted on dataset.
Also, this is my model command
vgg = VGG16(input_shape= [128, 128] + [3], weights='imagenet', include_top=False)
This is my model.fit command
history = model.fit(
train_x,
train_y,
epochs=15,
validation_data=(test_x, test_y),
callbacks=[early_stop, checkpoint],
batch_size=32,shuffle=True)
I am new to working with pre-trained models. When I am trying to run the code with color images with 3 channels, my model is getting into overfitting and val_accuracy doesn't rise above 40% so I want to give try the grayscale images as I have added many data augmentation techniques but accuracy is not improving. Any leads are welcomed as I am stuck into this for long time now.
The simplest (and likely fastest) solution I can think of is to just convert your image to rgb. You can do this as part of your model.
model = Sequential([
tf.keras.layers.Lambda(tf.image.grayscale_to_rgb),
vgg
])
This will fix your issue with VGG. I also see that you're missing the last dimensionality for your images. Images in grayscale are expected to be of shape [height, width, 1], but you simply have [height, width]. You can fix this using tf.expand_dims:
model = Sequential([
tf.keras.layers.Lambda(
lambda x: tf.image.grayscale_to_rgb(tf.expand_dims(x, -1))
),
vgg,
])
Note that this solution solves the problem in the graph, so it runs online. Meaning, at runtime, you can feed data exactly the same way you have it now (in the shape [128, 128], without a channels dimension) and it will still functionally work. If this is your expected dimensionality during runtime, this will be faster than manipulating your data before throwing it into the model.
By the way, none of this is ideal, given that VGG was trained specifically to work best with color images. Just thought I should add that.
Why are you getting overfitting?
Maybe for different reasons:
Your images and labels don't equally exist in the train, Val, test. (maybe you have images in train and don't have them in test.) Or your train, Val, test data don't stratify correctly and you train your model on a specific area in your data and features.
You Dataset is very small and you need more data.
Maybe you have noise in your datase, first make sure to remove noise from the dataset. (if you have noise, model fit on your noise.)
How can you input grayscale images to VGG16?
For Using VGG16, you need to input 3 channels images. For this reason, you need to concatenate your images like below to get three channels images from grayscale:
image = tf.concat([image, image, image], -1)
Example of training VGG16 on grayscale images from fashion_mnist dataset:
from tensorflow.keras.applications.vgg16 import VGG16
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
train, val, test = tfds.load(
'fashion_mnist',
shuffle_files=True,
as_supervised=True,
split = ['train[:85%]', 'train[85%:]', 'test']
)
def resize_preprocess(image, label):
image = tf.image.resize(image, (32, 32))
image = tf.concat([image, image, image], -1)
image = tf.keras.applications.densenet.preprocess_input(image)
return image, label
train = train.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
test = test.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
val = val.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
train = train.repeat(15).batch(64).prefetch(tf.data.AUTOTUNE)
test = test.batch(64).prefetch(tf.data.AUTOTUNE)
val = val.batch(64).prefetch(tf.data.AUTOTUNE)
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32,32,3))
base_model.trainable = False ## Not trainable weights
model = tf.keras.Sequential()
model.add(base_model)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
optimizer='Adam',
metrics=['accuracy'])
model.summary()
fit_callbacks = [tf.keras.callbacks.EarlyStopping(
monitor='val_accuracy', patience = 4, restore_best_weights = True)]
history = model.fit(train, steps_per_epoch=150, epochs=5, batch_size=64, validation_data=val, callbacks=fit_callbacks)
model.evaluate(test)
Output:
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 1, 1, 512) 14714688
flatten_3 (Flatten) (None, 512) 0
dense_9 (Dense) (None, 1024) 525312
dropout_6 (Dropout) (None, 1024) 0
dense_10 (Dense) (None, 256) 262400
dropout_7 (Dropout) (None, 256) 0
dense_11 (Dense) (None, 10) 2570
=================================================================
Total params: 15,504,970
Trainable params: 790,282
Non-trainable params: 14,714,688
_________________________________________________________________
Epoch 1/5
150/150 [==============================] - 6s 35ms/step - loss: 0.8056 - accuracy: 0.7217 - val_loss: 0.5433 - val_accuracy: 0.7967
Epoch 2/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5560 - accuracy: 0.7965 - val_loss: 0.4772 - val_accuracy: 0.8224
Epoch 3/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5287 - accuracy: 0.8080 - val_loss: 0.4698 - val_accuracy: 0.8234
Epoch 4/5
150/150 [==============================] - 5s 32ms/step - loss: 0.5012 - accuracy: 0.8149 - val_loss: 0.4334 - val_accuracy: 0.8329
Epoch 5/5
150/150 [==============================] - 4s 25ms/step - loss: 0.4791 - accuracy: 0.8315 - val_loss: 0.4312 - val_accuracy: 0.8398
157/157 [==============================] - 2s 15ms/step - loss: 0.4457 - accuracy: 0.8325
[0.44566288590431213, 0.8324999809265137]
Related
I have tried many combinations in the values for this model.
Can 2D Convolutions be used instead of 1D for the following case?
How can accuracy be improved for the training dataset?
shape of original dataset : (343889, 80)
shape of - training dataset : (257916, 80)
shape of - training Labels : (257916,)
shape of - testing dataset : (85973, 80)
shape of - testing Labels : (85973,)
The model is
inputShape = (80,1,)
model = Sequential()
model.add(Input(shape=inputShape))
model.add(Conv1D(filters=80, kernel_size=30, activation='relu'))
model.add(MaxPooling1D(40))
model.add(Dense(60))
model.add(Dense(9))
model.compile(optimizer='adam', loss='binary_crossentropy',
metrics=['accuracy'])
Model's summary
Model: "sequential_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_11 (Conv1D) (None, 51, 80) 2480
max_pooling1d_9 (MaxPooling (None, 1, 80) 0
1D)
dense_8 (Dense) (None, 1, 60) 4860
dense_9 (Dense) (None, 1, 9) 549
=================================================================
Total params: 7,889
Trainable params: 7,889
Non-trainable params: 0
_________________________________________________________________
The training is given below.
Epoch 1/5
8060/8060 [==============================] - 56s 7ms/step - loss: -25.7724 - accuracy: 0.0015
Epoch 2/5
8060/8060 [==============================] - 44s 5ms/step - loss: -26.7578 - accuracy: 0.0011
Epoch 3/5
8060/8060 [==============================] - 43s 5ms/step - loss: -26.7578 - accuracy: 0.0011
You can try a couple of things to adjust your model performance.
Firstly Try Using Conv2D layers
Modify kernel size to (3,3)
Change optimiser to SGD and loss to Sparse Categorical Crossentropy
Try the following, run the model for a longer epoch and let's see how that goes.
Since you want to classify something, your model is not doing so (at least not directly).
The problems I can see at first sight are:
You use no activation functions (especially in the last layer)
You use 9 output neurons, but binary crossentropy loss.
First of all, in your shoes, I would revise the classification problems with neural network.
About your model, a starting point could be this edit
inputShape = (80,1,)
model = Sequential()
model.add(Conv1D(filters=80, kernel_size=30, activation='relu', input_shape = inputShape))
model.add(MaxPooling1D(40))
model.add(Dense(60), activation='relu') # note activation function
model.add(Dense(9), activation='softmax') # note activation function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
metrics=['accuracy']) # note the loss function
I am not saying this is going to solve your problem (without knowing data it is impossible) but it is a start, then you have to work on fighting overfitting, hyperparameters tuning, etc.
I'm implementing my first Neural Network, it being an LSTM for binary sentiment analysis classification. I've pre-processed the data with lowering the letters, tokenizing and removing most punctuation (keeping only .,').
I'm also using GloVe's 100d pre-trained embeddings for this.
The problem is: Whatever I do the accuracy is terrible and doesn't change with epocs (also doesn't change when changing the LSTM architecture)
I've tried changing the optimizer and its learning rate, adding more neurons to the LSTM, changing number of epochs and batch size.
Nothing seems to work
def setLSTM(data, stopRem, stemm, lemma, negHand):
#pre-processing data
data = pre_processing(data, stopRem, stemm, lemma, negHand)
print(data[1])
#splitting data
X_train, X_test, y_train, y_test = datasplit(data)
#Setting the words as unique indexes (max 10k unique indexes)
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
#getting vocabulary
vocab = tokenizer.word_index.items()
print(vocab)
vocab_size = vocab_size = len(tokenizer.word_index) + 1
#maxlen = Maxlen is correspondes to the maximum tweet length (so that we can add padding to shorter ones)
maxlen = len(max((X_train + X_test)))
print("Maxlen is: ",maxlen)
#Padding the sequences to guarantee that all tweets have the same length
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
#Create embedding matrix with zeros (because some of the vocabulary might not exist in the embeddings)
#and adding the embeddings we have
embedding_matrix = zeros((vocab_size, 100))
for idx,word in vocab:
embedding_vector = embeddings.get(word)
if embedding_vector is not None:
embedding_matrix[idx] = embedding_vector
#creating the model with its layers (embedding layer, lstm layer, dense layer)
model = Sequential()
#The embedding layer has "trainable=False" because we're using pre-trained embeddings
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=maxlen, trainable=False)
model.add(embedding_layer)
model.add(Dropout(0.2))
#Adding an LSTM layer with 128 neurons
model.add(LSTM(units=100))
model.add(Dropout(0.2))
#Adding dense layer with sigmoid activation
model.add(Dense(1, activation='sigmoid'))
#opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=False)
#Compiling model ("loss='binary_crossentropy'" because we're dealing with a binary classification problem)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
print(model.summary())
history = model.fit(X_train, y_train, batch_size=64, epochs=5, verbose=1, validation_split=0.2)
score = model.evaluate(X_test, y_test, verbose=1)
print("Test Score:", score[0])
print("Test Accuracy:", score[1])
setLSTM(tweets,False,False,False,False)
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_9 (Embedding) (None, 13, 100) 1916600
_________________________________________________________________
dropout_1 (Dropout) (None, 13, 100) 0
_________________________________________________________________
lstm_9 (LSTM) (None, 100) 80400
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_9 (Dense) (None, 1) 101
=================================================================
Total params: 1,997,101
Trainable params: 80,501
Non-trainable params: 1,916,600
_________________________________________________________________
None
Train on 10852 samples, validate on 2713 samples
Epoch 1/5
10852/10852 [==============================] - 5s 448us/step - loss: 0.6920 - acc: 0.5275 - val_loss: 0.6916 - val_acc: 0.5404
Epoch 2/5
10852/10852 [==============================] - 4s 360us/step - loss: 0.6917 - acc: 0.5286 - val_loss: 0.6908 - val_acc: 0.5404
Epoch 3/5
10852/10852 [==============================] - 4s 365us/step - loss: 0.6920 - acc: 0.5286 - val_loss: 0.6907 - val_acc: 0.5404
Epoch 4/5
10852/10852 [==============================] - 4s 382us/step - loss: 0.6916 - acc: 0.5286 - val_loss: 0.6903 - val_acc: 0.5404
Epoch 5/5
10852/10852 [==============================] - 4s 383us/step - loss: 0.6916 - acc: 0.5264 - val_loss: 0.6906 - val_acc: 0.5404
4522/4522 [==============================] - 1s 150us/step
Test Score: 0.6925433831950933
Test Accuracy: 0.5176913142204285
I'm currently working on the CIFAR-10 Dataset which is an image classification problem with 10 classes.
I have started to develop with Tensorflow 2 a Linear Classification without the LinearClassifier Object.
X shape corresponds to 10 000 images of 32*32 pixels RBG = (10000, 3072)
Y_one_hot is a one hot vector = (10000, 10)
model creation code:
model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax", input_dim=1))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])
training code:
model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)
predict code:
img = X[0].reshape(1, 3072) # Select image 0
res = np.argmax((model.predict(img))) # select the max in output
Problem:
res value is always the same. It seems my model is not learning.
Model.summary
Summary displays :
dense (Dense) (None, 1) 3073
dense_1 (Dense) (None, 10) 20
Total params: 3,093
Trainable params: 3,093
Non-trainable params: 0
Accuracy & loss:
Epoch 1/100
10000/10000 [==============================] - 2s 184us/sample - loss: 0.0949 - accuracy: 0.1005
Epoch 50/100
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0901 - accuracy: 0.1000
Epoch 100/100
10000/10000 [==============================] - 0s 8us/sample - loss: 0.0901 - accuracy: 0.1027
Do you have any idea why my model is always prediciting the same value ?
Thanks,
One remarks:
The loss you used loss="mean_squared_error"is not meant for classification. Is meant for regression. Two very different problems. Try a cross entropy. For example
`model.compile(optimizer=AdamOpt,
loss='categorical_crossentropy', metrics=['accuracy'])`
You can find an example here: https://github.com/michelucci/oreilly-london-ai/blob/master/day1/Beginner%20friendly%20networks/First_Example_of_a_CNN_(CIFAR10).ipynb. Is a note book I used for a training I gave. The network is CNN but you can change it with yours.
Try that...
Best of luck, Umberto
I'm trying to train an autoencoder in Keras for signal processing but I'm somehow failing.
My inputs are segments of 128 frames length for 6 measures (acceleration_x/y/z, gyro_x/y/z), so the overall shape of my dataset is (22836, 128, 6) where 22836 is the sample size.
This is the sample code I'm using for the autoencoder:
X_train, X_test, Y_train, Y_test = load_dataset()
# reshape the input, whose size is (22836, 128, 6)
X_train = X_train.reshape(X_train.shape[0], np.prod(X_train.shape[1:]))
X_test = X_test.reshape(X_test.shape[0], np.prod(X_test.shape[1:]))
# now the shape will be (22836, 768)
### MODEL ###
input_shape = [X_train.shape[1]]
X_input = Input(input_shape)
x = Dense(1000, activation='sigmoid', name='enc0')(X_input)
encoded = Dense(350, activation='sigmoid', name='enc1')(x)
x = Dense(1000, activation='sigmoid', name='dec0')(encoded)
decoded = Dense(input_shape[0], activation='sigmoid', name='dec1')(x)
model = Model(inputs=X_input, outputs=decoded, name='autoencoder')
model.compile(optimizer='rmsprop', loss='mean_squared_error')
print(model.summary())
The output of model.summary() is
Model summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_55 (InputLayer) (None, 768) 0
_________________________________________________________________
enc0 (Dense) (None, 1000) 769000
_________________________________________________________________
enc1 (Dense) (None, 350) 350350
_________________________________________________________________
dec1 (Dense) (None, 1000) 351000
_________________________________________________________________
dec0 (Dense) (None, 768) 768768
=================================================================
Total params: 2,239,118
Trainable params: 2,239,118
Non-trainable params: 0
The training is done via
# train the model
history = model.fit(x = X_train, y = X_train,
epochs=5,
batch_size=32,
validation_data=(X_test, X_test))
where I'm simply trying to learn the identity function which yields:
Train on 22836 samples, validate on 5709 samples
Epoch 1/5
22836/22836 [==============================] - 27s 1ms/step - loss: 0.9481 - val_loss: 0.8862
Epoch 2/5
22836/22836 [==============================] - 24s 1ms/step - loss: 0.8669 - val_loss: 0.8358
Epoch 3/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8337 - val_loss: 0.8146
Epoch 4/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8164 - val_loss: 0.7960
Epoch 5/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8004 - val_loss: 0.7819
At this point, to try to understand how well it performed, I check the plot of some true inputs vs the predicted ones:
prediction = model.predict(X_test)
for i in np.random.randint(0, 100, 7):
pred = prediction[i, :].reshape(128,6)
# getting only values for acceleration_x
pred = pred[:, 0]
true = X_test[i, :].reshape(128,6)
# getting only values for acceleration_x
true = true[:, 0]
# plot original and reconstructed
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(20, 6))
ax1.plot(true, color='green')
ax2.plot(pred, color='red')
and these are some of the plots which appear to be completely wrong:
Do you have any suggestion on what's wrong, aside from the small number of epochs (which actually do not seem to make any difference)?
Your data is not in the range [0,1] so why do you use sigmoid as the activation function in the last layer? Remove the activation function from the last layer (and it might be better to use relu in the previous layers).
Also normalize the training data. You can use feature-wise normalization:
X_mean = X_train.mean(axis=0)
X_train -= X_mean
X_std = X_train.std(axis=0)
X_train /= X_std + 1e-8
And don't forget to use the computed statistics (X_mean and X_std) in inference time (i.e. testing) to normalize test data.
I'm beginner with deep learning and keras/tensorflow.
I have followed the first tutorial on tensorflow.org: a basic classification with fashion MNIST.
In this case the input data are 60000, 28x28 images and the model is this:
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
Compiled with:
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
At the end of training the model has this accuracy:
10000/10000 [==============================] - 0s 21us/step
Test accuracy: 0.8769
It's ok.
Now I'm trying to duplicate this model with another set of datas. New input is a dataset downloaded from kaggle.
The dataset has images with different sized of dogs and cats, so I have create a simple script that get the images, resize in 28x28 pixel and convert in a numpy array.
This is the code to do this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from tensorflow.keras.models import load_model
from PIL import Image
import os
# Helper libraries
import numpy as np
# base path dataset
base_path = './dataset/'
training_path = base_path + "training_set/"
test_path = base_path + "test_set/"
# size rate of images
size = 28, 28
#
train_images = []
train_labels = []
test_images = []
test_labels = []
classes = ['dogs', 'cats']
# Scorre sulle cartelle contenute nel path e trasforma le immagini in nparray
def from_files_to_nparray(path):
images = []
labels = []
for subfolder in os.listdir(path):
if subfolder == '.DS_Store':
continue
for image_name in os.listdir(path + subfolder):
if not image_name.endswith('.jpg'):
continue
img = Image.open(path + subfolder + "/" + image_name).convert("L").resize(size) # convert to grayscale and resize
npimage = np.asarray(img)
images.append(npimage)
labels.append(classes.index(subfolder))
img.close()
# convertt to np arrays
images = np.asarray(images)
labels = np.asarray(labels)
# Normalize to [0, 1]
images = images / 255.0
return (images, labels)
(train_images, train_labels) = from_files_to_nparray(training_path)
(test_images, test_labels) = from_files_to_nparray(test_path)
At the end I have these shapes:
Train images shape : (8000, 128, 128)
Labels images shape : (8000,)
Test images shape : (2000, 128, 128)
Test images shape : (2000,)
After training the same model (but with the last dense layer format by 2 neurons) I have this result, that should be ok:
Train images shape : (8000, 28, 28)
Labels images shape : (8000,)
Test images shape : (2000, 28, 28)
Test images shape : (2000,)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 128) 100480
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
Total params: 100,738
Trainable params: 100,738
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
2018-07-27 15:25:51.283117: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
8000/8000 [==============================] - 1s 66us/step - loss: 0.6924 - acc: 0.5466
Epoch 2/5
8000/8000 [==============================] - 0s 39us/step - loss: 0.6679 - acc: 0.5822
Epoch 3/5
8000/8000 [==============================] - 0s 41us/step - loss: 0.6593 - acc: 0.6048
Epoch 4/5
8000/8000 [==============================] - 0s 39us/step - loss: 0.6545 - acc: 0.6134
Epoch 5/5
8000/8000 [==============================] - 0s 39us/step - loss: 0.6559 - acc: 0.6039
2000/2000 [==============================] - 0s 33us/step
Test accuracy: 0.592
Now, the question is, if I try to change the input size from 28x28 to, for example 128x128 the result is this:
Train images shape : (8000, 128, 128)
Labels images shape : (8000,)
Test images shape : (2000, 128, 128)
Test images shape : (2000,)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 16384) 0
_________________________________________________________________
dense (Dense) (None, 128) 2097280
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
Total params: 2,097,538
Trainable params: 2,097,538
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
2018-07-27 15:27:41.966860: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
8000/8000 [==============================] - 4s 483us/step - loss: 8.0341 - acc: 0.4993
Epoch 2/5
8000/8000 [==============================] - 3s 362us/step - loss: 8.0590 - acc: 0.5000
Epoch 3/5
8000/8000 [==============================] - 3s 351us/step - loss: 8.0590 - acc: 0.5000
Epoch 4/5
8000/8000 [==============================] - 3s 342us/step - loss: 8.0590 - acc: 0.5000
Epoch 5/5
8000/8000 [==============================] - 3s 342us/step - loss: 8.0590 - acc: 0.5000
2000/2000 [==============================] - 0s 217us/step
Test accuracy: 0.5
Why? Though adding a new dense layer or increasing the neuron numbers the result is the same.
What is the connection between the input size and the model layers? Thanks!
The problem is that you have more parameters to train in the second example. In the first example you just have 100k Parameters. You train them with 8k images.
In the second example you have 2000k Parameters and you try to train them with the same amount of images. This does not work because there is a relation between the free parameters and the number of samples. There is no exact formula to calculate this relation, but there is a rule of thumb that you should have more samples than trainable parameters.
What you can try it to train more epochs and to look how it works but in general you need more data for more complex models.