Why are accuracy and loss staying exactly the same while training? - python

So I have tried to modify the introductory tutorial from https://www.tensorflow.org/tutorials/keras/basic_classification to work with my own data. The goal is to classify images of dogs and cats. The code is very simple and given below. The problem is that the network does not seem to learn at all; training loss and accuracy stay the same after every epoch.
The images (X_training) and the labels (y_training) seem to have the right format:
X_training.shape returns: (18827, 80, 80, 3)
y_training is a one-dimensional list with entries in {0, 1}
I have checked several times that the "images" in X_training are correctly labeled: if X_training[i,:,:,:] represents a dog, then y_training[i] is 1; if X_training[i,:,:,:] represents a cat, then y_training[i] is 0.
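For reference, a quick sanity check along these lines could look like the sketch below (hypothetical helper code, not part of my actual script; it assumes matplotlib is available):
import numpy as np
import matplotlib.pyplot as plt
# Label values and their counts; this should only contain 0 (cat) and 1 (dog)
values, counts = np.unique(y_training, return_counts=True)
print(dict(zip(values, counts)))
# Visually confirm that a sample image matches its label
i = 0
plt.imshow(X_training[i])
plt.title("dog" if y_training[i] == 1 else "cat")
plt.show()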
Shown below is the complete Python file without the import statements.
#loading the data from 4 pickle files:
pickle_in = open("X_training.pickle","rb")
X_training = pickle.load(pickle_in)
pickle_in = open("X_testing.pickle","rb")
X_testing = pickle.load(pickle_in)
pickle_in = open("y_training.pickle","rb")
y_training = pickle.load(pickle_in)
pickle_in = open("y_testing.pickle","rb")
y_testing = pickle.load(pickle_in)
#normalizing the input data:
X_training = X_training/255.0
X_testing = X_testing/255.0
#building the model:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(80, 80, 3)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',loss='mean_squared_error',metrics=['accuracy'])
#running the model:
model.fit(X_training, y_training, epochs=10)
The code compiles and trains for 10 epochs, but neither loss nor accuracy improves; they stay exactly the same after every epoch.
The code works fine with the Fashion-MNIST dataset used in the tutorial, with slight changes accounting for the difference between multiclass and binary classification and the different input shape.
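For reference, the Fashion-MNIST version I had working was essentially the tutorial model, roughly like this (written from memory, so the exact values are approximate):
# Multiclass variant that worked on Fashion-MNIST (10 classes, 28x28 grayscale images)
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])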

If you want to train a classification model, you should use binary_crossentropy as your loss function, not mean_squared_error, which is meant for regression tasks.
Replace
model.compile(optimizer='adam',loss='mean_squared_error',metrics=['accuracy'])
with
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
Furthermore, I would recommend not using a relu activation on your dense layer but a linear one.
Replace
keras.layers.Dense(128, activation=tf.nn.relu),
with
keras.layers.Dense(128),
And of course, to make better use of the power of neural networks, add some convolutional layers before your Flatten layer, for example as in the sketch below.
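A minimal sketch of what that could look like (the layer sizes here are just an example, not tuned for this dataset):
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(80, 80, 3)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])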

I have found a different implementation with a slightly more complex model that works.
Here is the complete code without the import statements:
#global variables:
batch_size = 32
nr_of_epochs = 64
input_shape = (80,80,3)
#loading the data from 4 pickle files:
pickle_in = open("X_training.pickle","rb")
X_training = pickle.load(pickle_in)
pickle_in = open("X_testing.pickle","rb")
X_testing = pickle.load(pickle_in)
pickle_in = open("y_training.pickle","rb")
y_training = pickle.load(pickle_in)
pickle_in = open("y_testing.pickle","rb")
y_testing = pickle.load(pickle_in)
#building the model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
model = define_model()
#Possibility for image data augmentation
train_datagen = ImageDataGenerator(rescale=1.0/255.0)
val_datagen = ImageDataGenerator(rescale=1./255.)
train_generator =train_datagen.flow(X_training,y_training,batch_size=batch_size)
val_generator = val_datagen.flow(X_testing,y_testing,batch_size= batch_size)
#running the model
history = model.fit_generator(train_generator, steps_per_epoch=len(X_training) // batch_size,
                              epochs=nr_of_epochs, validation_data=val_generator,
                              validation_steps=len(X_testing) // batch_size)

Related

Why can't I train the ANN for XNOR?

I have made a simple NN for computing XNOR, with the two binary values in the input layer.
I have a NumPy array of all the possible input combinations with their labels.
Code :
from keras.models import Sequential
from keras.layers import Dense
import numpy
data = numpy.array([[0.,0.,1.],[0.,1.,0.],[1.,0.,0.],[1.,1.,1.]])
train = data[:,:-1] # Taking The same and All data for training
test = data[:,:-1]
train_l = data[:,-1]
test_l = data[:,-1]
train_label = []
test_label = []
for i in train_l:
    train_label.append([i])
for i in test_l:
    test_label.append([i]) # Just made Labels Single element...
train_label = numpy.array(train_label)
test_label = numpy.array(test_label) # Numpy Conversion
model = Sequential()
model.add(Dense(2,input_dim = 2,activation = 'relu'))
model.add(Dense(2,activation = 'relu'))
model.add(Dense(1,activation = 'relu'))
model.compile(loss = "binary_crossentropy" , metrics = ['accuracy'], optimizer = 'adam')
model.fit(train,train_label, epochs = 10, verbose=2)
model.predict_classes(test)
Even when using the same dataset to train and to test, it doesn't predict properly.
Where am I going wrong?
I have deliberately used the whole dataset, as it wasn't predicting correctly with just 2 values.
Your architecture is just too simple for this function. If you use the architecture below and train for 100 epochs, you'll get accuracy = 1.
model = Sequential()
model.add(Dense(20,input_dim = 2,activation = 'relu'))
model.add(Dense(20,activation = 'relu'))
model.add(Dense(1,activation = 'sigmoid'))
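For completeness, a minimal way to train and check this, reusing the compile settings and arrays from the question (the 100 epochs figure comes from the text above):
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(train, train_label, epochs=100, verbose=2)
print(model.predict_classes(test))  # expected output: [[1], [0], [0], [1]]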
UPD:
Why doesn't a simpler model work as well?
One reason is that with a ReLU activation, if a neuron's input becomes negative on every data point, its output is zero, its gradient becomes zero, and its weights stop training. You have few neurons to start with, and if some of them "die" this way, the remaining neurons may not be enough to approximate the function.
Another problem is that fewer neurons make it more likely for the model to get stuck in a local minimum.
However, you are right that theoretically, just a few neurons should be enough.
The model below works even with just one hidden layer. I've replaced ReLU with LeakyReLU to remedy the first problem. It works most of the time, but sometimes gets stuck in a local minimum.
from keras.layers import LeakyReLU
from keras.optimizers import Adam
model = Sequential()
model.add(Dense(2, input_dim=2, activation=LeakyReLU(alpha=0.3)))
model.add(Dense(1, activation='sigmoid'))
optimizer = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", metrics=['accuracy'], optimizer=optimizer)
model.fit(train, train_label, epochs=500, verbose=2)
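If you want to see the dying-ReLU effect described above directly, here is a small diagnostic sketch (not from the original answer; it assumes the all-ReLU model from the question has been built and trained):
from keras.models import Model
# Helper model that outputs the activations of the first hidden layer
hidden_out = Model(inputs=model.input, outputs=model.layers[0].output)
activations = hidden_out.predict(train)  # shape: (4, number_of_hidden_units)
# A unit that outputs 0 for every input contributes nothing and receives no gradient
dead_units = (activations == 0).all(axis=0)
print("dead hidden units:", dead_units)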

Reset all weights and biases of the model in Keras (restore the model after training)

Suppose that I have something like this.
model = Sequential()
model.add(LSTM(units = 10, input_shape = (x1, x2)))
model.add(Activation('tanh'))
model.compile(optimizer = 'adam', loss = 'mse')
## Step 1.
model.fit(X_train, Y_train, epochs = 10)
After training the model, I want to reset everything (weights and biases) in the model. So I want to restore the model to the state it was in right after the compile call, undoing Step 1. What is the fastest way to do that in Keras?
Whether it's the fastest is probably up in the air, but it's certainly straightforward and might be good enough for your case: Serialize the initial weights, then deserialize when necessary, and use something like io.BytesIO to avoid the disk I/O hit (and having to clean up afterwards):
from io import BytesIO
model = Sequential()
model.add(LSTM(units = 10, input_shape = (x1, x2)))
model.add(Activation('tanh'))
model.compile(optimizer = 'adam', loss = 'mse')
f = BytesIO()
model.save_weights(f) # Stores the weights
model.fit(X_train, Y_train, epochs = 10)
# [Do whatever you want with your trained model here]
model.load_weights(f) # Resets the weights
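Another in-memory option along the same lines (just a sketch, not necessarily faster) is to snapshot the weights as plain arrays with get_weights() right after compiling and restore them with set_weights():
initial_weights = model.get_weights() # Copy of the freshly initialized weights
model.fit(X_train, Y_train, epochs = 10)
# [Do whatever you want with your trained model here]
model.set_weights(initial_weights) # Resets the weights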

Extract features from 2 auto-encoders and feed them into an MLP

I understand that the features extracted from an auto-encoder can be fed into an MLP for classification or regression purposes. This is something that I did earlier.
But what if I have 2 auto-encoders? Can I extract the features from the bottleneck layers of the 2 auto-encoders and feed them into an MLP which performs classification based on these features? If yes, then how? I am not sure how to concatenate these two feature sets. I tried numpy.hstack(), which gives me an 'unhashable slice' error, whereas using tf.concat() gives me the error 'Input tensors to a Model must be Keras tensors.' The bottleneck layers of the two auto-encoders have dimension (None, 100) each, so if I stack them horizontally I should get (None, 200). The hidden layer of the MLP may contain some (num_hidden=100) neurons. Could anyone please help?
x1 = autoencoder1.get_layer('encoder2').output
x2 = autoencoder2.get_layer('encoder2').output
#inp = np.hstack((x1, x2))
inp = tf.concat([x1, x2], 1)
x = tf.concat([x1, x2], 1)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inp, outputs=y)
# Compile model
mymlp.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train model
mymlp.fit(x_train, y_train, epochs=20, batch_size=8)
Updated as per @twolffpiggott's suggestion:
from keras.layers import Input, Dense, Dropout
from keras import layers
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
x1 = Data1
x2 = Data2
y = Data3
num_neurons1 = x1.shape[1]
num_neurons2 = x2.shape[1]
# Train-test split
x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(x1, x2, y, test_size=0.2)
# scale data within [0-1] range
scalar = MinMaxScaler()
x1_train = scalar.fit_transform(x1_train)
x1_test = scalar.transform(x1_test)
x2_train = scalar.fit_transform(x2_train)
x2_test = scalar.transform(x2_test)
x_train = np.concatenate([x1_train, x2_train], axis =-1)
x_test = np.concatenate([x1_test, x2_test], axis =-1)
# Auto-encoder1
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons1,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded1 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded1)
decoded = Dense(num_neurons1, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder1 = Model(inputs=input_data, outputs=decoded)
autoencoder1.compile(optimizer='sgd', loss='mse')
# training
autoencoder1.fit(x1_train, x1_train,
                 epochs=100,
                 batch_size=8,
                 shuffle=True,
                 validation_data=(x1_test, x1_test))
# Auto-encoder2
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons2,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded2 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded2)
decoded = Dense(num_neurons2, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder2 = Model(inputs=input_data, outputs=decoded)
autoencoder2.compile(optimizer='sgd', loss='mse')
# training
autoencoder2.fit(x2_train, x2_train,
                 epochs=100,
                 batch_size=8,
                 shuffle=True,
                 validation_data=(x2_test, x2_test))
# MLP
num_hidden = 100
encoded1.trainable = False
encoded2.trainable = False
encoded1 = autoencoder1(autoencoder1.inputs)
encoded2 = autoencoder2(autoencoder2.inputs)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
x = Dropout(0.2)(concatenated)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
# Compile model
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit(x_train, y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict(x_test)
giving me an error: unhashable type: 'list' from the line:
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
The problem is that you're mixing numpy arrays with keras tensors. This can't work.
There are two approaches.
Predict numpy arrays from each autoencoder, concat the arrays, send them to the third model
Connect all models, probably make the autoencoders untrainable, fit with one input for each autoencoder.
Personally, I'd go for the first. (Assuming the autoencoders are already trained and don't need change).
First approach
numpyOutputFromAuto1 = autoencoder1.predict(numpyInputs1)
numpyOutputFromAuto2 = autoencoder2.predict(numpyInputs2)
inputDataForThird = np.concatenate([numpyOutputFromAuto1,numpyOutputFromAuto2],axis=-1)
inputTensorForMlp = Input(inputDataForThird.shape[1:])
h = Dense(num_hidden, activation='relu', name='hidden')(inputTensorForMlp)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inputTensorForMlp, outputs=y)
....
mymlp.fit(inputDataForThird ,someY)
Second Approach
This is a little more complicated, and at first I don't see much reason to do this. (But of course there may be cases where it's a good choice)
Now we're totally forgetting numpy and working with keras tensors.
Creating the mlp on its own (good if you will use it later without the autoencoders):
inputTensorForMlp = Input(input_shape_compatible_with_concatenated_encoder_outputs)
x = Dropout(0.2)(inputTensorForMlp)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=inputTensorForMlp, outputs=y)
We probably want the bottleneck features of the autoencoders, right? If you happened to create each autoencoder properly as an encoder model and a decoder model joined together, then it's easier to just use the encoder model (a sketch of that construction follows the two code lines below). Else:
encodedOutput1 = autoencoder1.layers[bottleneckLayer].output #or encoder1.output
encodedOutput2 = autoencoder2.layers[bottleneckLayer].output #or encoder2.output
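For reference, a sketch of that "encoder model + decoder model, join both" construction for the first autoencoder (dimensions reuse the question's values; the model names are just illustrative):
from keras.layers import Input, Dense
from keras.models import Model
# Encoder as its own model
enc_in = Input(shape=(num_neurons1,))
h = Dense(encoding_dim1, activation='relu')(enc_in)
bottleneck = Dense(encoding_dim2, activation='relu')(h)
encoder1 = Model(enc_in, bottleneck, name='encoder1_model')
# Decoder as its own model (mirroring the question's decoder layers)
dec_in = Input(shape=(encoding_dim2,))
d = Dense(encoding_dim2, activation='relu')(dec_in)
reconstruction = Dense(num_neurons1, activation='sigmoid')(d)
decoder1 = Model(dec_in, reconstruction, name='decoder1_model')
# Joined autoencoder: train this one, then reuse encoder1 directly for its bottleneck features
autoencoder1 = Model(enc_in, decoder1(encoder1(enc_in)), name='autoencoder1')
autoencoder1.compile(optimizer='sgd', loss='mse')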
Creating a joined model. The concatenation must use a keras layer (we're working with keras tensors):
concatenated = Concatenate()([encodedOutput1,encodedOutput2])
output = myMLP(concatenated)
joinedModel = Model([autoencoder1.input,autoencoder2.input],output)
I'd also go with Daniel's first approach (for simplicity and efficiency), but if you're interested in the second, for instance in running the network end-to-end, you'd approach it like this:
# make autoencoders not trainable
autoencoder1.trainable = False
autoencoder2.trainable = False
encoded1 = autoencoder1(input_data1)
encoded2 = autoencoder2(input_data2)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
h = Dense(num_hidden, activation='relu', name='hidden')(concatenated)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model([input_data1, input_data2], y)
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit([x1_train, x2_train], y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict([x1_test, x2_test])
Key edits
The weights of both autoencoders should be frozen end-to-end (otherwise early-stage gradient updates from the randomly initialized MLP will likely result in the loss of much of their learning).
The autoencoder input layers should be assigned to separate variables input_data1 and input_data2 per autoencoder (instead of both to input_data). Because autoencoder1.inputs returns a list of tf tensors, it is the source of the unhashable type: 'list' exception, and replacing it with [input_data1, input_data2] solves the issue.
When fitting the MLP for the end-to-end model, the input should be a list of x1_train and x2_train rather than the concatenated inputs. Same when predicting.

Difference between fit and fit_generator when doing transfer learning in keras

I am trying to train a deep neural network using transfer learning in Keras with TensorFlow. There are different ways to do it. If your data is small, you can afford to compute features with the pre-trained model for the entire dataset and then use those features to train and test a small network; this is good because you don't need to recompute those features for each batch at every epoch. However, if the data is large, it is impossible to compute the features for the entire dataset up front, and in this case we use ImageDataGenerator, flow_from_directory and fit_generator. Then the features are computed each time for each batch at each epoch, which makes things much slower. I was assuming that both approaches produce similar results in terms of accuracy and loss. The problem is that I took a small data-set, tried both approaches, and got completely different results. I would appreciate it if someone could tell me if something is wrong in the provided code and/or why I am getting different results.
Approach when having large data-set:
from keras.applications.inception_v3 import InceptionV3,preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
datagen= ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory('data/train',
                                               class_mode='categorical',
                                               batch_size=64, ...)
valid_generator = datagen.flow_from_directory('data/valid',
                                               class_mode='categorical',
                                               batch_size=64, ...)
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = Conv2D(filters=128, kernel_size=(2, 2))(x)
x = MaxPooling2D()(x)
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',...)
model.fit_generator(generator=train_generator,
                    steps_per_epoch=len(train_generator),
                    validation_data=valid_generator,
                    validation_steps=len(valid_generator),
                    ...)
Approach when having small data-set:
from keras.applications.inception_v3 import InceptionV3,preprocess_input
from keras.models import Sequential
from keras.utils import np_utils
base_model = InceptionV3(weights='imagenet', include_top=False)
train_features = base_model.predict(preprocess_input(train_data))
valid_features = base_model.predict(preprocess_input(valid_data))
model = Sequential()
model.add(Conv2D(filters=128, kernel_size=(2, 2),
                 input_shape=(train_features.shape[1],
                              train_features.shape[2],
                              train_features.shape[3])))
model.add(MaxPooling2D())
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',...)
model.fit(train_features, np_utils.to_categorical(y_train, 2),
          validation_data=(valid_features, np_utils.to_categorical(y_valid, 2)),
          batch_size=64, ...)

VGG16 different epoch and batch size generating the same result

I am trying to learn how VGG16 works. Below is my code, using VGG16 for a different classification task.
# Load the VGG16 convolutional base (include_top=False, so without the fully-connected top)
model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
model_vgg16_conv.summary()
# create your own input format
input = Input(shape=(128,128,3),name = 'image_input')
# Use the generated model
output_vgg16_conv = model_vgg16_conv(input)
# Add the fully-connected layers
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(5, activation='softmax', name='predictions')(x)
#Create your own model
model = Model(inputs=input, outputs=x)
#In the summary, weights and layers from VGG part will be hidden, but they will be fit during the training
model.summary()
# Specify an optimizer to use
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# Choose loss function, optimization method, and metrics (which results to display)
model.compile(
    optimizer=adam,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(X_train,y_train,epochs=10,batch_size=10,verbose=2)
# model.fit(X_train,y_train,epochs=30,batch_size=100,verbose=2)
result = model.predict(y_test) # same result
For some reason, using different epoch and batch sizes generates exactly the same result. Am I doing something wrong?
