My problem is to create an autoencoder model to recognize anomalies on a cardboard-like surface. I know that I can use an autoencoder and train it using "good" samples, and later I can use it to recognize "bad" samples (i.e., anomalies).
I've built convolutional autoencoders in TensorFlow based on Building autoencoder in Keras and the PyImageSearch autoencoder tutorial. Both examples use the MNIST dataset and work perfectly (loss decreases, accuracy climbs to about 0.85). However, when I train either autoencoder on my custom pictures, the models don't converge: the training loss (I tried binary_crossentropy and mse, as used in those articles) gets stuck at some level, e.g. 3.0873e-04 or 0.0016 (depending on the loss and the way the data is normalized), and accuracy is 0 or something like 1.2618e-07. Below is the sample network architecture.
input_layer = layers.Input(shape=(28, 28, 1))
data_augmentation = tf.keras.Sequential()
data_augmentation.add(tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'))
data_augmentation.add(tf.keras.layers.experimental.preprocessing.RandomFlip('vertical'))
x = data_augmentation(input_layer, 0)
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
# the block below was added once and removed later to reduce network depth
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
# the block below was added once and removed later to reduce network depth - END
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid')(x)
autoencoder = Model(input_layer, decoded)
My dataset consists of about 50k pictures of size 50x50 (resized to 28x28) showing tiny parts of a cardboard-like sheet: Example 1, Example 2, and here you can see the source 900x900 picture I used to create the 50x50 pictures. For the model, I convert them to grayscale.
I used two ways of data normalization: the first one (taken from the websites) was to divide the values by 255, and the second was min-max normalization. The pixel values are in the range 59-168. Here is how I create the dataset:
imgs = [cv2.imread(fname) for fname in glob.glob('{}/*.jpg'.format(dir_with_pictures))]
imgs = [img for img in imgs if img is not None] # to remove pictures which OpenCV was not able to load for some reason
dataset = np.array([np.expand_dims(np.array(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (28, 28))), axis=-1) for img in imgs])
dataset = (dataset - np.min(dataset)) / (np.max(dataset) - np.min(dataset)) # one way of normalization, OR
dataset = dataset.astype('float32') / 255.0 # second way of normalization
And here is how I compile my model; later it's trained on my dataset - either the whole dataset, or the training set created by splitting the dataset 80/20:
autoencoder.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['accuracy'])
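For completeness, here is a minimal sketch of what the 80/20 split and the fit could look like (the exact call isn't shown above; train_test_split from scikit-learn is just one way to split, and an autoencoder is fit with the images as both input and target):
from sklearn.model_selection import train_test_split

# Sketch: 80/20 split and training on reconstruction of the input itself
x_train, x_val = train_test_split(dataset, test_size=0.2, random_state=42)

history = autoencoder.fit(
    x_train, x_train,
    validation_data=(x_val, x_val),
    epochs=50,
    batch_size=32,
)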
Is it possible that the items in my dataset are 'too similar' to each other and that's why I can't train a good model? Or might there be something else wrong with the dataset? What can I try to get a converging model? Should I pay attention to the accuracy metric?
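For context, the plan described at the top (train on good samples, flag bad ones) typically works by thresholding the per-image reconstruction error rather than by any accuracy metric. A minimal sketch of that scoring step (the percentile threshold is purely illustrative):
import numpy as np

# Per-image reconstruction error (MSE over pixels); a high error suggests an anomaly
reconstructions = autoencoder.predict(dataset)
errors = np.mean(np.square(dataset - reconstructions), axis=(1, 2, 3))

# Illustrative threshold, e.g. a high percentile of errors on known-good samples
threshold = np.percentile(errors, 99)
is_anomaly = errors > threshold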
Related
I need to use a CNN for an audio task, and I chose to do something a little more interesting than the obvious audio classifier, so I set out to implement a convolutional denoising autoencoder. Applying a simple image-denoising architecture to the task has proven pretty unsuccessful, though. I'm using Keras on Google Colab with their cloud GPU, and I represent my audio with an STFT. Below are my architecture and results (links to spectrograms) so far; any advice?
input_layer = Input(shape=X_train[0].shape)
reshaped_input = Reshape([X_train[0].shape[0], X_train[0].shape[1], 1],
                         input_shape=X_train[0].shape)(input_layer)
skip0 = Conv2D(64, (3, 3), activation='relu', padding='same')(reshaped_input)
h = BatchNormalization()(skip0)
h = MaxPooling2D((2, 2), padding='same')(h)
# decoder
h = Conv2D(64, (3, 3), activation='relu', padding='same')(h)
h = BatchNormalization()(h)
h = UpSampling2D((2, 2))(h)
h = add([h, skip0])
h = Conv2D(1, (3, 3), activation='relu', padding='same')(h)
output_layer = add([h, reshaped_input])
reshaped_output = Reshape(X_train[0].shape)(output_layer)
autoencoder = Model(input_layer, reshaped_output)
autoencoder.summary()
autoencoder.compile(loss='mse', optimizer='adam')
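The fit call and the noise injection aren't shown above; here is a minimal sketch of how they might look, assuming X_train holds the clean STFT magnitude spectrograms (the noise scale of 0.1 is illustrative):
import numpy as np

# Scaled-down Gaussian noise added to the clean spectrograms (scale is illustrative)
X_train_noisy = X_train + 0.1 * np.random.randn(*X_train.shape).astype(X_train.dtype)

# Denoising autoencoder: map the noisy spectrogram back to the clean one
autoencoder.fit(X_train_noisy, X_train,
                epochs=50,
                batch_size=32,
                validation_split=0.1)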
Clean Recording
Noisy Recording (I simply added scaled-down Gaussian noise)
Prediction
I've adapted a simple CNN from a tutorial on Analytics Vidhya.
The problem is that my accuracy on a holdout set is no better than random. I am training on ~8,600 images each of cats and dogs, which should be enough data for a decent model, but accuracy on the test set is at 49%. Is there a glaring omission in my code somewhere?
import os
import numpy as np
import keras
from keras.models import Sequential
from sklearn.model_selection import train_test_split
from datetime import datetime
from PIL import Image
from keras.utils.np_utils import to_categorical
from sklearn.utils import shuffle
def main():
    cat = os.listdir("train/cats")
    dog = os.listdir("train/dogs")
    filepath = "train/cats/"
    filepath2 = "train/dogs/"
    print("[INFO] Loading images of cats and dogs each...", datetime.now().time())
    #print("[INFO] Loading {} images of cats and dogs each...".format(num_images), datetime.now().time())
    images = []
    label = []
    for i in cat:
        image = Image.open(filepath + i)
        image_resized = image.resize((300, 300))
        images.append(image_resized)
        label.append(0)  # for cat images
    for i in dog:
        image = Image.open(filepath2 + i)
        image_resized = image.resize((300, 300))
        images.append(image_resized)
        label.append(1)  # for dog images
    images_full = np.array([np.array(x) for x in images])
    label = np.array(label)
    label = to_categorical(label)
    images_full, label = shuffle(images_full, label)
    print("[INFO] Splitting into train and test", datetime.now().time())
    (trainX, testX, trainY, testY) = train_test_split(images_full, label, test_size=0.25)
    filters = 10
    filtersize = (5, 5)
    epochs = 5
    batchsize = 32
    input_shape = (300, 300, 3)
    #input_shape = (30, 30, 3)
    print("[INFO] Designing model architecture...", datetime.now().time())
    model = Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='same',
                                                data_format="channels_last", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(units=2, input_dim=50, activation='softmax'))
    #model.add(keras.layers.Dense(units=2, input_dim=5, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    print("[INFO] Fitting model...", datetime.now().time())
    model.fit(trainX, trainY, epochs=epochs, batch_size=batchsize, validation_split=0.3)
    model.summary()
    print("[INFO] Evaluating on test set...", datetime.now().time())
    eval_res = model.evaluate(testX, testY)
    print(eval_res)

if __name__ == "__main__":
    main()
For me the problem comes from the size of your network: you have only one Conv2D layer with 10 filters. That is far too small to learn a deep representation of your images.
Try increasing this a lot by using blocks from common architectures like VGGNet!
Example of a block :
x = Conv2D(32, (3, 3) , padding='SAME')(model_input)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3) , padding='SAME')(x)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
You need to stack several blocks like that, increasing the number of filters, in order to capture deeper features.
Another thing: you don't need to specify the input_dim of your dense layer; Keras automatically takes care of that!
Last but not least, you need a fully connected network in order to correctly classify your images, not just a single layer.
For example :
x = Flatten()(x)
x = Dense(256)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(128)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(2)(x)
x = Activation('softmax')(x)
Try those changes and keep me posted!
Update after OP's questions
Images are complex; they contain a lot of information like shapes, edges, colors, etc.
In order to capture the maximum amount of information, you need to pass it through multiple convolutions, which will learn the different aspects of the image.
Imagine, for example, that the first convolution learns to recognise squares, the second circles, the third edges, etc.
And for my second point: the final fully connected part acts as the classifier. The conv network outputs a vector that "represents" a dog or a cat, and you then need to learn whether that kind of vector belongs to one class or the other.
Directly feeding that vector into the final layer is not enough to learn this representation.
Is that clearer?
Last update for OP's second comment
Here are the two ways of defining a Keras model; both produce the same thing!
model_input = Input(shape=(200, 1))
x = Dense(32)(model_input)
x = Dense(16)(x)
x = Activation('relu')(x)
model = Model(inputs=model_input, outputs=x)
model = Sequential()
model.add(Dense(32, input_shape=(200, 1)))
model.add(Dense(16, activation = 'relu'))
Example of architecture
model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Don't forget to normalize your data before feeding it into your network.
A simple images_full = images_full / 255.0 on your data can boost your accuracy a lot.
Try it with grayscale images too; it's more computationally efficient.
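A minimal sketch of how the loading loop from the question could be adapted for grayscale and [0, 1] scaling (PIL's convert('L') yields a single-channel image, so input_shape becomes (300, 300, 1)):
# Grayscale loading + normalization (sketch; adapted from the question's loop)
image = Image.open(filepath + i).convert('L')      # 'L' = 8-bit grayscale
image_resized = image.resize((300, 300))
images.append(np.array(image_resized) / 255.0)     # scale pixel values to [0, 1]
# later: images_full = np.array(images)[..., np.newaxis]  # add the channel axis -> (N, 300, 300, 1)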
I was told that using data augmentation would help me get more accurate predictions on handwritten digits I've extracted from a document (so they're not in the MNIST dataset I'm using), so I used it in my model. However, I'm curious whether I did it correctly: the size of the training set before data augmentation was 60,000, but after adding data augmentation it went down to 3,750 per epoch. Am I doing this correctly?
Following the data augmentation part of this tutorial, I've adjusted it so that it works with how I create and train my model. I've also left out optional parameters in the functions that I don't really understand at the moment.
The current model I'm using is from one of the answers I got from my previous question as it performs better than the second model I've cobbled together for the sake of trying. I think the only change I made is to make it use sparse_categorical_crossentropy instead for loss, as I'm categorizing digits and handwritten characters can't belong to two categories of digits, uh, right?
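For reference on that loss choice: sparse_categorical_crossentropy expects integer class labels, while categorical_crossentropy expects one-hot vectors. A tiny illustration:
import keras

y_int = [3, 0, 7]                                             # integer labels -> sparse_categorical_crossentropy
y_onehot = keras.utils.to_categorical(y_int, num_classes=10)  # one-hot labels -> categorical_crossentropy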
def createModel():
    model = keras.models.Sequential()
    # 1st conv & maxpool
    model.add(keras.layers.Conv2D(40, (5, 5), padding="same", activation='relu', input_shape=(28, 28, 1)))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # 2nd conv & maxpool
    model.add(keras.layers.Conv2D(200, (3, 3), padding="same", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
    # 3rd conv & maxpool
    model.add(keras.layers.Conv2D(512, (3, 3), padding="valid", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
    # reduces dims from 2d to 1d
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(units=100, activation='relu'))
    # dropout for preventing overfitting
    model.add(keras.layers.Dropout(0.5))
    # final fully-connected layer
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
Training is done by a separate function, and this is where I plugged in the data augmentation part:
def trainModel(file_model, epochs=5, create_new=False, show=False):
    model = createModel()
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape((60000, 28, 28, 1)) / 255.0
    x_test = x_test.reshape((10000, 28, 28, 1)) / 255.0
    x_gen = np.array(x_train, copy=True)
    y_gen = np.array(y_train, copy=True)
    datagen = keras.preprocessing.image.ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True, rotation_range=20)
    datagen.fit(x_gen)
    x_train = np.concatenate((x_train, x_gen), axis=0)
    y_train = np.concatenate((y_train, y_gen), axis=0)
    # if prev model exists and new model isn't needed..
    if (os.path.exists(file_model)==True and create_new==False):
        model = keras.models.load_model(file_model)
    else:
        history = model.fit_generator(datagen.flow(x_train, y_train), epochs=epochs, validation_data=(x_test, y_test))
        model.save(file_model)
    if (show==True):
        model.summary()
    return model
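As an aside, datagen.flow yields batches (defaulting to batch_size=32), and fit_generator reports steps (batches) per epoch rather than individual samples; 120,000 samples / 32 = 3,750 steps. A sketch of the same call with the batch size and steps_per_epoch made explicit:
# Sketch: the fit_generator call from above with batch size and steps made explicit
batch_size = 32
steps = len(x_train) // batch_size   # 120000 // 32 == 3750 batches per epoch

history = model.fit_generator(
    datagen.flow(x_train, y_train, batch_size=batch_size),
    steps_per_epoch=steps,
    epochs=epochs,
    validation_data=(x_test, y_test))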
I would expect it to greatly help with correctly identifying the handwritten characters, assuming it was used correctly. But I'm not even sure I did it correctly, such that it actually contributes much to the model's accuracy.
EDIT: It did help a bit in identifying some of the extracted characters, but the model still didn't get the majority of the extracted characters right, leading me to doubt whether I have implemented it properly.
I am working on a regression CNN using Keras/Tensorflow. I have a multi-output feed-forward model that I have trained up with some success. The model takes in a 201x201 grayscale image and returns two regression targets.
Here is an example of an input/target pair:
The example input image is associated with (z=562.59, a=4.53).
There exists an analytical solution for this problem, so I know it's solvable.
Here is the model architecture:
model_input = keras.Input(shape=input_shape, name='image')
x = model_input
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(16, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (4,4))(x)
x = Flatten()(x)
model_outputs = list()
out_names = ['z', 'a']
for i in range(2):
    out_name = out_names[i]
    local_output = x
    local_output = Dense(10, activation='relu')(local_output)
    local_output = Dropout(0.2)(local_output)
    local_output = Dense(units=1, activation='linear', name=out_name)(local_output)
    model_outputs.append(local_output)
model = Model(model_input, model_outputs)
model.compile(loss = 'mean_squared_error', optimizer='adam', loss_weights = [1,1])
My targets are on different scales, so I normalized one of them (named 'a') to the range [0, 1] for training. Here is how I rescale:
def rescale(min, max, list):
    scalar = 1. / (max - min)
    list = (list - min) * scalar
    return list
Here min and max for each parameter are known a priori and are constant.
Here is how I trained:
model.fit({'image': x_train},
          {'z': z_train, 'a': a_train},
          batch_size=32,
          epochs=20,
          verbose=1,
          validation_data=({'image': x_test},
                           {'z': z_test, 'a': a_test}))
When I predict for 'a', I get a fairly good accuracy, but with an offset:
This is a fairly easy thing to fix, I just apply a linear fit to the predictions and invert it to rescale:
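A minimal sketch of that linear-fit correction (variable names are illustrative; a_true and a_pred stand for the ground truth and the model's predictions for 'a' on a held-out set):
import numpy as np

# Fit a_pred ≈ slope * a_true + intercept, then invert it to remove the offset
slope, intercept = np.polyfit(a_true, a_pred.ravel(), deg=1)
a_pred_corrected = (a_pred.ravel() - intercept) / slope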
But I can't think of a reason why this would be happening in the first place. I've used this same model architecture for other problems, and I get that same offset again. Has anyone seen this sort of thing before?
EDIT: This offset occurs in multiple different models of mine, which each predict different parameters but are rescaled/preprocessed in the same way. It happens regardless of how many epochs I train for, with more training resulting in predictions hugging the green line (in the first graph) more closely.
As a temporary work-around, I trained a single-node model to take the input as the original model's prediction and the output as the ground truth. This trained up nicely, and corrects the offset. What's strange though, is that I can apply this rescale model to ANY of the models with this issue, and it corrects the offset equally well.
Essentially, the offset has the same weight for multiple different models that predict completely different parameters. This makes me think it has something to do with the activation or regularization.
I want to use autoencoders on real-life photos (and not simple MNIST digits). I have taken the cats and dogs dataset and train with it. My parameters are:
I stick with grayscale and a scaled-down 128x128 px version of the images, and do some preprocessing in the ImageDataGenerator for data augmentation.
I train with different datasets of about 2,000 images of cats and dogs. I could take 10,000, but it takes too long.
I chose a convolutional network with basic downsamplers and upsamplers, toyed with the parameters, and ended up with a bottleneck of 8x8x8 = 512 (1/32 of the original 128x128 px image).
Here is the python code:
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import metrics
from keras.callbacks import EarlyStopping
import os
root_dir = '/opt/data/pets'
epochs = 400 # epochs of training, the more the better
batch_size = 64 # number of images to be yielded from the generator per batch
seed = 4321 # constant seed for constant conditions
# keras image input type definition
img_channel = 1 # 1 for grayscale, 3 for color
# dimension of input image for network, the bigger the more CPU and RAM is used
img_x, img_y = 128, 128
input_img = Input(shape = (img_x, img_y, img_channel))
# this is the augmentation configuration we use for training
train_datagen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
# this is the augmentation configuration we will use for testing
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
root_dir + '/train', # this is the target directory
target_size=(img_x, img_y), # all images will be resized
batch_size=batch_size,
color_mode='grayscale',
class_mode='input', # necessary for an autoencoder
shuffle=False, # important to keep filenames aligned with labels
seed = seed)
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
root_dir + '/validation',
target_size=(img_x, img_y),
batch_size=batch_size,
color_mode='grayscale',
class_mode='input', # necessary for an autoencoder
shuffle=False, # important to keep filenames aligned with labels
seed = seed)
# create convolutional autoencoder inspired from https://blog.keras.io/building-autoencoders-in-keras.html
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(img_channel, (3, 3), activation='sigmoid', padding='same')(x) # example from the documentation
autoencoder = Model(input_img, decoded)
autoencoder.summary() # show model data
autoencoder.compile(optimizer='sgd',loss='mean_squared_error',metrics=[metrics.mae, metrics.categorical_accuracy])
# do not run forever but stop if model does not get better
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, mode='auto', verbose=1)
# do the actual fitting
autoencoder_train = autoencoder.fit_generator(
train_generator,
validation_data=validation_generator,
epochs=epochs,
shuffle=False,
callbacks=[stopper])
# create an encoder for debugging purposes later
encoder = Model(input_img, encoded)
# save the model parameters to a file
autoencoder.save(os.path.basename(__file__) + '_model.hdf')
## PLOTS ####################################
import matplotlib.pyplot as plt
# Plot loss over epochs
print(autoencoder_train.history.keys())
plt.plot(autoencoder_train.history['loss'])
plt.plot(autoencoder_train.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()
# Plot original, encoded and predicted image
import numpy as np
images_show_start = 1
images_show_stop = 20
images_show_number = images_show_stop - images_show_start +1
images,_ = train_generator.next()
plt.figure(figsize=(30, 5))
for i in range(images_show_start, images_show_stop):
# original image
ax = plt.subplot(3, images_show_number, i +1)
image = images[i,:,:,0]
image_reshaped = np.reshape(image, [1, 128, 128, 1])
plt.imshow(image,cmap='gray')
# label
image_label = os.path.dirname(validation_generator.filenames[i])
plt.title(image_label) # only OK if shuffle=false
# encoded image
ax = plt.subplot(3, images_show_number, i + 1+1*images_show_number)
image_encoded = encoder.predict(image_reshaped)
# adjust shape if the network parameters are adjusted
image_encoded_reshaped = np.reshape(image_encoded, [16,32])
plt.imshow(image_encoded_reshaped,cmap='gray')
# predicted image
ax = plt.subplot(3, images_show_number, i + 1+ 2*images_show_number)
image_pred = autoencoder.predict(image_reshaped)
image_pred_reshaped = np.reshape(image_pred, [128,128])
plt.imshow(image_pred_reshaped,cmap='gray')
plt.show()
In the network configuration above you can see the layers.
What do you think? Is it too deep or too simple? What adjustments could one make?
The loss decreased over the epochs as it should.
And here we have three images in each column:
the original (scaled-down) image,
the encoded image, and
the predicted image.
So I wonder why the encoded images look quite similar in character (beyond the fact that they are all cats), with lots of vertical lines. The encoded images are quite big at 8x8x8 pixels, which I plot as 16x32 pixels; that is 1/32 of the pixel count of the original images.
Is the quality of the decoded image sufficient for that?
Can it somehow be improved? Can I even make a smaller bottleneck in the autoencoder? If I try a smaller bottleneck, the loss gets stuck at 0.06 and the predicted images are very bad.
Your model contains only very few parameters (~32,000). These might simply not be enough to process the data and gain insight into the data-generating probability distribution.
Your convolutions always decrease the image size by a factor of 2, but you do not increase the number of filters. This means that your convolutions are not volume-preserving but actually strongly shrinking; this might simply be too aggressive.
I would first try to increase the number of parameters and check whether this makes the images less blurry. Then, if the images actually get better with more parameters (they should, as the compression level is now lower than before), you can decrease the number of parameters (i.e. the size of the compressed state) again. This approach can help you spot other problems in your code.
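To illustrate the point about shrinking volume, here is a sketch of an encoder that doubles the number of filters each time the spatial resolution is halved (reusing the layer names from the question; the filter counts are illustrative, not a prescription):
# Roughly volume-preserving encoder for a 128x128x1 input (sketch)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)         # 64 x 64 x 16
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)         # 32 x 32 x 32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)         # 16 x 16 x 64
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)   #  8 x  8 x 128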
Maybe you can take a look at existing autoencoder implementations in Keras that work on different datasets (which also feature more complex data), like this one which uses CIFAR10.
The black lines in the encoded-state images might just come from the way you plot the data. As your data in this layer has depth 8 rather than 1, you have to reshape it. If the original cube had lower values at the borders (which makes sense, as there is most likely not much important information there), you rearrange the dark/black surface of the cube and project it onto a 2D plane; this can then look like these repetitive black lines.
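One way to check this is to plot the 8 channels of the 8x8x8 encoded volume as separate feature maps instead of reshaping them into a single 16x32 image; a small sketch using the variables from the question's plotting code:
# Show each of the 8 encoded channels as its own 8x8 feature map
image_encoded = encoder.predict(image_reshaped)      # shape (1, 8, 8, 8)
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for c in range(8):
    axes[c].imshow(image_encoded[0, :, :, c], cmap='gray')
    axes[c].axis('off')
plt.show()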
Furthermore, considering the loss plot of the network, it might also be the case that the training has not converged yet, so the quality of the images might still improve if you continue training.
Lastly, you should use all available training images and not just a small subset. This will (of course) increase the training time, but the results of the encoder will be much better, as the network will be more resistant to overfitting and most likely better able to generalize.
Shuffling your data might also improve training performance.
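For the shuffling point, a small change in the training generator would be enough (the question sets shuffle=False to keep filenames aligned with labels, so this applies to the training generator only):
# Sketch: shuffle training batches, keep the validation generator ordered
train_generator = train_datagen.flow_from_directory(
    root_dir + '/train',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',
    shuffle=True,   # reshuffle training data every epoch
    seed=seed)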