Denoising Autoencoder returns a mostly black image - python

I have some faces cropped out of images, and I want to run them through a denoising autoencoder, using code I got from here. When I run the code on the MNIST dataset, the results look fine, like the ones on the website. However, when I run it on my own images, I get back a mostly or completely black image instead of simply the same image without the noise.
This is the original image, for reference, before I resized it, so you can tell how it looks.
This is the image after resizing, which I had to do in order to feed it to the autoencoder; I sized it down to 28x28.
These are the plotted results. In the first row, I expected my original grayscale image, as it was before being fed into the autoencoder. In the second row, I wanted the same image but without the noise. As you can see, I get these odd outputs and I can't tell why.
Here is the code I've tried on the MNIST dataset. For my dataset, I skipped the preprocessing of the MNIST dataset and instead preprocessed my own images (sized them down, made them grayscale... their dimensions are (28, 28, 1), just as the original code intended). I tried changing the number of epochs (I went through 10, 50, and 100), but there was no noticeable difference. I considered changing the layers, but after looking at some papers and other code, the layers seem to be the same as the ones presented. I tried looking up tutorials where the autoencoder works on regular images like mine and not just the MNIST dataset, but I couldn't really find any. I'm also confused as to why, when I plot the original array, I get black squares, even though when I use cv2_imshow to display it I get the image I showed after resizing; I don't know if it's the same issue. I've also tried training the autoencoder on my own dataset (which has 785 images similar to the ones shown above), but to no avail. I've displayed the code I used below, and if anything needed to understand my question is missing, please tell me.
import keras
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
from keras.callbacks import TensorBoard
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1)) # adapt this if using `channels_first` image data format
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
n = 10
plt.figure(figsize=(20, 2))
for i in range(1, n):
    ax = plt.subplot(1, n, i)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (7, 7, 32)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(grayscale, grayscale,
                epochs=100,
                batch_size=128,
                shuffle=True,
                validation_data=(grayscale, grayscale),
                callbacks=[TensorBoard(log_dir='/tmp/tb', histogram_freq=0, write_graph=True)])
Here is the code I used to feed my image into the autoencoder and display the results.
import cv2

arr = cv2.imread('/content/FramesResized/frame0000sec.jpg')
# Converting the image to grayscale
gray = cv2.cvtColor(arr, cv2.COLOR_BGR2GRAY)
# Adding an axis: after the grayscale conversion the image became (28, 28) and I need it to be (28, 28, 1)
if gray.ndim == 2:
    gray = np.expand_dims(gray, axis=2)
# Making a new array to hold my images, of which there is currently only one
grayscale = np.zeros([785, 28, 28, 1], dtype=np.uint8)
grayscale[0] = gray
#Feeding my image into the autoencoder
decoded_imgs = autoencoder.predict(grayscale)
#Plotting the before and after images
plt.figure(figsize=(20, 4))
for i in range(1, n):
    # display original
    ax = plt.subplot(2, n, i)
    plt.imshow(grayscale[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

If anyone is wondering, I believe the issue was that I was applying an autoencoder trained on MNIST digits to complex RGB images, so the autoencoder's reconstruction was very poor.
When I used the CIFAR-100 dataset to train the autoencoder, the results were much more coherent.
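For anyone wondering how that might look, here is a minimal sketch of training the same 28x28x1 autoencoder defined above on CIFAR-100 images converted to grayscale; the epoch count and the OpenCV-based conversion are assumptions, not the exact code I ran.
# Sketch only: reuse the `autoencoder` model defined above (28x28x1 input) and
# train it on CIFAR-100 converted to grayscale, resized to 28x28, scaled to [0, 1].
import cv2
import numpy as np
from keras.datasets import cifar100

(c_train, _), (c_test, _) = cifar100.load_data()

def to_gray28(images):
    # (N, 32, 32, 3) uint8 RGB -> (N, 28, 28, 1) float32 in [0, 1]
    gray = np.stack([cv2.resize(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY), (28, 28))
                     for img in images])
    return gray.astype('float32')[..., np.newaxis] / 255.

c_train, c_test = to_gray28(c_train), to_gray28(c_test)

autoencoder.fit(c_train, c_train,
                epochs=50,
                batch_size=128,
                shuffle=True,
                validation_data=(c_test, c_test))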

Related

Tensorflow convolutional autoencoders don't converge with my data

My problem is to create an autoencoder model to recognize anomalies on a cardboard-like surface. I know that I can use an autoencoder and train it using "good" samples, and later I can use it to recognize "bad" samples (i.e., anomalies).
I've built convolutional autoencoders in Tensorflow based on Building Autoencoders in Keras and the PyImageSearch autoencoder tutorial. Both examples use the MNIST dataset and work perfectly (the loss decreases and accuracy grows to about 0.85). However, when I try to train both autoencoders on my custom pictures, the models don't converge: the training loss (I tried binary_crossentropy and mse as stated on those websites) gets stuck at some level, e.g. 3.0873e-04 or 0.0016 (depending on the loss and the way of normalizing the data), and accuracy is 0 or something like 1.2618e-07. Below is the sample network architecture.
input_layer = layers.Input(shape=(28, 28, 1))
data_augmentation = tf.keras.Sequential()
data_augmentation.add(tensorflow.keras.layers.experimental.preprocessing.RandomFlip('horizontal'))
data_augmentation.add(tensorflow.keras.layers.experimental.preprocessing.RandomFlip('vertical'))
x = data_augmentation(input_layer, 0)
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
# the block below was added once and removed later to reduce network depth
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
# the block below was added once and removed later to reduce network depth - END
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid')(x)
autoencoder = Model(input_layer, decoded)
My dataset consists of about 50k pictures of size 50x50 (resized to 28x28) showing tiny parts of a cardboard-like sheet: Example 1, Example 2; and here you can see the source 900x900 picture I used to create the 50x50 pictures. For the model, I convert them to grayscale.
I used two ways of data normalization: the first (taken from the websites) was to divide the values by 255, and the second was min-max normalization. The pixel values are in the range 59-168. Here is how I create the dataset:
imgs = [cv2.imread(fname) for fname in glob.glob('{}/*.jpg'.format(dir_with_pictures))]
imgs = [img for img in imgs if img is not None] # to remove pictures which OpenCV was not able to load for some reason
dataset = np.array([np.expand_dims(np.array(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (28, 28))), axis=-1) for img in imgs])
dataset = (dataset - np.min(dataset)) / (np.max(dataset) - np.min(dataset)) # one way of normalization, OR
dataset = dataset.astype('float32') / 255.0 # second way of normalization
And here is how I compile my model; later it is trained on my dataset, either the whole dataset or a training set created by splitting the dataset 80/20:
autoencoder.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['accuracy'])
Is it possible that the items in my dataset are 'too similar' to each other and that's why I can't train a good model? Or might there be something else wrong with the dataset? What can I try to get a converging model? Should I pay attention to the accuracy metric?
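As an aside on that last point, and purely as an illustration (this is an assumption, not something from the linked tutorials): since the loss here is a reconstruction error, a regression-style metric such as MAE is arguably more informative than classification accuracy, e.g.:
# Sketch: track MAE instead of accuracy for a reconstruction (regression) loss.
from tensorflow.keras.optimizers import Adam

autoencoder.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['mae'])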

Keras: crop input image at an origin predicted by the model

I would like to build a model in Keras that predicts what regions of an image are important, using this model:
crop_points = keras.Sequential([
    Conv2D(8, (3,3), input_shape=(28, 28, 1)),
    MaxPooling2D(),
    Conv2D(8, (3,3)),
    MaxPooling2D(),
    Conv2D(8, (3,3)),
    Flatten(),
    Dense(16),
    RepeatVector(num_samples),
    LSTM(32, return_sequences=True),
    TimeDistributed(Dense(2))
])
The model predicts a tensor of length num_samples containing the origin of each cropping region, i.e. of shape (num_samples, 2). I would now like to have a Lambda layer (or a custom layer if that works better) that crops the input image at each of those predicted origins for further processing by another model. This needs to be done in Keras, as the model should be trained end-to-end and, in the end, be exported to CoreML.
My current Lambda layer looks like this:
# Lambda layer
def crop_image(tensor):
    image = tensor[0]
    point = tensor[1]
    x_location = int(K.cast(point[0], "int32"))
    y_location = int(K.cast(point[1], "int32"))
    print("x: {}, y: {}".format(x_location, y_location))
    chunk = x_test[i][x_location:x_location + chunk_size, y_location:y_location + chunk_size]
    flattened = K.concatenate(chunk).ravel()
    flattened = K.append(chunk, x_location)
    flattened = K.append(chunk, y_location)
    #flattened = np.array(flattened)
    #return tf.convert_to_tensor(flattened)
My training data looks like this:
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train / 255
x_test = x_test / 255
However, Keras complains "int() argument must be a string, a bytes-like object or a number, not 'Tensor'", as it is unable to convert the tensors to integers, which I need for cropping the image. How should I write my Lambda layer so it crops the image dynamically based on the predicted origins?
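Not part of the original question, but for illustration: here is one hedged sketch of how the crop could be expressed with tensor ops instead of Python int(), using tf.slice inside tf.map_fn. The names image_input and predicted_origin and the fixed chunk_size are assumptions.
# Hypothetical sketch (TF 2.x): crop each image at its predicted origin without
# converting tensors to Python ints. `chunk_size` is assumed to be a fixed int.
import tensorflow as tf
from tensorflow.keras.layers import Lambda

chunk_size = 8  # assumed crop size

def crop_at_origin(tensors):
    images, points = tensors  # images: (batch, 28, 28, 1), points: (batch, 2)
    def crop_one(args):
        img, pt = args
        # keep coordinates as int32 tensors and clamp them inside the image
        y = tf.clip_by_value(tf.cast(pt[0], tf.int32), 0, 28 - chunk_size)
        x = tf.clip_by_value(tf.cast(pt[1], tf.int32), 0, 28 - chunk_size)
        return tf.slice(img, [y, x, 0], [chunk_size, chunk_size, 1])
    return tf.map_fn(crop_one, (images, points), fn_output_signature=tf.float32)

cropped = Lambda(crop_at_origin)([image_input, predicted_origin])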

Tensorflow: Model was constructed with shape (None, 28, 28), but it was called on an input with incompatible shape (None, 28)

I am solving the digit recognition task using the MNIST dataset in Keras.
The task itself runs smoothly, but afterwards I tried to use the same model
for some other handwritten digits that I created with Paint.
Since the original size was (192, 188, 3), I specifically resized it to (28, 28).
However, once I try the model on this newly created digit (see attachment), this is the Warning message I get:
WARNING:tensorflow:Model was constructed with shape (None, 28, 28) for input KerasTensor(type_spec=TensorSpec(shape=(None, 28, 28), dtype=tf.float32, name='flatten_input'), name='flatten_input', description="created by layer 'flatten_input'"), but it was called on an input with incompatible shape (None, 28)
In addition to this error message:
ValueError: Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 784 but received input with shape (None, 28)
Here is my code:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
# %matplotlib inline
import numpy as np
import pandas as pd
import cv2 as cv
(X_train, y_train),(X_test, y_test)=keras.datasets.mnist.load_data()
# Normalize the train dataset
X_train = tf.keras.utils.normalize(X_train, axis=1)
# Normalize the test dataset
X_test = tf.keras.utils.normalize(X_test, axis=1)
#Build the model object
model = tf.keras.models.Sequential()
# Add the Flatten Layer
model.add(tf.keras.layers.Flatten())
# Build the input and the hidden layers
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
# Build the output layer
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
# Compile the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
metrics=["accuracy"])
model.fit(x=X_train, y=y_train, epochs=20) # Start training process
# Evaluate the model performance
test_loss, test_acc = model.evaluate(x=X_test, y=y_test)
# Print out the model accuracy
print('\nTest accuracy:', test_acc)
predictions = model.predict([X_test]) # Make prediction
# TRY SAME MODEL WITH NEW DIGIT
img_6 = cv.imread("6.png")
img_7 = cv.imread("7.png")
img_2 = cv.imread("2.png")
from tensorflow.keras.preprocessing import image
img = img_7
img = cv.resize(img, X_train[0].shape,
                interpolation=cv.INTER_AREA)
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
plt.imshow(img)
plt.show()
img=np.invert(np.array([img]))
img=np.reshape(img, ( 784, 1))
print(img.shape,'fghjkljkhjgfgfgcgvhbjnmnbjv')
plt.imshow(img)
plt.show()
img=np.expand_dims(img, axis=0) # will move it to (1,784)
print(img.shape,'fghjkljkhjgfgfgcgvhbjnmnbjv')
plt.imshow(img)
plt.show()
prediction=model.predict(img) # predict
print ('prediction=',np.argmax(prediction))
plt.imshow(img)
plt.show()
The problem with your code is that your model is expecting a 3-dimensional input (batch_size, width, height), while you're giving it a single 2-dimensional image (width, height).
You can first reshape your input image to the correct shape, like so:
np.reshape(img_6, (1, 28, 28))
The first layer of your model is tf.keras.layers.Flatten(), i.e. flatten: it turns the input into a flat array of length 784 (28x28x1 ~ height x width x channels). So if you call model.summary(), the first layer is:
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
This means that predict expects input data of shape (1, 784). You are on the right track to resize and grayscale the input image; a few more steps are needed. Please refer to the code below and the comment against each line:
from tensorflow.keras.preprocessing import image  # import image preprocessing
img_6 = cv.imread("6.png")  # shape is (352, 324, 3) for a screen snap; this could differ based on the image read
img_6 = cv.resize(img_6, X_train[0].shape,
                  interpolation=cv.INTER_AREA)  # now its shape is (28, 28, 3), which is ~2352 (28x28x3)
img_6 = cv.cvtColor(img_6, cv.COLOR_BGR2GRAY)  # now a gray image
img_6 = image.img_to_array(img_6)  # shape (28, 28, 1), i.e. channel 1
img_6 = img_6.flatten()  # flatten it as the model expects (None, 784); this will be (784,), i.e. 28x28x1 = 784
img_6 = np.expand_dims(img_6, axis=0)  # will move it to (1, 784)
prediction = model.predict(img_6)  # predict
print(np.argmax(prediction))
Indeed the Keras model has Flatten as its first layer, but since training is done on X_train of shape (60000, 28, 28) and the first successful prediction is done on X_test of shape (10000, 28, 28), what you need for prediction is a NumPy array of shape (1, 28, 28).
Also be sure that the images in the training MNIST database are on a black background (value 0) written in white (values closer to 1), so you need to normalize and invert the img array with img = (255-img) / 255.
So with the following additional code I could successfully predict the 2 and 6 images:
img = img_2
img=cv.resize(img, X_train[0].shape, interpolation = cv.INTER_AREA)
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY) # now gray image (28,28)
img = (255-img) / 255 # normalize as white on black
img=np.expand_dims(img, axis=0) # will move it to (1,28,28)
pred=model.predict(img) # predict
print(pred)

Predict radius of multiple circles in images with CNN

I am trying to calculate the radius of circles in an image using a convolutional neural network.
I have only the image as input and the radius on the output side, so the mapping is [image]->[radius of circle].
Input dimensions and the neural network architecture are as follows:
from tensorflow.keras import layers
from tensorflow.keras import Model
img_input = layers.Input(shape=(imgsize, imgsize, 1))
x = layers.Conv2D(16, (3,3), activation='relu', strides =1, padding = 'same')(img_input)
x = layers.Conv2D(32, (3,3), activation='relu', strides = 2)(x)
x = layers.Conv2D(128, (3,3), activation='relu', strides = 2)(x)
x = layers.MaxPool2D(pool_size=2)(x)
x = layers.Conv2D(circle_per_box, 1, activation='linear', strides = 2)(x)
output = layers.Flatten()(x)
model_CNN = Model(img_input, output)
model_CNN.summary()
model_CNN.compile(loss='mean_squared_error',optimizer= 'adam', metrics=['mse'])
X_train, X_test, Y_train, Y_test = train_test_split(image, radii, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(8000, 12, 12, 1) (2000, 12, 12, 1) (8000, 1) (2000, 1)
Y_train
array([[1.01003947],
[1.32057104],
[0.34507285],
...,
[1.53130402],
[0.69527609],
[1.85973669]])
If I calculate this for one circle per image, I get a solid result:
With more circles (see image) per image, however, the same network collapses and I get the following result:
The shape of Y_train for 2 circles reads:
Y_train.shape
(10000, 2)
Y_train
array([[1.81214007, 0.68388911],
[1.47920612, 1.04222943],
[1.90827465, 1.43238623],
...,
[1.40865229, 1.65726638],
[0.52878558, 1.94234548],
[1.57923437, 1.19544775]])
Why does the neural network behave this way?
If I try to calculate the radius of the two generated circles in the image separately as described above, I get good results again, but not if two circles are in the image at the same time.
Does anyone have any ideas/suggestions?
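One purely illustrative sketch (not from the original post) of the label-ordering issue with two circles: since the network cannot know which output unit corresponds to which circle, a common workaround is to impose a fixed ordering on the targets, e.g. sorting the radii before the split:
# Sketch: sort the two radii per image so the outputs always mean
# (smaller radius, larger radius); `image` and `radii` are the arrays from above.
import numpy as np
from sklearn.model_selection import train_test_split

radii_sorted = np.sort(radii, axis=1)  # shape (10000, 2), ascending per row

X_train, X_test, Y_train, Y_test = train_test_split(
    image, radii_sorted, test_size=0.2, random_state=0)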

Fine-tuning Keras autoencoders on cat images

I want to use autoencoders on real-life photos (and not simple MNIST digits). I took the cats and dogs dataset and trained with it. My parameters are:
I stick with grayscale, scaled-down 128x128 px images and do some preprocessing in the ImageDataGenerator for data augmentation.
I train with different datasets of about 2000 images of cats and dogs. I could take 10000, but that takes too long.
I chose a convolutional network with basic downsamplers and upsamplers, toyed with the parameters, and ended up with a bottleneck of 8x8x8 = 512 (1/32 of the original 128x128 px image).
Here is the Python code:
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import metrics
from keras.callbacks import EarlyStopping
import os
root_dir = '/opt/data/pets'
epochs = 400 # epochs of training, the more the better
batch_size = 64 # number of images to be yielded from the generator per batch
seed = 4321 # constant seed for constant conditions
# keras image input type definition
img_channel = 1 # 1 for grayscale, 3 for color
# dimension of input image for network, the bigger the more CPU and RAM is used
img_x, img_y = 128, 128
input_img = Input(shape = (img_x, img_y, img_channel))
# this is the augmentation configuration we use for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for testing
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    root_dir + '/train',  # this is the target directory
    target_size=(img_x, img_y),  # all images will be resized
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',  # necessary for an autoencoder
    shuffle=False,  # important for correct filenames for labels
    seed=seed)
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    root_dir + '/validation',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',  # necessary for an autoencoder
    shuffle=False,  # important for correct filenames for labels
    seed=seed)
# create convolutional autoencoder inspired from https://blog.keras.io/building-autoencoders-in-keras.html
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(img_channel, (3, 3), activation='sigmoid', padding='same')(x) # example from documentation
autoencoder = Model(input_img, decoded)
autoencoder.summary() # show model data
autoencoder.compile(optimizer='sgd',loss='mean_squared_error',metrics=[metrics.mae, metrics.categorical_accuracy])
# do not run forever but stop if model does not get better
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, mode='auto', verbose=1)
# do the actual fitting
autoencoder_train = autoencoder.fit_generator(
    train_generator,
    validation_data=validation_generator,
    epochs=epochs,
    shuffle=False,
    callbacks=[stopper])
# create an encoder for debugging purposes later
encoder = Model(input_img, encoded)
# save the model parameters to a file
autoencoder.save(os.path.basename(__file__) + '_model.hdf')
## PLOTS ####################################
import matplotlib.pyplot as plt
# Plot loss over epochs
print(autoencoder_train.history.keys())
plt.plot(autoencoder_train.history['loss'])
plt.plot(autoencoder_train.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()
# Plot original, encoded and predicted image
import numpy as np
images_show_start = 1
images_show_stop = 20
images_show_number = images_show_stop - images_show_start +1
images,_ = train_generator.next()
plt.figure(figsize=(30, 5))
for i in range(images_show_start, images_show_stop):
    # original image
    ax = plt.subplot(3, images_show_number, i + 1)
    image = images[i, :, :, 0]
    image_reshaped = np.reshape(image, [1, 128, 128, 1])
    plt.imshow(image, cmap='gray')
    # label
    image_label = os.path.dirname(validation_generator.filenames[i])
    plt.title(image_label)  # only OK if shuffle=False
    # encoded image
    ax = plt.subplot(3, images_show_number, i + 1 + 1*images_show_number)
    image_encoded = encoder.predict(image_reshaped)
    # adjust shape if the network parameters are adjusted
    image_encoded_reshaped = np.reshape(image_encoded, [16, 32])
    plt.imshow(image_encoded_reshaped, cmap='gray')
    # predicted image
    ax = plt.subplot(3, images_show_number, i + 1 + 2*images_show_number)
    image_pred = autoencoder.predict(image_reshaped)
    image_pred_reshaped = np.reshape(image_pred, [128, 128])
    plt.imshow(image_pred_reshaped, cmap='gray')
plt.show()
In the network configuration above you can see the layers.
What do you think? Is it too deep or too simple? What adjustments could one make?
The loss decreased over the epochs, as it should.
And here we have three images in each column:
the original (scaled-down) image,
the encoded image and
the predicted image.
So I wonder why the encoded images look quite similar in character (besides all being cats), with lots of vertical lines. The encoded images are quite big at 8x8x8 pixels, which I plotted as 16x32 pixels, i.e. 1/32 of the pixels of the original images.
Is the quality of the decoded images sufficient for that?
Can it somehow be improved? Can I make an even smaller bottleneck in the autoencoder? If I try a smaller bottleneck, the loss gets stuck at 0.06 and the predicted images are very bad.
Your model contains only very few parameters (~32,000). These might simply not be enough to process the data and gain insight into the data-generating probability distribution.
Your convolutions always decrease the image size by a factor of 2, but you do not increase the number of filters. This means that your convolutions are not volume-preserving but actually strongly shrinking; this might simply be too strong.
I would first try to increase the number of parameters and check whether this makes the images less blurry. Then, if the images actually get better with more parameters (they should, as the compression level is now lower than before), you can decrease the number of parameters (i.e. the size of the compressed state) again. This can help you spot other problems in your code.
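As a hedged sketch of that first suggestion only (the exact filter counts are assumptions, not a recommendation from the post): an encoder whose filter count grows as the spatial resolution shrinks might look like this, with the decoder mirroring it.
# Sketch only: encoder half with filter counts growing (32 -> 64 -> 128) as the
# spatial size drops, so each downsampling step shrinks the volume less aggressively.
from keras.layers import Input, Conv2D, MaxPooling2D

input_img = Input(shape=(128, 128, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)   # 128x128x32
x = MaxPooling2D((2, 2), padding='same')(x)                            # 64x64x32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)           # 64x64x64
x = MaxPooling2D((2, 2), padding='same')(x)                            # 32x32x64
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)          # 32x32x128
x = MaxPooling2D((2, 2), padding='same')(x)                            # 16x16x128
encoded = Conv2D(8, (3, 3), activation='relu', padding='same')(x)      # 16x16x8 bottleneck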
Maybe you can take a look at existing autoencoder implementations in Keras that work on different, more complex datasets, like this one, which uses CIFAR10.
The black lines in the encoded-state images might just come from the way you plot the data. As your data in this layer has depth 8 rather than 1, you must reshape it. If the original cube has lower values at its borders (which makes sense, as there is most likely not much important information there), you rearrange the dark/black surface of the cube and project it onto a 2D plane; this might then look like these repetitive black lines.
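To illustrate this point (an assumption about the plotting, not code from the post): instead of flattening the (8, 8, 8) code into a 16x32 image, one could lay out the 8 channels side by side, which avoids mixing depth and width.
# Sketch: show the encoded state as its 8 channels tiled horizontally
# instead of a flat 16x32 reshape. `image_encoded` is the encoder output above.
import numpy as np
import matplotlib.pyplot as plt

code = image_encoded.reshape(8, 8, 8)  # (height, width, channels)
strip = np.concatenate([code[:, :, c] for c in range(8)], axis=1)  # 8 x 64 strip
plt.imshow(strip, cmap='gray')
plt.show()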
Furthermore, judging from the loss plot, the training might not have converged yet, so the quality of the images might still increase if you continue training.
Lastly, you should use all available training images and not just a small subset. This will of course increase the training time, but the encoder's results will be much better, as the network will be more resistant to overfitting and most likely able to generalize better.
Shuffling your data might also improve training performance.
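For the last two points, here is a small sketch of the only change needed to the generator from the question (note that the original comment about shuffle=False being required for filename-based labels would then no longer apply):
# Sketch: same training generator as in the question, but with shuffling enabled.
train_generator = train_datagen.flow_from_directory(
    root_dir + '/train',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',
    shuffle=True,  # shuffle the training data between epochs
    seed=seed)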
