I want to use autoencoders on real-life photos (not simple MNIST digits). I have taken the cats and dogs dataset and
train with it. My parameters are:
I stick with grayscale and a scaled-down version of the images at 128x128 px, and do some preprocessing in the ImageDataGenerator for data augmentation.
I train with different datasets of about 2000 images of cats and dogs. I could take 10000, but that takes too long.
I chose a convolutional network with basic downsamplers and upsamplers, toyed with the parameters, and ended up with a bottleneck of 8x8x8 = 512 (which is 1/32 of the original 128x128 px image).
Here is the Python code:
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import metrics
from keras.callbacks import EarlyStopping
import os
root_dir = '/opt/data/pets'
epochs = 400 # epochs of training, the more the better
batch_size = 64 # number of images to be yielded from the generator per batch
seed = 4321 # constant seed for constant conditions
# keras image input type definition
img_channel = 1 # 1 for grayscale, 3 for color
# dimension of input image for network, the bigger the more CPU and RAM is used
img_x, img_y = 128, 128
input_img = Input(shape = (img_x, img_y, img_channel))
# this is the augmentation configuration we use for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for testing
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    root_dir + '/train',  # this is the target directory
    target_size=(img_x, img_y),  # all images will be resized
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',  # necessary for an autoencoder
    shuffle=False,  # important for correct filenames for labels
    seed=seed)
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    root_dir + '/validation',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',  # necessary for an autoencoder
    shuffle=False,  # important for correct filenames for labels
    seed=seed)
# create convolutional autoencoder inspired from https://blog.keras.io/building-autoencoders-in-keras.html
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(img_channel, (3, 3), activation='sigmoid', padding='same')(x) # example from the documentation
autoencoder = Model(input_img, decoded)
autoencoder.summary() # show model data
autoencoder.compile(optimizer='sgd',loss='mean_squared_error',metrics=[metrics.mae, metrics.categorical_accuracy])
# do not run forever but stop if model does not get better
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, mode='auto', verbose=1)
# do the actual fitting
autoencoder_train = autoencoder.fit_generator(
    train_generator,
    validation_data=validation_generator,
    epochs=epochs,
    shuffle=False,
    callbacks=[stopper])
# create an encoder for debugging purposes later
encoder = Model(input_img, encoded)
# save the model parameters to a file
autoencoder.save(os.path.basename(__file__) + '_model.hdf')
## PLOTS ####################################
import matplotlib.pyplot as plt
# Plot loss over epochs
print(autoencoder_train.history.keys())
plt.plot(autoencoder_train.history['loss'])
plt.plot(autoencoder_train.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()
# Plot original, encoded and predicted image
import numpy as np
images_show_start = 1
images_show_stop = 20
images_show_number = images_show_stop - images_show_start +1
images,_ = train_generator.next()
plt.figure(figsize=(30, 5))
for i in range(images_show_start, images_show_stop):
    # original image
    ax = plt.subplot(3, images_show_number, i + 1)
    image = images[i, :, :, 0]
    image_reshaped = np.reshape(image, [1, 128, 128, 1])
    plt.imshow(image, cmap='gray')
    # label
    image_label = os.path.dirname(validation_generator.filenames[i])
    plt.title(image_label)  # only OK if shuffle=False
    # encoded image
    ax = plt.subplot(3, images_show_number, i + 1 + 1*images_show_number)
    image_encoded = encoder.predict(image_reshaped)
    # adjust shape if the network parameters are adjusted
    image_encoded_reshaped = np.reshape(image_encoded, [16, 32])
    plt.imshow(image_encoded_reshaped, cmap='gray')
    # predicted image
    ax = plt.subplot(3, images_show_number, i + 1 + 2*images_show_number)
    image_pred = autoencoder.predict(image_reshaped)
    image_pred_reshaped = np.reshape(image_pred, [128, 128])
    plt.imshow(image_pred_reshaped, cmap='gray')
plt.show()
In the network configuration you see the layers.
What do you think? Is it too deep or too simple? What adjustments could one make?
The loss decreased over the epochs, as it should.
And here we have three images in each column:
the original (scaled down) image,
the encoded image and
the predicted.
So I wonder why the encoded images look quite similar in their characteristics (besides all being cats), with lots of vertical lines. The encoded images are quite big at 8x8x8 pixels, which I plotted as 16x32 pixels; that makes them 1/32 of the pixels of the original images.
Is the quality of the decoded image sufficient for that?
Can it somehow be improved? Can I even make a smaller bottleneck in the autoencoder? If I try a smaller bottleneck, the loss gets stuck at 0.06 and the predicted images are very bad.
Your model contains only very few parameters (~32,000). These might just not be enough to process the data and to get insight into the data-generating probability distribution.
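To verify that number yourself, a quick check on the compiled model from the question is enough (a minimal sketch, assuming the autoencoder object defined above):
# print the total number of parameters of the model from the question
print(autoencoder.count_params())   # roughly 32k trainable parameters for the layers above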
Your convolutions always decrease the image size by a factor of 2, but you do not increase the number of filters. This means that your convolutions are not volume-preserving but strongly shrinking, which might simply be too aggressive.
I would first try to increase the number of parameters and check whether this makes the images less blurry. If the images do get better with more parameters (they should, as the compression level is then lower than before), you can decrease the number of parameters (i.e. the size of the compressed state) again. This approach can also help you spot other problems in your code.
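As a rough illustration (not a tuned recipe), an encoder that grows the filter count while the spatial size shrinks could look like the sketch below, reusing the layer names from the question; the decoder would mirror it with UpSampling2D and decreasing filter counts:
# a hedged sketch of a wider encoder: filters grow as the spatial resolution shrinks
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)   # 128x128x32
x = MaxPooling2D((2, 2), padding='same')(x)                            # 64x64x32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)           # 64x64x64
x = MaxPooling2D((2, 2), padding='same')(x)                            # 32x32x64
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)          # 32x32x128
x = MaxPooling2D((2, 2), padding='same')(x)                            # 16x16x128
encoded = Conv2D(8, (3, 3), activation='relu', padding='same')(x)      # 16x16x8 bottleneck (larger than before)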
Maybe you can take a look at existing autoencoder implementations in Keras which work on different datasets (featuring more complex data, too), like this one which uses CIFAR10.
The black lines in the encoded-state images might just come from the way you plot the data. As your data in this layer has depth 8 rather than 1, you must reshape it. If the original cube had lower values at the borders (which makes sense, as there is most likely not much important information there), you rearrange the dark/black surface of the cube and project it onto a 2D surface; this can then look like these repetitive black lines.
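One way to check this is to plot the 8 channels of the 8x8 code separately instead of reshaping the whole cube into a 16x32 image (a small sketch reusing encoder and image_reshaped from the question):
# plot each of the 8 feature maps of the 8x8x8 code on its own axis
code = encoder.predict(image_reshaped)          # shape (1, 8, 8, 8)
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for c in range(8):
    axes[c].imshow(code[0, :, :, c], cmap='gray')
    axes[c].set_title('ch {}'.format(c))
    axes[c].axis('off')
plt.show()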
Furthermore, considering the loss plot of the network, it might also be the case that the training has not converged yet. So the quality of the images might still improve if you continue training.
Lastly, you should use all available training images and not just a small subset. This will (of course) increase the training time, but the results of the encoder will be much better, as the network will be more resistant to overfitting and most likely able to generalize better.
Shuffling your data might also increase the training's performance.
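For example, you could shuffle the training stream and keep a separate, unshuffled generator only for the plots that rely on filenames (a sketch based on the generators in the question):
# shuffle for training; keep a second, ordered generator for plotting/labels
train_generator = train_datagen.flow_from_directory(
    root_dir + '/train',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',
    shuffle=True,          # randomize batch order during training
    seed=seed)

plot_generator = test_datagen.flow_from_directory(
    root_dir + '/train',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',
    shuffle=False)         # stable order so filenames line up with plotted images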
Related
My problem is to create an autoencoder model to recognize anomalies on a cardboard-like surface. I know that I can use an autoencoder and train it using "good" samples, and later I can use it to recognize "bad" samples (i.e., anomalies).
I've built convolutional autoencoders in Tensorflow based on Building autoencoder in Keras and PyImageSearch autoencoders. Both examples use the MNIST dataset and they work perfectly (the loss decreases, accuracy grows up to about 0.85). However, when I tried to train both autoencoders on my custom pictures, my models don't converge: the training loss (I tried binary_crossentropy and mse, as stated on the websites) gets stuck at some level, e.g. 3.0873e-04 or 0.0016 (depending on the loss and the way of normalizing the data), and the accuracy is 0 or something like 1.2618e-07. Below is the sample network architecture.
input_layer = layers.Input(shape=(28, 28, 1))
data_augmentation = tf.keras.Sequential()
data_augmentation.add(tensorflow.keras.layers.experimental.preprocessing.RandomFlip('horizontal'))
data_augmentation.add(tensorflow.keras.layers.experimental.preprocessing.RandomFlip('vertical'))
x = data_augmentation(input_layer, 0)
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
# the block below was added once and removed later to reduce network depth
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
# the block below was added once and removed later to reduce network depth - END
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid')(x)
autoencoder = Model(input_layer, decoded)
My dataset consists of about 50k pictures of size 50x50 (and resized to 28x28) presenting tiny parts of cardboard-like sheet: Example 1, Example 2, and here you can see the source 900x900 picture I used to create 50x50 pictures. For the model, I convert them to grayscale.
I used two ways of data normalization: the first one (taken from the websites) was to divide the values by 255, and the second one was to use min-max normalization. The pixel values are in the range 59-168. Here is how I create the dataset:
imgs = [cv2.imread(fname) for fname in glob.glob('{}/*.jpg'.format(dir_with_pictures))]
imgs = [img for img in imgs if img is not None] # to remove pictures which OpenCV was not able to load for some reason
dataset = np.array([np.expand_dims(np.array(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (28, 28))), axis=-1) for img in imgs])
dataset = (dataset - np.min(dataset)) / (np.max(dataset) - np.min(dataset)) # one way of normalization, OR
dataset = dataset.astype('float32') / 255.0 # second way of normalization
And here I compile my model, and later it's trained using my dataset - either the whole dataset, or the training set created by splitting the dataset 80/20:
autoencoder.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['accuracy'])
Is it possible that the items in my dataset are 'too similar' to each other and that's why I can't train a good model? Or might there be anything else wrong with the dataset? What can I try in order to get a converging model? Should I pay attention to the accuracy metric?
I have some faces cropped out of images, and I want to run them through a denoising autoencoder, the code which I got from here. When I run the code on the MNIST dataset, the results look fine, like the ones in the website. However, when I run it on my own images, I get a mostly or completely black image in return instead of simply the same image without the noise.
This is the original image for reference before I resized it, so you can tell how it looks.
This is the image after resizing which I had to do in order to feed it to the autoencoder. I sized it down to be 28x28.
These are the results plotted. In the first row, I expected my original grayscale image to appear as it was before I fed it into the autoencoder. In the second row, I wanted the same image but without the noise. As you can see, I get these odd outputs and I can't tell why.
Here is the code I've tried on the MNIST dataset. For my dataset, I skipped the preprocessing of the MNIST dataset and instead preprocessed my own images (sized them down, made them grayscale, etc.); their dimensions are (28, 28, 1), just as the original code intended. I tried changing the number of epochs (I went through 10, 50, and 100), but there was no noticeable difference. I considered changing the layers, but after looking at some papers and other code, the layers seem to be the same as the ones presented. I tried looking up tutorials where the autoencoder works on regular images like mine and not just the MNIST dataset, but I couldn't really find any. I'm also confused as to why, when I plot the original array, I get black squares, even though when I use cv2_imshow to display it I get the image I showed after resizing. I don't really know if it's the same issue. I've also tried training the autoencoder on my own dataset (which has 785 images similar to the ones I've shown above), but to no avail. I've displayed the code I used below, and if there is something missing that is needed to understand my question, please tell me.
import keras
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
from keras.callbacks import TensorBoard
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1)) # adapt this if using `channels_first` image data format
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
n = 10
plt.figure(figsize=(20, 2))
for i in range(1, n):
    ax = plt.subplot(1, n, i)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (7, 7, 32)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(grayscale, grayscale,
                epochs=100,
                batch_size=128,
                shuffle=True,
                validation_data=(grayscale, grayscale),
                callbacks=[TensorBoard(log_dir='/tmp/tb', histogram_freq=0, write_graph=True)])
Here is the code I used to feed my image into the autoencoder and display the results.
arr= cv2.imread('/content/FramesResized/frame0000sec.jpg')
#Converting the image to grayscale
gray = cv2.cvtColor(arr, cv2.COLOR_BGR2GRAY)
# Adding an axis: when the image was converted to grayscale it became (28, 28), and I need it to be (28, 28, 1)
if gray.ndim == 2:
    gray = np.expand_dims(gray, axis=2)
#Making a new array to take my images, currently of which there is only one
grayscale = np.zeros([785, 28, 28, 1], dtype=np.uint8)
grayscale[0] = gray
#Feeding my image into the autoencoder
decoded_imgs = autoencoder.predict(grayscale)
#Plotting the before and after images
plt.figure(figsize=(20, 4))
for i in range(1, n):
    # display original
    ax = plt.subplot(2, n, i)
    plt.imshow(grayscale[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
If anyone is wondering, I believe the issue was that I was applying an autoencoder that was trained on MNIST images to complex, RGB images, so the autoencoder reconstruction was very poor.
When I used the CIFAR-100 dataset to train the autoencoder, the results were much more coherent.
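For reference, a rough sketch of what that CIFAR-100 training could look like with the same denoising pipeline; the grayscale/resize preprocessing here is one possible choice, not necessarily what was actually used:
# CIFAR-100 converted to 28x28 grayscale so the MNIST-shaped model can be reused as-is
from keras.datasets import cifar100
import cv2
import numpy as np

(c_train, _), (c_test, _) = cifar100.load_data()
to_gray = lambda batch: np.array(
    [cv2.resize(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY), (28, 28)) for img in batch])
c_train = to_gray(c_train).astype('float32').reshape(-1, 28, 28, 1) / 255.
c_test = to_gray(c_test).astype('float32').reshape(-1, 28, 28, 1) / 255.

noise_factor = 0.5
c_train_noisy = np.clip(c_train + noise_factor * np.random.normal(size=c_train.shape), 0., 1.)
c_test_noisy = np.clip(c_test + noise_factor * np.random.normal(size=c_test.shape), 0., 1.)

autoencoder.fit(c_train_noisy, c_train,
                epochs=100, batch_size=128, shuffle=True,
                validation_data=(c_test_noisy, c_test))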
I am trying a CNN model on the MNIST dataset. After training the model, it gives 99% test accuracy with model.evaluate.
But when I try to predict the answer for one image, it always returns the same array when I call model.predict().
Normalising the data:
train_images = mnist_train_images.reshape(mnist_train_images.shape[0], 28, 28, 1)
test_images = mnist_test_images.reshape(mnist_test_images.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')
train_images /= 255
test_images /= 255
#converting labels to one hot encoded format
train_labels = tensorflow.keras.utils.to_categorical(mnist_train_labels, 10)
test_labels = tensorflow.keras.utils.to_categorical(mnist_test_labels, 10)
Model Structure and Model Training:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
# 64 3x3 kernels
model.add(Conv2D(64, (3, 3), activation='relu'))
# Reduce by taking the max of each 2x2 block
model.add(MaxPooling2D(pool_size=(2, 2)))
# Dropout to avoid overfitting
model.add(Dropout(0.25))
# Flatten the results to one dimension for passing into our final layer
model.add(Flatten())
# A hidden layer to learn with
model.add(Dense(128, activation='relu'))
# Another dropout
model.add(Dropout(0.5))
# Final categorization from 0-9 with softmax
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
with tensorflow.device('/device:GPU:0'):
    model.fit(train_images, train_labels,
              batch_size=128,
              epochs=7,
              verbose=2,
              validation_data=(test_images, test_labels))
Now, I have a black and white (28, 28) image of a digit (actually, it's a digit from the MNIST training data itself). Trying to predict after normalizing that image:
image = image.reshape(-1,28, 28,1)
image = image.astype('float32')
image/=255
pred_array = model.predict(image)
print(pred_array)
pred_array = np.argmax(pred_array)
print('Result: {0}'.format(pred_array))
This always gives the same pred_array every time, and of course it is wrong.
I tried the answers to similar questions. For example, I tried increasing the epochs;
also one answer said to do
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D,MaxPooling2D
instead of
from keras.layers import Dense, Dropout, Flatten, Conv2D,MaxPooling2D
I tried everything, but nothing seems to help. Maybe my normalising of the image is wrong, or I might have made some silly mistake, as I am new to working with images and using CNNs. Please help.
I just replicated the code and everything works fine. I hope you are not loading the test image from the normalized train_images, because the images in there are already normalized and you are normalizing them again before predicting. The following works as expected for me:
image = train_images[14]
image = image.astype('float32')
image = image.reshape(-1,28, 28,1)
image/=255
pred_array = model.predict(image)
print(pred_array)
pred_array = np.argmax(pred_array)
print('Result: {0}'.format(pred_array))
Edit:
I did something different when I was replicating your code. I kept the normalized images in a different NumPy array, like this:
train_images_norm = train_images.astype('float32')
test_images_norm = test_images.astype('float32')
train_images_norm /= 255
test_images_norm /= 255
...
model.fit(train_images_norm, train_labels_norm,...)
So now, when I predict, I use the original (not normalized) images and normalize them before prediction. The reason you get unpredictable results is that you are dividing already-normalized images by 255 again, which creates completely different numbers that the network was not trained with. You have two options: either keep the original images in a different array and normalize them before prediction (my code), or, if you want your original code to work, remove image = image.astype('float32') and image /= 255 before prediction.
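A minimal sketch of that second option, assuming image was taken from the already-normalized train_images:
# the image is already scaled to [0, 1], so only reshape it; do not divide by 255 again
image = train_images[14]
image = image.reshape(-1, 28, 28, 1)
pred_array = model.predict(image)
print('Result: {0}'.format(np.argmax(pred_array)))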
I've adapted a simple CNN from a tutorial on Analytics Vidhya.
The problem is that my accuracy on a holdout set is no better than random. I am training on ~8600 images each of cats and dogs, which should be enough data for a decent model, but accuracy on the test set is at 49%. Is there a glaring omission in my code somewhere?
import os
import numpy as np
import keras
from keras.models import Sequential
from sklearn.model_selection import train_test_split
from datetime import datetime
from PIL import Image
from keras.utils.np_utils import to_categorical
from sklearn.utils import shuffle
def main():
    cat = os.listdir("train/cats")
    dog = os.listdir("train/dogs")
    filepath = "train/cats/"
    filepath2 = "train/dogs/"
    print("[INFO] Loading images of cats and dogs each...", datetime.now().time())
    #print("[INFO] Loading {} images of cats and dogs each...".format(num_images), datetime.now().time())
    images = []
    label = []
    for i in cat:
        image = Image.open(filepath + i)
        image_resized = image.resize((300, 300))
        images.append(image_resized)
        label.append(0)  # for cat images
    for i in dog:
        image = Image.open(filepath2 + i)
        image_resized = image.resize((300, 300))
        images.append(image_resized)
        label.append(1)  # for dog images
    images_full = np.array([np.array(x) for x in images])
    label = np.array(label)
    label = to_categorical(label)
    images_full, label = shuffle(images_full, label)
    print("[INFO] Splitting into train and test", datetime.now().time())
    (trainX, testX, trainY, testY) = train_test_split(images_full, label, test_size=0.25)
    filters = 10
    filtersize = (5, 5)
    epochs = 5
    batchsize = 32
    input_shape = (300, 300, 3)
    #input_shape = (30, 30, 3)
    print("[INFO] Designing model architecture...", datetime.now().time())
    model = Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='same',
                                                data_format="channels_last", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(units=2, input_dim=50, activation='softmax'))
    #model.add(keras.layers.Dense(units=2, input_dim=5, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    print("[INFO] Fitting model...", datetime.now().time())
    model.fit(trainX, trainY, epochs=epochs, batch_size=batchsize, validation_split=0.3)
    model.summary()
    print("[INFO] Evaluating on test set...", datetime.now().time())
    eval_res = model.evaluate(testX, testY)
    print(eval_res)

if __name__ == "__main__":
    main()
For me the problem comes from the size of your network: you have only one Conv2D layer with 10 filters. This is way too small to learn a deep representation of your images.
Try to increase this a lot by using blocks from common architectures like VGGNet!
Example of a block:
x = Conv2D(32, (3, 3) , padding='SAME')(model_input)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3) , padding='SAME')(x)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
You need to stack multiple blocks like that, increasing the number of filters, in order to capture deeper features.
Another thing: you don't need to specify the input_dim of your Dense layer, Keras automatically takes care of that!
Last but not least, you need a fully connected network in order to correctly classify your images, not only a single layer.
For example:
x = Flatten()(x)
x = Dense(256)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(128)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(2)(x)
x = Activation('softmax')(x)
Try those changes and keep me posted!
Update after OP's questions
Images are complex; they contain a lot of information, like shapes, edges, colors, etc.
In order to capture the maximum amount of information, you need to pass through multiple convolutions, which will learn the different aspects of the image.
Imagine, for example, that the first convolution learns to recognise squares, the second conv to recognise circles, the third to recognise edges, etc.
As for my second point, the final fully connected part acts like a classifier: the conv network outputs a vector that "represents" a dog or a cat, and now you need to learn that this kind of vector belongs to one class or the other.
Directly feeding that vector into the final layer is not enough to learn this representation.
Is that clearer?
Last update for OP's second comment
Here are the two ways of defining a Keras model; both output the same thing!
model_input = Input(shape=(200, 1))
x = Dense(32)(model_input)
x = Dense(16)(x)
x = Activation('relu')(x)
model = Model(inputs=model_input, outputs=x)
model = Sequential()
model.add(Dense(32, input_shape=(200, 1)))
model.add(Dense(16, activation = 'relu'))
Example of architecture:
model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Don't forget to normalize your data before feeding it into your network.
A simple images_full = images_full / 255.0 on your data can boost your accuracy a lot.
Try it with grayscale images too; it's more computationally efficient.
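One possible way to do the grayscale variant with the existing PIL loading loop (a sketch; load_grayscale is a hypothetical helper, and only the channel count and normalization change):
# hypothetical helper: load one image as 8-bit grayscale and resize it
from PIL import Image
import numpy as np

def load_grayscale(path, size=(300, 300)):
    return np.array(Image.open(path).convert('L').resize(size))

# after collecting the arrays into `images`, stack, scale and reshape to one channel:
# images_full = np.stack(images).astype('float32') / 255.0
# images_full = images_full.reshape(-1, 300, 300, 1)
# input_shape = (300, 300, 1)   # one channel instead of three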
I am working on a regression CNN using Keras/Tensorflow. I have a multi-output feed-forward model that I have trained up with some success. The model takes in a 201x201 grayscale image and returns two regression targets.
Here is an example of an input/target pair:
[example input image] is associated with (z=562.59, a=4.53)
There exists an analytical solution for this problem, so I know it's solvable.
Here is the model architecture:
model_input = keras.Input(shape=input_shape, name='image')
x = model_input
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(16, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (4,4))(x)
x = Flatten()(x)
model_outputs = list()
out_names = ['z', 'a']
for i in range(2):
    out_name = out_names[i]
    local_output = x
    local_output = Dense(10, activation='relu')(local_output)
    local_output = Dropout(0.2)(local_output)
    local_output = Dense(units=1, activation='linear', name=out_name)(local_output)
    model_outputs.append(local_output)
model = Model(model_input, model_outputs)
model.compile(loss = 'mean_squared_error', optimizer='adam', loss_weights = [1,1])
My targets are on different scales, so I normalized one of them (name 'a') to the range [0,1] for training. Here is how I rescale:
def rescale(min, max, list):
    scalar = 1./(max-min)
    list = (list-min)*scalar
    return list
Where min,max for each parameter are known a priori and are constant.
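Since min and max are known constants, the predictions can be mapped back to physical units with the exact inverse of this function (a small sketch mirroring rescale; inverse_rescale is a new helper, not part of the original code):
def inverse_rescale(min, max, list):
    # undo rescale(): map values from [0, 1] back to the original [min, max] range
    scalar = 1./(max-min)
    list = list/scalar + min
    return list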
Here is how I trained:
model.fit({'image' : x_train},
          {'z' : z_train, 'a' : a_train},
          batch_size = 32,
          epochs=20,
          verbose=1,
          validation_data = ({'image' : x_test},
                             {'z' : z_test, 'a' : a_test}))
When I predict 'a', I get fairly good accuracy, but with an offset.
This is a fairly easy thing to fix: I just apply a linear fit to the predictions and invert it to rescale them.
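Something like the following could express that correction (a sketch; a_true and a_pred are hypothetical arrays of ground-truth values and model predictions for 'a'):
import numpy as np

# fit pred ~ slope * true + intercept, then invert the fit to correct the offset
slope, intercept = np.polyfit(a_true, a_pred.ravel(), 1)
a_pred_corrected = (a_pred.ravel() - intercept) / slope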
But I can't think of a reason why this would be happening in the first place. I've used this same model architecture for other problems, and I get that same offset again. Has anyone seen this sort of thing before?
EDIT: This offset occurs in multiple different models of mine, which each predict different parameters but are rescaled/preprocessed in the same way. It happens regardless of how many epochs I train for, with more training resulting in predictions hugging the green line (in the first graph) more closely.
As a temporary work-around, I trained a single-node model to take the input as the original model's prediction and the output as the ground truth. This trained up nicely, and corrects the offset. What's strange though, is that I can apply this rescale model to ANY of the models with this issue, and it corrects the offset equally well.
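A hedged sketch of such a single-node correction model (a_pred_train, a_true_train, and a_pred_test are hypothetical placeholders for the original model's predictions and the corresponding ground truth):
from keras.layers import Input, Dense
from keras.models import Model

# one weight and one bias: effectively a learned linear offset/scale correction
calib_in = Input(shape=(1,))
calib_out = Dense(1, activation='linear')(calib_in)
calibrator = Model(calib_in, calib_out)
calibrator.compile(optimizer='adam', loss='mean_squared_error')

calibrator.fit(a_pred_train, a_true_train, epochs=100, batch_size=32, verbose=0)
a_pred_corrected = calibrator.predict(a_pred_test)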
Essentially, the offset correction has the same weights for multiple different models that predict completely different parameters. This makes me think it has something to do with the activation or regularization.