Related
I am trying to implement an algorithm from a paper, using Keras, where a neural network is trained to approximate a mathematical function f(x) with a limited number of data points. I want the input of the neural network to be x and the output to have the form f(x) = 1 + xN(x), where N(x) is the value from the final dense layer.
I know how to make it work for the output f(x) = N(x), but I don't know how to adjust the network for f(x) = 1 + xN(x). Can someone help me?
This is my current code:
from keras.layers import Input, Dense, Add, Multiply
from keras.models import Model
import keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
import time
def f(x):
return x**2
Xtrain = np.linspace(0, 1, 10)
ytrain = np.array([f(x) for x in Xtrain])
X = np.linspace(0, 2, 100)
y = np.array([f(x) for x in X])
input = Input(shape=(1,))
init = np.ones(shape=(10, 1))
init = K.variable(init)
hidden = input
hidden = Dense(8, activation='relu')(hidden)
out = Dense(1, activation='linear')(hidden)
out = Add()([init, Multiply()([out, input])])
model = Model(inputs=input, outputs=out)
model.compile(loss='mean_squared_error', optimizer="adam")
tic = time.perf_counter()
model.fit(Xtrain, ytrain, epochs=1000, verbose=1)
toc = time.perf_counter()
print(f"Training time: {toc - tic:0.4f} seconds")
prediction = model.predict(X)
prediction = prediction.reshape((100,))
plt.figure(figsize=(10,5))
plt.plot(X, y, color='red', label='Analytical solution')
plt.plot(X, prediction, color='black', label = 'Prediction')
plt.scatter(Xtrain, ytrain, color='blue', label='Training points')
plt.legend()
plt.show()
plt.tight_layout()
which crashes at the line
out = Add()([init, Multiply()([out, input])])
The Add layer works between two layers, and also between a layer and a number/ndarray.
You can just use it like this:
init=np.ones(shape=(10, 1))
inp = Input(shape=(1,))
hidden = Dense(8, activation='relu')(inp)
out = Dense(1, activation='linear')(hidden)
mul=Multiply()([out, inp])
out = Add()([init, mul])
model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer="adam")
I checked it and it worked.
By the way, input is a built-in function, so I don't recommend using it as a variable name unless you really mean to shadow it.
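Alternatively, here is a minimal sketch (an alternative wiring, assuming any way of forming 1 + x*N(x) is acceptable) that adds the constant 1 inside a Lambda layer, so nothing is tied to the fixed training-set size of 10 and prediction on the 100-point grid works unchanged:
from keras.layers import Input, Dense, Multiply, Lambda
from keras.models import Model

inp = Input(shape=(1,))
hidden = Dense(8, activation='relu')(inp)
n_x = Dense(1, activation='linear')(hidden)   # N(x)
x_nx = Multiply()([n_x, inp])                 # x * N(x)
out = Lambda(lambda t: 1.0 + t)(x_nx)         # f(x) = 1 + x * N(x)
model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')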
I have been trying to replicate the same simple network structure in MATLAB and Keras. The problem is that the accuracy I get is very different: the MATLAB code reaches an accuracy near 0.84 with a loss near 17, while the Keras code reaches an accuracy near 0.63 with a loss near 130, even though Keras trains for twice as many epochs on the same data. I think the difference is too big to be a matter of implementation, so I suspect I'm missing something.
The original code is from a MATLAB example, in which I made a small change to avoid the normalization in the first layer.
Here is the MATLAB code:
% Load the digit training set as 4-D array data using
% |digitTrain4DArrayData|.
[trainImages,~,trainAngles] = digitTrain4DArrayData;
disp("Train Images:")
disp(trainImages(:,:,:,1))
% Display 20 random sample training digits using |imshow|.
numTrainImages = size(trainImages,4);
figure
idx = randperm(numTrainImages,20);
for i = 1:numel(idx)
subplot(4,5,i)
imshow(trainImages(:,:,:,idx(i)))
drawnow
end
%%
% Combine all the layers together in a |Layer| array.
layers = [ ...
imageInputLayer([28 28 1], 'Normalization', 'none')
convolution2dLayer(12,25)
reluLayer
fullyConnectedLayer(1)
regressionLayer];
%% Train Network
options = trainingOptions('sgdm','InitialLearnRate',0.001, ...
'MaxEpochs',15)
net = trainNetwork(trainImages,trainAngles,layers,options)
net.Layers
%% Test Network
[testImages,~,testAngles] = digitTest4DArrayData;
predictedTestAngles = predict(net,testImages);
% *Evaluate Performance*
predictionError = testAngles - predictedTestAngles;
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(testImages,4);
accuracy = numCorrect/numTestImages
%%
% Use the root-mean-square error (RMSE) to measure the differences between
% the predicted and actual angles of rotation.
squares = predictionError.^2;
rmse = sqrt(mean(squares))
%%
% *Display Box Plot of Residuals for Each Digit Class*
residuals = testAngles - predictedTestAngles;
residualMatrix = reshape(residuals,500,10);
figure
boxplot(residualMatrix, ...
'Labels',{'0','1','2','3','4','5','6','7','8','9'})
xlabel('Digit Class')
ylabel('Degrees Error')
title('Residuals')
idx = randperm(numTestImages,49);
for i = 1:numel(idx)
image = testImages(:,:,:,idx(i));
predictedAngle = predictedTestAngles(idx(i));
imagesRotated(:,:,:,i) = imrotate(image,predictedAngle,'bicubic','crop');
end
figure
subplot(1,2,1)
montage(testImages(:,:,:,idx))
title('Original')
subplot(1,2,2)
montage(imagesRotated)
title('Corrected')
Here is the Keras code:
import numpy as np
import scipy.io
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense, Activation, Conv2D, BatchNormalization, AveragePooling2D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras import regularizers
np.random.seed(1671) # for reproducibility
# network and training
NB_EPOCH = 30
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # number of outputs = number of digits
OPTIMIZER = SGD() # SGD optimizer, explained later in this chapter
N_HIDDEN = 128
VALIDATION_SPLIT=0.2 # how much TRAIN is reserved for VALIDATION
data = scipy.io.loadmat('RegressionImageData.mat')
XTrain = np.rollaxis(data['XTrain'],3,0)
XTest = np.rollaxis(data['XTest'],3,0)
YTest = np.squeeze(data['YTest'])
YTrain = np.squeeze(data['YTrain'])
print("Train Images:")
print(XTrain.shape)
print(type(XTrain))
print(XTrain)
XTrain_test = np.reshape(XTrain, (5000,28,28))
with open("./test.txt", "a+") as file:
np.set_printoptions(threshold=np.nan)
file.write(np.array2string(XTrain_test[0], max_line_width=np.inf))
model = Sequential()
model.add(Conv2D(25,(12,12), input_shape=(28,28,1), strides=(1,1), activation = "relu"))
model.add(Flatten())
model.add(Dense(1))
model.summary()
sgd = SGD(lr=0.001, decay=0.1, momentum=0.9, nesterov=False)
model.compile(loss='mean_squared_error', optimizer=sgd)
history = model.fit(XTrain, YTrain,
batch_size=BATCH_SIZE, epochs=NB_EPOCH,
verbose=VERBOSE, validation_split=VALIDATION_SPLIT,
shuffle=False)
predictions= model.predict(XTrain)
[np.transpose(predictions[1:50]), np.transpose(YTrain[1:50]), np.abs(np.transpose(predictions[1:50])- np.transpose(YTrain[1:50]))]
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('rmse')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Ypred_test= model.predict(XTest)
Ypred_test=np.reshape(Ypred_test, (5000,))
predictionError = Ypred_test - YTest
thr = 10
numCorrect = np.sum((np.abs(predictionError) < thr)*1)
numValidationImages = len(YTest)
accuracy = numCorrect/numValidationImages
print(accuracy)
squares = np.power(predictionError,2)
rmse = np.sqrt(np.mean(squares))
print(rmse)
Does anyone know where the gap could be?
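One thing worth double-checking (an assumption on my part, not a confirmed diagnosis): the Keras run creates SGD(lr=0.001, decay=0.1, ...), and in Keras the decay term shrinks the learning rate after every batch update, while the MATLAB 'sgdm' options keep the rate constant at 0.001 (with the default momentum of 0.9). An optimizer that mirrors the MATLAB settings more closely would be:
from keras.optimizers import SGD

# constant learning rate and momentum, matching trainingOptions('sgdm', 'InitialLearnRate', 0.001)
sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)
model.compile(loss='mean_squared_error', optimizer=sgd)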
I need one or more hints to get past my first pain point in transfer learning.
The following code is a stripped-down version of what I am actually trying to do, but it shows the issue even with the simple fake images (A: empty / B: empty + little square) I use there. In the final version, the input will be much more complex images (which justifies the complexity of the applied base model).
The problem looks simple. Input: two types of images; output: binary classification ("square present yes/no"). The modified ResNet50 model is fed with prepared training data via ImageDataGenerator. Since I can create any amount of fake data, there is no data augmentation step in the code.
Anyway, when I run the code, the displayed loss (for both the Adam and the SGD optimizer) doesn't seem to improve, and the accuracy quickly approaches the ratio of the number of examples in the two image classes (i.e. B/A). (Note: over the weekend I even tried 500 epochs ... no change.)
For both (most likely connected) issues I haven't been able to spot the reason yet ... can you? Is it one of the hyperparameters? Is there an obvious glitch in the model setup or in some other part of the implementation? It's probably something silly, but after chasing it and playing around with more and more simplified versions, I am about to run out of ideas for what to try next.
import cv2
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
from random import randint
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.models import Model
from keras.applications import ResNet50
from keras.preprocessing.image import ImageDataGenerator
def modified_resnet_model():
# load ResNet50 model excluding classification layers
basemodel = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# freeze model weights
for layer in basemodel.layers:
layer.trainable = False
# add new classification head
x = GlobalAveragePooling2D()(basemodel.output)
x = Dense(128, activation='relu')(x)
predictions = Dense(1, activation='softmax')(x)
modresnet50model = Model(inputs=basemodel.input, outputs=predictions)
# return the result
return modresnet50model
def data_set_creator(numsamples, probpos, target_image_size=(224, 224)):
dataset = {}
image_stack = []
immean = np.array([0.0, 0.0, 0.0])
imstat = {}
# first create target labels
lbbuf = np.zeros((numsamples, 1))
lbbuf[:int(probpos*numsamples)] = 1
lbbuf = np.random.permutation(lbbuf)
# second create matching "fake" images according to label stack
for index in tqdm(range(numsamples)):
# zero labeled images are empty
img = np.zeros((target_image_size[0], target_image_size[1], 3)).astype(np.float32)
sh = 10
if lbbuf[index]:
# all others contain a square somewhere
xp = randint(sh, target_image_size[0]-1-sh)
yp = randint(sh, target_image_size[1]-1-sh)
randval = 100 # randint(1, 255)
# print('center: ({0:d},{1:d}); value: {2:d}'.format(xp, yp, randval))
img[yp-sh:yp+sh, xp-sh:xp+sh, :] = randval
# else:
# print(' --- ')
# normalize image and add it to the image stack
img /= 255.0 # normalize image
image_stack.append(img)
# update mean vector
immean += cv2.mean(img)[:-1]
# assemble data set
imstat['mean'] = immean/numsamples
image_stack = np.array(image_stack)
dataset['images'] = image_stack
dataset['imstat'] = imstat
dataset['labels'] = lbbuf
# return the result
return dataset
if __name__ == '__main__':
# define some parameters
imagesize = (224, 224)
nsamples = 10000
pos_prob_train = 0.3
probposval = pos_prob_train
valfrac = 0.1 # use 10% of the data for validation
batchsize = 24
epochs = 30
stepsperepoch = 100
validationsteps = 25
# ================================================================================
# create training and validation data sets
nst = int(nsamples*(1-valfrac))
dataset_training = data_set_creator(nst, pos_prob_train, target_image_size=imagesize)
dataset_validation = data_set_creator(nsamples-nst, probposval, target_image_size=imagesize)
# subtract the mean (training data!) from all the images
for ci in range(3):
dataset_training['images'][:, :, :, ci] -= dataset_training['imstat']['mean'][ci]
dataset_validation['images'][:, :, :, ci] -= dataset_training['imstat']['mean'][ci]
# get the (modified) model
model = modified_resnet_model()
theoptimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
model.compile(optimizer=theoptimizer, loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())
# setup data input generators
train_datagen = ImageDataGenerator()
validation_datagen = ImageDataGenerator()
train_generator = train_datagen.flow(dataset_training['images'],
dataset_training['labels'],
batch_size=batchsize)
validation_generator = validation_datagen.flow(dataset_validation['images'],
dataset_validation['labels'],
batch_size=batchsize)
# train the (modified) model
history = model.fit_generator(train_generator, steps_per_epoch=stepsperepoch,
epochs=epochs, validation_data=validation_generator,
validation_steps=validationsteps)
#visualize the training and validation performance
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
nepochs = range(1, len(acc)+1)
plt.plot(nepochs, acc, 'bo', label='Training acc')
plt.plot(nepochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.savefig('trainval_acc.png')
plt.figure()
plt.plot(nepochs, loss, 'bo', label='Training loss')
plt.plot(nepochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.savefig('trainval_loss.png')
plt.show()
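For comparison, a minimal sketch of the classification head I would check against (assumption: a single sigmoid unit is what is intended for the binary "square present yes/no" label; note that softmax over a single unit always outputs 1, which would pin the accuracy to the class ratio):
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.applications import ResNet50

basemodel = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in basemodel.layers:
    layer.trainable = False
x = GlobalAveragePooling2D()(basemodel.output)
x = Dense(128, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)  # sigmoid, not softmax, for one binary output
model = Model(inputs=basemodel.input, outputs=predictions)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])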
I have programmed a GAN model using Keras, but the training didn't go well. The generator model always returns a bare noise image (28x28) instead of something resembling the MNIST dataset. This doesn't give me any error, though. My concern is that when it comes to training, the discriminator model becomes trainable=False, which is not what I want.
If this implementation is bad, please let me know. Can anyone help?
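On the trainable=False concern, here is a minimal sketch of the usual Keras pattern (with tiny stand-in models, purely illustrative): the discriminator is compiled on its own while still trainable, and the flag is switched off only before compiling the stacked GAN, so Discriminator.train_on_batch keeps updating its weights while GAN.train_on_batch updates only the generator:
from keras.models import Sequential
from keras.layers import Dense

G = Sequential([Dense(784, activation='tanh', input_dim=100)])   # stand-in generator
D = Sequential([Dense(1, activation='sigmoid', input_dim=784)])  # stand-in discriminator

D.compile(loss='binary_crossentropy', optimizer='adam')  # compiled while trainable

D.trainable = False                      # frozen only inside the stacked model
gan = Sequential([G, D])
gan.compile(loss='binary_crossentropy', optimizer='adam')
# D.train_on_batch(...) still trains D; gan.train_on_batch(...) trains only G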
import os
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization
from keras.optimizers import SGD, Adam, RMSprop
from keras.datasets import mnist
from keras.regularizers import l1_l2
def plot_generated(noise, Generator):
image_fake = Generator.predict(noise)
plt.figure(figsize=(10,8))
plt.show()
plt.close()
def plot_metircs(metrics, epoch=None):
plt.figure(figsize=(10,8))
plt.plot(metrics['d'], label='discriminative loss', color='b')
plt.legend()
plt.show()
plt.close()
plt.figure(figsize=(10,8))
plt.plot(metrics['g'], label='generative loss', color='r')
plt.legend()
plt.show()
plt.close()
def Generator():
model = Sequential()
LeakyReLU = keras.layers.advanced_activations.LeakyReLU(alpha=0.2)
model.add(Dense(input_dim=100, units=128, activation=LeakyReLU, name='g_input'))
model.add(Dense(input_dim=128, units=784, activation='tanh', name='g_output'))
return model
def Discriminator():
model = Sequential()
LeakyReLU = keras.layers.advanced_activations.LeakyReLU(alpha=0.2)
model.add(Dense(input_dim=784, units=128, activation=LeakyReLU, name='d_input'))
model.add(Dense(input_dim=128, units=1, activation='sigmoid', name='d_output'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
return model
def Generative_Adversarial_Network(Generator, Discriminator):
model = Sequential()
model.add(Generator)
model.add(Discriminator)
# train only generator in the entire GAN architecture
Discriminator.trainable = False
model.compile(loss='binary_crossentropy', optimizer='Adam')
return model
def Training(z_input_size, Generator, Discriminator, GAN, loss_dict, X_train, epoch, batch, smooth):
for e in range(epoch):
# z: noise, used for input of G to generate fake image based on this noise! it's like a seed
noise = np.random.uniform(-1, 1, size=[batch, z_input_size])
image_fake = Generator.predict_on_batch(noise)
# sampled real_image from dataset
rand_train_index = np.random.randint(0, X_train.shape[0], size=batch)
image_real = X_train[rand_train_index, :]
# concatenate real and fake images
"""
X = [
image_real => label : 1 (we can multiply a smoothing factor)
image_fake => label : 0
]
"""
X = np.vstack((image_real, image_fake))
y = np.zeros(len(X))
# putting label "1" to image_real
y[len(image_real):] = 1*(1 - smooth)
y = y.astype(int)
# train only discriminator
d_loss = Discriminator.train_on_batch(x=X, y=y)
# NOTE: remember?? we set discriminator OFF during the training of GAN!
# So, we can safely train only generator, weight of discriminator set fixed!
g_loss = GAN.train_on_batch(x=noise, y=y[len(noise):])
loss_dict['d'].append(d_loss)
loss_dict['g'].append(g_loss)
if e%1000 == 0:
plt.imshow(image_fake)
plt.show()
plot_generated(noise, Generator)
plot_metircs(loss_dict)
return "done!"
Gen = Generator()
Dis = Discriminator()
GAN = Generative_Adversarial_Network(Gen, Dis)
GAN.summary()
Gen.summary()
Dis.summary()
gan_losses = {"d":[], "g":[], "f":[]}
epoch = 30000
batch = 1000
smooth = 0.9
z_input_size = 100
row, col = 28, 28
z_group_matrix = np.random.uniform(0, 1, 9*z_input_size)  # 9 preview noise vectors
z_group_matrix = z_group_matrix.reshape([9, z_input_size])
print(z_group_matrix.shape)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train.reshape(X_train.shape[0], row*col), X_test.reshape(X_test.shape[0], row*col)
X_train.astype('float32')
X_test.astype('float32')
X_train, X_test = X_train/255, X_test/255
print('X_train shape: ', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
Training(z_input_size, Gen, Dis, GAN, loss_dict=gan_losses, X_train=X_train, epoch=epoch, batch=batch, smooth=smooth)
The model itself is correct.
I would suggest a few minor changes:
smooth = 0.9 is too much; make it closer to 0.1.
The leak factor you have is 0.2; usually it is a very small value close to 0, around 0.01-0.02.
Batch size around 400.
Epochs around 2000.
And finally, early stopping with a slightly larger threshold (see the sketch below).
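A quick sketch of how those suggestions could be plugged into the posted code (the values are the rough recommendations above, not tuned results, and the early-stopping threshold is hypothetical):
import keras

smooth = 0.1       # label smoothing close to 0.1 instead of 0.9
batch = 400        # batch size around 400
epoch = 2000       # epochs around 2000
LeakyReLU = keras.layers.advanced_activations.LeakyReLU(alpha=0.02)  # smaller leak factor

# a manual early-stopping check that can be called at the end of each epoch in Training(),
# since train_on_batch loops have no callbacks
def should_stop(g_losses, patience=50, min_delta=1e-3):
    if len(g_losses) <= patience:
        return False
    recent_best = min(g_losses[-patience:])
    earlier_best = min(g_losses[:-patience])
    return recent_best > earlier_best - min_delta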
It looks like a simple CAE is not working for the Carvana dataset
I'm trying a simple CAE for the Carvana dataset. You can download it here.
My code is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.transform import downscale_local_mean
from skimage.color import rgb2grey
from os.path import join, isfile
from tqdm import tqdm_notebook
from sklearn.model_selection import train_test_split
from keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose, Input, concatenate
from keras.models import Model
from keras.callbacks import ModelCheckpoint
import keras.backend as K
from scipy.ndimage.filters import gaussian_filter
from keras.optimizers import Adam
from random import randint
import hickle as hkl
import dill
class Data(object):
def __init__(self, X, Y):
self.X = X
self.Y = Y
input_folder = join('..', 'input')
print('Path:',input_folder)
data_file_name = 'datafile.pkl'
df_mask = pd.read_csv(join(input_folder, 'train_masks.csv'), usecols=['img'])
load_img = lambda im, idx: imread(join(input_folder, 'train', '{}_{:02d}.jpg'.format(im, idx)))
load_mask = lambda im, idx: imread(join(input_folder, 'train_masks', '{}_{:02d}_mask.gif'.format(im, idx)))
ids_train = df_mask['img'].map(lambda s: s.split('_')[0]).unique()
imgs_idx = list(range(1, 17))
resize = lambda im: downscale_local_mean(im, (4,4) if im.ndim==2 else (4,4,1))
mask_image = lambda im, mask: (im * np.expand_dims(mask, 2))
num_train = 48#len(ids_train)
if isfile(data_file_name):
#with open(data_file_name, 'rb') as f:
data = hkl.load(data_file_name)
X = data.X
y = data.y
else:
X = np.empty((num_train, 320, 480, 1), dtype=np.float32)
y = np.empty((num_train, 320, 480, 1), dtype=np.float32)
with tqdm_notebook(total=num_train) as bar:
idx = 1 # Rotation index
for i, img_id in enumerate(ids_train[:num_train]):
imgs_id = [resize(load_img(img_id, j)) for j in imgs_idx]
greyscale = rgb2grey(imgs_id[idx-1]) / 255
greyscale = np.expand_dims(greyscale, 2)
X[i] = greyscale
y_processed = resize(np.expand_dims(load_mask(img_id, idx), 2)) / 255.
y[i] = y_processed
del imgs_id # Free memory
bar.update()
#data = Data(X, y)
#with open(data_file_name, 'w+') as f:
#hkl.dump(data, data_file_name)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=43)
y_train_mean = y_train.mean(axis=0)
y_train_std = y_train.std(axis=0)
y_train_min = y_train.min(axis=0)
y_features = np.concatenate([y_train_mean, y_train_std, y_train_min], axis=2)
inp = Input((320, 480, 1))
conv1 = Conv2D(64, 3, activation='relu', padding='same')(inp)
max1 = MaxPooling2D(2)(conv1)
conv2 = Conv2D(48, 5, activation='relu', padding='same')(max1)
max2 = MaxPooling2D(2)(conv2)
conv3 = Conv2D(32, 7, activation='relu', padding='same')(max2)
deconv3 = Conv2DTranspose(32, 7, strides=4, activation='relu', padding='same')(conv3)
deconv2 = Conv2DTranspose(48, 5, strides=2, activation='relu', padding='same')(conv2)
deconvs = concatenate([conv1, deconv2, deconv3])
out = Conv2D(1, 7, activation='sigmoid', padding='same')(deconvs)
model = Model(inp, out)
model.summary()
smooth = 1.
# From here: https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py
def dice_coef(y_true, y_pred):
y_true_f = K.flatten(y_true)
y_pred_f = K.flatten(y_pred)
intersection = K.sum(y_true_f * y_pred_f)
return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
def bce_dice_loss(y_true, y_pred):
return 0.5 * K.binary_crossentropy(y_true, y_pred) - dice_coef(y_true, y_pred)
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
cae_filepath = "cae_375.hdf5"
pre_mcp = ModelCheckpoint(cae_filepath, monitor='val_dice_coef', verbose=2, save_best_only=True, mode='max')
pre_history = model.fit(X_train, X_train, epochs=1000, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
model.load_weights(cae_filepath)
filepath="weights-improvement2_lre-5-{epoch:02d}-{val_acc:.5f}-{val_dice_coef:.5f}.hdf5"
mcp = ModelCheckpoint(filepath, monitor='val_dice_coef', verbose=2, save_best_only=True, mode='max')
history = model.fit(X_train, y_train, epochs=1000, validation_data=(X_val, y_val), batch_size=22, verbose=2, callbacks=[mcp])
idxs = [0, X_val.shape[0]/2, randint(1, X_val.shape[0] -1)]
for idx in idxs:
print('Index:', idx)
x = X_val[idx]
fig, ax = plt.subplots(3,3, figsize=(16, 16))
ax = ax.ravel()
cmaps = ['Reds', 'Greens', 'Blues']
for i in range(x.shape[-1]):
ax[i].imshow(x[...,i], cmap='gray') #cmaps[i%3])
ax[i].set_title('channel {}'.format(i))
ax[-8].imshow(y_val[idx,...,0], cmap='gray')
ax[-8].set_title('y')
y_pred = model.predict(x[None]).squeeze()
ax[-7].imshow(y_pred, cmap='gray')
ax[-7].set_title('y_pred')
ax[-6].imshow(gaussian_filter(y_pred,1) > 0.5, cmap='gray')
ax[-6].set_title('1')
ax[-5].imshow(gaussian_filter(y_pred,2) > 0.5, cmap='gray')
ax[-5].set_title('2')
ax[-4].imshow(gaussian_filter(y_pred,3) > 0.5, cmap='gray')
ax[-4].set_title('3')
ax[-3].imshow(gaussian_filter(y_pred,4) > 0.5, cmap='gray')
ax[-3].set_title('4')
ax[-2].imshow(gaussian_filter(y_pred,5) > 0.5, cmap='gray')
ax[-2].set_title('5')
ax[-1].imshow(gaussian_filter(y_pred,6) > 0.5, cmap='gray')
ax[-1].set_title('6')
It works fine without pre-training; you can check it by commenting out these lines:
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
cae_filepath = "cae_375.hdf5"
pre_mcp = ModelCheckpoint(cae_filepath, monitor='val_dice_coef', verbose=2, save_best_only=True, mode='max')
pre_history = model.fit(X_train, X_train, epochs=1000, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
model.load_weights(cae_filepath)
However, when I tried to pre-train the autoencoder to reconstruct the original images, I got no accuracy improvement, only dice coefficient improvements:
Moreover, when I then used the pre-trained autoencoder for training to make predictions on the training data, I got a different result: accuracy stuck at 0.8374 and the dice coefficient degraded from an initial 0.11864 down to 7.5781e-04:
Pre-training the model as an autoencoder should increase model accuracy. In my experience it improves accuracy to 99.62% on the full MNIST dataset with a simple CAE.
Also, I looked into the data to make sure it has the same nature in both cases (you can see this from the temporary debugging variables in the code).
In the second case, my idea is that it may be caused by the fact that we keep not only the encoder's but also the decoder's weights, which can potentially cause an issue during training.
After resetting the decoder's weights I had almost the same picture for some time:
But after 49 iterations the process reached a turning point and training became efficient:
However, I have no clue why accuracy does not increase during autoencoder training, despite the dice coefficient improvements; probably something is wrong with my code or with the frameworks I'm using.
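On the flat accuracy during the autoencoder phase, one detail worth keeping in mind (an observation about the metric, not a confirmed diagnosis of your setup): with a single-channel output, Keras's 'accuracy' metric reduces to binary accuracy, which rounds each predicted pixel to 0/1 and compares it to the target. For grayscale reconstruction targets that are rarely exactly 0 or 1, this number barely moves even while the dice coefficient improves. Roughly:
import keras.backend as K

# what the default 'accuracy' metric computes for a single-channel output (binary accuracy)
def binary_accuracy_sketch(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)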
Additional info:
My environment:
Ubuntu 16.04
Python 2.7
Theano 0.10
Keras 2.0.8
Any suggestions will be appreciated