GAN training doesn't progress well - Python

I have programmed a GAN model using Keras, but the training doesn't go well. The generator model always returns bare noise (28x28 images) instead of something resembling the MNIST dataset. No error is raised, but during training the discriminator model becomes trainable=False, which is not what I want.
If this implementation is bad, please let me know. Can anyone help?
import os
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization
from keras.optimizers import SGD, Adam, RMSprop
from keras.datasets import mnist
from keras.regularizers import l1_l2
def plot_generated(noise, Generator):
    image_fake = Generator.predict(noise)
    plt.figure(figsize=(10, 8))
    plt.show()
    plt.close()

def plot_metircs(metrics, epoch=None):
    plt.figure(figsize=(10, 8))
    plt.plot(metrics['d'], label='discriminative loss', color='b')
    plt.legend()
    plt.show()
    plt.close()
    plt.figure(figsize=(10, 8))
    plt.plot(metrics['g'], label='generative loss', color='r')
    plt.legend()
    plt.show()
    plt.close()
def Generator():
    model = Sequential()
    LeakyReLU = keras.layers.advanced_activations.LeakyReLU(alpha=0.2)
    model.add(Dense(input_dim=100, units=128, activation=LeakyReLU, name='g_input'))
    model.add(Dense(input_dim=128, units=784, activation='tanh', name='g_output'))
    return model

def Discriminator():
    model = Sequential()
    LeakyReLU = keras.layers.advanced_activations.LeakyReLU(alpha=0.2)
    model.add(Dense(input_dim=784, units=128, activation=LeakyReLU, name='d_input'))
    model.add(Dense(input_dim=128, units=1, activation='sigmoid', name='d_output'))
    model.compile(loss='binary_crossentropy', optimizer='Adam')
    return model

def Generative_Adversarial_Network(Generator, Discriminator):
    model = Sequential()
    model.add(Generator)
    model.add(Discriminator)
    # train only generator in the entire GAN architecture
    Discriminator.trainable = False
    model.compile(loss='binary_crossentropy', optimizer='Adam')
    return model
def Training(z_input_size, Generator, Discriminator, GAN, loss_dict, X_train, epoch, batch, smooth):
    for e in range(epoch):
        # z: noise, used for input of G to generate fake image based on this noise! it's like a seed
        noise = np.random.uniform(-1, 1, size=[batch, z_input_size])
        image_fake = Generator.predict_on_batch(noise)
        # sampled real_image from dataset
        rand_train_index = np.random.randint(0, X_train.shape[0], size=batch)
        image_real = X_train[rand_train_index, :]
        # concatenate real and fake images
        """
        X = [
            image_real => label : 1 (we can multiply a smoothing factor)
            image_fake => label : 0
        ]
        """
        X = np.vstack((image_real, image_fake))
        y = np.zeros(len(X))
        # putting label "1" to image_real
        y[len(image_real):] = 1*(1 - smooth)
        y = y.astype(int)
        # train only discriminator
        d_loss = Discriminator.train_on_batch(x=X, y=y)
        # NOTE: remember?? we set discriminator OFF during the training of GAN!
        # So, we can safely train only generator, weight of discriminator set fixed!
        g_loss = GAN.train_on_batch(x=noise, y=y[len(noise):])
        loss_dict['d'].append(d_loss)
        loss_dict['g'].append(g_loss)
        if e % 1000 == 0:
            plt.imshow(image_fake)
            plt.show()
            plot_generated(noise, Generator)
            plot_metircs(loss_dict)
    return "done!"
Gen = Generator()
Dis = Discriminator()
GAN = Generative_Adversarial_Network(Gen, Dis)
GAN.summary()
Gen.summary()
Dis.summary()
gan_losses = {"d":[], "g":[], "f":[]}
epoch = 30000
batch = 1000
smooth = 0.9
z_input_size = 100
row, col = 28, 28
examples = 9
z_group_matrix = np.random.uniform(0, 1, examples*z_input_size)
z_group_matrix = z_group_matrix.reshape([9, z_input_size])
print(z_group_matrix.shape)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train.reshape(X_train.shape[0], row*col), X_test.reshape(X_test.shape[0], row*col)
X_train.astype('float32')
X_test.astype('float32')
X_train, X_test = X_train/255, X_test/255
print('X_train shape: ', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
Training(z_input_size, Gen, Dis, GAN, loss_dict=gan_losses, X_train=X_train, epoch=epoch, batch=batch, smooth=smooth)

The model itself is correct.
I would suggest a few minor changes:
smooth = 0.9 is too much; make it closer to 0.1.
The leak factor you have is 0.2; usually it's a very small decimal close to 0, around 0.01-0.02.
Batch size around 400.
Epochs around 2000.
And finally, early stopping with a fairly large threshold.
A rough sketch of these settings is shown below.
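As a minimal sketch of those settings (the concrete LeakyReLU alpha, patience, and improvement threshold below are assumptions; only the ranges come from the suggestions above):
from keras.layers.advanced_activations import LeakyReLU

# hyper-parameters suggested above
smooth = 0.1                   # label smoothing much closer to 0.1
batch = 400
epoch = 2000
leaky = LeakyReLU(alpha=0.02)  # pass this in Generator()/Discriminator() instead of alpha=0.2

# crude early stopping around the manual training loop in Training()
best_g_loss, wait, patience, min_delta = float('inf'), 0, 100, 1e-2
for e in range(epoch):
    # ... run the discriminator / GAN updates from Training() here,
    #     keeping the generator loss returned by GAN.train_on_batch ...
    g_loss = best_g_loss  # placeholder: replace with the real generator loss
    if g_loss < best_g_loss - min_delta:   # "a bit large" improvement threshold
        best_g_loss, wait = g_loss, 0
    else:
        wait += 1
    if wait >= patience:
        break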

Related

Training an autoencoder for variable-length time series - TensorFlow

I am trying to train an LSTM model to reconstruct time-series data. I have a data set of ~1800 univariate time series.
Basically I'm trying to solve a problem similar to this one, Anomaly detection in ECG plots, but my time series have different lengths.
I used this approach to deal with variable lengths:
How to apply LSTM-autoencoder to variant-length time-series data?
and this approach to split the input data based on shape:
Keras misinterprets training data shape
When looping over the data and fitting a model for every shape, is the final model based only on the last shape it was trained on, or does it use all the data?
How would I train the model on all input data regardless of the shape of the data?
I know I could add padding, but I am trying to use the data as-is at this point.
Any suggestions or other approaches for dealing with different lengths of time series?
(It is not an issue of time sampling; it is more that one time series started recording on day X and some only on day X+100.)
Here is the code I am using for my autoencoder:
import keras.backend as K
from keras.models import Model
from keras.layers import (Input, Dense, TimeDistributed, LSTM, GRU, Dropout, merge,
                          Flatten, RepeatVector, Bidirectional, SimpleRNN, Lambda)
def encoder(model_input, layer, size, num_layers, drop_frac=0.0, output_size=None,
            bidirectional=False):
    """Encoder module of autoencoder architecture"""
    if output_size is None:
        output_size = size
    encode = model_input
    for i in range(num_layers):
        wrapper = Bidirectional if bidirectional else lambda x: x
        encode = wrapper(layer(size, name='encode_{}'.format(i),
                               return_sequences=(i < num_layers - 1)))(encode)
        if drop_frac > 0.0:
            encode = Dropout(drop_frac, name='drop_encode_{}'.format(i))(encode)
    encode = Dense(output_size, activation='linear', name='encoding')(encode)
    return encode
def repeat(x):
    stepMatrix = K.ones_like(x[0][:,:,:1])    # matrix with ones, shaped as (batch, steps, 1)
    latentMatrix = K.expand_dims(x[1], axis=1)  # latent vars, shaped as (batch, 1, latent_dim)
    return K.batch_dot(stepMatrix, latentMatrix)

def decoder(encode, layer, size, num_layers, drop_frac=0.0, aux_input=None,
            bidirectional=False):
    """Decoder module of autoencoder architecture"""
    decode = Lambda(repeat)([inputs, encode])
    if aux_input is not None:
        decode = merge([aux_input, decode], mode='concat')
    for i in range(num_layers):
        if drop_frac > 0.0 and i > 0:  # skip these for first layer for symmetry
            decode = Dropout(drop_frac, name='drop_decode_{}'.format(i))(decode)
        wrapper = Bidirectional if bidirectional else lambda x: x
        decode = wrapper(layer(size, name='decode_{}'.format(i),
                               return_sequences=True))(decode)
    decode = TimeDistributed(Dense(1, activation='linear'), name='time_dist')(decode)
    return decode
inputs = Input(shape=(None, 1))
encoded = encoder(inputs,LSTM,128, 2, drop_frac=0.0, output_size=None, bidirectional=False)
decoded = decoder(encoded, LSTM, 128, 2, drop_frac=0.0, aux_input=None,
bidirectional=False,)
sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(optimizer='adam', loss='mae')
trainByShape = {}
for item in train_data:
    if item.shape in trainByShape:
        trainByShape[item.shape].append(item)
    else:
        trainByShape[item.shape] = [item]
for shape in trainByShape:
    modelHistory = sequence_autoencoder.fit(
        np.asarray(trainByShape[shape]),
        np.asarray(trainByShape[shape]),
        epochs=100, batch_size=1, validation_split=0.15)
Use a bidirectional LSTM and increase the number of parameters to gain accuracy. I increased latent_dim to 1000 and it fit the data closely, at the cost of more hardware and more memory.
def create_dataset(dataset, look_back=3):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back)]
        dataX.append(a)
        dataY.append(dataset[i + look_back])
    return np.array(dataX), np.array(dataY)
COLUMNS=['Open']
dataset=eqix_df[COLUMNS]
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(np.array(dataset).reshape(-1,1))
train_size = int(len(dataset) * 0.70)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size], dataset[train_size:len(dataset)]
look_back=10
trainX=[]
testX=[]
y_train=[]
trainX, y_train = create_dataset(train, look_back)
testX, y_test = create_dataset(test, look_back)
X_train = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
X_test = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
latent_dim=700
n_future=1
model = Sequential()
model.add(Bidirectional(LSTM(units=latent_dim, return_sequences=True,
input_shape=(X_train.shape[1], 1))))
#LSTM 1
model.add(Bidirectional(LSTM(latent_dim,return_sequences=True,dropout=0.4,recurrent_dropout=0.4,name='lstm1')))
#LSTM 2
model.add(Bidirectional(LSTM(latent_dim,return_sequences=True,dropout=0.2,recurrent_dropout=0.4,name='lstm2')))
#LSTM 3
model.add(Bidirectional(LSTM(latent_dim, return_sequences=False,dropout=0.2,recurrent_dropout=0.4,name='lstm3')))
model.add(Dense(units = n_future))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["acc"])
history=model.fit(X_train, y_train,epochs=50,verbose=0)
plt.plot(history.history['loss'])
plt.title('loss accuracy')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
#print(X_test)
prediction = model.predict(X_test)
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(prediction)+look_back, :] = prediction
# shift test predictions for plotting
#plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot, color='red')
#plt.plot(testPredictPlot)
#plt.legend(['Actual','Train','Test'])
x=np.linspace(look_back,len(prediction)+look_back,len(y_test))
plt.plot(x,y_test)
plt.show()
The Keras LSTM implementation expects an input of shape (Batch, Timesteps, Features).
One solution would be to set Timesteps = 1 and pass the sequence length as the Batch dimension (see the sketch below).
If the sampling procedure is the same (no need for resampling), and the difference in length only comes from when the recording starts (X+100 instead of X), I would try to get rid of the lag in the pre-processing stage so that only the section of interest is kept.
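A minimal sketch of that reshaping idea (train_data is assumed to be the list of 1-D arrays from the question):
import numpy as np

def as_single_timesteps(series):
    """1-D array of length T -> shape (T, 1, 1): the length becomes the batch dimension."""
    return np.asarray(series, dtype='float32').reshape(-1, 1, 1)

# usage with the autoencoder defined in the question:
# for s in train_data:
#     x = as_single_timesteps(s)
#     sequence_autoencoder.fit(x, x, epochs=1, batch_size=len(x), verbose=0)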
Part 1: plotting the irregular heartbeats. Part 2: a dense network that classifies incoming heartbeat voltages to predict irregular beat patterns, with 94% accuracy!
from scipy.io import arff
import pandas as pd
from scipy.misc import electrocardiogram
import matplotlib.pyplot as plt
import numpy as np
data = arff.loadarff('ECG5000_TRAIN.arff')
df = pd.DataFrame(data[0])
#for column in df.columns:
# print(column)
columns=[x for x in df.columns if x!="target"]
print(columns)
#print(df[df.target == "b'1'"].drop(labels='target', axis=1).mean(axis=0).to_numpy())
normal=df.query("target==b'1'").drop(labels='target', axis=1).mean(axis=0).to_numpy()
rOnT=df.query("target==b'2'").drop(labels='target', axis=1).mean(axis=0).to_numpy()
pcv=df.query("target==b'3'").drop(labels='target', axis=1).mean(axis=0).to_numpy()
sp=df.query("target==b'4'").drop(labels='target', axis=1).mean(axis=0).to_numpy()
ub=df.query("target==b'5'").drop(labels='target', axis=1).mean(axis=0).to_numpy()
plt.plot(normal,label="Normal")
plt.plot(rOnT,label="R on T",alpha=.3)
plt.plot(pcv, label="PCV",alpha=.3)
plt.plot(sp, label="SP",alpha=.3)
plt.plot(ub, label="UB",alpha=.3)
plt.legend()
plt.title("ECG")
plt.show()
Frame-by-frame comparison for the normal class. There are bands of operation that a normal heart stays within:
def PlotTheFrames(df, title, color):
    fig, ax = plt.subplots(figsize=(140, 50))
    for key, item in df.iterrows():
        array = []
        for value in np.array(item).flatten():
            array.append(value)
        x = np.linspace(0, 100, len(array))
        ax.plot(x, array, c=color)
    plt.title(title)
    plt.show()
normal=df.query("target==b'1'").drop(labels='target', axis=1)
PlotTheFrames(normal,"Normal Heart beat",'r')
With R on T, the valves don't seem to be operating correctly:
rOnT=df.query("target==b'2'").drop(labels='target', axis=1)
PlotTheFrames(rOnT,"R on T Heart beat","b")
Use a deep dense network instead of an LSTM! I used LeakyReLU, which keeps a small gradient for negative inputs.
X=df[columns]
y=pd.get_dummies(df['target'])
model=Sequential()
model.add(Dense(440, input_shape=(len(columns),),activation='LeakyReLU'))
model.add(Dropout(0.4))
model.add(Dense(280, activation='LeakyReLU'))
model.add(Dropout(0.2))
model.add(Dense(240, activation='LeakyReLU'))
model.add(Dense(32, activation='LeakyReLU'))
model.add(Dense(16, activation='LeakyReLU'))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3, random_state=42)
scaler = StandardScaler()
scaler.fit(X_train)
X_train=scaler.transform(X_train)
X_test=scaler.transform(X_test)
history=model.fit(X_train, y_train,epochs = 1000,verbose=0)
model.evaluate(X_test, y_test)
plt.plot(history.history['loss'])
plt.title('loss accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

How to implement a CNN-LSTM using Keras

I am attempting to implement a CNN-LSTM that classifies mel-spectrogram images representing the speech of people with Parkinson's disease versus healthy controls. I am trying to combine a pre-existing model (DenseNet-169) with an LSTM model; however, I am running into the following error: ValueError: Input 0 of layer zero_padding2d is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 216, 1]. Can anyone advise where I'm going wrong?
import librosa
import os
import glob
import IPython.display as ipd
from pathlib import Path
import timeit
import time, sys
%matplotlib inline
import matplotlib.pyplot as plt
import librosa.display
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
import numpy as np
import cv2
import seaborn as sns
%tensorflow_version 1.x #version 1 works without problems
import tensorflow
from tensorflow.keras import models
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import TimeDistributed
import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import confusion_matrix, plot_confusion_matrix
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense, BatchNormalization, Activation, GaussianNoise, LSTM
from sklearn.metrics import accuracy_score
DATA_DIR = Path('/content/drive/MyDrive/PhD_Project_Experiments/Spontaneous_Dialogue_PD_Dataset')
diagnosis = [x.name for x in DATA_DIR.glob('*') if x.is_dir()]
diagnosis
def create_paths_ds(paths: Path, label: str) -> list:
    EXTENSION_TYPE = '.wav'
    return [(x, label) for x in paths.glob('*' + EXTENSION_TYPE)]
from collections import Counter
categories_to_use = [
'Parkinsons_Disease',
'Healthy_Control',
]
NUM_CLASSES = len(categories_to_use)
print(f'Number of classes: {NUM_CLASSES}')
paths_all_labels = []
for cat in categories_to_use:
    paths_all_labels += create_paths_ds(DATA_DIR / cat, cat)
X_train, X_test = train_test_split(paths_all_labels,test_size=0.1, stratify = [paths_all_labels[y][1] for y in range(len(paths_all_labels))] ) #fix stratified sampling for test data
X_train, X_val = train_test_split(X_train, test_size=0.2, stratify = [X_train[y][1] for y in range(len(X_train))] )
for i in categories_to_use:
    print('Number of train samples for '+i+': '+ str([X_train[y][1] for y in range(len(X_train))].count(i)))  # checks whether train samples are equally divided
    print('Number of test samples for '+i+': '+ str([X_test[y][1] for y in range(len(X_test))].count(i)))  # checks whether test samples are equally divided
    print('Number of validation samples for '+i+': '+ str([X_val[y][1] for y in range(len(X_val))].count(i)))  # checks whether val samples are equally divided
print(f'Train length: {len(X_train)}')
print(f'Validation length: {len(X_val)}')
print(f'Test length: {len(X_test)}')
def load_and_preprocess_lstm(dataset, SAMPLE_SIZE=30):
    IMG_SIZE = (216, 128)
    progress = 0
    data = []
    labels = []
    for (path, label) in dataset:
        audio, sr = librosa.load(path)
        dur = librosa.get_duration(audio, sr=sr)
        sampleNum = int(dur / SAMPLE_SIZE)
        offset = (dur % SAMPLE_SIZE) / 2
        for i in range(sampleNum):
            audio, sr = librosa.load(path, offset=offset+i, duration=SAMPLE_SIZE)
            sample = librosa.feature.melspectrogram(audio, sr=sr)
            # print(sample.shape)
            sample = cv2.resize(sample, dsize=IMG_SIZE)
            sample = np.expand_dims(sample, -1)
            print(sample.shape)
            data += [(sample, label)]
            labels += [label]
        progress += 1
        print('\r Progress: '+str(round(100*progress/len(dataset))) + '%', end='')
    return data, labels
def retrieve_samples(sample_size, model_type):
    if model_type == 'cnn':
        print("\nLoading train samples")
        X_train_samples, train_labels = load_and_preprocess_cnn(X_train, sample_size)
        print("\nLoading test samples")
        X_test_samples, test_labels = load_and_preprocess_cnn(X_test, sample_size)
        print("\nLoading val samples")
        X_val_samples, val_labels = load_and_preprocess_cnn(X_val, sample_size)
        print('\n')
    elif model_type == 'lstm':
        print("\nLoading train samples")
        X_train_samples, train_labels = load_and_preprocess_lstm(X_train, sample_size)
        print("\nLoading test samples")
        X_test_samples, test_labels = load_and_preprocess_lstm(X_test, sample_size)
        print("\nLoading val samples")
        X_val_samples, val_labels = load_and_preprocess_lstm(X_val, sample_size)
        print('\n')
    elif model_type == "cnnlstm":
        print("\nLoading train samples")
        X_train_samples, train_labels = load_and_preprocess_lstm(X_train, sample_size)
        print("\nLoading test samples")
        X_test_samples, test_labels = load_and_preprocess_lstm(X_test, sample_size)
        print("\nLoading val samples")
        X_val_samples, val_labels = load_and_preprocess_lstm(X_val, sample_size)
        print('\n')
    print("shape: " + str(X_train_samples[0][0].shape))
    print("number of training samples: " + str(len(X_train_samples)))
    print("number of validation samples: " + str(len(X_val_samples)))
    print("number of test samples: " + str(len(X_test_samples)))
    return X_train_samples, X_test_samples, X_val_samples
def create_cnn_lstm_model(input_shape):
    model = Sequential()
    cnn = tensorflow.keras.applications.DenseNet169(include_top=True, weights=None, input_tensor=None, input_shape=input_shape, pooling=None, classes=2)
    # define LSTM model
    model.add(tensorflow.keras.layers.TimeDistributed(cnn, input_shape=input_shape))
    model.add(LSTM(units=512, dropout=0.5, recurrent_dropout=0.3, return_sequences=True, input_shape=input_shape))
    model.add(LSTM(units=512, dropout=0.5, recurrent_dropout=0.3, return_sequences=False))
    model.add(Dense(units=NUM_CLASSES, activation='sigmoid'))
    # compile
    model.compile(loss=tensorflow.keras.losses.binary_crossentropy, optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    return model
def create_model_data_and_labels(X_train_samples, X_val_samples, X_test_samples):
    # Prepare samples to work for training the model
    labelizer = LabelEncoder()
    # prepare training data and labels
    x_train = np.array([x[0] for x in X_train_samples])
    y_train = np.array([x[1] for x in X_train_samples])
    y_train = labelizer.fit_transform(y_train)
    y_train = to_categorical(y_train)
    # prepare validation data and labels
    x_val = np.array([x[0] for x in X_val_samples])
    y_val = np.array([x[1] for x in X_val_samples])
    y_val = labelizer.transform(y_val)
    y_val = to_categorical(y_val)
    # prepare test data and labels
    x_test = np.array([x[0] for x in X_test_samples])
    y_test = np.array([x[1] for x in X_test_samples])
    y_test = labelizer.transform(y_test)
    y_test = to_categorical(y_test)
    return x_train, y_train, x_val, y_val, x_test, y_test, labelizer
#Main loop for testing multiple sample sizes
#choose model type: 'cnn' or 'lstm'
model_type = 'cnnlstm'
n_epochs = 20
patience= 20
es = EarlyStopping(patience=20)
fragment_sizes = [5,10]
start = timeit.default_timer()
ModelData = pd.DataFrame(columns = ['Model Type','Fragment size (s)', 'Time to Compute (s)', 'Early Stopping epoch', 'Training accuracy', 'Validation accuracy', 'Test Accuracy']) #create a DataFrame for storing the results
conf_matrix_data = []
for i in fragment_sizes:
    start_per_size = timeit.default_timer()
    print(f'\n---------- Model trained on fragments of size: {i} seconds ----------------')
    X_train_samples, X_test_samples, X_val_samples = retrieve_samples(i, model_type)
    x_train, y_train, x_val, y_val, x_test, y_test, labelizer = create_model_data_and_labels(X_train_samples, X_val_samples, X_test_samples)
    if model_type == 'cnn':
        model = create_cnn_model(X_train_samples[0][0].shape)
    elif model_type == 'lstm':
        model = create_lstm_model(X_train_samples[0][0].shape)
    elif model_type == 'cnnlstm':
        model = create_cnn_lstm_model(X_train_samples[0][0].shape)
    history = model.fit(x_train, y_train,
                        batch_size=8,
                        epochs=n_epochs,
                        verbose=1,
                        callbacks=[es],
                        validation_data=(x_val, y_val))
    print('Finished training')
    early_stopping_epoch = len(history.history['accuracy'])
    training_accuracy = history.history['accuracy'][early_stopping_epoch-1-patience]
    validation_accuracy = history.history['val_accuracy'][early_stopping_epoch-1-patience]
    plot_data(history, i)
    predictions = model.predict(x_test)
    score = accuracy_score(labelizer.inverse_transform(y_test.argmax(axis=1)), labelizer.inverse_transform(predictions.argmax(axis=1)))
    print('Fragment size = ' + str(i) + ' seconds')
    print('Accuracy on test samples: ' + str(score))
    conf_matrix_data += [(predictions, y_test, i)]
    stop_per_size = timeit.default_timer()
    time_to_compute = round(stop_per_size - start_per_size)
    print('Time to compute: ' + str(time_to_compute))
    ModelData.loc[len(ModelData)] = [model_type, i, time_to_compute, early_stopping_epoch, training_accuracy, validation_accuracy, score]  # store the settings configuration, early stopping epoch and accuracies in the dataframe
stop = timeit.default_timer()
print ('\ntime to compute: '+str(stop-start))
I believe the input_shape is (128, 216, 1).
The issue here is that you don't have a time axis to time-distribute your CNN (DenseNet169) layer over.
In this step,
tensorflow.keras.layers.TimeDistributed(cnn, input_shape=(128,216,1))
you are passing the 128-dimension axis as the time axis. That means each of the CNNs (DenseNet169) is left with an input shape of (216,1), which is not an image and therefore throws an error, because the layer expects 3D tensors (images) and not 2D tensors.
Your input shape needs to be a 4D tensor, something like (10, 128, 216, 1), so that the 10 becomes the time axis (for time distributing) and (128, 216, 1) becomes the image input for the CNN (DenseNet169). A short sketch of grouping frames into such a shape follows.
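As an illustration, consecutive spectrogram frames from one recording could be stacked into a time axis like this (FRAMES_PER_CLIP and the grouping strategy are assumptions, not part of the answer):
import numpy as np

FRAMES_PER_CLIP = 10  # assumed number of spectrogram frames per clip

def to_clips(frames):
    """frames: list of (128, 216, 1) spectrograms from a single recording."""
    usable = (len(frames) // FRAMES_PER_CLIP) * FRAMES_PER_CLIP
    if usable == 0:
        return np.empty((0, FRAMES_PER_CLIP, 128, 216, 1), dtype='float32')
    clips = np.stack(frames[:usable]).reshape(-1, FRAMES_PER_CLIP, 128, 216, 1)
    return clips  # shape: (num_clips, FRAMES_PER_CLIP, 128, 216, 1)

# TimeDistributed(cnn) would then receive input_shape=(FRAMES_PER_CLIP, 128, 216, 1)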
A solution with ragged tensors and a time-distributed layer
IIUC, your data contains n audio files, each file containing a variable number of mel-spectrogram images.
You need to use tf.RaggedTensor to be able to work with variable tensor shapes as inputs to the model.
This requires an explicit definition of an Input layer where you set ragged=True.
This allows you to pass each audio file as a single sample, with a variable number of images, each of which will be time-distributed.
You will have to use None as the time-distributed axis shape while defining the model.
1. Creating a dummy dataset
Let's start with a sample dataset -
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model, utils, applications

#Assuming there are 5 audio files
num_audio = 5
data = []

#Create a random number of mel-spectrograms for each audio file
for i in range(num_audio):
    n_images = np.random.randint(4, 10)
    data.append(np.random.random((n_images, 128, 216, 1)))

print([i.shape for i in data])
[(5, 128, 216, 1),
(5, 128, 216, 1),
(9, 128, 216, 1),
(6, 128, 216, 1),
(4, 128, 216, 1)]
So your data should look something like this. Here I have a dummy dataset with 5 audio files; the first one has 5 images of shape (128,216,1), while the last one has 4 images of the same shape.
2. Converting them to ragged tensors
Next, let's convert and store these as ragged tensors. Ragged tensors allow variable-length objects to be stored, in this case a variable number of images. Read more about them here.
#Convert each set of images (for each audio) to tensors and then a ragged tensor
tensors = [tensorflow.convert_to_tensor(i) for i in data]
X_train = tensorflow.ragged.stack(tensors).to_tensor()
#Creating dummy y_train, one for each audio files
y_train = tensorflow.convert_to_tensor(np.random.randint(0,2,(5,2)))
3. Create a model
I am using the functional API since I find it more readable and it works better with an explicit input layer, but you can use input layers in the Sequential API as well. Feel free to convert it to your preference.
Notice that I am using (None,128,216,1) as the input shape. This creates 5 axes (the first implicit one for batches) as (Batch, audio_files, h, w, channels).
I have a dummy LSTM layer to showcase how the architecture works; feel free to stack more layers. Also note that your DenseNet169 only returns 2 features, and therefore your TimeDistributed layer returns a tensor of shape (None, None, 2), where the first None is the number of audio files and the second None is the number of images (the time axis). So choose your next layers accordingly, as 512 LSTM cells may be too much :)
#Create model
inp = layers.Input((None,128,216,1), ragged=True)
cnn = tensorflow.keras.applications.DenseNet169(include_top=True,
weights=None,
input_tensor=None,
input_shape=(128,216,1), #<----- input shape for cnn is just the image
pooling=None, classes=2)
#Feel free to modify these layers!
x = layers.TimeDistributed(cnn)(inp)
x = layers.LSTM(8)(x)
out = layers.Dense(2)(x)
model = Model(inp, out)
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics='accuracy')
utils.plot_model(model, show_shapes=True, show_layer_names=False)
4. Train!
The next step is simply to train. Feel free to add your own parameters.
model.fit(X_train, y_train, epochs=2)
Epoch 1/2
WARNING:tensorflow:5 out of the last 5 calls to <function Model.make_train_function.<locals>.train_function at 0x7f8e55b4fe50> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
1/1 [==============================] - 37s 37s/step - loss: 3.4057 - accuracy: 0.4000
Epoch 2/2
1/1 [==============================] - 16s 16s/step - loss: 3.3544 - accuracy: 0.4000
Hope that helps.

Accuracy loss when running inference on the Edge TPU with a Keras neural network

I'm a student at the University of Bologna (Italy) and I'm using the Google Coral USB Accelerator for my thesis.
I built a Keras neural network that classifies my data into four classes, and the accuracy I get is roughly 97%.
I performed full-integer post-training quantization, since Keras networks don't support quantization-aware training. I followed the guide on the TensorFlow site, but I had problems running inference on the Edge TPU.
In particular, my network suffers an accuracy loss when the model is converted to a TensorFlow Lite one (the accuracy drops to roughly 25%). This is due to quantization, since the TensorFlow Lite model without quantization that runs on my PC is not affected by the conversion.
I tried to scale my input data to the range [0, 255] with MinMaxScaler, but in that case, even though the accuracy of the quantized TensorFlow Lite model matches that of the unconverted one, the results are not satisfactory, since the accuracy of the network itself is low.
I wonder if you could help me solve this problem. Perhaps the values in my dataset are too low and the quantization fails to convert float32 to uint8 without information loss.
Below you'll find my python code.
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import RMSprop
from sklearn.preprocessing import StandardScaler,MinMaxScaler,Normalizer
from sklearn.metrics import confusion_matrix, accuracy_score
from tensorflow.keras.callbacks import EarlyStopping
import os
import time
import tflite_runtime.interpreter as tflite
import collections
import operator
"""Functions to work with classification models."""
Class = collections.namedtuple('Class', ['id', 'score'])
def input_tensor(interpreter):
    """Returns input tensor view as numpy array of shape (height, width, 3)."""
    tensor_index = interpreter.get_input_details()[0]['index']
    return interpreter.tensor(tensor_index)()[0]

def output_tensor(interpreter):
    """Returns dequantized output tensor."""
    output_details = interpreter.get_output_details()[0]
    output_data = np.squeeze(interpreter.tensor(output_details['index'])())  #Remove single-dimensional entries from the shape of an array.
    scale, zero_point = output_details['quantization']
    return scale * (output_data - zero_point)

def set_input(interpreter, data):
    """Copies data to input tensor."""
    input_tensor(interpreter)[:] = data
    return data

def get_output(interpreter, top_k=1, score_threshold=0.0):
    """Returns no more than top_k classes with score >= score_threshold."""
    scores = output_tensor(interpreter)
    classes = [
        Class(i, scores[i])
        for i in np.argpartition(scores, -top_k)[-top_k:]
        if scores[i] >= score_threshold
    ]
    return sorted(classes, key=operator.itemgetter(1), reverse=True)
#load the dataset
Modelli_Prova01_Nom01_Acc1L = loadtxt(r'/home/utente/Scrivania/csvtesi/Modelli_Prova01_Nom01_Acc1L.csv',delimiter=',')
Modelli_Prova02_Nom01_Acc1L = loadtxt(r'/home/utente/Scrivania/csvtesi/Modelli_Prova02_Nom01_Acc1L.csv',delimiter=',')
Modelli_Prova03_Nom01_Acc1L = loadtxt(r'/home/utente/Scrivania/csvtesi/Modelli_Prova03_Nom01_Acc1L.csv',delimiter=',')
Modelli_Prova04_Nom01_Acc1L = loadtxt(r'/home/utente/Scrivania/csvtesi/Modelli_Prova04_Nom01_Acc1L.csv',delimiter=',')
Modelli_Prova05_Nom01_Acc1L = loadtxt(r'/home/utente/Scrivania/csvtesi/Modelli_Prova05_Nom01_Acc1L.csv',delimiter=',')
time_start = time.perf_counter()
#split x and y data (train and test)
Acc1L01_train,Acc1L01_test = train_test_split(Modelli_Prova01_Nom01_Acc1L ,test_size=0.015,random_state=42)
Acc1L02_train,Acc1L02_test = train_test_split(Modelli_Prova02_Nom01_Acc1L,test_size=0.3,random_state=42)
Acc1L03_train,Acc1L03_test = train_test_split(Modelli_Prova03_Nom01_Acc1L,test_size=0.3,random_state=42)
Acc1L04_train,Acc1L04_test = train_test_split(Modelli_Prova04_Nom01_Acc1L,test_size=0.3,random_state=42)
Acc1L05_train,Acc1L05_test = train_test_split(Modelli_Prova05_Nom01_Acc1L,test_size=0.15,random_state=42)
Y1_train= np.zeros([len(Acc1L01_train)+len(Acc1L05_train),1])
Y2_train= np.ones([len(Acc1L02_train),1])
Y3_train= np.ones([len(Acc1L03_train),1]) +1
Y4_train= np.ones([len(Acc1L04_train),1]) +2
Y1_test= np.zeros([len(Acc1L01_test)+len(Acc1L05_test),1])
Y2_test= np.ones([len(Acc1L02_test),1])
Y3_test= np.ones([len(Acc1L03_test),1]) +1
Y4_test= np.ones([len(Acc1L04_test),1]) +2
xAcc1L_train = np.concatenate((Acc1L01_train,Acc1L05_train,Acc1L02_train,Acc1L03_train,Acc1L04_train),axis=0)
xAcc1L_train=MinMaxScaler([0,255]).fit_transform(xAcc1L_train)
#xAcc1L_train=StandardScaler().fit_transform(xAcc1L_train)
#xAcc1L_train=Normalizer().fit_transform(xAcc1L_train)
#xAcc1L_train=np.transpose(xAcc1L_train)
yAcc1L_train = np.concatenate((Y1_train,Y2_train,Y3_train,Y4_train),axis=0)
xAcc1L_test = np.concatenate((Acc1L01_test,Acc1L05_test,Acc1L02_test,Acc1L03_test,Acc1L04_test),axis=0)
xAcc1L_test=Normalizer().fit_transform(xAcc1L_test)
#xAcc1L_test=MinMaxScaler([0,255]).fit_transform(xAcc1L_test)
#xAcc1L_test=StandardScaler().fit_transform(xAcc1L_test)
#xAcc1L_test=np.transpose(xAcc1L_test)
yAcc1L_test = np.concatenate((Y1_test,Y2_test,Y3_test,Y4_test),axis=0)
#1 hot encode y
one_hot_labelsAcc1L =to_categorical(yAcc1L_train, num_classes=4)
one_hot_labelsAcc1L_test = to_categorical(yAcc1L_test, num_classes=4)
#fit the model
model = Sequential()
model.add(Dense(300, activation='relu', input_dim=30))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()
es1 = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=100)
es2 = EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=100)
history=model.fit(xAcc1L_train, one_hot_labelsAcc1L,validation_data=(xAcc1L_test,one_hot_labelsAcc1L_test),epochs=500, batch_size=30, verbose=1, callbacks=[es1,es2])
#history=model.fit(tf.cast(xAcc1L_train, tf.float32), one_hot_labelsAcc1L,validation_data=(tf.cast(xAcc1L_test, tf.float32),one_hot_labelsAcc1L_test),epochs=500, batch_size=30, verbose=1, callbacks=[es1,es2])
time_elapsed = (time.perf_counter() - time_start)
print ("%5.1f secs " % (time_elapsed))
start=time.monotonic()
_, accuracy = model.evaluate(xAcc1L_test, one_hot_labelsAcc1L_test, batch_size=30, verbose=1)
#_, accuracy = model.evaluate(tf.cast(xAcc1L_test, tf.float32), one_hot_labelsAcc1L_test, batch_size=30, verbose=1)
print(accuracy)
inference_time = time.monotonic() - start
print('%.1fms ' % (inference_time * 1000))
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()
#predicted labels
predictions = model.predict(xAcc1L_test)
y_pred = (predictions > 0.5)
matrix = confusion_matrix(one_hot_labelsAcc1L_test.argmax(axis=1), y_pred.argmax(axis=1))
print('confusion matrix = \n',matrix)
print("Accuracy:",accuracy_score(one_hot_labelsAcc1L_test.argmax(axis=1), y_pred.argmax(axis=1)))
mod01=model.save('/home/utente/Scrivania/csvtesi/rete_Nom01.h5')
#convert the model
#representative dataset
train_ds = tf.data.Dataset.from_tensor_slices(
    (tf.cast(xAcc1L_train, tf.float32))).batch(1)
print(train_ds)

def representative_dataset_gen():
    for input_value in train_ds:
        yield [input_value]
print(model.layers[0].input_shape)
#integer post-training quantization
converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file('/home/utente/Scrivania/csvtesi/rete_Nom01.h5') #all operations mapped on edge tpu
#converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
print(converter.representative_dataset)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_quant_model = converter.convert()
open('/home/utente/Scrivania/csvtesi/rete_Nom01_quant.tflite', "wb").write(tflite_quant_model)
#compiler compila il modello quantizzato tflite per edge tpu
os.system("edgetpu_compiler \'/home/utente/Scrivania/csvtesi/rete_Nom01_quant.tflite'")
#interpret the model
interpreter = tf.lite.Interpreter('/home/utente/Scrivania/csvtesi/rete_Nom01_quant_edgetpu.tflite',experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()
idt=print(interpreter.get_input_details())
odt=print(interpreter.get_output_details())
for j in range(5):
    start = time.monotonic()
    o_test = np.arange(len(xAcc1L_test[:, 0]))
    o_test = o_test[:, np.newaxis]
    for i in range(len(xAcc1L_test[:, 0])):
        input = set_input(interpreter, xAcc1L_test[i, :])
        #print("inference input %s" % input)
        interpreter.invoke()
        classes = get_output(interpreter, 4)
        output = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])  #/255 con edgetpu
        #print("inference output %s" % output)
        #print("inference classes %s" % classes)
        a = np.array([one_hot_labelsAcc1L_test[i, :].argmax(axis=0)])
        b = np.array(output.argmax(axis=1))
        o_test[i] = b
        #if a==b:
        #    print('good classification')
        #else:
        #    print('bad classification')
    inference_time = time.monotonic() - start
    print('%.1fms ' % (inference_time * 1000))
#print(o_test)
print("Accuracy:", accuracy_score(yAcc1L_test, o_test))
My input training dataset comes from CSV files and is a matrix of dimensions (1756, 30); my input test dataset is a matrix of (183, 30).
This is what the data looks like (first two rows):
[[-0.283589 -0.0831421 -0.199936 -0.144523 -0.215593 -0.199029 0.0300179 -0.0299262 -0.0759612 -0.0349733 0.102882 -0.00470235 -0.14267 -0.116636 -0.0842867 -0.124638 -0.107917 -0.0995006 -0.222817 -0.256093 -0.121859 -0.130829 -0.186091 -0.174511 -0.0715493 -0.0595195 -0.054914 -0.0362971 -0.0286576 -0.0409128],
[-0.226151 -0.0386177 -0.16834 -0.0768908 -0.166611 -0.161028 0.0493133 -0.00515959 -0.0362308 -0.00723895 0.105943 -0.010825 -0.142335 -0.10863 -0.0634201 -0.112928 -0.0927994 -0.0556194 -0.180721 -0.218341 -0.0934449 -0.100047 -0.134569 -0.119806 -0.0265749 -0.044841 -0.0538225 -0.017408 -0.00528171 -0.0248457]]
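One thing worth checking, sketched below under the assumption that the converted model's input tensor is uint8-quantized: the float test vectors can be mapped through the input tensor's quantization parameters before being copied into the interpreter, rather than written in as raw floats (this reuses input_tensor and interpreter from the code above; it is an illustration, not a confirmed fix).
# Hedged sketch: quantize the float features with the input tensor's
# scale / zero_point before writing them into the uint8 input buffer.
input_details = interpreter.get_input_details()[0]
in_scale, in_zero_point = input_details['quantization']

def set_quantized_input(interpreter, data):
    quantized = np.round(data / in_scale + in_zero_point)
    input_tensor(interpreter)[:] = np.clip(quantized, 0, 255).astype(np.uint8)

# usage inside the test loop above:
#   set_quantized_input(interpreter, xAcc1L_test[i, :])
#   interpreter.invoke()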

"Same" network on MATLAB and Keras has very different results

I have been trying to replicate the same simple network structure in MATLAB and Keras, but the accuracy I get is very different. The MATLAB code gets accuracy near 0.84 and loss near 17, while the Keras code gets accuracy near 0.63 and loss near 130, even though Keras trains for twice as many epochs on the same data. I think the difference is too big to be a matter of implementation, so I think I'm missing something.
The original code is from a MATLAB example in which I have made a little change to avoid normalization in the first layer.
Here is the MATLAB code:
% Load the digit training set as 4-D array data using
% |digitTrain4DArrayData|.
[trainImages,~,trainAngles] = digitTrain4DArrayData;
disp("Train Images:")
disp(trainImages(:,:,:,1))
% Display 20 random sample training digits using |imshow|.
numTrainImages = size(trainImages,4);
figure
idx = randperm(numTrainImages,20);
for i = 1:numel(idx)
    subplot(4,5,i)
    imshow(trainImages(:,:,:,idx(i)))
    drawnow
end
%%
% Combine all the layers together in a |Layer| array.
layers = [ ...
imageInputLayer([28 28 1], 'Normalization', 'none')
convolution2dLayer(12,25)
reluLayer
fullyConnectedLayer(1)
regressionLayer];
%% Train Network'
options = trainingOptions('sgdm','InitialLearnRate',0.001, ...
'MaxEpochs',15)
net = trainNetwork(trainImages,trainAngles,layers,options)
net.Layers
%% Test Network
[testImages,~,testAngles] = digitTest4DArrayData;
predictedTestAngles = predict(net,testImages);
% *Evaluate Performance*
predictionError = testAngles - predictedTestAngles;
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(testImages,4);
accuracy = numCorrect/numTestImages
%%
% Use the root-mean-square error (RMSE) to measure the differences between
% the predicted and actual angles of rotation.
squares = predictionError.^2;
rmse = sqrt(mean(squares))
%%
% *Display Box Plot of Residuals for Each Digit Class*
residuals = testAngles - predictedTestAngles;
residualMatrix = reshape(residuals,500,10);
figure
boxplot(residualMatrix, ...
'Labels',{'0','1','2','3','4','5','6','7','8','9'})
xlabel('Digit Class')
ylabel('Degrees Error')
title('Residuals')
idx = randperm(numTestImages,49);
for i = 1:numel(idx)
    image = testImages(:,:,:,idx(i));
    predictedAngle = predictedTestAngles(idx(i));
    imagesRotated(:,:,:,i) = imrotate(image,predictedAngle,'bicubic','crop');
end
figure
subplot(1,2,1)
montage(testImages(:,:,:,idx))
title('Original')
subplot(1,2,2)
montage(imagesRotated)
title('Corrected')
Here is the Keras code:
import numpy as np
import scipy.io
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense, Activation, Conv2D, BatchNormalization, AveragePooling2D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras import regularizers
np.random.seed(1671) # for reproducibility
# network and training
NB_EPOCH = 30
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # number of outputs = number of digits
OPTIMIZER = SGD() # SGD optimizer, explained later in this chapter
N_HIDDEN = 128
VALIDATION_SPLIT=0.2 # how much TRAIN is reserved for VALIDATION
data = scipy.io.loadmat('RegressionImageData.mat')
XTrain = np.rollaxis(data['XTrain'],3,0)
XTest = np.rollaxis(data['XTest'],3,0)
YTest = np.squeeze(data['YTest'])
YTrain = np.squeeze(data['YTrain'])
print("Train Images:")
print(XTrain.shape)
print(type(XTrain))
print(XTrain)
XTrain_test = np.reshape(XTrain, (5000,28,28))
with open("./test.txt", "a+") as file:
np.set_printoptions(threshold=np.nan)
file.write(np.array2string(XTrain_test[0], max_line_width=np.inf))
model = Sequential()
model.add(Conv2D(25,(12,12), input_shape=(28,28,1), strides=(1,1), activation = "relu"))
model.add(Flatten())
model.add(Dense(1))
model.summary()
sgd = SGD(lr=0.001, decay=0.1, momentum=0.9, nesterov=False)
model.compile(loss='mean_squared_error', optimizer=sgd)
history = model.fit(XTrain, YTrain,
batch_size=BATCH_SIZE, epochs=NB_EPOCH,
verbose=VERBOSE, validation_split=VALIDATION_SPLIT,
shuffle=False)
predictions= model.predict(XTrain)
[np.transpose(predictions[1:50]), np.transpose(YTrain[1:50]), np.abs(np.transpose(predictions[1:50])- np.transpose(YTrain[1:50]))]
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('rmse')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Ypred_test= model.predict(XTest)
Ypred_test=np.reshape(Ypred_test, (5000,))
predictionError = Ypred_test - YTest
thr = 10;
numCorrect = np.sum((np.abs(predictionError) < thr)*1)
numValidationImages = len(YTest)
accuracy = numCorrect/numValidationImages
print(accuracy)
squares = np.power(predictionError,2)
rmse = np.sqrt(np.mean(squares))
print(rmse)
Does anyone know where the gap could be?
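One candidate for the gap, offered as an assumption rather than a confirmed fix: MATLAB's 'sgdm' trains with momentum 0.9 and a constant learning rate by default, while the Keras code above adds decay=0.1, which shrinks the learning rate after every update. A minimal sketch of an optimizer configuration closer to the MATLAB options:
from keras.optimizers import SGD

# constant learning rate and momentum 0.9, matching MATLAB's sgdm defaults
sgd_matched = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)
# model.compile(loss='mean_squared_error', optimizer=sgd_matched)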

Issue with simple CAE

It looks like a simple CAE is not working for the Carvana dataset.
I'm trying a simple CAE on the Carvana dataset. You can download it here.
My code is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.transform import downscale_local_mean
from skimage.color import rgb2grey
from os.path import join, isfile
from tqdm import tqdm_notebook
from sklearn.model_selection import train_test_split
from keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose, Input, concatenate
from keras.models import Model
from keras.callbacks import ModelCheckpoint
import keras.backend as K
from scipy.ndimage.filters import gaussian_filter
from keras.optimizers import Adam
from random import randint
import hickle as hkl
import dill
class Data(object):
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y
input_folder = join('..', 'input')
print('Path:',input_folder)
data_file_name = 'datafile.pkl'
df_mask = pd.read_csv(join(input_folder, 'train_masks.csv'), usecols=['img'])
load_img = lambda im, idx: imread(join(input_folder, 'train', '{}_{:02d}.jpg'.format(im, idx)))
load_mask = lambda im, idx: imread(join(input_folder, 'train_masks', '{}_{:02d}_mask.gif'.format(im, idx)))
ids_train = df_mask['img'].map(lambda s: s.split('_')[0]).unique()
imgs_idx = list(range(1, 17))
resize = lambda im: downscale_local_mean(im, (4,4) if im.ndim==2 else (4,4,1))
mask_image = lambda im, mask: (im * np.expand_dims(mask, 2))
num_train = 48#len(ids_train)
if isfile(data_file_name):
    #with open(data_file_name, 'rb') as f:
    data = hkl.load(data_file_name)
    X = data.X
    y = data.y
else:
    X = np.empty((num_train, 320, 480, 1), dtype=np.float32)
    y = np.empty((num_train, 320, 480, 1), dtype=np.float32)
    with tqdm_notebook(total=num_train) as bar:
        idx = 1  # Rotation index
        for i, img_id in enumerate(ids_train[:num_train]):
            imgs_id = [resize(load_img(img_id, j)) for j in imgs_idx]
            greyscale = rgb2grey(imgs_id[idx-1]) / 255
            greyscale = np.expand_dims(greyscale, 2)
            X[i] = greyscale
            y_processed = resize(np.expand_dims(load_mask(img_id, idx), 2)) / 255.
            y[i] = y_processed
            del imgs_id  # Free memory
            bar.update()
    #data = Data(X, y)
    #with open(data_file_name, 'w+') as f:
    #    hkl.dump(data, data_file_name)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=43)
y_train_mean = y_train.mean(axis=0)
y_train_std = y_train.std(axis=0)
y_train_min = y_train.min(axis=0)
y_features = np.concatenate([y_train_mean, y_train_std, y_train_min], axis=2)
inp = Input((320, 480, 1))
conv1 = Conv2D(64, 3, activation='relu', padding='same')(inp)
max1 = MaxPooling2D(2)(conv1)
conv2 = Conv2D(48, 5, activation='relu', padding='same')(max1)
max2 = MaxPooling2D(2)(conv2)
conv3 = Conv2D(32, 7, activation='relu', padding='same')(max2)
deconv3 = Conv2DTranspose(32, 7, strides=4, activation='relu', padding='same')(conv3)
deconv2 = Conv2DTranspose(48, 5, strides=2, activation='relu', padding='same')(conv2)
deconvs = concatenate([conv1, deconv2, deconv3])
out = Conv2D(1, 7, activation='sigmoid', padding='same')(deconvs)
model = Model(inp, out)
model.summary()
smooth = 1.
# From here: https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    return 0.5 * K.binary_crossentropy(y_true, y_pred) - dice_coef(y_true, y_pred)
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
cae_filepath = "cae_375.hdf5"
pre_mcp = ModelCheckpoint(cae_filepath, monitor='val_dice_coef', verbose=2, save_best_only=True, mode='max')
pre_history = model.fit(X_train, X_train, epochs=1000, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
model.load_weights(cae_filepath)
filepath="weights-improvement2_lre-5-{epoch:02d}-{val_acc:.5f}-{val_dice_coef:.5f}.hdf5"
mcp = ModelCheckpoint(filepath, monitor='val_dice_coef', verbose=2, save_best_only=True, mode='max')
history = model.fit(X_train, y_train, epochs=1000, validation_data=(X_val, y_val), batch_size=22, verbose=2, callbacks=[mcp])
idxs = [0, X_val.shape[0]/2, randint(1, X_val.shape[0] -1)]
for idx in idxs:
    print('Index:', idx)
    x = X_val[idx]
    fig, ax = plt.subplots(3, 3, figsize=(16, 16))
    ax = ax.ravel()
    cmaps = ['Reds', 'Greens', 'Blues']
    for i in range(x.shape[-1]):
        ax[i].imshow(x[..., i], cmap='gray')  #cmaps[i%3])
        ax[i].set_title('channel {}'.format(i))
    ax[-8].imshow(y_val[idx, ..., 0], cmap='gray')
    ax[-8].set_title('y')
    y_pred = model.predict(x[None]).squeeze()
    ax[-7].imshow(y_pred, cmap='gray')
    ax[-7].set_title('y_pred')
    ax[-6].imshow(gaussian_filter(y_pred, 1) > 0.5, cmap='gray')
    ax[-6].set_title('1')
    ax[-5].imshow(gaussian_filter(y_pred, 2) > 0.5, cmap='gray')
    ax[-5].set_title('2')
    ax[-4].imshow(gaussian_filter(y_pred, 3) > 0.5, cmap='gray')
    ax[-4].set_title('3')
    ax[-3].imshow(gaussian_filter(y_pred, 4) > 0.5, cmap='gray')
    ax[-3].set_title('4')
    ax[-2].imshow(gaussian_filter(y_pred, 5) > 0.5, cmap='gray')
    ax[-2].set_title('5')
    ax[-1].imshow(gaussian_filter(y_pred, 6) > 0.5, cmap='gray')
    ax[-1].set_title('6')
It works fine without pre-training; you can check this by commenting out these lines:
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
cae_filepath = "cae_375.hdf5"
pre_mcp = ModelCheckpoint(cae_filepath, monitor='val_dice_coef', verbose=2, save_best_only=True, mode='max')
pre_history = model.fit(X_train, X_train, epochs=1000, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])
model.compile(Adam(lr=0.0001), bce_dice_loss, metrics=['accuracy', dice_coef])
model.load_weights(cae_filepath)
However, when I tried pre-training the autoencoder to reconstruct the original images, I got no accuracy improvement, only dice coefficient improvement:
Moreover, when I then used the pre-trained autoencoder for training to make predictions based on the training data, I got a different result: accuracy stuck at 0.8374 and the dice coefficient degraded from 0.11864 initially down to 7.5781e-04:
Pre-training a model with an autoencoder should increase model accuracy; from my experience it gives an improvement to 99.62% accuracy for the full MNIST dataset with a simple CAE.
Also, I looked into the data to make sure it has the same nature in both cases (you can see this via the temporary variables used for debugging in the code).
In the second case my idea is that it may be caused by the fact that we carry over not only the encoder's but also the decoder's weights, which can potentially cause an issue during training.
After resetting the decoder's weights I had almost the same picture for some time:
But after iteration 49 the process reached a crucial moment and training became efficient:
However, I have no clue why we don't see an accuracy increase during autoencoder training, despite the dice coefficient improvements; probably something is wrong with my code or the frameworks I'm using.
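A hedged sketch of the decoder-reset idea mentioned above, assuming the layers were given explicit names when the model was built (build_model and the layer names here are hypothetical helpers, not code from the post): rebuild the network, then copy only the pre-trained encoder weights across.
def reset_decoder(cae_filepath, build_model, encoder_layer_names):
    """build_model: hypothetical helper that recreates the architecture above."""
    pretrained = build_model()
    pretrained.load_weights(cae_filepath)
    fresh = build_model()  # freshly initialised weights everywhere, including the decoder
    for name in encoder_layer_names:  # e.g. the Conv2D layers before the bottleneck
        fresh.get_layer(name).set_weights(pretrained.get_layer(name).get_weights())
    return fresh

# model = reset_decoder(cae_filepath, build_model, ['enc_conv1', 'enc_conv2', 'enc_conv3'])
# model.compile(Adam(lr=1e-4), bce_dice_loss, metrics=['accuracy', dice_coef])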
Additional info:
My environment:
Ubuntu 16.04
Python 2.7
Theano 0.10
Keras 2.0.8
Any suggestions will be appreciated
