Newbie to machine learning here.
I'm currently working on a diagnostic machine learning framework that applies 3D CNNs to fMRI imaging. My dataset currently consists of 636 images, and I'm trying to distinguish between control and affected subjects (binary classification). However, when I train my model, the accuracy gets stuck at 48.13% no matter what I do; over the first few epochs it actually drops from 56% to 48.13% and stays there.
So far, I have tried:
changing my loss function (Poisson, categorical cross-entropy, binary cross-entropy, sparse categorical cross-entropy, mean squared error, mean absolute error, hinge, squared hinge)
changing my optimizer (I've tried Adam and SGD)
changing the number of layers
using weight regularization
changing from ReLU to leaky ReLU (I thought perhaps that could help if this was a case of overfitting)
Nothing has worked so far.
Any tips? Here's my code:
#importing important packages
import tensorflow as tf
import os
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv3D, MaxPooling3D, Dropout, BatchNormalization, LeakyReLU
import numpy as np
from keras.regularizers import l2
from sklearn.utils import compute_class_weight
from keras.optimizers import SGD
BATCH_SIZE = 64
input_shape=(64, 64, 40, 20)
# Create the model
model = Sequential()
model.add(Conv3D(64, kernel_size=(3,3,3), activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.005), bias_regularizer=l2(0.005), data_format = 'channels_first', padding='same'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(Conv3D(64, kernel_size=(3,3,3), activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.005), bias_regularizer=l2(0.005), data_format = 'channels_first', padding='same'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(BatchNormalization(center=True, scale=True))
model.add(Conv3D(64, kernel_size=(3,3,3), activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.005), bias_regularizer=l2(0.005), data_format = 'channels_first', padding='same'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(Conv3D(64, kernel_size=(3,3,3), activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.005), bias_regularizer=l2(0.005), data_format = 'channels_first', padding='same'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(BatchNormalization(center=True, scale=True))
model.add(Flatten())
model.add(BatchNormalization(center=True, scale=True))
model.add(Dense(128, activation='relu', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
model.add(Dropout(0.5))
model.add(Dense(128, activation='sigmoid', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
model.add(Dense(1, activation='softmax', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
# Compile the model
model.compile(optimizer = keras.optimizers.sgd(lr=0.000001), loss='poisson', metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
# Model Testing
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=50, verbose=1, shuffle=True)
The main issue is that you are using a softmax activation with a single neuron. Change it to sigmoid, with binary_crossentropy as the loss function.
At the same time, bear in mind that you are using the Poisson loss function, which is suited to regression problems, not classification ones. Make sure the loss matches the problem you are actually trying to solve.
Softmax over a single neuron always outputs 1, which makes the model meaningless. In the last layer, use either a single sigmoid unit or a softmax over two (or more) units, not a one-unit softmax.
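For reference, here is a minimal sketch of the two fixes suggested above, assuming the rest of the model stays as posted and the labels are plain 0/1 integers. The learning rate is only a placeholder value, not a recommendation.
# Pick ONE of the two options below for the final layer (not both).
# Option A: one sigmoid unit + binary cross-entropy (labels stay as 0/1 integers)
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-4),  # use lr= on older Keras versions
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Option B: two softmax units + sparse categorical cross-entropy
# (use categorical_crossentropy instead if you one-hot encode the labels)
model.add(Dense(2, activation='softmax'))
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])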
Related
I've implemented the following code in a Jupyter notebook; it has been running for over 90 minutes and I still haven't gotten any output.
I'm working on a mid-2012 MacBook Pro with 4 GB of RAM.
I've checked the Activity Monitor: memory pressure is in the yellow zone, which means the Mac is not running out of memory, and I don't know what to do now.
The program implements a CNN model on the CIFAR-10 dataset.
import sys
from matplotlib import pyplot
from keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from tensorflow.keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dropout
from keras.layers import BatchNormalization
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = cifar10.load_data()
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY
# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm
# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(32, 32, 3)))
    model.add(BatchNormalization())
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.2))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(BatchNormalization())
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.3))
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(BatchNormalization())
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.4))
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(lr=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
# plot diagnostic learning curves
def summarize_diagnostics(history):
    # plot loss
    pyplot.subplot(211)
    pyplot.title('Cross Entropy Loss')
    pyplot.plot(history.history['loss'], color='blue', label='train')
    pyplot.plot(history.history['val_loss'], color='orange', label='test')
    # plot accuracy
    pyplot.subplot(212)
    pyplot.title('Classification Accuracy')
    pyplot.plot(history.history['accuracy'], color='blue', label='train')
    pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
    # save plot to file
    filename = sys.argv[0].split('/')[-1]
    pyplot.savefig(filename + '_plot.png')
    pyplot.close()
# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # define model
    model = define_model()
    # create data generator
    datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
    # prepare iterator
    it_train = datagen.flow(trainX, trainY, batch_size=64)
    # fit model
    steps = int(trainX.shape[0] / 64)
    history = model.fit_generator(it_train, steps_per_epoch=steps, epochs=100, validation_data=(testX, testY), verbose=0)
    # evaluate model
    _, acc = model.evaluate(testX, testY, verbose=0)
    print('> %.3f' % (acc * 100.0))
    # learning curves
    summarize_diagnostics(history)
# entry point, run the test harness
run_test_harness()
The reason your model is taking so long to train is that:
You are training a large model with many layers for 100 epochs.
You are using a relatively low-performance computer (from what I found by googling, it only has a 2.5 GHz processor).
You can make it train faster by using a free cloud environment that has GPUs and TPUs, like Google Colab (https://colab.research.google.com/), or even better a Kaggle notebook, which allows you to train for longer periods of time. If you want to run it on your Mac, you could try making the model smaller or decreasing the number of epochs you are training for.
It should be easy to port your notebook to Google Colab or Kaggle. You will need to create a Google account for Colab or a separate account for Kaggle.
Hope this helped!
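One extra check that may save time once you move to Colab or Kaggle: confirm that TensorFlow can actually see a GPU before starting the long run (a small sanity check, not part of the original script):
import tensorflow as tf
# An empty list here means training will fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
For a quick local test on the MacBook, temporarily dropping epochs=100 to something small (say 2 or 3) in run_test_harness at least confirms the pipeline works before committing to a long run.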
You need a machine with a capable GPU to run this in a Jupyter notebook in a reasonable time; otherwise, use Google Colab (https://colab.research.google.com/). With such a high number of epochs, it will take a very long time in a local Jupyter notebook.
I'm trying to train a simple model over some picture data that belongs to 10 classes.
The images are in black-and-white format (not grayscale). I'm using image_dataset_from_directory to import the data into Python and to split it into training/validation sets.
My code is as below:
My Imports
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense
Read Image Data
trainDT = tf.keras.preprocessing.image_dataset_from_directory(
data_path,
labels="inferred",
label_mode="categorical",
class_names=['0','1','2','3','4','5','6','7','8','9'],
color_mode="grayscale",
batch_size=4,
image_size=(256, 256),
shuffle=True,
seed=44,
validation_split=0.1,
subset='validation',
interpolation="bilinear",
follow_links=False,
)
Model Creation/Compile/Fit
model = Sequential([
Dense(units=128, activation='relu', input_shape=(256,256,1), name='h1'),
Dense(units=64, activation='relu',name='h2'),
Dense(units=16, activation='relu',name='h3'),
layers.Flatten(name='flat'),
Dense(units=10, activation='softmax',name='out')
],name='1st')
model.summary()
model.compile(optimizer='adam' , loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=trainDT, validation_data=train_data, epochs=10, verbose=2)
The model training returns an error:
InvalidArgumentError Traceback (most recent call last)
....
/// anaconda paths and anaconda python code snippets in the error reporting \\\
....
InvalidArgumentError: Matrix size-incompatible: In[0]: [1310720,3], In[1]: [1,128]
[[node 1st/h1/Tensordot/MatMul (defined at <ipython-input-38-58d6507e2d35>:1) ]] [Op:__inference_test_function_11541]
Function call stack:
test_function
I don't understand where the size mismatch comes from. I've spent a few hours looking for a solution and trying different things, but nothing seems to work for me.
I appreciate any help. Thank you in advance!
Dense layers expect flat input (not a 3D tensor), but you are feeding a (256, 256, 1)-shaped tensor into the first dense layer. If you want to use dense layers from the beginning, you either need to make Flatten the first layer or reshape your data properly.
model = tf.keras.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dense(10, activation="softmax")
])
Also, the flatten between 2 dense layers makes no sense because the output of a dense layer is flat anyway.
From the structure of your model (especially the flatten placement), I assume that
those dense layers were supposed to be convolutional layers instead.
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dense(10, activation="softmax")
])
Convolutional layers can process 2D input, and they produce a multi-dimensional output that you need to flatten before passing it to the dense top (note that you can add more convolutional layers).
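If it helps to see what the Flatten is doing, building the model on a concrete input shape and printing the summary makes the shapes explicit (a small illustration, assuming the 256x256 grayscale images from the question):
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 1)),
    tf.keras.layers.Flatten(),   # collapses the (254, 254, 32) conv output into one long vector
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()  # each layer's output shape is listed, which makes mismatches easy to spot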
Hi mhk777, hope you are doing well. I think you are confusing dense layers with convolutional layers. You have to apply some convolutional layers to the image before feeding it to the dense layers. If you don't want to apply convolutions, then you have to give the dense layer a 2D array, i.e. (number of samples, data).
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
model = models.Sequential()
# Here are convolutional Layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(256,256,1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Here are your dense layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))  # softmax output to match categorical_crossentropy below
model.summary()
model.compile(optimizer='adam' , loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=trainDT, validation_data=train_data, epochs=10, verbose=2)
I trained a model to categorize pictures into two different types. Everything is working quite well, but my model can only make a hard prediction (1 or 0 in my case), and I would like a prediction that is more like a probability (for example, 90% 1 and 10% 0).
Which part of my code should I change now? Is it something to do with the sigmoid function at the end, which decides whether it's 1 or 0? Help would be nice. Thanks in advance.
import numpy as np
from keras.callbacks import TensorBoard
from keras import regularizers
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.layers.normalization import BatchNormalization
from utils import DataGenerator, PATH
train_path = 'Dataset/train'
valid_path = 'Dataset/valid'
test_path = 'Dataset/test'
model = Sequential()
model.add(Conv2D(16, (3, 3), input_shape=(640, 640, 1), padding='same', activation='relu',
kernel_regularizer=regularizers.l2(1e-4),
bias_regularizer=regularizers.l2(1e-4)))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Conv2D(32, (3, 3), activation='relu',
kernel_regularizer=regularizers.l2(1e-4),
bias_regularizer=regularizers.l2(1e-4)))
model.add(MaxPooling2D(pool_size=(5, 5)))
model.add(Conv2D(64, (3, 3), activation='relu',
kernel_regularizer=regularizers.l2(1e-4),
bias_regularizer=regularizers.l2(1e-4)))
model.add(MaxPooling2D(pool_size=(6, 6)))
model.add(Flatten())
model.add(Dense(64, activation='relu',
kernel_regularizer=regularizers.l2(1e-4),
bias_regularizer=regularizers.l2(1e-4)))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid',
kernel_regularizer=regularizers.l2(1e-4),
bias_regularizer=regularizers.l2(1e-4)))
print(model.summary())
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=1e-3), metrics=['accuracy'])
epochs = 50
batch_size = 16
datagen = DataGenerator()
datagen.load_data()
model.fit_generator(datagen.flow(batch_size=batch_size), epochs=epochs, validation_data=datagen.get_validation_data(),
callbacks=[TensorBoard(log_dir=PATH+'/tensorboard')])
#model.save_weights('first_try.h5')
model.save('second_try')
If I try to get a picture in my model like this:
path = 'train/clean/picturenumber2'
def prepare(filepath):
IMG_SIZE = 640
img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
return new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 1)
model = tf.keras.models.load_model('second_try')
prediction = model.predict(prepare(path))
print(prediction)
I just get an output like this: [[1.]], even if I pass in a list with multiple pictures. The prediction itself seems to be working.
Short answer:
Change the sigmoid activation in the last layer to softmax.
Why?
Because the sigmoid output ranges from 0.0 to 1.0, so to interpret it meaningfully you choose a threshold above which you predict the positive class and below which you predict the negative class (for a binary classification problem).
Softmax has the same output range, but the difference is that its outputs are normalized class probabilities, so if your model outputs 0.99 for a given input, it can be interpreted as the model being 99% confident that it is the positive class and 1% confident that it belongs to the negative class.
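For illustration, with the current single-sigmoid head you already get a value between 0 and 1 that you can threshold yourself (a small sketch reusing model and prepare from the question):
prob = model.predict(prepare(path))   # shape (1, 1): the sigmoid output for the positive class
label = int(prob[0][0] > 0.5)         # 0.5 is a common threshold; adjust it to your needs
print(prob, label)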
Update:
As #amin suggested, if you want normalized probabilities you need to make a couple more changes for it to work:
Modify your data generator to output two classes/labels instead of one.
Change the last Dense layer from 1 node to 2 nodes.
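A minimal sketch of those two changes, assuming the labels can be one-hot encoded with to_categorical (DataGenerator comes from the question's own utils module, so exactly where to encode them depends on its implementation):
from keras.utils import to_categorical
# 1. In your data generator, turn the 0/1 labels into one-hot vectors, e.g. [1, 0] and [0, 1]:
# y = to_categorical(y, num_classes=2)
# 2. Use two output nodes with softmax and a matching loss:
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-3), metrics=['accuracy'])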
I am trying to use this to classify images into two categories. I applied the model.fit() function, but it raises the following error:
ValueError: A target array with shape (90, 1) was passed for an output of shape (None, 10) while using as loss binary_crossentropy. This loss expects targets to have the same shape as the output.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D, LSTM
import pickle
import numpy as np
X = np.array(pickle.load(open("X.pickle","rb")))
Y = np.array(pickle.load(open("Y.pickle","rb")))
#scaling our image data
X = X/255.0
model = Sequential()
model.add(Conv2D(64 ,(3,3), input_shape = (300,300,1)))
# model.add(MaxPooling2D(pool_size = (2,2)))
model.add(tf.keras.layers.Reshape((16, 16*512)))
model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr=1e-3, decay=1e-5)
model.compile(loss='binary_crossentropy', optimizer=opt,
metrics=['accuracy'])
# model.summary()
model.fit(X, Y, batch_size=32, epochs = 2, validation_split=0.1)
If your problem is categorical, your issue is that you are using binary_crossentropy instead of categorical_crossentropy; make sure you know whether you actually have a categorical or a binary classification problem.
Also note that if your labels are plain integers like [1, 2, 3, 4, ...] and not one-hot encoded, your loss function should be sparse_categorical_crossentropy, not categorical_crossentropy.
If you do have a binary classification problem, as the error above suggests, make sure you use either:
loss binary_crossentropy with Dense(1, activation='sigmoid'), or
loss categorical_crossentropy with Dense(2, activation='softmax').
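Applied to the code above, with the 0/1 labels in Y, the smallest change is to keep binary_crossentropy and shrink the output layer to a single sigmoid unit (a sketch of just the lines that change; everything else stays as posted):
model.add(Dense(1, activation='sigmoid'))   # instead of Dense(10, activation='softmax')
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# Alternative: Dense(2, activation='softmax') with loss='sparse_categorical_crossentropy'
# if you keep Y as integer labels 0/1, or 'categorical_crossentropy' if you one-hot encode Y.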
Guys, I'm trying to classify the Dogs vs. Cats dataset using a CNN. I'm a deep learning beginner, by the way.
The dataset can be obtained from here. I've also classified this dataset using an MLP, with a training accuracy of 70% and a testing accuracy of 62%, so I decided to use a CNN to improve the score.
But unfortunately, I'm still getting very similar results. Here is my code:
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.models import Sequential
from keras.utils import np_utils
from keras.optimizers import SGD
from keras.datasets import mnist
from keras import backend as K
from imutils import paths
import numpy as np
import argparse
import cPickle
import h5py
import sys
import cv2
import os
K.set_image_dim_ordering('th')
def image_to_feature_vector(image, size=(28, 28)):
return cv2.resize(image, size)
print("[INFO] pre-processing images...")
imagePaths = list(paths.list_images(raw_input('path to dataset: ')))
data = []
labels = []
for (i, imagePath) in enumerate(imagePaths):
image = cv2.imread(imagePath)
label = imagePath.split(os.path.sep)[-1].split(".")[0]
features = image_to_feature_vector(image)
data.append(features)
labels.append(label)
if i > 0 and i % 1000 == 0:
print("[INFO] processed {}/{}".format(i, len(imagePaths)))
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = np_utils.to_categorical(labels, 2)
data = np.array(data) / 255.0
print("[INFO] constructing training/testing split...")
(X_train, X_test, y_train, y_test) = train_test_split(data, labels, test_size=0.25, random_state=42)
X_train = X_train.reshape(X_train.shape[0], 3, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 3, 28, 28).astype('float32')
num_classes = y_test.shape[1]
def basic_model():
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='valid', init='uniform', bias=True, input_shape=(3, 28, 28), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
model = basic_model()
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=25, batch_size=50, shuffle=True, verbose=1)
print('[INFO] Evaluating the model on test data...')
scores = model.evaluate(X_test, y_test, batch_size=100, verbose=1)
print("\nAccuracy: %.4f%%\n\n"%(scores[1]*100))
The CNN model I've used is very basic but decent enough, I think. I followed various tutorials to get to it. I even used this architecture but got a similar result (65% testing accuracy):
def baseline_model():
model = Sequential()
model.add(Convolution2D(30, 5, 5, border_mode='valid', input_shape=(3, 28, 28), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(15, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
For the optimizer I also tried Adam with default parameters, and for the loss function in model.compile I also tried categorical_crossentropy, but there was no (or only a very slight) improvement.
Can you suggest where I'm going wrong or what I can do to improve the accuracy? (In a few epochs if possible.)
(I'm a beginner in deep learning and Keras programming...)
EDIT: I managed to reach 70.224% testing accuracy and 74.27% training accuracy. The CNN architecture was:
CONV => CONV => POOL => DROPOUT => FLATTEN => DENSE*3
(There is almost no overfitting, since training accuracy is 74% and testing is 70% 🙂)
But I'm still open to suggestions to increase it further; 70% is definitely on the lower side...
Use (128, 3, 3) or (64, 3, 3); that should help with the accuracy problem. Also, how many epochs are you using? It would be better to use more than 20 epochs.
Try this:
# two 32-filter conv layers, then pooling and dropout
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(3, 28, 28)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# two 64-filter conv layers, then pooling and dropout
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# dense classifier on top
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2))
model.add(Activation('softmax'))
Basically, your network is not deep enough, which is why both your training and validation accuracy are low. You can try to deepen your network in two ways.
Use a larger number of filters in each convolutional layer. (30, 5, 5) or (15, 3, 3) is just not enough; change the first convolutional layer to (64, 3, 3). After max pooling, which reduces the spatial size, the network should provide "deeper" features, so the second layer should not have 15 filters but something like (64, 3, 3) or even (128, 3, 3).
Add more convolutional layers; 5 or 6 for this problem may be good (see the sketch after this list).
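Along those lines, a deeper variant of the question's basic_model might look like the sketch below. This is only a sketch, written with the modern tf.keras API and channels-last input, so the 28x28 arrays would need to be reshaped to (n, 28, 28, 3) rather than (n, 3, 28, 28); the filter counts follow the suggestion above and are not tuned values.
import tensorflow as tf
from tensorflow.keras import layers
def deeper_model(num_classes=2):
    # Two conv blocks with more filters per layer than the original (64 and 128).
    model = tf.keras.Sequential([
        layers.Conv2D(64, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 3)),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
With 28x28 inputs there is only room for a couple of pooling stages; using larger input images gives the extra depth more to work with.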
Overall, your question goes beyond programming; it is really about CNN architecture. You may want to read research papers on the topic to get a better understanding. For this specific problem, Keras has a very good tutorial on how to improve performance with a very small set of cat and dog images:
Building powerful image classification models using very little data