How to Increase Accuracy of CNN on Image Recognition - python

I am training a CNN for image classification. Specifically, I am trying to create a lip reader that is able to classify an image of a segmented mouth with its associated phoneme. The images have a dimension of 64x64 and are flattened into a 1D array of length 4096. I have inserted the code for my current model below with its performance graphs and metrics. Does anyone have any advice for how I can continue to modify this model in order to raise the accuracy?
df = pd.read_csv("/kaggle/input/labeled-frames-resized/labeled_frames.csv", error_bad_lines=False)
labelencoder = LabelEncoder()
df['Phoneme'] = labelencoder.fit_transform(df['Phoneme'])
labels = np.asarray(df[['Phoneme']].copy())
df = df.drop(df.columns[0], axis = 1)
X_train, X_test, y_train, y_test = train_test_split(df, labels, random_state = 42, test_size = 0.2, stratify = labels)
X_train = tf.reshape(X_train, (8113, 4096, 1))
X_test = tf.reshape(X_test, (2029, 4096, 1))
model = Sequential()
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid', input_shape= (4096, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(39))
model.add(Activation('softmax'))
optimizer = keras.optimizers.Adam(lr=0.4)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train,y_train, epochs = 500, batch_size = 2048, validation_data = (X_test, y_test), shuffle = True)

You can easily convert it into 2D Convolution:
model.add(Conv2D(filters= 128, kernel_size=(3,3), activation ='relu',strides = (2,2),
padding = 'valid', input_shape= (64,64,1)))
model.add(MaxPooling2D(pool_size=(2,2))
...
model.add(Flatten())
model.add(Dense(39))
model.add(Activation('softmax'))
I've only worked with Conv1d so far because it seemed easier.
Can 1D Convolution be used on images?
Yes you can, but not recommended, unless you have a very specific case and know what you are doing. Assume your images as 1024x1024, what happens when you flatten them? The information that you extract with 2D Convolutions is more than 1D Convolutions.
Explanation:
You can use 1D convolution on images indeed, but not in every situation. (I might be wrong) When you flatten them, then every pixel will be a feature. If we wanted every pixel to be a feature, then we could use normal Dense layers after flattening also. But there would be a lot parameters to train. What I mean by this (total parameters size not included):
model= tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(...)
...
])
When you flatten them you might break the spatial coherence of the images. Using 2D convolutions might gain you accuracy. What we do with 2D convolutions is we visit the image and see what we can extract as an important feature, with max or average pooling.
You will not be able catch that much information with 1D convolutions.
We can feed the pooled feature maps into Fully Connected Layers before making predictions.

Related

How to solve overfitting when you have sufficient data

Notebook Implementation: https://colab.research.google.com/drive/1MoSnUlnUyWo5A15gEuFPEwNCfFl62YcW?usp=sharing
So I've been debugging a CNN model on classifying people based on ECG and I just keep getting really high accuracy from first epoch.
Background
The data is sourced from Physionet MIT-BIH, I only extracted normal beats for each individual, particularly control classes. I have segmented and converted the signals into images.
I experimented with both types of image inputs:
Normal representation VS Time series recurrent representation
I have 5 classes, each with -+2800 samples (definitely sufficient), meaning 13806 total samples. Also no class imbalance. No need for augmentation because the signals are already long and all beats really slightly appear different.
Training
Training (9664, 256, 256, 3)
Validation (3727, 256, 256, 3)
Test (415, 256, 256, 3)
My data is shuffled, in np.array() format, and normalized to 0-1. I'm using a LabelBinarizer() for classes.
Network
def block(model, fs, c):
for _ in range(c):
model.add(Conv2D(filters=fs, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.25))
return model
# Model
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation='relu', input_shape=IMAGE_DIMS))
model = block(model, 64, 1)
model = block(model, 128, 2)
model = block(model, 256, 3)
# Fully Connected Layer
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(len(lb.classes_), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
STEPS_PER_EPOCH = len(x_train) // BS
VAL_STEPS_PER_EPOCH = len(x_valid) // BS
# train the network
H = model.fit(x_train, y_train, batch_size=BS,
validation_data=(x_valid, y_valid),
steps_per_epoch=STEPS_PER_EPOCH,
validation_steps=VAL_STEPS_PER_EPOCH,
epochs=EPOCHS, verbose=1)
History
Just for 10 epochs??

ValueError: logits and labels must have the same shape ((1, 7, 7, 2) vs (1, 2))

I'm quite new to CNN.
I'm trying to create a the following model. but I get the following error: "ValueError: logits and labels must have the same shape ((1, 7, 7, 2) vs (1, 2))"
Below the code I'm trying to implement
#create the training data set
train_data=scaled_data[0:training_data_len,:]
#define the number of periods
n_periods=28
#split the data into x_train and y_train data set
x_train=[]
y_train=[]
for i in range(n_periods,len(train_data)):
x_train.append(train_data[i-n_periods:i,:28])
y_train.append(train_data[i,29])
x_train=np.array(x_train)
y_train=np.array(y_train)
#Reshape the train data
x_train=x_train.reshape(x_train.shape[0],x_train.shape[1],x_train.shape[2],1)
x_train.shape
y_train = keras.utils.to_categorical(y_train,2)
# x_train as the folllowing shape (3561, 28, 28, 1)
# y_train as the following shape (3561, 2, 2)
#Build the 2 D CNN model for regression
model= Sequential()
model.add(Conv2D(32,kernel_size=(3,3),padding='same',activation='relu',input_shape=(x_train.shape[1],x_train.shape[2],1)))
model.add(Conv2D(64,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(4,4)))
model.add(Dropout(0.25))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='sigmoid'))
model.add(Dense(2, activation='sigmoid'))
model.summary()
#compile the model
model.compile(optimizer='ADADELTA', loss='binary_crossentropy', metrics=['accuracy'])
#train the model
model.fit(x_train, y_train, batch_size=1, epochs=1, verbose=2)
There are two problems in your approach:
You're using Convolutional/MaxPooling layers in which the inputs/outputs are as matrices, i.e., with the shape of (Batch_Size, Height, Width, Depth). You then add some Dense layers which usually expect vectors, not matrices as inputs. Therefore, you have to first flatten the outputs of MaxPooling before giving it to Dense layer, i.e., add a model.add(Flatten()) after model.add(Dropout(0.25)) and before model.add(Dense(128,activation='relu')).
You are doing binary classification, i.e., you have two classes. You are using binary_crossentropy as the loss function, for this to work, you should keep your targets as they are (0 and 1) and not use y_train = keras.utils.to_categorical(y_train,2). Your final layer should have 1 neuron and not 2 (Change model.add(Dense(2, activation='sigmoid')) into model.add(Dense(1, activation='sigmoid')) )

How to find input_shape of Convolutional Neural Network when we are trying to fit the data from pandas dataframe?

I have a pandas dataframe X_train with 321 samples and 43 features. Also, there are 18 different classes in y_train.
strong textI want to train a CNN over my data, but I am having trouble to give the input shape in case of pandas dataframe.
X.shape, y.shape
((321, 43), (321,))
X = np.array(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
X_train.shape, X_test.shape
((256, 43), (65, 43))
inputs = np.concatenate((X_train, X_test), axis=0)
targets = np.concatenate((y_train, y_test), axis=0)
inputs.shape, targets.shape
((321, 43), (321,))
In the first layer of my model, I am having trouble with input_shape.
I am new to CNN and all the tutorials have used image and they are just passing in the height, width and channel as the parameter of input_shape.
fold_no = 1
for train, test in kfold.split(inputs, targets):
model = Sequential()
**model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(???)))**
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(18, activation='softmax'))
model.compile(optimizer=Adam(learning_rate = 0.001),
loss = 'sparse_categorical_crossentropy',
metrics = ['accuracy'])
history = model.fit(inputs[train], targets[train], batch_size=5, epochs=50, validation_split=0.2, verbose=1)
scores = model.evaluate(inputs[test], targets[test], verbose=0)
fold_no = fold_no + 1
I am having trouble with input_shape in the first layer:
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(???)))
I tried to set the input shape like the following format:
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(None, train.shape[1])))
But I got the following error:
I also tried in this way:
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(321, 43)))
Then I got the following error:
I also tried the following format:
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(None, 43)))
Then I got the following error:
Conv1D takes a 3D shape as an input, but the 1st dimension is the batch size, so you can ignore it for input_shape. The other 2 dimensions are (steps, input_dim).
When dealing with numeric or text data, the two dimensions are usually (a) how many sequential rows you want your CNN layer to process at once, (b) how many features are in the row. If your data is naturally segmented into specific lengths (maybe 24, for hours in a day, or 3 words in a trigram), you'll want to specifically set the steps dimension. It will also affect your output shape, which will be (steps-kernel_size+1, filters). Try using some different shapes and look at the model summary to see how they change.
But as the documentation says, you can also use None as your steps, e.g. (None, 128) for variable-length sequences of 128-dimensional vectors.
So basically, I'd suggest this, where inputs[train].shape[1] should be 43 for you:
input_shape=(None, inputs[train].shape[1])
You could also try the full length of your dataset, e.g. (321, 43):
input_shape=inputs[train].shape
Take a look at this excellent answer and also this article for a good visual intuition of how Conv1D works on numeric/text input.
tensor =1D sample or data point how many numbers or features = and number of samples.
In this case
sample1 or vector=[x0,x1,x2,...,ystando], total amount of samples are:321 and total features=31,it will have input_shape=(321,43) of the tensor

LSTM input shape for multivariate time series?

I know this question is asked many times, but I truly can't fix this input shape issue for my case.
My x_train shape == (5523000, 13) // (13 timeseries of length 5523000)
My y_train shape == (5523000, 1)
number of classes == 2
To reshape the x_train and y_train:
x_train= x_train.values.reshape(27615,200,13) # 5523000/200 = 27615
y_train= y_train.values.reshape((5523000,1)) # I know I have a problem here but I dont know how to fix it
Here is my lstm network :
def lstm_baseline(x_train, y_train):
batch_size=200
model = Sequential()
model.add(LSTM(batch_size, input_shape=(27615,200,13),
activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='softmax'))
model.compile(
loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(x_train,y_train, epochs= 15)
return model
Whenever I run the code I get this error :
ValueError: Input 0 is incompatible with layer lstm_10: expected
ndim=3, found ndim=4
My question is what I am missing here?
PS: The idea of the project is that I have 13 signals coming from the 13 points of the human body, I want to use them to detect a certain type of diseases (an arousal). By using the LSTM, I want my model to locate the regions where I have that arousal based on these 13 signals.
.
The whole data is 993 patients, for each one I use 13 signals to detect the disorder regions.
if you want me to put the data in 3D dimensions:
(500000 ,13, 993) # (nb_recods, nb_signals, nb_patient)
for each patient I have 500000 observations of 13 signals.
nb_patient is 993
It worth noting that the 500000 size doesn't matter ! as i can have patients with more observations or less than that.
Update: here is a sample data of one patient.
Here is a chunk of my data first 2000 rows
Ok, I did some changes to your code. First, I still don't now what the "200" in your attempt to reshape your data means, so I'm gonna give you a working code and let's see if you can use it or you can modify it to make your code work. The size of your input data and your targets, have to match. You can not have an input x_train with 27615 rows (which is the meaning of x_train[0] = 27615) and a target set y_train with 5523000 values.
I took the first two rows from the data example that you provided for this example:
x_sample = [[-17, -7, -7, 0, -5, -18, 73, 9, -282, 28550, 67],
[-21, -16, -7, -6, -8, 15, 60, 6, -239, 28550, 94]]
y_sample = [0, 0]
Let's reshape x_sample:
x_train = np.array(example)
#Here x_train.shape = (2,11), we want to reshape it to (2,11,1) to
#fit the network's input dimension
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
You are using a categorical loss, so you have to change your targets to categorical (chek https://keras.io/utils/)
y_train = np.array(target)
y_train = to_categorical(y_train, 2)
Now you have two categories, I assumed two categories as in the data that you provided all the targets values are 0, so I don't know how many possible values your target can take. If your target can take 4 possible values, then the number of categories in the to_categorical function will be 4. Every output of your last dense layer will represent a category and the value of that output, the probability of your input to belong to that category.
Now, we just have to slightly modify your LSTM model:
def lstm_baseline(x_train, y_train):
batch_size = 200
model = Sequential()
#We are gonna change input shape for input_dim
model.add(LSTM(batch_size, input_dim=1,
activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
#We are gonna set the number of outputs to 2, to match with the
#number of categories
model.add(Dense(2, activation='softmax'))
model.compile(
loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=15)
return model
You may try some modifications like this below:
x_train = x_train.reshape(1999, 1, 13)
# double-check dimensions
x_train.shape
def lstm_baseline(x_train, y_train, batch_size):
model = Sequential()
model.add(LSTM(batch_size, input_shape=(None, 13),
activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='softmax'))
model.compile(
loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
return model

Keras - Train convolution network, get auto-encoder output

What I want to do:
I want to train a convolutional neural network on the cifar10 dataset on just two classes. Then once I get my fitted model, I want to take all of the layers and reproduce the input image. So I want to get an image back from the network instead of a classification.
What I have done so far:
def copy_freeze_model(model, nlayers = 1):
new_model = Sequential()
for l in model.layers[:nlayers]:
l.trainable = False
new_model.add(l)
return new_model
numClasses = 2
(X_train, Y_train, X_test, Y_test) = load_data(numClasses)
#Part 1
rms = RMSprop()
model = Sequential()
#input shape: channels, rows, columns
model.add(Convolution2D(32, 3, 3, border_mode='same',
input_shape=(3, 32, 32)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(Dropout(0.5))
#output layer
model.add(Dense(numClasses))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=rms,metrics=["accuracy"])
model.fit(X_train,Y_train, batch_size=32, nb_epoch=25,
verbose=1, validation_split=0.2,
callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
print('Classifcation rate %02.3f' % model.evaluate(X_test, Y_test)[1])
##pull the layers and try to get an output from the network that is image.
newModel = copy_freeze_model(model, nlayers = 8)
newModel.add(Dense(1024))
newModel.compile(loss='mean_squared_error', optimizer=rms,metrics=["accuracy"])
newModel.fit(X_train,X_train, batch_size=32, nb_epoch=25,
verbose=1, validation_split=0.2,
callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
preds = newModel.predict(X_test)
Also when I do:
input_shape=(3, 32, 32)
Does this means a 3 channel (RGB) 32 x 32 image?
What I suggest you is a stacked convolutional autoencoder. This makes unpooling layers and deconvolution compulsory. Here you can find the general idea and code in Theano (on which Keras is built):
https://swarbrickjones.wordpress.com/2015/04/29/convolutional-autoencoders-in-pythontheanolasagne/
An example definition of layers needed can be found here :
https://github.com/fchollet/keras/issues/378

Categories