I was told that using data augmentation would help me get more accurate predictions on handwritten digits I've extracted from a document (so they're not from the MNIST dataset I'm training on), so I added it to my model. However, I'm not sure I did it correctly: the size of the training set before data augmentation was 60000, but after adding data augmentation it went down to 3750 per epoch. Am I doing this correctly?
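(If I'm reading the Keras docs right, datagen.flow() defaults to batch_size=32, so I suspect the 3750 is batches per epoch rather than samples: (60000 original + 60000 copies) / 32 = 3750 steps per epoch. That's just my guess at the arithmetic, though.)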
Following the data augmentation part of this tutorial, I've adapted it to fit the way I create and train my model. I've also left out optional parameters in the functions that I don't really understand at the moment.
The current model I'm using is from one of the answers to my previous question, as it performs better than the second model I cobbled together for the sake of comparison. I think the only change I made is to use sparse_categorical_crossentropy for the loss, since I'm categorizing digits and a handwritten character can't belong to two digit categories at once, right?
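From what I understand (and I may be wrong), the difference is only in the label format the loss expects, e.g.:

import numpy as np
from tensorflow import keras

# sparse_categorical_crossentropy expects integer class labels:
y_sparse = np.array([5, 0, 4])                                   # shape (3,)
# categorical_crossentropy expects one-hot labels:
y_onehot = keras.utils.to_categorical(y_sparse, num_classes=10)  # shape (3, 10)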
import os
import numpy as np
from tensorflow import keras

def createModel():
    model = keras.models.Sequential()
    # 1st conv & maxpool
    model.add(keras.layers.Conv2D(40, (5, 5), padding="same", activation='relu', input_shape=(28, 28, 1)))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # 2nd conv & maxpool
    model.add(keras.layers.Conv2D(200, (3, 3), padding="same", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
    # 3rd conv & maxpool
    model.add(keras.layers.Conv2D(512, (3, 3), padding="valid", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
    # reduce dims from 2D to 1D
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(units=100, activation='relu'))
    # dropout to prevent overfitting
    model.add(keras.layers.Dropout(0.5))
    # final fully-connected layer
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
Training is done by a separate function, and this is where I plugged in the data augmentation part:
def trainModel(file_model, epochs=5, create_new=False, show=False):
    model = createModel()
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape((60000, 28, 28, 1)) / 255.0
    x_test = x_test.reshape((10000, 28, 28, 1)) / 255.0
    x_gen = np.array(x_train, copy=True)
    y_gen = np.array(y_train, copy=True)
    datagen = keras.preprocessing.image.ImageDataGenerator(
        featurewise_center=True, featurewise_std_normalization=True, rotation_range=20)
    datagen.fit(x_gen)
    x_train = np.concatenate((x_train, x_gen), axis=0)
    y_train = np.concatenate((y_train, y_gen), axis=0)
    # if a previous model exists and a new model isn't needed..
    if os.path.exists(file_model) and not create_new:
        model = keras.models.load_model(file_model)
    else:
        history = model.fit_generator(datagen.flow(x_train, y_train), epochs=epochs,
                                      validation_data=(x_test, y_test))
        model.save(file_model)
    if show:
        model.summary()
    return model
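For reference, my reading of the tutorial is that flow() already generates augmented batches from the original images on the fly, so concatenating copies shouldn't even be necessary, something like this (my untested understanding, so take it with a grain of salt):

datagen = keras.preprocessing.image.ImageDataGenerator(
    featurewise_center=True, featurewise_std_normalization=True, rotation_range=20)
datagen.fit(x_train)
history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                              steps_per_epoch=len(x_train) // 32,
                              epochs=epochs,
                              validation_data=(x_test, y_test))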
Assuming I used it correctly, I would expect it to noticeably help with correctly identifying the handwritten characters. But I'm not even sure I implemented it in a way that actually contributes to the model's accuracy.
EDIT: It did help a bit with some of the extracted characters, but the model still got the majority of them wrong, which makes me doubt whether I implemented it properly.
Related
Notebook Implementation: https://colab.research.google.com/drive/1MoSnUlnUyWo5A15gEuFPEwNCfFl62YcW?usp=sharing
So I've been debugging a CNN model that classifies people based on their ECG, and I keep getting really high accuracy from the very first epoch.
Background
The data is sourced from the PhysioNet MIT-BIH dataset. I only extracted normal beats for each individual, in particular the control classes. I have segmented the signals and converted them into images.
I experimented with both types of image inputs:
Normal representation vs. time-series recurrent representation
I have 5 classes, each with ±2800 samples (definitely sufficient), for 13806 samples in total, and no class imbalance. There's no need for augmentation, because the signals are already long and every beat already looks slightly different.
Training
Training (9664, 256, 256, 3)
Validation (3727, 256, 256, 3)
Test (415, 256, 256, 3)
My data is shuffled, stored as np.array(), and normalized to [0, 1]. I'm using a LabelBinarizer() for the classes.
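For context, the preprocessing boils down to something like this (a simplified sketch; y_labels_train, y_labels_valid and images_train are stand-in names for my raw data, not the exact variables in the notebook):

from sklearn.preprocessing import LabelBinarizer
import numpy as np

lb = LabelBinarizer()
y_train = lb.fit_transform(y_labels_train)                    # one-hot targets, shape (n, 5)
y_valid = lb.transform(y_labels_valid)
x_train = np.asarray(images_train, dtype='float32') / 255.0   # pixels scaled to [0, 1]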
Network
def block(model, fs, c):
    # c conv layers with fs filters, followed by one pooling + dropout (VGG-style block)
    for _ in range(c):
        model.add(Conv2D(filters=fs, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    model.add(Dropout(0.25))
    return model
# Model
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation='relu', input_shape=IMAGE_DIMS))
model = block(model, 64, 1)
model = block(model, 128, 2)
model = block(model, 256, 3)
# Fully Connected Layer
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(len(lb.classes_), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
STEPS_PER_EPOCH = len(x_train) // BS
VAL_STEPS_PER_EPOCH = len(x_valid) // BS
# train the network
H = model.fit(x_train, y_train, batch_size=BS,
              validation_data=(x_valid, y_valid),
              steps_per_epoch=STEPS_PER_EPOCH,
              validation_steps=VAL_STEPS_PER_EPOCH,
              epochs=EPOCHS, verbose=1)
History
Just for 10 epochs??
I am training a CNN for image classification. Specifically, I am trying to create a lip reader that classifies an image of a segmented mouth by its associated phoneme. The images have dimensions of 64x64 and are flattened into a 1D array of length 4096. I have included the code for my current model below, along with its performance graphs and metrics. Does anyone have advice on how I can continue to modify this model to raise its accuracy?
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense, Activation
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

df = pd.read_csv("/kaggle/input/labeled-frames-resized/labeled_frames.csv", error_bad_lines=False)
labelencoder = LabelEncoder()
df['Phoneme'] = labelencoder.fit_transform(df['Phoneme'])
labels = np.asarray(df[['Phoneme']].copy())
df = df.drop(df.columns[0], axis=1)
X_train, X_test, y_train, y_test = train_test_split(df, labels, random_state=42, test_size=0.2, stratify=labels)
X_train = tf.reshape(X_train, (8113, 4096, 1))
X_test = tf.reshape(X_test, (2029, 4096, 1))

model = Sequential()
model.add(Conv1D(filters=128, kernel_size=3, activation='relu', strides=2, padding='valid', input_shape=(4096, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=128, kernel_size=3, activation='relu', strides=2, padding='valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=128, kernel_size=3, activation='relu', strides=2, padding='valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=128, kernel_size=3, activation='relu', strides=2, padding='valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(39))
model.add(Activation('softmax'))

# NOTE: this optimizer object is never used; compile() below passes the string 'adam',
# which creates a fresh Adam optimizer with the default learning rate.
optimizer = keras.optimizers.Adam(lr=0.4)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=500, batch_size=2048, validation_data=(X_test, y_test), shuffle=True)
You can easily convert it into 2D Convolution:
model.add(Conv2D(filters=128, kernel_size=(3,3), activation='relu', strides=(2,2),
                 padding='valid', input_shape=(64,64,1)))
model.add(MaxPooling2D(pool_size=(2,2)))
...
model.add(Flatten())
model.add(Dense(39))
model.add(Activation('softmax'))
I've only worked with Conv1D so far because it seemed easier.
Can 1D Convolution be used on images?
Yes you can, but it is not recommended unless you have a very specific case and know what you are doing. Suppose your images are 1024x1024: what happens when you flatten them? The information you can extract with 2D convolutions is richer than what 1D convolutions give you.
Explanation:
You can indeed use 1D convolution on images, but not in every situation. (I might be wrong.) When you flatten them, every pixel becomes a feature. If we wanted every pixel to be a feature, we could also just use normal Dense layers after flattening, but there would be a lot of parameters to train. What I mean by this (total parameter count not included):
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(...)
    ...
])
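To put rough numbers on it (my own back-of-the-envelope arithmetic, assuming a 64x64 grayscale input): flattening gives 4096 features, so a single Dense(512) layer on top already needs 4096 * 512 + 512 = 2,097,664 parameters, whereas a Conv2D layer with 128 filters of size 3x3 needs only 3 * 3 * 1 * 128 + 128 = 1,280 parameters while preserving the 2D structure.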
When you flatten them, you might break the spatial coherence of the images. Using 2D convolutions might gain you accuracy: what we do with 2D convolutions is visit the image and extract important features, with max or average pooling. You will not be able to capture as much information with 1D convolutions.
We can feed the pooled feature maps into Fully Connected Layers before making predictions.
I'm quite new to CNNs.
I'm trying to create the following model, but I get this error: "ValueError: logits and labels must have the same shape ((1, 7, 7, 2) vs (1, 2))".
Below is the code I'm trying to implement:
# create the training data set
train_data = scaled_data[0:training_data_len, :]
# define the number of periods
n_periods = 28
# split the data into x_train and y_train data sets
x_train = []
y_train = []
for i in range(n_periods, len(train_data)):
    x_train.append(train_data[i-n_periods:i, :28])
    y_train.append(train_data[i, 29])
x_train = np.array(x_train)
y_train = np.array(y_train)
# reshape the train data
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_train.shape
y_train = keras.utils.to_categorical(y_train, 2)
# x_train has the following shape: (3561, 28, 28, 1)
# y_train has the following shape: (3561, 2)
# build the 2D CNN model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), padding='same', activation='relu', input_shape=(x_train.shape[1], x_train.shape[2], 1)))
model.add(Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(4,4)))
model.add(Dropout(0.25))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='sigmoid'))
model.add(Dense(2, activation='sigmoid'))
model.summary()
# compile the model
model.compile(optimizer='ADADELTA', loss='binary_crossentropy', metrics=['accuracy'])
# train the model
model.fit(x_train, y_train, batch_size=1, epochs=1, verbose=2)
There are two problems in your approach:
You're using convolutional/max-pooling layers whose inputs and outputs are matrices, i.e., tensors with the shape (batch_size, height, width, depth). You then add some Dense layers, which expect vectors, not matrices, as inputs. You therefore have to flatten the output of MaxPooling before feeding it to a Dense layer, i.e., add model.add(Flatten()) after model.add(Dropout(0.25)) and before model.add(Dense(128, activation='relu')).
You are doing binary classification, i.e., you have two classes. Since you are using binary_crossentropy as the loss function, you should keep your targets as they are (0 and 1) and not use y_train = keras.utils.to_categorical(y_train, 2). Your final layer should also have 1 neuron, not 2: change model.add(Dense(2, activation='sigmoid')) to model.add(Dense(1, activation='sigmoid')).
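Putting both fixes together, the tail of the model would look like this (a sketch based on your code above; everything before the MaxPooling2D stays the same):

model.add(MaxPooling2D(pool_size=(4,4)))
model.add(Dropout(0.25))
model.add(Flatten())                         # fix 1: flatten before the Dense layers
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))    # fix 2: single output neuron for 0/1 targets
model.compile(optimizer='ADADELTA', loss='binary_crossentropy', metrics=['accuracy'])
# and keep y_train as 0/1 integers, i.e. drop the to_categorical call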
I have a pandas dataframe X_train with 321 samples and 43 features. Also, there are 18 different classes in y_train.
I want to train a CNN on my data, but I am having trouble specifying the input shape in the case of a pandas dataframe.
X.shape, y.shape
((321, 43), (321,))
X = np.array(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
X_train.shape, X_test.shape
((256, 43), (65, 43))
inputs = np.concatenate((X_train, X_test), axis=0)
targets = np.concatenate((y_train, y_test), axis=0)
inputs.shape, targets.shape
((321, 43), (321,))
In the first layer of my model, I am having trouble with input_shape.
I am new to CNNs, and all the tutorials I've found use images, so they just pass the height, width and channels as the input_shape parameter.
fold_no = 1
for train, test in kfold.split(inputs, targets):
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(???)))
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(18, activation='softmax'))
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(inputs[train], targets[train], batch_size=5, epochs=50, validation_split=0.2, verbose=1)
    scores = model.evaluate(inputs[test], targets[test], verbose=0)
    fold_no = fold_no + 1
I am having trouble with input_shape in the first layer:
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(???)))
I tried to set the input shape in the following format:
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(None, train.shape[1])))
But that raised an error.
I also tried it this way:
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(321, 43)))
That raised an error as well.
I also tried the following format:
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(None, 43)))
And again I got an error.
Conv1D takes a 3D shape as an input, but the 1st dimension is the batch size, so you can ignore it for input_shape. The other 2 dimensions are (steps, input_dim).
When dealing with numeric or text data, the two dimensions are usually (a) how many sequential rows you want your CNN layer to process at once, and (b) how many features are in each row. If your data is naturally segmented into specific lengths (maybe 24, for hours in a day, or 3 words in a trigram), you'll want to set the steps dimension explicitly. It also affects your output shape, which will be (steps - kernel_size + 1, filters). Try some different shapes and look at the model summary to see how they change.
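For example, a quick way to see the effect (a standalone sketch; the steps value of 24 is arbitrary):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D

m = Sequential([Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(24, 43))])
m.summary()   # output shape (None, 22, 64), i.e. (steps - kernel_size + 1, filters)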
But as the documentation says, you can also use None as your steps, e.g. (None, 128) for variable-length sequences of 128-dimensional vectors.
So basically, I'd suggest this, where inputs[train].shape[1] should be 43 for you:
input_shape=(None, inputs[train].shape[1])
You could also try the full length of your dataset, e.g. (321, 43):
input_shape=inputs[train].shape
Take a look at this excellent answer and also this article for a good visual intuition of how Conv1D works on numeric/text input.
A tensor here is a set of 1D samples (data points): one dimension for how many numbers (features) each sample has, and one for the number of samples.
In this case, each sample is a vector [x0, x1, x2, ...], the total number of samples is 321 and the total number of features is 43, so the tensor has input_shape=(321, 43).
What I want to do:
I want to train a convolutional neural network on the cifar10 dataset on just two classes. Then once I get my fitted model, I want to take all of the layers and reproduce the input image. So I want to get an image back from the network instead of a classification.
What I have done so far:
def copy_freeze_model(model, nlayers=1):
    # copy the first nlayers of model into a new Sequential model, with those layers frozen
    new_model = Sequential()
    for l in model.layers[:nlayers]:
        l.trainable = False
        new_model.add(l)
    return new_model
numClasses = 2
(X_train, Y_train, X_test, Y_test) = load_data(numClasses)

# Part 1
rms = RMSprop()
model = Sequential()
# input shape: channels, rows, columns
model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=(3, 32, 32)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(Dropout(0.5))
# output layer
model.add(Dense(numClasses))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=rms, metrics=["accuracy"])
model.fit(X_train, Y_train, batch_size=32, nb_epoch=25,
          verbose=1, validation_split=0.2,
          callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
print('Classification rate %02.3f' % model.evaluate(X_test, Y_test)[1])

## pull the layers and try to get an output from the network that is an image
newModel = copy_freeze_model(model, nlayers=8)
newModel.add(Dense(1024))
newModel.compile(loss='mean_squared_error', optimizer=rms, metrics=["accuracy"])
newModel.fit(X_train, X_train, batch_size=32, nb_epoch=25,
             verbose=1, validation_split=0.2,
             callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
preds = newModel.predict(X_test)
Also, when I do:
input_shape=(3, 32, 32)
does this mean a 3-channel (RGB) 32 x 32 image?
What I suggest is a stacked convolutional autoencoder. This makes unpooling and deconvolution layers necessary. Here you can find the general idea and code in Theano (one of the backends Keras runs on):
https://swarbrickjones.wordpress.com/2015/04/29/convolutional-autoencoders-in-pythontheanolasagne/
An example definition of layers needed can be found here :
https://github.com/fchollet/keras/issues/378
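For illustration only, a minimal convolutional autoencoder in present-day Keras might look like the sketch below. It assumes channels-last 32x32x3 inputs, uses UpSampling2D standing in for unpooling and plain convolutions instead of true deconvolution, and is not the exact code from the links above:

from tensorflow import keras
from tensorflow.keras import layers

# encoder: compress 32x32x3 images down to an 8x8x16 feature map
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = layers.MaxPooling2D((2, 2))(x)                     # 16x16x32
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2))(x)               # 8x8x16

# decoder: upsample back to the original image size
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)                     # 16x16x16
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)                     # 32x32x32
decoded = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
# train the network to reproduce its own input:
# autoencoder.fit(X_train, X_train, batch_size=32, epochs=25, validation_split=0.2)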