CSV File Dataset Augmentation using Keras - python

I am working on an existing image-classification project in Kaggle. There are 6 classes to predict in total (Angry, Happy, Sad, etc.). I have implemented a CNN model and am currently using only 4 classes (the ones with the highest number of images), but my model is overfitting: my validation accuracy tops out at 53%, and the several things I have tried have not improved it. Then I saw people mentioning something called Data Augmentation and thought to give it a go, as it seems a potential way to increase accuracy. However, I am stuck with an error that I cannot figure out.
Distribution of dataset: (chart omitted)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from matplotlib.pyplot import imread, imshow, subplots, show

def plot(data_generator):
    """
    Plots 4 images generated by an object of the ImageDataGenerator class.
    """
    data_generator.fit(df_training)
    image_iterator = data_generator.flow(df_training)

    # Plot the images given by the iterator
    fig, rows = subplots(nrows=1, ncols=4, figsize=(18, 18))
    for row in rows:
        row.imshow(image_iterator.next()[0].astype('int'))
        row.axis('off')
    show()

x_train = df_training.drop("emotion", axis=1)
image = x_train[1:2].values.reshape(48, 48)
x_train = x_train.values.reshape(x_train.shape[0], 48, 48, 1)
x_train = x_train.astype("float32")
image = image.astype("float32")
image = x_train[1:2].reshape(48, 48)

# Creating a dataset which contains just one image.
images = image.reshape((1, image.shape[0], image.shape[1]))
imshow(images[0])
show()

print(x_train.shape)

data_generator = ImageDataGenerator(rotation_range=90)
plot(data_generator)
Error:
ValueError: Input to .fit() should have rank 4. Got array with
shape: (28709, 2305)
I have already reshaped my data into a 4-D array, but for some reason the error reports it as 2-D.
print(x_train.shape) gives (28709, 48, 48, 1).
x_train holds the dataset; x_train[1:2] accesses a single image.
P.S. Is there any other approach you would recommend to improve my accuracy on this dataset? If anything in this partial code is unclear, please let me know.

You call data_generator.fit() and data_generator.flow() on df_training (the raw DataFrame) instead of x_train (the reshaped 4-D array).
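A minimal sketch of the fix, reusing the names from the question (the added squeeze() call is only so matplotlib can display the single-channel (48, 48, 1) images):
def plot(data_generator, images):
    """
    Plots 4 images generated by an object of the ImageDataGenerator class.
    """
    data_generator.fit(images)
    image_iterator = data_generator.flow(images)
    fig, rows = subplots(nrows=1, ncols=4, figsize=(18, 18))
    for row in rows:
        # next() yields a batch; take its first image and drop the channel axis
        row.imshow(image_iterator.next()[0].squeeze().astype('int'))
        row.axis('off')
    show()

plot(data_generator, x_train)  # x_train has shape (28709, 48, 48, 1)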
As for more ideas about how to avoid overfitting:
Tensorflow has an official tutorial on that with some good suggestions:
https://www.tensorflow.org/tutorials/keras/overfit_and_underfit

Related

CNN Keras handwritten recognition has high accuracy but poor predictions

I am doing this for a school project and followed some guides to make a neural network using CNN. The libraries I am using are cv2, NumPy, TensorFlow, and matplotlib. The problem I am currently facing is that my network has high accuracy but very bad predictions. I made sure the pictures are inverted and 28x28. I also expanded the number of images to predict from 5 to 10, and I tried adding more layers, but neither helped. If anyone can help me out, that would be awesome! I am also very new to this, so please explain the best you can!
Example of the output (screenshot omitted): as you can see, the handwriting isn't bad or anything, but the model still predicts a 1 instead of a 6.
The training log (screenshot omitted) shows an epoch accuracy of basically 99%.
Here is the code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(a_train, b_train), (a_test, b_test) = mnist.load_data()
a_train = tf.keras.utils.normalize(a_train, axis=1)
a_test = tf.keras.utils.normalize(a_test, axis=1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
model.add(tf.keras.layers.Dense(units=255, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(units=255, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(units=20, activation=tf.nn.softmax))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(a_train, b_train, epochs=50)

lost, accuracy = model.evaluate(a_train, b_train)
print(lost)
print(accuracy)
model.save('test.model')

for x in range(1, 11):
    img = cv2.imread(fr'C:\Users\Eric\PycharmProjects\pythonProject2\test.model\{x}.png')[:, :, 0]
    img = np.invert(np.array([img]))
    prediction = model.predict(img)
    print(f'My Guess is: {np.argmax(prediction)}')
    plt.imshow(img[0], cmap=plt.cm.binary)
    plt.show()
Some things I tried:
I tried adding more layers, assuming that would train better and give a better prediction.
I added more sample images to see if I could get a higher prediction rate. I went from 5 to 10, but still only about 20% of the predictions were right.
I tried changing the number of epochs and using a larger batch size, but that didn't work either.
I am pretty much stuck at this point, trying my best to understand it but unable to improve it at all. If anyone has any tips, please let me know!
You need to normalize your images when predicting: cv2.imread creates an array with values from 0 to 255, which you can normalize by dividing img by 255.
The image you use for prediction should also have white text on a black background.
Lastly, you do not need the np.invert.
So your code should be:
for x in range(1, 11):
    img = np.expand_dims(cv2.imread(fr'C:\Users\Eric\PycharmProjects\pythonProject2\test.model\{x}.png')[:, :, 0], 0) / 255.
    prediction = model.predict(img)
    print(f'My Guess is: {np.argmax(prediction)}')
    plt.imshow(img[0], cmap=plt.cm.binary)
    plt.show()

Keras image classification network always predicts one class and stays at 50% accuracy

I've been working on a Keras network to classify images by whether or not they contain traffic lights, but so far I've had no success. I have a dataset of 11,000+ images, and for my first test I used 240 of them (or rather, a text file for each image with the grayscale pixel values). There is only one output: a 0 or 1 saying whether the image contains traffic lights.
However, when I ran the test, it only predicted one class. Given that 53/240 images had traffic lights, it was achieving about a 79% accuracy rate just by predicting 0 all the time. I read that this might be down to imbalanced data, so I downscaled to just 4 images: 2 with traffic lights, 2 without.
Even with this test, it still got stuck at 50% accuracy after 5 epochs; it's just predicting one class! Similar questions have been answered, but I haven't found anything that works for me :(
Here is the code I am using:
from keras.datasets import mnist
from keras import models
from keras import layers
from keras.utils import to_categorical
import numpy as np
import os

train_images = []
train_labels = []

# The following is just admin tasks - extracting the grayscale pixel values
# from the text files, adding them to the input array. Same with the labels, which
# are extracted from text files and added to output array. Not important to performance.
for fileName in os.listdir('pixels1/'):
    newRead = open(os.path.join('pixels1/', fileName))
    currentList = []
    for pixel in newRead:
        rePixel = int(pixel.replace('\n', '')) / 255
        currentList.append(rePixel)
    train_images.append(currentList)

for fileName in os.listdir('labels1/'):
    newRead = open(os.path.join('labels1/', fileName))
    line = newRead.readline()
    train_labels.append(int(line))

train_images = np.array(train_images)
train_labels = np.array(train_labels)
train_images = train_images.reshape((4, 13689))

# model
model = models.Sequential()
model.add(layers.Dense(13689, input_dim=13689, activation='relu'))
model.add(layers.Dense(13689, activation='relu'))
model.add(layers.Dense(1, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=1)
I was hoping that, at the very least, it would be able to recognise the images at the end. I really want to move on to running a training session on my full 11,000 examples, but at this point I can't get it to work with 4.
Rough points:
You seem to believe that the number of units in your dense layers should be equal to your data dimension (13689); this is not the case. Change both of them to something smaller (in the range of 100-200); they do not even have to be equal. A model that big is not recommended with your relatively small number of data samples (images).
Since you are in a binary classification setting with a single node in your last layer, you should use activation='sigmoid' for this (last) layer and compile your model with loss='binary_crossentropy' (see the sketch after this list).
In imaging applications, the first couple of layers are normally convolutional ones.
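Putting the first two points together, a minimal sketch of the corrected model (the layer width of 128 is illustrative, not prescriptive):
model = models.Sequential()
model.add(layers.Dense(128, input_dim=13689, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))   # sigmoid for a single binary output
model.compile(optimizer='adam',
              loss='binary_crossentropy',          # matches the sigmoid output
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=1)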

How can I fix the reshape issue for an image derived from X_train in Python?

I found a dataset on Kaggle. Here is the link: https://www.kaggle.com/quangqiyana/human-gender-identity
I want to apply a CNN to this dataset.
I wrote some code to get X_train and Y_train:
train = pd.read_csv("files/gender.csv")
train.shape  # (230, 67502)
train.drop('Unnamed: 0', axis=1, inplace=True)
Y_train = train["Label"]
X_train = train.drop(labels=["Label"], axis=1)
Then I want to display some of the images using iloc:
img = X_train.iloc[0].to_numpy()
img = np.pad(img, (0, (67600-img.shape[0])), 'constant').reshape((260, 260))
plt.imshow(img)
plt.title(train.iloc[0,0])
plt.axis("off")
plt.show()
Because 67502 is not a perfect square, I used padding (up to 67600 = 260 x 260). But the image does not display as anything recognisable.
Here is the screenshot (omitted).
How can I fix the reshape issue?
This dataset is likely not intended to be used with a CNN, because the data encoded in the columns has no spatial relation, unlike the pixels of an image. Considering that this dataset has been downloaded once, probably by you, and nobody has created any notebooks or deemed it worth a discussion, I'd recommend moving to another dataset that other people are working on, so you can ask questions there (on Kaggle) and get help.

What is the meaning of the result of the model.predict() function for semantic segmentation?

I use the Segmentation Models library for multi-class (in my case, 4-class) semantic segmentation. The model (UNet with a 'resnet34' backbone) is trained with 3000 RGB (224x224x3) images. The accuracy is around 92.80%.
1) Why does the model.predict() function require a (1,224,224,3)-shaped array as input? I didn't find the answer even in the Keras documentation. The code below actually works and I have no problem with it, but I want to understand the reason.
predictions = model.predict(test_image.reshape(-1, 224, 224, 3))
2) predictions is a (1,224,224,3)-shaped numpy array. Its data type is float32 and it contains floating-point numbers. What is the meaning of the numbers inside this array, and how can I visualize them? I assumed that the result array would contain one of the 4 class labels (from 0 to 3) for every pixel, to which I would then apply a color map for each class. In other words, the result should have been a prediction map, but I didn't get one. To understand better what I mean by a prediction map, please see Jeremy Jordan's blog post about semantic segmentation.
result = predictions[0]
plt.imshow(result) # import matplotlib.pyplot as plt
3) What I finally want to do is what Github: mrgloom - Semantic Segmentation Categorical Crossentropy Example does in its visualy_inspect_result function.
1) The image input shape of your deep neural network architecture is (224,224,3), i.e. width = height = 224 with 3 color channels, and you need an additional leading dimension so that you can feed more than one image at a time to your model. So the input is (1,224,224,3), or more generally (batch_size,224,224,3).
2) According to the docs of the Segmentation Models repo, you can specify the number of classes you want as output: model = Unet('resnet34', classes=4, activation='softmax'). You would then reshape your labelled images to have shape (1,224,224,4), where the last dimension is a mask channel indicating with a 0 or 1 whether pixel (i,j) belongs to class k. You can then predict and access each output mask:
masked = model.predict(np.array([im]))[0]
mask_class0 = masked[:,:,0]
mask_class1 = masked[:,:,1]
3) Then, using matplotlib, you will be able to plot the semantic segmentation, or you can use scikit-image's color.label2rgb function; a sketch follows below.
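For the prediction map asked about in 2), a minimal sketch, assuming masked holds the (224,224,4) softmax output from above: collapse the per-class scores into one label per pixel with argmax, then colorize the labels.
import numpy as np
import matplotlib.pyplot as plt
from skimage import color

class_map = np.argmax(masked, axis=-1)   # per-pixel class label in 0..3
plt.imshow(color.label2rgb(class_map))   # one distinct color per class
plt.show()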

TensorFlow: Train model on a custom image dataset

I am interested in training and evaluating a convolutional neural net model on my own set of images. I want to use the tf.layers module for my model definition, along with a tf.learn.Estimator object to train and evaluate the model using the fit() and evaluate() methods, respectively.
Here is the tutorial that I have been following, which is helpful for showcasing the tf.layers module and the tf.learn.Estimator class. However, the dataset that it uses (MNIST) is simply imported and loaded (as NumPy arrays). See the following main function from the tutorial script:
def main(unused_argv):
    # Load training and eval data
    mnist = learn.datasets.load_dataset("mnist")
    train_data = mnist.train.images  # Returns np.array
    train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
    eval_data = mnist.test.images  # Returns np.array
    eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

    # Create the Estimator
    mnist_classifier = learn.Estimator(
        model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

    # Set up logging for predictions
    # Log the values in the "Softmax" tensor with label "probabilities"
    tensors_to_log = {"probabilities": "softmax_tensor"}
    logging_hook = tf.train.LoggingTensorHook(
        tensors=tensors_to_log, every_n_iter=50)

    # Train the model
    mnist_classifier.fit(
        x=train_data,
        y=train_labels,
        batch_size=100,
        steps=20000,
        monitors=[logging_hook])

    # Configure the accuracy metric for evaluation
    metrics = {
        "accuracy":
            learn.MetricSpec(
                metric_fn=tf.metrics.accuracy, prediction_key="classes"),
    }

    # Evaluate the model and print results
    eval_results = mnist_classifier.evaluate(
        x=eval_data, y=eval_labels, metrics=metrics)
    print(eval_results)
Full code here
I have my own images in jpg format, arranged in the following directory structure:
data
  train
    classA
      1.jpg
      2.jpg
      ...
    classB
      3.jpg
      4.jpg
      ...
    ...
  validate
    classA
      5.jpg
      6.jpg
      ...
    classB
      ...
    ...
I have also converted my image directories into TFRecord format, with one TFRecord file for training and one for validation. I followed this tutorial, which uses the build_image_data.py script from the Inception model that comes with TensorFlow as a black box that outputs these TFRecord files. I admit that I may have put the cart before the horse a bit by creating these, but I thought that perhaps there was a way to use them as inputs to the tf.learn.Estimator's fit() and evaluate() methods.
Questions
How can I format my jpg (or TFRecord) data so that I can use them as inputs to the Estimator object's functions?
I'm assuming I have to convert my images and labels to NumPy arrays, as shown in the code above; however, it is not clear how mnist.train.images and mnist.train.validation are formatted.
Does anyone have any experience with converting jpg files and labels to NumPy arrays that this Estimator class expects as inputs?
Any help would be greatly appreciated.
The file that you have referenced, cnn_mnist.py, and specifically the mnist_classifier.fit call, requires NumPy arrays as input for x and y. Therefore, I will address your second and third questions, as TFRecords may not be easily incorporated into the referenced code.
however, it is not clear how the mnist.train.images and mnist.train.validation are formatted
mnist.train.images is a NumPy array with shape (55000, 784), where 55000 is the number of images and 784 is the dimension of each flattened image (28 x 28). mnist.validation.images is also a NumPy array, with shape (5000, 784).
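If you want to verify those shapes yourself, a quick sketch reusing the loader from the tutorial code above:
mnist = learn.datasets.load_dataset("mnist")
print(mnist.train.images.shape)       # (55000, 784)
print(mnist.validation.images.shape)  # (5000, 784)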
Does anyone have any experience with converting jpg files and labels to NumPy arrays that this Estimator class expects as inputs?
The following code reads in one JPEG image as a three-dimensional Numpy array:
from scipy.misc import imread
filename = '1.jpg'
np_1 = imread(filename)
I assume all of these images are the same size, or that you are able to resize them to the same size, considering that you have already generated TFRecord files from this dataset. All that is left to do is flatten each image, read in the other images iteratively and flatten them, and then vertically stack all the images. The resulting object can be fed into the Estimator function.
Below is code to flatten and vertically stack two three-dimensional Numpy arrays:
import numpy as np
np_1_2 = np.vstack((np_1.flatten(), np_2.flatten()))
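To read a whole class directory iteratively, a hedged sketch along the same lines (the glob pattern and the data/train/classA path are assumptions based on the directory layout in the question):
import glob
import numpy as np
from scipy.misc import imread

# Read every jpg in one class directory, flatten each image, and stack vertically.
filenames = sorted(glob.glob('data/train/classA/*.jpg'))
stacked = np.vstack([imread(f).flatten() for f in filenames])
# stacked has shape (num_images, height * width * channels)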
