How to detect trained legend images in a map with a neural network? - python

I have used a Convolutional Neural Network (CNN) to classify the legend symbols that appear on maps: circles, ovals, diamonds, crosses, and squares. The network (adapted from https://towardsdatascience.com/from-raw-images-to-real-time-predictions-with-deep-learning-ddbbda1be0e4) works well in my case. The inputs are individually cropped pictures of legend symbols, and what I ultimately want is to find those symbols on full maps. The network can already classify cropped images and predict whether they are squares, circles, and so on. For example, when I provide a cropped diamond image as input, the output is diamond.
import cv2
import numpy as np
import keras

EMOTIONS_LIST = ["circle", "cross", "diamond", "oval", "square"]

# Load the trained classifier from Drive.
model = keras.models.load_model('/content/drive/MyDrive/model_weights_updated.h5')

def predict_emotion(img):
    preds = model.predict(img)
    return EMOTIONS_LIST[np.argmax(preds)]

fr = cv2.imread('/content/drive/MyDrive/Images/train/diamond/Copy of 0076.tif')
gray_fr = cv2.cvtColor(fr, cv2.COLOR_BGR2GRAY)
roi = cv2.resize(gray_fr, (48, 48))  # the network expects 48x48 grayscale crops
pred = predict_emotion(roi[np.newaxis, :, :, np.newaxis])  # add batch and channel axes
print(pred)
Output for the program:
[[1.7809551e-06 2.4862277e-07 9.9999583e-01 2.1272169e-06 8.9550163e-09]], diamond
How can I make the neural network detect these legend symbols, and all the other legends, directly on a full map?

With the network you already have, you could split each map into a grid and classify each cell (a sketch of this baseline follows below), but such an approach has many problems: it is slow, and symbols that straddle cell boundaries are easily missed.
A better solution for you would be a neural network that does semantic segmentation. This way, your network regresses a likelihood map for each of your shapes over the whole map image. From those likelihood maps you can recover both how many instances of each class there are and where they are.
To do so, you can start with the following Mask R-CNN implementation:
https://github.com/matterport/Mask_RCNN
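For reference, here is a minimal sketch of the grid/sliding-window baseline mentioned above. It reuses the 48x48 classifier from the question; the stride and confidence threshold are hypothetical values you would have to tune, and since the classifier has no "background" class, thresholding the top probability is only a crude filter:
import cv2
import numpy as np

def detect_legends(map_img, model, labels, win=48, stride=24, thresh=0.99):
    # Slide a win x win window over the map and classify every crop.
    gray = cv2.cvtColor(map_img, cv2.COLOR_BGR2GRAY)
    hits = []
    for y in range(0, gray.shape[0] - win + 1, stride):
        for x in range(0, gray.shape[1] - win + 1, stride):
            crop = gray[y:y + win, x:x + win]
            preds = model.predict(crop[np.newaxis, :, :, np.newaxis])
            k = int(np.argmax(preds))
            if preds[0, k] >= thresh:  # keep only confident windows
                hits.append((x, y, labels[k], float(preds[0, k])))
    return hits  # overlapping detections would still need non-max suppression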

Related

Use preprocessing function that changes size of input on ImageDataGenerator

I wish to take the FFT of the input dataset loaded using ImageDataGenerator. Taking the FFT will double the number of channels, as I stack the real and imaginary parts of the complex FFT output along the channel dimension. The preprocessing_function attribute of the ImageDataGenerator class must output a Numpy tensor with the same shape as the input, so I could not use it.
I tried applying tf.signal.fft2d directly to the ImageDataGenerator.flow_from_directory() output, but it consumed too much RAM, crashing the program on Google Colab. Another way I tried was to add a custom layer computing the FFT as the first layer of my neural network, but this adds to the training time, so I wish to do it as a preprocessing step.
Could anyone kindly suggest an efficient way to apply such a function on top of ImageDataGenerator?
You can build a custom input pipeline with tf.data instead of ImageDataGenerator, but I have no reason to think it will be any faster than doing the FFT in the first layer. It is a costly operation either way, since tf.signal.fft2d takes complex64 or complex128 dtypes: the image needs casting to complex and then back, because neural-network weights are tf.float32 and other image-processing functions don't accept complex dtypes.
import tensorflow as tf

labels = ['Cats', 'Dogs', 'Others']

def read_image(file_name):
    image = tf.io.read_file(file_name)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
    image = tf.cast(image, tf.complex64)
    # fft2d transforms the two innermost axes, so move channels first,
    # transform, then restore the HWC layout.
    image = tf.transpose(image, [2, 0, 1])
    image = tf.signal.fft2d(image)
    image = tf.transpose(image, [1, 2, 0])
    # The parent directory name is the label (Windows-style path separator).
    label = tf.strings.split(file_name, '\\')[-2]
    label = tf.where(tf.equal(label, labels))
    return image, label

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')
ds = ds.map(read_image)
next(iter(ds))
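If you also want the real/imaginary stacking described in the question (doubling the channel count), a hedged addition inside read_image, right after the fft2d step, could look like this:
# Split the complex spectrum into real and imaginary float32 halves and
# stack them along the channel axis (3 complex channels -> 6 float channels).
image = tf.concat([tf.math.real(image), tf.math.imag(image)], axis=-1)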

Black image when image is normalized

Hi, I'm using the following procedure to normalize images, following the tutorial on the TF 2.3 website:
import numpy as np
import tensorflow as tf

normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in [0, 1].
print(np.min(first_image), np.max(first_image))
But after this change, my vision model doesn't converge during training. It works fine if I don't use the normalization layer, but I'm afraid I'm hurting its performance by feeding it non-normalized images. Also, when I try to render the normalized images, I see only black images.
I use imshow to show the images.
I don't really understand why the images are black with imshow(), nor why my model doesn't train on the normalized images.
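One check worth adding here (hedged; this is not part of the original post): print the value range of a raw batch before rescaling. If train_ds already yields values in [0, 1], applying Rescaling(1./255) a second time squeezes every pixel toward zero, which would explain both the black imshow() output and the poorly scaled inputs during training:
# Hedged diagnostic: inspect the raw value range before the Rescaling layer.
raw_batch, _ = next(iter(train_ds))
print(np.min(raw_batch), np.max(raw_batch))  # ~0.0 ... 1.0 means no rescale needed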

CSV File Dataset Augmentation using Keras

I am working on an already implemented image-classification project on Kaggle. There are 6 classes to predict in total (Angry, Happy, Sad, etc.). I have implemented a CNN model and am currently using only 4 classes (the ones with the highest number of images), but my model is overfitting: validation accuracy tops out at 53%, and the several things I have tried have not improved it. I then came across Data Augmentation and want to give it a go, as it seems a potential way to increase accuracy. However, I am stuck with an error I cannot figure out.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from matplotlib.pyplot import imread, imshow, subplots, show

def plot(data_generator):
    """
    Plots 4 images generated by an object of the ImageDataGenerator class.
    """
    data_generator.fit(df_training)
    image_iterator = data_generator.flow(df_training)
    # Plot the images given by the iterator
    fig, rows = subplots(nrows=1, ncols=4, figsize=(18, 18))
    for row in rows:
        row.imshow(image_iterator.next()[0].astype('int'))
        row.axis('off')
    show()

x_train = df_training.drop("emotion", axis=1)
image = x_train[1:2].values.reshape(48, 48)
x_train = x_train.values.reshape(x_train.shape[0], 48, 48, 1)
x_train = x_train.astype("float32")
image = image.astype("float32")
image = x_train[1:2].reshape(48, 48)

# Creating a dataset which contains just one image.
images = image.reshape((1, image.shape[0], image.shape[1]))
imshow(images[0])
show()

print(x_train.shape)

data_generator = ImageDataGenerator(rotation_range=90)
plot(data_generator)
Error:
ValueError: Input to `.fit()` should have rank 4. Got array with shape: (28709, 2305)
I have already reshaped my data into a 4-D array, but for some reason the error reports it as 2-D.
print(x_train.shape) gives (28709, 48, 48, 1).
x_train holds the dataset; x_train[1:2] accesses one image.
P.S. Is there any other approach you would recommend to improve accuracy on this dataset? If something in this partial code is unclear, please let me know.
You call your data_generator on df_training (the raw 2-D DataFrame, shape (28709, 2305)) instead of on x_train (the reshaped 4-D array); see the corrected snippet below.
As for more ideas about how to avoid overfitting:
Tensorflow has an official tutorial on that with some good suggestions:
https://www.tensorflow.org/tutorials/keras/overfit_and_underfit
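A hedged sketch of the fix, reusing the names from the question (x_train is the reshaped (28709, 48, 48, 1) array):
data_generator = ImageDataGenerator(rotation_range=90)
data_generator.fit(x_train)  # rank-4 input, as .fit() requires
image_iterator = data_generator.flow(x_train)
Strictly speaking, .fit() is only needed for featurewise options such as featurewise_center or zca_whitening; with only rotation_range set, calling .flow(x_train) alone is enough.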

What is the meaning of the result of model.predict() function for semantic segmentation?

I use the Segmentation Models library for multi-class (in my case, 4-class) semantic segmentation. The model (a UNet with a 'resnet34' backbone) is trained with 3000 RGB (224x224x3) images. The accuracy is around 92.80%.
1) Why does the model.predict() function require a (1,224,224,3)-shaped array as input? I didn't find the answer even in the Keras documentation. The code below works and I have no problem with it, but I want to understand the reason.
predictions = model.predict(test_image.reshape(-1, 224, 224, 3))
2) predictions is a (1,224,224,3)-shaped numpy array of dtype float32 containing floating-point numbers. What is the meaning of the numbers inside this array? How can I visualize them? I assumed the result would contain one of the 4 class labels (from 0 to 3) for every pixel, so that I could then apply a color map for each class. In other words, the result should have been a prediction map, but I didn't get one. To understand better what I mean by a prediction map, please see Jeremy Jordan's blog about semantic segmentation.
result = predictions[0]
plt.imshow(result) # import matplotlib.pyplot as plt
3) What I finally want to do is like Github: mrgloom - Semantic Segmentation Categorical Crossentropy Example did in visualy_inspect_result function.
1) The image input shape in your deep neural network architecture is (224,224,3), i.e. width = height = 224 with 3 color channels. The additional leading dimension is the batch axis, needed in case you want to give more than one image at a time to your model: hence (1,224,224,3), or more generally (N,224,224,3).
2) According to the docs of the Segmentation Models repo, you can specify the number of classes you want as output: model = Unet('resnet34', classes=4, activation='softmax'). Correspondingly, your labelled images should be reshaped to (1,224,224,4); the last dimension holds one mask channel per class, indicating with a 0 or 1 whether pixel (i,j) belongs to class k. After predicting, you can access each output mask:
masked = model.predict(np.array([im]))[0]  # drop the batch axis after predicting
mask_class0 = masked[:, :, 0]
mask_class1 = masked[:, :, 1]
3) Then you can plot the semantic segmentation using matplotlib, or with scikit-image's color.label2rgb function.
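A minimal sketch of that visualization, assuming model is the 4-class softmax Unet from the question and test_image is a (224,224,3) array:
import numpy as np
import matplotlib.pyplot as plt
from skimage import color

probs = model.predict(test_image.reshape(1, 224, 224, 3))[0]  # (224, 224, 4)
class_map = np.argmax(probs, axis=-1)   # per-pixel class label in 0..3
plt.imshow(color.label2rgb(class_map))  # one color per class
plt.show()
The argmax over the channel axis collapses the per-class probabilities into exactly the prediction map described in the question.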

CNN batch with images of different size

I restored a pre-trained model for face detection which takes a single image at a time and returns bounding boxes. How can I make it take a batch of images if these images have different sizes?
You can use the tf.image.resize_images method to achieve this. According to the docs, tf.image.resize_images:
Resize images to size using the specified method.
Resized images will be distorted if their original aspect ratio is not
the same as size. To avoid distortions see
tf.image.resize_image_with_pad.
How to use it?
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

x = Input(shape=(None, None, 3), name='image_input')  # accepts any image size
resize_x = tf.image.resize_images(x, [32, 32])  # resize inside the graph (tf.image.resize in TF 2.x)
vgg_output = load_vgg()(resize_x)  # load_vgg() is assumed to return a Keras model
model = Model(inputs=x, outputs=vgg_output)
model.compile(...)
model.predict(...)
