Receiving output text when a TensorFlow model name != x - python

I have made a model which detects when a person has their face to the right, left, or in the middle. I am making a prediction using the following code:
from keras.models import load_model
from PIL import Image, ImageOps
import numpy as np
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = load_model('models//keras_Model.h5', compile=False)
# Load the labels
class_names = open('models//labels.txt', 'r').readlines()
# Create the array of the right shape to feed into the keras model
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image
image = Image.open('IMG.png').convert('RGB')
# resize the image to a 224x224 with the same strategy as in TM2:
# resizing the image to be at least 224x224 and then cropping from the center
size = (224, 224)
image = ImageOps.fit(image, size, Image.Resampling.LANCZOS)
# turn the image into a numpy array
image_array = np.asarray(image)
# Normalize the image
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# run the inference
prediction = model.predict(data)
index = np.argmax(prediction)
class_name = class_names[index]
confidence_score = prediction[0][index]
print('Class:', class_name, end='')
print('Confidence score:', confidence_score)
That code gives me the proper prediction. Here is the labels.txt file:
0 Left_Side_Face
1 Middle_Face
2 Right_Side_Face
What I want to do now is, for example, when the person has turned their head to the right side, print the message 'face the computer', but when I tried doing it I failed.
Here is the code I tried:
if class_name == "2 Right_Side_Face":
    print('face the computer')
if class_name == "0 Left_Side_Face":
    print('face the computer')
else:
    print('Invalid class name')
    print('exiting.....')
The problem is that whenever I run the code above it goes to the else branch and prints 'Invalid class name', and when I remove the else there are no errors; the code just skips the if statements.
Question:
How can I do something (print a message, alarm, etc.) when a specific class is detected by a TensorFlow model?
EDIT:
Here is the output of the script when I feed the model with a right side face image:
Class: 2 Right_Side_Face
Confidence score: 0.84736574
Invalid class name
exiting....

I think this is because class_name still contains the trailing newline: readlines() does not strip line endings, so class_name is actually "2 Right_Side_Face\n", which never compares equal to "2 Right_Side_Face". Either strip the string before comparing, or sidestep the string comparison entirely and use the index: if index == 2: print('face the computer')
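For example, a minimal fix along those lines (keeping the labels.txt shown above) could look like this:

class_name = class_names[index].strip()  # strip() drops the trailing newline that readlines() keeps
if index in (0, 2):  # 0 = Left_Side_Face, 2 = Right_Side_Face
    print('face the computer')
else:
    print('Invalid class name')
    print('exiting.....')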

Related

I want to add a bounding box

I have code that detects the object, but I want to add a bounding box to the detections.
import cv2
import numpy as np
from keras.models import load_model

# Load the model
model = load_model('keras_model.h5')
# CAMERA can be 0 or 1 based on the default camera of your computer.
camera = cv2.VideoCapture(0)
# Grab the labels from the labels.txt file. This will be used later.
labels = open('labels.txt', 'r').readlines()
while True:
    # Grab the webcam's image.
    ret, image = camera.read()
    # Resize the raw image into (224-height, 224-width) pixels.
    image = cv2.resize(image, (224, 224), interpolation=cv2.INTER_AREA)
    # Show the image in a window
    cv2.imshow('Webcam Image', image)
    # Make the image a numpy array and reshape it to the model's input shape.
    image = np.asarray(image, dtype=np.float32).reshape(1, 224, 224, 3)
    # Normalize the image array
    image = (image / 127.5) - 1
    # Have the model predict what the current image is. model.predict
    # returns an array of percentages. Example: [0.2, 0.8], meaning it is 20% sure
    # it is the first label and 80% sure it is the second label.
    probabilities = model.predict(image)
    # Print the label with the highest probability
    print(labels[np.argmax(probabilities)])
    # Listen to the keyboard for presses.
    keyboard_input = cv2.waitKey(1)
    # 27 is the ASCII code for the Esc key on your keyboard.
    if keyboard_input == 27:
        break
camera.release()
cv2.destroyAllWindows()
Desired result: a nice bounding box around the predicted object.
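One possible direction, offered as a sketch rather than a verified solution: a plain image classifier like the Keras model above only outputs class probabilities, never coordinates, so it cannot draw a box by itself. If the objects of interest are faces, OpenCV's bundled Haar-cascade detector can supply box coordinates for each frame, and the crop can then be handed to the classifier. A minimal sketch, reusing the camera capture from the question:

import cv2

# Haar cascade shipped with opencv-python; returns (x, y, w, h) boxes for frontal faces.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

ret, frame = camera.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    # Draw the bounding box on the frame shown in the window.
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Classify just the cropped region instead of the whole frame.
    crop = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
cv2.imshow('Webcam Image', frame)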

Forward method error in DNN module of OpenCV (Python) using .onnx model

I wanted to test a pretrained model downloaded from here to perform an OCR task. Link to download; its name is CRNN_VGG_BiLSTM_CTC.onnx. This model is extracted from here. The sample-image.png can be downloaded from here (see the code below).
When I do the forward pass of the neural network to predict (OCR) on the blob, I get the following error:
error: OpenCV(4.4.0) /tmp/pip-req-build-xgme2194/opencv/modules/dnn/src/layers/convolution_layer.cpp:348: error: (-215:Assertion failed) ngroups > 0 && inpCn % ngroups == 0 && outCn % ngroups == 0 in function 'getMemoryShapes'
Feel free to read the code below. I tried many things; it's weird because this model does not require a predetermined input shape. If you know any way to read this model and do the forward pass, that would also be helpful, but I'd rather solve it using OpenCV.
import cv2 as cv
import os
# The model is downloaded from here https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr
# model path
modelRecognition = os.path.join(MODELS_PATH,'CRNN_VGG_BiLSTM_CTC.onnx')
# read net
recognizer = cv.dnn.readNetFromONNX(modelRecognition)
# Download sample_image.png from https://i.ibb.co/fMmCB7J/sample-image.png (image host website)
sample_image = cv.imread('sample-image.png')
# Height , Width and number of channels of the image
H, W, C = sample_image.shape
# Create a 4D blob from cropped image
blob = cv.dnn.blobFromImage(sample_image, size = (H, W))
recognizer.setInput(blob)
# Here is where I get the error that I mentioned before
result = recognizer.forward()
Thank you so much in advance.
Your problem is actually that the input data you feed to your model doesn't match the shape of the data the model was trained on.
I used this answer to inspect your onnx model and it appears that it expects an input of shape (1, 1, 32, 100). I modified your code to reshape the image to 1 x 32 x 100 pixels and the inference actually runs without error.
EDIT
I've added some code to interpret the result of the inference. We now display the image and the inferred OCR text.
This doesn't seem to be working, but reading the OpenCV tutorial, there should be two models:
- one that detects where there is text in the image. This network accepts images of various sizes and returns the locations of text within the image; cropped parts of the image, of size 100x32, are then passed to the second model;
- one that actually does the "reading": given patches of image, it returns the characters. For this, a file alphabet_36.txt is provided together with the pre-trained models.
It isn't clear to me, though, which network to use for text detection. Hope the edited code below helps you develop your application further.
import cv2 as cv
import os
import numpy as np
import matplotlib.pyplot as plt

# The model is downloaded from here https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr
# model path
MODELS_PATH = './'
modelRecognition = os.path.join(MODELS_PATH, 'CRNN_VGG_BiLSTM_CTC.onnx')
# read net
recognizer = cv.dnn.readNetFromONNX(modelRecognition)
# Download sample_image.png from https://i.ibb.co/fMmCB7J/sample-image.png (image host website)
sample_image = cv.imread('sample-image.png', cv.IMREAD_GRAYSCALE)
sample_image = cv.resize(sample_image, (100, 32))
sample_image = sample_image[:, ::-1].transpose()
# Height and Width of the image
H, W = sample_image.shape
# Create a 4D blob from image
blob = cv.dnn.blobFromImage(sample_image, size=(H, W))
recognizer.setInput(blob)
# network inference
result = recognizer.forward()
# load alphabet
with open('alphabet_36.txt') as f:
    alphabet = f.readlines()
alphabet = [a.strip() for a in alphabet]
# interpret inference results
res = []
for i in range(result.shape[0]):
    ind = np.argmax(result[i, 0])
    res.append(alphabet[ind])
ocrtxt = ''.join(res)
# show image and detected OCR characters
plt.imshow(sample_image)
plt.title(ocrtxt)
plt.show()
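One caveat about the character loop above, flagged as an assumption: CRNN_VGG_BiLSTM_CTC is CTC-trained, so a greedy decode normally collapses consecutive repeated classes and skips the CTC blank (class 0 in OpenCV's own text-detection sample, with alphabet_36.txt holding the 36 non-blank symbols). Under those assumptions the interpretation loop would look like this instead:

# Greedy CTC decode: argmax per time step, collapse repeats, skip the blank class.
res = []
prev = -1
for i in range(result.shape[0]):
    ind = int(np.argmax(result[i, 0]))
    if ind != prev and ind != 0:       # class 0 assumed to be the CTC blank
        res.append(alphabet[ind - 1])  # alphabet assumed to hold the 36 non-blank symbols
    prev = ind
ocrtxt = ''.join(res)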
Hope it helps.
Cheers

How to fix "TypeError: 'int' object is not subscriptable in the rfind(os.sep.path) + 1:"

I am trying to classify my input image using pre-trained labels and match the prediction with the label embedded in the input image file name, e.g. "my_name.png", where "my_name" is also present in the label pickle.
def onClassify(self):
    IMAGE_DIMS = (96, 96, 3)
    image = cv2.imread("C:/Users/Jimit Vaghela/PycharmProjects/Image Classification/gui/cap_images/fold_1/jimit_Image_202084T132225.png")
    output = image.copy()
    # pre-process the image for classification
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = image.astype("float") / 255.0
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    # load the trained convolutional neural network and the label
    # binarizer
    print("[INFO] loading network...")
    model = load_model("C:/Users/Jimit Vaghela/PycharmProjects/Image Classification/gui/classifier/")
    lb = pickle.loads(open("C:/Users/Jimit Vaghela/PycharmProjects/Image Classification/gui/labelbin.pickle", "rb").read())
    # classify the input image
    print("[INFO] classifying image...")
    proba = model.predict(image)[0]
    idx = np.argmax(proba)
    label = lb.classes_[idx]
    # we'll mark our prediction as "correct" if the input image filename
    # contains the predicted label text (obviously this makes the
    # assumption that you have named your testing image files this way)
    filename = "jimit_Image_202084T132225.png".rfind(os.path.sep)[+1:]
    correct = "correct" if filename.rfind(label) != -1 else "incorrect"
    # build the label and draw the label on the image
    label = "{}: {:.2f}% ({})".format(label, proba[idx] * 100, correct)
    output = imutils.resize(output, width=400)
    cv2.putText(output, label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX,
                0.7, (0, 255, 0), 2)
    # show the output image
    print("[INFO] {}".format(label))
    self.label.setPixmap(QPixmap.fromImage(output))
The error is in this line: filename = "jimit_Image_202084T132225.png".rfind(os.path.sep)[+1:]
What am I doing wrong here? What should I add in front of the .rfind() method?
Desired output, as asked by @MisterMiyagi:
As you can see, the input image was named "charmander_counter.png" in the directory "/examples/charmander_counter.png", and I am using my "my_name.png" in place of the original code.
To clarify, I have already trained the model with various images of myself in a class named "my_name".
If you are trying to get the file name without the extension, then try this:
filename = "jimit_Image_202084T132225.png".rsplit(".")[-2]
which would give you
jimit_Image_202084T132225
That assumes you only have file names; if the filename might also include a path, something like this would work:
filepath = "path/to/file/jimit_Image_202084T132225.png"
filename_with_extension = filepath.rsplit(os.path.sep)[-1]  # this extracts the filename, giving you jimit_Image_202084T132225.png
filename = filename_with_extension.rsplit(".")[-2]  # this removes the extension, giving you jimit_Image_202084T132225
or in one line
filename = "path/to/file/jimit_Image_202084T132225.png".rsplit(os.path.sep)[-1].rsplit(".")[-2]
Note that this relies on the file having an extension, so if your original file was jimit_Image_202084T132225 (no .png) then it won't work.
This would also work if you had multiple file extensions: for jimit_Image_202084T132225.new.png you'd get jimit_Image_202084T132225.new. If that is not desired, comment and I'll change it to the desired one.
Edit: as mentioned by @MisterMiyagi, os.path contains tools for doing this, so you can use it to achieve the same thing:
os.path.splitext(os.path.basename("path/to/file/jimit_Image_202084T132225.new.png"))[0]
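For what it's worth, the standard-library pathlib gives the same result even more directly:

from pathlib import Path

# Path.stem is the final path component with its last suffix removed.
Path("path/to/file/jimit_Image_202084T132225.new.png").stem  # 'jimit_Image_202084T132225.new'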

How to convert NumPy array image to TensorFlow image?

After using TensorFlow's retrain.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
I've successfully generated the "retrained_labels.txt" and "retrained_graph.pb" files. For anybody not familiar with this process, I'm essentially following this tutorial:
https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0
which is effectively the same steps as this popular video:
https://www.youtube.com/watch?v=QfNvhPx5Px8
After the retraining process, I'm attempting to write a Python script that opens all the images in a test images directory, and successively shows each image in an OpenCV window and also runs TensorFlow to classify the image.
Problem is, I can't seem to work out how to open the image as a NumPy array (this is the format that the Python OpenCV wrapper uses) and then convert it into a format I can pass into TensorFlow's sess.run().
Currently I'm opening the image with cv2.imread() and then opening it again with tf.gfile.FastGFile(). This is a very poor practice; I'd much rather open the image once and then convert it.
Here is the relevant portion of the code where I'm stuck:
# open the image with OpenCV
openCVImage = cv2.imread(imageFileWithPath)
# show the OpenCV image
cv2.imshow(fileName, openCVImage)
# get the final tensor from the graph
finalTensor = sess.graph.get_tensor_by_name('final_result:0')
# open the image in TensorFlow
tfImage = tf.gfile.FastGFile(imageFileWithPath, 'rb').read()
# run the network to get the predictions
predictions = sess.run(finalTensor, {'DecodeJpeg/contents:0': tfImage})
After reading these posts:
How to convert numpy arrays to standard TensorFlow format?
Feeding image data in tensorflow for transfer learning
I've tried the following:
# show the OpenCV image
cv2.imshow(fileName, openCVImage)
# get the final tensor from the graph
finalTensor = sess.graph.get_tensor_by_name('final_result:0')
# convert the NumPy array / OpenCV image to a TensorFlow image
openCVImageAsArray = np.asarray(openCVImage, np.float32)
tfImage = tf.convert_to_tensor(openCVImageAsArray, np.float32)
# run the network to get the predictions
predictions = sess.run(finalTensor, {'DecodeJpeg/contents:0': tfImage})
This results in this error on the sess.run() line:
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, numpy ndarrays, or TensorHandles.
I've also tried this:
# show the OpenCV image
cv2.imshow(fileName, openCVImage)
# get the final tensor from the graph
finalTensor = sess.graph.get_tensor_by_name('final_result:0')
# convert the NumPy array / OpenCV image to a TensorFlow image
tfImage = np.array(openCVImage)[:, :, 0:3]
# run the network to get the predictions
predictions = sess.run(finalTensor, {'DecodeJpeg/contents:0': tfImage})
which results in this error:
ValueError: Cannot feed value of shape (257, 320, 3) for Tensor 'DecodeJpeg/contents:0', which has shape '()'
--- EDIT ---
I've also tried this:
# show the OpenCV image
cv2.imshow(fileName, openCVImage)
# get the final tensor from the graph
finalTensor = sess.graph.get_tensor_by_name('final_result:0')
# convert the NumPy array / OpenCV image to a TensorFlow image
tfImage = np.expand_dims(openCVImage, axis=0)
# run the network to get the predictions
predictions = sess.run(finalTensor, feed_dict={finalTensor: tfImage})
which results in this error:
ValueError: Cannot feed value of shape (1, 669, 1157, 3) for Tensor 'final_result:0', which has shape '(?, 2)'
and I've also tried this:
# show the OpenCV image
cv2.imshow(fileName, openCVImage)
# get the final tensor from the graph
finalTensor = sess.graph.get_tensor_by_name('final_result:0')
# convert the NumPy array / OpenCV image to a TensorFlow image
tfImage = np.expand_dims(openCVImage, axis=0)
# run the network to get the predictions
predictions = sess.run(finalTensor, feed_dict={'DecodeJpeg/contents:0': tfImage})
which results in this error:
ValueError: Cannot feed value of shape (1, 669, 1157, 3) for Tensor 'DecodeJpeg/contents:0', which has shape '()'
I'm not sure if this is necessary, but if anyone is curious here is the entire script. Note that this works great except for having to open the image twice:
# test.py

import os
import tensorflow as tf
import numpy as np
import cv2

# module-level variables ##############################################################################################
RETRAINED_LABELS_TXT_FILE_LOC = os.getcwd() + "/" + "retrained_labels.txt"
RETRAINED_GRAPH_PB_FILE_LOC = os.getcwd() + "/" + "retrained_graph.pb"
TEST_IMAGES_DIR = os.getcwd() + "/test_images"

#######################################################################################################################
def main():
    # get a list of classifications from the labels file
    classifications = []
    # for each line in the label file . . .
    for currentLine in tf.gfile.GFile(RETRAINED_LABELS_TXT_FILE_LOC):
        # remove the carriage return
        classification = currentLine.rstrip()
        # and append to the list
        classifications.append(classification)
    # end for

    # show the classifications to prove out that we were able to read the label file successfully
    print("classifications = " + str(classifications))

    # load the graph from file
    with tf.gfile.FastGFile(RETRAINED_GRAPH_PB_FILE_LOC, 'rb') as retrainedGraphFile:
        # instantiate a GraphDef object
        graphDef = tf.GraphDef()
        # read in retrained graph into the GraphDef object
        graphDef.ParseFromString(retrainedGraphFile.read())
        # import the graph into the current default Graph, note that we don't need to be concerned with the return value
        _ = tf.import_graph_def(graphDef, name='')
    # end with

    # if the test image directory listed above is not valid, show an error message and bail
    if not os.path.isdir(TEST_IMAGES_DIR):
        print("the test image directory does not seem to be a valid directory, check file / directory paths")
        return
    # end if

    with tf.Session() as sess:
        # for each file in the test images directory . . .
        for fileName in os.listdir(TEST_IMAGES_DIR):
            # if the file does not end in .jpg or .jpeg (case-insensitive), continue with the next iteration of the for loop
            if not (fileName.lower().endswith(".jpg") or fileName.lower().endswith(".jpeg")):
                continue
            # end if

            # show the file name on std out
            print(fileName)

            # get the file name and full path of the current image file
            imageFileWithPath = os.path.join(TEST_IMAGES_DIR, fileName)
            # attempt to open the image with OpenCV
            openCVImage = cv2.imread(imageFileWithPath)

            # if we were not able to successfully open the image, continue with the next iteration of the for loop
            if openCVImage is None:
                print("unable to open " + fileName + " as an OpenCV image")
                continue
            # end if

            # show the OpenCV image
            cv2.imshow(fileName, openCVImage)

            # get the final tensor from the graph
            finalTensor = sess.graph.get_tensor_by_name('final_result:0')

            # ToDo: find a way to convert from a NumPy array / OpenCV image to a TensorFlow image
            # instead of opening the file twice, these attempts don't work
            # attempt 1:
            # openCVImageAsArray = np.asarray(openCVImage, np.float32)
            # tfImage = tf.convert_to_tensor(openCVImageAsArray, np.float32)
            # attempt 2:
            # tfImage = np.array(openCVImage)[:, :, 0:3]

            # open the image in TensorFlow
            tfImage = tf.gfile.FastGFile(imageFileWithPath, 'rb').read()

            # run the network to get the predictions
            predictions = sess.run(finalTensor, {'DecodeJpeg/contents:0': tfImage})

            # sort predictions from most confidence to least confidence
            sortedPredictions = predictions[0].argsort()[-len(predictions[0]):][::-1]

            print("---------------------------------------")

            # keep track of if we're going through the next for loop for the first time so we can show more info about
            # the first prediction, which is the most likely prediction (they were sorted descending above)
            onMostLikelyPrediction = True
            # for each prediction . . .
            for prediction in sortedPredictions:
                strClassification = classifications[prediction]
                # if the classification (obtained from the directory name) ends with the letter "s", remove the "s" to change from plural to singular
                if strClassification.endswith("s"):
                    strClassification = strClassification[:-1]
                # end if

                # get confidence, then get confidence rounded to 2 places after the decimal
                confidence = predictions[0][prediction]

                # if we're on the first (most likely) prediction, state what the object appears to be and show a % confidence to two decimal places
                if onMostLikelyPrediction:
                    scoreAsAPercent = confidence * 100.0
                    print("the object appears to be a " + strClassification + ", " + "{0:.2f}".format(scoreAsAPercent) + "% confidence")
                    onMostLikelyPrediction = False
                # end if

                # for any prediction, show the confidence as a ratio to five decimal places
                print(strClassification + " (" + "{0:.5f}".format(confidence) + ")")
            # end for

            # pause until a key is pressed so the user can see the current image (shown above) and the prediction info
            cv2.waitKey()
            # after a key is pressed, close the current window to prep for the next time around
            cv2.destroyAllWindows()
        # end for
    # end with

    # write the graph to file so we can view with TensorBoard
    tfFileWriter = tf.summary.FileWriter(os.getcwd())
    tfFileWriter.add_graph(sess.graph)
    tfFileWriter.close()
# end main

#######################################################################################################################
if __name__ == "__main__":
    main()
You were pretty close:
{'DecodeJpeg/contents:0': tfImage} decodes a binary jpeg image.
You need to use {'DecodeJpeg:0': tfImage} instead if the image is already decoded.
Read more here
So your code should look like this:
tfImage = np.array(openCVImage)[:, :, 0:3]
# run the network to get the predictions
predictions = sess.run(finalTensor, {'DecodeJpeg:0': tfImage})
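One extra caveat worth flagging here: cv2.imread() returns channels in BGR order, while the graph's DecodeJpeg node produces RGB, so the channels should be swapped before feeding the decoded image, roughly like this:

# cv2.imread() gives BGR; the graph expects RGB as produced by DecodeJpeg, so convert first.
rgbImage = cv2.cvtColor(openCVImage, cv2.COLOR_BGR2RGB)
tfImage = np.array(rgbImage)[:, :, 0:3]
predictions = sess.run(finalTensor, {'DecodeJpeg:0': tfImage})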

Why is net.blobs['fc7'].data[0] giving all zeros?

I want to extract face descriptors from photos of people. This is what I've done so far:
First, I detected faces in the photos using the OpenCV library in Python.
Then I saved those faces to another image.
Next, I have to extract a descriptor from each face image.
For this I have downloaded vgg face caffemodel CNN from here: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/
To extract the descriptor, first I did this:
net = caffe.Net('CAFFE_FACE_deploy.prototxt','CAFFE_FACE.caffemodel',caffe.TEST)
img = caffe.io.load_image( "detectedface.jpg" )
img = img[:,:,::-1]*255.0
avg = np.array([129.1863,104.7624,93.5940])
img = img - avg
img = img.transpose((2,0,1))
img = img[None,:]
out = net.forward_all( data = img )
But it gives a dimension mismatch error: the data should be of dimension (50, 3, 224, 224) instead of (50, 3, 490, 490).
Then I tried this:
# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)) # mean pixel
transformer.set_raw_scale('data', 255) # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2,1,0)) # the reference model has channels in BGR order instead of RGB
net.blobs['data'].reshape(50,3,224,224)
net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image('detectedface.jpg'))
out = net.forward()
feats = net.blobs['fc7'].data[0]
Here when I print feats, it displays all zeros. Why is it so?
