Detect text fields with a CNN and convert them to a feature array - python

I'm trying to detect mostly text fields (not OCR; i.e., my goal is not to recognize what is written, but to find the areas where text is written). Some images may be relevant as well (but not much).
The output must be a feature array.
I had a VGG19 from Keras working, but the results aren't great, because I think it isn't trained to deal with text fields or documents:
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np

model = VGG19(weights='imagenet', include_top=False, pooling='avg')  # assumed feature-extractor setup
img = image.load_img(imagepath, target_size=(224, 224))
img_data = image.img_to_array(img)
img_data = np.expand_dims(img_data, axis=0)
img_data = preprocess_input(img_data)
features = model.predict(img_data)
features = np.array(features)  # expected output: the feature array
Is there any CNN already trained to do this? If not, what approach do you suggest?

Related

How to load images with different image shape to tf.data pipe?

My goal is to have preprocessing layers so the model can handle any image size. This is because the dataset I use has two different image shapes. The simple solution is to resize the images when I load them; however, I believe this won't work once the model is deployed, since I can't do a manual resize like that. So I must use preprocessing layers.
The docs I used
What I've tried:
Making the preprocessing layers part of the model; it does not work (a sketch of the idea follows below).
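For reference, that idea would look something like this (a sketch; the 224 x 224 size is an assumption, and the layer is tf.keras.layers.Resizing in recent TF versions, tf.keras.layers.experimental.preprocessing.Resizing in older ones):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, None, 3)),  # accept any image size
    tf.keras.layers.Resizing(224, 224),
    tf.keras.layers.Rescaling(1.0 / 255),
    # ... the rest of the model
])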
I am thinking of using TensorSliceDataset.map(resize_and_rescale).
The problem is that I first need to convert the list [tensor image 1, tensor image 2] to a TensorSliceDataset, and I can't convert it.
What I've tried:
tf.data.Dataset.from_tensor_slices((X_train, y_train))
It throws this error:
InvalidArgumentError: {{function_node __wrapped__Pack_N_9773_device_/job:localhost/replica:0/task:0/device:GPU:0}} Shapes of all inputs must match: values[0].shape = [258,320,3] != values[23].shape = [322,480,3]
[[{{node Pack}}]] [Op:Pack] name: component_0
The load images function:
def load_images(df):
    paths = df['path'].values
    X = []
    for path in paths:
        raw = tf.io.read_file(path)
        img = tf.image.decode_png(raw, channels=3)
        X.append(img)
    y = df['kind'].cat.codes
    return X, y
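For reference, the "resize when loading" workaround mentioned above would look like this (a sketch; the 224 x 224 target size is an assumption). With uniform shapes, from_tensor_slices can pack the tensors:
def load_images_resized(df, size=(224, 224)):
    X = []
    for path in df['path'].values:
        raw = tf.io.read_file(path)
        img = tf.image.decode_png(raw, channels=3)
        X.append(tf.image.resize(img, size))  # all tensors now share one shape
    y = df['kind'].cat.codes
    return X, y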
As far as I understand, you wish to train on both image sizes simultaneously. The simplest way is probably to create a separate dataset for each image size and concatenate them after batching, as follows:
dataset_1 = tf.data.Dataset.from_tensor_slices((X_train_1, y_train_1))
dataset_1 = dataset_1.batch(batch_size_1)
dataset_2 = tf.data.Dataset.from_tensor_slices((X_train_2, y_train_2))
dataset_2 = dataset_2.batch(batch_size_2)
dataset = dataset_1.concatenate(dataset_2)
dataset = dataset.shuffle(shuffle_buffer_size)
In this case each batch consists of images of the same size. If you use .repeat(), do not forget to put it after the concatenation.
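For example, a pipeline with repetition would end like this (num_epochs is a hypothetical variable):
dataset = dataset_1.concatenate(dataset_2)
dataset = dataset.shuffle(shuffle_buffer_size)
dataset = dataset.repeat(num_epochs)  # after the concatenation, not before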
Alternatively, you can use ragged tensors to handle different image sizes:
dataset = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(img_list), label_list))
dataset = dataset.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=3))
For example, the resize_and_rescale idea from the question can then bring every image to one shape (a sketch; the 224 x 224 target size is an assumption, and the map runs per image, before batching):
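def resize_and_rescale(image, label):
    # Each element arrives as a ragged tensor; densify it, then resize and rescale
    image = tf.image.resize(image.to_tensor(), [224, 224]) / 255.0
    return image, label

dataset = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(img_list), label_list))
dataset = dataset.map(resize_and_rescale)  # per image, before batching
dataset = dataset.batch(3)                 # shapes now match, so a regular batch works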

How to get an image to array, Tensorflow 1.9

So I have to use TensorFlow 1.9 for system-specific reasons.
I want to train a CNN with a custom dataset consisting of images.
The folder structure looks very much like this:
./
+ circles
- circle-0.jpg
- circle-1.jpg
- ...
+ hexagons
- hexagon-0.jpg
- hexagon-1.jpg
- ...
+ ...
So the example I have to work with uses MNIST and has these two particular lines of code:
mnist_dataset = tf.keras.datasets.mnist.load_data('mnist_data')
(x_train, y_train), (x_test, y_test) = mnist_dataset
In my work, I also have to use this data format (x_train, y_train), (x_test, y_test), which seems to be quite common. As far as I have been able to find out, the format of those datasets is (image_data, label), something like ((60000, 28, 28), (60000,)), at least for the MNIST dataset. The image_data here is supposedly of dtype uint8 (according to this post). I was also able to find out that a tf.data.Dataset() object looks like the tuples I need here: (image_data, label).
So far, so good. But a few questions arise from this which I haven't been able to figure out yet, and where I would kindly request your help:
1. (60000, 28, 28) means an array of 60k 28 x 28 image values, right?
2. If 1. is right, how do I get my images (in the directory structure described above) into this format? Is there a function which yields such an array? I know I need some kind of generator function that gathers all the images with their labels, because in TensorFlow 1.9 tf.keras.utils.image_dataset_from_directory() does not seem to exist yet.
3. What do the labels actually look like? For example, with my directory structure, would I have something like this:
(A)

File             Label
circle-0.jpg     circle
circle-233.jpg   circle
hexagon-1.jpg    hexagon
triangle-12.jpg  triangle

or (B)

File             Label
circle-0.jpg     circle-0
circle-233.jpg   circle-233
hexagon-1.jpg    hexagon-1
triangle-12.jpg  triangle-12
where the respective image is already converted to a (60000, 28, 28)-style array? It seems as if I need to create all my functions myself, since there does not seem to be a function which takes a directory structure like mine and yields a dataset usable by TensorFlow 1.9, or is there? I know of tf.keras.preprocessing.image.ImageDataGenerator and image_dataset_from_directory as well as flow_from_directory(); however, none of them seems to produce my desired dataset tuple format.
I would really appreciate any help!
You have to build a custom data generator for that. If you have two arrays, train_paths containing the paths to images and train_labels containing the labels for the images, then this function (datagen) will yield the images as arrays, each with its respective label, as tuples (image_array, label).
I have also added a way to integer-encode your labels, with a dictionary encode_label.
For example, train_paths and train_labels should look like this:
train_paths = np.array(['path/to/image1.jpg','path/to/image2.jpg','path/to/image3.jpg'])
train_labels = np.array(['circle','square','hexagon'])
where the image at path 'path/to/image1.jpg' has the label 'circle', the image at path 'path/to/image2.jpg' has the label 'square', and so on.
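If these two arrays still need to be built from the folder structure in the question, here is a minimal sketch (assuming each class folder name, e.g. 'circles', serves as the label):
import os
import numpy as np

def paths_and_labels(root):
    paths, labels = [], []
    for class_dir in sorted(os.listdir(root)):      # e.g. 'circles', 'hexagons'
        class_path = os.path.join(root, class_dir)
        if not os.path.isdir(class_path):
            continue
        for fname in sorted(os.listdir(class_path)):
            paths.append(os.path.join(class_path, fname))
            labels.append(class_dir)
    return np.array(paths), np.array(labels)

train_paths, train_labels = paths_and_labels('./')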
This generator function returns the data in batches, and you can write your custom augmentation techniques as well (inside the augment function):
import numpy as np
import tensorflow as tf

# Hyperparameters
HEIGHT = 224    # Image height
WIDTH = 224     # Image width
CHANNELS = 3    # Image channels

# This dictionary encodes your labels as integers
encode_label = {'hexagon': 0, 'circle': 1, 'square': 2}

def augment(image):
    # All your augmentation techniques are done here
    return image

def encode_labels(labels):
    encoded = []
    for label in labels:
        encoded.append(encode_label[label])
    return encoded

def open_images(paths):
    '''
    Given a list of paths to images, this function loads
    the images from the paths, augments them, and returns them as a batch.
    '''
    images = []
    for path in paths:
        image = tf.keras.preprocessing.image.load_img(path, target_size=(HEIGHT, WIDTH, CHANNELS))
        image = np.array(image)
        image = augment(image)
        images.append(image)
    return np.array(images)

# This is the data generator
def datagen(paths, labels, batch_size=32):
    for x in range(0, len(paths), batch_size):
        # Load a batch of images
        batch_paths = paths[x:x + batch_size]
        batch_images = open_images(batch_paths)
        # Load the matching batch of labels
        batch_labels = labels[x:x + batch_size]
        batch_labels = encode_labels(batch_labels)
        batch_labels = np.array(batch_labels, dtype='float').reshape(-1)
        yield batch_images, batch_labels
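To actually train with it, the generator can be wrapped so it never runs out, since fit_generator in TF 1.x expects an endless generator (a usage sketch; the model variable and the epoch/batch numbers are assumptions):
def endless_datagen(paths, labels, batch_size=32):
    while True:  # loop forever so fit_generator never exhausts the data
        yield from datagen(paths, labels, batch_size)

model.fit_generator(endless_datagen(train_paths, train_labels),
                    steps_per_epoch=len(train_paths) // 32,
                    epochs=5)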
If you cannot get tf.keras.preprocessing.image.load_img working in your TensorFlow version, try an alternative way to load and resize the image. One alternative is to load the image with matplotlib and resize it with skimage. The open_images function would then be:
import matplotlib.image
import numpy as np
from skimage.transform import resize

def open_images(paths):
    '''
    Given a list of paths to images, this function loads
    the images from the paths, augments them, and returns them as a batch.
    '''
    images = []
    for path in paths:
        image = matplotlib.image.imread(path)
        image = np.array(image)
        image = resize(image, (HEIGHT, WIDTH, CHANNELS))  # note: returns floats in [0, 1]
        image = augment(image)
        images.append(image)
    return np.array(images)

Keras Captcha OCR - How to pass single jpeg image to loaded (trained) model and receive prediction in string?

For the past several hours I have been looking all over the internet for an answer to how I can pass a single JPEG image into my pre-trained model (saved and loaded) and receive a prediction in string format.
I am using Captcha OCR from this source - https://keras.io/examples/vision/captcha_ocr/
The two approaches below got me the farthest (I think), but they are still not working:
APPROACH 1:
model = load_model('trained_models/my_trained_model.h5', custom_objects={'CTCLayer': CTCLayer})
img_path = '/test/my_image.jpeg'
img = image.load_img(img_path, target_size=(200, 50))
img_array = image.img_to_array(img)
img_batch = np.expand_dims(img_array, axis=0)
img_preprocessed = preprocess_input(img_batch)
prediction = model.predict(img_preprocessed)
With this approach I didn't convert the image to grayscale, but before that could cause any trouble, I received this error:
ValueError: Layer ocr_model_v1 expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 200, 50, 3) dtype=float32>]
APPROACH 2:
This approach is pretty much copied from the data preprocessing of the OCR model:
img = tf.io.read_file(img_path)
img = tf.io.decode_jpeg(img, channels=1)
img = tf.image.convert_image_dtype(img, tf.float32)
img = tf.image.resize(img, [200, 50])
img_preprocessed = tf.transpose(img, perm=[1, 0, 2])
prediction = model.predict(img_preprocessed)
And it gives me pretty much the same error:
ValueError: Layer ocr_model_v1 expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 200, 1) dtype=float32>]
But this time it looks like the image is additionally malformed.
I think this error is caused by this part of the OCR code:
# Define the model
model = keras.models.Model(
    inputs=[input_img, labels], outputs=output, name="ocr_model_v1"
)
The model expects two inputs (while training we were passing a dict with the image and the image name, i.e. the answer to the captcha). But now I would like this model to actually predict from the image, so I am not able to pass the answer/label.
After several hours I was able to get to this point, but now I have run out of ideas.
Could someone please point me in the right direction?
/// ---------- /// EDIT /// ---------- ///
Hi! I wanted to edit my question. In the meantime I was able to pass a JPEG into this model, but in a not-so-clean way: I basically copied all the code from the lowest part of this tutorial - https://keras.io/examples/vision/captcha_ocr/
Thanks to that I am not receiving any errors, but there is really a lot of code that seems redundant, and I am not able to refactor it efficiently.
With this code changed:
prediction_model = keras.models.Model(
    trained_model.get_layer(name="image").input,
    trained_model.get_layer(name="dense2").output
)
Now I am receiving errors about wrong shapes, etc. Is it possible to somehow refactor the code from the "Inference" section of this tutorial?
From the last code snippet and the statement "Since the model is expecting two values ... I am not able to pass answer/label", you are trying to train a model that outputs the label of a captcha image. Therefore, your model has to be either a multiclass classification model (if there is a small, finite number of image labels) or an image-similarity / dictionary-learning method (if there is a very large number of classes). However, under both architectures your dataset format has to be {X: captcha image, y: related image label}:
model.fit(x=input_img, y=labels)
or, if the data preprocessing step outputs a dataset object:
model.fit(img_preprocessed)
The resulting model will then support both of the inference methods stated above.
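As for the asker's follow-up about the "Inference" section: a minimal decode sketch following https://keras.io/examples/vision/captcha_ocr/ (max_length and num_to_char are defined in that tutorial and assumed to be in scope; img_preprocessed is the tensor from Approach 2, which still needs a batch dimension):
batch = tf.expand_dims(img_preprocessed, 0)  # add the missing batch dimension
preds = prediction_model.predict(batch)
input_len = np.ones(preds.shape[0]) * preds.shape[1]
# Greedy CTC decode, then map the character indices back to characters
results = keras.backend.ctc_decode(preds, input_length=input_len, greedy=True)[0][0][:, :max_length]
text = tf.strings.reduce_join(num_to_char(results[0])).numpy().decode("utf-8")
print(text)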

How to generate accuracy from a saved model of Keras?

I have already trained my Keras model and saved it as .h5. My model uses 6 classes, and it is able to classify all of them from images; it outputs the name of the class it predicts. However, I want to generate an accuracy value when testing the model with an image supplied by the user. I have searched everywhere, but there is still no answer to this problem.
model = load_model('prototype-tl2-80-20.h5')

classes = {1: 'Kacip Fatimah',
           2: 'Mempisang',
           3: 'Misai Adam',
           4: 'Pandan Serapat',
           5: 'Tapak Sulaiman',
           6: 'Tongkat Ali'}

image = Image.open(file_path)
image = image.resize((224, 224))
image = numpy.expand_dims(image, axis=0)
image = numpy.array(image)
pred = model.predict_classes([image])[0]
sign = classes[pred + 1]
print(sign)
To predict an image using a trained model, you have to be careful to ensure the image is processed exactly as the training images were processed. The image should be the same size (height, width) as the training images and have the same number of color bands, e.g. 'rgb' or 'grayscale'; make sure the color bands are in the same order as used in training. Next, you must apply the same preprocessing. For example, if your training images were scaled to be between 0 and 1, then you need to rescale your test image with image = image / 255. After that, do:
pred = model.predict(image)
index = np.argmax(pred)    # 0-based index of the predicted class
sign = classes[index + 1]  # the classes dict above is keyed from 1
print(index, sign)
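If "accuracy" here means a per-image confidence score rather than a test-set metric, the softmax output already provides one (a sketch, assuming the model ends in a softmax layer):
confidence = pred[0][index]  # probability of the predicted class, between 0 and 1
print('confidence:', confidence)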

How to preprocess input for a keras H5 converted model into a pb file

I successfully converted a Keras H5 model into a TensorFlow pb file, but I get totally different results when making a prediction.
In Python I use 2 Keras modules to preprocess the data before feeding the network:
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
Here is how I preprocess the data in my Python code:
# extract the object ROI, convert it from BGR to RGB channel
# ordering, resize it to 224x224, and preprocess it
moving_object = img_orig[startY:endY, startX:endX]
moving_object = cv2.cvtColor(moving_object, cv2.COLOR_BGR2RGB)
moving_object = cv2.resize(moving_object, (224, 224))
moving_object = img_to_array(moving_object)
moving_object = preprocess_input(moving_object)
objects.append(moving_object)
Then I make batch predictions via the Keras predict method:
# only make predictions if at least one object was detected
if len(objects) > 0:
    objects = np.array(objects, dtype="float32")
    preds = wine_plant_model.predict(objects)
Here is how I preprocess the data in C++:
vector<Mat> detected_objects;

// extract the object ROI
Mat image_roi = img_orig(roi);
detected_objects.push_back(image_roi);
and how I make batch predictions in C++:
if (detected_objects.size() > 0) {
    vector<Mat> preds;
    Mat inputBlobs = cv::dnn::blobFromImages(detected_objects, 1.0, Size(224, 224));
    net.setInput(inputBlobs);
    Mat outputs = net.forward();
}
It seems that I am not preprocessing the image the right way in C++, and therefore I am not getting the same results. But I cannot find an equivalent of the Keras preprocess_input() method in C++.
Looking at the Keras documentation, the Python preprocess_input() method scales the data to between -1 and 1. So I do not know whether I should normalize the data using the cv::normalize method or do something with the blobFromImages scale factor. I am a bit confused here.
Could you please tell me how to preprocess the data the same way in C++, even if it is not through Keras, which does not seem to be available in C++?
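For what it's worth, mobilenet_v2.preprocess_input is numerically simple, which makes it reproducible outside Keras; here is a sketch of the same arithmetic in Python:
import numpy as np

def preprocess_like_mobilenet_v2(x):
    # Keras' mobilenet_v2.preprocess_input maps [0, 255] pixels to [-1, 1]
    return x.astype("float32") / 127.5 - 1.0

On the OpenCV side, blobFromImages applies (pixel - mean) * scalefactor, so passing scalefactor = 1/127.5 together with mean = (127.5, 127.5, 127.5) should reproduce this scaling, and swapRB = true covers the BGR-to-RGB conversion done with cvtColor in the Python code.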
