If I train my CNN to identify MNIST handwritten digits using "images" (arrays) with black background (value 0):
Will it be able to identify digits in images with white background?
What about vice versa? If the answer is yes (background color doesn't matter) what would be the explanation? Thanks in advance
No, it wouldn't work directly. If you think about the problem of classifying digits, what we want to do is take a point in input space (a 28x28 array of numbers from 0-255) and map it to a digit 0-9. If we fit a function that performs this task, we can't feed it the inverted input (black and white swapped) and expect the mapping to still hold.
Imagine a simpler case where we have points in 2D (coordinates of 2 numbers) and fit a straight line through them. If we now transform the data, for example by inverting the points, the line doesn't fit anymore, and neither does our model.
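The line-fitting analogy can be sketched in a few lines of NumPy (the data here is made up purely for illustration):

```python
import numpy as np

# Fit a straight line to points that lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0  # noise-free for clarity

slope, intercept = np.polyfit(x, y, 1)
fit_error = np.abs(np.polyval([slope, intercept], x) - y).max()

# Now "invert" the data the way flipping black/white inverts pixel
# values: each y becomes (max - y). The old line no longer fits.
y_inverted = y.max() - y
inverted_error = np.abs(np.polyval([slope, intercept], x) - y_inverted).max()

print(fit_error)       # ~0: the line fits the original data
print(inverted_error)  # large: the same line fails on the transformed data
```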
However, a CNN that trains and performs well on the first dataset in theory should be able to train and perform well on the second.
To explain the title better, I am looking to classify pictures between two classes. For example, let's say that 0 is white and 1 is black. I train and validate the system with pictures that are gray, some lighter than others. In other words, none of the training/validation (t/v) pictures are 0, and none are 1. The t/v pictures range between 0 and 1 depending on how dark the gray is.
Of course, this is just a hypothetical situation, but I want to apply a similar scenario for my work. All of the information I have found online is based on a binary classification (either 1 or 0), rather than a spectrum classification (between 1 and 0).
I assume that this is possible, but I have no idea where to start. I do already have a binary classifier written with good accuracy.
Based on your example, maybe a classification approach is not the best one. I think that what you have is a regression problem, as you want your output to be a continuous value in some range, where higher or lower values are meaningful in themselves.
Regression tasks usually have an output with linear activation, and they expect to have a continuous value as the ground truth.
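A minimal sketch of such a setup in Keras (the architecture and the random data here are purely illustrative; the point is the single linear output unit and the MSE loss):

```python
import numpy as np
import tensorflow as tf

# A regressor whose output is one continuous value (linear activation),
# trained with a regression loss (mean squared error).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),  # linear output, not softmax
])
model.compile(optimizer="adam", loss="mse")

# Random stand-in data: images with continuous targets in [0, 1].
x = np.random.rand(16, 28, 28).astype("float32")
y = np.random.rand(16, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

pred = model.predict(x, verbose=0)
print(pred.shape)  # (16, 1): one continuous value per image
```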
I think you could start by taking a look at this tutorial.
Hope this helps!
If I understand you correctly, it's definitely possible.
The creator of Keras, François Chollet, wrote Deep Learning with Python which is worth reading. In it he describes how you could accomplish what you would like.
I have worked through examples in his book and shared the code: whyboris/ml-with-python-and-keras
There are many approaches, but a fast one is to use a pre-trained model that can recognize a wide variety of images (for example, classify 1,000 different categories). You will use it "headless" (without the last classification layer that takes the vectors and decides which of the 1,000 categories it falls most into). And you will train just the "last step" in the model (freezing all the previous layers) while training your binary classifier.
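A sketch of that "headless" approach in Keras, with MobileNetV2 as one possible backbone (here weights=None just to keep the example self-contained; in practice you would use weights="imagenet" to get the pre-trained features):

```python
import numpy as np
import tensorflow as tf

# Load a backbone without its classification head ("headless").
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # freeze all the pre-trained layers

# Train only the new "last step": a binary classifier on top.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

pred = model.predict(np.zeros((2, 96, 96, 3), dtype="float32"), verbose=0)
print(pred.shape)  # (2, 1): one probability per image
```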
Alternatively you could train your own classifier from scratch. Specifically, have a look at my example (based on the book), cat-dog-classifier, which trains its own binary classifier.
I am trying to train a model that classifies images.
The problem I have is that they have different sizes. How should I format my images, or adjust my model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're asking for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
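The fixed-weight point can be made concrete with plain NumPy: a fully connected layer is a matrix multiply, and the weight matrix shape is tied to the input size.

```python
import numpy as np

# A fully connected layer trained on flattened 28x28 inputs has a weight
# matrix with exactly 28*28 = 784 input rows.
weights = np.random.rand(28 * 28, 10)

x_28 = np.random.rand(28 * 28)
out = x_28 @ weights  # works: (784,) @ (784, 10)
print(out.shape)      # (10,)

x_32 = np.random.rand(32 * 32)  # a 32x32 image has 1024 pixels
try:
    x_32 @ weights    # shape mismatch: (1024,) @ (784, 10)
except ValueError as e:
    print("shape mismatch:", e)
```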
If that is your problem, here's some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a square size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take away the bigger work.
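That helper can be used like this (in TF 2 it is named tf.image.resize_with_crop_or_pad; the sizes here are just an illustration):

```python
import numpy as np
import tensorflow as tf

# Bring images of different sizes to a fixed 28x28 by center-cropping
# larger ones and zero-padding smaller ones.
big = tf.constant(np.random.rand(40, 50, 1), dtype=tf.float32)
small = tf.constant(np.random.rand(20, 16, 1), dtype=tf.float32)

big_fixed = tf.image.resize_with_crop_or_pad(big, 28, 28)      # center crop
small_fixed = tf.image.resize_with_crop_or_pad(small, 28, 28)  # zero pad

print(big_fixed.shape, small_fixed.shape)  # (28, 28, 1) (28, 28, 1)
```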
As for just don't caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer and put it after your last convolutional layer, so that the FC layers always get constant-dimensional vectors as input. During training, train on the entire dataset with a particular image size for one epoch. Then for the next epoch, switch to a different image size and continue training.
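The idea behind such a layer can be sketched in plain NumPy: pool the feature map over a fixed grid of cells at several pyramid levels, so the output length depends only on the number of channels, not on the spatial size (illustrative, not an optimized implementation):

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over an n x n grid for each
    pyramid level and concatenate, yielding a fixed-length vector of
    size C * sum(n*n for n in levels) regardless of H and W."""
    h, w, c = feature_map.shape
    pooled = []
    for n in levels:
        # cell boundaries of an n x n grid over the feature map
        hs = np.linspace(0, h, n + 1, dtype=int)
        ws = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[hs[i]:hs[i + 1], ws[j]:ws[j + 1], :]
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# Two feature maps of different spatial sizes -> same output length.
v1 = spatial_pyramid_pool(np.random.rand(13, 13, 8))
v2 = spatial_pyramid_pool(np.random.rand(7, 9, 8))
print(v1.shape, v2.shape)  # both (168,): 8 channels * (1 + 4 + 16) cells
```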
I'm trying to use TensorFlow with Python to recognize some ROIs of size 28x28. At first I used this code: https://github.com/niektemme/tensorflow-mnist-predict/blob/master/predict_2.py, and it recognized 4 or 5 numbers out of 10. I then modified the code so that I can see the prediction confidence, and now it recognizes 8 or 9 numbers out of 10. The problem is that I need to recognize all the numbers, and I noticed that if I change the position of the 20x20 digit inside the 28x28 image, it can recognize every number. So how does TensorFlow work? I have read many documents about it and I still don't understand: why does moving the 20x20 region by 1 pixel completely change the predicted number?
This is my number: https://imgur.com/a/juOLd. It is recognized as a 5, but if I move it 1 pixel down and 1 pixel right it is recognized as a 3. Why?
tl;dr: It's not TensorFlow, it's the model.
First of all, TensorFlow does not do prediction by itself. It is just a fast mathematical operations library with added support for automatic gradients and some other nice features, which turns out to be very helpful for machine learning.
Now, onto your question: why does shifting the digit by 1 pixel change the prediction? It's the model (or more specifically the data). Your model has learned something (from the data) that is relative to the position of the digit in the image, so shifting it by some amount makes the model predict something else.
To understand this more clearly, you can try training a model on the MNIST dataset and then take a picture of some (white-on-black) number in real life. It's almost certain that your model will predict this number wrong, because the MNIST dataset is not a true representation of digits in general. Many things can affect its output, such as the lighting in the room, your camera configuration, the relative size of the number in the image, etc.
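The 1-pixel-shift effect can be made concrete with a toy example: from the model's point of view the input is just a vector of 784 numbers, and even a tiny shift changes many of them at once (the "digit" here is a made-up synthetic stroke):

```python
import numpy as np

# A toy 28x28 "digit": a bright vertical stroke on a black background.
img = np.zeros((28, 28))
img[4:24, 13:15] = 1.0

# Shift the whole image one pixel down and one pixel right.
shifted = np.roll(np.roll(img, 1, axis=0), 1, axis=1)

# Count how many of the 784 input values changed.
changed = int((img != shifted).sum())
print(changed)  # dozens of input features flip from a single 1-pixel shift
```

A model whose weights are tied to absolute pixel positions (especially its fully connected layers) sees a substantially different input vector after the shift, which is why the prediction can flip.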
I want to create a Neural Network in Keras for converting handwriting into computer letters.
My first step is to convert a sentence into an Array. My Array has the shape (1, number of letters,27). Now I want to input it in my Deep Neural Network and train.
But how do I input it properly if its dimensions don't match those of my image? And how do I get my predict function to give me an output array of shape (1, number of letters, 27)?
Seems like you are attempting to do Handwriting Recognition, or similarly Optical Character Recognition (OCR). This is quite a broad field, and there are many ways to proceed. Nevertheless, one approach I suggest is the following:
It is commonly known that Neural Networks have fixed-size inputs; that is, if you build one to take, say, inputs of shape (28,28,1), then the model will expect that shape as its input. Therefore, having a dimension in your samples that depends on the number of letters in a sentence (something variable) is not recommended, as you will not be able to train a model that way with NNs.
Training such a model could be possible if you design it to predict one character at a time, instead of a whole sentence that can have different lengths, and then group the predicted characters. The steps you could try to achieve this are:
Obtain training samples for the characters you wish to recognize (like the MNIST database for example), and design and train your model to predict one character at a time.
Take the image with writing to classify and pass a Sliding Window over it that matches your expected input size (say a 28x28 window). Then, classify each of those windows to a character. Instead of Sliding Window, you could try isolating your desired features somehow and just classify those 28x28 segments instead.
Group the predicted characters somehow so you get words (probably grouping those separated by empty spaces) or do whatever you want with the predictions.
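The sliding-window step above can be sketched in plain NumPy (classify_window is a made-up stand-in for your trained model's per-patch prediction):

```python
import numpy as np

def sliding_windows(image, size=28, stride=14):
    """Yield (row, col, window) for every size x size patch of the image."""
    h, w = image.shape
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, image[r:r + size, c:c + size]

# A fake "line of writing"; in practice this is your scanned image.
line = np.random.rand(28, 140)

def classify_window(window):
    # Placeholder: your model would return a predicted character here.
    return "?"

predictions = [(r, c, classify_window(win))
               for r, c, win in sliding_windows(line)]
print(len(predictions))  # (140 - 28) // 14 + 1 = 9 windows
```

Each window's predicted character, together with its position, is then what you would group into words in the final step.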
You can also try searching for tutorials or guides for Handwriting recognition like this one I have found quite useful. Hope this helps you get on track, good luck.
I have a project that uses a deep CNN to classify a parking lot. My idea is to classify every space according to whether there is a car in it or not. My question is: how do I prepare my image dataset to train the model?
I have downloaded the PKLot dataset for training, which includes negative and positive images.
Should I turn all my training images to grayscale? Should I resize all my training images to one fixed size? (But if I do, I have both landscape and portrait images.) Thanks :)
This is an extremely vague question since every image processing algorithm has different approaches to extracting features. However, in your parking lot example, you would probably need to do RGB to Greyscale conversion and Size normalization among other image processing techniques.
A great starting point would be in this link: http://www.scipy-lectures.org/advanced/image_processing/
First detect the cars present in the image, and obtain their size and alignment. Then go for segmentation and labeling of the parking lot by fixing a suitable size and alignment.
As you want to use the PKLot dataset for training your model and test with real data, the best approach is to make both datasets similar and homogeneous: they must be normalized, of fixed size, grayscaled, and have parameterized shapes. Then you can use the Scale-Invariant Feature Transform (SIFT) for image feature extraction as a basic method. The exact definition of a feature often depends on the problem or the type of application. Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. You can use these types of image features, depending on your problem:
Corners / interest points
Edges
Blobs / regions of interest points
Ridges
...
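As a concrete illustration of the simplest item on that list, edges, here is a gradient-magnitude edge map computed with a Sobel filter in plain NumPy (illustrative only; libraries such as OpenCV or scikit-image provide ready-made, much faster versions):

```python
import numpy as np

def sobel_edges(gray):
    """Gradient-magnitude edge map of a 2D grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-gradient kernel
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

# A synthetic image like an empty/occupied boundary: dark left half,
# bright right half.
img = np.zeros((10, 10))
img[:, 5:] = 1.0

edges = sobel_edges(img)
# The response is strongest along the vertical boundary between halves
# and zero in the flat regions.
print(edges.max(), edges[:, 0].max())
```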