I'm trying to use TensorFlow with Python to recognize some ROIs of size 28x28. At first I used this code: https://github.com/niektemme/tensorflow-mnist-predict/blob/master/predict_2.py, and it recognized 4 or 5 numbers out of 10, so I tried to modify the code; now I can see the prediction confidence and it recognizes 8 or 9 numbers out of 10. The problem is that I need to recognize all the numbers, and I noticed that if I change the position of the 20x20 digit inside the 28x28 image, it recognizes every number. So, how does TensorFlow work? I have read many documents about TensorFlow and I don't understand how it works. Why does moving the 20x20 region by 1 pixel completely change the predicted number?
This is my number: https://imgur.com/a/juOLd. It recognizes it as 5, but if I move it 1 pixel down and 1 pixel right it classifies it as 3. Why?
tl;dr: It's not TensorFlow, it's the model.
First of all, TensorFlow does not do prediction or anything by itself. It is just a fast mathematical operations library with added support for automatic gradients and some other nice capabilities, which turns out to be very helpful for machine learning.
Now, on to your question: why does shifting the digit one pixel down or up change the prediction? It's the model (or more specifically the data). Your model has learned something (from the data) that is relative to the position of your number in the image. So, shifting it by some amount makes the model predict something else.
To see this more clearly, you can try training a model on the MNIST dataset and then taking a picture of some (white-on-black) number in real life. It's almost certain that your model will predict this number wrong, because the MNIST dataset is not a true representation of generic handwritten numbers. There are many things that can affect the model's output, such as the lighting in the room, your camera configuration, the relative size of the number in the image, etc.
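If you want to see this effect directly, here is a minimal sketch of the one-pixel-shift experiment (`model` and `image` are placeholders for your trained classifier and your 28x28 array; adapt the reshape to whatever input your model expects):

import numpy as np

def predict_digit(model, image):
    # Assumes a Keras-style classifier over flattened 784-pixel inputs.
    return int(np.argmax(model.predict(image.reshape(1, 784))))

# Shift the whole image one pixel down and one pixel right:
# shifted = np.roll(image, shift=(1, 1), axis=(0, 1))
# predict_digit(model, image)    # e.g. 5
# predict_digit(model, shifted)  # may flip to 3, as you observed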
To explain the title better: I am looking to classify pictures between two classes. For example, let's say that 0 is white and 1 is black. I train and validate the system with pictures that are gray, some lighter than others. In other words, none of the training/validation (t/v) pictures are 0, and none are 1. The t/v pictures range between 0 and 1 depending on how dark the gray is.
Of course, this is just a hypothetical situation, but I want to apply a similar scenario to my work. All of the information I have found online is based on binary classification (either 1 or 0), rather than a spectrum classification (between 1 and 0).
I assume this is possible, but I have no idea where to start. I do have a binary classification model written with good accuracy, though.
Based on your given example, maybe a classification approach is not the best one. I think what you have is a regression problem, since you want your output to be a continuous value in some range, where the value itself carries meaning (higher or lower values have an inherent interpretation).
Regression tasks usually have an output layer with a linear activation, and they expect a continuous value as the ground truth.
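A minimal Keras sketch of such a regression setup (the input shape and layer sizes are placeholders, not a recommendation):

import tensorflow as tf

# Output layer: one unit with linear activation; the loss compares the
# prediction to a continuous target between 0 and 1.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # placeholder input size
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')
# model.fit(images, darkness_values)  # targets are floats in [0, 1]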
I think you could start by taking a look at this tutorial.
Hope this helps!
If I understand you correctly, it's definitely possible.
The creator of Keras, François Chollet, wrote Deep Learning with Python which is worth reading. In it he describes how you could accomplish what you would like.
I have worked through examples in his book and shared the code: whyboris/ml-with-python-and-keras
There are many approaches, but a fast one is to use a pre-trained model that can recognize a wide variety of images (for example, one that classifies 1,000 different categories). You use it "headless" (without the last classification layer that takes the feature vectors and decides which of the 1,000 categories the image falls into), freeze all the previous layers, and train just the "last step": your own binary classifier on top.
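A minimal sketch of that headless approach with Keras (MobileNetV2 and the 224x224 input size are my choices for illustration, not something from the book):

import tensorflow as tf

# Pre-trained "headless" base: ImageNet weights, no classification layer.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling='avg',
    weights='imagenet')
base.trainable = False  # freeze all pre-trained layers

# Train only the new "last step": a binary classifier on top.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])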
Alternatively you could train your own classifier from scratch. Specifically glance at my example (based off the book) cat-dog-classifier which trains its own binary classifier.
I trained a road sign detection network. In the training data, the sign occupies the entire frame, like so:
However, in the images I want to use for prediction, road signs occupy a much smaller area, for example:
Predictions for such images are not very good, however if I crop to just the sign the predictions are fine.
How do I go about generating predictions for larger images?
I haven't been able to find an answer in similar questions unfortunately.
It sounds like you're trying to solve a different kind of problem when you want to extend your classification of individual signs to "detecting" them and classifying them inside a larger image.
You have (at least) a couple of options:
Create a sliding window that sweeps the image and make a classification at each step. This way, when the window hits the sign, it will return a good classification. But you'll quickly realize that this is not very practical or efficient: the window size and step size become more parameters to optimize, and as you'll see in the following option, there are object-detection-specific methods that already solve this exact problem. (A naive sketch of the sliding-window idea follows below.)
You can try an object detection architecture. This will require you to come up with a training dataset that is different from the one you used for image classification. You'll need many (hundreds or thousands) of the "large" versions of your images that contain (and in some cases don't contain) the signs you want to identify, plus an annotation tool to locate and label those signs; then you can train a network to locate and label them.
Some of the architectures to look at for that second option: YOLO, Single Shot Detector (SSD), and Faster R-CNN, to name a few.
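Here is the naive sliding-window sketch mentioned above (the window and step sizes are arbitrary, and `model` stands for your existing classifier, assumed to take win x win crops; add a channel axis if it needs one):

import numpy as np

def sliding_window_classify(image, model, win=64, step=32):
    # Classify every crop and keep the highest-confidence hit.
    best_score, best_box = 0.0, None
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            crop = image[y:y + win, x:x + win]
            probs = model.predict(crop[np.newaxis, ...], verbose=0)[0]
            if probs.max() > best_score:
                best_score = float(probs.max())
                best_box = (x, y, win, win, int(probs.argmax()))
    return best_box, best_score

Even this toy version makes the cost obvious: the number of predictions grows with the image size and with every window/step setting you try.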
I trained my CNN classifier (using tensorflow) with 3 data categories (ID card, passport, bills).
When I test it with images that belong to one of the 3 categories, it gives the right prediction. However, when I test it with an unrelated image (a car image, for example), it still gives a prediction, i.e. it predicts that the car belongs to the ID card category.
Is there a way to make it display an error message instead of giving a wrong prediction?
This should be tackled differently. It is known as the open-set recognition problem. You can google it to find out more, but basically it amounts to this:
You cannot train your classifier on every class imaginable. It will always run into some other class that it is not familiar with and has never seen before.
There are a few solutions, of which I will single out three:
Separate binary classifier - You can build a separate binary classifier that sorts images into two categories depending on whether a bill, passport, or ID is in the image or not. If one is, it should let the algorithm you have already built process the image and classify it into one of the 3 categories. If the first classifier says that some other object is in the image, you can immediately discard it, because it is not an image of a bill/passport/ID.
Thresholding. When an ID is in the image, the probability of the ID class is high and the probabilities for bill and passport are fairly low. When the image shows something else (e.g. a car), the probabilities are most likely about the same for all 3 classes; no class really stands out. By default you would still pick the class with the highest probability, regardless of whether that probability is 0.4 or something similarly low. To resolve this, you can set a threshold at, say, 0.7, and if no probability exceeds it, conclude there is something else in the picture (not an ID, passport, or bill); see the sketch at the end of this answer.
Create the fourth class: Unknown. If you pick this option, add some images of other objects to the dataset and label them unknown. Then train the classifier and see what the result is.
I would recommend 1 or 2. Hope it helps :)
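A minimal sketch of option 2, thresholding (the class names, their order, and the 0.7 cutoff are placeholders to be tuned on validation data):

import numpy as np

CLASSES = ['id_card', 'passport', 'bill']
THRESHOLD = 0.7

def classify_or_reject(probs):
    # probs: the softmax output for one image, e.g. model.predict(x)[0]
    if probs.max() >= THRESHOLD:
        return CLASSES[int(probs.argmax())]
    return 'unknown - not an ID, passport or bill'

print(classify_or_reject(np.array([0.4, 0.3, 0.3])))    # rejected
print(classify_or_reject(np.array([0.9, 0.05, 0.05])))  # 'id_card'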
This is not really a programming problem; it's way more complicated. What you want is called out-of-distribution (OOD) detection, where the classifier has a way to tell you that the sample is not from the training distribution.
There are recent research papers that deal with this problem, such as https://arxiv.org/abs/1802.04865 and https://arxiv.org/abs/1711.09325
In general you cannot use a model that has not been trained specifically for this. For example, the probabilities produced by a softmax classifier are not calibrated for this purpose, so thresholding these probabilities will not work at all.
The easiest way is to simply add a fourth category for anything but the other three and train it with a variety of completely random photos.
I was searching for the same solution and it brought me here. To solve this, I used the math.isclose() function to check the values of my prediction.

import math

def check_distribution(self, prediction):
    # True only if some class probability is (almost) exactly 1.0,
    # i.e. the model is fully confident in one class.
    checker = [x for x in prediction[0] if math.isclose(1, x, abs_tol=1e-9)]
    return len(checker) > 0
Feel free to alter the abs_tol parameter depending on how brutal you want to be.
I am trying to train my model, which classifies images.
The problem I have is that the images have different sizes. How should I format my images, or adapt my model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're in for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here are some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a squared size, then resize.
Do a combination of those.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased toward images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take away the bigger work.
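For example, a one-liner with that helper (the 224x224 target is an arbitrary choice; in TensorFlow 2.x the function is named tf.image.resize_with_crop_or_pad):

import tensorflow as tf

image = tf.zeros([300, 180, 3])  # stands in for any decoded image tensor
# Smaller dimensions are zero-padded, larger ones are center-cropped:
fixed = tf.image.resize_with_crop_or_pad(image, 224, 224)  # -> (224, 224, 3)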
As for simply not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer and putting it after your last convolutional layer, so that the FC layers always get constant-dimensional vectors as input. During training, train the images from the entire dataset using a particular image size for one epoch. Then for the next epoch, switch to a different image size and continue training.
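A minimal eager-mode sketch of such a layer (the pyramid levels are my choice, and it assumes the feature map's spatial dimensions are known at call time and at least as large as the largest level):

import tensorflow as tf

def spatial_pyramid_pool(features, levels=(1, 2, 4)):
    # Max-pool a (batch, H, W, C) map over an n x n grid per level; the
    # concatenated output length C * sum(n*n) is independent of H and W.
    _, h, w, _ = features.shape
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                # Integer grid boundaries that exactly cover the map.
                cell = features[:, h * i // n:h * (i + 1) // n,
                                w * j // n:w * (j + 1) // n, :]
                pooled.append(tf.reduce_max(cell, axis=[1, 2]))
    return tf.concat(pooled, axis=1)

# Different input sizes, same output length: 128 * (1 + 4 + 16) = 2688
print(spatial_pyramid_pool(tf.zeros([2, 32, 48, 128])).shape)  # (2, 2688)
print(spatial_pyramid_pool(tf.zeros([2, 64, 64, 128])).shape)  # (2, 2688)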
I'm trying to build a multilabel classifier to predict the probabilities of some input data being either 0 or 1. I'm using a neural network and TensorFlow + Keras (maybe a CNN later).
The problem is the following:
The data is highly skewed: there are a lot more negative examples than positive ones, maybe 90:10. So my neural network nearly always outputs very low probabilities for positive examples. Using binary outputs, it would predict 0 in most cases.
The performance is > 95% for nearly all classes, but this is due to the fact that it nearly always predicts zero...
Therefore the number of false negatives is very high.
Some suggestions how to fix this?
Here are the ideas I considered so far:
Punishing false negatives more with a customized loss function (my first attempt failed), i.e. weighting positive examples within a class more heavily than negative ones. This is similar to class weights, but applied within a class.
How would you implement this in Keras?
Oversampling positive examples by cloning them and then overfitting the neural network such that positive and negative examples are balanced.
Thanks in advance!
You're on the right track.
Usually, you would balance your data set before training: either reduce the over-represented class or generate artificial (augmented) data for the under-represented class to boost its occurrence.
Reduce over-represented class
This one is simpler: you would just randomly pick as many samples as there are in the under-represented class, discard the rest, and train with the new subset. The disadvantage, of course, is that you're losing some learning potential, depending on how complex your task is (how many features it has).
Augment data
Depending on the kind of data you're working with, you can "augment" it. That just means that you take existing samples from your data, slightly modify them, and use them as additional samples. This works very well with image and sound data. You could flip/rotate, scale, add noise, increase/decrease brightness, crop, etc.
The important thing here is that you stay within the bounds of what could happen in the real world. If, for example, you want to recognize a "70 mph speed limit" sign, flipping it doesn't make sense: you will never encounter an actual flipped 70 mph sign. If you want to recognize a flower, flipping or rotating it is permissible. The same goes for sound: changing the volume/frequency slightly won't matter much, but reversing the audio track changes its "meaning", and you won't have to recognize backwards-spoken words in the real world.
If you have to augment tabular data like sales data, metadata, etc., that's much trickier, as you have to be careful not to implicitly feed your own assumptions into the model.
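For image data, a minimal augmentation sketch with tf.image (which operations are safe depends on your data, as discussed above):

import tensorflow as tf

def augment(image):
    # Random modifications that plausibly occur in the real world.
    image = tf.image.random_flip_left_right(image)      # skip for signs/text!
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image

# Applied only to the under-represented class, e.g.:
# minority_ds = minority_ds.map(augment)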
I think your two suggestions are already quite good.
You can also simply undersample the negative class, of course:
import numpy as np

def balance_occurences(dataframe, zielspalte, faktor=1):
    # zielspalte: name of the target column; faktor: how many majority
    # samples to keep per minority sample.
    least_frequent_observation = dataframe[zielspalte].value_counts().idxmin()
    bottleneck = len(dataframe[dataframe[zielspalte] == least_frequent_observation])
    # Keep every row of the rarest class ...
    balanced_indices = dataframe.index[dataframe[zielspalte] == least_frequent_observation].tolist()
    # ... and a random subsample of each other class.
    for value in (set(dataframe[zielspalte]) - {least_frequent_observation}):
        full_list = dataframe.index[dataframe[zielspalte] == value].tolist()
        selection = np.random.choice(a=full_list, size=bottleneck * faktor, replace=False)
        balanced_indices = np.append(balanced_indices, selection)
    df_balanced = dataframe[dataframe.index.isin(balanced_indices)]
    return df_balanced
Your loss function could also take the recall of the positive class into account, combined with some other measurement.
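As a concrete starting point for the class-weighting idea from the question, here is a minimal Keras sketch (the tiny stand-in network, the 20-feature input, and the 9:1 weights are placeholders to be tuned; this assumes a single binary output):

import tensorflow as tf

# Stand-in binary model; replace with your own network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=[tf.keras.metrics.Recall()])
# class_weight makes each positive example count ~9x more in the loss,
# roughly offsetting a 90:10 imbalance:
# model.fit(x_train, y_train, epochs=10, class_weight={0: 1.0, 1: 9.0})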