I am working on a system to simplify our image library which grows anywhere from 7k to 20k new pictures per week. The specific application is identifying which race cars are in pictures (all cars are similar shapes with different paint schemes). I plan to use python and tensorflow for this portion of the project.
My initial thought was to use image classification to classify the image by car; however, there is a very high probability of the picture containing multiple cars. My next thought is to use object detection to detect the car numbers, which appear in fixed locations on every car (nose, tail, both doors, and roof) in a font that stays consistent week to week. Lastly, there is the approach of object recognition of the whole car. This, on the surface, seems to be the most practical; however, the paint schemes change enough that it may not be.
Which approach will give me the best results? I have pulled a large number of images out for training, and obviously the different methods require very different training datasets.
The best approach would be to use all three methods as an ensemble. Train all three models and pass each input image to all of them. There are then several ways to combine their outputs:
You can sum the probabilities for each class across all three models and pick the class with the highest total.
You can take a prediction from every model and decide by number of votes: model 1 says class1, model 2 says class2, model 3 says class2, so the result is class2.
You can use weighted decision making. Say the first model is the best and most robust, but you don't trust it 100% and want to see what the other models say. Then you can weight the output of the first model by 0.6 and the outputs of the other two by 0.2 each.
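The three combination rules above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the class names and the probability values are made up, and it assumes each model has already been reduced to one probability per class for the image.

```python
from collections import Counter

# Hypothetical class names and per-model probabilities for one image.
CLASSES = ["car_3", "car_24", "car_48"]
model_probs = [
    [0.50, 0.30, 0.20],  # model 1
    [0.20, 0.45, 0.35],  # model 2
    [0.25, 0.40, 0.35],  # model 3
]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def sum_rule(probs):
    """Add up each class's probability across models, pick the largest total."""
    totals = [sum(column) for column in zip(*probs)]
    return argmax(totals)

def majority_vote(probs):
    """Each model votes for its top class; the most common vote wins."""
    votes = Counter(argmax(p) for p in probs)
    return votes.most_common(1)[0][0]

def weighted_rule(probs, weights):
    """Like sum_rule, but each model's output is scaled by its weight."""
    totals = [sum(w * p for w, p in zip(weights, column)) for column in zip(*probs)]
    return argmax(totals)

print(CLASSES[sum_rule(model_probs)])
print(CLASSES[majority_vote(model_probs)])
print(CLASSES[weighted_rule(model_probs, [0.6, 0.2, 0.2])])
```

With these made-up numbers the weighted rule disagrees with the other two, because the trusted first model outvotes the rest. Which rule works best is something to check on held-out images.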
I hope this helps :)
To explain the title better, I am looking to classify pictures between two classes. For example, let's say that white is 0 and black is 1. I train and validate the system with pictures that are gray, some lighter than others. In other words, none of the training/validation (t/v) pictures are 0, and none are 1. The t/v pictures range between 0 and 1 depending on how dark the gray is.
Of course, this is just a hypothetical situation, but I want to apply a similar scenario for my work. All of the information I have found online is based on a binary classification (either 1 or 0), rather than a spectrum classification (between 1 and 0).
I assume that this is possible, but I have no idea where to start. I do, however, already have a binary classifier written that achieves good accuracy.
Based on your example, a classification approach may not be the best one. I think that what you have is a regression problem: you want your output to be a continuous value in some range, and that value is meaningful in itself (higher means darker, lower means lighter).
Regression tasks usually have an output with linear activation, and they expect to have a continuous value as the ground truth.
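As a toy illustration of that idea, here is a single linear output unit trained with mean squared error on continuous labels. It is only a sketch under stated assumptions: the "image" is reduced to one made-up feature (mean brightness), the labels are invented, and a real model would put this linear output on top of a network rather than on one number.

```python
# Made-up data: mean brightness of each image (0 = black pixels, 1 = white pixels)
# and a continuous label where 0 means white and 1 means black.
brightness = [0.9, 0.75, 0.6, 0.4, 0.25, 0.1]
labels = [1.0 - b for b in brightness]

def train_linear_unit(xs, ys, lr=0.5, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear_unit(brightness, labels)
prediction = w * 0.5 + b  # predicted darkness of a mid-gray image
print(round(prediction, 3))
```

The output is not squashed into a class; it is read directly as a position on the white-to-black scale.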
I think you could start by taking a look at this tutorial.
Hope this helps!
If I understand you correctly, it's definitely possible.
The creator of Keras, François Chollet, wrote Deep Learning with Python which is worth reading. In it he describes how you could accomplish what you would like.
I have worked through examples in his book and shared the code: whyboris/ml-with-python-and-keras
There are many approaches, but a fast one is to use a pre-trained model that can recognize a wide variety of images (for example, classify 1,000 different categories). You will use it "headless" (without the last classification layer that takes the vectors and decides which of the 1,000 categories it falls most into). And you will train just the "last step" in the model (freezing all the previous layers) while training your binary classifier.
Alternatively you could train your own classifier from scratch. Specifically glance at my example (based off the book) cat-dog-classifier which trains its own binary classifier.
I have trained a tensorflow object detection model (num_steps: 50000) using SSD (mobilenet-v1) on a custom dataset. I got mAP@.50IOU ~0.98 and loss ~1.17. The dataset consists of uno playing card images (skip, reverse, and draw four). The model performs pretty well on these cards, as I trained it only on these 3 cards: around 278 images with 829 bounding boxes, collected using a mobile phone, with 25% of the bounding boxes held out for testing (i.e. validation).
However, although I haven't trained the model on any other card, it still detects other cards (inference using a webcam).
How can I fix this? Should I also collect images of another class (anything other than skip, reverse and draw four cards) and ignore this class in operation, so that the model sees this class (label: Other) during training and doesn't put any label on such objects during inference?
I am not sure how to tell the tensorflow object detection API that it should ignore images from the Other class.
Can anyone please provide a pointer?
Please share your views!
Yes, you need to have another class for the objects you don't want to detect.
If you don't have this Other class, which includes everything that should not be detected, the model will compare such objects to the existing classes, which are almost identical to the cards of interest.
Some of the factors are:
Similarity of Shape
Similarity of Color
Similarity of Symbols
This is why a card that is not a card of interest (Skip, Reverse, or Draw 4) can still have high "belongingness" to these three classes.
Having another class to dump all of these into can significantly lessen their "belongingness" to the three classes of interest. Provide as much data as possible for it during training.
If you don't want to have another class, you could overfit the Skip, Reverse, and Draw 4 cards (close to 100%) and then raise your detection threshold to 70-90%.
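Raising the detection threshold amounts to dropping every detection whose score is below a cutoff. A minimal sketch follows, assuming the detections have already been converted to a list of dictionaries; that layout, the labels, the scores, and the 0.8 cutoff are all illustrative assumptions, not the object detection API's own output format.

```python
def filter_detections(detections, min_score=0.8):
    """Keep only detections whose confidence score reaches min_score."""
    return [d for d in detections if d["score"] >= min_score]

# Hypothetical detections for one webcam frame.
detections = [
    {"label": "skip", "score": 0.97},
    {"label": "reverse", "score": 0.55},    # likely some other card, dropped
    {"label": "draw_four", "score": 0.83},
]

kept = filter_detections(detections)
print([d["label"] for d in kept])
```

The right cutoff depends on the scores the model gives to cards it should ignore, so it is worth checking a few such frames before fixing a value.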
Hope this will help you.
I trained my CNN classifier (using tensorflow) with 3 data categories (ID card, passport, bills).
When I test it with images that belong to one of the 3 categories, it gives the right prediction. However, when I test it with a wrong image (a car image, for example), it still gives me a prediction (e.g. it predicts that the car belongs to the ID card category).
Is there a way to make it display an error message instead of giving a wrong prediction?
This should be tackled differently. This is known as the open set recognition problem. You can google it to find out more, but basically it's this:
You cannot train your classifier on every class imaginable. It will always run into some other class that it's not familiar with and hasn't seen before.
There are a few solutions, of which I will single out three:
Separate binary classifier. You can build a separate binary classifier that sorts images into two categories depending on whether a bill, passport or ID is in the image or not. If one is, it passes the image to the classifier you have already built, which assigns it to one of the 3 categories. If the first classifier says that some other object is in the image, you can immediately discard it because it's not an image of a bill/passport/ID.
Thresholding. When an ID is in the image, the probability for ID is high and the probabilities for bill and passport are fairly low. When the image is something else (e.g. a car), the probabilities are most likely about the same for all 3 classes. In other words, none of the class probabilities really stands out. In that situation you still pick the highest probability and output its class, even if the value is only 0.4 or so. To resolve this, you can set a threshold at, let's say, 0.7, and say that if none of the probabilities is over that threshold, there is something else in the picture (not an ID, passport or bill).
Create a fourth class: Unknown. If you pick this option, you should add a few other images to the dataset and label them unknown. Then train the classifier and see what the result is.
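The thresholding option can be sketched as a classifier wrapper that refuses to answer when no class is confident enough. This is a minimal sketch: the class names, the example probabilities, and the 0.7 threshold are assumptions for illustration, and since a network can still be confidently wrong on unfamiliar images, the threshold should be checked against real out-of-category pictures.

```python
# Hypothetical class order of the classifier's output vector.
CLASSES = ["id_card", "passport", "bill"]

def classify_with_reject(probs, threshold=0.7):
    """Return the predicted class name, or None if no class reaches threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # treat as "something else", e.g. show an error message
    return CLASSES[best]

print(classify_with_reject([0.92, 0.05, 0.03]))  # confident prediction
print(classify_with_reject([0.40, 0.35, 0.25]))  # nothing stands out
```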
I would recommend 1 or 2. Hope it helps :)
This is not really a programming problem; it's way more complicated. What you want is called out-of-distribution detection, where the classifier has a way to tell you that the sample does not come from the training distribution.
There are recent research papers that deal with this problem, such as https://arxiv.org/abs/1802.04865 and https://arxiv.org/abs/1711.09325
In general you cannot use a model that has not been trained specifically for this. For example, the probabilities produced by a softmax classifier are not calibrated for this purpose, so thresholding these probabilities will not work at all.
Easiest way is to simply add a fourth category for anything but the other three and train it with various completely random photos.
I was searching for the same solution and it brought me here. To solve this, I used the math.isclose() function to check the values of my prediction.

import math

def check_distribution(self, prediction):
    # prediction[0] holds the class probabilities for one sample.
    # Return True if any of them is within abs_tol of 1.0.
    return any(math.isclose(1, x, abs_tol=1e-9) for x in prediction[0])

Feel free to alter the abs_tol parameter depending on how brutal you want to be.
I'm pretty new to object detection. I'm using the tensorflow object detection API and model_main.py to train my model, and I'm now collecting datasets for my project.
I have found and transformed two quite large datasets of cars and traffic lights with annotations. And made two tfrecords from them.
Now I want to train a pretrained model, but I'm curious whether it will work. An image such as "001.jpg" from the car dataset will of course have annotated bounding boxes for cars, but if it also contains a traffic light, that light won't be annotated. Will this hurt training? (There can be many of these "problematic" images.) How can I improve this? Is there any workaround? (I really don't want to annotate the images again.)
Sorry if it's a stupid question, and thanks for any response. Some links about this problem would be best!
Thanks !
The short answer is yes, it might be problematic, but with some effort you can make it possible.
If you have two urban datasets, and in one you only have annotations for traffic lights, and in the second you only have annotations for cars, then each instance of a car in the first dataset will be learned as a negative example, and each instance of a traffic light in the second dataset will be learned as a negative example.
The two possible outcomes I can think of are:
The model will not converge, since it tries to learn opposite things.
The model will converge, but will be domain specific. This means that the model will only detect traffic lights on images from the domain of the first dataset, and cars on the second.
In fact I tried doing so myself in a different setup, and got this outcome.
To learn traffic lights and cars no matter which dataset they come from, you'll need to modify your loss function. You need to tell the loss function which dataset each image comes from, and then only compute the loss on the classes annotated in that dataset (i.e. zero out the loss on the classes that are not). Returning to our example, you only compute the loss and backpropagate for traffic lights on the first dataset, and for cars on the second.
For completeness, I will add that if resources are available, the better option is to annotate all the classes in all datasets and avoid this modification, since by backpropagating only certain classes you lose the benefit of real negative examples for the other classes.
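The masking idea can be sketched with a per-class binary cross-entropy that skips the classes a dataset does not annotate. This is a simplified illustration under stated assumptions: the dataset names and the `ANNOTATED` mapping are hypothetical, and it works on one score per class per image, whereas a real detector's loss is computed per box and would need the same mask applied there.

```python
import math

CLASSES = ["traffic_light", "car"]

# Hypothetical mapping: which classes are actually annotated in each dataset.
ANNOTATED = {
    "lights_dataset": {"traffic_light"},
    "cars_dataset": {"car"},
}

def masked_bce(predictions, targets, dataset):
    """Binary cross-entropy summed over the classes annotated in `dataset` only."""
    loss = 0.0
    for name, p, t in zip(CLASSES, predictions, targets):
        if name not in ANNOTATED[dataset]:
            continue  # no ground truth for this class here: no loss, no gradient
        p = min(max(p, 1e-7), 1 - 1e-7)  # avoid log(0)
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss

# An image from the cars dataset that also shows an unannotated traffic light.
# The model is right about the light (0.9), but the label says 0.
print(masked_bce([0.9, 0.8], [0, 1], "cars_dataset"))
```

Without the mask, the confident and correct traffic-light prediction would be punished as a false positive; with it, only the car term contributes.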
I have a use case where I have about 300 images of 300 different items. I need machine learning to detect an item about once a minute.
I've been using Keras with Sequential to detect images but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out which features and combinations are important, and which are not, in discriminating the classes from one another. Training starts with a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.
If you were to train on that image only once, the model probably wouldn't be able to detect it yet. If you train it more, it will probably overfit and only recognize that one image. If that is what you are trying to do, then you should write an algorithm to search the screen for that image (it will be more efficient).
1) You'll probably have problems with the generalization of your models because of the lack of training data. In other words, your model will not "learn" about that class.
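Searching for one known image can be as simple as sliding it over the screen and comparing pixels. The sketch below assumes both images are small 2D lists of grayscale values and that an exact pixel match is wanted; both are illustrative assumptions, and real screenshots with scaling or compression noise would need a tolerance-based comparison instead.

```python
def find_template(screen, template):
    """Return (row, col) of the first exact match of template in screen, or None."""
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    for r in range(screen_h - tmpl_h + 1):
        for c in range(screen_w - tmpl_w + 1):
            # Compare each template row with the matching slice of the screen.
            if all(screen[r + i][c:c + tmpl_w] == template[i] for i in range(tmpl_h)):
                return (r, c)
    return None

# Made-up 3x4 "screen" containing the 2x2 template at row 1, column 1.
screen = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
]
template = [[5, 6], [7, 8]]
print(find_template(screen, template))
```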
2) It's good to have a better training set in order to create a better model.