I trained a road sign detection network. In the training data, the sign occupies the entire frame, like so:
However in the images which I want to use for predictions, road signs occupy a much smaller space, for example:
Predictions for such images are not very good; however, if I crop to just the sign, the predictions are fine.
How do I go about generating predictions for larger images?
I haven't been able to find an answer in similar questions unfortunately.
It sounds like you're trying to solve a different kind of problem when you want to extend your classification of individual signs to "detecting" them and classifying them inside a larger image.
You have (at least) a couple of options:
Create a sliding window that sweeps the image and makes a classification at each step; that way, when the window hits the sign it will return a good classification (see the sketch after this list). But you'll quickly realize that this is not very practical or efficient: the window size and step size become more parameters to optimize, and as you'll see in the next option, there are object-detection-specific methods that already try to solve this exact problem.
You can try an object detection architecture. This will require you to come up with a training dataset that's different from the one you used for your image classification. You'll need many (hundreds or thousands) of the "large" versions of your images, which contain (and in some cases don't contain) the signs you want to identify. You'll need an annotation tool to locate and label those signs, and then you can train a network to locate and label them.
Some of the architectures to look up for that second option include YOLO, Single Shot Detector (SSD), and Faster R-CNN, to name a few.
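To make the sliding-window option concrete, here is a minimal sketch, assuming a trained Keras classifier called `model` that takes fixed-size crops; the window size, stride, and confidence threshold are illustrative assumptions, not values from your setup:

```python
import numpy as np

def sliding_window_detections(image, model, window=64, stride=32, threshold=0.9):
    """Classify every window of a large image and keep the confident hits.

    `image` is an HxWxC uint8 array; `model` is assumed to be a Keras
    classifier trained on window-sized crops. All numbers are placeholders.
    """
    hits = []
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            crop = image[y:y + window, x:x + window] / 255.0
            probs = model.predict(crop[np.newaxis], verbose=0)[0]
            if probs.max() >= threshold:
                hits.append((x, y, int(probs.argmax()), float(probs.max())))
    return hits  # (x, y, class_id, score) for each confident window
```

Overlapping hits would still need to be merged (e.g. with non-maximum suppression), which is part of why the dedicated detection architectures listed above are usually the better fit.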
Related
To explain the title better, I am looking to classify pictures between two classes. For example, let's say that 0 is white and 1 is black. I train and validate the system with pictures that are gray, some lighter than others. In other words, none of the training/validation (t/v) pictures are 0, and none are 1. The t/v pictures range between 0 and 1 depending on how dark the gray is.
Of course, this is just a hypothetical situation, but I want to apply a similar scenario for my work. All of the information I have found online is based on a binary classification (either 1 or 0), rather than a spectrum classification (between 1 and 0).
I assume that this is possible, but I have no idea where to start. That said, I do have a binary classification model written that achieves good accuracy.
Based on your given example, maybe a classification approach is not the best one. I think what you have is a regression problem, since you want your output to be a continuous value in some range, where the value itself has meaning (higher or lower values mean something specific).
Regression tasks usually use a linear activation on the output layer and expect a continuous value as the ground truth.
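As a minimal sketch of that setup in Keras (the input shape and layer sizes are arbitrary placeholders, not taken from your problem):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal regression sketch: the input shape and layer sizes are arbitrary
# placeholders; the key points are the single linear output and the MSE loss.
model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="linear"),  # single continuous output
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# Ground truth is then a float per image (e.g. how dark the gray is),
# not a class label.
```

If the target is known to stay in [0, 1], a sigmoid output is a common alternative to a purely linear one.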
I think you could start by taking a look at this tutorial.
Hope this helps!
If I understand you correctly, it's definitely possible.
The creator of Keras, François Chollet, wrote Deep Learning with Python which is worth reading. In it he describes how you could accomplish what you would like.
I have worked through examples in his book and shared the code: whyboris/ml-with-python-and-keras
There are many approaches, but a fast one is to use a pre-trained model that can already recognize a wide variety of images (for example, classify 1,000 different categories). You use it "headless" (without the last classification layer that takes the feature vectors and decides which of the 1,000 categories the image falls into), freeze all the previous layers, and train just the final step as your binary classifier.
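As a rough sketch of that "headless" idea, assuming VGG16 from Keras Applications as the pre-trained backbone (the input size and dense head are illustrative choices, not the book's exact recipe):

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Pre-trained backbone with the 1,000-class head removed (include_top=False),
# frozen so only the new classification layers are trained.
base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False

model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary classifier
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Freezing the base keeps training fast and lets the new top layers learn your binary task from relatively little data.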
Alternatively, you could train your own classifier from scratch. Specifically, glance at my example (based on the book), cat-dog-classifier, which trains its own binary classifier.
The model has been trained, and it reliably distinguishes dogs from cats in small pictures like the following:
All these pictures are pretty much always centered on the cat/dog, and the cat/dog occupies almost the entire frame. There is little to no additional surrounding context, which allows the network to train very efficiently.
The next step is: how do I make sure that the same model will effectively tell that in the picture below there happens to be a cat, similar to the ones used to train the model, but surrounded by a broader environment?
Are there some specific steps to take when the model is supposed to be used in production with images showing a broader context than in training? Or is the model able to detect it automagically?
In decreasing order of effectiveness, the steps you can take are:
Use more training data, with images having larger borders.
Augment existing training images with borders, maybe through random or mirrored padding (see the sketch after this list).
Try cropping out the borders during inference, creating multiple images with different borders. Pick the run with the best result.
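As an illustration of the augmentation idea in the second point, here is a small NumPy sketch of mirrored/constant padding (the padding amount is an arbitrary assumption):

```python
import numpy as np

def pad_with_border(image, pad=32, mode="reflect"):
    """Add a border around an HxWxC image to mimic extra surrounding context.

    mode="reflect" mirrors the image edges; mode="constant" gives plain black
    bars. The pad size here is an arbitrary example value.
    """
    return np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode=mode)

# Example: mix original and padded (then resized) copies into the training
# set, so the network also sees the subject with extra context around it.
```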
I have a use case where I have about 300 images of 300 different items. I need machine learning to detect an item about once a minute.
I've been using Keras with a Sequential model to detect images, but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out which features and combinations are important, and which are not, in discriminating the classes from one another. Training starts as a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick.
This entirely defeats the use case of deep learning and model training.
If you were to train on that image only once, the model probably wouldn't be able to detect it yet. If you train on it more, it will probably overfit and only recognize that one exact image. If that is what you are trying to do, then you should write an algorithm that searches the screen for that image instead (it will be more efficient).
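If the goal really is to find one specific, essentially pixel-identical image, a hedged OpenCV template-matching sketch (file names and threshold are placeholders) is one way to "search the screen" for it:

```python
import cv2

# Illustrative file names; both images are assumed to be at the same scale.
screen = cv2.imread("screenshot.png")
template = cv2.imread("item.png")

# Slide the template over the screenshot and score every position.
result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.9:  # threshold is an arbitrary assumption
    print("Found item at", max_loc, "with score", max_val)
```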
1) You'll probably have problems with the generalization of your model because of the lack of training data. In other words, your model will not really "learn" each class.
2) It's good to have a larger training set (more images per label) in order to create a better model.
I am working with Python, scikit-learn and Keras. I have 3,000 images of front-facing watches like the following ones:
Watch_1, Watch_2, Watch_3.
I would like to write a program which receives as input a photo of a real watch, which may be taken under less than ideal conditions compared to the photos above (different background colour, darker lighting, etc.), and finds the most similar watches among the 3,000. By similarity I mean that if I give as input a photo of a round, brown watch with a thin lace, then I expect as output watches of round shape, dark colour and with a thin lace.
What is the most efficient machine learning algorithm to do this?
For example, following this link, I have two different solutions in mind:
1) Using a CNN as a feature extractor and comparing the distances between these features for every catalogue image and the input image.
2) Using two CNNs in a Siamese Neural Network to compare the images.
Are these two options the best ones for this task or would you suggest something else?
Do you know any pre-trained neural network (with pre-determined hyperparameters) for this task?
I have found some interesting posts on StackOverflow about this but they are pretty old: Post_1, Post_2, Post_3.
It's going to be difficult to define what exactly you mean by "similar" for your photos. Since they are all watches, you'll have to decide which features matter most to you for similarity (shape, color, numbers/blank face, etc.).
Here is an approach using the TensorFlow library combined with a nearest-neighbor library, with example code: http://douglasduhaime.com/posts/identifying-similar-images-with-tensorflow.html
It can at least get you started.
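As a rough, hedged sketch of the feature-extractor-plus-nearest-neighbor idea with Keras and scikit-learn (VGG16 is an assumed backbone; the file names and neighbor count are placeholders):

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

# Headless pre-trained network used purely as a feature extractor.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    """Return a 512-dim feature vector for one image file."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x, verbose=0)[0]

# Placeholder for the paths to your 3,000 catalogue watch images.
catalog_paths = ["watch_0001.jpg", "watch_0002.jpg"]  # ...
features = np.stack([embed(p) for p in catalog_paths])

nn = NearestNeighbors(n_neighbors=min(5, len(features)), metric="cosine")
nn.fit(features)
distances, indices = nn.kneighbors(embed("query_watch.jpg")[np.newaxis])
print([catalog_paths[i] for i in indices[0]])  # most similar catalogue watches
```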
I have a project that uses a deep CNN to classify a parking lot. My idea is to classify every space as to whether there is a car in it or not. My question is: how do I prepare my image dataset to train my model?
I have downloaded the PKLot dataset for training, which includes negative and positive images.
Should I turn all my training images to grayscale? Should I resize all my training images to one fixed size? (But if I resize them to one fixed size, I have both landscape and portrait images.) Thanks :)
This is an extremely vague question, since every image processing algorithm has a different approach to extracting features. However, in your parking lot example, you would probably need to do RGB-to-grayscale conversion and size normalization, among other image processing techniques.
A great starting point would be in this link: http://www.scipy-lectures.org/advanced/image_processing/
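For instance, a minimal preprocessing sketch with OpenCV (the target size is an arbitrary assumption):

```python
import cv2

def preprocess(path, size=(64, 64)):
    """Load an image, convert to grayscale, and resize to a fixed size.

    Resizing distorts the aspect ratio of landscape vs portrait crops;
    padding to a square before resizing is a common alternative if that
    matters for your model.
    """
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, size) / 255.0  # normalized float array
```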
First detect the cars present in the image, and obtain their size and alignment. Then go for segmentation and labeling of the parking lot by fixing a suitable size and alignment.
Since you want to use the PKLot dataset for training and test with real data, the best approach is to make both datasets similar and homogeneous: they must be normalized, of fixed size, grayscaled, and have parameterized shapes. Then you can use the Scale-Invariant Feature Transform (SIFT) for image feature extraction as a basic method (see the sketch after this list). The exact definition of a feature often depends on the problem or the type of application. Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. You can use these types of image features, depending on your problem:
Corners / interest points
Edges
Blobs / regions of interest points
Ridges
...
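And here is the SIFT sketch mentioned above, using OpenCV (the file name is a placeholder; SIFT requires OpenCV 4.4+ or the contrib package):

```python
import cv2

# SIFT works on grayscale images; the file name is a placeholder.
img = cv2.imread("parking_space.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), "keypoints detected")
# `descriptors` is an N x 128 array (one row per keypoint) that can feed
# a matcher or a downstream classifier.
```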