Find most similar images by using neural networks - python

I am working with Python, scikit-learn and Keras. I have 3,000 images of front-faced watches like the following ones:
Watch_1, Watch_2, Watch_3.
I would like to write a program which receives as input a photo of a real watch, which may be taken under less ideal conditions than the photos above (different background colour, darker lighting, etc.), and finds the most similar watches among the 3,000. By similarity I mean that if I give as input a photo of a round, brown watch with a thin lace, then I expect as output watches of round shape, dark colour and a thin lace.
What is the most efficient machine learning algorithm to do this?
For example, following this link, I have two different solutions in mind:
1) Using a CNN as a feature extractor and comparing the distances between these features for every image with reference to the input image.
2) Using two CNNs in a Siamese Neural Network to compare the images.
Are these two options the best ones for this task or would you suggest something else?
Do you know any pre-trained neural network (with pre-determined hyperparameters) for this task?
I have found some interesting posts on StackOverflow about this but they are pretty old: Post_1, Post_2, Post_3.

It's going to be difficult to define what exactly you mean by similar with your photos. Since they are all watches, you'll have to decide which features being most similar matter to you (shape, color, numbers/blank face, etc.).
Here is an approach using the tensorflow library mixed with a nearest neighbor library with example code: http://douglasduhaime.com/posts/identifying-similar-images-with-tensorflow.html
It can at least get you started.
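To make the feature-extractor idea concrete: assuming you have already run each of the 3,000 catalogue images through a headless CNN and kept one feature vector per image, the lookup itself is just a cosine-similarity nearest-neighbour search. A minimal sketch in plain NumPy (the vector size, seed, and index 42 below are made up for illustration; random vectors stand in for the real CNN features):

```python
import numpy as np

def most_similar(query_vec, db_vecs, k=3):
    """Return indices of the k database vectors closest to the query
    by cosine similarity (higher = more similar)."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to every image
    return np.argsort(sims)[::-1][:k]  # best matches first

# Stand-in for CNN features: in practice each row would be the output of a
# headless network (e.g. VGG16 with include_top=False) for one catalogue image.
rng = np.random.default_rng(0)
db_vecs = rng.normal(size=(3000, 512))
query_vec = db_vecs[42] + 0.01 * rng.normal(size=512)  # noisy copy of image 42

print(most_similar(query_vec, db_vecs, k=3))
```

With real features, the query photo would go through the same network before the lookup; for large catalogues, an approximate nearest-neighbour library can replace the brute-force scan.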

Related

Is it possible to use Keras to classify images between one option or another?

To explain the title better, I am looking to classify pictures between two classes. For example, let's say that 0 is white, and black is 1. I train and validate the system with pictures that are gray, some lighter than others. In other words, none of the training/validation (t/v) pictures are 0, and none are 1. The t/v pictures range between 0 and 1 depending on how dark the gray is.
Of course, this is just a hypothetical situation, but I want to apply a similar scenario for my work. All of the information I have found online is based on a binary classification (either 1 or 0), rather than a spectrum classification (between 1 and 0).
I assume that this is possible, but I have no idea where to start. That said, I do have a binary classifier written with good accuracy.
Based on your given example, maybe a classification approach is not the best one. I think that what you have is a regression problem, as you want your output to be a continuous value in some range, that has a meaning itself (as higher or lower values have a proper meaning).
Regression tasks usually have an output with linear activation, and they expect to have a continuous value as the ground truth.
I think you could start by taking a look at this tutorial.
Hope this helps!
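As a minimal sketch of this regression setup (all shapes and sizes below are made-up placeholders), note how the output layer is a single linear unit trained with mean squared error, rather than a sigmoid output with cross-entropy:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="linear"),  # linear output, not sigmoid/softmax
])
# Mean squared error compares the prediction with a continuous target value.
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(16, 28, 28)  # dummy grayscale pictures
y = x.mean(axis=(1, 2))         # continuous ground truth: mean brightness
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x[:1], verbose=0).shape)  # one continuous value per picture
```

The ground truth here is each picture's mean brightness, matching the white-to-black spectrum in the question; for real work you would substitute your own continuous labels.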
If I understand you correctly, it's definitely possible.
The creator of Keras, François Chollet, wrote Deep Learning with Python which is worth reading. In it he describes how you could accomplish what you would like.
I have worked through examples in his book and shared the code: whyboris/ml-with-python-and-keras
There are many approaches, but a fast one is to use a pre-trained model that can recognize a wide variety of images (for example, classify 1,000 different categories). You will use it "headless" (without the last classification layer that takes the vectors and decides which of the 1,000 categories it falls most into). And you will train just the "last step" in the model (freezing all the previous layers) while training your binary classifier.
Alternatively, you could train your own classifier from scratch. Specifically, glance at my example (based on the book), cat-dog-classifier, which trains its own binary classifier.
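A hedged sketch of the "headless" transfer-learning recipe described above (weights=None is used here only to keep the sketch self-contained and offline; in practice you would load weights="imagenet", and the 150x150 input size is an arbitrary choice):

```python
from tensorflow import keras

# Pre-trained base used "headless": include_top=False drops the final
# 1,000-way classification layer, leaving a feature vector per image.
base = keras.applications.VGG16(weights=None, include_top=False,
                                input_shape=(150, 150, 3), pooling="avg")
base.trainable = False  # freeze all the pre-trained layers

# Train only the new "last step": a binary classification head.
model = keras.Sequential([
    base,
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Because the base is frozen, only the small head's weights are updated during training, which is why this works even with relatively little data.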

Object detection where the object occupies a small part of the image

I trained a road sign detection network. In the training data, the sign occupies the entire frame, like so:
However in the images which I want to use for predictions, road signs occupy a much smaller space, for example:
Predictions for such images are not very good, however if I crop to just the sign the predictions are fine.
How do I go about generating predictions for larger images?
I haven't been able to find an answer in similar questions unfortunately.
It sounds like you're trying to solve a different kind of problem when you want to extend your classification of individual signs to "detecting" them and classifying them inside a larger image.
You have (at least) a couple of options:
Create a sliding window that sweeps the image and makes a classification at each step. This way, when you hit the sign it will return a good classification. But you'll quickly realize that this is not very practical or efficient: the window size and step size become more parameters to optimize, and as you'll see in the following option, there are object-detection-specific methods that already try to solve this specific problem.
You can try an object detection architecture. This will require you to come up with a training dataset that's different from the one you used for your image classification. You'll need many (hundreds or thousands) of the "large" versions of your images that contain (and in some cases don't contain) the signs you want to identify. You'll need an annotation tool to locate and label those signs, and then you can train a network to locate and label them.
Some of the architectures to look up for that second option include YOLO, Single Shot Detector (SSD) and Faster R-CNN, to name a few.
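To make the first option concrete, here is a minimal sliding-window sketch in plain NumPy; the classify stub stands in for the trained sign classifier, and the window/step sizes and image contents are arbitrary illustration values:

```python
import numpy as np

def sliding_windows(image, win=32, step=16):
    """Yield (row, col, crop) for every window position; each crop can then
    be passed to the classifier trained on tight sign crops."""
    h, w = image.shape[:2]
    for r in range(0, h - win + 1, step):
        for c in range(0, w - win + 1, step):
            yield r, c, image[r:r + win, c:c + win]

def classify(crop):
    # Dummy stand-in for the trained network: calls a crop a "sign"
    # when its mean intensity is high.
    return crop.mean() > 0.5

image = np.zeros((96, 96))
image[40:72, 40:72] = 1.0  # a bright "sign" somewhere in a larger frame

detections = [(r, c) for r, c, crop in sliding_windows(image) if classify(crop)]
print(detections)
```

Note how several overlapping windows fire on the same sign, which is one more reason real detectors add non-maximum suppression or skip this scheme entirely in favour of the architectures above.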

Question about deep learning implementation

I'm somewhat new to coding but am currently trying to use applied machine learning for the purposes of a biological research project. I am trying to create a neural network that will take an input of a green-channel protein image and guess the red-channel of the same image.
I have a dataset of 10,000 40x40 images that have both red and green channel variants (so 20,000 images in total). To clarify, the images are all grayscale, red channel is just used to signify that proteins stained with a red dye would appear in a different spot than proteins with a green dye. Basically, I want to input an image of one set of proteins and have it guess the location of the other set, then compare with the known paired image of the other set.
I am pretty new to Python and Generative Deep Learning. I understand how it works conceptually, but am wondering what the best architecture for this project would be. I was considering a GAN but am unsure how the discriminator would be trained in this case.
Are there any architectures anyone could recommend I take a look at that I could modify for the purpose of this project?

Using machine learning to detect images based on single learning image

I have a use case where I have about 300 images of 300 different items, one image per item. I need machine learning to detect an item about once a minute.
I've been using Keras with Sequential to detect images but I'm wondering what I should take into consideration when I have 300 labels and only one image per label for learning.
So in short:
1) Can you do machine learning image detection with one learning image per label?
2) Are there any special things I should take into consideration?
If this were a special case -- say, one class in 100 was represented by a single training image -- then you might get away with it. However, a unique image per class is asking for trouble.
A neural network learns by iterative correction, figuring out what features and combinations are important, and which are not, in discriminating the classes from one another. Training starts with a chaotic process that has some similarities to research: look at the available data, form hypotheses, and test them against the real world.
In a NN, the "hypotheses" are the various kernels it develops. Each kernel is a pattern to recognize something important to the discrimination process. If you lack enough examples for the model to generalize and discriminate for each class, then you run the risk (actually, you have the likelihood) of the model making a conclusion that is valid for the one input image, but not others in the same class.
For instance, one acquaintance of mine did the canonical cat-or-dog model, using his own photos, showing the pets of his own household and those of a couple of friends. The model trained well, identified cats and dogs with 100% accuracy on the test data, and he brought it into work ...
... where it failed, having an accuracy of about 65% (random guessing is 50%). He did some analysis and found the problem: his friends have indoor cats, but their preferred dog photos were out of doors. Very simply, the model had learned to identify not cats vs dogs, but rather couches and kitchen cabinets vs outdoor foliage. One of the main filters was of large, textured, green areas. Yes, a dog is a large, textured, green being. :-)
The only way your one-shot training would work is if each of your training images was specifically designed to include exactly those features that differentiate this class from the other 299, and no other visual information. Unfortunately, to identify what features those might be, and to provide canonical training photos, you'd have to know in advance what patterns the model needed to pick up on.
This entirely defeats the use case of deep learning and model training.
If you were to train on that image only once, the model probably wouldn't be able to detect it yet. If you train on it more, it will probably overfit and only recognize that one image. If that is what you are trying to do, then you should write an algorithm that searches the screen for that image instead (it will be more efficient).
1) You'll probably have problems with the generalization of your models because of the lack of training data. In other words, your model will not "learn" about each class.
2) A larger training set, with several examples per label, will let you create a better model.

How to classify continuous audio

I have an audio dataset in which each recording has a different length. There are events in these recordings that I want to train on and test, but the events are placed randomly, and since the lengths differ as well, it is really hard to build a machine learning system using that dataset. I thought about fixing a default length and building a multilayer NN; however, the lengths of the events also vary. Then I thought about using a CNN, like the ones used to recognise patterns or multiple humans in an image. The problem with that one is that I really struggle to understand the audio files.
So, my question is: can anyone give me some tips on building a machine learning system that classifies different types of defined events, training itself on a dataset that contains these events at random positions (one recording contains more than one event, and the events differ from each other), where each recording has a different length?
I would appreciate any help.
First, you need to annotate your events in the sound streams, i.e. specify bounds and labels for them.
Then, convert your sounds into sequences of feature vectors using signal framing. Typical choices are MFCCs or log-mel filterbank features (the latter corresponds to a spectrogram of the sound). Having done this, you will have converted your sounds into sequences of fixed-size feature vectors that can be fed into a classifier. See this for a better explanation.
Since typical sounds have a longer duration than an analysis frame, you will probably need to stack several contiguous feature vectors using a sliding window and use these stacked frames as input to your NN.
Now you have a) input data and b) annotations for each window of analysis. So, you can try to train a DNN, a CNN or an RNN to predict a sound class for each window. This task is known as spotting. I suggest you read Sainath, T. N., & Parada, C. (2015). Convolutional Neural Networks for Small-footprint Keyword Spotting. In Proceedings INTERSPEECH (pp. 1478–1482) and follow its references for more details.
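The frame-stacking step above might look like this in NumPy (the frame count, 13 coefficients and context size are placeholder values; in practice the rows would be MFCC or log-mel vectors, not random numbers):

```python
import numpy as np

def stack_frames(features, context=5, hop=1):
    """Stack `context` contiguous feature vectors into one fixed-size input
    per analysis window, sliding the window forward by `hop` frames."""
    T, d = features.shape
    windows = [features[t:t + context].reshape(-1)       # (context * d,)
               for t in range(0, T - context + 1, hop)]
    return np.stack(windows)

# Stand-in for the framing + feature-extraction step
# (e.g. 13 MFCCs computed for each 25 ms analysis frame).
features = np.random.rand(100, 13)   # 100 frames, 13 coefficients each
X = stack_frames(features, context=5)
print(X.shape)  # each row is one fixed-size input for the classifier
```

Each stacked row then gets the label of its window, which gives the per-window training pairs the answer describes.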
You can use a recurrent neural network (RNN).
https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/index.html
The input data is a sequence, and you can put a label on every sample of the time series.
For example, an LSTM (a kind of RNN) is available in libraries like TensorFlow.
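A minimal Keras sketch of this idea, with one label per time step (the sequence length, feature size and number of event classes are invented for illustration): the LSTM uses return_sequences=True so every frame gets an output, followed by a per-step softmax.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(100, 13)),            # 100 frames, 13 features each
    keras.layers.LSTM(32, return_sequences=True),   # keep one output per time step
    keras.layers.Dense(4, activation="softmax"),    # class probabilities per step
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(8, 100, 13)              # dummy feature sequences
y = np.random.randint(0, 4, size=(8, 100))  # one event label for every time step
model.fit(x, y, epochs=1, verbose=0)
print(model.output_shape)
```

Variable-length recordings can be handled by padding plus masking, or by training on fixed-size chunks cut from the recordings.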
