I have been studying the word2vec model by Google. I was able to generate vectors for a text corpus with up to 300 dimensions. It is a very impressive tool, and its accuracy improves further on big data.
I am curious: is there any way to use word2vec to generate vectors for grayscale images? I am sure the approach is the same: you generate vectors based on pixel intensity and then compute a cosine similarity.
I am trying to build a model that computes a similarity distance between grayscale images. Is any library capable of doing this, besides word2vec or GloVe, which work on text?
I agree with you that word2vec is a very impressive tool, but the model is trained by predicting words from their surrounding context in articles or news text. All in all, I think that using word2vec on images does not make sense.
You can use skimage to compute image similarity measures, e.g. the functions in skimage.measure / skimage.metrics.
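For example, here is a minimal sketch (assuming two grayscale images of the same shape, with placeholder file names) that scores their similarity with the structural similarity index (SSIM) from skimage:

    import numpy as np
    from skimage import io
    from skimage.metrics import structural_similarity

    # load two grayscale images of the same shape (paths are placeholders)
    img_a = io.imread("a.png", as_gray=True)
    img_b = io.imread("b.png", as_gray=True)

    # SSIM is 1.0 for identical images and drops as they differ
    score = structural_similarity(img_a, img_b, data_range=float(img_a.max() - img_a.min()))
    print("SSIM:", score)

SSIM compares local patterns of pixel intensities, which usually matches human judgement of similarity better than a raw pixel-wise distance.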
Word2vec is not a good model for images; however, I think what you really need is a bag-of-words model. In a basic image-comparison method, you convert each image to a list of keypoint features (e.g. SIFT, SURF, etc.), then you match the sets of points against each other (e.g. with FLANN).
The large number of features in an image and the uncertainty of each point's representation make it difficult to use a basic one-layer network such as word2vec for image recognition. You may find better examples in these tutorials.
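A rough sketch of the keypoint-matching approach above, assuming OpenCV >= 4.4 (where SIFT is included) and two grayscale images on disk with placeholder names:

    import cv2

    # load two grayscale images (paths are placeholders)
    img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("candidate.png", cv2.IMREAD_GRAYSCALE)

    # detect SIFT keypoints and compute descriptors for each image
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # match descriptors with FLANN and keep those passing Lowe's ratio test
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]

    # more good matches -> more similar images (a crude similarity score)
    print(len(good), "good matches")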
UPDATE after 3 years: I should also mention ConvNets and the many pre-trained models now available, from which you can extract visual features directly from the pixels.
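For instance (a sketch, assuming TensorFlow/Keras is installed and the image paths are placeholders), a pre-trained ResNet50 with the classification head removed turns each image into a fixed-length feature vector that can be compared with cosine similarity:

    import numpy as np
    import tensorflow as tf

    # pre-trained ResNet50 as a feature extractor (2048-dim pooled output)
    model = tf.keras.applications.ResNet50(include_top=False, pooling="avg", weights="imagenet")

    def embed(path):
        # load and preprocess the image the way ResNet50 expects
        img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
        x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis]
        return model.predict(tf.keras.applications.resnet50.preprocess_input(x), verbose=0)[0]

    a, b = embed("a.png"), embed("b.png")  # placeholder paths
    print("cosine similarity:", np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))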
I was trying to tackle an ML problem with TensorFlow, but I'm not sure which algorithm I should use. I have tagged images in my dataset. When a new image comes in, I want to correlate it with the images I have, based on the tags. Where should I start? O.o
What do you mean by correlate the images? Are you attempting to cluster the images based on their tags?
If so, you could train an encoder that runs over your images and produces a feature vector, then cluster those feature vectors based on their image tags. For example, suppose you had images with two tags: cars and cats. You could run an encoder (consisting of convolutional layers), flatten the final layer to get a feature vector, and run a clustering algorithm like K-means (with K=2, since you only have two tags, cars and cats).
Depending on the size and nature of the images in your dataset you might have to play around with the encoder architecture, collect more data, use alternate clustering algorithms etc.
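A minimal sketch of that pipeline (the encoder architecture, image size and the random input array are placeholders; in practice you would train the encoder first, e.g. as the encoder half of an autoencoder, rather than use random weights):

    import numpy as np
    import tensorflow as tf
    from sklearn.cluster import KMeans

    # a small convolutional encoder; train it before relying on its features
    encoder = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(16, 3, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),  # flatten to a feature vector
    ])

    X = np.random.rand(100, 64, 64, 3).astype("float32")  # stand-in for your images
    features = encoder.predict(X, verbose=0)

    # K=2 because this example has two tags (cars & cats)
    cluster_ids = KMeans(n_clusters=2, n_init=10).fit_predict(features)
    print(cluster_ids[:10])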
In the event your image feature vector can belong to multiple classes and you would like to return possible tags, you'll have to opt for soft clustering algorithms such as GMMs (Gaussian Mixture Models) or FCMs (Fuzzy C-Means). These algorithms don't output a single class; they output a score for each class per data point. So if you want the top 5 tags of a new image, you could (see the sketch after this list):
Run an encoder to get a feature vector
Perform soft clustering on the feature vectors
Get the 5 highest scoring classes
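A hedged sketch of that recipe, where the features, the new image's feature vector and the number of tags are all placeholders:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    features = np.random.rand(500, 128)     # stand-in for encoder outputs
    new_feature = np.random.rand(1, 128)    # stand-in for the new image's features

    # fit a soft-clustering model with one component per tag
    n_tags = 10                             # hypothetical number of tags
    gmm = GaussianMixture(n_components=n_tags, covariance_type="full").fit(features)

    # predict_proba gives a score per component; take the 5 highest-scoring ones
    scores = gmm.predict_proba(new_feature)[0]
    top5 = np.argsort(scores)[::-1][:5]
    print("top-5 components:", top5)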
I have failed miserably trying to train a face verification network on my own hardware. By face verification I mean looking at two photos and telling whether they show the same person or not. So, any recommendations for pre-trained models?
There are many articles on implementing FaceNet for face identification, but none for face verification. Can anyone point me to pre-trained models I can use?
Basically, you will need a pretrained FaceNet model. A FaceNet model creates an embedding vector for a human face in an image. As mentioned in the paper, the researchers have applied clustering algorithms to the embedded face vectors. In short, you get a 128- or 256-dimensional vector which represents that human face.
After you've generated an embedding vector for each of the two subjects' images, you can compute the cosine similarity of the two vectors, which is a common metric for comparing vectors.
With some experimentation, you can find a threshold similarity score: if the similarity exceeds this threshold, the faces belong to the same subject.
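A minimal sketch of the comparison step, assuming emb_a and emb_b are embeddings already produced by a pretrained FaceNet model (random placeholders here) and using a purely illustrative threshold:

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # emb_a, emb_b: embeddings from a pretrained FaceNet model (placeholders here)
    emb_a = np.random.rand(128)
    emb_b = np.random.rand(128)

    THRESHOLD = 0.7  # illustrative only; tune it on pairs with known labels
    same_person = cosine_similarity(emb_a, emb_b) > THRESHOLD
    print("same person:", same_person)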
You can find some references here:
https://medium.com/@vinayakvarrier/building-a-real-time-face-recognition-system-using-pre-trained-facenet-model-f1a277a06947
https://machinelearningmastery.com/how-to-develop-a-face-recognition-system-using-facenet-in-keras-and-an-svm-classifier/
I am training a convolutional autoencoder on my own dataset. After training, the network is able to reconstruct the test images from the dataset quite well.
I am now taking the intermediate representation (1648-dim) from the encoder network and trying to cluster the feature vectors into 17 (known upfront) different classes using GMM soft clustering. However, the clusters are really bad, and it is not able to group the images into their respective categories.
I am using sklearn.mixture.GaussianMixture for the clustering, with a regularization of 0.01 and 'full' covariance_type.
My question: why is the reconstruction quite decent while the clustering is so bad? Does it mean the intermediate features learned by the network are not adequate?
Let's reverse the question: why do you think it should have any meaning? You are using clustering, which is just an arbitrary method of splitting data into groups, yet you expect it to discover your classes. Why would it? There is literally nothing forcing the model to do so, and it is probably modelling completely different things (such as image patches, textures, etc.). In general, you should never expect clustering to recover some arbitrary labelling; that is not what clustering is for. To give you more perspective: you have images which come from, say, 10 categories (cats, dogs, etc.), and you ask:
why clustering in the feature space does not recover classes?
Note that equally valid questions would be:
why clustering in the feature space does not divide images into "reddish", "greenish" and "blueish"?
why clustering in the feature space does not divide images by the size of the object in the image?
why clustering in the feature space does not divide images by the country they are from?
There are exponentially many labellings that could be assigned to any dataset, and nothing in your training uses any labels (autoencoding is unsupervised, clustering is unsupervised), so expecting the result to magically guess which of those many labellings you have in mind is simply a wild guess, and the fact that it does not means nothing. It is neither good nor bad. (Let's also ignore, at this point, how well a GMM can possibly do in a ~1700-dimensional space.)
If you want a model to perform some task, you have to give it a chance and train it to solve that task. If you want to see whether the learned features are enough to recover the categories, train a classifier on them.
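As a small sketch of that last suggestion (the arrays here are random placeholders; replace them with your 1648-dim encoder outputs and the known 17 class labels), a simple linear probe tells you whether the features carry class information:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    features = np.random.rand(1000, 1648)        # stand-in for encoder outputs
    y = np.random.randint(0, 17, size=1000)      # stand-in for the known labels

    X_train, X_test, y_train, y_test = train_test_split(features, y, test_size=0.2, random_state=0)

    # a linear "probe": if it scores well, the features do encode the categories
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))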
The purpose of the task is to classify images by means of an SVM. The variable 'images' is supposed to contain the image data, and correspondingly 'labels' contains the image labels. What format and dimensions should images and labels have? I tried, unsuccessfully, making images a Python list (appending flattened images) and then, in another attempt, NumPy arrays:
import cv2
import numpy as np

# one flattened image per row; OpenCV's SVM expects float32 data
images = np.zeros((number_of_images, image_size), dtype=np.float32)
labels = np.zeros((number_of_images, 1), dtype=np.float32)
svm = cv2.SVM()  # OpenCV 2.4.x API
svm.train(images, labels)
Is this the right approach to the problem, and if so, what is the correct way to train the classifier?
I don't think you can use raw image data to train an SVM model. OK, you can, but it won't be very fruitful.
The basic approach is to extract features from each image and use these features to train your model. The set of features forms a dictionary of visual 'words', each of which describes your image. Because you use the same set of words to describe every image, you can compare the features of different images. This link gives more details; check it.
What's next?
Choose a feature extractor for your algorithm: HOG, SURF, or SIFT (link)
Extract features from each image. You'll get an array of the same length as the images array.
Initialize a bag-of-words (BoW) model
Train the SVM on the BoW representations (a rough sketch follows below)
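A rough sketch of those steps using OpenCV and scikit-learn, with a hand-rolled bag of visual words (k-means vocabulary) instead of OpenCV's own BoW helpers; the image paths, labels and vocabulary size are placeholders:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    paths = ["img1.png", "img2.png"]   # placeholder image paths
    labels = [0, 1]                    # placeholder class labels

    # extract SIFT descriptors from every image
    sift = cv2.SIFT_create()
    per_image = []
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, des = sift.detectAndCompute(img, None)
        per_image.append(des)

    # build a visual vocabulary by clustering all descriptors
    vocab_size = 50                    # placeholder vocabulary size
    kmeans = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(per_image))

    # represent each image as a histogram of visual-word occurrences
    def bow_histogram(des):
        return np.bincount(kmeans.predict(des), minlength=vocab_size).astype(float)

    X = np.array([bow_histogram(d) for d in per_image])

    # train the SVM on the BoW histograms
    clf = SVC(kernel="linear").fit(X, labels)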
Useful links:
C++ very detailed example
Documentation for the existing BoW classifier
I am new to image processing. For my project I am building an "image classifier using SVM". My idea of the final software is: I select an image and give it as input to my software, and it classifies that image; if I give it the image of an animal, it classifies it as a cat or a snake accordingly.
When I google about it, I read "First you need to train the SVM".
What does training the SVM mean?
What is the actual input to the SVM in my case (image classification)?
An SVM is just a classifier, so how does it classify images? Is it necessary for me to convert the images to any particular format? Please help.
A Support Vector Machine (SVM) is a machine learning model for supervised data classification. SVMs essentially learn a hyper-plane which separates the data space into two regions (in the two-class case). In your case, suppose you have images of snakes and cats and you need to classify them. The steps you'll need to follow are:
Extract 'features' from the images.
These 'features' may be functions of the appearance of the snake/cat in your case, e.g. the colour of the animal, the shape of the animal, etc. By concatenating these features you get a multi-dimensional feature vector.
Train an SVM classifier
Training essentially learns a separating hyper-plane between the feature vectors of the snake class and the cat class. For example, if your feature vector is 2-dimensional, training an SVM classifier amounts to 'learning' the line which best separates your labelled training data.
You could use any of the many freely available machine learning libraries. In case you speak Python, you could use sklearn for the task.
This task of learning (a hyper-plane, in the linear SVM case) is referred to as training.
Classify the images.
Once you have trained your model, you can then use it to classify images whose class is not known.
Note: I am simplifying a lot of the details/issues involved in this answer. I suggest you read up on SVMs.
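A hedged end-to-end sketch of these steps with sklearn, using a deliberately crude feature (a downscaled, flattened grayscale image) as a stand-in for real appearance features; the folder layout, image size and file names are assumptions:

    import glob
    import numpy as np
    from skimage import io, transform
    from sklearn.svm import SVC

    def extract_features(path, size=(32, 32)):
        # crude feature: resize the grayscale image and flatten it into a vector
        img = io.imread(path, as_gray=True)
        return transform.resize(img, size).ravel()

    # assumed layout: cats/*.jpg and snakes/*.jpg
    X, y = [], []
    for label, pattern in enumerate(["cats/*.jpg", "snakes/*.jpg"]):
        for path in glob.glob(pattern):
            X.append(extract_features(path))
            y.append(label)  # 0 = cat, 1 = snake

    clf = SVC(kernel="linear").fit(np.array(X), np.array(y))

    # classify a new image (placeholder path)
    print("predicted class:", clf.predict([extract_features("unknown.jpg")])[0])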