The purpose of this task is to classify images by means of an SVM. The variable 'images' is supposed to contain the image information and, correspondingly, 'labels' contains the image labels. What format and dimensions should the images and labels have? I tried unsuccessfully to make images a Python list (appending flattened images) and then, in another attempt, NumPy arrays:
import cv2
import numpy as np

images = np.zeros((number_of_images, image_size), dtype=np.float32)  # one flattened image per row
labels = np.zeros((number_of_images, 1), dtype=np.float32)
svm = cv2.SVM()
svm.train(images, labels)
Is this the right approach to the problem, and if so, what is the correct way to train the classifier?
I don't think you can use raw image data to train an SVM model. OK, you can, but it won't be very fruitful.
The basic approach is to extract some features from each image and use those features to train your model. The set of features forms a dictionary of visual words, each of which describes your image. Because you use the same set of words to describe every image, you can compare features corresponding to different images. This link introduces more details, check it.
What's next?
Choose a feature extractor for your algorithm: HOG, SURF or SIFT (link)
Extract features from each image. You'll get an array of the same length as the images array.
Initialize a bag-of-words (BoW) model
Train the SVM on the BoW histograms (see the sketch after this list)
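A minimal sketch of this pipeline, assuming an OpenCV build with SIFT and the built-in BoW helpers available (the question's OpenCV 2.4 spells these calls differently); train_paths and path_labels are hypothetical placeholders for your own file list and labels:

import cv2
import numpy as np

sift = cv2.SIFT_create()                    # feature extractor
bow_trainer = cv2.BOWKMeansTrainer(100)     # vocabulary of 100 visual words

# 1. Collect descriptors from every training image to build the vocabulary.
for path in train_paths:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is not None:
        bow_trainer.add(desc)
vocabulary = bow_trainer.cluster()          # k-means over all descriptors

# 2. Turn each image into a fixed-length BoW histogram.
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, cv2.BFMatcher(cv2.NORM_L2))
bow_extractor.setVocabulary(vocabulary)

train_data, train_labels = [], []
for path, label in zip(train_paths, path_labels):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = bow_extractor.compute(img, sift.detect(img, None))
    if hist is not None:
        train_data.append(hist[0])
        train_labels.append(label)

# 3. Train the SVM on the histograms.
svm = cv2.ml.SVM_create()
svm.train(np.float32(train_data), cv2.ml.ROW_SAMPLE, np.int32(train_labels))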
Useful links:
A very detailed C++ example
Documentation for the existing BoW classifier
I'm working on a project that requires training a PyTorch NN on a very large dataset of images. Some of these images are completely irrelevant to the problem, but they are not labelled as such. However, there are some metrics I can use to determine whether they are irrelevant (e.g. summing all the pixel values would give me a good sense of which are the relevant images and which are not). What I would ideally like is a DataLoader that takes in a Dataset class and creates batches only from the relevant images. The Dataset class would just know the list of images and their labels, and the DataLoader would decide whether or not the image it is adding to a batch is relevant, and would then make batches only with relevant images.
To apply this to an example, let's say I have a dataset of black and white images. The white images are irrelevant, but they are not labelled as such. I want to be able to load batches from a file location and have these batches contain only the black images. I could filter at some point by summing all the pixels and checking whether the sum equals 0.
What I am wondering is whether a custom Dataset, DataLoader, or Sampler would be able to solve this task for me? I have already written a custom Dataset that stores the directory of all the saved images and a list of all the images in that directory, and can return an image with its label in the __getitem__ function. Is there something more I should add there to filter out certain images? Or should that filter be applied in a custom DataLoader or Sampler?
Thank you!
I'm assuming that your image dataset belongs to two classes (0 or 1) but is unlabeled. As @PranayModukuru mentioned, you can determine the similarity (i.e. the label) using some measure (e.g. aggregating all the pixel intensity values of an image, as you mentioned) in the __getitem__ function of your custom Dataset class.
However, determining the similarity in the __getitem__ function while training your model will make the training process very slow. So, I would recommend approximating the similarity before training starts (not in the __getitem__ function). Moreover, if your dataset is comprised of complex images (not just black and white ones), it is better to use a pretrained deep learning model (e.g. a ResNet or an autoencoder) for dimensionality reduction, followed by a clustering approach (e.g. agglomerative clustering) to label your images.
With the second approach you only need to label your images exactly once, and if you apply augmentation while training you don't need to re-determine the similarity (label) in the __getitem__ function. With the first approach, on the other hand, you have to determine the similarity (label) every time (after applying transformations to the images) in the __getitem__ function, which is redundant, unnecessary, and time-consuming.
Hope this will help.
It sounds like your goal is to totally remove the irrelevant images from training.
The best way to deal with this would be to figure out the filenames of all the relevant images up front and save their filenames to a csv or something. Then pass only the good filenames to your dataset.
The reason is you will run through your dataset multiple times during training. This means you will be loading, analyzing and discarding irrelevant images over and over again, which is a waste of compute.
It's better to do this sort of preprocessing/filtering once, up front; a sketch follows.
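A minimal sketch of that upfront filtering plus the resulting Dataset, using the black-image test from the question (pixel sum == 0 means relevant); image_dir, the CSV name, and FilteredDataset are placeholders:

import os
import csv
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

def is_relevant(path):
    # The question's example metric: black images (pixel sum == 0) are relevant.
    img = np.asarray(Image.open(path).convert("L"), dtype=np.int64)
    return img.sum() == 0

image_dir = "data/images"                       # placeholder directory
relevant = [f for f in sorted(os.listdir(image_dir))
            if is_relevant(os.path.join(image_dir, f))]

# Save the good filenames once so later runs can skip the scan.
with open("relevant.csv", "w", newline="") as fh:
    csv.writer(fh).writerows([[f] for f in relevant])

class FilteredDataset(Dataset):
    """Dataset built only from the pre-filtered filenames."""
    def __init__(self, image_dir, filenames, transform=None):
        self.image_dir = image_dir
        self.filenames = filenames
        self.transform = transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.image_dir, self.filenames[idx]))
        return self.transform(img) if self.transform else img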
I was trying to tackle an ML problem with TensorFlow, but I'm not sure which algorithm I should use. I have tagged images in my dataset. When a new image comes in, I want to correlate it with the images I have, based on the tags. Where should I start? O.o
What do you mean by correlate the images? Are you attempting to cluster the images based on their tags?
If so, you could train an encoder that runs over your images and produces a feature vector, then cluster those feature vectors based on their image tags. For example, consider images with two tags: cars & cats. You could run an encoder (consisting of convolutional layers), flatten the final layer to get a feature vector, and run a clustering algorithm like K-means (with K=2, since you only have the two tags).
Depending on the size and nature of the images in your dataset you might have to play around with the encoder architecture, collect more data, use alternate clustering algorithms etc.
In the event your image feature vector can belong to multiple classes and you would like to return possible tags, you'll have to opt for soft clustering algorithms such as GMMs (Gaussian Mixture Models) or FCMs (Fuzzy C-Means). These algorithms don't output a single class; they output a class score for each data point. So if you want the top 5 tags of a new image, you could:
Run an encoder to get a feature vector
Perform soft clustering on the feature vectors
Get the 5 highest scoring classes (see the sketch after this list)
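A minimal sketch of this pipeline, assuming a pretrained Keras ResNet50 as the encoder and scikit-learn for the clustering (the exact tf.keras.utils helper names vary across TF versions; image_paths and new_image_path are placeholders):

import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Encoder: a pretrained ConvNet with the classification head removed;
# global average pooling flattens the final layer into a feature vector.
encoder = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                         weights="imagenet")

def embed(paths):
    imgs = [tf.keras.applications.resnet50.preprocess_input(
                tf.keras.utils.img_to_array(
                    tf.keras.utils.load_img(p, target_size=(224, 224))))
            for p in paths]
    return encoder.predict(np.stack(imgs))    # one feature vector per image

features = embed(image_paths)                 # your tagged images

# Hard clustering: K = number of tags (2 here: cars & cats).
kmeans = KMeans(n_clusters=2).fit(features)

# Soft clustering: per-cluster scores, useful for returning top-N tags.
gmm = GaussianMixture(n_components=2).fit(features)
scores = gmm.predict_proba(embed([new_image_path]))[0]
top = np.argsort(scores)[::-1]    # clusters ranked by score; take the top 5 when you have more tags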
Where can I find details on implementing Siamese networks to perform image similarity and to retrieve the most similar image from a dataset?
It is difficult to get a large amount of image data for all the classes; only a few images, e.g. 10, are available for most of the classes.
SIFT and ORB seem to perform poorly on some classes.
My project is to differentiate between license plates based on the states of the UAE. Here I upload a few example images.
When there is little training data, no matter how annoying it sounds, the best approach is usually to collect more. Deep networks are notoriously data-hungry, and their performance is poor when data is scarce. That said, there are two approaches that might help you:
Transfer learning
Data augmentation
In transfer learning, you take an already-trained deep net (e.g. ResNet50), which was trained for some other task (e.g. ImageNet classification), freeze all its weights except those in the last few layers, and train those layers on your task of interest.
Data augmentation slightly modifies your training data in some predictable way. In your case you can rotate the image by a small angle, apply a perspective transformation, scale the image intensities, or slightly change the colors. You apply a different set of these operations with different parameters every time you use a particular training image, generating new training examples and enlarging your training set. A sketch of both techniques follows.
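Here is a minimal sketch of both ideas in PyTorch, purely as an illustration: the folder path "plates/train", the augmentation parameters, and the hyperparameters are all assumptions, and older torchvision versions spell the pretrained-weights argument differently (pretrained=True).

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Data augmentation: small rotations, perspective warps and color jitter,
# re-sampled every time an image is loaded.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(10),
    transforms.RandomPerspective(distortion_scale=0.2),
    transforms.ColorJitter(brightness=0.2, saturation=0.2),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("plates/train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

# Transfer learning: freeze a pretrained ResNet50 and retrain only the head.
model = models.resnet50(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for imgs, labels in loader:          # one epoch shown for brevity
    opt.zero_grad()
    loss = loss_fn(model(imgs), labels)
    loss.backward()
    opt.step()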
I am using OpenCV 2.4 and Python 2.7. What features of images can be used for SVM classification? I have gone through SURF and SIFT, but as a beginner they seem very difficult to me. What other feature extraction techniques are there?
If you are looking for the simplest representations, this will help you. These two are very simple compared to SIFT and SURF:
Raw bitmap representation
HOG (Histogram of Oriented Gradients)
SVM is a machine learning model for data classification. I have built a simple SVM classifier. Suppose you have two folders of images, birds and squirrels. The steps I followed are:
Extract the HOG features of each image and append them to a list:
import cv2
import numpy as np
from skimage.feature import hog   # assuming skimage's HOG implementation

training_set = []
for file in listing1:             # listing1: filenames in the birds folder
    img = cv2.imread(path1 + file)
    res = cv2.resize(img, (250, 250))
    h = hog(cv2.cvtColor(res, cv2.COLOR_BGR2GRAY))  # HOG expects grayscale
    training_set.append(h)
Append the labels too (inside the same loop):
training_labels = []              # initialise before the loop
training_labels.append(1)         # inside the loop: e.g. 1 for birds, 0 for squirrels
Convert both lists to NumPy arrays:
trainData = np.float32(training_set)
responses = np.float32(training_labels)
Train the SVM (OpenCV 2.4 API; the parameters shown are just an example):
svm_params = dict(kernel_type=cv2.SVM_LINEAR, svm_type=cv2.SVM_C_SVC)
svm = cv2.SVM()
svm.train(trainData, responses, params=svm_params)
Test the SVM:
result = svm.predict_all(testData)   # testData: HOG features of the test images, as np.float32
print result                         # Python 2 print statement
I have been studying the word2vec model from Google. I was able to generate vectors for a text word corpus with up to 300 dimensions. It is a very impressive tool, and on big data its accuracy goes much further.
I am curious: is there any way to use word2vec to generate vectors for grayscale images? I am sure the approach is the same: you generate vectors based on pixel intensities and then compute a cosine similarity.
I am trying to build a model to compute a similarity distance between grayscale images. Is there any library capable of doing this besides word2vec or GloVe, which work on text?
I agree with you that word2vec is a very impressive tool, but the model is trained by predicting the next word in some article or news text. All in all, I think that using word2vec on images does not make sense.
You can use skimage to compute some image similarity measures, e.g. skimage.measure (skimage-measure).
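For instance, a minimal sketch using SSIM from skimage (the function lives in skimage.metrics in recent versions; the filenames are placeholders, and both images must have the same shape):

from skimage import io
from skimage.metrics import structural_similarity

# Load two grayscale images as floats in [0, 1].
a = io.imread("img_a.png", as_gray=True)
b = io.imread("img_b.png", as_gray=True)

score = structural_similarity(a, b, data_range=1.0)
print("SSIM:", score)   # 1.0 means identical; lower means less similar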
Word2vec is not a good model for images; however, I think what you really need is a bag-of-words model. In a basic method of image comparison, you convert images to lists of keypoint features (e.g. SIFT, SURF, etc.), then you match clusters of points between images (e.g. with FLANN).
The high number of features in an image, and the uncertainty in each point's representation, make it difficult to use a basic one-layer network model such as word2vec for image recognition. You may find better examples in these tutorials.
UPDATE after 3 years: I should also mention ConvNets and the several pretrained models available now, from which you can extract visual features directly from pixels; see the sketch below.
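For example, a minimal sketch of that idea, assuming torchvision with a pretrained ResNet18 as the feature extractor (the filenames are placeholders; the embedding plays the role word2vec vectors play for text, compared via cosine similarity):

import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Identity()       # drop the classifier, keep the features
model.eval()

prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def embed(path):
    # Grayscale inputs are replicated to 3 channels for the ImageNet model.
    img = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0).numpy()

a, b = embed("img_a.png"), embed("img_b.png")
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("cosine similarity:", cosine)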