I have to make comparison between 155 image feature vectors. Every feature vector has got 5 features.
My image are divided in 10 classes.
Unfortunately i need at least 100 images for class for using support vector machine , There is any alternative?
15 samples per class is very low for any machine learning model. Rather than wasting time trying many model classes and parameters you should collect and label new examples by hand. It will be much more fruitful. If you have a bunch of unlabeled pictures, you can use services such as https://www.mturk.com/ .
Check out pybrain.http://pybrain.org. And possibly use neural net as I've heard they need less data to train than svm's but less accurate.
If your images that belong to the same class are results of a transformations to some starting image you can increase your training size by making transofrmations to your labeled examples.
For example if you are doing character recognition, afine or elastic transforamtions can be used. P.Simard in Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis describes it in more detail. In the paper he uses Neural Networks but the same applies for SVM.
Related
Regular TimesFormer takes 3 channel input images, while I have 4 channel images (RGBD). I am struggling to find a TimesFormer (or a model similar to TimesFormer) that takes 4 channel input images and extract features from them.
Does anybody know such a model? Preferably, I would like to find pretrained model with weights.
MORE CONTEXT:
I am working with RGBD video frames and have multiclass classification problem at the end. My videos are fairly large, between 2 to 4 minutes so classical time-series models doesn't work for me. So my inputs are RGBD frames/images from the video and at the end I would like to get class prediction.
My idea was to divide the problem into 2 stages:
Extract features from video into smaller dimension with TimesFormer-like model. Result: I would get a new data representation (dataset).
Train clasification ML network with new data to get a class prediction.
As of Jan 2023, I don't think there's any readily available TimeSformer model/code that works on RGBD 4 channel image.
Alternatively, If you are looking for Vision Transformers that can work with depth as well (RGBD data), you have the entire list with state-of-the-art approaches and corresponding code (wherever available) here.
One of the good approach to start with is DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation. You can find the pre-trained models with this approach here.
If you're looking for 3D CNN based object detectors that can work on RGBD data: RGB-D Salient Object Detection via 3D Convolutional Neural Networks is one of the good ones to start with. Code and pre-trained models can be found here.
Since I don't fully understand your problem statement or exact requirement I'm proposed few things that I thought could be helpful.
I am trying to determine the right approach to take for an image classification problem which involves 10 classes and only 1900 samples. The images (1288 x 964 resolution) are of industrial parts where each part is represented by one of the 10 classes; the image classes essentially differ in terms of a serial number that is present in the image as well as other subtle features. I've considered using a CNN but am wondering if this may not be recommended due to the size of the data set, i.e. is the data set too small for this? Otherwise I've considered using either the KNN or SVM algorithms which I thought may work better due to less data but am in need of some expert guidance. Thank you.
You can use a pretrained feature extractor (f.e. inceptionV3 which is standard in keras). Because it has been trained on other data, only the last layer should be retrained to your specific needs and 100ish images per class should be enough to do that
I am training a convolutional autoencoder on my own dataset. After training, the network is able to reconstruct the test images from the dataset quite well.
I am now taking the intermediate representation(1648-dim) from the encoder network and trying to cluster the feature vectors into 17(known upfront) different classes using a GMM soft clustering. However, the clusters are really bad and it is not able to cluster the images into its respective categories.
I am using sklearn.mixture.GaussianMixture package for clustering with a regularization of 0.01 and 'full' covariance_type.
My question: Why do you think that the reconstruction is very decent but the clustering is quite bad? Does it mean the intermediate features learned by the network is not adequate?
Lets revert the question - why do you think it should have any meaning? You are using clustering, which is just arbitrary method of splitting into groups yet you expect it will discover classes. Why would it do it? There is literally nothing forcing model to do so, and it is probably modeling completely different things (like patches of images, textures etc.). In general you should never expect clustering to solve the problem of some arbitrary labeling, this is not what clustering is for. To give you more perspective here - you have images, which come from say 10 categories (like cats, dogs etc.), and you ask:
why clustering in the feature space does not recover classes?
Note that equally valid questions would be:
why clustering in the features space does not divide images to "redish", "greenish" and "blueish"?
why clustering in the features space does not divide images by the size of the object on the image?
why clustering in the features space does not divide images by the country it is from?
There are exponentially many labelings to be assigned to each dataset, and nothing in your training uses any labels (autoencoding is unsupervised, clustering is unsupervised) so expecting that the result will magically guess which of so many labellings you have in mind is simply a wild guess, and the fact it does not do so means nothing. It is neither good nor bad. (Lets also ignore at this point how good can GMM be with ~1700 dimensional space. )
If you want a model to perform some task you have to give it a chance, train it to solve it. If you want to see if features learned are enough to recover categories then learn a classifier on them.
I do apologies in advance if something similar has been posted but from the research I've done I can't find anything specific.
I'm currently looking at http://scikit-learn.org and the content here looks great but I'm confused what type I should be using for my problem.
I want to able to have 2 labels.
**Suspicious**
1hbn34uqrup7a13t
qmr30zoyswr21cdxolg
1qmqnbetqx
**Not-Suspicious**
cheesemix
reg526
animato12
What type of machine learning algorithm could I feed the data in above as to teach it what I'd class as suspicious through supervised learning?
I'm leaning towards classification but there are so many models to choose from my slightly lost.
The first step in such machine learning problems is to think about the "features". You can't use e.g. a linear classifier directly on these strings. Thus, you have to extract some meaningful features that describe the string. In computer vision, these features are often edges, corner points, SIFT features. You basically have to options:
Design features yourself.
Learn the features.
1) This is the "classical" machine learning approach: you manually design a list of representative features, which you can extract from your input data. In your case, you could start with e.g.
length of the string
number of different characters
number of special characters
something about the sorting?
...
That will give you a vector of numbers for each string. Now, you can use any of the classifiers from scikit-learn to classify the data. You can start choosing your algorithm with the help of this flowchart. You should start with a simple model, e.g. a linear model (e.g. linear SVM). If performance is not sufficient, use a more complex model (e.g. SVM with kernels), or rethink your choice of features.
2) This is the "modern" approach, which is gaining more and more popularity. Designing the features is a crucial step in 1) and it requires good knowledge of your data. Now, by using a deep neural network, you can feed your raw data (the string) into the network, and let the network learn such "features" itself. This, however, requires a large amount of labeled training data, and a lot of processing power (GPUs).
LSTM networks are todays state-of-the-art in natural language processing and similar tasks. LSTMs would be well suited to your tasks, as the input can be of variable length.
tl;dr: Either design features yourself and use a classifier of your choice, or dive into deep neural networks and let a network learn both the features and the classification.
Currently I am working for a project to classify a given set of test images into one of the 5 predefined categories. I implemented Logistic Regression with a feature vector of 240 features for each image and trained it using 100 images/ category. The learning accuracy I achieved was ~98% for each category, whereas when tested on validation set consisting of 500 images (100 images/category), only ~57% images were rightly classified.
Please suggest me few libraries/tools which I can use (preferably based on Neural Network) in order to attain higher accuracy.
I tried using a Java based tool, Neurophy (neuroph.sourceforge.net) on windows but, it didn't run as expected.
Edit: The feature vector were already provided for the project. I am also looking for a better feature extraction tool for Images.
You can get help from this paper Image Classification
In My opinion, SVM is relatively better than logistic regression when it comes to multi-class response problems. We use it in e commerce classification of product where there are 1000s of response level and thousands of features.
Based on your tags I assume you would like a python package, scikit-learn has good classification routines: scikit-learn.org.
I have had good success using the WEKA tools, you need to isolate the feature set that you are interested in and then apply a classifier from this library. The examples are very clear. http://weka.wikispaces.com