I have to cluster images into 8 groups (sports, vegetables, monuments, sharks, rubbish, glass, landmarks, food). The problem is that some of these groups are really general. For example, one category is "sports". The dataset (which consists of 100 unlabeled photos) contains sports like tennis, horse riding, boxing, and swimming. The same happens with the class "rubbish", which would need to include both rotten food and batteries. These images are not particularly visually similar.
So I tried to use a pretrained model (VGG-16) to extract features and then cluster the features using K-Means. The problem is that the results are far from desired. Sharks are clustered correctly, but I always get a cluster of just batteries, not batteries plus rotten food together forming the "rubbish" class. How can I generalize the extracted features?
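For reference, here is a minimal sketch of the pipeline described above, assuming Keras and scikit-learn: VGG-16 without its classification head serves as the feature extractor, and K-Means clusters the pooled features. The image paths are hypothetical placeholders.

import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# VGG-16 trained on ImageNet, with global average pooling in place of the FC head
model = VGG16(weights='imagenet', include_top=False, pooling='avg')

def extract(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]  # a 512-dimensional feature vector

paths = ['img_000.jpg', 'img_001.jpg']  # hypothetical image paths
features = np.vstack([extract(p) for p in paths])
clusters = KMeans(n_clusters=8, random_state=0).fit_predict(features)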
I have a list of twitter users (screen_names) and I need to categorise them into 7 pre-defined categories - Education, Art, Sports, Business, Politics, Automobiles, Technology - based on their interest area.
I have extracted last 100 tweets of the users in Python and created a corpus for each user after cleaning the tweets.
As mentioned here in Tweet classification into multiple categories on (Unsupervised data/tweets):
I am trying to generate dictionaries of common words under each category so that I can use it for classification.
Is there a method to generate these dictionaries for a custom set of words automatically?
Then I can use these for classifying the twitter data using a tf-idf classifier and get the degree of correspondence of each tweet to each of the categories. The highest value gives us the most probable category of the tweet (a toy sketch follows the sample dictionaries below).
But since the categorisation is based on these pre-generated dictionaries, I am looking for a way to generate them automatically for a custom list of categories.
Sample dictionaries :
Education - ['book','teacher','student'....]
Automobiles - ['car','auto','expo',....]
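For concreteness, a toy sketch of the tf-idf matching idea described above: vectorize the category dictionaries and each user corpus in the same space, then score users by cosine similarity. The dictionaries and corpus below are made-up examples.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up category dictionaries, flattened into space-separated strings
categories = {
    'Education': 'book teacher student school learning workshop',
    'Automobiles': 'car auto expo engine driving motor',
}
user_corpus = {'UserA': 'students visited share learning experience teachers coding workshop'}

vectorizer = TfidfVectorizer()
cat_vecs = vectorizer.fit_transform(categories.values())
for user, text in user_corpus.items():
    sims = cosine_similarity(vectorizer.transform([text]), cat_vecs)[0]
    best = sims.argmax()
    print(user, '-', list(categories)[best], f'({100 * sims[best]:.0f}%)')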
Example I/O:
**Input:**
UserA - "students visited share learning experience eye opening
article important preserve linaugural workshop students teachers
others know coding like know alphabets vision driving codeindia office
initiative get students tagging wrong people apologies apologies real
people work..."
...
UserN - <another corpus of cleaned tweets>
**Expected output:**
UserA - Education (61%)
UserN - Automobiles (43%)
TL;DR
Labels are necessary for supervised machine learning. If you don't have training data that contains Xs (input texts) and Ys (output labels), then (i) supervised learning might not be what you're looking for, or (ii) you have to create a dataset with texts and their corresponding labels.
In Long
Let's try to break it down and reflect on what you're looking for.
I have a list of twitter users (screen_names) and I need to categorise them into 7 pre-defined categories - Education, Art, Sports, Business, Politics, Automobiles, Technology
So your ultimate task is to label tweets into 7 categories.
I have extracted last 100 tweets of the users in Python and created a corpus for each user after cleaning the tweets.
100 data points is definitely insufficient to do anything if you want to train a supervised machine learning model from scratch.
Another thing is the definition of corpus. A corpus is a body of text, so it's not wrong to call any list of strings a corpus. However, to do any supervised training, each text should come with the corresponding label(s).
But I see some people do unsupervised classification without any labels!
Now, that's an oxymoron =)
Unsupervised Classification
Yes, there is "unsupervised learning", which often means learning a representation of the inputs; generally, that representation is used to (i) generate or (ii) sample.
Generation from a representation means creating from the representation a data point that is similar to the data the unsupervised model has learnt from. In the case of text processing / NLP, this often means generating new sentences from scratch, e.g. https://transformer.huggingface.co/
Sampling from a representation means giving the unsupervised model a text and having the model provide some signal based on what it has learnt. E.g. given a language model and a novel sentence, we can estimate the probability of the sentence and then use this probability to compare across different sentences.
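As a concrete illustration of this kind of sampling, here is a sketch that scores a sentence with a pretrained GPT-2 language model via Hugging Face transformers; the example sentence itself is arbitrary.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').eval()

ids = tokenizer('The cat sat on the mat.', return_tensors='pt').input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # average negative log-likelihood per token
print(torch.exp(loss).item())  # perplexity: lower means the model finds the sentence more probable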
Algorithmia has a nice summary blogpost https://algorithmia.com/blog/introduction-to-unsupervised-learning and a more modern perspective https://sites.google.com/view/berkeley-cs294-158-sp20/home
That's a whole lot of information, but you still haven't told me how to #$%^&-ing do unsupervised classification!
Yes, the oxymoron explanation isn't finished. If we look at text classification, what exactly are we doing?
We are fitting the input text into some pre-defined categories. In your case, the labels are pre-defined, but...
Q: Where exactly would the signal come from?
A: From the tweets, of course, stop distracting me! Tell me how to do classification!!!
Q: How do you tell the model that a tweet should be this label and not another label?
A: From the unsupervised learning, right? Isn't that what unsupervised learning is supposed to do? To map the input texts to the output labels?
Precisely, that's the oxymoron:
Supervised learning maps the input texts to output labels not unsupervised learning
So what do I do? I need to use unsupervised learning and I want to do classification.
Then the question to ask is:
Do you have labelled data?
If no, then how to get labels?
Use proxies: find signals that tell you a certain tweet has a certain label, e.g. from the hashtags, or make the assumption that certain people always tweet on a certain category (a toy hashtag sketch follows this list)
Use existing tweet classifiers to label your data and then train the classification model on the data
Do I have to pay for these classifiers? Most often, yes you do. https://english.api.rakuten.net/search/text%20classification
If yes, then how much?
If it's too little,
then how to create more? Maybe https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/
or maybe use some modern post-training algorithm https://towardsdatascience.com/https-medium-com-chaturangarajapakshe-text-classification-with-transformer-models-d370944b50ca
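As promised in the list above, a toy sketch of the hashtag-proxy idea; the hashtag-to-category map is an assumption you would build by hand for your 7 categories.

# Hypothetical mapping from hashtags to categories
HASHTAG_TO_CATEGORY = {
    '#football': 'Sports',
    '#elections': 'Politics',
    '#classroom': 'Education',
}

def proxy_label(tweet):
    # Return a category if a known hashtag appears, else None (leave unlabelled)
    for tag, category in HASHTAG_TO_CATEGORY.items():
        if tag in tweet.lower():
            return category
    return None

tweets = ['Great #football match today!', 'New syllabus for the #classroom']
labelled = [(t, proxy_label(t)) for t in tweets]
# The labelled subset can then be used to train a supervised classifier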
What about all this AI I keep hearing about, where I can do classification with 3 lines of code?
Don't they use unsupervised language models that sound like Sesame Street characters, e.g. ELMo, BERT, ERNIE?
I guess you mean something like https://github.com/ThilinaRajapakse/simpletransformers#text-classification
from simpletransformers.classification import ClassificationModel
import pandas as pd
# Train and Evaluation data needs to be in a Pandas Dataframe of two columns. The first column is the text with type str, and the second column is the label with type int.
train_data = [['Example sentence belonging to class 1', 1], ['Example sentence belonging to class 0', 0]]
train_df = pd.DataFrame(train_data, columns=['text', 'labels'])  # column names expected by recent simpletransformers versions
eval_data = [['Example eval sentence belonging to class 1', 1], ['Example eval sentence belonging to class 0', 0]]
eval_df = pd.DataFrame(eval_data, columns=['text', 'labels'])
# Create a ClassificationModel
model = ClassificationModel('bert', 'bert-base-uncased') # 'bert-base' alone is not a valid checkpoint name; you can set class weights by using the optional weight argument
# Train the model
model.train_model(train_df)
Take careful notice of the comment:
Train and Evaluation data needs to be in a Pandas Dataframe of two columns. The first column is the text with type str, and the second column is the label with type int.
Yes that's the more modern approach to:
First use a pre-trained language model to convert your texts into input representations
Then feed the input representations and their corresponding labels to a classifier
Note, you still can't avoid the fact that you need labels to train the supervised classifier.
Wait a minute, you mean all this AI I keep hearing about is not "unsupervised classification"?
Exactly. There's really no such thing as "unsupervised classification" (yet); somehow (i) the labels need to be manually defined, and (ii) the mapping between the inputs and the labels has to exist.
The right word for this paradigm would be transfer learning, where the language model is
learned in a self-supervised manner (it's actually not truly unsupervised), so that the model learns to convert any text into some numerical representation,
and then the numerical representation is used with labelled data to produce the classifier.
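To make the two-step recipe concrete, here is a minimal sketch, assuming mean-pooled BERT outputs as the text representation (one choice among many, purely for illustration) and scikit-learn's logistic regression as the supervised classifier; the texts and labels are toy examples.

import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoder = AutoModel.from_pretrained('bert-base-uncased').eval()

def embed(texts):
    # Step 1: the pretrained language model turns each text into a fixed vector
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        out = encoder(**batch).last_hidden_state
    return out.mean(dim=1).numpy()  # mean-pool the token vectors

texts = ['students and teachers at the workshop', 'new car expo this weekend']
labels = [0, 1]  # toy labels: 0 = Education, 1 = Automobiles
# Step 2: an ordinary supervised classifier is trained on the representations
clf = LogisticRegression().fit(embed(texts), labels)
print(clf.predict(embed(['the auto show opens tomorrow'])))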
I am new to deep learning, and I was wondering if there is a way to extract the parts of an image containing the different labels and then feed those parts to different models for further processing.
For example, consider the dog vs cat classification.
Suppose the image contains both cat and dog.
We successfully classify that the image contains both, but how can we classify the breed of the dog and cat present?
The approach I thought of was extracting/cutting out the parts of the image containing the dog and the cat, and then feeding those parts to the respective dog breed classification model and cat breed classification model separately.
But I have no clue on how to do this.
Your thinking is correct, you can have multiple pipelines based on the number of classes.
Training:
The main model will be an object detection and localization model like Faster R-CNN, YOLO, SSD, etc., trained to classify at a high level, like cat vs dog. This pipeline provides you with bounding box details (left, bottom, right, top) along with the labels.
The sub models will be multiple models trained at a lower level, for example a model trained to classify breed. This can be done using models like VGG, ResNet, Inception, etc. You can utilize transfer learning here.
Inference: Pass the image through the main model, crop out the detected objects using the bounding box details (left, bottom, right, top), and, based on the label information, feed each crop to the appropriate sub model and extract the results.
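A minimal sketch of that inference flow, using torchvision's pretrained Faster R-CNN as the main model for illustration; the breed sub models here are plain ImageNet ResNets standing in for classifiers you would fine-tune yourself, and the input image path is hypothetical.

import torch
import torchvision
from PIL import Image
from torchvision import transforms

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
# Stand-in sub models; in practice these would be fine-tuned breed classifiers
dog_breed_model = torchvision.models.resnet18(pretrained=True).eval()
cat_breed_model = torchvision.models.resnet18(pretrained=True).eval()

COCO_CAT, COCO_DOG = 17, 18  # COCO label ids used by the pretrained detector

img = Image.open('pets.jpg').convert('RGB')  # hypothetical input image
with torch.no_grad():
    detections = detector([transforms.ToTensor()(img)])[0]

to_input = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
for box, label, score in zip(detections['boxes'], detections['labels'], detections['scores']):
    if score < 0.8 or label not in (COCO_CAT, COCO_DOG):
        continue  # skip low-confidence detections and other classes
    left, top, right, bottom = box.tolist()
    crop = img.crop((left, top, right, bottom))  # cut out the detected animal
    sub_model = dog_breed_model if label == COCO_DOG else cat_breed_model
    with torch.no_grad():
        breed_logits = sub_model(to_input(crop).unsqueeze(0))
    print(int(label), int(breed_logits.argmax(1)))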
I am working on a system to simplify our image library which grows anywhere from 7k to 20k new pictures per week. The specific application is identifying which race cars are in pictures (all cars are similar shapes with different paint schemes). I plan to use python and tensorflow for this portion of the project.
My initial thought was to use image classification to classify the image by car; however, there is a very high probability of a picture containing multiple cars. My next thought is to use object detection to detect the car numbers (present in a fixed location on all cars [nose, tail, both doors, and roof] and in a consistent font week to week). Lastly, there is the approach of object recognition of the whole car. This, on the surface, seems the most practical; however, the paint schemes change enough that it may not be.
Which approach will give me the best results? I have pulled a large number of images out for training, and obviously the different methods require very different training datasets.
The best approach would be to use all 3 methods as an ensemble. You train all 3 of those models and pass the input image to all 3 of them. Then there are several ways you can evaluate the output (a small sketch follows this list).
You can sum up the probabilities for all of the classes across all 3 models and then draw a conclusion based on the highest probability.
You can get a prediction from every model and decide based on the number of votes: model 1 - class1, model 2 - class2, model 3 - class2 ==> class2.
You can do something like weighted decision making. Let's say the first model is the best and most robust one, but you don't trust it 100% and want to see what the other models say. Then you can weight the output of the first model by 0.6 and the outputs of the other two models by 0.2 each.
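A small sketch of the weighted soft-voting option, assuming each model returns a probability vector over the same set of classes; the numbers are made up.

import numpy as np

# Hypothetical per-class probabilities from the 3 models for one image
p_classifier = np.array([0.7, 0.2, 0.1])  # whole-image classifier
p_detector = np.array([0.5, 0.4, 0.1])    # car-number detector
p_recognizer = np.array([0.6, 0.3, 0.1])  # whole-car recognizer

weights = np.array([0.6, 0.2, 0.2])  # trust the first model most
combined = weights @ np.vstack([p_classifier, p_detector, p_recognizer])
print('predicted class:', combined.argmax())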
I hope this helps :)
I used the "Tensorflow For Poets" example from https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#7. It worked as it should, but when it was given a completely different image (a photo of a dog, when it was trained on flowers), it returned a prediction for one of the classes with more than 80% certainty, though it was completely wrong.
My question is: how can I also check how certain the model is that an image is of a kind it was trained on?
Here is the output:
roses 0.8071102
tulips 0.20286822
daisy 2.0703827e-05
sunflowers 7.838418e-07
dandelion 3.357079e-08
CNNs are often confidently wrong. The filters derived during training must have found similarities between features of roses and the dog image. If the CNN thinks a dog is a rose, then it thinks a dog is a rose, simple as that. There is no obvious test to check how trustworthy that 80% certainty is; the model can only tell you it's 80% certain your dog is a rose.
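A small numeric illustration of why the softmax score can't flag unfamiliar inputs: the outputs always sum to 1, so even the logits produced by a dog photo yield a "confident"-looking winner. The logits below are made up to mimic the output above.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical logits for a dog photo fed to the flowers-only model
logits = np.array([2.1, 0.7, -3.0, -5.2, -6.4])  # roses, tulips, daisy, sunflowers, dandelion
print(softmax(logits))  # roses still gets ~80% even though no class applies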
The model needs to be trained on more images to get better results. If you have a recurring/persistent confusion between two objects (i.e. dogs and roses), it can be a good idea to train the model with images of dogs as negative examples (either as background images containing none of the target classes, or as a class of their own). This way the model can learn what makes a rose distinct from a dog.
What's a good method to compare the scored results of different classifiers?
For example, let's say I have a classifier that deals with 12 different classes of fruit.
Out of these 12 classes, 4 of them represent different views of an apple, 4 represent views of a banana, and 4 represent a pineapple. This is a multi-class classifier, and given an image, it assigns a score to all of the possible 12 fruit classes that the image might belong to. The class that got the highest score is selected as THE class of the image.
Now in addition to that - I have 3 individual classifiers, one for apples, one for bananas, and one for pineapples. Each classifier deals with 4 different views of an individual fruit.
I want to compare whether using the single 12-class classifier gives better results than using a combination of the individual 4-class classifiers.
When I run the 12-class classifier on images of apples, the results are indeed less accurate than the results of running the individual apple classifier, and the same goes for bananas and pineapples.
What I want to do now is build a combination of the 3 classifiers. So my program will run all 3 classifiers on a single image and tell me what the most likely class is.
The problem is: how can I normalize the scores across the different classifiers, so that I can make a comparison among the 3 classifiers' classes and choose the class with the highest score? Although the method used to train the individual classifiers was the same, I doubt I can directly compare their scores without some kind of normalization.
Would it be practical to simply convert all the scores to the log scale and then compare them?
A popular method to normalize classifier scores is Platt scaling, which is implemented in libSVM. It's a straightforward normalization, as described in the following link:
http://en.wikipedia.org/wiki/Platt_scaling
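For one readily available implementation, here is a minimal sketch of Platt scaling with scikit-learn: CalibratedClassifierCV with method='sigmoid' fits the Platt sigmoid on held-out decision scores, turning raw SVM outputs into probabilities that are comparable across classifiers. The data is a toy binary problem.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)  # toy data
# Platt scaling: fit a sigmoid on the SVM's decision scores via cross-validation
calibrated = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=5)
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))  # calibrated scores in [0, 1]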