I have implemented k-means and hierarchical clustering on the following student performance dataset, but I am having trouble visualising the clusters in a plot.
Since this is a multi-class dataset, I believe PCA does not work on it, and I am not aware of an alternative method or workaround.
Dataset link:
https://archive.ics.uci.edu/ml/datasets/Student+Performance
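A minimal sketch, assuming pandas and scikit-learn are available. PCA only needs numeric input and does not care how many target classes there are. A common workaround is therefore to one-hot encode the categorical columns, cluster in the full feature space, and use a two-component PCA only to get plotting coordinates. The toy DataFrame borrows a few column names from the UCI data (`school`, `sex`, `age`, `studytime`, `G3`) but fills them with random values, and `cluster_and_project` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_and_project(df, n_clusters=3, random_state=0):
    """Hypothetical helper: encode, scale, cluster with k-means,
    then project to 2-D with PCA purely for plotting."""
    # Turn categorical (object) columns into 0/1 indicator columns.
    encoded = pd.get_dummies(df).astype(float)
    # Put all features on a comparable scale before k-means and PCA.
    X = StandardScaler().fit_transform(encoded)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X)
    # PCA only supplies 2-D coordinates for a scatter plot; the
    # clustering itself was done in the full feature space.
    coords = PCA(n_components=2).fit_transform(X)
    return coords, labels


# Hypothetical toy data: real column names, random values.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "school": rng.choice(["GP", "MS"], size=60),
    "sex": rng.choice(["F", "M"], size=60),
    "age": rng.integers(15, 23, size=60),
    "studytime": rng.integers(1, 5, size=60),
    "G3": rng.integers(0, 21, size=60),
})
coords, labels = cluster_and_project(toy, n_clusters=3)
print(coords.shape, np.unique(labels))
```

To draw the plot, pass the result to matplotlib, for example `plt.scatter(coords[:, 0], coords[:, 1], c=labels)`. For the hierarchical model, `sklearn.cluster.AgglomerativeClustering(n_clusters=3).fit_predict(X)` can replace the `KMeans` call. `sklearn.manifold.TSNE` is another 2-D projection option, although distances in a t-SNE plot are harder to interpret.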
I am trying to analyze the output of an XGBoost classifier. I haven't been able to find a clear explanation of the difference between the feature weights and the feature importance chart.
Here is a sample screenshot (not from my dataset but the same analysis I am running).
I would appreciate explanations, or references to where I can find them.
Thanks in advance
Screenshot
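By default, `xgboost.plot_importance` uses `importance_type="weight"`. This is the number of times a feature is used to split the data across all trees, and it is the value the chart labels "F score". The `"gain"` type instead averages the loss reduction of those splits. The toy sketch below uses hypothetical split records rather than real XGBoost output, and shows why the two rankings can disagree. It is written in plain Python so the definitions are visible.

```python
from collections import defaultdict

# Hypothetical split records gathered from all trees: (feature, gain).
splits = [
    ("f0", 0.5), ("f0", 0.4), ("f0", 0.3), ("f0", 0.2),  # frequent, small gains
    ("f1", 5.0),                                          # rare, large gain
    ("f2", 1.0), ("f2", 1.2),
]


def importance(splits, importance_type="weight"):
    """Mimic XGBoost's 'weight', 'gain' and 'total_gain' definitions."""
    counts = defaultdict(int)
    total_gain = defaultdict(float)
    for feature, gain in splits:
        counts[feature] += 1
        total_gain[feature] += gain
    if importance_type == "weight":      # how often the feature splits
        return dict(counts)
    if importance_type == "total_gain":  # summed loss reduction
        return dict(total_gain)
    if importance_type == "gain":        # average loss reduction per split
        return {f: total_gain[f] / counts[f] for f in counts}
    raise ValueError(f"unknown importance_type: {importance_type}")


for kind in ("weight", "gain", "total_gain"):
    scores = importance(splits, kind)
    print(kind, sorted(scores, key=scores.get, reverse=True))
```

Here `f0` ranks first by weight because it is used most often, but last by gain because each of its splits helps little. With a fitted model, the real numbers come from `model.get_booster().get_score(importance_type="gain")`, and `xgboost.plot_importance(model, importance_type="gain")` draws the gain-based chart. The default behind the scikit-learn wrapper's `feature_importances_` has varied across versions, so check it explicitly before comparing it with the chart.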
For my master's thesis, I'm developing a system to classify and extract cybersecurity countermeasures from unstructured text.
In my binary classifier, I want to check whether a text is relevant or not. For this purpose, I tried two approaches:
scikit-learn Support Vector Machines:
I used the paper by Husari et al. as a guide: https://www.researchgate.net/publication/321503662_TTPDrill_Automatic_and_Accurate_Extraction_of_Threat_Actions_from_Unstructured_Text_of_CTI_Sources
They used three features for their SVM classifier.
My question: How can I add features to an SVM classifier?
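One common way to add features, assuming scikit-learn and SciPy, is to compute each extra feature as its own column block and stack the blocks horizontally with the TF-IDF matrix before training. The sketch below is illustrative only. The texts, the `KEYWORDS` set and the two features (token count and keyword hits) are hypothetical, and they are not the three features from the Husari et al. paper.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import SVC

# Hypothetical toy corpus: 1 = relevant (mentions a countermeasure), 0 = not.
texts = [
    "apply the security patch and enable multi-factor authentication",
    "block the malicious IP range at the firewall",
    "the conference keynote covered industry trends",
    "our revenue grew in the third quarter",
]
labels = [1, 1, 0, 0]

# Hypothetical hand-crafted features; substitute the ones you need.
KEYWORDS = {"patch", "firewall", "block", "enable", "authentication"}


def extra_features(docs):
    """Return a dense (n_docs, 2) array: token count and keyword hits."""
    rows = []
    for doc in docs:
        tokens = doc.lower().split()
        rows.append([len(tokens), sum(t in KEYWORDS for t in tokens)])
    return np.asarray(rows, dtype=float)


vectorizer = TfidfVectorizer()
# Divides each extra feature by its largest training value, so it does
# not dwarf the TF-IDF weights.
scaler = MaxAbsScaler()


def build_matrix(docs, fit=False):
    """Stack TF-IDF columns and the extra feature columns side by side."""
    if fit:
        text_part = vectorizer.fit_transform(docs)
        extra_part = scaler.fit_transform(extra_features(docs))
    else:
        text_part = vectorizer.transform(docs)
        extra_part = scaler.transform(extra_features(docs))
    return hstack([text_part, csr_matrix(extra_part)]).tocsr()


X = build_matrix(texts, fit=True)
clf = SVC(kernel="linear").fit(X, labels)

X_new = build_matrix(["install the latest patch"])
print(clf.predict(X_new))
```

Fitting the vectorizer and scaler on training data only, and reusing them for new text, keeps the columns aligned. `sklearn.pipeline.FeatureUnion` or `sklearn.compose.ColumnTransformer` can express the same stacking inside a single pipeline.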
BERT with PyTorch:
I created a dataset of manually labeled texts (100 in total: 30 relevant, 70 not relevant).
The output, 70% accuracy and 61% loss, does not seem good enough.
I think this is because of the small dataset.
My question: Is there another way to use BERT with small datasets to get more accurate results?
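A quick sanity check, using only the label counts given above: always predicting "not relevant" already scores 70%, so 70% accuracy may mean the model learned nothing beyond the majority class. The plain-Python sketch below computes that baseline and "balanced" class weights, using the same formula scikit-learn applies for `class_weight="balanced"`. Passing these weights to a weighted loss is one possible remedy, not a guaranteed fix.

```python
from collections import Counter

# Label counts from the question: 30 relevant (1), 70 not relevant (0).
labels = [1] * 30 + [0] * 70

counts = Counter(labels)
n_samples, n_classes = len(labels), len(counts)

# Accuracy of always predicting the most frequent class.
majority_baseline = max(counts.values()) / n_samples

# "Balanced" weights: n_samples / (n_classes * count_per_class),
# so the rarer class contributes more to the loss.
class_weights = {c: n_samples / (n_classes * k)
                 for c, k in sorted(counts.items())}

print(f"majority baseline: {majority_baseline:.0%}")
print("class weights:", {c: round(w, 3) for c, w in class_weights.items()})
```

During fine-tuning, these weights could be passed in class-index order to `torch.nn.CrossEntropyLoss(weight=...)`. With only 100 texts, per-class precision, recall and F1, averaged over stratified k-fold cross-validation, would give a steadier picture than accuracy on a single split.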
I am currently working on a project to classify a given set of test images into one of 5 predefined categories. I implemented logistic regression with a 240-element feature vector for each image and trained it on 100 images per category. The training accuracy was ~98% for each category, whereas on a validation set of 500 images (100 images per category), only ~57% of the images were classified correctly.
Please suggest a few libraries/tools (preferably based on neural networks) that I can use to attain higher accuracy.
I tried a Java-based tool, Neuroph (neuroph.sourceforge.net), on Windows, but it didn't run as expected.
Edit: The feature vectors were already provided for the project. I am also looking for a better feature extraction tool for images.
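A gap of ~98% training accuracy against ~57% validation accuracy usually points to overfitting, so regularization matters as much as the choice of model. One readily available neural network option in Python is scikit-learn's `MLPClassifier`. The sketch below uses synthetic data from `make_classification` in place of the provided feature vectors (240 features, 5 classes, a 500/500 split), and its hyperparameters are illustrative guesses, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the provided data: 240 features, 5 categories.
X, y = make_classification(n_samples=1000, n_features=240, n_informative=30,
                           n_classes=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

model = make_pipeline(
    StandardScaler(),                  # MLPs train poorly on unscaled inputs
    MLPClassifier(hidden_layer_sizes=(64,),
                  alpha=1e-2,          # L2 penalty to curb overfitting
                  early_stopping=True, # holds out 10% of train to stop early
                  max_iter=500,
                  random_state=0),
)
model.fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("validation accuracy:", model.score(X_val, y_val))
```

Comparing the two printed scores shows whether the train/validation gap narrows as `alpha` grows. For learning features directly from pixels instead of using fixed 240-element vectors, a convolutional network in a framework such as PyTorch or Keras is the usual next step.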
You may find this paper helpful: Image Classification
In my opinion, SVM is relatively better than logistic regression for multi-class response problems. We use it in e-commerce product classification, where there are thousands of response levels and thousands of features.
Based on your tags, I assume you would like a Python package; scikit-learn has good classification routines: scikit-learn.org.
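Rather than relying on a general rule, you can test whether SVM beats logistic regression on your own data with cross-validation in scikit-learn. The sketch below uses the built-in 8x8 digits images (10 classes) as a stand-in dataset. The `C` value for the SVM is an arbitrary illustrative choice. Note that `SVC` handles multi-class problems internally with a one-vs-one scheme.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in image dataset (8x8 digits, 10 classes) as a stand-in.
X, y = load_digits(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=2000),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=10, gamma="scale"),
}
results = {}
for name, clf in candidates.items():
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold accuracy
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Swapping in your own `X` and `y` gives a like-for-like comparison, because both models see identical folds and identical preprocessing.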
I have had good success using the WEKA tools. You need to isolate the feature set you are interested in and then apply a classifier from this library. The examples are very clear: http://weka.wikispaces.com
I am a novice at SVM classification, but I have learned some of the basic theory behind it.
However, I would like to see Python 2.7 code for defining, training and testing a problem in which each feature vector contains 20 elements.
Can anyone explain how I can use libsvm with a simple example?
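A minimal define/train/test sketch is shown below. It uses scikit-learn's `SVC`, which is built on libsvm, and is written for Python 3 because Python 2.7 is end-of-life. The two classes of 20-element vectors are hypothetical random data with different means, chosen so that the problem is learnable. libsvm also ships its own Python bindings (`svm_train` / `svm_predict` in `svmutil`) if you prefer the raw interface.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Define: two hypothetical classes of 20-element feature vectors
# whose means differ, so an SVM can separate them.
n_per_class, n_features = 50, 20
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, n_features)),
               rng.normal(1.5, 1.0, (n_per_class, n_features))])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Shuffle, then split into training (80%) and test (20%) sets.
order = rng.permutation(len(y))
X, y = X[order], y[order]
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Train an RBF-kernel SVM (libsvm under the hood).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# Test on held-out data and classify one new vector.
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)
print("prediction:", clf.predict(rng.normal(1.5, 1.0, (1, n_features))))
```

Each row of `X` is one 20-element feature vector and `y` holds its class label. Replacing the random arrays with your own data is the only change needed.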