All evaluation metric values come out as 1 - Python

x_test,x_val,y_test,y_val = train_test_split(x_test,y_test,test_size=0.5)
print(x_train.shape)
#(1413, 3) <----Result
print(x_val.shape)
#(472, 3) <----Result
print(x_test.shape)
#(471, 3) <----Result
I split the data for machine learning and got the shapes above.
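For reference, the shapes above are consistent with a two-step split along these lines (a sketch; the first split and the variables x and y are not shown in the question):
from sklearn.model_selection import train_test_split
# Presumed first split (an assumption): hold out 40% of the 2356 samples,
# which yields the 1413/943 train/holdout shapes printed above.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4)
# Second split, from the question: halve the holdout into test and val.
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5)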
from sklearn.tree import DecisionTreeClassifier
dTree = DecisionTreeClassifier(max_depth=2,random_state=0).fit(x_train,y_train)
print("train score : {}".format(dTree.score(x_train, y_train)))
#train score : 1.0 <----Result
print("val score : {}".format(dTree.score(x_val, y_val)))
#val score : 1.0 <----Result
I then trained a decision tree and printed the train and validation scores, and both came out as 1.0.
predict_y = dTree.predict(x_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, predict_y))
print("test score : {}".format(dTree.score(x_test, y_test)))
              precision    recall  f1-score   support

           A       1.00      1.00      1.00       235
           B       1.00      1.00      1.00       236

    accuracy                           1.00       471
   macro avg       1.00      1.00      1.00       471
weighted avg       1.00      1.00      1.00       471
test score : 0.9978768577494692
Finally, classification_report showed the results above. Are some of my data splits wrong? Or does a value of 1 mean all the data was classified perfectly? If I'm doing something wrong, I'd like to hear the right solution.

Related

How to disable seqeval label formatting for POS-tagging

I am trying to evaluate my POS-tagger using Hugging Face's implementation of the seqeval metric, but since my tags are not made for NER, they are not formatted the way the library expects. Consequently, when I read the results of the classification report, the labels for the class-specific results consistently lack their first character (the last one if I pass suffix=True).
Is there a way to disable entity recognition in the labels, or do I have to pad all my labels with a leading space to work around this? (Given that the library is supposed to be suitable for POS-tagging, I hope there is a built-in solution.)
SSCCE:
from seqeval.metrics import accuracy_score
from seqeval.metrics import classification_report
from seqeval.metrics import f1_score
y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
print(classification_report(y_true, y_pred))
Output:
              precision    recall  f1-score   support

          DV       1.00      1.00      1.00         2
     ER:pres       1.00      1.00      1.00         1
          NT       1.00      1.00      1.00         1
          RO       1.00      1.00      1.00         1
          RP       1.00      1.00      1.00         1

   micro avg       1.00      1.00      1.00         6
   macro avg       1.00      1.00      1.00         6
weighted avg       1.00      1.00      1.00         6
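One possible workaround (my suggestion, not a documented seqeval switch): since POS-tagging is a per-token task, you can skip seqeval's entity-chunk parsing entirely and evaluate at the token level with sklearn, e.g.:
from itertools import chain
from sklearn.metrics import classification_report
y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
# flatten the per-sentence tag lists into flat token-level lists, so each
# tag is treated as a plain class label rather than an entity chunk
flat_true = list(chain.from_iterable(y_true))
flat_pred = list(chain.from_iterable(y_pred))
print(classification_report(flat_true, flat_pred))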

Why do predictions and scores return different results in classification using scikit-learn?

I wrote a very simple multiclass classifier based on the iris dataset. This is the code:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
# Load the data
iris = load_iris()
X = iris.data
y = iris.target
# Use label_binarize to be multi-label like settings
Y = label_binarize(y, classes=[0, 1, 2])
n_classes = Y.shape[1]
# Add noisy features
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)
# Split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.5, random_state=0
)
# Create classifier
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
)
# Train the model
classifier.fit(X_train, y_train)
My goal is to predict the values of the test set in 2 ways:
1. Using the classifier.predict() function to define y_pred.
2. Using classifier.decision_function() to get the scores and then picking the highest one for each instance to define y_pred_.
Here is how I did it:
# Get the scores for the Test set
y_score = classifier.decision_function(X_test)
# Make predictions
y_pred = classifier.predict(X_test)
y_pred_ = label_binarize(np.argmax(y_score, axis=1), [0,1,2])
However, when I compute the classification reports I get slightly different results, while I would expect them to be the same, since the predictions are based on the scores obtained from the decision function, as can be seen in the documentation (line 789). Here are both reports:
print(classification_report(y_test, y_pred))
print(classification_report(y_test, y_pred_))
              precision    recall  f1-score   support

           0       0.54      0.62      0.58        21
           1       0.44      0.40      0.42        30
           2       0.36      0.50      0.42        24

   micro avg       0.44      0.49      0.47        75
   macro avg       0.45      0.51      0.47        75
weighted avg       0.45      0.49      0.46        75
 samples avg       0.39      0.49      0.42        75

              precision    recall  f1-score   support

           0       0.42      0.38      0.40        21
           1       0.52      0.47      0.49        30
           2       0.38      0.46      0.42        24

   micro avg       0.44      0.44      0.44        75
   macro avg       0.44      0.44      0.44        75
weighted avg       0.45      0.44      0.44        75
 samples avg       0.44      0.44      0.44        75
What am I doing wrong? Would you be able to suggest a smart and elegant solution so that both reports are identical?
For multilabel classification you should use
y_pred_ = np.where(classifier.decision_function(X_test) > 0, 1, 0)
to replicate the output of the predict() method as in this case the different classes are not mutually exclusive, i.e. a given sample can belong to multiple classes.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import classification_report
# Load the data
iris = load_iris()
X = iris.data
y = label_binarize(iris.target, classes=[0, 1, 2])
# Split the data into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)
# Create classifier
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=0))
)
# Train the model
classifier.fit(X_train, y_train)
# Make predictions
y_pred = classifier.predict(X_test)
y_pred_ = np.where(classifier.decision_function(X_test) > 0, 1, 0)
print(classification_report(y_test, y_pred))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.58      0.37      0.45        30
#            2       0.95      0.83      0.89        24
#    micro avg       0.85      0.69      0.76        75
#    macro avg       0.84      0.73      0.78        75
# weighted avg       0.82      0.69      0.74        75
#  samples avg       0.66      0.69      0.67        75
print(classification_report(y_test, y_pred_))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.58      0.37      0.45        30
#            2       0.95      0.83      0.89        24
#    micro avg       0.85      0.69      0.76        75
#    macro avg       0.84      0.73      0.78        75
# weighted avg       0.82      0.69      0.74        75
#  samples avg       0.66      0.69      0.67        75
For multiclass classification you can instead use
y_pred_ = np.argmax(classifier.decision_function(X_test), axis=1)
as in your code, since in this case the classes are mutually exclusive, i.e. each sample is assigned to exactly one class.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import classification_report
# Load the data
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)
# Create classifier
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=0))
)
# Train the model
classifier.fit(X_train, y_train)
# Make predictions
y_pred = classifier.predict(X_test)
y_pred_ = np.argmax(classifier.decision_function(X_test), axis=1)
print(classification_report(y_test, y_pred))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.85      0.73      0.79        30
#            2       0.71      0.83      0.77        24
#     accuracy                           0.84        75
#    macro avg       0.85      0.86      0.85        75
# weighted avg       0.85      0.84      0.84        75
print(classification_report(y_test, y_pred_))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.85      0.73      0.79        30
#            2       0.71      0.83      0.77        24
#     accuracy                           0.84        75
#    macro avg       0.85      0.86      0.85        75
# weighted avg       0.85      0.84      0.84        75
OneVsRestClassifier assumes that you expect a multi-label result, i.e. there may be more than one positive label for a single input. The result is thus different from using argmax with decision_function.
Try
print(y_pred[0])
print(y_pred_[0])
Output:
[0 1 1]
[0 0 1]

Is there a way to find the average precision and recall of each class in the custom Tensorflow model?

I have trained a model using TensorFlow and SSD MobileNet, and I was able to find the mean average precision of the model. Is there a way to find the average precision of each class in the model?
I am using Tensorflow 2.5 version.
Thanks in advance
You can use sklearn, as in the code below. For both the confusion matrix and the classification report you need to provide y_predict and y_true. After you train your model, make predictions on the test set. I assume you have the true class labels somewhere in your code, in a list called y_true, in the SAME order as your inputs to model.predict. I will also assume you have a list called classes holding the names of your classes, in order.
For example, if cats is label 0 and dogs is label 1, then classes=['cats', 'dogs'].
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
preds = model.predict(...)  # etc. - run predictions on your test set
# convert each probability vector into a predicted class index
y_predict = []
for p in preds:
    index = np.argmax(p)
    y_predict.append(index)
y_true = np.array(y_true)
y_predict = np.array(y_predict)
# create a confusion matrix
cm = confusion_matrix(y_true, y_predict)
# code below formats the confusion matrix plot
length = len(classes)
if length < 8:
    fig_width = 8
    fig_height = 8
else:
    fig_width = int(length * .5)
    fig_height = int(length * .5)
plt.figure(figsize=(fig_width, fig_height))
sns.heatmap(cm, annot=True, vmin=0, fmt='g', cmap='Blues', cbar=False)
plt.xticks(np.arange(length) + .5, classes, rotation=90)
plt.yticks(np.arange(length) + .5, classes, rotation=0)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
clr = classification_report(y_true, y_predict, target_names=classes)
print("Classification Report:\n----------------------\n", clr)
Below is an example of a classification report
Classification Report:
----------------------
              precision    recall  f1-score   support

      Banana       1.00      1.00      1.00        15
       Bread       1.00      1.00      1.00        15
        Eggs       1.00      1.00      1.00        15
        Milk       1.00      1.00      1.00        15
       Mixed       1.00      1.00      1.00        12
      Potato       1.00      1.00      1.00        15
     Spinach       1.00      1.00      1.00        15
      Tomato       1.00      1.00      1.00        15

    accuracy                           1.00       117
   macro avg       1.00      1.00      1.00       117
weighted avg       1.00      1.00      1.00       117
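If you only need the per-class numbers rather than the full report, a minimal sketch (assuming the same y_true, y_predict and classes as above):
from sklearn.metrics import precision_score, recall_score
# average=None returns one score per class, in sorted label order
per_class_precision = precision_score(y_true, y_predict, average=None)
per_class_recall = recall_score(y_true, y_predict, average=None)
for name, p, r in zip(classes, per_class_precision, per_class_recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")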

My Random Forest Classification Algorithm doesn't give the results I want

I made up an Excel sheet of random numbers (3000 rows and 6 columns) and set it so that any row with a B column >= 50, a C column of 0, and an E column of 1 gets a final 'y' value of 1; otherwise it gets 0. I ran this through the RandomForestClassifier code below, and it doesn't work: it either returns 0 for all new test data or doesn't even take the B column into account when predicting. How can I solve this?
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import pickle
data_crd = pd.read_csv(r'C:\Users\Rada1\.spyder-py3\new_created_data.csv')
#C:\Users\Rada1\.spyder-py3\new_created_data.csv
data_crd.head()
X = data_crd.iloc[:,1:5]
y = data_crd.iloc[:,5]
#print (X)
#print (y)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
classifier = RandomForestClassifier (n_estimators = 500, random_state = 0)
classifier.fit (X_train, y_train)
y_pred = classifier.predict(X_test)
print (classification_report(y_test,y_pred))
print (confusion_matrix(y_test,y_pred))
print (accuracy_score(y_test,y_pred))
with open('model_wcd', 'wb') as f:
    pickle.dump(classifier, f)
I get a 100% accuracy rate as my result, which already feels wrong. What do I need to adjust?
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       515
           1       1.00      1.00      1.00        85

    accuracy                           1.00       600
   macro avg       1.00      1.00      1.00       600
weighted avg       1.00      1.00      1.00       600

[[515   0]
 [  0  85]]
1.0
It might work if you use stratify=y:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
Also consider using MinMaxScaler for the numerical features, reshaping them to (-1, 1). A sketch, assuming num_feature is a MinMaxScaler fitted on the training column and column_name is one of your numeric columns:
from sklearn.preprocessing import MinMaxScaler
num_feature = MinMaxScaler().fit(x_train[column_name].values.reshape(-1, 1))
x_train_num = num_feature.transform(x_train[column_name].values.reshape(-1, 1))
x_test_num = num_feature.transform(x_test[column_name].values.reshape(-1, 1))

Scipy/Numpy/scikits - calculating precision/recall scores based on two arrays

I fit a logistic regression model and train it on the training dataset as follows:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(C=0.1, penalty='l1')
model = lr.fit(training[:, 0:-1], training[:, -1])
I have a cross-validation dataset whose labels are contained in the input matrix and can be accessed as
cv[:,-1]
I run my cross-validation dataset through the trained model, which returns a list of 0s and 1s as predictions:
cv_predict = model.predict(cv[:,0:-1])
Question
I want to calculate the precision and recall scores based on the actual labels and predicted labels. Is there a standard method to do this using numpy/scipy/scikits?
Thank you
Yes, there are; see the documentation: http://scikit-learn.org/stable/modules/classes.html#classification-metrics
You should also have a look at the sklearn.metrics.classification_report utility:
>>> from sklearn.metrics import classification_report
>>> from sklearn.linear_model import SGDClassifier
>>> from sklearn.datasets import load_digits
>>> digits = load_digits()
>>> n_samples, n_features = digits.data.shape
>>> n_split = n_samples // 2  # integer division so it can be used as an index
>>> clf = SGDClassifier().fit(digits.data[:n_split], digits.target[:n_split])
>>> predictions = clf.predict(digits.data[n_split:])
>>> expected = digits.target[n_split:]
>>> print(classification_report(expected, predictions))
             precision    recall  f1-score   support

          0       0.90      0.98      0.93        88
          1       0.81      0.69      0.75        91
          2       0.94      0.98      0.96        86
          3       0.94      0.85      0.89        91
          4       0.90      0.93      0.91        92
          5       0.92      0.92      0.92        91
          6       0.92      0.97      0.94        91
          7       1.00      0.85      0.92        89
          8       0.71      0.89      0.79        88
          9       0.89      0.83      0.86        92

avg / total       0.89      0.89      0.89       899
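For the precision and recall numbers alone, a minimal sketch using the metrics module directly (assuming the cv array and cv_predict from the question, with binary 0/1 labels):
from sklearn.metrics import precision_score, recall_score
precision = precision_score(cv[:, -1], cv_predict)
recall = recall_score(cv[:, -1], cv_predict)
print("precision: %.3f, recall: %.3f" % (precision, recall))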
