I have trained a text classification model that works well. I want to dig deeper and understand which words/phrases in a sentence were most impactful for each classification outcome.
I am using Keras for the classification, and below is the code I use to train the model. It's a simple embedding-plus-max-pooling text classification model.
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# early stopping
callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0,
                                             patience=5, verbose=2, mode='auto',
                                             restore_best_weights=True)

# select optimizer
opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                               epsilon=1e-07, amsgrad=False, name="Adam")

embedding_dim = 50

# declare model
model = Sequential()
model.add(layers.Embedding(input_dim=vocab_size,
                           output_dim=embedding_dim,
                           input_length=maxlen))
model.add(layers.GlobalMaxPool1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer=opt,
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()

# fit model
history = model.fit(X_tr, y_tr,
                    epochs=20,
                    verbose=True,
                    validation_data=(X_te, y_te),
                    batch_size=10, callbacks=[callbacks])

loss, accuracy = model.evaluate(X_tr, y_tr, verbose=False)
How do I extract the phrases/words that have the maximum impact on the classification outcome?
It seems that the keywords you need are "neural network interpretability" and "feature attribution". One of the best-known methods in this area is Integrated Gradients; it shows how the model's prediction depends on each input feature (each word embedding, in your case).
This tutorial shows how to implement IG in pure TensorFlow for images, and this one uses the alibi library to highlight the words in the input text with the highest impact on a classification model.
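If you want a rough idea of what IG looks like for the embedding model above, here is a minimal sketch (assuming TF 2.x eager mode; the all-zero baseline, the 50-step path, and the helper name integrated_gradients are illustrative choices, not part of your code):

import tensorflow as tf

def integrated_gradients(model, x, steps=50):
    # split the model: the embedding layer vs. everything after it
    embedding_layer = model.layers[0]
    embedded = embedding_layer(x)                # (1, maxlen, embedding_dim)
    baseline = tf.zeros_like(embedded)           # all-zero embeddings as the baseline
    # interpolate between the baseline and the real embeddings
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps), (steps, 1, 1))
    interpolated = baseline + alphas * (embedded - baseline)
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        out = interpolated
        for layer in model.layers[1:]:           # pooling + dense layers
            out = layer(out)
        preds = out[:, 0]
    grads = tape.gradient(preds, interpolated)   # (steps, maxlen, embedding_dim)
    avg_grads = tf.reduce_mean(grads, axis=0)    # Riemann approximation of the path integral
    ig = (embedded[0] - baseline[0]) * avg_grads
    # one attribution score per token position: sum over the embedding dimension
    return tf.reduce_sum(ig, axis=-1).numpy()

# scores = integrated_gradients(model, X_te[:1])
# the highest-scoring positions are the words with the most impact on the prediction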
I am trying to use a neural network for my regression problem in Python, but the output of the neural network is a horizontal straight line at zero. I have one input and, obviously, one output.
Here is my code:
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', metrics=['mse'], optimizer='adam')
    model.summary()
    return model

# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=64,
                           validation_split=0.2, verbose=1)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)
Here are the plots of NN prediction vs. target for both training and test data:
[plot: Training Data]
[plot: Test Data]
I have also tried different weight initializers (Xavier and He) with no luck!
I really appreciate your help.
First of all, correct your syntax when adding the dense layers to the model: replace the double equals == with a single equals = for kernel_initializer, like below:
model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
Then, to improve performance, do the following:
Increase the number of hidden neurons in the hidden layers.
Increase the number of hidden layers.
If you still have the same problem, try changing the optimizer and activation function; tuning the hyperparameters may help you converge to a solution. A sketch of these changes follows below.
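As a hedged sketch of those suggestions (the layer widths below are illustrative, not tuned values):

def baseline_model():
    # wider and deeper than the original: more neurons per hidden layer,
    # plus an extra hidden layer
    model = Sequential()
    model.add(Dense(32, input_dim=1, kernel_initializer='normal', activation='relu'))
    model.add(Dense(16, kernel_initializer='normal', activation='relu'))
    model.add(Dense(8, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', metrics=['mse'], optimizer='adam')
    return model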
EDIT 1
You also have to fit the estimator after cross-validation, like below:
estimator.fit(X_train, y_train)
and then you can test on the test data as follows:
prediction = estimator.predict(X_test)
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, prediction)
(Note: accuracy_score is for classification; since this is a regression problem, use a regression metric such as mean squared error.)
Today I ran into some very strange Keras behavior. When I try a classification run on the iris dataset with a simple model, Keras version 1.2.2 gives me ~95% accuracy, whereas Keras 2.0+ predicts the same class for every training example (leading to ~35% accuracy, as there are three types of iris). The only thing that gets my model back to ~95% accuracy is downgrading Keras to a version below 2.0.
I think it is a problem with Keras, as I have tried the following things and none of them make a difference:
Switching the activation function in the last layer (from sigmoid to softmax).
Switching the backend (Theano and TensorFlow give roughly the same performance).
Using a random seed.
Varying the number of neurons in the hidden layer (I have only one hidden layer in this simple model).
Switching loss functions.
As the model is very simple and runs on its own (you just need the easy-to-obtain iris.csv dataset), I decided to include the entire code:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
#Load data
data_frame = pd.read_csv("iris.csv", header=None)
data_set = data_frame.values
X = data_set[:, 0:4].astype(float)
Y = data_set[:, 4]
#Encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
def baseline_model():
    # Create & compile model
    model = Sequential()
    model.add(Dense(8, input_dim=4, init='normal', activation='relu'))
    model.add(Dense(3, init='normal', activation='sigmoid'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Create wrapper for the neural network model for use in scikit-learn
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
# Create k-fold cross-validation
kfold = KFold(n_splits=10, shuffle=True)
# Evaluate our model (estimator) on the dataset (X and dummy_y) using 10-fold cross-validation (kfold)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: {:.2f}% ({:.2f}%)".format(results.mean()*100, results.std()*100))
If anyone wants to replicate the error, here are the dependencies I used when observing the problem:
numpy=1.16.4
pandas=0.25.0
sk-learn=0.21.2
theano=1.0.4
tensorflow=1.14.0
In Keras 2.0 many parameters changed names; there is a compatibility layer to keep things working, but somehow it does not apply when using KerasClassifier.
In this part of the code:
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
You are using the old name nb_epoch instead of the modern name epochs. The default value is epochs=1, meaning your model was only trained for a single epoch, producing very low-quality predictions.
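The fix is simply to use the new name:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)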
Also note that here:
model.add(Dense(3, init='normal', activation='sigmoid'))
You should be using a softmax activation instead of sigmoid, as you are using the categorical cross-entropy loss:
model.add(Dense(3, init='normal', activation='softmax'))
I've managed to isolate the issue: if you change nb_epoch to epochs (all else being exactly equal), the model predicts very well again, in Keras 2 as well. I don't know if this is intended behavior or a bug.
How would you create and display an accuracy metric in Keras for a regression problem, for example after you round the predictions to the nearest integer class?
While accuracy is not conventionally defined for a regression problem, when determining ordinal classes/labels for data it can be suitable to treat the problem as a regression. It would then be convenient to also calculate an accuracy metric, whether kappa or something similar. Here is basic Keras boilerplate code to modify:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, input_dim=10))
model.add(Activation('tanh'))
model.add(Dense(1))
model.compile(loss='mean_absolute_error', optimizer='rmsprop')
model.fit(X_train, y_train, epochs=20, batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)
I use rounded accuracy like this:
from keras import backend as K

def soft_acc(y_true, y_pred):
    return K.mean(K.equal(K.round(y_true), K.round(y_pred)))

model.compile(..., metrics=[soft_acc])
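A quick sanity check of soft_acc on toy values (the numbers here are illustrative):

import numpy as np
y_true = K.constant(np.array([1.0, 2.0, 3.0]))
y_pred = K.constant(np.array([1.2, 2.6, 2.9]))
# y_pred rounds to [1, 3, 3] against [1, 2, 3], so two of three match
print(K.eval(soft_acc(y_true, y_pred)))  # ~0.667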
The answer by Thomas pretty much sums up the question. Just a minor addition, as I was stuck on this one: here "K" is
import keras.backend as K

def soft_acc(y_true, y_pred):
    return K.mean(K.equal(K.round(y_true), K.round(y_pred)))

model.compile(..., metrics=[soft_acc])
I have a multi-output (200) binary classification model which I wrote in Keras.
In this model I want to add additional metrics such as ROC and AUC, but to my knowledge Keras doesn't have built-in ROC and AUC metric functions.
I tried to import the ROC and AUC functions from scikit-learn:
from sklearn.metrics import roc_curve, auc
from keras.models import Sequential
from keras.layers import Dense
.
.
.
model.add(Dense(200, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(400, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(200, init='normal', activation='softmax'))  # output layer
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy','roc_curve','auc'])
but it's giving this error:
Exception: Invalid metric: roc_curve
How should I add ROC, AUC to keras?
Since you can't calculate ROC and AUC over mini-batches, you can only calculate them at the end of an epoch. There is a solution from jamartinh; I've patched the code below for convenience:
from sklearn.metrics import roc_auc_score
from keras.callbacks import Callback

class RocCallback(Callback):
    def __init__(self, training_data, validation_data):
        self.x = training_data[0]
        self.y = training_data[1]
        self.x_val = validation_data[0]
        self.y_val = validation_data[1]

    def on_train_begin(self, logs={}):
        return

    def on_train_end(self, logs={}):
        return

    def on_epoch_begin(self, epoch, logs={}):
        return

    def on_epoch_end(self, epoch, logs={}):
        y_pred_train = self.model.predict_proba(self.x)
        roc_train = roc_auc_score(self.y, y_pred_train)
        y_pred_val = self.model.predict_proba(self.x_val)
        roc_val = roc_auc_score(self.y_val, y_pred_val)
        print('\rroc-auc_train: %s - roc-auc_val: %s' % (str(round(roc_train, 4)), str(round(roc_val, 4))), end=100*' '+'\n')
        return

    def on_batch_begin(self, batch, logs={}):
        return

    def on_batch_end(self, batch, logs={}):
        return

roc = RocCallback(training_data=(X_train, y_train),
                  validation_data=(X_test, y_test))

model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          callbacks=[roc])
A more hackable way using tf.contrib.metrics.streaming_auc:
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score
from sklearn.datasets import make_classification
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.callbacks import Callback, EarlyStopping

# define roc_callback, inspired by https://github.com/keras-team/keras/issues/6050#issuecomment-329996505
def auc_roc(y_true, y_pred):
    # any tensorflow metric
    value, update_op = tf.contrib.metrics.streaming_auc(y_pred, y_true)

    # find all variables created for this metric
    metric_vars = [i for i in tf.local_variables() if 'auc_roc' in i.name.split('/')[1]]

    # Add metric variables to the GLOBAL_VARIABLES collection.
    # They will be initialized for a new session.
    for v in metric_vars:
        tf.add_to_collection(tf.GraphKeys.GLOBAL_VARIABLES, v)

    # force an update of the metric values
    with tf.control_dependencies([update_op]):
        value = tf.identity(value)
        return value

# generate a small dataset
N_all = 10000
N_tr = int(0.7 * N_all)
N_te = N_all - N_tr
X, y = make_classification(n_samples=N_all, n_features=20, n_classes=2)
y = np_utils.to_categorical(y, num_classes=2)

X_train, X_valid = X[:N_tr, :], X[N_tr:, :]
y_train, y_valid = y[:N_tr, :], y[N_tr:, :]

# model & train
model = Sequential()
model.add(Dense(2, activation="softmax", input_shape=(X.shape[1],)))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy', auc_roc])

my_callbacks = [EarlyStopping(monitor='auc_roc', patience=300, verbose=1, mode='max')]

model.fit(X, y,
          validation_split=0.3,
          shuffle=True,
          batch_size=32, nb_epoch=5, verbose=1,
          callbacks=my_callbacks)

# # or use an independent validation set
# model.fit(X_train, y_train,
#           validation_data=(X_valid, y_valid),
#           batch_size=32, nb_epoch=5, verbose=1,
#           callbacks=my_callbacks)
Like you, I prefer using scikit-learn's built-in methods to evaluate AUROC. I find that the best and easiest way to do this in Keras is to create a custom metric. If TensorFlow is your backend, this can be implemented in very few lines of code:
import tensorflow as tf
from sklearn.metrics import roc_auc_score

def auroc(y_true, y_pred):
    return tf.py_func(roc_auc_score, (y_true, y_pred), tf.double)

# Build Model...

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy', auroc])
Creating a custom Callback as mentioned in other answers will not work for your case, since your model has multiple outputs, but this will. Additionally, this method allows the metric to be evaluated on both training and validation data, whereas a Keras callback does not have access to the training data and can thus only be used to evaluate performance on the validation data.
The following solution worked for me:
import tensorflow as tf
from keras import backend as K

def auc(y_true, y_pred):
    auc = tf.metrics.auc(y_true, y_pred)[1]
    K.get_session().run(tf.local_variables_initializer())
    return auc

model.compile(loss="binary_crossentropy", optimizer='adam', metrics=[auc])
I solved my problem this way.
Suppose you have a test dataset x_test for the features and y_test for the corresponding targets.
First, we predict targets from the features using our trained model:
y_pred = model.predict_proba(x_test)
Then we import the roc_auc_score function from sklearn and simply pass the original and predicted targets to it:
roc_auc_score(y_test, y_pred)
You can monitor AUC during training by providing metrics in the following way:
METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.BinaryAccuracy(name='accuracy'),
    keras.metrics.Precision(name='precision'),
    keras.metrics.Recall(name='recall'),
    keras.metrics.AUC(name='auc'),
]

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(train_features.shape[-1],)),
    keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer=keras.optimizers.Adam(lr=1e-3),
    loss=keras.losses.BinaryCrossentropy(),
    metrics=METRICS)
For a more detailed tutorial, see:
https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
'roc_curve' and 'auc' are not standard metrics; you can't pass them like that to the metrics argument.
You can pass something like 'fmeasure', which is a standard metric.
Review the available metrics here: https://keras.io/metrics/
You may also want to have a look at making your own custom metric: https://keras.io/metrics/#custom-metrics
Also have a look at the generate_results method mentioned in this blog for ROC and AUC:
https://vkolachalama.blogspot.in/2016/05/keras-implementation-of-mlp-neural.html
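For instance, a minimal custom metric in the style the Keras docs describe (mean_pred is the docs' own toy example, not an AUC implementation):

import keras.backend as K

def mean_pred(y_true, y_pred):
    # toy custom metric: the mean predicted probability
    return K.mean(y_pred)

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])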
Adding to the above answers, I got the error "ValueError: bad input shape ...", so I specified the vector of probabilities as follows:
y_pred = model.predict_proba(x_test)[:,1]
auc = roc_auc_score(y_test, y_pred)
print(auc)
Set your model architecture with tf.keras.metrics.AUC():
Read the Keras documentation on Classification metrics based on True/False positives & negatives.
def model_architecture_ann(in_dim, lr=0.0001):
    model = Sequential()
    model.add(Dense(512, input_dim=in_dim, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    opt = keras.optimizers.SGD(learning_rate=lr)
    auc = tf.keras.metrics.AUC(name='auc')
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=[auc])
    model.summary()
    return model
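A hypothetical usage sketch (X_train_filtered and y_train are assumed from the author's context):

model = model_architecture_ann(in_dim=X_train_filtered.shape[1], lr=0.001)
history = model.fit(X_train_filtered, y_train, epochs=10,
                    batch_size=32, validation_split=0.2)
# 'auc' then appears alongside the loss in the per-epoch training logs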