I have implemented the following metric to look at Precision and Recall of the classes I deem relevant.
metrics=[tf.keras.metrics.Recall(class_id=1, name='Bkwd_R'),tf.keras.metrics.Recall(class_id=2, name='Fwd_R'),tf.keras.metrics.Precision(class_id=1, name='Bkwd_P'),tf.keras.metrics.Precision(class_id=2, name='Fwd_P')]
How can I implement the same in TensorFlow 2.5 for the F1 score (i.e. specifically for class 1 and class 2, and not class 0), without a custom function?
Update
Using this metric setup:
tfa.metrics.F1Score(num_classes = 3, average = None, name = f1_name)
I get the following during training:
13367/13367 [==============================] 465s 34ms/step - loss: 0.1683 - f1_score: 0.5842 - val_loss: 0.0943 - val_f1_score: 0.3314
and when I do model.evaluate:
224/224 [==============================] - 11s 34ms/step - loss: 0.0665 - f1_score: 0.3325
and the scoring =
Score: [0.06653735041618347, array([0.99740255, 0. , 0. ], dtype=float32)]
The problem is that this is training based on the average over all classes, but I would like to train on the F1 score of each of the last two classes in the array (which are 0 in this case), or on a sensible average of them.
Edit
I will accept a non-TensorFlow-specific function that gives the desired result (with the full function and the call during fit), but I was really hoping for something using the existing TensorFlow code, if it exists.
You can have a look at https://www.tensorflow.org/addons/api_docs/python/tfa/metrics/F1Score in tensorflow-addons package.
Specifically, if you need a per-class score, set the average parameter to None. Setting it to macro instead returns the unweighted mean of the per-class scores.
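To make the per-class idea concrete, here is a minimal NumPy sketch of how a per-class F1 can be computed and then restricted to classes 1 and 2. It is not the tfa implementation; the function name `per_class_f1` and the toy labels are made up for illustration, and integer class labels are assumed.

```python
import numpy as np

def per_class_f1(y_true, y_pred, num_classes):
    """Return an array with one F1 score per class (integer labels assumed)."""
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never occurs
        scores[c] = 2 * tp / denom if denom > 0 else 0.0
    return scores

# Hypothetical labels and predictions for a 3-class problem
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 0, 1, 0, 2, 2, 1, 0])

f1 = per_class_f1(y_true, y_pred, num_classes=3)
f1_classes_1_2 = f1[[1, 2]].mean()  # macro F1 over classes 1 and 2 only
print(f1, f1_classes_1_2)
```

As a side note, the array reported in the question, [0.9974, 0, 0], averages to roughly 0.3325, which matches the f1_score value shown by model.evaluate. That suggests the progress bar shows the mean over all three classes, while the mean over classes 1 and 2 alone would be 0.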
As is mentioned in David Harris' comment, a neural network model is trained on loss functions, not on metric scores. Losses help drive the model towards a solution to provide accurate labels via backpropagation. Metrics help to provide a comparable evaluation of that model's performance that are a lot more human-legible.
So, that being said, I feel like what you're saying in your question is that "there are three classes, and I want the model to care more about the last two of the three".
If that's the case, one approach you can take is to weight your samples by label. Let's say that you have labels in an array y_train.
# Which classes are you wanting to focus on
classes_i_care_about = [1, 2]
# Initialize all weights to 1.0
sample_weight = np.ones(shape=(len(y_train),))
# Give the classes you care about 50% more weight
sample_weight[np.isin(y_train, classes_i_care_about)] = 1.5
...
model.fit(
x=X_train,
y=y_train,
sample_weight=sample_weight,
epochs=5
)
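To see why this shifts the model's attention, here is a small NumPy sketch of a sample-weighted cross-entropy. It assumes the weighted per-sample losses are simply averaged over the batch; the exact reduction Keras applies may differ, and the function name and toy probabilities are made up.

```python
import numpy as np

def weighted_cross_entropy(labels, probs, sample_weight):
    """Mean of the per-sample cross-entropy, each scaled by its sample weight."""
    per_sample = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(sample_weight * per_sample))

labels = np.array([0, 1, 2])
probs = np.array([
    [0.9, 0.05, 0.05],  # class 0 sample, predicted correctly
    [0.6, 0.3, 0.1],    # class 1 sample, predicted as class 0
    [0.5, 0.2, 0.3],    # class 2 sample, predicted as class 0
])

uniform = np.ones(len(labels))
# Same rule as above: classes 1 and 2 get 50% more weight
focused = np.where(np.isin(labels, [1, 2]), 1.5, 1.0)

loss_uniform = weighted_cross_entropy(labels, probs, uniform)
loss_focused = weighted_cross_entropy(labels, probs, focused)
print(loss_uniform, loss_focused)
```

With the heavier weights, mistakes on classes 1 and 2 contribute more to the loss, so their gradients are larger relative to those from class 0.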
This is the best advice I can offer without knowing more. If you're looking for other info on how you can have your model do better on certain classes, other info could be useful, such as:
What are the proportions of labels in your dataset?
What is the last layer of your model architecture? Dense(3, activation="softmax")?
What loss are you using?
Here's a more complete, reproducible example that shows what I'm talking about with the sample weights:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import tensorflow_addons as tfa
iris_data = load_iris() # load the iris dataset
x = iris_data.data
y_ = iris_data.target.reshape(-1, 1) # Convert data to a single column
# One Hot encode the class labels
encoder = OneHotEncoder(sparse=False)
y = encoder.fit_transform(y_)
# Split the data for training and testing
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.20)
# Build the model
def get_model():
    model = Sequential()
    model.add(Dense(10, input_shape=(4,), activation='relu', name='fc1'))
    model.add(Dense(10, activation='relu', name='fc2'))
    model.add(Dense(3, activation='softmax', name='output'))
    # Adam optimizer with learning rate of 0.001
    optimizer = Adam(lr=0.001)
    model.compile(
        optimizer,
        loss='categorical_crossentropy',
        metrics=[
            'accuracy',
            tfa.metrics.F1Score(
                num_classes=3,
                average=None,
            )
        ]
    )
    return model
model = get_model()
model.fit(
train_x,
train_y,
verbose=2,
batch_size=5,
epochs=25,
)
results = model.evaluate(test_x, test_y)
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))
print('Final test F1 scores: {}'.format(results[2]))
Final test set loss: 0.585964
Final test set accuracy: 0.633333
Final test F1 scores: [1. 0.15384616 0.6206897 ]
Now, we add weight to classes 1 and 2:
sample_weight = np.ones(shape=(len(train_y),))
sample_weight[
(train_y[:, 1] == 1) | (train_y[:, 2] == 1)
] = 1.5
model = get_model()
model.fit(
train_x,
train_y,
sample_weight=sample_weight,
verbose=2,
batch_size=5,
epochs=25,
)
results = model.evaluate(test_x, test_y)
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))
print('Final test F1 scores: {}'.format(results[2]))
Final test set loss: 0.437623
Final test set accuracy: 0.900000
Final test F1 scores: [1. 0.8571429 0.8571429]
Here, the model has put more emphasis on learning classes 1 and 2, and their F1 scores improve accordingly.
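If you would rather think in terms of per-class weights, the same sample weights can be derived from a dictionary. The sketch below assumes one-hot labels like train_y above; the helper name `sample_weights_from_class_weights` is made up for illustration.

```python
import numpy as np

def sample_weights_from_class_weights(y_onehot, class_weight):
    """Map each one-hot encoded sample to the weight of its class."""
    class_ids = np.argmax(y_onehot, axis=1)
    lookup = np.array([class_weight[c] for c in range(y_onehot.shape[1])])
    return lookup[class_ids]

# Toy one-hot labels for classes 0, 1, 2, 0
y_onehot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
], dtype=float)

class_weight = {0: 1.0, 1: 1.5, 2: 1.5}
weights = sample_weights_from_class_weights(y_onehot, class_weight)
print(weights)
```

The resulting array can be passed as sample_weight in model.fit, exactly like the one built by hand above.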
Related
I have a regression neural network with ten input features and three outputs. But all ten features do not have the same importance in loss function calculation (mean square error). So I want to define specific coefficients for each input feature to increase their role in the loss function.
Consider we define coefficients in an array: coeff=[5,20,2,1,4,5,6,2,9,15]. When mean squared error is measuring the distances of input features, for example, if the distance of the second feature is '60', this distance is multiplied by coefficient '20' from coeff array.
I guess I need to define a custom loss function, but how to pass the defined "coeff" array and multiply its elements with input features?
Updated
I guess my idea is similar to this code and this code, but I am not sure. However, I was unable to run the first one and got errors.
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold
from keras.models import Sequential
from keras.layers import Dense
# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    return X, y
# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs))
    model.compile(loss='mse', optimizer='adam')
    return model
# evaluate a model using repeated k-fold cross-validation
def evaluate_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # define evaluation procedure
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    # enumerate folds
    for train_ix, test_ix in cv.split(X):
        # prepare data
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # define model
        model = get_model(n_inputs, n_outputs)
        # fit model
        model.fit(X_train, y_train, verbose=0, epochs=100)
        # evaluate model on test set
        mse = model.evaluate(X_test, y_test, verbose=0)
        # store result
        print('>%.3f' % mse)
        results.append(mse)
    return results
# load dataset
X, y = get_dataset()
# evaluate model
results = evaluate_model(X, y)
# summarize performance
print('MSE: %.3f (%.3f)' % (mean(results), std(results)))
If you use the functional API, you can add a custom loss function inside the model with model.add_loss. Your loss function can then use the model inputs and outputs and anything else in your model.
The problem with this approach is that inside the model you don't have the 'true' y values. So you need to add an additional input to your model and pass the y values to it, just for the loss calculation.
Something like this:
inputs = Input(shape=(n_inputs,))
x = Dense(20, ...)(inputs)
outputs = Dense(n_outputs)(x)
y_true = Input(shape=(n_outputs,))
modelx = Model(inputs=[inputs, y_true], outputs=outputs)
modelx.add_loss(your_loss_function(y_true=y_true, y_pred=outputs, inputs=inputs))
Since you already added the loss to the model, you compile it without any loss:
modelx.compile(loss=None, optimizer='adam')
When you fit the model, you need to pass the y values to the model inputs.
modelx.fit(x=[X_train, y_train], y=y_train, verbose=0, epochs=100)
When you want a model with just the X values as input, for example for prediction, you can create it like so:
model = Model(modelx.input[0], modelx.output)
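What your_loss_function should compute depends on what exactly the coefficients are meant to do, which the question leaves open. As one possible reading only, the NumPy sketch below scales each sample's squared error by a weight derived from its inputs and coeff. The weighting rule and the name `coeff_weighted_mse` are assumptions, and inside a Keras model the same arithmetic would have to be written with tensor operations instead of NumPy.

```python
import numpy as np

# Coefficients from the question, one per input feature
coeff = np.array([5, 20, 2, 1, 4, 5, 6, 2, 9, 15], dtype=float)

def coeff_weighted_mse(y_true, y_pred, inputs, coeff):
    """MSE where each sample is weighted by a coefficient-weighted input magnitude."""
    # Assumed rule: weight = sum_j coeff[j] * |inputs[:, j]|, normalized to mean 1
    w = np.abs(inputs) @ coeff
    w = w / w.mean()
    per_sample_mse = np.mean((y_true - y_pred) ** 2, axis=1)
    return float(np.mean(w * per_sample_mse))

# Toy data: 4 samples, 10 input features, 3 outputs
inputs = np.arange(40, dtype=float).reshape(4, 10)
y_true = np.zeros((4, 3))
y_pred = np.ones((4, 3))

# Every sample has the same squared error here, so the normalized weights cancel out
loss_equal_errors = coeff_weighted_mse(y_true, y_pred, inputs, coeff)
print(loss_equal_errors)
```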
I was practicing the keras classification for imbalanced data. I followed the official example:
https://keras.io/examples/structured_data/imbalanced_classification/
and used the scikit-learn api to do cross-validation.
I have tried the model with different parameters.
However, every time, one of the 3 folds has a recall of 0.
e.g.
results [0.99242424 0.99236641 0. ]
What am I doing wrong?
How can I get all three validation recall values to be of the order of 0.8?
MWE
%%time
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
import os
import random
SEED = 100
os.environ['PYTHONHASHSEED'] = str(SEED)
np.random.seed(SEED)
random.seed(SEED)
tf.random.set_seed(SEED)
# load the data
ifile = "https://github.com/bhishanpdl/Datasets/blob/master/Projects/Fraud_detection/raw/creditcard.csv.zip?raw=true"
df = pd.read_csv(ifile,compression='zip')
# train test split
target = 'Class'
Xtrain,Xtest,ytrain,ytest = train_test_split(df.drop([target],axis=1),
df[target],test_size=0.2,stratify=df[target],random_state=SEED)
print(f"Xtrain shape: {Xtrain.shape}")
print(f"ytrain shape: {ytrain.shape}")
# build the model
def build_fn(n_feats):
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(256, activation="relu", input_shape=(n_feats,)))
    model.add(keras.layers.Dense(256, activation="relu"))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Dense(256, activation="relu"))
    model.add(keras.layers.Dropout(0.3))
    # last layer is dense 1 for binary sigmoid
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    # compile
    model.compile(loss='binary_crossentropy',
                  optimizer=keras.optimizers.Adam(1e-2),
                  metrics=['Recall'])
    return model
# fitting the model
n_feats = Xtrain.shape[-1]
counts = np.bincount(ytrain)
weight_for_0 = 1.0 / counts[0]
weight_for_1 = 1.0 / counts[1]
class_weight = {0: weight_for_0, 1: weight_for_1}
FIT_PARAMS = {'class_weight' : class_weight}
clf_keras = KerasClassifier(build_fn=build_fn,
n_feats=n_feats, # custom argument
epochs=30,
batch_size=2048,
verbose=2)
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=SEED)
results = cross_val_score(clf_keras, Xtrain, ytrain,
cv=skf,
scoring='recall',
fit_params = FIT_PARAMS,
n_jobs = -1,
error_score='raise'
)
print('results', results)
Result
Xtrain shape: (227845, 30)
ytrain shape: (227845,)
results [0.99242424 0.99236641 0. ]
CPU times: user 3.62 s, sys: 117 ms, total: 3.74 s
Wall time: 5min 15s
Problem
I am getting 0 for the third recall. I expect it to be of the order of 0.8. How can I make sure all three values are around 0.8 or more?
MilkyWay001,
You have chosen to use sklearn wrappers for your model - they have benefits, but the model training process is hidden. Instead, I trained the model separately with validation dataset added. The code for this would be:
clf_1 = KerasClassifier(build_fn=build_fn,
n_feats=n_feats)
clf_1.fit(Xtrain, ytrain, class_weight=class_weight,
validation_data=(Xtest, ytest),
epochs=30,batch_size=2048,
verbose=1)
In the Model.fit() output you can clearly see that while the loss goes down, recall is not stable. This led to the poor performance in CV, reflected in the zeros in the CV results that you observed.
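A recall of exactly 0 is what you would get if the model in that fold ended up predicting the negative class for every sample. That this is what happened in your third fold is an assumption on my part; the toy NumPy sketch below (made-up labels) only shows the arithmetic.

```python
import numpy as np

def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for the positive class (label 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Imbalanced toy labels: 2 positives out of 10
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])

# A collapsed model that always predicts the majority class
all_negative = np.zeros_like(y_true)

recall_collapsed = recall(y_true, all_negative)
accuracy_collapsed = float(np.mean(all_negative == y_true))
print(recall_collapsed, accuracy_collapsed)
```

The accuracy still looks respectable in that case, which is why recall is the more telling number on imbalanced data.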
I fixed this by reducing the learning rate to just 0.0001. Although it is 100 times smaller than yours, it reaches 98% recall on train and 100% (or close) on test in just 10 epochs.
Your code needs just one fix to achieve stable results: change the LR to a much lower one, like 0.0001:
optimizer=keras.optimizers.Adam(1e-4),
You can experiment with LR in the range < 0.001.
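As a rough intuition for why a large LR can make training unstable, here is a toy sketch using plain gradient descent on f(x) = x^2. This is only an analogy: Adam behaves differently from plain gradient descent, and the step sizes below are picked for illustration, not taken from your model.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with a fixed step size and return every iterate."""
    x = x0
    history = [x]
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # plain gradient descent update
        history.append(x)
    return history

stable = gradient_descent(lr=0.1)    # each step shrinks x by a factor of 0.8
unstable = gradient_descent(lr=1.1)  # each step multiplies x by -1.2

print(stable[-1], unstable[-1])
```

With the small step the iterates move steadily towards the minimum at 0, while with the large step they flip sign on every update and grow in magnitude.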
For reference, with LR 0.0001 I got:
results [0.99242424 0.97709924 1. ]
Good luck!
PS: thanks for including a compact and complete MWE
I am trying to change a CNN classification model to a CNN regression model. The classification model had some press statements as an input and the change (0 for negative return on the release day and 1 for positive change) of an Index as the second variable. Now I am trying to change the model from a classification to a regression in the end, so that I can work with the actual returns and not with the binary classification.
So my input in the neural network looks like this:
document VIX 1d
1999-05-18 Release Date: May 18, 1999\n\nFor immediate re... -0.010526
1999-06-30 Release Date: June 30, 1999\n\nFor immediate r... -0.082645
1999-08-24 Release Date: August 24, 1999\n\nFor immediate... -0.043144
(The document will be tokenized before going into the NN; this is just so that you have an example.)
I changed so far the following parameters:
- loss function is now the mean squared error (before: binary cross entropy) , the activation of the last layer now linear (before: sigmoid) and the metrics to mse (before: acc)
Below you can see my code:
all_words = [word for tokens in X for word in tokens]
all_sentence_lengths = [len(tokens) for tokens in X]
ALL_VOCAB = sorted(list(set(all_words)))
print("%s words total, with a vocabulary size of %s" % (len(all_words), len(ALL_VOCAB)))
print("Max sentence length is %s" % max(all_sentence_lengths))
####################### CHANGE THE PARAMETERS HERE #####################################
EMBEDDING_DIM = 300 # how big is each word vector
MAX_VOCAB_SIZE = 1893# how many unique words to use (i.e num rows in embedding vector)
MAX_SEQUENCE_LENGTH = 1086 # max number of words in a comment to use
tokenizer = Tokenizer(num_words=MAX_VOCAB_SIZE, lower=True, char_level=False)
tokenizer.fit_on_texts(change_df["document"].tolist())
training_sequences = tokenizer.texts_to_sequences(X_train.tolist())
train_word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(train_word_index))
train_embedding_weights = np.zeros((len(train_word_index)+1, EMBEDDING_DIM))
for word,index in train_word_index.items():
train_embedding_weights[index,:] = w2v_model[word] if word in w2v_model else np.random.rand(EMBEDDING_DIM)
print(train_embedding_weights.shape)
######################## TRAIN AND TEST SET #################################
train_cnn_data = pad_sequences(training_sequences, maxlen=MAX_SEQUENCE_LENGTH)
test_sequences = tokenizer.texts_to_sequences(X_test.tolist())
test_cnn_data = pad_sequences(test_sequences, maxlen=MAX_SEQUENCE_LENGTH)
def ConvNet(embeddings, max_sequence_length, num_words, embedding_dim, trainable=False, extra_conv=True):
    embedding_layer = Embedding(num_words,
                                embedding_dim,
                                weights=[embeddings],
                                input_length=max_sequence_length,
                                trainable=trainable)
    sequence_input = Input(shape=(max_sequence_length,), dtype='int32')
    embedded_sequences = embedding_layer(sequence_input)
    # Yoon Kim model (https://arxiv.org/abs/1408.5882)
    convs = []
    filter_sizes = [3, 4, 5]
    for filter_size in filter_sizes:
        l_conv = Conv1D(filters=128, kernel_size=filter_size, activation='relu')(embedded_sequences)
        l_pool = MaxPooling1D(pool_size=3)(l_conv)
        convs.append(l_pool)
    l_merge = concatenate([convs[0], convs[1], convs[2]], axis=1)
    # add a 1D convnet with global maxpooling, instead of Yoon Kim model
    conv = Conv1D(filters=128, kernel_size=3, activation='relu')(embedded_sequences)
    pool = MaxPooling1D(pool_size=3)(conv)
    if extra_conv == True:
        x = Dropout(0.5)(l_merge)
    else:
        # Original Yoon Kim model
        x = Dropout(0.5)(pool)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    preds = Dense(1, activation='linear')(x)
    model = Model(sequence_input, preds)
    model.compile(loss='mean_squared_error',
                  optimizer='adadelta',
                  metrics=['mse'])
    model.summary()
    return model
x_train = train_cnn_data
y_tr = y_train
x_test = test_cnn_data
model = ConvNet(train_embedding_weights, MAX_SEQUENCE_LENGTH, len(train_word_index)+1, EMBEDDING_DIM, False)
#define callbacks
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.01, patience=4, verbose=1)
callbacks_list = [early_stopping]
hist = model.fit(x_train, y_tr, epochs=5, batch_size=33, validation_split=0.1, shuffle=True, callbacks=callbacks_list)
y_tes=model.predict(x_test, batch_size=33, verbose=1)
Does someone have an idea what else I should change? The code is working, but I think I get very poor results. Running the code gives me the following result:
Epoch 5/5
33/118 [=======>......................] - ETA: 15s - loss: 0.0039 - mse: 0.0039
66/118 [===============>..............] - ETA: 9s - loss: 0.0031 - mse: 0.0031
99/118 [========================>.....] - ETA: 3s - loss: 0.0034 - mse: 0.0034
118/118 [==============================] - 22s 189ms/step - loss: 0.0035 - mse: 0.0035 - val_loss: 0.0060 - val_mse: 0.0060
Or at least a source where I can read up on this? I only find classification CNNs on the web, but no example of an NLP CNN doing regression.
Thanks a lot,
Lukas
This is a great example. Copy/paste the code, load the datasets; it should answer all of your questions.
# Classification with Tensorflow 2.0
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
# %matplotlib inline
import seaborn as sns
sns.set(style="darkgrid")
cols = ['price', 'maint', 'doors', 'persons', 'lug_capacity', 'safety', 'output']
cars = pd.read_csv(r'C:\\your_path\\cars_dataset.csv', names=cols, header=None)
cars.head()
price = pd.get_dummies(cars.price, prefix='price')
maint = pd.get_dummies(cars.maint, prefix='maint')
doors = pd.get_dummies(cars.doors, prefix='doors')
persons = pd.get_dummies(cars.persons, prefix='persons')
lug_capacity = pd.get_dummies(cars.lug_capacity, prefix='lug_capacity')
safety = pd.get_dummies(cars.safety, prefix='safety')
labels = pd.get_dummies(cars.output, prefix='condition')
# To create our feature set, we can merge the first six columns horizontally:
X = pd.concat([price, maint, doors, persons, lug_capacity, safety] , axis=1)
# Let's see how our label column looks now:
labels.head()
y = labels.values
# The final step before we can train our TensorFlow 2.0 classification model is to divide the dataset into training and test sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
# Model Training
# To train the model, let's import the TensorFlow 2.0 classes. Execute the following script:
from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model
# The next step is to create our classification model:
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(15, activation='relu')(input_layer)
dense_layer_2 = Dense(10, activation='relu')(dense_layer_1)
output = Dense(y.shape[1], activation='softmax')(dense_layer_2)
model = Model(inputs=input_layer, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
# The following script shows the model summary:
print(model.summary())
# Result:
# Model: "model"
# Layer (type) Output Shape Param #
# Finally, to train the model execute the following script:
history = model.fit(X_train, y_train, batch_size=8, epochs=50, verbose=1, validation_split=0.2)
# Result:
# Train on 7625 samples, validate on 1907 samples
# Epoch 1/50
# - 4s 492us/sample - loss: 3.0998 - acc: 0.2658 - val_loss: 12.4542 - val_acc: 0.0834
# Let's finally evaluate the performance of our classification model on the test set:
score = model.evaluate(X_test, y_test, verbose=1)
print("Test Score:", score[0])
print("Test Accuracy:", score[1])
# Result:
# Regression with TensorFlow 2.0
petrol_cons = pd.read_csv(r'C:\\your_path\\gas_consumption.csv')
# Let's print the first five rows of the dataset via the head() function:
petrol_cons.head()
X = petrol_cons.iloc[:, 0:4].values
y = petrol_cons.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Model Training
# The next step is to train our model. This process is quite similar to training the classification model. The only changes are the loss function and the number of nodes in the output dense layer. Since we are now predicting a single continuous value, the output layer has only 1 node.
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(100, activation='relu')(input_layer)
dense_layer_2 = Dense(50, activation='relu')(dense_layer_1)
dense_layer_3 = Dense(25, activation='relu')(dense_layer_2)
output = Dense(1)(dense_layer_3)
model = Model(inputs=input_layer, outputs=output)
model.compile(loss="mean_squared_error" , optimizer="adam", metrics=["mean_squared_error"])
# Finally, we can train the model with the following script:
history = model.fit(X_train, y_train, batch_size=2, epochs=100, verbose=1, validation_split=0.2)
# Result:
# Train on 30 samples, validate on 8 samples
# Epoch 1/100
# To evaluate the performance of a regression model on the test set, one of the most commonly used metrics is the root mean squared error. We can find the mean squared error between the predicted and actual values via the mean_squared_error function of the sklearn.metrics module. We can then take the square root of the resulting mean squared error. Look at the following script:
from sklearn.metrics import mean_squared_error
from math import sqrt
pred_train = model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train,pred_train)))
# Result:
# 57.398156439652396
pred = model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test,pred)))
# Result:
# 86.61012708343948
# https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
# datasets:
# https://www.kaggle.com/elikplim/car-evaluation-data-set
# for OLS analysis
import statsmodels.api as sm
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
# Results:
OLS Regression Results
=======================================================================================
Dep. Variable: y R-squared (uncentered): 0.987
Model: OLS Adj. R-squared (uncentered): 0.986
Method: Least Squares F-statistic: 867.8
Date: Thu, 09 Apr 2020 Prob (F-statistic): 3.17e-41
Time: 13:13:11 Log-Likelihood: -269.00
No. Observations: 48 AIC: 546.0
Df Residuals: 44 BIC: 553.5
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
x1 -14.2390 8.414 -1.692 0.098 -31.196 2.718
x2 -0.0594 0.017 -3.404 0.001 -0.095 -0.024
x3 0.0012 0.003 0.404 0.688 -0.005 0.007
x4 1630.8913 130.969 12.452 0.000 1366.941 1894.842
==============================================================================
Omnibus: 9.750 Durbin-Watson: 2.226
Prob(Omnibus): 0.008 Jarque-Bera (JB): 9.310
Skew: 0.880 Prob(JB): 0.00952
Kurtosis: 4.247 Cond. No. 1.00e+05
==============================================================================
data sources:
https://www.kaggle.com/elikplim/car-evaluation-data-set
https://drive.google.com/file/d/1mVmGNx6cbfvRHC_DvF12ZL3wGLSHD9f_/view
Maybe two more questions:
1. You are getting quite high values for the regression root mean squared error (57.39 and 86.61), while I get 0.0851 (train) and 0.1169 (test) for my dataset. It seems that my values are quite good, right? The lower the root mean squared error the better, or not? I had my statistics class quite a while ago... :D
2. Do you maybe know (or have an example of) how I would include another variable in the regression in a neural network? In my case, I have text data and returns I want to predict. I would like to include some macroeconomic (control) variables as well.
Thanks!
I am using an LSTM to generate news headlines. It should predict the next character based on the previous characters in the sequence. I have a file of over one million news headlines, but I've chosen to look at 100,000 of them, randomly selected, for speed reasons.
When I try to train my model, in just the first epoch it reaches 1.0 validation accuracy and 0.9986 training accuracy. This certainly can't be correct. I don't think it is a lack of data that is the issue because 90000 training data points should be more than enough. This seems like more than your basic overfitting. It also takes what seems to be an excessive amount of time (about 2.5 minutes for each epoch), but I have never worked with LSTMs before so I'm not sure what to expect as far as train time. What could be causing my model to perform like this?
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Import Libraries Section
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
import csv
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
import datetime
import matplotlib.pyplot as plt
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Load Data Section
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
headlinesFull = []
with open("abcnews-date-text.csv", "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for lines in csv_reader:
        headlinesFull.append(lines['headline_text'])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Pretreat Data Section
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# shuffle and select 100000 headlines
np.random.shuffle(headlinesFull)
headlines = headlinesFull[:100000]
# add spaces to ensure each headline is the same length as the longest headline
max_len = max(map(len, headlines))
headlines = [i + " "*(max_len-len(i)) for i in headlines]
# integer encode sequences of words
# create the tokenizer
t = Tokenizer(char_level=True)
# fit the tokenizer on the headlines
t.fit_on_texts(headlines)
sequences = t.texts_to_sequences(headlines)
# vocabulary size
vocab_size = len(t.word_index) + 1
# separate into input and output
sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)
seq_len = X.shape[1]
# split data for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Define Model Section
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# define model
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_len))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Train Model Section
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# fit model
model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=128, epochs=1)
Train on 90000 samples, validate on 10000 samples
Epoch 1/1
90000/90000 [==============================] - 161s 2ms/step - loss: 0.0493 - acc: 0.9986 - val_loss: 2.3842e-07 - val_acc: 1.0000
From reading the code, this is what I can infer:
You are using spaces as filler to pad every headline to the maximum headline length: headlines = [i + " "*(max_len-len(i)) for i in headlines]
The headlines are converted to sequences, and the input/output split is done only after every headline has been padded to the maximum length.
So, for most inputs, the last character (the output) is the same filler character, and that is why you get such high accuracy after just one epoch.
Solution:
You can add the filler at the start of each headline instead of appending it at the end.
headlines = [" "*(max_len-len(i)) + i for i in headlines]
Or, add the filler at the end of each input after splitting the headlines into X and y.
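Here is a small pure-Python sketch of the effect and of both fixes, using three made-up headlines instead of the real dataset. It works on raw characters rather than tokenized sequences to keep things short.

```python
headlines = ["rain", "storm hits", "sun"]
max_len = max(map(len, headlines))

# Original approach: pad on the right, then take the last character as the target
right_padded = [h + " " * (max_len - len(h)) for h in headlines]
targets_right = [h[-1] for h in right_padded]

# Fix 1: pad on the left, so the last character is a real character
left_padded = [" " * (max_len - len(h)) + h for h in headlines]
targets_left = [h[-1] for h in left_padded]

# Fix 2: split into input and target first, then pad only the inputs
pairs = [(h[:-1], h[-1]) for h in headlines]
inputs_padded = [x + " " * (max_len - 1 - len(x)) for x, _ in pairs]
targets_split = [y for _, y in pairs]

print(targets_right)  # mostly the filler character
print(targets_left)
print(targets_split)
```

With right padding, every headline shorter than the longest one has a space as its target, so always predicting a space is nearly always correct.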
I am new to Keras and I want to train the model with the F1 score as my metric.
I came across two things: one is that I can add callbacks, and the other is using the built-in metrics function.
Here, it says that the metrics function will not be used for training the model. So, does that mean I can pass anything in the metrics argument while compiling the model?
Specifically,
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
In the above case even though accuracy is passed as metrics, it will not be used for training the model.
Second thing is to use callbacks as defined here,
import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
class Metrics(Callback):
    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []
    def on_epoch_end(self, epoch, logs=None):
        val_predict = (np.asarray(self.model.predict(self.model.validation_data[0]))).round()
        val_targ = self.model.validation_data[1]
        _val_f1 = f1_score(val_targ, val_predict)
        _val_recall = recall_score(val_targ, val_predict)
        _val_precision = precision_score(val_targ, val_predict)
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)
        print(" - val_f1: %f - val_precision: %f - val_recall: %f" % (_val_f1, _val_precision, _val_recall))
        return
metrics = Metrics()
Then fit the model,
model.fit(training_data, training_target,
validation_data=(validation_data, validation_target),
epochs=10,
batch_size=64,
callbacks=[metrics])
I am not sure if this will train the model on f1 score.
You can't train a neural network on F1 scores. To backpropagate the error during training, you need a differentiable function that tells you how far your prediction is from the expected value. One example of such a function is the MSE loss.
The F1 score, on the other hand, is just the harmonic mean of precision and recall over your samples. It does not tell you in which direction to update the weights in order to get a better model, nor how far your prediction is from the expected value.
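A small NumPy sketch of that point, with made-up probabilities: nudging the predictions slightly in the right direction lowers the cross-entropy, while the F1 score (computed after thresholding at 0.5) does not move at all, so it gives no signal to follow.

```python
import numpy as np

def f1_at_threshold(y_true, probs, threshold=0.5):
    """F1 of the positive class after thresholding the probabilities."""
    y_pred = (probs >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def binary_cross_entropy(y_true, probs):
    """Mean binary cross-entropy, a smooth function of the probabilities."""
    return float(-np.mean(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs)))

y_true = np.array([1, 0, 1, 0])
probs = np.array([0.6, 0.4, 0.3, 0.2])

# Move every probability a little towards its true label
nudged = probs + np.array([0.01, -0.01, 0.01, -0.01])

f1_before, f1_after = f1_at_threshold(y_true, probs), f1_at_threshold(y_true, nudged)
bce_before, bce_after = binary_cross_entropy(y_true, probs), binary_cross_entropy(y_true, nudged)
print(f1_before, f1_after, bce_before, bce_after)
```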
What you could do is print the F1 score after every epoch. An example of how to do this can be found in this blog post.