Keras neural network accuracy problem (100% in few epochs)

Keras neural network accuracy problem (100% in few epochs) - python

I'm writing here, hoping to solve a problem, that i had with a neural network, developed in python by using keras.
I'm newer with the deep learning world, and I'm studying the theory and trying to implement some code.
Goal: develop a net that allows me to recognize 2 different words (commands) that i say [in the future them will be used to drive a small robot-car]. Actually, I want only to achieve the identification of "yes/no"
Implementation: i'm trying to implement a binary classification network.
Here is my code:
i used librosa to convert the audio training and test set in a matrix input with 193 features
to overcame the possibility of batch normalization problem, i scaled the data, by using the preprocessing package (I saw that effectively that improves and affects performance): i notice that if I don't normalize training, test and data_to_be_anlyzed by using the same normalization, it doesn't work
I read that keras accept as input that can be both numpy array, so i convert the target y into numpy
i proceed with the construction of the model, training and test (i know that i'm using methods that actually are deprecated)
I use one audio to perform another text, because for the future i assume that the net will receive (and judge) one audio at-a-time
import support.myutilities as utilNP
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import StandardScaler
#READ AND BUILD INPUT
X_si = utilNP.np_array_features_dir('DIRECTORY PATH')
X_no = utilNP.np_array_features_dir('DIRECTORY PATH')
X_tot = np.concatenate((X_si, X_no), axis=0)
# Scale the train set
scaler = StandardScaler().fit(X_tot)
X_train = scaler.transform(X_tot)
#0=si 1=no
y=[]
for i in range(len(X_si)):
y.append(0)
for i in range(len(X_no)):
y.append(1)
Y=np.array(y)
#READ AND BUIL TEST TRAINING SET
X_si_test = utilNP.np_array_features_dir('DIRECTORY PATH')
X_no_test = utilNP.np_array_features_dir('DIRECTORY PATH')
X_tot_test = np.concatenate((X_si_test, X_no_test), axis=0)
# Scale the test set
scaler2 = StandardScaler().fit(X_tot_test)
X_test = scaler2.transform(X_tot_test)
y_test=[]
for i in range(len(X_si_test)):
y_test.append(0)
for i in range(len(X_no_test)):
y_test.append(1)
Y_test=np.array(y_test)
###### BUILD MODEL
model = Sequential()
model.add(Dense(100, input_dim=len(X_tot[0]), activation='relu')) #193 features as input
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid')) #1 output
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(X_train, Y, epochs=300, verbose=1)
#test
scores = model.evaluate(X_test, Y_test, verbose=0)
print('Accuracy on training data: {}% \n Error on training data: {}'.format(scores[1], 1 - scores[1]))
predictions = model.predict(X_test)
for i in range(len(predictions)):
print('=> %d (expected %d)' % (predictions[i], y_test[i]))
#TEST WITH A PRACTICAL NEW SOUND: supposed acquired
file_name = 'PATH AUDIO'
X = utilNP.np_array_features(file_name)
#Normalize according to input data
X_analyze = scaler2.transform(X)
y_analysis=[]
y_analysis.append(1) # i supposed that the audio is the one that return 1
pred_test= model.predict(X_analyze)
scores2 = model.evaluate(X_analyze, np.array(y_analysis), verbose=0)
print('Accuracy on test data: {}% \n Error on test data: {}'.format(scores2[1], 1 - scores2[1]))
Problems:
accuracy go in 100% in very few epochs. Is real that the training set is not so big (a total of 300 samples, and 40 for test), but this result is clearly wrong. By the way, if I use a number of epochs > 100, the net works well and performs its work good (practically the result of the single case study, is recognized)
if the number of epochs is low (20 for example), accuracy still reach 100% after few iterations, but the training is affect by the error in the results (why are not recognized?) and the final prediction wrong. It is not normal: i would expect a low accuracy to justify the wrong answer, but it remains at 100%
I test a lot of solution, passing from setting 'training=True/False', and read a lot of answer here and in stack exchange, but I don't solved nothing.
There is something wrong in my code?
Thanks in advance.

Related

Overfitting on LSTM text classification using Keras

I am trying to develop an LSTM model using Keras, following this tutorial. However, I am implementing it with a different dataset of U.S. political news articles with the aim of classifying them based on a political bias (labels: Left, Centre and Right). I have gotten a model to run with the tutorial, but the loss and accuracy would look very off, like this:
I tried to play around with different DropOut probabilities (i.e. 0.5 instead of 0.2), adding/removing hidden layers (and making them less dense), and decreasing/increasing the max number of words and max sequence length.
I have managed to get the graphs to align a bit more, however, that has led to the model having less accuracy with the training data (and the problem of overfitting is still bad):
Additionally, I am not sure why the validation accuracy always seems to be higher than the model accuracy in the first epoch (shouldn't it usually be lower)?
Here is some code that is being used when tokenizing, padding, and initializing variables:
# The maximum number of words to be used. (most frequent)
MAX_NB_WORDS = 500
# Max number of words in each news article
MAX_SEQUENCE_LENGTH = 100 # I am aware this may be too small
# This is fixed.
EMBEDDING_DIM = 64
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?#[\]^_`{|}~',
lower=True)
tokenizer.fit_on_texts(df_raw['titletext'].values)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X = tokenizer.texts_to_sequences(df_raw['titletext'].values)
X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of data tensor:', X.shape)
Y = pd.get_dummies(df_raw['label']).values
print('Shape of label tensor:', Y.shape)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.20)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
X_train.view()
When I look at what is shown when X_train.view() is executed, I am also not sure why all the arrays start with zeros like this:
I also did a third attempt that was just a second attempt with the number of epochs increased, it looks like this:
Here is the code of the actual model:
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
# model.add(SpatialDropout1D(0.2)) ---> commented out
# model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2)) ---> commented out
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dropout(0.5))
model.add(Dense(8))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epochs = 25
batch_size = 64
history = model.fit(X_train, Y_train, epochs=epochs,
batch_size=batch_size,validation_split=0.2,callbacks=[EarlyStopping(monitor='val_loss', patience=3, min_delta=0.0001)])
Here is the link to the full code, including the dataset
Any help would be greatly appreciated!

Hyperparameter adjustments for reducing overfitting in neural networks
Identify and ascertain overfitting. The first attempt shows largely overfitting, with early divergence of your test & train loss. I would try a lower learning rate here (in addition to the steps you took for regularisation with dropout layers). Using the default rate does not guarantee best results.
Allowing your model to find the global mimima / not being stuck in a local minima. On the second attempt, it looks better. However, if the x-axis shows the number of epochs -- it could be that your early stopping is too strict? ie. increase the threshold. Consider other optimisers, including SGD with a learning rate scheduler.
Too large network leads to overfitting on the trainset and difficulty in generalisation. Too many neurons may cause the network to 'memorize' all you trainset and overfit. I would try out 8, 16 or 24 neurons in your LSTM layer for example.
Data preprocessing & cleaning. Check your padding_sequences. It is probably padding the start of each text with zeros. I would pad post text.
Dataset. Depending on the size of your current dataset, I would suggest data augmentation to get to a sizable amount of text of training (empirically >=1M words). I would also try several techniques including feature engineering / improving data quality such as, spell checks. Are the classes imbalanced? You may need to balance them out by over/undersampling.
Consider using transfer learning and incorporate trained language models as your embeddings layer instead of training one from scratch. ie. https://www.gcptutorials.com/post/how-to-create-embedding-with-tensorflow

How to improve accuracy of model for categorical, non-binary, foreign language sentiment analysis in TensorFlow?

TLDR
My aim is to categorize sentences in a foreign language (Hungarian) to 3 sentiment categories: negative, neutral & positive. I would like to improve the accuracy of the model used, which can be found below in the "Define, Compile, Fit the Model" section. The rest of the post is here for completeness and reproducibility.
I am new to asking questions on Machine Learning topics, suggestions are welcome here as well: How to ask a good question on Machine Learning?
Data preparation
For this I have 10000 sentences, given to 5 human annotators, categorized as negative, neutral or positive, available from here. The first few lines look like this:
I categorize the sentence positive (denoted by 2) if sum of the scores by annotators is positive, neutral if it is 0 (denoted by 1), and negative (denoted by 0) if the sum is negative:
import pandas as pd
sentences_df = pd.read_excel('/content/OpinHuBank_20130106.xls')
sentences_df['annotsum'] = sentences_df['Annot1'] +\
sentences_df['Annot2'] +\
sentences_df['Annot3'] +\
sentences_df['Annot4'] +\
sentences_df['Annot5']
def categorize(integer):
if 0 < integer: return 2
if 0 == integer: return 1
else: return 0
sentences_df['sentiment'] = sentences_df['annotsum'].apply(categorize)
Following this tutorial, I use SubwordTextEncoder to proceed. From here, I download web2.2-freq-sorted.top100k.nofreqs.txt, which contains 100000 most frequently used word in the target language. (Both the sentiment data and this data was recommended by this.)
Reading in list of most frequent words:
wordlist = pd.read_csv('/content/web2.2-freq-sorted.top100k.nofreqs.txt',sep='\n',header=None,encoding = 'ISO-8859-1')[0].dropna()
Encoding data, conversion to tensors
Initializing encoder using build_from_corpus method:
import tensorflow_datasets as tfds
encoder = tfds.features.text.SubwordTextEncoder.build_from_corpus(
corpus_generator=(word for word in wordlist), target_vocab_size=2**16)
Building on this, encoding the sentences:
import numpy as np
import tensorflow as tf
def applyencoding(string):
return tf.convert_to_tensor(np.asarray(encoder.encode(string)))
sentences_df['encoded_sentences'] = sentences_df['Sentence'].apply(applyencoding)
Convert to a tensor each sentence's sentiment:
def tensorise(input):
return tf.convert_to_tensor(input)
sentences_df['sentiment_as_tensor'] = sentences_df['sentiment'].apply(tensorise)
Defining how much data to be preserved for testing:
test_fraction = 0.2
train_fraction = 1-test_fraction
From the pandas dataframe, let's create numpy array of encoded sentence train tensors:
nparrayof_encoded_sentence_train_tensors = \
np.asarray(sentences_df['encoded_sentences'][:int(train_fraction*len(sentences_df['encoded_sentences']))])
These tensors have different lengths, so lets use padding to make them have the same:
padded_nparrayof_encoded_sentence_train_tensors = tf.keras.preprocessing.sequence.pad_sequences(
nparrayof_encoded_sentence_train_tensors, padding="post")
Let's stack these tensors together:
stacked_padded_nparrayof_encoded_sentence_train_tensors = tf.stack(padded_nparrayof_encoded_sentence_train_tensors)
Stacking the sentiment tensors together as well:
stacked_nparray_sentiment_train_tensors = \
tf.stack(np.asarray(sentences_df['sentiment_as_tensor'][:int(train_fraction*len(sentences_df['encoded_sentences']))]))
Define, Compile, Fit the Model (ie the main point)
Define & compile the model as follows:
### THE QUESTION IS ABOUT THESE ROWS ###
model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 64),
tf.keras.layers.Conv1D(128, 5, activation='sigmoid'),
tf.keras.layers.GlobalAveragePooling1D(),
tf.keras.layers.Dense(6, activation='sigmoid'),
tf.keras.layers.Dense(3, activation='sigmoid')
])
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True), optimizer='adam', metrics=['accuracy'])
Fit it:
NUM_EPOCHS = 40
history = model.fit(stacked_padded_nparrayof_encoded_sentence_train_tensors,
stacked_nparray_sentiment_train_tensors,
epochs=NUM_EPOCHS)
The first few lines of the output is:
Testing results
As in TensorFlow's RNN tutorial, let's plot the results we gained so far:
import matplotlib.pyplot as plt
def plot_graphs(history):
plt.plot(history.history['accuracy'])
plt.plot(history.history['loss'])
plt.xlabel("Epochs")
plt.ylabel('accuracy / loss')
plt.legend(['accuracy','loss'])
plt.show()
plot_graphs(history)
Which gives us:
Prepare the testing data as we prepared the training data:
nparrayof_encoded_sentence_test_tensors = \
np.asarray(sentences_df['encoded_sentences'][int(train_fraction*len(sentences_df['encoded_sentences'])):])
padded_nparrayof_encoded_sentence_test_tensors = tf.keras.preprocessing.sequence.pad_sequences(
nparrayof_encoded_sentence_test_tensors, padding="post")
stacked_padded_nparrayof_encoded_sentence_test_tensors = tf.stack(padded_nparrayof_encoded_sentence_test_tensors)
stacked_nparray_sentiment_test_tensors = \
tf.stack(np.asarray(sentences_df['sentiment_as_tensor'][int(train_fraction*len(sentences_df['encoded_sentences'])):]))
Evaluate the model using only test data:
test_loss, test_acc = model.evaluate(stacked_padded_nparrayof_encoded_sentence_test_tensors,stacked_nparray_sentiment_test_tensors)
print('Test Loss: {}'.format(test_loss))
print('Test Accuracy: {}'.format(test_acc))
Giving result:
Full notebook available here.
The question
How can I change the model definition and compilation rows above to have higher accuracy on the test set after no more than 1000 epochs?

You are using word piece subwords, you can try BPE. Also, you can build your model upon BERT and use transfer learning, that will literally skyrocket your results.
Firstly, change the kernel size in your Conv1D layer and try various values for it. Recommended would be [3, 5, 7]. Then, consider adding layers. Also, in the second last layer i.e. Dense, increase the number of units in it, that might help. Alternately, you can try a network with just LSTM layers or LSTM layers followed by Conv1D layer.
By trying out if it works then great otherwise repeat. But, the training loss gives a hint about it, if you see, the loss is not going down smoothly, you may assume, that your network is lacking predictive power i.e. underfitting and increase the number of neurons in it.
Yes, more data does help. But, if the fault is in your network i.e. it is underfitting, then, it won't help. First, you should explore the limits of the model you have before looking for faults in the data.
Yes, using the most common words is the usual norm because probabilistically, the less used words won't occur more and thus, won't affect the predictions greatly.

What Does Accuracy Metrics Mean in Keras' Sample Denoising Autoencoder?

I am working with Keras' sample denoising autoencoder;
https://keras.io/examples/mnist_denoising_autoencoder/
As I compile it, I use the following options:
autoencoder.compile(loss='mse', optimizer= Adadelta, metrics=['accuracy'])
Followed by training. I did training deliberately WITHOUT using noisy training data(x_train_noisy), but merely tried to recover x_train.
autoencoder.fit(x_train, x_train, epochs=30, batch_size=128)
After training 60,000 inputs of MNIST digits, it gives me an accuracy of 81.25%. Does it mean there are 60000*81.25% images are PERFECTLY recovered (equaling to the original input pixel by pixel), that is, 81.25% output images from the autoencoder are IDENTICAL to their input counterparts, or something else?
Furthermore, I also conducted a manual check by comparing output and the original data (60000 28X28 matrices) pixel by pixel--counting non-zeros elements from their differences:
x_decoded = autoencoder.predict(x_train)
temp = x_train*255
x_train_uint8 = temp.astype('uint8')
temp = x_decoded*255
x_decoded_uint8 = temp.astype('uint8')
c = np.count_nonzero(x_train_uint8 - x_decoded_uint8)
cp = 1-c /60000/28/28
Yet cp is only about 71%. Could any tell me why there is a difference?

Accuracy doesn't make sense for a regression problem, hence the keras sample doesn't use that metric during autoencoder.compile.
In this case, keras calculates the accuracy as per this metric.
binary_accuracy
def binary_accuracy(y_true, y_pred):
return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
Using this numpy implementation, you should get the same value as output by Keras for validation accuracy at the end of training.
x_decoded = autoencoder.predict(x_test_noisy)
acc = np.mean(np.equal(x_test, np.round(x_decoded)))
print(acc)
Refer this answer for more details:
What function defines accuracy in Keras when the loss is mean squared error (MSE)?

Implemention of Neural Networks

I'm trying to understand how to implement neural networks. So I made my own dataset. Xtrain is numpy.random floats. Ytrain is sign(sin(1/x^3).
Try to implement neural networks gave me very poor results. 30%accuracy. Random Forest with 100 trees give 97%. But I heard that NN can approximate any function. What is wrong in my understanding?
import numpy as np
import keras
import math
from sklearn.ensemble import RandomForestClassifier as RF
train = np.random.rand(100000)
test = np.random.rand(100000)
def g(x):
if math.sin(2*3.14*x) > 0:
if math.cos(2*3.14*x) > 0:
return 0
else:
return 1
else:
if math.cos(2*3.14*x) > 0:
return 2
else:
return 3
def f(x):
x = (1/x) ** 3
res = [0, 0, 0, 0]
res[g(x)] = 1
return res
ytrain = np.array([f(x) for x in train])
ytest = np.array([f(x) for x in test])
train = np.array([[x] for x in train])
test = np.array([[x] for x in test])
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding, LSTM
model = Sequential()
model.add(Dense(100, input_dim=1))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(4))
model.add(Activation('softmax'))
model.compile(optimizer='sgd',
loss='categorical_crossentropy',
metrics=['accuracy'])
P.S. I tried out many layers, activation functions, loss functions, optimizers, but never got more than 30% accuracy :(

I suspect that the 30% accuracy is a combination of small learning rate setting and a small training-step setting.
I ran your code snippet with model.fit(train, ytrain, nb_epoch=5, batch_size=32), after 5 epoch's training it yields about 28% accuracy. With the same setting but increasing the training steps to nb_epoch=50, the loss drops to ~1.157 ish and the accuracy raises to 40%. Further increase training steps should lead the model to further converging. Other than that, you can also try to configure the model with a larger learning rate setting which could make the converging faster :
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.1, momentum=0.9, nesterov=True), metrics=['accuracy'])
Although be careful don't set the learning rate to be too large otherwise your loss could blow up.
EDIT:
NN is known for having the potential for modeling extremely complex function, however, whether or not the model actually produce a good performance is a matter of how the model is designed, trained, and many other matters related to the specific application.

Zhongyu Kuang's answer is correct in stating that you may need to train it longer or with a different learning rate.
I'll add that the deeper your network, the longer you'll need to train it before it converges. For a relatively simple function like sign(sin(1/x^3)), you may be able to get away with a smaller network than the one you're using.
Additionally, softmax probably isn't the best output layer. You just need to yield -1 or 1. A single tanh unit seems like it would do well. softmax is generally used when you want to learn a probability distribution over a finite set. (You'll probably want to switch your error function from cross entropy to mean square error for similar reasons.)
Try a network with one sigmoidal hidden layer and an output layer with just one tanh unit. Then play around with the layer size and learning rate. Maybe add a second hidden layer if you can't get results with just one, but I wouldn't be surprised if it's unnecessary.
Addendum: In this approach, you'll replace f(x) with a direct calculation of the target function instead of the one-hot vector you're using currently.

Neural network accuracy optimization

I have constructed an ANN in keras which has 1 input layer(3 inputs), one output layer (1 output) and two hidden layers with with 12 and 3 nodes respectively.
The way i construct and train my network is:
from keras.models import Sequential
from keras.layers import Dense
from sklearn.cross_validation import train_test_split
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("sorted output.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:3]
Y = dataset[:,3]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Dense(12, input_dim=3, init='uniform', activation='relu'))
model.add(Dense(3, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10)
Sorted output csv file looks like:
so after 150 epochs i get: loss: 0.6932 - acc: 0.5000 - val_loss: 0.6970 - val_acc: 0.1429
My question is: how could i modify my NN in order to achieve higher accuracy?

You could try the following things. I have written this roughly in the order of importance - i.e. the order I would try things to fix the accuracy problem you are seeing:
Normalise your input data. Usually you would take mean and standard deviation of training data, and use them to offset+scale all further inputs. There is a standard normalising function in sklearn for this. Remember to treat your test data in the same way (using the mean and std from the training data, not recalculating it)
Train for more epochs. For problems with small numbers of features and limited training set sizes, you often have to run for thousands of epochs before the network will converge. You should plot the training and validation loss values to see whether the network is still learning, or has converged as best as it can.
For your simple data, I would avoid relu activations. You may have heard they are somehow "best", but like most NN options, they have types of problems where they work well, and others where they are not best choice. I think you would be better off with tanh or sigmoid activations in hidden layers for your problem. Save relu for very deep networks and/or convolutional problems on images/audio.
Use more training data. Not clear how much you are feeding it, but NNs work best with large amounts of training data.
Provided you already have lots of training data - increase size of hidden layers. More complex relationships require more hidden neurons (and sometimes more layers) for the NN to be able to express the "shape" of the decision surface. Here is a handy browser-based network allowing you to play with that idea and get a feel for it.
Add one or more dropout layers after the hidden layers or add some other regularisation. The network could be over-fitting (although with a training accuracy of 0.5 I suspect it isn't). Unlike relu, using dropout is pretty close to a panacea for tougher NN problems - it improves generalisation in many cases. A small amount of dropout (~0.2) might help with your problem, but like most hyper-parameters, you will need to search for the best values.
Finally, it is always possible that the relationship you want to find that allows you to predict Y from X is not really there. In which case it would be a correct result from the NN to be no better than guessing at Y.

Neil Slater already provided a long list of helpful general advices.
In your specific examaple, normalization is the important thing. If you add the following lines to your code
...
X = dataset[:,0:3]
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
you will get 100% accuracy on your toy data, even with much simpler network structures. Without normalization, the optimizer won't work.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.