Keras LSTM is not learning - python

I'm trying to understand how Keras' LSTM model works, so I took a Udemy course in which the teacher built an LSTM model to predict Google stock prices. The model works really well, and here it is:
# Part 1 - Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the training set
training_set = pd.read_csv('Google_Stock_Price_Train.csv')
training_set = training_set.iloc[:,1:2].values
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
training_set = sc.fit_transform(training_set)
# Getting the inputs and the outputs (predict the next day's price from the current one)
X_train = training_set[0:1257]
y_train = training_set[1:1258]
# Reshaping to (samples, timesteps, features), as expected by the LSTM layer
X_train = np.reshape(X_train, (1257, 1, 1))
# Part 2 - Building the RNN
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# Initialising the RNN
regressor = Sequential()
# Adding the input layer and the LSTM layer
regressor.add(LSTM(units=4, activation = 'sigmoid', input_shape = (None, 1)))
# Adding the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, batch_size = 32, epochs = 200)
# Part 3 - Making the predictions and visualising the results
# Getting the real stock price of 2017
test_set = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = test_set.iloc[:,1:2].values
# Getting the predicted stock price of 2017
inputs = real_stock_price
inputs = sc.transform(inputs)
inputs = np.reshape(inputs, (20, 1, 1))
predicted_stock_price = regressor.predict(inputs)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
Now I want to apply my own data to this model to predict the next product that a person might purchase from their purchase history. The only thing I changed in this model is the input data.
The original stock price data is the opening price of the stock, day by day:
720, 800, 520, ... etc. (one column, one feature)
My data is in the same format, except the values are the IDs of the products a user purchased:
15, 1320, 680, ... etc. (one column, one feature)
The problem is that the model is not learning my data; the loss value does not change. I want to know whether there is a problem with my data or whether the model is just not suited to my problem.
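Should I instead be treating the IDs as categories rather than as one continuous feature? Something like this rough sketch (the vocabulary size is made up, and X/y here stand for integer ID sequences, not my actual preprocessing):
# Hypothetical sketch: product IDs as categories, with an Embedding layer
# and a softmax over the product vocabulary, instead of MinMax-scaling the
# raw IDs as if they were continuous values.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
num_products = 2000  # assumed vocabulary size, not taken from my real data
model = Sequential()
model.add(Embedding(input_dim=num_products, output_dim=32))  # IDs -> dense vectors
model.add(LSTM(units=32))
model.add(Dense(units=num_products, activation='softmax'))  # one probability per product
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# X: integer ID sequences of shape (samples, timesteps); y: the next product ID
model.fit(X, y, batch_size=32, epochs=200)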
Please help, and sorry about my English. :)
------------------------ Update ----------------------
A few epochs of training:
Epoch 1/200
12507/12507 [==============================] - 2s - loss: 0.1870
Epoch 2/200
12507/12507 [==============================] - 1s - loss: 0.0705
Epoch 3/200
12507/12507 [==============================] - 1s - loss: 0.0699
Epoch 4/200
12507/12507 [==============================] - 1s - loss: 0.0699
Epoch 5/200
12507/12507 [==============================] - 1s - loss: 0.0699
Epoch 6/200
12507/12507 [==============================] - 1s - loss: 0.0699
Epoch 7/200
12507/12507 [==============================] - 1s - loss: 0.0699
Epoch 8/200
12507/12507 [==============================] - 1s - loss: 0.0699
Epoch 9/200
12507/12507 [==============================] - 1s - loss: 0.0699

Related

Why isn't my CNN model for binary classification learning?

I have been given 10000 images of shape (100,100) representing detections of particles. I have then created 10000 empty images of shape (100,100) and mixed them together. I have given each respective type labels of 0 and 1, as seen in the code here:
Labels = np.append(np.ones(10000), np.zeros(empty_sheets.shape[0]))
# scaling each image so that it has a maximum value of 1
images_scale1 = np.zeros(s)
l = s[0]
for i in range(l):
    images_scale1[i] = images[i] / np.amax(images[i])
empty_sheets_noise1 = add_noise(empty_sheets, 0)
scale1noise1 = np.concatenate((images_scale1, empty_sheets_noise1), axis=0)
y11 = Labels
scale1noise1s, y11s = shuffle(scale1noise1, y11)
scale1noise1s_train, scale1noise1s_test, y11s_train, y11s_test = train_test_split(
    scale1noise1s, y11, test_size=0.25)
# reshaping image arrays so that they can be passed through the CNN
scale1noise1s_train = scale1noise1s_train.reshape(scale1noise1s_train.shape[0], 100, 100, 1)
scale1noise1s_test = scale1noise1s_test.reshape(scale1noise1s_test.shape[0], 100, 100, 1)
y11s_train = y11s_train.reshape(y11s_train.shape[0], 1)
y11s_test = y11s_test.reshape(y11s_test.shape[0], 1)
Then to set up my model I create a new function:
def create_model():
    # initiates new model
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(100,100,1)))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(32))
    model.add(keras.layers.Dense(64))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model

estimators1m1 = create_model()
estimators1m1.compile(optimizer='adam',
                      metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
                      loss='binary_crossentropy')
history = estimators1m1.fit(scale1noise1s_train, y11s_train, epochs=3,
                            validation_data=(scale1noise1s_test, y11s_test))
which produces the following:
Epoch 1/3
469/469 [==============================] - 62s 131ms/step - loss: 0.6939 - accuracy: 0.4917 - precision_2: 0.4905 - recall_2: 0.4456 - val_loss: 0.6933 - val_accuracy: 0.5012 - val_precision_2: 0.5012 - val_recall_2: 1.0000
Epoch 2/3
469/469 [==============================] - 63s 134ms/step - loss: 0.6889 - accuracy: 0.5227 - precision_2: 0.5209 - recall_2: 0.5564 - val_loss: 0.6976 - val_accuracy: 0.4994 - val_precision_2: 0.5014 - val_recall_2: 0.2191
Epoch 3/3
469/469 [==============================] - 59s 127ms/step - loss: 0.6527 - accuracy: 0.5783 - precision_2: 0.5764 - recall_2: 0.5887 - val_loss: 0.7298 - val_accuracy: 0.5000 - val_precision_2: 0.5028 - val_recall_2: 0.2131
I have tried more epochs and I still only manage to get 50% accuracy, which is useless since the model is just predicting the same class constantly.
There can be many reasons why your model is not working. The most likely one is that the model is under-fitting: accuracy is low on both the training and validation sets, meaning the network is unable to capture the patterns in the data. You should therefore consider building a slightly more complex model by adding more layers, while avoiding over-fitting with techniques like dropout. You should also find the best parameters by doing hyperparameter tuning.
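For instance, a minimal sketch of a slightly deeper variant (the second conv block, the ReLU activations on the dense layer, and the dropout rate are assumptions on my part, not something tested on your data):
def create_deeper_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(100,100,1)))
    model.add(keras.layers.MaxPooling2D((2,2)))
    # a second convolutional block lets the network capture more complex features
    model.add(keras.layers.Conv2D(64, (3,3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2,2)))
    model.add(keras.layers.Flatten())
    # note: your original Dense(32)/Dense(64) layers had no activation, i.e. they were linear
    model.add(keras.layers.Dense(64, activation='relu'))
    model.add(keras.layers.Dropout(0.3))  # dropout to keep the extra capacity from over-fitting
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model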

Overfitting in RNN for S&P prediction

I'm building an RNN based on the model from the Deep Learning A-Z course on Udemy.
For the Google stock example, we used 5 years of daily stock prices. At the end of the lecture it is suggested to test with more data, or to change the parameters or the structure of the RNN.
My thought was that if I could get more data, the RNN would get better results. I downloaded S&P data from 01/01/2006 to today and split it into a training set and a test set, where the test set is the last 23 days, which I use for prediction.
So I was excited to see whether I could get some kind of useful insight, and let it run for 100 epochs.
Epoch 1/100
3599/3599 [==============================] - 235s 65ms/step - loss: 0.0090
Epoch 2/100
3599/3599 [==============================] - 210s 58ms/step - loss: 0.0024
Epoch 3/100
3599/3599 [==============================] - 208s 58ms/step - loss: 0.0022
Epoch 4/100
3599/3599 [==============================] - 557s 155ms/step - loss: 0.0024
Epoch 5/100
3599/3599 [==============================] - 211s 59ms/step - loss: 0.0022
Epoch 6/100
3599/3599 [==============================] - 207s 58ms/step - loss: 0.0018
Epoch 7/100
3599/3599 [==============================] - 216s 60ms/step - loss: 0.0018
Epoch 8/100
3599/3599 [==============================] - 265s 74ms/step - loss: 0.0016
Epoch 9/100
3599/3599 [==============================] - 215s 60ms/step - loss: 0.0016
Epoch 10/100
3599/3599 [==============================] - 209s 58ms/step - loss: 0.0014
Epoch 11/100
3599/3599 [==============================] - 217s 60ms/step - loss: 0.0014
Epoch 12/100
3599/3599 [==============================] - 216s 60ms/step - loss: 0.0013
Epoch 13/100
3599/3599 [==============================] - 218s 60ms/step - loss: 0.0012
Epoch 14/100
3599/3599 [==============================] - 217s 60ms/step - loss: 0.0012
Epoch 15/100
3599/3599 [==============================] - 210s 58ms/step - loss: 0.0012
Epoch 16/100
3599/3599 [==============================] - 292s 81ms/step - loss: 0.0012
Epoch 17/100
3599/3599 [==============================] - 328s 91ms/step - loss: 0.0011
Epoch 18/100
3599/3599 [==============================] - 199s 55ms/step - loss: 9.8658e-04
Epoch 19/100
3599/3599 [==============================] - 199s 55ms/step - loss: 0.0010
Epoch 20/100
3599/3599 [==============================] - 286s 79ms/step - loss: 9.9106e-04
Wow, 0.0010 was pretty good... but from here on it drops suspiciously low.
I stopped at epoch 39 because it was taking too long and the loss was already very small:
Epoch 39/100
2560/3599 [====================>.........] - ETA: 1:00 - loss: 6.3598e-04
These are the results.
Did I overfit the data? Or is stopping too soon the cause of the large errors? What can I do to reduce the time required to run the 100 epochs?
The code is the following:
# Recurrent Neural Network
# Part 1 - Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
# Importing the training set
dataset_train = pd.read_csv('S&P_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values
# Feature Scaling
sc = MinMaxScaler(feature_range = (0, 1))
training_set_sc = sc.fit_transform(training_set)
# Creating a data structure with 60 timesteps and 1 output
X_train = []
y_train = []
for i in range(60, 3659):
    X_train.append(training_set_sc[i-60:i, 0])
    y_train.append(training_set_sc[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
# Part 2 - Building the RNN
# Importing the Keras libraries and packages
# Initialising the RNN
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
# Adding the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(optimizer = 'Adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
# Part 3 - Making the predictions and visualising the results
print('ok')
# Getting the real stock price of 2017
dataset_test = pd.read_csv('S&P_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values
# Getting the predicted stock price of 2017
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)
X_test = []
for i in range(60, 83):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Stock Price')
plt.title('Prediction of Stocks Values')
plt.xlabel('time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
Did I overfit the data?
Yeah, you probably did. You can check it via val_loss: if your validation loss starts increasing, you are overfitting. You should use a validation set and track the validation error.
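A minimal sketch of what that looks like with the regressor from the question (the 0.2 split fraction is arbitrary; validation_split holds out the last fraction of the training samples):
history = regressor.fit(X_train, y_train, epochs = 100, batch_size = 32,
                        validation_split = 0.2)
# if val_loss starts rising while loss keeps falling, you are overfitting
plt.plot(history.history['loss'], label = 'train loss')
plt.plot(history.history['val_loss'], label = 'val loss')
plt.legend()
plt.show()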
What can I do to optimize the time required to run the 100 epochs?
You can stop training before overfitting the data with EarlyStopping from the TensorFlow API, tf.keras.callbacks.EarlyStopping():
from tensorflow.keras.callbacks import EarlyStopping
# stops when val_loss stops improving, so the fit needs validation data
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.compile(...)
model.fit(..., epochs=9999, validation_split=0.2, callbacks=[early_stopping])

Binary classification model using BERT encoder stuck at 50% accuracy

I'm trying to train a simple model for the Yelp binary classification task.
Load BERT encoder:
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"
bert_config_file = os.path.join(gs_folder_bert, "bert_config.json")
config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())
bert_config = bert.configs.BertConfig.from_dict(config_dict)
_, bert_encoder = bert.bert_models.classifier_model(
    bert_config, num_labels=2)
checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
Load data:
data, info = tfds.load('yelp_polarity_reviews', with_info=True, batch_size=-1, as_supervised=True)
train_x_orig, train_y_orig = tfds.as_numpy(data['train'])
train_x = encode_examples(train_x_orig)
train_y = train_y_orig
Use BERT to embed the data:
encoder_output = bert_encoder.predict(train_x)
Set up the model:
inputs = keras.Input(shape=(768,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(8, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
sgd = SGD(lr=0.0001)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
Train:
model.fit(encoder_output[0], train_y, batch_size=64, epochs=3)
# encoder_output[0].shape === (10000, 1, 768)
# y_train.shape === (100000,)
Training results:
Epoch 1/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6921 - accuracy: 0.5455
Epoch 2/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6918 - accuracy: 0.5455
Epoch 3/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6915 - accuracy: 0.5412
Epoch 4/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6913 - accuracy: 0.5407
Epoch 5/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6911 - accuracy: 0.5358
I tried different learning rates, but the main issue seems to be that training takes only 1 second per epoch and the accuracy stays at ~0.5. Am I not setting up the inputs/model correctly?
Your BERT model is not training. It has to be placed before the dense layers and trained as part of the model. The input layer has to take not BERT vectors, but the sequence of tokens cropped to max_length and padded. Here is example code: https://keras.io/examples/nlp/text_extraction_with_bert/ (see the beginning of the create_model function).
Alternatively, you can use the Trainer from transformers.
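A minimal sketch of fine-tuning the encoder end-to-end with the transformers library (the model name, max_length, and optimizer settings are assumptions, not taken from your setup; texts is a list of review strings and labels the 0/1 array):
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# the classification head is trained together with the BERT encoder
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
enc = tokenizer(texts, truncation=True, padding="max_length", max_length=128, return_tensors="tf")
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(dict(enc), labels, batch_size=32, epochs=3)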

Machine learning regression model predicts same value for every image

I am currently working on a project involving training a regression model, saving it, and then loading it to make further predictions using that model. However, I'm having a problem: each time I call model.predict on images it gives out the same prediction. I am not entirely sure what the problem is; maybe it's in the training stage, or maybe I'm just doing something wrong.
I was following this tutorial
All of the files are in this github repo
Here are some bits from the code:
(This part is training the model and saving it)
model = create_cnn(400, 400, 3, regress=True)
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)
model.fit(X, Y, epochs=70, batch_size=8)
model.save("D:/statispic2/final-statispic_model.hdf5")
The next part of the code loads the model and makes predictions.
model = load_model("D:/statispic2/statispic_model.hdf5") # Loading the model
prediction = model.predict(images_ready_for_prediction)
# images_ready_for_prediction is a numpy array loaded with the images
# just like I loaded them for the training stage
print(prediction_list)
After trying it out this is the output prediction from the model:
[[0.05169942] # I gave it 5 images as parameters
[0.05169942]
[0.05169942]
[0.05169942]
[0.05169942]]
If anything is unclear, or you would like to see some more code, please let me know.
People saying regression and CNNs are two completely different things have clearly missed some basics in their ML course. Yes, they are completely different! But they should not be compared ;)
A CNN is a type of deep neural network which became famous mostly for its use on images. It is a framework for solving problems, and it can solve both regression AND classification problems.
Regression refers to the type of output you are predicting, so comparing the two directly is quite misguided, to be honest.
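To make that concrete, here is a minimal sketch of a CNN used for regression (the input shape matches the question's 400x400x3 images; the layer sizes are illustrative):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential()
model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(400, 400, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
# a single linear output unit is what makes this CNN a regression model
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')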
I can't comment on the specific people misleading you in this section, since I need a certain number of reputation points to do so.
However, back to the problem: do you encounter this issue before or after saving the model? If before, I would try scaling your output values to an easier distribution. If it happens after you save, I would look into the version of your framework and the documentation of how it saves models.
It could also just be that there is no information in the pictures.
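A minimal sketch of the output-scaling idea (using MinMaxScaler as one possible choice; X, Y, and the fit call are the ones from the question):
from sklearn.preprocessing import MinMaxScaler
# scale the targets to [0, 1] before training
y_scaler = MinMaxScaler()
Y_scaled = y_scaler.fit_transform(Y.reshape(-1, 1))
model.fit(X, Y_scaled, epochs=70, batch_size=8)
# undo the scaling on the predictions
prediction = y_scaler.inverse_transform(model.predict(images_ready_for_prediction))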
No, no, no! Regression is completely different from a CNN. Do a little research and the differences will quickly become apparent. In the meantime, I'll share two code samples with you right here.
Regression:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#%matplotlib inline
import sklearn
from sklearn.datasets import load_boston
boston = load_boston()
# Now we will load the data into a pandas dataframe and then will print the first few rows of the data using the head() function.
bos = pd.DataFrame(boston.data)
bos.head()
bos.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
bos.head()
bos['MEDV'] = boston.target
bos.describe()
bos.isnull().sum()
sns.distplot(bos['MEDV'])
plt.show()
sns.pairplot(bos)
corr_mat = bos.corr().round(2)
sns.heatmap(data=corr_mat, annot=True)
sns.lmplot(x = 'RM', y = 'MEDV', data = bos)
X = bos[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX','PTRATIO', 'B', 'LSTAT']]
y = bos['MEDV']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 10)
# Training the Model
# We will now train our model using the LinearRegression function from the sklearn library.
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
# Prediction
# We will now make prediction on the test data using the LinearRegression function and plot a scatterplot between the test data and the predicted value.
prediction = lm.predict(X_test)
plt.scatter(y_test, prediction)
df1 = pd.DataFrame({'Actual': y_test, 'Predicted':prediction})
df2 = df1.head(10)
df2
df2.plot(kind = 'bar')
from sklearn import metrics
from sklearn.metrics import r2_score
print('MAE', metrics.mean_absolute_error(y_test, prediction))
print('MSE', metrics.mean_squared_error(y_test, prediction))
print('RMSE', np.sqrt(metrics.mean_squared_error(y_test, prediction)))
print('R squared error', r2_score(y_test, prediction))
Result:
MAE 4.061419182954711
MSE 34.413968453138565
RMSE 5.866341999333023
R squared error 0.6709339839115628
CNN:
# keras imports for the dataset and building our neural network
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPool2D, Flatten
from keras.utils import np_utils
# to calculate accuracy
from sklearn.metrics import accuracy_score
# loading the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# building the input vector from the 28x28 pixels
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# normalizing the data to help with the training
X_train /= 255
X_test /= 255
# one-hot encoding using keras' numpy-related utilities
n_classes = 10
print("Shape before one-hot encoding: ", y_train.shape)
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)
print("Shape after one-hot encoding: ", Y_train.shape)
# building a linear stack of layers with the sequential model
model = Sequential()
# convolutional layer
model.add(Conv2D(25, kernel_size=(3,3), strides=(1,1), padding='valid', activation='relu', input_shape=(28,28,1)))
model.add(MaxPool2D(pool_size=(1,1)))
# flatten output of conv
model.add(Flatten())
# hidden layer
model.add(Dense(100, activation='relu'))
# output layer
model.add(Dense(10, activation='softmax'))
# compiling the sequential model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# training the model for 10 epochs
model.fit(X_train, Y_train, batch_size=128, epochs=10, validation_data=(X_test, Y_test))
Result:
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 27s 451us/step - loss: 0.2037 - accuracy: 0.9400 - val_loss: 0.0866 - val_accuracy: 0.9745
Epoch 2/10
60000/60000 [==============================] - 27s 451us/step - loss: 0.0606 - accuracy: 0.9819 - val_loss: 0.0553 - val_accuracy: 0.9812
Epoch 3/10
60000/60000 [==============================] - 27s 445us/step - loss: 0.0352 - accuracy: 0.9892 - val_loss: 0.0533 - val_accuracy: 0.9824
Epoch 4/10
60000/60000 [==============================] - 27s 446us/step - loss: 0.0226 - accuracy: 0.9930 - val_loss: 0.0572 - val_accuracy: 0.9825
Epoch 5/10
60000/60000 [==============================] - 27s 448us/step - loss: 0.0148 - accuracy: 0.9959 - val_loss: 0.0516 - val_accuracy: 0.9834
Epoch 6/10
60000/60000 [==============================] - 27s 443us/step - loss: 0.0088 - accuracy: 0.9976 - val_loss: 0.0574 - val_accuracy: 0.9824
Epoch 7/10
60000/60000 [==============================] - 26s 442us/step - loss: 0.0089 - accuracy: 0.9973 - val_loss: 0.0526 - val_accuracy: 0.9847
Epoch 8/10
60000/60000 [==============================] - 26s 440us/step - loss: 0.0047 - accuracy: 0.9988 - val_loss: 0.0593 - val_accuracy: 0.9838
Epoch 9/10
60000/60000 [==============================] - 28s 469us/step - loss: 0.0056 - accuracy: 0.9986 - val_loss: 0.0559 - val_accuracy: 0.9836
Epoch 10/10
60000/60000 [==============================] - 27s 449us/step - loss: 0.0059 - accuracy: 0.9981 - val_loss: 0.0663 - val_accuracy: 0.9820
CNN is deep learning. You use regression models for calculating a number, like the price of a car.
I had the exact same issue after pickle.dump and pickle.load of my model. The problem I was missing is that I was not normalizing the features (vector X) before predicting with the model. I hope this will help you.
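A minimal sketch of what I mean, assuming a StandardScaler was used at training time (the file name and variables are illustrative):
import pickle
from sklearn.preprocessing import StandardScaler
# at training time: fit the scaler and save it next to the model
scaler = StandardScaler().fit(X_train)
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
# at prediction time: load the scaler and apply the same transformation
with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
prediction = model.predict(scaler.transform(X_new))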
Changing the optimizer from Adam() to RMSprop() with a learning rate of >0.001 worked for me.

handle overfitting in unbalanced datasets

I have an unbalanced dataset (only 0.06% of the data is labeled 1 and the rest is labeled 0). From my research, I had to resample the data, so I used the imblearn package to RandomUnderSample my dataset. Then I used a Keras Sequential model to create a neural network. While training, the F1 score increases to around 75% (the 1000th epoch result is: loss: 0.5691 - acc: 0.7543 - f1_m: 0.7525 - precision_m: 0.7582 - recall_m: 0.7472), but on the test set the result is disappointing (loss: 55.35181%, acc: 79.25248%, f1_m: 0.39789%, precision_m: 0.23259%, recall_m: 1.54982%).
What I assume is that on the train set, because the numbers of 1s and 0s are the same, the class_weights are both set to 1, so the network is not penalized much for wrong predictions of 1s.
I have used some techniques like reducing the number of layers, reducing the number of neurons, and using regularization and dropout, but the test set F1 score never exceeds 0.5%. What should I do? Thanks.
my neural network:
def neural_network(X, y, epochs_count=3, handle_overfit=False):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=len(X_test.columns), activation='relu'))
    if (handle_overfit):
        model.add(Dropout(rate = 0.5))
    model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l1(0.1)))
    if (handle_overfit):
        model.add(Dropout(rate = 0.1))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['acc', f1_m, precision_m, recall_m])
    # change weights of the classes '0' and '1' and set weights automatically
    class_weights = class_weight.compute_class_weight('balanced', [0, 1], y)
    print("---------------------- \n chosen class_weights are: ", class_weights, " \n ---------------------")
    # Fit the model
    model.fit(X, y, epochs=epochs_count, batch_size=512, class_weight=class_weights)
    return model
Defining the train and test sets:
train_set, test_set = train_test_split(data, test_size=0.35, random_state=0)
X_train = train_set[['..... some columns ....']]
y_train = train_set[['success']]
print('Initial dataset shape: ', X_train.shape)
rus = RandomUnderSampler(random_state=42)
X_undersampled, y_undersampled = rus.fit_sample(X_train, y_train)
print('undersampled dataset shape: ', X_undersampled.shape)
and the result is:
Initial dataset shape: (1625843, 11)
undersampled dataset shape: (1970, 11)
and finally calling the neural network:
print (X_undersampled.shape, y_undersampled.shape)
print (X_test.shape, y_test.shape)
model = neural_network(X_undersampled, y_undersampled, 1000, handle_overfit=True)
# evaluate the model
print("\n---------------\nEvaluated on test set:")
scores = model.evaluate(X_test, y_test)
for i in range(len(model.metrics_names)):
    print("%s: %.5f%%" % (model.metrics_names[i], scores[i]*100))
and the result is:
(1970, 11) (1970,)
(875454, 11) (875454, 1)
----------------------
chosen class_weights are: [1. 1.]
---------------------
Epoch 1/1000
1970/1970 [==============================] - 4s 2ms/step - loss: 4.5034 - acc: 0.5147 - f1_m: 0.3703 - precision_m: 0.5291 - recall_m: 0.2859
.
.
.
.
Epoch 999/1000
1970/1970 [==============================] - 0s 6us/step - loss: 0.5705 - acc: 0.7538 - f1_m: 0.7471 - precision_m: 0.7668 - recall_m: 0.7296
Epoch 1000/1000
1970/1970 [==============================] - 0s 6us/step - loss: 0.5691 - acc: 0.7543 - f1_m: 0.7525 - precision_m: 0.7582 - recall_m: 0.7472
---------------
Evaluated on test set:
875454/875454 [==============================] - 49s 56us/step
loss: 55.35181%
acc: 79.25248%
f1_m: 0.39789%
precision_m: 0.23259%
recall_m: 1.54982%
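For reference, this is the alternative I am considering instead of undersampling: keeping the full imbalanced training set and weighting the loss instead (a sketch only, not yet tested; the keyword-argument form of compute_class_weight is from newer scikit-learn, Keras expects the weights as a dict, and model is the same Sequential network built in neural_network above):
import numpy as np
from sklearn.utils import class_weight
y_flat = y_train.values.ravel()
# with 0.06% positives, 'balanced' gives weights of roughly {0: 0.5, 1: 833}
weights = class_weight.compute_class_weight('balanced', classes=np.array([0, 1]), y=y_flat)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_flat, epochs=1000, batch_size=512, class_weight=class_weights)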
