After getting help on so many thing I'm here for a last time for my last problem that I can't find a solution.
Following my previous question an user pointed the fact that my bad results on my time series prediction may be because of my architecture not converging.
After looking at it and tried some fixes I've found on other questions (set weight, lower learning rate, change optimizer/activation) I can't seem to get better results, always getting an accuracy at 0 (or 0.0003, which isn't good enough).
My code:
import numpy
import numpy as np
import tflearn
from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from math import sqrt
import datetime
# Preprocessing function
from tflearn import Accuracy, Momentum
def preprocess(data):
return np.array(data, dtype=np.int32)
def parser(x):
return datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
df = DataFrame(data)
columns = [df.shift(i) for i in range(1, lag + 1)]
columns.append(df)
df = concat(columns, axis=1)
df.fillna(0, inplace=True)
return df
def difference(dataset, interval=1):
diff = list()
for i in range(interval, len(dataset)):
value = dataset[i] - dataset[i - interval]
diff.append(value)
return Series(diff)
# invert differenced value
def inverse_difference(history, yhat, interval=1):
return yhat + history[-interval]
# scale train and test data to [-1, 1]
def scale(train, test):
# fit scaler
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = scaler.fit(train)
# transform train
train = train.reshape(train.shape[0], train.shape[1])
train_scaled = scaler.transform(train)
# transform test
test = test.reshape(test.shape[0], test.shape[1])
test_scaled = scaler.transform(test)
return scaler, train_scaled, test_scaled
# inverse scaling for a forecasted value
def invert_scale(scaler, X, value):
new_row = [x for x in X] + [value]
array = numpy.array(new_row)
array = array.reshape(1, len(array))
inverted = scaler.inverse_transform(array)
return inverted[0, -1]
def fit_lstm(train, batch_size, nb_epoch, neurons):
X, y = train[0:-1], train[:, -1]
X = X[:, 0].reshape(len(X), 1, 1)
y = y.reshape(len(y), 1)
print (X.shape)
print (y.shape)
# Build neural network
net = tflearn.input_data(shape=[None, 1, 1])
tnorm = tflearn.initializations.uniform(minval=-1.0, maxval=1.0)
net = tflearn.dropout(net, 0.8)
net = tflearn.fully_connected(net, 1, activation='linear', weights_init=tnorm)
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
loss='mean_square')
# Define model
model = tflearn.DNN(net, tensorboard_verbose=3, best_val_accuracy=0.6)
model.fit(X, y, n_epoch=nb_epoch, batch_size=batch_size, shuffle=False, show_metric=True)
score = model.evaluate(X, y, batch_size=128)
print (score)
return model
# make a one-step forecast
def forecast_lstm(model, X):
X = X.reshape(len(X), 1, 1)
yhat = model.predict(X)
return yhat[0, 0]
# Load CSV file, indicate that the first column represents labels
data = read_csv('nowcastScaled.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# transform data to be stationary
raw_values = data.values
diff_values = difference(raw_values, 1)
# transform data to be supervised learning
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values
# split data into train and test-sets
train, test = supervised_values[0:10000], supervised_values[10000:10100]
# transform the scale of the data
scaler, train_scaled, test_scaled = scale(train, test)
repeats = 1
for r in range(repeats):
# fit the model
lstm_model = fit_lstm(train_scaled, 128, 6, 1)
# forecast the entire training dataset to build up state for forecasting
train_reshaped = train_scaled[:, 0].reshape(len(train_scaled), 1, 1)
print (lstm_model.predict(train_reshaped))
# walk-forward validation on the test data
predictions = list()
error_scores = list()
for i in range(len(test_scaled)):
# make one-step forecast
X, y = test_scaled[i, 0:-1], test_scaled[i, -1]
yhat = forecast_lstm(lstm_model, X)
# invert scaling
yhat = invert_scale(scaler, X, yhat)
# # invert differencing
yhat = inverse_difference(raw_values, yhat, len(test_scaled) + 1 - i)
# store forecast
predictions.append(yhat)
rmse = sqrt(mean_squared_error(raw_values[10000:10100], predictions))
print('%d) Test RMSE: %.3f' % (1, rmse))
error_scores.append(rmse)
print predictions
print raw_values[10000:10100]
Here is the result I get from running it (raising epoch doesn't seem to make it better):
Training Step: 472 | total loss: 0.00486 | time: 0.421s
| Adam | epoch: 006 | loss: 0.00486 - binary_acc: 0.0000 -- iter: 9856/9999
Training Step: 473 | total loss: 0.00453 | time: 0.427s
| Adam | epoch: 006 | loss: 0.00453 - binary_acc: 0.0000 -- iter: 9984/9999
Training Step: 474 | total loss: 0.00423 | time: 0.430s
| Adam | epoch: 006 | loss: 0.00423 - binary_acc: 0.0000 -- iter: 9999/9999
I've tried to lower/raise most of the setting but nothing.
Here is an extract of the data I'm using (univariate time series), using more or less data in training didn't do anything either.
(Ps: My code is mostly from this tutorial, I had to change a bit it since I wanted to try using Tflearn)
You cant define accuracy for a regression problem. You just track the MSE of the predicted and the actual. Your training loss seems to be low, so if the prediction is not close, then either your scaling inverse is not right or your overfitting.
Related
I currently have a RNN model for time series predictions. It uses 3 input features "value", "temperature" and "hour of the day" of the last 96 time steps to predict the next 96 time steps of the feature "value".
Here you can see a schema of it:
and here you have the current code:
#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from tensorflow import keras
# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
#Read dataset
df = pd.read_csv('C:/Users/Desktop/TestData.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
# standardize data
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)
# Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:]
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:]
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]
# Build the model and train it
np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
keras.layers.SimpleRNN(10, return_sequences=True),
keras.layers.TimeDistributed(keras.layers.Dense(1))
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))
#Predict the test data
Y_pred = model.predict(X_test)
# Inverse the scaling (traInv: transformation inversed)
data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed = np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTranformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
Y_pred_traInv = scaler_standardized_Y.inverse_transform (Y_pred)
Y_test_traInv = scaler_standardized_Y.inverse_transform (Y_test)
# Calculate errors for every time slot of the multiple predictions
abs_diff = np.abs(Y_pred_traInv - Y_test_traInv)
abs_diff_perPredictedSequence = np.zeros((len (Y_test_traInv)))
average_LoadValue_testData_perPredictedSequence = np.zeros((len (Y_test_traInv)))
abs_diff_perPredictedTimeslot_ForEachSequence = np.zeros((len (Y_test_traInv)))
absoluteError_Load_Ratio_allPredictedSequence = np.zeros((len (Y_test_traInv)))
absoluteError_Load_Ratio_allPredictedTimeslots = np.zeros((len (Y_test_traInv)))
mse_perPredictedSequence = np.zeros((len (Y_test_traInv)))
rmse_perPredictedSequence = np.zeros((len(Y_test_traInv)))
for i in range (0, len(Y_test_traInv)):
for j in range (0, len(Y_test_traInv [0])):
abs_diff_perPredictedSequence [i] = abs_diff_perPredictedSequence [i] + abs_diff [i][j]
mse_perPredictedSequence [i] = mean_squared_error(Y_pred_traInv[i] , Y_test_traInv [i] )
rmse_perPredictedSequence [i] = np.sqrt(mse_perPredictedSequence [i])
abs_diff_perPredictedTimeslot_ForEachSequence [i] = abs_diff_perPredictedSequence [i] / len(Y_test_traInv [0])
average_LoadValue_testData_perPredictedSequence [i] = np.mean (Y_test_traInv [i])
absoluteError_Load_Ratio_allPredictedSequence [i] = abs_diff_perPredictedSequence [i] / average_LoadValue_testData_perPredictedSequence [i]
absoluteError_Load_Ratio_allPredictedTimeslots [i] = abs_diff_perPredictedTimeslot_ForEachSequence [i] / average_LoadValue_testData_perPredictedSequence [i]
rmse_average_allPredictictedSequences = np.mean (rmse_perPredictedSequence)
absoluteAverageError_Load_Ratio_allPredictedSequence = np.mean (absoluteError_Load_Ratio_allPredictedSequence)
absoluteAverageError_Load_Ratio_allPredictedTimeslots = np.mean (absoluteError_Load_Ratio_allPredictedTimeslots)
absoluteAverageError_allPredictedSequences = np.mean (abs_diff_perPredictedSequence)
absoluteAverageError_allPredictedTimeslots = np.mean (abs_diff_perPredictedTimeslot_ForEachSequence)
Here you have some test data Download Test Data
So now I actually would like to include not only past values of the features into the prediction but also future values of the features "temperature" and "hour of the day" into the prediction. The future values of the feature "temperature" can for example be taken from an external weather forecasting service and for the feature "hour of the day" the future values are know before (in the test data I have included a "forecast" of the temperature that is not a real forecast; I just randomly changed the values).
This way, I could assume that - for several applications and data - the forecast could be improved.
In a schema it would look like this:
Can anyone tell me, how I can do that in Keras with a RNN (or LSTM)? One way could be to include the future values as independant features as input. But I would like the model to know that the future values of a feature are connected to the past values of a feature.
Reminder: Does anybody have an idea how to do this? I'll highly appreciate every comment.
The standard approach is to use an encoder-decoder architecture (see 1 and 2 for instance):
The encoder takes as input the past values of the features and of the target and returns an output representation.
The decoder takes as input the encoder output and the future values of the features and returns the predicted values of the target.
You can use any architecture for the encoder and for the decoder and you can also consider different approaches for passing the encoder output to the decoder (e.g. adding or concatenating it to the decoder input features, adding or concatenating it to the output of some intermediate decoder layer, or adding it to the final decoder output), the code below is just an example.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Input, Dense, LSTM, TimeDistributed, Concatenate, Add
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
# define the inputs
target = ['value']
features = ['temperatures', 'hour of the day']
sequence_length = 96
# import the data
df = pd.read_csv('TestData.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime': [0]}, index_col=['datetime'])
# scale the data
target_scaler = StandardScaler().fit(df[target])
features_scaler = StandardScaler().fit(df[features])
df[target] = target_scaler.transform(df[target])
df[features] = features_scaler.transform(df[features])
# extract the input and output sequences
X_encoder = [] # past features and target values
X_decoder = [] # future features values
y = [] # future target values
for i in range(sequence_length, df.shape[0] - sequence_length):
X_encoder.append(df[features + target].iloc[i - sequence_length: i])
X_decoder.append(df[features].iloc[i: i + sequence_length])
y.append(df[target].iloc[i: i + sequence_length])
X_encoder = np.array(X_encoder)
X_decoder = np.array(X_decoder)
y = np.array(y)
# define the encoder and decoder
def encoder(encoder_features):
y = LSTM(units=100, return_sequences=True)(encoder_features)
y = TimeDistributed(Dense(units=1))(y)
return y
def decoder(decoder_features, encoder_outputs):
x = Concatenate(axis=-1)([decoder_features, encoder_outputs])
# x = Add()([decoder_features, encoder_outputs])
y = TimeDistributed(Dense(units=100, activation='relu'))(x)
y = TimeDistributed(Dense(units=1))(y)
return y
# build the model
encoder_features = Input(shape=X_encoder.shape[1:])
decoder_features = Input(shape=X_decoder.shape[1:])
encoder_outputs = encoder(encoder_features)
decoder_outputs = decoder(decoder_features, encoder_outputs)
model = Model([encoder_features, decoder_features], decoder_outputs)
# train the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
model.fit([X_encoder, X_decoder], y, epochs=100, batch_size=128)
# extract the last predicted sequence
y_true = target_scaler.inverse_transform(y[-1, :])
y_pred = target_scaler.inverse_transform(model.predict([X_encoder, X_decoder])[-1, :])
# plot the last predicted sequence
plt.plot(y_true.flatten(), label='actual')
plt.plot(y_pred.flatten(), label='predicted')
plt.show()
In the example above the model takes two inputs, X_encoder and X_decoder, so in your case when generating the forecasts you can use the past observed temperatures in X_encoder and the future temperature forecasts in X_decoder.
It is a pytorch code to time series prediction with an known external/exogenous regressor to the given period forecasted.Hope it helps!!!Have a marvellous day !!!
The input format is a 3d Tensor an output 1d array (MISO-Multiple Inputs Single Output)
def CNN_Attention_Bidirectional_LSTM_Encoder_Decoder_predictions(model,data ,regressors, extrapolations_leght):
n_input = extrapolations_leght
pred_list = []
batch = data[-n_input:]
model = model.train()
pred_list.append(torch.cat(( model(batch)[-1], torch.FloatTensor(regressors.iloc[1,[1]]).to(device).unsqueeze(0)),1))
batch = torch.cat((batch[n_input-1].unsqueeze(0), pred_list[-1].unsqueeze(0)),1)
batch = batch[:, 1:, :]
for i in range(n_input-1):
model = model.eval()
pred_list.append(torch.cat((model(batch).squeeze(0), torch.FloatTensor(regressors.iloc[i+1,[1]]).to(device).unsqueeze(0)),1))
batch = torch.cat((batch, pred_list[-1].unsqueeze(0)),1)
batch = batch[:, 1:, :]
model = model.train()
return np.array([pred_list[j].cpu().detach().numpy() for j in range(n_input)])[:,:, 0]
I'm trying to use TensorFlow in python, to make some prediction with cryptocurrency data. The problem is that the output of the prediction is like a 0.1-0.9 number whereas the cryptocurrency data should be a 10000-10100 format, and I don't find a solution to convert the 0.* number to the real one.
I've try to create a ratio, with substrat max - min from predicted values, and max-min from tested data, and divide to have a ratio but when I multiply this ratio with prediction there is a big rate of error ( found a 14000 number instead of a 10000 one )
Here some code :
train_start = 0
train_end = int(np.floor(0.7*n))
test_start = train_end
test_end = n
data_train = data[np.arange(train_start, train_end), :]
data_test = data[np.arange(test_start, test_end), :]
Scale data:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data_train = scaler.fit_transform(data_train)
data_test = scaler.transform(data_test)
Build X and y:
X_train = data_train[:, 1:]
y_train = data_train[:, 0]
X_test = data_test[:, 1:]
y_test = data_test[:, 0]
.
.
.
n_data = 10
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
n_target = 1
X = tf.compat.v1.placeholder(dtype=tf.compat.v1.float32, shape=[None, n_data])
Y = tf.compat.v1.placeholder(dtype=tf.compat.v1.float32, shape=[None])
Hidden layer
..
Output layer (must be transposed)
..
Cost function
..
Optimizer
..
Make Session:
sess = tf.compat.v1.Session()
Run initializer:
sess.run(tf.compat.v1.global_variables_initializer())
Setup interactive plot:
plt.ion()
fig = plt.figure()
ax1 = fig.add_subplot(111)
line1, = ax1.plot(y_test)
line2, = ax1.plot(y_test*0.5)
plt.show()
epochs = 10
batch_size = 256
for e in range(epochs):
# Shuffle training data
shuffle_indices = np.random.permutation(np.arange(len(y_train)))
X_train = X_train[shuffle_indices]
y_train = y_train[shuffle_indices]
# Minibatch training
for i in range(0, len(y_train) // batch_size):
start = i * batch_size
batch_x = X_train[start:start + batch_size]
batch_y = y_train[start:start + batch_size]
# Run optimizer with batch
sess.run(opt, feed_dict={X: batch_x, Y: batch_y})
# Show progress
if np.mod(i, 5) == 0:
# Prediction
pred = sess.run(out, feed_dict={X: X_test})
#This pred var is the output of the prediction
I persiste my result in a file and this is what its looks like :
2019-08-21 06-AM;15310.444858356934;0.50021994;
2019-08-21 12-PM;14287.717187390663;0.46680558;
2019-08-21 06-PM;14104.63871795706;0.46082407;
For example, the last prediction is 0,46 but when I try to convert it I found 14104 whereas it should be nearer a 10000 value
Does anyone have an idea how to convert those predictions?
Thanks!
You will have to make use of inverse_transform of MinMaxScaler to convert back the output you are getting in range of 0-1.
You have not given your model, but I believe you are making use of regression task with few dense layers. You will have to keep minimizing your loss. If you are using mean squared error, the larger the loss, more is the likelihood your output will be far away from the desired set of results.
Even after your loss is a small number and the result is coming good for train samples, but the prediction is bad for test dataset, you may have to consider increasing your train dataset so that more possibilities are covered. If that is not possible, consider reducing the number of neurons in your neural network so that it stops over-fitting.
You can do some postprocessing to restrict the output to some desired range.
Im trying to write a multi-class perceptron algorithm for the MNIST dataset.
now I have the following code which works, but due to the fact its iterating 60k times it works slowly.
weights is the size - (785,10)
def multiClassPLA(train_data, train_labels, weights):
epoch_err = [] # will hold the misclassified ratio for each epoch
best_weights = weights
best_error = 1
for epoch in range(EPOCH):
err = 0
# randomize the data before each epoch
train_data, train_labels = randomizeData(train_data, train_labels)
for x, y in zip(train_data, train_labels):
h = oneVsAllLabeling_(np.dot(weights, x))
diff = (y - h) / 2
x = x.reshape(1, x.shape[0])
diff = diff.reshape(CLASSES, 1)
update_step = ETA * np.dot(diff, x)
weights += update_step
return weights
oneVsAllLabeling_(X) function returns a vector which contain 1 at the argmax and -1 elsewhere. the truth labels has the same form of course.
with this algorithm I'm getting ~90% accuracy, safe but slowly.
after further exploration of the problem I found that I can improve the code using array/matrix multiplication.
so I've started to do the following:
def oneVsAllLabeling(X):
idx = np.argmax(X, axis=1)
mask = np.zeros(X.shape, dtype=bool)
mask[np.arange(len(idx)),idx] = 1
out = 2 * mask - 1
return out.astype(int)
def zeroOneError(prediction):
tester = np.zeros((1, CLASSES))
good_prediction = len(np.where(prediction == tester))
return len(prediction) - good_prediction
def preceptronModelFitting(data, weights, labels, to_print, epoch=None):
prediction = np.matmul(data, weights)
prediction = oneVsAllLabeling(prediction)
diff = (prediction - labels) / 2
error = zeroOneError(diff)
accuracy = error / len(data)
if to_print:
print("Epoch: {}. Loss: {}. Accuracy: {}".format(epoch, error, accuracy))
return prediction, error, accuracy
def multiClassPLA2(train_data, train_labels, test_data, test_labels, weights):
predicted_output = np.zeros((1, CLASSES))
train_loss_vec = np.array([])
train_accuracy_vec = np.array([])
test_loss_vec = np.array([])
test_accuracy_vec = np.array([])
for epoch in range(EPOCH):
# randomize the data before each epoch
train_data, train_labels = randomizeData(train_data, train_labels)
train_prediction, train_error, train_accuracy = preceptronModelFitting(train_data, weights, train_labels, to_print=False)
return weights
after calling preceptronModelFitting() I get a matrix the in the size (60k,10) which every entry has the following shape:
train_prediction[0]=[0,0,1,0,0,-1,0,0,0,0]
and the data has the shape (60k, 785)
now what I need to do is, if possible, to multiply each row with each of the data entries and sum so that in total what ill get is a matrix the size (785,10) which I can update with it the old set of weights.
which its almost equivalent to what I do in the not efficient algorithm, the only differance is that I update the weights every new data entry instead of after seeing all the data.
Thanks!
OK you did most of the job done, and even you had part of the answer in your title.
np.matmul(X.T, truth - prediction)
Using this, it will get you what you want in a one line.
notice that this based on the fact that indeed truth, prediction, X are as you mentioned.
Hello i m trying to complete an assignment based on training a perceptron (without any hidden layer) to perform binary classification using sigmoid activation function. but due to some reason my code is not working correctly. although the error is decreasing after each epoch but accuracy is not increasing. i have target labels 1 and 0, but my predicted labels are almost all close to one. none of my predicted label is representing the 0 class.
below is my code. anyone please tell me what have i done wrong.
<# Create a Neural_Network class
class Neural_Network(object):
def __init__(self,inputSize = 2,outputSize = 1 ):
# size of layers
self.inputSize = inputSize
self.outputSize = outputSize
#weights
self.W1 = 0.01*np.random.randn(inputSize+1, outputSize) # randomly initialize W1 using random function of numpy
# size of the wieght will be (inputSize +1, outputSize) that +1 is for bias
def feedforward(self, X): #forward propagation through our network
n,m=X.shape
Xbias = np.ones((n,1)) #bias term in input
Xnew = np.hstack((Xbias,X)) #adding biasterm in input to match the dimension with the weigth
self.product=np.dot(Xnew,self.W1) # dot product of X (input) and set of weights
output=self.sigmoid(self.product) # apply activation function (i.e. sigmoid)
return output # return your answer with as a final output of the network
def sigmoid(self, s):# apply sigmoid function on s and return its value
return (1./(1. + np.exp(-s))) #activation sigmoid function
def sigmoid_derivative(self, s):#derivative of sigmoid
#derivative of sigmoid = sigmoid(x)*(1-sigmoid(x))
return s*(1-s) # here s will be sigmoid(x)
def backwardpropagate(self,X, Y, y_pred, lr):
# backward propagate through the network
# compute error in output which is loss, compute cross entropy loss function
self.output_error=self.crossentropy(Y,y_pred) #output error
# applying derivative of sigmoid to the error
self.error_deriv=self.output_error*self.sigmoid_derivative(y_pred)
# adjust set of weights
n,m=X.shape
Xbias = np.ones((n,1)) #bias term in input
Xnew = np.hstack((Xbias,X)) #adding biasterm in input to match the dimension with the weigth
self.W1 += lr*(Xnew.T.dot(self.error_deriv)) # W1=W1+ learningrate*errorderiv*input
#self.W1 += X.T.dot(self.z2_delta)
def crossentropy(self, Y, Y_pred):
# compute error based on crossentropy loss
#Cross entropy= sum(Y_actual*log(y_predicted))/N. here 1e-6 is used to avoid log 0
N = Y_pred.shape[0]
#cr_entropy=-np.sum(((Y*np.log(Y_pred+1e-6))+((1-Y)*np.log(1-Y_pred+1e-6))))/N
cr_entropy=-np.sum(Y*np.log(Y_pred+1e-6))/N
return cr_entropy #error
Null=None
def train(self, trainX, trainY,epochs = 100, learningRate = 0.001, plot_err = True ,validationX = Null, validationY = Null):
tr_error=[]
for i in range(epochs):
# feed forward trainX and trainY and recievce predicted value
y_predicted=self.feedforward(trainX)
print(i,y_predicted)
# backpropagation with trainX, trainY, predicted value and learning rate.
self.backwardpropagate(trainX,trainY,y_predicted,learningRate)
tr_error.append(self.output_error)
print(i,self.output_error)
print(i,self.W1)
# """"""if validationX and validationY are not null than show validation accuracy and error of the model.""""""
# plot error of the model if plot_err is true
epocharray=range(0,epochs)
plt.plot(epocharray,tr_error,'r',linewidth=3.0) #plotting error vs. no. of epochs
plt.xlabel('No. of Epochs')
plt.ylabel('Cross Entropy Error')
plt.title('Error Vs. Epoch')
def predict(self, testX):
# predict the value of testX
self.ytest_pred=self.feedforward(testX)
def accuracy(self, testX, testY):
import math
# predict the value of trainX
self.ytest_pred1=self.feedforward(testX)
acc=0
# compare it with testY
for j in range(len(testY)):
q=math.ceil(self.ytest_pred1[j])
#p=round(q)
if testY[j] == q:
acc +=1
accuracy=acc/float(len(testX))*100
print("Percentage Accuracy is", accuracy,"%")
# compute accuracy, print it and """"""show in the form of picture""""""
return accuracy # return accuracy>
# generating dataset point
np.random.seed(1)
no_of_samples = 2000
dims = 2
#Generating random points of values between 0 to 1
class1=np.random.rand(no_of_samples,dims)
#To add separability we will add a bias of 1.1
class2=np.random.rand(no_of_samples,dims)+1.1
class_1_label=np.array([1 for n in range(no_of_samples)])
class_2_label=np.array([0 for n in range(no_of_samples)])
#Lets visualize the dataset
plt.scatter(class1[:,0],class1[:,1], marker='^', label="class 1")
plt.scatter(class2[:,0],class2[:,1], marker='o', label="class 2")
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.legend(loc='best')
plt.show()
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
# Data concatenation
data = np.concatenate((class1,class2),axis=0)
label = np.concatenate((class_1_label,class_2_label),axis=0)
#Note: shuffle this dataset before dividing it into three parts
data,label=shuffle(data,label)
#print(data)
# now using train_test_split command to split data into 60% training data, 20% testing data and 20% validation data
trainX, testX, trainY, testY = train_test_split(data, label, test_size=0.2, random_state=1)
trainX, validX, trainY, validY = train_test_split(trainX, trainY, test_size=0.25, random_state=1)
model = Neural_Network(2,1)
# try different combinations of epochs and learning rate
model.train(trainX, trainY, epochs = 100, learningRate = 0.000001, validationX = validX, validationY = validY)
model.accuracy( testX,testY)
the Results are coming like this(no label going near 0)
0 [[0.49670809]
[0.4958389 ]
[0.4966064 ]
...
[0.49537492]
[0.49566927]
[0.4961255 ]]
0 828.1069658303942
0 [[0.48311074]
[0.51907406]
[0.52764299]]
1 [[0.69813116]
[0.91746189]
[0.80408611]
...
[0.74821077]
[0.87150079]
[0.75187736]]
1 250.96538025031356
1 [[0.56983781]
[0.59205773]
[0.60057486]]
2 [[0.72602796]
[0.94067579]
[0.83591236]
...
[0.77916283]
[0.90032058]
[0.78291184]]
2 210.645081151866
2 [[0.63353102]
[0.64265939]
[0.65118627]]
3 [[0.74507968]
[0.95318096]
[0.85588864]
...
[0.79953834]
[0.91705918]
[0.80329027]]
3 186.2933734713245
3 [[0.6846678 ]
[0.68164316]
[0.69020355]]
4 [[0.75952936]
[0.96114086]
[0.87010085]
...
[0.81456476]
[0.92830628]
[0.81829009]]
4 169.32091332021724
4 [[0.72771826]
[0.71342293]
[0.72202744]]
5 [[0.77112943]
[0.96669774]
[0.88093323]
...
[0.82635507]
[0.93649788]
[0.83004119]]
5 156.53923256347372
Please help me to solve this problem
I see you have set learning rate too small. Set it to 0.001 and Increase epoch to 20k and you will see your model learning well.
Plotting error vs epoch's should give you better idea where to stop.
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
#reproducible random seed
seed = 1
np.random.seed(seed)
#Import and normalize the data
df = pd.read_csv('creditcard.csv')
#Exploring the data
# print df.head()
# print df.describe()
# print df.isnull().sum()
# count_class = pd.value_counts(df['Class'])
# count_class.plot(kind = 'bar')
# plt.title('Fraud class histogram')
# plt.xlabel('class')
# plt.ylabel('Frequency')
# plt.show()
# print('Clearly the data is totally unbalanced!')
#to normalize the amount column
# data['normAmount'] = StandardScaler().fit_transform(data['Amount'].reshape(-1, 1))
df['normAmount'] = StandardScaler().fit_transform(df['Amount'].values.reshape(-1, 1))
df = df.drop(['Time','V28','V27','V26','V25','V24','V23','V22','V20','V15','V13','V8','Amount'], axis =1)
X = df.iloc[:,df.columns!='Class']
Y = df.iloc[:,df.columns=='Class']
# number of records in the minority class
number_record_fraud = len(df[df.Class==1])
fraud_indices = np.array(df[df.Class==1].index)
#picking normal class
normal_indices = np.array(df[df.Class==0].index)
#select random x(number_record_fraud) numbers from normal_indices
random_normal_indices = np.random.choice(normal_indices,number_record_fraud,replace=False)
random_normal_indices = np.array(random_normal_indices)
#under sample data
under_sample_indices = np.concatenate([fraud_indices,random_normal_indices])
under_sample_data = df.iloc[under_sample_indices,:]
X_undersample = under_sample_data.iloc[:,under_sample_data.columns!='Class']
Y_undersample = under_sample_data.iloc[:,under_sample_data.columns=='Class']
# split data into train and test dataset
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.3)
X_train_undersample,X_test_undersample,Y_train_undersample,Y_test_undersample = train_test_split(X_undersample,Y_undersample,test_size=0.3)
#parameters
learning_rate = 0.05
training_epoch = 10
batch_size = 43
display_step = 1
#tf graph input
x = tf.placeholder(tf.float32,[None,18])
y = tf.placeholder(tf.float32,[None,1])
#set model weights
w = tf.Variable(tf.zeros([18,1]))
b = tf.Variable(tf.zeros([1]))
#construct model
pred = tf.nn.softmax(tf.matmul(x,w) + b) #softmax activation
#minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred),reduction_indices=1))
#Gradient descent
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
#initializing variables
init = tf.global_variables_initializer()
#launch the graph
with tf.Session() as sess:
sess.run(init)
#training cycle
for epoch in range(training_epoch):
total_batch = len(X_train_undersample)/batch_size
avg_cost = 0
#loop over all the batches
for batch in range(total_batch):
batch_xs = X_train.iloc[(batch)*batch_size:(batch+1) *batch_size]
batch_ys = Y_train.iloc[(batch)*batch_size:(batch+1) *batch_size]
# run optimizer and cost operation
_,c= sess.run([optimizer,cost],feed_dict={x:batch_xs,y:batch_ys})
avg_cost += c/total_batch
correct_prediction = tf.equal(tf.argmax(pred,1),tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
#disply log per epoch step
if (epoch+1) % display_step == 0:
train_accuracy, newCost = sess.run([accuracy, cost], feed_dict={x: X_test,y: Y_test})
print "test_set_accuracy:",accuracy.eval({x:X_test_undersample,y:Y_test_undersample})*100
print "whole_set_accuracy:",accuracy.eval({x:X,y:Y})*100
# print train_accuracy
# print "cost",newCost
print
print 'optimization finished.'
Things I've tried to figure out what's causing it:
Tried changing train dataset length.
Dropped some not needed fields.
Tried putting validation blocks.
Dataset :link
There can be multiple reasons of why it is overfitting , and as well there can be multiple ways to debug it and to fix it. Its hard to tell just from the code, because it also depends on the data, but here are some common reaons as well as fixes:
Too small dataset, adding more data its a common overfitting fix
Too complex model, if you have many features, or complex polonomial features, try to reducing complexity using feature selection
Add regularization: i dont see regularization in your code, try to add it.