I am trying to predict the growth rate of a user with an LSTM and the Adam optimizer, but the predictions I am getting from the code are far from the actual values. I am new to ML and just trying to learn how things are measured in ML, for example what the units parameter actually does in an LSTM model. I am reading values from a CSV and trying to find the growth rate of a user based on the amount they collected over 2 years, but my predictions seem to be inaccurate. Can anyone tell me how I can get correct predictions in order to compute the growth rate of a user?
Here is my code:
import os
os.environ['KERAS_BACKEND'] = 'tensorflow'   # must be set before keras is imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers.recurrent import LSTM
df = pd.read_csv("trakop.csv")
print("="*50)
print("First Five Rows ","\n")
print(df.head(2),"\n")
dataset = df
timestamps = pd.to_datetime(df["timestamp"])  # parse once and reuse
dataset["Month"] = timestamps.dt.month
dataset["Year"] = timestamps.dt.year
dataset["Date"] = timestamps.dt.date
dataset["Time"] = timestamps.dt.time
dataset["Week"] = timestamps.dt.week
dataset["Day"] = timestamps.dt.day_name()
dataset["Hour"] = timestamps.dt.hour
dataset = df.set_index("timestamp")
dataset.index = pd.to_datetime(dataset.index)
print(dataset.head(1))
print(df.Year.unique(),"\n")
print("Total Number of Unique Year", df.Year.nunique(), "\n")
NewDataSet = dataset.resample('D').mean()
# print(NewDataSet)
print("Old Dataset ",dataset.shape )
print("New Dataset ",NewDataSet.shape )
excludedValue = 5
TestData = NewDataSet.tail(10)
Training_Set = NewDataSet.iloc[:,0:1]
Training_Set = Training_Set[:-excludedValue]
print("Training Set Shape ", Training_Set.shape)
print("Test Set Shape ", TestData.shape)
Training_Set = Training_Set.values
sc = MinMaxScaler(feature_range=(0, 1))
Train = sc.fit_transform(Training_Set)
X_Train = []
Y_Train = []
# Range should be from excludedValue (5) values to the end
for i in range(excludedValue, Train.shape[0]):
    # X: the previous excludedValue values
    X_Train.append(Train[i - excludedValue:i])
    # Y: the value that follows that window
    Y_Train.append(Train[i])
# Convert into Numpy Array
X_Train = np.array(X_Train)
Y_Train = np.array(Y_Train)
print(X_Train.shape)
print(Y_Train.shape)
X_Train = np.reshape(X_Train, newshape=(X_Train.shape[0], X_Train.shape[1], 1))
print(X_Train.shape)
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
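# note: `units` sets the dimensionality of the LSTM's output/hidden state,
# so units=1 means each layer carries a single value per timestep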
regressor.add(LSTM(units = 1, return_sequences = True, input_shape = (X_Train.shape[1], 1)))
regressor.add(Dropout(0.4))
# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=1, return_sequences = True))
regressor.add(Dropout(0.4))
# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=1, return_sequences = True))
regressor.add(Dropout(0.4))
# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 1))
regressor.add(Dropout(0.4))
# Adding the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics=['acc'])  # note: 'acc' is not a meaningful metric for regression
regressor.fit(X_Train, Y_Train, epochs = 30, batch_size = 12,verbose=2)
Df_Total = pd.concat((NewDataSet[["amount"]], TestData[["amount"]]), axis=0)
Df_Total.shape
inputs = Df_Total[len(Df_Total) - len(TestData) - excludedValue:].values
# We need to Reshape
inputs = inputs.reshape(-1,1)
# Normalize the Dataset
inputs = sc.transform(inputs)
X_test = []
for i in range(excludedValue, inputs.shape[0]):
    X_test.append(inputs[i - excludedValue:i])
# Convert into Numpy Array
X_test = np.array(X_test)
# Reshape before Passing to Network
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# Pass to Model
predicted_raise = regressor.predict(X_test)
# Do inverse Transformation to get Values
predicted_raise = sc.inverse_transform(predicted_raise)
Predicted_Amount = predicted_raise
dates = TestData.index.to_list()
True_Amount = np.array(TestData["amount"].to_list())
# element-wise percentage difference between the true and predicted amounts
growth_rate = (True_Amount - Predicted_Amount.flatten()) / True_Amount * 100
Machine_Df = pd.DataFrame(data={
    "Date": dates,
    "TrueAmount": True_Amount,
    "PredictedAmount": Predicted_Amount.flatten(),
    "Growthrate": growth_rate
})
print(Machine_Df)
fig = plt.figure()
ax1 = fig.add_subplot(111)
x = dates
y = True_Amount
y1 = Predicted_Amount
plt.plot(x, y, color="green", label="True Amount")
plt.plot(x, y1, color="red", label="Predicted Amount")
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.xlabel('Dates')
plt.ylabel("Amount")
plt.title("Machine Learned the Pattern Predicting Future Values")
plt.legend()
plt.show()
Here is what I am getting in my output:
('First Five Rows ', '\n')
( timestamp amount
0 2019-09-08 06:30:23 38.0
1 2019-09-08 06:36:48 19.0, '\n')
(array([2019, 2020]), '\n')
('Total Number of Unique Year', 2, '\n')
('Old Dataset ', (12492, 8))
('New Dataset ', (129, 5))
('Training Set Shape ', (124, 1))
('Test Set Shape ', (10, 5))
(119, 5, 1)
(119, 1)
Epoch 1/30
- 15s - loss: 0.0177 - acc: 0.0084
Epoch 2/30
- 1s - loss: 0.0165 - acc: 0.0084
Epoch 3/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 4/30
- 1s - loss: 0.0167 - acc: 0.0084
Epoch 5/30
- 1s - loss: 0.0157 - acc: 0.0084
Epoch 6/30
- 1s - loss: 0.0158 - acc: 0.0084
Epoch 7/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 8/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 9/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 10/30
- 1s - loss: 0.0160 - acc: 0.0084
Epoch 11/30
- 1s - loss: 0.0158 - acc: 0.0084
Epoch 12/30
- 1s - loss: 0.0155 - acc: 0.0084
Epoch 13/30
- 1s - loss: 0.0157 - acc: 0.0084
Epoch 14/30
- 1s - loss: 0.0155 - acc: 0.0084
Epoch 15/30
- 1s - loss: 0.0152 - acc: 0.0084
Epoch 16/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 17/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 18/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 19/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 20/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 21/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 22/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 23/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 24/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 25/30
- 1s - loss: 0.0152 - acc: 0.0084
Epoch 26/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 27/30
- 1s - loss: 0.0152 - acc: 0.0084
Epoch 28/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 29/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 30/30
- 1s - loss: 0.0151 - acc: 0.0084
Date Growthrate PredictedAmount TrueAmount
0 2020-01-05 1.695584 122.266731 124.375625
1 2020-01-06 1.691683 122.271584 98.166667
2 2020-01-07 1.682077 122.283531 120.892473
3 2020-01-08 1.690008 122.273666 84.863636
4 2020-01-09 1.694407 122.268196 94.673077
5 2020-01-10 1.706436 122.253235 99.140341
6 2020-01-11 1.700952 122.260056 124.580882
7 2020-01-12 1.701755 122.259056 56.390071
8 2020-01-13 1.696290 122.265854 78.746951
9 2020-01-14 1.698001 122.263725 49.423529
Screenshot of Graph:
The CSV I am using:
https://drive.google.com/file/d/1nKHNqh7fJJJVvb2Qy-DxAO7c7HwNpEI0/view?usp=sharing
Any help would be greatly appreciated!!!
I have worked on your code. First of all, reduce the batch size, because the dataset is small, and change the optimizer from "adam" to "rmsprop": with Adam the model was converging to an almost constant output, which is why you were receiving near-constant predictions. I have also increased the dropout to 0.4.
For calculating the growth rate, I have used the formula
growth rate = (true amount - predicted amount) / true amount * 100
which gives you the percentage difference between the true amount and the predicted amount.
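As a minimal sketch of that formula (assuming both amounts are 1-D NumPy arrays of equal length; the numbers are made up):
import numpy as np
true_amount = np.array([124.4, 98.2, 120.9])        # hypothetical observed values
predicted_amount = np.array([122.3, 122.3, 122.3])  # hypothetical model output, flattened to 1-D
growth_rate = (true_amount - predicted_amount) / true_amount * 100
print(growth_rate)  # percentage difference per test day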
For the full code, please follow the GitHub link:
https://github.com/rohitnarain24/Optimizing-LSTM-model/blob/master/optimized%20lstm.txt
I'm building an RNN based on the model from the Deep Learning A-Z course on Udemy.
For the example on Google stocks, we used 5 years of daily stock prices. At the end of the lecture, it is suggested to test with more data or to change the parameters or the structure of the RNN.
My thought was that if I could get more data, the RNN would get better results. I downloaded S&P data from 01/01/2006 to today, split off the training set, and kept the last 23 days as my test set for prediction.
So I was excited to see if I could get some useful insights... I let it run for 100 epochs.
Epoch 1/100
3599/3599 [==============================] - 235s 65ms/step - loss: 0.0090
Epoch 2/100
3599/3599 [==============================] - 210s 58ms/step - loss: 0.0024
Epoch 3/100
3599/3599 [==============================] - 208s 58ms/step - loss: 0.0022
Epoch 4/100
3599/3599 [==============================] - 557s 155ms/step - loss: 0.0024
Epoch 5/100
3599/3599 [==============================] - 211s 59ms/step - loss: 0.0022
Epoch 6/100
3599/3599 [==============================] - 207s 58ms/step - loss: 0.0018
Epoch 7/100
3599/3599 [==============================] - 216s 60ms/step - loss: 0.0018
Epoch 8/100
3599/3599 [==============================] - 265s 74ms/step - loss: 0.0016
Epoch 9/100
3599/3599 [==============================] - 215s 60ms/step - loss: 0.0016
Epoch 10/100
3599/3599 [==============================] - 209s 58ms/step - loss: 0.0014
Epoch 11/100
3599/3599 [==============================] - 217s 60ms/step - loss: 0.0014
Epoch 12/100
3599/3599 [==============================] - 216s 60ms/step - loss: 0.0013
Epoch 13/100
3599/3599 [==============================] - 218s 60ms/step - loss: 0.0012
Epoch 14/100
3599/3599 [==============================] - 217s 60ms/step - loss: 0.0012
Epoch 15/100
3599/3599 [==============================] - 210s 58ms/step - loss: 0.0012
Epoch 16/100
3599/3599 [==============================] - 292s 81ms/step - loss: 0.0012
Epoch 17/100
3599/3599 [==============================] - 328s 91ms/step - loss: 0.0011
Epoch 18/100
3599/3599 [==============================] - 199s 55ms/step - loss: 9.8658e-04
Epoch 19/100
3599/3599 [==============================] - 199s 55ms/step - loss: 0.0010
Epoch 20/100
3599/3599 [==============================] - 286s 79ms/step - loss: 9.9106e-04
Wow, 0.0010 was pretty good... but from here on, progress was way too slow.
I stopped at epoch 39 because it was taking too long and the loss was already very small.
Epoch 39/100
2560/3599 [====================>.........] - ETA: 1:00 - loss: 6.3598e-04
These are the results.
Did I overfit the data? Or is stopping too soon the cause of the large errors? What can I do to optimize the time required to run the 100 epochs?
The code is the following:
# Recurrent Neural Network
# Part 1 - Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
# Importing the training set
dataset_train = pd.read_csv('S&P_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values
# Feature Scaling
sc = MinMaxScaler(feature_range = [0, 1])
training_set_sc = sc.fit_transform(training_set)
# Creating a data structure with 60 timesteps and 1 output
X_train = []
y_train = []
for i in range(60, len(training_set_sc)):
    X_train.append(training_set_sc[i-60:i, 0])
    y_train.append(training_set_sc[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
# Part 2 - Building the RNN
# Importing the Keras libraries and packages
# Initialising the RNN
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
# Adding the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(optimizer = 'Adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
# Part 3 - Making the predictions and visualising the results
print('ok')
# Getting the real stock price of 2017
dataset_test = pd.read_csv('S&P_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values
# Getting the predicted stock price of 2017
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)
X_test = []
for i in range(60, len(inputs)):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Visualising the results
plt.plot(real_stock_price, color = 'red', label = 'Real Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Stock Price')
plt.title('Prediction of Stocks Values')
plt.xlabel('time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
Did I overfit the data?
Yeah, you probably did. You can check it via val_loss: if your validation loss starts increasing, you are overfitting. You should use a validation set and monitor the validation error.
What can I do to optimize the time required to run the 100 epochs?
You can stop training before overfitting the data with early stopping, via tf.keras.callbacks.EarlyStopping() from the TensorFlow API:
from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping()
model.compile(...)
model.fit(..., epochs = 9999, callbacks=[early_stopping])
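For instance, a slightly fuller sketch (assuming the regressor from the question; the patience value is just an example, and EarlyStopping needs validation data to monitor val_loss):
from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
regressor.fit(X_train, y_train,
              epochs=100, batch_size=32,
              validation_split=0.2,  # provides the val_loss that EarlyStopping monitors
              callbacks=[early_stopping])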
I'm new to machine learning and I'm trying to predict the lira rate with Keras. I think the values are right, but I cannot properly plot them. It looks like this: (image)
Here's my code (the CSV file is in German, so here are the translations: Datum -> Date, Erster -> Open, Hoch -> High, Tief -> Low, Schlusskurs -> Close). The problem is below:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
X_train = []
y_train = []
csv_file = "wkn_A0C32V_historic.csv" #csv file (path)
data = pd.read_csv(csv_file, sep=";") #reading the csv file
data["Erster vorher"] = data["Erster"].shift(-1) #moving the data in Erster(Open) one step backwards
data["Erster"] = data["Erster"].str.replace(",", ".") #replacing all commas with dots in order to calculate with float numbers
data["Erster vorher"] = data["Erster vorher"].str.replace(",", ".") #same here
data["Changes"] = (data["Erster"].astype(float) / data["Erster vorher"].astype(float)) - 1 #calculating the changes
data = data.dropna() #dropping the NaNs
changes = data["Changes"]
#X_train = (number_of_examples, sequence_length, input_dimension)
for i in range(len(changes) - 20):
    X_train.append(np.array(changes[i+1:i+21][::-1]))
    y_train.append(changes[i])
X_train = np.array(X_train).reshape(-1, 20, 1)
y_train = np.array(y_train)
print("X_train shape: " + str(X_train.shape))
print("y_train shape: " + str(y_train.shape))
#Training the data
model = Sequential()
model.add(LSTM(1, input_shape=(20, 1)))
model.compile(optimizer="rmsprop", loss="mse", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=32, epochs=10)
preds = model.predict(X_train)
preds = preds.reshape(-1)
print("Shape of predictions: " + str(preds.shape))
preds = np.append(preds, np.zeros(20))
data["predictions"] = preds
data["Open_predicted"] = data["Erster vorher"].astype(float) * (1 + data["predictions"].astype(float)) #calculating the new Open with the predicted numbers
print(data)
import matplotlib.pyplot as plt
dates = np.array(data["Datum"]).astype(np.datetime64)
#HERE BEGINS THE PROBLEM...
plt.plot(dates, data["Erster"], label="Erster")
plt.plot(dates, data["Open_predicted"], label="Erster (predicted)")
plt.legend()
plt.show()
Output:
Epoch 9/10
32/3444 [..............................] - ETA: 0s - loss: 9.5072e-05 - accuracy: 0.1250
448/3444 [==>...........................] - ETA: 0s - loss: 1.8344e-04 - accuracy: 0.0513
960/3444 [=======>......................] - ETA: 0s - loss: 1.2734e-04 - accuracy: 0.0583
1472/3444 [===========>..................] - ETA: 0s - loss: 1.0480e-04 - accuracy: 0.0577
1984/3444 [================>.............] - ETA: 0s - loss: 9.7956e-05 - accuracy: 0.0600
2464/3444 [====================>.........] - ETA: 0s - loss: 9.0399e-05 - accuracy: 0.0621
2976/3444 [========================>.....] - ETA: 0s - loss: 8.5287e-05 - accuracy: 0.0649
3444/3444 [==============================] - 0s 122us/step - loss: 8.1555e-05 - accuracy: 0.0633
Epoch 10/10
32/3444 [..............................] - ETA: 0s - loss: 5.5561e-05 - accuracy: 0.0312
544/3444 [===>..........................] - ETA: 0s - loss: 6.1705e-05 - accuracy: 0.0662
1056/3444 [========>.....................] - ETA: 0s - loss: 1.2215e-04 - accuracy: 0.0644
1536/3444 [============>.................] - ETA: 0s - loss: 9.9676e-05 - accuracy: 0.0651
2048/3444 [================>.............] - ETA: 0s - loss: 9.2219e-05 - accuracy: 0.0625
2592/3444 [=====================>........] - ETA: 0s - loss: 8.8050e-05 - accuracy: 0.0625
3104/3444 [==========================>...] - ETA: 0s - loss: 8.1685e-05 - accuracy: 0.0651
3444/3444 [==============================] - 0s 118us/step - loss: 8.1349e-05 - accuracy: 0.0633
Shape of predictions: (3444,)
Datum Erster Hoch ... Changes predictions Open_predicted
0 2020-09-04 8.8116 8,8226 ... 0.011816 0.000549 8.713479
1 2020-09-03 8.7087 8,8263 ... -0.006457 0.001141 8.775301
2 2020-09-02 8.7653 8,7751 ... -0.005051 0.001849 8.826093
3 2020-09-01 8.8098 8,8377 ... 0.009465 0.001102 8.736818
4 2020-08-31 8.7272 8,7993 ... 0.000069 0.001149 8.736630
... ... ... ... ... ... ... ...
3459 2009-01-07 2.0449 2,1288 ... -0.021392 0.000000 2.089600
3460 2009-01-06 2.0896 2,0922 ... -0.020622 0.000000 2.133600
3461 2009-01-05 2.1336 2,1477 ... 0.002914 0.000000 2.127400
3462 2009-01-04 2.1274 2,1323 ... -0.005377 0.000000 2.138900
3463 2009-01-02 2.1389 2,1521 ... 0.000000 0.000000 2.138900
[3464 rows x 9 columns]
From the graph, two things stand out: (1) Erster and Erster (predicted) appear to be on different orders of magnitude, and (2) the large number of labels on the y-axis is reminiscent of what you get when you plot strings (categorical values) instead of numbers. I imagine there is a mix-up somewhere, but it is not obvious where.
My suggestions for troubleshooting are: (i) plot Erster vs. Erster (predicted) to check that the scales are similar, and (ii) print the output of data.info() to check that the data types are as expected.
Side note: I recommend sorting the data frame in ascending order by date.
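For example, a minimal check-and-fix sketch along those lines (using the column names from the question):
print(data.info())  # if "Erster" shows dtype object, it is still a string column
data["Erster"] = data["Erster"].astype(float)  # plot numbers, not strings
data["Datum"] = pd.to_datetime(data["Datum"])  # real datetimes for the x-axis
data = data.sort_values("Datum")               # ascending by date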
I'm trying to make the most basic of basic neural networks to get familiar with the functional API in TensorFlow 2.x.
Basically, what I'm trying to do with my simplified iris dataset (i.e. setosa or not) is the following:
Use the 4 features as input
Dense layer of 3
Sigmoid activation function
Dense layer of 2 (one for each class)
Softmax activation
Binary cross entropy / log-loss as my loss function
However, I can't figure out how to control one key aspect of the model. That is, how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer? Also, how can I allow a feature to contribute to more than one neuron?
This isn't clear to me from the documentation.
# Load data
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
X, y = load_iris(return_X_y=True, as_frame=True)
X = X.astype("float32")
X.index = X.index.map(lambda i: "iris_{}".format(i))
X.columns = X.columns.map(lambda j: j.split(" (")[0].replace(" ","_"))
y.index = X.index
y = y.map(lambda i:iris.target_names[i])
y_simplified = y.map(lambda i: {True:1, False:0}[i == "setosa"])
y_simplified = pd.get_dummies(y_simplified, columns=["setosa", "not_setosa"])
# Train test split
from sklearn.model_selection import train_test_split
seed=0
X_train,X_test, y_train,y_test= train_test_split(X,y_simplified, test_size=0.3, random_state=seed)
# Simple neural network
import tensorflow as tf
tf.random.set_seed(seed)
# Input[4 features] -> Dense layer of 3 neurons -> Activation function -> Dense layer of 2 (one per class) -> Softmax
inputs = tf.keras.Input(shape=(4))
x = tf.keras.layers.Dense(3)(inputs)
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="simple_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"] )
model.summary()
history = model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2)
test_scores = model.evaluate(X_test, y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Results:
Model: "simple_binary_iris"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_44 (InputLayer) [(None, 4)] 0
_________________________________________________________________
dense_96 (Dense) (None, 3) 15
_________________________________________________________________
activation_70 (Activation) (None, 3) 0
_________________________________________________________________
dense_97 (Dense) (None, 2) 8
_________________________________________________________________
activation_71 (Activation) (None, 2) 0
=================================================================
Total params: 23
Trainable params: 23
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2/2 [==============================] - 0s 40ms/step - loss: 0.6344 - accuracy: 0.6667 - val_loss: 0.6107 - val_accuracy: 0.7143
Epoch 2/10
2/2 [==============================] - 0s 6ms/step - loss: 0.6302 - accuracy: 0.6667 - val_loss: 0.6083 - val_accuracy: 0.7143
Epoch 3/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6278 - accuracy: 0.6667 - val_loss: 0.6056 - val_accuracy: 0.7143
Epoch 4/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6257 - accuracy: 0.6667 - val_loss: 0.6038 - val_accuracy: 0.7143
Epoch 5/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6239 - accuracy: 0.6667 - val_loss: 0.6014 - val_accuracy: 0.7143
Epoch 6/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6223 - accuracy: 0.6667 - val_loss: 0.6002 - val_accuracy: 0.7143
Epoch 7/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6209 - accuracy: 0.6667 - val_loss: 0.5989 - val_accuracy: 0.7143
Epoch 8/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6195 - accuracy: 0.6667 - val_loss: 0.5967 - val_accuracy: 0.7143
Epoch 9/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6179 - accuracy: 0.6667 - val_loss: 0.5953 - val_accuracy: 0.7143
Epoch 10/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6166 - accuracy: 0.6667 - val_loss: 0.5935 - val_accuracy: 0.7143
2/2 [==============================] - 0s 607us/step - loss: 0.6261 - accuracy: 0.6444
Test loss: 0.6261375546455383
Test accuracy: 0.644444465637207
how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer?
Have one input layer per feature and feed each input layer to a separate dense layer. Later you can concatenate the output of all the dense layers and proceed.
NOTE: One neuron can take any size input (in this case the input size is 1, as you want one feature to be used by the neuron), and the output size is always 1. A Dense layer with n units will have n neurons, and so will have an output size of n.
Working Sample
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Model architecture
x1 = tf.keras.Input(shape=(1,))
x2 = tf.keras.Input(shape=(1,))
x3 = tf.keras.Input(shape=(1,))
x4 = tf.keras.Input(shape=(1,))
x1_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x1)
x2_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x2)
x3_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x3)
x4_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x4)
merged = tf.keras.layers.concatenate([x1_, x2_, x3_, x4_])
merged = tf.keras.layers.Dense(16, activation=tf.nn.relu)(merged)
outputs = tf.keras.layers.Dense(3, activation=tf.nn.softmax)(merged)
model = tf.keras.Model(inputs=[x1,x2,x3,x4], outputs=outputs)
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"] )
# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
X_train,X_test, y_train,y_test= train_test_split(X,y, test_size=0.3)
# Fit the model
model.fit([X_train[:,0],X_train[:,1],X_train[:,2],X_train[:,3]], y_train, batch_size=64, epochs=100, validation_split=0.25)
# Evaluate the model
test_scores = model.evaluate([X_test[:,0],X_test[:,1],X_test[:,2],X_test[:,3]], y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Output:
Epoch 1/100
2/2 [==============================] - 0s 75ms/step - loss: 1.6446 - accuracy: 0.4359 - val_loss: 1.6809 - val_accuracy: 0.5185
Epoch 2/100
2/2 [==============================] - 0s 10ms/step - loss: 1.4151 - accuracy: 0.6154 - val_loss: 1.4886 - val_accuracy: 0.5556
Epoch 3/100
2/2 [==============================] - 0s 9ms/step - loss: 1.2725 - accuracy: 0.6795 - val_loss: 1.3813 - val_accuracy: 0.5556
Epoch 4/100
2/2 [==============================] - 0s 9ms/step - loss: 1.1829 - accuracy: 0.6795 - val_loss: 1.2779 - val_accuracy: 0.5926
Epoch 5/100
2/2 [==============================] - 0s 10ms/step - loss: 1.0994 - accuracy: 0.6795 - val_loss: 1.1846 - val_accuracy: 0.5926
Epoch 6/100
.................. [ Truncated ]
Epoch 100/100
2/2 [==============================] - 0s 2ms/step - loss: 0.4049 - accuracy: 0.9333
Test loss: 0.40491223335266113
Test accuracy: 0.9333333373069763
Pictorial representation of the above model architecture
Dense layers in Keras/TF are fully connected layers. For example, when you use a Dense layer as follows
inputs = tf.keras.Input(shape=(4))
x = tf.keras.layers.Dense(3)(inputs)
all 4 input neurons are connected to all 3 output neurons.
There isn't any predefined layer in Keras/TF to specify how to connect input and output neurons. However, Keras/TF is very flexible in that it allows you to define your custom layers easily.
Borrowing the idea from this answer, you could define a CustomConnected layer as follows:
class CustomConnected(tf.keras.layers.Dense):
    def __init__(self, units, connections, **kwargs):
        self.connections = connections  # binary mask of shape (n_inputs, units)
        super(CustomConnected, self).__init__(units, **kwargs)
    def call(self, inputs):
        # mask the kernel rather than overwriting self.kernel,
        # so the trainable weights stay intact across calls
        out = tf.matmul(inputs, self.kernel * self.connections)
        if self.bias is not None:
            out = out + self.bias
        if self.activation is not None:
            out = self.activation(out)
        return out
Using this layer, you can then specify the connections between two layers through the connections argument. For example:
inputs = tf.keras.Input(shape=(4))
connections = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
x = CustomConnected(3, connections)(inputs)
Here, the 1st, 2nd, and 3rd input neurons are connected to the 1st, 2nd, and 3rd output neurons, respectively. Additionally, the 4th input neuron is connected to the 3rd output neuron.
UPDATE: As discussed in the comments section, an adaptive approach (e.g. by using only the maximum weight for each output neuron) is also possible but not recommended. You could implement this via the following layer:
class CustomSparse(tf.keras.layers.Dense):
    def __init__(self, units, **kwargs):
        super(CustomSparse, self).__init__(units, **kwargs)
    def call(self, inputs):
        nb_in, nb_out = self.kernel.shape
        argmax = tf.argmax(self.kernel, axis=0)  # Shape=(nb_out,)
        argmax_onehot = tf.transpose(tf.one_hot(argmax, depth=nb_in))  # Shape=(nb_in, nb_out)
        kernel_max = self.kernel * argmax_onehot
        # tf.print(kernel_max)  # Uncomment this line to print the weights
        out = tf.matmul(inputs, kernel_max)
        if self.bias is not None:
            out += self.bias
        if self.activation is not None:
            out = self.activation(out)
        return out
The main issue of this approach is that you cannot propagate gradients through the argmax operation required to select the maximum weight. As a result, the network will only "switch input neurons" when the selected weight is no longer the maximum weight.
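A quick way to see the gradient problem (a minimal sketch in TF 2.x eager mode):
import tensorflow as tf
w = tf.Variable([[0.2, 0.9], [0.5, 0.1]])
with tf.GradientTape() as tape:
    idx = tf.argmax(w, axis=0)  # integer indices: non-differentiable
    loss = tf.reduce_sum(tf.cast(idx, tf.float32))
print(tape.gradient(loss, w))  # None: no gradient flows through argmax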
Well, I'm new to machine learning, and so to Keras. I'm trying to create a model to which I can pass, as input, a list of arrays of arrays (a list of 6400 arrays within 2 arrays).
This is the problematic part of my code:
XFIT = np.array([x_train, XX_train])
YFIT = np.array([y_train, yy_train])
Inputs = keras.layers.Input(shape=(6400, 2))
hidden1 = keras.layers.Dense(units=100, activation="sigmoid")(Inputs)
hidden2 = keras.layers.Dense(units=100, activation='relu')(hidden1)
predictions = keras.layers.Dense(units=3, activation='softmax')(hidden2)
model = keras.Model(inputs=Inputs, outputs=predictions)
There's no error; however, the input layer (Inputs) forces me to pass a (6400, 2) shape, as each array (x_train and XX_train) has 6400 arrays inside. The result, after the epochs are done, is this:
Train on 2 samples
Epoch 1/5
2/2 [==============================] - 1s 353ms/sample - loss: 1.1966 - accuracy: 0.2488
Epoch 2/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.1303 - accuracy: 0.2544
Epoch 3/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.0982 - accuracy: 0.3745
Epoch 4/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.0854 - accuracy: 0.3745
Epoch 5/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.0835 - accuracy: 0.3745
Process finished with exit code 0
I can't train on more than 2 samples in each epoch because of the input shape. How can I change this input?
I have tried other shapes, but they gave me errors.
x_train and XX_train look like this:
[[[0.505834 0.795461]
[0.843175 0.975741]
[0.22349 0.035036]
...
[0.884796 0.867509]
[0.396942 0.659936]
[0.873194 0.05454 ]]
[[0.95968 0.281957]
[0.137547 0.390005]
[0.635382 0.901555]
...
[0.887062 0.486206]
[0.49827 0.949123]
[0.034411 0.983711]]]
Thank you, and forgive me if I've made any mistakes; it's my first time with Keras and my first time on Stack Overflow. :D
You are almost there. The problem is with:
XFIT = np.array([x_train, XX_train])
YFIT = np.array([y_train, yy_train])
Let's see with an example:
import numpy as np
x_train = np.random.random((6400, 2))
y_train = np.random.randint(2, size=(6400,1))
xx_train = np.array([x_train, x_train])
yy_train = np.array([y_train, y_train])
print(xx_train.shape)
(2, 6400, 2)
print(yy_train.shape)
(2, 6400, 1)
In the array, we have 2 samples with 6400 rows each. This means that when we call model.fit, it only has 2 samples to train on. Instead, what we can do:
xx_train = np.vstack([x_train, x_train])
yy_train = np.vstack([y_train, y_train])
print(xx_train.shape)
(12800, 2)
print(yy_train.shape)
(12800, 1)
Now we have correctly joined both samples and can train.
from keras.layers import Input, Dense
from keras.models import Model
Inputs = Input(shape=(2,))
hidden1 = Dense(units=100, activation="sigmoid")(Inputs)
hidden2 = Dense(units=100, activation='relu')(hidden1)
predictions = Dense(units=1, activation='sigmoid')(hidden2)
model = Model(inputs=Inputs, outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(xx_train, yy_train, batch_size=10, epochs=5)
Train on 12800 samples
Epoch 1/5
12800/12800 [==============================] - 3s 216us/sample - loss: 0.6978 - acc: 0.5047
Epoch 2/5
12800/12800 [==============================] - 2s 186us/sample - loss: 0.6952 - acc: 0.5018
Epoch 3/5
12800/12800 [==============================] - 3s 196us/sample - loss: 0.6942 - acc: 0.4962
Epoch 4/5
12800/12800 [==============================] - 3s 217us/sample - loss: 0.6938 - acc: 0.4898
Epoch 5/5
12800/12800 [==============================] - 3s 217us/sample - loss: 0.6933 - acc: 0.5002
I'm trying to build handwritten word recognition using the IAM dataset, and while training I'm facing an overfitting problem. Would you please help me figure out what mistake I have made in the code below?
I have tried every solution I could find to resolve the problem, but the same overfitting problem persists.
import os
import fnmatch
import cv2
import numpy as np
import string
import time
import random
from keras import regularizers, optimizers
from keras.regularizers import l2
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, LSTM, Reshape, BatchNormalization, Input, Conv2D, MaxPool2D, Lambda, Bidirectional, Dropout
from keras.models import Model
from keras.activations import relu, sigmoid, softmax
import keras.backend as K
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint,ReduceLROnPlateau
import matplotlib.pyplot as plt
imgSize = (128,32)
def preprocess(img, imgSize, dataAugmentation=False):
    "put img into target img of size imgSize, transpose for TF and normalize gray-values"
    # there are damaged files in IAM dataset - just use black image instead
    if img is None:
        img = np.zeros([imgSize[1], imgSize[0]])
    # increase dataset size by applying random stretches to the images
    if dataAugmentation:
        stretch = (random.random() - 0.5)  # -0.5 .. +0.5
        wStretched = max(int(img.shape[1] * (1 + stretch)), 1)  # random width, but at least 1
        img = cv2.resize(img, (wStretched, img.shape[0]))  # stretch horizontally by factor 0.5 .. 1.5
        img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
        # print('Data Augmented')
    # create target image and copy sample image into it
    (wt, ht) = imgSize
    (h, w) = img.shape
    fx = w / wt
    fy = h / ht
    f = max(fx, fy)
    newSize = (max(min(wt, int(w / f)), 1), max(min(ht, int(h / f)), 1))  # scale according to f (result at least 1 and at most wt or ht)
    img = cv2.resize(img, newSize)
    target = np.ones([ht, wt]) * 255
    target[0:newSize[1], 0:newSize[0]] = img
    # transpose for TF
    img = cv2.transpose(target)
    # normalize
    (m, s) = cv2.meanStdDev(img)
    m = m[0][0]
    s = s[0][0]
    img = img - m
    img = img / s if s > 0 else img
    img = np.expand_dims(img, axis=2)
    return img
def truncateLabel(text, maxTextLen):  # A,32
    cost = 0
    for i in range(len(text)):
        if i != 0 and text[i] == text[i-1]:
            cost += 2
        else:
            cost += 1
        if cost > maxTextLen:
            return text[:i]  # returns words with repeated chars
    return text
path = 'iam_dataset_words/'
maxTextLen = 32
samples = []
bad_samples = []
fileName = ''
dataAugmentation = False
chars = set()
f=open(path+ 'words.txt', "r")
cou = 0
bad_samples = []
bad_samples_reference = ['a01-117-05-02.png',
'r06-022-03-05.png']
for line in f:
    cou += 1
    # ignore comment line
    if not line or line[0] == '#':
        continue
    lineSplit = line.strip().split(' ')
    assert len(lineSplit) >= 9
    fileNameSplit = lineSplit[0].split('-')  # a01-000u-00-00 splits
    # ../data/words/a01/a01-000u/a01-000u-00-00.png
    fileName = path + 'words/' \
        + fileNameSplit[0] + '/' \
        + fileNameSplit[0] + '-' \
        + fileNameSplit[1] \
        + '/' + lineSplit[0] + '.png'
    # GT text are columns starting at 9
    gtText = truncateLabel(' '.join(lineSplit[8:]), maxTextLen)  # A,32
    # chars = chars.union(gtText)  # unique chars only
    chars = chars.union(set(list(gtText)))
    # check if image is not empty
    if not os.path.getsize(fileName):
        bad_samples.append(lineSplit[0] + '.png')
        continue
    # put sample into list
    # 'A','../data/words/a01/a01-000u/a01-000u-00-00.png'
    samples.append([gtText, fileName])
print(cou)
print(len(samples))
print(samples[:2])
if set(bad_samples) != set(bad_samples_reference):
    print("Warning, damaged images found:", bad_samples)
    print("Damaged images expected:", bad_samples_reference)
trainSamples = []
validationSamples = []
testSamples = []
valid_testSamples = []
# split into training and validation set: 90% - 10%
# dataAugmentation = True
random.shuffle(samples)
splitIdx = int(0.75 * len(samples))
train_samples = samples[:splitIdx]
valid_testSamples = samples[splitIdx:]
print('vv:', len(valid_testSamples))
validationSamples = valid_testSamples[:15000]
testSamples = valid_testSamples[15000:]
print('valid: ',len(validationSamples))
print('test: ',len(testSamples))
print('train_before: ',len(train_samples))
# # start with train set
trainSamples = train_samples[:25000] #tran data 25000
print('train_ after: ',len(trainSamples))
# # list of all unique chars in dataset
charList = sorted(list(chars))
char_list = str().join(charList)
# print('test samples: ',testSamples)
print('char list : ',char_list)
# # save characters of model for inference mode
# open(FilePaths.fnCharList, 'w').write(str().join(charList))
# # save words contained in dataset into file
# open(FilePaths.fnCorpus, 'w').write(str(' ').join(loader.trainWords + validationWords))
def encode_to_labels(txt):
    # encode each output word into digits
    chars = []
    for index, char in enumerate(txt):
        try:
            chars.append(char_list.index(char))
        except ValueError:
            print(char)
    return chars
print(trainSamples[:2])
# lists for training dataset
train_img = []
train_txt = []
train_input_length = []
train_label_length = []
train_orig_txt = []
max_label_len = 0
b = 0
for words, imgPath in trainSamples:
    img = preprocess(cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE), imgSize, dataAugmentation = True)
    # compute maximum length of the text
    if len(words) > max_label_len:
        max_label_len = len(words)
    train_orig_txt.append(words)
    train_label_length.append(len(words))
    train_input_length.append(31)
    train_img.append(img)
    train_txt.append(encode_to_labels(words))
    b += 1
# print(train_img[1])
print(len(train_txt))
train_txt[:5]
a = 0
#lists for validation dataset
valid_img = []
valid_txt = []
valid_input_length = []
valid_label_length = []
valid_orig_txt = []
for words, imgPath in validationSamples:
    img = preprocess(cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE), imgSize, dataAugmentation = False)
    valid_orig_txt.append(words)
    valid_label_length.append(len(words))
    valid_input_length.append(31)
    valid_img.append(img)
    valid_txt.append(encode_to_labels(words))
    a += 1
print(len(valid_txt))
valid_txt[:5]
# lists for test dataset
test_img = []
test_txt = []
test_input_length = []
test_label_length = []
test_orig_txt = []
c = 0
for words, imgPath in testSamples:
    img = preprocess(cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE), imgSize, dataAugmentation = False)
    test_orig_txt.append(words)
    test_label_length.append(len(words))
    test_input_length.append(31)
    test_img.append(img)
    test_txt.append(encode_to_labels(words))
    c += 1
# print(c)
print(test_img[0].shape)
print('Train: {}\nValid: {}\nTest: {}'.format(b,a,c))
print(max_label_len)
# pad each output label to maximum text length
train_padded_txt = pad_sequences(train_txt, maxlen=max_label_len, padding='post', value = len(char_list))
valid_padded_txt = pad_sequences(valid_txt, maxlen=max_label_len, padding='post', value = len(char_list))
test_padded_txt = pad_sequences(test_txt, maxlen=max_label_len, padding='post', value = len(char_list))
print(len(train_padded_txt))
print(len(test_padded_txt))
print(valid_padded_txt[1])
# input with shape of height=32 and width=128
inputs = Input(shape=(128,32,1))
print(inputs.shape)
# convolution layer with kernel size (3,3)
conv_1 = Conv2D(32, (3,3), activation = 'relu', padding='same')(inputs)
batch_norm_1 = BatchNormalization()(conv_1)
# poolig layer with kernel size (2,2)
pool_1 = Conv2D(32, kernel_size=(1, 1), strides=2, padding='valid')(batch_norm_1)
conv_2 = Conv2D(64, (3,3), activation = 'relu', padding='same')(pool_1)
batch_norm_2 = BatchNormalization()(conv_2)
pool_2 = Conv2D(64, kernel_size=(1, 1), strides=2, padding='valid')(batch_norm_2)
conv_3 = Conv2D(128, (3,3), activation = 'relu', padding='same')(pool_2)
batch_norm_3 = BatchNormalization()(conv_3)
conv_4 = Conv2D(128, (3,3), activation = 'relu', padding='same')(batch_norm_3)
batch_norm_4 = BatchNormalization()(conv_4)
# poolig layer with kernel size (1,2)
pool_4 = MaxPool2D(pool_size=(1,2))(batch_norm_4)
conv_5 = Conv2D(256, (3,3), activation = 'relu', padding='same')(pool_4)
# Batch normalization layer
batch_norm_5 = BatchNormalization()(conv_5)
conv_6 = Conv2D(256, (3,3), activation = 'relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(1,2))(batch_norm_6)
conv_7 = Conv2D(256, (2,2), activation = 'relu')(pool_6)
batch_norm_7 = BatchNormalization()(conv_7)
# print(conv_7.shape)
# map-to-sequence-- dropping 1 dimension
squeezed = Lambda(lambda x: K.squeeze(x, 2))(batch_norm_7)
# print('squeezed',squeezed.shape)
# bidirectional LSTM layers with units=128
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout = 0.3))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout = 0.3))(blstm_1)
outputs = Dense(len(char_list)+1, activation = 'softmax')(blstm_2)
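# NOTE: the post omits the CTC-loss wiring that the compile() and fit() calls
# below rely on; this is the standard reconstruction implied by the 'ctc' loss
# key and the four inputs passed to fit (the Input names are assumptions)
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
def ctc_lambda_func(args):
    # K.ctc_batch_cost computes the CTC loss over a batch
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
ctc_loss = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([outputs, labels, input_length, label_length])
model = Model(inputs=[inputs, labels, input_length, label_length], outputs=ctc_loss)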
# model to be used at test time
word_model = Model(inputs, outputs)
adam = optimizers.Adamax(lr=0.01, decay = 1e-5)
model.compile(loss= {'ctc': lambda y_true, y_pred: y_pred}, optimizer = adam, metrics = ['accuracy'])
filepath="best_model.hdf5"
checkpoint1 = ReduceLROnPlateau(monitor='val_loss', verbose=1,
mode='auto',factor=0.2,patience=4, min_lr=0.0001)
checkpoint2 = ModelCheckpoint(filepath=filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
callbacks_list = [checkpoint1, checkpoint2]
train_img = np.array(train_img)
train_input_length = np.array(train_input_length)
train_label_length = np.array(train_label_length)
valid_img = np.array(valid_img)
valid_input_length = np.array(valid_input_length)
valid_label_length = np.array(valid_label_length)
test_img = np.array(test_img)
test_input_length = np.array(test_input_length)
test_label_length = np.array(test_label_length)
test_img.shape
batch_size = 50
epochs = 30
train_history = model.fit(x=[train_img, train_padded_txt, train_input_length, train_label_length],
y=np.zeros(len(train_img)), batch_size=batch_size, epochs = epochs,
validation_data = ([valid_img, valid_padded_txt, valid_input_length,
valid_label_length], [np.zeros(len(valid_img))]),
verbose = 1, callbacks = callbacks_list)
Train on 25000 samples, validate on 15000 samples
Epoch 1/30
25000/25000 [==============================] - 159s 6ms/step - loss: 13.6510 - acc: 0.0199 - val_loss: 11.4910 - val_acc: 0.0651
Epoch 00001: val_loss improved from inf to 11.49100, saving model to best_model.hdf5
Epoch 2/30
25000/25000 [==============================] - 146s 6ms/step - loss: 10.9559 - acc: 0.0603 - val_loss: 9.7359 - val_acc: 0.0904
Epoch 00002: val_loss improved from 11.49100 to 9.73587, saving model to best_model.hdf5
Epoch 3/30
25000/25000 [==============================] - 146s 6ms/step - loss: 9.0720 - acc: 0.0943 - val_loss: 7.3571 - val_acc: 0.1565
Epoch 00003: val_loss improved from 9.73587 to 7.35715, saving model to best_model.hdf5
Epoch 4/30
25000/25000 [==============================] - 145s 6ms/step - loss: 6.9501 - acc: 0.1520 - val_loss: 5.5228 - val_acc: 0.2303
Epoch 00004: val_loss improved from 7.35715 to 5.52277, saving model to best_model.hdf5
Epoch 5/30
25000/25000 [==============================] - 144s 6ms/step - loss: 5.4893 - acc: 0.2129 - val_loss: 4.3179 - val_acc: 0.2895
Epoch 00005: val_loss improved from 5.52277 to 4.31793, saving model to best_model.hdf5
Epoch 6/30
25000/25000 [==============================] - 143s 6ms/step - loss: 4.7053 - acc: 0.2612 - val_loss: 3.7490 - val_acc: 0.3449
Epoch 00006: val_loss improved from 4.31793 to 3.74896, saving model to best_model.hdf5
Epoch 7/30
25000/25000 [==============================] - 143s 6ms/step - loss: 4.1183 - acc: 0.3096 - val_loss: 3.5902 - val_acc: 0.3805
Epoch 00007: val_loss improved from 3.74896 to 3.59015, saving model to best_model.hdf5
Epoch 8/30
25000/25000 [==============================] - 143s 6ms/step - loss: 3.6662 - acc: 0.3462 - val_loss: 3.7923 - val_acc: 0.3350
Epoch 00008: val_loss did not improve from 3.59015
Epoch 9/30
25000/25000 [==============================] - 143s 6ms/step - loss: 3.3398 - acc: 0.3809 - val_loss: 3.1352 - val_acc: 0.4344
Epoch 00009: val_loss improved from 3.59015 to 3.13516, saving model to best_model.hdf5
Epoch 10/30
25000/25000 [==============================] - 143s 6ms/step - loss: 3.0199 - acc: 0.4129 - val_loss: 2.9798 - val_acc: 0.4541
Epoch 00010: val_loss improved from 3.13516 to 2.97978, saving model to best_model.hdf5
Epoch 11/30
25000/25000 [==============================] - 143s 6ms/step - loss: 2.7361 - acc: 0.4447 - val_loss: 3.3836 - val_acc: 0.3780
Epoch 00011: val_loss did not improve from 2.97978
Epoch 12/30
25000/25000 [==============================] - 143s 6ms/step - loss: 2.5127 - acc: 0.4695 - val_loss: 2.9266 - val_acc: 0.5041
Epoch 00012: val_loss improved from 2.97978 to 2.92656, saving model to best_model.hdf5
Epoch 13/30
25000/25000 [==============================] - 142s 6ms/step - loss: 2.3045 - acc: 0.4974 - val_loss: 2.7329 - val_acc: 0.5174
Epoch 00013: val_loss improved from 2.92656 to 2.73294, saving model to best_model.hdf5
Epoch 14/30
25000/25000 [==============================] - 141s 6ms/step - loss: 2.1245 - acc: 0.5237 - val_loss: 2.8624 - val_acc: 0.5339
Epoch 00014: val_loss did not improve from 2.73294
Epoch 15/30
25000/25000 [==============================] - 142s 6ms/step - loss: 1.9091 - acc: 0.5524 - val_loss: 2.6933 - val_acc: 0.5506
Epoch 00015: val_loss improved from 2.73294 to 2.69333, saving model to best_model.hdf5
Epoch 16/30
25000/25000 [==============================] - 141s 6ms/step - loss: 1.7565 - acc: 0.5705 - val_loss: 2.7697 - val_acc: 0.5461
Epoch 00016: val_loss did not improve from 2.69333
Epoch 17/30
25000/25000 [==============================] - 145s 6ms/step - loss: 1.6273 - acc: 0.5892 - val_loss: 2.8992 - val_acc: 0.5361
Epoch 00017: val_loss did not improve from 2.69333
Epoch 18/30
25000/25000 [==============================] - 145s 6ms/step - loss: 1.5007 - acc: 0.6182 - val_loss: 2.9558 - val_acc: 0.5345
Epoch 00018: val_loss did not improve from 2.69333
Epoch 19/30
25000/25000 [==============================] - 143s 6ms/step - loss: 1.3775 - acc: 0.6311 - val_loss: 2.8437 - val_acc: 0.5744
Epoch 00019: ReduceLROnPlateau reducing learning rate to 0.0019999999552965165.
Epoch 00019: val_loss did not improve from 2.69333
Epoch 20/30
25000/25000 [==============================] - 144s 6ms/step - loss: 0.9636 - acc: 0.7115 - val_loss: 2.6072 - val_acc: 0.6083
Epoch 00020: val_loss improved from 2.69333 to 2.60724, saving model to best_model.hdf5
Epoch 21/30
25000/25000 [==============================] - 146s 6ms/step - loss: 0.7940 - acc: 0.7583 - val_loss: 2.6613 - val_acc: 0.6167
Epoch 00021: val_loss did not improve from 2.60724
Epoch 22/30
25000/25000 [==============================] - 146s 6ms/step - loss: 0.6995 - acc: 0.7797 - val_loss: 2.7180 - val_acc: 0.6220
Epoch 00022: val_loss did not improve from 2.60724
Epoch 23/30
25000/25000 [==============================] - 144s 6ms/step - loss: 0.6197 - acc: 0.8046 - val_loss: 2.7504 - val_acc: 0.6226
Epoch 00023: val_loss did not improve from 2.60724
Epoch 24/30
25000/25000 [==============================] - 143s 6ms/step - loss: 0.5668 - acc: 0.8167 - val_loss: 2.8238 - val_acc: 0.6255
Epoch 00024: ReduceLROnPlateau reducing learning rate to 0.0003999999724328518.
Epoch 00024: val_loss did not improve from 2.60724
Epoch 25/30
25000/25000 [==============================] - 144s 6ms/step - loss: 0.5136 - acc: 0.8316 - val_loss: 2.8167 - val_acc: 0.6283
Epoch 00025: val_loss did not improve from 2.60724
Epoch 26/30
25000/25000 [==============================] - 143s 6ms/step - loss: 0.5012 - acc: 0.8370 - val_loss: 2.8244 - val_acc: 0.6299
Epoch 00026: val_loss did not improve from 2.60724
Epoch 27/30
25000/25000 [==============================] - 143s 6ms/step - loss: 0.4886 - acc: 0.8425 - val_loss: 2.8366 - val_acc: 0.6282
Epoch 00027: val_loss did not improve from 2.60724
Epoch 28/30
25000/25000 [==============================] - 143s 6ms/step - loss: 0.4820 - acc: 0.8432 - val_loss: 2.8447 - val_acc: 0.6271
Epoch 00028: ReduceLROnPlateau reducing learning rate to 0.0001.
Epoch 00028: val_loss did not improve from 2.60724
Epoch 29/30
25000/25000 [==============================] - 141s 6ms/step - loss: 0.4643 - acc: 0.8452 - val_loss: 2.8538 - val_acc: 0.6278
Epoch 00029: val_loss did not improve from 2.60724
Epoch 30/30
25000/25000 [==============================] - 141s 6ms/step - loss: 0.4576 - acc: 0.8496 - val_loss: 2.8555 - val_acc: 0.6277
Epoch 00030: val_loss did not improve from 2.60724
Evaluation of the model
test_history = model.evaluate([test_img, test_padded_txt,
test_input_length, test_label_length],
y=np.zeros(len(test_img)), verbose = 1)
test_history
Output
13830/13830 [==============================] - 42s 3ms/step
[2.855567638786134, 0.6288503253882292]
Some Predicted Output:
Not sure what you have already tried, but did you check whether your training and validation samples are balanced? That is, whether they have roughly the same percentage of examples in each category.
You could shuffle samples using random.shuffle(samples) before executing the following code:
splitIdx = int(0.75 * len(samples))
train_samples = samples[:splitIdx]
That way, you can be more certain that your training and validation sets are balanced.
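A quick way to compare the two splits (a sketch; since the labels here are words, I compare the distribution of label lengths, but any per-category count works the same way):
from collections import Counter
train_lengths = Counter(len(gt) for gt, _ in trainSamples)
valid_lengths = Counter(len(gt) for gt, _ in validationSamples)
print(train_lengths.most_common(5))
print(valid_lengths.most_common(5))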
There is a lot you can do.
Add batch normalization after every Conv2D layer.
Replace max pooling with a strided Conv2D with 'valid' padding, so that it becomes a learnable layer:
from: pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)
to: pool_1 = Conv2D(filters, kernel_size=(1, 1), strides=2, padding='valid')(conv_1)
Add L2 regularization to your layers; look here for the implementation.
Try weight decay.
Increase the dropout values you already have.
Adjust your learning rate: too small and it might fall into a local minimum.
And there is a lot more; the only way to know is to try them out.
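For instance, a minimal sketch of the learnable-pooling and L2 suggestions (the filter counts and regularization factor are just examples):
from keras.layers import Conv2D
from keras.regularizers import l2
# learnable downsampling instead of max pooling
pool_1 = Conv2D(32, kernel_size=(1, 1), strides=2, padding='valid')(conv_1)
# L2 weight penalty on a convolution
conv_2 = Conv2D(64, (3, 3), activation='relu', padding='same',
                kernel_regularizer=l2(1e-4))(pool_1)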