Keras extremely high loss, not decreasing with each epoch - python

I'm using Keras to build and train a recurrent neural network.
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Masking
from keras.layers.recurrent import LSTM
#build and train model
in_dimension = 3
hidden_neurons = 300
out_dimension = 2
model = Sequential()
model.add(Masking([0,0,0], input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=True, input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=False))
model.add(Dense(out_dimension))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
model.fit(padded_training_seqs, training_final_steps, nb_epoch=5, batch_size=1)
padded_training_seqs is an array of sequences of [latitude, longitude, temperature], all padded to the same length with values of [0,0,0]. When I train this network, the first epoch gives me a loss of about 63, which increases over subsequent epochs.
This is causing a model.predict call later in the code to give values that are completely off from the training values. For example, most of the training values in each sequence are around [40, 40, 20], but the RNN consistently outputs values around [0.4, 0.5], which makes me think something is wrong with the masking layer.
The training X (padded_training_seqs) data looks something like this (only much larger):
[
[[43.103, 27.092, 19.078], [43.496, 26.746, 19.198], [43.487, 27.363, 19.092], [44.107, 27.779, 18.487], [44.529, 27.888, 17.768]],
[[44.538, 27.901, 17.756], [44.663, 28.073, 17.524], [44.623, 27.83, 17.401], [44.68, 28.034, 17.601], [0,0,0]],
[[47.236, 31.43, 13.905], [47.378, 31.148, 13.562], [0,0,0], [0,0,0], [0,0,0]]
]
and the training Y (training_final_steps) data looks like this:
[
[44.652, 39.649], [37.362, 54.106], [37.115, 57.66501]
]

I am somewhat certain that you're misusing the Masking layer from Keras. Check the Keras documentation on masking for more details.
Try using a masking layer like:
model.add(Masking(0, input_shape=(max_sequence_length, in_dimension)))
because I believe it just needs the scalar mask value, not a whole timestep vector (i.e. [0,0,0]); a timestep is masked when all of its features equal that value.
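To see how the scalar mask value behaves, here is a small illustrative check (written against tf.keras for convenience; the masking semantics are the same). Only the all-zero timestep gets masked:
import numpy as np
import tensorflow as tf

# One sample with two timesteps: only the second is [0, 0, 0].
x = np.array([[[43.1, 27.0, 19.0],
               [0.0, 0.0, 0.0]]])
masking = tf.keras.layers.Masking(mask_value=0.0)
print(masking.compute_mask(x))  # -> [[ True False]]: the padded timestep is masked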
Best of luck.

Related

How to develop a neural network to predict joint angles from joint positions and orientations

I am completely new to neural networks. I have a dataset of 3D joint positions (6400*23*3) and orientations as quaternions (6400*23*4), and I want to predict the joint angles for all 22 joints and 3 motion planes (6400*22*3). I have tried to build a model, but it will not run because the input data don't match the output shape, and I can't figure out how to change it.
My code:
import scipy
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.utils import to_categorical
Jaload = scipy.io.loadmat('JointAnglesXsens11MovementsIforlængelse.mat')
Orload = scipy.io.loadmat('OrientationXsens11MovementsIforlængelse.mat')
Or = np.array((Orload['OR'][:,:]), dtype='float')
Ja = np.array((Jaload['JA'][:,:]), dtype='float')
Jalabel = np.array(Ja)
a = 0.6108652382
Jalabel[Jalabel<a] = 0
Jalabel[Jalabel>a] = 1
Ja3d = np.array(Jalabel.reshape(6814,22,3))  # there are 22 joint angles
Or3d = np.array(Or.reshape(6814,23,4))  # there are 23 segments
X_train = np.array(Or3d)
Y_train = np.array(Ja3d)
model = Sequential([
    Dense(64, activation='relu', input_shape=(23,4)),
    Dense(64, activation='relu'),
    Dense(3, activation='softmax'),
])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam')  # works
model.fit(
    X_train,
    to_categorical(Y_train),
    epochs=3,
)
Running model.fit fails with:
ValueError: A target array with shape (6814, 22, 3, 2) was passed for an output of shape (None, 3) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output.
Here are some suggestions that might get you further down the road:
(1) You might want to insert a "Flatten()" layer just before the final Dense. This will basically collapse the output from the previous layers into a single dimension.
(2) You might want to make the final Dense layer have 22*3=66 units as opposed to three. Each output unit will represent a particular joint angle.
(3) You might want to likewise collapse the Y_train to be (num_samples, 22*3) using the numpy reshape.
(4) You might want to make the final Dense layer have "linear" activation instead of "softmax" - softmax will force the outputs to sum to 1 as a probability.
(5) Don't convert the y_train to categorical. It is already in the correct format I believe (after you reshape it to match the revised output of the model).
(6) The loss to use is probably not "categorical_crossentropy" but perhaps "mse" (mean squared error); see the sketch below.
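A minimal sketch that puts these suggestions together (assuming the X_train and Y_train shapes from the question) might look like:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Predict all 22*3 = 66 joint angles with a linear output and train against the
# reshaped targets, with no to_categorical conversion.
Y_train_flat = Y_train.reshape(len(Y_train), 22 * 3)

model = Sequential([
    Dense(64, activation='relu', input_shape=(23, 4)),
    Dense(64, activation='relu'),
    Flatten(),
    Dense(22 * 3, activation='linear'),
])
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, Y_train_flat, epochs=3)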
Hopefully, some of the above will help move you in the right direction. I hope this helps.

Is there a way to speed up Embedding layer in tf.keras?

I'm trying to implement an LSTM model for DNA sequence classification, but at the moment it is unusable because of how long it takes to train (25 seconds per epoch over 6.5K sequences, about 4ms per sample, and we need to train several versions of the model over 100s of thousands of sequences).
DNA sequence can be represented as a string of A, C, G, and T, e.g. "ACGGGTGACAT" could be an example of a single DNA sequence. Each sequence belongs to one of two categories that I am trying to predict and each sequence contains 1000 characters.
Initially, my model did not include an Embedding layer and instead I manually converted each sequence into a one-hot encoded matrix (4 rows by 1000 columns) and the model didn't work great but was incredibly fast. At this point though I had seen online that using an embedding layer has clear advantages. So I added an embedding layer and instead of using the one-hot encoded matrix I converted the sequences into integers with each character represented by a different integer.
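For reference, the integer conversion looks roughly like the following (the exact mapping is arbitrary; 0 is kept free so mask_zero can treat it as padding):
# Hypothetical encoding: each base maps to its own integer, 0 is reserved for padding.
char_to_int = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
encoded = [char_to_int[c] for c in "ACGGGTGACAT"]
# encoded == [1, 2, 3, 3, 3, 4, 3, 1, 2, 1, 4]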
Indeed the model works much better now, but it is about 30 times slower and impossible to work with. Is there something I can do here to speed up the embedding layer?
Here are the functions for constructing and fitting my model:
from tensorflow.keras.layers import Embedding, Dense, LSTM, Activation
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

def build_model():
    # initialize a sequential model
    model = Sequential()
    # add embedding layer
    model.add(Embedding(5, 1, input_length=1000, mask_zero=True))
    # Add LSTM layer
    model.add(LSTM(5))
    # Add Dense NN layer
    model.add(Dense(units=2))
    model.add(Activation('softmax'))
    optimizer = Adam(clipnorm=1.)
    model.compile(
        loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy']
    )
    return model

def train_model(X_train, y_train, epochs, batch_size):
    model = build_model()
    # y_train is initially a list of zeroes and ones, needs to be converted to categorical
    y_train = to_categorical(y_train)
    history = model.fit(
        X_train, y_train, epochs=epochs, batch_size=batch_size
    )
    return model, history
Any help will be greatly appreciated - after much googling and trial-and-error, I can't seem to speed this up.
A possible suggestion is to use a "cheaper" RNN, such as SimpleRNN instead of LSTM. It has fewer parameters to train. In some simple testing, I got a ~3x speed-up over LSTM with the same Embedding processing you currently have. Not sure whether you can reduce the sequence length from 1000 to a lower number, but that might be a direction to explore as well. I hope this helps.
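For illustration, a sketch of build_model with that swap (everything else unchanged):
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Activation
from tensorflow.keras.optimizers import Adam

def build_model():
    model = Sequential()
    model.add(Embedding(5, 1, input_length=1000, mask_zero=True))
    # SimpleRNN has fewer parameters per unit than LSTM, so each epoch is cheaper.
    model.add(SimpleRNN(5))
    model.add(Dense(units=2))
    model.add(Activation('softmax'))
    model.compile(loss="categorical_crossentropy",
                  optimizer=Adam(clipnorm=1.),
                  metrics=['accuracy'])
    return model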

What is the timestep in Keras' LSTM?

I have some trouble with the LSTM implementation in Keras.
My training set is structured as follows:
number of sequences: 5358
the length of each sequence is 300
each element of the sequence is a vector of 54 features
I'm unsure on how to shape the input for a stateful LSTM.
Following this tutorial: http://philipperemy.github.io/keras-stateful-lstm/, I've created the subsequences (in my case there are 1452018 subsequences with a window_size = 30).
What is the best option to reshape the data for a stateful LSTM's input?
What does the timestep of the input mean in this case? And why?
Is the batch_size related to the timestep?
I'm unsure on how to shape the input for a stateful LSTM.
LSTM(100, stateful=True)
But before using a stateful LSTM, ask yourself whether you really need one. See here and here for more details.
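If you do go stateful, note that Keras also requires a fixed batch size, declared via batch_input_shape. A hedged sketch (the batch size and 30-step windows below are placeholders based on your subsequence setup):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# batch_input_shape = (batch_size, timesteps, features); with stateful=True the
# state of each of the 6 sequence slots is carried across batches until
# model.reset_states() is called.
model = Sequential()
model.add(LSTM(100, stateful=True, batch_input_shape=(6, 30, 54)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')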
What is the best option to reshape the data for a stateful LSTM's input?
It really depends on the problem at hand. However, I think you do not need any reshaping; just feed the data directly into Keras:
input_layer = Input(shape=(300, 54))
What does the timestep of the input mean in this case? And why?
In your example the timestep is 300, i.e. the length of each sequence: the LSTM is unrolled over 300 steps, consuming one 54-feature vector per step. See here for further details on timesteps.
Is the batch_size related to the timestep?
No, the timestep has nothing to do with the batch_size. More details on batch_size can be found here.
Here is some simple code based on the description you provided. It might give you some intuition:
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

x_train = np.zeros(shape=(5358, 300, 54))
y_train = np.zeros(shape=(5358, 1))

input_layer = Input(shape=(300, 54))
lstm = LSTM(100)(input_layer)
dense1 = Dense(20, activation='relu')(lstm)
dense2 = Dense(1, activation='sigmoid')(dense1)

model = Model(inputs=input_layer, outputs=dense2)
model.compile("adam", loss='binary_crossentropy')
model.fit(x_train, y_train, batch_size=512)

Keras model predicting experience from hours

I am very new to Keras, neural networks and machine learning, having just started to learn yesterday. I decided to try predicting the experience a user would earn in a given hour of the day (0 to 23), for a game, using my own generated data-set. Currently, the predictions from what I have are very low and very poor. I have tried a relu activation, which produced predictions that were all zero, and, after a bit of research, LeakyReLU.
This is the code I have for the prediction model so far:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LeakyReLU
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("experience.csv", delimiter=",")
X = dataset[:, 0]
Y = dataset[:, 1]
model = Sequential()
model.add(Dense(12, input_dim = 1, activation=LeakyReLU(0.3)))
model.add(Dense(8, activation=LeakyReLU(0.3)))
model.add(Dense(1, activation=LeakyReLU(0.3)))
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=120, batch_size=10, verbose=0)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
I have also tried playing around with the hidden levels of the network, but honestly have no idea how many there should be or a good way to justify an amount.
If it helps here is the data-set I have been using:
https://raw.githubusercontent.com/NightShadeII/xpPredictor/master/experience.csv
Thank you for any help.
Looking at your data, this does not seem like a classification problem.
You have two options:
-> Look at the second column and bucket the values into ranges to form classes that can be predicted, for instance 0, 1, 2, etc. Right now the model trains, but it does not have nearly enough examples for the huge number of classes it thinks you are trying to predict.
-> If you want a real-valued output rather than classes, try using regression instead (see the sketch below).
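A minimal sketch of the regression route, keeping your layer sizes but with a linear output and a regression loss (no accuracy metric, since it is not classification):
from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

# Same hidden sizes as the question, but the output is a single linear unit
# and the loss is mean squared error, so the model predicts a real value.
model = Sequential()
model.add(Dense(12, input_dim=1))
model.add(LeakyReLU(0.3))
model.add(Dense(8))
model.add(LeakyReLU(0.3))
model.add(Dense(1))  # linear output for a real-valued target
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, Y, epochs=120, batch_size=10, verbose=0)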

How to decide the size of layers in Keras' Dense method?

Below is a simple example of a multi-class classification task with the Iris data.
import seaborn as sns
import numpy as np
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the number of epochs, the accuracy increases until it
# drops at a certain point: epoch 50 gives accuracy 0.99, which drops to
# 0.977 by epoch 70.
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is how do people usually decide the size of layers?
For example based on code above we have:
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
where the first parameter of Dense is 16 and the second is 3.
Why do the two layers use different values for Dense?
How do we choose the best value for Dense?
Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split from your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: for each parameter, decide a range and steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested, and takes a lot of time.
Random search: do the same, but just define a range for each parameter and try a random set of parameters drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can. This is just an informed random guess (see the sketch below).
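As an illustration, a random search over the hidden size for the Iris model above might look like this sketch (it assumes you have also split off a validation set, val_X / val_y_ohe):
import random
from keras.models import Sequential
from keras.layers import Dense

def build_model(units):
    # Same kind of architecture as above, with the hidden size as a hyperparameter.
    model = Sequential()
    model.add(Dense(units, input_shape=(4,), activation='tanh'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# Sample a hidden size, train, and keep whichever does best on the validation data.
best_acc, best_units = 0.0, None
for _ in range(10):
    units = random.choice([8, 16, 32, 64, 128])
    model = build_model(units)
    model.fit(train_X, train_y_ohe, epochs=50, verbose=0)
    _, acc = model.evaluate(val_X, val_y_ohe, verbose=0)
    if acc > best_acc:
        best_acc, best_units = acc, units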
Unfortunately, there is no other way to tune such parameters. As for layers having different numbers of neurons, that can come from the tuning process, or you can see it as dimensionality reduction, like a compressed version of the previous layer.
There is no known way to determine a good network structure just from the number of inputs or outputs. It depends on the number of training examples, the batch size, the number of epochs, basically, on every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a low number of units can cause the model to have high bias and low accuracy. Once again, it depends on the size of the data used for training.
Sadly, it comes down to trying different values and seeing which give you the best results. You may choose the combination that gives you the lowest loss and validation loss, as well as the best accuracy for your dataset, as said in the previous post.
You could scale the number of units in proportion to the number of classes, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation='relu'))
model.add(Dropout(0.2))
# Output layer
model.add(Dense(num_classes, activation='softmax'))
The model above shows an example of a classification system. num_classes is the number of different categories the system has to choose from. For instance, in the Iris dataset we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could lead to worse results than other values. We need to adjust the parameters to the training dataset by making several different attempts and then analysing the results, looking for the best combination of parameters.
My suggestion is to use EarlyStopping(), then check the number of epochs run and the accuracy against the test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
lrd = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.8, min_lr=1e-6)
es = EarlyStopping(verbose=1, patience=2)
his = classifier.fit(X_train, y_train, epochs=500, batch_size=128, validation_split=0.1, verbose=1, callbacks=[lrd, es])
