I need help implementing a sequence classifier prediction using Keras APIs.
So far, I've managed to get the data into a format that I believe should be suitable for input into Keras; however, I'm still failing to understand exactly which parameters I need to change.
Here is the situation:
What I have is a number of targets, with each target representing an independent event.
Each target contains a varying number of detections. For example, one target might contain 33 detections, while another might contain 54. Each detection is just a single value between 0 and 1. The original dataset has a shape of (# samples, # detections).
I want to be able to input the sequence of these detections into an LSTM to classify each target into one of two classes, for ALL targets.
So far, I've prepended 0s to the detection sequences so that they are all equal in length. Now the dataset has a shape of (# samples, 77 (max detections across all targets)).
Then I create time steps with an arbitrary window size of 7. The dataset now has shape (# samples, 77 - window + 1 = 71, 7).
In case this isn't quite clear, each sequence has been turned from one long sequence
[1, 2, 3, ... 77]
into 71 sequences of 7 that look like:
[[1, 2, 3, 4, 5, 6, 7],
[2, 3, 4, 5, 6, 7, 8],
...,
[71, 72, 73, 74, 75, 76, 77]]
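For reference, a rough sketch of how such windows could be built with numpy (sliding_window_view is just my assumption about the construction; the padded array name is hypothetical):

import numpy as np

padded = np.zeros((31179, 77))  # hypothetical zero-padded dataset: (# samples, max detections)
window = 7
windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=1)
print(windows.shape)  # (31179, 71, 7) -> (# samples, # windows per sample, window)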
Now that my data is in the format (# samples, # windows per sample, window), what tweaks should I make in order to obtain the output of 1 classification output per sample?
I've tried looking at Keras's documentation for the TimeDistributed and LSTM layers, at the blog posts on MachineLearningMastery, and at other forum posts, but I couldn't understand enough to figure out how to use the API for my specific case.
Here's what I have so far:
train_new.shape
output: (31179, 71, 7)
model = Sequential()
model.add(LSTM(100, input_shape=(71, 7), return_sequences=True))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(train_new, label_train, validation_data=(test, label_test), epochs=3, batch_size=128)
Returns:
ValueError: Error when checking target: expected time_distributed_3 to have 3 dimensions, but got array with shape (31179, 1)
Any direction or guidance would be greatly appreciated.
Thanks for your time!
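For what it's worth, the shape mismatch comes from TimeDistributed producing one output per window (shape (batch, 71, 1)) while the labels have shape (# samples, 1). A minimal sketch of a per-sample classifier, assuming the shapes above, would drop TimeDistributed and let the LSTM return only its final output:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(71, 7)))   # return_sequences=False (default): one vector per sample
model.add(Dense(1, activation='sigmoid'))   # single classification output per sample
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(train_new, label_train, ...) then expects labels of shape (# samples,) or (# samples, 1)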
Background
Hello everyone,
I'm working on (what I thought would be) a simple RNN using Google Colab [Tensorflow 2.9.2 and Keras 2.9.0]. I've been working through this for a while now, but I can't quite seem to get everything to play nice. The inputs to my RNN are sequences of the numbers 0 ~ 6 inclusive expressed as one-hot-encoded column vectors. The targets are just a single 0 ~ 6 value expressed as a one-hot-encoded row vector.
This link to a screenshot of my Colab describes...
Input of [0] -> Target of 6
Input of [0, 6] -> Target of 0
Input of [0, 6, 0, 3] -> Target of 0
Input of [0, 6, 0, 3, 0] -> Target of 5
From what I've been able to gather from other stackoverflow questions, blog posts, keras documentation, etc., the code below should be close to all I need for my use case as far as my model is concerned.
# Building RNN Model
model = None
model = keras.Sequential()
model.add(keras.Input((None, 7)))
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(7, activation='softmax'))
model.summary()
# Compiling RNN Model
model.compile(
    loss = keras.losses.CategoricalCrossentropy(),
    optimizer="sgd",
    metrics=["accuracy"],
)
The Problem
I'm very sure that my issue is related to every sample input being a vector or matrix of a different size. For example, a sequence of [0] would become a (7, 1) vector input for that particular timestep while [0, 5, 4, 1, 2, 3] would become a (7, 6) matrix input for its respective timestep. Based on the error messages I've received for the last several hours, I know keras isn't too pleased with that, but for what I'm trying to do, I'm not entirely sure of the best way forward.
I've manually split up my training and test sets.
( Image of code with output )
For clarity...
x_train and x_test -> A list of numpy arrays each with a variable column count (e.g., np.shape=(7, ???))
y_train and y_test -> A list of numpy arrays each with a constant size (e.g., np.shape=(1,7))
I'm quite sure my types are correct.
I'm fitting my model without anything extravagant.
# Fitting the RNN Model
model.fit(
    x_train, y_train, validation_data=(x_test, y_test), epochs = 50
)
That said, I continue to receive a ValueError saying "Layer Sequential_??? expects 1 input(s), but it received ??? input tensors."
( Image of Value Error )
Any help at all in this matter would be greatly appreciated!
Thank you all in advance!
As it turns out, I needed a masking layer for what I was trying to do.
Rather, I believe a masking layer solved two underlying problems...
My dimensions not being precisely what Keras wanted [samples, timesteps, features]
Not knowing how to make my input structure work when every sample had a variable timestep value. I can't shake the feeling that there's still a way to do it without padding, but my model can now 'fit' for the first time in hours. So, I'm going to call this success on some level.
Link to Embeddings and Masking
*With a special shoutout to keras.preprocessing.sequence.pad_sequences()
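To make that concrete, here is a rough sketch of the padding-plus-masking approach described above (it assumes each sample has been transposed to (timesteps, 7), i.e. time-major, which differs from the (7, ???) layout mentioned earlier):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# x_train: list of one-hot arrays, each of shape (timesteps, 7) with varying timesteps
# y_train: list of one-hot targets, each of shape (1, 7)
x_padded = keras.preprocessing.sequence.pad_sequences(x_train, padding='pre', dtype='float32')
y_stacked = np.vstack(y_train)  # (num_samples, 7)

model = keras.Sequential()
model.add(keras.Input((None, 7)))
model.add(layers.Masking(mask_value=0.0))        # ignore the zero-padded timesteps
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_padded, y_stacked, validation_split=0.1, epochs=50)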
Please see the Python code below; I put comments in the code where I felt emphasis was required.
import keras
import numpy

def build_model():
    model = keras.models.Sequential()
    model.add(keras.layers.LSTM(3, input_shape=(3, 1), activation='elu'))  # Number of LSTM cells in this layer = 3.
    return model

def build_data():
    inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    outputs = [10, 11, 12, 13, 14, 15, 16, 17, 18]
    inputs = numpy.array(inputs)
    outputs = numpy.array(outputs)
    inputs = inputs.reshape(3, 3, 1)  # Number of samples = 3, number of input vectors per sample = 3, size of each input vector = 1.
    outputs = outputs.reshape(3, 3)   # Number of target samples = 3, number of outputs per target sample = 3.
    return inputs, outputs

def train():
    model = build_model()
    model.summary()
    model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['accuracy'])
    x, y = build_data()
    model.fit(x, y, batch_size=1, epochs=4000)
    model.save("LSTM_testModel")

def apply():
    model = keras.models.load_model("LSTM_testModel")
    input = [[[7], [8], [9]]]
    input = numpy.array(input)
    print(model.predict(input))

def main():
    train()

main()
My understanding is that for each input sample there are 3 input vectors. Each input vector goes to an LSTM cell. i.e. For sample 1, input vector 1 goes to LSTM cell 1, input vector 2 goes to LSTM cell 2 and so on.
Looking at tutorials on the internet, I've seen that the number of LSTM cells is much greater than the number of input vectors e.g. 300 LSTM cells.
So, say for example I have 3 input vectors per sample, what input goes to the remaining 297 LSTM cells?
I tried compiling the model to have 2 LSTM cells and it still accepted the 3 input vectors per sample, although I had to change the target outputs in the training data to accommodate this (change the dimensions). So what happened to the third input vector of each sample... is it ignored?
I believe the above image shows that each input vector (of an arbitrary scenario) is mapped to a specific RNN cell. I may be misinterpreting it. Above image taken from the following URL: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I will try to answer some of your questions and then will consolidate the information provided in the comments for completeness, for the benefit of you as well as for the Community.
As mentioned by Matias in the comments, irrespective of whether the Number of Inputs is more than or less than the Number of Units/Neurons, they will be connected like a Fully Connected Network, as shown below.
To understand how RNN/LSTM work internally, let's assume we have
Number of Input Features => 3 => F1, F2 and F3
Number of Timesteps => 2 => 0 and 1
Number of Hidden Layers => 1
Number of Neurons in each Hidden Layer => 5
Then what actually happens inside can be represented in the screenshots shown below:
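To make those numbers concrete, here is a minimal sketch (assuming a standard Keras LSTM layer with the figures above) showing that the layer's weight count depends only on the number of features and units, not on the number of timesteps:

import keras

model = keras.models.Sequential()
model.add(keras.layers.LSTM(5, input_shape=(2, 3)))  # 5 units, 2 timesteps, 3 features
model.summary()
# Trainable parameters of the LSTM layer: 4 * ((3 + 5) * 5 + 5) = 180,
# the same no matter how many timesteps are fed in.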
You have also asked about words being assigned to LSTM cells. I'm not sure which link you are referring to or whether it is correct, but in simple terms (the words in this screenshot would actually be replaced by embedding vectors), you can understand how an LSTM handles text as shown in the screenshot below:
For more information, please refer to the Beautiful Explanation by OverLordGoldDragon and Daniel Moller.
Hope this helps. Happy Learning!
I built an LSTM in Keras. It reads observations of 9 time-lags, and predicts the next label. For some reason, the model I trained is predicting something that is nearly a straight line. What issue might there be in the model architecture that is creating such a bad regression result?
Input Data: Hourly financial time-series, with a clear upward trend 1200+ records
Input Data Dimensions:
- originally:
X_train.shape (1212, 9)
- reshaped for LSTM:
Z_train.shape (1212, 1, 9)
array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
0.47649667, 0.46017738]],
[[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
0.46017738, 0.47167775]],
Target data: y_train
69200 0.471678
69140 0.476364
69080 0.467761
...
7055 0.924937
7017 0.923651
7003 0.906253
Name: Close, Length: 1212, dtype: float64
type(y_train)
<class 'pandas.core.series.Series'>
LSTM design:
my = Sequential()
my.add(LSTM((20),batch_input_shape=(None,1,9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))
Input layer of 9 nodes. 3 hidden layers of 20 units each. 1 output layer of 1 unit.
The Keras default is return_sequences=False
Model is compiled with mse loss, and adam or sgd optimizer.
curr_model.compile(optimizer=optmfunc, loss="mse")
Model is fit in this manner. Batch is 32, shuffle can be True/False
curr_model.fit(Z_train, y_train,
               validation_data=(Z_validation, y_validation),
               epochs=noepoch, verbose=0,
               batch_size=btchsize,
               shuffle=shufBOOL)
Config and Weights are saved to disk. Since I'm training several models, I load them afterward to test certain performance metrics.
spec_model.model.save_weights(mname_trn)
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
    json_file.write(mkerascfg)
When I trained an MLP, I got this result against the validation set:
I've trained several of the LSTMs, but the result against the validation set looks like this:
The 2nd plot (LSTM plot) is of the validation data. This is y_validation versus predictions on Z_validation. They are the last 135 records in the respective arrays. These were split out of the full data (i.e. validation), and have the same type/properties as Z_train and y_train. The x-axis is just the index numbering from 0 to 134, and the y-axis is the value of y_validation or the prediction. Units are normalized in both arrays, so all the units are the same. The "straight" line is the prediction.
What idea could you suggest on why this is happening?
- I've changed batch sizes. Similar result.
- I've tried changing the return_sequences, but it leads to various errors around shape for subsequent layers, etc.
Information about LSTM progression of MSE loss
There are 4 models trained, all with the same issue of course. We'll just focus on the 3-hidden-layer, 20-units-per-layer LSTM, as defined above. (Mini-batch size was 32, and shuffling was disabled, but enabling it changed nothing.)
This is a slightly zoomed-in image of the loss progression for the first model (adam optimizer).
From what I can tell by messing with the index, the bounce in the loss values (which creates the thick area) starts somewhere in the 500s of epochs.
Your code has a single critical problem: dimensionality shuffling. LSTM expects inputs to be shaped as (batch_size, timesteps, channels) (or (num_samples, timesteps, features)) - whereas you're feeding one timestep with nine channels. Backpropagation through time never even takes place.
Fix: reshape inputs as (1212, 9, 1).
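For example (a small sketch, using the X_train array from the question):

import numpy as np

# X_train has shape (1212, 9): 1212 samples, 9 lags each
Z_train = X_train.reshape(1212, 9, 1)        # -> (samples, timesteps, features) = (1212, 9, 1)
# equivalently: Z_train = np.expand_dims(X_train, axis=-1)
# the first LSTM layer then takes batch_input_shape=(None, 9, 1) instead of (None, 1, 9)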
Suggestion: read this answer. It's long, but could save you hours of debugging; this information isn't available elsewhere in such a compact form, and I wish I'd had it when starting out with LSTMs.
The answer to a related question may also prove useful, but the previous link is more important.
OverLordGoldDragon is right: the problem is with the dimensionality of the input.
As you can see in the Keras documentation, all recurrent layers expect the input to be a 3D tensor with shape (batch_size, timesteps, input_dim).
In your case:
the input has 9 time lags that need to be fed to the LSTM in sequence, so they are timesteps
the time series contains only one financial instrument, so the input_dim is 1
Hence, the correct way to reshape it is: (1212, 9, 1)
Also, make sure to respect the order in which data is fed to the LSTM. For forecasting problems it is better to feed the lags from the most ancient to the most recent, since we are going to predict the next value after the most recent.
Since the LSTM reads the input from left to right, the 9 values should be ordered as: x_t-9, x_t-8, ...., x_t-1 from left to right, i.e. the input and output tensors should look like this:
Z = [[[0], [1], [2], [3], [4], [5], [6], [7], [8]],
[[1], [2], [3], [4], [5], [6], [7], [8], [9]],
...
]
y = [9, 10, ...]
If they are not oriented as such you can always set the LSTM flag go_backwards=True to have the LSTM read from right to left.
Also, make sure to pass numpy arrays and not pandas Series as X and y, as Keras sometimes gets confused by Pandas.
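For instance (a quick sketch using the y_train Series from the question):

import numpy as np

y_train = np.asarray(y_train)   # convert the pandas Series 'Close' to a plain numpy array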
For a full example of doing time series forecasting with Keras take a look at this notebook
Context:
I am currently working on time series prediction using Keras with Tensorflow backend and, therefore, studied the tutorial provided here.
Following this tutorial, I came to the point where the generator for the fit_generator() method is described.
The output this generator generates is as follows (left sample, right target):
[[[10. 15.]
[20. 25.]]] => [[30. 35.]] -> Batch no. 1: 2 Samples | 1 Target
---------------------------------------------
[[[20. 25.]
[30. 35.]]] => [[40. 45.]] -> Batch no. 2: 2 Samples | 1 Target
---------------------------------------------
[[[30. 35.]
[40. 45.]]] => [[50. 55.]] -> Batch no. 3: 2 Samples | 1 Target
---------------------------------------------
[[[40. 45.]
[50. 55.]]] => [[60. 65.]] -> Batch no. 4: 2 Samples | 1 Target
---------------------------------------------
[[[50. 55.]
[60. 65.]]] => [[70. 75.]] -> Batch no. 5: 2 Samples | 1 Target
---------------------------------------------
[[[60. 65.]
[70. 75.]]] => [[80. 85.]] -> Batch no. 6: 2 Samples | 1 Target
---------------------------------------------
[[[70. 75.]
[80. 85.]]] => [[90. 95.]] -> Batch no. 7: 2 Samples | 1 Target
---------------------------------------------
[[[80. 85.]
[90. 95.]]] => [[100. 105.]] -> Batch no. 8: 2 Samples | 1 Target
In the tutorial the TimeSeriesGenerator was used, but for my question it is secondary whether a custom generator or this class is used.
Regarding the data, we have 8 steps_per_epoch and a sample of shape (8, 1, 2, 2).
The generator is fed to a Recurrent Neural Network, implemented by an LSTM.
My questions
fit_generator() only allows a single target per batch, as outputted by the TimeSeriesGenerator.
When I first read about the option of batches for fit(), I thought that I could have multiple samples and a corresponding number of targets (which are processed batchwise, meaning row by row). But this is not allowed by fit_generator() and, therefore, obviously false.
This would look for example like:
[[[10. 15. 20. 25.]]] => [[30. 35.]]
[[[20. 25. 30. 35.]]] => [[40. 45.]]
|-> Batch no. 1: 2 Samples | 2 Targets
---------------------------------------------
[[[30. 35. 40. 45.]]] => [[50. 55.]]
[[[40. 45. 50. 55.]]] => [[60. 65.]]
|-> Batch no. 2: 2 Samples | 2 Targets
---------------------------------------------
...
Secondly, I thought that, for example, [10, 15] and [20, 25] were used as input for the RNN consecutively for the target [30, 35], meaning that this is analogous to inputting [10, 15, 20, 25]. Since the output from the RNN differs using the second approach (I tested it), this also has to be a wrong conclusion.
Hence, my questions are:
Why is only one target per batch allowed (I know there are some workarounds, but there has to be a reason)?
How may I understand the calculation of one batch? Meaning, how is some input like [[[40, 45], [50, 55]]] => [[60, 65]] processed, and why is it not analogous to [[[40, 45, 50, 55]]] => [[60, 65]]?
Edit according to today's answer
Since there is some misunderstanding about my definition of samples and targets, I follow what I understand Keras is trying to tell me when it says:
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 2 target samples.
This error occurs, when I create for example a batch which looks like:
#This is just a single batch - Multiple batches would be fed to fit_generator()
(array([[[0, 1, 2, 3, 4],
         [5, 6, 7, 8, 9]]]),
 array([[ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]]))
This is supposed to be a single batch containing two time-sequences of length 5 (5 consecutive data points / time-steps), whose targets are also two corresponding sequences. [ 5, 6, 7, 8, 9] is the target of [0, 1, 2, 3, 4] and [10, 11, 12, 13, 14] is the corresponding target of [5, 6, 7, 8, 9].
The sample-shape in this would be shape(number_of_batches, number_of_elements_per_batch, sequence_size) and the target-shape shape(number_of_elements_per_batch, sequence_size).
Keras sees 2 target samples (in the ValueError) because I have to provide 3D samples as input and 2D targets as output (maybe I just don't get how to provide 3D targets...).
Anyhow, according to @today's answer/comments, this is interpreted as two timesteps and five features by Keras. Regarding my first question (where I still see a sequence as the target for my sequence, as in this edit example), I seek information on how/whether I can achieve this and what such a batch would look like (as I tried to visualize in the question).
Short answers:
Why is only one target per batch allowed (I know there are some workarounds, but there has to be a reason)?
That's not the case at all. There is no restriction on the number of target samples in a batch. The only requirement is that you should have the same number of input and target samples in each batch. Read the long answer for further clarification.
How may I understand the calculation of one batch? Meaning, how is some input like [[[40, 45], [50, 55]]] => [[60, 65]] processed and why is it not analogous to [[[40, 45, 50, 55]]] => [[60, 65]]?
The first one is a multi-variate timeseries (i.e. each timestep has more than one feature), and the second one is a uni-variate timeseries (i.e. each timestep has one feature). So they are not equivalent. Read the long answer for further clarification.
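To see the difference purely in terms of array shapes (a small illustration using the numbers from the question):

import numpy as np

multi_var = np.array([[[40, 45], [50, 55]]])      # (1, 2, 2): 2 timesteps with 2 features each
uni_var = np.array([[[40], [45], [50], [55]]])    # (1, 4, 1): 4 timesteps with 1 feature each
print(multi_var.shape, uni_var.shape)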
Long answer:
I'll give the answer I mentioned in the comments section and try to elaborate on it using examples:
I think you are mixing samples, timesteps, features and targets. Let me describe how I understand it: in the first example you provided, it seems that each input sample consists of 2 timesteps, e.g. [10, 15] and [20, 25], where each timestep consists of two features, e.g. 10 and 15 or 20 and 25. Further, the corresponding target consists of one timestep, e.g. [30, 35], which also has two features. In other words, each input sample in a batch must have a corresponding target. However, the shape of each input sample and its corresponding target may not necessarily be the same.
For example, consider a model where both its input and output are timeseries. If we denote the shape of each input sample as (input_num_timesteps, input_num_features) and the shape of each target (i.e. output) array as (output_num_timesteps, output_num_features), we would have the following cases:
1) The number of input and output timesteps are the same (i.e. input_num_timesteps == output_num_timesteps). Just as an example, the following model could achieve this:
from keras import layers
from keras import models
inp = layers.Input(shape=(input_num_timesteps, input_num_features))
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(..., return_sequences=True)(x)
# a final RNN layer that has `output_num_features` units
out = layers.LSTM(output_num_features, return_sequences=True)(x)
model = models.Model(inp, out)
2) The number of input and output timesteps are different (i.e. input_num_timesteps != output_num_timesteps). This is usually achieved by first encoding the input timeseries into a vector using a stack of one or more LSTM layers, and then repeating that vector output_num_timesteps times to get a timeseries of the desired length. For the repeat operation, we can easily use the RepeatVector layer in Keras. Again, just as an example, the following model could achieve this:
from keras import layers
from keras import models
inp = layers.Input(shape=(input_num_timesteps, input_num_features))
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(...)(x) # The last layer ONLY returns the last output of RNN (i.e. return_sequences=False)
# repeat `x` as needed (i.e. as the number of timesteps in output timseries)
x = layers.RepeatVector(output_num_timesteps)(x)
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(x)
# ...
out = layers.LSTM(output_num_features, return_sequences=True)(x)
model = models.Model(inp, out)
As a special case, if the number of output timesteps is 1 (e.g. the network is trying to predict the next timestep given the last t timesteps), we may not need to use repeat and instead we can just use a Dense layer (in this case the output shape of the model would be (None, output_num_features), and not (None, 1, output_num_features)):
inp = layers.Input(shape=(input_num_timesteps, input_num_features))
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(...)(x) # The last layer ONLY returns the last output of RNN (i.e. return_sequences=False)
out = layers.Dense(output_num_features, activation=...)(x)
model = models.Model(inp, out)
Note that the architectures provided above are just for illustration, and you may need to tune or adapt them, e.g. by adding more layers such as Dense layer, based on your use case and the problem you are trying to solve.
Update: The problem is that you are not paying enough attention when reading both my comments and answer, as well as the error raised by Keras.
... Found 1 input samples and 2 target samples.
So, after reading this carefully, if I were you I would say to myself: "OK, Keras thinks that the input batch has 1 input sample, but I think I am providing two samples!! Since I am a very good person(!), I think it's far more likely that I am wrong than Keras, so let's find out what I am doing wrong!". A simple and quick check would be to just examine the shape of the input array:
>>> np.array([[[0, 1, 2, 3, 4],
               [5, 6, 7, 8, 9]]]).shape
(1, 2, 5)
"Oh, it says (1,2,5)! So that means one sample which has two timesteps and each timestep has five features!!! So I was wrong into thinking that this array consists of two samples of length 5 where each timestep is of length 1!! So what should I do now???" Well, you can fix it, step-by-step:
# step 1: I want a numpy array
s1 = np.array([])
# step 2: I want it to have two samples
s2 = np.array([
    [],
    []
])
# step 3: I want each sample to have 5 timesteps of length 1 in them
s3 = np.array([
    [
        [0], [1], [2], [3], [4]
    ],
    [
        [5], [6], [7], [8], [9]
    ]
])
>>> s3.shape
(2, 5, 1)
Voila! We did it! This was the input array; now check the target array, it must have two target samples of length 5 each with one feature, i.e. having a shape of (2, 5, 1):
>>> np.array([[ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]]).shape
(2, 5)
Almost! The last dimension (i.e. 1) is missing (NOTE: depending on the architecture of your model you may or may not need that last axis). So we can use the step-by-step approach above to find our mistake, or alternatively we can be a bit clever and just add an axis to the end:
>>> t = np.array([[ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14]])
>>> t = np.expand_dims(t, axis=-1)
>>> t.shape
(2, 5, 1)
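Just to tie this back to the architectures above, a minimal sketch of a model (following case 1, where input and output have the same number of timesteps) that would accept these fixed arrays might look like this; the layer sizes are arbitrary:

from keras import layers
from keras import models

inp = layers.Input(shape=(5, 1))                  # 5 timesteps, 1 feature
x = layers.LSTM(16, return_sequences=True)(inp)
out = layers.LSTM(1, return_sequences=True)(x)    # 1 output feature per timestep
model = models.Model(inp, out)
model.compile(optimizer='adam', loss='mse')
model.fit(s3, t, epochs=1)                        # s3: (2, 5, 1) inputs, t: (2, 5, 1) targets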
Sorry, I can't explain it better than this! But in any case, when you see that something (i.e. shape of input/target arrays) is repeated over and over in my comments and my answer, assume that it must be something important and should be checked.
I'm learning TensorFlow and LSTM and I'm wondering why my prediction output has multiple values when I'm training it to return one. My goal is to get a single value between 0 and 1 after training it with arrays for sentiment analysis.
The training input data looks like:
[[59, 21, ... 118, 194], ... [12, 110, ... 231, 127]]
All input arrays are of the same length padded with 0. The training target data looks like:
[1.0, 0.5, 0.0, 1.0, 0.0 ...]
Model:
model = Sequential()
model.add(Embedding(input_length, 64, mask_zero=True))
model.add(LSTM(100))
model.add(Dense(1, activation=tf.nn.sigmoid))
Why does the prediction seem to evaluate each individual value on its own instead of the array as a whole?
model.predict([192])
# Returns [[0.5491102]]
model.predict([192, 25])
# Returns [[0.5491102, 0.4923803]]
model.predict([192, 25, 651])
# Returns [[0.5491102, 0.4923803, 0.53853387]]
I don't want to take an average of the output because the relationships between the values in the input arrays matter for sentiment analysis. If I'm training to predict a single value I'm not understanding why a single value isn't output. I'm new to TensorFlow, Keras, and layered neural networks, so I'm sure I'm missing something obvious.
When you write:
model.predict([192, 25, 651])
it is as if you are giving the model three input samples, and therefore in return you would get three outputs, one for each input sample. Instead, if by [192, 25, 651] you really mean one input sample, then you should wrap it in two lists:
model.predict([[[192, 25, 651]]])
The reason: the outermost list corresponds to the list of all the input data for all the input layers of the model, which is one here. The second list corresponds to the data for the first (and only) input layer, and the third list corresponds to the one input sample. That's the case with list inputs, since multi-input (and multi-output) Keras models take a list of input arrays as input. A better way is to use a numpy array instead:
model.predict(np.array([[192, 25, 651]]))
np.array([[192, 25, 651]]) has a shape of (1,3) which means one sample of length 3.
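As a quick sanity check (a small sketch, assuming the model above has been built and trained):

import numpy as np

x = np.array([[192, 25, 651]])
print(x.shape)            # (1, 3): one sample with three timesteps
pred = model.predict(x)
print(pred.shape)         # (1, 1): one sigmoid score for the whole sequence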