Keras sequence prediction with multiple simultaneous sequences - python

My question is very similar to what it seems this post is asking, although that post doesn't pose a satisfactory solution. To elaborate, I am currently using keras with tensorflow backend and a sequential LSTM model. The end goal is I have n time-dependent sequences with equal time steps (the same number of points on each sequence and the points are all the same time apart) and I would like to feed all n sequences into the same network so it can use correlations between the sequences to better predict the next step for each sequence. My ideal output would be an n-element 1-D array with array[0] corresponding to the next-step prediction for sequence_1, array[1] for sequence_2, and so on.
My inputs are sequences of single values, so each of n inputs can be parsed into a 1-D array.
I was able to get a working model for each sequence independently using the code at the end of this guide by Jakob Aungiers, although my difficulty is adapting it to accept multiple sequences at once and correlate between them (i.e. be analyzed in parallel). I believe the issue is related to the shape of my input data, which is currently in the form of a 4-D numpy array because of how Jakob's Guide splits the inputs into sub-sequences of 30 elements each to analyze incrementally, although I could also be completely missing the target here. My code (which is mostly Jakob's, not trying to take credit for anything that isn't mine) presently looks like this:
As-is this complains with "ValueError: Error when checking target: expected activation_1 to have shape (None, 4) but got array with shape (4, 490)", I'm sure there are plenty of other issues but I'd love some direction on how to achieve what I'm describing. Anything stick out immediately to anyone? Any help you could give will be greatly appreciated.
Thanks!
-Eric

Keras is already prepared to work with batches containing many sequences; there is no secret at all.
There are two possible approaches, though:
You input your entire sequences (all steps at once) and predict n results
You input only one step of all sequences and predict the next step in a loop
Suppose:
nSequences = 30
timeSteps = 50
features = 1 #(as you said: single values per step)
outputFeatures = 1
First approach: stateful=False:
inputArray = arrayWithShape((nSequences,timeSteps,features))
outputArray = arrayWithShape((nSequences,outputFeatures))
input_shape = (timeSteps,features)
#use layers like this:
LSTM(units) #if the first layer in a Sequential model, add the input_shape
#if you want to return the same number of steps (like a new sequence parallel to the input), use return_sequences=True
Train like this:
model.fit(inputArray,outputArray,....)
Predict like this:
newStep = model.predict(inputArray)
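To make the shapes of the first approach concrete, here is a minimal NumPy-only sketch. The synthetic sine data and the name `raw` are assumptions made for illustration; no Keras model is built, the point is only how the arrays passed to fit would be laid out.

```python
import numpy as np

nSequences, timeSteps, features, outputFeatures = 30, 50, 1, 1

# Hypothetical data: each sequence has timeSteps + 1 points, so the final
# point can serve as the "next step" target for the first timeSteps points.
t = np.arange(timeSteps + 1)
raw = np.sin(0.1 * t[None, :] + np.arange(nSequences)[:, None])  # (30, 51)

# Inputs: all sequences, all steps, one value per step.
inputArray = raw[:, :timeSteps].reshape(nSequences, timeSteps, features)

# Targets: one next-step value per sequence.
outputArray = raw[:, timeSteps].reshape(nSequences, outputFeatures)

# What the first layer's input_shape would be (the batch axis is omitted).
input_shape = inputArray.shape[1:]

print(inputArray.shape, outputArray.shape, input_shape)
```

Note that the target array has one row per sequence, which is the layout the "expected ... shape (None, 4)" error message in the question is asking for.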
Second approach: stateful=True:
inputArray = sameAsBefore
outputArray = inputArray[:,1:] #one step after input array
inputArray = inputArray[:,:-1] #eliminate the last step
batch_input = (nSequences, 1, features) #stateful layers require the batch size
#use layers like this:
LSTM(units, stateful=True) #if the first layer in a Sequential model, add batch_input_shape=batch_input
Train like this:
model.reset_states() #you need this in stateful=True models
#if you don't reset states,
#the stateful model will think that your inputs are new steps of the same previous sequences
for step in range(inputArray.shape[1]): #for each time step
    model.fit(inputArray[:,step:step+1], outputArray[:,step:step+1],shuffle=False,...)
Predict like this:
model.reset_states()
predictions = np.empty(inputArray.shape)
for step in range(inputArray.shape[1]): #for each time step
    predictions[:,step] = model.predict(inputArray[:,step:step+1])
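The shifting and per-step slicing used by the second approach can be checked without a model. The sketch below uses random placeholder data (an assumption) and only verifies that each one-step slice of the targets is the following step of the inputs.

```python
import numpy as np

nSequences, timeSteps, features = 30, 50, 1
full = np.random.rand(nSequences, timeSteps, features)  # placeholder data

outputArray = full[:, 1:]    # targets: steps 1 .. timeSteps-1
inputArray = full[:, :-1]    # inputs:  steps 0 .. timeSteps-2

# Each iteration of the stateful loop sees one step of every sequence.
for step in range(inputArray.shape[1]):
    x_step = inputArray[:, step:step + 1]   # (nSequences, 1, features)
    y_step = outputArray[:, step:step + 1]  # (nSequences, 1, features)
    # The target at position `step` is the original data at `step + 1`.
    assert x_step.shape == (nSequences, 1, features)
    assert np.array_equal(y_step, full[:, step + 1:step + 2])
```

Slicing with step:step+1 rather than indexing with step keeps the time axis, so every slice stays three-dimensional.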

Related

How does the nn.Embedding module relate intuitively to the idea of an embedding in general?

So, I'm having a hard time understanding nn.Embedding. Specifically, I can't connect the dots between what I understand about embeddings as a concept and what this specific implementation is doing.
My understanding of an embedding is that it is a smaller dimension representation of some larger dimension data point. So it maps data in N-d to a M-d latent/embedding space such that M < N.
As I understand it, this mapping is achieved through the learning process, as in an auto-encoder. The encoder learns the optimal embedding so that the decoder can reconstruct the original input.
So my question is, how does this relate to nn.Embedding module:
A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
does this layer "learn" a lower dimensional representation of a larger input space? Or is it something else entirely?
What I'm looking for is to take the very abstract language of the documentation to something real:
Let's say I have some input x. This input might be a vectorized image or maybe some sequence daily temperature data. In any case, this input x has 100 elements (100 days of temperature, or a 10x10 image).
How can you explain the use of nn.Embedding() in this case?
What does each argument mean in a real world context?
As you said, the aim when using an embedding is to reduce the dimension of your data. However, it does not learn a lower dimensional representation of a larger input space on its own. Starting from a random initialization, you can improve this embedding through a learning process. This requires finding a suitable task to train the embedding on, which is, I think, a topic for another question. I believe it's called a "pretext task", where ultimately the objective is to have an accurate embedding matrix.
You can check the parameters of any nn.Module with .parameters(). It will return a generator.
<< [x for x in nn.Embedding(10, 2).parameters()][0].shape
>> torch.Size([10, 2])
Here, there are 10*2 parameters (i.e. dimension_input*dimension_output or, by PyTorch's naming, num_embeddings*embedding_dim). However, it is still a lookup table: given an index, it will return an embedding of size embedding_dim. But these embeddings (the values of this matrix) can be changed.
Here's a little experiment:
import torch
from torch import nn, optim

E = nn.Embedding(10, 2)
optimizer = optim.SGD(E.parameters(), lr=0.01)  # renamed so it does not shadow the optim module
X = torch.randint(0, 10, size=(100,))
loss_before = E(X).mean()
loss_before.backward()
optimizer.step()
loss_after = E(X).mean()
As expected, loss_before and loss_after are different which shows nn.Embedding's parameters are learnable.
Edit: your question comes down to, "how do I encode my data?".
For those examples you gave precisely:
Let's say I have some input x. This input might be a vectorized image or maybe some sequence daily temperature data. In any case, this input x has 100 elements (100 days of temperature, or a 10x10 image).
You can't use an nn.Embedding to solve these cases. Embedding layers are different from a reduction matrix. The latter can be used to reduce every single vector of dimension d into dimension n where n<<d. The prerequisite to using an embedding layer is having a finite dictionary of possible elements. For example, you might want to represent a word with a vector of size n; then you would use an embedding of nb_possible_words x n. This way, for any given word in the dictionary the layer will produce the corresponding n-size vector.
As I said in the comments below, num_embeddings is the number of unique elements you are working with and embedding_dim is the size of the embedding, i.e. the size of the output vector.
nn.Embedding is usually used at the head of a network to cast encoded data into a lower dimensionality space. It won't solve your problem by magically reducing your dimensions.
If you have a sequence of temperatures you want to analyse, you could encode each temperature as a one-hot vector. But this vector representation might be very large (depending on the number of different temperatures). Using an embedding layer would allow you to reduce the size of these vectors. This is important when the aim is to analyse the data with an RNN, or an MLP for that matter, since the bigger your input size, the more parameters you will have!
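The temperature example can be sketched in plain NumPy to show what "lookup table" means in practice. The readings, the random weight matrix, and the choice of embedding_dim = 2 are assumptions for illustration; in PyTorch the matrix would be the learnable weight of nn.Embedding.

```python
import numpy as np

# Hypothetical temperature readings, rounded to whole degrees.
temperatures = [21, 23, 21, 19, 23, 25]

# The finite dictionary: one index per distinct temperature.
vocab = {temp: idx for idx, temp in enumerate(sorted(set(temperatures)))}
indices = np.array([vocab[temp] for temp in temperatures])

num_embeddings = len(vocab)   # number of unique elements
embedding_dim = 2             # size of each output vector

rng = np.random.default_rng(0)
weight = rng.normal(size=(num_embeddings, embedding_dim))  # the lookup table

# An embedding lookup is just row indexing into the table ...
embedded = weight[indices]                    # shape (6, 2)

# ... which gives the same result as one-hot vectors times the same matrix.
one_hot = np.eye(num_embeddings)[indices]     # shape (6, 4)
assert np.allclose(one_hot @ weight, embedded)
```

The one-hot route needs a vector as long as the dictionary for every reading, while the lookup only ever touches embedding_dim numbers per reading.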

How would I go about creating a keras model with a varying number of targets/outputs?

I've set up a neural network regression model using Keras with one target. This works fine,
now I'd like to include multiple targets. The dataset includes a total of 30 targets, and I'd rather train one neural network instead of 30 different ones.
My problem is that in the preprocessing of the data I have to remove some target values, for a given example, as they represent unphysical values that are not to be predicted.
This creates the issues that I have a varying number of targets/output.
For example:
Targets =
None, 0.007798, 0.012522
0.261140, 2110.000000, 2440.000000
0.048799, None, None
How would I go about creating a keras.Sequential model(or functional) with a varying number of outputs for a given input?
edit: Could I perhaps first train a classification model that predicts the number of outputs given some test inputs, and then vary the number of outputs in the output layer according to this prediction? I guess I would have to use the functional API for something like that.
The "classification" edit here is unnecessary, i.e. ignore it. The number of outputs of the test targets is a known quantity.
(Sorry, I don't have enough reputation to comment)
First, do you know up front whether some of the output values will be invalid or is part of the problem predicting which outputs will actually be valid?
If you don't know up front which outputs to disregard, you could go with something like the 2-step approach you described in your comment.
If it is deterministic (and you know how so) which outputs will be valid for any given input and your problem is just how to set up a proper model, here's how I would do that in keras:
Use the functional API
Create 30 named output layers (e.g. out_0, out_1, ... out_29)
When creating the model, just use the outputs argument to list all 30 outputs
When compiling the model, specify a loss for each separate output, you can do this by passing a dictionary to the loss argument where the keys are the names of your output layers and the values are the respective losses
Assuming you'll use mean-squared error for all outputs, the dictionary will look something like {'out_0': 'mse', 'out_1': 'mse', ..., 'out_29': 'mse'}
When passing inputs to the model, pass three things per input: x, y, loss-weights (in Keras, per-sample weights like these go in the sample_weight argument of fit)
y has to be a dictionary where the key is the output layer name and the value is the target output value
The loss-weights are also a dictionary in the same format as y. The weights in your case can just be binary, 1 for each output that corresponds to a real value, 0 for each output that corresponds to unphysical values (so they are disregarded during training) for any given sample
Don't pass None's for the unphysical value targets, use some kind of numeric filler, otherwise you'll get issues. It is completely irrelevant what you use for your filler as it will not affect gradients during training
This will give you a trainable model. BUT once you move on from training and try to predict on new data, YOU will have to decide which outputs to disregard for each sample, the network will likely still give you "valid"-looking outputs for those inputs.
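Why the filler value is irrelevant can be seen with a small NumPy sketch of a weighted mean-squared error. The helper names to_arrays and weighted_mse are hypothetical, and the normalisation shown (dividing by the sum of the weights) is an assumption for illustration rather than a statement of how Keras reduces its losses internally.

```python
import numpy as np

# Targets from the question; None marks unphysical values.
targets = [
    [None, 0.007798, 0.012522],
    [0.261140, 2110.0, 2440.0],
    [0.048799, None, None],
]

def to_arrays(rows, filler=0.0):
    """Return (y, weights): None becomes `filler` and gets weight 0."""
    y = np.array([[filler if v is None else v for v in row] for row in rows])
    w = np.array([[0.0 if v is None else 1.0 for v in row] for row in rows])
    return y, w

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error over the entries whose weight is non-zero."""
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))

y_pred = np.ones((3, 3))  # stand-in for network output
y_a, w = to_arrays(targets, filler=0.0)
y_b, _ = to_arrays(targets, filler=-999.0)

# Two very different fillers give the same loss: their weight is zero.
loss_a = weighted_mse(y_a, y_pred, w)
loss_b = weighted_mse(y_b, y_pred, w)
assert np.isclose(loss_a, loss_b)
```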
One possible solution would be to have a separate output of "validity flags" which takes values in range from zero to one. For example, your first target will be
y=[0.0, 0.007798, 0.012522]
yf=[0.0, 1.0, 1.0]
where zeros indicate invalid values.
Use sigmoid activation function for yf.
Loss function can be the sum of losses for y and yf.
During inference, analyze the network output for yf and only consider y value valid if corresponding yf exceeds 0.5 threshold
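A minimal sketch of the validity-flag idea, in plain Python. The helper names split_targets and filter_predictions, and the example network outputs, are hypothetical; only the 0.5 threshold and the y / yf layout come from the description above.

```python
def split_targets(row, filler=0.0):
    """Turn one target row with None entries into (y, yf)."""
    y = [filler if v is None else v for v in row]
    yf = [0.0 if v is None else 1.0 for v in row]
    return y, yf

def filter_predictions(y_pred, yf_pred, threshold=0.5):
    """Keep a predicted value only where its validity flag exceeds threshold."""
    return [v if f > threshold else None for v, f in zip(y_pred, yf_pred)]

# Training side: first target row from the question.
y, yf = split_targets([None, 0.007798, 0.012522])

# Inference side: hypothetical network outputs for one sample.
kept = filter_predictions([0.3, 0.008, 0.013], [0.1, 0.9, 0.7])
print(y, yf, kept)
```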

Proper and definitive explanation about how to build a CNN 1D in Keras

Ciao,
this is the second part of a problem I'm facing with CNN 1d. The first part is this
How does it works the input_shape variable in Conv1d in Keras?
I'm using this code:
from keras.models import Sequential
from keras.layers import Dense, Conv1D
import numpy as np
N_FEATURES=5
N_TIMESTEPS=10
X = np.random.rand(100, N_FEATURES)
Y = np.random.randint(0,2, size=100)
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=N_TIMESTEPS, activation='relu', input_shape=(N_TIMESTEPS, N_FEATURES)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Now, what do I want to do?
I want to train a CNN 1d over a time series with 5 features. Actually I want to work with time windows of length N_TIMESTEPS rather than the time series itself. This means that I want to use a sort of "magnifier" of dimension N_TIMESTEPS x N_FEATURES on the time series to work locally. That's why I've decided to use a CNN.
Here comes the first question. It is not clear at all whether I have to transform the time series into a tensor or whether it is something that Keras will do for me since I've specified the kernel_size variable.
In case I must provide a tensor I would do something like this:
X_tensor = []
for i in range(len(X)-N_TIMESTEPS):
    X_tensor+=[X[i:N_TIMESTEPS+i, :]]  # slice the source array X, not the list being built
X_tensor = np.asarray(X_tensor)
In this case of course I should also provide a Y_tensor vector computed from Y according to some criteria. Suppose I have already this Y_tensor boolean vector of the same length of X_tensor, which is len(X)-N_TIMESTEPS-1.
Y_tensor = np.random.randint(0,2,len(X)-N_TIMESTEPS-1)
Now if I try to feed the model I get one of the most common errors for CNN 1d, which is:
ValueError: Error when checking input: expected conv1d_4_input to have 3 dimensions, but got array with shape (100, 5)
After looking at a dozen posts about it I cannot understand what I did wrong. This is what I've tried:
model.fit(X,Y)
model.fit(np.expand_dims(X, axis=0),Y)
model.fit(np.expand_dims(X, axis=2),Y)
model.fit(X_tensor,Y_tensor)
For all of these cases I get always the same error (with different dimensional values in the final tuple).
Questions:
What does Keras expect from my data? Can I feed the model the whole time series, or do I have to slice it into a tensor?
How do I have to feed the model in terms of data structure? I.e. do I have to specify the dimension of the data in some particular way?
Can you help me? I find that this is one of the most confusing points of the CNN implementation in Keras; there are different posts with different solutions that do not fit the structure of my data (even though it has a very common structure, in my opinion).
Note: There are some post suggesting to pass in the input_shape variable the length of the data. This is meaningless to me since I should not provide the dimension of the data (which is a variable) to the model. The only thing I should give to it, according to the theory, is the filter dimension and number of features (namely the dimension of the matrix that will roll over the time series).
Thanks,
am
Simply, Conv1D requires 3 dimensions:
Number of series (1)
Number of steps (100 - your entire data)
Number of features (5)
So, model.fit(np.expand_dims(X, axis=0),Y) is correct for X.
Now, if X is (1, 100, 5), naturally your input_shape=(100,5).
If your Y has 100 steps, then you need to make sure your Conv1D will output 100 steps. You need padding='same', otherwise it will become 91. (I suggest you actually work with 91, since you want a result for each 10 steps and probably don't want border effects spoiling your results)
Y must also follow the same rules for shape:
Number of series (1)
Number of steps (100 if padding='same'; 91 if padding='valid')
Number of features (1 = Dense output)
So, Y = Y.reshape((1,-1,1)).
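The reshapes and the 100 versus 91 step count can be verified with NumPy alone. In this sketch the helper conv1d_output_steps is hypothetical and assumes a stride of 1 with no dilation; it is not a Keras function.

```python
import numpy as np

N_FEATURES, N_TIMESTEPS = 5, 10
X = np.random.rand(100, N_FEATURES)
Y = np.random.randint(0, 2, size=100)

X3 = np.expand_dims(X, axis=0)   # (1, 100, 5): series, steps, features
Y3 = Y.reshape((1, -1, 1))       # (1, 100, 1): series, steps, output features

def conv1d_output_steps(steps, kernel_size, padding):
    """Output length of a stride-1, undilated 1D convolution."""
    if padding == "same":
        return steps
    return steps - kernel_size + 1   # 'valid': no positions past the borders

steps_same = conv1d_output_steps(100, N_TIMESTEPS, "same")
steps_valid = conv1d_output_steps(100, N_TIMESTEPS, "valid")
print(X3.shape, Y3.shape, steps_same, steps_valid)
```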
Since you have only one class (true/false), it's pointless to use 'categorical_crossentropy'. You should go with 'binary_crossentropy'.
In general, your overall idea of using this convolution with kernel_size=10 to simulate sliding windows of 10 steps will work as expected (whether it will be efficient or not is another question, answered only by trying).
If you want better networks for sequences, you should probably try LSTM layers. The dimensions work exactly the same way. You will need return_sequences=False.
The main difference is that you will need to separate the data as you did in that loop. Then:
X.shape == (91, 10, 5)
Y.shape == (91, 1)
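One way to obtain arrays with those shapes from the question's data is sketched below. Labelling each window with the Y value at its last step is an assumption; the question leaves the labelling criterion open.

```python
import numpy as np

N_FEATURES, N_TIMESTEPS = 5, 10
X = np.random.rand(100, N_FEATURES)
Y = np.random.randint(0, 2, size=100)

# A window can start at positions 0 .. 90, which gives 91 windows.
n_windows = len(X) - N_TIMESTEPS + 1

# Stack the overlapping windows into (samples, steps, features).
X_windows = np.stack([X[i:i + N_TIMESTEPS] for i in range(n_windows)])

# Assumption: each window is labelled with the Y value at its last step.
Y_windows = Y[N_TIMESTEPS - 1:].reshape(-1, 1)

print(X_windows.shape, Y_windows.shape)
```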
I think you don't have a clear idea on how 1d convolutional neural networks work:
if you want to predict the y values from the timeseries x and you just have 1 timeseries, your approach won't work. the network needs lots of samples to train, and having just 1 will allow it to easily memorize the input and not learn. For example, if the timeseries is the humidity of a given day, and y is the chance of rain at a specific timestep, what you have now is the data for just one day (timesteps being for example hours of the day). In order for the network to learn you need to gather data for many days, ending up with datasets of shape x=(n_days, timesteps, features) y=(n_days, timesteps, 1).
If you describe your actual problem there's better chance to get more helpful answers
[Edit] by sticking to your code, and using just one timeseries, you are better off with other methods that don't involve deep learning. You could split your timeseries at regular interval, obtaining n samples that would allow your network to train, but unless you have a very long timeseries that may not be a valid alternative.

How do you make predictions with a stateful LSTM?

Okay, so I trained a stateful LSTM characterwise on https://cs.stanford.edu/people/karpathy/char-rnn/shakespear.txt. It didn't seem to do too badly in terms of accuracy, but now I want to generate my own Shakespeare works.
The question is, how do I go about actually generating predictions from it?
In particular, the model's batch input shape is (128, 128, 63) and the output shape is (128, 128, 63). (The first number is the batch size, the second number is the length of the prediction input and output, and the third number is the number of distinct characters in the text.)
For example, I would like to:
Generate various predictions starting from empty text
Generate predictions starting from a small starting text (such as "PYRULEZ:")
This should be possible given how LSTMs work.
Here's a snippet of the code used to generate and fit the model:
model = Sequential()
model.add(LSTM(dataY.shape[2], batch_input_shape=(128, dataX.shape[1], dataX.shape[2]), return_sequences = True, stateful=True, activation = "softmax"))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['acc'])
model.fit(dataX, dataY, epochs = 1, batch_size = 128, verbose=1, shuffle = False)
Looking at other code samples, it appears I'll need to modify this somehow, but I'm not sure in how specifically.
I can include the whole code sample if that would be helpful. It is self contained.
Simple. Put your input into model.predict() with appropriate parameters (see documentation), then concatenate input and output (the model predicts on progressively longer chains). Depending on how you organised training, the output will add one character at a time. To be more precise, if you train sequence to sequence shifted by one, your output sequence will ideally be your input sequence shifted by one element: PYRULEZ -> YRULEZ*. Hence you need to take the last character of the output and add it to your prior (input) sequence.
If you want long lines of text, you might want to limit the length of your sequence to some number of characters in the loop. Much of the long term dependencies in the text is carried through the stateful vector of the LSTM cell anyway (Not something you interact with).
Pseudocode-ish:
for counter in range(output_length):
    output = model.predict(input_)
    # keep the time axis (-1:) so the new step can be appended along axis 1
    input_ = np.concatenate((input_, output[:, -1:, :]), axis=1)
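A runnable version of that loop is sketched below with a stand-in for the trained network. Everything model-specific here is an assumption: fake_predict merely returns random distributions of the right shape, max_context is an arbitrary cap on the fed-back sequence, and the next character is picked greedily with argmax. A stateful model with a fixed batch input shape would additionally need inputs padded or tiled to that shape, which this sketch does not attempt.

```python
import numpy as np

n_chars = 63        # number of distinct characters, as in the question
max_context = 128   # assumed cap on how many steps are fed back in

def fake_predict(batch):
    """Stand-in for model.predict: one distribution per input step."""
    rng = np.random.default_rng(batch.shape[1])
    probs = rng.random((batch.shape[0], batch.shape[1], n_chars))
    return probs / probs.sum(axis=-1, keepdims=True)

def generate(seed_ids, output_length, predict=fake_predict):
    """Extend a list of character ids by output_length predicted ids."""
    ids = list(seed_ids)
    for _ in range(output_length):
        context = ids[-max_context:]               # limit the sequence length
        x = np.eye(n_chars)[context][None, :, :]   # one-hot, (1, steps, n_chars)
        y = predict(x)
        ids.append(int(np.argmax(y[0, -1])))       # last step gives the next char
    return ids

generated = generate([5, 12, 40], output_length=20)
print(generated)
```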

Input to Bidirectional LSTM in tensorflow

Normally all inputs fed to BiLSTM are of shape [batch_size, time_steps, input_size].
However, I'm working on a problem of Automatic Grading of an Essay in which there's an extra dimension called number of sentences in each essay. So in my case, a typical batch after embedding using word2vec, is of shape [2,16,25,300].
Here, there are 2 essays in each batch (batch_size=2), each essay has 16 sentences, each sentence is 25 words long(time_step=25) and I'm using word2vec of 300 dimensions (input_size=300).
So clearly I need to loop this batch over dimension 16 somehow such that the shape of input becomes [2,25,300] in each iteration. I have tried for a long time but I haven't been able to find a way to do it. For example, if you make a loop over tf.nn.bidirectional_dynamic_rnn(), it'll give an error in the second iteration saying that the tf.nn.bidirectional_dynamic_rnn() kernel already exists. I can't directly make a for loop over sentence_dimension because those are tensors of shape [None,None,None,300] and I gave values just for the sake of simplicity. Is there any other way to do it? Thanks. Please note that I am not using Keras or any other framework.
Here's a sample encoding layer for reference.
def bidirectional_lstm(input_data):
    cell = tf.nn.rnn_cell.LSTMCell(num_units=200, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw=cell,
                                                      cell_bw=cell,
                                                      dtype=tf.float32,
                                                      inputs=input_data)
    return tf.concat(outputs,2)
embedded is of shape [2,16,25,300].
And here's a sample input
for batch_i, texts_batch in enumerate(get_batches(X_train, batch_size)): ## get_batches() is yielding batches one by one
    ## X_train is of shape [2,16,25] ([batch_size,sentence_length,num_words])
    ## word2vec
    embeddings = tf.nn.embedding_lookup(word_embedding_matrix, texts_batch)
    ## embeddings shape is now [2,16,25,300] ([batch_size,sentence_length,num_words,word2vec_dim])
    ## Now I need some kind of loop here to loop it over sentence dimension. Can't use for loop since these are all placeholders with dimensions None
    ## ??? output = bidirectional_lstm(embeddings,3,200,0.7) ??? This would be correct if there was no sentence dimension.
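One possible alternative to looping, offered only as a sketch, is to fold the sentence axis into the batch axis so that every sentence becomes an ordinary [time_steps, input_size] sequence, and to unfold it again afterwards. The NumPy code below shows just the reshaping; the mean over words is a placeholder standing in for the encoder, and the concrete sizes are the illustrative ones from the question. In TensorFlow the analogous operation would be tf.reshape, which is not shown here.

```python
import numpy as np

batch_size, n_sentences, n_words, dim = 2, 16, 25, 300
embeddings = np.random.rand(batch_size, n_sentences, n_words, dim)

# Fold the sentence axis into the batch axis: every sentence becomes its
# own sequence of shape [time_steps, input_size].
flat = embeddings.reshape(-1, n_words, dim)               # (32, 25, 300)

# Placeholder for the encoder: the mean over words, one vector per sentence.
# A real BiLSTM would produce its own output size instead of `dim`.
encoded = flat.mean(axis=1)                               # (32, 300)

# Unfold back to one row of sentence vectors per essay.
per_essay = encoded.reshape(batch_size, n_sentences, -1)  # (2, 16, 300)

print(flat.shape, per_essay.shape)
```

Because reshape keeps row-major order, sentence j of essay i ends up at row i * n_sentences + j of the folded array and returns to position [i, j] after unfolding.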
