Input Shape for Keras Model with multiple one-hot arrays - python

I don't quite understand how the input shape and dimensions of a Keras model work when using multiple one-hot encoded arrays.
For example, this is my feature state containing 9 one-hot encoded arrays.
features = [[first_one_hot] + [second_one_hot] + \
[third_one_hot] + [fourth_one_hot] + [sixt_one_hot] + [seventh_one_hot]+ ...]
The individual one-hot arrays have lengths (3, 4, 4, 5, 5, 5, 10, 10, 10), i.e. 56 values in total. For reference, by "shape" I mean:
Shape: a shape (30,4,10) means an array or tensor with 3 dimensions, containing 30 elements in the first dimension, 4 in the second and 10 in the third, totaling 30 * 4 * 10 = 1200 elements or numbers.
If I just concatenate all my one-hot arrays into one, my model works with an input shape of (1, 56). But as far as I understand, the model then does not really know which values belong to which one-hot array.
Question 1
First of all, am I right that the features concatenated in the array above should be kept separate, instead of using a (1, 56) array as I mentioned? Let's say, instead of:
[1,0,0,0,1,0,0,0,...] use:
[1,0,0], [1,0,0,0], ...
If so, how should I feed the separate one-hot arrays to the model? I'm new to machine learning, so this might be a strange question.
Question 2
If so, what would be the advantage of also grouping thematically similar one-hot arrays into separate input layers?
My _build_model method currently uses a single input_dim, with an input size of (1, 56):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation
def _build_model(self, hl1_dims, hl2_dims, hl3_dims, input_layer_size, output_layer_size, optimizer, loss):
    model = Sequential()
    # My input_layer_size is set to 9, as I have 9 dimensions
    model.add(Dense(hl1_dims, input_dim=input_layer_size))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    # Second Hidden Layer
    ...
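A Dense layer created with input_dim=k expects 2-D input of shape (batch_size, k). The error quoted in this question ("expected dense_input to have 2 dimensions, but got array with shape (1, 1, 9)") is what Keras reports when the data carries an extra length-1 axis. The NumPy sketch below shows how to remove that axis. It assumes each sample is stored as a (1, 56) row, as described in the question; the number of samples is made up:

```python
import numpy as np

# Hypothetical: 5 samples, each stored as a (1, 56) row of concatenated one-hots.
samples = [np.zeros((1, 56)) for _ in range(5)]

stacked = np.stack(samples)               # (5, 1, 56): one axis too many for Dense
X = stacked.reshape(len(samples), -1)     # (5, 56): (batch_size, features)

print(stacked.shape, X.shape)             # (5, 1, 56) (5, 56)
# With this flat layout, input_dim would be X.shape[1] (56), not 9.
```

Equivalently, np.concatenate(samples, axis=0) builds the (5, 56) array directly.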
As far as I understand, I could also use multiple input layers, something like this:
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model
input_3d = Input(shape=(3,))
input_4d = Input(shape=(4,))
input_5d = Input(shape=(5,))
input_10d = Input(shape=(10,))
# multiple branches, for example:
branch_3d = Dense(32, activation='relu')(input_3d)
branch_3d = Dense(32, activation='relu')(branch_3d)
m_3d = Model(inputs=input_3d, outputs=branch_3d)
# m_4d, m_5d and m_10d are built the same way from input_4d, input_5d and input_10d
# Combine the outputs of all branches
combined = Concatenate(axis=1)([m_3d.output, m_4d.output, m_5d.output, m_10d.output])
# Apply FC Layer
out = Dense(16, activation='relu')(combined)
out = Dense(output_layer_size, activation='linear')(out)
# Model accepts the inputs of all branches and outputs the action space based on output_layer_size
model = Model(inputs=[m_3d.input, m_4d.input, m_5d.input, m_10d.input], outputs=out)
I tried the implementation above but never got it to work. Mostly I got errors like:
ValueError: Error when checking input: expected dense_input to have 2 dimensions, but got array with shape (1, 1, 9)
But as I said, I'm not even sure whether you would split categorical inputs into separate layers, or whether it's best practice to combine all categorical features into one shape. I would really appreciate any input on this.
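On the data side, feeding separate one-hot groups to a multi-input model comes down to splitting the flat (N, 56) array into one array per group. The minimal NumPy sketch below does that with the group widths listed in the question. The random data is made up for illustration:

```python
import numpy as np

# Widths of the nine one-hot groups from the question (3 + 4 + ... + 10 = 56).
group_sizes = [3, 4, 4, 5, 5, 5, 10, 10, 10]

def split_groups(flat, sizes):
    """Split an (N, sum(sizes)) array into a list of (N, size) arrays."""
    boundaries = np.cumsum(sizes)[:-1]  # column indices where each group ends
    return np.split(flat, boundaries, axis=1)

# Hypothetical data: 2 samples, each built from nine random one-hot vectors.
rng = np.random.default_rng(0)
flat = np.concatenate(
    [np.eye(s)[rng.integers(0, s, size=2)] for s in group_sizes], axis=1
)

parts = split_groups(flat, group_sizes)
print(flat.shape)                # (2, 56)
print([p.shape for p in parts])  # [(2, 3), (2, 4), (2, 4), (2, 5), (2, 5), (2, 5), (2, 10), (2, 10), (2, 10)]
```

A Functional model with one Input per group can then be trained with model.fit(parts, y). Keras accepts a list of arrays, one per model input, in the same order as the model's inputs list.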

Related

How is the Keras Conv1D input specified? I seem to be lacking a dimension

My input is an array of 64 integers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D
model = Sequential()
model.add( Input(shape=(68,), name="input"))
model.add(Conv1D(64, 2, activation="relu", padding="same", name="convLayer"))
I have 10,000 of these arrays in my training set. Am I supposed to specify this somewhere for Conv1D to work?
I am getting the dreaded
ValueError: Input 0 of layer convLayer is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: [None, 68]
error and I really don't understand what I need to do.
Don't let the name confuse you. For each sample, tf.keras.layers.Conv1D expects input of shape (time_steps, features), i.e. (batch_size, time_steps, features) once the batch axis is included. If your dataset consists of 10,000 samples with 64 values each, your data has the shape (10000, 64), which tf.keras.layers.Conv1D cannot take directly because the time_steps dimension is missing. One option is tf.keras.layers.RepeatVector, which repeats each input vector n times (5 in the example). This way your Conv1D layer receives input of shape (5, 64) per sample. See the RepeatVector documentation for more information:
import tensorflow as tf
time_steps = 5
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(64,), name="input"))
model.add(tf.keras.layers.RepeatVector(time_steps))
model.add(tf.keras.layers.Conv1D(64, 2, activation="relu", padding="same", name="convLayer"))
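NumPy can mimic what RepeatVector does to the shapes: each (64,) sample becomes a (time_steps, 64) matrix whose rows are identical copies. A minimal sketch with a made-up batch of 3 samples:

```python
import numpy as np

time_steps = 5
x = np.arange(3 * 64, dtype=float).reshape(3, 64)  # hypothetical batch: (3, 64)

# Same effect on shapes as RepeatVector(time_steps): (batch, 64) -> (batch, 5, 64)
repeated = np.repeat(x[:, np.newaxis, :], time_steps, axis=1)
print(repeated.shape)  # (3, 5, 64)
```

Because all five rows are copies, the convolution sees no variation along the time axis. This is one reason to ask whether Conv1D suits data without a temporal structure.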
As a side note, ask yourself whether a tf.keras.layers.Conv1D layer is the right option for your use case. This layer is typically used for NLP and other time-series tasks. For example, in sentence classification, each word in a sentence is usually mapped to a high-dimensional word vector, which results in data with the shape (time_steps, features).
If you use one-hot encoded characters instead, a single sample of 10 characters has the shape (10, 10): 10 characters along the time dimension and 10 features (one per character in the vocabulary).
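The sketch below builds a one-hot character sample of shape (10, 10). Both the 10-character vocabulary and the 10-character string are hypothetical, made up for illustration:

```python
import numpy as np

vocab = "abcdefghij"   # hypothetical 10-character vocabulary
text = "deadbeefab"    # hypothetical 10-character input

# One row per character (time step), one column per vocabulary entry (feature).
indices = [vocab.index(c) for c in text]
sample = np.eye(len(vocab))[indices]
print(sample.shape)          # (10, 10)

batch = sample[np.newaxis]   # Conv1D input also needs the batch axis
print(batch.shape)           # (1, 10, 10)
```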
The Conv1D layer does temporal convolution, that is, along the first dimension (not the batch dimension of course), so you should put something like this:
import tensorflow as tf
time_steps = 5
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(time_steps, 64), name="input"))
model.add(tf.keras.layers.Conv1D(64, 2, activation="relu", padding="same", name="convLayer"))
You will need to slice your data into time_steps temporal slices to feed the network.
However, if your arrays don't have a temporal structure, then Conv1D is not the layer you are looking for.
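Here is one way to slice (10000, 64) data into time_steps slices. The sketch assumes the rows are in temporal order and that the row count is divisible by time_steps (both assumptions). It shows non-overlapping windows via reshape and, as an alternative, overlapping windows via np.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or newer. The data is random and stands in for the real arrays:

```python
import numpy as np

time_steps = 5
X = np.random.default_rng(0).normal(size=(10_000, 64))  # stand-in for the real data

# Non-overlapping windows: (10000, 64) -> (2000, 5, 64)
assert X.shape[0] % time_steps == 0, "drop or pad rows so N is divisible"
X_windows = X.reshape(-1, time_steps, X.shape[1])
print(X_windows.shape)  # (2000, 5, 64)

# Overlapping windows with stride 1: the window axis is appended last,
# giving (9996, 64, 5), so move it in front of the feature axis.
overlap = np.lib.stride_tricks.sliding_window_view(X, time_steps, axis=0)
overlap = overlap.transpose(0, 2, 1)
print(overlap.shape)  # (9996, 5, 64)
```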

Error when checking input: expected lstm_input to have 3 dimensions, but got array with shape (4, 1)

First of all, I know there are tons of questions similar to this one. I've tried what the answers suggest, but it seems I don't know how to solve it. I have a Keras Functional API model:
from tensorflow import keras
lstm_input = keras.layers.Input(shape=(1,4), name='lstm_input')
x = keras.layers.LSTM(50, name='lstm_0')(lstm_input)
x = keras.layers.Dropout(0.2, name='lstm_dropout_0')(x)
x = keras.layers.Dense(64, name='dense_0')(x)
x = keras.layers.Activation('sigmoid', name='sigmoid_0')(x)
x = keras.layers.Dense(1, name='dense_1')(x)
output = keras.layers.Activation('linear', name='linear_output')(x)
model = keras.Model(inputs=lstm_input, outputs=output)
adam = keras.optimizers.Adam(learning_rate=0.0005)
model.compile(optimizer=adam, loss='mse')
When I try to fit it, it throws this error:
ValueError: Error when checking input: expected lstm_input to have 3 dimensions, but got array with shape (4, 1)
This is my call to fit:
model.fit(X_aux['X_i'], X[i+1, 0])
# X_aux['X_i'].shape = (4, ) -- it's a numpy array
I've tried np.reshape([X_aux['X_i1']], (4,1)), where its new shape is (4, 1) but it does not work. How can I solve this?
Make sure X_aux['X_i'] is three-dimensional.
The input of any RNN-based layer must be three-dimensional, with the axes corresponding to batch size, time steps, and features, respectively.
Reshaping to (4, 1) doesn't help because the reshaped array is still two-dimensional. You need three dimensions.
Make sure you define the batch size, time steps, and feature dimension correctly, reshape X_aux['X_i'] accordingly, and train the model again.
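A minimal NumPy sketch of that reshape. It assumes X_aux['X_i'] is a single sample with 4 features and that the model keeps Input(shape=(1, 4)); the sample values and the target are made up:

```python
import numpy as np

x_i = np.array([0.1, 0.2, 0.3, 0.4])   # stand-in for X_aux['X_i'], shape (4,)
y_next = 0.5                            # stand-in for the scalar X[i+1, 0]

# Input(shape=(1, 4)) means (batch_size, time_steps=1, features=4)
x_batch = x_i.reshape(1, 1, 4)          # one sample, one time step, four features
y_batch = np.array([[y_next]])          # (batch_size, 1), matching Dense(1)

print(x_batch.shape, y_batch.shape)     # (1, 1, 4) (1, 1)
# model.fit(x_batch, y_batch) would then receive the shapes the model expects.
```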

Mismatch in expected Keras shapes after pooling

I'm building a few simple models in Keras to improve my knowledge of deep learning, and encountering some issues I don't quite understand how to debug.
I want to use a 1D CNN to perform regression on some time-series data. My input feature tensor has shape N x T x D, where N is the number of data points, T is the sequence length (number of time steps), and D is the number of features. My target tensor has shape N x T x 1 (1 because I am trying to output a scalar value).
I've set up my model architecture like this:
feature_tensor.shape
# (75584, 40, 38)
target_tensor.shape
# (75584, 40, 1)
inputs = Input(shape=(SEQUENCE_LENGTH,DIMENSIONS))
conv1 = Conv1D(filters=64, kernel_size=3, activation='relu')
x = conv1(inputs)
x = MaxPooling1D(pool_size=2)(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
predictions = Dense(1, activation="linear")(x)
model = Model(inputs, predictions)
opt = Adam(lr=1e-5, decay=1e-4 / 200)
model.compile(loss="mean_absolute_error", optimizer=opt)
When I attempt to train my model, however, I get the following output:
r = model.fit(cleaned_tensor, target_tensor, epochs=100, batch_size=2058)
ValueError: Error when checking target: expected dense_164 to have 2 dimensions, but got array with shape (75584, 40, 1).
The first two numbers are familiar: 75584 is the # of samples, 40 is the sequence length.
When I inspect the model summary, I see that the output of the Flatten layer has size 1216.
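The 1216 can be checked by hand. The check assumes the Keras defaults used in the code: Conv1D with padding='valid' and stride 1, and MaxPooling1D with a stride equal to pool_size.

```python
seq_len, filters, kernel_size, pool_size = 40, 64, 3, 2

conv_len = seq_len - kernel_size + 1   # 'valid' padding: 40 - 3 + 1 = 38
pool_len = conv_len // pool_size       # 38 // 2 = 19
flat_size = pool_len * filters         # 19 * 64 = 1216
print(conv_len, pool_len, flat_size)   # 38 19 1216
```

After Flatten, each sample is a single vector, so the final Dense(1) outputs (batch_size, 1), a 2-D shape. The 3-D target (75584, 40, 1) cannot match it, which is what "Error when checking target" reports.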
However, my colleague and I stared at the code for a long time and could not understand why the shape of (75584, 40, 1) was being arrived at via the architecture when it reached the dense layer.
Could someone point me in the direction of what I am doing wrong?
Try reshaping your target variable to N x T. It also looks like your final Dense layer should have 40 units rather than 1 (I think).
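Reshaping the target from N x T x 1 to N x T just drops the trailing length-1 axis. A sketch with a small made-up N (the question's N is 75584):

```python
import numpy as np

N, T = 8, 40
target_tensor = np.zeros((N, T, 1))       # hypothetical stand-in, (N, 40, 1)

target_2d = target_tensor.reshape(N, T)   # (N, 40)
# np.squeeze(target_tensor, axis=-1) gives the same result
print(target_2d.shape)                    # (8, 40)
# This matches a final Dense(40) layer, whose output shape is (batch_size, 40).
```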

Reshape Keras Input for LSTM

I have two ndarrays, inputs and results, both consisting of multiple arrays looking like this:
inputs = [
[[1,2],[2,2],[3,2]],
[[2,1],[1,2],[2,3]],
[[2,2],[1,1],[3,3]],
...
]
results = [
[3,4,5],
[3,3,5],
[4,2,6],
...
]
I managed to split them into train and test arrays, where train contains 66% of the arrays and test the remaining 34%. Now I'd like to reshape them for use in my LSTM, but my script fails when I pass them to np.reshape():
split = int(round(0.66 * results.shape[0]))
train_results = results[:split, :]
train_inputs = inputs[:split, :]
test_results = results[split:, :]
test_inputs = inputs[split:, :]
X_train = np.reshape(train_inputs, (train_inputs.shape[0], train_inputs.shape[1], 1))
X_test = np.reshape(test_inputs, (test_inputs.shape[0], test_inputs.shape[1], 1))
Please tell me how to use np.reshape() correctly in this case.
Basically I am loosely following this tutorial: https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent
You just pass a tuple to np.reshape.
For an LSTM layer, you need the shape like (NumberOfExamples, TimeSteps, FeaturesPerStep).
So, we need to know how many steps your sequence has. By the looks of your X array, I'll suppose you have 3 steps and 2 features.
If that's the case:
X_train = train_inputs.reshape((split,3,2))
X_test = test_inputs.reshape((test_inputs.shape[0], 3, 2))
If instead you want 6 steps of one feature, the shape is (split,6,1). You can use any shape as long as the product of its three dimensions stays the same (here split * 6).
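This element-count rule explains why the reshape in the question fails. Each example in the inputs holds 3 * 2 = 6 values, so (N, 3, 2) and (N, 6, 1) both work, while (N, shape[1], 1) = (N, 3, 1) asks for only half the elements. A sketch using the three sample rows from the question:

```python
import numpy as np

# The three sample rows from the question: 3 examples x 3 steps x 2 features
inputs = np.array([
    [[1, 2], [2, 2], [3, 2]],
    [[2, 1], [1, 2], [2, 3]],
    [[2, 2], [1, 1], [3, 3]],
])
n = inputs.shape[0]                      # plays the role of `split`

as_steps = inputs.reshape((n, 3, 2))     # 3 steps, 2 features per step
as_single = inputs.reshape((n, 6, 1))    # 6 steps, 1 feature per step

failed = False
try:
    inputs.reshape((n, 3, 1))            # needs n*3 elements, but there are n*6
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```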
As for the results: do you want them to be a sequence matching the input steps, or are they just independent outputs (three independent outputs for the entire sequence)?
Since you've got 3 results, and I have assumed you have 3 time steps, I'll assume these 3 results are in sequence as well, so, I'll reshape them as:
Y_train = train_results.reshape((split,3,1)) #three steps, one result per step
#for this to work, your last LSTM layer should use `return_sequences=True`.
But if they are 3 independent results:
Y_train = train_results.reshape((split,3))
#for this to work, you must have 3 cells in the last layer, be it a Dense or an LSTM. But this LSTM must have `return_sequences=False`.
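Both target layouts, applied to the three result rows shown in the question:

```python
import numpy as np

results = np.array([[3, 4, 5], [3, 3, 5], [4, 2, 6]])  # rows from the question
n = results.shape[0]

# Sequence targets: one value per time step (last LSTM needs return_sequences=True)
y_seq = results.reshape((n, 3, 1))

# Independent targets: three values per sample (last layer has 3 units)
y_flat = results.reshape((n, 3))

print(y_seq.shape, y_flat.shape)  # (3, 3, 1) (3, 3)
```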

How to train a LSTM model with different N-dimensions labels?

I am using keras (ver. 2.0.6 with TensorFlow backend) for a simple neural network:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
This is only a test; I am "training" the model with the following dummy data.
x_train = np.array([
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
[[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
[[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
[[0,0,1,0,0], [1,0,0,0,0], [1,0,0,0,0]],
[[0,0,0,1,0], [0,0,0,0,1], [0,1,0,0,0]],
[[0,0,0,0,1], [0,0,0,0,1], [0,0,0,0,1]]
])
y_train = np.array([
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
[[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
[[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
[[1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0]],
[[1,0,0,0,0], [0,0,0,0,1], [0,1,0,0,0]],
[[1,0,0,0,0], [0,0,0,0,1], [0,0,0,0,1]]
])
Then I do:
model.fit(x_train, y_train, batch_size=2, epochs=50, shuffle=False)
print(model.predict(x_train))
The result is:
[[[ 0.11855114 0.13603994 0.21069065 0.28492314 0.24979511]
[ 0.03013871 0.04114409 0.16499813 0.41659597 0.34712321]
[ 0.00194826 0.00351031 0.06993906 0.52274817 0.40185428]]
[[ 0.17915446 0.19629011 0.21316603 0.22450975 0.18687972]
[ 0.17935558 0.1994358 0.22070852 0.2309722 0.16952793]
[ 0.18571526 0.20774922 0.22724937 0.23079531 0.14849086]]
[[ 0.11163659 0.13263632 0.20109797 0.28029731 0.27433187]
[ 0.02216373 0.03424517 0.13683401 0.38068131 0.42607573]
[ 0.00105937 0.0023865 0.0521594 0.43946937 0.50492537]]
[[ 0.13276921 0.15531689 0.21852671 0.25823513 0.23515201]
[ 0.05750636 0.08210614 0.22636817 0.3303588 0.30366054]
[ 0.01128351 0.02332032 0.210263 0.3951444 0.35998878]]
[[ 0.15303896 0.18197381 0.21823004 0.23647803 0.21027911]
[ 0.10842207 0.15755147 0.23791778 0.26479205 0.23131666]
[ 0.06472684 0.12843341 0.26680911 0.28923658 0.25079405]]
[[ 0.19560908 0.20663913 0.21954383 0.21920268 0.15900527]
[ 0.22829761 0.22907974 0.22933882 0.20822221 0.10506159]
[ 0.27179539 0.25587022 0.22594844 0.18308094 0.063305 ]]]
OK, it works, but it's just a test and I don't really care about accuracy etc. I would like to understand how I can work with outputs of different sizes.
For example: passing a sequence (numpy.array) like:
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]]
I would like to get 4 dimensions output as prediction:
[[..first..], [..second..], [..third..], [..four..]]
Is that possible somehow? The size could vary: I would train the model with labels that can have different N-dimensions.
Thanks
This answer covers output sizes that don't vary. For varying sizes, the padding idea in Giuseppe's answer seems the way to go, perhaps with the help of the Masking layer described in the Keras documentation.
The output shape in Keras is totally dependent on the number of "units/neurons/cells" you put in the last layer, and of course, on the type of layer.
Your data does not match the code in your question (x_train has 3 time steps per example, while the model expects 100), so let's assume the code is right and set the data aside for a while.
An input shape of (100,5) in an LSTM layer means a tensor of shape (None, 100, 5), where:
None is the batch size. The first dimension of your data is reserved for the number of examples you have (X and Y must have the same number of examples).
Each example is a sequence with 100 time steps.
Each time step is a 5-dimensional vector.
The 32 cells in this same LSTM layer mean that the vectors change from 5 to 32 dimensions. With return_sequences=True, all 100 time steps appear in the result, so the output shape of the first layer is (None, 100, 32):
Same number of examples (this never changes along the model)
Still 100 time steps per example (because return_sequences=True)
Each time step is a 32-dimensional vector (because of the 32 cells)
The second LSTM layer does exactly the same thing: it keeps the 100 time steps and, since it also has 32 cells, keeps the 32-dimensional vectors, so its output is also (None, 100, 32).
Finally, the TimeDistributed Dense layer also keeps the 100 time steps (because of TimeDistributed) and turns the vectors back into 5-dimensional vectors (because of the 5 units), resulting in (None, 100, 5).
As you can see, recurrent layers cannot change the number of time steps directly, so you need other layers to change these dimensions. How you do this is completely up to you, and there are many ways to do it.
But in all of them, you need to get rid of the time steps and rebuild the data with another shape.
Suggestion
One suggestion (just one possibility) is to reshape your result and apply another Dense layer to reach the expected final shape.
Suppose you want a result like (None, 4, 5) (never forget, the first dimension of your data is the number of examples, it can be any number, but you must take it into account when you organize your data). We can achieve this by reshaping the data to a shape containing 4 in the second dimension:
from keras.layers import Reshape, TimeDistributed, Dense
#after the Dense layer:
model.add(Reshape((4, 125)))  #the batch size doesn't appear here,
                              #just make sure you have 500 elements, which is 100*5 = 4*125
model.add(TimeDistributed(Dense(5)))
#this layer could also be model.add(LSTM(5, return_sequences=True)), for instance
#continue to the "Activation" layer
This gives you 4 time steps (because the shape after Reshape is (None, 4, 125)), each step being a 5-dimensional vector (because of Dense(5)).
Use the model.summary() command to see the shapes outputted by each layer.
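NumPy can mimic the shape bookkeeping of Reshape((4, 125)) followed by TimeDistributed(Dense(5)). Keras' Reshape orders elements row-major, like np.reshape, and a time-distributed Dense applies the same weight matrix to every step. A sketch with a batch of 2 and random weights standing in for trained ones (biases omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 100, 5))   # output of the first TimeDistributed(Dense(5))

r = x.reshape(2, 4, 125)           # like Reshape((4, 125)); 100*5 == 4*125
W = rng.normal(size=(125, 5))      # one shared weight matrix (bias omitted)
y = r @ W                          # applied to each of the 4 steps: (2, 4, 5)
print(r.shape, y.shape)            # (2, 4, 125) (2, 4, 5)
```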
I don't know Keras, but from both a practical and a theoretical point of view this is absolutely possible.
The idea is that you have an input sequence and an output sequence. Commonly, the beginning and the end of each sequence are delimited by special symbols (e.g. the character sequence "cat" becomes "^cat#" with a start symbol "^" and an end symbol "#"). The sequence is then padded with another special symbol up to a maximum sequence length (e.g. "^cat#$$$$$$" with a padding symbol "$").
If the padding symbol corresponds to a zero vector, it will have no impact on your training.
Your output sequence can now have any length up to the maximum, because its real length is the span from the start symbol to the end symbol.
In other words, the input and output sequences always have the same length (the maximum one), but the real length is the one between the start and end symbols.
(Obviously, in the output sequence, anything after the end symbol should not be considered in the loss function)
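A small sketch of this delimiter-and-padding scheme. It uses the symbols from the example ('^' start, '#' end, '$' padding) and an assumed maximum length of 11. The loss mask marks which positions count: everything after the end symbol does not.

```python
MAX_LEN = 11  # assumed maximum sequence length

def pad_sequence(word, max_len=MAX_LEN):
    """Wrap word in start/end symbols and pad with '$' up to max_len."""
    seq = "^" + word + "#"
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    return seq + "$" * (max_len - len(seq))

def loss_mask(padded):
    """1 for positions up to and including the end symbol, 0 afterwards."""
    end = padded.index("#")
    return [1 if i <= end else 0 for i in range(len(padded))]

print(pad_sequence("cat"))             # ^cat#$$$$$$
print(loss_mask(pad_sequence("cat")))  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```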
There seem to be two ways to do the sequence-to-sequence mapping you're describing. The first uses Keras directly, as in this example:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
timesteps, input_dim, latent_dim = 3, 5, 32  # example sizes (hypothetical)
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
Here RepeatVector repeats the encoded vector timesteps times, so that the decoder LSTM produces one output per time step. This still means you need a fixed number of time steps in your output. However, there may be a way to pad outputs that have fewer time steps than your maximum.
Alternatively, you can use the seq2seq module, which is built on top of Keras.
