I see the following example code in the TensorFlow 2.0 API documentation:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# The model will take as input an integer matrix of size (batch, input_length).
# The largest integer (i.e. word index) in the input should be no larger
# than 999 (vocabulary size).
# Now model.output_shape == (None, 10, 64), where None is the batch dimension.

input_array = np.random.randint(1000, size=(32, 10))
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)
I have used the Keras API for a few days; compile, fit, and then predict is my usual workflow.
What does the above example mean without the fit step?
It shows the model being used with its freshly initialized (random) parameters, without any call to fit(). The example exists only to illustrate the output shape of the Embedding layer.
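For illustration, here is a minimal sketch (assuming TF 2.x) showing that the Embedding layer's weights exist as soon as the model is built, and that predict() merely looks them up:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential([Embedding(1000, 64, input_length=10)])
model.build(input_shape=(None, 10))

# The randomly initialized embedding matrix: one 64-d vector per word index.
weights = model.layers[0].get_weights()[0]
assert weights.shape == (1000, 64)

# predict() is just a row lookup into that matrix; no fit() is required.
out = model.predict(np.random.randint(1000, size=(32, 10)))
assert out.shape == (32, 10, 64)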
I have a Keras model trained on sequences of data, each with a single label. I assume a categorically encoded feature that passes through an Embedding layer before a GRU layer.
import numpy as np
from tensorflow.keras.layers import Input, Embedding, GRU, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

samples, timesteps, features = 2000, 10, 1
inputs_1 = np.random.randint(1, 50, [samples, timesteps, features]).astype(np.float32)
labels = np.random.randint(0, 2, [samples, 1])

# Input
input_ = Input(shape=(None,))

# Embeddings
emb = Embedding(input_dim=50,
                output_dim=20,
                input_length=None,
                mask_zero=False,
                name="cat_feat_0" + "_emb")(input_)

gru = GRU(32,
          activation="tanh",
          dropout=0,
          recurrent_dropout=0,
          go_backwards=False,
          return_sequences=False,
          name="gru_cat")(emb)

y = Dense(10, activation="tanh")(gru)
y = Dropout(0.4)(y)
y = Dense(1, activation="sigmoid")(y)

model = Model(inputs=input_, outputs=y)
model.compile(loss=BCE_Last_Event,  # custom loss, defined elsewhere
              optimizer=Adam(beta_1=0.9, beta_2=0.999),
              metrics=["accuracy"])
model.predict(inputs_1).shape
When I predict my data, the output shape is (2000,1) given that it predicts a single label for the sequence. Would it be possible to output the scores for every event in the sequence such that the model returns predictions of shape (2000, 10, 1)?
I know I can return the sequences from the GRU layer, which would then be propagated forward. However, I still only have a single label per sequence, so the loss function would be wrong. My current thinking is either:
Create a new model that returns the sequences, using the same weights as the trained model (a minimal sketch of this follows below).
Wrap the model in a TimeDistributed layer so that it predicts every event in the sequence.
I am concerned that the second solution will be error-prone, because it would feed the model a single event at each position rather than the whole sequence up to that point. Is this thinking correct?
What are the best solutions?
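For reference, a minimal sketch of the first option (assuming the trained model defined above): rebuild the same architecture with return_sequences=True and copy the trained weights. Dense layers operate on the last axis, so they are applied at every timestep automatically.

from tensorflow.keras.layers import Input, Embedding, GRU, Dense, Dropout
from tensorflow.keras.models import Model

seq_input = Input(shape=(None,))
seq_emb = Embedding(input_dim=50, output_dim=20)(seq_input)
seq_gru = GRU(32, activation="tanh", return_sequences=True)(seq_emb)
seq_y = Dense(10, activation="tanh")(seq_gru)
seq_y = Dropout(0.4)(seq_y)
seq_y = Dense(1, activation="sigmoid")(seq_y)
seq_model = Model(seq_input, seq_y)

# The layer lists are parallel and the weight shapes are unchanged by
# return_sequences, so the trained weights can be copied positionally.
for src, dst in zip(model.layers, seq_model.layers):
    dst.set_weights(src.get_weights())

seq_model.predict(inputs_1).shape  # (2000, 10, 1): one score per event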
I'm trying to convert a trained sign-language classification model written in Python into C headers so that I can deploy it on a Cortex-M4 board.
In Python, I'm able to build and train the model, and I can see it predicting with 90% accuracy.
But I see an issue with the number of weights generated in the convolution layers.
**Conv1D configuration**
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

print(x_train.shape)

model = Sequential()
model.add(Conv1D(32, kernel_size=5, padding='same',
                 input_shape=x_train.shape[1:], name='conv1d_1'))
print(model.layers[0].kernel.numpy().shape)
**output:**
(1742, 45, 45)
**(5, 45, 32)**
According to the above configuration:
input dimension = 45x45x1 pixels of image (grayscale)
input channels = 1
output dimension = 45x45x32
output channels = 32
kernel size = 5
As per the concept (w.r.t. https://cs231n.github.io/convolutional-networks/):
number of weights = (input_channels) x (kernel_size) x (kernel_size) x (output_channels) = 1x5x5x32 = 800
But the Keras model produces a weights array of size [5][45][32] = 7200.
I'm not sure if my interpretation of the weight array in the Keras model is correct; I would be glad if someone could help me with this.
Some bullets that should clarify your doubts.
Your formula for the number of weights can't be right, because you're using a Conv1D, so the kernel has only one dimension.
Defining the input shape as x_train.shape[1:] = (45, 45) corresponds to a sequence of 45 elements with 45 input channels (again, because it's a Conv1D).
That said, the number of weights is:
# of weights = input_channels x kernel_size x output_filters = 45 x 5 x 32 = 7200 (without biases)
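A quick sanity check of that count (a minimal sketch, with bias disabled so only the kernel weights are counted):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5, padding='same',
                                 input_shape=(45, 45), use_bias=False))
model.summary()
# Param # for the Conv1D layer: 5 x 45 x 32 = 7200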
Considering that you have images, you're probably looking for Conv2D. In that case, the input shape should be (45, 45, 1), the kernel has two dimensions, and the number of parameters is exactly 800 (without biases):
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=5, padding='same',
                                 input_shape=(45, 45, 1), use_bias=False))
model.summary()
# Layer (type)     Output Shape          Param #
# conv (Conv2D)    (None, 45, 45, 32)    800
I would like to create a Sequential model (a time-series model, as you might have guessed) that takes 20 days of past data with a feature size of 2 and predicts 1 day into the future with the same feature size of 2.
I found out you need to specify the batch size for a stateful LSTM model, so if I specify a batch size of 32, for example, the final output shape of the model is (32, 2), which I think means the model is predicting 32 days into the future rather than 1.
How would I go about fixing this?
Also, asking before I run into the problem: if I specify a batch size of 32 but want to predict on an input of shape (1, 20, 2), would the model still predict correctly, since I changed the batch size from 32 to 1? Thank you.
You don't need to specify batch_size. But you should feed a 3-D tensor:
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras import Model, Sequential

features = 2
dim = 128

new_model = Sequential([
    LSTM(dim, stateful=True, return_sequences=True),
    Dense(2)
])

number_of_sequences = 1000
sequence_length = 20

input = tf.random.uniform([number_of_sequences, sequence_length, features], dtype=tf.float32)
output = new_model(input)  # shape is (number_of_sequences, sequence_length, features)
predicted = output[:, -1]  # shape is (number_of_sequences, features): the last step of each sequence
An output shape of (32, 2) means you get one 2-feature prediction for each of the 32 sequences in the batch; the model is not predicting 32 days into the future.
Batch size is a training parameter (how many sequences are fed to the model before the error is backpropagated; see stochastic gradient descent). It doesn't affect the layout of your data, which should be 3-D: (number of sequences, sequence length, features).
If you need to predict only one sequence, just feed a tensor of shape (1, 20, 2) to the model.
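For instance, continuing the sketch above (and assuming a non-stateful copy of the model, since stateful=True pins the batch size at the first call):

single_model = Sequential([
    LSTM(dim, return_sequences=True),
    Dense(2)
])

one_sequence = tf.random.uniform([1, 20, features], dtype=tf.float32)
one_output = single_model(one_sequence)  # shape (1, 20, 2)
next_day = one_output[:, -1]             # shape (1, 2): the 1-day-ahead prediction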
I'm creating an end-to-end speech recognition architecture in which my data is a list of segmented spectrograms. My data has shape (batch_size, timesteps, 8, 65, 1), where batch_size is fixed but timesteps varies. I can't figure out how to put this data into a tensor with the appropriate shape to feed my model. Here is a piece of code that shows my problem:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten, TimeDistributed
from tensorflow.keras.layers import SimpleRNN, LSTM
from tensorflow.keras import Input, layers
from tensorflow.keras import backend as K

segment_width = 8
segment_height = 65
segment_channels = 1
batch_size = 4
segment_lengths = [28, 33, 67, 43]
label_lengths = [16, 18, 42, 32]
TARGET_LABELS = np.arange(35)

# Generating data
X = [np.random.uniform(0, 1, size=(segment_lengths[k], segment_width, segment_height, segment_channels))
     for k in range(batch_size)]
y = [np.random.choice(TARGET_LABELS, size=label_lengths[k]) for k in range(batch_size)]

# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data',
                                     shape=(None, segment_width, segment_height, segment_channels),
                                     dtype='float32')
input_segment_lengths = tf.keras.Input(name='input_segment_lengths', shape=[1], dtype='int64')
input_label_lengths = tf.keras.Input(name='input_label_lengths', shape=[1], dtype='int64')

# More complex architecture comes here
outputs = Flatten()(input_segments_data)

model = tf.keras.Model(inputs=[input_segments_data, input_segment_lengths, input_label_lengths],
                       outputs=outputs)

def dummy_loss(y_true, y_pred):
    return y_pred

model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
output:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_segments_data (InputLayer [(None, None, 8, 65, 0
__________________________________________________________________________________________________
input_segment_lengths (InputLay [(None, 1)] 0
__________________________________________________________________________________________________
input_label_lengths (InputLayer [(None, 1)] 0
__________________________________________________________________________________________________
flatten (Flatten) (None, None) 0 input_segments_data[0][0]
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________
Now when I try to predict from my random data:
model.predict([X, segment_lengths, segment_lengths])
I get this error:
ValueError: Error when checking input: expected input_segments_data to have 5 dimensions, but got array with shape (4, 1)
How can I convert X (which is a list of arrays) to a tensor of shape (None, None, 8, 65, 1) and feed it to my model? I don't want to use zero padding!
A Keras model takes a numpy array (tensor) as input, and you cannot have a single tensor with a variable number of timesteps. What you can do instead is pad all the data to the same shape, using e.g. pad_sequences, and then add a Masking layer to your model to ignore the padded values.
This is a common issue with TensorFlow and other deep learning frameworks that operate on tensors. Unfortunately, there is currently no easy way to do exactly what you ask, besides padding your sequences and then masking.
To do this, you simply have to store your input data in a numpy array with fixed dimensions and feed that to the model. You have to add dummy values to represent the missing timesteps in your sequences (a common value is 0).
Then, you have to add a Masking layer to your model, that will tell Keras to ignore the timesteps that have the dummy features.
From the documentation:
keras.layers.Masking(mask_value=0.0)
If all features for a given sample timestep are equal to mask_value, then the sample timestep will be masked (skipped) in all downstream layers (as long as they support masking).
I've adapted and simplified part of your code to give you an idea of how this works. You can adapt this to your variable-sized labels, as well:
# Generating data (using a dummy zero-array to store padded sequences)
X = np.zeros((batch_size, max(segment_lengths), segment_width, segment_height, segment_channels))
X_true = [np.ones((segment_lengths[k], segment_width, segment_height, segment_channels))
          for k in range(batch_size)]

# Populate the dummy array (right-aligning each sequence)
for i, x in enumerate(X_true):
    X[i, -segment_lengths[i]:, ...] = x

# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data',
                                     shape=(max(segment_lengths), segment_width, segment_height, segment_channels))
masked_segments_data = tf.keras.layers.Masking()(input_segments_data)

# More complex architecture comes here. Note that it must consume the masked
# tensor (not the raw input) for the masking to have any effect, and the
# downstream layers must support masking.
outputs = tf.keras.layers.Flatten()(masked_segments_data)

model = tf.keras.Model(inputs=input_segments_data, outputs=outputs)

def dummy_loss(y_true, y_pred):
    return y_pred

model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
A drawback of this approach is that if you actually have a "real" feature that is exactly like a dummy feature (e.g., all zeros), the model will mask it. Choose your masking value appropriately to avoid this.
An alternative approach would be to do something similar as what you did, but using batches of size 1. This, however, is likely to cause instability in your training procedure and I would avoid it if possible.
As a final note, Tensorflow 2 added support for RaggedTensors, which are tensors with one or more variable dimensions. Currently there is no support for RNNs, but it will probably be added eventually.
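For reference, a minimal sketch of packing the variable-length arrays (the X_true list above) into a RaggedTensor whose time dimension is ragged:

# Concatenate along the time axis, then split back by the known segment lengths.
values = tf.concat([tf.constant(x, dtype=tf.float32) for x in X_true], axis=0)
X_ragged = tf.RaggedTensor.from_row_lengths(values, row_lengths=segment_lengths)
print(X_ragged.shape)  # (4, None, 8, 65, 1)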
Hope this helps.
I am trying to create a simple LSTM network that would, based on the last 16 time frames, provide some output. Let's say I have a dataset with 112000 rows (measurements) and 7 columns (6 features + class). What I understand is that I have to "pack" the dataset into some number of 16-element-long batches. With 112000 rows that would mean 112000/16 = 7000 batches, hence a numpy 3D array with shape (7000, 16, 7). Splitting this array into train and test data I get the shapes:
xtrain.shape == (5000, 16, 6)
ytrain.shape == (5000, 16)
xtest.shape == (2000, 16, 6)
ytest.shape == (2000, 16)
My model looks like this:
model.add(keras.layers.LSTM(8, input_shape=(16, 6), stateful=True, batch_size=16, name="input"))
model.add(keras.layers.Dense(5, activation="relu", name="hidden1"))
model.add(keras.layers.Dense(1, activation="sigmoid", name="output"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(xtrain, ytrain, batch_size=16, epochs=10)
However after trying to fit the model I get this error:
ValueError: Error when checking target: expected output to have shape (1,) but got array with shape (16,)
What I guess is wrong is that the model expects a single output per sequence (so the ytrain shape should be (5000,)), instead of 16 outputs (one for every entry in a sequence: (5000, 16)).
If that is the case, should I, instead of packing the data like this, create a 16-element-long window for every output? Therefore having:
xtrain.shape == (80000, 16, 6)
ytrain.shape == (80000,)
xtest.shape == (32000, 16, 6)
ytest.shape == (32000,)
You are close with the last part of your question. Since it's a binary classification problem, you should have one output per input sequence, so you need to get rid of the 16 in your ys and replace it with a 1.
Besides, with a stateful network the number of training samples must be divisible by the batch size, so you could use 5008 samples, for example (5000 is not divisible by 16).
In fact:
ytrain.shape == (5000, 1)
gets past the error you mention, but raises a new one:
ValueError: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size. Found: 5000 samples
Which is addressed by ensuring that:
xtrain.shape == (5008, 16, 6)
ytrain.shape == (5008, 1)
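A minimal sketch of that fix (the labeling rule here, taking the class of each window's last entry, is an assumption; pick whatever rule matches your task):

batch_size = 16

# One label per 16-step window instead of 16 labels.
ytrain = ytrain[:, -1].reshape(-1, 1)  # (5000, 16) -> (5000, 1)

# A stateful network needs the sample count divisible by the batch size,
# so either re-split to get e.g. 5008 samples or trim the leftover:
n = (len(xtrain) // batch_size) * batch_size
xtrain, ytrain = xtrain[:n], ytrain[:n]  # (4992, 16, 6), (4992, 1)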