I am studying the ins and outs of Keras, and in that context I was looking at the model.summary() function.
I was using a simple image classification example provided by Keras itself and loaded the various pretrained models it provides (Xception, VGG16, etc.).
I checked each model architecture using model.summary(), as mentioned. Then I noticed that, for some reason, the Connected to column (the 4th column) is not present in every model summary. For example, for MobileNetV2 I get (only the first few lines are shown):
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D) (None, 225, 225, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
Conv1 (Conv2D) (None, 112, 112, 32) 864 Conv1_pad[0][0]
but for MobileNet I get:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 225, 225, 3) 0
_________________________________________________________________
conv1 (Conv2D) (None, 112, 112, 32) 864
This output was produced right after loading the models, without any further action (no training, no inference, etc.).
This seems odd, and I am not sure what is going on here. For example, creating the simple model from this question here (up to the model0.fit(...) part) and running model0.summary() also gives me a summary without the Connected to column, contrary to the summary posted in that question.
So why does the output change? What's the deal with model.summary()? Do we have some control over the output (although the examples above suggest otherwise)? Or does the output depend on the way a model is structured?
Edit:
I added the (trivial) code used to reproduce the summary of both models as requested in a comment.
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.applications.mobilenet import MobileNet
model1 = MobileNetV2(weights='imagenet')
model1.summary()  # summary() prints the table itself and returns None
model2 = MobileNet(weights='imagenet')
model2.summary()
Also, my system uses Keras 2.2.4, TensorFlow 1.12.0 and Ubuntu 16.04, in case this information is useful.
I suppose the reason is that MobileNetV2 is summarized as a general (non-sequential) keras.Model, whereas MobileNet is summarized as a sequential-like model.
Both Model and Sequential have a summary method. When called, it invokes the print_summary function, which behaves differently for sequential-like and non-sequential models:
if sequential_like:
    line_length = line_length or 65
    positions = positions or [.45, .85, 1.]
    if positions[-1] <= 1:
        positions = [int(line_length * p) for p in positions]
    # header names for the different log elements
    to_display = ['Layer (type)', 'Output Shape', 'Param #']
else:
    line_length = line_length or 98
    positions = positions or [.33, .55, .67, 1.]
    if positions[-1] <= 1:
        positions = [int(line_length * p) for p in positions]
    # header names for the different log elements
    to_display = ['Layer (type)',
                  'Output Shape',
                  'Param #',
                  'Connected to']
    relevant_nodes = []
    for v in model._nodes_by_depth.values():
        relevant_nodes += v
As you can see, it simply doesn't print 'Connected to' for a sequential-like model.
I guess the reason is that a sequential-like model cannot connect layers in a non-sequential order: each layer is simply connected to the one before it, so the column would carry no extra information.
Also, it checks the model type via model.__class__.__name__ == 'Sequential'. I doubt that it's a good idea to try to change it on the fly to obtain a different output.
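As a rough illustration of the distinction (a toy sketch, not the Keras source): as far as I can tell, a model counts as sequential-like when its layers form a single chain, with no layer receiving more than one input and no layer feeding more than one other layer. The dict-based graph format and the function names below are made up for this example.

```python
# Toy model graphs: each layer name maps to the list of layers feeding into it.
# This dict format is hypothetical and only mimics the idea behind the check.
mobilenet_like = {
    "input_1": [],
    "conv1_pad": ["input_1"],
    "conv1": ["conv1_pad"],
}
mobilenet_v2_like = {
    "input_1": [],
    "expand": ["input_1"],
    "project": ["expand"],
    "add": ["input_1", "project"],  # residual connection: two inbound layers
}


def is_sequential_like(graph):
    """Return True if the layers form a single chain with no branches or merges."""
    # A merge shows up as a layer with more than one inbound layer.
    if any(len(inbound) > 1 for inbound in graph.values()):
        return False
    # A branch shows up as a layer that feeds more than one other layer.
    consumers = {}
    for inbound in graph.values():
        for name in inbound:
            consumers[name] = consumers.get(name, 0) + 1
    return all(count <= 1 for count in consumers.values())


def summary_columns(graph):
    """Pick the header columns the way the excerpt above does."""
    columns = ["Layer (type)", "Output Shape", "Param #"]
    if not is_sequential_like(graph):
        columns.append("Connected to")
    return columns


print(summary_columns(mobilenet_like))
print(summary_columns(mobilenet_v2_like))
```

Under this reading, a model built with the functional API can still get the three-column summary as long as its graph is a plain chain.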
I am training an LSTM to predict event occurrences. For each day I have a vector like [1,0,1] to denote that the first and third events occurred, whereas the second one did not.
I want to extend this problem to work for multiple people, where each person has a distinct agent_id. This means that I somehow need to present my model with the agent_id as a feature. Although I'm not sure if this is the best way, I made the first entry of my vector the agent_id, so it looks like, for example, [123456, 1, 0, 1].
Now, for each event, the LSTM model outputs the probability of it occurring on the next day. So the way I see the input/output is: [agent_id, did event 1 occur today?, did event 2 occur today?, did event 3 occur today?] -> LSTM -> [probability of event 1 occurring tomorrow, probability of event 2 occurring tomorrow, probability of event 3 occurring tomorrow]
Now the input has a longer length than the output. As far as I understood, from the answer to this post https://stats.stackexchange.com/questions/305863/how-to-train-lstm-model-on-multiple-time-series-data , I need to have an embedding layer that can change the size of my input so that the LSTM gives me the desired output.
For this, I tried to do the following:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense
xin = Input(batch_shape=(batch_size, window_length), dtype='int32')
xemb = Embedding(x_traindict[123456].shape[2], x_traindict[123456].shape[2]-1)(xin) #from what I give in to what I want to get out # 3dim (batch,time,feat)
seq = LSTM(x_traindict[123456].shape[2]-1, return_sequences=True)(xemb)
mlp = TimeDistributed(Dense(y_traindict[123456].shape[1], activation='softmax'))(seq)
model = tf.keras.Model(inputs=xin, outputs=mlp)
model.compile(optimizer='Adam', loss='categorical_crossentropy')
print(f"batch size is {batch_size}, window_length = {window_length}, x_train.shape is {x_traindict[123456].shape} and y_train.shape is {y_traindictalt[123456].shape}")
model.summary()
model.fit(x_traindict[123456], y_traindict[123456], epochs=20)
------------------------------------------------------------------------------------------------
batch size is 358, window_length = 7, x_train.shape is (358, 7, 149) and y_train.shape is (358, 148)
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_12 (InputLayer) [(358, 7)] 0
embedding_14 (Embedding) (358, 7, 148) 22052
lstm_16 (LSTM) (358, 7, 148) 175824
time_distributed_11 (TimeDi (358, 7, 149) 22201
stributed)
=================================================================
Total params: 220,077
Trainable params: 220,077
Non-trainable params: 0
_________________________________________________________________
My idea was that the Embedding would take the input from x_train, including the agent_id, and would learn to encode it into an input of the size of y_train, which does not include the agent_id. The LSTM would then learn to use what it receives from the embedding to correctly predict y_train. However, the code above gives me the following error:
ValueError: Exception encountered when calling layer "model_7" (type Functional).
Input 0 of layer "lstm_16" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 7, 149, 148)
I do not understand this error.
What I would thus like to ask is:
Does my idea even make sense? By implementing the agent_id directly with the events, can the LSTM learn the time series prediction for different agents?
How can I fix the error in my code? If it helps, I basically filled in the template from the first answer to this post: https://github.com/keras-team/keras/issues/2654
EDIT:
I have tried changing xin to xin = Input(batch_shape=(window_length,), dtype='int32'), but now I get a ValueError on the line seq = ...: Input 0 of layer "lstm_26" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (7, 133).
I also tried changing xin to xin = Input(batch_shape=(window_length, number_of_transactions+1), dtype='int32'), but this produces ValueError: Input 0 of layer "model_11" is incompatible with the layer: expected shape=(None, 134), found shape=(None, 7, 134)
Note: I had to take a new sample today. The 134 replaces the 149 events from above.
Yes, I think the idea is valid. By placing the agent-id as the first element in the sequence the RNN will encode this information in the state which is subsequently used to predict the probabilities for an event. One thing to watch out for is that the model will try to generate a prediction given just the first sequence element - the agent-id.
I think your issue is that you included the batch size in the Input call. The batch size is implied and doesn't need to be defined, so
xin = Input(batch_shape=(batch_size, window_length), dtype='int32')
should become xin = Input(shape=(window_length,), dtype='int32')
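To see where the fourth dimension in the error comes from: an Embedding layer looks up one vector per integer in its input, so it appends one axis of size output_dim to whatever shape it receives. The helper below is plain Python shape arithmetic, not a Keras call, and its name is made up for illustration.

```python
def embedding_output_shape(input_shape, output_dim):
    """Shape after an embedding lookup: one vector of size output_dim per input integer."""
    return tuple(input_shape) + (output_dim,)


window_length, n_features, emb_dim = 7, 149, 148

# Declared input (batch, time): the embedding yields the 3-D tensor an LSTM expects.
declared = embedding_output_shape((None, window_length), emb_dim)

# Data actually passed to fit(), shape (batch, time, features): every feature value
# is looked up as well, which adds a fourth axis.
actual = embedding_output_shape((None, window_length, n_features), emb_dim)

print(declared)  # (None, 7, 148)
print(actual)    # (None, 7, 149, 148)
```

So the model is declared for 2-D integer input but is fed 3-D data. One option, offered as an assumption and not a tested fix, is to give the agent_id its own 2-D integer input and Embedding, and to concatenate the result with the event features before the LSTM.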
I understand that it is a long post, but help in any of the sections is appreciated.
I have some queries about the prediction method of my LSTM model. Here is a general summary of my approach:
I used a dataset of 50 time series for training. They run from a value of 1.09 down to 0.82, with each time series having between 570 and 2000 data points (i.e., each time series has a different length, but a similar trend).
I converted them to the dataset accepted by keras' LSTM/Bi-LSTM layers in the format:
[1, 0.99, 0.98, 0.97] ==Output==> [0.96]
[0.99, 0.98, 0.97, 0.96] ==Output==> [0.95]
and so on..
Shapes of the input and output containers (arrays): input(39832, 5, 1) and output(39832, )
Error-free training
Prediction on an initial window of data points with shape (1, 5, 1), taken from the actual data.
The predicted output is one value, which is appended to a separate list (for plotting), as well as appended to the window, and the first value of the window dropped out. This window is then fed as input to the model to generate the next prediction point.
Continue this until I get the whole curve for both models (LSTM and Bi-LSTM)
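The windowing and the recursive prediction described in the steps above can be sketched in plain Python as follows. The function names and the stand-in predictor (which simply returns the last value of the window) are placeholders for illustration, not the trained model; a window of 4 is used to match the example pairs.

```python
def make_windows(series, timesteps):
    """Split a series into (window, next value) training pairs."""
    inputs, targets = [], []
    for i in range(len(series) - timesteps):
        inputs.append(series[i:i + timesteps])
        targets.append(series[i + timesteps])
    return inputs, targets


def rollout(predict_next, start_window, n_steps):
    """Feed each prediction back into the window to generate a whole curve."""
    window = list(start_window)
    curve = list(start_window)
    for _ in range(n_steps):
        yhat = predict_next(window)
        curve.append(yhat)
        window = window[1:] + [yhat]  # drop the oldest value, append the newest
    return curve


series = [1.0, 0.99, 0.98, 0.97, 0.96, 0.95]
X, y = make_windows(series, timesteps=4)
print(X[0], "->", y[0])  # [1.0, 0.99, 0.98, 0.97] -> 0.96


# Placeholder for model.predict: a "persistence" predictor that repeats the last value.
def persistence(window):
    return window[-1]


print(rollout(persistence, series[:4], n_steps=3))
```

Note that a predictor which only copies the last input value yields a flat line when rolled out this way.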
However, the prediction is not even close to the actual data. It flatlines to a fixed value, whereas it should be somewhat like the black curve (which is the actual data)
Figure:https://i.stack.imgur.com/Ofw7m.png
Model (similar code goes for Bi-LSTM model):
model_lstm = Sequential()
model_lstm.add(LSTM(128, input_shape=(timesteps, 1), return_sequences= True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(128, return_sequences= False))
model_lstm.add(Dropout(0.2))
model_lstm.add(Dense(1))
model_lstm.compile(loss = 'mean_squared_error', optimizer = optimizers.Adam(0.001))
Curve prediction initialize:
start = cell_to_test[0:timesteps].reshape(1, timesteps, 1)
y_curve_lstm = list(start.flatten())
y_window = start
Curve prediction:
while len(y_curve_lstm) <= len(cell_to_test):
    yhat = model_lstm.predict(y_window)
    yhat = float(yhat[0, 0])  # predict returns an array of shape (1, 1)
    y_curve_lstm.append(yhat)
    # slide the window: append the new prediction, drop the oldest value
    y_window = list(y_window.flatten())
    y_window.append(yhat)
    y_window.pop(0)
    y_window = np.array(y_window).reshape(1, timesteps, 1)
    #print(yhat)
Model summary:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_5 (LSTM) (None, 5, 128) 66560
_________________________________________________________________
dropout_5 (Dropout) (None, 5, 128) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_6 (Dropout) (None, 128) 0
_________________________________________________________________
dense_5 (Dense) (None, 1) 129
=================================================================
Total params: 198,273
Trainable params: 198,273
Non-trainable params: 0
_________________________________________________________________
And in addition to diagnosing the problem, I am really trying to find the answers to the following questions (I looked up other sources, but in vain):
Is my data enough to train the LSTM model? I have been told that it requires thousands of data points, so I feel that my current dataset more than satisfies that condition.
Is my model less/more complex than it needs to be?
Does increasing the number of epochs, layers, and neurons per layer always lead to a 'better' model, or are there optimal values for these? If the latter, is there a method to find this optimal point, or is trial and error the only way?
I trained with the number of epochs=25, which gave me a loss of 1.25 * 10e-4. Should the loss be lower for the model to predict the trend? (I am focused on getting the shape first, accuracy later, because the training takes too long with higher epochs)
Following on from the previous question, does the loss have the same unit as the data? I am asking because the data has a resolution of up to 10e-7.
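On the unit question, a small worked example may help: mean squared error is expressed in squared data units, so its square root (RMSE) is the quantity comparable to the data. The sketch below is plain Python and reads the reported loss literally (note that 10e-4 equals 1e-3 in Python notation).

```python
import math

# The loss as written above: 1.25 * 10e-4. In Python, 10e-4 is 1e-3.
mse = 1.25 * 10e-4
rmse = math.sqrt(mse)

print(f"MSE  = {mse:.6f}  (squared data units)")
print(f"RMSE = {rmse:.4f}  (same units as the data)")

# If the intended value was 1.25e-4 instead, the RMSE would be smaller:
rmse_alt = math.sqrt(1.25e-4)
print(f"RMSE for 1.25e-4 = {rmse_alt:.4f}")
```

Either way, the RMSE is several orders of magnitude larger than the stated data resolution.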
Once again, I understand that it has been a long post, but help in any of the sections is appreciated.
When using a GlobalAveragePooling2D() layer in Keras, the dimension of the output is of course not the dimension of the input: for example, if the input dimension is (100,100,64), the output dimension is (1,1,64). But I would like the output to have the same dimension as the input. There are two options that would work for me: repeat the mean of every channel 100*100 times within that channel of the output, or place the result at a given position in the 100x100 matrix of that output channel and put zeros at all other positions. Does someone have an idea how to do this?
Kind regards.
You can add a Lambda layer, which wraps an arbitrary function as a Layer object.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, Lambda
# Replacing layer values with their mean:
def lambda_layer(x):
    a = K.zeros_like(x) + K.mean(x, axis=[1,2], keepdims=True)
    return a
model = Sequential()
model.add(Conv2D(16,3, input_shape=(50,50,3)))
model.add(Lambda(lambda_layer))
'''
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 48, 48, 16) 448
_________________________________________________________________
lambda_3 (Lambda) (None, 48, 48, 16) 0
=================================================================
'''
Similarly, you can place the mean at one location, keeping zeros for the rest of the values.
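For the second option, here is a NumPy sketch of both computations on a small array, so the shapes are easy to check. It only illustrates the arithmetic; inside a Lambda layer the same steps would have to be written with backend or TensorFlow ops. Putting the mean at position (0, 0) is an arbitrary choice made for this example.

```python
import numpy as np

# Toy batch in channels-last layout: (batch, height, width, channels).
x = np.arange(2 * 4 * 4 * 3, dtype="float32").reshape(2, 4, 4, 3)

# Per-channel spatial mean, keeping the reduced axes: shape (2, 1, 1, 3).
mean = x.mean(axis=(1, 2), keepdims=True)

# Option 1: repeat the mean at every spatial position.
tiled = np.zeros_like(x) + mean

# Option 2: put the mean at one position (here row 0, column 0), zeros elsewhere.
single = np.zeros_like(x)
single[:, 0, 0, :] = mean[:, 0, 0, :]

print(mean.shape, tiled.shape, single.shape)
```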
I'm creating an end-to-end speech recognition architecture, in which my data is a list of segmented spectrograms. My data has shape (batch_size, timesteps, 8, 65, 1), in which batch_size is fixed but timesteps varies. I can't figure out how to put this data into a tensor with the appropriate shape to feed my model. Here is a piece of code that shows my problem:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten, TimeDistributed
from tensorflow.keras.layers import SimpleRNN, LSTM
from tensorflow.keras import Input, layers
from tensorflow.keras import backend as K
segment_width = 8
segment_height = 65
segment_channels = 1
batch_size = 4
segment_lengths = [28, 33, 67, 43]
label_lengths = [16, 18, 42, 32]
TARGET_LABELS = np.arange(35)
# Generating data
X = [np.random.uniform(0,1, size=(segment_lengths[k], segment_width, segment_height, segment_channels))
     for k in range(batch_size)]
y = [np.random.choice(TARGET_LABELS, size=label_lengths[k]) for k in range(batch_size)]
# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data', shape=(None, segment_width, segment_height, segment_channels),
                                     dtype='float32')
input_segment_lengths = tf.keras.Input(name='input_segment_lengths', shape=[1], dtype='int64')
input_label_lengths = tf.keras.Input(name='input_label_lengths', shape=[1], dtype='int64')
# More complex architecture comes here
outputs = Flatten()(input_segments_data)
model = tf.keras.Model(inputs=[input_segments_data, input_segment_lengths, input_label_lengths], outputs = outputs)
def dummy_loss(y_true, y_pred):
    return y_pred
model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
output:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_segments_data (InputLayer [(None, None, 8, 65, 0
__________________________________________________________________________________________________
input_segment_lengths (InputLay [(None, 1)] 0
__________________________________________________________________________________________________
input_label_lengths (InputLayer [(None, 1)] 0
__________________________________________________________________________________________________
flatten (Flatten) (None, None) 0 input_segments_data[0][0]
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________
Now when I try to predict from my random data:
model.predict([X, segment_lengths, segment_lengths])
I get this error:
ValueError: Error when checking input: expected input_segments_data to have 5 dimensions, but got array with shape (4, 1)
How can I convert X (which is a list of arrays) to a tensor of shape (None, None, 8, 65, 1) and feed it to my model? I don't want to use zero padding!
A Keras model takes a NumPy array (tensor) as input. You cannot have a tensor with variable timesteps. Instead, what you can do is pad all the data to the same shape, using e.g. pad_sequences, and then add a Masking layer to your model to ignore the padded values.
This is a common issue with TensorFlow and other deep learning frameworks that operate on tensors. Unfortunately, there is currently no easy way to do exactly what you asked, besides padding your sequences and then masking.
To do this, you simply have to store your input data in a numpy array with fixed dimensions and feed that to the model. You have to add dummy values to represent the missing timesteps in your sequences (a common value is 0).
Then, you have to add a Masking layer to your model, that will tell Keras to ignore the timesteps that have the dummy features.
From the documentation:
keras.layers.Masking(mask_value=0.0)
If all features for a given sample timestep are equal to mask_value, then the sample timestep will be masked (skipped) in all downstream layers (as long as they support masking).
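To make the masking rule concrete, here is a small NumPy sketch that pads variable-length sequences and then computes which timesteps would be kept under the rule quoted above (a timestep is masked when all of its features equal mask_value). It mirrors the documented behavior on a simplified 3-D example and is not the Keras implementation; the array sizes are made up, and the padding is placed at the end of each sequence.

```python
import numpy as np

mask_value = 0.0
lengths = [2, 4, 3]   # hypothetical number of timesteps per sequence
n_features = 5

# Variable-length sequences of shape (timesteps, features), filled with ones.
sequences = [np.ones((n, n_features)) for n in lengths]

# Pad at the end into one fixed-size array of shape (batch, max_len, features).
padded = np.full((len(sequences), max(lengths), n_features), mask_value)
for i, seq in enumerate(sequences):
    padded[i, :len(seq)] = seq

# A timestep is kept if at least one of its features differs from mask_value.
keep = np.any(padded != mask_value, axis=-1)

print(padded.shape)       # (3, 4, 5)
print(keep.sum(axis=1))   # [2 4 3]
```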
I've adapted and simplified part of your code to give you an idea of how this works. You can adapt this to your variable-sized labels, as well:
# Generating data (using a dummy zero-array to store padded sequences)
X = np.zeros((batch_size, max(segment_lengths), segment_width, segment_height, segment_channels))
X_true = [np.ones((segment_lengths[k], segment_width, segment_height, segment_channels))
          for k in range(batch_size)]
# Populate dummy array
for i, x in enumerate(X_true):
    X[i, -segment_lengths[i]:, ...] = x
# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data', shape=(max(segment_lengths), segment_width, segment_height, segment_channels))
masked_segments_data = tf.keras.layers.Masking()(input_segments_data)
# More complex architecture comes here
outputs = tf.keras.layers.Flatten()(input_segments_data)
model = tf.keras.Model(inputs=input_segments_data, outputs = outputs)
def dummy_loss(y_true, y_pred):
    return y_pred
model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
A drawback of this approach is that if you actually have a "real" feature that is exactly like a dummy feature (e.g., all zeros), the model will mask it. Choose your masking value appropriately to avoid this.
An alternative approach would be to do something similar as what you did, but using batches of size 1. This, however, is likely to cause instability in your training procedure and I would avoid it if possible.
As a final note, Tensorflow 2 added support for RaggedTensors, which are tensors with one or more variable dimensions. Currently there is no support for RNNs, but it will probably be added eventually.
Hope this helps.
I'm trying to create a simple stateful neural network in Keras to wrap my head around how to connect Embedding layers and LSTMs. I have a piece of text where I have mapped every character to an integer, and I would like to send in one character at a time to predict the next character. I have done this earlier by sending in 8 characters at a time and got that to work well (using return_sequences=True and TimeDistributed(Dense)). But this time I want to send in only 1 character at a time, and this is where my problem arises.
The code I use to set up my model:
n_fac = 32
vocab_size = len(chars)
n_hidden = 256
batch_size=64
model = Sequential()
model.add(Embedding(vocab_size,n_fac,input_length=1,batch_input_shape=(batch_size,1)))
model.add(BatchNormalization())
model.add(LSTM(n_hidden,stateful=True))
model.add(Dense(vocab_size,activation='softmax'))
model.summary() gives me the following:
Layer (type) Output Shape Param # Connected to
embedding_1 (Embedding) (64, 1, 32) 992 embedding_input_1[0][0]
batchnormalization_1 (BatchNorma (64, 1, 32) 128 embedding_1[0][0]
lstm_1 (LSTM) (64, 256) 295936 batchnormalization_1[0][0]
dense_1 (Dense) (64, 31) 7967 lstm_1[0][0]
Total params: 305,023
Trainable params: 304,959
Non-trainable params: 64
The code I use to set up my training data:
text = ... #Omitted for simplicity. Just setting text to some kind of literature work
text = text.lower() #Simple model, therefore only using lower case characters
idx2char = list(set(list(text)))
char2idx = {char:idx for idx,char in enumerate(idx2char)}
text_in_idx = [char2idx[char] for char in text]
x = text_in_idx[:-1]
y = text_in_idx[1:]
Compiling and training my network:
model.compile(optimizer=Adam(lr=1e-4),loss='sparse_categorical_crossentropy')
nb_epoch = 10
for i in range(nb_epoch):
    model.reset_states()
    model.fit(x,y,nb_epoch=1,batch_size=batch_size,shuffle=False)
Training works as it should, the loss is reduced with each epoch.
Now I want to try out my trained network but have no idea how to give it a character to predict the next. I start out by resetting its states and then want to start feeding it one char at a time.
I tried a couple of different inputs but all of them failed. These are not qualified guesses.
#The model uses integers for characters, therefore integers are sent as input
model.predict([1]) #Type error
model.predict(np.array([1])) #Value error
model.predict(np.array([1])[np.newaxis,:]) #Value error
model.predict(np.array([1])[:,np.newaxis]) #Value error
Am I forced to send in something of length batch_size or how am I supposed to send in data for the model to predict something?
The error text for Value error is very long and obscure so I omitted it. I can supply it if needed.
Using theano backend with keras.
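One thing worth checking, sketched here with NumPy only: because the model was built with batch_input_shape=(batch_size, 1), the array passed to predict presumably has to have exactly that shape. The sketch repeats one character index across the batch, and only the first row of the result would be read back. The index value 1 is arbitrary, and the actual predict call is left as a comment since it needs the trained model.

```python
import numpy as np

batch_size = 64
char_idx = 1  # arbitrary character index used for illustration

# Shape (batch_size, 1): one timestep per sample, the same character in every row.
x_pred = np.full((batch_size, 1), char_idx, dtype="int32")
print(x_pred.shape)  # (64, 1)

# With the trained model available, the call would look like this (untested assumption):
# probs = model.predict(x_pred, batch_size=batch_size)
# next_idx = int(np.argmax(probs[0]))
```

A common alternative is to rebuild the same architecture with batch_input_shape=(1, 1) and copy the trained weights over, so that single characters can be fed directly.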