Outputs of GlobalAveragePooling2D same dimensions as input (Keras)

When using a GlobalAveragePooling2D() layer in Keras, the dimension of the output is of course not the dimension of the input: for example, if the input dimension is (100, 100, 64), the output dimension is (1, 1, 64). But I would like the output to have the same dimension as the input. Two options would work for me: replicate the mean of every channel 100*100 times within that channel of the output, or place the result at one position in the 100x100 matrix of the corresponding output channel and put zeros at all other positions. Does someone have an idea how to do this?
Kind regards.

You can add a Lambda layer, which wraps an arbitrary function as a Layer object.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, Lambda

# Replacing every value in a channel with that channel's mean:
def lambda_layer(x):
    a = K.zeros_like(x) + K.mean(x, axis=[1, 2], keepdims=True)
    return a

model = Sequential()
model.add(Conv2D(16, 3, input_shape=(50, 50, 3)))
model.add(Lambda(lambda_layer))
'''
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 48, 48, 16)        448
_________________________________________________________________
lambda_3 (Lambda)            (None, 48, 48, 16)        0
=================================================================
'''
Similarly, you can place the mean at one location and keep zeros for the rest of the values, as sketched below.
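A minimal sketch of that second option (assumption: the mean is placed at position (0, 0) of each channel; the mask's spatial size must match the previous layer's output, 48x48 here):

import numpy as np

def mean_at_one_position(x):
    mean = K.mean(x, axis=[1, 2], keepdims=True)      # (batch, 1, 1, channels)
    mask = np.zeros((1, 48, 48, 1), dtype='float32')  # zeros everywhere...
    mask[0, 0, 0, 0] = 1.0                            # ...except position (0, 0)
    return mean * K.constant(mask)                    # broadcasts to (batch, 48, 48, channels)

model.add(Lambda(mean_at_one_position))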

Keras LSTM - Many to many with embedding layer

I am training an LSTM to predict event occurrences. For each day I have a vector like [1, 0, 1] to denote that the first and third events occurred, whereas the second one did not.
I want to extend this problem to work for multiple people, where each person has a distinct agent_id. This means that somehow I need to present my model with the agent_id as a feature. Although I'm not sure if this is the best way, I made the first entry of my vector the agent_id, so it looks like, for example, [123456, 1, 0, 1].
Now what the LSTM model does is, for each event, output a probability of it occurring on the next day. So how I see the input/output would be: [agent_id, did event 1 occur today?, did event 2 occur today?, did event 3 occur today?] -> LSTM -> [probability of event 1 occurring tomorrow, probability of event 2 occurring tomorrow, probability of event 3 occurring tomorrow]
Now the input has a longer length than the output. As far as I understood from the answer to this post https://stats.stackexchange.com/questions/305863/how-to-train-lstm-model-on-multiple-time-series-data , I need an embedding layer that can change the size of my input so that the LSTM gives me the desired output.
For this, I tried to do the following:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense
xin = Input(batch_shape=(batch_size, window_length), dtype='int32')
xemb = Embedding(x_traindict[123456].shape[2], x_traindict[123456].shape[2]-1)(xin) #from what I give in to what I want to get out # 3dim (batch,time,feat)
seq = LSTM(x_traindict[123456].shape[2]-1, return_sequences=True)(xemb)
mlp = TimeDistributed(Dense(y_traindict[123456].shape[1], activation='softmax'))(seq)
model = tf.keras.Model(inputs=xin, outputs=mlp)
model.compile(optimizer='Adam', loss='categorical_crossentropy')
print(f"batch size is {batch_size}, window_length = {window_length}, x_train.shape is {x_traindict[123456].shape} and y_train.shape is {y_traindictalt[123456].shape}")
model.summary()
model.fit(x_traindict[123456], y_traindict[123456], epochs=20)
------------------------------------------------------------------------------------------------
batch size is 358, window_length = 7, x_train.shape is (358, 7, 149) and y_train.shape is (358, 148)
Model: "model_7"
_________________________________________________________________
Layer (type)                  Output Shape             Param #
=================================================================
input_12 (InputLayer)         [(358, 7)]               0
embedding_14 (Embedding)      (358, 7, 148)            22052
lstm_16 (LSTM)                (358, 7, 148)            175824
time_distributed_11           (358, 7, 149)            22201
(TimeDistributed)
=================================================================
Total params: 220,077
Trainable params: 220,077
Non-trainable params: 0
_________________________________________________________________
My idea was that the Embedding would take the input from x_train, including the agent_id, and would learn to encode it to an input of the size of y_train, which does not include the agent_id. The LSTM would then learn to deal with what it receives from the embedding to correctly predict y_train. However, the code above gives me the following error:
ValueError: Exception encountered when calling layer "model_7" (type Functional).
Input 0 of layer "lstm_16" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 7, 149, 148)
I do not understand this error.
What I would thus like to ask is:
Does my idea even make sense? By implementing the agent_id directly with the events, can the LSTM learn the time series prediction for different agents?
How can I fix the error in my code? If it helps, I basically filled in the template from the first answer to this post: https://github.com/keras-team/keras/issues/2654
EDIT:
I have tried changing xin to xin = Input(batch_shape=(window_length,), dtype='int32'), but now I get a ValueError in the line where I define seq = ...:
Input 0 of layer "lstm_26" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (7, 133).
I also tried changing xin to xin = Input(batch_shape=(window_length, number_of_transactions+1), dtype='int32'), but this produces ValueError: Input 0 of layer "model_11" is incompatible with the layer: expected shape=(None, 134), found shape=(None, 7, 134)
Note: I had to take a new sample today; the 134 replaces the 149 events from above.
Yes, I think the idea is valid. By placing the agent_id as the first element in the sequence, the RNN will encode this information in the state, which is subsequently used to predict the probabilities for an event. One thing to watch out for is that the model will try to generate a prediction given just the first sequence element, the agent_id.
I think your issue is that you included the batch size in the Input call. The batch size is implied and doesn't need to be defined, so
xin = Input(batch_shape=(batch_size, window_length), dtype='int32')
should become
xin = Input(shape=(window_length,), dtype='int32')
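For reference, a minimal sketch of the full model with that change (assumptions: vocab_size covers all agent ids and event codes, and n_events = 148; both names are illustrative, not from the question):

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense

xin = Input(shape=(window_length,), dtype='int32')  # batch size stays implicit
xemb = Embedding(vocab_size, n_events)(xin)         # (batch, window_length, n_events)
seq = LSTM(n_events, return_sequences=True)(xemb)
mlp = TimeDistributed(Dense(n_events, activation='softmax'))(seq)
model = tf.keras.Model(inputs=xin, outputs=mlp)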

LSTM predicting constant value throughout

I understand that it is a long post, but help in any of the sections is appreciated.
I have some queries about the prediction method of my LSTM model. Here is a general summary of my approach:
I used a dataset with 50 time series for training. They start at a value of 1.09 and decay all the way down to 0.82, with each time series having between 570 and 2000 datapoints (i.e., each time series has a different length but a similar trend).
I converted them to the dataset format accepted by Keras' LSTM/Bi-LSTM layers (with a window of 5 timesteps, matching the shapes below):
[1, 0.99, 0.98, 0.97, 0.96] ==Output==> [0.95]
[0.99, 0.98, 0.97, 0.96, 0.95] ==Output==> [0.94]
and so on..
Shapes of the input and output containers (arrays): input (39832, 5, 1) and output (39832,)
Error-free training
Prediction starts from an initial window of data with shape (1, 5, 1), taken from the actual data.
The predicted output is one value, which is appended to a separate list (for plotting) as well as to the window, while the first value of the window is dropped. This window is then fed as input to the model to generate the next prediction point.
This continues until I get the whole curve for both models (LSTM and Bi-LSTM).
However, the prediction is not even close to the actual data. It flatlines to a fixed value, whereas it should look somewhat like the black curve (which is the actual data).
Figure: https://i.stack.imgur.com/Ofw7m.png
Model (similar code goes for Bi-LSTM model):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras import optimizers

model_lstm = Sequential()
model_lstm.add(LSTM(128, input_shape=(timesteps, 1), return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(128, return_sequences=False))
model_lstm.add(Dropout(0.2))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mean_squared_error', optimizer=optimizers.Adam(0.001))
Curve prediction initialize:
start = cell_to_test[0:timesteps].reshape(1, timesteps, 1)
y_curve_lstm = list(start.flatten())
y_window = start
Curve prediction:
while len(y_curve_lstm) <= len(cell_to_test):
    yhat = model_lstm.predict(y_window)
    yhat = float(yhat)
    y_curve_lstm.append(yhat)
    y_window = list(y_window.flatten())
    y_window.append(yhat)
    y_window.pop(0)  # drop the oldest value; remove(y_window[0]) would delete the first *matching* value
    y_window = np.array(y_window).reshape(1, timesteps, 1)
    # print(yhat)
Model summary:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_5 (LSTM) (None, 5, 128) 66560
_________________________________________________________________
dropout_5 (Dropout) (None, 5, 128) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_6 (Dropout) (None, 128) 0
_________________________________________________________________
dense_5 (Dense) (None, 1) 129
=================================================================
Total params: 198,273
Trainable params: 198,273
Non-trainable params: 0
_________________________________________________________________
And in addition to diagnosing the problem, I am really trying to find the answers to the following questions (I looked up other sources, but in vain):
Is my data enough to train the LSTM model? I have been told that it requires thousands of data points, so I feel that my current dataset more than satisfies that condition.
Is my model less/more complex than it needs to be?
Does increasing the number of epochs, layers, and neurons per layer always lead to a 'better' model, or are there optimal values for these? If the latter, is there a method to find this optimal point, or is trial and error the only way?
I trained with epochs=25, which gave me a loss of 1.25 × 10^-4. Should the loss be lower for the model to predict the trend? (I am focused on getting the shape right first and the accuracy later, because training takes too long with more epochs.)
In continuation of the previous question, does the loss have the same unit as the data? The reason I am asking is that the data has a resolution of up to 10^-7.
Once again, I understand that it has been a long post, but help in any of the sections is appreciated.

Keras model.summary function displays inconsistent output format

I am studying the ins and outs of Keras. So, in this respect, I was checking the model.summary() function.
I was using a simple image classification example provided by Keras itself and loaded the various pretrained models provided (Xception, VGG16, etc.).
I checked each model architecture using model.summary() as mentioned. Then I noticed that for some reason the Connected to column (the 4th column, that is) is not present in every model summary. For example, for MobileNetV2 I get (just the first few lines are shown):
__________________________________________________________________________________________________
Layer (type)                    Output Shape          Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 224, 224, 3)   0
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D)       (None, 225, 225, 3)   0           input_1[0][0]
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 112, 112, 32)  864         Conv1_pad[0][0]
but for MobileNet I get:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
conv1_pad (ZeroPadding2D)    (None, 225, 225, 3)       0
_________________________________________________________________
conv1 (Conv2D)               (None, 112, 112, 32)      864
This output is produced without taking any extra action after loading the model (no training, no inference, etc.).
This seems odd and I am not sure what's going on here. For example, creating the simple model from this question here (up to the model0.fit(...) part) and running model0.summary() also gives me a summary without the Connected to column, contrary to the summary posted in that question.
So, why does the output change? What's the deal with model.summary()? Do we have some control over the output (although the examples above do not imply that)? Or does the output have to do with the way a model was structured?
Edit:
I added the (trivial) code used to reproduce the summaries of both models, as requested in a comment.
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.applications.mobilenet import MobileNet
model1 = MobileNetV2(weights='imagenet')
print(model1.summary())
model2 = MobileNet(weights='imagenet')
print(model2.summary())
Also, my system uses Keras 2.2.4, TensorFlow 1.12.0, and Ubuntu 16.04, if this info is useful somehow.
I suppose the reason is that MobileNetV2 was implemented as a keras.Model, while MobileNet is a keras.Sequential.
Both Model and Sequential have a summary method. When run, it invokes the print_summary method, which acts differently for sequential-like and non-sequential models:
if sequential_like:
    line_length = line_length or 65
    positions = positions or [.45, .85, 1.]
    if positions[-1] <= 1:
        positions = [int(line_length * p) for p in positions]
    # header names for the different log elements
    to_display = ['Layer (type)', 'Output Shape', 'Param #']
else:
    line_length = line_length or 98
    positions = positions or [.33, .55, .67, 1.]
    if positions[-1] <= 1:
        positions = [int(line_length * p) for p in positions]
    # header names for the different log elements
    to_display = ['Layer (type)',
                  'Output Shape',
                  'Param #',
                  'Connected to']
    relevant_nodes = []
    for v in model._nodes_by_depth.values():
        relevant_nodes += v
(link). As you can see, it just doesn't print 'Connected to' for a sequential-like model.
I guess the reason is that a Sequential model doesn't allow connecting layers in non-sequential order, so they are just connected one by one.
Also, it checks the model type via model.__class__.__name__ == 'Sequential' (link). I doubt it's a good idea to try to change it "on the fly" to obtain a different output.
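If you want to check this yourself, a quick sketch (reusing model1 and model2 from the question; the class name is what the quoted check looks at):

print(model1.__class__.__name__)  # expected 'Model' for MobileNetV2
print(model2.__class__.__name__)  # expected 'Sequential' if MobileNet is indeed built that way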

Keras SimpleRNN confusion

...coming from TensorFlow, where pretty much every shape is defined explicitly, I am confused by Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras refuses to accept what I consider the correct shapes...
For example:
x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)
...works perfectly and according to model.summary() I get an input layer with shape (None, 2), followed by a dense layer with output shape (None, 10). Makes sense since Keras automatically adds the first dimension for batch processing.
However, the following code:
x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
raises an exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.
It works only if I add another dimension:
x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
...but now, of course, my input would not be (None, 2) anymore.
model.summary():
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 2, 1)              0
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 10)                120
=================================================================
How can I have an input of type batch_size x 2 when I just want to feed vectors with 2 values to the network?
Furthermore, how would I chain RNN cells?
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...raises the same exception with incompatible dim sizes.
This sample here works:
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...but then layer h does not output (None, 10) anymore but (None, 2, 10), since it returns the whole sequence instead of just the "regular" RNN cell output.
Why is this needed at all?
Moreover: where are the states? Do they just default to 1 recurrent state?
The documentation touches on the expected shapes of recurrent components in Keras; let's look at your case:
Any RNN layer in Keras expects a 3D shape (batch_size, timesteps, features). This means you have time-series data.
The RNN layer then iterates over the second, time dimension of the input using a recurrent cell, the actual recurrent computation.
If you specify return_sequences, then you collect the output for every timestep, getting another 3D tensor (batch_size, timesteps, units); otherwise you only get the last output, which is (batch_size, units). The sketch below verifies both shapes.
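A quick way to verify those shapes, as a sketch (assuming the same alias as in the question, import keras as k):

import keras as k

x = k.layers.Input(shape=(2, 1))  # (batch_size, timesteps=2, features=1)
seq_out = k.layers.SimpleRNN(10, return_sequences=True)(x)
last_out = k.layers.SimpleRNN(10)(x)
print(k.models.Model(x, seq_out).output_shape)   # (None, 2, 10)
print(k.models.Model(x, last_out).output_shape)  # (None, 10)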
Now returning to your questions:
You mention vectors, but shape=(2,) is a single vector, so this doesn't work. shape=(2, 1) works because now you have 2 vectors of size 1; these shapes exclude batch_size. So to feed vectors of size 2 you need shape=(how_many_vectors, 2), where the first dimension is the number of vectors you want your RNN to process, the timesteps in this case.
To chain RNN layers you need to feed 3D data, because that's what RNNs expect. When you specify return_sequences, the RNN layer returns the output at every timestep, so it can be chained to another RNN layer.
States are a collection of vectors that an RNN cell uses; an LSTM uses 2, a GRU has 1 hidden state, which is also the output. They default to zeros, but can be specified when calling the layer using initial_state=[...] as a list of tensors (see the sketch below).
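A minimal sketch combining both points (sizes are illustrative; the initial state is fed as an extra model input, continuing the k alias from above):

x = k.layers.Input(shape=(2, 1))   # 2 timesteps, 1 feature
h0 = k.layers.Input(shape=(10,))   # initial hidden state for the first RNN
h = k.layers.SimpleRNN(10, return_sequences=True)(x, initial_state=[h0])
y = k.layers.SimpleRNN(10)(h)      # consumes the (None, 2, 10) sequence
m = k.models.Model([x, h0], y)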
There is already a post about the difference between RNN layers and RNN cells in Keras which might help clarify the situation further.

Keras dimensionality in convolutional layer mismatch

I'm trying to play around with Keras to build my first neural network. I have zero experience and can't seem to figure out why my dimensionality isn't right. I can't figure out from their docs what this error is complaining about, or even which layer is causing it.
My model takes in a 32-byte array of numbers and is supposed to give a boolean value on the other side. I want a 1D convolution over the input byte array.
arr1 is the 32-byte array, arr2 is an array of booleans.
inputData = np.array(arr1)
inputData = np.expand_dims(inputData, axis = 2)
labelData = np.array(arr2)
print inputData.shape
print labelData.shape
model = k.models.Sequential()
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.core.Dense(32))
model.add(k.layers.Activation('sigmoid'))
model.compile(loss = 'binary_crossentropy',
              optimizer = 'rmsprop',
              metrics = ['accuracy'])
model.fit(inputData, labelData)
The output of the shape prints is:
(1000, 32, 1) and (1000,)
The error I receive is:
Traceback (most recent call last):
  File "cnn/init.py", line 50, in <module>
    inputData,labelData
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/models.py", line 863, in fit
    initial_epoch=initial_epoch)
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py", line 1358, in fit
    batch_size=batch_size)
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py", line 1238, in _standardize_user_data
    exception_prefix='target')
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py", line 128, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking target: expected activation_5 to have 3 dimensions, but got array with shape (1000, 1)
Well, it seems to me that you need to google a bit more about convolutional networks :-)
You are applying 32 filters of length 2 over your sequence at each step. So if we follow the dimensions of the tensors after each layer:
Dimensions : (None, 32, 1)
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 31, 32)
(your filter of length 2 goes over the whole sequence so the sequence is now of length 31)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 30, 32)
(you lose again one value because of your filters of length 2, but you still have 32 of them)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 29, 32)
(same...)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 28, 32)
Now you want to use a Dense layer on top of that... the thing is that the Dense layer will work as follows on your 3D input:
model.add(k.layers.core.Dense(32))
model.add(k.layers.Activation('sigmoid'))
Dimensions : (None, 28, 32)
This is your output. The first thing I find weird is that you want 32 outputs out of your dense layer... You should have put 1 instead of 32. But even this will not fix your problem. See what happens if we change the last layer:
model.add(k.layers.core.Dense(1))
model.add(k.layers.Activation('sigmoid'))
Dimensions : (None, 28, 1)
This happens because you apply a dense layer to a '2D' tensor (per sample). When you apply a Dense(1) layer to an input of shape [28, 32], it produces a weight matrix of shape (32, 1) that is applied to each of the 28 vectors, so you end up with 28 outputs of size 1. A small demonstration follows.
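To see that behavior in isolation, a small sketch (Keras 2; Dense applies its (32, 1) weight matrix along the last axis, reusing the k alias from the question):

demo = k.models.Sequential()
demo.add(k.layers.core.Dense(1, input_shape=(28, 32)))
print(demo.output_shape)  # (None, 28, 1)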
What I propose to fix this is to change the last 2 layers like this :
model = k.models.Sequential()
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
# Only use one filter so that the output will be a sequence of 28 values, not a matrix.
model.add(k.layers.convolutional.Convolution1D(1,2))
model.add(k.layers.Activation('relu'))
# Change the shape from (None, 28, 1) to (None, 28)
model.add(k.layers.core.Flatten())
# Only one neuron as output to get the binary target.
model.add(k.layers.core.Dense(1))
model.add(k.layers.Activation('sigmoid'))
Now the last three layers take your tensor from
(None, 29, 32) -> (None, 28, 1) -> (None, 28) -> (None, 1)
I hope this helps you.
P.S. If you were wondering what None is: it's the batch dimension. You don't feed the 1000 samples at once, you feed them batch by batch, and since the value depends on what is chosen, by convention we put None.
EDIT :
Explaining a bit more why the sequence length loses one value at each step.
Say you have a sequence of 4 values [x1 x2 x3 x4] and you want to use your filter of length 2, [f1 f2], to convolve over the sequence. The first value will be given by y1 = [f1 f2] * [x1 x2], the second will be y2 = [f1 f2] * [x2 x3], and the third will be y3 = [f1 f2] * [x3 x4]. Then you have reached the end of your sequence and cannot go further. As a result you have a sequence [y1 y2 y3], as sketched in numpy below.
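The same arithmetic in numpy, as a sketch (np.correlate's 'valid' mode matches what the convolution layer computes at the borders):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # [x1 x2 x3 x4]
f = np.array([0.5, -0.5])             # [f1 f2]
y = np.correlate(x, f, mode='valid')  # [y1 y2 y3]: one value shorter than x
print(len(y))                         # 3 outputs from 4 inputs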
This is due to the filter length and the effects at the borders of your sequence. There are multiple options; some pad the sequence with 0's in order to get exactly the same output length... You can choose that option with the parameter 'padding'. You can read more about this here and find the different possible values for the padding argument here. I encourage you to read this last link; it gives information about input and output shapes...
From the doc :
padding: One of "valid" or "same" (case-insensitive). "valid" means "no padding". "same" results in padding the input such that the output has the same length as the original input.
the default is 'valid', so you don't pad in your example.
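A quick check of the two modes, as a sketch (using the newer Conv1D name):

from keras.models import Sequential
from keras.layers import Conv1D

m_valid = Sequential([Conv1D(32, 2, padding='valid', input_shape=(32, 1))])
m_same = Sequential([Conv1D(32, 2, padding='same', input_shape=(32, 1))])
print(m_valid.output_shape)  # (None, 31, 32): one value lost at the border
print(m_same.output_shape)   # (None, 32, 32): zero-padded to keep the length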
I also recommend you upgrade your Keras version to the latest: Convolution1D is now Conv1D, so you might otherwise find the docs and tutorials confusing.
