I have two Keras (Tensorflow backend) models, which are stacked to make a combined model:
small_model with In: (None,K), Out: (None,K)
large_model with In: (None,N,K), Out: (None,1)
combined_model (N x small_model -> large_model) with In: (None,N,K), Out: (None,1)
large_model needs N stacked outputs from small_model as input.
I can define N small_models, which share weights, then concatenate their outputs (technically, I need to stack them), and then send that to large_model, as in the code below.
My problem is that I need to be able to do this for very large N (> 10**6), and that my current solution uses a lot of memory and time when creating the models, even for N ~ 10**2.
I'm hoping that there is a solution which sends the N data points through small_model in parallel (like what is done when giving a batch to a model), collects those points (with the Keras history, so that backprop is possible) and sends that to large_model, without having to define the N instances of small_model. The listed input and output shapes for the three models should not change, but other intermediate models can of course be defined.
Thank you.
Current unsatisfactory solution (assume that small_model and large_model already exist, and that N,K are defined):
from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K
def build_small_model_on_batch():
    def distribute_inputs_to_small_model(input):
        return [small_model(input[:, i]) for i in range(N)]

    def stacker(list_of_tensors):
        return K.stack(list_of_tensors, axis=1)

    input = Input(shape=(N, K,))
    small_model_outputs = Lambda(distribute_inputs_to_small_model)(input)
    stacked_small_model_outputs = Lambda(stacker)(small_model_outputs)
    return Model(input, stacked_small_model_outputs)

def build_combined():
    input = Input(shape=(N, K,))
    stacked_small_model_outputs = small_model_on_batch(input)
    output = large_model(stacked_small_model_outputs)
    return Model(input, output)
small_model_on_batch = build_small_model_on_batch()
combined = build_combined()
You can do that with a TimeDistributed layer wrapper:
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Sequential, Model
N = None # Use fixed value if you do not want variable input size
K = 20
def small_model():
    inputs = Input(shape=(K,))
    # Define the small model
    # Here it is just a single dense layer
    outputs = Dense(K, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=outputs)

def large_model():
    inputs = Input(shape=(N, K))
    # Define the large model
    # Just a single neuron here
    outputs = Dense(1, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=outputs)

def combined_model():
    inputs = Input(shape=(N, K))
    # The TimeDistributed layer applies the given model
    # to every input across dimension 1 (N)
    small_model_out = TimeDistributed(small_model())(inputs)
    # Apply large model
    outputs = large_model()(small_model_out)
    return Model(inputs=inputs, outputs=outputs)
model = combined_model()
model.compile(loss='mean_squared_error', optimizer='sgd')
model.summary()
Output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, 20) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 20) 420
_________________________________________________________________
model_2 (Model) (None, None, 1) 21
=================================================================
Total params: 441
Trainable params: 441
Non-trainable params: 0
_________________________________________________________________
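Because N is left as None, the compiled model accepts a different N per batch. A minimal training sketch on random data (the array shapes are my own illustration):

import numpy as np

# Each batch may use a different N, since the input shape is (None, K)
x_a = np.random.random((32, 100, 20))  # batch of 32 with N=100, K=20
y_a = np.random.random((32, 100, 1))   # matching targets, shape (batch, N, 1)
x_b = np.random.random((32, 500, 20))  # same model, now N=500
y_b = np.random.random((32, 500, 1))

model.train_on_batch(x_a, y_a)
model.train_on_batch(x_b, y_b)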
Related
I have a dir with NumPy array files: bias1.npy, kernel1.npy, bias2.npy, kernel2.npy. How can I build a TF model that uses those arrays as kernels and biases of layers?
To avoid confusion: for consistency of the numpy files, each bias is stored as a 2D matrix with a single row (e.g. shape (1, 80)). This post shows how I reproduced a TF model from the numpy weights and biases.
from pathlib import Path

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input

class NumpyInitializer(tf.keras.initializers.Initializer):
    # custom class converting numpy arrays to tf's initializers
    # used to initialize both kernel and bias
    def __init__(self, array):
        # convert numpy array into tensor
        self.array = tf.convert_to_tensor(array.tolist())

    def __call__(self, shape, dtype=None):
        # return tensor
        return self.array

def restore_model_from_numpy(directory):
    """
    Recreate model from the numpy files.
    Numpy files in the directory are ordered by layers
    and the bias numpy matrix comes before the numpy weight matrix.
    For example:
    directory
     - L1B.npy  # numpy bias matrix for layer 1
     - L1W.npy  # numpy weights matrix for layer 1
     - L2B.npy  # numpy bias matrix for layer 2
     - L2W.npy  # numpy weights matrix for layer 2
    Parameters:
        directory - path to the directory with numpy files
    Return:
        tf's model recreated from the numpy files
    """
    def file_iterating(directory):
        """
        Iterate over the directory and create a dictionary
        mapping layer number to its structure:
        layers[layer_number] = [numpy_bias_matrix, numpy_weight_matrix]
        """
        pathlist = Path(directory).rglob("*.npy")  # list of numpy files
        layers = {}  # initialize dictionary
        index = 0
        for file in pathlist:  # iterate over files in the directory
            if index % 2 == 0:
                layers[int(index / 2)] = []  # next layer - new key in dictionary
            layers[int(index / 2)].append(np.load(file))  # add bias or weight matrix to dictionary
            index += 1
            print(file)  # optional, to show the list of files we deal with
        return layers  # return dictionary

    layers = file_iterating(directory)  # get dictionary with model structure
    inputs = Input(shape=(np.shape(layers[0][1])[0],))  # create first model input layer
    x = inputs
    for key, value in layers.items():  # iterate over all layers in the layers dictionary
        bias_initializer = NumpyInitializer(layers[key][0][0])  # create bias initializer for key's layer
        kernel_initializer = NumpyInitializer(layers[key][1])  # create weights initializer for key's layer
        layer_size = np.shape(layers[key][0])[-1]  # get the size of the layer
        new_layer = tf.keras.layers.Dense(  # initialize new Dense layer
            units=layer_size,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            activation="tanh")
        x = new_layer(x)  # stack layer on top of the previous layer
    model = tf.keras.Model(inputs, x)  # create tf's model from the stacked layers
    model.compile()  # compile model
    return model  # return compiled model
In my directory, I had 4 numpy files (layer 1 - L1 and layer 2 - L2):
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L1B.npy , shape: (1, 80)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L1W.npy , shape: (100, 80)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L2B.npy , shape: (1, 100)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L2W.npy , shape: (80, 100)
Calling the function results in:
m = restore_model_from_numpy(my_numpy_files_directory)
m.summary()
Model: "model_592"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_312 (InputLayer) [(None, 100)] 0
_________________________________________________________________
dense_137 (Dense) (None, 80) 8080
_________________________________________________________________
dense_138 (Dense) (None, 100) 8100
=================================================================
Total params: 16,180
Trainable params: 16,180
Non-trainable params: 0
_________________________________________________________________
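As a quick sanity check (a sketch; the file paths are placeholders for the ones listed above), you can compare the restored weights against the saved numpy arrays:

import numpy as np

m = restore_model_from_numpy(my_numpy_files_directory)

# Each Dense layer stores [kernel, bias]; layers[0] is the Input layer
kernel1, bias1 = m.layers[1].get_weights()
assert np.allclose(kernel1, np.load("L1W.npy"))   # kernel, shape (100, 80)
assert np.allclose(bias1, np.load("L1B.npy")[0])  # bias row, shape (80,)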
I hope that this post will be helpful to anyone as it's my first one.
Happy coding :D
I want to obtain the output of intermediate sub-model layers with tf2.keras. Here is a model composed of two sub-modules:
input_shape = (100, 100, 3)
def model1():
    input = tf.keras.layers.Input(input_shape)
    cov = tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, name='cov1')(input)
    embedding_model = tf.keras.Model(input, cov, name='model1')
    return embedding_model

def model2(embedding_model):
    input_sequence = tf.keras.layers.Input((None,) + input_shape)
    sequence_embedding = tf.keras.layers.TimeDistributed(embedding_model, name='time_dis1')
    emb = sequence_embedding(input_sequence)
    att = tf.keras.layers.Attention()([emb, emb])
    dense1 = tf.keras.layers.Dense(64, name='dense1')(att)
    outputs = tf.keras.layers.Softmax()(dense1)
    final_model = tf.keras.Model(inputs=input_sequence, outputs=outputs, name='model2')
    return final_model

embedding_model = model1()
model2 = model2(embedding_model)
print(model2.summary())
output:
Model: "model2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, None, 100, 1 0
__________________________________________________________________________________________________
time_dis1 (TimeDistributed) (None, None, 98, 98, 896 input_2[0][0]
__________________________________________________________________________________________________
attention (Attention) (None, None, 98, 98, 0 time_dis1[0][0]
time_dis1[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, None, 98, 98, 2112 attention[0][0]
__________________________________________________________________________________________________
softmax (Softmax) (None, None, 98, 98, 0 dense1[0][0]
==================================================================================================
Total params: 3,008
Trainable params: 3,008
Non-trainable params: 0
Then I want to get the outputs of intermediate layers of model1 and model2:
model1_output_layer = model2.get_layer('time_dis1').layer.get_layer('cov1')
output1 = model1_output_layer.get_output_at(0)
output2 = model2.get_layer('dense1').get_output_at(0)
output_tensors = [output1,output2]
model2_input = model2.input
submodel = tf.keras.Model([model2_input],output_tensors)
input_data2 = np.zeros((1,10,100,100,3))
result = submodel.predict([input_data2])
print(result)
Running on tf2.3, the error I am getting is:
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py", line 115, in __init__
self._init_graph_network(inputs, outputs)
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py", line 191, in _init_graph_network
self.inputs, self.outputs)
File "/Users/bouluoyu/anaconda/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py", line 931, in _map_graph_network
str(layers_with_complete_input))
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(None, 100, 100, 3), dtype=float32) at layer "cov1". The following previous layers were accessed without issue: ['time_dis1', 'attention', 'dense1']
But the following code works:
model1_input = embedding_model.input
model2_input = model2.input
submodel = tf.keras.Model([model1_input,model2_input],output_tensors)
input_data1 = np.zeros((1,100,100,3))
input_data2 = np.zeros((1,10,100,100,3))
result = submodel.predict([input_data1,input_data2])
print(result)
But this is not what I want. This is strange: model1 is part of model2, so why do we need to input an extra tensor? Sometimes it is hard to get an extra tensor, especially for complex models.
so why do we need to input an extra tensor
Short answer is, TensorFlow doesn't know to make the connection between the inputs you expect it to make. The problem arises because you're passing a Model (instead of a Layer) to your TimeDistributed layer. This leaves the Input layer of your model1 hanging, unless you explicitly pass it an input. The TimeDistributed layer is not smart enough to handle models in this way.
My solution would depend on the answer to the following question,
Why do you need model1? All it has is a Conv2D layer. You can easily do
sequence_embedding = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, name='cov1'),
    name='time_dis1'
)
If you do this, now you gotta change the following lines,
model1_output_layer = model2.get_layer('time_dis1').layer.get_layer('cov1')
output1 = model1_output_layer.get_output_at(0)
to something like (the exact output you want will depend on what you're actually after)
model1_output_layer = model2.get_layer('time_dis1')
output1 = model1_output_layer.output
# This output1 may need further processing depending on what you need,
# e.g. if you need mean embeddings over the time axis:
# (tf.keras.layers.Average averages a list of tensors and takes no axis
# argument, so use a Lambda around tf.reduce_mean instead)
output_mean = tf.keras.layers.Lambda(lambda x: tf.reduce_mean(x, axis=1))(output1)
This is because you can't access the output of a layer nested inside a TimeDistributed layer. The layer passed to the TimeDistributed layer doesn't actually do anything on its own and has no defined output; it just sits there as a template that the TimeDistributed layer uses to compute its output. So, to get the output from a TimeDistributed layer, you need to access it via that layer.
If you try to do it your way (instead of mine), you'll get,
AttributeError: Layer cov1 has no inbound nodes.
You may ask, "why did it work before"?
It's because, before, you had a Model there instead of a Layer. Because the Conv2D layer was wrapped by the model, its output was defined (the model had an Input layer). And this feeds back to the reason why it complained about the missing input from model1 when trying to define the submodel.
I know this explanation may make your head spin as the reasons behind this error are quite convoluted. But going through it a few times will hopefully help.
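Putting it together, a minimal end-to-end sketch (under the same assumption that model1 is really just the Conv2D layer) that builds model2 and extracts both intermediate outputs:

import numpy as np
import tensorflow as tf

input_shape = (100, 100, 3)

input_sequence = tf.keras.layers.Input((None,) + input_shape)
sequence_embedding = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, name='cov1'),
    name='time_dis1'
)
emb = sequence_embedding(input_sequence)
att = tf.keras.layers.Attention()([emb, emb])
dense1 = tf.keras.layers.Dense(64, name='dense1')(att)
outputs = tf.keras.layers.Softmax()(dense1)
model2 = tf.keras.Model(inputs=input_sequence, outputs=outputs, name='model2')

# Both intermediate outputs are now reachable from model2's own graph
output1 = model2.get_layer('time_dis1').output
output2 = model2.get_layer('dense1').output
submodel = tf.keras.Model(model2.input, [output1, output2])

result = submodel.predict(np.zeros((1, 10, 100, 100, 3)))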
How to pick one execution flow at random, out of several alternatives, in a trainable fashion? For example:
import random
from tensorflow import keras
class RandomModel(keras.Model):
    def __init__(self, model_set):
        super(RandomModel, self).__init__()
        self.models = model_set

    def call(self, inputs):
        """Calls one of its models at random"""
        return random.sample(self.models, 1)[0](inputs)

def new_model():
    return keras.Sequential([
        keras.layers.Dense(10, activation='softmax')
    ])

model = RandomModel({new_model(), new_model()})
model.build(input_shape=(32, 784))
model.summary()
While this code runs, it doesn't seem to allow gradients to backpropagate. This is its output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
I found a way to do this. However, execution is slow, because of nested tf.cond operations:
def random_network_applied_to_inputs(inputs, networks):
    """
    Returns a tf.cond tree that does binary search
    before applying a network to the inputs.
    """
    length = len(networks)
    index = tf.random.uniform(
        shape=[],
        minval=0,
        maxval=length,
        dtype=tf.dtypes.int32
    )

    def branch(lower_bound, upper_bound):
        if lower_bound + 1 == upper_bound:
            return networks[lower_bound](inputs)
        else:
            center = (lower_bound + upper_bound) // 2
            return tf.cond(
                pred=index < center,
                true_fn=lambda: branch(lower_bound, center),
                false_fn=lambda: branch(center, upper_bound)
            )

    return branch(0, length)
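For completeness, here is a sketch (my own, not from the original answer) of wiring the tf.cond tree into a trainable model. Note that the original RandomModel stored its submodels in a Python set; Keras auto-tracks variables held in lists, tuples, and dicts, but to my knowledge not in sets, which is likely why the summary reported zero parameters:

import tensorflow as tf
from tensorflow import keras

class RandomChoiceModel(keras.Model):
    def __init__(self, networks):
        super(RandomChoiceModel, self).__init__()
        # A list (not a set), so the submodels' variables are tracked
        self.networks = list(networks)

    def call(self, inputs):
        # One network is chosen per call via the tf.cond binary-search tree
        return random_network_applied_to_inputs(inputs, self.networks)

networks = [new_model(), new_model(), new_model()]
for net in networks:
    net.build(input_shape=(None, 784))  # create variables before tracing tf.cond

model = RandomChoiceModel(networks)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.build(input_shape=(32, 784))
model.summary()  # should now report the submodels' parameters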
I have trained a binary classification model with CNN, and here is my code
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
border_mode='valid',
input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (8, 8, 64) = (2048)
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2)) # define a binary classification problem
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adadelta',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
verbose=1,
validation_data=(x_test, y_test))
And here I want to get the output of each layer, just like in TensorFlow. How can I do that?
You can easily get the outputs of any layer by using: model.layers[index].output
For all layers use this:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test, 1.]) for func in functors]
print(layer_outs)
Note: To simulate dropout, use learning_phase as 1. in layer_outs; otherwise use 0.
Edit: (based on comments)
K.function creates theano/tensorflow tensor functions, which are later used to get the output from the symbolic graph given the input.
Now K.learning_phase() is required as an input because many Keras layers like Dropout/BatchNormalization depend on it to change behavior between training and test time.
So if you remove the dropout layer in your code you can simply use:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test]) for func in functors]
print(layer_outs)
Edit 2: More optimized
I just realized that the previous answer is not that optimized: for each function evaluation, the data is transferred CPU->GPU, and the tensor calculations for the lower layers are done over and over.
Instead, this is a much better way, as you don't need multiple functions: a single function gives you the list of all outputs:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
From https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ... # include here your original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
[model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Based on all the good answers of this thread, I wrote a library to fetch the output of each layer. It abstracts all the complexity and has been designed to be as user-friendly as possible:
https://github.com/philipperemy/keract
It handles almost all the edge cases.
Hope it helps!
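A usage sketch based on keract's README (the exact signature may vary between versions):

import numpy as np
from keract import get_activations

# model: your compiled Keras model; x: a matching input batch
x = np.random.random((1, 32, 32, 3))
activations = get_activations(model, x)  # dict: layer name -> numpy array
for name, act in activations.items():
    print(name, act.shape)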
Following looks very simple to me:
model.layers[idx].output
Above is a tensor object, so you can modify it using operations that can be applied to a tensor object.
For example, to get the shape: model.layers[idx].output.get_shape()
idx is the index of the layer and you can find it from model.summary()
This answer is based on: https://stackoverflow.com/a/59557567/2585501
To print the output of a single layer:
from tensorflow.keras import backend as K
layerIndex = 1
func = K.function([model.get_layer(index=0).input], model.get_layer(index=layerIndex).output)
layerOutput = func([input_data]) # input_data is a numpy array
print(layerOutput)
To print the output of every layer:
from tensorflow.keras import backend as K

for layerIndex, layer in enumerate(model.layers):
    func = K.function([model.get_layer(index=0).input], layer.output)
    layerOutput = func([input_data])  # input_data is a numpy array
    print(layerOutput)
I wrote this function for myself (in Jupyter) and it was inspired by indraforyou's answer. It will plot all the layer outputs automatically. Your images must have a (x, y, 1) shape where 1 stands for 1 channel. You just call plot_layer_outputs(...) to plot.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from keras import backend as K

def get_layer_outputs():
    test_image = ...  # YOUR IMAGE GOES HERE, shape (x, y, 1)
    outputs = [layer.output for layer in model.layers]  # all layer outputs
    comp_graph = [K.function([model.input] + [K.learning_phase()], [output]) for output in outputs]  # evaluation functions

    # Testing
    layer_outputs_list = [op([test_image, 1.]) for op in comp_graph]
    layer_outputs = []

    for layer_output in layer_outputs_list:
        print(layer_output[0][0].shape, end='\n-------------------\n')
        layer_outputs.append(layer_output[0][0])

    return layer_outputs

def plot_layer_outputs(layer_number):
    layer_outputs = get_layer_outputs()

    x_max = layer_outputs[layer_number].shape[0]
    y_max = layer_outputs[layer_number].shape[1]
    n = layer_outputs[layer_number].shape[2]

    L = []
    for i in range(n):
        L.append(np.zeros((x_max, y_max)))

    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                L[i][x][y] = layer_outputs[layer_number][x][y][i]

    for img in L:
        plt.figure()
        plt.imshow(img, interpolation='nearest')
From: https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py
import keras.backend as K

def get_activations(model, model_inputs, print_shape_only=False, layer_name=None):
    print('----- activations -----')
    activations = []
    inp = model.input

    model_multi_inputs_cond = True
    if not isinstance(inp, list):
        # only one input! let's wrap it in a list.
        inp = [inp]
        model_multi_inputs_cond = False

    outputs = [layer.output for layer in model.layers if
               layer.name == layer_name or layer_name is None]  # all layer outputs

    funcs = [K.function(inp + [K.learning_phase()], [out]) for out in outputs]  # evaluation functions

    if model_multi_inputs_cond:
        list_inputs = []
        list_inputs.extend(model_inputs)
        list_inputs.append(0.)
    else:
        list_inputs = [model_inputs, 0.]

    # Learning phase. 0 = Test mode (no dropout or batch normalization)
    # layer_outputs = [func([model_inputs, 0.])[0] for func in funcs]
    layer_outputs = [func(list_inputs)[0] for func in funcs]
    for layer_activations in layer_outputs:
        activations.append(layer_activations)
        if print_shape_only:
            print(layer_activations.shape)
        else:
            print(layer_activations)
    return activations
Previous solutions were not working for me. I handled this issue as shown below.
layer_outputs = []
for i in range(1, len(model.layers)):
    tmp_model = Model(model.layers[0].input, model.layers[i].output)
    tmp_output = tmp_model.predict(img)[0]
    layer_outputs.append(tmp_output)
Wanted to add this as a comment (but don't have high enough rep.) to @indraforyou's answer, to correct the issue mentioned in @mathtick's comment. To avoid the InvalidArgumentError: input_X:Y is both fed and fetched exception, simply replace the line outputs = [layer.output for layer in model.layers] with outputs = [layer.output for layer in model.layers][1:], i.e.
adapting indraforyou's minimal working example:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers][1:] # all layer outputs except first (input) layer
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
p.s. my attempts trying things such as outputs = [layer.output for layer in model.layers[1:]] did not work.
Assuming you have:
1- A Keras pre-trained model.
2- An input x as an image or a set of images. The image resolution should be compatible with the dimensions of the input layer. For example, 80*80*3 for a 3-channel (RGB) image.
3- The name of the output layer whose activations you want, for example the "flatten_2" layer. This should be included in the layer_names variable, which holds the names of the layers of the given model.
4- batch_size, which is an optional argument.
Then you can easily use the get_activations function to get the activations of the output layer for a given input x and pre-trained model:
import six
import numpy as np
import keras.backend as k
from numpy import float32

def get_activations(x, model, layer, batch_size=128):
    """
    Return the output of the specified layer for input `x`. `layer` is specified by layer index (between 0 and
    `nb_layers - 1`) or by name. The number of layers can be determined by counting the results returned by
    calling `layer_names`.

    :param x: Input for computing the activations.
    :type x: `np.ndarray`. Example: x.shape = (80, 80, 3)
    :param model: pre-trained Keras model. Including weights.
    :type model: keras.engine.sequential.Sequential. Example: model.input_shape = (None, 80, 80, 3)
    :param layer: Layer for computing the activations
    :type layer: `int` or `str`. Example: layer = 'flatten_2'
    :param batch_size: Size of batches.
    :type batch_size: `int`
    :return: The output of `layer`, where the first dimension is the batch size corresponding to `x`.
    :rtype: `np.ndarray`. Example: activations.shape = (1, 2000)
    """
    layer_names = [layer.name for layer in model.layers]
    if isinstance(layer, six.string_types):
        if layer not in layer_names:
            raise ValueError('Layer name %s is not part of the graph.' % layer)
        layer_name = layer
    elif isinstance(layer, int):
        if layer < 0 or layer >= len(layer_names):
            raise ValueError('Layer index %d is outside of range (0 to %d included).'
                             % (layer, len(layer_names) - 1))
        layer_name = layer_names[layer]
    else:
        raise TypeError('Layer must be of type `str` or `int`.')

    layer_output = model.get_layer(layer_name).output
    layer_input = model.input
    output_func = k.function([layer_input], [layer_output])

    # Apply preprocessing
    if x.shape == k.int_shape(model.input)[1:]:
        x_preproc = np.expand_dims(x, 0)
    else:
        x_preproc = x
    assert len(x_preproc.shape) == 4

    # Determine shape of expected output and prepare array
    output_shape = output_func([x_preproc[0][None, ...]])[0].shape
    activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=float32)

    # Get activations with batching
    for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
        begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
        activations[begin:end] = output_func([x_preproc[begin:end]])[0]

    return activations
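A quick usage sketch, with shapes following the docstring's example (the model and layer name are assumptions):

import numpy as np

# model: a pre-trained Keras model with model.input_shape == (None, 80, 80, 3)
x = np.random.random((80, 80, 3)).astype(np.float32)  # a single image
acts = get_activations(x, model, layer='flatten_2')
print(acts.shape)  # e.g. (1, 2000), as in the docstring

# The layer can also be addressed by index, with batching over many images
xs = np.random.random((300, 80, 80, 3)).astype(np.float32)
acts_by_index = get_activations(xs, model, layer=3, batch_size=128)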
In case you have one of the following cases:
error: InvalidArgumentError: input_X:Y is both fed and fetched
case of multiple inputs
You need to make the following changes:
add a filter for the input layers in the outputs variable
a minor change in the functors loop
Minimum example:
from keras.engine.input_layer import InputLayer

inp = model.input
outputs = [layer.output for layer in model.layers if not isinstance(layer, InputLayer)]
functors = [K.function(inp + [K.learning_phase()], [x]) for x in outputs]
layer_outputs = [fun([x1, x2, xn, 1]) for fun in functors]  # x1, x2, ..., xn are the model's input arrays
Well, other answers are very complete, but there is a very basic way to "see", not to "get" the shapes.
Just do a model.summary(). It will print all layers and their output shapes. "None" values will indicate variable dimensions, and the first dimension will be the batch size.
Generally, output size can be calculated as
[(W−K+2P)/S]+1
where
W is the input volume - in your case you have not given us this
K is the Kernel size - in your case 2 == "filter"
P is the padding - in your case 2
S is the stride - in your case 3
Another, prettier formulation (where the brackets denote the floor):
O = floor((W − K + 2P) / S) + 1
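As a worked example (W is assumed, since the question doesn't state it): with W = 64, K = 2, P = 2, S = 3, the output size is floor((64 − 2 + 2*2) / 3) + 1 = floor(66 / 3) + 1 = 23.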
I am a beginner with Keras and with writing neural network models, and I'm trying to write an LSTM for text generation, without success. What am I doing wrong?
I read this question: here
and other articles, but there is something I am missing that I can't get; sorry if I seem dumb.
The goal
My purpose is to generate English articles of a fixed length (1500 tokens for now).
Suppose I have a dataset of 20k records in sequences (articles, basically) of different lengths. I set a fixed length for all articles (MAX_SEQUENCE_LENGTH = 1500) and tokenized them, getting a matrix (X, my training data) that looks like:
[[ 0 0 0 ... 88 664 206]
[ 0 0 0 ... 1 93 140]
[ 0 0 0 ... 3 173 2283]
...
[ 50 2761 4 ... 167 148 156]
[ 0 0 0 ... 10 77 206]
[ 0 0 0 ... 167 148 156]]
with a shape of 20000x1500
the output of my LSTM should be a 1 x MAX_SEQUENCE_LENGTH array of tokens.
My model looks like this:
def generator_model(sequence_input, embedded_sequences, output_shape):
    layer = LSTM(16, return_sequences=True)(embedded_sequences)
    layer = LSTM(32, return_sequences=True)(layer)
    layer = Flatten()(layer)
    output = Dense(output_shape, activation='softmax')(layer)
    generator = Model(sequence_input, output)
    return generator
with:
sequence_input = Input(batch_shape=(1, 1,1500), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
output_shape = MAX_SEQUENCE_LENGTH
The LSTM is supposed to train, with model.fit(), on a training set X of shape 20k x MAX_SEQUENCE_LENGTH, and to return an array of tokens of shape 1 x MAX_SEQUENCE_LENGTH when I call model.predict(seed), with seed being a random noise array.
compile, fit and predict
comments on the following section:
- generator.compile works; the model is given in the edit section of this post.
- generator.fit compiles; the epochs=1 param is for testing purposes and will become BATCH_NUM.
- Now I have some doubts about the y I give to generator.fit. For now I'm passing a matrix of 0s as the target output; if I generate it with a first dimension different from X.shape[0], it throws an error, which means it needs one label per record in X. But if I give it a matrix of 0s as the target for model.fit, isn't it going to predict just arrays of 0s?
- The error it gives is always the same whether I use noise_generator() or noise_integer_generator(); I believe it's because it doesn't like the y_shape param I'm giving.
embedding_layer = load_embeddings(word_index)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
embedded_sequences = embedding_layer(sequence_input)
generator = generator_model(sequence_input, embedded_sequences, X.shape[1])
print(generator.summary())
generator.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
Xnoise = generate_integer_noise(MAX_SEQUENCE_LENGTH)
y_shape = np.zeros((X.shape[0],), dtype=int)
generator.fit(X, y_shape, epochs=1)
acc = generator.predict(Xnoise, verbose=1)
But actually I'm getting the following error
ValueError: Error when checking input: expected input_1 to have shape (1500,) but got array with shape (1,)
when I call:
Xnoise = generate_noise(samples_number=MAX_SEQUENCE_LENGTH)
generator.predict(Xnoise, verbose=1)
The noise I give is a 1 x 1500 array, but it seems the model expects a (1500,) array, so there must be some kind of error in the shape settings for my output.
Is my model correct for my purpose, or did I write something really stupid that I can't see?
Thanks for the help you can give me, I appreciate that!
edit
Changelog:
v1.
###
- Changed model structure, now return_sequences = True and using shape instead of batch_shape
###
- Changed
sequence_input = Input(batch_shape=(1,1,1500), dtype='int32')
to
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
###
- Changed the error the model is giving
v2.
###
- Changed generate_noise() code
###
- Added generate_integer_noise() code
###
- Added full sequence with the model compile, fit and predict
###
- Added model.fit summary under the model summary, in the tail of the post
generate_noise() code:
def generate_noise(samples_number, mean=0.5, stdev=0.1):
    noise = np.random.normal(mean, stdev, (samples_number, MAX_SEQUENCE_LENGTH))
    print(noise.shape)
    return noise
which prints: (1500,)
generate_integer_noise() code:
def generate_integer_noise(samples_number):
    noise = []
    for _ in range(0, samples_number):
        noise.append(np.random.randint(1, MAX_NB_WORDS))
    Xnoise = np.asarray(noise)
    return Xnoise
My load_embeddings() function is as follows:
def load_embeddings(word_index, embeddingsfile='Embeddings/glove.6B.%id.txt' % EMBEDDING_DIM):
    embeddings_index = {}
    f = open(embeddingsfile, 'r', encoding='utf8')
    for line in f:
        values = line.split(' ')  # split the line by spaces
        word = values[0]  # each line starts with the word
        coefs = np.asarray(values[1:], dtype='float32')  # the rest of the line is the vector
        embeddings_index[word] = coefs  # put into embedding dictionary
    f.close()
    print('Found %s word vectors.' % len(embeddings_index))

    embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros.
            embedding_matrix[i] = embedding_vector

    embedding_layer = Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=False)
    return embedding_layer
model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 1500) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 1500, 300) 9751200
_________________________________________________________________
lstm_1 (LSTM) (None, 1500, 16) 20288
_________________________________________________________________
lstm_2 (LSTM) (None, 1500, 32) 6272
_________________________________________________________________
flatten_1 (Flatten) (None, 48000) 0
_________________________________________________________________
dense_1 (Dense) (None, 1500) 72001500
=================================================================
Total params: 81,779,260
Trainable params: 72,028,060
Non-trainable params: 9,751,200
_________________________________________________________________
model.fit() summary (using a 999-sized dataset for testing, instead of the 20k-sized one):
999/999 [==============================] - 62s 62ms/step - loss: 0.5491 - categorical_accuracy: 0.9680
I rewrote the full answer; now it works (at least it compiles and runs; I can't say anything about convergence).
First, I don't know why you use sparse_categorical_crossentropy instead of categorical_crossentropy? It could be important. I changed the model a bit so that it compiles and uses categorical_crossentropy. If you need a sparse one, change the shape of the target.
Also, I changed the batch_shape argument to shape, because it allows you to use batches of different shapes. It's easier to work with.
And the last edit: you should change generate_noise, because an Embedding layer expects numbers in (0, max_features), not normally distributed floats (see the comment in the function).
EDIT
Addressing the last comments, I've removed generate_noise and post the modified generate_integer_noise function:
from keras.layers import Input, Embedding, LSTM
from keras.models import Model
import numpy as np

def generate_integer_noise(samples_number):
    """
    samples_number is the number of samples, i.e. the first dimension in (some, 1500)
    """
    return np.random.randint(1, MAX_NB_WORDS, size=(samples_number, MAX_SEQUENCE_LENGTH))

MAX_SEQUENCE_LENGTH = 1500
MAX_NB_WORDS = 10  # vocabulary size; matches max_features below

"""
You can use your own definition of the embedding layer;
I post one here to make the example reproducible
"""
max_features, embed_dim = MAX_NB_WORDS, 300
embedding_matrix = np.zeros((max_features, embed_dim))
output_shape = MAX_SEQUENCE_LENGTH

embedded_layer = Embedding(
    max_features,
    embed_dim,
    weights=[embedding_matrix],
    trainable=False
)

def generator_model(embedded_layer, output_shape):
    """
    embedded_layer: Embedding keras layer
    output_shape: shape of the target
    """
    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH, ))
    embedded_sequences = embedded_layer(sequence_input)  # Set trainable to True if you wish to train
    layer = LSTM(32, return_sequences=True)(embedded_sequences)
    layer = LSTM(64, return_sequences=True)(layer)
    output = LSTM(output_shape)(layer)
    generator = Model(sequence_input, output)
    return generator

generator = generator_model(embedded_layer, output_shape)
noise = generate_integer_noise(32)

# generator.predict(noise)
generator.compile(loss='categorical_crossentropy', optimizer='adam')
generator.fit(noise, noise)
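(Here noise is used as both input and target only to show that the model compiles and trains end to end; in the real setup, y would be the 20000 x 1500 matrix of target tokens.)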