How do I build a TF model from NumPy array files? - python

I have a dir with NumPy array files: bias1.npy, kernel1.npy, bias2.npy, kernel2.npy. How can I build a TF model that uses those arrays as kernels and biases of layers?

To avoid confusion: for consistency, each bias in my NumPy files is stored as a 2D matrix with a single row, for example shape (1, 80). This post shows how I recreated a TF model from the NumPy weights and biases.
from pathlib import Path
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input

class NumpyInitializer(tf.keras.initializers.Initializer):
    # custom class converting numpy arrays to tf's initializers
    # used to initialize both kernel and bias
    def __init__(self, array):
        # convert numpy array into tensor
        self.array = tf.convert_to_tensor(array.tolist())

    def __call__(self, shape, dtype=None):
        # return tensor
        return self.array

def restore_model_from_numpy(directory):
    """
    Recreate model from the numpy files.
    Numpy files in the directory are ordered by layers
    and bias numpy matrix comes before numpy weight matrix.
    For example:
    directory-
    - L1B.npy //numpy bias matrix for layer 1
    - L1W.npy //numpy weights matrix for layer 1
    - L2B.npy //numpy bias matrix for layer 2
    - L2W.npy //numpy weights matrix for layer 2
    Parameters:
    directory - path to the directory with numpy files
    Return:
    tf's model recreated from numpy files
    """
    def file_iterating(directory):
        """
        Iterate over directory and create
        dictionary of layers number and its structure
        layers[layer_number] = [numpy_bias_matrix, numpy_weight_matrix]
        """
        # sorted list of numpy files; rglob alone does not guarantee an order
        pathlist = sorted(Path(directory).rglob("*.npy"))
        layers = {} # initialize dictionary
        index = 0
        for file in pathlist: # iterate over files in the directory
            if index % 2 == 0:
                layers[index // 2] = [] # next layer - new key in dictionary
            layers[index // 2].append(np.load(file)) # add to dictionary bias or weight
            index += 1
            print(file) # optional to show list of files we deal with
        return layers # return dictionary

    layers = file_iterating(directory) # get dictionary with model structure
    inputs = Input(shape=(np.shape(layers[0][1])[0],)) # create first model input layer
    x = inputs
    for key, value in layers.items(): # iterate over all layers in the layers dictionary
        bias_initializer = NumpyInitializer(layers[key][0][0]) # create bias initializer for key's layer
        kernel_initializer = NumpyInitializer(layers[key][1]) # create weights initializer for key's layer
        layer_size = np.shape(layers[key][0])[-1] # get the size of the layer
        new_layer = tf.keras.layers.Dense( # initialize new Dense layer
            units=layer_size,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            activation="tanh")
        x = new_layer(x) # stack layer at the top of the previous layer
    model = tf.keras.Model(inputs, x) # create tf's model based on the stacked layers
    model.compile() # compile model
    return model # return compiled model
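The function above depends on the files arriving in layer order with the bias before the weights. As a minimal sketch of making that pairing explicit, the helper below groups file names by the layer number found in names such as L1B.npy and L1W.npy. The name group_layer_files and the L<number><B|W>.npy pattern are assumptions taken from the example directory, not part of the original code.

```python
import re
from pathlib import PurePath

# Assumed naming scheme: L<layer number><B or W>.npy, e.g. L1B.npy, L12W.npy
_NAME = re.compile(r"^L(\d+)([BW])\.npy$")

def group_layer_files(paths):
    """Return {layer_number: {"B": path, "W": path}} ordered by layer number."""
    layers = {}
    for path in paths:
        match = _NAME.match(PurePath(path).name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        number, kind = int(match.group(1)), match.group(2)
        layers.setdefault(number, {})[kind] = str(path)
    # Sort numerically so that layer 10 comes after layer 2
    return dict(sorted(layers.items()))

files = ["L2W.npy", "L10B.npy", "L1B.npy", "L2B.npy", "L1W.npy", "L10W.npy"]
grouped = group_layer_files(files)
print(list(grouped))  # layer numbers in numeric order
print(grouped[1])     # bias and weight file of layer 1
```

Parsing the number avoids the case where a plain alphabetical sort would place L10B.npy before L2B.npy.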
In my directory, I had 4 numpy files (layer 1 - L1 and layer 2 - L2):
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L1B.npy , shape: (1, 80)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L1W.npy , shape: (100, 80)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L2B.npy , shape: (1, 100)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L2W.npy , shape: (80, 100)
Calling the function results in:
m = restore_model_from_numpy(my_numpy_files_directory)
m.summary()
Model: "model_592"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_312 (InputLayer) [(None, 100)] 0
_________________________________________________________________
dense_137 (Dense) (None, 80) 8080
_________________________________________________________________
dense_138 (Dense) (None, 100) 8100
=================================================================
Total params: 16,180
Trainable params: 16,180
Non-trainable params: 0
_________________________________________________________________
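The parameter counts in the summary can be cross-checked from the array shapes alone. The sketch below assumes the usual Dense layout, one weight per (input, unit) pair plus one bias per unit; dense_params is a hypothetical helper name.

```python
def dense_params(kernel_shape):
    """Parameters of a Dense layer: inputs * units weights plus one bias per unit."""
    n_inputs, n_units = kernel_shape
    return n_inputs * n_units + n_units

# Kernel shapes of L1W.npy and L2W.npy as listed above
kernel_shapes = [(100, 80), (80, 100)]
counts = [dense_params(shape) for shape in kernel_shapes]
print(counts, sum(counts))  # [8080, 8100] 16180
```

If the numbers printed by m.summary() differ from these, the files were probably paired with the wrong layer.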
I hope that this post will be helpful to anyone as it's my first one.
Happy coding :D

Related

Keras: list of Numpy arrays that you are passing to your model is not the size the model expected

I have implemented a Keras model using the functional API:
x_inp, x_out = graphsage_model.in_out_tensors()
prediction = layers.Dense(units=train_targets.shape[1], activation="softmax")(x_out)
model = Model(inputs=x_inp, outputs=prediction)
model.compile(
optimizer=optimizers.Adam(lr=0.005),
loss=losses.categorical_crossentropy,
metrics=["acc"],
)
The tensors are:
x_inp: [<tf.Tensor 'input_1:0' shape=(None, 1, 1433) dtype=float32>, <tf.Tensor 'input_2:0' shape=(None, 10, 1433) dtype=float32>, <tf.Tensor 'input_3:0' shape=(None, 50, 1433) dtype=float32>]
x_out: Tensor("lambda/Identity:0", shape=(None, 32), dtype=float32)
prediction: Tensor("dense/Identity:0", shape=(None, 7), dtype=float32)
train_targets.shape[1] = 7
As far as I understand, my model has 50 units in the input layer, 32 in the hidden layer and 7 in the output layer in the functional API approach. To understand how the Keras Sequential model works in contrast with the functional API, I tried to implement the same model with the Sequential approach:
model = models.Sequential()
model.add(layers.Dense(32, activation='softmax', input_shape=(50,)))
model.add(layers.Dense(7, activation='softmax'))
But it gives me the following error:
ValueError: Error when checking model input: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), but instead got the following list of 3
arrays:
Additional info about how the x_inp and x_out were computed:
def in_out_tensors(self, multiplicity=None):
    """
    Builds a GraphSAGE model for node or link/node pair prediction, depending on the generator used to construct
    the model (whether it is a node or link/node pair generator).
    Returns:
        tuple: (x_inp, x_out), where ``x_inp`` is a list of Keras input tensors
        for the specified GraphSAGE model (either node or link/node pair model) and ``x_out`` contains
        model output tensor(s) of shape (batch_size, layer_sizes[-1])
    """
    if multiplicity is None:
        multiplicity = self.multiplicity
    if multiplicity == 1:
        return self._node_model()
    elif multiplicity == 2:
        return self._link_model()
    else:
        raise RuntimeError(
            "Currently only multiplicities of 1 and 2 are supported. Consider using node_model or "
            "link_model method explicitly to build node or link prediction model, respectively."
        )

def _node_model(self):
    """
    Builds a GraphSAGE model for node prediction
    Returns:
        tuple: (x_inp, x_out) where ``x_inp`` is a list of Keras input tensors
        for the specified GraphSAGE model and ``x_out`` is the Keras tensor
        for the GraphSAGE model output.
    """
    # Create tensor inputs for neighbourhood sampling
    x_inp = [
        Input(shape=(s, self.input_feature_size)) for s in self.neighbourhood_sizes
    ]
    # Output from GraphSAGE model
    x_out = self(x_inp)
    # Returns inputs and outputs
    return x_inp, x_out

How to get logits from a sequential model in keras/tensorflow? [duplicate]

I have trained a binary classification model with CNN, and here is my code
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
border_mode='valid',
input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (8, 8, 64) = (2048)
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2)) # define a binary classification problem
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adadelta',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
verbose=1,
validation_data=(x_test, y_test))
Now I want to get the output of each layer, as I would in TensorFlow. How can I do that?
You can easily get the outputs of any layer by using: model.layers[index].output
For all layers use this:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test, 1.]) for func in functors]
print(layer_outs)
Note: To simulate Dropout use learning_phase as 1. in layer_outs otherwise use 0.
Edit: (based on comments)
K.function creates Theano/TensorFlow tensor functions, which are later used to get the output from the symbolic graph given the input.
K.learning_phase() is required as an input because many Keras layers, such as Dropout and BatchNormalization, depend on it to change behavior between training and test time.
So if you remove the dropout layer in your code you can simply use:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test]) for func in functors]
print(layer_outs)
Edit 2: More optimized
I just realized that the previous answer is not well optimized: for each function evaluation the data is transferred from CPU to GPU memory, and the tensor calculations for the lower layers are repeated over and over.
Instead, this is a much better way, as you don't need multiple functions but a single function that gives you the list of all outputs:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
From https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ... # include here your original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
[model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Based on all the good answers of this thread, I wrote a library to fetch the output of each layer. It abstracts all the complexity and has been designed to be as user-friendly as possible:
https://github.com/philipperemy/keract
It handles almost all the edge cases.
Hope it helps!
The following looks very simple to me:
model.layers[idx].output
The above is a tensor object, so you can modify it using any operation that can be applied to a tensor object.
For example, to get the shape: model.layers[idx].output.get_shape()
idx is the index of the layer, which you can find from model.summary()
This answer is based on: https://stackoverflow.com/a/59557567/2585501
To print the output of a single layer:
from tensorflow.keras import backend as K
layerIndex = 1
func = K.function([model.get_layer(index=0).input], model.get_layer(index=layerIndex).output)
layerOutput = func([input_data]) # input_data is a numpy array
print(layerOutput)
To print output of every layer:
from tensorflow.keras import backend as K
for layerIndex, layer in enumerate(model.layers):
func = K.function([model.get_layer(index=0).input], layer.output)
layerOutput = func([input_data]) # input_data is a numpy array
print(layerOutput)
I wrote this function for myself (in Jupyter) and it was inspired by indraforyou's answer. It will plot all the layer outputs automatically. Your images must have a (x, y, 1) shape where 1 stands for 1 channel. You just call plot_layer_outputs(...) to plot.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from keras import backend as K

def get_layer_outputs():
    test_image = ...  # YOUR IMAGE GOES HERE!!!
    outputs = [layer.output for layer in model.layers] # all layer outputs
    comp_graph = [K.function([model.input]+ [K.learning_phase()], [output]) for output in outputs] # evaluation functions
    # Testing
    layer_outputs_list = [op([test_image, 1.]) for op in comp_graph]
    layer_outputs = []
    for layer_output in layer_outputs_list:
        print(layer_output[0][0].shape, end='\n-------------------\n')
        layer_outputs.append(layer_output[0][0])
    return layer_outputs

def plot_layer_outputs(layer_number):
    layer_outputs = get_layer_outputs()
    x_max = layer_outputs[layer_number].shape[0]
    y_max = layer_outputs[layer_number].shape[1]
    n = layer_outputs[layer_number].shape[2]
    L = []
    for i in range(n):
        L.append(np.zeros((x_max, y_max)))
    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                L[i][x][y] = layer_outputs[layer_number][x][y][i]
    for img in L:
        plt.figure()
        plt.imshow(img, interpolation='nearest')
From: https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py
import keras.backend as K

def get_activations(model, model_inputs, print_shape_only=False, layer_name=None):
    print('----- activations -----')
    activations = []
    inp = model.input
    model_multi_inputs_cond = True
    if not isinstance(inp, list):
        # only one input! let's wrap it in a list.
        inp = [inp]
        model_multi_inputs_cond = False
    outputs = [layer.output for layer in model.layers if
               layer.name == layer_name or layer_name is None] # all layer outputs
    funcs = [K.function(inp + [K.learning_phase()], [out]) for out in outputs] # evaluation functions
    if model_multi_inputs_cond:
        list_inputs = []
        list_inputs.extend(model_inputs)
        list_inputs.append(0.)
    else:
        list_inputs = [model_inputs, 0.]
    # Learning phase. 0 = Test mode (no dropout or batch normalization)
    # layer_outputs = [func([model_inputs, 0.])[0] for func in funcs]
    layer_outputs = [func(list_inputs)[0] for func in funcs]
    for layer_activations in layer_outputs:
        activations.append(layer_activations)
        if print_shape_only:
            print(layer_activations.shape)
        else:
            print(layer_activations)
    return activations
Previous solutions were not working for me. I handled this issue as shown below.
layer_outputs = []
for i in range(1, len(model.layers)):
    tmp_model = Model(model.layers[0].input, model.layers[i].output)
    tmp_output = tmp_model.predict(img)[0]
    layer_outputs.append(tmp_output)
I wanted to add this as a comment to @indraforyou's answer (but I don't have high enough rep) to correct the issue mentioned in @mathtick's comment. To avoid the InvalidArgumentError: input_X:Y is both fed and fetched. exception, simply replace the line outputs = [layer.output for layer in model.layers] with outputs = [layer.output for layer in model.layers][1:], i.e.
adapting indraforyou's minimal working example:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers][1:] # all layer outputs except first (input) layer
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
p.s. my attempts trying things such as outputs = [layer.output for layer in model.layers[1:]] did not work.
Assuming you have:
1- A pre-trained Keras model.
2- Input x as an image or a set of images. The resolution of the image should be compatible with the dimensions of the input layer, for example 80*80*3 for a 3-channel (RGB) image.
3- The name of the output layer whose activation you want, for example the "flatten_2" layer. The name should be included in the layer_names variable, which holds the names of the layers of the given model.
4- batch_size is an optional argument.
Then you can use the get_activations function to get the activation of the output layer for a given input x and pre-trained model:
import six
import numpy as np
import keras.backend as k
from numpy import float32

def get_activations(x, model, layer, batch_size=128):
    """
    Return the output of the specified layer for input `x`. `layer` is specified by layer index (between 0 and
    `nb_layers - 1`) or by name. The number of layers can be determined by counting the results returned by
    calling `layer_names`.
    :param x: Input for computing the activations.
    :type x: `np.ndarray`. Example: x.shape = (80, 80, 3)
    :param model: pre-trained Keras model. Including weights.
    :type model: keras.engine.sequential.Sequential. Example: model.input_shape = (None, 80, 80, 3)
    :param layer: Layer for computing the activations
    :type layer: `int` or `str`. Example: layer = 'flatten_2'
    :param batch_size: Size of batches.
    :type batch_size: `int`
    :return: The output of `layer`, where the first dimension is the batch size corresponding to `x`.
    :rtype: `np.ndarray`. Example: activations.shape = (1, 2000)
    """
    layer_names = [layer.name for layer in model.layers]
    if isinstance(layer, six.string_types):
        if layer not in layer_names:
            raise ValueError('Layer name %s is not part of the graph.' % layer)
        layer_name = layer
    elif isinstance(layer, int):
        if layer < 0 or layer >= len(layer_names):
            raise ValueError('Layer index %d is outside of range (0 to %d included).'
                             % (layer, len(layer_names) - 1))
        layer_name = layer_names[layer]
    else:
        raise TypeError('Layer must be of type `str` or `int`.')
    layer_output = model.get_layer(layer_name).output
    layer_input = model.input
    output_func = k.function([layer_input], [layer_output])
    # Apply preprocessing
    if x.shape == k.int_shape(model.input)[1:]:
        x_preproc = np.expand_dims(x, 0)
    else:
        x_preproc = x
    assert len(x_preproc.shape) == 4
    # Determine shape of expected output and prepare array
    output_shape = output_func([x_preproc[0][None, ...]])[0].shape
    activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=float32)
    # Get activations with batching
    for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
        begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
        activations[begin:end] = output_func([x_preproc[begin:end]])[0]
    return activations
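The batching loop at the end of get_activations only depends on the sample count and the batch size, so its index arithmetic can be checked in isolation. A minimal sketch in plain Python; batch_ranges is a hypothetical helper name, not part of the answer's code.

```python
import math

def batch_ranges(n_samples, batch_size):
    """Yield (begin, end) index pairs that cover n_samples in order."""
    n_batches = math.ceil(n_samples / batch_size)
    for batch_index in range(n_batches):
        begin = batch_index * batch_size
        end = min(begin + batch_size, n_samples)  # the last batch may be shorter
        yield begin, end

ranges = list(batch_ranges(300, 128))
print(ranges)  # [(0, 128), (128, 256), (256, 300)]
```

Each pair maps directly onto the slice x_preproc[begin:end] used in the function.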
If you have one of the following cases:
- error: InvalidArgumentError: input_X:Y is both fed and fetched
- a model with multiple inputs
you need to make the following changes:
- filter out the input layers in the outputs variable
- make a minor change in the functors loop
Minimal example:
from keras.engine.input_layer import InputLayer
inp = model.input
outputs = [layer.output for layer in model.layers if not isinstance(layer, InputLayer)]
functors = [K.function(inp + [K.learning_phase()], [x]) for x in outputs]
layer_outputs = [fun([x1, x2, xn, 1]) for fun in functors]
Well, other answers are very complete, but there is a very basic way to "see", not to "get" the shapes.
Just do a model.summary(). It will print all layers and their output shapes. "None" values will indicate variable dimensions, and the first dimension will be the batch size.
Generally, output size can be calculated as
[(W−K+2P)/S]+1
where
W is the input volume - in your case you have not given us this
K is the Kernel size - in your case 2 == "filter"
P is the padding - in your case 2
S is the stride - in your case 3
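As a minimal sketch of the formula above, the helper below computes the output size along one axis. It assumes the square brackets in the formula mean rounding down (floor); conv_output_size and the sample values are illustrative only and are not taken from the question.

```python
def conv_output_size(w, k, p, s):
    """Output length along one axis: floor((W - K + 2P) / S) + 1."""
    if s <= 0:
        raise ValueError("stride must be positive")
    return (w - k + 2 * p) // s + 1

# Hypothetical values, chosen only to illustrate the arithmetic
print(conv_output_size(w=32, k=3, p=0, s=1))  # 30
print(conv_output_size(w=32, k=3, p=1, s=2))  # 16
```

The same function applies to height and width separately when the kernel, padding, or stride differ per axis.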

Keras LSTM Model for text-generation purpose

I am a beginner with Keras and with writing neural network models, and I'm trying to write an LSTM for text generation, so far without success. What am I doing wrong?
I read this question: here
and other articles, but there is something I am missing. Sorry if I seem dumb.
The goal
My purpose is to generate English articles of a fixed length (1500 for now).
Suppose I have a 20k-record dataset of sequences (articles, basically) of different lengths. I set a fixed length for all articles (MAX_SEQUENCE_LENGTH=1500) and tokenized them, getting a matrix (X, my training data) that looks like:
[[ 0 0 0 ... 88 664 206]
[ 0 0 0 ... 1 93 140]
[ 0 0 0 ... 3 173 2283]
...
[ 50 2761 4 ... 167 148 156]
[ 0 0 0 ... 10 77 206]
[ 0 0 0 ... 167 148 156]]
with a shape of 20000 x 1500.
The output of my LSTM should be a 1 x MAX_SEQUENCE_LENGTH array of tokens.
My model looks like this:
def generator_model(sequence_input, embedded_sequences, output_shape):
    layer = LSTM(16, return_sequences=True)(embedded_sequences)
    layer = LSTM(32, return_sequences=True)(layer)
    layer = Flatten()(layer)
    output = Dense(output_shape, activation='softmax')(layer)
    generator = Model(sequence_input, output)
    return generator
with:
sequence_input = Input(batch_shape=(1, 1,1500), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
output_shape = MAX_SEQUENCE_LENGTH
The LSTM is supposed to train, with model.fit(), on a training set (X) of shape 20k x MAX_SEQUENCE_LENGTH,
and to return an array of tokens of shape 1 x MAX_SEQUENCE_LENGTH when I call model.predict(seed), where seed is a random noise array.
compile, fit and predict
Comments on the following section:
- generator.compile works; the model is given in the edit section of this post.
- generator.fit also works; the epochs=1 parameter is for testing purposes and will be BATCH_NUM.
- I have some doubts about the y I give to generator.fit. For now I'm passing a matrix of zeros as the target output. If I generate it with a first dimension different from X.shape[0], it throws an error, which means it needs a label for every record in X. But if I give it a matrix of zeros as the target for model.fit, isn't it going to predict just arrays of zeros?
- The error is always the same, whether I use noise_generator() or noise_integer_generator(). I believe it's because it doesn't like the y_shape parameter I'm giving.
embedding_layer = load_embeddings(word_index)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
embedded_sequences = embedding_layer(sequence_input)
generator = generator_model(sequence_input, embedded_sequences, X.shape[1])
print(generator.summary())
generator.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
Xnoise = generate_integer_noise(MAX_SEQUENCE_LENGTH)
y_shape = np.zeros((X.shape[0],), dtype=int)
generator.fit(X, y_shape, epochs=1)
acc = generator.predict(Xnoise, verbose=1)
But actually I'm getting the following error
ValueError: Error when checking input: expected input_1 to have shape (1500,) but got array with shape (1,)
when I call:
Xnoise = generate_noise(samples_number=MAX_SEQUENCE_LENGTH)
generator.predict(Xnoise, verbose=1)
The noise I give is a 1 x 1500 array, but it seems it's expecting a (1500,) matrix, so there must be some kind of error in the shape settings for my output.
Is my model correct for my purpose, or did I write something really stupid that I can't see?
Thanks for the help you can give me, I appreciate that!
edit
Changelog:
v1.
###
- Changed model structure, now return_sequences = True and using shape instead of batch_shape
###
- Changed
sequence_input = Input(batch_shape=(1,1,1500), dtype='int32')
to
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
###
- Changed the error the model is giving
v2.
###
- Changed generate_noise() code
###
- Added generate_integer_noise() code
###
- Added full sequence with the model compile, fit and predict
###
- Added model.fit summary under the model summary, in the tail of the post
generate_noise() code:
def generate_noise(samples_number, mean=0.5, stdev=0.1):
    noise = np.random.normal(mean, stdev, (samples_number, MAX_SEQUENCE_LENGTH))
    print(noise.shape)
    return noise
which prints: (1500,)
generate_integer_noise() code:
def generate_integer_noise(samples_number):
    noise = []
    for _ in range(0, samples_number):
        noise.append(np.random.randint(1, MAX_NB_WORDS))
    Xnoise = np.asarray(noise)
    return Xnoise
My function load_embeddings() is as follows:
def load_embeddings(word_index, embeddingsfile='Embeddings/glove.6B.%id.txt' %EMBEDDING_DIM):
    embeddings_index = {}
    f = open(embeddingsfile, 'r', encoding='utf8')
    for line in f:
        values = line.split(' ') #split the line by spaces
        word = values[0] #each line starts with the word
        coefs = np.asarray(values[1:], dtype='float32') #the rest of the line is the vector
        embeddings_index[word] = coefs #put into embedding dictionary
    f.close()
    print('Found %s word vectors.' % len(embeddings_index))
    embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros.
            embedding_matrix[i] = embedding_vector
    embedding_layer = Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=False)
    return embedding_layer
model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 1500) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 1500, 300) 9751200
_________________________________________________________________
lstm_1 (LSTM) (None, 1500, 16) 20288
_________________________________________________________________
lstm_2 (LSTM) (None, 1500, 32) 6272
_________________________________________________________________
flatten_1 (Flatten) (None, 48000) 0
_________________________________________________________________
dense_1 (Dense) (None, 1500) 72001500
=================================================================
Total params: 81,779,260
Trainable params: 72,028,060
Non-trainable params: 9,751,200
_________________________________________________________________
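The per-layer counts in this summary can be reproduced by hand, which also shows where most of the parameters sit. The sketch below assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel and a bias) and a Dense layer with bias; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return input_dim * units + units

lstm_1 = lstm_params(input_dim=300, units=16)   # fed by 300-d embeddings
lstm_2 = lstm_params(input_dim=16, units=32)    # fed by lstm_1
flatten = 1500 * 32                             # time steps * lstm_2 units
dense_1 = dense_params(flatten, 1500)
print(lstm_1, lstm_2, flatten, dense_1)  # 20288 6272 48000 72001500
```

Nearly all trainable parameters come from the Dense layer that follows Flatten, not from the two LSTM layers.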
model.fit() summary (using a 999-sized dataset for testing, instead of the 20k-sized):
999/999 [==============================] - 62s 62ms/step - loss: 0.5491 - categorical_accuracy: 0.9680
I rewrote the full answer; now it works (at least it compiles and runs; I can't say anything about convergence).
First, I don't know why you use sparse_categorical_crossentropy instead of categorical_crossentropy. It could be important. I changed the model a bit so that it compiles and uses categorical_crossentropy. If you need the sparse one, change the shape of the target.
Also, I changed the batch_shape argument to shape, because it allows batches of different sizes. It's easier to work with.
And the last edit: you should change generate_noise, because an Embedding layer expects integers in the range [0, max_features), not normally distributed floats (see the comment in the function).
EDIT
Addressing the last comments, I've removed generate_noise and posted a modified generate_integer_noise function:
from keras.layers import Input, Embedding, LSTM
from keras.models import Model
import numpy as np

def generate_integer_noise(samples_number):
    """
    samples_number is a number of samples, i.e. first dimension in (some, 1500)
    """
    # indices must stay below the Embedding input size
    # (MAX_NB_WORDS in the question, max_features in this example)
    return np.random.randint(1, max_features, size=(samples_number, MAX_SEQUENCE_LENGTH))

MAX_SEQUENCE_LENGTH = 1500
# You can use your definition of the embedding layer;
# I post this one to make a reproducible example
max_features, embed_dim = 10, 300
embedding_matrix = np.zeros((max_features, embed_dim))
output_shape = MAX_SEQUENCE_LENGTH
embedded_layer = Embedding(
    max_features,
    embed_dim,
    weights=[embedding_matrix],
    trainable=False
)

def generator_model(embedded_layer, output_shape):
    """
    embedded_layer: Embedding keras layer
    output_shape: shape of the target
    """
    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH, ))
    embedded_sequences = embedded_layer(sequence_input) # Set trainable to True if you wish to train
    layer = LSTM(32, return_sequences=True)(embedded_sequences)
    layer = LSTM(64, return_sequences=True)(layer)
    output = LSTM(output_shape)(layer)
    generator = Model(sequence_input, output)
    return generator

generator = generator_model(embedded_layer, output_shape)
noise = generate_integer_noise(32)
# generator.predict(noise)
generator.compile(loss='categorical_crossentropy', optimizer='adam')
generator.fit(noise, noise)

Optimizing this stacking of N weight-sharing Keras models

I have two Keras (Tensorflow backend) models, which are stacked to make a combined model:
small_model with In: (None,K), Out: (None,K)
large_model with In: (None,N,K), Out: (None,1)
combined_model (N x small_model -> large_model) with In: (None,N,K), Out: (None,1)
large_model needs N stacked outputs from small_model as input.
I can define N small_models, which share weights, then concatenate their outputs (technically, I need to stack them), and then send that to large_model, as in the code below.
My problem is that I need to be able to do this for very large N (> 10**6), and that my current solution uses a lot of memory and time when creating the models, even for N ~ 10**2.
I'm hoping that there is a solution which sends the N data points through small_model in parallel (like what is done when giving a batch to a model), collects those points (with the Keras history, so that backprop is possible) and sends that to large_model, without having to define the N instances of small_model. The listed input and output shapes for the three models should not change, but other intermediate models can of course be defined.
Thank you.
Current unsatisfactory solution (assume that small_model and large_model already exist, and that N,K are defined):
from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

def build_small_model_on_batch():
    def distribute_inputs_to_small_model(input):
        return [small_model(input[:,i]) for i in range(N)]
    def stacker(list_of_tensors):
        return K.stack(list_of_tensors, axis=1)
    input = Input(shape=(N,K,))
    small_model_outputs = Lambda(distribute_inputs_to_small_model)(input)
    stacked_small_model_outputs = Lambda(stacker)(small_model_outputs)
    return Model(input, stacked_small_model_outputs)

def build_combined():
    input = Input(shape=(N,K,))
    stacked_small_model_outputs = small_model_on_batch(input)
    output = large_model(stacked_small_model_outputs)
    return Model(input, output)

small_model_on_batch = build_small_model_on_batch()
combined = build_combined()
You can do that with a TimeDistributed layer wrapper:
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Sequential, Model

N = None # Use fixed value if you do not want variable input size
K = 20

def small_model():
    inputs = Input(shape=(K,))
    # Define the small model
    # Here it is just a single dense layer
    outputs = Dense(K, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=outputs)

def large_model():
    inputs = Input(shape=(N, K))
    # Define the large model
    # Just a single neuron here
    outputs = Dense(1, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=outputs)

def combined_model():
    inputs = Input(shape=(N, K))
    # The TimeDistributed layer applies the given model
    # to every input across dimension 1 (N)
    small_model_out = TimeDistributed(small_model())(inputs)
    # Apply large model
    outputs = large_model()(small_model_out)
    return Model(inputs=inputs, outputs=outputs)

model = combined_model()
model.compile(loss='mean_squared_error', optimizer='sgd')
model.summary()
Output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, 20) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 20) 420
_________________________________________________________________
model_2 (Model) (None, None, 1) 21
=================================================================
Total params: 441
Trainable params: 441
Non-trainable params: 0
_________________________________________________________________
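Conceptually, the wrapper applies one shared model to each of the N slices of every sample, so no per-slice copies of the model are needed. The plain-Python sketch below illustrates only that mapping on nested lists; it is not how Keras implements TimeDistributed, and small_model here is a stand-in function rather than the Keras model from the answer.

```python
def time_distributed(fn, batch):
    """Apply fn to each of the N items of every sample in a (batch, N, K) nested list."""
    return [[fn(item) for item in sample] for sample in batch]

def small_model(vector):
    # Stand-in for the shared small model: maps K values to K values
    return [max(0.0, 2.0 * v) for v in vector]

batch = [
    [[1.0, -1.0], [0.5, 2.0], [-3.0, 0.0]],  # one sample with N=3, K=2
]
out = time_distributed(small_model, batch)
print(out)  # [[[2.0, 0.0], [1.0, 4.0], [0.0, 0.0]]]
```

The output keeps the (batch, N, K) layout, which is the shape large_model expects as input.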

How to make Keras have two different initialisers in a dense layer?

I have two separately designed CNNs for two different features(image and text) of the same data, and the output has two classes
In the very last layer:
for image (resnet), I would like to use "he_normal" as the initializer
flatten1 = Flatten()(image_maxpool)
dense = Dense(output_dim=2, kernel_initializer="he_normal")(flatten1)
but for the text CNNs, i would like to use the default "glorot_normal"
flatten2 = Flatten()(text_maxpool)
output = Dense(output_dim=2, kernel_initializer="glorot_normal")(flatten2)
the flatten1 and flatten2 have sizes:
flatten_1 (Flatten) (None, 512)
flatten_2 (Flatten) (None, 192)
Is there any way I can concatenate these two flatten layers and have one long dense layer of size 192+512 = 704, where the 512 and 192 parts have two separate kernel initializers, and produce a 2-class output?
something like this:
merged_tensor = merge([flatten1, flatten2], mode='concat', concat_axis=1)
output = Dense(output_dim=2,
kernel_initializer for [:512]='he_normal',
kernel_initializer for [512:]='glorot_normal')(merged_tensor)
Edit: I think I have gotten this to work with the following code (thanks to @Aechlys):
def my_init(shape, shape1, shape2):
    x = initializers.he_normal()(shape1)
    y = initializers.glorot_normal()(shape2)
    return tf.concat([x,y], 0)

class_num = 2
flatten1 = Flatten()(image_maxpool)
flatten2 = Flatten()(text_maxpool)
merged_tensor = concatenate([flatten1, flatten2],axis=-1)
output = Dense(output_dim=class_num, kernel_initializer=lambda shape: my_init(shape,\
    shape1=(512,class_num),\
    shape2=(192,class_num)),\
    activation='softmax')(merged_tensor)
I have to manually add the sizes 512 and 192, because I failed to get the sizes of flatten1 and flatten2 via the code
flatten1.get_shape().as_list()
which gave me [None, None], although it should be [None, 512]. Other than that, it should be fine.
Oh my, have I had fun with this one. You have to create your own kernel initializer:
def my_init(shape, dtype=None, *, shape1, shape2):
    x = keras.initializers.he_normal()(shape1, dtype=dtype)
    y = keras.initializers.glorot_normal()(shape2, dtype=dtype)
    return tf.concat([x,y], 0)
Then you will call it via a lambda function within the Dense function:
Unfortunately, as you can see, I have not yet been able to deduce the shape programmatically. I may update this answer when I do. But if you know the shapes beforehand, you can pass them as constants:
DENSE_UNITS = 64
input_t = Input((1,25))
input_i = Input((1,35))
input_a = Concatenate(axis=-1)([input_t, input_i])
dense = Dense(DENSE_UNITS, kernel_initializer=lambda shape: my_init(shape,
    shape1=(int(input_t.shape[-1]), DENSE_UNITS),
    shape2=(int(input_i.shape[-1]), DENSE_UNITS)))(input_a)
tf.keras.Model(inputs=[input_t, input_i], outputs=dense)
Out: <tensorflow.python.keras._impl.keras.engine.training.Model at 0x19ff7baac88>
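To make the idea of a block-wise initializer concrete without TensorFlow, here is a plain-Python sketch that builds one kernel from two blocks with different scales. It assumes the usual scales, sqrt(2 / fan_in) for the He-style block and sqrt(2 / (fan_in + fan_out)) for the Glorot-style block, and it draws from a plain normal distribution, whereas the Keras initializers use a truncated normal. split_init and normal_block are hypothetical names.

```python
import math
import random

def normal_block(rows, cols, stddev, rng):
    """rows x cols nested list of samples from N(0, stddev^2)."""
    return [[rng.gauss(0.0, stddev) for _ in range(cols)] for _ in range(rows)]

def split_init(rows1, rows2, units, seed=0):
    """Kernel of shape (rows1 + rows2, units) built from two blocks.

    Each block takes its own row count as fan_in, mirroring how the
    shapes are passed as shape1 and shape2 in the answer.
    """
    rng = random.Random(seed)
    he = normal_block(rows1, units, math.sqrt(2.0 / rows1), rng)
    glorot = normal_block(rows2, units, math.sqrt(2.0 / (rows2 + units)), rng)
    return he + glorot  # concatenate along axis 0

kernel = split_init(512, 192, 2)
print(len(kernel), len(kernel[0]))  # 704 2
```

The block order has to match the order of the concatenated inputs, so the first 512 rows line up with the first 512 features.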
