Image + float array as input in a Keras model - python

My model takes an image as input, but I also need to feed in some floats as supporting information about the image. I don't want them to go through all the convolutions; I want them to go directly to my dense layers as extra information for training. I know about the Concatenate layer, but I don't know how to use it with the inputs, or whether that is how it should be done.

Assume you have a backbone, which can be any convolutional neural network (VGG, ResNet, etc.). Before the dense layer you usually have a Flatten() layer (or, in modern networks, a pooling layer such as GAP or GeM) that produces a 1D vector per sample as input to your Dense layer. That is where you can concatenate your floats.
Code example using model subclassing:
import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self, num_output_classes):
        super().__init__()
        self.backbone = tf.keras.applications.ResNet50(
            input_shape=(224, 224, 3), include_top=False)
        self.pool = tf.keras.layers.GlobalAveragePooling2D()
        self.concat = tf.keras.layers.Concatenate(axis=-1)
        # The activation is set on the layer itself, not passed when calling it
        self.dense = tf.keras.layers.Dense(num_output_classes, activation='softmax')

    def call(self, inputs):
        # Unpack the inputs. `additional_floats` should be 1D per sample,
        # i.e. of shape (batch, num_floats)
        image, additional_floats = inputs
        # Run image through backbone and get a feature vector
        x = self.backbone(image)
        x = self.pool(x)
        # Concatenate with your additional floats
        x = self.concat([x, additional_floats])
        # Classification, or whatever you might need on top
        return self.dense(x)
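The same idea can also be written with the Functional API, which makes the two inputs explicit. This is only a minimal sketch: the tiny convolutional stack stands in for a real backbone such as ResNet50, and the name `build_model`, the `num_floats` parameter, the 64x64 image size and the layer widths are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import tensorflow as tf

def build_model(num_floats, num_output_classes):
    # Two separate inputs: the image and the supporting floats
    image_in = tf.keras.Input(shape=(64, 64, 3), name="image")
    floats_in = tf.keras.Input(shape=(num_floats,), name="floats")

    # Stand-in backbone (assumption); a pretrained network with
    # include_top=False would take its place in a real model
    x = tf.keras.layers.Conv2D(8, 3, activation="relu")(image_in)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # (batch, 16)

    # The floats skip the convolutions and join the pooled feature vector
    x = tf.keras.layers.Concatenate(axis=-1)([x, floats_in])  # (batch, 16 + num_floats)
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    out = tf.keras.layers.Dense(num_output_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=[image_in, floats_in], outputs=out)

model = build_model(num_floats=5, num_output_classes=3)

# Random data just to check that the shapes line up
images = np.random.rand(4, 64, 64, 3).astype("float32")
floats = np.random.rand(4, 5).astype("float32")
probs = model.predict([images, floats], verbose=0)
print(probs.shape)  # (4, 3)
```

When training, the inputs are passed in the same order as in `inputs=[image_in, floats_in]`.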

Related

python keras tensorflow - change Dense layer dot product to cosine distance

Creating a small example of the following is a bit difficult, so I will give a more abstract one. If needed, I can construct a reproducible example.
In general, I'm trying to build a 'fine tuning' model.
I have an 'embedding' architecture that takes an image and outputs a 256-dimensional vector.
I have 22 classes I try to predict.
After some initial work I have come up with a (22, 256) matrix representing the embeddings of those classes.
So, after my embedding layer, I'm adding a Dense layer (named 'layert') to follow it. This dense layer's kernel (weights) will hold the above (22, 256) matrix, which represents my classes.
I will set the weights of 'layert' to this matrix, and its biases to 0.
The key question here is: how do I correctly make that Dense layer compute a cosine similarity (between whatever comes out of the embedding and the Dense layer's kernel)?
I have worked around this by building my own Dense layer that inherits from keras.layers.Dense, but I feel this is not a good solution and wouldn't hold up in production.
Let's show some code:
# self.embedding is a working keras model as written above
# self.mean_embd_logos is the matrix (22, 256) described above (it's of tf.Tensor type)
inputs = self.embedding.inputs
x = self.embedding(inputs)
# This is the Dense layer which will get 256 sized tensor and outputs a 22 sized tensor
layert = keras.layers.Dense(units=self.mean_embd_logos.shape[0], name='mean_logos_tensor',
                            bias_initializer='zeros', kernel_initializer='zeros')
output = layert(x)
# Here we override the initial weights and initialize them with the (22, 256) matrix vector
w = self.mean_embd_logos.numpy().T
b = layert.get_weights()[1]
layert.set_weights([w, b])
output = keras.layers.Softmax()(output)
self.finetune_model = Model(inputs=inputs, outputs=output)
Of course, with this code we simply get a dot product between whatever comes out of the embedding layer and the Dense layer's kernel.
How I solved it:
As described above, inherit from Dense and override call:
from keras.layers import Dense
import tensorflow as tf

class DenseCosineSimilarity(Dense):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def call(self, inputs):
        # Everything is copied until this part:
        # ...
        a = tf.nn.l2_normalize(inputs, -1)
        b = tf.nn.l2_normalize(self.kernel, 0)
        outputs = tf.matmul(a=a, b=b)
So this is how I build a "Dense layer" that calculates a cosine similarity instead of dot product.
Original keras Dense layer has something like this:
outputs = tf.matmul(inputs, self.kernel)
Every comment/suggestion will be much appreciated.
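To check that normalizing the inputs along the last axis and the kernel along axis 0 before the matmul really yields cosine similarities, the arithmetic can be reproduced in plain NumPy. This is a sketch of the math only, not a Keras layer: `l2_normalize` below imitates `tf.nn.l2_normalize`, and `cosine_dense`, the random data and the batch size of 4 are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis, eps=1e-12):
    # Same idea as tf.nn.l2_normalize: x / sqrt(max(sum(x**2), eps))
    sq_sum = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(sq_sum, eps))

def cosine_dense(inputs, kernel):
    # inputs: (batch, 256), kernel: (256, 22), as in a Dense layer
    a = l2_normalize(inputs, axis=-1)  # unit-length rows
    b = l2_normalize(kernel, axis=0)   # unit-length columns
    return a @ b                       # (batch, 22) cosine similarities

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 256))    # stand-in for the embedding output
class_means = rng.normal(size=(22, 256))  # stand-in for the (22, 256) class matrix
scores = cosine_dense(embeddings, class_means.T)

# Reference: cosine similarity computed pair by pair
expected = np.array([
    [np.dot(e, c) / (np.linalg.norm(e) * np.linalg.norm(c)) for c in class_means]
    for e in embeddings
])
print(np.allclose(scores, expected))  # True
```

Because every score lies in [-1, 1], a softmax applied directly to these values produces fairly flat probabilities.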

Make fixed timestep length LSTM Keras model free timestep length

I have a Keras LSTM multitask model that performs two tasks. One is a sequence tagging task (so I predict a label per token). The other is a global classification task over the whole sequence using a CNN that is stacked on the hidden states of the LSTM.
In my setup (don't ask why) I only need the CNN task during training; the labels it predicts have no use in the final product. In Keras, one can train an LSTM model without specifying the input sequence length, like this:
l_input = Input(shape=(None,), dtype="int32", name=input_name)
However, if I add the CNN stacked on the LSTM hidden states I need to set a fixed sequence length for the model.
l_input = Input(shape=(timesteps_size,), dtype="int32", name=input_name)
The problem is that once I have trained the model with a fixed timestep_size I can no longer use it to predict longer sequences.
In other frameworks this is not a problem. But in Keras, I cannot get rid of the CNN and change the expected input shape of the model once it has been trained.
Here is a simplified version of the model
l_input = Input(shape=(timesteps_size,), dtype="int32")
l_embs = Embedding(len(input.keys()), 100)(l_input)
l_blstm = Bidirectional(GRU(300, return_sequences=True))(l_embs)
# Sequential output
l_out1 = TimeDistributed(Dense(len(labels.keys()),
                               activation="softmax"))(l_blstm)
# Global output
conv1 = Conv1D(filters=5, kernel_size=10)(l_embs)
conv1 = Flatten()(MaxPooling1D(pool_size=2)(conv1))
conv2 = Conv1D(filters=5, kernel_size=8)(l_embs)
conv2 = Flatten()(MaxPooling1D(pool_size=2)(conv2))
conv = Concatenate()([conv1, conv2])
conv = Dense(50, activation="relu")(conv)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(conv)
# The model input is the l_input tensor defined above
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
optimizer = Adam()
model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])
I would like to know if anyone here has faced this issue, and if there are any solutions to delete layers from a model after training and, more important, how to reshape input layer sizes after training.
Thanks
The variable timestep length is not a problem because of the convolution layers (the good thing about convolution layers is precisely that they do not depend on the input size). The problem comes from the Flatten layers: their output size depends on the sequence length, so the Dense layer after them needs a fixed, known length. Instead, you can use global pooling layers, whose output size depends only on the number of filters. Further, I think stacking convolution and pooling layers on top of each other might give a better result than using two separate convolution layers and merging them (although this depends on the specific problem and dataset you are working on). Considering these two points, it might be better to write your model like this:
# Global output
conv1 = Conv1D(filters=16, kernel_size=5)(l_embs)
conv1 = MaxPooling1D(pool_size=2)(conv1)
conv2 = Conv1D(filters=32, kernel_size=5)(conv1)
conv2 = MaxPooling1D(pool_size=2)(conv2)
gpool = GlobalAveragePooling1D()(conv2)
x = Dense(50, activation="relu")(gpool)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(x)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
You may need to tune the number of conv+maxpool layers, number of filters, kernel size and even add dropout or batch normalization layers.
As a side note, using TimeDistributed on a Dense layer is redundant as the Dense layer is applied on the last axis.
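The Flatten versus global pooling point can be checked with plain array shapes. The following NumPy sketch is an illustration only: `conv1d_valid` is a naive stand-in for a Conv1D layer without padding, and the sequence lengths (20 and 50), the 8 input channels and the 16 filters are made-up numbers.

```python
import numpy as np

def conv1d_valid(x, kernel):
    # Naive 1D convolution without padding.
    # x: (steps, in_channels), kernel: (kernel_size, in_channels, filters)
    k = kernel.shape[0]
    steps_out = x.shape[0] - k + 1
    return np.stack([
        np.tensordot(x[t:t + k], kernel, axes=([0, 1], [0, 1]))
        for t in range(steps_out)
    ])  # (steps_out, filters)

rng = np.random.default_rng(0)
kernel = rng.normal(size=(5, 8, 16))  # kernel_size=5, 8 input channels, 16 filters

flat_sizes, pooled_sizes = [], []
for steps in (20, 50):
    feats = conv1d_valid(rng.normal(size=(steps, 8)), kernel)
    flat_sizes.append(feats.reshape(-1).shape[0])   # what Flatten would produce
    pooled_sizes.append(feats.mean(axis=0).shape[0])  # what global average pooling would produce

print(flat_sizes)    # [256, 736] -> changes with the sequence length
print(pooled_sizes)  # [16, 16]   -> fixed, so a Dense layer can follow
```

The same convolution weights work for both lengths; only the flattened size changes.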

How to use Bidirectional RNN and Conv1D in keras when shapes are not matching?

I am brand new to deep learning, so I'm reading through Deep Learning with Keras by Antonio Gulli and learning a lot. I want to start using some of the concepts. I want to try to implement a neural network with a 1-dimensional convolutional layer that feeds into a bidirectional recurrent layer (like the paper below). All the tutorials or code snippets I've encountered either do not implement anything remotely similar to this (e.g. image recognition) or use an older version of keras with different functions and usage.
What I'm trying to do is a variation of this paper:
(1) convert DNA sequences to one-hot encoding vectors; ✓
(2) use a 1 dimensional convolutional neural network; ✓
(3) with max pooling; ✓
(4) send the output to a bidirectional RNN; ⓧ
(5) classify the input;
I cannot figure out how to get the shapes to match up on the Bidirectional RNN. I can't even get an ordinary RNN to work at this stage. How can I restructure the incoming layers to work with a Bidirectional RNN?
Note:
The original code came from https://github.com/uci-cbcl/DanQ/blob/master/DanQ_train.py but I simplified the output layer to just do binary classification. This process was described (kind of) in https://github.com/fchollet/keras/issues/3322 but I cannot get it to work with the updated keras. The original code (and the 2nd link) work on a very large dataset, so I am generating some fake data to illustrate the concept. They also use an older version of keras, and key functionality has changed since then.
# Imports
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.layers.core import *
from tensorflow.python.keras._impl.keras.layers import Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Input
from tensorflow.python.keras._impl.keras.models import Model, Sequential
# Set up TensorFlow backend
K = tf.keras.backend
K.set_session(tf.Session())
np.random.seed(0) # For keras?
# Constants
NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2
NUMBER_OF_SAMPLES_IN_EACH_CLASS = 25
# Generate sequences
https://pastebin.com/GvfLQte2
# Build model
# ===========
# Input Layer
input_layer = Input(shape=(NUMBER_OF_POSITIONS,4))
# Hidden Layers
y = Conv1D(100, 10, strides=1, activation="relu", )(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)
y = Flatten()(y)
y = Bidirectional(SimpleRNN(100, return_sequences = True, activation="tanh", ))(y)
y = Flatten()(y)
y = Dense(100, activation='relu')(y)
# Output layer
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)
model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy", )
model.summary()
# ~/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/recurrent.py in build(self, input_shape)
# 1049 input_shape = tensor_shape.TensorShape(input_shape).as_list()
# 1050 batch_size = input_shape[0] if self.stateful else None
# -> 1051 self.input_dim = input_shape[2]
# 1052 self.input_spec[0] = InputSpec(shape=(batch_size, None, self.input_dim))
# 1053
# IndexError: list index out of range
You don't need to restructure anything at all to get the output of a Conv1D layer into an LSTM layer.
The problem is simply the Flatten layer between them: it collapses the 3D tensor (batch, length, channels) into a 2D one, so the recurrent layer no longer finds a features axis.
These are the shapes used by Conv1D and LSTM:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
Length is the same as timeSteps, and channels is the same as features.
Using the Bidirectional wrapper won't change a thing either. With its default merge mode (concatenation), it only doubles the number of output features.
Classifying:
If you're going to classify the entire sequence as a whole, your last LSTM must use return_sequences=False (or you can keep return_sequences=True and use Flatten + Dense after it).
If you're going to classify each step of the sequence, all your LSTMs should have return_sequences=True, and you should not flatten the data after them.
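The shapes in the question's model can be worked out by hand with the usual output-length formula for layers without padding. The helper names below are made up for this sketch; the numbers (40 positions, kernel size 10, pool size 5 with stride 5, 100 filters, 100 RNN units) are taken from the question.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    # Output length of a Conv1D layer without padding
    return (length - kernel_size) // stride + 1

def pool1d_out_len(length, pool_size, stride):
    # Output length of a MaxPooling1D layer without padding
    return (length - pool_size) // stride + 1

conv_len = conv1d_out_len(40, kernel_size=10)            # 31
pool_len = pool1d_out_len(conv_len, pool_size=5, stride=5)  # 6

# What the recurrent layer needs: (batch, timeSteps, features)
shape_without_flatten = [None, pool_len, 100]  # [None, 6, 100]
# What it gets after Flatten(): (batch, timeSteps * features)
shape_with_flatten = [None, pool_len * 100]    # [None, 600]

# The recurrent layer reads input_shape[2]; after Flatten there is no such axis
try:
    shape_with_flatten[2]
    flatten_breaks_rnn = False
except IndexError:
    flatten_breaks_rnn = True

bidirectional_features = 2 * 100  # forward + backward outputs concatenated
print(conv_len, pool_len, flatten_breaks_rnn, bidirectional_features)  # 31 6 True 200
```

Dropping the first Flatten() leaves a (batch, 6, 100) tensor, which is exactly the layout the recurrent layer expects.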

Keras Conv2D layer outputs array filled with NaN

I built a keras model that takes an image as input and performs several convolutions and a pooling operation, then performs a specialized convolution layer with pre-initialized weights. When run on an image, this model outputs an array of the correct shape, but with all the elements as NaN.
The first part of the model is the first "block" of the pretrained VGG16 model for keras. The specialized layer (keras.layers.Conv2D) takes its weights from a set of filters corresponding to certain features I want to extract from the image. It does not matter if I flip the filters (to do cross-correlation) or if I change the image: the output is always NaN. Any ideas?
EDIT: here is code. Takes a numpy image array as input.
def make_model(features, layer_name="block2_conv1"):
    vgg = VGG16(include_top=False)
    layer = vgg.get_layer(layer_name)
    x = layer.output
    num_chars, char_w, char_h, char_filters = features.shape
    filters = features.transpose((1, 2, 3, 0)).astype(int)
    filters = filters / np.sqrt(np.sum(np.square(filters), axis=(0, 1), keepdims=True))
    x = BatchNormalization()(x)
    specialized_layer = Conv2D(num_chars, (char_w, char_h))
    x = specialized_layer(x)
    biases = np.zeros((num_chars, ))
    specialized_layer.set_weights([filters, biases])
    model = Model(inputs=vgg.input, outputs=x)
    return model
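No cause was confirmed in this thread, so the following is only one thing worth checking, not a diagnosis. If `features` holds float values below 1, `astype(int)` truncates them to zero; the norm of an all-zero filter slice is then zero, and 0/0 gives NaN weights, which would make every output of the layer NaN. The sketch below reproduces that with made-up data; the array shape and the `1e-12` floor are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in for `features`:
# (num_chars, char_w, char_h, char_filters), floats in [0, 1)
features = rng.random((3, 2, 2, 4))

# Same steps as in make_model: the int cast turns every value into 0
filters = features.transpose((1, 2, 3, 0)).astype(int)
with np.errstate(invalid="ignore", divide="ignore"):
    bad = filters / np.sqrt(np.sum(np.square(filters), axis=(0, 1), keepdims=True))
print(np.isnan(bad).all())  # True: every weight is 0/0

# Keeping the floats and flooring the norm avoids the division by zero
filters_f = features.transpose((1, 2, 3, 0)).astype(float)
norm = np.sqrt(np.sum(np.square(filters_f), axis=(0, 1), keepdims=True))
safe = filters_f / np.maximum(norm, 1e-12)
print(np.isfinite(safe).all())  # True
```

A quick way to test this on the real data is to run `np.isnan(filters).any()` just before `set_weights`.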

Keras - How to construct a shared Embedding() Layer for each Input-Neuron

I want to create a deep neural network in keras, where each element of the input layer is "encoded" using the same, shared Embedding()-layer, before it is fed into the deeper layers.
Each input would be a number that defines the type of an object, and the network should learn an embedding that encapsulates some internal representation of "what this object is".
So, if the input layer has X dimensions, and the embedding has Y dimensions, the first hidden layer should consist of X*Y neurons (each input neuron embedded).
Here is a little image that should show the network architecture that I would like to create, where each input-element is encoded using a 3D-Embedding
How can I do this?
from keras.layers import Input, Embedding
first_input = Input(shape = (your_shape_tuple) )
second_input = Input(shape = (your_shape_tuple) )
...
# Embedding needs the number of distinct input values and the embedding size;
# `num_object_types` is a placeholder name for the former
embedding_layer = Embedding(num_object_types, embedding_size)
first_input_encoded = embedding_layer(first_input)
second_input_encoded = embedding_layer(second_input)
...
Rest of the model....
The embedding_layer will have shared weights. If you have a lot of inputs, you can keep the inputs and their encoded outputs in lists.
If what you want is to transform a single tensor of inputs, the way to do it is:
from keras.layers import Input, Embedding
# If your inputs are all fed in one numpy array:
input_layer = Input(shape = (num_input_indices,) )
# per sample, the output of this layer is a 2D tensor of shape (num_input_indices, embedding_size)
# (`num_object_types` is a placeholder name for the number of distinct input values)
embedded_input = Embedding(num_object_types, embedding_size)(input_layer)
Is this what you were looking for?
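What a shared embedding does to a batch of type ids can be illustrated with a plain NumPy lookup table. This is a sketch of the idea only, not Keras code; the 10 object types, the 4 inputs per sample and the random table are made-up values, while the 3-dimensional embedding follows the question.

```python
import numpy as np

num_object_types = 10  # hypothetical number of distinct object types
embedding_size = 3     # Y in the question
num_inputs = 4         # X in the question

rng = np.random.default_rng(0)
# One shared table: row i is the embedding of object type i
embedding_table = rng.normal(size=(num_object_types, embedding_size))

# A batch of 2 samples, each with X = 4 object-type ids
batch = np.array([[0, 3, 3, 7],
                  [9, 0, 1, 1]])

embedded = embedding_table[batch]  # (2, 4, 3): every id looked up in the same table
first_hidden_input = embedded.reshape(len(batch), -1)  # (2, 12): X*Y values per sample

print(embedded.shape, first_hidden_input.shape)  # (2, 4, 3) (2, 12)
```

In Keras, a Flatten() layer after the Embedding layer plays the role of the reshape, giving the X*Y values that feed the first hidden layer.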
