How to concatenate an input and a matrix in Keras - python

I am building a model in Keras. I have an input
X = Input(shape=(input_size, ), name='input_feature')
and a fixed, pre-given numpy matrix D of shape (input_size, n).
I want to concatenate X and D before feeding them to the next layer. In other words, I need to concatenate each slice of X with D to generate a new input whose expected shape is (None, input_size, n+1). So what should I do to concatenate them? In my understanding, the batch size is None since it will adapt to the batch size of the input X when we fit data to the model.

Answer if D has shape (batch, input_size, n)
Provided that D is a tensor (it's a tensor if it's an output from some layer):
X = Reshape((input_size,1))(X)
concat = Concatenate()([D,X])
If D is not a tensor:
import keras.backend as K
#create a tensor:
Dval = K.variable(numpyArrayForD)
#create an input for D:
D = Input(tensor=Dval)
#do as in the top of this answer.
If you want to avoid the additional Input (it will not affect the way you train, because of the tensor parameter), you can use a lambda layer:
def concatenation(x):
    D = K.variable(D_df)
    return K.concatenate([x, D])
XD = Lambda(concatenation, output_shape=(input_size, n+1))(X)
Answer if D has shape (input_size, n)
In this case, it's probably better to replicate D many times. You can do this outside of the model, using numpy functions before creating the K.variable (see the other answer), like this:
D_df = D_df.reshape((1,input_size,n))
D_df = numpy.repeat(D_df,batch,axis=0)
But this approach requires you to know the batch size of X beforehand.
If you want something that adapts to any batch size of X without having to change D beforehand, it's more complicated; one possibility is sketched below.
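A minimal sketch of that adaptive variant, assuming D_df holds the numpy matrix of shape (input_size, n): tile D inside a Lambda layer so it follows the dynamic batch size of X.
import keras.backend as K
from keras.layers import Lambda
def concat_with_D(x):
    D = K.constant(D_df.reshape((1, input_size, n))) #shape (1, input_size, n)
    batch = K.shape(x)[0]
    D_tiled = K.tile(D, [batch, 1, 1]) #shape (batch, input_size, n)
    x = K.expand_dims(x, axis=-1) #shape (batch, input_size, 1)
    return K.concatenate([D_tiled, x], axis=-1) #shape (batch, input_size, n+1)
XD = Lambda(concat_with_D, output_shape=(input_size, n+1))(X)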

Related

Preserving unknown batch dimension for custom static tensors in Tensorflow

Some notes: I'm using tensorflow 2.3.0, python 3.8.2, and numpy 1.18.5 (not sure if that one matters though)
I'm writing a custom layer that stores a non-trainable tensor N of shape (a, b) internally, where a, b are known values (this tensor is created during init). When called on an input tensor, it flattens the input tensor, flattens its stored tensor, and concatenates the two together. Unfortunately, I can't seem to figure out how to preserve the unknown batch dimension during this concatenation. Here's minimal code:
import tensorflow as tf
from tensorflow.keras.layers import Layer, Flatten
class CustomLayer(Layer):
    def __init__(self, N): # N is a tensor of shape (a, b), where a, b > 1
        super(CustomLayer, self).__init__()
        self.N = self.add_weight(name="N", shape=N.shape, trainable=False,
                                 initializer=lambda *args, **kwargs: N)
        # correct me if I'm wrong in using this initializer approach, but for some reason,
        # when I just do self.N = N, this variable would disappear when I saved and loaded the model
    def build(self, input_shape):
        pass # my reasoning is that all the necessary stuff is handled in init
    def call(self, input_tensor):
        input_flattened = Flatten()(input_tensor)
        N_flattened = Flatten()(self.N)
        return tf.concat((input_flattened, N_flattened), axis=-1)
The first problem I noticed was that Flatten()(self.N) would return a tensor with the same shape (a, b) as the original self.N, and as a result, the returned value would have a shape of (a, num_input_tensor_values+b). My reasoning for this was that the first dimension, a, was treated as the batch size. I modified the call function:
def call(self, input_tensor):
    input_flattened = Flatten()(input_tensor)
    N = tf.expand_dims(self.N, axis=0) # N would now be shape (1, a, b)
    N_flattened = Flatten()(N)
    return tf.concat((input_flattened, N_flattened), axis=-1)
This would return a tensor with shape (1, num_input_vals + a*b), which is great, but now the batch dimension is permanently 1, which I realized when I started training a model with this layer and it would only work for a batch size of 1. This is also really apparent in the model summary - if I were to put this layer after an input and add some other layers afterwards, the first dimension of the output tensors goes like None, 1, 1, 1, 1.... Is there a way to store this internal tensor and use it in call while preserving the variable batch size? (For example, with a batch size of 4, a copy of the same flattened N would be concatenated onto the end of each of the 4 flattened input tensors.)
You have to have as many flattened N vectors as you have samples in your input, because you are concatenating one to every sample. Think of it like pairing up rows and concatenating them. If you have only one N vector, then only one pair can be concatenated.
To solve this, you should use tf.tile() to repeat N as many times as there are samples in your batch.
Example:
def call(self, input_tensor):
    input_flattened = Flatten()(input_tensor) # shape: (None, ...)
    N = tf.expand_dims(self.N, axis=0) # N shape: (1, a, b)
    N_flattened = Flatten()(N) # N_flattened shape: (1, a*b)
    # repeat along the first dim as many times as there are samples; leave the second dim alone
    N_tiled = tf.tile(N_flattened, [tf.shape(input_tensor)[0], 1])
    return tf.concat((input_flattened, N_tiled), axis=-1)
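A quick shape check of this call, with hypothetical values a=3, b=4, a batch of 8 samples, and 5 input features:
N = tf.random.uniform((3, 4)) # a=3, b=4
layer = CustomLayer(N)
out = layer(tf.random.uniform((8, 5))) # batch of 8 samples with 5 features each
print(out.shape) # (8, 5 + 3*4) = (8, 17)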

How to get output from randomly sampled k entries from a tensor

I have a keras/tf problem using sub-sampling of values from a tensor. My model is given below:
x_input = Input((input_size,))
enc1 = Dense(encoder_size[0], activation='relu')(x_input)
drop = Dropout(keep_prob)(enc1)
enc2 = Dense(encoder_size[1], activation='relu')(drop)
drop = Dropout(keep_prob)(enc2)
mu = Dense(latent_dim, activation='linear', name='encoder_mean')(drop)
encoder = Model(x_input,mu)
I want to sample from the input randomly and then get the encoded values of the input. The error I am getting is
ValueError: When feeding symbolic tensors to a model, we expect the tensors to have a static batch size. Got tensor with shape: (None, 13)
which I can understand is because "predict" does not work on placeholders, but I am not sure what to pass to get the output for a placeholder.
# sample input randomly
sample_num = 500
idxs = tf.range(tf.shape(x_input)[0])
ridxs = tf.random_shuffle(idxs)[:sample_num]
sample_input = tf.gather(x_input, ridxs)
# get sample shape
sample_shape = K.shape(sample_input)
# sample from encoded value
sample_encoded = encoder.predict(sample_input) <----- Error
If you look at the predict function's documentation, you'll see that it does not accept a placeholder or a tensor node as input. You have to pass a NumPy array directly (in your case).
If you wish to perform some special data preprocessing which is not part of your regular model, you have to do it in Numpy and avoid Tensor computations for it.
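For example, a minimal sketch that performs the random sampling in NumPy (assuming x_train is the numpy array you would normally feed to the model):
import numpy as np
sample_num = 500
idxs = np.random.permutation(x_train.shape[0])[:sample_num]
sample_input = x_train[idxs] # a plain numpy array, not a tensor
sample_encoded = encoder.predict(sample_input)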

Train on transformed output

I have a recurrent neural network model that maps an (N,) sequence to an (N,3) sequence. My target outputs are actually (N,N) matrices. However, I have a deterministic function, implemented in numpy, that converts (N,3) into these (N,N) matrices in the particular way I want. How can I use this operation in training? I.e., currently my neural network puts out (N,3) sequences; how do I apply my function to convert them to (N,N) before calling keras.fit?
Edit: I should also note that it is much harder to write the reverse function from (N,N) to (N,3), so just converting my target outputs to the (N,3) representation is not a viable option.
You can use a Lambda layer as the last layer of your model:
def convert_to_n_times_n(x):
    # transform x from shape (N, 3) to (N, N) here
    ...
transformation_layer = tf.keras.layers.Lambda(convert_to_n_times_n)
You probably want to use "tf-native methods" within your function as much as possible to avoid unnecessary conversions of tensors to numpy arrays and back.
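Purely as an illustration (the real body is whatever your numpy conversion does, rewritten with tf ops), a hypothetical tf-native transform from (N, 3) to (N, N) could look like this:
import tensorflow as tf
def convert_to_n_times_n(x):
    # x arrives batched as (batch, N, 3); a tensor product gives (batch, N, N)
    return tf.matmul(x, tf.transpose(x, perm=[0, 2, 1]))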
If you only want to use the layer during training, but not during inference, you can achieve that using the functional API:
# create your original model (N,) -> (N, 3)
input_ = Input(shape=(N,))
x = SomeFancyLayer(...)(input_)
x = ...
...
inference_output = OtherFancyLayer(...)(x)
inference_model = Model(inputs=input_, outputs=inference_output)
# create & fit the training model
training_output = transformation_layer(inference_output)
training_model = Model(inputs=input_, outputs=training_output)
training_model.compile(...)
training_model.fit(X, Y)
# run inference using your original model
inference_model.predict(...)

Writing this exotic NN architecture with keras, tensorflow and python

I'm trying to get Keras to train a multiclass classification model that can be written in a network like this:
The only set of trainable parameters are the coefficients a_pk; all the rest is given. The functions f_i are combinations of usual mathematical functions. Sigma stands for summing the previous terms, and softmax is the usual function. The (x1,x2,...,xn) are elements of the train or test set, and the x_pk are a specific subset of the original data, already selected.
The model in more depth:
Specifically, given an input (x_1,x_2,...,x_n) in the train or test set, the network evaluates, for each row p = 1,...,m,
s_p = sum_k a_pk * f(x_k, x_pk)
where f is a given mathematical function, the x_pk are rows of a particular subset of the original data, and the coefficients a_pk are the parameters I want to train.
As I'm using keras, I expect it to add a bias term to each row.
After the above evaluation, I will apply a softmax layer (each of the m lines above is a number that will be an input for the softmax function).
At the end I want to compile the model and run model.fit as usual.
The problem is that I couldn't translate the expression to Keras syntax.
My attempt:
Following the network sketch above, I first tried to consider each of the expressions of the form sum_k a_pk * f(x_k, x_pk) as a Lambda layer in a Sequential model, but the best I could get to work was a combination of a Dense layer with linear activation (which would play the role of a row's parameters a_pk) followed by a Lambda layer outputting a vector without the required summation, as follows:
model = Sequential()
#single row considered:
model.add(Lambda(lambda x: f_fixedRow(x), input_shape=(nFeatures,)))
#parameters set after lambda layer to get (a1*f(x1,y1),...,an*f(xn,yn)) and not (f(a1*x1,y1),...,f(an*xn,yn))
model.add(Dense(nFeatures, activation='linear'))
#missing summation: sum(x)
#missing evaluation of f in all other rows
model.add(Dense(classes,activation='softmax',trainable=False)) #should get all rows
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Also, I had to define the function in the lambda function call with the argument already fixed (because the lambda function could have only the input layers as variable):
def f_fixedRow(x):
    #picking a particular row (as a vector) to evaluate f in (f works element-wise)
    y = tf.constant(value=x[0,:], dtype=tf.float32)
    return f(x, y)
I managed to write the f function with tensorflow (working element-wise in a row), although this is a possible source for problems in my code (and the above workaround seems unnatural).
I also thought that if I could properly write the element-wise sum of the vector in the aforementioned attempt, I could repeat the same procedure in a parallelized manner with the Keras functional API, and then insert the output of each parallel model into a softmax function, as I need.
Another approach that I considered was to train the parameters keeping their natural matrix structure seen in Network Description, maybe writing a matrix Lambda layer, but I could not find anything related to this idea.
Anyway, I'm not sure what is a good way to work with this model within keras, maybe I'm missing an important point because of the non standard way the parameters are written or lack of experience with tensorflow. Any suggestions are welcome.
For this answer, it's important that f be a tensor function that operates elementwise (no iterating). This is reasonably easy to have; just check the keras backend functions.
Assumptions:
The x_pk set is constant, otherwise this solution must be reviewed.
The function f is elementwise (if not, please show f for better code)
Your model will need x_pk as a tensor input. And you should do that in a functional API model.
import keras.backend as K
from keras.layers import Input, Lambda, Activation
from keras.models import Model
#x_pk data
x_pk_numpy = select_X_pk_samples(x_train)
x_pk_tensor = K.variable(x_pk_numpy)
#number of rows in x_pk
m = len(x_pk_numpy)
#I suggest a fixed batch size for simplicity
batch = some_batch_size
First let's work on the function that will take x and x_pk and call f.
def calculate_f(inputs): #inputs will be a list with x and x_pk
    x, x_pk = inputs
    #since f will work elementwise, let's replicate x and x_pk so they have equal shapes
    #please explain f for better optimization
    # x from (batch, n) to (batch, m, n)
    x = K.stack([x]*m, axis=1)
    # x_pk from (m, n) to (batch, m, n)
    x_pk = K.stack([x_pk]*batch, axis=0)
    #a batch size of 1 could make this even simpler
    #a variable batch size would make this more complicated
    #certain f functions could make this process unnecessary
    return f(x, x_pk)
Now, different from a Dense layer, this formula is using the a_pk weights multiplied elementwise. So we need a custom layer:
class ElementwiseWeights(Layer):
    def __init__(self, **kwargs):
        super(ElementwiseWeights, self).__init__(**kwargs)
    def build(self, input_shape):
        weight_shape = (1,) + input_shape[1:] #shape (1, m, n)
        self.kernel = self.add_weight(name='kernel',
                                      shape=weight_shape,
                                      initializer='uniform',
                                      trainable=True)
        super(ElementwiseWeights, self).build(input_shape)
    def compute_output_shape(self, input_shape):
        return input_shape
    def call(self, inputs):
        return self.kernel * inputs
Now let's build our functional API model:
#x_pk model tensor input
x_pk = Input(tensor=x_pk_tensor) #shape (m, n)
#x usual input with fixed batch size
x = Input(batch_shape=(batch,n)) #shape (batch, n)
#calculate F
out = Lambda(calculate_f)([x, x_pk]) #shape (batch, m, n)
#multiply a_pk
out = ElementwiseWeights()(out) #shape (batch, m, n)
#sum n elements, keep m rows:
out = Lambda(lambda x: K.sum(x, axis=-1))(out) #shape (batch, m)
#softmax
out = Activation('softmax')(out) #shape (batch,m)
Continue this model with whatever you want and finish it:
model = Model([x, x_pk], out)
model.compile(.....)
model.fit(x_train, y_train, ....) #perhaps you might need .fit([x_train], y_train, ...)
Edit for function f
You can have the proposed f like this:
#create the n coefficients:
coefficients = np.array([c0, c1, .... , cn])
coefficients = coefficients.reshape((1,1,n))
def f(x, x_pk):
    c = K.variable(coefficients) #shape (1, 1, n)
    out = (x - x_pk) / c
    return K.exp(out)
This f would accept x with shape (batch, 1, n), without the stack used in the calculate_f function.
Or could accept x_pk with shape (1, m, n), allowing variable batch size.
But I'm not sure it's possible to have both of these shapes together. Testing this might be interesting.
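A quick sketch of such a test, with hypothetical small sizes (this relies on broadcasting instead of the stacking done in calculate_f, and assumes the coefficients above were created with matching n):
import numpy as np
import keras.backend as K
n, m, batch = 4, 3, 2
x = K.variable(np.random.rand(batch, 1, n)) #x kept as (batch, 1, n)
x_pk = K.variable(np.random.rand(1, m, n)) #x_pk kept as (1, m, n)
out = f(x, x_pk) #broadcasting should yield (batch, m, n)
print(K.int_shape(out)) #expect (2, 3, 4)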

Retrieving last value of LSTM sequence in Tensorflow

I have sequences of different lengths that I want to classify using LSTMs in Tensorflow. For the classification I just need the LSTM output of the last timestep of each sequence.
max_length = 10
n_dims = 2
layer_units = 5
input = tf.placeholder(tf.float32, [None, max_length, n_dims])
lengths = tf.placeholder(tf.int32, [None])
cell = tf.nn.rnn_cell.LSTMCell(num_units=layer_units, state_is_tuple=True)
sequence_outputs, last_states = tf.nn.dynamic_rnn(cell, sequence_length=lengths, inputs=input)
I would like to get, in NumPy notation: output = sequence_outputs[:,lengths]
Is there any way or workaround to get this behaviour in Tensorflow?
---UPDATE---
Following this post How to select rows from a 3-D Tensor in TensorFlow?, it seems that it is possible to solve the problem in an efficient manner with tf.gather and by manipulating the indices. The only requirement is that the batch size must be known in advance. Here is the adaptation of the referred post to this concrete problem:
max_length = 10
n_dims = 2
layer_units = 5
batch_size = 2
input = tf.placeholder(tf.float32, [batch_size, max_length, n_dims])
lengths = tf.placeholder(tf.int32, [batch_size])
cell = tf.nn.rnn_cell.LSTMCell(num_units=layer_units, state_is_tuple=True)
sequence_outputs, last_states = tf.nn.dynamic_rnn(cell, sequence_length=lengths, inputs=input)
#Code adapted from #mrry response in StackOverflow:
#https://stackoverflow.com/questions/36088277/how-to-select-rows-from-a-3-d-tensor-in-tensorflow
rows_per_batch = tf.shape(input)[1]
indices_per_batch = 1
# Offset to add to each row in indices. We use `tf.expand_dims()` to make
# this broadcast appropriately.
offset = tf.range(0, batch_size) * rows_per_batch
# Convert indices and logits into appropriate form for `tf.gather()`.
flattened_indices = lengths - 1 + offset
flattened_sequence_outputs = tf.reshape(sequence_outputs,
                                        tf.concat(0, [[-1], tf.shape(sequence_outputs)[2:]]))
selected_rows = tf.gather(flattened_sequence_outputs, flattened_indices)
last_output = tf.reshape(selected_rows,
                         tf.concat(0, [tf.pack([batch_size, indices_per_batch]),
                                       tf.shape(sequence_outputs)[2:]]))
The @petrux option (Get the last output of a dynamic_rnn in TensorFlow) also seems to work, but the need to build a list within a for loop may be less optimized, although I did not perform any benchmark to support this statement.
This could be an answer. I don't think there is anything similar to the NumPy notation you pointed out, but the effect is the same.
Here's a solution, using gather_nd, where batch size does not need to be known ahead of time.
def extract_axis_1(data, ind):
    """
    Get specified elements along the first axis of tensor.
    :param data: Tensorflow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data).
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])
    indices = tf.stack([batch_range, ind], axis=1)
    res = tf.gather_nd(data, indices)
    return res
output = extract_axis_1(sequence_outputs, lengths - 1)
Now output is a tensor of dimension [batch_size, num_cells].
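A quick sanity check with dummy data, sketched in the same TF1 style as the rest of this post (it assumes the input and lengths placeholders from the question and the output above):
import numpy as np
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {input: np.random.rand(3, max_length, n_dims),
            lengths: np.array([4, 10, 7])}
    print(sess.run(output, feed_dict=feed).shape) # (3, 5), i.e. (batch_size, layer_units)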
