Creating a constant value in Keras - python

I am trying to create a constant inside a Keras model. What I have been doing until now is passing it in as an Input, but since it is always the same value I would like it to be a constant instead. (The input is [1, 2, 3, ..., 50] for each example, so I use np.tile(np.array(range(50)),(len(X_input))) to reproduce it for each example.)
So far I have:
constant_input = Input(shape=(50,), dtype='int32', name="constant_input")
which gives the tensor: Tensor("constant_input", shape=(?, 50), dtype=int32)
Now trying to do it as a constant:
np_constant = np.array(list(range(50))).reshape(1, 50)
tf_constant = K.constant(np_constant)
tensor_constant = Input(tensor=tf_constant, shape=(50,), dtype='int32', name="constant_input")
which gives the tensor: Tensor("constant_input", shape=(50, 1), dtype=float32)
But what I want is for the constant to be repeated for each example in the batch, meaning the shape of the tensor should be (?, 50), the same as when using Input.
Is it possible to do that?

You cannot have a constant with a variable size; a constant always has the same value. What you can do is keep the (1, 50) constant and then tile it within TensorFlow with K.tile(). Also, prefer np.arange(50) over np.array(list(range(50))). Something like:
import numpy as np
from keras.layers import Input
from keras.layers.core import Lambda
import keras.backend as K

def operateWithConstant(input_batch):
    tf_constant = K.constant(np.arange(50).reshape((1, 50)))
    batch_size = K.shape(input_batch)[0]
    tiled_constant = K.tile(tf_constant, (batch_size, 1))
    # Do some operation with tiled_constant and input_batch
    result = ...
    return result

input_batch = Input(...)
input_operated = Lambda(operateWithConstant)(input_batch)
# continue...
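For instance, here is a minimal runnable sketch (the addition is just a stand-in operation, and the length of 50 is carried over from the question):
import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model
import keras.backend as K

def addPositionConstant(input_batch):
    # (1, 50) row holding 0..49, tiled to (batch_size, 50) at run time
    positions = K.constant(np.arange(50).reshape((1, 50)))
    batch_size = K.shape(input_batch)[0]
    return input_batch + K.tile(positions, (batch_size, 1))

input_batch = Input(shape=(50,))
output = Lambda(addPositionConstant)(input_batch)
model = Model(inputs=input_batch, outputs=output)
print(model.predict(np.zeros((3, 50))).shape)  # (3, 50), for any batch size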

Related

How to use the batch_size of a Keras tensor at model building time?

I want to use an external program as a custom operation.
Because automatic gradients are not available for it, I wrote code that provides the gradients numerically. However, because it has to compute batch_size separate derivatives,
I wrote it to get batch_size from the shape of x.
The following is an example that uses a NumPy function in place of the external program:
f(x) = np.sum(x**2)
(In fact, for this simple NumPy function no loop over batch_size is necessary, but it is written this way to stand in for a general external function.)
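Concretely, the code below uses the forward difference grad_j ≈ (f(x + delta·e_j) − f(x)) / delta with delta = 0.001·|x_j|.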
import numpy as np
import tensorflow as tf

@tf.custom_gradient
def custom_op(x):
    # without using numpy, use external function
    # assume x shape = (batch_size, 3)
    batch_size = x.shape[0]
    input_length = x.shape[1]
    # assert input_length == 3
    yout = []  # shape should be (batch_size, 1)
    gout = []  # shape should be (batch_size, 3)
    for i in range(batch_size):
        inputs = x[i, :]  # shape (3,)
        y = np.sum(inputs**2)  # scalar
        yout.append(y)
        # compute finite differences
        dy = []
        for j in range(len(inputs)):
            delta = np.zeros_like(inputs)
            delta[j] = np.abs(inputs[j]) * 0.001
            yplus = np.sum((inputs + delta)**2)  # change only the j-th input
            grad = (yplus - y) / delta[j]  # scalar
            dy.append(grad)
        gout.append(dy)
    yout = tf.convert_to_tensor(yout, dtype='float32')  # (batch_size,)
    yout = tf.reshape(yout, shape=(batch_size, 1))  # (batch_size, 1)
    gout = tf.convert_to_tensor(gout, dtype='float32')  # (batch_size, 3)
    gout = tf.reshape(gout, shape=(batch_size, input_length))  # (batch_size, 3)
    def grad(upstream):
        return upstream * gout
    return yout, grad
x = tf.Variable([[1., 2., 3.], [2., 3., 4.]], dtype='float32')
with tf.GradientTape() as tape:
    y = custom_op(x)
tape.gradient(y, x)  # ≈ 2 * x, the analytic gradient
and I found that it works.
However, when I tried to use it in a Keras model, for example,
def construct_model():
    inputs = tf.keras.Input(shape=(3,))  # input array
    x = tf.keras.layers.Dense(1)(inputs)
    outputs = custom_op(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    optimizer = 'adam'
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model

model = construct_model()
it raises errors, because the KerasTensor inputs does not have a specified batch_size.
I tried to specify the batch size with tf.keras.Input(shape=(3,), batch_size=2).
However, that also raises errors because of the use of KerasTensors.
How should I change custom_op to be compatible with Keras?

Why can the Reshape function in Keras not change the number of dimensions?

I'm attempting to make a chess engine using a neural network built in Keras. I want to output a prediction of the probable policy based on training games, and I am using a 73x8x8 output to do that (each position on the board times 73 different possible moves: 8 directions * 7 squares for the "queen moves", 8 knight moves, and 3 promotions (any other promotion is a queen promotion) times 3 directions).
However, the final layer in my network is a Dense layer, which produces a one-dimensional output of length 4672. I am trying to reshape this into something easier to use through the Reshape layer.
However, it gives me this error: ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
I have had a look at this question: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (118, 1), but the answer doesn't seem to apply to Dense layers, as they do not have a return_sequences argument.
Here is my code:
from keras.models import Model, Input
from keras.layers import Conv2D, Dense, Flatten, Reshape
from keras.optimizers import SGD
import numpy
from copy import deepcopy

class NeuralNetwork:
    def __init__(self):
        self.network = Model()
        self.create_network()

    def create_network(self):
        input = Input((13, 8, 8))
        output = Conv2D(256, (3, 3), padding='same')(input)
        policy_head_output = Conv2D(2, (1, 1))(output)
        policy_head_output = Flatten()(policy_head_output)
        policy_head_output = Dense(4672, name='policy_output')(policy_head_output)
        policy_head_output = Reshape((73, 8, 8), input_shape=(4672,))(policy_head_output)
        value_head_output = Conv2D(1, (1, 1))(output)
        value_head_output = Dense(256)(value_head_output)
        value_head_output = Flatten()(value_head_output)
        value_head_output = Dense(1, name="value_output")(value_head_output)
        self.network = Model(outputs=[value_head_output, policy_head_output], inputs=input)

    def train_network(self, input_training_data, labels):
        sgd = SGD(0.2, 0.9)
        self.network.compile(sgd, 'categorical_crossentropy', metrics=['accuracy'])
        self.network.fit(input_training_data, [labels[0], labels[1]])
        self.network.save("Neural Net 1")

def make_training_data():
    training_data = []
    labels = []
    for i in range(6):
        training_data.append(make_image())
        labels.append(make_label_image())
    return training_data, labels

def make_image():
    data = []
    for i in range(13):
        blank_board = []
        for j in range(8):
            a = []
            for k in range(8):
                a.append(0)
            blank_board.append(a)
        data.append(blank_board)
    return data

def make_label_image():
    policy_logits = []
    blank_board = []
    for i in range(8):
        a = []
        for j in range(8):
            a.append(0)
        blank_board.append(a)
    for i in range(73):
        policy_logits.append(deepcopy(blank_board))
    return [policy_logits, [0]]

def main():
    input_training_data, output_training_data = make_training_data()
    neural_net = NeuralNetwork()
    input_training_data = numpy.array(input_training_data)
    output_training_data = numpy.array(output_training_data)
    neural_net.train_network(input_training_data, output_training_data)

main()
Could someone please explain:
1. What's happening
2. What I can do to fix it
There are a few things wrong with your approach.
1. Your output/target data
So you're creating a list object with two elements (board, label), where the board is 73x8x8 and the label is a single 0/1 value. This creates inconsistent dimensions, and when you convert this ragged structure to a NumPy array, this happens:
import numpy as np

a = [[0, 1, 2, 3], [0]]
arr = np.array(a)
print(arr)
# => [list([0, 1, 2, 3]) list([0])]
Then slicing and indexing the data takes a very weird turn, and I will not go there. So the first thing to do is separate out your data so that each element returned by make_training_data() has consistent dimensions. Here the input images, output board images, and output labels are returned separately:
def make_training_data():
    training_data = []
    labels = []
    board_output = []
    for i in range(6):
        training_data.append(make_image())
        board, lbl = make_label_image()
        labels.append(lbl)
        board_output.append(board)
    return training_data, board_output, labels
and in main() it becomes:
input_training_data, output_training_board, output_training_labels = make_training_data()
input_training_data = np.array(input_training_data)
output_training_board = np.array(output_training_board)
output_training_labels = np.array(output_training_labels)
2. The error
So you're getting the error
ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
Well, it's simple: you have given the outputs in the wrong order when calling model.fit(). In other words, your model says,
outputs=[value_head_output, policy_head_output]
and your make_label_image() says,
[policy_logits, [0]]
which is the other way around. Your poor model is trying to reshape the labels to that 4-dimensional structure. That's why it complains. So it should be:
neural_net.train_network(input_training_data, [output_training_labels, output_training_board])
Even if you correct just this (without fixing make_training_data()), you probably won't get this working because of all those inconsistencies in your NumPy structure (see the first section).
3. The loss function
This is about your loss function. You have a Dense layer with a single output, and you're using categorical_crossentropy, which is for "categorical" (one-hot) outputs. You should use binary_crossentropy here, as you only have a single output unit.
Also, if you want multiple losses for your multiple outputs do the following.
self.network.compile(sgd, ['binary_crossentropy', 'mean_squared_error'], metrics=['accuracy'])
This is just an example; if you want, you can use the same loss for both outputs too.

Dot pipeline data with constant matrix

Is it possible to multiply the batch in the middle of the pipeline with a constant transformation? Something along the lines of:
constant_non_trainable_matrix = numpy.array([...]) # shape (n,n)
input = tf.keras.layers.InputLayer(shape = (n,))
dense_1 = tf.keras.layers.Dense((n,))(input)
transform = MultiplyWithMatrix(constant_non_trainable_matrix)(dense_1)
output = tf.keras.layers.Dense((n,))(transform)
model = tf.keras.models.Model(inputs = input, outputs = output)
You can use a Lambda layer and backend.dot() to achieve that:
from keras import layers
from keras import backend as K
# ...
transformed = layers.Lambda(lambda x: K.dot(x, mat))(dense_1)
You need to construct the mat tensor using the backend functions as well (e.g. K.constant(), K.variable(), etc.).
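For instance, a minimal sketch of how the pieces fit together (the identity matrix here is only a placeholder for your constant, and n = 4 is an arbitrary choice):
import numpy as np
from keras import layers
from keras import backend as K
from keras.models import Model

n = 4
constant_non_trainable_matrix = np.eye(n, dtype='float32')  # placeholder (n, n) matrix
mat = K.constant(constant_non_trainable_matrix)  # fixed, non-trainable backend tensor

inputs = layers.Input(shape=(n,))
dense_1 = layers.Dense(n)(inputs)
transformed = layers.Lambda(lambda x: K.dot(x, mat))(dense_1)
output = layers.Dense(n)(transformed)
model = Model(inputs=inputs, outputs=output)
model.summary()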

Keras_ERROR : "cannot import name '_time_distributed_dense"

Since Keras does not support attention models yet, I'd like to use the following custom attention implementation:
https://github.com/datalogue/keras-attention/blob/master/models/custom_recurrents.py
But the problem is that when I run the code above, it returns the following error:
ImportError: cannot import name '_time_distributed_dense'
It looks like _time_distributed_dense is no longer available in Keras after 2.0.0.
The only part that uses _time_distributed_dense is the part below:
def call(self, x):
    # store the whole sequence so we can "attend" to it at each timestep
    self.x_seq = x
    # apply a dense layer over the time dimension of the sequence
    # do it here because it doesn't depend on any previous steps,
    # therefore we can save computation time:
    self._uxpb = _time_distributed_dense(self.x_seq, self.U_a, b=self.b_a,
                                         input_dim=self.input_dim,
                                         timesteps=self.timesteps,
                                         output_dim=self.units)
    return super(AttentionDecoder, self).call(x)
How should I change the _time_distributed_dense(...) part?
I just copied this from An Chen's answer on the GitHub issue (the page or his answer might be deleted in the future):
import keras.backend as K

def _time_distributed_dense(x, w, b=None, dropout=None,
                            input_dim=None, output_dim=None,
                            timesteps=None, training=None):
    """Apply `y . w + b` for every temporal slice y of x.

    # Arguments
        x: input tensor.
        w: weight matrix.
        b: optional bias vector.
        dropout: whether to apply dropout (same dropout mask
            for every temporal slice of the input).
        input_dim: integer; optional dimensionality of the input.
        output_dim: integer; optional dimensionality of the output.
        timesteps: integer; optional number of timesteps.
        training: training phase tensor or boolean.

    # Returns
        Output tensor.
    """
    if not input_dim:
        input_dim = K.shape(x)[2]
    if not timesteps:
        timesteps = K.shape(x)[1]
    if not output_dim:
        output_dim = K.shape(w)[1]
    if dropout is not None and 0. < dropout < 1.:
        # apply the same dropout pattern at every timestep
        ones = K.ones_like(K.reshape(x[:, 0, :], (-1, input_dim)))
        dropout_matrix = K.dropout(ones, dropout)
        expanded_dropout_matrix = K.repeat(dropout_matrix, timesteps)
        x = K.in_train_phase(x * expanded_dropout_matrix, x, training=training)
    # collapse the time dimension and the batch dimension together
    x = K.reshape(x, (-1, input_dim))
    x = K.dot(x, w)
    if b is not None:
        x = K.bias_add(x, b)
    # reshape back to a 3D tensor
    if K.backend() == 'tensorflow':
        x = K.reshape(x, K.stack([-1, timesteps, output_dim]))
        x.set_shape([None, None, output_dim])
    else:
        x = K.reshape(x, (-1, timesteps, output_dim))
    return x
You can just add this to your Python code.
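As a quick sanity check, here is a usage sketch (the shapes are illustrative and assume the TensorFlow backend):
import numpy as np
import keras.backend as K

x = K.constant(np.random.rand(2, 5, 8))  # (batch, timesteps, input_dim)
w = K.constant(np.random.rand(8, 16))    # (input_dim, output_dim)
y = _time_distributed_dense(x, w, input_dim=8, timesteps=5, output_dim=16)
print(K.eval(y).shape)  # (2, 5, 16)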

Flattening the last two dimensions of a tensor in TensorFlow

I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know B, C, or D in advance (they're the result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
It seems that in TensorFlow there's no easy way to do this and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])])
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints out absolute values, like [10, 20, 30, 40]), after the reshaping get_shape() returns [10, None, None]. Again, still assume the initial shape isn't known, so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.dynamic_rnn, which expects a fully-defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
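That said, if the flattened feature size is actually known (or can be supplied by hand), a common workaround is to re-attach the static shape after the dynamic reshape. A sketch, assuming TensorFlow 1.x and an illustrative feature size of 30 * 40:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, None, 30, 40])  # [A, B, C, D]
x_shape = tf.shape(x)
x = tf.reshape(x, [x_shape[0], x_shape[1], tf.reduce_prod(x_shape[2:])])
# tf.reshape only propagated dynamic values, so restore the static feature
# size that shape inference lost; dynamic_rnn can then infer the input depth
x.set_shape([None, None, 30 * 40])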
I tried some simple code according to your requirements. Since you are trying to reshape a CNN output, the shape of X is the same as the output of a CNN in TensorFlow:
import tensorflow as tf

HEIGHT = 100
WIDTH = 200
N_CHANNELS = 3
N_HIDDEN = 64
X = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, N_CHANNELS], name='input')  # output of CNN
shape = X.get_shape().as_list()  # shape[0] = BATCH_SIZE, shape[1] = HEIGHT, shape[2] = WIDTH, shape[3] = N_CHANNELS
input = tf.reshape(X, [-1, shape[1], shape[2] * shape[3]])
print(input.shape)  # prints (?, 100, 600)
# Input for tf.nn.dynamic_rnn should be in the shape of [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE]
# Therefore, according to the reshape, N_TIMESTEPS = 100 and INPUT_SIZE = 600
# create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, input, dtype=tf.float32)
Hope this helps.
I found a solution to this by using .get_shape().
Assuming 'x' is a 4-D tensor, this will only work with the Reshape layer. As you were making changes to the architecture of the model, this should work (note that Keras' Reshape excludes the batch dimension, so only the last three dimensions are passed):
x = tf.keras.layers.Reshape((x.get_shape()[1], x.get_shape()[2] * x.get_shape()[3]))(x)
Hope this works!
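For instance, a small sketch of that call in context (the concrete dimensions are arbitrary and assume the static shape is fully known, as in the question's print example; TF 2.x is assumed):
import tensorflow as tf

x = tf.keras.Input(shape=(20, 30, 40))  # full shape: (None, 20, 30, 40)
y = tf.keras.layers.Reshape(
    (x.get_shape()[1], x.get_shape()[2] * x.get_shape()[3]))(x)
print(y.shape)  # (None, 20, 1200)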
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape

class VisualAttention(Layer):
    def __init__(self, channels_out, key_is_value=True):
        super(VisualAttention, self).__init__()
        self.channels_out = channels_out
        self.key_is_value = key_is_value
        self.flatten_images = None  # see build method
        self.unflatten_images = None  # see build method
        self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
        self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')
        self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)

    def build(self, input_shape):
        b, h, w, c = input_shape
        self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
        self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))

    def call(self, x, training=True):
        x = self.flatten_images(x)
        q = self.query_conv(x)
        v = self.value_conv(x)
        inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
        output = self.attention_layer(inputs=inputs, training=training)
        return self.unflatten_images(output)

# test
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)  # (8, 28, 32, 8)
