Can I train a Tensorflow keras model with complex input/output? - python

I am trying to train a very simple model which only have one convolution layer.
def kernel_model(filters=1, kernel_size=3):
input_layer = Input(shape=(250,1))
conv_layer = Conv1D(filters=filters,kernel_size=kernel_size,padding='same',use_bias = False)(input_layer)
model = Model(inputs=input_layer,output=conv_layer)
return model
But the input(X), prediction output(y_pred) and true_output(y_true) are all complex number. When I call the function model.fit(X,y_true)
There is the error
TypeError: Gradients of complex tensors must set grad_ys (y.dtype = tf.complex64)
Does that means I have to write the back-propagation by hand?
What should I do to solve this problem? thanks

Your DNN needs to mininimize the Loss-function through back-propagation. To minimize something, it naturally needs to have an ordering. Complex numbers are not ordered, while Reals are.
So you generally need a loss function L: Complex -> Reals
Change your complex-valued loss function from simple square:
error = K.cast(K.mean(K.square(y_pred_propgation - y_true)),tf.complex64)
to a real-valued magnitude ||.||^2 of the complex number:
error = K.mean(K.square(K.abs(y_true-y_pred)))

Related

No gradients provided for any variable error

I'm creating a model using the Keras functional API.
The layer architecture is as follows:
n = tf.keras.layers.Dense(1)(input)
for i in tf.range(n):
output = tf.keras.layers.Dense(4)(input)
I then concat the outputs and return for a tensor with shape [1, None, 4] where [1] is the batch dimension, [None] is n, and [4] is the output from the second dense layer.
My loss function involves comparing the shape of the expected output, and comparing the outputs.
loss = tf.convert_to_tensor(abs(tf.shape(logits)[1] - tf.shape(expected)[1])) * 100.
When running this on a custom training loop, I'm getting the error
ValueError: No gradients provided for any variable: (['while/dense/kernel:0',
'while/dense/bias:0', 'while/while/dense_1/kernel:0', 'while/while/dense_1/bias:0'],).
Provided `grads_and_vars` is ((None, <tf.Variable 'while/dense/kernel:0' shape=(786432, 1)
Shape is not differentiable, you cannot do things like this with gradient based learning. Problems like this need to be tackled with more powerful tools, e.g. reinforcement learning where one considers n as an action, and get policy gradient for that.
A rule of thumb to remember is that you cannot really backprop through discrete objects. You need to produce floats, as gradients require smooth functions. In your case n should be an integer (what does a loop over a float mean?) so this should be your first warning sign. The other being shape itself, which is also an integer. A target can be discrete, but not the prediction. Note that even in classification we do not output class we output probability as probability is smooth.
You could build your model by assuming some maximum number of N and treat it more like a classification where you supervise N directly, and use some form of masking to keep all the results around.

Trouble with loss function tf.nn.weighted_cross_entropy_with_logits

I am trying to train a u-net network with binary targets. The usual Binary Cross Entropy loss does not perform well, since the lables are very imbalanced (many more 0 pixels than 1s). So I want to punish false negatives more. But tensorflow doesn't have a ready-made weighted binary cross entropy. Since I didn't want to write a loss from scratch, I'm trying to use tf.nn.weighted_cross_entropy_with_logits. to be able to easily feed the loss to model.compile function, I'm writing this wrapper:
def loss_wrapper(y,x):
x = tf.cast(x,'float32')
loss = tf.nn.weighted_cross_entropy_with_logits(y,x,pos_weight=10)
return loss
However, regardless of casting x to float, I'm still getting the error:
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.
when the tf loss is called. Can someone explain what's happening?
If x represents your predictions. It probably already has the type float32. I think you need to cast y, which is presumably your labels. So:
loss = tf.nn.weighted_cross_entropy_with_logits(tf.cast(y, dtype=tf.float32),x,pos_weight=10)

Tensorflow Custom Regularization Term comparing the Prediction to the True value

Hello I am in need of a custom regularization term to add to my (binary cross entropy) Loss function. Can somebody help me with the Tensorflow syntax to implement this?
I simplified everything as much as possible so it could be easier to help me.
The model takes a dataset 10000 of 18 x 18 binary configurations as input and has a 16x16 of a configuration set as output. The neural network consists only of 2 Convlutional layer.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
EPOCHS = 10
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[18,18,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[17,17,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000,18,18,1),target.reshape(10000,16,16,1),batch_size = 1000, epochs=EPOCHS, verbose=1)
output = model(initial).numpy().reshape(10000,16,16)
Now I wrote a function which I'd like to use as an aditional regularization terme to have as a regularization term. This function takes the true and the prediction. Basically it multiplies every point of both with its 'right' neighbor. Then the difference is taken. I assumed that the true and prediction term is 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
order = list(range(1,4))
order.append(0)
deviation = (true*true[:,order]) - (prediction*prediction[:,order])
deviation = abs(deviation)**2
return 0.2 * deviation
I would really appreciate some help with adding something like this function as a regularization term to my loss for helping the neural network to train better to this 'right neighbor' interaction. I'm really struggling with using the customizable Tensorflow functionalities a lot.
Thank you, much appreciated.
It is quite simple. You need to specify a custom loss in which you define your adding regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
order = list(range(1,4))
order.append(0)
deviation = (true*true[:,order]) - (prediction*prediction[:,order])
deviation = abs(deviation)**2
return 0.2 * deviation
def my_custom_loss(y_true, y_pred):
return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)
model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by keras:
Any callable with the signature loss_fn(y_true, y_pred) that returns
an array of losses (one of sample in the input batch) can be passed to
compile() as a loss. Note that sample weighting is automatically
supported for any such loss.
So be sure to return an array of losses (EDIT: as I can see now it is possible to return also a simple scalar. It doesn't matter if you use for example the reduce function). Basically y_true and y_predicted have as first dimension the batch size.
here details: https://keras.io/api/losses/

Custom loss function which depends on another neural network in keras

I have a "How can I do that" question with keras :
Assuming that I have a first neural network, say NNa which has 4 inputs (x,y,z,t) which is already trained.
If I have a second neural network, say NNb, and that its loss function depends on the first neural network.
The custom loss function of NNb customLossNNb calls the prediction of NNa with a fixed grid (x,y,z) and just modify the last variable t.
Here in pseudo-python-code what I would like to do to traine the second NN : NNb:
grid=np.mgrid[0:10:1,0:10:1,0:10:1].reshape(3,-1).T
Y[:,0]=time
Y[:,1]=something
def customLossNNb(NNa,grid):
def diff(y_true,y_pred):
for ii in range(y_true.shape[0]):
currentInput=concatenation of grid and y_true[ii,0]
toto[ii,:]=NNa.predict(currentInput)
#some stuff with toto
return #...
return diff
Then
NNb.compile(loss=customLossNNb(NNa,K.variable(grid)),optimizer='Adam')
NNb.fit(input,Y)
In fact the line that cause me troubles is currentInput=concatenation of grid and y_true[ii,0]
I tried to send to customLossNNb the grid as a tensor with K.variable(grid). But I can't defined a new tensor inside the loss function, something like CurrentY which has a shape (grid.shape[0],1) fill with y[ii,0](i.e. the current t) and then concatenate grid and currentY to build currentInput
Any ideas?
Thanks
You can include your custom loss function into the graph using functional API of keras. The model in this case can be used as a function, something like this:
for l in NNa.layers:
l.trainable=False
x=Input(size)
y=NNb(x)
z=NNa(y)
Predict method will not work, since loss function should be part of the graph, and predict method returns np.array
First, make NNa untrainable. Notice that you should do this recursively if your model has inner models.
def makeUntrainable(layer):
layer.trainable = False
if hasattr(layer, 'layers'):
for l in layer.layers:
makeUntrainable(l)
makeUntrainable(NNa)
Then you have two options:
Attach NNa to the end of your model (notice that both y_true and y_pred will be changed)
Then change your targets (predict with NNa) for correct results since your model is now expecting the output of NNa, not NNb.
Create a custom loss function that uses NNa inside it, without changing your targets
Option 1 - Attaching models
inputs = NNb.inputs
outputs = NNa(NNb.outputs) #make sure NNb is outputing 4 tensors to match NNa inputs
fullModel = Model(inputs,outputs)
#changing the targets:
newY_train = NNa.predict(oldY_train)
Option 2 - Creating a custom loss
Warning: please test whether NNa's weights are really frozen while training this configuration
from keras.losses import binary_crossentropy
def customLoss(true,pred):
true = NNa(true)
pred = NNa(pred)
#use some of the usual losses or create your own
binary_crossentropy(true,pred)
NNb.compile(optimizer=anything, loss = customLoss)

Tensorflow loss function no gradient error when provided with scalar

This question is about the tf.losses.huber_loss() function and how it can be applied on scalars rather than vectors. Thank you for your time!
My model is similar to a classification problem like MNIST. I based my code on the TensorFlow layers tutorial and made changes where I saw fit. I do not think the exact code is needed for my question.
I have lables that take integer values in {0,..,8}, that are converted into onehot labels like this:
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=n_classes)
The last layer in the model is
logits = tf.layers.dense(inputs=dense4, units=n_classes)
which is converted into predictions like this:
predictions = {"classes": tf.argmax(input=logits, axis=1), "probabilities": tf.nn.softmax(logits, name="softmax_tensor")}
From the tutorial, I started with the tf.losses.softmax_cross_entropy() loss function. But in my model, I am predicting in which discretized bin a value will fall. So I started looking for a loss function that would translate that a prediction of one bin off is less of a problem than two bins off. Something like the absolute_difference or Huber function.
The code
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=n_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
in combination with the optimizer:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=ps.learning_rate)
works without any errors. When changing to the Huber function:
loss = tf.losses.huber_loss(labels=onehot_labels, predictions=logits)
there are still no errors. But at this point I am unsure about what exactly happens. Based on the reduction definition I expect that the Huber function is applied pairwise for elements of the vectors and then summed up or averaged.
I would like to apply the Huber function only on the label integer (in {0,...,9}) and predicted value:
preds = tf.argmax(input=logits, axis=1)
So this is what I tried:
loss = tf.losses.huber_loss(labels=indices, predictions=preds)
This is raising the error
ValueError: No gradients provided for any variable
I have found two common causes that I do not think are happening in my situation:
This where there is no path between tf.Variable objects and the loss function. But since my prediction code is often used and the labels were provided as integers, I do not think this applies here.
The function is not derivable into a gradient. But the Huber function does work when vectors are used as input, so I do not think this is the case.
My question is: what code lets me use the Huber loss function on my two integer tensors (labels and predictions)?

Categories