If I add, e.g. kernel_regularizer=tf.keras.regularizers.L1(0.01) to a layer, do I need to add something to my loss description when I compile, or is it automatically added to my normal loss?
Using the tf.keras.regularizers.L1(0.01) will automatically add a penalty to your loss function. You can observe the changes in the loss function with and without the penalty using this simple example:
import tensorflow as tf
tf.random.set_seed(1)
x_input = tf.keras.layers.Input((1,))
x = tf.keras.layers.Dense(3, kernel_regularizer=tf.keras.regularizers.L1(0.01))(x_input)
x_output = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(x_input, x_output)
model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy())
x = tf.random.normal((1, 1))
y = tf.random.uniform((1, 1), maxval=2, dtype=tf.int32)
model.fit(x, y, epochs=1)
If you were to use a custom training loop, you would have to manually add the additional penalties you have defined in certain layers to your loss, as shown here.
Related
I am writing a neural network in keras. I want to modify the loss function so that I can use the array (in the shape of a gradient array) of parameters as additional tool to modify the cost function.
To be precise, I'd like to use the variance of the gradients from past training. Parameters that have a high gradient variance - let's call it h, are assumed to be parameters that hold the features.
I would like the cost function to use parameters whose h value is as small as possible when training new features - for this I have to modify the cost functions for the parameter like this:
Loss (parameter) = Standard_loss (y, y_pred) + h * (parameter - old parameter) ** 2
I would very much like to ask for an answer.
Here is an excerpt from my code:
from keras import models
from keras.datasets import mnist
import tensorflow as tf
import matplotlib.pyplot as plt
from keras import backend as K
#I import CIFAR 10 dataset
from tensorflow.keras.datasets import cifar10
from keras.utils.np_utils import to_categorical
train_y = to_categorical(train_y, num_classes=10, dtype='float32')
test_y = to_categorical(test_y, num_classes=10, dtype='float32')
train_X = K.cast(train_X, dtype='float32')
test_X = K.cast(test_X, dtype='float32')
def get_model():
model = models.Sequential()
model.add(layers.Conv2D(1, 5, (1,1), input_shape=(32,32,3,), padding='same'))
model.add(layers.MaxPooling2D())
model.add(layers.ReLU())
model.add(layers.Conv2D(4, 5, (2,2), padding='same'))
model.add(layers.MaxPooling2D())
model.add(layers.ReLU())
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='sigmoid'))
model.add(layers.Dense(10, activation='linear'))
model.add(layers.Softmax())
print(model.summary())
return model
model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_X, train_y, epochs=50, validation_split=0.2)
weights = model.get_weights()
Unfortunately, I don't know how to take the gradient from the weights :/
I want get a gradient table for each parameter for a single training example. I do not mean the total gradient of the cost function as mentioned elsewhere on the internet.
From what I can see, the cost function is modifiable, but it only takes y_pred and y_true. How could I input something that corresponds to the weights (but it is not a weight)?
Thanks in advance!
I have defined a stateful LSTM RNN, and I want to reset the state of the RNN after each epoch. I have found that one way to do this would be:
n_epochs = 50
for i in range(n_epochs):
lstm.fit(X, y, epochs = 1, batch_size = 64)
lstm.reset_states()
Is there any other more elegant way to implement this in the model specification or when training that is supported by Keras?
You should be able to solve this with a Keras callback, which probably a bit more elegant:
import tensorflow as tf
class CustomCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs=None):
lstm_layer.reset_states()
inputs = tf.keras.layers.Input(batch_shape = (10, 5, 2))
x = tf.keras.layers.LSTM(10, stateful=True)(inputs)
outputs = tf.keras.layers.Dense(1, activation='linear')(x)
model = tf.keras.Model(inputs, outputs)
lstm_layer = model.layers[1]
model.compile(optimizer='adam', loss='mse')
x = tf.random.normal((200, 5, 2))
y = tf.random.normal((200, 1))
model.fit(x, y, epochs=5, callbacks=[CustomCallback()], batch_size=10)
For experiments only, everyone knows when working for multiple steps and you set all input values back to 0 for all DATA ( long potential enough or the same number as input ) in the batch that reset all memories of LSTM.
That is the behavior of LSTM since they are sensitive to input because it contains comparison units and summation units.
I am trying to custom a loss function using the outputs of each neuron of the last layer. And the function may not be linear. Here is what I am working on:
## some previous layers##
## my last dense layer##
dense1 = Dense(4, activation="relu", name="dense_layer1")(previous layer)
dense11 = Dense(1, activation = "sigmoid", name = "dense11")(dense1)
dense12 = Dense(1, activation = "sigmoid", name = "dense12")(dense1)
dense13 = Dense(1, activation = "sigmoid", name = "dense13")(dense1)
dense14 = Dense(1, activation = "sigmoid", name = "dense14")(dense1)
## custom loss function ##
def custom_layer(tensor):
return tensor[1]*2+tensor[2]+tensor[3]/(tensor[4]*2) #some nonlinear function like this
lambda_layer = Lambda(custom_layer, name="lambda_layer")([dense11,dense12,dense13,dense14])
model = Model(inputs=Input, outputs=lambda_layer) # "Input" are in previous layers, not shown here
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, Y_train, epochs=2, batch_size=512, verbose=1)
My Y_train is n*1 (n is the sample size).
So I am basically applying a nonlinear transformation of those final four neurons' output, which is equivalent as to construct a new loss function. After the transformation, the y hat should also be a n*1 vector.
But the code keeps not working. I think it is due to the lambda_layer or the custom_layer function. I also tried to define a new loss function (then there would be no "lambda_layer"), but it didn't work either. I have no idea what's wrong with it. (headache!)
Any ideas or suggestions are appreciated!! Thanks a lot! (I'm using Python3.7 with Tensorflow version 2.0.0)
Solved, thanks!
I have created a simple machine learning model to predict the multiplication of two given numbers. I followed a youtube tutorial to learn the basic and try to work on this simple idea.
My model has three dense layers - input, hidden, output. Input and hidden were using same activation function 'relu' which were giving me loss as NaN on model fit so I changed one of them to sigmoid which started giving me 0.00000+e... something as loss.
I don't know what is wrong. Anyone can please direct me what I am doing wrong or assuming wrong?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
x = np.array(df['X'])
y = np.array(df['Y'])
s = np.array(df['S'])
def build_model():
model = keras.Sequential()
inputLayer = layers.Dense(64, activation='sigmoid', input_shape=[2])
hiddenLayer = layers.Dense(64, activation='relu')
outputLayer = layers.Dense(1)
model.add(inputLayer)
model.add(hiddenLayer)
model.add(outputLayer)
model.compile(optimizer='sgd', loss='mean_squared_error',metrics=['accuracy'])
return model
model = build_model()
print(model.summary())
EPOCHS = 1000
# I didn't know how to provide mulitple input to my model for
# training so I checked stackoverflow here
# https://stackoverflow.com/questions/55233377/keras-sequential-model-with-multiple-inputs?noredirect=1&lq=1
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=EPOCHS, validation_split = 0.2, verbose=2)
print(history)
print(model.predict([[2,3],]))
Disclaimer: I am a beginner and I have just started using keras and python for the first time in my life.
It does work for smaller numbers with ReLU activation.
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
s = x*y
def build_model():
model = keras.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=[2]))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer=keras.optimizers.Adam(lr=0.01),
loss='mean_squared_error')
return model
model = build_model()
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=250,
validation_split=0.2)
test_input = [2, 3]
print('\n{} x {} ='.format(*test_input),
np.round(model.predict([test_input])[0][0]).astype(int))
2 x 3 = 6
SGD also works, but it requires standardization/normalization, which kind of defeats the purpose of your task, so I changed it. But it also works.
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
s = x*y
x = x/10
y = y/10
def build_model():
model = keras.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=[2]))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer=keras.optimizers.SGD(0.001), loss='mean_squared_error')
return model
model = build_model()
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=250,
validation_split=0.2, batch_size=16)
test_input = [2/10, 3/10]
print('\n{} x {} ='.format(*map(lambda l: int(l*10), test_input)),
np.round(model.predict([test_input])[0][0]).astype(int))
i noticed a couple of issues with your model:
Your input layer is not an input. You do not need to have a designated input layer in this case. The arguement input_shape=[2] is sufficient to add a proper input layer before this layer.
You do not determine any batchsize in the fit function: batches are usually a small subset of your training and validation set (commonly some base-2 numbers like 4, 8, 16, 32, ...). During training not only one sample of your set is used for backpropagating and adjusting your weights (aka "learning") but in batches, which makes it faster. Since your input data are two single floating numbers (I assume) you can choose a really high batchsize like 1024 or higher. The batch size belongs to the so called hyperparameter, which affect your overall training success.
history = model.fit(merged_array, s, batch_size=1024, epochs=EPOCHS, validation_split=0.2, verbose=2)
During training you track the "accuracy" metric. As you are working on a regression problem, this is not helping you in estimating your model's performance. (Accuracy is used for classification problems) You can leave it out
I cannnot give you more specific advice with knowledge about the data you are using, how many, datapoints you have and what kind of numbers you want to multiply (bounded to numbers between 0 and 10, float or integeres,...)
Hope this helps sofar (;
I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables that are in respect to the gradient.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of Tensorflow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And thats it!
The below answer is with the cross entropy function, feel free to change it your function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(outputTensor, labels)
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)