Keras with activity_regularizer that is updated every iteration - python

I am building a simple neural network using Keras. It has activity regularization so that the output of the only hidden layer is forced to have small values. Here is the code:
import numpy as np
import math
import keras
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Activation
from keras import regularizers
from keras import backend as K

a = 1

def my_regularizer(inputs):
    means = K.mean(inputs, axis=1)
    return a * K.sum(means)**2

x_train = np.random.uniform(low=-1, high=1, size=(200, 2))

model = Sequential([
    Dense(20, input_shape=(2,), activity_regularizer=my_regularizer),
    Activation('tanh'),
    Dense(2),
    Activation('linear')
])

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, x_train, epochs=20, validation_split=0.1)
Questions:
1) Currently, the parameter a is set at the beginning and does not change. How can I change the code so that a is updated after each iteration as
a_new = f(a_old, input)
where input holds the values at the hidden layer and f(.) is an arbitrary function.
2) I want my activity regularizer to be applied after the first activation function (tanh). Have I written my code correctly? The term "activity_regularizer=my_regularizer" in
Dense(20, input_shape=(2,), activity_regularizer=my_regularizer)
makes me feel that the regularizer is being applied to the values before the tanh activation.

You can - but first, you need a valid Keras Regularizer object (your function won't work):
class MyActivityRegularizer(Regularizer):
    def __init__(self, a=1):
        self.a = K.variable(a, name='a')

    # gets called at each train iteration
    def __call__(self, x):  # your custom function here
        means = K.mean(x, axis=1)
        return self.a * K.sum(means)**2

    def get_config(self):  # required class method
        return {"a": float(K.get_value(self.a))}
Next, to work with .fit, you need a custom Keras Callback object (see alternative at bottom):
class ActivityRegularizerScheduler(Callback):
    """ 'on_batch_end' gets automatically called by .fit when finishing
    iterating over a batch. The model, and its attributes, are inherited by
    'Callback' (except at __init__) and can be accessed via, e.g., self.model """

    def __init__(self, model, update_fn):
        super(ActivityRegularizerScheduler, self).__init__()
        self.update_fn = update_fn
        self.activity_regularizers = _get_activity_regularizers(model)

    def on_batch_end(self, batch, logs=None):
        iteration = K.get_value(self.model.optimizer.iterations)
        new_activity_reg = self.update_fn(iteration)

        # 'activity_regularizer' references the model layer's activity_regularizer (in this
        # case 'MyActivityRegularizer'), so its attribute 'a' can be set directly
        for activity_regularizer in self.activity_regularizers:
            K.set_value(activity_regularizer.a, new_activity_reg)

def _get_activity_regularizers(model):
    activity_regularizers = []
    for layer in model.layers:
        a_reg = getattr(layer, 'activity_regularizer', None)
        if a_reg is not None:
            activity_regularizers.append(a_reg)
    return activity_regularizers
Lastly, you'll need to create your model within the Keras CustomObjectScope - see the full example below.
Example usage:
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import Regularizer
from keras.callbacks import Callback
from keras.utils import CustomObjectScope
from keras.optimizers import Adam
import keras.backend as K
import numpy as np

def make_model(my_reg):
    return Sequential([
        Dense(20, activation='tanh', input_shape=(2,), activity_regularizer=my_reg),
        Dense(2, activation='linear'),
    ])

my_reg = MyActivityRegularizer(a=1)

with CustomObjectScope({'MyActivityRegularizer': my_reg}):  # required for Keras to recognize
    model = make_model(my_reg)
    opt = Adam(lr=1e-4)
    model.compile(optimizer=opt, loss='mse')

x = np.random.randn(320, 2)  # dummy data
y = np.random.randn(320, 2)  # dummy labels
update_fn = lambda x: .5 + .4 * np.cos(x)  # x = number of train updates (optimizer.iterations)

activity_regularizer_scheduler = ActivityRegularizerScheduler(model, update_fn)
model.fit(x, y, batch_size=32, callbacks=[activity_regularizer_scheduler],
          epochs=4, verbose=1)
To TRACK your a and make sure it's changing, you can get its value at, e.g., each epoch end via:
for epoch in range(4):
    model.fit(x, y, batch_size=32, callbacks=[activity_regularizer_scheduler], epochs=1)
    print("Epoch {} activity_regularizer 'a': {}".format(epoch,
          K.get_value(_get_activity_regularizers(model)[0].a)))

# My output:
# Epoch 0 activity_regularizer 'a': 0.7190816402435303
# Epoch 1 activity_regularizer 'a': 0.4982417821884155
# Epoch 2 activity_regularizer 'a': 0.2838689386844635
# Epoch 3 activity_regularizer 'a': 0.8644570708274841
Regarding (2), I'm afraid you're right - the 'tanh' outputs won't be used; you'll need to pass activation='tanh' instead.
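In the question's code, that means folding the tanh into the Dense layer rather than using a separate Activation layer, e.g.:
Dense(20, input_shape=(2,), activation='tanh', activity_regularizer=my_regularizer)
which is exactly what the full example above does.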
Lastly, you can do it without a callback, via train_on_batch - but a drawback is that you'll need to feed data to the model yourself (and shuffle it, etc.):
activity_regularizers = _get_activity_regularizers(model)

for _ in range(100):
    x, y = get_data()
    model.train_on_batch(x, y)
    iteration = K.get_value(model.optimizer.iterations)

    for activity_regularizer in activity_regularizers:
        K.set_value(activity_regularizer.a, update_fn(iteration))

Related

Gradient of one layer w.r.t another layer when there is an input layer (and no value for the input)

I have a network written in tensorflow keras functional API.
I'd like to use the gradient of one layer w.r.t to the previous layer as input for another layer.
I tried GradientTape and tf.gradients, and neither of them worked. I get the following error:
ValueError: tf.function-decorated function tried to create variables on non-first call.
There is no input data at this point; I only have an Input layer.
Is it possible to do this in TensorFlow?
My code:
def Geo_branch(self, geo_inp):
    Fully_Connected1 = layers.TimeDistributed(layers.Dense(128, activation='tanh'))(geo_inp)
    Fully_Connected2 = layers.TimeDistributed(layers.Dense(64, activation='tanh'))(Fully_Connected1)
    return Fully_Connected2

@tf.function
def geo_extension(self, geo_branch):
    Fully_Connected = layers.TimeDistributed(layers.Dense(100, activation='tanh'))(geo_branch)
    geo_ext = layers.LSTM(6,
                          activation="tanh",
                          recurrent_activation="sigmoid",
                          unroll=False,
                          use_bias=True,
                          name='Translation'
                          )(Fully_Connected)
    grads = tf.gradients(geo_ext, geo_branch)
    return geo_ext, grads

inp_geo = layers.Input(shape=(self.time_size, 6), name='geo_input')
Geo_branch = Geo_branch(inp_geo)
geo_ext, grads = geo_extension(Geo_branch)
Any solution is appreciated. It doesn't have to be GradientTape, if there is any other way to compute these gradients.
I would just inherit from TensorFlow's Layer class and create your own custom Layer. Also, it would probably be beneficial to put everything under one call so as to minimize the likelihood of disconnections in the graph.
Example:
import tensorflow as tf

from typing import List
from typing import Optional
from typing import Tuple

from tensorflow.keras import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Layer
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import TimeDistributed


class CustomGeoLayer(Layer):
    """``CustomGeoLayer``."""

    def __init__(self, num_units: List[int], name: Optional[str] = None):
        super().__init__(name=name)
        self.num_units = num_units
        self.dense_0 = TimeDistributed(Dense(num_units[0], activation="tanh"))
        self.dense_1 = TimeDistributed(Dense(num_units[1], activation="tanh"))
        self.dense_2 = TimeDistributed(Dense(num_units[2], activation="tanh"))
        self.rnn = LSTM(units=num_units[3], activation="tanh",
                        recurrent_activation="sigmoid",
                        unroll=False, use_bias=True,
                        name="Translation")

    @tf.function
    def call(self,
             input_tensor: tf.Tensor,
             training: bool = True) -> Tuple[tf.Tensor, tf.Tensor]:
        x = self.dense_0(input_tensor)
        x = self.dense_1(x)
        r = self.dense_2(x)
        x = self.rnn(r, training=training)
        return x, tf.gradients(x, r)[0]


# create model
x_in = Input(shape=(10, 6))
x_out = CustomGeoLayer([128, 64, 100, 6])(x_in)
model = Model(x_in, x_out)

# fake input data
arr = tf.random.normal((3, 10, 6))

# forward pass
out, g = model(arr)

print(out.shape)  # (3, 6)
print(g.shape)    # (3, 10, 100)

How to mix many distributions in one tensorflow probability layer?

I have several DistributionLambda layers as the outputs of one model, and I would like to make a Concatenate-like operation into a new layer, in order to have only one output that is the mixture of all the distributions (assuming they are independent). Then I can apply a log-likelihood loss to the output of the model. Otherwise, I cannot apply the loss over a Concatenate layer, because it loses the log_prob method. I have been trying with the Blockwise distribution, but with no luck so far.
Here is some example code:
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow_probability import distributions
from tensorflow_probability import layers as tfp_layers


def likelihood_loss(y_true, y_pred):
    """Adding negative log likelihood loss."""
    return -y_pred.log_prob(y_true)


def distribution_fn(params):
    """Distribution function."""
    return distributions.Normal(
        params[:, 0], math.log(1.0 + math.exp(params[:, 1])))


output_steps = 3
...
lstm_layer = layers.LSTM(10, return_state=True)
last_layer, l_h, l_c = lstm_layer(last_layer)
lstm_states = [l_h, l_c]
dense_layer = layers.Dense(2)
last_layer = dense_layer(last_layer)
last_layer = tfp_layers.DistributionLambda(
    make_distribution_fn=distribution_fn)(last_layer)
output_layers = [last_layer]

# Get output sequence, re-injecting the output of each step
for number in range(1, output_steps):
    last_layer = layers.Reshape((1, 1))(last_layer)
    last_layer, l_h, l_c = lstm_layer(last_layer, initial_state=lstm_states)
    # Storing state for next time step
    lstm_states = [l_h, l_c]
    last_layer = tfp_layers.DistributionLambda(
        make_distribution_fn=distribution_fn)(dense_layer(last_layer))
    output_layers.append(last_layer)

# This does not work
# last_layer = distributions.Blockwise(output_layers)

# This works for the model but cannot compute loss
# last_layer = layers.Concatenate(axis=1)(output_layers)

the_model = models.Model(inputs=[input_layer], outputs=[last_layer])
the_model.compile(loss=likelihood_loss, optimizer=optimizers.Adam(lr=0.001))
The problem is your Input, not your output layer ;)
Input:0 is referenced in your error message.
Could you try to be more specific about your input?

Training this model leads to memory leakage

I've been training a model, and with htop I see that the memory keeps increasing with every iteration. Looking around, most people say that the graph must keep growing, either because I'm loading a new model with every iteration or because I add new ops, but I do neither of those.
This is the smallest reproducible example.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np

#%% Params
OBSERVATION_SPACE_VALUES = 4
ACTION_SPACE_SIZE = 2
LEARNING_RATE = 0.00025/4
DENSE_PARAMS = [256]

class Network():
    def __init__(self, state_size=OBSERVATION_SPACE_VALUES, action_size=ACTION_SPACE_SIZE,
                 learning_rate=LEARNING_RATE, dense_params=DENSE_PARAMS):
        self.state_size = state_size
        self.action_size = action_size
        self.learning_rate = learning_rate
        self.model = self.create_model(dense_params)

    def create_model(self, dense_params=[256]):
        model = Sequential()
        for params in dense_params:
            units = params
            model.add(Dense(units, activation='relu', input_shape=[self.state_size]))
        model.add(Dense(self.action_size, activation="linear"))
        model.compile(loss="mse", optimizer=Adam(lr=self.learning_rate))
        return model

Agent = Network()

for i in range(10_000):
    state = np.random.rand(Agent.state_size)
    state = np.expand_dims(state, axis=0)
    output = np.random.rand(Agent.action_size)
    output = np.expand_dims(output, axis=0)
    Agent.model.fit(state, output, verbose=True)
And also:
tf.__version__
2.0.0
tf.keras.__version__
2.2.4-tf
The problem is the use of multiple .fit calls. To solve this, you can either:
create a data generator for your data and call .fit(epochs=10000) once,
or
keep the for loop but call train_on_batch instead (a sketch follows below).
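As a rough sketch of the second option, keeping the question's dummy random data (so this is illustrative rather than a drop-in fix):
Agent = Network()

for i in range(10_000):
    state = np.expand_dims(np.random.rand(Agent.state_size), axis=0)
    output = np.expand_dims(np.random.rand(Agent.action_size), axis=0)
    # train_on_batch performs a single gradient update on one batch and
    # avoids the per-call overhead that repeated .fit invocations accumulate
    loss = Agent.model.train_on_batch(state, output)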

Access layer attribute in custom loss function in Keras

I want to write a custom loss function in Keras which depends on an attribute of a (custom) layer in the network.
The idea is the following:
I have a custom layer which modifies the input in each epoch based on a random variable
The output labels should be modified based on the same variable
Some example code to make it more clear:
import numpy as np
from keras import losses, layers, models


class MyLayer(layers.Layer):
    def call(self, x):
        a = np.random.rand()
        self.a = a  # <-- does this work as expected?
        return x + a


def my_loss(layer):
    def modified_loss(y_true, y_pred):
        a = layer.a
        y_true = y_true + a
        return losses.mse(y_true, y_pred)
    return modified_loss


input_layer = layers.Input()
my_layer = MyLayer(name="my_layer")(input_layer)
output_layer = layers.Dense(4)(my_layer)

model = models.Model(inputs=input_layer, outputs=output_layer)
model.compile('adam', my_loss(model.get_layer("my_layer")))
I expect that a is changing for every batch and that the same a is used in the layer and loss function.
Right now, it is not working the way I intended. It seems like the a in the loss function is never updated (and maybe not even in the layer).
How do I change the attribute/value of a in the layer at every call and access it in the loss function?
Not quite sure I am following the purpose of this (and I am bothered by the call to np inside the call() of your custom layer - could you not use the tf.random functions instead?), but you can certainly access the a property inside your loss function.
Perhaps something like:
class MyLayer(layers.Layer):
    def call(self, x):
        a = np.random.rand()  # FIXME --> use tf.random
        self.a = a
        return x + a


input_layer = layers.Input()
my_layer = MyLayer(name="my_layer")
output_layer = layers.Dense(4)(my_layer(input_layer))

model = models.Model(inputs=input_layer, outputs=output_layer)


def my_loss(y_true, y_pred):
    y_true = y_true + my_layer.a
    return losses.mse(y_true, y_pred)


model.compile('adam', loss=my_loss)
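As a follow-up sketch of the tf.random suggestion (my own illustration, not part of the original answer, and assuming the imports above): sampling with a TensorFlow op means a fresh value is drawn each time the graph executes, and the loss closure references that same tensor.
import tensorflow as tf

class MyLayer(layers.Layer):
    def call(self, x):
        # sampled with a TF op, so a new value is drawn on every execution,
        # and a loss that reads self.a sees the same sample as the forward pass
        self.a = tf.random.uniform(shape=(), minval=0.0, maxval=1.0)
        return x + self.a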

Keras can't compute a graph node in a callback

To illustrate the issue, consider the (entirely artificial) example of a model below:
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.callbacks import Callback
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

class RandomSeq(Sequence):
    def __len__(self):
        return 5

    def __getitem__(self, idx):
        return (np.array([np.arange(39).reshape((39,)) for i in range(100)]),
                np.array([-np.arange(39).reshape((39,)) for i in range(100)]))

class Foo(Callback):
    def __init__(self, d):
        super(Foo, self).__init__()
        self._d = d

    def on_epoch_end(self, epoch, logs=None):
        print(epoch)
        print(K.eval(self._d))

x = Input(shape=(39,), dtype='float32', name='input')
y_pred = Dense(39)(x)
y_another = x * 2

m = Model(inputs=x, outputs=y_pred)
m.compile(optimizer='sgd', loss='mse')

seq = RandomSeq()
m.fit_generator(seq, epochs=5, callbacks=[Foo(y_another)])
RandomSeq is just a sequence that returns x and y batches. Foo is a callback that will try to evaluate the attached quantity d at the end of an epoch. For me, if I choose d to be y_pred or y_another, then Keras complains that the placeholder x (input) is not fed.
You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,39]
Is this expected behavior? If so, is there another way to compute a node in a Keras callback? Note that the example works fine if there's no callback that computes the mentioned graph nodes.
That's not the correct way of doing it.
By running K.eval(y_another), you are asking the Keras backend to evaluate a tensor that depends on an Input placeholder (which only stands in for the data you feed into the network) without feeding it any data; that's the reason for your error.
Thus, assuming that you want to compute the output of the network given a new input that is a random sequence multiplied by 2 (is this right?), you should modify the body of on_epoch_end(self, epoch, logs=None) in your callback as follows:
x, _ = RandomSeq()[0]
print(self.model.predict(x * 2))
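Put together, a minimal sketch of the adjusted callback (my assembly of the suggestion above, reusing the question's RandomSeq and model m; the callback takes the sequence instead of a graph node):
class Foo(Callback):
    def __init__(self, seq):
        super(Foo, self).__init__()
        self._seq = seq

    def on_epoch_end(self, epoch, logs=None):
        # pull one batch from the Sequence and run it (doubled) through the model
        x, _ = self._seq[0]
        print(epoch)
        print(self.model.predict(x * 2))


seq = RandomSeq()
m.fit_generator(seq, epochs=5, callbacks=[Foo(seq)])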
