I'm currently porting my code from Keras to TensorFlow in order to use the new quantized-training feature in TensorFlow 1.10.0. However, I found that the training process in Keras and in TensorFlow differs greatly when using the Adam optimizer.
Here is a small practice example that trains the same "sin(10x)" function the TensorFlow way and the Keras way.
from keras.layers import Input, Dense, BatchNormalization
from keras.models import Model
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import keras.backend as K
KERAS = 'keras'
TENSORFLOW = 'tensorflow'
def create_model():
    ipt = Input([1])
    m = Dense(1000, activation='relu')(ipt)
    m = BatchNormalization()(m)
    m = Dense(1000, activation='relu')(m)
    m = BatchNormalization()(m)
    m = Dense(1)(m)
    return Model(ipt, m)
valX = np.expand_dims(np.linspace(-1, 1, 10000), 1)
valY = np.sin(valX * 10)
valY_ = {}
for phase in (KERAS, TENSORFLOW):
    sess = tf.Session()
    sess.as_default()
    K.set_session(sess)
    model = create_model()
    if phase is KERAS:
        model.compile('adam', 'mean_squared_error')
    else:
        tensor_y_gt = tf.placeholder(dtype=tf.float32, shape=model.output.get_shape().as_list())
        mse = tf.losses.mean_squared_error(model.output, tensor_y_gt)
        training_steps = tf.train.AdamOptimizer().minimize(mse)
        sess.run(tf.global_variables_initializer())
    for step in range(2000):
        X = np.random.uniform(-1, 1, [256, 1])
        Y = np.sin(X * 10)
        if phase is KERAS:
            loss = model.train_on_batch(X, Y)
        else:
            loss, _ = sess.run([mse, training_steps], feed_dict={model.input: X, tensor_y_gt: Y})
        if step % 100 == 0:
            print('%s, step#%d, loss=%.5f' % (phase, step, loss))
    valY_[phase] = model.predict(valX)[:, 0]
    sess.close()
valX = valX[:, 0]
valY = valY[:, 0]
plt.plot(valX, valY, 'r--', label='sin(10x)')
plt.plot(valX, valY_[KERAS], 'g-', label=KERAS)
plt.plot(valX, valY_[TENSORFLOW], 'b-', label=TENSORFLOW)
plt.legend(loc='best', ncol=1)
plt.show()
You can see the difference between the two:
plot of sin(10x)
Environment:
tensorflow-gpu 1.10.0
Keras 2.2.2
Does anyone have a clue?
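One thing worth checking (an assumption on my part, not something confirmed above): model.train_on_batch runs with the Keras learning phase set to training and executes the BatchNormalization update ops, while the raw-TF branch in the loop above does neither. A minimal sketch of how the TENSORFLOW branch could be adjusted, reusing the sess, model, mse and tensor_y_gt defined there:

# hypothetical adjustment of the TENSORFLOW branch
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # BatchNormalization moving-average updates
with tf.control_dependencies(update_ops):
    training_steps = tf.train.AdamOptimizer().minimize(mse)
sess.run(tf.global_variables_initializer())

# inside the training loop, feed the learning-phase flag so BatchNormalization
# uses batch statistics while training
loss, _ = sess.run([mse, training_steps],
                   feed_dict={model.input: X,
                              tensor_y_gt: Y,
                              K.learning_phase(): 1})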
I am not using the model.compile() function; I am following 'Writing a training loop from scratch' (so a simple model.save() does not include my optimizer). I also create the network with the functional API in TensorFlow, so I am able to save it with model.save(), but that saves only the network and not the state of my optimizer.
I think I was once able to save the optimizer using pickle, but somehow I am no longer able to
(it gives me the error message AttributeError: Can't pickle local object 'make_gradient_clipnorm_fn..').
I can use the get_config and from_config methods, but they do not save the weights. There is another solution to save the weights, like here, but I am looking for a better (and simpler) solution.
(That method actually does not work for me anyway, because I am training not only model.trainable_weights but also other tf.Variables that do not belong to my model.)
Is there any easy and convenient way to save and load the optimizer?
Thanks
colab: https://colab.research.google.com/drive/1Jn2lbMcaVURpKITL6qEQ-05XndVDoABQ?usp=sharing
or the code below:
import numpy as np
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import datetime
import os
import importlib
import pickle
import time
from pathlib import Path
gpus = tf.config.experimental.list_physical_devices('GPU')
print(gpus)
"""Generate samples for training"""
my_dtype = 'float32'
tf.keras.backend.set_floatx(my_dtype)
num_smpl = 500
amp1 = 0.5
x = 0.5*np.random.rand(num_smpl, 1).astype(np.float32)
y = amp1 * np.heaviside(x, 0) - amp1*0.2*np.heaviside(x-0.15, 0) \
    + amp1*0.2*np.heaviside(x-0.17, 0)
mu, sigma = 0, 0.001 # mean and standard deviation
white_noise = np.random.normal(mu, sigma, num_smpl).reshape(-1, 1)
y = y + white_noise
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, y, s=5, edgecolors='k')
"""Create model using a funtional API"""
layer_input = tf.keras.layers.Input(shape=(1,), dtype=my_dtype)
layer_dense = tf.keras.layers.Dense(8, activation='tanh', dtype=my_dtype)(layer_input)
layer_dense = tf.keras.layers.Dense(8, activation='tanh', dtype=my_dtype)(layer_dense)
layer_output = tf.keras.layers.Dense(1, name='', dtype=my_dtype)(layer_dense)
model = tf.keras.Model(inputs=layer_input, outputs=layer_output, name='my_model')
model.summary()
epoch = 1
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01, epsilon=1e-09)
loss_fn = tf.keras.losses.MeanSquaredError()
"""Training using a custom training loop"""
while epoch < 100:
    with tf.GradientTape() as t1:
        output = model(x, training=True)
        mse = loss_fn(output, y)
    grads = t1.gradient(mse, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    if epoch % 10 == 0:
        print('Epoch: {}, Loss: {}'.format(epoch, mse))
    epoch = epoch + 1
"""Normal variables can be saved using pickle"""
print('Cur. locaiton: ' + os.getcwd())
a = np.arange(0., 10.)
print(a)
with open('test.pkl', 'wb') as file:
    pickle.dump(a, file)
"""***How to save optimizer?***"""
with open('my_opti.pkl', 'wb') as file:
    pickle.dump(optimizer, file)
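Pickling the optimizer is fragile; a sketch of an alternative (not from the original post) using the standard tf.train.Checkpoint API, which stores the optimizer state together with the model and any extra variables:

# assumes `model` and `optimizer` from the code above; `extra_var` is a
# hypothetical stand-in for the additional tf.Variables being trained
extra_var = tf.Variable(1.0)
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer, extra_var=extra_var)
manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)

save_path = manager.save()   # writes model weights, optimizer slots and extra_var
print('Saved checkpoint:', save_path)

# later: rebuild the same objects, then restore everything, optimizer state included
ckpt.restore(manager.latest_checkpoint)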
I am using a TF2 (2.3.0) neural network to approximate the function y which solves the ODE y' + 3y = 0.
I have defined a custom loss class and function in which I am trying to differentiate the single output with respect to the single input so that the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf
class CustomLossOde(Loss):
    def __init__(self, x, model, name='ode_loss'):
        super().__init__(name=name)
        self.x = x
        self.model = model

    def call(self, y_true, y_pred):
        with tf.GradientTape() as tape:
            tape.watch(self.x)
            y_p = self.model(self.x)
        dy_dx = tape.gradient(y_p, self.x)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
        return loss
but when running the following network:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99),loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
For now I am getting 0 loss from the first epoch, which doesn't make any sense.
I have printed both y_true and y_pred from within the function and they seem OK, so I suspect that the problem is in the gradient, which I didn't manage to print.
Appreciate any help.
Defining a custom loss with the high-level Keras API is a bit difficult in this case. I would instead write the training loop from scratch, as it allows finer-grained control over what you can do.
I took inspiration from these two guides:
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tapes can interact seamlessly: one computes the loss function, and the other computes the gradients to be applied by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
# using the high level tf.data API for data handling
x_train = tf.reshape(x_train,(-1,1))
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(1)
opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)
for step, (x,y_true) in enumerate(dataset):
    # we need to convert x to a variable if we want the tape to be
    # able to compute the gradient according to x
    x_variable = tf.Variable(x)
    with tf.GradientTape() as model_tape:
        with tf.GradientTape() as loss_tape:
            loss_tape.watch(x_variable)
            y_pred = model(x_variable)
        dy_dx = loss_tape.gradient(y_pred, x_variable)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
    grad = model_tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grad, model.trainable_variables))
    if step % 20 == 0:
        print(f"Step {step}: loss={loss.numpy()}")
I am trying to implement an algorithm from a paper, using Keras, where a neural network is trained to approximate a mathematical function f(x) from a limited number of data points. I want the input of the neural network to be x and the output to have the form f(x) = 1 + xN(x), where N(x) is the value of the final dense layer.
I know how to make it work for the output f(x) = N(x), but I just don't know how to adjust the network for f(x) = 1 + xN(x). Can someone help me?
This is my current code:
from keras.layers import Input, Dense, Add, Multiply
from keras.models import Model
import keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
import time
def f(x):
    return x**2
Xtrain = np.linspace(0, 1, 10)
ytrain = np.array([f(x) for x in Xtrain])
X = np.linspace(0, 2, 100)
y = np.array([f(x) for x in X])
input = Input(shape=(1,))
init = np.ones(shape=(10, 1))
init = K.variable(init)
hidden = input
hidden = Dense(8, activation='relu')(hidden)
out = Dense(1, activation='linear')(hidden)
out = Add()([init, Multiply()([out, input])])
model = Model(inputs=input, outputs=out)
model.compile(loss='mean_squared_error', optimizer="adam")
tic = time.perf_counter()
model.fit(Xtrain, ytrain, epochs=1000, verbose=1)
toc = time.perf_counter()
print(f"Training time: {toc - tic:0.4f} seconds")
prediction = model.predict(X)
prediction = prediction.reshape((100,))
plt.figure(figsize=(10,5))
plt.plot(X, y, color='red', label='Analytical solution')
plt.plot(X, prediction, color='black', label = 'Prediction')
plt.scatter(Xtrain, ytrain, color='blue', label='Training points')
plt.legend()
plt.show()
plt.tight_layout()
which crashes at line
out = Add()([init, Multiply()([out, input])])
The Add layer works between two layers, and also between a layer and a number/ndarray.
You can just use it like this:
init = np.ones(shape=(10, 1))
inp = Input(shape=(1,))
hidden = Dense(8, activation='relu')(inp)
out = Dense(1, activation='linear')(hidden)
mul = Multiply()([out, inp])
out = Add()([init, mul])
model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer="adam")
I checked it and it worked.
By the way, input is a built-in Python function, so I don't recommend using it as a variable name unless you really mean to shadow it.
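A shape-agnostic alternative (my own sketch, not from the answer above) is to compute 1 + x*N(x) inside a Lambda layer, so no fixed-size constant is needed and predict works for any batch size:

from keras.layers import Input, Dense, Lambda
from keras.models import Model

inp = Input(shape=(1,))
hidden = Dense(8, activation='relu')(inp)
nx = Dense(1, activation='linear')(hidden)             # N(x)
out = Lambda(lambda t: 1.0 + t[0] * t[1])([inp, nx])   # f(x) = 1 + x * N(x)
model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')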
I used to generate heatmaps for my convolutional neural networks, based on the stand-alone Keras library on top of TensorFlow 1. That worked fine; however, after my switch to TF 2.0 and the built-in tf.keras implementation (with eager execution), I can no longer use my old heatmap-generation code.
So I rewrote parts of my code for TF 2.0 and ended up with the following:
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.models import load_model
from tensorflow.keras import preprocessing
from tensorflow.keras import backend as K
from tensorflow.keras import models
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
image_size = 150
image_path = "/tmp/images/test-image.jpg"
model_path = "/tmp/models/prototype/basic_vgg16.h5"
# Load pre-trained Keras model and the image to classify
model = load_model(model_path) # VGG16 CNN with custom classifier head
image = load_img(image_path, target_size=(image_size, image_size))
img_tensor = preprocessing.image.img_to_array(image)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor = preprocess_input(img_tensor)
input_layer = model.get_layer("model_input")
conv_layer = model.get_layer("block5_conv3")
heatmap_model = models.Model([model.inputs], [conv_layer.output, model.output])
# Get gradient of the winner class w.r.t. the output of the (last) conv. layer
with tf.GradientTape() as gtape:
    conv_output, predictions = heatmap_model(img_tensor)
    loss = predictions[:, np.argmax(predictions[0])]
grads = gtape.gradient(loss, conv_output)
pooled_grads = K.mean(grads, axis=(0, 1, 2))
# Get values of pooled grads and model conv. layer output as Numpy arrays
iterate = K.function([model.inputs], [pooled_grads, conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([img_tensor])
# Multiply each channel in the feature-map array by "how important it is"
for i in range(pooled_grads_value.shape[0]):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
# Channel-wise mean of resulting feature-map is the heatmap of class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
max_heat = np.max(heatmap)
if max_heat == 0:
    max_heat = 1e-10
heatmap /= max_heat
# Render heatmap via pyplot
plt.matshow(heatmap)
plt.show()
But now the following line:
iterate = K.function([model.inputs], [pooled_grads, conv_layer.output[0]])
leads to this error message:
AttributeError: Tensor.op is meaningless when eager execution is enabled.
I have always used Keras and have not worked with TF directly, so I am a bit lost here.
Any ideas what could be the problem here?
PS: If you want to copy and paste this code, you can create the VGG16-based model like so:

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras import layers, models, optimizers

# Create Keras model from pre-trained VGG16 and custom classifier
input_layer = layers.Input(shape=(image_size, image_size, 3), name="model_input")
vgg16_model = VGG16(weights="imagenet", include_top=False, input_tensor=input_layer)
model_head = vgg16_model.output
model_head = layers.Flatten(name="model_head_flatten")(model_head)
model_head = layers.Dense(256, activation="relu")(model_head)
model_head = layers.Dense(3, activation="softmax")(model_head)
model = models.Model(inputs=input_layer, outputs=model_head)
model.compile(loss="categorical_crossentropy", optimizer=optimizers.Adam(), metrics=["accuracy"])
At the end of the GradientTape block, conv_output and grads already hold the values. The iterate function is no longer needed to compute them.
Working example below:
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.models import load_model
from tensorflow.keras import preprocessing
from tensorflow.keras import backend as K
from tensorflow.keras import models
import tensorflow as tf
import numpy as np
image_size = 224
# Load pre-trained Keras model and the image to classify
model = tf.keras.applications.vgg16.VGG16()
image = np.random.random((image_size, image_size, 3))
img_tensor = preprocessing.image.img_to_array(image)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor = preprocess_input(img_tensor)
conv_layer = model.get_layer("block5_conv3")
heatmap_model = models.Model([model.inputs], [conv_layer.output, model.output])
# Get gradient of the winner class w.r.t. the output of the (last) conv. layer
with tf.GradientTape() as gtape:
    conv_output, predictions = heatmap_model(img_tensor)
    loss = predictions[:, np.argmax(predictions[0])]
grads = gtape.gradient(loss, conv_output)
pooled_grads = K.mean(grads, axis=(0, 1, 2))
heatmap = tf.reduce_mean(tf.multiply(pooled_grads, conv_output), axis=-1)
heatmap = np.maximum(heatmap, 0)
max_heat = np.max(heatmap)
if max_heat == 0:
    max_heat = 1e-10
heatmap /= max_heat
print(heatmap.shape)
One more step to view the heatmap:
import matplotlib.pyplot as plt
hm = np.squeeze(heatmap)
print(hm.shape)  # (14, 14)
plt.imshow(hm)
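To overlay the heatmap on the input image (a sketch under my own assumptions, where image is the image loaded earlier and hm is the squeezed 14x14 heatmap):

import tensorflow as tf
import matplotlib.pyplot as plt

hm_resized = tf.image.resize(hm[:, :, None], (image_size, image_size))[:, :, 0].numpy()
plt.imshow(image)                              # original (unpreprocessed) image
plt.imshow(hm_resized, cmap='jet', alpha=0.4)  # semi-transparent class-activation overlay
plt.axis('off')
plt.show()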
I have been trying to use an LSTM for regression in TensorFlow, but it doesn't fit the data. I have successfully fit the same data in Keras (with the same size network). My code for trying to overfit a sine wave is below:
import tensorflow as tf
import numpy as np
yt = np.cos(np.linspace(0, 2*np.pi, 256))
xt = np.array([yt[i-50:i] for i in range(50, len(yt))])[...,None]
yt = yt[-xt.shape[0]:]
g = tf.Graph()
with g.as_default():
    x = tf.constant(xt, dtype=tf.float32)
    y = tf.constant(yt, dtype=tf.float32)
    lstm = tf.nn.rnn_cell.BasicLSTMCell(32)
    outputs, state = tf.nn.dynamic_rnn(lstm, x, dtype=tf.float32)
    pred = tf.layers.dense(outputs[:,-1], 1)
    loss = tf.reduce_mean(tf.square(pred-y))
    train_op = tf.train.AdamOptimizer().minimize(loss)
    init = tf.global_variables_initializer()
sess = tf.InteractiveSession(graph=g)
sess.run(init)
for i in range(200):
    _, l = sess.run([train_op, loss])
    print(l)
This results in an MSE of 0.436067 (while Keras got to 0.0022 after 50 epochs), and the predictions range from -0.1860 to -0.1798. What am I doing wrong here?
Edit:
When I change my loss function to the following, the model fits properly:
def pinball(y_true, y_pred):
    tau = np.arange(1, 100).reshape(1, -1) / 100
    pin = tf.reduce_mean(tf.maximum(y_true[:, None] - y_pred, 0) * tau +
                         tf.maximum(y_pred - y_true[:, None], 0) * (1 - tau))
    return pin
I also changed the assignments of pred and loss to:
pred = tf.layers.dense(outputs[:,-1], 99)
loss = pinball(y, pred)
This results in a decrease of loss from 0.3 to 0.003 as it trains, and seems to properly fit the data.
Looks like a shape/broadcasting issue. Here's a working version:
import tensorflow as tf
import numpy as np
yt = np.cos(np.linspace(0, 2*np.pi, 256))
xt = np.array([yt[i-50:i] for i in range(50, len(yt))])
yt = yt[-xt.shape[0]:]
g = tf.Graph()
with g.as_default():
    x = tf.constant(xt, dtype=tf.float32)
    y = tf.constant(yt, dtype=tf.float32)
    lstm = tf.nn.rnn_cell.BasicLSTMCell(32)
    outputs, state = tf.nn.dynamic_rnn(lstm, x[None, ...], dtype=tf.float32)
    pred = tf.squeeze(tf.layers.dense(outputs, 1), axis=[0, 2])
    loss = tf.reduce_mean(tf.square(pred-y))
    train_op = tf.train.AdamOptimizer().minimize(loss)
    init = tf.global_variables_initializer()
sess = tf.InteractiveSession(graph=g)
sess.run(init)
for i in range(200):
    _, l = sess.run([train_op, loss])
    print(l)
x gets a batch dimension of 1 before going into dynamic_rnn, since with time_major=False the first dimension is expected to be a batch dimension. It's important that the last dimension of the output of tf.layers.dense gets squeezed off so that it doesn't broadcast with y (TensorShape([256, 1]) and TensorShape([256]) broadcast to TensorShape([256, 256])). With those fixes it converges:
5.78507e-05
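As a quick illustration of the broadcasting pitfall described above (my own two-line check, not part of the original answer):

import tensorflow as tf

a = tf.zeros([256, 1])   # shape of the un-squeezed predictions
b = tf.zeros([256])      # shape of the targets
print((a - b).shape)     # (256, 256): every prediction is paired with every target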
You are not passing on the state from one call of dynamic_rnn to the next. That's the problem for sure.
Also, why take only the last item of the output through the dense layer and onward?