I am trying to implement the resilient backpropagation (Rprop) optimizer for Keras (link), but the challenging part is performing an update on each individual parameter based on whether its corresponding gradient is positive, negative, or zero. I wrote the code below as a start towards implementing Rprop. However, I can't seem to find a way to access the parameters individually: looping over params (as in the code below) yields p, g, g_old, s, wChangeOld at each iteration, and these are all matrices.
Is there a way to iterate over the individual parameters and update them? It would also work if I could index the parameter tensor based on the sign of its gradients.
class Rprop(Optimizer):
def __init__(self, init_step=0.01, **kwargs):
super(Rprop, self).__init__(**kwargs)
self.init_step = K.variable(init_step, name='init_step')
self.iterations = K.variable(0., name='iterations')
self.posStep = 1.2
self.negStep = 0.5
self.minStep = 1e-6
self.maxStep = 50.
def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, 1)]
shapes = [K.get_variable_shape(p) for p in params]
stepList = [K.ones(shape)*self.init_step for shape in shapes]
wChangeOldList = [K.zeros(shape) for shape in shapes]
grads_old = [K.zeros(shape) for shape in shapes]
self.weights = stepList + grads_old + wChangeOldList
self.updates = []
for p, g, g_old, s, wChangeOld in zip(params, grads, grads_old,
stepList, wChangeOldList):
change = K.sign(g * g_old)
if change > 0:
s_new = K.minimum(s * self.posStep, self.maxStep)
wChange = s_new * K.sign(g)
g_new = g
elif change < 0:
s_new = K.maximum(s * self.posStep, self.maxStep)
wChange = - wChangeOld
g_new = 0
else:
s_new = s
wChange = s_new * K.sign(g)
g_new = p
self.updates.append(K.update(g_old, g_new))
self.updates.append(K.update(wChangeOld, wChange))
self.updates.append(K.update(s, s_new))
new_p = p - wChange
# Apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates
def get_config(self):
config = {'init_step': float(K.get_value(self.init_step))}
base_config = super(Rprop, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
I was looking for an RProp algorithm in Keras as well and found this question. I took the liberty of adapting your code to my purpose and am posting it back here now. So far it seems to work quite well, but I didn't test it extensively.
Disclaimer: I'm very new to Keras but have a lot of experience with Theano (and Blocks). Furthermore, I tested this only with Theano as a backend, not TensorFlow.
import numpy
from keras import backend as K
from keras.optimizers import Optimizer

class RProp(Optimizer):
def __init__(self, init_alpha=1e-3, scale_up=1.2, scale_down=0.5, min_alpha=1e-6, max_alpha=50., **kwargs):
super(RProp, self).__init__(**kwargs)
self.init_alpha = K.variable(init_alpha, name='init_alpha')
self.scale_up = K.variable(scale_up, name='scale_up')
self.scale_down = K.variable(scale_down, name='scale_down')
self.min_alpha = K.variable(min_alpha, name='min_alpha')
self.max_alpha = K.variable(max_alpha, name='max_alpha')
def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
shapes = [K.get_variable_shape(p) for p in params]
alphas = [K.variable(numpy.ones(shape) * self.init_alpha) for shape in shapes]
old_grads = [K.zeros(shape) for shape in shapes]
self.weights = alphas + old_grads
self.updates = []
for param, grad, old_grad, alpha in zip(params, grads, old_grads, alphas):
new_alpha = K.switch(
K.greater(grad * old_grad, 0),
K.minimum(alpha * self.scale_up, self.max_alpha),
K.maximum(alpha * self.scale_down, self.min_alpha)
)
new_param = param - K.sign(grad) * new_alpha
# Apply constraints
if param in constraints:
c = constraints[param]
new_param = c(new_param)
self.updates.append(K.update(param, new_param))
self.updates.append(K.update(alpha, new_alpha))
self.updates.append(K.update(old_grad, grad))
return self.updates
def get_config(self):
config = {
'init_alpha': float(K.get_value(self.init_alpha)),
'scale_up': float(K.get_value(self.scale_up)),
'scale_down': float(K.get_value(self.scale_down)),
'min_alpha': float(K.get_value(self.min_alpha)),
'max_alpha': float(K.get_value(self.max_alpha)),
}
base_config = super(RProp, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
Important notes:
RProp is often not included in machine learning libraries for a reason: It does not work at all unless you use full-batch learning. And full-batch learning is only useful if you have a small training set.
Adam (Keras builtin) outperforms this RProp algorithm. Maybe because that's just how it is, or maybe because I made a mistake :)
A few comments about your code (referring to your original variable names):
wChange is never used across iterations, so you don't need to store it in a persistent variable.
change > 0 does not do what you think it does, because change is a tensor variable. What you want here is an element-wise comparison; use K.switch() instead (see the sketch after these comments).
You used maxStep twice instead of using minStep the other time.
The situation where change is zero is negligible, since that almost never happens in practice.
g_new = 0 and g_new = p are both completely bogus and should be g_new = g as in the first if branch.
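To make the element-wise three-way case concrete, here is a minimal sketch (assuming the Keras backend is imported as K and that K.switch accepts a tensor condition, as in the RProp code above; the helper name and default values are mine, mirroring your constants):
import keras.backend as K

def elementwise_step_update(g, g_old, s, pos_step=1.2, neg_step=0.5, min_step=1e-6, max_step=50.):
    # The sign of g * g_old says whether each gradient entry kept its sign;
    # nested K.switch calls pick the new step size per element.
    change = g * g_old
    s_new = K.switch(K.greater(change, 0),
                     K.minimum(s * pos_step, max_step),           # same sign: grow the step
                     K.switch(K.less(change, 0),
                              K.maximum(s * neg_step, min_step),  # sign flip: shrink the step
                              s))                                 # zero: keep the step
    return s_new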
I'm new to Keras and Python, but I modified the code above a bit for my purposes.
It is an incredibly fast and simple algorithm because it uses full-batch learning and only the signs of the partial derivatives (see the usage sketch after the class). In my tests it outperformed all other backpropagation algorithms, including Adam. I tested it with TensorFlow and CNTK as backends.
Modified Rprop without Weight-Backtracking:
https://pdfs.semanticscholar.org/df9c/6a3843d54a28138a596acc85a96367a064c2.pdf
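For reference, my understanding of the iRprop⁻ update rule from that paper (a sketch; \eta^{+}, \eta^{-}, \Delta_{\min}, \Delta_{\max} correspond to scale_up, scale_down, min_alpha and max_alpha in the code below):
\Delta_i^{(t)} = \min(\eta^{+}\,\Delta_i^{(t-1)}, \Delta_{\max}) if \partial E/\partial w_i^{(t)} \cdot \partial E/\partial w_i^{(t-1)} > 0,
\Delta_i^{(t)} = \max(\eta^{-}\,\Delta_i^{(t-1)}, \Delta_{\min}) if \partial E/\partial w_i^{(t)} \cdot \partial E/\partial w_i^{(t-1)} < 0,
\Delta_i^{(t)} = \Delta_i^{(t-1)} otherwise;
after a sign flip the stored gradient is treated as zero (so that weight is not moved this step and there is no backtracking), and finally w_i^{(t+1)} = w_i^{(t)} - \operatorname{sign}(\partial E/\partial w_i^{(t)})\,\Delta_i^{(t)}.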
class iRprop_(Optimizer):
def __init__(self, init_alpha=0.01, scale_up=1.2, scale_down=0.5, min_alpha=0.00001, max_alpha=50., **kwargs):
super(iRprop_, self).__init__(**kwargs)
self.init_alpha = K.variable(init_alpha, name='init_alpha')
self.scale_up = K.variable(scale_up, name='scale_up')
self.scale_down = K.variable(scale_down, name='scale_down')
self.min_alpha = K.variable(min_alpha, name='min_alpha')
self.max_alpha = K.variable(max_alpha, name='max_alpha')
def get_updates(self, params, loss):
grads = self.get_gradients(loss, params)
shapes = [K.get_variable_shape(p) for p in params]
alphas = [K.variable(K.ones(shape) * self.init_alpha) for shape in shapes]
old_grads = [K.zeros(shape) for shape in shapes]
self.weights = alphas + old_grads
self.updates = []
for p, grad, old_grad, alpha in zip(params, grads, old_grads, alphas):
grad = K.sign(grad)
new_alpha = K.switch(
K.greater(grad * old_grad, 0),
K.minimum(alpha * self.scale_up, self.max_alpha),
K.switch(K.less(grad * old_grad, 0),K.maximum(alpha * self.scale_down, self.min_alpha),alpha)
)
grad = K.switch(K.less(grad * old_grad, 0),K.zeros_like(grad),grad)
new_p = p - grad * new_alpha
# Apply constraints.
if getattr(p, 'constraint', None) is not None:
new_p = p.constraint(new_p)
self.updates.append(K.update(p, new_p))
self.updates.append(K.update(alpha, new_alpha))
self.updates.append(K.update(old_grad, grad))
return self.updates
def get_config(self):
config = {
'init_alpha': float(K.get_value(self.init_alpha)),
'scale_up': float(K.get_value(self.scale_up)),
'scale_down': float(K.get_value(self.scale_down)),
'min_alpha': float(K.get_value(self.min_alpha)),
'max_alpha': float(K.get_value(self.max_alpha)),
}
base_config = super(iRprop_, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
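A minimal full-batch usage sketch (the toy model and data are placeholders of my own, not from the original post; the only point is the batch_size choice):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x_train = np.random.rand(256, 10)
y_train = np.random.rand(256, 1)

model = Sequential([Dense(32, activation='relu', input_dim=10), Dense(1)])
model.compile(optimizer=iRprop_(), loss='mse')
# Rprop-style updates rely on consistent gradient signs between steps,
# so the whole training set is passed as a single batch:
model.fit(x_train, y_train, batch_size=len(x_train), epochs=100)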
Related
I am trying to use feature reconstruction and style reconstruction losses in my model. For this, I followed the example code on the PyTorch website for “Neural Style Transfer”.
https://pytorch.org/tutorials/advanced/neural_style_tutorial.html
Although the feature loss is calculated without problems, the style loss is always zero, and I could not find the reason since everything looks fine in the implementation. The calculations follow the mathematical definitions proposed for these loss functions. Besides, as you know, the style and feature losses are almost identical in terms of calculation, except for the Gram matrix step in the style loss, and the feature loss works without problems.
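For reference, the definitions I am following (as I understand them from the tutorial, with the feature map reshaped to shape (a·b, c·d) as in my gram_matrix below) are G = F F^{\top} / (a\,b\,c\,d), \mathcal{L}_{\text{style}} = \operatorname{MSE}(G_{\text{input}}, G_{\text{target}}) and \mathcal{L}_{\text{feature}} = \operatorname{MSE}(F_{\text{input}}, F_{\text{target}}).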
Could anyone help me with this situation?
class Feature_and_style_losses():
def __init__(self, ):
self.vgg_model = models.vgg19(pretrained=True).features.cuda().eval()
self.content_layers = ['conv_16']
self.style_layers = ['conv_5']
def calculate_feature_and_style_losses(self, input_, target, feature_coefficient, style_coefficient):
i = 0
feature_losses = []
style_losses = []
for layer_ in self.vgg_model.children():
if isinstance(layer_, nn.Conv2d):
i += 1
name = "conv_{}".format(i)
if name in self.content_layers:
features_input = self.vgg_model(input_).detach()
features_target = self.vgg_model(target).detach()
feature_losses.append(self.feature_loss(features_input, features_target))
if name in self.style_layers:
style_input = self.vgg_model(input_).detach()
style_target = self.vgg_model(target).detach()
style_losses.append(self.style_loss(style_input, style_target))
feature_loss_value = (torch.mean(torch.from_numpy(np.array(feature_losses, dtype=np.float32)))) * feature_coefficient
style_loss_value = (torch.mean(torch.from_numpy(np.array(style_losses, dtype=np.float32)))) * style_coefficient
return feature_loss_value, style_loss_value
def feature_loss(self, input_, target):
target = target.detach()
feature_reconstruction_loss = F.mse_loss(input_, target)
return feature_reconstruction_loss
def gram_matrix(self, input_):
a, b, c, d = input_.size() #??? check size
features = input_.view(a*b, c*d)
#features_t = features.transpose(1, 2)
#G = features.bmm(features_t) / (b*c*d)
#print(features.shape)
G = torch.mm(features, features.t())
return G.div(a*b*c*d)
def style_loss(self, input_, target):
G_input = self.gram_matrix(input_)
G_target = self.gram_matrix(target).detach()
#style_reconstruction_loss = self.feature_loss(G_input, G_target)
style_reconstruction_loss = F.mse_loss(G_input, G_target)
return style_reconstruction_loss
feature_loss_ = Feature_and_style_losses()
...
for e in range(epochs):
for i, batch in enumerate(dataloader):
...
real_C = Variable(batch["C"].type(Tensor))
fake_C = independent_decoder(features_all)
f_loss, s_loss = feature_loss_.calculate_feature_and_style_losses(fake_C, real_C, 1, 10)
loss_G_3 = loss_GAN_3 + lambda_pixel * (loss_pixel_3_object + loss_pixel_3_scene) * 0.5 + f_loss + s_loss
loss_G_3.backward(retain_graph=True)
optimizer_independent_decoder.step()
Best.
After finishing Coursera's Practical RL course on A3C, I'm trying to implement my own A3C agent using TensorFlow 2. To start, I'm training it on the CartPole environment, but I can't get good results. So far, I've launched several training runs with the following code, changing the entropy coefficient to see its impact (the results are shown below). Does the problem come from my implementation, or is it more of a fine-tuning issue?
class A3C:
def __init__(self, state_dim, n_actions, optimizer=tf.keras.optimizers.Adam(1e-3)):
self.state_input = Input(shape=state_dim)
self.x = Dense(256, activation='relu')(self.state_input)
self.head_v = Dense(1, activation='linear')(self.x)
self.head_p = Dense(n_actions, activation='linear')(self.x)
self.network = tf.keras.Model(inputs=[self.state_input], outputs=[self.head_v, self.head_p])
self.optimizer = optimizer
def forward(self, state):
return self.network(state)
def sample(self, logits):
policy = np.exp(logits.numpy()) / np.sum(np.exp(logits.numpy()), axis=-1, keepdims=True)
return np.array([np.random.choice(len(p), p=p) for p in policy])
def evaluate(agent, env, n_games=1):
    """Plays a game from start till done, returns per-game rewards."""
game_rewards = []
for _ in range(n_games):
state = env.reset()
total_reward = 0
while True:
action = agent.sample(agent.forward(np.array([state]))[1])[0]
state, reward, done, info = env.step(action)
total_reward += reward
if done: break
game_rewards.append(total_reward)
return game_rewards
class EnvBatch:
def __init__(self, n_envs = 10):
self.envs = [gym.make(env_id) for _ in range(n_envs)]
def reset(self):
return np.array([env.reset() for env in self.envs])
def step(self, actions):
results = [env.step(a) for env, a in zip(self.envs, actions)]
new_obs, rewards, done, infos = map(np.array, zip(*results))
for i in range(len(self.envs)):
if done[i]:
new_obs[i] = self.envs[i].reset()
return new_obs, rewards, done, infos
env_id = "CartPole-v0"
env = gym.make(env_id)
state_dim = env.observation_space.shape
n_actions = env.action_space.n
agent = A3C(state_dim, n_actions)
env_batch = EnvBatch(10)
batch_states = env_batch.reset()
gamma=0.99
rewards_history = []
entropy_history = []
for i in trange(200000):
with tf.GradientTape() as t:
batch_values, batch_logits = agent.forward(batch_states)
batch_actions = agent.sample(batch_logits)
batch_next_states, batch_rewards, batch_dones, _ = env_batch.step(batch_actions)
batch_next_values, batch_next_logits = agent.forward(batch_next_states)
batch_next_values *= (1 - batch_dones)
probs = tf.nn.softmax(batch_logits)
logprobs = tf.nn.log_softmax(batch_logits)
logp_actions = tf.reduce_sum(logprobs * tf.one_hot(batch_actions, n_actions), axis=-1)
advantage = batch_rewards + gamma*batch_next_values - batch_values
entropy = -tf.reduce_sum(probs * logprobs, 1, name="entropy")
actor_loss = - tf.reduce_mean(logp_actions * tf.stop_gradient(advantage)) - 0.005 * tf.reduce_mean(entropy)
target_state_values = batch_rewards + gamma*batch_next_values
critic_loss = tf.reduce_mean((batch_values - tf.stop_gradient(target_state_values))**2 )
loss = actor_loss + critic_loss
var_list = agent.network.trainable_variables
grads = t.gradient(loss,var_list)
agent.optimizer.apply_gradients(zip(grads, var_list))
batch_states = batch_next_states
entropy_history.append(np.mean(entropy))
if i % 500 == 0:
if i % 2500 == 0:
rewards_history.append(np.mean(evaluate(agent, env, n_games=3)))
clear_output(True)
plt.figure(figsize=[8, 4])
plt.subplot(1, 2, 1)
plt.plot(rewards_history, label='rewards')
plt.title("Session rewards")
plt.grid()
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(entropy_history, label='entropy')
plt.title("Policy entropy")
plt.grid()
plt.legend()
plt.show()
Beta = 0.005 - Training 1
Beta = 0.005 - Training 2
Beta = 0.005 - Training 3
Beta = 0.05 - Training 1
Beta = 0.05 - Training 2
Beta = 0.05 - Training 3
I've looked through your code, and it doesn't look like there's any problem with the algorithm. That is, it seems to me that the hyperparameters were chosen incorrectly. Try different hyperparameter sets. If it doesn't work properly, refer to the repository.
The critic loss is wrong. You should first compute the expected returns: predict the value of the next state, then iterate backwards over the rewards with the Bellman equation.
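In equation form (a sketch of what the code below computes), the targets are built backwards with the Bellman recursion R_t = r_t + \gamma R_{t+1} for non-terminal steps and R_t = r_t at terminal steps, seeded with the critic's estimate of the state after the last step when the rollout did not end; the critic is then trained on \operatorname{mean}\big((R_t - V(s_t))^2\big).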
Here is an example:
def getExpectedReturns(self, states, next_states, done, rewards, standarize=True):
# Get next value
if done[-1] == 1.0:
arr_idx = np.zeros((rewards.shape[0], 1))
arr_idx[-1] = 1.0
values_rewards_sum_one_hot = tf.convert_to_tensor(arr_idx, dtype=tf.float32)
next_value = tf.reduce_sum(rewards * values_rewards_sum_one_hot, axis=0)
else:
values_rewards_sum = self.model_a2c(next_states)[-1]
arr_idx = np.zeros((rewards.shape[0], 1))
arr_idx[0] = 1.0
values_rewards_sum_one_hot = tf.convert_to_tensor(arr_idx, dtype=tf.float32)
next_value = tf.reduce_sum(values_rewards_sum * values_rewards_sum_one_hot, axis=0)
# Iterate over rewards
list_true_values = []
for i in reversed(range(0, len(rewards))):
if done[i]==0.0:
next_value = rewards[i] + next_value * self.gamma
else:
next_value = rewards[i]
list_true_values.append(next_value)
list_true_values.reverse()
list_true_values = tf.convert_to_tensor(list_true_values, dtype=tf.float32)
if standarize:
list_true_values = ((list_true_values - tf.math.reduce_mean(list_true_values)) /
(tf.math.reduce_std(list_true_values) + tf.constant(1e-12)))
return list_true_values
with tf.GradientTape() as tape:
# Advantage
returns = self.getExpectedReturns(states, next_states, done, rewards, standarize=False)
actions_probs_logits, values = self.model_a2c(states)
advantage = returns - values
advantage = tf.squeeze(advantage)
# Actions probs
actions_probs_softmax = tf.nn.softmax(actions_probs_logits)
actions_log_probs_softmax = tf.nn.log_softmax(actions_probs_logits)
actions_one_hot = tf.one_hot(actions, self.num_actions, 1.0, 0.0)
actions_log_probs = tf.reduce_sum(actions_log_probs_softmax * actions_one_hot, axis=-1)
# Entropy
entropy = self.entropy_coef * tf.reduce_mean(actions_probs_softmax * actions_log_probs_softmax, axis=1)
# Losses
actor_loss = -tf.reduce_mean(actions_log_probs * tf.stop_gradient(advantage), axis=0)
critic_loss = self.critic_coef * tf.reduce_mean(tf.math.pow(advantage, 2), axis=0)
total_loss = actor_loss + critic_loss - entropy
I use a very custom LSTM-cell inspired by http://mlexplained.com/2019/02/15/building-an-lstm-from-scratch-in-pytorch-lstms-in-depth-part-1/.
I use it to look at intermediate gating values. My question is, how would I expand this class to have an option for adding more layers and for adding bidirectionality? Should it be wrapped in a new class or added in the present one?
class Dim(IntEnum):
batch = 0
seq = 1
class simpleLSTM(nn.Module):
def __init__(self, input_sz: int, hidden_sz: int):
super().__init__()
self.input_size = input_sz
self.hidden_size = hidden_sz
# input gate
self.W_ii = Parameter(torch.Tensor(input_sz, hidden_sz))
self.W_hi = Parameter(torch.Tensor(hidden_sz, hidden_sz))
self.b_i = Parameter(torch.Tensor(hidden_sz))
# forget gate
self.W_if = Parameter(torch.Tensor(input_sz, hidden_sz))
self.W_hf = Parameter(torch.Tensor(hidden_sz, hidden_sz))
self.b_f = Parameter(torch.Tensor(hidden_sz))
# cell gate (candidate cell state, often called g)
self.W_ig = Parameter(torch.Tensor(input_sz, hidden_sz))
self.W_hg = Parameter(torch.Tensor(hidden_sz, hidden_sz))
self.b_g = Parameter(torch.Tensor(hidden_sz))
# output gate
self.W_io = Parameter(torch.Tensor(input_sz, hidden_sz))
self.W_ho = Parameter(torch.Tensor(hidden_sz, hidden_sz))
self.b_o = Parameter(torch.Tensor(hidden_sz))
self.init_weights()
self.out = nn.Linear(hidden_sz, len(TRG.vocab))
def init_weights(self):
for p in self.parameters():
if p.data.ndimension() >= 2:
nn.init.xavier_uniform_(p.data)
else:
nn.init.zeros_(p.data)
def forward(self, x, init_states=None ):
"""Assumes x is of shape (batch, sequence, feature)"""
seq_sz, bs, = x.size()
hidden_seq = []
prediction = []
if init_states is None:
h_t, c_t = torch.zeros(self.hidden_size).to(x.device), torch.zeros(self.hidden_size).to(x.device)
else:
h_t, c_t = init_states
for t in range(seq_sz): # iterate over the time steps
x_t = x[t, :].float()
#LOOK HERE!!!
i_t = torch.sigmoid(x_t @ self.W_ii + h_t @ self.W_hi + self.b_i)
f_t = torch.sigmoid(x_t @ self.W_if + h_t @ self.W_hf + self.b_f)
g_t = torch.tanh(x_t @ self.W_ig + h_t @ self.W_hg + self.b_g)
o_t = torch.sigmoid(x_t @ self.W_io + h_t @ self.W_ho + self.b_o)
c_t = f_t * c_t + i_t * g_t
h_t = o_t * torch.tanh(c_t)
hidden_seq.append(h_t.unsqueeze(Dim.batch))
pred_t = self.out(h_t.unsqueeze(Dim.batch))
#pred_t = F.softmax(pred_t)
prediction.append(pred_t)
hidden_seq = torch.cat(hidden_seq, dim=Dim.batch)
prediction = torch.cat(prediction, dim=Dim.batch)
# reshape from shape (sequence, batch, feature) to (batch, sequence, feature)
hidden_seq = hidden_seq.transpose(Dim.batch, Dim.seq).contiguous()
prediction = prediction.transpose(Dim.batch, Dim.seq).contiguous()
return prediction, hidden_seq, (h_t, c_t)
I call it and train using the following as an example.
lstm = simpleLSTM(1, 100)
hidden_size = lstm.hidden_size
optimizer = optim.Adam(lstm.parameters())
h_0, c_0 = (torch.zeros(hidden_size, requires_grad=True),
torch.zeros(hidden_size, requires_grad=True))
grads = []
h_t, c_t = h_0, c_0
N_EPOCHS = 10
for epoch in range(N_EPOCHS):
epoch_loss = 0
for i, batch in enumerate(train):
optimizer.zero_grad()
src, src_len = batch.src
trg = batch.trg
trg = trg.view(-1)
predict, output, hidden_states = lstm(src)
predict = predict.t().unsqueeze(1)
predict= predict.view(-1, predict.shape[-1])
loss = criterion(predict,trg)
loss.backward()
optimizer.step()
epoch_loss += loss.item()
print(epoch_loss)
The easiest would be to create another module (say Bidirectional) and pass any cell you want to it.
Implementation itself is quite easy to do. Notice that I'm using the concat operation for joining the bidirectional outputs; you may want to support other modes like summation instead (see the helper sketch after the class).
Please read the comments in the code below, you may have to change it appropriately.
import torch
class Bidirectional(torch.nn.Module):
def __init__(self, cell):
super().__init__()
self.cell = cell
def __call__(self, x, init_states=None):
prediction, hidden_seq, (h_t, c_t) = self.cell(x, init_states)
backward_prediction, backward_hidden_seq, (
backward_h_t,
backward_c_t,
# Assuming sequence is first dimension, otherwise change 0 appropriately
# Reverses sequences so the LSTM cell acts on the reversed sequence
) = self.cell(torch.flip(x, (0,)), init_states)
return (
# Assuming you transpose so it has (batch, seq, features) dimensionality
torch.cat((prediction, backward_prediction), 2),
torch.cat((hidden_seq, backward_hidden_seq), 2),
# Assuming it has (batch, features) dimensionality
torch.cat((h_t, backward_h_t), 1),
torch.cat((c_t, backward_c_t), 1),
)
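For the other joining modes mentioned above, a small helper could look like this (a sketch; the function and mode names are my own):
import torch

def merge_directions(forward: torch.Tensor, backward: torch.Tensor, mode: str = "concat") -> torch.Tensor:
    """Join per-direction outputs: 'concat' doubles the feature dimension, 'sum' keeps it."""
    if mode == "concat":
        return torch.cat((forward, backward), dim=-1)
    if mode == "sum":
        return forward + backward
    raise ValueError("unknown mode: {}".format(mode))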
When it comes to multiple layers, you could do something similar in principle:
import torch
class Multilayer(torch.nn.Module):
def __init__(self, *cells):
super().__init__()
self.cells = torch.nn.ModuleList(cells)
def __call__(self, x, init_states=None):
inputs = x
for cell in self.cells:
prediction, hidden_seq, (h_t, c_t) = cell(inputs, init_states)
inputs = hidden_seq
return prediction, hidden_seq, (h_t, c_t)
Please note you have to pass created cell objects into Multilayer, e.g.:
# For three layers of LSTM, each needs features to be set up correctly
multilayer_LSTM = Multilayer(LSTM(), LSTM(), LSTM())
You may also pass classes instead of instances into the constructor and create those inside Multilayer (so hidden_size matches automatically), but these ideas should get you started.
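A rough sketch of that idea (assuming the cells follow the simpleLSTM(input_sz, hidden_sz) constructor from the question; the class and argument names are mine):
import torch

class MultilayerFromClass(torch.nn.Module):
    def __init__(self, cell_cls, input_sz: int, hidden_sz: int, num_layers: int):
        super().__init__()
        # The first layer sees the raw features; deeper layers see the previous hidden size.
        in_sizes = [input_sz] + [hidden_sz] * (num_layers - 1)
        self.cells = torch.nn.ModuleList([cell_cls(in_sz, hidden_sz) for in_sz in in_sizes])

    def forward(self, x, init_states=None):
        inputs = x
        for cell in self.cells:
            prediction, hidden_seq, (h_t, c_t) = cell(inputs, init_states)
            inputs = hidden_seq  # feed this layer's hidden states into the next layer
        return prediction, hidden_seq, (h_t, c_t)

# e.g. multilayer_LSTM = MultilayerFromClass(simpleLSTM, input_sz=1, hidden_sz=100, num_layers=3)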
I am trying to implement a multidimensional LSTM in TensorFlow. I am using a TensorArray to remember previous states, and a somewhat complicated way of getting the states of the two neighbours (above and to the left). tf.cond wants both possible conditions to exist and to have the same number of inputs; this is why I added one more cell.zero_state at index (last index + 1) of the states, and then use a function to get the correct indices into the states. When I try to use an optimizer to minimize a cost, I get this error:
InvalidArgumentError (see above for traceback): TensorArray
MultiDimentionalLSTMCell-l1-multi-l1/state_ta_262#gradients: Could not
read from TensorArray index 809 because it has not yet been written
to.
Can someone tell me how to fix it?
PS: without the optimizer it works!
class MultiDimentionalLSTMCell(tf.nn.rnn_cell.RNNCell):
"""
Note that state_is_tuple is always True.
"""
def __init__(self, num_units, forget_bias=1.0, activation=tf.nn.tanh):
self._num_units = num_units
self._forget_bias = forget_bias
self._activation = activation
@property
def state_size(self):
return tf.nn.rnn_cell.LSTMStateTuple(self._num_units, self._num_units)
@property
def output_size(self):
return self._num_units
def __call__(self, inputs, state, scope=None):
"""Long short-term memory cell (LSTM).
@param inputs: (batch, n)
@param state: the states and hidden units of the two cells
"""
with tf.variable_scope(scope or type(self).__name__):
c1,c2,h1,h2 = state
# change bias argument to False since LN will add bias via shift
concat = tf.nn.rnn_cell._linear([inputs, h1, h2], 5 * self._num_units, False)
i, j, f1, f2, o = tf.split(1, 5, concat)
new_c = (c1 * tf.nn.sigmoid(f1 + self._forget_bias) +
c2 * tf.nn.sigmoid(f2 + self._forget_bias) + tf.nn.sigmoid(i) *
self._activation(j))
new_h = self._activation(new_c) * tf.nn.sigmoid(o)
new_state = tf.nn.rnn_cell.LSTMStateTuple(new_c, new_h)
return new_h, new_state
def multiDimentionalRNN_whileLoop(rnn_size,input_data,sh,dims=None,scopeN="layer1"):
"""Implements naive multidimentional recurent neural networks
#param rnn_size: the hidden units
#param input_data: the data to process of shape [batch,h,w,chanels]
#param sh: [heigth,width] of the windows
#param dims: dimentions to reverse the input data,eg.
dims=[False,True,True,False] => true means reverse dimention
#param scopeN : the scope
returns [batch,h/sh[0],w/sh[1],chanels*sh[0]*sh[1]] the output of the lstm
"""
with tf.variable_scope("MultiDimentionalLSTMCell-"+scopeN):
cell = MultiDimentionalLSTMCell(rnn_size)
shape = input_data.get_shape().as_list()
if shape[1]%sh[0] != 0:
offset = tf.zeros([shape[0], sh[0]-(shape[1]%sh[0]), shape[2], shape[3]])
input_data = tf.concat(1,[input_data,offset])
shape = input_data.get_shape().as_list()
if shape[2]%sh[1] != 0:
offset = tf.zeros([shape[0], shape[1], sh[1]-(shape[2]%sh[1]), shape[3]])
input_data = tf.concat(2,[input_data,offset])
shape = input_data.get_shape().as_list()
h,w = int(shape[1]/sh[0]),int(shape[2]/sh[1])
features = sh[1]*sh[0]*shape[3]
batch_size = shape[0]
x = tf.reshape(input_data, [batch_size,h,w, features])
if dims is not None:
x = tf.reverse(x, dims)
x = tf.transpose(x, [1,2,0,3])
x = tf.reshape(x, [-1, features])
x = tf.split(0, h*w, x)
sequence_length = tf.ones(shape=(batch_size,), dtype=tf.int32)*shape[0]
inputs_ta = tf.TensorArray(dtype=tf.float32, size=h*w,name='input_ta')
inputs_ta = inputs_ta.unpack(x)
states_ta = tf.TensorArray(dtype=tf.float32, size=h*w+1,name='state_ta',clear_after_read=False)
outputs_ta = tf.TensorArray(dtype=tf.float32, size=h*w,name='output_ta')
states_ta = states_ta.write(h*w, tf.nn.rnn_cell.LSTMStateTuple(tf.zeros([batch_size,rnn_size], tf.float32),
tf.zeros([batch_size,rnn_size], tf.float32)))
def getindex1(t,w):
return tf.cond(tf.less_equal(tf.constant(w),t),
lambda:t-tf.constant(w),
lambda:tf.constant(h*w))
def getindex2(t,w):
return tf.cond(tf.less(tf.constant(0),tf.mod(t,tf.constant(w))),
lambda:t-tf.constant(1),
lambda:tf.constant(h*w))
time = tf.constant(0)
def body(time, outputs_ta, states_ta):
constant_val = tf.constant(0)
stateUp = tf.cond(tf.less_equal(tf.constant(w),time),
lambda: states_ta.read(getindex1(time,w)),
lambda: states_ta.read(h*w))
stateLast = tf.cond(tf.less(constant_val,tf.mod(time,tf.constant(w))),
lambda: states_ta.read(getindex2(time,w)),
lambda: states_ta.read(h*w))
currentState = stateUp[0],stateLast[0],stateUp[1],stateLast[1]
out , state = cell(inputs_ta.read(time),currentState)
outputs_ta = outputs_ta.write(time,out)
states_ta = states_ta.write(time,state)
return time + 1, outputs_ta, states_ta
def condition(time,outputs_ta,states_ta):
return tf.less(time , tf.constant(h*w))
result , outputs_ta, states_ta = tf.while_loop(condition, body, [time,outputs_ta,states_ta])
outputs = outputs_ta.pack()
states = states_ta.pack()
y = tf.reshape(outputs, [h,w,batch_size,rnn_size])
y = tf.transpose(y, [2,0,1,3])
if dims is not None:
y = tf.reverse(y, dims)
return y
def tanAndSum(rnn_size,input_data,scope):
outs = []
for i in range(2):
for j in range(2):
dims = [False]*4
if i!=0:
dims[1] = True
if j!=0:
dims[2] = True
outputs = multiDimentionalRNN_whileLoop(rnn_size,input_data,[2,2],
dims,scope+"-multi-l{0}".format(i*2+j))
outs.append(outputs)
outs = tf.pack(outs, axis=0)
mean = tf.reduce_mean(outs, 0)
return tf.nn.tanh(mean)
graph = tf.Graph()
with graph.as_default():
input_data = tf.placeholder(tf.float32, [20,36,90,1])
#input_data = tf.ones([20,36,90,1],dtype=tf.float32)
sh = [2,2]
out1 = tanAndSum(20,input_data,'l1')
out = tanAndSum(25,out1,'l2')
cost = tf.reduce_mean(out)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
#out = multiDimentionalRNN_raw_rnn(2,input_data,sh,dims=[False,True,True,False],scopeN="layer1")
#cell = MultiDimentionalLSTMCell(10)
#out = cell.zero_state(2, tf.float32).c
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
ou,k,_ = session.run([out,cost,optimizer],{input_data:np.ones([20,36,90,1],dtype=np.float32)})
print(ou.shape)
print(k)
You should add the parameter parallel_iterations=1 to your while loop call.
Such as:
result, outputs_ta, states_ta = tf.while_loop(
condition, body, [time,outputs_ta,states_ta], parallel_iterations=1)
This is required because inside body you perform read and write operations on the same tensor array (states_ta), and in the case of parallel loop execution (parallel_iterations > 1) one thread may try to read information from the TensorArray before another thread has written it.
I've tested your code snippet with parallel_iterations=1 on TensorFlow 0.12.1 and it works as expected.
Somebody has already asked a similar question, but the solution given there does not work for me.
I am trying to use the Adam optimizer in TensorFlow. Here is the relevant part of my code:
adamOptimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
print('Optimizer was created!')
# Create a variable to track the global step.
global_step = tf.Variable(0, name='global_step', trainable=False)
# Initialize variables
vars_to_init = ae.get_variables_to_init(n)
vars_to_init.append(global_step)
vars_to_init.append
sess.run(tf.variables_initializer(vars_to_init))
# Create an optimizer
train_op = adamOptimizer.minimize(loss, global_step=global_step)
The following error is raised after train_op is used for the first time:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value pretrain_1/beta2_power
[[Node: pretrain_1/beta2_power/read = IdentityT=DT_FLOAT, _class=["loc:#autoencoder_variables/weights1"], _device="/job:localhost/replica:0/task:0/cpu:0"]]
If I try to add a line
vars_to_init.append(beta2_power)
I am getting the following error:
NameError: global name 'beta2_power' is not defined
If I follow the advice from the similar question and replace sess.run(tf.variables_initializer(vars_to_init)) with sess.run(tf.initialize_all_variables()), I get the following error after running this line:
FailedPreconditionError: Attempting to use uninitialized value autoencoder_variables/biases1
[[Node: autoencoder_variables/biases1/read = IdentityT=DT_FLOAT, _class=["loc:#autoencoder_variables/biases1"], _device="/job:localhost/replica:0/task:0/cpu:0"]]
I didn't have any problems when I was using the Gradient Descent optimizer...
What am I doing wrong? What is the proper way to use this optimizer?
More details about the class to clarify autoencoder_variables:
class AutoEncoder(object):
_weights_str = "weights{0}"
_biases_str = "biases{0}"
def __init__(self, shape, sess):
self.__shape = shape
self.__num_hidden_layers = len(self.__shape) - 2
self.__variables = {}
self.__sess = sess
self._setup_variables()
@property
def shape(self):
return self.__shape
@property
def num_hidden_layers(self):
return self.__num_hidden_layers
@property
def session(self):
return self.__sess
def __getitem__(self, item):
return self.__variables[item]
def __setitem__(self, key, value):
self.__variables[key] = value
def _setup_variables(self):
with tf.name_scope("autoencoder_variables"):
for i in xrange(self.__num_hidden_layers + 1):
# Train weights
name_w = self._weights_str.format(i + 1)
w_shape = (self.__shape[i], self.__shape[i + 1])
a = tf.mul(4.0, tf.sqrt(6.0 / (w_shape[0] + w_shape[1])))
w_init = tf.random_uniform(w_shape, -1 * a, a)
self[name_w] = tf.Variable(w_init,
name=name_w,
trainable=True)
# Train biases
name_b = self._biases_str.format(i + 1)
b_shape = (self.__shape[i + 1],)
b_init = tf.zeros(b_shape)
self[name_b] = tf.Variable(b_init, trainable=True, name=name_b)
if i <= self.__num_hidden_layers:
# Hidden layer fixed weights (after pretraining before fine tuning)
self[name_w + "_fixed"] = tf.Variable(tf.identity(self[name_w]),
name=name_w + "_fixed",
trainable=False)
# Hidden layer fixed biases
self[name_b + "_fixed"] = tf.Variable(tf.identity(self[name_b]),
name=name_b + "_fixed",
trainable=False)
# Pretraining output training biases
name_b_out = self._biases_str.format(i + 1) + "_out"
b_shape = (self.__shape[i],)
b_init = tf.zeros(b_shape)
self[name_b_out] = tf.Variable(b_init,
trainable=True,
name=name_b_out)
def _w(self, n, suffix=""):
return self[self._weights_str.format(n) + suffix]
def _b(self, n, suffix=""):
return self[self._biases_str.format(n) + suffix]
def get_variables_to_init(self, n):
assert n > 0
assert n <= self.__num_hidden_layers + 1
vars_to_init = [self._w(n), self._b(n)]
if n <= self.__num_hidden_layers:
vars_to_init.append(self._b(n, "_out"))
if 1 < n <= self.__num_hidden_layers+1:
# Fixed matrices for learning of deeper layers
vars_to_init.append(self._w(n - 1, "_fixed"))
vars_to_init.append(self._b(n - 1, "_fixed"))
return vars_to_init
The problem was that I was using one variable's values to initialize another variable (which raised an error about using uninitialized variables during initialization).
Instead of using another variable during initialization,
self[name_b + "_fixed"] = tf.Variable(tf.identity(self[name_b]),
name=name_b + "_fixed",
trainable=False)
I initialize it randomly:
self[name_b + "_fixed"] = tf.Variable(init_b,
name=name_b + "_fixed",
trainable=False)
And assigned the trained variable's value to it once training was done:
ae[name_w + "_fixed"] = tf.identity(ae[name_w])