I'd like to add a max norm constraint to several of the weight matrices in my TensorFlow graph, à la Torch's renorm method.
If the L2 norm of any neuron's weight vector exceeds max_norm, I'd like to scale its weights down so that their L2 norm is exactly max_norm.
What's the best way to express this using TensorFlow?
Here is a possible implementation:
import tensorflow as tf
def maxnorm_regularizer(threshold, axes=1, name="maxnorm", collection="maxnorm"):
    def maxnorm(weights):
        clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)
        clip_weights = tf.assign(weights, clipped, name=name)
        tf.add_to_collection(collection, clip_weights)
        return None  # there is no regularization loss term
    return maxnorm
Here's how you would use it:
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.framework import arg_scope
with arg_scope(
        [fully_connected],
        weights_regularizer=maxnorm_regularizer(threshold=1.5)):
    hidden1 = fully_connected(X, 200, scope="hidden1")
    hidden2 = fully_connected(hidden1, 100, scope="hidden2")
    outputs = fully_connected(hidden2, 5, activation_fn=None, scope="outs")
# "maxnorm" is the default collection name used by maxnorm_regularizer
max_norm_ops = tf.get_collection("maxnorm")
with tf.Session() as sess:
    for epoch in range(n_epochs):
        for X_batch, y_batch in load_next_batch():
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            sess.run(max_norm_ops)  # clip the weights after each training step
This creates a 3 layer neural network and trains it with max norm regularization at every layer (with a threshold of 1.5). I just tried it, seems to work. Hope this helps! Suggestions for improvements are welcome. :)
This code is based on tf.clip_by_norm():
>>> x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
>>> print(x.eval())
[[ 0. 0.]
[ 3. 4.]
[ 30. 40.]
[ 300. 400.]]
>>> clip_rows = tf.clip_by_norm(x, clip_norm=10, axes=1)
>>> print(clip_rows.eval())
[[ 0. 0. ]
[ 3. 4. ]
[ 6. 8. ] # clipped!
[ 6.00000048 8. ]] # clipped!
You can also clip columns if you need to:
>>> clip_cols = tf.clip_by_norm(x, clip_norm=350, axes=0)
>>> print(clip_cols.eval())
[[ 0. 0. ]
[ 3. 3.48245788]
[ 30. 34.82457733]
[ 300. 348.24578857]]
# second column clipped!
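To make the row and column behaviour shown above easy to check without a TensorFlow session, here is a minimal pure-Python sketch of the same rescaling rule. The helper name clip_by_norm_py is made up for this illustration (it is not part of TensorFlow), and it only handles 2-D lists of floats.

```python
import math

def clip_by_norm_py(matrix, clip_norm, axis=1):
    """Scale rows (axis=1) or columns (axis=0) so each L2 norm is at most clip_norm."""
    if axis == 0:
        # Clip columns by transposing, clipping rows, and transposing back.
        transposed = [list(col) for col in zip(*matrix)]
        clipped = clip_by_norm_py(transposed, clip_norm, axis=1)
        return [list(row) for row in zip(*clipped)]
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        if norm > clip_norm:
            # Rescale so the norm becomes exactly clip_norm.
            result.append([v * clip_norm / norm for v in row])
        else:
            # Vectors already inside the limit are left untouched.
            result.append(list(row))
    return result

x = [[0.0, 0.0], [3.0, 4.0], [30.0, 40.0], [300.0, 400.0]]
print(clip_by_norm_py(x, 10, axis=1))
# [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
```

Calling it with axis=0 and clip_norm=350 should leave the first column alone and shrink only the second one, as in the column example above.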
Using Rafał's suggestion and TensorFlow's implementation of clip_by_norm, here's what I came up with:
def renorm(x, axis, max_norm):
    '''Renormalizes the sub-tensors along axis such that they do not exceed norm max_norm.'''
    # This elaborate dance avoids empty slices, which TF dislikes.
    rank = tf.rank(x)
    bigrange = tf.range(-1, rank + 1)
    dims = tf.slice(
        tf.concat(0, [tf.slice(bigrange, [0], [1 + axis]),
                      tf.slice(bigrange, [axis + 2], [-1])]),
        [1], rank - [1])
    # Determine which columns need to be renormalized.
    l2norm_inv = tf.rsqrt(tf.reduce_sum(x * x, dims, keep_dims=True))
    scale = max_norm * tf.minimum(l2norm_inv, tf.constant(1.0 / max_norm))
    # Broadcast the scalings
    return tf.mul(scale, x)
It seems to have the desired behavior for 2-dimensional matrices and should
generalize to tensors:
> x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
> print x.eval()
[[ 0. 0.] # rows have norms of 0, 5, 50, 500
[ 3. 4.] # cols have norms of ~302, ~402
[ 30. 40.]
[ 300. 400.]]
> print renorm(x, 0, 10).eval()
[[ 0. 0. ] # unaffected
[ 3. 4. ] # unaffected
[ 5.99999952 7.99999952] # rescaled
[ 6.00000048 8.00000095]] # rescaled
> print renorm(x, 1, 350).eval()
[[ 0. 0. ] # col 0 is unaffected
[ 3. 3.48245788] # col 1 is rescaled
[ 30. 34.82457733]
[ 300. 348.24578857]]
Take a look at the clip_by_norm function, which does exactly this. It takes a single tensor as input and returns a scaled-down tensor.
I have a really small model with architecture [2, 3, 6], where the hidden layer uses ReLU and the output uses a softmax activation for multiclass classification. It was trained offline and later statically quantized to qint8. What I would like to do now is extract the weights so I can use them on other hardware via matrix multiplication/addition. The problem I'm encountering is that it doesn't seem to behave as expected. Take for instance this GraphModule output of state_dict():
OrderedDict([('input_layer_input_scale_0', tensor(0.0039)),
('input_layer_input_zero_point_0', tensor(0)),
('input_layer.scale', tensor(0.0297)),
('input_layer.zero_point', tensor(0)),
('input_layer._packed_params.dtype', torch.qint8),
(tensor([[-0.1180, 0.1180],
[-0.2949, -0.5308],
[-3.3029, -7.5496]], size=(3, 2), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.05898105353116989,
Parameter containing:
tensor([-0.4747, -0.3563, 7.7603], requires_grad=True))),
('out.scale', tensor(1.5963)),
('out.zero_point', tensor(243)),
('out._packed_params.dtype', torch.qint8),
(tensor([[ 0.4365, 0.4365, -55.4356],
[ 0.4365, 0.0000, 1.3095],
[ 0.4365, 0.0000, -13.9680],
[ 0.4365, -0.4365, 4.3650],
[ 0.4365, 0.4365, -3.0555],
[ 0.4365, 0.0000, -1.3095],
[ 0.4365, 0.0000, 3.0555]], size=(7, 3), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.43650051951408386,
Parameter containing:
tensor([ 19.2761, -1.0785, 14.2602, -22.3171, 10.1059, 7.2197, -11.7253],
If I directly access the weights the way I think I should, like so:
import numpy as np
input_weights = np.array(
    [[-0.1180, 0.1180],
     [-0.2949, -0.5308],
     [-3.3029, -7.5496]])
inputs_scale = 0.05898105353116989
inputs_zero_point = 0
W1 = np.clip(np.round(input_weights / inputs_scale + inputs_zero_point), -127, 128)
b1 = np.clip(np.round(np.array([-0.4747, -0.3563, 7.7603]) / inputs_scale + inputs_zero_point), -127, 128)
output_weights = np.array(
    [[ 0.4365, 0.4365, -55.4356],
     [ 0.4365, 0.0000, 1.3095],
     [ 0.4365, 0.0000, -13.9680],
     [ 0.4365, -0.4365, 4.3650],
     [ 0.4365, 0.4365, -3.0555],
     [ 0.4365, 0.0000, -1.3095],
     [ 0.4365, 0.0000, 3.0555]])
outputs_scale = 0.43650051951408386  # scale printed for the output layer's weights
outputs_zero_point = 0  # assumed: this value is not visible in the printout above
W2 = np.clip(np.round(output_weights / outputs_scale + outputs_zero_point), -127, 128)
b2 = np.clip(np.round(np.array([ 19.2761, -1.0785, 14.2602, -22.3171, 10.1059, 7.2197, -11.7253]) / outputs_scale + outputs_zero_point), -127, 128)
And then I give it some data:
inputs = np.array(
[[1. , 1. ], # class 0 example
[1. , 0. ], # class 1 example
[0. , 1. ],
[0. , 0. ],
[0. , 0.9 ],
[0. , 0.75],
[0. , 0.25]]) # class 6 example
where each row is an example, I would expect to be able to do matrix multiplication and take the argmax over the rows to get the result. However, doing that gives me this:
>>> (ReLU((inputs @ W1.T) + b1) @ W2.T + b2).argmax(axis=0)
array([0, 3, 0, 3, 0, 0, 3])
which is not right.
And when I test accuracy of the quantized model in pytorch it's high enough that it should get all examples correct here. So what am I misunderstanding in terms of accessing these weights/bias?
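For reference, here is a minimal sketch of the per-tensor affine mapping that connects the printed float weights to int8 codes. As far as I understand, the values PyTorch prints for a qint8 tensor are already dequantized floats, so that is the assumption here. The helper names quantize and dequantize are made up for this illustration, and the zero point of 0 is taken from inputs_zero_point in the snippet above.

```python
def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 code: round(value / scale) + zero_point, clamped to [qmin, qmax]."""
    q = round(value / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover the approximate float that an int8 code stands for."""
    return (q - zero_point) * scale

scale = 0.05898105353116989  # weight scale copied from the printout above
zero_point = 0               # assumption, matching inputs_zero_point in the question

codes = [quantize(w, scale, zero_point) for w in (-0.1180, -0.2949, -3.3029, -7.5496)]
print(codes)  # [-2, -5, -56, -128]
```

Note that the signed 8-bit range is -128 to 127, so the last weight maps to -128, and dequantizing a code gives back the printed float up to rounding.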
My English is poor. I will try my best to clarify my question.
My inputs vary in size, for example [[1,2],[3,4]] and [[5,6],[7,8],[10,11]].
The outputs that I want are [[1,0,2,0],[3,0,4,0]] and [[5,0,6,0],[7,0,8,0],[10,0,11,0]] (that is, a zero added after each number).
Here is my implementation:
import tensorflow as tf
import numpy as np
matrix1 = [[1,2],[3,4]]
matrix2 = [[5,6],[7,8],[10,11]]
with tf.Session() as sess:
    input = tf.placeholder(tf.float32, [None, 2])
    output = how_to_add(input)
    sess.run(tf.global_variables_initializer())
    [matrix3] = sess.run([output], feed_dict={input:matrix1})
The code for how_to_add is:
def how_to_add(input):
    shape = input.get_shape().as_list()
    output = tf.Variable(tf.zeros([shape[0], 4]))
    with tf.control_dependencies([output[:,1::2].assign(input)]):
        output = tf.identity(output)
    return output
But shape[0] is ? (unknown), so I got an error:
"Cannot convert a partially known TensorShape to a Tensor: %s" % s)
ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 4)
How can I correct my code?
This code works:
import tensorflow as tf
import numpy as np
matrix1 = [[1,2],[3,4]]
matrix2 = [[5,6],[7,8],[10,11]]
with tf.Session() as sess:
    input = tf.placeholder(tf.float32, [2, 2]) # 'None' is replaced with '2'
    output = how_to_add(input)
    sess.run(tf.global_variables_initializer())
    [matrix3] = sess.run([output], feed_dict={input:matrix1})
The code for how_to_add is:
def how_to_add(input):
    #shape = input.get_shape().as_list()
    output = tf.Variable(tf.zeros([2,4])) # 'shape[0]' is replaced with '2'
    with tf.control_dependencies([output[:,1::2].assign(input)]):
        output = tf.identity(output)
    return output
Although this code works, it can only handle matrix1, not matrix2.
Do not use a variable for this; that is not what variables are for. You should create a new tensor that is built from your input tensor. For your problem, you can do that like this:
import tensorflow as tf
def interleave_zero_columns(matrix):
    # Add a matrix of zeros along a new third dimension
    a = tf.stack([matrix, tf.zeros_like(matrix)], axis=2)
    # Reshape to interleave zeros across columns
    return tf.reshape(a, [tf.shape(matrix)[0], -1])
# Test
matrix1 = [[1, 2], [3, 4]]
matrix2 = [[5, 6], [7, 8], [10, 11]]
with tf.Session() as sess:
    input = tf.placeholder(tf.float32, [None, 2])
    output = interleave_zero_columns(input)
    print(sess.run(output, feed_dict={input: matrix1}))
    # [[1. 0. 2. 0.]
    #  [3. 0. 4. 0.]]
    print(sess.run(output, feed_dict={input: matrix2}))
    # [[ 5.  0.  6.  0.]
    #  [ 7.  0.  8.  0.]
    #  [10.  0. 11.  0.]]
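If it helps to see why stacking and then reshaping interleaves the zeros, here is a rough plain-Python sketch of the same two steps on nested lists. The name interleave_zero_columns_py is made up for this illustration, and it works on ordinary lists rather than tensors.

```python
def interleave_zero_columns_py(matrix):
    """Return a copy of matrix with a zero inserted after every element of each row."""
    result = []
    for row in matrix:
        # "Stack" step: pair every value with a zero, like the new third dimension.
        pairs = [(value, 0) for value in row]
        # "Reshape" step: flatten the pairs back into a single row.
        result.append([item for pair in pairs for item in pair])
    return result

print(interleave_zero_columns_py([[1, 2], [3, 4]]))
# [[1, 0, 2, 0], [3, 0, 4, 0]]
print(interleave_zero_columns_py([[5, 6], [7, 8], [10, 11]]))
# [[5, 0, 6, 0], [7, 0, 8, 0], [10, 0, 11, 0]]
```

Because nothing depends on the number of rows, the same function handles both inputs, which is the property the tensor version gets from tf.shape(matrix)[0].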
I have tried an example with Keras, but it did not use an LSTM. My model uses an LSTM in TensorFlow, and I want to predict the output as classes, the way the Keras model does with predict_classes.
The TensorFlow model I am trying is something like this:
n_steps = seq_len-1
n_inputs = x_train.shape[2]
n_neurons = 50
n_outputs = y_train.shape[1]
n_layers = 2
learning_rate = 0.0001
batch_size =100
n_epochs = 1000
train_set_size = x_train.shape[0]
test_set_size = x_test.shape[0]
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_outputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,activation=tf.nn.sigmoid, use_peepholes = True) for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
outputs = outputs[:,n_steps-1,:]
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
I am encoding the labels with sklearn's LabelEncoder as:
encoder_train = LabelEncoder()
encoder_train.fit(y_train)
encoded_Y_train = encoder_train.transform(y_train)
y_train = np_utils.to_categorical(encoded_Y_train)
This converts the labels into a one-hot (binary) matrix.
When I tried to predict the output, I got the following:
actual==> [[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[0. 1. 0.]
[0. 1. 0.]]
predicted==> [[0.3112209 0.3690182 0.31357136]
[0.31085992 0.36959863 0.31448898]
[0.31073445 0.3703295 0.31469804]
[0.31177694 0.37011752 0.3145326 ]
[0.31220382 0.3692756 0.31515726]
[0.31232828 0.36947766 0.3149037 ]
[0.31190437 0.36756667 0.31323162]
[0.31339088 0.36542615 0.310322 ]
[0.31598282 0.36328828 0.30711085]]
What I was expecting instead is the class label based on the encoding, as the Keras model gives. See the following:
predictions = model.predict_classes(X_test, verbose=True)
print("REAL VALUES:",reverse_category(Y_test,axis=1))
print("PRED VALUES:",predictions)
print("REAL COLORS:")
The output is something like the following:
REAL VALUES: [1 1 1 ... 1 2 1]
PRED VALUES: [2 1 1 ... 1 2 2]
['ball' 'ball' 'ball' ... 'ball' 'bat' 'ball']
['bat' 'ball' 'ball' ... 'ball' 'bat' 'bat']
Kindly let me know what I can do in the TensorFlow model to get results in terms of the encoded labels.
I am using TensorFlow 1.12.0 on Windows 10.
You are trying to map the predicted class probabilities back to class labels. Each row in the list of output predictions contains the three predicted class probabilities. Use np.argmax to obtain the one with the highest predicted probability in order to map to the predicted class label:
import numpy as np
predictions = [[0.3112209, 0.3690182, 0.31357136],
               [0.31085992, 0.36959863, 0.31448898],
               [0.31073445, 0.3703295, 0.31469804],
               [0.31177694, 0.37011752, 0.3145326 ],
               [0.31220382, 0.3692756, 0.31515726],
               [0.31232828, 0.36947766, 0.3149037 ],
               [0.31190437, 0.36756667, 0.31323162],
               [0.31339088, 0.36542615, 0.310322 ],
               [0.31598282, 0.36328828, 0.30711085]]
np.argmax(predictions, axis=1)
# array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In this case, class 1 is predicted 9 times.
As noted in the comments: this is exactly what Keras does under the hood, as you'll see in the source code.
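To go from those indices back to readable labels, something along these lines should work. This is only a sketch: predict_classes_py is a made-up helper, and the class_names list is hypothetical, standing in for whatever order the label encoder assigned.

```python
def predict_classes_py(probabilities):
    """Return the index of the largest probability in each row (a row-wise argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Hypothetical class names, listed in the order the encoder assigned indices 0, 1, 2.
class_names = ["class_a", "class_b", "class_c"]

predicted = [[0.3112209, 0.3690182, 0.31357136],
             [0.31598282, 0.36328828, 0.30711085]]
indices = predict_classes_py(predicted)
labels = [class_names[i] for i in indices]
print(indices)  # [1, 1]
print(labels)   # ['class_b', 'class_b']
```

With the fitted LabelEncoder from the question, encoder_train.inverse_transform(indices) should do the same lookup for you.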
import tensorflow as tf
M = tf.Variable([0.01],tf.float32)
b = tf.Variable([1.0],tf.float32)
#inputs and outputs
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32) # actual value of y which we already know
Yp = M * x + b # y predicted value
squareR = tf.square(Yp - y)
loss = tf.reduce_sum(squareR)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.Session()
for i in range(1000):
[array([ 0.88999945], dtype=float32), array([ 0.93000191], dtype=float32)]
When I change the values of x and y to x: [100, 200, 300, 400, 500] and y: [19, 24, 37, 49, 51],
the output is:
[array([ nan], dtype=float32), array([ nan], dtype=float32)]
Please help me get the slope and y-intercept of the linear model.
Adding some print statements to your training loop, we can see what's going on during training:
for i in range(1000):
    _, mm, bb = sess.run([train,M,b],{x:[100,200,300,400,500],y:[19,24,37,49,51]})
    print(mm, bb)
    if np.isnan(mm):
        break
The output:
[ 1118.01000977] [ 4.19999981]
[-12295860.] [-33532.921875]
[ 1.35243170e+11] [ 3.68845632e+08]
[ -1.48755065e+15] [ -4.05696309e+12]
[ 1.63616896e+19] [ 4.46228634e+16]
[ -1.79963571e+23] [ -4.90810521e+20]
[ 1.97943407e+27] [ 5.39846559e+24]
[ -2.17719537e+31] [ -5.93781625e+28]
[ 2.39471499e+35] [ 6.53105210e+32]
[-inf] [-inf]
[ nan] [ nan]
That output means your training is diverging. In this case, lowering the learning rate is one way to fix the problem.
Lowering the learning rate to 0.000001 works; these are the learned M and b after 1000 iterations:
[array([ 0.11159456], dtype=float32), array([ 1.01534212], dtype=float32)]
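The same effect can be reproduced without TensorFlow. Below is a small plain-Python sketch of gradient descent on the same sum-of-squares loss, with the same data and starting values; fit_line is a made-up name for this illustration. It uses 64-bit floats, so the overflow happens after more steps than in the float32 run above.

```python
import math

def fit_line(learning_rate, steps=1000):
    """Gradient descent on sum((M*x + b - y)^2), starting from M=0.01, b=1.0."""
    xs = [100.0, 200.0, 300.0, 400.0, 500.0]
    ys = [19.0, 24.0, 37.0, 49.0, 51.0]
    M, b = 0.01, 1.0
    for _ in range(steps):
        residuals = [M * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the summed squared error with respect to M and b.
        grad_M = 2.0 * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 * sum(residuals)
        M -= learning_rate * grad_M
        b -= learning_rate * grad_b
        if not (math.isfinite(M) and math.isfinite(b)):
            break  # the parameters overflowed: training diverged
    return M, b

print(fit_line(0.01, steps=1))  # first step already jumps to M of about 1118
print(fit_line(0.01))           # ends with non-finite values (inf or nan)
print(fit_line(0.000001))       # stays finite, close to the M and b reported above
```

The inputs are in the hundreds, so the gradient with respect to M is huge; with a step of 0.01 each update overshoots by a growing amount, while 0.000001 keeps the update smaller than the distance to the minimum.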
I have the following code based on the MNIST example. It is modified in two ways:
1) I'm not using a one-hot-vector, so I simply use tf.equal(y, y_)
2) My results are binary: either 0 or 1
import tensorflow as tf
import numpy as np
# get the data
train_data, train_results = get_data(2000, 2014)
test_data, test_results = get_data(2014, 2015)
# setup a session
sess = tf.Session()
x_len = len(train_data[0])
y_len = len(train_results[0])
# make placeholders for inputs and outputs
x = tf.placeholder(tf.float32, shape=[None, x_len])
y_ = tf.placeholder(tf.float32, shape=[None, y_len])
# create the weights and bias
W = tf.Variable(tf.zeros([x_len, 1]))
b = tf.Variable(tf.zeros([1]))
# initialize everything
sess.run(tf.global_variables_initializer())
# create the "equation" for y in terms of x
y_prime = tf.matmul(x, W) + b
y = tf.nn.softmax(y_prime)
# construct the error function
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_prime, y_)
# setup the training algorithm
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# train the thing
for i in range(1000):
    rand_rows = np.random.choice(train_data.shape[0], 100, replace=False)
    _, w_out, b_out, ce_out = sess.run([train_step, W, b, cross_entropy], feed_dict={x: train_data[rand_rows, :], y_: train_results[rand_rows, :]})
    print("%d: %s %s %s" % (i, str(w_out), str(b_out), str(ce_out)))
# compute how many times it was correct
correct_prediction = tf.equal(y, y_)
# find the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: test_data, y_: test_results}))
for i in range(0, len(test_data)):
    res = sess.run(y, {x: [test_data[i]]})
    print("RES: " + str(res) + " ACT: " + str(test_results[i]))
The accuracy is always 0.5 (because my test data has about as many 1s as 0s). The values of W and b always seem to increase, probably because the values of cross_entropy are always a vector of all zeros.
When I try and use this model for prediction, the predictions are always 1:
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
What am I doing wrong here?
You seem to be predicting a single scalar, rather than a vector. The softmax op produces a vector-valued prediction for each example. This vector must always sum to 1. When the vector only contains one element, that element must always be 1. If you want to use a softmax for this problem, you could use [1, 0] as the output target where you are currently using [0] and use [0, 1] where you are currently using [1]. Another option is you could keep using just one number, but change the output layer to sigmoid instead of softmax, and change the cost function to be the sigmoid-based cost function as well.
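A quick way to convince yourself of this is to compute both functions by hand. The sketch below is plain Python with hand-written softmax and sigmoid helpers (not the TensorFlow ops), just to show the arithmetic.

```python
import math

def softmax(logits):
    """Exponentiate and normalize so the outputs sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logit):
    """Squash a single logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

print(softmax([-3.0]))   # [1.0]
print(softmax([250.0]))  # [1.0]
print(sigmoid(0.0))      # 0.5
```

Whatever the single logit is, softmax over one element divides a number by itself, so the prediction is always 1. A sigmoid on the same logit can move between 0 and 1, and a two-element softmax over [logit, 0] gives the same value as the sigmoid in its first slot.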