After training the CNN model, I want to visualize the weights or print them out. What can I do?
I cannot even print out the variables after training.
Thank you!
To visualize the weights, you can use a tf.image_summary() op to transform a convolutional filter (or a slice of a filter) into a summary proto, write them to a log using a tf.train.SummaryWriter, and visualize the log using TensorBoard.
Let's say you have the following (simplified) program:
filter = tf.Variable(tf.truncated_normal([8, 8, 3, 1]))
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 3])

conv = tf.nn.conv2d(images, filter, strides=[1, 1, 1, 1], padding="SAME")

# More ops...
loss = ...
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_op = optimizer.minimize(loss)

# Transpose the filter to [out_channels, height, width, in_channels] so that
# each output channel becomes one 8x8 RGB image in the summary.
filter_summary = tf.image_summary('filter', tf.transpose(filter, [3, 0, 1, 2]))

sess = tf.Session()
summary_writer = tf.train.SummaryWriter('/tmp/logs', sess.graph_def)
sess.run(tf.initialize_all_variables())

for i in range(10000):
    sess.run(train_op)
    if i % 10 == 0:
        # Log a summary every 10 steps.
        summary_writer.add_summary(sess.run(filter_summary), i)
After doing this, you can start TensorBoard to visualize the logs in /tmp/logs, and you will be able to see a visualization of the filter.
Note that this trick visualizes depth-3 filters as RGB images (to match the channels of the input image). If you have deeper filters, or they don't make sense to interpret as color channels, you can use the tf.split() op to split the filter on the depth dimension, and generate one image summary per depth.
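For example, a minimal sketch of that splitting approach, assuming a 4-D filter variable of shape [height, width, in_depth, out_channels] (with in_depth a plain Python int) and the same TF 0.x API as above:

# Split the filter along its depth axis into `in_depth` slices of depth 1,
# then log each slice as a batch of grayscale images (one per output channel).
depth_slices = tf.split(2, in_depth, filter)
for d, depth_slice in enumerate(depth_slices):
    tf.image_summary('filter/depth%d' % d,
                     tf.transpose(depth_slice, [3, 0, 1, 2]))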
As @mrry said, you can use tf.image_summary. For example, for cifar10_train.py, you can put this code somewhere under def train(). Note how you access a variable under the 'conv1' scope (the weights must already exist, hence reuse=True):
# Visualize conv1 features
with tf.variable_scope('conv1', reuse=True) as scope_conv:
    weights = tf.get_variable('weights')

    # scale weights to [0 1]; convert_image_dtype then maps them to
    # [0 255] uint8 (you may want a different scaling)
    x_min = tf.reduce_min(weights)
    x_max = tf.reduce_max(weights)
    weights_0_to_1 = (weights - x_min) / (x_max - x_min)
    weights_0_to_255_uint8 = tf.image.convert_image_dtype(weights_0_to_1,
                                                          dtype=tf.uint8)

    # transpose to the tf.image_summary format [batch_size, height, width, channels]
    weights_transposed = tf.transpose(weights_0_to_255_uint8, [3, 0, 1, 2])

    # this will display 3 of the 64 filters in conv1
    tf.image_summary('conv1/filters', weights_transposed, max_images=3)
If you want to visualize all your conv1 filters in one nice grid, you would have to organize them into a grid yourself. I did that today, so now I'd like to share a gist for visualizing conv1 as a grid
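Not the gist itself, but a rough NumPy sketch of the packing idea (the helper name and layout here are my own, for illustration):

import numpy as np

def kernels_to_grid(kernels, rows, cols):
    """Pack [h, w, c, n] kernels into one (rows*h, cols*w, c) grid image."""
    h, w, c, n = kernels.shape
    assert rows * cols == n
    grid = np.zeros((rows * h, cols * w, c), dtype=kernels.dtype)
    for k in range(n):
        r, col = divmod(k, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w, :] = kernels[..., k]
    return grid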
You can extract the values as numpy arrays the following way:
with tf.variable_scope('conv1', reuse=True) as scope_conv:
    W_conv1 = tf.get_variable('weights', shape=[5, 5, 1, 32])
    weights = W_conv1.eval()  # requires an active session
    with open("conv1.weights.npy", "wb") as outfile:
        np.save(outfile, weights)
Note that you have to adjust the scope ('conv1' in my case) and the variable name ('weights' in my case).
Then it boils down to visualizing numpy arrays. One example of how to visualize a numpy array is:
#!/usr/bin/env python
"""Visualize numpy arrays."""
import numpy as np
import scipy.misc

arr = np.load('conv1.weights.npy')

# Get each 5x5 filter from the 5x5x1x32 array
for filter_ in range(arr.shape[3]):
    # Get the 5x5x1 filter:
    extracted_filter = arr[:, :, :, filter_]
    # Get rid of the last dimension (hence get 5x5):
    extracted_filter = np.squeeze(extracted_filter)
    # display the filter (might be very small - you can resize the window)
    scipy.misc.imshow(extracted_filter)
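If scipy.misc.imshow is unavailable (it was removed from newer SciPy releases), a matplotlib sketch that tiles all 32 filters from the same arr into one figure:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(4, 8)
for i, ax in enumerate(axes.flat):
    ax.imshow(arr[:, :, 0, i], cmap='gray')
    ax.axis('off')
plt.show()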
Using the TensorFlow 2 API, there are several options:
Weights can be extracted using the get_weights() function:
weights_n = model.layers[n].get_weights()[0]
Biases can be extracted using the numpy() conversion method:
bias_n = model.layers[n].bias.numpy()
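For instance, a minimal sketch (the model here is a made-up example):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(2),
])
kernel_0 = model.layers[0].get_weights()[0]  # kernel matrix, shape (8, 4)
bias_0 = model.layers[0].bias.numpy()        # bias vector, shape (4,)
print(kernel_0.shape, bias_0.shape)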
Related
I am using Keras with Tensorflow backend and want to evaluate the PSNR between two images. I am able to evaluate on all three RGB channels, like this:
def psnr(hr, sr):
    return tf.image.psnr(hr, sr, max_val=255)
using the psnr function from TensorFlow (tf.image.psnr).
But what would I do to evaluate on the first channel only? I assume I need to extract those values from the tensor somehow. In Python it is usually possible to do something like hr[:, 0, 0], but this obviously does not work here.
im1 = tf.image.convert_image_dtype(np.random.randn(64, 64, 3), tf.float32)
im2 = tf.image.convert_image_dtype(np.random.randn(64, 64, 3), tf.float32)

psnr = tf.image.psnr(tf.expand_dims(im1[:, :, 0], 2),
                     tf.expand_dims(im2[:, :, 0], 2), max_val=255)

with tf.Session() as sess:
    print(sess.run(psnr))
Get the first channel using im1[:, :, 0], then reshape it back to h x w x 1 by adding the channel dimension with expand_dims.
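In TF 2.x (eager mode), the same idea works without a session; a sketch, using [0, 1]-ranged random images and a slice that keeps the channel axis:

import numpy as np
import tensorflow as tf

im1 = tf.convert_to_tensor(np.random.rand(64, 64, 3), dtype=tf.float32)
im2 = tf.convert_to_tensor(np.random.rand(64, 64, 3), dtype=tf.float32)
# slicing with 0:1 keeps the channel axis, giving shape (64, 64, 1)
psnr = tf.image.psnr(im1[:, :, 0:1], im2[:, :, 0:1], max_val=1.0)
print(psnr.numpy())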
I want to create a network where in the input layer nodes are just connected to some nodes in the next layer. Here is a small example:
My solution so far is to set the weight of the edge between i1 and h1 to zero, and after every optimization step to multiply the weights by a matrix (I call this the mask matrix) in which every entry is 1 except the entry for the weight of the edge between i1 and h1.
(See code below)
Is this approach right? Or does it affect the gradient descent? Is there another approach to create this kind of network in TensorFlow?
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np

tf.enable_eager_execution()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation=tf.sigmoid, input_shape=(2,)),  # input shape required
    tf.keras.layers.Dense(2, activation=tf.sigmoid)
])

# set the weights
weights = [np.array([[0, 0.25], [0.2, 0.3]]), np.array([0.35, 0.35]),
           np.array([[0.4, 0.5], [0.45, 0.55]]), np.array([0.6, 0.6])]
model.set_weights(weights)
model.get_weights()

features = tf.convert_to_tensor([[0.05, 0.10]])
labels = tf.convert_to_tensor([[0.01, 0.99]])

mask = np.array([[0, 1], [1, 1]])

# define the loss function
def loss(model, x, y):
    y_ = model(x)
    return tf.losses.mean_squared_error(labels=y, predictions=y_)

# define the gradient calculation
def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

# create optimizer and global step
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
global_step = tf.train.get_or_create_global_step()

# optimization step
loss_value, grads = grad(model, features, labels)
optimizer.apply_gradients(zip(grads, model.variables), global_step)

# masking the optimized weights (set_weights needs the full list back)
all_weights = model.get_weights()
all_weights[0] = all_weights[0] * mask
model.set_weights(all_weights)
If you are looking for a solution for the specific example you provided, you can simply use the tf.keras functional API and define two Dense layers, where one is connected to both neurons in the previous layer and the other one is connected to only one of the neurons:
from tensorflow.keras.layers import Input, Lambda, Dense, concatenate
from tensorflow.keras.models import Model
inp = Input(shape=(2,))
inp2 = Lambda(lambda x: x[:,1:2])(inp) # get the second neuron
h1_out = Dense(1, activation='sigmoid')(inp2) # only connected to the second neuron
h2_out = Dense(1, activation='sigmoid')(inp) # connected to both neurons
h_out = concatenate([h1_out, h2_out])
out = Dense(2, activation='sigmoid')(h_out)
model = Model(inp, out)
# simply train it using `fit`
model.fit(...)
The problem with your solution, and some others suggested by other answers in this post, is that they do not prevent training of this weight. They allow gradient descent to train the nonexistent weight and then overwrite it retrospectively. This results in a network that has a zero in the desired location, but it negatively affects your training process: the backpropagation calculation will not see the masking step, since it is not part of the TensorFlow graph, so gradient descent will follow a path that assumes this weight has an effect on the outcome (it does not).
A better solution is to include the masking step as part of your TensorFlow graph, so that it can be factored into the gradient descent. Since the masking step is simply an elementwise multiplication by your sparse, binary mask matrix, you can include the mask matrix as an elementwise multiplication in the graph definition using tf.multiply.
Sadly this means saying goodbye to the user-friendly keras.layers methods and embracing a more nuts-and-bolts approach to TensorFlow. I can't see an obvious way to do it using the layers API.
See the implementation below, I have tried to provide comments explaining what is happening at each stage.
import tensorflow as tf

## Graph definition for model

# set up tf.placeholders for inputs x and outputs y_
# these remain fixed during training and can have values fed to them during the session
with tf.name_scope("Placeholders"):
    x = tf.placeholder(tf.float32, shape=[None, 2], name="x")    # input layer
    y_ = tf.placeholder(tf.float32, shape=[None, 2], name="y_")  # output layer

# set up tf.Variables for the weights at each layer from l1 to l3, and set up feeding of initial values
# also set up the mask as a variable and make it untrainable
with tf.name_scope("Variables"):
    w_l1_values = [[0, 0.25], [0.2, 0.3]]
    w_l1 = tf.Variable(w_l1_values, name="w_l1")
    w_l2_values = [[0.4, 0.5], [0.45, 0.55]]
    w_l2 = tf.Variable(w_l2_values, name="w_l2")

    mask_values = [[0., 1.], [1., 1.]]
    mask = tf.Variable(mask_values, trainable=False, name="mask")

# link each set of weights as matrix multiplications in the graph. Include an elementwise multiplication by mask.
# The sequence takes us from inputs x to output final_out, which will be compared to labels fed to placeholder y_
l1_out = tf.nn.relu(tf.matmul(x, tf.multiply(w_l1, mask)), name="l1_out")
final_out = tf.nn.relu(tf.matmul(l1_out, w_l2), name="output")

## define loss function and training operation
with tf.name_scope("Loss"):
    # some loss defined as a function of graph output: final_out and labels: y_
    loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=final_out, labels=y_, name="loss")

with tf.name_scope("Train"):
    # some optimisation strategy, arbitrary learning rate
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, name="optimizer_adam")
    train_op = optimizer.minimize(loss, name="train_op")

# create session, initialise variables and train according to inputs and corresponding labels
# This should show that the values of the first layer weights change, but the one set to 0 remains at 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    w_l1_tensor = sess.graph.get_tensor_by_name("Variables/w_l1:0")
    print(w_l1_tensor.eval())

    inputs = [[0.05, 0.10]]
    labels = [[0.01, 0.99]]

    train_steps = 1
    for i in range(train_steps):
        sess.run(train_op, feed_dict={"Placeholders/x:0": inputs, "Placeholders/y_:0": labels})
        print(w_l1_tensor.eval())
Or use the answer provided by @today for a Keras-friendly option.
You have multiple options here.
First, you could use the dynamic masking approach from your example. I believe it will work as expected, since the gradients w.r.t. the masked-out parameters will be zero (the output is constant when you change the unused parameters). This approach is simple and can be used even when your mask is not constant during training.
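One Keras-friendly sketch of this dynamic-masking idea (my own addition, not from the original answer): a kernel constraint re-applies the mask after every optimizer update:

import numpy as np
import tensorflow as tf

class MaskWeights(tf.keras.constraints.Constraint):
    """Re-applies a fixed binary mask to the kernel after each update."""
    def __init__(self, mask):
        self.mask = tf.constant(mask, dtype=tf.float32)
    def __call__(self, w):
        return w * self.mask

mask = np.array([[0., 1.], [1., 1.]])
layer = tf.keras.layers.Dense(2, kernel_constraint=MaskWeights(mask))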
Second, if you know beforehand which weights will always be zero, you can compose your weight matrix using tf.get_variable to get a submatrix, and then concatenate it with a tf.constant tensor, e.g.:
weights_sub = tf.get_variable("w", [dim_in, dim_out - 1])
zeros = tf.zeros([dim_in, 1])
weights = tf.concat([weights_sub, zeros], axis=1)
This example makes one column of your weight matrix always zero.
Finally, if your mask is more complex, you can use tf.get_variable on a flattened vector and then compose a tf.SparseTensor with the variable values on the used indices:
weights_used = tf.get_variable("w", [num_used_vars])
indices = ... # get your indices in a 2-D matrix of shape [num_used_vars, 2]
dense_shape = tf.constant([dim_in, dim_out]) # this is the final shape of the weight matrix
weights = tf.SparseTensor(indices, weights_used, dense_shape)
EDIT: This probably won't work in combination with Keras' set_weights method, as it expects Numpy arrays, not Tensors.
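If you then need these sparse weights in a dense op such as tf.matmul, one option (a sketch, assuming an inputs tensor of shape [batch, dim_in]; note that tf.sparse_tensor_to_dense expects the indices in row-major sorted order) is to densify them first:

# densify the sparse weight matrix before a standard matmul
weights_dense = tf.sparse_tensor_to_dense(weights)
output = tf.matmul(inputs, weights_dense)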
I need a way to have access to the weight matrix in TensorFlow or Keras within each iteration, so that I can convert it into a format that I can use in Numpy to carry out certain operations on it, then send it back to TensorFlow.
For example, I want to change my filter such that some of its neurons are specified by other neurons of the filter. They have to be obtained as solutions of linear systems with other neurons as the coefficients, not by the learning process. As I could not find a way to do this in TensorFlow or Keras, I have to use Numpy.
I have found many questions with the same or similar titles, but none of them helped. I would appreciate any hints.
EDIT
Let me explain the problem more clearly.
Consider the following code:
import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, (1, 5, 5, 1))
y = tf.placeholder(tf.float32, (1))

# create variables
weights = {
    "my_filter": tf.Variable(tf.truncated_normal([3, 3, 1, 1]), name="my_filter"),
    "f_c": tf.Variable(tf.truncated_normal([25, 1]), name="f_c")}

conv = tf.nn.conv2d(x, weights["my_filter"], [1, 1, 1, 1], padding='SAME')
flatten = tf.reshape(conv, [1, 25])
logits = tf.matmul(flatten, weights["f_c"])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

optimizer = tf.train.AdamOptimizer()
grads_and_vars = optimizer.compute_gradients(cost)
# In this part, before applying the gradients, I have to apply some complicated mathematical operation
train_op = optimizer.apply_gradients(grads_and_vars)

train_epochs = 10
input_x = np.arange(25).reshape([1, 5, 5, 1])
input_y = np.arange(1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(train_epochs):
        sess.run(train_op, feed_dict={x: input_x, y: input_y})
I have a 3x3 filter named my_filter, and I want all of its elements to be trained except one of them, for example the (1,1) element; that element must instead be determined by the rest of the elements. This has to be done in each iteration, which is exactly where my problem is. I know how to access the weight matrix after training has finished, but I do not know how to do this within each iteration.
In my code, I first compute the gradients, then make the changes, and then apply the gradients. But the problem is that the gradients come back as tuples containing Tensors, which are not easy to work with in Numpy. I need some method to convert this data to more familiar Numpy types.
Keras layers, and tf.keras.layers layers, support the get_weights / set_weights methods, which return numpy arrays for the weights. So you can call get_weights, modify the result in numpy, and then call set_weights to put the new numpy values back into tensorflow.
Something like this:
model = tf.keras.Sequential(...)
for batch in data:
    model.fit(batch)
    if ...:
        weights_as_numpy = model.get_weights()
        # modify the weights
        model.set_weights(weights_as_numpy)
For that, you will need to be able to access the weights. Instead of defining a layer using tf.layers, which allocates the variable automatically, you can first create the variable yourself and then call the corresponding tf.nn op.
# input
x = tf.placeholder(tf.float32, (1, 5, 5, 1))
dummy_input = np.arange(25).reshape([1, 5, 5, 1])

# create variable
w = tf.get_variable('weight', [3, 3, 1, 1])

# assign the variable to a layer, e.g. conv
y = tf.nn.conv2d(x, w, [1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # read the weight (no feed needed, since w does not depend on x)
    random_weight = sess.run(w)
    print('random weight', random_weight)

    # create some new values for the weight
    new_weight = np.arange(9).reshape([3, 3, 1, 1])
    # load it into the variable
    w.load(new_weight, sess)

    # read back and print to verify
    new_weight = sess.run(w)
    print('new weight', new_weight)
Usually, an activation function is applied to all neurons of a given layer as in
layer = tf.nn.relu(layer)
How can I apply an activation function to say the second neuron only?
How can I apply a specific transformation (say tf.exp()) to a specific neuron only?
Slicing out a column does not seem to apply here, since slicing a column requires knowing the number of rows, and that is unknown at construction time.
You can do slicing of dynamically-shaped tensors, just like static ones. Here I stripped everything down to a [?, 2] tensor and its 0-slice:
import numpy as np
import tensorflow as tf

x = tf.placeholder(dtype=tf.float32, shape=[None, 2], name='x')
layer = tf.nn.relu(x)

slice = layer[:, 0]
activation = tf.log(1 + tf.exp(slice))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    layer_val, slice_val, activ_val = session.run([layer, slice, activation],
                                                  feed_dict={x: np.random.randn(10, 2)})
    print(layer_val[:, 0])
    print(slice_val)
    print(activ_val)
You should see that layer_val[:, 0] is the same as slice_val, and activ_val is its transformation.
I'm trying to build an LSTM RNN that handles 3D data in TensorFlow. From this paper, Grid LSTM RNNs can be n-dimensional. The idea for my network is to have a 3D volume [depth, x, y], and the network should be [depth, x, y, n_hidden], where n_hidden is the number of LSTM cell recursive calls. The idea is that each pixel gets its own "string" of LSTM recursive calls.
The output should be [depth, x, y, n_classes]. I'm doing a binary segmentation -- think foreground and background, so the number of classes is just 2.
# Network Parameters
n_depth = 5
n_input_x = 200  # input volume width
n_input_y = 200  # input volume height
n_hidden = 128   # hidden layer num of features
n_classes = 2

# tf Graph input
x = tf.placeholder("float", [None, n_depth, n_input_x, n_input_y])
y = tf.placeholder("float", [None, n_depth, n_input_x, n_input_y, n_classes])

# Define weights
weights = {}
biases = {}

# Initialize weights
for i in xrange(n_depth * n_input_x * n_input_y):
    weights[i] = tf.Variable(tf.random_normal([n_hidden, n_classes]))
    biases[i] = tf.Variable(tf.random_normal([n_classes]))

def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_input_y, n_input_x)
    # Permuting batch_size and n_input_y
    x = tf.reshape(x, [-1, n_input_y, n_depth * n_input_x])
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_input_y*batch_size, n_input_x)
    x = tf.reshape(x, [-1, n_input_x * n_depth])

    # Split to get a list of 'n_input_y' tensors of shape (batch_size, n_hidden)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_depth * n_input_x * n_input_y, x)

    # Define a lstm cell with tensorflow
    lstm_cell = grid_rnn_cell.GridRNNCell(n_hidden, input_dims=[n_depth, n_input_x, n_input_y])
    # lstm_cell = rnn_cell.MultiRNNCell([lstm_cell] * 12, state_is_tuple=True)
    # lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.8)
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    output = []
    for i in xrange(n_depth * n_input_x * n_input_y):
        # I'll need to do some sort of reshape here on outputs[i]
        output.append(tf.matmul(outputs[i], weights[i]) + biases[i])

    return output

pred = RNN(x, weights, biases)
pred = tf.transpose(tf.pack(pred), [1, 0, 2])
pred = tf.reshape(pred, [-1, n_depth, n_input_x, n_input_y, n_classes])

temp_pred = tf.reshape(pred, [-1, n_classes])
temp_y = tf.reshape(y, [-1, n_classes])

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(temp_pred, temp_y))
Currently I'm getting the error: TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
It occurs after the RNN initialization: outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
x of course is of type float32
I am unable to tell what type GridRNNCell returns; any help here? This could be the issue. Should I be defining more arguments to this? input_dims makes sense, but what should output_dims be?
Is this a bug in the contrib code?
GridRNNCell is located in contrib/grid_rnn/python/ops/grid_rnn_cell.py
I was unsure about some of the implementation decisions in the code, so I decided to roll my own. One thing to keep in mind is that this is an implementation of just the cell. It is up to you to build the actual machinery that handles the locations and interactions of the h and m vectors; it isn't as simple as passing in your data and expecting it to traverse the dimensions properly.
So, for example, if you are working in two dimensions, start with the top-left block, take the incoming x and y vectors, concat them together, then use your cell to compute the output (which includes outgoing vectors for both x and y); it is up to you to store the output for later use in the neighboring blocks. Pass those outputs individually to each corresponding dimension, and in each of those neighboring blocks, concat the incoming vectors (again, for each dimension) and compute the output for the neighboring blocks. To do this, you'll need two for-loops, one for each dimension, as in the sketch below.
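A rough sketch of that traversal (everything here is hypothetical: cell is assumed to map the incoming (h_x, h_y) vectors of one block to its outgoing pair, and initial_state is a suitable zero state):

# Traverse a 2-D grid from the top-left block to the bottom-right one.
# h_along_x[j] is the vector flowing down column j; h_along_y flows right.
h_along_x = [initial_state] * grid_width
for i in range(grid_height):
    h_along_y = initial_state
    for j in range(grid_width):
        h_along_x[j], h_along_y = cell(h_along_x[j], h_along_y)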
Perhaps the version in contrib will work for this, but I have a couple of problems with it (I could be wrong here, but as far as I can tell):
1) The vectors are handled using concat and slice rather than with tuples. This will likely result in slower performance.
2) It looks like the input is projected at each step, which doesn't sit well with me. In the paper they only project into the network for incoming blocks along the edge of the grid and not throughout.
If you look at the code, it is actually very simple. Perhaps reading the paper and adjusting the code as needed, or rolling your own, is your best bet. And remember that the cell is only good for performing the recurrence at each step, not for managing the incoming and outgoing h and m vectors.
Which version of the Grid LSTM cells are you using?
If you are using https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/rnn_cell.py
I think you can try to initialize 'feature_size' and 'frequency_skip'.
Also, I think there may be another bug: feeding a dynamic shape into this version may cause a TypeError.
Yes, dynamic shape was the cause. There is a PR to fix this: https://github.com/tensorflow/tensorflow/pull/4631
@jstaker7: Thank you for trying it out. Re. problem 1, the above PR uses tuples for states and outputs; hopefully it can address the performance issue. GridRNNCell was created a while ago, at a time when all the LSTMCells in TensorFlow used concat/slice instead of tuples.
Re. problem 2, GridRNNCell will not project the input if you pass None. A dimension can be both input and recurrent, and when there is no input (inputs = None), it will use the recurrent tensors for the computation. We can also use 2 input dimensions, by instantiating the GridRNNCell directly.
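For instance (a sketch only; the exact constructor arguments vary across versions, so treat these as assumptions and check your grid_rnn_cell.py):

from tensorflow.contrib.grid_rnn.python.ops import grid_rnn_cell

# A 2-D grid cell where both dimensions receive input and emit output.
cell = grid_rnn_cell.GridRNNCell(num_units=128, num_dims=2,
                                 input_dims=[0, 1], output_dims=[0, 1])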
Of course, writing a generic class for all cases makes the code look a bit convoluted, and I think it needs better documentation.
Anyway, it would be great if you could share your improvements, or any ideas you might have to make it clearer/more useful. That is the nature of an open-source project, anyway.