Is there some way to create variables inside a map_fn loop, as shown in the code below? How can I solve this error while keeping a variable in the loop? The info log does not really help me either, so am I getting some TensorFlow concept fundamentally wrong here? [tensorflow 1.14.0, python 3.6.8]
import tensorflow as tf
### function called in map_fn
def opt_variable(theta):
    init_theta = lambda: theta
    var_theta = tf.get_variable(dtype=tf.float32, initializer=tf.Variable(init_theta))
    ### ... other steps which need variable type to optimize
    return tf.constant(3.) # some return
def iterate_over_cols(theta):
    iter_cols = tf.range(5)
    map_theta = tf.map_fn(lambda x: (opt_variable(theta[x])),
                          iter_cols, dtype=tf.float32 )
    return map_theta
### example run
t_test = tf.convert_to_tensor([1.4, 3.1, 4.6, 6.3], dtype=tf.float32)
iterate_over_cols(t_test)
leads to this error:
ValueError: Cannot use 'map_18/while/strided_slice' as input to
'map_18/while/Variable/Assign' because 'map_18/while/strided_slice' is
in a while loop. See info log for more details.
It seems that you cannot use nested while loops in this version, which means you cannot use the output of one map_fn as the input to another.
I apologize for the poor question title but I'm not sure quite how to phrase it. Here's the problem I'm trying to solve: I have two NNs working off of the same input dataset in my code. One of them is a traditional network while the other is used to limit the acceptable range of the first. This works by using a tf.where() statement which works fine in most cases, such as this toy example:
pcts= [0.04,0.06,0.06,0.06,0.06,0.06,0.06,0.04,0.04,0.04]
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
Which gives the correct result: legal_actions = [0,1,1,1,1,1,1,0,0,0]
I can then multiply this by the output of my first network to limit its Q values to only those of the legal actions. In a case like the above this works great.
However, it is also possible that my original vector looks something like this, with low values in the middle of the high values: pcts= [0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]
Using the same code as above my legal_actions comes out as this: legal_actions = [0,1,1,0,0,1,1,0,0,0]
Based on the code I have this is correct, however, I'd like to include any zeros in the middle as part of my legal_actions. In other words, I'd like this second example to be the same as the first. Working in basic TF this is easy to do in several different ways, such as in this reproducible example (it's also easy to do with sparse tensors):
import tensorflow as tf
pcts= tf.placeholder(tf.float32, shape=(10,))
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
mask = tf.where(tf.greater(legal_actions,0))
legals = tf.cast(tf.range(tf.reduce_min(mask),tf.reduce_max(mask)+1),tf.int64)
oh = tf.one_hot(legals,10)
oh = tf.reduce_sum(oh,0)
with tf.Session() as sess:
    print(sess.run(oh,feed_dict={pcts:[0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]}))
The problem that I'm running into is when I try to apply this to my actual code which is reading in batches from a file. I can't figure out a way to fill in the "gaps" in my tensor without the range function and/or I can't figure out how to make the range function work with batches (it will only make one range at a time, not one per batch, as near as I can tell). Any suggestions on how to either make what I'm working on work or how to solve the problem a completely different way would be appreciated.
Try this code:
import tensorflow as tf
pcts = tf.random.uniform((2,3,4))
a = pcts>=0.5
shape = tf.shape(pcts)[-1]
a = tf.reshape(a, (-1, shape))
a = tf.cast(a, dtype=tf.float32)
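def rng(t):
    # tf.maximum works element-wise on tensors; Python's built-in max() would try to use a tensor as a bool
    left = tf.scan(lambda a, x: tf.maximum(a, x), t)
    right = tf.scan(lambda a, x: tf.maximum(a, x), t, reverse=True)
    return tf.minimum(left, right)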
a = tf.map_fn(lambda x: rng(x), a)
a = tf.reshape(a, (tf.shape(pcts)))
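The idea behind the answer above (a running maximum from the left, a running maximum from the right, then the element-wise minimum of the two) can be checked without TensorFlow. Here is a minimal pure-Python sketch of the same trick; the helper name fill_gaps and the sample batch are made up for illustration and are not part of the original answer.

```python
from itertools import accumulate

def fill_gaps(mask):
    """Set every entry between the first and last 1 of a 0/1 list to 1."""
    left = list(accumulate(mask, max))                   # running max, left to right
    right = list(accumulate(reversed(mask), max))[::-1]  # running max, right to left
    # An entry is "inside" only if a 1 occurs at or before it AND at or after it.
    return [min(l, r) for l, r in zip(left, right)]

# Hypothetical batch of legal-action masks, one row per example.
batch = [
    [0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]
filled = [fill_gaps(row) for row in batch]
for row in filled:
    print(row)
```

Each row is handled independently, which is what the tf.map_fn call over the reshaped tensor does in the TensorFlow version.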
I'm new here, studying TensorFlow, and I've run into a problem.
import model_method
fittt(model_method.build(self,...),...parameters...)
The above is in the main.py importing model_method.py. Function fittt in main.py:
def fittt(model,...):
    model.fit(...)
build() in model_method.py:
def build(self,...):
    self.op_C,self.op_A = self.function_A(...)
    self.op_B = self.function_B(self.op_C,...)
fit() in model_method.py:
def fit(self,...):
    sess = tf.Session(graph=self.graph,config=config)
    BB,AA = sess.run([self.op_B,self.op_A],feed_dict)
To check running process, I added pdb.set_trace() at the beginning of function_A() and function_B() in model_method.py as follows:
def function_A(self,...):
    pdb.set_trace()
    ......
def function_B(self,...):
    pdb.set_trace()
    ......
The two pdb.set_trace() calls only stopped when build() was called and did nothing when sess.run([self.op_B,self.op_A],feed_dict) was called. So it seems that sess.run() doesn't actually run function_A() and function_B(). Why is that, and how can I make the two functions run?
By calling the model_method.build() function you create a computation graph. In this call every line of code is executed (hence why pdb stopped).
However, tf.Session.run(...) executes only those parts of computational graph which are necessary to compute the fetched values (self.op_A, self.op_B in your example). The function does not execute the entire build() function again.
Therefore pdb.set_trace() did not execute when you ran sess.run(...): it is ordinary Python code, not a Tensor or an op, and hence it is not part of the computational graph.
UPDATE
Consider the following:
import numpy as np
import tensorflow as tf
class My_Model:
    def __init__(self):
        self.np_input = np.random.normal(size=(10,2)) # 10x2
    def build(self):
        self._in = tf.placeholder(dtype=tf.float32, shape=[10, None]) # matrix 10xN
        W_exception = tf.random_normal(dtype=tf.float32, shape=[3,3]) # matrix 3x3
        W_success = tf.random_normal(dtype=tf.float32, shape=[2,3]) # matrix 2x3
        self.op_exception = tf.matmul(self._in, W_exception) # [10x2] x [3x3] = ERROR
        self.op_success = tf.matmul(self._in, W_success) # [10x2] x [2x3] = [10x3]
        print('Computational Graph Built')
    def fit_success(self):
        with tf.Session() as sess:
            res = sess.run(self.op_success, feed_dict={self._in : self.np_input})
            print('Result shape: {}'.format(res.shape))
    def fit_exception(self):
        with tf.Session() as sess:
            res = sess.run(self.op_exception, feed_dict={self._in : self.np_input})
            print('Result shape: {}'.format(res.shape))
and then calling:
m = My_Model()
m.build()
#> Computational Graph Built
m.fit_success()
#> Result shape: (10, 3)
m.fit_exception()
#> InvalidArgumentError: Matrix size-incompatible: In[0]: [10,2], In[1]: [3,3]
So to explain what you see there. We first define the computational graph in the build() function. The _in is our input tensor; None means the dimension 1 is determined dynamically - that is once we provide a tensor with specified values.
Then we defined two matrices W_exception and W_success which have all dimensions specified and their values will be randomly generated.
Then we define two operations, matrix multiplication, that each returns a tensor.
We called the build() function and created the computational graph, print() function is also executed but NOT added to the graph. Nothing is computed here. In fact, it can't even be, because the values of _in are not specified.
Now to show, that only necessary parts required for computation are evaluated, we call the fit_success() function, which simply multiplies the input tensor _in with the W_success tensor (with correct dimensions). We receive a tensor with correct shape: [10x3]. Note, that we receive no error that op_exception cannot be computed due to mismatched dimensions. That's because we do not need it to evaluate op_success.
Lastly, I just show that exception is indeed thrown when we try to evaluate the op_exception with the same input tensor.
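The build-versus-run distinction described above can also be mimicked in plain Python. What follows is only a toy sketch of deferred evaluation, not how TensorFlow is implemented; Node, run and build are hypothetical names invented for this illustration. Building the graph executes ordinary Python once, while running a node evaluates only the nodes it depends on, so a broken node does no harm until it is actually fetched.

```python
class Node:
    """A hypothetical graph node: a function plus the nodes it takes as input."""
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, inputs

evaluated = []  # records which nodes were actually computed

def run(node, feed):
    """Evaluate `node`, recursively evaluating only the nodes it depends on."""
    if node.fn is None:                 # placeholder: the value comes from the feed
        return feed[node.name]
    args = [run(dep, feed) for dep in node.inputs]
    evaluated.append(node.name)
    return node.fn(*args)

def build():
    # This print is plain Python: it runs while the graph is being defined,
    # and never again when nodes are evaluated.
    print("graph built")
    x = Node("x", None)
    ok = Node("ok", lambda v: v * 2, (x,))
    bad = Node("bad", lambda v: v / 0, (x,))  # would fail, but only if evaluated
    return ok, bad

ok, bad = build()
result = run(ok, {"x": 21})
print(result, evaluated)
```

Calling run(bad, {"x": 21}) would raise ZeroDivisionError, in the same way that fetching op_exception raises while fetching op_success does not.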
I would like to write a TensorFlow op in python, but I would like it to be differentiable (to be able to compute a gradient).
This question asks how to write an op in python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python
The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html
In my case, I am prototyping so I don't care about whether it runs on GPU, and I don't care about it being usable from anything other than the TF python API.
Yes, as mentioned in @Yaroslav's answer, it is possible, and the key is the links he references: here and here. I want to elaborate on this answer by giving a concrete example.
Modulo operation: Let's implement the element-wise modulo operation in tensorflow (it already exists but its gradient is not defined; for the sake of the example we will implement it from scratch).
Numpy function: The first step is to define the operation we want for numpy arrays. The element-wise modulo operation is already implemented in numpy, so it is easy:
import numpy as np
def np_mod(x,y):
    return (x % y).astype(np.float32)
The reason for the .astype(np.float32) is because by default tensorflow takes float32 types and if you give it float64 (the numpy default) it will complain.
Gradient function: Next we need to define the gradient function for our operation, for each input of the operation, as a tensorflow function. The function needs to take a very specific form: it takes the tensorflow representation of the operation, op, and the gradient of the output, grad, and says how to propagate the gradients. In our case the gradients of the mod operation are easy: the derivative is 1 with respect to the first argument and -floor(x/y) with respect to the second (almost everywhere, and infinite at a finite number of spots, but let's ignore that; see https://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator for details). So we have
def modgrad(op, grad):
    x = op.inputs[0] # the first argument (normally you need those to calculate the gradient, like the gradient of x^2 is 2x. )
    y = op.inputs[1] # the second argument
    return grad * 1, grad * tf.neg(tf.floordiv(x, y)) #the propagated gradient with respect to the first and second argument respectively
The grad function needs to return an n-tuple where n is the number of arguments of the operation. Notice that we need to return tensorflow functions of the input.
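As a sanity check on the two partial derivatives used in modgrad, here is a small pure-Python sketch that compares them with central finite differences. The names mod_grads and numeric_grads and the sample points are illustrative only and are not part of TensorFlow or of the original answer; the points are chosen away from the jumps of the remainder function, where the derivative is not defined.

```python
import math

def mod_grads(x, y):
    """Analytic partial derivatives of x % y (valid away from the jumps)."""
    return 1.0, -math.floor(x / y)

def numeric_grads(x, y, h=1e-6):
    """Central finite differences of x % y with respect to x and to y."""
    d_dx = ((x + h) % y - (x - h) % y) / (2 * h)
    d_dy = (x % (y + h) - x % (y - h)) / (2 * h)
    return d_dx, d_dy

# Sample points (positive values, not sitting on a discontinuity).
samples = [(0.3, 0.2), (0.7, 0.5), (1.2, 1.0), (1.7, 2.9), (7.5, 2.0)]
for x, y in samples:
    print((x, y), mod_grads(x, y), numeric_grads(x, y))
```

The analytic and numeric columns should agree up to small rounding error, which supports using 1 and -floor(x/y) as the two gradients.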
Making a TF function with gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function using tf.RegisterGradient and tf.Graph.gradient_override_map.
Copying the code from harpone we can modify the tf.py_func function to make it define the gradient at the same time:
import tensorflow as tf
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, int(1E+8)))
    tf.RegisterGradient(rnd_name)(grad) # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
The stateful option tells tensorflow whether the function always gives the same output for the same input (stateful = False), in which case tensorflow can simplify the graph. This is our case and will probably be the case in most situations.
Combining it all together: Now that we have all the pieces, we can combine them all together:
from tensorflow.python.framework import ops
def tf_mod(x,y, name=None):
    with ops.op_scope([x,y], name, "mod") as name:
        z = py_func(np_mod,
                    [x,y],
                    [tf.float32],
                    name=name,
                    grad=modgrad) # <-- here's the call to the gradient
        return z[0]
tf.py_func acts on lists of tensors (and returns a list of tensors), that is why we have [x,y] (and return z[0]).
And now we are done. And we can test it.
Test:
with tf.Session() as sess:
    x = tf.constant([0.3,0.7,1.2,1.7])
    y = tf.constant([0.2,0.5,1.0,2.9])
    z = tf_mod(x,y)
    gr = tf.gradients(z, [x,y])
    tf.initialize_all_variables().run()
    print(x.eval(), y.eval(),z.eval(), gr[0].eval(), gr[1].eval())
[ 0.30000001 0.69999999 1.20000005 1.70000005] [ 0.2 0.5 1. 2.9000001] [ 0.10000001 0.19999999 0.20000005 1.70000005] [ 1. 1. 1. 1.] [ -1. -1. -1. 0.]
Success!
Here's an example of adding gradient to a specific py_func
https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
Here's the issue discussion
Suppose we have a variable:
x = tf.Variable(...)
This variable can be updated during the training process using the assign() method.
What is the best way to get the current value of a variable?
I know we could use this:
session.run(x)
But I'm afraid this would trigger a whole chain of operations.
In Theano, you could just do
y = theano.shared(...)
y_vals = y.get_value()
I'm looking for the equivalent thing in TensorFlow.
The only way to get the value of the variable is by running it in a session. In the FAQ it is written that:
A Tensor object is a symbolic handle to the result of an operation,
but does not actually hold the values of the operation's output.
So TF equivalent would be:
import tensorflow as tf
x = tf.Variable([1.0, 2.0])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    v = sess.run(x)
    print(v) # will show you your variable.
The part with init = global_variables_initializer() is important and should be done in order to initialize variables.
Also, take a look at InteractiveSession if you work in IPython.
In general, session.run(x) will evaluate only the nodes that are necessary to compute x and nothing else, so it should be relatively cheap if you want to inspect the value of the variable.
Take a look at this great answer https://stackoverflow.com/a/33610914/5543198 for more context.
tf.Print can simplify your life!
tf.Print prints the value of the tensor(s) you pass to it, at the point in the graph where the tf.Print op sits, whenever that op is evaluated.
So for example:
import tensorflow as tf
x = tf.Variable([1.0, 2.0])
x = tf.Print(x,[x])
x = 2* x
sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(x) # evaluating x triggers the tf.Print op, which prints:
[1.0 2.0 ]
because it prints the value of x at the moment when the tf.Print line is. If instead you do
v = x.eval(session=sess)
print(v)
you will get:
[2.0 4.0 ]
because it will give you the final value of x.
TensorFlow 2.0.0 dropped tf.Session in favor of eager execution (tf.Variable itself still exists).
If you want to extract values from a tensor (e.g. "net"), you can use this:
net[tf.newaxis,:,:].numpy()
I have what is conceptually a simple question about Theano but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite many hours with the tutorials).
I'm trying to implement a "deconvolutional network"; specifically I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the ith input codes[i] represents a set of codewords which together code for input i.
I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:
idx = T.lscalar()
pre_loss_conv = conv2d(input = codes[idx].dimshuffle('x', 0, 1,2),
filters = dicts.dimshuffle('x', 0,1, 2),
border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)
del_codes = T.grad(loss, codes[idx])
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates = [
[codes, T.set_subtensor(codes[input_index], codes[input_index] -
learning_rate*del_codes[input_index]) ]])
(here codes and dicts are shared tensor variables). Theano is unhappy with this, specifically with defining
del_codes = T.grad(loss, codes[idx])
The error message I'm getting is: theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0
I'm guessing that it wants a symbolic variable instead of codes[idx]; but then I'm not sure how to get everything connected to get the intended effect. I'm guessing I'll need to change the final line to something like
learning_rate*del_codes) ]])
Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano but I'm not sure what.
Thanks in advance!
-Justin
Update: Kyle's suggestion worked very nicely. Here's the specific code I used
current_codes = T.tensor3('current_codes')
current_codes = codes[input_index]
pre_loss_conv = conv2d(input = current_codes.dimshuffle('x', 0, 1,2),
filters = dicts.dimshuffle('x', 0,1, 2),
border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)
del_codes = T.grad(loss, current_codes)
train_codes = function([input_index], loss)
train_dicts = theano.function([input_index], loss, updates = [[dicts, dicts - learning_rate*del_dicts]])
codes_update = ( codes, T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes) )
codes_update_fn = function([input_index], updates = [codes_update])
for i in xrange(num_inputs):
    current_loss = train_codes(i)
    codes_update_fn(i)
To summarize the findings:
Assigning grad_var = codes[idx], then making a new variable such as:
subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes[input_index])
Then calling
train_codes = function([input_index], loss, updates = [[codes, subgrad]])
seemed to do the trick. In general, I try to make variables for as many things as possible. Sometimes tricky problems can arise from trying to do too much in a single statement, plus it is hard to debug and understand later! Also, in this case I think theano needs a shared variable, but has issues if the shared variable is created inside the function that requires it.
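To make the update pattern concrete, here is a minimal pure-Python sketch of a gradient step applied to one slice of a larger array. It is not Theano code: set_subtensor below is a hypothetical stand-in that only mirrors the "return an updated copy" behaviour, and the plain squared-error loss replaces the convolution from the question purely for illustration.

```python
def set_subtensor(codes, idx, new_row):
    """Return a copy of `codes` with row `idx` replaced (the input is not mutated)."""
    updated = [row[:] for row in codes]
    updated[idx] = list(new_row)
    return updated

def loss_and_grad(code, target):
    """Loss 0.5 * sum((target - code)^2) and its gradient with respect to `code`."""
    diff = [c - t for c, t in zip(code, target)]
    return 0.5 * sum(d * d for d in diff), diff

# Made-up data: two codes and the targets they should match.
codes = [[0.0, 0.0], [1.0, 1.0]]
targets = [[1.0, 2.0], [1.0, 1.0]]
learning_rate = 0.5

idx = 0
loss_before, grad = loss_and_grad(codes[idx], targets[idx])
new_row = [c - learning_rate * g for c, g in zip(codes[idx], grad)]
codes = set_subtensor(codes, idx, new_row)   # only row `idx` changes
loss_after, _ = loss_and_grad(codes[idx], targets[idx])
print(loss_before, loss_after, codes)
```

The point of the sketch is that the gradient is taken with respect to the selected slice, and the update writes the new slice back into the full array while leaving the other rows alone.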
Glad this worked for you!