Defining a gradient with respect to a subtensor in Theano

I have what is conceptually a simple question about Theano, but I haven't been able to find the answer (I'll confess up front that I don't really understand how shared variables work in Theano, despite many hours with the tutorials).
I'm trying to implement a "deconvolutional network". Specifically, I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the i-th input, codes[i] represents a set of codewords which together code for input i.
I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:
idx = T.lscalar()
pre_loss_conv = conv2d(input = codes[idx].dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)
del_codes = T.grad(loss, codes[idx])
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes[input_index]) ]])
(Here codes and dicts are shared tensor variables.) Theano is unhappy with this, specifically with the line
del_codes = T.grad(loss, codes[idx])
The error message I'm getting is: theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0
I'm guessing that it wants a symbolic variable instead of codes[idx], but then I'm not sure how to connect everything to get the intended effect. I'm guessing I'll need to change the final line to something like
learning_rate*del_codes) ]])
Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano but I'm not sure what.
Thanks in advance!
-Justin
Update: Kyle's suggestion worked very nicely. Here's the specific code I used:
import theano
import theano.tensor as T
from theano import function
from theano.tensor.nnet import conv2d
# codes, dicts and inputs are shared variables; learning_rate and num_inputs are defined elsewhere
input_index = T.lscalar('input_index')
# Build the slice ONCE and reuse this same variable for the loss and for T.grad
current_codes = codes[input_index]
pre_loss_conv = conv2d(input = current_codes.dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)
del_codes = T.grad(loss, current_codes)
del_dicts = T.grad(loss, dicts)
train_codes = function([input_index], loss)
train_dicts = function([input_index], loss, updates = [(dicts, dicts - learning_rate*del_dicts)])
# Overwrite only slice input_index of the shared codes tensor
codes_update = (codes, T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes))
codes_update_fn = function([input_index], updates = [codes_update])
for i in range(num_inputs):
    current_loss = train_codes(i)
    codes_update_fn(i)

To summarize the findings:
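Assign grad_var = codes[input_index] once, build the loss from grad_var, and take del_codes = T.grad(loss, grad_var). Then make a new variable such as:
subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes)
Then call
train_codes = function([input_index], loss, updates = [(codes, subgrad)])
That seemed to do the trick. The reason it works is that every evaluation of codes[idx] creates a new Subtensor node, so the codes[idx] passed to T.grad was a different node from the one the loss was built from (hence the DisconnectedInputError); storing the slice in one variable and reusing it keeps them the same node. In general, I try to make variables for as many things as possible: tricky problems can arise from trying to do too much in a single statement, and it is hard to debug and understand later! Also, in this case I think Theano needs a shared variable, but has issues if the shared variable is created inside the function that requires it.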
Glad this worked for you!
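To see what del_codes and the set_subtensor update actually compute, here is a NumPy sketch of one code-update step (plain NumPy, not Theano). It assumes that Theano's conv2d with border_mode='valid' performs a true convolution (kernel flipped) summed over the K code channels, so each code map is larger than its image by the dictionary size minus one. The helper names (conv2d_valid, correlate2d_full, loss_and_grad, update_codes) are hypothetical. The gradient of the squared error with respect to each code map is the full cross-correlation of the residual with the matching dictionary element, and only slice i of codes is overwritten, which is what T.set_subtensor(codes[i], ...) does to the shared variable.

```python
import numpy as np


def conv2d_valid(img, kern):
    """2-D 'valid' convolution with a flipped kernel (assumed to match Theano's conv2d)."""
    kh, kw = kern.shape
    flipped = kern[::-1, ::-1]
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(img[m:m + kh, n:n + kw] * flipped)
    return out


def correlate2d_full(img, kern):
    """2-D 'full' cross-correlation: the adjoint of conv2d_valid with the same kernel."""
    kh, kw = kern.shape
    padded = np.pad(img, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    out = np.empty((padded.shape[0] - kh + 1, padded.shape[1] - kw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(padded[m:m + kh, n:n + kw] * kern)
    return out


def loss_and_grad(codes_i, dicts, input_i):
    """0.5 * ||input_i - sum_k conv(codes_i[k], dicts[k])||^2 and its gradient w.r.t. codes_i."""
    recon = sum(conv2d_valid(c, d) for c, d in zip(codes_i, dicts))
    residual = recon - input_i
    loss = 0.5 * np.sum(residual ** 2)
    grad = np.stack([correlate2d_full(residual, d) for d in dicts])
    return loss, grad


def update_codes(codes, dicts, inputs, i, learning_rate):
    """Gradient step on codes[i] only; returns the loss before the step."""
    loss, grad = loss_and_grad(codes[i], dicts, inputs[i])
    codes[i] = codes[i] - learning_rate * grad  # every other slice is left untouched
    return loss
```

Because codes[i] is assigned in place, codes[j] for j != i keeps its value, mirroring the shared-variable update in the accepted code.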

Related

How to build TF tensor with ones in specified locations - batch compatible

I apologize for the poor question title but I'm not sure quite how to phrase it. Here's the problem I'm trying to solve: I have two NNs working off of the same input dataset in my code. One of them is a traditional network while the other is used to limit the acceptable range of the first. This works by using a tf.where() statement which works fine in most cases, such as this toy example:
pcts = tf.constant([0.04,0.06,0.06,0.06,0.06,0.06,0.06,0.04,0.04,0.04])
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
Which gives the correct result: legal_actions = [0,1,1,1,1,1,1,0,0,0]
I can then multiply this by the output of my first network to limit its Q values to only those of the legal actions. In a case like the above this works great.
However, it is also possible that my original vector looks something like this, with low values in the middle of the high values: pcts= [0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]
Using the same code as above my legal_actions comes out as this: legal_actions = [0,1,1,0,0,1,1,0,0,0]
Based on the code I have this is correct, however, I'd like to include any zeros in the middle as part of my legal_actions. In other words, I'd like this second example to be the same as the first. Working in basic TF this is easy to do in several different ways, such as in this reproducible example (it's also easy to do with sparse tensors):
import tensorflow as tf
pcts = tf.placeholder(tf.float32, shape=(10,))
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
mask = tf.where(tf.greater(legal_actions,0))
legals = tf.cast(tf.range(tf.reduce_min(mask),tf.reduce_max(mask)+1),tf.int64)
oh = tf.one_hot(legals,10)
oh = tf.reduce_sum(oh,0)
with tf.Session() as sess:
    print(sess.run(oh,feed_dict={pcts:[0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]}))
The problem that I'm running into is when I try to apply this to my actual code which is reading in batches from a file. I can't figure out a way to fill in the "gaps" in my tensor without the range function and/or I can't figure out how to make the range function work with batches (it will only make one range at a time, not one per batch, as near as I can tell). Any suggestions on how to either make what I'm working on work or how to solve the problem a completely different way would be appreciated.

Try this code:
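import tensorflow as tf
pcts = tf.random.uniform((2, 3, 4))
a = pcts >= 0.5
shape = tf.shape(pcts)[-1]
# Flatten all batch dimensions so that each row is one mask
a = tf.reshape(a, (-1, shape))
a = tf.cast(a, dtype=tf.float32)
def rng(t):
    # Running maximum from the left: 1 from the first 1 onwards
    left = tf.scan(lambda acc, x: tf.maximum(acc, x), t)
    # Running maximum from the right: 1 up to the last 1
    right = tf.scan(lambda acc, x: tf.maximum(acc, x), t, reverse=True)
    # Both are 1 only between the first and the last 1 (inclusive)
    return tf.minimum(left, right)
a = tf.map_fn(rng, a)
a = tf.reshape(a, tf.shape(pcts))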
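The same "fill between the first and last 1" idea can also be expressed without tf.map_fn or tf.scan, using cumulative maxima along the last axis, which handles any number of leading batch dimensions at once. A NumPy sketch (fill_gaps is a hypothetical name, not part of any library):

```python
import numpy as np


def fill_gaps(mask):
    """Set every position between the first and last nonzero entry (last axis) to 1."""
    mask = np.asarray(mask, dtype=np.float32)
    last = mask.ndim - 1
    # 1 from the first 1 onwards
    left = np.maximum.accumulate(mask, axis=last)
    # 1 up to the last 1: accumulate over the reversed axis, then reverse back
    right = np.flip(np.maximum.accumulate(np.flip(mask, axis=last), axis=last), axis=last)
    return np.minimum(left, right)


pcts = np.array([0.04, 0.06, 0.06, 0.04, 0.04, 0.06, 0.06, 0.04, 0.04, 0.04])
print(fill_gaps(pcts >= 0.05).tolist())
# -> [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

In TensorFlow the same two passes could probably be written as tf.math.cumsum(mask, axis=-1) > 0 and tf.math.cumsum(mask, axis=-1, reverse=True) > 0, combined with tf.logical_and; that variant is a suggestion and has not been tested here.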

What is the best approach to deal with batches within a Lambda layer?

I created a neural network with Keras and added a Lambda layer to perform some calculations, but it shows poor performance at inference time.
I was able to make the inferences successfully using a batch of one input and added one more loop to handle multiple inputs. Everything works fine, but the performance is somewhat poor. I figured using a larger batch would make things a lot faster. My question is whether I am handling batches correctly (is it really necessary to use another loop?) as I have not found any keras or tensorflow documentation dealing with this topic in more depth.
Below is a code with a structure similar to the one I'm using in the Lambda layer.
import numpy as np
import tensorflow as tf
from keras import backend
def GenericFunc(x, batch=10, channels=64):
    y, group = [], []
    for i in range(batch):
        for j in range(channels):
            # sum over the spatial axes of sample i, channel j
            y.append(backend.sum(x[i, :, :, j]))
        group.append(tf.convert_to_tensor(y, dtype=np.float32))
        y = []
    yy = backend.stack(group, axis=0)
    tensor_stack = backend.reshape(yy, [batch, channels])
    return tensor_stack
Any suggestions will be welcome!

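Avoid Python loops here: tensors are made for tensor operations, and one reduction over the spatial axes handles the whole batch at once.
def GenericFunc(x):
    y = backend.sum(x, axis=1)  # sum over height
    y = backend.sum(y, axis=1)  # then over width
    return y
backend.sum also accepts a list of axes, so this should work too:
def GenericFunc(x):
    return backend.sum(x, axis=[1, 2])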
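A quick NumPy check that the loop and a single reduction give the same (batch, channels) result. NumPy stands in for keras.backend here, and a channels-last layout (batch, height, width, channels) is assumed, as the x[..., j] indexing in the question suggests; both function names are hypothetical.

```python
import numpy as np


def generic_loop(x):
    """Loop version: one sum per (sample, channel) pair."""
    batch, _, _, channels = x.shape
    out = np.empty((batch, channels))
    for i in range(batch):
        for j in range(channels):
            out[i, j] = np.sum(x[i, :, :, j])
    return out


def generic_vectorized(x):
    """Same result with one reduction over the two spatial axes."""
    return np.sum(x, axis=(1, 2))


x = np.random.default_rng(1).normal(size=(10, 5, 6, 64))
print(np.allclose(generic_loop(x), generic_vectorized(x)))
```

The vectorized version does all the work in one call, which is why it scales with the batch size instead of adding Python overhead per sample and channel.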

How should Euler integration be implemented in TensorFlow?

I want to write a crude Euler simulation of a set of PDEs. I read the PDE tutorial on tensorflow.org and I am a little puzzled about how to do this properly. I have two specific questions but would welcome further feedback if there is anything I have overlooked or misunderstood.
The following code is from the tutorial:
# Discretized PDE update rules
U_ = U + eps * Ut
Ut_ = Ut + eps * (laplace(U) - damping * Ut)
# Operation to update the state
step = tf.group(
    U.assign(U_),
    Ut.assign(Ut_))
Question 1
Isn't there a bug here? Once U.assign(U_) has been evaluated, surely the next evaluation of Ut_ will use the updated value of U rather than the value from the same time step? I would have thought that the correct way to do it would be as follows:
delta_U = tf.Variable(dU_init)
delta_Ut = tf.Variable(dUt_init)
delta_step = tf.group(
    delta_U.assign(Ut),
    delta_Ut.assign(laplace(U) - damping * Ut)
)
update_step = tf.group(
    U.assign_add(eps * delta_U),
    Ut.assign_add(eps * delta_Ut)
)
We could then run Euler integration steps by alternating evaluations of delta_step and update_step. If I understand correctly, this could be done via separate invocations of Session.run():
with tf.Session() as sess:
    ...
    for i in range(1000):
        sess.run(delta_step)
        sess.run(update_step)
Question 2
It seems frustrating that a single operation can't be defined that combines both steps in a fixed order, e.g.
combined_update = tf.group(delta_step, update_step)
with tf.Session() as sess:
    ...
    for i in range(1000):
        sess.run(combined_update)
but according to an answer on this thread, tf.group() does not guarantee any particular evaluation order. The approach described on that thread for controlling evaluation order involves something called "control dependencies"; can they be used in this instance, where we want to ensure that repeated evaluations of two tensors are made in a fixed order?
If not, is there another way to control the order of evaluation of these tensors, beyond explicitly using sequential Session.run() calls?
Update (12/02/2019)
Based on jdehesa's answer, I investigated in greater detail. The results support my original intuition that there is a bug in the PDE tutorial, which produces incorrect results due to the inconsistent evaluation order of tf.assign() calls; this is not resolved by using control dependencies. However, the method from the PDE tutorial usually produces correct results, and I don't understand why.
I checked the results of running the assignment operations in an explicit order, using the following code:
import tensorflow as tf
import numpy as np
# define two variables a and b, and the PDEs that govern them
a = tf.Variable(0.0)
b = tf.Variable(1.0)
da_dt_ = b * 2
db_dt_ = 10 - a * b
dt = 0.1 # integration step size
# after one step of Euler integration, we should have
# a = 0.2 [ = 0.0 + (1.0 * 2) * 0.1 ]
# b = 2.0 [ = 1.0 + (10 - 0.0 * 1.0) * 0.1 ]
# using the method from the PDE tutorial, define updated values for a and b
a_ = a + da_dt_ * dt
b_ = b + db_dt_ * dt
# and define the update operations
assignA = a.assign(a_)
assignB = b.assign(b_)
# define a higher-order function that runs a particular simulation n times
# and summarises the results
def summarise(simulation, n=500):
    runs = np.array( [ simulation() for i in range(n) ] )
    summary = dict( { (tuple(run), 0) for run in np.unique(runs, axis=0) } )
    for run in runs:
        summary[tuple(run)] += 1
    return summary
# check the results of running the assignment operations in an explicit order
def explicitOrder(first, second):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(first)
        sess.run(second)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: explicitOrder(assignA, assignB)) )
# prints {(0.2, 1.98): 500}
print( summarise(lambda: explicitOrder(assignB, assignA)) )
# prints {(0.4, 2.0): 500}
As expected, if we evaluate assignA first, then a gets updated to 0.2, and this updated value is then used to update b to 1.98. If we evaluate assignB first, b is first updated to 2.0, and this updated value is then used to update a to 0.4. Both are wrong answers for the Euler integration step: what we ought to get is a = 0.2, b = 2.0.
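The three outcomes above can be reproduced in plain Python, without TensorFlow. A correct (simultaneous) Euler step computes both derivatives from the old state before updating either variable, while the two sequential orders let the second update see the result of the first. The helper names are hypothetical; the system is the toy one from the snippet above (da/dt = 2b, db/dt = 10 - ab).

```python
def derivatives(a, b):
    # toy system: da/dt = 2b, db/dt = 10 - a*b
    return 2 * b, 10 - a * b


def euler_simultaneous(a, b, dt):
    da, db = derivatives(a, b)  # both derivatives use the old state
    return a + da * dt, b + db * dt


def euler_a_then_b(a, b, dt):
    a = a + derivatives(a, b)[0] * dt
    b = b + derivatives(a, b)[1] * dt  # sees the already-updated a
    return a, b


def euler_b_then_a(a, b, dt):
    b = b + derivatives(a, b)[1] * dt
    a = a + derivatives(a, b)[0] * dt  # sees the already-updated b
    return a, b


for step in (euler_simultaneous, euler_a_then_b, euler_b_then_a):
    print(step.__name__, step(0.0, 1.0, 0.1))
```

Only the simultaneous version gives a = 0.2, b = 2.0 (up to floating-point rounding); the other two give (0.2, 1.98) and (0.4, 2.0), the same values as the explicit-order TensorFlow runs.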
I tested what happens when we allow the order of evaluation to be controlled implicitly by tf.group(), without using control dependencies.
noCDstep = tf.group(assignA, assignB)
def implicitOrder():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(noCDstep)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: implicitOrder()) )
# prints, e.g. {(0.4, 2.0): 37, (0.2, 1.98): 1, (0.2, 2.0): 462}
Occasionally, this produces the same result as evaluating assignB followed by assignA, or (more rarely) evaluating assignA followed by assignB. But most of the time, there is an entirely unexpected result: the correct answer to the Euler integration step. This behaviour is both inconsistent and surprising.
I tried to resolve this inconsistent behaviour by introducing control dependencies as suggested by jdehesa, using the following code:
with tf.control_dependencies([a_, b_]):
    cdStep = tf.group(assignA, assignB)
def cdOrder():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(cdStep)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: cdOrder()) )
# prints, e.g. {(0.4, 2.0): 3, (0.2, 1.98): 3, (0.2, 2.0): 494}
It appears that control dependencies do not resolve this inconsistency, and it is not clear that they make any difference at all. I then tried implementing the approach originally suggested in my question, which uses additional variables to enforce the computation of deltas and updates independently:
da_dt = tf.Variable(0.0)
db_dt = tf.Variable(0.0)
assignDeltas = tf.group( da_dt.assign(da_dt_), db_dt.assign(db_dt_) )
assignUpdates = tf.group( a.assign_add(da_dt * dt), b.assign_add(db_dt * dt) )
def explicitDeltas():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(assignDeltas)
        sess.run(assignUpdates)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: explicitDeltas()) )
# prints {(0.2, 2.0): 500}
As expected, this consistently computes the Euler integration step correctly.
I can understand why sometimes tf.group(assignA, assignB) produces an answer consistent with running assignA followed by assignB, and why it sometimes produces an answer consistent with running assignB followed by assignA, but I don't understand why it usually produces an answer that is magically correct (for the Euler integration case) and consistent with neither of these orders. What is going on?

Indeed, you can make sure that things run in the order that you want using control dependencies. In this case, you just need to make sure that U_ and Ut_ are computed before the assignment operations are executed. I think (although I'm not absolutely sure) that the code in the tutorial is probably correct, and that for Ut_ to be computed with the updated U you would need to have something like:
U_ = U + eps * Ut
U = U.assign(U_)
Ut_ = Ut + eps * (laplace(U) - damping * Ut)
step = Ut.assign(Ut_)
However, whenever you want to make sure that some thing gets executed before another, you can just write the dependencies explicitly:
# Discretized PDE update rules
U_ = U + eps * Ut
Ut_ = Ut + eps * (laplace(U) - damping * Ut)
# Operation to update the state
with tf.control_dependencies([U_, Ut_]):
    step = tf.group(
        U.assign(U_),
        Ut.assign(Ut_))
This will make sure that, before any of the assignment operations are executed, both U_ and Ut_ will have been computed first.
EDIT: Some additional explanation about the new snippets.
In the first snippet in your update (12/02/2019), the code runs one assignment first and then the other. As you said, this is obviously wrong, since the second update will use the already updated value of the other variable.
The second snippet, if I'm not mistaken (correct me if I'm wrong), is what the tutorial proposes: grouping the assignment operations. Since you have seen instances of this producing the wrong result, I suppose it is not always safe to evaluate it like this. However, it is not surprising that you frequently get the right result. Here TensorFlow will compute all the values needed to update both variables. Since evaluation order is not deterministic (when there are no explicit dependencies), the update of a may happen before b_ is computed, for example, in which case you would get the wrong result. But it is reasonable to expect that a_ and b_ will often be computed before a and b are updated.
In the third snippet, you use control dependencies, but not effectively. What your code says is that the group operation should not run before a_ and b_ are computed. However, that does not mean much: the group operation is essentially a no-op with dependencies on its inputs. The control dependencies there only affect this no-op; they do not prevent the assignment operations from running earlier. As I suggested originally, you should instead create the assignment operations within the control dependencies block, to make sure that the assignments do not happen sooner than they should (in my snippet I also put the group operation within the block just for convenience, but it does not really matter whether it is inside or outside).

Network Flow Optimization (Gurobi)

I am trying to model and solve an optimization problem with Python and the Gurobi optimizer. It is my first experience solving a problem with an optimizer. First I wrote a really big problem and added all the variables and constraints step by step, but there were problems with it, so I reduced it to smaller and smaller versions. I now have this very simple code:
from gurobipy import *
m = Model('net')
# addVar's default lower bound is 0, so x >= 0 and y >= 0 already hold
x = m.addVar(name = 'x')
y = m.addVar(name = 'y')
# One inequality per constraint: Python's `and` does not combine two constraints into one
m.addConstr(x <= 9000, name = 'flow0')
m.addConstr(y <= 1000, name = 'flow1')
m.addConstr(y + x == 9990, name = 'total_flow')
m.setObjective(x * (4 + 0.6*(x/9000)) + (y * (4 + 0.6*(y/1000))), GRB.MINIMIZE)
solo = m.optimize()
if solo:
    print('find!!!')
if solo:
print ('find!!!')
It is actually a simple network flow problem (a graph with two nodes and two edges); I want to calculate the flow on each edge (x and y). Obviously the flow on an edge can't be negative and can't be bigger than the edge capacity (capacity of x = 9000, capacity of y = 1000), and the third constraint expresses the total flow over both edges. Finally, the objective function is to be minimized.
Now I have some questions about this code:
1. Why is 'solo' None?
2. How can I print the solution variables? I used the getAttr() function, but I couldn't work out the role of the variable names (x, y or flow0, flow1).
3. I've got this result (the solver log), but I really can't understand it. For example: what does it calculate in each iteration?
Thanks in advance, and excuse my simple question...

The optimize() method always returns None, see print(help(m.optimize)). The status of your model after calling this method is stored in m.status, while the solution values are stored in the .X attribute of each variable (assuming the model was solved to optimality). To access them you can use m.getVars():
# your model ...
m.optimize()
if m.status == GRB.OPTIMAL:
    for var in m.getVars():
        print(var.VarName, var.X)
Your posted log shows for each iteration of the barrier method (also known as interior point method) the objective value. See here for a detailed overview.
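To sanity-check the values Gurobi reports for this model, the optimum can also be worked out by hand. Assuming the cost form from the question with the same coefficients (4 and 0.6) on both edges, setting the marginal costs equal, 4 + 1.2x/9000 = 4 + 1.2y/1000, gives x/9000 = y/1000: the flow splits in proportion to capacity, so x = 8991 and y = 999. Because the objective is convex and this point satisfies the bounds, it is the minimum. A small sketch (optimal_split is a hypothetical helper, not part of gurobipy):

```python
def optimal_split(total, cap_x, cap_y, a=4.0, b=0.6):
    """Minimise x*(a + b*x/cap_x) + y*(a + b*y/cap_y) subject to x + y == total."""
    if not 0 <= total <= cap_x + cap_y:
        raise ValueError("total flow does not fit within the edge capacities")
    # Equal marginal costs a + 2*b*x/cap_x == a + 2*b*y/cap_y  =>  x/cap_x == y/cap_y,
    # so each edge gets a share proportional to its capacity (never above that capacity).
    x = total * cap_x / (cap_x + cap_y)
    y = total - x
    cost = x * (a + b * x / cap_x) + y * (a + b * y / cap_y)
    return x, y, cost


print(optimal_split(9990, 9000, 1000))
```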

The right way to define a function in theano?

Background:
Usually I define a Theano function with an input like 'x = fmatrix()'. However, while modifying Keras (a deep learning library based on Theano) to make it work with the CTC cost, I noticed a very weird problem: if one input of the cost function is declared as
x = tensor.zeros(shape=[M,N], dtype='float32')
instead of
x = fmatrix()
the training process will converge much faster.
A simplified problem:
The full code is quite big, so I tried to simplify the problem as follows. Say there is a function for computing the Levenshtein edit distance:
import theano
from theano import tensor
from theano.ifelse import ifelse
def editdist(s, t):
    def update(x, previous_row, target):
        current_row = previous_row + 1
        current_row = tensor.set_subtensor(current_row[1:], tensor.minimum(current_row[1:], tensor.add(previous_row[:-1], tensor.neq(target,x))))
        current_row = tensor.set_subtensor(current_row[1:], tensor.minimum(current_row[1:], current_row[0:-1] + 1))
        return current_row
    source, target = ifelse(tensor.lt(s.shape[0], t.shape[0]), (t, s), (s, t))
    previous_row = tensor.arange(target.size + 1, dtype=theano.config.floatX)
    result, updates = theano.scan(fn = update, sequences=source, outputs_info=previous_row, non_sequences=target, name='editdist')
    return result[-1,-1]
Then I define two functions, f1 and f2:
x1 = tensor.fvector()
x2 = tensor.fvector()
r1 = editdist(x1,x2)
f1 = theano.function([x1,x2], r1)
x3 = tensor.zeros(3, dtype='float32')
x4 = tensor.zeros(3, dtype='float32')
r2 = editdist(x3,x4)
f2 = theano.function([x3,x4], r2)
When computing with f1 and f2, the results are different:
>>f1([1,2,3],[1,3,3])
array(1.0)
>>f2([1,2,3],[1,3,3])
array(3.0)
f1 gives the right result, but f2 doesn't.
So my question is: what is the right way to define a Theano function? And what actually went wrong with f2?
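To confirm which of f1 and f2 is right, here is a plain-Python reference implementation of the Levenshtein distance using the standard row-by-row dynamic programme (written with ordinary loops, not the vectorised scan update above); the function name is hypothetical. It gives a distance of 1 between [1,2,3] and [1,3,3], matching f1.

```python
def levenshtein(s, t):
    """Edit distance between sequences s and t, computed one DP row at a time."""
    previous_row = list(range(len(t) + 1))  # distance from an empty prefix of s
    for i, x in enumerate(s, start=1):
        current_row = [i]
        for j, y in enumerate(t, start=1):
            current_row.append(min(
                previous_row[j] + 1,               # deletion
                current_row[j - 1] + 1,            # insertion
                previous_row[j - 1] + (x != y),    # substitution (0 if equal)
            ))
        previous_row = current_row
    return previous_row[-1]


print(levenshtein([1, 2, 3], [1, 3, 3]))
```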
Update:
I'm using Theano version 0.8.0.dev0. I just tried Theano 0.7.0, and both f1 and f2 give the correct result. Maybe this is a bug in Theano?
Update_1st 1-27-2016:
According to lamblin's explanation on this issue (https://github.com/Theano/Theano/issues/3925#issuecomment-175088918), this was actually a bug in Theano, and it has been fixed in the latest (1-26-2016) version. For convenience, lamblin's explanation is quoted here:
"The first way is the most natural one, but in theory both should be equivalent.
x3 and x4 are created as the output of an "alloc" operation, the input of which would be the constant 3, rather than free inputs like x1 and x2, but that should not matter since you pass [x3, x4] as inputs to theano.function, which should cut the computation graph right there.
My guess is that scan is optimizing prematurely, believing that x3 or x4 is guaranteed to always be the constant 0, and does some simplifications that proved incorrect when values are provided for them. That would be an actual bug in scan."
Update_2nd 1-27-2016:
Unfortunately the bug is not totally fixed yet. In the background section I mentioned that if one input of the cost function is declared with tensor.zeros(), the training converges much faster. I've found the reason: when the input is declared with tensor.zeros(), the cost function gives an incorrect result, though mysteriously this helps convergence.
I put together a simplified reproduction of the problem here (https://github.com/daweileng/TheanoDebug); run ctc_bench.py to see the results.

theano.tensor.zeros(...) can't take any value other than 0,
unless of course you add nodes to the graph and modify parts of the zeros tensor using theano.tensor.set_subtensor.
The input tensor theano.tensor.fmatrix can take any value you feed it.
