I want to write a crude Euler simulation of a set of PDEs. I read the PDE tutorial on tensorflow.org and I am a little puzzled about how to do this properly. I have two specific questions but would welcome further feedback if there is anything I have overlooked or misunderstood.
The following code is from the tutorial:
# Discretized PDE update rules
U_ = U + eps * Ut
Ut_ = Ut + eps * (laplace(U) - damping * Ut)
# Operation to update the state
step = tf.group(
    U.assign(U_),
    Ut.assign(Ut_))
Question 1
Isn't there a bug here? Once U.assign(U_) has been evaluated, surely the next evaluation of Ut_ will use the updated value of U rather than the value from the same time step? I would have thought that the correct way to do it would be as follows:
delta_U = tf.Variable(dU_init)
delta_Ut = tf.Variable(dUt_init)

delta_step = tf.group(
    delta_U.assign(Ut),
    delta_Ut.assign(laplace(U) - damping * Ut)
)

update_step = tf.group(
    U.assign_add(eps * delta_U),
    Ut.assign_add(eps * delta_Ut)
)
We could then run Euler integration steps by alternating evaluations of delta_step and update_step. If I understand correctly, this could be done via separate invocations of Session.run():
with tf.Session() as sess:
    ...
    for i in range(1000):
        sess.run(delta_step)
        sess.run(update_step)
Question 2
It seems frustrating that a single operation can't be defined that combines both steps in a fixed order, e.g.
combined_update = tf.group(delta_step, update_step)
with tf.Session() as sess:
    ...
    for i in range(1000):
        sess.run(combined_update)
but according to an answer on this thread, tf.group() does not guarantee any particular evaluation order. The approach described on that thread for controlling evaluation order involves something called "control dependencies"; can they be used in this instance, where we want to ensure that repeated evaluations of two tensors are made in a fixed order?
If not, is there another way to control the order of evaluation of these tensors, beyond explicitly using sequential Session.run() calls?
Update (12/02/2019)
Based on jdehesa's answer, I investigated in greater detail. The results support my original intuition that there is a bug in the PDE tutorial: it produces incorrect results because of the inconsistent evaluation order of the tf.assign() calls, and this is not resolved by using control dependencies. However, the method from the PDE tutorial usually produces correct results, and I don't understand why.
I checked the results of running the assignment operations in an explicit order, using the following code:
import tensorflow as tf
import numpy as np
# define two variables a and b, and the PDEs that govern them
a = tf.Variable(0.0)
b = tf.Variable(1.0)
da_dt_ = b * 2
db_dt_ = 10 - a * b
dt = 0.1 # integration step size
# after one step of Euler integration, we should have
# a = 0.2 [ = 0.0 + (1.0 * 2) * 0.1 ]
# b = 2.0 [ = 1.0 + (10 - 0.0 * 1.0) * 0.1 ]
# using the method from the PDE tutorial, define updated values for a and b
a_ = a + da_dt_ * dt
b_ = b + db_dt_ * dt
# and define the update operations
assignA = a.assign(a_)
assignB = b.assign(b_)
# define a higher-order function that runs a particular simulation n times
# and summarises the results
def summarise(simulation, n=500):
    runs = np.array([simulation() for i in range(n)])
    summary = dict({(tuple(run), 0) for run in np.unique(runs, axis=0)})
    for run in runs:
        summary[tuple(run)] += 1
    return summary
# check the results of running the assignment operations in an explicit order
def explicitOrder(first, second):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(first)
        sess.run(second)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: explicitOrder(assignA, assignB)) )
# prints {(0.2, 1.98): 500}
print( summarise(lambda: explicitOrder(assignB, assignA)) )
# prints {(0.4, 2.0): 500}
As expected, if we evaluate assignA first then a gets updated to 0.2, and this updated value is then used to update b to 1.98. If we evaluate assignB first, b is first updated to 2.0, and this updated value is then used to update a to 0.4. These are both the wrong answer to the Euler integration: what we ought to get is a = 0.2, b = 2.0.
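For reference, the intended simultaneous update is easy to check in plain Python; this just reproduces the hand calculation in the comments above:

a, b = 0.0, 1.0
dt = 0.1
da_dt = b * 2       # both derivatives evaluated at the *old* state
db_dt = 10 - a * b
a, b = a + da_dt * dt, b + db_dt * dt
print(a, b)  # -> 0.2 2.0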
I then tested what happens when the order of evaluation is left implicit, grouping the assignments with tf.group() and no control dependencies.
noCDstep = tf.group(assignA, assignB)
def implicitOrder():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(noCDstep)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: implicitOrder()) )
# prints, e.g. {(0.4, 2.0): 37, (0.2, 1.98): 1, (0.2, 2.0): 462}
Occasionally, this produces the same result as evaluating assignB followed by assignA, or (more rarely) evaluating assignA followed by assignB. But most of the time, there is an entirely unexpected result: the correct answer to the Euler integration step. This behaviour is both inconsistent and surprising.
I tried to resolve this inconsistent behaviour by introducing control dependencies as suggested by jdehesa, using the following code:
with tf.control_dependencies([a_, b_]):
    cdStep = tf.group(assignA, assignB)

def cdOrder():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(cdStep)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: cdOrder()) )
# prints, e.g. {(0.4, 2.0): 3, (0.2, 1.98): 3, (0.2, 2.0): 494}
It appears that control dependencies do not resolve this inconsistency, and it is not clear that they make any difference at all. I then tried implementing the approach originally suggested in my question, which uses additional variables to enforce the computation of deltas and updates independently:
da_dt = tf.Variable(0.0)
db_dt = tf.Variable(0.0)
assignDeltas = tf.group(da_dt.assign(da_dt_), db_dt.assign(db_dt_))
assignUpdates = tf.group(a.assign_add(da_dt * dt), b.assign_add(db_dt * dt))

def explicitDeltas():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(assignDeltas)
        sess.run(assignUpdates)
        return (sess.run(a), sess.run(b))
print( summarise(lambda: explicitDeltas()) )
# prints {(0.2, 2.0): 500}
As expected, this consistently computes the Euler integration step correctly.
I can understand why sometimes tf.group(assignA, assignB) produces an answer consistent with running assignA followed by assignB, and why it sometimes produces an answer consistent with running assignB followed by assignA, but I don't understand why it usually produces an answer that is magically correct (for the Euler integration case) and consistent with neither of these orders. What is going on?
Indeed, you can make sure that things run in the order that you want using control dependencies. In this case, you just need to make sure that U_ and Ut_ are computed before the assignment operations are executed. I think (although I'm not absolutely sure) that the code in the tutorial is probably correct, and that for Ut_ to be computed with the updated U you would need to have something like:
U_ = U + eps * Ut
U = U.assign(U_)
Ut_ = Ut + eps * (laplace(U) - damping * Ut)
step = Ut.assign(Ut_)
However, whenever you want to make sure that one thing gets executed before another, you can write the dependencies explicitly:
# Discretized PDE update rules
U_ = U + eps * Ut
Ut_ = Ut + eps * (laplace(U) - damping * Ut)
# Operation to update the state
with tf.control_dependencies([U_, Ut_]):
    step = tf.group(
        U.assign(U_),
        Ut.assign(Ut_))
This will make sure that both U_ and Ut_ have been computed before any of the assignment operations are executed.
EDIT: Some additional explanation about the new snippets.
In the first snippet in your update (12/02/2019), the code runs first one assignment, then the next. As you said, this is obviously wrong, since the second update will use the already updated value of the other variable.
The second snippet, if I'm not mistaken (correct me if I'm wrong), is what the tutorial proposes: grouping the assignment operations. Since you say you have seen instances of this producing the wrong result, I suppose it is not always safe to evaluate it like this. However, it is not surprising that you frequently get the right result. Here TensorFlow will compute all the values necessary to update both variables. Since the evaluation order is not deterministic (when there are no explicit dependencies), it may happen that the update of a happens before b_ is computed, for example, in which case you would get the wrong result. But it is reasonable to expect that, much of the time, a_ and b_ will both be computed before a and b are updated.
In the third snippet, you use control dependencies, but not in an effective manner. What you are indicating with your code is that the group operation should not run before a_ and b_ are computed. However, that does not mean much; the group operation is pretty much a no-op with dependencies on its inputs. The control dependencies there only affect this no-op, but do not prevent the assignment operations from running earlier. As I suggested originally, you should instead create the assignment operations within the control-dependencies block, to make sure that the assignments do not happen sooner than they should (in my snippet I also put the group operation within the block just for convenience, but it does not really matter whether that is in or out).
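To make that concrete for the a/b example, a minimal sketch reusing the variables from the update above:

# The assign ops are *created* inside the control-dependencies block, so each
# one carries a control edge to both a_ and b_ and cannot run until both
# right-hand sides have been computed.
with tf.control_dependencies([a_, b_]):
    fixedStep = tf.group(a.assign(a_), b.assign(b_))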
Related
I want to shift my variables in PyMC3. I'm currently just using deterministic transforms, but when I perform the inference, it saves both the original samples and the shifted samples (which is expected behavior).
Example code:
import pymc3 as pm

x_lower = -3

with pm.Model():
    x = pm.Gamma('x', alpha=2., beta=1.5)
    x_shift = pm.Deterministic("x_shift", x + x_lower)
    trace = pm.sample(1000, tune=1000)

trace.remove_values("x")  # my current solution
tp = pm.traceplot(trace)
# other analysis...
Now trace stores all x samples and all x_shift samples, which is clearly a waste as the numbers of variables and samples increase. I can do trace.remove_values("x") before continuing with the analysis, but I would prefer to simply not save x at all.
Another option is to not save x_shift at all, but I can't find how to add x_lower on to the samples after inference. So this isn't really a solution if I want to use the in-built analysis tools.
Can I save only the x_shift samples, and not the x samples, when I sample?
You can specify exactly what you want to save by setting the trace argument in the pm.sample() function, e.g.,
trace = pm.sample(1000, tune=1000, trace=[x_shift])
In case anyone finds this...
I went for a still-unsatisfying solution: I'm not using Deterministic transformations any more. I still transform the variables, but don't save them; I just transform the saved (original) samples after sampling.
The above code now looks like this:
import pymc3 as pm

x_lower = -3

def f_x(x):
    return x + x_lower

with pm.Model():
    x = pm.Gamma('x', alpha=2., beta=1.5)
    x_shift = f_x(x)
    # keep = {"x": f_x}  # something like this for more variables
    trace = pm.sample(1000, tune=1000)

trace = {"x": f_x(trace["x"])}  # note this merges all chains
# trace = {varname: f(trace[varname]) for varname, f in keep.items()}
tp = pm.traceplot(trace)
# other analysis...
I think pm.traceplot(trace) still works with trace in this form, but otherwise you can just import arviz and use it directly --- it does work with dicts like this.
Note: take care with more complicated transformations. e.g. you'll have to use pm.math.exp in the model, but np.exp in the post-sampling transformation.
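For instance, a hypothetical exponential transform (the function names here are illustrative, not part of any API) needs the two versions:

import numpy as np
import pymc3 as pm

def f_model(x):
    # inside the model: operates on PyMC3/Theano tensors
    return pm.math.exp(x)

def f_post(samples):
    # after sampling: operates on the raw NumPy sample array
    return np.exp(samples)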
I am trying to model and solve an optimization problem with Python and the Gurobi optimizer. It is my first experience solving a problem with an optimizer. At first I wrote a really big model, adding all the variables and constraints step by step, but there were problems in it, so I reduced the problem to smaller and smaller versions. Now I have a very simple piece of code:
from gurobipy import *
m = Model('net')
x = m.addVar(name = 'x')
y = m.addVar(name = 'y')
m.addConstr(x >= 0 and x <= 9000, name = 'flow0')
m.addConstr(y >= 0 and y <= 1000, name = 'flow1')
m.addConstr(y + x == 9990, name = 'total_flow')
m.setObjective(x *(4 + 0.6*(x/9000)) + (y * (4 + 0.6*(y/1000))), GRB.MINIMIZE)
solo = m.optimize()
if solo:
    print('find!!!')
It actually is a simple network flow problem (for a graph with two nodes and two edges); I want to calculate the flow on each edge (x and y). Obviously the flow on an edge can't be negative and can't exceed the edge's capacity (9000 for x, 1000 for y), and the third constraint expresses the total flow required across both edges. Finally, the objective function has to minimize the cost expression.
Now I have some questions on this code:

1. Why is solo None?
2. How can I print the solution variables? I used the getAttr() function, but I couldn't work out the role of the names (x, y or flow0, flow1).
3. I've got a result log, but I really can't understand it! For example: what does it calculate in each iteration?

Thanks in advance, and excuse my simple questions...
The optimize() method always returns None; see print(help(m.optimize)). The status of your model after calling this method is stored in m.status, while the solution values are stored in the .X attribute of each variable (assuming the model was solved to optimality). To access them you can use m.getVars():
# your model ...
m.optimize()
if m.status == GRB.OPTIMAL:
    for var in m.getVars():
        print(var.VarName, var.X)
Your posted log shows, for each iteration of the barrier method (also known as the interior point method), the objective value. See here for a detailed overview.
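As an aside, note that Python's "and" does not combine two Gurobi constraint expressions into a single constraint, so the flow0 and flow1 constraints above probably do not do what you intend; simple bounds are normally passed to addVar directly. A sketch of the model in that form (same data as in the question):

from gurobipy import Model, GRB

m = Model('net')
# variable bounds replace the constraints that were built with `and`
x = m.addVar(lb=0, ub=9000, name='x')
y = m.addVar(lb=0, ub=1000, name='y')
m.addConstr(x + y == 9990, name='total_flow')
m.setObjective(x * (4 + 0.6 * (x / 9000)) + y * (4 + 0.6 * (y / 1000)),
               GRB.MINIMIZE)
m.optimize()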
Below is reproducible code. If you run it, you will see that the first sess.run gives nan, whereas the second gives the correct gradient value of 0.5. But given the condition passed to tf.where, they should return the same value. I also simply don't understand why the tf.where gradient is nan at 1 or -1, which seem like perfectly fine input values to me.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
x = tf.get_variable('x', shape=[1])
condition = tf.less(x, 0.0)
output = tf.where(condition, -tf.log(-x + 1), tf.log(x + 1))
deriv = tf.gradients(output, x)
with tf.Session() as sess:
    print(sess.run(deriv, {x: np.array([-1])}))

logg = -tf.log(-x + 1)
derivv = tf.gradients(logg, x)
with tf.Session() as sess:
    print(sess.run(derivv, {x: np.array([-1])}))
Thanks for comments!
As explained in the github issue provided by @mikkola, the problem stems from the internal implementation of tf.where. Basically, both alternatives (and their gradients) are computed, and only the correct part is chosen by multiplication with the condition. Alas, if the gradient is inf or nan for the part that is not selected, even when multiplied by 0 you get a nan that eventually propagates to the result.
Since the issue was filed in May 2016 (that's TensorFlow v0.7!) and has not been patched since, one can safely assume it won't be fixed anytime soon, and start looking for workarounds.
The easiest way to fix it is to modify your statements so that they are always valid and differentiable, even for values that are not meant to be selected.
A general technique would be to clip the input inside its valid domain. In your case, for example, you could use:
cond = tf.less(x, 0.0)
# the unselected branch sees a safe value (0), so its gradient stays finite
output = tf.where(cond,
                  -tf.log(-tf.where(cond, x, tf.zeros_like(x)) + 1),
                  tf.log(tf.where(cond, tf.zeros_like(x), x) + 1))
In your particular case, however, it would be simpler to just use
output = tf.sign(x) * tf.log(tf.abs(x) + 1)
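A quick sanity check of that reformulation, reusing the session pattern from the question (for x < 0 the expression equals -log(1 - x), whose derivative at x = -1 is 1/(1 - x) = 0.5):

safe_output = tf.sign(x) * tf.log(tf.abs(x) + 1)
safe_deriv = tf.gradients(safe_output, x)
with tf.Session() as sess:
    # tf.sign has a zero gradient and tf.abs is differentiable at -1,
    # so no nan-producing branch exists; this should print [0.5]
    print(sess.run(safe_deriv, {x: np.array([-1])}))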
Background:
Usually I define a Theano function with inputs like x = fmatrix(). However, while modifying Keras (a deep learning library based on Theano) to make it work with the CTC cost, I noticed a very weird problem: if one input of the cost function is declared as
x = tensor.zeros(shape=[M,N], dtype='float32')
instead of
x = fmatrix()
the training process will converge much faster.
A simplified problem:
The actual code is quite big, so I tried to simplify the problem to the following: say we have a function for computing the Levenshtein edit distance:
import theano
from theano import tensor
from theano.ifelse import ifelse
def editdist(s, t):
    def update(x, previous_row, target):
        current_row = previous_row + 1
        current_row = tensor.set_subtensor(
            current_row[1:],
            tensor.minimum(current_row[1:],
                           tensor.add(previous_row[:-1], tensor.neq(target, x))))
        current_row = tensor.set_subtensor(
            current_row[1:],
            tensor.minimum(current_row[1:], current_row[0:-1] + 1))
        return current_row

    source, target = ifelse(tensor.lt(s.shape[0], t.shape[0]), (t, s), (s, t))
    previous_row = tensor.arange(target.size + 1, dtype=theano.config.floatX)
    result, updates = theano.scan(fn=update, sequences=source,
                                  outputs_info=previous_row,
                                  non_sequences=target, name='editdist')
    return result[-1, -1]
Then I define two functions f1 and f2:
x1 = tensor.fvector()
x2 = tensor.fvector()
r1 = editdist(x1,x2)
f1 = theano.function([x1,x2], r1)
x3 = tensor.zeros(3, dtype='float32')
x4 = tensor.zeros(3, dtype='float32')
r2 = editdist(x3,x4)
f2 = theano.function([x3,x4], r2)
When computing with f1 and f2, the results are different:
>>> f1([1,2,3], [1,3,3])
array(1.0)
>>> f2([1,2,3], [1,3,3])
array(3.0)
f1 gives the right result, but f2 doesn't.
So my question is: what is the right way to define a Theano function? And what actually went wrong with f2?
Update:
I'm using Theano version 0.8.0.dev0. I just tried Theano 0.7.0, and both f1 and f2 give the correct result. Maybe this is a bug in Theano?
Update_1st 1-27-2016:
According to the explanation of @lamblin on this issue (https://github.com/Theano/Theano/issues/3925#issuecomment-175088918), this was actually a bug in Theano and has been fixed in the latest (1-26-2016) version. For convenience, lamblin's explanation is quoted here:
The first way is the most natural one, but in theory both should be equivalent.
x3 and x4 are created as the output of an "alloc" operation, the input of which would be the constant 3, rather than free inputs like x1 and x2, but that should not matter since you pass [x3, x4] as inputs to theano.function, which should cut the computation graph right there.
My guess is that scan is optimizing prematurely, believing that x3 or x4 is guaranteed to always be the constant 0, and does some simplifications that proved incorrect when values are provided for them. That would be an actual bug in scan.
Update_2nd 1-27-2016:
Unfortunately, the bug is not totally fixed yet. In the background section I mentioned that declaring one input of the cost function as tensor.zeros() makes convergence much faster; I've now found the reason. When the input is declared as tensor.zeros(), the cost function gives an incorrect result, though mysteriously this helps the convergence.
I have put a simplified reproduction demo here (https://github.com/daweileng/TheanoDebug); run ctc_bench.py and you can see the results.
theano.tensor.zeros(...) can't take any value other than 0.
Unless you add nodes to the graph of course and modify parts of the zeros tensor using theano.tensor.set_subtensor.
The input tensor theano.tensor.fmatrix can take any value you input.
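A minimal sketch of the distinction (assuming a standard Theano install; shapes are illustrative):

import theano
from theano import tensor

x1 = tensor.fvector()                  # a free input: takes whatever you feed it
x3 = tensor.zeros(3, dtype='float32')  # a graph node that always evaluates to zeros

f1 = theano.function([x1], x1 + 1)
print(f1([1, 2, 3]))  # -> [2. 3. 4.]

f3 = theano.function([], x3 + 1)
print(f3())           # -> [1. 1. 1.]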
I have what is conceptually a simple question about Theano, but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite many hours with the tutorials).
I'm trying to implement a "deconvolutional network"; specifically, I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the i-th input, codes[i] represents a set of codewords which together code for input i.
I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:
idx = T.lscalar()
pre_loss_conv = conv2d(input=codes[idx].dimshuffle('x', 0, 1, 2),
                       filters=dicts.dimshuffle('x', 0, 1, 2),
                       border_mode='valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2. * (loss_in - loss_conv)**2)
del_codes = T.grad(loss, codes[idx])
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates=[
    [codes, T.set_subtensor(codes[input_index],
                            codes[input_index] - learning_rate * del_codes[input_index])]])
(here codes and dicts are shared tensor variables). Theano is unhappy with this, specifically with defining
del_codes = T.grad(loss, codes[idx])
The error message I'm getting is: theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0
I'm guessing that it wants a symbolic variable instead of codes[idx]; but then I'm not sure how to get everything connected to get the intended effect. I'm guessing I'll need to change the final lines to something like

del_codes = T.grad(loss, codes)
train_codes = function([input_index], loss, updates=[
    [codes, T.set_subtensor(codes[input_index],
                            codes[input_index] - learning_rate*del_codes) ]])
Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano but I'm not sure what.
Thanks in advance!
-Justin
Update: Kyle's suggestion worked very nicely. Here's the specific code I used
current_codes = T.tensor3('current_codes')
current_codes = codes[input_index]
pre_loss_conv = conv2d(input=current_codes.dimshuffle('x', 0, 1, 2),
                       filters=dicts.dimshuffle('x', 0, 1, 2),
                       border_mode='valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2. * (loss_in - loss_conv)**2)
del_codes = T.grad(loss, current_codes)
train_codes = function([input_index], loss)
train_dicts = theano.function([input_index], loss,
                              updates=[[dicts, dicts - learning_rate * del_dicts]])
codes_update = (codes, T.set_subtensor(codes[input_index],
                                       codes[input_index] - learning_rate * del_codes))
codes_update_fn = function([input_index], updates=[codes_update])

for i in xrange(num_inputs):
    current_loss = train_codes(i)
    codes_update_fn(i)
To summarize the findings:
Assigning grad_var = codes[idx], then making a new variable such as:
subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes[input_index])
Then calling
train_codes = function([input_index], loss, updates = [[codes, subgrad]])
seemed to do the trick. In general, I try to make variables for as many things as possible; sometimes tricky problems arise from trying to do too much in a single statement, and it is hard to debug and understand later! Also, in this case I think Theano needs a shared variable, but has issues if the shared variable is created inside the function that requires it.
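A minimal, self-contained sketch of that pattern (hypothetical shapes and a toy quadratic loss, just to show the wiring):

import numpy as np
import theano
import theano.tensor as T

codes = theano.shared(np.ones((5, 3), dtype=theano.config.floatX))
idx = T.lscalar()
grad_var = codes[idx]               # take the subtensor *once* and reuse it
loss = T.sum(grad_var ** 2)         # toy loss built from that same subtensor
del_codes = T.grad(loss, grad_var)  # connected, so no DisconnectedInputError
subgrad = T.set_subtensor(codes[idx], codes[idx] - 0.1 * del_codes)
train_codes = theano.function([idx], loss, updates=[(codes, subgrad)])
print(train_codes(0))               # updates row 0 of codes in place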
Glad this worked for you!