Is there any essential difference between * and + for PyTorch autograd?

I was trying to understand the autograd mechanism in more depth. To test my understanding, I wrote the following code, which I expected to raise an error (i.e., "Trying to backward through the graph a second time").
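import torch

a = torch.tensor([0.5], requires_grad=True)  # assumed; `a` is not defined in the original post
b = torch.Tensor([0.5])
for i in range(5):
    b.data.zero_().add_(0.5)  # reset b's value in place, bypassing autograd
    b = b + a
    c = b*a
    c.backward()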
I expected it to report an error when c.backward() is called for the second time in the for loop, because the history of b has been freed. However, nothing happens.
But when I changed b + a to b * a as follows,
# `a` is assumed to be a tensor with requires_grad=True (it is not defined in the post)
b = torch.Tensor([0.5])
for i in range(5):
    b.data.zero_().add_(0.5)
    b = b * a
    c = b*a
    c.backward()
It did report the error I was expecting.
This looks pretty weird to me. I don't understand why no error is raised in the former case, and why changing + to * makes a difference.

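The difference is that the gradient of an addition does not depend on the values of its operands, whereas the gradient of a multiplication does. It seems autograd takes advantage of this: it has nothing to save for 'b = b + a', so backpropagating through that node a second time finds no freed buffers to complain about. For 'b = b * a' the backward pass needs the saved operands, and those were freed by the first c.backward() call.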
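A minimal sketch of this behaviour, assuming a recent PyTorch and assuming `a` is a leaf tensor with requires_grad=True (the post does not define it). The helper name `run_loop` is made up for this example; it runs the loop from the question with a given operator and returns the error message, if any.

```python
import torch


def run_loop(op):
    """Run the question's loop with `op` in place of + or *.

    Returns the RuntimeError message, or None if every backward() succeeds.
    """
    a = torch.tensor([0.5], requires_grad=True)  # assumed definition of `a`
    b = torch.tensor([0.5])
    try:
        for _ in range(5):
            b.data.zero_().add_(0.5)
            b = op(b, a)   # this node stays in the graph of later iterations
            c = b * a
            c.backward()   # frees the tensors saved by the nodes it visits
    except RuntimeError as err:
        return str(err)
    return None


add_error = run_loop(torch.add)  # backward of + has no saved operands
mul_error = run_loop(torch.mul)  # backward of * needs its saved operands

print("add:", add_error)
print("mul:", mul_error)
```

With + the loop runs to completion, while with * the second c.backward() raises. Passing retain_graph=True to backward() keeps the saved tensors alive, at the cost of extra memory.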

Related

Sympy simplification of maximum

I don't understand why SymPy won't return the expression below simplified (I'm not sure whether it's a bug in my code or a feature of SymPy).
import sympy as sp
a = sp.Symbol('a',finite = True, real = True)
b = sp.Symbol('b',finite = True, real = True)
sp.assumptions.assume.global_assumptions.add(sp.Q.positive(b))
sp.assumptions.assume.global_assumptions.add(sp.Q.negative(a))
sp.simplify(sp.Max(a-b,a+b))
I would expect the output to be $a+b$, but Sympy still gives me $Max(a-b,a+b)$.
Thanks; as you can see I am a beginner in Sympy so any hints/help are appreciated.

Surely the result should be a + b...
You can do this by setting the assumptions on the symbol as in:
In [2]: a = Symbol('a', negative=True)
In [3]: b = Symbol('b', positive=True)
In [4]: Max(a - b, a + b)
Out[4]: a + b
You are trying to use the new assumptions system but that system is still experimental and is not widely used within sympy. The new assumptions are not used in core evaluation so e.g. the Max function has no idea that you have declared global assumptions on a and b unless those assumptions are declared on the symbols as I show above.
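To illustrate why the symbol-level assumptions help, here is a small sketch (the names `diff`, `x`, `y` and `undecided` are introduced only for this example). With the assumptions attached to the symbols, SymPy can tell that the difference between the two arguments is positive, which is presumably what lets Max pick a + b. With plain real symbols it has no way to decide.

```python
import sympy as sp

# Assumptions declared on the symbols themselves (old assumptions system).
a = sp.Symbol('a', negative=True)
b = sp.Symbol('b', positive=True)

diff = (a + b) - (a - b)          # collapses to 2*b
print(diff, diff.is_positive)     # the sign of the difference is known
print(sp.Max(a - b, a + b))

# Without sign information the comparison cannot be decided.
x, y = sp.symbols('x y', real=True)
undecided = sp.Max(x - y, x + y)
print(undecided)
```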

Cost function using absolute value and division by decision variables

I am trying to implement a cost function in a pydrake MathematicalProgram, but I run into problems whenever I try to divide by a decision variable and use abs(). A shortened version of my attempted implementation follows; I tried to include only what I think is relevant.
T = 50
na = 3
nq = 5
prog = MathematicalProgram()
h = prog.NewContinuousVariables(rows=T, cols=1, name='h')
qd = prog.NewContinuousVariables(rows=T+1, cols=nq, name='qd')
d = prog.NewContinuousVariables(1, name='d')
u = prog.NewContinuousVariables(rows=T, cols=na, name='u')
def energyCost(vars):
    assert vars.size == 2*na + 1 + 1
    split_at = [na, 2*na, 2*na + 1]
    qd, u, h, d = np.split(vars, split_at)
    return np.abs([qd.dot(u)*h/d])
for t in range(T):
    vars = np.concatenate((qd[t, 2:], u[t,:], h[t], d))
    prog.AddCost(energyCost, vars=vars)
initial_guess = np.empty(prog.num_vars())
solver = SnoptSolver()
result = solver.Solve(prog, initial_guess)
The error I am getting is:
RuntimeError Traceback (most recent call last)
<ipython-input-55-111da18cdce0> in <module>()
22 initial_guess = np.empty(prog.num_vars())
23 solver = SnoptSolver()
---> 24 result = solver.Solve(prog, initial_guess)
25 print(f'Solution found? {result.is_success()}.')
RuntimeError: PyFunctionCost: Output must be of .ndim = 0 (scalar) and .size = 1. Got .ndim = 2 and .size = 1 instead.
As far as I can tell, the problem is the dimensions of the output, but I am unsure how to proceed. I spent quite some time trying to fix this, with little success. I also tried changing np.abs to pydrake.math.abs, but then I got the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-56-c0c2f008616b> in <module>()
22 initial_guess = np.empty(prog.num_vars())
23 solver = SnoptSolver()
---> 24 result = solver.Solve(prog, initial_guess)
25 print(f'Solution found? {result.is_success()}.')
<ipython-input-56-c0c2f008616b> in energyCost(vars)
14 split_at = [na, 2*na, 2*na + 1]
15 qd, u, h, d = np.split(vars, split_at)
---> 16 return pydrake.math.abs([qd.dot(u)*h/d])
17
18 for t in range(T):
TypeError: abs(): incompatible function arguments. The following argument types are supported:
1. (arg0: float) -> float
2. (arg0: pydrake.autodiffutils.AutoDiffXd) -> pydrake.autodiffutils.AutoDiffXd
3. (arg0: pydrake.symbolic.Expression) -> pydrake.symbolic.Expression
Invoked with: [array([<AutoDiffXd 1.691961398933386e-257 nderiv=8>], dtype=object)]
Any help would be greatly appreciated, thanks!

BTW, as Tobia has mentioned, dividing by a decision variable in the cost function can be problematic. There are two approaches to avoid the problem.
Impose a bound on the decision variable that excludes 0. For example, say you want to optimize
min f(x) / y
If you can impose the bound y > 1, then SNOPT will not try y = 0, so you avoid the division-by-zero problem.
Another trick is to introduce a new variable for the result of the division, and then minimize this variable.
For example, say you want to optimize
min f(x) / y
You could introduce a slack variable z = f(x) / y and formulate the problem as
min z
s.t. f(x) - y * z = 0

Some observations:
The kind of cost function you are trying to use does not require a Python function. You can just write (even though it would raise other errors as is) prog.AddCost(np.abs([qd[t, 2:].dot(u[t,:])*h[t]/d])).
The argument of prog.AddCost must be a Drake scalar expression, so make sure that your numpy operations return a scalar. In the case above they return a (1,1) numpy array.
To minimize the absolute value, you need something a little more sophisticated. In its current form you are passing a nondifferentiable objective function, and solvers do not handle that well. Say you want to minimize abs(x). A standard trick in optimization is to add an extra (slack) variable, say s, add the constraints s >= x and s >= -x, and then minimize s itself. These constraints and this objective are all differentiable and linear.
Regarding the division of the objective by an optimization variable: avoid it whenever you can. For example, I'm 90% sure that solvers like SNOPT or IPOPT set the initial guess to zero if you do not provide one. This implies that, without a custom initial guess, the solver will hit a division by zero at its first evaluation and crash.
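The shape issue from the second observation can be reproduced with plain NumPy floats standing in for the decision-variable values. This is only a sketch: the function names `energy_cost_2d` and `energy_cost_scalar` and the sample numbers are made up here, and the absolute value is kept just to show the shapes, not as a recommended objective.

```python
import numpy as np

na = 3
split_at = [na, 2 * na, 2 * na + 1]


def energy_cost_2d(values):
    # Same expression as in the question: h and d are length-1 arrays,
    # and the extra list brackets add one more dimension.
    qd, u, h, d = np.split(values, split_at)
    return np.abs([qd.dot(u) * h / d])


def energy_cost_scalar(values):
    # Index the length-1 arrays and drop the brackets to get a 0-d result.
    qd, u, h, d = np.split(values, split_at)
    return np.abs(qd.dot(u) * h[0] / d[0])


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.5, 2.0])
bad = energy_cost_2d(values)
good = energy_cost_scalar(values)
print(bad.shape, np.ndim(good), good)
```

In the pydrake version, the same indexing should hand pydrake.math.abs a single AutoDiffXd rather than a nested array, which matches the argument types listed in the TypeError.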

Basic multi GPU parallelization of matrix multiplication

I want to parallelize the following simple expression on 2 GPUs: C = A^n + B^n, by calculating A^n on GPU 0 and B^n on GPU 1 before summing the results.
In TensorFlow I would go like:
with tf.device('/gpu:0'):
    An = matpow(A, n)
with tf.device('/gpu:1'):
    Bn = matpow(B, n)
with tf.Session() as sess:
    C = sess.run(An + Bn)
However, since PyTorch is dynamic, I'm having trouble doing the same thing. I tried the following but it only takes more time.
with torch.cuda.device(0):
    A = A.cuda()
with torch.cuda.device(1):
    B = B.cuda()
C = matpow(A, n) + matpow(B, n).cuda(0)
I know there is a module to parallelize models on the batch dimension using torch.nn.DataParallel but here I try to do something more basic.

You can use CUDA streams for this. This will not necessarily distribute the work over two devices, but the two operations can execute in parallel.
s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()
with torch.cuda.stream(s1):
    A = torch.pow(A, n)
with torch.cuda.stream(s2):
    B = torch.pow(B, n)
torch.cuda.synchronize()  # wait for both streams before combining the results
C = A + B
That said, I'm not sure it will really speed up your computation if you only parallelize this one operation; your matrices would have to be really big.
If your requirement is to split it across devices, you can add this before the streams:
A = A.cuda(0)
B = B.cuda(1)
Then after the power operation, you need to get them on the same device again, e.g. B = B.cuda(0). After that you can do the addition.
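A hedged sketch of the two-device variant, assuming A^n means a matrix power (torch.pow raises each element to the power instead) and using torch.linalg.matrix_power for it. The function name `matpow_sum` is made up for this example, and it falls back to the CPU when fewer than two GPUs are visible so that it still runs. CUDA kernels are queued asynchronously, so the two matrix_power calls can overlap when the tensors live on different GPUs.

```python
import torch


def matpow_sum(A, B, n):
    """Compute A^n + B^n, using two GPUs when they are available."""
    if torch.cuda.device_count() >= 2:
        dev_a, dev_b = torch.device("cuda:0"), torch.device("cuda:1")
    else:
        # Fallback so the sketch also runs on machines without two GPUs.
        dev_a = dev_b = torch.device("cpu")

    An = torch.linalg.matrix_power(A.to(dev_a), n)  # queued on the first device
    Bn = torch.linalg.matrix_power(B.to(dev_b), n)  # queued on the second device
    # Bring both results onto one device before adding them.
    return An + Bn.to(dev_a)


A = 2.0 * torch.eye(3)
B = 3.0 * torch.eye(3)
C = matpow_sum(A, B, 3)
print(C)
```

As noted above, any speed-up depends on the matrices being large enough to outweigh the cost of copying one result between devices.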

solving non linear problems in python

In the last equation I need to solve for q. The problem is from Miranda and Fackler, and I need to develop equivalent Python code. If my function depends on several variables and I need to solve a nonlinear root-finding problem for only one of them, how should I write it?
When I pass all three variables, I get the following error:
TypeError: 'numpy.ndarray' object is not callable
And when I pass only one of the variables, I get this error:
TypeError: resid() missing 2 required positional arguments: 'p' and 'phi'
Can anyone tell me my mistake and suggest better code for this?

broyden1(resid(co, p_node, q), co)
breaks because the term resid(co, p_node, q) gets evaluated (returning an array) before being passed into broyden1.
broyden1(resid, co)
breaks because broyden1 calls resid(co) when it evaluates the function, which is clearly not well defined. You want to be able to pass the initial guess as a single object (e.g. a tuple) to broyden1, so a simple solution is to redefine resid to take a tuple instead of three separate arguments, like so:
def resid(arg):
    c, p, phi = arg
    return p + (phi * c) * ((-1 / eta) * (p ** (eta + 1))) \
        - alpha * (np.sqrt(np.abs(phi * c))) - (phi * c) ** 2
c1 = scipy.optimize.broyden1(resid, (co, p_node, q))
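If only one variable should be solved for while the others stay fixed, another option is to bind the fixed ones and hand the solver a one-argument callable. The sketch below uses a toy residual and a hand-written bisection routine (`bisect_root`, a hypothetical stand-in for a solver such as broyden1) so that it runs with the standard library alone. The parameter values are made up.

```python
from functools import partial


def resid(c, p, phi):
    # Toy residual for illustration only (not the one from the question).
    return c * c - p * phi


def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # sign change is in the lower half
        else:
            lo, f_lo = mid, f_mid    # sign change is in the upper half
    return 0.5 * (lo + hi)


p_node, q = 2.0, 8.0                         # made-up fixed values
resid_c = partial(resid, p=p_node, phi=q)    # callable of c only
c_root = bisect_root(resid_c, 0.0, 10.0)
print(c_root)
```

Passing resid(co, p_node, q) instead of resid_c would evaluate the function first and hand the solver a number, which is the "not callable" error from the question. With SciPy the same pattern would presumably be scipy.optimize.broyden1(resid_c, co).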

Using Theano.scan with shared variables

I want to calculate the sumproduct of two arrays in Theano. Both arrays are declared as shared variables and are the result of prior computations. Reading the tutorial, I found out how to use scan to compute what I want using 'normal' tensor arrays, but when I tried to adapt the code to shared arrays I got the error message TypeError: function() takes at least 1 argument (1 given). (See minimal running code example below)
Where is the mistake in my code? Where is my misconception? I am also open to a different approach for solving my problem.
Generally I would prefer a version which takes the shared variables directly because, in my understanding, converting the arrays back to NumPy arrays first and then passing them to Theano again would be wasteful.
Sumproduct code using shared variables, which produces the error:
import theano
import theano.tensor as T
import numpy
a1 = [1,2,4]
a2 = [3,4,5]
Ta1_shared = theano.shared(numpy.array(a1))
Ta2_shared = theano.shared(numpy.array(a2))
outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))
Tsumprod_result, updates = theano.scan(fn=lambda Ta1_shared, Ta2_shared, prior_value:
                                       prior_value + Ta1_shared * Ta2_shared,
                                       outputs_info=outputs_info,
                                       sequences=[Ta1_shared, Ta2_shared])
Tsumprod_result = Tsumprod_result[-1]
Tsumprod = theano.function(outputs=Tsumprod_result)
print(Tsumprod())
Error message:
TypeError: function() takes at least 1 argument (1 given)
Working sumproduct code using non-shared variables:
import theano
import theano.tensor as T
import numpy
a1 = [1, 2, 4]
a2 = [3, 4, 5]
Ta1 = theano.tensor.vector("a1")
Ta2 = theano.tensor.vector("coefficients")
outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))
Tsumprod_result, updates = theano.scan(fn=lambda Ta1, Ta2, prior_value:
                                       prior_value + Ta1 * Ta2,
                                       outputs_info=outputs_info,
                                       sequences=[Ta1, Ta2])
Tsumprod_result = Tsumprod_result[-1]
Tsumprod = theano.function(inputs=[Ta1, Ta2], outputs=Tsumprod_result)
print(Tsumprod(a1, a2))

You need to change the compilation line to this one:
Tsumprod = theano.function([], outputs=Tsumprod_result)
theano.function() always needs a list of inputs. If the function takes no inputs, as in this case, you need to pass an empty list for the inputs.
