I want to define the following function of two variables in Theano and compute its Jacobian:
f(x1,x2) = sum((2 + 2k - exp(k*x1) - exp(k*x2))^2, k = 1..10)
How do I make a Theano function for the above expression - and eventually minimize it using its Jacobian?
Since your function is scalar, the Jacobian reduces to the gradient. Assuming your two variables x1, x2 are scalar (looks like it from the formula, easily generalizable to other objects), you can write
import theano
import theano.tensor as T
x1 = T.fscalar('x1')
x2 = T.fscalar('x2')
k = T.arange(1, 11)  # upper bound is exclusive, so this gives k = 1..10
expr = ((2 + 2 * k - T.exp(x1 * k) - T.exp(x2 * k)) ** 2).sum()
func = theano.function([x1, x2], expr)
You can call func on two scalars:
In [1]: func(0.25, 0.25)
Out[1]: array(132.1137, dtype=float32)
The gradient (Jacobian) is then
grad_expr = T.grad(cost=expr, wrt=[x1, x2])
You can then implement gradient descent in the standard way (see the theano tutorials): make x1, x2 shared variables and use the updates argument of theano.function, step by hand at the Python level, or use scan as indicated by others.
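For instance, here is a minimal gradient-descent sketch along those lines; the starting values, learning rate, and iteration count below are arbitrary assumptions to illustrate the mechanism, not tuned choices:
import theano
import theano.tensor as T

# x1, x2 as shared variables, so theano.function can update them in place
x1 = theano.shared(0.25, name='x1')
x2 = theano.shared(0.25, name='x2')
k = T.arange(1, 11)
expr = ((2 + 2 * k - T.exp(x1 * k) - T.exp(x2 * k)) ** 2).sum()
g1, g2 = T.grad(cost=expr, wrt=[x1, x2])

lr = 0.001  # learning rate; tune for your problem
step = theano.function([], expr,
                       updates=[(x1, x1 - lr * g1), (x2, x2 - lr * g2)])

for i in range(100):
    cost = step()
print(x1.get_value(), x2.get_value(), cost)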
I am a little bit confused about how to calculate the partial derivatives of the sigmoid function in Python. In general we can calculate a partial derivative using code like the following:
Example: f(x, y) = x^4 + x * y^4
The partial derivative w.r.t. x would then be:
import sympy as sym
#Derivatives of multivariable function
x , y = sym.symbols('x y')
f = x**4 + x*y**4
#Differentiating partially w.r.t x
derivative_f = f.diff(x)
derivative_f
How would the code work for the partial derivatives of the sigmoid function?
I tried substituting it into the same approach, but I think I am doing something incorrectly.
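For a sigmoid you can apply exactly the same pattern. A minimal sketch, assuming the common two-variable form s = 1 / (1 + exp(-(x + y))); substitute whatever sigmoid expression you are actually using:
import sympy as sym

x, y = sym.symbols('x y')
s = 1 / (1 + sym.exp(-(x + y)))  # assumed sigmoid; replace with yours

# Differentiating partially w.r.t x; sympy applies the chain rule for you
derivative_s = s.diff(x)
print(sym.simplify(derivative_s))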
I have the following equations:
sqrt((x0 - x)^2 + (y0 - y)^2) - sqrt((x1 - x)^2 + (y1 - y)^2) = c1
sqrt((x3 - x)^2 + (y3 - y)^2) - sqrt((x4 - x)^2 + (y4 - y)^2) = c2
And I would like to find the intersection. I tried using fsolve, transforming the equations into f(x) = 0 form, and it worked for small numbers. However, I am working with huge numbers, and solving the system involves many calculations; in particular, the calculations reach a square root of a subtraction, and with huge numbers precision is lost: the left operand ends up smaller than the right one, giving a math domain error from taking the square root of a negative number.
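As a minimal illustration of this kind of precision loss (the magnitudes below are made up, not my actual data):
# float64 carries about 16 significant digits, so at a scale of 1e8
# any difference smaller than roughly 1e-8 is silently lost
a = 1e8
b = 1e8 + 1e-9
print(b - a)  # prints 0.0: the tiny difference has vanished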
I am trying to solve this issue in different manners:
Trying to use higher-precision floats. I tried numpy.float128, but fsolve won't accept it.
Currently searching for a library that can solve systems of non-linear equations, but no luck so far.
Any help/guidance/tips will be appreciated!
Thanks!
Taking all the advice, I ended up using code like the following for the system:
0 = x + y - 8
0 = sqrt((-6 - x)^2 + (4 - y)^2) - sqrt((1 - x)^2 + y^2) - 5
from math import sqrt
import numpy as np
from scipy.optimize import fsolve
def f(x):
    y = np.zeros(2)
    y[0] = x[1] + x[0] - 8
    y[1] = sqrt((-6 - x[0]) ** 2 + (4 - x[1]) ** 2) - sqrt((1 - x[0]) ** 2 + x[1] ** 2) - 5
    return y

x0 = np.array([0, 0])
solution = fsolve(f, x0)
print("(x, y) = (" + str(solution[0]) + ", " + str(solution[1]) + ")")
Note: the line x0 = np.array([0, 0]) provides the initial guess (seed) that fsolve uses to search for a solution. It is important for the seed to be close to the actual solution.
The example provided works :)
You might find some use in SymPy, which is a symbolic algebra manipulation library in Python.
From its home page:
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
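For example, here is a minimal sketch of attacking the original system with SymPy's nsolve, which works at arbitrary precision and therefore sidesteps the float64 cancellation issue. All numeric values below are made-up placeholders, and the seed must be reasonably close to the intersection:
import sympy as sym

x, y = sym.symbols('x y')

# placeholder coordinates and range differences; use your real values
x0, y0, x1, y1 = 0, 0, 10, 0
x3, y3, x4, y4 = 0, 10, 10, 10
c1, c2 = 2, 3

eq1 = sym.sqrt((x0 - x)**2 + (y0 - y)**2) - sym.sqrt((x1 - x)**2 + (y1 - y)**2) - c1
eq2 = sym.sqrt((x3 - x)**2 + (y3 - y)**2) - sym.sqrt((x4 - x)**2 + (y4 - y)**2) - c2

# nsolve needs an initial guess; prec sets the working precision in digits
sol = sym.nsolve((eq1, eq2), (x, y), (6, 5), prec=50)
print(sol)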
As you have a non-linear equation, you need some kind of optimizer to solve it. You can probably use something from scipy.optimize (https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html). However, as I have no experience with that scipy module, I can only offer you a solution using the gradient descent method of the tensorflow library. You can find a short guide here: https://learningtensorflow.com/lesson7/ (check out the Gradient descent chapter). Analogously to the method described there, you could do something like this:
import numpy as np
import tensorflow as tf

# These arrays are pseudo code, fill in your values for x0,x1,y0,y1,...
x_array = [x0,x1,x3,x4]
y_array = [y0,y1,y3,y4]
c_array = [c1,c2]
# Tensorflow model starts here
x = tf.placeholder("float")
y = tf.placeholder("float")
z = tf.placeholder("float")
# the array [0,0] are initial guesses for the "correct" x and y that solves the equation
xy_array = tf.Variable([0., 0.], name="xy_array")  # must be float so the optimizer can compute gradients
x0 = tf.constant(x_array[0], name="x0")
x1 = tf.constant(x_array[1], name="x1")
x3 = tf.constant(x_array[2], name="x3")
x4 = tf.constant(x_array[3], name="x4")
y0 = tf.constant(y_array[0], name="y0")
y1 = tf.constant(y_array[1], name="y1")
y3 = tf.constant(y_array[2], name="y3")
y4 = tf.constant(y_array[3], name="y4")
c1 = tf.constant(c_array[0], name="c1")
c2 = tf.constant(c_array[1], name="c2")
# I took your first line and subtracted c1 from it, same for the second line, and introduced d_1 and d_2
d_1 = tf.sqrt(tf.square(x0 - xy_array[0]) + tf.square(y0 - xy_array[1])) - tf.sqrt(tf.square(x1 - xy_array[0]) + tf.square(y1 - xy_array[1])) - c1
d_2 = tf.sqrt(tf.square(x3 - xy_array[0]) + tf.square(y3 - xy_array[1])) - tf.sqrt(tf.square(x4 - xy_array[0]) + tf.square(y4 - xy_array[1])) - c2
# this z_model should actually be zero in the end, in that case there is an intersection
z_model = d_1 - d_2
error = tf.square(z-z_model)
# you can try different values for the "learning rate", here 0.01
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)
model = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(model)
    # here you are creating a "training set" of size 1000, you can also make it bigger if you like
    for i in range(1000):
        x_value = np.random.rand()
        y_value = np.random.rand()
        d1_value = np.sqrt(np.square(x_array[0] - x_value) + np.square(y_array[0] - y_value)) - np.sqrt(np.square(x_array[1] - x_value) + np.square(y_array[1] - y_value)) - c_array[0]
        d2_value = np.sqrt(np.square(x_array[2] - x_value) + np.square(y_array[2] - y_value)) - np.sqrt(np.square(x_array[3] - x_value) + np.square(y_array[3] - y_value)) - c_array[1]
        z_value = d1_value - d2_value
        session.run(train_op, feed_dict={x: x_value, y: y_value, z: z_value})
    xy_value = session.run(xy_array)
    print("Estimated intersection: x = {a:.3f}, y = {b:.3f}".format(a=xy_value[0], b=xy_value[1]))
But be aware: this code will probably run for a while, which is why I haven't tested it. Also, I am currently not sure what will happen if there is no intersection; you would probably get the coordinates of the closest approach of the two curves.
Tensorflow can be somewhat difficult if you haven't used it yet, but it is worth learning, as you can also use it for any deep learning application (the actual purpose of this library).
I have a random variable Y whose distribution is Poisson with a parameter that is itself a random variable X, which is Poisson with parameter 3.
How can I use SymPy to automatically calculate the covariance between X and Y?
The code
from sympy.stats import *
x1 = Poisson("x1", 3)
x2 = Poisson("x2", x1)
print(covariance(x2,x1))
raises an error
ValueError: Lambda must be positive
The documentation is not clear to me on this matter, and playing around with the function given did not seem to work.
This kind of manipulation is not implemented in SymPy. But you can pass a symbol (z1 below) for the parameter of a distribution. Then, after the first step of the computation, replace z1 by x1 and take the expected value.
from sympy import Symbol
from sympy.stats import Poisson, E
z1 = Symbol("z1")
x1 = Poisson("x1", 3)
x2 = Poisson("x2", z1)
Ex2 = E(E(x2).subs(z1, x1))
Vx2 = E(E((x2-Ex2)**2).subs(z1, x1))
cov = E(E((z1-E(x1))*(x2-Ex2)).subs(z1, x1))
print("E(x2) = {}, var(x2) = {}, cov(x1, x2) = {}".format(Ex2, Vx2, cov))
Output:
E(x2) = 3, var(x2) = 6, cov(x1, x2) = 3
Notice the appearance of Ex2 instead of E(x2) in the formulas for the variance and covariance. Using E(x2) there would give incorrect results, because E(x2) is an expression involving z1. For the same reason I'm not using the variance or covariance functions (they would involve the symbolic E(x2) instead of the correct value 3), and instead express everything explicitly as an expected value.
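As a sanity check, these values agree with the law of total expectation/variance: since x2 given x1 is Poisson(x1), we have E(x2) = E(x1), var(x2) = E(x1) + var(x1), and cov(x1, x2) = var(x1). A quick cross-check of those identities:
from sympy.stats import Poisson, E, variance

x1 = Poisson("x1", 3)
# E(x2), var(x2), cov(x1, x2) via the total-expectation identities
print(E(x1), E(x1) + variance(x1), variance(x1))  # prints: 3 6 3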
I'm trying to solve a second-order ODE using odeint from scipy. The issue I'm having is that the function is implicitly coupled to the second-order term, as seen in the simplified snippet below (please ignore the pretend physics of the example):
import numpy as np
from scipy.integrate import odeint
def integral(y, t, F_l, mass):
    dydt = np.zeros_like(y)
    x, v = y
    F_r = (((1-a)/3)**2 + (2*(1+a)/3)**2) * v # 'a' implicit
    a = (F_l - F_r)/mass
    dydt = [v, a]
    return dydt
y0 = [0,5]
time = np.linspace(0.,10.,21)
F_lon = 100.
mass = 1000.
dydt = odeint(integral, y0, time, args=(F_lon,mass))
In this case I realise it is possible to solve algebraically for the implicit variable; however, in my actual scenario there is a lot of logic between F_r and the evaluation of a, and algebraic manipulation fails.
I believe the DAE could be solved using MATLAB's ode15i function, but I'm trying to avoid that scenario if at all possible.
My question is: is there a way to solve implicit ODE functions (DAEs) in Python (preferably scipy)? And is there a better way to pose the problem above to do so?
As a last resort, it may be acceptable to pass a from the previous time-step. How could I pass dydt[1] back into the function after each time-step?
Quite old, but worth updating so it may be useful for anyone who stumbles upon this question. There are quite a few packages currently available in Python that can solve implicit ODEs.
GEKKO (https://github.com/BYU-PRISM/GEKKO) is one such package. It specializes in dynamic optimization for mixed-integer, nonlinear optimization problems, but can also be used as a general-purpose DAE solver.
The above "pretend physics" problem can be solved in GEKKO as follows.
import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO

m = GEKKO()
m.time = np.linspace(0, 100, 101)
F_l = m.Param(value=1000)
mass = m.Param(value=1000)
m.options.IMODE = 4  # dynamic simulation
m.options.NODES = 3
F_r = m.Var(value=0)
x = m.Var(value=0)
v = m.Var(value=0, lb=0)
a = m.Var(value=5, lb=0)
m.Equation(x.dt() == v)
m.Equation(v.dt() == a)
m.Equation(F_r == (((1 - a) / 3) ** 2 + (2 * (1 + a) / 3) ** 2) * v)
m.Equation(a == (F_l - F_r) / mass)
m.solve(disp=False)
plt.plot(m.time, x.value)
plt.show()
If algebraic manipulation fails, you can go for a numerical solution of your constraint, for example by running fsolve at each timestep:
import sys
from numpy import linspace
from scipy.integrate import odeint
from scipy.optimize import fsolve
y0 = [0, 5]
time = linspace(0., 10., 1000)
F_lon = 10.
mass = 1000.
def F_r(a, v):
    return (((1 - a) / 3) ** 2 + (2 * (1 + a) / 3) ** 2) * v

def constraint(a, v):
    return (F_lon - F_r(a, v)) / mass - a

def integral(y, _):
    v = y[1]
    a, infodict, ier, mesg = fsolve(constraint, 0, args=(v,), full_output=True)
    if ier != 1:
        print("I couldn't solve the algebraic constraint, error:\n\n", mesg)
        sys.stdout.flush()
    return [v, a[0]]
dydt = odeint(integral, y0, time)
Clearly this will slow down your time integration. Always check that fsolve finds a good solution, and flush the output so that you can notice problems as they happen and stop the simulation.
About how to "cache" the value of a variable from a previous timestep: you can exploit the fact that default arguments are evaluated only once, at function definition:
from numpy import linspace
from scipy.integrate import odeint
#you can choose a better guess using fsolve instead of 0
def integral(y, _, F_l, M, cache=[0]):
    v, preva = y[1], cache[0]
    #use value for 'a' from the previous timestep
    F_r = (((1 - preva) / 3) ** 2 + (2 * (1 + preva) / 3) ** 2) * v
    #calculate the new value
    a = (F_l - F_r) / M
    cache[0] = a
    return [v, a]
y0 = [0, 5]
time = linspace(0., 10., 1000)
F_lon = 100.
mass = 1000.
dydt = odeint(integral, y0, time, args=(F_lon, mass))
Notice that in order for the trick to work the cache parameter must be mutable, and that's why I use a list. See this link if you are not familiar with how default arguments work.
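As a minimal, self-contained illustration of the trick (a toy counter, unrelated to the ODE code above):
def counter(cache=[0]):
    # the default list is created once, at function definition,
    # and the same object persists across calls
    cache[0] += 1
    return cache[0]

print(counter(), counter(), counter())  # prints: 1 2 3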
Notice that the two codes DO NOT produce the same result, and you should be very careful using the value at the previous timestep, both for numerical stability and precision. The second is clearly much faster though.
Short summary: How do I quickly calculate the finite convolution of two arrays?
Problem description
I am trying to obtain the finite convolution of two functions f(x), g(x), defined by
(f * g)(x) = integral from 0 to x of f(y) * g(x - y) dy
To achieve this, I have taken discrete samples of the functions and turned them into arrays of length steps:
xarray = [x * i / steps for i in range(steps)]
farray = [f(x) for x in xarray]
garray = [g(x) for x in xarray]
I then tried to calculate the convolution using the scipy.signal.convolve function. This function gives the same results as the algorithm conv suggested here. However, the results differ considerably from analytical solutions. Modifying the algorithm conv to use the trapezoidal rule gives the desired results.
To illustrate this, I let
f(x) = exp(-x)
g(x) = 2 * exp(-2 * x)
the results are:
[plot comparing the Riemann, trapezoidal, scipy.signal.convolve and analytical results]
Here Riemann represents a simple Riemann sum, trapezoidal is a modified version of the Riemann algorithm using the trapezoidal rule, scipy.signal.convolve is the scipy function, and analytical is the analytical convolution.
Now let g(x) = x^2 * exp(-x), and the results become:
[plot of the ratio of the scipy values to the analytical values]
Here ratio is the ratio of the values obtained from scipy to the analytical values. The above demonstrates that the problem cannot be solved by renormalising the integral.
The question
Is it possible to use the speed of scipy but retain the better results of a trapezoidal rule or do I have to write a C extension to achieve the desired results?
An example
Just copy and paste the code below to see the problem I am encountering. The two results can be brought into closer agreement by increasing the steps variable. I believe the problem is caused by artefacts from right-hand Riemann sums: the integral is overestimated while it is increasing, and approaches the analytical solution again as it is decreasing.
EDIT: I have now included the original algorithm as a comparison; it gives the same results as the scipy.signal.convolve function.
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
import math
def convolveoriginal(x, y):
    '''
    The original algorithm from http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P, Q, N = len(x), len(y), len(x) + len(y) - 1
    z = []
    for k in range(N):
        t, lower, upper = 0, max(0, k - (Q - 1)), min(P - 1, k)
        for i in range(lower, upper + 1):
            t = t + x[i] * y[k - i]
        z.append(t)
    return np.array(z) #Modified to include conversion to numpy array
def convolve(y1, y2, dx=None):
    '''
    Compute the finite convolution of two signals of equal length.
    @param y1: First signal.
    @param y2: Second signal.
    @param dx: [optional] Integration step width.
    @note: Based on the algorithm at http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P = len(y1) #Determine the length of the signal
    z = [] #Create a list of convolution values
    for k in range(P):
        t = 0
        lower = max(0, k - (P - 1))
        upper = min(P - 1, k)
        for i in range(lower, upper):
            t += (y1[i] * y2[k - i] + y1[i + 1] * y2[k - (i + 1)]) / 2
        z.append(t)
    z = np.array(z) #Convert to a numpy array
    if dx is not None: #Is a step width specified?
        z *= dx
    return z
steps = 50 #Number of integration steps
maxtime = 5 #Maximum time
dt = float(maxtime) / steps #Obtain the width of a time step
time = [dt * i for i in range(steps)] #Create an array of times
exp1 = [math.exp(-t) for t in time] #Create an array of function values
exp2 = [2 * math.exp(-2 * t) for t in time]
#Calculate the analytical expression
analytical = [2 * math.exp(-2 * t) * (-1 + math.exp(t)) for t in time]
#Calculate the trapezoidal convolution
trapezoidal = convolve(exp1, exp2, dt)
#Calculate the scipy convolution
sci = signal.convolve(exp1, exp2, mode = 'full')
#Slice the first half to obtain the causal convolution and multiply by dt
#to account for the step width
sci = sci[0:steps] * dt
#Calculate the convolution using the original Riemann sum algorithm
riemann = convolveoriginal(exp1, exp2)
riemann = riemann[0:steps] * dt
#Plot
plt.plot(time, analytical, label = 'analytical')
plt.plot(time, trapezoidal, 'o', label = 'trapezoidal')
plt.plot(time, riemann, 'o', label = 'Riemann')
plt.plot(time, sci, '.', label = 'scipy.signal.convolve')
plt.legend()
plt.show()
Thank you for your time!
Or, for those who prefer numpy to C: it will be slower than the C implementation, but it's just a few lines.
>>> t = np.linspace(0, maxtime-dt, 50)
>>> fx = np.exp(-np.array(t))
>>> gx = 2*np.exp(-2*np.array(t))
>>> analytical = 2 * np.exp(-2 * t) * (-1 + np.exp(t))
This looks like the trapezoidal result in this case (but I didn't check the math): averaging the two convolutions, each offset by one sample, amounts to averaging left- and right-hand Riemann sums, i.e. the trapezoidal rule.
>>> s2a = signal.convolve(fx[1:], gx, 'full')*dt
>>> s2b = signal.convolve(fx, gx[1:], 'full')*dt
>>> s = (s2a+s2b)/2
>>> s[:10]
array([ 0.17235682, 0.29706872, 0.38433313, 0.44235042, 0.47770012,
0.49564748, 0.50039326, 0.49527721, 0.48294359, 0.46547582])
>>> analytical[:10]
array([ 0. , 0.17221333, 0.29682141, 0.38401317, 0.44198216,
0.47730244, 0.49523485, 0.49997668, 0.49486489, 0.48254154])
largest absolute error:
>>> np.max(np.abs(s[:len(analytical)-1] - analytical[1:]))
0.00041657780840698155
>>> np.argmax(np.abs(s[:len(analytical)-1] - analytical[1:]))
6
Short answer: Write it in C!
Long answer
Using the cookbook about numpy arrays, I rewrote the trapezoidal convolution method in C. Using the C code requires three files (https://gist.github.com/1626919):
The C code (performancemodule.c).
The setup file to build the code and make it callable from python (performancemodulesetup.py).
The python file that makes use of the C extension (performancetest.py)
After downloading, the code should run once you do the following:
Adjust the include path in performancemodule.c.
Run the following
python performancemodulesetup.py build
python performancetest.py
You may have to copy the library file performancemodule.so or performancemodule.dll into the same directory as performancetest.py.
Results and performance
The results agree neatly with one another.
The performance of the C method is even better than scipy's convolve method. Running 10k convolutions with arrays of length 50 requires:
convolve (seconds, microseconds) 81 349969
scipy.signal.convolve (seconds, microseconds) 1 962599
convolve in C (seconds, microseconds) 0 87024
Thus, the C implementation is about 1000 times faster than the python implementation and a bit more than 20 times as fast as the scipy implementation (admittedly, the scipy implementation is more versatile).
EDIT: This does not solve the original question exactly but is sufficient for my purposes.